Publications

Collaborators: René Meziat, Renato D. C. Monteiro, and Benar F. Svaiter.

DETAILS

  • Dialogue Act Classification in Group Chats with DAG-LSTMs, O. Irsoy, R. Gosangi, H. Zhang, M. Wei, P. Lund, D. Pappadopulo, B. Fahy, N. Neophytou and C. Ortiz (2019).
    Dialogue act (DA) classification has been studied for the past two decades and has several key applications such as workflow automation and conversation analytics. Researchers have addressed this problem with various traditional machine learning models and, more recently, with deep neural network models such as hierarchical convolutional neural networks (CNNs) and long short-term memory (LSTM) networks. In this paper, we introduce a new model architecture, the directed-acyclic-graph LSTM (DAG-LSTM), for DA classification. A DAG-LSTM exploits the turn-taking structure naturally present in a multi-party conversation and encodes this relation in its model structure. Using the STAC corpus, we show that the proposed method performs roughly 0.8% better in accuracy and 1.2% better in macro-F1 score when compared to existing methods. The proposed method is generic and not limited to conversation applications.
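    No code accompanies this listing; the following is a minimal PyTorch sketch of the core idea, in which a cell aggregates the hidden and cell states of its DAG predecessors before applying standard LSTM gating. The summation-based aggregation, the layer sizes, and the name DAGLSTMCell are assumptions made for illustration, not the paper's exact formulation.

      # Minimal DAG-LSTM cell sketch (assumed formulation, not the paper's equations).
      # Each utterance node receives the summed hidden/cell states of its DAG
      # predecessors, then applies standard LSTM gating.
      import torch
      import torch.nn as nn

      class DAGLSTMCell(nn.Module):
          def __init__(self, input_size, hidden_size):
              super().__init__()
              self.cell = nn.LSTMCell(input_size, hidden_size)
              self.hidden_size = hidden_size

          def forward(self, x, pred_states):
              # x: (batch, input_size) embedding of the current utterance.
              # pred_states: list of (h, c) pairs from predecessor utterances.
              if pred_states:
                  h = torch.stack([h for h, _ in pred_states]).sum(dim=0)
                  c = torch.stack([c for _, c in pred_states]).sum(dim=0)
              else:  # conversation-initial utterance: start from zero states
                  h = x.new_zeros(x.size(0), self.hidden_size)
                  c = x.new_zeros(x.size(0), self.hidden_size)
              return self.cell(x, (h, c))

      # Usage: process utterances in topological order of the conversation DAG.
      cell = DAGLSTMCell(input_size=300, hidden_size=128)
      x1 = torch.randn(1, 300)
      h1, c1 = cell(x1, [])          # first utterance
      x2 = torch.randn(1, 300)
      h2, c2 = cell(x2, [(h1, c1)])  # reply linked to the first utterance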
  • Mining massive hierarchical data using a scalable probabilistic graphical model, K. AlJadda, M. Korayem, C. Ortiz, T. Grainger, J. A. Miller, K. M. Rasheed, K. J. Kochut, H. Peng, W. S. York, R. Ranzinger and M. Porterfield (2018).
    Probabilistic Graphical Models (PGMs) are very useful in the fields of machine learning and data mining. A crucial limitation of these models, however, is their scalability. The Bayesian Network, one of the most common PGMs used in machine learning and data mining, demonstrates this limitation when the training data consists of random variables, each of which has a large set of possible values. In the big data era, one could expect new extensions to the existing PGMs to handle the massive amount of data produced these days by computers, sensors and other electronic devices. With hierarchical data – data that is arranged in a tree-like structure with several levels – one may see hundreds of thousands or millions of values distributed over even just a small number of levels. When modeling this kind of hierarchical data across large data sets, unrestricted Bayesian Networks may become infeasible for representing the probability distributions. In this paper, we introduce an extension to Bayesian Networks that can handle massive sets of hierarchical data in a reasonable amount of time and space. The proposed model achieves high precision and high recall when used as a multi-label classifier for the annotation of mass spectrometry data. On another data set of 1.5 billion search logs provided by CareerBuilder.com, the model was able to predict latent semantic relationships among search keywords with high accuracy.
  • Query Sense Disambiguation Leveraging Large Scale User Behavioral Data, M. Korayem, K. AlJadda, C. Ortiz, T. Grainger, J. Miller, and W. York (2015).
    Term ambiguity – the challenge of having multiple potential meanings for a keyword or phrase – can be a major problem for search engines. Contextual information is essential for word sense disambiguation, but search queries are often limited to very few keywords, making the available textual context needed for disambiguation minimal or non-existent. In this paper we propose a novel system to identify and resolve term ambiguity in search queries using large-scale user behavioral data. The proposed system demonstrates that, despite the lack of context in most keyword queries, multiple potential senses of a keyword or phrase within a search query can be accurately identified, disambiguated, and expressed in order to maximize the likelihood of fulfilling a user’s information need. The proposed system overcomes the immediate lack of context by leveraging large-scale user behavioral data from historical query logs. Unlike traditional word sense disambiguation methods that rely on knowledge sources or available textual corpora, our system is language-agnostic, is able to easily handle domain-specific terms and meanings, and is automatically generated so that it does not grow out of date or require manual updating as ambiguous terms emerge or undergo a shift in meaning. The system has been implemented using the Hadoop eco-system and integrated within CareerBuilder’s semantic search engine.
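    As a toy illustration of the log-driven idea (not the paper's actual system), the sketch below induces senses for an ambiguous keyword by clustering the terms that co-occur with it in historical queries into connected components of their co-occurrence graph; the sample log and the keyword "java" are invented for the example.

      # Toy sense induction from query logs (illustrative only; invented sample data).
      # Terms co-occurring with the ambiguous keyword are grouped into senses via
      # connected components of their pairwise co-occurrence graph.
      from collections import defaultdict
      from itertools import combinations

      logs = [
          "java developer", "java programming jobs", "developer programming",
          "java island travel", "java travel guide", "island guide",
      ]
      keyword = "java"

      # Co-terms appearing alongside the keyword, query by query.
      contexts = [set(q.split()) - {keyword} for q in logs if keyword in q.split()]

      # Link terms that appear together in any query (evidence of relatedness).
      edges = defaultdict(set)
      for q in logs:
          terms = set(q.split()) - {keyword}
          for a, b in combinations(sorted(terms), 2):
              edges[a].add(b)
              edges[b].add(a)

      # Group the co-terms into senses via connected components.
      co_terms = set().union(*contexts)
      senses, seen = [], set()
      for t in co_terms:
          if t in seen:
              continue
          stack, comp = [t], set()
          while stack:
              u = stack.pop()
              if u in comp:
                  continue
              comp.add(u)
              stack.extend(edges[u] & co_terms)
          seen |= comp
          senses.append(comp)

      print(senses)  # e.g. [{'developer', 'programming', 'jobs'}, {'island', 'travel', 'guide'}]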
  • An adaptive accelerated first-order method for convex optimization, R. D. C. Monteiro, C. Ortiz and B. F. Svaiter (2012).
    This paper presents a new accelerated variant of Nesterov’s method for solving composite convex optimization problems in which certain acceleration parameters are adaptively (and aggressively) chosen so as to substantially improve its practical performance compared to existing accelerated variants while at the same time preserving the optimal iteration-complexity shared by these methods. Computational results are presented to demonstrate that the proposed adaptive accelerated method endowed with a restarting scheme outperforms other existing accelerated variants.
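    For intuition, here is a NumPy sketch of a generic accelerated gradient method with a function-value restarting scheme on a toy quadratic. It illustrates the restart idea only; the adaptive choice of acceleration parameters that is the paper's contribution is not reproduced here.

      # Generic accelerated gradient with function-value restarts (illustrative
      # sketch, not the paper's method). Minimizes f(x) = 0.5 x'Ax - b'x.
      import numpy as np

      rng = np.random.default_rng(0)
      n = 50
      M = rng.standard_normal((n, n))
      A = M @ M.T + np.eye(n)          # positive definite
      b = rng.standard_normal(n)
      f = lambda x: 0.5 * x @ A @ x - b @ x
      grad = lambda x: A @ x - b
      L = np.linalg.eigvalsh(A).max()  # Lipschitz constant of the gradient

      x = y = np.zeros(n)
      t = 1.0
      for k in range(500):
          x_new = y - grad(y) / L      # gradient step at the extrapolated point
          t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
          y = x_new + ((t - 1) / t_new) * (x_new - x)
          if f(x_new) > f(x):          # restart: momentum is hurting progress
              y, t_new = x_new, 1.0
          x, t = x_new, t_new

      print("final objective gap:", f(x) - f(np.linalg.solve(A, b)))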
  • PGMHD: A Scalable Probabilistic Graphical Model for Massive Hierarchical Data Problems, K. AlJadda, M. Korayem, C. Ortiz, T. Grainger, J. Miller, and W. York (2014).
    In the big data era, scalability has become a crucial requirement for any useful computational model. Probabilistic graphical models are very useful for mining and discovering data insights, but they are not scalable enough to be suitable for big data problems. Bayesian Networks particularly demonstrate this limitation when their data is represented using few random variables while each random variable has a massive set of values. With hierarchical data – data that is arranged in a tree-like structure with several levels – one would expect to see hundreds of thousands or millions of values distributed over even just a small number of levels. When modeling this kind of hierarchical data across large data sets, Bayesian networks become infeasible for representing the probability distributions for the following reasons: i) Each level represents a single random variable with hundreds of thousands of values, ii) The number of levels is usually small, so there are also few random variables, and iii) The structure of the network is predefined since the dependency is modeled top-down from each parent to each of its child nodes, so the network would contain a single linear path for the random variables from each parent to each child node. In this paper we present a scalable probabilistic graphical model to overcome these limitations for massive hierarchical data. We believe the proposed model will lead to an easily scalable, more readable, and expressive implementation for problems that require probabilistic-based solutions for massive amounts of hierarchical data. We successfully applied this model to solve two different challenging probabilistic-based problems on massive hierarchical data sets for different domains, namely, bioinformatics and latent semantic discovery over search logs.
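    As a hedged illustration of the counting-based flavor of such a hierarchical model (the class name and the two-level example data are invented, and this is not the exact PGMHD definition), the sketch below stores parent-to-child co-occurrence counts level by level and scores a child given its parent by conditional relative frequency.

      # Hedged sketch of a hierarchical co-occurrence model (invented example; not
      # the exact PGMHD definition). Counts parent->child observations per level
      # and scores P(child | parent) as a conditional relative frequency.
      from collections import defaultdict

      class HierarchicalCountModel:
          def __init__(self):
              # counts[level][parent][child] = number of joint observations
              self.counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

          def observe(self, path):
              # path: one observation as a root-to-leaf sequence.
              for level, (parent, child) in enumerate(zip(path, path[1:])):
                  self.counts[level][parent][child] += 1

          def prob(self, level, parent, child):
              children = self.counts[level][parent]
              total = sum(children.values())
              return children[child] / total if total else 0.0

      model = HierarchicalCountModel()
      model.observe(("IT", "engineer", "java"))
      model.observe(("IT", "engineer", "python"))
      model.observe(("IT", "analyst", "sql"))
      print(model.prob(1, "engineer", "java"))  # 0.5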
  • Augmenting recommendation systems using a model of semantically-related terms extracted from user behavior, K. AlJadda, M. Korayem, C. Ortiz, C. Russell, D. Bernal, L. Payson, S. Brown and T. Grainger (2014).
    Common difficulties like the cold-start problem and a lack of sufficient information about users due to their limited interactions have been major challenges for most recommender systems (RS). To overcome these challenges and many similar ones that result in low-accuracy (precision and recall) recommendations, we propose a novel system that extracts semantically-related search keywords based on the aggregate behavioral data of many users. These semantically-related search keywords can be used to substantially increase the amount of knowledge about a specific user’s interests based upon even a few searches and thus improve the accuracy of the RS. The proposed system is capable of mining aggregate user search logs to discover semantic relationships between key phrases in a manner that is language agnostic, human understandable, and virtually noise-free. These semantically-related keywords are obtained by looking at the links between queries of similar users, which, we believe, represent a largely untapped source for discovering latent semantic relationships between search terms.
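    A small sketch of the co-search idea with invented log data: two keywords are considered related when many of the same users searched for both, scored here with a Jaccard overlap of user sets. The scoring choice and the threshold are assumptions for the example, not the paper's exact measure.

      # Toy related-keyword mining from per-user search logs (invented data; the
      # Jaccard score is an illustrative choice, not the paper's measure).
      from collections import defaultdict
      from itertools import combinations

      user_searches = {
          "u1": {"java developer", "software engineer"},
          "u2": {"java developer", "software engineer", "backend"},
          "u3": {"registered nurse", "rn"},
          "u4": {"registered nurse", "rn", "nurse practitioner"},
      }

      users_by_keyword = defaultdict(set)
      for user, keywords in user_searches.items():
          for kw in keywords:
              users_by_keyword[kw].add(user)

      def jaccard(a, b):
          return len(a & b) / len(a | b)

      related = []
      for k1, k2 in combinations(sorted(users_by_keyword), 2):
          score = jaccard(users_by_keyword[k1], users_by_keyword[k2])
          if score >= 0.5:  # threshold chosen arbitrarily for the example
              related.append((k1, k2, score))

      for k1, k2, s in sorted(related, key=lambda r: -r[2]):
          print(f"{k1!r} ~ {k2!r}  (score {s:.2f})")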
  • An inexact block-decomposition method for extra large-scale conic semidefinite programming, R. D. C. Monteiro, C. Ortiz and B. F. Svaiter (2014).
    In this paper, we present an inexact block-decomposition (BD) first-order method for solving standard form conic semidefinite programming (SDP) which avoids computations of exact projections onto the manifold defined by the affine constraints and, as a result, is able to handle extra-large SDP instances. The method is based on a two-block reformulation of the optimality conditions where the first block corresponds to the complementarity conditions and the second one corresponds to the manifolds consisting of both the primal and dual affine feasibility constraints. While the proximal subproblem corresponding to the first block is solved exactly, the one corresponding to the second block is solved inexactly in order to avoid finding the exact solution of the underlying augmented primal-dual linear system. The error condition required by the BD method naturally suggests the use of a relative error condition in the context of the latter augmented primal-dual linear system. Our implementation uses the conjugate gradient (CG) method applied to a reduced positive definite dual linear system to obtain inexact solutions of the augmented linear system. In addition, we implemented the proposed BD method with the following refinements: an adaptive choice of stepsize for performing an extragradient step; and a new dynamic scaling scheme that uses two scaling factors to balance three inclusions comprising the optimality conditions of the conic SDP. Finally, we present computational results showing the efficiency of our method for solving various extra-large SDP instances, including some with at least two million constraints and/or fifty million non-zero coefficients in the affine constraints.
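    The inner solver mentioned above, CG on a positive definite system, can be sketched in a few lines of NumPy. This is the textbook method with a relative-residual stopping rule, shown only to make the inexact inner solves concrete; it is not the paper's implementation.

      # Textbook conjugate gradient with a relative-residual stopping rule
      # (illustration of inexact inner solves; not the paper's implementation).
      import numpy as np

      def cg(A, b, tol=1e-6, max_iter=1000):
          x = np.zeros_like(b)
          r = b - A @ x
          p = r.copy()
          rs = r @ r
          for _ in range(max_iter):
              Ap = A @ p
              alpha = rs / (p @ Ap)
              x += alpha * p
              r -= alpha * Ap
              rs_new = r @ r
              if np.sqrt(rs_new) <= tol * np.linalg.norm(b):  # relative residual
                  break
              p = r + (rs_new / rs) * p
              rs = rs_new
          return x

      rng = np.random.default_rng(1)
      M = rng.standard_normal((100, 100))
      A = M @ M.T + 100 * np.eye(100)   # positive definite test matrix
      b = rng.standard_normal(100)
      x = cg(A, b)
      print("residual:", np.linalg.norm(b - A @ x))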
  • A first-order block-decomposition method for solving two-easy-block structured semidefinite programs, R. D. C. Monteiro, C. Ortiz and B. F. Svaiter (2012).
    In this paper, we consider a first-order block-decomposition method for minimizing the sum of a convex differentiable function with Lipschitz continuous gradient, and two other proper closed convex (possibly, nonsmooth) functions with easily computable resolvents. The method presented contains two important ingredients from a computational point of view, namely: an adaptive choice of stepsize for performing an extragradient step; and the use of a scaling factor to balance the blocks. We then specialize the method to the context of conic semidefinite programming (SDP) problems consisting of two easy blocks of constraints. Without putting them in standard form, we show that four important classes of graph-related conic SDP problems automatically possess the above two-easy-block structure, namely: SDPs for θ-functions and θ_+-functions of graph stable set problems, and SDP relaxations of binary integer quadratic and frequency assignment problems. Finally, we present computational results on the aforementioned classes of SDPs showing that our method outperforms the three most competitive codes for large-scale conic semidefinite programs, namely: the boundary point (BP) method introduced by Povh et al., a Newton-CG augmented Lagrangian method, called SDPNAL, by Zhao et al., and a variant of the BP method, called the SDPAD method, by Wen et al.
  • Implementation of a block-decomposition algorithm for solving large-scale conic optimization problems, R. D. C. Monteiro, C. Ortiz and B. F. Svaiter (2011).
    In this paper, we consider block-decomposition first-order methods for solving large-scale conic semidefinite programming problems. Several ingredients are introduced to speed up the method in its pure form, such as: an aggressive choice of stepsize for performing the extragradient step; use of scaled inner products in the primal and dual spaces; dynamic update of the scaled inner product in the primal space for properly balancing the primal and dual relative residuals; and proper choices of the initial primal and dual iterates, as well as the initial parameter for the primal scaled inner product. Finally, we present computational results showing that our method outperforms the two most competitive codes for large-scale conic semidefinite programs, namely: the boundary point method introduced by Povh et al. and the Newton-CG augmented Lagrangian method by Zhao et al.
  • Computable termination criteria for large scale optimal methods on unbounded sets, R. D. C. Monteiro and C. Ortiz, working paper (2012).
    New termination criteria are introduced for a variant of Nesterov’s large-scale optimal method applied to convex saddle-point problems. What is innovative about the new criteria is that they can be computed by practitioners for problems with closed, convex feasible sets that, in particular, may be unbounded. The termination criteria are obtained by adding a linear perturbation to the saddle function, which leads to an approximate solution that converges with order O(1/k^2) after k iterations for smooth problems. Additionally, the iteration complexity of the penalty method approach on linearly constrained convex problems is analyzed for this new termination measure.
  • Semidefinite relaxations of dynamical programs under discrete constraints, C. Ortiz and R. Meziat (2010).
    In this work we propose an exact semidefinite relaxation for non-linear, non-convex dynamical programs under discrete constraints in the state and control variables. We outline some theoretical features of the method and work out the solutions of a benchmark problem in cybernetics and the classical inventory problem under discrete constraints.
  • Application of the method of moments to the non-convex knight problem, C. Ortiz (2007).
    We present a dynamical program intended to determine a path that takes a knight from an initial position on the chess board to a final one, while avoiding the attack of several rooks located arbitrarily on the board. Our model is based on a dynamical program with discrete state and control variables. To solve this problem properly, we use a semidefinite relaxation based on algebraic and trigonometric moments. By exploiting the symmetries of the knight's moves, we reduce the dimension of the control variable. Theory and numerical results are reported.
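    Setting aside the moment relaxation, the underlying combinatorial core can be posed as a shortest-path search over knight moves that avoids squares occupied or attacked by the rooks. The BFS sketch below, with an invented rook placement, illustrates only that core, not the paper's semidefinite method.

      # Combinatorial core of the knight problem (illustration only; the paper
      # solves a moment-based semidefinite relaxation, not a BFS). Finds a
      # shortest knight path avoiding squares occupied or attacked by rooks.
      from collections import deque

      N = 8
      rooks = {(0, 7), (7, 0)}  # invented placement

      def attacked(sq):
          r, c = sq
          return sq in rooks or any(r == rr or c == rc for rr, rc in rooks)

      MOVES = [(1, 2), (2, 1), (-1, 2), (-2, 1),
               (1, -2), (2, -1), (-1, -2), (-2, -1)]

      def knight_path(start, goal):
          queue, parent = deque([start]), {start: None}
          while queue:
              sq = queue.popleft()
              if sq == goal:  # reconstruct the path back to the start
                  path = []
                  while sq is not None:
                      path.append(sq)
                      sq = parent[sq]
                  return path[::-1]
              r, c = sq
              for dr, dc in MOVES:
                  nxt = (r + dr, c + dc)
                  if (0 <= nxt[0] < N and 0 <= nxt[1] < N
                          and nxt not in parent and not attacked(nxt)):
                      parent[nxt] = sq
                      queue.append(nxt)
          return None  # goal unreachable under these rooks

      print(knight_path((1, 1), (6, 6)))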
  • Elliptic curve cryptography implementation for Java, C. Ortiz (2006).
    Elliptic curve groups are a very well studied subject in algebra. In the 1980s, the use of the structure of common groups in public key cryptography led to the suggestion of elliptic curve groups for this purpose. At that time, limited computing power made the use of elliptic curves merely academic. Nowadays, better hardware and more efficient algorithms for basic calculations have brought these groups back as a very good option in public key cryptography. The advantage of this approach is that, for the same security offered by an integer factorization scheme like RSA, elliptic curves require considerably fewer bits, reducing cost, size, and time. This paper introduces an implementation of an elliptic curve cryptosystem and describes a platform written in Java with various protocols developed on it.
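    The platform described in the paper is written in Java; as a language-neutral illustration of the group law that underlies any such cryptosystem, here is a short Python sketch of point addition and double-and-add scalar multiplication on the textbook curve y^2 = x^3 + 2x + 2 over F_17 (standard example parameters, not the paper's).

      # Group law on the toy curve y^2 = x^3 + 2x + 2 over F_17 (a common
      # textbook curve, not the paper's parameters). None plays the identity O.
      P_MOD, A = 17, 2

      def add(P, Q):
          if P is None: return Q
          if Q is None: return P
          (x1, y1), (x2, y2) = P, Q
          if x1 == x2 and (y1 + y2) % P_MOD == 0:
              return None                                              # P + (-P) = O
          if P == Q:
              s = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD) % P_MOD   # tangent slope
          else:
              s = (y2 - y1) * pow(x2 - x1, -1, P_MOD) % P_MOD          # chord slope
          x3 = (s * s - x1 - x2) % P_MOD
          y3 = (s * (x1 - x3) - y1) % P_MOD
          return (x3, y3)

      def scalar_mult(k, P):
          # Double-and-add: the fast group exponentiation that makes ECC practical.
          R = None
          while k:
              if k & 1:
                  R = add(R, P)
              P = add(P, P)
              k >>= 1
          return R

      G = (5, 1)  # a point on the curve
      print([scalar_mult(k, G) for k in range(1, 5)])  # G, 2G, 3G, 4G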