
Showing papers on "Graphical model published in 2010"


Journal ArticleDOI
TL;DR: This article reviews probabilistic topic models, graphical models that can be used to summarize a large collection of documents with a smaller number of distributions over words.
Abstract: In this article, we review probabilistic topic models: graphical models that can be used to summarize a large collection of documents with a smaller number of distributions over words. Those distributions are called "topics" because, when fit to data, they capture the salient themes that run through the collection. We describe both finite-dimensional parametric topic models and their Bayesian nonparametric counterparts, which are based on the hierarchical Dirichlet process (HDP). We discuss two extensions of topic models to time-series data: one that lets the topics slowly change over time and one that lets the assumed prevalence of the topics change. Finally, we illustrate the application of topic models to nontext data, summarizing some recent research results in image analysis.

1,429 citations
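As an illustration of the parametric case the review begins with, here is a minimal sketch of fitting a topic model (latent Dirichlet allocation) with scikit-learn; the toy corpus, the number of topics, and the top-word printout are illustrative choices, not the article's setup.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy corpus; a real application would use thousands of documents
docs = [
    "topic models summarize large collections of documents",
    "each document mixes topics and each topic is a distribution over words",
    "belief propagation performs approximate inference in graphical models",
    "markov random fields and factor graphs are undirected graphical models",
]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)          # document-word count matrix
vocab = vectorizer.get_feature_names_out()

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
for k, comp in enumerate(lda.components_):  # unnormalized topic-word weights
    top_words = vocab[comp.argsort()[::-1][:4]]
    print(f"topic {k}:", ", ".join(top_words))
```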


Journal ArticleDOI
TL;DR: It is proved that consistent neighborhood selection can be obtained for sample sizes $n=\Omega(d^3\log p)$ with exponentially decaying error, and when these same conditions are imposed directly on the sample matrices, it is shown that a reduced sample size suffices for the method to estimate neighborhoods consistently.
Abstract: We consider the problem of estimating the graph associated with a binary Ising Markov random field. We describe a method based on $\ell_1$-regularized logistic regression, in which the neighborhood of any given node is estimated by performing logistic regression subject to an $\ell_1$-constraint. The method is analyzed under high-dimensional scaling in which both the number of nodes $p$ and maximum neighborhood size $d$ are allowed to grow as a function of the number of observations $n$. Our main results provide sufficient conditions on the triple $(n,p,d)$ and the model parameters for the method to succeed in consistently estimating the neighborhood of every node in the graph simultaneously. With coherence conditions imposed on the population Fisher information matrix, we prove that consistent neighborhood selection can be obtained for sample sizes $n=\Omega(d^3\log p)$ with exponentially decaying error. When these same conditions are imposed directly on the sample matrices, we show that a reduced sample size of $n=\Omega(d^2\log p)$ suffices for the method to estimate neighborhoods consistently. Although this paper focuses on the binary graphical models, we indicate how a generalization of the method of the paper would apply to general discrete Markov random fields.

848 citations
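A minimal sketch of the neighborhood-selection scheme the abstract describes, using scikit-learn's $\ell_1$-penalized logistic regression; the regularization level C and the OR-rule used to symmetrize the neighborhoods are illustrative choices (the paper's theory prescribes a penalty scaling like sqrt(log p / n)).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def estimate_ising_graph(X, C=0.1):
    """Neighborhood selection for a binary Ising model: one l1-regularized
    logistic regression per node, then OR-symmetrization of the neighborhoods.
    X: (n, p) array of binary spins (e.g. 0/1 or -1/+1)."""
    n, p = X.shape
    adj = np.zeros((p, p), dtype=bool)
    for j in range(p):
        Z = np.delete(X, j, axis=1)                 # all nodes except j
        clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
        clf.fit(Z, X[:, j])
        nbrs = np.flatnonzero(clf.coef_.ravel() != 0)
        nbrs = np.where(nbrs >= j, nbrs + 1, nbrs)  # undo the column deletion
        adj[j, nbrs] = True
    return adj | adj.T
```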


Posted Content
TL;DR: Conditional Random Fields (CRFs) as discussed by the authors are a popular probabilistic method for structured prediction and have seen wide application in natural language processing, computer vision, and bioinformatics.
Abstract: Often we wish to predict a large number of variables that depend on each other as well as on other observed variables. Structured prediction methods are essentially a combination of classification and graphical modeling, combining the ability of graphical models to compactly model multivariate data with the ability of classification methods to perform prediction using large sets of input features. This tutorial describes conditional random fields, a popular probabilistic method for structured prediction. CRFs have seen wide application in natural language processing, computer vision, and bioinformatics. We describe methods for inference and parameter estimation for CRFs, including practical issues for implementing large scale CRFs. We do not assume previous knowledge of graphical modeling, so this tutorial is intended to be useful to practitioners in a wide variety of fields.

785 citations
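For a linear-chain CRF, the inference the tutorial covers reduces to dynamic programming. Below is a minimal sketch of the forward (alpha) recursion that computes the log partition function, the core subroutine in CRF likelihood and gradient computations; the score arrays are assumed inputs, e.g. produced by feature functions.

```python
import numpy as np
from scipy.special import logsumexp

def crf_log_partition(unary, pairwise):
    """Forward (alpha) recursion for a linear-chain CRF.
    unary:    (T, K) per-position label scores for one input sequence
    pairwise: (K, K) label-transition scores
    Returns log Z(x), the normalizer needed for the CRF likelihood."""
    T, K = unary.shape
    alpha = unary[0].astype(float)
    for t in range(1, T):
        # alpha_t(k) = unary_t(k) + logsumexp_j [alpha_{t-1}(j) + pairwise(j, k)]
        alpha = unary[t] + logsumexp(alpha[:, None] + pairwise, axis=0)
    return logsumexp(alpha)
```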


Journal ArticleDOI
TL;DR: In this paper, belief propagation decoding, which represents the CS encoding matrix as a graphical model, is used to perform asymptotically optimal Bayesian inference under a two-state mixture Gaussian signal model.
Abstract: Compressive sensing (CS) is an emerging field based on the revelation that a small collection of linear projections of a sparse signal contains enough information for stable, sub-Nyquist signal acquisition. When a statistical characterization of the signal is available, Bayesian inference can complement conventional CS methods based on linear programming or greedy algorithms. We perform asymptotically optimal Bayesian inference using belief propagation (BP) decoding, which represents the CS encoding matrix as a graphical model. Fast computation is obtained by reducing the size of the graphical model with sparse encoding matrices. To decode a length-N signal containing K large coefficients, our CS-BP decoding algorithm uses O(K log(N)) measurements and O(N log²(N)) computation. Finally, although we focus on a two-state mixture Gaussian model, CS-BP is easily adapted to other signal models.

468 citations
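A small sketch of the measurement model the abstract describes: a length-N signal with K large coefficients observed through O(K log N) projections of a sparse encoding matrix. For brevity, the decoder below is a generic $\ell_1$ solver standing in for the paper's belief-propagation decoder, and the constants are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
N, K = 512, 10                 # signal length, number of large coefficients
M = int(4 * K * np.log(N))     # O(K log N) measurements, as in the abstract

x = np.zeros(N)                # two-state mixture: a few large Gaussian
x[rng.choice(N, K, replace=False)] = rng.normal(0.0, 10.0, K)  # coefficients

# Sparse encoding matrix, which keeps the associated factor graph small
Phi = rng.choice([0.0, 1.0, -1.0], size=(M, N), p=[0.9, 0.05, 0.05])
y = Phi @ x

x_hat = Lasso(alpha=0.1, max_iter=10000).fit(Phi, y).coef_  # stand-in decoder
print("relative error:", np.linalg.norm(x_hat - x) / np.linalg.norm(x))
```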


Proceedings Article
06 Dec 2010
TL;DR: This paper establishes the consistency of an extended Bayesian information criterion for Gaussian graphical models in a scenario where both the number of variables p and the sample size n grow.
Abstract: Gaussian graphical models with sparsity in the inverse covariance matrix are of significant interest in many modern applications. For the problem of recovering the graphical structure, information criteria provide useful optimization objectives for algorithms searching through sets of graphs or for selection of tuning parameters of other methods such as the graphical lasso, which is a likelihood penalization technique. In this paper we establish the consistency of an extended Bayesian information criterion for Gaussian graphical models in a scenario where both the number of variables p and the sample size n grow. Compared to earlier work on the regression case, our treatment allows for growth in the number of non-zero parameters in the true model, which is necessary in order to cover connected graphs. We demonstrate the performance of this criterion on simulated data when used in conjunction with the graphical lasso, and verify that the criterion indeed performs better than either cross-validation or the ordinary Bayesian information criterion when p and the number of non-zero parameters q both scale with n.

377 citations
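A minimal sketch of using the criterion with the graphical lasso, following the extended BIC form -2·loglik + E·log(n) + 4·E·γ·log(p) with E the number of estimated edges; the α grid, γ = 0.5, and the zero threshold are illustrative choices.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

def ebic(model, X, gamma=0.5):
    """Extended BIC for a fitted Gaussian graphical model:
    -2*loglik + E*log(n) + 4*E*gamma*log(p), E = number of estimated edges."""
    n, p = X.shape
    S = np.cov(X, rowvar=False)
    Theta = model.precision_
    # Gaussian log-likelihood up to an additive constant shared by all fits
    loglik = (n / 2.0) * (np.linalg.slogdet(Theta)[1] - np.trace(S @ Theta))
    E = int((np.abs(Theta[np.triu_indices(p, k=1)]) > 1e-8).sum())
    return -2.0 * loglik + E * np.log(n) + 4.0 * E * gamma * np.log(p)

X = np.random.default_rng(0).normal(size=(200, 20))
fits = {a: GraphicalLasso(alpha=a, max_iter=200).fit(X) for a in (0.05, 0.1, 0.2)}
best_alpha = min(fits, key=lambda a: ebic(fits[a], X))
print("EBIC-selected alpha:", best_alpha)
```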


01 Jan 2010
TL;DR: This paper provides guidelines to developing and evaluating Bayesian network models of environmental systems, and presents a case study habitat suitability model for juvenile Astacopsis gouldi, the giant freshwater crayfish of Tasmania.

346 citations


Posted Content
TL;DR: The method has a clear interpretation: the authors use the least amount of regularization that simultaneously makes a graph sparse and replicable under random sampling, which requires essentially no conditions.
Abstract: A challenging problem in estimating high-dimensional graphical models is to choose the regularization parameter in a data-dependent way. The standard techniques include $K$-fold cross-validation ($K$-CV), Akaike information criterion (AIC), and Bayesian information criterion (BIC). Though these methods work well for low-dimensional problems, they are not suitable in high dimensional settings. In this paper, we present StARS: a new stability-based method for choosing the regularization parameter in high dimensional inference for undirected graphs. The method has a clear interpretation: we use the least amount of regularization that simultaneously makes a graph sparse and replicable under random sampling. This interpretation requires essentially no conditions. Under mild conditions, we show that StARS is partially sparsistent in terms of graph estimation: i.e. with high probability, all the true edges will be included in the selected model even when the graph size diverges with the sample size. Empirically, the performance of StARS is compared with the state-of-the-art model selection procedures, including $K$-CV, AIC, and BIC, on both synthetic data and a real microarray dataset. StARS outperforms all these competing procedures.

324 citations
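A sketch of the per-edge instability computation at the heart of StARS, assuming the graphical lasso as the base estimator; the subsample size follows the paper's suggestion of roughly 10·sqrt(n), while the number of subsamples and the zero threshold are illustrative.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

def stars_instability(X, alpha, n_subsamples=20, seed=0):
    """Average edge instability of the graphical lasso at one regularization
    level, estimated over random subsamples. StARS chooses the least
    regularization keeping this below a small threshold such as 0.05."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    b = min(n, int(10 * np.sqrt(n)))       # subsample size ~ 10*sqrt(n)
    freq = np.zeros((p, p))
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=b, replace=False)
        Theta = GraphicalLasso(alpha=alpha, max_iter=200).fit(X[idx]).precision_
        freq += np.abs(Theta) > 1e-8       # which edges were selected
    freq /= n_subsamples
    xi = 2.0 * freq * (1.0 - freq)         # per-edge sampling instability
    return xi[np.triu_indices(p, k=1)].mean()
```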


Journal ArticleDOI
TL;DR: The software package libDAI, a free & open source C++ library that provides implementations of various exact and approximate inference methods for graphical models with discrete-valued variables, is described.
Abstract: This paper describes the software package libDAI, a free & open source C++ library that provides implementations of various exact and approximate inference methods for graphical models with discrete-valued variables. libDAI supports directed graphical models (Bayesian networks) as well as undirected ones (Markov random fields and factor graphs). It offers various approximations of the partition sum, marginal probability distributions and maximum probability states. Parameter learning is also supported. A feature comparison with other open source software packages for approximate inference is given. libDAI is licensed under the GPL v2+ license and is available at http://www.libdai.org.

299 citations


Book ChapterDOI
01 Apr 2010
TL;DR: The role of hierarchical modeling in Bayesian nonparametrics is discussed, focusing on models in which the infinite-dimensional parameters are treated hierarchically, and the value of these hierarchical constructions is demonstrated in a wide range of practical applications.
Abstract: Hierarchical modeling is a fundamental concept in Bayesian statistics. The basic idea is that parameters are endowed with distributions which may themselves introduce new parameters, and this construction recurses. In this review we discuss the role of hierarchical modeling in Bayesian nonparametrics, focusing on models in which the infinite-dimensional parameters are treated hierarchically. For example, we consider a model in which the base measure for a Dirichlet process is itself treated as a draw from another Dirichlet process. This yields a natural recursion that we refer to as a hierarchical Dirichlet process. We also discuss hierarchies based on the Pitman-Yor process and on completely random processes. We demonstrate the value of these hierarchical constructions in a wide range of practical applications, in problems in computational biology, computer vision and natural language processing.

Journal ArticleDOI
TL;DR: In this paper, a temporally smoothed l1-regularized logistic regression formalism is proposed to estimate time-varying networks from time series of entity attributes, which can be cast as a standard convex optimization problem and solved efficiently using generic solvers scalable to large networks.
Abstract: Stochastic networks are a plausible representation of the relational information among entities in dynamic systems such as living cells or social communities. While there is a rich literature in estimating a static or temporally invariant network from observation data, little has been done toward estimating time-varying networks from time series of entity attributes. In this paper we present two new machine learning methods for estimating time-varying networks, which both build on a temporally smoothed l1-regularized logistic regression formalism that can be cast as a standard convex-optimization problem and solved efficiently using generic solvers scalable to large networks. We report promising results on recovering simulated time-varying networks. For real data sets, we reverse engineer the latent sequence of temporally rewiring political networks between Senators from the US Senate voting records and the latent evolving regulatory networks underlying 588 genes across the life cycle of Drosophila melanogaster from the microarray time course.
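One plausible rendering of the per-time-point estimator is kernel-reweighted $\ell_1$-penalized logistic regression, in the spirit of the paper's temporally smoothed formalism; the Gaussian kernel, bandwidth, and regularization level below are assumptions, not the authors' exact estimator.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def neighborhood_at_time(X, times, t0, j, bandwidth=0.1, C=0.5):
    """Kernel-reweighted l1 logistic regression for node j at time t0:
    observations near t0 get larger weight, so the recovered neighborhood
    can change smoothly as t0 varies.
    X: (n, p) binary node states; times: (n,) observation times in [0, 1]."""
    w = np.exp(-0.5 * ((times - t0) / bandwidth) ** 2)  # Gaussian kernel weights
    Z = np.delete(X, j, axis=1)
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    clf.fit(Z, X[:, j], sample_weight=w)
    return np.flatnonzero(clf.coef_.ravel() != 0)       # indices in Z's columns
```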

Book ChapterDOI
05 Sep 2010
TL;DR: This work presents a discriminatively trained model for joint modelling of object class labels and their visual attributes and captures the correlations among attributes using an undirected graphical model built from training data.
Abstract: We present a discriminatively trained model for joint modelling of object class labels (e.g. "person", "dog", "chair", etc.) and their visual attributes (e.g. "has head", "furry", "metal", etc.). We treat attributes of an object as latent variables in our model and capture the correlations among attributes using an undirected graphical model built from training data. The advantage of our model is that it allows us to infer object class labels using the information of both the test image itself and its (latent) attributes. Our model unifies object class prediction and attribute prediction in a principled framework. It is also flexible enough to deal with different performance measurements. Our experimental results provide quantitative evidence that attributes can improve object naming.

Book ChapterDOI
05 Sep 2010
TL;DR: This work bypasses a global probabilistic model and instead directly trains a hierarchical inference procedure inspired by the message-passing mechanics of some approximate inference procedures in graphical models, which mitigates both the theoretical and empirical difficulties of learning probabilistic models when exact inference is intractable.
Abstract: In this work we propose a hierarchical approach for labeling semantic objects and regions in scenes. Our approach is reminiscent of early vision literature in that we use a decomposition of the image in order to encode relational and spatial information. In contrast to much existing work on structured prediction for scene understanding, we bypass a global probabilistic model and instead directly train a hierarchical inference procedure inspired by the message passing mechanics of some approximate inference procedures in graphical models. This approach mitigates both the theoretical and empirical difficulties of learning probabilistic models when exact inference is intractable. In particular, we draw from recent work in machine learning and break the complex inference process into a hierarchical series of simple machine learning subproblems. Each subproblem in the hierarchy is designed to capture the image and contextual statistics in the scene. This hierarchy spans coarse-to-fine regions and explicitly models the mixtures of semantic labels that may be present due to imperfect segmentation. To avoid cascading of errors and overfitting, we train the learning problems in sequence to ensure robustness to likely errors earlier in the inference sequence and leverage the stacking approach developed by Cohen et al.

Journal ArticleDOI
TL;DR: This work describes an extension of BP to continuous variable models, generalizing particle filtering and Gaussian mixture filtering techniques for time series to more complex models, and illustrates the power of the resulting nonparametric BP algorithm via two applications: kinematic tracking of visual motion and distributed localization in sensor networks.
Abstract: Continuous quantities are ubiquitous in models of real-world phenomena, but are surprisingly difficult to reason about automatically. Probabilistic graphical models such as Bayesian networks and Markov random fields, and algorithms for approximate inference such as belief propagation (BP), have proven to be powerful tools in a wide range of applications in statistics and artificial intelligence. However, applying these methods to models with continuous variables remains a challenging task. In this work we describe an extension of BP to continuous variable models, generalizing particle filtering and Gaussian mixture filtering techniques for time series to more complex models. We illustrate the power of the resulting nonparametric BP algorithm via two applications: kinematic tracking of visual motion and distributed localization in sensor networks.

Posted Content
TL;DR: In this paper, the authors describe subject-level brain functional connectivity structure as a multivariate Gaussian process and introduce a new strategy to estimate it from group data, by imposing a common structure on the graphical model in the population.
Abstract: Spontaneous brain activity, as observed in functional neuroimaging, has been shown to display reproducible structure that expresses brain architecture and carries markers of brain pathologies. An important view of modern neuroscience is that such large-scale structure of coherent activity reflects modularity properties of brain connectivity graphs. However, to date, there has been no demonstration that the limited and noisy data available in spontaneous activity observations could be used to learn full-brain probabilistic models that generalize to new data. Learning such models entails two main challenges: i) modeling full brain connectivity is a difficult estimation problem that faces the curse of dimensionality and ii) variability between subjects, coupled with the variability of functional signals between experimental runs, makes the use of multiple datasets challenging. We describe subject-level brain functional connectivity structure as a multivariate Gaussian process and introduce a new strategy to estimate it from group data, by imposing a common structure on the graphical model in the population. We show that individual models learned from functional Magnetic Resonance Imaging (fMRI) data using this population prior generalize better to unseen data than models based on alternative regularization schemes. To our knowledge, this is the first report of a cross-validated model of spontaneous brain activity. Finally, we use the estimated graphical model to explore the large-scale characteristics of functional architecture and show for the first time that known cognitive networks appear as the integrated communities of the functional connectivity graph.

Posted Content
TL;DR: The world of graphs in computing is explored and situations in which graphical models are beneficial are exposed.
Abstract: A graph is a data structure composed of dots (i.e. vertices) and lines (i.e. edges). The dots and lines of a graph can be organized into intricate arrangements. The ability for a graph to denote objects and their relationships to one another allows for a surprisingly large number of things to be modeled as a graph. From the dependencies that link software packages to the wood beams that provide the framing to a house, most anything has a corresponding graph representation. However, just because it is possible to represent something as a graph does not necessarily mean that its graph representation will be useful. If a modeler can leverage the plethora of tools and algorithms that store and process graphs, then such a mapping is worthwhile. This article explores the world of graphs in computing and exposes situations in which graphical models are beneficial.

Proceedings ArticleDOI
13 Jun 2010
TL;DR: In this article, the problem of reconstructing an image from a bag of square, non-overlapping image patches, the jigsaw puzzle problem, is considered and a graphical model is developed to solve it.
Abstract: We explore the problem of reconstructing an image from a bag of square, non-overlapping image patches, the jigsaw puzzle problem. Completing jigsaw puzzles is challenging and requires expertise even for humans, and is known to be NP-complete. We depart from previous methods that treat the problem as a constraint satisfaction problem and develop a graphical model to solve it. Each patch location is a node and each patch is a label at nodes in the graph. A graphical model requires a pairwise compatibility term, which measures an affinity between two neighboring patches, and a local evidence term, which we lack. This paper discusses ways to obtain these terms for the jigsaw puzzle problem. We evaluate several patch compatibility metrics, including the natural image statistics measure, and experimentally show that the dissimilarity-based compatibility, measuring the sum-of-squared color difference along the abutting boundary, gives the best results. We compare two forms of local evidence for the graphical model: a sparse-and-accurate evidence and a dense-and-noisy evidence. We show that the sparse-and-accurate evidence, fixing as few as 4-6 patches at their correct locations, is enough to reconstruct images consisting of over 400 patches. To the best of our knowledge, this is the largest puzzle solved in the literature. We also show that one can coarsely estimate the low resolution image from a bag of patches, suggesting that a bag of image patches encodes some geometric information about the original image.
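The winning compatibility metric is simple to state; a minimal sketch for the left-right case (the patch layout convention and color space handling are assumptions):

```python
import numpy as np

def lr_dissimilarity(a, b):
    """Sum-of-squared color difference along the abutting boundary when
    patch b is placed immediately to the right of patch a.
    a, b: (P, P, 3) float arrays (the paper computes this in a
    perceptually normalized color space)."""
    return float(((a[:, -1, :] - b[:, 0, :]) ** 2).sum())
```

The pairwise compatibility in the graphical model can then be taken as a decreasing function of this dissimilarity.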

Posted Content
TL;DR: Surprisingly, the analysis of fast approximate message passing algorithms makes it possible to prove exact high-dimensional limit results for the LASSO risk.
Abstract: This paper surveys recent work in applying ideas from graphical models and message passing algorithms to solve large scale regularized regression problems. In particular, the focus is on compressed sensing reconstruction via $\ell_1$-penalized least-squares (known as LASSO or BPDN). We discuss how to derive fast approximate message passing algorithms to solve this problem. Surprisingly, the analysis of such algorithms makes it possible to prove exact high-dimensional limit results for the LASSO risk. This paper will appear as a chapter in a book on 'Compressed Sensing' edited by Yonina Eldar and Gitta Kutyniok.
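A minimal sketch of the AMP iteration for the LASSO: soft thresholding of x + Aᵀz, with the Onsager correction term added back into the residual; the threshold schedule below is an illustrative choice rather than the tuned policy analyzed in the paper.

```python
import numpy as np

def amp_lasso(A, y, alpha=2.0, iters=50):
    """Approximate message passing for the LASSO (minimal sketch).
    A: (n, N) measurement matrix, y: (n,) observations."""
    n, N = A.shape
    x, z = np.zeros(N), y.copy()
    soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
    for _ in range(iters):
        theta = alpha * np.linalg.norm(z) / np.sqrt(n)  # illustrative threshold
        x_new = soft(x + A.T @ z, theta)
        # Onsager term: (1/delta) * <eta'> * z_prev, with <eta'> = ||x_new||_0 / N
        onsager = (np.count_nonzero(x_new) / n) * z
        z = y - A @ x_new + onsager
        x = x_new
    return x
```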

Journal ArticleDOI
TL;DR: In this paper, an efficient penalized likelihood method is proposed for estimating the adjacency matrix of directed acyclic graphs when variables inherit a natural ordering, and it is shown that the adaptive lasso can consistently estimate the true graph under the usual regularity assumptions.
Abstract: Directed acyclic graphs are commonly used to represent causal relationships among random variables in graphical models. Applications of these models arise in the study of physical and biological systems where directed edges between nodes represent the influence of components of the system on each other. Estimation of directed graphs from observational data is computationally NP-hard. In addition, directed graphs with the same structure may be indistinguishable based on observations alone. When the nodes exhibit a natural ordering, the problem of estimating directed graphs reduces to the problem of estimating the structure of the network. In this paper, we propose an efficient penalized likelihood method for estimation of the adjacency matrix of directed acyclic graphs, when variables inherit a natural ordering. We study variable selection consistency of lasso and adaptive lasso penalties in high-dimensional sparse settings, and propose an error-based choice for selecting the tuning parameter. We show that although the lasso is only variable selection consistent under stringent conditions, the adaptive lasso can consistently estimate the true graph under the usual regularity assumptions.
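A minimal sketch of the known-ordering case: each node is regressed on its predecessors with a lasso or adaptive-lasso penalty, filling in one column of the adjacency matrix at a time; the OLS-based adaptive weights and the fixed λ are illustrative (the paper proposes an error-based choice of tuning parameter).

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

def estimate_dag(X, lam=0.1, adaptive=True, gamma=1.0):
    """Penalized likelihood estimation of a DAG adjacency matrix under a
    known variable ordering: regress each node on its predecessors.
    X: (n, p) data with columns already in the natural ordering."""
    n, p = X.shape
    A = np.zeros((p, p))                    # A[i, j] != 0 means edge i -> j
    for j in range(1, p):
        Z, y = X[:, :j], X[:, j]
        if adaptive:
            # Adaptive lasso via rescaling: weights from an initial OLS fit
            init = LinearRegression().fit(Z, y).coef_
            w = 1.0 / (np.abs(init) ** gamma + 1e-8)
            A[:j, j] = Lasso(alpha=lam).fit(Z / w, y).coef_ / w
        else:
            A[:j, j] = Lasso(alpha=lam).fit(Z, y).coef_
    return A
```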

Posted Content
TL;DR: In this paper, a first-order method based on an alternating linearization technique is proposed to learn the structure of the graph by estimating a sparse inverse covariance matrix from sample data, solving a convex maximum likelihood problem with an $\ell_1$-regularization term.
Abstract: Gaussian graphical models are of great interest in statistical learning. Because the conditional independencies between different nodes correspond to zero entries in the inverse covariance matrix of the Gaussian distribution, one can learn the structure of the graph by estimating a sparse inverse covariance matrix from sample data, by solving a convex maximum likelihood problem with an $\ell_1$-regularization term. In this paper, we propose a first-order method based on an alternating linearization technique that exploits the problem's special structure; in particular, the subproblems solved in each iteration have closed-form solutions. Moreover, our algorithm obtains an $\epsilon$-optimal solution in $O(1/\epsilon)$ iterations. Numerical experiments on both synthetic and real data from gene association networks show that a practical version of this algorithm outperforms other competitive algorithms.

Journal ArticleDOI
TL;DR: The modeling framework can be viewed as a combination of dimensionality reduction and graphical modeling (to capture remaining statistical structure not attributable to the latent variables) and it consistently estimates both the number of latent components and the conditional graphical model structure among the observed variables.
Abstract: Suppose we observe samples of a subset of a collection of random variables. No additional information is provided about the number of latent variables, nor of the relationship between the latent and observed variables. Is it possible to discover the number of latent components, and to learn a statistical model over the entire collection of variables? We address this question in the setting in which the latent and observed variables are jointly Gaussian, with the conditional statistics of the observed variables conditioned on the latent variables being specified by a graphical model. As a first step we give natural conditions under which such latent-variable Gaussian graphical models are identifiable given marginal statistics of only the observed variables. Essentially these conditions require that the conditional graphical model among the observed variables is sparse, while the effect of the latent variables is "spread out" over most of the observed variables. Next we propose a tractable convex program based on regularized maximum-likelihood for model selection in this latent-variable setting; the regularizer uses both the $\ell_1$ norm and the nuclear norm. Our modeling framework can be viewed as a combination of dimensionality reduction (to identify latent variables) and graphical modeling (to capture remaining statistical structure not attributable to the latent variables), and it consistently estimates both the number of latent components and the conditional graphical model structure among the observed variables. These results are applicable in the high-dimensional setting in which the number of latent/observed variables grows with the number of samples of the observed variables. The geometric properties of the algebraic varieties of sparse matrices and of low-rank matrices play an important role in our analysis.
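The convex program is compact enough to state directly. A sketch with CVXPY, using the fact that the nuclear norm of the positive semidefinite low-rank term equals its trace; λ and γ are illustrative.

```python
import cvxpy as cp
import numpy as np

def latent_variable_ggm(Sigma_hat, lam=0.1, gamma=2.0):
    """Regularized maximum likelihood for latent-variable Gaussian graphical
    models: the marginal precision of the observed variables is S - L, with
    S sparse (l1 penalty) and L PSD and low rank (nuclear norm = trace(L))."""
    p = Sigma_hat.shape[0]
    S = cp.Variable((p, p), symmetric=True)
    L = cp.Variable((p, p), symmetric=True)
    R = S - L                                   # marginal precision matrix
    objective = (-cp.log_det(R) + cp.trace(Sigma_hat @ R)
                 + lam * (gamma * cp.sum(cp.abs(S)) + cp.trace(L)))
    cp.Problem(cp.Minimize(objective), [L >> 0]).solve()
    return S.value, L.value
```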

Journal ArticleDOI
TL;DR: A validated stochastic model is proposed to estimate the probability that an image part is of interest; this probability is referred to as saliency, thus specifying saliency in a mathematically well-defined sense.
Abstract: Computer vision attention processes assign variable-hypothesized importance to different parts of the visual input and direct the allocation of computational resources. This nonuniform allocation might help accelerate the image analysis process. This paper proposes a new bottom-up attention mechanism. Rather than taking the traditional approach, which tries to model human attention, we propose a validated stochastic model to estimate the probability that an image part is of interest. We refer to this probability as saliency and thus specify saliency in a mathematically well-defined sense. The model quantifies several intuitive observations, such as the greater likelihood of correspondence between visually similar image regions and the likelihood that only a few of interesting objects will be present in the scene. The latter observation, which implies that such objects are (relaxed) global exceptions, replaces the traditional preference for local contrast. The algorithm starts with a rough preattentive segmentation and then uses a graphical model approximation to efficiently reveal which segments are more likely to be of interest. Experiments on natural scenes containing a variety of objects demonstrate the proposed method and show its advantages over previous approaches.

Journal ArticleDOI
TL;DR: The non-stationary DBN model is defined, an MCMC sampling algorithm is presented for learning the structure of the model from time-series data under different assumptions, and the effectiveness of the algorithm is demonstrated on both simulated and biological data.
Abstract: Learning dynamic Bayesian network structures provides a principled mechanism for identifying conditional dependencies in time-series data. An important assumption of traditional DBN structure learning is that the data are generated by a stationary process, an assumption that is not true in many important settings. In this paper, we introduce a new class of graphical model called a non-stationary dynamic Bayesian network, in which the conditional dependence structure of the underlying data-generation process is permitted to change over time. Non-stationary dynamic Bayesian networks represent a new framework for studying problems in which the structure of a network is evolving over time. Some examples of evolving networks are transcriptional regulatory networks during an organism's development, neural pathways during learning, and traffic patterns during the day. We define the non-stationary DBN model, present an MCMC sampling algorithm for learning the structure of the model from time-series data under different assumptions, and demonstrate the effectiveness of the algorithm on both simulated and biological data.

Proceedings Article
11 Jul 2010
TL;DR: Bagel is presented, a statistical language generator which uses dynamic Bayesian networks to learn from semantically-aligned data produced by 42 untrained annotators, and can generate natural and informative utterances from unseen inputs in the information presentation domain.
Abstract: Most previous work on trainable language generation has focused on two paradigms: (a) using a statistical model to rank a set of generated utterances, or (b) using statistics to inform the generation decision process. Both approaches rely on the existence of a handcrafted generator, which limits their scalability to new domains. This paper presents Bagel, a statistical language generator which uses dynamic Bayesian networks to learn from semantically-aligned data produced by 42 untrained annotators. A human evaluation shows that Bagel can generate natural and informative utterances from unseen inputs in the information presentation domain. Additionally, generation performance on sparse datasets is improved significantly by using certainty-based active learning, yielding ratings close to the human gold standard with a fraction of the data.

Proceedings Article
06 Dec 2010
TL;DR: A new probabilistic model of optical flow in layers that addresses many of the shortcomings of previous approaches and achieves state-of-the-art results on the Middlebury benchmark and produces meaningful scene segmentations as well as detected occlusion regions.
Abstract: Layered models are a powerful way of describing natural scenes containing smooth surfaces that may overlap and occlude each other. For image motion estimation, such models have a long history but have not achieved the wide use or accuracy of non-layered methods. We present a new probabilistic model of optical flow in layers that addresses many of the shortcomings of previous approaches. In particular, we define a probabilistic graphical model that explicitly captures: 1) occlusions and disocclusions; 2) depth ordering of the layers; 3) temporal consistency of the layer segmentation. Additionally the optical flow in each layer is modeled by a combination of a parametric model and a smooth deviation based on an MRF with a robust spatial prior; the resulting model allows roughness in layers. Finally, a key contribution is the formulation of the layers using an image-dependent hidden field prior based on recent models for static scene segmentation. The method achieves state-of-the-art results on the Middlebury benchmark and produces meaningful scene segmentations as well as detected occlusion regions.

Proceedings Article
31 Mar 2010
TL;DR: A non-linear graphical model for structured prediction that combines the power of deep neural networks to extract high level features with the graphical framework of Markov networks, yielding a powerful and scalable probabilistic model that is applied to signal labeling tasks.
Abstract: We propose a non-linear graphical model for structured prediction. It combines the power of deep neural networks to extract high level features with the graphical framework of Markov networks, yielding a powerful and scalable probabilistic model that we apply to signal labeling tasks.

Posted Content
TL;DR: The pair-copula model is very general and the Bayesian method generalizes many previous approaches for the analysis of longitudinal data; the selection approach is shown to be reliable and can substantially improve estimates of both conditional and unconditional pairwise dependencies.
Abstract: Copulas have proven to be very successful tools for the flexible modelling of cross-sectional dependence. In this paper we express the dependence structure of continuous-valued time series data using a sequence of bivariate copulas. This corresponds to a type of decomposition recently called a ‘vine’ in the graphical models literature, where each copula is entitled a ‘pair-copula’. We propose a Bayesian approach for the estimation of this dependence structure for longitudinal data. Bayesian selection ideas are used to identify any independence pair-copulas, with the end result being a parsimonious representation of a time-inhomogeneous Markov process of varying order. Estimates are Bayesian model averages over the distribution of the lag structure of the Markov process. Using a simulation study we show that the selection approach is reliable and can improve the estimates of both conditional and unconditional pairwise dependencies substantially. We also show that a vine with selection out-performs a Gaussian copula with a flexible correlation matrix. The advantage of the pair-copula formulation is further demonstrated using a longitudinal model of intraday electricity load. Using Gaussian, Gumbel and Clayton pair-copulas we identify parsimonious decompositions of intraday serial dependence, which improve the accuracy of intraday load forecasts. We also propose a new diagnostic for measuring the goodness of fit of high-dimensional multivariate copulas. Overall, the pair-copula model is very general and the Bayesian method generalizes many previous approaches for the analysis of longitudinal data. Supplemental materials for the article are also available online.
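A minimal sketch of fitting a single Gaussian pair-copula, the building block of the vine: rank-transform the margins to pseudo-uniforms, then invert Kendall's tau; this moment-based estimator is an illustrative alternative to the paper's Bayesian treatment.

```python
import numpy as np
from scipy import stats

def fit_gaussian_pair_copula(x, y):
    """Fit one bivariate Gaussian pair-copula: rank-transform the margins
    to pseudo-uniforms, then invert Kendall's tau to get the copula
    correlation, rho = sin(pi * tau / 2)."""
    u = stats.rankdata(x) / (len(x) + 1.0)
    v = stats.rankdata(y) / (len(y) + 1.0)
    tau, _ = stats.kendalltau(u, v)
    return np.sin(np.pi * tau / 2.0)

# Lag-1 serial dependence of a time series y: fit to pairs (y_t, y_{t-1})
# rho1 = fit_gaussian_pair_copula(y[1:], y[:-1])
```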