
Showing papers on "Graphical model published in 2010"


Journal ArticleDOI
TL;DR: This article reviews probabilistic topic models, graphical models that can be used to summarize a large collection of documents with a smaller number of distributions over words.
Abstract: In this article, we review probabilistic topic models: graphical models that can be used to summarize a large collection of documents with a smaller number of distributions over words. Those distributions are called "topics" because, when fit to data, they capture the salient themes that run through the collection. We describe both finite-dimensional parametric topic models and their Bayesian nonparametric counterparts, which are based on the hierarchical Dirichlet process (HDP). We discuss two extensions of topic models to time-series data: one that lets the topics slowly change over time and one that lets the assumed prevalence of the topics change. Finally, we illustrate the application of topic models to nontext data, summarizing some recent research results in image analysis.

1,429 citations
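As an illustration of the parametric case the review begins with, here is a minimal sketch of fitting a topic model (latent Dirichlet allocation) with scikit-learn; the toy corpus, the number of topics, and the top-word printout are illustrative choices, not the article's setup.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy corpus; a real application would use thousands of documents
docs = [
    "topic models summarize large collections of documents",
    "each document mixes topics and each topic is a distribution over words",
    "belief propagation performs approximate inference in graphical models",
    "markov random fields and factor graphs are undirected graphical models",
]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)          # document-word count matrix
vocab = vectorizer.get_feature_names_out()

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
for k, comp in enumerate(lda.components_):  # unnormalized topic-word weights
    top_words = vocab[comp.argsort()[::-1][:4]]
    print(f"topic {k}:", ", ".join(top_words))
```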


Journal ArticleDOI
TL;DR: It is proved that consistent neighborhood selection can be obtained for sample sizes $n=\Omega(d^3\log p)$ with exponentially decaying error, and when these same conditions are imposed directly on the sample matrices, it is shown that a reduced sample size suffices for the method to estimate neighborhoods consistently.
Abstract: We consider the problem of estimating the graph associated with a binary Ising Markov random field. We describe a method based on $\ell_1$-regularized logistic regression, in which the neighborhood of any given node is estimated by performing logistic regression subject to an $\ell_1$-constraint. The method is analyzed under high-dimensional scaling in which both the number of nodes $p$ and maximum neighborhood size $d$ are allowed to grow as a function of the number of observations $n$. Our main results provide sufficient conditions on the triple $(n,p,d)$ and the model parameters for the method to succeed in consistently estimating the neighborhood of every node in the graph simultaneously. With coherence conditions imposed on the population Fisher information matrix, we prove that consistent neighborhood selection can be obtained for sample sizes $n=\Omega(d^3\log p)$ with exponentially decaying error. When these same conditions are imposed directly on the sample matrices, we show that a reduced sample size of $n=\Omega(d^2\log p)$ suffices for the method to estimate neighborhoods consistently. Although this paper focuses on the binary graphical models, we indicate how a generalization of the method of the paper would apply to general discrete Markov random fields.

848 citations
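A minimal sketch of the neighborhood-selection scheme the abstract describes, using scikit-learn's $\ell_1$-penalized logistic regression; the regularization level C and the OR-rule used to symmetrize the neighborhoods are illustrative choices (the paper's theory prescribes a penalty scaling like sqrt(log p / n)).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def estimate_ising_graph(X, C=0.1):
    """Neighborhood selection for a binary Ising model: one l1-regularized
    logistic regression per node, then OR-symmetrization of the neighborhoods.
    X: (n, p) array of binary spins (e.g. 0/1 or -1/+1)."""
    n, p = X.shape
    adj = np.zeros((p, p), dtype=bool)
    for j in range(p):
        Z = np.delete(X, j, axis=1)                 # all nodes except j
        clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
        clf.fit(Z, X[:, j])
        nbrs = np.flatnonzero(clf.coef_.ravel() != 0)
        nbrs = np.where(nbrs >= j, nbrs + 1, nbrs)  # undo the column deletion
        adj[j, nbrs] = True
    return adj | adj.T
```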


Posted Content
TL;DR: Conditional Random Fields (CRFs) as discussed by the authors are a popular probabilistic method for structured prediction and have seen wide application in natural language processing, computer vision, and bioinformatics.
Abstract: Often we wish to predict a large number of variables that depend on each other as well as on other observed variables. Structured prediction methods are essentially a combination of classification and graphical modeling, combining the ability of graphical models to compactly model multivariate data with the ability of classification methods to perform prediction using large sets of input features. This tutorial describes conditional random fields, a popular probabilistic method for structured prediction. CRFs have seen wide application in natural language processing, computer vision, and bioinformatics. We describe methods for inference and parameter estimation for CRFs, including practical issues for implementing large scale CRFs. We do not assume previous knowledge of graphical modeling, so this tutorial is intended to be useful to practitioners in a wide variety of fields.

785 citations
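For a linear-chain CRF, the inference the tutorial covers reduces to dynamic programming. Below is a minimal sketch of the forward (alpha) recursion that computes the log partition function, the core subroutine in CRF likelihood and gradient computations; the score arrays are assumed inputs, e.g. produced by feature functions.

```python
import numpy as np
from scipy.special import logsumexp

def crf_log_partition(unary, pairwise):
    """Forward (alpha) recursion for a linear-chain CRF.
    unary:    (T, K) per-position label scores for one input sequence
    pairwise: (K, K) label-transition scores
    Returns log Z(x), the normalizer needed for the CRF likelihood."""
    T, K = unary.shape
    alpha = unary[0].astype(float)
    for t in range(1, T):
        # alpha_t(k) = unary_t(k) + logsumexp_j [alpha_{t-1}(j) + pairwise(j, k)]
        alpha = unary[t] + logsumexp(alpha[:, None] + pairwise, axis=0)
    return logsumexp(alpha)
```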


Journal ArticleDOI
TL;DR: In this paper, belief propagation decoding, which represents the CS encoding matrix as a graphical model, is used to perform asymptotically optimal Bayesian inference under a two-state mixture Gaussian signal model.
Abstract: Compressive sensing (CS) is an emerging field based on the revelation that a small collection of linear projections of a sparse signal contains enough information for stable, sub-Nyquist signal acquisition. When a statistical characterization of the signal is available, Bayesian inference can complement conventional CS methods based on linear programming or greedy algorithms. We perform asymptotically optimal Bayesian inference using belief propagation (BP) decoding, which represents the CS encoding matrix as a graphical model. Fast computation is obtained by reducing the size of the graphical model with sparse encoding matrices. To decode a length-N signal containing K large coefficients, our CS-BP decoding algorithm uses O(K log(N)) measurements and O(N log²(N)) computation. Finally, although we focus on a two-state mixture Gaussian model, CS-BP is easily adapted to other signal models.

468 citations
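A small sketch of the measurement model the abstract describes: a length-N signal with K large coefficients observed through O(K log N) projections of a sparse encoding matrix. For brevity, the decoder below is a generic $\ell_1$ solver standing in for the paper's belief-propagation decoder, and the constants are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
N, K = 512, 10                 # signal length, number of large coefficients
M = int(4 * K * np.log(N))     # O(K log N) measurements, as in the abstract

x = np.zeros(N)                # two-state mixture: a few large Gaussian
x[rng.choice(N, K, replace=False)] = rng.normal(0.0, 10.0, K)  # coefficients

# Sparse encoding matrix, which keeps the associated factor graph small
Phi = rng.choice([0.0, 1.0, -1.0], size=(M, N), p=[0.9, 0.05, 0.05])
y = Phi @ x

x_hat = Lasso(alpha=0.1, max_iter=10000).fit(Phi, y).coef_  # stand-in decoder
print("relative error:", np.linalg.norm(x_hat - x) / np.linalg.norm(x))
```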


Proceedings Article
06 Dec 2010
TL;DR: This paper establishes the consistency of an extended Bayesian information criterion for Gaussian graphical models in a scenario where both the number of variables p and the sample size n grow.
Abstract: Gaussian graphical models with sparsity in the inverse covariance matrix are of significant interest in many modern applications. For the problem of recovering the graphical structure, information criteria provide useful optimization objectives for algorithms searching through sets of graphs or for selection of tuning parameters of other methods such as the graphical lasso, which is a likelihood penalization technique. In this paper we establish the consistency of an extended Bayesian information criterion for Gaussian graphical models in a scenario where both the number of variables p and the sample size n grow. Compared to earlier work on the regression case, our treatment allows for growth in the number of non-zero parameters in the true model, which is necessary in order to cover connected graphs. We demonstrate the performance of this criterion on simulated data when used in conjunction with the graphical lasso, and verify that the criterion indeed performs better than either cross-validation or the ordinary Bayesian information criterion when p and the number of non-zero parameters q both scale with n.

377 citations
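A minimal sketch of using the criterion with the graphical lasso, following the extended BIC form -2·loglik + E·log(n) + 4·E·γ·log(p) with E the number of estimated edges; the α grid, γ = 0.5, and the zero threshold are illustrative choices.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

def ebic(model, X, gamma=0.5):
    """Extended BIC for a fitted Gaussian graphical model:
    -2*loglik + E*log(n) + 4*E*gamma*log(p), E = number of estimated edges."""
    n, p = X.shape
    S = np.cov(X, rowvar=False)
    Theta = model.precision_
    # Gaussian log-likelihood up to an additive constant shared by all fits
    loglik = (n / 2.0) * (np.linalg.slogdet(Theta)[1] - np.trace(S @ Theta))
    E = int((np.abs(Theta[np.triu_indices(p, k=1)]) > 1e-8).sum())
    return -2.0 * loglik + E * np.log(n) + 4.0 * E * gamma * np.log(p)

X = np.random.default_rng(0).normal(size=(200, 20))
fits = {a: GraphicalLasso(alpha=a, max_iter=200).fit(X) for a in (0.05, 0.1, 0.2)}
best_alpha = min(fits, key=lambda a: ebic(fits[a], X))
print("EBIC-selected alpha:", best_alpha)
```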


01 Jan 2010
TL;DR: This paper provides guidelines to developing and evaluating Bayesian network models of environmental systems, and presents a case study habitat suitability model for juvenile Astacopsis gouldi, the giant freshwater crayfish of Tasmania.

346 citations


Posted Content
TL;DR: The method has a clear interpretation: the authors use the least amount of regularization that simultaneously makes a graph sparse and replicable under random sampling, which requires essentially no conditions.
Abstract: A challenging problem in estimating high-dimensional graphical models is to choose the regularization parameter in a data-dependent way. The standard techniques include $K$-fold cross-validation ($K$-CV), Akaike information criterion (AIC), and Bayesian information criterion (BIC). Though these methods work well for low-dimensional problems, they are not suitable in high dimensional settings. In this paper, we present StARS: a new stability-based method for choosing the regularization parameter in high dimensional inference for undirected graphs. The method has a clear interpretation: we use the least amount of regularization that simultaneously makes a graph sparse and replicable under random sampling. This interpretation requires essentially no conditions. Under mild conditions, we show that StARS is partially sparsistent in terms of graph estimation: i.e. with high probability, all the true edges will be included in the selected model even when the graph size diverges with the sample size. Empirically, the performance of StARS is compared with the state-of-the-art model selection procedures, including $K$-CV, AIC, and BIC, on both synthetic data and a real microarray dataset. StARS outperforms all these competing procedures.

324 citations
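A sketch of the per-edge instability computation at the heart of StARS, assuming the graphical lasso as the base estimator; the subsample size follows the paper's suggestion of roughly 10·sqrt(n), while the number of subsamples and the zero threshold are illustrative.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

def stars_instability(X, alpha, n_subsamples=20, seed=0):
    """Average edge instability of the graphical lasso at one regularization
    level, estimated over random subsamples. StARS chooses the least
    regularization keeping this below a small threshold such as 0.05."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    b = min(n, int(10 * np.sqrt(n)))       # subsample size ~ 10*sqrt(n)
    freq = np.zeros((p, p))
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=b, replace=False)
        Theta = GraphicalLasso(alpha=alpha, max_iter=200).fit(X[idx]).precision_
        freq += np.abs(Theta) > 1e-8       # which edges were selected
    freq /= n_subsamples
    xi = 2.0 * freq * (1.0 - freq)         # per-edge sampling instability
    return xi[np.triu_indices(p, k=1)].mean()
```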


Journal ArticleDOI
TL;DR: The software package libDAI, a free & open source C++ library that provides implementations of various exact and approximate inference methods for graphical models with discrete-valued variables, is described.
Abstract: This paper describes the software package libDAI, a free & open source C++ library that provides implementations of various exact and approximate inference methods for graphical models with discrete-valued variables. libDAI supports directed graphical models (Bayesian networks) as well as undirected ones (Markov random fields and factor graphs). It offers various approximations of the partition sum, marginal probability distributions and maximum probability states. Parameter learning is also supported. A feature comparison with other open source software packages for approximate inference is given. libDAI is licensed under the GPL v2+ license and is available at http://www.libdai.org.

299 citations


Book ChapterDOI
01 Apr 2010
TL;DR: The role of hierarchical modeling in Bayesian nonparametrics is discussed, focusing on models in which the infinite-dimensional parameters are treated hierarchically, and the value of these hierarchical constructions is demonstrated in a wide range of practical applications.
Abstract: Hierarchical modeling is a fundamental concept in Bayesian statistics. The basic idea is that parameters are endowed with distributions which may themselves introduce new parameters, and this construction recurses. In this review we discuss the role of hierarchical modeling in Bayesian nonparametrics, focusing on models in which the infinite-dimensional parameters are treated hierarchically. For example, we consider a model in which the base measure for a Dirichlet process is itself treated as a draw from another Dirichlet process. This yields a natural recursion that we refer to as a hierarchical Dirichlet process. We also discuss hierarchies based on the Pitman-Yor process and on completely random processes. We demonstrate the value of these hierarchical constructions in a wide range of practical applications, in problems in computational biology, computer vision and natural language processing.

Journal ArticleDOI
TL;DR: In this paper, a temporally smoothed l1-regularized logistic regression formalism is proposed to estimate time-varying networks from time series of entity attributes, which can be cast as a standard convex optimization problem and solved efficiently using generic solvers scalable to large networks.
Abstract: Stochastic networks are a plausible representation of the relational information among entities in dynamic systems such as living cells or social communities. While there is a rich literature in estimating a static or temporally invariant network from observation data, little has been done toward estimating time-varying networks from time series of entity attributes. In this paper we present two new machine learning methods for estimating time-varying networks, which both build on a temporally smoothed l1-regularized logistic regression formalism that can be cast as a standard convex-optimization problem and solved efficiently using generic solvers scalable to large networks. We report promising results on recovering simulated time-varying networks. For real data sets, we reverse engineer the latent sequence of temporally rewiring political networks between Senators from the US Senate voting records and the latent evolving regulatory networks underlying 588 genes across the life cycle of Drosophila melanogaster from the microarray time course.
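One plausible rendering of the per-time-point estimator is kernel-reweighted $\ell_1$-penalized logistic regression, in the spirit of the paper's temporally smoothed formalism; the Gaussian kernel, bandwidth, and regularization level below are assumptions, not the authors' exact estimator.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def neighborhood_at_time(X, times, t0, j, bandwidth=0.1, C=0.5):
    """Kernel-reweighted l1 logistic regression for node j at time t0:
    observations near t0 get larger weight, so the recovered neighborhood
    can change smoothly as t0 varies.
    X: (n, p) binary node states; times: (n,) observation times in [0, 1]."""
    w = np.exp(-0.5 * ((times - t0) / bandwidth) ** 2)  # Gaussian kernel weights
    Z = np.delete(X, j, axis=1)
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    clf.fit(Z, X[:, j], sample_weight=w)
    return np.flatnonzero(clf.coef_.ravel() != 0)       # indices in Z's columns
```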

Book ChapterDOI
05 Sep 2010
TL;DR: This work presents a discriminatively trained model for joint modelling of object class labels and their visual attributes and captures the correlations among attributes using an undirected graphical model built from training data.
Abstract: We present a discriminatively trained model for joint modelling of object class labels (e.g. "person", "dog", "chair", etc.) and their visual attributes (e.g. "has head", "furry", "metal", etc.). We treat attributes of an object as latent variables in our model and capture the correlations among attributes using an undirected graphical model built from training data. The advantage of our model is that it allows us to infer object class labels using the information of both the test image itself and its (latent) attributes. Our model unifies object class prediction and attribute prediction in a principled framework. It is also flexible enough to deal with different performance measurements. Our experimental results provide quantitative evidence that attributes can improve object naming.

Book ChapterDOI
05 Sep 2010
TL;DR: This work bypasses a global probabilistic model and instead directly trains a hierarchical inference procedure inspired by the message-passing mechanics of some approximate inference procedures in graphical models, which mitigates both the theoretical and empirical difficulties of learning probabilistic models when exact inference is intractable.
Abstract: In this work we propose a hierarchical approach for labeling semantic objects and regions in scenes. Our approach is reminiscent of early vision literature in that we use a decomposition of the image in order to encode relational and spatial information. In contrast to much existing work on structured prediction for scene understanding, we bypass a global probabilistic model and instead directly train a hierarchical inference procedure inspired by the message passing mechanics of some approximate inference procedures in graphical models. This approach mitigates both the theoretical and empirical difficulties of learning probabilistic models when exact inference is intractable. In particular, we draw from recent work in machine learning and break the complex inference process into a hierarchical series of simple machine learning subproblems. Each subproblem in the hierarchy is designed to capture the image and contextual statistics in the scene. This hierarchy spans coarse-to-fine regions and explicitly models the mixtures of semantic labels that may be present due to imperfect segmentation. To avoid cascading of errors and overfitting, we train the learning problems in sequence to ensure robustness to likely errors earlier in the inference sequence and leverage the stacking approach developed by Cohen et al.

Journal ArticleDOI
TL;DR: This work describes an extension of BP to continuous variable models, generalizing particle filtering and Gaussian mixture filtering techniques for time series to more complex models, and illustrates the power of the resulting nonparametric BP algorithm via two applications: kinematic tracking of visual motion and distributed localization in sensor networks.
Abstract: Continuous quantities are ubiquitous in models of real-world phenomena, but are surprisingly difficult to reason about automatically. Probabilistic graphical models such as Bayesian networks and Markov random fields, and algorithms for approximate inference such as belief propagation (BP), have proven to be powerful tools in a wide range of applications in statistics and artificial intelligence. However, applying these methods to models with continuous variables remains a challenging task. In this work we describe an extension of BP to continuous variable models, generalizing particle filtering and Gaussian mixture filtering techniques for time series to more complex models. We illustrate the power of the resulting nonparametric BP algorithm via two applications: kinematic tracking of visual motion and distributed localization in sensor networks.

Posted Content
TL;DR: In this paper, the authors describe subject-level brain functional connectivity structure as a multivariate Gaussian process and introduce a new strategy to estimate it from group data, by imposing a common structure on the graphical model in the population.
Abstract: Spontaneous brain activity, as observed in functional neuroimaging, has been shown to display reproducible structure that expresses brain architecture and carries markers of brain pathologies. An important view of modern neuroscience is that such large-scale structure of coherent activity reflects modularity properties of brain connectivity graphs. However, to date, there has been no demonstration that the limited and noisy data available in spontaneous activity observations could be used to learn full-brain probabilistic models that generalize to new data. Learning such models entails two main challenges: i) modeling full brain connectivity is a difficult estimation problem that faces the curse of dimensionality and ii) variability between subjects, coupled with the variability of functional signals between experimental runs, makes the use of multiple datasets challenging. We describe subject-level brain functional connectivity structure as a multivariate Gaussian process and introduce a new strategy to estimate it from group data, by imposing a common structure on the graphical model in the population. We show that individual models learned from functional Magnetic Resonance Imaging (fMRI) data using this population prior generalize better to unseen data than models based on alternative regularization schemes. To our knowledge, this is the first report of a cross-validated model of spontaneous brain activity. Finally, we use the estimated graphical model to explore the large-scale characteristics of functional architecture and show for the first time that known cognitive networks appear as the integrated communities of the functional connectivity graph.

Posted Content
TL;DR: The world of graphs in computing is explored and situations in which graphical models are beneficial are exposed.
Abstract: A graph is a data structure composed of dots (i.e. vertices) and lines (i.e. edges). The dots and lines of a graph can be organized into intricate arrangements. The ability for a graph to denote objects and their relationships to one another allows for a surprisingly large number of things to be modeled as a graph. From the dependencies that link software packages to the wood beams that provide the framing to a house, most anything has a corresponding graph representation. However, just because it is possible to represent something as a graph does not necessarily mean that its graph representation will be useful. If a modeler can leverage the plethora of tools and algorithms that store and process graphs, then such a mapping is worthwhile. This article explores the world of graphs in computing and exposes situations in which graphical models are beneficial.

Proceedings ArticleDOI
13 Jun 2010
TL;DR: In this article, the problem of reconstructing an image from a bag of square, non-overlapping image patches, the jigsaw puzzle problem, is considered and a graphical model is developed to solve it.
Abstract: We explore the problem of reconstructing an image from a bag of square, non-overlapping image patches, the jigsaw puzzle problem. Completing jigsaw puzzles is challenging and requires expertise even for humans, and is known to be NP-complete. We depart from previous methods that treat the problem as a constraint satisfaction problem and develop a graphical model to solve it. Each patch location is a node and each patch is a label at nodes in the graph. A graphical model requires a pairwise compatibility term, which measures an affinity between two neighboring patches, and a local evidence term, which we lack. This paper discusses ways to obtain these terms for the jigsaw puzzle problem. We evaluate several patch compatibility metrics, including the natural image statistics measure, and experimentally show that the dissimilarity-based compatibility, measuring the sum-of-squared color difference along the abutting boundary, gives the best results. We compare two forms of local evidence for the graphical model: a sparse-and-accurate evidence and a dense-and-noisy evidence. We show that the sparse-and-accurate evidence, fixing as few as 4-6 patches at their correct locations, is enough to reconstruct images consisting of over 400 patches. To the best of our knowledge, this is the largest puzzle solved in the literature. We also show that one can coarsely estimate the low resolution image from a bag of patches, suggesting that a bag of image patches encodes some geometric information about the original image.
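The winning compatibility metric is simple to state; a minimal sketch for the left-right case (the patch layout convention and color space handling are assumptions):

```python
import numpy as np

def lr_dissimilarity(a, b):
    """Sum-of-squared color difference along the abutting boundary when
    patch b is placed immediately to the right of patch a.
    a, b: (P, P, 3) float arrays (the paper computes this in a
    perceptually normalized color space)."""
    return float(((a[:, -1, :] - b[:, 0, :]) ** 2).sum())
```

The pairwise compatibility in the graphical model can then be taken as a decreasing function of this dissimilarity.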

Posted Content
TL;DR: Surprisingly, the analysis of fast approximate message passing algorithms makes it possible to prove exact high-dimensional limit results for the LASSO risk.
Abstract: This paper surveys recent work in applying ideas from graphical models and message passing algorithms to solve large scale regularized regression problems. In particular, the focus is on compressed sensing reconstruction via $\ell_1$-penalized least-squares (known as LASSO or BPDN). We discuss how to derive fast approximate message passing algorithms to solve this problem. Surprisingly, the analysis of such algorithms makes it possible to prove exact high-dimensional limit results for the LASSO risk. This paper will appear as a chapter in a book on 'Compressed Sensing' edited by Yonina Eldar and Gitta Kutyniok.
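A minimal sketch of the AMP iteration for the LASSO: soft thresholding of x + Aᵀz, with the Onsager correction term added back into the residual; the threshold schedule below is an illustrative choice rather than the tuned policy analyzed in the paper.

```python
import numpy as np

def amp_lasso(A, y, alpha=2.0, iters=50):
    """Approximate message passing for the LASSO (minimal sketch).
    A: (n, N) measurement matrix, y: (n,) observations."""
    n, N = A.shape
    x, z = np.zeros(N), y.copy()
    soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
    for _ in range(iters):
        theta = alpha * np.linalg.norm(z) / np.sqrt(n)  # illustrative threshold
        x_new = soft(x + A.T @ z, theta)
        # Onsager term: (1/delta) * <eta'> * z_prev, with <eta'> = ||x_new||_0 / N
        onsager = (np.count_nonzero(x_new) / n) * z
        z = y - A @ x_new + onsager
        x = x_new
    return x
```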

Journal ArticleDOI
TL;DR: In this paper, an efficient penalized likelihood method is proposed for estimating the adjacency matrix of directed acyclic graphs when variables inherit a natural ordering, and it is shown that the adaptive lasso can consistently estimate the true graph under the usual regularity assumptions.
Abstract: Directed acyclic graphs are commonly used to represent causal relationships among random variables in graphical models. Applications of these models arise in the study of physical and biological systems where directed edges between nodes represent the influence of components of the system on each other. Estimation of directed graphs from observational data is computationally NP-hard. In addition, directed graphs with the same structure may be indistinguishable based on observations alone. When the nodes exhibit a natural ordering, the problem of estimating directed graphs reduces to the problem of estimating the structure of the network. In this paper, we propose an efficient penalized likelihood method for estimation of the adjacency matrix of directed acyclic graphs, when variables inherit a natural ordering. We study variable selection consistency of lasso and adaptive lasso penalties in high-dimensional sparse settings, and propose an error-based choice for selecting the tuning parameter. We show that although the lasso is only variable selection consistent under stringent conditions, the adaptive lasso can consistently estimate the true graph under the usual regularity assumptions.
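A minimal sketch of the known-ordering case: each node is regressed on its predecessors with a lasso or adaptive-lasso penalty, filling in one column of the adjacency matrix at a time; the OLS-based adaptive weights and the fixed λ are illustrative (the paper proposes an error-based choice of tuning parameter).

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

def estimate_dag(X, lam=0.1, adaptive=True, gamma=1.0):
    """Penalized likelihood estimation of a DAG adjacency matrix under a
    known variable ordering: regress each node on its predecessors.
    X: (n, p) data with columns already in the natural ordering."""
    n, p = X.shape
    A = np.zeros((p, p))                    # A[i, j] != 0 means edge i -> j
    for j in range(1, p):
        Z, y = X[:, :j], X[:, j]
        if adaptive:
            # Adaptive lasso via rescaling: weights from an initial OLS fit
            init = LinearRegression().fit(Z, y).coef_
            w = 1.0 / (np.abs(init) ** gamma + 1e-8)
            A[:j, j] = Lasso(alpha=lam).fit(Z / w, y).coef_ / w
        else:
            A[:j, j] = Lasso(alpha=lam).fit(Z, y).coef_
    return A
```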

Posted Content
TL;DR: In this paper, a first-order method based on an alternating linearization technique is proposed to learn the structure of the graph by estimating a sparse inverse covariance matrix from sample data, solving a convex maximum likelihood problem with an $\ell_1$-regularization term.
Abstract: Gaussian graphical models are of great interest in statistical learning. Because the conditional independencies between different nodes correspond to zero entries in the inverse covariance matrix of the Gaussian distribution, one can learn the structure of the graph by estimating a sparse inverse covariance matrix from sample data, by solving a convex maximum likelihood problem with an $\ell_1$-regularization term. In this paper, we propose a first-order method based on an alternating linearization technique that exploits the problem's special structure; in particular, the subproblems solved in each iteration have closed-form solutions. Moreover, our algorithm obtains an $\epsilon$-optimal solution in $O(1/\epsilon)$ iterations. Numerical experiments on both synthetic and real data from gene association networks show that a practical version of this algorithm outperforms other competitive algorithms.

Journal ArticleDOI
TL;DR: The modeling framework can be viewed as a combination of dimensionality reduction and graphical modeling (to capture remaining statistical structure not attributable to the latent variables) and it consistently estimates both the number of latent components and the conditional graphical model structure among the observed variables.
Abstract: Suppose we observe samples of a subset of a collection of random variables. No additional information is provided about the number of latent variables, nor of the relationship between the latent and observed variables. Is it possible to discover the number of latent components, and to learn a statistical model over the entire collection of variables? We address this question in the setting in which the latent and observed variables are jointly Gaussian, with the conditional statistics of the observed variables conditioned on the latent variables being specified by a graphical model. As a first step we give natural conditions under which such latent-variable Gaussian graphical models are identifiable given marginal statistics of only the observed variables. Essentially these conditions require that the conditional graphical model among the observed variables is sparse, while the effect of the latent variables is "spread out" over most of the observed variables. Next we propose a tractable convex program based on regularized maximum-likelihood for model selection in this latent-variable setting; the regularizer uses both the $\ell_1$ norm and the nuclear norm. Our modeling framework can be viewed as a combination of dimensionality reduction (to identify latent variables) and graphical modeling (to capture remaining statistical structure not attributable to the latent variables), and it consistently estimates both the number of latent components and the conditional graphical model structure among the observed variables. These results are applicable in the high-dimensional setting in which the number of latent/observed variables grows with the number of samples of the observed variables. The geometric properties of the algebraic varieties of sparse matrices and of low-rank matrices play an important role in our analysis.
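The convex program is compact enough to state directly. A sketch with CVXPY, using the fact that the nuclear norm of the positive semidefinite low-rank term equals its trace; λ and γ are illustrative.

```python
import cvxpy as cp
import numpy as np

def latent_variable_ggm(Sigma_hat, lam=0.1, gamma=2.0):
    """Regularized maximum likelihood for latent-variable Gaussian graphical
    models: the marginal precision of the observed variables is S - L, with
    S sparse (l1 penalty) and L PSD and low rank (nuclear norm = trace(L))."""
    p = Sigma_hat.shape[0]
    S = cp.Variable((p, p), symmetric=True)
    L = cp.Variable((p, p), symmetric=True)
    R = S - L                                   # marginal precision matrix
    objective = (-cp.log_det(R) + cp.trace(Sigma_hat @ R)
                 + lam * (gamma * cp.sum(cp.abs(S)) + cp.trace(L)))
    cp.Problem(cp.Minimize(objective), [L >> 0]).solve()
    return S.value, L.value
```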

Journal ArticleDOI
TL;DR: A validated stochastic model is proposed to estimate the probability that an image part is of interest; this probability is referred to as saliency, thus specifying saliency in a mathematically well-defined sense.
Abstract: Computer vision attention processes assign variable-hypothesized importance to different parts of the visual input and direct the allocation of computational resources. This nonuniform allocation might help accelerate the image analysis process. This paper proposes a new bottom-up attention mechanism. Rather than taking the traditional approach, which tries to model human attention, we propose a validated stochastic model to estimate the probability that an image part is of interest. We refer to this probability as saliency and thus specify saliency in a mathematically well-defined sense. The model quantifies several intuitive observations, such as the greater likelihood of correspondence between visually similar image regions and the likelihood that only a few of interesting objects will be present in the scene. The latter observation, which implies that such objects are (relaxed) global exceptions, replaces the traditional preference for local contrast. The algorithm starts with a rough preattentive segmentation and then uses a graphical model approximation to efficiently reveal which segments are more likely to be of interest. Experiments on natural scenes containing a variety of objects demonstrate the proposed method and show its advantages over previous approaches.

Journal ArticleDOI
TL;DR: The non-stationary DBN model is defined, an MCMC sampling algorithm is presented for learning the structure of the model from time-series data under different assumptions, and the effectiveness of the algorithm is demonstrated on both simulated and biological data.
Abstract: Learning dynamic Bayesian network structures provides a principled mechanism for identifying conditional dependencies in time-series data. An important assumption of traditional DBN structure learning is that the data are generated by a stationary process, an assumption that is not true in many important settings. In this paper, we introduce a new class of graphical model called a non-stationary dynamic Bayesian network, in which the conditional dependence structure of the underlying data-generation process is permitted to change over time. Non-stationary dynamic Bayesian networks represent a new framework for studying problems in which the structure of a network is evolving over time. Some examples of evolving networks are transcriptional regulatory networks during an organism's development, neural pathways during learning, and traffic patterns during the day. We define the non-stationary DBN model, present an MCMC sampling algorithm for learning the structure of the model from time-series data under different assumptions, and demonstrate the effectiveness of the algorithm on both simulated and biological data.

Proceedings Article
11 Jul 2010
TL;DR: Bagel is presented, a statistical language generator which uses dynamic Bayesian networks to learn from semantically-aligned data produced by 42 untrained annotators, and can generate natural and informative utterances from unseen inputs in the information presentation domain.
Abstract: Most previous work on trainable language generation has focused on two paradigms: (a) using a statistical model to rank a set of generated utterances, or (b) using statistics to inform the generation decision process. Both approaches rely on the existence of a handcrafted generator, which limits their scalability to new domains. This paper presents Bagel, a statistical language generator which uses dynamic Bayesian networks to learn from semantically-aligned data produced by 42 untrained annotators. A human evaluation shows that Bagel can generate natural and informative utterances from unseen inputs in the information presentation domain. Additionally, generation performance on sparse datasets is improved significantly by using certainty-based active learning, yielding ratings close to the human gold standard with a fraction of the data.

Proceedings Article
06 Dec 2010
TL;DR: A new probabilistic model of optical flow in layers that addresses many of the shortcomings of previous approaches and achieves state-of-the-art results on the Middlebury benchmark and produces meaningful scene segmentations as well as detected occlusion regions.
Abstract: Layered models are a powerful way of describing natural scenes containing smooth surfaces that may overlap and occlude each other. For image motion estimation, such models have a long history but have not achieved the wide use or accuracy of non-layered methods. We present a new probabilistic model of optical flow in layers that addresses many of the shortcomings of previous approaches. In particular, we define a probabilistic graphical model that explicitly captures: 1) occlusions and disocclusions; 2) depth ordering of the layers; 3) temporal consistency of the layer segmentation. Additionally the optical flow in each layer is modeled by a combination of a parametric model and a smooth deviation based on an MRF with a robust spatial prior; the resulting model allows roughness in layers. Finally, a key contribution is the formulation of the layers using an image-dependent hidden field prior based on recent models for static scene segmentation. The method achieves state-of-the-art results on the Middlebury benchmark and produces meaningful scene segmentations as well as detected occlusion regions.

Proceedings Article
31 Mar 2010
TL;DR: A non-linear graphical model for structured prediction that combines the power of deep neural networks to extract high level features with the graphical framework of Markov networks, yielding a powerful and scalable probabilistic model that is applied to signal labeling tasks.
Abstract: We propose a non-linear graphical model for structured prediction. It combines the power of deep neural networks to extract high level features with the graphical framework of Markov networks, yielding a powerful and scalable probabilistic model that we apply to signal labeling tasks.

Posted Content
TL;DR: The pair-copula model is very general and the Bayesian method generalizes many previous approaches for the analysis of longitudinal data; the selection approach is shown to be reliable and can substantially improve estimates of both conditional and unconditional pairwise dependencies.
Abstract: Copulas have proven to be very successful tools for the flexible modelling of cross-sectional dependence. In this paper we express the dependence structure of continuous-valued time series data using a sequence of bivariate copulas. This corresponds to a type of decomposition recently called a ‘vine’ in the graphical models literature, where each copula is entitled a ‘pair-copula’. We propose a Bayesian approach for the estimation of this dependence structure for longitudinal data. Bayesian selection ideas are used to identify any independence pair-copulas, with the end result being a parsimonious representation of a time-inhomogeneous Markov process of varying order. Estimates are Bayesian model averages over the distribution of the lag structure of the Markov process. Using a simulation study we show that the selection approach is reliable and can improve the estimates of both conditional and unconditional pairwise dependencies substantially. We also show that a vine with selection out-performs a Gaussian copula with a flexible correlation matrix. The advantage of the pair-copula formulation is further demonstrated using a longitudinal model of intraday electricity load. Using Gaussian, Gumbel and Clayton pair-copulas we identify parsimonious decompositions of intraday serial dependence, which improve the accuracy of intraday load forecasts. We also propose a new diagnostic for measuring the goodness of fit of high-dimensional multivariate copulas. Overall, the pair-copula model is very general and the Bayesian method generalizes many previous approaches for the analysis of longitudinal data. Supplemental materials for the article are also available online.
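A minimal sketch of fitting a single Gaussian pair-copula, the building block of the vine: rank-transform the margins to pseudo-uniforms, then invert Kendall's tau; this moment-based estimator is an illustrative alternative to the paper's Bayesian treatment.

```python
import numpy as np
from scipy import stats

def fit_gaussian_pair_copula(x, y):
    """Fit one bivariate Gaussian pair-copula: rank-transform the margins
    to pseudo-uniforms, then invert Kendall's tau to get the copula
    correlation, rho = sin(pi * tau / 2)."""
    u = stats.rankdata(x) / (len(x) + 1.0)
    v = stats.rankdata(y) / (len(y) + 1.0)
    tau, _ = stats.kendalltau(u, v)
    return np.sin(np.pi * tau / 2.0)

# Lag-1 serial dependence of a time series y: fit to pairs (y_t, y_{t-1})
# rho1 = fit_gaussian_pair_copula(y[1:], y[:-1])
```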