Showing papers on "Graphical model published in 2015"

PDF

Open Access

Proceedings Article•

Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs

[...]

Liang-Chieh Chen¹, George Papandreou², Iasonas Kokkinos³, Kevin Murphy², Alan L. Yuille¹ - Show less +1 more•Institutions (3)

University of California, Los Angeles¹, Google², CentraleSupélec³

07 May 2015

TL;DR: DeepLab as mentioned in this paper combines the responses at the final layer with a fully connected CRF to localize segment boundaries at a level of accuracy beyond previous methods, achieving 71.6% IOU accuracy in the test set.

...read moreread less

Abstract: Deep Convolutional Neural Networks (DCNNs) have recently shown state of the art performance in high level vision tasks, such as image classification and object detection. This work brings together methods from DCNNs and probabilistic graphical models for addressing the task of pixel-level classification (also called "semantic image segmentation"). We show that responses at the final layer of DCNNs are not sufficiently localized for accurate object segmentation. This is due to the very invariance properties that make DCNNs good for high level tasks. We overcome this poor localization property of deep networks by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF). Qualitatively, our "DeepLab" system is able to localize segment boundaries at a level of accuracy which is beyond previous methods. Quantitatively, our method sets the new state-of-art at the PASCAL VOC-2012 semantic image segmentation task, reaching 71.6% IOU accuracy in the test set. We show how these results can be obtained efficiently: Careful network re-purposing and a novel application of the 'hole' algorithm from the wavelet community allow dense computation of neural net responses at 8 frames per second on a modern GPU.

...read moreread less

2,469 citations

Book•DOI•

Statistical Learning with Sparsity: The Lasso and Generalizations

[...]

Trevor Hastie, Robert Tibshirani, Martin J. Wainwright

07 May 2015

TL;DR: Statistical Learning with Sparsity: The Lasso and Generalizations presents methods that exploit sparsity to help recover the underlying signal in a set of data and extract useful and reproducible patterns from big datasets.

...read moreread less

Abstract: Discover New Methods for Dealing with High-Dimensional Data A sparse statistical model has only a small number of nonzero parameters or weights; therefore, it is much easier to estimate and interpret than a dense model. Statistical Learning with Sparsity: The Lasso and Generalizations presents methods that exploit sparsity to help recover the underlying signal in a set of data. Top experts in this rapidly evolving field, the authors describe the lasso for linear regression and a simple coordinate descent algorithm for its computation. They discuss the application of 1 penalties to generalized linear models and support vector machines, cover generalized penalties such as the elastic net and group lasso, and review numerical methods for optimization. They also present statistical inference methods for fitted (lasso) models, including the bootstrap, Bayesian methods, and recently developed approaches. In addition, the book examines matrix decomposition, sparse multivariate analysis, graphical models, and compressed sensing. It concludes with a survey of theoretical results for the lasso. In this age of big data, the number of features measured on a person or object can be large and might be larger than the number of observations. This book shows how the sparsity assumption allows us to tackle these problems and extract useful and reproducible patterns from big datasets. Data analysts, computer scientists, and theorists will appreciate this thorough and up-to-date treatment of sparse statistical modeling.

...read moreread less

2,275 citations

Proceedings Article•DOI•

Conditional Random Fields as Recurrent Neural Networks

[...]

Shuai Zheng¹, Sadeep Jayasumana¹, Bernardino Romera-Paredes¹, Vibhav Vineet², Zhizhong Su, Dalong Du, Chang Huang³, Philip H. S. Torr¹ - Show less +4 more•Institutions (3)

University of Oxford¹, Stanford University², Baidu³

07 Dec 2015

TL;DR: In this article, a new form of convolutional neural network that combines the strengths of Convolutional Neural Networks (CNNs) and Conditional Random Fields (CRFs)-based probabilistic graphical modelling is introduced.

...read moreread less

Abstract: Pixel-level labelling tasks, such as semantic segmentation, play a central role in image understanding. Recent approaches have attempted to harness the capabilities of deep learning techniques for image recognition to tackle pixel-level labelling tasks. One central issue in this methodology is the limited capacity of deep learning techniques to delineate visual objects. To solve this problem, we introduce a new form of convolutional neural network that combines the strengths of Convolutional Neural Networks (CNNs) and Conditional Random Fields (CRFs)-based probabilistic graphical modelling. To this end, we formulate Conditional Random Fields with Gaussian pairwise potentials and mean-field approximate inference as Recurrent Neural Networks. This network, called CRF-RNN, is then plugged in as a part of a CNN to obtain a deep network that has desirable properties of both CNNs and CRFs. Importantly, our system fully integrates CRF modelling with CNNs, making it possible to train the whole deep network end-to-end with the usual back-propagation algorithm, avoiding offline post-processing methods for object delineation. We apply the proposed method to the problem of semantic image segmentation, obtaining top results on the challenging Pascal VOC 2012 segmentation benchmark.

...read moreread less

1,973 citations

Proceedings Article•DOI•

Learning from massive noisy labeled data for image classification

[...]

Tong Xiao¹, Xia Tian², Yi Yang², Chang Huang², Xiaogang Wang¹ - Show less +1 more•Institutions (2)

The Chinese University of Hong Kong¹, Baidu²

07 Jun 2015

TL;DR: A general framework to train CNNs with only a limited number of clean labels and millions of easily obtained noisy labels is introduced and the relationships between images, class labels and label noises are model with a probabilistic graphical model and further integrate it into an end-to-end deep learning system.

...read moreread less

Abstract: Large-scale supervised datasets are crucial to train convolutional neural networks (CNNs) for various computer vision problems. However, obtaining a massive amount of well-labeled data is usually very expensive and time consuming. In this paper, we introduce a general framework to train CNNs with only a limited number of clean labels and millions of easily obtained noisy labels. We model the relationships between images, class labels and label noises with a probabilistic graphical model and further integrate it into an end-to-end deep learning system. To demonstrate the effectiveness of our approach, we collect a large-scale real-world clothing classification dataset with both noisy and clean labels. Experiments on this dataset indicate that our approach can better correct the noisy labels and improves the performance of trained CNNs.

...read moreread less

893 citations

Book•

Machine Learning: A Bayesian and Optimization Perspective

[...]

Sergios Theodoridis

10 Apr 2015

TL;DR: This tutorial text gives a unifying perspective on machine learning by covering both Probabilistic and deterministic approaches - which are based on optimization techniques together with the Bayesian inference approach, whose essence lies in the use of a hierarchy of probabilistic models.

...read moreread less

Abstract: This tutorial text gives a unifying perspective on machine learning by covering bothprobabilistic and deterministic approaches -which are based on optimization techniques together with the Bayesian inference approach, whose essence liesin the use of a hierarchy of probabilistic models. The book presents the major machine learning methods as they have been developed in different disciplines, such as statistics, statistical and adaptive signal processing and computer science. Focusing on the physical reasoning behind the mathematics, all the various methods and techniques are explained in depth, supported by examples and problems, giving an invaluable resource to the student and researcher for understanding and applying machine learning concepts. The book builds carefully from the basic classical methods to the most recent trends, with chapters written to be as self-contained as possible, making the text suitable for different courses: pattern recognition, statistical/adaptive signal processing, statistical/Bayesian learning, as well as short courses on sparse modeling, deep learning, and probabilistic graphical models. All major classical techniques: Mean/Least-Squares regression and filtering, Kalman filtering, stochastic approximation and online learning, Bayesian classification, decision trees, logistic regression and boosting methods. The latest trends: Sparsity, convex analysis and optimization, online distributed algorithms, learning in RKH spaces, Bayesian inference, graphical and hidden Markov models, particle filtering, deep learning, dictionary learning and latent variables modeling. Case studies - protein folding prediction, optical character recognition, text authorship identification, fMRI data analysis, change point detection, hyperspectral image unmixing, target localization, channel equalization and echo cancellation, show how the theory can be applied. MATLAB code for all the main algorithms are available on an accompanying website, enabling the reader to experiment with the code.

...read moreread less

397 citations

Proceedings Article•DOI•

Deformable part models are convolutional neural networks

[...]

Ross Girshick¹, Forrest Iandola², Trevor Darrell², Jitendra Malik²•Institutions (2)

Microsoft¹, University of California, Berkeley²

07 Jun 2015

TL;DR: This paper shows that a DPM can be formulated as a CNN, thus providing a synthesis of the two ideas and calls the resulting model a DeepPyramid DPM, which is found to significantly outperform DPMs based on histograms of oriented gradients features (HOG) and slightly outperforms a comparable version of the recently introduced R-CNN detection system, while running significantly faster.

...read moreread less

Abstract: Deformable part models (DPMs) and convolutional neural networks (CNNs) are two widely used tools for visual recognition. They are typically viewed as distinct approaches: DPMs are graphical models (Markov random fields), while CNNs are “black-box” non-linear classifiers. In this paper, we show that a DPM can be formulated as a CNN, thus providing a synthesis of the two ideas. Our construction involves unrolling the DPM inference algorithm and mapping each step to an equivalent CNN layer. From this perspective, it is natural to replace the standard image features used in DPMs with a learned feature extractor. We call the resulting model a DeepPyramid DPM and experimentally validate it on PASCAL VOC object detection. We find that DeepPyramid DPMs significantly outperform DPMs based on histograms of oriented gradients features (HOG) and slightly outperforms a comparable version of the recently introduced R-CNN detection system, while running significantly faster.

...read moreread less

389 citations

Proceedings Article•DOI•

Predicting Program Properties from "Big Code"

[...]

Veselin Raychev¹, Martin Vechev¹, Andreas Krause¹•Institutions (1)

ETH Zurich¹

14 Jan 2015

TL;DR: This work formulating the problem of inferring program properties as structured prediction and showing how to perform both learning and inference in this context opens up new possibilities for attacking a wide range of difficult problems in the context of "Big Code" including invariant generation, decompilation, synthesis and others.

...read moreread less

Abstract: We present a new approach for predicting program properties from massive codebases (aka "Big Code"). Our approach first learns a probabilistic model from existing data and then uses this model to predict properties of new, unseen programs. The key idea of our work is to transform the input program into a representation which allows us to phrase the problem of inferring program properties as structured prediction in machine learning. This formulation enables us to leverage powerful probabilistic graphical models such as conditional random fields (CRFs) in order to perform joint prediction of program properties. As an example of our approach, we built a scalable prediction engine called JSNice for solving two kinds of problems in the context of JavaScript: predicting (syntactic) names of identifiers and predicting (semantic) type annotations of variables. Experimentally, JSNice predicts correct names for 63% of name identifiers and its type annotation predictions are correct in 81% of the cases. In the first week since its release, JSNice was used by more than 30,000 developers and in only few months has become a popular tool in the JavaScript developer community. By formulating the problem of inferring program properties as structured prediction and showing how to perform both learning and inference in this context, our work opens up new possibilities for attacking a wide range of difficult problems in the context of "Big Code" including invariant generation, decompilation, synthesis and others.

...read moreread less

387 citations

Posted Content•

Order Matters: Sequence to sequence for sets

[...]

Oriol Vinyals¹, Samy Bengio¹, Manjunath Kudlur¹•Institutions (1)

Google¹

19 Nov 2015-arXiv: Machine Learning

TL;DR: In this article, the authors show that the order in which we organize input and output data matters significantly when learning an underlying model and propose an extension of the seq2seq framework that goes beyond sequences and handles input sets in a principled way.

...read moreread less

Abstract: Sequences have become first class citizens in supervised learning thanks to the resurgence of recurrent neural networks. Many complex tasks that require mapping from or to a sequence of observations can now be formulated with the sequence-to-sequence (seq2seq) framework which employs the chain rule to efficiently represent the joint probability of sequences. In many cases, however, variable sized inputs and/or outputs might not be naturally expressed as sequences. For instance, it is not clear how to input a set of numbers into a model where the task is to sort them; similarly, we do not know how to organize outputs when they correspond to random variables and the task is to model their unknown joint probability. In this paper, we first show using various examples that the order in which we organize input and/or output data matters significantly when learning an underlying model. We then discuss an extension of the seq2seq framework that goes beyond sequences and handles input sets in a principled way. In addition, we propose a loss which, by searching over possible orders during training, deals with the lack of structure of output sets. We show empirical evidence of our claims regarding ordering, and on the modifications to the seq2seq framework on benchmark language modeling and parsing tasks, as well as two artificial tasks -- sorting numbers and estimating the joint probability of unknown graphical models.

...read moreread less

374 citations

Posted Content•

Fully Connected Deep Structured Networks

[...]

Alexander G. Schwing, Raquel Urtasun

09 Mar 2015-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work unifies this two-stage process for semantic segmentation into a single joint training algorithm and demonstrates the method on the semantic image segmentation task and shows encouraging results on the challenging PASCAL VOC 2012 dataset.

...read moreread less

Abstract: Convolutional neural networks with many layers have recently been shown to achieve excellent results on many high-level tasks such as image classification, object detection and more recently also semantic segmentation. Particularly for semantic segmentation, a two-stage procedure is often employed. Hereby, convolutional networks are trained to provide good local pixel-wise features for the second step being traditionally a more global graphical model. In this work we unify this two-stage process into a single joint training algorithm. We demonstrate our method on the semantic image segmentation task and show encouraging results on the challenging PASCAL VOC 2012 dataset.

...read moreread less

324 citations

Posted Content•

Hinge-Loss Markov Random Fields and Probabilistic Soft Logic

[...]

Stephen H. Bach¹, Matthias Broecheler, Bert Huang², Lise Getoor³•Institutions (3)

Stanford University¹, Virginia Tech², University of California, Santa Cruz³

17 May 2015-arXiv: Learning

TL;DR: In this paper, hinge-loss Markov random fields (HL-MRFs) and probabilistic soft logic (PSL) are proposed to model rich, structured data at scales not previously possible.

...read moreread less

Abstract: A fundamental challenge in developing high-impact machine learning technologies is balancing the need to model rich, structured domains with the ability to scale to big data. Many important problem areas are both richly structured and large scale, from social and biological networks, to knowledge graphs and the Web, to images, video, and natural language. In this paper, we introduce two new formalisms for modeling structured data, and show that they can both capture rich structure and scale to big data. The first, hinge-loss Markov random fields (HL-MRFs), is a new kind of probabilistic graphical model that generalizes different approaches to convex inference. We unite three approaches from the randomized algorithms, probabilistic graphical models, and fuzzy logic communities, showing that all three lead to the same inference objective. We then define HL-MRFs by generalizing this unified objective. The second new formalism, probabilistic soft logic (PSL), is a probabilistic programming language that makes HL-MRFs easy to define using a syntax based on first-order logic. We introduce an algorithm for inferring most-probable variable assignments (MAP inference) that is much more scalable than general-purpose convex optimization methods, because it uses message passing to take advantage of sparse dependency structures. We then show how to learn the parameters of HL-MRFs. The learned HL-MRFs are as accurate as analogous discrete models, but much more scalable. Together, these algorithms enable HL-MRFs and PSL to model rich, structured data at scales not previously possible.

...read moreread less

230 citations

Proceedings Article•

Deep Structured Output Learning for Unconstrained Text Recognition

[...]

Max Jaderberg¹, Max Jaderberg², Karen Simonyan², Karen Simonyan¹, Andrea Vedaldi², Andrew Zisserman², Andrew Zisserman¹ - Show less +3 more•Institutions (2)

Google¹, University of Oxford²

01 Jan 2015

TL;DR: A convolutional neural network based architecture which incorporates a Conditional Random Field graphical model, taking the whole word image as a single input, which achieves state-of-the-art accuracy in lexicon-constrained scenarios, without being specifically modelled for constrained recognition.

...read moreread less

Abstract: We develop a representation suitable for the unconstrained recognition of words in natural images: the general case of no fixed lexicon and unknown length. To this end we propose a convolutional neural network (CNN) based architecture which incorporates a Conditional Random Field (CRF) graphical model, taking the whole word image as a single input. The unaries of the CRF are provided by a CNN that predicts characters at each position of the output, while higher order terms are provided by another CNN that detects the presence of N-grams. We show that this entire model (CRF, character predictor, N-gram predictor) can be jointly optimised by back-propagating the structured output loss, essentially requiring the system to perform multi-task learning, and training uses purely synthetically generated data. The resulting model is a more accurate system on standard real-world text recognition benchmarks than character prediction alone, setting a benchmark for systems that have not been trained on a particular lexicon. In addition, our model achieves state-of-the-art accuracy in lexicon-constrained scenarios, without being specifically modelled for constrained recognition. To test the generalisation of our model, we also perform experiments with random alpha-numeric strings to evaluate the method when no visual language model is applicable.

...read moreread less

Proceedings Article•DOI•

A Deep Hybrid Model for Weather Forecasting

[...]

Aditya Grover¹, Ashish Kapoor², Eric Horvitz²•Institutions (2)

Indian Institute of Technology Delhi¹, Microsoft²

10 Aug 2015

TL;DR: This work studies specifically the power of making predictions via a hybrid approach that combines discriminatively trained predictive models with a deep neural network that models the joint statistics of a set of weather-related variables.

...read moreread less

Abstract: Weather forecasting is a canonical predictive challenge that has depended primarily on model-based methods. We explore new directions with forecasting weather as a data-intensive challenge that involves inferences across space and time. We study specifically the power of making predictions via a hybrid approach that combines discriminatively trained predictive models with a deep neural network that models the joint statistics of a set of weather-related variables. We show how the base model can be enhanced with spatial interpolation that uses learned long-range spatial dependencies. We also derive an efficient learning and inference procedure that allows for large scale optimization of the model parameters. We evaluate the methods with experiments on real-world meteorological data that highlight the promise of the approach.

...read moreread less

Journal Article•DOI•

Asymptotic normality and optimalities in estimation of large Gaussian graphical models

[...]

Zhao Ren¹, Tingni Sun, Cun-Hui Zhang, Harrison H. Zhou¹•Institutions (1)

Yale University¹

01 Jun 2015-Annals of Statistics

TL;DR: In this article, a regression approach is proposed to obtain asymptotically efficient estimation of each entry of a precision matrix under a sparseness condition relative to the sample size.

...read moreread less

Abstract: The Gaussian graphical model, a popular paradigm for studying relationship among variables in a wide range of applications, has attracted great attention in recent years. This paper considers a fundamental question: When is it possible to estimate low-dimensional parameters at parametric square-root rate in a large Gaussian graphical model? A novel regression approach is proposed to obtain asymptotically efficient estimation of each entry of a precision matrix under a sparseness condition relative to the sample size. When the precision matrix is not sufficiently sparse, or equivalently the sample size is not sufficiently large, a lower bound is established to show that it is no longer possible to achieve the parametric rate in the estimation of each entry. This lower bound result, which provides an answer to the delicate sample size question, is established with a novel construction of a subset of sparse precision matrices in an application of Le Cam’s lemma. Moreover, the proposed estimator is proven to have optimal convergence rate when the parametric rate cannot be achieved, under a minimal sample requirement. The proposed estimator is applied to test the presence of an edge in the Gaussian graphical model or to recover the support of the entire model, to obtain adaptive rate-optimal estimation of the entire precision matrix as measured by the matrix $\ell_{q}$ operator norm and to make inference in latent variables in the graphical model. All of this is achieved under a sparsity condition on the precision matrix and a side condition on the range of its spectrum. This significantly relaxes the commonly imposed uniform signal strength condition on the precision matrix, irrepresentability condition on the Hessian tensor operator of the covariance matrix or the $\ell_{1}$ constraint on the precision matrix. Numerical results confirm our theoretical findings. The ROC curve of the proposed algorithm, Asymptotic Normal Thresholding (ANT), for support recovery significantly outperforms that of the popular GLasso algorithm.

...read moreread less

Journal Article•DOI•

Bayesian Inference of Multiple Gaussian Graphical Models

[...]

Christine B. Peterson¹, Francesco C. Stingo², Marina Vannucci³•Institutions (3)

Stanford University¹, University of Texas MD Anderson Cancer Center², Rice University³

01 Mar 2015-Journal of the American Statistical Association

TL;DR: This article addresses the problem of inferring multiple undirected networks in situations where some of the networks may be unrelated, while others share common features, and proposes a Bayesian approach to inference on multiple Gaussian graphical models.

...read moreread less

Abstract: In this article, we propose a Bayesian approach to inference on multiple Gaussian graphical models. Specifically, we address the problem of inferring multiple undirected networks in situations where some of the networks may be unrelated, while others share common features. We link the estimation of the graph structures via a Markov random field (MRF) prior, which encourages common edges. We learn which sample groups have a shared graph structure by placing a spike-and-slab prior on the parameters that measure network relatedness. This approach allows us to share information between sample groups, when appropriate, as well as to obtain a measure of relative network similarity across groups. Our modeling framework incorporates relevant prior knowledge through an edge-specific informative prior and can encourage similarity to an established network. Through simulations, we demonstrate the utility of our method in summarizing relative network similarity and compare its performance against related methods. We ...

...read moreread less

Journal Article•DOI•

Evaluating risk of water mains failure using a Bayesian belief network model

[...]

Golam Kabir¹, Solomon Tesfamariam¹, Alex Francisque¹, Rehan Sadiq¹•Institutions (1)

University of British Columbia¹

01 Jan 2015-European Journal of Operational Research

TL;DR: A Bayesian Belief Network model is presented to evaluate the risk of failure of metallic water mains using structural integrity, hydraulic capacity, water quality, and consequence factors to justify proper decision action for maintenance/rehabilitation/replacement (M/R/R).

...read moreread less

Proceedings Article•DOI•

Combining local appearance and holistic view: Dual-Source Deep Neural Networks for human pose estimation

[...]

Xiaochuan Fan¹, Kang Zheng¹, Yuewei Lin¹, Song Wang¹•Institutions (1)

University of South Carolina¹

07 Jun 2015

TL;DR: Zhang et al. as discussed by the authors proposed a new learning-based method for estimating 2D human pose from a single image, using Dual-Source Deep Convolutional Neural Networks (DS-CNN).

...read moreread less

Abstract: We propose a new learning-based method for estimating 2D human pose from a single image, using Dual-Source Deep Convolutional Neural Networks (DS-CNN). Recently, many methods have been developed to estimate human pose by using pose priors that are estimated from physiologically inspired graphical models or learned from a holistic perspective. In this paper, we propose to integrate both the local (body) part appearance and the holistic view of each local part for more accurate human pose estimation. Specifically, the proposed DS-CNN takes a set of image patches (category-independent object proposals for training and multi-scale sliding windows for testing) as the input and then learns the appearance of each local part by considering their holistic views in the full body. Using DS-CNN, we achieve both joint detection, which determines whether an image patch contains a body joint, and joint localization, which finds the exact location of the joint in the image patch. Finally, we develop an algorithm to combine these joint detection/localization results from all the image patches for estimating the human pose. The experimental results show the effectiveness of the proposed method by comparing to the state-of-the-art human-pose estimation methods based on pose priors that are estimated from physiologically inspired graphical models or learned from a holistic perspective.

...read moreread less

Proceedings Article•DOI•

The stitched puppet: A graphical model of 3D human shape and pose

[...]

Silvia Zuffi¹, Michael J. Black¹•Institutions (1)

Max Planck Society¹

07 Jun 2015

TL;DR: A new 3D model of the human body that is both realistic and part-based, using a form of particle-based max-product belief propagation, that gives SP the realism of recent 3D body models with the computational advantages of part- based models.

...read moreread less

Abstract: We propose a new 3D model of the human body that is both realistic and part-based. The body is represented by a graphical model in which nodes of the graph correspond to body parts that can independently translate and rotate in 3D and deform to represent different body shapes and to capture pose-dependent shape variations. Pairwise potentials define a “stitching cost” for pulling the limbs apart, giving rise to the stitched puppet (SP) model. Unlike existing realistic 3D body models, the distributed representation facilitates inference by allowing the model to more effectively explore the space of poses, much like existing 2D pictorial structures models. We infer pose and body shape using a form of particle-based max-product belief propagation. This gives SP the realism of recent 3D body models with the computational advantages of part-based models. We apply SP to two challenging problems involving estimating human shape and pose from 3D data. The first is the FAUST mesh alignment challenge, where ours is the first method to successfully align all 3D meshes with no pose prior. The second involves estimating pose and shape from crude visual hull representations of complex body movements.

...read moreread less

Journal Article•DOI•

Learning the Structure of Mixed Graphical Models

[...]

Jason D. Lee¹, Trevor Hastie¹•Institutions (1)

Stanford University¹

01 Jan 2015-Journal of Computational and Graphical Statistics

TL;DR: This work presents a new pairwise model for graphical models with both continuous and discrete variables that is amenable to structure learning and involves a novel symmetric use of the group-lasso norm.

...read moreread less

Abstract: We consider the problem of learning the structure of a pairwise graphical model over continuous and discrete variables. We present a new pairwise model for graphical models with both continuous and discrete variables that is amenable to structure learning. In previous work, authors have considered structure learning of Gaussian graphical models and structure learning of discrete models. Our approach is a natural generalization of these two lines of work to the mixed case. The penalization scheme involves a novel symmetric use of the group-lasso norm and follows naturally from a particular parameterization of the model. Supplementary materials for this article are available online.

...read moreread less

Posted Content•

An End-to-End Neural Network for Polyphonic Piano Music Transcription

[...]

Siddharth Sigtia¹, Emmanouil Benetos¹, Simon Dixon¹•Institutions (1)

Queen Mary University of London¹

07 Aug 2015-arXiv: Machine Learning

TL;DR: An efficient variant of beam search is presented that improves performance and reduces run-times by an order of magnitude, making the model suitable for real-time applications.

...read moreread less

Abstract: We present a supervised neural network model for polyphonic piano music transcription. The architecture of the proposed model is analogous to speech recognition systems and comprises an acoustic model and a music language model. The acoustic model is a neural network used for estimating the probabilities of pitches in a frame of audio. The language model is a recurrent neural network that models the correlations between pitch combinations over time. The proposed model is general and can be used to transcribe polyphonic music without imposing any constraints on the polyphony. The acoustic and language model predictions are combined using a probabilistic graphical model. Inference over the output variables is performed using the beam search algorithm. We perform two sets of experiments. We investigate various neural network architectures for the acoustic models and also investigate the effect of combining acoustic and music language model predictions using the proposed architecture. We compare performance of the neural network based acoustic models with two popular unsupervised acoustic models. Results show that convolutional neural network acoustic models yields the best performance across all evaluation metrics. We also observe improved performance with the application of the music language models. Finally, we present an efficient variant of beam search that improves performance and reduces run-times by an order of magnitude, making the model suitable for real-time applications.

...read moreread less

Posted Content•

Structure Inference Machines: Recurrent Neural Networks for Analyzing Relations in Group Activity Recognition

[...]

Zhiwei Deng¹, Arash Vahdat¹, Hexiang Hu¹, Greg Mori¹•Institutions (1)

Simon Fraser University¹

13 Nov 2015-arXiv: Computer Vision and Pattern Recognition

TL;DR: This article proposed a method to integrate graphical models and deep neural networks into a joint framework for group activity recognition, where the appropriate structure for inference can be learned by imposing gates on edges between nodes.

...read moreread less

Abstract: Rich semantic relations are important in a variety of visual recognition problems. As a concrete example, group activity recognition involves the interactions and relative spatial relations of a set of people in a scene. State of the art recognition methods center on deep learning approaches for training highly effective, complex classifiers for interpreting images. However, bridging the relatively low-level concepts output by these methods to interpret higher-level compositional scenes remains a challenge. Graphical models are a standard tool for this task. In this paper, we propose a method to integrate graphical models and deep neural networks into a joint framework. Instead of using a traditional inference method, we use a sequential inference modeled by a recurrent neural network. Beyond this, the appropriate structure for inference can be learned by imposing gates on edges between nodes. Empirical results on group activity recognition demonstrate the potential of this model to handle highly structured learning tasks.

...read moreread less

Journal Article•DOI•

Directed Information Graphs

[...]

Christopher J. Quinn¹, Negar Kiyavash¹, Todd P. Coleman²•Institutions (2)

University of Illinois at Urbana–Champaign¹, University of California, San Diego²

22 Sep 2015-IEEE Transactions on Information Theory

TL;DR: In this article, the authors propose a graphical model for representing networks of stochastic processes, the minimal generative model graph, which is based on reduced factorizations of the joint distribution over time.

...read moreread less

Abstract: We propose a graphical model for representing networks of stochastic processes, the minimal generative model graph. It is based on reduced factorizations of the joint distribution over time. We show that under appropriate conditions, it is unique and consistent with another type of graphical model, the directed information graph, which is based on a generalization of Granger causality. We demonstrate how directed information quantifies Granger causality in a particular sequential prediction setting. We also develop efficient methods to estimate the topological structure from data that obviate estimating the joint statistics. One algorithm assumes upper bounds on the degrees and uses the minimal dimension statistics necessary. In the event that the upper bounds are not valid, the resulting graph is nonetheless an optimal approximation in terms of Kullback-Leibler (KL) divergence. Another algorithm uses near-minimal dimension statistics when no bounds are known, but the distribution satisfies a certain criterion. Analogous to how structure learning algorithms for undirected graphical models use mutual information estimates, these algorithms use directed information estimates. We characterize the sample-complexity of two plug-in directed information estimators and obtain confidence intervals. For the setting when point estimates are unreliable, we propose an algorithm that uses confidence intervals to identify the best approximation that is robust to estimation error. Last, we demonstrate the effectiveness of the proposed algorithms through the analysis of both synthetic data and real data from the Twitter network. In the latter case, we identify which news sources influence users in the network by merely analyzing tweet times.

...read moreread less

Posted Content•

Combining Local Appearance and Holistic View: Dual-Source Deep Neural Networks for Human Pose Estimation

[...]

Xiaochuan Fan¹, Kang Zheng¹, Yuewei Lin¹, Song Wang¹•Institutions (1)

University of South Carolina¹

27 Apr 2015-arXiv: Computer Vision and Pattern Recognition

TL;DR: The experimental results show the effectiveness of the proposed DS-CNN method by comparing to the state-of-the-art human-pose estimation methods based on pose priors that are estimated from physiologically inspired graphical models or learned from a holistic perspective.

...read moreread less

Journal Article•DOI•

A Bayesian network approach for population synthesis

[...]

Lijun Sun¹, Alexander Erath•Institutions (1)

Massachusetts Institute of Technology¹

01 Dec 2015-Transportation Research Part C-emerging Technologies

TL;DR: This paper introduces a new alternative for population synthesis based on Bayesian networks, and demonstrates and assesses the performance of this approach in generating synthetic population for Singapore, by using the Household Interview Travel Survey data as the known test population.

...read moreread less

Abstract: Agent-based micro-simulation models require a complete list of agents with detailed demographic/socioeconomic information for the purpose of behavior modeling and simulation. This paper introduces a new alternative for population synthesis based on Bayesian networks. A Bayesian network is a graphical representation of a joint probability distribution, encoding probabilistic relationships among a set of variables in an efficient way. Similar to the previously developed probabilistic approach, in this paper, we consider the population synthesis problem to be the inference of a joint probability distribution. In this sense, the Bayesian network model becomes an efficient tool that allows us to compactly represent/reproduce the structure of the population system and preserve privacy and confidentiality in the meanwhile. We demonstrate and assess the performance of this approach in generating synthetic population for Singapore, by using the Household Interview Travel Survey (HITS) data as the known test population. Our results show that the introduced Bayesian network approach is powerful in characterizing the underlying joint distribution, and meanwhile the overfitting of data can be avoided as much as possible.

...read moreread less

Journal Article•DOI•

Bayesian Structure Learning in Sparse Gaussian Graphical Models

[...]

Abdolreza Mohammadi, Ernst Wit

01 Oct 2015-Bayesian Analysis

TL;DR: A novel and efficient Bayesian framework for Gaussian graphical model determination which is a trans-dimensional Markov Chain Monte Carlo (MCMC) approach based on a continuous-time birth-death process and gives a principled and, in practice, sensible approach for structure learning.

...read moreread less

Abstract: Decoding complex relationships among large numbers of variables with relatively few observations is one of the crucial issues in science. One approach to this problem is Gaussian graphical modeling, which describes conditional independence of variables through the presence or absence of edges in the underly- ing graph. In this paper, we introduce a novel and efficient Bayesian framework for Gaussian graphical model determination which is a trans-dimensional Markov Chain Monte Carlo (MCMC) approach based on a continuous-time birth-death process. We cover the theory and computational details of the method. It is easy to implement and computationally feasible for high-dimensional graphs. We show our method outperforms alternative Bayesian approaches in terms of convergence, mixing in the graph space and computing time. Unlike frequentist approaches, it gives a principled and, in practice, sensible approach for structure learning. We illustrate the efficiency of the method on a broad range of simulated data. We then apply the method on large-scale real applications from human and mammary gland gene expression studies to show its empirical usefulness. In addition, we implemented the method in the R package BDgraph which is freely available at http://CRAN.R-project.org/package=BDgraph

...read moreread less

Journal Article•DOI•

Selection and estimation for mixed graphical models.

[...]

Shizhe Chen¹, Daniela Witten¹, Ali Shojaie¹•Institutions (1)

University of Washington¹

01 Mar 2015-Biometrika

TL;DR: In this paper, the problem of estimating the parameters in a pairwise graphical model in which the distribution of each node, conditioned on the others, may have a different exponential family form is considered.

...read moreread less

Abstract: We consider the problem of estimating the parameters in a pairwise graphical model in which the distribution of each node, conditioned on the others, may have a different exponential family form. We identify restrictions on the parameter space required for the existence of a well-defined joint density, and establish the consistency of the neighbourhood selection approach for graph reconstruction in high dimensions when the true underlying graph is sparse. Motivated by our theoretical results, we investigate the selection of edges between nodes whose conditional distributions take different parametric forms, and show that efficiency can be gained if edge estimates obtained from the regressions of particular nodes are used to reconstruct the graph. These results are illustrated with examples of Gaussian, Bernoulli, Poisson and exponential distributions. Our theoretical findings are corroborated by evidence from simulation studies.

...read moreread less

Posted Content•

Probabilistic Bag-Of-Hyperlinks Model for Entity Linking

[...]

Octavian-Eugen Ganea¹, Marina Ganea¹, Aurelien Lucchi¹, Carsten Eickhoff¹, Thomas Hofmann¹ - Show less +1 more•Institutions (1)

ETH Zurich¹

08 Sep 2015-arXiv: Computation and Language

TL;DR: This paper proposed a probabilistic approach that makes use of an effective graphical model to perform collective entity disambiguation, which combines a document-level prior of entity co-occurrences with local information captured from mentions and their surrounding context.

...read moreread less

Abstract: Many fundamental problems in natural language processing rely on determining what entities appear in a given text. Commonly referenced as entity linking, this step is a fundamental component of many NLP tasks such as text understanding, automatic summarization, semantic search or machine translation. Name ambiguity, word polysemy, context dependencies and a heavy-tailed distribution of entities contribute to the complexity of this problem. We here propose a probabilistic approach that makes use of an effective graphical model to perform collective entity disambiguation. Input mentions (i.e.,~linkable token spans) are disambiguated jointly across an entire document by combining a document-level prior of entity co-occurrences with local information captured from mentions and their surrounding context. The model is based on simple sufficient statistics extracted from data, thus relying on few parameters to be learned. Our method does not require extensive feature engineering, nor an expensive training procedure. We use loopy belief propagation to perform approximate inference. The low complexity of our model makes this step sufficiently fast for real-time usage. We demonstrate the accuracy of our approach on a wide range of benchmark datasets, showing that it matches, and in many cases outperforms, existing state-of-the-art methods.

...read moreread less

Proceedings Article•DOI•

pgmpy: Probabilistic Graphical Models using Python

[...]

Ankur Ankan, Abinash Panda

01 Jan 2015

TL;DR: A short introduction to PGMs and various other python packages available for working with PGMs is given and about creating and doing inference over Bayesian Networks and Markov Networks using pgmpy is discussed.

...read moreread less

Abstract: Probabilistic Graphical Models (PGM) is a technique of compactly representing a joint distribution by exploiting dependencies between the random variables. It also allows us to do inference on joint distributions in a computationally cheaper way than the traditional methods. PGMs are widely used in the field of speech recognition, information extraction, image segmentation, modelling gene regulatory networks. pgmpy [pgmpy] is a python library for working with graphical models. It allows the user to create their own graphical models and answer inference or map queries over them. pgmpy has implementation of many inference algorithms like VariableElimination, Belief Propagation etc. This paper first gives a short introduction to PGMs and various other python packages available for working with PGMs. Then we discuss about creating and doing inference over Bayesian Networks and Markov Networks using pgmpy.

...read moreread less

Journal Article•DOI•

High-dimensional Ising model selection with Bayesian information criteria

[...]

Rina Foygel Barber¹, Mathias Drton²•Institutions (2)

University of Chicago¹, University of Washington²

01 Jan 2015-Electronic Journal of Statistics

TL;DR: In this paper, the use of Bayesian information criteria for selection of the graph underlying an Ising model is considered, where the full conditional distributions of each variable form logistic regression models, and variable selection techniques for regression allow one to identify the neighborhood of each node and thus the entire graph.

...read moreread less

Abstract: We consider the use of Bayesian information criteria for selection of the graph underlying an Ising model. In an Ising model, the full conditional distributions of each variable form logistic regression models, and variable selection techniques for regression allow one to identify the neighborhood of each node and, thus, the entire graph. We prove high-dimensional consistency results for this pseudo-likelihood approach to graph selection when using Bayesian information criteria for the variable selection problems in the logistic regressions. The results pertain to scenarios of sparsity, and following related prior work the information criteria we consider incorporate an explicit prior that encourages sparsity.

...read moreread less

Proceedings Article•DOI•

Collective Spammer Detection in Evolving Multi-Relational Social Networks

[...]

Shobeir Fakhraei¹, James R. Foulds², Madhusudana Shashanka, Lise Getoor²•Institutions (2)

University of Maryland, College Park¹, University of California, Santa Cruz²

10 Aug 2015

TL;DR: To identify spammer accounts, the approach makes use of structural features, sequence modelling, and collective reasoning, and a statistical relational model using hinge-loss Markov random fields (HL-MRFs), a class of probabilistic graphical models which are highly scalable.

...read moreread less

Abstract: Detecting unsolicited content and the spammers who create it is a long-standing challenge that affects all of us on a daily basis. The recent growth of richly-structured social networks has provided new challenges and opportunities in the spam detection landscape. Motivated by the Tagged.com social network, we develop methods to identify spammers in evolving multi-relational social networks. We model a social network as a time-stamped multi-relational graph where vertices represent users, and edges represent different activities between them. To identify spammer accounts, our approach makes use of structural features, sequence modelling, and collective reasoning. We leverage relational sequence information using k-gram features and probabilistic modelling with a mixture of Markov models. Furthermore, in order to perform collective reasoning and improve the predictive power of a noisy abuse reporting system, we develop a statistical relational model using hinge-loss Markov random fields (HL-MRFs), a class of probabilistic graphical models which are highly scalable. We use Graphlab Create and Probabilistic Soft Logic (PSL) to prototype and experimentally evaluate our solutions on internet-scale data from Tagged.com. Our experiments demonstrate the effectiveness of our approach, and show that models which incorporate the multi-relational nature of the social network significantly gain predictive performance over those that do not.

...read moreread less

Book Chapter•DOI•

Simplifying, Regularizing and Strengthening Sum-Product Network Structure Learning

[...]

Antonio Vergari¹, Nicola Di Mauro¹, Floriana Esposito¹•Institutions (1)

University of Bari¹

07 Sep 2015

TL;DR: This work enhances one of the best structure learner, LearnSPN, aiming to improve both the structural quality of the learned networks and their achieved likelihoods, and proves its claims by empirically evaluating the learned SPNs on several benchmark datasets against other competitive SPN and PGM structure learners.

...read moreread less

Abstract: The need for feasible inference in Probabilistic Graphical Models PGMs has lead to tractable models like Sum-Product Networks SPNs. Their highly expressive power and their ability to provide exact and tractable inference make them very attractive for several real world applications, from computer vision to NLP. Recently, great attention around SPNs has focused on structure learning, leading to different algorithms being able to learn both the network and its parameters from data. Here, we enhance one of the best structure learner, LearnSPN, aiming to improve both the structural quality of the learned networks and their achieved likelihoods. Our algorithmic variations are able to learn simpler, deeper and more robust networks. These results have been obtained by exploiting some insights in the building process done by LearnSPN, by hybridizing the network adopting tree-structured models as leaves, and by blending bagging estimations into mixture creation. We prove our claims by empirically evaluating the learned SPNs on several benchmark datasets against other competitive SPN and PGM structure learners.

...read moreread less

Collapse