Showing papers by "Yee Whye Teh published in 2013"

PDF

Open Access

Proceedings Article•

Stochastic Gradient Riemannian Langevin Dynamics on the Probability Simplex

[...]

Sam Patterson¹, Yee Whye Teh²•Institutions (2)

University College London¹, University of Oxford²

05 Dec 2013

TL;DR: A new method, Stochastic gradient Riemannian Langevin dynamics, which is simple to implement and can be applied to large scale data is proposed and achieves substantial performance improvements over the state of the art online variational Bayesian methods.

...read moreread less

Abstract: In this paper we investigate the use of Langevin Monte Carlo methods on the probability simplex and propose a new method, Stochastic gradient Riemannian Langevin dynamics, which is simple to implement and can be applied to large scale data. We apply this method to latent Dirichlet allocation in an online mini-batch setting, and demonstrate that it achieves substantial performance improvements over the state of the art online variational Bayesian methods.

...read moreread less

245 citations

Journal Article•

Fast MCMC sampling for Markov jump processes and extensions

[...]

Vinayak Rao¹, Yee Whye Teh²•Institutions (2)

Duke University¹, University of Oxford²

01 Jan 2013-Journal of Machine Learning Research

TL;DR: This paper's approach is an auxiliary variable Gibbs sampler, and is based on the idea of uniformization, that extends naturally to MJP-based models like Markov-modulated Poisson processes and continuous-time Bayesian networks.

...read moreread less

Abstract: Markov jump processes (or continuous-time Markov chains) are a simple and important class of continuous-time dynamical systems. In this paper, we tackle the problem of simulating from the posterior distribution over paths in these models, given partial and noisy observations. Our approach is an auxiliary variable Gibbs sampler, and is based on the idea of uniformization. This sets up a Markov chain over paths by alternately sampling a finite set of virtual jump times given the current path, and then sampling a new path given the set of extant and virtual jump times. The first step involves simulating a piecewise-constant inhomogeneous Poisson process, while for the second, we use a standard hidden Markov model forward filtering-backward sampling algorithm. Our method is exact and does not involve approximations like time-discretization. We demonstrate how our sampler extends naturally to MJP-based models like Markov-modulated Poisson processes and continuous-time Bayesian networks, and show significant computational benefits over state-of-the-art MCMC samplers for these models.

...read moreread less

117 citations

Journal Article•DOI•

MCMC for Normalized Random Measure Mixture Models

[...]

Stefano Favaro¹, Yee Whye Teh²•Institutions (2)

University of Turin¹, University of Oxford²

02 Oct 2013-arXiv: Methodology

TL;DR: This paper proposes novel Markov chain Monte Carlo methods for posterior sampling in Bayesian nonparametric mixture models with normalized random measure priors, and describes comparative simulation results demonstrating the efficacies of the proposed methods.

...read moreread less

Abstract: This paper concerns the use of Markov chain Monte Carlo methods for posterior sampling in Bayesian nonparametric mixture models with normalized random measure priors. Making use of some recent posterior characterizations for the class of normalized random measures, we propose novel Markov chain Monte Carlo methods of both marginal type and conditional type. The proposed marginal samplers are generalizations of Neal's well-regarded Algorithm 8 for Dirichlet process mixture models, whereas the conditional sampler is a variation of those recently introduced in the literature. For both the marginal and conditional methods, we consider as a running example a mixture model with an underlying normalized generalized Gamma process prior, and describe comparative simulation results demonstrating the efficacies of the proposed methods.

...read moreread less

98 citations

Journal Article•DOI•

MCMC for Normalized Random Measure Mixture Models

[...]

Stefano Favaro¹, Yee Whye Teh²•Institutions (2)

University of Turin¹, University of Oxford²

01 Aug 2013-Statistical Science

TL;DR: In this paper, the use of Markov chain Monte Carlo (MCMC) methods for posterior sampling in Bayesian nonparametric mixture models with normalized random measure priors is discussed.

...read moreread less

Abstract: This paper concerns the use of Markov chain Monte Carlo methods for posterior sampling in Bayesian nonparametric mixture models with normalized random measure priors. Making use of some recent posterior characterizations for the class of normalized random measures, we propose novel Markov chain Monte Carlo methods of both marginal type and conditional type. The proposed marginal samplers are generalizations of Neal’s well-regarded Algorithm 8 for Dirichlet process mixture models, whereas the conditional sampler is a variation of those recently introduced in the literature. For both the marginal and conditional methods, we consider as a running example a mixture model with an underlying normalized generalized Gamma process prior, and describe comparative simulation results demonstrating the efficacies of the proposed methods.

...read moreread less

81 citations

Proceedings Article•DOI•

Dependent Normalized Random Measures

[...]

Changyou Chen¹, Vinayak Rao², Wray Buntine¹, Yee Whye Teh³•Institutions (3)

Australian National University¹, Duke University², University of Oxford³

16 Jun 2013

TL;DR: It is shown that both MNRM and TNRM are marginally normalized random measures, resulting in well understood theoretical properties and in time-varying topic modeling experiments, both models exhibit superior performance over related dependent models.

...read moreread less

Abstract: In this paper we propose two constructions of dependent normalized random measures, a class of nonparametric priors over dependent probability measures. Our constructions, which we call mixed normalized random measures (MNRM) and thinned normalized random measures (TNRM), involve (respectively) weighting and thinning parts of a shared underlying Poisson process before combining them together. We show that both MNRM and TNRM are marginally normalized random measures, resulting in well understood theoretical properties. We develop marginal and slice samplers for both models, the latter necessary for inference in TNRM. In time-varying topic modeling experiments, both models exhibit superior performance over related dependent models such as the hierarchical Dirichlet process and the spatial normalized Gamma process.

...read moreread less

40 citations

Proceedings Article•

Top-down particle filtering for Bayesian decision trees

[...]

Balaji Lakshminarayanan¹, Daniel M. Roy², Yee Whye Teh³•Institutions (3)

University College London¹, University of Cambridge², University of Oxford³

16 Jun 2013

TL;DR: This work presents a sequential Monte Carlo (SMC) algorithm that works in a top-down manner, mimicking the behavior and speed of classic algorithms, and demonstrates empirically that this approach delivers accuracy comparable to the most popular MCMC method, but operates more than an order of magnitude faster, and thus represents a better computation-accuracy tradeoff.

...read moreread less

Abstract: Decision tree learning is a popular approach for classification and regression in machine learning and statistics, and Bayesian formulations--which introduce a prior distribution over decision trees, and formulate learning as posterior inference given data-- have been shown to produce competitive performance. Unlike classic decision tree learning algorithms like ID3, C4.5 and CART, which work in a top-down manner, existing Bayesian algorithms produce an approximation to the posterior distribution by evolving a complete tree (or collection thereof) iteratively via local Monte Carlo modifications to the structure of the tree, e.g., using Markov chain Monte Carlo (MCMC). We present a sequential Monte Carlo (SMC) algorithm that instead works in a top-down manner, mimicking the behavior and speed of classic algorithms. We demonstrate empirically that our approach delivers accuracy comparable to the most popular MCMC method, but operates more than an order of magnitude faster, and thus represents a better computation-accuracy tradeoff.

...read moreread less

28 citations

Proceedings Article•

Bayesian Hierarchical Community Discovery

[...]

Charles Blundell¹, Yee Whye Teh²•Institutions (2)

University College London¹, University of Oxford²

05 Dec 2013

TL;DR: This work describes a family of greedy agglomerative model selection algorithms that take just one pass through the data to learn a fully probabilistic, hierarchical community model.

...read moreread less

Abstract: We propose an efficient Bayesian nonparametric model for discovering hierarchical community structure in social networks. Our model is a tree-structured mixture of potentially exponentially many stochastic blockmodels. We describe a family of greedy agglomerative model selection algorithms that take just one pass through the data to learn a fully probabilistic, hierarchical community model. In the worst case, Our algorithms scale quadratically in the number of vertices of the network, but independent of the number of nested communities. In practice, the run time of our algorithms are two orders of magnitude faster than the Infinite Relational Model, achieving comparable or better accuracy.

...read moreread less

24 citations

Posted Content•

Inferring ground truth from multi-annotator ordinal data: a probabilistic approach

[...]

Balaji Lakshminarayanan, Yee Whye Teh

30 Apr 2013-arXiv: Machine Learning

TL;DR: A new model for crowdsourced ordinal data is proposed that accounts for instance difficulty as well as annotator expertise, and a variational Bayesian inference algorithm for parameter estimation is derived.

...read moreread less

Abstract: A popular approach for large scale data annotation tasks is crowdsourcing, wherein each data point is labeled by multiple noisy annotators. We consider the problem of inferring ground truth from noisy ordinal labels obtained from multiple annotators of varying and unknown expertise levels. Annotation models for ordinal data have been proposed mostly as extensions of their binary/categorical counterparts and have received little attention in the crowdsourcing literature. We propose a new model for crowdsourced ordinal data that accounts for instance difficulty as well as annotator expertise, and derive a variational Bayesian inference algorithm for parameter estimation. We analyze the ordinal extensions of several state-of-the-art annotator models for binary/categorical labels and evaluate the performance of all the models on two real world datasets containing ordinal query-URL relevance scores, collected through Amazon's Mechanical Turk. Our results indicate that the proposed model performs better or as well as existing state-of-the-art methods and is more resistant to `spammy' annotators (i.e., annotators who assign labels randomly without actually looking at the instance) than popular baselines such as mean, median, and majority vote which do not account for annotator expertise.

...read moreread less

21 citations

Proceedings Article•

Learning with Invariance via Linear Functionals on Reproducing Kernel Hilbert Space

[...]

Xinhua Zhang, Wee Sun Lee¹, Yee Whye Teh²•Institutions (2)

National University of Singapore¹, University of Oxford²

05 Dec 2013

TL;DR: In this article, the authors propose a framework for learning in reproducing kernel Hilbert spaces (RKHS) using local invariances that explicitly characterize the behavior of the target function around data instances.

...read moreread less

Abstract: Incorporating invariance information is important for many learning problems. To exploit invariances, most existing methods resort to approximations that either lead to expensive optimization problems such as semi-definite programming, or rely on separation oracles to retain tractability. Some methods further limit the space of functions and settle for non-convex models. In this paper, we propose a framework for learning in reproducing kernel Hilbert spaces (RKHS) using local invariances that explicitly characterize the behavior of the target function around data instances. These invariances are compactly encoded as linear functionals whose value are penalized by some loss function. Based on a representer theorem that we establish, our formulation can be efficiently optimized via a convex program. For the representer theorem to hold, the linear functionals are required to be bounded in the RKHS, and we show that this is true for a variety of commonly used RKHS and invariances. Experiments on learning with unlabeled data and transform invariances show that the proposed method yields better or similar results compared with the state of the art.

...read moreread less

5 citations

Posted Content•

Belief Optimization for Binary Networks: A Stable Alternative to Loopy Belief Propagation

[...]

Max Welling¹, Yee Whye Teh²•Institutions (2)

University College London¹, University of Toronto²

10 Jan 2013-arXiv: Artificial Intelligence

TL;DR: In this article, the authors present an inference algorithm for arbitrary, binary, undirected graphs that directly descend on the Bethe free energy, which is guaranteed to converge to a local minimum.

...read moreread less

Abstract: We present a novel inference algorithm for arbitrary, binary, undirected graphs. Unlike loopy belief propagation, which iterates fixed point equations, we directly descend on the Bethe free energy. The algorithm consists of two phases, first we update the pairwise probabilities, given the marginal probabilities at each unit,using an analytic expression. Next, we update the marginal probabilities, given the pairwise probabilities by following the negative gradient of the Bethe free energy. Both steps are guaranteed to decrease the Bethe free energy, and since it is lower bounded, the algorithm is guaranteed to converge to a local minimum. We also show that the Bethe free energy is equal to the TAP free energy up to second order in the weights. In experiments we confirm that when belief propagation converges it usually finds identical solutions as our belief optimization method. However, in cases where belief propagation fails to converge, belief optimization continues to converge to reasonable beliefs. The stable nature of belief optimization makes it ideally suited for learning graphical models from data.

...read moreread less

5 citations

Posted Content•

Top-down particle filtering for Bayesian decision trees

[...]

Balaji Lakshminarayanan¹, Daniel M. Roy², Yee Whye Teh³•Institutions (3)

University College London¹, University of Cambridge², University of Oxford³

03 Mar 2013-arXiv: Machine Learning

TL;DR: In this article, a sequential Monte Carlo (SMC) algorithm is proposed for decision tree learning, which works in a top-down manner, mimicking the behavior and speed of classic algorithms.

...read moreread less

Abstract: Decision tree learning is a popular approach for classification and regression in machine learning and statistics, and Bayesian formulations---which introduce a prior distribution over decision trees, and formulate learning as posterior inference given data---have been shown to produce competitive performance. Unlike classic decision tree learning algorithms like ID3, C4.5 and CART, which work in a top-down manner, existing Bayesian algorithms produce an approximation to the posterior distribution by evolving a complete tree (or collection thereof) iteratively via local Monte Carlo modifications to the structure of the tree, e.g., using Markov chain Monte Carlo (MCMC). We present a sequential Monte Carlo (SMC) algorithm that instead works in a top-down manner, mimicking the behavior and speed of classic algorithms. We demonstrate empirically that our approach delivers accuracy comparable to the most popular MCMC method, but operates more than an order of magnitude faster, and thus represents a better computation-accuracy tradeoff.

...read moreread less

Constructing dependent random probability measures from completely random measures

[...]

Changyou Chen, Vinayak Rao, Wray Buntine, Yee Whye Teh

01 Jan 2013

Posted Content•

Discovering Multiple Constraints that are Frequently Approximately Satisfied

[...]

Geoffrey E. Hinton¹, Yee Whye Teh²•Institutions (2)

University College London¹, University of Toronto²

10 Jan 2013-arXiv: Learning

TL;DR: In this paper, three methods of learning products of linear constraints using a heavy-tailed probability distribution for the violations are described, each of which is Frequently Approximately Satisfied (FAS) by the data.

...read moreread less

Abstract: Some high-dimensional data.sets can be modelled by assuming that there are many different linear constraints, each of which is Frequently Approximately Satisfied (FAS) by the data. The probability of a data vector under the model is then proportional to the product of the probabilities of its constraint violations. We describe three methods of learning products of constraints using a heavy-tailed probability distribution for the violations.

...read moreread less