Journal Article

Fixed-form variational posterior approximation through stochastic linear regression

01 Dec 2013-Bayesian Analysis (International Society for Bayesian Analysis)-Vol. 8, Iss: 4, pp 837-882
TL;DR: A general algorithm for approximating nonstandard Bayesian posterior distributions that minimizes the Kullback-Leibler divergence of an approximating distribution to the intractable posterior distribution.
Abstract: We propose a general algorithm for approximating nonstandard Bayesian posterior distributions. The algorithm minimizes the Kullback-Leibler divergence of an approximating distribution to the intractable posterior distribution. Our method can be used to approximate any posterior distribution, provided that it is given in closed form up to the proportionality constant. The approximation can be any distribution in the exponential family or any mixture of such distributions, which means that it can be made arbitrarily precise. Several examples illustrate the speed and accuracy of our approximation method in practice.
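As an illustration of the objective described in the abstract, here is a minimal sketch of fitting a fixed-form Gaussian approximation by stochastic minimization of the Kullback-Leibler divergence to a toy unnormalized posterior. It is not the paper's stochastic linear-regression estimator; the target density log_p_tilde, the reparameterized gradient estimator, and all step sizes are assumptions made up for the example.

```python
import numpy as np

# Toy unnormalized log-posterior, known only up to a constant
# (an assumption for this sketch, not a model from the paper).
def log_p_tilde(x):
    return -0.5 * (x - 1.0) ** 2 - 0.1 * x ** 4

def dlog_p_tilde(x):
    return -(x - 1.0) - 0.4 * x ** 3

# Fit q(x) = N(mu, sigma^2) by stochastic minimization of
# KL[q || p] = E_q[log q(x) - log p_tilde(x)] + const,
# using reparameterized draws x = mu + sigma * eps.
rng = np.random.default_rng(0)
mu, log_sigma = 0.0, 0.0
lr, n_samples = 0.02, 64

for step in range(3000):
    sigma = np.exp(log_sigma)
    eps = rng.standard_normal(n_samples)
    x = mu + sigma * eps
    g = dlog_p_tilde(x)
    # gradient of E_q[-log p_tilde]; the Gaussian entropy term is handled analytically
    grad_mu = -np.mean(g)
    grad_log_sigma = -np.mean(g * eps) * sigma - 1.0  # the -1 is d(-entropy)/d(log sigma)
    mu -= lr * grad_mu
    log_sigma -= lr * grad_log_sigma

print(f"fitted approximation: mu = {mu:.3f}, sigma = {np.exp(log_sigma):.3f}")
```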
Citations
Proceedings Article
01 Jan 2014
TL;DR: A stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case is introduced.
Abstract: How can we perform efficient inference and learning in directed probabilistic models, in the presence of continuous latent variables with intractable posterior distributions, and large datasets? We introduce a stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case. Our contributions are two-fold. First, we show that a reparameterization of the variational lower bound yields a lower bound estimator that can be straightforwardly optimized using standard stochastic gradient methods. Second, we show that for i.i.d. datasets with continuous latent variables per datapoint, posterior inference can be made especially efficient by fitting an approximate inference model (also called a recognition model) to the intractable posterior using the proposed lower bound estimator. Theoretical advantages are reflected in experimental results.
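The abstract above hinges on the reparameterization of the lower bound: writing z = mu + sigma * eps with eps drawn independently of the variational parameters makes the Monte Carlo estimate of the bound differentiable in those parameters. Below is a minimal sketch for a one-dimensional toy model (z ~ N(0, 1), x | z ~ N(z, 1)); the model, sample sizes, and finite-difference check are illustrative assumptions, not the paper's setup.

```python
import numpy as np

x_obs = 2.0  # single observation in a toy model: z ~ N(0, 1), x | z ~ N(z, 1)

def elbo_estimate(mu, log_sigma, n_samples=2000, seed=0):
    # Reparameterized Monte Carlo estimate of the variational lower bound for
    # q(z) = N(mu, sigma^2). Because z = mu + sigma * eps with eps drawn
    # independently of (mu, sigma), the estimate is a smooth function of the
    # variational parameters and can be optimized with standard SGD.
    eps = np.random.default_rng(seed).standard_normal(n_samples)
    sigma = np.exp(log_sigma)
    z = mu + sigma * eps
    log_lik = -0.5 * (x_obs - z) ** 2 - 0.5 * np.log(2 * np.pi)
    log_prior = -0.5 * z ** 2 - 0.5 * np.log(2 * np.pi)
    log_q = -0.5 * eps ** 2 - 0.5 * np.log(2 * np.pi) - log_sigma
    return np.mean(log_lik + log_prior - log_q)

# Finite-difference check that the estimator responds smoothly to mu;
# in practice the gradient would come from automatic differentiation.
h = 1e-4
grad_mu = (elbo_estimate(1.0 + h, 0.0) - elbo_estimate(1.0 - h, 0.0)) / (2 * h)
print("ELBO at mu = 1:", elbo_estimate(1.0, 0.0))
print("d ELBO / d mu :", grad_mu)  # close to 0: the true posterior mean is 1
```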

20,769 citations

Christopher M. Bishop
01 Jan 2006
TL;DR: Probability distributions and linear models for regression and classification are given in this article, along with a discussion of combining models in the context of machine learning and classification.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

Journal Article
TL;DR: This work provides an introduction to variational autoencoders and some important extensions, which provide a principled framework for learning deep latent-variable models and corresponding inference models.
Abstract: Variational autoencoders provide a principled framework for learning deep latent-variable models and corresponding inference models. In this work, we provide an introduction to variational autoencoders and some important extensions.

1,089 citations


Cites methods from "Fixed-form variational posterior ap..."

  • ...In (Salimans and Knowles, 2013), a similar reparameterization as in this work was used in an efficient version of a stochastic variational inference algorithm for learning the natural parameters of exponential-family approximating distributions....

    [...]

Journal Article
TL;DR: The methodology proposed automatically adapts to the local structure when simulating paths across this manifold, providing highly efficient convergence and exploration of the target density, and substantial improvements in the time‐normalized effective sample size are reported when compared with alternative sampling approaches.
Abstract: The paper proposes Metropolis adjusted Langevin and Hamiltonian Monte Carlo sampling methods defined on the Riemann manifold to resolve the shortcomings of existing Monte Carlo algorithms when sampling from target densities that may be high dimensional and exhibit strong correlations. The methods provide fully automated adaptation mechanisms that circumvent the costly pilot runs that are required to tune proposal densities for Metropolis-Hastings or indeed Hamiltonian Monte Carlo and Metropolis adjusted Langevin algorithms. This allows for highly efficient sampling even in very high dimensions where different scalings may be required for the transient and stationary phases of the Markov chain. The methodology proposed exploits the Riemann geometry of the parameter space of statistical models and thus automatically adapts to the local structure when simulating paths across this manifold, providing highly efficient convergence and exploration of the target density. The performance of these Riemann manifold Monte Carlo methods is rigorously assessed by performing inference on logistic regression models, log-Gaussian Cox point processes, stochastic volatility models and Bayesian estimation of dynamic systems described by non-linear differential equations. Substantial improvements in the time-normalized effective sample size are reported when compared with alternative sampling approaches. MATLAB code that is available from http://www.ucl.ac.uk/statistics/research/rmhmc allows replication of all the results reported.
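As a much-simplified illustration of the Langevin side of this family of samplers, here is one Metropolis-adjusted Langevin (MALA) step with a constant identity metric, rather than the position-dependent Riemann manifold metric the paper constructs; the correlated-Gaussian target and step size are assumptions for the example.

```python
import numpy as np

def mala_step(x, log_pi, grad_log_pi, eps, rng):
    # One Metropolis-adjusted Langevin step with a constant identity metric --
    # a simplified stand-in for the position-dependent metric of Riemann
    # manifold MALA/HMC.
    mean_fwd = x + 0.5 * eps ** 2 * grad_log_pi(x)
    prop = mean_fwd + eps * rng.standard_normal(x.shape)
    mean_bwd = prop + 0.5 * eps ** 2 * grad_log_pi(prop)
    # forward and backward proposal log-densities, up to the common normalizer
    log_q_fwd = -np.sum((prop - mean_fwd) ** 2) / (2 * eps ** 2)
    log_q_bwd = -np.sum((x - mean_bwd) ** 2) / (2 * eps ** 2)
    log_alpha = log_pi(prop) - log_pi(x) + log_q_bwd - log_q_fwd
    if np.log(rng.uniform()) < log_alpha:
        return prop, True
    return x, False

# Toy target: a strongly correlated 2-D Gaussian (illustration only).
cov = np.array([[1.0, 0.9], [0.9, 1.0]])
prec = np.linalg.inv(cov)
log_pi = lambda x: -0.5 * x @ prec @ x
grad_log_pi = lambda x: -prec @ x

rng = np.random.default_rng(2)
x, accepts = np.zeros(2), 0
for _ in range(5000):
    x, accepted = mala_step(x, log_pi, grad_log_pi, eps=0.5, rng=rng)
    accepts += accepted
print("acceptance rate:", accepts / 5000)
```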

1,031 citations

Posted Content
TL;DR: The authors proposed a black box variational inference algorithm based on a stochastic optimization of the variational objective, where the noisy gradient is computed from Monte Carlo samples from the variational distribution, which can be applied to many models with little additional derivation.
Abstract: Variational inference has become a widely used method to approximate posteriors in complex latent variables models. However, deriving a variational inference algorithm generally requires significant model-specific analysis, and these efforts can hinder and deter us from quickly developing and exploring a variety of models for a problem at hand. In this paper, we present a "black box" variational inference algorithm, one that can be quickly applied to many models with little additional derivation. Our method is based on a stochastic optimization of the variational objective where the noisy gradient is computed from Monte Carlo samples from the variational distribution. We develop a number of methods to reduce the variance of the gradient, always maintaining the criterion that we want to avoid difficult model-based derivations. We evaluate our method against the corresponding black box sampling based methods. We find that our method reaches better predictive likelihoods much faster than sampling methods. Finally, we demonstrate that Black Box Variational Inference lets us easily explore a wide space of models by quickly constructing and evaluating several models of longitudinal healthcare data.
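The abstract describes estimating the gradient of the variational objective from Monte Carlo samples drawn from the variational distribution itself. Below is a minimal sketch of such a score-function ("REINFORCE") estimator with a simple constant baseline for variance reduction; the toy log-joint, learning rate, and sample size are illustrative assumptions, not the algorithm as published.

```python
import numpy as np

# Toy unnormalized log joint for a single latent z (made up for this sketch);
# the implied posterior is N(2, 0.5).
def log_joint(z):
    return -((z - 2.0) ** 2)

rng = np.random.default_rng(3)
mu, log_sigma = 0.0, 0.0
lr, n_samples = 0.02, 200

for step in range(3000):
    sigma = np.exp(log_sigma)
    z = mu + sigma * rng.standard_normal(n_samples)  # samples from q(z; mu, sigma)
    log_q = -0.5 * ((z - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)
    # score of q with respect to its own variational parameters
    score_mu = (z - mu) / sigma ** 2
    score_log_sigma = ((z - mu) / sigma) ** 2 - 1.0
    f = log_joint(z) - log_q          # noisy ELBO integrand
    f = f - f.mean()                  # constant baseline: unbiased variance reduction
    mu += lr * np.mean(score_mu * f)             # score-function gradient ascent
    log_sigma += lr * np.mean(score_log_sigma * f)

print(f"mu = {mu:.2f}, sigma = {np.exp(log_sigma):.2f}  (target: 2.00, 0.71)")
```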

582 citations

References
Book
Christopher M. Bishop
17 Aug 2006
TL;DR: Probability Distributions, Linear Models for Regression, Linear Models for Classification, Neural Networks, Graphical Models, Mixture Models and EM, Sampling Methods, Continuous Latent Variables, and Sequential Data are studied.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

22,840 citations

Christopher M. Bishop
01 Jan 2006
TL;DR: Probability distributions and linear models for regression and classification are given in this article, along with a discussion of combining models in the context of machine learning and classification.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

Journal Article
TL;DR: In this article, a method for making successive experiments at levels x1, x2, ··· in such a way that xn will tend to θ in probability is presented.
Abstract: Let M(x) denote the expected value at level x of the response to a certain experiment. M(x) is assumed to be a monotone function of x but is unknown to the experimenter, and it is desired to find the solution x = θ of the equation M(x) = α, where α is a given constant. We give a method for making successive experiments at levels x1, x2, ··· in such a way that xn will tend to θ in probability.
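A minimal sketch of the iteration described in this abstract: levels are adjusted against the noisy observed excess response with step sizes a_n = 1/n. The monotone response function M(x) = 2x + 1 and target level α = 5 are made up for the example (the true root is θ = 2).

```python
import numpy as np

rng = np.random.default_rng(4)
M = lambda x: 2.0 * x + 1.0   # monotone regression function, unknown to the algorithm
alpha = 5.0                   # desired response level; M(theta) = alpha at theta = 2
x = 0.0

for n in range(1, 20001):
    y = M(x) + rng.standard_normal()   # noisy response observed at level x_n
    a_n = 1.0 / n                      # steps with sum a_n = inf, sum a_n^2 < inf
    x = x - a_n * (y - alpha)          # move against the observed excess response

print("estimate of theta:", x)         # close to 2.0
```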

9,312 citations


"Fixed-form variational posterior ap..." refers background or methods in this paper

  • ...if t > N/2 then: set g = g + ĝ_t; set C = C + Ĉ_t; end if; end for; return η̂ = C⁻¹g. Algorithm 1 is inspired by a long line of research on stochastic approximation, starting with the seminal work of Robbins and Monro (1951). Up to first order it can be considered a relatively standard stochastic gradient descent algorithm. At each iteration, we have η_t = C_t⁻¹ g_t, where we use the subscript t to indicate the values of η, C a...

    [...]

  • ...proximation is given in Appendix D. Contrary to most applications in the literature, Algorithm 1 uses a fixed step size w = 1/√N rather than a declining one in updating our statistics. The analyses of Robbins and Monro (1951) and Amari (1997) show that a sequence of learning rates w_t = c t⁻¹ is asymptotically efficient in stochastic gradient descent as the number of iterations N goes to infinity, but this conclusion rests on ...

    [...]
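The two excerpts above outline the structure of the paper's Algorithm 1: running statistics g and C are smoothed with a fixed step size w = 1/√N, only iterations t > N/2 are accumulated, and the output is η̂ = C⁻¹g. The following is a structural sketch of that averaging scheme only; the synthetic noisy estimates ĝ_t and Ĉ_t stand in for the paper's regression-based Monte Carlo estimates and are not its actual updates.

```python
import numpy as np

rng = np.random.default_rng(5)
d, N = 3, 10000
C_true = np.array([[2.0, 0.3, 0.0],
                   [0.3, 1.5, 0.2],
                   [0.0, 0.2, 1.0]])
g_true = np.array([1.0, -0.5, 0.25])

w = 1.0 / np.sqrt(N)                   # fixed step size, as in the excerpt
g_run, C_run = np.zeros(d), np.eye(d)  # smoothed running statistics
g_bar, C_bar = np.zeros(d), np.zeros((d, d))

for t in range(1, N + 1):
    # synthetic noisy estimates standing in for per-iteration Monte Carlo estimates
    g_hat = g_true + 0.5 * rng.standard_normal(d)
    C_hat = C_true + 0.1 * rng.standard_normal((d, d))
    g_run = (1 - w) * g_run + w * g_hat
    C_run = (1 - w) * C_run + w * C_hat
    if t > N // 2:                     # accumulate only over the second half
        g_bar += g_run
        C_bar += C_run

eta_hat = np.linalg.solve(C_bar, g_bar)            # eta_hat = C^{-1} g
print("eta_hat :", eta_hat)
print("C^{-1} g:", np.linalg.solve(C_true, g_true))
```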

Book Chapter
01 Jan 2010
TL;DR: A more precise analysis uncovers qualitatively different tradeoffs for the case of small-scale and large-scale learning problems.
Abstract: During the last decade, data sizes have grown faster than the speed of processors. In this context, the capabilities of statistical machine learning methods are limited by the computing time rather than the sample size. A more precise analysis uncovers qualitatively different tradeoffs for the case of small-scale and large-scale learning problems. The large-scale case involves the computational complexity of the underlying optimization algorithm in non-trivial ways. Unlikely optimization algorithms such as stochastic gradient descent show amazing performance for large-scale problems. In particular, second order stochastic gradient and averaged stochastic gradient are asymptotically efficient after a single pass on the training set.
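The claim that averaged stochastic gradient is asymptotically efficient after a single pass can be illustrated with Polyak-Ruppert-style iterate averaging on a toy least-squares problem; the data, learning rate, and problem size below are assumptions for the example, not results from the chapter.

```python
import numpy as np

rng = np.random.default_rng(6)
n, d = 20000, 5
w_true = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = X @ w_true + 0.5 * rng.standard_normal(n)

w = np.zeros(d)       # plain SGD iterate
w_avg = np.zeros(d)   # running average of the iterates (Polyak-Ruppert averaging)

for t in range(n):                     # a single pass over the training set
    x_t, y_t = X[t], y[t]
    grad = (x_t @ w - y_t) * x_t       # stochastic gradient of the squared error
    w -= 0.01 * grad
    w_avg += (w - w_avg) / (t + 1)

print("last-iterate error    :", np.linalg.norm(w - w_true))
print("averaged-iterate error:", np.linalg.norm(w_avg - w_true))
```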

5,561 citations


Additional excerpts

  • ...Bottou (2010)....

    [...]

Book
16 Dec 2008
TL;DR: The variational approach provides a complementary alternative to Markov chain Monte Carlo as a general source of approximation methods for inference in large-scale statistical models.
Abstract: The formalism of probabilistic graphical models provides a unifying framework for capturing complex dependencies among random variables, and building large-scale multivariate statistical models. Graphical models have become a focus of research in many statistical, computational and mathematical fields, including bioinformatics, communication theory, statistical physics, combinatorial optimization, signal and image processing, information retrieval and statistical machine learning. Many problems that arise in specific instances — including the key problems of computing marginals and modes of probability distributions — are best studied in the general setting. Working with exponential family representations, and exploiting the conjugate duality between the cumulant function and the entropy for exponential families, we develop general variational representations of the problems of computing likelihoods, marginal probabilities and most probable configurations. We describe how a wide variety of algorithms — among them sum-product, cluster variational methods, expectation-propagation, mean field methods, max-product and linear programming relaxation, as well as conic programming relaxations — can all be understood in terms of exact or approximate forms of these variational representations. The variational approach provides a complementary alternative to Markov chain Monte Carlo as a general source of approximation methods for inference in large-scale statistical models.
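As a small concrete instance of the variational framework surveyed in this abstract, the following is a standard mean-field (fully factorized) fixed-point iteration for a correlated bivariate Gaussian target; the particular mean and precision matrix are made up, and the example is not taken from the monograph.

```python
import numpy as np

# Target: p(x) = N(m, Lambda^{-1}) with correlated components.
m = np.array([1.0, -1.0])
Lam = np.array([[2.0, 1.2],
                [1.2, 2.0]])   # precision matrix

mu = np.zeros(2)               # means of the factorized approximation q(x1) q(x2)
for _ in range(50):
    for i in range(2):
        j = 1 - i
        # standard coordinate-wise mean-field update for a Gaussian target
        mu[i] = m[i] - (Lam[i, j] / Lam[i, i]) * (mu[j] - m[j])

print("mean-field means    :", mu)                 # converge to the true means (1, -1)
print("mean-field variances:", 1 / np.diag(Lam))   # 0.5 each, underestimating the truth
print("true marginal vars  :", np.diag(np.linalg.inv(Lam)))
```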

4,335 citations


"Fixed-form variational posterior ap..." refers background or methods in this paper

  • ...distribution qη(x) of more convenient form to the intractable target distribution (Attias, 2000; Beal and Ghahramani, 2006; Jordan et al., 1999; Wainwright and Jordan, 2008): q̂ = arg min_{q(x)} D[q|p] = arg min_{q(x)} E_{q(x)}[ log( q(x) / p(x, y) ) ], (1) where p(x, y) denotes the unnormalized…

    [...]

  • ...A popular method of obtaining such an approximation is structured or fixed-form Variational Bayes, which works by numerically minimizing the Kullback-Leibler divergence of a parameterized approximating distribution qη(x) of more convenient form to the intractable target distribution (Attias, 2000; Beal and Ghahramani, 2006; Jordan et al., 1999; Wainwright and Jordan, 2008)....

    [...]