BookDOI

MCMC using Hamiltonian dynamics

TL;DR: In this chapter, the author discusses theoretical and practical aspects of Hamiltonian Monte Carlo and presents some of its variations, including using windows of states for deciding on acceptance or rejection, computing trajectories using fast approximations, tempering during the course of a trajectory to handle isolated modes, and short-cut methods that prevent useless trajectories from taking much computation time.
Abstract: Hamiltonian dynamics can be used to produce distant proposals for the Metropolis algorithm, thereby avoiding the slow exploration of the state space that results from the diffusive behaviour of simple random-walk proposals. Though originating in physics, Hamiltonian dynamics can be applied to most problems with continuous state spaces by simply introducing fictitious "momentum" variables. A key to its usefulness is that Hamiltonian dynamics preserves volume, and its trajectories can thus be used to define complex mappings without the need to account for a hard-to-compute Jacobian factor - a property that can be exactly maintained even when the dynamics is approximated by discretizing time. In this review, I discuss theoretical and practical aspects of Hamiltonian Monte Carlo, and present some of its variations, including using windows of states for deciding on acceptance or rejection, computing trajectories using fast approximations, tempering during the course of a trajectory to handle isolated modes, and short-cut methods that prevent useless trajectories from taking much computation time.
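
As a concrete illustration of the scheme the abstract describes (leapfrog discretization of the dynamics followed by a Metropolis accept/reject step), here is a minimal sketch in Python; it is not code from the chapter, and the names log_prob, grad_log_prob, the step size eps, and the number of leapfrog steps n_steps are assumptions chosen for the example.

```python
import numpy as np

def hmc_step(q, log_prob, grad_log_prob, eps=0.1, n_steps=20, rng=np.random):
    """One Hamiltonian Monte Carlo update (minimal sketch).

    q: current position (1-D numpy array)
    log_prob(q): log of the target density, so the potential is U(q) = -log_prob(q)
    grad_log_prob(q): gradient of log_prob(q)
    """
    p = rng.standard_normal(q.shape)            # fictitious momentum variables
    current_H = -log_prob(q) + 0.5 * p @ p      # H(q, p) = U(q) + K(p)

    q_new, p_new = q.copy(), p.copy()
    # Leapfrog integration: half step for momentum, alternating full steps.
    p_new += 0.5 * eps * grad_log_prob(q_new)
    for _ in range(n_steps - 1):
        q_new += eps * p_new
        p_new += eps * grad_log_prob(q_new)
    q_new += eps * p_new
    p_new += 0.5 * eps * grad_log_prob(q_new)

    proposed_H = -log_prob(q_new) + 0.5 * p_new @ p_new
    # Metropolis test corrects the discretization error; because the leapfrog
    # map preserves volume, no Jacobian factor appears in the acceptance ratio.
    if np.log(rng.uniform()) < current_H - proposed_H:
        return q_new
    return q

# Example: sample from a standard 2-D Gaussian (log density -0.5 * x @ x).
q, samples = np.zeros(2), []
for _ in range(1000):
    q = hmc_step(q, lambda x: -0.5 * x @ x, lambda x: -x)
    samples.append(q)
```
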
Citations
Book
24 Aug 2012
TL;DR: This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach, and is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.
Abstract: Today's Web-enabled deluge of electronic data calls for automated methods of data analysis. Machine learning provides these, developing methods that can automatically detect patterns in data and then use the uncovered patterns to predict future data. This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach. The coverage combines breadth and depth, offering necessary background material on such topics as probability, optimization, and linear algebra as well as discussion of recent developments in the field, including conditional random fields, L1 regularization, and deep learning. The book is written in an informal, accessible style, complete with pseudo-code for the most important algorithms. All topics are copiously illustrated with color images and worked examples drawn from such application domains as biology, text processing, computer vision, and robotics. Rather than providing a cookbook of different heuristic methods, the book stresses a principled model-based approach, often using the language of graphical models to specify models in a concise and intuitive way. Almost all the models described have been implemented in a MATLAB software package--PMTK (probabilistic modeling toolkit)--that is freely available online. The book is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.

8,059 citations

Journal ArticleDOI
TL;DR: The brms package implements Bayesian multilevel models in R using the probabilistic programming language Stan, allowing users to fit linear, robust linear, binomial, Poisson, survival, ordinal, zero-inflated, hurdle, and even non-linear models all in a multilevel context.
Abstract: The brms package implements Bayesian multilevel models in R using the probabilistic programming language Stan. A wide range of distributions and link functions are supported, allowing users to fit - among others - linear, robust linear, binomial, Poisson, survival, ordinal, zero-inflated, hurdle, and even non-linear models all in a multilevel context. Further modeling options include autocorrelation of the response variable, user defined covariance structures, censored data, as well as meta-analytic standard errors. Prior specifications are flexible and explicitly encourage users to apply prior distributions that actually reflect their beliefs. In addition, model fit can easily be assessed and compared with the Watanabe-Akaike information criterion and leave-one-out cross-validation.

4,353 citations


Cites background or methods from "MCMC using Hamiltonian dynamics"

  • ...Currently, these are the static Hamiltonian Monte-Carlo (HMC) Sampler sometimes also referred to as Hybrid Monte-Carlo (Neal 2011, 2003; Duane et al. 1987) and its extension the No-U-Turn Sampler (NUTS) by Hoffman and Gelman (2014)....

  • ...One of the main problems of these algorithms is their rather slow convergence for high-dimensional models with correlated parameters (Neal 2011; Hoffman and Gelman 2014; Gelman, Carlin, Stern, and Rubin 2014)....

  • ...In contrast, Stan implements Hamiltonian Monte Carlo (Duane, Kennedy, Pendleton, and Roweth 1987; Neal 2011) and its extension, the No-U-Turn Sampler (NUTS) (Hoffman and Gelman 2014)....

01 Jan 2017
TL;DR: Stan is a probabilistic programming language for specifying statistical models that provides full Bayesian inference for continuous-variable models through Markov chain Monte Carlo methods such as the No-U-Turn sampler, an adaptive form of Hamiltonian Monte Carlo sampling.
Abstract: Stan is a probabilistic programming language for specifying statistical models. A Stan program imperatively defines a log probability function over parameters conditioned on specified data and constants. As of version 2.14.0, Stan provides full Bayesian inference for continuous-variable models through Markov chain Monte Carlo methods such as the No-U-Turn sampler, an adaptive form of Hamiltonian Monte Carlo sampling. Penalized maximum likelihood estimates are calculated using optimization methods such as the limited memory Broyden-Fletcher-Goldfarb-Shanno algorithm. Stan is also a platform for computing log densities and their gradients and Hessians, which can be used in alternative algorithms such as variational Bayes, expectation propagation, and marginal inference using approximate integration. To this end, Stan is set up so that the densities, gradients, and Hessians, along with intermediate quantities of the algorithm such as acceptance probabilities, are easily accessible. Stan can be called from the command line using the cmdstan package, through R using the rstan package, and through Python using the pystan package. All three interfaces support sampling and optimization-based inference with diagnostics and posterior analysis. rstan and pystan also provide access to log probabilities, gradients, Hessians, parameter transforms, and specialized plotting.
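
As a rough sketch of the Python interface mentioned at the end of the abstract, the snippet below fits a trivial normal-mean model through pystan (the 2.x-style StanModel/sampling interface); the Stan program and data are invented for illustration and are not taken from the paper.

```python
import pystan

# A deliberately simple Stan program: estimate the mean and standard
# deviation of a handful of observations (illustrative only).
model_code = """
data {
  int<lower=1> N;
  vector[N] y;
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  y ~ normal(mu, sigma);
}
"""

model = pystan.StanModel(model_code=model_code)       # compile the model
fit = model.sampling(data={"N": 5, "y": [1.2, 0.7, -0.3, 2.1, 0.9]},
                     iter=2000, chains=4)             # NUTS is the default sampler
print(fit)                                            # posterior summary
```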

2,938 citations


Additional excerpts

  • ...An additional benefit of RHMC is transitions that can cover much larger variations in density, making it uniquely suited to these models; see (Neal 2011)....

Proceedings Article
28 Jun 2011
TL;DR: This paper proposes a new framework for learning from large-scale datasets based on iterative learning from small mini-batches: by adding the right amount of noise to a standard stochastic gradient optimization algorithm, the authors show that the iterates converge to samples from the true posterior distribution as the stepsize is annealed.
Abstract: In this paper we propose a new framework for learning from large scale datasets based on iterative learning from small mini-batches. By adding the right amount of noise to a standard stochastic gradient optimization algorithm we show that the iterates will converge to samples from the true posterior distribution as we anneal the stepsize. This seamless transition between optimization and Bayesian posterior sampling provides an inbuilt protection against overfitting. We also propose a practical method for Monte Carlo estimates of posterior statistics which monitors a "sampling threshold" and collects samples after it has been surpassed. We apply the method to three models: a mixture of Gaussians, logistic regression and ICA with natural gradients.
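
A minimal sketch of the update rule the abstract describes (a stochastic gradient step plus Gaussian noise whose variance matches the annealed stepsize); the function names and mini-batching details here are assumptions made for illustration, not the authors' code.

```python
import numpy as np

def sgld_epoch(theta, X, grad_log_prior, grad_log_lik, step_sizes,
               batch_size=32, rng=np.random):
    """One pass of stochastic gradient Langevin dynamics (minimal sketch).

    X: numpy array of data (one example per row)
    grad_log_prior(theta): gradient of log p(theta)
    grad_log_lik(theta, batch): summed gradient of log p(x | theta) over a mini-batch
    step_sizes: iterable of annealed stepsizes eps_t (e.g. a polynomial decay)
    """
    N = len(X)
    order = rng.permutation(N)
    for start, eps in zip(range(0, N, batch_size), step_sizes):
        batch = X[order[start:start + batch_size]]
        scale = N / len(batch)                       # rescale the mini-batch gradient
        grad = grad_log_prior(theta) + scale * grad_log_lik(theta, batch)
        noise = rng.standard_normal(theta.shape) * np.sqrt(eps)
        theta = theta + 0.5 * eps * grad + noise     # SGD step plus injected noise
    return theta
```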

2,080 citations


Cites methods from "MCMC using Hamiltonian dynamics"

  • ...In this paper we will consider a class of MCMC techniques called Langevin dynamics (Neal, 2010)....

  • ...More sophisticated techniques use Hamiltonian dynamics with momentum variables to allow parameters to move over larger distances without the inefficient random walk behaviour of Langevin dynamics (Neal, 2010)....

Journal ArticleDOI
28 May 2015-Nature
TL;DR: This Review provides an introduction to this framework, and discusses some of the state-of-the-art advances in the field, namely, probabilistic programming, Bayesian optimization, data compression and automatic model discovery.
Abstract: How can a machine learn from experience? Probabilistic modelling provides a framework for understanding what learning is, and has therefore emerged as one of the principal theoretical and practical approaches for designing machines that learn from data acquired through experience. The probabilistic framework, which describes how to represent and manipulate uncertainty about models and predictions, has a central role in scientific data analysis, machine learning, robotics, cognitive science and artificial intelligence. This Review provides an introduction to this framework, and discusses some of the state-of-the-art advances in the field, namely, probabilistic programming, Bayesian optimization, data compression and automatic model discovery.

1,457 citations


Cites methods from "MCMC using Hamiltonian dynamics"

  • ...However, this dichotomy is not as stark as it appears: many gradient-based optimisation methods can be turned into integration methods through the use of Langevin and Hamiltonian Monte Carlo methods [27, 28], while integration problems can be turned into optimisation problems through the use of variational approximations[24]....

References
Journal ArticleDOI
TL;DR: In this article, a modified Monte Carlo integration over configuration space is used to investigate the equation of state of a two-dimensional rigid-sphere system of interacting molecules, and the results are compared to the free volume equation of state and to a four-term virial coefficient expansion.
Abstract: A general method, suitable for fast computing machines, for investigating such properties as equations of state for substances consisting of interacting individual molecules is described. The method consists of a modified Monte Carlo integration over configuration space. Results for the two‐dimensional rigid‐sphere system have been obtained on the Los Alamos MANIAC and are presented here. These results are compared to the free volume equation of state and to a four‐term virial coefficient expansion.
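
In modern terms, the "modified Monte Carlo integration" described here is what is now called random-walk Metropolis sampling; the short Python sketch below illustrates that reading (the hard-sphere configuration-space details of the original are omitted, and log_prob and the proposal width are assumed names).

```python
import numpy as np

def metropolis_step(x, log_prob, step=0.5, rng=np.random):
    """One random-walk Metropolis update with a symmetric Gaussian proposal."""
    proposal = x + step * rng.standard_normal(x.shape)
    # Accept with probability min(1, p(proposal) / p(x)); otherwise stay put.
    if np.log(rng.uniform()) < log_prob(proposal) - log_prob(x):
        return proposal
    return x
```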

35,161 citations


"MCMC using Hamiltonian dynamics" refers background or methods in this paper

  • ...Chapter 1 MCMC using Hamiltonian dynamics Radford M. Neal, University of Toronto Hamiltonian dynamics can be used to produce distant proposals for the Metropolis algorithm, thereby avoiding the slow exploration of the state space that results from the diffusive behaviour of simple random-walk proposals....

  • ...Markov Chain Monte Carlo (MCMC) originated with the classic paper of Metropolis et al. (1953), where it was used to simulate the distribution of states for a system of idealized molecules....

Journal ArticleDOI
TL;DR: A generalization of the sampling method introduced by Metropolis et al. (1953) is presented, along with an exposition of the relevant theory, techniques of application, and methods and difficulties of assessing the error in Monte Carlo estimates.
Abstract: A generalization of the sampling method introduced by Metropolis et al. (1953) is presented along with an exposition of the relevant theory, techniques of application and methods and difficulties of assessing the error in Monte Carlo estimates. Examples of the methods, including the generation of random orthogonal matrices and potential applications of the methods to numerical problems arising in statistics, are discussed. For numerical problems in a large number of dimensions, Monte Carlo methods are often more efficient than conventional numerical methods. However, implementation of the Monte Carlo methods requires sampling from high dimensional probability distributions and this may be very difficult and expensive in analysis and computer time. General methods for sampling from, or estimating expectations with respect to, such distributions are as follows. (i) If possible, factorize the distribution into the product of one-dimensional conditional distributions from which samples may be obtained. (ii) Use importance sampling, which may also be used for variance reduction. That is, in order to evaluate the integral J = ∫ f(x) p(x) dx = E_p(f), where p(x) is a probability density function, instead of obtaining independent samples x_1, ..., x_N from p(x) and using the estimate Ĵ_1 = Σ f(x_i)/N, we instead obtain the sample from a distribution with density q(x) and use the estimate Ĵ_2 = Σ {f(x_i) p(x_i)}/{q(x_i) N}. This may be advantageous if it is easier to sample from q(x) than p(x), but it is a difficult method to use in a large number of dimensions, since the values of the weights w(x_i) = p(x_i)/q(x_i) for reasonable values of N may all be extremely small, or a few may be extremely large. In estimating the probability of an event A, however, these difficulties may not be as serious since the only values of w(x) which are important are those for which x ∈ A. Since the methods proposed by Trotter & Tukey (1956) for the estimation of conditional expectations require the use of importance sampling, the same difficulties may be encountered in their use. (iii) Use a simulation technique; that is, if it is difficult to sample directly from p(x) or if p(x) is unknown, sample from some distribution q(y) and obtain the sample x values as some function of the corresponding y values. If we want samples from the conditional distribution ...
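
The importance-sampling estimate Ĵ_2 from point (ii) can be illustrated with a small numerical sketch; the particular choice of target p, proposal q, and integrand f below is an assumption made for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
N = 100_000

# Estimate J = E_p[f(X)] with f(x) = x^2 and p = N(0, 1) (true value 1.0),
# sampling instead from the heavier-tailed proposal q = Student-t with 3 df.
x = stats.t.rvs(df=3, size=N, random_state=rng)
w = stats.norm.pdf(x) / stats.t.pdf(x, df=3)   # importance weights w(x_i) = p(x_i)/q(x_i)

J2 = np.mean(x**2 * w)                         # the estimate J_2 from the abstract
print(J2)                                      # close to 1.0
```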

14,965 citations


"MCMC using Hamiltonian dynamics" refers background or methods in this paper

  • ...not symmetrical, it must be accepted or rejected based on both the ratio of the probability densities of q∗ and q and on the ratio of the probability densities for proposing q from q∗ and vice versa (Hastings, 1970). To see the equivalence with HMC using one leapfrog step, we can write the Metropolis-Hastings acceptance probability as follows: min[1, (exp(−U(q∗))/exp(−U(q))) ∏_{i=1}^{d} exp(−(q_i − q∗_i + (ε²/2)[∂...

  • ...Since this proposal is not symmetrical, it must be accepted or rejected based on both the ratio of the probability densities of q∗ and q and on the ratio of the probability densities for proposing q from q∗ and vice versa (Hastings, 1970)....

Book
01 Jan 1974
TL;DR: This book covers Newtonian mechanics (experimental facts, investigation of the equations of motion), Lagrangian mechanics (variational principles, Lagrangian mechanics on manifolds, oscillations, rigid bodies), and Hamiltonian mechanics (differential forms, symplectic manifolds, canonical formalism, introduction to perturbation theory).
Abstract: Part 1, Newtonian mechanics: experimental facts; investigation of the equations of motion. Part 2, Lagrangian mechanics: variational principles; Lagrangian mechanics on manifolds; oscillations; rigid bodies. Part 3, Hamiltonian mechanics: differential forms; symplectic manifolds; canonical formalism; introduction to perturbation theory.

11,008 citations

Book
01 Jan 2001
TL;DR: This book explains the physics behind molecular simulation for materials science; the implementation of simulation methods is illustrated in pseudocode, and their practical use is shown in case studies in the text.
Abstract: From the Publisher: This book explains the physics behind the "recipes" of molecular simulation for materials science. Computer simulators are continuously confronted with questions concerning the choice of a particular technique for a given application. Since a wide variety of computational tools exists, the choice of technique requires a good understanding of the basic principles. More importantly, such understanding may greatly improve the efficiency of a simulation program. The implementation of simulation methods is illustrated in pseudocode, and their practical use is shown in the case studies in the text. Examples are included that highlight current applications, and the codes of the case studies are available on the World Wide Web. No prior knowledge of computer simulation is assumed.

6,901 citations

Journal ArticleDOI
TL;DR: In this article, the author proposes a new framework for the construction of reversible Markov chain samplers that jump between parameter subspaces of differing dimensionality, which is flexible and entirely constructive.
Abstract: Markov chain Monte Carlo methods for Bayesian computation have until recently been restricted to problems where the joint distribution of all variables has a density with respect to some fixed standard underlying measure. They have therefore not been available for application to Bayesian model determination, where the dimensionality of the parameter vector is typically not fixed. This paper proposes a new framework for the construction of reversible Markov chain samplers that jump between parameter subspaces of differing dimensionality, which is flexible and entirely constructive. It should therefore have wide applicability in model determination problems. The methodology is illustrated with applications to multiple change-point analysis in one and two dimensions, and to a Bayesian comparison of binomial experiments.

6,188 citations