JSS
Journal of Statistical Software
January 2017, Volume 76, Issue 1. http://www.jstatsoft.org/
Stan: A Probabilistic Programming Language
Bob Carpenter, Columbia University
Daniel Lee, Columbia University
Marcus A. Brubaker, TTI-Chicago
Allen Riddell, Dartmouth College
Andrew Gelman, Columbia University
Ben Goodrich, Columbia University
Jiqiang Guo, Columbia University
Matt Hoffman, Adobe Research Labs
Michael Betancourt, University College London
Peter Li, Columbia University
Abstract
Stan is a probabilistic programming language for specifying statistical models. A Stan
program imperatively defines a log probability function over parameters conditioned on
specified data and constants. As of version 2.2.0, Stan provides full Bayesian inference
for continuous-variable models through Markov chain Monte Carlo methods such as the
No-U-Turn sampler, an adaptive form of Hamiltonian Monte Carlo sampling. Penalized
maximum likelihood estimates are calculated using optimization methods such as the
Broyden-Fletcher-Goldfarb-Shanno algorithm.
Stan is also a platform for computing log densities and their gradients and Hessians,
which can be used in alternative algorithms such as variational Bayes, expectation propagation,
and marginal inference using approximate integration. To this end, Stan is set up
so that the densities, gradients, and Hessians, along with intermediate quantities of the
algorithm such as acceptance probabilities, are easily accessible.
Stan can be called from the command line, through R using the RStan package, or
through Python using the PyStan package. All three interfaces support sampling or
optimization-based inference and analysis, and RStan and PyStan also provide access
to log probabilities, gradients, Hessians, and data I/O.
Keywords: probabilistic program, Bayesian inference, algorithmic differentiation, Stan.

1. Introduction
The goal of the Stan project is to provide a flexible probabilistic programming language for
statistical modeling along with a suite of inference tools for fitting models that are robust,
scalable, and efficient.
Stan differs from BUGS (Lunn, Thomas, and Spiegelhalter 2000; Lunn, Spiegelhalter, Thomas,
and Best 2009; Lunn, Jackson, Best, Thomas, and Spiegelhalter 2012) and JAGS (Plummer
2003) in two primary ways. First, Stan is based on a new imperative probabilistic programming
language that is more flexible and expressive than the declarative graphical modeling
languages underlying BUGS or JAGS, in ways such as declaring variables with types and
supporting local variables and conditional statements. Second, Stan’s Markov chain Monte
Carlo (MCMC) techniques are based on Hamiltonian Monte Carlo (HMC), a more efficient
and robust sampler than Gibbs sampling or Metropolis-Hastings for models with complex
posteriors.[1]

[1] Neal (2011) analyzes the scaling benefit of HMC with dimensionality. Hoffman and Gelman (2014) provide practical comparisons of Stan's adaptive HMC algorithm with Gibbs, Metropolis, and standard HMC samplers.

Stan has interfaces for the command-line shell (CmdStan), Python (PyStan), and R (RStan);
it runs on Windows, Mac OS X, and Linux, and is open-source licensed.
The next section provides an overview of how Stan works by way of an extended example, after
which the details of Stan’s programming language and inference mechanisms are provided.
2. Core Functionality
This section describes the use of Stan from the command line for estimating a Bayesian model
using both MCMC sampling for full Bayesian inference and optimization to provide a point
estimate at the posterior mode.
2.1. Model for estimating a Bernoulli parameter
Consider estimating the chance of success parameter for a Bernoulli distribution based on a
sequence of observed binary outcomes. Figure 1 provides an implementation of such a model
in Stan.[2]
The model treats the observed binary data, y[1],...,y[N], as independent and
identically distributed, with success probability theta. The vectorized likelihood statement
can also be coded using a loop as in BUGS, although it will run more slowly than the vectorized
form:

for (n in 1:N)
  y[n] ~ bernoulli(theta);

data {
  int<lower=0> N;                 // N >= 0
  int<lower=0,upper=1> y[N];      // y[n] in { 0, 1 }
}
parameters {
  real<lower=0,upper=1> theta;    // theta in [0, 1]
}
model {
  theta ~ beta(1,1);              // prior
  y ~ bernoulli(theta);           // likelihood
}

Figure 1: Model for estimating a Bernoulli parameter.

[2] This model is available in the Stan source distribution in src/models/basic_estimators/bernoulli.stan.

A beta(1,1) (i.e., uniform) prior is placed on theta, although there is no special behavior
for conjugate priors in Stan. The prior could be dropped from the model altogether because
parameters start with uniform distributions on their support, here constrained to be between
0 and 1 in the parameter declaration for theta.
2.2. Data format
Data for running Stan from the command line can be included in R dump format. All of the
variables declared in the data block of the Stan program must be defined in the data file. For
example, 10 observations for the model in Figure 1 could be encoded as[3]

N <- 10
y <- c(0,1,0,0,0,0,0,0,0,1)

[3] This data file is provided with the Stan distribution in file src/models/basic_estimators/bernoulli.data.R.

This defines the contents of two variables, an integer N and a 10-element integer array y. The
variable N is declared in the data block of the program as being an integer greater than or
equal to zero; the variable y is declared as an integer array of size N with entries between 0
and 1 inclusive.
In RStan and PyStan, data can also be passed directly through memory without the need to
read or write to a file.
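As an illustration only (not taken from the paper), a minimal sketch of the in-memory route in Python might look like the following; it assumes the PyStan 2.x interface (pystan.StanModel and its sampling method), and the variable names are arbitrary.

# Illustrative sketch, not from the paper: passing data through memory with PyStan.
# Assumes the PyStan 2.x interface (pystan.StanModel, StanModel.sampling).
import pystan

bernoulli_code = """
data {
  int<lower=0> N;
  int<lower=0,upper=1> y[N];
}
parameters {
  real<lower=0,upper=1> theta;
}
model {
  theta ~ beta(1,1);
  y ~ bernoulli(theta);
}
"""

# Same values as in bernoulli.data.R, but passed as a Python dictionary.
data = {"N": 10, "y": [0, 1, 0, 0, 0, 0, 0, 0, 0, 1]}

model = pystan.StanModel(model_code=bernoulli_code)  # translate to C++ and compile
fit = model.sampling(data=data)                      # sample with default settings
print(fit)                                           # posterior summary for theta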
2.3. Compiling the model
After a C++ compiler and make are installed,[4] the Bernoulli model in Figure 1 can be
translated to C++ and compiled with a single command. First, the directory must be changed to
$stan, which we use as a shorthand for the directory in which Stan was unpacked.[5]
> cd $stan
> make src/models/basic_estimators/bernoulli
This produces an executable file bernoulli (bernoulli.exe on Windows) on the same path
as the model. Forward slashes can be used with make on Windows.

[4] Appropriate versions are built into Linux. The RTools package suffices for Windows; it is available from http://cran.r-project.org/bin/windows/Rtools/. The Xcode package contains everything needed for the Mac; see https://developer.apple.com/xcode/ for more information.
[5] Before the first model is built, make must build the model translator (target bin/stanc) and posterior summary tool (target bin/print), along with an optimized version of the C++ library (target bin/libstan.a). Please be patient and consider make option -j2 or -j4 (or higher) to run in the specified number of processes if two or four (or more) computational cores are available.

2.4. Running the sampler
Command to sample from the model
The executable can be run with default options by specifying a path to the data file. The
first command in the following example changes the current directory to that containing the
model, which is where the data resides and where the executable is built. From there, the
path to the data is just the file name bernoulli.data.R.
> cd $stan/src/models/basic_estimators
> ./bernoulli sample data file=bernoulli.data.R
For Windows, the ./ before the command should be removed. This call specifies that sampling
should be performed with the model instantiated using the data in the specified file.
Terminal output from sampler
The output is as follows, starting with a summary of the command-line options used, including
defaults; these are also written into the samples file as comments.

method = sample (Default)
sample
num_samples = 1000 (Default)
num_warmup = 1000 (Default)
save_warmup = 0 (Default)
thin = 1 (Default)
adapt
engaged = 1 (Default)
gamma = 0.050000000000000003 (Default)
delta = 0.80000000000000004 (Default)
kappa = 0.75 (Default)
t0 = 10 (Default)
init_buffer = 75 (Default)
term_buffer = 50 (Default)
window = 25 (Default)
algorithm = hmc (Default)
hmc
engine = nuts (Default)
nuts
max_depth = 10 (Default)
metric = diag_e (Default)
stepsize = 1 (Default)
stepsize_jitter = 0 (Default)
id = 0 (Default)
data
file = bernoulli.data.R
init = 2 (Default)
random
seed = 4294967295 (Default)
output
file = output.csv (Default)
diagnostic_file = (Default)
refresh = 100 (Default)
Gradient evaluation took 4e-06 seconds
1000 transitions using 10 leapfrog steps per transition would take
0.04 seconds.
Adjust your expectations accordingly!
Iteration: 1 / 2000 [ 0%] (Warmup)
Iteration: 100 / 2000 [ 5%] (Warmup)
...
Iteration: 1000 / 2000 [ 50%] (Warmup)
Iteration: 1001 / 2000 [ 50%] (Sampling)
...
Iteration: 2000 / 2000 [100%] (Sampling)
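
The draws written to output.csv can then be post-processed outside of Stan. As a rough sketch (not from the paper), the following Python snippet computes the posterior mean of theta; it assumes that comment lines in the samples file begin with '#', that the first non-comment line is a CSV header, and that the draws for theta appear in a column named theta.

# Rough sketch, not from the paper: summarizing the CmdStan samples file in Python.
# Assumptions: comment lines start with '#', the first non-comment line is a CSV
# header, and the draws for the parameter appear in a column named "theta".
import csv

draws = []
with open("output.csv") as f:
    data_lines = (line for line in f if not line.startswith("#"))
    for row in csv.DictReader(data_lines):
        draws.append(float(row["theta"]))

print("posterior mean of theta:", sum(draws) / len(draws))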
