JSS
Journal of Statistical Software
MMMMMM YYYY, Volume VV, Issue II. http://www.jstatsoft.org/
Stan: A Probabilistic Programming Language
Bob Carpenter
Columbia University
Daniel Lee
Columbia University
Marcus A. Brubaker
TTI-Chicago
Allen Riddell
Dartmouth College
Andrew Gelman
Columbia University
Ben Goodrich
Columbia University
Jiqiang Guo
Columbia University
Matt Hoffman
Adobe Research Labs
Michael Betancourt
University College London
Peter Li
Columbia University
Abstract
Stan is a probabilistic programming language for specifying statistical models. A Stan
program imperatively defines a log probability function over parameters conditioned on
specified data and constants. As of version 2.2.0, Stan provides full Bayesian inference
for continuous-variable models through Markov chain Monte Carlo methods such as the
No-U-Turn sampler, an adaptive form of Hamiltonian Monte Carlo sampling. Penalized
maximum likelihood estimates are calculated using optimization methods such as the
Broyden-Fletcher-Goldfarb-Shanno algorithm.
Stan is also a platform for computing log densities and their gradients and Hessians,
which can be used in alternative algorithms such as variational Bayes, expectation propa-
gation, and marginal inference using approximate integration. To this end, Stan is set up
so that the densities, gradients, and Hessians, along with intermediate quantities of the
algorithm such as acceptance probabilities, are easily accessible.
Stan can be called from the command line, through R using the RStan package, or
through Python using the PyStan package. All three interfaces support sampling or
optimization-based inference and analysis, and RStan and PyStan also provide access
to log probabilities, gradients, Hessians, and data I/O.
Keywords: probabilistic program, Bayesian inference, algorithmic differentiation, Stan.

1. Introduction
The goal of the Stan project is to provide a flexible probabilistic programming language for
statistical modeling along with a suite of inference tools for fitting models that are robust,
scalable, and efficient.
Stan differs from BUGS (Lunn, Thomas, and Spiegelhalter 2000; Lunn, Spiegelhalter, Thomas,
and Best 2009; Lunn, Jackson, Best, Thomas, and Spiegelhalter 2012) and JAGS (Plummer
2003) in two primary ways. First, Stan is based on a new imperative probabilistic program-
ming language that is more flexible and expressive than the declarative graphical modeling
languages underlying BUGS or JAGS, in ways such as declaring variables with types and
supporting local variables and conditional statements. Second, Stan’s Markov chain Monte
Carlo (MCMC) techniques are based on Hamiltonian Monte Carlo (HMC), a more efficient
and robust sampler than Gibbs sampling or Metropolis-Hastings for models with complex
posteriors.[1]
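As a rough illustration of the mechanics underlying HMC (a pedagogical sketch, not Stan's actual implementation), the leapfrog integrator simulates Hamiltonian dynamics for a position q (the parameters) and an auxiliary momentum p, alternating half-step momentum updates with full-step position updates. A distinguishing property is that it approximately conserves the Hamiltonian, which keeps acceptance probabilities high:

```python
def leapfrog(q, p, grad_log_p, eps, n_steps):
    """Leapfrog trajectory: half-step on momentum, alternating full
    steps on position and momentum, final half-step on momentum."""
    p = p + 0.5 * eps * grad_log_p(q)
    for _ in range(n_steps - 1):
        q = q + eps * p
        p = p + eps * grad_log_p(q)
    q = q + eps * p
    p = p + 0.5 * eps * grad_log_p(q)
    return q, p

# Standard-normal target: log p(q) = -q^2 / 2, gradient -q.
# The Hamiltonian is potential energy (-log p) plus kinetic energy.
q0, p0 = 1.0, 0.5
h0 = 0.5 * q0**2 + 0.5 * p0**2
q1, p1 = leapfrog(q0, p0, lambda x: -x, eps=0.1, n_steps=20)
h1 = 0.5 * q1**2 + 0.5 * p1**2
assert abs(h1 - h0) < 1e-2  # energy approximately conserved
```

Stan's samplers build on this integrator with adaptation of the step size and mass matrix, and (for NUTS) dynamic trajectory lengths.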
Stan has interfaces for the command-line shell (CmdStan), Python (PyStan), and R (RStan); it runs on Windows, Mac OS X, and Linux, and is open-source licensed.
The next section provides an overview of how Stan works by way of an extended example, after
which the details of Stan’s programming language and inference mechanisms are provided.
2. Core Functionality
This section describes the use of Stan from the command line for estimating a Bayesian model
using both MCMC sampling for full Bayesian inference and optimization to provide a point
estimate at the posterior mode.
2.1. Model for estimating a Bernoulli parameter
Consider estimating the chance of success parameter for a Bernoulli distribution based on a
sequence of observed binary outcomes. Figure 1 provides an implementation of such a model
in Stan.[2] The model treats the observed binary data, y[1],...,y[N], as independent and
identically distributed, with success probability theta. The vectorized likelihood statement
can also be coded using a loop as in BUGS, although it will run more slowly than the vectorized
form:
[1] Neal (2011) analyzes the scaling benefit of HMC with dimensionality. Hoffman and Gelman (2014) provide practical comparisons of Stan's adaptive HMC algorithm with Gibbs, Metropolis, and standard HMC samplers.
[2] This model is available in the Stan source distribution in src/models/basic_estimators/bernoulli.stan.

data {
int<lower=0> N; // N >= 0
int<lower=0,upper=1> y[N]; // y[n] in { 0, 1 }
}
parameters {
real<lower=0,upper=1> theta; // theta in [0, 1]
}
model {
theta ~ beta(1,1); // prior
y ~ bernoulli(theta); // likelihood
}
Figure 1: Model for estimating a Bernoulli parameter.
for (n in 1:N)
y[n] ~ bernoulli(theta);
A beta(1,1) (i.e., uniform) prior is placed on theta, although there is no special behavior
for conjugate priors in Stan. The prior could be dropped from the model altogether because
parameters start with uniform distributions on their support, here constrained to be between
0 and 1 in the parameter declaration for theta.
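Concretely, the log probability function that this Stan program defines can be sketched in plain Python (a hypothetical stand-in for Stan's generated C++, not the actual implementation; in particular, Stan samples on an unconstrained scale via a logit transform of theta, which this sketch omits):

```python
import math

def log_posterior(theta, y):
    """Log density defined by the model in Figure 1, up to an additive
    constant: the beta(1,1) prior is uniform on [0, 1], so it
    contributes 0; only the Bernoulli log likelihood remains."""
    if not 0 < theta < 1:
        return float("-inf")  # outside the declared support of theta
    return sum(math.log(theta) if yn == 1 else math.log(1 - theta)
               for yn in y)

y = [0, 1, 0, 0, 0, 0, 0, 0, 0, 1]
# With 2 successes in 10 trials the density peaks at the sample mean
# 0.2, so the log posterior is larger there than at nearby points.
assert log_posterior(0.2, y) > log_posterior(0.3, y)
assert log_posterior(0.2, y) > log_posterior(0.1, y)
```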
2.2. Data format
Data for running Stan from the command line can be included in R dump format. All of the
variables declared in the data block of the Stan program must be defined in the data file. For
example, 10 observations for the model in Figure 1 could be encoded as[3]
[3] This data file is provided with the Stan distribution in file src/models/basic_estimators/bernoulli.data.R.

N <- 10
y <- c(0,1,0,0,0,0,0,0,0,1)
This defines the contents of two variables, an integer N and a 10-element integer array y. The
variable N is declared in the data block of the program as being an integer greater than or
equal to zero; the variable y is declared as an integer array of size N with entries between 0
and 1 inclusive.
In RStan and PyStan, data can also be passed directly through memory without the need to
read or write to a file.
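Generating such a file programmatically is straightforward. The following minimal sketch serializes only what the example above needs (scalars and flat integer arrays); the full R dump format also supports matrices and arrays via structure(...), which this helper does not handle:

```python
def to_r_dump(variables):
    """Serialize a dict of scalars and flat lists into the R dump
    format shown above (subset only; not a general R writer)."""
    lines = []
    for name, value in variables.items():
        if isinstance(value, list):
            body = "c(" + ",".join(str(v) for v in value) + ")"
        else:
            body = str(value)
        lines.append(f"{name} <- {body}")
    return "\n".join(lines) + "\n"

data = {"N": 10, "y": [0, 1, 0, 0, 0, 0, 0, 0, 0, 1]}
print(to_r_dump(data))
# N <- 10
# y <- c(0,1,0,0,0,0,0,0,0,1)
```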
2.3. Compiling the model
After a C++ compiler and make are installed,[4] the Bernoulli model in Figure 1 can be translated to C++ and compiled with a single command. First, the directory must be changed to $stan, which we use as a shorthand for the directory in which Stan was unpacked.[5]
> cd $stan
> make src/models/basic_estimators/bernoulli
This produces an executable file bernoulli (bernoulli.exe on Windows) on the same path
as the model. Forward slashes can be used with make on Windows.
2.4. Running the sampler
Command to sample from the model
The executable can be run with default options by specifying a path to the data file. The
first command in the following example changes the current directory to that containing the
model, which is where the data resides and where the executable is built. From there, the
path to the data is just the file name bernoulli.data.R.
> cd $stan/src/models/basic_estimators
> ./bernoulli sample data file=bernoulli.data.R
For Windows, the ./ before the command should be removed. This call specifies that sampling
should be performed with the model instantiated using the data in the specified file.
Terminal output from sampler
The output is as follows, starting with a summary of the command-line options used, including
defaults; these are also written into the samples file as comments.
[4] Appropriate versions are built into Linux. The RTools package suffices for Windows; it is available from http://cran.r-project.org/bin/windows/Rtools/. The Xcode package contains everything needed for the Mac; see https://developer.apple.com/xcode/ for more information.
[5] Before the first model is built, make must build the model translator (target bin/stanc) and posterior summary tool (target bin/print), along with an optimized version of the C++ library (target bin/libstan.a). Please be patient and consider the make option -j2 or -j4 (or higher) to run in the specified number of processes if two or four (or more) computational cores are available.

method = sample (Default)
sample
num_samples = 1000 (Default)
num_warmup = 1000 (Default)
save_warmup = 0 (Default)
thin = 1 (Default)
adapt
engaged = 1 (Default)
gamma = 0.050000000000000003 (Default)
delta = 0.80000000000000004 (Default)
kappa = 0.75 (Default)
t0 = 10 (Default)
init_buffer = 75 (Default)
term_buffer = 50 (Default)
window = 25 (Default)
algorithm = hmc (Default)
hmc
engine = nuts (Default)
nuts
max_depth = 10 (Default)
metric = diag_e (Default)
stepsize = 1 (Default)
stepsize_jitter = 0 (Default)
id = 0 (Default)
data
file = bernoulli.data.R
init = 2 (Default)
random
seed = 4294967295 (Default)
output
file = output.csv (Default)
diagnostic_file = (Default)
refresh = 100 (Default)
Gradient evaluation took 4e-06 seconds
1000 transitions using 10 leapfrog steps per transition would take
0.04 seconds.
Adjust your expectations accordingly!
Iteration: 1 / 2000 [ 0%] (Warmup)
Iteration: 100 / 2000 [ 5%] (Warmup)
...
Iteration: 1000 / 2000 [ 50%] (Warmup)
Iteration: 1001 / 2000 [ 50%] (Sampling)
...
Iteration: 2000 / 2000 [100%] (Sampling)
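Because the beta prior is conjugate to the Bernoulli likelihood, this particular posterior is available in closed form, which gives a useful sanity check on the sampled draws even though Stan itself makes no use of conjugacy. With the beta(1,1) prior and the 2 successes in 10 trials from bernoulli.data.R, the posterior is Beta(3, 9):

```python
# Conjugate update: Beta(a, b) prior plus s successes in n trials
# yields a Beta(a + s, b + n - s) posterior.
a, b = 1, 1  # beta(1,1) prior from the model block
y = [0, 1, 0, 0, 0, 0, 0, 0, 0, 1]
s, n = sum(y), len(y)
a_post, b_post = a + s, b + n - s
posterior_mean = a_post / (a_post + b_post)

assert (a_post, b_post) == (3, 9)
assert abs(posterior_mean - 0.25) < 1e-12
# The posterior mean of theta estimated from output.csv should
# therefore be close to 0.25, up to Monte Carlo error.
```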
