JSS
Journal of Statistical Software
January 2017, Volume 76, Issue 1. http://www.jstatsoft.org/
Stan: A Probabilistic Programming Language
Bob Carpenter, Columbia University
Daniel Lee, Columbia University
Marcus A. Brubaker, TTI-Chicago
Allen Riddell, Dartmouth College
Andrew Gelman, Columbia University
Ben Goodrich, Columbia University
Jiqiang Guo, Columbia University
Matt Hoffman, Adobe Research Labs
Michael Betancourt, University College London
Peter Li, Columbia University
Abstract
Stan is a probabilistic programming language for specifying statistical models. A Stan
program imperatively defines a log probability function over parameters conditioned on
specified data and constants. As of version 2.2.0, Stan provides full Bayesian inference
for continuous-variable models through Markov chain Monte Carlo methods such as the
No-U-Turn sampler, an adaptive form of Hamiltonian Monte Carlo sampling. Penalized
maximum likelihood estimates are calculated using optimization methods such as the
Broyden-Fletcher-Goldfarb-Shanno algorithm.
Stan is also a platform for computing log densities and their gradients and Hessians,
which can be used in alternative algorithms such as variational Bayes, expectation propagation,
and marginal inference using approximate integration. To this end, Stan is set up
so that the densities, gradients, and Hessians, along with intermediate quantities of the
algorithm such as acceptance probabilities, are easily accessible.
Stan can be called from the command line, through R using the RStan package, or
through Python using the PyStan package. All three interfaces support sampling or
optimization-based inference and analysis, and RStan and PyStan also provide access
to log probabilities, gradients, Hessians, and data I/O.
Keywords: probabilistic program, Bayesian inference, algorithmic differentiation, Stan.

1. Introduction
The goal of the Stan project is to provide a flexible probabilistic programming language for
statistical modeling along with a suite of inference tools for fitting models that are robust,
scalable, and efficient.
Stan differs from BUGS (Lunn, Thomas, and Spiegelhalter 2000; Lunn, Spiegelhalter, Thomas,
and Best 2009; Lunn, Jackson, Best, Thomas, and Spiegelhalter 2012) and JAGS (Plummer
2003) in two primary ways. First, Stan is based on a new imperative probabilistic programming
language that is more flexible and expressive than the declarative graphical modeling
languages underlying BUGS or JAGS, in ways such as declaring variables with types and
supporting local variables and conditional statements. Second, Stan’s Markov chain Monte
Carlo (MCMC) techniques are based on Hamiltonian Monte Carlo (HMC), a more efficient
and robust sampler than Gibbs sampling or Metropolis-Hastings for models with complex
posteriors.[1]

[1] Neal (2011) analyzes the scaling benefit of HMC with dimensionality. Hoffman and Gelman (2014) provide practical comparisons of Stan's adaptive HMC algorithm with Gibbs, Metropolis, and standard HMC samplers.

Stan has interfaces for the command-line shell (CmdStan), Python (PyStan), and R (RStan);
it runs on Windows, Mac OS X, and Linux, and is open-source licensed.
The next section provides an overview of how Stan works by way of an extended example, after
which the details of Stan’s programming language and inference mechanisms are provided.
2. Core Functionality
This section describes the use of Stan from the command line for estimating a Bayesian model
using both MCMC sampling for full Bayesian inference and optimization to provide a point
estimate at the posterior mode.
2.1. Model for estimating a Bernoulli parameter
Consider estimating the chance of success parameter for a Bernoulli distribution based on a
sequence of observed binary outcomes. Figure 1 provides an implementation of such a model
in Stan.[2]
The model treats the observed binary data, y[1],...,y[N], as independent and
identically distributed, with success probability theta. The vectorized likelihood statement
can also be coded using a loop as in BUGS, although it will run more slowly than the vectorized
form:

for (n in 1:N)
  y[n] ~ bernoulli(theta);

data {
  int<lower=0> N;                 // N >= 0
  int<lower=0,upper=1> y[N];      // y[n] in { 0, 1 }
}
parameters {
  real<lower=0,upper=1> theta;    // theta in [0, 1]
}
model {
  theta ~ beta(1,1);              // prior
  y ~ bernoulli(theta);           // likelihood
}

Figure 1: Model for estimating a Bernoulli parameter.

[2] This model is available in the Stan source distribution in src/models/basic_estimators/bernoulli.stan.

A beta(1,1) (i.e., uniform) prior is placed on theta, although there is no special behavior
for conjugate priors in Stan. The prior could be dropped from the model altogether because
parameters start with uniform distributions on their support, here constrained to be between
0 and 1 in the parameter declaration for theta.
2.2. Data format
Data for running Stan from the command line can be included in R dump format. All of the
variables declared in the data block of the Stan program must be defined in the data file. For
example, 10 observations for the model in Figure 1 could be encoded as[3]

N <- 10
y <- c(0,1,0,0,0,0,0,0,0,1)

[3] This data file is provided with the Stan distribution in file src/models/basic_estimators/bernoulli.data.R.

This defines the contents of two variables, an integer N and a 10-element integer array y. The
variable N is declared in the data block of the program as being an integer greater than or
equal to zero; the variable y is declared as an integer array of size N with entries between 0
and 1 inclusive.
In RStan and PyStan, data can also be passed directly through memory without the need to
read or write to a file.
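As an illustration only (not taken from the paper), a minimal sketch of the in-memory route in Python might look like the following; it assumes the PyStan 2.x interface (pystan.StanModel and its sampling method), and the variable names are arbitrary.

# Illustrative sketch, not from the paper: passing data through memory with PyStan.
# Assumes the PyStan 2.x interface (pystan.StanModel, StanModel.sampling).
import pystan

bernoulli_code = """
data {
  int<lower=0> N;
  int<lower=0,upper=1> y[N];
}
parameters {
  real<lower=0,upper=1> theta;
}
model {
  theta ~ beta(1,1);
  y ~ bernoulli(theta);
}
"""

# Same values as in bernoulli.data.R, but passed as a Python dictionary.
data = {"N": 10, "y": [0, 1, 0, 0, 0, 0, 0, 0, 0, 1]}

model = pystan.StanModel(model_code=bernoulli_code)  # translate to C++ and compile
fit = model.sampling(data=data)                      # sample with default settings
print(fit)                                           # posterior summary for theta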
2.3. Compiling the model
After a C++ compiler and make are installed,[4] the Bernoulli model in Figure 1 can be
translated to C++ and compiled with a single command. First, the directory must be changed to
$stan, which we use as a shorthand for the directory in which Stan was unpacked.[5]
> cd $stan
> make src/models/basic_estimators/bernoulli
This produces an executable file bernoulli (bernoulli.exe on Windows) on the same path
as the model. Forward slashes can be used with make on Windows.

[4] Appropriate versions are built into Linux. The RTools package suffices for Windows; it is available from http://cran.r-project.org/bin/windows/Rtools/. The Xcode package contains everything needed for the Mac; see https://developer.apple.com/xcode/ for more information.
[5] Before the first model is built, make must build the model translator (target bin/stanc) and posterior summary tool (target bin/print), along with an optimized version of the C++ library (target bin/libstan.a). Please be patient and consider make option -j2 or -j4 (or higher) to run in the specified number of processes if two or four (or more) computational cores are available.

2.4. Running the sampler
Command to sample from the model
The executable can be run with default options by specifying a path to the data file. The
first command in the following example changes the current directory to that containing the
model, which is where the data resides and where the executable is built. From there, the
path to the data is just the file name bernoulli.data.R.
> cd $stan/src/models/basic_estimators
> ./bernoulli sample data file=bernoulli.data.R
For Windows, the ./ before the command should be removed. This call specifies that sampling
should be performed with the model instantiated using the data in the specified file.
Terminal output from sampler
The output is as follows, starting with a summary of the command-line options used, including
defaults; these are also written into the samples file as comments.

method = sample (Default)
sample
num_samples = 1000 (Default)
num_warmup = 1000 (Default)
save_warmup = 0 (Default)
thin = 1 (Default)
adapt
engaged = 1 (Default)
gamma = 0.050000000000000003 (Default)
delta = 0.80000000000000004 (Default)
kappa = 0.75 (Default)
t0 = 10 (Default)
init_buffer = 75 (Default)
term_buffer = 50 (Default)
window = 25 (Default)
algorithm = hmc (Default)
hmc
engine = nuts (Default)
nuts
max_depth = 10 (Default)
metric = diag_e (Default)
stepsize = 1 (Default)
stepsize_jitter = 0 (Default)
id = 0 (Default)
data
file = bernoulli.data.R
init = 2 (Default)
random
seed = 4294967295 (Default)
output
file = output.csv (Default)
diagnostic_file = (Default)
refresh = 100 (Default)
Gradient evaluation took 4e-06 seconds
1000 transitions using 10 leapfrog steps per transition would take
0.04 seconds.
Adjust your expectations accordingly!
Iteration: 1 / 2000 [ 0%] (Warmup)
Iteration: 100 / 2000 [ 5%] (Warmup)
...
Iteration: 1000 / 2000 [ 50%] (Warmup)
Iteration: 1001 / 2000 [ 50%] (Sampling)
...
Iteration: 2000 / 2000 [100%] (Sampling)
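
The draws written to output.csv can then be post-processed outside of Stan. As a rough sketch (not from the paper), the following Python snippet computes the posterior mean of theta; it assumes that comment lines in the samples file begin with '#', that the first non-comment line is a CSV header, and that the draws for theta appear in a column named theta.

# Rough sketch, not from the paper: summarizing the CmdStan samples file in Python.
# Assumptions: comment lines start with '#', the first non-comment line is a CSV
# header, and the draws for the parameter appear in a column named "theta".
import csv

draws = []
with open("output.csv") as f:
    data_lines = (line for line in f if not line.startswith("#"))
    for row in csv.DictReader(data_lines):
        draws.append(float(row["theta"]))

print("posterior mean of theta:", sum(draws) / len(draws))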
