Bayesian Computing with INLA: A Review

Håvard Rue¹, Andrea Riebler¹, Sigrunn H. Sørbye², Janine B. Illian³, Daniel P. Simpson⁴ and Finn K. Lindgren⁵

¹Department of Mathematical Sciences, Norwegian University of Science and Technology, N-7491 Trondheim, Norway; email: hrue@math.ntnu.no
²Department of Mathematics and Statistics, UiT The Arctic University of Norway, 9037 Tromsø, Norway
³Centre for Research into Ecological and Environmental Modelling, School of Mathematics and Statistics, University of St Andrews, St Andrews, Fife KY16 9LZ, United Kingdom
⁴Department of Mathematical Sciences, University of Bath, Claverton Down, Bath, BA2 7AY, United Kingdom
⁵School of Mathematics, The University of Edinburgh, James Clerk Maxwell Building, The King's Buildings, Peter Guthrie Tait Road, Edinburgh, EH9 3FD, United Kingdom
Keywords
Gaussian Markov random fields, Laplace approximations, approximate
Bayesian inference, latent Gaussian models, numerical integration,
sparse matrices
Abstract
The key operation in Bayesian inference is to compute high-dimensional integrals. An old approximate technique is the Laplace method or approximation, which dates back to Pierre-Simon Laplace (1774). This simple idea approximates the integrand with a second-order Taylor expansion around the mode and computes the integral analytically. By developing a nested version of this classical idea, combined with modern numerical techniques for sparse matrices, we obtain the approach of integrated nested Laplace approximations (INLA) to do approximate Bayesian inference for latent Gaussian models (LGMs). LGMs represent an important model abstraction for Bayesian inference and include a large proportion of the statistical models used today. In this review, we discuss the reasons for the success of the INLA approach, the R-INLA package, why it is so accurate, why the approximations are very quick to compute, and why LGMs make such a useful concept for Bayesian computing.

Contents
1. INTRODUCTION
2. BACKGROUND ON THE KEY COMPONENTS
2.1. Latent Gaussian Models (LGMs)
2.2. Additive Models
2.3. Gaussian Markov Random Fields (GMRFs)
2.4. Additive Models and GMRFs
2.5. Laplace Approximations
3. Putting It All Together: INLA
3.1. Approximating the Posterior Marginals for the Hyperparameters
3.2. Approximating the Posterior Marginals for the Latent Field
4. THE R-INLA PACKAGE: EXAMPLES
4.1. A Simple Example
4.2. A Less Simple Example Including Measurement Error
4.3. A Spatial Example
4.4. Special Features
5. A CHALLENGE FOR THE FUTURE: PRIORS
6. DISCUSSION
1. INTRODUCTION
A key obstacle in Bayesian statistics is to actually do the Bayesian inference. From a
mathematical point of view, the inference step is easy, transparent and defined by first
principles: We simply update prior beliefs about the unknown parameters with available
information in observed data, and obtain the posterior distribution for the parameters.
Based on the posterior, we can compute relevant statistics for the parameters of interest,
including marginal distributions, means, variances, quantiles, credibility intervals, etc. In
practice, this is much easier said than done.
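To be concrete about what must be computed, in the latent Gaussian setting reviewed below the targets are posterior marginals such as (in our notation, anticipating Sections 2 and 3)
$$\pi(\theta_j \mid \boldsymbol{y}) = \int \pi(\boldsymbol{\theta} \mid \boldsymbol{y})\,\mathrm{d}\boldsymbol{\theta}_{-j}
\qquad\text{and}\qquad
\pi(x_i \mid \boldsymbol{y}) = \int \pi(x_i \mid \boldsymbol{\theta}, \boldsymbol{y})\,\pi(\boldsymbol{\theta} \mid \boldsymbol{y})\,\mathrm{d}\boldsymbol{\theta},$$
and the high-dimensional integration over the latent field $\boldsymbol{x}$ and the hyperparameters $\boldsymbol{\theta}$ hidden in these expressions is exactly the obstacle referred to above.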
The introduction of simulation-based inference, through the idea of Markov chain Monte Carlo (MCMC) (Robert and Casella 1999), hit the statistical community in the early 1990s and represented a major breakthrough in Bayesian inference. MCMC provided a general recipe to generate samples from posteriors by constructing a Markov chain with the target posterior as its stationary distribution. This made it possible (in theory) to extract and compute whatever one could wish for. Additional major developments have paved the way for popular user-friendly MCMC tools, like WinBUGS (Spiegelhalter et al. 1995), JAGS (Plummer 2016), and the newer initiative Stan (Stan Development Team 2015), which uses Hamiltonian Monte Carlo. Armed with these and similar tools, Bayesian statistics has quickly grown in popularity and is now well represented in the major research journals across all branches of statistics.
In our opinion, however, from the point of view of applied users, the impact of the Bayesian revolution has been less apparent. This is not a statement about how Bayesian statistics itself is viewed by that community, but about its rather "cumbersome" inference, which still requires a lot of CPU time, and hence human time, as well as tweaking of simulation and model parameters to get it right. Re-running many alternative models becomes even more cumbersome, making the iterative process of model building in statistical analysis impossible (Box and Tiao 1973, Sec. 1.1.4). For this reason, simulation-based inference
(and hence in most cases also Bayesian statistics) has too often been avoided as being
practically infeasible.
In this paper, we review a different take on doing Bayesian inference that has recently facilitated the uptake of Bayesian modelling within the community of applied users. The approach is restricted to the specific class of latent Gaussian models (LGMs) which, as will soon be clear, includes a wide variety of commonly applied statistical models, making this restriction less limiting than it might appear at first sight. The crucial point is that, for LGMs, we can derive the integrated nested Laplace approximation (INLA) methodology, a deterministic approach to approximate Bayesian inference. INLA performs inference within a reasonable time frame and is, in most cases, both faster and more accurate than MCMC alternatives. To readers used to trading accuracy for speed, this may seem like a contradiction. The corresponding R package (R-INLA, see www.r-inla.org) has turned out to be very popular in applied sciences and applied statistics, and has become a versatile tool for quick and reliable Bayesian inference.
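To give a flavour of the interface before the examples in Section 4, the sketch below fits a toy Poisson model with one fixed effect and an unstructured ("iid") random effect. The data are simulated and purely illustrative; only inla(), f() and the result fields shown are part of the R-INLA interface, everything else is made up for the sketch.

library(INLA)                      # the R-INLA package, see www.r-inla.org

## Hypothetical data: Poisson counts with one covariate and unit-level noise
n <- 100
d <- data.frame(x = rnorm(n), idx = 1:n)
d$y <- rpois(n, lambda = exp(1 + 0.5 * d$x + rnorm(n, sd = 0.3)))

## An LGM: fixed effects (intercept and x) plus an "iid" random effect over idx
result <- inla(y ~ x + f(idx, model = "iid"),
               family = "poisson", data = d)

summary(result)                    # posterior summaries of fixed effects and hyperparameters
result$summary.fixed               # marginal summaries for the intercept and x
result$marginals.hyperpar          # posterior marginals for the hyperparameters

The formula interface mirrors that of glm() in R, with random and structured effects added through f() terms, which is what makes re-running alternative models quick.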
Recent examples of applications using the R-INLA package for statistical analysis include
disease mapping (Schrödle and Held 2011b,a; Ugarte et al. 2014, 2016; Papoila et al. 2014;
Goicoa et al. 2016; Riebler et al. 2016), age-period-cohort models (Riebler and Held 2016),
evolution of the Ebola virus (Santermans et al. 2016), studies of relationship between access
to housing, health and well-being in cities (Kandt et al. 2016), study of the prevalence
and correlates of intimate partner violence against men in Africa (Tsiko 2015), search for
evidence of gene expression heterosis (Niemi et al. 2015), analysis of traffic pollution and
hospital admissions in London (Halonen et al. 2016), early transcriptome changes in maize
primary root tissues in response to moderate water deficit conditions by RNA-Sequencing
(Opitz et al. 2016), performance of inbred and hybrid genotypes in plant breeding and
genetics (Lithio and Nettleton 2015), a study of Norwegian emergency wards (Goth et al.
2014), effects of measurement errors (Kröger et al. 2016; Muff et al. 2015; Muff and Keller
2015), network meta-analysis (Sauter and Held 2015), time-series analysis of genotyped
human campylobacteriosis cases from the Manawatu region of New Zealand (Friedrich et al.
2016), modeling of parrotfish habitats (Roos et al. 2015b), Bayesian outbreak detection
(Salmon et al. 2015), studies of long-term trends in the number of Monarch butterflies
(Crewe and Mccracken 2015), long-term effects on hospital admission and mortality of road
traffic noise (Halonen et al. 2015), spatio-temporal dynamics of brain tumours (Iulian et al.
2015), ovarian cancer mortality (García-Pérez et al. 2015), the effect of preferential sampling
on phylodynamic inference (Karcher et al. 2016), analysis of the impact of climate change on
abundance trends in central Europe (Bowler et al. 2015), investigation of drinking patterns
in US Counties from 2002 to 2012 (Dwyer-Lindgren et al. 2015), resistance and resilience
of terrestrial birds in drying climates (Selwood et al. 2015), cluster analysis of population
amyotrophic lateral sclerosis risk (Rooney et al. 2015), malaria infection in Africa (Noor
et al. 2014), effects of fragmentation on infectious disease dynamics (Jousimo et al. 2014),
soil-transmitted helminth infection in sub-Saharan Africa (Karagiannis-Voules et al. 2015),
analysis of the effect of malaria control on Plasmodium falciparum in Africa between 2000
and 2015 (Bhatt et al. 2015), adaptive prior weighting in generalized regression (Held and
Sauter 2016), analysis of hand, foot, and mouth disease surveillance data in China (Bauer
et al. 2016), estimation of the biomass of anchovies off the coast of Perú (Quiroz et al. 2015),
and many others.
We review the key components that make up INLA in Section 2 and in Section 3 we
combine these to outline why and in which situations INLA works. In Section 4 we
show some examples of the use of R-INLA, and discuss some special features that expand
the class of models that R-INLA can be applied to. In Section 5, we discuss a specific
challenge in Bayesian methodology and, in particular, argue why it is important to provide
better suggestions for default priors. We conclude with a general discussion and outlook in
Section 6.
2. BACKGROUND ON THE KEY COMPONENTS
In this section, we review the key components of the INLA approach to approximate Bayesian inference. We introduce these concepts using a top-down approach, starting with latent Gaussian models (LGMs) and which types of statistical models may be viewed as LGMs. We also discuss the types of Gaussians/Gaussian processes that are computationally efficient within this formulation, and illustrate the Laplace approximation for performing integration, a method that has been around for a very long time yet proves to be a key ingredient in the methodology we review here.
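For later reference, the classical univariate form of the Laplace approximation (a standard result, stated here in our own notation) is
$$\int e^{\,n f(x)}\,\mathrm{d}x \;\approx\; e^{\,n f(x_0)} \sqrt{\frac{2\pi}{n\,|f''(x_0)|}},$$
where $x_0$ is the mode of $f$ and $f''(x_0) < 0$: the log-integrand is replaced by its second-order Taylor expansion around the mode, so the integral reduces to that of an unnormalised Gaussian density and can be evaluated analytically.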
Due to the top-down structure of this text, we occasionally have to mention specific concepts before properly introducing and/or defining them; we ask the reader to bear with us in these cases.
2.1. Latent Gaussian Models (LGMs)
The concept of latent Gaussian models represents a very useful abstraction subsuming a
large class of statistical models, in the sense that the task of statistical inference can be
unified for the entire class (Rue et al. 2009). This is obtained using a three-stage hierarchical
model formulation, in which observations y can be assumed to be conditionally independent,
given a latent Gaussian random field $\boldsymbol{x}$ and hyperparameters $\boldsymbol{\theta}_1$,
$$\boldsymbol{y} \mid \boldsymbol{x}, \boldsymbol{\theta}_1 \;\sim\; \prod_{i \in \mathcal{I}} \pi(y_i \mid x_i, \boldsymbol{\theta}_1).$$
The versatility of the model class relates to the specification of the latent Gaussian field:
$$\boldsymbol{x} \mid \boldsymbol{\theta}_2 \;\sim\; \mathcal{N}\!\big(\boldsymbol{\mu}(\boldsymbol{\theta}_2),\, \boldsymbol{Q}^{-1}(\boldsymbol{\theta}_2)\big),$$
which includes all random terms in a statistical model, describing the underlying dependence structure of the data. The hyperparameters $\boldsymbol{\theta} = (\boldsymbol{\theta}_1, \boldsymbol{\theta}_2)$ control the Gaussian latent field and/or the likelihood for the data, and the posterior reads
$$\pi(\boldsymbol{x}, \boldsymbol{\theta} \mid \boldsymbol{y}) \;\propto\; \pi(\boldsymbol{\theta})\, \pi(\boldsymbol{x} \mid \boldsymbol{\theta}) \prod_{i \in \mathcal{I}} \pi(y_i \mid x_i, \boldsymbol{\theta}). \tag{1}$$
We make the following critical assumptions:
1. The number of hyperparameters $|\boldsymbol{\theta}|$ is small, typically 2 to 5, but not exceeding 20.
2. The distribution of the latent field, $\boldsymbol{x} \mid \boldsymbol{\theta}$, is Gaussian and is required to be a Gaussian Markov random field (GMRF) (or to be close to one) when the dimension $n$ is high ($10^3$ to $10^5$).
3. The data $\boldsymbol{y}$ are mutually conditionally independent given $\boldsymbol{x}$ and $\boldsymbol{\theta}$, implying that each observation $y_i$ only depends on one component of the latent field, e.g. $x_i$. Most components of $\boldsymbol{x}$ will not be observed.
These assumptions are required both for computational reasons and to ensure, with a high
degree of certainty, that the approximations we describe below are accurate.
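To make the abstraction concrete, consider a small illustrative example of our own (not one of the examples in Section 4): a Poisson regression with a unit-level random effect,
$$y_i \mid \eta_i \sim \text{Poisson}\!\big(\exp(\eta_i)\big), \qquad \eta_i = \beta_0 + \beta_1 z_i + u_i, \qquad u_i \stackrel{\text{iid}}{\sim} \mathcal{N}(0, \tau^{-1}).$$
Assigning Gaussian priors to $\beta_0$ and $\beta_1$, the latent field $\boldsymbol{x} = (\eta_1, \ldots, \eta_n, \beta_0, \beta_1, u_1, \ldots, u_n)$ is jointly Gaussian given the single hyperparameter $\boldsymbol{\theta} = (\tau)$ (in practice a tiny noise term is added to each $\eta_i$ so that this joint distribution is non-singular), and each observation $y_i$ depends on the latent field only through the single component $\eta_i$, so all three assumptions above are satisfied.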
References
Box GEP, Wilson KB. 1951. On the experimental attainment of optimum conditions. Journal of the Royal Statistical Society: Series B 13(1):1–45
Gneiting T, Raftery AE. 2007. Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association 102(477):359–378
Rue H, Martino S, Chopin N. 2009. Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society: Series B 71(2):319–392
Spiegelhalter DJ, Best NG, Carlin BP, van der Linde A. 2002. Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B 64(4):583–639