City, University of London Institutional Repository
Citation: Friston, K. J., FitzGerald, T., Rigoli, F., Schwartenbeck, P. and Pezzulo, G.
(2017). Active Inference: A Process Theory. Neural Computation, 29(1), pp. 1-49. doi:
10.1162/NECO_a_00912
This is the published version of the paper.
This version of the publication may differ from the final published
version.
Permanent repository link: https://openaccess.city.ac.uk/id/eprint/16683/
Link to published version: http://dx.doi.org/10.1162/NECO_a_00912
ARTICLE Communicated by Ollie Hulme
Active Inference: A Process Theory
Karl Friston
k.friston@ucl.ac.uk
Wellcome Trust Centre for Neuroimaging, UCL, London WC1N 3BG, U.K.
Thomas FitzGerald
thomas.fitzgerald@ucl.ac.uk
Wellcome Trust Centre for Neuroimaging, UCL, London WC1N 3BG, U.K.,
and Max Planck–UCL Centre for Computational Psychiatry and Ageing Research,
London WC1B 5BE, U.K.
Francesco Rigoli
f.rigoli@ucl.ac.uk
Wellcome Trust Centre for Neuroimaging, UCL, London WC1N 3BG, U.K.
Philipp Schwartenbeck
philipp.schwartenbeck.12@alumni.ucl.ac.uk
Wellcome Trust Centre for Neuroimaging, UCL, London WC1N 3BG, U.K.;
Max Planck–UCL Centre for Computational Psychiatry and Ageing Research,
London, WC1B 5BE, U.K.; Centre for Neurocognitive Research, University
of Salzburg, 5020 Salzburg, Austria; and Neuroscience Institute,
Christian-Doppler-Klinik, Paracelsus Medical University Salzburg,
A-5020 Salzburg, Austria
Giovanni Pezzulo
giovanni.pezzulo@gmail.com
Institute of Cognitive Sciences and Technologies, National Research Council,
00185 Rome, Italy
This article describes a process theory based on active inference and be-
lief propagation. Starting from the premise that all neuronal processing
(and action selection) can be explained by maximizing Bayesian model
evidence—or minimizing variational free energy—we ask whether neu-
ronal responses can be described as a gradient descent on variational free
energy. Using a standard (Markov decision process) generative model, we
derive the neuronal dynamics implicit in this description and reproduce
a remarkable range of well-characterized neuronal phenomena. These in-
clude repetition suppression, mismatch negativity, violation responses,
place-cell activity, phase precession, theta sequences, theta-gamma cou-
pling, evidence accumulation, race-to-bound dynamics, and transfer of
dopamine responses. Furthermore, the (approximately Bayes’ optimal)
behavior prescribed by these dynamics has a degree of face validity, pro-
viding a formal explanation for reward seeking, context learning, and
epistemic foraging. Technically, the fact that a gradient descent appears
to be a valid description of neuronal activity means that variational free
energy is a Lyapunov function for neuronal dynamics, which therefore
conform to Hamilton’s principle of least action.
1 Introduction
There has been a paradigm shift in the cognitive neurosciences over the
past decade toward the Bayesian brain and predictive coding (Ballard, Hin-
ton, & Sejnowski, 1983; Rao & Ballard, 1999; Knill & Pouget, 2004; Yuille &
Kersten, 2006; De Bruin & Michael, 2015). At the same time, there has been
a resurgence of enactivism, emphasizing the embodied aspect of perception
(O’Regan & Noë, 2001; Friston, Mattout, & Kilner, 2011; Ballard, Kit,
Rothkopf, & Sullivan, 2013; Clark, 2013; Seth, 2013; Barrett & Simmons, 2015;
Pezzulo, Rigoli, & Friston, 2015). Even in consciousness research and phi-
losophy, related ideas are finding traction (Clark, 2013; Hohwy, 2013, 2014).
Many of these developments have informed (and have been informed by) a
variational principle of least free energy (Friston, Kilner, & Harrison, 2006;
Friston, 2012), namely, active (Bayesian) inference.
However, the enthusiasm for Bayesian theories of brain function is ac-
companied by an understandable skepticism about their usefulness, par-
ticularly in furnishing testable process theories (Bowers & Davis, 2012).
Indeed, one could argue that many current normative theories fail to pro-
vide detailed and physiologically plausible predictions about the processes
that might implement them. And when they do, their connection with
a normative or variational principle is often obscure. In this work, we
show that process theories can be derived in a relatively straightforward
way from variational principles. The level of detail we consider is fairly
coarse; however, the explanatory scope of the resulting process theory is
remarkable—and provides an integrative (and simplifying) perspective on
many phenomena that are studied in systems neuroscience. The aim of this
article is to describe the basic ideas and illustrate the emergent processes
using simulations of neuronal responses. We anticipate revisiting some is-
sues in depth: in particular, a companion paper focuses on learning and
the emergence of habits as a natural consequence of observing one’s own
behavior (Friston et al., 2016).
This article has three sections. The first describes active inference, com-
bining earlier formulations of planning as inference (Botvinick & Tous-
saint, 2012; Friston et al., 2014) with Bayesian model averaging (FitzGerald,
Dolan, & Friston, 2014) and learning (FitzGerald, Dolan, & Friston, 2015).
Importantly, action (i.e., policy selection), perception (i.e., state estimation),
and learning (i.e., reinforcement learning) all minimize the same quantity:
variational free energy. This refinement of previous schemes considers an
explicit representation of past and future states, conditioned on competing
policies. This leads to Bayesian belief updates that are informed by beliefs
about the future (prediction) and context learning that is informed by beliefs
about the past (postdiction). Technically, these updates implement a form
of Bayesian smoothing, with explicit representations of states over time,
which include future (i.e., counterfactual) states. Furthermore, the implicit
variational updates have some biological plausibility in the sense that they
eschew neuronally implausible computations. For example, expectations
about future states are sigmoid functions of linear mixtures of the pre-
ceding and subsequent states. An alternative parameterization, which did
not appeal to explicit representations over time, would require recursive
matrix multiplication, for which no neuronally plausible implementation
has been proposed. Under this belief parameterization, learning is medi-
ated by classical associative (synaptic) plasticity. The remaining sections
use simulations of foraging in a radial maze to illustrate some key aspects
of inference and learning, respectively.
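The update mentioned above, in which expectations about a state are a sigmoid function of linear mixtures of the preceding and subsequent states, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the scheme derived later in the article: the function and variable names are hypothetical, a softmax (normalized exponential) stands in for the sigmoid, and the linear mixtures are taken to be log transition matrices applied to the neighboring state expectations.

```python
import numpy as np

def softmax(x):
    # Normalized exponential; subtracting the max keeps exp() well behaved
    e = np.exp(x - np.max(x))
    return e / e.sum()

def update_state_expectation(log_lik, B_prev, B_next, s_prev, s_next, eps=1e-16):
    """Hypothetical single update of the expected state at time tau.

    log_lik : log-likelihood of the current outcome under each state
              (zeros when no outcome has been observed yet)
    B_prev  : assumed transition matrix P(s_tau | s_tau-1), columns sum to 1
    B_next  : assumed transition matrix P(s_tau+1 | s_tau), columns sum to 1
    s_prev, s_next : expectations about the preceding and subsequent states
    """
    forward = np.log(B_prev + eps) @ s_prev     # linear mixture of the preceding state
    backward = np.log(B_next + eps).T @ s_next  # linear mixture of the subsequent state
    return softmax(log_lik + forward + backward)

# Toy example (all numbers are made up): two states that tend to persist
B = np.array([[0.9, 0.1],
              [0.1, 0.9]])
s_tau = update_state_expectation(
    log_lik=np.zeros(2),
    B_prev=B, B_next=B,
    s_prev=np.array([1.0, 0.0]),
    s_next=np.array([1.0, 0.0]),
)
```

In this toy case, beliefs about the past and the (counterfactual) future both favor the first state, so the updated expectation does too. The point of the sketch is only that each update needs local messages from neighboring time points rather than a recursive chain of matrix products.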
The inference section describes the behavioral and neuronal correlates
of belief updating during inference or planning, with an emphasis on elec-
trophysiological correlates and the encoding of precision by dopamine. It
illustrates a number of phenomena that are ubiquitous in empirical stud-
ies. These include repetition suppression (de Gardelle, Waszczuk, Egner,
& Summerfield, 2013), violation and omission responses (Bendixen,
SanMiguel, & Schröger, 2012), and neuronal responses that are characteristic
of the hippocampus, namely, place cell activity (Moser, Rowland, &
Moser, 2015), theta-gamma coupling, theta sequences and phase precession
(Burgess, Barry, & O’Keefe, 2007; Lisman & Redish, 2009). We also touch on
dynamics seen in parietal and prefrontal cortex, such as evidence accumulation
and race-to-bound or threshold (Huk & Shadlen, 2005; Gold & Shadlen,
2007; Hunt et al., 2012; Solway & Botvinick, 2012; de Lafuente, Jazayeri, &
Shadlen, 2015; FitzGerald, Moran, Friston, & Dolan, 2015; Latimer, Yates,
Meister, Huk, & Pillow, 2015).
The final section considers context learning and illustrates the transfer
of dopamine responses to conditioned stimuli, as agents become familiar
with experimental contingencies (Fiorillo, Tobler, & Schultz, 2003). We con-
clude with a brief demonstration of epistemic foraging. The aim of these
simulations is to illustrate how all of the phenomena emerge from a sin-
gle imperative (to minimize free energy) and how they contextualize each
other.
2 Active Inference and Learning
This section provides a brief overview of active inference that builds
on our previous treatments of Markov decision processes. Specifically, it
introduces a parameterization of posterior beliefs about the past and future
that makes state estimation (i.e., belief updating) biologically plausible.
(A slightly fuller version of this material can be found in Friston et al.,
2016.) Active inference is based on the premise that everything minimizes
variational free energy (Friston, 2013). This leads to some surprisingly sim-
ple update rules for action, perception, policy selection, learning, and the
encoding of uncertainty or its complement, precision. Although some of
the intervening formalism looks complicated, what comes out at the end
are update rules that will be familiar to many readers (e.g., integrate-and-
fire dynamics with sigmoid activation functions and plasticity with asso-
ciative and decay terms). This means that the underlying theory can be
tied to neuronal processes in a fairly straightforward way. Furthermore, the
formalism accommodates a number of established normative approaches,
thereby providing an integrative framework.
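To make the quantity being minimized concrete, the following sketch evaluates variational free energy for a single categorical hidden state and reduces it by gradient descent. It is a generic textbook illustration rather than the article's update scheme: the prior, likelihood, step size, and all names are assumptions chosen for the example. It uses the standard definition F = E_Q[ln Q(s) - ln P(o, s)], which is an upper bound on negative log evidence, -ln P(o), and equals it when Q(s) is the exact posterior.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - np.max(v))
    return e / e.sum()

def free_energy(q, joint):
    # F = sum_s q(s) * (ln q(s) - ln P(o, s)); q must be strictly positive
    return float(np.sum(q * (np.log(q) - np.log(joint))))

# Made-up generative model: two hidden states, one observed outcome o
prior = np.array([0.5, 0.5])        # P(s)
likelihood = np.array([0.8, 0.2])   # P(o | s) for the observed o
joint = likelihood * prior          # P(o, s)
evidence = float(joint.sum())       # P(o)
posterior = joint / evidence        # exact P(s | o), for comparison only

# Gradient descent on F with respect to the logits v, where q = softmax(v)
v = np.zeros(2)                     # start from a uniform belief
F_start = free_energy(softmax(v), joint)
for _ in range(200):
    q = softmax(v)
    g = np.log(q) - np.log(joint)
    v -= 1.0 * q * (g - q @ g)      # dF/dv for the softmax parameterization
q = softmax(v)
F_end = free_energy(q, joint)
```

After the descent, `q` matches the exact posterior and `F_end` matches `-np.log(evidence)`, which is the sense in which minimizing free energy maximizes (a bound on) model evidence.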
In principle, the scheme described in this section can be applied to any
paradigm or choice behavior. Indeed, earlier versions have been used to
model waiting games (Friston et al., 2013), the urn task and evidence accu-
mulation (FitzGerald, Schwartenbeck, Moutoussis, Dolan, & Friston, 2015),
trust games from behavioral economics (Moutoussis, Trujillo-Barreto, El-
Deredy, Dolan, & Friston, 2014; Schwartenbeck, FitzGerald, Mathys, Dolan,
Kronbichler et al., 2015), addictive behavior (Schwartenbeck, FitzGerald,
Mathys, Dolan, Wurst et al., 2015), two-step maze tasks (Friston, Rigoli
et al., 2015), and engineering benchmarks such as the mountain car prob-
lem (Friston, Adams, & Montague, 2012). It has also been used in the setting
of computational fMRI (Schwartenbeck, FitzGerald, Mathys, Dolan, & Fris-
ton, 2015).
In brief, active inference separates the problems of optimizing action
and perception by assuming that action fulfills predictions based on in-
ferred states of the world. Optimal predictions are therefore based on (sen-
sory) evidence that is evaluated using a generative model of (observed)
outcomes. This allows one to frame behavior as fulfilling optimistic pre-
dictions, where the optimism is prescribed by prior preferences or goals
(Friston et al., 2014). In other words, action realizes predictions that are
biased toward preferred outcomes. More specifically, the generative model
entails beliefs about future states and policies, where policies that lead to
preferred outcomes are more likely. This enables action to realize the next
(proximal) outcome predicted by the policy that leads to (distal) goals. This
behavior emerges when action and inference maximize the evidence or
marginal likelihood of the model generating predictions. Note that action
is prescribed by predictions of the next outcome and is not itself part of
the inference process. This separation of action and perceptual inference or
state estimation can be understood by associating action with peripheral
reflexes in the motor system that fulfill top-down motor predictions about
how we move (Feldman, 2009; Adams, Shipp, & Friston, 2013).
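The idea that policies leading to preferred outcomes are more likely, and that action then realizes what the favored policy predicts, can be illustrated with a deliberately simplified sketch. Everything here is an assumption made for illustration: the two policies, their predicted outcomes, and the preference values are invented, and policies are scored only by expected log preference (the pragmatic part), whereas the article's policy selection rests on expected free energy, which is introduced later and contains further terms.

```python
import numpy as np

def softmax(x):
    x = np.asarray(x, dtype=float)
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical toy problem: 2 policies, 3 possible outcomes.
# predicted_final[k] is the outcome distribution policy k is expected to reach.
predicted_final = np.array([[0.9, 0.0, 0.1],
                            [0.1, 0.1, 0.8]])
# first_action[k] is the action that policy k prescribes right now
first_action = np.array([0, 1])

# Prior preferences expressed as log probabilities; outcome 2 is preferred
log_preference = np.log(softmax([0.0, 0.0, 3.0]))

# Expected log preference of each policy (pragmatic value only)
value = predicted_final @ log_preference

# Policies that lead to preferred outcomes receive higher probability
policy_prob = softmax(value)

# Simplification: act on the first step of the most probable policy
action = int(first_action[np.argmax(policy_prob)])
```

Here the second policy is expected to reach the preferred outcome, so it receives most of the probability mass and its first action is selected. Note that the action is read off from the predictions after inference; it does not enter the belief update itself.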
The models considered in this article include states of the world in
the past and the future. This enables agents to select policies that will
maximize model evidence in the future by minimizing expected free en-
ergy. Furthermore, it enables learning about contingencies based on state
transitions that are inferred retrospectively. We will see that this leads to a
Bayes-optimal arbitration between epistemic (explorative) and pragmatic