Bayes in the sky: Bayesian inference and model selection in cosmology
Roberto Trotta
Contemporary Physics, Vol. 49, Iss. 2, pp. 71–104 (4 July 2008)
arXiv:0803.4089v1 [astro-ph] 28 Mar 2008
Contemporary Physics, Vol. 00, No. 00, Month–Month 2006, 1–41
Bayes in the sky:
Bayesian inference and model selection in cosmology
Roberto Trotta
Oxford University, Astrophysics Department
Denys Wilkinson Building, Keble Rd, Oxford, OX1 3RH, UK
(March 28, 2008)
The application of Bayesian methods in cosmology and astrophysics has flourished over the past decade, spurred
by data sets of increasing size and complexity. In many respects, Bayesian methods have proven to be vastly superior
to more traditional statistical tools, offering the advantage of higher efficiency and of a consistent conceptual basis
for dealing with the problem of induction in the presence of uncertainty. This trend is likely to continue in the future,
when the way we collect, manipulate and analyse observations and compare them with theoretical models will assume
an even more central role in cosmology.
This review is an introduction to Bayesian methods in cosmology and astrophysics and recent results in the field.
I first present Bayesian probability theory and its conceptual underpinnings, Bayes’ Theorem and the role of priors.
I discuss the problem of parameter inference and its general solution, along with numerical techniques such as Monte
Carlo Markov Chain methods. I then review the theory and application of Bayesian model comparison, discussing
the notions of Bayesian evidence and effective model complexity, and how to compute and interpret those quantities.
Recent developments in cosmological parameter extraction and Bayesian cosmological model building are summarized,
highlighting the challenges that lie ahead.
Keywords: Bayesian methods; model comparison; cosmology; parameter inference; data analysis; statistical methods.
1 Introduction
At first glance, it might appear surprising that a trivial mathematical result obtained by an obscure minister over 200 years ago ought still to excite so much interest across so many disciplines, from econometrics to biostatistics, from financial risk analysis to cosmology. Published posthumously thanks to Richard Price in 1763, “An essay towards solving a problem in the doctrine of chances” by the Rev. Thomas Bayes (1701(?)–1761) [1] had nothing in it that could herald the growing importance and enormous domain of application that the subject of Bayesian probability theory would acquire more than two centuries afterwards. However, upon reflection there is a very good reason why Bayesian methods are undoubtedly on the rise in this particular historical epoch: the exponential increase in computational power of the last few decades made massive numerical inference feasible for the first time, thus opening the door to the exploitation of the power and flexibility of a rich set of Bayesian tools. Thanks to fast and cheap computing machines, previously unsolvable inference problems became tractable, and algorithms for numerical simulation flourished almost overnight.
Historically, the connections between physics and Bayesian statistics have always been very strong. Many ideas were developed because of related physical problems, and physicists made several distinguished contributions. One has only to think of people like Laplace, Bernoulli, Gauss, Metropolis, Jeffreys, etc. Cosmology is perhaps among the latest disciplines to have embraced Bayesian methods, a development mainly driven by the data explosion of the last decade, as Figure 1 indicates. However, motivated by difficult and computationally intensive inference problems, cosmologists are increasingly coming up with new solutions that add to the richness of a growing Bayesian literature.
Email: rxt@astro.ox.ac.uk

Some cosmologists are sceptical regarding the usefulness of employing more advanced statistical methods, perhaps because they think with Mark Twain that there are “lies, damned lies and statistics”. One argument that is often heard is that there is no point in bothering too much about refined statistical analyses, as better data will in the future resolve the question one way or another, be it the nature of dark energy or the initial conditions of the Universe. I strongly disagree with this view, and would instead argue that sophisticated statistical tools will be increasingly central for modern cosmology. This opinion is motivated by the following reasons:
(i) The complexity of the modelling of both our theories and observations will always increase, thus requiring correspondingly more refined statistical and data analysis skills. In fact, the scientific return of the next generation of surveys will be limited by the level of sophistication and efficiency of our inference tools.
(ii) The discovery zone for new physics is when a potentially new effect is seen at the 3–4 σ level. This is when tantalizing suggestions of an effect start to accumulate but there is no firm evidence yet. In this potential discovery region a careful application of statistics can make the difference between claiming or missing a new discovery.
(iii) If you are a theoretician, you do not want to waste your time trying to explain an effect that is not there in the first place. A better appreciation of the interpretation of statistical statements might help in distinguishing robust claims from spurious ones.
(iv) Limited resources mean that we need to focus our efforts on the most promising avenues. Experiment forecasting and optimization will increasingly become prominent as we need to use all of our current knowledge (and the associated uncertainty) to identify the observations and strategies that are likely to give the highest scientific return in a given field.
(v) Sometimes there will be no better data! This is the case for the many problems associated with cosmic-variance-limited measurements on large scales, for example in the cosmic background radiation, where the small number of independent directions on the sky makes it impossible to reduce the error below a certain level.
This review focuses on Bayesian methodologies and related issues, presenting some illustrative results where appropriate and reviewing the current state of the art of Bayesian methods in cosmology. The emphasis is on the innovative character of Bayesian tools. The level is introductory, pitched for graduate students who are approaching the field for the first time, aiming at bridging the gap between basic textbook examples and application to current research. In the last sections we present some more advanced material that we hope might be useful for the seasoned practitioner, too. A basic understanding of cosmology and of the interplay between theory and cosmological observations (at the level of the introductory chapters in [2]) is assumed. A full list of references is provided as a comprehensive guide to relevant literature across disciplines.
This paper is organized in two main parts. The first part, sections 2–4, focuses on probability theory, methodological issues and Bayesian methods generally. In section 2 we present the fundamental distinction between probability as frequency or as degree of belief, we introduce Bayes’ Theorem and discuss the meaning and role of priors in Bayesian theory. Section 3 is devoted to Bayesian parameter inference and related issues in parameter extraction. Section 4 deals with the topic of Bayesian model comparison from a conceptual and technical point of view, covering Occam’s razor principle, its practical implementation in the form of the Bayesian evidence, the effective number of model parameters and information criteria for approximate model comparison. The second part presents applications to cosmological parameter inference and related topics (section 5) and to Bayesian cosmological model building (section 6), including multi-model inference and model comparison forecasting. Section 7 gives our conclusions.
2 Bayesian probability theory
In this section we introduce the basic concepts and the notation we employ. After a discussion of what probability is, we turn to the central formula for Bayesian inference, namely Bayes’ Theorem. The whole of Bayesian inference follows from this extremely simple cornerstone. We then present some views about the meaning of priors and their role in Bayesian theory, an issue which has always been (wrongly) considered a weak point of Bayesian statistics.

[Figure 1: bar chart, “Number of Bayesian papers in cosmology and astrophysics” versus publication year (’90 to ’07), counts ranging from 0 to about 30; two series: all papers (incl. conference proceedings) and journal articles only.]
Figure 1. The evolution of the B–word: number of articles in astronomy and cosmology with “Bayesian” in the title, as a function of publication year. The number of papers employing one form or another of Bayesian methods is of course much larger than that. Up until about 1995, Bayesian papers were concerned mostly with image reconstruction techniques, while in subsequent years the domain of application grew to include signal processing, parameter extraction, object detection, cosmological model building, decision theory and experiment optimization, and much more. It appears that interest in Bayesian statistics began growing around 2002 (source: NASA/ADS).
There are many excellent textbooks on Bayesian statistics: the works by Sir Harold Jeffreys [3] and Bruno de Finetti [4] are classics, while an excellent modern introduction with an extensive reading list is given by [5]. A good textbook is [6]. Worth reading as a source of inspiration is the thought–provoking monograph by E. T. Jaynes [7]. Computational aspects are treated in [8], while MacKay [9] has a quite unconventional but inspiring choice of topics with many useful exercises. Two very good textbooks on the subject written by physicists are [10, 11]. A nice introductory review aimed at physicists is [12] (see also [13]). Tom Loredo has some masterfully written introductory material, too [14, 15]. A good source expanding on many of the topics covered here is Ref. [16].
2.1 What is probability?
2.1.1 Probability as frequency. The classical approach to statistics defines the probability of an event
as
“the number of times the event occurs over the total number of trials, in the limit of an infinite series of
equiprobable repetitions.”
This is the so–called frequentist school of thought. This definition of probability is however unsatisfactory in many respects.
(i) Strikingly, this definition of probability in terms of relative frequency of outcomes is circular, i.e. it assumes that repeated trials have the same probability of outcomes, but it was the very notion of probability that we were trying to define in the first place!
(ii) It cannot handle unrepeatable situations, such as the probability that I will be overrun by a car when crossing the street, or, in the cosmological context, questions concerning the properties of the observable Universe as a whole, of which we have exactly one sample. Indeed, perfectly legitimate questions such as “what is the probability that it was raining in Oxford when William I was crowned?” cannot even be formulated in classical statistics.
(iii) The definition only holds exactly for an infinite sequence of repetitions. In practice we always deal with a finite number of measurements, sometimes with actually only a very small number of them. How can we assess how many repetitions are sufficient? And what shall we do when we have only a handful of repetitions? Frequentist statistics does not say, except sometimes devising complicated ad-hockeries to correct for “small sample size” effects. In practice, physicists tend to forget about the “infinite series” requirement and use this definition and the results that go with it (for example, about asymptotic distributions of test statistics) for whatever number of samples they happen to be working with (a short numerical illustration of the finite-sample issue is sketched after this list).
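As a minimal numerical illustration of point (iii), added here for concreteness and not part of the original text, the following sketch simulates series of tosses of a coin that is fair by construction and prints the observed relative frequency of heads. The scatter at small n shows how far a finite series can lie from the limiting value that the frequentist definition appeals to.

```python
import numpy as np

rng = np.random.default_rng(42)
p_true = 0.5  # the simulated coin is fair by construction

# Relative frequency of heads for increasingly long series of tosses.
# Even for a perfectly fair coin, short series scatter appreciably around
# the limiting value 0.5 invoked by the frequentist definition.
for n in (10, 100, 1000, 100000):
    heads = rng.binomial(n, p_true)
    print(f"n = {n:6d}   relative frequency of heads = {heads / n:.3f}")
```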
Another, more subtle aspect has to do with the notion of “randomness”. Restricting ourselves to classical (non–chaotic) physical systems for now, let us consider the paradigmatic example of a series of coin tosses. From an observed sequence of heads and tails we would like to come up with a statistical statement about the fairness of the coin, which is deemed to be “fair” if the probability of getting heads is p_H = 0.5. At first sight, it might appear plausible that the task is to determine whether the coin possesses some physical property (for example, a tensor of inertia symmetric about the plane of the coin) that will ensure that the outcome is indifferent with respect to the interchange of heads and tails. As forcefully argued by Jaynes [7], however, the probability of the outcome of a sequence of tosses has nothing to do with the physical properties of the coin being tested! In fact, a skilled coin–tosser (or a purpose–built machine, see [17]) can influence the outcome quite independently of whether the coin is well–balanced (i.e., symmetric) or heavily loaded. The key to the outcome is in fact the definition of a random toss. In a loose, intuitive fashion, we sense that a carefully controlled toss, say one in which we are able to set quite precisely the spin and speed of the coin, will spoil the “randomness” of the experiment; in fact, we might well call it “cheating”. However, lacking a precise operational definition of what a “random toss” means, we cannot meaningfully talk of the probability of getting heads as of a physical property of the coin itself. It appears that the outcome depends on our state of knowledge about the initial conditions of the system (angular momentum and velocity of the toss): a lack of precise information about the initial conditions results in a state of knowledge of indifference about the possible outcome with respect to the specification of heads or tails. If however we insist on defining probability in terms of the outcome of random experiments, we immediately get locked into a circularity when we try to specify what “random” means. For example, one could say that

“a random toss is one for which the sequence of heads and tails is compatible with assuming the hypothesis p_H = 0.5.”

But the latter statement is exactly what we were trying to test in the first place by using a sequence of random tosses! We are back to the problem of circular definition we highlighted above.
2.1.2 Probability as degree of belief. Many of the limitations above can be avoided and paradoxes resolved by taking a Bayesian stance about probabilities. The Bayesian viewpoint is based on the simple and intuitive tenet that

“probability is a measure of the degree of belief about a proposition”.

It is immediately clear that this definition of probability applies to any event, regardless of whether we are considering repeated experiments (e.g., what is the probability of obtaining 10 heads in as many tosses of a coin?) or one–off situations (e.g., what is the probability that it will rain tomorrow?). Another advantage is that it deals with uncertainty independently of its origin, i.e. there is no distinction between “statistical uncertainty”, coming from the finite precision of the measurement apparatus and the associated random noise, and “systematic uncertainty”, deriving from deterministic effects that are only partially known (e.g., calibration uncertainty of a detector). From the coin tossing example above we learn that it makes good sense to think of probability as a state of knowledge in the presence of partial information and that “randomness” is really a consequence of our lack of information about the exact conditions of the system (if we knew the precise way the coin is flipped we could predict the outcome of any toss with certainty; the case of quantum probabilities is discussed below). The rule for manipulating states of belief is given by Bayes’ Theorem, which is introduced in Eq. (5) below.
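As a concrete, admittedly anticipatory illustration of probability as a state of knowledge updated by data, the following sketch is added here and is not part of the original text. It applies the standard Beta–Bernoulli conjugate update, a special case of the Bayes’ Theorem introduced in Eq. (5) below, to a hypothetical sequence of coin tosses: the posterior for p_H narrows as tosses accumulate, reflecting an improving state of knowledge rather than any physical property of the coin.

```python
import numpy as np
from scipy import stats

# Hypothetical observed tosses: 1 = heads, 0 = tails (illustration only).
tosses = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])

# A uniform prior on p_H is Beta(1, 1); the Beta-Bernoulli conjugate update
# then gives the posterior Beta(1 + #heads, 1 + #tails).
n_heads = int(tosses.sum())
n_tails = len(tosses) - n_heads
posterior = stats.beta(1 + n_heads, 1 + n_tails)

lo, hi = posterior.interval(0.95)
print(f"posterior mean of p_H:         {posterior.mean():.3f}")
print(f"95% credible interval for p_H: [{lo:.3f}, {hi:.3f}]")
```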
It seems to us that the above arguments strongly favour the Bayesian view of probability (a more detailed discussion can be found in [7, 14]). Ultimately, as physicists we might as well take the pragmatic view that the approach that yields demonstrably superior results ought to be preferred. In many real–life cases, there are several good reasons to prefer a Bayesian viewpoint:
(i) Classical frequentist methods are often based on asymptotic properties of estimators. Only a handful of cases exist that are simple enough to be amenable to analytic treatment (in physical problems one most often encounters the Normal and the Poisson distribution). Often, methods based on such distributions are employed not because they accurately describe the problem at hand, but because of the lack of better tools. This can lead to serious mistakes. Bayesian inference is not affected by such problems: it can be shown that application of Bayes’ Theorem recovers frequentist results (in the long run) for cases simple enough where such results exist, while remaining applicable to questions that cannot even be asked in a frequentist context.
(ii) Bayesian inference deals effortlessly with nuisance parameters. Those are parameters that have an influence on the data but are of no interest to us. For example, a problem commonly encountered in astrophysics is the estimation of a signal in the presence of a background rate (see [14, 18, 19]). The particles of interest might be photons, neutrinos or cosmic rays. Measurements of the source s must account for uncertainty in the background, described by a nuisance parameter b. The Bayesian procedure is straightforward: infer the joint probability of s and b and then integrate over the uninteresting nuisance parameter b (“marginalization”, see Eq. (16); a short numerical sketch of this step is given after this list). Frequentist methods offer no simple way of dealing with nuisance parameters (the very name derives from the difficulty of accounting for them in classical statistics). However, neglecting nuisance parameters or fixing them to their best–fit value can result in a very serious underestimation of the uncertainty on the parameters of interest (see [20] for an example involving galaxy evolution models).
(iii) In many situations prior information is highly relevant and omitting it would result in seriously wrong inferences. The simplest case is when the parameters of interest have a physical meaning that restricts their possible values: masses, count rates, power and light intensity are examples of quantities that must be positive. Frequentist procedures based only on the likelihood can give best–fit estimates that are negative, and hence meaningless, unless special care is taken (for example, constrained likelihood methods). This often happens in the regime of small counts or low signal to noise. The use of Bayes’ Theorem ensures that relevant prior information is accounted for in the final inference and that physically meaningless results are weeded out from the beginning.
(iv) Bayesian statistics only deals with the data that were actually observed, while frequentist methods focus on the distribution of possible data that have not been obtained. As a consequence, frequentist results can depend on what the experimenter thinks about the probability of data that have not been observed (this is called the “stopping rule” problem; a numerical illustration is given at the end of this section). This state of affairs is obviously absurd. Our inferences should not depend on the probability of what could have happened but should be conditional on whatever has actually occurred. This is built into Bayesian methods from the beginning since inferences are by construction conditional on the observed data.
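The following sketch, added for illustration and not from the original text, makes the marginalization step of item (ii) explicit in a deliberately simplified toy setting: the joint posterior of a signal s and a background b is evaluated on a grid, assuming Gaussian likelihoods for a measured total rate and an independent background calibration together with flat positivity priors, and is then summed over the nuisance parameter b to obtain the marginal posterior for s (the analogue of Eq. (16) of the text). All numbers and the Gaussian model are hypothetical.

```python
import numpy as np

# Hypothetical toy data: a measured total rate d = s + b with Gaussian noise,
# plus an independent calibration measurement of the background b.
d_obs, sigma_d = 12.0, 2.0      # total rate measurement and its error
b_cal, sigma_b = 5.0, 1.5       # background calibration and its error

s = np.linspace(0.0, 20.0, 400)   # signal grid (positivity enforced by the prior range)
b = np.linspace(0.0, 15.0, 300)   # background (nuisance parameter) grid
S, B = np.meshgrid(s, b, indexing="ij")

# Joint posterior on the grid: Gaussian likelihoods times flat priors on s >= 0, b >= 0.
log_post = (-0.5 * ((d_obs - (S + B)) / sigma_d) ** 2
            - 0.5 * ((b_cal - B) / sigma_b) ** 2)
post = np.exp(log_post - log_post.max())

# Marginalization: sum the joint posterior over the uninteresting nuisance
# parameter b to obtain the posterior for the signal s alone.
post_s = post.sum(axis=1)
post_s /= post_s.sum() * (s[1] - s[0])   # normalize to unit area

s_mean = (s * post_s).sum() * (s[1] - s[0])
print(f"posterior mean of the signal s: {s_mean:.2f}")
```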
However one looks at the question, it is fair to say that the debate among statisticians is far from settled (for a discussion geared for physicists, see [21]). Louis Lyons neatly summarized the state of affairs by saying that [22]

“Bayesians address the question everyone is interested in by using assumptions no–one believes, while frequentists use impeccable logic to deal with an issue of no interest to anyone”.
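To illustrate the stopping-rule problem mentioned in item (iv), the sketch below works through the classic textbook example; it is added here for illustration and does not appear in the original text. The same observed data, 9 heads and 3 tails, yield different frequentist p-values for the hypothesis p_H = 0.5 depending on whether the experimenter intended to stop after 12 tosses or after the third tail, whereas the Bayesian posterior for p_H is identical in both cases because the two likelihoods are proportional.

```python
from scipy import stats

heads, tails = 9, 3   # the same observed data under two different stopping rules

# Frequentist p-values for testing p_H = 0.5 against p_H > 0.5:
# (a) stop after a fixed number of 12 tosses -> binomial sampling distribution
p_binom = stats.binom.sf(heads - 1, heads + tails, 0.5)
# (b) stop after the 3rd tail -> negative binomial sampling distribution
p_negbin = stats.nbinom.sf(heads - 1, tails, 0.5)
print(f"p-value, fixed-n stopping rule:      {p_binom:.4f}")
print(f"p-value, stop-at-3rd-tail rule:      {p_negbin:.4f}")

# The Bayesian posterior depends only on the data actually observed: the two
# likelihoods are proportional, so a flat Beta(1, 1) prior gives the same
# Beta(10, 4) posterior for p_H under either stopping rule.
posterior = stats.beta(1 + heads, 1 + tails)
print(f"posterior mean of p_H (either rule): {posterior.mean():.4f}")
```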
