IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. XX, NO. XX, 200X 1
Smoothing Algorithms for State-Space Models
Mark Briers, Arnaud Doucet, and Simon Maskell
Abstract
A prevalent problem in statistical signal processing, applied statistics, and time series analysis is the calculation of the smoothed
posterior distribution, which describes the uncertainty associated with a state, or a sequence of states, conditional on data from the
past, the present, and the future. The aim of this paper is to provide a rigorous foundation for the calculation, or approximation, of
such smoothed distributions, to facilitate a robust and efficient implementation. Through a cohesive and generic exposition of the
scientific literature we offer several novel extensions such that one can perform smoothing in the most general case. Experimental
results for: a Jump Markov Linear System; a comparison of particle smoothing methods; and parameter estimation using a particle
implementation of the EM algorithm, are provided.
Index Terms
State space smoothing, Hidden Markov Model, Kalman filter, Kalman smoother, Jump Markov Linear System, Particle filter,
Particle smoother, Parameter estimation.
I. INTRODUCTION
One is often interested in quantifying the uncertainty associated with an unobserved variable, X, given some information pertaining to that variable, Y. In a sequential context, this relates to the calculation of the posterior density,^1 p(x_{1:T} | y_{1:T}), where x_{1:T} = {x_1, . . . , x_T} and y_{1:T} = {y_1, . . . , y_T} are generic points in the path space of the signal and observation processes, and the discrete time index t ∈ N^+. To facilitate sequential estimation, a first-order Markovian state space model is often assumed [46]. Furthermore, the observations are assumed to be conditionally independent given the state X_t = x_t. One thus has mathematical expressions for the state transition and observation densities, as follows:

X_t | X_{t-1} = x_{t-1} ~ f(· | x_{t-1});   (1)
Y_t | X_t = x_t ~ g(· | x_t),   (2)

with an appropriately defined prior density, X_1 ~ µ(·).
For many state-space applications (e.g. tracking [4]) it is more convenient to compute the marginal posterior distribution of the state at a particular time instance conditional on a sequence of data, p(x_t | y_{1:l}). If l < t then this process is known as prediction; if l = t then it is commonly referred to as filtering; and if l > t then one is conducting the process of smoothing.
The main objective of this article is to present a generic exposition of the vast array of literature available in the scientific research community that attempts to solve the 'smoothing' problem. Moreover, we are able to offer several novel improvement strategies for the techniques available by identifying and solving problems which were previously ignored. We illustrate these novel algorithms on several examples.
The smoothing problem is commonly segmented into three problems: fixed-interval smoothing, where one is interested in calculating p(x_t | y_{1:T}) for all time indices t = 1, . . . , T; fixed-lag smoothing, where one calculates p(x_t | y_{1:t+L}), where L > 0 is some fixed value (this density is calculated on-line); and fixed-point smoothing, where p(x_t | y_{1:D}) is calculated for a fixed value t with D > t increasing.
The most common application of smoothing is the (off-line) fixed-interval smoothing algorithm. The authors assert that it
is possible to employ a single smoothing scheme, based on fixed-interval smoothing, to solve all three problems. Details
of this proposition are now provided.
Several schemes have been suggested to solve the fixed-lag smoothing problem, with a favoured technique being the augmentation of the states over the lag L. In general, however, one then has a computational cost which is exponential in the dimension of the state space, n_x. A more suitable technique would be to store the latest filtering distribution p(x_{T-1} | y_{1:T-1}), and calculate the new filtering distribution p(x_T | y_{1:T}) on the receipt of a new measurement. One is then able to employ a (traditionally considered) fixed-interval smoothing algorithm to calculate p(x_{T-L} | y_{1:T}). This gives rise to a computational cost which is linear in the length of the lag.
A similar situation occurs with fixed point smoothing, in that it is possible to calculate a distribution p(x_t | y_{1:D}) on the receipt of a new measurement by calculating a new filtering distribution at the latest time instance and stepping back through the lag. However, one can calculate a revised smoothed distribution without the need for recursing through the intermediate history of the states, using the same fixed-interval type smoothing algorithm repository, by calculating a distribution over the joint state (X_t, X_D). Note that while this method is computationally attractive and allows the methodology to be presented in a generic framework, particle based methods for fixed point smoothing would exhibit degeneracy as the number of data increased [11]. However, in most cases, the fixed-lag smoothing distribution converges exponentially fast to a fixed distribution, so it is possible to stop including new data.

Manuscript received XX; revised XX.
M. Briers is with the Advanced Signal and Information Processing group, QinetiQ Ltd, Malvern, UK, and Cambridge University Engineering Department, Cambridge, UK. Email: m.briers@signal.qinetiq.com, or mb511@eng.cam.ac.uk
A. Doucet is with Cambridge University Engineering Department, Cambridge, UK. Email: ad2@eng.cam.ac.uk
S. Maskell is with the Advanced Signal and Information Processing group, QinetiQ Ltd, Malvern, UK. Email: s.maskell@signal.qinetiq.com
^1 Appropriate nomenclature and notation can be substituted for the discrete analogue.
It is noted that one is also often interested in calculating the joint smoothing distribution, particularly in the fixed-interval (p(x_{1:T} | y_{1:T})) and fixed-lag (p(x_{T-L:T} | y_{1:T})) scenarios, and we are able to perform such calculations by exploiting the sequential nature of the problem, which is discussed in the sequel.
The format of the paper is as follows: the following section identifies the filtering and smoothing recursions for general state
space models and suggests a rigorous framework on which one can base smoothing algorithms. The subsequent five sections
then discuss the algorithmic implementation of such recursions under differing scenarios: finite state space hidden Markov
models, linear Gaussian state space models, jump Markov linear systems, analytic approximations for general state models,
and sequential Monte Carlo approximations for general state space models, respectively. Simulation results are presented in
Section VIII, with conclusions drawn in Section IX.
II. FILTERING AND SMOOTHING FOR GENERAL STATE SPACE MODELS
The derivations of the recursions used in the filtering, forward-backward smoothing, and two-filter smoothing procedures for
general state space models are now described. Note that by replacing the integrals with summations, and the notation used to
denote a probability density function with an appropriate term to denote a probability mass function, one arrives at expressions
for both the continuous and discrete state space case.
A. Filtering
Filtering is the term used to describe the process of recursively calculating the marginal posterior distribution, and is a prerequisite in all of the algorithms discussed herein. We therefore summarise the filtering procedure in each of the subsequent sections. The filtering recursion can be derived as follows:

p(x_t | y_{1:t}) = p(y_t | x_t, y_{1:t-1}) p(x_t | y_{1:t-1}) / p(y_t | y_{1:t-1})
               ∝ g(y_t | x_t) p(x_t | y_{1:t-1}),   (3)

where:

p(x_t | y_{1:t-1}) = ∫ f(x_t | x_{t-1}) p(x_{t-1} | y_{1:t-1}) dx_{t-1}.
One can see that this algorithm is recursive in nature; equation (3) relies solely upon the state transition and observation
densities, and the posterior density from the previous time instance.
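As an illustrative aside (not part of the original exposition), recursion (3) can be approximated by a bootstrap particle filter of the kind developed in the SMC sections later in the paper. The scalar linear-Gaussian model, its parameter values, and all function names below are assumptions chosen only so that the sketch is self-contained:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scalar model (an assumption chosen for illustration):
#   X_t = a X_{t-1} + V_t,  V_t ~ N(0, q)   (transition density f)
#   Y_t = X_t + W_t,        W_t ~ N(0, r)   (observation density g)
a, q, r = 0.9, 1.0, 1.0

def simulate(T):
    """Draw one state trajectory and its observations from the model."""
    x, y = np.zeros(T), np.zeros(T)
    x[0] = rng.normal(0.0, 1.0)                      # X_1 ~ mu = N(0, 1)
    y[0] = x[0] + rng.normal(0.0, np.sqrt(r))
    for t in range(1, T):
        x[t] = a * x[t - 1] + rng.normal(0.0, np.sqrt(q))
        y[t] = x[t] + rng.normal(0.0, np.sqrt(r))
    return x, y

def bootstrap_filter(y, n=2000):
    """Approximate p(x_t | y_{1:t}) of eq. (3) by weighted particles."""
    T = len(y)
    means = np.zeros(T)
    particles = rng.normal(0.0, 1.0, n)              # sample from mu
    for t in range(T):
        if t > 0:
            # prediction: propagate particles through f(x_t | x_{t-1})
            particles = a * particles + rng.normal(0.0, np.sqrt(q), n)
        # update: weight each particle by the likelihood g(y_t | x_t)
        logw = -0.5 * (y[t] - particles) ** 2 / r
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means[t] = np.sum(w * particles)
        # multinomial resampling to avoid weight degeneracy
        particles = rng.choice(particles, size=n, p=w)
    return means

x, y = simulate(50)
est = bootstrap_filter(y)
mse = np.mean((est - x) ** 2)
```

With a few thousand particles, the estimated filtering means track the hidden states closely for this model.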
B. Forward-backward smoothing
It is possible to deduce the 'standard' forward-backward recursive expression for the marginal smoothed posterior distribution, p(x_t | y_{1:T}), as follows [1]:

p(x_t | y_{1:T}) = ∫ p(x_t, x_{t+1} | y_{1:T}) dx_{t+1}
               = ∫ p(x_{t+1} | y_{1:T}) p(x_t | x_{t+1}, y_{1:T}) dx_{t+1}
               = ∫ p(x_{t+1} | y_{1:T}) p(x_t | x_{t+1}, y_{1:t}) dx_{t+1}
               = p(x_t | y_{1:t}) ∫ [p(x_{t+1} | y_{1:T}) f(x_{t+1} | x_t) / p(x_{t+1} | y_{1:t})] dx_{t+1}.   (4)

It is thus possible to compute the filtered and predicted distributions in a forward (filtering) recursion of the algorithm (by calculating p(x_t | y_{1:t})), and then execute a backward recursion, with each smoothed distribution (p(x_t | y_{1:T})) relying upon the quantities calculated and the previous (in reverse time) smoothed distribution (p(x_{t+1} | y_{1:T})).
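In the linear Gaussian case (treated later in the paper), recursion (4) reduces to the well-known Rauch-Tung-Striebel smoother. The following sketch, with an assumed scalar model and parameter values, illustrates the forward (filtering) and backward (smoothing) passes:

```python
import numpy as np

# Linear-Gaussian special case of recursion (4): for X_t = a X_{t-1} + V_t,
# Y_t = X_t + W_t with Gaussian noise, (4) reduces to the RTS smoother.
# The scalar model below is an assumption chosen for illustration.
a, q, r = 0.9, 1.0, 1.0

def kalman_filter(y, m0=0.0, p0=1.0):
    """Forward pass: predicted and filtered Gaussian moments."""
    T = len(y)
    m = np.zeros(T)    # filtered means   E[x_t | y_{1:t}]
    p = np.zeros(T)    # filtered variances
    mp = np.zeros(T)   # predicted means  E[x_t | y_{1:t-1}]
    pp = np.zeros(T)   # predicted variances
    for t in range(T):
        mp[t] = a * m[t - 1] if t > 0 else m0
        pp[t] = a * a * p[t - 1] + q if t > 0 else p0
        k = pp[t] / (pp[t] + r)            # Kalman gain
        m[t] = mp[t] + k * (y[t] - mp[t])
        p[t] = (1.0 - k) * pp[t]
    return m, p, mp, pp

def rts_smoother(m, p, mp, pp):
    """Backward pass of (4) in closed form (RTS recursions)."""
    T = len(m)
    ms, ps = m.copy(), p.copy()            # smoothed moments, init at t = T
    for t in range(T - 2, -1, -1):
        c = p[t] * a / pp[t + 1]           # smoother gain
        ms[t] = m[t] + c * (ms[t + 1] - mp[t + 1])
        ps[t] = p[t] + c * c * (ps[t + 1] - pp[t + 1])
    return ms, ps

y = np.array([0.5, 1.2, -0.3, 0.8, 0.1])
m, p, mp, pp = kalman_filter(y)
ms, ps = rts_smoother(m, p, mp, pp)
```

Note that the smoothed and filtered quantities coincide at the final time index, and the smoothed variances never exceed the filtered ones, as expected from conditioning on more data.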

C. Two-filter smoothing
One can compute the marginal smoothed posterior distribution by combining the output of two (independent) filters, one that recurses in the forwards time direction and calculates p(x_t | y_{1:t-1}), the other in the backward time direction calculating p(y_{t:T} | x_t), to give the required distribution p(x_t | y_{1:T}) [6], [31], [32]. This is constructed as follows:

p(x_t | y_{1:T}) = p(x_t | y_{1:t-1}, y_{t:T})
               = p(x_t | y_{1:t-1}) p(y_{t:T} | y_{1:t-1}, x_t) / p(y_{t:T} | y_{1:t-1})
               ∝ p(x_t | y_{1:t-1}) p(y_{t:T} | x_t)   (5)
               ∝ p(x_t | y_{1:t}) p(y_{t+1:T} | x_t).
This form of smoothing reduces the complexity of the smoothing procedure for certain modelling assumptions.
A Backward Information Filter [37] can be used to calculate p(y_{t:T} | x_t) sequentially from p(y_{t+1:T} | x_{t+1}):

p(y_{t:T} | x_t) = ∫ p(y_t, y_{t+1:T}, x_{t+1} | x_t) dx_{t+1}
               = ∫ p(y_{t+1:T} | x_{t+1}) f(x_{t+1} | x_t) g(y_t | x_t) dx_{t+1}.   (6)
Note that p(y_{t:T} | x_t) is not a probability density function (pdf) in the argument x_t and thus its integral over x_t might not be finite. However, it is often more convenient in practice to propagate a pdf. Moreover, particle based methods can only be used to approximate finite measures. To cater for these cases, and thus ensure that the propagated quantity is integrable over x_t, we introduce an artificial prior distribution over x_t, denoted γ_t(x_t). This concept is now described.
Given the initial condition:

p̃(x_T | y_T) = g(y_T | x_T) γ_T(x_T) / p(y_T)

and defining the sequence of probability distributions:

p̃(x_{t:T} | y_{t:T}) ∝ γ_t(x_t) ∏_{i=t+1}^{T} f(x_i | x_{i-1}) ∏_{i=t}^{T} g(y_i | x_i),

where t < T, one can see that:

p(y_{t:T} | x_t) = ∫ · · · ∫ p(y_{t:T}, x_{t+1:T} | x_t) dx_{t+1:T}
 = ∫ · · · ∫ p(x_{t+1:T} | x_t) p(y_{t:T} | x_{t:T}) dx_{t+1:T}
 = ∫ · · · ∫ ∏_{i=t+1}^{T} f(x_i | x_{i-1}) ∏_{i=t}^{T} g(y_i | x_i) dx_{t+1:T}
 = ∫ · · · ∫ [γ_t(x_t) / γ_t(x_t)] ∏_{i=t+1}^{T} f(x_i | x_{i-1}) ∏_{i=t}^{T} g(y_i | x_i) dx_{t+1:T}
 ∝ ∫ · · · ∫ [p̃(x_{t:T} | y_{t:T}) / γ_t(x_t)] dx_{t+1:T}
 = p̃(x_t | y_{t:T}) / γ_t(x_t).   (7)
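Because every integral becomes a finite sum on a discrete state space, identity (7) can be verified directly by path enumeration. In the sketch below, the two-state model, the likelihood values, and the choice of γ are all illustrative assumptions:

```python
import numpy as np
from itertools import product

# Brute-force check of identity (7) on a tiny two-state chain: the backward
# information quantity p(y_{t:T} | x_t), obtained by enumerating every path
# x_{t+1:T}, is proportional to p_tilde(x_t | y_{t:T}) / gamma_t(x_t).
# Transition matrix, likelihoods, and gamma are assumptions for illustration.
nx, T, t0 = 2, 4, 1                      # check at time index t0 (0-based)
F = np.array([[0.7, 0.3],
              [0.4, 0.6]])               # F[i, j] = f(x_{k+1} = j | x_k = i)
G = np.array([[0.9, 0.2],
              [0.3, 0.8],
              [0.5, 0.5],
              [0.1, 0.6]])               # G[k, j] = g(y_k | x_k = j)
gamma = np.array([0.3, 0.7])             # artificial prior gamma_{t0}(x_{t0})

def path_weight(states, with_gamma):
    """Product of f and g terms along a path x_{t0:T}, optionally x gamma."""
    w = gamma[states[0]] if with_gamma else 1.0
    w *= G[t0, states[0]]
    for k in range(1, len(states)):
        w *= F[states[k - 1], states[k]] * G[t0 + k, states[k]]
    return w

lhs = np.zeros(nx)                       # p(y_{t0:T} | x_{t0}), eq. (6) unrolled
rhs = np.zeros(nx)                       # marginal of p_tilde(x_{t0:T} | y_{t0:T})
for i in range(nx):
    for path in product(range(nx), repeat=T - 1 - t0):
        lhs[i] += path_weight((i,) + path, with_gamma=False)
        rhs[i] += path_weight((i,) + path, with_gamma=True)
rhs /= rhs.sum()                         # normalise the p_tilde marginal
rhs /= gamma                             # divide out the artificial prior, eq. (7)
ratio = lhs / rhs                        # constant across states iff (7) holds
```

The ratio between the two sides is the same for every state, confirming the proportionality in (7) for an arbitrary strictly positive γ.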
This derivation allows one to construct a filtering-like recursion to determine the information quantity, whilst ensuring that one propagates finite measures:

p(y_{t:T} | x_t) = ∫ p(y_{t+1:T} | x_{t+1}) f(x_{t+1} | x_t) g(y_t | x_t) dx_{t+1}
 ∝ ∫ [p̃(x_{t+1} | y_{t+1:T}) / γ_{t+1}(x_{t+1})] f(x_{t+1} | x_t) g(y_t | x_t) dx_{t+1}
 = [g(y_t | x_t) / γ_t(x_t)] ∫ p̃(x_{t+1} | y_{t+1:T}) [f(x_{t+1} | x_t) γ_t(x_t) / γ_{t+1}(x_{t+1})] dx_{t+1}.

1) Prediction: The “prediction” stage can be defined as follows:

p̃(x_t | y_{t+1:T}) ≜ ∫ p̃(x_{t+1} | y_{t+1:T}) [f(x_{t+1} | x_t) γ_t(x_t) / γ_{t+1}(x_{t+1})] dx_{t+1}.   (8)
Note that (8) is only a probability measure if one defines the artificial prior as follows:

γ_t(x_t) ≜ γ_1(x_1)   for t = 1;
γ_t(x_t) ≜ ∫ · · · ∫ γ_1(x_1) ∏_{t'=1}^{t-1} f(x_{t'+1} | x_{t'}) dx_{1:t-1} = p(x_t)   for t ≥ 2;   (9)
which gives the (generally time inhomogeneous) backward Markov kernel:

γ(x_t | x_{t+1}) = f(x_{t+1} | x_t) γ_t(x_t) / γ_{t+1}(x_{t+1}),

where on substitution:

p̃(x_t | y_{t+1:T}) = ∫ p̃(x_{t+1} | y_{t+1:T}) γ(x_t | x_{t+1}) dx_{t+1}.
Note that the prior distribution γ_1(x_1) appearing in (9) is an arbitrary prior distribution and not necessarily that used in the filtering recursion. There are certain circumstances where employing this formulation with γ_1(x_1) ≠ µ(x_1) is useful. For example, if the Markov kernel f(x_{t+1} | x_t) admits an invariant distribution π(x), it is convenient to define γ_t(x_t) = π(x), even if µ(x_1) ≠ π(x), as then γ_t(x_t) = π(x) for all t ≥ 2. Moreover, γ(x_t | x_{t+1}) = f(x_t | x_{t+1}) if f is π-reversible. If one takes the actual (filtering) prior distribution µ(x_1) then one uses the abusive notation p(x_t | x_{t+1}) = γ(x_t | x_{t+1}). The use of such a prior distribution was (implicitly) used in references [6], [22], [35].
The calculation of (9) is not generally analytically tractable when one uses the prior µ(·). That is, in the general case the backward Markov kernel p(x_t | x_{t+1}) may not be known or even analytic, and one necessarily has to propagate a finite measure; the additional degree of freedom provided by (8) is therefore important in applications, as the experimental results show.
2) Update: The “update” stage is simply taken to be:

p̃(x_t | y_{t:T}) = g(y_t | x_t) p̃(x_t | y_{t+1:T}) / ∫ g(y_t | x'_t) p̃(x'_t | y_{t+1:T}) dx'_t,   (10)

which has been renormalised to be a probability measure.
One can therefore see that:

p(x_t | y_{1:T}) ∝ p(x_t | y_{1:t-1}) p(y_{t:T} | x_t)
 ∝ p(x_t | y_{1:t-1}) p̃(x_t | y_{t:T}) / γ_t(x_t).   (11)

Since the artificial prior distribution is introduced in (8), it needs to be removed when calculating the smoothing distribution, hence the factor γ_t^{-1}(x_t).
For clarity, we will refer to (5) as an independent two-filter smoother, and to (11) as an artificial two-filter smoother.
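On a finite state space, the artificial two-filter smoother of (8), (10) and (11) can be written out exactly, with γ_t taken as the prior marginal p(x_t) as in (9). The three-state model below is an illustrative assumption, and the result can be checked against the standard forward-backward smoother:

```python
import numpy as np

# Artificial two-filter smoother (eqs (8), (10), (11)) on a small finite
# state space, where all integrals become sums. The 3-state model below is
# an assumption for illustration; gamma_t is the prior marginal p(x_t),
# as in (9).
rng = np.random.default_rng(1)
nx, T = 3, 6
F = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])           # F[i, j] = f(x_{t+1} = j | x_t = i)
mu = np.array([0.5, 0.3, 0.2])            # prior on x_1
G = rng.uniform(0.1, 1.0, (T, nx))        # G[t, j] = g(y_t | x_t = j)

# gamma_t(x_t) = p(x_t): prior marginals, eq. (9)
gamma = np.zeros((T, nx))
gamma[0] = mu
for t in range(1, T):
    gamma[t] = gamma[t - 1] @ F

# forward filter: predicted p(x_t | y_{1:t-1}) and filtered p(x_t | y_{1:t})
pred = np.zeros((T, nx))
filt = np.zeros((T, nx))
for t in range(T):
    pred[t] = mu if t == 0 else filt[t - 1] @ F
    filt[t] = pred[t] * G[t]
    filt[t] /= filt[t].sum()

# backward filter: p_tilde(x_t | y_{t:T}), via eqs (8) and (10)
ptil = np.zeros((T, nx))
ptil[T - 1] = gamma[T - 1] * G[T - 1]     # initial condition
ptil[T - 1] /= ptil[T - 1].sum()
for t in range(T - 2, -1, -1):
    # prediction (8): gamma_t(i) * sum_j F[i, j] ptil_{t+1}(j) / gamma_{t+1}(j)
    pred_b = gamma[t] * (F @ (ptil[t + 1] / gamma[t + 1]))
    ptil[t] = G[t] * pred_b               # update (10), then renormalise
    ptil[t] /= ptil[t].sum()

# combination (11): p(x_t | y_{1:T}) ∝ p(x_t | y_{1:t-1}) ptil_t / gamma_t
smooth = pred * ptil / gamma
smooth /= smooth.sum(axis=1, keepdims=True)
```

In this discrete setting every sum is exact, so the combination (11) reproduces the forward-backward smoothed marginals to machine precision.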
3) Propagation of a probability measure: It is often algorithmically appealing to be able to propagate a probability measure, for example in the unscented information filter (as will be discussed later in the paper), rather than a finite measure as one would through the (general) use of equation (8). As stated, the calculation of (9) is not analytically tractable, could be computationally infeasible, or an approximation could be insufficient. Thus, it is convenient to define the joint distribution (a proposal q̃ multiplied by the corresponding importance weight):

p̃(x_t, x_{t+1} | y_{t:T}) ∝ p̃(x_{t+1} | y_{t+1:T}) q̃(x_t | x_{t+1}, y_t) × g(y_t | x_t) f(x_{t+1} | x_t) γ_t(x_t) / [γ_{t+1}(x_{t+1}) q̃(x_t | x_{t+1}, y_t)],   (12)

such that one is able to directly compute p(y_{t:T} | x_t, x_{t+1}) in an analogous manner to that done in (10). Further details will be provided on the use of (12) when deemed appropriate later in the paper.
D. Joint Distribution
As mentioned, it is possible to capitalise on the sequential nature of the problem when one constructs the joint distribution,
by considering the following factorisation[15]:
p(x_{1:T} | y_{1:T}) = p(x_T | y_{1:T}) ∏_{t=1}^{T-1} p(x_t | x_{t+1:T}, y_{1:T}).   (13)

It is possible to exploit the modelling assumptions (1)-(2) to yield the following simple result:
p(x_t | x_{t+1:T}, y_{1:T}) = p(x_t | x_{t+1}, y_{1:t})
 = p(x_t | y_{1:t}) f(x_{t+1} | x_t) / p(x_{t+1} | y_{1:t})
 ∝ p(x_t | y_{1:t}) f(x_{t+1} | x_t).

It is thus possible to calculate p(x_{1:T} | y_{1:T}) based solely on p(x_t | y_{1:t}) and the state transition density f(x_{t+1} | x_t), which are required for all values of t.
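Factorisation (13) also suggests a direct sampling scheme: draw x_T from the filtered distribution, then draw each x_t backwards from p(x_t | x_{t+1}, y_{1:t}) ∝ p(x_t | y_{1:t}) f(x_{t+1} | x_t). The finite-state sketch below (whose model values are illustrative assumptions) implements this backward-sampling idea:

```python
import numpy as np

# Backward sampling from factorisation (13): draw x_T ~ p(x_T | y_{1:T}),
# then x_t ~ p(x_t | x_{t+1}, y_{1:t}) ∝ p(x_t | y_{1:t}) f(x_{t+1} | x_t).
# A small finite-state model (an assumption for illustration) keeps every
# quantity exactly computable.
rng = np.random.default_rng(2)
nx, T = 3, 5
F = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.2, 0.6]])          # F[i, j] = f(x_{t+1} = j | x_t = i)
mu = np.array([1 / 3, 1 / 3, 1 / 3])
G = rng.uniform(0.2, 1.0, (T, nx))       # G[t, j] = g(y_t | x_t = j)

# forward pass: filtered distributions p(x_t | y_{1:t})
filt = np.zeros((T, nx))
for t in range(T):
    pred = mu if t == 0 else filt[t - 1] @ F
    filt[t] = pred * G[t]
    filt[t] /= filt[t].sum()

def backward_sample():
    """Draw one trajectory x_{1:T} from p(x_{1:T} | y_{1:T}) via (13)."""
    x = np.zeros(T, dtype=int)
    x[T - 1] = rng.choice(nx, p=filt[T - 1])
    for t in range(T - 2, -1, -1):
        w = filt[t] * F[:, x[t + 1]]     # p(x_t | y_{1:t}) f(x_{t+1} | x_t)
        x[t] = rng.choice(nx, p=w / w.sum())
    return x

samples = np.array([backward_sample() for _ in range(4000)])
# empirical marginal of x_1 approximates the smoothed marginal p(x_1 | y_{1:T})
emp = np.bincount(samples[:, 0], minlength=nx) / len(samples)
```

The empirical marginal of the sampled trajectories at any time index converges to the corresponding smoothed marginal as the number of draws grows.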
The authors note that alternative factorisations exist [5], which can be constructed as the symmetrical relationship of (13) when one uses an artificial prior distribution defined by (9), as follows:

p(x_{1:T} | y_{1:T}) = p(x_1 | y_{1:T}) ∏_{t=2}^{T} p(x_t | x_{t-1}, y_{1:T}),

where

p(x_t | x_{t-1}, y_{1:T}) = p(x_t | x_{t-1}, y_{t:T})
 = p(y_{t:T} | x_t) f(x_t | x_{t-1}) / p(y_{t:T} | x_{t-1})
 ∝ p̃(x_t | y_{t:T}) γ(x_{t-1} | x_t) / p̃(x_{t-1} | y_{t:T}),

where γ(x_{t-1} | x_t) is the backward Markov kernel if γ_t(x_t) is defined through (9). These alternative factorisations, however, do not seem to bring any advantage over (13).
III. FINITE STATE SPACE HIDDEN MARKOV MODEL
Define X_t as a discrete-time, time-homogeneous, n_x-state, first-order Markov chain. By using a discrete analogue of the previous section, one is able to construct the following algorithmic arguments.
A. Filtering
Let α_{t-1}(x_{t-1}) ≜ p(X_{t-1} = x_{t-1} | y_{1:t-1}). The filtering recursion based on the algorithm described by Rabiner [42] is given as follows:

p(X_t = x_t | y_{1:t-1}) = Σ_{x_{t-1}} f(X_t = x_t | X_{t-1} = x_{t-1}) α_{t-1}(x_{t-1})

α_t(x_t) = p(X_t = x_t | y_{1:t-1}) g(y_t | X_t = x_t) / Σ_{x'_t} p(X_t = x'_t | y_{1:t-1}) g(y_t | X_t = x'_t)

with suitably defined initial conditions µ(X_1 = x_1), x_1 ∈ S = {1, . . . , n_x}.
B. Forward-backward smoothing
A solution that enables one to conduct two passes through the data, the first (forward) pass calculating α_t(j) and the second (backward) pass calculating the required quantity, p(X_t = x_t | y_{1:T}), is given as:

p(X_t = x_t | y_{1:T}) = α_t(x_t) Σ_{x_{t+1}} [p(X_{t+1} = x_{t+1} | y_{1:T}) f(X_{t+1} = x_{t+1} | X_t = x_t) / Σ_{x'_t} α_t(x'_t) f(X_{t+1} = x_{t+1} | X_t = x'_t)].

Surprisingly, this formula is not used very often, even though it avoids the need to rescale the results to prevent the numerical instabilities of the independent smoother which is used routinely in this context.
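The two passes above can be sketched directly on a small (assumed, purely illustrative) three-state model:

```python
import numpy as np

# Discrete forward-backward smoother: the forward pass computes alpha_t,
# the backward pass applies the self-normalised recursion displayed above.
# The 3-state model and its values are assumptions for illustration.
rng = np.random.default_rng(3)
nx, T = 3, 6
F = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4]])          # F[i, j] = f(X_{t+1} = j | X_t = i)
mu = np.array([0.4, 0.4, 0.2])
G = rng.uniform(0.1, 1.0, (T, nx))       # G[t, j] = g(y_t | X_t = j)

# forward pass: alpha_t(x_t) = p(X_t = x_t | y_{1:t})
alpha = np.zeros((T, nx))
for t in range(T):
    pred = mu if t == 0 else alpha[t - 1] @ F
    alpha[t] = pred * G[t]
    alpha[t] /= alpha[t].sum()

# backward pass: p(X_t = x_t | y_{1:T})
smooth = np.zeros((T, nx))
smooth[T - 1] = alpha[T - 1]
for t in range(T - 2, -1, -1):
    # denominator: sum_{x'_t} alpha_t(x'_t) f(x_{t+1} | x'_t), i.e. the
    # one-step predictive p(X_{t+1} | y_{1:t}), indexed by x_{t+1}
    denom = alpha[t] @ F
    smooth[t] = alpha[t] * (F @ (smooth[t + 1] / denom))
```

Because the recursion is self-normalised, each row of `smooth` sums to one without any extra rescaling, which is precisely the practical advantage noted above.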
