A Bayesian Approach to Model Checking Biological Systems

Abstract
Recently, there has been considerable interest in the use of Model Checking for Systems Biology. Unfortunately, the state space of stochastic biological models is often too large for classical Model Checking techniques. For these models, a statistical approach to Model Checking has been shown to be an effective alternative. Extending our earlier work, we present the first algorithm for performing statistical Model Checking using Bayesian Sequential Hypothesis Testing. We show that our Bayesian approach outperforms current statistical Model Checking techniques, which rely on tests from Classical (aka Frequentist) statistics, by requiring fewer system simulations. Another advantage of our approach is the ability to incorporate prior Biological knowledge about the model being verified. We demonstrate our algorithm on a variety of models from the Systems Biology literature and show that it enables faster verification than state-of-the-art techniques, even when no prior knowledge is available.


Sumit K. Jha¹, Edmund M. Clarke¹, Christopher J. Langmead¹,², Axel Legay³,
André Platzer¹, and Paolo Zuliani¹
¹ Computer Science Department, Carnegie Mellon University, USA
² Lane Center for Computational Biology, Carnegie Mellon University, USA
³ Institut d’Informatique INRIA, Rennes, France
1 Introduction
Computational models are increasingly used in the field of Systems Biology to
examine the dynamics of biological processes (e.g., [8,10,20,30,34,37]). By ‘com-
putational’, we mean discrete-variable and continuous or discrete-time models
[4], where the components of the system interact and evolve by obeying a set
of instructions or rules. In contrast to differential equation-based models, which
are also widely used in Systems Biology, computational models can provide in-
sights into the role of stochastic effects over discrete-populations of molecules or
cells. Recently, there has been considerable interest in the application of Model
Checking [15] as a powerful tool for formally reasoning about the dynamic properties
of such models (e.g., [1,6,9,11,14,18,24,38]). This paper presents a new
Model Checking algorithm that is well-suited for verifying properties of very
large stochastic models, such as those created and used in Systems Biology.

This research was sponsored by the GSRC (University of California) under contract
no. SA423679952, National Science Foundation under contracts no. CCF0429120,
no. CNS0411152, and no. CCF0541245, Semiconductor Research Corporation under
contract no. 2005TJ1366, Air Force (University of Vanderbilt) under contract
no. 18727S3, International Collaboration for Advanced Security Technology of the
National Science Council, Taiwan, under contract no. 1010717, the U.S. Department
of Energy Career Award (DE-FG02-05ER25696), and a Pittsburgh Life-Sciences
Greenhouse Young Pioneer Award.
P. Degano and R. Gorrieri (Eds.): CMSB 2009, LNBI 5688, pp. 218–234, 2009.
© Springer-Verlag Berlin Heidelberg 2009

The stochastic nature of most computational models from Systems Biology
gives rise to an instance of the Probabilistic Model Checking (PMC) problem
[13,15,31]. Suppose $M$ is a stochastic model over a set of states $S$, $s_0$ is a starting
state, $\phi$ is a dynamic property expressed as a formula in temporal logic, and
$\theta \in [0, 1]$ is a probability threshold. The PMC problem is: given the 4-tuple
$(M, s_0, \phi, \theta)$, to decide algorithmically whether $M, s_0 \models P_{\geq\theta}(\phi)$. In this paper,
property $\phi$ is expressed in BLTL - Bounded Linear Temporal Logic [36,35,19].
Given these, PMC algorithms decide whether the model satisfies the property
with at least probability $\theta$.
Existing algorithms for solving the PMC problem fall into one of two cate-
gories. The first category comprises numerical methods (e.g. [2,3,12,16,31]) which
can compute the probability with which the property holds with high precision.
Numerical methods are generally only suitable for small systems ($10^6$ to $10^7$
states). In a Biological System, the number of states can easily exceed this limit,
which motivates the need for algorithms for solving the PMC problem in an
approximate fashion. Approximate methods (e.g., [23,26,39,46]) work by sampling
a set of traces from the model. Each trace is then evaluated to determine
whether it satisfies the property. The number of satisfying traces is used to
(approximately) decide whether $M, s_0 \models P_{\geq\theta}(\phi)$.
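As an illustration only, the sampling scheme just described can be sketched in Python. The names `estimate_probability`, `approximately_satisfies`, and `toy_trace_satisfies` are hypothetical, and the toy sampler is an assumed stand-in for "simulate the model from its starting state and check the property on the resulting trace"; it is not one of the cited algorithms.

```python
import random
from typing import Callable

# A sampler draws one trace and reports whether that trace satisfies phi.
TraceSampler = Callable[[random.Random], bool]


def estimate_probability(sampler: TraceSampler, num_traces: int, seed: int = 0) -> float:
    """Fraction of sampled traces that satisfy the property."""
    rng = random.Random(seed)
    satisfying = sum(1 for _ in range(num_traces) if sampler(rng))
    return satisfying / num_traces


def approximately_satisfies(sampler: TraceSampler, theta: float,
                            num_traces: int, seed: int = 0) -> bool:
    """Naive approximate decision: compare the estimate with the threshold."""
    return estimate_probability(sampler, num_traces, seed) >= theta


def toy_trace_satisfies(rng: random.Random) -> bool:
    # Assumption: a toy model whose traces satisfy phi with probability 0.7.
    return rng.random() < 0.7


if __name__ == "__main__":
    print(estimate_probability(toy_trace_satisfies, 10_000, seed=1))
    print(approximately_satisfies(toy_trace_satisfies, theta=0.5, num_traces=10_000))
```

The sketch gives no error guarantee by itself; the methods discussed in the text differ precisely in how they choose the number of traces and bound the probability of a wrong answer.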
Approximate PMC methods can be further divided into two sub-categories:
(i) those that seek to estimate the probability that the property holds and then
compare that estimate to θ (e.g., [26,39]), and (ii) those that reduce the PMC
problem to a hypothesis testing problem (e.g., [46,47]), that is, deciding between
two hypotheses $H_0 : P_{\geq\theta}(\phi)$ versus $H_1 : P_{<\theta}(\phi)$. Hypothesis-testing based
methods are more efficient than those based on estimation when $\theta$ (which is
specified by the user) is significantly different from the true probability that the
property holds (which is determined by $M$ and $s_0$) [45].
Existing PMC methods based on hypothesis testing rely on Classical (aka
Frequentist) statistical procedures, like Wald’s Sequential Probability Ratio Test
(SPRT) [42], to answer the decision problem. Our algorithm performs hypothesis
testing, but uses Bayesian statistical procedures. This distinction is not trivial,
as Bayesian and Classical statistics are two very different fields. We will show that
in practice, our Bayesian approach requires fewer samples than Wald’s SPRT.
Finally, we note that because we adopt a Bayesian approach, our algorithm can
incorporate prior knowledge, in the form of a probability distribution, P (θ),
when available. This is relevant because in a Biological setting, it is often the
case that prior knowledge is available.
The contributions of this paper are as follows:
• The first application of Bayesian Sequential Hypothesis Testing to statistical
  Model Checking,
• The first hypothesis-testing based statistical Model Checking algorithm designed
  for composite hypotheses, which can in particular include prior knowledge
  via a mixture of prior distributions,
• A theorem proving that our algorithm terminates with probability 1,
• Error bounds for our algorithm, and
• A series of case studies using Systems Biology models demonstrating that our
  method is empirically more efficient than existing algorithms for statistical
  Model Checking.
2 Background and Related Work
Our algorithm can be applied to any stochastic model M with a well-defined
probability space over traces. Several well-studied stochastic models like (discrete
and continuous) Markov Chains satisfy this property [47]. We assume that each
execution of the system can be represented by a sequence of states and the time
spent in these states. The sequence $\sigma = (s_0, t_0), (s_1, t_1), \ldots$ denotes an execution
of the system along states $s_0, s_1, \ldots$ with durations $t_0, t_1, \ldots \in \mathbb{R}$. The system
stays in state $s_i$ for duration $t_i$ and makes a transition to $s_{i+1}$. We require that
the sum $\sum_i t_i$ diverges, that is, the system cannot make infinitely many
state switches in finite time.
2.1 Specifying Properties in Temporal Logic
Our algorithm verifies properties of M expressed as formulas in Probabilistic
Bounded Linear Temporal Logic (PBLTL). We first define the syntax and se-
mantics of Bounded Linear Temporal Logic (BLTL) [36,35,19] and then extend
that logic to PBLTL.
For a stochastic model M, let the set of state variables SV be a finite set of
real-valued variables. A Boolean predicate over SV is a constraint of the form
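$x \sim v$, where $x \in SV$, $\sim \in \{\geq, \leq, =\}$, and $v \in \mathbb{R}$. A BLTL property is built on a
finite set of Boolean predicates over $SV$ using Boolean connectives and temporal
operators. The syntax of the logic is given by the following grammar:
$$\phi ::= x \sim v \mid (\phi_1 \lor \phi_2) \mid (\phi_1 \land \phi_2) \mid \neg\phi_1 \mid (\phi_1 \, U^t \, \phi_2),$$
where $\sim \in \{\geq, \leq, =\}$, $x \in SV$, $v \in \mathbb{Q}$, and $t \in \mathbb{Q}_{\geq 0}$. We can define additional
temporal operators such as $F^t \psi = \mathit{True} \; U^t \; \psi$, or $G^t \psi = \neg F^t \neg\psi$ in terms of the
bounded until $U^t$.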
We define the semantics of BLTL with respect to executions of $M$. The
fact that an execution $\sigma$ satisfies property $\phi$ is denoted by $\sigma \models \phi$. Let
$\sigma = (s_0, t_0), (s_1, t_1), \ldots$ be an execution of the model along states $s_0, s_1, \ldots$ with
durations $t_0, t_1, \ldots \in \mathbb{R}$. We denote the execution trace starting at state $i$ by $\sigma^i$ (in
particular, $\sigma^0$ denotes the original execution $\sigma$). The value of the state variable
$x$ in $\sigma$ at the state $i$ is denoted by $V(\sigma, i, x)$. The semantics of BLTL for a trace
$\sigma^k$ starting at the $k$-th state ($k \in \mathbb{N}$) is defined as follows:
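• $\sigma^k \models x \sim v$ if and only if $V(\sigma, k, x) \sim v$;
• $\sigma^k \models \phi_1 \lor \phi_2$ if and only if $\sigma^k \models \phi_1$ or $\sigma^k \models \phi_2$;
• $\sigma^k \models \phi_1 \land \phi_2$ if and only if $\sigma^k \models \phi_1$ and $\sigma^k \models \phi_2$;
• $\sigma^k \models \neg\phi_1$ if and only if $\sigma^k \models \phi_1$ does not hold (written $\sigma^k \not\models \phi_1$);
• $\sigma^k \models \phi_1 \, U^t \, \phi_2$ if and only if there exists $i \in \mathbb{N}$ such that
  (a) $\sum_{0 \leq l < i} t_{k+l} \leq t$, (b) $\sigma^{k+i} \models \phi_2$, and (c) for each $0 \leq j < i$, $\sigma^{k+j} \models \phi_1$.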
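The semantics above translate fairly directly into a recursive checker over a finite trace. The following Python sketch is one possible encoding, not the authors' implementation: the class names, the `Top` constant (introduced only to express $\mathit{True}$ in the derived operators), and the representation of a trace as a list of `(state, duration)` pairs are all assumptions made for illustration. It also assumes the finite trace is long enough for the formula being checked; when the trace runs out before the until is resolved, the sketch answers `False`.

```python
import operator
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

State = Dict[str, float]
Trace = List[Tuple[State, float]]  # [(s_0, t_0), (s_1, t_1), ...]


@dataclass(frozen=True)
class Top:
    """The constant True, used only to derive F and G."""


@dataclass(frozen=True)
class Atom:
    var: str
    op: str  # one of ">=", "<=", "=="
    value: float


@dataclass(frozen=True)
class Not:
    sub: "Formula"


@dataclass(frozen=True)
class Or:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Until:
    left: "Formula"
    right: "Formula"
    bound: float  # the time bound t in U^t


Formula = Union[Top, Atom, Not, Or, And, Until]
_OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq}


def holds(trace: Trace, k: int, phi: Formula) -> bool:
    """Decide whether the suffix of `trace` starting at state k satisfies phi."""
    if isinstance(phi, Top):
        return True
    if isinstance(phi, Atom):
        return _OPS[phi.op](trace[k][0][phi.var], phi.value)
    if isinstance(phi, Not):
        return not holds(trace, k, phi.sub)
    if isinstance(phi, Or):
        return holds(trace, k, phi.left) or holds(trace, k, phi.right)
    if isinstance(phi, And):
        return holds(trace, k, phi.left) and holds(trace, k, phi.right)
    if isinstance(phi, Until):
        elapsed = 0.0  # sum of t_{k+l} for 0 <= l < i, condition (a)
        for i in range(len(trace) - k):
            if elapsed > phi.bound:
                return False  # the time bound is exhausted
            if holds(trace, k + i, phi.right):
                return True  # condition (b); (c) held at every earlier step
            if not holds(trace, k + i, phi.left):
                return False  # condition (c) fails for every larger i
            elapsed += trace[k + i][1]
        return False  # trace too short to witness phi.right
    raise TypeError(f"unknown formula: {phi!r}")


def eventually(psi: Formula, t: float) -> Formula:
    return Until(Top(), psi, t)  # F^t psi = True U^t psi


def always(psi: Formula, t: float) -> Formula:
    return Not(eventually(Not(psi), t))  # G^t psi = not F^t not psi


if __name__ == "__main__":
    trace = [({"x": 0.0}, 1.0), ({"x": 2.0}, 1.0), ({"x": 5.0}, 1.0)]
    reaches_five = Atom("x", ">=", 5.0)
    print(holds(trace, 0, eventually(reaches_five, 2.0)))  # True
    print(holds(trace, 0, eventually(reaches_five, 1.5)))  # False
```

In the example, the third state is entered after 2 time units, so the property holds for the bound 2 but not for the bound 1.5.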
Statistical Model Checking is based on evaluating whether σ |= φ holds on
sample simulations σ of the system. In practice, sample simulations only have
a finite duration. The question is how long these simulations have to be for the
formula φ to have a well-defined semantics such that σ |= φ can be checked.
If $\sigma$ is too short, say of duration 2, the semantics of $\phi_1 \, U^5 \, \phi_2$ may be unclear.
But at what duration of the simulation can we stop because we know that the
truth-value for σ |= φ will never change by continuing the simulation? In [29],
we prove that finite simulations of bounded duration are always sufficient for
Model Checking BLTL on traces.
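The precise bound is given in [29] and is not reproduced here. As a hedged illustration, one natural way to obtain a sufficient simulation duration is to add up the time bounds of nested until operators. The recursion below is an assumption made for this sketch, and the tuple encoding of formulas and the name `time_horizon` are hypothetical.

```python
from typing import Tuple, Union

# Formulas as nested tuples (an encoding assumed for this sketch):
#   ("atom", var, op, value), ("not", f), ("or", f, g), ("and", f, g),
#   ("until", f, g, t)
Formula = Union[Tuple, tuple]


def time_horizon(phi: Formula) -> float:
    """Simulated time after which the truth value of phi can no longer change."""
    tag = phi[0]
    if tag == "atom":
        return 0.0  # depends on the current state only
    if tag == "not":
        return time_horizon(phi[1])
    if tag in ("or", "and"):
        return max(time_horizon(phi[1]), time_horizon(phi[2]))
    if tag == "until":
        # The witness for phi[2] may occur as late as time t, and the
        # subformulas may themselves look further ahead from that point.
        return phi[3] + max(time_horizon(phi[1]), time_horizon(phi[2]))
    raise ValueError(f"unknown formula tag: {tag!r}")


if __name__ == "__main__":
    a = ("atom", "x", ">=", 1.0)
    b = ("atom", "y", "<=", 2.0)
    print(time_horizon(("until", a, b, 5.0)))                     # 5.0
    print(time_horizon(("until", a, ("until", a, b, 2.0), 3.0)))  # 5.0
```

Under this assumption, a trace of duration 2 is too short for a formula with horizon 5, which matches the example in the text.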
We can now define Probabilistic Bounded Linear Temporal Logic.
Definition 1. A Probabilistic Bounded LTL (PBLTL) formula is a formula of
the form $P_{\geq\theta}(\phi)$, where $\phi$ is a BLTL formula and $\theta \in (0, 1)$.
We say that $M$ satisfies PBLTL property $P_{\geq\theta}(\phi)$, denoted by $M \models P_{\geq\theta}(\phi)$, if
and only if the probability that an execution of $M$ satisfies BLTL property $\phi$ is
greater than or equal to $\theta$. The problem is well-defined [47] since one can always
assign a unique probability measure to the set of executions of $M$ that satisfy
a formula in BLTL. Note that counterexamples to the BLTL property $\phi$ are not
counterexamples to the PBLTL property $P_{\geq\theta}(\phi)$, because the truth of $P_{\geq\theta}(\phi)$
depends on the likelihood of all counterexamples to $\phi$. This makes PMC more
difficult than standard Model Checking, because one counterexample to $\phi$ is not
enough to answer $P_{\geq\theta}(\phi)$.
2.2 Existing Statistical Probabilistic Model Checking Algorithms
As outlined in the introduction, Probabilistic Model Checking algorithms can
either be exact (e.g. [2,3,12,16,31]), or statistical in nature. In practice, statistical
methods (e.g., [23,26,32,39,46]), which iteratively draw sample traces from the
model, are generally better suited to Model Checking Biological systems because
they scale better. Our method is statistical, and so we will compare and contrast
our method to existing statistical methods in this section.
Existing PMC methods based on hypothesis testing rely on Classical (aka Fre-
quentist) statistical procedures, like Wald’s Sequential Probability Ratio Test
(SPRT) [42], to answer the decision problem. Younes and Simmons introduced the
first algorithm for statistical Model Checking [45,46,47] for verifying probabilis-
tic temporal properties of stochastic systems. Their work uses the SPRT, which is
designed for simple hypothesis testing¹. Specifically, the SPRT decides between
the simple null hypothesis $H'_0 : M, s_0 \models P_{=\theta_0}(\phi)$ against the simple alternate
hypothesis $H'_1 : M, s_0 \models P_{=\theta_1}(\phi)$, where $\theta_0 < \theta_1$. It can be shown that the SPRT is
optimal for simple hypothesis testing, in the sense that it minimizes the expected
number of samples among all the tests satisfying the same Type I and II errors [43],
when either $H'_0$ or $H'_1$ is true. The PMC problem is instead a choice between two
composite hypotheses $H_0 : M, s_0 \models P_{\geq\theta}[\phi]$ versus $H_1 : M, s_0 \models P_{<\theta}[\phi]$. The
SPRT is not defined unless $\theta_0 \neq \theta_1$, so Younes and Simmons overcome this problem
by separating the two hypotheses by an indifference region $(\theta - \delta, \theta + \delta)$, where
$0 < \delta < 1$ is a user-specified parameter. It can be shown that the SPRT with
indifference region can be used for testing composite hypotheses, while respecting the
same Type I and II errors of a standard SPRT [21, Section 3.4]. However, in this case
the test is no longer optimal, and the maximum expected sample size may be much
bigger than the optimal fixed sample size sampling test - see [7] and [21, Section
3.6]. We note that our algorithm solves the composite hypothesis testing problem,
but does so using Bayesian statistics, and thus requires no indifference region.

¹ A simple hypothesis completely specifies a distribution. For example, a Bernoulli
distribution of parameter $p$ is fully specified by the hypothesis $p = 0.5$ (or some other
fixed value). A composite hypothesis has instead free parameters, e.g. the hypothesis
$p < 0.3$, for a Bernoulli distribution.

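For concreteness, a textbook form of Wald's SPRT with an indifference region can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of [45,46,47]: the function name `sprt`, the error parameters `alpha` (Type I) and `beta` (Type II), and the `max_samples` cap are hypothetical, and the two simple hypotheses are taken to be the endpoints $\theta + \delta$ and $\theta - \delta$ of the indifference region.

```python
import math
import random
from typing import Callable, Tuple


def sprt(sample: Callable[[], bool], theta: float, delta: float,
         alpha: float, beta: float, max_samples: int = 1_000_000) -> Tuple[str, int]:
    """Sequentially test H0: p >= theta + delta against H1: p <= theta - delta.

    `sample()` draws one trace and returns True if the trace satisfies phi.
    Returns the accepted hypothesis ("H0", "H1" or "undecided") and the
    number of traces used.
    """
    p_hi, p_lo = theta + delta, theta - delta
    if not (0.0 < p_lo and p_hi < 1.0):
        raise ValueError("indifference region must lie strictly inside (0, 1)")

    accept_h1 = math.log((1.0 - beta) / alpha)   # upper stopping boundary
    accept_h0 = math.log(beta / (1.0 - alpha))   # lower stopping boundary
    # Log-likelihood-ratio increments of H1 over H0 for one observation.
    step_success = math.log(p_lo / p_hi)                   # negative
    step_failure = math.log((1.0 - p_lo) / (1.0 - p_hi))   # positive

    log_ratio = 0.0
    for n in range(1, max_samples + 1):
        log_ratio += step_success if sample() else step_failure
        if log_ratio <= accept_h0:
            return "H0", n
        if log_ratio >= accept_h1:
            return "H1", n
    return "undecided", max_samples


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed toy model: traces satisfy phi with probability 0.9.
    print(sprt(lambda: rng.random() < 0.9, theta=0.5, delta=0.05,
               alpha=0.01, beta=0.01))
```

The number of traces consumed grows as the true probability approaches the indifference region, which is the regime where hypothesis-testing methods are least efficient.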
The method of [26] uses a fixed number of samples and estimates the probabil-
ity the property holds as the number of satisfying traces divided by the number
of sampled traces. Their algorithm guarantees the accuracy of the results using
Chernoff-Hoeffding bounds. In particular, their algorithm can guarantee that the
difference between the estimated and the true probability is less than $\epsilon$, with probability
$\rho$, where $\rho < 1$ and $\epsilon > 0$ are user-specified parameters. Grosu and Smolka use
a similar technique for verifying formulas in LTL [23]. Their algorithm randomly
samples lassos from a Büchi automaton in an on-the-fly fashion. The method of
[32] is also Bayesian, like the algorithm in this paper, but estimates the prob-
ability the property holds and does not invoke hypothesis testing. Unlike the
algorithm in this paper, [32] is fully Bayesian in the sense that it explicitly con-
siders the prior distributions over the initial state and parameters of the model,
in addition to the prior over the property.
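To give a feel for the fixed sample sizes that estimation-based methods imply, the sketch below computes the number of traces from the standard two-sided Hoeffding inequality, $P(|\hat{p} - p| \geq \epsilon) \leq 2 e^{-2 N \epsilon^2}$, by requiring the right-hand side to be at most $1 - \rho$. The exact constant used in [26] may differ; the function name `hoeffding_sample_size` is hypothetical.

```python
import math


def hoeffding_sample_size(epsilon: float, rho: float) -> int:
    """Smallest N with 2 * exp(-2 * N * epsilon**2) <= 1 - rho."""
    if not (epsilon > 0.0 and 0.0 < rho < 1.0):
        raise ValueError("need epsilon > 0 and 0 < rho < 1")
    return math.ceil(math.log(2.0 / (1.0 - rho)) / (2.0 * epsilon ** 2))


if __name__ == "__main__":
    print(hoeffding_sample_size(epsilon=0.05, rho=0.95))  # 738
    print(hoeffding_sample_size(epsilon=0.01, rho=0.99))  # 26492
```

The sample size is fixed in advance and does not depend on how far the true probability is from the threshold, which is why sequential tests can need far fewer traces when the two are well separated.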
Finally, Sen et al. [39,40] used the p-value for the null hypothesis as a statistic
for hypothesis testing. The p-value is defined as the probability of obtaining
observations at least as extreme as the one that was actually seen, given that
the null hypothesis is true. It is important to realize that a p-value is not the
probability that the null hypothesis is true. Sen et al.’s method does not have a
way to control the Type I and II errors.
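The definition of a p-value can be made concrete for Bernoulli observations. The sketch below computes a one-sided binomial p-value at the boundary $p = \theta$ of the null hypothesis, where "at least as extreme" is taken to mean "at most as many satisfying traces as observed". It only illustrates the definition and is not Sen et al.'s procedure; the function name is hypothetical.

```python
import math


def binomial_p_value(n: int, x: int, theta: float) -> float:
    """P(X <= x) for X ~ Binomial(n, theta): the chance of seeing at most x
    satisfying traces out of n if the true probability were exactly theta."""
    return sum(math.comb(n, j) * theta ** j * (1.0 - theta) ** (n - j)
               for j in range(x + 1))


if __name__ == "__main__":
    # 2 satisfying traces out of 10, tested against theta = 0.5.
    print(binomial_p_value(10, 2, 0.5))  # 0.0546875
```

The value 0.0546875 is the probability of the data (or more extreme data) under the null hypothesis; it says nothing directly about the probability that the null hypothesis itself is true.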
3 Bayesian Statistical Model Checking
In this section, we first review some important concepts from statistical Model
Checking, and then introduce theory and terminology from Bayesian statistics.
We then present our algorithm in Sec. 3.2.
Recall that the PMC problem is to decide whether $M \models P_{\geq\theta}(\phi)$, where
$\theta \in (0, 1)$ and $\phi$ is a BLTL formula. Let $p$ be the (unknown but fixed) probability
of the model satisfying $\phi$: thus, the PMC problem can now be stated as deciding
between two hypotheses:
$$H_0 : p \geq \theta \qquad H_1 : p < \theta.$$
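As a rough preview of how a Bayesian sequential test between these two hypotheses might look, the Python sketch below assumes a uniform prior on $p$ and stops once the Bayes factor (posterior odds divided by prior odds) exceeds a threshold $T$ or falls below $1/T$. All of this is an assumption made for illustration: the algorithm itself is presented in Sec. 3.2 and may differ in its details, and the names `posterior_tails`, `bayes_factor`, and `bayesian_test` are hypothetical.

```python
import math
import random
from typing import Callable, Tuple


def posterior_tails(n: int, x: int, theta: float) -> Tuple[float, float]:
    """Return (P(p >= theta | data), P(p < theta | data)) under a uniform prior.

    After x satisfying traces out of n the posterior is Beta(x + 1, n - x + 1);
    for integer parameters its CDF at theta is a binomial tail over n + 1 trials.
    """
    terms = [math.comb(n + 1, j) * theta ** j * (1.0 - theta) ** (n + 1 - j)
             for j in range(n + 2)]
    return sum(terms[: x + 1]), sum(terms[x + 1:])


def bayes_factor(n: int, x: int, theta: float) -> float:
    """Posterior odds of H0 over H1 divided by their prior odds."""
    post_h0, post_h1 = posterior_tails(n, x, theta)
    prior_h0, prior_h1 = 1.0 - theta, theta  # uniform prior mass on each side
    if post_h1 == 0.0:
        return math.inf
    return (post_h0 / post_h1) * (prior_h1 / prior_h0)


def bayesian_test(sample: Callable[[], bool], theta: float, threshold: float,
                  max_samples: int = 1000) -> Tuple[str, int]:
    """Draw traces until the Bayes factor is decisive in either direction."""
    n = x = 0
    while n < max_samples:  # cap kept small: the exact sums overflow for large n
        n += 1
        x += 1 if sample() else 0
        b = bayes_factor(n, x, theta)
        if b > threshold:
            return "H0", n
        if b < 1.0 / threshold:
            return "H1", n
    return "undecided", n


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed toy model: traces satisfy phi with probability 0.9.
    print(bayesian_test(lambda: rng.random() < 0.9, theta=0.5, threshold=1000.0))
```

Note that no indifference region appears anywhere in this sketch: the two composite hypotheses are compared directly through their posterior probabilities.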
