How can the authors incorporate prior knowledge into their algorithm?

The authors note that, because they adopt a Bayesian approach, their algorithm can incorporate prior knowledge, when available, in the form of a probability distribution P(θ).

What is the correct choice for summarizing the prior probability distribution in Statistical Model Checking?

The Beta distribution is the appropriate choice for summarizing the prior probability distribution in Statistical Model Checking.

What is the Bayesian Model Checking algorithm?

In their experiments, the Bayesian Model Checking algorithm used uniform priors, and accepted a hypothesis when it was 10000 times more likely than the other hypothesis (Bayes Factor threshold T = 10000).

How many samples did PRISM need to estimate the probability of the BLTL formulae being true?

The statistical estimation engine of the PRISM model checker always needed 92042 samples to estimate the probability of the BLTL formulae being true.

How did the authors study the Bayesian Model Checking algorithm?

The authors also studied SBML models using the implementation of Gillespie’s Stochastic Simulation Algorithm in Matlab’s Systems Biology Toolbox.

How can the Bayes factor be computed?

The Bayes factor can be computed by means of standard, well-known numerical methods, which simplifies the implementation of the algorithm.

What is the performance of the Bayesian Model Checking algorithm?

The performance of both Wald's algorithm [42] and the authors' Bayesian Model Checking algorithm degrades as the threshold probability θ in the PBLTL formula gets close to the actual probability that the model satisfies the BLTL formula.

What is the advantage of Bayesian priors in Systems Biology?

The efficiency advantage of the Bayesian approach, which requires fewer system simulations, is important in Systems Biology because the cost of generating traces is not necessarily negligible.

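What is the syntax of Bounded Linear Temporal Logic (BLTL)?

The syntax of the logic is given by the following grammar: $\phi ::= x \sim v \mid (\phi_1 \vee \phi_2) \mid (\phi_1 \wedge \phi_2) \mid \neg\phi_1 \mid (\phi_1 \, U^t \, \phi_2)$, where $\sim \, \in \{\geq, \leq, =\}$, $x \in SV$, $v \in \mathbb{Q}$, and $t \in \mathbb{Q}_{\geq 0}$.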

What property of the number of bound Cyclin molecules do the authors check?

The authors check whether the probability that the number of bound Cyclin molecules exceeds 3 within 0.5 time units is at least θ (for various values of θ): $H_0 : M \models P_{\geq\theta}[\, F^{0.5}(\text{cyclin bound} > 3) \,]$

(Open Access) A Bayesian Approach to Model Checking Biological Systems (2009) | Sumit Kumar Jha

Q: What have the authors contributed in "A Bayesian Approach to Model Checking Biological Systems"?

Extending their earlier work, the authors present the first algorithm for performing statistical Model Checking using Bayesian Sequential Hypothesis Testing. The authors show that their Bayesian approach outperforms current statistical Model Checking techniques, which rely on tests from Classical (aka Frequentist) statistics, by requiring fewer system simulations. Another advantage of their approach is the ability to incorporate prior Biological knowledge about the model being verified. The authors demonstrate their algorithm on a variety of models from the Systems Biology literature and show that it enables faster verification than state-of-the-art techniques, even when no prior knowledge is available.

Q: What is the way to test a simple hypothesis?

It can be shown that the SPRT is optimal for simple hypothesis testing, in the sense that it minimizes the expected number of samples among all the tests satisfying the same Type I and II errors [43], when either $H'_0$ or $H'_1$ is true.

Q: What is the first application of Bayesian Sequential Hypothesis Testing to statistical Model Checking?

The contributions of this paper are as follows:

• The first application of Bayesian Sequential Hypothesis Testing to statistical Model Checking,

• The first hypothesis-testing based statistical Model Checking algorithm designed for composite hypotheses, which can in particular include prior knowledge via a mixture of prior distributions,

• A theorem proving that their algorithm terminates with probability 1,

• Error bounds for their algorithm, and

• A series of case studies using Systems Biology models demonstrating that their method is empirically more efficient than existing algorithms for statistical Model Checking.

A Bayesian Approach to Model Checking Biological Systems

Sumit K. Jha, Edmund M. Clarke, Christopher J. Langmead (1,2), Axel Legay, André Platzer, and Paolo Zuliani

1 Computer Science Department, Carnegie Mellon University, USA

2 Lane Center for Computational Biology, Carnegie Mellon University, USA

3 Institut d’Informatique INRIA, Rennes, France

Abstract. Recently, there has been considerable interest in the use of Model Checking for Systems Biology. Unfortunately, the state space of stochastic biological models is often too large for classical Model Checking techniques. For these models, a statistical approach to Model Checking has been shown to be an effective alternative. Extending our earlier work, we present the first algorithm for performing statistical Model Checking using Bayesian Sequential Hypothesis Testing. We show that our Bayesian approach outperforms current statistical Model Checking techniques, which rely on tests from Classical (aka Frequentist) statistics, by requiring fewer system simulations. Another advantage of our approach is the ability to incorporate prior Biological knowledge about the model being verified. We demonstrate our algorithm on a variety of models from the Systems Biology literature and show that it enables faster verification than state-of-the-art techniques, even when no prior knowledge is available.

1 Introduction

Computational models are increasingly used in the field of Systems Biology to examine the dynamics of biological processes (e.g., [8,10,20,30,34,37]). By ‘computational’, we mean discrete-variable and continuous or discrete-time models [4], where the components of the system interact and evolve by obeying a set of instructions or rules. In contrast to differential equation-based models, which are also widely used in Systems Biology, computational models can provide insights into the role of stochastic effects over discrete populations of molecules or cells. Recently, there has been considerable interest in the application of Model Checking [15] as a powerful tool for formally reasoning about the dynamic properties of such models (e.g., [1,6,9,11,14,18,24,38]). This paper presents a new Model Checking algorithm that is well-suited for verifying properties of very large stochastic models, such as those created and used in Systems Biology.

This research was sponsored by the GSRC (University of California) under contract no. SA423679952, National Science Foundation under contracts no. CCF0429120, no. CNS0411152, and no. CCF0541245, Semiconductor Research Corporation under contract no. 2005TJ1366, Air Force (University of Vanderbilt) under contract no. 18727S3, International Collaboration for Advanced Security Technology of the National Science Council, Taiwan, under contract no. 1010717, the U.S. Department of Energy Career Award (DE-FG02-05ER25696), and a Pittsburgh Life-Sciences Greenhouse Young Pioneer Award.

P. Degano and R. Gorrieri (Eds.): CMSB 2009, LNBI 5688, pp. 218–234, 2009.

© Springer-Verlag Berlin Heidelberg 2009

The stochastic nature of most computational models from Systems Biology gives rise to an instance of the Probabilistic Model Checking (PMC) problem [13,15,31]. Suppose M is a stochastic model over a set of states S, $s_0$ is a starting state, φ is a dynamic property expressed as a formula in temporal logic, and θ ∈ [0, 1] is a probability threshold. The PMC problem is: given the 4-tuple $(M, s_0, \phi, \theta)$, to decide algorithmically whether $M, s_0 \models P_{\geq\theta}(\phi)$. In this paper, property φ is expressed in Bounded Linear Temporal Logic (BLTL) [36,35,19]. Given these, PMC algorithms decide whether the model satisfies the property with at least probability θ.

Existing algorithms for solving the PMC problem fall into one of two categories. The first category comprises numerical methods (e.g. [2,3,12,16,31]) which can compute the probability with which the property holds with high precision. Numerical methods are generally only suitable for small systems (≈ 10 to 10 states). In a Biological System, the number of states can easily exceed this limit, which motivates the need for algorithms for solving the PMC problem in an approximate fashion. Approximate methods (e.g., [23,26,39,46]) work by sampling a set of traces from the model. Each trace is then evaluated to determine whether it satisfies the property. The number of satisfying traces is used to (approximately) decide whether $M, s_0 \models P_{\geq\theta}(\phi)$.
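As a rough illustration of the sampling idea described above, the following Python sketch draws traces, checks each one, and reports the fraction that satisfy the property. The names `estimate_satisfaction`, `toy_simulate`, and `toy_property` are hypothetical, and the coin-flip model is a stand-in chosen only to keep the example self-contained; it is not one of the biological models studied in the paper.

```python
import random


def estimate_satisfaction(simulate, satisfies, num_traces, seed=0):
    """Return the fraction of sampled traces that satisfy the property."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(num_traces) if satisfies(simulate(rng)))
    return hits / num_traces


def toy_simulate(rng):
    # Hypothetical toy model: a trace is a sequence of ten fair coin flips.
    return [rng.random() < 0.5 for _ in range(10)]


def toy_property(trace):
    # Hypothetical property: at least one head among the first three flips.
    return any(trace[:3])


estimate = estimate_satisfaction(toy_simulate, toy_property, 10_000)
theta = 0.8
print(f"estimated probability: {estimate:.3f}")
print("estimate >= theta:", estimate >= theta)
```

For this toy property the true probability is 1 - 0.5^3 = 0.875, so the estimate should land close to that value. Comparing the estimate to θ directly, as in the last line, gives no control over the error of the decision; that is the gap the estimation and hypothesis-testing methods discussed in this paper address.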

Approximate PMC methods can be further divided into two sub-categories: (i) those that seek to estimate the probability that the property holds and then compare that estimate to θ (e.g., [26,39]), and (ii) those that reduce the PMC problem to a hypothesis testing problem (e.g., [46,47]). That is, deciding between two hypotheses: $H_0 : P_{\geq\theta}(\phi)$ versus $H_1 : P_{<\theta}(\phi)$. Hypothesis-testing based methods are more efficient than those based on estimation when θ (which is specified by the user) is significantly different from the true probability that the property holds (which is determined by M and $s_0$) [45].

Existing PMC methods based on hypothesis testing rely on Classical (aka Frequentist) statistical procedures, like Wald’s Sequential Probability Ratio Test (SPRT) [42], to answer the decision problem. Our algorithm performs hypothesis testing, but uses Bayesian statistical procedures. This distinction is not trivial, as Bayesian and Classical statistics are two very different fields. We will show that in practice, our Bayesian approach requires fewer samples than Wald’s SPRT. Finally, we note that because we adopt a Bayesian approach, our algorithm can incorporate prior knowledge, in the form of a probability distribution, P(θ), when available. This is relevant because in a Biological setting, it is often the case that prior knowledge is available.

The contributions of this paper are as follows:

• The first application of Bayesian Sequential Hypothesis Testing to statistical Model Checking,

• The first hypothesis-testing based statistical Model Checking algorithm designed for composite hypotheses, which can in particular include prior knowledge via a mixture of prior distributions,

• A theorem proving that our algorithm terminates with probability 1,

• Error bounds for our algorithm, and

• A series of case studies using Systems Biology models demonstrating that our method is empirically more efficient than existing algorithms for statistical Model Checking.

2 Background and Related Work

Our algorithm can be applied to any stochastic model M with a well-defined probability space over traces. Several well-studied stochastic models like (discrete and continuous) Markov Chains satisfy this property [47]. We assume that each execution of the system can be represented by a sequence of states and the time spent in these states. The sequence $\sigma = (s_0, t_0), (s_1, t_1), \ldots$ denotes an execution of the system along states $s_0, s_1, \ldots$ with durations $t_0, t_1, \ldots \in \mathbb{R}$. The system stays in state $s_i$ for duration $t_i$ and makes a transition to $s_{i+1}$. We require that the sum $\sum_{i=0}^{\infty} t_i$ must diverge, that is, the system cannot make infinitely many state switches in finite time.

2.1 Specifying Properties in Temporal Logic

Our algorithm verifies properties of M expressed as formulas in Probabilistic Bounded Linear Temporal Logic (PBLTL). We first define the syntax and semantics of Bounded Linear Temporal Logic (BLTL) [36,35,19] and then extend that logic to PBLTL.

For a stochastic model M, let the set of state variables SV be a finite set of real-valued variables. A Boolean predicate over SV is a constraint of the form $x \sim v$, where $x \in SV$, $\sim \, \in \{\geq, \leq, =\}$, and $v \in \mathbb{R}$. A BLTL property is built on a finite set of Boolean predicates over SV using Boolean connectives and temporal operators. The syntax of the logic is given by the following grammar:

$$\phi ::= x \sim v \mid (\phi_1 \vee \phi_2) \mid (\phi_1 \wedge \phi_2) \mid \neg\phi_1 \mid (\phi_1 \, U^t \, \phi_2)$$

where $\sim \, \in \{\geq, \leq, =\}$, $x \in SV$, $v \in \mathbb{Q}$, and $t \in \mathbb{Q}_{\geq 0}$. We can define additional temporal operators such as $F^t \psi = \mathit{True} \; U^t \, \psi$, or $G^t \psi = \neg F^t \neg\psi$ in terms of the bounded until $U^t$.

We define the semantics of BLTL with respect to executions of M. The fact that an execution σ satisfies property φ is denoted by $\sigma \models \phi$. Let $\sigma = (s_0, t_0), (s_1, t_1), \ldots$ be an execution of the model along states $s_0, s_1, \ldots$ with durations $t_0, t_1, \ldots \in \mathbb{R}$. We denote the execution trace starting at state i by $\sigma^i$ (in particular, $\sigma^0$ denotes the original execution σ). The value of the state variable x in σ at the state i is denoted by $V(\sigma, i, x)$. The semantics of BLTL for a trace $\sigma^k$ starting at the $k$-th state ($k \in \mathbb{N}$) is defined as follows:

• $\sigma^k \models x \sim v$ if and only if $V(\sigma, k, x) \sim v$;

• $\sigma^k \models \phi_1 \vee \phi_2$ if and only if $\sigma^k \models \phi_1$ or $\sigma^k \models \phi_2$;

• $\sigma^k \models \phi_1 \wedge \phi_2$ if and only if $\sigma^k \models \phi_1$ and $\sigma^k \models \phi_2$;

• $\sigma^k \models \neg\phi_1$ if and only if $\sigma^k \models \phi_1$ does not hold (written $\sigma^k \not\models \phi_1$);

• $\sigma^k \models \phi_1 \, U^t \, \phi_2$ if and only if there exists $i \in \mathbb{N}$ such that (a) $\sum_{0 \leq l < i} t_{k+l} \leq t$, (b) $\sigma^{k+i} \models \phi_2$, and (c) for each $0 \leq j < i$, $\sigma^{k+j} \models \phi_1$.
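The semantics above can be turned into a small trace checker. The Python sketch below is one possible reading, not the authors' implementation: the tuple encoding of formulas, the `holds`, `eventually`, and `always` helpers, the extra `("true",)` constant, and the example trace are all assumptions introduced for illustration. A trace is a finite list of `(state, duration)` pairs, and the checker raises an error when the trace is too short to decide a bounded until.

```python
import operator

# Comparison operators allowed in atomic predicates x ~ v.
OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq}


def holds(phi, trace, k=0):
    """Check whether the suffix of `trace` starting at state k satisfies phi."""
    kind = phi[0]
    if kind == "true":
        return True
    if kind == "atom":
        _, x, op, v = phi
        return OPS[op](trace[k][0][x], v)
    if kind == "or":
        return holds(phi[1], trace, k) or holds(phi[2], trace, k)
    if kind == "and":
        return holds(phi[1], trace, k) and holds(phi[2], trace, k)
    if kind == "not":
        return not holds(phi[1], trace, k)
    if kind == "until":
        _, t, phi1, phi2 = phi
        elapsed = 0.0  # sum of durations t_k .. t_{k+i-1}
        for i in range(len(trace) - k):
            if elapsed > t:                      # condition (a) fails from here on
                return False
            if holds(phi2, trace, k + i):        # condition (b)
                return True
            if not holds(phi1, trace, k + i):    # condition (c) fails
                return False
            elapsed += trace[k + i][1]
        if elapsed > t:
            return False
        raise ValueError("trace too short to decide the bounded until")
    raise ValueError(f"unknown formula kind: {kind!r}")


def eventually(t, psi):
    """F^t psi, defined as True U^t psi."""
    return ("until", t, ("true",), psi)


def always(t, psi):
    """G^t psi, defined as not F^t not psi."""
    return ("not", eventually(t, ("not", psi)))


# Hypothetical trace: (state variables, time spent in that state).
trace = [({"cyclin_bound": 1}, 0.2),
         ({"cyclin_bound": 3}, 0.2),
         ({"cyclin_bound": 5}, 0.3)]
more_than_3 = ("not", ("atom", "cyclin_bound", "<=", 3))
within_half = holds(eventually(0.5, more_than_3), trace)
within_03 = holds(eventually(0.3, more_than_3), trace)
print(within_half, within_03)
```

Because the grammar only offers ≥, ≤, and =, the strict comparison "greater than 3" is encoded as the negation of "at most 3". Raising an error on a too-short trace is a design choice of this sketch; it mirrors the observation that a simulation must run long enough for the formula to have a well-defined truth value.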

Statistical Model Checking is based on evaluating whether $\sigma \models \phi$ holds on sample simulations σ of the system. In practice, sample simulations only have a finite duration. The question is how long these simulations have to be for the formula φ to have a well-defined semantics such that $\sigma \models \phi$ can be checked. If σ is too short, say of duration 2, the semantics of φ may be unclear. But at what duration of the simulation can we stop because we know that the truth-value for $\sigma \models \phi$ will never change by continuing the simulation? In [29], we prove that finite simulations of bounded duration are always sufficient for Model Checking BLTL on traces.

We can now define Probabilistic Bounded Linear Temporal Logic.

Definition 1. A Probabilistic Bounded LTL (PBLTL) formula is a formula of the form $P_{\geq\theta}(\phi)$, where φ is a BLTL formula and θ ∈ (0, 1).

We say that M satisfies PBLTL property $P_{\geq\theta}(\phi)$, denoted by $M \models P_{\geq\theta}(\phi)$, if and only if the probability that an execution of M satisfies BLTL property φ is greater than or equal to θ. The problem is well-defined [47] since one can always assign a unique probability measure to the set of executions of M that satisfy a formula in BLTL. Note that counterexamples to the BLTL property φ are not counterexamples to the PBLTL property $P_{\geq\theta}(\phi)$, because the truth of $P_{\geq\theta}(\phi)$ depends on the likelihood of all counterexamples to φ. This makes PMC more difficult than standard Model Checking, because one counterexample to φ is not enough to answer $P_{\geq\theta}(\phi)$.

2.2 Existing Statistical Probabilistic Model Checking Algorithms

As outlined in the introduction, Probabilistic Model Checking algorithms can either be exact (e.g. [2,3,12,16,31]), or statistical in nature. In practice, statistical methods (e.g., [23,26,32,39,46]), which iteratively draw sample traces from the model, are generally better suited to Model Checking Biological systems because they scale better. Our method is statistical, and so we will compare and contrast our method to existing statistical methods in this section.

Existing PMC methods based on hypothesis testing rely on Classical (aka Frequentist) statistical procedures, like Wald’s Sequential Probability Ratio Test (SPRT) [42], to answer the decision problem. Younes and Simmons introduced the first algorithm for statistical Model Checking [45,46,47] for verifying probabilistic temporal properties of stochastic systems. Their work uses the SPRT, which is designed for simple hypothesis testing. (A simple hypothesis completely specifies a distribution. For example, a Bernoulli distribution of parameter p is fully specified by the hypothesis p = 0.5, or some other fixed value. A composite hypothesis instead has free parameters, e.g. the hypothesis p < 0.3 for a Bernoulli distribution.) Specifically, the SPRT decides between the simple null hypothesis $H'_0 : M, s_0 \models P_{=\theta_0}(\phi)$ against the simple alternate hypothesis $H'_1 : M, s_0 \models P_{=\theta_1}(\phi)$, where $\theta_0 < \theta_1$. It can be shown that the SPRT is optimal for simple hypothesis testing, in the sense that it minimizes the expected number of samples among all the tests satisfying the same Type I and II errors [43], when either $H'_0$ or $H'_1$ is true. The PMC problem is instead a choice between two composite hypotheses $H_0 : M, s_0 \models P_{\geq\theta}[\phi]$ versus $H_1 : M, s_0 \models P_{<\theta}[\phi]$. The SPRT is not defined unless $\theta_0 \neq \theta_1$, so Younes and Simmons overcome this problem by separating the two hypotheses by an indifference region $(\theta - \delta, \theta + \delta)$, where $0 < \delta < 1$ is a user-specified parameter. It can be shown that the SPRT with indifference region can be used for testing composite hypotheses, while respecting the same Type I and II errors of a standard SPRT [21, Section 3.4]. However, in this case the test is no longer optimal, and the maximum expected sample size may be much bigger than that of the optimal fixed sample size test; see [7] and [21, Section 3.6]. We note that our algorithm solves the composite hypothesis testing problem, but does so using Bayesian statistics, and thus requires no indifference region.
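To make the SPRT with an indifference region concrete, here is a minimal Python sketch for Bernoulli samples (each sample records whether one simulated trace satisfies φ). It uses Wald's standard approximate stopping thresholds and tests p = θ + δ against p = θ − δ. The function name `sprt`, its parameters, the default error levels, and the simulated Bernoulli source are assumptions made for illustration; this is not the implementation of Younes and Simmons.

```python
import math
import random


def sprt(sample, theta, delta, alpha=0.01, beta=0.01, max_samples=1_000_000):
    """Sequential test of p = theta + delta against p = theta - delta.

    `sample()` returns True when a freshly simulated trace satisfies the
    property. Requires 0 < theta - delta and theta + delta < 1.
    Returns (decision, number_of_samples_used).
    """
    p_hi, p_lo = theta + delta, theta - delta
    accept_hi = math.log(beta / (1 - alpha))   # stop and conclude p >= theta
    accept_lo = math.log((1 - beta) / alpha)   # stop and conclude p < theta
    llr = 0.0  # log of P(data | p_lo) / P(data | p_hi)
    for n in range(1, max_samples + 1):
        if sample():
            llr += math.log(p_lo / p_hi)
        else:
            llr += math.log((1 - p_lo) / (1 - p_hi))
        if llr <= accept_hi:
            return "p >= theta", n
        if llr >= accept_lo:
            return "p < theta", n
    return "undecided", max_samples


# Hypothetical system whose true satisfaction probability is 0.9.
rng = random.Random(1)
decision, used = sprt(lambda: rng.random() < 0.9, theta=0.5, delta=0.05)
print(decision, used)
```

In this sketch `alpha` approximately bounds the chance of concluding p < θ when p ≥ θ + δ, and `beta` the converse. When the true probability lies inside the indifference region, the log-likelihood ratio drifts slowly and many samples may be needed, which is consistent with the loss of optimality noted above.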

The method of [26] uses a fixed number of samples and estimates the probability the property holds as the number of satisfying traces divided by the number of sampled traces. Their algorithm guarantees the accuracy of the results using Chernoff-Hoeffding bounds. In particular, their algorithm can guarantee that the difference in the estimated and the true probability is less than ε, with probability ρ, where ρ < 1 and ε > 0 are user-specified parameters. Grosu and Smolka use a similar technique for verifying formulas in LTL [23]. Their algorithm randomly samples lassos from a Büchi automaton in an on-the-fly fashion. The method of [32] is also Bayesian, like the algorithm in this paper, but estimates the probability the property holds and does not invoke hypothesis testing. Unlike the algorithm in this paper, [32] is fully Bayesian in the sense that it explicitly considers the prior distributions over the initial state and parameters of the model, in addition to the prior over the property.
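As a hedged illustration of how a fixed sample size can follow from ε and ρ, the sketch below uses the standard two-sided Hoeffding inequality, P(|p̂ − p| ≥ ε) ≤ 2·exp(−2nε²), and solves for the smallest n that pushes the right-hand side below 1 − ρ. The exact bound used in [26] may differ; the function name `hoeffding_sample_size` is hypothetical.

```python
import math


def hoeffding_sample_size(epsilon, rho):
    """Smallest n with 2 * exp(-2 * n * epsilon**2) <= 1 - rho."""
    if not (epsilon > 0 and 0 < rho < 1):
        raise ValueError("need epsilon > 0 and 0 < rho < 1")
    return math.ceil(math.log(2 / (1 - rho)) / (2 * epsilon ** 2))


# Samples needed for accuracy 0.01 with confidence 0.99 under this bound.
n_needed = hoeffding_sample_size(epsilon=0.01, rho=0.99)
print(n_needed)
```

Under this bound the sample size depends only on ε and ρ, not on θ or on the model, which matches the description of a method that fixes the number of samples in advance.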

Finally, Sen et al. [39,40] used the p-value for the null hypothesis as a statistic for hypothesis testing. The p-value is defined as the probability of obtaining observations at least as extreme as the one that was actually seen, given that the null hypothesis is true. It is important to realize that a p-value is not the probability that the null hypothesis is true. Sen et al.’s method does not have a way to control the Type I and II errors.

3 Bayesian Statistical Model Checking

In this section, we first review some important concepts from statistical Model Checking, and then introduce theory and terminology from Bayesian statistics. We then present our algorithm in Sec. 3.2.

Recall that the PMC problem is to decide whether $M \models P_{\geq\theta}(\phi)$, where θ ∈ (0, 1) and φ is a BLTL formula. Let p be the (unknown but fixed) probability of the model satisfying φ: thus, the PMC problem can now be stated as deciding between two hypotheses:

$$H_0 : p \geq \theta \qquad H_1 : p < \theta.$$
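A sequential Bayesian test between these two hypotheses can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: it assumes a uniform prior on p (so the posterior after x successes in n samples is Beta(x + 1, n − x + 1)), computes the Bayes factor as posterior odds divided by prior odds, and stops once the Bayes factor exceeds a threshold T or drops below 1/T. The default T = 10000 echoes the threshold quoted in the FAQ above; the names `bayes_factor` and `bayesian_smc` are hypothetical.

```python
import math


def _log_binom_term(m, j, theta):
    # log of C(m, j) * theta**j * (1 - theta)**(m - j), computed in log space.
    return (math.lgamma(m + 1) - math.lgamma(j + 1) - math.lgamma(m - j + 1)
            + j * math.log(theta) + (m - j) * math.log(1 - theta))


def bayes_factor(x, n, theta):
    """P(data | H0) / P(data | H1) for H0: p >= theta, H1: p < theta.

    Assumes a uniform prior on p and 0 < theta < 1. For integer parameters
    the Beta(x + 1, n - x + 1) CDF at theta is a binomial tail sum.
    """
    m = n + 1
    post_h1 = sum(math.exp(_log_binom_term(m, j, theta)) for j in range(x + 1, m + 1))
    post_h0 = sum(math.exp(_log_binom_term(m, j, theta)) for j in range(0, x + 1))
    if post_h1 == 0.0:
        return math.inf
    prior_odds_h1_over_h0 = theta / (1 - theta)  # uniform prior: P(H1) = theta
    return (post_h0 / post_h1) * prior_odds_h1_over_h0


def bayesian_smc(sample, theta, threshold=10_000, max_samples=100_000):
    """Draw samples until the Bayes factor is decisive in either direction."""
    x = 0
    for n in range(1, max_samples + 1):
        if sample():
            x += 1
        b = bayes_factor(x, n, theta)
        if b > threshold:
            return "H0 accepted", n
        if b < 1 / threshold:
            return "H1 accepted", n
    return "undecided", max_samples


# A hypothetical system in which every trace satisfies the property.
result = bayesian_smc(lambda: True, theta=0.5)
print(result)
```

With no data the Bayes factor in this sketch equals 1, so neither hypothesis is favored before sampling begins. Recomputing both tail sums at every step costs time linear in n; a production implementation would more likely call a library routine for the Beta distribution function.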
