
Real-Time Syst (2017) 53:354–402
DOI 10.1007/s11241-017-9269-4
Verification and control of partially observable
probabilistic systems

Gethin Norman¹ · David Parker² · Xueyi Zou³

Published online: 8 March 2017
© The Author(s) 2017. This article is published with open access at Springerlink.com
Abstract We present automated techniques for the verification and control of partially
observable, probabilistic systems for both discrete and dense models of time. For
the discrete-time case, we formally model these systems using partially observable
Markov decision processes; for dense time, we propose an extension of probabilistic
timed automata in which local states are partially visible to an observer or controller.
We give probabilistic temporal logics that can express a range of quantitative properties
of these models, relating to the probability of an event’s occurrence or the expected
value of a reward measure. We then propose techniques to either verify that such
a property holds or synthesise a controller for the model which makes it true. Our
approach is based on a grid-based abstraction of the uncountable belief space induced
by partial observability and, for dense-time models, an integer discretisation of real-
time behaviour. The former is necessarily approximate since the underlying problem is
undecidable, however we show how both lower and upper bounds on numerical results
can be generated. We illustrate the effectiveness of the approach by implementing it
in the PRISM model checker and applying it to several case studies from the domains
of task and network scheduling, computer security and planning.
Keywords Formal verification · Probabilistic verification · Controller synthesis
✉ David Parker
d.a.parker@cs.bham.ac.uk

1 School of Computing Science, University of Glasgow, Glasgow, UK
2 School of Computer Science, University of Birmingham, Birmingham, UK
3 Department of Computer Science, University of York, York, UK
1 Introduction
Guaranteeing the correctness of complex computerised systems often needs to take
into account quantitative aspects of system behaviour. This includes the modelling of
probabilistic phenomena, such as failure rates for physical components, uncertainty
arising from unreliable sensing of a continuous environment, or the explicit use of
randomisation to break symmetry. It also includes timing characteristics, such as
timeouts or delays in communication or security protocols. To further complicate
matters, such systems are often nondeterministic because their behaviour depends on
inputs or instructions from some external entity such as a controller or scheduler.
Automated verification techniques such as probabilistic model checking have been
successfully used to analyse quantitative properties of probabilistic systems across a
variety of application domains, including wireless communication protocols, computer
security and task scheduling. These systems are commonly modelled using Markov
decision processes (MDPs), if assuming a discrete notion of time, or probabilistic
timed automata (PTAs), if using a dense model of time. On these models, we can
consider two problems: verification that the model satisfies some formally specified
property for any possible resolution of nondeterminism; or, dually, synthesis of a
controller (i.e., a means to resolve nondeterminism) under which a property is
guaranteed to hold.
For either case, an important consideration is the extent to which the system’s state is
observable to the entity controlling it. For example, to verify that a security protocol
is functioning correctly, it may be essential to model the fact that some data held by
a participant is not externally visible; or, when synthesising an optimal schedule for
sending packets over a network, a scheduler may not be implementable in practice if
it bases its decisions on information about the state of the network that is unavailable
due to the delays and costs associated with probing it.
Partially observable MDPs (POMDPs) are a natural way to extend MDPs in order to
tackle this problem. However, the analysis of POMDPs is considerably more difficult
than that of MDPs since key problems are undecidable (Madani et al. 2003). A variety
of verification problems have been studied for these models (see, e.g., de Alfaro 1999;
Baier et al. 2008; Chatterjee et al. 2013) and the use of POMDPs is common in
fields such as AI and planning (Cassandra 1998), but there has been limited progress
in the development of practical techniques for probabilistic verification in this area,
or in the exploration of their applicability.
In this paper, we present novel techniques for verification and control of partially
observable, probabilistic systems under both discrete and dense models of time. We
use POMDPs in the case of discrete-time models and, for dense time, propose a model
called partially observable probabilistic timed automata (POPTAs), which extends
the existing model of PTAs with a notion of partial observability. The semantics of
a POPTA is an infinite-state POMDP. In order to specify verification and control
problems on POMDPs and POPTAs, we define temporal logics to express properties
of these models relating to the probability of an event (e.g., the probability of some
observation eventually being made) or the expected value of various reward measures
(e.g., the expected time until some observation). Nondeterminism in both a POMDP
and a POPTA is resolved by a strategy that decides which actions to take and when to
take them, based only on the history of observations (not states). The core problems
we address are how to verify that a temporal logic property holds for all possible
strategies, and how to synthesise a strategy under which the property holds.
In order to achieve this, we use a combination of techniques. To analyse a POMDP,
we use grid-based techniques (Lovejoy 1991; Yu and Bertsekas 2004), which
transform it to a fully observable but continuous-space MDP and then approximate its
solution based on a finite set of grid points. We use this to construct and solve a strategy
of the POMDP. The result is a pair of lower and upper bounds on the property of interest
for the POMDP. If this is not precise enough, we can refine the grid and repeat. In
the case of POPTAs, we develop a digital clocks discretisation, which extends the
existing notion for PTAs (Kwiatkowska et al. 2006). The discretisation reduces the
analysis to a finite POMDP, and hence we can use the techniques we have developed for
analysing POMDPs. We define the conditions under which temporal logic properties
are preserved by the discretisation step and prove the correctness of the reduction
under these conditions.
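The uncountable belief space mentioned above arises from the standard POMDP belief update: after taking action a and observing o, the controller's distribution over states is re-weighted by the transition probabilities and renormalised. The following sketch uses our own toy model and invented numbers, not anything from the paper:

```python
# Belief update b' after taking action a and seeing observation o:
# b'(s') is proportional to sum_s b(s) * P(s,a)(s'), restricted to
# successor states s' with obs(s') == o.  Toy model for illustration.
P = {("s0", "a"): {"s1": 0.5, "s2": 0.5},
     ("s1", "a"): {"s1": 1.0},
     ("s2", "a"): {"s2": 1.0}}
obs = {"s0": "o0", "s1": "o1", "s2": "o1"}

def update(belief, a, o):
    new = {}
    for (s, act), succ in P.items():
        if act != a or belief.get(s, 0.0) == 0.0:
            continue
        for s2, p in succ.items():
            if obs[s2] == o:
                new[s2] = new.get(s2, 0.0) + belief[s] * p
    total = sum(new.values())          # probability of observing o
    return {s2: p / total for s2, p in new.items()}

b1 = update({"s0": 1.0}, "a", "o1")
print(b1)  # {'s1': 0.5, 's2': 0.5}
```

Even though the toy model has three states, the reachable beliefs are points in a continuous simplex, which is why the grid-based abstraction approximates them by a finite set of grid points.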
We implemented these methods in a prototype tool based on PRISM (Kwiatkowska
et al. 2011; PRISM), and investigated their applicability by developing a number of
case studies including: wireless network scheduling, a task scheduling problem, a
covert channel prevention device (the NRL pump) and a non-repudiation protocol.
Despite the undecidability of the POMDP problems we consider, we show that useful
results can be obtained, often with precise bounds. In each case study, partial
observability, nondeterminism, probability and, in the case of the dense-time models,
real-time behaviour are all crucial ingredients to the analysis. This is a combination
not supported by any existing techniques or tools.
A preliminary conference version of this paper was published as Norman et al. (2015).
1.1 Related work
POMDPs are common in fields such as AI and planning: they have many applications
(Cassandra 1998) and tool support exists (Poupart 2005). However, unlike
verification, the focus in these fields is usually on finite-horizon and discounted reward
objectives. Early undecidability results for key problems can be found in, e.g., Madani
et al. (2003). POMDPs have also been applied to problems such as scheduling in
wireless networks since, in practice, information about the state of wireless connections
is often unavailable and varies over time; see, e.g., Johnston and Krishnamurthy (2006),
Li and Neely (2011), Yang et al. (2011), Jagannathan et al. (2013), and Gopalan
et al. (2015).
POMDPs have also been studied by the formal verification community, see e.g. de
Alfaro (1999), Baier et al. (2008), and Chatterjee et al. (2013), establishing
undecidability and complexity results for various qualitative and quantitative
verification problems. In the case of qualitative analysis, Chatterjee et al. (2015)
presents an approach for the verification and synthesis of POMDPs against LTL
properties when restricting to finite-memory strategies. This has been implemented
and applied to an autonomous system (Svoreňová et al. 2015). For quantitative
properties, the recent work of Chatterjee (2016) extends approaches developed for
finite-horizon objectives to approximate the minimum expected reward of reaching a
target (while ensuring the target is reached with probability 1), under the requirement
that all rewards in the POMDP are positive.
Work in this area often also studies related models such as Rabin's probabilistic
automata (Baier et al. 2008), which can be seen as a special case of POMDPs, and
partially observable stochastic games (POSGs) (Chatterjee and Doyen 2014), which
generalise them. More practically oriented work includes: Giro and Rabe (2012),
which proposes a counterexample-driven refinement method to approximately solve
MDPs in which components have partial observability of each other; and Černý
et al. (2011), which synthesises concurrent program constructs using a search over
memoryless strategies in a POSG.
Theoretical results (Bouyer et al. 2003) and algorithms (Cassez et al. 2007;
Finkbeiner and Peter 2012) have been developed for synthesis of partially observable
timed games. In Bouyer et al. (2003), it is shown that the synthesis problem is
undecidable and, if the resources of the controller are fixed, decidable but prohibitively
expensive. The algorithms require constraints on controllers: in Cassez et al. (2007),
controllers only respond to changes made by the environment and, in Finkbeiner and
Peter (2012), their structure must be fixed in advance. We are not aware of any work
for probabilistic real-time models in this area.
1.2 Outline
Section 2 describes the discrete-time models of MDPs and POMDPs, and Sect. 3
presents our approach for POMDP verification and strategy synthesis. In Sect. 4, we
introduce the dense-time models of PTAs and POPTAs, and then, in Sect. 5, give our
verification and strategy synthesis approach for POPTAs using digital clocks. Section 6
describes the implementation of our techniques for analysing POMDPs and POPTAs in
a prototype tool, and demonstrates its applicability using several case studies. Finally,
Sect. 7 concludes the paper.
2 Partially observable Markov decision processes
In this section, we consider systems exhibiting probabilistic, nondeterministic and
discrete-time behaviour. We first introduce MDPs, and then describe POMDPs, which
extend these to include partial observability. For a more detailed tutorial on verification
techniques for MDPs, we refer the reader to, for example, Forejt et al. (2011).
2.1 Markov decision processes
Let Dist(X) denote the set of discrete probability distributions over a set X, δ_x the
distribution that selects x ∈ X with probability 1, and R the set of non-negative real
numbers.
Definition 1 (MDP) An MDP is a tuple M = (S, s̄, A, P, R) where:
– S is a set of states;
– s̄ ∈ S is an initial state;
– A is a set of actions;
– P : S × A → Dist(S) is a (partial) probabilistic transition function;
– R = (R_S, R_A) is a reward structure, where R_S : S → R is a state reward function
  and R_A : S × A → R is an action reward function.
An MDP M represents the evolution of a system exhibiting both probabilistic and
nondeterministic behaviour through states from the set S. Each state s ∈ S of M has
a set A(s) = {a ∈ A | P(s, a) is defined} of available actions. The choice of which
available action is taken in a state is nondeterministic. In a state s, if action a ∈ A(s)
is selected, then the probability of moving to state s′ equals P(s, a)(s′).
A path of M is a finite or infinite sequence π = s_0 a_0 s_1 a_1 ···, where s_i ∈ S,
a_i ∈ A(s_i) and P(s_i, a_i)(s_{i+1}) > 0 for all i ∈ N. The (i+1)th state s_i of path π
is denoted π(i) and, if π is finite, last(π) denotes its final state. We write FPaths_M
and IPaths_M, respectively, for the sets of all finite and infinite paths of M starting
in the initial state s̄. MDPs are also annotated with rewards, which can be used to
model a variety of quantitative measures of interest. A reward of R_S(s) is accumulated
when passing through state s and a reward of R_A(s, a) when taking action a from
state s.
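The accumulation of state and action rewards along a finite path can be written out in a few lines; the reward values and the path below are invented purely for illustration:

```python
# Reward structure R = (R_S, R_A) for a toy MDP: state rewards and
# action rewards, with illustrative numbers only.
R_S = {"s0": 1.0, "s1": 0.0}
R_A = {("s0", "a"): 2.0}

def path_reward(path):
    """Total reward of a finite path (s0, a0, s1, ..., sn):
    state rewards of all visited states plus action rewards of all
    actions taken.  States sit at even indices, actions at odd ones."""
    total = sum(R_S[s] for s in path[0::2])
    total += sum(R_A[(path[i], path[i + 1])]
                 for i in range(0, len(path) - 1, 2))
    return total

print(path_reward(("s0", "a", "s1")))  # 1.0 + 2.0 + 0.0 = 3.0
```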
A strategy of M (also called a policy or scheduler) is a way of resolving the choice
of action in each state, based on the MDP’s execution so far.
Definition 2 (Strategy) A strategy of an MDP M = (S, s̄, A, P, R) is a function
σ : FPaths_M → Dist(A) such that, for any π ∈ FPaths_M, we have σ(π)(a) > 0 only
if a ∈ A(last(π)). Let Σ_M denote the set of all strategies of M.

A strategy is memoryless if its choices only depend on the current state, finite-memory
if it suffices to switch between a finite set of modes, and deterministic if it always
selects an action with probability 1.
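A strategy in this sense is simply a function from finite paths (histories) to distributions over actions. The sketch below, using invented states and actions, contrasts a deterministic memoryless strategy with a history-dependent one:

```python
# A finite path is a tuple (s0, a0, s1, a1, ..., sn); a strategy maps
# it to a distribution over actions, written here as {action: prob}.
# States and actions are our own illustrative names.

def memoryless(path):
    """Deterministic memoryless strategy: depends only on last(path)."""
    s = path[-1]
    return {"a": 1.0} if s == "s0" else {"b": 1.0}

def history_dependent(path):
    """Switches to 'b' once action 'a' appears anywhere in the history
    (a simple finite-memory strategy with two modes)."""
    used_a = "a" in path[1::2]          # actions sit at odd indices
    return {"b": 1.0} if used_a else {"a": 1.0}

print(memoryless(("s0",)))                   # {'a': 1.0}
print(history_dependent(("s0", "a", "s1")))  # {'b': 1.0}
```

Both functions return a distribution with a single action of probability 1, i.e., they are deterministic; a randomised strategy would spread probability over several available actions.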
When M is under the control of a strategy σ, the resulting behaviour is captured by
a probability measure Pr^σ_M over the infinite paths of M (Kemeny et al. 1976).
Furthermore, given a random variable f : IPaths_M → R over the infinite paths of M,
using the probability measure Pr^σ_M, we can define the expected value of the variable
f with respect to the strategy σ, denoted E^σ_M(f).
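Under a fixed strategy, all nondeterminism is resolved, so quantities such as E^σ_M(f) can be estimated by sampling paths. The paper's techniques are exact numerical methods, not simulation; this is only a sanity-check sketch with an invented two-state model:

```python
import random

# Illustrative MDP under a fixed strategy that always plays 'a':
# from s0, the target s1 is reached with probability 0.7 per step.
P = {("s0", "a"): {"s0": 0.3, "s1": 0.7},
     ("s1", "a"): {"s1": 1.0}}

def sample_steps_to_target(rng, max_steps=1000):
    """One sample of f(path) = number of steps until s1 is first reached."""
    s, steps = "s0", 0
    while s != "s1" and steps < max_steps:
        succ = P[(s, "a")]
        s = rng.choices(list(succ), weights=list(succ.values()))[0]
        steps += 1
    return steps

rng = random.Random(0)
est = sum(sample_steps_to_target(rng) for _ in range(20000)) / 20000
# The step count is geometric with p = 0.7, so E[f] = 1/0.7 ≈ 1.43.
print(round(est, 2))
```

Monte Carlo estimates like this carry no guarantees; the grid-based method described in Sect. 3 instead produces certified lower and upper bounds.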
2.2 Partially observable Markov decision processes
POMDPs extend MDPs by restricting the extent to which their current state can be
observed, in particular by strategies that control them. In this paper (as in, e.g., Baier
et al. 2008; Chatterjee et al. 2013), we adopt the following notion of observability.

Definition 3 (POMDP) A POMDP is a tuple M = (S, s̄, A, P, R, O, obs) where:
– (S, s̄, A, P, R) is an MDP;
– O is a finite set of observations;
– obs : S → O is a labelling of states with observations;
such that, for any states s, s′ ∈ S with obs(s) = obs(s′), their available actions must
be identical, i.e., A(s) = A(s′).
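The well-formedness condition of Definition 3 (states with the same observation must offer the same actions) is easy to check mechanically. The encoding below is our own toy illustration, not the paper's implementation:

```python
# Toy POMDP: transition function plus an observation labelling.
# States s0 and s2 look identical to the controller.
P = {
    ("s0", "a"): {"s1": 1.0},
    ("s1", "a"): {"s1": 1.0},
    ("s2", "a"): {"s2": 1.0},
}
obs = {"s0": "o1", "s1": "o2", "s2": "o1"}

def available(s):
    """A(s) = actions with P(s, a) defined."""
    return {a for (t, a) in P if t == s}

def well_formed():
    """Definition 3: obs(s) == obs(s') must imply A(s) == A(s')."""
    states = list(obs)
    return all(available(s) == available(t)
               for s in states for t in states
               if obs[s] == obs[t])

print(well_formed())  # True: s0 and s2 share observation o1 and actions
```

The condition is natural: a strategy sees only observations, so it could not respect different action sets in two states it cannot tell apart.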
References (partial list, as displayed)

– Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming (book)
– Alur, R., Dill, D.L.: A theory of timed automata (journal article)
– Markov Decision Processes (monograph)
– Kwiatkowska, M., Norman, G., Parker, D.: PRISM 4.0: verification of probabilistic real-time systems (book chapter)
– Roscoe, A.W.: The Theory and Practice of Concurrency (book)