
Real-Time Syst (2017) 53:354–402
DOI 10.1007/s11241-017-9269-4
Verification and control of partially observable
probabilistic systems

Gethin Norman¹ · David Parker² · Xueyi Zou³

Published online: 8 March 2017
© The Author(s) 2017. This article is published with open access at Springerlink.com
Abstract We present automated techniques for the verification and control of partially
observable, probabilistic systems for both discrete and dense models of time. For
the discrete-time case, we formally model these systems using partially observable
Markov decision processes; for dense time, we propose an extension of probabilistic
timed automata in which local states are partially visible to an observer or controller.
We give probabilistic temporal logics that can express a range of quantitative properties
of these models, relating to the probability of an event’s occurrence or the expected
value of a reward measure. We then propose techniques to either verify that such
a property holds or synthesise a controller for the model which makes it true. Our
approach is based on a grid-based abstraction of the uncountable belief space induced
by partial observability and, for dense-time models, an integer discretisation of real-
time behaviour. The former is necessarily approximate since the underlying problem is
undecidable, however we show how both lower and upper bounds on numerical results
can be generated. We illustrate the effectiveness of the approach by implementing it
in the PRISM model checker and applying it to several case studies from the domains
of task and network scheduling, computer security and planning.
Keywords Formal verification · Probabilistic verification · Controller synthesis
✉ David Parker
d.a.parker@cs.bham.ac.uk

1 School of Computing Science, University of Glasgow, Glasgow, UK
2 School of Computer Science, University of Birmingham, Birmingham, UK
3 Department of Computer Science, University of York, York, UK
1 Introduction
Guaranteeing the correctness of complex computerised systems often needs to take
into account quantitative aspects of system behaviour. This includes the modelling of
probabilistic phenomena, such as failure rates for physical components, uncertainty
arising from unreliable sensing of a continuous environment, or the explicit use of
randomisation to break symmetry. It also includes timing characteristics, such as
timeouts or delays in communication or security protocols. To further complicate
matters, such systems are often nondeterministic because their behaviour depends on
inputs or instructions from some external entity such as a controller or scheduler.
Automated verification techniques such as probabilistic model checking have been
successfully used to analyse quantitative properties of probabilistic systems across a
variety of application domains, including wireless communication protocols, computer
security and task scheduling. These systems are commonly modelled using Markov
decision processes (MDPs), if assuming a discrete notion of time, or probabilistic
timed automata (PTAs), if using a dense model of time. On these models, we can
consider two problems: verification that the model satisfies some formally specified
property for any possible resolution of nondeterminism; or, dually, synthesis of a
controller (i.e., a means to resolve nondeterminism) under which a property is
guaranteed to hold.
For either case, an important consideration is the extent to which the system’s state is
observable to the entity controlling it. For example, to verify that a security protocol
is functioning correctly, it may be essential to model the fact that some data held by
a participant is not externally visible; or, when synthesising an optimal schedule for
sending packets over a network, a scheduler may not be implementable in practice if
it bases its decisions on information about the state of the network that is unavailable
due to the delays and costs associated with probing it.
Partially observable MDPs (POMDPs) are a natural way to extend MDPs in order to
tackle this problem. However, the analysis of POMDPs is considerably more difficult
than that of MDPs since key problems are undecidable (Madani et al. 2003). A variety
of verification problems have been studied for these models (see, e.g., de Alfaro 1999;
Baier et al. 2008; Chatterjee et al. 2013) and the use of POMDPs is common in
fields such as AI and planning (Cassandra 1998), but there has been limited progress
in the development of practical techniques for probabilistic verification in this area,
or in the exploration of their applicability.
In this paper, we present novel techniques for verification and control of partially
observable, probabilistic systems under both discrete and dense models of time. We
use POMDPs in the case of discrete-time models and, for dense time, propose a model
called partially observable probabilistic timed automata (POPTAs), which extends
the existing model of PTAs with a notion of partial observability. The semantics of
a POPTA is an infinite-state POMDP. In order to specify verification and control
problems on POMDPs and POPTAs, we define temporal logics to express properties
of these models relating to the probability of an event (e.g., the probability of some
observation eventually being made) or the expected value of various reward measures
(e.g., the expected time until some observation). Nondeterminism in both a POMDP
and a POPTA is resolved by a strategy that decides which actions to take and when to
take them, based only on the history of observations (not states). The core problems
we address are how to verify that a temporal logic property holds for all possible
strategies, and how to synthesise a strategy under which the property holds.
In order to achieve this, we use a combination of techniques. To analyse a POMDP,
we use grid-based techniques (Lovejoy 1991; Yu and Bertsekas 2004), which
transform it to a fully observable but continuous-space MDP and then approximate its
solution based on a finite set of grid points. We use this to construct and solve a strategy
of the POMDP. The result is a pair of lower and upper bounds on the property of interest
for the POMDP. If this is not precise enough, we can refine the grid and repeat. In
the case of POPTAs, we develop a digital clocks discretisation, which extends the
existing notion for PTAs (Kwiatkowska et al. 2006). The discretisation reduces the
analysis to a finite POMDP, and hence we can use the techniques we have developed for
analysing POMDPs. We define the conditions under which temporal logic properties
are preserved by the discretisation step and prove the correctness of the reduction
under these conditions.
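The uncountable belief space mentioned above arises from the standard POMDP belief update: after taking action a and observing o, the controller's distribution over states is re-weighted by the transition probabilities and renormalised. The following sketch uses our own toy model and invented numbers, not anything from the paper:

```python
# Belief update b' after taking action a and seeing observation o:
# b'(s') is proportional to sum_s b(s) * P(s,a)(s'), restricted to
# successor states s' with obs(s') == o.  Toy model for illustration.
P = {("s0", "a"): {"s1": 0.5, "s2": 0.5},
     ("s1", "a"): {"s1": 1.0},
     ("s2", "a"): {"s2": 1.0}}
obs = {"s0": "o0", "s1": "o1", "s2": "o1"}

def update(belief, a, o):
    new = {}
    for (s, act), succ in P.items():
        if act != a or belief.get(s, 0.0) == 0.0:
            continue
        for s2, p in succ.items():
            if obs[s2] == o:
                new[s2] = new.get(s2, 0.0) + belief[s] * p
    total = sum(new.values())          # probability of observing o
    return {s2: p / total for s2, p in new.items()}

b1 = update({"s0": 1.0}, "a", "o1")
print(b1)  # {'s1': 0.5, 's2': 0.5}
```

Even though the toy model has three states, the reachable beliefs are points in a continuous simplex, which is why the grid-based abstraction approximates them by a finite set of grid points.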
We implemented these methods in a prototype tool based on PRISM (Kwiatkowska
et al. 2011; PRISM), and investigated their applicability by developing a number of
case studies including: wireless network scheduling, a task scheduling problem, a
covert channel prevention device (the NRL pump) and a non-repudiation protocol.
Despite the undecidability of the POMDP problems we consider, we show that useful
results can be obtained, often with precise bounds. In each case study, partial
observability, nondeterminism, probability and, in the case of the dense-time models,
real-time behaviour are all crucial ingredients to the analysis. This is a combination
not supported by any existing techniques or tools.
A preliminary conference version of this paper was published as Norman et al. (2015).
1.1 Related work
POMDPs are common in fields such as AI and planning: they have many applications
(Cassandra 1998) and tool support exists (Poupart 2005). However, unlike
verification, the focus in these fields is usually on finite-horizon and discounted reward
objectives. Early undecidability results for key problems can be found in, e.g., Madani
et al. (2003). POMDPs have also been applied to problems such as scheduling in
wireless networks since, in practice, information about the state of wireless connections
is often unavailable and varies over time; see, e.g., Johnston and Krishnamurthy (2006),
Li and Neely (2011), Yang et al. (2011), Jagannathan et al. (2013), and Gopalan
et al. (2015).
POMDPs have also been studied by the formal verification community, see e.g. de
Alfaro (1999), Baier et al. (2008), and Chatterjee et al. (2013), establishing
undecidability and complexity results for various qualitative and quantitative
verification problems. In the case of qualitative analysis, Chatterjee et al. (2015)
presents an approach for the verification and synthesis of POMDPs against LTL
properties when restricting to finite-memory strategies. This has been implemented
and applied to an autonomous system (Svoreňová et al. 2015). For quantitative
properties, the recent work of Chatterjee (2016) extends approaches developed for
finite-horizon objectives to approximate the minimum expected reward of reaching a
target (while ensuring the target is reached with probability 1), under the requirement
that all rewards in the POMDP are positive.
Work in this area often also studies related models such as Rabin's probabilistic
automata (Baier et al. 2008), which can be seen as a special case of POMDPs, and
partially observable stochastic games (POSGs) (Chatterjee and Doyen 2014), which
generalise them. More practically oriented work includes: Giro and Rabe (2012),
which proposes a counterexample-driven refinement method to approximately solve
MDPs in which components have partial observability of each other; and Černý
et al. (2011), which synthesises concurrent program constructs using a search over
memoryless strategies in a POSG.
Theoretical results (Bouyer et al. 2003) and algorithms (Cassez et al. 2007;
Finkbeiner and Peter 2012) have been developed for synthesis of partially observable
timed games. In Bouyer et al. (2003), it is shown that the synthesis problem is
undecidable and, if the resources of the controller are fixed, decidable but prohibitively
expensive. The algorithms require constraints on controllers: in Cassez et al. (2007),
controllers only respond to changes made by the environment and, in Finkbeiner and
Peter (2012), their structure must be fixed in advance. We are not aware of any work
for probabilistic real-time models in this area.
1.2 Outline
Section 2 describes the discrete-time models of MDPs and POMDPs, and Sect. 3
presents our approach for POMDP verification and strategy synthesis. In Sect. 4, we
introduce the dense-time models of PTAs and POPTAs, and then, in Sect. 5, give our
verification and strategy synthesis approach for POPTAs using digital clocks. Section 6
describes the implementation of our techniques for analysing POMDPs and POPTAs in
a prototype tool, and demonstrates its applicability using several case studies. Finally,
Sect. 7 concludes the paper.
2 Partially observable Markov decision processes
In this section, we consider systems exhibiting probabilistic, nondeterministic and
discrete-time behaviour. We first introduce MDPs, and then describe POMDPs, which
extend these to include partial observability. For a more detailed tutorial on verification
techniques for MDPs, we refer the reader to, for example, Forejt et al. (2011).
2.1 Markov decision processes
Let Dist(X) denote the set of discrete probability distributions over a set X, δ_x the
distribution that selects x ∈ X with probability 1, and R the set of non-negative real
numbers.
Definition 1 (MDP) An MDP is a tuple M = (S, s̄, A, P, R) where:
– S is a set of states;
– s̄ ∈ S is an initial state;
– A is a set of actions;
– P : S × A → Dist(S) is a (partial) probabilistic transition function;
– R = (R_S, R_A) is a reward structure, where R_S : S → R is a state reward function
  and R_A : S × A → R is an action reward function.
An MDP M represents the evolution of a system exhibiting both probabilistic and
nondeterministic behaviour through states from the set S. Each state s ∈ S of M has
a set A(s) = {a ∈ A | P(s, a) is defined} of available actions. The choice of which
available action is taken in a state is nondeterministic. In a state s, if action a ∈ A(s)
is selected, then the probability of moving to state s′ equals P(s, a)(s′).
A path of M is a finite or infinite sequence π = s_0 a_0 s_1 a_1 ···, where s_i ∈ S,
a_i ∈ A(s_i) and P(s_i, a_i)(s_{i+1}) > 0 for all i ∈ N. The (i+1)th state s_i of path π
is denoted π(i) and, if π is finite, last(π) denotes its final state. We write FPaths_M
and IPaths_M, respectively, for the sets of all finite and infinite paths of M starting
in the initial state s̄. MDPs are also annotated with rewards, which can be used to
model a variety of quantitative measures of interest. A reward of R_S(s) is accumulated
when passing through state s and a reward of R_A(s, a) when taking action a from
state s.
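The accumulation of state and action rewards along a finite path can be written out in a few lines; the reward values and the path below are invented purely for illustration:

```python
# Reward structure R = (R_S, R_A) for a toy MDP: state rewards and
# action rewards, with illustrative numbers only.
R_S = {"s0": 1.0, "s1": 0.0}
R_A = {("s0", "a"): 2.0}

def path_reward(path):
    """Total reward of a finite path (s0, a0, s1, ..., sn):
    state rewards of all visited states plus action rewards of all
    actions taken.  States sit at even indices, actions at odd ones."""
    total = sum(R_S[s] for s in path[0::2])
    total += sum(R_A[(path[i], path[i + 1])]
                 for i in range(0, len(path) - 1, 2))
    return total

print(path_reward(("s0", "a", "s1")))  # 1.0 + 2.0 + 0.0 = 3.0
```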
A strategy of M (also called a policy or scheduler) is a way of resolving the choice
of action in each state, based on the MDP’s execution so far.
Definition 2 (Strategy) A strategy of an MDP M = (S, s̄, A, P, R) is a function
σ : FPaths_M → Dist(A) such that, for any π ∈ FPaths_M, we have σ(π)(a) > 0 only
if a ∈ A(last(π)). Let Σ_M denote the set of all strategies of M.

A strategy is memoryless if its choices only depend on the current state, finite-memory
if it suffices to switch between a finite set of modes, and deterministic if it always
selects an action with probability 1.
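A strategy in this sense is simply a function from finite paths (histories) to distributions over actions. The sketch below, using invented states and actions, contrasts a deterministic memoryless strategy with a history-dependent one:

```python
# A finite path is a tuple (s0, a0, s1, a1, ..., sn); a strategy maps
# it to a distribution over actions, written here as {action: prob}.
# States and actions are our own illustrative names.

def memoryless(path):
    """Deterministic memoryless strategy: depends only on last(path)."""
    s = path[-1]
    return {"a": 1.0} if s == "s0" else {"b": 1.0}

def history_dependent(path):
    """Switches to 'b' once action 'a' appears anywhere in the history
    (a simple finite-memory strategy with two modes)."""
    used_a = "a" in path[1::2]          # actions sit at odd indices
    return {"b": 1.0} if used_a else {"a": 1.0}

print(memoryless(("s0",)))                   # {'a': 1.0}
print(history_dependent(("s0", "a", "s1")))  # {'b': 1.0}
```

Both functions return a distribution with a single action of probability 1, i.e., they are deterministic; a randomised strategy would spread probability over several available actions.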
When M is under the control of a strategy σ, the resulting behaviour is captured by
a probability measure Pr^σ_M over the infinite paths of M (Kemeny et al. 1976).
Furthermore, given a random variable f : IPaths_M → R over the infinite paths of M,
using the probability measure Pr^σ_M, we can define the expected value of the variable
f with respect to the strategy σ, denoted E^σ_M(f).
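Under a fixed strategy, all nondeterminism is resolved, so quantities such as E^σ_M(f) can be estimated by sampling paths. The paper's techniques are exact numerical methods, not simulation; this is only a sanity-check sketch with an invented two-state model:

```python
import random

# Illustrative MDP under a fixed strategy that always plays 'a':
# from s0, the target s1 is reached with probability 0.7 per step.
P = {("s0", "a"): {"s0": 0.3, "s1": 0.7},
     ("s1", "a"): {"s1": 1.0}}

def sample_steps_to_target(rng, max_steps=1000):
    """One sample of f(path) = number of steps until s1 is first reached."""
    s, steps = "s0", 0
    while s != "s1" and steps < max_steps:
        succ = P[(s, "a")]
        s = rng.choices(list(succ), weights=list(succ.values()))[0]
        steps += 1
    return steps

rng = random.Random(0)
est = sum(sample_steps_to_target(rng) for _ in range(20000)) / 20000
# The step count is geometric with p = 0.7, so E[f] = 1/0.7 ≈ 1.43.
print(round(est, 2))
```

Monte Carlo estimates like this carry no guarantees; the grid-based method described in Sect. 3 instead produces certified lower and upper bounds.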
2.2 Partially observable Markov decision processes
POMDPs extend MDPs by restricting the extent to which their current state can be
observed, in particular by strategies that control them. In this paper (as in, e.g., Baier
et al. 2008; Chatterjee et al. 2013), we adopt the following notion of observability.

Definition 3 (POMDP) A POMDP is a tuple M = (S, s̄, A, P, R, O, obs) where:
– (S, s̄, A, P, R) is an MDP;
– O is a finite set of observations;
– obs : S → O is a labelling of states with observations;
such that, for any states s, s′ ∈ S with obs(s) = obs(s′), their available actions must
be identical, i.e., A(s) = A(s′).
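The well-formedness condition of Definition 3 (states with the same observation must offer the same actions) is easy to check mechanically. The encoding below is our own toy illustration, not the paper's implementation:

```python
# Toy POMDP: transition function plus an observation labelling.
# States s0 and s2 look identical to the controller.
P = {
    ("s0", "a"): {"s1": 1.0},
    ("s1", "a"): {"s1": 1.0},
    ("s2", "a"): {"s2": 1.0},
}
obs = {"s0": "o1", "s1": "o2", "s2": "o1"}

def available(s):
    """A(s) = actions with P(s, a) defined."""
    return {a for (t, a) in P if t == s}

def well_formed():
    """Definition 3: obs(s) == obs(s') must imply A(s) == A(s')."""
    states = list(obs)
    return all(available(s) == available(t)
               for s in states for t in states
               if obs[s] == obs[t])

print(well_formed())  # True: s0 and s2 share observation o1 and actions
```

The condition is natural: a strategy sees only observations, so it could not respect different action sets in two states it cannot tell apart.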
References (partial list, as displayed)

– Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming (book)
– Alur, R., Dill, D.L.: A theory of timed automata (journal article)
– Markov Decision Processes (monograph)
– Kwiatkowska, M., Norman, G., Parker, D.: PRISM 4.0: verification of probabilistic real-time systems (book chapter)
– Roscoe, A.W.: The Theory and Practice of Concurrency (book)