

IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY, VOL. 4, NO. 2, MARCH 1996
Failure Diagnosis Using Discrete-Event Models
Meera Sampath, Student Member, IEEE, Raja Sengupta, Stephane Lafortune, Member, IEEE, Kasim Sinnamohideen, Member, IEEE, and Demosthenis C. Teneketzis, Member, IEEE
Abstract-Detection and isolation of failures in large, complex systems is a crucial and challenging task. The increasingly stringent requirements on performance and reliability of complex technological systems have necessitated the development of sophisticated and systematic methods for the timely and accurate diagnosis of system failures. We propose a discrete-event systems (DES) approach to the failure diagnosis problem. This approach is applicable to systems that fall naturally in the class of DES; moreover, for the purpose of diagnosis, continuous-variable dynamic systems can often be viewed as DES at a higher level of abstraction. We present a methodology for modeling physical systems in a DES framework and illustrate this method with examples. We discuss the notion of diagnosability, the construction procedure of the diagnoser, and necessary and sufficient conditions for diagnosability. Finally, we illustrate our approach using realistic models of two different heating, ventilation, and air conditioning (HVAC) systems, one diagnosable and the other not diagnosable. While the modeling methodology presented here has been developed for the purpose of failure diagnosis, its scope is not restricted to this problem; it can also be used to develop DES models for other purposes such as control. A detailed treatment of the theory underlying our approach can be found in a companion paper [27].
I. INTRODUCTION
DETECTION and isolation of failures in large, complex systems is a crucial and challenging task. Most practical systems employ some means of fault detection, the simplest of such schemes involving threshold logic, alarms, and warning systems. The increasingly stringent requirements on performance and reliability of complex technological systems, however, have necessitated the development of sophisticated and systematic methods for the timely and accurate diagnosis of system failures. The problem of failure diagnosis has received considerable attention in the literature of reliability engineering, control, and computer science and a wide variety of schemes have been proposed. Failure diagnosis using fault trees has been studied in detail by reliability engineers [17], [16], [32], [8], [34]. Quantitative, analytical-model-based methods have been extensively studied by control systems researchers (see [10], [33] and [35] and references therein; also see [3] and [31]) while expert systems and model-based reasoning schemes for diagnosis have been proposed by computer scientists (see, e.g., [5], [20], [9], [7], [6], [22], [11], [23], and [26]). A detailed discussion of several of these methods has appeared in [24]. For a brief overview of the salient features of the aforementioned methods, see [28]. Recently, the problem of failure diagnosis has also been studied in the framework of discrete-event systems (DES) [4], [14], [18], [19], [29], [34]. In [18] and [19], the authors propose a state-based approach to diagnosability; they study the problems of off-line diagnosis and on-line diagnosis where the basic idea of the diagnostic procedure is to “test and observe.” Extensions of the above work can be found in [4] where the authors study testability of DES. In [14], the authors present a template monitoring scheme based on timing and sequencing relationships of events for fault monitoring in manufacturing systems. In [34], the authors propose a Petri net based method for failure diagnosis of manufacturing systems which uses Petri net models for failure detection and fault trees for failure isolation.
Manuscript received May 16, 1994. Recommended by Associate Editor, X. Cao. This work was supported in part by NSF Grants ECS-9057967, ECS-9312134, and ECS-9204419, with additional support from DEC and GE.
M. Sampath, R. Sengupta, S. Lafortune, and D. Teneketzis are with the Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109-2122 USA.
K. Sinnamohideen is with Johnson Controls, Inc., Milwaukee, WI 53201 USA.
Publisher Item Identifier S 1063-6536(96)02070-2.
We propose in this paper and in the companion paper [27] a DES approach to the failure diagnosis problem that expands on the work in [29] and is different from the DES-based approaches mentioned above. DES are characterized by a discrete-state space of logical values and event driven dynamics. Most large scale dynamic systems can be viewed as DES at some level of abstraction. Hence, the proposed method of fault diagnosis is applicable not only to systems that fall naturally in the class of DES (communication networks and computer systems, for instance), but also to systems traditionally treated as continuous variable dynamic systems and modeled by differential equations. One of the major advantages of the proposed method is that it does not require detailed in-depth modeling of the system to be diagnosed and hence is ideally suited for the diagnosis of large complex systems like heating, ventilation and air conditioning (HVAC) units, power plants, and semiconductor processes. Other application areas include automated manufacturing systems like automobile manufacturing where systematic diagnostic procedures are necessary to check equipment integrity before they leave the production line. Fig. 1 illustrates the overall system architecture which contains in it a DES-based diagnostic subsystem. We assume a two-level system architecture. At the lower level is the system itself with its set of controllers; these low-level controllers typically consist of equipment controllers and multivariable controllers. The upper level consists of the supervisor, which performs the tasks of control and coordination of the low-level controllers, failure diagnosis, failure recovery/system reconfiguration following failure identification, and coordination of all of these subsystem operations. The interface between the two layers conveys information on occurrences of observable events in the system to the supervisor and communicates the commands issued by the supervisor to the system.
[Fig. 1. The conceptual system architecture. Blocks: SUPERVISOR (DIAGNOSER, COORDINATION), INTERFACE, CONTROLLER(S). Signals: Commands, Observable events, Type of Failure.]
Our approach to failure diagnosis involves two major steps: developing a discrete-event model of the system to be diagnosed followed by construction of the diagnoser. The discrete-event model that we develop captures both the normal and the failed behavior of the system. The failures are modeled as unobservable events and the objective is to infer past occurrences of these failures on the basis of the observed events. The diagnoser is a finite-state machine (FSM) built from the system model. This machine performs diagnosis when it observes on-line the behavior of the system. The diagnoser provides estimates of the state of the system after the occurrence of every observable event. In addition, states of the diagnoser carry failure information and occurrences of failures can be detected (with a finite delay) by inspecting these states. Fig. 2 illustrates the basic paradigm of our approach. The top part of this figure shows the various steps involved in failure diagnosis; all these steps are to be performed by the diagnoser, as shown in the bottom part of Fig. 2. This approach to diagnosis is appropriate for failures that involve significant changes in the status of system components but do not necessarily bring the system to a halt.
[Fig. 2. The diagnostic process. Steps shown: System Model and Observations; Observer; Estimate of Current System State; Inferencing About Past Failure Events; Potential Past Failures; Failure Identification; Message to Coordinator.]
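To make the idea of state estimates that carry failure information more concrete, the following Python sketch updates a set of (state, failures seen so far) pairs each time an observable event is reported. It is only an illustration under stated assumptions: the function names `unobservable_reach` and `update`, the toy transition table, and the use of plain sets of failure events as labels are all hypothetical, and the sketch is not the diagnoser construction given in [27].

```python
def unobservable_reach(pairs, delta, unobservable, failures):
    """Close a set of (state, failures_seen) pairs under unobservable events."""
    stack, seen = list(pairs), set(pairs)
    while stack:
        x, seen_f = stack.pop()
        for (s, ev), nxt in delta.items():
            if s == x and ev in unobservable:
                # Record the event in the label only if it is a failure event.
                label = (seen_f | {ev}) if ev in failures else seen_f
                pair = (nxt, frozenset(label))
                if pair not in seen:
                    seen.add(pair)
                    stack.append(pair)
    return seen


def update(estimate, observed, delta, unobservable, failures):
    """One estimator step: silent moves first, then the observed event."""
    reach = unobservable_reach(estimate, delta, unobservable, failures)
    return {(delta[(x, observed)], f) for (x, f) in reach if (x, observed) in delta}


# Hypothetical toy model (not taken from the paper): "f" is an unobservable failure.
delta = {(1, "a"): 2, (1, "f"): 3, (3, "a"): 4, (4, "c"): 4, (2, "b"): 2}
unobservable = {"f"}
failures = {"f"}

est0 = {(1, frozenset())}
est1 = update(est0, "a", delta, unobservable, failures)
est2 = update(est1, "c", delta, unobservable, failures)
failure_certain = all("f" in f for _, f in est2)
```

In this toy model, observing `a` leaves two candidate explanations, one with the failure and one without; only the later observation of `c` rules out the failure-free explanation, which is the sense in which detection happens with a finite delay.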
One of the main contributions of this paper is a precise methodology for modeling physical systems in a DES framework. The system is assumed to consist of several distinct physical components and to be equipped with a set of sensors. Starting from discrete-event models of the individual components and from the discrete-valued sensor maps, we present a systematic procedure for generating a composite model which captures the interaction among the components and also incorporates in it the sensor maps. This composite model is the DES on which we perform diagnostics. While this approach to modeling has been developed for the purpose of diagnostics, its scope is not restricted to this problem; the model building methodology presented here can be used to develop DES models of any real system for other purposes such as control.
Aside from the modeling methodology, the rest of the theoretical developments underlying our approach to failure diagnosis are presented in [27]. In [27] we introduce two related notions of diagnosability of a language generated by a DES. The first definition, referred to as diagnosability, is more stringent than the second one, which we refer to as I-diagnosability. Roughly speaking, a system is said to be diagnosable if it is possible to detect, with finite delay, occurrences of certain specific unobservable events, namely, the failure events. In [27] we present a formal construction procedure of the diagnoser followed by necessary and sufficient conditions for diagnosability and I-diagnosability. These conditions are stated on the diagnoser or variations thereof. Thus, the diagnoser serves two purposes: 1) on-line detection and isolation of failures and 2) off-line verification of the diagnosability properties of the system.
In this paper, we restrict our attention to the notion of I-diagnosability introduced in [27]. Section II describes, with illustrative examples, model building for diagnosis. In Section III, we present some of the main results of [27]; we review the notion of I-diagnosability, the construction of diagnosers, and the necessary and sufficient conditions for I-diagnosability. Next, we illustrate our approach to failure diagnosis with two examples of HVAC systems. The DES models of these systems, the corresponding diagnosers and their analysis are presented in Section IV. In Section V, we provide a brief comparison of the proposed method with some of the other approaches to failure diagnosis mentioned earlier. Finally, in Section VI we summarize the main results of this paper.
II. MODEL BUILDING FOR DIAGNOSIS
Suppose that the system to be diagnosed has $N$ individual components; typically, these components consist of equipment and controllers. We first build DES models for these components. Let
$$G_i = (X_i, \Sigma_i, \delta_i, x_{0i}) \qquad (1)$$
refer to the FSM (see, e.g., [25]) model of the $i$th component; here $X_i$ is the state space, $\Sigma_i$ is the event set, $\delta_i$ is the transition function, and $x_{0i}$ is the initial state of $G_i$. The states in $X_i$ and the events in $\Sigma_i$ reflect the normal and the failed behavior of the $i$th component. Some of the events in $\Sigma_i$ are observable, i.e., their occurrence can be observed, while the rest are unobservable. Typically, the observable events include commands issued by the supervisor while the unobservable events include failure events.
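A component model with a state space, an event set, a partial transition function, and an initial state can be written down directly as a small data structure. The Python sketch below is one possible encoding, not the authors' implementation; the class name `FSM`, its field names, and the simplified valve (three states, one failure event) are assumptions made for illustration and do not reproduce the valve model of Fig. 3.

```python
from dataclasses import dataclass, field


@dataclass
class FSM:
    """Component model: states, events, partial transition function, initial state."""
    states: set
    events: set
    delta: dict                     # (state, event) -> next state
    initial: str
    unobservable: set = field(default_factory=set)  # e.g. failure events

    def observable(self):
        """Events whose occurrence can be observed."""
        return self.events - self.unobservable

    def step(self, x, event):
        """Next state, or None if the event is not defined at x."""
        return self.delta.get((x, event))


# Hypothetical simplified valve: normal closed/open plus a stuck-closed failure.
valve = FSM(
    states={"VC", "VO", "SC"},
    events={"OPEN_VALVE", "CLOSE_VALVE", "STUCK_CLOSED"},
    delta={
        ("VC", "OPEN_VALVE"): "VO",
        ("VO", "CLOSE_VALVE"): "VC",
        ("VC", "STUCK_CLOSED"): "SC",   # unobservable failure event
        ("SC", "OPEN_VALVE"): "SC",     # commands no longer move the valve
        ("SC", "CLOSE_VALVE"): "SC",
    },
    initial="VC",
    unobservable={"STUCK_CLOSED"},
)
```

Keeping the failure events inside the same transition table as the commands is what lets a single model describe both normal and failed behavior.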
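Next, we compose these individual models using the standard synchronous composition operation on state machines (see, e.g., [15]). The synchronous composition procedure, recalled below, is used to model the joint operation of two or more DES given their individual FSM models. Consider two discrete-event systems $G_1 = (X_1, \Sigma_1, \delta_1, x_{01})$ and $G_2 = (X_2, \Sigma_2, \delta_2, x_{02})$. We denote by $e_i(x)$ the active event set of $G_i$ at state $x$, i.e., the set of all transitions of $G_i$ defined at state $x$. Let $G = (X, \Sigma, \delta, x_0)$ denote the synchronous composition of $G_1$ and $G_2$. Then
$$\Sigma = \Sigma_1 \cup \Sigma_2, \qquad X = X_1 \times X_2, \qquad x_0 = (x_{01}, x_{02}),$$
$$\delta((x_1, x_2), \sigma) = \begin{cases} (\delta_1(x_1, \sigma), \delta_2(x_2, \sigma)) & \text{if } \sigma \in e_1(x_1) \cap e_2(x_2) \\ (\delta_1(x_1, \sigma), x_2) & \text{if } \sigma \in e_1(x_1) \text{ and } \sigma \notin \Sigma_2 \\ (x_1, \delta_2(x_2, \sigma)) & \text{if } \sigma \in e_2(x_2) \text{ and } \sigma \notin \Sigma_1 \\ \text{undefined} & \text{otherwise.} \end{cases} \qquad (2)$$
Thus an event $\sigma$ which is common to both $G_1$ and $G_2$ is possible at state $(x_1, x_2)$ of $G$ only if $\sigma$ is in the active event set of $G_1$ at $x_1$ and in the active event set of $G_2$ at $x_2$. In this case, both systems $G_1$ and $G_2$ are assumed to execute $\sigma$. On the other hand, if $\sigma$ is an event possible in $G_1$ ($G_2$) and it is not in $\Sigma_2$ ($\Sigma_1$), then only $G_1$ ($G_2$) executes the transition $\sigma$. It is not difficult to see that the synchronous composition procedure described above can be extended to model the joint operation of any number of DES.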
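The composition rule described above can be sketched in a few lines of Python that build only the part of the composite reachable from the initial state. This is a minimal sketch under stated assumptions: each machine is a plain dictionary with hypothetical keys `events`, `delta`, and `x0`, every event used in `delta` is assumed to belong to `events`, and the pump and controller below are toy models rather than those of Fig. 3.

```python
from collections import deque


def active(delta, x):
    """Active event set at x: events with a transition defined at x."""
    return {e for (s, e) in delta if s == x}


def sync_compose(g1, g2):
    """Accessible part of the synchronous composition of two machines."""
    d1, d2 = g1["delta"], g2["delta"]
    e1, e2 = g1["events"], g2["events"]
    x0 = (g1["x0"], g2["x0"])
    delta, seen, queue = {}, {x0}, deque([x0])
    while queue:
        x1, x2 = queue.popleft()
        for ev in active(d1, x1) | active(d2, x2):
            if ev in e1 and ev in e2:
                # Common event: must be active in both; both machines move.
                if (x1, ev) not in d1 or (x2, ev) not in d2:
                    continue
                nxt = (d1[(x1, ev)], d2[(x2, ev)])
            elif ev in e1:
                nxt = (d1[(x1, ev)], x2)    # event private to the first machine
            else:
                nxt = (x1, d2[(x2, ev)])    # event private to the second machine
            delta[((x1, x2), ev)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return {"states": seen, "events": e1 | e2, "delta": delta, "x0": x0}


# Hypothetical toy pump and controller sharing the START and STOP commands.
pump = {
    "events": {"START", "STOP", "FAIL_OFF"},
    "delta": {("POFF", "START"): "PON", ("PON", "STOP"): "POFF",
              ("PON", "FAIL_OFF"): "PFOFF",
              ("PFOFF", "START"): "PFOFF", ("PFOFF", "STOP"): "PFOFF"},
    "x0": "POFF",
}
controller = {
    "events": {"START", "STOP"},
    "delta": {("C1", "START"): "C2", ("C2", "STOP"): "C1"},
    "x0": "C1",
}
g = sync_compose(pump, controller)
```

Composing more than two machines can be done by folding `sync_compose` over the list of components, since the operation takes two machines and returns a machine of the same shape.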
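Let $\tilde{G} = (\tilde{X}, \tilde{\Sigma}, \tilde{\delta}, \tilde{x}_0)$ denote the synchronous composition of the component models $G_i$, $i = 1, \ldots, N$. Observe that we need only consider the accessible part of $\tilde{G}$. $\tilde{G}$ then models the joint operation of these components. Here
$$\tilde{X} \subseteq \prod_{i=1}^{N} X_i, \qquad \tilde{\Sigma} = \bigcup_{i=1}^{N} \Sigma_i. \qquad (3)$$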
Given the set of $M$ sensors of the system of interest, we next identify the sensor maps $h_j : \tilde{X} \to Y_j$, $j = 1, \ldots, M$, where $Y_j$ denotes the discrete set of possible outputs of the $j$th sensor. Define
$$Y = \prod_{j=1}^{M} Y_j \qquad (4)$$
and let $h : \tilde{X} \to Y$ denote the global sensor map defined as follows:
$$h(x) = (h_1(x), h_2(x), \ldots, h_M(x)). \qquad (5)$$
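As a small illustration of how individual sensor maps combine into a global one, the Python sketch below stacks per-sensor functions into a single tuple-valued map and enumerates the product output space. The helper names `global_sensor_map` and `output_space`, and the two toy sensors defined on (pump, valve) pairs, are hypothetical and chosen only for this example.

```python
from itertools import product


def global_sensor_map(sensor_maps):
    """Combine per-sensor maps into one map returning a tuple of readings."""
    def h(x):
        return tuple(h_j(x) for h_j in sensor_maps)
    return h


def output_space(sensor_outputs):
    """Cartesian product of the individual sensor output sets."""
    return set(product(*sensor_outputs))


# Hypothetical sensors on states of the form (pump, valve).
def pressure(x):
    return "PP" if x[0] == "ON" else "NP"


def flow(x):
    return "F" if x == ("ON", "OPEN") else "NF"


h = global_sensor_map([pressure, flow])
Y = output_space([{"NP", "PP"}, {"NF", "F"}])
```

Every value returned by `h` lies in `Y`, although not every element of `Y` need be the reading of some state.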
Finally, we transform $\tilde{G} = (\tilde{X}, \tilde{\Sigma}, \tilde{\delta}, \tilde{x}_0)$ to $G = (X, \Sigma, \delta, x_0)$ with $x_0 = \tilde{x}_0$ by redefining the transitions of $\tilde{G}$ as follows. Let $\tilde{\delta}(x, \sigma) = x'$ where $x, x' \in \tilde{X}$ and $\sigma \in \tilde{\Sigma}$.
1) If $\sigma$ is observable (typically a command event), then rename $\sigma$ in the transition as $(\sigma, h(x'))$ and let $\delta(x, (\sigma, h(x'))) = x'$. The new event $(\sigma, h(x'))$ is observable in $\Sigma$.
2) If $\sigma$ is unobservable and if $h(x) = h(x')$, then $\sigma$ is left unchanged in $G$ and $\delta(x, \sigma) = x'$. The event $\sigma$ is treated as unobservable in $\Sigma$.
3) If $\sigma$ is unobservable and if $h(x) \neq h(x')$, then replace the transition $\tilde{\delta}(x, \sigma) = x'$ by the following two transitions:
a) $\delta(x, \sigma) = x_{new}$ and
b) $\delta(x_{new}, (h(x) \to h(x'))) = x'$
where $x_{new}$ denotes a newly introduced state and $(h(x) \to h(x'))$ denotes the change in sensor readings corresponding to states $x$ and $x'$. The first transition $\sigma$ is unobservable in $\Sigma$ while the second $(h(x) \to h(x'))$ is observable.
[Fig. 3. Component models for Example 2.1. Panels: VALVE, PUMP, CONTROLLER. Event labels include OPEN-VALVE, CLOSE-VALVE, START-PUMP, STOP-PUMP, PUMP-FAILED-OFF-2, PUMP-FAILED-ON-2.]
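The three relabeling rules can be sketched as a single pass over the transitions of the composed model. The Python code below is an illustrative sketch, not the authors' implementation: the function name `relabel`, the tuple encodings chosen for the three kinds of composite events, and the toy pump with one pressure sensor are all assumptions.

```python
def relabel(delta, unobservable, h):
    """Apply the three relabeling rules to a transition table.

    delta: dict (state, event) -> state; unobservable: set of events;
    h: global sensor map on states.
    Returns (new_delta, observable_events, unobservable_events, new_states).
    """
    new_delta, obs, unobs, new_states = {}, set(), set(), set()
    for (x, ev), x2 in delta.items():
        if ev not in unobservable:
            # Rule 1: pair the observable event with the reading that follows it.
            e = (ev, h(x2))
            new_delta[(x, e)] = x2
            obs.add(e)
        elif h(x) == h(x2):
            # Rule 2: unobservable event with no change in readings.
            new_delta[(x, (ev,))] = x2
            unobs.add((ev,))
        else:
            # Rule 3: split into a silent event and an observable change of readings.
            x_new = ("new", x, ev)
            change = (h(x), "->", h(x2))
            new_delta[(x, (ev,))] = x_new
            new_delta[(x_new, change)] = x2
            unobs.add((ev,))
            obs.add(change)
            new_states.add(x_new)
    return new_delta, obs, unobs, new_states


# Hypothetical pump with one pressure sensor and one unobservable failure.
delta = {("POFF", "START"): "PON", ("PON", "STOP"): "POFF",
         ("POFF", "FAIL_ON"): "PFON", ("PON", "FAIL_ON"): "PFON"}


def h(x):
    return ("PP",) if x in {"PON", "PFON"} else ("NP",)


new_delta, obs, unobs, new_states = relabel(delta, {"FAIL_ON"}, h)
```

In the toy model, the failure from `PON` changes no reading and stays a single silent transition, while the failure from `POFF` raises the pressure reading and is therefore split through one new intermediate state.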
For the purpose of clarity, we henceforth denote all events in the composite model $G$ within braces, $\langle \cdots \rangle$. Therefore the event set $\Sigma$ of $G$ consists of composite events of the following three types:
1) $\langle \sigma, h(x') \rangle$: observable;
2) $\langle \sigma \rangle$: unobservable; and
3) $\langle h(x) \to h(x') \rangle$: observable.
Let $X_{new}$ denote the set of all new states $x_{new}$ introduced in Step 3) above. Then
$$X = \tilde{X} \cup X_{new}. \qquad (6)$$
This completes the model building procedure for diagnosis. The system to be diagnosed is now represented by the discrete-event model
$$G = (X, \Sigma, \delta, x_0). \qquad (7)$$
Note that the model $G$ accounts for the normal and failed behavior of the system. The observable events in this system may be one of the following: commands issued by the supervisor and sensor readings immediately after the execution of the above commands, and changes of sensor readings. The unobservable events may be failure events or other events which cause changes in the system state not recorded by sensors.
We note at this point that the proposed approach to diagnosis is not limited to the case of equipment and controller failures. Sensor failures, too, can be handled in this framework by simply treating the sensor as an additional component of the system. In other words, we develop in addition to the equipment and controller models, explicit discrete-event models, which include both normal and failed states, for those sensors that can fail.
We now present two examples to illustrate the above modeling procedure. These examples also illustrate that in the proposed framework, the modeling can be done at different levels of granularity. In the first example, we model the dynamic behavior of a system over its entire range of operation including start-up and shutdown procedures. In the second example, we model deviations from the steady state of a system.
Example 2.1: Consider an elementary HVAC system consisting of a pump, a valve, and a controller. Fig. 3 depicts the individual component models $G_i$, $i = 1, 2, 3$, of the valve, pump, and controller, respectively. The valve has four failure events: STUCK-CLOSED-1, STUCK-CLOSED-2, STUCK-OPEN-1, and STUCK-OPEN-2. The states SC and SO represent the stuck-closed and the stuck-open status of the valve, respectively, while the states VC and VO denote the closed-normal and open-normal status, respectively. Likewise, the pump has four failure events: PUMP-FAILED-OFF-1, PUMP-FAILED-OFF-2, PUMP-FAILED-ON-1, and PUMP-FAILED-ON-2. The states PFOFF and PFON represent the failed-off and failed-on status of the pump while the states PON and POFF represent the normally-on and off status. The only unobservable events in this system are the failure events of the pump and the valve.
The system $\tilde{G}$ in Fig. 4 is obtained by the synchronous composition of the valve, pump, and controller models of Fig. 3. Both the accessible and the inaccessible states of the system are shown in this figure. The inaccessible states are subsequently dropped. Dotted lines in this figure indicate unobservable events while solid lines indicate observable events. For the sake of clarity, some of the events in this figure are shown abbreviated. For instance, the event STUCK-CLOSED-1 is shown as SC1, the event PUMP-FAILED-ON-2 as PFON2, and so forth.
TABLE I: the global sensor map $h$
h(POFF, VC, *) = (NP, NF)
h(POFF, VO, *) = (NP, NF)
h(POFF, SC, *) = (NP, NF)
h(POFF, SO, *) = (NP, NF)
h(PON, VO, *) = (PP, F)
h(PON, SC, *) = (PP, NF)
h(PON, SO, *) = (PP, F)
h(PFOFF, VC, *) = (NP, NF)
h(PFOFF, VO, *) = (NP, NF)
h(PFOFF, SC, *) = (NP, NF)
h(PFOFF, SO, *) = (NP, NF)
h(PFON, VC, *) = (PP, NF)
h(PFON, VO, *) = (PP, F)
h(PFON, SC, *) = (PP, NF)
h(PFON, SO, *) = (PP, F)
[Fig. 4. Synchronous composition of the component models for Example 2.1. State labels: pump PON, PFOFF, PFON; valve VC, VO, SC, SO; controller C1, C2, C3, C4.]
Next, assume that there are two sensors in the system, a pressure sensor on the pump and a valve flow sensor. Let $Y_1 = \{NP, PP\}$ and $Y_2 = \{NF, F\}$ denote the set of outputs of the pressure sensor and flow sensor, respectively. NP and PP denote no pressure and positive pressure, respectively, while NF and F denote no flow and flow, respectively. Table I lists the global sensor map $h$. Note that the map $h$ is defined only for the accessible states of $\tilde{G}$ in Fig. 4. Also, $h$ does not depend on the state of $G_3$, the controller, which is indicated in the table by the *s.
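Table I can be written down as a lookup keyed on the pump and valve states, with the controller state ignored. The Python sketch below transcribes the fifteen entries of the table; the names `H`, `h`, and `rule`, and the controller label `"C1"` used in the call, are hypothetical. The compact rule in `rule` is an observation about the table entries made here for illustration, not a statement taken from the paper.

```python
NO = ("NP", "NF")   # no pressure, no flow

# Table I as data: (pump state, valve state) -> (pressure, flow).
H = {
    ("POFF", "VC"): NO, ("POFF", "VO"): NO, ("POFF", "SC"): NO, ("POFF", "SO"): NO,
    ("PON", "VO"): ("PP", "F"), ("PON", "SC"): ("PP", "NF"), ("PON", "SO"): ("PP", "F"),
    ("PFOFF", "VC"): NO, ("PFOFF", "VO"): NO, ("PFOFF", "SC"): NO, ("PFOFF", "SO"): NO,
    ("PFON", "VC"): ("PP", "NF"), ("PFON", "VO"): ("PP", "F"),
    ("PFON", "SC"): ("PP", "NF"), ("PFON", "SO"): ("PP", "F"),
}


def h(state):
    """Global sensor map on (pump, valve, controller); controller is ignored."""
    pump, valve, _controller = state
    return H[(pump, valve)]


def rule(pump, valve):
    """Compact rule consistent with every entry of the table above."""
    pressure = "PP" if pump in {"PON", "PFON"} else "NP"
    is_open = valve in {"VO", "SO"}
    flow = "F" if pressure == "PP" and is_open else "NF"
    return pressure, flow


table_matches_rule = all(rule(p, v) == reading for (p, v), reading in H.items())
reading = h(("PFON", "SO", "C1"))
```

The pair (PON, VC) has no entry because the table covers only accessible states, so `h` raises a `KeyError` for it rather than guessing a reading.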
The final composite model $G$ is given in Fig. 5. The shaded circles in Fig. 5 denote the additional states $x_{new}$; as before, observable events are indicated by solid lines and unobservable events by dotted lines. The table in Fig. 5 lists the events in
