A Region-Based Algorithm for Discovering Petri Nets from Event Logs

J. Carmona¹, J. Cortadella¹, and M. Kishinevsky²
¹ Universitat Politècnica de Catalunya, Spain
² Intel Corporation, USA
Abstract. The paper presents a new method for the synthesis of Petri
nets from event logs in the area of Process Mining. The method derives a
bounded Petri net that over-approximates the behavior of an event log.
The most important property is that it produces a net with the smallest
behavior that still contains the behavior of the event log. The methods
described in this paper have been implemented in a tool and tested on
a set of examples.
1 Introduction
The discovery of formal models from event logs in information systems is known
as process mining. Since the nineties, the area of process mining has focused
on providing formal support to business information systems [16]. In the
industrial domain, ranging from hospitals and banks to sensor networks or CAD
for VLSI, process mining can be applied to succinctly summarize the behavior
observed in large event logs [14]. Nowadays, several approaches can be used to
mine formal models, most of them included in the ProM framework [15].
The synthesis problem [7] is related to process mining: it consists in building
a Petri net that has a behavior equivalent to a given transition system. The prob-
lem was first addressed by Ehrenfeucht and Rozenberg [8] introducing regions to
model the sets of states that characterize marked places. Process mining differs
from synthesis in the knowledge assumption: while in synthesis one assumes a
complete description of the system, only a partial description of the system is
assumed in process mining. Therefore, bisimulation is no longer a goal to achieve
in process mining. Instead, obtaining approximations that succinctly represent
the log under consideration is more valuable [19].
In the area of synthesis, some approaches have been studied to put the
theory of regions into practice. In [3] polynomial algorithms for the synthesis of
bounded nets were presented. This approach has been recently adapted for the
problem of process mining in [4]. In [6], the theory of regions was applied for the
synthesis of safe Petri nets with bisimilar behavior. Recently, the theory from [6]
has been extended to bounded Petri nets [5]. In this paper we adapt the theory
from [5] to the problem of process mining.
Carmona, J.; Cortadella, J.; Kishinevsky, M. A region-based algorithm for discovering Petri nets from event
logs. In: International Conference on Business Process Management. "Business Process Management, 6th
International Conference, BPM 2008: Milan, Italy, September 2-4, 2008: proceedings". Springer, 2008, p.
358-373.
The final authenticated version is available online at https://doi.org/10.1007/978-3-540-85758-7_26

[Figure 1: Petri net over the transitions r, s, sb, p, em, ac, ap, rj, rs and c.]
Fig. 1. Petri net mining to avoid overfitting.

The work presented in this paper aims at constructing (mining) a Petri net
that covers the behavior observed in the event log, i.e. traces in the event log
will be feasible in the Petri net. Moreover, the Petri net may accept traces not
observed in the log. Additionally, a minimality property is demonstrated on the
mined Petri net: no other net exists that both covers the log and accepts fewer
traces than the mined Petri net. This capability of minimal over-approximation
represents the main theoretical contribution of this paper. The methods pre-
sented in the paper can mine a particular k-bounded Petri net, for a given
bound k. We have implemented the theory of this paper in a tool, and some
preliminary results from logs are reported. The approach taken in this paper
is a formal one and differs from the more heuristic methods in the literature.
Although the methods presented might have a high complexity for large logs,
they can be combined with recent iterative approaches [18] to alleviate their
complexity.
This paper shares common goals with the previously presented paper [4].
In [4], two process mining strategies based on regions of languages are presented,
having the same minimality goal as the one that we have in this paper. However, the
strategy is different: integer linear models are solved in order to find a set of
special places called feasible places that guarantee the inclusion of the traces
from the event log. The more places added, the more traces are forbidden in
the resulting net. If the net contains all the possible feasible places, then the
minimality property can be demonstrated. However, the set of feasible places
might be infinite. In our case, given a maximal bound k for the mining of a
k-bounded Petri net, minimal regions of the transition system are enough to
demonstrate the minimality property on this bound.
Example. In [14], a small log is presented to motivate the overfitting
produced by synthesis tools. The log contains the following activities:
r=register, s=ship, sb=send bill, p=payment, ac=accounting, ap=approved,
c=close, em=express mail, rj=rejected, and rs=resolve. Now assume that the
event log contains the traces (r, s, sb, p, ac, ap, c), (r, sb, em, p, ac, ap, c),
(r, sb, p, em, ac, rj, rs, c), (r, em, sb, p, ac, ap, c), (r, sb, s, p, ac, rj, rs, c),
(r, sb, p, s, ac, ap, c) and (r, sb, p, em, ac, ap, c). From this log, a TS can
be obtained [13], and a PN such as the one shown in Figure 1 will be synthesized
by a tool like petrify [6]. If the log is slightly changed (for instance, trace
(r, sb, s, p, ac, rj, rs, c) is replaced by (r, sb, s, p, ac, ap, c)), the synthesis tool
will adapt the PN to account for the changes, deriving a different PN. This means
that synthesis algorithms are very sensitive to variations in the logs. However,
the techniques presented in this paper, as also happens with traditional mining
approaches like the α-algorithm [16], are less sensitive to variations in event
logs, and will derive the same PN over the modified log.
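As a rough illustration of how a TS can be obtained from the example log, the
sketch below builds a prefix-tree TS in Python. The prefix-tree construction is
an assumption made here for illustration only: it is one of several possible
log-to-TS abstractions and is not claimed to be the one used in [13]. The names
prefix_tree_ts, LOG and MODIFIED_LOG are hypothetical.

```python
from typing import List, Set, Tuple

Trace = Tuple[str, ...]
Arc = Tuple[Trace, str, Trace]  # (source state, event, target state)


def prefix_tree_ts(log: List[Trace]) -> Tuple[Set[Trace], Set[str], Set[Arc], Trace]:
    """Build a TS (S, E, A, s_in) whose states are the prefixes of the traces.

    Assumption: states are identified with trace prefixes (prefix tree).
    """
    s_in: Trace = ()
    states: Set[Trace] = {s_in}
    events: Set[str] = set()
    arcs: Set[Arc] = set()
    for trace in log:
        for i, event in enumerate(trace):
            src, dst = trace[:i], trace[:i + 1]
            states.add(dst)
            events.add(event)
            arcs.add((src, event, dst))
    return states, events, arcs, s_in


# The seven traces of the example log.
LOG: List[Trace] = [
    ("r", "s", "sb", "p", "ac", "ap", "c"),
    ("r", "sb", "em", "p", "ac", "ap", "c"),
    ("r", "sb", "p", "em", "ac", "rj", "rs", "c"),
    ("r", "em", "sb", "p", "ac", "ap", "c"),
    ("r", "sb", "s", "p", "ac", "rj", "rs", "c"),
    ("r", "sb", "p", "s", "ac", "ap", "c"),
    ("r", "sb", "p", "em", "ac", "ap", "c"),
]

# The slightly changed log discussed in the example.
OLD = ("r", "sb", "s", "p", "ac", "rj", "rs", "c")
NEW = ("r", "sb", "s", "p", "ac", "ap", "c")
MODIFIED_LOG: List[Trace] = [NEW if t == OLD else t for t in LOG]

states, events, arcs, s_in = prefix_tree_ts(LOG)
modified_arcs = prefix_tree_ts(MODIFIED_LOG)[2]
print(len(events), "events;", "TS changed:", arcs != modified_arcs)
```

Under this assumed construction the two logs yield different transition
systems, which is why a synthesis tool that reproduces the TS exactly derives
a different net for each of them.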
The two models used in this paper are Petri nets and transition systems.
We will assume that a transition system represents an event log obtained from
observing a real system from which an event-based representation (e.g. a Petri
net) approximating its behavior must be obtained. The derivation of the
transition system from an event log is an important step that may have a big impact
on the final mined Petri net, as demonstrated in [13]. A two-step approach
is presented in [13], emphasizing that the first step (generation of the transition
system) is crucial for the balance between underfitting and overfitting. If the
desired abstraction is attained in the first step, i.e. the transition system represents
an abstraction of the event log, the second step is expected to reproduce exactly
this abstraction, via synthesis. The methods presented in this paper extend the
possibilities of this two-step approach, given that the second step might also
introduce further abstraction in a controlled manner. The approaches based on
regions of languages perform the mining process in only one step, provided that
logs can be directly interpreted as languages [4].
2 Preliminaries: theory of regions
2.1 Finite transition systems and Petri nets
Definition 1 (Transition system). A transition system (TS) is a tuple
$(S, E, A, s_{in})$, where $S$ is a set of states, $E$ is an alphabet of actions, such
that $S \cap E = \emptyset$, $A \subseteq S \times E \times S$ is a set of (labelled)
transitions, and $s_{in}$ is the initial state.
Let $TS = (S, E, A, s_{in})$ be a transition system. We consider connected TSs
that satisfy the following axioms:
- $S$ and $E$ are finite sets.
- Every event has an occurrence: $\forall e \in E : \exists (s, e, s') \in A$;
- Every state is reachable from the initial state: $\forall s \in S : s_{in} \xrightarrow{*} s$.
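The definition and its axioms can be sketched as a small Python data structure
with a checker. This is only an illustration under stated assumptions: the
names TransitionSystem and satisfies_axioms, and the two sample systems, are
hypothetical and do not come from the paper or its tool. Finiteness is implied
by the use of explicit finite sets.

```python
from collections import deque
from typing import FrozenSet, NamedTuple, Tuple


class TransitionSystem(NamedTuple):
    states: FrozenSet[str]                 # S
    events: FrozenSet[str]                 # E
    arcs: FrozenSet[Tuple[str, str, str]]  # A, subset of S x E x S
    s_in: str                              # initial state


def satisfies_axioms(ts: TransitionSystem) -> bool:
    """Check S and E disjoint, A well-typed, and the two non-trivial axioms."""
    if ts.states & ts.events or ts.s_in not in ts.states:
        return False
    if any(s not in ts.states or e not in ts.events or t not in ts.states
           for s, e, t in ts.arcs):
        return False
    # Every event has an occurrence.
    if {e for _, e, _ in ts.arcs} != set(ts.events):
        return False
    # Every state is reachable from the initial state (breadth-first search).
    reached, queue = {ts.s_in}, deque([ts.s_in])
    while queue:
        s = queue.popleft()
        for src, _, dst in ts.arcs:
            if src == s and dst not in reached:
                reached.add(dst)
                queue.append(dst)
    return reached == set(ts.states)


# Hypothetical examples: a two-step system, and a variant with an
# unreachable state "u" that violates the reachability axiom.
good = TransitionSystem(
    frozenset({"s1", "s2", "s3"}), frozenset({"a", "b"}),
    frozenset({("s1", "a", "s2"), ("s2", "b", "s3")}), "s1")
bad = good._replace(states=good.states | {"u"})

print(satisfies_axioms(good), satisfies_axioms(bad))
```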
A TS is called deterministic if for each state $s$ and each label $a$ there can
be at most one state $s'$ such that $s \xrightarrow{a} s'$. The relation between TSs
will be studied in this paper. The language of a TS, $L(TS)$, is the set of traces
feasible from the initial state. When $L(TS_1) \subseteq L(TS_2)$, we will say that
$TS_2$ is an over-approximation of $TS_1$. The notion of simulation between two TSs
is related to this concept:
Definition 2 (Simulation [2]). Let $TS_1 = (S_1, E, A_1, s_{in1})$ and
$TS_2 = (S_2, E, A_2, s_{in2})$ be two TSs with the same set of events. A simulation
of $TS_1$ by $TS_2$ is a relation $\pi$ between $S_1$ and $S_2$ such that
- for every $s_1 \in S_1$, there exists $s_2 \in S_2$ such that $s_1 \pi s_2$.
- for every $(s_1, e, s_1') \in A_1$ and for every $s_2 \in S_2$ such that $s_1 \pi s_2$,
  there exists $(s_2, e, s_2') \in A_2$ such that $s_1' \pi s_2'$.

When $TS_1$ is simulated by $TS_2$ with relation $\pi$, and vice versa with relation
$\pi^{-1}$, $TS_1$ and $TS_2$ are bisimilar [2].
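One way to check whether a simulation in the sense of Definition 2 exists is
to compute the largest relation that satisfies the second condition and then
test the first condition on it. The following Python sketch does this with a
naive fixpoint iteration. It is an illustration only, not the procedure used
in the paper; the function names and the two small transition systems are
hypothetical.

```python
from typing import Dict, Set, Tuple

Arc = Tuple[str, str, str]  # (source state, event, target state)


def largest_simulation(states1: Set[str], arcs1: Set[Arc],
                       states2: Set[str], arcs2: Set[Arc]) -> Set[Tuple[str, str]]:
    """Largest relation satisfying the second condition of Definition 2."""
    succ2: Dict[Tuple[str, str], Set[str]] = {}
    for s, e, t in arcs2:
        succ2.setdefault((s, e), set()).add(t)
    # Start from the full relation and remove pairs that cannot be matched.
    pi = {(s1, s2) for s1 in states1 for s2 in states2}
    changed = True
    while changed:
        changed = False
        for s1, e, t1 in arcs1:
            for s2 in states2:
                if (s1, s2) not in pi:
                    continue
                if not any((t1, t2) in pi for t2 in succ2.get((s2, e), ())):
                    pi.discard((s1, s2))
                    changed = True
    return pi


def simulated_by(states1: Set[str], arcs1: Set[Arc],
                 states2: Set[str], arcs2: Set[Arc]) -> bool:
    """First condition of Definition 2, tested on the largest candidate."""
    pi = largest_simulation(states1, arcs1, states2, arcs2)
    return all(any((s1, s2) in pi for s2 in states2) for s1 in states1)


# Hypothetical systems: TS1 performs "a b" once, TS2 can repeat it forever.
S1, A1 = {"p0", "p1", "p2"}, {("p0", "a", "p1"), ("p1", "b", "p2")}
S2, A2 = {"q0", "q1", "q2"}, {("q0", "a", "q1"), ("q1", "b", "q2"),
                              ("q2", "a", "q1")}

print(simulated_by(S1, A1, S2, A2), simulated_by(S2, A2, S1, A1))
```

In this example TS2 simulates TS1 but not the other way round, so the two
systems are not bisimilar.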
Definition 3 (Petri net [12]). A Petri net (PN) is a tuple $(P, T, F, M_0)$
where $P$ and $T$ represent finite sets of places and transitions, respectively, and
$F \subseteq (P \times T) \cup (T \times P)$ is the flow relation. The initial marking
$M_0 \subseteq P$ defines the initial state of the system (see footnote 3).
The sets of input and output transitions of place $p$ are denoted by ${}^{\bullet}p$ and
$p^{\bullet}$, respectively. The set of all markings of PN reachable from the initial
marking $M_0$ is called its Reachability Set. The Reachability Graph of PN (RG(PN)) is a
transition system in which the set of states is the Reachability Set, the events
are the transitions of the net and a transition $(m_1, t, m_2)$ exists if and only if
$m_1 \xrightarrow{t} m_2$. We use $L(PN)$ as a shortcut for $L(RG(PN))$.
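For safe nets, where a marking is a subset of places, the Reachability Graph
can be sketched with a breadth-first exploration. The code below is a minimal
illustration assuming a safe net, with place and transition names disjoint;
the function name and the example net are hypothetical.

```python
from collections import deque
from typing import FrozenSet, Set, Tuple

Marking = FrozenSet[str]


def reachability_graph(transitions: Set[str], flow: Set[Tuple[str, str]],
                       m0: Set[str]):
    """Explore RG(PN) of a safe net; markings are sets of marked places."""
    pre = {t: frozenset(p for p, x in flow if x == t) for t in transitions}
    post = {t: frozenset(p for x, p in flow if x == t) for t in transitions}
    start: Marking = frozenset(m0)
    states: Set[Marking] = {start}
    arcs: Set[Tuple[Marking, str, Marking]] = set()
    queue = deque([start])
    while queue:
        m = queue.popleft()
        for t in transitions:
            if pre[t] <= m:  # t is enabled: all its input places are marked
                m2 = (m - pre[t]) | post[t]  # firing rule, valid for safe nets
                arcs.add((m, t, m2))
                if m2 not in states:
                    states.add(m2)
                    queue.append(m2)
    return states, arcs


# Hypothetical safe net: a and b are concurrent, then c, then d restarts.
T = {"a", "b", "c", "d"}
F = {("p2", "a"), ("a", "p3"), ("p1", "b"), ("b", "p4"),
     ("p3", "c"), ("p4", "c"), ("c", "p5"),
     ("p5", "d"), ("d", "p1"), ("d", "p2")}
rg_states, rg_arcs = reachability_graph(T, F, {"p1", "p2"})
print(len(rg_states), "markings,", len(rg_arcs), "arcs")
```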
2.2 Regions
We now review the classical theory of regions for the synthesis of Petri nets [6–8].
Let $S'$ be a subset of the states of a TS, $S' \subseteq S$. If $s \notin S'$ and
$s' \in S'$, then we say that transition $s \xrightarrow{a} s'$ enters $S'$. If
$s \in S'$ and $s' \notin S'$, then transition $s \xrightarrow{a} s'$ exits $S'$.
Otherwise, transition $s \xrightarrow{a} s'$ does not cross $S'$.
Definition 4. Let $TS = (S, E, A, s_{in})$ be a TS. Let $S' \subseteq S$ be a subset of
states and $e \in E$ be an event. The following conditions (in the form of predicates)
are defined for $S'$ and $e$:
$nocross(e, S') \equiv \exists (s_1, e, s_2) \in A : s_1 \in S' \Leftrightarrow s_2 \in S'$
$enter(e, S') \equiv \exists (s_1, e, s_2) \in A : s_1 \notin S' \wedge s_2 \in S'$
$exit(e, S') \equiv \exists (s_1, e, s_2) \in A : s_1 \in S' \wedge s_2 \notin S'$
The notion of a region is central for the synthesis of PNs. Intuitively, each
region is a set of states that corresponds to a place in the synthesized PN, so
that every state in the region models the marking of the place.
Definition 5 (region). A set of states $r \subseteq S$ in $TS = (S, E, A, s_{in})$ is
called a region if the following two conditions are satisfied for each event $e \in E$:
(i) $enter(e, r) \Rightarrow \neg nocross(e, r) \wedge \neg exit(e, r)$
(ii) $exit(e, r) \Rightarrow \neg nocross(e, r) \wedge \neg enter(e, r)$
A region is a subset of states in which all transitions labeled with the same
event e have exactly the same “entry/exit” relation. This relation will become
the predecessor/successor relation in the Petri net. The event may always be
either an enter event for the region (case (i) in the previous definition), or
always be an exit event (case (ii)), or never “cross” the region’s boundaries,
i.e. each transition labeled with e is internal or external to the region, where
the antecedents of neither (i) nor (ii) hold. The transition corresponding to the
event will be successor, predecessor or unrelated with the corresponding place
respectively.

Footnote 3: Although this paper deals with bounded Petri nets, for the sake of
clarity we restrict the theory of the current and next sections to the simpler class
of safe (1-bounded) Petri nets. Section 4 discusses how to generalize the method for
bounded Petri nets.

[Figure 2: (a) transition system with states s1 to s5 and events a, b, c, d;
(b) minimal regions: r_1 = {s1, s2}, r_2 = {s1, s3}, r_3 = {s2, s4},
r_4 = {s3, s4}, r_5 = {s5}; (c) Petri net with places r_1 to r_5 and
transitions a, b, c, d.]
Fig. 2. (a) Transition system, (b) minimal regions, (c) synthesis applying Algorithm
of Figure 3.
Examples of regions are reported in Figure 2: from the TS of Figure 2(a),
some regions are enumerated in Figure 2(b). For instance, for region $r_2$, event a
is an exit event, event d is an entry event while the rest of events do not cross
the region.
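The predicates of Definition 4 and the region test of Definition 5 translate
almost literally into code. The Python sketch below is an illustration under
an explicit assumption: the arcs in ARCS are a guess at the TS of Figure 2(a),
chosen to be consistent with the regions listed in Figure 2(b) and with the
statements about $r_2$ above, since the figure itself is not reproduced here.
All function names are hypothetical.

```python
from typing import Set, Tuple

Arc = Tuple[str, str, str]  # (source state, event, target state)

# Assumed arcs of the TS in Figure 2(a) (not taken verbatim from the paper).
ARCS: Set[Arc] = {
    ("s1", "a", "s2"), ("s1", "b", "s3"),
    ("s2", "b", "s4"), ("s3", "a", "s4"),
    ("s4", "c", "s5"), ("s5", "d", "s1"),
}


def nocross(e: str, sub: Set[str], arcs: Set[Arc]) -> bool:
    """Some e-transition has both ends inside or both ends outside sub."""
    return any(ev == e and ((s1 in sub) == (s2 in sub)) for s1, ev, s2 in arcs)


def enter(e: str, sub: Set[str], arcs: Set[Arc]) -> bool:
    """Some e-transition goes from outside sub to inside sub."""
    return any(ev == e and s1 not in sub and s2 in sub for s1, ev, s2 in arcs)


def exit_(e: str, sub: Set[str], arcs: Set[Arc]) -> bool:
    """Some e-transition goes from inside sub to outside sub."""
    return any(ev == e and s1 in sub and s2 not in sub for s1, ev, s2 in arcs)


def is_region(r: Set[str], arcs: Set[Arc]) -> bool:
    """Conditions (i) and (ii) of Definition 5, checked for every event."""
    for e in {ev for _, ev, _ in arcs}:
        if enter(e, r, arcs) and (nocross(e, r, arcs) or exit_(e, r, arcs)):
            return False
        if exit_(e, r, arcs) and (nocross(e, r, arcs) or enter(e, r, arcs)):
            return False
    return True


r2 = {"s1", "s3"}
print(is_region(r2, ARCS), exit_("a", r2, ARCS), enter("d", r2, ARCS))
print(is_region({"s1"}, ARCS))  # a exits via s1->s2 but s3->s4 does not cross
```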
Definition 6 (Minimal region). Let $r$ and $r'$ be regions of a TS. A region $r'$
is said to be a subregion of $r$ if $r' \subset r$. A region $r$ is a minimal region if
there is no other region $r'$ which is a subregion of $r$.
Going back to the example of Figure 2, in Figure 2(b) we report the set of
minimal regions. The union of disjoint regions is a region, so for instance the
union of the regions $r_1$ and $r_4$ is the set $\{s1, s2, s3, s4\}$ which is also a
(non-minimal) region.
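For a TS this small, minimal regions can be found by brute force: enumerate
all non-trivial subsets of states, keep those that are regions, and discard
every region that strictly contains another one. The Python sketch below
illustrates this. It is exponential in the number of states and is not the
algorithm of the paper; ARCS is again an assumed reconstruction of the TS of
Figure 2(a), and the function names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Set

# Assumed arcs of the TS in Figure 2(a) (not taken verbatim from the paper).
ARCS = {
    ("s1", "a", "s2"), ("s1", "b", "s3"),
    ("s2", "b", "s4"), ("s3", "a", "s4"),
    ("s4", "c", "s5"), ("s5", "d", "s1"),
}
STATES = sorted({s for s1, _, s2 in ARCS for s in (s1, s2)})
EVENTS = {ev for _, ev, _ in ARCS}


def crossing_kinds(event: str, subset: FrozenSet[str]) -> Set[str]:
    """Kinds of crossing (enter, exit, nocross) shown by the event's arcs."""
    kinds = set()
    for s1, ev, s2 in ARCS:
        if ev != event:
            continue
        if s1 not in subset and s2 in subset:
            kinds.add("enter")
        elif s1 in subset and s2 not in subset:
            kinds.add("exit")
        else:
            kinds.add("nocross")
    return kinds


def is_region(subset: FrozenSet[str]) -> bool:
    """A region lets every event cross its boundary in one uniform way."""
    return all(len(crossing_kinds(e, subset)) == 1 for e in EVENTS)


def nontrivial_regions() -> List[FrozenSet[str]]:
    """All regions except the empty set and the full set of states."""
    return [frozenset(c)
            for size in range(1, len(STATES))
            for c in combinations(STATES, size)
            if is_region(frozenset(c))]


def minimal_regions(regions: List[FrozenSet[str]]) -> List[FrozenSet[str]]:
    """Regions with no other non-trivial region strictly inside them."""
    return [r for r in regions if not any(other < r for other in regions)]


regions = nontrivial_regions()
minimal = minimal_regions(regions)
for r in minimal:
    print(sorted(r))
```

On the assumed TS this reports the five sets listed in Figure 2(b), and the
union of {s1, s2} and {s3, s4} appears among the non-minimal regions.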
Each TS has two trivial regions: the set of all states, $S$, and the empty set.
Further on we will always consider only non-trivial regions. The set of non-trivial
regions of TS will be denoted by $R_{TS}$. Given a set $S' \subseteq S$ and a region $r$,
$r|_{S'}$ represents the projection of the region $r$ into the set $S'$, i.e.
$r|_{S'} = r \cap S'$.
A region $r$ is a pre-region of event $e$ if there is a transition labeled with
$e$ which exits $r$. A region $r$ is a post-region of event $e$ if there is a transition
labeled with $e$ which enters $r$. The sets of all pre-regions and post-regions of $e$
are denoted with ${}^{\circ}e$ and $e^{\circ}$, respectively. By definition it follows
that if $r \in {}^{\circ}e$, then all transitions labeled with $e$ exit $r$. Similarly,
if $r \in e^{\circ}$, then all transitions labeled with $e$ enter $r$.
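Given a set of regions, the pre-regions and post-regions of each event follow
directly from these definitions. The Python sketch below computes them for the
minimal regions of Figure 2(b). As before, ARCS is an assumed reconstruction
of the TS of Figure 2(a), and the function names are hypothetical.

```python
from typing import Dict, Set

# Assumed arcs of the TS in Figure 2(a) (not taken verbatim from the paper).
ARCS = {
    ("s1", "a", "s2"), ("s1", "b", "s3"),
    ("s2", "b", "s4"), ("s3", "a", "s4"),
    ("s4", "c", "s5"), ("s5", "d", "s1"),
}

# Minimal regions as listed in Figure 2(b).
REGIONS: Dict[str, Set[str]] = {
    "r1": {"s1", "s2"}, "r2": {"s1", "s3"}, "r3": {"s2", "s4"},
    "r4": {"s3", "s4"}, "r5": {"s5"},
}


def pre_regions(event: str) -> Set[str]:
    """Names of regions that some transition labeled with event exits."""
    return {name for name, r in REGIONS.items()
            if any(ev == event and s1 in r and s2 not in r
                   for s1, ev, s2 in ARCS)}


def post_regions(event: str) -> Set[str]:
    """Names of regions that some transition labeled with event enters."""
    return {name for name, r in REGIONS.items()
            if any(ev == event and s1 not in r and s2 in r
                   for s1, ev, s2 in ARCS)}


for e in sorted({ev for _, ev, _ in ARCS}):
    print(e, sorted(pre_regions(e)), sorted(post_regions(e)))
```

Read as a net, each region becomes a place, each pre-region of an event an
input place of the corresponding transition, and each post-region an output
place, which is the predecessor/successor relation described above.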

References

C. A. Petri. Kommunikation mit Automaten.
Workflow mining: discovering process models from event logs.
Workflow mining: a survey of issues and approaches.
Process mining: a two-step approach to balance between underfitting and overfitting.