(Open Access) A Region-Based Algorithm for Discovering Petri Nets from Event Logs (2008) | Josep Carmona

A Region-based Algorithm for Discovering Petri Nets from Event Logs

J. Carmona, J. Cortadella, and M. Kishinevsky

Universitat Politècnica de Catalunya, Spain

Intel Corporation, USA

Abstract. The paper presents a new method for the synthesis of Petri nets from event logs in the area of Process Mining. The method derives a bounded Petri net that over-approximates the behavior of an event log. The most important property is that it produces a net with the smallest behavior that still contains the behavior of the event log. The methods described in this paper have been implemented in a tool and tested on a set of examples.

1 Introduction

The discovery of formal models from event logs in information systems is known as process mining. Since the nineties, the area of process mining has focused on providing formal support to business information systems [16]. In the industrial domain, ranging from hospitals and banks to sensor networks or CAD for VLSI, process mining can be applied to succinctly summarize the behavior observed in large event logs [14]. Nowadays, several approaches can be used to mine formal models, most of them included in the ProM framework [15].

The synthesis problem [7] is related to process mining: it consists in building a Petri net whose behavior is equivalent to that of a given transition system. The problem was first addressed by Ehrenfeucht and Rozenberg [8], who introduced regions to model the sets of states that characterize marked places. Process mining differs from synthesis in the knowledge assumption: while in synthesis one assumes a complete description of the system, only a partial description of the system is assumed in process mining. Therefore, bisimulation is no longer a goal to achieve in process mining. Instead, obtaining approximations that succinctly represent the log under consideration is more valuable [19].

In the area of synthesis, some approaches have been studied to put the theory of regions into practice. In [3], polynomial algorithms for the synthesis of bounded nets were presented. This approach has recently been adapted to the problem of process mining in [4]. In [6], the theory of regions was applied to the synthesis of safe Petri nets with bisimilar behavior. Recently, the theory from [6] has been extended to bounded Petri nets [5]. In this paper we adapt the theory from [5] to the problem of process mining.

The work presented in this paper aims at constructing (mining) a Petri net that covers the behavior observed in the event log, i.e. traces in the event log will be feasible in the Petri net. Moreover, the Petri net may accept traces not observed in the log. Additionally, a minimality property is demonstrated on the mined Petri net: no other net exists that both covers the log and accepts fewer traces than the mined Petri net. This capability of minimal over-approximation represents the main theoretical contribution of this paper. The methods presented in the paper can mine a particular $k$-bounded Petri net, for a given bound $k$. We have implemented the theory of this paper in a tool, and some preliminary results from logs are reported. The approach taken in this paper is a formal one and differs from the more heuristic methods in the literature. Although the methods presented might have a high complexity for large logs, they can be combined with recent iterative approaches [18] to alleviate their complexity.

Fig. 1. Petri net mining to avoid overfitting.

Carmona, J.; Cortadella, J.; Kishinevsky, M. A region-based algorithm for discovering Petri nets from event logs. In: International Conference on Business Process Management. "Business Process Management, 6th International Conference, BPM 2008: Milan, Italy, September 2-4, 2008: proceedings". Springer, 2008, p. 358-373.

The final authenticated version is available online at https://doi.org/10.1007/978-3-540-85758-7_26

This paper shares common goals with the previously presented paper [4]. In [4], two process mining strategies based on regions of languages are presented, having the same minimality goal as the one that we have in this paper. However, the strategy is different: integer linear models are solved in order to find a set of special places, called feasible places, that guarantee the inclusion of the traces from the event log. The more places are added, the more traces are forbidden in the resulting net. If the net contains all the possible feasible places, then the minimality property can be demonstrated. However, the set of feasible places might be infinite. In our case, given a maximal bound $k$ for the mining of a $k$-bounded Petri net, minimal regions of the transition system are enough to demonstrate the minimality property on this bound.

Example. In [14], a small log is presented to illustrate the overfitting produced by synthesis tools. The log contains the following activities: r=register, s=ship, sb=send bill, p=payment, ac=accounting, ap=approved, c=close, em=express mail, rj=rejected, and rs=resolve. Now assume that the event log contains the traces (r, s, sb, p, ac, ap, c), (r, sb, em, p, ac, ap, c), (r, sb, p, em, ac, rj, rs, c), (r, em, sb, p, ac, ap, c), (r, sb, s, p, ac, rj, rs, c), (r, sb, p, s, ac, ap, c) and (r, sb, p, em, ac, ap, c). From this log, a TS can be obtained [13], and a PN like the one shown in Figure 1 will be synthesized by a tool like petrify [6]. If the log is slightly changed (for instance, trace (r, sb, s, p, ac, rj, rs, c) is replaced by (r, sb, s, p, ac, ap, c)), the synthesis tool will adapt the PN to account for the changes, deriving a different PN. This means that synthesis algorithms are very sensitive to variations in the logs. However, the techniques presented in this paper, as also happens with traditional mining approaches like the α-algorithm [16], are less sensitive to variations in event logs, and will derive the same PN over the modified log.
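One simple way to obtain a TS from the example log can be sketched as a prefix tree, where each distinct trace prefix becomes a state. This is an illustrative assumption only: the conversions studied in [13] offer other abstractions, and the names `LOG`, `prefix_tree_ts` and `accepts` are hypothetical.

```python
# Sketch: prefix-tree transition system built from the example log.
# Assumption: one state per distinct trace prefix (not necessarily the
# conversion used in [13]).

LOG = [
    ("r", "s", "sb", "p", "ac", "ap", "c"),
    ("r", "sb", "em", "p", "ac", "ap", "c"),
    ("r", "sb", "p", "em", "ac", "rj", "rs", "c"),
    ("r", "em", "sb", "p", "ac", "ap", "c"),
    ("r", "sb", "s", "p", "ac", "rj", "rs", "c"),
    ("r", "sb", "p", "s", "ac", "ap", "c"),
    ("r", "sb", "p", "em", "ac", "ap", "c"),
]


def prefix_tree_ts(log):
    """Return (states, events, arcs, initial) for the given log."""
    initial = ()                      # the empty prefix is the initial state
    states = {initial}
    arcs = set()                      # (source, event, target) triples
    for trace in log:
        for i, event in enumerate(trace):
            source, target = trace[:i], trace[:i + 1]
            states.add(target)
            arcs.add((source, event, target))
    events = {e for (_, e, _) in arcs}
    return states, events, arcs, initial


def accepts(ts, trace):
    """True if the trace is feasible from the initial state of ts."""
    _, _, arcs, state = ts
    step = {(s, e): t for (s, e, t) in arcs}   # deterministic by construction
    for event in trace:
        if (state, event) not in step:
            return False
        state = step[(state, event)]
    return True


ts = prefix_tree_ts(LOG)
print(all(accepts(ts, trace) for trace in LOG))
```

By construction this TS accepts exactly the prefixes of the logged traces, which is the kind of exact behavior that a synthesis tool would then reproduce.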

The two models used in this paper are Petri nets and transition systems. We will assume that a transition system represents an event log obtained from observing a real system, from which an event-based representation (e.g. a Petri net) approximating its behavior must be obtained. The derivation of the transition system from an event log is an important step that may have a big impact on the final mined Petri net, as demonstrated in [13]. A two-step approach is presented in [13], emphasizing that the first step (generation of the transition system) is crucial for the balance between underfitting and overfitting. If the desired abstraction is attained in the first step, i.e. the transition system represents an abstraction of the event log, the second step is expected to reproduce exactly this abstraction, via synthesis. The methods presented in this paper extend the possibilities of this two-step approach, given that the second step might also introduce further abstraction in a controlled manner. The approaches based on regions of languages perform the mining process in only one step, provided that logs can be directly interpreted as languages [4].

2 Preliminaries: theory of regions

2.1 Finite transition systems and Petri nets

Definition 1 (Transition system). A transition system (TS) is a tuple $(S, E, A, s_{in})$, where $S$ is a set of states, $E$ is an alphabet of actions such that $S \cap E = \emptyset$, $A \subseteq S \times E \times S$ is a set of (labelled) transitions, and $s_{in}$ is the initial state.

Let $TS = (S, E, A, s_{in})$ be a transition system. We consider connected TSs that satisfy the following axioms:

– $S$ and $E$ are finite sets.

– Every event has an occurrence: $\forall e \in E : \exists (s, e, s') \in A$.

– Every state is reachable from the initial state: $\forall s \in S : s_{in} \xrightarrow{*} s$.

A TS is called deterministic if for each state $s$ and each label $a$ there can be at most one state $s'$ such that $s \xrightarrow{a} s'$. The relation between TSs will be studied in this paper. The language of a TS, $L(TS)$, is the set of traces feasible from the initial state. When $L(TS_1) \subseteq L(TS_2)$, we will denote $TS_2$ as an over-approximation of $TS_1$. The notion of simulation between two TSs is related to this concept:

Definition 2 (Simulation [2]). Let $TS_1 = (S_1, E, A_1, s_{in1})$ and $TS_2 = (S_2, E, A_2, s_{in2})$ be two TSs with the same set of events. A simulation of $TS_1$ by $TS_2$ is a relation $\pi$ between $S_1$ and $S_2$ such that

– for every $s_1 \in S_1$, there exists $s_2 \in S_2$ such that $s_1 \pi s_2$;

– for every $(s_1, e, s_1') \in A_1$ and for every $s_2 \in S_2$ such that $s_1 \pi s_2$, there exists $(s_2, e, s_2') \in A_2$ such that $s_1' \pi s_2'$.

When $TS_1$ is simulated by $TS_2$ with relation $\pi$, and vice versa with relation $\pi^{-1}$, $TS_1$ and $TS_2$ are bisimilar [2].
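The simulation conditions can be checked mechanically on small examples. The following is a minimal sketch, not the paper's procedure: it computes the largest relation satisfying the second item of Definition 2 by iterative refinement, then checks the first item. Requiring the initial states to be related is an added assumption (it is the usual convention), and both example TSs and all function names are hypothetical.

```python
# Sketch: greatest simulation between two TSs given as
# (states, events, arcs, initial), with arcs as (source, event, target).

def greatest_simulation(ts1, ts2):
    """Largest pi within S1 x S2 satisfying the second item of Definition 2."""
    states1, _, arcs1, _ = ts1
    states2, _, arcs2, _ = ts2
    pi = {(p, q) for p in states1 for q in states2}
    changed = True
    while changed:
        changed = False
        for (p, q) in list(pi):
            for (src, e, p_next) in arcs1:
                if src != p:
                    continue
                # q must offer an e-step to a state still related to p_next
                if not any(s == q and ev == e and (p_next, q_next) in pi
                           for (s, ev, q_next) in arcs2):
                    pi.discard((p, q))
                    changed = True
                    break
    return pi


def simulates(ts1, ts2):
    """True if ts1 is simulated by ts2 (initial states related: assumption)."""
    pi = greatest_simulation(ts1, ts2)
    every_state_related = all(any((p, q) in pi for q in ts2[0]) for p in ts1[0])
    return every_state_related and (ts1[3], ts2[3]) in pi


# Hypothetical example: TS1 performs a then b; TS2 allows a and b in any order.
TS1 = ({"y0", "y1", "y2"}, {"a", "b"},
       {("y0", "a", "y1"), ("y1", "b", "y2")}, "y0")
TS2 = ({"x0", "x1", "x2", "x3"}, {"a", "b"},
       {("x0", "a", "x1"), ("x0", "b", "x2"),
        ("x1", "b", "x3"), ("x2", "a", "x3")}, "x0")

print(simulates(TS1, TS2))  # True: TS2 over-approximates TS1
print(simulates(TS2, TS1))  # False: TS1 cannot start with b
```

In this example the two TSs are not bisimilar, since the simulation holds in one direction only.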

Definition 3 (Petri net [12]). A Petri net (PN) is a tuple $(P, T, F, M_0)$ where $P$ and $T$ represent finite sets of places and transitions, respectively, and $F \subseteq (P \times T) \cup (T \times P)$ is the flow relation. The initial marking $M_0 \subseteq P$ defines the initial state of the system.

The sets of input and output transitions of place $p$ are denoted by $\bullet p$ and $p \bullet$, respectively. The set of all markings of PN reachable from the initial marking is called its Reachability Set. The Reachability Graph of PN (RG(PN)) is a transition system in which the set of states is the Reachability Set, the events are the transitions of the net, and a transition $(m_1, t, m_2)$ exists if and only if $m_1 \xrightarrow{t} m_2$. We use $L(PN)$ as a shortcut for $L(RG(PN))$.
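The construction of the Reachability Graph can be sketched for the safe case, where a marking is simply a subset of places as in Definition 3. This is an illustrative sketch under the assumption that the net is safe and that place and transition names are disjoint; the function name `reachability_graph` and the example net are hypothetical.

```python
from collections import deque

# Sketch: reachability graph of a safe Petri net (P, T, F, M0).
# Assumption: the net is safe, so set-based markings are exact.

def reachability_graph(places, transitions, flow, m0):
    """Return the TS (states, events, arcs, initial) reachable from m0."""
    # Input and output places of every transition, read from the flow relation.
    pre = {t: frozenset(p for (p, x) in flow if x == t and p in places)
           for t in transitions}
    post = {t: frozenset(p for (x, p) in flow if x == t and p in places)
            for t in transitions}
    initial = frozenset(m0)
    states, arcs = {initial}, set()
    queue = deque([initial])
    while queue:
        m = queue.popleft()
        for t in transitions:
            if pre[t] <= m:                       # t is enabled at m
                m_next = (m - pre[t]) | post[t]   # fire t
                arcs.add((m, t, m_next))
                if m_next not in states:
                    states.add(m_next)
                    queue.append(m_next)
    return states, set(transitions), arcs, initial


# Hypothetical net: transitions a and b are concurrent.
P = {"pa", "pb", "qa", "qb"}
T = {"a", "b"}
F = {("pa", "a"), ("a", "qa"), ("pb", "b"), ("b", "qb")}
M0 = {"pa", "pb"}

states, events, arcs, initial = reachability_graph(P, T, F, M0)
print(len(states), len(arcs))  # 4 4: the interleaving diamond of a and b
```

The search terminates because a safe net with finitely many places has finitely many markings.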

2.2 Regions

We now review the classical theory of regions for the synthesis of Petri nets [6–8]. Let $S'$ be a subset of the states of a TS, $S' \subseteq S$. If $s \notin S'$ and $s' \in S'$, then we say that transition $s \xrightarrow{a} s'$ enters $S'$. If $s \in S'$ and $s' \notin S'$, then transition $s \xrightarrow{a} s'$ exits $S'$. Otherwise, transition $s \xrightarrow{a} s'$ does not cross $S'$.

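Definition 4. Let $TS = (S, E, A, s_{in})$ be a TS. Let $S' \subseteq S$ be a subset of states and $e \in E$ be an event. The following conditions (in the form of predicates) are defined for $S'$ and $e$:

$\mathrm{nocross}(e, S') \equiv \exists (s_1, e, s_2) \in A : s_1 \in S' \Leftrightarrow s_2 \in S'$

$\mathrm{enter}(e, S') \equiv \exists (s_1, e, s_2) \in A : s_1 \notin S' \wedge s_2 \in S'$

$\mathrm{exit}(e, S') \equiv \exists (s_1, e, s_2) \in A : s_1 \in S' \wedge s_2 \notin S'$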

The notion of a region is central for the synthesis of PNs. Intuitively, each region is a set of states that corresponds to a place in the synthesized PN, so that every state in the region models the marking of the place.

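Definition 5 (region). A set of states $r \subseteq S$ in $TS = (S, E, A, s_{in})$ is called a region if the following two conditions are satisfied for each event $e \in E$:

– (i) $\mathrm{enter}(e, r) \Rightarrow \neg \mathrm{nocross}(e, r) \wedge \neg \mathrm{exit}(e, r)$

– (ii) $\mathrm{exit}(e, r) \Rightarrow \neg \mathrm{nocross}(e, r) \wedge \neg \mathrm{enter}(e, r)$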
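The predicates of Definition 4 and the region conditions of Definition 5 translate almost literally into code. The sketch below is illustrative only: the TS in `ARCS` is a small example assumed here, not one reproduced from the paper's figures, and the function names are hypothetical (`exit_` carries a trailing underscore to avoid shadowing Python's built-in `exit`).

```python
# Sketch of Definitions 4 and 5 on a hypothetical TS.
# Arcs are (source, event, target) triples.
ARCS = {
    ("s1", "a", "s2"), ("s1", "b", "s3"),
    ("s2", "b", "s4"), ("s3", "a", "s4"),
    ("s4", "c", "s5"), ("s5", "d", "s1"),
}
EVENTS = {e for (_, e, _) in ARCS}


def nocross(e, subset, arcs=ARCS):
    """Some e-labelled arc has both endpoints inside or both outside subset."""
    return any((s1 in subset) == (s2 in subset)
               for (s1, ev, s2) in arcs if ev == e)


def enter(e, subset, arcs=ARCS):
    """Some e-labelled arc goes from outside subset to inside it."""
    return any(s1 not in subset and s2 in subset
               for (s1, ev, s2) in arcs if ev == e)


def exit_(e, subset, arcs=ARCS):
    """Some e-labelled arc goes from inside subset to outside it."""
    return any(s1 in subset and s2 not in subset
               for (s1, ev, s2) in arcs if ev == e)


def is_region(subset, events=EVENTS, arcs=ARCS):
    """Check conditions (i) and (ii) of Definition 5 for every event."""
    for e in events:
        if enter(e, subset, arcs) and (nocross(e, subset, arcs)
                                       or exit_(e, subset, arcs)):
            return False
        if exit_(e, subset, arcs) and (nocross(e, subset, arcs)
                                       or enter(e, subset, arcs)):
            return False
    return True


print(is_region({"s1", "s3"}))  # True: a always exits, d always enters
print(is_region({"s1"}))        # False: one a-arc exits, the other does not cross
```

The second call fails because event a violates condition (ii): the arc from s1 exits the set while the arc from s3 does not cross it.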

A region is a subset of states in which all transitions labeled with the same event $e$ have exactly the same "entry/exit" relation. This relation will become the predecessor/successor relation in the Petri net. The event may always be either an enter event for the region (case (i) in the previous definition), or always be an exit event (case (ii)), or never "cross" the region's boundaries, i.e. each transition labeled with $e$ is internal or external to the region, where the antecedents of neither (i) nor (ii) hold. The transition corresponding to the event will be successor, predecessor or unrelated with the corresponding place respectively.

Although this paper deals with bounded Petri nets, for the sake of clarity we restrict the theory of the current and next sections to the simpler class of safe (1-bounded) Petri nets. Section 4 discusses how to generalize the method for bounded Petri nets.

Fig. 2. (a) Transition system, (b) minimal regions, (c) synthesis applying Algorithm of Figure 3. Minimal regions listed in (b): { s1, s3 }, { s2, s4 }, { s3, s4 }, { s5 }, { s1, s2 }.

Examples of regions are reported in Figure 2: from the TS of Figure 2(a), some regions are enumerated in Figure 2(b). For instance, for region $r$, event $a$ is an exit event, event $d$ is an entry event, while the rest of the events do not cross the region.

Definition 6 (Minimal region). Let $r$ and $r'$ be regions of a TS. A region $r'$ is said to be a subregion of $r$ if $r' \subset r$. A region $r$ is a minimal region if there is no other region $r'$ which is a subregion of $r$.

Going back to the example of Figure 2, in Figure 2(b) we report the set of minimal regions. The union of disjoint regions is a region, so for instance the union of two disjoint regions listed there is the set {s1, s2, s3, s4}, which is also a (non-minimal) region.
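For a small TS, minimal regions can be found by brute force: enumerate all non-empty proper subsets of states, keep those that are regions, and discard any region that strictly contains another. The sketch below is a naive illustration, exponential in the number of states, and not the algorithm used by the paper's tool. The TS in `ARCS` is an assumption made for illustration; it was chosen so that its minimal regions coincide with the five sets listed in Figure 2(b).

```python
from itertools import combinations

# Hypothetical TS, arcs as (source, event, target) triples.
ARCS = {
    ("s1", "a", "s2"), ("s1", "b", "s3"),
    ("s2", "b", "s4"), ("s3", "a", "s4"),
    ("s4", "c", "s5"), ("s5", "d", "s1"),
}
STATES = {s for (s1, _, s2) in ARCS for s in (s1, s2)}
EVENTS = {e for (_, e, _) in ARCS}


def crossing(arc, r):
    """Classify one arc with respect to the state set r."""
    s1, _, s2 = arc
    if s1 not in r and s2 in r:
        return "enter"
    if s1 in r and s2 not in r:
        return "exit"
    return "nocross"


def is_region(r, events=EVENTS, arcs=ARCS):
    """All arcs with the same label must cross r in the same way."""
    return all(len({crossing(a, r) for a in arcs if a[1] == e}) <= 1
               for e in events)


def minimal_regions(states=STATES, events=EVENTS, arcs=ARCS):
    """Non-trivial regions that contain no other non-trivial region."""
    ordered = sorted(states)
    regions = [frozenset(c)
               for k in range(1, len(ordered))        # skip empty set and S
               for c in combinations(ordered, k)
               if is_region(frozenset(c), events, arcs)]
    return {r for r in regions if not any(other < r for other in regions)}


for region in sorted(minimal_regions(), key=sorted):
    print(sorted(region))
```

On this TS the set {s1, s2, s3, s4} also passes `is_region`, but it is filtered out because it contains smaller regions.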

Each TS has two trivial regions: the set of all states, $S$, and the empty set. Further on we will always consider only non-trivial regions. The set of non-trivial regions of TS will be denoted by $R_{TS}$. Given a set $S' \subseteq S$ and a region $r$, $r|_{S'}$ represents the projection of the region $r$ into the set $S'$, i.e. $r|_{S'} = r \cap S'$.

A region $r$ is a pre-region of event $e$ if there is a transition labeled with $e$ which exits $r$. A region $r$ is a post-region of event $e$ if there is a transition labeled with $e$ which enters $r$. The sets of all pre-regions and post-regions of $e$ are denoted with ${}^{\circ}e$ and $e^{\circ}$, respectively. By definition it follows that if $r \in {}^{\circ}e$, then all transitions labeled with $e$ exit $r$. Similarly, if $r \in e^{\circ}$, then all transitions labeled with $e$ enter $r$.
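Pre-regions and post-regions can be computed directly from these definitions. The sketch below is illustrative: it restricts attention to a given collection of regions (here the five sets listed in Figure 2(b)), and the TS in `ARCS` is an assumed example consistent with those regions rather than a transcription of Figure 2(a). The function names are hypothetical.

```python
# Hypothetical TS, arcs as (source, event, target) triples.
ARCS = {
    ("s1", "a", "s2"), ("s1", "b", "s3"),
    ("s2", "b", "s4"), ("s3", "a", "s4"),
    ("s4", "c", "s5"), ("s5", "d", "s1"),
}
# The minimal regions listed in Figure 2(b).
REGIONS = [
    frozenset({"s1", "s3"}), frozenset({"s2", "s4"}),
    frozenset({"s3", "s4"}), frozenset({"s5"}),
    frozenset({"s1", "s2"}),
]


def pre_regions(e, regions=REGIONS, arcs=ARCS):
    """Regions that some e-labelled arc exits."""
    return {r for r in regions
            if any(s1 in r and s2 not in r
                   for (s1, ev, s2) in arcs if ev == e)}


def post_regions(e, regions=REGIONS, arcs=ARCS):
    """Regions that some e-labelled arc enters."""
    return {r for r in regions
            if any(s1 not in r and s2 in r
                   for (s1, ev, s2) in arcs if ev == e)}


for event in sorted({e for (_, e, _) in ARCS}):
    pre = sorted(sorted(r) for r in pre_regions(event))
    post = sorted(sorted(r) for r in post_regions(event))
    print(event, "pre:", pre, "post:", post)
```

For event a, for example, the only pre-region in this collection is {s1, s3} and the only post-region is {s2, s4}.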
