What future work is mentioned in the paper "Fuzzy mining - adaptive process simplification based on multi-perspective metrics"?

Further work will concentrate on extending the set of metric implementations and improving the simplification algorithm. The success of process mining will depend on whether it is able to balance these conflicting goals sensibly.

What is the most important implementation of binary significance?

As with unary significance, the log-based frequency significance metric is the most important implementation of binary significance.
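To make the idea concrete, here is a minimal Python sketch of a log-based binary frequency significance. It assumes (our assumption, not the paper's stated definition) that the significance of A → B grows with how often events of class B directly follow events of class A, scaled so that the most frequent relation scores 1.0. The function `binary_frequency_significance` is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Trace = List[str]          # a trace is a sequence of event-class names
Edge = Tuple[str, str]     # an ordering relation A -> B


def binary_frequency_significance(log: List[Trace]) -> Dict[Edge, float]:
    """Hypothetical sketch: direct-succession counts scaled to [0, 1].

    Assumes significance is proportional to how often B directly follows A;
    dividing by the largest count is an illustrative normalization choice.
    """
    counts: Counter = Counter()
    for trace in log:
        # zip pairs each event with its direct successor in the trace
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    if not counts:
        return {}
    top = max(counts.values())
    return {edge: n / top for edge, n in counts.items()}


log = [["A", "B", "C"], ["A", "B", "D"], ["A", "C"]]
sig = binary_frequency_significance(log)
print(sig[("A", "B")], sig[("A", "C")])  # 1.0 0.5
```

Here A → B occurs twice and A → C once, so A → C gets half the significance of the strongest relation.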

Why is it important to remove edges from the model first?

Removing edges from the model first is important because, due to the less-structured nature of real-life processes and the measurement of long-term relationships, the initial model contains deceptive ordering relations that do not correspond to valid behavior and need to be discarded.
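One way such deceptive relations can be discarded is a per-node cutoff on edge significance. The sketch below is hypothetical: it min-max normalizes the significances of all edges entering a node and drops edges that fall below a `cutoff` parameter. Both the normalization and the parameter are illustrative assumptions, not the paper's exact edge filter.

```python
from typing import Dict, Tuple

Edge = Tuple[str, str]


def filter_edges(sig: Dict[Edge, float], cutoff: float) -> Dict[Edge, float]:
    """Hypothetical edge filter: keep an edge A -> B only if its significance,
    min-max normalized among all edges entering B, reaches `cutoff`.

    The strongest incoming edge of every node normalizes to 1.0, so each node
    that had an incoming edge keeps at least one (for cutoff <= 1.0).
    """
    incoming: Dict[str, Dict[Edge, float]] = {}
    for (a, b), s in sig.items():
        incoming.setdefault(b, {})[(a, b)] = s
    kept: Dict[Edge, float] = {}
    for edges in incoming.values():
        lo, hi = min(edges.values()), max(edges.values())
        for edge, s in edges.items():
            # a single incoming edge (or all-equal weights) has no spread
            norm = 1.0 if hi == lo else (s - lo) / (hi - lo)
            if norm >= cutoff:
                kept[edge] = s
    return kept


sig = {("A", "C"): 0.9, ("B", "C"): 0.1, ("D", "C"): 0.5, ("A", "B"): 0.3}
print(sorted(filter_edges(sig, cutoff=0.6)))  # [('A', 'B'), ('A', 'C')]
```

Node C keeps only its dominant incoming edge, while B keeps its single incoming edge regardless of its absolute value.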

Which phases of the simplification algorithm remove edges?

The first two phases, conflict resolution and edge filtering, remove edges (i.e., precedence relations) between activity nodes, while the final aggregation and abstraction phase removes and/or clusters less-significant nodes.
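The aggregation and abstraction phase can be sketched as follows, under simplifying assumptions of our own: nodes whose significance reaches a hypothetical `cutoff` are kept, low-significance nodes that are linked to one another are merged into clusters, and isolated low-significance nodes are removed. The paper bases clustering on its correlation metrics; using plain edge connectivity here is a deliberate simplification.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def abstract_nodes(
    node_sig: Dict[str, float], edges: Set[Edge], cutoff: float
) -> Tuple[Set[str], List[Set[str]], Set[str]]:
    """Hypothetical aggregation/abstraction step.

    Nodes at or above `cutoff` are preserved. Less-significant nodes that are
    linked to each other are grouped into clusters (connected components among
    the low-significance nodes); isolated low-significance nodes are removed.
    """
    kept = {n for n, s in node_sig.items() if s >= cutoff}
    low = set(node_sig) - kept
    # undirected adjacency restricted to low-significance nodes
    adj: Dict[str, Set[str]] = {n: set() for n in low}
    for a, b in edges:
        if a in low and b in low and a != b:
            adj[a].add(b)
            adj[b].add(a)
    clusters: List[Set[str]] = []
    removed: Set[str] = set()
    seen: Set[str] = set()
    for start in sorted(low):
        if start in seen:
            continue
        # depth-first search collects one connected component
        component: Set[str] = set()
        stack = [start]
        while stack:
            n = stack.pop()
            if n in component:
                continue
            component.add(n)
            stack.extend(adj[n] - component)
        seen |= component
        if len(component) > 1:
            clusters.append(component)
        else:
            removed |= component
    return kept, clusters, removed


node_sig = {"A": 0.9, "B": 0.8, "x": 0.1, "y": 0.2, "z": 0.05}
edges = {("A", "x"), ("x", "y"), ("y", "B"), ("A", "z")}
kept, clusters, removed = abstract_nodes(node_sig, edges, cutoff=0.5)
```

In this example `A` and `B` are preserved, `x` and `y` form one cluster, and `z` is removed.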

How long did it take to simplify the model?

Deriving all metrics from the mentioned log was performed in less than ten seconds, while simplifying the resulting model took less than two seconds on a 1.8 GHz dual-core machine.

What is the main concept of the approach to process mining?

Process mining techniques which are suitable for less-structured environments need to be able to provide a high-level view on the process, abstracting from undesired details.

What is the significance of a relation in a process model?

By dividing the significance of an ordering relation A → B by the sum of all its competing relations’ significances, the authors obtain the importance of this relation in its local context.
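A hypothetical Python sketch of this local-context importance: the significance of A → B is divided by the summed significance of the relations it competes with. Which relations count as competitors is our assumption here: all relations leaving A and all relations entering B (A → B itself included in both sums), with the two resulting shares averaged.

```python
from typing import Dict, Tuple

Edge = Tuple[str, str]


def relative_importance(sig: Dict[Edge, float], edge: Edge) -> float:
    """Hypothetical sketch of a relation's importance in its local context.

    Assumption: the relations competing with A -> B are all relations that
    leave A and all relations that enter B (A -> B itself included in both
    sums); the two local shares are averaged.
    """
    a, b = edge
    out_a = sum(s for (x, _), s in sig.items() if x == a)
    in_b = sum(s for (_, y), s in sig.items() if y == b)
    share_out = sig[edge] / out_a if out_a else 0.0
    share_in = sig[edge] / in_b if in_b else 0.0
    return 0.5 * share_out + 0.5 * share_in


sig = {("A", "B"): 0.6, ("A", "C"): 0.2, ("D", "B"): 0.6}
print(round(relative_importance(sig, ("A", "B")), 3))  # 0.625
```

A → B holds 75% of A's outgoing significance but only 50% of B's incoming significance, so its averaged local importance is 0.625.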

What distinguishes Fuzzy Mining from other mining techniques?

The foundation on multi-perspective metrics (i.e., looking at all aspects of the process at once), its interactive and explorative nature, and the integrated simplification algorithm clearly distinguish Fuzzy Mining from all previous process mining techniques.

Which tool do the authors consider useful for analyzing less-structured environments?

Such environments are notoriously flexible and unstructured, and the authors hold their approach to be one of the most useful tools for analyzing them so far.

How do the authors simplify the process model?

The authors apply three transformation methods to the process model, each of which successively simplifies a specific aspect of it.

What are the popular solutions for supporting processes?

Yet the most popular solutions for supporting processes do not enforce any defined behavior at all, but merely offer functionality like sharing data and passing messages between users and resources.

How is this research funded?

This research is supported by the Technology Foundation STW, applied science division of NWO and the technology programme of the Dutch Ministry of Economic Affairs.

What is the metric for evaluating event classes?

The data type correlation metric rates event classes as highly correlated when their subsequent events share a large number of data types (i.e., attribute keys).
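A sketch of how such a data type correlation could be computed, assuming each event carries an attribute map whose keys are its data types. The overlap measure (shared keys divided by all keys of the two events) and the averaging over all direct successions of A and B are illustrative assumptions; `data_type_correlation` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Tuple[str, Dict[str, object]]   # (event class, attribute map)
Edge = Tuple[str, str]


def data_type_correlation(log: List[List[Event]]) -> Dict[Edge, float]:
    """Hypothetical sketch: average share of attribute keys common to an event
    of class A and its direct successor of class B.

    The overlap |K1 & K2| / |K1 | K2| is an illustrative assumption.
    """
    totals: Dict[Edge, float] = defaultdict(float)
    counts: Dict[Edge, int] = defaultdict(int)
    for trace in log:
        for (a, attrs_a), (b, attrs_b) in zip(trace, trace[1:]):
            keys_a, keys_b = set(attrs_a), set(attrs_b)
            union = keys_a | keys_b
            overlap = len(keys_a & keys_b) / len(union) if union else 0.0
            totals[(a, b)] += overlap
            counts[(a, b)] += 1
    return {edge: totals[edge] / counts[edge] for edge in counts}


log = [[("Order", {"id": 1, "amount": 9}), ("Ship", {"id": 1, "amount": 9}),
        ("Mail", {"to": "x"})]]
print(data_type_correlation(log))  # {('Order', 'Ship'): 1.0, ('Ship', 'Mail'): 0.0}
```

Order and Ship share all their attribute keys and therefore count as highly correlated, whereas Ship and Mail share none.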

(Open Access) Fuzzy mining: adaptive process simplification based on multi-perspective metrics (2007) | Christian W. Günther

Fuzzy mining - adaptive process simplification based on multi-perspective metrics

Citation for published version (APA):

Günther, C. W., & Aalst, van der, W. M. P. (2007). Fuzzy mining - adaptive process simplification based on multi-perspective metrics. In G. Alonso, P. Dadam, & M. Rosemann (Eds.), Proceedings of the 5th International Conference on Business Process Management (BPM 2007) 24-28 September 2007, Brisbane, Australia (pp. 328-343). (Lecture Notes in Computer Science; Vol. 4714). Springer. https://doi.org/10.1007/978-3-540-75183-0_24

DOI: 10.1007/978-3-540-75183-0_24

Document status and date: Published: 01/01/2007

Document Version: Publisher’s PDF, also known as Version of Record (includes final page, issue and volume numbers)


Fuzzy Mining – Adaptive Process Simplification Based on Multi-perspective Metrics

Christian W. Günther and Wil M.P. van der Aalst

Eindhoven University of Technology

P.O. Box 513, NL-5600 MB, Eindhoven, The Netherlands

{c.w.gunther, w.m.p.v.d.aalst}@tue.nl

Abstract. Process Mining is a technique for extracting process models from execution logs. This is particularly useful in situations where people have an idealized view of reality. Real-life processes turn out to be less structured than people tend to believe. Unfortunately, traditional process mining approaches have problems dealing with unstructured processes. The discovered models are often “spaghetti-like”, showing all details without distinguishing what is important and what is not. This paper proposes a new process mining approach to overcome this problem. The approach is configurable and allows for different faithfully simplified views of a particular process. To do this, the concept of a roadmap is used as a metaphor. Just like different roadmaps provide suitable abstractions of reality, process models should provide meaningful abstractions of operational processes encountered in domains ranging from healthcare and logistics to web services and public administration.

1 Introduction

Business processes, whether defined and prescribed or implicit and ad-hoc, drive and support most of the functions and services in enterprises and administrative bodies of today’s world. For describing such processes, modeling them as graphs has proven to be a useful and intuitive tool. While modeling is well-established in process design, it is complicated to do for monitoring and documentation purposes. However, especially for monitoring, process models are valuable artifacts, because they allow us to communicate complex knowledge in an intuitive, compact, and high-level form.

Process mining is a line of research which attempts to extract such abstract, compact representations of processes from their logs, i.e., execution histories [1,2,3,5,6,7,10,14]. The α-algorithm, for example, can create a Petri net process model from an execution log [2]. In recent years, a number of process mining approaches have been developed which address the various perspectives of a process (e.g., control flow, social network) and use various techniques to generalize from the log (e.g., genetic algorithms, theory of regions [12,4]). Applied to explicitly designed, well-structured, and rigidly enforced processes, these techniques are able to deliver an impressive set of information, yet their purpose is somewhat limited to verifying compliant execution. However, most processes in real life have not been purposefully designed and optimized, but have evolved over time or are not even explicitly defined. In such situations, the application of process mining is far more interesting, as it is not limited to re-discovering what we already know, but can be used to unveil previously hidden knowledge.

G. Alonso, P. Dadam, and M. Rosemann (Eds.): BPM 2007, LNCS 4714, pp. 328–343, 2007.

© Springer-Verlag Berlin Heidelberg 2007


Over the last couple of years we have gained much experience in applying the tried-and-tested set of mining algorithms to real-life processes. Existing algorithms tend to perform well on structured processes, but often fail to provide insightful models for less-structured processes. The phrase “spaghetti models” is often used to refer to the results of such efforts. The problem is not that existing techniques produce incorrect results. In fact, some of the more robust process mining techniques guarantee that the resulting model is “correct” in the sense that reality fits into the model. The problem is that the resulting model shows all details without providing a suitable abstraction. This is comparable to looking at the map of a country where all cities and towns are represented by identical nodes and all roads are depicted in the same manner. The resulting map is correct, but not very suitable. Therefore, the concept of a roadmap is used as a metaphor to visualize the resulting models. Based on an analysis of the log, the importance of activities and of the relations among activities is taken into account. Activities and their relations can be clustered or removed depending on their role in the process. Moreover, certain aspects can be emphasized graphically, just like a roadmap emphasizes highways and large cities over dirt roads and small towns. As will be demonstrated in this paper, the roadmap metaphor allows for meaningful process models.

In this paper we analyze the problems traditional mining algorithms have with less-structured processes (Section 2), and use the metaphor of maps to derive a novel, more appropriate approach from these lessons (Section 3). We abandon the idea of performing process mining confined to one perspective only, and propose a multi-perspective set of log-based process metrics (Section 4). Based on these, we have developed a flexible approach for Fuzzy Mining, i.e., adaptively simplifying mined process models (Section 5).

2 Less-Structured Processes – The Infamous Spaghetti Affair

The fundamental idea of process mining is both simple and persuasive: there is a process which is unknown to us, but we can follow the traces of its behavior, i.e., we have access to enactment logs. Feeding those into a process mining technique will yield an aggregate description of that observed behavior, e.g., in the form of a process model.

In the beginning of process mining research, mostly artificially generated logs were used to develop and verify mining algorithms. Later, logs from real-life workflow management systems, e.g., Staffware, could also be successfully mined with these techniques. Early mining algorithms placed high demands on the quality of log files, e.g., logs were supposed to be complete and limited to events of interest. Yet most of the resulting problems could easily be remedied with more data, by filtering the log, and by tuning the algorithm to better cope with problematic data.

While these successes were certainly convincing, most real-life processes are not executed within rigid, inflexible workflow management systems and the like, which enforce correct, predictable behavior. It is the inherent inflexibility of these systems which drove the majority of process owners (i.e., organizations that need to support processes) to choose more flexible or ad-hoc solutions. Concepts like Adaptive Workflow or Case Handling either allow users to change the process at runtime, or define processes in a somewhat more “loose” manner which does not strictly define a specific path of execution. Yet the most popular solutions for supporting processes do not enforce any defined behavior at all, but merely offer functionality like sharing data and passing messages between users and resources. Examples of such systems are ERP (Enterprise Resource Planning) and CSCW (Computer-Supported Cooperative Work) systems, custom-built solutions, or plain e-mail.

It is obvious that executing a process within such less restrictive environments will lead to more diverse and less-structured behavior. This abundance of observed behavior, however, unveiled a fundamental weakness in most of the early process mining algorithms. When these are used to mine logs from less-structured processes, the result is usually just as unstructured and hard to understand. These “spaghetti” process models do not provide any meaningful abstraction from the event logs themselves, and are therefore useless to process analysts. It is important to note that these “spaghetti” models are not incorrect. The problem is that the processes themselves are really “spaghetti-like”, i.e., the model is an accurate reflection of reality.

Fig. 1. Excerpt of a typical “Spaghetti” process model (ca. 20% of complete model)

An example of such a “spaghetti” model is given in Figure 1. It is noteworthy that this figure shows only a small excerpt (ca. 20%) of a highly unstructured process model. It has been mined from machine test logs using the Heuristics Miner, one of the traditional process mining techniques that is most resilient to noise in logs [14]. Although this result is rather useful, certainly in comparison with other early process mining techniques, it is plain to see that deriving helpful information from it is not easy.
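The Heuristics Miner [14] builds its arcs from a dependency measure between event classes. A commonly cited form is a ⇒ b = (|a>b| − |b>a|) / (|a>b| + |b>a| + 1), where |a>b| counts how often a is directly followed by b; the measure approaches 1 only for relations that are frequent and rarely reversed. A minimal Python sketch follows. The function name is ours, and length-one loops (a ⇒ a) use a different formula in the Heuristics Miner and are not covered.

```python
from collections import Counter
from typing import List


def dependency(log: List[List[str]], a: str, b: str) -> float:
    """Heuristics-Miner-style dependency measure between event classes a and b.

    |a>b| is the number of times a is directly followed by b in the log.
    """
    follows: Counter = Counter()
    for trace in log:
        # count every direct-succession pair (x, y) in the trace
        follows.update(zip(trace, trace[1:]))
    ab, ba = follows[(a, b)], follows[(b, a)]
    return (ab - ba) / (ab + ba + 1)


log = [["A", "B"], ["A", "B"], ["B", "A"], ["A", "B", "C"]]
print(round(dependency(log, "A", "B"), 3))  # 0.4
```

A relation seen once and never reversed scores 1/2 = 0.5, and one seen twice scores 2/3 ≈ 0.667, so the measure only grows slowly with evidence.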

Event classes found in the log are interpreted as activity nodes in the process model. Their sheer number makes it difficult to focus on the interesting parts of the process. The abundance of arcs in the model, which constitute the actual “spaghetti”, introduces an even greater challenge for interpretation. Separating cause from effect, or determining the general direction in which the process is executed, is not possible because virtually every node is transitively connected to every other node in both directions. This mirrors the crux of flexibility in process execution: when people are free to execute anything in any given order, they will usually make use of this freedom, which renders monitoring business activities an essentially infeasible task.
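The claim that flexibility makes nearly every node transitively connected to every other can be checked on a directly-follows graph. In the hypothetical sketch below, two identical structured traces yield a one-directional chain, while a single extra trace executed in an unusual order (C directly followed by A) is enough to make every event class reachable from every other one. The function names are illustrative.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set


def reachable(graph: Dict[str, Set[str]], start: str) -> Set[str]:
    """All nodes reachable from `start` via one or more arcs (breadth-first)."""
    seen: Set[str] = set()
    queue = deque(graph[start])
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(graph[node])
    return seen


def all_mutually_reachable(log: List[List[str]]) -> bool:
    """True if every pair of distinct event classes is connected in both
    directions in the directly-follows graph built from `log`."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            graph[a].add(b)
        for event in trace:
            graph.setdefault(event, set())  # register classes without successors
    nodes = set(graph)
    return all(nodes - {n} <= reachable(graph, n) for n in nodes)


structured = [["A", "B", "C"], ["A", "B", "C"]]
flexible = structured + [["C", "A"]]
print(all_mutually_reachable(structured), all_mutually_reachable(flexible))  # False True
```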


We argue that the fault for these problems lies neither with less-structured processes, nor with process mining itself. Rather, it is the result of a number of, mostly implicit, assumptions which process mining has historically made, both with respect to the event logs under consideration, and regarding the processes which have generated them. While being perfectly sound in structured, controlled environments, these assumptions do not hold in less-structured, real-life environments, and thus ultimately make traditional process mining fail there.

Assumption 1: All logs are reliable and trustworthy. Any event type found in the log is assumed to have a corresponding logical activity in the process. However, activities in real-life processes may raise a random number of seemingly unrelated events. Activities may also go unrecorded, while other events do not correspond to any activity at all.

The assumption that logs are well-formed and homogeneous is also often not true. For example, a process found in the log is assumed to correspond to one logical entity. In less-structured environments, however, there are often a number of “tacit” process types which are executed, and thus logged, under the same name.

Also, the idea that all events are raised on the same level of abstraction, and are thus equally important, is not true in real-life settings. Events on different levels are “flattened” into the same event log, while there is also a large number of informational events (e.g., debug messages from the system) which need to be disregarded.

Assumption 2: There exists an exact process which is reflected in the logs. This assumption implies that there is the one perfect solution out there, which needs to be found. Consequently, the mining result should model the process completely, accurately, and precisely. However, as stated before, spaghetti models are not necessarily incorrect: the models look like spaghetti because they precisely describe every detail of the less-structured behavior found in the log. A more high-level solution, which is able to abstract from details, would thus be preferable.

Traditional mining algorithms have also been confined to a single perspective (e.g., control flow, data), as such an isolated view is supposed to yield higher precision. However, perspectives interact in less-structured processes, e.g., the data flow may complement the control flow, and thus also needs to be taken into account.

In general, the assumption of a perfect solution is not well-suited for real-life application. Reality often differs significantly from theory, in ways that had not been anticipated. Consequently, useful tools for practical application must be explorative, i.e., they must allow analysts to tweak results and thus capitalize on their knowledge.

We have conducted process mining case studies in organizations like Philips Medical Systems, UWV, Rijkswaterstaat, the Catharina Hospital Eindhoven and the AMC hospital Amsterdam, and the Dutch municipalities of Alkmaar and Heusden. Our experiences in these case studies have shown the above assumptions to be violated in all ways imaginable. Therefore, to make process mining a useful tool in practical, less-structured settings, these assumptions need to be discarded. The next section introduces the main concept of our mining approach, which takes these lessons into account.


