User-guided discovery of declarative process models

TL;DR: In this paper, the authors use DECLARE, a declarative language that provides more flexibility than conventional procedural notations such as BPMN, Petri nets, UML ADs, EPCs and BPEL.
Abstract: Process mining techniques can be used to effectively discover process models from logs with example behaviour. Cross-correlating a discovered model with information in the log can be used to improve the underlying process. However, existing process discovery techniques have two important drawbacks. The produced models tend to be large and complex, especially in flexible environments where process executions involve multiple alternatives. This “overload” of information is caused by the fact that traditional discovery techniques construct procedural models explicitly showing all possible behaviours. Moreover, existing techniques offer limited possibilities to guide the mining process towards specific properties of interest. These problems can be solved by discovering declarative models. Using a declarative model, the discovered process behaviour is described as a (compact) set of rules. Moreover, the discovery of such models can easily be guided in terms of rule templates. This paper uses DECLARE, a declarative language that provides more flexibility than conventional procedural notations such as BPMN, Petri nets, UML ADs, EPCs and BPEL. We present an approach to automatically discover DECLARE models. This has been implemented in the process mining tool ProM. Our approach and toolset have been applied to a case study provided by the company Thales in the domain of maritime safety and security.

Summary

I. INTRODUCTION

  • These problems can be solved by discovering declarative models.
  • Logs provide detailed information about systems and human behaviour.
  • Instead of explicitly specifying all the allowed sequences of events in a business process, the possible ordering of events is implicitly specified with constraints, i.e., rules that must be followed during execution.
  • To solve this problem the authors propose to apply the truncated semantics introduced in [19] to discover significant DECLARE constraints also from truncated process instances.

II. PRELIMINARIES

  • DECLARE is a declarative language proposed by Pesic and Van der Aalst in [8] [9].
  • Templates are abstract entities that define parameterised classes of properties, and constraints are their concrete instantiations.
  • The co-existence(A,B) template specifies that if one of the events A or B occurs, the other one should also occur.
  • Finally, the succession(A,B) template requires that both response and precedence relations hold between the events A and B. Templates alternate response, alternate precedence and alternate succession strengthen the above templates by specifying that events must alternate without repetitions of these events in between.

III. APPROACH

  • The starting point of their work was a case study concerned with the monitoring of vessel behaviour in the domain of maritime safety and security.
  • The first step of their discovery algorithm is to generate a DECLARE model Dcandidates consisting of candidate DECLARE constraints.
  • At the end of the checking phase, a filtered LTL model L including the remaining LTL rules is available.
  • If PoE = 50% the discovered constraints will only involve 50% of the event classes in the log (the most frequent ones).
  • The DECLARE Miner generates a DECLARE model object by using the algorithm described in subsection III-A.

IV. ADVANCED MINING TECHNIQUES

  • The authors introduce two advanced techniques to support the discovery of DECLARE models: the truncated semantics for LTL formulas and the LTL vacuity detection.
  • Often the available logs are extracted from larger logs and the process instances are prefixes of larger process instances.
  • In a truncated path the truth value of an LTL formula can be non-definitive (i.e., temporarily violated or temporarily satisfied).
  • To address this problem, [19] introduces a strong semantics and a weak semantics for LTL formulas where a formula is evaluated to false and true respectively if its truth value is non-definitive.
  • A typical example of a vacuously satisfied constraint is given by the formula “every request is eventually acknowledged” in a process instance that does not contain requests.

V. CASE STUDY

  • The authors present the results of the DECLARE mining phase of a larger case study on the monitoring of vessel behaviour.
  • In the considered log, each process instance corresponds to a specific vessel.
  • TABLE V shows the settings for an experiment aimed at discovering chain response constraints for vessel type passenger ship with a PoI of 100% (i.e., constraints satisfied by all instances).
  • TABLE VII shows the results for this experiment.
  • The authors specify the percentage of nonvacuously satisfied instances, i.e., interesting witnesses, and also the PoI parameter, i.e., the percentage of process instances where the constraint is (vacuously or non-vacuously) satisfied.

VI. CONCLUSION

  • The authors have introduced a novel approach to discover declarative models from logs that allows users to guide the discovery process towards specific properties.
  • Moreover, the authors have shown how results on truncated semantics can be used to obtain significant results in the case that only partial logs are available.
  • The authors have also applied results on vacuity detection to identify, for each discovered constraint, the percentage of interesting witnesses, i.e., process instances where the constraint is non-trivially valid.
  • Also, given a constraint and a process instance where it is violated, the level of “healthiness” of the process instance can be evaluated based on the number of violations.

Citation for published version (APA):
Maggi, F. M., Mooij, A. J., & Aalst, van der, W. M. P. (2011). User-guided discovery of declarative process models. In N. Chawla, I. King, & A. Sperduti (Eds.), Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining (CIDM 2011, Paris, France, April 11-15, 2011) (pp. 192-199). Institute of Electrical and Electronics Engineers. https://doi.org/10.1109/CIDM.2011.5949297
Document status and date: Published: 01/01/2011

User-Guided Discovery of
Declarative Process Models
Fabrizio M. Maggi, Arjan J. Mooij and Wil M.P. van der Aalst
Department of Mathematics and Computer Science, Eindhoven University of Technology - The Netherlands
Email: {f.m.maggi, a.j.mooij, w.m.p.v.d.aalst}@tue.nl
I. INTRODUCTION
More and more event data become available as informa-
tion technology becomes more pervasive. The tight coupling
between processes and supporting information systems is gen-
erating unprecedented amounts of data. Logs provide detailed
information about systems and human behaviour. Therefore,
it is possible to evaluate whether observed behaviour is con-
sistent with pre-defined standards or not.
A log consists of a set of process instances, and each
process instance is described by a sequence of events. Often
a log also contains further information. It can specify, for
instance, a timestamp to indicate the time when an event has
been recorded, an originator, i.e., the agent (human or system
application) triggering the event, and other additional data.
Over the last decade, a variety of techniques and algorithms
has been proposed for mining process models from logs [1].
These methods demonstrate that logs can be used to construct
models underlying a process execution from scratch (i.e.,
process discovery [2] [3] [4] [5]), or to identify discrepancies
between logs and a given predefined process model (i.e.,
conformance testing [6]).
Traditional discovery techniques focus on the extraction of
procedural models where all possible orderings of events must
be specified explicitly. A consequence of this characteristic
is that when applying them to real life logs (especially if
generated in environments with a lot of variability), they
often produce spaghetti-like models that tend to be completely
unreadable.
In the literature, several approaches are described [4] [7]
to filter the information contained in a log and to simplify
less-structured processes. However, they do not allow analysts
to guide the discovery process to specific properties they are
interested in, e.g., events that cannot occur in sequence, events
that always coexist in the same process instance, events that
must eventually occur when a given event “a” occurs etc.
These properties are simple and help to compactly represent
complex behaviours by focusing on specific aspects.
Constraint-based process modeling aims at representing
process models in a declarative way. Instead of explicitly
specifying all the allowed sequences of events in a business
process, the possible ordering of events is implicitly specified
with constraints, i.e., rules that must be followed during
execution. Anything that does not permanently violate these
constraints is possible.
In this paper, we use the declarative language DECLARE
[8] [9] [10], which is characterised by templates and which
is based on LTL semantics. We propose an approach for the
discovery of DECLARE models allowing analysts to specify
which kinds of templates they are interested in. This feature
allows analysts to shape the discovery process to extract the
properties that are most relevant for them.
Recent publications [11] [12] [15] introduce other tech-
niques for declarative process discovery. In these works,
SCIFF, an extension of logic programming, is used to specify
declarative process models. In particular, the algorithm from
[15] repeatedly performs a beam search on all candidate con-
straints. Each iteration of the algorithm produces a constraint
that best fits both the positive traces and the negative traces that
are not excluded by the previously discovered constraints. The
main difference between these techniques and our approach is
that process discovery using SCIFF is based on the assumption
that both compliant and non-compliant traces of execution are
provided. In contrast, our approach can also be applied when
only positive interpretations are available. This represents an
advantage of our discovery technique, since, in real life, logs
hardly provide clearly-marked negative information.
Apriori-like approaches such as sequence mining [13] and
episode mining [14] can discover local patterns in a log,
but they cannot generate an overall process model from it.
Using such approaches, it is not possible to discover rules
representing negative behaviours (what should not happen)
and choices. In general, the LTL rules underlying DECLARE
templates have more expressive power than the sequences used
in sequence mining and the partial orders used in episode
mining.
Our proposed approach has been implemented as a plug-in
in the process mining tool ProM. This plug-in has been applied
to a case study in the domain of maritime safety and security.
In this case study, we discover the behaviour of several types of
vessels starting from the data recorded by electronic sensors.
Based on practical experiences in this case study, we
developed a new mechanism to deal with noise. Moreover, these
experiences also triggered adaptations to the algorithm in order
to achieve good performance. The case study also showed
that, in order to obtain significant results, it is necessary to
address two additional requirements.
First of all, a log is often composed of prefixes of larger
process instances. This characteristic can affect the discovered
DECLARE models because some constraints can be temporar-
ily violated on the available part of a process instance but
satisfied on its continuation. To solve this problem we propose
to apply the truncated semantics introduced in [19] to discover
significant DECLARE constraints also from truncated process
instances.
Secondly, using the standard LTL semantics, a DECLARE
constraint is discovered when it is non-violated so that it is
also discovered when it is trivially valid, i.e., it is independent
of the process instances in the log. In this case, the discov-
ered constraint does not capture the desired behaviour. We
propose to use techniques for LTL vacuity detection [21] [22]
[20] to discriminate between instances where a constraint is
generically non-violated and instances where the constraint is
non-trivially valid.
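The two requirements above can be illustrated with a minimal Python sketch. It is not the DECLARE Miner implementation, and it does not reproduce the general techniques of [19] or [20] [21] [22]; it only hard-codes two templates. Traces are assumed to be plain lists of event names, and all function names and the example events are hypothetical. For truncation, a chain response constraint on a prefix is either already violated or non-definitive; a non-definitive verdict is read as true under a weak view and as false under a strong view. For vacuity, a response constraint is treated as vacuously satisfied when its first parameter never occurs in the trace.

```python
def chain_response_on_prefix(prefix, a, b):
    """Return 'violated' if some a is already followed by a non-b event,
    otherwise 'non-definitive' (the unseen continuation decides)."""
    for i in range(len(prefix) - 1):
        if prefix[i] == a and prefix[i + 1] != b:
            return "violated"
    return "non-definitive"


def holds_on_prefix(prefix, a, b, view):
    """Map the verdict to a truth value; view is 'weak' or 'strong'."""
    if view not in ("weak", "strong"):
        raise ValueError("view must be 'weak' or 'strong'")
    if chain_response_on_prefix(prefix, a, b) == "violated":
        return False
    # Non-definitive: true under the weak view, false under the strong view.
    return view == "weak"


def response_holds(trace, a, b):
    """response(a, b) on a complete trace, assuming a != b."""
    pending = False
    for event in trace:
        if event == a:
            pending = True
        elif event == b:
            pending = False
    return not pending


def classify_response(trace, a, b):
    """Separate violated, vacuously and non-vacuously satisfied traces."""
    if not response_holds(trace, a, b):
        return "violated"
    if a not in trace:
        return "vacuously satisfied"
    return "non-vacuously satisfied"


# Hypothetical events: the prefix is cut right after an occurrence of "A".
print(holds_on_prefix(["A", "B", "A"], "A", "B", "weak"))    # True
print(holds_on_prefix(["A", "B", "A"], "A", "B", "strong"))  # False
print(classify_response(["other"], "request", "ack"))        # vacuously satisfied
```

Only traces classified as non-vacuously satisfied would count as interesting witnesses in this simplified reading.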
This paper is organised as follows. Section II introduces the
DECLARE formalism. Section III describes the main features
of our approach and its implementation in ProM. Section IV
explains how truncated LTL semantics and LTL vacuity detec-
tion can be used to improve the discovery process. Section V
provides an illustrative case study. Section VI concludes the
paper.
II. PRELIMINARIES
DECLARE is a declarative language proposed by Pesic and
Van der Aalst in [8] [9]. The language has been developed to
fulfill two important criteria for a process modeling language:
it must be understandable for end-users and it must have a
formal semantics in order to be verifiable and executable.
A DECLARE model consists of a set of constraints which,
in turn, are based on templates. Templates are abstract entities
that define parameterised classes of properties, and constraints
are their concrete instantiations. Templates have a user-friendly
graphical representation understandable to the user and their
semantics are specified through LTL formulas (see TABLE I
for the semantics of the LTL operators). Each constraint
inherits the graphical representation and semantics from its
template. These features allow DECLARE to meet both crite-
ria mentioned before.
TABLE I
LTL OPERATORS SEMANTICS
operator | semantics
○ϕ | ϕ has to hold in the next position of a path.
□ϕ | ϕ has to hold always in the subsequent positions of a path.
◇ϕ | ϕ has to hold eventually (somewhere) in the subsequent positions of a path.
ϕUψ | ϕ has to hold in a path at least until ψ holds. ψ must hold in the current or in a future position.
DECLARE is a declarative language because, instead of
explicitly specifying the flow of the interactions among process
events, it describes a set of constraints which must be satisfied
throughout the process execution. In comparison with procedural
approaches that produce “closed” models, i.e., everything that
is not explicitly specified is forbidden, DECLARE models are
“open” and tend to offer more possibilities for execution. In
this way, DECLARE supports flexibility.
The DECLARE templates characterise the language and
highlight different features which are worth being specified
in a process model. In particular, the templates are classified
according to four groups: existence, relation, negative relation
and choice. The first three groups are described in the following
subsections. For the sake of brevity, we do not describe the
choice templates and refer to [10] for more information about
the DECLARE templates.
A. Existence Templates
The existence templates involve only one event (unary rela-
tionship) and define the cardinality or the position of an event
in a process instance. Templates of the type existence(n, A)
specify that A should occur at least n times in a process
instance. In contrast, templates of the type absence(n +1,A)
specify that A should occur at most n times. Templates
exactly(n, A) indicate that A should occur exactly n times.
Finally, init(A) specifies that each process instance should
start with event A. The graphical notation and LTL semantics
are shown in TABLE II.
TABLE II
EXISTENCE TEMPLATES
name of template | LTL semantics
existence(1, A) | ◇A
existence(2, A) | ◇(A ∧ ○(existence(1, A)))
... | ...
existence(n, A) | ◇(A ∧ ○(existence(n − 1, A)))
absence(A) | ¬existence(1, A)
absence(2, A) | ¬existence(2, A)
absence(3, A) | ¬existence(3, A)
... | ...
absence(n + 1, A) | ¬existence(n + 1, A)
exactly(1, A) | existence(1, A) ∧ absence(2, A)
exactly(2, A) | existence(2, A) ∧ absence(3, A)
... | ...
exactly(n, A) | existence(n, A) ∧ absence(n + 1, A)
init(A) | A
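On a complete finite trace, the existence templates reduce to counting occurrences. The following Python sketch shows this reading; it assumes a trace is a list of event names, and the function names and argument order are illustrative rather than taken from any DECLARE tool.

```python
def existence(n, a, trace):
    """existence(n, A): A occurs at least n times."""
    return trace.count(a) >= n


def absence(n, a, trace):
    """absence(n, A) = not existence(n, A): A occurs fewer than n times,
    so absence(n + 1, A) means 'at most n times'."""
    return not existence(n, a, trace)


def exactly(n, a, trace):
    """exactly(n, A) = existence(n, A) and absence(n + 1, A)."""
    return existence(n, a, trace) and absence(n + 1, a, trace)


def init(a, trace):
    """init(A): the trace starts with A."""
    return len(trace) > 0 and trace[0] == a


trace = ["A", "B", "A", "C"]
print(existence(2, "A", trace))  # True
print(absence(3, "A", trace))    # True
print(exactly(2, "A", trace))    # True
print(init("B", trace))          # False
```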
B. Relation Templates
Whereas an existence template describes the cardinality of
one event, a relation template defines a dependency between
two events. The responded existence(A, B) template specifies
that if event A occurs, event B should also occur (either
before or after event A). The co-existence(A, B) template
specifies that if one of the events A or B occurs, the other one
should also occur.
The response(A, B) template specifies that when event
A occurs, event B should eventually occur after A. In contrast,
the precedence(A, B) template indicates that event B
should occur only if event A has occurred before. Finally, the
succession(A, B) template requires that both response and
precedence relations hold between the events A and B.
Templates alternate response, alternate precedence and alternate
succession strengthen the above templates by specifying
that events must alternate without repetitions of these events
in between. Even stricter ordering relations are specified
by templates chain response, chain precedence and chain
succession. These templates require that the occurrences of the
two events (A and B) are next to each other. See TABLE III
for notation and semantics.
TABLE III
RELATION TEMPLATES
name of template | LTL semantics
responded existence(A, B) | ◇A ⇒ ◇B
co-existence(A, B) | ◇A ⇔ ◇B
response(A, B) | □(A ⇒ ◇B)
precedence(A, B) | (¬B U A) ∨ □(¬B)
succession(A, B) | response(A, B) ∧ precedence(A, B)
alternate response(A, B) | □(A ⇒ ○(¬A U B))
alternate precedence(A, B) | precedence(A, B) ∧ □(B ⇒ ○(precedence(A, B)))
alternate succession(A, B) | alternate response(A, B) ∧ alternate precedence(A, B)
chain response(A, B) | □(A ⇒ ○B)
chain precedence(A, B) | □(○B ⇒ A)
chain succession(A, B) | □(A ⇔ ○B)
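As an illustration of how some of these relation templates can be evaluated, the sketch below checks them directly on a complete finite trace instead of going through an LTL checker. It assumes traces are lists of event names and that A and B are distinct; the function names are hypothetical.

```python
def responded_existence(t, a, b):
    """If a occurs, b occurs as well (before or after)."""
    return a not in t or b in t


def response(t, a, b):
    """Every a is eventually followed by a b."""
    return all(b in t[i + 1:] for i, e in enumerate(t) if e == a)


def precedence(t, a, b):
    """Every b is preceded by some earlier a."""
    return all(a in t[:i] for i, e in enumerate(t) if e == b)


def succession(t, a, b):
    return response(t, a, b) and precedence(t, a, b)


def alternate_response(t, a, b):
    """After each a, a b must occur before the next a."""
    for i, e in enumerate(t):
        if e == a:
            nxt = next((x for x in t[i + 1:] if x in (a, b)), None)
            if nxt != b:
                return False
    return True


def chain_response(t, a, b):
    """Every a is immediately followed by b (a trailing a violates it)."""
    return all(i + 1 < len(t) and t[i + 1] == b
               for i, e in enumerate(t) if e == a)


def chain_precedence(t, a, b):
    """Every b that has a predecessor is immediately preceded by a."""
    return all(t[i - 1] == a for i, e in enumerate(t) if e == b and i > 0)


print(succession(["A", "C", "B"], "A", "B"))          # True
print(chain_response(["A", "C", "B"], "A", "B"))      # False
print(alternate_response(["A", "A", "B"], "A", "B"))  # False
```

The example shows how the chain and alternate variants reject traces that the plain succession template accepts.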
C. Negative Relation Templates
The not co-existence(A, B) template indicates that events A
and B cannot occur together in the same process instance.
According to the not succession(A, B) template, any occurrence
of A cannot be followed eventually by B. Finally, according
to the not chain succession(A, B) template, A cannot be directly
followed by B. Negative relation templates are summarised
in TABLE IV.
TABLE IV
NEGATIVE RELATION TEMPLATES
name of template | LTL semantics
not co-existence(A, B) | ¬(◇A ∧ ◇B)
not succession(A, B) | □(A ⇒ ¬(◇B))
not chain succession(A, B) | □(A ⇒ ○(¬B))
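The negative relation templates can be sketched in the same direct style. This Python fragment assumes complete finite traces given as lists of event names, with hypothetical function names. It follows the informal reading in the text, so an occurrence of A at the very end of a trace is treated as satisfying not chain succession; that boundary behaviour is an assumption of this sketch.

```python
def not_co_existence(t, a, b):
    """a and b never occur together in the same trace."""
    return not (a in t and b in t)


def not_succession(t, a, b):
    """No occurrence of a is followed, at any later point, by b."""
    return all(b not in t[i + 1:] for i, e in enumerate(t) if e == a)


def not_chain_succession(t, a, b):
    """No occurrence of a is directly followed by b.
    Assumption: an a in the last position is not a violation."""
    return all(not (i + 1 < len(t) and t[i + 1] == b)
               for i, e in enumerate(t) if e == a)


trace = ["A", "C", "B"]
print(not_co_existence(trace, "A", "B"))      # False
print(not_succession(trace, "A", "B"))        # False
print(not_chain_succession(trace, "A", "B"))  # True
```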
III. APPROACH
The starting point of our work was a case study concerned
with the monitoring of vessel behaviour in the domain of
maritime safety and security. Fig. 1 shows how our approach
to mining DECLARE models is embedded in the general
approach used for the case study.
Fig. 1. DECLARE Mining within a general approach for vessel monitoring
Our proposed approach for the discovery of DECLARE
models is used in the first phase of the case study to build
a declarative reference model (representing the normal be-
haviour of vessels) starting from historical logs. In this phase,
users can specify which DECLARE templates will be used to
generate the DECLARE constraints in the discovered model.
Accordingly, they can choose which aspects of the vessel be-
haviour they want to highlight through the discovery process.
The discovered DECLARE constraints can be validated and
improved by domain experts to completely fit their needs and
build a proper reference model.
In a second phase (LTL constraints generation), the resulting
DECLARE constraints are translated into LTL constraints. In
the last phase (LTL conformance checking), live logs from
vessels are checked w.r.t. the obtained LTL constraints by
applying approaches for static or run-time LTL checking [16]
[17].
In the remainder of this section, we present our approach for
the DECLARE mining. First, we introduce a core algorithm
to discover DECLARE models, then we extend it with some
additional parameters and describe the implementation in
ProM.

A. Core Discovery Algorithm
For discovering DECLARE models we need (1) a set $T$ of
DECLARE templates and (2) a (historical) log $W$. The first
step of our discovery algorithm is to generate a DECLARE
model $D_{candidates}$ consisting of candidate DECLARE
constraints. Let $E = \{e_1, \ldots, e_n\}$ be the set of event classes
belonging to $W$. $D_{candidates}$ is generated by instantiating each
DECLARE template $t(a_1, \ldots, a_k)$ in $T$ with all the possible
dispositions of length $k$ (i.e., ordered $k$-tuples, repetitions
allowed) of the $n$ event classes $e_1, \ldots, e_n$. Each
template with $k$ parameters produces $n^k$ potential constraints
in the model so that the number of constraints in $D_{candidates}$
is $h = \sum_{t \in T} n^{k_t}$ where $k_t$ is the number of parameters of $t$.
In the second step of the discovery algorithm, $D_{candidates}$
is translated into an LTL model $L_{candidates}$. This model is
composed of a list of LTL rules corresponding to $D_{candidates}$.
Each rule $l$ in $L_{candidates}$ is then checked w.r.t. $W$ (we
use the algorithm described in [17]) to decide whether it is
satisfied in $W$ or not. If $l$ is not satisfied, it is removed from
the model. At the end of the checking phase, a filtered LTL
model $L$ including the remaining LTL rules is available.
Finally, the model $L$ is translated into a DECLARE model
$D$ which is the result of the discovery process.
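The generate-and-filter structure of the core algorithm can be sketched in a few lines of Python. This is a simplification under stated assumptions: it skips the translation to LTL and the checker of [17], and instead evaluates each candidate with a plain Python predicate on complete traces. The names `TEMPLATES`, `candidates` and `discover` are hypothetical, and only two templates are included.

```python
from itertools import product


def existence(trace, a):
    """existence(1, A): A occurs at least once."""
    return a in trace


def response(trace, a, b):
    """response(A, B): every A is eventually followed by a B."""
    return all(b in trace[i + 1:] for i, e in enumerate(trace) if e == a)


# Template name -> (number of parameters k, checking function).
TEMPLATES = {"existence": (1, existence), "response": (2, response)}


def candidates(log, templates):
    """Instantiate every template with every k-tuple of event classes."""
    event_classes = sorted({e for trace in log for e in trace})
    for name, (k, check) in templates.items():
        for params in product(event_classes, repeat=k):
            yield name, params, check


def discover(log, templates):
    """Keep the candidates that are satisfied by every trace in the log."""
    return [
        (name, params)
        for name, params, check in candidates(log, templates)
        if all(check(trace, *params) for trace in log)
    ]


log = [["A", "B", "C"], ["A", "C", "B"]]
print(len(list(candidates(log, TEMPLATES))))  # 3**1 + 3**2 = 12
print(discover(log, TEMPLATES))
```

With three event classes, the one-parameter template contributes 3 candidates and the two-parameter template 9, matching the count of n^k candidates per template.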
B. Additional Mining Parameters
To apply the algorithm from subsection III-A to real life
logs, we need to deal with the time complexity of the algorithm
and the noise in the logs. To address these issues, we introduce
some additional parameters to tune the discovery process.
The parameter Percentage of Events (PoE) can be used to
avoid the discovery of less-relevant constraints referring to
event classes which rarely occur in the log. This parameter
specifies the percentage of the event classes to be used to
generate the candidate constraints. For instance, if PoE = 50%
the discovered constraints will only involve 50% of the event
classes in the log (the most frequent ones). This parameter
also has a positive effect on the time complexity of the
algorithm. The number of candidate constraints in $D_{candidates}$,
as explained in subsection III-A, can be very large. For
instance, for a log including 30 event classes and a single
template with 4 parameters, 30^4 = 810,000 candidate constraints
are generated and checked. Using PoE = 50%, the number of event
classes is reduced to the 15 most frequently-occurring ones;
thus, we only need to consider 15^4 = 50,625 candidate constraints,
16 times fewer than before.
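The effect of PoE on the candidate count can be checked with a short Python sketch. The helper names are hypothetical, and rounding the number of retained event classes up is an assumption of this sketch; the text does not say how fractional counts are handled.

```python
import math
from collections import Counter


def most_frequent_event_classes(log, poe_percent):
    """Keep the given percentage of event classes, most frequent first.
    Assumption: a fractional number of classes is rounded up."""
    counts = Counter(event for trace in log for event in trace)
    keep = math.ceil(len(counts) * poe_percent / 100)
    return [event for event, _ in counts.most_common(keep)]


def candidate_count(n_event_classes, template_arities):
    """Number of candidates: sum of n**k over the selected templates."""
    return sum(n_event_classes ** k for k in template_arities)


print(candidate_count(30, [4]))  # 810000
print(candidate_count(15, [4]))  # 50625
print(candidate_count(30, [4]) // candidate_count(15, [4]))  # 16

log = [["a", "a", "b"], ["a", "c", "b", "d"]]
print(most_frequent_event_classes(log, 50))  # ['a', 'b']
```

Halving the number of event classes divides the candidate count of a k-parameter template by 2^k, which is the factor 16 for k = 4.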
The parameter Percentage of Instances (PoI) can be used
to specify that a DECLARE constraint can still be discovered
even if it does not hold for all process instances of the log.
For instance, if PoI = 80%, a constraint will be discovered
if at least 80% of the process instances satisfy the constraint.
This parameter is useful in case of noisy logs, where rules are
violated in exceptional cases, but hold for most cases.
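The PoI threshold amounts to a support test over the process instances. The sketch below is one possible reading, with a hypothetical function name and a constraint passed in as a Python predicate on a trace; it uses integer arithmetic to avoid floating-point comparisons at the threshold.

```python
def satisfies_poi(log, check, poi_percent):
    """True if at least poi_percent of the traces satisfy the constraint."""
    satisfied = sum(1 for trace in log if check(trace))
    return satisfied * 100 >= poi_percent * len(log)


def responded_existence_a_b(trace):
    """Illustrative constraint: if "A" occurs, "B" occurs as well."""
    return "A" not in trace or "B" in trace


# Four of the five traces satisfy the constraint; the last one is noise.
log = [["A", "B"], ["B", "A"], ["C"], ["A", "C", "B"], ["A", "C"]]
print(satisfies_poi(log, responded_existence_a_b, 80))   # True
print(satisfies_poi(log, responded_existence_a_b, 100))  # False
```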
C. Implementation
All phases of the approach shown in Fig. 1 are supported
by ProM plug-ins (available at
http://www.win.tue.nl/declare/declare-miner/).
The DECLARE mining phase is implemented as the DECLARE
Miner. The DECLARE Miner takes as input two
ProM objects representing the (historical) log and the set of
available DECLARE templates. Before starting the discovery,
the DECLARE Miner allows users to specify which templates
will be used to generate the DECLARE constraints in the
discovered model. Moreover, users can set (for each template)
the parameters described in subsection III-B to tune the
discovery process according to their specific needs (Fig. 2).
The DECLARE Miner generates a DECLARE model object
by using the algorithm described in subsection III-A.
Fig. 2. GUI of the DECLARE Miner
The LTL constraints generation phase is supported by the
DECLARE2LTL plug-in. It takes as input a DECLARE model
object and generates the corresponding LTL model object. This
object can be used as an input of the LTL Checker to check
the conformance of live logs w.r.t. the discovered model (LTL
conformance checking phase).
IV. ADVANCED MINING TECHNIQUES
In this section, we introduce two advanced techniques to
support the discovery of DECLARE models: the truncated
semantics for LTL formulas and the LTL vacuity detection.
These techniques are typically applied in the field of model
checking, but we have modified them for process discovery.
A. Truncated Semantics
In our core algorithm, a (candidate) constraint is discovered
if it is satisfied in a given percentage of the process instances.
However, often the available logs are extracted from larger
logs and the process instances are prefixes of larger process
instances. This affects the discovered constraints.
For instance, the semantics of the chain response template
is defined as:
(A ⇒B).
This means that, for every occurrence of A, a next event
exists and this event is B.IfA is the last event of the

Citations
More filters
Book ChapterDOI
25 Jun 2012
TL;DR: An Apriori algorithm is developed that can quickly generate understandable Declare models for real-life event logs based on event logs by reducing the search space of the Declare model.
Abstract: Process mining techniques often reveal that real-life processes are more variable than anticipated. Although declarative process models are more suitable for less structured processes, most discovery techniques generate conventional procedural models. In this paper, we focus on discovering Declare models based on event logs. A Declare model is composed of temporal constraints. Despite the suitability of declarative process models for less structured processes, their discovery is far from trivial. Even for smaller processes there are many potential constraints. Moreover, there may be many constraints that are trivially true and that do not characterize the process well. Naively checking all possible constraints is computationally intractable and may lead to models with an excessive number of constraints. Therefore, we have developed an Apriori algorithm to reduce the search space. Moreover, we use new metrics to prune the model. As a result, we can quickly generate understandable Declare models for real-life event logs.

147 citations

Journal ArticleDOI
TL;DR: This work presents Mobucon EC, a novel monitoring framework that tracks streams of events and continuously determines the state of business constraints, using the declarative language Declare, which has been suitably extended to support quantitative time constraints and non-atomic, durative activities.
Abstract: Today, large business processes are composed of smaller, autonomous, interconnected subsystems, achieving modularity and robustness. Quite often, these large processes comprise software components as well as human actors, they face highly dynamic environments and their subsystems are updated and evolve independently of each other. Due to their dynamic nature and complexity, it might be difficult, if not impossible, to ensure at design-time that such systems will always exhibit the desired/expected behaviors. This, in turn, triggers the need for runtime verification and monitoring facilities. These are needed to check whether the actual behavior complies with expected business constraints, internal/external regulations and desired best practices. In this work, we present Mobucon EC, a novel monitoring framework that tracks streams of events and continuously determines the state of business constraints. In Mobucon EC, business constraints are defined using the declarative language Declare. For the purpose of this work, Declare has been suitably extended to support quantitative time constraints and non-atomic, durative activities. The logic-based language Event Calculus (EC) has been adopted to provide a formal specification and semantics to Declare constraints, while a light-weight, logic programming-based EC tool supports dynamically reasoning about partial, evolving execution traces. To demonstrate the applicability of our approach, we describe a case study about maritime safety and security and provide a synthetic benchmark to evaluate its scalability.

126 citations

Journal ArticleDOI
TL;DR: In this article, a case study that shows how process mining techniques can be used to mediate between event data reflecting the clinical reality and clinical guidelines describing best-practices in medicine is presented.
Abstract: Our approach mediates between data reflecting the reality and clinical guidelines.We repair and enrich declarative process models based on event logs.Process mining techniques are used to check conformance and analyze deviations.We applied the approach in the Isala hospital in the Netherlands. Clinical guidelines aim at improving the quality of care processes through evidence-based insights. However, there may be good reasons to deviate from such guidelines or the guidelines may provide insufficient support as they are not tailored toward a particular setting (e.g., hospital policy or patient group characteristics). Therefore, we report a case study that shows how process mining techniques can be used to mediate between event data reflecting the clinical reality and clinical guidelines describing best-practices in medicine. Declarative models are used as they allow for more flexibility and are more suitable for describing healthcare processes that are highly unpredictable and unstable. Concretely, initial (hand made) models based on clinical guidelines are improved based on actual process executions (if these executions are proven to be correct). Process mining techniques can be also used to check conformance, analyze deviations, and enrich models with conformance-related diagnostics. The techniques have been applied in the urology department of the Isala hospital in the Netherlands. The results demonstrate that the techniques are feasible and that our toolset based on ProM and Declare is indeed able to provide valuable insights related to process conformance.

113 citations

Journal ArticleDOI
23 Jan 2015
TL;DR: This article discusses how the authors addressed the challenge of discovering declarative control flows in the context of artful processes by devising and implementing a two-phase algorithm, named MINERful, and describes its discovery technique in detail.
Abstract: Artful processes are those processes in which the experience, intuition, and knowledge of the actors are the key factors in determining the decision making. They are typically carried out by the “knowledge workers,” such as professors, managers, and researchers. They are often scarcely formalized or completely unknown a priori. Throughout this article, we discuss how we addressed the challenge of discovering declarative control flows in the context of artful processes. To this extent, we devised and implemented a two-phase algorithm, named MINERful. The first phase builds a knowledge base, where statistical information extracted from logs is represented. During the second phase, queries are evaluated on that knowledge base, in order to infer the constraints that constitute the discovered process. After outlining the overall approach and offering insight on the adopted process modeling language, we describe in detail our discovery technique. Thereupon, we analyze its performances, both from a theoretical and an experimental perspective. A user-driven evaluation of the quality of results is also reported on the basis of a real case study. Finally, a study on the fitness of discovered models with respect to synthetic and real logs is presented.

99 citations

Book ChapterDOI
26 Aug 2013
TL;DR: This paper proposes a technique to automatically discover declarative process models that incorporate both control-flow dependencies and data conditions and discovers underspecified models capturing recurrent rules relating pairs of activities, thus providing a summarized view of key rules governing the process.
Abstract: A wealth of techniques are available to automatically discover business process models from event logs. However, the bulk of these techniques yield procedural process models that may be useful for detailed analysis, but do not necessarily provide a comprehensible picture of the process. Additionally, barring few exceptions, these techniques do not take into account data attributes associated to events in the log, which can otherwise provide valuable insights into the rules that govern the process. This paper contributes to filling these gaps by proposing a technique to automatically discover declarative process models that incorporate both control-flow dependencies and data conditions. The discovered models are conjunctions of first-order temporal logic expressions with an associated graphical representation (Declare notation). Importantly, the proposed technique discovers underspecified models capturing recurrent rules relating pairs of activities, as opposed to full specifications of process behavior --- thus providing a summarized view of key rules governing the process. The proposed technique is validated on a real-life log of a cancer treatment process.

89 citations

References
Proceedings Article
01 Jul 1998
TL;DR: Two new algorithms for solving this problem that are fundamentally different from the known algorithms are presented, and empirical evaluation shows that these algorithms outperform the known algorithms by factors ranging from three for small problems to more than an order of magnitude for large problems.
Abstract: We consider the problem of discovering association rules between items in a large database of sales transactions. We present two new algorithms for solving this problem that are fundamentally different from the known algorithms. Empirical evaluation shows that these algorithms outperform the known algorithms by factors ranging from three for small problems to more than an order of magnitude for large problems. We also show how the best features of the two proposed algorithms can be combined into a hybrid algorithm, called AprioriHybrid. Scale-up experiments show that AprioriHybrid scales linearly with the number of transactions. AprioriHybrid also has excellent scale-up properties with respect to the transaction size and the number of items in the database.

10,863 citations

Journal ArticleDOI
TL;DR: A new algorithm is presented to extract a process model from a so-called "workflow log" containing information about the workflow process as it is actually being executed and represent it in terms of a Petri net.
Abstract: Contemporary workflow management systems are driven by explicit process models, i.e., a completely specified workflow design is required in order to enact a given workflow process. Creating a workflow design is a complicated time-consuming process and, typically, there are discrepancies between the actual workflow processes and the processes as perceived by the management. Therefore, we have developed techniques for discovering workflow models. The starting point for such techniques is a so-called "workflow log" containing information about the workflow process as it is actually being executed. We present a new algorithm to extract a process model from such a log and represent it in terms of a Petri net. However, we also demonstrate that it is not possible to discover arbitrary workflow processes. We explore a class of workflow processes that can be discovered. We show that the α-algorithm can successfully mine any workflow represented by a so-called SWF-net.

1,953 citations

Journal ArticleDOI
TL;DR: This work gives efficient algorithms for the discovery of all frequent episodes from a given class of episodes, and presents detailed experimental results that are in use in telecommunication alarm management.
Abstract: Sequences of events describing the behavior and actions of users or systems can be collected in several domains. An episode is a collection of events that occur relatively close to each other in a given partial order. We consider the problem of discovering frequently occurring episodes in a sequence. Once such episodes are known, one can produce rules for describing or predicting the behavior of the sequence. We give efficient algorithms for the discovery of all frequent episodes from a given class of episodes, and present detailed experimental results. The methods are in use in telecommunication alarm management.

1,593 citations

Journal ArticleDOI
TL;DR: An incremental approach to check the conformance of a process model and an event log is proposed and a Conformance Checker has been implemented within the ProM framework and it has been evaluated using artificial and real-life event logs.

1,111 citations

Journal ArticleDOI
TL;DR: The concept of the border of a theory, a notion that turns out to be surprisingly powerful in analyzing the algorithm, is introduced and strong connections between the verification problem and the hypergraph transversal problem are shown.
Abstract: One of the basic problems in knowledge discovery in databases (KDD) is the following: given a data set r, a class L of sentences for defining subgroups of r, and a selection predicate, find all sentences of L deemed interesting by the selection predicate. We analyze the simple levelwise algorithm for finding all such descriptions. We give bounds for the number of database accesses that the algorithm makes. For this, we introduce the concept of the border of a theory, a notion that turns out to be surprisingly powerful in analyzing the algorithm. We also consider the verification problem of a KDD process: given r and a set of sentences S ⊆ L determine whether S is exactly the set of interesting statements about r. We show strong connections between the verification problem and the hypergraph transversal problem. The verification problem arises in a natural way when using sampling to speed up the pattern discovery step in KDD.

952 citations

Frequently Asked Questions (12)
Q1. What are the contributions in "User-guided discovery of declarative process models"?

This paper uses DECLARE, a declarative language that provides more flexibility than conventional procedural notations such as BPMN, Petri nets, UML ADs, EPCs and BPEL. The authors present an approach to automatically discover DECLARE models. Their approach and toolset have been applied to a case study provided by the company 

In the near future, the authors want to extend their approach by providing users with the possibility to discover strongly, neutrally or weakly satisfied constraints depending on the level of reliability or flexibility they need in the discovery process. Given a constraint and a process instance where it is nonvacuously satisfied, it could also be useful to provide further information about how many times the constraint has been “activated” in the process instance (counting the number of violations for ¬witness(ϕ)). Also, given a constraint and a process instance where it is violated, the level of “healthiness” of the process instance can be evaluated based on the number of violations.

The parameter Percentage of Events (PoE) can be used to avoid the discovery of less-relevant constraints referring to event classes which rarely occur in the log. 

For instance, for a log including 30 event classes and a single template with 4 parameters, 810,000 candidate constraints are generated and checked.
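
The figure follows from enumerating every assignment of event classes to template parameters: with k event classes and p parameters there are k^p assignments, and 30^4 = 810,000. The following is a minimal Python sketch of that enumeration, not the DECLARE Miner's actual code. It assumes the same event class may fill more than one parameter (which is what the quoted figure implies); the function name and the placeholder class names are hypothetical.

```python
from itertools import product


def candidate_assignments(event_classes, num_params):
    """Lazily enumerate every assignment of event classes to template parameters.

    Assumption: an event class may be reused across parameters,
    which is what the 30**4 figure implies.
    """
    return product(event_classes, repeat=num_params)


# Placeholder event-class names (hypothetical, for illustration only).
event_classes = [f"e{i}" for i in range(30)]

# Count the candidates for one template with 4 parameters.
count = sum(1 for _ in candidate_assignments(event_classes, 4))
print(count)
```

Because the count grows exponentially in the number of parameters, reducing the set of event classes before enumeration shrinks the search space considerably.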

Their proposed approach for the discovery of DECLARE models is used in the first phase of the case study to build a declarative reference model (representing the normal behaviour of vessels) starting from historical logs. 

The sub-log associated to vessel type tanker/unknown cargo type D is composed of an alternation of event under way using engine followed by moored or at anchor. 

The first step of their discovery algorithm is to generate a DECLARE model Dcandidates consisting of candidate DECLARE constraints. 

For instance, if PoE = 50% the discovered constraints will only involve 50% of the event classes in the log (the most frequent ones). 
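
One way to read the PoE parameter is as a frequency-based filter applied to event classes before candidate constraints are generated. The sketch below illustrates that reading only. The function name, the log representation (a list of traces, each a list of event-class names), the rounding rule and the tie-breaking order are assumptions, not details taken from the paper.

```python
from collections import Counter
from math import ceil


def select_event_classes(log, poe):
    """Return the most frequent event classes, keeping `poe` percent of them.

    Assumptions: the number of classes kept is rounded up, and classes with
    equal frequency are ordered by first appearance in the log.
    """
    counts = Counter(event for trace in log for event in trace)
    keep = ceil(len(counts) * poe / 100)
    return [event for event, _ in counts.most_common(keep)]


# Toy log with four event classes: a (3 events), b (2), c (1), d (1).
toy_log = [["a", "b", "a", "c"], ["a", "b", "d"]]
selected = select_event_classes(toy_log, 50)
print(selected)
```

With PoE = 50 the toy log keeps two of its four event classes, so only constraints over those two classes would be generated.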

In this section, the authors introduce two advanced techniques to support the discovery of DECLARE models: the truncated semantics for LTL formulas and the LTL vacuity detection. 
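
Vacuity detection can be illustrated on the response template: a process instance in which A never occurs satisfies response(A,B) trivially, and such an instance says little about the process. The sketch below separates the three outcomes for a single trace. It is a simplified finite-trace check and not the LTL-based procedure used in the paper; the function names and the outcome labels are hypothetical, and A and B are assumed to be distinct.

```python
def response_holds(trace, a, b):
    """True if every occurrence of `a` is eventually followed by a `b`."""
    pending = False
    for event in trace:
        if event == a:
            pending = True
        elif event == b:
            pending = False
    return not pending


def classify_response(trace, a, b):
    """Classify one trace against response(a, b)."""
    if not response_holds(trace, a, b):
        return "violated"
    if a not in trace:
        # The constraint is never activated, so it holds trivially.
        return "vacuously satisfied"
    return "non-vacuously satisfied"


print(classify_response(["c", "b"], "a", "b"))
print(classify_response(["a", "c", "b"], "a", "b"))
print(classify_response(["a", "c"], "a", "b"))
```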

The existence templates involve only one event (unary relationship) and define the cardinality or the position of an event in a process instance. 

The succession(A,B) template requires that both response and precedence relations hold between the events A and B. Templates alternate response, alternate precedence and alternate succession strengthen the above templates by specifying that events must alternate without repetitions of these events in between.
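
The relationship between these templates can be sketched as plain checks over a completed trace. This is an illustrative finite-trace reading, not the LTL formulas or the truncated semantics the paper relies on. The function names are hypothetical, traces are assumed to be lists of event-class names, and A and B are assumed to be distinct.

```python
def response(trace, a, b):
    """Every occurrence of `a` is eventually followed by a `b`."""
    pending = False
    for event in trace:
        if event == a:
            pending = True
        elif event == b:
            pending = False
    return not pending


def precedence(trace, a, b):
    """`b` may occur only after at least one `a` has occurred."""
    seen_a = False
    for event in trace:
        if event == a:
            seen_a = True
        elif event == b and not seen_a:
            return False
    return True


def succession(trace, a, b):
    """Both response and precedence hold between `a` and `b`."""
    return response(trace, a, b) and precedence(trace, a, b)


def alternate_response(trace, a, b):
    """After each `a`, a `b` must occur before the next `a`."""
    pending = False
    for event in trace:
        if event == a:
            if pending:
                return False  # a second `a` arrived before any `b`
            pending = True
        elif event == b:
            pending = False
    return not pending
```

For example, the trace a, a, b satisfies response(a,b) but not alternate response(a,b), because the second a occurs before any b.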

The DECLARE Miner generates a DECLARE model object by using the algorithm described in subsection III-A. The LTL constraints generation phase is supported by the DECLARE2LTL plug-in.