Journal Article

Active Trace Clustering for Improved Process Discovery

TL;DR: In an assessment using four complex, real-life event logs, it is shown that this technique significantly outperforms currently available trace clustering techniques.
Abstract: Process discovery is the learning task that entails the construction of process models from event logs of information systems. Typically, these event logs are large data sets that contain the process executions by registering what activity has taken place at a certain moment in time. By far the most arduous challenge for process discovery algorithms consists of tackling the problem of accurate and comprehensible knowledge discovery from highly flexible environments. Event logs from such flexible systems often contain a large variety of process executions, which makes the application of process mining most interesting. However, simply applying existing process discovery techniques will often yield highly incomprehensible process models because of their inaccuracy and complexity. With respect to resolving this problem, trace clustering is one very interesting approach since it allows an existing event log to be split up so as to facilitate the knowledge discovery process. In this paper, we propose a novel trace clustering technique that significantly differs from previous approaches. Above all, it starts from the observation that currently available techniques suffer from a large divergence between the clustering bias and the evaluation bias. By employing an active-learning-inspired approach, this bias divergence is resolved. In an assessment using four complex, real-life event logs, it is shown that our technique significantly outperforms currently available trace clustering techniques.
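
To make the idea of splitting an event log into sub-logs concrete, here is a minimal sketch of distance-based trace clustering. It is not the technique proposed in the paper: the bag-of-activities profile, the Manhattan distance, the greedy "leader" assignment, the `max_distance` parameter, and the toy log are all assumptions chosen for illustration.

```python
from collections import Counter


def activity_profile(trace):
    """Bag-of-activities profile: how often each activity occurs in a trace."""
    return Counter(trace)


def profile_distance(p, q):
    """Manhattan distance between two activity-count profiles."""
    return sum(abs(p[a] - q[a]) for a in set(p) | set(q))


def cluster_traces(log, max_distance):
    """Greedy leader clustering (illustrative stand-in, not the paper's method).

    A trace joins the first cluster whose leader profile lies within
    max_distance; otherwise it becomes the leader of a new cluster.
    """
    leaders, clusters = [], []
    for trace in log:
        profile = activity_profile(trace)
        for i, leader in enumerate(leaders):
            if profile_distance(profile, leader) <= max_distance:
                clusters[i].append(trace)
                break
        else:
            # No existing leader is close enough: start a new cluster.
            leaders.append(profile)
            clusters.append([trace])
    return clusters


# Hypothetical toy log: each trace is one process execution.
toy_log = [
    ["register", "check", "approve", "pay"],
    ["register", "check", "check", "approve", "pay"],
    ["register", "reject", "notify"],
    ["register", "check", "reject", "notify"],
]
sublogs = cluster_traces(toy_log, max_distance=1)
```

Each resulting sub-log could then be passed to a process discovery algorithm separately. Note that the grouping here is driven purely by a distance criterion, which is unrelated to how the discovered models are later evaluated; the abstract identifies exactly this kind of gap between clustering bias and evaluation bias as the problem to address.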
Citations
Journal Article
TL;DR: The proposed framework unifies a number of correlation analysis approaches proposed in the literature into a general solution that can perform those analyses and many more; it combines process and data mining techniques and has been implemented in ProM.

212 citations

Journal Article
TL;DR: The most active research topics are process discovery algorithms, followed by conformance checking and architecture and tool improvements; application domains across different business segments are also reported.
Abstract: Process mining is a growing and promising study area focused on understanding processes and capturing the most significant findings from their real execution, rather than relying on methods that only observe an idealized process model. The objective of this article is to map the active research topics of process mining and their main publishers by country, periodicals, and conferences. We also extract the reported application studies and classify these by exploration domains or industry segments that are taking advantage of this technique. The applied research method was systematic mapping, which began with 3713 articles. After applying the exclusion criteria, 1278 articles were selected for review. In this article, an overview regarding process mining is presented, the main research topics are identified, followed by identification of the most applied process mining algorithms, and finally application domains among different business segments are reported on. It is possible to observe that the most active research topics are associated with the process discovery algorithms, followed by conformance checking, and architecture and tools improvements. In application domains, the segments with major case studies are healthcare followed by information and communication technology, manufacturing, education, finance, and logistics.

183 citations

Journal Article
TL;DR: In this paper, the effects of metacognitive prompts on learning processes and outcomes during a computer-based learning task were analyzed using concurrent think-aloud protocols, and process mining techniques were used to analyze sequential patterns.
Abstract: According to research examining self-regulated learning (SRL), we regard individual regulation as a specific sequence of regulatory activities. Ideally, students perform various learning activities, such as analyzing, monitoring, and evaluating cognitive and motivational aspects during learning. Metacognitive prompts can foster SRL by inducing regulatory activities, which, in turn, improve the learning outcome. However, the specific effects of metacognitive support on the dynamic characteristics of SRL are not understood. Therefore, the aim of our study was to analyze the effects of metacognitive prompts on learning processes and outcomes during a computer-based learning task. Participants of the experimental group (EG, n = 35) were supported by metacognitive prompts, whereas participants of the control group (CG, n = 35) received no support. Data regarding learning processes were obtained by concurrent think-aloud protocols. The EG exhibited significantly more metacognitive learning events than did the CG. Furthermore, these regulatory activities correspond positively with learning outcomes. Process mining techniques were used to analyze sequential patterns. Our findings indicate differences in the process models of the EG and CG and demonstrate the added value of taking the order of learning activities into account by discovering regulatory patterns.

73 citations

Journal Article
TL;DR: This paper investigates a multiple view aware approach to trace clustering, based on a co-training strategy, and shows that the presented algorithm is able to discover a clustering pattern of the log, such that related traces are appropriately clustered.
Abstract: Process mining refers to the discovery, conformance, and enhancement of process models from event logs currently produced by several information systems (e.g. workflow management systems). By tightly coupling event logs and process models, process mining makes it possible to detect deviations, predict delays, support decision making, and recommend process redesigns. Event logs are data sets containing the executions (called traces) of a business process. Several process mining algorithms have been defined to mine event logs and deliver valuable models (e.g. Petri nets) of how logged processes are being executed. However, they often generate spaghetti-like process models, which can be hard to understand. This is caused by the inherent complexity of real-life processes, which tend to be less structured and more flexible than what the stakeholders typically expect. In particular, spaghetti-like process models are discovered when all possible behaviors are shown in a single model as a result of considering the set of traces in the event log all at once. To minimize this problem, trace clustering can be used as a preprocessing step. It splits up an event log into clusters of similar traces, so as to handle variability in the recorded behavior and facilitate process model discovery. In this paper, we investigate a multiple view aware approach to trace clustering, based on a co-training strategy. In an assessment, using benchmark event logs, we show that the presented algorithm is able to discover a clustering pattern of the log, such that related traces are appropriately clustered. We evaluate the significance of the formed clusters using established machine learning and process mining metrics.

65 citations


Cites background from "Active Trace Clustering for Improve..."

  • ...[8] show how this approach may suffer from scalability problems....

    [...]

  • ...This category of algorithms, which determine clusters by optimizing a distance-based criterion function, is frequently employed in process mining [3], [4], [5], [6], [7], [8], as the distance is, in...

    [...]

Book Chapter
09 Sep 2018
TL;DR: The main contribution of this paper is the proposal of representation learning architectures at the level of activities, traces, logs, and models that can produce a distributed representation of these objects and a thorough analysis of potential applications.
Abstract: In process mining, the challenge is typically to turn raw event data into meaningful models, insights, or actions. One of the key problems of a data-driven analysis of processes is the high dimensionality of the data. In this paper, we address this problem by developing representation learning techniques for business processes. More specifically, the representation learning paradigm is applied to activities, traces, logs, and models in order to learn highly informative but low-dimensional vectors, often referred to as embeddings, based on a neural network architecture. Subsequently, these vectors can be used for automated inference tasks such as trace clustering, process comparison, predictive process monitoring, anomaly detection, etc. Accordingly, the main contribution of this paper is the proposal of representation learning architectures at the level of activities, traces, logs, and models that can produce a distributed representation of these objects and a thorough analysis of potential applications. In an experimental evaluation, we show the power of such derived representations in the context of trace clustering and process model comparison.

58 citations

References
Journal Article
01 Apr 1989
TL;DR: The author proceeds with introductory modeling examples, behavioral and structural properties, three methods of analysis, subclasses of Petri nets and their analysis, and one section is devoted to marked graphs, the concurrent system model most amenable to analysis.
Abstract: Starts with a brief review of the history and the application areas considered in the literature. The author then proceeds with introductory modeling examples, behavioral and structural properties, three methods of analysis, subclasses of Petri nets and their analysis. In particular, one section is devoted to marked graphs, the concurrent system model most amenable to analysis. Introductory discussions on stochastic nets with their application to performance modeling, and on high-level nets with their application to logic programming, are provided. Also included are recent results on reachability criteria. Suggestions are provided for further reading on many subject areas of Petri nets.

10,755 citations


"Active Trace Clustering for Improve..." refers background in this paper

  • ...In the next sections, both evaluation dimensions will be detailed....

    [...]

01 Jan 2009
TL;DR: This report provides a general introduction to active learning and a survey of the literature, including a discussion of the scenarios in which queries can be formulated, and an overview of the query strategy frameworks proposed in the literature to date.
Abstract: The key idea behind active learning is that a machine learning algorithm can achieve greater accuracy with fewer training labels if it is allowed to choose the data from which it learns. An active learner may pose queries, usually in the form of unlabeled data instances to be labeled by an oracle (e.g., a human annotator). Active learning is well-motivated in many modern machine learning problems, where unlabeled data may be abundant or easily obtained, but labels are difficult, time-consuming, or expensive to obtain. This report provides a general introduction to active learning and a survey of the literature. This includes a discussion of the scenarios in which queries can be formulated, and an overview of the query strategy frameworks proposed in the literature to date. An analysis of the empirical and theoretical evidence for successful active learning, a summary of problem setting variants and practical issues, and a discussion of related topics in machine learning research are also presented.
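
The query loop described in this abstract can be sketched as pool-based uncertainty sampling, one of the common query strategies. The sketch below is a toy under stated assumptions: the 1-D threshold classifier, the function names, the integer pool, and the `oracle` function are all hypothetical and are not taken from the report.

```python
def fit_threshold(labeled):
    """Fit a 1-D threshold classifier.

    The boundary is the midpoint between the largest point labeled 0
    and the smallest point labeled 1 (assumes both classes are present).
    """
    largest_neg = max(x for x, y in labeled if y == 0)
    smallest_pos = min(x for x, y in labeled if y == 1)
    return (largest_neg + smallest_pos) / 2


def most_uncertain(pool, threshold):
    """Uncertainty sampling: pick the unlabeled point closest to the boundary."""
    return min(pool, key=lambda x: abs(x - threshold))


def active_learn(pool, oracle, seed, budget):
    """Query the oracle for at most `budget` labels, chosen by the learner."""
    labeled = list(seed)
    pool = list(pool)
    for _ in range(budget):
        if not pool:
            break
        threshold = fit_threshold(labeled)
        query = most_uncertain(pool, threshold)
        pool.remove(query)
        labeled.append((query, oracle(query)))  # the oracle supplies the label
    return fit_threshold(labeled), labeled


# Hypothetical setup: the true class boundary lies between 10 and 11.
def oracle(x):
    return int(x >= 11)


seed = [(0, 0), (16, 1)]  # one labeled example per class
threshold, labeled = active_learn(range(1, 16), oracle, seed, budget=4)
queried = [x for x, _ in labeled[len(seed):]]
```

In this toy setup the learner narrows in on the boundary with four queries instead of labeling all fifteen pool points, which illustrates the abstract's claim that choosing which instances to label can reduce the number of labels needed.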

5,227 citations


"Active Trace Clustering for Improve..." refers background in this paper

  • ...Its conception is significantly different from earlier techniques because it starts from the observation that traditional trace clustering techniques suffer from a divergence between the clustering bias and the evaluation bias....

    [...]

Book
01 Jan 2011
TL;DR: This book provides real-world techniques for monitoring and analyzing processes in real time and is a powerful new tool destined to play a key role in business process management.
Abstract: The first to cover this missing link between data mining and process modeling, this book provides real-world techniques for monitoring and analyzing processes in real time. It is a powerful new tool destined to play a key role in business process management.

2,287 citations


"Active Trace Clustering for Improve..." refers background in this paper

  • ...PROCESS mining has been demonstrated to possess the capabilities to profoundly assess business processes [1]....

    [...]