Journal Article

A Study Into the Factors That Influence the Understandability of Business Process Models

Hajo A. Reijers¹, Jan Mendling²

Eindhoven University of Technology¹, Humboldt University of Berlin²

01 May 2011, Vol. 41, Iss. 3, pp. 449-462

TL;DR: Findings are that both types of investigated factors affect model understanding, while personal factors seem to be the more important of the two.


Abstract: Business process models are key artifacts in the development of information systems. While one of their main purposes is to facilitate communication among stakeholders, little is known about the factors that influence their comprehension by human agents. On the basis of a sound theoretical foundation, this paper presents a study into these factors. Specifically, the effects of both personal and model factors are investigated. Using a questionnaire, students from three different universities evaluated a set of realistic process models. Our findings are that both types of investigated factors affect model understanding, while personal factors seem to be the more important of the two. The results have been validated in a replication that involves professional modelers.


Summary


Origin and evolution of the innovation system concept

The innovation system (IS) concept has even been adopted as one of the four pillars of this scientific community, which is dominated by economic approaches but also includes researchers in management science, history, and sociology (Fagerberg, Verspagen, 2009).
Particular relationships or configurations have also been highlighted, such as the “triple helix” model linking firms, universities, and the state (Leydesdorff, Etzkowitz, 1998).
The concept of a “social system of innovation and production” was proposed to account for the institutional complementarities that bear on innovation and to integrate technological dynamics (Amable, 2003).
The World Bank has also helped to publicize the IS concept, which it defines as “a network of organizations, enterprises, and individuals focused on the economic exploitation of new products, processes, and forms of organization, together with the institutions and policies that influence their behavior and performance” (World Bank, 2006).

An analytical framework for work referring to Innovation Systems

This review of how work on IS has evolved makes it possible to propose a generic analytical grid, whose purpose is to specify how the concept is used in scientific articles, books, or policy documents, particularly in a field such as agriculture and agri-food.
To this end, we carried out a bibliometric study using three search engines: CAB, Web of Science, and Scopus.
Multidisciplinary, it indexes more than 10,000 journals.
Activities targeted by the IS (citations and expertise): agriculture, biotechnology, agri-food, rural development.

Two main theoretical frameworks used by the authors

In general, the theoretical framework is rarely made explicit in abstracts, titles, and keywords.
The Least Developed Countries (mainly sub-Saharan Africa) are the most represented, with 34% of publications, which favor the analysis of agricultural innovations.
Transition or Mediterranean countries are poorly represented (6%, with articles mainly on Spain), as are comparative articles that question the international dimension of IS (fewer than 15% of articles).
In 22% of articles, the IS is an object of study as a whole, as a system.

Identifying article profiles through multiple correspondence analysis

To summarize these results and identify the combinations of theoretical grounding, field of application, and uses of IS, we carried out a factor analysis (Multiple Correspondence Analysis) based on the Burt table that brings together the qualitative variables described above.
It therefore contrasts a critical stance, detached from action, with 6.
An analysis of the contingencies between the categories of the theoretical reference and those of the other variables refines this observation (Table 8).

AND THE AGRICULTURAL SPECIFICITY OF IS

Several knowledge communities use the IS concept in agriculture. The bibliometric analysis suggests the existence of 4 groups of articles marked by different theoretical references.
These communities are more or less structured groupings of scientists who use the IS notion, in association with political or economic actors.
The scientists come largely from a research tradition built around agriculture (work on agricultural development, analysis of agronomic research systems, Farming System approaches…) and are associated with agronomic research and development institutions.
The scientists may be associated with development or training engineering that refocuses on innovations in rural areas and on action research (Sanginga et al., 2009; Faure et al., 2010).

Re-examining the agricultural specificity of innovation, of IS, and of research on IS

Identifying these knowledge communities then leads us to revisit the conditions for constructing definitions and uses of the IS concept that would be specific to agriculture and agri-food.
This question can be approached starting from the arguments developed by the epistemic community that seeks to produce its own notions (AIS, AKIS…).
The multiple domains of technical and organizational learning, the need for local adaptation and experimentation of generic knowledge, the importance of “tacit knowledge”, and also the growing involvement of citizen-consumers in the conditions of production shape the training needs and the forms of mediation associated with building this knowledge (Goulet, Vinck, 2012).
– Finally, agriculture and agri-food face, probably more than other sectors, a renewal of challenges that call for placing agricultural innovations in long-term perspectives:
But they also call for developing work on Sectoral Systems of Innovation (Malerba, 2002), by comparing the agricultural and agri-food example with other sectors in order to strengthen the conceptual framework used to study innovation.

CONCLUSION

Starting from a bibliometric and bibliographic analysis, we have shown that the IS concept is undergoing significant development in work on innovation in agriculture and agri-food.
The growing success of the concept in this sector appears to be linked to the co-evolution of several knowledge communities, some seeking to apply the theoretical and analytical framework of Innovation Studies (Martin, 2012), others seeking to build a more original body of concepts and methods, in the tradition of work in rural or development sociology and economics.
The question of the specificity of the conditions of innovation in the agricultural sector is central and appears renewed because of challenges that today place this activity more firmly in long-term perspectives and in societal debates.
Among the work that applies evolutionary concepts, the choice between recognizing a sectoral specificity (via the notion of Sectoral System of Innovation) and downplaying this question (accompanying, for example, the development of biotechnologies) has not been settled.
In any case, maintaining a dialectic between several knowledge communities around the use of IS appears to be a guarantee of scientific vitality, provided, no doubt, that the different communities can interact more.


Figures (12)

TABLE I METRICS’ VALUES FOR THE SAMPLE PROCESS MODEL IN FIGURE 2

Fig. 2. Sample process model to illustrate the model factors

TABLE V REGRESSION MODELS FOR PERSONAL AND MODEL FACTORS

Fig. 6. Multivariate linear regression model explaining the average SCORE

Fig. 4. Fragments of process models J, K, and L (from left to right)

Fig. 5. Boxplot of average SCORE for different values of EDUCATION

TABLE IV MODEL FACTORS: CORRELATION ANALYSIS WITH UNDERSTANDING (SCORE)

TABLE III PERSONAL FACTORS: KRUSKAL-WALLIS ANALYSIS OF DIFFERENCES IN UNDERSTANDING (SCORE)


IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS – PART A 1

A Study into the Factors that Influence the Understandability of Business Process Models

Hajo A. Reijers and Jan Mendling

Abstract: Business process models are key artifacts in the development of information systems. While one of their main purposes is to facilitate communication among stakeholders, little is known about the factors that influence their comprehension by human agents. On the basis of a sound theoretical foundation, this paper presents a study into these factors. Specifically, the effects of both personal and model factors are investigated. Using a questionnaire, students from three different universities evaluated a set of realistic process models. Our findings are that both types of investigated factors affect model understanding, while personal factors seem to be the more important of the two. The results have been validated in a replication that involves professional modelers.

Index Terms: Business Process Modeling, Process models, Human information processing, Complexity measures.

I. INTRODUCTION

Since the 1960s, conceptual models have been used to facilitate the early detection and correction of system development errors [1]. In more recent years, the primary focus of conceptual modeling efforts has shifted to business processes [2]. Models resulting from such efforts are commonly referred to as business process models, or process models for short. They are used to support the analysis and design of, for example, process-aware information systems [3], service-oriented architectures [4], and web services [5].

Process models typically capture in some graphical notation what tasks, events, states, and control flow logic constitute a business process. A business process that is in place to deal with complaints may, for example, contain a task to evaluate the complaint, which is followed by another one specifying that the customer in question is to be contacted. Similar to other conceptual models, process models are first and foremost required to be intuitive and easily understandable, especially in IS project phases that are concerned with requirements documentation and communication [6]. Today, many companies design and maintain several thousand process models, often also involving non-expert modelers [7]. It has been observed that such large model collections exhibit serious quality issues in industry practice [8].

Against this background, it is problematic that little insight exists into what influences the quality of process models, in particular with respect to their understandability. The most important insight is that the size of the model has a notable impact. An empirical study provides evidence that larger, real-world process models tend to have more formal flaws (such as deadlocks or unreachable end states) than smaller models [9]. A likely explanation for this phenomenon would be that human modelers lose track of the interrelations in large and complex models due to their limited cognitive capabilities (cf. [10]). They then introduce errors that they would not insert in a small model, which will make the model less effective for communication purposes.

Hajo A. Reijers is with the School of Industrial Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands, e-mail: h.a.reijers@tue.nl.

Jan Mendling is with the School of Business and Economics, Humboldt-Universität zu Berlin, Germany, e-mail: jan.mendling@wiwi.hu-berlin.de.

Manuscript submitted April 19, 2009.

There is both an academic and a practical motivation to look beyond this insight. To start with the former, it can be virtually ruled out that size is the sole factor that plays a role in understanding a process model. To illustrate, a purely sequential model will be easier to understand than a model that is similar in size but where tasks are interrelated in various ways. This raises interest in the other factors that play a role here. A more practical motivation is that model size is often determined by the modeling domain or context. So, process modelers will find it difficult to change the size of a process model in order to create a more readable version: they cannot simply omit relevant parts of the model.

The aim of this paper is to investigate whether factors can be determined, beyond the size of a process model, that influence its understanding. In that respect, we distinguish between model factors and personal factors. Model factors relate to the process model itself and refer to characteristics such as a model’s density or structuredness. Personal factors relate to the reader of such a model, for example with respect to one’s educational background or the perceptions that are held about a process model. While insights are missing into the impact of any such factors, beyond size, on process model understandability, research has suggested the importance of similar factors in other conceptual data models [11], [12].

To investigate the impact of personal and model factors, the research presented here adopts a survey design. Using a questionnaire that has been filled out by 73 students from three different universities, hypothesized relations between model and personal factors and the understanding of a set of process models are investigated. Some exploratory findings from this data are reported in [13], [14], which essentially confirm the significance of the two types of factors. The contribution of this paper is quite different. We develop a sound theoretical foundation for discussing individual model understanding that is rooted in cognitive research on computer program comprehension. We use this foundation to establish hypotheses on the connection between understandability on the one hand, and personal and model factors on the other hand. For these tests we use the aforementioned survey data. Beyond that, we provide an extensive validation of our findings and our instruments, addressing two major challenges. First, there has been little research on construct validity of understandability measures. We use Cronbach’s alpha to check the consistency of our questions that are used in calculating the understandability score. Furthermore, we address potential threats to external validity. We report on a replication of our survey with practitioners and investigate if the results differ from those of the students.
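As a reminder of how such a consistency check works: for k questions, Cronbach's alpha compares the sum of the per-question score variances with the variance of the respondents' total scores, alpha = k / (k - 1) * (1 - sum of item variances / variance of totals). The Python sketch below applies this standard formula. The function name and the small answer matrix are hypothetical and are not the study's data.

```python
from statistics import variance


def cronbach_alpha(answers):
    """Cronbach's alpha for a matrix with one row per respondent and one
    column per question (e.g. 1 = correct answer, 0 = wrong answer)."""
    k = len(answers[0])
    if k < 2 or len(answers) < 2:
        raise ValueError("need at least two questions and two respondents")
    # Sample variance of each question's scores across respondents.
    item_variances = [variance(column) for column in zip(*answers)]
    # Sample variance of the respondents' total scores.
    total_variance = variance([sum(row) for row in answers])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Made-up answers of five respondents to four questions (illustration only).
ANSWERS = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(ANSWERS)
```

For this made-up matrix the function returns approximately 0.8. Values closer to 1 indicate that the questions measure the same construct more consistently.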

The rest of this paper is organized in accordance with the presented aims. Section II introduces the theoretical foundations of process model understanding. We identify matters of process model understanding and respective challenges. This leads us to factors of understanding. Section III describes the setup of our survey design and the motivations behind it. Section IV then presents the data analysis and the interpretation. Section V discusses threats to validity and how we addressed them. Section VI concludes the article. We use Appendix A to summarize our survey design.

II. BACKGROUND

This section introduces the theoretical background of our empirical research. Section II-A gives a brief overview of the information content of a process model, defines a notion of understandability, and summarizes related work on process model understanding. Section II-B investigates potential factors of understandability. We utilize insights from cognitive research into computer program comprehension in order to derive propositions about the significance of personal and model factors for understanding.

A. Matters of Process Model Understanding

Before considering a notion of understandability we first have to discuss matters that can be understood from a process model. We are focusing on so-called activity-based or control-flow-based process models (in contrast to goal-oriented [15] and choreography-oriented languages [16]). Figure 1 shows an example of such a process model in a notation that we will use throughout this paper. This notation essentially covers the commonalities of Event-driven Process Chains (EPCs) [17], [18] and the Business Process Modeling Notation (BPMN) [19], which are two of the most frequently used notations for process modeling. Such a process model describes the control flow between different activities (A, B, I, J, K, L, M, N, and O in Figure 1) using arcs. So-called connectors (XOR, AND, OR) define complex routing constraints of splits (multiple outgoing arcs) and joins (multiple incoming arcs). XOR-splits represent exclusive choices and XOR-joins capture respective merges without synchronization. AND-splits introduce concurrency of all outgoing branches while AND-joins synchronize all incoming arcs. OR-splits define inclusive choices in a 1-to-all fashion. OR-joins synchronize such multiple choices, which requires a quite sophisticated implementation (see [18], [20]). Furthermore, there are specific nodes to indicate start and end.

In this paper we consider formal statements that can be derived about the behavior described by such a process model, ignoring the (informal) content of activity labels. This formal focus has the advantage that we can unambiguously evaluate whether an individual has grasped a particular model aspect correctly. In particular, we focus on binary relationships between two activities in terms of execution order, exclusiveness, concurrency, and repetition. These relationships play an important role for reading, modifying, and validating the model.

Fig. 1. Part of a process model

• Execution Order relates to whether the execution of one activity a_i eventually leads to the execution of another activity a_j. In Figure 1, the execution of J leads to the execution of L.

• Exclusiveness means that two activities a_i and a_j can never be executed in the same process instance. In Figure 1, J and K are mutually exclusive.

• The concurrency relation covers two activities a_i and a_j if they can potentially be executed concurrently. In Figure 1, L and M are concurrent.

• A single activity a is called repeatable if it is possible to execute it more than once for a process instance. In Figure 1, among others, K, N, and I are repeatable.

Statements such as “Executing activity a_i implies that a_j will be executed later” can be easily verified using the reachability graph of the process model. A reachability graph captures all states and transitions represented by the process model and it can be (automatically) generated from it. For some classes of models, several relationships can be calculated more efficiently without the reachability graph [21]. For instance, these relations can be constructed for those process models that map to free-choice Petri nets in O(n^3) time [22], [23].
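The relations listed above can be illustrated on a small example. The following Python sketch is not taken from the paper: it encodes a hypothetical toy model (not the model of Figure 1) as a safe Petri net, explores its reachability graph by breadth-first search, and reads the relations off that graph. The net, the place names p0 to p7, and all function names are assumptions made for illustration. Note that `can_follow` only checks whether one activity can be executed after another in at least one run, which is weaker than the implication quoted above.

```python
from collections import deque

# Hypothetical toy model (NOT the paper's Figure 1), encoded as a safe Petri net.
# Each activity maps to (input places, output places).
NET = {
    "A": ({"p0"}, {"p1"}),
    "J": ({"p1"}, {"p2", "p3"}),    # J and K compete for p1 (exclusive choice);
    "K": ({"p1"}, {"p2", "p3"}),    # either one starts two parallel branches
    "L": ({"p2"}, {"p4"}),
    "M": ({"p3"}, {"p5"}),
    "N": ({"p4", "p5"}, {"p6"}),    # synchronizes both branches (AND-join)
    "R": ({"p6"}, {"p4", "p5"}),    # rework loop that leads back to N
    "O": ({"p6"}, {"p7"}),
}
INITIAL = frozenset({"p0"})


def reachability_graph(net, initial):
    """Map every reachable marking to its outgoing (activity, marking) edges."""
    edges = {}
    queue = deque([initial])
    while queue:
        marking = queue.popleft()
        if marking in edges:
            continue
        edges[marking] = [
            (t, frozenset((marking - pre) | post))
            for t, (pre, post) in net.items()
            if pre <= marking  # t is enabled when all its input places are marked
        ]
        queue.extend(successor for _, successor in edges[marking])
    return edges


def reachable_from(edges, start):
    """All markings reachable from `start`, including `start` itself."""
    seen, stack = {start}, [start]
    while stack:
        for _, successor in edges[stack.pop()]:
            if successor not in seen:
                seen.add(successor)
                stack.append(successor)
    return seen


def can_follow(edges, a, b):
    """True if at least one run executes a and, at some later point, b."""
    for outgoing in edges.values():
        for t, after_a in outgoing:
            if t != a:
                continue
            for marking in reachable_from(edges, after_a):
                if any(t2 == b for t2, _ in edges[marking]):
                    return True
    return False


def exclusive(edges, a, b):
    """True if no run contains both a and b."""
    return not can_follow(edges, a, b) and not can_follow(edges, b, a)


def concurrent(net, edges, a, b):
    """True if some reachable marking enables a and b on disjoint input places."""
    pre_a, pre_b = net[a][0], net[b][0]
    return pre_a.isdisjoint(pre_b) and any((pre_a | pre_b) <= m for m in edges)


def repeatable(edges, a):
    """True if a can be executed again after it has been executed once."""
    return can_follow(edges, a, a)


GRAPH = reachability_graph(NET, INITIAL)
```

With this toy net, `exclusive(GRAPH, "J", "K")`, `concurrent(NET, GRAPH, "L", "M")`, and `repeatable(GRAPH, "N")` all hold, while `repeatable(GRAPH, "A")` does not. The exhaustive exploration of markings is what makes this approach expensive for large models with much concurrency.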

B. Factors of Process Model Understanding

Throughout this paper, we use the term understandability in order to refer to the degree to which information contained in a process model can be easily understood by a reader of that model. This definition already implies that understandability can be investigated from two major angles: personal factors related to the model reader and factors that relate to the model itself. We discuss the relevance of both categories using the cognitive dimensions framework as a theoretical foundation.

The cognitive dimensions framework is a set of aspects that have empirically been proven to be significant for the comprehension of computer programs and visual notations [24]. There are two major findings that the framework builds upon: a representation always emphasizes certain information at the expense of other information, and there has to be a fit between the mental task at hand and the notation [25], [26]. The implications of these insights are reflected by cognitive dimensions that are relevant for process model reading.

• Abstraction Gradient refers to the grouping capabilities of a notation. In a single process model, there is no mechanism to group activities. Therefore, flow languages are called abstraction-hating [24]. As a consequence, the more complex the model gets, the more difficult it becomes for the model reader to identify those parts that closely relate. Presumably, expert model readers will be more efficient in finding the related parts.

• Hard Mental Operations relates to an over-proportional increase in difficulty when elements are added to a representation. This is indeed the case for the behavioral semantics of a process model. In the general case, calculating the reachability graph for a process model is NP-complete [27]. Therefore, a larger process model is over-proportionally more difficult to interpret than a simple model. On the other hand, experts are more likely to know decomposition strategies, e.g. as described in [28], to cope with complexity.

• Hidden Dependencies refer to interdependencies that are not fully visible. In process models such hidden dependencies exist between split and join connectors: each split should have a matching join connector of the same type, e.g. to synchronize concurrent paths. In complex models, distant split-join pairs can be quite difficult to trace. In general such interdependencies can be analyzed using the reachability graph, but many analyses can also be performed structurally (see [29]). Expert modelers tend to use structural heuristics for investigating the behavior of a process model.

• Secondary Notation refers to any piece of extra information that is not part of the formalism. In process models secondary notation is an important matter, among others in terms of labeling conventions [30] or layout strategies [31]. For models of increasing complexity, secondary notation also gains in importance for making the hidden dependencies more visible. On the other hand, it has been shown that experts’ performance is less dependent on secondary notation than that of novices [32].

Personal factors have also been recognized as important factors in engineering and design [33], [34]. In particular, the matter of expertise is clearly established by prior research on human-computer interaction. While research on perceptual quality and perceptual expertise has only recently emerged in conceptual modeling (see [35], [36]), there are some strong insights into the factors of expert performance in different areas. A level of professional expertise is assumed to take at least 1,000 to 5,000 hours of continuous training [37, p.563]. In this context, it is not only important that the expert has worked on a certain matter for years, but also that practicing has taken place on a daily basis [38]. Such regular training is needed to build up experience, knowledge, and the ability to recognize patterns [39]. Furthermore, the way information is processed by humans is influenced by cognitive styles, which can be related to personality. There are persons who prefer verbal over image information and who rather grasp the whole instead of analytically decomposing a matter, or the other way round [40]. As models enable reasoning through visualization, perceptional capabilities of a person are also relevant [41]. Clearly, these capabilities differ between persons with different process modeling expertise.

We conclude from this theoretical discussion that model features and personal characteristics are indeed likely to be relevant factors of process model understandability.

C. Related work

In this section we present related work grouped into three categories: model factors, personal factors, and other factors.

The importance of model characteristics was intuitively assumed by early work into process model metrics. Such metrics quantify structural properties of a process model, inspired by prior work in software engineering on lines of code, cyclomatic number, or object-oriented metrics [42]–[44]. Early contributions by Lee and Yoon, Nissen, and Morasca [45]–[47] focus on defining metrics. More recently, different metrics have been validated empirically. The work of Cardoso is centered around an adaptation of the cyclomatic number for business processes that he calls control-flow complexity (CFC) [48]. This metric was validated with respect to its correlation with perceived complexity of process models [49]. The research conducted by a group including Canfora, Rolón, and García analyzes understandability as an aspect of maintainability. They include different metrics of size, complexity, and coupling in a set of experiments, and identify several correlations [50], [51]. Further metrics take their motivation from cognitive research, e.g. [14], or are based on concepts of modularity, e.g. [52], [53]. Most notably, an extensive set of metrics has been validated as factors of error probability [9], a symptom of bad understanding. The different validations clearly show that size is an important model factor for understandability, but does not fully determine phenomena of understanding: additional metrics like structuredness help to improve the explanatory power significantly [18].
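To make the notion of a structural metric concrete, the Python sketch below computes control-flow complexity in the form in which Cardoso's metric is commonly stated: every XOR-split contributes its number of outgoing arcs, every OR-split with n outgoing arcs contributes 2^n - 1, and every AND-split contributes 1. This definition is recalled from the literature and is not spelled out in the text above, so it should be checked against [48]. The function name and the example connectors are hypothetical.

```python
def control_flow_complexity(splits):
    """Sum the CFC contributions of split connectors.

    `splits` is an iterable of (connector_type, fan_out) pairs, where
    fan_out is the number of outgoing arcs of the split.
    """
    total = 0
    for connector_type, fan_out in splits:
        if connector_type == "XOR":
            total += fan_out            # one option per alternative branch
        elif connector_type == "OR":
            total += 2 ** fan_out - 1   # every non-empty subset of branches
        elif connector_type == "AND":
            total += 1                  # all branches are always taken
        else:
            raise ValueError(f"unknown connector type: {connector_type!r}")
    return total


# Hypothetical model with three split connectors.
original = [("XOR", 2), ("AND", 2), ("OR", 3)]
# Variant in which the XOR-split has been turned into an AND-split.
variant = [("AND", 2), ("AND", 2), ("OR", 3)]
```

Here the original list yields 2 + 1 + 7 = 10 and the variant yields 1 + 1 + 7 = 9, so the metric reacts to a change in routing logic even though the number of tasks stays the same.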

Personal factors have been less intensively researched as factors of process model understanding. The experiment by Recker and Dreiling operationalizes the notion of process modeling expertise by a level of familiarity with a particular modeling notation [54]. In a survey by Mendling, Reijers, and Cardoso, participants are characterized based on the number of process models they created and the years of modeling experience they have [13]. Mendling and Strembeck measure theoretical knowledge of the participants in another survey using six yes/no questions [55]. Most notable are two results that point to the importance of theoretical process modeling knowledge. In the Mendling, Reijers, and Cardoso survey, the participants from TU Eindhoven with strong Petri net education scored best, and in the Mendling and Strembeck survey, there was a high correlation between theoretical knowledge and the understandability score.

There are other factors that also might have an impact on process model understanding. We briefly discuss model purpose, problem domain, modeling notation, and visual layout.

Model purpose: The understanding of a model may be affected by the specific purpose the modeler had in mind. The best example is that some process models are not intended to be used on a day-to-day basis by people but instead are explicitly created for automatic enactment. In such a case, less care will be given to make them comprehensible to humans. The differences between process models as a result of different modeling purposes are mentioned, for example, in [6]. Empirical research into this factor is missing.

Problem domain: People may find it easier to read a model about a domain they are familiar with than other models. While this has not been established for process models, it is known from software engineering that domain knowledge affects the understanding of particular code [56].

Modeling notation: In the presence of many different notations for process models, e.g. UML Activity diagrams, EPCs, BPMN, and Petri nets, it cannot be ruled out that some of these are inherently more suitable to convey meaning to people than others. Empirical research that has explored this difference is, for example, reported in [57]. According to these publications, the impact of the notation being used is not very high, maybe because the languages are too similar. Other research that compares notations of a different focus identifies a significant impact on understanding [58], [59].

Visual layout: Semantically equivalent models can be arranged in different ways. For example, different line drawing algorithms can be used or models may be split up into different submodels. The effect of layout on process model understanding was already noted in the early 1990s [60]. With respect to graphs, it is a well-known result that edge crossings negatively affect understanding [61]. Also, for process models, the use of modularity can improve understanding [62].

Given that, as we argued, the insights into the understanding of process models are limited, this is probably not a complete set of factors. But even at this number, it would be difficult to investigate them all together. In this study, we restrict ourselves to the first two categories, i.e. personal and model factors. In the definition of this survey, which will be explained in the next section, we will discuss how we aimed to neutralize the potential effects of the other categories.

D. Summary

From cognitive research into program understanding we can expect that personal and model factors are likely to be factors of process model understandability. The impact of size as an important metric has been established by prior research. Yet, it only partially explains phenomena of understanding. Personal factors also appear to be relevant. Theoretical knowledge was found to be a significant factor, but so far only in student experiments. Furthermore, research into the relative importance of personal and model factors is missing. In the following sections, we present a survey to investigate this question and analyze threats to validity.

III. DEFINITION, PLANNING, AND OPERATION OF THE SURVEY DESIGN

This section explains the definition, the planning, and the operation of a survey design on personal and model-related factors of understanding.

A. Definition

According to the theoretical background we provided, both

the characteristics of the reader of a process model and those

of the process model itself affect the understanding that such

a reader may gain from studying that model. Both types of

characteristics can be considered as independent variables,

while the understanding gained from studying a process model

constitutes the dependent variable. Beyond this, there are other

potential factors of inﬂuence which we wish to neutralize,

i.e. model purpose, problem domain, modeling notation, and

visual layout (see Section II-C). To explore the relations that

interest us, the idea is to expose a group of respondents to a set

of process models and then test their understanding of these

models using a self-administered questionnaire. Such a design

shares characteristics with a survey where personal and model

parameters are recorded, but without predeﬁned factor levels.

We use a convenience sample of students. From the analysis

perspective it can be classiﬁed as a correlational study that

seeks to identify statistical connections between measurements

of interest. Similar designs have been used to investigate

metrics in software engineering, e.g. in [63]. Conclusions on

causality are limited, though.
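
As a minimal sketch of what identifying such statistical connections can look like, the following computes a rank correlation between two series of measurements. The choice of Spearman's coefficient, the function names, and the toy values are assumptions made for illustration; they are not taken from the study.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (spread_x * spread_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented toy values, for illustration only (not data from the study).
self_assessed_level = [1, 2, 2, 4, 5]
share_of_correct_answers = [0.4, 0.5, 0.6, 0.7, 0.9]
rho = spearman(self_assessed_level, share_of_correct_answers)
```

A rank-based coefficient is used in this sketch because it only relies on the ordering of the values; like any correlation, it does not establish causality.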

We strove to neutralize the influence of other relevant

factors. First of all, a set of process models from practice was

gathered that was specifically developed for documentation

purposes. Next, to eliminate the influence of domain knowledge,

all the task labels in these process models were replaced

by neutral identifiers, i.e. capital letters A to W. In this way,

we also prevent a potential bias stemming from the varying length

of natural activity labels (see [55]). Based on the observation

by [57] that EPCs appear to be easier to understand than

Petri nets, we chose an EPC-like notation without events.

The participants received a short informal description of the

semantics similar to [64, p. 25]. Finally, all models were

graphically presented on one page, without use of modularity,

and drawn in the same top-to-bottom layout with the start

element at the top and end element at the bottom.
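
The replacement of task labels by neutral identifiers can be sketched as follows. This is an illustrative assumption about how such a mapping could be produced, not the procedure used by the authors; the function name and the sample labels are hypothetical.

```python
from string import ascii_uppercase


def neutralize_labels(task_labels):
    """Map each distinct task label to a neutral capital letter (A, B, C, ...)."""
    distinct = list(dict.fromkeys(task_labels))  # keep first-seen order, drop repeats
    if len(distinct) > len(ascii_uppercase):
        raise ValueError("more than 26 distinct tasks; single letters do not suffice")
    return {label: ascii_uppercase[i] for i, label in enumerate(distinct)}


# Hypothetical labels, invented for illustration only.
labels = ["Check credit history", "Notify customer", "Archive request"]
mapping = neutralize_labels(labels)
```

With 23 distinct tasks, such a mapping ends at the letter W, which matches the range of identifiers mentioned above.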

Furthermore, in our exploration we wish to exclude one

particular process model characteristic, which is size. As we

argued in the introduction of this paper and our discussion

of related work, process model size is the one model characteristic

whose impact on both error proneness and

understanding is reasonably well understood. Because it is our

purpose to look beyond the impact of this particular aspect,

we have controlled the number of tasks in the process model.

Each of the included process models has the same number

of tasks. However, to allow for variation across the other

model characteristics, two additional variants were constructed

for each of the real process models. The variations were

established by changing one or two routing elements in each

of these models (e.g. turning a particular XOR-split into an AND-split).
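
A minimal sketch of deriving such a variant is given below. The dictionary-based model representation, the function name, and the toy model are assumptions for illustration; only the type of a routing element changes, so the number of tasks stays the same.

```python
import copy


def make_variant(model, connector_changes):
    """Return a copy of `model` in which the given connectors get a new type."""
    variant = copy.deepcopy(model)  # leave the original model untouched
    for connector_id, new_type in connector_changes.items():
        if connector_id not in variant["connectors"]:
            raise KeyError(f"unknown connector: {connector_id}")
        variant["connectors"][connector_id] = new_type
    return variant


# Hypothetical toy model, not one of the models used in the study.
original = {
    "tasks": ["A", "B", "C"],
    "connectors": {"c1": "XOR-split", "c2": "XOR-join"},
    "arcs": [("A", "c1"), ("c1", "B"), ("c1", "C"), ("B", "c2"), ("C", "c2")],
}
variant = make_variant(original, {"c1": "AND-split"})
```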

Having taken care of the various factors we wish to control,

at this point we can refine which personal and model factors

are taken into account and how these are measured in the

questionnaire. Note that a summary of the questionnaire is

presented in Appendix A.

For the personal factors, we take the following variables

into consideration:

• THEORY: A person’s theoretical knowledge on process

modeling. This variable is measured as a self-assessment

by the respondents on a 5-point ordinal scale, with anchor

points “I have weak theoretical knowledge” and “I have

strong theoretical knowledge”.

• PRACTICE: A person’s practical experience with pro-

cess modeling. This variable is a self-assessment by

the respondents. It is measured on a 4-point ordinal

scale. The scale has anchor points “I never use business

process modeling in practice” and “I use business process

modeling in practice every day”.

• EDUCATION: A person’s educational background. This

categorical variable refers to the educational institute that

the respondent is registered at.

For the model factors, several variables are included. These

variables are all formally defined in [18, pp. 117-128], with

the exception of the cross-connectivity metric that is specified

in [14]. The model factors can be characterized as follows:

• #NODES, #ARCS, #TASKS, #CONNECTORS, #AND-

SPLITS, #AND-JOINS, #XOR-SPLITS, #XOR-JOINS,

#OR-SPLITS, #OR-JOINS: These variables all relate to

the number of elements of a particular type in a process

model. These include counts for the number of

arcs (#ARCS) and nodes (#NODES). The latter can be

further subdivided into #TASKS on the one hand and

#CONNECTORS on the other. The most specific counts

are subcategorizations of the different types of logical

connectors, like #AND-SPLITS and #OR-JOINS.

• DIAMETER: The length of the longest path from a start

node to an end node in the process model.

• TOKEN SPLITS: The maximum number of paths in a

process model that may be concurrently initiated through

the use of AND-splits and OR-splits.

• AVERAGE CONNECTOR DEGREE, MAXIMUM CONNEC-

TOR DEGREE: The AVERAGE CONNECTOR DEGREE ex-

presses the average of the number of both incoming

and outgoing arcs of the connector nodes in the process

model; the MAXIMUM CONNECTOR DEGREE expresses

the maximum sum of incoming and outgoing arcs of those

connector nodes.

• CONTROL FLOW COMPLEXITY: A weighted sum of all

connectors that are used in a process model.

• MISMATCH: The sum of connector pairs that do not match

with each other, e.g. when an AND-split is followed up

by an OR-join.

• DEPTH: The maximum nesting of structured blocks in a

process model.

• CONNECTIVITY, DENSITY: While CONNECTIVITY re-

lates to the ratio of the total number of arcs in a process

model to its total number of nodes, DENSITY relates to

the ratio of the total number of arcs in a process model

to the theoretical maximum number of arcs (i.e. when

all nodes are directly connected).

• CROSS-CONNECTIVITY: The extent to which all the

nodes in a model are connected to each other.

• SEQUENTIALITY: The degree to which the model is

constructed of pure sequences of tasks.

• SEPARABILITY: The ratio of the number of cut-vertices

on the one hand, i.e. nodes that serve as bridges between

otherwise disconnected components, to the total number

of nodes in the process model on the other.

• STRUCTUREDNESS: The extent to which a process model

can be built by nesting blocks of matching split and join

connectors.

• CONNECTOR HETEROGENEITY: The extent to which dif-

ferent types of connectors are used in a process model.
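
Several of the factors listed above follow directly from the graph structure of a model. The sketch below computes a subset of them from the informal descriptions given here; the formal definitions are in [18]. The function name and the toy model are hypothetical, the model is assumed to be connected with at least two nodes, and the maximum number of arcs is assumed to be n(n - 1) for n nodes, which is an interpretation of "all nodes are directly connected".

```python
from collections import defaultdict


def structural_metrics(tasks, connectors, arcs):
    """Compute a few model factors for a connected process model.

    tasks: iterable of task ids; connectors: dict mapping a connector id to
    its type (e.g. "XOR-split"); arcs: list of (source, target) pairs.
    """
    nodes = set(tasks) | set(connectors)
    n, a = len(nodes), len(arcs)

    degree = defaultdict(int)      # incoming plus outgoing arcs per node
    neighbours = defaultdict(set)  # adjacency with arc direction ignored
    for src, dst in arcs:
        degree[src] += 1
        degree[dst] += 1
        neighbours[src].add(dst)
        neighbours[dst].add(src)

    def reachable_without(removed):
        """Size of one connected component after deleting `removed`."""
        remaining = nodes - {removed}
        if not remaining:
            return 0
        start = next(iter(remaining))
        seen, stack = {start}, [start]
        while stack:
            for nxt in neighbours[stack.pop()]:
                if nxt != removed and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return len(seen)

    # A cut vertex is a node whose removal disconnects the remaining nodes.
    cut_vertices = {v for v in nodes if reachable_without(v) < n - 1}
    connector_degrees = [degree[c] for c in connectors]

    return {
        "#NODES": n,
        "#ARCS": a,
        "#TASKS": len(set(tasks)),
        "#CONNECTORS": len(connectors),
        "CONNECTIVITY": a / n,
        # Assumption: at most n * (n - 1) arcs (directed, no self-loops).
        "DENSITY": a / (n * (n - 1)),
        "AVERAGE CONNECTOR DEGREE": (
            sum(connector_degrees) / len(connector_degrees)
            if connector_degrees else 0.0
        ),
        "MAXIMUM CONNECTOR DEGREE": max(connector_degrees, default=0),
        # Ratio of cut vertices to all nodes, as described in the list above.
        "SEPARABILITY": len(cut_vertices) / n,
    }


# Hypothetical toy model: A, then an XOR choice between B and C, then D.
tasks = ["A", "B", "C", "D"]
connectors = {"s": "XOR-split", "j": "XOR-join"}
arcs = [("A", "s"), ("s", "B"), ("s", "C"), ("B", "j"), ("C", "j"), ("j", "D")]
metrics = structural_metrics(tasks, connectors, arcs)
```

For this toy model, six arcs over six nodes give a CONNECTIVITY of 1.0 and a DENSITY of 0.2, and the two connectors are the only cut vertices.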

To illustrate these factors, we refer the reader to Figure 2.

Shown here is a model of a loan request process expressed in

the EPC modeling notation, which is elaborated in [18, pp. 19-

20]. In addition to the standard EPC notational elements, tags

are added to identify sequence arcs, cut vertices, and cycle

nodes. Additionally, the numbers of incoming and outgoing

arcs are given for each node, as well as a bold arc that provides

the diameter of the model. All these notions are instrumental

in calculating the exact values of the model factors that were

presented previously. For this particular model, the values of

the model factors are given in Table I.
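
The diameter can be sketched in the same spirit. The code below searches for the longest path from a start node to an end node without visiting any node twice. Measuring the length in arcs, treating nodes without incoming arcs as start nodes, and the function name are assumptions of this sketch; the exhaustive search is only practical for small models.

```python
def diameter(nodes, arcs):
    """Length (in arcs) of the longest simple path from a start to an end node."""
    successors = {v: [] for v in nodes}
    has_incoming = set()
    for src, dst in arcs:
        successors[src].append(dst)
        has_incoming.add(dst)
    starts = [v for v in nodes if v not in has_incoming]

    def longest_from(node, visited):
        if not successors[node]:
            return 0  # reached an end node
        best = -1  # -1 means no end node is reachable without revisiting a node
        for nxt in successors[node]:
            if nxt not in visited:
                tail = longest_from(nxt, visited | {nxt})
                if tail >= 0:
                    best = max(best, tail + 1)
        return best

    return max((longest_from(s, {s}) for s in starts), default=-1)


# Hypothetical toy models, for illustration only.
choice = diameter(
    ["A", "s", "B", "C", "j", "D"],
    [("A", "s"), ("s", "B"), ("s", "C"), ("B", "j"), ("C", "j"), ("j", "D")],
)
loop = diameter(
    ["A", "j", "B", "s", "C"],
    [("A", "j"), ("j", "B"), ("B", "s"), ("s", "C"), ("s", "j")],
)
```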

Having discussed the independent variables, we now need to

address how the understanding of a process model is captured.

Comprehension can be measured along various dimensions;

for an overview see [65]. For our research, we

focus on a SCORE variable. SCORE is a quantification of a

respondent’s accurate understanding of a process model. This

ratio is determined by the set of correct answers to a set of

seven closed questions and one open question. The closed

questions confront the respondent with execution order, ex-

clusiveness, concurrency, and repeatability issues (see Section

II-A) which are linked to closed questions (yes/no/I don’t

Frequently Asked Questions (7)

Q1. What have the authors contributed in "A study into the factors that influence the understandability of business process models" ?

On the basis of a sound theoretical foundation, this paper presents a study into these factors.

Q2. What have the authors stated for future works in "A study into the factors that influence the understandability of business process models" ?

Future research is needed for analyzing the relative importance of model size in comparison to personal expertise, and should explicitly consider potential interaction effects. The authors also plan to investigate the significance of those factors for understanding that they neutralized in this research. Finally, the case of model L points to research opportunities on the difficulty of understanding particular process model components. As certain components can be reduced because they are always correct, it might be interesting to investigate whether certain components can be understood with the same ease, even if they are moved to a different position in the process model.

Q3. What are the main aspects of the formal focus?

In particular, the authors focus on binary relationships between two activities in terms of execution order, exclusiveness, concurrency, and repetition.

Q4. What is the importance of secondary notation in process models?

For models of increasing complexity, secondary notation also gains in importance for making the hidden dependencies better visible.

Q5. What can be easily verified using the reachability graph of the process model?

Statements such as “Executing activity ai implies that aj will be executed later” can be easily verified using the reachability graph of the process model.

Q6. What is the main idea of the paper?

The research conducted by a group including Canfora, Rolón, and García analyzes understandability as an aspect of maintainability.

Q7. What is the commonly used notation for process modeling?

This notation essentially covers the commonalities of Event-driven Process Chains (EPCs) [17], [18] and the Business Process Modeling Notation (BPMN) [19], which are two of the most frequently used notations for process modeling.