Author

Donato Malerba

Other affiliations: University of Calabar, Logica

Bio: Donato Malerba is an academic researcher from the University of Bari. The author has contributed to research on topics including spatial analysis and cluster analysis. The author has an h-index of 36 and has co-authored 361 publications receiving 6,010 citations. Previous affiliations of Donato Malerba include the University of Calabar and Logica.

Papers published on a yearly basis: chart spanning 1990 to 2023

Papers

Book Chapter

Process Mining Manifesto

Wil M. P. van der Aalst¹, Wil M. P. van der Aalst², A Arya Adriansyah¹, Ana Karla Alves de Medeiros³, Franco Arcieri⁴, Thomas Baier⁵, Tobias Blickle⁶, Jagadeesh Chandra Bose¹, Peter van den Brand, Ronald Brandtjen, Joos C. A. M. Buijs¹, Andrea Burattin⁷, Josep Carmona⁸, Malu Castellanos⁹, Jan Claes¹⁰, Jonathan Cook¹¹, Nicola Costantini, Francisco Curbera¹², Ernesto Damiani¹³, Massimiliano de Leoni¹, Pavlos Delias, Boudewijn F. van Dongen¹, Marlon Dumas¹⁴, Schahram Dustdar¹⁵, Dirk Fahland¹, Diogo R. Ferreira¹⁶, Walid Gaaloul¹⁷, Frank van Geffen¹⁸, Sukriti Goel¹⁹, CW Christian Günther, Antonella Guzzo²⁰, Paul Harmon, Arthur H. M. ter Hofstede¹, Arthur H. M. ter Hofstede², John Hoogland, Jon Espen Ingvaldsen, Koki Kato²¹, Rudolf Kuhn, Akhil Kumar²², Marcello La Rosa², Fabrizio Maria Maggi¹, Donato Malerba²³, RS Ronny Mans¹, Alberto Manuel, Martin McCreesh, Paola Mello²⁴, Jan Mendling²⁵, Marco Montali²⁶, Hamid Reza Motahari-Nezhad⁹, Michael zur Muehlen²⁷, Jorge Munoz-Gama⁸, Luigi Pontieri²⁸, Joel Ribeiro¹, A Anne Rozinat, Hugo Seguel Pérez, Ricardo Seguel Pérez, Marcos Sepúlveda²⁹, Jim Sinur, Pnina Soffer³⁰, Minseok Song³¹, Alessandro Sperduti⁷, Giovanni Stilo⁴, Casper Stoel, Keith D. Swenson²¹, Maurizio Talamo⁴, Wei Tan¹², Christopher Turner³², Jan Vanthienen³³, George Varvaressos, Eric Verbeek¹, Marc Verdonk³⁴, Roberto Vigo, Jianmin Wang³⁵, Barbara Weber³⁶, Matthias Weidlich³⁷, Ton Weijters¹, Lijie Wen³⁵, Michael Westergaard¹, Moe Thandar Wynn²•Institutions (37)

Eindhoven University of Technology¹, Queensland University of Technology², Capgemini³, University of Rome Tor Vergata⁴, Humboldt University of Berlin⁵, Software AG⁶, University of Padua⁷, Polytechnic University of Catalonia⁸, Hewlett-Packard⁹, Ghent University¹⁰, New Mexico State University¹¹, IBM¹², University of Milan¹³, University of Tartu¹⁴, University of Vienna¹⁵, Technical University of Lisbon¹⁶, Telecom SudParis¹⁷, Rabobank¹⁸, Infosys¹⁹, University of Calabria²⁰, Fujitsu²¹, Pennsylvania State University²², University of Bari²³, University of Bologna²⁴, Vienna University of Economics and Business²⁵, Free University of Bozen-Bolzano²⁶, Stevens Institute of Technology²⁷, Indian Council of Agricultural Research²⁸, Pontifical Catholic University of Chile²⁹, University of Haifa³⁰, Ulsan National Institute of Science and Technology³¹, Cranfield University³², Katholieke Universiteit Leuven³³, Deloitte³⁴, Tsinghua University³⁵, University of Innsbruck³⁶, Hasso Plattner Institute³⁷

01 Jan 2012

TL;DR: This manifesto hopes to serve as a guide for software developers, scientists, consultants, business managers, and end-users to increase the maturity of process mining as a new tool to improve the design, control, and support of operational business processes.

Abstract: Process mining techniques are able to extract knowledge from event logs commonly available in today’s information systems. These techniques provide new means to discover, monitor, and improve processes in a variety of application domains. There are two main drivers for the growing interest in process mining. On the one hand, more and more events are being recorded, thus, providing detailed information about the history of processes. On the other hand, there is a need to improve and support business processes in competitive and rapidly changing environments. This manifesto is created by the IEEE Task Force on Process Mining and aims to promote the topic of process mining. Moreover, by defining a set of guiding principles and listing important challenges, this manifesto hopes to serve as a guide for software developers, scientists, consultants, business managers, and end-users. The goal is to increase the maturity of process mining as a new tool to improve the (re)design, control, and support of operational business processes.

1,135 citations

Journal Article

A comparative analysis of methods for pruning decision trees

Floriana Esposito, Donato Malerba¹, Giovanni Semeraro, J. Kay•Institutions (1)

University of Bari¹

01 May 1997-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A comparative study of six well-known pruning methods that examines their theoretical foundations, their computational complexity, and the strengths and weaknesses of their formulation, and that objectively evaluates each method's tendency to overprune or underprune.

Abstract: In this paper, we address the problem of retrospectively pruning decision trees induced from data, according to a top-down approach. This problem has received considerable attention in the areas of pattern recognition and machine learning, and many distinct methods have been proposed in literature. We make a comparative study of six well-known pruning methods with the aim of understanding their theoretical foundations, their computational complexity, and the strengths and weaknesses of their formulation. Comments on the characteristics of each method are empirically supported. In particular, a wide experimentation performed on several data sets leads us to opposite conclusions on the predictive accuracy of simplified trees from some drawn in the literature. We attribute this divergence to differences in experimental designs. Finally, we prove and make use of a property of the reduced error pruning method to obtain an objective evaluation of the tendency to overprune/underprune observed in each method.

556 citations

Journal Article

Transforming paper documents into XML format with WISDOM++

O. Altamura, Floriana Esposito, Donato Malerba

01 Aug 2001-International Journal on Document Analysis and Recognition

TL;DR: The innovative aspects described in the paper are: the preprocessing algorithm, the adaptive page segmentation, the acquisition of block classification rules using techniques from machine learning, the layout analysis based on general layout principles, and a method that uses document layout information for conversion to HTML/XML formats.

Abstract: The transformation of scanned paper documents to a form suitable for an Internet browser is a complex process that requires solutions to several problems. The application of an OCR to some parts of the document image is only one of the problems. In fact, the generation of documents in HTML format is easier when the layout structure of a page has been extracted by means of a document analysis process. The adoption of an XML format is even better, since it can facilitate the retrieval of documents in the Web. Nevertheless, an effective transformation of paper documents into this format requires further processing steps, namely document image classification and understanding. WISDOM++ is a document processing system that operates in five steps: document analysis, document classification, document understanding, text recognition with an OCR, and transformation into HTML/XML format. The innovative aspects described in the paper are: the preprocessing algorithm, the adaptive page segmentation, the acquisition of block classification rules using techniques from machine learning, the layout analysis based on general layout principles, and a method that uses document layout information for conversion to HTML/XML formats. A benchmarking of the system components implementing these innovative aspects is reported.

129 citations

Journal Article

Classifying web documents in a hierarchy of categories: a comprehensive study

Michelangelo Ceci¹, Donato Malerba¹•Institutions (1)

University of Bari¹

01 Feb 2007

TL;DR: This paper proposes a general hierarchical text categorization framework in which the hierarchy of categories is involved in all phases of automated document classification, namely feature selection, learning, and classification of a new document.

Abstract: Most of the research on text categorization has focused on classifying text documents into a set of categories with no structural relationships among them (flat classification). However, in many information repositories documents are organized in a hierarchy of categories to support a thematic search by browsing topics of interests. The consideration of the hierarchical relationship among categories opens several additional issues in the development of methods for automated document classification. Questions concern the representation of documents, the learning process, the classification process and the evaluation criteria of experimental results. They are systematically investigated in this paper, whose main contribution is a general hierarchical text categorization framework where the hierarchy of categories is involved in all phases of automated document classification, namely feature selection, learning and classification of a new document. An automated threshold determination method for classification scores is embedded in the proposed framework. It can be applied to any classifier that returns a degree of membership of a document to a category. In this work three learning methods are considered for the construction of document classifiers, namely centroid-based, naive Bayes and SVM. The proposed framework has been implemented in the system WebClassIII and has been tested on three datasets (Yahoo, DMOZ, RCV1) which present a variety of situations in terms of hierarchical structure. Experimental results are reported and several conclusions are drawn on the comparison of the flat vs. the hierarchical approach as well as on the comparison of different hierarchical classifiers. The paper concludes with a review of related work and a discussion of previous findings vs. our findings.

120 citations

Journal Article

Inducing Multi-Level Association Rules from Multiple Relations

Francesca A. Lisi¹, Donato Malerba¹•Institutions (1)

University of Bari¹

01 May 2004-Machine Learning

TL;DR: This paper presents a novel approach to association rule mining which deals with multiple levels of description granularity and relies on the hybrid language AL-log which allows a unified treatment of both the relational and structural features of data.

Abstract: Recently there has been growing interest both to extend ILP to description logics and to apply it to knowledge discovery in databases. In this paper we present a novel approach to association rule mining which deals with multiple levels of description granularity. It relies on the hybrid language $\mathcal{AL}$-log which allows a unified treatment of both the relational and structural features of data. A generality order and a downward refinement operator for $\mathcal{AL}$-log pattern spaces is defined on the basis of query subsumption. This framework has been implemented in SPADA, an ILP system for mining multi-level association rules from spatial data. As an illustrative example, we report experimental results obtained by running the new version of SPADA on geo-referenced census data of Manchester Stockport.

119 citations

Cited by

Book

Data Mining: Concepts and Techniques

Jiawei Han¹, Micheline Kamber², Jian Pei²•Institutions (2)

University of Illinois at Urbana–Champaign¹, Simon Fraser University²

08 Sep 2000

TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.

Abstract: The increasing volume of data in modern business and science calls for more complex and sophisticated tools. Although advances in data mining technology have made extensive data collection much easier, it's still always evolving and there is a constant need for new techniques and tools that can help us transform this data into useful information and knowledge. Since the previous edition's publication, great advances have been made in the field of data mining. Not only does the third edition of Data Mining: Concepts and Techniques continue the tradition of equipping you with an understanding and application of the theory and practice of discovering patterns hidden in large data sets, it also focuses on new, important topics in the field: data warehouses and data cube technology, mining stream, mining social networks, and mining spatial, multimedia and other complex data. Each chapter is a stand-alone guide to a critical topic, presenting proven algorithms and sound implementations ready to be used directly or with strategic modification against live data. This is the resource you need if you want to apply today's most powerful data mining techniques to meet real business challenges. * Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects. * Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields. * Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.

23,600 citations

Journal Article

Machine learning

Thomas G. Dietterich¹•Institutions (1)

Oregon State University¹

01 Dec 1996-ACM Computing Surveys

TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.

Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. 
Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

13,246 citations

Pattern Recognition and Machine Learning

Christopher M. Bishop¹•Institutions (1)

Microsoft¹

01 Jan 2006

TL;DR: This book covers probability distributions, linear models for regression and classification, neural networks, kernel methods, sparse kernel machines, graphical models, mixture models and EM, approximate inference, sampling methods, continuous latent variables, sequential data, and combining models.

Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

Data Mining - Concepts and Techniques.

Petra Perner

01 Jan 2002

9,314 citations

Journal Article

Data Mining: Practical Machine Learning Tools and Techniques

อนิรุธ สืบสิงห์

01 Jan 2014-Journal of management science