Simplifying decision trees

doi:10.1016/S0020-7373(87)80053-6

Journal Article•DOI•

J. R. Quinlan¹•Institutions (1)

01 Sep 1987-International Journal of Human-computer Studies / International Journal of Man-machine Studies (Academic Press Ltd.)-Vol. 27, Iss: 3, pp 221-234

TL;DR: Four techniques for simplifying decision trees while retaining their accuracy are described, illustrated, and compared on a test-bed of decision trees from a variety of domains.

Abstract: Many systems have been developed for constructing decision trees from collections of examples. Although the decision trees generated by these methods are accurate and efficient, they often suffer the disadvantage of excessive complexity and are therefore incomprehensible to experts. It is questionable whether opaque structures of this kind can be described as knowledge, no matter how well they function. This paper discusses techniques for simplifying decision trees while retaining their accuracy. Four methods are described, illustrated, and compared on a test-bed of decision trees from a variety of domains.

Citations

Book•

Data Mining: Concepts and Techniques

Jiawei Han¹, Micheline Kamber², Jian Pei²•Institutions (2)

University of Illinois at Urbana–Champaign¹, Simon Fraser University²

08 Sep 2000

TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.

Abstract: The increasing volume of data in modern business and science calls for more complex and sophisticated tools. Although advances in data mining technology have made extensive data collection much easier, the technology is still evolving and there is a constant need for new techniques and tools that can help us transform this data into useful information and knowledge. Since the previous edition's publication, great advances have been made in the field of data mining. Not only does the third edition of Data Mining: Concepts and Techniques continue the tradition of equipping you with an understanding and application of the theory and practice of discovering patterns hidden in large data sets, it also focuses on new, important topics in the field: data warehouses and data cube technology, mining streams, mining social networks, and mining spatial, multimedia and other complex data. Each chapter is a stand-alone guide to a critical topic, presenting proven algorithms and sound implementations ready to be used directly or with strategic modification against live data. This is the resource you need if you want to apply today's most powerful data mining techniques to meet real business challenges. * Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects. * Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields. * Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.

23,600 citations

Journal Article•DOI•

Statistical pattern recognition: a review

Anil K. Jain¹, Robert P. W. Duin², Jianchang Mao³•Institutions (3)

Michigan State University¹, Delft University of Technology², IBM³

01 Jan 2000-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: The objective of this review paper is to summarize and compare some of the well-known methods used in various stages of a pattern recognition system and identify research topics and applications which are at the forefront of this exciting and challenging field.

Abstract: The primary goal of pattern recognition is supervised or unsupervised classification. Among the various frameworks in which pattern recognition has been traditionally formulated, the statistical approach has been most intensively studied and used in practice. More recently, neural network techniques and methods imported from statistical learning theory have been receiving increasing attention. The design of a recognition system requires careful attention to the following issues: definition of pattern classes, sensing environment, pattern representation, feature extraction and selection, cluster analysis, classifier design and learning, selection of training and test samples, and performance evaluation. In spite of almost 50 years of research and development in this field, the general problem of recognizing complex patterns with arbitrary orientation, location, and scale remains unsolved. New and emerging applications, such as data mining, web searching, retrieval of multimedia data, face recognition, and cursive handwriting recognition, require robust and efficient pattern recognition techniques. The objective of this review paper is to summarize and compare some of the well-known methods used in various stages of a pattern recognition system and identify research topics and applications which are at the forefront of this exciting and challenging field.

6,527 citations

Monographs on statistics and applied probability

V. Isham, N. Keiding, T. Louis, Susan A. Murphy, Richard Smith, Howell Tong

01 Jan 2007

4,221 citations

Book Chapter•DOI•

Fast effective rule induction

William W. Cohen¹•Institutions (1)

Bell Labs¹

09 Jul 1995

TL;DR: This paper evaluates the recently-proposed rule learning algorithm IREP on a large and diverse collection of benchmark problems, and proposes a number of modifications resulting in an algorithm RIPPERk that is very competitive with C4.5 and C4.5rules with respect to error rates, but much more efficient on large samples.

Abstract: Many existing rule learning systems are computationally expensive on large noisy datasets. In this paper we evaluate the recently-proposed rule learning algorithm IREP on a large and diverse collection of benchmark problems. We show that while IREP is extremely efficient, it frequently gives error rates higher than those of C4.5 and C4.5rules. We then propose a number of modifications resulting in an algorithm RIPPERk that is very competitive with C4.5rules with respect to error rates, but much more efficient on large samples. RIPPERk obtains error rates lower than or equivalent to C4.5rules on 22 of 37 benchmark problems, scales nearly linearly with the number of training examples, and can efficiently process noisy datasets containing hundreds of thousands of examples.

4,081 citations

Cites methods from "Simplifying decision trees"

...Seminal implementations of REP were successfully applied to decision trees by [Quinlan 1987], and to decision lists by [Pagallo and Haussler, 1990]....
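
The passage above refers to reduced error pruning (REP). As a rough illustration only, the following Python sketch shows one common bottom-up formulation: a subtree is replaced by a leaf whenever the leaf makes no more errors than the subtree on a separate pruning set. The `Node` class, the function names, and the dict-based case format are hypothetical and introduced for this example; this is not claimed to be the exact procedure of [Quinlan 1987].

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    # Hypothetical tree node: `label` is the majority class of the training
    # cases that reached this node; `attr` is None for a leaf.
    label: str
    attr: Optional[str] = None
    children: dict = field(default_factory=dict)  # attribute value -> Node


def classify(node, case):
    """Follow attribute tests down the tree; stop at a leaf or unseen value."""
    while node.attr is not None:
        child = node.children.get(case.get(node.attr))
        if child is None:
            break  # unseen value: fall back to this node's majority label
        node = child
    return node.label


def errors(node, cases):
    """Count misclassified (case, true_label) pairs."""
    return sum(classify(node, x) != y for x, y in cases)


def reduced_error_prune(node, cases):
    """Prune bottom-up using held-out `cases` that reach this node."""
    if node.attr is None:
        return node
    for value, child in list(node.children.items()):
        subset = [(x, y) for x, y in cases if x.get(node.attr) == value]
        node.children[value] = reduced_error_prune(child, subset)
    # Errors if this node were collapsed to a leaf with its majority label.
    leaf_errors = sum(y != node.label for _, y in cases)
    if leaf_errors <= errors(node, cases):
        return Node(label=node.label)  # the leaf is no worse: prune
    return node
```

Note that under this formulation a subtree reached by no pruning cases is always collapsed, since both error counts are zero; whether that is desirable depends on how much held-out data is available.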

Correlation-based Feature Selection for Machine Learning

Mark Hall

01 Jan 1998

TL;DR: This thesis addresses the problem of feature selection for machine learning through a correlation based approach with CFS (Correlation based Feature Selection), an algorithm that couples this evaluation formula with an appropriate correlation measure and a heuristic search strategy.

Abstract: A central problem in machine learning is identifying a representative set of features from which to construct a classification model for a particular task. This thesis addresses the problem of feature selection for machine learning through a correlation based approach. The central hypothesis is that good feature sets contain features that are highly correlated with the class, yet uncorrelated with each other. A feature evaluation formula, based on ideas from test theory, provides an operational definition of this hypothesis. CFS (Correlation based Feature Selection) is an algorithm that couples this evaluation formula with an appropriate correlation measure and a heuristic search strategy. CFS was evaluated by experiments on artificial and natural datasets. Three machine learning algorithms were used: C4.5 (a decision tree learner), IB1 (an instance based learner), and naive Bayes. Experiments on artificial datasets showed that CFS quickly identifies and screens irrelevant, redundant, and noisy features, and identifies relevant features as long as their relevance does not strongly depend on other features. On natural domains, CFS typically eliminated well over half the features. In most cases, classification accuracy using the reduced feature set equaled or bettered accuracy using the complete feature set. Feature selection degraded machine learning performance in cases where some features were eliminated which were highly predictive of very small areas of the instance space. Further experiments compared CFS with a wrapper, a well known approach to feature selection that employs the target learning algorithm to evaluate feature sets. In many cases CFS gave comparable results to the wrapper, and in general, outperformed the wrapper on small datasets. CFS executes many times faster than the wrapper, which allows it to scale to larger datasets. Two methods of extending CFS to handle feature interaction are presented and experimentally evaluated. The first considers pairs of features and the second incorporates feature weights calculated by the RELIEF algorithm. Experiments on artificial domains showed that both methods were able to identify interacting features. On natural domains, the pairwise method gave more reliable results than using weights provided by RELIEF.
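
The abstract does not state the feature evaluation formula itself. The form usually quoted for CFS is Merit = k * r_cf / sqrt(k + k(k-1) * r_ff), where k is the number of features in the subset, r_cf the mean feature-class correlation, and r_ff the mean feature-feature correlation. The Python sketch below assumes that form; the function name `cfs_merit` and its list-based inputs are hypothetical, and the correlations are assumed to lie in [0, 1].

```python
import math


def cfs_merit(feature_class_corr, feature_feature_corr):
    """Merit of a feature subset under the assumed CFS formula.

    feature_class_corr: one correlation with the class per feature.
    feature_feature_corr: one correlation per pair of features
    (empty for a single-feature subset).
    """
    k = len(feature_class_corr)
    if k == 0:
        return 0.0
    r_cf = sum(feature_class_corr) / k
    r_ff = (sum(feature_feature_corr) / len(feature_feature_corr)
            if feature_feature_corr else 0.0)
    # High class correlation raises the merit; redundancy among the
    # features (large r_ff) enlarges the denominator and lowers it.
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)
```

With the same class correlations, a subset whose features are uncorrelated with each other scores higher than one whose features are fully redundant, which matches the central hypothesis stated in the abstract.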

3,533 citations

References

Journal Article•DOI•

Classification and Regression Trees.

John Van Ryzin, Leo Breiman, Jerome H. Friedman, Richard A. Olshen, Charles J. Stone

01 Mar 1986-Journal of the American Statistical Association

21,694 citations

Journal Article•DOI•

Induction of Decision Trees

J. R. Quinlan

25 Mar 1986-Machine Learning

TL;DR: This paper summarizes an approach to synthesizing decision trees that has been used in a variety of systems, describes one such system, ID3, in detail, and discusses a reported shortcoming of the basic algorithm.

Abstract: The technology for building knowledge-based systems by inductive inference from examples has been demonstrated successfully in several practical applications. This paper summarizes an approach to synthesizing decision trees that has been used in a variety of systems, and it describes one such system, ID3, in detail. Results from recent studies show ways in which the methodology can be modified to deal with information that is noisy and/or incomplete. A reported shortcoming of the basic algorithm is discussed and two means of overcoming it are compared. The paper concludes with illustrations of current research directions.
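
The abstract names ID3 without describing its splitting criterion. ID3 is generally described as choosing, at each node, the attribute with the highest information gain; the Python sketch below illustrates that selection step under this assumption. The function names and the list-of-dicts data layout are hypothetical and chosen only for this example.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on `attr`."""
    groups = defaultdict(list)
    for row, y in zip(rows, labels):
        groups[row[attr]].append(y)
    # Expected entropy after the split, weighted by group size.
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attrs):
    """Pick the attribute with the largest information gain."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))
```

Applied recursively to each resulting subset until the labels are pure or no attributes remain, this selection step yields a decision tree.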

17,177 citations

Book•

A Guide to Expert Systems

Donald A. Waterman¹•Institutions (1)

RAND Corporation¹

01 Sep 1985

TL;DR: Technical managers, professionals, and researchers who are considering the implementation or application of expert systems will find this book to be an authoritative, but accessible guide to the state-of-the-art.

Abstract: This is a comprehensive introduction to expert systems designed specifically for the reader without a computer science background. Carefully written and illustrated, it covers working systems in commercial use, applications for which they are most suitable and guidelines for building a system. Technical managers, professionals, and researchers who are considering the implementation or application of expert systems will find this book to be an authoritative, but accessible guide to the state-of-the-art.

1,428 citations

Book•

A Guide to Expert Systems

Waterman

01 Jan 1986

1,385 citations

Book•

Pattern-directed inference systems

D. A. Waterman, Frederick Hayes-Roth

01 Jan 1978

TL;DR: In this paper, the authors discuss a crop identification and acreage estimation case study, followed by rather brief discussions of five selected management problems: large area land use inventory and forest, snow-cover, geologic, and water-temperature mapping.

Abstract: Chapter 6 puts the information covered to that point into direct application. The authors first discuss in some detail a crop identification and acreage estimation case study. This is followed by rather brief discussions of five selected management problems: large area land use inventory and forest, snow-cover, geologic, and water-temperature mapping. Serious students will wish to supplement these with studies of problems pertinent to their own areas of special interest. While much of the information presented is valuable, I see little justification for the final chapter since most of the material in it could well have been worked into other parts of the text. A few imperfections merit comment. Reproduction of some of the aerial photographs and images does not meet the standards which were imposed on the drawings. For example, the images in Fig. 5-39 are difficult to interpret, although that problem may relate more to the small size of each wave band illustrated than to the quality of photographic reproduction. The areas shown in Fig. 1-7 to illustrate the three spectral regions are not the same scale; further, the same areas (with the same scale problem) are shown in Fig. 5-41. Fig. 6-13 contributes little to an understanding of the selection or appearance of training areas; nor does Fig. 6-15 to the selection of test areas. Three chapters have brief but useful summary sections. The other four would have benefited by a similar procedure. While the selection of terms to include in a glossary is a difficult task, a few which are encountered frequently in quantitative remote sensing were omitted, e.g., band ratioing, minimum Euclidean distance classifier, maximum likelihood classifier, smoothing, vector, etc. While there are savings in printing costs to have all color plates grouped on four pages, I found this system awkward to use and disruptive of comprehension. I was surprised, too, that answers to the questions posed after the various sections are not given. Individuals using the text on a self-study basis probably would not have a background adequate to verify their answers without such assistance. These, though, are relatively minor criticisms. Overall, this is one of the best sources of information that I have encountered on the subject of quantitative remote sensing. It would serve well as the textbook for courses at various levels and for students with a wide range of backgrounds. Professionals in the field of remote sensing will wish to add this volume …

584 citations