Journal Article

Simplifying decision trees

TL;DR: Techniques for simplifying decision trees while retaining their accuracy are discussed, described, illustrated, and compared on a test-bed of decision trees from a variety of domains.
Abstract: Many systems have been developed for constructing decision trees from collections of examples. Although the decision trees generated by these methods are accurate and efficient, they often suffer the disadvantage of excessive complexity and are therefore incomprehensible to experts. It is questionable whether opaque structures of this kind can be described as knowledge, no matter how well they function. This paper discusses techniques for simplifying decision trees while retaining their accuracy. Four methods are described, illustrated, and compared on a test-bed of decision trees from a variety of domains.
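As an illustration of what simplifying a tree while retaining its accuracy can look like, here is a minimal sketch of reduced error pruning (REP), a method that a citing excerpt on this page attributes to [Quinlan 1987]. It is a common bottom-up variant, not necessarily the paper's exact procedure. The names `Node`, `classify`, `errors`, and `reduced_error_prune` are hypothetical, and the sketch assumes that each node stores the majority class of its training examples and that a separate pruning set is available.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Example = Tuple[Dict[str, str], str]  # (attribute values, class label)


@dataclass
class Node:
    label: str                       # assumed: majority training class at this node
    attribute: Optional[str] = None  # None marks a leaf
    children: Dict[str, "Node"] = field(default_factory=dict)


def classify(node: Node, x: Dict[str, str]) -> str:
    """Follow attribute tests down the tree; fall back to the node label on unseen values."""
    while node.attribute is not None:
        child = node.children.get(x.get(node.attribute))
        if child is None:
            break
        node = child
    return node.label


def errors(node: Node, examples: List[Example]) -> int:
    """Count misclassified examples."""
    return sum(classify(node, x) != y for x, y in examples)


def reduced_error_prune(node: Node, pruning_set: List[Example]) -> Node:
    """Bottom-up: replace a subtree by a leaf when that does not increase
    the number of errors on the pruning examples that reach it."""
    if node.attribute is None:
        return node
    for value, child in list(node.children.items()):
        subset = [(x, y) for x, y in pruning_set if x.get(node.attribute) == value]
        node.children[value] = reduced_error_prune(child, subset)
    leaf_errors = sum(y != node.label for _, y in pruning_set)
    if leaf_errors <= errors(node, pruning_set):
        return Node(label=node.label)  # the simpler leaf is at least as accurate
    return node
```

Ties are resolved in favour of the leaf, so the pruned tree is never less accurate on the pruning set than the original and is often smaller.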


Citations
Book
08 Sep 2000
TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.
Abstract: The increasing volume of data in modern business and science calls for more complex and sophisticated tools. Although advances in data mining technology have made extensive data collection much easier, the field is still evolving, and there is a constant need for new techniques and tools that can help us transform this data into useful information and knowledge. Since the previous edition's publication, great advances have been made in the field of data mining. Not only does the third edition of Data Mining: Concepts and Techniques continue the tradition of equipping you with an understanding and application of the theory and practice of discovering patterns hidden in large data sets, it also focuses on new, important topics in the field: data warehouses and data cube technology, mining stream data, mining social networks, and mining spatial, multimedia and other complex data. Each chapter is a stand-alone guide to a critical topic, presenting proven algorithms and sound implementations ready to be used directly or with strategic modification against live data. This is the resource you need if you want to apply today's most powerful data mining techniques to meet real business challenges. * Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects. * Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields. * Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.

23,600 citations

Journal Article
TL;DR: The objective of this review paper is to summarize and compare some of the well-known methods used in various stages of a pattern recognition system and identify research topics and applications which are at the forefront of this exciting and challenging field.
Abstract: The primary goal of pattern recognition is supervised or unsupervised classification. Among the various frameworks in which pattern recognition has been traditionally formulated, the statistical approach has been most intensively studied and used in practice. More recently, neural network techniques and methods imported from statistical learning theory have been receiving increasing attention. The design of a recognition system requires careful attention to the following issues: definition of pattern classes, sensing environment, pattern representation, feature extraction and selection, cluster analysis, classifier design and learning, selection of training and test samples, and performance evaluation. In spite of almost 50 years of research and development in this field, the general problem of recognizing complex patterns with arbitrary orientation, location, and scale remains unsolved. New and emerging applications, such as data mining, web searching, retrieval of multimedia data, face recognition, and cursive handwriting recognition, require robust and efficient pattern recognition techniques. The objective of this review paper is to summarize and compare some of the well-known methods used in various stages of a pattern recognition system and identify research topics and applications which are at the forefront of this exciting and challenging field.

6,527 citations

Book Chapter
William W. Cohen
09 Jul 1995
TL;DR: This paper evaluates the recently-proposed rule learning algorithm IREP on a large and diverse collection of benchmark problems, and proposes a number of modifications resulting in an algorithm RIPPERk that is very competitive with C4.5 and C4.5rules with respect to error rates, but much more efficient on large samples.
Abstract: Many existing rule learning systems are computationally expensive on large noisy datasets. In this paper we evaluate the recently-proposed rule learning algorithm IREP on a large and diverse collection of benchmark problems. We show that while IREP is extremely efficient, it frequently gives error rates higher than those of C4.5 and C4.5rules. We then propose a number of modifications resulting in an algorithm RIPPERk that is very competitive with C4.5rules with respect to error rates, but much more efficient on large samples. RIPPERk obtains error rates lower than or equivalent to C4.5rules on 22 of 37 benchmark problems, scales nearly linearly with the number of training examples, and can efficiently process noisy datasets containing hundreds of thousands of examples.

4,081 citations


Cites methods from "Simplifying decision trees"

  • ...Seminal implementations of REP were successfully applied to decision trees by [Quinlan 1987], and to decision lists by [Pagallo and Haussler, 1990]....

    [...]

01 Jan 1998
TL;DR: This thesis addresses the problem of feature selection for machine learning through a correlation based approach with CFS (Correlation based Feature Selection), an algorithm that couples this evaluation formula with an appropriate correlation measure and a heuristic search strategy.
Abstract: A central problem in machine learning is identifying a representative set of features from which to construct a classification model for a particular task. This thesis addresses the problem of feature selection for machine learning through a correlation based approach. The central hypothesis is that good feature sets contain features that are highly correlated with the class, yet uncorrelated with each other. A feature evaluation formula, based on ideas from test theory, provides an operational definition of this hypothesis. CFS (Correlation based Feature Selection) is an algorithm that couples this evaluation formula with an appropriate correlation measure and a heuristic search strategy. CFS was evaluated by experiments on artificial and natural datasets. Three machine learning algorithms were used: C4.5 (a decision tree learner), IB1 (an instance based learner), and naive Bayes. Experiments on artificial datasets showed that CFS quickly identifies and screens irrelevant, redundant, and noisy features, and identifies relevant features as long as their relevance does not strongly depend on other features. On natural domains, CFS typically eliminated well over half the features. In most cases, classification accuracy using the reduced feature set equaled or bettered accuracy using the complete feature set. Feature selection degraded machine learning performance in cases where some features were eliminated which were highly predictive of very small areas of the instance space. Further experiments compared CFS with a wrapper—a well known approach to feature selection that employs the target learning algorithm to evaluate feature sets. In many cases CFS gave comparable results to the wrapper, and in general, outperformed the wrapper on small datasets. CFS executes many times faster than the wrapper, which allows it to scale to larger datasets. Two methods of extending CFS to handle feature interaction are presented and experimentally evaluated. 
The first considers pairs of features and the second incorporates feature weights calculated by the RELIEF algorithm. Experiments on artificial domains showed that both methods were able to identify interacting features. On natural domains, the pairwise method gave more reliable results than using weights provided by RELIEF.

3,533 citations

References
Journal Article
TL;DR: This paper summarizes an approach to synthesizing decision trees that has been used in a variety of systems, describes one such system, ID3, in detail, and discusses a reported shortcoming of the basic algorithm.
Abstract: The technology for building knowledge-based systems by inductive inference from examples has been demonstrated successfully in several practical applications. This paper summarizes an approach to synthesizing decision trees that has been used in a variety of systems, and it describes one such system, ID3, in detail. Results from recent studies show ways in which the methodology can be modified to deal with information that is noisy and/or incomplete. A reported shortcoming of the basic algorithm is discussed and two means of overcoming it are compared. The paper concludes with illustrations of current research directions.

17,177 citations

Book
01 Sep 1985
TL;DR: Technical managers, professionals, and researchers who are considering the implementation or application of expert systems will find this book to be an authoritative, but accessible guide to the state-of-the-art.
Abstract: This is a comprehensive introduction to expert systems designed specifically for the reader without a computer science background. Carefully written and illustrated, it covers working systems in commercial use, applications for which they are most suitable and guidelines for building a system. Technical managers, professionals, and researchers who are considering the implementation or application of expert systems will find this book to be an authoritative, but accessible guide to the state-of-the-art.

1,428 citations

Book
01 Jan 1986

1,385 citations

Book
01 Jan 1978
TL;DR: In this paper, the authors discuss a crop identification and acreage estimation case study, followed by rather brief discussions of five selected management problems: large area land use inventory and forest, snow-cover, geologic, and water-temperature mapping.
Abstract: Chapter 6 puts the information covered to that point into direct application. The authors first discuss in some detail a crop identification and acreage estimation case study. This is followed by rather brief discussions of five selected management problems: large area land use inventory and forest, snow-cover, geologic, and water-temperature mapping. Serious students will wish to supplement these with studies of problems pertinent to their own areas of special interest. While much of the information presented is valuable, I see little justification for the final chapter since most of the material in it could well have been worked into other parts of the text. A few imperfections merit comment. Reproduction of some of the aerial photographs and images does not meet the standards which were imposed on the drawings. For example, the images in Fig. 5-39 are difficult to interpret, although that problem may relate more to the small size of each wave band illustrated than to the quality of photographic reproduction. The areas shown in Fig. 1-7 to illustrate the three spectral regions are not the same scale; further, the same areas (with the same scale problem) are shown in Fig. 5-41. Fig. 6-13 contributes little to an understanding of the selection or appearance of training areas; nor does Fig. 6-15 to the selection of test areas. Three chapters have brief but useful summary sections. The other four would have benefited by a similar procedure. While the selection of terms to include in a glossary is a difficult task, a few which are encountered frequently in quantitative remote sensing were omitted, e.g., band ratioing, minimum Euclidean distance classifier, maximum likelihood classifier, smoothing, vector, etc. While there are savings in printing costs to have all color plates grouped on four pages, I found this system awkward to use and disruptive of comprehension.
I was surprised, too, that answers to the questions posed after the various sections are not given. Individuals using the text on a self-study basis probably would not have a background adequate to verify their answers without such assistance. These, though, are relatively minor criticisms. Overall, this is one of the best sources of information that I have encountered on the subject of quantitative remote sensing. It would serve well as the textbook for courses at various levels and for students with a wide range of backgrounds. Professionals in the field of remote sensing will wish to add this volume …

584 citations