Topic

C4.5 algorithm

About: C4.5 algorithm is a research topic. Over its lifetime, 1,314 publications have been published within this topic, receiving 39,194 citations.


Papers

Book
J. Ross Quinlan
15 Oct 1992
TL;DR: A complete guide to the C4.5 system as implemented in C for the UNIX environment, which starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting.


Abstract: From the Publisher: Classifier systems play a major role in machine learning and knowledge-based systems, and Ross Quinlan's work on ID3 and C4.5 is widely acknowledged to have made some of the most significant contributions to their development. This book is a complete guide to the C4.5 system as implemented in C for the UNIX environment. It contains a comprehensive guide to the system's use, the source code (about 8,800 lines), and implementation notes. The source code and sample datasets are also available on a 3.5-inch floppy diskette for a Sun workstation. C4.5 starts with large sets of cases belonging to known classes. The cases, described by any mixture of nominal and numeric properties, are scrutinized for patterns that allow the classes to be reliably discriminated. These patterns are then expressed as models, in the form of decision trees or sets of if-then rules, that can be used to classify new cases, with emphasis on making the models understandable as well as accurate. The system has been applied successfully to tasks involving tens of thousands of cases described by hundreds of properties. The book starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting. Advantages and disadvantages of the C4.5 approach are discussed and illustrated with several case studies. This book and software should be of interest to developers of classification-based intelligent systems and to students in machine learning and expert systems courses.


21,396 citations
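The attribute-selection heuristic at the heart of C4.5 is the gain ratio: information gain normalized by the split information of the attribute, which counteracts information gain's bias toward many-valued attributes. The following is not the book's code; it is a minimal Python sketch on toy data.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(values, labels):
    """Information gain of splitting on a nominal attribute, divided by
    the split information (C4.5's correction for many-valued attributes)."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)  # entropy of the attribute's own distribution
    return gain / split_info if split_info > 0 else 0.0

# Toy data: here 'outlook' separates the classes perfectly (illustrative only).
outlook = ["sunny", "sunny", "rain", "rain", "overcast", "overcast"]
play    = ["no",    "no",    "yes",  "yes",  "yes",      "yes"]
print(round(gain_ratio(outlook, play), 3))  # → 0.579
```

With a perfect split, the gain equals the class entropy (about 0.918 bits) but the ratio is still below 1 because the three-valued attribute carries split information of log2(3) bits.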


Book Chapter
25 Mar 1996
TL;DR: Issues in building a scalable classifier are discussed, and the design of SLIQ, a new classifier that uses a novel pre-sorting technique in the tree-growth phase to enable classification of disk-resident datasets, is presented.


Abstract: Classification is an important problem in the emerging field of data mining. Although classification has been studied extensively in the past, most of the classification algorithms are designed only for memory-resident data, thus limiting their suitability for mining large data sets. This paper discusses issues in building a scalable classifier and presents the design of SLIQ, a new classifier. SLIQ is a decision tree classifier that can handle both numeric and categorical attributes. It uses a novel pre-sorting technique in the tree-growth phase. This sorting procedure is integrated with a breadth-first tree-growing strategy to enable classification of disk-resident datasets. SLIQ also uses a new tree-pruning algorithm that is inexpensive and results in compact and accurate trees. The combination of these techniques enables SLIQ to scale for large data sets and to classify data sets irrespective of the number of classes, attributes, and examples (records), thus making it an attractive tool for data mining.


824 citations
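The pre-sorting idea can be illustrated in a few lines: each numeric attribute is sorted once into an attribute list of (value, record id) pairs, so every tree node can evaluate candidate thresholds with a single linear scan instead of re-sorting its records. This is a toy Python sketch, not SLIQ's actual disk-resident data structures.

```python
from collections import Counter

# Toy records: one numeric attribute and one class label per record.
ages    = [42, 23, 35, 29, 51]
classes = ["B", "A", "B", "A", "B"]

# One-time pre-sort into a SLIQ-style (value, record_id) attribute list.
attribute_list = sorted((v, rid) for rid, v in enumerate(ages))

# A single scan yields the class histogram below every candidate threshold.
below = Counter()
for value, rid in attribute_list:
    below[classes[rid]] += 1
    print(f"split at <= {value}: {dict(below)}")
```

Because the sort happens once per attribute rather than once per node, the cost of growing the tree is dominated by linear scans, which is what makes the approach viable for data that does not fit in memory.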


Journal Article
Paul E. Utgoff
01 Nov 1989 - Machine Learning
TL;DR: ID5R, an incremental algorithm for inducing decision trees equivalent to those formed by Quinlan's nonincremental ID3 algorithm given the same training instances, is presented.


Abstract: This article presents an incremental algorithm for inducing decision trees equivalent to those formed by Quinlan's nonincremental ID3 algorithm, given the same training instances. The new algorithm, named ID5R, lets one apply the ID3 induction process to learning tasks in which training instances are presented serially. Although the basic tree-building algorithms differ only in how the decision trees are constructed, experiments show that incremental training makes it possible to select training instances more carefully, which can result in smaller decision trees. The ID3 algorithm and its variants are compared in terms of theoretical complexity and empirical behavior.


760 citations


Journal Article
R. Lopez de Mantaras
03 Jan 1991 - Machine Learning
TL;DR: A new attribute selection measure for ID3-like inductive algorithms, based on a distance between partitions such that the attribute selected at a node induces the partition closest to the correct partition of the subset of training examples corresponding to that node.


Abstract: This note introduces a new attribute selection measure for ID3-like inductive algorithms. This measure is based on a distance between partitions such that the selected attribute in a node induces the partition which is closest to the correct partition of the subset of training examples corresponding to this node. The relationship of this measure with Quinlan's information gain is also established. It is also formally proved that our distance is not biased towards attributes with large numbers of values. Experimental studies with this distance confirm previously reported results showing that the predictive accuracy of induced decision trees is not sensitive to the goodness of the attribute selection measure. However, this distance produces smaller trees than the gain ratio measure of Quinlan, especially in the case of data whose attributes have significantly different numbers of values.


430 citations
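One common formulation of such a normalized partition distance is 1 minus the mutual information between attribute and class divided by their joint entropy, which is a metric on partitions and is not biased toward many-valued attributes. The sketch below uses that formulation as an assumption; the paper's exact normalization may differ.

```python
import math
from collections import Counter

def entropy(xs):
    """Shannon entropy, in bits, of the partition induced by the values xs."""
    n = len(xs)
    return -sum(c / n * math.log2(c / n) for c in Counter(xs).values())

def partition_distance(attr, labels):
    """Assumed normalized distance between the partitions induced by an
    attribute and by the class labels: d = 1 - I(A; C) / H(A, C).
    Smaller is better; the attribute minimizing d would be selected."""
    joint = entropy(list(zip(attr, labels)))      # joint entropy H(A, C)
    mi = entropy(attr) + entropy(labels) - joint  # mutual information I(A; C)
    return 1.0 - mi / joint if joint > 0 else 0.0

# A perfectly predictive attribute has distance 0; an independent one, 1.
print(partition_distance(["a", "a", "b", "b"], ["x", "x", "y", "y"]))  # → 0.0
print(partition_distance(["a", "b", "a", "b"], ["x", "x", "y", "y"]))  # → 1.0
```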


Journal Article
Eibe Frank, Yong Wang, S. Inglis, Geoffrey Holmes +1 more
01 Jul 1998 - Machine Learning
TL;DR: Surprisingly, using this simple transformation the model tree inducer M5′, based on Quinlan's M5, generates more accurate classifiers than the state-of-the-art decision tree learner C5.0, particularly when most of the attributes are numeric.


Abstract: Model trees, which are a type of decision tree with linear regression functions at the leaves, form the basis of a recent successful technique for predicting continuous numeric values. They can be applied to classification problems by employing a standard method of transforming a classification problem into a problem of function approximation. Surprisingly, using this simple transformation the model tree inducer M5′, based on Quinlan's M5, generates more accurate classifiers than the state-of-the-art decision tree learner C5.0, particularly when most of the attributes are numeric.


373 citations
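The transformation the abstract refers to turns a k-class problem into k function-approximation problems: fit one regressor to a 0/1 indicator target per class, then predict the class whose regressor outputs the largest value. In the sketch below a plain least-squares line stands in for the M5′ model tree, an illustrative substitution, not the paper's learner.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Toy one-feature, two-class data.
xs = [0.0, 1.0, 2.0, 3.0]
labels = ["lo", "lo", "hi", "hi"]
classes = sorted(set(labels))

# One regressor per class, each fitted to that class's 0/1 indicator target.
models = {c: fit_line(xs, [1.0 if l == c else 0.0 for l in labels])
          for c in classes}

def predict(x):
    """Predict the class whose indicator regressor outputs the largest value."""
    return max(classes, key=lambda c: models[c][0] * x + models[c][1])

print(predict(2.5))  # → 'hi'
```

The same one-regressor-per-class scheme works unchanged if `fit_line` is replaced by any numeric learner, which is exactly why the transformation lets a model tree act as a classifier.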


Network Information
Related Topics (5)

Naive Bayes classifier: 16.2K papers, 386.5K citations (87% related)
Association rule learning: 15.1K papers, 362K citations (84% related)
k-means clustering: 9.6K papers, 191.5K citations (84% related)
Intrusion detection system: 28.4K papers, 509.5K citations (84% related)
Support vector machine: 73.6K papers, 1.7M citations (84% related)
Performance Metrics
No. of papers in the topic in previous years

Year  Papers
2022  3
2021  89
2020  122
2019  117
2018  91
2017  114

Top Attributes


Topic's top 5 most impactful authors

V. Sugumaran: 11 papers, 310 citations
H. Hannah Inbarani: 5 papers, 75 citations
S. Sasikala: 5 papers, 38 citations
Mohd Shamrie Sainin: 4 papers, 10 citations
Ahmad Taher Azar: 4 papers, 49 citations