Topic

C4.5 algorithm

About: C4.5 algorithm is a research topic. Over its lifetime, 1,314 publications have been published within this topic, receiving 39,194 citations.


Papers
Book
15 Oct 1992
TL;DR: A complete guide to the C4.5 system as implemented in C for the UNIX environment, which starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting.
Abstract: From the Publisher: Classifier systems play a major role in machine learning and knowledge-based systems, and Ross Quinlan's work on ID3 and C4.5 is widely acknowledged to have made some of the most significant contributions to their development. This book is a complete guide to the C4.5 system as implemented in C for the UNIX environment. It contains a comprehensive guide to the system's use, the source code (about 8,800 lines), and implementation notes. The source code and sample datasets are also available on a 3.5-inch floppy diskette for a Sun workstation. C4.5 starts with large sets of cases belonging to known classes. The cases, described by any mixture of nominal and numeric properties, are scrutinized for patterns that allow the classes to be reliably discriminated. These patterns are then expressed as models, in the form of decision trees or sets of if-then rules, that can be used to classify new cases, with emphasis on making the models understandable as well as accurate. The system has been applied successfully to tasks involving tens of thousands of cases described by hundreds of properties. The book starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting. Advantages and disadvantages of the C4.5 approach are discussed and illustrated with several case studies. This book and software should be of interest to developers of classification-based intelligent systems and to students in machine learning and expert systems courses.

21,674 citations
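
The split-selection step at the heart of the system described above can be made concrete. C4.5 is generally described as choosing the attribute whose test maximises the gain ratio, i.e. information gain normalised by the split information. The Python sketch below illustrates that criterion for categorical attributes only; the function names and the toy weather data are illustrative and are not taken from Quinlan's C sources.

from collections import Counter
from math import log2

def entropy(labels):
    # Shannon entropy of a list of class labels.
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, labels, attr):
    # Information gain of splitting on `attr`, normalised by the split
    # information (the entropy of the subset sizes). Categorical case only.
    n = len(labels)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = -sum(len(g) / n * log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0

def best_attribute(rows, labels, attributes):
    # The attribute with the highest gain ratio becomes the test at this node.
    return max(attributes, key=lambda a: gain_ratio(rows, labels, a))

# Toy example (hypothetical data, two nominal attributes):
rows = [{"outlook": "sunny",    "windy": "false"},
        {"outlook": "sunny",    "windy": "true"},
        {"outlook": "rain",     "windy": "true"},
        {"outlook": "overcast", "windy": "false"}]
labels = ["no", "no", "no", "yes"]
print(best_attribute(rows, labels, ["outlook", "windy"]))  # -> outlook

In the full system, numeric attributes are handled by searching for a threshold split, and the resulting tree is pruned and optionally converted to if-then rules; none of that is shown in this sketch.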

Book Chapter
25 Mar 1996
TL;DR: Issues in building a scalable classifier are discussed, and the design of SLIQ, a new classifier that uses a novel pre-sorting technique in the tree-growth phase to enable classification of disk-resident datasets, is presented.
Abstract: Classification is an important problem in the emerging field of data mining. Although classification has been studied extensively in the past, most of the classification algorithms are designed only for memory-resident data, thus limiting their suitability for data mining large data sets. This paper discusses issues in building a scalable classifier and presents the design of SLIQ, a new classifier. SLIQ is a decision tree classifier that can handle both numeric and categorical attributes. It uses a novel pre-sorting technique in the tree-growth phase. This sorting procedure is integrated with a breadth-first tree growing strategy to enable classification of disk-resident datasets. SLIQ also uses a new tree-pruning algorithm that is inexpensive and results in compact and accurate trees. The combination of these techniques enables SLIQ to scale for large data sets and classify data sets irrespective of the number of classes, attributes, and examples (records), thus making it an attractive tool for data mining.

860 citations
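
The pre-sorting idea described in the abstract can be sketched as follows: each numeric attribute gets its own list of (value, record id) pairs, sorted once before tree growth, so every candidate split point can then be evaluated in a single left-to-right scan while class histograms are maintained on both sides of the split. The Python sketch below is a simplified, in-memory illustration only (SLIQ itself keeps attribute lists on disk, uses a memory-resident class list, and grows the tree breadth-first); all names and the toy records are hypothetical.

from collections import Counter

def build_attribute_lists(records, numeric_attrs):
    # Pre-sorting: one (value, record_id) list per numeric attribute,
    # sorted once before any tree growth.
    return {a: sorted((rec[a], rid) for rid, rec in enumerate(records))
            for a in numeric_attrs}

def gini(counts, n):
    # Gini index of a class histogram covering n examples.
    return 1.0 - sum((c / n) ** 2 for c in counts.values()) if n else 0.0

def best_numeric_split(attr_list, class_list):
    # One scan of a pre-sorted attribute list, maintaining class histograms
    # for the examples on each side of the candidate split point.
    n = len(attr_list)
    right = Counter(class_list[rid] for _, rid in attr_list)
    left = Counter()
    best_score, best_value = float("inf"), None
    for i, (value, rid) in enumerate(attr_list[:-1]):
        left[class_list[rid]] += 1
        right[class_list[rid]] -= 1
        score = ((i + 1) / n) * gini(left, i + 1) \
                + ((n - i - 1) / n) * gini(right, n - i - 1)
        if score < best_score:
            best_score, best_value = score, value  # split test: attr <= value
    return best_value, best_score

# Usage with hypothetical records:
records = [{"age": 23}, {"age": 40}, {"age": 31}, {"age": 55}]
classes = ["low", "high", "low", "high"]
lists = build_attribute_lists(records, ["age"])
print(best_numeric_split(lists["age"], classes))  # -> (31, 0.0)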

Journal Article
TL;DR: An incremental algorithm, named ID5R, is presented for inducing decision trees equivalent to those formed by Quinlan's nonincremental ID3 algorithm given the same training instances.
Abstract: This article presents an incremental algorithm for inducing decision trees equivalent to those formed by Quinlan's nonincremental ID3 algorithm, given the same training instances. The new algorithm, named ID5R, lets one apply the ID3 induction process to learning tasks in which training instances are presented serially. Although the basic tree-building algorithms differ only in how the decision trees are constructed, experiments show that incremental training makes it possible to select training instances more carefully, which can result in smaller decision trees. The ID3 algorithm and its variants are compared in terms of theoretical complexity and empirical behavior.

805 citations
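
The incremental aspect can be illustrated by the bookkeeping it requires: class counts per (attribute, value) pair are updated as each training instance arrives, so the preferred test at a node can be re-evaluated without revisiting earlier instances. The Python sketch below shows only this count maintenance; ID5R itself additionally restructures existing subtrees with a pull-up operation rather than rebuilding them, which is not modelled here, and the attribute and class names are hypothetical.

from collections import defaultdict
from math import log2

class IncrementalNode:
    # Simplified illustration of incremental induction bookkeeping, not the
    # ID5R algorithm itself.

    def __init__(self, attributes):
        self.attributes = attributes
        self.class_counts = defaultdict(int)
        # counts[attr][value][class], accumulated one instance at a time
        self.counts = {a: defaultdict(lambda: defaultdict(int)) for a in attributes}

    def add_instance(self, instance, label):
        # Incorporate a single serially-presented training instance.
        self.class_counts[label] += 1
        for a in self.attributes:
            self.counts[a][instance[a]][label] += 1

    @staticmethod
    def _entropy(class_counts):
        n = sum(class_counts.values())
        return -sum(c / n * log2(c / n) for c in class_counts.values() if c)

    def best_attribute(self):
        # Information gain computed from the stored counts alone.
        n = sum(self.class_counts.values())
        def gain(a):
            rem = sum(sum(cc.values()) / n * self._entropy(cc)
                      for cc in self.counts[a].values())
            return self._entropy(self.class_counts) - rem
        return max(self.attributes, key=gain)

node = IncrementalNode(["outlook", "windy"])
node.add_instance({"outlook": "sunny",    "windy": "false"}, "no")
node.add_instance({"outlook": "overcast", "windy": "false"}, "yes")
print(node.best_attribute())  # re-evaluated after each update -> outlook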

Journal Article
TL;DR: A new hybrid model estimates the intrusion scope threshold degree from the optimal features of the network transaction data made available for training; experiments revealed that the hybrid approach significantly reduced the computational and time complexity involved in determining the feature association impact scale.

484 citations

Journal Article
TL;DR: A new attribute selection measure for ID3-like inductive algorithms is proposed, based on a distance between partitions such that the attribute selected at a node induces the partition closest to the correct partition of the subset of training examples corresponding to that node.
Abstract: This note introduces a new attribute selection measure for ID3-like inductive algorithms. This measure is based on a distance between partitions such that the selected attribute in a node induces the partition which is closest to the correct partition of the subset of training examples corresponding to this node. The relationship of this measure with Quinlan's information gain is also established. It is also formally proved that our distance is not biased towards attributes with large numbers of values. Experimental studies with this distance confirm previously reported results showing that the predictive accuracy of induced decision trees is not sensitive to the goodness of the attribute selection measure. However, this distance produces smaller trees than the gain ratio measure of Quinlan, especially in the case of data whose attributes have significantly different numbers of values.

458 citations
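
A commonly cited form of this kind of partition distance is the normalised sum of the two conditional entropies between the attribute partition A and the class partition C, d(A, C) = (H(A|C) + H(C|A)) / H(A, C), minimised over attributes. The Python sketch below assumes that form (an assumption based on the standard presentation of such measures, not text quoted from the paper); all names are illustrative.

from collections import Counter
from math import log2

def _entropy_from_counts(counts, n):
    # Entropy of a distribution given absolute counts summing to n.
    return -sum(c / n * log2(c / n) for c in counts.values() if c)

def partition_distance(attr_values, class_labels):
    # Assumed measure: (H(A|C) + H(C|A)) / H(A, C), the normalised distance
    # between the partition induced by the attribute and the class partition.
    # Smaller is better; 0 means the attribute reproduces the class partition.
    n = len(class_labels)
    h_joint = _entropy_from_counts(Counter(zip(attr_values, class_labels)), n)
    h_a = _entropy_from_counts(Counter(attr_values), n)
    h_c = _entropy_from_counts(Counter(class_labels), n)
    h_a_given_c = h_joint - h_c
    h_c_given_a = h_joint - h_a
    return (h_a_given_c + h_c_given_a) / h_joint if h_joint else 0.0

# The attribute chosen at a node is the one minimising this distance,
# whereas information gain and gain ratio are maximised.
print(partition_distance(["sunny", "sunny", "rain"], ["no", "no", "yes"]))  # 0.0

Because the distance is normalised by the joint entropy, it stays bounded and does not automatically favour attributes with many values, which is consistent with the bias result stated in the abstract.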


Network Information

Related Topics (5)
Fuzzy logic: 151.2K papers, 2.3M citations (81% related)
Artificial neural network: 207K papers, 4.5M citations (80% related)
Wireless sensor network: 142K papers, 2.4M citations (80% related)
Cluster analysis: 146.5K papers, 2.9M citations (80% related)
Feature extraction: 111.8K papers, 2.1M citations (79% related)
Performance

Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    84
2022    260
2021    89
2020    122
2019    117
2018    91