Open Access
Programs for Machine Learning
Steven L. Salzberg, Alberto Segre +1 more
TL;DR: In his new book, C4.5: Programs for Machine Learning, Quinlan has put together a definitive, much-needed description of his complete system, including the latest developments, which will be a welcome addition to the library of many researchers and students.
Abstract:
Algorithms for constructing decision trees are among the most well-known and widely used of all machine learning methods. Among decision tree algorithms, J. Ross Quinlan's ID3 and its successor, C4.5, are probably the most popular in the machine learning community. These algorithms and variations on them have been the subject of numerous research papers since Quinlan introduced ID3. Until recently, most researchers looking for an introduction to decision trees turned to Quinlan's seminal 1986 Machine Learning journal article [Quinlan, 1986]. In his new book, C4.5: Programs for Machine Learning, Quinlan has put together a definitive, much-needed description of his complete system, including the latest developments. As such, this book will be a welcome addition to the library of many researchers and students.
Citations
Book Chapter
Heterogeneous uncertainty sampling for supervised learning
David D. Lewis, Jason A. Catlett +1 more
TL;DR: This work tests the use of one classifier (a highly efficient probabilistic one) to select examples for training another (the C4.5 rule induction program) and finds that the uncertainty samples yielded classifiers with lower error rates than random samples ten times larger.
Report
A Survey of Dimension Reduction Techniques
TL;DR: Several dimension reduction methods are described that can be applied to high-dimensional datasets to reduce the dimension of the original data prior to any modeling of the data.
Journal Article
A global atlas of the dominant bacteria found in soil
Manuel Delgado-Baquerizo, Angela M. Oliverio, Tess E. Brewer, Alberto Benavent-González, David J. Eldridge, Richard D. Bardgett, Fernando T. Maestre, Brajesh K. Singh, Noah Fierer +12 more
TL;DR: This study narrows down the immense number of bacterial taxa to a “most wanted” list that will be fruitful targets for genomic and cultivation-based efforts aimed at improving the understanding of soil microbes and their contributions to ecosystem functioning.
Journal Article
A Comparison of Prediction Accuracy, Complexity, and Training Time of Thirty-Three Old and New Classification Algorithms
TL;DR: Among decision tree algorithms with univariate splits, C4.5, IND-CART, and QUEST have the best combinations of error rate and speed, but C4.5 tends to produce trees with twice as many leaves as those from IND-CART and QUEST.
Proceedings Article
Get another label? Improving data quality and data mining using multiple, noisy labelers
TL;DR: The results show clearly that when labeling is not perfect, selective acquisition of multiple labels is a strategy that data miners should have in their repertoire; for certain label-quality/cost regimes, the benefit is substantial.
References
Journal Article
Classification and Regression Trees.
Journal Article
Induction of Decision Trees
TL;DR: This paper summarizes an approach to synthesizing decision trees that has been used in a variety of systems and describes one such system, ID3, in detail; it also discusses a reported shortcoming of the basic algorithm.
Book
Classification and regression trees
TL;DR: This monograph focuses on the methodology used to construct tree-structured rules, covering the use of trees as a data analysis method and, in a more mathematical framework, proving some of their fundamental properties.
Journal Article
An Empirical Comparison of Pruning Methods for Decision Tree Induction
TL;DR: This paper compares five methods for pruning decision trees, developed from sets of examples, and shows that three methods—critical value, error complexity and reduced error—perform well, while the other two may cause problems.
Book Chapter
Unknown attribute values in induction
TL;DR: This paper compares the effectiveness of several approaches to the development and use of decision tree classifiers as measured by their performance on a collection of datasets.