Open Access

Programs for Machine Learning

TLDR
In his new book, C4.5: Programs for Machine Learning, Quinlan has put together a definitive, much-needed description of his complete system, including the latest developments, which will be a welcome addition to the library of many researchers and students.
Abstract
Algorithms for constructing decision trees are among the best known and most widely used of all machine learning methods. Among decision tree algorithms, J. Ross Quinlan's ID3 and its successor, C4.5, are probably the most popular in the machine learning community. These algorithms and variations on them have been the subject of numerous research papers since Quinlan introduced ID3. Until recently, most researchers looking for an introduction to decision trees turned to Quinlan's seminal 1986 Machine Learning journal article [Quinlan, 1986]. In his new book, C4.5: Programs for Machine Learning, Quinlan has put together a definitive, much-needed description of his complete system, including the latest developments. As such, this book will be a welcome addition to the library of many researchers and students.


Citations

Automatic detection and classification of prosodic events

TL;DR: This thesis describes work on the automatic detection and classification of prosodic events – specifically, pitch accents and prosodic phrase boundaries, and presents three proof-of-concept applications showing that access to hypothesized prosodic event information can be used to improve the performance of downstream spoken language processing tasks.

Confidence Bands for ROC Curves: Methods and an Empirical Study

TL;DR: It is shown that some of these methods for generating and evaluating confidence bands on ROC curves work remarkably well, that others are too loose, and that existing machine learning methods for generating 1-dimensional confidence intervals do not translate well to generating simultaneous bands: their bands are too tight.
Book Chapter

Balancing strategies and class overlapping

TL;DR: This work investigates sampling strategies which aim to balance the training set and shows that these sampling strategies usually lead to a performance improvement for highly imbalanced data sets having highly overlapped classes.

Towards SMS Spam Filtering: Results under a New Dataset

TL;DR: The results indicate that the procedure followed to build the collection does not lead to near-duplicates and that, among the classifiers evaluated, Support Vector Machines outperform the other techniques and can therefore serve as a good baseline for further comparison.
Proceedings Article

Improving defect prediction using temporal features and non linear models

TL;DR: It is argued that temporal features (or aspects) of the data are central to prediction performance and the use of non-linear models, as opposed to traditional regression, is necessary to uncover some of the hidden interrelationships between features and the defects and maintain the accuracy of the prediction in some cases.
References
Journal Article

Induction of Decision Trees

J. R. Quinlan
25 Mar 1986
TL;DR: This paper summarizes an approach to synthesizing decision trees that has been used in a variety of systems, describes one such system, ID3, in detail, and discusses a reported shortcoming of the basic algorithm.
Book

Classification and regression trees

Leo Breiman
TL;DR: This monograph focuses on the methodology used to construct tree-structured rules, covering the use of trees as a data analysis method and, in a more mathematical framework, proving some of their fundamental properties.
Journal Article

An Empirical Comparison of Pruning Methods for Decision Tree Induction

TL;DR: This paper compares five methods for pruning decision trees, developed from sets of examples, and shows that three methods—critical value, error complexity and reduced error—perform well, while the other two may cause problems.
Book Chapter

Unknown attribute values in induction

TL;DR: This paper compares the effectiveness of several approaches to the development and use of decision tree classifiers as measured by their performance on a collection of datasets.