
Showing papers on "C4.5 algorithm" published in 1988


Book Chapter DOI
12 Jun 1988
TL;DR: The GID3 algorithm is empirically shown to outperform ID3 on all performance measures considered and is applied to the development of an expert system for automating the Reactive Ion Etching process in semiconductor manufacturing.
Abstract: A popular and particularly efficient method for inducing classification rules from examples is Quinlan's ID3 algorithm. This paper examines the problem of overspecialization in ID3. Two causes of overspecialization in ID3 are identified. An algorithm that avoids some of the inherent problems in ID3 is developed. The new algorithm, GID3, is applied to the development of an expert system for automating the Reactive Ion Etching (RIE) process in semiconductor manufacturing. Six performance measures for decision trees are defined. The GID3 algorithm is empirically shown to outperform ID3 on all performance measures considered. The improvement is gained with negligible increase in computational complexity.
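
For context on the criterion at the heart of ID3, the Python sketch below shows the standard information-gain selection that ID3 uses to choose an attribute at each node. It is written for this listing rather than taken from the paper: the dict-based example format and the function names entropy, information_gain, and best_attribute are assumptions for illustration, and the GID3 modifications are not reproduced here because the abstract does not spell them out.

import math
from collections import Counter

def entropy(examples):
    # Shannon entropy (bits) of the class distribution over `examples`,
    # where each example is a dict with a "class" key (illustrative format).
    counts = Counter(e["class"] for e in examples)
    total = len(examples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(examples, attribute):
    # Expected reduction in class entropy obtained by splitting on `attribute`.
    total = len(examples)
    remainder = 0.0
    for value in set(e[attribute] for e in examples):
        subset = [e for e in examples if e[attribute] == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(examples) - remainder

def best_attribute(examples, attributes):
    # ID3 greedily selects the attribute with maximal information gain.
    return max(attributes, key=lambda a: information_gain(examples, a))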

109 citations


Journal Article DOI
TL;DR: A communication theory approach to decision tree design based on a top-down mutual information algorithm is presented; the algorithm is shown to be equivalent to a form of Shannon-Fano prefix coding, and several fundamental bounds relating decision-tree parameters are derived.
Abstract: A communication theory approach to decision tree design based on a top-down mutual information algorithm is presented. It is shown that this algorithm is equivalent to a form of Shannon-Fano prefix coding, and several fundamental bounds relating decision-tree parameters are derived. The bounds are used in conjunction with a rate-distortion interpretation of tree design to explain several phenomena previously observed in practical decision-tree design. A termination rule for the algorithm, called the delta-entropy rule, is proposed that improves its robustness in the presence of noise. Simulation results are presented, showing that the tree classifiers derived by the algorithm compare favourably to the single nearest neighbour classifier.
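
As a rough illustration of the termination idea, the self-contained Python sketch below stops splitting a node when no candidate test's mutual information with the class reaches a threshold. The abstract does not give the exact form of the delta-entropy rule, so the fixed threshold delta, the list-based data layout, and the helper names here are all assumptions.

import math
from collections import Counter

def class_entropy(labels):
    # Shannon entropy (bits) of a list of class labels.
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def mutual_information(labels, outcomes):
    # I(test; class) = H(class) - H(class | test outcome), where `outcomes`
    # gives the candidate test's result for each example, aligned with `labels`.
    n = len(labels)
    conditional = 0.0
    for o in set(outcomes):
        idx = [i for i, t in enumerate(outcomes) if t == o]
        conditional += (len(idx) / n) * class_entropy([labels[i] for i in idx])
    return class_entropy(labels) - conditional

def should_terminate(labels, candidate_tests, delta=0.05):
    # Delta-entropy-style stop: halt splitting when even the best candidate
    # test reduces class entropy by less than `delta` bits.
    best_gain = max(mutual_information(labels, outcomes) for outcomes in candidate_tests)
    return best_gain < delta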

83 citations


01 Jan 1988
TL;DR: An information-theoretic approach to pattern recognition is derived and used to develop new or improved algorithms for designing decision trees for pattern classification; the algorithms are inspired by information-theoretic ideas and are experimentally supported by applications to speech recognition, text-to-speech, and synthetic problems.
Abstract: An information-theoretic approach to pattern recognition is derived and used to develop new or improved algorithms for designing decision trees for pattern classification. Pattern classification is treated as data compression in a distortion-rate framework, by equating the distortion of a pattern classifier to its probability of error, and by equating the rate of a pattern classifier to its average number of discriminant function evaluations. Sequential pattern classifiers are interpreted as decision trees, which in turn are interpreted as variable-length encoder/decoder pairs. Distortion-rate bounds and other insights from the theory and practice of data compression are applied to decision trees and, more generally, to pattern classifiers. In particular, algorithms are developed for growing decision trees in the distortion-rate plane, optimally pruning decision trees in the distortion-rate plane, and optimally merging decision trees to form decision trellises. The growing algorithm, which is a simple extension of the prevalent decision tree design algorithm, permits whole series of trees at different performance levels to be designed simultaneously. The optimal pruning algorithm, which is a generalization of an algorithm by Breiman, Friedman, Olshen, and Stone, chooses a sequence of pruned subtrees that traces the convex hull of the operational distortion-rate function. The optimal merging algorithm, which is an iterative descent algorithm formally equivalent to the generalized Lloyd algorithm for designing vector quantizers, finds a partition of feature space that retains the maximum amount of mutual information about the unknown class. The latter two algorithms rely on generalizations of theorems by Breiman et al. that are of interest in their own right. All of the algorithms are inspired by information-theoretic ideas, and they are experimentally supported throughout the thesis by applications to speech recognition, text-to-speech, and synthetic problems.
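
The distortion-rate framing defined above (distortion as error probability, rate as average number of discriminant function evaluations) is concrete enough to sketch. The Python fragment below measures both quantities empirically for a given tree; the nested-dict tree representation and the function names are assumptions made for this listing, not the thesis's own formulation.

def classify(tree, example):
    # Walk the tree to a leaf; a node is either {"leaf": label} or
    # {"split": attribute, "children": {value: subtree}} (illustrative format).
    # Returns the predicted label and the number of tests evaluated.
    tests = 0
    node = tree
    while "leaf" not in node:
        tests += 1
        node = node["children"][example[node["split"]]]
    return node["leaf"], tests

def distortion_rate(tree, examples):
    # Distortion: empirical error probability. Rate: average number of
    # test (discriminant function) evaluations per classified example.
    errors = 0
    total_tests = 0
    for e in examples:
        predicted, tests = classify(tree, e)
        errors += predicted != e["class"]
        total_tests += tests
    n = len(examples)
    return errors / n, total_tests / n

Evaluating these pairs for a family of pruned subtrees yields an empirical operational distortion-rate curve, the convex hull of which is what the pruning algorithm described in the abstract traces.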

39 citations