
Showing papers on "C4.5 algorithm" published in 2000


Proceedings Article
01 Aug 2000
TL;DR: Novel branch-and-bound algorithms are developed for pushing the constraints into the building phase of classifiers and for pruning, early on, tree nodes that cannot possibly satisfy the constraints.
Abstract: Classification is an important problem in data mining. A number of popular classifiers construct decision trees to generate class models. Frequently, however, the constructed trees are complex with hundreds of nodes and thus difficult to comprehend, a fact that calls into question an often-cited benefit that decision trees are easy to interpret. In this paper, we address the problem of constructing "simple" decision trees with few nodes that are easy for humans to interpret. By permitting users to specify constraints on tree size or accuracy, and then building the "best" tree that satisfies the constraints, we ensure that the final tree is both easy to understand and has good accuracy. We develop novel branch-and-bound algorithms for pushing the constraints into the building phase of classifiers, and pruning early tree nodes that cannot possibly satisfy the constraints. Our experimental results with real-life and synthetic data sets demonstrate that significant performance speedups and reductions in the number of nodes expanded can be achieved as a result of incorporating knowledge of the constraints into the building step as opposed to applying the constraints after the entire tree is built.
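
The idea of pushing a size constraint into tree building can be illustrated with a small sketch. This is not the paper's algorithm: it is a simplified, exhaustive branch-and-bound over binary features that finds the minimum training error of any tree with at most a given number of leaves, and abandons a partial tree as soon as it cannot beat the best tree found so far. The names `leaf_errors` and `min_errors`, the leaf-count constraint, and the toy XOR data are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import product


def leaf_errors(labels):
    """Training errors if this node becomes a leaf predicting the majority class.

    Assumes `labels` is non-empty.
    """
    return len(labels) - max(Counter(labels).values())


def min_errors(rows, labels, max_leaves, bound=math.inf):
    """Minimum training errors of any tree with at most `max_leaves` leaves.

    `bound` is the branch-and-bound cutoff: if no tree can achieve fewer than
    `bound` errors, the returned value is some achievable error count >= bound
    and the caller discards it. Features are assumed to be 0/1 valued.
    """
    best = leaf_errors(labels)  # incumbent: stop here and make a leaf
    if best == 0 or max_leaves < 2 or bound <= 0:
        return best

    for f in range(len(rows[0])):
        left = [i for i, r in enumerate(rows) if r[f] == 0]
        right = [i for i, r in enumerate(rows) if r[f] != 0]
        if not left or not right:
            continue  # this feature does not split the data
        l_rows, l_labels = [rows[i] for i in left], [labels[i] for i in left]
        r_rows, r_labels = [rows[i] for i in right], [labels[i] for i in right]

        # Distribute the leaf budget between the two children.
        for k_left in range(1, max_leaves):
            limit = min(best, bound)
            e_left = min_errors(l_rows, l_labels, k_left, limit)
            if e_left >= limit:
                continue  # prune: the right child cannot have negative errors
            e_right = min_errors(r_rows, r_labels, max_leaves - k_left,
                                 limit - e_left)
            best = min(best, e_left + e_right)
            if best == 0:
                return 0  # cannot improve on a perfect tree
    return best


# Toy data (hypothetical): label is x0 XOR x1, x2 is an irrelevant feature.
rows = list(product([0, 1], repeat=3))
labels = [r[0] ^ r[1] for r in rows]

for budget in (1, 2, 3, 4):
    print(budget, min_errors(rows, labels, budget))
```

Because the leaf budget and the incumbent error are both known while the tree is being searched, subtrees that cannot satisfy them are never expanded, which is the contrast the abstract draws with building the full tree first and applying the constraints afterwards.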

42 citations