Open Access Proceedings Article

A fast decision tree learning algorithm

TL;DR
The proposed algorithm is a core tree-growing algorithm that can be combined with other scaling-up techniques to achieve further speedup; it is as fast as naive Bayes but outperforms naive Bayes in accuracy according to experiments.
Abstract
There is growing interest in scaling up the widely used decision-tree learning algorithms to very large data sets. Although numerous diverse techniques have been proposed, a fast tree-growing algorithm without a substantial decrease in accuracy or a substantial increase in space complexity is essential. In this paper, we present a novel, fast decision-tree learning algorithm that is based on a conditional independence assumption. The new algorithm has a time complexity of O(m · n), where m is the size of the training data and n is the number of attributes. This is a significant asymptotic improvement over the time complexity O(m · n²) of the standard decision-tree learning algorithm C4.5, with an additional space increase of only O(n). Experiments show that our algorithm performs competitively with C4.5 in accuracy on a large number of UCI benchmark data sets, and performs even better and significantly faster than C4.5 on a large number of text classification data sets. The time complexity of our algorithm is as low as naive Bayes'. Indeed, it is as fast as naive Bayes but outperforms naive Bayes in accuracy according to our experiments. Our algorithm is a core tree-growing algorithm that can be combined with other scaling-up techniques to achieve further speedup.
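As a rough illustration of where the O(m · n) bound comes from, here is a minimal Python sketch; it is an assumption-laden illustration, not the authors' implementation. A single pass over the m examples and n attributes builds class-conditional count tables, after which every attribute can be scored with an information-gain criterion without re-scanning the data. The function names and the choice of plain information gain are illustrative.

from collections import Counter, defaultdict
from math import log2

def entropy(counts):
    # Shannon entropy of a list of class counts.
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def score_attributes(rows, labels):
    # One O(m * n) pass builds count[attribute][value][class] tables.
    counts = defaultdict(lambda: defaultdict(Counter))
    for row, y in zip(rows, labels):
        for attr, value in enumerate(row):
            counts[attr][value][y] += 1
    m = len(labels)
    base = entropy(list(Counter(labels).values()))
    gains = {}
    for attr, by_value in counts.items():
        # Expected entropy of the class after splitting on this attribute.
        remainder = sum(sum(cls.values()) / m * entropy(list(cls.values()))
                        for cls in by_value.values())
        gains[attr] = base - remainder
    return gains  # split on the argmax, e.g. max(gains, key=gains.get)

C4.5, by contrast, recomputes such statistics from the data at every node, which is what drives its O(m · n²) behavior; the paper's contribution is a way to grow the entire tree from shared statistics like these under a conditional independence assumption.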


Citations
Journal Article

Inductive learning algorithms and representations for text categorization

TL;DR: Text categorization, the assignment of natural language texts to one or more predefined categories based on their content, is an important component in many information organization and management tasks.
Proceedings Article

Stochastic gradient boosted distributed decision trees

TL;DR: Two distributed methods that generate exact stochastic GBDT models are presented: the first is a MapReduce implementation, and the second uses MPI on the Hadoop grid environment.
Journal Article

Multi-target regression via input space expansion: treating targets as inputs

TL;DR: In this article, two new methods for multi-target regression, called stacked single-target and ensemble of regressor chains, are introduced by adapting two popular multi-label classification methods of this family; a sketch of the stacked single-target idea appears below.
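A hedged sketch of the stacked single-target idea only (not the article's code; Ridge is a placeholder base regressor): fit one model per target, then refit each target on the inputs augmented with the first-stage predictions of all targets, so that targets effectively become inputs.

import numpy as np
from sklearn.linear_model import Ridge

def fit_sst(X, Y):
    # Stage 1: one independent regressor per target column of Y.
    stage1 = [Ridge().fit(X, Y[:, j]) for j in range(Y.shape[1])]
    # Stage 2: refit each target on inputs plus all stage-1 predictions.
    meta = np.column_stack([m.predict(X) for m in stage1])
    X_aug = np.hstack([X, meta])
    stage2 = [Ridge().fit(X_aug, Y[:, j]) for j in range(Y.shape[1])]
    return stage1, stage2

def predict_sst(stage1, stage2, X):
    meta = np.column_stack([m.predict(X) for m in stage1])
    return np.column_stack([m.predict(np.hstack([X, meta])) for m in stage2])

A faithful version would generate the stage-1 training predictions out-of-sample (e.g. by internal cross-validation) to reduce overfitting; fitting stage 1 on the same data, as here, is a simplification.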
Journal Article

A Correlation-Based Feature Weighting Filter for Naive Bayes

TL;DR: This paper argues that, for NB, highly predictive features should be highly correlated with the class yet uncorrelated with other features (minimum mutual redundancy), and proposes a correlation-based feature weighting (CFW) filter for NB; a sketch of this idea appears below.
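A minimal sketch of that relevance-minus-redundancy weighting, assuming mutual information as the correlation measure; the paper's actual CFW formula may differ, and cfw_weights is a hypothetical name.

from sklearn.metrics import mutual_info_score

def cfw_weights(columns, y):
    # columns: list of discrete feature columns; y: class labels.
    n = len(columns)
    relevance = [mutual_info_score(col, y) for col in columns]
    weights = []
    for i in range(n):
        # Average redundancy of feature i against the other features.
        redundancy = sum(mutual_info_score(columns[i], columns[j])
                         for j in range(n) if j != i) / max(n - 1, 1)
        weights.append(max(relevance[i] - redundancy, 0.0))
    return weights

Each weight would then scale its feature's contribution inside naive Bayes, boosting predictive features and damping redundant ones.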
Proceedings Article

A comparative study of Reduced Error Pruning method in decision tree algorithms

TL;DR: An experiment was conducted using the Weka application to compare the performance, in terms of tree-structure complexity and classification accuracy, of the J48, REPTree, PART, JRip, and Ridor algorithms on seven standard datasets from the UCI machine learning repository.
References
Book

C4.5: Programs for Machine Learning

TL;DR: A complete guide to the C4.5 system as implemented in C for the UNIX environment, which starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting.
Book

Data Mining: Practical Machine Learning Tools and Techniques

TL;DR: This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining.
Journal Article

Data mining: practical machine learning tools and techniques with Java implementations

TL;DR: This presentation discusses the design and implementation of machine learning algorithms in Java, as well as some of the techniques used to develop and implement these algorithms.
Journal Article

On the Optimality of the Simple Bayesian Classifier under Zero-One Loss

TL;DR: The Bayesian classifier is shown to be optimal for learning conjunctions and disjunctions even though they violate the independence assumption, and it will often outperform more powerful classifiers for common training-set sizes and numbers of attributes, even if its bias is a priori much less appropriate to the domain.
Journal Article

Very Simple Classification Rules Perform Well on Most Commonly Used Datasets

TL;DR: On most of the datasets studied, the best of a set of very simple rules that classify examples on the basis of a single attribute is as accurate as the rules induced by the majority of machine learning systems; a sketch of such a one-rule learner appears below.
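A minimal sketch of such a one-attribute rule learner, in the spirit of Holte's 1R; the original's handling of ties, missing values, and the discretization of numeric attributes is omitted here.

from collections import Counter, defaultdict

def one_r(rows, labels):
    best = None  # (errors, attribute index, value -> class rule)
    for attr in range(len(rows[0])):
        by_value = defaultdict(Counter)
        for row, y in zip(rows, labels):
            by_value[row[attr]][y] += 1
        # Each attribute value predicts its majority class.
        rule = {v: cls.most_common(1)[0][0] for v, cls in by_value.items()}
        errors = sum(y != rule[row[attr]] for row, y in zip(rows, labels))
        if best is None or errors < best[0]:
            best = (errors, attr, rule)
    return best[1], best[2]  # chosen attribute and its value -> class rule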