
Showing papers on "Random forest published in 2000"


Journal Article
TL;DR: In this article, the authors compared the effectiveness of randomization, bagging, and boosting for improving the performance of the decision-tree algorithm C4.5 and found that in situations with little or no classification noise, randomization is competitive with bagging but not as accurate as boosting.
Abstract: Bagging and boosting are methods that generate a diverse ensemble of classifiers by manipulating the training data given to a “base” learning algorithm. Breiman has pointed out that they rely for their effectiveness on the instability of the base learning algorithm. An alternative approach to generating an ensemble is to randomize the internal decisions made by the base algorithm. This general approach has been studied previously by Ali and Pazzani and by Dietterich and Kong. This paper compares the effectiveness of randomization, bagging, and boosting for improving the performance of the decision-tree algorithm C4.5. The experiments show that in situations with little or no classification noise, randomization is competitive with (and perhaps slightly superior to) bagging but not as accurate as boosting. In situations with substantial classification noise, bagging is much better than boosting, and sometimes better than randomization.

2,919 citations
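The three strategies differ only in where the ensemble's diversity comes from. As a rough sketch of that contrast (not the paper's experimental setup: scikit-learn grows CART trees rather than C4.5, and its splitter="random" picks tests at random, a cruder scheme than Dietterich's "choose uniformly among the 20 best tests"), the following builds all three ensembles from the same base tree:

```python
# Depth-capped trees keep boosting from terminating after one perfect-fit tree.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

def tree(**kw):
    return DecisionTreeClassifier(max_depth=5, **kw)

ensembles = {
    # Bagging: each tree is trained on a bootstrap resample of the data.
    "bagging": BaggingClassifier(tree(), n_estimators=100),
    # Boosting: trees are fit sequentially, upweighting misclassified examples.
    "boosting": AdaBoostClassifier(tree(), n_estimators=100),
    # Randomization: every tree sees the identical training set (bootstrap=False);
    # diversity comes solely from randomizing the internal split decisions.
    "randomization": BaggingClassifier(
        tree(splitter="random"), n_estimators=100, bootstrap=False
    ),
}

for name, clf in ensembles.items():
    print(f"{name}: {cross_val_score(clf, X, y, cv=5).mean():.3f}")
```

Note the bootstrap=False in the randomized ensemble: it isolates randomized split selection as the sole source of diversity, mirroring the paper's point that randomizing the base algorithm's internal decisions is an alternative to manipulating the training data.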


Book Chapter
28 Aug 2000
TL;DR: A new representation based on trees, the linked decision forest, that does not need to repeat internal structure is introduced, and Lumberjack, a supervised learning algorithm that uses the new representation, is shown to improve generalization accuracy on hierarchically decomposable concepts.
Abstract: While the decision tree is an effective representation that has been used in many domains, a tree can often encode a concept inefficiently. This happens when the tree has to represent a subconcept multiple times in different parts of the tree. In this paper we introduce a new representation based on trees, the linked decision forest, that does not need to repeat internal structure. We also introduce a supervised learning algorithm, Lumberjack, that uses the new representation. We then show empirically that Lumberjack improves generalization accuracy on hierarchically decomposable concepts.

11 citations
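The saving that motivates the linked forest is easy to see in miniature. The sketch below is a hypothetical simplification, not Lumberjack's actual linked-forest representation: the "linking" is reduced to letting two branches reference one shared subtree, so the structure becomes a DAG and the repeated subconcept is stored exactly once instead of being duplicated per branch.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    label: int

@dataclass
class Node:
    feature: int                      # index of the boolean feature tested
    false_child: Union["Node", Leaf]  # children may be shared between parents,
    true_child: Union["Node", Leaf]   # making the structure a DAG, not a tree

def classify(node: Union[Node, Leaf], x: list) -> int:
    while isinstance(node, Node):
        node = node.true_child if x[node.feature] else node.false_child
    return node.label

# The subconcept (x2 AND x3) is built once ...
shared = Node(2, Leaf(0), Node(3, Leaf(0), Leaf(1)))
# ... and referenced from both outcomes of the tests on x0 and x1; a plain
# decision tree would have to copy it into each branch. The whole DAG
# computes (x0 OR x1) AND (x2 AND x3).
root = Node(0, Node(1, Leaf(0), shared), shared)

print(classify(root, [1, 0, 1, 1]))  # -> 1
print(classify(root, [1, 0, 1, 0]))  # -> 0
```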