Random Forests

doi:10.1023/A:1010933404324

Journal Article•DOI•

Leo Breiman¹•Institutions (1)

University of California, Berkeley¹

01 Oct 2001-Vol. 45, Iss: 1, pp 5-32

TL;DR: Internal estimates monitor error, strength, and correlation; they are used to show the response to increasing the number of features used in the splitting, and the ideas also apply to regression.

Abstract: Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y. Freund & R. Schapire, Machine Learning: Proceedings of the Thirteenth International conference, 148–156), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to regression.
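
The abstract's core mechanism, a random selection of features at each node combined over many trees, can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the paper's implementation: it uses Gini impurity, bootstrap samples, unpruned trees, and a plurality vote, and the names `grow_forest`, `predict_forest`, `m`, and `n_trees` are hypothetical. Class labels are assumed not to be tuples, since tuples mark internal nodes here.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, features):
    """Return the (feature, threshold) that most reduces impurity, or None."""
    best, best_score = None, gini(y)
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not right:          # threshold at the maximum: nothing to split
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def grow_tree(X, y, m, rng):
    """Grow an unpruned tree, drawing m candidate features at every node."""
    if len(set(y)) == 1:
        return y[0]
    features = rng.sample(range(len(X[0])), m)   # random feature selection
    split = best_split(X, y, features)
    if split is None:              # simplification: no useful split, make a leaf
        return Counter(y).most_common(1)[0][0]
    f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            grow_tree([X[i] for i in left], [y[i] for i in left], m, rng),
            grow_tree([X[i] for i in right], [y[i] for i in right], m, rng))

def predict_tree(node, row):
    """Follow splits until a leaf (any non-tuple value) is reached."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def grow_forest(X, y, n_trees, m, seed=0):
    """Grow each tree on its own bootstrap sample of the training set."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], m, rng))
    return forest

def predict_forest(forest, row):
    """Combine the trees by plurality vote."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]
```

Calling `grow_forest(X, y, n_trees=25, m=2)` on a list of numeric feature rows `X` and labels `y` returns a list of trees that `predict_forest` can query. Smaller `m` tends to lower the correlation between trees at some cost in individual strength, which is the trade-off the abstract describes.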

Citations

Book•

Understanding Machine Learning: From Theory To Algorithms

Shai Shalev-Shwartz¹, Shai Ben-David²•Institutions (2)

Hebrew University of Jerusalem¹, University of Waterloo²

01 Jan 2015

TL;DR: The aim of this textbook is to introduce machine learning, and the algorithmic paradigms it offers, in a principled way in an advanced undergraduate or beginning graduate course.

Abstract: Machine learning is one of the fastest growing areas of computer science, with far-reaching applications. The aim of this textbook is to introduce machine learning, and the algorithmic paradigms it offers, in a principled way. The book provides an extensive theoretical account of the fundamental ideas underlying machine learning and the mathematical derivations that transform these principles into practical algorithms. Following a presentation of the basics of the field, the book covers a wide array of central topics that have not been addressed by previous textbooks. These include a discussion of the computational complexity of learning and the concepts of convexity and stability; important algorithmic paradigms including stochastic gradient descent, neural networks, and structured output learning; and emerging theoretical concepts such as the PAC-Bayes approach and compression-based bounds. Designed for an advanced undergraduate or beginning graduate course, the text makes the fundamentals and algorithms of machine learning accessible to students and non-expert readers in statistics, computer science, mathematics, and engineering.

3,857 citations

Journal Article•DOI•

Taking the Human Out of the Loop: A Review of Bayesian Optimization

Bobak Shahriari¹, Kevin Swersky², Ziyu Wang³, Ryan P. Adams⁴, Nando de Freitas³•Institutions (4)

University of British Columbia¹, University of Toronto², University of Oxford³, Harvard University⁴

01 Jan 2016

TL;DR: This review paper introduces Bayesian optimization, highlights some of its methodological aspects, and showcases a wide range of applications.

Abstract: Big Data applications are typically associated with systems involving large numbers of users, massive complex software systems, and large-scale heterogeneous computing and storage architectures. The construction of such systems involves many distributed design choices. The end products (e.g., recommendation systems, medical analysis tools, real-time game engines, speech recognizers) thus involve many tunable configuration parameters. These parameters are often specified and hard-coded into the software by various developers or teams. If optimized jointly, these parameters can result in significant improvements. Bayesian optimization is a powerful tool for the joint optimization of design choices that is gaining great popularity in recent years. It promises greater automation so as to increase both product quality and human productivity. This review paper introduces Bayesian optimization, highlights some of its methodological aspects, and showcases a wide range of applications.

3,703 citations

Cites background or methods from "Random Forests"

...More precisely, the random forest is an ensemble method where the weak learners are decision trees trained on random subsamples of the data [24]....
...Introduced in 2001 [24], random forests are a class of scalable and highly parallelizable regression models that have been very successful in practice [42]....

Journal Article•DOI•

The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS)

Bjoern H. Menze¹, Andras Jakab², Stefan Bauer³, Jayashree Kalpathy-Cramer⁴, Keyvan Farahani⁵, Justin Kirby⁵, Yuliya Burren³, N Porz³, Johannes Slotboom³, Roland Wiest³, Levente Lanczi⁶, Elizabeth R. Gerstner⁴, Marc-André Weber⁷, Tal Arbel⁸, Brian B. Avants⁹, Nicholas Ayache¹⁰, Patricia Buendia, D. Louis Collins⁸, Nicolas Cordier¹⁰, Jason J. Corso¹¹, Antonio Criminisi¹², Tilak Das¹³, Hervé Delingette¹⁰, Çağatay Demiralp¹⁴, Christopher R. Durst¹⁵, Michel Dojat¹⁰, Senan Doyle¹⁰, Joana Festa, Florence Forbes¹⁰, Ezequiel Geremia¹⁰, Ben Glocker¹⁶, Polina Golland¹⁷, Xiaotao Guo¹⁸, Andac Hamamci¹⁹, Khan M. Iftekharuddin²⁰, Raj Jena¹³, Nigel M. John, Ender Konukoglu⁴, Danial Lashkari¹⁷, José Mariz²¹, Raphael Meier³, Sérgio Pereira, Doina Precup⁸, Stephen J. Price¹³, Tammy Riklin Raviv¹⁷, Syed M. S. Reza²⁰, Michael Ryan, Duygu Sarikaya¹¹, Lawrence H. Schwartz¹⁸, Hoo-Chang Shin, Jamie Shotton¹², Carlos A. Silva, Nuno Sousa²¹, Nagesh K. Subbanna⁸, Gábor Székely², Thomas J. Taylor, Owen M. Thomas¹³, Nicholas J. Tustison¹⁵, Gozde Unal¹⁹, Flor Vasseur¹⁰, Max Wintermark¹⁵, Dong Hye Ye²², Liang Zhao¹¹, Binsheng Zhao¹⁸, Darko Zikic¹², Marcel Prastawa²³, Mauricio Reyes³, Koen Van Leemput⁴•Institutions (23)

Technische Universität München¹, ETH Zurich², University of Bern³, Harvard University⁴, National Institutes of Health⁵, University of Debrecen⁶, University Hospital Heidelberg⁷, McGill University⁸, University of Pennsylvania⁹, French Institute for Research in Computer Science and Automation¹⁰, University at Buffalo¹¹, Microsoft¹², University of Cambridge¹³, Stanford University¹⁴, University of Virginia¹⁵, Imperial College London¹⁶, Massachusetts Institute of Technology¹⁷, Columbia University¹⁸, Sabancı University¹⁹, Old Dominion University²⁰, RMIT University²¹, Purdue University²², General Electric²³

01 Oct 2015-IEEE Transactions on Medical Imaging

TL;DR: The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS) was organized in conjunction with the MICCAI 2012 and 2013 conferences; twenty state-of-the-art tumor segmentation algorithms were applied to a set of 65 multi-contrast MR scans of low- and high-grade glioma patients.

Abstract: In this paper we report the set-up and results of the Multimodal Brain Tumor Image Segmentation Benchmark (BRATS) organized in conjunction with the MICCAI 2012 and 2013 conferences. Twenty state-of-the-art tumor segmentation algorithms were applied to a set of 65 multi-contrast MR scans of low- and high-grade glioma patients, manually annotated by up to four raters, and to 65 comparable scans generated using tumor image simulation software. Quantitative evaluations revealed considerable disagreement between the human raters in segmenting various tumor sub-regions (Dice scores in the range 74%–85%), illustrating the difficulty of this task. We found that different algorithms worked best for different sub-regions (reaching performance comparable to human inter-rater variability), but that no single algorithm ranked in the top for all sub-regions simultaneously. Fusing several good algorithms using a hierarchical majority vote yielded segmentations that consistently ranked above all individual algorithms, indicating remaining opportunities for further methodological improvements. The BRATS image data and manual annotations continue to be publicly available through an online evaluation system as an ongoing benchmarking resource.

3,699 citations

Proceedings Article•DOI•

Real-time human pose recognition in parts from single depth images

Jamie Shotton¹, Andrew Fitzgibbon¹, Mat Cook¹, Toby Sharp¹, Mark J. Finocchio¹, Richard E. Moore¹, Alex Aben-Athar Kipman¹, Andrew Blake¹•Institutions (1)

Microsoft¹

20 Jun 2011

TL;DR: This work takes an object recognition approach, designing an intermediate body parts representation that maps the difficult pose estimation problem into a simpler per-pixel classification problem, and generates confidence-scored 3D proposals of several body joints by reprojecting the classification result and finding local modes.

Abstract: We propose a new method to quickly and accurately predict 3D positions of body joints from a single depth image, using no temporal information. We take an object recognition approach, designing an intermediate body parts representation that maps the difficult pose estimation problem into a simpler per-pixel classification problem. Our large and highly varied training dataset allows the classifier to estimate body parts invariant to pose, body shape, clothing, etc. Finally we generate confidence-scored 3D proposals of several body joints by reprojecting the classification result and finding local modes. The system runs at 200 frames per second on consumer hardware. Our evaluation shows high accuracy on both synthetic and real test sets, and investigates the effect of several training parameters. We achieve state of the art accuracy in our comparison with related work and demonstrate improved generalization over exact whole-skeleton nearest neighbor matching.

3,579 citations

Journal Article•DOI•

Random Forests for Classification in Ecology

D. Richard Cutler¹, Thomas C. Edwards¹,², Karen H. Beard¹, Adele Cutler¹, Kyle Hess¹, Jacob Gibson¹, Joshua J. Lawler³•Institutions (3)

Utah State University¹, United States Geological Survey², University of Washington³

01 Nov 2007-Ecology

TL;DR: High classification accuracy is observed in all applications, as measured by cross-validation and, in the case of the lichen data, by independent test data, when comparing RF to other common classification methods.

Abstract: Classification procedures are some of the most widely used statistical methods in ecology. Random forests (RF) is a new and powerful statistical classifier that is well established in other disciplines but is relatively unknown in ecology. Advantages of RF compared to other statistical classifiers include (1) very high classification accuracy; (2) a novel method of determining variable importance; (3) ability to model complex interactions among predictor variables; (4) flexibility to perform several types of statistical data analysis, including regression, classification, survival analysis, and unsupervised learning; and (5) an algorithm for imputing missing values. We compared the accuracies of RF and four other commonly used statistical classifiers using data on invasive plant species presence in Lava Beds National Monument, California, USA, rare lichen species presence in the Pacific Northwest, USA, and nest sites for cavity nesting birds in the Uinta Mountains, Utah, USA. We observed high classification accuracy in all applications as measured by cross-validation and, in the case of the lichen data, by independent test data, when comparing RF to other common classification methods. We also observed that the variables that RF identified as most important for classifying invasive plant species coincided with expectations based on the literature.

3,368 citations

Cites background or methods from "Random Forests"

...Random forests (hereafter RF) is one such method (Breiman 2001)....
...For the classification situation, Breiman (2001) showed that classification accuracy can be significantly improved by aggregating the results of many classifiers that have little bias by averaging or voting, if the classifiers have low pairwise correlations....
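
The role of low pairwise correlation in this excerpt is usually summarized by the upper bound on generalization error given in the cited paper. It is restated here as a sketch: the symbols are not defined on this page, and are taken to mean s for the strength of the individual classifiers (their expected margin, assumed positive) and ρ̄ for the mean correlation between them.

```latex
PE^{*} \;\le\; \frac{\bar{\rho}\,\left(1 - s^{2}\right)}{s^{2}}, \qquad s > 0
```

The bound tightens as ρ̄ falls or as s rises, which is why aggregation helps most when the individual classifiers are accurate and only weakly correlated.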

References

Journal Article•DOI•

Bagging predictors

Leo Breiman

01 Aug 1996

TL;DR: Tests on real and simulated data sets using classification and regression trees and subset selection in linear regression show that bagging can give substantial gains in accuracy.

Abstract: Bagging predictors is a method for generating multiple versions of a predictor and using these to get an aggregated predictor. The aggregation averages over the versions when predicting a numerical outcome and does a plurality vote when predicting a class. The multiple versions are formed by making bootstrap replicates of the learning set and using these as new learning sets. Tests on real and simulated data sets using classification and regression trees and subset selection in linear regression show that bagging can give substantial gains in accuracy. The vital element is the instability of the prediction method. If perturbing the learning set can cause significant changes in the predictor constructed, then bagging can improve accuracy.
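
The procedure in this abstract (bootstrap replicates of the learning set, one predictor per replicate, then averaging or plurality voting) can be sketched as follows. This is an illustration under assumptions rather than the paper's code: `bag`, `predict_numeric`, `predict_class`, and the one-split regression stump `fit_stump` are hypothetical names, and the stump merely stands in for an unstable base predictor.

```python
import random
from collections import Counter
from statistics import mean

def bootstrap(data, rng):
    """Draw a bootstrap replicate: len(data) items sampled with replacement."""
    return [rng.choice(data) for _ in data]

def bag(fit, data, n_versions, seed=0):
    """Fit one version of the predictor per bootstrap replicate."""
    rng = random.Random(seed)
    return [fit(bootstrap(data, rng)) for _ in range(n_versions)]

def predict_numeric(predictors, x):
    """Aggregate by averaging when the outcome is numerical."""
    return mean(p(x) for p in predictors)

def predict_class(predictors, x):
    """Aggregate by plurality vote when the outcome is a class."""
    return Counter(p(x) for p in predictors).most_common(1)[0][0]

def fit_stump(sample):
    """Hypothetical base learner: one-split regression stump on (x, y) pairs."""
    best = None
    for t in sorted({x for x, _ in sample}):
        left = [y for x, y in sample if x <= t]
        right = [y for x, y in sample if x > t]
        if not right:              # threshold at the maximum x: no split
            continue
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:               # all x identical: predict the overall mean
        overall = mean(y for _, y in sample)
        return lambda x: overall
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm
```

With `predictors = bag(fit_stump, data, n_versions=25)`, a call to `predict_numeric(predictors, x)` returns the averaged prediction. Because each stump's threshold moves with the replicate it was fit on, the average is smoother than any single stump, which illustrates the abstract's point that instability of the base method is what bagging exploits.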

16,118 citations

Proceedings Article•

Experiments with a new boosting algorithm

Yoav Freund¹, Robert E. Schapire¹•Institutions (1)

AT&T¹

03 Jul 1996

TL;DR: This paper describes experiments carried out to assess how well AdaBoost, with and without pseudo-loss, performs on real learning problems, and compares boosting to Breiman's "bagging" method when used to aggregate various classifiers.

Abstract: In an earlier paper, we introduced a new "boosting" algorithm called AdaBoost which, theoretically, can be used to significantly reduce the error of any learning algorithm that consistently generates classifiers whose performance is a little better than random guessing. We also introduced the related notion of a "pseudo-loss" which is a method for forcing a learning algorithm of multi-label concepts to concentrate on the labels that are hardest to discriminate. In this paper, we describe experiments we carried out to assess how well AdaBoost with and without pseudo-loss, performs on real learning problems. We performed two sets of experiments. The first set compared boosting to Breiman's "bagging" method when used to aggregate various classifiers (including decision trees and single attribute-value tests). We compared the performance of the two methods on a collection of machine-learning benchmarks. In the second set of experiments, we studied in more detail the performance of boosting using a nearest-neighbor classifier on an OCR problem.

7,601 citations

"Random Forests" refers background or methods in this paper

...But none of these three forests do as well as Adaboost (Freund & Schapire, 1996) or other algorithms that work by adaptive reweighting (arcing) of the training set (see Breiman, 1998b; Dietterich, 1998; Bauer & Kohavi, 1999)....
...In its original version, Adaboost (Freund & Schapire, 1996) is a deterministic algorithm that selects the weights on the training set for input to the next classifier based on the misclassifications in the previous classifiers....

Journal Article•DOI•

The random subspace method for constructing decision forests

Tin Kam Ho¹•Institutions (1)

Bell Labs¹

01 Aug 1998-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A method to construct a decision tree based classifier is proposed that maintains highest accuracy on training data and improves on generalization accuracy as it grows in complexity.

Abstract: Much of previous attention on decision trees focuses on the splitting criteria and optimization of tree sizes. The dilemma between overfitting and achieving maximum accuracy is seldom resolved. A method to construct a decision tree based classifier is proposed that maintains highest accuracy on training data and improves on generalization accuracy as it grows in complexity. The classifier consists of multiple trees constructed systematically by pseudorandomly selecting subsets of components of the feature vector, that is, trees constructed in randomly chosen subspaces. The subspace method is compared to single-tree classifiers and other forest construction methods by experiments on publicly available datasets, where the method's superiority is demonstrated. We also discuss independence between trees in a forest and relate that to the combined classification accuracy.
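
The construction described here, where each ensemble member sees only a pseudorandomly chosen subset of the feature components, can be sketched as below. To keep the example short, a 1-nearest-neighbour base learner is swapped in for the decision tree the abstract describes; that substitution, and the names `fit_random_subspaces`, `predict_vote`, and `k`, are assumptions made for illustration.

```python
import random
from collections import Counter

def fit_subspace_member(X, y, dims):
    """Base learner restricted to the feature components listed in dims.

    Assumption: 1-nearest-neighbour stands in for a decision tree.
    """
    def predict(row):
        def dist(train_row):
            return sum((train_row[d] - row[d]) ** 2 for d in dims)
        nearest = min(range(len(X)), key=lambda i: dist(X[i]))
        return y[nearest]
    return predict

def fit_random_subspaces(X, y, n_members, k, seed=0):
    """Train n_members learners, each on k pseudorandomly selected features.

    Every member sees all training rows; only the feature subset varies.
    """
    rng = random.Random(seed)
    n_features = len(X[0])
    return [fit_subspace_member(X, y, rng.sample(range(n_features), k))
            for _ in range(n_members)]

def predict_vote(members, row):
    """Combine the members by plurality vote."""
    return Counter(member(row) for member in members).most_common(1)[0][0]
```

Unlike bagging, every member here is trained on the full training set, so diversity among members comes only from the choice of subspace.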

5,984 citations

"Random Forests" refers background in this paper

...Ho (1998) has written a number of papers on “the random subspace” method which does a random selection of a subset of features to use to grow each tree....
...Keywords: classification, regression, ensemble...

Journal Article•DOI•

An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization

Thomas G. Dietterich¹•Institutions (1)

Oregon State University¹

01 Aug 2000-Machine Learning

TL;DR: In this article, the authors compared the effectiveness of randomization, bagging, and boosting for improving the performance of the decision-tree algorithm C4.5 and found that in situations with little or no classification noise, randomization is competitive with bagging but not as accurate as boosting.

Abstract: Bagging and boosting are methods that generate a diverse ensemble of classifiers by manipulating the training data given to a “base” learning algorithm. Breiman has pointed out that they rely for their effectiveness on the instability of the base learning algorithm. An alternative approach to generating an ensemble is to randomize the internal decisions made by the base algorithm. This general approach has been studied previously by Ali and Pazzani and by Dietterich and Kong. This paper compares the effectiveness of randomization, bagging, and boosting for improving the performance of the decision-tree algorithm C4.5. The experiments show that in situations with little or no classification noise, randomization is competitive with (and perhaps slightly superior to) bagging but not as accurate as boosting. In situations with substantial classification noise, bagging is much better than boosting, and sometimes better than randomization.

2,919 citations

Journal Article•DOI•

An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants

Eric Bauer¹, Ron Kohavi²•Institutions (2)

Stanford University¹, Blue Martini Software²

01 Jul 1999-Machine Learning

TL;DR: It is found that Bagging improves when probabilistic estimates in conjunction with no-pruning are used, as well as when the data was backfit, and that Arc-x4 behaves differently than AdaBoost if reweighting is used instead of resampling, indicating a fundamental difference.

Abstract: Methods for voting classification algorithms, such as Bagging and AdaBoost, have been shown to be very successful in improving the accuracy of certain classifiers for artificial and real-world datasets. We review these algorithms and describe a large empirical study comparing several variants in conjunction with a decision tree inducer (three variants) and a Naive-Bayes inducer. The purpose of the study is to improve our understanding of why and when these algorithms, which use perturbation, reweighting, and combination techniques, affect classification error. We provide a bias and variance decomposition of the error to show how different methods and variants influence these two terms. This allowed us to determine that Bagging reduced variance of unstable methods, while boosting methods (AdaBoost and Arc-x4) reduced both the bias and variance of unstable methods but increased the variance for Naive-Bayes, which was very stable. We observed that Arc-x4 behaves differently than AdaBoost if reweighting is used instead of resampling, indicating a fundamental difference. Voting variants, some of which are introduced in this paper, include: pruning versus no pruning, use of probabilistic estimates, weight perturbations (Wagging), and backfitting of data. We found that Bagging improves when probabilistic estimates in conjunction with no-pruning are used, as well as when the data was backfit. We measure tree sizes and show an interesting positive correlation between the increase in the average tree size in AdaBoost trials and its success in reducing the error. We compare the mean-squared error of voting methods to non-voting methods and show that the voting methods lead to large and significant reductions in the mean-squared errors. Practical problems that arise in implementing boosting algorithms are explored, including numerical instabilities and underflows. We use scatterplots that graphically show how AdaBoost reweights instances, emphasizing not only “hard” areas but also outliers and noise.

2,686 citations

"Random Forests" refers background or methods in this paper

...But none of these three forests do as well as Adaboost (Freund & Schapire, 1996) or other algorithms that work by adaptive reweighting (arcing) of the training set (see Breiman, 1998b; Dietterich, 1998; Bauer & Kohavi, 1999)....
...The second is that bagging can be used to give ongoing estimates of the generalization error (PE∗) of the combined ensemble of trees, as well as estimates for the strength and correlation....
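
The ongoing error estimate mentioned in this excerpt is commonly computed from out-of-bag cases: each training case is scored only by the ensemble members whose bootstrap sample did not contain it. The sketch below illustrates that idea under assumptions; `oob_error` and the placeholder base learner `fit_nearest_neighbour` are hypothetical names, and any function that takes `(X, y)` and returns a predictor could be passed as `fit`.

```python
import random
from collections import Counter

def fit_nearest_neighbour(X, y):
    """Hypothetical base learner: 1-nearest-neighbour on the sample it is given."""
    def predict(row):
        dists = [sum((a - b) ** 2 for a, b in zip(train, row)) for train in X]
        return y[dists.index(min(dists))]
    return predict

def oob_error(fit, X, y, n_members, seed=0):
    """Out-of-bag estimate of the ensemble's misclassification rate."""
    rng = random.Random(seed)
    n = len(X)
    votes = [Counter() for _ in range(n)]
    for _ in range(n_members):
        idx = [rng.randrange(n) for _ in range(n)]     # bootstrap sample
        predict = fit([X[i] for i in idx], [y[i] for i in idx])
        for i in set(range(n)) - set(idx):             # cases left out of it
            votes[i][predict(X[i])] += 1
    # Score only cases that were out-of-bag at least once.
    scored = [(v.most_common(1)[0][0], y[i]) for i, v in enumerate(votes) if v]
    if not scored:
        return None        # every case appeared in every bootstrap sample
    return sum(pred != truth for pred, truth in scored) / len(scored)
```

No separate test set is needed: the estimate reuses the cases each bootstrap sample happens to omit, so it can be updated as members are added.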