Topic: Random forest

About: Random forest is a research topic. Over its lifetime, 13,345 publications have been published within this topic, receiving 345,395 citations. The topic is also known as: random forests & randomized trees.


Papers
Journal ArticleDOI
TL;DR: The Random Forest classifier uses bagging, or bootstrap aggregating, to form an ensemble of classification and regression tree (CART)-like classifiers, which is computationally much lighter than methods based on boosting and somewhat lighter than simple bagging.

1,634 citations
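
The TL;DR above describes the core random forest recipe: grow many CART-like trees on bootstrap samples, randomize the features considered at each split, and aggregate the trees' votes. The minimal sketch below illustrates that recipe; it uses Python with scikit-learn as an assumed toolkit (not the paper's software), and the dataset size, tree count, and "sqrt" feature subsampling are illustrative choices.

    # Bagging an ensemble of CART-like trees: each tree sees a bootstrap
    # sample and a random feature subset at every split; predictions are
    # aggregated by majority vote.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    trees = []
    for i in range(50):
        idx = rng.integers(0, len(X), size=len(X))            # bootstrap sample
        tree = DecisionTreeClassifier(max_features="sqrt",    # per-split feature subsampling
                                      random_state=i)
        trees.append(tree.fit(X[idx], y[idx]))

    votes = np.stack([t.predict(X) for t in trees])           # (n_trees, n_samples)
    pred = (votes.mean(axis=0) > 0.5).astype(int)             # majority vote (binary labels)
    print("training accuracy of the bagged ensemble:", (pred == y).mean())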

Journal ArticleDOI
TL;DR: Random survival forests (RSF), as discussed by the authors, is a random forests method for the analysis of right-censored survival data; a conservation-of-events principle is introduced and used to define ensemble mortality as a predicted outcome.
Abstract: We introduce random survival forests, a random forests method for the analysis of right-censored survival data. New survival splitting rules for growing survival trees are introduced, as is a new missing data algorithm for imputing missing data. A conservation-of-events principle for survival forests is introduced and used to define ensemble mortality, a simple interpretable measure of mortality that can be used as a predicted outcome. Several illustrative examples are given, including a case study of the prognostic implications of body mass for individuals with coronary artery disease. Computations for all examples were implemented using the freely available R-software package, randomSurvivalForest.

1,562 citations
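
The abstract above introduces random survival forests and ensemble mortality as a simple predicted outcome. The paper's computations used the R package randomSurvivalForest; the hedged sketch below instead assumes the Python scikit-survival package (chosen only to keep all examples in one language) and synthetic right-censored data, so the names and sizes are illustrative rather than the paper's setup.

    # Random survival forest on synthetic right-censored data (assumed
    # scikit-survival API, not the paper's randomSurvivalForest R package).
    import numpy as np
    from sksurv.ensemble import RandomSurvivalForest
    from sksurv.util import Surv

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 5))
    event_time = rng.exponential(scale=np.exp(-X[:, 0]))      # latent event times
    censor_time = rng.exponential(scale=2.0, size=300)        # censoring times
    time = np.minimum(event_time, censor_time)
    event = event_time <= censor_time                         # True = event observed

    y = Surv.from_arrays(event=event, time=time)              # structured (event, time) array
    rsf = RandomSurvivalForest(n_estimators=100, min_samples_leaf=10,
                               random_state=0).fit(X, y)

    # predict() returns one risk score per sample (higher = worse prognosis),
    # playing the role of the ensemble mortality described above.
    print(rsf.predict(X[:5]))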

Journal ArticleDOI
TL;DR: It is shown that ranger is the fastest and most memory efficient implementation of random forests to analyze data on the scale of a genome-wide association study.
Abstract: We introduce the C++ application and R package ranger. The software is a fast implementation of random forests for high dimensional data. Ensembles of classification, regression and survival trees are supported. We describe the implementation, provide examples, validate the package with a reference implementation, and compare runtime and memory usage with other implementations. The new software proves to scale best with the number of features, samples, trees, and features tried for splitting. Finally, we show that ranger is the fastest and most memory efficient implementation of random forests to analyze data on the scale of a genome-wide association study.

1,512 citations
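
ranger itself is an R package backed by C++ and is not reproduced here. As a rough stand-in for the kind of runtime measurement the abstract reports, the hedged sketch below times a scikit-learn random forest on a wide, GWAS-like matrix; the matrix dimensions are illustrative assumptions and far smaller than a real genome-wide association study.

    # Timing a random forest fit on a wide (many-feature) matrix.
    import time
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    X = rng.integers(0, 3, size=(500, 10_000)).astype(np.float32)  # SNP-like 0/1/2 codes
    y = rng.integers(0, 2, size=500)                               # random binary labels

    start = time.perf_counter()
    RandomForestClassifier(n_estimators=100, max_features="sqrt",
                           n_jobs=-1, random_state=0).fit(X, y)
    print(f"fit 100 trees on {X.shape[1]} features in "
          f"{time.perf_counter() - start:.1f}s")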

Proceedings ArticleDOI
26 Dec 2007
TL;DR: It is shown that selecting the ROI adds about 5% to the performance and, together with the other improvements, the result is about a 10% improvement over the state of the art for Caltech-256.
Abstract: We explore the problem of classifying images by the object categories they contain in the case of a large number of object categories. To this end we combine three ingredients: (i) shape and appearance representations that support spatial pyramid matching over a region of interest. This generalizes the representation of Lazebnik et al., (2006) from an image to a region of interest (ROI), and from appearance (visual words) alone to appearance and local shape (edge distributions); (ii) automatic selection of the regions of interest in training. This provides a method of inhibiting background clutter and adding invariance to the object instance 's position; and (iii) the use of random forests (and random ferns) as a multi-way classifier. The advantage of such classifiers (over multi-way SVM for example) is the ease of training and testing. Results are reported for classification of the Caltech-101 and Caltech-256 data sets. We compare the performance of the random forest/ferns classifier with a benchmark multi-way SVM classifier. It is shown that selecting the ROI adds about 5% to the performance and, together with the other improvements, the result is about a 10% improvement over the state of the art for Caltech-256.

1,401 citations
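
Besides random forests, the abstract above uses random ferns as a cheap multi-way classifier. The sketch below is a minimal, self-contained random-fern classifier on synthetic feature vectors; the paper's actual inputs are spatial-pyramid shape and appearance descriptors over a region of interest, which are not reproduced here, and the fern count, fern depth, and pairwise-comparison binary tests are illustrative assumptions.

    # Random ferns: each fern applies `depth` random binary tests, the test
    # outcomes index one of 2**depth bins, and per-bin class posteriors are
    # combined across ferns in semi-naive Bayes fashion.
    import numpy as np
    from sklearn.datasets import make_blobs

    rng = np.random.default_rng(0)
    X, y = make_blobs(n_samples=600, centers=5, n_features=32, random_state=0)
    n_classes, n_ferns, depth = 5, 30, 8

    # Each binary test compares two randomly chosen feature dimensions.
    tests = rng.integers(0, X.shape[1], size=(n_ferns, depth, 2))

    def fern_codes(X):
        bits = X[:, tests[..., 0]] > X[:, tests[..., 1]]       # (n, n_ferns, depth)
        return (bits * (1 << np.arange(depth))).sum(axis=2)    # bin index per fern

    # Training: per-fern, per-bin class counts with Laplace smoothing.
    counts = np.ones((n_ferns, 2 ** depth, n_classes))
    codes = fern_codes(X)
    for f in range(n_ferns):
        np.add.at(counts[f], (codes[:, f], y), 1)
    log_post = np.log(counts / counts.sum(axis=2, keepdims=True))

    # Prediction: sum per-fern log posteriors and take the best class.
    scores = log_post[np.arange(n_ferns), fern_codes(X)].sum(axis=1)
    print("training accuracy of the fern ensemble:",
          (scores.argmax(axis=1) == y).mean())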


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations, 90% related
Convolutional neural network: 74.7K papers, 2M citations, 90% related
Cluster analysis: 146.5K papers, 2.9M citations, 89% related
Feature extraction: 111.8K papers, 2.1M citations, 87% related
Artificial neural network: 207K papers, 4.5M citations, 86% related
Performance Metrics
Number of papers in the topic in previous years:
Year    Papers
2024    1
2023    5,459
2022    10,287
2021    2,325
2020    2,251
2019    1,961