Topic

Random forest

About: Random forest is a research topic. Over its lifetime, 13,345 publications have been published within this topic, receiving 345,395 citations. The topic is also known as random forests and randomized trees.

Papers

Journal Article

Random Forests for Classification in Ecology

D. Richard Cutler¹, Thomas C. Edwards¹,², Karen H. Beard¹, Adele Cutler¹, Kyle Hess¹, Jacob Gibson¹, Joshua J. Lawler³•Institutions (3)

Utah State University¹, United States Geological Survey², University of Washington³

01 Nov 2007-Ecology

TL;DR: RF achieved high classification accuracy in all applications, as measured by cross-validation and, for the lichen data, by independent test data, when compared with other common classification methods.

Abstract: Classification procedures are some of the most widely used statistical methods in ecology. Random forests (RF) is a new and powerful statistical classifier that is well established in other disciplines but is relatively unknown in ecology. Advantages of RF compared to other statistical classifiers include (1) very high classification accuracy; (2) a novel method of determining variable importance; (3) ability to model complex interactions among predictor variables; (4) flexibility to perform several types of statistical data analysis, including regression, classification, survival analysis, and unsupervised learning; and (5) an algorithm for imputing missing values. We compared the accuracies of RF and four other commonly used statistical classifiers using data on invasive plant species presence in Lava Beds National Monument, California, USA, rare lichen species presence in the Pacific Northwest, USA, and nest sites for cavity nesting birds in the Uinta Mountains, Utah, USA. We observed high classification accuracy in all applications as measured by cross-validation and, in the case of the lichen data, by independent test data, when comparing RF to other common classification methods. We also observed that the variables that RF identified as most important for classifying invasive plant species coincided with expectations based on the literature.

3,368 citations

Journal Article

Random forest in remote sensing: A review of applications and future directions

Mariana Belgiu¹, Lucian Drăguţ²•Institutions (2)

University of Salzburg¹, West University of Timișoara²

01 Apr 2016-ISPRS Journal of Photogrammetry and Remote Sensing

TL;DR: This review found that the RF classifier can successfully handle high data dimensionality and multicollinearity, being both fast and insensitive to overfitting.

Abstract: A random forest (RF) classifier is an ensemble classifier that produces multiple decision trees, using a randomly selected subset of training samples and variables. This classifier has become popular within the remote sensing community due to the accuracy of its classifications. The overall objective of this work was to review the use of the RF classifier in remote sensing. This review has revealed that the RF classifier can successfully handle high data dimensionality and multicollinearity, being both fast and insensitive to overfitting. It is, however, sensitive to the sampling design. The variable importance (VI) measurement provided by the RF classifier has been extensively exploited in different scenarios, for example to reduce the number of dimensions of hyperspectral data, to identify the most relevant multisource remote sensing and geographic data, and to select the most suitable season to classify particular target classes. Further investigations are required into less commonly exploited uses of this classifier, such as for sample proximity analysis to detect and remove outliers in the training samples.
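The ensemble idea in this abstract (many trees, each grown on a random subset of training samples and variables, combined by vote) can be sketched in a few lines. This is only an illustration under stated assumptions, not the implementation reviewed in the paper: the function names `fit_stump`, `fit_forest` and `predict` are hypothetical, and each "tree" is a single-split stump, so the random variable subset is drawn once per tree rather than at every split as in a full random forest.

```python
import random
from collections import Counter

def fit_stump(X, y, features):
    """Pick the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # not a real split
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(v != l_lab for v in left) + sum(v != r_lab for v in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:
        # No valid split found: always predict the majority class.
        maj = Counter(y).most_common(1)[0][0]
        return (0, float("inf"), maj, maj)
    return best[1:]

def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    """Grow n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample of rows
        feats = rng.sample(range(p), n_features)     # random subset of variables
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Majority vote over all stumps in the forest."""
    votes = Counter(l if row[f] <= t else r for f, t, l, r in forest)
    return votes.most_common(1)[0][0]
```

Calling `fit_forest(X, y)` on a list of numeric rows and a list of labels returns the ensemble, and `predict(forest, row)` classifies one new row. Replacing the stump with a deeper tree learner would bring the sketch closer to the classifier the abstract describes.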

3,244 citations

Journal Article

MissForest—non-parametric missing value imputation for mixed-type data

Daniel J. Stekhoven¹, Peter Bühlmann¹•Institutions (1)

ETH Zurich¹

01 Jan 2012-Bioinformatics

TL;DR: In this comparative study, missForest outperforms other methods of imputation especially in data settings where complex interactions and non-linear relations are suspected and the out-of-bag imputation error estimates of missForest prove to be adequate in all settings.

Abstract: Motivation: Modern data acquisition based on high-throughput technology is often facing the problem of missing data. Algorithms commonly used in the analysis of such large-scale data often depend on a complete set. Missing value imputation offers a solution to this problem. However, the majority of available imputation methods are restricted to one type of variable only: continuous or categorical. For mixed-type data, the different types are usually handled separately. Therefore, these methods ignore possible relations between variable types. We propose a non-parametric method which can cope with different types of variables simultaneously. Results: We compare several state of the art methods for the imputation of missing values. We propose and evaluate an iterative imputation method (missForest) based on a random forest. By averaging over many unpruned classification or regression trees, random forest intrinsically constitutes a multiple imputation scheme. Using the built-in out-of-bag error estimates of random forest, we are able to estimate the imputation error without the need of a test set. Evaluation is performed on multiple datasets coming from a diverse selection of biological fields with artificially introduced missing values ranging from 10% to 30%. We show that missForest can successfully handle missing values, particularly in datasets including different types of variables. In our comparative study, missForest outperforms other methods of imputation especially in data settings where complex interactions and non-linear relations are suspected. The out-of-bag imputation error estimates of missForest prove to be adequate in all settings. Additionally, missForest exhibits attractive computational efficiency and can cope with high-dimensional data. Availability: The package missForest is freely available from http://stat.ethz.ch/CRAN/. Contact: stekhoven@stat.math.ethz.ch; buhlmann@stat.math.ethz.ch
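The iterative scheme mentioned in this abstract (start from a simple guess, then repeatedly predict each variable's missing entries from the other variables) can be sketched as below. This is not the missForest package and makes several assumptions: the names `knn_predict` and `iterative_impute` are hypothetical, the data are numeric only, a nearest-neighbour average is swapped in for the random forest to keep the example self-contained, and the loop stops on a simple tolerance rather than on the package's own stopping rule.

```python
import math

def knn_predict(train_X, train_y, row, k=2):
    """Stand-in regressor (NOT a random forest): mean of the k nearest targets."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], row))[:k]
    return sum(t for _, t in nearest) / len(nearest)

def iterative_impute(data, max_iter=10, tol=1e-6):
    """Fill None entries column by column, refitting on the current filled matrix."""
    n, p = len(data), len(data[0])
    missing = [[data[i][j] is None for j in range(p)] for i in range(n)]
    filled = [row[:] for row in data]  # work on a copy; the input is left untouched
    # Step 1: initial guess is the column mean of the observed values.
    for j in range(p):
        obs = [data[i][j] for i in range(n) if not missing[i][j]]
        mean = sum(obs) / len(obs)
        for i in range(n):
            if missing[i][j]:
                filled[i][j] = mean
    # Step 2: visit columns from fewest to most missing values; repeat until stable.
    order = sorted(range(p), key=lambda j: sum(missing[i][j] for i in range(n)))
    for _ in range(max_iter):
        change = 0.0
        for j in order:
            others = [c for c in range(p) if c != j]
            obs_rows = [i for i in range(n) if not missing[i][j]]
            train_X = [[filled[i][c] for c in others] for i in obs_rows]
            train_y = [filled[i][j] for i in obs_rows]
            for i in range(n):
                if missing[i][j]:
                    new = knn_predict(train_X, train_y, [filled[i][c] for c in others])
                    change += (new - filled[i][j]) ** 2
                    filled[i][j] = new
        if change < tol:  # imputed values stopped moving
            break
    return filled
```

The function expects a list of rows with `None` marking missing entries and returns a completed copy. Swapping `knn_predict` for a forest-based predictor, and handling categorical columns with a classifier, would be the steps needed to approach the method the abstract describes.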

2,928 citations

Journal Article

An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization

Thomas G. Dietterich¹•Institutions (1)

Oregon State University¹

01 Aug 2000-Machine Learning

TL;DR: In this article, the authors compared the effectiveness of randomization, bagging, and boosting for improving the performance of the decision-tree algorithm C4.5 and found that in situations with little or no classification noise, randomization is competitive with bagging but not as accurate as boosting.

Abstract: Bagging and boosting are methods that generate a diverse ensemble of classifiers by manipulating the training data given to a “base” learning algorithm. Breiman has pointed out that they rely for their effectiveness on the instability of the base learning algorithm. An alternative approach to generating an ensemble is to randomize the internal decisions made by the base algorithm. This general approach has been studied previously by Ali and Pazzani and by Dietterich and Kong. This paper compares the effectiveness of randomization, bagging, and boosting for improving the performance of the decision-tree algorithm C4.5. The experiments show that in situations with little or no classification noise, randomization is competitive with (and perhaps slightly superior to) bagging but not as accurate as boosting. In situations with substantial classification noise, bagging is much better than boosting, and sometimes better than randomization.

2,919 citations

Journal Article

Feature Selection with the Boruta Package

Miron B. Kursa¹, Witold R. Rudnicki¹•Institutions (1)

University of Warsaw¹

16 Sep 2010-Journal of Statistical Software

TL;DR: The Boruta package provides a convenient interface to the Boruta algorithm, a novel feature selection algorithm for finding all relevant variables.

Abstract: This article describes an R package, Boruta, implementing a novel feature selection algorithm for finding all relevant variables. The algorithm is designed as a wrapper around a Random Forest classification algorithm. It iteratively removes the features which are proved by a statistical test to be less relevant than random probes. The Boruta package provides a convenient interface to the algorithm. The short description of the algorithm and examples of its application are presented.
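The "random probes" idea in this abstract can be illustrated with a small sketch: each feature is compared against permuted copies of the features (shadow probes), and a binomial test on the number of wins decides its fate. This is not the Boruta package or its API. The names `abs_corr` and `shadow_screen` are hypothetical, absolute Pearson correlation is swapped in for random forest importance, and, unlike the iterative removal described above, this simplified version keeps every feature in play for all rounds and applies no multiple-testing correction.

```python
import math
import random

def abs_corr(x, y):
    """Stand-in importance score: |Pearson correlation| (0.0 for constant input)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0

def shadow_screen(columns, y, n_iter=20, alpha=0.01, seed=0):
    """Count how often each feature beats the best shadow probe, then test the counts."""
    rng = random.Random(seed)
    hits = [0] * len(columns)
    for _ in range(n_iter):
        # Shadow probes: a freshly permuted copy of every column.
        shadows = [rng.sample(col, len(col)) for col in columns]
        best_shadow = max(abs_corr(s, y) for s in shadows)
        for j, col in enumerate(columns):
            if abs_corr(col, y) > best_shadow:
                hits[j] += 1

    def tail(k_from, k_to):
        # P(k_from <= X <= k_to) for X ~ Binomial(n_iter, 0.5)
        return sum(math.comb(n_iter, k) for k in range(k_from, k_to + 1)) / 2 ** n_iter

    decisions = []
    for h in hits:
        if tail(h, n_iter) < alpha:
            decisions.append("confirmed")   # wins far more often than chance
        elif tail(0, h) < alpha:
            decisions.append("rejected")    # wins far less often than chance
        else:
            decisions.append("tentative")
    return hits, decisions
```

Here `columns` is a list of feature columns (one list of numbers per feature) and `y` is the numeric target. The function returns the hit counts and a label per feature.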

2,832 citations

Network Information

Performance

Metrics

Papers: 29,141

Citations: 532,363

No. of papers in the topic in previous years
Year	Papers
2024	1
2023	5,459
2022	10,287
2021	2,325
2020	2,251
2019	1,961