Open Access Journal Article

To Tune or Not to Tune the Number of Trees in Random Forest

Philipp Probst, +1 more
- 16 May 2017 - 
- Vol. 18, Iss: 181, pp 1-18
TLDR
This paper provides theoretical results showing that the expected error rate may be a non-monotonous function of the number of trees, and argues in favor of setting the number of trees to a computationally feasible large value, depending on the convergence properties of the desired performance measure.
Abstract
The number of trees T in the random forest (RF) algorithm for supervised learning has to be set by the user. It is controversial whether T should simply be set to the largest computationally manageable value or whether a smaller T may in some cases be better. While the principle underlying bagging is that "more trees are better", in practice the classification error rate sometimes reaches a minimum and then increases again as the number of trees grows. The goal of this paper is four-fold: (i) providing theoretical results showing that the expected error rate may be a non-monotonous function of the number of trees and explaining under which circumstances this happens; (ii) providing theoretical results showing that such non-monotonous patterns cannot be observed for other performance measures such as the Brier score and the logarithmic loss (for classification) and the mean squared error (for regression); (iii) illustrating the extent of the problem through an application to a large number (n = 306) of datasets from the public database OpenML; (iv) finally, arguing in favor of setting T to a computationally feasible large number, depending on the convergence properties of the desired performance measure.
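Points (i) and (ii) can be illustrated with a small numerical sketch. It is not taken from the paper: it assumes a binary problem in which each tree votes correctly for a given observation independently with a fixed probability, and the mixture of observations (80% with probability 0.6, 20% with probability 0.49) as well as all function names are hypothetical choices made only for illustration. Under these assumptions the majority-vote error follows a binomial tail, and the expected Brier score of the vote fraction has the closed form (1 - p)^2 + p(1 - p)/T.

```python
import math

# Hypothetical mixture: (share of observations, probability that one tree votes correctly).
MIXTURE = [(0.8, 0.6), (0.2, 0.49)]

def majority_vote_error(n_trees: int, p_correct: float) -> float:
    """P(at most half of n_trees independent votes are correct), for odd n_trees."""
    if n_trees % 2 == 0:
        raise ValueError("use an odd number of trees to avoid ties")
    log_p, log_q = math.log(p_correct), math.log1p(-p_correct)
    total = 0.0
    for k in range(n_trees // 2 + 1):  # k = number of correct votes, k <= (n_trees - 1) / 2
        # Binomial pmf evaluated in log space to avoid overflow for large n_trees.
        log_pmf = (math.lgamma(n_trees + 1) - math.lgamma(k + 1)
                   - math.lgamma(n_trees - k + 1)
                   + k * log_p + (n_trees - k) * log_q)
        total += math.exp(log_pmf)
    return total

def expected_brier(n_trees: int, p_correct: float) -> float:
    """E[(1 - X/T)^2] for X ~ Binomial(T, p): squared bias plus variance of the vote fraction."""
    return (1.0 - p_correct) ** 2 + p_correct * (1.0 - p_correct) / n_trees

def mean_over_mixture(metric, n_trees: int) -> float:
    """Average a per-observation metric over the hypothetical mixture."""
    return sum(share * metric(n_trees, p) for share, p in MIXTURE)

for n_trees in (1, 11, 101, 201, 1001, 2001):
    err = mean_over_mixture(majority_vote_error, n_trees)
    brier = mean_over_mixture(expected_brier, n_trees)
    print(f"T={n_trees:5d}  error rate={err:.4f}  Brier score={brier:.4f}")
```

In this toy setting the error rate first drops, because observations with p = 0.6 are quickly classified correctly, and then creeps back up, because observations with p = 0.49 are misclassified with ever higher probability as T grows. The Brier score, by contrast, can only decrease in T since its T-dependent term is p(1 - p)/T.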


Citations
Journal Article

Hyperparameters and tuning strategies for random forest

TL;DR: A literature review on the parameters' influence on the prediction performance and on variable importance measures is provided, and the application of one of the most established tuning strategies, model‐based optimization (MBO), is demonstrated.
Journal Article

Random Forest as a generic framework for predictive modeling of spatial and spatio-temporal variables

TL;DR: A random forest for spatial predictions framework (RFsp) where buffer distances from observation points are used as explanatory variables, thus incorporating geographical proximity effects into the prediction process, and appears to be especially attractive for building multivariate spatial prediction models that can be used as “knowledge engines” in various geoscience fields.
Journal Article

Random forest versus logistic regression: a large-scale benchmark experiment.

TL;DR: A large scale benchmarking experiment based on 243 real datasets comparing the prediction performance of the original version of RF with default parameters and LR as binary classification tools suggests a significantly better performance of RF.
Journal Article

A Brief Review of Random Forests for Water Scientists and Practitioners and Their Recent History in Water Resources

TL;DR: This work popularizes RF and their variants for the practicing water scientist, and discusses related concepts and techniques, which have received less attention from the water science and hydrologic communities.
Posted Content

Simple Behavioral Analysis (SimBA) – an open source toolkit for computer classification of complex social behaviors in experimental animals

TL;DR: An open-source package with graphical interface and workflow (SimBA) that uses pose-estimation to create supervised machine learning predictive classifiers of rodent social behavior, with millisecond resolution and accuracies that can out-perform human observers is presented.
Trending Questions (1)
How does the number of trees in a lightweight random forest influence the performance of the model?

The number of trees in a random forest can affect performance: the classification error rate sometimes reaches a minimum and then increases again as the number of trees grows.