
Confidence Intervals for the Generalisation Error of Random Forests

TLDR
Out-of-bag error is commonly used as an estimate of generalisation error in ensemble-based learning models such as random forests; the proposed confidence intervals are shown to have improved coverage properties over the naive confidence interval in real and simulated examples.
Abstract
Out-of-bag error is commonly used as an estimate of generalisation error in ensemble-based learning models such as random forests. We present confidence intervals for this quantity using the delta-method-after-bootstrap and the jackknife-after-bootstrap techniques. These methods do not require growing any additional trees. We show that these new confidence intervals have improved coverage properties over the naive confidence interval, in real and simulated examples.
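The out-of-bag estimate the abstract refers to can be illustrated with a short sketch. This is not the paper's method: it computes the OOB error with scikit-learn and attaches the naive binomial confidence interval that the paper's delta-method and jackknife intervals improve upon; the dataset and hyperparameters are illustrative assumptions.

```python
# Sketch: out-of-bag (OOB) error for a random forest, plus the naive
# normal-approximation confidence interval the paper improves upon.
# Dataset and settings are illustrative, not taken from the paper.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X, y)

# OOB error: each observation is predicted only by the trees whose
# bootstrap sample excluded it, so no extra trees or held-out data are needed.
oob_error = 1.0 - rf.oob_score_

# Naive CI: treat the n per-observation OOB outcomes as i.i.d. Bernoulli
# draws and use the usual normal approximation for a proportion.
n = len(y)
se = np.sqrt(oob_error * (1.0 - oob_error) / n)
lo, hi = oob_error - 1.96 * se, oob_error + 1.96 * se
print(f"OOB error: {oob_error:.3f}, naive 95% CI: ({lo:.3f}, {hi:.3f})")
```

The naive interval ignores the correlation induced by reusing the same trees across observations, which is why its coverage can be poor; the paper's intervals correct this without growing additional trees.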


