scispace - formally typeset
Journal ArticleDOI

Improvements on Cross-Validation: The 632+ Bootstrap Method

Reads0
Chats0
TLDR
It is shown that a particular bootstrap method, the .632+ rule, substantially outperforms cross-validation in a catalog of 24 simulation experiments and also considers estimating the variability of an error rate estimate.
Abstract
A training set of data has been used to construct a rule for predicting future responses. What is the error rate of this rule? This is an important question both for comparing models and for assessing a final selected model. The traditional answer to this question is given by cross-validation. The cross-validation estimate of prediction error is nearly unbiased but can be highly variable. Here we discuss bootstrap estimates of prediction error, which can be thought of as smoothed versions of cross-validation. We show that a particular bootstrap method, the .632+ rule, substantially outperforms cross-validation in a catalog of 24 simulation experiments. Besides providing point estimates, we also consider estimating the variability of an error rate estimate. All of the results here are nonparametric and apply to any possible prediction rule; however, we study only classification problems with 0–1 loss in detail. Our simulations include “smooth” prediction rules like Fisher's linear discriminant fun...

read more

Citations
More filters
Journal ArticleDOI

Differential expression analysis for sequence count data.

Simon Anders, +1 more
- 27 Oct 2010 - 
TL;DR: A method based on the negative binomial distribution, with variance and mean linked by local regression, is proposed and an implementation, DESeq, as an R/Bioconductor package is presented.
BookDOI

Regression Modeling Strategies

TL;DR: Regression models are frequently used to develop diagnostic, prognostic, and health resource utilization models in clinical, health services, outcomes, pharmacoeconomic, and epidemiologic research, and in a multitude of non-health-related areas.
Book

Applied Predictive Modeling

Max Kuhn, +1 more
TL;DR: This research presents a novel and scalable approach called “Smartfitting” that automates the very labor-intensive and therefore time-heavy and therefore expensive and expensive process of designing and implementing statistical models for regression models.
Journal ArticleDOI

An introduction to kernel-based learning algorithms

TL;DR: This paper provides an introduction to support vector machines, kernel Fisher discriminant analysis, and kernel principal component analysis, as examples for successful kernel-based learning methods.
References
More filters
Book

An introduction to the bootstrap

TL;DR: This article presents bootstrap methods for estimation, using simple arguments, with Minitab macros for implementing these methods, as well as some examples of how these methods could be used for estimation purposes.
Journal ArticleDOI

Bagging predictors

Leo Breiman
TL;DR: Tests on real and simulated data sets using classification and regression trees and subset selection in linear regression show that bagging can give substantial gains in accuracy.
Book

Classification and regression trees

Leo Breiman
TL;DR: The methodology used to construct tree structured rules is the focus of a monograph as mentioned in this paper, covering the use of trees as a data analysis method, and in a more mathematical framework, proving some of their fundamental properties.
Journal ArticleDOI

Bootstrap Methods: Another Look at the Jackknife

TL;DR: In this article, the authors discuss the problem of estimating the sampling distribution of a pre-specified random variable R(X, F) on the basis of the observed data x.