Journal Article
Improvements on Cross-Validation: The .632+ Bootstrap Method
Bradley Efron, Robert Tibshirani
TL;DR: A particular bootstrap method, the .632+ rule, is shown to substantially outperform cross-validation in a catalog of 24 simulation experiments; the paper also considers estimating the variability of an error rate estimate.
Abstract:
A training set of data has been used to construct a rule for predicting future responses. What is the error rate of this rule? This is an important question both for comparing models and for assessing a final selected model. The traditional answer to this question is given by cross-validation. The cross-validation estimate of prediction error is nearly unbiased but can be highly variable. Here we discuss bootstrap estimates of prediction error, which can be thought of as smoothed versions of cross-validation. We show that a particular bootstrap method, the .632+ rule, substantially outperforms cross-validation in a catalog of 24 simulation experiments. Besides providing point estimates, we also consider estimating the variability of an error rate estimate. All of the results here are nonparametric and apply to any possible prediction rule; however, we study only classification problems with 0–1 loss in detail. Our simulations include “smooth” prediction rules like Fisher's linear discriminant fun...
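The abstract names the .632+ rule without stating it. The sketch below shows one common formulation for 0-1 loss. It is recalled from the body of the paper and from later textbook treatments, not from the text shown here, so treat the details as assumptions: the estimate blends the resubstitution error with the leave-one-out bootstrap error, using a weight driven by a relative overfitting rate measured against a no-information error rate. The names `fit_nearest_mean` and `err632plus`, the toy nearest-mean classifier, and the choice to cap the bootstrap error at the no-information rate before computing the overfitting rate are all illustrative choices, not the authors' code.

```python
import random


def fit_nearest_mean(xs, ys):
    """Toy 1-D rule (hypothetical stand-in for any prediction rule):
    predict the class whose training mean is closest to x."""
    means = {}
    for label in set(ys):
        vals = [x for x, y in zip(xs, ys) if y == label]
        means[label] = sum(vals) / len(vals)
    return lambda x: min(sorted(means), key=lambda c: abs(x - means[c]))


def err632plus(xs, ys, fit, n_boot=200, seed=0):
    """Sketch of a .632+ style error estimate for 0-1 loss.
    xs and ys are lists; fit(xs, ys) returns a function x -> predicted label."""
    rng = random.Random(seed)
    n = len(xs)

    # Resubstitution (apparent) error: train and test on the same points.
    rule = fit(xs, ys)
    preds = [rule(x) for x in xs]
    err_bar = sum(p != y for p, y in zip(preds, ys)) / n

    # Leave-one-out bootstrap error: each point is scored only by rules
    # trained on bootstrap samples that do not contain it.
    wrong, seen = [0] * n, [0] * n
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        in_bag = set(idx)
        boot_rule = fit([xs[i] for i in idx], [ys[i] for i in idx])
        for i in range(n):
            if i not in in_bag:
                seen[i] += 1
                wrong[i] += boot_rule(xs[i]) != ys[i]
    rates = [w / s for w, s in zip(wrong, seen) if s > 0]
    err1 = sum(rates) / len(rates)

    # No-information error rate: expected error if labels and predictions
    # were independent, from observed class and prediction frequencies.
    gamma = sum((ys.count(c) / n) * (1 - preds.count(c) / n) for c in set(ys))

    # Relative overfitting rate in [0, 1] (capping is an assumption here).
    err1_capped = min(err1, gamma)
    if err1_capped > err_bar and gamma > err_bar:
        r = (err1_capped - err_bar) / (gamma - err_bar)
    else:
        r = 0.0

    # Weight moves from .632 (no overfitting) toward 1 (severe overfitting).
    w = 0.632 / (1 - 0.368 * r)
    est = (1 - w) * err_bar + w * err1_capped
    return {"err_bar": err_bar, "err1": err1, "gamma": gamma,
            "r": r, "w": w, "err632plus": est}


# Usage on two overlapping synthetic classes (data are made up).
data_rng = random.Random(1)
xs = [data_rng.gauss(0.0, 1.0) for _ in range(20)]
xs += [data_rng.gauss(1.5, 1.0) for _ in range(20)]
ys = [0] * 20 + [1] * 20
result = err632plus(xs, ys, fit_nearest_mean)
print(result)
```

When the rule does not overfit, the weight stays at .632 and the estimate reduces to the plain .632 blend; as the bootstrap error approaches the no-information rate, the weight approaches 1 and the estimate leans on the leave-one-out bootstrap error alone.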