A survey of cross-validation procedures for model selection

doi:10.1214/09-SS054

Journal Article•DOI•

Sylvain Arlot¹, Alain Celisse•Institutions (1)

École Normale Supérieure¹

27 Jul 2009-arXiv: Statistics Theory

TL;DR: In this paper, a survey on the model selection performances of cross-validation procedures is presented, with a particular emphasis on distinguishing empirical statements from rigorous theoretical results, and guidelines are provided for choosing the best cross-validation procedure according to the particular features of the problem in hand.

Abstract: Used to estimate the risk of an estimator or to perform model selection, cross-validation is a widespread strategy because of its simplicity and its apparent universality. Many results exist on the model selection performances of cross-validation procedures. This survey intends to relate these results to the most recent advances of model selection theory, with a particular emphasis on distinguishing empirical statements from rigorous theoretical results. As a conclusion, guidelines are provided for choosing the best cross-validation procedure according to the particular features of the problem in hand.
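As a rough illustration of the two uses named in the abstract (risk estimation and model selection), the following is a minimal sketch of V-fold cross-validation. The k-nearest-neighbour regressor, the synthetic sine data, and all function names are assumptions made for this example and do not come from the survey.

```python
import math
import random

def vfold_indices(n, v, seed=0):
    """Shuffle range(n) and split it into v disjoint folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::v] for i in range(v)]

def knn_predict(train, x, k):
    """Average the responses of the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def cv_risk(data, k, v=5, seed=0):
    """V-fold estimate of the squared-error risk of k-NN regression."""
    total = 0.0
    for fold in vfold_indices(len(data), v, seed):
        held_out = set(fold)
        train = [p for i, p in enumerate(data) if i not in held_out]
        total += sum((data[i][1] - knn_predict(train, data[i][0], k)) ** 2
                     for i in fold)
    return total / len(data)

def select_k(data, candidates, v=5, seed=0):
    """Model selection: return the candidate k with the smallest CV risk."""
    return min(candidates, key=lambda k: cv_risk(data, k, v, seed))

# Synthetic data for illustration only: a noisy sine curve.
rng = random.Random(1)
data = [(x, math.sin(x) + rng.gauss(0, 0.3))
        for x in (rng.uniform(0, 6) for _ in range(100))]
risks = {k: cv_risk(data, k) for k in (1, 5, 15, 50)}
best_k = select_k(data, list(risks))
```

The same folds are reused for every candidate `k`, so the comparison between candidates is not blurred by different random splits.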

Citations

Journal Article•DOI•

Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC

Aki Vehtari¹, Andrew Gelman², Jonah Gabry²•Institutions (2)

Helsinki Institute for Information Technology¹, Columbia University²

16 Jul 2015-arXiv: Computation

TL;DR: In this article, leave-one-out cross-validation (LOO) and the widely applicable information criterion (WAIC) are used to estimate pointwise out-of-sample prediction accuracy from a fitted Bayesian model using the log-likelihood evaluated at the posterior simulations of the parameter values.

Abstract: Leave-one-out cross-validation (LOO) and the widely applicable information criterion (WAIC) are methods for estimating pointwise out-of-sample prediction accuracy from a fitted Bayesian model using the log-likelihood evaluated at the posterior simulations of the parameter values. LOO and WAIC have various advantages over simpler estimates of predictive error such as AIC and DIC but are less used in practice because they involve additional computational steps. Here we lay out fast and stable computations for LOO and WAIC that can be performed using existing simulation draws. We introduce an efficient computation of LOO using Pareto-smoothed importance sampling (PSIS), a new procedure for regularizing importance weights. Although WAIC is asymptotically equal to LOO, we demonstrate that PSIS-LOO is more robust in the finite case with weak priors or influential observations. As a byproduct of our calculations, we also obtain approximate standard errors for estimated predictive errors and for comparing of predictive errors between two models. We implement the computations in an R package called 'loo' and demonstrate using models fit with the Bayesian inference package Stan.
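The quantities in this abstract can be sketched from a matrix of pointwise log-likelihoods, `log_lik[s][i]` for posterior draw `s` and observation `i`. The sketch below computes WAIC (on the expected-log-predictive-density scale) and a LOO estimate that uses raw importance ratios. It deliberately omits the Pareto smoothing step that defines PSIS, so it is not the procedure implemented in the 'loo' package; the function names are assumptions for this example.

```python
import math

def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))

def columns(log_lik):
    """Yield, for each observation, its log-likelihood across all draws."""
    for i in range(len(log_lik[0])):
        yield [row[i] for row in log_lik]

def lppd(log_lik):
    """Log pointwise predictive density, summed over observations."""
    return sum(log_mean_exp(col) for col in columns(log_lik))

def waic_elpd(log_lik):
    """lppd minus the per-observation posterior variance of the log-likelihood."""
    penalty = 0.0
    for col in columns(log_lik):
        mean = sum(col) / len(col)
        penalty += sum((v - mean) ** 2 for v in col) / (len(col) - 1)
    return lppd(log_lik) - penalty

def is_loo_elpd(log_lik):
    """LOO via raw importance ratios 1 / p(y_i | theta_s); no Pareto smoothing."""
    return sum(-log_mean_exp([-v for v in col]) for col in columns(log_lik))
```

Raw importance ratios can have very heavy tails, which is the instability that the smoothing step described in the abstract is meant to address.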

2,455 citations

Journal Article•DOI•

Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure

David R. Roberts¹, Volker Bahn², Simone Ciuti¹, Mark S. Boyce³, Jane Elith⁴, Gurutzeta Guillera-Arroita⁴, Severin Hauenstein¹, José J. Lahoz-Monfort⁴, Boris Schröder, Wilfried Thuiller, David I. Warton⁵, Brendan A. Wintle⁴, Florian Hartig¹,⁶, Carsten F. Dormann¹•Institutions (6)

University of Freiburg¹, Wright State University², University of Alberta³, University of Melbourne⁴, University of New South Wales⁵, University of Regensburg⁶

01 Aug 2017-Ecography

TL;DR: It is recommended that block cross-validation be used wherever dependence structures exist in a dataset, even if no correlation structure is visible in the fitted model residuals, or if the fitted models account for such correlations.

Abstract: Ecological data often show temporal, spatial, hierarchical (random effects), or phylogenetic structure. Modern statistical approaches are increasingly accounting for such dependencies. However, when performing cross-validation, these structures are regularly ignored, resulting in serious underestimation of predictive error. One cause for the poor performance of uncorrected (random) cross-validation, noted often by modellers, are dependence structures in the data that persist as dependence structures in model residuals, violating the assumption of independence. Even more concerning, because often overlooked, is that structured data also provides ample opportunity for overfitting with non-causal predictors. This problem can persist even if remedies such as autoregressive models, generalized least squares, or mixed models are used. Block cross-validation, where data are split strategically rather than randomly, can address these issues. However, the blocking strategy must be carefully considered. Blocking in space, time, random effects or phylogenetic distance, while accounting for dependencies in the data, may also unwittingly induce extrapolations by restricting the ranges or combinations of predictor variables available for model training, thus overestimating interpolation errors. On the other hand, deliberate blocking in predictor space may also improve error estimates when extrapolation is the modelling goal. Here, we review the ecological literature on non-random and blocked cross-validation approaches. We also provide a series of simulations and case studies, in which we show that, for all instances tested, block cross-validation is nearly universally more appropriate than random cross-validation if the goal is predicting to new data or predictor space, or for selecting causal predictors. We recommend that block cross-validation be used wherever dependence structures exist in a dataset, even if no correlation structure is visible in the fitted model residuals, or if the fitted models account for such correlations.
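The contrast between random and blocked splits can be sketched on a toy time series. This example assumes a Gaussian random walk as the dependent data and a 1-nearest-neighbour-in-time predictor; neither comes from the paper, and the function names are hypothetical. Random folds leave each held-out point with training neighbours one or two steps away, while contiguous blocks remove them.

```python
import random

def random_folds(n, v, seed=0):
    """Assign observations to v folds uniformly at random."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[j::v] for j in range(v)]

def block_folds(coords, v):
    """Sort observations by coordinate (time, space, ...) and cut v contiguous blocks."""
    order = sorted(range(len(coords)), key=lambda i: coords[i])
    n = len(order)
    return [order[j * n // v:(j + 1) * n // v] for j in range(v)]

def cv_error(times, values, folds):
    """Mean squared error of a 1-nearest-neighbour-in-time predictor."""
    total = 0.0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(len(times)) if i not in held_out]
        for i in fold:
            nearest = min(train, key=lambda j: abs(times[j] - times[i]))
            total += (values[i] - values[nearest]) ** 2
    return total / len(times)

# Synthetic, strongly autocorrelated series: a Gaussian random walk.
rng = random.Random(0)
times = list(range(200))
values, level = [], 0.0
for _ in times:
    level += rng.gauss(0, 1)
    values.append(level)

random_err = cv_error(times, values, random_folds(len(times), 5))
block_err = cv_error(times, values, block_folds(times, 5))
```

On data like these the random-split error is expected to be much smaller than the blocked error, which illustrates the underestimation of predictive error that the abstract describes for predictions far from the training data.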

998 citations

Journal Article•DOI•

Machine Learning for Medical Imaging.

Bradley J. Erickson¹, Panagiotis Korfiatis¹, Zeynettin Akkus¹, Timothy L. Kline¹•Institutions (1)

Mayo Clinic¹

17 Feb 2017-Radiographics

TL;DR: Deep learning has started to be used; this method has the benefit that it does not require image feature identification and calculation as a first step; rather, features are identified as part of the learning process.

Abstract: Machine learning is a technique for recognizing patterns that can be applied to medical images. Although it is a powerful tool that can help in rendering medical diagnoses, it can be misapplied. Machine learning typically begins with the machine learning algorithm system computing the image features that are believed to be of importance in making the prediction or diagnosis of interest. The machine learning algorithm system then identifies the best combination of these image features for classifying the image or computing some metric for the given image region. There are several methods that can be used, each with different strengths and weaknesses. There are open-source versions of most of these machine learning methods that make them easy to try and apply to images. Several metrics for measuring the performance of an algorithm exist; however, one must be aware of the possible associated pitfalls that can result in misleading metrics. More recently, deep learning has started to be used; this method has the benefit that it does not require image feature identification and calculation as a first step; rather, features are identified as part of the learning process. Machine learning has been used in medical imaging and will have a greater influence in the future. Those working in medical imaging must be aware of how machine learning works. ©RSNA, 2017.

870 citations

Journal Article•DOI•

Cross-validation pitfalls when selecting and assessing regression and classification models

Damjan Krstajic¹, Ljubomir Buturovic, David E. Leahy, Simon Thomas•Institutions (1)

University of Belgrade¹

29 Mar 2014-Journal of Cheminformatics

TL;DR: An algorithm for repeated grid-search V-fold cross-validation for parameter tuning in classification and regression, and a repeated nested cross-validation algorithm for model assessment are described and evaluated.

Abstract: We address the problem of selecting and assessing classification and regression models using cross-validation. Current state-of-the-art methods can yield models with high variance, rendering them unsuitable for a number of practical applications including QSAR. In this paper we describe and evaluate best practices which improve reliability and increase confidence in selected models. A key operational component of the proposed methods is cloud computing which enables routine use of previously infeasible approaches. We describe in detail an algorithm for repeated grid-search V-fold cross-validation for parameter tuning in classification and regression, and we define a repeated nested cross-validation algorithm for model assessment. As regards variable selection and parameter tuning we define two algorithms (repeated grid-search cross-validation and double cross-validation), and provide arguments for using the repeated grid-search in the general case. We show results of our algorithms on seven QSAR datasets. The variation of the prediction performance, which is the result of choosing different splits of the dataset in V-fold cross-validation, needs to be taken into account when selecting and assessing classification and regression models. We demonstrate the importance of repeating cross-validation when selecting an optimal model, as well as the importance of repeating nested cross-validation when assessing a prediction error.
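The two ideas in this abstract, repeating the grid search over several random splits and nesting the tuning inside an outer assessment loop, can be sketched as follows. This is an illustrative reading and not the paper's algorithm: the one-feature ridge model, the penalty grid, the synthetic data, and all function names are assumptions.

```python
import random

def folds(n, v, seed):
    """Shuffle range(n) and split it into v disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[j::v] for j in range(v)]

def fit_ridge(points, lam):
    """One-feature ridge regression without intercept: slope = Sxy / (Sxx + lam)."""
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    return sxy / (sxx + lam)

def sse(slope, points):
    """Sum of squared prediction errors of the fitted slope on the given points."""
    return sum((y - slope * x) ** 2 for x, y in points)

def cv_loss(points, lam, v, seed):
    """V-fold mean squared error for one penalty value and one random split."""
    total = 0.0
    for fold in folds(len(points), v, seed):
        held_out = set(fold)
        train = [p for i, p in enumerate(points) if i not in held_out]
        total += sse(fit_ridge(train, lam), [points[i] for i in fold])
    return total / len(points)

def repeated_grid_search(points, grid, v=5, repeats=10):
    """Average the V-fold loss over several splits, then pick the best penalty."""
    def average_loss(lam):
        return sum(cv_loss(points, lam, v, s) for s in range(repeats)) / repeats
    return min(grid, key=average_loss)

def nested_cv(points, grid, v_outer=5, v_inner=5, repeats=3, seed=0):
    """Outer folds assess the whole procedure; tuning only sees outer training data."""
    total = 0.0
    for fold in folds(len(points), v_outer, seed):
        held_out = set(fold)
        train = [p for i, p in enumerate(points) if i not in held_out]
        lam = repeated_grid_search(train, grid, v_inner, repeats)
        total += sse(fit_ridge(train, lam), [points[i] for i in fold])
    return total / len(points)

# Synthetic data for illustration: y = 2x + noise.
rng = random.Random(0)
points = [(x, 2 * x + rng.gauss(0, 0.5))
          for x in (rng.uniform(-1, 1) for _ in range(60))]
grid = [0.0, 0.1, 1.0, 10.0]
# Repeated nested CV: one assessment per outer split; their spread shows split variance.
assessments = [nested_cv(points, grid, seed=s) for s in range(3)]
```

The spread of `assessments` gives a direct view of the split-to-split variation that the abstract says must be taken into account.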

644 citations

Book•

Habitat Suitability and Distribution Models

Antoine Guisan¹, Wilfried Thuiller², Niklaus E. Zimmermann•Institutions (2)

University of Lausanne¹, University of Grenoble²

01 Sep 2017

TL;DR: In this book, the authors introduce the key stages of niche-based habitat suitability model building, evaluation and prediction required for understanding and predicting future patterns of species and biodiversity, including the main theory behind ecological niches and species distributions.

Abstract: This book introduces the key stages of niche-based habitat suitability model building, evaluation and prediction required for understanding and predicting future patterns of species and biodiversity. Beginning with the main theory behind ecological niches and species distributions, the book proceeds through all major steps of model building, from conceptualization and model training to model evaluation and spatio-temporal predictions. Extensive examples using R support graduate students and researchers in quantifying ecological niches and predicting species distributions with their own data, and help to address key environmental and conservation problems. Reflecting this highly active field of research, the book incorporates the latest developments from informatics and statistics, as well as using data from remote sources such as satellite imagery. A website at www.unil.ch/hsdm contains the codes and supporting material required to run the examples and teach courses. All three authors are recognized specialists of and have contributed substantially to the development of spatial prediction methods for species’ habitat suitability and distribution modeling. They published a large number of papers, overall cumulating tens of thousands of citations, and are ISI Highly Cited Researchers.

632 citations

References

Journal Article•DOI•

Regression Shrinkage and Selection via the Lasso

Robert Tibshirani

01 Jan 1996-Journal of the Royal Statistical Society, Series B (Methodological)

TL;DR: A new method for estimation in linear models called the lasso, which minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant, is proposed.

Abstract: SUMMARY We propose a new method for estimation in linear models. The 'lasso' minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant. Because of the nature of this constraint it tends to produce some coefficients that are exactly 0 and hence gives interpretable models. Our simulation studies suggest that the lasso enjoys some of the favourable properties of both subset selection and ridge regression. It produces interpretable models like subset selection and exhibits the stability of ridge regression. There is also an interesting relationship with recent work in adaptive function estimation by Donoho and Johnstone. The lasso idea is quite general and can be applied in a variety of statistical models: extensions to generalized regression models and tree-based models are briefly described.
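The tendency to produce coefficients that are exactly 0 can be seen in a small sketch. The abstract states the lasso as a constrained problem (residual sum of squares subject to a bound on the sum of absolute coefficients); the code below instead solves the equivalent penalized form, minimizing (1/2)·RSS + lam·Σ|β_j|, by cyclic coordinate descent with soft-thresholding. That solver is a substitution chosen for brevity and is not the algorithm used in the paper; the function names are assumptions.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, sweeps=200):
    """Minimize 0.5 * ||y - X beta||^2 + lam * sum(|beta_j|) by coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X beta, since beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Inner product of column j with the residual that excludes beta_j.
            rho = sum(X[i][j] * residual[i] for i in range(n)) + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new
    return beta

# Orthonormal toy design: the solution is the soft-thresholded least-squares fit.
coefficients = lasso_cd([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0)
```

With this orthonormal design the least-squares coefficients are 3.0 and 0.5, and a penalty of 1.0 shrinks the first to 2.0 and sets the second exactly to 0.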

40,785 citations

Journal Article•DOI•

Estimating the Dimension of a Model

Gideon Schwarz

01 Mar 1978-Annals of Statistics

TL;DR: In this paper, the problem of selecting one of a number of models of different dimensions is treated by finding its Bayes solution, and evaluating the leading terms of its asymptotic expansion.

Abstract: The problem of selecting one of a number of models of different dimensions is treated by finding its Bayes solution, and evaluating the leading terms of its asymptotic expansion. These terms are a valid large-sample criterion beyond the Bayesian context, since they do not depend on the a priori distribution.
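The large-sample criterion derived here is commonly used in the form BIC = k·ln(n) − 2·ln(L̂), where L̂ is the maximized likelihood, k the number of free parameters, and n the sample size; the model with the smallest value is preferred. A minimal sketch follows, assuming Gaussian linear models and made-up residual sums of squares chosen purely for illustration.

```python
import math

def bic(log_likelihood, k, n):
    """Schwarz criterion in its usual form: k * ln(n) - 2 * ln(L_hat); lower is better."""
    return k * math.log(n) - 2.0 * log_likelihood

def gaussian_log_likelihood(rss, n):
    """Maximized Gaussian log-likelihood, with the variance estimated as rss / n."""
    return -0.5 * n * (math.log(2.0 * math.pi * rss / n) + 1.0)

# Hypothetical fits on n = 50 points; k counts the coefficients plus the noise variance.
n = 50
bic_small = bic(gaussian_log_likelihood(12.0, n), k=3, n=n)
bic_large = bic(gaussian_log_likelihood(11.8, n), k=4, n=n)
preferred = "small" if bic_small < bic_large else "large"
```

In this made-up case the extra parameter reduces the residual sum of squares only slightly, so the ln(n) penalty outweighs the gain in fit and the smaller model is preferred.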

38,681 citations

Estimating the dimension of a model

Gideon Schwarz

01 Jan 2005

36,760 citations

Statistical learning theory

Vladimir Vapnik

01 Jan 1998

TL;DR: Presenting a method for determining the necessary and sufficient conditions for consistency of learning process, the author covers function estimates from small data pools, applying these estimations to real-life problems, and much more.

Abstract: A comprehensive look at learning and generalization theory. The statistical theory of learning and generalization concerns the problem of choosing desired functions on the basis of empirical data. Highly applicable to a variety of computer science and robotics fields, this book offers lucid coverage of the theory as a whole. Presenting a method for determining the necessary and sufficient conditions for consistency of learning process, the author covers function estimates from small data pools, applying these estimations to real-life problems, and much more.

26,531 citations

Proceedings Article•

Information Theory and an Extension of the Maximum Likelihood Principle

H. Akaike

01 Jan 1973

TL;DR: The classical maximum likelihood principle can be considered a method of asymptotic realization of an optimum estimate with respect to a very general information theoretic criterion; this observation extends the principle to provide answers to many practical problems of statistical model fitting.

Abstract: In this paper it is shown that the classical maximum likelihood principle can be considered to be a method of asymptotic realization of an optimum estimate with respect to a very general information theoretic criterion. This observation shows an extension of the principle to provide answers to many practical problems of statistical model fitting.

18,539 citations