Topic

Cross-validation

About: Cross-validation is a research topic. Over its lifetime, 4,625 publications have been published on this topic, receiving 167,508 citations.


Papers
Proceedings Article
Ron Kohavi
20 Aug 1995
TL;DR: The results indicate that, for real-world datasets similar to the authors', the best method to use for model selection is ten-fold stratified cross-validation, even if computation power allows using more folds.
Abstract: We review accuracy estimation methods and compare the two most common methods: cross-validation and bootstrap. Recent experimental results on artificial data and theoretical results in restricted settings have shown that for selecting a good classifier from a set of classifiers (model selection), ten-fold cross-validation may be better than the more expensive leave-one-out cross-validation. We report on a large-scale experiment--over half a million runs of C4.5 and a Naive-Bayes algorithm--to estimate the effects of different parameters on these algorithms on real-world datasets. For cross-validation, we vary the number of folds and whether the folds are stratified or not; for bootstrap, we vary the number of bootstrap samples. Our results indicate that for real-world datasets similar to ours, the best method to use for model selection is ten-fold stratified cross-validation, even if computation power allows using more folds.

11,185 citations
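
The ten-fold stratified procedure named in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the paper's experimental setup: the helper names (`stratified_folds`, `cv_accuracy`), the toy nearest-centroid classifier, and the data are all assumptions made here for the example, whereas the paper itself used C4.5 and Naive-Bayes.

```python
from collections import defaultdict

def stratified_folds(labels, k=10):
    """Split indices into k folds, dealing each class round-robin so that
    every fold keeps roughly the class proportions of the full dataset."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    slot = 0  # continue dealing where the previous class stopped
    for y in sorted(by_class):
        for i in by_class[y]:
            folds[slot % k].append(i)
            slot += 1
    return folds

def cv_accuracy(xs, labels, fit, predict, k=10):
    """Estimate accuracy: train on k-1 folds, test on the held-out fold."""
    correct = 0
    for held_out in stratified_folds(labels, k):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        model = fit([xs[i] for i in train], [labels[i] for i in train])
        correct += sum(predict(model, xs[i]) == labels[i] for i in held_out)
    return correct / len(labels)

# Hypothetical toy classifier: assign the class whose training mean is nearest.
def fit_centroids(xs, ys):
    sums = defaultdict(lambda: [0.0, 0])
    for x, y in zip(xs, ys):
        sums[y][0] += x
        sums[y][1] += 1
    return {y: total / count for y, (total, count) in sums.items()}

def predict_centroid(model, x):
    return min(model, key=lambda y: abs(model[y] - x))

# Made-up, well-separated one-dimensional data with two classes.
xs = [i * 0.1 for i in range(20)] + [10 + i * 0.1 for i in range(20)]
labels = ["a"] * 20 + ["b"] * 20
accuracy = cv_accuracy(xs, labels, fit_centroids, predict_centroid, k=10)
print(accuracy)
```

Dealing indices class by class is only one simple way to stratify; shuffling within each class first would be a natural refinement when the data arrive in a meaningful order.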

Journal Article
TL;DR: This paper reviewed the nonparametric estimation of statistical error, mainly the bias and standard error of an estimator, or the error rate of a prediction rule, at a relaxed mathematical level, omitting most proofs, regularity conditions and technical details.
Abstract: This is an invited expository article for The American Statistician. It reviews the nonparametric estimation of statistical error, mainly the bias and standard error of an estimator, or the error rate of a prediction rule. The presentation is written at a relaxed mathematical level, omitting most proofs, regularity conditions, and technical details.

3,146 citations

Journal Article
Svante Wold
TL;DR: This article considers the estimation of the rank A of the matrix Y, i.e., the estimation of how much of the data y_ik is signal and how much is noise.
Abstract: By means of factor analysis (FA) or principal components analysis (PCA), a matrix Y with the elements y_ik is approximated by model (I). Here the parameters α, β and θ express the systematic part of the data y_ik, "signal," and the residuals ε_ik express the "random" part, "noise." When applying FA or PCA to a matrix of real data obtained, for example, by characterizing N chemical mixtures by M measured variables, one major problem is the estimation of the rank A of the matrix Y, i.e., the estimation of how much of the data y_ik is "signal" and how much is "noise." Cross-validation can be used to approach this problem. The matrix Y is partitioned and the rank A is determined so as to maximize the predictive properties of model (I) when the parameters are estimated on one part of the matrix Y and the prediction tested on another part of the matrix Y.

2,468 citations

Journal Article
TL;DR: In this paper, a prediction rule is constructed on the basis of some data, and then the error rate of this rule is estimated in classifying future observations using cross-validation.
Abstract: We construct a prediction rule on the basis of some data, and then wish to estimate the error rate of this rule in classifying future observations. Cross-validation provides a nearly unbiased estimate, using only the original data. Cross-validation turns out to be related closely to the bootstrap estimate of the error rate. This article has two purposes: to understand better the theoretical basis of the prediction problem, and to investigate some related estimators, which seem to offer considerably improved estimation in small samples.

2,331 citations
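
To illustrate why the abstract above describes cross-validation as nearly unbiased using only the original data, the sketch below compares the apparent (resubstitution) error of a rule with its leave-one-out cross-validation error. The 1-nearest-neighbor rule, the function names, and the six data points are assumptions chosen for illustration; they are not taken from the article.

```python
def nearest_neighbor_label(train_x, train_y, x):
    """Predict the label of the closest training point (first one on ties)."""
    best = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[best]

def apparent_error(xs, ys):
    """Error rate when the rule is tested on the same data it was built from."""
    wrong = sum(nearest_neighbor_label(xs, ys, x) != y for x, y in zip(xs, ys))
    return wrong / len(xs)

def leave_one_out_error(xs, ys):
    """Error rate when each point is predicted from all the other points."""
    wrong = 0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        wrong += nearest_neighbor_label(rest_x, rest_y, xs[i]) != ys[i]
    return wrong / len(xs)

# Made-up data: the two classes overlap in the middle of the range.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0, 0, 1, 0, 1, 1]

print(apparent_error(xs, ys))       # each point is its own nearest neighbor
print(leave_one_out_error(xs, ys))  # the three middle points are misclassified
```

On this toy data the apparent error is 0 while the leave-one-out error is 0.5, which shows how testing on the training data can badly understate the error on future observations.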

Journal Article
TL;DR: A form of k-fold cross-validation for evaluating prediction success is proposed for presence/available resource selection function (RSF) models, which involves calculating the correlation between RSF ranks and area-adjusted frequencies for a withheld sub-sample of data.

2,107 citations
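
The correlation step mentioned in the TL;DR above might look roughly like the sketch below for a single withheld fold. It assumes that the RSF scores have already been grouped into ranked bins, that the area-adjusted frequency of a bin is the share of withheld used locations divided by the share of available area in that bin, and that a Spearman rank correlation is used. The function names and the counts are hypothetical.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks

def pearson(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def spearman(a, b):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(a), average_ranks(b))

def area_adjusted_frequencies(used_counts, available_counts):
    """Share of used locations per bin divided by share of available area."""
    total_used = sum(used_counts)
    total_available = sum(available_counts)
    return [(u / total_used) / (a / total_available)
            for u, a in zip(used_counts, available_counts)]

# Hypothetical withheld fold: five RSF bins, from lowest (1) to highest (5) score.
bin_ranks = [1, 2, 3, 4, 5]
used_counts = [1, 3, 6, 10, 20]            # withheld used locations per bin
available_counts = [20, 20, 20, 20, 20]    # available units per bin
frequencies = area_adjusted_frequencies(used_counts, available_counts)
rho = spearman(bin_ranks, frequencies)
print(rho)
```

A correlation near 1 would suggest that higher-ranked bins are used disproportionately often in the withheld data; repeating the calculation for each of the k folds gives a set of correlations to summarize.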


Network Information
Related Topics (5)
Artificial neural network
207K papers, 4.5M citations
81% related
Cluster analysis
146.5K papers, 2.9M citations
81% related
Deep learning
79.8K papers, 2.1M citations
79% related
Convolutional neural network
74.7K papers, 2M citations
78% related
Feature extraction
111.8K papers, 2.1M citations
78% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    102
2022    231
2021    313
2020    357
2019    348
2018    289