Cross-validation failure: Small sample sizes lead to large error bars.
TLDR
In this article, the authors raise awareness of the error bars of cross-validation, which are often underestimated, and propose solutions for increasing sample size while tackling possible increases in the heterogeneity of the data.

About
This article was published in NeuroImage on 2017-06-24 and is currently open access. It has received 408 citations to date. The article focuses on the topics: Sample size determination & Cross-validation.
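The size of the problem can be illustrated with a short back-of-the-envelope sketch (not taken from the article): modelling each test-set prediction as an independent Bernoulli trial gives a lower bound on the error bar of an accuracy estimate. The accuracy 0.70 and the 1.96 normal quantile below are illustrative choices, not values from the paper.

```python
import math

def accuracy_half_width(p, n, z=1.96):
    """Approximate 95% CI half-width for an accuracy estimate computed
    from n test predictions, each modelled as a Bernoulli(p) trial.
    Real cross-validation error bars are typically wider still, because
    folds share training data and are not independent."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (30, 100, 1000):
    print(f"n={n:4d}: accuracy 0.70 +/- {accuracy_half_width(0.70, n):.3f}")
```

Even under this optimistic independence assumption, 30 test samples give a half-width above 0.16, and around 100 samples the error bar is still on the order of ±0.09 — the same order of magnitude as the underestimation the article warns about.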
Citations
Journal Article
Consolidation alters motor sequence-specific distributed representations.
Basile Pinsard, Arnaud Boutin, Ella Gabitov, Ovidiu Lungu, Habib Benali, Julien Doyon +6 more
TL;DR: It is shown, for the first time in humans, that complementary sequence-specific motor representations evolve distinctively during critical phases of skill acquisition and consolidation.
Journal Article
Machine learning algorithm validation with a limited sample size
TL;DR: The authors' simulations show that K-fold cross-validation (CV) produces strongly biased performance estimates with small sample sizes, and the bias is still evident at a sample size of 1000, while nested CV and train/test split approaches produce robust and unbiased performance estimates regardless of sample size.
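The motivation for nested CV — keeping model selection away from the data used for the final estimate — can be shown with a small pure-Python simulation (a sketch of the general phenomenon, not a reproduction of the cited study): on pure-noise labels, picking the best of several chance-level "models" on a validation set inflates the winner's score, while a fresh test set reveals it as chance.

```python
import random
import statistics

random.seed(0)

def simulate(n_samples, n_models, n_sims=200):
    """On pure-noise binary labels, select the best of n_models random
    predictors by validation accuracy, then score the winner on a fresh
    test set. Selection inflates the validation estimate above chance."""
    val_best, test_of_best = [], []
    for _ in range(n_sims):
        y_val = [random.randint(0, 1) for _ in range(n_samples)]
        y_test = [random.randint(0, 1) for _ in range(n_samples)]
        # each "model" is a fixed random labelling -- chance-level by design
        models = [[random.randint(0, 1) for _ in range(n_samples)]
                  for _ in range(n_models)]
        accs = [sum(p == t for p, t in zip(m, y_val)) / n_samples
                for m in models]
        best = max(range(n_models), key=lambda i: accs[i])
        val_best.append(accs[best])
        # the winner has no real skill, so on fresh data it falls back to ~0.5
        test_of_best.append(
            sum(p == t for p, t in zip(models[best], y_test)) / n_samples)
    return statistics.mean(val_best), statistics.mean(test_of_best)

val, test = simulate(n_samples=30, n_models=20)
print(f"selected-model accuracy on selection data: {val:.2f}")  # well above 0.5
print(f"same model on held-out data:               {test:.2f}")  # near chance
```

The gap between the two numbers is exactly the optimistic bias that nesting (or a strictly held-out test set) removes.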
Journal Article
Reproducible brain-wide association studies require thousands of individuals
Scott Marek, Brenden Tervo-Clemmens, Finnegan J. Calabro, David F. Montez, Benjamin P Kay, Alexander S. Hatoum, Meghan Rose Donohue, Will Foran, Ryland L. Miller, Timothy Hendrickson, Stephen M. Malone, Sridhar Kandala, Eric Feczko, Oscar Miranda-Dominguez, Alice M. Graham, Eric Earl, Anders Perrone, Michaela Cordova, Olivia Doyle, Lucille A. Moore, Gregory Mark Conan, Johnny Uriarte, Katherine Allene Snider, Benjamin J. Lynch, James C. Wilgenbusch, Thomas Pengo, Angela Tam, Jianzhong Chen, Dillan J. Newbold, Annie Zheng, Nicole A Seider, Andrew N. Van, Athanasia Metoki, Roselyne Chauvin, Timothy O. Laumann, Deanna J. Greene, Steven E. Petersen, Hugh Garavan, Wesley K. Thompson, Thomas E. Nichols, B.T. Thomas Yeo, Deanna M. Barch, Beatriz Luna, Damien A. Fair, Nico U.F. Dosenbach +44 more
TL;DR: In this article, the authors used three of the largest neuroimaging datasets currently available, with a total sample size of around 50,000 individuals, to quantify brain-wide association study effect sizes and reproducibility as a function of sample size.
Journal Article
Machine Learning for Precision Psychiatry: Opportunities and Challenges.
TL;DR: This primer aims to introduce clinicians and researchers to the opportunities and challenges in bringing machine intelligence into psychiatric practice.
References
Journal Article
Statistical Comparisons of Classifiers over Multiple Data Sets
TL;DR: A set of simple, yet safe and robust non-parametric tests for statistical comparisons of classifiers is recommended: the Wilcoxon signed ranks test for comparison of two classifiers and the Friedman test with the corresponding post-hoc tests for comparisons of more classifiers over multiple data sets.
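The recommended pairwise comparison can be sketched in a few stdlib-only lines. The function below computes the Wilcoxon signed-rank statistic for paired per-dataset scores; it is an illustrative sketch, not a validated routine — it drops zero differences, averages tied ranks, and omits the p-value lookup (use `scipy.stats.wilcoxon` in practice).

```python
def wilcoxon_signed_rank(scores_a, scores_b):
    """Wilcoxon signed-rank statistic for paired per-dataset scores of
    two classifiers. Returns the smaller of the positive/negative rank
    sums; small values suggest a systematic difference."""
    diffs = [a - b for a, b in zip(scores_a, scores_b) if a != b]
    ranked = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(ranked):
        # find the run of ties in |d| starting at position i
        j = i
        while (j + 1 < len(ranked)
               and abs(diffs[ranked[j + 1]]) == abs(diffs[ranked[i]])):
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tied run
        for k in range(i, j + 1):
            ranks[ranked[k]] = avg
        i = j + 1
    r_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    r_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return min(r_plus, r_minus)
```

For example, accuracies (in %) of two classifiers on four datasets, [90, 80, 70, 85] versus [85, 75, 72, 80], give rank sums 9 and 1, so the statistic is 1.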
Journal Article
The file drawer problem and tolerance for null results
TL;DR: Quantitative procedures for computing the tolerance for filed and future null results are reported and illustrated, and the implications are discussed.
Journal Article
Power failure: why small sample size undermines the reliability of neuroscience
Katherine S. Button, John P. A. Ioannidis, Claire Mokrysz, Brian A. Nosek, Jonathan Flint, Emma S J Robinson, Marcus R. Munafò +6 more
TL;DR: It is shown that the average statistical power of studies in the neurosciences is very low, and the consequences include overestimates of effect size and low reproducibility of results.
Journal Article
Why Most Published Research Findings Are False
TL;DR: In this paper, the authors discuss the implications of these problems for the conduct and interpretation of research and conclude that the probability that a research claim is true may depend on study power and bias, the number of other studies on the same question, and the ratio of true to no relationships among the relationships probed in each scientific field.