Alexander A. Aarts
Bio: Alexander A. Aarts is an academic researcher who has contributed to research on the topics Replication (statistics) and Reproducibility Project. The author has an h-index of 1 and has co-authored 1 publication, which has received 4564 citations.
Alexander A. Aarts, Joanna E. Anderson (1), Christopher J. Anderson (2), Peter Raymond Attridge (3), +287 more • Institutions (116)
TL;DR: A large-scale assessment suggests that experimental reproducibility in psychology leaves a lot to be desired, and correlational tests suggest that replication success was better predicted by the strength of original evidence than by characteristics of the original and replication teams.
Abstract: Reproducibility is a defining feature of science, but the extent to which it characterizes current research is unknown. We conducted replications of 100 experimental and correlational studies published in three psychology journals using high-powered designs and original materials when available. Replication effects were half the magnitude of original effects, representing a substantial decline. Ninety-seven percent of original studies had statistically significant results. Thirty-six percent of replications had statistically significant results; 47% of original effect sizes were in the 95% confidence interval of the replication effect size; 39% of effects were subjectively rated to have replicated the original result; and if no bias in original results is assumed, combining original and replication results left 68% with statistically significant effects. Correlational tests suggest that replication success was better predicted by the strength of original evidence than by characteristics of the original and replication teams.
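One of the criteria in the abstract above asks whether the original effect size falls inside the 95% confidence interval of the replication effect size. The following is a minimal sketch of such a check, assuming effect sizes are expressed as correlation coefficients and the interval is built with the Fisher z-transformation. The function names and the numbers in the usage note are hypothetical illustrations and are not taken from the study.

```python
import math


def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a correlation r estimated from n observations.

    Uses the Fisher z-transformation: z = atanh(r), SE = 1 / sqrt(n - 3).
    Assumes -1 < r < 1 and n > 3.
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    # Transform the interval bounds back to the correlation scale.
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def original_in_replication_ci(r_orig, r_rep, n_rep):
    """Return True if the original effect lies inside the replication's 95% CI."""
    lo, hi = fisher_ci(r_rep, n_rep)
    return lo <= r_orig <= hi
```

With a hypothetical replication of r = 0.20 from n = 100 participants, the interval runs from just above 0 to roughly 0.38, so an original effect of r = 0.35 would count as covered while r = 0.50 would not.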
01 Jun 1992
University of Southern California (1); Duke University (2); Stockholm School of Economics (3); Center for Open Science (4); University of Virginia (5); University of Amsterdam (6); University of Pennsylvania (7); University of North Carolina at Chapel Hill (8); University of Regensburg (9); California Institute of Technology (10); Research Institute of Industrial Economics (11); New York University (12); Cardiff University (13); Northwestern University (14); Mathematica Policy Research (15); Ohio State University (16); University of Sussex (17); Texas A&M University (18); Royal Holloway, University of London (19); University of Zurich (20); University of Melbourne (21); University of Wisconsin-Madison (22); University of Michigan (23); Stanford University (24); Rutgers University (25); Columbia University (26); University of Washington (27); University of Edinburgh (28); National University of Singapore (29); Utrecht University (30); Arizona State University (31); Princeton University (32); University of California, Los Angeles (33); Imperial College London (34); University of Innsbruck (35); Harvard University (36); University of Chicago (37); University of Pittsburgh (38); University of Notre Dame (39); University of California, Berkeley (40); Johns Hopkins University (41); University of Bristol (42); University of New South Wales (43); Dartmouth College (44); Whitman College (45); University of Puerto Rico (46); University of Milan (47); University of California, Irvine (48); Paris Dauphine University (49); University of British Columbia (50); Ludwig Maximilian University of Munich (51); Purdue University (52); Washington University in St. Louis (53); University of California, Davis (54); Microsoft (55)
01 Jan 2018 - Nature Human Behaviour
TL;DR: The authors propose changing the default P-value threshold for statistical significance from 0.05 to 0.005 for claims of new discoveries.
Abstract: We propose to change the default P-value threshold for statistical significance from 0.05 to 0.005 for claims of new discoveries.
TL;DR: This article proposes changing the default P-value threshold for statistical significance for claims of new discoveries from 0.05 to 0.005.
Abstract: We propose to change the default P-value threshold for statistical significance for claims of new discoveries from 0.05 to 0.005.
Ghent University (1); Forschungszentrum Jülich (2); Åbo Akademi University (3); Aalto University (4); Vienna University of Technology (5); Duke University (6); University of Grenoble (7); École Polytechnique Fédérale de Lausanne (8); Durham University (9); International School for Advanced Studies (10); Max Planck Society (11); Uppsala University (12); Humboldt University of Berlin (13); Fritz Haber Institute of the Max Planck Society (14); Technical University of Denmark (15); National Institute of Standards and Technology (16); University of Udine (17); Université catholique de Louvain (18); University of Basel (19); Harvard University (20); University of California, Davis (21); Rutgers University (22); University of York (23); Wake Forest University (24); Science and Technology Facilities Council (25); University of Oxford (26); University of Vienna (27); Leibniz Institute for Neurobiology (28); Dresden University of Technology (29); Radboud University Nijmegen (30); University of Tokyo (31); Centre national de la recherche scientifique (32); University of Cambridge (33); Royal Holloway, University of London (34); University of California, Santa Barbara (35); University of Luxembourg (36); Los Alamos National Laboratory (37); Harbin Institute of Technology (38)
TL;DR: A procedure to assess the precision of DFT methods was devised and used to demonstrate reproducibility among many of the most widely used DFT codes, showing that the precision of DFT implementations can be determined even in the absence of one absolute reference code.
Abstract: The widespread popularity of density functional theory has given rise to an extensive range of dedicated codes for predicting molecular and crystalline properties. However, each code implements the formalism in a different way, raising questions about the reproducibility of such predictions. We report the results of a community-wide effort that compared 15 solid-state codes, using 40 different potentials or basis set types, to assess the quality of the Perdew-Burke-Ernzerhof equations of state for 71 elemental crystals. We conclude that predictions from recent codes and pseudopotentials agree very well, with pairwise differences that are comparable to those between different high-precision experiments. Older methods, however, have less precise agreement. Our benchmark provides a framework for users and developers to document the precision of new applications and methodological improvements.