Journal ArticleDOI

At what sample size do correlations stabilize?

01 Oct 2013 - Journal of Research in Personality (Academic Press) - Vol. 47, Iss. 5, pp. 609-612
TL;DR: In this article, the authors used Monte Carlo simulations to determine the critical sample size at which the magnitude of a correlation can be expected to be stable; this point depends on the effect size, the width of the corridor of stability, and the requested confidence that the trajectory does not leave this corridor.
About: This article is published in the Journal of Research in Personality. The article was published on 2013-10-01 and is currently open access. It has received 1,302 citations to date. The article focuses on the topics: Sample size determination & Sample (statistics).

Summary (1 min read)

1. Introduction

  • Many researchers may have observed that the magnitude of a correlation is quite unstable in small samples, as the following empirical example demonstrates.
  • Multiple questionnaire scales were administered in an open online study (Schönbrodt & Gerstenberg, 2012; Study 3).
  • From a visual inspection, the trajectory of the correlation estimate did not stabilize until a sample size of around 150.
  • The data were not rearranged; the trajectory simply reflects the order in which participants entered the study (a minimal simulation of such a trajectory is sketched below).
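To make this instability concrete, such a drop-in trajectory can be reproduced with a few lines of R (the language the original analyses were run in). This is a minimal sketch, not the authors' code; the assumed population correlation of .30, the starting n of 10, and the maximum n of 300 are illustrative choices:

    # Sketch: trajectory of a cumulative correlation estimate
    set.seed(42)
    n_max <- 300
    rho   <- 0.30                                  # assumed population correlation
    x <- rnorm(n_max)
    y <- rho * x + sqrt(1 - rho^2) * rnorm(n_max)  # y correlates rho with x
    # cumulative estimate after each additional participant, from n = 10 on
    traj <- sapply(10:n_max, function(n) cor(x[1:n], y[1:n]))
    plot(10:n_max, traj, type = "l",
         xlab = "Sample size n", ylab = "Cumulative r")
    abline(h = rho, lty = 2)                       # true value for reference

Early in the trajectory the estimate swings widely and typically settles only after well over a hundred cases, mirroring the empirical example above.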

2. Definition and Operationalization of Stability

  • With these definitions, the answer to the research question can be formulated more precisely:
  • The authors are interested in the critical point of stability, POS_crit: the sample size from which the correlation estimate no longer leaves the corridor of stability (COS) of half-width w, with a confidence of 80% (90%, 95%). (The corridor computation is sketched below.)
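The corridor itself is straightforward to compute from the definition quoted in the references below: Fisher-transform ρ, subtract and add the half-width w, and back-transform the boundaries to the correlation metric. A minimal R sketch (the function name cos_bounds is mine, not the authors'):

    # Corridor of stability (COS): boundaries are atanh(rho) +/- w,
    # back-transformed to the r metric
    cos_bounds <- function(rho, w) {
      z <- atanh(rho)                    # Fisher r-to-z transform
      tanh(c(lower = z - w, upper = z + w))
    }
    cos_bounds(rho = 0.21, w = 0.10)     # roughly .11 and .30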

3. Method and Results

  • To explore the impact of these deviations on the POS_crit values, the authors used four real-world data sets provided by T. Micceri as marginal distributions and imposed the specified population correlations (Ruscio & Kaczetow, 2008).
  • The authors constrained their analysis to typical non-normal distributions found in psychology (i.e., some skewness and somewhat heavier tails).
  • Results for these variables were comparable to the Gaussian simulated data set.
  • In non-normal distributions the POS had a median increase of 1.7% compared to the normal case (i.e., on average the correlations stabilized slightly later), and 90% of the differences between the non-normal and normal POS were smaller than 6% (the general simulation logic is sketched after this list).
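The general Monte Carlo logic behind the POS_crit estimates can be sketched as follows. This is a simplified illustration for bivariate normal data, not the authors' exact procedure; the number of replications B and the bounds n_min/n_max are arbitrary choices:

    # Sketch: estimate POS_crit for one rho/w/confidence combination
    pos_crit <- function(rho = 0.21, w = 0.10, conf = 0.80,
                         B = 2000, n_min = 20, n_max = 1000) {
      bounds <- tanh(atanh(rho) + c(-w, w))    # COS in the r metric
      first_stable <- replicate(B, {
        x <- rnorm(n_max)
        y <- rho * x + sqrt(1 - rho^2) * rnorm(n_max)
        r_n <- sapply(n_min:n_max, function(n) cor(x[1:n], y[1:n]))
        out <- which(r_n < bounds[1] | r_n > bounds[2])
        # first n after the trajectory's last exit from the corridor
        if (length(out) == 0) n_min else n_min + max(out)
      })
      unname(quantile(first_stable, conf))     # conf% of runs are stable from here
    }
    # pos_crit()   # slow, but returns a value in the low hundreds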

4. Discussion

  • If Table 1 is to be boiled down to simple answers, one can ask what effect size can typically be expected in personality research.
  • In a meta-meta-analysis summarizing 322 meta-analyses with more than 25,000 published studies in the field of personality and social psychology, Richard, Bond, and Stokes-Zoota (2003) report that the average published effect is r = .21, less than 25% of all meta-analytic effect sizes are greater than .30, and only 5.28% of all effects are greater than .50.
  • Further, assume that a confidence level of 80% is requested (a level typically used for statistical power analyses) and that only fluctuations up to a small effect size (w = .10) are considered acceptable.
  • Of course, what counts as a meaningful or expected correlation can vary with the research context and question; a larger expected effect size, for example, would reduce the necessary sample size (see the back-of-the-envelope check below).
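As a rough sanity check on these numbers (not the paper's sequential criterion): the standard error of a Fisher-transformed correlation is approximately 1/sqrt(n - 3), so a single estimate falls within ±w of ρ with 80% confidence once n ≈ (z.90/w)² + 3:

    # Point-wise approximation: n such that the two-sided 80% interval
    # in Fisher z units has half-width w = .10
    w   <- 0.10
    z80 <- qnorm(0.90)        # 1.2816
    ceiling((z80 / w)^2 + 3)  # 168

The sequential requirement that the whole trajectory stay inside the corridor is stricter, which is why the simulated POS values are larger, approaching 250 in typical scenarios.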


Citations
Journal ArticleDOI
08 May 2019
Abstract: Effect sizes are underappreciated and often misinterpreted—the most common mistakes being to describe them in ways that are uninformative (e.g., using arbitrary standards) or misleading (e.g., squa...

1,292 citations


Cites background from "At what sample size do correlations..."

  • ...Schönbrodt and Perugini (2013) ran a series of Monte Carlo simulations that led them to conclude that “in typical scenarios sample size should approach 250 for stable estimates” (p. 609)....

Journal ArticleDOI
TL;DR: The R package rmcorr is introduced, and its use for inferential statistics and visualization is demonstrated with two example datasets illustrating research questions at different levels of analysis, intra-individual and inter-individual.
Abstract: Repeated measures correlation (rmcorr) is a statistical technique for determining the common within-individual association for paired measures assessed on two or more occasions for multiple individuals. Simple regression/correlation is often applied to non-independent observations or aggregated data; this may produce biased, specious results due to violation of independence and/or differing patterns between-participants versus within-participants. Unlike simple regression/correlation, rmcorr does not violate the assumption of independence of observations. Also, rmcorr tends to have much greater statistical power because neither averaging nor aggregation is necessary for an intra-individual research question. Rmcorr estimates the common regression slope, the association shared among individuals. To make rmcorr accessible, we provide background information for its assumptions and equations, visualization, power, and tradeoffs with rmcorr compared to multilevel modeling. We introduce the R package (rmcorr) and demonstrate its use for inferential statistics and visualization with two example datasets. The examples are used to illustrate research questions at different levels of analysis, intra-individual, and inter-individual. Rmcorr is well-suited for research questions regarding the common linear association in paired repeated measures data. All results are fully reproducible.

1,135 citations
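For readers who want to try the technique, a minimal usage sketch follows. It assumes the bland1995 example data set that, to my knowledge, ships with the package; treat the exact column names as assumptions:

    # Sketch: repeated measures correlation with the rmcorr package
    # install.packages("rmcorr")
    library(rmcorr)
    # bland1995: repeated PaCO2/pH measurements for several subjects
    fit <- rmcorr(participant = Subject, measure1 = PaCO2, measure2 = pH,
                  dataset = bland1995)
    print(fit)   # common within-individual r with df, p value, and CI
    plot(fit)    # one fitted line per participant, sharing a common slope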

01 Jan 2016
Introduction to Robust Estimation and Hypothesis Testing (book).

968 citations

Journal ArticleDOI
TL;DR: The very reason such tasks produce robust and easily replicable experimental effects – low between-participant variability – makes their use as correlational tools problematic, and it is demonstrated that taking reliability estimates into account has the potential to qualitatively change theoretical conclusions.
Abstract: Individual differences in cognitive paradigms are increasingly employed to relate cognition to brain structure, chemistry, and function. However, such efforts are often unfruitful, even with the most well established tasks. Here we offer an explanation for failures in the application of robust cognitive paradigms to the study of individual differences. Experimental effects become well established – and thus those tasks become popular – when between-subject variability is low. However, low between-subject variability causes low reliability for individual differences, destroying replicable correlations with other factors and potentially undermining published conclusions drawn from correlational relationships. Though these statistical issues have a long history in psychology, they are widely overlooked in cognitive psychology and neuroscience today. In three studies, we assessed test-retest reliability of seven classic tasks: Eriksen Flanker, Stroop, stop-signal, go/no-go, Posner cueing, Navon, and Spatial-Numerical Association of Response Code (SNARC). Reliabilities ranged from 0 to .82, being surprisingly low for most tasks given their common use. As we predicted, this emerged from low variance between individuals rather than high measurement variance. In other words, the very reason such tasks produce robust and easily replicable experimental effects – low between-participant variability – makes their use as correlational tools problematic. We demonstrate that taking such reliability estimates into account has the potential to qualitatively change theoretical conclusions. The implications of our findings are that well-established approaches in experimental psychology and neuropsychology may not directly translate to the study of individual differences in brain structure, chemistry, and function, and alternative metrics may be required.

869 citations


Cites background from "At what sample size do correlations..."

  • ...Given that correlations begin to stabilize with around 150 observations (Schönbrodt & Perugini, 2013), our confidence in the reliability of any specific task will depend on collecting larger test-retest data sets....

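The attenuation at work here is the classical Spearman formula: the expected observed correlation is the true correlation shrunk by the square root of the product of the two measures' reliabilities. A quick illustration with hypothetical numbers chosen from the range of reliabilities reported above:

    # Spearman attenuation: unreliability shrinks observed correlations
    r_true <- 0.50                      # hypothetical true association
    rxx <- 0.40; ryy <- 0.70            # hypothetical test-retest reliabilities
    r_obs <- r_true * sqrt(rxx * ryy)   # expected observed correlation, ~.26
    r_obs / sqrt(rxx * ryy)             # disattenuation recovers r_true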

Journal ArticleDOI
TL;DR: In this article, the authors used three of the largest neuroimaging datasets currently available, with a total sample size of around 50,000 individuals, to quantify brain-wide association study effect sizes and reproducibility as a function of sample size.
Abstract: Magnetic resonance imaging (MRI) has transformed our understanding of the human brain through well-replicated mapping of abilities to specific structures (for example, lesion studies) and functions1-3 (for example, task functional MRI (fMRI)). Mental health research and care have yet to realize similar advances from MRI. A primary challenge has been replicating associations between inter-individual differences in brain structure or function and complex cognitive or mental health phenotypes (brain-wide association studies (BWAS)). Such BWAS have typically relied on sample sizes appropriate for classical brain mapping4 (the median neuroimaging study sample size is about 25), but potentially too small for capturing reproducible brain-behavioural phenotype associations5,6. Here we used three of the largest neuroimaging datasets currently available-with a total sample size of around 50,000 individuals-to quantify BWAS effect sizes and reproducibility as a function of sample size. BWAS associations were smaller than previously thought, resulting in statistically underpowered studies, inflated effect sizes and replication failures at typical sample sizes. As sample sizes grew into the thousands, replication rates began to improve and effect size inflation decreased. More robust BWAS effects were detected for functional MRI (versus structural), cognitive tests (versus mental health questionnaires) and multivariate methods (versus univariate). Smaller than expected brain-phenotype associations and variability across population subsamples can explain widespread BWAS replication failures. In contrast to non-BWAS approaches with larger effects (for example, lesions, interventions and within-person), BWAS reproducibility requires samples with thousands of individuals.

611 citations
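The inflation mechanism this abstract describes is easy to demonstrate: when a small true effect is estimated at typical sample sizes and only significant results survive, the surviving estimates are biased upward. A toy simulation (the true r of .10 is illustrative; n = 25 is the median study size quoted above):

    # Winner's curse at small n: condition on statistical significance
    set.seed(1)
    rho <- 0.10; n <- 25
    r_hat <- replicate(5000, {
      x <- rnorm(n)
      y <- rho * x + sqrt(1 - rho^2) * rnorm(n)
      cor(x, y)
    })
    t_stat <- r_hat * sqrt((n - 2) / (1 - r_hat^2))
    p <- 2 * pt(-abs(t_stat), df = n - 2)
    mean(r_hat[p < .05 & r_hat > 0])   # well above .4, versus a true value of .10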

References
Journal Article
TL;DR: R: A Language and Environment for Statistical Computing, the reference manual of the R Core Team (R Foundation for Statistical Computing, Vienna, Austria).

272,030 citations

Book
01 Dec 1969
TL;DR: The concepts of statistical power analysis are presented, with power and sample-size procedures for standard tests including the t-test for means, the significance of a product-moment correlation, differences between proportions, chi-square tests for goodness of fit and contingency tables, the analysis of variance, and multiple regression.
Abstract: Contents: Prefaces. The Concepts of Power Analysis. The t-Test for Means. The Significance of a Product Moment rs (subscript s). Differences Between Correlation Coefficients. The Test That a Proportion is .50 and the Sign Test. Differences Between Proportions. Chi-Square Tests for Goodness of Fit and Contingency Tables. The Analysis of Variance and Covariance. Multiple Regression and Correlation Analysis. Set Correlation and Multivariate Methods. Some Issues in Power Analysis. Computational Procedures.

115,069 citations


"At what sample size do correlations..." refers background or methods in this paper

  • ...As the confidence interval around correlations partly depends on the magnitude of the correlation, the corridor is defined in units of q, an effect size measure for correlations that only depends on sample size (Cohen, 1988). For that purpose, ρ is Fisher-r-to-Z-transformed and the desired width of the corridor, w, is both subtracted from and added to that value. Therefore, w denotes the half-width of the COS. These upper and lower boundaries then are back-transformed to a correlation metric. The desired width of the corridor depends on the specific research context (see Figure 1 for a COS with w = .10). In this paper, three widths are used: ± .10, ± .15, and ± .20. Following the rules of thumb proposed by Cohen (1992), a value of ....


Journal ArticleDOI
Jacob Cohen1
TL;DR: A convenient, although not comprehensive, presentation of required sample sizes is provided: the sample sizes necessary for .80 power to detect operationally defined small, medium, and large effects are tabled for eight standard statistical tests.
Abstract: One possible reason for the continued neglect of statistical power analysis in research in the behavioral sciences is the inaccessibility of or difficulty with the standard material. A convenient, although not comprehensive, presentation of required sample sizes is provided here. Effect-size indexes and conventional values for these are given for operationally defined small, medium, and large effects. The sample sizes necessary for .80 power to detect effects at these levels are tabled for eight standard statistical tests: (a) the difference between independent means, (b) the significance of a product-moment correlation, (c) the difference between independent rs, (d) the sign test, (e) the difference between independent proportions, (f) chi-square tests for goodness of fit and contingency tables, (g) one-way analysis of variance, and (h) the significance of a multiple or multiple partial correlation.

38,291 citations


"At what sample size do correlations..." refers methods in this paper

  • ...Following the rules of thumb proposed by Cohen (1992), a value of .10 for w corresponds to a small effect size....

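Cohen's tables translate directly into modern tooling. For instance, the sample size needed for .80 power to detect the average published personality effect of r = .21 (from the Discussion above) can be obtained with the pwr package, a standard R implementation of Cohen's methods:

    # Sample size for .80 power to detect r = .21, two-sided alpha = .05
    # install.packages("pwr")
    library(pwr)
    pwr.r.test(r = 0.21, sig.level = 0.05, power = 0.80)
    # n comes out near 175 -- smaller than the stability-based ~250,
    # because power to reject r = 0 is a different question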

Book
01 Jan 1990
TL;DR: This book presents methods of psychometric meta-analysis, including the meta-analysis of correlations corrected for artifacts individually or via artifact distributions, meta-analysis of d values, and the treatment of second-order sampling error and related issues.
Abstract: PART ONE: INTRODUCTION TO META-ANALYSIS Integrating Research Findings Across Studies Study Artifacts and Their Impact on Study Outcomes PART TWO: META-ANALYSIS OF CORRELATIONS Meta-Analysis of Correlations Corrected Individually for Artifacts Meta-Analysis of Correlations Using Artifact Distributions Technical Questions in Meta-Analysis of Correlations PART THREE: META-ANALYSIS OF EXPERIMENTAL EFFECTS AND OTHER DICHOTOMOUS COMPARISONS Treatment Effects Experimental Artifacts and Their Impact Meta-Analysis Methods for d Values Technical Questions in Meta-Analysis of d Values PART FOUR: GENERAL ISSUES IN META-ANALYSIS Second Order Sampling Error and Related Issues Cumulation of Findings within Studies Methods of Integrating Findings Across Studies Locating, Selecting, and Evaluating Studies General Criticisms of Meta-Analysis Summary of Psychometric Meta-Analysis

4,673 citations

Book
07 Apr 1997
TL;DR: This book lays a foundation for robust statistical methods, covering estimation of measures of location and scale, confidence intervals in the one-sample case, comparisons of two and more groups, correlation and tests of independence, and robust regression.
Abstract: Preface 1 Introduction 2 A Foundation for Robust Methods 3 Estimating Measures of Location and Scale 4 Confidence Intervals in the One-Sample Case 5 Comparing Two Groups 6 Some Multivariate Methods 7 One-Way and Higher Designs for Independent Groups 8 Comparing Multiple Dependent Groups 9 Correlation and Tests of Independence 10 Robust Regression 11 More Regression Methods

1,836 citations

Frequently Asked Questions (1)
Q1. What contributions have the authors mentioned in the paper "At what sample size do correlations stabilize?" ?

In this report, the authors use Monte Carlo simulations to determine the critical sample size at which the magnitude of a correlation can be expected to be stable.

Trending Questions (1)
What is the optimal sample size for a correlational study?

According to the paper, sample sizes should approach approximately 250 for stable correlation estimates in typical scenarios; the exact value depends on the expected effect size, the acceptable fluctuation width, and the requested confidence.