Journal Article

Measures of reliability in sports medicine and science.

01 Jul 2000-Sports Medicine (Springer International Publishing)-Vol. 30, Iss: 1, pp 1-15
TL;DR: A wider understanding of reliability and adoption of the typical error as the standard measure of reliability would improve the assessment of tests and equipment in the authors' disciplines.
Abstract: Reliability refers to the reproducibility of values of a test, assay or other measurement in repeated trials on the same individuals. Better reliability implies better precision of single measurements and better tracking of changes in measurements in research or practical settings. The main measures of reliability are within-subject random variation, systematic change in the mean, and retest correlation. A simple, adaptable form of within-subject variation is the typical (standard) error of measurement: the standard deviation of an individual’s repeated measurements. For many measurements in sports medicine and science, the typical error is best expressed as a coefficient of variation (percentage of the mean). A biased, more limited form of within-subject variation is the limits of agreement: the 95% likely range of change of an individual’s measurements between 2 trials. Systematic changes in the mean of a measure between consecutive trials represent such effects as learning, motivation or fatigue; these changes need to be eliminated from estimates of within-subject variation. Retest correlation is difficult to interpret, mainly because its value is sensitive to the heterogeneity of the sample of participants. Uses of reliability include decision-making when monitoring individuals, comparison of tests or equipment, estimation of sample size in experiments and estimation of the magnitude of individual differences in the response to a treatment. Reasonable precision for estimates of reliability requires approximately 50 study participants and at least 3 trials. Studies aimed at assessing variation in reliability between tests or equipment require complex designs and analyses that researchers seldom perform correctly. A wider understanding of reliability and adoption of the typical error as the standard measure of reliability would improve the assessment of tests and equipment in our disciplines.
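
The quantities named in the abstract can be sketched for the simplest case of two trials. This is a minimal Python illustration under stated assumptions, not the article's own procedure. It assumes one score per participant per trial, takes the typical error as the standard deviation of the difference scores divided by √2, uses the large-sample multiplier 1.96 for the limits of agreement, and estimates the coefficient of variation from log-transformed scores (one common approach). The function names and the scores are hypothetical.

```python
import math
import statistics


def two_trial_reliability(trial1, trial2):
    """Summarise test-retest data from two trials on the same individuals.

    Both arguments are equal-length sequences of raw scores (hypothetical
    layout: one entry per participant, same order in both trials).
    """
    diffs = [second - first for first, second in zip(trial1, trial2)]
    change_in_mean = statistics.mean(diffs)   # systematic change in the mean
    sd_diff = statistics.stdev(diffs)         # SD of the difference scores
    typical_error = sd_diff / math.sqrt(2)    # within-subject SD
    # Assumption: 1.96 is the large-sample multiplier; a small sample
    # would call for a t value instead.
    half_width = 1.96 * sd_diff
    return {
        "change_in_mean": change_in_mean,
        "typical_error": typical_error,
        "limits_of_agreement": (change_in_mean - half_width,
                                change_in_mean + half_width),
    }


def typical_error_cv(trial1, trial2):
    """Typical error as a percentage (CV), estimated on log-transformed scores."""
    log_diffs = [math.log(second) - math.log(first)
                 for first, second in zip(trial1, trial2)]
    log_typical_error = statistics.stdev(log_diffs) / math.sqrt(2)
    return 100 * (math.exp(log_typical_error) - 1)


if __name__ == "__main__":
    # Made-up scores for three participants, for illustration only.
    first = [10.0, 20.0, 30.0]
    second = [11.0, 23.0, 32.0]
    print(two_trial_reliability(first, second))
    print(typical_error_cv(first, second))
```

Because the change in the mean is subtracted out when the standard deviation of the differences is computed, the typical error in this sketch is not inflated by a uniform learning or fatigue effect. The limits of agreement are centred on the change in the mean.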


Citations
Journal Article
TL;DR: A practical guideline for clinical researchers to choose the correct form of ICC is provided and the best practice of reporting ICC parameters in scientific publications is suggested.

12,717 citations

Journal Article
TL;DR: A more progressive resource for sample-based studies, meta-analyses, and case studies in sports medicine and exercise science is presented, and forthright advice on controversial or novel issues is offered.
Abstract: Statistical guidelines and expert statements are now available to assist in the analysis and reporting of studies in some biomedical disciplines. We present here a more progressive resource for sample-based studies, meta-analyses, and case studies in sports medicine and exercise science. We offer forthright advice on the following controversial or novel issues: using precision of estimation for inferences about population effects in preference to null-hypothesis testing, which is inadequate for assessing clinical or practical importance; justifying sample size via acceptable precision or confidence for clinical decisions rather than via adequate power for statistical significance; showing SD rather than SEM, to better communicate the magnitude of differences in means and nonuniformity of error; avoiding purely nonparametric analyses, which cannot provide inferences about magnitude and are unnecessary; using regression statistics in validity studies, in preference to the impractical and biased limits of agreement; making greater use of qualitative methods to enrich sample-based quantitative projects; and seeking ethics approval for public access to the depersonalized raw data of a study, to address the need for more scrutiny of research and better meta-analyses. Advice on less contentious issues includes the following: using covariates in linear models to adjust for confounders, to account for individual differences, and to identify potential mechanisms of an effect; using log transformation to deal with nonuniformity of effects and error; identifying and deleting outliers; presenting descriptive, effect, and inferential statistics in appropriate formats; and contending with bias arising from problems with sampling, assignment, blinding, measurement error, and researchers' prejudices. This article should advance the field by stimulating debate, promoting innovative approaches, and serving as a useful checklist for authors, reviewers, and editors.

6,467 citations


Cites background from "Measures of reliability in sports m..."

  • ...Arguments have also been presented against the use of limits of agreement as a measure of reliability (13)....

    [...]

Journal Article
TL;DR: In this review, the basics of classic reliability theory are addressed in the context of choosing and interpreting an ICC and how the SEM and its variants can be used to construct confidence intervals for individual scores and to determine the minimal difference needed to be exhibited for one to be confident that a true change in performance of an individual has occurred.
Abstract: Reliability, the consistency of a test or measurement, is frequently quantified in the movement sciences literature. A common metric is the intraclass correlation coefficient (ICC). In addition, the SEM, which can be calculated from the ICC, is also frequently reported in reliability studies. However, there are several versions of the ICC, and confusion exists in the movement sciences regarding which ICC to use. Further, the utility of the SEM is not fully appreciated. In this review, the basics of classic reliability theory are addressed in the context of choosing and interpreting an ICC. The primary distinction between ICC equations is argued to be one concerning the inclusion (equations 2,1 and 2,k) or exclusion (equations 3,1 and 3,k) of systematic error in the denominator of the ICC equation. Inferential tests of mean differences, which are performed in the process of deriving the necessary variance components for the calculation of ICC values, are useful to determine if systematic error is present. If so, the measurement schedule should be modified (removing trials where learning and/or fatigue effects are present) to remove systematic error, and ICC equations that only consider random error may be safely used. The use of ICC values is discussed in the context of estimating the effects of measurement error on sample size, statistical power, and correlation attenuation. Finally, calculation and application of the SEM are discussed. It is shown how the SEM and its variants can be used to construct confidence intervals for individual scores and to determine the minimal difference needed to be exhibited for one to be confident that a true change in performance of an individual has occurred.
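
The distinction drawn in this abstract, between ICC forms that include systematic error in the denominator (2,1) and forms that exclude it (3,1), can be illustrated with a short sketch. It assumes a complete subjects-by-trials table and uses the single-measure formulas usually attributed to Shrout and Fleiss, built from the between-subjects, between-trials and residual mean squares of a two-way ANOVA. The function names and the ratings are illustrative only.

```python
def two_way_mean_squares(scores):
    """Mean squares from a subjects x trials table with no missing cells.

    `scores` is a list of rows, one row per subject, one column per trial.
    Returns (bms, jms, ems): between-subjects, between-trials and residual
    mean squares.
    """
    n = len(scores)       # number of subjects
    k = len(scores[0])    # number of trials (or judges)
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_trials = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_subjects - ss_trials

    bms = ss_subjects / (n - 1)
    jms = ss_trials / (k - 1)
    ems = ss_error / ((n - 1) * (k - 1))
    return bms, jms, ems


def icc_2_1(scores):
    """ICC(2,1): systematic differences between trials count as error."""
    n, k = len(scores), len(scores[0])
    bms, jms, ems = two_way_mean_squares(scores)
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


def icc_3_1(scores):
    """ICC(3,1): only random (residual) error appears in the denominator."""
    k = len(scores[0])
    bms, _, ems = two_way_mean_squares(scores)
    return (bms - ems) / (bms + (k - 1) * ems)


if __name__ == "__main__":
    # Illustrative table: 6 subjects (rows) by 4 trials (columns).
    ratings = [
        [9, 2, 5, 8],
        [6, 1, 3, 2],
        [8, 4, 6, 8],
        [7, 1, 2, 6],
        [10, 5, 6, 9],
        [6, 2, 4, 7],
    ]
    print(round(icc_2_1(ratings), 2), round(icc_3_1(ratings), 2))
```

In this table the column means differ markedly, so the form that treats systematic error as error comes out much lower than the form that ignores it.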

3,992 citations


Cites background or methods from "Measures of reliability in sports m..."

  • ...Hopkins (26) argues that because the 1-way model combines influences of random and systematic error together, ‘‘The resulting statistic is biased high and is hard to interpret because the relative contributions of random error and changes in the mean are unknown....

    [...]

  • ...As it turns out, when there are 2 levels of trials (as in the examples herein), the SEM is equal to the SDd divided by √2 (17, 26):...

    [...]

  • ...The SEM can be estimated as the square root of the mean square error term from the ANOVA (20, 26, 48)....

    [...]

  • ...Hopkins (26) refers to this as the ‘‘typical error....

    [...]

  • ...For physical performance measures, it is common that the absolute error tends to be larger for subjects who score higher (2, 26), e....

    [...]
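
The two routes to the SEM mentioned in these excerpts (the square root of the ANOVA error mean square, and the standard deviation of the difference scores divided by √2) can be checked against each other numerically when there are exactly two trials. The sketch below is a hedged illustration with made-up scores and hypothetical function names. The minimal-difference helper uses the commonly quoted form SEM × 1.96 × √2, which is an assumption here and is not stated in the excerpts.

```python
import math
import statistics


def sem_from_differences(trial1, trial2):
    """SEM for two trials: SD of the difference scores divided by sqrt(2)."""
    diffs = [second - first for first, second in zip(trial1, trial2)]
    return statistics.stdev(diffs) / math.sqrt(2)


def sem_from_anova(scores):
    """SEM as the square root of the residual mean square.

    `scores` is a list of rows, one row per subject, one column per trial,
    with no missing cells.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_trials = n * sum((m - grand) ** 2 for m in col_means)
    residual_ms = (ss_total - ss_subjects - ss_trials) / ((n - 1) * (k - 1))
    return math.sqrt(residual_ms)


def minimal_difference(sem, z=1.96):
    """Assumed form: smallest change in one individual unlikely to be noise."""
    return sem * z * math.sqrt(2)


if __name__ == "__main__":
    # Made-up scores: three subjects, two trials each.
    table = [[10.0, 11.0], [20.0, 23.0], [30.0, 32.0]]
    trial1 = [row[0] for row in table]
    trial2 = [row[1] for row in table]
    print(sem_from_differences(trial1, trial2), sem_from_anova(table))
    print(minimal_difference(sem_from_anova(table)))
```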

Journal Article
TL;DR: These commonly used gait performance tests are highly reliable and can be recommended to evaluate improvements in various aspects of gait performance in individuals with chronic mild to moderate hemiparesis after stroke.
Abstract: Objective: To assess the reliability of 6 gait performance tests in individuals with chronic mild to moderate post-stroke hemiparesis. Design: An intra-rater (between occasions) test-retest reliability study. Subjects: Fifty men and women (mean age 58 ± 6.4 years) 6–46 months post-stroke. Methods: The Timed “Up & Go” test, the Comfortable and the Fast Gait Speed tests, the Stair Climbing ascend and descend tests and the 6-Minute Walk test were assessed 7 days apart. Reliability was evaluated with the intraclass correlation coefficient (ICC 2,1), the Bland & Altman analysis, the standard error of measurement (SEM and SEM%) and the smallest real difference (SRD and SRD%). Results: Test-retest agreements were high (ICC2,1 0.94–0.99) with no discernible systematic differences between the tests. The standard error of measurement (SEM%), representing the smallest change that indicates a real (clinical) improvement for a group of individuals, was small (9%). The smallest real difference (SRD%), representing the smallest change that indicates a real (clinical) improvement for a single individual, was also small (13–23%). Conclusion: These commonly used gait performance tests are highly reliable and can be recommended to evaluate improvements in various aspects of gait performance in individuals with chronic mild to moderate hemiparesis after stroke.

1,001 citations


Cites background from "Measures of reliability in sports m..."

  • ...It has been recommended that the sample size of test-retest reliability studies should be at least 30, and preferably 50 (14, 42)....

    [...]

Journal Article
TL;DR: It can be concluded that CMJ and SJ, measured by means of contact mat and digital timer, are the most reliable and valid field tests for the estimation of explosive power of the lower limbs in physically active men.
Abstract: The primary aim of this study was to determine reliability and factorial validity of squat (SJ) and countermovement jump (CMJ) tests. The secondary aim was to compare 3 popular methods for the estimation of vertical jumping height. Physical education students (n = 93) performed 7 explosive power tests: 5 different vertical jumps (Sargent jump, Abalakow's jump with arm swing and without arm swing, SJ, and CMJ) and 2 horizontal jumps (standing long jump and standing triple jump). The greatest reliability among all jumping tests (Cronbach's alpha = 0.97 and 0.98) had SJ and CMJ. The reliability alpha coefficients for other jumps were also high and varied between 0.93 and 0.96. Within-subject variation (CV) in jumping tests ranged between 2.4 and 4.6%, the values being lowest in both horizontal jumps and CMJ. Factor analysis resulted in the extraction of only 1 significant principal component, which explained 66.43% of the variance of all 7 jumping tests. Since all jumping tests had high correlation coefficients with the principal component (r = 0.76-0.87), it was interpreted as the explosive power factor. The CMJ test showed the highest relationship with the explosive power factor (r = 0.87), that is, the greatest factorial validity. Other jumping tests had lower but relatively homogeneous correlation with the explosive power factor extracted. Based on the results of this study, it can be concluded that CMJ and SJ, measured by means of contact mat and digital timer, are the most reliable and valid field tests for the estimation of explosive power of the lower limbs in physically active men.

879 citations


Cites background or methods from "Measures of reliability in sports m..."

  • ...Within-subject variation for all tests was determined by calculating coefficient of variation (CV) as outlined by Hopkins (9)....

    [...]

  • ...According to Hopkins (9), reasonable precision for estimates of reliability requires approximately 50 study participants and at least 3 trials....

    [...]

References
Book
01 Dec 1969
TL;DR: The concepts of power analysis are discussed in this paper, where Chi-square Tests for Goodness of Fit and Contingency Tables, t-Test for Means, and Sign Test are used.
Abstract: Contents: Prefaces. The Concepts of Power Analysis. The t-Test for Means. The Significance of a Product Moment r_s. Differences Between Correlation Coefficients. The Test That a Proportion is .50 and the Sign Test. Differences Between Proportions. Chi-Square Tests for Goodness of Fit and Contingency Tables. The Analysis of Variance and Covariance. Multiple Regression and Correlation Analysis. Set Correlation and Multivariate Methods. Some Issues in Power Analysis. Computational Procedures.

115,069 citations


"Measures of reliability in sports m..." refers background in this paper

  • ...Cohen J. Statistical power analysis for the behavioral sciences....

    [...]

  • ...When interest centres on experiments involving the average person in a population, Cohen[13] argued that clinical judgement should be guided by the spread of raw scores (not change scores) in the population, and suggested that the smallest worthwhile value of d is 0.2 of the between-subject standard deviation....

    [...]
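
The threshold described in the excerpt above, 0.2 of the between-subject standard deviation of raw scores, amounts to a one-line calculation. The sketch below is illustrative only: the function name, the default fraction argument and the scores are assumptions introduced for the example.

```python
import statistics


def smallest_worthwhile_change(raw_scores, fraction=0.2):
    """Fraction (default 0.2) of the between-subject SD of raw scores.

    `raw_scores` holds one raw score per participant from a single trial,
    not change scores.
    """
    return fraction * statistics.stdev(raw_scores)


if __name__ == "__main__":
    # Made-up raw scores for three participants.
    print(smallest_worthwhile_change([10.0, 20.0, 30.0]))
```

Comparing this threshold with a test's typical error gives a rough sense of whether the test is precise enough to detect changes of that size.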

Journal Article
TL;DR: An alternative approach, based on graphical techniques and simple calculations, is described, together with the relation between this analysis and the assessment of repeatability.

43,884 citations


"Measures of reliability in sports m..." refers background or methods in this paper

  • ...Bland and Altman,[4] the researchers who devised this measure, realised that the difference scores between trials give a good indication of the reliability of the test....

    [...]

  • ...analysis of trials, a simple but equivalent method is to plot each participant’s difference score against the mean for the 2 trials.[4] If the residuals for one group of participants are clearly different from another, or if the residuals or difference scores show...

    [...]

Journal Article
TL;DR: In this article, the authors present guidelines for choosing among six different forms of the intraclass correlation for reliability studies in which n targets are rated by k judges, and the confidence intervals for each of the forms are reviewed.
Abstract: Reliability coefficients often take the form of intraclass correlation coefficients. In this article, guidelines are given for choosing among six different forms of the intraclass correlation for reliability studies in which n targets are rated by k judges. Relevant to the choice of the coefficient are the appropriate statistical model for the reliability and the application to be made of the reliability results. Confidence intervals for each of the forms are reviewed.

21,185 citations


"Measures of reliability in sports m..." refers background or methods in this paper

  • ...Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability....

    [...]

  • ...The appropriate correlation is the intraclass correlation ICC(2,1) of Shrout and Fleiss....

    [...]

  • ...For example, Kovaleski and co-workers[8] cited the classic Shrout and Fleiss paper on reliability[9] to support their claim that a clinically acceptable correlation was 0.75[8] or 0.80....

    [...]

  • ...[10] It turns out that Shrout and Fleiss[9] did not assess the utility of magnitudes of retest correlations....

    [...]

Journal Article
TL;DR: In this article, an alternative approach, based on graphical techniques and simple calculations, is described, together with the relation between this analysis and the assessment of repeatability, which is often used in clinical comparison of a new measurement technique with an established one.

9,160 citations

Trending Questions (3)
What are the measures of reliability?

The paper discusses three important measures of reliability: within-subject variation, change in the mean, and retest correlation.

What is reliability?

Reliability refers to the reproducibility of values in repeated trials. It is measured by within-subject random variation, systematic change in the mean, and retest correlation.