Journal ArticleDOI

Reliability, repeatability and reproducibility: analysis of measurement errors in continuous variables

01 Apr 2008-Ultrasound in Obstetrics & Gynecology (Wiley)-Vol. 31, Iss: 4, pp 466-475
TL;DR: This paper distinguishes the general concepts of agreement and reliability to help researchers decide which is relevant to their particular application, and highlights that reliability depends on the population in which measurements are made, not just on the measurement errors of the measurement method.
Abstract: Clinical practice involves measuring quantities for a variety of purposes, such as aiding diagnosis, predicting future patient outcomes, and serving as endpoints in studies or randomized trials. Measurements are almost always prone to various sorts of errors, which cause the measured value to differ from the true value; accordingly, studies investigating measurement error frequently appear in this and other journals. The importance of measurement error depends upon the context in which the measurements in question are to be used. For example, a certain degree of measurement error may be acceptable if measurements are to be used as an outcome in a comparative study such as a clinical trial, but the same measurement errors may be unacceptably large to make measurements usable in individual patient management, such as screening or risk prediction. In the past 20 years many papers have been published advocating how studies of measurement error should be analyzed, with a paper by Bland and Altman¹ being one of the most cited and well-known examples. There has been much controversy concerning the choice of parameter to be estimated and reported, and consequently confusion surrounding the meaning and interpretation of results from studies investigating measurement error. In this paper we first distinguish between the general concepts of agreement and reliability to aid researchers in considering which are relevant for their particular application. We then review the statistical methods that can be used to investigate and quantify agreement and reliability, dealing separately with the different types of measurement error study, while emphasizing the largely common techniques that should be used for data analysis. We reiterate that the judgment of whether agreement or reliability is acceptable must be related to the clinical application, and cannot be proven by a statistical test. 
We highlight the fact that reliability depends on the population in which measurements are made, and not just on the measurement errors of the measurement method. We discuss the advantages of method comparison studies making at least two measurements with each measurement method on each subject. A key advantage is that the cause of a correlation between paired differences and means in the so-called Bland–Altman plot can be determined, in contrast to when only a single measurement is made with each method. Throughout the paper, we try to emphasize that calculated values of agreement and reliability from measurement error studies are estimates of parameters, and as such we should report such estimates with CIs to indicate the uncertainty with which they have been estimated. We restrict our attention to measurements of a continuous quantity; alternative methods are required for categorical data².
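The point that reliability depends on the population can be illustrated with a small sketch. It assumes the common definition of reliability as an intraclass correlation coefficient (ICC), the ratio of between-subject variance to total variance; the function name and the numbers are hypothetical and are not taken from the paper.

```python
def icc(between_subject_sd: float, within_subject_sd: float) -> float:
    """Reliability as the share of total variance due to true
    between-subject differences (assumed ICC definition)."""
    between_var = between_subject_sd ** 2
    error_var = within_subject_sd ** 2
    return between_var / (between_var + error_var)


# Identical measurement error (SD = 2 units) in two hypothetical populations.
heterogeneous = icc(between_subject_sd=10.0, within_subject_sd=2.0)
homogeneous = icc(between_subject_sd=2.0, within_subject_sd=2.0)

print(f"{heterogeneous:.3f}  {homogeneous:.3f}")  # prints: 0.962  0.500
```

The measurement method is the same in both cases; only the spread of true values between subjects differs, and that alone moves the reliability from about 0.96 to 0.50.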
Citations
Journal ArticleDOI
TL;DR: The results suggest that SPM12 TIV estimates are an acceptable substitute for labour-intensive manual estimates even in the challenging context of multiple centres and the presence of neurodegenerative pathology.

388 citations


Cites background from "Reliability, repeatability and repr..."

  • ...This cannot be disambiguated without repeated measurements (Bartlett and Frost, 2008; Dunn and Roberts, 1999)....


Journal ArticleDOI
TL;DR: This review focuses on the lack of comprehensive information about the factors influencing the use of IRT in humans, and proposes a comprehensive classification in three primary groups: environmental, individual and technical factors.

366 citations

Book ChapterDOI
TL;DR: Luminescent ratiometric thermometers combining high spatial and temporal resolution at the micro- and nanoscale, where the conventional methods are ineffective, have emerged over the last decade as an effervescent field of research, essentially motivated by their potential applications in nanotechnology, photonics, and biomedicine as discussed by the authors.
Abstract: Luminescent ratiometric thermometers combining high spatial and temporal resolution at the micro- and nanoscale, where the conventional methods are ineffective, have emerged over the last decade as an effervescent field of research, essentially motivated by their potential applications in nanotechnology, photonics, and biomedicine. Among the distinct luminescent thermal probes, lanthanide-based materials play a central role in the field due to their unique thermometric response and intriguing emission features (e.g., high quantum yield, narrow bandwidth, long-lived emission, large Stokes shifts, and ligand-dependent luminescence sensitization). This chapter offers a general overview of recent examples of single- and dual-center Ln³⁺-based thermometers, emphasizing those working at the nanometric scale, and focuses on how to quantify their performance according to the relevant parameters: relative sensitivity, temperature uncertainty, spatial and temporal resolution, repeatability (or test–retest reliability), and reproducibility. The emission mechanisms supporting single- and dual-center emissions are reviewed, together with the advantages and limitations of each approach. Illustrative examples of the rich variety of systems designed and developed to sense temperature are provided and explored. Finally, we discuss the challenges and opportunities in the development of highly sensitive luminescent ratiometric thermometers that scientists in this exciting research field currently face.

330 citations

Journal ArticleDOI
TL;DR: This paper presents a helpful tool for readers who want to evaluate or assess the quality of a measurement instrument on reliability and validity using standardised criteria that were recently published by the COSMIN group.
Abstract: High quality instruments are useful tools for clinical and research purposes. To determine whether an instrument has high quality, measurement properties such as reliability and validity need to be assessed, using standardised criteria. This paper discusses these quality domains and measurement properties using the standardised criteria that were recently published by the COSMIN group. Examples are given of studies evaluating the measurement properties of instruments frequently used in trauma. This paper presents a helpful tool for readers who want to evaluate or assess the quality of a measurement instrument on reliability and validity.

207 citations

Journal ArticleDOI
TL;DR: The simulation of such realistic conditions does not only require thorough reporting of variability between observers and techniques; it also requires a sufficiently high number of observers.
Abstract: We believe that our readers are interested in investigations describing physicians’ performance of specific techniques under reasonably realistic conditions. The simulation of such realistic conditions, however, does not only require thorough reporting of variability between observers and techniques; it also requires a sufficiently high number of observers.

153 citations

References
Journal ArticleDOI
TL;DR: An alternative approach, based on graphical techniques and simple calculations, is described, together with the relation between this analysis and the assessment of repeatability.

43,884 citations

Book
01 Jan 1993
TL;DR: This article presents bootstrap methods for estimation, using simple arguments, with Minitab macros for implementing these methods, as well as some examples of how these methods could be used for estimation purposes.
Abstract: This article presents bootstrap methods for estimation, using simple arguments. Minitab macros for implementing these methods are given.

37,183 citations
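
As a rough illustration of bootstrap estimation, the sketch below computes a percentile bootstrap confidence interval for a statistic. It is a minimal example written for this page, not the Minitab macros mentioned in the entry above; the function name, its parameters, and the data are all hypothetical.

```python
import random
from statistics import mean


def bootstrap_ci(data, statistic=mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample with replacement,
    recompute the statistic, and read off the empirical quantiles."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    estimates = sorted(
        statistic(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical paired differences between two measurement methods.
differences = [1.0, 0.0, 1.0, -1.0, 2.0, 0.0, 1.0, -1.0]
ci_low, ci_high = bootstrap_ci(differences)
print(f"mean = {mean(differences):.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

The percentile method is the simplest bootstrap interval; other variants exist and may behave better in small samples.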

Journal Article

17,468 citations

Journal ArticleDOI
TL;DR: The 95% limits of agreement, estimated by mean difference ± 1.96 standard deviations of the differences, provide an interval within which 95% of differences between measurements by the two methods are expected to lie.
Abstract: Agreement between two methods of clinical measurement can be quantified using the differences between observations made using the two methods on the same subjects. The 95% limits of agreement, estimated by mean difference +/- 1.96 standard deviation of the differences, provide an interval within which 95% of differences between measurements by the two methods are expected to lie. We describe how graphical methods can be used to investigate the assumptions of the method and we also give confidence intervals. We extend the basic approach to data where there is a relationship between difference and magnitude, both with a simple logarithmic transformation approach and a new, more general, regression approach. We discuss the importance of the repeatability of each method separately and compare an estimate of this to the limits of agreement. We extend the limits of agreement approach to data with repeated measurements, proposing new estimates for equal numbers of replicates by each method on each subject, for unequal numbers of replicates, and for replicated data collected in pairs, where the underlying value of the quantity being measured is changing. Finally, we describe a nonparametric approach to comparing methods.

7,976 citations
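
The 95% limits of agreement described in the entry above can be sketched in a few lines. This assumes one measurement per method per subject and roughly normally distributed differences; the function name and the data are hypothetical.

```python
from statistics import mean, stdev


def limits_of_agreement(method_a, method_b):
    """Return (lower limit, mean difference, upper limit) for paired
    measurements, using mean difference +/- 1.96 SD of the differences."""
    if len(method_a) != len(method_b):
        raise ValueError("both methods must measure the same subjects")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return bias - 1.96 * sd, bias, bias + 1.96 * sd


# Hypothetical measurements of four subjects by two methods.
method_a = [10.0, 12.0, 14.0, 16.0]
method_b = [9.0, 12.0, 13.0, 17.0]
lower, bias, upper = limits_of_agreement(method_a, method_b)
print(f"bias = {bias:.2f}, limits of agreement = ({lower:.2f}, {upper:.2f})")
```

With so few subjects the limits themselves are very uncertain, which is why the abstract above also gives confidence intervals for them.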