Measurement in Medicine: A Practical Guide

Book

H.C.W. (Riekie) de Vet, Caroline B. Terwee, Lidwine B. Mokkink, Dirk L. Knol

01 Aug 2011

TL;DR: This book covers the development of a measurement instrument, field testing (item reduction and data structure), and systematic reviews of measurement properties.

Abstract:
1. Introduction
2. Concepts, theories and models, and types of measurements
3. The development of a measurement instrument
4. Field testing - item reduction and data structure
5. Reliability
6. Validity
7. Responsiveness
8. Interpretation
9. Systematic reviews of measurement properties
Index

Citations

Journal Article

Rating the methodological quality in systematic reviews of studies on measurement properties: a scoring system for the COSMIN checklist

Caroline B. Terwee¹, Lidwine B. Mokkink¹, Dirk L. Knol¹, Raymond W. J. G. Ostelo², Lex M. Bouter², Henrica C.W. de Vet¹

VU University Medical Center¹, VU University Amsterdam²

01 May 2012-Quality of Life Research

TL;DR: The COSMIN checklist with the proposed scoring system seems to be a useful tool for assessing the methodological quality of studies included in systematic reviews of measurement properties.

Abstract: Background: The COSMIN checklist is a standardized tool for assessing the methodological quality of studies on measurement properties. It contains 9 boxes, each dealing with one measurement property, with 5–18 items per box about design aspects and statistical methods. Our aim was to develop a scoring system for the COSMIN checklist to calculate quality scores per measurement property when using the checklist in systematic reviews of measurement properties.

1,502 citations

Journal Article

COSMIN guideline for systematic reviews of patient-reported outcome measures

Cecilia A.C. Prinsen, Lidwine B. Mokkink¹, Lex M. Bouter¹, Jordi Alonso, Donald L. Patrick², H.C.W. de Vet¹, Caroline B. Terwee¹

Public Health Research Institute¹, University of Washington²

12 Feb 2018-Quality of Life Research

TL;DR: The COSMIN guideline for systematic reviews of PROMs includes methodology to combine the methodological quality of studies on measurement properties with the quality of the PROM itself (i.e., its measurement properties).

Abstract: Systematic reviews of patient-reported outcome measures (PROMs) differ from reviews of interventions and diagnostic test accuracy studies and are complex. In fact, conducting a review of one or more PROMs comprises multiple reviews (i.e., one review for each measurement property of each PROM). In the absence of guidance specifically designed for reviews on measurement properties, our aim was to develop a guideline for conducting systematic reviews of PROMs. Based on literature reviews and expert opinions, and in concordance with existing guidelines, the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) steering committee developed a guideline for systematic reviews of PROMs. A consecutive ten-step procedure for conducting a systematic review of PROMs is proposed. Steps 1–4 concern preparing and performing the literature search, and selecting relevant studies. Steps 5–8 concern the evaluation of the quality of the eligible studies, the measurement properties, and the interpretability and feasibility aspects. Steps 9 and 10 concern formulating recommendations and reporting the systematic review. The COSMIN guideline for systematic reviews of PROMs includes methodology to combine the methodological quality of studies on measurement properties with the quality of the PROM itself (i.e., its measurement properties). This enables reviewers to draw transparent conclusions and make evidence-based recommendations on the quality of PROMs, and supports the evidence-based selection of PROMs for use in research and in clinical practice.

1,321 citations

Journal Article

COSMIN Risk of Bias checklist for systematic reviews of Patient-Reported Outcome Measures

Lidwine B. Mokkink¹, H.C.W. de Vet², Cecilia A.C. Prinsen², Donald L. Patrick³, J. Alonso, Lex M. Bouter⁴, Caroline B. Terwee²

VU University Medical Center¹, Public Health Research Institute², University of Washington³, VU University Amsterdam⁴

01 May 2018-Quality of Life Research

TL;DR: The COSMIN Risk of Bias checklist was developed exclusively for use in systematic reviews of PROMs to distinguish this application from other purposes of assessing the methodological quality of studies on measurement properties, such as guidance for designing or reporting a study on the measurement properties.

Abstract: The original COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) checklist was developed to assess the methodological quality of single studies on measurement properties of Patient-Reported Outcome Measures (PROMs). Now it is our aim to adapt the COSMIN checklist and its four-point rating system into a version exclusively for use in systematic reviews of PROMs, aiming to assess risk of bias of studies on measurement properties. For each standard (i.e., a design requirement or preferred statistical method), it was discussed within the COSMIN steering committee if and how it should be adapted. The adapted checklist was pilot-tested to strengthen content validity in a systematic review on the quality of PROMs for patients with hand osteoarthritis. The most important changes were the reordering of the measurement properties to be assessed in a systematic review of PROMs; the deletion of standards that concerned reporting issues and standards that do not necessarily lead to biased results; the integration of standards on general requirements for studies on item response theory with standards for specific measurement properties; the recommendation to the review team to specify hypotheses for construct validity and responsiveness in advance, and subsequently the removal of the standards about formulating hypotheses; and the change in the labels of the four-point rating system. The COSMIN Risk of Bias checklist was developed exclusively for use in systematic reviews of PROMs to distinguish this application from other purposes of assessing the methodological quality of studies on measurement properties, such as guidance for designing or reporting a study on the measurement properties.

1,038 citations

Cites result from "Measurement in Medicine: A Practica..."

...between groups [12], and compare the results found in the included studies to the hypotheses formulated by the review team....

Journal Article

COSMIN methodology for evaluating the content validity of patient-reported outcome measures: a Delphi study

Caroline B. Terwee¹, Cecilia A.C. Prinsen², Alessandro Chiarotto², Marjan J. Westerman², Donald L. Patrick³, Jordi Alonso⁴, Lex M. Bouter, H.C.W. de Vet², Lidwine B. Mokkink²

VU University Medical Center¹, Public Health Research Institute², University of Washington³, Pompeu Fabra University⁴

17 Mar 2018-Quality of Life Research

TL;DR: The consensus-based COSMIN methodology for content validity is more detailed, standardized, and transparent than earlier published guidelines, including the previous COSMIN standards, and can contribute to the selection and use of high-quality PROMs in research and clinical practice.

Abstract: Content validity is the most important measurement property of a patient-reported outcome measure (PROM) and the most challenging to assess. Our aims were to: (1) develop standards for evaluating the quality of PROM development; (2) update the original COSMIN standards for assessing the quality of content validity studies of PROMs; (3) develop criteria for what constitutes good content validity of PROMs, and (4) develop a rating system for summarizing the evidence on a PROM’s content validity and grading the quality of the evidence in systematic reviews of PROMs. An online 4-round Delphi study was performed among 159 experts from 21 countries. Panelists rated the degree to which they (dis)agreed to proposed standards, criteria, and rating issues on 5-point rating scales (‘strongly disagree’ to ‘strongly agree’), and provided arguments for their ratings. Discussion focused on sample size requirements, recording and field notes, transcribing cognitive interviews, and data coding. After four rounds, the required 67% consensus was reached on all standards, criteria, and rating issues. After pilot-testing, the steering committee made some final changes. Ten criteria for good content validity were defined regarding item relevance, appropriateness of response options and recall period, comprehensiveness, and comprehensibility of the PROM. The consensus-based COSMIN methodology for content validity is more detailed, standardized, and transparent than earlier published guidelines, including the previous COSMIN standards. This methodology can contribute to the selection and use of high-quality PROMs in research and clinical practice.

837 citations

Journal Article

The grounded psychometric development and initial validation of the Health Literacy Questionnaire (HLQ)

Richard H. Osborne¹, Roy Batterham¹, Gerald R. Elsworth¹, Melanie Hawkins¹, Rachelle Buchbinder

Deakin University¹

16 Jul 2013-BMC Public Health

TL;DR: The HLQ covers 9 conceptually distinct areas of health literacy to assess the needs and challenges of a wide range of people and organisations and is likely to be useful in surveys, intervention evaluation, and studies of the needs and capabilities of individuals.

Abstract: Health literacy has become an increasingly important concept in public health. We sought to develop a comprehensive measure of health literacy capable of diagnosing health literacy needs across individuals and organisations by utilizing perspectives from the general population, patients, practitioners and policymakers. Using a validity-driven approach we undertook grounded consultations (workshops and interviews) to identify broad conceptually distinct domains. Questionnaire items were developed directly from the consultation data following a strict process aiming to capture the full range of experiences of people currently engaged in healthcare through to people in the general population. Psychometric analyses included confirmatory factor analysis (CFA) and item response theory. Cognitive interviews were used to ensure questions were understood as intended. Items were initially tested in a calibration sample from community health, home care and hospital settings (N=634) and then in a replication sample (N=405) comprising recent emergency department attendees. Initially 91 items were generated across 6 scales with agree/disagree response options and 5 scales with difficulty in undertaking tasks response options. Cognitive testing revealed that most items were well understood and only some minor re-wording was required. Psychometric testing of the calibration sample identified 34 poorly performing or conceptually redundant items and they were removed resulting in 10 scales. These were then tested in a replication sample and refined to yield 9 final scales comprising 44 items. A 9-factor CFA model was fitted to these items with no cross-loadings or correlated residuals allowed. Given the very restricted nature of the model, the fit was quite satisfactory: χ² (WLSMV, 866 d.f.) = 2927, p<0.000, CFI = 0.936, TLI = 0.930, RMSEA = 0.076, and WRMR = 1.698.
Final scales included: Feeling understood and supported by healthcare providers; Having sufficient information to manage my health; Actively managing my health; Social support for health; Appraisal of health information; Ability to actively engage with healthcare providers; Navigating the healthcare system; Ability to find good health information; and Understand health information well enough to know what to do. The HLQ covers 9 conceptually distinct areas of health literacy to assess the needs and challenges of a wide range of people and organisations. Given the validity-driven approach, the HLQ is likely to be useful in surveys, intervention evaluation, and studies of the needs and capabilities of individuals.

794 citations

Cites background from "Measurement in Medicine: A Practica..."

...Traditional approaches to the development of measures of complex multi-dimensional phenomena include undertaking literature reviews, reviews of items and scales in previously developed measures, and undertaking qualitative interviews with the target population to define the constructs within a predefined theoretical model [46]....

References

Journal Article

Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives

Li-tze Hu, Peter M. Bentler¹

University of California, Los Angeles¹

01 Jan 1999-Structural Equation Modeling

TL;DR: This article examines the adequacy of the conventional cutoff criteria and several new alternatives for various fit indexes used to evaluate model fit in practice; the results suggest that, for the ML method, a cutoff value close to .95 for TLI, BL89, CFI, RNI, and G...

Abstract: This article examines the adequacy of the “rules of thumb” conventional cutoff criteria and several new alternatives for various fit indexes used to evaluate model fit in practice. Using a 2‐index presentation strategy, which includes using the maximum likelihood (ML)‐based standardized root mean squared residual (SRMR) and supplementing it with either Tucker‐Lewis Index (TLI), Bollen's (1989) Fit Index (BL89), Relative Noncentrality Index (RNI), Comparative Fit Index (CFI), Gamma Hat, McDonald's Centrality Index (Mc), or root mean squared error of approximation (RMSEA), various combinations of cutoff values from selected ranges of cutoff criteria for the ML‐based SRMR and a given supplemental fit index were used to calculate rejection rates for various types of true‐population and misspecified models; that is, models with misspecified factor covariance(s) and models with misspecified factor loading(s). The results suggest that, for the ML method, a cutoff value close to .95 for TLI, BL89, CFI, RNI, and G...

76,383 citations

Journal Article

The measurement of observer agreement for categorical data

J. R. Landis¹, Gary G. Koch

University of Michigan¹

01 Mar 1977-Biometrics

TL;DR: A general statistical methodology for the analysis of multivariate categorical data arising from observer reliability studies is presented; tests for interobserver bias are presented in terms of first-order marginal homogeneity, and measures of interobserver agreement are developed as generalized kappa-type statistics.

Abstract: This paper presents a general statistical methodology for the analysis of multivariate categorical data arising from observer reliability studies. The procedure essentially involves the construction of functions of the observed proportions which are directed at the extent to which the observers agree among themselves and the construction of test statistics for hypotheses involving these functions. Tests for interobserver bias are presented in terms of first-order marginal homogeneity and measures of interobserver agreement are developed as generalized kappa-type statistics. These procedures are illustrated with a clinical diagnosis example from the epidemiological literature.

64,109 citations

Journal Article

Statistical methods for assessing agreement between two methods of clinical measurement.

J M Bland¹,², Douglas G. Altman¹,²

Northwick Park Hospital¹, St George's Hospital²

08 Feb 1986-The Lancet

TL;DR: An alternative approach, based on graphical techniques and simple calculations, is described, together with the relation between this analysis and the assessment of repeatability.
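
The "simple calculations" mentioned here can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the usual 95% limits of agreement (mean difference ± 1.96 standard deviations of the paired differences), and the function name and the readings are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def limits_of_agreement(
    method_a: Sequence[float], method_b: Sequence[float], z: float = 1.96
) -> Tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) for paired measurements.

    Assumes the differences are roughly normally distributed, so that about
    95% of them fall within bias +/- z * SD when z = 1.96.
    """
    if len(method_a) != len(method_b):
        raise ValueError("Both methods must measure the same subjects")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)   # systematic difference between methods
    sd = statistics.stdev(diffs)    # sample SD of the paired differences
    return bias, bias - z * sd, bias + z * sd


# Hypothetical paired readings from two measurement methods.
readings_a = [10.0, 12.0, 14.0, 16.0]
readings_b = [11.0, 11.0, 15.0, 15.0]
bias, lower, upper = limits_of_agreement(readings_a, readings_b)
print(f"bias={bias:.2f}, limits=({lower:.2f}, {upper:.2f})")
```

For the graphical part, each difference would be plotted against the mean of the two readings for that subject, with horizontal lines at the bias and at the two limits.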

43,884 citations

Journal Article

A Coefficient of Agreement for Nominal Scales

Jacob Cohen¹

New York University¹

01 Apr 1960-Educational and Psychological Measurement

TL;DR: This article presents a procedure in which two (or more) judges independently categorize a sample of units, in order to determine the extent to which such judgments are reproducible, i.e., reliable.

Abstract: CONSIDER Table 1. It represents in its formal characteristics a situation which arises in the clinical-social-personality areas of psychology, where it frequently occurs that the only useful level of measurement obtainable is nominal scaling (Stevens, 1951, pp. 25-26), i.e. placement in a set of k unordered categories. Because the categorizing of the units is a consequence of some complex judgment process performed by a “two-legged meter” (Stevens, 1958), it becomes important to determine the extent to which these judgments are reproducible, i.e., reliable. The procedure which suggests itself is that of having two (or more) judges independently categorize a sample of units and determine the degree, significance, and
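
As a rough illustration of the coefficient this paper introduces, the sketch below computes kappa for two judges from a k x k table of counts. The truncated abstract does not state the formula, so this assumes the standard definition, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e the proportion expected by chance from the marginal totals. The function name and the counts are hypothetical.

```python
from typing import Sequence


def cohen_kappa(table: Sequence[Sequence[float]]) -> float:
    """Cohen's kappa for a square table: rows = judge A, columns = judge B."""
    k = len(table)
    if any(len(row) != k for row in table):
        raise ValueError("The agreement table must be square")
    n = sum(sum(row) for row in table)
    # Observed agreement: share of units on the diagonal.
    p_observed = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: product of the two judges' marginal proportions.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical counts: 50 units sorted into 2 nominal categories by 2 judges.
example_table = [[20, 5], [10, 15]]
kappa = cohen_kappa(example_table)
print(round(kappa, 3))
```

In this example the judges agree on 35 of 50 units (p_o = 0.70), but half of that agreement is expected by chance alone (p_e = 0.50), which is the correction the coefficient is designed to make.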

34,965 citations

Journal Article

Intraclass correlations: uses in assessing rater reliability.

Patrick E. Shrout¹, Joseph L. Fleiss

Columbia University¹

01 Mar 1979-Psychological Bulletin

TL;DR: This article presents guidelines for choosing among six different forms of the intraclass correlation for reliability studies in which n targets are rated by k judges, and reviews confidence intervals for each of the forms.

Abstract: Reliability coefficients often take the form of intraclass correlation coefficients. In this article, guidelines are given for choosing among six different forms of the intraclass correlation for reliability studies in which n targets are rated by k judges. Relevant to the choice of the coefficient are the appropriate statistical model for the reliability and the application to be made of the reliability results. Confidence intervals for each of the forms are reviewed.
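
To make the "n targets rated by k judges" setup concrete, the sketch below computes three of the six forms, the single-rating versions, from ANOVA mean squares. The formulas are not given in the abstract above; the code assumes the commonly cited one-way random, two-way random, and two-way mixed definitions. The function name, the labels ICC(1,1), ICC(2,1), ICC(3,1), and the ratings matrix are illustrative choices.

```python
from typing import Dict, Sequence


def single_rating_iccs(ratings: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Single-rating ICCs for an n x k matrix (rows = targets, cols = judges)."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for the two-way layout without replication.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_targets = k * sum((m - grand) ** 2 for m in row_means)
    ss_judges = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_targets - ss_judges

    bms = ss_targets / (n - 1)                    # between-targets mean square
    wms = (ss_total - ss_targets) / (n * (k - 1))  # within-targets mean square
    jms = ss_judges / (k - 1)                     # between-judges mean square
    ems = ss_error / ((n - 1) * (k - 1))          # residual mean square

    return {
        # One-way random: each target rated by a different set of judges.
        "ICC(1,1)": (bms - wms) / (bms + (k - 1) * wms),
        # Two-way random: judges are a random sample, judge bias counts as error.
        "ICC(2,1)": (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n),
        # Two-way mixed: these judges are the only judges of interest.
        "ICC(3,1)": (bms - ems) / (bms + (k - 1) * ems),
    }


# Illustrative ratings: 6 targets (rows) scored by 4 judges (columns).
example_ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = single_rating_iccs(example_ratings)
for name, value in icc.items():
    print(f"{name} = {value:.2f}")
```

The three values differ widely for the same data because the judges here use the scale very differently; whether that systematic judge effect is treated as error is exactly the modelling choice the guidelines address.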

21,185 citations