
Journal ArticleDOI

Quality criteria were proposed for measurement properties of health status questionnaires

01 Jan 2007-Journal of Clinical Epidemiology (Elsevier USA)-Vol. 60, Iss: 1, pp 34-42

TL;DR: The criteria can be used in systematic reviews of health status questionnaires, to detect shortcomings and gaps in knowledge of measurement properties, and to design validation studies.



Quality criteria were proposed for measurement properties
of health status questionnaires
Caroline B. Terwee a,*, Sandra D.M. Bot a, Michael R. de Boer a,b,
Daniëlle A.W.M. van der Windt a,c, Dirk L. Knol a,d, Joost Dekker a,e,
Lex M. Bouter a, Henrica C.W. de Vet a

a EMGO Institute, VU University Medical Center, Van der Boechorststraat 7, 1081 BT Amsterdam, The Netherlands
b Department of Ophthalmology, VU University Medical Center, Amsterdam, The Netherlands
c Department of General Practice, VU University Medical Center, Amsterdam, The Netherlands
d Department of Clinical Epidemiology and Biostatistics, VU University Medical Center, Amsterdam, The Netherlands
e Department of Rehabilitation Medicine, VU University Medical Center, Amsterdam, The Netherlands
Accepted 29 March 2006
Abstract
Objectives: Recently, an increasing number of systematic reviews have been published in which the measurement properties of health
status questionnaires are compared. For a meaningful comparison, quality criteria for measurement properties are needed. Our aim was to
develop quality criteria for design, methods, and outcomes of studies on the development and evaluation of health status questionnaires.
Study Design and Setting: Quality criteria for content validity, internal consistency, criterion validity, construct validity, reproducibil-
ity, longitudinal validity, responsiveness, floor and ceiling effects, and interpretability were derived from existing guidelines and consensus
within our research group.
Results: For each measurement property a criterion was defined for a positive, negative, or indeterminate rating, depending on the de-
sign, methods, and outcomes of the validation study.
Conclusion: Our criteria make a substantial contribution toward defining explicit quality criteria for measurement properties of health
status questionnaires. Our criteria can be used in systematic reviews of health status questionnaires, to detect shortcomings and gaps in
knowledge of measurement properties, and to design validation studies. The future challenge will be to refine and complete the criteria
and to reach broad consensus, especially on quality criteria for good measurement properties. © 2006 Elsevier Inc. All rights reserved.
Keywords: Reproducibility; Reliability; Validity; Responsiveness; Guidelines; Criteria
1. Introduction
The number of available health status questionnaires has
increased dramatically over the past decades. Consequently,
the choice of which questionnaire to use is becoming a
major difficulty. Recently, a large number of systematic reviews have been published of available questionnaires measuring a specific concept in a specific population, for example [1–11]. In these systematic reviews, typically, the content and measurement properties of the available questionnaires are compared. By analogy with systematic reviews of clinical trials, criteria are needed to determine
the methodological quality of studies on the development
and evaluation of health status questionnaires. In addition,
criteria for good measurement properties are needed to justify conclusions about which questionnaire is best.
Several articles offer criteria for the evaluation of ques-
tionnaires. Probably the best-known and most comprehen-
sive criteria are those from the Scientific Advisory
Committee (SAC) of the Medical Outcomes Trust [12]. The
SAC defined eight attributes of instrument properties that
warrant consideration in evaluation. These include (1) con-
ceptual and measurement model, (2) validity, (3) reliability,
(4) responsiveness, (5) interpretability, (6) respondent and
administrative burden, (7) alternative forms, and (8) cultural
and language adaptations (translations). Within each of these
attributes, specific criteria were defined by which instru-
ments should be reviewed. Similar criteria have been defined,
e.g., by Bombardier and Tugwell [13], Andresen [14], and
McDowell and Jenkinson [15]. What is often lacking in these
criteria, however, are explicit criteria for what constitutes
good measurement properties. For example, for the
* Corresponding author. Tel.: +31-20-4448187; fax: +31-20-4446775.
E-mail address: cb.terwee@vumc.nl (C.B. Terwee).
0895-4356/06/$ – see front matter © 2006 Elsevier Inc. All rights reserved.
doi: 10.1016/j.jclinepi.2006.03.012

assessment of validity it is often recommended that hypoth-
eses about expected results should be tested, but no criteria
have been defined for how many hypotheses should be con-
firmed to justify that a questionnaire has good validity. No
criteria have been defined for what constitutes good agree-
ment (acceptable measurement error), good responsiveness,
or good interpretability, and no criteria have been defined
for the required sample size of studies assessing measure-
ment properties.
As suggested by the SAC [12], we took on the challenge
to further discuss and refine the available quality criteria for
studies on the development and evaluation of health status
questionnaires, including explicit criteria for the following
measurement properties: (1) content validity, (2) internal
consistency, (3) criterion validity, (4) construct validity,
(5) reproducibility, (6) responsiveness, (7) floor and ceiling
effects, and (8) interpretability. We used our criteria in two
systematic reviews comparing the measurement properties
of questionnaires for shoulder disability [1] and for visual
functioning [4], and revised them based on our experiences
in these reviews. Our criteria can also be used to detect
shortcomings and gaps in knowledge of measurement
properties, and to design validation studies.
In this article we define our quality criteria for measure-
ment properties, discuss the difficult and sometimes arbi-
trary choices we made, and indicate future challenges.
We emphasize that, just like the criteria offered by the
SAC and others, our criteria are open to further discussion
and refinement. Our aim is to contribute to the development
of explicit quality criteria for the design, methods, and out-
comes of studies on the development and evaluation of
health status questionnaires.
2. Content validity
Content validity examines the extent to which the con-
cepts of interest are comprehensively represented by the
items in the questionnaire [16]. To be able to rate the
quality of a questionnaire, authors should provide a clear
description of the following aspects regarding the develop-
ment of a questionnaire:
– Measurement aim of the questionnaire, i.e., discrimi-
native, evaluative, or predictive [17]. The measure-
ment aim is important, because different items may
be valid for different aims. For example, a question
on stiffness could be a valid item of a discriminative
questionnaire used to measure the impact of osteoar-
thritis on quality of life (to distinguish between pa-
tients with different levels of quality of life), but
would be considered invalid for an evaluative ques-
tionnaire used as an outcome measure in a pain med-
ication trial, because it is unlikely to be changed by
pain medication.
– Target population, i.e., the population for which the
questionnaire was developed. This is important to
judge the relevance and comprehensiveness of the
items. For example, a questionnaire developed to
measure functional status of patients with shoulder
problems may be less valid to measure functional sta-
tus of patients with wrist/hand problems, because
some items may be less relevant for these patients
(e.g., lifting above shoulder level), whereas important
items for patients with wrist/hand problems may be
missing (e.g., buttoning a shirt). The relevance of
items may also depend on disease severity. An ade-
quate description of the target population is therefore
important for judging the comprehensiveness and the
applicability of the questionnaire in (other) populations.
– Concepts that the questionnaire is intended to measure.
To judge the suitability of a questionnaire for a specific
purpose, it is important that authors provide a clear
framework of what the overall concept to be measured
is. Relevant concepts can be defined in terms of symp-
toms; functioning (physical, psychological, and so-
cial); general health perceptions; or overall quality of
life [18]. These different outcome levels should clearly
be distinguished and measured by separate subscales.
For physical functioning it is important to distinguish
between capacity (what a patient thinks he can do)
and performance (what a patient actually does).
– Item selection and item reduction. The methods for
item selection, item reduction, and the execution of
a pilot study to examine the readability and compre-
hension should be justified and reported. Items in the
questionnaire must reflect areas that are important to
the target population that is being studied. Therefore,
the target population should be involved during item
selection. In some guidelines it is recommended that
developers start with a large number of items and apply
item reduction techniques to select a small number of
final items. This strategy, however, does not guarantee
a better content validity, because a comprehensive set
of items can also be achieved without item reduction.
Therefore, we do not consider this to be mandatory.
– Interpretability of the items. Completing the question-
naire should not require reading skills beyond that of
a 12-year-old to avoid missing values and unreliable
answers [19]. That means that items should be short
and simple and should not contain difficult words or
jargon terms. Moreover, an item should not ask two questions at once [19]. Furthermore,
the time period to which the questions refer should
be clearly stated and justified.
We give a positive rating for content validity if a clear
description is provided of the measurement aim, the target
population, the concepts that are being measured, and the
item selection. Furthermore, the target population should
have been involved during item selection, as well as either
investigators or experts. If a clear description is lacking,
content validity is rated as indeterminate.

3. Internal consistency
Internal consistency is a measure of the extent to which
items in a questionnaire (sub)scale are correlated (homoge-
neous), thus measuring the same concept. Internal consis-
tency is an important measurement property for
questionnaires that intend to measure a single underlying
concept (construct) by using multiple items. In contrast,
for questionnaires in which the items are merely different
aspects of a complex clinical phenomenon that do not have
to be correlated, such as in the Apgar Scale [20], internal
consistency is not relevant [21,22].
An internally consistent (homogeneous or unidimensional) scale rests on good construct definitions and good items; its dimensionality can then be examined with principal component analysis or exploratory factor analysis, followed by confirmatory factor analysis.
When internal consistency is relevant, principal component
analysis or factor analysis should be applied to determine
whether the items form only one overall scale (dimension)
or more than one [23,24]. If there is no prior hypothesis regarding the dimensionality of a questionnaire, exploratory principal component analysis or factor analysis can be applied. If there is a clear hypothesis regarding the factor structure, e.g., because of an existing theoretical model or because the factor structure has been determined previously, confirmatory factor analysis should be used [25,26]. The number of subjects required for a factor analysis is a matter of debate. Rules of thumb vary from four to 10 subjects per variable, with a minimum of 100 subjects to ensure stability of the variance–covariance matrix [27].
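As a minimal illustration (ours, not the article's; the function name and toy data are invented), the eigenvalues of the item correlation matrix can be inspected to judge how many dimensions the items form, e.g., with the Kaiser criterion (eigenvalues greater than 1) or a scree plot:

```python
# Sketch: judging dimensionality from the eigenvalues of the item
# correlation matrix (the basis of a principal component analysis).
# `items` is an (n_subjects x n_items) array; the data below are toy data.
import numpy as np

def pca_eigenvalues(items: np.ndarray) -> np.ndarray:
    """Eigenvalues of the item correlation matrix, largest first."""
    corr = np.corrcoef(items, rowvar=False)
    return np.linalg.eigvalsh(corr)[::-1]

rng = np.random.default_rng(0)
items = rng.normal(size=(200, 10))       # 200 subjects, 10 items
print(pca_eigenvalues(items))            # e.g., count eigenvalues > 1
```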
After determining the number of (homogeneous) (sub)-
scales, Cronbach’s alpha should be calculated for each
(sub)scale separately. Cronbach’s alpha is considered an ad-
equate measure of internal consistency. A low Cronbach’s
alpha indicates a lack of correlation between the items in
a scale, which makes summarizing the items unjustified.
A very high Cronbach’s alpha indicates high correlations
among the items in the scale, i.e., redundancy of one or
more items. Furthermore, a very high Cronbach’s alpha is
usually found for scales with a large number of items, be-
cause Cronbach’s alpha is dependent upon the number of
items in a scale. Note that Cronbach’s alpha gives no infor-
mation on the number of subscales in a questionnaire, be-
cause alpha can be high when two or more subscales
with high alphas are combined. Nunnally and Bernstein
[28] proposed a criterion of 0.70–0.90 as a measure of good internal consistency. In our experience, however, many (sub)scales of questionnaires that we consider good
have higher Cronbach’s alphas. We give a positive rating
for internal consistency when factor analysis was applied
and Cronbach’s alpha is between 0.70 and 0.95.
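A minimal sketch of the calculation (our illustration; the article gives no code), using the standard formula alpha = k/(k−1) × (1 − Σ item variances / variance of the sum score):

```python
# Sketch: Cronbach's alpha for a single (sub)scale.
# `items` is an (n_subjects x k_items) array for one (sub)scale.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of the sum score
    return k / (k - 1) * (1 - item_vars / total_var)

# Rating rule from this section: positive if factor analysis was applied
# and 0.70 <= alpha <= 0.95.
```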
4. Criterion validity
Criterion validity refers to the extent to which scores on
a particular instrument relate to a gold standard. We give
a positive rating for criterion validity if convincing argu-
ments are presented that the standard used really is ‘gold’
and if the correlation with the gold standard is at least 0.70.
5. Construct validity
Construct validity refers to the extent to which scores on
a particular instrument relate to other measures in a manner
that is consistent with theoretically derived hypotheses con-
cerning the concepts that are being measured [17,19]. Con-
struct validity should be assessed by testing predefined
hypotheses (e.g., about expected correlations between mea-
sures or expected differences in scores between ‘known’
groups). These hypotheses need to be as specific as possi-
ble. Without specific hypotheses, the risk of bias is high be-
cause retrospectively it is tempting to think up alternative
explanations for low correlations instead of concluding that
the questionnaire may not be valid. We therefore give a pos-
itive rating for construct validity if hypotheses are specified
in advance and at least 75% of the results are in correspon-
dence with these hypotheses, in (sub)groups of at least 50
patients.
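The rating logic can be made concrete with a small sketch (the hypotheses and numbers below are invented for illustration): each predefined hypothesis states an expected range for a correlation, and the rating is positive when at least 75% of the hypotheses hold in a (sub)group of at least 50 patients.

```python
# Sketch: counting confirmed predefined hypotheses about correlations.
# The labels, observed correlations, and expected ranges are illustrative.
hypotheses = [
    # (comparison measure, observed r, expected low, expected high)
    ("pain scale", 0.62, 0.40, 0.80),
    ("depression scale", 0.21, 0.00, 0.30),
    ("unrelated trait", 0.55, -0.20, 0.20),   # not confirmed
]
n_patients = 120
confirmed = sum(low <= r <= high for _, r, low, high in hypotheses)
positive = n_patients >= 50 and confirmed / len(hypotheses) >= 0.75
print(f"{confirmed}/{len(hypotheses)} confirmed; positive rating: {positive}")
```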
6. Reproducibility
Reproducibility concerns the degree to which repeated
measurements in stable persons (test-retest) provide simi-
lar answers. We believe that it is important to make a dis-
tinction between reliability and agreement [29,30].
Agreement concerns the absolute measurement error, i.e.,
how close the scores on repeated measures are, expressed
in the unit of the measurement scale at issue. Small mea-
surement error is required for evaluative purposes in which
one wants to distinguish clinically important changes from
measurement error. Reliability concerns the degree to
which patients can be distinguished from each other, de-
spite measurement error [19]. High reliability is important
for discriminative purposes if one wants to distinguish
among patients, e.g., with more or less severe disease (as
in diagnostic applications). Reliability coefficients (intra-
class correlation coefficients (ICC)) concern the variation
in the population (interindividual variation) divided by
the total variation, which is the interindividual variation
plus the intraindividual variation (measurement error),
expressed as a ratio between 0 and 1.
The time period between the repeated administrations
should be long enough to prevent recall, though short
enough to ensure that clinical change has not occurred. Of-
ten, 1 or 2 weeks will be appropriate, but there could be
reasons to choose otherwise. Therefore, we do not rate
the appropriateness of the time period, but only require that
this time period is described and justified.
6.1. Agreement
The measurement error can be adequately expressed
as the standard error of measurement (SEM) [30]. The SEM equals the square root of the error variance of an ANOVA analysis, either including systematic differences (SEM_agreement) or excluding them (SEM_consistency). Many authors fail to describe how they calculated the SEM. We believe that systematic differences should be considered part of the measurement error, because we want to distinguish them from ‘real’ changes, e.g., due to treatment. Therefore, we prefer SEM_agreement. The SEM can be converted into the smallest detectable change (SDC = 1.96 × √2 × SEM), which reflects the smallest within-person change in score that, with P < 0.05, can be interpreted as a ‘real’ change, above measurement error, in one individual (SDC_ind) [31,32]. The SDC measurable in a group of people (SDC_group) can be calculated by dividing SDC_ind by √n [32,33].
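A sketch of these calculations for a two-occasion test-retest design (our code, assuming the subjects × occasions ANOVA decomposition described above; all names are illustrative):

```python
# Sketch: SEM_agreement and the smallest detectable change (SDC) from
# test-retest scores of stable subjects, via a subjects x occasions ANOVA.
import numpy as np

def sem_and_sdc(t1: np.ndarray, t2: np.ndarray, n_group: int = 1):
    x = np.stack([t1, t2], axis=1)                 # (n_subjects, 2 occasions)
    n, k = x.shape
    grand = x.mean()
    ss_total = ((x - grand) ** 2).sum()
    ss_subj = k * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_occ = n * ((x.mean(axis=0) - grand) ** 2).sum()
    ms_occ = ss_occ / (k - 1)
    ms_err = (ss_total - ss_subj - ss_occ) / ((n - 1) * (k - 1))
    var_occ = max((ms_occ - ms_err) / n, 0.0)      # systematic differences
    sem_agreement = np.sqrt(var_occ + ms_err)      # error incl. systematic part
    sdc_ind = 1.96 * np.sqrt(2) * sem_agreement    # SDC for one individual
    sdc_group = sdc_ind / np.sqrt(n_group)         # SDC for a group of size n_group
    return sem_agreement, sdc_ind, sdc_group
```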
Another adequate parameter of agreement is described by Bland and Altman [34]. Their limits of agreement equal the mean change in scores of repeated measurements (mean_change) ± 1.96 × the standard deviation of these changes (SD_change). The limits of agreement are often reported because they are easily interpretable. Note that SD_change equals √2 × SEM_consistency.
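The corresponding sketch for the limits of agreement (again our illustration):

```python
# Sketch: Bland-Altman limits of agreement from the same test-retest data.
import numpy as np

def limits_of_agreement(t1: np.ndarray, t2: np.ndarray):
    change = t2 - t1
    mean_change = change.mean()
    sd_change = change.std(ddof=1)     # equals sqrt(2) * SEM_consistency
    return (mean_change - 1.96 * sd_change,
            mean_change + 1.96 * sd_change)
```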
For evaluative purposes, the absolute measurement error
should be smaller than the minimal amount of change in the
(sub)scale that is considered to be important (minimal im-
portant change (MIC)). Therefore, the MIC of a (sub)scale
should be defined (see under interpretability).
We give a positive rating for agreement if the SDC (SDC_ind for application in individuals and SDC_group for use in groups) or the limits of agreement (upper or lower limit, depending on whether the interest is in improvement or deterioration) are smaller than the MIC. Because this is a relatively new approach and not yet commonly presented, we also give a positive rating if authors provide convincing arguments (e.g., based on their experience with the interpretation of the questionnaire scores) that the agreement is acceptable. In both cases, we consider a sample size of at least 50 patients adequate for the assessment of the agreement parameter, based on a general guideline by Altman [35].
6.2. Reliability
The ICC is the most suitable and most commonly used
reliability parameter for continuous measures. Many au-
thors fail to describe which ICC they have used, e.g., an ICC for consistency (ICC_consistency) or an ICC for agreement (ICC_agreement) [19,36]. Because systematic differences are considered to be part of the measurement error, ICC_agreement (two-way random effects model, or ICC(A,1) according to McGraw and Wong [36]) is preferred.
The Pearson correlation coefficient is inadequate, because
systematic differences are not taken into account [19].
For ordinal measures, the weighted Cohen’s Kappa coeffi-
cient should be used. The absolute percentage of agreement
is inadequate, because it does not adjust for the agreement
attributable to chance. When quadratic weights are being
used, the weighted Kappa coefficient is identical to the
ICC_agreement [19].
Often 0.70 is recommended as a minimum standard for
reliability [28]. We give a positive rating for reliability
when the ICC or weighted Kappa is at least 0.70 in a sample
size of at least 50 patients.
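A sketch of ICC_agreement for a two-occasion test-retest design, built from the same ANOVA terms as the SEM above (our illustration of the ICC(A,1) formula of McGraw and Wong):

```python
# Sketch: ICC_agreement (ICC(A,1): two-way random effects, absolute
# agreement, single measurement) from test-retest scores.
import numpy as np

def icc_agreement(t1: np.ndarray, t2: np.ndarray) -> float:
    x = np.stack([t1, t2], axis=1)
    n, k = x.shape
    grand = x.mean()
    ss_total = ((x - grand) ** 2).sum()
    ss_subj = k * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_occ = n * ((x.mean(axis=0) - grand) ** 2).sum()
    ms_subj = ss_subj / (n - 1)
    ms_occ = ss_occ / (k - 1)
    ms_err = (ss_total - ss_subj - ss_occ) / ((n - 1) * (k - 1))
    return (ms_subj - ms_err) / (
        ms_subj + (k - 1) * ms_err + k * (ms_occ - ms_err) / n
    )

# Rating rule from this section: positive if the ICC (or weighted Kappa)
# is at least 0.70 in a sample of at least 50 patients.
```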
7. Responsiveness
Responsiveness has been defined as the ability of a ques-
tionnaire to detect clinically important changes over time,
even if these changes are small [37]. A large number of definitions and methods have been proposed for assessing responsiveness [38]. We consider responsiveness to be a measure of longitudinal validity. By analogy with construct validity, lon-
gitudinal validity should be assessed by testing predefined
hypotheses, e.g., about expected correlations between
changes in measures, or expected differences in changes be-
tween ‘known’ groups [38]. This shows the ability of
a questionnaire to measure changes if they really have hap-
pened. Furthermore, the instrument should be able to distin-
guish clinically important change from measurement error.
Responsiveness should therefore be tested by relating the
SDC to the MIC, as described under agreement (see Section
6.1). This approach equals Guyatt’s responsiveness ratio (RR), in which the clinically important change (MIC) is related to the between-subject variability in within-subject changes in stable subjects (SD_change, the same as in the limits of agreement) [39]. The RR should thus be at least 1.96 (at the value of 1.96 the MIC equals the SDC_ind, which is 1.96 × SD_change). Another adequate measure of responsive-
ness is the area under the receiver operating characteristics
(ROC) curve (AUC) [40], which is a measure of the ability
of a questionnaire to distinguish patients who have and have
not changed, according to an external criterion. We consider
an AUC of at least 0.70 to be adequate.
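Both indices can be sketched briefly (our code; the AUC is computed here with the rank-based Mann-Whitney estimate, one common way to obtain it):

```python
# Sketch: Guyatt's responsiveness ratio and the AUC for distinguishing
# changed from unchanged patients. All inputs are illustrative.
import numpy as np

def responsiveness_ratio(mic: float, sd_change_stable: float) -> float:
    """MIC divided by SD_change in stable subjects; adequate if >= 1.96."""
    return mic / sd_change_stable

def auc(change_improved: np.ndarray, change_stable: np.ndarray) -> float:
    """P(an improved patient's change score exceeds a stable patient's)."""
    wins = (change_improved[:, None] > change_stable[None, :]).mean()
    ties = (change_improved[:, None] == change_stable[None, :]).mean()
    return wins + 0.5 * ties               # adequate if >= 0.70
```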
8. Floor or ceiling effects
Floor or ceiling effects are considered to be present if
more than 15% of respondents achieved the lowest or high-
est possible score, respectively [41]. If floor or ceiling ef-
fects are present, it is likely that extreme items are
missing in the lower or upper end of the scale, indicating
limited content validity. As a consequence, patients with
the lowest or highest possible score cannot be distinguished
from each other, thus reliability is reduced. Furthermore, the
responsiveness is limited because changes cannot be mea-
sured in these patients. We give a positive rating for (the ab-
sence of) floor and ceiling effects if no floor or ceiling
effects are present in a sample size of at least 50 patients.
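The 15% rule translates directly into code (our sketch; names are illustrative):

```python
# Sketch: flagging floor and ceiling effects (> 15% of respondents at the
# lowest or highest possible score).
import numpy as np

def floor_ceiling_effects(scores: np.ndarray, min_score: float,
                          max_score: float) -> tuple[bool, bool]:
    floor = (scores == min_score).mean() > 0.15
    ceiling = (scores == max_score).mean() > 0.15
    return floor, ceiling    # True means the effect is present
```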
9. Interpretability
Interpretability is defined as the degree to which one can
assign qualitative meaning to quantitative scores [42].

Citations

Journal ArticleDOI
TL;DR: The resulting COSMIN checklist could be useful when selecting a measurement instrument, peer-reviewing a manuscript, designing or reporting a study on measurement properties, or for educational purposes.
Abstract: Aim of the COSMIN study (COnsensus-based Standards for the selection of health status Measurement INstruments) was to develop a consensus-based checklist to evaluate the methodological quality of studies on measurement properties. We present the COSMIN checklist and the agreement of the panel on the items of the checklist. A four-round Delphi study was performed with international experts (psychologists, epidemiologists, statisticians and clinicians). Of the 91 invited experts, 57 agreed to participate (63%). Panel members were asked to rate their (dis)agreement with each proposal on a five-point scale. Consensus was considered to be reached when at least 67% of the panel members indicated ‘agree’ or ‘strongly agree’. Consensus was reached on the inclusion of the following measurement properties: internal consistency, reliability, measurement error, content validity (including face validity), construct validity (including structural validity, hypotheses testing and cross-cultural validity), criterion validity, responsiveness, and interpretability. The latter was not considered a measurement property. The panel also reached consensus on how these properties should be assessed. The resulting COSMIN checklist could be useful when selecting a measurement instrument, peer-reviewing a manuscript, designing or reporting a study on measurement properties, or for educational purposes.

2,221 citations


Cites background from "Quality criteria were proposed for ..."

  • ...Examples of such criteria were previously published by members of our group [6]....


Journal ArticleDOI
TL;DR: The aim was to clarify and standardize terminology and definitions of measurement properties by reaching consensus among a group of experts and to develop a taxonomy of measurement properties relevant for evaluating health instruments.
Abstract: Objective: Lack of consensus on taxonomy, terminology, and definitions has led to confusion about which measurement properties are relevant and which concepts they represent. The aim was to clarify and standardize terminology and definitions of measurement properties by reaching consensus among a group of experts and to develop a taxonomy of measurement properties relevant for evaluating health instruments. Study Design and Setting: An international Delphi study with four written rounds was performed. Participating experts had a background in epidemiology, statistics, psychology, and clinical medicine. The panel was asked to rate their (dis)agreement about proposals on a five-point scale. Consensus was considered to be reached when at least 67% of the panel agreed. Results: Of 91 invited experts, 57 agreed to participate and 43 actually participated. Consensus was reached on positions of measurement properties in the taxonomy (68–84%), terminology (74–88%, except for structural validity [56%]), and definitions of measurement properties (68–88%). The panel extensively discussed the positions of internal consistency and responsiveness in the taxonomy, the terms “reliability” and “structural validity,” and the definitions of internal consistency and reliability. Conclusions: Consensus on taxonomy, terminology, and definitions of measurement properties was reached. Hopefully, this will lead to a more uniform use of terms and definitions in the literature on measurement properties. © 2010 Elsevier Inc. All rights reserved.

2,215 citations


Cites background from "Quality criteria were proposed for ..."

  • ...[15] consider internal consistency not as a subcategory of the domain reliability and defined it as “the extent to which items in a (sub)scale are intercorrelated, thus measuring the same construct....

  • ...[15] and COSMIN focus on health status measurement....

  • ...The SAC-MOS standards and the Terwee criteria, however, are not based on consensus among a large group of experts [13,15]....


Journal ArticleDOI
Nichole D. Palmer, Caitrin W. McDonough, Pamela J. Hicks, B. H. Roh, +381 more (6 institutions)
04 Jan 2012-PLOS ONE
TL;DR: It is suggested that multiple loci underlie T2DM susceptibility in the African-American population and that these loci are distinct from those identified in other ethnic populations.
Abstract: African Americans are disproportionately affected by type 2 diabetes (T2DM), yet few studies have examined T2DM using genome-wide association approaches in this ethnicity. The aim of this study was to identify genes associated with T2DM in the African American population. We performed a Genome Wide Association Study (GWAS) using the Affymetrix 6.0 array in 965 African-American cases with T2DM and end-stage renal disease (T2DM-ESRD) and 1029 population-based controls. The most significant SNPs (n = 550 independent loci) were genotyped in a replication cohort and 122 SNPs (n = 98 independent loci) were further tested through genotyping three additional validation cohorts followed by meta-analysis in all five cohorts totaling 3,132 cases and 3,317 controls. Twelve SNPs had evidence of association in the GWAS (P < 0.0071), were directionally consistent in the replication cohort and were associated with T2DM in subjects without nephropathy (P < 0.05). Meta-analysis in all cases and controls revealed a single SNP reaching genome-wide significance (P < 2.5 × 10^-8). SNP rs7560163 (P = 7.0 × 10^-9, OR (95% CI) = 0.75 (0.67-0.84)) is located intergenically between RND3 and RBM43. Four additional loci (rs7542900, rs4659485, rs2722769 and rs7107217) were associated with T2DM (P < 0.05) and reached more nominal levels of significance (P < 2.5 × 10^-5) in the overall analysis and may represent novel loci that contribute to T2DM. We have identified novel T2DM-susceptibility variants in the African-American population. Notably, T2DM risk was associated with the major allele and implies an interesting genetic architecture in this population. These results suggest that multiple loci underlie T2DM susceptibility in the African-American population and that these loci are distinct from those identified in other ethnic populations.

1,953 citations


Cites background or methods from "Quality criteria were proposed for ..."

  • ...Responsiveness was measured as the area under the receiver operating characteristic (ROC) curve which indicates the probability of correctly identifying subjects who report improvement [27,30]....

  • ...Reproducibility can be divided in agreement and reliability [27]....

  • ...Absence of floor or ceiling effects indicates a good content validity [17,27]....


Journal ArticleDOI
Gill Windle, Kate M. Bennett, Jane Noyes (2 institutions)
TL;DR: There is no current 'gold standard' amongst 15 measures of resilience, and a number of the scales are in the early stages of development, and all require further validation work.
Abstract: The evaluation of interventions and policies designed to promote resilience, and research to understand the determinants and associations, require reliable and valid measures to ensure data quality. This paper systematically reviews the psychometric rigour of resilience measurement scales developed for use in general and clinical populations. Eight electronic abstract databases and the internet were searched and reference lists of all identified papers were hand searched. The focus was to identify peer reviewed journal articles where resilience was a key focus and/or is assessed. Two authors independently extracted data and performed a quality assessment of the scale psychometric properties. Nineteen resilience measures were reviewed; four of these were refinements of the original measure. All the measures had some missing information regarding the psychometric properties. Overall, the Connor-Davidson Resilience Scale, the Resilience Scale for Adults and the Brief Resilience Scale received the best psychometric ratings. The conceptual and theoretical adequacy of a number of the scales was questionable. We found no current 'gold standard' amongst 15 measures of resilience. A number of the scales are in the early stages of development, and all require further validation work. Given increasing interest in resilience from major international funders, key policy makers and practice, researchers are urged to report relevant validation statistics when using the measures.

1,337 citations


Cites background or methods from "Quality criteria were proposed for ..."

  • ...Fundamental to the robustness of a methodological review are the quality criteria used to distinguish the measurement properties of a scale to enable a meaningful comparison [15]....

  • ...content validity advocate that the target group should be involved with the item selection when measures are being developed [11,15]....

  • ...In order to address known methodological weaknesses in the current evidence informing practice, this paper reports a methodological systematic review of resilience measurement scales, using published quality assessment criteria to evaluate psychometric properties [15]....


Journal ArticleDOI
Jan Kottner, Laurent Audigé, Stig Brorson, Allan Donner, +5 more (7 institutions)
Abstract: Objective: Results of reliability and agreement studies are intended to provide information about the amount of error inherent in any diagnosis, score, or measurement. The level of reliability and agreement among users of scales, instruments, or classifications is widely unknown. Therefore, there is a need for rigorously conducted interrater and intrarater reliability and agreement studies. Information about sample selection, study design, and statistical analysis is often incomplete. Because of inadequate reporting, interpretation and synthesis of study results are often difficult. Widely accepted criteria, standards, or guidelines for reporting reliability and agreement in the health care and medical field are lacking. The objective was to develop guidelines for reporting reliability and agreement studies. Study Design and Setting: Eight experts in reliability and agreement investigation developed guidelines for reporting. Results: Fifteen issues that should be addressed when reliability and agreement are reported are proposed. The issues correspond to the headings usually used in publications. Conclusion: The proposed guidelines intend to improve the quality of reporting. © 2011 Elsevier Inc. All rights reserved.

1,329 citations


References

Journal ArticleDOI
08 Feb 1986-The Lancet
TL;DR: An alternative approach, based on graphical techniques and simple calculations, is described, together with the relation between this analysis and the assessment of repeatability.
Abstract: In clinical measurement comparison of a new measurement technique with an established one is often needed to see whether they agree sufficiently for the new to replace the old. Such investigations are often analysed inappropriately, notably by using correlation coefficients. The use of correlation is misleading. An alternative approach, based on graphical techniques and simple calculations, is described, together with the relation between this analysis and the assessment of repeatability.

41,576 citations


"Quality criteria were proposed for ..." refers methods in this paper

  • ...Another adequate parameter of agreement is described by Bland and Altman [34]....

  • ...[34] Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement....


Book
28 Apr 1989-
Abstract: Model Notation, Covariances, and Path Analysis. Causality and Causal Models. Structural Equation Models with Observed Variables. The Consequences of Measurement Error. Measurement Models: The Relation Between Latent and Observed Variables. Confirmatory Factor Analysis. The General Model, Part I: Latent Variable and Measurement Models Combined. The General Model, Part II: Extensions. Appendices. Distribution Theory. References. Index.

18,996 citations


Book
15 Jun 2006-
TL;DR: Practical Statistics for Medical Research is a problem-based text for medical researchers, medical students, and others in the medical arena who need to use statistics but have no specialized mathematics background.
Abstract: Most medical researchers, whether clinical or non-clinical, receive some background in statistics as undergraduates. However, it is most often brief, a long time ago, and largely forgotten by the time it is needed. Furthermore, many introductory texts fall short of adequately explaining the underlying concepts of statistics, and often are divorced from the reality of conducting and assessing medical research. Practical Statistics for Medical Research is a problem-based text for medical researchers, medical students, and others in the medical arena who need to use statistics but have no specialized mathematics background. The author draws on twenty years of experience as a consulting medical statistician to provide clear explanations to key statistical concepts, with a firm emphasis on practical aspects of designing and analyzing medical research. The text gives special attention to the presentation and interpretation of results and the many real problems that arise in medical research

16,669 citations


"Quality criteria were proposed for ..." refers methods in this paper

  • ...[35] Altman DG. Practical statistics for medical research....

  • ...In both cases, we consider a sample size of at least 50 patients adequate for the assessment of the agreement parameter, based on a general guideline by Altman [35]....


Journal ArticleDOI
Alejandro R. Jadad, R. A. Moore, Dawn Carroll, C. Jenkinson, +3 more (1 institution)
TL;DR: An instrument to assess the quality of reports of randomized clinical trials (RCTs) in pain research is described and its use to determine the effect of rater blinding on the assessments of quality is described.
Abstract: It has been suggested that the quality of clinical trials should be assessed by blinded raters to limit the risk of introducing bias into meta-analyses and systematic reviews, and into the peer-review process. There is very little evidence in the literature to substantiate this. This study describes the development of an instrument to assess the quality of reports of randomized clinical trials (RCTs) in pain research and its use to determine the effect of rater blinding on the assessments of quality. A multidisciplinary panel of six judges produced an initial version of the instrument. Fourteen raters from three different backgrounds assessed the quality of 36 research reports in pain research, selected from three different samples. Seven were allocated randomly to perform the assessments under blind conditions. The final version of the instrument included three items. These items were scored consistently by all the raters regardless of background and could discriminate between reports from the different samples. Blind assessments produced significantly lower and more consistent scores than open assessments. The implications of this finding for systematic reviews, meta-analytic research and the peer-review process are discussed.

14,663 citations


"Quality criteria were proposed for ..." refers methods in this paper

  • ...We did not summarize the quality criteria into one overall quality score, as is often done in systematic reviews of randomized clinical trials [46]....


Journal ArticleDOI
01 Jun 1992-Biometrics

11,532 citations


Performance Metrics

No. of citations received by the paper in previous years:

Year | Citations
2022 | 10
2021 | 907
2020 | 746
2019 | 666
2018 | 629
2017 | 661