Showing papers on "Intraclass correlation" published in 2005

Open Access

Journal Article•DOI•

Quantifying test-retest reliability using the intraclass correlation coefficient and the SEM.

Joseph P. Weir¹•Institutions (1)

01 Feb 2005-Journal of Strength and Conditioning Research

TL;DR: In this review, the basics of classic reliability theory are addressed in the context of choosing and interpreting an ICC and how the SEM and its variants can be used to construct confidence intervals for individual scores and to determine the minimal difference needed to be exhibited for one to be confident that a true change in performance of an individual has occurred.

Abstract: Reliability, the consistency of a test or measurement, is frequently quantified in the movement sciences literature. A common metric is the intraclass correlation coefficient (ICC). In addition, the SEM, which can be calculated from the ICC, is also frequently reported in reliability studies. However, there are several versions of the ICC, and confusion exists in the movement sciences regarding which ICC to use. Further, the utility of the SEM is not fully appreciated. In this review, the basics of classic reliability theory are addressed in the context of choosing and interpreting an ICC. The primary distinction between ICC equations is argued to be one concerning the inclusion (equations 2,1 and 2,k) or exclusion (equations 3,1 and 3,k) of systematic error in the denominator of the ICC equation. Inferential tests of mean differences, which are performed in the process of deriving the necessary variance components for the calculation of ICC values, are useful to determine if systematic error is present. If so, the measurement schedule should be modified (removing trials where learning and/or fatigue effects are present) to remove systematic error, and ICC equations that only consider random error may be safely used. The use of ICC values is discussed in the context of estimating the effects of measurement error on sample size, statistical power, and correlation attenuation. Finally, calculation and application of the SEM are discussed. It is shown how the SEM and its variants can be used to construct confidence intervals for individual scores and to determine the minimal difference needed to be exhibited for one to be confident that a true change in performance of an individual has occurred.
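
The distinction the abstract draws between ICC(2,1) and ICC(3,1), and the step from an ICC to the SEM and the minimal difference, can be sketched in a few lines. This is an illustrative sketch only: the function name `icc_and_sem` is hypothetical, the formulas are the standard Shrout and Fleiss two-way ANOVA forms, the SEM is taken as SD × √(1 − ICC) using ICC(3,1) and the SD of all scores, and the minimal difference as SEM × 1.96 × √2. Those choices are assumptions, not necessarily the exact procedure used in the paper.

```python
import math
from statistics import mean, stdev

def icc_and_sem(scores):
    """scores: one row per subject, one column per trial (hypothetical layout)."""
    n, k = len(scores), len(scores[0])
    flat = [x for row in scores for x in row]
    grand = mean(flat)
    row_means = [mean(row) for row in scores]
    col_means = [mean([row[j] for row in scores]) for j in range(k)]

    # Two-way ANOVA sums of squares: subjects, trials, and residual error.
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_trials = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for x in flat)
    ss_error = ss_total - ss_subjects - ss_trials

    ms_s = ss_subjects / (n - 1)
    ms_t = ss_trials / (k - 1)
    ms_e = ss_error / ((n - 1) * (k - 1))

    # ICC(2,1) keeps systematic (between-trial) error in the denominator;
    # ICC(3,1) leaves it out and considers random error only.
    icc21 = (ms_s - ms_e) / (ms_s + (k - 1) * ms_e + k * (ms_t - ms_e) / n)
    icc31 = (ms_s - ms_e) / (ms_s + (k - 1) * ms_e)

    # Assumption: SEM from ICC(3,1) and the SD of all scores.
    sem = stdev(flat) * math.sqrt(max(0.0, 1 - icc31))
    # Minimal difference for a real change in one individual (95% level).
    md = sem * 1.96 * math.sqrt(2)
    return {"icc21": icc21, "icc31": icc31, "sem": sem, "md": md}

# Made-up data: every subject improves by exactly 2 units on trial 2,
# so there is systematic error but no random error.
result = icc_and_sem([[10, 12], [20, 22], [30, 32]])
print(result)
```

With a pure systematic shift like this, ICC(3,1) is unaffected while ICC(2,1) is pulled below it, which is the behaviour the abstract describes.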

3,992 citations

Journal Article•DOI•

Development of the QuickDASH: comparison of three item-reduction approaches.

Dorcas E. Beaton, James G. Wright, Jeffrey N. Katz¹•Institutions (1)

Brigham and Women's Hospital¹

01 May 2005-Journal of Bone and Joint Surgery, American Volume

TL;DR: A comparison of item-reduction approaches suggested that the retention of clinically sensible and important content produced a comparable, if not slightly better, instrument than did more statistically driven approaches.

Abstract: Background: The purpose of this study was to develop a short, reliable, and valid measure of physical function and symptoms related to upper-limb musculoskeletal disorders by shortening the full, thirty-item DASH (Disabilities of the Arm, Shoulder and Hand) Outcome Measure. Methods: Three item-reduction techniques were used on the cross-sectional field-testing data derived from a study of 407 patients with various upper-limb conditions. These techniques were the concept-retention method, the equidiscriminative item-total correlation, and the item response theory (Rasch modeling). Three eleven-item scales were created. Data from a longitudinal cohort study in which the DASH questionnaire was administered to 200 patients with shoulder and wrist/hand disorders were then used to assess the reliability (Cronbach alpha and test-retest reliability) and validity (cross-sectional and longitudinal construct) of the three scales. Results were compared with those derived with the full DASH. Results: The three versions were comparable with regard to their measurement properties. All had a Cronbach alpha of ≥0.92 and an intraclass correlation coefficient of ≥0.94. Evidence of construct validity was established (r ≥ 0.64 with single-item indices of pain and function). The concept-retention method, the most subjective of the approaches to item reduction, ranked highest in terms of its similarity to the original DASH. Conclusions: The concept-retention version is named the QuickDASH. It contains eleven items and is similar with regard to scores and properties to the full DASH. A comparison of item-reduction approaches suggested that the retention of clinically sensible and important content produced a comparable, if not slightly better, instrument than did more statistically driven approaches. Clinical Relevance: The QuickDASH is a more efficient version of the DASH outcome measure that appears to retain its measurement properties.
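
For readers unfamiliar with the internal-consistency statistic reported here, Cronbach's alpha can be sketched as follows. The function name `cronbach_alpha` and the toy data are hypothetical; the formula is the standard one, alpha = k/(k − 1) × (1 − Σ item variances / variance of total scores), and nothing below reproduces the study's data.

```python
from statistics import variance

def cronbach_alpha(responses):
    """responses: one row per respondent, one column per item."""
    k = len(responses[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Made-up answers from four respondents to a three-item scale.
answers = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 4, 5]]
print(round(cronbach_alpha(answers), 3))
```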

1,429 citations

Journal Article•DOI•

Reliability of gait performance tests in men and women with hemiparesis after stroke

Ulla-Britt Flansbjer¹, Anna Maria Holmbäck¹, David Downham², Carolynn Patten³,⁴, Jan Lexell¹,⁵•Institutions (5)

Lund University¹, University of Liverpool², Veterans Health Administration³, Stanford University⁴, Luleå University of Technology⁵

01 Mar 2005-Journal of Rehabilitation Medicine

TL;DR: These commonly used gait performance tests are highly reliable and can be recommended to evaluate improvements in various aspects of gait performance in individuals with chronic mild to moderate hemiparesis after stroke.

Abstract: Objective: To assess the reliability of 6 gait performance tests in individuals with chronic mild to moderate post-stroke hemiparesis. Design: An intra-rater (between occasions) test-retest reliability study. Subjects: Fifty men and women (mean age 58 ± 6.4 years) 6–46 months post-stroke. Methods: The Timed “Up & Go” test, the Comfortable and the Fast Gait Speed tests, the Stair Climbing ascend and descend tests and the 6-Minute Walk test were assessed 7 days apart. Reliability was evaluated with the intraclass correlation coefficient (ICC 2,1), the Bland & Altman analysis, the standard error of measurement (SEM and SEM%) and the smallest real difference (SRD and SRD%). Results: Test-retest agreements were high (ICC2,1 0.94–0.99) with no discernible systematic differences between the tests. The standard error of measurement (SEM%), representing the smallest change that indicates a real (clinical) improvement for a group of individuals, was small (9%). The smallest real difference (SRD%), representing the smallest change that indicates a real (clinical) improvement for a single individual, was also small (13–23%). Conclusion: These commonly used gait performance tests are highly reliable and can be recommended to evaluate improvements in various aspects of gait performance in individuals with chronic mild to moderate hemiparesis after stroke.
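
The SEM% and SRD% figures quoted above can be illustrated with a small test-retest calculation. This is a sketch under stated assumptions: the function name `sem_and_srd` is hypothetical, the SEM is estimated as the SD of the between-occasion differences divided by √2 (which equals the square root of the error mean square for two occasions), the SRD is taken as 1.96 × √2 × SEM, and both are expressed as a percentage of the grand mean. The abstract does not spell out its formulas, so these are commonly used definitions rather than a confirmed reproduction.

```python
import math
from statistics import mean, stdev

def sem_and_srd(test1, test2):
    """test1, test2: paired scores from the same subjects on two occasions."""
    diffs = [b - a for a, b in zip(test1, test2)]
    grand_mean = mean(test1 + test2)
    # Assumption: for two occasions, SEM = SD of differences / sqrt(2).
    sem = stdev(diffs) / math.sqrt(2)
    # Smallest real difference for a single individual at the 95% level.
    srd = 1.96 * math.sqrt(2) * sem
    return {
        "sem": sem,
        "sem_pct": 100 * sem / grand_mean,
        "srd": srd,
        "srd_pct": 100 * srd / grand_mean,
    }

# Made-up walking times (seconds) for four subjects tested 7 days apart.
print(sem_and_srd([10, 20, 30, 40], [12, 19, 31, 42]))
```

Expressing both values relative to the mean is what makes them comparable across tests measured in different units.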

1,001 citations

Journal Article•DOI•

Validation of the Korean version of the Oswestry Disability Index.

Dong-Yun Kim, Sang-Ho Lee, Ho-Yeon Lee, Hyun-Ju Lee, Sang-Beom Chang, Chung Ss, Hyun-Jib Kim

01 Jan 2005-Spine

TL;DR: The results of this study indicate that the Korean version of the ODI is a reliable and valid instrument for the measurement of disability in Korean patients with lower back problems.

Abstract: Study Design. Validation of a translated, culturally adapted questionnaire. Objectives. To translate and culturally adapt a Korean version of the Oswestry Disability Index (ODI) and to validate its use in Korean patients. Summary of Background Data. The ODI is one of the most widely used and validated instruments for measuring disability in spinal disorders. However, no validated Korean version of the index was available at the time our study was initiated. Methods. The study was carried out in three phases: the first was translation into Korean and cultural adaptation of the questionnaire; the second was a pilot study to assess the comprehensibility of the prefinal version and modification; the third was a reliability and validity study of the final version. The Korean version was tested on 206 patients with lumbar spinal disorders who had undergone operations at the authors’ institute. Test-retest reliability, internal consistency, concurrent validity, and construct validity were investigated. Follow-up questionnaires were obtained from 39 patients at the 3-month postoperative follow-up meeting. Differences in the ODI, visual analog scale (VAS), and World Health Organization (WHO) quality of life assessment (WHOQOL-BREF) between preoperative and follow-up questionnaires were evaluated. The correlation of the postoperative ODI with the pain rating on a visual analog scale and WHOQOL-BREF was also analyzed. Results. Test-retest reliability was assessed with 88 patients in a time interval of 48 hours. The intraclass correlation coefficient of test-retest reliability was 0.9167. Reliability estimated by the internal consistency reached a Cronbach’s alpha of 0.84. The correlation of the preoperative ODI with the pain rating on a visual analog scale (100 mm) was r 0.425 (P 0.0001). The correlation between three of the WHOQOL-BREF domains (physical health, psychological health, and environment) and the ODI was statistically significant. 
The correlation coefficient between the ODI and physical health domain of the WHOQOL-BREF was r 0.48 (P 0.05). The correlations with psychological health and environment domains were low with r 0.192 and 0.160, respectively, even though statistically significant (P 0.05). The correlation of the postoperative ODI with the pain rating on a visual analog scale (100 mm) was r 0.626 (P 0.0001). The correlation between all four domains of the WHOQOL-BREF and the postoperative ODI was statistically significant. Conclusions. The results of this study indicate that the Korean version of the ODI is a reliable and valid instrument for the measurement of disability in Korean patients with lower back problems. The authors recommend this Korean version of the ODI for use in future clinical studies in Korea.

528 citations

Journal Article•DOI•

The six-minute walk test in healthy children: reliability and validity

Albert M. Li¹, J. Yin¹, C. W. Yu¹, T. Tsang¹, Hung K. So¹, E. Wong¹, Denise P. Chan¹, E. K. L. Hon¹, R. Sung¹•Institutions (1)

The Chinese University of Hong Kong¹

01 Jun 2005-European Respiratory Journal

TL;DR: In healthy children, the 6-min walk test is a reliable and valid functional test for assessing exercise tolerance and endurance, and Bland and Altman plots demonstrated a high degree of repeatability.

Abstract: The aim of this study was to assess the reliability and validity of the 6-min walk test (6MWT) in healthy children. Chinese secondary school students were randomly recruited. They attended the current authors' unit on two occasions, separated by 2 weeks. Physical examination and standardised maximum incremental exercise testing on a treadmill were performed on the first visit. Spirometry and 6MWT were carried out on the second visit. A randomly selected subgroup was invited to return for repeat 6MWT at an interval of 2–4 weeks. Seventy-eight subjects were recruited; however, four failed to achieve maximal effort on exercise test. The final group included 43 young females and the mean ± SD age of the subjects was 14.2 ± 1.2 yrs. Physical examination was unremarkable in all cases. The mean ± SD per cent predicted forced expiratory volume in one second was 91.4 ± 10.2%. Concurrent validity was demonstrated by good correlation between the 6-min walking distance and maximum oxygen uptake determined on the exercise treadmill. Test-retest reliability was undertaken in 52 subjects, and the intraclass correlation coefficient (95% confidence interval) was calculated as 0.94 (0.89–0.96). In addition, Bland and Altman plots demonstrated a high degree of repeatability. In healthy children, the 6-min walk test is a reliable and valid functional test for assessing exercise tolerance and endurance.

345 citations

Journal Article•DOI•

Assessing Depression in Primary Care with the PHQ‐9: Can It Be Carried Out over the Telephone?

Alejandra Pinto-Meza, Antoni Serrano-Blanco, Maria T. Peñarrubia, Elena Blanco, Josep Maria Haro

01 Aug 2005-Journal of General Internal Medicine

TL;DR: Telephone administration of the PHQ-9 seems to be a reliable procedure for assessing depression in PC, and its internal consistency was high and close to the self-administered one.

Abstract: BACKGROUND: Telephone assessment of depression for research purposes is increasingly being used. The Patient Health Questionnaire 9-item depression module (PHQ-9) is a well-validated, brief, self-reported, diagnostic, and severity measure of depression designed for use in primary care (PC). To our knowledge, there are no available data regarding its validity when administered over the telephone. OBJECTIVE: The aims of the present study were to evaluate agreement between self-administered and telephone-administered PHQ-9, to investigate possible systematic bias, and to evaluate the internal consistency of the telephone-administered PHQ-9. METHODS: Three hundred and forty-six participants from two PC centers were assessed twice with the PHQ-9. Participants were divided into 4 groups according to administration procedure order and administration procedure of the PHQ-9: Self-administered/Telephone-administered; Telephone-administered/Self-administered; Telephone-administered/Telephone-administered; and Self-administered/Self-administered. The first 2 groups served for analyzing the procedural validity of telephone-administered PHQ-9. The last 2 allowed a test-retest reliability analysis of both self- and telephone-administered PHQ-9. Intraclass correlation coefficient (ICC) and weighted κ (for each item) were calculated as measures of concordance. Additionally, Pearson’s correlation coefficient, Student’s t-test, and Cronbach’s α were analyzed. RESULTS: Intraclass correlation coefficient and weighted κ between both administration procedures were excellent, revealing a strong concordance between telephone- and self-administered PHQ-9. A small and clinically nonsignificant tendency was observed toward lower scores for the telephone-administered PHQ-9. The internal consistency of the telephone-administered PHQ-9 was high and close to the self-administered one. CONCLUSIONS: Telephone and in-person assessments by means of the PHQ-9 yield similar results. 
Thus, telephone administration of the PHQ-9 seems to be a reliable procedure for assessing depression in PC.
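
The item-level weighted κ mentioned in the methods can be sketched for two administrations of a single ordinal item. The function name `weighted_kappa`, the choice of linear versus quadratic disagreement weights (the `power` parameter), and the example scores are all assumptions made for illustration; the abstract does not state which weighting scheme was used.

```python
def weighted_kappa(ratings_a, ratings_b, categories, power=1):
    """Cohen's weighted kappa with disagreement weights |i - j| ** power.

    power=1 gives linear weights, power=2 quadratic weights.
    """
    index = {c: i for i, c in enumerate(categories)}
    m, n = len(categories), len(ratings_a)

    # Observed proportion of cases in each cell of the cross-table.
    observed = [[0.0] * m for _ in range(m)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1 / n

    row = [sum(observed[i]) for i in range(m)]
    col = [sum(observed[i][j] for i in range(m)) for j in range(m)]

    # Weighted disagreement actually observed vs. expected by chance.
    obs_dis = sum(abs(i - j) ** power * observed[i][j]
                  for i in range(m) for j in range(m))
    exp_dis = sum(abs(i - j) ** power * row[i] * col[j]
                  for i in range(m) for j in range(m))
    return 1 - obs_dis / exp_dis

# Made-up answers to one 0-3 item, self-administered vs. by telephone.
self_admin = [0, 1, 2, 3, 1, 2, 0, 3]
telephone = [0, 1, 2, 2, 1, 3, 0, 3]
print(round(weighted_kappa(self_admin, telephone, [0, 1, 2, 3]), 3))
```

Weighting matters for ordinal items because a disagreement of one category is penalised less than a disagreement of three.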

338 citations

Journal Article•DOI•

Translation into Brazilian Portuguese, cultural adaptation and evaluation of the reliability of the Disabilities of the Arm, Shoulder and Hand Questionnaire.

Adriana Garcia Orfale¹, Pola Maria Poli de Araújo¹, Marcos Bosi Ferraz¹, Jamil Natour¹•Institutions (1)

Federal University of São Paulo¹

01 Feb 2005-Brazilian Journal of Medical and Biological Research

TL;DR: The Portuguese version of the DASH is a reliable instrument, and the Ritchie Index showed a weak correlation with Brazilian DASH scores, while the visual analog scale of pain showed a good correlation with DASH score.

Abstract: The objective of the present study was to translate, adapt and validate a Brazilian Portuguese version of the Disabilities of the Arm, Shoulder and Hand (DASH) Questionnaire. The study was carried out in two steps. The first was to translate the DASH into Portuguese and to perform cultural adaptation and the second involved the determination of the reliability and validity of the DASH for the Brazilian population. For this purpose, 65 rheumatoid arthritis patients of either sex (according to the classification criteria of the American College of Rheumatology), ranging in age from 18 to 60 years and presenting no other diseases involving the upper limbs, were interviewed. The patients were selected consecutively at the rheumatology outpatient clinic of UNIFESP. The following results were obtained: in the first step (translation and cultural adaptation), all patients answered the questions. In the second step, Spearman's correlation coefficients for interobserver evaluation ranged from 0.762 to 0.995, values considered to be highly reliable. In addition, intraclass correlation coefficients ranged from 0.97 to 0.99, also highly reliable values. Spearman's correlation coefficients and the intraclass correlation coefficients obtained during intra-observer evaluation ranged from 0.731 to 0.937 and from 0.90 to 0.96, respectively, being highly reliable values. The Ritchie Index showed a weak correlation with Brazilian DASH scores, while the visual analog scale of pain showed a good correlation with DASH score. We conclude that the Portuguese version of the DASH is a reliable instrument.

297 citations

Journal Article•DOI•

Reliability and validity of three strength measures obtained from community-dwelling elderly persons.

Karen L. Schaubert¹, Richard W. Bohannon•Institutions (1)

University of Connecticut¹

01 Aug 2005-Journal of Strength and Conditioning Research

TL;DR: The authors' examination of the 3 measures for 12 weeks extends previous evidence of the stability of these strength measures and justifies the use of hand-held dynamometry and the STS test when investigating limitations in mobility.

Abstract: The purpose of this study was to describe the reliability and validity of 3 strength measures obtained from community-dwelling elderly individuals. The strength of 10 elders was tested initially and 6 and 12 weeks later using the MicroFET 2 hand-held dynamometer (knee extension strength), the Jamar dynamometer (grip strength), and the sit-to-stand (STS) test. Mobility was tested using the timed up-and-go (TUG) test and a timed walk test. Intraclass correlation coefficients, which were used to characterize the reliability of the strength tests, ranged from 0.807 to 0.981. Pearson correlations between the lower extremity strength measures and the TUG and gait speed ranged from 0.635 to -0.943. Our examination of the 3 measures for 12 weeks extends previous evidence of the stability of these strength measures and justifies the use of hand-held dynamometry and the STS test when investigating limitations in mobility.

260 citations

Journal Article•DOI•

Cross-cultural adaptation, reliability and validity of the resilience scale

Renata Pires Pesce¹, Simone Gonçalves de Assis¹, Joviana Quintes Avanci¹, Nilton César dos Santos¹, Juaci Vitória Malaquias¹, Raquel Carvalhaes¹•Institutions (1)

Oswaldo Cruz Foundation¹

01 Mar 2005-Cadernos De Saude Publica

TL;DR: The cross-cultural adaptation to Portuguese and the psychometric evaluation of the resilience scale developed by Wagnild & Young showed good results in the semantic equivalence for: general meaning and referential meaning and there was an inverse correlation with the scale that evaluates psychological violence.

Abstract: This study describes the cross-cultural adaptation to Portuguese and the psychometric evaluation of the resilience scale developed by Wagnild & Young. The scale was adapted for a sample of students from public schools in São Gonçalo, Rio de Janeiro, Brazil. Data from the pilot study (203 students interviewed at two points in time) and from the entire study (977) are presented. The cross-cultural adaptation showed good results in the semantic equivalence for: general meaning (above 90.0%) and referential meaning (above 85.0%). Cronbach alpha was 0.85 in the pilot study and 0.80 in the total sample. Kappa between the two points in time was regular and moderate, and the intraclass correlation coefficient was 0.746 (p = 0.000). Factorial analysis indicated three non-homogeneous factors. Construct validity demonstrated direct and significant correlation with self-esteem, family supervision, life satisfaction, and social support. There was an inverse correlation with the scale that evaluates psychological violence.

228 citations

Journal Article•DOI•

Test-Retest Reliability of Grip-strength Measures Obtained over a 12-week Interval from Community-dwelling Elders

Richard W. Bohannon¹, Karen L. Schaubert²•Institutions (2)

University of Connecticut¹, MedStar Washington Hospital Center²

01 Oct 2005-Journal of Hand Therapy

TL;DR: Measurements of hand-grip strength obtained from elders over a 12-week period are reliable, and test and retest measurements did not differ significantly over time on either side.

228 citations

Journal Article•DOI•

Test-retest reliability of the Short-Form McGill Pain Questionnaire: assessment of intraclass correlation coefficients and limits of agreement in patients with osteoarthritis.

Kate Grafton¹, Nadine E. Foster, Christine C. Wright•Institutions (1)

Sheffield Hallam University¹

01 Jan 2005-The Clinical Journal of Pain

TL;DR: The Short-Form McGill Pain Questionnaire was demonstrated to be a highly reliable measure of pain, but the results should not be generalized to a more elderly population, as increasing age was correlated with greater variability of the sensory component scores.

Abstract: Objectives: No previous study has adequately demonstrated the test-retest reliability of the Short-Form McGill Pain Questionnaire, yet it is increasingly being used as a measure of pain. This study evaluates the test-retest reliability in patients with osteoarthritis. Methods: A prospective, observational cohort study was undertaken using serial evaluation of 57 patients at 2 time points. A sample of patients awaiting primary hip or knee joint replacement surgery were recruited in clinic or via mail (mean age 64.8 years). Short-Form McGill Pain Questionnaires were delivered by mail 5 days apart, and a supplementary questionnaire was completed on the second occasion to explore if the patients’ pain report had remained stable. Results: The intraclass correlation coefficient was used as an estimate of reliability. For the total, sensory, affective, and average pain scores, high intra-class correlations were demonstrated (0.96, 0.95, 0.88, and 0.89, respectively). The current pain component demonstrated a lower intraclass correlation of 0.75. The coefficient of repeatability was calculated as an estimation of the minimum metrically detectable change. The coefficients of repeatability for the total, sensory, affective, average, and current pain components were 5.2, 4.5, 2.8, 1.4 cm, and 1.4, respectively. Discussion: Problems of adequate completion of the Short-Form McGill Pain Questionnaire were highlighted in this sample, and supervision via telephone contact was required. Patients recruited in clinic who had practiced completing the Short-Form McGill Pain Questionnaire demonstrated fewer errors than those recruited by mail. The Short-Form McGill Pain Questionnaire was demonstrated to be a highly reliable measure of pain. These results should not be generalized to a more elderly population, as increasing age was correlated with greater variability of the sensory component scores.

Journal Article•DOI•

The hand eczema severity index (HECSI): a scoring system for clinical assessment of hand eczema. A study of inter- and intraobserver reliability.

Elisabeth Held¹, Rikke Skoet¹, Jeanne D. Johansen¹, Tove Agner¹•Institutions (1)

University of Copenhagen¹

01 Feb 2005-British Journal of Dermatology

TL;DR: There is a need for a standardized clinical grading system for a more objective and accurate assessment of the severity of hand eczema (HE).

Abstract: Background: There is a need for a standardized clinical grading system for a more objective and accurate assessment of the severity of hand eczema (HE). Objectives: To develop and validate a scoring system called the hand eczema severity index (HECSI) designed for clinical assessment of HE. Methods: Twelve dermatologists (observers) assessed 15 HE patients twice, with an interval of 30 min. The study was performed blinded for the observers, and only the hands and wrists of the patients were visible to the observers. Agreement between the observers was determined by using the intraclass correlation coefficient (ICC), which is the correlation between (single) ratings of the same patient. Results: ICC for total HECSI score was 0·79 at the first assessment and 0·84 at the second assessment. ICC for intraobserver agreement was 0·90. Conclusions: Overall excellent agreement existed for both inter- and intraobserver reliability, and the scoring system is suggested for use in future clinical studies on HE. Because HECSI is an entirely objective assessment of clinical signs, additional inclusion of patient-rated symptoms should be considered.

Journal Article•DOI•

Unified Parkinson's disease rating scale motor examination: Are ratings of nurses, residents in neurology, and movement disorders specialists interchangeable?

Bart Post¹, Maruschka P. Merkus, Rob M.A. de Bie¹, Rob J. de Haan¹, Johannes D. Speelman¹•Institutions (1)

University of Amsterdam¹

01 Dec 2005-Movement Disorders

TL;DR: In this paper, the authors assessed the variability of the UPDRS motor examination (UPDRS-ME) of nurse practitioners, residents in neurology, and a movement disorders specialist (MDS) compared to a senior MDS.

Abstract: The Unified Parkinson's Disease Rating Scale (UPDRS) is widely used for the clinical evaluation of Parkinson's disease (PD). We assessed the rater variability of the UPDRS Motor examination (UPDRS-ME) of nurse practitioners, residents in neurology, and a movement disorders specialist (MDS) compared to a senior MDS. We assessed the videotaped UPDRS-ME of 50 PD patients. Inter-rater and intra-rater variability were estimated using weighted kappa (kappa(w)) and intraclass correlation coefficients (ICC). Additionally, inter-rater agreement was quantified by calculation of the mean difference between 2 raters and its 95% limits of agreement. Intra-rater agreement was also estimated by calculation of 95% repeatability limits. The kappa(w) and ICC statistics indicated good to very good inter-rater and intra-rater reliability for the majority of individual UPDRS items and the sum score of the UPDRS-ME in all raters. However, for inter-rater agreement, it appeared that the nurses, the residents, and the MDS all consistently assigned higher scores than the senior MDS. Mean differences ranged between 1.7 and 5.4 (all differences P < 0.05), with rather wide 95% limits of agreement. The intra-rater 95% repeatability limits were rather wide. We found considerable rater differences across the whole range of UPDRS-ME scores between a senior MDS and nurse practitioners, residents in neurology, and the MDS. This finding suggests that the amount by which raters may disagree should be quantified before starting longitudinal studies of disease progression or clinical trials. Finally, evaluation of rater agreement should always include the assessment of the extent of bias between different raters.
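
The point that a high ICC can coexist with systematic bias between raters is easy to see in a small Bland and Altman style calculation. The sketch below is illustrative only: `limits_of_agreement` is a hypothetical helper, the limits are taken as mean difference ± 1.96 × SD of the differences, and the scores are invented rather than taken from the study.

```python
from statistics import mean, stdev

def limits_of_agreement(rater_a, rater_b):
    """Return (bias, lower limit, upper limit) for paired ratings."""
    diffs = [a - b for a, b in zip(rater_a, rater_b)]
    bias = mean(diffs)          # systematic difference between the raters
    spread = 1.96 * stdev(diffs)  # half-width of the 95% limits
    return bias, bias - spread, bias + spread

# Made-up sum scores: a resident vs. a senior specialist on four patients.
resident = [12, 22, 31, 42]
senior = [10, 20, 30, 40]
bias, lower, upper = limits_of_agreement(resident, senior)
print(bias, lower, upper)
```

In this toy data the two raters rank the patients identically, yet the resident scores consistently higher, which is the kind of bias the abstract argues should be assessed alongside kappa and the ICC.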

Journal Article•DOI•

Agreement between child self-report and parent proxy-report to evaluate quality of life in children with cancer

Pi Chen Chang¹, Chao Hsing Yeh²•Institutions (2)

Taipei Medical University¹, Chang Gung University²

01 Feb 2005-Psycho-oncology

TL;DR: The results suggest that when younger children cannot complete a QOL assessment because of developmental limitations or severity of illness, parents can provide valid information about their QOL. However, parent proxy-reports of QOL for adolescents differ significantly from self-reports, so proxy data for adolescents should be used with caution.

Abstract: Assessment of children's quality of life (QOL) is a special challenge for clinicians and researchers because the cognitive abilities of children vary widely across ages and illness levels. In addition, statistical strategies reported to evaluate proxy agreement have been inconclusive. The specific aim of this study was to examine agreement between child self-reports and parent proxy-reports to evaluate QOL in a sample of pediatric cancer patients. Previously tested QOL instruments (Quality of Life for Children with Cancer, QOLCC) were completed by 141 patients (82 children and 59 adolescents) and 141 of their parents. Three different statistical approaches were employed to evaluate convergence of self-report and proxy-report: product-moment correlation coefficient, intraclass correlation (ICC), and comparison of group means. In addition, scatter bias was used to examine the degree of differences across the range of measurement. Our findings indicate that neither the Pearson product-moment correlation, the ICC, nor the group difference provided enough information to detect individual differences in measures of QOL. We found that scatter bias should be added as a supplement to quantify the degree of individual-level differences. The results suggest that when younger children are not able to complete a QOL assessment because of developmental limitations or severity of illness, parents can provide valid information about their QOL. However, parent proxy-reports of QOL for adolescents provide significantly different information than self-reports, and proxy data of QOL for adolescents should be used with caution.

Journal Article•DOI•

The L Test of Functional Mobility: Measurement Properties of a Modified Version of the Timed “Up & Go” Test Designed for People With Lower-Limb Amputations

A Barry Deathe¹, William C. Miller²•Institutions (2)

University of Western Ontario¹, University of British Columbia²

01 Jul 2005-Physical Therapy

TL;DR: The L Test is a 20-m test of basic mobility skills that includes 2 transfers and 4 turns that demonstrated excellent measurement properties in this study.

Abstract: Background and Purpose. Walk tests provide essential outcome information when assessing ambulation of individuals with lower-limb amputation and a prosthetic device. Existing tests have limitations such as ceiling effects or insufficient challenge. The objective of this study was to assess the reliability and validity of data for a clinical measure of basic mobility, the L Test of Functional Mobility (L Test). Subjects. For this methodological study, 93 people with unilateral amputations (74% transtibial, 26% transfemoral; 78% male, 22% female; mean age=55.9 years) were consecutively recruited from an outpatient clinic. Twenty-seven subjects returned for retesting. Methods. To assess concurrent validity, subjects completed the L Test, Timed “Up & Go” Test (TUG), 10-Meter Walk Test, and 2-Minute Walk Test, followed by the Activities-specific Balance Confidence scale, Frenchay Activities Index (FAI), and mobility subscale of the Prosthetic Evaluation Questionnaire (PEQ-MS). Amputation cause and level, walking aid use, automatic stepping, and age variables were used to assess discriminant validity. Results. Intraclass correlation coefficients were .96 for interrater reliability and .97 for intrarater reliability, and minimal bias existed upon retesting. The magnitude of concurrent validity correlations ( r ) was very high between the L Test data and data for other walk tests and fair to moderate between the L Test data and data for self-report measures. The L Test discriminated between all groups as hypothesized. Discussion and Conclusion. The L Test is a 20-m test of basic mobility skills that includes 2 transfers and 4 turns. It demonstrated excellent measurement properties in this study.

Journal Article•DOI•

Reliability and validity of the short form of the child health questionnaire for parents (CHQ-PF28) in large random school based and general population samples

Hein Raat¹, Anita M Botterweck¹, Jeanne M. Landgraf, W Christina Hoogeveen, Marie-Louise Essink-Bot•Institutions (1)

Erasmus University Rotterdam¹

01 Jan 2005-Journal of Epidemiology and Community Health

TL;DR: This study showed that the CHQ-PF28 resulted in score distributions and discriminative validity that are comparable to those of its longer counterpart, but that the internal consistency of most individual scales was low.

Abstract: Study objectives: This study assessed the feasibility, reliability, and validity of the 28 item short child health questionnaire parent form (CHQ-PF28) containing the same 13 scales, but only a subset of the items in the widely used 50 item CHQ-PF50. Design: Questionnaires were sent to a random regional sample of 2040 parents of schoolchildren (4–13 years); in a random subgroup test-retest reliability was assessed (n = 234). Additionally, the study assessed CHQ-PF28 score distributions and internal consistencies in a nationwide general population sample of (parents of) children aged 4–11 (n = 2474) from Statistics Netherlands. Main results: Response was 70%. In the school and general population samples seven scales showed ceiling effects. Both CHQ summary measures and one multi-item scale showed adequate internal consistency in both samples (Cronbach’s α>0.70). One summary measure and one scale showed excellent test-retest reliability (intraclass correlation coefficient >0.70); seven scales showed moderate test-retest reliability (intraclass correlation coefficient 0.50–0.70). The CHQ could discriminate between a subgroup with no parent reported chronic conditions (n = 954) and subgroups with asthma (n = 134), frequent headaches (n = 42), and with problems with hearing (n = 38) (Cohen’s effect sizes 0.12–0.92; p ...). Conclusions: This study showed that the CHQ-PF28 resulted in score distributions and discriminative validity that are comparable to those of its longer counterpart, but that the internal consistency of most individual scales was low. In community health applications, the CHQ-PF28 may be an acceptable alternative for the longer CHQ-PF50 if the summary measures suffice and reliable estimates of each separate CHQ scale are not required.

Journal Article•DOI•

Intraclass correlation coefficients for cluster randomized trials in primary care: the cholesterol education and research trial (CEART).

Donna R. Parker¹, Evangelos Evangelou¹,², Charles B. Eaton¹,²•Institutions (2)

Memorial Hospital of Rhode Island¹, Brown University²

01 Apr 2005-Contemporary Clinical Trials

TL;DR: It is suggested that, compared with simple randomization, cluster randomization may substantially increase the sample size necessary to maintain adequate statistical power for selected outcomes such as diastolic blood pressure, whereas for most outcomes evaluated in this study the design effect is small to moderate.
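The link between the ICC and sample size in a cluster randomized trial is usually expressed through the design effect, 1 + (m - 1) × ICC, where m is the mean cluster size. The sketch below applies that standard formula for equal-sized clusters; the function names and all numbers are hypothetical and are not taken from the CEART paper.

```python
import math


def design_effect(mean_cluster_size, icc):
    """Variance inflation factor for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1.0 + (mean_cluster_size - 1.0) * icc


def clustered_sample_size(n_simple, mean_cluster_size, icc):
    """Inflate a simple-randomization sample size by the design effect."""
    inflated = n_simple * design_effect(mean_cluster_size, icc)
    # Round away floating-point noise before taking the ceiling.
    return math.ceil(round(inflated, 9))


# Hypothetical inputs: 21 patients per practice, ICC of 0.05,
# and 400 patients required under simple randomization.
deff = design_effect(21, 0.05)
n_needed = clustered_sample_size(400, 21, 0.05)
```

With these made-up inputs the design effect is 2.0, so a trial that needs 400 patients under simple randomization needs 800 under cluster randomization.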

Journal Article•DOI•

Development and validation of a lower-extremity activity scale: Use for patients treated with revision total knee arthroplasty

Khaled J. Saleh¹, Kevin J. Mulhall¹, Boris Bershadsky, Hassan M.K. Ghomrawi², Laura E. White³, Cathy M. Buyea³, Kenneth A. Krackow³•Institutions (3)

University of Virginia¹, University of Minnesota², University at Buffalo³

01 Sep 2005-Journal of Bone and Joint Surgery, American Volume

TL;DR: A lower-extremity activity scale was responsive, accurately reflecting changes in the patient's condition between baseline and the time of follow-up, and it will become a useful, practical adjunct to objective clinical decision-making and intervention for patients undergoing arthroplasty.

Abstract: Background: Valid outcome measurement tools are required to reliably demonstrate the effectiveness and clinical outcomes of lower-extremity arthroplasty. Having ascertained a lack of a practical and valid measure of the change in actual daily physical activity that occurs prior to and following lower-limb arthroplasty, we developed and validated a lower-extremity activity scale. Methods: The eighteen-level self-administered scale was developed with the aid of content experts to ensure face validity. Validity and reliability were assessed with the use of (1) pedometer measurements of seventy subjects over seven days; (2) next-of-kin proxy measurements of the activity levels of ninety patients before they underwent lower-limb arthroplasty; and (3) application, and correlation with the Western Ontario and McMaster Universities Osteoarthritis Index scores, in a prospective seventeen-center clinical study of 297 consecutive patients undergoing revision total knee arthroplasty. In this latter study, demographic and comorbidity data were also collected. Univariate and bivariate correlations were performed, and a multivariate structured equation modeling approach was used to further test responsiveness, reliability, and validity of the lower-extremity activity scale. Results: Pedometer readings correlated with the activity levels derived with the lower-extremity activity scale (r = 0.79). Of note was the finding that age, weight, and body mass index did not correlate well with the average number of steps per day (r = -0.32, -0.32, and -0.25, respectively). A significant correlation was found between the lower-extremity activity scores recorded by the patients and those reported by their next of kin (Pearson correlation, r = 0.715; p = 0.0001) and between the initial lower-extremity activity scores and two-week-retest scores (intraclass correlation = 0.9147; p < 0.0001), demonstrating the validity and reliability of the scale. 
The lower-extremity activity scale was responsive, accurately reflecting changes in the patient's condition between baseline and the time of follow-up (p < 0.001), and it was reliable, with baseline values correlating with follow-up scores (p < 0.001). The convergent validity of the lower-extremity activity scale was established by correlations with the function scores (r = -0.301, p < 0.001) and pain scores (r = -0.241, p < 0.001) derived with the Western Ontario and McMaster Universities Osteoarthritis Index and with a higher number of comorbidities (r = -0.244, p < 0.001). Multivariate path modeling further demonstrated diminished activity in patients who had more difficulty in functioning and a greater number of comorbidities. Conclusions: We developed a lower-extremity activity scale and validated that it was an effective instrument for the assessment of patients' actual activity levels. It is easy to apply and interpret, and it is valid and ready for use in the clinical setting. This scale will allow more accurate analysis and prediction of outcomes. Consequently, it will become a useful, practical adjunct to objective clinical decision-making and intervention for patients undergoing arthroplasty.

Journal Article•DOI•

Reliability and validity of adapted Turkish version of scoliosis research society-22 (SRS-22) questionnaire

Ahmet Alanay¹, Akin Cil, Haluk Berk, R Emre Acaroglu, Muharrem Yazici, Omer Akcali, Can Kosay, Yasemin Genç, Adil Surat•Institutions (1)

Hacettepe University¹

01 Nov 2005-Spine

TL;DR: This study demonstrated that, if measures are to be used across cultures, the items must not only be translated well linguistically but also must be culturally adapted to maintain the content validity of the instrument at a conceptual level across different cultures.

Abstract: Study design. Outcome study to determine the internal consistency and validity of the adapted Turkish version of the Scoliosis Research Society-22 (SRS-22) instrument. Objectives. To evaluate the validity and reliability of the adapted Turkish version of the SRS-22 questionnaire. Summary of background data. The SRS-22 questionnaire is widely accepted for assessing the health-related quality of life of scoliotic patients in the United States. However, its adaptation in languages other than the source language is necessary for its multinational use. Methods. Translation/retranslation of the English version of the SRS-22 was done, and all steps for the cross-cultural adaptation process were performed properly by an expert committee. Later, SRS-22 questionnaires and previously validated Short Form-36 (SF-36) outcome instruments were mailed to 82 patients who had been surgically treated for idiopathic scoliosis. All patients had a minimum of 2 years follow-up. Fifty-four patients (66%) responded to the first set of questionnaires. Forty-seven of the first-time respondents returned their second survey. The average age of the 47 patients (12 male, 35 female) was 19.8 years (range, 14-31 years). The two measures of reliability, internal consistency and reproducibility, were determined by Cronbach alpha statistics and the intraclass correlation coefficient, respectively. Concurrent validity was measured by comparing with an already validated questionnaire (SF-36). Measurement was made using the Pearson correlation coefficient (r). Results. The study demonstrated satisfactory internal consistency with high Cronbach alpha values for four of the corresponding domains (pain, 0.72; self-image, 0.80; mental health, 0.72; and satisfaction, 0.83). However, the Cronbach alpha value for the function/activity domain (0.48) was considerably lower than that of the original questionnaire. The intraclass correlation coefficient for the same domains was 0.80, 0.82, 0.78, 0.81, and 0.76, respectively, demonstrating a satisfactory test/retest reproducibility. Considering concurrent validity, two domains had excellent correlation (r = 0.75-1), while 9 had good correlation (r = 0.50 to 0.75), and 6 had moderate correlation (r = 0.25-0.50). Based on these results, question 18 in the function/activity domain with the lower Cronbach alpha value was revised while question 15 was excluded. The revised SRS-22 was given to 30 adolescent idiopathic scoliosis patients not included in the index study. The revision improved the Cronbach alpha value for the function/activity domain from 0.48 to 0.81. Conclusion. This study demonstrated that, if measures are to be used across cultures, the items must not only be translated well linguistically but also must be culturally adapted to maintain the content validity of the instrument at a conceptual level across different cultures. This may necessitate several validation studies to ensure and improve consistency in the content and face validity between source and target versions of a questionnaire due to difficulty in detecting subtle differences in the living habits of different cultures.

Journal Article•DOI•

Implications for the use of postural analysis as a clinical diagnostic tool: reliability of quantifying upright standing spinal postures from photographic images.

Nadine M. Dunk¹, Jennifer Lalonde¹, Jack P. Callaghan¹•Institutions (1)

University of Waterloo¹

01 Jul 2005-Journal of Manipulative and Physiological Therapeutics

TL;DR: Although the repeatability of posture was improved in the sagittal view, when a biological measure was used instead of an external vertical reference to calculate spinal angles, individual subject posture was still variable and brings into question the effectiveness and validity of using surface skin markers to track postural changes due to clinical interventions.

Journal Article•DOI•

OAKHQOL: a new instrument to measure quality of life in knee and hip osteoarthritis.

Anne-Christine Rat, Joël Coste, Jacques Pouchot, Michèle Baumann, E. Spitz¹, Nathalie Retel-Rude², Janine-Sophie Giraudet Le Quintrec, Dominique Dumont-Fischer, Francis Guillemin•Institutions (2)

Metz¹, University of Franche-Comté²

01 Jan 2005-Journal of Clinical Epidemiology

TL;DR: The OAKHQOL is the first specific knee and hip OA quality of life instrument that meets psychometric requirements for validity and reliability and followed an a priori structured strategy to ensure content validity.

Journal Article•DOI•

Multilevel modeling of a clustered continuous outcome: nurses' work hours and burnout.

Sunhee Park¹, Eileen T. Lake•Institutions (1)

University of Pennsylvania¹

01 Nov 2005-Nursing Research

TL;DR: Multilevel models provide a more accurate and comprehensive description of relationships in clustered data than do conventional models, by correcting underestimated standard errors, by estimating components of variance at several levels, and by estimating cluster-specific intercepts and slopes.

Abstract: Background. Multilevel models were designed to analyze data generated from a nested structure (e.g., nurses within hospitals) because conventional linear regression models underestimate standard errors and, in turn, overestimate test statistics. Objectives. To introduce 2 types of multilevel models, the random intercept model and the random coefficient model, to describe the correlation among observations within a cluster, and to demonstrate how to identify the superior model. Method. The conceptual and mathematical bases for the 2 multilevel model types are presented. Intraclass correlation is defined and assessment of model fit is detailed. An empirical example is presented in which average work hours per week and burnout are analyzed using data from 4,320 staff nurses clustered in 19 hospitals. Results. Average work hours were positively associated with nurse burnout. The multilevel models corrected the problem of underestimated standard errors in conventional linear regression models. Graphs displaying the hospital-level differences illustrated the 2 multilevel model types. Although the multilevel models corrected the underestimation of standard errors, the results did not differ substantively for the conventional or the 2 multilevel models. The intraclass correlation coefficient was .044, indicating that the extent of shared variance among nurses in a hospital was low. The random intercept model fit the data better than did the random coefficient model. Conclusions. Multilevel models provide a more accurate and comprehensive description of relationships in clustered data than do conventional models, by correcting underestimated standard errors, by estimating components of variance at several levels, and by estimating cluster-specific intercepts and slopes.
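In clustered data the ICC is the share of total variance that lies between clusters. The authors estimate it from a multilevel model; the sketch below swaps in the simpler one-way ANOVA estimator for balanced clusters, (MSB - MSW) / (MSB + (m - 1) × MSW), purely to show the idea. The function name `icc_one_way` and the scores are hypothetical.

```python
def icc_one_way(clusters):
    """One-way ANOVA ICC for equal-sized clusters (list of lists of scores)."""
    g = len(clusters)       # number of clusters, e.g. hospitals
    m = len(clusters[0])    # observations per cluster, e.g. nurses
    grand = sum(sum(c) for c in clusters) / (g * m)
    means = [sum(c) / m for c in clusters]

    # Between-cluster and within-cluster mean squares.
    ms_between = m * sum((mu - grand) ** 2 for mu in means) / (g - 1)
    ms_within = sum(
        (x - mu) ** 2 for c, mu in zip(clusters, means) for x in c
    ) / (g * (m - 1))

    return (ms_between - ms_within) / (ms_between + (m - 1) * ms_within)


# Hypothetical burnout scores for three hospitals with three nurses each.
hospitals = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
icc = icc_one_way(hospitals)
```

These made-up hospitals differ far more from each other than nurses differ within a hospital, so the ICC is high (26/29, about 0.90). The .044 reported in the abstract reflects the opposite situation, where most variance lies within hospitals.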

Journal Article•DOI•

Intraclass correlation coefficient and outcome prevalence are associated in clustered binary data

Martin Gulliford¹, Geoffrey Adams¹, Obioha C Ukoumunne¹, Radoslav Latinovic¹, Susan Chinn¹, Michael J. Campbell²•Institutions (2)

King's College London¹, University of Sheffield²

01 Mar 2005-Journal of Clinical Epidemiology

TL;DR: The prevalence of an outcome may be used to make an informed assumption about the magnitude of the intraclass correlation coefficient in a range of outcomes in community and health services settings.

Journal Article•DOI•

Validation of the Restless Legs Syndrome Quality of Life questionnaire.

Linda Abetz, Susan M. Vallow¹, Jeff Kirsch², Richard P. Allen³, Tinna Washburn³, Christopher J. Earley³•Institutions (3)

Janssen Pharmaceutica¹, GlaxoSmithKline², Johns Hopkins University³

01 Mar 2005-Value in Health

TL;DR: The RLSQoL is a valid and reliable measure of the impact of RLS on QoL and is responsive to short-term changes in symptom severity and appears to be an appropriate tool for trial-based assessments of treatments for RLS.

Journal Article•DOI•

The reproducibility of the Canadian Occupational Performance Measure

Isaline C J M Eyssen¹, A. Beelen¹, Christine Dedding, M. Cardol, Joost Dekker¹•Institutions (1)

VU University Amsterdam¹

01 Aug 2005-Clinical Rehabilitation

TL;DR: The reproducibility of the mean performance and satisfaction scores was moderate, but it was poor for the scores of the separate problems; therefore, the mean scores should be used for individual assessment.

Abstract: Objective: To assess the reproducibility (reliability and inter-rater agreement) of the client-centred Canadian Occupational Performance Measure (COPM). Design: The COPM was administered twice, with a mean interval of seven days (SD 1.6, range 4-14), by two different occupational therapists. Data analysis was based on intraclass correlation coefficients, the Bland and Altman method and Cohen's weighted kappas. Setting: Occupational therapy departments of two university medical centres. Subjects: Consecutive clients, with various diagnoses, newly referred to the outpatient clinic of two occupational therapy departments, were included. They were all over 18 years of age and perceived limitations in more than one activity of daily life. Complete data on 95 clients were obtained: 31 men and 64 women. Results: Sixty-six per cent of the activities prioritized at the first assessment were also prioritized at the second assessment. The intraclass correlation coefficients were 0.67 (95% confidence interval (CI) 0.54-0...

Journal Article•DOI•

Reliability of simple portable tests of physical performance in older people after hip fracture

Catherine Sherrington¹, Stephen R. Lord²•Institutions (2)

Bankstown Lidcombe Hospital¹, Prince of Wales Medical Research Institute²

01 May 2005-Clinical Rehabilitation

TL;DR: The test–retest reliability of a number of simple measures of physical performance is excellent when used with older people following hip fracture.

Abstract: Objective: To investigate the test–retest reliability of measures of strength, balance, gait and functional performance when used with older people following hip fracture. Subjects: Thirty people (16 hospital inpatients and 14 community dwellers). Design: Subjects underwent two assessments: one day apart for the hospital inpatients and one week apart for the community dwellers. Measurement: Strength (dynamometer, sphygmomanometer, spring balance, lateral step-up ability), balance (sway-meter, Functional Reach Test, single leg stance time, Step Test), gait (timed 6-m walk with steps taken, base of support and step length), and functional performance (PPME total score and timed supine-to-sit and sit-to-stand) were measured. Results: Eleven of the 14 continuously scaled measurement tools achieved excellent reliability (intraclass correlation coefficient (ICC)>0.75) for one or more tests. A hand-held dynamometer was found to be the tool with the highest test–retest reliability for measuring hip muscle strength (I...

Journal Article•DOI•

Continuous-scale physical functional performance test: validity, reliability, and sensitivity of data for the short version.

M. Elaine Cress¹, John K. Petrella², Trudy L. Moore¹, Margaret Schenkman³•Institutions (3)

University of Georgia¹, University of Alabama at Birmingham², Anschutz Medical Campus³

01 Apr 2005-Physical Therapy

TL;DR: The PFP-10 yields valid, reliable, and sensitive measurements and can be confidently substituted for the CS-PFP.

Abstract: Background and Purpose. The Continuous-Scale Physical Functional Performance Test (CS-PFP) can be used to obtain valid, reliable, and sensitive measurements of physical functional capacity. This test requires a fixed laboratory space and approximately 1 hour to administer. This study was carried out in 4 steps, or substudies, to develop and validate a short, community-based version (PFP-10) that requires less space and equipment than the CS-PFP. Subjects and Methods. Retrospective data (n=228) and prospective data (n=91) on men and women performing the CS-PFP or the PFP-10 are reported. A 12-week exercise program was used to examine sensitivity to change. Data analyses were done using paired t -test, Pearson correlation, intraclass correlation coefficient (ICC), and delta index (DI) procedures. Results. The PFP-10 total score and 4 of the 5 domain scores were statistically similar (within 3%) to those of the CS-PFP. The PFP-10 upper-body strength domain score was 17% lower, but was highly correlated (ICC=.97). Community and established laboratory PFP-10 scores were similar (ICC=.85–.97). The PFP-10 also is sensitive to change (DI=.21–.54). Discussion and Conclusion. The PFP-10 yields valid, reliable, and sensitive measurements and can be confidently substituted for the CS-PFP.

Journal Article•DOI•

Performance of health status measures with a pen based personal digital assistant

T K Kvien, Petter Mowinckel, Turid Heiberg, K L Dammann, Ø Dale, G J Aanerud, T N Alme, Till Uhlig

01 Oct 2005-Annals of the Rheumatic Diseases

TL;DR: The clinimetric performance of paper/pencil versions of self reported health status measures was similar to an electronic version, using an inexpensive PDA.

Abstract: Background: Increasing use of self reported health status in clinical practice and research, as well as patient appreciation of monitoring fluctuations of health over time, suggest a need for more frequent collection of data. Electronic use of health status measures in the follow up of patients is a possible way to achieve this. Objective: To compare self reported health status measures in a personal digital assistant (PDA) version and a paper/pencil version for test–retest reliability, agreement between scores, and feasibility. Methods: 30 patients with stable rheumatoid arthritis (mean age 61.6 years, range 49.8 to 70.0; mean disease duration, 16.7 years; 63% female; 67% rheumatoid factor positive; 46.6% on disease modifying antirheumatic drugs) completed self reported health status measures (pain, fatigue, and global health on visual analogue scales (VAS), rheumatoid arthritis disease activity index, modified health assessment questionnaire, SF-36) in a conventional paper based questionnaire version and on a PDA (HP iPAQ, model h5450). Completion was repeated after five to seven days. Results: Test–retest reliability was similar, as evaluated by the Bland–Altman approach, the coefficient of variation, and intraclass correlation coefficients. The scores showed acceptable agreement, but with a slight tendency to higher scores on VAS with the PDA than the paper/pencil version. No significant differences were seen for measures of feasibility (time to complete, satisfaction score), but 65.5% preferred PDA, 20.7% preferred paper, and 13.8% had no preference. Conclusions: The clinimetric performance of paper/pencil versions of self reported health status measures was similar to an electronic version, using an inexpensive PDA.
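Agreement between the two versions is assessed here with the Bland–Altman approach, which reports the mean difference (bias) and 95% limits of agreement, bias ± 1.96 × SD of the differences. The sketch below shows that calculation only; the function name `bland_altman` and the paired scores are hypothetical and are not data from the study.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    # 95% limits of agreement, assuming roughly normal differences.
    half_width = 1.96 * stdev(diffs)
    return bias, bias - half_width, bias + half_width


# Hypothetical pain VAS scores from the same patients on a PDA and on paper.
pda_scores = [10.0, 12.0, 14.0, 16.0]
paper_scores = [11.0, 11.0, 15.0, 15.0]
bias, lower, upper = bland_altman(pda_scores, paper_scores)
```

A bias near zero with narrow limits indicates that the two formats can be used interchangeably. A consistently positive bias would correspond to the slight tendency toward higher PDA scores that the abstract describes.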

Journal Article•DOI•

Comparison of preference-based utilities of the 15D, EQ-5D and SF-6D in patients with HIV/AIDS

Knut Stavem¹, Stig S. Frøland, Kjell Block Hellum¹•Institutions (1)

Akershus University Hospital¹

01 May 2005-Quality of Life Research

TL;DR: The different measures gave different utility values, although many of the measurement properties were similar, and there was no evidence for better discriminative capacity or responsiveness for the 15D than for the two other multiattribute measures.

Abstract: Objective: This article compares preference-based utilities from the multiattribute utility instrument 15D with those derived from the EQ-5D and the Short Form 36 (SF-6D) in patients with HIV/AIDS. In particular, we wanted to examine if the finer descriptive system of the 15D would result in better discriminative capacity or responsiveness. Methods: In a prospective observational study of 60 Norwegian patients with HIV/AIDS from two hospitals, the authors compared scores, assessed associations with disease staging systems, and assessed test–retest reliability and responsiveness of the instruments. Results: On average, the 15D gave higher utility scores than the other two measures, the mean utility scores were: 15D – 0.86, SF-6D – 0.73, and EQ-5D Index – 0.77. Test-retest reliability was acceptable for all measures, with intraclass correlation coefficients between 0.78 and 0.94. The correlation between scores of the 3 scales was substantial (ρ=0.74–0.80). There was no major difference in responsiveness between the measures. Conclusions: The different measures gave different utility values in this sample of patients with HIV/AIDS, although many of the measurement properties were similar. There was no evidence for better discriminative capacity or responsiveness for the 15D, than for the two other multiattribute measures.

Journal Article•DOI•

Validity of parent ratings as proxy measures of pain in children with cognitive impairment.

Terri Voepel-Lewis¹, Shobha Malviya¹, Alan R. Tait¹•Institutions (1)

University of Michigan¹

01 Dec 2005-Pain Management Nursing

TL;DR: It is suggested that parents of children with cognitive impairment (CI) provide reasonable estimates of their child's pain, particularly when using a structured pain tool, although they may tend to overestimate their child's pain during the early postoperative period.