
Showing papers on "Intraclass correlation" published in 2005


Journal ArticleDOI
TL;DR: In this review, the basics of classic reliability theory are addressed in the context of choosing and interpreting an ICC and how the SEM and its variants can be used to construct confidence intervals for individual scores and to determine the minimal difference needed to be exhibited for one to be confident that a true change in performance of an individual has occurred.
Abstract: Reliability, the consistency of a test or measurement, is frequently quantified in the movement sciences literature. A common metric is the intraclass correlation coefficient (ICC). In addition, the SEM, which can be calculated from the ICC, is also frequently reported in reliability studies. However, there are several versions of the ICC, and confusion exists in the movement sciences regarding which ICC to use. Further, the utility of the SEM is not fully appreciated. In this review, the basics of classic reliability theory are addressed in the context of choosing and interpreting an ICC. The primary distinction between ICC equations is argued to be one concerning the inclusion (equations 2,1 and 2,k) or exclusion (equations 3,1 and 3,k) of systematic error in the denominator of the ICC equation. Inferential tests of mean differences, which are performed in the process of deriving the necessary variance components for the calculation of ICC values, are useful to determine if systematic error is present. If so, the measurement schedule should be modified (removing trials where learning and/or fatigue effects are present) to remove systematic error, and ICC equations that only consider random error may be safely used. The use of ICC values is discussed in the context of estimating the effects of measurement error on sample size, statistical power, and correlation attenuation. Finally, calculation and application of the SEM are discussed. It is shown how the SEM and its variants can be used to construct confidence intervals for individual scores and to determine the minimal difference needed to be exhibited for one to be confident that a true change in performance of an individual has occurred.

3,992 citations
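
The distinction the abstract draws between ICC equations 2,1 and 3,1 can be illustrated with a small sketch. It assumes the standard Shrout and Fleiss two-way ANOVA formulas, which the abstract does not spell out; the function name `icc_two_way` and the toy data are hypothetical.

```python
from typing import Sequence, Tuple


def icc_two_way(scores: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Return (ICC(2,1), ICC(3,1)) for scores[i][j] = subject i, trial j.

    Assumes the Shrout and Fleiss mean-square formulas (not taken from the
    abstract itself) and a complete subjects-by-trials table.
    """
    n = len(scores)        # number of subjects
    k = len(scores[0])     # number of trials (or raters)
    grand = sum(sum(row) for row in scores) / (n * k)
    subject_means = [sum(row) / k for row in scores]
    trial_means = [sum(scores[i][j] for i in range(n)) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_trials = n * sum((m - grand) ** 2 for m in trial_means)
    ss_error = ss_total - ss_subjects - ss_trials

    bms = ss_subjects / (n - 1)              # between-subjects mean square
    tms = ss_trials / (k - 1)                # between-trials mean square
    ems = ss_error / ((n - 1) * (k - 1))     # residual (random error) mean square

    # ICC(2,1): systematic (trial) error is included in the denominator.
    icc_2_1 = (bms - ems) / (bms + (k - 1) * ems + k * (tms - ems) / n)
    # ICC(3,1): only random error is included in the denominator.
    icc_3_1 = (bms - ems) / (bms + (k - 1) * ems)
    return icc_2_1, icc_3_1


# Toy data: every subject scores exactly 2 units higher on trial 2,
# i.e. pure systematic error and no random error.
toy = [[1, 3], [2, 4], [3, 5], [4, 6]]
icc21, icc31 = icc_two_way(toy)
print(round(icc21, 3), round(icc31, 3))
```

With these toy scores ICC(3,1) stays at 1 while ICC(2,1) drops, because only the latter penalizes the constant shift between trials.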


Journal ArticleDOI
TL;DR: A comparison of item-reduction approaches suggested that the retention of clinically sensible and important content produced a comparable, if not slightly better, instrument than did more statistically driven approaches.
Abstract: Background: The purpose of this study was to develop a short, reliable, and valid measure of physical function and symptoms related to upper-limb musculoskeletal disorders by shortening the full, thirty-item DASH (Disabilities of the Arm, Shoulder and Hand) Outcome Measure. Methods: Three item-reduction techniques were used on the cross-sectional field-testing data derived from a study of 407 patients with various upper-limb conditions. These techniques were the concept-retention method, the equidiscriminative item-total correlation, and the item response theory (Rasch modeling). Three eleven-item scales were created. Data from a longitudinal cohort study in which the DASH questionnaire was administered to 200 patients with shoulder and wrist/hand disorders were then used to assess the reliability (Cronbach alpha and test-retest reliability) and validity (cross-sectional and longitudinal construct) of the three scales. Results were compared with those derived with the full DASH. Results: The three versions were comparable with regard to their measurement properties. All had a Cronbach alpha of ≥0.92 and an intraclass correlation coefficient of ≥0.94. Evidence of construct validity was established (r ≥ 0.64 with single-item indices of pain and function). The concept-retention method, the most subjective of the approaches to item reduction, ranked highest in terms of its similarity to the original DASH. Conclusions: The concept-retention version is named the QuickDASH. It contains eleven items and is similar with regard to scores and properties to the full DASH. A comparison of item-reduction approaches suggested that the retention of clinically sensible and important content produced a comparable, if not slightly better, instrument than did more statistically driven approaches. Clinical Relevance: The QuickDASH is a more efficient version of the DASH outcome measure that appears to retain its measurement properties.

1,429 citations


Journal ArticleDOI
TL;DR: These commonly used gait performance tests are highly reliable and can be recommended to evaluate improvements in various aspects of gait performance in individuals with chronic mild to moderate hemiparesis after stroke.
Abstract: Objective: To assess the reliability of 6 gait performance tests in individuals with chronic mild to moderate post-stroke hemiparesis. Design: An intra-rater (between occasions) test-retest reliability study. Subjects: Fifty men and women (mean age 58 ± 6.4 years) 6–46 months post-stroke. Methods: The Timed “Up & Go” test, the Comfortable and the Fast Gait Speed tests, the Stair Climbing ascend and descend tests and the 6-Minute Walk test were assessed 7 days apart. Reliability was evaluated with the intraclass correlation coefficient (ICC 2,1), the Bland & Altman analysis, the standard error of measurement (SEM and SEM%) and the smallest real difference (SRD and SRD%). Results: Test-retest agreements were high (ICC 2,1 0.94–0.99) with no discernible systematic differences between the tests. The standard error of measurement (SEM%), representing the smallest change that indicates a real (clinical) improvement for a group of individuals, was small (9%). The smallest real difference (SRD%), representing the smallest change that indicates a real (clinical) improvement for a single individual, was also small (13–23%). Conclusion: These commonly used gait performance tests are highly reliable and can be recommended to evaluate improvements in various aspects of gait performance in individuals with chronic mild to moderate hemiparesis after stroke.

1,001 citations
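
As a rough illustration of how SEM, SEM%, SRD and SRD% relate to one another, the sketch below assumes the commonly used definitions SEM = SD × √(1 − ICC) and SRD = 1.96 × √2 × SEM. The abstract does not state which formulas the authors used, and the function name and input values are hypothetical.

```python
import math
from typing import Dict


def sem_and_srd(sd: float, icc: float, mean: float) -> Dict[str, float]:
    """Standard error of measurement and smallest real difference.

    Assumed definitions (not confirmed by the abstract):
      SEM = SD * sqrt(1 - ICC)
      SRD = 1.96 * sqrt(2) * SEM
    The percentage versions express each value relative to the mean score.
    """
    sem = sd * math.sqrt(1.0 - icc)
    srd = 1.96 * math.sqrt(2.0) * sem
    return {
        "sem": sem,
        "sem_pct": 100.0 * sem / mean,
        "srd": srd,
        "srd_pct": 100.0 * srd / mean,
    }


# Hypothetical test scores: mean 100, between-subject SD 10, ICC 0.96.
result = sem_and_srd(sd=10.0, icc=0.96, mean=100.0)
print({name: round(value, 2) for name, value in result.items()})
```

Under these definitions the SRD is always about 2.77 times the SEM, which is why the percentage reported for a single individual is larger than the one reported for a group.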


Journal ArticleDOI
01 Jan 2005-Spine
TL;DR: The results of this study indicate that the Korean version of the ODI is a reliable and valid instrument for the measurement of disability in Korean patients with lower back problems.
Abstract: Study Design. Validation of a translated, culturally adapted questionnaire. Objectives. To translate and culturally adapt a Korean version of the Oswestry Disability Index (ODI) and to validate its use in Korean patients. Summary of Background Data. The ODI is one of the most widely used and validated instruments for measuring disability in spinal disorders. However, no validated Korean version of the index was available at the time our study was initiated. Methods. The study was carried out in three phases: the first was translation into Korean and cultural adaptation of the questionnaire; the second was a pilot study to assess the comprehensibility of the prefinal version and modification; the third was a reliability and validity study of the final version. The Korean version was tested on 206 patients with lumbar spinal disorders who had undergone operations at the authors’ institute. Test-retest reliability, internal consistency, concurrent validity, and construct validity were investigated. Follow-up questionnaires were obtained from 39 patients at the 3-month postoperative follow-up meeting. Differences in the ODI, visual analog scale (VAS), and World Health Organization (WHO) quality of life assessment (WHOQOL-BREF) between preoperative and follow-up questionnaires were evaluated. The correlation of the postoperative ODI with the pain rating on a visual analog scale and WHOQOL-BREF was also analyzed. Results. Test-retest reliability was assessed with 88 patients in a time interval of 48 hours. The intraclass correlation coefficient of test-retest reliability was 0.9167. Reliability estimated by the internal consistency reached a Cronbach’s alpha of 0.84. The correlation of the preoperative ODI with the pain rating on a visual analog scale (100 mm) was r 0.425 (P 0.0001). The correlation between three of the WHOQOL-BREF domains (physical health, psychological health, and environment) and the ODI was statistically significant. 
The correlation coefficient between the ODI and physical health domain of the WHOQOL-BREF was r 0.48 (P 0.05). The correlations with psychological health and environment domains were low with r 0.192 and 0.160, respectively, even though statistically significant (P 0.05). The correlation of the postoperative ODI with the pain rating on a visual analog scale (100 mm) was r 0.626 (P 0.0001). The correlation between all four domains of the WHOQOL-BREF and the postoperative ODI was statistically significant. Conclusions. The results of this study indicate that the Korean version of the ODI is a reliable and valid instrument for the measurement of disability in Korean patients with lower back problems. The authors recommend this Korean version of the ODI for use in future clinical studies in Korea.

528 citations
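
Internal consistency figures such as the Cronbach's alpha reported above can be reproduced from item-level responses. The following is a minimal sketch using the textbook formula with sample variances; the function name and the toy responses are hypothetical and are not data from the study.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for items[i][j] = response of subject i to item j.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(items[0])  # number of items in the scale
    item_variances = [variance([row[j] for row in items]) for j in range(k)]
    total_variance = variance([sum(row) for row in items])
    return k / (k - 1) * (1.0 - sum(item_variances) / total_variance)


# Toy responses from five subjects on a three-item scale.
responses = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 1],
    [3, 3, 4],
    [5, 4, 5],
]
print(round(cronbach_alpha(responses), 3))
```

Alpha summarizes how strongly the items covary within one administration, so it complements rather than replaces a test-retest ICC.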


Journal ArticleDOI
TL;DR: In healthy children, the 6-min walk test is a reliable and valid functional test for assessing exercise tolerance and endurance and Bland and Altman plots demonstrated a high degree of repeatability.
Abstract: The aim of this study was to assess the reliability and validity of the 6-min walk test (6MWT) in healthy children. Chinese secondary school students were randomly recruited. They attended the current authors' unit on two occasions, separated by 2 weeks. Physical examination and standardised maximum incremental exercise testing on a treadmill were performed on the first visit. Spirometry and 6MWT were carried out on the second visit. A randomly selected subgroup was invited to return for repeat 6MWT at an interval of 2-4 weeks. Seventy-eight subjects were recruited; however, four failed to achieve maximal effort on exercise test. The final group included 43 young females and the mean ± SD age of the subjects was 14.2 ± 1.2 yrs. Physical examination was unremarkable in all cases. The mean ± SD per cent predicted forced expiratory volume in one second was 91.4 ± 10.2%. Concurrent validity was demonstrated by good correlation between the 6-min walking distance and maximum oxygen uptake determined on the exercise treadmill. Test-retest reliability was undertaken in 52 subjects, and the intraclass correlation coefficient (95% confidence interval) was calculated as 0.94 (0.89-0.96). In addition, Bland and Altman plots demonstrated a high degree of repeatability. In healthy children, the 6-min walk test is a reliable and valid functional test for assessing exercise tolerance and endurance.

345 citations


Journal ArticleDOI
TL;DR: Telephone administration of the PHQ-9 seems to be a reliable procedure for assessing depression in PC, and its internal consistency was high and close to the self-administered one.
Abstract: BACKGROUND: Telephone assessment of depression for research purposes is increasingly being used. The Patient Health Questionnaire 9-item depression module (PHQ-9) is a well-validated, brief, self-reported, diagnostic, and severity measure of depression designed for use in primary care (PC). To our knowledge, there are no available data regarding its validity when administered over the telephone. OBJECTIVE: The aims of the present study were to evaluate agreement between self-administered and telephone-administered PHQ-9, to investigate possible systematic bias, and to evaluate the internal consistency of the telephone-administered PHQ-9. METHODS: Three hundred and forty-six participants from two PC centers were assessed twice with the PHQ-9. Participants were divided into 4 groups according to administration procedure order and administration procedure of the PHQ-9: Self-administered/Telephone-administered; Telephone-administered/Self-administered; Telephone-administered/Telephone-administered; and Self-administered/Self-administered. The first 2 groups served for analyzing the procedural validity of telephone-administered PHQ-9. The last 2 allowed a test-retest reliability analysis of both self- and telephone-administered PHQ-9. Intraclass correlation coefficient (ICC) and weighted κ (for each item) were calculated as measures of concordance. Additionally, Pearson’s correlation coefficient, Student’s t-test, and Cronbach’s α were analyzed. RESULTS: Intraclass correlation coefficient and weighted κ between both administration procedures were excellent, revealing a strong concordance between telephone- and self-administered PHQ-9. A small and clinically nonsignificant tendency was observed toward lower scores for the telephone-administered PHQ-9. The internal consistency of the telephone-administered PHQ-9 was high and close to the self-administered one. CONCLUSIONS: Telephone and in-person assessments by means of the PHQ-9 yield similar results. 
Thus, telephone administration of the PHQ-9 seems to be a reliable procedure for assessing depression in PC.

338 citations
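
Item-level concordance of the kind described above is often summarized with a weighted kappa. The sketch below is one possible implementation for ordinal categories coded 0..c-1; the weighting scheme (quadratic by default) is an assumption, since the abstract does not say which weights were used, and the function name and ratings are hypothetical.

```python
from typing import Sequence


def weighted_kappa(first: Sequence[int], second: Sequence[int],
                   n_categories: int, power: int = 2) -> float:
    """Weighted kappa for two ratings of the same subjects.

    Categories must be coded 0..n_categories-1. power=1 gives linear
    weights and power=2 quadratic weights (the scheme is an assumption).
    Undefined (division by zero) if both ratings use a single category.
    """
    n = len(first)
    c = n_categories
    # Observed joint proportions of (first rating, second rating).
    observed = [[0.0] * c for _ in range(c)]
    for a, b in zip(first, second):
        observed[a][b] += 1.0 / n
    row = [sum(observed[i]) for i in range(c)]
    col = [sum(observed[i][j] for i in range(c)) for j in range(c)]

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(c):
        for j in range(c):
            weight = abs(i - j) ** power / (c - 1) ** power
            observed_disagreement += weight * observed[i][j]
            expected_disagreement += weight * row[i] * col[j]
    return 1.0 - observed_disagreement / expected_disagreement


# Hypothetical answers to one 0-3 item under two administration modes.
self_administered = [0, 1, 2, 3, 1, 0, 2, 3]
telephone = [0, 1, 2, 2, 1, 0, 3, 3]
print(round(weighted_kappa(self_administered, telephone, n_categories=4), 3))
```

Weighting penalizes a one-category disagreement less than a three-category one, which suits ordinal item scores better than unweighted kappa.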


Journal ArticleDOI
TL;DR: The Portuguese version of the DASH is a reliable instrument, and the Ritchie Index showed a weak correlation with Brazilian DASH scores, while the visual analog scale of pain showed a good correlation with DASH score.
Abstract: The objective of the present study was to translate, adapt and validate a Brazilian Portuguese version of the Disabilities of the Arm, Shoulder and Hand (DASH) Questionnaire. The study was carried out in two steps. The first was to translate the DASH into Portuguese and to perform cultural adaptation and the second involved the determination of the reliability and validity of the DASH for the Brazilian population. For this purpose, 65 rheumatoid arthritis patients of either sex (according to the classification criteria of the American College of Rheumatology), ranging in age from 18 to 60 years and presenting no other diseases involving the upper limbs, were interviewed. The patients were selected consecutively at the rheumatology outpatient clinic of UNIFESP. The following results were obtained: in the first step (translation and cultural adaptation), all patients answered the questions. In the second step, Spearman's correlation coefficients for interobserver evaluation ranged from 0.762 to 0.995, values considered to be highly reliable. In addition, intraclass correlation coefficients ranged from 0.97 to 0.99, also highly reliable values. Spearman's correlation coefficients and the intraclass correlation coefficients obtained during intra-observer evaluation ranged from 0.731 to 0.937 and from 0.90 to 0.96, respectively, being highly reliable values. The Ritchie Index showed a weak correlation with Brazilian DASH scores, while the visual analog scale of pain showed a good correlation with DASH score. We conclude that the Portuguese version of the DASH is a reliable instrument.

297 citations


Journal ArticleDOI
TL;DR: The authors' examination of the 3 measures for 12 weeks extends previous evidence of the stability of these strength measures and justifies the use of hand-held dynamometry and the STS test when investigating limitations in mobility.
Abstract: The purpose of this study was to describe the reliability and validity of 3 strength measures obtained from community-dwelling elderly individuals. The strength of 10 elders was tested initially and 6 and 12 weeks later using the MicroFET 2 hand-held dynamometer (knee extension strength), the Jamar dynamometer (grip strength), and the sit-to-stand (STS) test. Mobility was tested using the timed up-and-go (TUG) test and a timed walk test. Intraclass correlation coefficients, which were used to characterize the reliability of the strength tests, ranged from 0.807 to 0.981. Pearson correlations between the lower extremity strength measures and the TUG and gait speed ranged from 0.635 to -0.943. Our examination of the 3 measures for 12 weeks extends previous evidence of the stability of these strength measures and justifies the use of hand-held dynamometry and the STS test when investigating limitations in mobility.

260 citations


Journal ArticleDOI
TL;DR: The cross-cultural adaptation to Portuguese and the psychometric evaluation of the resilience scale developed by Wagnild & Young showed good results in the semantic equivalence for: general meaning and referential meaning and there was an inverse correlation with the scale that evaluates psychological violence.
Abstract: This study describes the cross-cultural adaptation to Portuguese and the psychometric evaluation of the resilience scale developed by Wagnild & Young. The scale was adapted for a sample of students from public schools in Sao Goncalo, Rio de Janeiro, Brazil. Data from the pilot study (203 students interviewed at two points in time) and from the entire study (977) are presented. The cross-cultural adaptation showed good results in the semantic equivalence for: general meaning (above 90.0%) and referential meaning (above 85.0%). Cronbach alpha was 0.85 in the pilot study and 0.80 in the total sample. Kappa between the two points in time was regular and moderate, and the intraclass correlation coefficient was 0.746 (p = 0.000). Factorial analysis indicated three non-homogeneous factors. Construct validity demonstrated direct and significant correlation with self-esteem, family supervision, life satisfaction, and social support. There was an inverse correlation with the scale that evaluates psychological violence.

228 citations


Journal ArticleDOI
TL;DR: Measurements of hand-grip strength obtained from elders over a 12-week period are reliable, and test and retest measurements did not differ significantly over time on either side.

228 citations


Journal ArticleDOI
TL;DR: The Short-Form McGill Pain Questionnaire was demonstrated to be a highly reliable measure of pain, but these results should not be generalized to a more elderly population, as increasing age was correlated with greater variability of the sensory component scores.
Abstract: Objectives: No previous study has adequately demonstrated the test-retest reliability of the Short-Form McGill Pain Questionnaire, yet it is increasingly being used as a measure of pain. This study evaluates the test-retest reliability in patients with osteoarthritis. Methods: A prospective, observational cohort study was undertaken using serial evaluation of 57 patients at 2 time points. A sample of patients awaiting primary hip or knee joint replacement surgery were recruited in clinic or via mail (mean age 64.8 years). Short-Form McGill Pain Questionnaires were delivered by mail 5 days apart, and a supplementary questionnaire was completed on the second occasion to explore if the patients’ pain report had remained stable. Results: The intraclass correlation coefficient was used as an estimate of reliability. For the total, sensory, affective, and average pain scores, high intra-class correlations were demonstrated (0.96, 0.95, 0.88, and 0.89, respectively). The current pain component demonstrated a lower intraclass correlation of 0.75. The coefficient of repeatability was calculated as an estimation of the minimum metrically detectable change. The coefficients of repeatability for the total, sensory, affective, average, and current pain components were 5.2, 4.5, 2.8, 1.4 cm, and 1.4, respectively. Discussion: Problems of adequate completion of the Short-Form McGill Pain Questionnaire were highlighted in this sample, and supervision via telephone contact was required. Patients recruited in clinic who had practiced completing the Short-Form McGill Pain Questionnaire demonstrated fewer errors than those recruited by mail. The Short-Form McGill Pain Questionnaire was demonstrated to be a highly reliable measure of pain. These results should not be generalized to a more elderly population, as increasing age was correlated with greater variability of the sensory component scores.

Journal ArticleDOI
TL;DR: There is a need for a standardized clinical grading system for a more objective and accurate assessment of the severity of hand eczema (HE).
Abstract: Background: There is a need for a standardized clinical grading system for a more objective and accurate assessment of the severity of hand eczema (HE). Objectives: To develop and validate a scoring system called the hand eczema severity index (HECSI) designed for clinical assessment of HE. Methods: Twelve dermatologists (observers) assessed 15 HE patients twice, with an interval of 30 min. The study was performed blinded for the observers, and only the hands and wrists of the patients were visible to the observers. Agreement between the observers was determined by using the intraclass correlation coefficient (ICC), which is the correlation between (single) ratings of the same patient. Results: ICC for total HECSI score was 0.79 at the first assessment and 0.84 at the second assessment. ICC for intraobserver agreement was 0.90. Conclusions: Overall excellent agreement existed for both inter- and intraobserver reliability and the scoring system is suggested for use in future clinical studies on HE. Because HECSI is an entirely objective assessment of clinical signs, inclusion of patient-rated symptoms should also be considered.

Journal ArticleDOI
TL;DR: In this paper, the authors assessed the variability of the UPDRS motor examination (UPDRS-ME) of nurse practitioners, residents in neurology, and a movement disorders specialist (MDS) compared to a senior MDS.
Abstract: The Unified Parkinson's Disease Rating Scale (UPDRS) is widely used for the clinical evaluation of Parkinson's disease (PD). We assessed the rater variability of the UPDRS Motor examination (UPDRS-ME) of nurse practitioners, residents in neurology, and a movement disorders specialist (MDS) compared to a senior MDS. We assessed the videotaped UPDRS-ME of 50 PD patients. Inter-rater and intra-rater variability were estimated using weighted kappa (kappa(w)) and intraclass correlation coefficients (ICC). Additionally, inter-rater agreement was quantified by calculation of the mean difference between 2 raters and its 95% limits of agreement. Intra-rater agreement was also estimated by calculation of 95% repeatability limits. The kappa(w) and ICC statistics indicated good to very good inter-rater and intra-rater reliability for the majority of individual UPDRS items and the sum score of the UPDRS-ME in all raters. However, for inter-rater agreement, it appeared that the nurses, the residents, and the MDS all consistently assigned higher scores than the senior MDS. Mean differences ranged between 1.7 and 5.4 (all differences P < 0.05), with rather wide 95% limits of agreement. The intra-rater 95% repeatability limits were rather wide. We found considerable rater difference for the whole range of UPDRS-ME scores between a senior MDS and nurse practitioners, residents in neurology, and the MDS. This finding suggests that the amount by which raters may disagree should be quantified before starting longitudinal studies of disease progression or clinical trials. Finally, evaluation of rater agreement should always include the assessment of the extent of bias between different raters.
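
The mean difference between two raters and its 95% limits of agreement can be computed as in the sketch below. It assumes the usual Bland and Altman definition (mean difference ± 1.96 × SD of the paired differences); the function name and the scores are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def limits_of_agreement(rater_a: Sequence[float],
                        rater_b: Sequence[float]) -> Tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) for paired ratings.

    bias is the mean of (rater_a - rater_b); the limits are
    bias -/+ 1.96 * SD of the paired differences (assumed definition).
    """
    differences = [a - b for a, b in zip(rater_a, rater_b)]
    bias = mean(differences)
    spread = 1.96 * stdev(differences)
    return bias, bias - spread, bias + spread


# Hypothetical motor scores for six patients from two raters.
junior_rater = [24, 31, 18, 40, 27, 35]
senior_rater = [21, 29, 17, 35, 25, 30]
bias, lower, upper = limits_of_agreement(junior_rater, senior_rater)
print(round(bias, 2), round(lower, 2), round(upper, 2))
```

A nonzero bias with wide limits can coexist with a high ICC or kappa, which is the situation the abstract describes.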

Journal ArticleDOI
TL;DR: The results suggest that when younger children are not able to complete a QOL assessment due to their developmental limitation or severity of illness, parents can provide valid information about their QOL. However, parent-proxy reports of QOL for adolescents provide significantly different information than self-report, and proxy data of QOL for adolescents should be used with caution.
Abstract: Assessment of children's quality of life (QOL) is a special challenge for clinicians and researchers because the cognitive abilities of children at various ages and illness levels are so varied. In addition, statistical strategies reported to evaluate proxy agreement have been inconclusive. The specific aims of this study were to examine agreement between child self-reports and parent proxy-reports to evaluate QOL in a sample of pediatric cancer patients. Previously tested QOL instruments (Quality of Life for Children with Cancer, QOLCC) were completed by 141 patients (82 children and 59 adolescents) and 141 of their parents. Three different statistical approaches were employed to evaluate convergence of self-report and proxy-report: product-moment correlation coefficient, intraclass correlation (ICC), and comparison of group means. In addition, scatter bias was used to examine the degree of differences across the range of measurement. Our findings indicate that neither Pearson product-moment correlation, ICC, nor group difference provided enough information to detect the individual differences of measures of QOL. We found that scatter bias should be supplemented to quantify the degree of individual-level differences. The results suggest that when children who are younger are not able to evaluate QOL assessment due to their developmental limitation or severity of illness, parents can provide valid information about their QOL. However, parent-proxy of QOL for adolescents provides significantly different information than self-report and proxy data of QOL for adolescents should be used with caution.

Journal ArticleDOI
TL;DR: The L Test is a 20-m test of basic mobility skills that includes 2 transfers and 4 turns that demonstrated excellent measurement properties in this study.
Abstract: Background and Purpose. Walk tests provide essential outcome information when assessing ambulation of individuals with lower-limb amputation and a prosthetic device. Existing tests have limitations such as ceiling effects or insufficient challenge. The objective of this study was to assess the reliability and validity of data for a clinical measure of basic mobility, the L Test of Functional Mobility (L Test). Subjects. For this methodological study, 93 people with unilateral amputations (74% transtibial, 26% transfemoral; 78% male, 22% female; mean age=55.9 years) were consecutively recruited from an outpatient clinic. Twenty-seven subjects returned for retesting. Methods. To assess concurrent validity, subjects completed the L Test, Timed “Up & Go” Test (TUG), 10-Meter Walk Test, and 2-Minute Walk Test, followed by the Activities-specific Balance Confidence scale, Frenchay Activities Index (FAI), and mobility subscale of the Prosthetic Evaluation Questionnaire (PEQ-MS). Amputation cause and level, walking aid use, automatic stepping, and age variables were used to assess discriminant validity. Results. Intraclass correlation coefficients were .96 for interrater reliability and .97 for intrarater reliability, and minimal bias existed upon retesting. The magnitude of concurrent validity correlations ( r ) was very high between the L Test data and data for other walk tests and fair to moderate between the L Test data and data for self-report measures. The L Test discriminated between all groups as hypothesized. Discussion and Conclusion. The L Test is a 20-m test of basic mobility skills that includes 2 transfers and 4 turns. It demonstrated excellent measurement properties in this study.

Journal ArticleDOI
TL;DR: This study showed that the CHQ-PF28 resulted in score distributions, and discriminative validity that are comparable to its longer counterpart, but that the internal consistency of most individual scales was low.
Abstract: Study objectives: This study assessed the feasibility, reliability, and validity of the 28 item short child health questionnaire parent form (CHQ-PF28) containing the same 13 scales, but only a subset of the items in the widely used 50 item CHQ-PF50. Design: Questionnaires were sent to a random regional sample of 2040 parents of schoolchildren (4–13 years); in a random subgroup test-retest reliability was assessed (n = 234). Additionally, the study assessed CHQ-PF28 score distributions and internal consistencies in a nationwide general population sample of (parents of) children aged 4–11 (n = 2474) from Statistics Netherlands. Main results: Response was 70%. In the school and general population samples seven scales showed ceiling effects. Both CHQ summary measures and one multi-item scale showed adequate internal consistency in both samples (Cronbach’s α>0.70). One summary measure and one scale showed excellent test-retest reliability (intraclass correlation coefficient >0.70); seven scales showed moderate test-retest reliability (intraclass correlation coefficient 0.50–0.70). The CHQ could discriminate between a subgroup with no parent reported chronic conditions (n = 954) and subgroups with asthma (n = 134), frequent headaches (n = 42), and with problems with hearing (n = 38) (Cohen’s effect sizes 0.12–0.92; p Conclusions: This study showed that the CHQ-PF28 resulted in score distributions, and discriminative validity that are comparable to its longer counterpart, but that the internal consistency of most individual scales was low. In community health applications, the CHQ-PF28 may be an acceptable alternative for the longer CHQ-PF50 if the summary measures suffice and reliable estimates of each separate CHQ scale are not required.

Journal ArticleDOI
TL;DR: It is suggested that, compared with simple randomization, cluster randomization may substantially increase the sample size necessary to maintain adequate statistical power for selected outcomes such as diastolic blood pressure, whereas for most outcomes evaluated in this study the design effect is small to moderate.
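
To show how an ICC translates into a larger sample size under cluster randomization, the sketch below assumes the standard design-effect formula for equal cluster sizes, DEFF = 1 + (m − 1) × ICC. The entry itself does not give the formula or any of these numbers; the function names and inputs are hypothetical.

```python
import math


def design_effect(cluster_size: int, icc: float) -> float:
    """Design effect for equal-sized clusters: 1 + (m - 1) * ICC (assumed formula)."""
    return 1.0 + (cluster_size - 1) * icc


def inflated_sample_size(n_simple: int, cluster_size: int, icc: float) -> int:
    """Sample size needed under cluster randomization.

    n_simple is the size required under simple randomization. The product is
    rounded to 9 decimals before ceil to avoid floating-point overshoot.
    """
    return math.ceil(round(n_simple * design_effect(cluster_size, icc), 9))


# Hypothetical trial: 100 patients needed under simple randomization,
# 21 patients per practice, ICC of 0.05 for the outcome.
print(design_effect(21, 0.05))
print(inflated_sample_size(100, 21, 0.05))
```

Even a small ICC matters when clusters are large, because the inflation grows with the number of subjects per cluster.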

Journal ArticleDOI
TL;DR: A lower-extremity activity scale was responsive, accurately reflecting changes in the patient's condition between baseline and the time of follow-up, and it will become a useful, practical adjunct to objective clinical decision-making and intervention for patients undergoing arthroplasty.
Abstract: Background: Valid outcome measurement tools are required to reliably demonstrate the effectiveness and clinical outcomes of lower-extremity arthroplasty. Having ascertained a lack of a practical and valid measure of the change in actual daily physical activity that occurs prior to and following lower-limb arthroplasty, we developed and validated a lower-extremity activity scale. Methods: The eighteen-level self-administered scale was developed with the aid of content experts to ensure face validity. Validity and reliability were assessed with the use of (1) pedometer measurements of seventy subjects over seven days; (2) next-of-kin proxy measurements of the activity levels of ninety patients before they underwent lower-limb arthroplasty; and (3) application, and correlation with the Western Ontario and McMaster Universities Osteoarthritis Index scores, in a prospective seventeen-center clinical study of 297 consecutive patients undergoing revision total knee arthroplasty. In this latter study, demographic and comorbidity data were also collected. Univariate and bivariate correlations were performed, and a multivariate structured equation modeling approach was used to further test responsiveness, reliability, and validity of the lower-extremity activity scale. Results: Pedometer readings correlated with the activity levels derived with the lower-extremity activity scale (r = 0.79). Of note was the finding that age, weight, and body mass index did not correlate well with the average number of steps per day (r = -0.32, -0.32, and -0.25, respectively). A significant correlation was found between the lower-extremity activity scores recorded by the patients and those reported by their next of kin (Pearson correlation, r = 0.715; p = 0.0001) and between the initial lower-extremity activity scores and two-week-retest scores (intraclass correlation = 0.9147; p < 0.0001), demonstrating the validity and reliability of the scale. 
The lower-extremity activity scale was responsive, accurately reflecting changes in the patient's condition between baseline and the time of follow-up (p < 0.001), and it was reliable, with baseline values correlating with follow-up scores (p < 0.001). The convergent validity of the lower-extremity activity scale was established by correlations with the function scores (r = -0.301, p < 0.001) and pain scores (r = -0.241, p < 0.001) derived with the Western Ontario and McMaster Universities Osteoarthritis Index and with a higher number of comorbidities (r = -0.244, p < 0.001). Multivariate path modeling further demonstrated diminished activity in patients who had more difficulty in functioning and a greater number of comorbidities. Conclusions: We developed a lower-extremity activity scale and validated that it was an effective instrument for the assessment of patients' actual activity levels. It is easy to apply and interpret, and it is valid and ready for use in the clinical setting. This scale will allow more accurate analysis and prediction of outcomes. Consequently, it will become a useful, practical adjunct to objective clinical decision-making and intervention for patients undergoing arthroplasty.

Journal ArticleDOI
01 Nov 2005 - Spine
TL;DR: This study demonstrated that, if measures are to be used across cultures, the items must not only be translated well linguistically but also must be culturally adapted to maintain the content validity of the instrument at a conceptual level across different cultures.
Abstract: Study design: Outcome study to determine the internal consistency and validity of the adapted Turkish version of the Scoliosis Research Society-22 (SRS-22) Instrument. Objectives: To evaluate the validity and reliability of the adapted Turkish version of the SRS-22 questionnaire. Summary of background data: The SRS-22 questionnaire is a widely accepted questionnaire to assess the health-related quality of life for scoliotic patients in the United States. However, its adaptation in languages other than the source language is necessary for its multinational use. Methods: Translation/retranslation of the English version of the SRS-22 was done, and all steps of the cross-cultural adaptation process were performed properly by an expert committee. Later, SRS-22 questionnaires and previously validated Short Form-36 (SF-36) outcome instruments were mailed to 82 patients who had been surgically treated for idiopathic scoliosis. All patients had a minimum of 2 years of follow-up. Fifty-four patients (66%) responded to the first set of questionnaires. Forty-seven of the first-time respondents returned their second survey. The average age of the 47 patients (12 male, 35 female) was 19.8 years (range, 14-31 years). The two measures of reliability, internal consistency and reproducibility, were determined by the Cronbach alpha statistic and the intraclass correlation coefficient, respectively. Concurrent validity was measured by comparison with an already validated questionnaire (SF-36). Measurement was made using the Pearson correlation coefficient (r). Results: The study demonstrated satisfactory internal consistency with high Cronbach alpha values for four of the corresponding domains (pain, 0.72; self-image, 0.80; mental health, 0.72; and satisfaction, 0.83). However, the Cronbach alpha value for the function/activity domain (0.48) was considerably lower than that of the original questionnaire. 
The intraclass correlation coefficient for the same domains was 0.80, 0.82, 0.78, 0.81, and 0.76, respectively, demonstrating satisfactory test/retest reproducibility. Considering concurrent validity, 2 domains had excellent correlation (r = 0.75-1), while 9 had good correlation (r = 0.50-0.75), and 6 had moderate correlation (r = 0.25-0.50). Based on these results, question 18 in the function/activity domain, which had the lower Cronbach alpha value, was revised, while question 15 was excluded. The revised SRS-22 was given to 30 adolescent idiopathic scoliosis patients not included in the index study. The revision improved the Cronbach alpha value for the function/activity domain from 0.48 to 0.81. Conclusion: This study demonstrated that, if measures are to be used across cultures, the items must not only be translated well linguistically but also must be culturally adapted to maintain the content validity of the instrument at a conceptual level across different cultures. This may necessitate several validation studies to ensure and improve consistency in the content and face validity between source and target versions of a questionnaire due to difficulty in detecting subtle differences in the living habits of different cultures.
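
As a rough illustration of the internal-consistency statistic reported above, the following minimal sketch computes Cronbach's alpha from item-level scores using the standard formula (the number of items k, the sum of item variances, and the variance of the summed score). The function name `cronbach_alpha` and the example scores are hypothetical; they are not taken from the study, whose item-level data are not given in the abstract.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items.

    `items` is a list of k lists; each inner list holds one item's scores,
    one score per respondent, in the same respondent order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    n = len(items[0])
    if any(len(col) != n for col in items):
        raise ValueError("every item needs one score per respondent")

    # Sum of the sample variances of the individual items.
    item_var_sum = sum(variance(col) for col in items)
    # Sample variance of each respondent's total (summed) score.
    totals = [sum(col[i] for col in items) for i in range(n)]
    total_var = variance(totals)

    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical scores: two items answered by four respondents.
example_items = [[1, 2, 3, 4], [2, 1, 4, 3]]
alpha = cronbach_alpha(example_items)
```

With these made-up scores the two items agree only partly, so alpha lands below 1; items that rank respondents identically would give exactly 1.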

Journal ArticleDOI
TL;DR: Although the repeatability of posture in the sagittal view improved when a biological measure was used instead of an external vertical reference to calculate spinal angles, individual subject posture was still variable, which brings into question the effectiveness and validity of using surface skin markers to track postural changes due to clinical interventions.

Journal ArticleDOI
TL;DR: The OAKHQOL is the first specific knee and hip OA quality of life instrument that meets psychometric requirements for validity and reliability and followed an a priori structured strategy to ensure content validity.

Journal ArticleDOI
TL;DR: Multilevel models provide a more accurate and comprehensive description of relationships in clustered data than do conventional models, by correcting underestimated standard errors, by estimating components of variance at several levels, and by estimating cluster-specific intercepts and slopes.
Abstract: Background: Multilevel models were designed to analyze data generated from a nested structure (e.g., nurses within hospitals) because conventional linear regression models underestimate standard errors and, in turn, overestimate test statistics. Objectives: To introduce 2 types of multilevel models, the random intercept model and the random coefficient model, to describe the correlation among observations within a cluster, and to demonstrate how to identify the superior model. Method: The conceptual and mathematical bases for the 2 multilevel model types are presented. Intraclass correlation is defined and assessment of model fit is detailed. An empirical example is presented in which average work hours per week and burnout are analyzed using data from 4,320 staff nurses clustered in 19 hospitals. Results: Average work hours were positively associated with nurse burnout. The multilevel models corrected the problem of underestimated standard errors in conventional linear regression models. Graphs displaying the hospital-level differences illustrated the 2 multilevel model types. Although the multilevel models corrected the underestimation of standard errors, the results did not differ substantively for the conventional or the 2 multilevel models. The intraclass correlation coefficient was .044, indicating that the extent of shared variance among nurses in a hospital was low. The random intercept model fit the data better than did the random coefficient model. Conclusions: Multilevel models provide a more accurate and comprehensive description of relationships in clustered data than do conventional models, by correcting underestimated standard errors, by estimating components of variance at several levels, and by estimating cluster-specific intercepts and slopes.
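
To make the role of the intraclass correlation in clustered data concrete, here is a minimal sketch under stated assumptions. In a random intercept model the ICC is the between-cluster variance divided by the total variance. The design effect `1 + (m - 1) * ICC` is a common approximation, not taken from the abstract, for how much the variance of a cluster-level estimate is inflated; it assumes roughly equal cluster sizes. Only the ICC (.044), the 4,320 nurses, and the 19 hospitals come from the abstract. The variance components and function names are hypothetical.

```python
def intraclass_correlation(between_var, within_var):
    """ICC for a random intercept model: share of variance lying between clusters."""
    return between_var / (between_var + within_var)


def design_effect(icc, mean_cluster_size):
    """Approximate variance inflation from clustering (assumes similar cluster sizes)."""
    return 1 + (mean_cluster_size - 1) * icc


# Hypothetical variance components chosen only to reproduce an ICC of .044.
icc = intraclass_correlation(between_var=0.044, within_var=0.956)

# Reported sample: 4,320 nurses in 19 hospitals, about 227 nurses per hospital.
mean_size = 4320 / 19
deff = design_effect(0.044, mean_size)

# Standard errors scale with the square root of the design effect.
se_inflation = deff ** 0.5
```

Under these assumptions even a small ICC gives a design effect near 11 with clusters this large, which is one way to see why ignoring the nesting can understate standard errors for quantities that vary between hospitals.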

Journal ArticleDOI
TL;DR: The prevalence of an outcome may be used to make an informed assumption about the magnitude of the intraclass correlation coefficient in a range of outcomes in community and health services settings.

Journal ArticleDOI
TL;DR: The RLSQoL is a valid and reliable measure of the impact of RLS on QoL and is responsive to short-term changes in symptom severity and appears to be an appropriate tool for trial-based assessments of treatments for RLS.

Journal ArticleDOI
TL;DR: The reproducibility of the mean performance and satisfaction scores was moderate, but it was poor for the scores of the separate problems; therefore, the mean scores should be used for individual assessment.
Abstract: Objective: To assess the reproducibility (reliability and inter-rater agreement) of the client-centred Canadian Occupational Performance Measure (COPM). Design: The COPM was administered twice, with a mean interval of seven days (SD 1.6, range 4-14), by two different occupational therapists. Data analysis was based on intraclass correlation coefficients, the Bland and Altman method and Cohen's weighted kappas. Setting: Occupational therapy departments of two university medical centres. Subjects: Consecutive clients, with various diagnoses, newly referred to the outpatient clinic of two occupational therapy departments, were included. They were all over 18 years of age and perceived limitations in more than one activity of daily life. Complete data on 95 clients were obtained: 31 men and 64 women. Results: Sixty-six per cent of the activities prioritized at the first assessment were also prioritized at the second assessment. The intraclass correlation coefficients were 0.67 (95% confidence interval (CI) 0.54-0...
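
The Bland and Altman method named in the design can be sketched as follows: take the paired differences between the two administrations, report their mean (the bias), and place 95% limits of agreement at the bias plus or minus 1.96 standard deviations of the differences. This is a minimal sketch with hypothetical scores and a hypothetical function name; the study's own data and any refinements it applied are not given in the abstract.

```python
from statistics import mean, stdev


def bland_altman_limits(first, second):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(first) != len(second):
        raise ValueError("measurements must be paired")
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)   # mean difference between the two administrations
    sd = stdev(diffs)    # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical scores from two administrations to the same four clients.
first_scores = [5, 6, 7, 8]
second_scores = [4, 6, 6, 8]
bias, lower, upper = bland_altman_limits(first_scores, second_scores)
```

A change in an individual client's score that falls inside these limits cannot be distinguished from measurement variation, which is why the limits are often read alongside the ICC.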

Journal ArticleDOI
TL;DR: The test–retest reliability of a number of simple measures of physical performance is excellent when used with older people following hip fracture.
Abstract: Objective: To investigate the test–retest reliability of measures of strength, balance, gait and functional performance when used with older people following hip fracture. Subjects: Thirty people (16 hospital inpatients and 14 community dwellers). Design: Subjects underwent two assessments: one day apart for the hospital inpatients and one week apart for the community dwellers. Measurement: Strength (dynamometer, sphygmomanometer, spring balance, lateral step-up ability), balance (sway-meter, Functional Reach Test, single leg stance time, Step Test), gait (timed 6-m walk with steps taken, base of support and step length), and functional performance (PPME total score and timed supine-to-sit and sit-to-stand) were measured. Results: Eleven of the 14 continuously scaled measurement tools achieved excellent reliability (intraclass correlation coefficient (ICC)>0.75) for one or more tests. A hand-held dynamometer was found to be the tool with the highest test–retest reliability for measuring hip muscle strength (I...
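
The abstract does not say which ICC form was used, so the following is only a sketch of two common single-measurement forms in the Shrout and Fleiss notation, computed from two-way ANOVA mean squares. ICC(3,1) ignores systematic differences between trials, while ICC(2,1) counts them as error. The function name and the scores are hypothetical.

```python
def icc_single(scores):
    """Return (ICC(2,1), ICC(3,1)) for a subjects-by-trials table of scores.

    `scores` is a list of rows, one row per subject, one column per trial.
    """
    n = len(scores)      # number of subjects
    k = len(scores[0])   # number of trials
    grand = sum(sum(row) for row in scores) / (n * k)

    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_trials = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_subjects - ss_trials

    ms_subjects = ss_subjects / (n - 1)
    ms_trials = ss_trials / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    icc31 = (ms_subjects - ms_error) / (ms_subjects + (k - 1) * ms_error)
    icc21 = (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_trials - ms_error) / n
    )
    return icc21, icc31


# Hypothetical test-retest data: every subject scores exactly 2 higher on retest.
retest = [[1, 3], [2, 4], [3, 5], [4, 6]]
icc21, icc31 = icc_single(retest)
```

In this made-up example the ranking of subjects is perfectly preserved, so ICC(3,1) is 1, while the constant shift between trials pulls ICC(2,1) well below 1. That gap is a quick signal that systematic error, such as learning or fatigue, may be present.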

Journal ArticleDOI
TL;DR: The PFP-10 yields valid, reliable, and sensitive measurements, including sensitivity to change, and can be confidently substituted for the CS-PFP.
Abstract: Background and Purpose. The Continuous-Scale Physical Functional Performance Test (CS-PFP) can be used to obtain valid, reliable, and sensitive measurements of physical functional capacity. This test requires a fixed laboratory space and approximately 1 hour to administer. This study was carried out in 4 steps, or substudies, to develop and validate a short, community-based version (PFP-10) that requires less space and equipment than the CS-PFP. Subjects and Methods. Retrospective data (n=228) and prospective data (n=91) on men and women performing the CS-PFP or the PFP-10 are reported. A 12-week exercise program was used to examine sensitivity to change. Data analyses were done using paired t-test, Pearson correlation, intraclass correlation coefficient (ICC), and delta index (DI) procedures. Results. The PFP-10 total score and 4 of the 5 domain scores were statistically similar (within 3%) to those of the CS-PFP. The PFP-10 upper-body strength domain score was 17% lower, but was highly correlated (ICC=.97). Community and established laboratory PFP-10 scores were similar (ICC=.85–.97). The PFP-10 also is sensitive to change (DI=.21–.54). Discussion and Conclusion. The PFP-10 yields valid, reliable, and sensitive measurements and can be confidently substituted for the CS-PFP.

Journal ArticleDOI
TL;DR: The clinimetric performance of paper/pencil versions of self reported health status measures was similar to an electronic version, using an inexpensive PDA.
Abstract: Background: Increasing use of self reported health status in clinical practice and research, as well as patient appreciation of monitoring fluctuations of health over time, suggest a need for more frequent collection of data. Electronic use of health status measures in the follow up of patients is a possible way to achieve this. Objective: To compare self reported health status measures in a personal digital assistant (PDA) version and a paper/pencil version for test–retest reliability, agreement between scores, and feasibility. Methods: 30 patients with stable rheumatoid arthritis (mean age 61.6 years, range 49.8 to 70.0; mean disease duration, 16.7 years; 63% female; 67% rheumatoid factor positive; 46.6% on disease modifying antirheumatic drugs) completed self reported health status measures (pain, fatigue, and global health on visual analogue scales (VAS), rheumatoid arthritis disease activity index, modified health assessment questionnaire, SF-36) in a conventional paper based questionnaire version and on a PDA (HP iPAQ, model h5450). Completion was repeated after five to seven days. Results: Test–retest reliability was similar, as evaluated by the Bland–Altman approach, the coefficient of variation, and intraclass correlation coefficients. The scores showed acceptable agreement, but with a slight tendency to higher scores on VAS with the PDA than the paper/pencil version. No significant differences were seen for measures of feasibility (time to complete, satisfaction score), but 65.5% preferred PDA, 20.7% preferred paper, and 13.8% had no preference. Conclusions: The clinimetric performance of paper/pencil versions of self reported health status measures was similar to an electronic version, using an inexpensive PDA.

Journal ArticleDOI
TL;DR: There was no evidence for better discriminative capacity or responsiveness for the 15D than for the two other multiattribute measures, although many of the measurement properties were similar.
Abstract: Objective: This article compares preference-based utilities from the multiattribute utility instrument 15D with those derived from the EQ-5D and the Short Form 36 (SF-6D) in patients with HIV/AIDS. In particular, we wanted to examine if the finer descriptive system of the 15D would result in better discriminative capacity or responsiveness. Methods: In a prospective observational study of 60 Norwegian patients with HIV/AIDS from two hospitals, the authors compared scores, assessed associations with disease staging systems, and assessed test–retest reliability and responsiveness of the instruments. Results: On average, the 15D gave higher utility scores than the other two measures; the mean utility scores were 0.86 for the 15D, 0.73 for the SF-6D, and 0.77 for the EQ-5D Index. Test–retest reliability was acceptable for all measures, with intraclass correlation coefficients between 0.78 and 0.94. The correlation between scores of the 3 scales was substantial (ρ=0.74–0.80). There was no major difference in responsiveness between the measures. Conclusions: The different measures gave different utility values in this sample of patients with HIV/AIDS, although many of the measurement properties were similar. There was no evidence for better discriminative capacity or responsiveness for the 15D than for the two other multiattribute measures.

Journal ArticleDOI
TL;DR: It is suggested that parents of children with CI provide reasonable estimates of their child's pain, particularly when using a structured pain tool, although they may tend to overestimate their children's pain during the early postoperative period.