
Showing papers on "Intraclass correlation" published in 1999


Journal ArticleDOI
TL;DR: The LEFS is reliable, construct validity was supported by comparison with the SF-36, and the sensitivity to change of the LEFS was superior to that of the SF-36 in this population.
Abstract: Background and Purpose. The purpose of this study was to assess the reliability, construct validity, and sensitivity to change of the Lower Extremity Functional Scale (LEFS). Subjects and Methods. The LEFS was administered to 107 patients with lower-extremity musculoskeletal dysfunction referred to 12 outpatient physical therapy clinics. Methods. The LEFS was administered during the initial assessment, 24 to 48 hours following the initial assessment, and then at weekly intervals for 4 weeks. The SF-36 (acute version) was administered during the initial assessment and at weekly intervals. A type 2,1 intraclass correlation coefficient was used to estimate test-retest reliability. Pearson correlations and one-way analyses of variance were used to examine construct validity. Spearman rank-order correlation coefficients were used to examine the relationship between an independent prognostic rating of change for each patient and change in the LEFS and SF-36 scores. Results. Test-retest reliability of the LEFS scores was excellent ( R =.94 [95% lower limit confidence interval (CI)=.89]). Correlations between the LEFS and the SF-36 physical function subscale and physical component score were r =.80 (95% lower limit CI=.73) and r =.64 (95% lower limit CI=.54), respectively. There was a higher correlation between the prognostic rating of change and the LEFS than between the prognostic rating of change and the SF-36 physical function score. The potential error associated with a score on the LEFS at a given point in time is ±5.3 scale points (90% CI), the minimal detectable change is 9 scale points (90% CI), and the minimal clinically important difference is 9 scale points (90% CI). Conclusion and Discussion. The LEFS is reliable, and construct validity was supported by comparison with the SF-36. The sensitivity to change of the LEFS was superior to that of the SF-36 in this population. 
The LEFS is efficient to administer and score and is applicable for research purposes and clinical decision making for individual patients.

1,348 citations
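
The LEFS entry above reports a "type 2,1" intraclass correlation coefficient for test-retest reliability. As a rough illustration only, the sketch below computes ICC(2,1) in the Shrout and Fleiss sense (two-way random effects, absolute agreement, single measurement) from a complete subjects-by-occasions table. The function name `icc_2_1` and the example scores are hypothetical, and the abstract does not say how the authors actually carried out the computation.

```python
def icc_2_1(scores):
    """ICC(2,1) for a complete table: one row per subject, one column per occasion/rater."""
    n = len(scores)          # number of subjects
    k = len(scores[0])       # number of occasions or raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares (no replication)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    bms = ss_rows / (n - 1)                # between-subjects mean square
    jms = ss_cols / (k - 1)                # between-occasions mean square
    ems = ss_err / ((n - 1) * (k - 1))     # residual mean square

    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


# Hypothetical test and retest scores for five patients (not data from the study)
example = [[52, 55], [30, 28], [61, 64], [45, 44], [70, 73]]
print(round(icc_2_1(example), 3))
```

Because ICC(2,1) penalizes systematic shifts between occasions through the between-occasions mean square, it is usually lower than a Pearson correlation computed on the same data when retest scores drift.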


Journal ArticleDOI
TL;DR: The Mini Asthma Quality of Life Questionnaire has good measurement properties but they are not quite as strong as those of the original Asthma Quality of Life Questionnaire.
Abstract: The 32-item Asthma Quality of Life Questionnaire (AQLQ) has shown good responsiveness, reliability and construct validity; properties that are essential for use in clinical trials, clinical practice and surveys. However, to meet the needs of large clinical trials and long-term monitoring, where efficiency may take precedence over precision of measurement, the 15-item self-administered MiniAQLQ has been developed. The MiniAQLQ was tested in a 9-week observational study of 40 adults with symptomatic asthma. Patients completed the MiniAQLQ, the AQLQ, the Short Form (SF)-36, the Asthma Control Questionnaire and spirometry at baseline, 1, 5 and 9 weeks. In patients whose asthma was stable between clinic visits, reliability was very acceptable for the MiniAQLQ (intraclass correlation coefficient (ICC)=0.83), but not quite as good as for the AQLQ (ICC=0.95). Similarly, responsiveness in the MiniAQLQ (p=0.0007) was good but not quite so good as for the AQLQ (p<0.0001). Construct validity (correlation with other indices of health status) was strong for both the MiniAQLQ and the AQLQ. Criterion validity showed that there was no bias between the instruments (p=0.61) and the correlation between them was high (r=0.90). The Mini Asthma Quality of Life Questionnaire has good measurement properties but they are not quite as strong as those of the original Asthma Quality of Life Questionnaire. The choice of questionnaire should depend on the task at hand.

688 citations


Journal ArticleDOI
TL;DR: Satisfactory test-retest reliability was demonstrated for two empirically derived subscales, the MASC-10 and Anxiety Index, and stability was unaffected by age or gender, but was lower for African American than Caucasian subjects.

385 citations


Journal ArticleDOI
TL;DR: The data demonstrated that the Cincinnati Knee Rating System has acceptable reliability, validity, and responsiveness for use in outcome studies after knee ligament reconstruction.
Abstract: Although many instruments are used to assess outcome after knee ligament reconstruction, their reliability, validity, and responsiveness have not been adequately proven. Our purpose was to assess these statistical measures in a commonly used instrument, the Cincinnati Knee Rating System. Reliability was determined from the responses of 100 subjects who completed the instrument twice, a mean of 7 days apart. Validity and responsiveness were assessed from 250 patients observed for at least 2 years after autogenous ACL reconstruction. Questionnaire items included symptoms, functional limitations with sports and daily activities, patient perception of the knee condition, and sports- and occupational-activity levels. The items demonstrated high test-retest reliability, supporting their use in evaluating groups of patients between two different treatment periods (all intraclass correlation coefficients > 0.70). In addition, the questionnaire demonstrated good content validity, construct validity, and item-discriminant validity. For the overall rating score, no "floor effects" (worst score possible) were found before or after surgery. No "ceiling effects" (best score possible) were found before surgery, and, at follow-up, these effects were calculated in only 22 patients (9%). The questions were found to be highly responsive to detecting changes between evaluations. The data demonstrated that this rating system has acceptable reliability, validity, and responsiveness for use in outcome studies after knee ligament reconstruction.

308 citations


Journal ArticleDOI
TL;DR: This paper reviews many different estimators of intraclass correlation that have been proposed for binary data and compares them in an extensive simulation study to identify several useful estimators.
Abstract: This paper reviews many different estimators of intraclass correlation that have been proposed for binary data and compares them in an extensive simulation study. Some of the estimators are very specific, while others result from general methods such as pseudo-likelihood and extended quasi-likelihood estimation. The simulation study identifies several useful estimators, one of which does not seem to have been considered previously for binary data. Estimators based on extended quasi-likelihood are found to have a substantial bias in some circumstances.

260 citations
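
The entry above compares estimators of intraclass correlation for binary data. To make the idea concrete, here is a minimal sketch of one widely known estimator, the one-way ANOVA estimator applied to 0/1 responses grouped in clusters. It is offered only as an example of the kind of estimator under discussion; the abstract does not say which estimators the simulation study favoured. The function name and the input format (a list of `(successes, cluster_size)` pairs) are assumptions made for this illustration.

```python
def anova_icc_binary(clusters):
    """One-way ANOVA estimator of the ICC for binary responses.

    clusters: list of (successes, size) pairs, one pair per cluster.
    """
    k = len(clusters)                              # number of clusters
    total_n = sum(n for _, n in clusters)          # total number of observations
    total_y = sum(y for y, _ in clusters)          # total number of successes
    sum_y2_over_n = sum(y * y / n for y, n in clusters)

    # For 0/1 data the sum of squared responses equals the sum of responses.
    ss_between = sum_y2_over_n - total_y ** 2 / total_n
    ss_within = total_y - sum_y2_over_n

    msb = ss_between / (k - 1)
    msw = ss_within / (total_n - k)

    # "Average" cluster size adjusted for unequal cluster sizes
    n0 = (total_n - sum(n * n for _, n in clusters) / total_n) / (k - 1)

    return (msb - msw) / (msb + (n0 - 1) * msw)


# Hypothetical litters: (number affected, litter size)
print(anova_icc_binary([(3, 5), (0, 4), (5, 6), (1, 5), (4, 4)]))
```

The estimate can be negative when responses within clusters are less alike than chance would suggest, and the function raises a division error if every response is identical.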


Journal ArticleDOI
TL;DR: This study provides some evidence of EQ-5D construct validity and reliability; however, the restricted and non-normal distribution of scores, the marked difference between patients' self evaluation and derived societal utility tariffs, and the lack of discriminative ability for patients with 'moderate' morbidity within each of the five EQ-5D dimensions are of concern.
Abstract: Objective. To assess the reliability and validity of the EuroQol (EQ-5D) for osteoarthritis of the knee (OA knee). Methods. Eighty-two patients with OA knee were asked to complete on two occasions, separated by 1 week, the EQ-5D, the Western Ontario and McMaster Universities (WOMAC) Osteoarthritis Index and the 36-item short form of the Medical Outcomes Study (SF-36). Results. In this patient population, < 10% of the 243 EQ-5D health states were active. The EQ-5D demonstrated a non-Gaussian distribution. Reliability [intraclass correlation coefficient (ICC) = 0.70] is acceptable for aggregate level data. There were significant rank correlations with both the WOMAC and SF-36. Conclusions. This study provides some evidence of EQ-5D construct validity and reliability. However, the restricted and non-normal distribution of scores, the marked difference between patients' self evaluation and derived societal utility tariffs, as well as the lack of discriminative ability for patients with 'moderate' morbidity within each of the five EQ-5D dimensions, are of concern.

223 citations


Journal ArticleDOI
TL;DR: This psychometric evaluation provides empirical support for the reliability and validity of the LIFE-RIFT, a brief measure of functional impairment, which showed that those in episode were significantly more impaired than those in recovery.
Abstract: Background. The literature documents that functional impairment is associated with affective disorders. Nevertheless, the choice among thorough, yet brief, well-validated assessments of functional impairment is limited. The objective of this study was to evaluate the psychometric properties of a brief scale of functional impairment, the Range of Impaired Functioning Tool (LIFE–RIFT). Method. The study sample included subjects who presented with major depressive disorder at intake into the NIMH Collaborative Depression Study (CDS). The LIFE–RIFT is composed of items that are included in the Longitudinal Interval Follow-up Evaluation (LIFE). The reliability and validity were examined using data from LIFE–RIFT assessments conducted at four points in time: 6, 12, 18 and 24 months after intake into the CDS. Results. Cross-sectional one factor models accounted for the covariance structure among the four scale items. A longitudinal factor model, with an invariant factor structure over time, also fitted the data well and indicated that the scale items are measures of one construct, namely functional impairment. The internal consistency reliability of the scale was supported with alpha coefficients ranging from 0·81 to 0·83. The inter-rater reliability intraclass correlation coefficient (ICC) was 0·94. Mixed-effect linear regression models showed that those in episode were significantly more impaired than those in recovery. Furthermore, in analyses of predictive validity, impairment was positively associated with subsequent recurrence and negatively associated with subsequent recovery. Conclusions. This psychometric evaluation provides empirical support for the reliability and validity of the LIFE–RIFT, a brief measure of functional impairment.

204 citations


Journal ArticleDOI
01 Jul 1999-Cancer
TL;DR: Although the prediction of the duration of life of patients with end of life cancer most often relies on the clinical estimation of survival (CES) made by the treating physician, the accuracy and practical value of CES remains controversial.
Abstract: BACKGROUND Although the prediction of the duration of life of patients with end of life cancer most often relies on the clinical estimation of survival (CES) made by the treating physician, the accuracy and practical value of CES remains controversial. METHODS The authors prospectively evaluated the accuracy of CES in an inception and population-based cohort of 233 cancer patients who were seen at the onset of their terminal phase. They also systematically reviewed the literature on CES in advanced or end-stage cancer patients in MEDLINE, CANCERLIT, and EMBASE data bases, using two search strategies developed by a research librarian. RESULTS CES had low sensitivity in detecting patients who died within shorter time frames (≤2 months), and a tendency to overestimate survival was noted. A moderate correlation was observed between actual survival (AS) and CES (Pearson correlation coefficient = 0.47, intraclass correlation coefficient = 0.46, weighted kappa coefficient = 0.42). CONCLUSIONS Treating physicians appear to overestimate the duration of life of end of life ill cancer patients, particularly those patients who die early in the terminal phase and who may potentially benefit from earlier participation in palliative care programs. CES should be considered one of many criteria, rather than a unique criterion, by which to choose therapeutic intervention or health care programs for patients in the end of life cancer phase. Cancer 1999;86:170–6. © 1999 American Cancer Society.

177 citations


Journal ArticleDOI
TL;DR: The posturographic protocol has the potential to be a useful tool for evaluating the severity and nature of postural instability and the effects of pharmacologic and rehabilitative treatment. The results indicate that combining direct body measurements with force-plate data has the potential to expose the underlying impairments that cause disequilibrium.

129 citations


Journal ArticleDOI
TL;DR: These findings question the test-retest reliability of the RPE scale when used to monitor subjective estimates of exercise intensity in progressive (or graded) exercise tests.
Abstract: Objective: To assess the test-retest reliability (repeatability) of Borg's 6-20 rating of perceived exertion (RPE) scale using a more appropriate statistical technique than has been employed in previous investigations. The RPE scale is used widely in exercise science and sports medicine to monitor and/or prescribe levels of exercise intensity. The "95% limits of agreement" technique has recently been advocated as a better means of assessing within-subject (trial to trial) agreement than traditional indicators such as Pearson and intraclass correlation coefficients. Methods: Sixteen male athletes (mean (SD) age 23.6 (5.1) years) completed two identical multistage (incremental) treadmill running protocols over a period of two to five days. RPEs were requested and recorded during the final 15 seconds of each three minute stage. All subjects successfully completed at least four stages in each trial, allowing the reliability of RPE responses to be examined at each stage. Results: The 95% limits of agreement (bias ± 1.96 × SDdiff) were found to widen as exercise intensity increased: 0.88 (2.02) RPE units (stage 1), 0.25 (2.53) RPE units (stage 2), −0.13 (2.86) RPE units (stage 3), and −0.13 (2.94) RPE units (stage 4). Pearson correlations (0.81, 0.72, 0.65, and 0.60) and intraclass correlations (0.82, 0.80, 0.77, and 0.75) decreased as exercise intensity increased. Conclusions: These findings question the test-retest reliability of the RPE scale when used to monitor subjective estimates of exercise intensity in progressive (or graded) exercise tests. (Br J Sports Med 1999;33:336-339)

128 citations
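
The RPE entry above summarizes agreement as the bias plus or minus 1.96 times the standard deviation of the between-trial differences. A minimal sketch of that calculation follows, assuming paired scores from two trials and the sample (n − 1) standard deviation. The function name and the ratings are made up for illustration and are not the study's data.

```python
import statistics


def limits_of_agreement(trial1, trial2):
    """Return (bias, lower limit, upper limit) of the 95% limits of agreement."""
    diffs = [a - b for a, b in zip(trial1, trial2)]
    bias = statistics.mean(diffs)             # mean between-trial difference
    sd_diff = statistics.stdev(diffs)         # sample SD of the differences
    half_width = 1.96 * sd_diff
    return bias, bias - half_width, bias + half_width


# Hypothetical RPE ratings from the same stage of two treadmill trials
trial_a = [12, 14, 11, 15, 13, 12]
trial_b = [11, 13, 12, 13, 13, 11]
bias, lower, upper = limits_of_agreement(trial_a, trial_b)
print(f"bias={bias:.2f}, limits=({lower:.2f}, {upper:.2f})")
```

Unlike a correlation coefficient, the limits are expressed in the units of the scale, which is why they can widen across stages even when the correlations fall only modestly.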


Journal ArticleDOI
TL;DR: In this paper, the authors describe the reliability of measuring maximal strength of eight muscle groups of the lower limb by a handheld dynamometer, according to a standard assessment protocol, and describe how reliable measurements of muscle strength can be obtained by a hand-held dynamometer in frail older persons.
Abstract: The aim of this study is to describe the reliability of measuring maximal strength of eight muscle groups of the lower limb by a handheld dynamometer, according to a standard assessment protocol. The study population consisted of 26 patients (14 males and 12 females; age range 60–90 years) admitted to a geriatric hospital. Multiple assessments of muscle strength by two different examiners were compared to estimate test-retest and inter-rater reliability. The range of strength evaluated across the eight muscle groups was 2.1–29.8 Kg/force. Overall, short-term (same day) and long-term (one week apart) test-retest and inter-rater reliability were very high, with 60% of the intraclass correlation coefficient values above 0.8, and the majority above 0.7. No significant differences in strength were found comparing the left and the right side of each muscle group. Differences between values collected in the same subject by two different examiners, and by the same examiner at different points in time were similar, not influenced by the average strength of the muscle group, and significantly larger for long-term than for short-term comparisons. By using a standardized measurement protocol, reliable measurements of muscle strength can be obtained by a hand-held dynamometer in frail older persons.

Journal ArticleDOI
TL;DR: Isokinetic tests of ankle dorsiflexor strength in healthy young adults using the Biodex dynamometer were highly reliable (ICC 0.61-0.93).
Abstract: The purposes of this study were: (i) to determine the test-retest reliability of isokinetic ankle dorsiflexor strength measurements in young healthy adults using the Biodex dynamometer, and (ii) to examine several statistical measures for the interpretation of reliability. Thirty men and women (mean age 23 +/- 3 years) performed three maximal concentric contractions at 30 degrees/s, 60 degrees/s, 90 degrees/s, 120 degrees/s and 150 degrees/s. Reliability of peak torque, work and torque at a specific time were assessed by calculating the intraclass correlation coefficient (ICC 2,1), Pearson product moment correlation coefficient (r), standard error of the measurement (SEM), method error (ME) and coefficient of variation (CV), and by plotting the differences between observations against their means. Isokinetic tests of ankle dorsiflexor strength in healthy young adults using the Biodex dynamometer were highly reliable (ICC 0.61-0.93). It is recommended that test-retest reliability analyses include the ICC and assessments of measurement errors (SEM, ME or CV), as well as graphs to indicate any systematic variations in the data.
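
The measurement-error statistics named in the entry above (SEM, ME and CV) can be sketched with their common textbook definitions: SEM as the standard deviation multiplied by the square root of (1 − ICC), method error as the standard deviation of the test-retest differences divided by the square root of 2, and CV as the standard deviation expressed as a percentage of the mean. These definitions are assumptions; the abstract does not state which variants the authors used, and all names and numbers below are hypothetical.

```python
import math
import statistics


def standard_error_of_measurement(sd, icc):
    """SEM = SD * sqrt(1 - ICC), in the units of the measurement."""
    return sd * math.sqrt(1.0 - icc)


def method_error(test, retest):
    """ME = SD of the paired differences divided by sqrt(2)."""
    diffs = [a - b for a, b in zip(test, retest)]
    return statistics.stdev(diffs) / math.sqrt(2.0)


def coefficient_of_variation(values):
    """CV = 100 * SD / mean, as a percentage."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


# Hypothetical peak torque values (N·m) from two sessions
session1 = [28.0, 31.5, 25.0, 34.0, 29.5]
session2 = [27.0, 33.0, 26.5, 32.5, 30.0]
print(standard_error_of_measurement(statistics.stdev(session1), 0.85))
print(method_error(session1, session2))
print(coefficient_of_variation(session1))
```

Reporting an error statistic in measurement units alongside the ICC matters because the ICC depends on between-subject spread: the same absolute error gives a lower ICC in a homogeneous sample.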

Journal ArticleDOI
TL;DR: Whether the low intraclass correlation coefficient for the EMG parameters in the presently studied test group implies a low potential in discriminating subjects with back pain cannot be decisively concluded.

Journal ArticleDOI
TL;DR: The utility of the SF-36 may be limited to assessments of subjects with higher cognitive and physical functioning than typical nursing home residents; the SF-36 might benefit from modification for this setting, or from tests of proxy ratings.
Abstract: Objective: to assess test characteristics of the Medical Outcomes Study SF-36 (Short-Form 36) with residents of nursing homes. Research design: nursing home residents with 17 or more points on the Mini-Mental State Examination (MMSE) and ≥ 3 months residence (128 of 552 screened) were selected randomly. Interviewers administered the SF-36 (repeated after 1 week), Geriatric Depression Scale and MMSE. We recorded activities of daily living and medication data from medical records. Data analysis included test-retest intraclass correlations, item completion, score distributions and SF-36 correlations with measures of physical and mental functioning. Results: 97 nursing home residents (75.8%) consented. Test-retest intraclass correlation coefficients were good to excellent (range = 0.55 to 0.82). Convergent validity between SF-36 physical health scales and the activities of daily living index was modest (r range = −0.37 to −0.43). About 25% of residents scored zero (lowest score) on at least one SF-36 physical function measure. SF-36 mental health scales correlated strongly with the Geriatric Depression Scale (r range = −0.63 to −0.71) and modestly with bodily pain (r = −0.35). No SF-36 scales correlated strongly with the MMSE. Conclusion: only one in five nursing home residents met minimal participation criteria, suggesting limited utility of the SF-36 in nursing homes. Reliability and validity characteristics were fairly good. Skewed scores were noted for some SF-36 scales. The utility of the SF-36 may be limited to assessments of subjects with higher cognitive and physical functioning than typical nursing home residents. The SF-36 might benefit from modification for this setting, or by tests of proxy ratings.

Journal ArticleDOI
TL;DR: Interrater reliability was acceptable for some of the postural observations in this study, and the use of more appropriate statistical methods may lead to greater insight into sources of variability in reliability and validity studies and may help to develop more effective ergonomic exposure assessment methods.

Journal ArticleDOI
TL;DR: The Spanish adaptation of the Quality of Life in Epilepsy Inventory (QOLIE-31) showed psychometric properties similar to those of the English version, supporting their conceptual equivalence, and was responsive to clinical change.
Abstract: Summary: Purpose: Spanish adaptation of the Quality of Life in Epilepsy Inventory (QOLIE-31). Methods: Internal consistency and construct validity of the Spanish translation of the QOLIE-31 were tested in 252 patients with epilepsy. Patients also were administered the General Health Questionnaire (GHQ-28), and the Nottingham Health Profile (NHP). Two weeks after the first test, a subgroup of randomly selected patients were readministered the QOLIE-31 along with a new five-option question about change in health status. Patients reporting no change in health status were included in the study of temporal stability. Sensitivity to clinical change was assessed in 31 additional patients who had successfully undergone epilepsy surgery. Results: The QOLIE-31 was highly correlated with the GHQ-28 (r=−0.63) and the NHP (r=−0.69), demonstrating construct validity. Cronbach's alpha coefficient was 0.92, showing the items of the QOLIE-31 to be interdependent and homogeneous. For a 2-week test-retest, both Pearson product-moment correlation and intraclass correlation coefficients were 0.90, indicating temporal stability. Sensitivity to clinical change was suggested by a significant mean difference between the global scores both before and after epilepsy surgery (−21.87, p < 0.0001; 95% CI, −28.08 to −15.66). The standardized response mean of the global score was 1.67, and the effect size was 1.35, both indicating large clinical change as a result of seizure relief. Conclusions: The similarity of psychometric properties between the English and the Spanish versions of the QOLIE-31 supports their conceptual equivalence. The questionnaire's responsiveness to clinical change suggests its utility in outcome assessment of drug trials and epilepsy surgery.
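
The QOLIE-31 entry above reports a standardized response mean and an effect size. Under their usual definitions (assumed here, since the abstract does not spell them out), the standardized response mean divides the mean change by the standard deviation of the change scores, and the effect size divides the mean change by the standard deviation of the baseline scores. The sketch below uses hypothetical names and scores.

```python
import statistics


def responsiveness(before, after):
    """Return (standardized response mean, effect size) for paired scores."""
    changes = [a - b for a, b in zip(after, before)]
    mean_change = statistics.mean(changes)
    srm = mean_change / statistics.stdev(changes)          # change relative to SD of change
    effect_size = mean_change / statistics.stdev(before)   # change relative to baseline SD
    return srm, effect_size


# Hypothetical global scores before and after an intervention
pre = [48.0, 55.0, 61.0, 52.0, 44.0, 58.0]
post = [66.0, 70.0, 85.0, 71.0, 70.0, 75.0]
srm, es = responsiveness(pre, post)
print(f"SRM={srm:.2f}, ES={es:.2f}")
```

The sign of both statistics follows the direction in which the change is computed, so published values are often reported as magnitudes.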

Journal ArticleDOI
TL;DR: Stability of the PCL-R was generally good whether it was evaluated as a dichotomous or dimensional measure, and Factor 1 was more reliably measured in women compared to men.
Abstract: The 2-year test-retest reliability of the Psychopathy Checklist-Revised (PCL-R) was examined in 200 men and 25 women methadone patients. Stability of the PCL-R was generally good whether it was evaluated as a dichotomous or dimensional measure. Utilizing a diagnostic cutoff score of 25 or more the intraclass correlation coefficients (ICCs) were .48 for men and .67 for women. For the Total PCL-R score ICCs were .60 and .65 for men and women, respectively. Factor 1 was more reliably measured in women compared to men (.63 vs .43). For men, Factor 1 was significantly less reliable than Factor 2 or the Total score. For women, Factor 2 was significantly less reliable than the Total PCL-R score or Factor 1.

Journal Article
TL;DR: It is concluded that in AS, only the SASSS method for the spine and the BASRI reached good reliability, and other methods for spine, SI joints, and hips were moderately reliable at best.
Abstract: Our aim was to compare reliability and sensitivity to change of different radiological scoring methods in ankylosing spondylitis (AS). Two trained observers scored 30 AS radiographs twice with an interval of 4 weeks. The same two observers scored 187 AS radiographs in pairs, at baseline and after one year followup, to measure change and agreement on change. The sacroiliac (SI) joints were scored in 5 grades by the New York method and the SASSS (Stoke Ankylosing Spondylitis Spine Score). Hips were graded 0-5 (according to Larsen). Cervical and lumbar spine were graded (0-4, Bath Ankylosing Spondylitis Radiological Index, BASRI), and scored in detail (0-72, SASSS). SASSS of the cervical and lumbar spine scored on the anterior sites of the vertebrae proved most reliable, with both intra and interobserver intraclass correlation coefficients (ICC) between 0.87 and 0.97. BASRI was only moderately reliable, with Cohen's kappa ranging between 0.50 and 0.82 for intra, and 0.38-0.64 for interobserver reliability. Similarly, SI joint scores (New York, SASSS) showed intraobserver kappa between 0.56 and 0.84, and interobserver reliability with kappa between 0.37 and 0.47. Larsen hip scores proved unreliable: moderate intraobserver kappa of 0.47-0.58 and low interobserver kappa of 0.29. After retraining, interobserver kappa did not improve (0.45 and 0.17). In retrospect, a one year period was too short to measure sensitivity to change. Observers agreed that no change occurred in up to 89% of cases. A measurable change of deterioration or improvement occurred rarely. We conclude that in AS, only the SASSS method for the spine and the BASRI reached good reliability. Other methods for spine, SI joints, and hips were moderately reliable at best. There was moderate to good agreement on no change between the observers. No method showed change over a period of one year in a considerable number of patients.
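
The ankylosing spondylitis entry above reports Cohen's kappa for the graded scores alongside ICCs for the detailed scores. As a small illustration, unweighted kappa compares the observed proportion of agreement with the agreement expected by chance from each observer's marginal grade frequencies. The abstract does not say whether a weighted variant was used, and the function name and grades below are hypothetical.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two observers grading the same items."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement from the product of each observer's marginal frequencies
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[g] * counts_b[g] for g in counts_a) / (n * n)

    return (observed - expected) / (1.0 - expected)


# Hypothetical grades (0-4) given by two observers to ten radiographs
observer_1 = [0, 1, 2, 2, 3, 4, 1, 0, 2, 3]
observer_2 = [0, 1, 2, 3, 3, 4, 2, 0, 2, 2]
print(round(cohens_kappa(observer_1, observer_2), 2))
```

Unweighted kappa treats a disagreement of one grade the same as a disagreement of four, which is one reason ordinal scores are often analysed with weighted kappa or an ICC instead.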

Journal ArticleDOI
TL;DR: A German version of the Calgary Depression Rating Scale for Schizophrenia (CDSS-G) approved by the author of the original scale is presented comprising a semi-structured interview for 9 items to sensitively and specifically assess depression in schizophrenia and related disorders.

Journal ArticleDOI
TL;DR: The scales showed acceptable reliability and validity, and they will be useful in quantifying dyspnea experienced by patients receiving mechanical ventilation. Further work is needed to evaluate the extent and the severity of dyspnea in such patients in order to evaluate the effectiveness of interventions.
Abstract: Background Dyspnea, or difficult breathing, is common in patients receiving mechanical ventilation; however, dyspnea is not routinely or systematically measured. Objective The primary purpose of this methodological study was to evaluate the test-retest reliability of 5 dyspnea rating scales and the criterion validity of 4 dyspnea rating scales in patients receiving mechanical ventilation. The secondary purpose was to examine the correlations between each of these 5 rating scales and physiological measures of respiratory function. Methods The convenience sample consisted of 28 patients on mechanical ventilation during their hospitalization in the intensive care units of a large, inner-city hospital. Patients rated their dyspnea twice at 30-minute intervals on the visual analogue scale, the vertical analogue dyspnea scale, the modified Borg scale, the numerical scale, and the faces scale. Test-retest reliability was computed by using the intraclass correlation coefficient. Criterion validity was evaluated by using the Spearman rank-order correlation coefficient. Results The 5 rating scales had acceptable test-retest reliabilities, with intraclass correlation coefficients ranging from 0.81 to 0.97. Criterion validity of the 4 scales also was acceptable, with Spearman rank-order correlation coefficients from 0.76 to 0.96. The rating scales were not correlated with most of the physiological variables. At least half of the patients reported moderate to severe dyspnea. Conclusion The scales showed acceptable reliability and validity, and they will be useful in quantifying dyspnea experienced by patients receiving mechanical ventilation. Further work is needed to evaluate the extent and the severity of dyspnea in such patients in order to evaluate the effectiveness of interventions.

Journal ArticleDOI
TL;DR: The results suggest that experience in using the KT-1000 is related to the interrater error of measurements and that training is an important consideration when using the KT-1000 arthrometer.
Abstract: Study Design Single group repeated measures with multiple raters. Objective To determine the inter-rater reliability of KT-1000 measurements of novice and experienced raters and to provide error estimates for these raters. Background The KT-1000 arthrometer is often used clinically to quantify anterior tibial displacement. Few data have been documented, however, about the relative reliability of KT-1000 measurements obtained by novice compared with experienced users. Methods and Measures Two novice and two experienced KT-1000 users performed measurements on 29 knees of 25 patients after anterior cruciate ligament (ACL) reconstruction or with a diagnosis of ACL deficiency. Measurements were performed at 131 N. Interrater and intertrial reliability coefficients (intraclass correlation coefficient; ICC) and the standard error of measurement were calculated for expert and novice raters. Results The interrater ICC for novices was 0.65 and the interrater error was ±3.52 mm (90% confidence interval [CI]). The int...

Journal ArticleDOI
TL;DR: Men and women tended to underestimate their weight while differences between self-reported and measured height were insignificant, and combining a graphical method with ICC may be useful in pilot studies to detect populational groups capable of providing reliable information on weight and height, thus minimizing resources needed for field work.
Abstract: INTRODUCTION: Self-reported weight and height were compared with direct measurements in order to evaluate the agreement between the two sources. METHOD: Data were obtained from a cross-sectional study on health status from a probabilistic sample of 1,183 employees of a bank, in Rio de Janeiro State, Brazil. Direct measurements were made of 322 employees. Differences between the two sources were evaluated using mean differences, limits of agreement and intraclass correlation coefficient (ICC). RESULTS AND CONCLUSIONS: Men and women tended to underestimate their weight while differences between self-reported and measured height were insignificant. Body mass index (BMI) mean differences were smaller than those observed for weight. ICC was over 0.98 for weight and 0.95 for BMI, expressing close agreement. Combining a graphical method with ICC may be useful in pilot studies to detect populational groups capable of providing reliable information on weight and height, thus minimizing resources needed for field work.

Journal ArticleDOI
TL;DR: The German version of the NASS Cervical and Lumbar Spine Outcome Assessment Instrument allows the standardized assessment of pain, functional limitations and neurogenic symptoms in patients with back pain and the international comparison of health states and therapeutic outcomes.
Abstract: BACKGROUND Pain and functional limitations are the chief symptoms in patients with back pain. However, standardized assessment of these domains is still not commonplace in clinical practice. The objective of this study was the cultural adaptation and validation of the North American Spine Society (NASS) Lumbar Spine Outcome Assessment Instrument for German speaking patients with back pain. METHODS Translation and backtranslation of the NASS instrument was performed according to international recommendations. 56 consecutive inpatients with a confirmed diagnosis of dorsopathia completed a German version of the NASS instrument, the SF-36 and an established German instrument for back patients (FFbH-R). All patients completed the questionnaires 48 hours apart to assess test-retest reliability. Validity was assessed through correlation with corresponding subscales of the SF-36, the FFbH-R and a 0-10 pain numeric rating scale. Internal consistency and item-to-scale correlation served as statistics of reliability. RESULTS The two subscales of the NASS Instrument for cervical and lumbar problems correlate significantly with the corresponding subscales of the FFbH-R and the SF-36 (r = 0.28-0.83, p < 0.05) and with a pain numeric rating scale (r = 0.39-0.68, p < 0.05). Test-retest reliability demonstrated intraclass correlation coefficients between 0.82 and 0.89. CONCLUSION The German version of the NASS Cervical and Lumbar Spine Outcome Assessment Instrument allows the standardized assessment of pain, functional limitations and neurogenic symptoms in patients with back pain and the international comparison of health states and therapeutic outcomes.

Journal ArticleDOI
TL;DR: This paper demonstrates that exact inference on regression coefficients is possible using generalized inference when an intraclass correlation structure is assumed.
Abstract: Summary. We consider repeated observations taken over time for each of several subjects. For example, one might consider the growth curve of a cohort of babies over time. We assume a simple linear growth curve model. Exact results based on sufficient statistics (exact tests of the null hypothesis that a coefficient is zero, or exact confidence intervals for coefficients) are not available to make inference on regression coefficients when an intraclass correlation structure is assumed. This paper will demonstrate that such exact inference is possible using generalized inference.

Journal ArticleDOI
TL;DR: These findings suggest that the proposed test is sensitive with moderate test-retest reliability to examine lumbosacral position sense in healthy subjects and further adjustments in the testing protocol are needed to improve the test-retest reliability.
Abstract: Study design A single group test-retest design to evaluate the reproducibility of lumbosacral position sense measurements. Objectives To develop a measure of position sense in the lumbosacral area and to determine test-retest reliability. Background Proprioception, muscle control, and coordination training could be the key issues in resolving neuromuscular dysfunction in patients with low back pain, but there are no standard ways to assess these parameters. Methods and Measures A piezoresistive accelerometer attached to the skin over the sacrum was used to research the repositioning accuracy of active pelvic tilting, between days, of 14 young nonimpaired subjects (20 to 26 years of age) in standing. Results The mean absolute error for repositioning accuracy (the difference between criterion and matching positions) was 1.81° (± 0.85). The intraclass correlation coefficient between measurements obtained on days 1 and 2 was moderate (R = 0.51). The average standard error of measurement associated with the in...

Journal ArticleDOI
TL;DR: All tests (except the bimanual test) can be used for both cross-sectional and follow-up group studies with high-functioning stroke patients and showed a moderately high to high test-retest reliability without systematic trend from test to retest.

Journal ArticleDOI
TL;DR: Measurements of SIJ alignment obtained with handheld calipers and an inclinometer were unreliable, and therapists should consider procedures other than those that assess SIJ alignment when evaluating the SIJ.
Abstract: Background and Purpose. Previous research suggests that visual estimates of sacroiliac joint (SIJ) alignment are unreliable. The purpose of this study was to determine whether handheld calipers and an inclinometer could be used to obtain reliable measurements of SIJ alignment in subjects suspected of having SIJ dysfunction. Subjects. Seventy-three subjects, evaluated at 1 of 5 outpatient clinics, participated in the study. Methods. A total of 23 therapists, randomly paired for each subject, served as examiners. The angle of inclination of each innominate was measured while the subject was standing. The position of the innominates relative to each other was then derived. An intraclass correlation coefficient (ICC), the standard error of measurement (SEM), and a kappa coefficient were calculated to examine the reliability of the derived measurements. Results. The ICC was .27, the SEM was 5.4 degrees, and the kappa value was .18. Conclusion and Discussion. Measurements of SIJ alignment were unreliable. Therapists should consider procedures other than those that assess SIJ alignment when evaluating the SIJ.
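
As a rough illustration of how an ICC and an SEM of the kind reported above can be computed, the sketch below uses a one-way random-effects model, ICC(1,1), with the SEM taken as the square root of the within-subject mean square. The abstract does not say which ICC form or SEM formula the authors used, so the model choice, the function name `icc_oneway`, and the angle data are all assumptions made for illustration.

```python
from math import sqrt
from statistics import fmean


def icc_oneway(ratings):
    """Return (ICC(1,1), SEM) for a list of per-subject rating lists.

    Every subject must have the same number of ratings (k >= 2).
    Assumes a one-way random-effects ANOVA model.
    """
    n = len(ratings)
    k = len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 subjects with an equal number (>= 2) of ratings")
    grand = fmean(x for row in ratings for x in row)
    subject_means = [fmean(row) for row in ratings]
    # Between-subject and within-subject mean squares.
    ms_between = k * sum((m - grand) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, subject_means) for x in row
    ) / (n * (k - 1))
    icc = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    sem = sqrt(ms_within)  # same units as the measurements
    return icc, sem


# Hypothetical angles in degrees from two examiners per subject.
angles = [[10, 12], [20, 18], [30, 31], [15, 15]]
icc, sem = icc_oneway(angles)
print(f"ICC(1,1)={icc:.3f}, SEM={sem:.2f} degrees")
```

When the examiners disagree by about as much as subjects differ from each other, the within-subject mean square approaches the between-subject one and the ICC falls toward zero, which is the pattern a low value such as .27 reflects.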

Journal ArticleDOI
TL;DR: 4 different methods of estimating peak spinal loading and their relationship with the reporting of low-back pain are examined, finding that they cannot all be used interchangeably for risk assessment at the individual level.
Abstract: Objectives This paper examines the performance of 4 different methods of estimating peak spinal loading and their relationship with the reporting of low-back pain. Methods The data used for this comparison were a subset of subjects from a case-referent study of low-back-pain reporting in the automotive industry, in which 130 random referents and 105 cases (or job-matched proxies) were studied. The peak load on the lumbar spine was determined using a biomechanical model with model inputs coming from a detailed self-report questionnaire, a task-based checklist, a video digitization method, and a posture and load sampling technique. Results The methods were directly comparable through a common metric of newtons or newton meters of spinal loading in compression, shear, or moment modes. All the methods showed significant and substantial associations with low-back pain in all modes (odds ratios 1.6–2.3). The intraclass correlation coefficients (ICC) showed strong similarities between the checklist and video digitized techniques (ICC 0.84–0.91), moderate similarities between these techniques and the work sampling method (ICC 0.49–0.52), and poor correlations (ICC 0.16–0.40) between the self-report questionnaire and the observer recorded measures. Conclusions While all the methods detected significant odds ratios, they cannot all be used interchangeably for risk assessment at the individual level. Peak spinal compression, moment, and shear are important risk factors for low-back pain reporting, no matter which measurement method is used. Questionnaires can be used for large-scale studies. At the individual level a task-based checklist provides biomechanical model inputs at lower cost and equal performance compared with the criterion video digitization system.

Journal ArticleDOI
TL;DR: This paper outlines and illustrates the use of some statistical methods which are well suited for post-prediction data scrutiny and discusses reproducibility measures such as concordance correlation, intraclass correlation, and correlation between difference and sum.
Abstract: Monitoring of calibration equation performance is essential if high quality of predicted analytical data is to be sustained. In this paper we outline and illustrate the use of some statistical methods which are well suited for post-prediction data scrutiny. Mean square prediction error is partitioned into three components, viz. mean bias, systematic bias and random error. Reproducibility measures such as concordance correlation (rc), intraclass correlation (r2) and correlation between difference and sum (r(X – Y)(X + Y)) are also discussed. Other topics discussed include the maximisation of R2, type II regression (both variables with error model) and new graphical displays.
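
The three-way partition of mean square prediction error and the concordance correlation mentioned above can be sketched as below. The partition shown is one standard decomposition (squared mean bias, a slope-related systematic term, and a random term) and the concordance formula is Lin's; the abstract does not give its formulas, so treating these as the ones intended is an assumption, and the function names and sample values are hypothetical.

```python
from statistics import fmean


def _moments(x, y):
    """Means, population variances and covariance of two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return mx, my, sxx, syy, sxy


def mspe_components(predicted, observed):
    """Split MSPE into (mean bias, systematic bias, random error)."""
    mp, mo, spp, soo, spo = _moments(predicted, observed)
    if spp == 0 or soo == 0:
        raise ValueError("both samples must vary")
    slope = spo / spp              # regression of observed on predicted
    r_squared = spo ** 2 / (spp * soo)
    mean_bias = (mp - mo) ** 2
    systematic_bias = spp * (1 - slope) ** 2
    random_error = soo * (1 - r_squared)
    return mean_bias, systematic_bias, random_error


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient for paired samples."""
    mx, my, sxx, syy, sxy = _moments(x, y)
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


# Hypothetical predicted and reference values, invented for illustration.
predicted = [10.2, 11.8, 13.1, 15.0, 16.4]
observed = [10.0, 12.0, 13.5, 14.6, 17.0]
components = mspe_components(predicted, observed)
mspe = fmean((p - o) ** 2 for p, o in zip(predicted, observed))
ccc = concordance_correlation(predicted, observed)
print(components, mspe, ccc)
```

The three components sum to the overall MSPE, so each can be read as a share of the total prediction error.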

Journal Article
TL;DR: The computed system has the advantage of allowing an examiner to view a rapid, simultaneous display of multiple grading scale scores at a keystroke from one clinical assessment input, obviating the labor of repeating measures by hand.
Abstract: Objective The senior authors developed a computer-assisted rapid, simultaneous comparison system for nine international grading scales for facial paralysis. The purpose of this study is to present the system and to compare the agreement of hand-performed House-Brackmann and Sunnybrook scales, two frequently used scales herein taken as the concurrent criterion test standards, with those like scales done simultaneously in the computed system. Study design The study design was a prospective concurrent criterion validity study. Test-retest reliability and interobserver agreement were assessed using the kappa statistic (k) for ordinal data and the intraclass correlation coefficient (ICC) for semidimensional data. Setting The study was conducted at a university practice. Patients Ten consecutive consenting subjects with varying degrees of facial paralysis were studied. Intervention Each subject was measured, in random order, twice by each method by each of two independent observers. Main outcome measures House-Brackmann score, Sunnybrook score, and like-scale scores done simultaneously in the computed system were measured. Results Agreement between the computed system and hand-performed criterion standards was equal to each scale compared against itself; for the House-Brackmann, agreement was moderate (k = 0.554); for the Sunnybrook, agreement was excellent (ICC = 0.976). Conclusions The computed system has the advantage of allowing an examiner to view a rapid, simultaneous display of multiple grading scale scores at a keystroke from one clinical assessment input, obviating the labor of repeating measures by hand.
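
For readers unfamiliar with the kappa statistic used above, a minimal sketch of unweighted Cohen's kappa for two sets of categorical grades follows. The abstract does not say whether a weighted or unweighted kappa was used, so the unweighted form is an assumption; the function name `cohen_kappa` and the grades are hypothetical.

```python
from collections import Counter


def cohen_kappa(grades_a, grades_b):
    """Unweighted Cohen's kappa for two equally long lists of labels."""
    if len(grades_a) != len(grades_b) or not grades_a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(grades_a)
    # Observed proportion of exact agreement.
    observed = sum(a == b for a, b in zip(grades_a, grades_b)) / n
    # Agreement expected by chance from each list's marginal frequencies.
    counts_a, counts_b = Counter(grades_a), Counter(grades_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1 - expected)


# Hypothetical grades on a six-level scale from two assessments.
first = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6]
second = [1, 2, 3, 3, 3, 4, 4, 4, 5, 6]
kappa = cohen_kappa(first, second)
print(f"kappa={kappa:.2f}")
```

Because the unweighted form treats a one-grade disagreement the same as a five-grade one, a weighted kappa is often preferred for ordinal scales.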