Showing papers on "Intraclass correlation" published in 1995


Journal ArticleDOI
TL;DR: The clinically determined ratio of navicular height-to-foot length correlated most closely with the radiographic indices of MLA structure, and the strengths of associations between anthropometric and radiographic data were assessed with Pearson correlation coefficients.

283 citations


Journal ArticleDOI
TL;DR: The VF-14 was reproducible in stable patients during an 8-month period, and it was more responsive to clinically significant changes in vision than was a generic health status measure (ie, the Sickness Impact Profile).
Abstract: Objective: To assess the test-retest reliability and responsiveness of the VF-14, which is an index of functional impairment in patients with cataracts. Design: Observational longitudinal study. Patients were enrolled prior to undergoing their first cataract surgery between July 15 and December 15, 1991, and they were followed up for 1 year after surgery. Setting: Patients were recruited from 72 ophthalmologists' practices in three US cities. Patients: Five hundred fifty-two patients who had undergone a surgical procedure in only one eye by the 4-month postoperative follow-up (responsiveness analyses) and a subset of these (n=426) who had not subsequently undergone surgery for the second eye by the 12-month postoperative follow-up (reproducibility analyses). Main Outcome Measures: Two health status measures (the VF-14 and the Sickness Impact Profile), two global measures of a patient's trouble and satisfaction with his or her vision, and best corrected visual acuity in each eye. Results: The VF-14 is highly reproducible, with an intraclass correlation coefficient of .79 when patient-rated criteria are used to define stable patients. The intraclass correlation coefficient was lower (.57 to .71) when various measures of visual acuity were used to define stable patients. The VF-14 is also about three times more responsive to a change in vision than the Sickness Impact Profile, which is a generic health status measure (effect size of approximately 1.00 vs 0.30). Estimates of the responsiveness of the VF-14 and the Sickness Impact Profile were not associated with preoperative visual acuity in the operated on or better eye. Responsiveness of the VF-14, however, was higher in patients with greater self-rated trouble with vision preoperatively.
Conclusions: The VF-14 was reproducible in stable patients during an 8-month period, and it was more responsive to clinically significant changes in vision than was a generic health status measure (ie, the Sickness Impact Profile).

180 citations


Journal ArticleDOI
TL;DR: The Wheelchair User's Shoulder Pain Index shows high levels of reliability and internal consistency, as well as concurrent validity with loss of shoulder range of motion.
Abstract: Many long term wheelchair users develop shoulder pain. The purpose of this study was to examine the reliability and validity of the Wheelchair User's Shoulder Pain Index (WUSPI), an instrument which measures shoulder pain associated with the functional activities of wheelchair users. This 15-item functional index was developed to assess shoulder pain during transfers, self care, wheelchair mobility and general activities. To establish test-retest reliability, the index was administered twice in the same day to 16 long term wheelchair users and their scores for the two administrations were compared by intraclass correlation. To establish concurrent validity, the index was administered to 64 long term wheelchair users and index scores were compared to shoulder range of motion measurements. Results showed that intraclass correlation for test-retest reliability of the total index score was 0.99. There were statistically significant negative correlations of total index scores to range of motion measurements of shoulder abduction (r = -0.485), flexion (r = -0.479) and shoulder extension (r = -0.304), indicating that there is a significant relationship of total index score to loss of shoulder range of motion in this sample. The Wheelchair User's Shoulder Pain Index shows high levels of reliability and internal consistency, as well as concurrent validity with loss of shoulder range of motion. As a valid and reliable instrument, this tool may be useful to both clinicians and researchers in documenting baseline shoulder dysfunction and for periodic measurement in longitudinal studies of musculoskeletal complications in wheelchair users.

167 citations


Journal ArticleDOI
TL;DR: In patients with low back pain, there is poor interrater agreement on determination of the segmental level of a marked spinous process and poor interrater reliability of P-A accessory motion mobility testing of the lumbar spine.
Abstract: Background and Purpose. This study examined the interrater agreement, or reliability, of accessory motion mobility testing of the lumbar spine in patients with low back pain. Subjects. Subjects were 18 patients with low back pain referred to the physical therapy outpatient department of a university teaching hospital. Methods. Six orthopedic physical therapists evaluated the posterior-anterior (P-A) accessory motion mobility at each of six levels, L-1 to the sacral base, on each subject. The mobility was recorded on a nine-point scale, and reproduction of pain was noted. The physical therapists noted any level at which mobility or pain findings were of significance to treat. To evaluate agreement on the identification of spinal levels, therapists were asked to identify one spinous process, which was arbitrarily marked on each subject. Kappa analyses and intraclass correlation coefficients (ICCs) were calculated to evaluate agreement on the level of the marked segment and the mobility at that level, respectively. Results. The ICC for determination of the marked level was R(2, 1)=.69 (95% confidence interval=.53–.82). The ICC for mobility findings at the marked level was R(2,1)=.25 (95% confidence interval=0–.44). A secondary Kappa analysis to determine agreement on treatment decision making demonstrated similarly low levels of agreement. Conclusion and Discussion. There is poor interrater agreement on determination of the segmental level of a marked spinous process. There is poor interrater reliability of P-A accessory mobility testing in the absence of corroborating clinical data. Caution should be exercised when physical therapists make clinical decisions related to the evaluation of motion at a specific spinal level using P-A accessory motion testing.

140 citations
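
For readers unfamiliar with the R(2,1) notation in the entry above: in the common Shrout and Fleiss convention, ICC(2,1) is the two-way random-effects, absolute-agreement, single-rater coefficient. The sketch below computes it from a complete subjects-by-raters table using NumPy. The function name `icc_2_1` and the example ratings are illustrative assumptions, not code or data from the study, and the authors' exact computation may have differed.

```python
import numpy as np


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a complete table with one row per subject and one
    column per rater (no missing cells).
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape  # n subjects, k raters
    grand_mean = x.mean()

    # Sums of squares from a two-way ANOVA without replication.
    ss_subjects = k * np.sum((x.mean(axis=1) - grand_mean) ** 2)
    ss_raters = n * np.sum((x.mean(axis=0) - grand_mean) ** 2)
    ss_total = np.sum((x - grand_mean) ** 2)
    ss_error = ss_total - ss_subjects - ss_raters

    bms = ss_subjects / (n - 1)            # between-subjects mean square
    jms = ss_raters / (k - 1)              # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))   # residual mean square

    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


# Hypothetical mobility ratings: 5 subjects scored by 3 raters.
example = [
    [4, 5, 4],
    [2, 3, 2],
    [7, 6, 8],
    [5, 5, 6],
    [3, 2, 3],
]
print(round(icc_2_1(example), 3))
```

Because the rater mean square enters the denominator, systematic differences between raters lower this coefficient, which is one reason an absolute-agreement form is often chosen when raters are meant to be interchangeable.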


Journal ArticleDOI
TL;DR: Four studies on the inter-rater reliability of a proposed Axis V version for DSM-IV and of the CGAS involving 162 child and adolescent patients and 20 clinicians showed moderate agreement (intraclass correlation: 0.53-0.66), comparable to previous versions of Axis V, but lower than that reported for the CGAS.
Abstract: Four studies on the inter-rater reliability of a proposed Axis V version for DSM-IV and of the CGAS involving 162 child and adolescent patients and 20 clinicians showed moderate agreement (intraclass correlation: 0.53–0.66). This was comparable to previous versions of Axis V, but lower than that reported for the CGAS. More detailed description of anchor points did not increase reliability, nor were there differences in agreement when rating current or previous functioning.

127 citations


Journal ArticleDOI
15 Sep 1995-Spine
TL;DR: The instrument developed for measuring quality of life in patients with spine deformities during the period of bone growth has validity, internal consistency, and high test-retest reliability.
Abstract: Study design. The development and construction of a specific instrument for measuring quality of life in adolescents with spine deformities were investigated. Objectives. To assess the validity and reliability of the Quality of Life Profile for Spine Deformities. Summary of background data. An 88-item questionnaire was self-administered to 174 patients ranging in age from 10 to 20 years with spine deformities. Items were rated on a five-point Likert scale. Higher scores mean a higher level of impairment in quality of life. Age, gender, menarche or voice change, salient symptoms in the medical record, ordinary parameters on physical examination, and measurements on standard anteroposterior and lateral radiographs were recorded. The retest was done 10 days after the initial administration in a subsample of 35 patients. Methods. The test-retest reliability was analyzed by calculating the intraclass correlation coefficient. Internal consistency was measured with the Cronbach's alpha method. Factor analysis was used to obtain a reduced number of variables. Construct validity was assessed using the principal components model of factor analysis based on the correlation matrix and using the varimax computer algorithm for orthogonal rotation. Discriminant validity was assessed using the Kruskal-Wallis test. Results. The Quality of Life Profile for Spine Deformities contained 21 items and five factors in conceptual terms labeled psychosocial functioning, sleep disturbances, back pain, body image, and back flexibility. The overall questionnaire score showed an internal consistency of 0.88 and a test-retest correlation of 0.91. Patients with structural curves showed significantly higher scores in all dimensions of the Quality of Life Profile for Spine Deformities except for the subscale of body image than patients with postural curves. When patients were grouped according to the symptom of back pain, those with backache had a significantly higher quality of life overall score and scores in the dimensions of sleep disturbances and pain. Brace-treated patients showed statistically significant differences in the quality of life overall score and scores in the dimensions of psychosocial functioning and back flexibility. Conclusions. The instrument developed for measuring quality of life in patients with spine deformities during the period of bone growth has validity, internal consistency, and high test-retest reliability. The conceptualization of quality of life of the Quality of Life Profile for Spine Deformity includes psychosocial dimensions and pain and function.

118 citations


Journal Article
TL;DR: The CHAQ can serve as a valid and sensitive tool in the evaluation of functional outcomes in JDM and was reliable in subjects who showed no clinical change in muscle strength and responsive to treatment induced clinical change.
Abstract: OBJECTIVE Physical disability is perhaps the most important outcome of juvenile dermatomyositis (JDM). No functional assessment tool has been validated for inflammatory myopathies either in children or adults. We studied the measurement properties of the Childhood Health Assessment Questionnaire (CHAQ) in children with JDM. METHODS We studied 37 patients followed at the JDM clinic and compared the results obtained by the CHAQ to a global disease severity score and quantitative muscle strength testing measured by sphygmomanometry (construct validity). We also measured the reliability of the CHAQ and its responsiveness to clinical change. RESULTS For the initial measurement of each subject, the correlation between disease severity and CHAQ was high [Spearman's correlation, (rs = 0.71, p 0.20). The CHAQ was reliable in subjects who showed no clinical change in muscle strength (intraclass correlation coefficient = 0.87) and responsive to treatment induced clinical change (responsiveness coefficient = 0.90). CONCLUSION The CHAQ can serve as a valid and sensitive tool in the evaluation of functional outcomes in JDM.

114 citations


Journal ArticleDOI
TL;DR: Reviewing some of the more commonly used techniques, such as Raw Agreement, Cohen's kappa and weighted kappa, shows that, in most circumstances, they can all be replaced by the intraclass correlation coefficient (ICC).
Abstract: Whenever two or more raters evaluate a patient or student, it may be necessary to determine the degree to which they assign the same label or rating to the subject. The major problem in deciding which statistic to use is the plethora of different techniques which are available. This paper reviews some of the more commonly used techniques, such as Raw Agreement, Cohen's kappa and weighted kappa, and shows that, in most circumstances, they can all be replaced by the intraclass correlation coefficient (ICC). This paper also shows how the ICC can be used in situations where the other statistics cannot be used and how to select the best subset of raters.

100 citations


Journal ArticleDOI
TL;DR: There is measurable variation in measures related to alcohol use among young adults that is attributable to their community of residence, and the magnitude of these correlations can be sharply reduced allowing the investigator to plan a more efficient community trial.
Abstract: Objective: Alcohol intervention studies that allocate intact social groups to study conditions require adjustment to the usual analytic methods to account for the positive intraclass correlation that exists in such groups. This article presents intraclass correlations for measures related to alcohol use among young adults and discusses the use of those estimates to plan new studies. Method: Young adults aged 18-20 were selected at random from driver's license lists in each of the 15 communities participating in the Communities Mobilizing for Change on Alcohol project. Respondents were surveyed by telephone to assess their drinking habits and other factors related to alcohol use. Community-level intraclass correlations were computed for those measures, both prior to and after adjustment for person- and community-level covariates. Results: The community-level intraclass correlations tend to be small, with larger values for belief and attitude items than for self-reported behaviors. Even so, correlations of ...

88 citations
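
The planning use mentioned in the entry above usually rests on the standard design effect for group-allocated studies, 1 + (m - 1) × ICC for clusters of equal size m. The sketch below shows how even a small community-level ICC inflates the required sample. The function names and the numeric inputs (200 respondents per community, ICC of 0.01) are hypothetical illustrations, not values reported by the study; only the count of 15 communities comes from the abstract.

```python
def design_effect(cluster_size, icc):
    """Variance inflation from allocating intact groups of equal size."""
    return 1.0 + (cluster_size - 1) * icc


def effective_sample_size(n_total, cluster_size, icc):
    """Number of independent observations the clustered sample is worth."""
    return n_total / design_effect(cluster_size, icc)


# Hypothetical plan: 15 communities, 200 respondents each, ICC = 0.01.
communities = 15
per_community = 200  # assumed value, for illustration only
icc = 0.01           # assumed value, for illustration only

n_total = communities * per_community
print(design_effect(per_community, icc))                   # about 2.99
print(effective_sample_size(n_total, per_community, icc))  # about 1003
```

Under these assumed inputs, 3,000 respondents carry roughly the information of 1,000 independent ones, which is why reducing the ICC through covariate adjustment can make a community trial more efficient.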


Journal ArticleDOI
TL;DR: The results of this study support the use of the STSTS test to characterize LEMP in kidney transplant candidates, particularly those who are diabetic or have deficits in gait performance.

83 citations


Journal Article
TL;DR: Results indicate that relying on patients' recollections does not provide an accurate measure of preoperative state, and that attempting to adjust data is not feasible because the directions and magnitudes of recollection error vary for major subgroups of patients.
Abstract: New cross-sectional studies have been designed to evaluate therapeutic effectiveness of medical and surgical treatments. The extent to which error in recollection may threaten the validity of conclusions reached in these studies has not been determined. The purpose of this research was to evaluate the impact of recollection error by comparing patients' prospectively acquired reports about their condition before total hip replacement with their recollections of their preoperative condition obtained several years after surgery. A total of 104 patients prospectively completed the Hip Rating Questionnaire (HRQ), a valid, reproducible, responsive, disease-specific scale composed of four domains (pain, walking, function, and impact of hip arthritis on overall health). These same patients then completed the HRQ several years after surgery by recalling their preoperative condition. Current postoperative condition was also obtained several years after surgery with the HRQ. Patient characteristics include: 55% were women, mean age was 67 years, 90% had osteoarthritis, 78% had no prognostically significant comorbid disease, and the mean time interval between surgery and recall was 2.5 years. Comparison of prospective and recalled responses with the weighted kappa and intraclass correlation coefficients showed poor to fair agreement in three domains, and moderate agreement in the fourth domain. Overall, the directions of the recollection errors were toward patients' recalling more pain, better walking, better function, and worse impact of hip arthritis on health than they reported before surgery. When the data were stratified to determine if there were systematic biases among major patient subgroups, there were discrepancies in the percentage of patients within each subgroup who had recollection error for the different domains, as well as differences in the magnitudes and directions of the recollection errors. 
These results indicate that relying on patients' recollections does not provide an accurate measure of preoperative state, and that attempting to adjust data is not feasible because the directions and magnitudes of recollection error vary for major subgroups of patients. In addition, when outcome was assessed using postoperative HRQ responses, the cross-sectional data overestimated the effectiveness of total hip replacement in 68% of patients. It is concluded that cross-sectional data do not accurately portray baseline preintervention condition and therefore can lead to overestimating, as in this instance, or to underestimating effectiveness.

Journal Article
TL;DR: There was low agreement in the assessment of joint swelling and limitation of motion, and differences in examiners' techniques, patients with severe disease, and the small hand joints were important sources of disagreement.
Abstract: Objective. To assess the interobserver agreement of articular examination in children with juvenile rheumatoid arthritis (JRA) and identify sources of disagreement. Methods. Four rheumatologists graded tenderness/pain on motion, swelling, and limitation of motion in the joints of 10 children with JRA, as recommended by the Pediatric Rheumatology Collaborative Study Group, and 17 different joint indices were computed. Agreement was measured by kappa (κ) and intraclass correlation coefficients (Ri). Results. All 4 observers detected tenderness in 15.7% of the joints, but they disagreed (2 vs 2) on 4.2% (κ = 0.71). They detected swelling in 5.2% but disagreed on 6.2% (κ = 0.47). They found limitation in 4.9%, but disagreed on 8.1% (κ = 0.54). The tender joint count, and the American Rheumatism Association cooperating clinics and Hart modified Ritchie indices were the most reliable (Ri > 0.93); the swelling severity index fared the worst (Ri = 0.40). There were differences in examination maneuvers and judgment among examiners. Discrepancies were larger in metacarpophalangeal joints and in patients with many involved joints. Conclusion. There was low agreement in the assessment of joint swelling and limitation of motion. Differences in examiners' techniques, patients with severe disease, and the small hand joints were important sources of disagreement.

Journal ArticleDOI
TL;DR: Most measures of oculomotor function are stable across time and may reflect underlying neurophysiologic traits, according to a small sample of subjects.

Journal Article
TL;DR: It can be concluded that most variables determined during active movements can be measured with satisfactory reliability, whereas variables for other tests are not measured with the same reliability on the basis of the kappa scores.
Abstract: The aim of the present investigation was to study the interexaminer reliability of orthopedic tests and palpation techniques routinely used in the clinical diagnosis of disorders of the masticatory system. The tests were performed by a dentist and a physiotherapist, who both used the tests routinely when examining patients with temporomandibular disorders. Seventy-nine patients participated in this study. In the analysis, percentage agreement, intraclass correlation, and Cohen's kappa were used. The interexaminer reliability of the tests measuring maximal active mouth opening and registration of clicking during active mouth opening was high. The interexaminer reliability was fair for the tests measuring the intensity of pain during active movements and moderate for tests recording joint sounds (kappa = 0.47 to 0.59). There was high interobserver agreement on several items of the traction and translation tests, although the kappa values were low. The interexaminer reliability of the multitest scores for compression was substantial for joint sounds (kappa = 0.66) and fair for pain (kappa = 0.40). The interexaminer reliability of the multitest scores for muscle palpation and joint palpation was moderate (kappa = 0.51) and fair (kappa = 0.33), respectively. It can be concluded that most variables determined during active movements can be measured with satisfactory reliability, whereas variables for other tests are not measured with the same reliability on the basis of the kappa scores. The main symptoms of temporomandibular disorders can be evaluated reliably with multitest scores. It is recommended that clinicians calibrate their techniques regularly to improve the reliability of results in daily practice.
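
The observation above that agreement can be high while kappa is low follows from how Cohen's kappa corrects for chance agreement. The sketch below, in plain Python, computes both statistics for two examiners. The function names and the ratings are hypothetical and constructed only to show the effect when one finding is rare; they are not data from the study.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of subjects on which both raters gave the same label."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters and nominal categories."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    # Agreement expected by chance from each rater's marginal frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return float("nan")  # kappa is undefined when only one label is used
    return (observed - expected) / (1.0 - expected)


# Hypothetical: 100 patients, a finding that each examiner records in 5.
# 92 joint negatives, 2 joint positives, 6 disagreements.
examiner_1 = [0] * 92 + [1] * 2 + [1] * 3 + [0] * 3
examiner_2 = [0] * 92 + [1] * 2 + [0] * 3 + [1] * 3

print(percent_agreement(examiner_1, examiner_2))       # 0.94
print(round(cohens_kappa(examiner_1, examiner_2), 2))  # 0.37
```

With a rare finding, most of the 94% raw agreement is already expected by chance, so kappa stays low even though the examiners rarely disagree.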

Journal ArticleDOI
TL;DR: The reliability of salivary testosterone assays was evaluated by nine laboratories in four countries and agreement among the laboratories on mean scores was within the range reported by Read (Ann N Y Acad Sci 1993; 694: 161-76).
Abstract: The reliability of salivary testosterone assays was evaluated by nine laboratories in four countries. Each laboratory used its own RIA procedures to assay samples from a set of 100 male and 100 female subjects. Agreement among the laboratories on mean scores was within the range reported by Read (Ann N Y Acad Sci 1993; 694: 161-76). Overall agreement on individual scores, as indicated by the intraclass correlation coefficient computed within subjects across laboratories, was r = 0.87 for men and r = 0.78 for women. Mean agreement between each laboratory and the combined set of all other laboratories (via Fisher's Z-transformation) was r = 0.61 for men and r = 0.58 for women. We take these latter values to be the best estimates of the average reliability of laboratories in their ordering of individual samples.
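
Averaging correlations via Fisher's Z-transformation, as mentioned above, means transforming each r with the inverse hyperbolic tangent, averaging on that scale, and transforming back. A minimal sketch follows; the function name and the input correlations are hypothetical, and the authors may have weighted the laboratories differently.

```python
import math


def mean_correlation_fisher_z(correlations):
    """Average correlations on Fisher's z scale, then back-transform."""
    z_values = [math.atanh(r) for r in correlations]  # r -> z
    mean_z = sum(z_values) / len(z_values)
    return math.tanh(mean_z)                          # z -> r


# Hypothetical per-laboratory correlations with the pooled other labs.
lab_correlations = [0.45, 0.60, 0.72, 0.55, 0.66]
print(round(mean_correlation_fisher_z(lab_correlations), 3))
```

The transformation matters most when the correlations differ widely, because the arithmetic mean of raw r values is biased toward zero.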

Journal ArticleDOI
TL;DR: Balance, as scored, appears to be a reliable and valid measure worth broader application among hospitalized patients.
Abstract: The purposes of this study of 30 patients referred to physical therapy were to describe the reliability of two measures of impairment and a measure of gait performance and to examine the relationships between the impairment and gait measures. The impairments measured were standing balance and muscle strength of the lower extremities, the former with a seven-level ordinal scale and the latter with a handheld dynamometer. Gait performance was measured using the seven-category scale of the Functional Independence Measure. Interday reliability was acceptable for all three measures, standing balance (weighted Kappa = .905), muscle strength (intraclass correlation coefficients = .871 to .951), and gait (weighted Kappa = .915). A Spearman correlation of .860 was found between balance and gait measures. The correlations between the strengths of various muscle groups and gait were lower (.138 to .581). Multiple regression identified none of the strength scores as offering additional independent explanation of gait...

Journal ArticleDOI
TL;DR: When the Gross Motor Performance Measure was administered by therapists who were familiar with the Gross Motor Function Measure and had a 1-day training workshop, reliability of the total scores was above recommended minimums, while scores of single attributes were less reproducible.
Abstract: Background and Purpose. The reporting of reliability coefficients and the method of their determination is expected of test developers. The purpose of this study was to estimate the interrater, intrarater, and test-retest reliability of the Gross Motor Performance Measure, a measure of quality of movement designed to accompany the Gross Motor Function Measure. Subjects. Subjects were 28 children (25 with cerebral palsy, 2 nondisabled, 1 with head injury) between the ages of 1 and 10 years. Methods. Reliability data were obtained from assessments of 19 therapists. Results. Intraclass correlation coefficients for reliability varied from .92 to .96 for the total scores and from .84 to .94 for the five attribute scores. Conclusion and Discussion. When the Gross Motor Performance Measure was administered by therapists who are familiar with the Gross Motor Function Measure and had a 1-day training workshop, reliability of the total scores was above recommended minimums. Scores of single attributes were less reproducible.

Journal Article
TL;DR: The results of this investigation demonstrate that reliable measures of isokinetic muscle performance of knee extension and flexion may be obtained by four clinicians with varied experience when following a standardized measurement protocol.
Abstract: The purpose of this investigation was to determine the interrater reliability of peak torque and total work values obtained with isokinetic measures of knee flexion and extension. Eight male and eight female students were evaluated on four occasions by four different examiners (range of isokinetic test experience: 0 to 10 yrs) using a standardized isokinetic measurement protocol. Subjects were randomly assigned to participate in a test sequence determined by a 4 x 4 balanced Latin square. Peak torque and total work values at 60 degrees /sec and 180 degrees /sec were obtained for the concentric measures of knee extension and flexion. The measures of peak torque and total work were corrected for the effects of gravity. Intraclass correlation coefficients and standard error of measurement estimates were used to estimate the interrater reliability for each test condition (test speed x muscle group). Intraclass correlation coefficient values ranged from .90 to .96 for peak torque and .90 to .95 for total work. Standard error of measurement estimates ranged from 8.9 to 13.3 Nm for peak torque and 11.3 to 16.8 Nm for total work. The results of this investigation demonstrate that reliable measures of isokinetic muscle performance of knee extension and flexion may be obtained by four clinicians with varied experience when following a standardized measurement protocol.
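
A standard error of measurement (SEM) is often derived from a reliability coefficient as SD × sqrt(1 − ICC). The sketch below applies that common formula; whether the authors used this form or an ANOVA-based estimate is not stated in the abstract, and the function name and the between-subject SD of 40 Nm are hypothetical.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM from a between-subject SD and a reliability coefficient."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1 for this formula")
    return sd * math.sqrt(1.0 - icc)


# Hypothetical peak-torque SD of 40 Nm with the lowest reported ICC (.90).
print(round(standard_error_of_measurement(40.0, 0.90), 1))  # 12.6
```

The SEM is expressed in the units of the measurement, so it complements the unitless ICC when judging whether a change between two tests exceeds measurement noise.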

Journal ArticleDOI
TL;DR: The present observation method, designed to make postural observations continuously for several hours, is easy to learn and seems reliable.
Abstract: Objectives. The aim of the study was to present an observation method focusing on the positions of the hands relative to the body and to evaluate whether this simple observation technique gives a reliable estimate of the total time spent in each of five work postures during one workday. Methods. In the first part of the study the interobserver reliability of the observation method was tested with eight blue-collar workers. In the second part the observed time spent with work above the shoulder level was tested in relation to an upper-arm position analyzer, and observed time spent in work below knuckle level was tested in relation to a trunk flexion analyzer, both with 72 blue-collar workers. Results. The interobserver reliability for full-day registrations was high. The intraclass correlation coefficients ranged from 0.99 to 1.00. The observed duration of work with hands above shoulder level correlated well with the measured duration of pronounced arm elevation (> 75°). The product moment correlation coefficient was 0.97. The observed duration of work with hands below knuckle level correlated well with the measured duration of pronounced trunk flexion angles (> 40°). The product moment correlation coefficient was 0.98. Conclusion. The present observation method, designed to make postural observations continuously for several hours, is easy to learn and seems reliable.

Journal ArticleDOI
TL;DR: This study addresses the test-retest reliability and clinical applicability of an adapted external perturbation balance assessment, ie, the Postural Stress Test (PST), designed to assess the clinical features of a component of balance disorder in stroke.

Journal ArticleDOI
TL;DR: A reliable method of scoring the MPM is developed and evidence of its validity in a community- based sample of elderly subjects is shown.
Abstract: Identifying and quantifying the location of pain may be important for understanding specific functional impairments in elderly populations. The purpose of the present analysis was two-fold: first, to describe the reliability of a scoring method for the McGill Pain Map (MPM), and second, to validate the method of scoring the MPM as a tool for assessing areas of body pain in an epidemiologic study. In interviews performed at the subjects' homes, 411 community dwelling Mexican-American and non-Hispanic white subjects aged 65–74 from the San Antonio Longitudinal Study of Aging (SALSA) were asked to describe the location of their pain on the map of the human body included in the McGill Pain Questionnaire. The location of pain was scored by overlaying the survey figures with an MPM template divided into 36 anatomical areas. Inter- and intra-rater agreement among three raters was measured by calculating a kappa statistic for each of the body areas, and an intraclass correlation coefficient for the total number of painful areas (NPA). Internal validity was measured by Spearman's rho between the NPA and the Present Pain Index (PPI) and Pain Rating Index (PRI) of the McGill Pain Questionnaire, and external validity by correlation between NPA and the Perceived Health (PH), Amount of Bodily Pain (ABP), and Pain Interference with Work (PIW) items of the Medical Outcomes Study, and the Perceived Physical Health (PPH) question of the San Antonio Heart Study. Average inter-rater agreement for individual MPM areas was 0.92 ± 0.01, and average agreement for NPA was 0.96 ± 0.01. Intra-rater agreement for individual areas averaged 0.94 ± 0.01, and for NPA it was 0.99 ± 0.001. Pain in one or more areas was present in 47.7% of the subjects. For the whole sample, correlations between NPA and the validation indices were: PPI (0.91), PRI (0.89), PH (0.25), ABP (0.64), PIW (0.49), and PPH (0.20). Among the 196 subjects with pain, correlations were: PPI (0.34), PRI (0.34), PH (0.19), ABP (0.21), PIW (0.38), and PPH (0.19), with p<0.01 for all correlations. In conclusion, we have developed a reliable method of scoring the MPM and have shown evidence of its validity in a community-based sample of elderly subjects. Patterns of painful body areas may be associated with specific diseases and functional impairments.

Journal ArticleDOI
TL;DR: By using a reference set of photographs of the retinal nerve fiber layer, a method is defined to derive a quantitative measurement of the retinal nerve fiber layer with good reliability and to extend evaluation of retinal nerve fiber layer photographs to nonspecialists.

Journal ArticleDOI
TL;DR: Physical therapists demonstrated much better ability to judge spring stiffness than the PA stiffness of human spines, which implies that mechanical stiffness is not equivalent to the clinical concept of PA stiffness.
Abstract: Background and Purpose. This study investigated whether the poor reliability of judgments of posteroanterior (PA) spinal stiffness is due to rater bias or is a consequence of raters each having individual concepts of PA stiffness. Subjects. Three pairs of manipulative physical therapists with a minimum of 5 years of experience took part in the study. Methods. The raters were required to make stiffness judgments of a series of metal springs, and their performance at this task was compared with that obtained when they rated the PA stiffness of patients with low back pain. A range of reliability indices were calculated and evaluated to establish whether rater bias contributed to poor reliability in either task. The relationship between each rater's estimates of the magnitude of the stimuli and the measured stiffness of the springs was also assessed using the Pearson Product-Moment Correlation Coefficient. Results. The average intraclass correlation coefficient (2,1) for rating spring stiffness was found to be .60, whereas for human spines it was found to be .19. There was no evidence of rater bias contributing to poor reliability for rating stiffness of human spines. The average correlation between the rater's estimates of the magnitude of the stimuli and the measured stiffness of the stimuli was .80. Conclusion and Discussion. Physical therapists demonstrated much better ability to judge spring stiffness than the PA stiffness of human spines. This difference in performance implies that mechanical stiffness is not equivalent to the clinical concept of PA stiffness. Posteroanterior stiffness may have more than one dimension, and individual interpretation of stiffness as a construct may lead to rater disagreement in the clinic. The reliability of judgments of PA spinal stiffness may be enhanced in the future if its dimensions can be identified, defined, and taken into account during clinical procedures.

Journal ArticleDOI
TL;DR: Most of the variability in the data set for the micronucleus assay was due to sampling error, and a strong differential gender effect favoring females was verified, suggesting that more attention should be directed toward improving the assay's utility, while reducing sampling error.
Abstract: The cytokinesis block method was used to examine the intraclass correlation coefficient of the human lymphocyte micronucleus assay, sources of variability, and practical issues regarding the number of samples per subject. Twenty samples of 100 binucleate cells from a single phlebotomy per subject were analyzed (n = 112), using methods to evaluate variance components. The results showed marked intraindividual (sampling error) variation greater than interindividual variation, and no between-group contribution to the total variance. The intraclass correlation was 41.6%, indicating that slightly greater than half of the total variation in micronucleus outcomes was due to error variance (i.e., 58.4%). After adjusting for age, the intraclass correlation coefficient decreased trivially from 41.6% to 39.8%. There was a strong differential gender effect, favoring a greater micronuclei frequency in women. In conclusion, the data suggest that most of the variability in our data set for the micronucleus assay was due to sampling error; a strong differential gender effect favoring females was also verified. Equally important, in terms of practical applications, our analysis of the appropriate number of samples per subject revealed that scoring greater than 1,000 cells (10 determinations per subject) yielded no substantial improvement in statistical sensitivity, compared to the traditional 20 determinations. We suggest that more attention should be directed toward improving the assay's utility, while reducing sampling error. © 1995 Wiley-Liss, Inc.
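The diminishing return from scoring more samples per subject can be illustrated with the Spearman-Brown formula for the reliability of a mean of m determinations. This is a hedged sketch rather than the authors' analysis (the abstract reports statistical sensitivity, not this quantity), and it assumes the reported intraclass correlation of 41.6% applies to a single 100-cell determination; the function name `reliability_of_mean` is hypothetical.

```python
def reliability_of_mean(icc_single, m):
    """Spearman-Brown reliability of the average of m determinations."""
    return m * icc_single / (1 + (m - 1) * icc_single)


icc_single = 0.416  # assumed to describe one 100-cell determination
r10 = reliability_of_mean(icc_single, 10)  # 1,000 cells per subject
r20 = reliability_of_mean(icc_single, 20)  # 2,000 cells per subject
print(round(r10, 3), round(r20, 3))
```

Under these assumptions, doubling the scoring effort from 10 to 20 determinations raises the reliability of the subject mean by only about 0.06.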

Journal ArticleDOI
TL;DR: The overall stability of the MRFA instrument was found to be adequate for gathering screening information in outpatient settings and additional research is necessary to confirm the findings and extend the results to a larger outpatient population.
Abstract: The stability of the musculoskeletal form of the Medical Rehabilitation Follow Along (MRFA) instrument was examined in 47 patients receiving outpatient rehabilitation services. The MRFA instrument was designed to provide information on quality of daily living, including physical function, pain, satisfaction, and emotional/psychological well-being. The instrument consists of thirty questions and can be administered as an interview or a written questionnaire. The MRFA instrument was developed using Rasch analysis procedures and is an extension of previous research involving the Functional Assessment Screening Questionnaire. Forty-seven patients completed the musculoskeletal form of the MRFA on two occasions separated by an interval of 1 to 7 days. The stability of responses was examined using the intraclass correlation coefficient (ICC) and kappa. ICC values for the sections of the MRFA instrument examining quality of daily living and physical functioning ranged from 0.74 to 0.97. ICC values for items assessing pain and feelings of well-being were more variable, ranging from 0.36 to 0.93. The kappa values displayed a similar pattern. The overall stability of the MRFA instrument was found to be adequate for gathering screening information in outpatient settings. Additional research is necessary to confirm the findings of this investigation and extend the results to a larger outpatient population.

Journal ArticleDOI
TL;DR: An estimate of the kappa-coefficient of agreement between two methods of rating based on matched pairs of binary responses is presented and it is shown that the estimate depends on the common intraclass correlation coefficient between the pairs.
Abstract: We present an estimate of the kappa-coefficient of agreement between two methods of rating based on matched pairs of binary responses and show that the estimate depends on the common intraclass correlation coefficient between the pairs. Via Monte Carlo simulation, we investigate the power of the test of significance on kappa, and the large-sample bias and variance of its maximum likelihood estimator.
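As background, the sketch below computes the standard Cohen's kappa from a 2x2 table of matched binary ratings. It is not the estimator proposed in the paper; the function name `cohen_kappa_2x2` and the cell counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def cohen_kappa_2x2(a, b, c, d):
    """Cohen's kappa for two binary ratings of the same subjects.

    a: both methods positive    b: method 1 positive, method 2 negative
    c: method 1 negative, method 2 positive    d: both methods negative
    """
    n = a + b + c + d
    p_observed = (a + d) / n
    p1_pos = (a + b) / n   # method 1 positive rate
    p2_pos = (a + c) / n   # method 2 positive rate
    # Agreement expected by chance from the marginal rates
    p_expected = p1_pos * p2_pos + (1 - p1_pos) * (1 - p2_pos)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical counts: observed agreement 0.70, chance agreement 0.50
kappa = cohen_kappa_2x2(20, 5, 10, 15)
print(round(kappa, 2))  # 0.4
```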

Journal ArticleDOI
TL;DR: The retest reliability of COP measures using a force platform in 17 women with a mean age of 69.5 years was investigated, indicating low correlations between test occasions for static stance postures and high correlations for test conditions involving angular perturbation of the support surface.
Abstract: Centre of pressure (COP) measurements have been frequently used in the evaluation of postural control, especially in the elderly. Despite this, the retest reliability of these measures has not been well investigated in this age group. This study investigated the retest reliability of COP measures using a force platform in 17 women with a mean age of 69.5 years. Results indicated low correlations between test occasions for static stance postures, with intraclass correlation coefficients (ICCs) ranging between 0.27 and 0.55. In contrast, ICCs were high (between 0.81 and 0.92) for test conditions involving angular perturbation of the support surface. Comparison of test durations of 10 and 25 seconds did not indicate any significant difference between the two (p>0.05). The study also revealed no series effects using twelve consecutive 25-second tests on the force platform. These results support further investigation of dynamic standing balance in older women using the test protocol described in this study, including whether these tests are able to discriminate between fallers and non-fallers, and their predictive validity.

Journal ArticleDOI
TL;DR: Sharp's method should be used preferentially in studies evaluating the radiologic changes in rheumatoid arthritis over time, especially in clinical trials; the carpo:metacarpal ratio may be considered as a complementary method when wrist destruction is of conceptual importance.
Abstract: RATIONALE AND OBJECTIVES: To assess the intraobserver reliability of three methods used frequently to evaluate joint destruction in rheumatoid arthritis: the Sharp method, the Larsen method, and the carpo:metacarpal ratio. METHODS: One observer analyzed 71 radiographs from patients with rheumatoid arthritis twice within a 6-week interval. Reliability was estimated by the intraclass correlation coefficient (R) and by the Altman-Bland graphical method. Correlations were examined by Spearman's coefficient (r). RESULTS: The intraobserver reliability of each method appeared satisfactory, with a good result for the Sharp method (R = 0.97). The correlation was strong (r > 0.80) between the results obtained by Sharp's and Larsen's methods and weaker between the results of these two methods and the carpo:metacarpal ratio. CONCLUSIONS: Sharp's method should be used preferentially in studies evaluating the radiologic changes in rheumatoid arthritis over time, especially in clinical trials. The carpo:metacarpal ratio may be considered as a complementary method when wrist destruction is of conceptual importance.

01 Jan 1995
TL;DR: The GMFM has inter-rater reliability for assessing gross motor function in patients with cerebral palsy, with the walking, running, and jumping dimension having higher reliability values.
Abstract: The purpose of this study was to examine the inter-rater reliability of the Korean translation of the GMFM (Gross Motor Function Measure). Three licensed physical therapists with varying amounts (2 to 6 years) of clinical experience served as raters. Thirty patients with cerebral palsy were subjects for this study. Subjects were 22 boys and 8 girls, aged 1 to 8 years. Reliability of each dimension and each total score of the GMFM was analyzed using ICCs (intraclass correlation coefficients). The reliability of each dimension score ranged from .76 to .98, with the walking, running, and jumping dimension having higher reliability values. The reliability of the total dimension score was .94. We conclude that the GMFM has inter-rater reliability for assessing gross motor function in patients with cerebral palsy.

Journal ArticleDOI
TL;DR: The use of the PDGMS as an assessment tool for children with cerebral palsy and the reliability of videotaping assessments are supported.
Abstract: The test/retest, intrarater, and interrater reliability of the Peabody Developmental Gross Motor Scale (PDGMS) was assessed in 12 children with mild or moderate cerebral palsy. A baseline test was administered, scored, and videotaped by one rater and rescored from the videotape by a second independent rater. To minimize the effect of developmental maturation, the test and retest were administered two weeks apart. The test/retest intraclass correlation coefficients ranged from 0.82 to 0.98. For interrater reliability, testing following the same protocol was repeated at 2 weeks, 3 months, and 6 months. Interrater correlation coefficients (r) ranged from 0.89 to 0.98. Interrater correlation coefficients (ICC) from scoring and later rescoring ten videotapes with the closest and furthest interrater agreement ranged from 0.88 to 0.99. The balance and locomotor skill categories were most responsive for assessing gross motor function in this population. These data support the use of the PDGMS as an ...