Showing papers on "Intraclass correlation" published in 1993


Journal ArticleDOI
TL;DR: The overall reliability of assessments made with the Fugl-Meyer evaluation of physical performance in a rehabilitation setting was high, and the intraclass correlation coefficients for the subsections of the assessment varied from .61 for pain to .97 for the upper extremity.
Abstract: Background and Purpose. The purpose of this study was to establish the interrater reliability of assessments made with the Fugl-Meyer evaluation of physical performance in a rehabilitation setting. Subjects. Twelve patients (7 male, 5 female), aged 49 to 86 years (X=66), who had sustained a cerebrovascular accident participated in the study. All patients were admitted consecutively to a rehabilitation center and were between 6 days and 6 months poststroke. Methods. Three physical therapists, each with more than 10 years of experience, assessed the patients in a randomized and balanced order using this assessment. The therapists standardized the assessment approach prior to the study but did not discuss the procedure once the study began. Results. The overall reliability was high (overall intraclass correlation coefficient=.96), and the intraclass correlation coefficients for the subsections of the assessment varied from .61 for pain to .97 for the upper extremity. Conclusion and Discussion. The relative merits of using the Fugl-Meyer assessment as a research tool versus a clinical assessment for stroke are discussed.

630 citations


Journal ArticleDOI
TL;DR: The need for a standardized set of formulas for intraclass correlation is demonstrated, and it is urged that the standard error of measurement be included when estimates of reliability are reported.
Abstract: The reliability and precision of measurement in sports medicine are of concern in both research and clinical practice. The validity of conclusions drawn from a research project and the rationale for decisions made about the care of an injured athlete are directly related to the precision of measurement. Through analysis of variance, estimates of reliability and precision of measurement can be quantified. The purpose of this manuscript is to introduce the concepts of intraclass correlation as an estimate of reliability and standard error of measurement as an estimate of precision. The need for a standardized set of formulas for intraclass correlation is demonstrated, and it is urged that the standard error of measurement be included when estimates of reliability are reported. In addition, three examples are provided to illustrate important concepts and familiarize the reader with the process of calculating these estimates of reliability and precision of measurement.

296 citations
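
The abstract above pairs the intraclass correlation (reliability) with the standard error of measurement (precision), both obtained from analysis of variance. The article's own formulas are not reproduced in this listing, so the following is only a minimal sketch under stated assumptions: a one-way random-effects model, ICC(1,1) computed from the between- and within-subject mean squares, and the SEM taken as the square root of the within-subject mean square. The function name and the sample data are hypothetical.

```python
import math

def icc_oneway_with_sem(scores):
    """Return (ICC(1,1), SEM) for a list of subjects, each a list of k repeated scores.

    Assumes a one-way random-effects ANOVA; this is one common choice of model,
    not necessarily the one recommended in the article.
    """
    n = len(scores)          # number of subjects
    k = len(scores[0])       # measurements per subject
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]

    # Between-subject and within-subject sums of squares
    ss_between = k * sum((m - grand) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2 for row, m in zip(scores, row_means) for x in row)

    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))

    icc = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    sem = math.sqrt(ms_within)   # precision, in the units of the measurement
    return icc, sem

# Hypothetical data: 4 subjects measured twice
example_scores = [[10, 12], [20, 19], [30, 31], [15, 15]]
example_icc, example_sem = icc_oneway_with_sem(example_scores)
```

Reporting the SEM alongside the ICC keeps the estimate of error in the original measurement units, which the ICC alone does not convey.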


Journal ArticleDOI
TL;DR: Validity coefficients of the 12-item General Health Questionnaire (GHQ-12) were established against the Clinical Interview Schedule (CIS) in a sample of primary care patients. The C-GHQ by itself did not improve the screening capacity of the GHQ; however, the best results were obtained by combining the conventional scoring and C-GHQ case criteria.

259 citations


Journal Article
TL;DR: Prospective intraclass (within-method) statistical analysis of the various hip-scoring methods indicated that DI was superior to NA and OFA/WHR in comparability of score over time, and the associated large error questions the predictive use of the 7-point, subjective hip-scoring scheme.
Abstract: A 3-year prospective study of large-breed dogs (4 months to 3 years of age) was conducted to evaluate the influence of radiographic positioning and age on coxofemoral joint (hip) laxity, subjective hip score, and development of degenerative joint disease (DJD). The dogs (n = 142) were breeder- or client-owned and represented 14 breeds. With dogs under heavy sedation, hips were radiographed in the standard hip-extended position and in the new compression/distraction position at 4, 6, 12, 24, and 36 months of age. The standard hip-extended radiographic view was evaluated by 3 methods: subjective evaluation by a board-certified veterinary radiologist (WHR), according to the standard 7-point Orthopedic Foundation for Animals (OFA) scoring scheme (OFA/WHR); joint laxity quantitation, using the Norberg angle (NA) method; and subjective scoring by a veterinary orthopedic surgeon for radiographic evidence of DJD. The hips in the distraction radiographic view were evaluated for passive hip laxity, as measured by use of a unitless distraction index (DI). Results of the study indicated that at a specific age (4, 6, 12, 24, or 36 months), all methods of hip evaluation correlated with each other at a moderate level (P < 0.05). The strength of contemporaneous correlation tended to increase with age of evaluation. Longitudinally, the between-method correlations were usually significant (P < 0.05), but not at a sufficiently high level to permit reliable between-method prediction. Prospective intraclass (within-method) statistical analysis of the various hip-scoring methods indicated that DI was superior to NA and OFA/WHR in comparability of score over time. The intraclass correlation coefficient ranged from 0.55 to 0.91 for DI in contrast to 0.40 to 0.78 for NA, and 0.06 to 0.39 for OFA/WHR over the age intervals of the study. For reference, the highest Kappa of 0.39 for the subjective OFA/WHR scoring reflected a maximal level of agreement between time intervals, only slightly better than chance. The associated large error questions the predictive use of the 7-point, subjective hip-scoring scheme, particularly prior to the age of 2 years.

166 citations


Journal ArticleDOI
TL;DR: It is suggested that the reproducibility of the dietary history method used was acceptable, and that the dietary patterns of examinees were sufficiently stable to be compatible with the needs of epidemiological follow-up studies.
Abstract: This study gives results for comparisons between dietary history interviews repeated at short-term (4-8 months) and long-term (4-7 years) intervals in conjunction with the Finnish Mobile Clinic Health Examination Survey. Interviews surveying the whole range of consumable foods over the preceding year were completed in 1967-1976. Short-term study was accomplished among 93 adults, and long-term study among 1844 adults. Comparisons were made for intakes of 32 food groups and 32 nutrient indices. In the short term, the intraclass correlation coefficients for nutrient indices ranged from 0.16 to 0.80, with 90% of values higher than 0.5. The corresponding figures for repeated measurements at long-term interval were generally poorer, being in the range 0.12-0.60, with 45% of values > 0.5. When studied in population subgroups, long-term agreement in dietary data was not found to be affected by sex, age, body mass index or smoking status, but it may be reduced among heavier drinkers (≥ 20 g alcohol per day). The intraclass correlation coefficients for separate nutrients tended to be higher than those for different food groups. In conclusion, we suggest that the reproducibility of the dietary history method used was acceptable, and that the dietary patterns of examinees were sufficiently stable to be compatible with the needs of epidemiological follow-up studies.

68 citations


Journal ArticleDOI
TL;DR: The argument is made that product-moment correlations and percentage agreement indexes are inadequate measures of interrater, intrarater, or test-retest agreement and should be used and interpreted with caution.
Abstract: Twenty studies examining the reliability of assessment devices and outcome measures in therapeutic research were reviewed and analyzed. The 20 investigations contained 215 quantitative reliability values published in either the American Journal of Occupational Therapy or Physical Therapy during the past 5 years. The reliability studies were classified as interrater, intrarater, test-retest, or internal consistency. Examination of interrater reliability accounted for 41% of all reported reliability values. Studies published in Physical Therapy were more likely to be concerned with test-retest reliability, whereas studies published in the American Journal of Occupational Therapy more often focused on interrater reliability. Examination of the data revealed that the intraclass correlation coefficient (ICC) was the most frequently reported estimate of reliability, accounting for 57% of all reported reliability coefficients. Further review of the results indicated that Pearson product-moment correlations and percentage of agreement indexes accounted for 22% of all reliability values reported in the studies examined. The Pearson product-moment correlation measures association or covariation among variables, but not agreement, and percentage agreement indexes do not correct for chance agreement. The argument is made that product-moment correlations and percentage agreement indexes are inadequate measures of interrater, intrarater or test-retest agreement. They should be used and interpreted with caution.

67 citations
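
The abstract above makes two points that a small numerical example can illustrate: a product-moment correlation measures covariation rather than agreement, and a percentage agreement index is not corrected for chance. The following is a rough sketch only; the helper names and the rating data are hypothetical, and Cohen's kappa is used here as one familiar chance-corrected index, not as the specific recommendation of the review.

```python
import math
from collections import Counter

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length numeric lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def percent_agreement(r1, r2):
    """Proportion of cases on which two raters give the identical rating."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def cohen_kappa(r1, r2):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(r1)
    p_obs = percent_agreement(r1, r2)
    c1, c2 = Counter(r1), Counter(r2)
    p_chance = sum(c1[c] * c2[c] for c in set(r1) | set(r2)) / n ** 2
    return (p_obs - p_chance) / (1 - p_chance)

# Rater B always scores 5 units higher than rater A: the correlation is
# perfect, yet the two raters never give the same score.
rater_a = [10, 20, 30, 40]
rater_b = [15, 25, 35, 45]
r_ab = pearson_r(rater_a, rater_b)
agree_ab = percent_agreement(rater_a, rater_b)

# Skewed categories: the raters agree on 8 of 10 cases, but nearly all of
# that agreement is expected by chance, so kappa falls below zero.
cat_1 = ["y"] * 9 + ["n"]
cat_2 = ["y"] * 8 + ["n", "y"]
agree_cat = percent_agreement(cat_1, cat_2)
kappa_cat = cohen_kappa(cat_1, cat_2)
```

In the first pair of ratings a high correlation hides a systematic offset; in the second, a high raw agreement hides the fact that both raters almost always choose the same category regardless of the case.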


Journal ArticleDOI
TL;DR: The results indicate that physical therapists demonstrate low reliability in assessment of the presence of dysmetria and tremor using videotaped performances of the finger-to-nose test, and physical therapists should seek alternative methods of evaluation of UE coordination.
Abstract: Background and Purpose. The purpose of this study was to determine the intrarater and interrater reliability of measurements of three clinical features of coordination based on the performance of the “finger-to-nose” test. Subjects. Thirty-seven persons with traumatic brain injury (26 male, 11 female), aged 17 to 64 years (X=29.1, SD=9.9), participated in the study. Methods. Each subject's performance was videotaped and evaluated for the right and left upper extremities (UEs) (two trials each) with respect to the following variables: time of execution, degree of dysmetria, and degree of tremor (four-point ordinal ratings). One year later, five experienced physical therapists (including the original investigator) independently rated each patient's videotaped performance in the same manner as described above. Results. Intraclass correlation coefficients (ICC[3,1]) for intrarater reliability were .971 and .986 and ICCs for interrater reliability were .920 and .913 for right and left UEs, respectively, for the time of execution. A generalized Kappa statistic of .54 was calculated for the scoring of dysmetria (both UEs), and Kappa statistics calculated for the scoring of tremor were .18 and .31 for right and left UEs, respectively. Interrater reliability was lower for the scoring of these variables and varied from .36 to .40 for dysmetria and from .27 to .26 for tremor (right and left UEs, respectively). Conclusion and Discussion. These results indicate that physical therapists demonstrate low reliability in assessment of the presence of dysmetria and tremor using videotaped performances of the finger-to-nose test. The results suggest, however, that therapists reliably measure the time of execution of this test. If the limitations associated with therapists' capacity for objective measurement of subjective phenomena cannot be overcome (eg, by establishment of more definitive scoring criteria for the measures of dysmetria and tremor), then therapists should seek alternative methods of evaluation of UE coordination.

65 citations
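
The study above reports ICC[3,1], which in the Shrout and Fleiss notation is the single-rating coefficient from a two-way model with raters treated as fixed. As a hedged illustration (the function name and data are hypothetical, and this is not the authors' analysis code), the coefficient can be computed from the subject and residual mean squares of a subjects-by-raters table:

```python
def icc_3_1(ratings):
    """ICC(3,1) for a table of n subjects (rows) by k raters (columns).

    Uses ICC(3,1) = (MS_subjects - MS_error) / (MS_subjects + (k - 1) * MS_error),
    where MS_error is the residual mean square after removing subject and
    rater effects in a two-way ANOVA without replication.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_subjects - ss_raters

    ms_subjects = ss_subjects / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_subjects - ms_error) / (ms_subjects + (k - 1) * ms_error)

# Hypothetical timing data: the second rater is consistently 2 units higher,
# so the residual error is zero and the consistency-type ICC equals 1.
offset_ratings = [[1, 3], [2, 4], [3, 5], [4, 6]]
# Hypothetical data with rater disagreement that is not a constant offset.
noisy_ratings = [[1, 2], [2, 2], [3, 4], [4, 3]]
icc_offset = icc_3_1(offset_ratings)
icc_noisy = icc_3_1(noisy_ratings)
```

Because the rater effect is removed before the residual is formed, this form of the coefficient does not penalize a constant difference between raters; an absolute-agreement form would.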


Journal ArticleDOI
TL;DR: PBS seems to be the most reliable index (both intra- and inter-examiner) for measuring the oral health status and is therefore recommended for use in clinical studies.
Abstract: Evaluation of periodontal therapy involves the use of several oral indices to describe the health status of hard and soft tissues. It was the objective of the present study to evaluate the reliability and reproducibility of some of these indices. A calibration and standardization session was designed to calibrate 10 examiners and a “gold standard” (an experienced examiner) in evaluating the following indices: the Volpe-Manhold calculus index (VM), the Lobene stain index (SI), a non-invasive modification of the Loe-Silness gingival index (GI), the papillary bleeding score (PBS) of Loesche, and the plaque index (PI) of Quigley-Hein as modified by Turesky. For each index, the average intraclass correlation was calculated between two subject visits. The highest intraclass correlation, 0.94, was found for PBS. The intraclass correlation for PI was 0.70 and for VM, 0.65. The lowest intraclass correlations were for stain, 0.47, and GI, 0.25. Intra-subject correlations between the 2 visits were good for all indices, but were best for PBS, followed by VM. PBS seems to be the most reliable index (both intra- and inter-examiner) for measuring the oral health status and is therefore recommended for use in clinical studies.

62 citations


Journal ArticleDOI
TL;DR: In this paper, the reliability and accuracy of the Delphi technique are assessed by means of the intraclass correlation coefficient, which is derived from the ratings of only one group of respondents.

48 citations


Journal Article
TL;DR: The effectiveness of teaching critical appraisal of the literature remains uncertain and more rigorous methods are needed in research in this area.
Abstract: OBJECTIVE: To evaluate studies assessing the effectiveness of teaching critical appraisal of the literature to medical students. DATA SOURCES: French and English articles published between 1980 and 1990 indexed on MEDLINE or FAMLI as well as articles identified from the bibliographies. STUDY SELECTION: Studies were evaluated if the subjects were undergraduate or postgraduate medical students and if the teaching intervention was aimed at improving one or more of the following areas: knowledge in clinical epidemiology and biostatistics, reading habits and ability to critically appraise a scientific article. DATA EXTRACTION: The methodologic quality of the articles was assessed by three evaluators, who used a modified version of Poynard's checklist to assign a score. Articles with a score of 60% or more were considered satisfactory. The reliability of the checklist was evaluated by means of the kappa (kappa) coefficient and a coefficient of intraclass correlation. DATA SYNTHESIS: For the three evaluators the mean kappa coefficient was 0.33 and the coefficient of intraclass correlation 0.70. Five of the 10 studies had an overall score of 60% or higher. The quality of the individual sections of the articles varied: purpose of the study 85%, description of the population 58%, methods 44%, analysis of results 50%, and conclusions 90%. CONCLUSIONS: The effectiveness of teaching critical appraisal of the literature remains uncertain. More rigorous methods are needed in research in this area.

38 citations


Journal Article
TL;DR: It is shown that, under certain conditions, it is possible to estimate the reliability by using the results of a Principal Component Analysis only, and the relation between Cronbach's alpha and the intraclass correlation coefficient, which are both used to estimate the reliability of continuous measures, is reported.
Abstract: The objective is to establish a simple relationship between two frequently used validation techniques which have been developed in the literature along the same lines: Principal Component Analysis and Cronbach's alpha. We have shown that under certain conditions, it is possible to estimate the reliability by using the results of a Principal Component Analysis only. Moreover, we report the relation between Cronbach's alpha and intraclass correlation coefficient, which are both used to estimate the reliability of continuous measures.

Journal ArticleDOI
TL;DR: In this paper, the reliability and validity of a Japanese version of the Dementia Behavior Disturbance Scale (DBD Scale) was investigated and the relationship between DBD scores and the degree of burden felt by caregivers was investigated.
Abstract: Since behavioral disturbance among patients with dementia is a great burden for their caregivers, quantification of behavioral disturbance is essential in determining disease severity and assessing the impact of the disease on caregivers. However, a method for its objective quantification has not yet been established. We studied the reliability and validity of a Japanese version of the Dementia Behavior Disturbance Scale (DBD Scale), which was originally developed by Baumgarten et al. We also studied the relationship between DBD scores and the degree of burden felt by caregivers. Our subjects consisted of 27 patients with dementia (mean age 77.7 years), 17 patients with neurological disorders without dementia (76.8 years), and 10 institutionalized patients with dementia (82.3 years). The test-retest reliability, internal consistency, and inter-rater reliability were very good: the coefficient of correlation between DBD scores at the two interviews was 0.96, the coefficient of internal consistency was 0.95, and the intraclass correlation coefficient was 0.71 ± 0.10. DBD scores correlated significantly with SPMSQ errors and caregivers' burden (r = 0.54 and 0.53, respectively). Our results indicate that the DBD Scale is highly reliable and may be useful for objective assessment of behavioral disturbance and caregivers' burden.

Journal ArticleDOI
TL;DR: Interrater reliability of the Worker Role Interview was computed for three raters for a sample of 30 adult subjects receiving rehabilitation due to an upper extremity injury, indicating further need for instrument refinement in select areas.
Abstract: The Worker Role Interview is a semistructured interview designed to be used as the psychosocial-environmental component of the initial rehabilitation assessment for the injured worker. Interrater reliability of the Worker Role Interview was computed for three raters for a sample of 30 adult subjects receiving rehabilitation due to an upper extremity injury. Reliability was assessed with the intraclass correlation approach. The coefficients estimating interrater reliability for six content areas ranged from .46 to .92 with a total value of .81. Three out of six content areas received ratings well below the accepted standard of .80, suggesting further need for instrument refinement in select areas. Test-retest reliability was computed for one rater for a sample of 20 subjects. The intraclass coefficient values ranged from .86 to .94 with a total value of .95 indicating high test-retest reliability.

Journal ArticleDOI
TL;DR: The results seem to indicate that the video motion analysis system used in this study yields repeatable ROM and velocity measures on a clinical population, and these measures may be obtained without undue concern for measurement artifact due to the instrumentation reliability.
Abstract: Background and Purpose. The purpose of this study was to investigate the repeatability of spinal range of motion (ROM) and movement velocity measurements of patients with chronic low back pain, using a two-dimensional motion analysis system. This apparatus uses reflective markers placed on anatomical landmarks and video digitization to derive ROM measurements from three segments of the spine and associated velocities through the respective ROMs. Subjects. Forty-two patients with chronic LBP underwent ROM and movement velocity testing. Methods. Each subject was tested twice without removal of the markers to minimize error contribution from differences in marker placement. Results. Results indicated that both the ROM measures and the velocity measures were highly repeatable. Intraclass correlations for the ROM measures ranged from .77 to .96. Velocity measures were also reliable, with intraclass correlation coefficients ranging from .75 to .97. Conclusion and Discussion. Overall, the results seem to indicate that the video motion analysis system used in this study yields repeatable ROM and velocity measures on a clinical population. In practice, however, the measures may reflect greater errors due to the need of examiners to relocate markers at different testing sessions. These systems also offer distinct advantages over other means of obtaining ROM and velocity measures. The results of this study indicate that these measures may be obtained without undue concern for measurement artifact due to the instrumentation reliability.

Journal ArticleDOI
TL;DR: This preliminary review was performed to determine the intra- and interrater reliability of TROM and resulted in moderate consistency of measures of intrarater reliability between trials, and between instruments used (digital and manual).

Journal ArticleDOI
TL;DR: In this paper, the authors examined agreement between importance ratings used in job analysis and found that there was a high level of agreement as measured by the relative and dichotomous indexes.
Abstract: The authors examined agreement between importance ratings used in job analysis. The ratings were obtained from small committees of content experts and from field-survey respondents. Three measures of agreement were used: a relative index (product-moment correlation), an absolute index (intraclass correlation), and a dichotomous index (cutpoint). Data were obtained from 2 job analysis studies conducted for purposes of developing teacher-licensure tests in Spanish and in chemistry. In both studies there was a high level of agreement as measured by the relative and dichotomous indexes and a moderately high level of agreement as measured by the absolute index. The implications of these findings for job analysts and test developers are discussed.

Journal Article
TL;DR: The mean value of at least 3 subsequent evaluations after an adequate adaptation period (5 to 10 minutes) to the equipment will be useful for predicting energy requirements of apparently resting, clinically normal dogs.
Abstract: Energy expenditure (EE) was determined, using an open-flow indirect calorimetry system in a group of 20 clinically normal, apparently resting, client-owned dogs. Five evaluations were performed over an 8-hour period to determine reliability of the method. The intraclass correlation coefficient was calculated as the ratio of within- and between-subject variances, using repeated-measures ANOVA. When only the middle 3 evaluations were included, the intraclass correlation coefficient was 0.87, indicating good reliability. The first evaluation was higher than the subsequent 4 evaluations for rate of O2 consumption (Vo2/kg and Vo2/kg^0.75; P ≤ 0.01), and EE/kg and EE/kg^0.75 (P ≤ 0.005). The respiratory quotients at the first (P = 0.004) and second (P = 0.013) evaluations were different from the respiratory quotient at the fourth evaluation. Therefore, the first evaluation may not be representative of the actual EE. The mean value of at least 3 subsequent evaluations after an adequate adaptation period (5 to 10 minutes) to the equipment will be useful for predicting energy requirements of apparently resting, clinically normal dogs.

Journal ArticleDOI
TL;DR: Only moderate agreement was established between raters in such 'class' assignments of the modified 18-item, sign-based CORE index of melancholia, a limitation which can be redressed by imposing a 'probable/possible melancholia' band of scores.

Journal ArticleDOI
TL;DR: The primary finding from this study is that TVPS scores on the total test show adequate test-retest reliability for use in clinical settings; however, the scores on the subtests should be used with extreme caution, as the test-retest reliability estimates were low.
Abstract: This study examined the test-retest reliability of the Test of Visual Perceptual Skills (nonmotor) (TVPS). The sample consisted of 30 first- and second-grade children (aged 6 years through 8 years) with identified learning disabilities. The TVPS was administered on two separate occasions that were 1 to 2 weeks apart. The intraclass correlation coefficient for the total test standard scores was .81. The intraclass correlation coefficients for the subtests ranged from .33 (Sequential Memory) to .78 (Form Constancy). The primary finding from this study is that TVPS scores on the total test show adequate test-retest reliability for use in clinical settings. The scores on the subtests, however, should be used with extreme caution, as the test-retest reliability estimates were low.

Journal ArticleDOI
TL;DR: The results suggest that these tiltboard tests do not give stable and reliable measurements across test sessions, and before these tests can be used to document change in postural control abilities across time, further research is warranted.
Abstract: Background and Purpose. Most clinical evaluations of postural control in children are relatively subjective and have not been tested for reliability of scoring. The purpose of this study was to investigate the test-retest reliability of measurements obtained with two tiltboard tests. Subjects. Subjects were 18 children, aged 53 to 81 months (X=64.4, SD=8.3), who were typically developing (TD group) and 18 children, aged 50 to 79 months (X=63.3, SD=8.4), with developmental delays (DD group). Methods. Each child was tested using the two tiltboard tests and was then retested using the same tests approximately 1 week later. The maximum angle of tiltboard tilt prior to any postural adjustment by the child was recorded. Results. Intraclass correlation coefficients for test-retest reliability (two-way, random-effects, repeated-measures model) ranged from .49 to .54 for the TD group and from .52 to .82 for the DD group. Angles were higher for both groups for the second test. Conclusion and Discussion. The results suggest that these tiltboard tests do not give stable and reliable measurements across test sessions. Before these tests can be used to document change in postural control abilities across time, further research is warranted.

Journal Article
TL;DR: Findings indicate the functional mobility assessment tool is reliable, thereby increasing the usefulness of this method for clinical assessment and the high resolution of the FMAT makes this tool ideally suited for use in future studies focusing on the prediction of mobility function following neurological insult.
Abstract: The purpose of this preliminary study was to examine the intratester and intertester reliability of a functional mobility assessment tool (FMAT). Seven licensed physical therapists with varying amounts of clinical experience served as raters. Twelve patients with neurological deficits were subjects for this study. Raters were asked to provide eight possible ratings for each of eight critical mobility functions. Average weighted kappa coefficients ranged from .82 to .97. Intraclass correlation coefficients ranged from .73 to .97 for the first assessment and from .52 to .97 for the second (retest) assessment. A high degree of agreement between and within seven raters indicated that this tool may provide an effective assessment of stroke, brain injury, and spinal cord injury. Preliminary findings indicate the functional mobility assessment tool is reliable, thereby increasing the usefulness of this method for clinical assessment. The high resolution of the FMAT makes this tool ideally suited for use in future studies focusing on the prediction of mobility function following neurological insult.

Journal ArticleDOI
TL;DR: Four estimators of agreement between two dichotomous ratings of a person, Scott's (1955) π coefficient, Cohen's (1960) κ, Maxwell & Pilliner's (1968) r11, and Mak's (1988) ρ, are shown to be interpretable both as chance-corrected measures of agreement and as intraclass correlation coefficients for different ANOVA models.
Abstract: Many estimators of the measure of agreement between two dichotomous ratings of a person have been proposed. The results of Fleiss (1975) are extended, and it is shown that four estimators, namely Scott's (1955) π coefficient, Cohen's (1960) κ, Maxwell & Pilliner's (1968) r11, and Mak's (1988) ρ, are interpretable both as chance-corrected measures of agreement and as intraclass correlation coefficients for different ANOVA models. Relationships among these estimators are established for finite samples. Under Kraemer's (1979) model, it is shown that these estimators are equivalent in large samples, and that the equations for their large sample variances are equivalent.
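
Two of the estimators named above can be compared directly on a 2×2 agreement table. The sketch below is illustrative only: it uses the textbook definitions of Scott's π (chance agreement from the pooled marginals) and Cohen's kappa (chance agreement from each rater's own marginals), it does not reproduce the paper's ANOVA derivations, and the function name and cell counts are hypothetical.

```python
def scott_pi_and_cohen_kappa(a, b, c, d):
    """Chance-corrected agreement for two dichotomous ratings.

    Cell counts: a = both raters positive, b = rater 1 positive only,
    c = rater 2 positive only, d = both raters negative.
    Returns (Scott's pi, Cohen's kappa).
    """
    n = a + b + c + d
    p_obs = (a + d) / n                  # observed agreement
    p1 = (a + b) / n                     # rater 1 proportion positive
    p2 = (a + c) / n                     # rater 2 proportion positive

    # Cohen: chance agreement from the product of each rater's marginals
    pe_cohen = p1 * p2 + (1 - p1) * (1 - p2)
    # Scott: chance agreement from the marginals averaged over both raters
    p_bar = (p1 + p2) / 2
    pe_scott = p_bar ** 2 + (1 - p_bar) ** 2

    pi = (p_obs - pe_scott) / (1 - pe_scott)
    kappa = (p_obs - pe_cohen) / (1 - pe_cohen)
    return pi, kappa

# When both raters have the same marginal proportions (b == c), the two
# coefficients coincide; when the marginals differ, pi is no larger than kappa.
pi_equal, kappa_equal = scott_pi_and_cohen_kappa(40, 5, 5, 50)
pi_unequal, kappa_unequal = scott_pi_and_cohen_kappa(40, 10, 0, 50)
```

The two coefficients differ only through the raters' marginal disagreement, which gives some intuition for why such estimators can behave alike when the raters' marginal proportions are close.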

Journal ArticleDOI
TL;DR: In this paper, a test statistic for testing the validity of the assumptions of family independence and constant interclass and intraclass correlations is developed for constant family size; it has an asymptotic chi-square distribution.
Abstract: When familial data are analysed, the model usually employed assumes independence of family observations and constancy of interclass and intraclass correlations. A statistic for testing the validity of the assumptions of family independence and constant interclass and intraclass correlations is developed. The test statistic is for constant family size; it has an asymptotic chi-square distribution. An example to illustrate the theory is given using Frets's data on head lengths. A recommendation is made on how to apply the test to the general case of varying sizes. Another recommendation is made on models to be tried once the null hypothesis is rejected.

Journal ArticleDOI
TL;DR: In this paper, new coefficients of agreement are suggested for the measure of intraclass consistency between observations on two variables, derived from a general coefficient for measuring intra-class dependence in a bivariate analysis context.
Abstract: The community of preference of N judges, or raters, can be quantified with coefficients measuring the agreement of the values assigned by the judges to a set of H units ("classes"). In this paper, new coefficients of agreement are suggested for the measure of intraclass consistency between observations on two variables. The coefficients are derived from a general coefficient for measuring intraclass dependence in a bivariate analysis context. It is shown that various coefficients for the univariate agreement analysis (Krippendorff's r; Fisher's intraclass correlation coefficient; Cohen's κ; Scott's π; Robinson's A; and Kendall's W) are particular cases of the suggested coefficients.

Journal ArticleDOI
TL;DR: A simple, efficient, iterative method of computing the maximum likelihood estimates of components of variance and intraclass correlation under the assumption of multivariate normality is provided.