Showing papers on "Intraclass correlation" published in 1991


Journal ArticleDOI
TL;DR: Using data from a clinical trial of therapy for back pain, it is shown that reproducibility should generally be quantified with the intraclass correlation coefficient rather than the more common Pearson r, and that retesting at one- to two-week intervals may yield more realistic estimates of the variability to be observed among control subjects in a longitudinal study.

1,363 citations
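
A minimal sketch of why the Pearson r can overstate reproducibility, not taken from the paper itself. It assumes a balanced test-retest design and a one-way random-effects ICC (the listing does not say which ICC form the authors used); the helper names `pearson_r` and `icc_oneway` and the toy scores are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def icc_oneway(rows):
    """One-way random-effects ICC for a balanced table (one row per subject)."""
    n, k = len(rows), len(rows[0])
    grand = sum(sum(r) for r in rows) / (n * k)
    means = [sum(r) / k for r in rows]
    # Between-subject and within-subject mean squares from one-way ANOVA.
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((v - m) ** 2 for r, m in zip(rows, means) for v in r) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Hypothetical scores: every subject scores exactly one point higher on retest.
test_scores = [1, 2, 3]
retest_scores = [2, 3, 4]

r = pearson_r(test_scores, retest_scores)
icc = icc_oneway(list(zip(test_scores, retest_scores)))
print(f"Pearson r = {r:.2f}, one-way ICC = {icc:.2f}")
```

With these toy scores the Pearson r is 1.00 because the ranking is perfectly preserved, while the one-way ICC drops to 0.60 because the systematic one-point shift counts as disagreement.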


Journal ArticleDOI
TL;DR: Although the improved reliability of the Standardized Mini-Mental State was achieved by reducing measurement noise, this advantage would likely occur in a broad spectrum of patients.
Abstract: Objective: The objective of this study was to compare the reliability of the Mini-Mental State Examination with that of a new Standardized Mini-Mental State Examination, which has expanded guidelines for administration and scoring. Method: The subjects were 32 stable elderly residents of a nursing home and 16 elderly residents of a chronic care hospital unit. Six raters administered the Folstein Mini-Mental State to 22 of these stable elderly subjects, and five raters administered the standardized version to 26 of these subjects. Each subject was tested on three different occasions 1 week apart. Each rater tested 4-6 subjects at the first and third weeks and 4-6 different subjects at the second week. The analytic technique used was one-way analysis of variance to estimate the interrater variance and the intrarater variance. Results: The intrarater variance on all occasions was reduced by 86% and the interrater variance was reduced by 76% when the Standardized Mini-Mental State was used; the reductions in variance were significant (p < 0.003). The intraclass correlation for the Mini-Mental State was 0.69; for the standardized version it was 0.90. It took less time to administer the Standardized Mini-Mental State than the Mini-Mental State. Conclusions: The Standardized Mini-Mental State had better reliability than the Mini-Mental State in this study group. Although the improved reliability of the Standardized Mini-Mental State was achieved by reducing measurement noise, this advantage would likely occur in a broad spectrum of patients.

595 citations


Journal ArticleDOI
TL;DR: There is increasing awareness among researchers that the two most appropriate measures of reliability are the intraclass correlation coefficient and kappa; however, unacceptable statistical measures of reliability such as chi-square, percent agreement, product moment correlation, as well as any measure of association and Yule's Y still appear in the literature.
Abstract: Reliability is defined as the degree to which multiple assessments of a subject agree (reproducibility). There is increasing awareness among researchers that the two most appropriate measures of reliability are the intraclass correlation coefficient and kappa. However, unacceptable statistical measures of reliability such as chi-square, percent agreement, product moment correlation, as well as any measure of association and Yule's Y still appear in the literature. There are costs associated with improper measurements, unreliable diagnostic systems, inappropriate statistics and measures of reliability, and poor quality research. Costs are incurred when misleading information directs resources and talents into nonproductive avenues of research. The consequences of unreliable measurements and diagnosis are illustrated with some studies of schizophrenia.

331 citations
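
A small illustration of why percent agreement can mislead where kappa does not. This is a sketch under stated assumptions rather than anything from the paper: it uses Cohen's unweighted kappa for two raters, and the function names and the ten hypothetical diagnoses (1 = condition present, 0 = absent) are invented for the example.

```python
def percent_agreement(a, b):
    """Proportion of subjects on which the two raters give the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's unweighted kappa: agreement corrected for chance agreement."""
    n = len(a)
    po = percent_agreement(a, b)
    categories = set(a) | set(b)
    # Chance agreement expected from each rater's marginal label frequencies.
    pe = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    return (po - pe) / (1 - pe)  # undefined when pe == 1 (one shared category)


# Hypothetical ratings: both raters call almost every subject positive.
rater_a = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
rater_b = [1, 1, 1, 1, 1, 1, 1, 1, 0, 1]

po = percent_agreement(rater_a, rater_b)
kappa = cohen_kappa(rater_a, rater_b)
print(f"percent agreement = {po:.2f}, kappa = {kappa:.2f}")
```

Here the raters agree on 80% of subjects, yet kappa is slightly negative (about -0.11): because both raters label nearly everyone positive, that much agreement is expected by chance alone.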


Journal ArticleDOI
TL;DR: The epidemiological literature has been ambiguous concerning reliability estimates for continuous variables, but based on demonstrations available in the literature this paper recalls that both estimates are equivalent and highlights the conditions under which they are equivalent.

296 citations


Journal ArticleDOI
TL;DR: The results indicate that physical therapists' VOGA assessments are only slightly to moderately reliable and that improved interrater reliability of the assessments of physical therapists utilizing this technique is needed and suggests that there is a need for greater standardization of gait-analysis training.
Abstract: The purpose of this study was to determine the interrater reliability of videotaped observational gait-analysis (VOGA) assessments. Fifty-four licensed physical therapists with varying amounts of clinical experience served as raters. Three patients with rheumatoid arthritis who demonstrated an abnormal gait pattern served as subjects for the videotape. The raters analyzed each patient's most severely involved knee during the four subphases of stance for the kinematic variables of knee flexion and genu valgum. Raters were asked to determine whether these variables were inadequate, normal, or excessive. The temporospatial variables analyzed throughout the entire gait cycle were cadence, step length, stride length, stance time, and step width. Generalized kappa coefficients ranged from .11 to .52. Intraclass correlation coefficients (2,1) and (3,1) were slightly higher. Our results indicate that physical therapists' VOGA assessments are only slightly to moderately reliable and that improved interrater reliability of the assessments of physical therapists utilizing this technique is needed. Our data suggest that there is a need for greater standardization of gait-analysis training.

230 citations
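
The labels (2,1) and (3,1) in the abstract above follow the Shrout and Fleiss naming of intraclass correlations. The sketch below computes both from a two-way ANOVA decomposition of a complete subjects-by-raters table; it is an illustration only, and the function name `icc_two_way` and the toy ratings are hypothetical rather than taken from the study.

```python
def icc_two_way(rows):
    """Return (ICC(2,1), ICC(3,1)) for a complete subjects-by-raters table."""
    n, k = len(rows), len(rows[0])  # n subjects, k raters
    grand = sum(sum(r) for r in rows) / (n * k)
    row_means = [sum(r) / k for r in rows]
    col_means = [sum(r[j] for r in rows) / n for j in range(k)]

    # Mean squares: between subjects (bms), between raters (jms), residual (ems).
    bms = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    jms = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    sse = sum(
        (rows[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ems = sse / ((n - 1) * (k - 1))

    # ICC(2,1): raters treated as random, absolute agreement.
    icc21 = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
    # ICC(3,1): raters treated as fixed, consistency only.
    icc31 = (bms - ems) / (bms + (k - 1) * ems)
    return icc21, icc31


# Hypothetical ratings: the second rater scores every subject one point higher.
ratings = [[1, 2], [2, 3], [3, 4]]
icc21, icc31 = icc_two_way(ratings)
print(f"ICC(2,1) = {icc21:.3f}, ICC(3,1) = {icc31:.3f}")
```

For these toy ratings ICC(3,1) is 1.000 while ICC(2,1) is 0.667, which shows the design choice between the two: a constant offset between raters is ignored by the consistency form and penalized by the absolute-agreement form.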


Journal Article
TL;DR: Suggestions for the standardization of experimental design, analysis and evaluation for reliability studies are offered, with a primary focus on appropriate concordance statistics, i.e., Kappa and intraclass correlation.

175 citations


Journal ArticleDOI
TL;DR: Agreement among the ethnic-sex groups varied, with the Chinese females and the Japanese males having the higher rl's, and the Hawaiian males and females having the lowest values.
Abstract: The validity of a quantitative diet history method was evaluated among 262 men and women from the five major ethnic groups of Hawaii (Japanese, Caucasian, Chinese, Filipino, and Hawaiian) in 1984-1987. The reference data included four 1-week food records obtained at approximately 3-month intervals. The diet history was administered 6 months after the fourth week of food records and included 47 foods that were major sources of protein, fat, cholesterol, vitamins A and C, and beta-carotene. Photographs showing three portion sizes were utilized for quantifying intakes in the food records and the diet history. Generally, among all ethnic-sex groups, intakes from the diet history were greater than those from the record sets, particularly for the vitamins. Agreement was measured by the intraclass correlation coefficient (rl) and the weighted kappa statistic (kappa w), and consistency was measured by Spearman's rank correlation (rho). For the total group, the rl's ranged from 0.48 for vitamin A to 0.61 for cholesterol. The kappa w's were generally lower than the rl's, whereas the rho's were higher, ranging from 0.52 for vitamin C to 0.64 for cholesterol. Agreement among the ethnic-sex groups varied, with the Chinese females and the Japanese males having the higher rl's, and the Hawaiian males and females having the lowest values. The results provide evidence that the quantitative diet history gives reasonably accurate estimates of the usual dietary intakes among the major ethnic groups of Hawaii.

173 citations


Journal ArticleDOI
TL;DR: The strong correlation of the scales with an independently assessed parallel measure and the separate positive and negative factors found with principal components analysis confirm the construct validity of the instruments.
Abstract: This 4-center study assesses the reliability and validity of the Chinese versions of the Scale for Assessment of Positive Symptoms and the Scale for Assessment of Negative Symptoms. Interrater reliability, short-term test-retest reliability, and internal consistency were excellent; intraclass correlation coefficients and Cronbach alphas for the overall scores were all over 0.8. The strong correlation of the scales with an independently assessed parallel measure (the Chinese version of the Brief Psychiatric Rating Scale) and the separate positive and negative factors found with principal components analysis confirm the construct validity of the instruments. These findings demonstrate the importance of culturally sensitive revision and rigorous psychometric evaluation of Western instruments prior to their use in non-Western cultures.

76 citations


Journal ArticleDOI
TL;DR: It is concluded that isokinetic evaluation of torque, as measured by PT and APT in subjects with spastic hemiparesis, can yield reliable results in both extremities.
Abstract: The purpose of this study was to evaluate and compare the test-retest reliability of isokinetic torque measurements in the involved and uninvolved knee musculature of 20 subjects with spastic hemiparesis. An isokinetic dynamometer was used to measure maximal voluntary knee extension and flexion at 60 degrees and 120 degrees/s. Peak torque (PT) and average peak torque (APT) data were collected from five repetitions on two separate occasions. Average peak torque was defined as the mean of the PT values obtained during each of the five repetitions. Spasticity was measured in the involved knee musculature prior to isokinetic testing using the Ashworth Scale. Pearson Product-Moment Correlation Coefficients and intraclass correlation coefficients (ICCs) were high (greater than or equal to .90) for both knees for PT and APT at both angular velocities. No clinically meaningful differences were found between the Pearson correlation coefficients and the ICCs of the involved versus the uninvolved knee for any testing conditions. We concluded that isokinetic evaluation of torque, as measured by PT and APT in subjects with spastic hemiparesis, can yield reliable results in both extremities.

50 citations


Journal ArticleDOI
TL;DR: The DBRI is a specific, reliable and valid caregiver‐reported measure of dysfunctional behaviour in cognitively impaired elderly living in the community.
Abstract: The objective of this study was to examine the reliability and validity of the Dysfunctional Behaviour Rating Instrument (DBRI) in cognitively impaired older adults living in the community. A total of 184 adults with suspected cognitive impairment received a standardized history, physical examination and work-up that included the Standardized Mini-Mental State Examination. Caregivers scored a DBRI, a Behaviour Problem Checklist (BPC) and a Lawton Scale for each patient. The reliability of the DBRI, measured by an intraclass correlation coefficient, was 0.75. The correlation coefficient between the DBRI and the BPC total score was 0.71. The correlations between the DBRI and the cognitive, activities of daily living and self-care domain scores of the BPC were lower (0.66, 0.38 and 0.26 respectively). The DBRI is a specific, reliable and valid caregiver-reported measure of dysfunctional behaviour in cognitively impaired elderly living in the community.

Journal ArticleDOI
TL;DR: Interobserver reliability in head circumference measurement was assessed in a cohort of 1105 low birthweight infants enrolled in a study of brain hemorrhage, finding that this level of reliability may be acceptable, but in research studies this degree of misclassification would lead to attenuation of the odds ratio.

Journal ArticleDOI
TL;DR: It is proposed that the correlation coefficient may be inappropriate in these studies as a measure of association and it is recommended that other techniques be used to assess agreement between nutrient scores derived in reliability or validation studies.
Abstract: Accuracy and precision of nutritional data are crucial in estimating effects in nutritional epidemiology. Because it is known that such data are usually flawed, studies have been designed to estimate both the validity of diet assessment methods in measuring "true" diet and the reliability of these methods in providing nutrient data that are at least reproducible. In these studies, validity and reliability have often been gauged by computing correlation coefficients between two or more estimates of diet and testing the coefficient's departure from 0. We propose that the correlation coefficient may be inappropriate in these studies as a measure of association. If correlation coefficients are presented, we suggest that one should also present confidence intervals and test the departure of the coefficient from approximately 1 rather than 0. We have examined this approach using dietary data from various studies. We have computed 95% confidence intervals of the correlation coefficients and have tested H0: rho = 0.95 as an approximation of rho = 1.00. In all of the studies selected, comparisons produced correlation coefficients statistically significantly different from both 0.95 and 0. Due to the dependence of the correlation coefficient on factors unique to individual studies, it is recommended that other techniques be used to assess agreement between nutrient scores derived in reliability or validation studies. Viable options include linear regression, analyses of the standard deviations of the differences between scores, and examinations of the intraclass correlation coefficient.
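
The idea of testing a correlation against 0.95 rather than 0 can be sketched as follows. The abstract does not say how the intervals and tests were computed, so the Fisher z transformation used here is an assumption, as are the function names and the example values r = 0.8 and n = 50.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for a correlation via Fisher's z."""
    z = atanh(r)
    se = 1 / sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return tanh(z - crit * se), tanh(z + crit * se)


def z_test_rho(r, n, rho0):
    """Two-sided z test of H0: rho = rho0, again on the Fisher z scale."""
    stat = (atanh(r) - atanh(rho0)) * sqrt(n - 3)
    p = 2 * (1 - NormalDist().cdf(abs(stat)))
    return stat, p


# Hypothetical validation study: observed r = 0.8 from n = 50 subjects.
r_obs, n_obs = 0.8, 50
lo, hi = fisher_ci(r_obs, n_obs)
stat0, p0 = z_test_rho(r_obs, n_obs, 0.0)
stat95, p95 = z_test_rho(r_obs, n_obs, 0.95)
print(f"95% CI = ({lo:.3f}, {hi:.3f})")
print(f"vs rho = 0:    z = {stat0:.2f}, p = {p0:.3g}")
print(f"vs rho = 0.95: z = {stat95:.2f}, p = {p95:.3g}")
```

In this made-up case the coefficient is significantly different from 0 and also significantly below 0.95, the same pattern the abstract reports for the studies it examined.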

Journal ArticleDOI
TL;DR: Validity indices were higher on diagnostic categories on which the flow-chart is simpler, suggesting that the memory/cognition process is a factor that attenuates reliability and face validity assessments.
Abstract: The Diagnostic Interview Schedule (DIS) is an instrument useful for cross-cultural studies and for research service delivery; therefore, a study of its reliability and concurrent validity in Mexico was carried out. Interrater reliability showed intraclass correlation coefficient (ICC) values of .89 among lay interviewers. The DSM-III syndrome checklist was elicited as the clinical diagnostic measure for concurrent validity. Interrater reliability among clinicians showed ICC values ranging from .64 to .92. For the concurrent validity study a minimum quota of 10 patients for each diagnosis was sought. The final sample included 55 inpatients and 94 outpatients interviewed independently. Sensitivity as a whole was low, ranging from .08 to .53; specificity ranged from .80 to .99; kappa agreement ranged from -.02 to .60; Yule's Y ranged from -.05 to .81. Validity indices were higher on diagnostic categories on which the flow-chart is simpler, suggesting that the memory/cognition process is a factor that attenua...

Journal ArticleDOI
TL;DR: The results confirm and extend observations by others that these assessment measures are sufficiently reliable for use in a multiinstitutional collaborative effort and can be used to design clinical trials that have sufficient statistical power to detect changes in the rate of disease progression.

Journal ArticleDOI
TL;DR: The purpose of this study was to compare grip-strength measurements obtained with three instruments: a Jamar handgrip dynamometer, a modified sphygmomanometer (MS) inflated to 20 mmHg, and an MS inflated to 30 mmHg.

Journal Article
TL;DR: An experiment was undertaken to determine the intra- and interexaminer reliability of a paraspinal skin temperature differential instrument and numerical ratings were evaluated for agreement with the intraclass correlation coefficient.

Journal ArticleDOI
TL;DR: In this paper, a scale-dependent procedure for assessing the reliability of ratings for multiple judges using intraclass correlation is presented, where scale type is defined in terms of admissible transformations, and standardizing transformations for ratio and interval scales are presented to solve the problem of adjusting ratings for arbitrary scale factors.
Abstract: Scale-dependent procedures are presented for assessing the reliability of ratings for multiple judges using intraclass correlation. Scale type is defined in terms of admissible transformations, and standardizing transformations for ratio and interval scales are presented to solve the problem of adjusting ratings for "arbitrary scale factors" (unit and/or origin of the scale). The theory of meaningfulness of numerical statements is introduced and the coefficient of relational agreement (Stine, 1989b) is defined as the degree of agreement among judges, with respect to (scale-dependent) empirically meaningful relationships. Other topics discussed include the treatment of variability due to judges in relation to scale type, and the reliability of magnitude estimates in psychophysics.

Journal ArticleDOI
TL;DR: In this paper, the interrater agreement of visual judgments made from single-subject data was examined, and the results suggest that the low interrater agreement often associated with visual analysis of single-subject data may be improved by simple supplements to visually inspected charts.
Abstract: The interrater agreement of visual judgments made from single-subject data was examined. Seventy-nine raters were given 21 single-subject graphs. Thirty-nine of the raters examined graphs containing single-subject data arrayed in the traditional format. The remaining 40 subjects reviewed AB graphs that were supplemented by a trend line. As measured by intraclass correlation coefficients, interrater agreement was higher for the trend line group than for the group relying only on visual analysis. There was a statistically significant correlation between the change in slope across the phases of the AB design and a score reflecting disagreement among raters in the visual analysis group. This relationship between change in slope and rater disagreement was not present in the trend line group. The results suggest that the low interrater agreement often associated with visual analysis of single-subject data may be improved by simple supplements to visually inspected charts.

Journal ArticleDOI
TL;DR: The adjustments required for valid application of matched pair procedures, including the paired t-test and McNemar's chi-square test for correlated proportions, are presented.
Abstract: Application of standard statistical procedures to site-specific data in periodontal research is invalid unless site-to-site dependencies are accounted for. In this paper, we present the adjustments required for valid application of matched pair procedures, including the paired t-test and McNemar's chi-square test for correlated proportions. Examples are given involving data arising from: (i) the comparison of pre- and post-treatment clinical measurements; (ii) split-mouth protocols.

Journal Article
TL;DR: An investigation was undertaken to determine the inter- and intraexaminer reliability of the Gonstead pelvic radiographic marking system, and in every case, intraexaminer agreement was superior to interexaminer concordance.

Journal ArticleDOI
TL;DR: In this article, the authors identify three models of subjective well-being and show that the respective estimates of determination require different interpretations of the correlation statistic, and the differences are illustrated using data from the Newfoundland Longitudinal Study of Aging.
Abstract: The issue has been raised previously that the use of the correlation or squared correlation as an estimate of determination depends on the type of model that is considered (Ozer, 1985). We identify three models of subjective well-being and show that the respective estimates of determination require different interpretations of the correlation statistic. The differences are illustrated using data from the Newfoundland Longitudinal Study of Aging (NLSA).

Journal ArticleDOI
TL;DR: In this article, the asymptotic variance of the maximum likelihood estimator of the son-daughter interclass correlation was derived under the assumption of variable family sizes and normality.
Abstract: In genetics, the term interclass correlation has been used to refer to the parent-offspring and son-daughter correlations within families. Under the assumption of variable family sizes and normality, we derive the asymptotic variance of the maximum likelihood estimator of the son-daughter interclass correlation.

Journal ArticleDOI
TL;DR: A bias correction was derived for the maximum likelihood estimator (MLE) of the intraclass correlation and it was shown that the first correction term was equivalent to Fisher's reciprocal bias correction on his Z scores.
Abstract: A bias correction was derived for the maximum likelihood estimator (MLE) of the intraclass correlation. The bias consisted of two parts: a correction from MLE to the analysis of variance estimator (ANOVA) and the bias of ANOVA. The total possible bias was always negative and depended upon both the degree of correlation and the design size and balance. The first part of the bias was an exact algebraic expression from MLE to ANOVA, and the corrected estimator by this part was ANOVA. It was also shown that the first correction term was equivalent to Fisher's reciprocal bias correction on his Z scores. The total possible bias of MLE was large for small and moderate samples. Relative biases were larger for small parametric values and vice versa. To ensure a relative bias less than 10% assuming an intraclass correlation of 0.025, which is not unusual in most of the animal genetic studies, the total number of observations (N) should be not less than 500. From a design point of view, minimum bias occurred at n = 2, the minimum family size possible, under N fixed.

Journal ArticleDOI
TL;DR: Clinicians using such protocols should be aware of differences within and between days and recognize that measurements will be influenced by the number of trials completed, as results showed slight differences between the first three and last three trials.
Abstract: The purpose of this study was to determine within- and between-day reliability of measurements of nondisabled subjects for the variables of force and velocity when a balance board (STARStation®) was positioned at heights of 4.5 and 7.5 cm from the supporting surface. Twenty-four nondisabled subjects each completed six trials of board rotation at a self-selected velocity. Each trial consisted of 10 revolutions in a clockwise direction. Measurements were repeated within the same day for a second board position, and all tests were completed again on a second day. Descriptive statistics were computed for force and velocity, and intraclass correlation coefficients (ICCs) were calculated. Data were submitted to analyses of variance and follow-up tests. Results showed slight differences between the first three and last three trials. Intraclass correlation coefficients for within-subject reliability for the independent variable day ranged from .72 to .81, and ICCs for within-day reliability for the independent variable trial ranged from .46 to .81. Clinicians using such protocols should be aware of differences within and between days and recognize that measurements will be influenced by the number of trials completed.

Journal ArticleDOI
TL;DR: A procedure for evaluating a variety of rater reliability models is presented, and the results contrasted to those found with an intraclass correlation approach.
Abstract: A procedure for evaluating a variety of rater reliability models is presented. A multivariate linear model is utilized to describe and assess a set of ratings. The parameters of such a model are reexpressed in terms of a factor-analytic model. Maximum likelihood methods are employed to estimate and test the parameters in this factor-analytic model. The approach is related to the use of the intraclass correlation coefficient to estimate reliability. Two examples are presented, and the results contrasted to those found with an intraclass correlation approach. Extensions of the procedure to multiple sets of judges, multiple measures, and multiple groups are introduced.