
Showing papers on "Intraclass correlation" published in 1984


Journal Article
TL;DR: A new rating instrument, the Alzheimer's Disease Assessment Scale, was designed specifically to evaluate the severity of cognitive and noncognitive behavioral dysfunctions characteristic of persons with Alzheimer's disease.
Abstract: A new rating instrument, the Alzheimer's Disease Assessment Scale, was designed specifically to evaluate the severity of cognitive and noncognitive behavioral dysfunctions characteristic of persons with Alzheimer's disease. Item descriptions, administration procedures, and scoring are outlined. Twenty-seven subjects with Alzheimer's disease and 28 normal elderly subjects were rated on 40 items. Twenty-one items with significant intraclass correlation coefficients for interrater reliability (range, .650-.989) and significant Spearman rank-order correlation coefficients for test-retest reliability (range, .514-1) constitute the final scale. Subjects with Alzheimer's disease had significantly more cognitive and noncognitive dysfunction than the normal elderly subjects.

3,792 citations


Journal Article
TL;DR: It is concluded that, even with moderately sized samples, the effects of age and sex can best be adjusted for through a twin-based approach.
Abstract: For most psychological, physiological, and medical variables there are substantial age and sex effects. In assessing twin similarity for these variables, one can either fail to adjust for the effects of age and sex, adjust for these effects using normative data, or use information in the twin sample to define an age-sex adjustment. It is shown that failing to correct for age and sex effects when they exist will result in overestimation of the twin intraclass correlation. Using normative data to define an age-sex adjustment will also result in overestimation of the twin intraclass correlation, although the magnitude of this overestimation is slight for moderate-sized normative samples and virtually nonexistent for large normative samples. Using a twin-based age-sex adjustment will lead to an underestimation of the twin intraclass correlation, but this underestimation can be corrected for through proper specification of the degrees of freedom for the between-pairs mean square. Illustrations of the effects of age-sex adjustment are provided, as well as the results of a computer simulation comparison of the various approaches. It is concluded that, even with moderately sized samples, the effects of age and sex can best be adjusted for through a twin-based approach.

739 citations


Journal Article
TL;DR: Methods are presented for performing multiple regression analyses and multiple logistic regression analyses on ophthalmologic data with normally and binomially distributed outcome variables, while accounting for the intraclass correlation between eyes.
Abstract: Methods are presented for performing multiple regression analyses and multiple logistic regression analyses on ophthalmologic data with normally and binomially distributed outcome variables, while accounting for the intraclass correlation between eyes. These methods are extended to more general nested data structures where a variable number of subunits are available for each primary unit of analysis, as in familial data. These methods can also be applied to other types of paired data, as in matched studies with a variable matching ratio, where one has a continuous outcome variable and wishes to control for other confounding variables while maintaining the matching. Examples are given of these methods with a group of over 400 patients with retinitis pigmentosa, in which spherical refractive error and visual acuity are related to genetic type after the effects of age, sex, and the presence of cataract have been controlled.

262 citations


Journal Article
TL;DR: In this paper, the authors developed general chance-corrected measures of agreement on individual subjects, for several observers using nominal or ordinal categories, which can be used to identify subjects whom the observers find difficult to rate.
Abstract: General chance-corrected measures of agreement on individual subjects, for several observers using nominal or ordinal categories, are developed. The subject-specific measures can be used to identify subjects whom the observers find difficult to rate. The relationship of the subject-specific measures to a general chance-corrected measure of agreement for a group of subjects is demonstrated. By suitable choices of disagreement functions, the measure of agreement for a group of subjects is shown to include, as special cases, many of the kappa-like statistics. Also, it is asymptotically equivalent to various intraclass correlation coefficients. The measures do not require that the observers all use the classification scale in the same way. The asymptotic null and non-null variances obtained by Taylor-series approximations for the statistics are presented. The application of the measures is illustrated by data obtained when seven pathologists classified slides on a five-point ordinal scale for the diagnosis of carcinoma in situ of the uterine cervix.

75 citations


Journal Article
TL;DR: It was shown that inter-rater reliability levels were also good to excellent for the categorical diagnosis of personality disorder and demonstrated that abnormal personality can be reliably assessed by both British and American raters.
Abstract: The inter-rater reliability of a schedule used to assess personality disorders was examined. The Personality Assessment Schedule (PAS) involves an interview with both the patient and a close informant and the ratings for the informant are given most weight in the final scoring. Videotaped interviews with 23 psychiatric patients, most of whom had a clinical diagnosis of personality disorder, and a close informant were scored by seven raters, four in the United Kingdom and three in the United States. Overall inter-rater reliabilities (using the intraclass correlation coefficient, RI) were generally good to excellent for each of the 24 personality variables tested, ranging between .66 and .94 for informants and between .51 and .91 for subjects. Corresponding reliability coefficients for overall mean PAS scores were .82 and .75, respectively. Consistent with these findings, there was little bias between the scores of American and British raters, although there was some tendency for American raters to score higher for the trait of eccentricity and lower for the trait of conscientiousness than was true for British raters. There was less bias for informants' ratings than for those of subjects. In a second set of analyses, it was shown that inter-rater reliability levels (using the Kappa statistic) were also good to excellent (.6 to .8) for the categorical diagnosis of personality disorder. These results, taken together, demonstrate that abnormal personality can be reliably assessed by both British and American raters.

66 citations


Journal Article
TL;DR: In this article, the problem of estimating the intraclass correlation when the sampling design is unbalanced is discussed, and the method of moments is used to derive an approximation to the distribution of the estimate of the intraclass correlation obtained by the variance components approach.
Abstract: The problem of estimating the intraclass correlation when the sampling design is unbalanced is discussed. The method of moments is used to derive an approximation to the distribution of the estimate of the intraclass correlation obtained by the variance components approach. The maximum likelihood estimator is also presented, along with a simple procedure due to Richard (1961), for numerically maximizing a likelihood function of several parameters. Finally, the issue of optimal study design is considered for both balanced and unbalanced situations. For given power we determine the number of sets required to detect different values of the intraclass correlation.

15 citations
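
As a rough illustration of the variance components approach mentioned in the abstract above, the sketch below computes the textbook one-way ANOVA estimate of the intraclass correlation for groups of unequal size. This is an assumption about what "variance components approach" refers to, not the paper's own derivation: the function name `anova_icc` and the effective group size `n0` follow the standard one-way random-effects formulas, and the distributional approximation and maximum likelihood estimator discussed in the paper are not reproduced here.

```python
def anova_icc(groups):
    """One-way ANOVA (variance components) estimate of the intraclass correlation.

    `groups` is a list of lists of numbers; groups may differ in size.
    Hypothetical helper for illustration only.
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between-group and within-group sums of squares.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)

    # Effective group size for an unbalanced design;
    # it reduces to the common group size when the design is balanced.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)

    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)


# Usage: three families with 3, 2, and 1 measured members.
print(anova_icc([[1, 2, 3], [4, 5], [9]]))
```

The estimate can be negative when the within-group mean square exceeds the between-group mean square; the sketch returns that value unchanged rather than truncating it at zero.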


Journal Article
TL;DR: In this article, a variety of measures of reliability for two-category nominal scales are reviewed and compared, and it is shown that upon correcting these indices for chance agreement, there are only five distinct indices: Fleiss's modification of A1, the φ coefficient, Cohen's kappa, and two intraclass coefficients.
Abstract: A variety of measures of reliability for two-category nominal scales are reviewed and compared. It is shown that upon correcting these indices for chance agreement, there are only five distinct indices: Fleiss's modification of A1, the φ coefficient, Cohen's kappa, and two intraclass coefficients. Additional derivations indicate that when marginals are held constant, all but one of the measures are linear functions of agreement and, thus, of one another. In particular, they are equal once the maximum obtainable values for a given data set are equated. The single exception is an intraclass correlation that explicitly includes variation due to observer mean differences as part of the error variance. This index is dependent on sample size; moreover, as the number of subjects increases, this index approaches the kappa coefficient as a limit. Recommendations for choosing an index of agreement are made based on definitions, magnitude, convenience, and consistency.

11 citations
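
Two of the indices named in the abstract above, Cohen's kappa and the φ coefficient, can be computed directly from a 2×2 agreement table. The sketch below is a minimal illustration using the standard definitions; the function name `kappa_and_phi` and the cell labels `a`, `b`, `c`, `d` are assumptions introduced here, and the other indices the paper compares (Fleiss's modification of A1 and the two intraclass coefficients) are not implemented.

```python
import math


def kappa_and_phi(a, b, c, d):
    """Cohen's kappa and the phi coefficient for two raters and two categories.

    Cell counts (hypothetical labels):
      a: both raters say "yes"      b: rater 1 "yes", rater 2 "no"
      c: rater 1 "no", rater 2 "yes" d: both raters say "no"
    """
    n = a + b + c + d
    p_observed = (a + d) / n

    # Marginal "yes" proportions for each rater.
    p1_yes = (a + b) / n
    p2_yes = (a + c) / n

    # Agreement expected by chance if the raters were independent.
    p_chance = p1_yes * p2_yes + (1 - p1_yes) * (1 - p2_yes)
    kappa = (p_observed - p_chance) / (1 - p_chance)

    phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return kappa, phi


# Usage: the two raters have different marginals, so kappa and phi differ slightly.
print(kappa_and_phi(20, 5, 10, 15))
```

When both raters have identical marginal proportions, as in the table (40, 10, 10, 40), the two indices coincide, which is one simple case of the close relationships between indices that the abstract describes.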



Journal Article
TL;DR: In this paper, the authors generalized the pure error lack-of-fit test to the case of nonreplication and error structure σ²V for certain known positive definite correlation matrices V. The critical points of the F distribution can be used to provide a test of the exact desired size.
Abstract: The well-known pure error lack-of-fit test, which can be used to assess the adequacy of a proposed linear regression model, requires replication and assumes that the error structure is σ²I. This procedure is generalized to provide a test for lack of fit for the case of nonreplication and error structure σ²V for certain known positive definite correlation matrices V. Included in the class of applicable correlation matrices are the cases of intraclass correlation and equicorrelation. The critical points of the F distribution can be used to provide a test of the exact desired size.

8 citations


Journal Article
TL;DR: In this paper, the τb and γ statistics are interpreted as rank-monotonic coefficients of partial agreement, and the τbi and γi intraclass coefficients of total monotonic agreement are created.
Abstract: The τb and γ statistics are interpreted as rank-monotonic coefficients of partial agreement. Using a method of transposition employed by Pearson's ri intraclass correlation coefficient, the τbi and γi intraclass coefficients of total monotonic agreement are created. Transpositional measures of agreement like τbi and γi measure the combined effects of cell and marginal disagreement, which makes them particularly suitable for reliability studies. The coefficients are also made applicable to K > 2 sets of ranks.

4 citations
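
The "method of transposition" that the abstract above attributes to Pearson's ri is assumed here to be the classical double-entry computation: each pair (x, y) is entered a second time as (y, x), and the ordinary product-moment correlation is taken over the doubled data. The sketch below illustrates only that assumption for paired scores; the helper names are hypothetical, and the paper's τbi and γi coefficients are not implemented.

```python
import math


def pearson_r(xs, ys):
    """Ordinary Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def double_entry_icc(pairs):
    """Intraclass correlation by double entry: every pair is also entered transposed."""
    xs = [a for a, b in pairs] + [b for a, b in pairs]
    ys = [b for a, b in pairs] + [a for a, b in pairs]
    return pearson_r(xs, ys)


# Usage: the second rater scores every subject exactly 2 points higher.
pairs = [(1, 3), (2, 4), (3, 5)]
print(pearson_r([a for a, b in pairs], [b for a, b in pairs]))
print(double_entry_icc(pairs))
```

Because transposition forces both columns to share the same mean and variance, a constant offset between raters lowers the double-entry coefficient even though the ordinary correlation is unaffected. This matches the abstract's remark that transpositional measures capture marginal disagreement as well as cell disagreement.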


Journal Article
TL;DR: The limb load monitor can be used by clinicians to measure the five step components and three gait measures (ambulation time, velocity, and cadence) with high measurement reliability.
Abstract: This study assessed the reliability of measurements made by four physical therapists on healthy subject gait data recorded from the Krusen limb load monitor. The five components of step (stance time, time up, time to second peak, and force at the first and second peaks) were analyzed. Six components contributing to gait (ambulation time; velocity; cadence; average swing phase duration, left lower extremity; average swing phase duration, right lower extremity; and ratio of unilateral weight bearing, right lower extremity to left lower extremity) were also analyzed. Intraclass correlation coefficients for the five step components and the gait measures of ambulation time, velocity, and cadence showed high measurement reliability. The other measures of gait showed low intraclass correlation coefficients. The limb load monitor can, therefore, be used by clinicians to measure the five step components and three gait measures (ambulation time, velocity, and cadence) with high measurement reliability.

Journal Article
TL;DR: Four faculty members in removable prosthodontics participated in a three-phase rater training program and nine complete dentures were given an overall rating by three outside experts to obtain an accuracy measure.
Abstract: Four faculty members in removable prosthodontics participated in a three-phase rater training program. Nine complete dentures were given an overall rating by three outside experts to obtain an accuracy measure. A three-point rating scale was used: R (cannot be appreciably improved), S (clinically acceptable), and T (clinically unacceptable). During the pretraining phase, the average interrater reliability was .57 as estimated by intraclass correlation, and the mean accuracy correlation was .76. The rater training phase consisted of a four-hour session including presentations and discussions of rating terminology and formats, observer accuracy, and related issues. In the post-training phase, the same nine complete dentures were rated by the four faculty members, using the same three-point rating scale and a five-point scale. The average post-training interrater reliabilities were .56 and .76 on the three- and five-point scales, respectively. The mean accuracy correlation was .78.