
Showing papers on "Intraclass correlation" published in 1992


Journal ArticleDOI
TL;DR: The PAR Index offers uniformity and standardization in assessing the outcome of orthodontic treatment and is flexible in that the weightings could be changed to reflect future standards and standards currently being achieved in other countries.
Abstract: The PAR Index has been developed to provide a single summary score for all the occlusal anomalies which may be found in a malocclusion. The score provides an estimate of how far a case deviates from normal alignment and occlusion. The difference in scores between the pre- and post-treatment cases reflects the degree of improvement and, therefore, the success of treatment. Excellent reliability was exhibited within and between examiners (Intraclass Correlation Coefficient, R greater than 0.91). The components of the PAR Index have been weighted to reflect current British orthodontic opinion, and the index is flexible in that the weightings could be changed to reflect future standards and standards currently being achieved in other countries. The PAR Index offers uniformity and standardization in assessing the outcome of orthodontic treatment.

705 citations


Journal ArticleDOI
TL;DR: In 58 subjects with stable asthma, good short term test-retest reliability was demonstrated and weak correlations in the expected direction were seen with three medical markers of asthma severity, supporting the construct validity of the questionnaire and emphasizing that quality of life represents a separate dimension of asthma.

386 citations


Journal ArticleDOI
TL;DR: The 2-page RADAR questionnaire produces valid estimates of joint count and clinical status that are sensitive to change and is suitable for use in patients with rheumatoid arthritis.
Abstract: Objective This study documents the measurement properties of a brief, self-administered questionnaire of disease signs and symptoms in patients with rheumatoid arthritis. Methods The Rapid Assessment of Disease Activity in Rheumatology (RADAR) questionnaire assesses joint pain/tenderness and clinical status. One hundred ninety-three pairs of RADAR forms were completed by 45 subjects and their assigned clinician evaluators. Results Subject-clinician agreement (intraclass correlation coefficients [ICC]) for joint pain/tenderness and clinical status ranged from 0.52 to 0.87 (P = 0.0001), with 83% greater than or equal to 0.65. The ICC for change in joint scores over 6 months was 0.83 (P = 0.0001). Conclusion The 2-page RADAR questionnaire produces valid estimates of joint count and clinical status that are sensitive to change.

150 citations


Journal ArticleDOI
TL;DR: A study comparing the inter-observer reliability of these two approaches to quantify the extent of skin involvement in scleroderma using the ratings of six clinicians on 12 patients found the score method was the preferred method.
Abstract: Two methods have been proposed to quantify the extent of skin involvement in scleroderma. These are (1) a scoring system which quantifies and summates this severity rating in 17 areas of skin surface and (2) a method estimating the percentage of skin involvement using a shaded manikin. We report on a study comparing the inter-observer reliability of these two approaches using the ratings of six clinicians on 12 patients. Systematic bias between observers was noted with both methods, but inter-observer agreement, as assessed by the intraclass correlation coefficient (ICC), was higher with the score method. The manikin method resulted in a greater degree of disagreement between the observers, as well as a higher amount of random error, reflecting the difficulty of defining the bounds of abnormal skin. Despite the presence of bias, the score method is the preferred method for assessing the level of skin involvement.

107 citations


Journal ArticleDOI
TL;DR: It is suggested that the components of the APACHE II score can be collected reliably, and the reproducibility of the age variable suggests that age is accurately abstracted.

70 citations


Journal ArticleDOI
Bolognese JA, Kozloff RC, Kunitz SC, Placido Grino, Patrick DL, Elizabeth Stoner
TL;DR: A questionnaire to assess the effect of finasteride on symptoms of benign prostatic hyperplasia (BPH) by modifying that of Boyarsky (1977) is developed and validated, and responsiveness was shown by the 3.7‐point mean TSS improvement in response to TURP, which was significantly different from the near‐zero changes in the other groups.
Abstract: We developed a questionnaire to assess the effect of finasteride on symptoms of benign prostatic hyperplasia (BPH) by modifying that of Boyarsky (1977). To validate the questionnaire, a cohort study was conducted in 2 groups of patients with BPH and 3 control groups without BPH. The groups were: (1) 34 patients before TURP (transurethral resection of the prostate), average age 68 years; (2) 65 patients after TURP, average age 68 years; (3) 40 patients after other nonserious nonurological surgery, average age 50 years; (4) 14 healthy non-BPH volunteers, average age 58 years; and (5) 73 healthy non-BPH volunteers, average age 37 years. The questionnaire was administered once to all subjects, and a subset responded to a second administration. Mean total symptoms scores (TSS) from the initial questioning were 6.4, 3.2, 2.9, 2.6, and 1.6 for the 5 groups, respectively (pooled SD = 3.3); mean total troublesome symptoms scores (TTSS) were 4.8, 2.1, 1.4, 1.1, and 0.6, respectively (pooled SD = 2.2). All other groups were significantly less symptomatic and troubled than the pre-TURP group, and all surgical groups were significantly more so than the younger volunteer group. These data demonstrate the discriminant validity of the questionnaire. Corroborating prior data [Gregg et al., 1990], responsiveness was shown by the 3.7-point mean TSS improvement in response to TURP, which was significantly different from the near-zero changes in the other groups. Reproducibility was shown by kappa statistics being nearly all greater than 0.75 and an intraclass correlation coefficient of 0.64; construct validity and reliability were demonstrated by correlation (r = 0.7) with a general urination problems question; and internal consistency was documented by Cronbach's alpha values of approximately 0.6. We conclude that this questionnaire is a useful and validated tool for assessing BPH symptoms.

66 citations


Journal ArticleDOI
11 Nov 1992-JAMA

66 citations


Journal ArticleDOI
TL;DR: This work compares by simulation the moment method and the more standard ANOVA method of estimating the intraclass correlation and finds the former is less biased for a small to moderate number of clusters but the difference disappears when the appropriate degree of freedom is used for the ANOVA estimator.
Abstract: In group randomized studies, the sample size calculations are complicated by within group (worksite, community, etc.) correlation. We compare by simulation the moment method and the more standard ANOVA method of estimating the intraclass correlation. We find the former is less biased for a small to moderate number of clusters but the difference disappears when the appropriate degree of freedom is used for the ANOVA estimator. We propose a simulation approach for sample size determination and illustrate it with an example.

65 citations
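The entry above notes that within-group correlation complicates sample size calculations in group randomized studies. As a rough illustration only, the standard closed-form design effect, 1 + (m - 1) * ICC for clusters of equal size m, can be sketched as below. This is a textbook approximation, not the simulation approach the authors propose; the function names and the example numbers are hypothetical.

```python
import math


def design_effect(cluster_size: float, icc: float) -> float:
    """Variance inflation factor for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1.0) * icc


def clusters_per_arm(n_individual: float, cluster_size: int, icc: float) -> int:
    """Clusters needed per arm, given the sample size an individually
    randomized trial would need (n_individual) and a common cluster size."""
    inflated = n_individual * design_effect(cluster_size, icc)
    return math.ceil(inflated / cluster_size)


if __name__ == "__main__":
    # Hypothetical inputs: 200 subjects per arm if individually randomized,
    # worksites of 50 people, ICC of 0.02.
    print(design_effect(50, 0.02))
    print(clusters_per_arm(200, 50, 0.02))
```

Even a small intraclass correlation inflates the required sample noticeably when clusters are large, which is why the choice of ICC estimator matters for planning.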


Journal ArticleDOI
TL;DR: This work presents a computer simulation experiment where the correlation between sets of randomly generated numbers is calculated and shows that the average maximum linear correlation for randomly generated numbers is 0.70 or higher if the sample size is low compared to the number of variables.
Abstract: Many software measures have been forwarded on the simple basis of a high linear correlation coefficient with some measurable quantities. The linear correlation coefficient is an unreliable statistic for deciding whether an observed correlation indicates significant association. Several published software measure experiments collected more than 20 different measurements, or have 14 or fewer observations. With considerable data from small samples, the probability of ‘discovering’ a ‘significant’ correlation is high. We present a computer simulation experiment where the correlation between sets of randomly generated numbers is calculated. We also look at randomly generated numbers in the ranges that would be expected in Halstead's Software Science [1] measures. Our results show that the average maximum linear correlation for randomly generated numbers is 0.70 or higher if the sample size is low compared to the number of variables. Alternative statistical approaches to obtaining meaningful significant results are presented.

64 citations
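The simulation described in the entry above can be approximated with a short sketch. The design here is an assumption, since the abstract does not give details: each trial draws several independent uniform random variables, computes every pairwise Pearson correlation, and records the largest absolute value. The function names, the uniform distribution, and the parameter values are all hypothetical.

```python
import random
from itertools import combinations


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_max_abs_r(n_vars, n_obs, n_trials, seed=0):
    """Average, over trials, of the largest |r| among all pairs of
    independently generated random variables."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        data = [[rng.random() for _ in range(n_obs)] for _ in range(n_vars)]
        total += max(abs(pearson_r(x, y)) for x, y in combinations(data, 2))
    return total / n_trials


if __name__ == "__main__":
    # Many variables, few observations: large spurious correlations appear.
    print(average_max_abs_r(n_vars=20, n_obs=10, n_trials=50))
    # Same variables, many observations: the maximum shrinks.
    print(average_max_abs_r(n_vars=20, n_obs=200, n_trials=50))
```

Comparing the two calls shows the effect the abstract warns about: the fewer the observations relative to the number of variables, the larger the best-looking correlation found by chance.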


Journal ArticleDOI
TL;DR: It is suggested that, in general, women recall gestational age well, which supports the use of gestational age derived from maternal interviews, and there was greater misclassification of prematurity in the controls than in the cases.
Abstract: Agreement between maternal interview- and medical record-based gestational age was assessed by using data from a case-control study of childhood strabismus. The sample consisted of 383 cases of strabismus and their age-matched controls, diagnosed between 1985 and 1986 in Baltimore, Maryland, who were under age 7 years when diagnosed. Medical record-based gestational age was derived, in order of priority, from early ultrasound examination, time from the last menstrual period, pediatric examination, and obstetric examination. The intraclass correlation coefficient, kappa, and mean difference were used to compare agreement between maternal interview- and medical record-based gestational age by maternal and pregnancy characteristics and characteristics related to study design. Overall, 86 percent of mothers were within 2 weeks of the gestational age reported in the medical record. The intraclass correlation coefficient comparing maternal and medical record-based gestational age was 0.83 (95% confidence interval 0.80-0.86). Agreement was positively associated with shorter length of recall, low birth order, and having a neonatal illness related to prematurity. Agreement was poor among mothers of healthy preterm infants. There was a weak positive association between recall and some sociodemographic covariates. There was greater misclassification of prematurity in the controls than in the cases. The results suggest that, in general, women recall gestational age well, which supports the use of gestational age derived from maternal interviews.

42 citations


Journal ArticleDOI
TL;DR: It is important for researchers to become familiar with the various forms of intraclass correlations and to report the version used in their calculations and the rationale for their choice.
Abstract: Intraclass correlation coefficients are useful statistics for estimating interrater reliability. The ICC provides a means for quantifying the level of rater agreement as well as rater consistency. The ICC is easier to use than the Pearson r when more than two raters are involved and can be computed when data are missing on some subjects (Haggard, 1958). Use of this statistic allows the researcher to decide whether or not to include rater effects in estimating IRR and to determine the precision of the reliability estimate. Information about the various types of intraclass correlations and their use is frequently absent from psychometric references commonly used by nurse researchers, resulting in confusion about correct usage and interpretation. Because different values are obtained depending on which ICC formula is selected, ICC formulae reported in the literature can have varying interpretations. For this reason, it is important for researchers to become familiar with the various forms of intraclass correlations and to report the version used in their calculations and the rationale for their choice.
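To illustrate the point that different ICC formulae give different values on the same data, here is a minimal sketch that computes two single-rater forms from a two-way ANOVA layout: an absolute-agreement form that counts rater effects as error, and a consistency form that does not. The labels follow the commonly used Shrout and Fleiss conventions, ICC(2,1) and ICC(3,1), which is an assumption since the abstract does not name specific forms; the function names and ratings are hypothetical.

```python
def two_way_mean_squares(ratings):
    """Mean squares for subjects (rows), raters (columns) and residual
    from a complete subjects-by-raters table of scores."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ms_subjects = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_raters = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_error = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_error = ss_error / ((n - 1) * (k - 1))
    return ms_subjects, ms_raters, ms_error


def icc_agreement(ratings):
    """ICC(2,1): rater effects included, so systematic offsets lower it."""
    n, k = len(ratings), len(ratings[0])
    msr, msc, mse = two_way_mean_squares(ratings)
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def icc_consistency(ratings):
    """ICC(3,1): rater effects excluded, so constant offsets are ignored."""
    k = len(ratings[0])
    msr, _, mse = two_way_mean_squares(ratings)
    return (msr - mse) / (msr + (k - 1) * mse)


if __name__ == "__main__":
    # Hypothetical data: rater 2 always scores exactly 2 points higher.
    ratings = [[1, 3], [2, 4], [3, 5], [4, 6]]
    print(icc_consistency(ratings))
    print(icc_agreement(ratings))
```

In this toy table the raters rank subjects identically, so the consistency form is perfect, while the agreement form is much lower because of the constant two-point offset. Reporting which form was used is what lets a reader interpret the number.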

Journal ArticleDOI
01 Mar 1992-Spine
TL;DR: The interexaminer reliability of an inclinometer procedure to measure lumbar rotation was evaluated by two chiropractic clinicians who examined 25 chronic (>6 months) low-back pain patients and 25 subjects without low-back pain.
Abstract: The interexaminer reliability of an inclinometer procedure to measure lumbar rotation was evaluated by two chiropractic clinicians who examined 25 chronic (greater than 6 months) low-back pain patients and 25 subjects without low-back pain. These groups were compared for differences in mean left, right, and total rotation. Patients who had lumbar spinal surgery were excluded. Twenty-eight men and 22 women, ranging in age from 28-38 years, were evaluated. Reliability between examiners was evaluated by Pearson's correlation coefficient and the intraclass correlation coefficient. All coefficients were significant (P less than 0.01). Errors in prediction and examiner disagreement were evaluated by the standard error of estimate and the interexaminer measurement error. The standard errors of estimate (range: 1.4-4.4) and the interexaminer measurement errors (range: 3.8-10.4) were large compared to the scale of measurement. An analysis of variance of differences between the chronic low-back pain patients and asymptomatics revealed significantly more left rotation in the asymptomatic subjects (F = 8.4; df = 1; P less than 0.006). Also, there was significantly more total rotation in the asymptomatic subjects (F = 4.143; df = 1; P less than 0.048). However, because of the large error attributed to this procedure, it is not possible to say whether the difference between the two groups is a result of the large error or some "real" difference. Therefore, the procedure described in this study should not be used as a clinical outcome measure.

Journal ArticleDOI
TL;DR: The reliability of maximal upper and lower lip closing forces measured using a strain-gauged cantilever beam assembly is determined and helpful information is yielded for the design of investigations of oral-motor weakness and for the quantitative assessment of an individual's clinical status.
Abstract: This study determined the reliability of maximal upper and lower lip closing forces measured using a strain-gauged cantilever beam assembly. An intraclass correlation approach was used to explicitl...

Journal ArticleDOI
TL;DR: Age at onset, analyzed as a continuous variable with the intraclass correlation method, was found to be correlated in siblings, suggesting that the search for continuous traits distributed in families of schizophrenic patients might constitute an alternative to discrete category-based family studies.
Abstract: This study examines the concordance of clinical subtypes and age at onset of schizophrenia in 42 sibships of multiply affected schizophrenic patients. Subtypes were defined by four major diagnostic systems (DSM-III, DSM-III-R, ICD-10, and Tsuang-Winokur criteria) and rated both for the first hospitalization and long-term diagnosis. When a sibship method was used, no concordance for subtypes was found in siblings. Age at onset, analyzed as a continuous variable with the intraclass correlation method, was found to be correlated in siblings. This finding suggests that the search for continuous traits distributed in families of schizophrenic patients might constitute an alternative to discrete category-based family studies.

Journal ArticleDOI
TL;DR: Examination of the ability of a video-based, computer-interfaced motion analysis system to provide reliable data found that measurement of all variables was highly reliable and recommended using a mean of three trials for angular velocity variables.
Abstract: The purpose of this study was to examine the ability of a video-based, computer-interfaced motion analysis system to provide reliable data. Ten subjects with no significant orthopedic or neurological dysfunction and ranging in age from 22 to 45 years (mean = 29.6, SD = 7.8) were tested. Retroreflective markers were placed on the posterior shank and foot of each subject. Footswitches were attached to the plantar forefoot and rear foot. A video camera was placed behind the subject, and video data were collected while the subject walked on a treadmill. One representative gait cycle for each subject was selected and processed 10 times with a video processor and analysis software. Three intraclass correlation coefficients (ICCs) were calculated for variables generated by the analysis software, one for two individual measures and one each for the mean of three and five repeated measures. Except for temporal variables, processing data introduced additional variability into the measurement process, particularly for angular velocity data. Measurement of all variables was highly reliable (ICC values greater than or equal to .95) when based on the mean of at least three repeated measures. Although a single measure of temporal and angular position variables may be considered reliable, we recommend using a mean of three trials for angular velocity variables. Additional research is needed to determine tester and subject variability and validity of the measures.
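The gain in reliability from averaging repeated measures can be sketched with the Spearman-Brown relation, which predicts the reliability of a mean of k measures from the single-measure ICC. This is an assumption about how such values relate in general; the abstract does not state which formulae the authors used, and the function names and input values below are hypothetical.

```python
def icc_of_mean(icc_single: float, k: int) -> float:
    """Spearman-Brown prediction: reliability of the mean of k measures."""
    return k * icc_single / (1.0 + (k - 1) * icc_single)


def trials_needed(icc_single: float, target: float, max_k: int = 1000) -> int:
    """Smallest number of averaged measures whose predicted reliability
    reaches the target."""
    for k in range(1, max_k + 1):
        if icc_of_mean(icc_single, k) >= target:
            return k
    raise ValueError("target reliability not reached within max_k measures")


if __name__ == "__main__":
    # Hypothetical single-measure ICC of 0.90 and a target of 0.95.
    for k in (1, 3, 5):
        print(k, icc_of_mean(0.90, k))
    print(trials_needed(0.90, 0.95))
```

The curve flattens quickly, which fits the abstract's recommendation to average a few trials only for the noisier variables.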

Journal ArticleDOI
TL;DR: The hand-held dynamometer appears to warrant use and further investigation with pediatric populations; improved reliability might be obtained by supporting the lower extremity during hip extension tests, padding the dynamometer end pieces, and using a smaller, digital dynamometer.
Abstract: The long-term stability of hand-held dynamometric measurements was assessed in 30 muscle groups of 12 children with myelomeningocele, before and after a 23-day interval. Measurements from a majority of the muscle groups had excellent stability, based on statistical indicators of association (Pearson Product-Moment Correlation Coefficients, r =.76-.98) and agreement (intraclass correlation coefficients, ICC = .75-.99). Muscle groups with lower long-term stability were the right and left wrist extensors and flexors, the left hip adductors and extensors, the left knee flexors, and the right and left knee extensors. Upper-extremity muscle groups had higher long-term stability than did lower-extremity muscle groups. The results indicate that the dynamometric measurements were highly reliable when the test-retest interval was 23 days. Other researchers have previously shown high reliability for these measurements over shorter periods of time. Improved reliability might be obtained by supporting the lower extremity during hip extension tests; padding the dynamometer end pieces, especially when testing over bony prominences; and using a smaller, digital dynamometer. The hand-held dynamometer appears to warrant use and further investigation with pediatric populations.

Journal ArticleDOI
TL;DR: A reproducibility study of four 24-hour dietary recalls and four biochemical assessments of nutritional status in a group of women in Alabama found plasma beta-carotene levels to be moderately correlated with dietary vitamin A and beta-carotene; correlations between the food frequency questionnaire and the mean of the four recalls ranged from 0.3 to 0.4 for most nutrients.
Abstract: We conducted a reproducibility study of four 24-hour dietary recalls (N = 224) and four biochemical assessments of nutritional status (N = 265) in a group of women in Alabama. For 24-hour recalls, the variance component ratios were all greater than 1, and the intraclass correlation coefficients ranged from 0.16 to 0.27 for macronutrients, and from 0.09 to 0.37 for vitamins and minerals. The intraclass correlation coefficients for biochemical assessments ranged between 0.39 and 0.74 with corresponding variance component ratios of 1 or below for most nutrients. The correlation coefficients between the food frequency questionnaire on the usual dietary intake during the year preceding the beginning of study and the mean values of four 24-hour dietary recalls administered at the initial visit and again after 2, 4, and 6 months ranged from 0.3 to 0.4 for most nutrients. We found plasma beta-carotene levels to be moderately correlated with dietary vitamin A (r = 0.20) and beta-carotene (r = 0.22).

Journal ArticleDOI
TL;DR: Five estimators are compared: analysis of variance (ANOVA), concentrated ANOVA, truncated ANOVA and two maximum likelihood-like (ML) estimators; the results indicate that the ANOVA estimator performs well except for designs with family size n = 2.
Abstract: At least two common practices exist when a negative variance component estimate is obtained, either setting it to zero or not reporting the estimate. The consequences of these practices are investigated in the context of the intraclass correlation estimation in terms of bias, variance and mean squared error (MSE). For the one-way analysis of variance random effects model and its extension to the common correlation model, we compare five estimators: analysis of variance (ANOVA), concentrated ANOVA, truncated ANOVA and two maximum likelihood-like (ML) estimators. For the balanced case, the exact bias and MSE are calculated via numerical integration of the exact sample distributions, while a Monte Carlo simulation study is conducted for the unbalanced case. The results indicate that the ANOVA estimator performs well except for designs with family size n = 2. The two ML estimators are generally poor, and the concentrated and truncated ANOVA estimators have some advantages over the ANOVA in terms of MSE. However, the large biases may make the concentrated and truncated ANOVA estimators objectionable when intraclass correlation (ϱ) is small. Bias should be a concern when a pooled estimate is obtained from the literature since ϱ<0.05 in many genetic studies.
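The negative-estimate problem can be seen directly in the balanced one-way ANOVA estimator, (MSB - MSW) / (MSB + (n - 1) * MSW), which drops below zero whenever the within-family mean square exceeds the between-family one. The sketch below implements that estimator and a truncated variant. Treating "truncated ANOVA" as "set negative estimates to zero" is an assumption based on the first practice the abstract mentions; the function names and data are hypothetical.

```python
def anova_icc(groups):
    """One-way random effects ANOVA estimator of the intraclass
    correlation for balanced data (every family has the same size n)."""
    a, n = len(groups), len(groups[0])
    grand = sum(sum(g) for g in groups) / (a * n)
    means = [sum(g) / n for g in groups]
    ms_between = n * sum((m - grand) ** 2 for m in means) / (a - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, means) for x in g
    ) / (a * (n - 1))
    return (ms_between - ms_within) / (ms_between + (n - 1) * ms_within)


def truncated_anova_icc(groups):
    """Same estimator with negative values replaced by zero."""
    return max(0.0, anova_icc(groups))


if __name__ == "__main__":
    # Hypothetical sibling pairs (family size n = 2) that differ more
    # within families than between them.
    pairs = [[1, 5], [5, 1], [2, 4]]
    print(anova_icc(pairs))
    print(truncated_anova_icc(pairs))
```

Truncation keeps the estimate inside the parameter space but pushes its average upward when the true correlation is near zero, which is the bias trade-off the abstract discusses.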

Journal ArticleDOI
TL;DR: Four commercial assay kits to measure serum ferritin were compared to establish the degree of agreement and interchangeability between the different techniques based on the intraclass correlation coefficient, suggesting that the methods were interchangeable.
Abstract: Four commercial assay kits to measure serum ferritin were compared to establish the degree of agreement and interchangeability between the different techniques based on the intraclass correlation coefficient (r_i). Radioimmunoassay, microparticle enzyme immunoassay, enzyme-linked immunosorbent assay, and chemiluminescent immunoassay systems were used. The Pearson product-moment correlation factor (r) between any two methods was at least 0.98, and the intercept of the regression equations ranged from –0.613 to 3.797, indicating that the methods were comparable. Furthermore, the intraclass correlation coefficient (r_i) was at least 0.98, suggesting that the methods were interchangeable.


Journal ArticleDOI
TL;DR: It is found that when the disease prevalence is rare or moderate, use of the multiple reading procedure with a unanimity rule is effective in increasing the positive predictive value of a single reading procedure for the situation in which the variation of responses among different subjects and the intraclass correlation among repeated tests are small.
Abstract: The use of multiple reading procedures to improve the performance of a diagnostic test occurs often in practice. Evaluation of the utility of multiple reading procedures, however, usually ignores the effect of the intraclass correlation. This paper provides a quantitative assessment of this effect in the multiple reading procedure with a unanimity rule with respect to sensitivity, specificity, positive and negative predictive values. We have found that when the disease prevalence is rare or moderate (less than or equal to 0.20), use of the multiple reading procedure with a unanimity rule is effective in increasing the positive predictive value of a single reading procedure for the situation in which the variation of responses among different subjects and the intraclass correlation among repeated tests are small. This is, however, not true for the situation in which the disease is rare and the variation of responses among different subjects is large, even when the intraclass correlation is small or 0. Furthermore, when the disease is rare and the variation of responses among subjects is small, a small or moderate intraclass correlation can substantially decrease the positive predictive value that one calculates under the assumption that the intraclass correlation is equal to 0. In general, when the disease is rare or moderate (less than or equal to 0.20), the intraclass correlation between repeated tests and the variation of responses among subjects have little effect on the negative predictive value.
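As a baseline for the effect described above, the sketch below computes the positive predictive value of a unanimity rule under the simplifying assumption that repeated readings are conditionally independent given disease status, that is, an intraclass correlation of 0. This is the assumption the abstract cautions against, not the paper's own model; the function names and the sensitivity, specificity and prevalence values are hypothetical.

```python
def ppv(sens: float, spec: float, prev: float) -> float:
    """Positive predictive value from sensitivity, specificity, prevalence."""
    true_pos = prev * sens
    false_pos = (1.0 - prev) * (1.0 - spec)
    return true_pos / (true_pos + false_pos)


def unanimity_rule(sens: float, spec: float, k: int):
    """Sensitivity and specificity when a case is called positive only if
    all k readings are positive, assuming independent readings."""
    combined_sens = sens ** k
    combined_spec = 1.0 - (1.0 - spec) ** k
    return combined_sens, combined_spec


if __name__ == "__main__":
    # Hypothetical single-reading accuracy and prevalence.
    sens, spec, prev = 0.9, 0.9, 0.1
    print(ppv(sens, spec, prev))
    sens2, spec2 = unanimity_rule(sens, spec, k=2)
    print(ppv(sens2, spec2, prev))
```

Under independence the second reading removes most false positives, so the predicted gain is large. Correlated readings tend to repeat the same errors, which is why a calculation that sets the intraclass correlation to 0 can overstate the benefit.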

Journal ArticleDOI
TL;DR: It seems unlikely that reliable inference about heterogeneity of genetic variances or heritabilities between individual herds can be made from dairy cattle field data.

Journal ArticleDOI
TL;DR: A significant environmental influence on cognitive aging in later adulthood is indicated in MZ and DZ twins.
Abstract: Monozygotic (MZ) and dizygotic (DZ) twins in later adulthood were studied in order to examine genetic and environmental contributions to the decline of cognitive performance. In this study, 118 twin pairs took a comprehensive medical examination at a university hospital. Cognitive function was measured by the Wechsler Adult Intelligence Scale (WAIS). The intraclass correlation coefficients on Digit Span (D) and Digit Symbol (DS) subtests of the WAIS did not show any significant difference between MZ and DZ twins although Block Design (BD) showed a significant difference. The values of the intraclass correlation coefficients were mostly around 0.5 and showed significant within-pair similarity of test scores. The mean score of D, DS and BD declined with advancing age. The intraclass correlation coefficients for D, DS and BD were around 0.2 in the MZ twins reared apart, and around 0.6 in the MZ twins reared together. These results indicated a significant environmental influence on cognitive aging in later adulthood.

Journal Article
TL;DR: The confirmed good reliability of the analyzed parameters enables BAEPs to be employed as useful monitoring instruments in longitudinal studies; the upper normal limits of the test-retest variability of each parameter were also computed.
Abstract: Brainstem Auditory Evoked Potentials (BAEPs) are increasingly used in longitudinal evaluation of brainstem function. The reliability of the neurophysiological parameters and definition of the normal test-retest variability are required for such investigations. In the present study we submitted 20 healthy volunteers (10 males and 10 females; mean age 35.1 years, range 24-49) to BAEPs in two sessions separated by seven days. The reliability of the parameters was estimated by means of intraclass correlation coefficient (R). All BAEP parameters showed excellent R values (above 0.75). In addition, the confidence interval lower limits of all R coefficients had good to excellent values. Finally we computed the upper normal limits of the test-retest variability of each parameter, with alpha = 0.01, using the within-subjects mean square. The confirmed good reliability of the analyzed parameters enables us to employ BAEPs as useful monitoring instruments in longitudinal studies.

Journal ArticleDOI
TL;DR: It can be concluded that this instrument, previously demonstrated to quantify patient progress, is also reliable both in intra- and inter-rater dimensions.
Abstract: The intra- and inter-rater reliability of a motor function evaluation of stroke patients, based on the Bobath approach, was studied. The intraclass correlation coefficient (ICC) was used to determine the degree of agreement between repeated measurements on the same patient taken by the same rater and between measurements taken by three raters on the same patient. In the intra-rater study, each of 19 patients was evaluated in three different sessions by one of 19 raters. In the inter-rater study 18 patients were each evaluated by three different raters. The intra-rater data were highly reliable, with ICCs of 0.95 and 0.97 for the upper and lower limbs respectively. For the inter-rater study, the ICCs were 0.79 and 0.77 for the upper and lower limbs respectively. It can therefore be concluded that this instrument, previously demonstrated to quantify patient progress, is also reliable both in intra- and inter-rater dimensions.
