
Showing papers on "Intraclass correlation" published in 1994


Journal Article
TL;DR: It is concluded that the 7-level FIM is reliable when used by trained/tested inpatient medical rehabilitation clinicians.
Abstract: The Functional Independence Measure (FIM) is an 18-item, 7-level scale developed to uniformly assess severity of patient disability and medical rehabilitation functional outcome. FIM interrater reliability in the clinical setting is reported here. Clinicians from 89 US inpatient comprehensive medical rehabilitation facilities newly subscribing to the Uniform Data System for Medical Rehabilitation (UDSMR) from January 1988 to June 1990 evaluated 1018 patients with the FIM. FIM total, domain and subscale score intraclass correlation coefficients (ICC) were calculated using ANOVA; FIM item score agreement was assessed with the unweighted kappa coefficient. Total FIM ICC was 0.96; motor domain 0.96 and cognitive domain 0.91; subscale score range: 0.89 (social cognition) to 0.94 (self-care). FIM item kappa range: 0.53 (memory) to 0.66 (stair climbing). A subset of 24 facilities meeting UDSMR data aggregation reliability criteria had intraclass and kappa coefficients exceeding those for all facilities. It is concluded that the 7-level FIM is reliable when used by trained/tested inpatient medical rehabilitation clinicians.

663 citations
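
The item-level agreement statistic named in the abstract above, the unweighted kappa coefficient, can be sketched in a few lines. This is a minimal illustration, not the study's analysis: the ratings are invented 7-level scores for two hypothetical raters (not FIM data), and `cohen_kappa` is an illustrative name.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same subjects."""
    n = len(rater_a)
    # Observed proportion of exact agreement.
    p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_exp = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)

if __name__ == "__main__":
    # Hypothetical item scores on a 7-level scale (assumed data).
    a = [7, 6, 5, 7, 6, 5, 7, 6]
    b = [7, 6, 5, 7, 6, 6, 7, 5]
    print(f"kappa = {cohen_kappa(a, b):.3f}")
```

Kappa discounts the agreement two raters would reach by chance, which is why it can sit well below the raw percentage of exact agreement.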


Journal Article
TL;DR: It is shown that care must be taken in choosing a suitable ICC with respect to the underlying sampling theory, and a decision tree is developed that may be used to choose a coefficient which is appropriate for a specific study setting.
Abstract: In general, intraclass correlation coefficients (ICC's) are designed to assess consistency or conformity between two or more quantitative measurements. They are claimed to handle a wide range of problems, including questions of reliability, reproducibility and validity. It is shown that care must be taken in choosing a suitable ICC with respect to the underlying sampling theory. For this purpose a decision tree is developed. It may be used to choose a coefficient which is appropriate for a specific study setting. We demonstrate that different ICC's may result in quite different values for the same data set, even under the same sampling theory. Other general limitations of ICC's are also addressed. Potential alternatives are presented and discussed, and some recommendations are given for the use of an appropriate method.

627 citations
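
The abstract's remark that different ICCs can give quite different values for the same data set can be illustrated with a small sketch. The three coefficients below are labelled in the common Shrout-Fleiss style (an assumption about notation; the paper's decision tree is not reproduced here), and the 6-subject by 4-rater table is illustrative data, not taken from the paper.

```python
def icc_forms(x):
    """Three single-measure ICCs from a subjects-by-raters table (list of rows)."""
    n, k = len(x), len(x[0])
    grand = sum(map(sum, x)) / (n * k)
    row_means = [sum(row) / k for row in x]
    col_means = [sum(row[j] for row in x) / n for j in range(k)]
    # Two-way ANOVA sums of squares: subjects, raters, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in x for v in row)
    ss_err = ss_total - ss_rows - ss_cols
    bms = ss_rows / (n - 1)                    # between-subjects mean square
    jms = ss_cols / (k - 1)                    # between-raters mean square
    ems = ss_err / ((n - 1) * (k - 1))         # residual mean square
    wms = (ss_cols + ss_err) / (n * (k - 1))   # within-subjects mean square
    return {
        "ICC(1,1)": (bms - wms) / (bms + (k - 1) * wms),
        "ICC(2,1)": (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n),
        "ICC(3,1)": (bms - ems) / (bms + (k - 1) * ems),
    }

if __name__ == "__main__":
    ratings = [
        [9, 2, 5, 8],
        [6, 1, 3, 2],
        [8, 4, 6, 8],
        [7, 1, 2, 6],
        [10, 5, 6, 9],
        [6, 2, 4, 7],
    ]
    for name, value in icc_forms(ratings).items():
        print(f"{name} = {value:.2f}")
```

The spread between the coefficients comes from how each treats systematic differences between raters: the one-way form folds them into error, the two-way agreement form penalizes them, and the consistency form ignores them.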


Journal Article
TL;DR: The purpose of this study was to identify which of three EMG normalization values provided the most reproducible data set and to propose the best way to select the common value through a reliability approach.

272 citations


Journal Article
TL;DR: Results indicate the CARS-M is both a reliable and valid measure of the severity of manic symptomatology, which incorporates a number of methodological improvements leading to greater precision and clinical utility.

183 citations


Journal Article
TL;DR: Judgments of stiffness made by experienced manipulative physical therapists examining patients in their own clinics were found to have poor reliability, whereas pain judgments had good reliability.
Abstract: Background and Purpose. The purpose of this study was to determine the intertherapist reliability of judgments of stiffness and pain at L-1 to L-5 made using posteroanterior (PA) central pressure testing. Subjects. Three pairs of manipulative physical therapists with a minimum of 5 years of experience were asked to rate pain and stiffness in a total of 90 patients with low back pain. Methods. Each pair of therapists assessed 30 patients within their own clinic, using their preferred technique to perform an examination using the PA central pressure test at the five lumbar levels. Each pair of therapists recorded their ratings of pain and stiffness. Reliability of judgments was evaluated by intraclass correlation coefficients (ICC) and percentage of exact agreement scores. Results. The ICC values for pain judgments for the group as a whole ranged from .67 to .72, with agreement scores ranging from 31% to 43%. The ICC values for stiffness judgments ranged from .03 to .37, with agreement scores ranging from 21% to 29%. Conclusion and Discussion. Judgments of stiffness made by experienced manipulative physical therapists examining patients in their own clinics were found to have poor reliability, whereas pain judgments had good reliability. Further investigation of this test is required in order to develop a more reliable method of assessing PA stiffness.

177 citations


Journal Article
04 Jun 1994-BMJ
TL;DR: A note on correlation, regression, and repeated measures draws attention to the problem of calculating a correlation coefficient based on repeated observations of the same subject but falls prey to the common misjudgment of placing too much importance on the significance of the correlation coefficient.
Abstract: EDITOR, - J Martin Bland and Douglas G Altman's note on correlation, regression, and repeated measures draws attention to the problem of calculating a correlation coefficient based on repeated observations of the same subject.1 But they fall prey themselves to the common misjudgment of placing too much importance on the significance of the correlation coefficient. The basic problem is the way in which significance is calculated as the correlation coefficient is strongly affected by the number of cases for which there are pairs of data. If, for example, you have about 500 cases the correlation coefficient needs only to be 0.088 to be significant at …

149 citations
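
The letter's point about sample size can be checked with a short calculation. The sketch below uses the usual t statistic for a Pearson correlation, t = r·sqrt(n − 2)/sqrt(1 − r²), and approximates the two-sided 5% critical value with the normal quantile. That approximation is an assumption made here to stay within the standard library, and is reasonable for n near 500; the function names are illustrative.

```python
import math
from statistics import NormalDist

def t_statistic(r, n):
    """t statistic for testing a Pearson correlation r from n pairs against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def critical_r(n, alpha=0.05):
    """Smallest |r| reaching two-sided significance (normal approximation to t)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Solve z = r * sqrt(n - 2) / sqrt(1 - r^2) for r.
    return z / math.sqrt(n - 2 + z * z)

if __name__ == "__main__":
    for n in (20, 100, 500, 5000):
        print(f"n = {n:5d}: |r| must exceed about {critical_r(n):.3f}")
```

With 500 pairs the threshold falls just below 0.09, so a correlation that explains under 1% of the variance can still be "significant".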


Journal Article
TL;DR: Reliability appears to be high in both the independent SIP68 and the extracted SIP136, indicating that the SIP68 may very well serve as a generic alternative to the SIP136.

143 citations


Journal Article
TL;DR: It is concluded that this standardized testing protocol produces reliable measurements of muscle strength and functional ability in subjects with FSHD.
Abstract: Background and Purpose. The natural history of facioscapulohumeral muscular dystrophy (FSHD) has not been studied prospectively. Knowledge of the natural progression of any disease provides essential information for the design of clinical trials. We present a protocol for the study of the natural history of FSHD using quantitative muscle testing (QMT), manual muscle testing (MMT), and functional testing. Subjects. Thirty-two persons with FSHD (mean age=36.1 years, SD=9.6, range=17–49) and 32 age- and gender-matched volunteer controls (mean age=35.8 years, SD=8.0, range=23–50) served as subjects. Methods. Using standardized testing procedures, we examined intrarater reliability of the MMT, QMT, and functional testing measurements in both groups. We also examined interrater reliability in 7 subjects with FSHD. Eighteen muscle groups were tested for each subject using QMT and MMT. Results. Intraclass correlation coefficient (ICC) values ranged from .86 to .99 for intrarater reliability and from .86 to .99 for interrater reliability of QMT measurements. Weighted kappa values of .81 to .98 for intrarater reliability and .50 to 1.00 for interrater reliability were obtained for MMT measurements. Intrarater ICCs for various functional testing measures ranged from .60 to .97. In addition, the comparability of the two QMT machines used in the study was demonstrated by testing the same set of volunteer controls on each machine's linear force transducer (ICC=.89–.98). Conclusion and Discussion. We conclude that this standardized testing protocol produces reliable measurements of muscle strength and functional ability in subjects with FSHD.

128 citations


Journal Article
TL;DR: The PPME, an observer‐administered, performance‐based instrument assessing 6 domains of physical functioning and mobility for hospitalized elderly, is developed and validated.
Abstract: OBJECTIVE: To develop and validate the Physical Performance and Mobility Examination (PPME), an observer-administered, performance-based instrument assessing 6 domains of physical functioning and mobility for hospitalized elderly. DESIGN: Development of a pass-fail and 3-level scoring system and training manuals for the PPME instrument for use in both clinical and research settings. Two patient samples were used to assess construct validity and interrater reliability of the PPME. A third sample was selected to assess the test-retest reliability of the instrument. SETTING/PATIENTS: (1) 146 subjects ≥65 years of age with impaired mobility admitted to Medical Units of Stanford University Hospital. (2) 352 subjects ≥65 admitted to acute Medical and Surgical Services of the Palo Alto VA Medical Center. Patient samples were obtained during hospitalization and followed until 3 months post-discharge. To study test-retest reliability, 50 additional patients, whose clinical condition was stable, were selected from both settings. METHODS: An expert panel selected 6 mobility tasks integral to daily life: bed mobility, transfer skills, multiple stands from chair, standing balance, step-up, and ambulation. Tasks were piloted with frail hospitalized subjects for appropriateness and safety. Test-retest and interrater reliability and construct validity were evaluated. Construct validity was tested using the Folstein Mini-Mental State Examination, Activities of Daily Living (ADL), Instrumental Activities of Daily Living (IADL), Geriatric Depression Scale, and modified Medical Outcomes Study Measure of Physical Functioning (MOS-PFR). Two scoring schema were developed for each task: (1) dichotomous pass-fail and (2) 3-level high pass, low pass, and fail. A summary scale was developed for each method of scoring. MAIN RESULTS: High interrater reliability and intrarater reliability were demonstrated for individual tasks. 
The mean percent agreement (interrater) for each pass/fail task ranged from 96 to 100% and from 90 to 100% for the 3 pairs of raters for each task using the 3-level scoring. Kappas for individual pairs of raters ranged from .80 to 1.0 for pass-fail scoring and from .75 to 1.0 for 3-level scoring (all P < 0.01). Intraclass correlation coefficients for 3-level scoring by pairs of raters ranged from .66 to 1.0. For summary scales, the mean intraclass correlation was .99 for both scoring schema. Test-retest reliability for summary scales using kappa coefficients was .99 for both pass-fail and 3-level scoring, and .99 and .98, respectively, using Pearson Product Moment Correlation. Correlations of PPME with other instruments (construct validity) suggest that the PPME adds a unique dimension of mobility beyond that measured by self-reported ADLs and physical functioning, and it is not greatly influenced by mood or mental status (r = 0.70 (ADL), r = 0.43 (IADL), r = 0.36 (MMSE), r = 0.71 (MOS-PFR), r = 0.23 (GDS)). The 3-level summary scale was sensitive to the variability in the patient population and exhibited neither ceiling nor floor effects. CONCLUSIONS: The PPME is a reliable and valid performance-based instrument measuring physical functioning and mobility in hospitalized and frail elderly.

128 citations


Journal Article
TL;DR: The methods and results from a study designed to generate estimates of intraclass correlation for common outcomes in adolescent smoking prevention studies are described and the use of these estimates in the planning of new studies are discussed.
Abstract: Most adolescent smoking prevention studies employ designs in which classrooms, schools, school districts, or sometimes whole communities are assigned to treatment conditions while observations are made on individual students. The critical design feature in such community trials is the nesting of intact social groups within treatment conditions. This combination requires that the treatment effect be assessed against the between-group variance; unfortunately, that variance is usually larger than for randomly constituted groups and its precision is usually less than that for the within-group variance. These factors often combine to reduce power so that it is almost impossible to detect important treatment effects in an otherwise well designed and properly executed study. To address these problems, investigators need good estimates of the intraclass correlation for the variables of interest, which together with the number of observations per unit determine the magnitude of the extra variation in the nested design. The purpose of this paper is to describe the methods and results from a study designed to generate estimates of intraclass correlation for common outcomes in adolescent smoking prevention studies and to discuss the use of these estimates in the planning of new studies.

124 citations
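
The way the intraclass correlation and the number of observations per unit combine to inflate variance in a nested design, as described in the abstract above, is usually written as follows. This is the standard textbook form, not a formula quoted from the paper; the symbols ($g$ groups per condition, $m$ members per group, group and member variance components $\sigma_g^2$ and $\sigma_e^2$) are notation assumed here.

```latex
\[
\rho = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2},
\qquad
\operatorname{Var}(\bar{y}_c)
  = \frac{\sigma_g^2}{g} + \frac{\sigma_e^2}{mg}
  = \frac{\sigma_g^2 + \sigma_e^2}{mg}\,\bigl[\,1 + (m - 1)\rho\,\bigr]
\]
```

The bracketed factor is the extra variation relative to $mg$ independent observations; it grows with $m$, so even a small $\rho$ matters when whole classrooms or schools are the units of assignment.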


Journal Article
TL;DR: A single method that provides results for these approaches is proposed and the bivariate confidence ellipse is suggested to provide boundaries for dispersion.
Abstract: An assessment of measurement agreement made by devices, laboratories, or raters is important in medical practice and research. The setting in which each randomly selected subject is rated by the same two raters raises assorted questions regarding rate agreement. The intraclass correlation coefficient (ICC) is one measure of reliability. The paired t-test can be used to evaluate the overall ratings or bias of the two raters, while their variances can be assessed with Pitman's test. The Bradley-Blackwood test can be used for a simultaneous test of their means and variances. A single method that provides results for these approaches is proposed and the bivariate confidence ellipse is suggested to provide boundaries for dispersion.
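
One of the tests named in the abstract above, the Bradley-Blackwood simultaneous test of means and variances, can be sketched as it is usually formulated: regress the paired differences D = X1 − X2 on the sums S = X1 + X2 and test intercept and slope jointly with an F statistic on (2, n − 2) degrees of freedom. This is a hedged illustration, not the single method the paper proposes; the data and function name are hypothetical, and the p-value uses the closed form of the F survival function that holds when the numerator has 2 degrees of freedom.

```python
def bradley_blackwood(x1, x2):
    """F statistic and p-value for H0: equal means and equal variances of two raters."""
    n = len(x1)
    d = [a - b for a, b in zip(x1, x2)]   # paired differences
    s = [a + b for a, b in zip(x1, x2)]   # paired sums
    s_bar, d_bar = sum(s) / n, sum(d) / n
    sxx = sum((u - s_bar) ** 2 for u in s)
    sxy = sum((u - s_bar) * (v - d_bar) for u, v in zip(s, d))
    slope = sxy / sxx
    intercept = d_bar - slope * s_bar
    # Residual sum of squares from the regression of D on S.
    sse = sum((v - intercept - slope * u) ** 2 for u, v in zip(s, d))
    f_stat = ((sum(v * v for v in d) - sse) / 2) / (sse / (n - 2))
    # For F(2, nu): P(F > f) = (1 + 2 f / nu) ** (-nu / 2).
    p_value = (1 + 2 * f_stat / (n - 2)) ** (-(n - 2) / 2)
    return f_stat, p_value

if __name__ == "__main__":
    rater1 = [10, 12, 14, 16, 18, 20]          # hypothetical measurements
    rater2 = [11, 11, 15, 15, 19, 19]          # same scale, no systematic shift
    rater2_shifted = [v + 3 for v in rater2]   # systematic bias of +3 added
    print(bradley_blackwood(rater1, rater2))
    print(bradley_blackwood(rater1, rater2_shifted))
```

A nonzero slope reflects unequal variances and a nonzero intercept (given zero slope) reflects unequal means, which is why one regression covers what the paired t-test and Pitman's test address separately.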

Journal Article
TL;DR: Using a footswitch system, the retest reliability of the temporal and distance parameters of gait was investigated within a session for 22 stroke patients in the early phase of rehabilitation, indicating that the use of two consecutive measurements for interpreting an individual patient's change would not be a sensitive method for monitoring progress or deterioration during rehabilitation.

Journal Article
TL;DR: The high ICC values indicate that the Functional Independence Measure Instrument and IADL scale of the Multidimensional Functional Assessment of Older Adults provide consistent information across two experienced raters and over time when used with a sample of elderly persons residing in the community.

Journal Article
TL;DR: Initial CPT scores predicted risk of institutionalization over a 4-year follow-up period and were significantly correlated with Mini-Mental State Examination scores and two measures of caregiver-rated ADL.
Abstract: A new performance-based assessment instrument for evaluating function in patients with Alzheimer's disease (AD), the Cognitive Performance Test (CPT), is described. This instrument, based on Allen Cognitive Disability Theory, uses six common activities of daily living (ADL) tasks, for which the information-processing requirements can be systematically varied to assess ordinal levels of functional capacity. Seventy-seven patients with mild to moderate Alzheimer's disease (AD) and 15 neurologically normal elderly controls were administered the CPT. Subsets of the AD patients were assessed again at 4 weeks and 1, 2, and 3 years following the initial evaluation. Internal consistency of the CPT estimated by alpha was .84. Intraclass correlation for interrater reliability was .91 and for test-retest reliability at 4 weeks, .89. CPT scores were significantly correlated with Mini-Mental State Examination scores (r = .67) and two measures of caregiver-rated ADL (Instrumental Activities of Daily Living, r = .64; Physical Self-Maintenance Scale, r = .49). Significant declines in CPT scores were seen on 1-, 2-, and 3-year follow-ups. Initial CPT scores predicted risk of institutionalization over a 4-year follow-up period.

Journal Article
TL;DR: Biomechanics force platform measurements of postural sway are compared with clinical measures of balance and mobility in frail elderly residents of community nursing homes, in terms of feasibility, correlation with other known risk factors for falls, and intercorrelation with each other.
Abstract: Objective: To compare biomechanics force platform measurements of postural sway with clinical measures of balance and mobility, in frail elderly residents of community nursing homes, in terms of feasibility, correlation with other known risk factors for falls, and intercorrelation with each other. Design: Cross-sectional study. Setting: Twelve Tennessee community nursing homes. Subjects: Of 1315 residents 360 (≥65) could stand independently (≥10 seconds). Of these eligible subjects, 303 (84%) provided informed consent and were assessed. Measurements: The biomechanics force platform measurements were postural sway during quiet standing characterized as elliptical area and mean velocity. The clinical measures were functional reach, mobility maneuvers (adapted from Tinetti's Mobility Index), timed chair stands, and 10-foot walk. Resident characteristics and function were also obtained. Results: Balance measurements were obtained on most (100% for postural sway to 67% for chair stand) consenting residents and were reliable on test-retest (intraclass correlation from .56 to .98). Performance in both groups of balance measures deteriorated with increasing musculoskeletal disability. Functional reach and mobility maneuvers correlated with height, and mobility maneuvers with depressive symptoms. Elliptical area correlated with mean velocity of postural sway (Pearson's r = 0.72; P < 0.001), and the clinical measures of balance (functional reach, mobility maneuvers, timed chair stands and walk) were modestly intercorrelated (r from 0.35 to 0.65; all P values ≤0.05). However, the biomechanical measures were not correlated with the clinical measures. Conclusions: Standard measures of balance were obtained reliably from nursing home residents who could stand independently for ≥10 seconds. However, in this group, further research is needed to determine which measures best predict falls. 
Further research is also needed to identify predictors of falls in the majority of residents who were too frail to undergo these standard assessments.

Journal Article
TL;DR: The intraclass correlation coefficients of the city-year component of variance as estimated in the Minnesota Heart Health Program for a variety of community survey variables are reported and illustrate their use in both design and analysis.
Abstract: Community trials involve the assignment of intact social groups to study conditions and are becoming increasingly common in epidemiologic research. In both the design and analysis of these studies, whether cross-sectional or cohort, allowance must be made for the dependence of elements within intact groups if variances are to be properly estimated. In the design phase, the statistician needs estimates of the level of dependence likely to be encountered. In the analysis phase, external estimates of the level of dependence may be useful in preventing the erosion of power associated with small numbers of intact groups assigned to each condition. We report the intraclass correlation coefficients of the city-year component of variance as estimated in the Minnesota Heart Health Program for a variety of community survey variables and illustrate their use in both design and analysis. Of 23 variables assessed, all but two showed positive estimates of city-year intraclass correlations. In these data, estimates of intraclass correlation coefficients generally were in the range 0.002-0.012.
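
To see why intraclass correlations as small as those reported above still matter at the design stage, the following sketch computes the usual design effect, 1 + (m − 1)ρ, and the resulting effective sample size. The ICC values are the endpoints of the range quoted in the abstract; the cluster size of 500 respondents per city-year is a purely hypothetical planning figure, and the function names are illustrative.

```python
def design_effect(cluster_size, icc):
    """Variance inflation for a mean based on clusters of equal size."""
    return 1 + (cluster_size - 1) * icc

def effective_sample_size(n_clusters, cluster_size, icc):
    """Number of independent observations carrying the same information."""
    return n_clusters * cluster_size / design_effect(cluster_size, icc)

if __name__ == "__main__":
    m = 500  # hypothetical respondents per city-year (assumption)
    for icc in (0.002, 0.012):
        deff = design_effect(m, icc)
        n_eff = effective_sample_size(6, m, icc)
        print(f"ICC = {icc}: design effect = {deff:.2f}, "
              f"effective n for 6 clusters = {n_eff:.0f}")
```

Under these assumed numbers the variance is roughly doubled at the low end of the range and inflated about sevenfold at the high end, which is the erosion of power the abstract refers to.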

Journal Article
TL;DR: Examination of interrater reliabilities of grip, lateral pinch, and tripod pinch measurements in patients with cumulative trauma disorders found reliability coefficients for the scores obtained were very high, and strength determination in this patient group, when performed by different hand therapists, may be considered reliable.

Journal Article
TL;DR: Internal consistency varied widely but was higher for older children and longer subscales, and implications of the findings for the use of the questionnaires in the evaluation of new asthma treatments are discussed.
Abstract: This paper reports the internal consistency and reproducibility of the Childhood Asthma Questionnaires, measures of quality of life and symptom distress in paediatric asthma. A total of 535 children aged 4–16 years completed age-appropriate forms of the questionnaire over 1- or 3-week intervals. Pearson correlation coefficients between 0.63 and 0.84 for subscales of the questionnaires indicated good test-retest reliability, while intraclass correlation coefficients in a very similar range showed that scores also remained at the same level on the two occasions. Comparisons between children with asthma and healthy non-asthmatics indicate that these are likely to be true estimates of stability. Internal consistency varied widely but was higher for older children and longer subscales. Implications of the findings for the use of the questionnaires in the evaluation of new asthma treatments are discussed.

Journal Article
TL;DR: The score test for the hypothesis of null intraclass correlation in the exponential family is derived, which does not depend on the particular distribution in this family and is related to the pairwise correlation coefficient.
Abstract: A definition of the intraclass correlation coefficient is given on the basis of a general class of random effect model. The conventional intraclass correlation coefficient and the intracluster correlation coefficient for binary data are both particular cases of the generalized coefficient. We derive the score test for the hypothesis of null intraclass correlation in the exponential family. The statistic does not depend on the particular distribution in this family and is related to the pairwise correlation coefficient. The test can be adjusted for explanatory variables.

Journal Article
TL;DR: The findings from this pilot study suggest that children's perceptions of effort might be used to guide intensity of exercise during structured activity classes.
Abstract: The purpose of this study was to examine the validity of a recently developed rating scale of perceived exertion, the Children's Effort Rating Table (CERT), for controlling exercise intensity in young children. 16 children (M age = 9.9 yr., SD = 1.2) performed three separate exercise tests on a mechanically braked cycle ergometer. Stage I (response protocol) consisted of a graded test with heart rate and perceived effort rating recorded in response to specified steady-state work outputs. Stage II (production protocol) examined subjects' ability to produce work outputs corresponding to levels 5, 7, and 9 of the CERT. This protocol was repeated on a further occasion (Stage III) to assess the reliability of the findings. Pearson correlations between ratings and heart rate (0.76) and ratings and work output (0.75) highlight the potential of the scale as a valid measure of exercise intensity. Also, the work rates produced by subjects in Stage II correlated 0.89 with those predicted from Stage I; however, analysis of variance showed that work output was significantly lower in Stage II than predicted. Finally, an intraclass correlation of 0.91 between Stages II and III suggests that the scale gave a reliable estimate of exercise intensity of children. The findings from this pilot study suggest that children's perceptions of effort might be used to guide intensity of exercise during structured activity classes.

Journal Article
TL;DR: Data suggest that, whereas single determinations of total E2 are insufficient to reliably estimate a woman's true mean level, a single measurement of the percentage of SHBG-E2 or U-E2 is adequate to assess bioavailability of E2 in an epidemiological study, irrespective of day of the menstrual cycle.
Abstract: Estradiol (E2) circulates in the blood in three states: unbound (U-E2), bound to sex-hormone binding globulin (SHBG-E2), and bound to albumin. There is evidence to support the concept that only U-E2 and albumin-bound E2 are bioavailable (i.e., rapidly extracted by tissues). A case-control study nested within a large cohort of women, in which we are examining the effect of estrogens on breast cancer risk, offered the opportunity to assess the reliability of measurements of E2, the percentage of SHBG-E2, and the percentage of U-E2 based on multiple annual serum specimens. Long-term (1-2 year) reliability, as estimated by the intraclass correlation coefficient, was assessed in a subgroup of 71 premenopausal and 77 postmenopausal controls for whom two or three serum specimens were assayed. In postmenopausal women the intraclass correlation coefficient for a single measurement of total E2 was only 0.51. As for the percentage of SHBG-E2, intraclass correlation coefficients were 0.83 and 0.94, and for U-E2, 0.72 and 0.77 in the premenopausal and postmenopausal groups, respectively. These data suggest that, whereas single determinations of total E2 are insufficient to reliably estimate a woman's true mean level, a single measurement of the percentage of SHBG-E2 or U-E2 is adequate to assess bioavailability of E2 in an epidemiological study, irrespective of day of the menstrual cycle.
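
The practical meaning of a single-measurement ICC of 0.51 can be explored with the Spearman-Brown formula, which gives the reliability of the mean of k repeated measurements. The formula is a standard tool and is not mentioned in the abstract; using it here assumes the repeated specimens behave like parallel measurements, and the target reliability of 0.80 is an arbitrary illustrative threshold.

```python
import math

def reliability_of_mean(icc_single, k):
    """Spearman-Brown: reliability of the average of k parallel measurements."""
    return k * icc_single / (1 + (k - 1) * icc_single)

def measurements_needed(icc_single, target):
    """Smallest k whose averaged reliability reaches the target."""
    return math.ceil(target * (1 - icc_single) / (icc_single * (1 - target)))

if __name__ == "__main__":
    icc_total_e2 = 0.51  # single-measurement ICC quoted in the abstract
    for k in (1, 2, 3, 4):
        print(f"k = {k}: reliability = {reliability_of_mean(icc_total_e2, k):.3f}")
    print("k needed for 0.80:", measurements_needed(icc_total_e2, 0.80))
```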

Journal Article
TL;DR: The FIM assessment appears to be a reliable and valid tool to assess general physical needs in MS; its use to assess neuropsychological needs is, however, questionable with MS subjects.
Abstract: The functional independence measure (FIM) is a new assessment scale, which has been extensively used in the uniform data system (UDS) throughout the United States. Eighty-one multiple sclerosis (MS) subjects were assessed by two trained physiotherapists using the FIM. Inter-rater reliability was found to be high for the total score (intraclass correlation coefficient [ICC] = 0.83) and moderate to substantial for items assessing physical disability (Kappa = 0.50-0.70), except for the item concerned with assessing independence in walking or in wheelchair (Kappa = 0.16). The inter-rater agreement of FIM items in the communication and social cognition subsections was only fair. The internal consistency of the FIM assessment scale was found to be high, with a Cronbach's alpha of at least 0.94. Relatively high Spearman rank correlation coefficients were found between total FIM and total Expanded Disability Status Scale (EDSS). The present reliability and validity study on FIM provides scientific data for a sampl...

Journal Article
Ray Marks
TL;DR: Measurements of self-paced walking time compared favorably with a validated subjective functional index for knee OA, and it is suggested that they can provide both reliable and valid data for evaluating functional performance in this patient population.
Abstract: This study assessed the reliability and validity of self-paced walking time as an outcome measure for osteoarthritis (OA) of the knee. The 13-m timed walking tests were carried out twice on two separate occasions for 15 patients using photocells. Validity was examined by comparing the walking time data with results of other methodologies for evaluating osteoarthritic gait. As indicated by intra- and inter-day intraclass correlation coefficients of 0.97 and 0.88, the gait measurements were reliable. The measurements also compared favorably with a validated subjective functional index for knee OA (P < 0.01). The findings suggest that measurements of self-paced walking time can provide both reliable and valid data for evaluating functional performance in this patient population.

Journal Article
Peter Bryner
01 Nov 1994-Pain
TL;DR: It is concluded that grid‐based assessment of small areas overestimates the actual area of pain and this may account for the lack of sensitivity to change in clinical status.
Abstract: Seventeen drawings of localised low-back pain were analysed by two assessors using 4 systems. Three were grid-based systems and one was by computer. The mean area or ‘extent’ was calculated to be 7.7%, 4.7%, 3.6% and 2.3% of the body outline using 45, 200, 560 and 61,102 section analyses, respectively. The computer-assisted method provided a significantly lower estimate of pain extent than the grid-based assessments as expected. Analysis of variance showed that the method of analysis provided greater source of variation than raters (P < 0.0001). Inter-rater reliability was high using all 4 systems calculated using intraclass correlation and the kappa statistic. Correlation coefficients of extent between the systems varied from 0.46 to 0.94. Correlation was highest between systems of adjacent magnitude of sections. It is concluded that grid-based assessment of small areas overestimates the actual area of pain and this may account for the lack of sensitivity to change in clinical status.

Journal Article
TL;DR: Repeated measures ANOVAs with Tukey’s post hoc tests revealed improved performance from repeated practice sessions in all tests, although the improvement was not consistent between tests.
Abstract: This study assessed the relative and absolute reliability of the five tests in the AAHPERD functional fitness assessment for men and women over 60 years of age. Twenty-eight apparently healthy subjects, ages 60 to 81, were tested three times during a 2-week period on each item in the test battery: sit and reach flexibility, body agility, coordination, strength/endurance, and half-mile walk. Relative reliability was assessed for both sexes via intraclass correlation coefficient. Absolute reliability was evaluated using repeated measures ANOVA. Intraclass correlations among sessions for men and women, respectively, were 0.97 and 0.98 for flexibility, 0.98 and 0.96 for body agility, 0.89 and 0.71 for coordination, 0.94 and 0.81 for strength/endurance, and 0.99 and 0.96 for the walk. Repeated measures ANOVAs with Tukey’s post hoc tests revealed improved performance from repeated practice sessions in all tests, although the improvement was not consistent between tests. Although the tests have high intraclass c...

Journal Article
TL;DR: In this article, an extension of the Galton-Pearson correlation coefficient to cases of nonlinear heteroscedastic regression is presented, which addresses what Karl Pearson called in 1905 'the question of generalising correlation'.
Abstract: Summary We present an extension of the Galton-Pearson correlation coefficient to cases of nonlinear heteroscedastic regression. This extension, the correlation curve, addresses what Karl Pearson called in 1905 'the question of generalising correlation'. The correlation curve measures the local variance explained by regression and thus provides a local measure of association in examples where the strength of relationship between a response variable Y and a covariate X differs for different values of the covariate. As an illustration of the use of the correlation curve, we explore a data set first analyzed by Pearson in 1905 as part of his attempts to generalise correlation beyond the linear model. We refine Pearson's analysis to obtain estimates of the correlation curve, and show that Pearson came close to defining the correlation curve in 1905.

Journal Article
13 Jul 1994-JAMA
TL;DR: The reliability and preliminary validity of a grading instrument for editors to evaluate the quality of peer reviews were measured; content validity was confirmed and preliminary criterion-related validity was indicated.
Abstract: Objective: To measure the reliability and preliminary validity of a grading instrument for editors to evaluate the quality of peer reviews. Design: The consecutive sample design included 53 reviews of 23 manuscripts. Reviews were systematically assigned to interrater reliability (n=41; power greater than 0.90 to detect a difference of greater than one point) and preliminary criterion-related validity (n=12) subsamples. Content validity was closely examined. Setting: Nonclinical. Participants: Three graders evaluated reliability. One individual examined content validity and two editors tested preliminary criterion-related validity. Intervention (Instrument): Attributes reflecting two basic dimensions, review content and format, were identified and scored (values are possible points/percent contribution): timeliness, 3/21%; grade sheet, 1/7%; etiquette, 1/7%; sectional narratives, 3/21%; citations, 2/14%; narrative summary, 2/14%; and insights, 2/14%. A scoring guide was provided. Main Outcome Measures: Statistical analyses used to test the interrater reliability of the total score included the intraclass correlation coefficient and analysis of variance with the expectation to uphold the null hypothesis. Kendall's coefficient of concordance was used to test preliminary criterion-related validity. Results: The intraclass correlation coefficient was .84 ( P P =.46). Content validity was confirmed and preliminary criterion-related validity was indicated (Kendall's coefficient of concordance = .94, P = .038). Conclusions: The instrument is reliable. Content validation has been completed, and further criterion-related validation is warranted. (JAMA. 1994;272:98-100)

Journal ArticleDOI
TL;DR: In this article, the authors found that, in meta-analysis, using the average of the observed correlations in the sampling error variance formula is generally more accurate than using the traditional, individual correlation estimator.
Abstract: The estimate of the population correlation used in the formula for sampling error variance of a correlation is typically the observed correlation, but in meta-analysis the average of the observed correlations can be used. For the case in which there is no variation in the study population correlations or sample sizes and the number of studies is very large, the authors found that use of the average correlation estimator is more accurate than use of the traditional, individual correlation estimator, except in those rare cases in which the uncorrected population correlation is greater than .60. For typical sample sizes, when the uncorrected population correlation is between -.40 and .40, there is virtually no error in the meta-analysis credibility interval based on the average correlation estimator.
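The contrast between the two estimators can be sketched in a few lines. This is only an illustration, not the authors' procedure: it assumes the common large-sample approximation for the sampling error variance of a correlation, (1 - r^2)^2 / (N - 1), and the function names, observed correlations, and sample size are hypothetical.

```python
def sampling_error_variance(r, n):
    """Approximate sampling error variance of a correlation.

    Assumes the large-sample formula (1 - r^2)^2 / (n - 1).
    """
    return (1.0 - r ** 2) ** 2 / (n - 1)


def individual_estimates(observed_rs, n):
    """Traditional approach: plug each study's own observed r into the formula."""
    return [sampling_error_variance(r, n) for r in observed_rs]


def average_estimate(observed_rs, n):
    """Alternative approach: plug the mean observed r into the formula.

    A simple unweighted mean is used because all studies are assumed
    to share the same sample size n.
    """
    r_bar = sum(observed_rs) / len(observed_rs)
    return sampling_error_variance(r_bar, n)


if __name__ == "__main__":
    observed = [0.10, 0.25, 0.40]  # hypothetical observed correlations
    n = 50                         # hypothetical common sample size
    print("individual:", individual_estimates(observed, n))
    print("average:   ", average_estimate(observed, n))
```

With the individual estimator, each study gets a different variance estimate driven by its own sampling error; with the average estimator, every study shares one estimate based on the pooled correlation.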

Journal ArticleDOI
TL;DR: It was concluded that testers with minimal experience with the NMMT could use it to obtain reliable measurements of the isometric force of elbow flexors and extensors in individuals with intellectual disabilities.

Journal Article
TL;DR: This study compares, in patients with ankylosing spondylitis, utilities derived by the rating scale and standard gamble methods, relates these values to other outcome measures, and assesses the sensitivity to change of utilities relative to changes in other outcomes.
Abstract: OBJECTIVE: To compare in patients with ankylosing spondylitis (AS) utilities derived by rating scale and standard gamble method, to relate these values to other outcome measures, and to assess the sensitivity to change of utilities relative to changes in other outcomes. METHODS: Patients with AS were randomly allocated to either weekly sessions of supervised group physical therapy for a period of 9 months or daily exercises at home. Analysis was restricted to the 59 patients who completed the Maastricht Utility Measurement Questionnaire (MUMQ) at baseline and after 9 months' followup and who were seen by the same interviewer. Reliability was assessed by intraclass correlation coefficient and change scores for marker states of disease. Construct validity was evaluated by correlation and multiple regression of baseline values with a variety of disease outcomes (pain and stiffness, patient's and physician's global assessment, Sickness Impact Profile, Health Assessment Questionnaire for the Spondyloarthropathies, Arthritis Impact Measurement Scale, functional, articular, and enthesis indices and spinal mobility measures). Sensitivity to change was assessed against changes in these outcome measures at followup. RESULTS: The test-retest intraclass correlation coefficients for patient utilities were 0.95 (rating scale) and 0.79 (standard gamble), and for the marker state of mild disease 0.70 (rating scale) and 0.77 (standard gamble). A multiple regression analysis with the baseline rating scale or standard gamble utilities as dependent variable showed that patient's global assessment explained 59% and 11% of the total variance, respectively. By multiple regression analysis 10% of the variance of change in rating scale utilities was explained by changes of patient's global assessment. In contrast, variance in change in standard gamble utilities was not explained by changes in other disease outcomes. CONCLUSION: Findings obtained by rating scale and standard gamble differ considerably. Standard gamble utilities seem to address different aspects of health status than do rating scale utilities and more traditional outcomes. Utility measurement is sensitive to the method chosen to elicit patient well being.
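A test-retest intraclass correlation of the kind reported above can be sketched as follows. The abstract does not say which ICC form was used, so this example assumes the one-way random-effects coefficient, (MSB - MSW) / (MSB + (k - 1) MSW), computed from ANOVA mean squares; the function name and the scores are hypothetical.

```python
def icc_oneway(scores):
    """One-way random-effects ICC for a subjects-by-measurements table.

    scores: list of subjects, each a list of k repeated measurements
    (for example, baseline and retest utilities). Assumes n >= 2
    subjects, k >= 2 measurements, and no missing values.
    """
    n = len(scores)
    k = len(scores[0])
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]

    # Between-subject and within-subject sums of squares.
    ss_between = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_within = sum(
        (x - m) ** 2 for row, m in zip(scores, row_means) for x in row
    )

    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


if __name__ == "__main__":
    # Hypothetical test-retest utilities for five patients.
    data = [[0.60, 0.62], [0.75, 0.70], [0.40, 0.45], [0.90, 0.88], [0.55, 0.58]]
    print(round(icc_oneway(data), 3))
```

Other ICC forms (two-way models, consistency versus absolute agreement) can give different values for the same data, so the choice of coefficient should follow the study design.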