
Showing papers on "Intraclass correlation published in 1998"


Journal ArticleDOI
TL;DR: The ICC and Bland and Altman tests are appropriate for analysis of reliability studies of similar design to that described, but neither test alone provides sufficient information and it is recommended that both are used.
Abstract: Objective: To provide a practical guide to appropriate statistical analysis of a reliability study using real-time ultrasound for measuring muscle size as an example. Design: Inter-rater and intra-rater (between-scans and between-days) reliability. Subjects: Ten normal subjects (five male) aged 22–58 years. Method: The cross-sectional area (CSA) of the anterior tibial muscle group was measured using real-time ultrasonography. Main outcome measures: Intraclass correlation coefficients (ICCs) and the 95% confidence interval (CI) for the ICCs, and Bland and Altman method for assessing agreement, which includes calculation of the mean difference between measures (d), the 95% CI for d, the standard deviation of the differences (SD diff), the 95% limits of agreement and a reliability coefficient. Results: Inter-rater reliability was high, ICC (3,1) was 0.92 with a 95% CI of 0.72 → 0.98. There was reasonable agreement between measures on the Bland and Altman test, as d was -0.63 cm², the 95% CI for d was -1.4 → 0.14 ...

908 citations
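The two analyses named in this abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' code: it assumes the usual Bland and Altman limits (mean difference ± 1.96 SD of the differences) and the Shrout and Fleiss two-way ANOVA form of ICC(3,1). The function names and the example data are hypothetical.

```python
from statistics import mean, stdev

def bland_altman(a, b):
    """Mean difference d, SD of differences, and 95% limits of agreement."""
    diffs = [x - y for x, y in zip(a, b)]
    d = mean(diffs)
    sd = stdev(diffs)  # sample SD (n - 1 denominator)
    return d, sd, (d - 1.96 * sd, d + 1.96 * sd)

def icc_3_1(scores):
    """ICC(3,1) for an n-subjects x k-raters table (raters fixed, single measure)."""
    n, k = len(scores), len(scores[0])
    grand = mean([v for row in scores for v in row])
    row_means = [mean(row) for row in scores]
    col_means = [mean([row[j] for row in scores]) for j in range(k)]
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    bms = ss_rows / (n - 1)                                  # between-subjects mean square
    ems = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))  # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems)

# Hypothetical muscle CSA readings (cm^2) from two raters on five subjects.
rater_a = [10.2, 11.5, 9.8, 12.1, 10.9]
rater_b = [10.6, 11.9, 10.1, 12.8, 11.0]
print(bland_altman(rater_a, rater_b))
print(icc_3_1([list(pair) for pair in zip(rater_a, rater_b)]))
```

Reporting both numbers is useful because the ICC is insensitive to a constant offset between raters under the ICC(3,1) model, whereas the mean difference and limits of agreement expose it directly.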


Journal ArticleDOI
TL;DR: Results indicate excellent reliability for both methods of assessing a DF lunge for both inter- and intra-rater reliability.
Abstract: This study aimed to evaluate the inter-rater and intra-rater reliability of a weight-bearing dorsiflexion (DF) lunge in 13 healthy subjects. Four raters with varying clinical experience tested all subjects in random order. Two of the raters repeated the measurements one week later. Two methods were used to assess the DF lunge: (i) the distance from the great toe to the wall and (ii) the angle between the tibial shaft and the vertical using an inclinometer. The average of three trials was used in data analysis. Intra-rater intraclass correlation coefficients (ICCs) ranged from 0.97 to 0.98. Inter-rater ICC values were 0.97 (angle) and 0.99 (distance). Results indicate excellent reliability for both methods of assessing a DF lunge.

525 citations


Journal ArticleDOI
TL;DR: The star-excursion test requires the patient to balance on one leg while reaching with the other leg and reliability estimates ranged from 0.67 to 0.87; task complexity may account for the moderate reliability estimates.
Abstract: Quantification of dynamic balance is often necessary to assess a patient's level of injury or ability to function in order to initiate an appropriate plan of care. Some therapists use the star-excursion test in an attempt to quantify dynamic balance. This test requires the patient to balance on one leg while reaching with the other leg. For the purpose of this study, the reach was performed in four directions. No previous researchers have attempted to evaluate the reliability of this test. Twenty healthy subjects between the ages of 18 and 35 years participated in this study. During two testing sessions, each subject was required to perform five reaching trials in four directions. Reliability estimates, calculated using the intraclass correlation coefficient (2, 1), ranged from 0.67 to 0.87. Six duplicate practice sessions were suggested to increase this range above 0.86. Task complexity may account for the moderate reliability estimates. Subjects should engage in a learning period before being evaluated on the star-excursion test.

468 citations
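For readers unfamiliar with the (2, 1) notation, the sketch below shows one way to compute ICC(2,1) from a subjects-by-sessions table, assuming the Shrout and Fleiss two-way random-effects definition. Unlike ICC(3,1), it counts systematic differences between sessions or raters as error. The function name and sample reach distances are hypothetical.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement."""
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between sessions
    bms = ss_rows / (n - 1)
    jms = ss_cols / (k - 1)
    ems = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)

# Hypothetical reach distances (cm) for four subjects over two sessions.
reach = [[70.0, 74.0], [82.0, 85.0], [65.0, 71.0], [90.0, 91.0]]
print(round(icc_2_1(reach), 3))
```

A practice effect that shifts every subject upward in the second session raises the between-sessions mean square and therefore lowers ICC(2,1), which is one reason a learning period before testing can matter.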


Journal ArticleDOI
15 Nov 1998-Spine
TL;DR: The results provide support for the use of these physical performance measures as a complement to patient self‐report and show significant group differences on all measures except the 50‐foot walk at preferred speed and unloaded forward reach.
Abstract: Study Design. The psychometric properties and clinical use of a battery of physical performance measures were tested on 44 patients with low back pain and 48 healthy, pain-free control subjects. Objectives. Reliability, validity, and clinical use of nine physical performance measures were evaluated. Summary of Background Data. Although physical performance measures have potential use in evaluation, treatment planning, and determination of treatment outcome, there is sparse systematic investigation of their reliability, validity, and clinical use. Methods. Forty-four subjects with low back pain and 48 healthy pain-free subjects participated. The following physical performance measures were tested: distance walked in 5 minutes; 50-foot walk at fastest speed; 50-foot walk at preferred speed; 5 repetitions of a sit-to-stand task; 10 repetitions of a repeated trunk flexion task; timed up-and-go task; unloaded forward reach task; loaded forward reach task; and Sorensen fatigue test. Subjects were assessed twice on 2 days. Results. All measures had excellent intertester reliability (intraclass correlation coefficient [ICC] 1,1 > 0.95). Test-retest (within session) reliability was adequate for all measures (ICC 1,1 > 0.83) except repeated trunk flexion (ICC 1,1 > 0.45) in the low back pain group. Test-retest (day-to-day) reliability ranged between 0.59 and 0.88 in the low back pain group and between 0.46 and 0.76 in the control group. Day-to-day reliability improved when the averages of two trials of repeated trunk flexion and sit-to-stand were used (0.76-0.91 low back pain group and 0.62-0.89 control group). Results of a multivariate analysis of variance showed a significant effect of group (F 10,65 = 3.52, P = 0.001). Results of univariate analyses showed significant group differences on all measures except the 50-foot walk at preferred speed and unloaded forward reach. Self-report of disability was moderately correlated with the performance tasks (r = 0.400 to -0.603). Conclusions. The results provide support for the use of these physical performance measures as a complement to patient self-report.

346 citations
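The observation that reliability improved when two trials were averaged is what the Spearman-Brown formula predicts. The sketch below assumes that formula; the abstract does not say the authors applied it, and the function name and input values are hypothetical.

```python
def spearman_brown(icc_single, k):
    """Predicted reliability of the mean of k trials, given single-trial reliability."""
    return k * icc_single / (1 + (k - 1) * icc_single)

# Illustrative single-trial reliabilities and the predicted value for a 2-trial average.
for single in (0.45, 0.60, 0.80):
    print(single, round(spearman_brown(single, 2), 3))
```

The gain is largest when single-trial reliability is low, so averaging helps most for the noisiest tasks.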


Journal ArticleDOI
01 Jan 1998-Stroke
TL;DR: Both the EuroQol and SF-36 have acceptable and qualitatively similar test-retest reliability and either instrument might function effectively as a discriminatory measure for assessing health-related quality-of-life outcomes in groups of patients after stroke.
Abstract: Background and Purpose: The reliability of the EuroQol and SF-36 questionnaires after stroke is not known. We therefore aimed to assess and compare the test-retest reliability of both instruments in a group of stroke patients. Methods: A total of 2253 patients with stroke entered by United Kingdom hospitals in the International Stroke Trial were randomized to follow up with either the EuroQol or the SF-36 instruments. For both instruments, we randomly selected one third of respondents and asked them to complete another, identical questionnaire. We assessed test-retest reliability using agreement statistics: unweighted κ statistics for the categorical domains of the EuroQol and intraclass correlation coefficients for the EuroQol visual analog scale, utility scores, and SF-36. Results: For the five categorical domains of the EuroQol, reproducibility was generally good (κ ranged from 0.63 to 0.80). The reproducibility of the domains of the SF-36 was qualitatively similar for all the domains except mental health (intraclass correlation coefficient = .28). However, the 95% confidence intervals for the difference in scores between test and retest were substantial. For both instruments, reproducibility was better when the patient completed the questionnaires than when a proxy did. Conclusions: Both the EuroQol and SF-36 have acceptable and qualitatively similar test-retest reliability. Therefore, either instrument might function effectively as a discriminatory measure for assessing health-related quality-of-life outcomes in groups of patients after stroke. However, our data do not support the use of either instrument for serial assessments in individual patients unless very large differences over time are expected.

241 citations
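The unweighted κ used for the categorical EuroQol domains corrects raw agreement for the agreement expected by chance. A minimal sketch of Cohen's unweighted kappa for a test and retest of the same item follows; the function name and the response data are hypothetical.

```python
from collections import Counter

def cohen_kappa(first, second):
    """Unweighted Cohen's kappa for two paired lists of categorical responses."""
    n = len(first)
    observed = sum(a == b for a, b in zip(first, second)) / n
    c1, c2 = Counter(first), Counter(second)
    # Chance agreement: product of the marginal proportions, summed over categories.
    expected = sum(c1[cat] * c2[cat] for cat in c1) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical test and retest answers on a three-level item.
test = ["none", "some", "some", "severe", "none", "some"]
retest = ["none", "some", "none", "severe", "none", "some"]
print(round(cohen_kappa(test, retest), 3))
```

Because every disagreement counts equally here, an ordered scale with many levels is often analysed with a weighted kappa instead.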


Journal ArticleDOI
TL;DR: The results suggest that this simple test battery provides reliable scores, and that the different tests relate to a homogeneous construct, while not being redundant, in line with previous findings based on similar subjects and similar, though not identical, testing procedures.
Abstract: Four common tests of balance and mobility were administered to 45 healthy women, aged 55–71 years: Sharpened Romberg (also defined as tandem stance), eyes open and closed; One-Legged Stance Test, eyes open and closed; Functional Reach; and Sit-To-Stand test. Two independent observers scored the tests, which were performed on two successive days. Inter-rater (IRR) and test-retest reliability (TRR) were good. Across the six different tests, Intraclass Correlation Coefficients ranged from 0.95 to 0.99 for scoring consistency between raters, and from 0.73 to 0.93 within raters. This is in line with previous findings based on similar subjects and similar, though not identical, testing procedures. Intercorrelations between the scores were moderate: r coefficients ranged 0.40–0.66. The results suggest that this simple test battery provides reliable scores, and that the different tests relate to a homogeneous construct, while not being redundant. It thus seems worthwhile to further investigate whether they represent and measure a unidimensional domain, rather than conceptually different dimensions, in view of achieving a unique measure of balance performance.

177 citations


Journal ArticleDOI
TL;DR: The modified FRT appears to provide reliable measurements of sitting balance in nonstanding persons with spinal cord injuries and test-retest reliability was high with modification of the FRT with a single rater.
Abstract: Background and Purpose. The primary purpose of this study was to determine whether the Functional Reach Test (FRT) could be modified to provide reliable measurements of sitting balance. A secondary purpose was to determine whether the test could be used to measure differences among levels of spinal cord injury. Subjects. Thirty male subjects with spinal cord injuries were divided into three groups based on injury type. Group 1 consisted of subjects with C5-6 tetraplegia, group 2 consisted of subjects with T1-4 paraplegia, and group 3 consisted of subjects with T10-12 paraplegia. Methods. Subjects sat on similar mat tables (tables varied based on what was available at a given clinic) against the same backboard, set at 80 degrees. During two sessions, forward reach was measured with a yardstick, with a 10-minute break between sessions. Results. Intraclass correlation coefficients (3,2) were high and varied from .85 to .94. Post hoc testing revealed that differences occurred between groups 1 and 3 and groups 2 and 3, but not between groups 1 and 2. Conclusion and Discussion. Test-retest reliability was high with modification of the FRT with a single rater. The measurements reflected differences among levels of lesion. Further study is needed to determine normal values for all levels of lesion, relationships to functional outcomes, and effects of equipment on sitting balance. The modified FRT appears to provide reliable measurements of sitting balance in nonstanding persons with spinal cord injuries.

166 citations


Journal ArticleDOI
TL;DR: Three commonly used measures (chair stand, walking speed, and 360 degree turn) may be less reliable than previously reported and sample sizes that may be needed to detect change in these areas of performance may be larger than previously estimated given this level of imprecision.
Abstract: BACKGROUND: Functional assessments and direct measures of physical performance are standard components of community-based studies of older populations. Estimates of the reliability of these measures are necessary for the assessment of functional change. METHODS: The reproducibility of 13 measures of self-reported function and 11 direct measures of physical performance was assessed. A sample of subjects (N=199; ≥55 yrs) was selected from a larger population-based cohort. Subjects were tested in their homes twice, 48 hours apart, by the same interviewer to replicate study conditions. Age-adjusted kappa statistics were used to assess the reliability of measures of physical function; product moment correlation (Pearson r) and intraclass correlation coefficients (ICC) were used to assess direct measures of performance. A repeated measures model was used to assess learning or practice effects of performance, adjusted for age, sex, general health, and cognitive function. RESULTS: Age-adjusted kappa statistics were ≥ .60 for most self-reported items. ICC ranged from .63 to .92. Significant improvements (practice effects) were found for the chair stand, walking speed, and the 360 degree turn. Measures of grip strength, reaching down, and hand dexterity were found to be reliable, with no significant test effect. CONCLUSION: Three commonly used measures (chair stand, walking speed, and 360 degree turn) may be less reliable than previously reported. Sample sizes that may be needed to detect change in these areas of performance may be larger than previously estimated given this level of imprecision. Future studies of reproducibility should assess both the level of agreement and the presence of possible practice effects.

96 citations


Journal ArticleDOI
TL;DR: The variability in outcome measures from tests which do not maximally challenge the postural control system may be a hallmark of normal balance performance and supports the use of absolute performance measures as the interpretive value of data expressed relative to standard norms is limited.
Abstract: The reliability of outcome measures obtained using the Balance Master and the limits of stability in anterior, posterior, and lateral directions were evaluated in 70 healthy subjects aged 20 to 32 years. Data relating to static sway and the ability to shift the centre of gravity to preset targets were collected on three occasions one week apart. The centre of gravity position and limits of stability were determined over three trials and data converted from a relative reference system to absolute displacements from vertical. Intraclass correlation coefficients revealed fair to poor reliability of static and dynamic sway measures (coefficients or = 0.75). The variability in outcome measures from tests which do not maximally challenge the postural control system may be a hallmark of normal balance performance. Further, the intersubject variation in resting centre of gravity position and in limits of stability supports the use of absolute performance measures as the interpretive value of data expressed relative to standard norms is limited.

95 citations


Journal Article
TL;DR: A new statistic, the Coefficient of Accuracy, C(a), has been developed by Lin for methods comparison and forms a single statistic for both accuracy and precision called the Concordance Correlation Coefficient, rc.
Abstract: A new statistic, the Coefficient of Accuracy, C(a), has been developed by Lin for methods comparison. When an old measurement method is compared to a new measurement method or if the same method is compared in two laboratories, the Coefficient of Determination, r², is typically used to measure the relationship. However, r² only measures the precision of the relationship. The newly developed statistic, C(a), measures the accuracy of the relationship. When these two statistics are combined together, they form a single statistic for both accuracy and precision called the Concordance Correlation Coefficient, rc.

94 citations
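The split into precision and accuracy can be made concrete. In the usual formulation of Lin's coefficient, the concordance correlation is the product of the Pearson correlation r (precision) and a bias-correction factor (accuracy). The sketch below assumes that formulation and the 1/n variance convention; the function name and sample data are hypothetical.

```python
from math import sqrt

def concordance(x, y):
    """Return (Pearson r, accuracy factor, concordance correlation) for paired data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    r = sxy / sqrt(sxx * syy)  # precision: scatter about the best-fit line
    # Accuracy: how far the best-fit line departs from the 45-degree identity line.
    accuracy = 2 * sqrt(sxx * syy) / (sxx + syy + (mx - my) ** 2)
    return r, accuracy, r * accuracy

# Hypothetical old-method and new-method readings with a constant offset of 1.
old = [1.0, 2.0, 3.0, 4.0]
new = [2.0, 3.0, 4.0, 5.0]
print(concordance(old, new))
```

In this example r is 1 because the points fall on a straight line, yet the concordance is well below 1 because that line is shifted away from identity. Correlation alone would miss this.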


Journal ArticleDOI
TL;DR: It is concluded that RM-Sw, a Swedish version of the Roland and Morris Disability Questionnaire, is a reliable and valid measure of functional ability in low back pain.
Abstract: The purpose of this study was to investigate test-retest reliability and concurrent validity of a Swedish version of the Roland and Morris Disability Questionnaire (RM-Sw), and to describe demographic factors in patients with low back pain of at least 4 weeks' duration seeking outpatient physical therapy treatment in primary care settings. Seventy-two patients participated in the study. The intraclass correlation coefficient for a one-week test-retest interval was 0.88. There was moderate positive correlation with measures of perceived disability (r = 0.64, p < 0.001; r = 0.69, p < 0.001) and pain severity (r = 0.54, p < 0.001), and low negative correlation with measures of perceived life control (r = -0.32, p < 0.01) and general activity (r = -0.27, p < 0.05). Gender, education and occupation were only moderately related to RM-Sw scores, explaining 14% of the variance in the scores. It is concluded that RM-Sw is a reliable and valid measure of functional ability in low back pain.

Journal Article
TL;DR: Findings suggest that OPAQ is a reliable, consistent, and valid instrument capable of distinguishing hierarchy of functional loss in disease states in osteoporosis.
Abstract: Objective: To determine the reliability, consistency, and clinical utility of the Osteoporosis Assessment Questionnaire (OPAQ), an AIMS2 based self-assessment questionnaire. Methods: Reliability of individual questions, scales, and domains was evaluated in 40 subjects by test-retest and intraclass correlation coefficients and internal consistency by Cronbach's alpha. Construct validity was evaluated by disease state. The relationships between domains and scales were modeled by confirmatory factor analysis. Results: Mean kappa (79 questions) and intraclass correlation (18 health scales) coefficients were 0.58 ± 0.16 (mean ± SD) and 0.82 ± 0.07, respectively. Internal consistency was greater than 0.8 in all but 3 scales. Construct validity was confirmed. Patients with hip fracture recorded lower OPAQ scores than patients with vertebral fracture. Correlation and confirmatory factor analyses grouped the 18 health scales into 7 domains. Conclusion: These findings suggest that OPAQ is a reliable, consistent, and valid instrument capable of distinguishing hierarchy of functional loss in disease states in osteoporosis.
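Internal consistency by Cronbach's alpha compares the sum of the item variances with the variance of the total score. A minimal sketch of the standard formula follows; the function name and the questionnaire responses are hypothetical.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha; `responses` holds one list of k item scores per respondent."""
    k = len(responses[0])
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical answers from five respondents to a three-item scale.
scale = [[3, 4, 3], [2, 2, 3], [5, 4, 5], [1, 2, 1], [4, 5, 4]]
print(round(cronbach_alpha(scale), 3))
```

Alpha rises when items move together across respondents, so a value above 0.8 is commonly read as the items measuring a shared construct.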

Journal ArticleDOI
TL;DR: The reference-based protocol allows for more reliable measures of PA stiffness judgments than previous protocols have; however, the human ratings are not highly correlated with the SAM measures.
Abstract: Background and Purpose. The reliability and criterion-related validity of ratings of posteroanterior (PA) spinal stiffness made using reference values for comparison have not been investigated. In this study, mechanical reference stimuli for points on an 11-point rating scale were used to determine whether using a reference scale may be feasible. Subjects. Five different raters took part in 2 studies in which they rated 40 subjects who were asymptomatic for low back pain. Methods. The interrater reliability of ratings was evaluated with intraclass correlation coefficients (ICCs) and standard errors of the measurement (SEMs). Criterion-related validity was evaluated by correlating judgments of PA spinal stiffness assessed manually with measurements of PA spinal stiffness provided by a mechanical device, the “Stiffness Assessment Machine” (SAM). Results. Although the reliability indices were generally high, with ICCs reaching .77 and with SEMs as low as 0.72 points, the evidence for criterion-related validity (ie, the ability of the examiner to judge spinal stiffness levels) was not strong, with correlations reaching only .56. Conclusion and Discussion. The reference-based protocol allows for more reliable measures of PA stiffness judgments than previous protocols have; however, the human ratings are not highly correlated with the SAM measures. The protocol will have clinical value if judgments made using it are shown to be reliable in clinically relevant subjects and to have validity for clinical management of patients.
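The standard error of the measurement expresses reliability in the units of the rating scale. One common definition is SEM = SD × sqrt(1 − ICC). The sketch below assumes that definition; the abstract does not state which formula the authors used, and the function name and input values are hypothetical.

```python
from math import sqrt

def standard_error_of_measurement(sd, icc):
    """SEM in the units of the scale, from the between-subject SD and the ICC."""
    return sd * sqrt(1 - icc)

# Illustrative values only: a rating SD of 2.0 scale points and an ICC of 0.75.
print(standard_error_of_measurement(2.0, 0.75))
```

Because the SEM depends on the spread of the sample as well as the ICC, two studies with the same ICC can report quite different SEMs.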

Journal ArticleDOI
TL;DR: Agreement among secondary school teachers' behavior ratings for 66 adolescent boys with a history of attention deficit hyperactivity disorder suggests that a multiple teacher assessment strategy should be adopted for clinical assessment, treatment design, and evaluation of treatment efficacy.
Abstract: Examined agreement among secondary school teachers' behavior ratings for 66 adolescent boys with a history of attention deficit hyperactivity disorder. Behavior ratings consisted of the Teacher Report Form, Iowa/Abbreviated Conners, and the Disruptive Behavior Disorders Rating Scale. Ratings from 2 to 5 teachers were collected for each adolescent. In contrast to previous studies, agreement was examined using statistical indices that corrected for chance agreement and discrepancies in scores (i.e., intraclass correlation [ICC], kappa) in addition to traditional indices (i.e., Pearson correlation and percentage agreement) typically used in the relatively sparse literature on teacher agreement for adolescent behavior ratings. Agreement was poor for dimensional subscale scores (Pearson correlations were in the .40-.50 range, and ICCs were in the .20-.50 range) as well as for categorization of youth as above or below clinical cutoffs (percentage agreement was between 52% and 96%, but ICCs and kappas ranged from ...

Journal ArticleDOI
01 Jul 1998-Genetics
TL;DR: Widely used standard expressions for the sampling variance of intraclass correlations and genetic correlation coefficients were reviewed for small and large sample sizes, and it was shown that in cases where the population values of the heritabilities are known, using the estimated heritabilities rather than their true values to estimate the genetic correlation results in a lower sampling variance for the genetic correlation.
Abstract: Widely used standard expressions for the sampling variance of intraclass correlations and genetic correlation coefficients were reviewed for small and large sample sizes. For the sampling variance of the intraclass correlation, it was shown by simulation that the commonly used expression, derived using a first-order Taylor series performs better than alternative expressions found in the literature, when the between-sire degrees of freedom were small. The expressions for the sampling variance of the genetic correlation are significantly biased for small sample sizes, in particular when the population values, or their estimates, are close to zero. It was shown, both analytically and by simulation, that this is because the estimate of the sampling variance becomes very large in these cases due to very small values of the denominator of the expressions. It was concluded, therefore, that for small samples, estimates of the heritabilities and genetic correlations should not be used in the expressions for the sampling variance of the genetic correlation. It was shown analytically that in cases where the population values of the heritabilities are known, using the estimated heritabilities rather than their true values to estimate the genetic correlation results in a lower sampling variance for the genetic correlation. Therefore, for large samples, estimates of heritabilities, and not their true values, should be used.

Journal ArticleDOI
TL;DR: The Japanese version of the PANSS is a reliable and efficient tool for comprehensive assessment of the schizophrenic syndrome; its interrater reliability and internal consistency were satisfactory and similar to those obtained in the antecedent studies.
Abstract: The purpose of the present study is to test interrater reliability of the Japanese version of the Positive and Negative Syndrome Scale (PANSS) and to examine factors possibly affecting the reliability. The study group conducted the PANSS rating on 20 patients with DSM-IV schizophrenia. For the analysis of interrater reliability, intraclass correlation coefficient (ICC) was calculated. The ICC for individual items of the PANSS ranged from 0.26 to 0.92, and those for the positive, negative, and general psychopathology subscales were 0.85, 0.83 and 0.75, respectively. The Cronbach's alpha coefficients for the subscales were 0.84, 0.87 and 0.76, respectively. The interrater reliability and the internal consistency were satisfactory and similar to those obtained in the antecedent studies. No salient training effect was found in a sequential analysis of the concordance rate. It is concluded that the Japanese version of the PANSS is a reliable and efficient tool for comprehensive assessment of the schizophrenic syndrome.

Journal ArticleDOI
TL;DR: Movement diagram measures conceptually related to the end of joint ROM and end-feel were highly reliable, and the fact that additional end-feel categories were introduced in the study may partially explain the end-feel reliability findings.
Abstract: Background and Purpose. Findings related to joint function can be recorded with movement diagrams or by characterizing the “end-feel” according to the procedure described by Cyriax. Because both methods are used to classify pain and resistance in relation to joint range of motion (ROM), the purpose of this study was to simultaneously evaluate the reliability of these categorizations in a patient sample. Subjects. Two physical therapists performed 2 assessments of passive lateral rotation of the shoulder in 34 patients. Methods. Pain and resistance findings were recorded using movement diagrams and end-feel categories. Intraclass correlation coefficients (ICC[2,1]) were used to analyze the ratio (movement diagram) data, and kappa statistics (κ) were used to analyze the categorical (end-feel) data. Results. Intrarater ICCs varied from .58 to .89. Interrater ICCs for locating maximum pain and resistance in joint ROM varied from .85 to .91. Other interrater ICCs were lower (ICC=.34–.88). Intrarater kappa values for end-feel were moderate (κ=.48–.59), and interrater kappa values were substantial (κ=.62–.76). Conclusion and Discussion. Movement diagram measures conceptually related to the end of joint ROM and end-feel were highly reliable. This finding and the fact that additional end-feel categories were introduced in the study may partially explain the end-feel reliability findings. Consideration of their use in future studies may help to determine their clinical utility.

Journal ArticleDOI
TL;DR: The results suggest that the degree of postural sway and the reliability of the measurement itself are influenced by the age of the child and the measurement system employed.
Abstract: Static standing balance is commonly measured with research laboratory systems (LabSys) or clinical systems (ClinSys). The purposes of this study were to (1) assess the reliability of two systems designed to measure static standing balance in nondisabled children, (2) compare the findings derived from the two systems of measurement, and (3) examine the relationship between anthropometric measures and postural sway. Twenty-five nondisabled children (12 male, 13 female) ages 1 year 11 months to 12 years 2 months (mean = 6 years 4 months; SD = 4 years 3 months) participated in the study. Each child stood on the LabSys and the ClinSys for three consecutive 10 second measurement periods. Intraclass correlation coefficients (ICC (2, 1)) for the three trials on each system were 0.62 (LabSys) and 0.63 (ClinSys). The level of agreement between the two systems was 0.61 (ICC (2, 1)). Younger children exhibited more variability and less agreement between measurement trials using the ClinSys. However, older children de...

Journal ArticleDOI
TL;DR: Test-retest reproducibility of stroke volume and cardiac output using Doppler echocardiography was examined during maximum cycle exercise in 13 young men; the results indicate a high degree of reproducibility using this technique.
Abstract: Test-retest reproducibility of stroke volume and cardiac output using Doppler echocardiography was examined during maximum cycle exercise in 13 young men. A coefficient of variation of 8.5% and 8.1% and intraclass correlation coefficient of 0.90 and 0.91 for maximum stroke volume and cardiac output, respectively, indicate a high degree of reproducibility using this technique.

Journal ArticleDOI
TL;DR: It has been found on the basis of simulation studies that the likelihood ratio test consistently and reliably produces results superior to those of the large-sample z-test and large-sample z-test in terms of power for various combinations of intraclass correlation coefficient values.
Abstract: The likelihood ratio test, the large-sample z-test, and the large-sample z-test have been proposed to test for the equality of two intraclass correlation coefficients under unequal family sizes based on two independent multinomial samples. It has been found on the basis of simulation studies that the likelihood ratio test consistently and reliably produces results superior to those of the large-sample z-test and large-sample z-test in terms of power for various combinations of intraclass correlation coefficient values.

Journal ArticleDOI
TL;DR: The modified sphygmomanometer appears to be practical to use, and the high correlations found in this study for the elbow extensors suggest that reliable measurements can be obtained with this instrument.
Abstract: Background and Purpose. Physical therapists working with elderly people require an instrument that provides reliable force measurements and can be used in a clinical setting. The modified sphygmomanometer has been identified as potentially fulfilling these requirements, yet there is an absence of research on the reliability of measurements taken with this instrument on elderly patients. This study was undertaken to investigate the interrater reliability of force measurements, in a group of elderly subjects, using a modified sphygmomanometer. Subjects. Thirty-six hospitalized subjects (mean age=75.28 years, SD=9.43, range=62–95) participated in the study. Methods. With the modified sphygmomanometer, 3 examiners evaluated the isometric force of the elbow extensors and hip extensors using a break test and a make test, respectively. Results. Intraclass correlation coefficients (2,1) reflecting reliability were .87 for the elbow extensors and .65 for the hip extensors. The estimation of the components of variance for hip extensors revealed that these results were due in part to the raters but that random error contributed to a much larger extent. Conclusion and Discussion. The modified sphygmomanometer appears to be practical to use, and the high correlations found in this study for the elbow extensors suggest that reliable measurements can be obtained with this instrument. Further research is needed, however, to specify the manner in which the modified sphygmomanometer can be used when assessing different muscle groups.

Journal ArticleDOI
TL;DR: This study demonstrated the applicability of the WeeFIM instrument to Japanese children with satisfactory reliability and validity and provided preliminary normative data for future studies.
Abstract: The Functional Independence Measure for Children (WeeFIM) was developed based on the FIM℠ instrument to assess disability in children aged six months to seven years. Its reliability and validity have been studied, and normative data are available for American children. The WeeFIM instrument is potentially an internationally useful instrument, but data from other countries are lacking. The objectives of this study are to examine whether the WeeFIM instrument is applicable to Japanese children and to describe preliminary normative data. To study interrater reliability, we had two examiners assess 20 nondisabled children and calculated weighted kappas for individual item scores and intraclass correlation coefficients for total scores and motor and cognitive subscores. We then assessed 110 nondisabled children ages six months to seven years to obtain normative data and compared them with the American data. In 51 of these healthy children, we compared total WeeFIM scores with developmental ages as obtained with the Tsumori test, a standardized developmental test widely used in Japan, to assess its concurrent validity. The weighted kappas were greater than 0.8, and the intraclass correlation coefficients were greater than 0.98. Total scores and motor and cognitive subscores increased with age, reaching a plateau at 60 to 72 months, which is similar to the American data. There were three patterns of chronologic changes in individual item scores: items showing high correlations with age (Spearman's rho > 0.8; grooming, dressing, memory, etc.), moderate correlations (0.8 > rho > 0.7; eating, bladder, comprehension, etc.), and lower correlations (0.7 > rho > 0.6; locomotion and chair transfer). Total scores correlated significantly with developmental ages (Spearman's rho = 0.938), but there was a discrepancy between each item score and the pass-or-fail patterns of the Tsumori test. This study demonstrated the applicability of the WeeFIM instrument to Japanese children with satisfactory reliability and validity and provided preliminary normative data for future studies.

Journal ArticleDOI
TL;DR: This instrument provides a new, rapid way to obtain information relative to the differing levels of the disablement process and is suitable for persons with spinal cord disease.

Journal ArticleDOI
TL;DR: The Learning Behaviors Scale (LBS) is a standardized behavior rating scale designed to report how individual students respond to classroom learning situations; this study found that the LBS produces comparable levels of differential learning styles for assessments of individual children.
Abstract: Standardized and reliable rating scales have an important role in educational assessment and behavioral classroom intervention. The Learning Behaviors Scale (LBS) is a standardized behavior rating scale designed to report how individual students respond to classroom learning situations. This study investigated the interobserver agreement of the LBS with the use of linear and intraclass correlation methods. The methods jointly assessed the three salient aspects of observer judgments—severity level, rank order, and directionality. Participants were 72 students enrolled in special education programs as observed by 16 educators in eight self-contained classrooms. Both linear and intraclass coefficients were substantial (averages = .83 and .84, respectively). No significant observer effect was found. Moreover, the LBS produced comparable levels of differential learning styles for assessments of individual children. © 1998 John Wiley & Sons, Inc.

Journal ArticleDOI
TL;DR: The authors evaluated the reliability of two pretreatment assessments of cigarettes smoked per day by the commonly used aggregate method; the findings indicate that the aggregate method provides reasonably consistent data.
Abstract: The authors evaluated the reliability of two pretreatment assessments (screening and intake) of cigarettes smoked per day (CPD) by the commonly used aggregate method. The validity of the aggregate method was also determined by comparison with results of the timeline followback (TLFB) method for the identical periods. The study participants were 49 outpatients undergoing nicotine patch treatment. The reliability of the two aggregate method evaluations of CPD was quite high by Pearson product-moment correlation (r) and good when based on the intraclass correlation. Correspondence between the CPD assessments based on the aggregate and TLFB methods for the two time-points ranged from fair (screening) to good (intake). Overall, the study findings indicate that the aggregate method provides reasonably consistent data.
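
The gap this abstract reports between a "quite high" Pearson r and a merely "good" intraclass correlation can be illustrated with a minimal sketch. The abstract does not say which ICC form was used, so the one-way random-effects ICC(1,1) below is an assumption, and the cigarettes-per-day values are hypothetical numbers chosen only to show the effect of a constant shift between two assessments.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def icc_oneway(x, y):
    """One-way random-effects ICC(1,1) for two measurements per subject.

    Assumption: this ICC form is chosen for illustration only.
    """
    n, k = len(x), 2
    grand = mean(x + y)
    subj_means = [(a + b) / 2 for a, b in zip(x, y)]
    # Between-subject and within-subject mean squares from one-way ANOVA.
    ms_between = k * sum((m - grand) ** 2 for m in subj_means) / (n - 1)
    ms_within = sum(
        (a - m) ** 2 + (b - m) ** 2 for a, b, m in zip(x, y, subj_means)
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical data: the second assessment is the first plus a constant 5.
screening = [10, 15, 20, 25, 30]
intake = [15, 20, 25, 30, 35]

r = pearson_r(screening, intake)
icc = icc_oneway(screening, intake)
print(f"Pearson r = {r:.3f}, ICC(1,1) = {icc:.3f}")
```

Pearson r ignores a systematic offset between the two assessments, whereas this ICC counts it as disagreement, so the ICC comes out lower even though the rank order is identical.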

Journal Article
TL;DR: The reliability of most of the outcome measures was good, and was higher for those measurements evaluated by a rheumatologist and for the composite indexes.
Abstract: OBJECTIVE To determine the reliability of some commonly used outcome measures in patients with rheumatoid arthritis. METHODS We studied 22 consecutive patients with rheumatoid arthritis enrolled in a clinical trial in a tertiary care center. The study design consisted of a test-retest, in which the same rheumatologist evaluated all of the patients twice, with an interval between evaluations of 90 to 120 minutes. Statistical analysis of the data consisted of calculation of the weighted Kappa (kw) and the intraclass correlation coefficient (ICC). RESULTS For the Ritchie articular index, kappa w = 0.83, ICC = 0.49, p < 0.0001; tender joint count, kappa w = 0.82, ICC = 0.49, p < 0.0001; physician's global assessment, kappa w = 0.79, ICC = 0.48, p < 0.0001; disease activity score, kappa w = 0.79, ICC = 0.49, p < 0.0001; utilities, kappa w = 0.71, ICC = 0.48, p < 0.0001; swollen joint count, kappa w = 0.7, ICC = 0.47, p < 0.0001; patient's global assessment, kappa w = 0.58, ICC = 0.44, p < 0.0001; pain kappa w = 0.45, ICC = 0.41, p < 0.0001. CONCLUSIONS The reliability of most of the outcome measures was good. It was higher for those measurements evaluated by a rheumatologist and for the composite indexes. Those requiring patient participation need to be improved.

Journal ArticleDOI
TL;DR: The objectives of this study were to standardize measurement procedures and study the test-retest and interrater reliability of the belt-resisted method for measuring the lower extremity isometric strength of three muscle groups.
Abstract: The objectives of this study were to standardize measurement procedures and study the test-retest and interrater reliability of the belt-resisted method for measuring the lower extremity isometric strength of three muscle groups. The strength of 33 healthy, elderly, community-dwelling subjects was evaluated with a hand-held dynamometer using the belt-resisted method. Isometric strength testing of three muscle groups (hip flexors, knee extensors, and ankle dorsiflexors) was performed on two separate occasions, 1 week apart, by the same tester to determine test-retest reliability. The test results of two different examiners testing on different days were used to determine interrater reliability. Test-retest reliability was higher than interrater reliability. Test-retest reliability coefficients of the three muscle groups were high (J9-.95). For interrater reliability, intraclass correlation coefficients varied from .64 to .92, depending on the muscle group and side. For the two kinds of reliability, intracl...

Journal ArticleDOI
TL;DR: In this article, the internal consistency and temporal reliability of the Measure of Self-actualization of Potential (MOSOP) was investigated and the results from the initial pilot study (n = 414) used to develop the inventory were compared with those of a new sample (n = 156) designed to evaluate its stability.
Abstract: The purpose of this study was to investigate both the internal consistency and temporal reliability of the Measure of Self-actualization of Potential. Results from the initial pilot study (n = 414) used to develop the inventory are compared with those of a new sample (n = 156) designed to evaluate its stability. Responses from the new sample support our initial findings of moderate to relatively high alpha coefficients for the two main scales and five subscales. Test-retest reliability for the new set of responses indicates high stability, with intraclass correlation coefficients ranging from .74 to .88. For the over-all scale, Cronbach alpha reaches .90 and the intraclass coefficient .87. In addition to better psychometric properties, the new inventory has two other advantages over the Personal Orientation Inventory, fewer items and a self-report format.
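
As a rough illustration of the internal-consistency statistic reported above, Cronbach's alpha can be sketched from the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The item scores below are hypothetical and are not data from the study; the function name is likewise only illustrative.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of items, each a list of per-respondent scores."""
    k = len(items)
    # Total score per respondent: sum across items.
    totals = [sum(scores) for scores in zip(*items)]
    item_var_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))

# Hypothetical responses: 3 items answered by 4 respondents.
items = [
    [1, 2, 3, 4],
    [2, 3, 4, 5],
    [1, 3, 3, 5],
]

alpha = cronbach_alpha(items)
print(f"Cronbach alpha = {alpha:.3f}")
```

Alpha approaches 1 when the items rise and fall together across respondents, which is why it is read as internal consistency, while the intraclass coefficients in the abstract address stability across test and retest.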

Journal ArticleDOI
TL;DR: Results indicate that the 5-minute walking field test is a reliable and valid method for estimating V˙O2 peak in this population of men and women with knee osteoarthritis.
Abstract: The purposes of the present study were (a) to evaluate the test-retest reliability of the Price et al. (1988) 5-min walking field test, (b) to assess the validity of the test as an estimate of aerobic fitness, and (c) to derive a predictive model for estimating V˙O2 peak. The subjects were men and women age ≥50 with knee osteoarthritis. A high intraclass correlation coefficient was obtained in the reliability study, which included 60 subjects who did the 5-min walk twice within a maximum of 11 days. For the validity study, distances walked at the first walking trial were compared with V˙O2 peak values measured by a maximal treadmill test. The best predictive model included the following predictor variables: distance walked in 5 min, age, sex, and weight. Results indicate that the 5-minute walking field test is a reliable and valid method for estimating V˙O2 peak in this population.

Journal ArticleDOI
TL;DR: In this paper, the Pearson and intraclass correlation coefficients have been used for exercise and physical education research, and they have been shown to be useful for both exercise and education.
Abstract: This paper highlights an important statistical development for exercise and physical education research. Traditionally, the Pearson and intraclass correlation coefficients have been liberally used...