SciSpace (formerly Typeset)

Showing papers on "Intraclass correlation" published in 2004


Journal Article
TL;DR: A comparison of the EQ-5D and the SF-6D across seven patient/population groups (chronic obstructive airways disease, osteoarthritis, irritable bowel syndrome, lower back pain, leg ulcers, post menopausal women and elderly) shows discrepancies arise from differences in their health state classifications and the methods used to value them.
Abstract: As the number of preference-based instruments grows, it becomes increasingly important to compare different preference-based measures of health in order to inform an important debate on the choice of instrument. This paper presents a comparison of two of them, the EQ-5D and the SF-6D (recently developed from the SF-36) across seven patient/population groups (chronic obstructive airways disease, osteoarthritis, irritable bowel syndrome, lower back pain, leg ulcers, post menopausal women and elderly). The mean SF-6D index value was found to exceed the EQ-5D by 0.045 and the intraclass correlation coefficient between them was 0.51. Whilst this convergence lends some support for the validity of these measures, the modest difference at the aggregate level masks more significant differences in agreement across the patient groups and over severity of illness, with the SF-6D having a smaller range and lower variance in values. There is evidence for floor effects in the SF-6D and ceiling effects in the EQ-5D. These discrepancies arise from differences in their health state classifications and the methods used to value them. Further research is required to fully understand the respective roles of the descriptive systems and the valuation methods and to examine the implications for estimates of the impact of health care interventions.

779 citations


Journal Article
TL;DR: Crossley et al. evaluated the reliability, validity, and responsiveness of several outcome measures for patellofemoral pain in a randomized controlled trial (RCT).

583 citations


Journal Article
TL;DR: The Brazilian version of the Berg balance scale is a reliable instrument for assessing balance in elderly Brazilian patients.
Abstract: The purpose of the present study was to translate and adapt the Berg balance scale, an instrument for functional balance assessment, to Brazilian-Portuguese and to determine the reliability of scores obtained with the Brazilian adaptation. Two persons proficient in English independently translated the original scale into Brazilian-Portuguese and a consensus version was generated. Two translators performed a back translation. Discrepancies were discussed and solved by a panel. Forty patients older than 65 years and 40 therapists were included in the cultural adaptation phase. If more than 15% of therapists or patients reported difficulty in understanding an item, that item was reformulated and reapplied. The final Brazilian version was then tested on 36 elderly patients (over age 65). The average age was 72 years. Reliability of the measure was assessed twice by one physical therapist (1-week interval between assessments) and once by one independent physical therapist. Descriptive analysis was used to characterize the patients. The intraclass correlation coefficient (ICC) and Pearson's correlation coefficient were computed to assess intra- and interobserver reliability. Six questions were modified during the translation stage and cultural adaptation phase. The ICC for intra- and interobserver reliability was 0.99 (P < 0.001) and 0.98 (P < 0.001), respectively. The Pearson correlation coefficient for intra- and interobserver reliability was 0.98 (P < 0.001) and 0.97 (P < 0.001), respectively. We conclude that the Brazilian version of the Berg balance scale is a reliable instrument to be used in balance assessment of elderly Brazilian patients.

515 citations


Journal Article
TL;DR: The Functional Gait Assessment demonstrates what the authors believe is acceptable reliability, internal consistency, and concurrent validity with other balance measures used for patients with vestibular disorders.
Abstract: Background and Purpose. The Functional Gait Assessment (FGA) is a 10-item gait assessment based on the Dynamic Gait Index. The purpose of this study was to evaluate the reliability, internal consistency, and validity of data obtained with the FGA when used with people with vestibular disorders. Subjects. Seven physical therapists from various practice settings, 3 physical therapist students, and 6 patients with vestibular disorders volunteered to participate. Methods. All raters were given 10 minutes to review the instructions, the test items, and the grading criteria for the FGA. The 10 raters concurrently rated the performance of the 6 patients on the FGA. Patients completed the FGA twice, with an hour's rest between sessions. Reliability of total FGA scores was assessed using intraclass correlation coefficients, ICC(2,1). Internal consistency of the FGA was assessed using the Cronbach alpha and confirmatory factor analysis. Concurrent validity was assessed using the correlation of the FGA scores with balance and gait measurements. Results. Intraclass correlation coefficients of .86 and .74 were found for interrater and intrarater reliability of the total FGA scores. Internal consistency of the FGA scores was .79. Spearman rank order correlation coefficients of the FGA scores with balance measurements ranged from .11 to .67. Discussion and Conclusion. The FGA demonstrates what we believe is acceptable reliability, internal consistency, and concurrent validity with other balance measures used for patients with vestibular disorders.

501 citations
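The FGA study reports ICC(2,1). The sketch below shows the Shrout and Fleiss ICC(2,1) form: two-way random effects, absolute agreement, single rater. It is computed from the two-way ANOVA mean squares of a subjects × raters score table. The function name `icc_2_1` and the toy scores are illustrative assumptions, not the authors' analysis code.

```python
def icc_2_1(ratings):
    """Shrout-Fleiss ICC(2,1): two-way random effects, absolute agreement,
    single rater. `ratings[i][j]` is rater j's score for subject i."""
    n = len(ratings)        # number of subjects (rows)
    k = len(ratings[0])     # number of raters (columns)
    grand = sum(x for row in ratings for x in row) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for subjects, raters, and the residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                  # between-subjects mean square
    msc = ss_cols / (k - 1)                  # between-raters mean square
    mse = ss_error / ((n - 1) * (k - 1))     # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Toy data: rater B scores every subject exactly 1 point higher than rater A.
scores = [[1, 2], [2, 3], [3, 4]]
print(round(icc_2_1(scores), 3))  # 0.667
```

The two raters rank the subjects identically. The absolute-agreement form still counts their constant offset as disagreement, so it falls below 1.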


Journal Article
TL;DR: The precise magnitude of between-cluster variation for a given measure can rarely be estimated in advance and studies should be designed with reference to the overall distribution of ICCs and with attention to features that increase efficiency.

461 citations
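In cluster-randomized designs, the between-cluster ICC usually enters sample-size planning through the standard design effect for equal cluster sizes, 1 + (m − 1)·ICC. This factor inflates the sample size needed under individual randomization. The sketch below assumes equal cluster sizes. The function names, baseline sample size and ICC values in the sensitivity check are hypothetical, not values from the paper.

```python
import math


def design_effect(cluster_size, icc):
    """Variance inflation from clustering, assuming equal cluster sizes m."""
    return 1 + (cluster_size - 1) * icc


def clustered_sample_size(n_individual, cluster_size, icc):
    """Inflate an individually randomized sample size for clustering.

    Returns (total participants, number of clusters), both rounded up."""
    n_total = math.ceil(n_individual * design_effect(cluster_size, icc))
    return n_total, math.ceil(n_total / cluster_size)


# The ICC can rarely be pinned down in advance, so check a plausible range.
for icc in (0.01, 0.05, 0.10):
    print(icc, clustered_sample_size(200, 20, icc))
```

Looping over several ICC values reflects the paper's advice to design with the overall distribution of ICCs in mind rather than a single point estimate.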


Journal Article
TL;DR: The CSA/MTI appeared to have acceptable reliability for most research applications, whereas values for the other devices indicate some possible concerns with reliability; additional work is needed to better understand factors contributing to variability in accelerometry data.
Abstract: Introduction: Numerous studies have examined the validity of accelerometry-based activity monitors, but few studies have systematically studied the reliability of different accelerometer units for assessing a standardized bout of physical activity. Improving understanding of error in these devices is an important research objective because they are increasingly being used in large surveillance studies and intervention trials that require the use of multiple units over time. Methods: Four samples of college-aged participants were recruited to collect reliability data on four different accelerometer types (CSA/MTI, Biotrainer Pro, Tritrac-R3D, and Actical). The participants completed three trials of treadmill walking (3 mph) while wearing multiple units of a specific monitor type. For each trial, the participant completed a series of 5-min bouts of walking (one for each monitoring unit) with 1 min of standing rest between bouts. Generalizability (G) theory was used to quantify variance components associated with individual monitor units, trials, and subjects, as well as interactions between these terms. Results: The overall G coefficients ranged from 0.43 to 0.64 for the four monitor types. Corresponding intraclass correlation coefficients (ICC) ranged from 0.62 to 0.80. The CSA/MTI was found to have the least variability across monitor units and trials and the highest overall reliability. The Actical was found to have the poorest reliability. Conclusion: The CSA/MTI appeared to have acceptable reliability for most research applications (G values above 0.60 and ICC values above 0.80), but values for the other devices indicate some possible concerns with reliability. Additional work is needed to better understand factors contributing to variability in accelerometry data and to determine appropriate calibration protocols to improve reliability of these measures for different research applications.

438 citations
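The accelerometer study used generalizability (G) theory to split score variance into components. The published design crossed subjects with both monitor units and trials. The sketch below is a deliberately simplified one-facet version: subjects × units, with one score per cell. It estimates variance components from two-way ANOVA mean squares, then forms a relative G coefficient and an absolute dependability (Φ) coefficient. Function and variable names are hypothetical. Negative variance estimates are truncated at zero, a common convention.

```python
def g_study_one_facet(scores):
    """Variance components for a crossed persons x units design with one
    observation per cell. `scores[p][u]` is person p's score on unit u."""
    n_p, n_u = len(scores), len(scores[0])
    grand = sum(x for row in scores for x in row) / (n_p * n_u)
    p_means = [sum(row) / n_u for row in scores]
    u_means = [sum(row[u] for row in scores) / n_p for u in range(n_u)]

    ss_p = n_u * sum((m - grand) ** 2 for m in p_means)
    ss_u = n_p * sum((m - grand) ** 2 for m in u_means)
    ss_tot = sum((x - grand) ** 2 for row in scores for x in row)
    ms_p = ss_p / (n_p - 1)
    ms_u = ss_u / (n_u - 1)
    ms_res = (ss_tot - ss_p - ss_u) / ((n_p - 1) * (n_u - 1))

    # Expected-mean-square solutions, truncated at zero.
    var_res = ms_res
    var_p = max((ms_p - ms_res) / n_u, 0.0)
    var_u = max((ms_u - ms_res) / n_p, 0.0)
    return var_p, var_u, var_res


def g_coefficients(var_p, var_u, var_res, n_units=1):
    """Relative G coefficient and absolute Phi for the mean of n_units units."""
    g_rel = var_p / (var_p + var_res / n_units)
    phi = var_p / (var_p + (var_u + var_res) / n_units)
    return g_rel, phi


components = g_study_one_facet([[1, 2], [2, 3], [3, 4]])
print(g_coefficients(*components))  # (1.0, 0.6666666666666666)
```

With a single unit and no truncation, the relative G coefficient equals the consistency ICC(C,1) for this design. Φ equals the absolute-agreement ICC(A,1).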


Journal Article
TL;DR: The Ottawa bowel preparation scale was developed and validated prospectively and demonstrates high interobserver agreement and reliability, whether used as a total score or for individual colon segments.

412 citations


Journal Article
TL;DR: Results suggest good construct validity, internal consistency, reliability and test-retest reliability, but do not demonstrate discriminative validity, which is consistent with theoretical models of oral disease and its consequences.
Abstract: Purpose: This study measured oral health-related quality of life for children, which involved the construction of child perceptions questionnaires (CPQs) for ages 6 to 7, 8 to 10, and 11 to 14. The purpose of this study was to present the development and evaluation of the CPQ for 8- to 10-year-olds (CPQ8-10). Methods: Questions (N=25) were selected from the CPQ for 11- to 14-year-olds based on the child development literature and input from parents, a child psychologist, and a teacher of grades 3 and 4. Validity and reliability were evaluated on 68 and 33 children, respectively. Results: There was a moderate positive correlation between the CPQ8-10 score and overall well-being rating (R=.45). The level of impact was slightly higher in the orofacial group than in the pediatric dentistry group (mean score=19.1 vs 18.4, respectively). Hypotheses concerning the relationship between the CPQ8-10 score and the number of decayed surfaces were confirmed, with R=.29 and a higher mean score in caries-afflicted than in caries-free children (21.1 vs 14.7). The Cronbach's alpha and intraclass correlation coefficients were 0.89 and 0.75, respectively. Conclusions: Results suggest good construct validity, internal consistency, reliability, and test-retest reliability, but do not demonstrate discriminative validity. This is consistent, however, with theoretical models of oral disease and its consequences. Further research is required, as these are preliminary findings based on convenience sampling.

291 citations
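Internal consistency in the CPQ8-10 study (and several other entries) is reported as Cronbach's alpha. The usual formula is α = k/(k − 1) · (1 − Σ item variances / variance of total scores). The sketch below uses sample variances throughout. The function name and toy responses are illustrative only.

```python
import statistics


def cronbach_alpha(responses):
    """Cronbach's alpha. `responses[r][i]` is respondent r's score on item i."""
    k = len(responses[0])                          # number of items
    items = list(zip(*responses))                  # one tuple per item
    item_var_sum = sum(statistics.variance(col) for col in items)
    total_var = statistics.variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Toy responses: 4 respondents x 3 items that rise and fall together.
toy = [[1, 2, 1], [2, 3, 2], [3, 3, 4], [4, 5, 4]]
print(round(cronbach_alpha(toy), 2))  # 0.95
```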


Journal Article
TL;DR: OHIP-14 appeared to be responsive to change, but the magnitude of change that it detected in the context described here was modest, probably because it was designed primarily as a discriminative measure.
Abstract: Objectives: This paper illustrates ways of assessing the responsiveness of measures of oral health-related quality of life (OHRQoL) by examining the sensitivity of the oral health impact profile (OHIP)-14 to change when used to evaluate a dental care program for the elderly. Methods: One hundred and sixteen elderly patients attending four municipally funded dental clinics completed a copy of the OHIP-14 prior to treatment and 1 month after the completion of treatment. The post-treatment questionnaire also included a global transition judgement that assessed subjects' perceptions of change in their oral health following treatment at the clinics. Change scores were calculated by subtracting post-treatment OHIP-14 scores from pre-treatment scores. The longitudinal construct validity of these change scores was assessed by means of their association with the global transition judgements. Measures of responsiveness included effect sizes for the change scores, the minimal important difference, and Guyatt's responsiveness index. A receiver operating characteristic (ROC) curve was constructed to determine the accuracy of the change scores in predicting whether patients had improved or not as a result of the treatment. Results: Based on the global transition judgements, 60.2% of subjects reported improved oral health, 33.6% reported no change, and only 6.2% reported that it was a little worse. These changes are reflected in mean OHIP-14 scores, which declined from 15.8 before treatment to 11.5 after treatment (P < 0.001). Mean change scores showed a consistent gradient in the expected direction across categories of the global transition judgement, but differences between the groups were not significant. However, paired t-tests showed no significant differences between the pre- and post-treatment scores of stable subjects, but showed significant declines for subjects who reported improvement.
Analysis of data from stable subjects indicated that OHIP-14 had excellent test–retest reliability, with an intraclass correlation coefficient (ICC) of 0.84. Effect sizes based on change scores for all subjects and for subgroups of subjects were small to moderate. The ROC analysis indicated that OHIP-14 change scores were not good ‘diagnostic tests’ of improvement. The minimal important difference for the OHIP-14 was 5 scale points, but detecting this difference would require relatively large sample sizes. Conclusions: OHIP-14 appeared to be responsive to change. However, the magnitude of change that it detected in the context described here was modest, probably because it was designed primarily as a discriminative measure. The psychometric properties of the global transition judgements that often provide the ‘gold standard’ for responsiveness studies need to be established.

286 citations
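The OHIP-14 study's ROC analysis asks how well change scores separate patients who reported improvement from those who did not. Change scores are pre-treatment minus post-treatment, so a positive value means fewer reported impacts. The area under the ROC curve equals the Mann-Whitney probability that a randomly chosen improved patient has a larger change score than a randomly chosen non-improved one, with ties counted as one half. The function names and toy scores below are hypothetical; this is not the authors' analysis.

```python
def change_scores(pre, post):
    """Change = pre - post; for OHIP-14, lower scores mean fewer impacts."""
    return [a - b for a, b in zip(pre, post)]


def roc_auc(positive_scores, negative_scores):
    """AUC as the Mann-Whitney probability P(pos > neg) + 0.5 * P(pos == neg)."""
    wins = 0.0
    for p in positive_scores:
        for q in negative_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Toy change scores, grouped by each patient's global transition rating.
improved = change_scores(pre=[20, 15, 18, 12], post=[10, 12, 17, 12])
not_improved = change_scores(pre=[14, 9, 16], post=[13, 10, 12])
print(roc_auc(improved, not_improved))  # 0.625
```

An AUC of 0.5 means the change score discriminates no better than chance; 1.0 means perfect separation.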


Journal Article
TL;DR: To determine the relationships between two tests of stepping ability and standard tests of standing balance, gait, mobility, and functional impairment in a group of at‐risk older adults.
Abstract: OBJECTIVES: To determine the relationships between two tests of stepping ability (the maximal step length (MSL) and rapid step test (RST)) and standard tests of standing balance, gait, mobility, and functional impairment in a group of at-risk older adults. DESIGN: Cross-sectional study. SETTING: University-based laboratory. PARTICIPANTS: One hundred sixty-seven mildly balance-impaired older adults recruited for a balance-training and fall-reduction program (mean age 78, range 65–90). MEASUREMENTS: Measures of stepping maximally (MSL, the ability to maximally step out and return to the initial position) and rapidly (RST, the time taken to step out and return in multiple directions as fast as possible); standard measures of balance, gait, and mobility including timed tandem stance (TS), tandem walk (TW, both timing and errors), timed unipedal stance (US), timed up and go (TUG), performance oriented mobility assessment (POMA), and 6-minute walk (SMW); measures of leg strength (peak knee and ankle torque and power at slow and fast speeds); self-report measures of frequent falls (≥2 per 12 months), disability (Established Population for Epidemiologic Studies of the Elderly (EPESE) physical function), and confidence to avoid falls (Activity-specific Balance Confidence (ABC) Scale). Spearman and Pearson correlation, intraclass correlation coefficient, logistic regression, and linear regression were used for data analysis. RESULTS: MSL consistently predicted a number of self-report and performance measures at least as well as other standard balance measures. MSL correlations with EPESE physical function, ABC, TUG, and POMA scores; SMW; and peak maximum knee and ankle torque and power were at least as high as those correlations seen with TS, TW, or US. MSL score was associated with the risk of being a frequent faller.
In addition, the six MSL directions were highly correlated (up to 0.96), and any one of the leg directions yielded similar relationships with functional measures and a history of falls. Relationships between RST and these measures were relatively modest. CONCLUSION: MSL is as good a predictor of mobility performance, frequent falls, self-reported function, and balance confidence as standard stance tests such as US. MSL simplified to one direction may be a useful clinical indicator of mobility, balance, and fall risk in older adults. J Am Geriatr Soc 52:1168–1173, 2004.

254 citations


Journal Article
TL;DR: It is concluded that TSK-SV is reliable for use on patients suffering from chronic low back pain and the validity of the instrument needs to be further tested on larger populations.
Abstract: The aim of the current study was to evaluate the psychometric properties of the Swedish language version of the Tampa Scale for Kinesiophobia (TSK-SV) questionnaire. All in all, 102 patients suffering from chronic low back pain (CLBP) and 60 subjects who took part in aerobics were included in the study. The test of reliability included stability over time and internal consistency. The intraclass correlation coefficient for the total sum of the TSK-SV was 0.91. The Pearson's product-moment correlation coefficient for the total sum of the instrument was r = 0.91. Internal consistency assessed with Cronbach's alpha was 0.81 (n = 75). The test of validity included face and content validity assessed by a panel of experts, while construct validity was measured by an exploratory factor analysis and the known groups’ method. The TSK-SV was considered to have face and content validity. The factor analysis indicated a five-factor solution, although the conclusion was formulated in favor of the use of the total scor...

Journal Article
TL;DR: The 20-item QIRC questionnaire, which quantifies the quality of life of people with refractive correction by spectacles, contact lenses, and refractive surgery in the prepresbyopic age group, was developed using Rasch analysis and shown to be valid and reliable.
Abstract: Background. The purpose of the study was to develop a questionnaire that could quantify the quality of life (QOL) of people with refractive correction by spectacles, contact lenses, and refractive surgery in the prepresbyopic age group. Methods. The questionnaire was developed and validated using traditional methods and Rasch analysis. A 90-item pilot questionnaire was developed through extensive literature search and use of professional and lay focus groups. Pilot study data were obtained from 306 subjects for item reduction to produce the 20-item Quality of Life Impact of Refractive Correction (QIRC) questionnaire. Validity and reliability studies (test-retest reliability with intraclass correlation coefficient and Bland-Altman limits of agreement, and internal consistency with Rasch fit statistics, factor analysis, and Cronbach's α) were performed from data of an additional 312 subjects. Results. Rasch analysis demonstrated QIRC has good precision, reliability, and internal consistency (person separation, 2.03; reliability, 0.80; root-mean-square measurement error, 3.25; mean square ± SD infit, 0.99 ± 0.38; outfit, 1.00 ± 0.39; item infit range, 0.70 to 1.24; and item outfit range, 0.78 to 1.32). The items (mean score, 50.3 ± 7.3) were well targeted to the subjects (mean score, 47.8 ± 5.5) with a mean difference of 2.45 (scale range, 0 to 100) units. Test-retest reliability (intraclass correlation coefficient, 0.88; coefficient of repeatability, 6.85 units), factor loading range (0.40 to 0.76), and Cronbach's α (0.78) also indicated the reliability and validity of QIRC. Conclusions. The 20-item QIRC questionnaire, which quantifies the QOL of people with refractive correction by spectacles, contact lenses, and refractive surgery in the prepresbyopic age group, was developed using Rasch analysis and shown to be valid and reliable. The use of Rasch scaling allows scores to be treated as a valid continuous variable.
QIRC has broad applicability for cross-sectional and outcomes research. (Optom Vis Sci 2004;81:769-777)

Journal Article
TL;DR: The 2-, 6-, and 12-minute walk tests show acceptable inter- and intrarater reliability and high intertest correlations when they are used for the assessment of walking following stroke.
Abstract: This study assessed inter- and intrarater reliability and sensitivity to change of the 2-, 6-, and 12-minute walk tests following stroke. A convenience sample of patients enrolled in an inpatient stroke rehabilitation program participated in the standardization protocol. The 2-, 6-, and 12-minute walk tests were performed and inter- and intrarater reliability and responsiveness to change assessed. The interrater intraclass correlation coefficients (ICCs) for the 2-, 6-, and 12-minute walk tests were, respectively, 0.85, 0.78, and 0.68 (p < 0.0007 for each). The intrarater ICCs were 0.85, 0.74, and 0.71 (p < 0.0003 for each). Responsiveness to change as measured by standardized response mean (SRM) scores was, respectively, 1.34, 1.52, and 1.90 (F = 24.24, p < 0.001). Pearson correlations for the 2-, 6-, and 12-minute walk tests by the same rater on the same day were 2 versus 6 minutes, r = 0.997; 2 versus 12 minutes, r = 0.993; and 6 versus 12 minutes, r = 0.994 (p < 0.0001 for each). The 2-, 6-, and 12-minute walk tests show acceptable inter- and intrarater reliability and high intertest correlations when they are used for the assessment of walking following stroke. The SRM statistic indicates that the 12-minute walk test is the most responsive to change.
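The walk-test study summarizes responsiveness with the standardized response mean (SRM): the mean change score divided by the standard deviation of the change scores. The sketch below assumes paired admission and discharge distances in metres; the distances are hypothetical. For contrast, it also computes the conventional effect size, which divides by the baseline standard deviation instead. Function names are illustrative.

```python
import statistics


def standardized_response_mean(baseline, follow_up):
    """SRM = mean(change) / SD(change), with change = follow_up - baseline."""
    change = [b - a for a, b in zip(baseline, follow_up)]
    return statistics.mean(change) / statistics.stdev(change)


def effect_size(baseline, follow_up):
    """Effect size = mean(change) / SD(baseline)."""
    change = [b - a for a, b in zip(baseline, follow_up)]
    return statistics.mean(change) / statistics.stdev(baseline)


# Hypothetical 2-minute walk distances (m) at admission and discharge.
admission = [40, 55, 62, 70, 85]
discharge = [58, 70, 75, 90, 99]
print(round(standardized_response_mean(admission, discharge), 2))
print(round(effect_size(admission, discharge), 2))
```

The SRM grows when patients change by similar amounts. The effect size instead depends on how spread out patients were at baseline.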

Journal Article
01 Feb 2004-Stroke
TL;DR: Examination of patient-proxy agreement on the domains and summary scores of the EQ-5D and Health Utilities Index Mark 3 (HUI3) after stroke suggests that proxy assessments obtained 6 months after stroke are more reliable than those obtained within 2 to 3 weeks after stroke.
Abstract: Background and Purpose— Proxy informants can provide information on patients who are limited in ability to self-assess health-related quality of life (HRQL) after stroke. One alternative is to exclude assessments of such patients, which attenuates generalizability. The purpose of this study was to examine patient-proxy agreement on the domains and summary scores of the EQ-5D and Health Utilities Index Mark 3 (HUI3) after stroke. Methods— An observational longitudinal cohort of 124 patients hospitalized after ischemic stroke and their family caregivers completed the HRQL measures at baseline and were followed up for 6 months. Patient and proxy agreement was assessed by use of weighted κ or the intraclass correlation coefficient (ICC). Results— At baseline, the more observable domains of HRQL demonstrated greater agreement than the more subjective components. Cross-sectional point estimates of agreement were generally acceptable (ICC >0.70) for the EQ-5D Index and HUI3 summary scores when assessed ≥1 month afte...
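Patient-proxy agreement on the ordinal domains was assessed with weighted κ, which penalizes a one-level disagreement less than a two-level one. The sketch below applies linear or quadratic disagreement weights to a three-level ordinal domain. The function name, weighting options and toy responses are assumptions for illustration, not the study's data or code.

```python
def weighted_kappa(rater_a, rater_b, categories, weights="linear"):
    """Weighted kappa = 1 - (weighted observed disagreement /
    weighted disagreement expected from the marginals)."""
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Joint proportions of (rater_a level, rater_b level).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        d = abs(i - j) / (k - 1)          # 0 on the diagonal, 1 at the extremes
        return d if weights == "linear" else d * d

    w_obs = sum(weight(i, j) * observed[i][j] for i in range(k) for j in range(k))
    w_exp = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    return 1 - w_obs / w_exp


# Toy three-level domain ratings (1 = best, 3 = worst).
patient = [1, 1, 2, 2, 3, 3, 2, 1]
proxy = [1, 2, 2, 3, 3, 3, 2, 1]
print(round(weighted_kappa(patient, proxy, [1, 2, 3]), 2))
```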

Journal Article
TL;DR: The results of regression analysis showed that age and sex only had a statistically significant effect on κ when the (sleep) stages are considered separately, and variations of IRR most probably reflect changes of the sleep electroencephalography (EEG) with age and gender.
Abstract: Interrater variability of sleep stage scorings is a well-known phenomenon. The SIESTA project offered the opportunity to analyse interrater reliability (IRR) between experienced scorers from eight European sleep laboratories within a large sample of patients with different (sleep) disorders: depression, generalized anxiety disorder with and without non-organic insomnia, Parkinson's disease, periodic limb movements in sleep and sleep apnoea. The results were based on 196 recordings from 98 patients (73 males: 52.3 +/- 12.1 years and 25 females: 49.5 +/- 11.9 years) for which two independent expert scorings from two different laboratories were available. Cohen's kappa was used to evaluate the IRR on the basis of epochs and intraclass correlation was used to analyse the agreement on quantitative sleep parameters. The overall level of agreement when five different stages were distinguished was kappa = 0.6816 (76.8%), which in terms of kappa reflects a 'substantial' agreement (Landis and Koch, 1977). For different groups of patients kappa values varied from 0.6138 (Parkinson's disease) to 0.8176 (generalized anxiety disorder). With regard to (sleep) stages, the IRR was highest for rapid eye movement (REM), followed by Wake, slow-wave sleep (SWS), non-rapid eye movement 2 (NREM2) and NREM1. The results of regression analysis showed that age and sex only had a statistically significant effect on kappa when the (sleep) stages are considered separately. For NREM2 and SWS a statistically significant decrease of IRR with age has been observed and the IRR for SWS was lower for males than for females. These variations of IRR most probably reflect changes of the sleep electroencephalography (EEG) with age and gender.
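The SIESTA analysis scores epoch-by-epoch agreement with Cohen's kappa and describes the result using the Landis and Koch (1977) bands. The sketch below covers both steps. The stage labels follow the five stages named in the abstract. The function names and the short toy hypnograms from two hypothetical scorers are assumptions.

```python
from collections import Counter


def cohens_kappa(scorer_a, scorer_b):
    """Unweighted Cohen's kappa for two scorers labelling the same epochs."""
    n = len(scorer_a)
    p_observed = sum(a == b for a, b in zip(scorer_a, scorer_b)) / n
    counts_a, counts_b = Counter(scorer_a), Counter(scorer_b)
    # Chance agreement from each scorer's marginal stage frequencies.
    p_expected = sum(counts_a[s] * counts_b[s] for s in counts_a) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)


def landis_koch_label(kappa):
    """Descriptive bands from Landis and Koch (1977)."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


# Toy hypnogram fragment scored by two hypothetical scorers.
a = ["Wake", "NREM1", "NREM2", "NREM2", "SWS", "SWS", "REM", "REM", "NREM2", "Wake"]
b = ["Wake", "NREM2", "NREM2", "NREM2", "SWS", "NREM2", "REM", "REM", "NREM2", "NREM1"]
k = cohens_kappa(a, b)
print(round(k, 3), landis_koch_label(k))
```

Kappa is undefined when both scorers assign every epoch to the same single stage, because the chance agreement is then 1.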

Journal Article
TL;DR: Kirby et al. evaluated the measurement properties of the Wheelchair Skills Test (WST), version 2.4; test-retest, intrarater, and interrater reliability were determined on a subset of 20 wheelchair users.

Journal Article
TL;DR: The findings support the use of records of at least 5 minutes in length in epidemiological studies, in accordance with previous guidelines, and researchers using 10-second records should consider taking the mean of several recordings, when possible, or using statistical methods to correct for measurement error.
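The benefit of averaging several short recordings can be quantified with the Spearman-Brown prophecy formula. It predicts the reliability of the mean of k parallel measurements from the single-measurement reliability ρ (an ICC): ρ_k = kρ / (1 + (k − 1)ρ). The sketch below assumes the repeated records are parallel measurements with uncorrelated errors. The function names and the 0.5 starting reliability are hypothetical, not values from the paper.

```python
def spearman_brown(reliability, k):
    """Predicted reliability of the mean of k parallel measurements."""
    return k * reliability / (1 + (k - 1) * reliability)


def records_needed(reliability, target):
    """Smallest number of averaged records whose predicted reliability
    reaches `target` (both values strictly between 0 and 1)."""
    if not (0 < reliability < 1 and 0 < target < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    k = 1
    # Small tolerance so floating-point rounding does not add an extra record.
    while spearman_brown(reliability, k) < target - 1e-12:
        k += 1
    return k


# Hypothetical single-record reliability of 0.5.
for k in (1, 2, 4, 6):
    print(k, round(spearman_brown(0.5, k), 3))
print(records_needed(0.5, 0.8))  # 4
```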

01 Jan 2004
TL;DR: The findings suggest that these simple, clinical measures of flexibility and ROM are reliable and support their use as pre-participation screening tools for sports participants.
Abstract: Objectives: Pre-season or pre-participation screening is commonly used to identify intrinsic risk factors for sports injury. Tests chosen are generally based on clinical experience due to the paucity of quality injury risk factor studies for sport and, often, the reliability of these clinical tests has not been established. The purpose of this study was to establish the reliability of eight musculoskeletal screening tests commonly used in the screening protocols of elite-level Australian football clubs. Methods: Fifteen participants (n = 9 female, n = 6 male) were tested by two raters on two occasions, 1 week apart, to establish the interrater and test-retest reliability of the chosen measurement tools. The tests of interest were Sit and Reach, Active Knee Extension, Passive Straight Leg Raise, slump, active hip internal rotation range of movement (ROM), active hip external rotation ROM, lumbar spine extension ROM and the Modified Thomas Test. Results: All tests demonstrated very good to excellent inter-rater reliability (intraclass correlation coefficient, ICC 0.88–0.97). Test-retest reliability was also shown to be good for these tests (ICC 0.63–0.99). Conclusion: The findings suggest that these simple, clinical measures of flexibility and ROM are reliable and support their use as pre-participation screening tools for sports participants. © 2004 Elsevier Ltd. All rights reserved.

Journal Article
TL;DR: In contrast to other reliability estimates, test-retest reliability (or reproducibility) captures not only the measurement error of an assessment instrument but also the stability of the construct measured; of the intraclass correlation coefficients (ICCs), only those with an absolute-agreement definition of concordance capture the degree of identity.
Abstract: In contrast to other reliability estimates, test-retest reliability (or reproducibility) captures not only the measurement error of an assessment instrument, but also the stability of the construct measured. Consequently, one would expect any departure from identity (Y = X) of measurement pairs (X first, and Y second measurement) to be treated as 'error' by the respective reproducibility statistic, even if 'true' changes happened, e.g. worsening of a disease due to its natural course. The Pearson correlation, still often advocated for continuous measures in test-retest reliability studies, however, captures the degree of linearity (Y = bX + a): a perfect relationship can be computed even if the measurement pairs differ not only by an additive constant 'a', but also because the X-values are multiplied by the slope 'b'. Therefore, intraclass correlation coefficients (ICCs) have been proposed as alternative statistics for reproducibility. However, only ICCs with an absolute agreement definition of concordance capture the degree of identity. ICCs with a consistency definition of concordance measure the degree of additivity (Y = X + a). ICCs are calculated from repeated measures analyses of variance (ANOVAs), and a common population variance must be assumed for the different measurements. Given this assumption, an ICC computed from a one-way ANOVA seems to be the best choice for this purpose. Otherwise, Lin's concordance correlation coefficient is recommended as an identity measure.
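These distinctions can be checked numerically on pairs constructed so that Y = X + a or Y = bX:

* Pearson's r captures linearity.
* Consistency ICCs capture additivity.
* Absolute-agreement ICCs and Lin's concordance correlation coefficient capture identity.

The sketch below handles two measurements per subject. It uses the usual ANOVA mean-square formulas for the one-way ICC(1), the two-way consistency ICC(C,1) and the two-way absolute-agreement ICC(A,1). Lin's coefficient uses population moments. Function names and data are hypothetical.

```python
import math


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def iccs(x, y):
    """Return (ICC(1), ICC(C,1), ICC(A,1)) for paired measurements x, y."""
    n, k = len(x), 2
    grand = (sum(x) + sum(y)) / (n * k)
    subj = [(a + b) / 2 for a, b in zip(x, y)]
    occ = [sum(x) / n, sum(y) / n]
    ss_total = sum((v - grand) ** 2 for v in x + y)
    ss_subj = k * sum((m - grand) ** 2 for m in subj)
    ss_occ = n * sum((m - grand) ** 2 for m in occ)
    msr = ss_subj / (n - 1)                          # between subjects
    msw = (ss_total - ss_subj) / (n * (k - 1))       # within subjects (one-way)
    msc = ss_occ / (k - 1)                           # between occasions
    mse = (ss_total - ss_subj - ss_occ) / ((n - 1) * (k - 1))  # residual
    icc_1 = (msr - msw) / (msr + (k - 1) * msw)
    icc_c = (msr - mse) / (msr + (k - 1) * mse)
    icc_a = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    return icc_1, icc_c, icc_a


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient (population moments)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * sxy / (vx + vy + (mx - my) ** 2)


x = [1, 2, 3, 4, 5]
for label, y in (("Y = X + 2", [v + 2 for v in x]), ("Y = 2X", [2 * v for v in x])):
    print(label, round(pearson(x, y), 3), [round(v, 3) for v in iccs(x, y)],
          round(lin_ccc(x, y), 3))
```

For Y = X + 2, Pearson's r and ICC(C,1) both equal 1, while ICC(1), ICC(A,1) and Lin's coefficient fall to about 0.43, 0.56 and 0.5. For Y = 2X, only Pearson's r stays at 1; ICC(C,1) drops to 0.8.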

Journal Article
TL;DR: The reliability and validity of the Swedish version of the Oral Health Impact Profile (OHIP‐S) is excellent and can be recommended for assessing the impact of oral health on masticatory ability and psychosocial function.
Abstract: The aim of this study was to translate the Oral Health Impact Profile (OHIP) into Swedish and evaluate the reliability and validity of the Swedish version (OHIP-S). The OHIP is a 49-item, self-administered questionnaire divided into 7 different subscales. The original version in English was translated into Swedish, accompanied by back-translation into English, after which the Swedish version was revised. A total of 145 consecutive patients participated and answered a questionnaire. The patients comprised five clinically separate groups: temporomandibular dysfunction (TMD) (n = 30), Primary Sjogren's Syndrome (SS) (n = 30), burning sensation and pain in the oral mucosa (oral mucosal pain, OMP) (n = 28), skeletal malocclusion (malocclusion) (n = 27), and healthy dental recall patients (controls) (n = 30). The TMD group and the control group participated in a test-retest procedure. The internal reliability of each subscale was calculated with Cronbach's alpha and found to be high and to range from 0.83-0.91. The stability (test-retest) of the instrument, calculated using the intraclass correlation coefficient, ranged from 0.87 to 0.98. The construct validity of OHIP-S was compared with subscales of the Symptom Check List (SCL-90) (rho 0.65) and the Jaw Function Limitation Scale (FLS) (rho 0.76) and analyzed with Spearman's correlation coefficient. Convergent validity was evaluated by comparing OHIP with self-reported health using Spearman's correlation coefficient and was found to be acceptable (rho 0.61). In the evaluation of the discriminative ability of the instrument, significant differences were found in the total OHIP-S score between the controls and the other four groups (P < 0.001). We conclude that the reliability and validity of OHIP-S is excellent. The instrument can be recommended for assessing the impact of oral health on masticatory ability and psychosocial function.

Journal Article
TL;DR: Self-reported and measured weight and height information had good agreement and validity and in similar populations, when few resources are available, it is possible to use self-reported data instead of actual measurements.
Abstract: OBJECTIVE: Evaluate the validity of self-reported weight and height and the body mass index (BMI). METHODS: A study was made of 3,713 employees of a public university in Rio de Janeiro, in which they were participants in Phase 1 of a longitudinal study. Information was obtained through a self-administered questionnaire, and measurements were carried out after its application. Student's paired t-test, Bland & Altman's graphs and the intraclass correlation coefficient (ICC) were utilized to evaluate the differences between the measured and the reported parameters. The sensitivity and specificity of the various BMI categories were estimated. RESULTS: There was high agreement between the measured and reported weights (ICC=0.977) and heights (ICC=0.943). The BMI sensitivity, in its various categories, was around 80%, and the specificity was close to 92%. There was a slight and uniform tendency toward self-reported weight underestimation and self-reported height overestimation in both sexes. CONCLUSIONS: Self-reported and measured weight and height information had good agreement and validity. In similar populations, when few resources are available, it is possible to use self-reported data instead of actual measurements.
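Alongside the ICC, the weight-and-height study used Bland and Altman's method. It summarizes agreement between two methods by the mean difference (bias) and the 95% limits of agreement, bias ± 1.96 SD of the differences. The sketch below assumes paired measured and self-reported values for the same people. The function name and toy weights are illustrative, not data from the study.

```python
import statistics


def bland_altman(measured, reported):
    """Return (bias, lower limit, upper limit) for reported - measured."""
    diffs = [r - m for m, r in zip(measured, reported)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Toy weights (kg): a negative bias means self-reports underestimate on average.
measured = [62.0, 75.5, 81.2, 58.4, 90.3, 70.1]
reported = [61.0, 74.0, 80.0, 58.0, 88.0, 70.0]
bias, lower, upper = bland_altman(measured, reported)
print(f"bias {bias:.2f} kg, 95% limits of agreement {lower:.2f} to {upper:.2f} kg")
```

The ICC summarizes agreement relative to the spread between people. The limits of agreement instead express, in the original units, how large individual discrepancies are likely to be.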

Journal Article
TL;DR: The Veterans Affairs Low-Vision Visual Functioning Questionnaire is valid and reliable and has the range and precision necessary to measure visual ability of low-vision patients with moderate to severe vision loss across diverse clinical settings.
Abstract: PURPOSE. To describe the psychometric properties of a self-report questionnaire, the Veterans Affairs (VA) Low-Vision Visual Functioning Questionnaire (LV VFQ-48), which was designed to measure the difficulty visually impaired persons have performing daily activities and to evaluate low-vision outcomes. METHODS. The VA LV VFQ-48 was administered by telephone interview to subjects with visual acuity ranging from near normal to total blindness at five sites in the VA and private sector. Rasch analysis with the Andrich rating scale model was applied to difficulty ratings from 367 subjects to evaluate the measurement properties of the instrument. RESULTS. High intercenter correlations for item measure estimates (intraclass correlation coefficient [ICC] 0.97) justified pooling the data from these sites. The person measure fit statistics (mean square residuals) confirm that the data fit the assumptions of the model. The item measure fit statistics indicate that responses to 19% of the items were confounded by factors other than visual ability. The separation reliabilities for pooled data (0.94 for persons and 0.98 for items) demonstrate that the estimated measures discriminate persons and items well along the visual ability dimension. ICCs for test-retest data (0.98 for items and 0.84 for persons) confirm temporal stability. Subjects used the rating categories in the same way at all five centers. Ratings of slight and moderate difficulty were used interchangeably, suggesting that the instrument could be modified to a 4-point scale: not difficult, slightly/moderately difficult, extremely difficult, and impossible. Fifty additional subjects were administered the questionnaire with a 4-point scale to confirm that the scale was used in the same way when there were four rather than five difficulty ratings. CONCLUSIONS. The VA LV VFQ-48 is valid and reliable and has the range and precision necessary to measure the visual ability of low-vision patients with moderate to severe vision loss across diverse clinical settings. (Invest Ophthalmol Vis Sci. 2004;45:3919-3928) DOI:10.1167/iovs.04-0208
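Separation reliability, reported above as 0.94 for persons and 0.98 for items, is conventionally computed in Rasch analysis as the share of the observed variance of the measures that is not measurement error: R = (observed variance - error variance) / observed variance, where the error variance is the mean of the squared standard errors. The related separation index is G = sqrt(R / (1 - R)), the ratio of the "true" SD to the root mean square error. The sketch below assumes the measures and their standard errors have already been estimated by Rasch software; the function name and inputs are hypothetical, and it uses the population variance, which specific programs may define slightly differently.

```python
from math import sqrt
from statistics import pvariance

def separation_reliability(measures, std_errors):
    """Rasch-style separation reliability and separation index.

    measures:   estimated person (or item) measures, in logits
    std_errors: the standard error of each estimate
    """
    observed_var = pvariance(measures)                               # observed variance of the measures
    error_var = sum(se ** 2 for se in std_errors) / len(std_errors)  # mean squared standard error
    if observed_var == 0 or error_var == 0:
        raise ValueError("need spread in the measures and non-zero standard errors")
    true_var = max(observed_var - error_var, 0.0)                    # "true" variance cannot be negative
    reliability = true_var / observed_var
    separation = sqrt(true_var / error_var)                          # G = true SD / RMSE
    return reliability, separation

r, g = separation_reliability([-2, -1, 0, 1, 2], [0.5] * 5)
print(r, round(g, 2))  # 0.875 2.65
```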

Journal Article
TL;DR: In this article, the authors present a formula for weighted kappa in terms of rater means, rater variances, and the rater covariance that is particularly helpful in emphasizing the sensitivity of weighted kappa to differences in the raters' marginal distributions.
Abstract: This article presents a formula for weighted kappa in terms of the rater means, rater variances, and rater covariance that is particularly helpful in emphasizing that weighted kappa is an absolute agreement measure, in the sense that it is sensitive to differences in the raters’ marginal distributions. Specifically, differences between rater means will decrease the value of weighted kappa relative to the value of the intraclass correlation that ignores mean differences. If the rater variances also differ, the value of weighted kappa will be decreased relative to the value of the product-moment correlation. Equality constraints on the rater means and variances are given to illustrate the relationships between weighted kappa, the intraclass correlation, and the product-moment correlation. Moreover, the expression for weighted kappa shows that weighted kappa belongs to the Zegers-ten Berge family of chance-corrected association coefficients. More specifically, weighted kappa is equivalent to the chance-co...
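Assuming quadratic disagreement weights (the case in which such a moment formula holds), and using population (divide-by-n) moments, weighted kappa can be written as kappa_w = 2·s_xy / (s_x² + s_y² + (mean_x - mean_y)²). Dropping the mean-difference term gives the intraclass correlation 2·s_xy / (s_x² + s_y²), and replacing that denominator with 2·s_x·s_y gives the product-moment correlation. The sketch below is an illustration, not code from the article: it computes quadratic weighted kappa from its chance-corrected definition and checks it against the moment form. Function names and ratings are hypothetical.

```python
def _moments(x, y):
    """Means, population variances, and population covariance of two rating lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx2 = sum((a - mx) ** 2 for a in x) / n
    sy2 = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return mx, my, sx2, sy2, sxy

def quadratic_weighted_kappa(x, y):
    """1 - observed / chance-expected mean squared disagreement."""
    n = len(x)
    observed = sum((a - b) ** 2 for a, b in zip(x, y)) / n
    # chance expectation: pair every rating of rater X with every rating of rater Y,
    # i.e. independent raters with the observed marginal distributions
    expected = sum((a - b) ** 2 for a in x for b in y) / (n * n)
    return 1 - observed / expected

def kappa_icc_pearson(x, y):
    """Moment forms of weighted kappa, the ICC ignoring mean differences, and Pearson r."""
    mx, my, sx2, sy2, sxy = _moments(x, y)
    kappa = 2 * sxy / (sx2 + sy2 + (mx - my) ** 2)
    icc = 2 * sxy / (sx2 + sy2)
    r = sxy / (sx2 * sy2) ** 0.5
    return kappa, icc, r

x, y = [1, 2, 3, 4, 5], [2, 3, 4, 5, 6]          # rater Y is always one point higher
print(round(quadratic_weighted_kappa(x, y), 6))  # 0.8
print(tuple(round(v, 6) for v in kappa_icc_pearson(x, y)))  # (0.8, 1.0, 1.0)
```

The constant one-point offset leaves the intraclass correlation and Pearson r at 1 but lowers weighted kappa to 0.8, which is the absolute-agreement behavior the abstract describes.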

Journal Article
TL;DR: The poor repeatability of postures documented using the studied method brings into question the validity of this postural analysis approach for either diagnostic use or tracking changes in response to treatment.

Journal Article
TL;DR: Simulation results indicate that confidence intervals based on the estimator proposed by Fleiss and Cuzick provide coverage levels close to nominal over a wide range of parameter combinations.
Abstract: We obtain closed-form asymptotic variance formulae for three point estimators of the intraclass correlation coefficient that may be applied to binary outcome data arising in clusters of variable size. Our results include as special cases those that have previously appeared in the literature (Fleiss and Cuzick, 1979, Applied Psychological Measurement 3, 537-542; Bloch and Kraemer, 1989, Biometrics 45, 269-287; Altaye, Donner, and Klar, 2001, Biometrics 57, 584-588). Simulation results indicate that confidence intervals based on the estimator proposed by Fleiss and Cuzick provide coverage levels close to nominal over a wide range of parameter combinations. Two examples are presented.
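The Fleiss and Cuzick estimator cited above can be computed from the number of successes y_i and the size n_i of each of k clusters: rho = 1 - [Σ y_i(n_i - y_i)/n_i] / [(N - k)·p(1 - p)], where N = Σ n_i and p = Σ y_i / N is the pooled proportion. The sketch below implements only this point estimate; the asymptotic variance formulae derived in the article are not reproduced, and the function name and counts are hypothetical.

```python
def fleiss_cuzick_icc(successes, sizes):
    """Fleiss-Cuzick estimate of the ICC for binary outcomes in clusters of variable size."""
    if len(successes) != len(sizes):
        raise ValueError("one success count per cluster is required")
    k = len(sizes)
    total = sum(sizes)
    p = sum(successes) / total                  # pooled proportion of successes
    if total == k or p in (0.0, 1.0):
        raise ValueError("undefined: need some clusters with n_i > 1 and 0 < p < 1")
    # within-cluster binomial variation, summed over clusters
    within = sum(y * (n - y) / n for y, n in zip(successes, sizes))
    return 1 - within / ((total - k) * p * (1 - p))

print(fleiss_cuzick_icc([0, 3, 0, 4], [3, 3, 2, 4]))        # 1.0
print(round(fleiss_cuzick_icc([1, 2], [2, 2]), 4))          # -0.3333
```

Clusters that are all-success or all-failure contribute no within-cluster variation, so the first example reaches the maximum of 1; the estimate can also fall below zero, as in the second example.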

Journal Article
TL;DR: Taylor et al. evaluated the test-retest reliability of measuring lower-limb strength with a hand-held dynamometer in young people with cerebral palsy (CP), tested on two occasions separated by 6 weeks.

Journal Article
TL;DR: These findings suggest that both epidemiology-trained and non-epidemiology-trained reviewers can apply the levels-of-evidence guide to published studies with acceptable interobserver agreement.
Abstract: Background: Since January 2003, all clinical scientific articles published in the American volume of The Journal of Bone and Joint Surgery (JBJS-A) have included a level-of-evidence rating. The aim of the current study was to evaluate the interobserver agreement among reviewers, with varying levels of epidemiology training, in categorizing the levels of evidence of these clinical studies. Methods: Fifty-one consecutive clinical papers published in the American volume of JBJS were identified by a computerized search of the table of contents from January 2003 through June 2003. Each paper was blinded so that only the title, abstract (without the level of evidence designated), and methods section were provided to the reviewers. The papers were coded and randomly organized in a binder. Six surgeons graded each blinded paper for (1) the type of study (therapeutic, prognostic, diagnostic test, or economic or decision analysis), (2) the level of evidence (on a scale of I through V), and (3) the subcategory within the particular level of evidence. Three surgeons were members of the JBJS American Editorial Board, two surgeons were reviewers for JBJS-A, and one surgeon was an active researcher not formally associated with JBJS-A. The reviewers did not receive any formal training in the application of the classification system, but each was provided with a detailed description of the classification system used by JBJS-A. Intraclass correlation coefficients with 95% confidence intervals were determined for the reviewers' agreement regarding the type of study, level of evidence, and subcategory within the level of evidence. Results: The majority (69%) of the fifty-one included articles were studies of therapy, and 57% of the studies constituted Level-IV evidence. The intraclass correlation coefficients for the agreement among all reviewers with regard to the study type, level of evidence, and subcategory within the level of evidence ranged from 0.61 to 0.75.
Reviewers trained in epidemiology demonstrated greater agreement across all aspects of the classification system (range in intraclass correlation coefficients, 0.99 to 1.0) than did reviewers who were not trained in epidemiology (range in intraclass correlation coefficients, 0.60 to 0.75). Conclusions: These findings suggest that both epidemiology-trained and non-epidemiology-trained reviewers can apply the levels-of-evidence guide to published studies with acceptable interobserver agreement. The validity of this system remains a question for future research.

Journal Article
TL;DR: Initial evaluation suggests that the P4 is more reliable than popular single-item NPRSs and more adept at assessing change in pain intensity.
Abstract: Study Design: Prospective observational study. Objectives: To compare the test-retest reliability and longitudinal validity (sensitivity to change) of 2 single-item numeric pain rating scales (NPRSs) with a 4-item pain intensity measure (P4). Background: Pain is a frequent outcome measure for patients seen in physical therapy; however, the error associated with efficient pain measures, such as the single-item NPRS, is greater than that for self-report measures of functional status. Initial evaluation of the P4 suggests that it is more reliable and sensitive to change than the NPRS. Methods and Measures: Two single-item NPRSs and the P4 were administered on 3 occasions (initial visit, n = 220; within 72 hours of baseline, n = 213; and 12 days after baseline assessment, n = 183) to patients with musculoskeletal problems receiving physical therapy. Reliability was assessed using a type 2,1 intraclass correlation coefficient. Longitudinal validity was assessed by correlating the measures' change scores with a retro...
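The type 2,1 intraclass correlation named above is, in Shrout and Fleiss's notation, the two-way random-effects, single-measure, absolute-agreement form. It can be obtained from a two-way ANOVA on an n-subjects × k-occasions table: ICC(2,1) = (MS_R - MS_E) / (MS_R + (k - 1)·MS_E + k·(MS_C - MS_E)/n), where MS_R, MS_C, and MS_E are the subject, occasion, and residual mean squares. A minimal pure-Python sketch follows; the function name and the example tables are illustrative, not data from the study.

```python
def icc_2_1(table):
    """ICC(2,1): one row per subject, one column per occasion (or rater)."""
    n, k = len(table), len(table[0])
    if n < 2 or k < 2 or any(len(row) != k for row in table):
        raise ValueError("need a rectangular table with at least 2 rows and 2 columns")
    grand = sum(map(sum, table)) / (n * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(row[j] for row in table) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between occasions
    ss_total = sum((v - grand) ** 2 for row in table for v in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual
    ms_r = ss_rows / (n - 1)
    ms_c = ss_cols / (k - 1)
    ms_e = ss_error / ((n - 1) * (k - 1))
    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)

scores = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
          [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
print(round(icc_2_1(scores), 2))  # 0.29
```

Because the occasion mean square enters the denominator, a constant shift between occasions lowers ICC(2,1) even when subjects keep their rank order; for example, `[[1, 2], [2, 3], [3, 4]]` gives 2/3 rather than 1.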

Journal Article
01 Aug 2004-Spine
TL;DR: The SRS-22 Patient Questionnaire has proven to be a valid instrument for clinical assessment of patients with idiopathic scoliosis and has been translated and culturally adapted to Spanish.
Abstract: STUDY DESIGN Validation of the transcultural adaptation of a questionnaire for measuring health-related quality of life. OBJECTIVES To translate and culturally adapt the SRS-22 questionnaire to Spanish, and to determine the metric qualities (internal consistency and test-retest reproducibility) of this questionnaire. SUMMARY OF BACKGROUND DATA The SRS-22 Patient Questionnaire has proven to be a valid instrument for clinical assessment of patients with idiopathic scoliosis. The widespread use of the SRS-22 in non-English-speaking countries requires its transcultural adaptation. METHODS Transcultural adaptation of the SRS-22 was carried out according to the International Quality of Life Assessment Project guidelines and included two translations and two back-translations of the material. A committee of experts decided on the final version. The questionnaire was administered to 175 individuals (152 women and 23 men) with idiopathic scoliosis. At the time they received the questionnaire, the mean age of the participants was 18.9 years, the mean thoracic curve magnitude was 28.8 degrees, and the mean lumbar curve magnitude was 28.1 degrees. At that time, 85 patients had been treated surgically, 45 had been treated with an orthosis, and 45 were under observation. A subgroup of 30 patients completed the questionnaire a second time 1 week later. Internal consistency was determined with Cronbach's alpha coefficient and test-retest reliability with the intraclass correlation coefficient. RESULTS The overall alpha coefficient of the questionnaire was 0.89. Coefficients for individual domains were as follows: function/activity, 0.67; pain, 0.81; mental health, 0.83; self-image, 0.73; and satisfaction, 0.78. The questionnaire as a whole had an intraclass correlation coefficient of 0.96. Intraclass correlation coefficients for individual domains were as follows: pain, 0.93; function, 0.82; self-image, 0.94; mental health, 0.94; and satisfaction, 0.98.
CONCLUSIONS The Spanish version of the SRS-22 Patient Questionnaire demonstrated adequate internal consistency for the majority of domains and excellent reproducibility. These results suggest that the process of adaptation has produced an instrument that is apparently equivalent to the original and suitable for clinical research.

Journal Article
TL;DR: The WHOQOL-BREF has adequate psychometric properties in people with rheumatoid arthritis and should be considered a valid outcome measure for interventions that aim to improve quality of life for people with this disease.
Abstract: OBJECTIVE: To assess the psychometric properties, including responsiveness, of the World Health Organization Quality of Life instrument, short form (WHOQOL-BREF) in people with rheumatoid arthritis. METHODS: A sample of 142 persons with rheumatoid arthritis was randomly selected from a regional disease register and completed questionnaires by postal survey. An additional sample of 72 consecutive inpatients completed questionnaires a few days before admission, on the day of admission, on the day of discharge, and 2 weeks after discharge. RESULTS: Test-retest reliability was adequate (intraclass correlation coefficient 0.71-0.91). Internal consistency was adequate except for the social relationships domain (Cronbach's alpha 0.64-0.87). The factor structure was fairly similar to that previously reported. Correlations with other measures of quality of life supported concurrent validity. Indices of responsiveness were satisfactory except for the social relationships and environment domains, although there was no statistically significant difference in the area under a receiver operating characteristic curve between the WHOQOL-BREF domains and the Health Assessment Questionnaire. CONCLUSION: The WHOQOL-BREF has adequate psychometric properties in people with rheumatoid arthritis and should be considered a valid outcome measure for interventions that aim to improve quality of life for people with this disease.
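The responsiveness comparison above relies on the area under a receiver operating characteristic (ROC) curve. When the curve is built from change scores, its area equals the probability that a randomly chosen patient classified as improved has a larger change score than a randomly chosen patient classified as not improved, with ties counting one half; this is the Mann-Whitney U statistic divided by the number of pairs. The abstract does not say how patients were classified, so the sketch below assumes two already-defined groups and that a larger change score means more improvement; the function name and scores are hypothetical.

```python
def roc_auc(changes_improved, changes_not_improved):
    """Area under the ROC curve via the Mann-Whitney pairwise formulation."""
    if not changes_improved or not changes_not_improved:
        raise ValueError("both groups must be non-empty")
    score = 0.0
    for a in changes_improved:
        for b in changes_not_improved:
            if a > b:
                score += 1.0   # pair ordered as expected
            elif a == b:
                score += 0.5   # tie counts as half
    return score / (len(changes_improved) * len(changes_not_improved))

print(roc_auc([2, 3], [1, 3]))  # 0.625
```

An area of 0.5 means the change score does not separate the groups any better than chance, and 1.0 means perfect separation.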