
Showing papers on "Intraclass correlation" published in 2009



Book ChapterDOI
01 Jan 2009

1,054 citations


Journal ArticleDOI
TL;DR: The psychometric properties of the ASES, DASH, and SPADI have been shown to be acceptable for clinical use, but some properties of the SST still need to be evaluated, particularly the absolute errors of measurement.
Abstract: Objective To conduct a systematic review of the quality and content of the psychometric evidence relating to 4 shoulder disability scales: the Disabilities of the Arm, Shoulder, and Hand (DASH) questionnaire, the Shoulder Pain and Disability Index (SPADI), the American Shoulder and Elbow Surgeons (ASES) score, and the Simple Shoulder Test (SST). Methods We conducted a structured search using 3 databases (Medline, CINAHL, EMBase). In total, 71 published primary studies were analyzed. A pair of raters conducted data extraction and critical appraisal using structured tools. A descriptive synthesis was performed. Results Quality ratings of 55% of the studies reviewed reached a level of ≥75%. Most studies suggest that all 4 questionnaires have excellent reliability (intraclass correlation coefficient ≥0.90). The 4 questionnaires are strongly correlated (r >0.70) with each other and with a number of similar indices, and the questionnaires were able to differentiate between different populations and disability levels. The minimal detectable change (MDC) is ∼9.4 for the ASES, 10.5 for the DASH, and 18 for the SPADI; the minimal clinically important difference (MCID) is ∼6.4 for the ASES and 10.2 for the DASH, and ranges between 8 and 13 for the SPADI. MDC and MCID have not been defined for the SST. Conclusion The psychometric properties of the ASES, DASH, and SPADI have been shown to be acceptable for clinical use. Conversely, some properties of the SST still need be evaluated, particularly the absolute errors of measurement. Overall, validation studies have focused on less clinically relevant properties (construct validity or group reliability) than estimates of MDC and MCID.

618 citations


Journal ArticleDOI
TL;DR: Test-retest reliability values for the TUG, the 6MWT, and gait speed were high for all participants together and for the mild to moderate AD and moderately severe to severe AD groups separately; however, individual variability of performance also was high.
Abstract: Background: With the increasing incidence of Alzheimer disease (AD), determining the validity and reliability of outcome measures for people with this disease is necessary. Objective: The goals of this study were to assess test-retest reliability of data for the Timed “Up & Go” Test (TUG), the Six-Minute Walk Test (6MWT), and gait speed and to calculate minimal detectable change (MDC) scores for each outcome measure. Performance differences between groups with mild to moderate AD and moderately severe to severe AD (as determined by the Functional Assessment Staging [FAST] scale) were studied. Design: This was a prospective, nonexperimental, descriptive methodological study. Methods: Background data collected for 51 people with AD included: use of an assistive device, Mini-Mental Status Examination scores, and FAST scale scores. Each participant engaged in 2 test sessions, separated by a 30- to 60-minute rest period, which included 2 TUG trials, 1 6MWT trial, and 2 gait speed trials using a computerized gait assessment system. A specific cuing protocol was followed to achieve optimal performance during test sessions. Results: Test-retest reliability values for the TUG, the 6MWT, and gait speed were high for all participants together and for the mild to moderate AD and moderately severe to severe AD groups separately (intraclass correlation coefficients ≥.973); however, individual variability of performance also was high. Calculated MDC scores at the 90% confidence interval were: TUG=4.09 seconds, 6MWT=33.5 m (110 ft), and gait speed=9.4 cm/s. The 2 groups were significantly different in performance of clinical tests, with the participants who were more cognitively impaired being more physically and functionally impaired. Limitations: A single researcher for data collection limited sample numbers and prohibited blinding to dementia level. 
Conclusions: The TUG, the 6MWT, and gait speed are reliable outcome measures for use with people with AD, recognizing that individual variability of performance is high. Minimal detectable change scores at the 90% confidence interval can be used to assess change in performance over time and the impact of treatment.
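The abstract above reports MDC scores at the 90% confidence level but does not state the formula used. A minimal sketch, assuming the commonly used relations SEM = SD × sqrt(1 − ICC) and MDC = z × sqrt(2) × SEM, is shown below. The function names and the example inputs are hypothetical and are not taken from the study.

```python
import math


def standard_error_of_measurement(sd: float, icc: float) -> float:
    """SEM = SD * sqrt(1 - ICC), with SD the between-subject standard deviation."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie in [0, 1]")
    if sd < 0.0:
        raise ValueError("sd must be non-negative")
    return sd * math.sqrt(1.0 - icc)


def minimal_detectable_change(sd: float, icc: float, z: float = 1.645) -> float:
    """MDC = z * sqrt(2) * SEM.

    z = 1.645 corresponds to 90% confidence and z = 1.96 to 95%.
    The sqrt(2) factor accounts for measurement error in both test sessions.
    """
    return z * math.sqrt(2.0) * standard_error_of_measurement(sd, icc)


# Hypothetical inputs for illustration only (not values from the study above).
example_mdc90 = minimal_detectable_change(sd=5.0, icc=0.97)
print(f"MDC90 = {example_mdc90:.2f}")
```

Under these assumptions a very high ICC can still yield a sizeable MDC when the between-subject SD is large, which is consistent with the abstract's note that individual variability remained high despite high reliability.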

384 citations


Journal ArticleDOI
TL;DR: It appears that 1RM-testing protocols that include one familiarisation session and one testing session are sufficient for assessing maximal strength in untrained middle-aged individuals.

263 citations


Journal ArticleDOI
TL;DR: Absolute PAEE and MVPA estimated from these self-reports were not valid on an individual level in young people, although some questionnaires appeared to rank individuals accurately.

245 citations


Journal ArticleDOI
TL;DR: All 4 clinical measures for assessing upper-extremity motor function in people with stroke showed sufficient validity, responsiveness, and reliability in participants with stroke, supporting their utility in clinical settings.
Abstract: Background: Functional limitation of the upper extremities is common in patients with stroke. An upper-extremity measure with sound psychometric properties is indispensable for clinical and research use. Objective: The purpose of this study was to compare the psychometric properties of 4 clinical measures for assessing upper-extremity motor function in people with stroke: the upper-extremity subscale of the Fugl-Meyer Motor Test (UE-FM), the upper-extremity subscale of the Stroke Rehabilitation Assessment of Movement, the Action Research Arm Test (ARAT), and the Wolf Motor Function Test. Design: This was a prospective, longitudinal study. Methods: Fifty-three people with stroke were evaluated with the 4 measures at 4 time points (14, 30, 90, and 180 days after stroke). Thirty-five participants completed all of the assessments. The ceiling and floor effects, validity (concurrent validity and predictive validity), and responsiveness of each measure were examined. Interrater reliability and test-retest reliability also were examined. Results: All measures, except for the UE-FM, had significant floor effects or ceiling effects at one or more time points. The Spearman ρ correlation coefficient for each pair of the 4 measures was ≥.81, indicating high concurrent validity. The predictive validity of the 4 measures was satisfactory (Spearman ρ, ≥.51). The responsiveness of the 4 measures at 14 to 180 days after stroke was moderate (.52 ≤ effect size ≤ .79). The 4 measures had good interrater reliability (intraclass correlation coefficient [ICC], ≥.92) and test-retest reliability (ICC, ≥.97). Only the minimal detectable changes of the UE-FM (8% of the highest possible score) and the ARAT (6%) were satisfactory. Limitations: The sample size was too small to conduct data analysis according to type or severity of stroke. In addition, the timed component of the Wolf Motor Function Test was not used in this study. 
Conclusions: All 4 measures showed sufficient validity, responsiveness, and reliability in participants with stroke. The UE-FM for assessing impairment and the ARAT for assessing disability had satisfactory minimal detectable changes, supporting their utility in clinical settings.

219 citations


Journal ArticleDOI
TL;DR: The IVRS-administered TSQM-9 was found to be a reliable and valid measure to assess treatment satisfaction in naturalistic study designs, in which there is potential that the administration of the side effects domain of the TSQM would interfere with routine clinical care.
Abstract: The 14-item Treatment Satisfaction Questionnaire for Medication (TSQM) Version 1.4 is a reliable and valid instrument to assess patients' satisfaction with medication, providing scores on four scales – side effects, effectiveness, convenience and global satisfaction. In naturalistic studies, administering the TSQM with the side effects domain could provoke the physician to assess the presence or absence of adverse events in a way that is clinically atypical, carrying the potential to interfere with routine medical care. As a result, an abbreviated 9-item TSQM (TSQM-9), derived from the TSQM Version 1.4 but without the five items of the side effects domain was created. In this study, an interactive voice response system (IVRS)-administered TSQM-9 was psychometrically evaluated among patients taking antihypertensive medication. A total of 3,387 subjects were invited to participate in the study from an online panel who self-reported taking a prescribed antihypertensive medication. The subjects were asked to complete the IVRS-administered TSQM-9 at the start of the study, along with the modified Morisky scale, and again within 7 to 14 days. Standard psychometric analyses were conducted; including Cronbach's alpha, intraclass correlation coefficients, structural equation modeling, Spearman correlation coefficients and analysis of covariance (ANCOVA). A total of 396 subjects completed all the study procedures. Approximately 50% subjects were male with a good racial/ethnic mix: 58.3% white, 18.9% black, 17.7% Hispanic and 5.1% either Asian or other. There was evidence of construct validity of the TSQM-9 based on the structural equation modeling findings of the observed data fitting the Decisional Balance Model of Treatment Satisfaction even without the side effects domain. TSQM-9 domains had high internal consistency as evident from Cronbach's alpha values of 0.84 and greater. 
TSQM-9 domains also demonstrated good test-retest reliability with high intraclass correlation coefficients exceeding 0.70. As expected, the TSQM-9 domains were able to differentiate between individuals who were low, medium and high compliers of medication, with moderate to high effect sizes. There was evidence of convergent validity with significant correlations with the medication adherence scale. The IVRS-administered TSQM-9 was found to be a reliable and valid measure to assess treatment satisfaction in naturalistic study designs, in which there is potential that the administration of the side effects domain of the TSQM would interfere with routine clinical care.

202 citations



Journal ArticleDOI
TL;DR: A brief ataxia rating scale (BARS) was developed for use by movement disorder specialists and general neurologists; it is valid, reliable, and sufficiently fast and accurate for clinical purposes.
Abstract: To develop a brief ataxia rating scale (BARS) for use by movement disorder specialists and general neurologists. Current ataxia rating scales are cumbersome and not designed for clinical practice. We first modified the International Cooperative Ataxia Rating Scale (ICARS) by adding seven ataxia tests (modified ICARS, or MICARS), and observed only minimally increased scores. We then used the statistics package R to find a five-test subset in MICARS that would correlate best with the total MICARS score. This was accomplished first without constraints and then with the clinical constraint requiring one test each of Gait, Kinetic Function-Arm, Kinetic Function-Leg, Speech, and Eye Movements. We validated these clinical constraints by factor analysis. We then validated the results in a second cohort of patients; evaluated inter-rater reliability in a third cohort; and used the same data set to compare BARS with the Scale for the Assessment and Rating of Ataxia (SARA). Correlation of ICARS with the seven additional tests that when added to ICARS form MICARS was 0.88. There were 31,481 five-test subtests (48% of possible combinations) that had a correlation with total MICARS score of ≥0.90. The strongest correlation of an unconstrained five-test subset was 0.963. The clinically constrained subtest validated by factor analysis, BARS, had a correlation with MICARS-minus-BARS of 0.952. Cronbach alpha for BARS and SARA was 0.90 and 0.92 respectively; and inter-rater reliability (intraclass correlation coefficient) was 0.91 and 0.93 respectively. BARS is valid, reliable, and sufficiently fast and accurate for clinical purposes. © 2009 Movement Disorder Society

195 citations


Book
21 Nov 2009
TL;DR: A book covering frequency distributions, the method of moments, Pearson's system of frequency-curves, comparison of various systems of curves, correlation, standard errors, and the test of goodness of fit, with appendices and a folding table of curves.
Abstract: 1. Introductory 2. Frequency distributions 3. Method of moments 4. Pearson's system of frequency-curves 5. Calculation 6. Comparison of various systems of curves 7. Correlation 8. Theoretical distributions. Spurious correlation 9. Correlation of characters not quantitatively measurable 10. Standard errors 11. The test of goodness of fit 12. The correlation ratio - contingency 13. Partial correlation Appendices Index Folding table of curves.

Journal ArticleDOI
TL;DR: Assessment of the reliability of a mobile contact mat in measuring a range of stretch–shortening cycle parameters in young adolescents found it to offer a replicable assessment method for use with paediatric populations.
Abstract: The aim of the study was to assess the reliability of a mobile contact mat in measuring a range of stretch–shortening cycle parameters in young adolescents. Additionally, vertical leg stiffness using contact mat data was validated against a criterion method using force–time data. The reliability study involved 18 youths completing a habituation and three separate test sessions, while 20 youths completed a single test session for the validity study. Participants completed three trials of a squat jump, countermovement jump, and maximal hopping test and a single trial of repeated sub-maximal hopping at 2.0 Hz and 2.5 Hz. All tests were performed on the contact mat. Reliability statistics included repeated-measures analysis of variance, intraclass correlation coefficient, and coefficient of variation (CV), while the correlation coefficient (r) and typical error of estimate (TEE) were reported for the validity study. Squat jump height was the most reliable measure (CV = 8.64%), while leg stiffness dur...

Journal ArticleDOI
TL;DR: The development, utility, validation, and interrater reliability of SCALE are described, which is a clinical tool developed to quantify SVMC in patients with cerebral palsy.
Abstract: Normal selective voluntary motor control (SVMC) can be defined as the ability to perform isolated joint movement without using mass flexor/extensor patterns or undesired movement at other joints, such as mirroring. SVMC is an important determinant of function, yet a valid, reliable assessment tool is lacking. The Selective Control Assessment of the Lower Extremity (SCALE) is a clinical tool developed to quantify SVMC in patients with cerebral palsy (CP). This paper describes the development, utility, validation, and interrater reliability of SCALE. Content validity was based on review by 14 experienced clinicians. Mean agreement was 91.9% (range 71.4-100%) for statements about content, administration, and grading. SCALE scores were compared with Gross Motor Function Classification System Expanded and Revised (GMFCS-ER) levels for 51 participants with spastic diplegic, hemiplegic, and quadriplegic CP (GMFCS levels I - IV, 21 males, 30 females; mean age 11y 11mo [SD 4y 9mo]; range 5-23y). Construct validity was supported by significant inverse correlation (Spearman's r=-0.83, p<0.001) between SCALE scores and GMFCS levels. Six clinicians rated 20 participants with spastic CP (seven males, 13 females, mean age 12y 3mo [SD 5y 5mo], range 7-23y) using SCALE. A high level of interrater reliability was demonstrated by intraclass correlation coefficients ranging from 0.88 to 0.91 (p<0.001).
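The abstract above reports interrater ICCs for several clinicians rating the same participants but does not say which ICC form was used. As an illustration only, the sketch below assumes the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1) in Shrout and Fleiss notation, computed from the two-way ANOVA mean squares. The function name and the example ratings are hypothetical.

```python
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1) for a complete subjects-by-raters table.

    ratings[i][j] is the score given to subject i by rater j.
    """
    n = len(ratings)                      # number of subjects (rows)
    k = len(ratings[0]) if n else 0       # number of raters (columns)
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with >= 2 subjects and >= 2 raters")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for subjects, raters, and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical ratings: 4 subjects scored by 3 raters (illustration only).
example = [[7, 8, 7], [4, 5, 4], [9, 9, 10], [2, 3, 2]]
print(f"ICC(2,1) = {icc_2_1(example):.3f}")
```

Because this form penalizes systematic differences between raters, it will generally be lower than a consistency-type ICC computed on the same table.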

Journal ArticleDOI
TL;DR: This is the first study that has addressed and demonstrated responsiveness to important change of the DHI, and provided values of SDD and MIC to help interpret change scores.
Abstract: The impact of dizziness on quality of life is often assessed by the Dizziness Handicap Inventory (DHI), which is used as a discriminate and evaluative measure. The aim of the present study was to examine reliability and validity of a translated Norwegian version (DHI-N), also examining responsiveness to important change in the construct being measured. Two samples (n = 92 and n = 27) included participants with dizziness of mainly vestibular origin. A cross-sectional design was used to examine the factor structure (exploratory factor analysis), internal consistency (Cronbach's α), concurrent validity (Pearson's product moment correlation r), and discriminate ability (ROC curve analysis). Longitudinal designs were used to examine test-retest reliability (intraclass correlation coefficient (ICC) statistics, smallest detectable difference (SDD)), and responsiveness (Pearson's product moment correlation, ROC curve analysis; area under the ROC curve (AUC), and minimally important change (MIC)). The DHI scores range from 0 to 100. Factor analysis revealed a different factor structure than the original DHI, resulting in dismissal of subscale scores in the DHI-N. Acceptable internal consistency was found for the total scale (α = 0.95). Concurrent correlations between the DHI-N and other related measures were moderate to high, highest with Vertigo Symptom Scale-short form-Norwegian version (r = 0.69), and lowest with preferred gait (r = - 0.36). The DHI-N demonstrated excellent ability to discriminate between participants with and without 'disability', AUC being 0.89 and best cut-off point = 29 points. Satisfactory test-retest reliability was demonstrated, and the change for an individual should be ≥ 20 DHI-N points to exceed measurement error (SDD). Correlations between change scores of DHI-N and other self-report measures of functional health and symptoms were high (r = 0.50 - 0.57). 
Responsiveness of the DHI-N was excellent, AUC = 0.83, discriminating between self-perceived 'improved' versus 'unchanged' participants. The MIC was identified as 11 DHI-N points. The DHI-N total scale demonstrated satisfactory measurement properties. This is the first study that has addressed and demonstrated responsiveness to important change of the DHI, and provided values of SDD and MIC to help interpret change scores.

Journal ArticleDOI
TL;DR: These results provide evidence of the reproducibility, convergent validity, and responsiveness to treatment of the Sleep Quality Scale and provide a foundation for its further use and evaluation in FM patients.
Abstract: Sleep disturbances are a common and bothersome symptom of fibromyalgia (FM). This study reports psychometric properties of a single-item scale to assess sleep quality among individuals with FM. Analyses were based on data from two randomized, double-blind, placebo-controlled trials of pregabalin (studies 1056 and 1077). In a daily diary, patients reported the quality of their sleep on a numeric rating scale ranging from 0 ("best possible sleep") to 10 ("worst possible sleep"). Test re-test reliability of the Sleep Quality Scale was evaluated by computing intraclass correlation coefficients. Pearson correlation coefficients were computed between baseline Sleep Quality scores and baseline pain diary and Medical Outcomes Study (MOS) Sleep scores. Responsiveness to treatment was evaluated by standardized effect sizes computed as the difference between least squares mean changes in Sleep Quality scores in the pregabalin and placebo groups divided by the standard deviation of Sleep Quality scores across all patients at baseline. Studies 1056 and 1077 included 748 and 745 patients, respectively. Most patients were female (study 1056: 94.4%; study 1077: 94.5%) and white (study 1056: 90.2%; study 1077: 91.0%). Mean ages were 48.8 years (study 1056) and 50.1 years (study 1077). Test re-test reliability coefficients of the Sleep Quality Scale were 0.91 and 0.90 in the 1056 and 1077 studies, respectively. Pearson correlation coefficients between baseline Sleep Quality scores and baseline pain diary scores were 0.64 (p < 0.001) and 0.58 (p < 0.001) in the 1056 and 1077 studies, respectively. Correlations between the Sleep Quality Scale and the MOS Sleep subscales were statistically significant (p < 0.01), except for the MOS Snoring subscale. Across both studies, standardized effect sizes were generally moderate (0.46 to 0.52) for the 300 mg group and moderate (0.59) or moderate-to-large (0.70) for the 450 mg group. 
In study 1056, the effect size for the 600 mg group was moderate-to-large (0.73). In study 1077, the effect size for the 600 mg group was large (0.82). These results provide evidence of the reproducibility, convergent validity, and responsiveness to treatment of the Sleep Quality Scale and provide a foundation for its further use and evaluation in FM patients.
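The standardized effect size described in the abstract above (difference between least squares mean changes in the active and placebo groups, divided by the baseline standard deviation across all patients) can be sketched as follows. The function name and the example numbers are hypothetical, and the least squares means are assumed to come from a separately fitted model.

```python
def standardized_effect_size(
    ls_mean_change_active: float,
    ls_mean_change_placebo: float,
    baseline_sd: float,
) -> float:
    """(LS mean change, active - LS mean change, placebo) / baseline SD.

    The sign follows the scale direction: on a scale where higher scores
    mean worse sleep, a beneficial treatment gives a negative value, so
    the magnitude (abs) is what is usually compared with benchmarks.
    """
    if baseline_sd <= 0.0:
        raise ValueError("baseline_sd must be positive")
    return (ls_mean_change_active - ls_mean_change_placebo) / baseline_sd


# Hypothetical values for illustration only (not taken from the trials).
es = standardized_effect_size(-2.0, -1.0, baseline_sd=2.0)
print(f"standardized effect size = {es:.2f} (magnitude {abs(es):.2f})")
```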

Journal ArticleDOI
TL;DR: Measurements of ankle dorsiflexion in a weightbearing position with the knee extended can be performed reliably by experienced and inexperienced raters; however, the reliability of this measurement technique needs to be interpreted in the context of the purpose for which the measurement is intended.

Journal ArticleDOI
TL;DR: Taking account of patients' perceptions of the severity of their own symptoms, along with the psychometric properties of the MADRS-S, enables its use for evaluative purposes in the development of new antidepressant drugs.
Abstract: The use of Patient-reported Outcomes (PROs) as secondary endpoints in the development of new antidepressants has grown in recent years. The objective of this study was to assess the psychometric properties of the 9-item, patient-administered version of the Montgomery-Asberg Depression Rating Scale (MADRS-S). Data from a multicentre, double-blind, 8-week, randomised controlled trial of 278 outpatients diagnosed with Major Depressive Disorder were used to evaluate the validity, reliability and sensitivity to change of the MADRS-S using psychometric methods. A Receiver Operating Characteristic (ROC) curve was plotted to identify the most appropriate threshold to define perceived remission. No missing values were found at the item level, indicating good acceptability of the scale. The construct validity was satisfactory: all items contributed to a common underlying concept, as expected. The correlation between MADRS-S and physicians' MADRS was moderate (r = 0.54, p < 0.001) indicating that MADRS-S is complementary rather than redundant to the MADRS. Cronbach's alpha was 0.84, and the stability over time of the scale, estimated on a sub-sample of patients whose health status did not change during the first week of the study, was good (intraclass correlation coefficient of 0.78). MADRS-S sensitivity to change was shown. Using a threshold value of 5, the definition of "perceived remission" reached a sensitivity of 82% and a specificity of 75%. Taking account of patient's perceptions of the severity of their own symptoms along with the psychometric properties of the MADRS-S enable its use for evaluative purposes in the development of new antidepressant drugs.

Journal ArticleDOI
TL;DR: Water displacement and ankle circumference showed excellent reliability; however, water displacement is a time-consuming measure and may pose implementation challenges in the clinical and clinical trial environments.
Abstract: Objective: To evaluate methods to assess peripheral edema for reliability, feasibility and correlation with the classic clinical assessment of pitting edema. Design: Cross-sectional observational study. Setting: Large primary care clinic in Marshfield, Wisconsin, USA. Participants: Convenience sample of 20 patients with type 2 diabetes and a range of edema severity, including patients without edema. Methods: Eight methods of edema assessment were evaluated: (1) clinical assessment of pit depth and recovery at three locations, (2) patient questionnaire, (3) ankle circumference, (4) figure-of-eight (ankle circumference using eight ankle/foot landmarks), (5) edema tester (plastic card with holes of varying size pressed to the ankle with a blood pressure cuff), (6) modified edema tester (edema tester with bumps), (7) indirect leg volume (by series of ankle/leg circumferences), and (8) foot/ankle volumetry by water displacement. Patients were evaluated independently by three nurse examiners. Results: Water displacement and ankle circumference had high inter-examiner agreement (intraclass correlation coefficient 0.93, 0.96 right; 0.97, 0.97 left). Agreement was inconsistent for figure-of-eight (0.64, 0.86), moderate for indirect leg volume (0.53, 0.66), and low for clinical assessments at all locations. Agreement was low for the edema testers but varied by the pressure administered. Correlation with the classic, subjective clinical assessment was good for the nurse-performed assessments and patient questionnaire. Ankle circumference and patient questionnaires each took 1 minute to complete. Other tools took >5 minutes to complete. Conclusions: Water displacement and ankle circumference showed excellent reliability; however, water displacement is a time-consuming measure and may pose implementation challenges in the clinical and clinical trial environments. 
Patient-reported level and frequency of edema, based on an unvalidated questionnaire, was generally well correlated with the physician assessment of edema severity and may prove to be another reliable and accurate method of assessing edema. Additional study is needed to evaluate the validity and responsiveness of these methods.

Journal ArticleDOI
TL;DR: In this paper, the authors evaluated test-retest reliability of the AHA and alternate forms reliability between Small kids vs School kids AHA, and between the 2 board games in School kids AHA.
Abstract: OBJECTIVE: The Assisting Hand Assessment (AHA) has earlier demonstrated excellent validity and rater reliability. This study aimed to evaluate test-retest reliability of the AHA and alternate forms reliability between Small kids vs School kids AHA and the 2 board games in School kids AHA. DESIGN: Test-retest and alternate forms reliability was evaluated by repeated testing with 2 weeks interval. SUBJECTS: Fifty-five children with unilateral cerebral palsy, age range 2 years and 3 months to 11 years and 2 months. METHODS: Intraclass correlation coefficients and smallest detectable difference were calculated. Common item and common person linking plots using Rasch analysis and Bland-Altman plots were created. RESULTS: Intraclass correlation coefficients for test-retest was 0.99. Alternate forms intraclass correlation coefficients were 0.99 between Small kids and School kids AHA and 0.98 between board games. Smallest detectable difference was 3.89 points (sum scores). Items in common item linking plots and persons in common person linking plots were within 95% confidence intervals, indicating equivalence across test forms. CONCLUSION: The AHA has excellent test-retest and alternate forms reliability. A change of 4 points or more between test occasions represents a significant change. Different forms of the AHA give equivalent results.

Journal ArticleDOI
01 Feb 2009-Chest
TL;DR: Evaluating the accuracy of PM performed at home to diagnose OSAS and its outcomes after first validating PM in the laboratory setting by comparing it to polysomnography (PSG) revealed comparable diagnostic ability.

Journal ArticleDOI
TL;DR: A 96-item coding tool was developed and provided a reliable method for analyzing and comparing school district wellness policies in single or multistate studies.
Abstract: In 2006, all local education agencies in the United States participating in the National School Lunch Program were required to establish school wellness policies that covered nutrition education, nutrition standards for school foods, and physical activity. The purpose of this psychometric study was to develop and evaluate the properties of a comprehensive and quantitative coding system to evaluate the quality of these policies. A 96-item coding tool was developed to evaluate seven goal areas: nutrition education, standards for US Department of Agriculture child nutrition programs and school meals, nutrition standards for competitive and other foods and beverages, physical education, physical activity, communication and promotion, and evaluation. Each goal area subscale and the total scale were scored on two dimensions: comprehensiveness and strength. Reliability was assessed by having pairs of researchers from four different states code a sample of 60 policies between July 2007 and July 2008. Goal area subscales were internally reliable (Cronbach's α=.60 to .93). Adequate interrater reliability scores were obtained at each level of scoring: total comprehensiveness and strength scores (intraclass correlation coefficient 0.82), subscale scores (intraclass correlation coefficient 0.70), and individual items (intraclass correlation coefficient 0.72). This coding system provided a reliable method for analyzing and comparing school district wellness policies in single or multistate studies.

Journal ArticleDOI
TL;DR: Initial reliability and validity assessments suggest the PSP has promise as a measure of social functioning in patients with acute symptoms of schizophrenia.
Abstract: Objective: To describe the measurement properties of the Personal and Social Performance scale (PSP), a clinician-reported measure of severity of personal and social dysfunction, in subjects with acute symptoms of schizophrenia. Methods: Pooled data from three paliperidone extended-release clinical studies (n = 1665) and data from a separate noninterventional, cross-sectional, validation study (n = 299) were analyzed. Results: The PSP showed good interrater (intraclass correlation coefficient [ICC] = 0.87) and test–retest (ICCs > 0.90) reliability. Pearson correlation coefficient for association between baseline PSP and Positive and Negative Syndrome Scale (PANSS) total scores was −0.32 for subjects assessed by the same rater and −0.29 for subjects assessed by different raters, suggesting low overlap in measurement constructs between the PANSS and PSP. Spearman Rank correlation coefficient for association between baseline PSP and Clinical Global Impression-Severity (CGI-S) scores was −0.51 with the ...

Journal ArticleDOI
01 Jul 2009-Stroke
TL;DR: Reliability scores were similar among specialists and there were no major differences between nurses and physicians, although scores tended to be lower for neurologists and trended higher among raters not previously certified.
Abstract: NIH Stroke Scale certification is required for participation in modern stroke clinical trials and as part of good clinical care in stroke centers. A new training and demonstration DVD was produced to replace existing training and certification videotapes. Previously, this DVD, with 18 patients representing all possible scores on 15 scale items, was shown to be reliable among expert users. The DVD is now the standard for NIH stroke scale training but the videos have not been validated among general (i.e. non-expert) users. We sought to measure inter-rater reliability of the certification DVD among general users using methodology previously published for the DVD. All raters who used the DVD certification through the American Heart Association website were included in this study. Each rater evaluated one of 3 certification groups. Responses were received from 8214 raters overall, 7419 raters using the internet and 795 raters using other venues. Among raters from other venues, 33% of all responses came from registered nurses, 23% from Emergency Department MD/Other ED/other physicians, and 44% from neurologists. One half (51%) of raters were previously NIHSS certified and 93% were from United States/Canada. Item responses were tabulated, scoring performed as previously published, and agreement measured with unweighted kappa coefficients for individual items and an intraclass correlation coefficient for the overall score. In addition, agreement in this study was compared to the agreement obtained in the original DVD validation study to determine if there were differences between novice and experienced users. Kappas ranged from 0.15 (Ataxia) to 0.81 (LOC-C questions). Of 15 items, 2 showed poor, 11 moderate, and 2 excellent agreement, based on Kappa scores. Agreement was slightly lower than that obtained from expert users for LOCC, Best Gaze, Visual Fields, Facial Weakness, Motor Left Arm, Motor Right Arm and Sensory Loss. 
The intraclass correlation coefficient for total score was 0.85 (95% CI 0.72, 0.90). Reliability scores were similar among specialists and there were no major differences between nurses and physicians, though scores tended to be lower for neurologists and trended higher among raters not previously certified. Scores were similar across various certification settings. The data suggest that certification using the NINDS DVDs is robust and surprisingly reliable for NIHSS certification across multiple venues.
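
The item-level agreement statistic named in this abstract, the unweighted kappa coefficient, can be illustrated with a minimal sketch. It assumes the simplest case of exactly two raters scoring the same patients (Cohen's kappa); the study pooled thousands of raters, so its published multi-rater method is not reproduced here. The function name `cohen_kappa` and the sample scores are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items.

    Illustrative two-rater sketch only; not the multi-rater procedure
    used in the study described above.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty item list")
    n = len(rater_a)
    # Observed agreement: share of items given identical scores.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportions,
    # summed over score categories.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical scores for one scale item across four patients.
kappa = cohen_kappa([0, 0, 1, 1], [0, 0, 1, 0])
```

Kappa corrects raw percentage agreement for the agreement expected by chance, which is why an item can show high raw agreement yet a low kappa when one score dominates.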

Journal ArticleDOI
01 Sep 2009
TL;DR: The BSQ presented good results, thereby providing evidence of its validity and reliability, and is therefore recommended for evaluation of body image attitudes among adolescents.
Abstract: OBJECTIVES: to produce evidence of the validity and reliability of the Body Shape Questionnaire (BSQ) - a tool for measuring an individual's attitude towards his or her body image. METHODS: the study covered 386 young people of both sexes aged between 10 and 18 from a private school and used self-applied questionnaires and anthropometric evaluation. It evaluated the internal consistency, the discriminant validity for differences from the means, according to nutritional status (underweight, eutrophic, overweight and obese), the concurrent validity by way of Spearman's correlation coefficient between the scale and the Body Mass Index (BMI), the waist-hip circumference ratio (WHR) and the waist circumference (WC). Reliability was tested using Wilcoxon's Test, the intraclass correlation coefficient and the Bland-Altman figures. RESULTS: the BSQ displayed good internal consistency (α=0.96) and was capable of discriminating among the total population, boys and girls, according to nutritional status (p<0.001). It correlated with the BMI (r=0.41; p<0.001), WHR (r=-0.10; p=0.043) and WC (r=0.24; p<0.001) and its reliability was confirmed by intraclass correlation (r=0.91; p<0.001) for the total population. The questionnaire was easy to understand and could be completed quickly. CONCLUSIONS: the BSQ presented good results, thereby providing evidence of its validity and reliability. It is therefore recommended for evaluation of body image attitudes among adolescents.

Journal ArticleDOI
TL;DR: The German version of the Oxford Knee Score is a reliable and valid measure for the self-assessment of pain and function in German-speaking patients with osteoarthritis of the knee.

Journal ArticleDOI
TL;DR: This initial validation suggests that Liverpool Osteoarthritis in Dogs (elbow) is worthy of continued investigation, but this instrument requires further validation in larger studies with alternative client groups and alternative therapeutic interventions.
Abstract: Objective: To validate a disease-specific client-based clinical metrology instrument (questionnaire) for dogs with chronic osteoarthritis of the elbow joint. Materials and Methods: This was a prospective cohort study involving 26 dogs with chronic osteoarthritis of the elbow with 24 associated clients. Validity (face and criterion), reliability and responsiveness of the metrology instrument (named “Liverpool Osteoarthritis in Dogs [elbow]”) were tested in a sequence of studies. Face validity involved use of international peer review. Reliability was assessed using a test-retest scenario with a two week interval; peak vertical force as measured by a force platform was used as an external standard measure. Responsiveness was tested with a two week, single-blinded placebo-controlled intervention using a licensed non-steroidal anti-inflammatory drug. Results: The reliability of Liverpool Osteoarthritis in Dogs (elbow) in the test-retest scenario was good; intraclass correlation coefficient is 0·89, 95 per cent confidence interval 0·75 to 0·95, compared with intraclass correlation coefficient 0·92, 95 per cent confidence interval 0·74 to 0·98, for peak vertical force. Responsiveness testing indicated that the “net” effect size (allowing for placebo effect) for Liverpool Osteoarthritis in Dogs (elbow) was 0·13 compared with (−)0·18 for the force platform. Criterion validity for Liverpool Osteoarthritis in Dogs (elbow) against peak vertical force was poor; Spearman’s rank correlation is −0·24 (P=0·30). Clinical Significance: Liverpool Osteoarthritis in Dogs (elbow) was considered reliable with satisfactory responsiveness. The poor criterion validity suggests a mismatch between force platform peak vertical force and client perceptions of lameness. This instrument requires further validation in larger studies with alternative client groups and alternative therapeutic interventions, but this initial validation suggests that Liverpool Osteoarthritis in Dogs (elbow) is worthy of continued investigation.

Journal ArticleDOI
TL;DR: Some commonly used statistical methods for assessing the reliability of procedures for age estimation in the forensic field, including the concordance correlation coefficient and the intraclass correlation coefficient, are introduced.
Abstract: In forensic science, anthropology, and archaeology, several techniques have been developed to estimate chronological age in both children and adults, using the relationship between age and morphological changes in the structure of teeth. Before implementing a statistical model to describe age as a function of the measured morphological variables, the reliability of the measurements of these variables must be evaluated using suitable statistical methods. This paper introduces some commonly used statistical methods for assessing the reliability of procedures for age estimation in the forensic field. The use of the concordance correlation coefficient and the intraclass correlation coefficient is explained. Finally, some pitfalls in the choice of the statistical methods to assess reliability of the measurements in age estimation are discussed.
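
As a rough illustration of the concordance correlation coefficient mentioned in this abstract, the sketch below uses Lin's standard definition, 2·s_xy / (s_x² + s_y² + (mean_x − mean_y)²), with population (1/n) moments. The function name and the sample measurements are hypothetical and are not taken from the paper.

```python
from statistics import fmean


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient for paired measurements.

    Uses population (1/n) variances and covariance, as in Lin's
    original definition. Illustrative sketch only.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired measurements")
    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denominator = var_x + var_y + (mean_x - mean_y) ** 2
    if denominator == 0:
        raise ValueError("coefficient is undefined for constant, identical data")
    return 2 * cov_xy / denominator


# Hypothetical repeated measurements: the second set is shifted by +1.
# Pearson's r would be 1 here, but the concordance coefficient is
# penalised for the systematic offset.
ccc_shifted = concordance_correlation([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

The shifted example shows one of the pitfalls such papers warn about: a perfect linear correlation does not imply that two measurements agree.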

Journal ArticleDOI
TL;DR: The German version of the Dizziness Handicap Inventory (DHI-G) demonstrated good reliability and is recommended as a measure of disability in patients with dizziness and unsteadiness.
Abstract: OBJECTIVE: To translate the Dizziness Handicap Inventory into German (DHI-G) and investigate reliability, assess the association between selected items of the University of California Los Angeles Dizziness Questionnaire and the DHI-G, and compare the scores of patients and healthy participants. STUDY DESIGN: Cross-sectional design. SETTING: Tertiary center for vertigo, dizziness, or balance disorders. PATIENTS: One hundred forty-one patients with vertigo, dizziness, and unsteadiness associated with a vestibular disorder, with a mean age (standard deviation) of 51.5 (13.2) years, and 52 healthy individuals participated. INTERVENTIONS: Fourteen patients participated in the cognitive debriefing; 127 patients completed the questionnaires once or twice within 1 week. MAIN OUTCOME MEASURES: The DHI-G assesses disability caused by dizziness and unsteadiness; the items of the University of California Los Angeles Dizziness Questionnaire assess dizziness and impact on everyday activities. Internal consistency was estimated using Cronbach alpha, reproducibility by calculating Bland-Altman limits of agreement and intraclass correlation coefficients. Associations were estimated by Spearman correlation coefficients. RESULTS: Patients filled out the DHI-G without problem and found that their self-perceived disabilities were mostly included. Cronbach alpha values for the DHI-G and the functional, physical, and emotional subscales were 0.90, 0.80, 0.71, and 0.82, respectively. The limits of agreement were ±12.4 points for the total scale (maximum, 100 points). Intraclass correlation coefficients ranged from 0.90 to 0.95. The DHI-G correlated moderately with the question assessing functional disability (0.56) and fairly with the questions quantifying dizziness (0.43, 0.35). The DHI-G discriminated significantly between healthy participants and patients. CONCLUSION: The DHI-G demonstrated good reliability and is recommended as a measure of disability in patients with dizziness and unsteadiness.
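
The Bland-Altman limits of agreement reported in this abstract can be sketched as follows, assuming the conventional definition (mean test-retest difference ± 1.96 times the standard deviation of the differences). The function name and the example scores are hypothetical; the abstract does not state the exact computation the authors used.

```python
from statistics import mean, stdev


def bland_altman_limits(first, second, z=1.96):
    """Bias and 95% limits of agreement for paired test-retest scores.

    Returns (bias, lower_limit, upper_limit). Assumes the conventional
    mean difference +/- z * SD of the differences; illustrative only.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need at least two paired scores")
    differences = [a - b for a, b in zip(first, second)]
    bias = mean(differences)
    spread = z * stdev(differences)  # sample SD (n - 1 denominator)
    return bias, bias - spread, bias + spread


# Hypothetical total scores from two administrations of a questionnaire.
bias, lower, upper = bland_altman_limits([10, 12, 14, 16], [11, 11, 15, 15])
```

Unlike a correlation coefficient, the limits are expressed in scale points, so they can be read directly against the scale's range (100 points for the total scale in the abstract above).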

Journal ArticleDOI
TL;DR: The French short version of the Disabilities of the Arm, Shoulder and Hand-Disability/Symptom scale (F-QuickDASH-D/S) has good reliability, construct validity and responsiveness in patients with shoulder disorders, and the strong correlation of its score with the full-length DASH-D/S scale score suggests that the F-QuickDASH-D/S could be the preferred scale because it is easier to use.

Journal ArticleDOI
TL;DR: Investigators planning future studies of physical therapy CPRs should consider including inception cohorts, using longer follow-up times, performing masked assessments, recruiting larger sample sizes, and incorporating psychological and psychosocial assessments.
Abstract: Background and Purpose: Clinical prediction rules (CPRs) involving physical therapy interventions have been published recently. The quality of the studies used to develop the CPRs was not previously considered, a fact that has potential implications for clinical applications and future research. The purpose of this systematic review was to determine the quality of published CPRs developed for physical therapy interventions. Methods: Relevant databases were searched up to June 2008. Studies were included in this review if the explicit purpose was to develop a CPR for conditions commonly treated by physical therapists. Validated CPRs were excluded from this review. Study quality was independently determined by 3 reviewers using standard 18-item criteria for assessing the methodological quality of prognostic studies. Percentage of agreement was calculated for each criterion, and the intraclass correlation coefficient (ICC) was determined for overall quality scores. Results: Ten studies met the inclusion criteria and were included in this review. Percentage of agreement for individual criteria ranged from 90% to 100%, and the ICC for the overall quality score was .73 (95% confidence interval=.27–.92). Criteria commonly not met were adequate description of inclusion or exclusion criteria, inclusion of an inception cohort, adequate follow-up, masked assessments, sufficient sample sizes, and assessments of potential psychosocial factors. Quality scores for individual studies ranged from 48.2% to 74.0%. Discussion and Conclusion: Validation studies are rarely reported in the literature; therefore, CPRs derived from high-quality studies may have the best potential for use in clinical settings. Investigators planning future studies of physical therapy CPRs should consider including inception cohorts, using longer follow-up times, performing masked assessments, recruiting larger sample sizes, and incorporating psychological and psychosocial assessments.
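
For readers unfamiliar with how an intraclass correlation coefficient is obtained from several reviewers' scores, here is a minimal sketch. It assumes the one-way random-effects, single-rater form, ICC(1,1) = (MSB − MSW) / (MSB + (k − 1)·MSW); the abstract does not say which ICC form the reviewers used, so this is an assumption, and the function name and scores are hypothetical.

```python
def icc_one_way(ratings):
    """One-way random-effects ICC(1,1) from a subjects-by-raters table.

    `ratings` is a list of rows, one row per rated subject, each row
    holding the k scores given by the raters. Illustrative sketch;
    the ICC form is an assumption, not taken from the abstract.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two subjects")
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("every subject needs the same number (>= 2) of ratings")
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    # Between-subject and within-subject sums of squares.
    ss_between = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_within = sum(
        (value - m) ** 2 for row, m in zip(ratings, row_means) for value in row
    )
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (ms_between - ms_within) / denominator


# Hypothetical quality scores: three studies, each rated by three reviewers
# who agree perfectly, so the coefficient equals 1.
icc_perfect = icc_one_way([[50, 50, 50], [60, 60, 60], [70, 70, 70]])
```

With few rated subjects the estimate is imprecise, which is consistent with the wide confidence interval the abstract reports for ten studies.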