scispace - formally typeset

Showing papers on "Intraclass correlation" published in 2014


Reference EntryDOI
29 Sep 2014
TL;DR: Testing the equality of two population correlation coefficients when the data are bivariate normal and Pearson correlation coefficients are used as estimates of the population parameters is a straightforward procedure covered in many introductory statistics courses.
Abstract: Testing the equality of two population correlation coefficients when the data are bivariate normal and Pearson correlation coefficients are used as estimates of the population parameters is a straightforward procedure covered in many introductory statistics courses. The coefficients are converted using Fisher's z-transformation with standard errors (N − 3)^(−1/2). The two transformed values are then compared using a standard normal procedure. When data are not bivariate normal, Spearman's correlation coefficient rho is often used as the index of correlation. Comparison of two Spearman rhos is not as well documented. Three approaches were investigated using Monte Carlo simulations. Treating the Spearman coefficients as though they were Pearson coefficients and using the standard Fisher's z-transformation and subsequent comparison was more robust with respect to Type I error than either ignoring the nonnormality and computing Pearson coefficients or converting the Spearman coefficients to Pearson equivalents prior to transformation. Keywords: correlation coefficient; Pearson correlation; Spearman correlation; Fisher z-transformation
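The comparison described in this abstract can be sketched in a few lines. This is a minimal illustration, assuming two independent samples; the function names `fisher_z` and `compare_correlations` are hypothetical, and the same routine is applied to Spearman rhos only in the sense the abstract describes (treating them as though they were Pearson coefficients).

```python
import math


def fisher_z(r: float) -> float:
    """Fisher's z-transformation of a correlation coefficient (atanh)."""
    return math.atanh(r)


def compare_correlations(r1: float, n1: int, r2: float, n2: int):
    """Two-sided z-test for equality of two independent correlations.

    Each transformed coefficient has standard error (N - 3)^(-1/2),
    so the difference has variance 1/(n1 - 3) + 1/(n2 - 3).
    Returns (z statistic, two-sided p-value).
    """
    se_diff = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (fisher_z(r1) - fisher_z(r2)) / se_diff
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|))
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


if __name__ == "__main__":
    # Illustrative inputs only; not taken from the paper.
    z, p = compare_correlations(0.6, 103, 0.3, 103)
    print(f"z = {z:.3f}, p = {p:.4f}")
```

The sample sizes and coefficients in the usage line are made up for illustration; the sketch does not reproduce the paper's Monte Carlo design.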

344 citations


Reference EntryDOI
29 Sep 2014

259 citations


Journal ArticleDOI
TL;DR: Reliability was good to excellent for IR and ER ROM and isometric strength measurements, regardless of patient or shoulder position or equipment used, and all procedures examined showed acceptable reliability for clinical use.

193 citations


Journal ArticleDOI
Rick W. Wright, James R. Ross, Amanda K. Haas, Laura J. Huston, Elizabeth A. Garofoli, David Harris, Kushal Patel, David Pearson, Jake Schutzman, Majd Tarabichi, David Ying, John P. Albright, Christina R. Allen, Annunziato Amendola, Allen F. Anderson, Jack T. Andrish, Christopher C. Annunziata, Robert A. Arciero, Bernard R. Bach, Champ L. Baker, Arthur R. Bartolozzi, Keith M. Baumgarten, Jeffery R. Bechler, Jeffrey H. Berg, Geoffrey A. Bernas, Stephen F. Brockmeier, Robert H. Brophy, Charles A. Bush-Joseph, J. Brad Butler, John D. Campbell, James E. Carpenter, Brian J. Cole, Daniel E. Cooper, Jonathan M. Cooper, Charles L. Cox, R. Alexander Creighton, Diane L. Dahm, Tal S. David, Thomas M. DeBerardino, Warren R. Dunn, David C. Flanigan, Robert W. Frederick, Theodore J. Ganley, Charles J. Gatt, Steven R. Gecha, James Robert Giffin, Sharon L. Hame, Jo A. Hannafin, Christopher D. Harner, Norman Lindsay Harris, Keith S. Hechtman, Elliott B. Hershman, Rudolf G. Hoellrich, Timothy M. Hosea, David W. Johnson, Timothy S. Johnson, Morgan H. Jones, Christopher C. Kaeding, Ganesh V. Kamath, Thomas E. Klootwyk, Brett A. Lantz, Bruce A. Levy, C. Benjamin Ma, G. Peter Maiers, Barton J. Mann, Robert G. Marx, Matthew J. Matava, Gregory M. Mathien, David R. McAllister, Eric C. McCarty, Robert G. McCormack, Bruce S. Miller, Carl W. Nissen, Daniel F. O’Neill, Brett D. Owens, Richard D. Parker, Mark L. Purnell, Arun J. Ramappa, Michael A. Rauh, Arthur C. Rettig, Jon K. Sekiya, Kevin G. Shea, Orrin H. Sherman, James R. Slauterbeck, Matthew V. Smith, Jeffrey T. Spang, Kurt P. Spindler, Michael J. Stuart, Steven J. Svoboda, Timothy N. Taft, Joachim J. Tenuta, Edwin M. Tingstad, Armando F. Vidal, Darius Viskontas, Richard A. White, James S. Williams, Michelle L. Wolcott, Brian R. Wolf, James J. York, James L. Carey
TL;DR: A multicenter, prospective longitudinal cohort study of patients undergoing revision surgery after anterior cruciate ligament reconstruction found the International Knee Documentation Committee classification demonstrated the best combination of good interobserver reliability and medium correlation with arthroscopic findings.
Abstract: Background: Osteoarthritis of the knee is commonly diagnosed and monitored with radiography. However, the reliability of radiographic classification systems for osteoarthritis and the correlation of these classifications with the actual degree of confirmed degeneration of the articular cartilage of the tibiofemoral joint have not been adequately studied. Methods: As the Multicenter ACL (anterior cruciate ligament) Revision Study (MARS) Group, we conducted a multicenter, prospective longitudinal cohort study of patients undergoing revision surgery after anterior cruciate ligament reconstruction. We followed 632 patients who underwent radiographic evaluation of the knee (an anteroposterior weight-bearing radiograph, a posteroanterior weight-bearing radiograph made with the knee in 45° of flexion [Rosenberg radiograph], or both) and arthroscopic evaluation of the articular surfaces. Three blinded examiners independently graded radiographic findings according to six commonly used systems—the Kellgren-Lawrence, International Knee Documentation Committee, Fairbank, Brandt et al., Ahlback, and Jager-Wirth classifications. Interobserver reliability was assessed with use of the intraclass correlation coefficient. The association between radiographic classification and arthroscopic findings of tibiofemoral chondral disease was assessed with use of the Spearman correlation coefficient. Results: Overall, 45° posteroanterior flexion weight-bearing radiographs had higher interobserver reliability (intraclass correlation coefficient = 0.63; 95% confidence interval, 0.61 to 0.65) compared with anteroposterior radiographs (intraclass correlation coefficient = 0.55; 95% confidence interval, 0.53 to 0.56). 
Similarly, the 45° posteroanterior flexion weight-bearing radiographs had higher correlation with arthroscopic findings of chondral disease (Spearman rho = 0.36; 95% confidence interval, 0.32 to 0.39) compared with anteroposterior radiographs (Spearman rho = 0.29; 95% confidence interval, 0.26 to 0.32). With respect to standards for the magnitude of the reliability coefficient and correlation coefficient (Spearman rho), the International Knee Documentation Committee classification demonstrated the best combination of good interobserver reliability and medium correlation with arthroscopic findings. Conclusions: The overall estimates with the six radiographic classification systems demonstrated moderate (anteroposterior radiographs) to good (45° posteroanterior flexion weight-bearing radiographs) interobserver reliability and medium correlation with arthroscopic findings. The International Knee Documentation Committee classification assessed with use of 45° posteroanterior flexion weight-bearing radiographs had the most favorable combination of reliability and correlation. Level of Evidence: Diagnostic Level I. See Instructions for Authors for a complete description of levels of evidence.

127 citations


Journal ArticleDOI
TL;DR: The emPHasis-10 is a short questionnaire for assessing health-related quality of life in pulmonary arterial hypertension; it has excellent measurement properties and is sensitive to differences in relevant clinical parameters.
Abstract: The aim of this study was to develop a measure of the impact of pulmonary hypertension (PH) on health-related quality of life (HRQoL) as there is a need for a short, validated instrument that can be used in routine clinical practice. Interviews were conducted with 30 PH patients to derive 32 statements, which were presented as a semantic differential six-point scale (0–5), with contrasting adjectives at each end. This item list was completed by patients attending PH clinics across the UK and Ireland. Rasch analysis was applied to identify items fitting a uni-dimensional model. 226 patients (mean age 55.6±14 years; 70% female) with PH (82% had pulmonary arterial hypertension) completed the study questionnaires. 10 of the 32 items demonstrated fit to the Rasch model (Chi-squared 16; p>0.05) and generated the emPHasis-10 questionnaire. Test–retest (intraclass correlation coefficient 0.95, n=33) and internal consistency (Cronbach’s α=0.9) were strong. emPHasis-10 scores correlated consistently with other relevant measures and discriminated subgroups of patients stratified by World Health Organization functional class (ANOVA F=1.73; p<0.001). The emPHasis-10 is a short questionnaire for assessing HRQoL in pulmonary arterial hypertension. It has excellent measurement properties and is sensitive to differences in relevant clinical parameters. It is freely available for clinical and academic use.

125 citations


Journal ArticleDOI
TL;DR: In this article, the authors identify the kinetic and kinematic factors correlated with the time to complete a cutting maneuver and analyze the test-retest reliability of all biomechanical measures.
Abstract: Cutting ability is an important aspect of many team sports, however, the biomechanical determinants of cutting performance are not well understood. This study aimed to address this issue by identifying the kinetic and kinematic factors correlated with the time to complete a cutting maneuver. In addition, an analysis of the test-retest reliability of all biomechanical measures was performed. Fifteen (n = 15) elite multidirectional sports players (Gaelic hurling) were recruited, and a 3-dimensional motion capture analysis of a 75° cut was undertaken. The factors associated with cutting time were determined using bivariate Pearson's correlations. Intraclass correlation coefficients (ICCs) were used to examine the test-retest reliability of biomechanical measures. Five biomechanical factors were associated with cutting time (2.28 ± 0.11 seconds): peak ankle power (r = 0.77), peak ankle plantar flexor moment (r = 0.65), range of pelvis lateral tilt (r = -0.54), maximum thorax lateral rotation angle (r = 0.51), and total ground contact time (r = -0.48). Intraclass correlation coefficient scores for these 5 factors, and indeed for the majority of the other biomechanical measures, ranged from good to excellent (ICC >0.60). Explosive force production about the ankle, pelvic control during single-limb support, and torso rotation toward the desired direction of travel were all key factors associated with cutting time. These findings should assist in the development of more effective training programs aimed at improving similar cutting performances. In addition, test-retest reliability scores were generally strong, therefore, motion capture techniques seem well placed to further investigate the determinants of cutting ability.

117 citations


Journal ArticleDOI
TL;DR: All measurement properties were consistently verified across the two studies, supporting the validity of the HIT-6 among chronic migraine patients.
Abstract: The Headache Impact Test (HIT)-6 was developed and has been validated in patients with various types of headache. The objective of this study was to report the psychometric properties of the HIT-6 among patients with chronic migraine. Data came from two international, multicenter, randomized, double-blind, placebo-controlled clinical trials of chronic migraine patients (N = 1,384) undergoing prophylaxis therapy. Confirmatory factor analysis and differential item functioning (DIF) analysis were used to test the latent structure and cross-cultural comparability of the HIT-6. Reliability, construct validity, and responsiveness were assessed. Two sets of criterion groups were used: (1) 28-day headache frequency: <10, 10–14, and ≥15 days; (2) sample quartiles of the total cumulative hours of headache: <140, 140 to <280, 280 to <420, and ≥420 hours. Two sets of responsiveness categories were defined as reduction of <30%, 30% to <50%, or ≥50% in (1) number of headache days and (2) cumulative hours of headache. Measurement invariance tests supported the stability of the HIT-6 latent structure across studies. DIF analysis supported cross-cultural comparability. Good reliability was observed across studies (Cronbach’s α: 0.75–0.92; intraclass correlation coefficient: 0.76–0.80). HIT-6 scores correlated strongly (−0.86 to −0.59) with scores of the Migraine-Specific Quality-of-Life Questionnaire. Analysis of variance indicated that HIT-6 scores discriminated across both types of criterion groups (P<0.001), across studies and time points. HIT-6 change scores were significantly higher in magnitude in groups experiencing greater improvement (P<0.001). All measurement properties were consistently verified across the two studies, supporting the validity of the HIT-6 among chronic migraine patients. NCT00156910 and NCT00168428 on www.ClinicalTrials.gov .

116 citations


Journal ArticleDOI
TL;DR: The results suggest that PPT is adequately reliable and that 3 measurements should be taken to maximize measurement properties; clinical implications for application and interpretation of PPT are discussed.
Abstract: Background Quantitative sensory testing, including pressure pain threshold (PPT), is seeing increased use in clinical practice. In order to facilitate clinical utility, knowledge of the properties of the tool and interpretation of results are required. Objectives This observational study used a clinical sample of people with mechanical neck pain to determine: (1) the influence of number of testing repetitions on measurement properties, (2) reliability and minimum clinically important difference, and (3) associations between PPT and key psychological constructs. Design This study was observational with both cross-sectional and prospective elements. Methods Experienced clinicians measured PPT in patients with mechanical neck pain following a standardized protocol. Subcohorts also provided repeated measures and completed scales of key psychological constructs. Results The total sample was 206 participants, but not all participants provided data for all analyses. Interrater and 1-week test-retest reliability were excellent (intraclass correlation coefficients [2,1]=.75–.95). Potentially important differences in reliability and PPT scores were found when using only 1 or 2 repeated measures compared with all 3. The PPT over a distal location (tibialis anterior muscle) was not adequately responsive in this sample, but the local site (upper trapezius muscle) was responsive and may be useful as part of a protocol to evaluate clinical change. Sensitivity values (range=0.08–0.50) and specificity values (range=0.82–0.97) for a range of change scores are presented. Depression, catastrophizing, and kinesiophobia were able to explain small but statistically significant variance in local PPT (3.9%–5.9%), but only catastrophizing and kinesiophobia explained significant variance in the distal PPT (3.6% and 2.9%, respectively). 
Limitations Limitations of the study include multiple raters, unknown recruitment rates, and unknown measurement properties at sites other than those tested here. Conclusions The results suggest that PPT is adequately reliable and that 3 measurements should be taken to maximize measurement properties. The variance explained by the psychological variables was small but significant for 3 constructs related to catastrophizing, depression, and fear of movement. Clinical implications for application and interpretation of PPT are discussed.
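The abstract reports intraclass correlation coefficients of type (2,1) without showing the calculation. As a hedged sketch, the single-measure, two-way random-effects ICC(2,1) can be computed from the mean squares of a two-way ANOVA on a subjects-by-raters table. The function names and the ratings below are illustrative assumptions, not data or code from the study.

```python
def anova_mean_squares(scores):
    """Mean squares for a complete n-subjects x k-raters table.

    Returns (BMS, JMS, EMS): between-subjects, between-raters,
    and residual mean squares.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    return ss_rows / (n - 1), ss_cols / (k - 1), ss_error / ((n - 1) * (k - 1))


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rating."""
    n, k = len(scores), len(scores[0])
    bms, jms, ems = anova_mean_squares(scores)
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


if __name__ == "__main__":
    # Illustrative ratings: 6 subjects (rows) scored by 4 raters (columns).
    ratings = [
        [9, 2, 5, 8],
        [6, 1, 3, 2],
        [8, 4, 6, 8],
        [7, 1, 2, 6],
        [10, 5, 6, 9],
        [6, 2, 4, 7],
    ]
    print(f"ICC(2,1) = {icc_2_1(ratings):.2f}")
```

Because ICC(2,1) penalizes systematic differences between raters (the JMS term), it is usually lower than a consistency-type coefficient computed on the same table.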

92 citations


Journal ArticleDOI
TL;DR: Most of the questionnaires evaluated reached minimal reliability benchmarks and were reliable and precise enough for use at the group level, but measurement error at the individual level was quite large.
Abstract: Background/aims The most reliable patient-reported outcomes (PROs) for people with femoroacetabular impingement (FAI) are unknown because there have been no direct comparisons of questionnaires. Thus, the aim was to evaluate the test–retest reliability of six existing PROs in a single cohort of young active people with hip/groin pain consistent with a clinical diagnosis of FAI. Methods Young adults with clinical FAI completed six PRO questionnaires on two occasions, 1–2 weeks apart. The PROs were modified Harris Hip Score, Hip dysfunction and Osteoarthritis Score, Hip Outcome Score, Non-Arthritic Hip Score, International Hip Outcome Tool, Copenhagen Hip and Groin Outcome Score. Results 30 young adults (mean age 24 years, SD 4 years, range 18–30 years; 15 men) with stable symptoms participated. Intraclass correlation coefficient (3,1) values ranged from 0.73 to 0.93 (95% CI 0.38 to 0.98) indicating that most questionnaires reached minimal reliability benchmarks. Measurement error at the individual level was quite large for most questionnaires (minimal detectable change (MDC95) 12.4–35.6, 95% CI 8.7 to 54.0). In contrast, measurement error at the group level was quite small for most questionnaires (MDC95 2.2–7.3, 95% CI 1.6 to 11). Conclusions The majority of the questionnaires were reliable and precise enough for use at the group level. Samples of only 23–30 individuals were required to achieve acceptable measurement variation at the group level. Further direct comparisons of these questionnaires are required to assess other measurement properties such as validity, responsiveness and meaningful change in young people with FAI.
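The gap between individual-level and group-level measurement error can be illustrated with the commonly used formulas linking the ICC, the standard error of measurement (SEM), and the minimal detectable change. This is a sketch under assumptions: the abstract does not state which formulas the authors used, the function names are hypothetical, and the numbers in the usage lines are invented.

```python
import math


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - ICC)."""
    return sd * math.sqrt(1.0 - icc)


def mdc95_individual(sem: float) -> float:
    """Individual-level MDC at 95% confidence: 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2.0) * sem


def mdc95_group(mdc_individual: float, n: int) -> float:
    """Group-level MDC, assumed here to be the individual MDC / sqrt(n)."""
    return mdc_individual / math.sqrt(n)


if __name__ == "__main__":
    # Invented example: score SD of 10 points, ICC of 0.84, group of 25.
    sem = sem_from_icc(sd=10.0, icc=0.84)
    mdc_ind = mdc95_individual(sem)
    print(f"SEM = {sem:.2f}")
    print(f"MDC95 (individual) = {mdc_ind:.2f}")
    print(f"MDC95 (group, n=25) = {mdc95_group(mdc_ind, 25):.2f}")
```

Dividing by the square root of the group size is why a questionnaire can be too noisy for tracking one patient yet precise enough for comparing groups.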

82 citations


Journal ArticleDOI
TL;DR: This study explores MOBID‐2's test–retest reliability, measurement error and responsiveness to change.

81 citations


Journal ArticleDOI
TL;DR: Both smartphone applications were found to be reliable and comparable to SG, and a photo-based application potentially offers a superior method of measurement as visualizing the landmarks may be simplified in this format and it provides a record of measurement.
Abstract: Purpose/hypothesis: The purpose of this study was to determine the reliability and validity of two smartphone applications: (1) GetMyROM – inclinometry-based and (2) DrGoniometry – photo-based in the measurement of active shoulder external rotation (ER) as compared to standard goniometry (SG). Participants: Ninety-four Texas Woman's University Doctor of Physical Therapy students from the School of Physical Therapy – Houston campus, were recruited to participate in this study. Materials/methods: Two iPhone applications were compared to SG using both novice and experienced raters. Active shoulder ER range of motion was measured over two time periods in random order by blinded novice and experienced raters. Results: Intra-rater reliability using novice raters for the two applications ranged from an intraclass correlation coefficient (ICC) of 0.79 to 0.81 with SG at 0.82. Inter-rater reliability (novice/expert) for the two applications ranged from an ICC of 0.92 to 0.94 with SG at 0.91. Concurrent va...

Journal ArticleDOI
TL;DR: Even though the overall subtest score and individual skill agreement was good, some skill components had lower agreement, suggesting these may be more problematic to assess and need to be specified differently in order to improve component reliability.

Journal ArticleDOI
TL;DR: The majority of the tests evaluated showed satisfactory reliability and construct validity, supporting their use in the clinical evaluation of patients with chronic neck pain; between-group differences, however, were within the limits of the minimal detectable change.
Abstract: The reliability of clinical tests for the cervical spine has not been adequately evaluated. Six cervical clinical tests, which are low cost and easy to perform in clinical settings, were tested for intra- and inter-examiner reliability, and two performance tests were assessed for test-retest reliability in people with and without chronic neck pain. Moreover, construct and between-group discriminative validity of the tests were examined. Twenty-one participants with chronic neck pain and 21 asymptomatic participants were included. Intra- and inter-reliability were evaluated for the Cranio-Cervical Flexion Test (CCFT), Range of Movement (ROM), Joint Position Error (JPE), Gaze Stability (GS), Smooth Pursuit Neck Torsion Test (SPNTT), and neuromuscular control of the Deep Cervical Extensors (DCE). Test-retest reliability was assessed for Postural Control (SWAY) and Pressure Pain Threshold (PPT) over tibialis anterior, infraspinatus and the C3-C4 segment. Intraclass Correlation Coefficient (ICC) for intra- and inter-examiner reliability was highest for ROM (range: 0.80 to 0.94), DCE (0.75 to 0.90) and CCFT (0.63 to 0.86). JPE had the lowest ICC (0.02 to 0.66). Intra- and inter-reliability for GS and SPNTT showed kappa ranging from 0.66 to 0.92, and 0.57 to 0.78 (prevalence adjusted), respectively. For the test-retest study, ICC was 0.83 to 0.89 for PPT and 0.39 to 0.79 for SWAY. Construct validity was satisfactory for all tests, except JPE. Significant between group discriminative validity was found for CCFT, ROM, GS, SPNTT and PPT, however, differences were within the limits of the minimal detectable change. The majority of the tests evaluated showed satisfactory reliability and construct validity supporting their use in the clinical evaluation of patients with chronic neck pain.

Journal ArticleDOI
TL;DR: The EQ-5D-5L was more suitable than the EQ-5D-3L in patients with hepatitis B in China, and age, education, and comorbidity were associated with health-related quality of life (HRQoL).
Abstract: The purpose of the study was to compare psychometric properties of the EQ-5D-5L (5L) and the EQ-5D-3L (3L) health outcomes assessment instruments in patients with hepatitis B in China. Patients, including hepatitis B virus carriers and those with active or inactive chronic hepatitis B, compensated cirrhosis, decompensated cirrhosis or hepatocellular carcinoma, answered a questionnaire composed of 5L, socio-demographic information, 3L, and the visual analog scale (VAS), respectively. After 1 week, a retest was conducted for inpatients. We compared acceptability, face validity, redistribution properties, convergent validity, known-group validity, discriminatory power, ceiling effect, test–retest reliability, and responsiveness of 5L and 3L. A total of 369 outpatients and 276 inpatients were recruited for the first interview. Of the inpatients, 183 were used in the retest. Most patients preferred 5L to 3L. The 3L–5L response pairs had an inconsistency rate of 2.4%. Correlation with the VAS was greater with 5L than with 3L. Age, education, and comorbidity were associated with health-related quality of life (HRQoL). 5L discriminated more infectious conditions than 3L. In all dimensions, the Shannon’s index from 5L was larger while in three dimensions the Shannon’s evenness index from 5L was slightly larger. The ceiling effect was reduced in 5L. In patients with stable health states, no significant difference was detected in the weighted kappa between 5L and 3L, but intraclass correlation coefficient of 5L was higher than that of 3L. In patients with improved health states, HRQoL was seen as increased in both 5L and 3L, without significant difference. The EQ-5D-5L was more suitable than the EQ-5D-3L in the patients with hepatitis B in China.
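The Shannon index and Shannon evenness index mentioned above measure how much of a dimension's response scale is actually used. A minimal sketch follows, assuming the usual definitions H' = −Σ p_i log2 p_i and J' = H' / log2(L), where L is the number of response levels. The function name and the response data are illustrative assumptions, not the study's.

```python
import math
from collections import Counter


def shannon_indices(responses, levels):
    """Return (H', J') for one questionnaire dimension.

    responses: observed level chosen by each respondent.
    levels: number of response levels on the scale (3 for 3L, 5 for 5L).
    H' is the Shannon index in bits; J' = H' / log2(levels) is the
    evenness index, which rescales H' by its maximum possible value.
    """
    n = len(responses)
    counts = Counter(responses)
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return h, h / math.log2(levels)


if __name__ == "__main__":
    # Invented responses to one dimension under each version.
    three_level = [1, 1, 1, 1, 2, 2, 2, 3]
    five_level = [1, 1, 1, 2, 2, 3, 4, 5]
    print("3L: H' = %.2f, J' = %.2f" % shannon_indices(three_level, 3))
    print("5L: H' = %.2f, J' = %.2f" % shannon_indices(five_level, 5))
```

A five-level scale has a higher ceiling for H' (log2 5 versus log2 3), so a larger H' alone does not show that the extra levels are used evenly; J' corrects for that.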

Journal ArticleDOI
TL;DR: The CEA had high interrater reliability and good intrarater reliability and was a reliable scale for measuring the facial erythema of rosacea when used by trained raters.
Abstract: Background: Facial erythema is a clinical hallmark of rosacea and often causes social and psychological distress. Although facial erythema assessments are a common endpoint in rosacea clinical trials, their reliability has not been evaluated. Objective: The objective of this study was to evaluate the inter- and intrarater reliability of the Clinician's Erythema Assessment (CEA), a 5-point grading scale of facial erythema severity. Methods: Twelve board-certified dermatologists, previously trained on use of the scale, rated erythema of 28 rosacea subjects twice on the same day. Interrater and intrarater agreement was assessed with the intraclass correlation and κ statistic. Results: The CEA had high interrater reliability and good intrarater reliability with an overall intraclass correlation coefficient (ICC) for session 1 and session 2 of 0.601 and 0.576, respectively; the overall weighted κ statistic for session 1 and session 2 was 0.692. Limitations: Raters were experienced dermatologists and there may be a risk of recall bias. Conclusion: When used by trained raters, CEA is a reliable scale for measuring the facial erythema of rosacea.

Journal ArticleDOI
TL;DR: The current findings agree with those of other reliability studies that have reported acceptable ICCs across 30-day to 1-year testing intervals, and they support the utility of the ImPACT for the multidisciplinary approach to concussion management.
Abstract: Background: Test-retest reliability is a critical issue in the utility of computer-based neurocognitive assessment paradigms employing baseline and postconcussion tests. Researchers have reported low test-retest reliability for the Immediate Post Concussion Assessment and Cognitive Testing (ImPACT) across an interval of 45 and 50 days. Purpose: To re-examine the test-retest reliability of the ImPACT between baseline, 45 days, and 50 days. Study Design: Descriptive laboratory study. Methods: Eighty-five physically active college students (51 male, 34 female) volunteered for this study. Participants completed the ImPACT as well as a 15-item memory test at baseline, 45 days, and 50 days. Intraclass correlation coefficients (ICCs) were calculated for ImPACT composite scores, and change scores were calculated using reliable change indices (RCIs) and regression-based methods (RBMs) at 80% and 95% confidence intervals (CIs). Results: The respective ICCs for baseline to day 45, day 45 to day 50, baseline to day 50, and ov...

Journal Article
TL;DR: The Dynavision™ D2 is a reliable device to assess neuromuscular reactivity given that an adequate practice is provided, and it appears that one familiarization trial is necessary for the choice reaction time (CRT) task while three familiarization trials are necessary for reactive RT tasks.
Abstract: Recently, the Dynavision™ D2 Visuomotor Training Device (D2) has emerged as a tool in the assessment of reaction time (RT); however, information regarding the reliability of the D2 has been limited, and to date, reliability data have been limited to non-generalizable samples. Therefore, the purpose of this study was to establish intraclass correlation coefficients (ICC2,1) for the D2 that are generalizable across a population of recreationally active young adults. Forty-two recreationally active men and women (age: 23.41 ± 4.84 years; height: 1.72 ± 0.11 m; mass: 76.62 ± 18.26 kg) completed 6 trials for three RT tasks of increasing complexity. Each trial was separated by at least 48 hours. A repeated measures ANOVA was used to detect differences in performance across the six trials. Intraclass correlation coefficients (ICC2,1), standard error of measurement (SEM), and minimal differences (MD) were used to determine the reliability of the D2 from the two sessions with the least significant difference score. Moderate to strong reliability was demonstrated for visual RT (ICC2,1: 0.84, SEM: 0.033), and reactive ability in both Mode A and Mode B tasks (Mode A hits: ICC2,1: 0.75, SEM: 5.44; Mode B hits: ICC2,1: 0.73, SEM: 8.57). Motor RT (ICC2,1: 0.63, SEM: 0.035 s) showed fair reliability, while average RT per hit for Modes A and B showed moderate reliability (ICC2,1: 0.68, SEM: 0.43 s and ICC2,1: 0.72, SEM: 0.03 s respectively). It appears that one familiarization trial is necessary for the choice reaction time (CRT) task while three familiarization trials are necessary for reactive RT tasks. In conclusion, results indicate that the Dynavision™ D2 is a reliable device to assess neuromuscular reactivity given that an adequate practice is provided. The data presented are generalizable to a population of recreationally active young adults.
Key Points: The Dynavision™ D2 is a light-training reaction device, developed to train sensory motor integration through the visual system, offering the ability to assess visual and motor reaction to both central and peripheral stimuli, with a capacity to integrate increasing levels of cognitive challenge. The Dynavision™ D2 is a reliable instrument for assessing reaction time in recreationally active young adults. It is recommended that one familiarization trial is necessary for the choice reaction time task assessment to learn the test protocol, while three familiarization trials are needed for reactive ability in Mode A and Mode B before a subsequent reliable baseline score can be established. Significant training effects were observed for all reaction time tests and should be taken into account with continuous trials.

Journal ArticleDOI
TL;DR: For the metrics having three or more ICCs reported for both functional and structural networks, six of seven were higher in structural networks, indicating that structural networks may be more reliable over time.
Abstract: This systematic review aimed to assess the reproducibility of graph-theoretic brain network metrics. Primary research studies of test-retest reliability conducted on healthy human subjects were included that quantified test-retest reliability using either the intraclass correlation coefficient (ICC) or the coefficient of variance. The MEDLINE, Web of Knowledge, Google Scholar, and OpenGrey databases were searched up to February 2014. Risk of bias was assessed with 10 criteria weighted toward methodological quality. Twenty-three studies were included in the review (n=499 subjects) and evaluated for various characteristics, including sample size (5–45), retest interval ( 1 year), acquisition method, and test-retest reliability scores. For at least one metric, ICCs reached the fair range (ICC 0.40–0.59) in one study, the good range (ICC 0.60–0.74) in five studies, and the excellent range (ICC>0.74) in 16 studies. Heterogeneity of methods prevented further quantitative analysis. Reproducibili...

Journal ArticleDOI
TL;DR: This study determines normative values for the Timed Up and Go (TUG) test in typically developing children and adolescents and validates its use in individuals with Down syndrome.
Abstract: Aim To determine normative values for the Timed Up and Go (TUG) test in typically developing children and adolescents and to validate its use in individuals with Down syndrome. Method Participants in this cross-sectional study were South Brazilian schoolchildren aged 3 to 18 years. In phase 1, 459 typically developing individuals (227 males, 232 females; mean age 10y 8mo [SD 4y 4mo]) were included; and in phase 2, 40 individuals with Down syndrome (16 males, 24 females; mean age 10y 6mo [SD 4y 4mo]). Anthropometric measurements, real leg length, TUG test scores, and Gross Motor Function Measure (GMFM) scores were evaluated. The association between the TUG test and possible predictive variables was analyzed. Results In phase 1, the mean time to perform the TUG test was 5.61 seconds (SD 1.06). Values were stratified in age groups that served as normative data for both sexes. A multiple linear regression analysis was conducted and the best variables to predict TUG scores were age and weight. The best model obtained presented an R² of 0.25 and a standard error of the estimate of 0.92. Excellent intrasession reliability in the three tests performed (intraclass correlation coefficient [ICC] of 0.93, 0.94, and 0.95) and between the sessions (both with an ICC of 0.95) was demonstrated. In phase 2, the test also showed excellent reproducibility (ICC=0.82 between the two tests performed). The performance time was significantly longer (p<0.001) in individuals with Down syndrome compared with sex-, age-, and weight-matched typically developing children with a mean difference of −3.53 (95% confidence interval −4.05 to −3.00). Dimension E of the GMFM (Walking, Running and Jumping) showed the highest correlation (r=−0.55, p<0.001) with the test. Interpretation This study provides normative values for the TUG test and shows that TUG scores can be predicted as a function of age and weight in typically developing individuals.
The test can also be used for assessment of functional mobility in individuals with Down syndrome.

Journal ArticleDOI
TL;DR: The Spanish version of the BPI‐SF is a valid and reliable instrument to measure pain severity and interference and showed good reliability, with Cronbach's alphas of 0.931 for the severity and interference scales.
Abstract: The Brief Pain Inventory (BPI) is a widely used pain measurement tool. There are 2 versions, the BPI Long Form (BPI-LF) and Short Form (BPI-SF), which share 2 core scales measuring pain severity and pain interference but which use different recall periods (24 hours vs. 1 week). To date, the BPI-SF has not been validated for use in Spain. This study investigated the psychometric properties of the BPI-SF Spanish version and compared results on the core scales between BPI-LF and BPI-SF. The data came from a 3-month observational study of 3,029 nononcologic patients managed in Spanish pain units. The BPI-SF's reliability, validity, and responsiveness were assessed. The effect of different recall periods was investigated by using intraclass correlation coefficients (ICCs) to determine the strength of correlation between BPI-LF and BPI-SF. The BPI-SF showed good reliability, with Cronbach's alphas of 0.931 for the severity and interference scales, which also discriminated well between patients reporting different levels of quality of life on EuroQol-5D dimensions (between group effect sizes [ESs] over 0.8). Substantial improvements were seen on both subscales after 3 months of treatment (ES of 1.76 for pain severity and 1.51 for pain interference). Recall period did not noticeably affect scores; ICCs (95% CI) between the long and short versions were 0.946 (0.938 to 0.954) and 0.929 (0.919 to 0.939) for the severity and interference subscales, respectively. The Spanish version of the BPI-SF is a valid and reliable instrument to measure pain severity and interference.
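For readers unfamiliar with the internal-consistency statistic quoted above, the following is a minimal sketch of how Cronbach's alpha is conventionally computed from an item-score matrix. The function name `cronbach_alpha` and the example scores are hypothetical and are not taken from the study; the study's own software and item data are not described in the abstract.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix.

    scores: 2-D array-like, rows = respondents, columns = items.
    """
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                         # number of items
    item_vars = scores.var(axis=0, ddof=1)      # sample variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of the summed scale
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)

# Hypothetical ratings: 5 respondents answering 4 items on a 0-10 scale.
example = [
    [7, 6, 7, 8],
    [3, 4, 3, 2],
    [5, 5, 6, 5],
    [9, 8, 9, 9],
    [2, 3, 2, 3],
]
print(round(cronbach_alpha(example), 3))
```

Alpha approaches 1 when the items rise and fall together across respondents, which is why values above 0.9 are read as strong internal consistency.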

Journal ArticleDOI
TL;DR: A novel color scale for visual assessment, conforming to theoretical color changes of a gum, was developed to evaluate masticatory performance, and the reliability and validity of this evaluation method were investigated.
Abstract: In the present study, we developed a novel color scale for visual assessment, conforming to theoretical color changes of a gum, to evaluate masticatory performance; moreover, we investigated the reliability and validity of this evaluation method using the color scale. Ten participants (aged 26–30 years) with natural dentition chewed the gum at several chewing strokes. Changes in color were measured using a colorimeter, and then linear regression expressions that represented changes in gum color were derived. The color scale was developed using these regression expressions. Thirty-two chewed gums were evaluated using a colorimeter and were assessed three times using the color scale by six dentists aged 25–27 (mean, 25.8) years, six preclinical dental students aged 21–23 (mean, 22.2) years, and six elderly individuals aged 68–84 (mean, 74.0) years. The intrarater and interrater reliability of evaluations was assessed using intraclass correlation coefficients. Validity of the method compared with a colorimeter was assessed using Spearman's rank correlation coefficient. All intraclass correlation coefficients were > 0.90, and Spearman's rank correlation coefficients were > 0.95 in all groups. These results indicated that the evaluation method of the color-changeable chewing gum using the newly developed color scale is reliable and valid.

Journal ArticleDOI
TL;DR: In this article, a systematic literature review following the PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analyses) protocol was performed to determine the measurement properties and feasibility previously investigated for clinical tests that evaluate sit-to-stand and stand-to-sit in subjects with neurological disease.
Abstract: BACKGROUND: Subjects with neurological disease (ND) usually show impaired performance during sit-to-stand and stand-to-sit tasks, with a consequent reduction in their mobility levels. OBJECTIVE: To determine the measurement properties and feasibility previously investigated for clinical tests that evaluate sit-to-stand and stand-to-sit in subjects with ND. METHOD: A systematic literature review following the PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analyses) protocol was performed. Systematic literature searches of databases (MEDLINE/SCIELO/LILACS/PEDro) were performed to identify relevant studies. In all studies, the following inclusion criteria were assessed: investigation of any measurement property or the feasibility of clinical tests that evaluate sit-to-stand and stand-to-sit tasks in subjects with ND published in any language through December 2012. The COSMIN checklist was used to evaluate the methodological quality of the included studies. RESULTS: Eleven studies were included. The measurement properties/feasibility were most commonly investigated for the five-repetition sit-to-stand test, which showed good test-retest reliability (intraclass correlation coefficient [ICC]=0.94-0.99) for subjects with stroke, cerebral palsy and dementia. The ICC values were higher for this test than for the number of repetitions in the 30-s test. The five-repetition sit-to-stand test also showed good inter/intra-rater reliabilities (ICC=0.97-0.99) for stroke and inter-rater reliability (ICC=0.99) for subjects with Parkinson disease and incomplete spinal cord injury. 
For this test, the criterion-related validity for subjects with stroke, cerebral palsy and incomplete spinal cord injury was, in general, moderate (correlation=0.40-0.77), and the feasibility and safety were good for subjects with Alzheimer's disease. CONCLUSIONS: The five-repetition sit-to-stand test was used more often in subjects with ND, and most of the measurement properties were investigated and showed adequate results.

Journal ArticleDOI
TL;DR: The results support the use of Flock of Birds to measure scapular orientations in subjects with and without impingement symptoms; the measurements showed excellent within-day reliability but were not highly reliable over time.
Abstract: RESULTS: Intraclass correlation coefficients for within- and between-day assessment of scapular orientation during elevation and lowering of the arm in both groups ranged from 0.92 to 0.99 and from 0.54 to 0.88, respectively. Intraclass correlation coefficients for assessment of scapular orientation with the arms relaxed at the side in both groups ranged from 0.66 to 0.95. The standard error of measurement for between-day measurements ranged from 3.37° to 7.44° for both groups. The minimal detectable change for between-day measurements increased from 7.81° at the lower to 17.27° at the higher humerothoracic elevation angles.
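The standard error of measurement (SEM) and minimal detectable change (MDC) quoted above are usually linked by the conventional formula MDC = z × √2 × SEM. The sketch below illustrates that relationship only; the function name `minimal_detectable_change` is hypothetical, and the confidence level the authors used is not stated in the abstract, so the choice of z here is an assumption.

```python
import math

def minimal_detectable_change(sem, z=1.96):
    """Conventional MDC from a standard error of measurement.

    sem: standard error of measurement, in the unit of the measure.
    z:   standard normal quantile for the chosen confidence level
         (1.96 for 95%, 1.645 for 90%); an assumption, not from the abstract.
    The sqrt(2) factor reflects that a change score combines the error
    of two measurements.
    """
    return z * math.sqrt(2) * sem

# Between-day SEM bounds reported in the abstract, in degrees.
for sem in (3.37, 7.44):
    print(sem, round(minimal_detectable_change(sem, z=1.645), 2))
```

With z = 1.645 the two SEM bounds give roughly 7.84° and 17.31°, which is close to the reported 7.81° and 17.27°. This suggests, but does not confirm, that a 90% level was used.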

Journal ArticleDOI
TL;DR: A consensus for the classification of DUs in SSc was developed, and after a training session, rheumatologists with expertise in SSc are able to reliably classify DUs and to measure ulcer area.
Abstract: The objectives of this study were to develop a standard classification of digital ulcers (DUs) in systemic sclerosis (SSc) for use in observational or therapeutic studies and to assess the reliability of these definitions as well as of the measurement of ulcer area. Ten North American rheumatologists with expertise in SSc reviewed multiple photos of DUs, examined four SSc subjects with DUs, and came to a consensus on the definitions for digital, active, healed, and indeterminate ulcers. These ten raters then examined the right hand of ten SSc subjects twice and the left hand once to classify ulcers and to measure ulcer area. Weighted and Fleiss kappa were used to calculate intra- and interrater agreement on classification of ulcers, and intraclass correlation coefficient (ICC) was used to assess agreement on ulcer area. Because the traditional ICC calculations relied on a small number of ulcers, ICCs were recalculated using the results of linear mixed models to evaluate the variance components of observations on all the data. Intrarater kappa for classifying DU as not an ulcer/healed ulcer versus active/indeterminate ulcer was substantial (0.76), and interrater kappa was moderate (0.53). The ICC for ulcer area using the linear mixed models was moderate both for intrarater (0.57) and interrater (0.48) measurements. A consensus for the classification of DUs in SSc was developed, and after a training session, rheumatologists with expertise in SSc are able to reliably classify DUs and to measure ulcer area.

Journal ArticleDOI
TL;DR: A multi-center study was conducted to validate the Spanish version of the BNSS (BNSS-Sp) in 20 schizophrenia patients, following the original BNSS validation methodology, and found strong inter-rater, test-retest and internal consistency properties.

Journal ArticleDOI
TL;DR: The Swedish version of the iHOT12 is a valid, reliable and responsive instrument that can be used both for research and in the clinical setting.
Abstract: There is a lack of standardised outcome measures in Swedish for active, young and middle-aged patients with hip and groin disability. The purpose of this study was to adapt the English version of the international Hip Outcome Tool (iHOT12) patient-reported outcome instrument for use in Swedish patients and evaluate the adaptation according to the consensus-based standards for the selection of health status measurement instruments checklist. Cross-cultural adaptation was performed in several steps, including translation, back-translation, expert review and pre-testing. The final version was evaluated for reliability, validity and responsiveness in a clinical study of patients [median age 37 (range 15–75)], undergoing surgery for femoro-acetabular impingement. Cronbach’s alpha was 0.89, and significant correlations were obtained with the Copenhagen Hip and Groin Outcome Score (Spearman’s r 0.10–0.70; p < 0.05) and the EuroQol, EQ-5D average score (Spearman’s r 0.27–0.56; p < 0.01). Test–retest reliability (intraclass correlation coefficient) ranged from 0.59 to 0.93 for the individual items. The smallest detectable change ranged from 17.1 to 44.9 at individual level and 3.6 to 9.4 at group level. Factor analysis revealed one factor of pain and symptoms and another factor of physical function. Effect sizes were generally medium or large. The Swedish version of the iHOT12 is a valid, reliable and responsive instrument that can be used both for research and in the clinical setting. Diagnostic study, Level I.

Journal ArticleDOI
TL;DR: The Rasch-based linear-scale CVSS17 emerged as a useful tool to quantify CRVOS in computer workers and showed good reliability and internal consistency.
Abstract: Purpose: To develop a questionnaire (in Spanish) to measure computer-related visual and ocular symptoms (CRVOS). Methods: A pilot questionnaire was created by consulting the literature, clinicians, and video display terminal (VDT) workers. The replies of 636 subjects completing the questionnaire were assessed using the Rasch model and conventional statistics to generate a new scale, designated the Computer-Vision Symptom Scale (CVSS17). Validity and reliability were determined by Rasch fit statistics, principal components analysis (PCA), person separation, differential item functioning (DIF), and item–person targeting. To assess construct validity, the CVSS17 was correlated with a Rasch-based visual discomfort scale (VDS) in 163 VDT workers; this group completed the CVSS17 twice in order to assess test-retest reliability (two-way single-measure intraclass correlation coefficient [ICC] with its 95% confidence interval, and the coefficient of repeatability [COR]). Results: The CVSS17 contains 17 items exploring 15 different symptoms. These items showed good reliability and internal consistency (mean square infit and outfit 0.88–1.17, eigenvalue for the first residual PCA component 1.37, person separation 2.85, and no DIF). Pearson's correlation with VDS scores was 0.60 (P < 0.001). Intraclass correlation coefficient for test–retest reliability was 0.849 (95% confidence interval [CI], 0.800–0.887), and COR was 8.14. Conclusions: The Rasch-based linear-scale CVSS17 emerged as a useful tool to quantify CRVOS in computer workers.
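As an illustration of what a "two-way single-measure" ICC involves, here is a minimal sketch of the Shrout and Fleiss ICC(2,1) (two-way random effects, absolute agreement, single measurement), computed from ANOVA mean squares. The abstract does not say which two-way variant was used, so this choice is an assumption; the function name `icc_2_1` and the example scores are hypothetical.

```python
import numpy as np

def icc_2_1(x):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    x: 2-D array-like, rows = subjects, columns = sessions or raters.
    """
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    grand = x.mean()
    # Sums of squares for subjects (rows), sessions (columns), and residual.
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols
    msr = ss_rows / (n - 1)             # between-subjects mean square
    msc = ss_cols / (k - 1)             # between-sessions mean square
    mse = ss_err / ((n - 1) * (k - 1))  # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical test-retest scores: 5 subjects measured in 2 sessions.
scores = [[20, 22], [35, 33], [28, 30], [41, 40], [15, 18]]
print(round(icc_2_1(scores), 3))
```

Because the between-sessions mean square appears in the denominator, a systematic shift between sessions lowers this ICC, which is what distinguishes the absolute-agreement form from a consistency form.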

Journal ArticleDOI
TL;DR: The NDI-G emerged from this study as a valid and reliable assessment and its psychometric properties are comparable with the original version.

Journal ArticleDOI
15 Jan 2014-Spine
TL;DR: The French version of the Keele STarT Back Screening Tool is a reliable and valid questionnaire consistent with the original English version and may help French-speaking clinicians and scientists to stratify patients with low back pain.
Abstract: Study Design. Observational prospective study. Objective. Our objective was to assess the reliability and validity of the French version of the Keele STarT Back Screening Tool (SBST). Summary of Background Data. The SBST is a recently validated tool developed to identify subgroups of patients with low back pain (LBP) to guide early secondary prevention in primary care. Methods. Outpatients 18 years or older with LBP, attending a rehabilitation center, a back school, a private physiotherapy unit, or a fitness center were included. Patients were assessed through the SBST, Roland-Morris Disability Questionnaire, Orebro Musculoskeletal Pain Screening Questionnaire, Medical Outcomes Survey Short Form-36 questionnaire, and a pain visual analogue scale. Test-retest reliability was assessed with Kappa score or the intraclass correlation coefficient, internal consistency of the Psychological subscale with the Cronbach α coefficient, construct validity with the Spearman correlation coefficient, and floor and ceiling effects by percentage frequency of lowest or highest possible score achieved by respondents. Results. One hundred eight patients with LBP were included. The test-retest reliability of the SBST total score was excellent with an intraclass correlation coefficient of 0.90 (0.81–0.95). The Cronbach α coefficient was 0.73 showing a good internal consistency for the Psychological subscale. High Spearman correlation coefficients of 0.74 between SBST and Roland-Morris Disability Questionnaire, and 0.74 between the SBST and Orebro Musculoskeletal Pain Screening Questionnaire were observed. As expected, low-to-moderate correlations were observed between the SBST total score and some dissimilar measures of the Short-Form 36. The lowest possible SBST score was observed for 8 patients (7.4%), whereas only 3 patients (2.8%) had the highest possible SBST score. Conclusion. 
The French version of the SBST is a reliable and valid questionnaire consistent with the original English version. Therefore, this new version may help French-speaking clinicians and scientists to stratify patients with LBP. Level of Evidence: 2

Journal ArticleDOI
TL;DR: The 6-minute walk test, the hand-grip test, and the physical fitness questionnaire can be recommended as a core set of reliable and valid measures to assess health-related physical fitness in patients with various MSCs.
Abstract: Study Design A cross-sectional study with a test-retest design. Objectives To assess measurement properties of the physical fitness questionnaire, the 6-minute walk test, the stair test, the hand-grip test, the 30-second sit-to-stand test, and the fingertip-to-floor test in patients with various musculoskeletal conditions (MSCs). Background Patients suffering from MSCs tend to be more deconditioned and less physically active than healthy people. Physiotherapists should, therefore, focus on health-related physical fitness in addition to their patients' specific MSCs to offer optimal treatment. To enable good decision making, a core set of feasible measures with acceptable measurement properties is needed. Methods Eighty-one patients with MSCs (57.6 ± 14.2 years of age) were recruited from outpatient physiotherapy clinics. Relative reliability was analyzed with intraclass correlation coefficient model 2,1, and absolute reliability with standard error of measurement and smallest detectable change. Construct ...