Showing papers on "Intraclass correlation" published in 2011

Monograph•DOI•

Sequential analysis and observational methods for the behavioral sciences

Roger Bakeman¹, Vicenç Quera²•Institutions (2)

Georgia State University¹, University of Barcelona²

01 Oct 2011

TL;DR: This monograph covers observational methods for the behavioral sciences, from coding schemes and the recording and representation of observational data, through observer agreement (Cohen's kappa and the intraclass correlation coefficient), to summary statistics for individual codes and sequential analysis.

Abstract: 1. Introduction to observational methods 2. Coding schemes and observational measurement 3. Recording observational data 4. Representing observational data 5. Observer agreement and Cohen's kappa 6. Kappas for point-by-point agreement 7. The intraclass correlation coefficient (ICC) for summary measures 8. Summary statistics for individual codes 9. Cell and summary statistics for contingency tables 10. Preparing for sequential and other analyses 11. Time-window and log-linear sequential analysis 12. Recurrence analysis and permutation tests.

523 citations

Journal Article•DOI•

Spinal Instability Neoplastic Score: An Analysis of Reliability and Validity From the Spine Oncology Study Group

Daryl R. Fourney¹, Evan Frangou¹, Timothy C. Ryken², Christian P. DiPaola³, Christopher I. Shaffrey⁴, Sigurd Berven⁵, Mark H. Bilsky⁶, James S. Harrop⁷, Michael G. Fehlings⁸, Stefano Boriani, Dean Chou⁵, Meic H. Schmidt⁹, David W. Polly¹⁰, Roberto Biagini, Shane Burch⁵, Mark B. Dekutoski¹¹, Aruna Ganju¹², Peter C. Gerszten¹³, Ziya L. Gokaslan¹⁴, Michael W. Groff¹⁵, Norbert J. Liebsch¹⁵, Ehud Mendel¹⁶, Scott H. Okuno¹¹, Shreyaskumar Patel¹⁷, Laurence D. Rhines¹⁷, Peter S. Rose¹¹, Daniel M. Sciubba¹⁴, Narayan Sundaresan¹⁸, Katsuro Tomita¹⁹, Péter Varga, Luiz Roberto Vialle²⁰, Frank D. Vrionis²¹, Yoshiya Yamada⁶, Charles G. Fisher³•Institutions (21)

01 Aug 2011-Journal of Clinical Oncology

TL;DR: SINS demonstrated near-perfect inter- and intraobserver reliability in determining three clinically relevant categories of stability in patients with tumor-related spinal instability.

Abstract: Purpose Standardized indications for treatment of tumor-related spinal instability are hampered by the lack of a valid and reliable classification system. The objective of this study was to determine the interobserver reliability, intraobserver reliability, and predictive validity of the Spinal Instability Neoplastic Score (SINS). Methods Clinical and radiographic data from 30 patients with spinal tumors were classified as stable, potentially unstable, and unstable by members of the Spine Oncology Study Group. The median category for each patient case (consensus opinion) was used as the gold standard for predictive validity testing. On two occasions at least 6 weeks apart, each rater also scored each patient using SINS. Each total score was converted into a three-category data field, with 0 to 6 as stable, 7 to 12 as potentially unstable, and 13 to 18 as unstable. Results The statistics for interobserver reliability were 0.790, 0.841, 0.244, 0.456, 0.462, and 0.492 for the fields of location, pain, bone quality, alignment, vertebral body collapse, and posterolateral involvement, respectively. The statistics for intraobserver reliability were 0.806, 0.859, 0.528, 0.614, 0.590, and 0.662 for the same respective fields. Intraclass correlation coefficients for inter- and intraobserver reliability of total SINS score were 0.846 (95% CI, 0.773 to 0.911) and 0.886 (95% CI, 0.868 to 0.902), respectively. The statistic for predictive validity was 0.712 (95% CI, 0.676 to 0.766). Conclusion SINS demonstrated near-perfect inter- and intraobserver reliability in determining three clinically relevant categories of stability. The sensitivity and specificity of SINS for potentially unstable or unstable lesions were 95.7% and 79.5%, respectively. J Clin Oncol 29:3072-3077. © 2011 by American Society of Clinical Oncology
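The three-category conversion described in this abstract can be sketched as a small lookup. This is only an illustration of the stated cut-offs (0 to 6, 7 to 12, 13 to 18); the function name `sins_category` is hypothetical and does not come from the paper.

```python
def sins_category(total_score: int) -> str:
    """Map a total SINS score (0-18) to the three categories named in the abstract.

    Cut-offs are taken from the abstract; the function name is an assumption.
    """
    if not 0 <= total_score <= 18:
        raise ValueError("total SINS score must be between 0 and 18")
    if total_score <= 6:
        return "stable"
    if total_score <= 12:
        return "potentially unstable"
    return "unstable"


if __name__ == "__main__":
    # Print the category at each boundary of the scale.
    for score in (0, 6, 7, 12, 13, 18):
        print(score, sins_category(score))
```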

412 citations

Journal Article•DOI•

Reliability, standard error, and minimum detectable change of clinical pressure pain threshold testing in people with and without acute neck pain.

David M. Walton¹, Joy C. MacDermid, Warren R. Nielson, Robert Teasell, Marco Chiasson, Lauren Brown•Institutions (1)

University of Western Ontario¹

01 Sep 2011-Journal of Orthopaedic & Sports Physical Therapy

TL;DR: This study provides evidence that novice raters can perform digital algometry with adequate reliability for research and clinical use in people with and without neck pain.

Abstract: Study Design Clinical measurement. Objectives To evaluate the intrarater, interrater, and test-retest reliability of an accessible digital algometer, and to determine the minimum detectable change in normal healthy individuals and a clinical population with neck pain. Background Pressure pain threshold testing may be a valuable assessment and prognostic indicator for people with neck pain. To date, most of this research has been completed using algometers that are too resource intensive for routine clinical use. Methods Novice raters (physiotherapy students or clinical physiotherapists) were trained to perform algometry testing over 2 clinically relevant sites: the angle of the upper trapezius and the belly of the tibialis anterior. A convenience sample of normal healthy individuals and a clinical sample of people with neck pain were tested by 2 different raters (all participants) and on 2 different days (healthy participants only). Intraclass correlation coefficient (ICC), standard error of measurement, ...

372 citations

Journal Article•DOI•

Fugl-Meyer Assessment of Sensorimotor Function After Stroke Standardized Training Procedure for Clinical Practice and Clinical Trials

Katherine J. Sullivan¹, Julie K. Tilson¹, Steven Cen¹, Dorian K. Rose², Julie Hershberg¹, Anita Correa³, Joann Gallichio, Molly McLeod, Craig S. Moore, Samuel S. Wu², Samuel S. Wu⁴, Pamela W. Duncan⁴•Institutions (4)

University of Southern California¹, University of Florida², Long Beach Memorial Medical Center³, Duke University⁴

01 Feb 2011-Stroke

TL;DR: Standardized measurement methods and training of therapist assessors for a multi-site, rehabilitation, randomized, clinical trial resulted in high inter-rater reliability for the Fugl-Meyer motor and sensory assessments.

Abstract: Background and Purpose—Outcome measurement fidelity within and between sites of multi-site, randomized, clinical trials is an essential element to meaningful trial outcomes. As important are the methods developed for randomized, clinical trials that can have practical utility for clinical practice. A standardized measurement method and rater training program were developed for the total Fugl-Meyer motor and sensory assessments; inter-rater reliability was used to test program effectiveness. Methods—Fifteen individuals with hemiparetic stroke, 17 trained physical therapists across 5 regional clinical sites, and an expert rater participated in an inter-rater reliability study of the Fugl-Meyer motor (total, upper extremity, and lower extremity subscores) and sensory (total, light touch, and proprioception subscores) assessments. Results—Intra-rater reliability for the expert rater was high for the motor and sensory scores (range, 0.95–1.0). Inter-rater agreement (intraclass correlation coefficient, 2, 1) be...

327 citations

Journal Article•DOI•

Minimal Detectable Change of the Timed “Up & Go” Test and the Dynamic Gait Index in People With Parkinson Disease

Sheau-Ling Huang¹, Ching-Lin Hsieh¹, Ruey-Meei Wu¹, Chun-Hwei Tai¹, Chin-Hsien Lin¹, Wen-Shian Lu²•Institutions (2)

National Taiwan University¹, Chung Shan Medical University²

01 Jan 2011-Physical Therapy

TL;DR: The results showed that the TUG and the DGI have generally acceptable random measurement error and test-retest reliability, which should help clinicians and researchers determine whether a change in an individual patient with PD is a true change.

Abstract: Background The minimal detectable change (MDC) is the smallest amount of difference in individual scores that represents true change (beyond random measurement error). The MDCs of the Timed “Up & Go” Test (TUG) and the Dynamic Gait Index (DGI) in people with Parkinson disease (PD) are largely unknown, limiting the interpretability of the change scores of both measures. Objective The purpose of this study was to estimate the MDCs of the TUG and the DGI in people with PD. Design This investigation was a prospective cohort study. Methods Seventy-two participants were recruited from special clinics for movement disorders at a university hospital. Their mean age was 67.5 years, and 61% were men. All participants completed the TUG and the DGI assessments twice, about 14 days apart. The MDC was calculated from the standard error of measurement. The percentage MDC (MDC%) was calculated as the MDC divided by the mean of all scores for the sample. Furthermore, the intraclass correlation coefficient was used to examine the reproducibility between testing sessions (test-retest reliability). Results The respective MDC and MDC% of the TUG were 3.5 seconds and 29.8, and those of the DGI were 2.9 points and 13.3. The test-retest reliability values for the TUG and the DGI were high; the intraclass correlation coefficients were .80 and .84, respectively. Limitations The study sample was a convenience sample, and the participants had mild to moderately severe PD. Conclusions The results showed that the TUG and the DGI have generally acceptable random measurement error and test-retest reliability. These findings should help clinicians and researchers determine whether a change in an individual patient with PD is a true change.
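The abstract says the MDC was calculated from the standard error of measurement but does not give the formulas. A minimal sketch, assuming the commonly used conventions SEM = SD × √(1 − ICC) and MDC at 95% confidence = 1.96 × √2 × SEM, is shown below. Those two formulas, the function names, and the numbers in the usage lines are assumptions for illustration, not values or methods confirmed by the paper. MDC% follows the abstract's definition (MDC divided by the sample mean, expressed as a percentage).

```python
import math


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement, assuming SEM = SD * sqrt(1 - ICC)."""
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem: float) -> float:
    """Minimal detectable change at 95% confidence, assuming 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2.0) * sem


def mdc_percent(mdc: float, mean_score: float) -> float:
    """MDC expressed as a percentage of the mean of all scores."""
    return 100.0 * mdc / mean_score


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to exercise the functions.
    sem = sem_from_icc(sd=3.0, icc=0.80)
    mdc = mdc95(sem)
    print(f"SEM={sem:.2f}, MDC95={mdc:.2f}, MDC%={mdc_percent(mdc, 12.0):.1f}")
```

Under these assumptions a lower ICC or a larger between-subject SD both inflate the MDC, which is why the reliability coefficient and the MDC are usually reported together.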

295 citations

Journal Article•DOI•

Comparison between an interactive web-based self-administered 24 h dietary record and an interview by a dietitian for large-scale epidemiological studies

Mathilde Touvier¹, Emmanuelle Kesse-Guyot¹, Caroline Méjean¹, Clothilde Pollet¹, Aurélie Malon¹, Katia Castetbon², Serge Hercberg¹•Institutions (2)

French Institute of Health and Medical Research¹, Institut de veille sanitaire²

01 Apr 2011-British Journal of Nutrition

TL;DR: Online self-administered data collection, by reducing the logistic burden and cost, could advantageously replace classical methods based on dietitian's interviews when assessing dietary intake in large epidemiological studies.

Abstract: Online self-administered data collection, by reducing the logistic burden and cost, could advantageously replace classical methods based on dietitian's interviews when assessing dietary intake in large epidemiological studies. Studies comparing such new instruments with traditional methods are necessary. Our objective was to compare one NutriNet-Sante web-based self-administered 24 h dietary record with one 24 h recall carried out by a dietitian. Subjects completed the web-based record, which was followed the next day by a dietitian-conducted 24 h recall by telephone (corresponding to the same day and using the same computerised interface for data entry). The subjects were 147 volunteers aged 48-75 years (women 59·2 %). The study was conducted in February 2009 in France. Agreement was assessed by intraclass correlation coefficients (ICC) for foods and energy-adjusted Pearson's correlations for nutrients. Agreement between the two methods was high, although it may have been overestimated because the two assessments were consecutive to one another. Among consumers only, the median of ICC for foods was 0·8 in men and 0·7 in women (range 0·5-0·9). The median of energy-adjusted Pearson's correlations for nutrients was 0·8 in both sexes (range 0·6-0·9). The mean Pearson correlation was higher in subjects ≤ 60 years (P = 0·02) and in those who declared being 'experienced/expert' with computers (P = 0·0003), but no difference was observed according to educational level (P = 0·12). The mean completion time was similar between the two methods (median for both methods: 25 min). The web-based method was preferred by 66·1 % of users. Our web-based dietary assessment, permitting considerable logistic simplification and cost savings, may be highly advantageous for large population-based surveys.

243 citations

Journal Article•DOI•

Reliability and validity of the Spanish version of the 10-item Connor-Davidson Resilience Scale (10-item CD-RISC) in young adults.

Blanca Notario-Pacheco¹, Montserrat Solera-Martínez¹, María Dolores Serrano-Parra¹, Raquel Bartolomé-Gutiérrez¹, Javier García-Campayo², Vicente Martínez-Vizcaíno¹•Institutions (2)

University of Castilla–La Mancha¹, University of Zaragoza²

05 Aug 2011-Health and Quality of Life Outcomes

TL;DR: The Spanish version of the 10-item Connor-Davidson Resilience Scale showed good psychometric properties in young adults and thus can be used as a reliable and valid instrument for measuring resilience.

Abstract: The 10-item Connor-Davidson Resilience Scale (10-item CD-RISC) is an instrument for measuring resilience that has shown good psychometric properties in its original version in English. The aim of this study was to evaluate the validity and reliability of the Spanish version of the 10-item CD-RISC in young adults and to verify whether it is structured in a single dimension as in the original English version. Cross-sectional observational study including 681 university students ranging in age from 18 to 30 years. The number of latent factors in the 10 items of the scale was analyzed by exploratory factor analysis. Confirmatory factor analysis was used to verify whether a single factor underlies the 10 items of the scale as in the original version in English. The convergent validity was analyzed by testing whether the mean of the scores of the mental component of SF-12 (MCS) and the quality of sleep as measured with the Pittsburgh Sleep Index (PSQI) were higher in subjects with better levels of resilience. The internal consistency of the 10-item CD-RISC was estimated using the Cronbach α test and test-retest reliability was estimated with the intraclass correlation coefficient. The Cronbach α coefficient was 0.85 and the test-retest intraclass correlation coefficient was 0.71. The mean MCS score and the level of quality of sleep in both men and women were significantly worse in subjects with lower resilience scores. The Spanish version of the 10-item CD-RISC showed good psychometric properties in young adults and thus can be used as a reliable and valid instrument for measuring resilience. Our study confirmed that a single factor underlies the resilience construct, as was the case of the original scale in English.

240 citations

Journal Article•DOI•

Finalisation and validation of the rheumatoid arthritis impact of disease score, a patient-derived composite measure of impact of rheumatoid arthritis: a EULAR initiative

Laure Gossec¹, Simon Paternotte¹, G J Aanerud, A. Balanescu, Dimitrios T. Boumpas, Loreto Carmona, M. de Wit, B. Dijkmans², Maxime Dougados¹, Matthias Englbrecht, Feride Gogus³, Turid Heiberg⁴, C Hernandez, John R. Kirwan⁵, E. Martín Mola, M. Matucci Cerinic, Kati Otsa, G. Schett, M Scholte-Voshaar², Tuulikki Sokka, G. von Krause¹, George A. Wells, T.K. Kvien•Institutions (5)

Paris Descartes University¹, University of Amsterdam², Gazi University³, University of Oslo⁴, University of Bristol⁵

01 Jun 2011-Annals of the Rheumatic Diseases

TL;DR: The RAID score is a patient-derived composite score assessing the seven most important domains of impact of RA, and is now validated; sensitivity to change should be further examined in larger studies.

Abstract: Objective A patient-derived composite measure of the impact of rheumatoid arthritis (RA), the rheumatoid arthritis impact of disease (RAID) score, takes into account pain, functional capacity, fatigue, physical and emotional wellbeing, quality of sleep and coping. The objectives were to finalise the RAID and examine its psychometric properties. Methods An international multicentre cross-sectional and longitudinal study of consecutive RA patients from 12 European countries was conducted to examine the psychometric properties of the different combinations of instruments that might be included within the RAID combinations scale (numeric rating scales (NRS) or various questionnaires). Construct validity was assessed cross-sectionally by Spearman correlation, reliability by intraclass correlation coefficient (ICC) in 50 stable patients, and sensitivity to change by standardised response means (SRM) in 88 patients whose treatment was intensified. Results 570 patients (79% women, mean±SD age 56±13 years, disease duration 12.5±10.3 years, disease activity score (DAS28) 4.1±1.6) participated in the validation study. NRS questions performed as well as longer combinations of questionnaires: the final RAID score is composed of seven NRS questions. The final RAID correlated strongly with patient global (R=0.76) and significantly also with other outcomes (DAS28 R=0.69, short form 36 physical −0.59 and mental −0.55, p Conclusion The RAID score is a patient-derived composite score assessing the seven most important domains of impact of RA. This score is now validated; sensitivity to change should be further examined in larger studies.

234 citations

Journal Article•DOI•

Reliability of the Hamilton Rating Scale for Depression: A meta-analysis over a period of 49 years

Goran Trajkovic¹, Vladan Starcevic², Vladan Starcevic³, Milan Latas¹, Miomir Leštarević⁴, Tanja Ille¹, Zoran Bukumiric⁴, Jelena Marinkovic¹•Institutions (4)

University of Belgrade¹, University of Sydney², Nepean Hospital³, Universiteti i Prishtinës⁴

30 Aug 2011-Psychiatry Research-neuroimaging

TL;DR: Figures indicate good overall levels of internal consistency, inter-rater and test-retest reliability, but some HRSD items do not appear to possess a satisfactory reliability.

Abstract: The aim of this study was to provide a comprehensive meta-analytic review of the reliability of the Hamilton Rating Scale for Depression (HRSD) for the period 1960-2008, taking into consideration all three types of reliability: internal consistency, inter-rater, and test-retest reliability. This is the first such meta-analytic study of a clinician-administered psychiatric scale. A thorough literature search was conducted using MEDLINE and PsycINFO. The total number of collected articles was 5548, of which 409 reported one or more reliability coefficients. The effect size was obtained by the z-transformation of reliability coefficients. The meta-analysis was performed separately for internal consistency, inter-rater and test-retest reliability. A pooled mean for alpha coefficient in random effects model was 0.789 (95%CI 0.766-0.810). The meta-regression analysis revealed that higher alpha coefficients were associated with higher variability of the HRSD total scores. With regard to inter-rater reliability, pooled means in random effects model were 0.937 (95%CI 0.914-0.954) for the intraclass correlation coefficient, 0.81 (95%CI 0.72-0.88) for the kappa coefficient, 0.94 (95%CI 0.90-0.97) for the Pearson correlation coefficient, and 0.91 (95%CI 0.78-0.96) for the Spearman rank correlation coefficient. A meta-regression analysis showed positive association between inter-rater reliability and publication year. Test-retest reliability of HRSD ranged between 0.65 and 0.98 and generally decreased with extending the interval between two measurements (Spearman r between the duration of interval and test-retest reliability figures=-0.74). Results suggest that HRSD provides a reliable assessment of depression. Figures indicate good overall levels of internal consistency, inter-rater and test-retest reliability, but some HRSD items (e.g., "loss of insight") do not appear to possess a satisfactory reliability.
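The z-transformation step mentioned in this abstract can be illustrated with Fisher's transformation, z = atanh(r), which is averaged and then back-transformed with tanh. The sketch below uses a plain unweighted mean, which is a deliberate simplification: the study reports pooled means from a random effects model, and its weighting scheme is not described in the abstract. The function name and the input values are hypothetical.

```python
import math


def pool_coefficients(coefficients):
    """Pool reliability coefficients via Fisher's z (unweighted mean, for illustration).

    Each coefficient must lie strictly between -1 and 1, since atanh is
    undefined at the boundaries.
    """
    if not coefficients:
        raise ValueError("need at least one coefficient")
    z_values = [math.atanh(r) for r in coefficients]
    mean_z = sum(z_values) / len(z_values)
    return math.tanh(mean_z)  # back-transform to the correlation scale


if __name__ == "__main__":
    # Hypothetical coefficients from three imaginary studies.
    print(round(pool_coefficients([0.85, 0.92, 0.95]), 3))
```

Averaging on the z scale rather than on the raw scale matters most near 1, where the sampling distribution of a correlation-type coefficient is strongly skewed.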

215 citations

Journal Article•DOI•

Reliability of Outcome Measures for People With Lower-Limb Amputations: Distinguishing True Change From Statistical Error

Linda Resnik¹, Matthew Borgia²•Institutions (2)

Providence VA Medical Center¹, Brown University²

01 Apr 2011-Physical Therapy

TL;DR: This is the first study to present test-retest reliability data on the self-reported OPUS scales, the PSFS in people with lower-limb amputations, and a new, easier-to-use scoring mechanism for the PEQ.

Abstract: Background Use of outcome measures to examine outcomes of amputation is complicated by a number of factors, including ease of administration and lack of scientific evidence to guide selection and interpretation. Objective The purposes of this study were: (1) to estimate test-retest reliability of a modified version of the Prosthetic Evaluation Questionnaire (PEQ), scales of a version of the 36-Item Short-Form Health Survey questionnaire adapted for the veteran population (SF-36V), the Orthotics and Prosthetics Users' Survey (OPUS), the Patient-Specific Functional Scale (PSFS), the Two-Minute Walk Test, the Six-Minute Walk Test, the Timed “Up & Go” Test, and the Amputee Mobility Predictor; (2) to calculate minimal detectable change (MDC) of each measure; and (3) to conduct item analysis of the modified PEQ. Design This was a multi-site study with repeated measurements. Methods Forty-four patients with unilateral lower-limb amputation participated. Participants were tested twice within 1 week. We calculated test-retest reliability of each measure using intraclass correlation coefficient (ICC [2,1]), estimated standard error of the measurement and MDC, and assessed scale score distribution. Results The study demonstrated strong test-retest reliability scores of performance measures (ICC=.83–.97) suggesting that these measures are good choices for evaluation of people with lower-limb amputation. Reliability of PEQ subscales (ICC=.41–.93) was comparable to that reported in the literature (ICC=.56–.90). Limitations This study examined only statistically measurable differences and did not evaluate whether changes in scores were clinically important. Conclusions Minimal detectable change scores can be used to determine whether change in test scores exceeds measurement error associated with day-to-day variation. 
This is the first study to present test-retest reliability data on the self-reported OPUS scales, the PSFS in people with lower-limb amputations, and a new, easier-to-use scoring mechanism for the PEQ.
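For readers unfamiliar with the ICC (2,1) notation used here, the sketch below computes the two-way random-effects, absolute-agreement, single-measurement ICC from the mean squares of a subjects × raters (or sessions) table, following the usual Shrout and Fleiss definition. The function name and the small ratings table are illustrative assumptions; none of the numbers come from this study.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a complete table: one row per subject, one column per rater/session."""
    n = len(ratings)      # number of subjects
    k = len(ratings[0])   # number of raters or sessions
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares (no replication).
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                # between-subjects mean square
    ms_cols = ss_cols / (k - 1)                # between-raters mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


if __name__ == "__main__":
    # Illustrative table: 6 subjects rated by 4 raters.
    table = [
        [9, 2, 5, 8],
        [6, 1, 3, 2],
        [8, 4, 6, 8],
        [7, 1, 2, 6],
        [10, 5, 6, 9],
        [6, 2, 4, 7],
    ]
    print(round(icc_2_1(table), 2))
```

Because the between-raters mean square appears in the denominator, systematic differences between raters or sessions lower ICC (2,1) even when the rank ordering of subjects is preserved.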

208 citations

Journal Article•DOI•

Manual muscle strength testing of critically ill patients: feasibility and interobserver agreement

Catherine L. Hough¹, Binh K Lieu¹, Ellen Caldwell¹•Institutions (1)

University of Washington¹

28 Jan 2011-Critical Care

TL;DR: Manual muscle testing during critical illness was not possible for most patients because of coma, delirium and/or injury. Interobserver agreement regarding ICUAW was good, particularly when evaluated after ICU discharge, but MMT is insufficient for early detection of ICU-acquired neuromuscular dysfunction in most patients and may be unreliable during critical illness.

Abstract: Introduction: It has been proposed that intensive care unit (ICU)-acquired weakness (ICUAW) should be assessed using the sum of manual muscle strength test scores in 12 muscle groups (the sum score). This approach has been tested in patients with Guillain-Barre syndrome, yet little is known about the feasibility or test characteristics in other critically ill patients. We studied the feasibility and interobserver agreement of this sum score in a mixed cohort of critically ill and injured patients. Methods: We enrolled patients requiring more than 3 days of mechanical ventilation. Two observers performed systematic strength assessments of each patient. The primary outcome measure was interobserver agreement of weakness as a binary outcome (ICUAW is sum score less than 48; “no ICUAW” is a sum score greater than or equal to 48) using the Cohen’s kappa statistic. Results: We identified 135 patients who met the inclusion criteria. Most were precluded from study participation by altered mental status or polytrauma. Thirty-four participants were enrolled, and 30 of these individuals completed assessments conducted by both observers. Six met the criteria for ICUAW recorded by at least one observer. The observers agreed on the diagnosis of ICUAW for 93% of participants (Cohen’s kappa = 0.76; 95% confidence interval (CI), 0.44 to 1.0). Observer agreement was fair in the ICU (Cohen’s kappa = 0.38), and agreement was perfect after ICU discharge (Cohen’s kappa = 1.0). Absolute values of sum scores were similar between observers (intraclass correlation coefficient 0.83; 95% CI, 0.67 to 0.91), but they differed between observers by six points or more for 23% of the participants. Conclusions: Manual muscle testing (MMT) during critical illness was not possible for most patients because of coma, delirium and/or injury. 
Among patients who were able to participate in testing, we found that interobserver agreement regarding ICUAW was good, particularly when evaluated after ICU discharge. MMT is insufficient for early detection of ICU-acquired neuromuscular dysfunction in most patients and may be unreliable during critical illness.
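The relationship between the reported 93% raw agreement and the kappa of 0.76 can be sketched with a 2×2 agreement table. The abstract gives 30 participants, 93% agreement, and six participants classified as ICUAW by at least one observer, but not the individual cell counts. The split used below (4 agreed ICUAW, 1 + 1 disagreements, 24 agreed no ICUAW) is an assumption chosen to be consistent with those figures, and the function name is hypothetical.

```python
def cohens_kappa(both_yes: int, a_only: int, b_only: int, both_no: int) -> float:
    """Cohen's kappa for two raters and a binary outcome, from 2x2 cell counts."""
    n = both_yes + a_only + b_only + both_no
    observed = (both_yes + both_no) / n
    a_yes = (both_yes + a_only) / n   # proportion rated positive by rater A
    b_yes = (both_yes + b_only) / n   # proportion rated positive by rater B
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)  # chance agreement
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Assumed cell counts consistent with the abstract's summary figures.
    print(round(cohens_kappa(both_yes=4, a_only=1, b_only=1, both_no=24), 2))
```

With so few positive cases, chance agreement is already high (about 0.72 under the assumed split), which is why 93% raw agreement translates into a noticeably lower kappa.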

Journal Article•DOI•

A comparison of accelerometry and center of pressure measures during computerized dynamic posturography: a measure of balance.

Susan L. Whitney¹, Jennica L. Roche¹, Gregory F. Marchetti², Chia-Cheng Lin¹, Daniel P. Steed¹, Gabriel R. Furman³, Mark C. Musolino, Mark S. Redfern¹•Institutions (3)

University of Pittsburgh¹, RMIT University², Florida State University College of Arts and Sciences³

01 Apr 2011-Gait & Posture

TL;DR: The degree of association between COP and ACC was equivalent when using the first trial or the 3-trial average, suggesting that one trial may be sufficient in estimating balance function and minimizing clinical evaluation time.

Journal Article•DOI•

Data with hierarchical structure: impact of intraclass correlation and sample size on type-I error.

Serban C. Musca¹, Rodolphe Kamiejski², Armelle Nugier², Alain Méot², Abdelatif Er-Rafiy², Markus Brauer², Markus Brauer³•Institutions (3)

University of Rennes¹, Blaise Pascal University², Centre national de la recherche scientifique³

20 Apr 2011-Frontiers in Psychology

TL;DR: This work presents simulations showing how the Type-I error rate is affected under different conditions of intraclass correlation and sample size, and makes suggestions on how one should collect and analyze data bearing a hierarchical structure.

Abstract: Least squares analyses (e.g., ANOVAs, linear regressions) of hierarchical data lead to Type-I error rates that depart severely from the nominal Type-I error rate assumed. Thus, when least squares methods are used to analyze hierarchical data coming from designs in which some groups are assigned to the treatment condition, and others to the control condition (i.e., the widely used "groups nested under treatment" experimental design), the Type-I error rate is seriously inflated, leading too often to the incorrect rejection of the null hypothesis (i.e., the incorrect conclusion of an effect of the treatment). To highlight the severity of the problem, we present simulations showing how the Type-I error rate is affected under different conditions of intraclass correlation and sample size. For all simulations the Type-I error rate after application of the popular Kish (1965) correction is also considered, and the limitations of this correction technique are discussed. We conclude with suggestions on how one should collect and analyze data bearing a hierarchical structure.
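The Kish (1965) correction referred to here is usually expressed through the design effect, 1 + (m − 1) × ICC, where m is the number of observations per group. A minimal sketch of that quantity and the resulting effective sample size follows; the function names and example numbers are hypothetical, equal group sizes are assumed, and the abstract does not specify exactly how the authors applied the correction in their simulations.

```python
def design_effect(group_size: float, icc: float) -> float:
    """Kish design effect for equally sized groups: 1 + (m - 1) * ICC."""
    return 1.0 + (group_size - 1.0) * icc


def effective_sample_size(n_total: float, group_size: float, icc: float) -> float:
    """Number of independent observations the clustered sample is 'worth'."""
    return n_total / design_effect(group_size, icc)


if __name__ == "__main__":
    # Hypothetical design: 20 groups of 10 observations each.
    for icc in (0.0, 0.05, 0.20):
        print(icc, round(effective_sample_size(200, 10, icc), 1))
```

Even a small intraclass correlation shrinks the effective sample size substantially when groups are large, which is the mechanism behind the inflated Type-I error rates the abstract describes.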

Intraclass Correlation Coefficient.

William D. Johnson, Gary G. Koch

01 Jan 2011

Journal Article•DOI•

Test-retest reliability of the five-repetition sit-to-stand test: a systematic review of the literature involving adults.

Richard W. Bohannon¹•Institutions (1)

University of Connecticut¹

01 Nov 2011-Journal of Strength and Conditioning Research

TL;DR: The purpose of this review was to summarize the findings of research using the intraclass correlation coefficient (ICC) to describe the test-retest reliability of the FRSTST.

Abstract: The 5-repetition sit-to-stand test (FRSTST) is a widely used measure of functional strength, particularly among older adults. The purpose of this review was to summarize the findings of research using the intraclass correlation coefficient (ICC) to describe the test-retest reliability of the FRSTST. A search of 3 electronic databases and hand searches were used to identify relevant articles. Information on the subjects, test sessions and the ICCs reported was abstracted from the articles. The searches identified 10 relevant articles. The ICCs reported in the articles ranged from 0.64 to 0.96. The adjusted mean ICC calculated from the reported ICCs was 0.81. The test-retest reliability of the FRSTST can be interpreted as good to high in most populations and settings.

Journal Article•DOI•

Reproducibility of in vivo corneal confocal microscopy as a novel screening test for early diabetic sensorimotor polyneuropathy.

P. Hertz, Vera Bril¹, A. Orszag, A. Ahmed, Eduardo Ng¹, P. Nwe¹, Mylan Ngo¹, Bruce A. Perkins•Institutions (1)

University of Toronto¹

01 Oct 2011-Diabetic Medicine

TL;DR: In this article, the authors identify the most reliable in vivo corneal confocal microscopy (CCM) parameter for detection of abnormality of small nerve fibre morphology for early diabetic sensorimotor polyneuropathy.

Abstract: Diabet. Med. 28, 1253–1260 (2011). Aim: With the goal of identifying a valid biomarker of early diabetic sensorimotor polyneuropathy, we aimed to identify the most reliable in vivo corneal confocal microscopy (CCM) parameter for detection of abnormality of small nerve fibre morphology. Methods: Cross-sectional examination of 46 subjects (26 with Type 1 diabetes and 20 healthy volunteers) examined by corneal confocal microscopy for intra- and interobserver reproducibility by the intraclass correlation coefficient method. Corneal nerve fibre density, nerve branch density, nerve fibre length and tortuosity were measured on the same day that subjects underwent clinical and electrophysiological examination. Results: The 26 subjects with Type 1 diabetes had mean age and diabetes duration 42.8 ± 16.9 and 22.7 ± 16.4 years, respectively. Twelve of those subjects (46%) did not meet criteria for diabetic sensorimotor polyneuropathy, while five (19%) had mild, three (12%) had moderate and six (23%) had severe diabetic sensorimotor polyneuropathy. None of the healthy volunteers (mean age 41.4 ± 17.3 years) had polyneuropathy. Re-examination of selected corneal confocal microscopy images or sets of 40 images yielded very good to excellent intraclass correlation coefficients for all parameters. However, only one parameter (corneal nerve fibre length) emerged with consistently very good reproducibility using a clinically relevant ‘study-level’ protocol of subject re-examination (intra-observer intraclass correlation coefficient 0.72; interobserver intraclass correlation coefficient 0.73). Despite no differences in intraclass correlation coefficient between subgroups, corneal nerve fibre length was significantly lower (14.76 vs. 16.15 mm/mm², P = 0.04) in those with diabetes. Conclusions: Development of corneal confocal microscopy may need to focus on the measurement of corneal nerve fibre length, as it appears to have superior reliability in comparison with other parameters, and as evidence exists for its potential as a clinical biomarker of early diabetic sensorimotor polyneuropathy.

Journal Article•DOI•

Reliability of the landing error scoring system-real time, a clinical assessment tool of jump-landing biomechanics.

Darin A. Padua¹, Michelle C. Boling², Lindsay J. DiStefano³, James A. Onate⁴, Anthony I. Beutler⁵, Stephen W. Marshall•Institutions (5)

University of North Carolina at Chapel Hill¹, University of North Florida², University of Connecticut³, Ohio State University⁴, Uniformed Services University of the Health Sciences⁵

01 May 2011-Journal of Sport Rehabilitation

TL;DR: The LESS-RT is a quick, easy, and reliable clinical assessment tool that may be used by clinicians to identify individuals who may be at risk for lower extremity injuries.

Abstract: Context: There is a need for reliable clinical assessment tools that can be used to identify individuals who may be at risk for injury. The Landing Error Scoring System (LESS) is a reliable and valid clinical assessment tool that was developed to identify individuals at risk for lower extremity injuries. One limitation of this tool is that it cannot be assessed in real time and requires the use of video cameras. Objective: To determine the interrater reliability of a real-time version of the LESS, the LESS-RT. Design: Reliability study. Setting: Controlled research laboratory. Participants: 43 healthy volunteers (24 women, 19 men) between the ages of 18 and 23. Intervention: The LESS-RT evaluates 10 jump-landing characteristics that may predispose an individual to lower extremity injuries. Two sets of raters used the LESS-RT to evaluate participants as they performed 4 trials of a jump-landing task. Main Outcome Measures: Intraclass correlation coefficient (ICC 2,1) values for the final composite score of the LESS-RT were calculated to assess interrater reliability of the LESS-RT. Results: Interrater reliability (ICC2,1) for the LESS-RT ranged from .72 to .81 with standard error of measurements ranging from .69 to .79. Conclusions: The LESS-RT is a quick, easy, and reliable clinical assessment tool that may be used by clinicians to identify individuals who may be at risk for lower extremity injuries.
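For readers unfamiliar with the statistic named in this abstract, the following is a minimal sketch of how an ICC(2,1) (two-way random effects, absolute agreement, single rater) and a standard error of measurement could be computed from a subjects-by-raters table. The function names `icc_2_1` and `sem_from_icc` and the ratings are hypothetical illustrations, not data or code from the LESS-RT study, and the SEM convention used here (pooled SD times the square root of 1 minus ICC) is an assumption, since the abstract does not state which one the authors used.

```python
from math import sqrt
from statistics import stdev


def icc_2_1(ratings):
    """ICC(2,1) from a list of rows, one row per subject, one column per rater."""
    n = len(ratings)        # number of subjects
    k = len(ratings[0])     # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_subjects - ss_raters

    # Mean squares: between subjects, between raters, residual
    bms = ss_subjects / (n - 1)
    jms = ss_raters / (k - 1)
    ems = ss_error / ((n - 1) * (k - 1))

    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


def sem_from_icc(ratings, icc):
    """SEM = SD * sqrt(1 - ICC), using the SD of all scores (an assumed convention)."""
    scores = [x for row in ratings for x in row]
    return stdev(scores) * sqrt(1 - icc)


# Illustrative ratings only: 6 subjects scored by 4 raters.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_2_1(example)
print(f"ICC(2,1) = {icc:.2f}, SEM = {sem_from_icc(example, icc):.2f}")
```

Because ICC(2,1) penalizes systematic differences between raters, a rater who scores consistently higher or lower than the others pulls the coefficient down even when the rank ordering of subjects is identical.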

Journal Article•DOI•

Measures of interrater agreement.

Jayawant N. Mandrekar¹•Institutions (1)

Mayo Clinic¹

01 Jan 2011-Journal of Thoracic Oncology

TL;DR: Introduces the weighted kappa for assessing agreement when the outcome is ordinal, and the intraclass correlation for assessing agreement when the data are measured on a continuous scale.
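As an illustration of the first statistic mentioned here, the sketch below computes Cohen's weighted kappa for two raters on an ordinal scale, using disagreement weights. The function name `weighted_kappa`, the integer coding of categories as 0 to `n_categories - 1`, and the sample ratings are all assumptions made for this example and do not come from the article.

```python
def weighted_kappa(rater_a, rater_b, n_categories, quadratic=False):
    """Cohen's weighted kappa for two raters; categories coded 0..n_categories-1."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    if n_categories < 2:
        raise ValueError("at least two categories are required")
    n = len(rater_a)

    def weight(i, j):
        # Disagreement weight: 0 on the diagonal, 1 for the most distant categories
        d = abs(i - j) / (n_categories - 1)
        return d * d if quadratic else d

    # Observed weighted disagreement
    observed = sum(weight(a, b) for a, b in zip(rater_a, rater_b)) / n

    # Weighted disagreement expected by chance, from each rater's marginals
    marg_a = [rater_a.count(c) / n for c in range(n_categories)]
    marg_b = [rater_b.count(c) / n for c in range(n_categories)]
    expected = sum(
        weight(i, j) * marg_a[i] * marg_b[j]
        for i in range(n_categories)
        for j in range(n_categories)
    )
    if expected == 0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1 - observed / expected


# Illustrative ordinal ratings on a 3-point scale (0, 1, 2)
a = [0, 1, 2, 1, 0, 2]
b = [0, 2, 2, 1, 0, 1]
print(round(weighted_kappa(a, b, n_categories=3), 3))
```

Linear weights treat each step of disagreement equally; passing `quadratic=True` penalizes large disagreements more heavily.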

Journal Article•DOI•

Validation of Patient Health Questionnaire for depression screening among primary care patients in Taiwan.

Shen-Ing Liu¹, Zai-Ting Yeh², Hui-Chun Huang³, Hui-Chun Huang¹, Fang-Ju Sun³, Fang-Ju Sun¹, Jin-Jin Tjung¹, Lee-Ching Hwang¹, Lee-Ching Hwang³, Yang-Hsien Shih¹, Andrew Wei-Chiang Yeh⁴•Institutions (4)

Mackay Memorial Hospital¹, Fu Jen Catholic University², Mackay Medical College³, Chang Gung University⁴

01 Jan 2011-Comprehensive Psychiatry

TL;DR: The PHQ-9 and its 2 subscales, PHQ-2 and PHQ-1, seem reliable and valid for detecting MDD among Chinese primary care patients.

Journal Article•DOI•

Reliability and validity of age band 1 of the Movement Assessment Battery for Children--second edition.

Theodoros Ellinoudis¹, Christina Evaggelinou¹, Thomas Kourtessis², Zoe Konstantinidou¹, Fotini Venetsanou², Antonis Kambas²•Institutions (2)

Aristotle University of Thessaloniki¹, Democritus University of Thrace²

01 May 2011-Research in Developmental Disabilities

TL;DR: The results suggest that the MABC-2 can be a reliable and valid tool for the assessment of movement difficulties among 3-5-year-old children.

Journal Article•DOI•

A Clinical Tool to Measure Trunk Control in Children with Cerebral Palsy: The Trunk Control Measurement Scale.

Lieve Heyrman¹, Guy Molenaers, Kaat Desloovere¹, Geert Verheyden, Josse De Cat, Elegast Monbaliu¹, Hilde Feys¹•Institutions (1)

Katholieke Universiteit Leuven¹

01 Nov 2011-Research in Developmental Disabilities

TL;DR: The results support the reliability and validity of the TCMS in children with spastic CP and the scale gives insight into the strengths and weaknesses of the child's trunk performance and therefore can have valuable clinical use.

Journal Article•DOI•

Isometric muscle strength in youth assessed by hand-held dynamometry: a feasibility, reliability, and validity study.

Luc J. Hébert, Désirée B. Maltais, Céline Lepage, Joanne Saulnier, Mélanie Crête, Marc Perron

01 Jan 2011-Pediatric Physical Therapy

TL;DR: In this article, a hand-held dynamometer (HHD) protocol for measuring maximal isometric torque (MIT) was evaluated for feasibility over a wide age range, intra- and interrater reliability, standard error of measurement, and concurrent validity.

Abstract: Purpose: To determine, with respect to measurement of maximal isometric torque (MIT) using a specific hand-held dynamometer (HHD) protocol, (1) protocol feasibility over a wide age range, (2) intra- and interrater reliability, (3) standard error of measurement, and (4) concurrent validity. Methods: The MIT of selected upper and lower limb muscle groups was assessed (n = 74; age = 4-17.5 years) using a standardized HHD protocol. Testing was repeated in 20 adolescents (n = 10 for each muscle group), who were also assessed with a Cybex dynamometer. Results: The protocol was feasible for all participants. Mean intra- and interrater reliability [intraclass correlation coefficient (ICC)] varied from 0.75 to 0.98, except for ankle dorsiflexor interrater reliability (mean ICC = 0.67). The standard error of measurement varied from 0.5 to 4.9 Nm and was highest for hip extensors. Mean concurrent validity (ICC) varied from 0.78 to 0.93, except for ankle plantar flexors (mean ICC = 0.48). Conclusions: Our HHD protocol was feasible over a wide age range and most MIT values were valid and reliable.

Journal Article•DOI•

A new, validated instrument to evaluate competency in microsurgery: the University of Western Ontario Microsurgical Skills Acquisition/Assessment instrument [outcomes article].

Claire L.F. Temple¹, Douglas C. Ross•Institutions (1)

University of Western Ontario¹

01 Jan 2011-Plastic and Reconstructive Surgery

TL;DR: Presents a model for microsurgery learning and a validated instrument to evaluate microsurgical competency; measures of construct validity demonstrated that higher scores on the UWOMSA were associated with faster knot tying and higher postgraduate year level.

Abstract: BACKGROUND: The authors present a model for microsurgery learning as well as a validated instrument to evaluate microsurgical competency. METHODS: Novice microsurgeons participated in three 3-hour sessions wherein they completed a number of increasingly complex, standardized microsurgical tasks. Performance was recorded and graded using a newly developed University of Western Ontario Microsurgery Skills Acquisition/Assessment (UWOMSA) instrument. The knot-tying and anastomosis modules contained three categories with five-point Likert scales. Each learner's performance was assessed by two blinded surgeons. Reznick's validated global rating scale for operative performance was utilized to establish criterion validity. Within-scale scores were compared via intraclass correlation and between-scale scores with Pearson correlation coefficient. Linear regression was used to evaluate the effect of various predictors on UWOMSA scores. RESULTS: Thirty-seven videos (9.6 hours) were reviewed, including 20 knot-tying sessions and 17 anastomoses. Interrater reliability of UWOMSA was high, with an intraclass correlation coefficient of 0.75 (0.57, 0.87). The intraclass correlation of the global rating scale was 0.79 (0.62, 0.89). Intrarater reliability of the UWOMSA was also high, with an intraclass correlation of 0.69 (0.48, 0.83). The intraclass correlation of the global rating scale was 0.69 (0.47, 0.84). Measures of criterion validity demonstrated strong agreement between UWOMSA and the global rating scale (Pearson correlation coefficient, 0.96; p < 0.001). Measures of construct validity demonstrated that higher scores on the UWOMSA were associated with faster knot tying (p < 0.0001) and higher postgraduate year level (p = 0.05). CONCLUSIONS: The UWOMSA instrument performed well in terms of reliability and validity. Further study is planned to assess the instrument's ability to predict microsurgical skills translation to the clinical setting.

Journal Article•DOI•

United States (US) multi-center study to assess the validity and reliability of the Spinal Cord Independence Measure (SCIM III)

Kim D. Anderson¹, M. E. Acuff², B. G. Arp², Deborah Backus³, S. Chun⁴, K. Fisher⁴, J. E. Fjerstad, Daniel E. Graves⁵, Karen Greenwald⁶, S. L. Groah⁷, Susan J. Harkema⁸, John A. Horton⁶, M. N. Huang⁹, M. Jennings⁴, K. S. Kelley¹⁰, S. M. Kessler¹¹, Steve Kirshblum¹², S. Koltenuk¹², M. Linke¹³, Inger Ljungberg⁷, Janos Nagy¹⁴, L. Nicolini¹⁵, Mary Joan Roach¹⁴, S. Salles¹⁶, William M. Scelza¹⁷, Mary Schmidt Read¹⁵, Ronald K. Reeves¹¹, M. D. Scott⁹, Keith E. Tansey³, J. L. Theis, C. Z. Tolfo⁸, M. Whitney¹⁷, Carla Williams¹³, C. M. Winter¹⁰, Jeanne M. Zanca¹⁸•Institutions (18)

University of California, Irvine¹, University of Missouri², Emory University³, Veterans Health Administration⁴, Baylor College of Medicine⁵, University of Pittsburgh⁶, Memorial Hospital of South Bend⁷, University of Louisville⁸, Rancho Los Amigos National Rehabilitation Center⁹, Northwestern University¹⁰, Mayo Clinic¹¹, Kessler Institute for Rehabilitation¹², St. Joseph's Hospital and Medical Center¹³, Case Western Reserve University¹⁴, Magee Rehabilitation Hospital¹⁵, University of Kentucky¹⁶, Carolinas Healthcare System¹⁷, Icahn School of Medicine at Mount Sinai¹⁸

01 Aug 2011-Spinal Cord

TL;DR: Overall, the SCIM III is a reliable and valid measure of functional change in SCI; however, improved scoring instructions and a few modifications to the scoring categories may reduce variability between raters and enhance clinical utility.

Abstract: Study design: Multi-center, prospective, cohort study. Objectives: To assess the validity and reliability of the Spinal Cord Independence Measure (SCIM III) in measuring functional ability in persons with spinal cord injury (SCI). Setting: Inpatient rehabilitation hospitals in the United States (US). Methods: Functional ability was measured with the SCIM III during the first week of admittance into inpatient acute rehabilitation and within one week of discharge from the same rehabilitation program. Motor and sensory neurologic impairment was measured with the American Spinal Injury Association Impairment Scale. The Functional Independence Measure (FIM), the default functional measure currently used in most US hospitals, was used as a comparison standard for the SCIM III. Statistical analyses were used to test the validity and reliability of the SCIM III. Results: Total agreement between raters was above 70% on most SCIM III tasks and all κ-coefficients were statistically significant (P<0.001). The coefficients of Pearson correlation between the paired raters were above 0.81 and intraclass correlation coefficients were above 0.81. Cronbach’s α was above 0.7, with the exception of the respiration task. The coefficient of Pearson correlation between the FIM and SCIM III was 0.8 (P<0.001). For the respiration and sphincter management subscale, the SCIM III was more responsive to change than the FIM (P<0.0001). Conclusions: Overall, the SCIM III is a reliable and valid measure of functional change in SCI. However, improved scoring instructions and a few modifications to the scoring categories may reduce variability between raters and enhance clinical utility.

Journal Article•DOI•

An objective device for measuring surface roughness of skin and scars

Monica C. T. Bloemen, Maaike S. van Gerven, Martijn B. A. van der Wal, Pauline D. H. M. Verhaegen, Esther Middelkoop¹•Institutions (1)

VU University Medical Center¹

01 Apr 2011-Journal of The American Academy of Dermatology

TL;DR: The PRIMOS is a valid and reliable tool for objective noninvasive evaluation of surface roughness of both skin and burn scars and is suitable for use in a clinical setting.

Abstract: Background: Scar formation remains a major clinical problem; therefore, various therapies have been developed to improve scar quality. To evaluate the effectiveness of these therapies, objective measurement tools are necessary. An appropriate, objective measuring instrument for assessment of surface roughness is not yet available in a clinical setting. The Phaseshift Rapid In Vivo Measurement of the Skin (PRIMOS) (GFMesstechnik GmbH, Teltow, Germany) could be such an instrument. This device noninvasively produces a 3-dimensional image of the skin microtopography and measures surface roughness. Objective: The aim of this study was to investigate the reliability and validity of the PRIMOS for objective and quantitative measurement of surface roughness of skin and scars. Methods: Three observers assessed skin and burn scars in 60 patients using the PRIMOS and a subjective scale, the Patient and Observer Scar Assessment Scale. Reliability was tested using the intraclass correlation of intraobserver and interobserver measurements. An intraclass correlation coefficient of 0.7 or greater was required for reliable results. To test validity, scores of the PRIMOS were compared with scores of the subjective scale (Pearson correlation). A Pearson correlation coefficient greater than 0.6 was considered a strong positive correlation. Results: All 3 surface roughness parameters of the PRIMOS showed good intraobserver and interobserver reliability for skin and scars (intraclass correlation coefficient arithmetic mean of surface roughness > 0.85, mean of 5 highest peaks and 5 deepest valleys from entire measuring field > 0.88, peak count > 0.86). The parameter arithmetic mean of surface roughness showed a strong correlation with the subjective score (Pearson arithmetic mean of surface roughness 0.70; mean of 5 highest peaks and 5 deepest valleys from entire measuring field 0.53; peak count 0.54). Limitations: The reliability and validity of the PRIMOS were only tested on skin and burn scars, not in other dermatologic diseases. Conclusions: The PRIMOS is a valid and reliable tool for objective noninvasive evaluation of surface roughness of both skin and burn scars.

Journal Article•DOI•

Reliability and validity of a visual analogue scale used by owners to measure chronic pain attributable to osteoarthritis in their dogs.

Anna Hielm-Björkman¹, Amy S. Kapatkin¹, Hannu J. Rita²•Institutions (2)

University of Helsinki¹, University of California, Davis²

01 May 2011-American Journal of Veterinary Research

TL;DR: Although valid and reliable, the pain VAS was a poor tool for untrained owners because of poor face validity (ie, owners could not recognize their dogs' behavior as signs of pain). Only after owners had seen pain diminish and then return (after starting and discontinuing NSAID use) did the VAS have face validity.

Abstract: Objective—To assess validity and reliability for a visual analogue scale (VAS) used by owners to measure chronic pain in their osteoarthritic dogs. Sample—68, 61, and 34 owners who completed a questionnaire. Procedures—Owners answered questionnaires at 5 time points. Criterion validity of the VAS was evaluated for all dogs in the intended-to-treat population by correlating scores for the VAS with scores for the validated Helsinki Chronic Pain Index (HCPI) and a relative quality-of-life scale. Intraclass correlation was used to assess repeatability of the pain VAS at 2 baseline evaluations. To determine sensitivity to change and face validity of the VAS, 2 blinded, randomized control groups (17 dogs receiving carprofen and 17 receiving a placebo) were analyzed over time. Results—Significant correlations existed between the VAS score and the quality-of-life scale and HCPI scores. Intraclass coefficient (r = 0.72; 95% confidence interval, 0.57 to 0.82) for the VAS indicated good repeatability. In the carprof...

Journal Article•DOI•

Screening for depression with the Patient Health Questionnaire-2 (PHQ-2) among the general population in Hong Kong

Xiaonan Yu¹, Sunita M. Stewart², Paul T.K. Wong¹, Tai Hing Lam¹•Institutions (2)

University of Hong Kong¹, University of Texas Southwestern Medical Center²

01 Nov 2011-Journal of Affective Disorders

TL;DR: Evidence is provided for the PHQ-2 as a reliable and valid screening tool for depressive symptoms among a randomly recruited community sample in Hong Kong.

Journal Article•DOI•

Is the Movement Assessment Battery for Children-2nd edition a reliable instrument to measure motor performance in 3 year old children?

Bouwien C. M. Smits-Engelsman, Anuschka S. Niemeijer, Hilde Van Waelvelde¹•Institutions (1)

Ghent University¹

01 Jul 2011-Research in Developmental Disabilities

TL;DR: The revised test can be applied to assess motor performance in typically developing 3-year-old children; future studies are needed to confirm whether the same holds for children with motor delays.

Journal Article•DOI•

Mobility Assessment: Sensitivity and Specificity of Measurement Sets in Older Adults

Victoria P. Panzer¹, Dorothy Wakefield¹, Charles B. Hall², Leslie Wolfson¹•Institutions (2)

University of Connecticut Health Center¹, Albert Einstein College of Medicine²

01 Jun 2011-Archives of Physical Medicine and Rehabilitation

TL;DR: Sets of quantitative measurement variables obtained with this mobility battery provided sensitive prediction of future injury falls and screening for multiple subsequent falls by using tasks that should be appropriate to diverse participants.

Journal Article•DOI•

Accuracy, Reproducibility and Repeatability of Ultrasonography in the Assessment of Abdominal Adiposity

Alberto Bazzocchi¹, Giacomo Filonzi¹, Federico Ponti¹, Claudia Sassi¹, Eugenio Salizzoni¹, Giuseppe Battista¹, Romeo Canini¹•Institutions (1)

University of Bologna¹

01 Sep 2011-Academic Radiology

TL;DR: Ultrasound is accurate, reproducible, and fast in the analysis of abdominal adiposity and offers a regional, easy, and close-at-hand evaluation of subcutaneous and visceral fat compartments.
