
Showing papers on "Intraclass correlation" published in 2011


MonographDOI
01 Oct 2011
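TL;DR: This monograph surveys observational methods, covering coding schemes, recording and representing observational data, observer agreement (Cohen's kappa and kappas for point-by-point agreement), the intraclass correlation coefficient for summary measures, summary statistics for individual codes, and sequential analysis.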
Abstract: 1. Introduction to observational methods 2. Coding schemes and observational measurement 3. Recording observational data 4. Representing observational data 5. Observer agreement and Cohen's kappa 6. Kappas for point-by-point agreement 7. The intraclass correlation coefficient (ICC) for summary measures 8. Summary statistics for individual codes 9. Cell and summary statistics for contingency tables 10. Preparing for sequential and other analyses 11. Time-window and log-linear sequential analysis 12. Recurrence analysis and permutation tests.

523 citations


Journal ArticleDOI
TL;DR: SINS demonstrated near-perfect inter- and intraobserver reliability in determining three clinically relevant categories of stability in patients with spinal tumor-related spinal instability.
Abstract: Purpose Standardized indications for treatment of tumor-related spinal instability are hampered by the lack of a valid and reliable classification system. The objective of this study was to determine the interobserver reliability, intraobserver reliability, and predictive validity of the Spinal Instability Neoplastic Score (SINS). Methods Clinical and radiographic data from 30 patients with spinal tumors were classified as stable, potentially unstable, and unstable by members of the Spine Oncology Study Group. The median category for each patient case (consensus opinion) was used as the gold standard for predictive validity testing. On two occasions at least 6 weeks apart, each rater also scored each patient using SINS. Each total score was converted into a three-category data field, with 0 to 6 as stable, 7 to 12 as potentially unstable, and 13 to 18 as unstable. Results The κ statistics for interobserver reliability were 0.790, 0.841, 0.244, 0.456, 0.462, and 0.492 for the fields of location, pain, bone quality, alignment, vertebral body collapse, and posterolateral involvement, respectively. The κ statistics for intraobserver reliability were 0.806, 0.859, 0.528, 0.614, 0.590, and 0.662 for the same respective fields. Intraclass correlation coefficients for inter- and intraobserver reliability of total SINS score were 0.846 (95% CI, 0.773 to 0.911) and 0.886 (95% CI, 0.868 to 0.902), respectively. The κ statistic for predictive validity was 0.712 (95% CI, 0.676 to 0.766). Conclusion SINS demonstrated near-perfect inter- and intraobserver reliability in determining three clinically relevant categories of stability. The sensitivity and specificity of SINS for potentially unstable or unstable lesions were 95.7% and 79.5%, respectively. J Clin Oncol 29:3072-3077. © 2011 by American Society of Clinical Oncology
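The conversion of a total SINS score into the three-category field described in this abstract can be sketched as follows. This is a minimal illustration: the function name `sins_category` is hypothetical, and only the cut-offs (0 to 6, 7 to 12, 13 to 18) come from the abstract.

```python
def sins_category(total_score: int) -> str:
    """Map a total SINS score (0-18) to one of three stability categories.

    The cut-offs follow the abstract; the function name is illustrative.
    """
    if not 0 <= total_score <= 18:
        raise ValueError("SINS total score must be between 0 and 18")
    if total_score <= 6:
        return "stable"
    if total_score <= 12:
        return "potentially unstable"
    return "unstable"


if __name__ == "__main__":
    # Print the category at each boundary of the scale.
    for score in (0, 6, 7, 12, 13, 18):
        print(score, sins_category(score))
```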

412 citations


Journal ArticleDOI
TL;DR: This study provides evidence that novice raters can perform digital algometry with adequate reliability for research and clinical use in people with and without neck pain.
Abstract: Study Design Clinical measurement. Objectives To evaluate the intrarater, interrater, and test-retest reliability of an accessible digital algometer, and to determine the minimum detectable change in normal healthy individuals and a clinical population with neck pain. Background Pressure pain threshold testing may be a valuable assessment and prognostic indicator for people with neck pain. To date, most of this research has been completed using algometers that are too resource intensive for routine clinical use. Methods Novice raters (physiotherapy students or clinical physiotherapists) were trained to perform algometry testing over 2 clinically relevant sites: the angle of the upper trapezius and the belly of the tibialis anterior. A convenience sample of normal healthy individuals and a clinical sample of people with neck pain were tested by 2 different raters (all participants) and on 2 different days (healthy participants only). Intraclass correlation coefficient (ICC), standard error of measurement, ...

372 citations


Journal ArticleDOI
01 Feb 2011-Stroke
TL;DR: Standardized measurement methods and training of therapist assessors for a multi-site, rehabilitation, randomized, clinical trial resulted in high inter-rater reliability for the Fugl-Meyer motor and sensory assessments.
Abstract: Background and Purpose—Outcome measurement fidelity within and between sites of multi-site, randomized, clinical trials is an essential element to meaningful trial outcomes. As important are the methods developed for randomized, clinical trials that can have practical utility for clinical practice. A standardized measurement method and rater training program were developed for the total Fugl-Meyer motor and sensory assessments; inter-rater reliability was used to test program effectiveness. Methods—Fifteen individuals with hemiparetic stroke, 17 trained physical therapists across 5 regional clinical sites, and an expert rater participated in an inter-rater reliability study of the Fugl-Meyer motor (total, upper extremity, and lower extremity subscores) and sensory (total, light touch, and proprioception subscores) assessments. Results—Intra-rater reliability for the expert rater was high for the motor and sensory scores (range, 0.95–1.0). Inter-rater agreement (intraclass correlation coefficient, 2, 1) be...

327 citations


Journal ArticleDOI
TL;DR: The results showed that the TUG and the DGI have generally acceptable random measurement error and test-retest reliability, which should help clinicians and researchers determine whether a change in an individual patient with PD is a true change.
Abstract: Background The minimal detectable change (MDC) is the smallest amount of difference in individual scores that represents true change (beyond random measurement error). The MDCs of the Timed “Up & Go” Test (TUG) and the Dynamic Gait Index (DGI) in people with Parkinson disease (PD) are largely unknown, limiting the interpretability of the change scores of both measures. Objective The purpose of this study was to estimate the MDCs of the TUG and the DGI in people with PD. Design This investigation was a prospective cohort study. Methods Seventy-two participants were recruited from special clinics for movement disorders at a university hospital. Their mean age was 67.5 years, and 61% were men. All participants completed the TUG and the DGI assessments twice, about 14 days apart. The MDC was calculated from the standard error of measurement. The percentage MDC (MDC%) was calculated as the MDC divided by the mean of all scores for the sample. Furthermore, the intraclass correlation coefficient was used to examine the reproducibility between testing sessions (test-retest reliability). Results The respective MDC and MDC% of the TUG were 3.5 seconds and 29.8, and those of the DGI were 2.9 points and 13.3. The test-retest reliability values for the TUG and the DGI were high; the intraclass correlation coefficients were .80 and .84, respectively. Limitations The study sample was a convenience sample, and the participants had mild to moderately severe PD. Conclusions The results showed that the TUG and the DGI have generally acceptable random measurement error and test-retest reliability. These findings should help clinicians and researchers determine whether a change in an individual patient with PD is a true change.
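The abstract states that the MDC was derived from the standard error of measurement (SEM) and that MDC% is the MDC divided by the sample mean, but it does not give the formulas. The sketch below assumes the commonly used conventions SEM = SD × √(1 − ICC) and MDC at the 95% level = 1.96 × √2 × SEM; these formulas, the function names, and the input values are assumptions for illustration, not taken from the study.

```python
import math


def standard_error_of_measurement(sd: float, icc: float) -> float:
    """SEM under the assumed convention SEM = SD * sqrt(1 - ICC)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC is expected to lie between 0 and 1 here")
    return sd * math.sqrt(1.0 - icc)


def minimal_detectable_change(sem: float, z: float = 1.96) -> float:
    """MDC under the assumed convention MDC = z * sqrt(2) * SEM.

    The sqrt(2) term accounts for measurement error in both sessions.
    """
    return z * math.sqrt(2.0) * sem


def mdc_percent(mdc: float, mean_score: float) -> float:
    """MDC expressed as a percentage of the mean score, as in the abstract."""
    return 100.0 * mdc / mean_score


if __name__ == "__main__":
    # Hypothetical inputs: SD of 3.0 s, ICC of 0.80, mean score of 12.0 s.
    sem = standard_error_of_measurement(sd=3.0, icc=0.80)
    mdc = minimal_detectable_change(sem)
    print(round(sem, 2), round(mdc, 2), round(mdc_percent(mdc, 12.0), 1))
```

Under these assumptions, a change in an individual's score smaller than the MDC cannot be distinguished from random measurement error.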

295 citations


Journal ArticleDOI
TL;DR: Online self-administered data collection, by reducing the logistic burden and cost, could advantageously replace classical methods based on dietitian's interviews when assessing dietary intake in large epidemiological studies.
Abstract: Online self-administered data collection, by reducing the logistic burden and cost, could advantageously replace classical methods based on dietitian's interviews when assessing dietary intake in large epidemiological studies. Studies comparing such new instruments with traditional methods are necessary. Our objective was to compare one NutriNet-Sante web-based self-administered 24 h dietary record with one 24 h recall carried out by a dietitian. Subjects completed the web-based record, which was followed the next day by a dietitian-conducted 24 h recall by telephone (corresponding to the same day and using the same computerised interface for data entry). The subjects were 147 volunteers aged 48-75 years (women 59·2 %). The study was conducted in February 2009 in France. Agreement was assessed by intraclass correlation coefficients (ICC) for foods and energy-adjusted Pearson's correlations for nutrients. Agreement between the two methods was high, although it may have been overestimated because the two assessments were consecutive to one another. Among consumers only, the median of ICC for foods was 0·8 in men and 0·7 in women (range 0·5-0·9). The median of energy-adjusted Pearson's correlations for nutrients was 0·8 in both sexes (range 0·6-0·9). The mean Pearson correlation was higher in subjects ≤ 60 years (P = 0·02) and in those who declared being 'experienced/expert' with computers (P = 0·0003), but no difference was observed according to educational level (P = 0·12). The mean completion time was similar between the two methods (median for both methods: 25 min). The web-based method was preferred by 66·1 % of users. Our web-based dietary assessment, permitting considerable logistic simplification and cost savings, may be highly advantageous for large population-based surveys.

243 citations


Journal ArticleDOI
TL;DR: The Spanish version of the 10-item Connor-Davidson Resilience Scale showed good psychometric properties in young adults and thus can be used as a reliable and valid instrument for measuring resilience.
Abstract: The 10-item Connor-Davidson Resilience Scale (10-item CD-RISC) is an instrument for measuring resilience that has shown good psychometric properties in its original version in English. The aim of this study was to evaluate the validity and reliability of the Spanish version of the 10-item CD-RISC in young adults and to verify whether it is structured in a single dimension as in the original English version. Cross-sectional observational study including 681 university students ranging in age from 18 to 30 years. The number of latent factors in the 10 items of the scale was analyzed by exploratory factor analysis. Confirmatory factor analysis was used to verify whether a single factor underlies the 10 items of the scale as in the original version in English. The convergent validity was analyzed by testing whether the mean of the scores of the mental component of SF-12 (MCS) and the quality of sleep as measured with the Pittsburgh Sleep Index (PSQI) were higher in subjects with better levels of resilience. The internal consistency of the 10-item CD-RISC was estimated using the Cronbach α test and test-retest reliability was estimated with the intraclass correlation coefficient. The Cronbach α coefficient was 0.85 and the test-retest intraclass correlation coefficient was 0.71. The mean MCS score and the level of quality of sleep in both men and women were significantly worse in subjects with lower resilience scores. The Spanish version of the 10-item CD-RISC showed good psychometric properties in young adults and thus can be used as a reliable and valid instrument for measuring resilience. Our study confirmed that a single factor underlies the resilience construct, as was the case of the original scale in English.

240 citations


Journal ArticleDOI
TL;DR: The RAID score is a patient-derived composite score assessing the seven most important domains of impact of RA, and is now validated; sensitivity to change should be further examined in larger studies.
Abstract: Objective A patient-derived composite measure of the impact of rheumatoid arthritis (RA), the rheumatoid arthritis impact of disease (RAID) score, takes into account pain, functional capacity, fatigue, physical and emotional wellbeing, quality of sleep and coping. The objectives were to finalise the RAID and examine its psychometric properties. Methods An international multicentre cross-sectional and longitudinal study of consecutive RA patients from 12 European countries was conducted to examine the psychometric properties of the different combinations of instruments that might be included within the RAID combinations scale (numeric rating scales (NRS) or various questionnaires). Construct validity was assessed cross-sectionally by Spearman correlation, reliability by intraclass correlation coefficient (ICC) in 50 stable patients, and sensitivity to change by standardised response means (SRM) in 88 patients whose treatment was intensified. Results 570 patients (79% women, mean±SD age 56±13 years, disease duration 12.5±10.3 years, disease activity score (DAS28) 4.1±1.6) participated in the validation study. NRS questions performed as well as longer combinations of questionnaires: the final RAID score is composed of seven NRS questions. The final RAID correlated strongly with patient global (R=0.76) and significantly also with other outcomes (DAS28 R=0.69, short form 36 physical −0.59 and mental −0.55, p Conclusion The RAID score is a patient-derived composite score assessing the seven most important domains of impact of RA. This score is now validated; sensitivity to change should be further examined in larger studies.

234 citations


Journal ArticleDOI
TL;DR: Figures indicate good overall levels of internal consistency, inter-rater and test-retest reliability, but some HRSD items do not appear to possess a satisfactory reliability.
Abstract: The aim of this study was to provide a comprehensive meta-analytic review of the reliability of the Hamilton Rating Scale for Depression (HRSD) for the period 1960-2008, taking into consideration all three types of reliability: internal consistency, inter-rater, and test-retest reliability. This is the first such meta-analytic study of a clinician-administered psychiatric scale. A thorough literature search was conducted using MEDLINE and PsycINFO. The total number of collected articles was 5548, of which 409 reported one or more reliability coefficients. The effect size was obtained by the z-transformation of reliability coefficients. The meta-analysis was performed separately for internal consistency, inter-rater and test-retest reliability. A pooled mean for alpha coefficient in random effects model was 0.789 (95%CI 0.766-0.810). The meta-regression analysis revealed that higher alpha coefficients were associated with higher variability of the HRSD total scores. With regard to inter-rater reliability, pooled means in random effects model were 0.937 (95%CI 0.914-0.954) for the intraclass correlation coefficient, 0.81 (95%CI 0.72-0.88) for the kappa coefficient, 0.94 (95%CI 0.90-0.97) for the Pearson correlation coefficient, and 0.91 (95%CI 0.78-0.96) for the Spearman rank correlation coefficient. A meta-regression analysis showed positive association between inter-rater reliability and publication year. Test-retest reliability of HRSD ranged between 0.65 and 0.98 and generally decreased with extending the interval between two measurements (Spearman r between the duration of interval and test-retest reliability figures=-0.74). Results suggest that HRSD provides a reliable assessment of depression. Figures indicate good overall levels of internal consistency, inter-rater and test-retest reliability, but some HRSD items (e.g., "loss of insight") do not appear to possess a satisfactory reliability.
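The z-transformation mentioned in this abstract can be illustrated with Fisher's z. The sketch below is a simplification under stated assumptions: it pools coefficients with fixed weights of n − 3 (the usual inverse-variance weight for a Pearson correlation), whereas the study used a random-effects model, which would add a between-study variance term. The function names and inputs are hypothetical.

```python
import math


def fisher_z(r: float) -> float:
    """Fisher's z-transformation of a coefficient with -1 < r < 1."""
    return math.atanh(r)


def inverse_fisher_z(z: float) -> float:
    """Back-transform a z value to the original coefficient scale."""
    return math.tanh(z)


def pooled_coefficient(coefficients, sample_sizes):
    """Pool coefficients on the z scale with weights n - 3, then back-transform.

    Fixed-effect simplification; not the random-effects model of the study.
    """
    if len(coefficients) != len(sample_sizes):
        raise ValueError("one sample size is required per coefficient")
    if any(n <= 3 for n in sample_sizes):
        raise ValueError("each sample size must exceed 3")
    weights = [n - 3 for n in sample_sizes]
    z_values = [fisher_z(r) for r in coefficients]
    z_mean = sum(w * z for w, z in zip(weights, z_values)) / sum(weights)
    return inverse_fisher_z(z_mean)


if __name__ == "__main__":
    # Hypothetical reliability coefficients from three studies.
    print(round(pooled_coefficient([0.90, 0.94, 0.96], [40, 60, 25]), 3))
```

Averaging on the z scale rather than on the raw scale avoids the bias that arises because coefficients near 1 have a strongly skewed sampling distribution.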

215 citations


Journal ArticleDOI
TL;DR: This is the first study to present test-retest reliability data on the self-reported OPUS scales, the PSFS in people with lower-limb amputations, and a new, easier-to-use scoring mechanism for the PEQ.
Abstract: Background Use of outcome measures to examine outcomes of amputation is complicated by a number of factors, including ease of administration and lack of scientific evidence to guide selection and interpretation. Objective The purposes of this study were: (1) to estimate test-retest reliability of a modified version of the Prosthetic Evaluation Questionnaire (PEQ), scales of a version of the 36-Item Short-Form Health Survey questionnaire adapted for the veteran population (SF-36V), the Orthotics and Prosthetics Users' Survey (OPUS), the Patient-Specific Functional Scale (PSFS), the Two-Minute Walk Test, the Six-Minute Walk Test, the Timed “Up & Go” Test, and the Amputee Mobility Predictor; (2) to calculate minimal detectable change (MDC) of each measure; and (3) to conduct item analysis of the modified PEQ. Design This was a multi-site study with repeated measurements. Methods Forty-four patients with unilateral lower-limb amputation participated. Participants were tested twice within 1 week. We calculated test-retest reliability of each measure using intraclass correlation coefficient (ICC [2,1]), estimated standard error of the measurement and MDC, and assessed scale score distribution. Results The study demonstrated strong test-retest reliability scores of performance measures (ICC=.83–.97) suggesting that these measures are good choices for evaluation of people with lower-limb amputation. Reliability of PEQ subscales (ICC=.41–.93) was comparable to that reported in the literature (ICC=.56–.90). Limitations This study examined only statistically measurable differences and did not evaluate whether changes in scores were clinically important. Conclusions Minimal detectable change scores can be used to determine whether change in test scores exceeds measurement error associated with day-to-day variation. 
This is the first study to present test-retest reliability data on the self-reported OPUS scales, the PSFS in people with lower-limb amputations, and a new, easier-to-use scoring mechanism for the PEQ.

208 citations


Journal ArticleDOI
TL;DR: Manual muscle testing (MMT) during critical illness was not possible for most patients because of coma, delirium and/or injury. Among patients who could be tested, interobserver agreement regarding ICUAW was good, particularly after ICU discharge. MMT is insufficient for early detection of ICU-acquired neuromuscular dysfunction in most patients and may be unreliable during critical illness.
Abstract: Introduction: It has been proposed that intensive care unit (ICU)-acquired weakness (ICUAW) should be assessed using the sum of manual muscle strength test scores in 12 muscle groups (the sum score). This approach has been tested in patients with Guillain-Barre syndrome, yet little is known about the feasibility or test characteristics in other critically ill patients. We studied the feasibility and interobserver agreement of this sum score in a mixed cohort of critically ill and injured patients. Methods: We enrolled patients requiring more than 3 days of mechanical ventilation. Two observers performed systematic strength assessments of each patient. The primary outcome measure was interobserver agreement of weakness as a binary outcome (ICUAW is sum score less than 48; “no ICUAW” is a sum score greater than or equal to 48) using the Cohen’s kappa statistic. Results: We identified 135 patients who met the inclusion criteria. Most were precluded from study participation by altered mental status or polytrauma. Thirty-four participants were enrolled, and 30 of these individuals completed assessments conducted by both observers. Six met the criteria for ICUAW recorded by at least one observer. The observers agreed on the diagnosis of ICUAW for 93% of participants (Cohen’s kappa = 0.76; 95% confidence interval (CI), 0.44 to 1.0). Observer agreement was fair in the ICU (Cohen’s kappa = 0.38), and agreement was perfect after ICU discharge (Cohen’s kappa = 1.0). Absolute values of sum scores were similar between observers (intraclass correlation coefficient 0.83; 95% CI, 0.67 to 0.91), but they differed between observers by six points or more for 23% of the participants. Conclusions: Manual muscle testing (MMT) during critical illness was not possible for most patients because of coma, delirium and/or injury. 
Among patients who were able to participate in testing, we found that interobserver agreement regarding ICUAW was good, particularly when evaluated after ICU discharge. MMT is insufficient for early detection of ICU-acquired neuromuscular dysfunction in most patients and may be unreliable during critical illness.
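Cohen's kappa for a binary outcome such as ICUAW versus no ICUAW can be computed from a 2 × 2 agreement table, as sketched below. The cell counts in the example are an assumption: they are one table consistent with the figures reported in the abstract (30 participants, six classified as ICUAW by at least one observer, 93% agreement), but the abstract does not report the table itself or how the two disagreements were split between observers.

```python
def cohens_kappa(both_yes: int, a_only: int, b_only: int, both_no: int) -> float:
    """Cohen's kappa for two raters and a binary outcome.

    both_yes / both_no: cases where the raters agree.
    a_only / b_only: cases rated positive by only rater A or only rater B.
    """
    n = both_yes + a_only + b_only + both_no
    if n == 0:
        raise ValueError("the table is empty")
    observed = (both_yes + both_no) / n
    a_yes = (both_yes + a_only) / n
    b_yes = (both_yes + b_only) / n
    # Agreement expected by chance from the two raters' marginal rates.
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Assumed table: 4 rated ICUAW by both, 1 by each observer alone, 24 by neither.
    print(round(cohens_kappa(both_yes=4, a_only=1, b_only=1, both_no=24), 2))
```

With this assumed table, observed agreement is 28/30 and chance agreement is about 0.72, which is why kappa is noticeably lower than the raw 93% agreement.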

Journal ArticleDOI
TL;DR: The degree of association between COP and ACC was equivalent when using the first trial or the 3-trial average, suggesting that one trial may be sufficient in estimating balance function and minimizing clinical evaluation time.

Journal ArticleDOI
TL;DR: This work presents simulations showing how the Type-I error rate is affected under different conditions of intraclass correlation and sample size, and makes suggestions on how one should collect and analyze data bearing a hierarchical structure.
Abstract: Least squares analyses (e.g., ANOVAs, linear regressions) of hierarchical data lead to Type-I error rates that depart severely from the nominal Type-I error rate assumed. Thus, when least squares methods are used to analyze hierarchical data coming from designs in which some groups are assigned to the treatment condition, and others to the control condition (i.e., the widely used "groups nested under treatment" experimental design), the Type-I error rate is seriously inflated, leading too often to the incorrect rejection of the null hypothesis (i.e., the incorrect conclusion of an effect of the treatment). To highlight the severity of the problem, we present simulations showing how the Type-I error rate is affected under different conditions of intraclass correlation and sample size. For all simulations the Type-I error rate after application of the popular Kish (1965) correction is also considered, and the limitations of this correction technique discussed. We conclude with suggestions on how one should collect and analyze data bearing a hierarchical structure.
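The Kish correction referred to in this abstract is usually expressed through the design effect, 1 + (m − 1) × ICC, where m is the number of observations per group. The abstract does not state the formula, so the sketch below is an illustration under that assumption, with equal group sizes and hypothetical function names and inputs.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Design effect, assuming equal cluster sizes: 1 + (m - 1) * ICC."""
    if cluster_size < 1:
        raise ValueError("cluster size must be at least 1")
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n_total: float, cluster_size: float, icc: float) -> float:
    """Number of independent observations the clustered sample is worth."""
    return n_total / design_effect(cluster_size, icc)


if __name__ == "__main__":
    # Hypothetical design: 10 groups of 20 participants, ICC of 0.05.
    print(round(design_effect(20, 0.05), 2))
    print(round(effective_sample_size(200, 20, 0.05), 1))
```

Even a small intraclass correlation shrinks the effective sample size substantially when groups are large, which is the mechanism behind the inflated Type-I error rate of analyses that treat all observations as independent.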


Journal ArticleDOI
TL;DR: The purpose of this review was to summarize the findings of research using the intraclass correlation coefficient (ICC) to describe the test-retest reliability of the FRSTST.
Abstract: The 5-repetition sit-to-stand test (FRSTST) is a widely used measure of functional strength, particularly among older adults. The purpose of this review was to summarize the findings of research using the intraclass correlation coefficient (ICC) to describe the test-retest reliability of the FRSTST. A search of 3 electronic databases and hand searches were used to identify relevant articles. Information on the subjects, test sessions and the ICCs reported was abstracted from the articles. The searches identified 10 relevant articles. The ICCs reported in the articles ranged from 0.64 to 0.96. The adjusted mean ICC calculated from the reported ICCs was 0.81. The test-retest reliability of the FRSTST can be interpreted as good to high in most populations and settings.

Journal ArticleDOI
TL;DR: In this article, the authors identify the most reliable in vivo corneal confocal microscopy (CCM) parameter for detection of abnormality of small nerve fibre morphology for early diabetic sensorimotor polyneuropathy.
Abstract: Diabet. Med. 28, 1253–1260 (2011) Abstract Aim With the goal of identifying a valid biomarker of early diabetic sensorimotor polyneuropathy, we aimed to identify the most reliable in vivo corneal confocal microscopy (CCM) parameter for detection of abnormality of small nerve fibre morphology. Methods Cross-sectional examination of 46 subjects (26 with Type 1 diabetes and 20 healthy volunteers) examined by corneal confocal microscopy for intra- and interobserver reproducibility by the intraclass correlation coefficient method. Corneal nerve fibre density, nerve branch density, nerve fibre length and tortuosity were measured on the same day that subjects underwent clinical and electrophysiological examination. Results The 26 subjects with Type 1 diabetes had mean age and diabetes duration 42.8 ± 16.9 and 22.7 ± 16.4 years, respectively. Twelve of those subjects (46%) did not meet criteria for diabetic sensorimotor polyneuropathy, while five (19%) had mild, three (12%) had moderate and six (23%) had severe diabetic sensorimotor polyneuropathy. None of the healthy volunteers (mean age 41.4 ± 17.3 years) had polyneuropathy. Re-examination of selected corneal confocal microscopy images or sets of 40 images yielded very good to excellent intraclass correlation coefficients for all parameters. However, only one parameter (corneal nerve fibre length) emerged with consistently very good reproducibility using a clinically relevant ‘study-level’ protocol of subject re-examination (intra-observer intraclass correlation coefficient 0.72; interobserver intraclass correlation coefficient 0.73). Despite no differences in intraclass correlation coefficient between subgroups, corneal nerve fibre length was significantly lower (14.76 vs. 16.15 mm/mm2, P = 0.04) in those with diabetes. 
Conclusions Development of corneal confocal microscopy may need to focus on the measurement of corneal nerve fibre length, as it appears to have superior reliability in comparison with other parameters, and as evidence exists for its potential as a clinical biomarker of early diabetic sensorimotor polyneuropathy.

Journal ArticleDOI
TL;DR: The LESS-RT is a quick, easy, and reliable clinical assessment tool that may be used by clinicians to identify individuals who may be at risk for lower extremity injuries.
Abstract: Context: There is a need for reliable clinical assessment tools that can be used to identify individuals who may be at risk for injury. The Landing Error Scoring System (LESS) is a reliable and valid clinical assessment tool that was developed to identify individuals at risk for lower extremity injuries. One limitation of this tool is that it cannot be assessed in real time and requires the use of video cameras. Objective: To determine the interrater reliability of a real-time version of the LESS, the LESS-RT. Design: Reliability study. Setting: Controlled research laboratory. Participants: 43 healthy volunteers (24 women, 19 men) between the ages of 18 and 23. Intervention: The LESS-RT evaluates 10 jump-landing characteristics that may predispose an individual to lower extremity injuries. Two sets of raters used the LESS-RT to evaluate participants as they performed 4 trials of a jump-landing task. Main Outcome Measures: Intraclass correlation coefficient (ICC 2,1) values for the final composite score of the LESS-RT were calculated to assess interrater reliability of the LESS-RT. Results: Interrater reliability (ICC2,1) for the LESS-RT ranged from .72 to .81 with standard error of measurements ranging from .69 to .79. Conclusions: The LESS-RT is a quick, easy, and reliable clinical assessment tool that may be used by clinicians to identify individuals who may be at risk for lower extremity injuries.
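ICC(2,1), the form reported in this abstract, is the two-way random-effects, absolute-agreement, single-rater coefficient in the Shrout and Fleiss convention. The sketch below computes it from the mean squares of a subjects × raters table; the function name and the ratings matrix are illustrative and are not data from the study.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a complete table: one row per subject, one column per rater."""
    n = len(ratings)      # number of subjects
    k = len(ratings[0])   # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with at least 2 subjects and 2 raters")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares of the two-way layout without replication.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                # between-subjects mean square
    ms_cols = ss_cols / (k - 1)                # between-raters mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


if __name__ == "__main__":
    # Illustrative ratings: 6 subjects scored by 4 raters.
    example = [
        [9, 2, 5, 8],
        [6, 1, 3, 2],
        [8, 4, 6, 8],
        [7, 1, 2, 6],
        [10, 5, 6, 9],
        [6, 2, 4, 7],
    ]
    print(round(icc_2_1(example), 2))
```

Because the between-raters mean square appears in the denominator, systematic differences between raters lower ICC(2,1) even when the raters rank subjects identically.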

Journal ArticleDOI
TL;DR: This article introduces the weighted kappa for assessing agreement when the outcome is ordinal and the intraclass correlation for assessing agreement when the data are measured on a continuous scale.

Journal ArticleDOI
TL;DR: The PHQ-9 and its 2 subscales, PHQ-2 and PHQ-1, seem reliable and valid for detecting MDD among Chinese primary care patients.

Journal ArticleDOI
TL;DR: The results suggest that the MABC-2 can be a reliable and valid tool for the assessment of movement difficulties among 3-5-year-old children.

Journal ArticleDOI
TL;DR: The results support the reliability and validity of the TCMS in children with spastic CP and the scale gives insight into the strengths and weaknesses of the child's trunk performance and therefore can have valuable clinical use.

Journal ArticleDOI
TL;DR: In this article, a standardized hand-held dynamometer (HHD) protocol for measuring maximal isometric torque (MIT) was evaluated for feasibility over a wide age range, intra- and interrater reliability, standard error of measurement, and concurrent validity.
Abstract: Purpose To determine, with respect to measurement of maximal isometric torque (MIT) using a specific hand-held dynamometer (HHD) protocol, (1) protocol feasibility over a wide age range, (2) intra- and interrater reliability, (3) standard error of measurement, and (4) concurrent validity. Methods The MIT of selected upper and lower limb muscle groups was assessed (n = 74; age = 4-17.5 years) using a standardized, HHD protocol. Testing was repeated in 20 adolescents (n = 10 for each muscle group), who were also assessed with a Cybex dynamometer. Results The protocol was feasible for all participants. Mean intra- and interrater reliability [intraclass correlation coefficient (ICC)] varied from 0.75 to 0.98, except for ankle dorsiflexor interrater reliability (mean ICC = 0.67). The standard error of measurement varied from 0.5 to 4.9 Nm and was highest for hip extensors. Mean concurrent validity (ICC) varied from 0.78 to 0.93, except for ankle plantar flexors (mean ICC = 0.48). Conclusions Our HHD protocol was feasible over a wide age range and most MIT values were valid and reliable.

Journal ArticleDOI
TL;DR: The authors present a model for microsurgery learning and a validated instrument, the UWOMSA, for evaluating microsurgical competency. Measures of construct validity demonstrated that higher UWOMSA scores were associated with faster knot tying and higher postgraduate year level.
Abstract: BACKGROUND The authors present a model for microsurgery learning as well as a validated instrument to evaluate microsurgical competency. METHODS Novice microsurgeons participated in three 3-hour sessions wherein they completed a number of increasingly complex, standardized microsurgical tasks. Performance was recorded and graded using a newly developed University of Western Ontario Microsurgery Skills Acquisition/Assessment (UWOMSA) instrument. The knot-tying and anastomosis modules contained three categories with five-point Likert scales. Each learner's performance was assessed by two blinded surgeons. Reznick's validated global rating scale for operative performance was utilized to establish criterion validity. Within-scale scores were compared via intraclass correlation and between-scale scores with Pearson correlation coefficient. Linear regression was used to evaluate the effect of various predictors on UWOMSA scores. RESULTS Thirty-seven videos (9.6 hours) were reviewed, including 20 knot-tying sessions and 17 anastomoses. Interrater reliability of UWOMSA was high, with an intraclass correlation coefficient of 0.75 (0.57, 0.87). The intraclass correlation of the global rating scale was 0.79 (0.62, 0.89). Intrarater reliability of the UWOMSA was also high, with an intraclass correlation of 0.69 (0.48, 0.83). The intraclass correlation of the global rating scale was 0.69 (0.47, 0.84). Measures of criterion validity demonstrated strong agreement between UWOMSA and the global rating scale (Pearson correlation coefficient, 0.96; p < 0.001). Measures of construct validity demonstrated that higher scores on the UWOMSA were associated with faster knot tying (p < 0.0001) and higher postgraduate year level (p = 0.05). CONCLUSIONS The UWOMSA instrument performed well in terms of reliability and validity. Further study is planned to assess the instrument's ability to predict microsurgical skills translation to the clinical setting.

Journal ArticleDOI
TL;DR: Overall, the SCIM III is a reliable and valid measure of functional change in SCI, however, improved scoring instructions and a few modifications to the scoring categories may reduce variability between raters and enhance clinical utility.
Abstract: Multi-center, prospective, cohort study. To assess the validity and reliability of the Spinal Cord Independence Measure (SCIM III) in measuring functional ability in persons with spinal cord injury (SCI). Inpatient rehabilitation hospitals in the United States (US). Functional ability was measured with the SCIM III during the first week of admittance into inpatient acute rehabilitation and within one week of discharge from the same rehabilitation program. Motor and sensory neurologic impairment was measured with the American Spinal Injury Association Impairment Scale. The Functional Independence Measure (FIM), the default functional measure currently used in most US hospitals, was used as a comparison standard for the SCIM III. Statistical analyses were used to test the validity and reliability of the SCIM III. Total agreement between raters was above 70% on most SCIM III tasks and all κ-coefficients were statistically significant (P<0.001). The coefficients of Pearson correlation between the paired raters were above 0.81 and intraclass correlation coefficients were above 0.81. Cronbach’s-α was above 0.7, with the exception of the respiration task. The coefficient of Pearson correlation between the FIM and SCIM III was 0.8 (P<0.001). For the respiration and sphincter management subscale, the SCIM III was more responsive to change, than the FIM (P<0.0001). Overall, the SCIM III is a reliable and valid measure of functional change in SCI. However, improved scoring instructions and a few modifications to the scoring categories may reduce variability between raters and enhance clinical utility.

Journal ArticleDOI
TL;DR: The PRIMOS is a valid and reliable tool for objective noninvasive evaluation of surface roughness of both skin and burn scars and is suitable for use in clinical setting.
Abstract: Background Scar formation remains a major clinical problem; therefore, various therapies have been developed to improve scar quality. To evaluate the effectiveness of these therapies, objective measurement tools are necessary. An appropriate, objective measuring instrument for assessment of surface roughness is not yet available in a clinical setting. The Phaseshift Rapid In Vivo Measurement of the Skin (PRIMOS) (GFMesstechnik GmbH, Teltow, Germany) could be such an instrument. This device noninvasively produces a 3-dimensional image of the skin microtopography and measures surface roughness. Objective The aim of this study was to investigate the reliability and validity of the PRIMOS for objective and quantitative measurement of surface roughness of skin and scars. Methods Three observers assessed skin and burn scars in 60 patients using the PRIMOS and a subjective scale, the Patient and Observer Scar Assessment Scale. Reliability was tested using the intraclass correlation of intraobserver and interobserver measurements. An intraclass correlation coefficient of 0.7 or greater was required for reliable results. To test validity, scores of the PRIMOS were compared with scores of the subjective scale (Pearson correlation). A Pearson correlation coefficient greater than 0.6 was considered a strong positive correlation. Results All 3 surface roughness parameters of the PRIMOS showed good intraobserver and interobserver reliability for skin and scars (intraclass correlation coefficient arithmetic mean of surface roughness > 0.85, mean of 5 highest peaks and 5 deepest valleys from entire measuring field > 0.88, peak count > 0.86). The parameter arithmetic mean of surface roughness showed a strong correlation with the subjective score (Pearson arithmetic mean of surface roughness 0.70; mean of 5 highest peaks and 5 deepest valleys from entire measuring field 0.53; peak count 0.54). Limitations The reliability and validity of the PRIMOS were only tested on skin and burn scars, not in other dermatologic diseases. Conclusions The PRIMOS is a valid and reliable tool for objective noninvasive evaluation of surface roughness of both skin and burn scars.
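
The decision rule in the abstract above (intraclass correlation of 0.7 or greater for reliability, Pearson correlation greater than 0.6 for validity) can be illustrated with a minimal sketch. The abstract does not say which form of the intraclass correlation was used; the code below assumes the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1). The function names and all numbers in the demo are hypothetical and are not data from the study.

```python
from math import sqrt


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one row per subject and one column per rater.
    The choice of ICC form is an assumption; the abstract does not state it.
    """
    n = len(ratings)       # number of subjects
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares (no replication)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical roughness readings: 5 patients (rows) x 3 observers (columns)
    device = [
        [12.1, 12.4, 11.9],
        [20.3, 19.8, 20.9],
        [15.0, 15.6, 14.7],
        [30.2, 29.5, 31.0],
        [18.4, 18.9, 18.1],
    ]
    # Hypothetical subjective scale scores for the same 5 patients
    subjective = [2, 5, 3, 8, 4]

    icc = icc_2_1(device)
    r = pearson([sum(row) / len(row) for row in device], subjective)

    # Thresholds as stated in the abstract
    print("ICC(2,1):", round(icc, 3), "reliable:", icc >= 0.7)
    print("Pearson r:", round(r, 3), "strong correlation:", r > 0.6)
```

Other ICC forms (for example consistency rather than absolute agreement, or averaged ratings) can give noticeably different values on the same data, so the form should be reported alongside the coefficient.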

Journal ArticleDOI
TL;DR: Although valid and reliable, the pain VAS was a poor tool for untrained owners because of poor face validity (ie, owners could not recognize their dogs' behavior as signs of pain). Only after owners had seen pain diminish and then return (after starting and discontinuing NSAID use) did the VAS have face validity.
Abstract: Objective—To assess validity and reliability for a visual analogue scale (VAS) used by owners to measure chronic pain in their osteoarthritic dogs. Sample—68, 61, and 34 owners who completed a questionnaire. Procedures—Owners answered questionnaires at 5 time points. Criterion validity of the VAS was evaluated for all dogs in the intended-to-treat population by correlating scores for the VAS with scores for the validated Helsinki Chronic Pain Index (HCPI) and a relative quality-of-life scale. Intraclass correlation was used to assess repeatability of the pain VAS at 2 baseline evaluations. To determine sensitivity to change and face validity of the VAS, 2 blinded, randomized control groups (17 dogs receiving carprofen and 17 receiving a placebo) were analyzed over time. Results—Significant correlations existed between the VAS score and the quality-of-life scale and HCPI scores. Intraclass coefficient (r = 0.72; 95% confidence interval, 0.57 to 0.82) for the VAS indicated good repeatability. In the carprof...

Journal ArticleDOI
TL;DR: Evidence is provided for the PHQ-2 as a reliable and valid screening tool for depressive symptoms among a randomly recruited community sample in Hong Kong.

Journal ArticleDOI
TL;DR: The revised test can be applied to assess motor performance in typically developing 3-year-old children; future studies are needed to confirm whether the same can be said for children with motor delays.

Journal ArticleDOI
TL;DR: Sets of quantitative measurement variables obtained with this mobility battery provided sensitive prediction of future injury falls and screening for multiple subsequent falls by using tasks that should be appropriate to diverse participants.

Journal ArticleDOI
TL;DR: Ultrasound is accurate, reproducible, and fast in the analysis of abdominal adiposity and offers a regional, easy, and close-at-hand evaluation of subcutaneous and visceral fat compartments.