
Showing papers on "Intraclass correlation" published in 2019


Journal Article
TL;DR: To assess test–retest reliability for PRO measures, the two-way mixed-effect analysis of variance model with interaction for the absolute agreement between single scores is recommended.
Abstract: Purpose The US Food and Drug Administration (FDA) 2009 guidance for industry on patient-reported outcome (PRO) measures describes how the Agency evaluates the psychometric properties of measures intended to support medical product labeling claims. An important psychometric property is test–retest reliability. The guidance lists intraclass correlation coefficients (ICCs) and the assessment time period as key considerations for test–retest reliability evaluations. However, the guidance does not provide recommendations regarding ICC computation, nor is there consensus within the measurement literature regarding the most appropriate ICC formula for test–retest reliability assessment. This absence of consensus emerged as an issue within Critical Path Institute’s PRO Consortium. The purpose of this project was to generate thoughtful and informed recommendations regarding the most appropriate ICC formula for assessing a PRO measure’s test–retest reliability.

96 citations
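The recommended form corresponds to what McGraw and Wong label ICC(A,1): single measures, absolute agreement, computed from two-way ANOVA mean squares (the point-estimate formula is the same under the two-way random and two-way mixed models). A minimal sketch, assuming a complete subjects × occasions matrix with no missing values; the function name `icc_two_way` is hypothetical. It also returns the consistency form ICC(C,1) to show how the two differ when retest scores shift systematically.

```python
from statistics import fmean


def icc_two_way(scores):
    """Single-measure ICCs from a complete n-subjects x k-occasions matrix.

    Returns (icc_a1, icc_c1): the absolute-agreement and consistency forms,
    both built from two-way ANOVA mean squares (McGraw & Wong notation).
    """
    n, k = len(scores), len(scores[0])
    grand = fmean(x for row in scores for x in row)
    row_means = [fmean(row) for row in scores]                       # per subject
    col_means = [fmean(row[j] for row in scores) for j in range(k)]  # per occasion

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols  # residual; absorbs any interaction

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Absolute agreement penalises systematic shifts between occasions (ms_cols).
    icc_a1 = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    # Consistency ignores those shifts.
    icc_c1 = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    return icc_a1, icc_c1


# Retest scores shifted up by one point for every subject.
print(icc_two_way([[1, 2], [2, 3], [3, 4]]))
```

For the shifted matrix above, the consistency ICC is 1 while the absolute-agreement ICC drops to about 0.67, which illustrates why absolute agreement is the stricter choice for test–retest data.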


Journal Article
TL;DR: The structural validity, reliability, and interpretability of the Barthel index are considered sufficient for measuring and interpreting changes in physical function of geriatric rehabilitation patients.

94 citations



Journal Article
TL;DR: The 25-item Parent-Rated Anxiety Scale for ASD is a reliable and valid scale for measuring anxiety in youth with ASD, and item response theory analyses indicate excellent reliability across a wide range of scores with low standard errors.
Abstract: Objective Anxiety is common in youth with autism spectrum disorder (ASD). There is no accepted outcome measure for anxiety in this population. Method Following a series of focus groups with parents of youth with ASD, we generated 72 items (scored 0–3). Parents of 990 youth with ASD (aged 5–17 years; 80.8% male) completed an online survey. Factor analysis and item response theory analyses reduced the content to a single factor with 25 items. Youth with at least mild anxiety (n = 116; aged 5–17 years; 79.3% male) participated in a comprehensive clinical assessment to evaluate the validity and reliability of the 25-item Parent-Rated Anxiety Scale for ASD (PRAS-ASD). Results In the online sample, the mean PRAS-ASD score was 29.04 ± 14.9 (range, 0–75). The coefficient α was 0.93. The item response theory results indicated excellent reliability across a wide range of scores with low standard errors. In the clinical sample (n = 116), the PRAS-ASD mean was 31.0 ± 15.6 (range, 1–65). Pearson correlations with parent ratings of ASD symptom severity, repetitive behavior, and disruptive behavior ranged from 0.33 to 0.66, supporting divergent validity of the PRAS-ASD. A Pearson correlation of 0.83 with a parent-rated anxiety measure used in the general pediatric population supported convergent validity. A total of 40 participants (32 boys, 8 girls; mean age, 11.9 ± 3.4 years) returned at time 2 (mean, 12.2 days) and time 3 (mean, 24.2 days). Intraclass correlations showed test–retest reliabilities of 0.88 and 0.86 at time 2 and time 3, respectively. Conclusion The 25-item PRAS-ASD is a reliable and valid scale for measuring anxiety in youth with ASD.

63 citations
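Coefficient α, reported above as 0.93, compares the sum of item variances with the variance of the total score: α = k/(k − 1) · (1 − Σσ²ᵢ / σ²_total). A minimal sketch using sample variances; the input layout (one row per respondent, one column per item) and the name `cronbach_alpha` are assumptions for illustration.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha; `responses` has one row per respondent, one column per item."""
    k = len(responses[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in responses]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Three respondents answering three items identically: alpha reaches 1.
print(cronbach_alpha([[1, 1, 1], [2, 2, 2], [3, 3, 3]]))
```

Alpha is not bounded below by zero: items that move in opposite directions make the total-score variance small relative to the item variances, and the coefficient turns negative.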


Journal Article
TL;DR: It is demonstrated that the Koos classification system for vestibular schwannoma is a reliable method for tumor classification, lending further support to current literature that uses the Koos grading system.
Abstract: Background The Koos classification of vestibular schwannomas is designed to stratify tumors based on extrameatal extension and compression of the brainstem. While this classification system is widely reported in the literature, to date no study has assessed its reliability. Objective To assess the intra- and inter-rater reliability of the Koos classification system. Methods After institutional review board approval was obtained, a cross-sectional group of magnetic resonance images of 40 patients with vestibular schwannomas of varying size comprised the study sample. Four raters were selected to assign a Koos grade to 50 total scans. Inter- and intrarater reliability were calculated and reported using Fleiss' kappa, Kendall's W, and the intraclass correlation coefficient (ICC). Results Inter-rater reliability was found to be substantial when measured using Fleiss' kappa (0.71), extremely strong using Kendall's W (0.92), and excellent as calculated by ICC (0.88). Intrarater reliability was perfect for 3 out of 4 raters as assessed using weighted kappa, Kendall's W, and ICC, with the intrarater agreement for the fourth rater measured as extremely high. Conclusion We have demonstrated that the Koos classification system for vestibular schwannoma is a reliable method for tumor classification. This study lends further support to the results of current literature using the Koos grading system. Further studies are required to evaluate its validity and utility in counseling patients with regard to outcomes.

63 citations
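Fleiss' kappa compares the observed proportion of agreeing rater pairs per subject with the agreement expected from the overall category proportions. A minimal sketch, assuming every scan is graded by the same number of raters; `fleiss_kappa` and the count-matrix layout are illustrative choices. This unweighted version ignores the ordering of grades, so it is not the weighted kappa mentioned for intrarater agreement.

```python
def fleiss_kappa(counts):
    """counts[i][j] = number of raters who put subject i in category j.

    Assumes every subject was rated by the same number of raters (>= 2).
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    total = n_subjects * n_raters

    # Overall share of all ratings falling in each category.
    p_j = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    # Per-subject proportion of rater pairs that agree.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects        # mean observed agreement
    p_e = sum(p * p for p in p_j)        # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)


# Four raters, four scans, two grades, unanimous on every scan.
print(fleiss_kappa([[4, 0], [0, 4], [4, 0], [0, 4]]))
```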


Journal Article
04 Mar 2019
TL;DR: Results show that when asymmetry is computed between test sessions, the group mean is generally devoid of systematic bias; however, the direction of asymmetry shows greater variability and is often interchangeable.
Abstract: The aims of the present study were to determine test-retest reliability for unilateral strength and power tests used to quantify asymmetry and determine the consistency of both the magnitude and direction of asymmetry between test sessions. Twenty-eight recreationally trained sport athletes performed unilateral isometric squat, countermovement jump (CMJ) and drop jump (DJ) tests over two test sessions. Inter-limb asymmetry was calculated from both the best trial and as an average of three trials for each test. Test reliability was computed using the intraclass correlation coefficient (ICC), coefficient of variation (CV) and standard error of measurement (SEM). In addition, paired samples t-tests were used to determine systematic bias between test sessions and Kappa coefficients to report how consistently asymmetry favoured the same side. Within- and between-session reliability ranged from moderate to excellent (ICC range = 0.70–0.96) and CV values ranged from 3.7–13.7% across tests. Significant differences in asymmetry between test sessions were seen for impulse during the isometric squat (p = 0.04; effect size = −0.60) but only when calculating from the best trial. When computing the direction of asymmetry across test sessions, levels of agreement were fair to substantial for the isometric squat (Kappa = 0.29–0.64), substantial for the CMJ (Kappa = 0.64–0.66) and fair to moderate for the DJ (Kappa = 0.36–0.56). These results show that when asymmetry is computed between test sessions, the group mean is generally devoid of systematic bias; however, the direction of asymmetry shows greater variability and is often interchangeable. Thus, practitioners should consider both the direction and magnitude of asymmetry when monitoring inter-limb differences in healthy athlete populations.

62 citations
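Kappa here measures whether the favoured side agrees across the two sessions beyond what chance would produce. A minimal sketch of unweighted Cohen's kappa on side labels; `favoured_side` is a hypothetical helper, and labelling ties as "R" is an arbitrary choice not described in the abstract.

```python
from collections import Counter


def cohen_kappa(session1, session2):
    """Unweighted Cohen's kappa for two paired lists of category labels."""
    n = len(session1)
    observed = sum(a == b for a, b in zip(session1, session2)) / n
    c1, c2 = Counter(session1), Counter(session2)
    # Chance agreement from each session's marginal label frequencies.
    expected = sum(c1[c] * c2[c] for c in set(c1) | set(c2)) / (n * n)
    return (observed - expected) / (1 - expected)  # undefined if expected == 1


def favoured_side(left, right):
    """Hypothetical helper: label the stronger limb; ties are counted as "R"."""
    return "L" if left > right else "R"


session1 = ["L", "L", "R", "R"]
session2 = ["L", "R", "R", "R"]
print(cohen_kappa(session1, session2))
```

Here three of four athletes keep the same favoured side (75% raw agreement), but because "R" dominates the second session, half of that agreement is expected by chance and kappa is only 0.5.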


Journal Article
TL;DR: This study assesses the reliability and the responsiveness of both the Jebsen‐Taylor Test of Hand Function (JTTHF) and the Box and Block Test (BBT) in children with cerebral palsy (CP).
Abstract: Aim To assess the reliability and to evaluate the responsiveness of both the Jebsen-Taylor Test of Hand Function (JTTHF) and the Box and Block Test (BBT) in children with cerebral palsy (CP). Method In this retrospective study, the reliability analyses were conducted with paired t-tests over a short (mean 14 d) and a long (mean 120 d) interval between two assessments. In addition, an intraclass correlation coefficient (ICC) was used to assess the level of congruency. Responsiveness to therapy was assessed with a paired t-test in the whole sample and by age, manual ability level as classified with the Manual Ability Classification System (MACS), and topography. Results Our main results confirmed the tests' reliability over a short interval for the JTTHF in both hands and for the BBT on the less affected hand. These results were consistent with the ICC. Responsiveness was confirmed, except on the less affected hand for the JTTHF, with similar results across age, MACS, and topography. Interpretation This study supports the use of the JTTHF and the BBT to examine changes after short-term interventions for children with CP. When the tests are used over long assessment periods, results should be interpreted in association with normative values or with a control group. What this paper adds The Box and Block Test (BBT) is reliable over a brief assessment period in children with cerebral palsy (CP). The Jebsen-Taylor Test of Hand Function (JTTHF) is reliable over a brief assessment period in children with CP. The JTTHF and BBT are responsive to changes over a brief period of intensive therapy in children with CP. The reliability and responsiveness of the JTTHF and BBT are weak over long assessment periods.

50 citations
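Paired t-tests check for a systematic change between two assessments, which complements the ICC. A minimal standard-library sketch computing the t statistic and degrees of freedom; turning t into a p-value needs the t distribution (for example `scipy.stats.ttest_rel`), which is left out here to keep the block dependency-free. The function name `paired_t` is hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(first, second):
    """Paired t statistic and degrees of freedom for (second - first)."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    # Mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


print(paired_t([0, 0, 0], [1, 2, 3]))
```

A t close to zero (non-significant) is read as absence of systematic bias between the two sessions; it says nothing about how well individual children keep their rank, which is what the ICC adds.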


Journal Article
TL;DR: The newly developed magnetic resonance novel index for fistula imaging in Crohn's disease (CD), called MAGNIFI-CD, assesses MRI data and determines perianal fistulizing CD activity with improved operating characteristics compared with previous indices; the results indicate its stability and reasonable external validity.

50 citations


Journal Article
TL;DR: The 10-item version of the Connor–Davidson Resilience Scale (CD-RISC 10) showed the best combination of reliability, validity, and practicality and is advised as a measure of resilience in individuals with SCI in a rehabilitation setting.
Abstract: Cross-sectional psychometric study. To compare psychometric properties of the Connor–Davidson Resilience Scale (CD-RISC) with 25, 10, and 2 items, and to assess the agreement between these versions in individuals with spinal cord injury (SCI). Standard psychological screening at a Dutch rehabilitation centre during the first 2 weeks of inpatient rehabilitation. Anonymous data from the psychological screening were analysed. CD-RISC outcomes were checked for floor and ceiling effects. Internal consistency was assessed by calculating Cronbach’s α. Convergent validity was assessed by Spearman’s correlation between resilience and anxiety, depression, passive coping, and life satisfaction. Agreement between CD-RISC versions was examined by calculating intraclass correlation coefficients (ICCs), corresponding 95% confidence intervals (CIs), and Bland–Altman plots. Total scores were skewed only on the CD-RISC 2 (−1.12). There were no floor and ceiling effects. Internal consistency of the 25-, 10-, and 2-item scales was good to moderate (0.90, 0.86, and 0.66, respectively). Good convergent validity was shown only for the CD-RISC 10. Agreement was highest between the CD-RISC 25 and CD-RISC 10, with an ICC of 0.90 (95% CI, 0.85 to 0.94). Of the three CD-RISC versions, the CD-RISC 10 showed the best combination of reliability, validity, and practicality. Therefore, this version is advised as a measure of resilience in individuals with SCI in a rehabilitation setting. Measurement of resilience could be part of a psychological screening to identify individuals at risk of developing psychological problems after SCI.

44 citations
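Bland–Altman 95% limits of agreement are the mean paired difference (bias) plus or minus 1.96 standard deviations of the differences. A minimal sketch, assuming both sets of scores are already on a common scale and the differences are roughly normal; versions with different item counts would need rescaling first, and how the authors handled this is not stated in the abstract. The name `bland_altman_limits` is hypothetical.

```python
from statistics import mean, stdev


def bland_altman_limits(a, b, z=1.96):
    """Return (lower limit, bias, upper limit) for paired measurements a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)              # systematic difference between methods
    spread = z * stdev(diffs)       # half-width of the limits of agreement
    return bias - spread, bias, bias + spread


print(bland_altman_limits([1, 4], [1, 2]))
```

Unlike the ICC, the limits are expressed in the measurement's own units, which makes it easier to judge whether the disagreement matters clinically.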


Journal Article
TL;DR: This study showed that the Italian version of the Canadian Occupational Performance Measure (COPM) is a reliable tool for assessing SCI clients’ perceived performance of daily activities and their satisfaction with their performance.
Abstract: Cross-sectional study. To examine the construct validity and the ability to detect change of the Italian version of the Canadian Occupational Performance Measure (COPM) in a spinal cord injury (SCI) population. Rehabilitation service of the Paraplegic Center of Ostia, Italy. Thirty-nine participants with spinal cord injury were recruited. The clinimetric properties of the measure were assessed following international guidelines. Cronbach’s alpha and the intraclass correlation coefficient were used to assess internal consistency and test-retest reliability, respectively. Construct validity was evaluated by calculating the correlation between the COPM and the Spinal Cord Independence Measure (SCIM) with Pearson’s correlation coefficient and Spearman’s rho. The ability to detect change was evaluated on the overall sample. The COPM was shown to be reliable in a spinal cord injury sample, with positive and statistically significant results for Cronbach’s alpha (0.89) and ICC (0.99 for the performance subtest and 0.98 for the satisfaction subtest). Correlation coefficients did not show a correlation between the COPM total score and the SCIM. The COPM scores improved significantly during in-patient rehabilitation; the mean change between the start of treatment and the end of therapy, as evaluated with the Wilcoxon signed-rank test, was −4.25 points for the performance score and −2.96 points for the satisfaction score. This study showed that the COPM is a reliable tool for assessing SCI clients’ perceived performance of daily activities and their satisfaction with their performance.

43 citations


Journal Article
TL;DR: The macular metrics obtained using SS-OCTA showed excellent repeatability in healthy subjects, and showed high intereye correlation in FAZ area and perimeter, moderate correlation in fractal dimension and VDI, while vessel density had poor correlation in normal healthy subjects.
Abstract: Purpose To investigate the repeatability, interocular correlation and agreement of quantitative swept-source optical coherence tomography angiography (SS-OCTA) metrics in healthy subjects. Methods Thirty-three healthy subjects were enrolled. The macula was scanned four times by an SS-OCTA system using the 3 mm × 3 mm mode. The superficial capillary map images were analysed using a MATLAB program. The following parameters were measured: foveal avascular zone (FAZ) area, FAZ perimeter, FAZ circularity, parafoveal vessel density, fractal dimension and vessel diameter index (VDI). The repeatability of the four scans was determined by the intraclass correlation coefficient (ICC). The averaged results were then analysed for intereye difference, correlation and agreement using the paired t-test, Pearson’s correlation coefficient (r), ICC and Bland-Altman plots. Results The repeatability assessment of the macular metrics yielded high ICC values (range, 0.853 to 0.996). There was no statistically significant difference in the OCTA metrics between the two eyes. FAZ area (ICC=0.961, r=0.929) and FAZ perimeter (ICC=0.884, r=0.802) showed excellent binocular correlation. Fractal dimension (ICC=0.732, r=0.578) and VDI (ICC=0.707, r=0.547) showed moderate binocular correlation, while parafoveal vessel density had poor binocular correlation. Bland-Altman plots showed that the range of agreement was from −0.0763 to 0.0954 mm² for FAZ area and from −0.0491 to 0.1136 for parafoveal vessel density. Conclusions The macular metrics obtained using SS-OCTA showed excellent repeatability in healthy subjects. We showed high intereye correlation in FAZ area and perimeter, moderate correlation in fractal dimension and VDI, while vessel density had poor correlation in normal healthy subjects.

Journal Article
TL;DR: CT provides limited ventilation information and spirometry provides only global measures of lung ventilation, so a method that enables simultaneous examination of lung anatomy and ventilation is of clinical interest.
Abstract: BACKGROUND Computed tomography (CT) and spirometry are the current standard methods for assessing lung anatomy and pulmonary ventilation, respectively. However, CT provides limited ventilation information and spirometry only provides global measures of lung ventilation. Thus, a method that can enable simultaneous examination of lung anatomy and ventilation is of clinical interest. PURPOSE To develop and test a 4D respiratory-resolved sparse lung MRI (XD-UTE: eXtra-Dimensional Ultrashort TE imaging) approach for simultaneous evaluation of lung anatomy and pulmonary ventilation. STUDY TYPE Prospective. POPULATION In all, 23 subjects (11 volunteers and 12 patients, mean age = 63.6 ± 8.4 years). FIELD STRENGTH/SEQUENCE 3T MR; a prototype 3D golden-angle radial UTE sequence, a Cartesian breath-hold volumetric-interpolated examination (BH-VIBE) sequence. ASSESSMENT All subjects were scanned using the 3D golden-angle radial UTE sequence during normal breathing. Ten subjects underwent an additional scan during alternating normal and deep breathing. Respiratory-motion-resolved sparse reconstruction was performed for all the acquired data to generate dynamic normal-breathing or deep-breathing image series. For comparison, BH-VIBE was performed in 12 subjects. Lung images were visually scored by three experienced chest radiologists and were analyzed by two observers who segmented the left and right lung to derive ventilation parameters for comparison with spirometry. STATISTICAL TESTS Nonparametric paired two-tailed Wilcoxon signed-rank test; intraclass correlation coefficient, Pearson correlation coefficient. RESULTS XD-UTE achieved significantly improved image quality compared with both Cartesian BH-VIBE and radial reconstruction without motion compensation (P < 0.05). The global ventilation parameters (the sum of the left and right lung measures) correlated well with spirometry in the same subjects (correlation coefficient = 0.724). There were excellent correlations between the results obtained by the two observers (intraclass correlation coefficients ranged from 0.8855 to 0.9995). DATA CONCLUSION Simultaneous evaluation of lung anatomy and ventilation using XD-UTE is demonstrated and shows good potential for improved diagnosis and management of patients with heterogeneous lung diseases. LEVEL OF EVIDENCE 2 Technical Efficacy: Stage 2 J. Magn. Reson. Imaging 2019;49:411-422.

Journal Article
TL;DR: 2D-SWE shows good technical performance for assessing liver stiffness, with high technical success and reliability, and future studies should establish the quality criteria and optimal number of measurements.
Abstract: Objective To assess the technical performance of two-dimensional shear wave elastography (2D-SWE) for measuring liver stiffness. Materials and methods The Ovid-MEDLINE and EMBASE databases were searched for studies reporting the technical performance of 2D-SWE, including concerns with technical failures, unreliable measurements, interobserver reliability, and/or intraobserver reliability, published until June 30, 2018. The pooled proportion of technical failure and unreliable measurements was calculated using meta-analytic pooling via the random-effects model and inverse variance method for calculating weights. Subgroup analyses were performed to explore potential causes of heterogeneity. The pooled intraclass correlation coefficients (ICCs) for interobserver and intraobserver reliability were calculated using the Hedges-Olkin method with Fisher's Z transformation of the correlation coefficient. Results The search yielded 34 articles. From 20 2D-SWE studies including 6196 patients, the pooled proportion of technical failure was 2.3% (95% confidence interval [CI], 1.3-3.9%). The pooled proportion of unreliable measurements from 20 studies including 6961 patients was 7.5% (95% CI, 4.7-11.7%). In the subgroup analyses, studies conducting more than three measurements showed fewer unreliable measurements than did those with three measurements or less, but no intergroup difference was found in technical failure. The pooled ICCs for interobserver reliability (from 10 studies including 517 patients) and intraobserver reliability (from 7 studies including 679 patients) were 0.87 (95% CI, 0.82-0.90) and 0.93 (95% CI, 0.89-0.95), respectively, suggesting good to excellent reliability. Conclusion 2D-SWE shows good technical performance for assessing liver stiffness, with high technical success and reliability. Future studies should establish the quality criteria and optimal number of measurements.
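The Fisher z transform (z = atanh r) stabilises the variance of a correlation-type coefficient, so study estimates can be averaged on the z scale and back-transformed with tanh. The sketch below is a fixed-effect version with inverse-variance weights n − 3, the large-sample variance of z for a Pearson correlation; the abstract does not give the exact weights used for ICCs, and a random-effects version would add a between-study variance to each 1/(n − 3). The function name and the study values in the example are made up for illustration.

```python
import math


def pool_fisher_z(studies, z_crit=1.96):
    """Fixed-effect pooling of (coefficient, n) pairs on the Fisher z scale.

    Returns (pooled estimate, lower CI bound, upper CI bound), back-transformed.
    """
    weights = [n - 3 for _, n in studies]          # 1 / var(z) for each study
    zs = [math.atanh(r) for r, _ in studies]
    z_bar = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    se = 1 / math.sqrt(sum(weights))               # standard error of pooled z
    return (
        math.tanh(z_bar),
        math.tanh(z_bar - z_crit * se),
        math.tanh(z_bar + z_crit * se),
    )


# Hypothetical studies: (ICC, number of patients).
print(pool_fisher_z([(0.85, 40), (0.90, 80), (0.82, 60)]))
```

Because tanh is nonlinear, the back-transformed interval is asymmetric around the pooled value, which matches the slightly asymmetric intervals reported above.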

Journal Article
TL;DR: The VLT-SV-IT was shown to be a reliable and valid assessment tool for measuring hand function in the Italian population with C-SCI, suggesting that it could be used as a starting point for hand therapy and to assist in clinical decision-making regarding treatment policy.
Abstract: Psychometric study. To validate the Italian version of the Van Lieshout Test Short Version (VLT-SV) with a spinal cord injury population. Three Italian spinal units. The Italian version of the VLT-SV (VLT-SV-IT) was administered to a sample of people with cervical spinal cord injuries (C-SCI) and the test–retest was performed. Reliability was assessed using Cronbach’s alpha for internal consistency and the intraclass correlation coefficient for repeatability assessment (test–retest). Pearson’s correlation coefficient was calculated for concurrent validity with the Italian version of the Jebsen–Taylor Hand Function Test (JTHFT) and for construct validity with the Italian version of the Spinal Cord Injury Independence Measure (SCIM III). The VLT-SV-IT was administered to 61 individuals and all psychometric properties were significant: Cronbach’s alpha was 0.95 (left hand and right hand) and the intraclass correlation coefficient for test–retest reliability was 0.90 for the right hand, the left hand, and the total score. Pearson’s correlation coefficient of the VLT-SV-IT with the JTHFT was significant, while the correlation with SCIM III was not. The obtained values are considered acceptable and consistent with international guidelines. The VLT-SV-IT was shown to be a reliable and valid assessment tool for measuring hand function in the Italian population with C-SCI. This result suggests that it could be used as a starting point for hand therapy and to assist in clinical decision-making regarding treatment policy.

Journal Article
16 May 2019, Stroke
TL;DR: The presented deep learning-based method is fully automatic and shows a high correlation of diffusion lesion volume measurements with manual segmentation and commercial software, which has the potential to be used in patient selection for endovascular reperfusion therapy in the late time window of acute stroke.
Abstract: Background and Purpose- Automatic segmentation of cerebral infarction on diffusion-weighted imaging (DWI) is typically performed based on a fixed apparent diffusion coefficient (ADC) threshold. Fixed ADC threshold methods may not be accurate because ADC values vary over time after stroke onset. Deep learning has the potential to improve the accuracy, provided that a large set of correctly annotated lesion data is used for training. The purpose of this study was to evaluate deep learning-based methods and compare them with commercial software in terms of lesion volume measurements. Methods- U-net, an encoder-decoder convolutional neural network, was adopted to train segmentation models. Two U-net models were developed: a U-net (DWI+ADC) model, trained on DWI and ADC data, and a U-net (DWI) model, trained on DWI data only. A total of 296 subjects were used for training and 134 for external validation. An expert neurologist manually delineated the stroke lesions on DWI images, which were used as the ground-truth reference. Lesion volume measurements from the U-net methods were compared against the expert's manual segmentation and Rapid Processing of Perfusion and Diffusion (RAPID; iSchemaView Inc) analysis. Results- In external validation, U-net (DWI+ADC) showed the highest intraclass correlation coefficient with manual segmentation (intraclass correlation coefficient, 1.0; 95% CI, 0.99-1.00) and sufficiently high correlation with the RAPID results (intraclass correlation coefficient, 0.99; 95% CI, 0.98-0.99). U-net (DWI+ADC) and manual segmentation resulted in the smallest 95% Bland-Altman limits of agreement (-5.31 to 4.93 mL) with a mean difference of -0.19 mL. Conclusions- The presented deep learning-based method is fully automatic and shows a high correlation of diffusion lesion volume measurements with manual segmentation and commercial software. 
The method has the potential to be used in patient selection for endovascular reperfusion therapy in the late time window of acute stroke.

Journal Article
TL;DR: When using the AP for a week, data from a combination of any 5 days provided reliable estimates of all activities and transitions per day, but more precise estimates were achieved if at least 1 weekend day was included.

Journal Article
TL;DR: A multi-sensor-based kiosk (eSPPB kiosk) that performs automated measurement of the Short Physical Performance Battery (SPPB) was validated.
Abstract: OBJECTIVES We aimed to validate a multi-sensor-based kiosk (automatically measured Short Physical Performance Battery [eSPPB] kiosk) that can perform automated measurement of the SPPB. DESIGN Prospective, cross-sectional study. SETTING Rehabilitation clinic of a tertiary-care hospital. PARTICIPANTS Ambulatory outpatients, aged 65 years or older (N = 40). MEASUREMENTS The eSPPB kiosk was developed to measure the three components of the SPPB (standing balance, gait speed, and chair stand test) with embedded sensors and algorithms. Correlations between the total and component-specific scores of the eSPPB and the manually measured SPPB (mSPPB), assessed by a physical therapist, were evaluated. Further, correlations between SPPB parameters and geriatric functional measures were also evaluated. RESULTS This study included 40 participants with a mean age of 74.4 ± 6.5 years, a mean total eSPPB score of 10.1 ± 2.1, and a mean total mSPPB score of 10.2 ± 2.1. The intraclass correlation coefficient between the eSPPB and mSPPB total scores was 0.97 (P < .001), and the κ agreement was 0.79 (P < .001). The intraclass correlation coefficients between the components of eSPPB and mSPPB were 0.77 (P < .001), 0.88 (P < .001), and 0.99 (P < .001) for standing balance, gait speed, and chair stand test, respectively. CONCLUSION The newly developed kiosk might be a viable and efficient method for performing the SPPB in older adults. J Am Geriatr Soc 67:2605-2609, 2019.

Journal Article
Yidi Ma, Tao Xu, Ye Zhang, Meng Mao, Jia Kang, Lan Zhu
TL;DR: The Chinese version of the PFDI-20 developed in this study is a reliable and valid instrument that provides good responsiveness to clinical changes and can be used as a referral instrument for pelvic floor dysfunction patients in China.
Abstract: The objective of this study was to translate the short version of the Pelvic Floor Distress Inventory (PFDI-20) into Chinese and to evaluate its psychometric properties in Chinese women with symptomatic pelvic floor dysfunction according to the Consensus-Based Standards for the Selection of Health Status Measurement Instruments (COSMIN) checklist. Between October 2017 and May 2018, a cross-sectional analysis of the clinical data of 126 patients who met the inclusion criteria was performed. The patients completed the questionnaires at baseline (T1), 1–2 weeks later (T2), and 3 months after surgery (T3). Reliability testing included internal consistency, test–retest reliability, and measurement error. The methodological tests for validity were content validity, criterion validity, construct validity, and hypothesis testing. Responsiveness was also taken into consideration. One hundred twenty-six patients completed all questionnaires. Internal consistency, measured by Cronbach’s alpha, was good, and test–retest reliability was high, with an intraclass correlation coefficient (ICC) of 0.99. Construct validity was verified by factor analysis. All assumptions were confirmed, and there were no ceiling or floor effects in this study. Spearman’s correlation coefficient between the PFDI-20 and the Pelvic Floor Impact Questionnaire (PFIQ-7) was 0.867, showing a significant correlation. Furthermore, the smallest detectable change (SDC) of 18.36 was less than the minimal important change (MIC) of 50.0, indicating sufficient responsiveness. The Chinese version of the PFDI-20 developed in this study is a reliable and valid instrument that provides good responsiveness to clinical changes.
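In the COSMIN approach, responsiveness is commonly judged by comparing the smallest detectable change with the minimal important change: a change can be told apart from measurement error only if SDC < MIC. A frequent way to obtain an individual-level SDC is 1.96 × √2 × SEM, with SEM = SD × √(1 − ICC). The abstract does not say which SEM formula the authors used, so this is a sketch under that assumption; the function names are hypothetical, the SD in the example is a made-up input, and 18.36 and 50.0 are the SDC and MIC values reported in the abstract.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from a score SD and a reliability (ICC)."""
    return sd * math.sqrt(1 - icc)


def sdc_individual(sem, z=1.96):
    """Smallest detectable change for one person measured twice."""
    return z * math.sqrt(2) * sem


def is_responsive(sdc, mic):
    """A minimal important change is detectable only if it exceeds the SDC."""
    return sdc < mic


# Hypothetical SD of 60 points with the reported ICC of 0.99.
sem = sem_from_icc(60.0, 0.99)
print(sem, sdc_individual(sem))
print(is_responsive(18.36, 50.0))
```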

Journal Article
TL;DR: In this paper, a class of stretched beta priors is proposed on the intraclass correlations, equivalent to shifted F priors for the between-group variances; through a parameter expansion, it is shown that this prior is conditionally conjugate under the marginal model, yielding efficient posterior computation.
Abstract: The intraclass correlation plays a central role in modeling hierarchically structured data, such as educational data, panel data, or group-randomized trial data. It represents relevant information concerning the between-group and within-group variation. Methods for Bayesian hypothesis tests concerning the intraclass correlation are proposed to improve decision making in hierarchical data analysis and to assess the grouping effect across different group categories. Estimation and testing methods for the intraclass correlation coefficient are proposed under a marginal modeling framework where the random effects are integrated out. A class of stretched beta priors is proposed on the intraclass correlations, which is equivalent to shifted F priors for the between-group variances. Through a parameter expansion it is shown that this prior is conditionally conjugate under the marginal model, yielding efficient posterior computation. A special improper case results in accurate coverage rates of the credible intervals even for minimal sample size and when the true intraclass correlation equals zero. Bayes factor tests are proposed for testing multiple precise and order hypotheses on intraclass correlations. These tests can be used when prior information about the intraclass correlations is available or absent. For the noninformative case, a generalized fractional Bayes approach is developed. The method enables testing the presence and strength of grouped data structures without introducing random effects. The methodology is applied to a large-scale survey study on international mathematics achievement at fourth grade to test the heterogeneity in the clustering of students in schools across countries and assessment cycles.
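The abstract defines the intraclass correlation through between-group and within-group variation. For contrast, here is the classical one-way ANOVA estimator for balanced groups; it is not the Bayesian method proposed in the paper. It illustrates one property relevant to a marginal model without random effects: the estimate can be negative, down to −1/(p − 1) for groups of size p, which is also the lower limit for a valid equicorrelated covariance matrix of that size. The function name `icc_oneway` is hypothetical.

```python
from statistics import fmean


def icc_oneway(groups):
    """ANOVA estimator of the ICC for g balanced groups of p observations each."""
    g, p = len(groups), len(groups[0])
    grand = fmean(x for grp in groups for x in grp)
    means = [fmean(grp) for grp in groups]
    # Between-group and within-group mean squares.
    msb = p * sum((m - grand) ** 2 for m in means) / (g - 1)
    msw = sum((x - m) ** 2 for grp, m in zip(groups, means) for x in grp) / (g * (p - 1))
    return (msb - msw) / (msb + (p - 1) * msw)


print(icc_oneway([[1, 1], [3, 3]]))  # all variation between groups
print(icc_oneway([[1, 3], [1, 3]]))  # all variation within groups
```

With pairs (p = 2), the second call hits the floor of −1/(p − 1) = −1, a value a random-effects variance (which cannot be negative) could not represent.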

Journal Article
TL;DR: The use of the Italian version of the Jebsen-Taylor Hand Function Test is supported as a measure of functional dexterity in people with upper limb disorders and Italian health professionals can now use the JTHFT with more confidence.
Abstract: Importance: Having a test to evaluate hand function is fundamental to occupational therapy practice. Objective: To assess the psychometric properties of the Italian version of the Jebsen–Taylor Hand Function Test (JTHFT). Design: Cross-sectional study. Setting: Three health care institutions in Rome, Italy. Participants: 136 people with injuries, burns, or neurological diseases of the hand. Intervention: No intervention was provided. Outcomes and Measures: We administered the JTHFT, an assessment of fine motor skills during performance of activities of daily living, and compared results with dynamometer readings. Results: The mean ± standard deviation total time required to perform all subtests was 89.47 ± 67.98 s for the dominant hand (DH) and 167.11 ± 257.58 s for the nondominant hand (NDH). Reliability procedures were applied to data from 51 participants; mean intrarater intraclass correlation coefficient (ICC) was .814 for the DH and .981 for the NDH, and mean interrater ICC was .818 for the DH and .821 for the NDH. Pearson’s correlation coefficients were significant. Conclusion and Relevance: Results support the use of the Italian version of the JTHFT as a measure of functional dexterity in people with upper limb disorders. What This Article Adds: The JTHFT is a valid and reliable assessment tool for nonspecific hand diseases. Italian health professionals can now use the JTHFT with more confidence.

Journal ArticleDOI
TL;DR: This study’s results demonstrate that the ePROM versions of the IIEF-5 and IIEF-15 can be reliably implemented, as outcomes are reliable and in accordance with findings of the paper version.
Abstract: Background: Patient-reported outcome measures (PROMs) are increasingly used to capture patients’ perspectives on functional well-being, disease burden, and treatment effectiveness, and to support clinical decision making. Electronic versions are increasingly feasible because of widespread smartphone and tablet use. However, these electronic PROMs (ePROMs) must be validated before their implementation is justified. The International Index of Erectile Function (IIEF) 5 and 15 are widely used PROMs in urology to measure erectile dysfunction. Measurement reliability and validity testing of the IIEF ePROMs is essential before clinical application. Objective: The aim of this study was to assess the reliability and validity of an ePROM version of both the IIEF-5 and the IIEF-15. Methods: This study included 179 patients from our urology outpatient clinic. It used a randomized crossover design with a 5-day interval between administrations: participants completed either a paper and an electronic version of the IIEF-5 or IIEF-15, or completed an electronic version twice. Internal consistency was assessed using Cronbach alpha and the Spearman-Brown coefficient, test-retest reliability using the intraclass correlation coefficient (ICC), and convergent validity using the Pearson and Spearman correlation coefficients. Results: A total of 122 participants completed the study. Internal consistency was excellent for the electronic IIEF-5 (ICC 0.902) and good to excellent for the domains of the IIEF-15 (ICC 0.834-0.962). Test-retest reliability was excellent for the IIEF-5 (ICC 0.924) and good to excellent for the domains of the IIEF-15 (ICC 0.778-0.950). Convergent validity was excellent for the IIEF-5 and IIEF-15, with correlations of r=0.923 and r=0.951, respectively. Conclusions: We successfully introduced patient-acceptable ePROM versions of the IIEF-5 and IIEF-15. This study’s results demonstrate that the ePROM versions of the IIEF-5 and IIEF-15 can be reliably implemented, as outcomes are reliable and consistent with those of the paper version.
Trial Registration: ClinicalTrials.gov NCT03222388; https://clinicaltrials.gov/ct2/show/NCT03222388
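
The methods name Cronbach alpha and the Spearman-Brown coefficient for internal consistency. A minimal Python sketch of both, assuming item scores are stored one list per item with respondents in the same order: alpha = k/(k − 1)·(1 − Σ item variances / variance of total scores), and the Spearman-Brown formula steps up the correlation r between two half-test totals to 2r/(1 + r). The odd/even item split, the helper names, and the toy data are assumptions; the abstract does not describe how the study split the items.

```python
from statistics import mean, variance

def cronbach_alpha(items):
    """items: one list of scores per item, respondents in the same order."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman_brown_split_half(items):
    """Split-half reliability with an odd/even item split (an assumption)."""
    odd = [sum(s) for s in zip(*items[0::2])]   # half-test totals, items 1, 3, ...
    even = [sum(s) for s in zip(*items[1::2])]  # half-test totals, items 2, 4, ...
    r = pearson(odd, even)
    return 2 * r / (1 + r)

items = [[1, 2, 3, 4], [2, 1, 4, 3]]  # hypothetical: 2 items x 4 respondents
print(cronbach_alpha(items))             # about 0.75
print(spearman_brown_split_half(items))  # about 0.75
```

With two halves of equal variance, as in this toy example, both coefficients coincide; with real questionnaires they usually differ somewhat.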

Journal ArticleDOI
TL;DR: The PEDI-I showed good psychometric properties, confirming its validity and reliability in the ASD population; however, further research is recommended to better understand how the PEDI-I works in clinical practice.
Abstract: Objectives To measure the psychometric properties of the Italian version of the Pediatric Evaluation of Disability Inventory (PEDI-I) in a population with Autism Spectrum Disorder (ASD). Methods The PEDI-I was administered to children with ASD. Internal consistency was examined using Cronbach's alpha, while the intraclass correlation coefficient (ICC) was used to investigate both inter-observer and intra-observer reproducibility. Concurrent validity was evaluated against the Italian version of the Barthel Index. Results The PEDI-I was administered to 60 children with a diagnosis of ASD. Cronbach's alpha values were statistically significant (.885-.965). Inter-observer and intra-observer analyses confirmed the reproducibility of the scale, with values in the high to very high range. Pearson correlation coefficients with the Barthel Index were significant for all PEDI-I subscales (p …). Conclusions The PEDI-I showed good psychometric properties, confirming its validity and reliability in the ASD population. However, further research is recommended to better understand how the PEDI-I works in clinical practice.

Journal ArticleDOI
TL;DR: The Italian version of the QUEST 2.0 is a valid and reliable assessment tool when used with a sample of assistive mobility device users and health professionals can use it with more confidence.
Abstract: OBJECTIVES The aim of the study was the translation, cross-cultural adaptation, and validation of the Italian version of the Quebec User Evaluation of Satisfaction with Assistive Technology 2.0 (QUEST-IT 2.0) in an Italian population of mobility assistive device users, through a cross-sectional study. MATERIALS AND METHODS To evaluate internal consistency and test-retest reliability, Cronbach's α and the Intraclass Correlation Coefficient (ICC), respectively, were calculated. The WheelCon-M was administered concurrently, and Pearson's correlation coefficient was calculated to assess validity. RESULTS The scale was administered to 130 participants. The mean QUEST-IT 2.0 score in this study was 37.39 ± 9.35. Cronbach's α was 0.740 (p … 0.994 in each domain. The correlation with the WheelCon-M short form was 0.30 (p < 0.05). CONCLUSIONS The QUEST-IT 2.0 showed good, consistent results for reliability and validity. This scale will be very useful for doctors, researchers, and occupational therapists in the evaluation and management of mobility assistive devices. IMPLICATIONS FOR REHABILITATION The Italian version of the QUEST 2.0 (QUEST-IT 2.0) is now available, and health professionals can use it with more confidence. The QUEST-IT 2.0 is a valid and reliable assessment tool when used with a sample of assistive mobility device users.

Journal ArticleDOI
TL;DR: Fatigue was confirmed to be an important symptom to patients with PsA, and FACIT-fatigue was found to be a reliable and valid measure in this population.
Abstract: To evaluate the measurement properties (e.g., content validity, reliability, and ability to detect change) of the Functional Assessment of Chronic Illness Therapy (FACIT)-Fatigue scale in patients with active psoriatic arthritis (PsA). One-on-one semi-structured qualitative interviews with adult patients with active PsA evaluated the content validity of FACIT-Fatigue. Quantitative measurement properties were evaluated using data from phase III tofacitinib randomized controlled trials (RCTs) in PsA: OPAL Broaden (NCT01877668) and OPAL Beyond (NCT01882439). Of 12 patients included in the qualitative study, 2 (17%) had mild, 8 (67%) had moderate, and 2 (17%) had severe PsA disease activity; 7 (58%) attributed fatigue to PsA, and 7 (58%) rated fatigue as important or extremely important. Most patients considered the FACIT-Fatigue items relevant to their PsA experience, and understood item content and response options as intended. In the psychometric analysis of RCT data, a second-order confirmatory factor model fit the data well (Bentler’s Comparative Fit Index ≥0.92). FACIT-Fatigue demonstrated good internal consistency (Cronbach’s coefficient α ≥ 0.90), test-retest reliability (Intraclass Correlation Coefficient ≥ 0.80), and a strong correlation with SF-36 Vitality (r > 0.80). A robust relationship between disease activity (based on Patient’s Global Assessment of Psoriasis and Arthritis) and FACIT-Fatigue was observed (effect sizes > 1.4), with the clinically important difference for the FACIT-Fatigue total score estimated at 3.1 points and the responder definition estimated as a 4-point improvement in the total score. Fatigue was confirmed to be an important symptom for patients with PsA, and FACIT-Fatigue was found to be a reliable and valid measure in this population.

Journal ArticleDOI
TL;DR: The findings indicate that the SLS test, including the FSD and LSD tests, can be suitable for clinical use regardless of the number of observed segments, particularly with a ≤3-point rating scale.
Abstract: Single leg squat (SLS) is a common tool used in clinical examination to set and evaluate rehabilitation goals, but also to assess lower extremity function in active people. Objectives To conduct a review and meta-analysis on the inter-rater and intrarater reliability of the SLS, including the lateral step-down (LSD) and forward step-down (FSD) tests. Design Review with meta-analysis. Data sources CINAHL, Cochrane Library, Embase, Medline (OVID) and Web of Science were searched up to December 2018. Eligibility criteria Studies were eligible for inclusion if they were methodological studies that assessed the inter-rater and/or intrarater reliability of the SLS, FSD and LSD through observation of movement quality. Results Thirty-one studies were included. Reliability varied widely between studies (inter-rater: kappa/intraclass correlation coefficients (ICC) = 0.00–0.95; intrarater: kappa/ICC = 0.13–1.00), but most of the studies reached ‘moderate’ measures of agreement. The pooled results of ICC/kappa showed a ‘moderate’ agreement for inter-rater reliability, 0.58 (95% CI 0.50 to 0.65), and a ‘substantial’ agreement for intrarater reliability, 0.68 (95% CI 0.60 to 0.74). Subgroup analyses showed a higher pooled agreement for inter-rater reliability of ≤3-point rating scales, while no difference was found for different numbers of segmental assessments. Conclusion Our findings indicate that the SLS test, including the FSD and LSD tests, can be suitable for clinical use regardless of the number of observed segments, particularly with a ≤3-point rating scale. Since most of the included studies were affected by some form of methodological bias, our findings must be interpreted with caution. PROSPERO registration number CRD42018077822.

Journal ArticleDOI
TL;DR: All balance tests presented similar reliability, reproducibility, and validity, which suggests that any of these tests can be used in clinical practice and the Brief-BESTest is the quickest and easiest test to perform.
Abstract: Background and purpose In any given year, 28% to 35% of older adults experience falls. In nursing home environments, the annual rate of falls increases to 30% to 50%. Our objective was to verify and compare the reliability, validity, and ability to identify falls of the Berg Balance Scale (BBS), Balance Evaluation Systems Test (BESTest), Mini-BESTest, and Brief-BESTest for older adults who live in nursing homes. Methods This was a cross-sectional study. Older adults (n = 49; aged 62-90 years; mean = 77.8; standard deviation = 7.2) were recruited from a nonprofit nursing home. All participants were assessed by 2 physiotherapists using the BBS, BESTest, Mini-BESTest, and Brief-BESTest. The interrater and test-retest (7-14 days) reliability were assessed using intraclass correlation coefficients (ICCs [2, 1]). Minimal detectable changes at the 95% confidence level were established. To analyze each test's ability to identify fall status, we used receiver operating characteristic (ROC) curves, whose statistical significance we verified using the area under the ROC curve (AUC) and respective 95% confidence intervals (CIs). The diagnostic likelihood ratios (positive and negative) and 95% CI were used to verify posttest probability. We used Fagan's nomogram to show the posttest probability of each balance test. Validity was assessed using kappa coefficients and the prevalence-adjusted bias-adjusted kappa (PABAK). Results Interrater and test-retest reliability for the total scores were good to excellent across all 4 tests (ICC interrater value = 0.992-0.994 and ICC test-retest value = 0.886-0.945). All tests were also able to identify fall status (AUC = 0.712-0.762) and were in good agreement with each other (kappa coefficient for individuals with fall risk = 0.679-0.957 and individuals with no fall risk = 0.135-0.143; PABAK = 83.7%-98%). Conclusion All balance tests presented similar reliability, reproducibility, and validity. 
This suggests that any of these tests can be used in clinical practice. However, the Brief-BESTest is the quickest and easiest test to perform.
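
Reliability in this study was quantified with ICC(2,1), which in Shrout and Fleiss notation is a two-way random-effects, absolute-agreement, single-measurement ICC, together with the minimal detectable change at 95% confidence (MDC95). A minimal Python sketch follows, assuming a complete subjects × raters (or sessions) score matrix. The MDC95 step uses the common convention SEM = SD·√(1 − ICC) and MDC95 = 1.96·√2·SEM; the abstract does not say which SD the authors used, so taking the SD of all scores is an assumption, as are the function names and data.

```python
import math
from statistics import mean, stdev

def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    scores[i][j] is the score of subject i from rater (or session) j;
    complete data is assumed.
    """
    n = len(scores)      # subjects
    k = len(scores[0])   # raters / sessions
    grand = mean([x for row in scores for x in row])
    row_means = [mean(row) for row in scores]
    col_means = [mean(col) for col in zip(*scores)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

def mdc95(scores, icc):
    """MDC95 from SEM = SD * sqrt(1 - ICC); SD of all scores is an assumption."""
    sd = stdev([x for row in scores for x in row])
    sem = sd * math.sqrt(1 - icc)
    return 1.96 * math.sqrt(2) * sem

print(icc_2_1([[1, 1], [2, 2], [3, 3]]))  # 1.0 (perfect agreement)
biased = [[1, 2], [2, 3], [3, 4]]         # rater 2 always scores one point higher
icc = icc_2_1(biased)
print(icc)                                # 2/3, about 0.667
print(mdc95(biased, icc))                 # about 1.68
```

The second example shows why the absolute-agreement form matters: both raters rank the subjects identically, but the constant one-point offset between them pulls ICC(2,1) down to 2/3.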

Journal ArticleDOI
TL;DR: The observed values suggest that the YBT-LQ is a reliable test and suitable to detect changes of dynamic balance performance in healthy adolescents from grade six to eleven (i.e., aged 11-19 years).

Journal ArticleDOI
TL;DR: Initial findings support the CY-BOCS-II as a reliable and valid measure of obsessive-compulsive symptoms in youth.
Abstract: Objective To develop and examine the psychometric properties of the Children’s Yale-Brown Obsessive-Compulsive Scale Second Edition (CY-BOCS-II) in children and adolescents with obsessive-compulsive disorder (OCD). Method Youth with OCD (N = 102; age range 7–17 years), who were seeking treatment from 1 of 2 specialty OCD treatment centers, participated in the study. The CY-BOCS-II was administered at an initial assessment, along with measures of OCD symptom severity, anxiety and depressive symptoms, behavioral and emotional problems, and global functioning. Inter-rater and test-retest reliabilities were assessed on a subsample of participants (n = 50 and n = 31, respectively) approximately 1 week after the initial assessment. Results The CY-BOCS-II demonstrated moderate-to-strong internal consistency (α = 0.75–0.88) and excellent inter-rater (intraclass correlation coefficient = 0.86–0.92) and test-retest (intraclass correlation coefficient = 0.95–0.98) reliabilities across all scales. Construct validity was supported by strong correlations with clinician-rated measures of OCD symptom severity and moderate correlations with measures of anxiety symptoms. Exploratory factor analysis showed a 2-factor structure, which was generally inconsistent with its adult counterpart, the Yale-Brown Obsessive-Compulsive Scale Second Edition. Conclusion Initial findings support the CY-BOCS-II as a reliable and valid measure of obsessive-compulsive symptoms in youth.

Journal ArticleDOI
TL;DR: Interobserver variability of nerve ultrasound in peripheral neuropathy is generally limited, especially in arm nerves, and nerve ultrasound is a reproducible tool for diagnostics in routine clinical practice and multicenter research.
Abstract: Objective To determine interobserver variability of nerve ultrasound in peripheral neuropathy in a prospective, systematic, multicenter study. Methods We enrolled 20 patients with an acquired chronic demyelinating or axonal polyneuropathy and 10 healthy controls in 3 different centers. All participants underwent an extensive nerve ultrasound protocol, including cross-sectional area measurements of median, ulnar, fibular, tibial, and sural nerves, and brachial plexus. Real-time image acquisition was performed blind by a local and a visiting investigator (reference). Five patients were investigated using different types of sonographic devices. Intraclass correlation coefficients were calculated, and a random-effects model was fitted to identify factors with significant effect on interobserver variability. Results Systematic differences between measurements made by different investigators were small (mean difference 0.11 mm2 [95% confidence interval 0.00–0.23 mm2]). Intraclass correlation coefficients were generally higher in arm nerves (0.48–0.96) than leg nerves (0.46–0.61). The hospital site and sonographic device did not contribute significantly to interobserver variability in the random-effects model. Conclusions Interobserver variability of nerve ultrasound in peripheral neuropathy is generally limited, especially in arm nerves. Different devices and a multicenter setting have no effect on interobserver variability. Therefore, nerve ultrasound is a reproducible tool for diagnostics in routine clinical practice and (multicenter) research.

Journal ArticleDOI
03 Oct 2019-PLOS ONE
TL;DR: The accuracy and interobserver agreement of respiratory rate measurements by healthcare professionals are suboptimal, which leads to both over- and underestimation of scores of four clinical prediction/diagnostic rules.
Abstract: Objective In clinical prediction/diagnostic rules aimed at early detection of critically ill patients, the respiratory rate plays an important role. We investigated the accuracy and interobserver agreement of respiratory rate measurements by healthcare professionals, and the potential effect of incorrect measurements on the scores of 4 common clinical prediction/diagnostic rules: Systemic Inflammatory Response Syndrome (SIRS) criteria, quick Sepsis-related Organ Failure Assessment (qSOFA), National Early Warning Score (NEWS), and Modified Early Warning Score (MEWS). Methods Using an online questionnaire, we showed 5 videos of a healthy volunteer breathing at a fixed (true) rate (13-28 breaths/minute). Respondents measured the respiratory rate and categorized it as low, normal, or high. We analysed the accuracy of the measurements using descriptive statistics, calculated interobserver agreement using the intraclass correlation coefficient (ICC), and assessed agreement between measurements and categorical judgments using Cohen's Kappa. Finally, we analysed how often incorrect measurements led to under- or overestimation in the selected clinical rules. Results In total, 448 healthcare professionals participated. Median measurements were slightly higher (1-3/min) than the true respiratory rate, and 78.2% of measurements were within 4/min of the true rate. ICC was moderate (0.64, 95% CI 0.39-0.94). When comparing the measured respiratory rates with the categorical judgments, 14.5% were inconsistent. Incorrect measurements influenced the 4 rules in 8.8% (SIRS) to 37.1% (NEWS). Both underestimation (4.5-7.1%) and overestimation (3.9-32.2%) occurred. Conclusions The accuracy and interobserver agreement of respiratory rate measurements by healthcare professionals are suboptimal. This leads to both over- and underestimation of scores of four clinical prediction/diagnostic rules.
The clinically most important effect could be a delay in diagnosis and treatment of (critically) ill patients.
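
Agreement between the measured rates (once mapped to a category) and the respondents' low/normal/high judgments was assessed with Cohen's kappa, κ = (p_o − p_e)/(1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance from the two raters' marginal frequencies. A minimal Python sketch for two sequences of nominal labels follows; the labels, data, and function names are hypothetical. The prevalence-adjusted bias-adjusted kappa (PABAK) reported in the balance-test study above is included in its general k-category form, (k·p_o − 1)/(k − 1), which reduces to 2·p_o − 1 for two categories.

```python
from collections import Counter

def observed_agreement(a, b):
    """Proportion of items on which the two label sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    n = len(a)
    p_o = observed_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    # Chance agreement from each rater's marginal category frequencies
    p_e = sum(ca[c] * cb[c] for c in set(ca) | set(cb)) / n ** 2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)

def pabak(a, b, k):
    """Prevalence- and bias-adjusted kappa for k categories."""
    return (k * observed_agreement(a, b) - 1) / (k - 1)

measured = ["low", "normal", "high", "normal"]  # category implied by the counted rate
judged = ["low", "normal", "normal", "normal"]  # respondent's categorical judgment
print(cohens_kappa(measured, judged))  # 5/9, about 0.556
print(pabak(measured, judged, k=3))    # 0.625
```

Kappa is lower than the raw 75% agreement because most labels in this toy sample are "normal", so much of that agreement would be expected by chance; PABAK removes the dependence on those marginal frequencies.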