
Showing papers on "Intraclass correlation" published in 2021


Journal ArticleDOI
TL;DR: In this article, a meta-analysis was performed to explore the reproducibility of FFQs and factors related to FFQ reproducibility, and the authors concluded that FFQs with correlation coefficients greater than 0.5 for most nutrients may be considered a reliable tool to measure dietary intake.
Abstract: Reproducibility of FFQs measures the consistency of the same subject at different time points. We performed a meta-analysis to explore the reproducibility of FFQs and factors related to reproducibility of FFQs. A systematic literature review was performed before July 2020 using PubMed and Web of Science databases. Pooled intraclass and Spearman correlation coefficients (95% confidence interval) were calculated to assess the reproducibility of FFQs. Subgroup analyses based on characteristics of study populations, FFQs, or study design were performed to investigate factors related to the reproducibility of FFQs. A total of 123 studies comprising 20,542 participants were eligible for the meta-analysis. The pooled crude intraclass correlation coefficients ranged from 0.499 to 0.803 and 0.499 to 0.723 for macronutrients and micronutrients, respectively. Energy-adjusted intraclass correlation coefficients ranged from 0.420 to 0.803 and 0.507 to 0.712 for macronutrients and micronutrients, respectively. The pooled crude and energy-adjusted Spearman correlation coefficients ranged from 0.548 to 0.851 and 0.441 to 0.793, respectively, for macronutrients; and from 0.573 to 0.828 and 0.510 to 0.744, respectively, for micronutrients. FFQs with more food items, 12 months as dietary recall interval (compared to less than 12 months), and a shorter time period between repeated FFQs resulted in superior FFQ reproducibility. In conclusion, FFQs with correlation coefficients greater than 0.5 for most nutrients may be considered a reliable tool to measure dietary intake. To develop FFQs with higher reproducibility, the number of food items and dietary recall interval should be taken into consideration.

36 citations


Journal ArticleDOI
TL;DR: In this paper, the authors explored the relationship between the behavioral features and depression using correlation and bivariate linear mixed models (LMMs) and leveraged 5 supervised machine learning (ML) algorithms with hyperparameter optimization, nested cross-validation, and imbalanced data handling to predict depression.
Abstract: Background: Depression is a prevalent mental health challenge. Current depression assessment methods using self-reported and clinician-administered questionnaires have limitations. Instrumenting smartphones to passively and continuously collect moment-by-moment data sets to quantify human behaviors has the potential to augment current depression assessment methods for early diagnosis, scalable, and longitudinal monitoring of depression. Objective: The objective of this study was to investigate the feasibility of predicting depression with human behaviors quantified from smartphone data sets, and to identify behaviors that can influence depression. Methods: Smartphone data sets and self-reported 8-item Patient Health Questionnaire (PHQ-8) depression assessments were collected from 629 participants in an exploratory longitudinal study over an average of 22.1 days (SD 17.90; range 8-86). We quantified 22 regularity, entropy, and SD behavioral markers from the smartphone data. We explored the relationship between the behavioral features and depression using correlation and bivariate linear mixed models (LMMs). We leveraged 5 supervised machine learning (ML) algorithms with hyperparameter optimization, nested cross-validation, and imbalanced data handling to predict depression. Finally, with the permutation importance method, we identified influential behavioral markers in predicting depression. Results: Of the 629 participants from at least 56 countries, 69 (10.97%) were females, 546 (86.8%) were males, and 14 (2.2%) were nonbinary. Participants’ age distribution is as follows: 73/629 (11.6%) were aged between 18 and 24, 204/629 (32.4%) were aged between 25 and 34, 156/629 (24.8%) were aged between 35 and 44, 166/629 (26.4%) were aged between 45 and 64, and 30/629 (4.8%) were aged 65 years and over. 
Of the 1374 PHQ-8 assessments, 1143 (83.19%) responses were nondepressed scores (PHQ-8 score <10), while 231 (16.81%) were depressed scores (PHQ-8 score ≥10), as identified based on PHQ-8 cut-off. A significant positive Pearson correlation was found between screen status–normalized entropy and depression (r=0.14, P<.001). LMM demonstrates an intraclass correlation of 0.7584 and a significant positive association between screen status–normalized entropy and depression (β=.48, P=.03). The best ML algorithms achieved the following metrics: precision, 85.55%-92.51%; recall, 92.19%-95.56%; F1, 88.73%-94.00%; area under the curve receiver operating characteristic, 94.69%-99.06%; Cohen κ, 86.61%-92.90%; and accuracy, 96.44%-98.14%. Including age group and gender as predictors improved the ML performances. Screen and internet connectivity features were the most influential in predicting depression. Conclusions: Our findings demonstrate that behavioral markers indicative of depression can be unobtrusively identified from smartphone sensors’ data. Traditional assessment of depression can be augmented with behavioral markers from smartphones for depression diagnosis and monitoring.

35 citations


Journal ArticleDOI
TL;DR: Vendor-derived LA reservoir strain, conduit strain, and contractile strain demonstrate modest to excellent intervendor and intermodality correlation depending on the strain component examined, and there are systematic differences in measurements depending on modality and vendor.
Abstract: AIMS: Left atrial (LA) strain is a prognostic biomarker with utility across a spectrum of acute and chronic cardiovascular pathologies. There are limited data on intervendor differences and no data on intermodality differences for LA strain. We sought to compare the intervendor and intermodality differences between transthoracic echocardiography (TTE) and cardiac magnetic resonance (CMR) derived LA strain. We hypothesized that various components of atrial strain would show good intervendor and intermodality correlation but that there would be systematic differences between vendors and modalities. METHODS AND RESULTS: We evaluated 54 subjects (43 patients with a clinical indication for CMR and 11 healthy volunteers) in a study comparing TTE- and CMR-derived LA reservoir strain (ƐR), conduit strain (ƐCD), and contractile strain (ƐCT). The LA strain components were evaluated using four dedicated types of post-processing software. We evaluated the correlation and systematic bias between modalities and within each modality. Intervendor and intermodality correlation was: ƐR [intraclass correlation coefficient (ICC) 0.64-0.90], ƐCD (ICC 0.62-0.89), and ƐCT (ICC 0.58-0.77). There was evidence of systematic bias between vendors and modalities, with mean differences ranging from 3.1-12.2% for ƐR, 1.6-8.6% for ƐCD, and 0.3-3.6% for ƐCT. Reproducibility analysis revealed intraobserver coefficient of variance (COV) of 6.5-14.6% and interobserver COV of 9.9-18.7%. CONCLUSION: Vendor-derived ƐR, ƐCD, and ƐCT demonstrate modest to excellent intervendor and intermodality correlation depending on the strain component examined. There are systematic differences in measurements depending on modality and vendor. These differences may be addressed by future studies which examine calibration of LA geometry/higher frame rate imaging, semi-quantitative approaches, and improvements in reproducibility.

34 citations


Journal ArticleDOI
TL;DR: In this paper, the authors provide guidance on how the results of statistical tests are typically interpreted, while cautioning that results cannot universally be interpreted as suggested, and they further emphasize that cutoff values may depend on the specific clinical or scientific context.
Abstract: Researchers reporting results of statistical analyses, as well as readers of manuscripts reporting original research, often seek guidance on how numeric results can be practically and meaningfully interpreted. With this article, we aim to provide benchmarks for cutoff or cut-point values and to suggest plain-language interpretations for a number of commonly used statistical measures of association, agreement, diagnostic accuracy, effect size, heterogeneity, and reliability in medical research. Specifically, we discuss correlation coefficients, Cronbach's alpha, I², intraclass correlation (ICC), Cohen's and Fleiss' kappa statistics, the area under the receiver operating characteristic curve (AUROC, concordance statistic), standardized mean differences (Cohen's d, Hedges' g, Glass' delta), and z scores. We base these cutoff values on what has been previously proposed by experts in the field in peer-reviewed literature and textbooks, as well as online statistical resources. We integrate, adapt, and/or expand previous suggestions in attempts to (a) achieve a compromise between divergent recommendations, and (b) propose cutoffs that we perceive sensible for the field of anesthesia and related specialties. While our suggestions provide guidance on how the results of statistical tests are typically interpreted, this does not mean that the results can universally be interpreted as suggested here. We discuss the well-known inherent limitations of using cutoff values to categorize continuous measures. We further emphasize that cutoff values may depend on the specific clinical or scientific context. Rule-of-thumb approaches to the interpretation of statistical measures should therefore be used judiciously.

32 citations


Journal ArticleDOI
TL;DR: A new index is introduced that measures the interobserver variability, defined in terms of the distances among the 'true values' assigned by different observers on the same subject, and a coefficient of excess observer variability is developed, which compares the total observer variability to the expected total observer variability when there are no differences among the observers.
Abstract: Existing indices of observer agreement for continuous data, such as the intraclass correlation coefficient or the concordance correlation coefficient, measure the total observer-related variability, which includes the variabilities between and within observers. This work introduces a new index that measures the interobserver variability, which is defined in terms of the distances among the 'true values' assigned by different observers on the same subject. The new coefficient of interobserver variability (CIV) is defined as the ratio of the interobserver and the total observer variability. We show how to estimate the CIV and how to use bootstrap and ANOVA-based methods for inference. We also develop a coefficient of excess observer variability, which compares the total observer variability to the expected total observer variability when there are no differences among the observers. This coefficient is a simple function of the CIV. In addition, we show how the value of the CIV, estimated from an agreement study, can be used in the design of measurement studies. We illustrate the new concepts and methods by two examples, where (1) two radiologists used calcium scores to evaluate the severity of coronary artery arteriosclerosis, and (2) two methods were used to measure knee joint angle.

32 citations


Journal ArticleDOI
TL;DR: In this article, the authors developed a comprehensive body map that can be universally applied across pain conditions, by performing an environmental scan and assessing existing body maps, and compared the pain location questionnaire responses of 530 participants with chronic pain with their pain endorsements on the CHOIR body map (CBM) graphic.

31 citations


Journal ArticleDOI
TL;DR: In this article, the authors assess the reliability and reproducibility of three chest radiograph reporting systems (RALE, Brixia, and percentage opacification) in proven SARS-CoV-2 and examine the ability of these scores to predict adverse outcomes both alone and in conjunction with two clinical scoring systems: NEWS2 and ISARIC-4C mortality.
Abstract: Background Radiographic severity may predict patient deterioration and outcomes from COVID-19 pneumonia. Purpose To assess the reliability and reproducibility of three chest radiograph reporting systems (RALE, Brixia, and percentage opacification) in proven SARS-CoV-2 and examine the ability of these scores to predict adverse outcomes both alone and in conjunction with two clinical scoring systems: NEWS2 and ISARIC-4C mortality. Materials and Methods This retrospective cohort study used routinely collected clinical data of PCR-positive SARS-CoV-2 patients admitted to a single UK center from February 2020 until July 2020. Initial chest radiographs were scored for RALE, Brixia, and percentage opacification by one of three radiologists. Intra- and inter-rater agreement was assessed with intraclass correlation coefficients (ICCs). The rate of ICU admission or death until 60 days after the scored chest radiograph was estimated. NEWS2 and ISARIC-4C mortality scores on hospital admission were calculated. Daily risk of admission to ICU or death was modelled with Cox proportional hazards models, incorporating the chest radiograph scores adjusted for NEWS2 or ISARIC-4C mortality. Results Admission chest radiographs of 50 patients (mean age, 74 years +/-16 [sd], 28 men) were scored by all 3 radiologists, with good inter-rater reliability for all scores (ICCs [95% CIs]: RALE 0.87 [0.80, 0.92], BRIXIA 0.86 [0.76, 0.92], and percentage opacification 0.72 [0.48, 0.85]). Of 751 patients with chest radiograph, those with >75% opacification had a median time to ICU admission or death of just 1-2 days. Among 628 patients with data (median age 76 years (IQR 61 - 84), and 344 were men), 50-75% opacification increased risk of ICU admission or death by twofold (1.6 - 2.8), and over 75% opacification by 4 fold (3.4 - 4.7), compared to a 0-25% opacification when adjusted for NEWS2 score. 
Conclusion BRIXIA, RALE, and percent opacification scores all reliably predicted adverse outcomes in SARS-CoV-2. See also the editorial by Little.

30 citations


Journal ArticleDOI
TL;DR: This study compares seven reliability coefficients for ordinal rating scales and provides a thorough understanding of these coefficients such that the applied researcher can make a sensible choice for ordinal rating scales.
Abstract: Kappa coefficients are commonly used for quantifying reliability on a categorical scale, whereas correlation coefficients are commonly applied to assess reliability on an interval scale. Both types of coefficients can be used to assess the reliability of ordinal rating scales. In this study, we compare seven reliability coefficients for ordinal rating scales: the kappa coefficients included are Cohen’s kappa, linearly weighted kappa, and quadratically weighted kappa; the correlation coefficients included are intraclass correlation ICC(3,1), Pearson’s correlation, Spearman’s rho, and Kendall’s tau-b. The primary goal is to provide a thorough understanding of these coefficients such that the applied researcher can make a sensible choice for ordinal rating scales. A second aim is to find out whether the choice of the coefficient matters. We studied to what extent we reach the same conclusions about inter-rater reliability with different coefficients, and to what extent the coefficients measure agreement in a similar way, using analytic methods, and simulated and empirical data. Using analytical methods, it is shown that differences between quadratic kappa and the Pearson and intraclass correlations increase if agreement becomes larger. Differences between the three coefficients are generally small if differences between rater means and variances are small. Furthermore, using simulated and empirical data, it is shown that differences between all reliability coefficients tend to increase if agreement between the raters increases. Moreover, for the data in this study, the same conclusion about inter-rater reliability was reached in virtually all cases with the four correlation coefficients. In addition, using quadratically weighted kappa, we reached a similar conclusion as with any correlation coefficient a great number of times. Hence, for the data in this study, it does not really matter which of these five coefficients is used. 
Moreover, the four correlation coefficients and quadratically weighted kappa tend to measure agreement in a similar way: their values are very highly correlated for the data in this study.
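
To make the kappa family compared in this abstract concrete, the following is a minimal sketch of unweighted, linearly weighted, and quadratically weighted kappa for two raters on an ordinal scale. It uses the standard disagreement-weight formulation; the function name `weighted_kappa` and its parameters are hypothetical and are not taken from the paper.

```python
def weighted_kappa(r1, r2, categories, weights="quadratic"):
    """Kappa for two raters (hypothetical helper, not from the paper).

    r1, r2     : equal-length lists of ratings
    categories : ordered list of the possible categories
    weights    : "none" (Cohen's kappa), "linear", or "quadratic"
    """
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two categories")
    if len(r1) != len(r2) or not r1:
        raise ValueError("ratings must be non-empty and of equal length")
    idx = {c: i for i, c in enumerate(categories)}
    n = len(r1)

    # Observed joint proportions of (rater 1 category, rater 2 category).
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        obs[idx[a]][idx[b]] += 1.0 / n

    # Marginal proportions for each rater.
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        if weights == "none":
            return 0.0 if i == j else 1.0
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    observed = sum(disagreement(i, j) * obs[i][j]
                   for i in range(k) for j in range(k))
    expected = sum(disagreement(i, j) * row[i] * col[j]
                   for i in range(k) for j in range(k))
    # Kappa is one minus the ratio of observed to chance-expected disagreement.
    return 1.0 - observed / expected
```

With `weights="none"` every disagreement counts equally, whereas the quadratic weights penalize distant categories more heavily, which is why quadratically weighted kappa tends to behave like a correlation-type coefficient on ordinal data.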

23 citations



Journal ArticleDOI
TL;DR: Clinicians and researchers may use this test with confidence to assess the dexterity of hand injury patients and JTHFT is a reliable and valid instrument.

21 citations


Journal ArticleDOI
TL;DR: The JH-HLM has excellent inter-rater reliability as part of routine physical therapy practice across medical, surgical, and neurological adult ICUs.
Abstract: Background The Johns Hopkins Highest Level of Mobility (JH-HLM) scale is used to document the observed mobility of hospitalized patients, including those patients in the intensive care unit (ICU) setting. Objective To evaluate the inter-rater reliability of the JH-HLM, completed by physical therapists, across medical, surgical, and neurological adult ICUs at a single large academic hospital. Methods The JH-HLM is an ordinal scale for documenting a patient’s highest observed level of activity, ranging from lying in bed (score = 1) to ambulating > 250 feet (score = 8). Eighty-one rehabilitation sessions were conducted by eight physical therapists, with 1 of 2 reference physical therapist raters simultaneously observing the session and independently scoring the JH-HLM. The intraclass correlation coefficient was used to determine the inter-rater reliability. Results A total of 77 (95%) of 81 assessments had perfect agreement. The overall intraclass correlation coefficient for inter-rater reliability was 0.98 (95% confidence interval: 0.96, 0.99), with similar scores in the medical, surgical, and neurological ICUs. A Bland–Altman plot revealed a mean difference in JH-HLM scoring of 0 (limits of agreement: −0.54 to 0.61). Conclusion The JH-HLM has excellent inter-rater reliability as part of routine physical therapy practice, across different types of adult ICUs.
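
The Bland–Altman quantities reported above (mean difference and limits of agreement) can be sketched as follows. This is an illustration of the conventional calculation, assuming limits of mean ± 1.96 sample standard deviations of the paired differences; the abstract does not state the exact computation used, and the function name `bland_altman` is hypothetical.

```python
from statistics import mean, stdev


def bland_altman(scores_a, scores_b):
    """Return (bias, lower limit, upper limit) for paired scores.

    Assumes the conventional 95% limits of agreement:
    bias +/- 1.96 * sample SD of the paired differences.
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    bias = mean(diffs)          # mean difference between the two raters
    sd = stdev(diffs)           # sample standard deviation (n - 1)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Made-up example data, not from the study.
bias, lower, upper = bland_altman([2, 4, 6, 8], [1, 3, 5, 9])
print(f"bias={bias:.2f}, limits of agreement=({lower:.2f}, {upper:.2f})")
```

A bias near zero with narrow limits indicates that the two raters neither systematically differ nor scatter widely around each other.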

Journal ArticleDOI
TL;DR: 5L appears to have better measurement properties than 3L for measuring the health-related quality of life of cancer patients, and 5L should be preferable to 3L for use in cancer outcomes research.
Abstract: This study aimed to compare the measurement properties of the EQ-5D-3L (3L) and EQ-5D-5L (5L) in cancer patients. A consecutive sample of inpatients with lung, breast, colorectal, liver, gastric, or thyroid cancer were interviewed using the 3L, 5L, and Functional Assessment of Cancer Therapy–General (FACT-G) questionnaires, and a subgroup was invited to complete the 3L and 5L again. Kappa and intraclass correlation coefficient were used to assess test–retest reliability, and Spearman’s correlation between the EQ-5D and FACT-G was evaluated to assess convergent validity. Comparison of subgroups defined using Eastern Cooperative Oncology Group status and cancer stage were performed to assess known-group validity and discriminatory power using the F-statistic and area under the receiver-operating characteristics curve. All analyses were also performed for each subgroup of cancer patients. A total of 416 cancer patients completed the baseline questionnaire and 90 patients also completed the follow-up survey after 2 days. Ceiling effects were smaller in 5L (10.1%) than in 3L (17.8%). The test–retest reliability and convergent validity of the 5L were slightly better than those of the 3L. Both the 3L and 5L showed known-group validity; however, the 5L index showed better discriminatory power. Similar trends were found in the six types of cancers. In general, 5L appears to have better measurement properties than 3L for measuring the health-related quality of life of cancer patients. While both the 3L and 5L are suitable, 5L should be preferable to 3L for use in cancer outcomes research.

Journal ArticleDOI
TL;DR: Diagnosis was highly consistent across time and moderately consistent between examiners, and accuracy was almost perfect for stage and moderate for grade and extent.
Abstract: Aim The objective of this study was to evaluate consistency and accuracy of the periodontitis staging and grading classification system. Methods Thirty participants (10 periodontal experts, 10 general dentists and 10 undergraduate students) and a gold-standard examiner were asked to classify 25 fully documented periodontitis cases twice. Fleiss kappa was used to estimate consistency across examiners. Intraclass correlation coefficient (ICC) was used to calculate consistency across time. Quadratic weighted kappa and percentage of complete agreement versus gold standard were computed to assess accuracy. Results Fleiss kappa for stage, extent and grade were 0.48, 0.37 and 0.45 respectively. The highest ICC was provided by students for stage (0.91), whereas the lowest ICC by general dentists for extent (0.79). Pairwise comparisons against gold standard showed mean value of kappa >0.81 for stage and >0.41 for grade and extent. Agreement with the gold standard for all three components of the case definition was achieved in 47.2% of cases. The study identified specific factors associated with lower consistency and accuracy. Conclusions Diagnosis was highly consistent across time and moderately between examiners. Accuracy was almost perfect for stage and moderate for grade and extent. Additional efforts are required to improve training of general dentists.

Journal ArticleDOI
TL;DR: It can be concluded that the 30-15 IFT has excellent test-retest reliability for both maximal velocity and peak heart rate and may be used as a reliable measure of fitness in research and sports practice.

Journal ArticleDOI
TL;DR: In this article, the authors established a reliable shear-wave ultrasound elastography (SWUE) protocol for evaluating tongue muscle elasticity and assessed its feasibility and utility in differentiating patients with OSA.
Abstract: Few studies have explored the feasibility of shear-wave ultrasound elastography (SWUE) for evaluating the upper airways of patients with obstructive sleep apnea (OSA). This study aimed to establish a reliable SWUE protocol for evaluating tongue muscle elasticity and to assess its feasibility and utility in differentiating patients with OSA. Inter-rater and intra-rater reliability of SWUE measurements were tested using intraclass correlation coefficients. Submental ultrasound was used to measure tongue thickness and stiffness. The association between the ultrasound measurements and the presence of OSA was analyzed using multivariate logistic regression. One-way analysis of variance was used to examine whether the values of the ultrasound parameters varied among patients with different severities of OSA. Overall, 37 healthy subjects and 32 patients with OSA were recruited. The intraclass correlation coefficients of intra- and inter-rater reliability for SWUE for tongue stiffness ranged from 0.84 to 0.90. After adjusting for age, sex, neck circumference, and body mass index, the risk for OSA was positively associated with tongue thickness [odds ratio 1.16 (95% confidence interval 1.01-1.32)] and negatively associated with coronal imaging of tongue muscle stiffness [odds ratio 0.72 (95% confidence interval 0.54-0.95)]. There were no significant differences in tongue stiffness among OSA patients with varying disease severity. SWUE provided a reliable evaluation of tongue muscle stiffness, which appeared to be softer in patients with OSA. Future longitudinal studies are necessary to investigate the relationship between tongue softening and OSA, as well as response to treatment.


Journal ArticleDOI
TL;DR: The Satisfaction with Treatment Result Questionnaire has good-to-excellent construct validity and very high test-retest reliability in patients with hand and wrist conditions, and it can be used to reliably and validly measure satisfaction with treatment result.
Abstract: BACKGROUND A patient's satisfaction with a treatment result is an important outcome domain as clinicians increasingly focus on patient-centered, value-based healthcare. However, to our knowledge, there are no validated satisfaction metrics focusing on treatment results for hand and wrist conditions. QUESTIONS/PURPOSES Among patients who were treated for hand and wrist conditions, we asked: (1) What is the test-retest reliability of the Satisfaction with Treatment Result Questionnaire? (2) What is the construct validity of that outcomes tool? METHODS This was a prospective study using two samples: a test-retest reliability sample and a construct validity sample. For the test-retest sample, data collection took place between February 2020 and May 2020, and we included 174 patients at the end of their treatment with complete baseline data that included both the primary test and the retest. Test-retest reliability was evaluated with a mean time difference of 7.2 ± 1.6 days. For the construct validity sample, data collection took place between January 2012 and May 2020. We included 3742 patients who completed the Satisfaction with Treatment Result Questionnaire, VAS, and the Net Promotor Score (NPS) at 3 months. Construct validity was evaluated using hypothesis testing in which we correlated the patients' level of satisfaction to the willingness to undergo the treatment again, VAS scores, and the NPS. We performed additional hypothesis testing on 2306 patients who also completed the Michigan Hand Outcomes Questionnaire (MHQ). Satisfaction with the treatment result was measured as the patients' level of satisfaction on a 5-point Likert scale and their willingness to undergo the treatment again under similar circumstances. 
RESULTS We found high reliability for level of satisfaction measured on Likert scale (intraclass correlation coefficient 0.86 [95% CI 0.81 to 0.89]) and almost-perfect agreement for both level of satisfaction measured on the Likert scale (weighted kappa 0.86 [95% CI 0.80 to 0.91]) and willingness to undergo the treatment again (kappa 0.81 [95% CI 0.70 to 0.92]) of the Satisfaction with Treatment Result Questionnaire. Construct validity was good to excellent as seven of the eight hypotheses were confirmed. In the confirmed hypotheses, there was a moderate-to-strong correlation with VAS pain, VAS function, NPS, MHQ pain, and MHQ general hand function (Spearman rho ranged from 0.43 to 0.67; all p < 0.001) and a strong to very strong correlation with VAS satisfaction and MHQ satisfaction (Spearman rho 0.73 and 0.71; both p < 0.001). The rejected hypothesis indicated only a moderate correlation between the level of satisfaction on a 5-point Likert scale and the willingness to undergo the treatment again under similar circumstances (Spearman rho 0.44; p < 0.001). CONCLUSION The Satisfaction with Treatment Result Questionnaire has good-to-excellent construct validity and very high test-retest reliability in patients with hand and wrist conditions. CLINICAL RELEVANCE This questionnaire can be used to reliably and validly measure satisfaction with treatment result in striving for patient-centered care and value-based healthcare. Future research should investigate predictors of variation in satisfaction with treatment results.

Journal ArticleDOI
TL;DR: Reliability estimates with RMSD suggested that change within +/- 1 SD on these measures may reflect typical test-retest variability, and future teleneuropsychology test-retest reliability research is needed with larger, more diverse samples and in clinical populations.
Abstract: Prior teleneuropsychological research has assessed the reliability between in-person and remote administration of cognitive assessments. Few, if any, studies have examined the test-retest reliability of cognitive assessments conducted in sequential clinic-to-home or home-to-home teleneuropsychological evaluations - a critical issue given the state of clinical practice during the COVID-19 pandemic. This study examined this key psychometric question for several cognitive tests administered over repeated videoconferencing visits 4-6 months apart in a sample of healthy English-speaking adults. A total of 44 participants (ages 18-75) completed baseline and follow-up cognitive testing 4-6 months apart. Testing was conducted in a home-to-home setting over HIPAA-compliant videoconferencing meetings on participants' audio-visual enabled laptop or desktop computers. The following measures were repeated at both virtual visits: the Controlled Oral Word Association Test (FAS), Category Fluency (Animals), and Digit Span Forward and Backward from the Wechsler Adult Intelligence Scale, Fourth Edition. Intraclass correlation coefficients (ICC), Pearson correlations, root mean square difference (RMSD), and concordance correlation coefficients (CCC) were calculated as test-retest reliability metrics, and practice effects were assessed using paired-samples t-tests. Some tests exhibited small practice effects, and test-retest reliability was marginal or worse for all measures except FAS, which had adequate reliability (based on ICC and r). Reliability estimates with RMSD suggested that change within +/- 1 SD on these measures may reflect typical test-retest variability. The included cognitive measures exhibited questionable reliability over repeated home-to-home videoconferencing evaluations. Future teleneuropsychology test-retest reliability research is needed with larger, more diverse samples and in clinical populations.
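
Two of the test-retest metrics named in this abstract, RMSD and the concordance correlation coefficient, are simple enough to sketch directly. The example below assumes Lin's formulation of the CCC with population (1/n) variances; the abstract does not specify the estimator, and the function names are hypothetical.

```python
from math import sqrt


def rmsd(test, retest):
    """Root mean square difference between paired test and retest scores."""
    n = len(test)
    return sqrt(sum((x - y) ** 2 for x, y in zip(test, retest)) / n)


def concordance_cc(test, retest):
    """Lin's concordance correlation coefficient (assumed formulation).

    CCC = 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2),
    with 1/n (population) variances and covariance.
    """
    n = len(test)
    mx = sum(test) / n
    my = sum(retest) / n
    var_x = sum((x - mx) ** 2 for x in test) / n
    var_y = sum((y - my) ** 2 for y in retest) / n
    cov_xy = sum((x - mx) * (y - my) for x, y in zip(test, retest)) / n
    return 2 * cov_xy / (var_x + var_y + (mx - my) ** 2)


# Made-up scores: the retest is shifted up by one point (a pure practice effect).
baseline = [1, 2, 3, 4]
followup = [2, 3, 4, 5]
print(rmsd(baseline, followup), concordance_cc(baseline, followup))
```

In this toy case the Pearson correlation would be exactly 1, but the CCC is lower because it also penalizes the systematic shift between visits, which is the reason the two metrics are often reported side by side.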

Journal ArticleDOI
21 Jul 2021
TL;DR: The virtual physical performance measures appear to have high reliability and the findings are generalizable across health conditions among veterans, indicating they are reliable for evaluating physical performance in older veterans in virtual settings.
Abstract: Objective To determine the reliability of 3 physical performance tests performed via a telehealth visit (30-s arm curls test, 30-s chair stand test, 2-min step test) among community-dwelling older veterans. Design Cross sectional study. Setting Virtual. Participants Veterans (N=55; mean age 75y) who enrolled in Gerofit, a virtual group exercise program. Interventions Not applicable. Main Outcome Measures Participants were tested by 2 different assessors at 1 time point. The intraclass correlation coefficient (ICC) with 95% confidence intervals and Bland-Altman plots were used as measures of reliability. To assess generalizability, ICCs were further evaluated by health conditions (type 2 diabetes, arthritis, obesity, depression). Results Assessments were conducted among 55 participants. The ICC was above 0.98 for all 3 tests across health conditions and Bland-Altman plots indicated that there were no significant systematic errors in the measurement. Conclusions The virtual physical performance measures appear to have high reliability and the findings are generalizable across health conditions among veterans. Thus, they are reliable for evaluating physical performance in older veterans in virtual settings.
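
As an illustration of how an inter-assessor ICC of this kind can be computed, the sketch below implements the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1) in Shrout and Fleiss notation, from ANOVA mean squares. The abstract does not say which ICC form was used, so this choice is an assumption, and the function name `icc_2_1` is hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one row per subject, one column per rater.
    This form is an assumption; the study may have used a different ICC.
    """
    n = len(ratings)            # number of subjects
    k = len(ratings[0])         # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete n x k table with n, k >= 2")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for subjects (rows), raters (columns), and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
```

Because the rater mean square enters the denominator, a consistent offset between assessors lowers this coefficient, unlike consistency-type ICCs that ignore such offsets.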

Journal ArticleDOI
TL;DR: In this article, a simple and efficient estimating equations approach to analyze cluster-period means is proposed to address the computational burden associated with estimating equations defined based on individual-level observations, and enables fast point and interval estimation of the intervention effect and correlations.
Abstract: Stepped wedge cluster randomized trials (SW-CRTs) with binary outcomes are increasingly used in prevention and implementation studies. Marginal models represent a flexible tool for analyzing SW-CRTs with population-averaged interpretations, but the joint estimation of the mean and intraclass correlation coefficients (ICCs) can be computationally intensive due to large cluster-period sizes. Motivated by the need for marginal inference in SW-CRTs, we propose a simple and efficient estimating equations approach to analyze cluster-period means. We show that the quasi-score for the marginal mean defined from individual-level observations can be reformulated as the quasi-score for the same marginal mean defined from the cluster-period means. An additional mapping of the individual-level ICCs into correlations for the cluster-period means further provides a rigorous justification for the cluster-period approach. The proposed approach addresses a long-recognized computational burden associated with estimating equations defined based on individual-level observations, and enables fast point and interval estimation of the intervention effect and correlations. We further propose matrix-adjusted estimating equations to improve the finite-sample inference for ICCs. By providing a valid approach to estimate ICCs within the class of generalized linear models for correlated binary outcomes, this article operationalizes key recommendations from the CONSORT extension to SW-CRTs, including the reporting of ICCs.

Journal ArticleDOI
TL;DR: In this paper, a final version of a patient-rated questionnaire that captures the presence and severity of non-motor fluctuations in levodopa-treated Parkinson patients (NoMoFA) was validated.
Abstract: Background In patients with Parkinson's disease (PD), sleep, mood, cognitive, autonomic, and other non-motor symptoms may fluctuate in a manner similar to motor symptoms. Objectives To validate a final version of a patient-rated questionnaire that captures the presence and severity of non-motor fluctuations in levodopa-treated PD patients (NoMoFA). Methods We recruited PD subjects from five movement disorders centers across the US and Canada. We assessed the internal consistency, floor and ceiling effects, test-retest reliability, and concurrent validity of NoMoFA. Classical test theory and item response theory methods informed item reduction and Delphi process yielded a final questionnaire. Results Two hundred subjects and their care-partners participated in the study (age: 66.4 ± 9.6 years; disease duration: 9 ± 5.5 years; median Hoehn and Yahr [HY mean Unified Parkinson's Disease Rating Scale (UPDRS) III ON score: 27.4 ± 14.9). Acceptability of the scale was adequate. There were floor effects in 8/28 items. Cronbach's alpha was 0.894. While eight items had "item-to-total" correlations below the cutoff of 0.4, removing these items did not improve Cronbach's alpha. Test-retest reliability was acceptable (intraclass correlation coefficient [ICC] 0.73; 95% confidence interval, 0.64-0.80). Concurrent validity was adequate with all Spearman's rho values comparing NoMoFA score to other measures of parkinsonian severity showing significance and in the expected direction. A final Delphi panel eliminated one item to avoid redundancy. Conclusions The final 27-item self-administered NoMoFA is a valid and reliable questionnaire, capturing both static and fluctuating non-motor symptoms in PD. © 2021 International Parkinson and Movement Disorder Society.
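
For readers unfamiliar with the internal-consistency statistic reported here, the following is a minimal sketch of Cronbach's alpha computed from an item-score matrix. The function name and the example responses are hypothetical and are not taken from the NoMoFA data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for rows of respondents and columns of items.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(responses[0])  # number of items
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical answers from four respondents to three items.
example = [
    [2, 3, 3],
    [1, 1, 2],
    [4, 4, 3],
    [3, 2, 3],
]
print(cronbach_alpha(example))
```

Recomputing alpha after dropping one column at a time is one way to check, as the authors did, whether removing weakly correlated items improves the statistic.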

Journal ArticleDOI
TL;DR: The 6MWT has high test-retest reliability when used to assess individuals with stroke; other types of reliability, as well as SEM and MDC, need further investigation in this population.
Abstract: The six-minute walking test (6MWT) is a simple and widely used measure of functional capacity. The aim of this systematic review is to summarize findings on reliability of 6MWT in subjects who have had a stroke. Two independent investigators conducted an extensive search in multidisciplinary electronic databases from inception to August 2019, and selected complete original studies on the reliability of the 6MWT used to assess individuals with stroke. Two reviewers independently extracted data and evaluated methodological quality. Outcome for meta-analysis was reliability, measured by intraclass correlation coefficient (ICC). In addition, standard error of measurement (SEM) and minimal detectable change (MDC) were recorded. Of the 241 potentially relevant articles screened, 6 met inclusion criteria and 5 of them were included in meta-analysis. Combined correlation coefficient of .98 (confidence interval .98–.99) was found for test-retest reliability. Only one study investigated inter-rater and intra-rater reliability. SEM and MDC values were rarely reported. The 6MWT has high test-retest reliability, when used to assess individuals with stroke. Other types of reliability and SEM and MDC need further investigations in populations with a stroke.
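
Because the review notes that SEM and MDC were rarely reported, a short sketch of how both are commonly derived from an ICC may help. It uses the conventional formulas SEM = SD × √(1 − ICC) and MDC95 = 1.96 × √2 × SEM. The ICC of 0.98 is the pooled value quoted above, while the between-subject SD of 50 m is a hypothetical figure chosen only for illustration.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from a between-subject SD and an ICC."""
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem):
    """Minimal detectable change at the 95% confidence level."""
    return 1.96 * math.sqrt(2.0) * sem


# sd=50.0 is a hypothetical 6MWT standard deviation in metres (an assumption).
sem = sem_from_icc(sd=50.0, icc=0.98)
mdc = mdc95(sem)
print(f"SEM = {sem:.2f} m, MDC95 = {mdc:.2f} m")
```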

Journal ArticleDOI
TL;DR: In this paper, the authors assess the reliability of CEFBOT, an artificial intelligence-based cephalometry software, for landmark annotation and linear and angular measurement according to the CPT.
Abstract: Objective:To assess the reliability of CEFBOT, an artificial intelligence (AI)-based cephalometry software, for cephalometric landmark annotation and linear and angular measurement according to Arn...

Journal ArticleDOI
Feng-Zhe Wang, He Sun, Jun Zhou, Ling-Ling Sun, Shinong Pan
TL;DR: MRI showed good interobserver reliability and excellent agreement with CT for quantification of abdominal SMA, with a strong correlation between the two techniques.

Journal ArticleDOI
TL;DR: The standardized isotonic protocol with computerized dynamometry was reliable in assessing quadriceps power in COPD and correlates best with functional capacity, indicating higher relevance than static measures when investigating determinants of function.
Abstract: Background: Muscle power declines with age and is a stronger determinant of physical function than strength. Muscle power using computerized dynamometry has not been investigated in COPD. Objectives: To determine: 1) test-retest reliability of quadriceps power using a standardized protocol with computerized dynamometry; and 2) associations between quadriceps strength and power, and functional capacity. Design/Setting: Prospective observational study in four Canadian research labs. Participants: People with mild to very severe COPD. Methods: Tests were conducted on two days. Quadriceps muscle maximal strength was evaluated during a static maneuver using maximal voluntary isometric contractions (MVIC). Rate of torque development (RTD) during MVIC was used to assess explosive force. Muscle power was measured using a dynamic, isotonic protocol from which peak and average power and peak velocity were derived. Functional capacity was assessed with the Short Physical Performance Battery (SPPB). Reliability was assessed using intraclass correlation coefficients (ICC), standard error of measurements (SEM), and Bland-Altman plots. Spearman and Pearson correlation coefficients were used for associations. Results: 65 patients (age 69 ± 8 years; FEV1 48 ± 21% of predicted) were included. ICC was 0.77 for RTD and 0.87-0.98 for isotonic power measures (95%CI 0.63-0.99, p 30% for RTD. SPPB had moderate correlation with average power, but not with MVIC or RTD. Conclusion: The standardized isotonic protocol with computerized dynamometry was reliable in assessing quadriceps power in COPD. Our data highlights that average power correlates best with functional capacity, indicating higher relevance than static measures when investigating determinants of function.

Journal ArticleDOI
TL;DR: The semi-automatic WebCeph can overcome some limitations of the automatic WebCeph; however, it should be used for cephalometric analysis with a great deal of caution.

Journal ArticleDOI
TL;DR: The isokinetic, isometric, and functional assessments used in this return-to-sports testing battery demonstrate acceptable validity and reliability, offering clinicians information on the battery's psychometric properties that can be used in clinical decision-making.

Journal ArticleDOI
TL;DR: The Eating and Drinking Ability Classification System is a reliable and valid tool for classifying eating and drinking ability in adults with CP and is a valuable adjunct to comprehensive functional classification in this population.
Abstract: The Eating and Drinking Ability Classification System (EDACS) was developed to evaluate dysphagia in children with cerebral palsy (CP). This study aimed to investigate the interrater reliability and validity of the EDACS in adults with CP. This cross-sectional study included 117 community-dwelling adults (mean age, 37.9 ± 12.5 years) with a confirmed CP diagnosis. A swallowing occupational therapist (SwOT) conducted detailed interviews with participants and/or caregivers to classify the EDACS. Another SwOT and participants/caregivers evaluated the EDACS. Correlations were evaluated between the EDACS and Functional Oral Intake Scale (FOIS), Swallowing Quality of Life (SWAL-QOL), Gross Motor Function Classification System (GMFCS), and Manual Ability Classification System (MACS). Interrater reliabilities between SwOTs (κ = 0.866, intraclass correlation coefficient (ICC) = 0.867), and between SwOT and participant/caregiver (κ = 0.884, ICC = 0.717) were reported. The EDACS correlated with the FOIS, SWAL-QOL, and MACS, although no significant correlation was found with the GMFCS. The EDACS of spastic-type showed better correlation than that of dyskinetic-type with the FOIS, MACS, and GMFCS. There was a significant correlation between the EDACS and the GMFCS in those aged ≤ 30 years, whereas there was no correlation in those aged ≥ 30 years. The EDACS is a reliable and valid tool for classifying eating and drinking ability in adults with CP. The correlation between the EDACS with gait or hand function was more prominent in individuals with spastic CP and in younger individuals. The EDACS is a valuable adjunct to comprehensive functional classification in adults with CP.
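
The κ values above are agreement statistics for categorical ratings. The following is a minimal sketch of unweighted Cohen's kappa for two raters. The abstract does not state whether a weighted kappa was used for the ordinal EDACS levels, so the unweighted form is an assumption, and the function name and ratings are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    n = len(rater_a)
    # Observed proportion of exact agreement.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical EDACS levels (I-V coded 1-5) assigned by two therapists.
therapist_1 = [1, 2, 2, 3, 5, 4, 1, 2]
therapist_2 = [1, 2, 3, 3, 5, 4, 1, 1]
print(cohen_kappa(therapist_1, therapist_2))
```

For ordinal scales, a weighted kappa that penalizes near-misses less than distant disagreements is a common alternative.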

Journal ArticleDOI
TL;DR: Wang et al. found that the CTQ-SF is reliable for assessing childhood trauma exposure in schizophrenia over relatively long intervals, regardless of patients' current symptoms and cognitive state.
Abstract: Background Many studies have reported an association between childhood trauma exposure and schizophrenia. Among these studies, the Short-form Childhood Trauma Questionnaire (CTQ-SF) is one of the most widely used measures of childhood trauma. However, little is known regarding the long-term reliability of the CTQ-SF, especially in patients with psychopathology. Methods The CTQ-SF was administered to 50 patients diagnosed with schizophrenia from a hospital in Changsha, Hunan, China. These patients were asked to re-complete the CTQ-SF when they were re-hospitalized or received outpatient treatments in the same hospital within 4 years of follow-up. Intraclass correlation coefficient (ICC) was used to assess test-retest reliability of the CTQ-SF over the intervals. Associations of the CTQ-SF with the Positive and Negative Syndrome Scale (PANSS) and Wechsler Adult Intelligence Scale (WAIS) were tested using Spearman correlation coefficients. Results Among the participants, 35 (70.0%) patients re-completed the CTQ-SF after an interval averaging 11.26 months. Excellent test-retest reliabilities (with ICC > 0.75) were found for the total CTQ-SF score (ICC = 0.772) as well as scores of the emotional abuse (ICC = 0.808), physical abuse (ICC = 0.756), sexual abuse (ICC = 0.877) and physical neglect (ICC = 0.751) subscales. Meanwhile, a moderate test-retest reliability was found for the emotional neglect subscale (ICC = 0.538). At both baseline and follow-up, no significant correlations (p > 0.05) were found between CTQ-SF scores and any other clinical assessments. Conclusion Our results suggest that CTQ-SF is reliable to assess childhood trauma exposures in schizophrenia over relatively long intervals, regardless of patients' current symptoms and states of cognition.

Journal ArticleDOI
TL;DR: In this paper, the diagnostic accuracy of grading steatosis with reference to magnetic resonance imaging-based proton density fat fraction (MRI-PDFF), a non-invasive method with high accuracy, in a large cohort was analyzed.