
Showing papers on "Intraclass correlation" published in 2015


Journal ArticleDOI
TL;DR: The evidence reviewed indicated high interdevice reliability for steps, distance, energy expenditure, and sleep for certain Fitbit models, and consistency between the devices was high.
Abstract: Consumer-wearable activity trackers are electronic devices used for monitoring fitness- and other health-related metrics. The purpose of this systematic review was to summarize the evidence for validity and reliability of popular consumer-wearable activity trackers (Fitbit and Jawbone) and their ability to estimate steps, distance, physical activity, energy expenditure, and sleep. Searches included only full-length English language studies published in PubMed, Embase, SPORTDiscus, and Google Scholar through July 31, 2015. Two people reviewed and abstracted each included study. In total, 22 studies were included in the review (20 on adults, 2 on youth). For laboratory-based studies using step counting or accelerometer steps, the correlation with tracker-assessed steps was high for both Fitbit and Jawbone (Pearson or intraclass correlation coefficients (CC) ≥0.80). Only one study assessed distance for the Fitbit, finding an over-estimate at slower speeds and under-estimate at faster speeds. Two field-based studies compared accelerometry-assessed physical activity to the trackers, with one study finding higher correlation (Spearman CC 0.86, Fitbit) while another study found a wide range in correlation (intraclass CC 0.36–0.70, Fitbit and Jawbone). Using several different comparison measures (indirect and direct calorimetry, accelerometry, self-report), energy expenditure was more often under-estimated by either tracker. Total sleep time and sleep efficiency were over-estimated and wake after sleep onset was under-estimated comparing metrics from polysomnography to either tracker using a normal mode setting. No studies of intradevice reliability were found. Interdevice reliability was reported in seven studies using the Fitbit, but none for the Jawbone.
Walking- and running-based Fitbit trials indicated consistently high interdevice reliability for steps (Pearson and intraclass CC 0.76–1.00), distance (intraclass CC 0.90–0.99), and energy expenditure (Pearson and intraclass CC 0.71–0.97). When wearing two Fitbits while sleeping, consistency between the devices was high. This systematic review indicated higher validity of steps, few studies on distance and physical activity, and lower validity for energy expenditure and sleep. The evidence reviewed indicated high interdevice reliability for steps, distance, energy expenditure, and sleep for certain Fitbit models. As new activity trackers and features are introduced to the market, documentation of the measurement properties can guide their use in research settings.

947 citations


DatasetDOI
TL;DR: In this article, a brief review of reliability theory and interrater reliability is provided, followed by a set of practical guidelines for the calculation of ICC in SPSS in order to get it right.
Abstract: Intraclass correlation (ICC) is one of the most commonly misused indicators of interrater reliability, but a simple step-by-step process will get it right. In this article, I provide a brief review of reliability theory and interrater reliability, followed by a set of practical guidelines for the calculation of ICC in SPSS.

195 citations
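
As a rough illustration of the kind of calculation the entry above refers to, the following sketch computes one common form of the ICC, the two-way random-effects, absolute-agreement, single-rater coefficient usually written ICC(2,1), from a subjects-by-raters table. It is not the SPSS procedure from the article; the function name `icc_2_1` is an assumption made here, and the small table is the widely reproduced six-subject, four-rater example from Shrout and Fleiss (1979).

```python
def icc_2_1(ratings):
    """ICC(2,1) from a table with one row per subject and one column per rater.

    Illustrative helper (name assumed here), using the two-way ANOVA mean squares:
    ICC(2,1) = (MSR - MSE) / (MSR + (k - 1) * MSE + k * (MSC - MSE) / n)
    """
    n = len(ratings)       # number of subjects
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols                    # residual

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Six subjects rated by four raters (Shrout and Fleiss example table).
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(example), 2))  # about 0.29
```

Other ICC forms (one-way, consistency rather than absolute agreement, or averaged ratings) use the same mean squares in different combinations, so the form has to be chosen to match the study design before any number is reported.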


Journal ArticleDOI
TL;DR: The CVS-Q has acceptable psychometric properties, making it a valid and reliable tool to control the visual health of computer workers, and can potentially be used in clinical trials and outcome research.

168 citations


Journal ArticleDOI
TL;DR: While the Kinect V2 body tracking may not accurately obtain lower body kinematic data, it shows great potential as a tool for measuring spatiotemporal aspects of gait.

153 citations


Journal ArticleDOI
TL;DR: The study presents norms of the 2MWT established by the NIH Toolbox, which can be used to determine the presence of limitations in walking endurance across the adult lifespan.

152 citations


Journal ArticleDOI
07 Aug 2015-PLOS ONE
TL;DR: The CES-D appears to be a valid, reliable, sensitive and responsive instrument for screening and monitoring depressive symptoms in adult Chinese primary care patients.
Abstract: Background: The Center for Epidemiologic Studies Depression Scale (CES-D) is a commonly used instrument to measure depressive symptomatology. Despite this, the evidence for its psychometric properties remains poorly established in Chinese populations. The aim of this study was to validate the use of the CES-D in Chinese primary care patients by examining factor structure, construct validity, reliability, sensitivity and responsiveness. Methods and Results: The psychometric properties were assessed amongst a sample of 3686 Chinese adult primary care patients in Hong Kong. Three competing factor structure models were examined using confirmatory factor analysis. The original CES-D four-factor model had adequate fit, but the data fit a bi-factor model better. For the internal construct validity, corrected item-total correlations were 0.4 for most items. The convergent validity was assessed by examining the correlations between the CES-D, the Patient Health Questionnaire 9 (PHQ-9) and the Short Form-12 Health Survey (version 2) Mental Component Summary (SF-12 v2 MCS). The CES-D had a strong correlation with the PHQ-9 (coefficient: 0.78) and SF-12 v2 MCS (coefficient: -0.75). Internal consistency was assessed by McDonald’s omega hierarchical (ωH). The ωH value for the general depression factor was 0.855. The ωH values for “somatic”, “depressed affect”, “positive affect” and “interpersonal problems” were 0.434, 0.038, 0.738 and 0.730, respectively. For the two-week test-retest reliability, the intraclass correlation coefficient was 0.91. The CES-D was sensitive in detecting differences between known groups, with the AUC >0.7. Internal responsiveness of the CES-D to detect positive and negative changes was satisfactory (with p value 0.2). The CES-D was externally responsive, with the AUC >0.7.
Conclusions: The CES-D appears to be a valid, reliable, sensitive and responsive instrument for screening and monitoring depressive symptoms in adult Chinese primary care patients. In its original four-factor and bi-factor structure, the CES-D is supported for cross-cultural comparisons of depression in multi-center studies.

133 citations


Journal ArticleDOI
TL;DR: To determine the effect of visual inspection on sample size required for studies of MRI‐derived cortical thickness, the number of subjects required to show group differences was calculated and significant differences observed across imaging sites, between visually approved/disapproved subjects, and across regions with different sizes suggest that these measures should be used with caution.
Abstract: In the last decade, many studies have used automated processes to analyze magnetic resonance imaging (MRI) data such as cortical thickness, which is one indicator of neuronal health. Due to the convenience of image processing software (e.g., FreeSurfer), standard practice is to rely on automated results without performing visual inspection of intermediate processing. In this work, structural MRIs of 40 healthy controls who were scanned twice were used to determine the test-retest reliability of FreeSurfer-derived cortical measures in four groups of subjects: the 25 that passed visual inspection (approved), the 15 that failed visual inspection (disapproved), a combined group, and a subset of 10 subjects (Travel) whose test and retest scans occurred at different sites. Test-retest correlation (TRC), intraclass correlation coefficient (ICC), and percent difference (PD) were used to measure the reliability in the Destrieux and Desikan-Killiany (DK) atlases. In the approved subjects, reliability of cortical thickness/surface area/volume (DK atlas only) was: TRC (0.82/0.88/0.88), ICC (0.81/0.87/0.88), PD (0.86/1.19/1.39), which represents a significant improvement over these measures when disapproved subjects are included. Travel subjects' results show that cortical thickness reliability is more sensitive to site differences than cortical surface area and volume. To determine the effect of visual inspection on sample size required for studies of MRI-derived cortical thickness, the number of subjects required to show group differences was calculated. Significant differences observed across imaging sites, between visually approved/disapproved subjects, and across regions with different sizes suggest that these measures should be used with caution.

131 citations


Journal ArticleDOI
TL;DR: The QUACS scale is highly reliable and exhibits strong construct validity and can confidently be applied in assessing the methodological quality of observational cadaveric dissection studies.
Abstract: Although systematic reviews are conducted in the field of anatomical research, no instruments exist for the assessment of study quality. Thus, our objective was to develop a valid tool that reliably assesses the methodological quality of observational cadaveric studies. The QUACS scale (QUality Appraisal for Cadaveric Studies) was developed using an expert consensus process. It consists of a 13-item checklist addressing the design, conduct and report of cadaveric dissection studies. To evaluate inter-rater reliability, a blinded investigator obtained an initial pool of 120 observational cadaveric studies. Sixty-eight of them were selected randomly according to sample size calculations. Three independent researchers rated each publication by means of the QUACS scale. The reliability of the total score was estimated using the intraclass correlation coefficient (ICC). To assess agreement among individual items, margin-free kappa values were calculated. For construct validity, two experts (an anatomist and an experienced physician) categorized the quality of 15 randomly selected studies as ‘excellent’ (4 points), ‘moderate to good’ (3 points), ‘poor to moderate’ (2 points) or ‘poor’ (1 point). Kendall's tau rank correlation was used to compare the expert ratings with the scores on the QUACS scale. An evaluation of feasibility was carried out during the reliability analysis. All three raters recorded the duration of quality appraisal for each article. Means were used to describe average time exposure. The ICC for the total score was 0.87 (95% confidence interval: 0.82–0.92; P < 0.0001). For individual items, margin-free kappa values ranged between 0.56 and 0.96 with an agreement of 69–97% among the three raters. Kendall's tau B coefficient of the association between expert ratings and the results obtained with the QUACS scale was 0.69 (P < 0.01). Required rating time per article was 5.4 ± 1.6 min.
The QUACS scale is highly reliable and exhibits strong construct validity. Thus, it can confidently be applied in assessing the methodological quality of observational dissection studies.

109 citations


Journal ArticleDOI
08 Dec 2015-Gut
TL;DR: The Inflammatory Bowel Disease Disability Index (IBD-DI) has been validated for use in clinical trials and epidemiological studies and showed high internal consistency, interobserver reliability and construct validity, and a moderate intraobserver reliability.
Abstract: IBDs are chronic destructive disorders that negatively affect the functional status of patients. Recently, the Inflammatory Bowel Disease Disability Index (IBD-DI) was developed according to standard WHO processes. The aims of the current study were to validate the IBD-DI in an independent patient cohort, to develop an index-specific scoring system and to describe the disability status of a well-defined population-based cohort of French patients with IBD. From February 2012 to March 2014, the IBD-DI questionnaire was administered to a random sample of adult patients with an established diagnosis of IBD issued from a French population-based registry. The IBD-DI consists of 28 items that evaluate the four domains of body functions, activity participation, body structures and environmental factors. Validation included item reduction and data structure, construct validity, internal consistency, interobserver and intraobserver reliability evaluations. 150 patients with Crohn's disease (CD) and 50 patients with UC completed the IBD-DI validation phase. The intraclass correlation coefficient for interobserver reliability was 0.91 and 0.54 for intraobserver reliability. Cronbach's α of internal consistency was 0.86. IBD-DI scores varied from 0 to 100 with a mean of 35.3 (Q1=19.6; Q3=51.8). IBD-DI scores were highly correlated with Inflammatory Bowel Disease Questionnaire (-0.82; p<0.001) and SF-36 (-0.61; p<0.05) scores. Female gender (p<0.001), clinical disease activity (p<0.0001) and disease duration (p=0.02) were associated with higher IBD-DI scores. The IBD-DI has been validated for use in clinical trials and epidemiological studies. The IBD-DI showed high internal consistency, interobserver reliability and construct validity, and a moderate intraobserver reliability. It comprises 14 questions and ranges from 0 to 100. The mean IBD-DI score was 35.3 and was associated with gender, clinical disease activity and disease duration. 
Further research is needed to confirm the structural validity and to assess the responsiveness of the IBD-DI. Trial registration number: 2011-A00877-34.

109 citations


Journal ArticleDOI
TL;DR: The ICCs for AM-PAC “6-Clicks” total scores were very high, and levels of agreement varied across pairs of raters, from large to nearly perfect for physical therapists and from moderate to nearly perfect for occupational therapists.
Abstract: Background: The interrater reliability of 2 new inpatient functional short-form measures, Activity Measure for Post-Acute Care (AM-PAC) “6-Clicks” basic mobility and daily activity scores, has yet to be established. Objective: The purpose of this study was to examine the interrater reliability of AM-PAC “6-Clicks” measures. Design: A prospective observational study was conducted. Methods: Four pairs of physical therapists rated basic mobility and 4 pairs of occupational therapists rated daily activity of patients in 1 of 4 hospital services. One therapist in a pair was the primary therapist directing the assessment while the other therapist observed. Each therapist was unaware of the other's AM-PAC “6-Clicks” scores. Reliability was assessed with intraclass correlation coefficients (ICCs), Bland-Altman plots, and weighted kappa. Results: The ICCs for the overall reliability of basic mobility and daily activity were .849 (95% confidence interval [CI]=.784, .895) and .783 (95% CI=.696, .847), respectively. The ICCs for the reliability of each pair of raters ranged from .581 (95% CI=.260, .789) to .960 (95% CI=.897, .983) for basic mobility and .316 (95% CI=−.061, .611) to .907 (95% CI=.801, .958) for daily activity. The weighted kappa values for item agreement ranged from .492 (95% CI=.382, .601) to .712 (95% CI=.607, .816) for basic mobility and .251 (95% CI=.057, .445) to .751 (95% CI=.653, .848) for daily activity. Mean differences between raters' scores were near zero. Limitations: Raters were from one health system. Each pair of raters assessed different patients in different services. Conclusions: The ICCs for AM-PAC “6-Clicks” total scores were very high. Levels of agreement varied across pairs of raters, from large to nearly perfect for physical therapists and from moderate to nearly perfect for occupational therapists. Levels of agreement for individual item scores ranged from small to very large.

107 citations


Journal ArticleDOI
TL;DR: While the Delphi process enabled the development of a definition and a severity-based classification of intraoperative complications, further research including a multicentre international full-scale validation needs to be conducted, with the ultimate goal of contributing to standardized reporting in surgical practice and research.
Abstract: Standardized reporting of intraoperative adverse events is important to enhance transparency. To the best of our knowledge, there is no validated definition and classification of intraoperative complications. We conducted a two-round Delphi study to develop a definition and classification of intraoperative complications. Experts were contacted by email and sent a link to the online questionnaire. In a pilot study, two independent raters applied the definition and classification in a sample of 60 surgical interventions of low, intermediate, and high complexity and evaluated practicability. Interrater agreement of the classification was determined (raw categorical agreement, weighted kappa, and intraclass correlation). In the Delphi study, 40 of 52 experts (77 % return rate) from 14 countries took part in each round. The Delphi study resulted in a comprehensive definition of intraoperative complications as any deviation from the ideal intraoperative course occurring between skin incision and skin closure. The classification foresees four grades depending on the need for treatment (no need, grade I; need for treatment, grade II) and the severity of the complication (life-threatening/permanent disability, grade III; death, grade IV). The pilot study showed good practicability (6 on a 7-point scale) and a high raw agreement of 87 %, a weighted kappa of 0.83 [95 % confidence interval (CI) 0.73–0.94] and an intraclass correlation coefficient of 0.83 (95 % CI 0.73–0.90). While the Delphi process enabled the development of a definition and a severity-based classification of intraoperative complications, further research including a multicentre international full-scale validation needs to be conducted, with the ultimate goal of contributing to standardized reporting in surgical practice and research.

Journal ArticleDOI
TL;DR: The 5L is more promising compared to the 3L in terms of a lower ceiling, more discriminatory power, and higher preference by the respondents, and should be recommended as a preferred health-related quality of life measure in Thailand.
Abstract: The EQ-5D is a health-related quality of life instrument which provides a simple descriptive health profile and a single index value for health status. The latest version, the EQ-5D-5L, has been translated into more than one hundred languages worldwide, including Thai. This study aims to assess the measurement properties of the Thai version of the EQ-5D-5L (the 5L) compared to the EQ-5D-3L (the 3L). A total of 117 diabetes patients treated with insulin completed a questionnaire including the 3L and the 5L. The 3L and 5L were compared in terms of distribution, ceiling, convergent validity, discriminative power, test-retest reliability, feasibility, and patient preference. Convergent validity was tested by assessing the relationship between each dimension of the EQ-5D and SF-36v2 using Spearman’s rank-order correlation. Discriminative power was determined by the Shannon index (H′) and Shannon’s Evenness index (J′). The test-retest reliability was assessed by examining the intraclass correlation coefficient (ICC) and Cohen’s weighted kappa coefficient. No inconsistent response was found. The 5L trended towards a slightly lower ceiling compared with the 3L (33% versus 29%). Regarding redistribution, 69% to 100% of the patients answering level 2 with the 3L version redistributed their responses to level 2 with the 5L version while about 9% to 22% redistributed their responses to level 3 with the 5L version. The Shannon index (H′) improved with the 5L while the Shannon's Evenness index (J′) reduced slightly. Convergent validity and test-retest reliability were confirmed for both 3L and 5L. Evidence supported the convergent validity and test-retest reliability of both the 3L and 5L in diabetes patients. However, the 5L is more promising compared to the 3L in terms of a lower ceiling, more discriminatory power, and higher preference by the respondents. Thus, the 5L should be recommended as a preferred health-related quality of life measure in Thailand.
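
The two discriminative-power indices named in the abstract above can be sketched in a few lines. This is a minimal illustration under the usual definitions (H′ as the base-2 Shannon entropy of the response distribution over the levels of one dimension, J′ as H′ divided by its maximum); the function name and the response counts are hypothetical and are not data from the study.

```python
import math


def shannon_indices(counts):
    """Return (H', J') for the response counts of one dimension.

    H' = -sum(p_i * log2(p_i)) over the observed levels;
    J' = H' / log2(number of levels), so J' = 1 means a perfectly even spread.
    """
    total = sum(counts)
    props = [c / total for c in counts if c > 0]  # empty levels contribute 0
    h = -sum(p * math.log2(p) for p in props)
    h_max = math.log2(len(counts))
    return h, h / h_max


# Hypothetical counts for one dimension, 117 respondents in each version.
h3, j3 = shannon_indices([60, 50, 7])           # three-level version
h5, j5 = shannon_indices([65, 30, 15, 5, 2])    # five-level version
print(h3, j3)
print(h5, j5)
```

With these made-up counts H′ is higher for the five-level version while J′ is lower, which is the pattern the abstract describes: adding levels raises the attainable maximum log2(L), so absolute informativity can increase even when responses are spread less evenly relative to that maximum.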

Journal ArticleDOI
TL;DR: The Malay version of the IPAQ (IPAQ-M) demonstrated good reliability and validity for the evaluation of physical activity among this Malay population.
Abstract: The International Physical Activity Questionnaire (IPAQ) was developed to assess the physical activity patterns in populations. The authors aim to examine the reliability and validity of the Malay version of IPAQ (IPAQ-M). The IPAQ-M was self-administered twice at a 1-week interval to assess its test-retest reliability. Criterion validity was assessed between the IPAQ-M and a 7-day physical activity log (PA-Log). A total of 81 Malay adults participated in the study. Intraclass correlation coefficients (ICC), kappa (κ), correlation coefficients (ρ), and Bland-Altman plot were used for data analyses. The ICC scores revealed moderate to good correlations (ICC = 0.54-0.92; P < .001) on items categorized by intensities and domains and a κ of 0.73 for total activity. Validity results from the PA-Log were statistically significant (P < .001) across intensities and domains (ρ = 0.67-0.98). The IPAQ-M demonstrated good reliability and validity for the evaluation of physical activity among this Malay population.

Journal ArticleDOI
TL;DR: The kappa statistic is commonly used for quantifying inter-rater agreement on a nominal scale; it is a function of the proportions of observed and expected agreement.
Abstract: The kappa statistic is commonly used for quantifying inter-rater agreement on a nominal scale. In this review article we discuss five interpretations of this popular coefficient. Kappa is a function of the proportion of observed and expected agreement, and it may be interpreted as the proportion of agreement corrected for chance. Furthermore, kappa may be interpreted as the average category reliability as well as an intraclass correlation.
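
The chance-corrected interpretation mentioned in the abstract above corresponds to the standard formula κ = (p_o − p_e) / (1 − p_e), where p_o is the observed proportion of agreement and p_e the proportion expected from the raters' marginal totals. A minimal sketch follows; the function name and the 2×2 table are illustrative assumptions, not material from the article.

```python
def cohen_kappa(table):
    """Cohen's kappa from a square cross-classification table.

    table[i][j] is the number of items that rater A assigned to category i
    and rater B assigned to category j.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    p_o = sum(table[i][i] for i in range(k)) / n              # observed agreement
    row_tot = [sum(table[i]) for i in range(k)]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(row_tot[i] * col_tot[i] for i in range(k)) / (n * n)  # chance agreement
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings of 50 items in two categories.
# p_o = 35/50 = 0.7, p_e = (25*30 + 25*20) / 2500 = 0.5
print(cohen_kappa([[20, 5], [10, 15]]))  # about 0.4
```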

Journal ArticleDOI
TL;DR: While common TMS measures cannot be reliably used as a biomarker to detect individual change, they can reliably detect change exceeding measurement noise in moderate-sized groups and should be used based on their reliability in particular contexts.
Abstract: The reliability of transcranial magnetic stimulation (TMS) measures in healthy older adults and stroke patients has been insufficiently characterized. We determined whether common TMS measures could reliably evaluate change in individuals and in groups using the smallest detectable change (SDC), or could tell subjects apart using the intraclass correlation coefficient (ICC). We used a single-rater test-retest design in older healthy, subacute stroke, and chronic stroke subjects. In twice-daily sessions on two consecutive days, we recorded resting motor threshold, test stimulus intensity, recruitment curves, short-interval intracortical inhibition and facilitation, and long-interval intracortical inhibition. Using variances estimated from a random effects model, we calculated the SDC and ICC for each TMS measure. For all TMS measures in all groups, SDCs for single subjects were large; only with modest group sizes did the SDCs become low. Thus, while these TMS measures cannot be reliably used as a biomarker to detect individual change, they can reliably detect change exceeding measurement noise in moderate-sized groups. For several of the TMS measures, ICCs were universally high, suggesting that they can reliably discriminate between subjects. TMS measures should be used based on their reliability in particular contexts. More work establishing their validity, responsiveness, and clinical relevance is still needed.
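
The contrast drawn in the abstract above, between telling subjects apart (ICC) and detecting change (SDC), can be made concrete with the commonly used variance-component definitions. The abstract does not give the authors' exact formulas, so the sketch below is an assumption: ICC as the between-subject share of total variance, SDC for one subject as 1.96 × √2 × SEM with SEM the square root of the error variance, and the group-level SDC as the individual SDC divided by √n. Function names and the variance values are hypothetical.

```python
import math


def icc_from_variances(var_between, var_error):
    """Share of total variance that is due to true between-subject differences."""
    return var_between / (var_between + var_error)


def sdc_individual(var_error, z=1.96):
    """Smallest change in one subject that exceeds measurement noise."""
    sem = math.sqrt(var_error)        # standard error of measurement
    return z * math.sqrt(2) * sem     # sqrt(2): a change involves two measurements


def sdc_group(var_error, n, z=1.96):
    """Smallest detectable change in the mean of a group of n subjects."""
    return sdc_individual(var_error, z) / math.sqrt(n)


# Hypothetical variance components, in squared measurement units.
print(icc_from_variances(9.0, 1.0))   # 0.9
print(sdc_individual(1.0))            # about 2.77
print(sdc_group(1.0, n=25))           # about 0.55
```

Under these definitions a measure can have a high ICC and still a large individual SDC, because the ICC depends on how much subjects differ from each other while the SDC depends only on the error variance.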

Journal ArticleDOI
TL;DR: High reliability and relatively moderate validity were found for the Persian translation of the Modifiable Activity Questionnaire in a Tehranian adolescent population.
Abstract: Background: The purpose of this study was to evaluate the validity and reliability of the Persian translation of the Modifiable Activity Questionnaire (MAQ) in a sample of Tehranian adolescents. Methods: Of a total of 52 subjects, a sub‑sample of 40 participants (55.0% boys) was used to assess the reliability and the validity of the physical activity questionnaire. The reliability of the two MAQs was calculated by intraclass correlation coefficients, and validation was evaluated using Pearson correlation coefficients to compare data between the mean of the two MAQs and the mean of four physical activity records. Results: The intraclass correlation coefficient between the two MAQs for leisure time physical activity over the past year was 0.97. The Pearson correlation coefficient between the mean of the two MAQs and the mean of four physical activity records was 0.49 (P < 0.001) for leisure time physical activities. Conclusions: High reliability and relatively moderate validity were found for the Persian translation of the MAQ in a Tehranian adolescent population. Further studies with large sample size are suggested to assess the validity more precisely.

Journal ArticleDOI
TL;DR: The smartphone app is both reliable and valid, provides a low-cost method of measuring range of motion, and can be easily incorporated into clinical practice.
Abstract: In clinical and research settings, objective range of motion measurement is an essential component of lower limb assessment and treatment evaluation. One reliable tool is the digital inclinometer; however, availability and cost preclude its widespread use. Smartphone apps are now widely available, allowing smartphones to be used as an inclinometer. Reliability and validity studies of new technologies are scarce. Intrarater and interrater reliability of the iHandy Level app installed on a smartphone and an inclinometer were assessed in 20 participants for ankle dorsiflexion using a weight-bearing lunge test. Criterion validity was assessed between a Fastrak and the app, and construct validity was assessed between the inclinometer and the app. Intraclass correlation coefficients (ICC(2,1)) demonstrated excellent intrarater and interrater reliability (intraclass correlation coefficient, 0.97 and 0.76, respectively). Tests of validity demonstrated excellent correlation between all three methods (r > 0.99). The smartphone app is both reliable and valid, provides a low-cost method of measuring range of motion, and can be easily incorporated into clinical practice.

Journal ArticleDOI
TL;DR: Keops® has no bias compared to the traditional paper measurement; moreover, the repeatability and reproducibility of measurements with this method are much better than with standard radiologic measures done manually in both the frontal and sagittal planes, and the use of this software can be recommended for clinical application.
Abstract: The purpose of this study was to evaluate the inter- and intra-observer variability of the computerized radiologic measurements using Keops® and to determine the bias between the software and the standard paper measurement. Four individuals measured all frontal and sagittal variables on 30 randomly selected X-rays on two occasions (test and retest conditions). The Bland–Altman plot was used to determine the degree of agreement between the measurement on paper X-ray and the measurement using Keops® for all reviewers and for the two measures; the intraclass correlation coefficient (ICC) was calculated for each pair of analyses to assess interobserver reproducibility among the four reviewers for the same patient using either paper X-ray or Keops® measurement; and finally, the concordance correlation coefficient (rc) was calculated to assess intraobserver repeatability for the same reviewer and patient between the two measures using the same method (paper or Keops®). The mean difference calculated between the two methods was minimal at −0.4° ± 3.41° [−7.1; 6.4] for frontal measurement and 0.1° ± 3.52° [−6.7; 6.8] for sagittal measurement. Keops® has a better interobserver reproducibility than paper measurement for determination of the sagittal pelvic parameter (ICC = 0.9960 vs. 0.9931; p = 0.0001). It has a better intraobserver repeatability than paper for determination of Cobb's angle (rc = 0.9872 vs. 0.9808; p < 0.0001) and for the pelvic parameter (rc = 0.9981 vs. 0.9953; p < 0.0001). We conclude that Keops® has no bias compared to the traditional paper measurement; moreover, the repeatability and reproducibility of measurements with this method are much better than with similar standard radiologic measures done manually in both the frontal and sagittal planes, and the use of this software can be recommended for clinical application. Diagnostic, level III.
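
Two of the agreement statistics used in the abstract above have short closed forms, sketched here under their usual definitions: Bland–Altman bias with 95% limits of agreement (mean difference ± 1.96 standard deviations of the differences) and Lin's concordance correlation coefficient. The function names and the angle values are hypothetical; nothing below reproduces the study's data or its exact computation.

```python
import statistics


def bland_altman(x, y):
    """Return (bias, lower limit, upper limit) for paired measurements x and y."""
    diffs = [a - b for a, b in zip(x, y)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)          # sample standard deviation
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient (1/n variances, as in Lin's definition)."""
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx2 = sum((a - mx) ** 2 for a in x) / n
    sy2 = sum((b - my) ** 2 for b in y) / n
    # The (mx - my)^2 term penalizes a systematic offset between the two methods.
    return 2 * sxy / (sx2 + sy2 + (mx - my) ** 2)


# Hypothetical angles (degrees) measured with two methods on five radiographs.
method_a = [42.0, 55.5, 38.0, 61.0, 47.5]
method_b = [41.0, 56.0, 39.5, 60.0, 48.0]
print(bland_altman(method_a, method_b))
print(lin_ccc(method_a, method_b))
```

Unlike the Pearson correlation, the concordance coefficient drops when one method reads consistently higher than the other, which is why it suits repeatability and method-comparison questions.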

Journal ArticleDOI
TL;DR: The NRS versions of ESAS and its revised version (ESAS-r), with the additional symptoms of constipation and sleep, are valid and reliable for measuring symptoms in this population of outpatients with advanced cancer.

Journal ArticleDOI
TL;DR: The insole system introduced to continuously measure kinetic and temporospatial gait parameters independently through an insole over up to 4 weeks is feasible for clinical trials that require step by step as well as grouped analysis of gait over a long period of time.
Abstract: A new tool (OpenGo, Moticon GmbH) was introduced to continuously measure kinetic and temporospatial gait parameters independently through an insole over up to 4 weeks. The goal of this study was to investigate the validity and reliability of this new insole system in a group of healthy individuals. Gait data were collected from 12 healthy individuals on a treadmill at two different speeds. In total, six trials of three minutes each were performed by every participant. Validation was performed with the FDM-S System (Zebris). Complete sensor data were used for a within test reliability analysis of over 10000 steps. Intraclass correlation was calculated for different gait parameters and analysis of variance performed. Intraclass correlation for the validation was >0.796 for temporospatial and kinetic gait parameters. No statistical difference was seen between the insole and force plate measurements (difference between means: 36.3 ± 27.19 N; p = 0.19 and 0.027 ± 0.028 s; p = 0.36). Intraclass correlation for the reliability was >0.994 for all parameters measured. The system is feasible for clinical trials that require step by step as well as grouped analysis of gait over a long period of time. Comparable validity and reliability to a stationary analysis tool has been shown.

Journal ArticleDOI
TL;DR: The physical activity and sedentary behaviour items of the HBSC questionnaire seem to be at the borderline of reliability to be used in adolescents.
Abstract: Better assessment of the reliability of the physical activity and sedentary behaviour items across countries in all WHO regions is highly needed. The aim of the study was to examine the test–retest reliability of selected physical activity and sedentary behaviour items of the HBSC questionnaire in Czech, Slovak and Polish adolescents. We obtained data from 693 Czech, Slovak and Polish (50.9 % boys) primary school pupils, grades five (mean age = 11.08; SD = 0.45) and nine (mean age = 15.12; SD = 0.45), who participated in a test–retest study in 2013. We used the single measures of Intraclass Correlation Coefficients (ICC) and Cohen’s Kappa statistic to estimate the test–retest reliability of all selected items within the sample and stratified by gender, age group and country. Both physical activity items (VPA and MVPA) and most of the sedentary behaviour items showed moderate agreement (ICC 0.41–0.60) and a similarly moderate correlation (Cohen’s Kappa 0.3–0.5) after dichotomization. The physical activity and sedentary behaviour items of the HBSC questionnaire seem to be at the borderline of reliability to be used in adolescents.

Journal ArticleDOI
TL;DR: The findings support the adoption of ICECAP-O and ASCOT as outcome measures in economic evaluations of care interventions for older adults that have a broader aim than health-related QOL, because they are at least as reliable as the EQ-5D-3L and are associated with aspects of QOL broader than health.

01 Jan 2015
TL;DR: Scalpdex is the first quality-of-life instrument specifically for patients with scalp dermatitis that is reliable, valid, and responsive; factors derived by principal axes factor analyses with orthogonal rotation correlated with the hypothesized scales.
Abstract: Results: Fifty-two dermatology patients completed the study. We demonstrated construct validity by confirming that the factors derived by principal axes factor analyses with orthogonal rotation correlated to our hypothesized scales (r = 0.76-0.84) and that differences in symptom, functioning, and emotion scores differed among the varying levels of self-reported scalp severity more than would be expected by chance (P<.05 by analysis of variance). The instrument demonstrated reliability with internal consistency (Cronbach α, 0.62-0.80) and reproducibility (intraclass correlation coefficient, 0.90-0.97). The quality-of-life scores changed in the expected direction in our test for responsiveness (P<.05, by paired t test for functioning and emotion for those who improved). We ascertained the discriminant capability of Scalpdex compared with a dermatological generic quality-of-life tool, Skindex, by demonstrating superior responsiveness (P<.005 by paired t test in functioning and emotion) and improved overall sensitivity in individual items. Conclusions: Scalpdex is, to our knowledge, the first quality-of-life instrument specifically for patients with scalp dermatitis that is reliable, valid, and responsive. Clinicians can use the instrument to determine which aspect of the disease most bothers the patient and to evaluate quality of life as one variable of responsiveness to the therapeutic intervention. Arch Dermatol. 2002;138:803-807

Journal ArticleDOI
TL;DR: The Hungarian version of the BICAMS test is a valid and reliable method for the evaluation of MS patients' cognitive function and it seems that because of the short retest period, the members of the HC group remembered the CVLT-II words thus performed better than the patients did.
Abstract: Background: Multiple Sclerosis (MS) causes not only somatic, but also cognitive impairment regardless of the patients' age or the course of the disease. The Brief International Cognitive Assessment for MS (BICAMS) test, published in 2011, is a short cognitive questionnaire: a fast, reliable, sensitive and specific tool for the evaluation of the patients' cognitive state. Objectives: Our primary objective was to assess the validity of the Hungarian version of the BICAMS test. Our secondary objective was to evaluate the impact of the cognitive impairment on the patient's quality of life and fatigue's impact on the patients' cognitive state. Methods: 65 RR-MS patients and 65 age-, sex- and education-matched healthy control (HC) subjects completed the test and were retested after 3 weeks. The patients also completed the MS Quality of Life 54 (MSQoL54) and the Fatigue Impact Scale (FIS) assessments. Group differences were calculated by paired-sample t-tests. The test-retest reliability was measured by intraclass correlation coefficients. To analyze the difference between the test-retest performances of the two groups we used two-way repeated measures ANOVA where the BICAMS battery was the single composite outcome and one-way repeated measures ANOVA. To assess the impact of the cognitive decline on the patients' quality of life and fatigue's impact on the cognitive state, we examined the correlations between results in the BICAMS and the MSQoL54 and FIS. Results: We found significant difference ( p ≤0.001, p =0.017 in the first CVLT-II assessment) between MS patients and members of the HC group in all four evaluated parameters of BICAMS test in both sessions. The correlation coefficients were very strong between the tests and retests ( r >0.8; p r =0.678, p p =0.020) better in the retest sessions as compared to their original performance than the patients did and this difference is solely due to the difference between the CVLT-II performances.
We have found significant negative correlation between the patients' cognitive function and the fatigue score ( r p r >0.3; p Conclusions: The Hungarian version of the BICAMS test is a valid and reliable method for the evaluation of MS patients' cognitive function. It seems that because of the short retest period, the members of the HC group remembered the CVLT-II words thus performed better than the patients did. Also apparently fatigue can have a negative impact on the patients' cognitive state, and cognitive impairment could worsen the patients' quality of life.

Journal ArticleDOI
TL;DR: Algometry is reliable and responsive for assessing pressure pain threshold in patients with knee osteoarthritis; the study also reports the minimum-detectable-change and standard error of measurement to facilitate clinical interpretation of temporal changes.
Abstract: [Purpose] This study aimed to establish the intrarater reliability and responsiveness of a clinically available algometer in patients with knee osteoarthritis as well as to determine the minimum-detectable-change and standard error of measurement of testing to facilitate clinical interpretation of temporal changes. [Subjects] Seventy-three patients with knee osteoarthritis were included. [Methods] Pressure pain threshold measured by algometry was evaluated 3 times at 2-min intervals over 2 clinically relevant sites, mediolateral to the medial femoral tubercle (distal) and lateral to the medial malleolus (local), on the same day. Intrarater reliability was estimated by intraclass correlation coefficients. The minimum-detectable-change and standard error of measurement were calculated. As a measure of responsiveness, the effect size was calculated for the results at baseline and after treatment. [Results] The intrarater reliability was almost perfect (intraclass correlation coefficient = 0.93-0.97). The standard error of measurement and minimum-detectable-change were 0.70-0.66 and 1.62-1.53, respectively. The pressure pain threshold over the distal site was inadequately responsive in knee osteoarthritis, but the local site was responsive. The effect size was 0.70. [Conclusion] Algometry is reliable and responsive to assess measures of pressure pain threshold for evaluating pain patients with knee osteoarthritis.
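The abstract does not state which formulas were used for the standard error of measurement (SEM) and the minimum detectable change (MDC). The sketch below uses the conventional definitions, SEM = SD × sqrt(1 − ICC) and MDC = z × sqrt(2) × SEM. The function names and the choice of z = 1.645 (a 90% confidence level) are assumptions. Under that assumption, the reported SEMs of 0.70 and 0.66 give MDCs of about 1.63 and 1.54, close to the reported 1.62 and 1.53.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from a between-subject SD and an ICC."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def mdc(sem, z=1.645):
    """Minimum detectable change; z = 1.645 (90% level) is an assumed default."""
    # sqrt(2) accounts for measurement error in both of the two compared scores.
    return z * math.sqrt(2.0) * sem


# SEM values as reported in the abstract; the confidence level is assumed.
for reported_sem in (0.70, 0.66):
    print(f"SEM {reported_sem:.2f} -> MDC {mdc(reported_sem):.2f}")
```

A change between two sessions smaller than the MDC cannot be distinguished from measurement error at the chosen confidence level.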

Journal ArticleDOI
TL;DR: The EQ-5D-5L was practical, reliable, valid, and responsive in Thai patients with chronic diseases; similar dimensions of the EQ-5D-5L correlated more highly with those of WHOQoL-BREF and SF-12v2.
Abstract: Due to the problem of high ceiling effects of the EQ-5D-3L, the EQ-5D-5L was developed. However, little was known about the full psychometric properties of the EQ-5D-5L. Thus, this study aimed to evaluate its practicality, reliability, validity, and responsiveness in Thai patients with chronic diseases. One thousand one hundred and fifty-six adults taking a medicine for at least 3 months were identified from three university hospitals in Bangkok, Thailand, between July 2014 and March 2015. Practicality was evaluated by administration times and ceiling effects. Test–retest reliability was assessed using weighted kappa and intraclass correlation coefficients (ICCs). Validity was tested with correlations between the EQ-5D-5L and WHOQoL-BREF and SF-12v2, and known-groups validity. Responsiveness was measured with standardized effect sizes (SES). The mean administration time was approximately 2 min, and the ceiling effect of the EQ-5D-5L index was 13.6 %. The weighted kappa values and ICC of the EQ-5D-5L were 0.48–0.61 and 0.82, respectively. Similar dimensions of the EQ-5D-5L had higher correlations with those of WHOQoL-BREF and SF-12v2. As expected, elderly, female, low-educated, unemployed, higher number of comorbidities and medicines, patients’ perception of poor disease control, and having an adverse drug reaction tended to have poorer EQ-5D-5L scores. The SES of EQ-5D-5L index and EQ-VAS were considered small (0.33–0.42) for the improved group. For the worsened group, the SES of the EQ-5D-5L index was considered small (−0.29) but that of the EQ-VAS was considered large (−0.82). The EQ-5D-5L was practical, reliable, valid, and responsive in Thai patients with chronic diseases.

Journal ArticleDOI
TL;DR: Using the Kinect to independently assess the multiple components of the TUG may provide reliable and clinically useful information that could enable efficient and information-rich large-scale assessments of physical deficits following stroke.
Abstract: Background. The Microsoft Kinect presents a simple, inexpensive, and portable method of examining the independent components of the Timed Up and Go (TUG) without any intrusion on the patient. Objective. This study examined the reliability of these measures, and whether they improved prediction of performance on common clinical tests. Methods. Thirty individuals with stroke completed 4 clinical assessments, including the TUG, 10-m walk test (10MWT), Step Test, and Functional Reach test on 2 testing occasions. The TUG was assessed using the Kinect to determine 7 different functional components. Test-retest reliability was assessed using intraclass correlation coefficient (ICC), redundancy using Spearman's correlation, and score prediction on the clinical tests using multiple regression. Results. All Kinect-TUG variables possessed excellent reliability (ICC(2,k) > 0.90) except trunk flexion angle (ICC = 0.73). Trunk flexion angle and first step length were nonredundant with total TUG time. When predicting 10MWT and Step Test scores, adding step length into regression models comprising age and total TUG time improved model performance by 7% (P <.01) and 6% (P =.03), respectively. Specifically, an interquartile range increase in first step length (0.19 m) was associated with a 0.15 m/s faster gait speed and 1.8 more repetitions on the Step Test. These effect sizes were comparable to our minimal detectable change scores of 0.17 m/s for gait speed and 1.71 repetitions for the Step Test. Conclusions. Using the Kinect to independently assess the multiple components of the TUG may provide reliable and clinically useful information. This could enable efficient and information-rich large-scale assessments of physical deficits following stroke.
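The ICC(2,k) notation in this abstract refers to the two-way random-effects, absolute-agreement form of the intraclass correlation, averaged over k measurements. A minimal pure-Python sketch of ICC(2,1) and ICC(2,k) computed from two-way ANOVA mean squares follows. The function name and the small ratings table are illustrative assumptions, not data from the study, where the "raters" would be the 2 testing occasions.

```python
def icc_two_way_random(ratings):
    """Return (ICC(2,1), ICC(2,k)) for a subjects-by-measurements table."""
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two subjects")
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a rectangular table with at least two columns")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects (rows), measurements (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    single = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    average = (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)
    return single, average


# Illustrative table: 6 subjects (rows) measured 4 times (columns).
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc_single, icc_average = icc_two_way_random(example)
print(f"ICC(2,1) = {icc_single:.2f}, ICC(2,k) = {icc_average:.2f}")
```

The averaged form is never smaller than the single-measurement form when the single-measurement ICC is positive, so it matters which of the two a reliability threshold such as 0.90 is applied to.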

Journal ArticleDOI
06 Mar 2015-PLOS ONE
TL;DR: Although AM of WC, HC, and WHR are higher when compared to MM based on WHO guidelines, the data indicate good validity, excellent reliability, and similar correlations to parameters of the MetS.
Abstract: Objective: Body surface scanners (BS), which visualize a 3D image of the human body, facilitate the computation of numerous body measures, including height, waist circumference (WC) and hip circumference (HC). However, limited information is available regarding validity and reliability of these automated measurements (AM) and their correlation with parameters of the Metabolic Syndrome (MetS) compared to traditional manual measurements (MM). Methods: As part of a cross-sectional feasibility study, AM of WC, HC and height were assessed twice in 60 participants using a 3D BS (VitussmartXXL). Additionally, MM were taken by trained personnel according to WHO guidelines. Participants underwent an interview, bioelectrical impedance analysis, and blood pressure measurement. Blood samples were taken to determine HbA1c, HDL-cholesterol, triglycerides, and uric acid. Validity was assessed based on the agreement between AM and MM, using Bland-Altman plots, correlation analysis, and paired t-tests. Reliability was assessed using intraclass correlation coefficients (ICC) based on two repeated AM. Further, we calculated age-adjusted Pearson correlation for AM and MM with fat mass, systolic blood pressure, HbA1c, HDL-cholesterol, triglycerides, and uric acid. Results: Body measures were higher in AM compared to MM but both measurements were strongly correlated (WC: men, difference (d) = 1.5 cm, r = 0.97; women, d = 4.7 cm, r = 0.96; HC: men, d = 2.3 cm, r = 0.97; women, d = 3.0 cm, r = 0.98). Reliability was high for all AM (nearly all ICC > 0.98). Correlations of WC, HC, and the waist-to-hip ratio (WHR) with parameters of MetS were similar between AM and MM; for example, the correlation of WC assessed by AM with HDL-cholesterol was r = 0.35 in men and r = -0.48 in women, whereas the correlation of WC measured manually with HDL-cholesterol was r = -0.41 in men and r = -0.49 in women.
Conclusions: Although AM of WC, HC, and WHR are higher when compared to MM based on WHO guidelines, our data indicate good validity, excellent reliability, and similar correlations to parameters of the MetS.
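The Bland-Altman agreement analysis mentioned in the methods can be sketched as the mean difference (bias) between paired measurements plus 95% limits of agreement at bias ± 1.96 standard deviations of the differences. The function name and the sample circumferences below are hypothetical and are not taken from the study.

```python
import statistics


def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need two equally long lists with at least two pairs")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD (n - 1 in the denominator)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Invented waist circumferences in cm: automated scan vs. manual tape.
automated = [92.0, 101.5, 88.0, 110.0, 97.5]
manual = [90.5, 99.0, 87.0, 107.5, 95.0]
bias, lower, upper = bland_altman(automated, manual)
print(f"bias = {bias:.2f} cm, limits of agreement = [{lower:.2f}, {upper:.2f}] cm")
```

A high correlation alone does not show agreement, since two methods can correlate strongly while one reads systematically higher; the bias and limits of agreement make that offset explicit.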

Journal ArticleDOI
TL;DR: The excellent ICCs observed in this study support the utility of using multiple RAs in large cohort studies using standardised protocols, with the caveat that an absence of any confounding of study estimates by rater is checked, due to systematic rater bias identified in this study.

Journal ArticleDOI
TL;DR: In this article, the reliability of a Matlab toolbox for the fully automated pre- and post-processing of resting state EEG (automated analysis, AA) was compared with analysis involving visually controlled pre- and post-processing (VA), and the reliability over time was assessed with intraclass correlation coefficients (ICC).