
Showing papers on "Intraclass correlation" published in 2013


Journal ArticleDOI
09 Sep 2013-PLOS ONE
TL;DR: A case is made for clinicians to consider measurement error (ME) indices Coefficient of Repeatability (CR) or the Smallest Real Difference (SRD) over relative reliability coefficients like the Pearson’s (r) and the Intraclass Correlation Coefficient (ICC) while selecting tools to measure change and inferring change as true.
Abstract: The use of standardised tools is an essential component of evidence-based practice. Reliance on standardised tools places demands on clinicians to understand their properties, strengths, and weaknesses, in order to interpret results and make clinical decisions. This paper makes a case for clinicians to consider measurement error (ME) indices Coefficient of Repeatability (CR) or the Smallest Real Difference (SRD) over relative reliability coefficients like the Pearson’s (r) and the Intraclass Correlation Coefficient (ICC), while selecting tools to measure change and inferring change as true. The authors present statistical methods that are part of the current approach to evaluate test–retest reliability of assessment tools and outcome measurements. Selected examples from a previous test–retest study are used to elucidate the added advantages of knowledge of the ME of an assessment tool in clinical decision making. The CR is computed in the same units as the assessment tool and sets the boundary of the minimal detectable true change that can be measured by the tool.

376 citations
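
As a rough illustration of the Coefficient of Repeatability discussed in the abstract above, the sketch below assumes the common two-session definition CR = 1.96 × √2 × s_w, where s_w is the within-subject standard deviation. The function name and the scores are hypothetical and are not taken from the paper.

```python
import math

def coefficient_of_repeatability(test, retest):
    """CR = 1.96 * sqrt(2) * s_w, with s_w the within-subject SD of two sessions."""
    if len(test) != len(retest) or len(test) == 0:
        raise ValueError("need two equal-length, non-empty score lists")
    # With two measurements per subject, each subject's variance estimate is d^2 / 2.
    within_var = sum((a - b) ** 2 / 2 for a, b in zip(test, retest)) / len(test)
    return 1.96 * math.sqrt(2) * math.sqrt(within_var)

# Hypothetical test-retest scores, invented for illustration only.
test = [10.0, 12.0, 14.0, 16.0]
retest = [11.0, 11.0, 15.0, 15.0]
cr = coefficient_of_repeatability(test, retest)
print(f"CR = {cr:.2f} (same units as the tool)")
```

Under this definition, an observed change smaller than the CR cannot be distinguished from measurement error at the 95% level, which is why the value is read in the tool's own units.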


Journal ArticleDOI
TL;DR: The QoR-40 is a suitable measure of postoperative quality of recovery in a range of clinical and research situations; its clinical utility was supported by high patient recruitment into evaluation studies and an excellent completion and return rate.
Abstract: Background Several rating scales have been developed to measure quality of recovery after surgery and anaesthesia, but the most extensively used is the QoR-40, a 40-item questionnaire that provides a global score and subscores across five dimensions: patient support, comfort, emotions, physical independence, and pain. It has been evaluated in a variety of settings, but its overall psychometric properties (validity, reliability, ease of use, and interpretation) and clinical utility are uncertain. Methods We undertook a quantitative systematic review of studies evaluating psychometric properties of the QoR-40. Data were combined in meta-analyses using random effects models. This resulted in a total sample of 3459 patients from 17 studies originating in nine countries. Results We confirmed content, construct, and convergent [pooled r=0.58, 95% confidence interval (CI): 0.51–0.65] validity. Reliability was confirmed by excellent intraclass correlation (pooled α=0.91, 95% CI: 0.88–0.93), test–retest reliability (pooled r=0.90, 95% CI: 0.86–0.92), and inter-rater reliability (intraclass correlation=0.86). The clinical utility of the QoR-40 instrument was supported by high patient recruitment into evaluation studies (97%), and an excellent completion and return rate (97%). The mean time to complete the QoR-40 was 5.1 (95% CI: 4.4–5.7) min. Conclusions The QoR-40 is a widely used and extensively validated measure of quality of recovery. The QoR-40 is a suitable measure of postoperative quality of recovery in a range of clinical and research situations.

288 citations


Journal ArticleDOI
01 Nov 2013-Thorax
TL;DR: The 5STS is reliable, valid and responsive in patients with COPD with an estimated MCID of 1.7 s and is a practical functional outcome measure suitable for use in most healthcare settings.
Abstract: Background Moving from sitting to standing is a common activity of daily living. The five-repetition sit-to-stand test (5STS) is a test of lower limb function that measures the fastest time taken to stand five times from a chair with arms folded. The 5STS has been validated in healthy community-dwelling adults, but data in chronic obstructive pulmonary disease (COPD) populations are lacking. Aims To determine the reliability, validity and responsiveness of the 5STS in patients with COPD. Methods Test-retest and interobserver reliability of the 5STS was measured in 50 patients with COPD. To address construct validity we collected data on the 5STS, exercise capacity (incremental shuttle walk (ISW)), lower limb strength (quadriceps maximum voluntary contraction (QMVC)), health status (St George’s Respiratory Questionnaire (SGRQ)) and composite mortality indices (Age Dyspnoea Obstruction index (ADO), BODE index (iBODE)). Responsiveness was determined by measuring 5STS before and after outpatient pulmonary rehabilitation (PR) in 239 patients. Minimum clinically important difference (MCID) was estimated using anchor-based methods. Results Test-retest and interobserver intraclass correlation coefficients were 0.97 and 0.99, respectively. 5STS time correlated significantly with ISW, QMVC, SGRQ, ADO and iBODE (r=−0.59, −0.38, 0.35, 0.42 and 0.46, respectively; all p Conclusions The 5STS is reliable, valid and responsive in patients with COPD with an estimated MCID of 1.7 s. It is a practical functional outcome measure suitable for use in most healthcare settings.

261 citations


Journal ArticleDOI
TL;DR: The YBT showed good interrater test-retest reliability with an acceptable level of measurement error among multiple raters screening active duty service members.
Abstract: The Y-balance test (YBT) is one of the few field expedient tests that have shown predictive validity for injury risk in an athletic population. However, analysis of the YBT in a heterogeneous population of active adults (e.g., military, specific occupations) involving multiple raters with limited experience in a mass screening setting is lacking. The primary purpose of this study was to determine interrater test–retest reliability of the YBT in a military setting using multiple raters. Sixty-four service members (53 males, 11 females) actively conducting military training volunteered to participate. Interrater test–retest reliability of the maximal reach had intraclass correlation coefficients (2,1) of 0.80 to 0.85 with a standard error of measurement ranging from 3.1 to 4.2 cm for the 3 reach directions (anterior, posteromedial, and posterolateral). Interrater test–retest reliability of the average reach of 3 trials had an intraclass correlation coefficient (2,3) range of 0.85 to 0.93 with an as...

246 citations
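
The "(2,1)" and "(2,3)" labels in the abstract above follow the Shrout and Fleiss naming for two-way random-effects ICCs (single rating and average of k ratings). The sketch below shows how those two coefficients can be computed from the ANOVA mean squares. The function name and the ratings table are illustrative assumptions and have nothing to do with the YBT data.

```python
def icc_two_way_random(scores):
    """Return (ICC(2,1), ICC(2,k)) for an n-subjects x k-raters table of scores."""
    n, k = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))

    # Absolute-agreement forms: rater variance counts as error.
    single = (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
    average = (ms_rows - ms_err) / (ms_rows + (ms_cols - ms_err) / n)
    return single, average

# Illustrative ratings: 6 subjects (rows) scored by 4 raters (columns).
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc_single, icc_average = icc_two_way_random(ratings)
print(f"ICC(2,1) = {icc_single:.2f}, ICC(2,k) = {icc_average:.2f}")
```

Averaging over k ratings shrinks the error terms, which fits the abstract's higher range for the 3-trial average than for the single maximal reach.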


Journal ArticleDOI
TL;DR: Web-based self-reported weight and height data from the NutriNet-Santé study can be considered as valid enough to be used when studying associations of nutritional factors with anthropometrics and health outcomes.
Abstract: Background: With the growing scientific appeal of e-epidemiology, concerns arise regarding validity and reliability of Web-based self-reported data. Objective: The objectives of the present study were to assess the validity of Web-based self-reported weight, height, and resulting body mass index (BMI) compared with standardized clinical measurements and to evaluate the concordance between Web-based self-reported anthropometrics and face-to-face declarations. Methods: A total of 2513 participants of the NutriNet-Santé study in France completed a Web-based anthropometric questionnaire 3 days before a clinical examination (validation sample) of whom 815 participants also responded to a face-to-face anthropometric interview (concordance sample). Several indicators were computed to compare data: paired t test of the difference, intraclass correlation coefficient (ICC), and Bland–Altman limits of agreement for weight, height, and BMI as continuous variables; and kappa statistics and percent agreement for validity, sensitivity, and specificity of BMI categories (normal, overweight, obese). Results: Compared with clinical data, validity was high with ICC ranging from 0.94 for height to 0.99 for weight. BMI classification was correct in 93% of cases; kappa was 0.89. Of 2513 participants, 23.5% were classified overweight (BMI≥25) with Web-based self-report vs 25.7% with measured data, leading to a sensitivity of 88% and a specificity of 99%. For obesity, 9.1% vs 10.7% were classified obese (BMI≥30), respectively, leading to sensitivity and specificity of 83% and 100%. However, the Web-based self-report exhibited slight underreporting of weight and overreporting of height leading to significant underreporting of BMI (P<.05) for both men and women: –0.32 kg/m² (SD 0.66) and –0.34 kg/m² (SD 1.67), respectively. Mean BMI underreporting was –0.16, –0.36, and –0.63 kg/m² in the normal, overweight, and obese categories, respectively. Almost perfect agreement (ie, concordance) was observed between Web-based and face-to-face report (ICC ranged from 0.96 to 1.00, classification agreement was 98.5%, and kappa 0.97). Conclusions: Web-based self-reported weight and height data from the NutriNet-Santé study can be considered as valid enough to be used when studying associations of nutritional factors with anthropometrics and health outcomes. Although self-reported anthropometrics are inherently prone to biases, the magnitude of such biases can be considered comparable to face-to-face interview. Web-based self-reported data appear to be an accurate and useful tool to assess anthropometric data. [J Med Internet Res 2013;15(8):e152]

196 citations
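
The Bland–Altman limits of agreement named in the abstract above are conventionally the mean difference ± 1.96 standard deviations of the paired differences. The sketch below assumes that convention. The function name and the weights are hypothetical and are not NutriNet-Santé data.

```python
import statistics

def bland_altman_limits(method_a, method_b):
    """Return (bias, lower, upper): mean difference and 95% limits of agreement."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical weights in kg, invented for illustration only.
self_reported = [70.0, 82.5, 64.0, 91.0, 58.5]
measured = [70.5, 83.5, 64.0, 92.0, 59.0]
bias, lower, upper = bland_altman_limits(self_reported, measured)
print(f"bias = {bias:.2f} kg, limits = [{lower:.2f}, {upper:.2f}] kg")
```

A negative bias in this toy example corresponds to underreporting, the same direction the abstract describes for self-reported weight.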


Journal ArticleDOI
TL;DR: Training with the current method improved accuracy, and reduced variance, of FMA scoring; the 20% FMA variance reduction with training would decrease sample size requirements from 137 to 88 in a theoretical trial aiming to detect a 7-point FMA difference.
Abstract: Background. Standardizing scoring reduces variability and increases accuracy. A detailed scoring and training method for the Fugl-Meyer motor assessment (FMA) is described and assessed, and implications for clinical trials considered. Methods. A standardized FMA scoring approach and training materials were assembled, including a manual, scoring sheets, and instructional video plus patient videos. Performance of this approach was evaluated for the upper extremity portion. Results. Inter- and intrarater reliability in 31 patients were excellent (intraclass correlation coefficient = 0.98-0.99), validity was excellent (r = 0.74-0.93, P < .0001), and minimal detectable change was low (3.2 points). Training required 1.5 hours and significantly reduced error and variance among 50 students, with arm FMA scores deviating from the answer key by 3.8 ± 6.2 points pretraining versus 0.9 ± 4.9 points posttraining. The current approach was implemented without incident into training for a phase II trial. Among 66 patient...

194 citations
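
The sample-size figures quoted in the TL;DR above can be checked with a back-of-the-envelope calculation. The sketch assumes the usual rule that the required sample size for comparing two means is proportional to the variance (SD squared), and it reads the 20% reduction as applying to the standard deviation. Both points are our assumptions, not statements from the abstract, and the function name is hypothetical.

```python
import math

def scaled_sample_size(n_before, sd_ratio):
    """Sample size after the SD is multiplied by sd_ratio, assuming n scales with SD**2."""
    return math.ceil(n_before * sd_ratio ** 2)

# A 20% smaller SD (ratio 0.80) applied to the original requirement of 137.
n_after = scaled_sample_size(137, 0.80)
print(n_after)
```

Under these assumptions, 137 × 0.8² rounds up to the 88 quoted in the TL;DR. A 20% reduction in the variance itself would instead give about 110.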


Journal ArticleDOI
TL;DR: The Mini-BESTest is a reliable and valid tool for evaluating balance in people with chronic stroke, although the Berg Balance Scale identified fallers better (positive likelihood ratio 2.6 vs 1.8).
Abstract: Background The Mini-Balance Evaluation Systems Test (Mini-BESTest) is a new balance assessment, but its psychometric properties have not been specifically tested in individuals with stroke. Objectives The purpose of this study was to examine the reliability and validity of the Mini-BESTest and its accuracy in categorizing people with stroke based on fall history. Design An observational measurement study with a test-retest design was conducted. Methods One hundred six people with chronic stroke were recruited. Intrarater reliability was evaluated by repeating the Mini-BESTest within 10 days by the same rater. The Mini-BESTest was administered by 2 independent raters to establish interrater reliability. Validity was assessed by correlating Mini-BESTest scores with scores of other balance measures (Berg Balance Scale, one-leg-standing, Functional Reach Test, and Timed “Up & Go” Test) in the stroke group and by comparing Mini-BESTest scores between the stroke group and 48 control participants, and between fallers (≥1 falls in the previous 12 months, n=25) and nonfallers (n=81) in the stroke group. Results The Mini-BESTest had excellent internal consistency (Cronbach alpha=.89–.94), intrarater reliability (intraclass correlation coefficient [3,1]=.97), and interrater reliability (intraclass correlation coefficient [2,1]=.96). The minimal detectable change at 95% confidence interval was 3.0 points. The Mini-BESTest was strongly correlated with other balance measures. Significant differences in Mini-BESTest total scores were found between the stroke and control groups and between fallers and nonfallers in the stroke group. In terms of floor and ceiling effects, the Mini-BESTest was significantly less skewed than other balance measures, except for one-leg-standing on the nonparetic side. The Berg Balance Scale showed significantly better ability to identify fallers (positive likelihood ratio=2.6) than the Mini-BESTest (positive likelihood ratio=1.8). 
Limitations The results are generalizable only to people with mild to moderate chronic stroke. Conclusions The Mini-BESTest is a reliable and valid tool for evaluating balance in people with chronic stroke.

178 citations


Journal ArticleDOI
TL;DR: The present study performs the first independent testing of the official MDS-UPDRS Spanish version and adds information on scale reliability, construct validity, and precision and shows satisfactory clinimetric characteristics.
Abstract: The Movement Disorder Society-UPDRS (MDS-UPDRS) was published in 2008, showing satisfactory clinimetric results and has been proposed as the official benchmark scale for Parkinson’s disease. The present study, based on the official MDS-UPDRS Spanish version, performed the first independent testing of the scale and adds information on its clinimetric properties. The cross-culturally adapted MDS-UPDRS Spanish version showed a comparative fit index ≥0.90 for each part (I–IV) relative to the English-language version and was accepted as the Official MDS-UPDRS Spanish version. Data from this scale, applied with other assessments to Spanish-speaking Parkinson’s disease patients in five countries, were analyzed for an independent and complementary clinimetric evaluation. In total, 435 patients were included. Missing data were negligible and moderate floor effect (30 %) was found for Part IV. Cronbach’s α index ranged between 0.79 and 0.93 and only five items did not reach the 0.30 threshold value of item-total correlation. Test–retest reliability was adequate with only two sub-scores of the item 3.17, Rest tremor amplitude, reaching κ values lower than 0.60. The intraclass correlation coefficient was higher than 0.85 for the total score of each part. Correlation of the MDS-UPDRS parts with other measures for related constructs was high (≥0.60) and the standard error of measurement lower than one-third baseline standard deviation for all subscales. Results confirm those of the original study and add information on scale reliability, construct validity, and precision. The MDS-UPDRS Spanish version shows satisfactory clinimetric characteristics.

163 citations


Journal ArticleDOI
TL;DR: Multisite field trials and training comparable to what would be available to any clinician after publication of DSM-5 provided “real-world” testing of DSM-5 proposed diagnoses.
Abstract: Objective This article discusses the design, sampling strategy, implementation, and data analytic processes of the DSM-5 Field Trials. Method The DSM-5 Field Trials were conducted by using a test-retest reliability design with a stratified sampling approach across six adult and four pediatric sites in the United States and one adult site in Canada. A stratified random sampling approach was used to enhance precision in the estimation of the reliability coefficients. A web-based research electronic data capture system was used for simultaneous data collection from patients and clinicians across sites and for centralized data management. Weighted descriptive analyses, intraclass kappa and intraclass correlation coefficients for stratified samples, and receiver operating curves were computed. The DSM-5 Field Trials capitalized on advances since DSM-III and DSM-IV in statistical measures of reliability (i.e., intraclass kappa for stratified samples) and other recently developed measures to determine confidence in...

159 citations


Journal ArticleDOI
11 Oct 2013-PLOS ONE
TL;DR: Using the standard operating procedure (SOP), quantitative, reliable and reproducible morphometric results can be obtained on duodenal biopsy specimens with different grades of gluten-induced injury.
Abstract: Background Assessment of the gluten-induced small-intestinal mucosal injury remains the cornerstone of celiac disease diagnosis. Usually the injury is evaluated using grouped classifications (e.g. Marsh groups), but this is often too imprecise and ignores minor but significant changes in the mucosa. Consequently, there is a need for validated continuous variables in everyday practice and in academic and pharmacological research. Methods We studied the performance of our standard operating procedure (SOP) on 93 selected biopsy specimens from adult celiac disease patients and non-celiac disease controls. The specimens, which comprised different grades of gluten-induced mucosal injury, were evaluated by morphometric measurements. Specimens with tangential cutting resulting from poorly oriented biopsies were included. Two accredited evaluators performed the measurements in blinded fashion. The intraobserver and interobserver variations for villus height and crypt depth ratio (VH:CrD) and densities of intraepithelial lymphocytes (IELs) were analyzed by the Bland-Altman method and intraclass correlation. Results Unevaluable biopsies according to our SOP were correctly identified. The intraobserver analysis of VH:CrD showed a mean difference of 0.087 with limits of agreement from −0.398 to 0.224; the standard deviation (SD) was 0.159. The mean difference in interobserver analysis was 0.070, limits of agreement −0.516 to 0.375, and SD 0.227. The intraclass correlation coefficient in intraobserver variation was 0.983 and that in interobserver variation 0.978. CD3+ IEL density countings in the paraffin-embedded and frozen biopsies showed SDs of 17.1% and 16.5%; the intraclass correlation coefficients were 0.961 and 0.956, respectively. Conclusions Using our SOP, quantitative, reliable and reproducible morphometric results can be obtained on duodenal biopsy specimens with different grades of gluten-induced injury. 
Clinically significant changes were defined according to the error margins (2SD) of the analyses in VH:CrD as 0.4 and in CD3+-stained IELs as 30%.

156 citations


Journal ArticleDOI
TL;DR: This study sought to confirm the test–retest reliability and validity of the National Center for Geriatrics and Gerontology functional assessment tool (NCGG‐FAT), a newly developed assessment of multidimensional neurocognitive function using a tablet personal computer (PC).
Abstract: Aim This study sought to confirm the test–retest reliability and validity of the National Center for Geriatrics and Gerontology functional assessment tool (NCGG-FAT), a newly developed assessment of multidimensional neurocognitive function using a tablet personal computer (PC). Methods This study included 20 community-dwelling older adults (9 females, aged 65–81 years). Participants were administered the NCGG-FAT twice, separated by approximately 30 days to determine test–retest reliability. To test the validity of the measure, participants underwent established neurocognitive measurements, including memory, attention, executive function, processing speed and visuospatial function within a week from the first administration of the NCGG-FAT. Results Test–retest reliability was in an acceptable range for each component of the NCGG-FAT, with intraclass correlation coefficients ranging from 0.764 to 0.942. Each task in the NCGG-FAT showed a moderate to high correlation with scores on widely-used conventional neurocognitive tests (r = 0.496 to 0.842). Conclusion We found that the NCGG-FAT using a tablet PC was reliable in a sample of community-dwelling older adults. The NCGG-FAT might be useful for cognitive screening in population-based samples and outcomes, enabling assessment of the effects of intervention on multidimensional cognitive function among older adults. Geriatr Gerontol Int 2013; 13: 860–866.

Journal ArticleDOI
TL;DR: This is the first study providing evidence, based on the intraclass correlation coefficient and the Bland-Altman method, that the Eyes test is reliable and stable over a 1-year period in a non-clinical sample of adults.
Abstract: The ‘Reading the Mind in the Eyes’ (Eyes) test is an advanced test of theory of mind. It is widely used to assess individual differences in social cognition and emotion recognition across different groups and cultures. The present study examined distributions of responses and scores on a Spanish version of the test in a non-clinical Spanish adult population, and assessed test-retest reliability over a 1-year interval. A total of 358 undergraduates of both sexes, age 18 to 65 years, completed the Spanish version of the test twice over an interval of 1 year. The Bland-Altman method was used to calculate test-retest reliability. Distributions of responses and scores were optimal. Test-retest reliability for total score on the Eyes test was .63 (P <.01), based on the intraclass correlation coefficient. Test-retest reliability using the Bland-Altman method was fairly good. This is the first study providing evidence that the Eyes test is reliable and stable over a 1-year period, in a non-clinical sample of adults.

Journal ArticleDOI
TL;DR: In this article, the authors developed the Defense and Veterans Pain Rating Scale (DVPRS) to improve interpretability of incremental pain intensity levels, and to improve communication and documentation across all transitions of care.
Abstract: Background. The Army Surgeon General released the Pain Management Task Force final report in May 2010. Among military providers, concerns were raised that the standard numeric rating scale (NRS) for pain was inconsistently administered and of questionable clinical value. In response, the Defense and Veterans Pain Rating Scale (DVPRS) was developed. Methods. The instrument design integrates pain rating scale features to improve interpretability of incremental pain intensity levels, and to improve communication and documentation across all transitions of care. A convenience sample of 350 inpatient and outpatient active duty or retired military service members participated in the study at Walter Reed Army Medical Center. Participants completed the five-item DVPRS—one pain intensity NRS with and without word descriptors presented in random order and four supplemental items measuring general activity, sleep, mood, and level of stress and the Brief Pain Inventory seven interference items. Using systematic sampling, a random sample was selected for a word descriptor validation procedure matching word phrases to corresponding pain intensity on the NRS. Results. Parallel forms reliability and concurrent validity testing demonstrated a robust correlation. When the DVPRS was presented with the word descriptors first, the correlation between the two ratings was slightly higher, r = 0.929 (N = 171; P < 0.001), than ordering first without the descriptors, r = 0.882 (N = 177; P < 0.001). Intraclass correlation coefficient was 0.943 showing excellent alignment of word descriptors by respondents (N = 42), matching them correctly with pain level. Conclusions. The DVPRS tool demonstrated acceptable psychometric properties in a military population.

Journal ArticleDOI
TL;DR: This study suggests that the Wii® balance board is a valid tool for the quantification of postural stability among individuals with Parkinson’s.
Abstract: Background: Impaired postural stability places individuals with Parkinson’s at an increased risk for falls. Given the high incidence of fall-related injuries within this population, ongoing assessment of postural stability is important. Objective: To evaluate the validity of the Nintendo Wii® balance board as a measurement tool for the assessment of postural stability in individuals with Parkinson’s. Subjects: Twenty individuals with Parkinson’s participated. Intervention: Subjects completed testing on two balance tasks with eyes open and closed on a Wii® balance board and biomechanical force platform. Main Measures: Bland–Altman plots and a two-way, random-effects, single measure intraclass correlation coefficient model were used to assess concurrent validity of centre-of-pressure data. Results: Concurrent validity was demonstrated to be excellent across balance tasks (intraclass correlation coefficients = 0.96, 0.98, 0.92, 0.94). Conclusions: This study suggests that the Wii® balance board is a valid tool for the qu...

Journal ArticleDOI
TL;DR: The COD IAGT seems to be a reliable and valid test, whose performance is significantly related to speed rather than to acceleration and leg power.
Abstract: The purposes of this study were first to assess the reliability and criterion-related validity of the change of direction (COD) Illinois Agility Test (IAGT) and second to determine whether a relationship with power and speed exists. A total of 105 male team sport athletes participated in this investigation. Repeat measurements in 89 subjects out of the 105 were performed to assess the test-retest reliability and the 95% confidence interval (CI) of the difference in the score between paired observations (minimal detectable change [MDC]95) of the COD IAGT. The intraclass correlation coefficient and the SEM values for the COD IAGT test were 0.96 (95% CI, 0.85-0.98) and 0.19 seconds, respectively. The smallest worthwhile change (0.20 seconds) for the IAGT was greater than its SEM (0.19 seconds). The MDC95 value for the IAGT was 0.52 seconds. Criterion-related validity of the COD IAGT was assessed in the 105 subjects. They performed the COD IAGT and the T-test. Both tests were significantly correlated (r = 0.31 [95% CI, 0.24-0.39]; p < 0.05). The correlation between COD IAGT, acceleration, straight speed, and leg power was analyzed in all the 105 subjects. Pearson product-moment correlation revealed no association between acceleration and the COD IAGT. However, significant correlations were observed between the COD IAGT and leg power (r = -0.39 [95% CI, -0.26 to -0.44]; p < 0.05), and speed (r = 0.42 [95% CI, 0.37-0.51]; p < 0.05). When controlling for speed with partial correlation, the significant relationship between the COD IAGT and leg power disappeared. In conclusion, the COD IAGT seems to be a reliable and valid test, whose performance is significantly related to speed rather than to acceleration and leg power.
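
The link between the SEM and the MDC95 reported in the abstract above can be sketched with the commonly used formulas SEM = SD × √(1 − ICC) and MDC95 = 1.96 × √2 × SEM. Whether the authors used exactly these formulas is an assumption, and the function names are hypothetical.

```python
import math

def sem_from_icc(sd, icc):
    """Standard error of measurement from a between-subject SD and an ICC."""
    return sd * math.sqrt(1.0 - icc)

def mdc95(sem):
    """Minimal detectable change at the 95% level for a difference of two scores."""
    return 1.96 * math.sqrt(2) * sem

# SEM of 0.19 s as reported in the abstract.
mdc = mdc95(0.19)
print(f"MDC95 = {mdc:.3f} s")
```

With the rounded SEM of 0.19 s this gives roughly 0.53 s, close to the reported 0.52 s. The small gap is presumably due to rounding of the SEM, although the abstract does not say so.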

Journal ArticleDOI
TL;DR: The iOC/OrthoCAD system can be used to measure tooth widths and calculate Bolton ratios with clinically acceptable accuracy and excellent reliability and reproducibility and appears to be a sound orthodontic aid.

Journal ArticleDOI
TL;DR: The EQ-5D-5L has a smaller ceiling effect than the EQ-5D-3L and is a valid and reliable instrument to measure health-related quality of life in the general population.
Abstract: The EQ-5D-5L was developed to compensate for a high ceiling effect and lack of descriptive richness of the EQ-5D-3L. We evaluated psychometric properties of EQ-5D-5L in the general population. Six hundred adults were sampled from the general population in South Korea using a multistage stratified quota sampling method. Participants completed the EQ-5D-5L, EQ-5D-3L, and SF-36v2. One hundred participants were resurveyed for reliability evaluation. The ceiling effect, known-groups construct validity, convergent and discriminant validity, and reliability of EQ-5D-5L were evaluated. A smaller proportion of participants answered ‘no problem’ to all dimensions of EQ-5D-5L (61.2 %) than EQ-5D-3L (65.7 %, p < 0.01), indicating a reduced ceiling effect. Female, elderly, low-educated, and low-income participants reported health problems more frequently, indicating known-groups construct validity. The mobility dimension of EQ-5D-5L was better correlated with the physical component score (|r| = 0.48) than the mental component score (|r| = 0.25) of the SF-36v2, and the anxiety/depression dimension was better correlated with mental component score (|r| = 0.45) than physical component score (|r| = 0.34), indicating convergent and discriminant validity. The intraclass correlation coefficient of EQ-5D-5L index was 0.75. The EQ-5D-5L has a smaller ceiling effect than the EQ-5D-3L and is a valid and reliable instrument to measure health-related quality of life in the general population.

Journal ArticleDOI
TL;DR: The physical performance tests evaluated are useful for detecting differences in performance between older people with mild to moderate dementia and, therefore, are suitable for cross-sectional or controlled intervention studies, but appear less suitable to monitor clinically relevant intra-individual performance changes.
Abstract: Background Physical performance tests are important for assessing the effect of physical activity interventions in older people with dementia, but their psychometric properties have not been systematically established within this specific population. Objective The purpose of this study was to determine the relative and absolute test-retest reliability of the 6-m walk test, the Figure-of-Eight Walk Test (F8W), the Timed “Up & Go” Test (TUG), the Frailty and Injuries: Cooperative Studies of Intervention Techniques–4 (FICSIT–4) Balance Test, the Chair Rise Test (CRT), and the Jamar dynamometer. These tests are used to assess gait speed, dynamic balance, functional mobility, static balance, lower-limb strength, and grip strength, respectively. Design This investigation was a prospective, nonexperimental study. Methods Older people with dementia (n=58, age range=70–92 years) performed each test at baseline and again after 1 week. Intraclass correlation coefficients (ICC), standard error of measurement (SEM), minimal detectable change (MDC), and log-transformed limits of agreement of Bland-Altman plots were calculated. Results The relative reliability of the F8W, TUG, and Jamar dynamometer was excellent (ICC=.90–.95) and good for the 6-m walk test, FICSIT–4, and CRT (ICC=.79–.86). The SEMs and MDCs were large for all tests. The absolute reliability of the TUG and CRT was significantly influenced by the level of cognitive functioning (as assessed with the Mini-Mental State Examination [MMSE]). Limitations The specific etiology of dementia was not obtained. Conclusions The physical performance tests evaluated are useful for detecting differences in performance between older people with mild to moderate dementia and, therefore, are suitable for cross-sectional or controlled intervention studies. They appear less suitable to monitor clinically relevant intra-individual performance changes. 
Future studies should focus on the development of more sensitive tests and the identification of criteria for clinically relevant changes in this rapidly growing population.

Journal ArticleDOI
TL;DR: The reproducibility of individual MRI features is fair to good overall, with good reproducibility for the most commonly used features. When the features are combined into the MR index of activity and the CDMI score, overall reproducibility is good.
Abstract: OBJECTIVE. The purpose of this article is to assess the interobserver variability for scoring MRI features of Crohn disease activity and to correlate two MRI scoring systems to the Crohn disease endoscopic index of severity (CDEIS). MATERIALS AND METHODS. Thirty-three consecutive patients with Crohn disease undergoing 3-T MRI examinations (T1-weighted with IV contrast medium administration and T2-weighted sequences) and ileocolonoscopy within 1 month were independently evaluated by four readers. Seventeen MRI features were recorded in 143 bowel segments and were used to calculate the MR index of activity and the Crohn disease MRI index (CDMI) score. Multirater analysis was performed for all features and scoring systems using intraclass correlation coefficient (ICC) and kappa statistic. Scoring systems were compared with ileocolonoscopy with CDEIS using Spearman rank correlation. RESULTS. Thirty patients (median age, 32 years; 21 women and nine men) were included. MRI features showed fair-to-good interobse...

Journal ArticleDOI
TL;DR: New norm scores for the Box and Block Test for gross manual dexterity in children ages 3-10 yr are provided, finding an age effect for the scores; older children obtained higher scores than younger children.
Abstract: This study provides new norm scores for the Box and Block Test for gross manual dexterity in children ages 3-10 yr. Two hundred fifteen Dutch children performed the Box and Block Test separately with each hand. We found an age effect for the scores; older children obtained higher scores than younger children. Concurrent validity was assessed by means of comparison with the manual dexterity subtests of the Movement Assessment Battery for Children-2; correlations were significant. Intraclass correlation coefficients for test-retest and interrater reliability measures were .85 and .99, respectively. The Box and Block Test is an easy, feasible, valid, and reliable measurement for gross manual dexterity in young children. The obtained norms can be used in clinical settings to compare the gross manual dexterity of atypically developing children with that of age-related peers and to evaluate efficacy of interventions. A larger international reference population is needed to increase generalizability.

Journal ArticleDOI
TL;DR: Results suggest that repeated exposure to the ImPACT test may result in significant improvements in the physical mechanics of how college students interact with the test, but repeated exposure across 1 month does not result in practice effects in memory performance or reaction time.

Journal ArticleDOI
TL;DR: The AM-ULA is a new measure of activity performance for adults with upper limb amputation that considers task completion, speed, movement quality, skillfulness of prosthetic use, and independence in its rating system and has good interrater reliability, test-retest reliability, and demonstrated known group validity.

Journal ArticleDOI
TL;DR: The short-term reliability of anthropometry and physical performance measures in highly-trained young soccer players is unlikely to be affected by age or maturation, but some of these measures are unstable throughout adolescence, which questions their usefulness in a talent identification perspective.
Abstract: The purpose of this study was to assess both short-term reliability and long-term stability of anthropometric and physical performance measures in highly-trained young soccer players in relation to age and maturation. Data were collected on 80 players from an academy (U13–U18, pre- (n = 14), circum- (n = 32) and post- (n = 34) estimated peak height velocity, PHV). For the reliability analysis, anthropometric and performance tests were repeated twice within a month. For the stability analysis, these tests were repeated 12 times over a 4-year period in 10 players. Absolute reliability was assessed with the typical error of measurement, expressed as a coefficient of variation (CV). Relative reliability and long-term stability were assessed using the intraclass correlation coefficient (ICC). There was no clear age or maturation effect on either the CVs or ICCs: e.g., Post-PHV vs. Pre-PHV: effect size = –0.37 (90% confidence limits (CL):-1.6;0.9), with chances of greater/similar/lower values of 20/20/...

Journal ArticleDOI
TL;DR: The I2C2 generalizes the classic intraclass correlation (ICC) coefficient to the case when the data of interest are images, thereby providing a measure that is both intuitive and convenient in high-dimensional imaging studies.
Abstract: This article proposes the image intraclass correlation (I2C2) coefficient as a global measure of reliability for imaging studies. The I2C2 generalizes the classic intraclass correlation (ICC) coefficient to the case when the data of interest are images, thereby providing a measure that is both intuitive and convenient. Drawing a connection with classical measurement error models for replication experiments, the I2C2 can be computed quickly, even in high-dimensional imaging studies. A nonparametric bootstrap procedure is introduced to quantify the variability of the I2C2 estimator. Furthermore, a Monte Carlo permutation is utilized to test reproducibility versus a zero I2C2, representing complete lack of reproducibility. Methodologies are applied to three replication studies arising from different brain imaging modalities and settings: regional analysis of volumes in normalized space imaging for characterizing brain morphology, seed-voxel brain activation maps based on resting-state functional magnetic resonance imaging (fMRI), and fractional anisotropy in an area surrounding the corpus callosum via diffusion tensor imaging. Notably, resting-state fMRI brain activation maps are found to have low reliability, ranging from .2 to .4. Software and data are available to provide easy access to the proposed methods.
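
To make the idea in the abstract above more concrete, here is a minimal sketch of an image-level reliability ratio with a subject-level bootstrap. It assumes the measurement error model W_ij = X_i + U_ij and a simple method-of-moments estimate, 1 − tr(K_U)/tr(K_W), with the traces estimated from within-subject and total sums of squares. The function names, the percentile interval, and the toy data are assumptions for illustration; this is not the authors' released software.

```python
import random


def i2c2(data):
    """data[i][j] is a flattened image (list of floats): subject i, visit j.

    Returns 1 - tr(K_U) / tr(K_W) using simple moment estimates.
    Each subject needs at least two visits overall for the denominator.
    """
    n_obs = sum(len(visits) for visits in data)
    n_subj = len(data)
    dim = len(data[0][0])
    # Grand mean image over all subjects and visits.
    grand = [0.0] * dim
    for visits in data:
        for img in visits:
            for v in range(dim):
                grand[v] += img[v] / n_obs
    total_ss = 0.0
    within_ss = 0.0
    for visits in data:
        subj_mean = [sum(img[v] for img in visits) / len(visits)
                     for v in range(dim)]
        for img in visits:
            total_ss += sum((img[v] - grand[v]) ** 2 for v in range(dim))
            within_ss += sum((img[v] - subj_mean[v]) ** 2 for v in range(dim))
    tr_kw = total_ss / (n_obs - 1)        # total variability
    tr_ku = within_ss / (n_obs - n_subj)  # replication (visit-to-visit) error
    return 1.0 - tr_ku / tr_kw


def bootstrap_ci(data, n_boot=200, seed=0):
    """Percentile interval from resampling whole subjects with replacement."""
    rng = random.Random(seed)
    stats = sorted(i2c2([rng.choice(data) for _ in data])
                   for _ in range(n_boot))
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]


# Synthetic toy data: 6 subjects, 2 visits, 4 "voxels", small visit noise.
rng = random.Random(1)
toy = []
for _ in range(6):
    signal = [rng.gauss(0, 1) for _ in range(4)]
    toy.append([[s + rng.gauss(0, 0.1) for s in signal] for _ in range(2)])

estimate = i2c2(toy)
low, high = bootstrap_ci(toy)
print(f"I2C2 estimate {estimate:.3f}, bootstrap interval ({low:.3f}, {high:.3f})")
```

Resampling whole subjects rather than individual images keeps each subject's visits together, so the within-subject error structure is preserved in every bootstrap replicate.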

Journal ArticleDOI
TL;DR: The findings from this study suggest that the Iranian version of SQOL-F questionnaire has good psychometric properties and it will be useful to assess the female sexual quality of life in reproductive health care settings.
Abstract: Female sexual dysfunction is a common condition that strongly affects reproductive health and quality of life. To assess this health condition, a valid and reliable questionnaire is required. The aim of this study was to translate and validate the Sexual Quality of Life-Female (SQOL-F) questionnaire in Iran. A forward-backward procedure was applied to translate the questionnaire from English into Persian. After linguistic validation and pilot examination, a cross-sectional study was carried out and the psychometric properties of the Iranian version of the questionnaire were tested. One hundred reproductive-aged, married, healthy, and sexually active women completed the questionnaire. Reliability was assessed by internal consistency (Cronbach’s alpha) and test-retest (intraclass correlation coefficient) analyses. In addition, content and face validity were assessed and the factor structure of the questionnaire was extracted by performing exploratory factor analysis. The mean age of participants was 33 (SD = 8.07) years, and the mean quality of sexual life score was 86.4 (SD = 1.78), ranging from 36 to 108. Most women were housewives (n = 92). Reliability evaluation revealed high internal consistency and good test-retest reliability. The Cronbach’s alpha coefficient was 0.73 and the intraclass correlation coefficient (ICC) was 0.88. The mean scores for the content validity index (CVI) and the content validity ratio (CVR) were 0.91 and 0.84, respectively. The results of exploratory factor analysis (EFA) indicated a four-factor solution for the questionnaire that jointly accounted for 60.8% of the variance observed. The findings from this study suggest that the Iranian version of the SQOL-F questionnaire has good psychometric properties and that it will be useful for assessing female sexual quality of life in reproductive health care settings.
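
Internal consistency in studies like the one above is usually summarized with Cronbach's alpha. The sketch below shows the standard formula, alpha = k/(k − 1) × (1 − Σ item variances / variance of total scores), on made-up responses; the function name and the toy data are assumptions for illustration and have no connection to the SQOL-F data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """scores[r][j] is respondent r's answer to item j (numeric).

    Uses sample variances (n - 1 denominator) for items and total scores.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


# Hypothetical responses: 5 respondents, 3 items on a 1-5 scale.
responses = [
    [4.0, 5.0, 4.0],
    [2.0, 3.0, 2.0],
    [5.0, 5.0, 4.0],
    [3.0, 2.0, 3.0],
    [1.0, 2.0, 1.0],
]
alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.2f}")
```

Alpha rises when items covary strongly relative to their individual variances, so it describes how consistently the items move together rather than stability over time, which is what the test-retest ICC addresses.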

Journal ArticleDOI
TL;DR: The FAOS for AAFD has been validated with acceptable construct and content validity, reliability, and responsiveness, and the additional findings in this study support its use as an alternative to less reliable outcome surveys.
Abstract: Introduction: The American Orthopaedic Foot and Ankle Society (AOFAS) Ankle-Hindfoot Score has been under recent scrutiny. The Foot and Ankle Outcome Score (FAOS) is an alternative subjective survey, assessing outcomes in 5 subscales. It is validated for lateral ankle instability and hallux valgus patients. The aim of our study was to validate the FAOS for assessing outcomes in flexible adult acquired flatfoot deformity (AAFD). Methods: Patients from the authors’ institution diagnosed with flexible AAFD from 2006 to 2011 were eligible for the study. In all, 126 patients who completed the FAOS and the Short-Form 12 (SF-12) on the same visit were included in the construct validity component. Correlation was deemed moderate if the Spearman’s correlation coefficient was .4 to .7. Content validity was assessed in 63 patients by a questionnaire that asked patients to rate the relevance of each FAOS question, with a score of 2 or greater considered acceptable. Reliability was measured using intraclass correlation c...

Journal ArticleDOI
TL;DR: This study demonstrates high clinical feasibility and reproducibility of RV pGLS and RV peak global longitudinal strain rate measurements by 2D speckle-tracking echocardiography in premature infants and offers methods for image acquisition and data analysis for systolic strain imaging that can provide a reliable assessment of global RV function.
Abstract: Background Right ventricular (RV) systolic function is an important prognostic determinant of cardiopulmonary pathologies in premature infants. Measurements of dominant RV longitudinal deformation are likely to provide a sensitive measure of RV function. An approach for image acquisition and postacquisition processing is needed for reliable and reproducible measurements of myocardial deformation by two-dimensional (2D) speckle-tracking echocardiography. The aims of this study were to determine the feasibility and reproducibility of 2D speckle-tracking echocardiographic measurement of RV peak global longitudinal strain (pGLS) and peak global longitudinal strain rate in premature infants and to establish methods for acquiring and analyzing strain. Methods The study was designed in two phases: (1) a training phase to develop methods of image acquisition and postprocessing in a cohort of 30 premature infants (born at 28 ± 1 weeks) and (2) a study phase to prospectively test in a separate cohort of 50 premature infants (born at 27 ± 1 weeks) if the methods improved the feasibility and reproducibility of RV pGLS and peak global longitudinal strain rate measurements to a clinically significant level, assessed using Bland-Altman analysis (bias, limits of agreement, coefficient of variation, and intraclass correlation coefficient). Results Strain imaging was feasible from 84% of the acquisitions using the methods developed for optimal speckle brightness and frame rate for RV-focused image acquisition. 
There was high intraobserver (bias, 3%; 95% limits of agreement, −1.6 to +1.6; coefficient of variation, 2.7%; intraclass correlation coefficient, 0.97; P = .02) and interobserver (bias, 7%; 95% limits of agreement, −4.8 to +4.73; coefficient of variation, 3.9%; intraclass correlation coefficient, 0.93; P ...) ... r = 0.97 [P ...] ... r = 0.93 [P ...] ... Conclusions This study demonstrates high clinical feasibility and reproducibility of RV pGLS and RV peak global longitudinal strain rate measurements by 2D speckle-tracking echocardiography in premature infants and offers methods for image acquisition and data analysis for systolic strain imaging that can provide a reliable assessment of global RV function.
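
The bias and 95% limits of agreement quoted in the abstract above come from Bland-Altman analysis. As a hedged illustration, the sketch below computes bias as the mean of paired differences and the limits as bias ± 1.96 × SD of the differences, in raw measurement units. The function name and the paired readings are hypothetical; the study's own percentage-based bias and its preprocessing are not reproduced here.

```python
from statistics import mean, stdev


def bland_altman(first, second):
    """Return (bias, lower limit, upper limit) for paired measurements.

    bias is the mean difference; limits are bias +/- 1.96 * SD of differences.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long series with at least 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD (n - 1 denominator)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical repeated readings by two observers (arbitrary units).
observer_a = [10.0, 12.0, 14.0, 16.0]
observer_b = [9.0, 13.0, 13.0, 17.0]
bias, lower, upper = bland_altman(observer_a, observer_b)
print(f"bias = {bias:.2f}, 95% limits of agreement = ({lower:.2f}, {upper:.2f})")
```

Narrow limits centered near zero indicate that two readings of the same acquisition rarely differ by much, which complements the ICC by expressing agreement in the measurement's own units.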

Journal ArticleDOI
TL;DR: The TUG test is reliable in patients with advanced COPD, CHF, or CRF after 2 trials and values of standard error of measurement and MDC may be used in daily clinical practice with these populations to define what is expected and what represents true change in repeated measures.

Journal ArticleDOI
30 Sep 2013-PLOS ONE
TL;DR: The validity and reliability of the Japanese version of the painDETECT questionnaire (PDQ-J) is demonstrated and researchers and clinicians are encouraged to use this tool for the assessment of patients who suffer suspected neuropathic pain.
Abstract: Objectives The aim of this study was to evaluate the validity and reliability of the Japanese version of the painDETECT questionnaire (PDQ-J). Materials and Methods The translation of the original PDQ into Japanese was achieved according to the published guidelines. Subsequently, a multicenter observational study was performed to evaluate the validity and reliability of PDQ-J, including 113 Japanese patients suffering from pain. Results Factor analysis revealed that the main component of PDQ-J comprises two determinative factors, which account for 62% of the variance observed. Moreover, PDQ-J revealed statistically significant correlation with the intensity of pain (Numerical Rating Scale), Physical Component Score, and Mental Component Score of the Medical Outcomes Study 36-Item Short-Form Health Survey (SF-36). The Cronbach alpha for the total score was 0.78 and for the main component was 0.80. In the analysis of test–retest method, the intraclass correlation coefficient between the two scores was 0.94. Conclusions We demonstrated the validity and reliability of PDQ-J. We encourage researchers and clinicians to use this tool for the assessment of patients who suffer suspected neuropathic pain.

Journal ArticleDOI
TL;DR: The Wii Fit balance board is a balance assessment tool with good reliability and high validity for elderly people, and it is recommended as an alternative tool for assessing balance ability.
Abstract: [Purpose] Balance is an integral part of human ability. The smart balance master system (SBM) is a balance test instrument with good reliability and validity, but it is expensive. Therefore, we modified a Wii Fit balance board, which is a convenient balance assessment tool, and analyzed its reliability and validity. [Subjects and Methods] We recruited 20 healthy young adults and 20 elderly people, and administered 3 balance tests. The correlation coefficient and intraclass correlation of both instruments were analyzed. [Results] There were no statistically significant differences in the 3 tests between the Wii Fit balance board and the SBM. The Wii Fit balance board had a good intraclass correlation (0.86–0.99) for the elderly people and positive correlations (r = 0.58–0.86) with the SBM. [Conclusions] The Wii Fit balance board is a balance assessment tool with good reliability and high validity for elderly people, and we recommend it as an alternative tool for assessing balance ability.