
Showing papers on "Intra-rater reliability" published in 2020


Journal Article
TL;DR: This study determines the intra- and interobserver reliability of ultrasound (US)-detected age-related joint vascularization and ossification grading in healthy children.
Abstract: Objective: To determine the intra- and inter-observer reliability of ultrasound (US)-detected age-related joint vascularization and ossification grading in healthy children. Methods: Following standardized image acquisition and machine setting protocols, 10 international US experts examined four joints (wrist, second metacarpophalangeal joint, knee, and ankle) in 12 healthy children (divided into four age groups: 2–4, 5–8, 9–12, and 13–16 years). Grey-scale US was used to detect the ossification grade, and power Doppler (PD) US was used to detect physiological vascularization. Ossification was graded from grade 0 (no ossification) to grade 3 (complete ossification). A positive PD signal was defined as any PD signal inside the joint. Kappa statistics were applied for intra- and inter-observer reliability. Results: Depending on the specific joint and age, up to four solitary PD signals (mean, 1.5) were detected within each joint area, with predominant localization of the physiological vascularization in specific anatomic positions: fat pad, epiphysis, physis, and short bone cartilage. The kappa values for ossification grading were 0.87 (range, 0.85–0.91) and 0.58 for intra- and inter-observer reliability, respectively. The bias-adjusted kappa values for intra- and inter-observer reliability were 0.71 (range, 0.44–1.00) and 0.69, respectively. Conclusion: Detection of normal findings (i.e., grading of physiological ossification during skeletal maturation and identification of physiological vessels) can be highly reliable when clear definitions and a standardized acquisition protocol are used. These data will permit development of a reliable and standardized US approach for evaluating paediatric joint pathologies.

43 citations


Journal Article
TL;DR: The m30STS is a reliable, feasible tool for use in a general geriatric population with a lower level of function and demonstrated concurrent validity with the Berg Balance Scale and modified Barthel Index but not with knee extensor strength to body weight ratio.
Abstract: Background and purpose Sit-to-stand tests measure a clinically relevant function and are widely used in older adult populations. The modified 30-second sit-to-stand test (m30STS) overcomes the floor effect of other sit-to-stand tests observed in physically challenged older adults. The purpose of this study was to examine interrater and test-retest intrarater reliability for the m30STS for older adults. In addition, convergent validity of the m30STS, as well as responsiveness to change, was examined in older adults undergoing rehabilitation. Methods In phase I, 7 older adult participants were filmed performing the m30STS. The m30STS was standardized to allow hand support during the rise to and descent from standing but required participants to let go of the armrests with each stand. Ten physical therapists and physical therapist assistants independently scored the filmed m30STS twice, with 21 days separating the scoring sessions. In phase II, 33 older adults with comorbidities admitted to physical therapy services at a skilled nursing facility were administered the m30STS, Berg Balance Scale, handheld dynamometry of knee extensors, and the modified Barthel Index at initial examination and discharge. Results In phase I, the m30STS was found to be reliable. Interrater reliability using absolute agreement was calculated as intraclass correlation coefficient ICC(2,1) = 0.737 (P ≤ .001). Test-retest intrarater reliability using absolute agreement was calculated as ICC(2,k) = 0.987 (P ≤ .001). In phase II, concurrent validity was established for the m30STS for the initial (Spearman ρ = 0.737, P = .01) and discharge (Spearman ρ = 0.727, P = .01) Berg Balance Scale as well as total scores of the modified Barthel Index (initial total score Spearman ρ = 0.711, P = .01; discharge total score Spearman ρ = 0.824, P = .01). The initial m30STS predicted 31.5% of the variability in the discharge Berg Balance Scale. 
The m30STS did not demonstrate significant correlation with body weight-adjusted strength measures of knee extensors measured by handheld dynamometry. The minimal detectable change (MDC90) was calculated to be 0.70, meaning that an increase of 1 additional repetition in the m30STS is a change beyond error. Conclusion The m30STS is a reliable, feasible tool for use in a general geriatric population with a lower level of function. The m30STS demonstrated concurrent validity with the Berg Balance Scale and modified Barthel Index but not with knee extensor strength to body weight ratio. One repetition of the m30STS was established as the MDC90 as change beyond error.
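The two coefficients reported in this abstract appear to be two-way random-effects, absolute-agreement ICCs for a single rating and for the mean of k ratings. A minimal sketch of how such values might be computed from a subjects-by-raters table follows, assuming the Shrout and Fleiss mean-square formulas. The function name `icc_two_way_random` and the ratings matrix are illustrative only and are not data from the study.

```python
def icc_two_way_random(ratings):
    """Return (ICC(2,1), ICC(2,k)) for a table with one row per subject
    and one column per rater (two-way random effects, absolute agreement)."""
    n = len(ratings)          # number of subjects
    k = len(ratings[0])       # number of raters
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for subjects (rows), raters (columns) and residual error
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    icc_single = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    icc_average = (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)
    return icc_single, icc_average


# Hypothetical scores: 6 subjects (rows) rated by 4 raters (columns)
example_ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc_single, icc_average = icc_two_way_random(example_ratings)
print(round(icc_single, 2), round(icc_average, 2))  # 0.29 0.62
```

The averaged coefficient is always at least as large as the single-rating one when raters disagree systematically, which is one reason an ICC(2,k) value can look much higher than an ICC(2,1) value from the same design.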

29 citations


Journal Article
TL;DR: This study investigated the reliability of the OCRA checklist method, which is included as a reference method in the ISO and CEN standards for upper limb repetitive risk assessment.

27 citations


Journal Article
TL;DR: Skin tears are acute wounds that are frequently misdiagnosed and under-reported, and a standardized, globally adopted skin tear classification system with supporting evidence for diagnostic validity and reliability is required.
Abstract: Background Skin tears are acute wounds that are frequently misdiagnosed and under-reported. A standardized and globally adopted skin tear classification system with supporting evidence for diagnostic validity and reliability is required to allow assessment and reporting in a consistent way. Objectives To measure the validity and reliability of the International Skin Tear Advisory Panel (ISTAP) Classification System internationally. Methods A multicountry study was set up to validate the content of the ISTAP Classification System through expert consultation in a two-round Delphi procedure involving 17 experts from 11 countries. An online survey including 24 skin tear photographs was conducted in a convenience sample of 1601 healthcare professionals from 44 countries to measure diagnostic accuracy, agreement, inter-rater reliability and intrarater reliability of the instrument. Results A definition for the concept of a 'skin flap' in the area of skin tears was developed and added to the initial ISTAP Classification System consisting of three skin tear types. The overall agreement with the reference standard was 0·79 [95% confidence interval (CI) 0·79-0·80] and sensitivity ranged from 0·74 (95% CI 0·73-0·75) to 0·88 (95% CI 0·87-0·88). The inter-rater reliability was 0·57 (95% CI 0·57-0·57). The Cohen's Kappa measuring intrarater reliability was 0·74 (95% CI 0·73-0·75). Conclusions The ISTAP Classification System is supported by evidence for validity and reliability. The ISTAP Classification System should be used for systematic assessment and reporting of skin tears in clinical practice and research globally. What's already known about this topic? Skin tears are common acute wounds that are misdiagnosed and under-reported too often. A skin tear classification system is needed to standardize documentation and description for clinical practice, audit and research. What does this study add? 
The International Skin Tear Advisory Panel Classification System was psychometrically tested in 1601 healthcare professionals from 44 countries. Diagnostic accuracy was high when differentiating between type 1, 2 and 3 skin tears using a set of validated photographs.

26 citations


Journal Article
TL;DR: The Spanish FMA-LE can be recommended for evaluation of motor impairment in stroke and wider use would allow worldwide comparisons of stroke recovery.
Abstract: Background: The Fugl-Meyer Assessment of Lower Extremity (FMA-LE) is a widely used and recommended scale for evaluation of post-stroke motor impairment. However, the reliability of the scale has only been established by using parametric statistical methods, which ignore the ordinal properties of the scale. Objective: To determine intra- and inter-rater reliability of the FMA-LE at item and summed score level early after stroke. Methods: Sixty patients (mean age 65.9 years, median FMA-LE 29 points) admitted to the hospital due to stroke were included. The FMA-LE was simultaneously, but independently, scored by three experienced and trained physical therapists randomly assigned into pairs, on two consecutive days, between 4 and 9 days post stroke. A rank-based statistical method for paired ordinal data was used to assess the level of agreement and systematic and random disagreements. Results: The item-level reliability was high (percentage of agreement [PA] ≥75%). Two items (ankle dorsiflexion during flexor synergy and normal reflex activity) showed some systematic disagreement in intrarater analysis. A satisfactory intrarater reliability (PA ≥70%) was reached for all summed scores when a 1- or 2-point difference was accepted between ratings. Conclusion: The FMA-LE is a reliable tool for assessment of motor impairment both within and between raters early after stroke. The scale can be recommended not only for use in Spanish-speaking countries, but also internationally. A unified international use of FMA-LE would allow comparison of stroke recovery outcomes worldwide and thereby potentially improve the quality of stroke rehabilitation.

24 citations


Journal Article
TL;DR: SANE, a feasible and markerless system, has large potential for assessing spatiotemporal gait parameters and showed acceptable to excellent test-retest, inter-rater and intra-rater reliability.
Abstract: Studies have demonstrated the validity of Kinect-based systems to measure spatiotemporal parameters of gait. However, few studies have addressed test-retest, inter-rater and intra-rater reliability for spatiotemporal gait parameters. This study aims to assess test-retest, inter-rater and intra-rater reliability of SANE (eaSy gAit aNalysis system) as a measuring instrument for spatiotemporal gait parameters. SANE comprises a depth sensor and software that automatically estimates spatiotemporal gait parameters using distances between ankles, without the need to manually indicate where each gait cycle begins and ends. Gait analysis was conducted by 2 evaluators for 12 healthy subjects during 4 sessions. Reliability was evaluated using Intraclass Correlation Coefficients (ICC). In addition, the Standard Error of the Measurement (SEM) and Smallest Detectable Change (SDC) were calculated. SANE showed acceptable to excellent test-retest, inter-rater and intra-rater reliability: test-retest reliability ranged from 0.62 to 0.81, inter-rater reliability ranged from 0.70 to 0.95 and intra-rater reliability ranged from 0.74 to 0.92. Subject behavior had a greater effect on the reliability of SANE than evaluator performance. The reliability values of SANE were comparable with those of other similar studies. SANE, as a feasible and markerless system, has large potential for assessing spatiotemporal gait parameters.

21 citations


Journal Article
TL;DR: Hidradenitis suppurativa is a chronic, inflammatory skin disease with a large impact on patients' health-related quality of life, but reliable and consistent outcome measures to assess body surface area (BSA) have not been established.
Abstract: Background Hidradenitis suppurativa (HS) is a chronic, inflammatory skin disease with a large impact on patients' health-related quality of life. However, reliable and consistent outcome measures to assess body surface area (BSA) of HS have not been established. Objectives To develop and assess the reliability and validity of a novel outcome instrument for assessment of HS BSA in a clinical trial setting. Methods Qualitative interviews and focus groups were conducted from July to August 2015 and October 2017 to January 2018. Evaluation of the measurement was assessed during a single-day grading session with patients in April 2018. Participants, who included clinicians or patients, were recruited from academic medical centres in the U.S. mid-Atlantic region. Results Concept elicitation included input from 10 providers, of which 60% (n = 6) were female, 80% (n = 8) dermatology specialists and 20% (n = 2) gynaecology specialists. Cognitive debriefing was conducted with 11 providers, of which 82% (n = 9) were dermatologists and 18% (n = 2) gynaecologists. The evaluation stage included 10 clinicians and 23 patients. The intraclass correlation coefficient (ICC) for inter-rater reliability was 0·60 [95% confidence interval (CI) 0·44-0·74]. The ICC for intrarater reliability was 0·98 (95% CI 0·94-1·00). Transformation of the BSA score resulted in an increase in inter-rater reliability to 0·75 (95% CI 0·62-0·85) or 0·76 (95% CI 0·62-0·85). Scores all demonstrated concurrent validity, with statistically significant correlations with extant scoring methods. Conclusions This novel scale is a reliable and valid HS outcome instrument and may capture a wide range of patients by assessing BSA. Future research is necessary to demonstrate its responsiveness. What's already known about this topic? The major HS disease activity scales rely on lesions counts and have moderate-to-good reliability. 
Body surface area (BSA) is one of the physical signs included in the Core Outcome Set for HS, but is not a part of existing HS disease activity scales. What does this study add? A novel disease severity scale, the Severity and Area Score for Hidradenitis (SASH), was developed and the psychometric properties assessed. There was high inter-rater reliability of 0·75 and 0·76 when BSA was scored on an ordinal scale, and an excellent intrarater reliability of 0·98. The SASH score also demonstrated convergent validity with extant instruments. What are the clinical implications of this work? The ability of clinicians to accurately assess disease status will be improved. Implementation of the SASH score will help guide and assess the effectiveness of appropriate treatment choice.

21 citations


Journal Article
TL;DR: The diagnosis of vasospasm using CTA alone was not sufficiently repeatable among observers to support its general use to guide decisions in the clinical management of patients with SAH, and interrater reliability was found to be moderate at best.
Abstract: BACKGROUND AND PURPOSE: Computed tomography angiography offers a non-invasive alternative to DSA for the assessment of cerebral vasospasm following subarachnoid hemorrhage but there is limited evidence regarding its reliability. Our aim was to perform a systematic review (Part I) and to assess (Part II) the inter- and intraobserver reliability of CTA in the diagnosis of cerebral vasospasm. MATERIALS AND METHODS: In Part I, articles reporting the reliability of CTA up to May 2018 were systematically searched and evaluated. In Part II, 11 raters independently graded 17 arterial segments in each of 50 patients with SAH for the presence of vasospasm using a 4-category scale. Raters were additionally asked to judge the presence of any moderate/severe vasospasm (≥ 50% narrowing) and whether findings would justify augmentation of medical treatment or conventional angiography ± balloon angioplasty. Four raters took part in the intraobserver reliability study. RESULTS: In Part I, the systematic review revealed few studies with heterogeneous vasospasm definitions. In Part II, we found interrater reliability to be moderate at best (κ ≤ 0.6), even when results were stratified according to specialty and experience. Intrarater reliability was substantial (κ > 0.6) in 3/4 readers. In the per arterial segment analysis, substantial agreement was reached only for the middle cerebral arteries, and only when senior raters’ judgments were dichotomized (presence or absence of ≥50% narrowing). Agreement on the medical or angiographic management of vasospasm based on CTA alone was less than substantial (κ ≤ 0.6). CONCLUSIONS: The diagnosis of vasospasm using CTA alone was not sufficiently repeatable among observers to support its general use to guide decisions in the clinical management of patients with SAH.

20 citations


Journal Article
TL;DR: The results of this study indicate the polynomial method, with subject-specific anatomy correction, can measure spinal alignment in a valid and reliable way using motion capture in both healthy and deformed spines.

17 citations


Journal Article
TL;DR: Four- and 10-site 10 g monofilament testing have similarly acceptable levels of reliability and the neurothesiometer is the most reliable method of assessing vibration perception function.
Abstract: Testing of protective sensation and vibration perception are two of the most commonly used non-invasive methods of screening for diabetes-related peripheral neuropathy (DPN). However, there is limited research investigating the reliability of these tests in people with diabetes. The aim of this study was to determine the inter- and intra-rater reliability of methods used to test vibration perception and protective sensation in a community-based population of adults with type 2 diabetes. Three podiatrists with varying clinical experience tested four- and 10-site, 10 g monofilament and vibration perception threshold (VPT). In a separate cohort, the reliability of a graduated tuning fork as well as two methods of conventional tuning fork (on/off method and dampening method) was assessed by a new graduate podiatrist and a podiatrist with one year's clinical experience. The intra- (Cohen’s κ) and inter-rater (Cohen’s or Fleiss’ κ) reliability of each test was determined. Fifty participants (66% male, 100% type 2, 32% with DPN) underwent monofilament and neurothesiometer testing with 44 returning for the retest. Twenty-four participants (63% male, 100% type 2, 4% with DPN) underwent tuning fork testing and returned for retest. All tests demonstrated acceptable inter-rater reliability ranging from moderate (10-site monofilament, κ: 0.54, CI: 0.38–0.70, p = 0.02) to substantial (graduated tuning fork, κ: 0.68, CI: 0.41–0.95, p < 0.01). The 10-site monofilament (κ: 0.44–0.77) outperformed the 4-site test (κ: 0.34–0.67) and the dampened tuning fork method (κ: 0.41–0.49) showed lower intra-rater reliability compared to both conventional (κ: 0.52–0.57) and graduated methods (κ: 0.50–0.57). We support the current recommendations of using more than one test to screen and monitor progression of DPN. 
Four- and 10-site 10 g monofilament testing have similarly acceptable levels of reliability and the neurothesiometer is the most reliable method of assessing vibration perception function. Use of a graduated tuning fork was slightly more reliable than other methods of tuning fork application however all had substantial reliability. Years of clinical experience only marginally affected test reliability overall and due to subjective nature of the tests we suggest that testing should be performed regularly and repetitively.
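When more than two raters classify the same participants, as with the three podiatrists here, Fleiss' κ is a common choice. The following is a minimal sketch of the standard formula, assuming every participant is rated by the same number of raters. The function name `fleiss_kappa` and the counts are hypothetical and do not come from the study.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of raters
    who assigned subject i to category j (same number of raters per subject)."""
    n_subjects = len(counts)
    n_categories = len(counts[0])
    n_raters = sum(counts[0])

    # Share of all assignments that fell into each category
    category_share = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    # Agreement among rater pairs within each subject
    per_subject = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_subject) / n_subjects
    p_expected = sum(p * p for p in category_share)
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical data: 4 participants, 3 raters, categories (normal, impaired)
example_counts = [[3, 0], [2, 1], [0, 3], [1, 2]]
print(round(fleiss_kappa(example_counts), 3))  # 0.333
```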

14 citations


Journal Article
TL;DR: The development of a codebook as a supplement to the assessment tool Observation Scheme-12 enables an objective rating of audiotaped clinical communication with acceptable reliability.
Abstract: The aim of the study was to confirm the validity and reliability of the Observation Scheme-12, a measurement tool for rating clinical communication skills. The study is a sub-study of an intervention study using audio recordings to assess the outcome of communication skills training. This paper describes the methods used to validate the assessment tool Observation Scheme-12 by operationalizing the crude 5-point scale into specific elements described in a codebook. Reliability was tested by calculating the intraclass correlation coefficients for interrater and intrarater reliability. The validation of the Observation Scheme-12 produced a rating tool with 12 items. Each item has 0 to 5 described micro-skills. For each item, the codebook described the criteria for delivering a rating from 0 to 4 depending on how successfully the different micro-skills (or number of jargon words used) were accomplished. For the overall score, the intraclass correlation coefficient was 0.74 for interrater reliability and 0.86 for intrarater reliability. An intraclass correlation coefficient greater than 0.5 was observed for 10 of 12 items. The development of a codebook as a supplement to the assessment tool Observation Scheme-12 enables an objective rating of audiotaped clinical communication with acceptable reliability. The Observation Scheme-12 can be used to assess communication skills based on the Calgary-Cambridge Guide.

Journal Article
TL;DR: Gyko has good inter- and intrarater reliability and excellent concurrent validity compared to the optical motion system for lumbar range of motion.
Abstract: Background: The aim of this study was to test the inter- and intrarater reliability and the concurrent validity of the Gyko Microgate for the assessment of lumbar range of motion. Methods: A cross-sectional study was carried out with two groups of healthy participants. The first group, consisting of 91 subjects, was tested to determine the inter- and intrarater reliability. Concurrent validity was assessed by comparison with an optical motion system (Vicon) in a second group of 20 subjects. Lumbar range of motion was measured in flexion, extension, and left and right lateral flexion. Intraclass correlation coefficient (ICC) was calculated for both analyses. Measurement error was calculated with standard error of the measurement (SEM), smallest detectable change (SDC) and Limits of Agreement (LoA). ICCs were considered good when ICC ≥0.80 and excellent with ICC ≥0.90. Results: Interrater reliability was good to excellent with ICCs ranging from 0.82 to 0.94. Intrarater reliability was good to excellent with ICCs ranging from 0.84 to 0.95. Concurrent validity was excellent with ICCs varying from 0.90 to 0.95. LoA were highest in interrater reliability and smallest in concurrent validity. SEM ranged from 2.2° (left lateral flexion) to 4.0° (flexion). SDC varied from 6.1 to 11.1°. Conclusion: Gyko has good inter- and intrarater reliability and excellent concurrent validity compared to the optical motion system for lumbar range of motion. Gyko may be considered an objective tool for measuring range of motion for clinical purposes; however, trials with patients are currently lacking.
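The SEM and SDC figures in this abstract are consistent with the commonly used relation SDC = 1.96 × √2 × SEM. The sketch below assumes that relation and the usual SEM = SD × √(1 − ICC) definition; the abstract does not state which formulas the authors applied, and the function names are illustrative.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from a between-subject SD and an ICC."""
    return sd * math.sqrt(1.0 - icc)


def smallest_detectable_change(sem, z=1.96):
    """SDC at the 95% level for the difference between two measurements."""
    return z * math.sqrt(2.0) * sem


sdc_low = smallest_detectable_change(2.2)   # smallest SEM reported (degrees)
sdc_high = smallest_detectable_change(4.0)  # largest SEM reported (degrees)
print(round(sdc_low, 1), round(sdc_high, 1))  # 6.1 11.1
```

Under these assumptions the reported SEM range of 2.2 to 4.0° reproduces the reported SDC range of 6.1 to 11.1°. Replacing `z=1.96` with `z=1.645` would give the 90% variant often written as MDC90.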

Journal Article
TL;DR: The mobile application may be considered a valid and reliable tool to assess thoracolumbar ROM in both asymptomatic subjects and subjects with chronic low back pain.
Abstract: Background: Smartphone devices have been used to measure range of motion (ROM) in different joints. Objective: To verify the concurrent validity of thoracolumbar ROM using a mobile application and a digital inclinometer, as well as the intrarater reliability in individuals with and without back pain. Methods: One investigator was responsible for measuring the ROM during the evaluations performed on 20 asymptomatic subjects and 20 symptomatic subjects on two consecutive days. Results: Regarding concurrent validity, the intraclass correlation coefficients (ICC) were classified as very good for all analyzed movements. For intrarater reliability, the mobile application had ICCs varying between good and very good for the symptomatic subjects and very good for asymptomatic subjects. Conclusions: The mobile application may be considered a valid and reliable tool to assess thoracolumbar ROM for both asymptomatic and chronic low back pain subjects.

Journal Article
TL;DR: In this paper, the relative and absolute reliability of the pressure pain threshold (PPT) in the shoulder muscles of participants with and without unilateral subacromial impingement syndrome was determined.

Journal Article
TL;DR: Low interrater and intrarater agreement in triage decisions among PCPs limits the possibility to use human decisions as a reference for ML to automate triage in primary care.
Abstract: Background: Smartphones have made it possible for patients to digitally report symptoms before physical primary care visits. Using machine learning (ML), these data offer an opportunity to support decisions about the appropriate level of care (triage). Objective: The purpose of this study was to explore the interrater reliability between human physicians and an automated ML-based triage method. Methods: After testing several models, a naive Bayes triage model was created using data from digital medical histories, capable of classifying digital medical history reports as either in need of urgent physical examination or not in need of urgent physical examination. The model was tested on 300 digital medical history reports and classification was compared with the majority vote of an expert panel of 5 primary care physicians (PCPs). Reliability between raters was measured using both Cohen κ (adjusted for chance agreement) and percentage agreement (not adjusted for chance agreement). Results: Interrater reliability as measured by Cohen κ was 0.17 when comparing the majority vote of the reference group with the model. Agreement was 74% (138/186) for cases judged not in need of urgent physical examination and 42% (38/90) for cases judged to be in need of urgent physical examination. No specific features linked to the model’s triage decision could be identified. Between physicians within the panel, Cohen κ was 0.2. Intrarater reliability when 1 physician retriaged 50 reports resulted in Cohen κ of 0.55. Conclusions: Low interrater and intrarater agreement in triage decisions among PCPs limits the possibility to use human decisions as a reference for ML to automate triage in primary care.
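Cohen κ corrects the observed agreement for the agreement expected by chance, which is why it can be low even when raw percentage agreement looks moderate. The sketch below shows the calculation on a 2×2 table. The table is a reconstruction under an assumption: it reads the reported 138/186 and 38/90 as the model's agreement with the panel's majority vote within each category, which the abstract does not state explicitly.

```python
def cohen_kappa(table):
    """Return (kappa, observed agreement) for a square contingency table
    with rows for rater A's categories and columns for rater B's."""
    n = sum(sum(row) for row in table)
    size = len(table)
    p_observed = sum(table[i][i] for i in range(size)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(size)]
    # Chance agreement from the product of the two raters' marginal shares
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (p_observed - p_expected) / (1.0 - p_expected)
    return kappa, p_observed


# Assumed layout: rows = panel majority vote, columns = model decision,
# category order = (not urgent, urgent)
triage_table = [
    [138, 48],  # panel "not urgent": model agreed on 138 of 186
    [52, 38],   # panel "urgent": model agreed on 38 of 90
]
kappa, agreement = cohen_kappa(triage_table)
print(round(kappa, 2), round(agreement, 2))  # 0.17 0.64
```

Under this reading, raw agreement is about 64% while κ is about 0.17, which matches the κ reported for the model against the reference group.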

Journal Article
TL;DR: Measurements performed with the CROM goniometer show interrater and intrarater agreement in assessments of cervical range of motion and can be recommended for use in daily clinical practice.
Abstract: Purpose. The current study was designed to assess interrater and intrarater validity of cervical range of motion measurements performed with a CROM goniometer. Material and Methods. The study involved 95 healthy university students (31 males and 64 females) aged 20-24 years. Two examiners performed measurements of cervical range of motion using a CROM goniometer. The same subjects were examined again after two weeks, in the same conditions. The results acquired by one rater during the first and the second examination were compared for reproducibility, while the results obtained by the two examiners were compared to assess validity and reliability of the tool. Cronbach’s alpha was applied to determine intrarater reliability, and the values of correlations were used to assess the interrater agreement. Results. Analysis of the results showed both intrarater and interrater agreement in all the measures of cervical range of motion. The highest intrarater and interrater concordance was observed in the measure of extension. Intrarater agreement for Examiner 1 was reflected by Cronbach’s alpha [value missing in source], and for Examiner 2 by Cronbach’s alpha [value missing in source]. As for the interrater agreement in the measure of extension, the value of correlation in both the first and the second measurement amounted to [value missing in source]. Conclusions. Measurements performed with the CROM goniometer show interrater and intrarater agreement in assessments of cervical range of motion. The CROM goniometer can be recommended for use in daily clinical practice.
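Cronbach's alpha, used here for intrarater reliability, can be computed by treating each measurement session as an "item". The following is a minimal sketch of the standard formula under that assumption. The function name `cronbach_alpha` and the angle values are hypothetical and are not taken from the study.

```python
import statistics


def cronbach_alpha(columns):
    """Cronbach's alpha for a list of score columns (one per session or
    item), where position i in every column belongs to the same subject."""
    k = len(columns)
    item_variances = [statistics.variance(col) for col in columns]
    totals = [sum(scores) for scores in zip(*columns)]
    total_variance = statistics.variance(totals)
    return k / (k - 1) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical extension angles (degrees) for 5 subjects, two weeks apart
session_1 = [60, 65, 70, 72, 58]
session_2 = [62, 64, 71, 70, 59]
alpha = cronbach_alpha([session_1, session_2])
print(round(alpha, 2))  # 0.98
```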

Journal Article
TL;DR: Navicular drop is relatively more reliable than other traditional techniques and the FPI-6 has excellent intrarater reliability, but only moderate interrater reliability which can provide clinicians and researchers with a reliable way to implement foot posture assessment.

Journal Article
01 Jul 2020 - PM&R
TL;DR: The Gait Assessment and Intervention Tool is an observational gait scale that assesses kinematic parameters using video recordings.
Abstract: Background: Gait impairment is one of the main causes of disability in people with multiple sclerosis. The Gait Assessment and Intervention Tool is an observational gait scale that assesses kinematic parameters using video recordings. Objective: To study intra- and interrater reliability and the minimal detectable change of the Gait Assessment and Intervention Tool in individuals with multiple sclerosis. Design: Observational study. Setting: Multiple Sclerosis Foundation. Participants: Thirty-five participants with multiple sclerosis were assessed (12 men, 23 women; 47.7 ± 11 y; Expanded Disability Status Scale = 4.32 ± 1.4). Interventions: Not applicable. Main outcome measurements: Intra- and interrater reliability of the Gait Assessment and Intervention Tool was assessed for each limb using the Intraclass Correlation Coefficient. In addition, the minimal detectable change was calculated. Results: The Intraclass Correlation Coefficient for the intrarater reliability was found to be excellent for the total score both for the right side (.91; 95% confidence interval [CI] .85-.95) and the left side (.93; 95% CI .88-.96). The Intraclass Correlation Coefficient for the interrater reliability was .91 (95% CI .85-.95) for the right side, and .93 (95% CI .88-.96) for the left side. The minimal detectable change for the intrarater reliability was 1.19 points for the right side and .77 for the left side. Conclusions: The Gait Assessment and Intervention Tool exhibits excellent intra- and interrater reliability and a small minimal detectable change for people with multiple sclerosis.

Journal Article
TL;DR: An analysis of the interrater and intrarater reliability of direct anthropometric measurements with caliper on defined craniofacial references in infants with positional plagiocephaly shows excellent intra- and interrater reliability for maximal cranial length, maximal cranial width, and right and left cranial diagonals, and good intra- and interrater reliability for maximal cranial circumference measurement.
Abstract: (1) Background: anthropometric measurements with calipers are used to objectify cranial asymmetry in positional plagiocephaly but there is controversy regarding the reliability of different methodologies. Purpose: to analyze the interrater and intrarater reliability of direct anthropometric measurements with caliper on defined craniofacial references in infants with positional plagiocephaly. (2) Methods: 62 subjects ([text missing in source] 0.9. Intrarater and interrater reliability for the left cranial diagonal was excellent: ICC > 0.9 and difference in agreement in the Bland-Altman plot 0.0 mm, respectively. Intrarater and interrater reliability for the maximal cranial circumference was good: differences in agreement in Bland-Altman plots: intra: -0.03 cm; inter: -0.12 cm. (4) Conclusions: anthropometric measurements in a sample of infants with moderate positional plagiocephaly have shown excellent intra- and interrater reliability for maximal cranial length, maximal cranial width, and right and left cranial diagonals, and good intra- and interrater reliability in maximal cranial circumference measurement.

Journal Article
TL;DR: None of the 6 evaluated scoring systems showed a significant advantage over the other when comparing ICCs, and all the instruments seem to be very reliable methods.
Abstract: A wide variety of assessment tools have been proposed for hidradenitis suppurativa (HS) until now, but none of them meets the criteria for an ideal score. Because there is no gold standard scoring system, the choice of the measurement instrument depends on the purpose of use and even on the physician's experience in the subject of HS. The aim of this study was to assess the intrarater and interrater reliability of 6 scoring systems commonly used for grading severity of HS: the Hurley Staging System, the Refined Hurley Staging, the International Hidradenitis Suppurativa Severity Score System (IHS4), the Hidradenitis Suppurativa Severity Index (HSSI), the Sartorius Hidradenitis Suppurativa Score and the Hidradenitis Suppurativa Physician's Global Assessment Scale (HS-PGA). On the scoring day, 9 HS patients underwent a physical examination and disease severity assessment by a group of 16 dermatology residents using all evaluated instruments. Then, intrarater reliability was calculated using intraclass correlation coefficient (ICC), and interrater variability was evaluated using the coefficient of variation (CV). In all 6 scorings the ICCs were >0.75, indicating high intrarater reliability of all presented scales. The study also demonstrated moderate agreement between raters in most of the evaluated measurement instruments. The most reproducible methods, according to CVs, seem to be the Hurley staging, IHS4, and HSSI. None of the 6 evaluated scoring systems showed a significant advantage over the others when comparing ICCs, and all the instruments seem to be very reliable methods. The interrater reliability was usually good, but the most repeatable results between researchers were obtained for the easiest scales, including Hurley scoring, IHS4 and HSSI.
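The coefficient of variation used here for interrater variability is the standard deviation of the raters' scores for one patient, expressed as a percentage of their mean. A minimal sketch follows, assuming the sample standard deviation is used (the abstract does not say). The function name and the scores are hypothetical.

```python
import statistics


def coefficient_of_variation(scores):
    """CV in percent: sample standard deviation divided by the mean."""
    return statistics.stdev(scores) / statistics.mean(scores) * 100.0


# Hypothetical severity scores given to one patient by five raters
rater_scores = [10, 12, 11, 13, 14]
print(round(coefficient_of_variation(rater_scores), 1))  # 13.2
```

A lower CV means the raters' scores for the same patient are closer together, so scales with lower CVs are the more reproducible ones between raters.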

Journal ArticleDOI
TL;DR: Patients can acquire the necessary skills to conduct a correct joint assessment after initial and thorough training and patient-performed assessments of joints and DAS28-CRP in an eHealth home-monitoring solution were reliable and comparable with HCP.
Abstract: Objective. In an eHealth setting, to investigate intra- and interrater reliability and agreement of joint assessments and Disease Activity Score using C-reactive protein (DAS28-CRP) in patients with rheumatoid arthritis (RA) and test the effect of repeated joint assessment training. Methods. Patients with DAS28-CRP ≤ 5.1 were included in a prospective cohort study (clinicaltrials.gov: NCT02317939). Intrarater reliability and agreement of patient-performed joint counts were assessed through completion of 5 joint assessments over a 2-month period. All patients received training on joint assessment at baseline; only half of the patients received repeated training. A subset of patients was included in an appraisal of interrater reliability and agreement comparing joint assessments completed by patients, healthcare professionals (HCP), and ultrasonography. Cohen’s κ coefficients and intraclass correlation coefficients (ICC) were used to quantify the reliability of joint assessments and DAS28-CRP. Agreement was assessed using Bland-Altman plots. Results. Intrarater reliability was excellent with ICC of 0.87 (95% CI 0.83–0.90) and minimal detectable change of 1.13. ICC for interrater reliability ranged between 0.69 and 0.90 (good to excellent). Patients tended to rate DAS28-CRP slightly higher than HCP. In patients receiving repeated training, a mean difference in DAS28-CRP of −0.08 was observed (limits of agreement of −1.06 and 0.90). After 2 months, reliability between patients and HCP was similar between groups receiving single or repeated training. Conclusion. Patient-performed assessments of joints and DAS28-CRP in an eHealth home-monitoring solution were reliable and comparable with HCP. Patients can acquire the necessary skills to conduct a correct joint assessment after initial and thorough training. [clinicaltrials.gov (NCT02317939)]
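
The Bland-Altman agreement reported above (a mean difference with lower and upper limits of agreement) can be sketched as follows. This is a minimal illustration assuming the conventional limits of mean difference ± 1.96 × SD of the paired differences; the function name and the scores are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def bland_altman_limits(a, b, z=1.96):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    Assumption: limits of agreement are bias +/- z * SD of the differences.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long series with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    spread = z * stdev(diffs)
    return bias, bias - spread, bias + spread


# Hypothetical DAS28-CRP values scored by patients and by an HCP.
patient_scores = [3.1, 2.4, 4.0, 2.9, 3.6]
hcp_scores = [3.0, 2.6, 3.7, 3.0, 3.3]
bias, lower, upper = bland_altman_limits(patient_scores, hcp_scores)
print(f"bias={bias:.2f}, limits of agreement=({lower:.2f}, {upper:.2f})")
```

Under that convention, the reported mean difference of −0.08 with limits of −1.06 and 0.90 would correspond to an SD of the differences of about 0.5, since 1.96 × 0.5 = 0.98.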

Journal ArticleDOI
TL;DR: It is demonstrated that FRT and global cervical range of motion with a CROM device show high reliability in individuals with migraine and in comparable headache-free women.

Journal ArticleDOI
02 Nov 2020
TL;DR: When paired with video review, the TAI-Q demonstrates moderate to acceptable levels of reliability and validity for the total score and could help to potentially screen for deficiencies in transfer quality and opportunities for intervention.
Abstract: Objectives: To evaluate the psychometric properties of the Transfer Assessment Instrument Questionnaire (TAI-Q), a self-assessment measure to evaluate transfer quality compared with clinician-reported measures. Design: Participants self-assessed transfers from their wheelchair to a mat table using the TAI-Q. Participants self-assessed their transfer both before and after reviewing a video of themselves completing the transfer (session 1). Self-assessment was completed for another transfer after a 10-minute delay (session 2, intrarater reliability) and after a 1- to 2-day delay (session 3, test-retest reliability). Self-assessment was compared with a criterion standard of an experienced clinician scoring the same transfers with the Transfer Assessment Instrument (TAI) version 4.0 (concurrent validity). Setting: 2017 National Veterans Wheelchair Games. Participants: Convenience sample of full-time wheelchair users (N=44). Interventions: Not applicable. Main Outcome Measures: TAI-Q and TAI. Results: After video review of their transfer, acceptable levels of reliability were demonstrated for total TAI-Q score for intrarater (intraclass correlation [ICC], 0.627) and test-retest reliability (ICC, 0.705). Moderate to acceptable concurrent validity was demonstrated with the TAI (ICC, 0.554-0.740). Participants tended to underestimate the quality of their transfer (reported more deficient items) compared with the TAI. However, this deficit decreased and reliability improved from pre-video review to post-video review and from session 1 to session 2. The minimum detectable change indicated that a change of 1.63 to 2.21 in the TAI-Q total score is needed to detect a significant difference in transfer skills. Conclusions: When paired with video review, the TAI-Q demonstrates moderate to acceptable levels of reliability and validity for the total score. Self-assessment was completed quickly (<5 min) and could help to potentially screen for deficiencies in transfer quality and opportunities for intervention.
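
The abstract above reports a minimum detectable change (MDC) but not how it was derived. One common convention, assumed here and not confirmed by the source, takes the standard error of measurement as SEM = SD × √(1 − ICC) and MDC95 = 1.96 × √2 × SEM. The sketch below uses hypothetical inputs and a hypothetical function name.

```python
from math import sqrt


def sem_and_mdc95(sd, icc):
    """Return (SEM, MDC95) from a between-subject SD and an ICC.

    Assumption: SEM = SD * sqrt(1 - ICC); MDC95 = 1.96 * sqrt(2) * SEM.
    """
    if sd < 0 or not 0 <= icc <= 1:
        raise ValueError("sd must be non-negative and icc within [0, 1]")
    sem = sd * sqrt(1 - icc)
    return sem, 1.96 * sqrt(2) * sem


# Hypothetical inputs: SD of 2.0 score points and an ICC of 0.75.
sem, mdc95 = sem_and_mdc95(2.0, 0.75)
print(f"SEM={sem:.2f}, MDC95={mdc95:.2f}")
```

With a fixed SD, a higher ICC gives a smaller SEM and therefore a smaller change that can be distinguished from measurement error.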

Journal ArticleDOI
TL;DR: The modified TICI score is a practical metric for assessing reperfusion after mechanical thrombectomy, though not without limitations and formal training of interventionalists may improve reporting reliability.
Abstract: BACKGROUND AND PURPOSE: The modified TICI score is the benchmark for quantifying reperfusion after mechanical thrombectomy. There has been limited investigation into the reliability of this score. We aim to identify intra-rater and inter-rater reliability of the mTICI score among endovascular neurosurgeons. MATERIALS AND METHODS: Four independent endovascular neurosurgeons (raters) reviewed angiograms of 67 patients at 2 time points. κ statistics assessed inter- and intrarater reliability and compared raters’-versus-proceduralists’ scores. Reliability was also assessed for occlusion location and by dichotomizing modified TICI scores (0–2a versus 2b–3). RESULTS: Interrater reliability was moderate-to-substantial, weighted κ = 0.417–0.703, overall κ = 0.374 (P CONCLUSIONS: The modified TICI score is a practical metric for assessing reperfusion after mechanical thrombectomy, though not without limitations. Agreement improved when scores were dichotomized around the clinically relevant threshold of successful revascularization. Interrater reliability improved with time, suggesting that formal training of interventionalists may improve reporting reliability. Agreement of the modified TICI scale is best with M1 and ICA occlusion and becomes less reliable with more distal or posterior circulation occlusions. These findings should be considered when developing research trials.

Journal ArticleDOI
TL;DR: The MRI-based classification of RNR showed moderate-to-almost perfect inter-rater and almost perfect intra-rater reliability; inter-rater agreement was similar between the first and second reads among junior raters and improved among senior raters.
Abstract: Patients with central lumbar spinal stenosis (LSS) have a longer symptom history, more severe stenosis, and worse postoperative outcomes, when redundant nerve roots (RNRs) are evident in the preoperative MRI. The objective was to test the inter- and intra-rater reliability of an MRI-based classification for RNR. This is a retrospective reliability study. A neuroradiologist, an orthopedic surgeon, a neurosurgeon, and three orthopedic surgeons in-training classified RNR on 126 preoperative MRIs of patients with LSS admitted for microsurgical decompression. On sagittal and axial T2-weighted images, the following four categories were classified: allocation (A) of the key stenotic level, shape (S), extension (E), and direction (D) of the RNR. A second read with cases ordered differently was performed 4 weeks later. Fleiss and Cohen’s kappa procedures were used to determine reliability. The allocation, shape, extension, and direction (ASED) classification showed moderate to almost perfect inter-rater reliability, with kappa values (95% CI) of 0.86 (0.83, 0.90), 0.62 (0.57, 0.66), 0.56 (0.51, 0.60), and 0.66 (0.63, 0.70) for allocation, shape, extension, and direction, respectively. Intra-rater reliability was almost perfect, with kappa values of 0.90 (0.88, 0.92), 0.86 (0.84, 0.88), and 0.84 (0.81, 0.87) for shape, extension, and direction, respectively. Intra-rater kappa values were similar for junior and senior raters. Kappa values for inter-rater reliability were similar between the first and second reads (p = 0.06) among junior raters and improved among senior raters (p = 0.008). The MRI-based classification of RNR showed moderate-to-almost perfect inter-rater and almost perfect intra-rater reliability.
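
Intra-rater agreement between a first and a second read, as in the abstract above, is often summarized with Cohen's kappa, which corrects observed agreement for the agreement expected by chance. The sketch below is an unweighted kappa for two reads by the same rater; the category labels and the data are hypothetical, and the study's exact procedure may differ.

```python
from collections import Counter


def cohens_kappa(read_1, read_2):
    """Unweighted Cohen's kappa for two equally long lists of labels."""
    if len(read_1) != len(read_2) or not read_1:
        raise ValueError("need two non-empty, equally long rating lists")
    n = len(read_1)
    # Observed proportion of cases with identical labels.
    observed = sum(a == b for a, b in zip(read_1, read_2)) / n
    # Chance agreement from the marginal label frequencies of each read.
    counts_1, counts_2 = Counter(read_1), Counter(read_2)
    labels = set(counts_1) | set(counts_2)
    expected = sum(counts_1[c] * counts_2[c] for c in labels) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when only one label is used")
    return (observed - expected) / (1 - expected)


# Hypothetical labels for ten MRIs read twice by the same rater.
first_read = ["A", "A", "B", "B", "A", "B", "A", "B", "A", "A"]
second_read = ["A", "A", "B", "B", "A", "B", "A", "A", "A", "A"]
kappa = cohens_kappa(first_read, second_read)
print(f"kappa = {kappa:.2f}")
```

In this example 9 of 10 labels match, but the chance-corrected value is lower than 0.9 because both reads use label "A" most of the time.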

Journal ArticleDOI
TL;DR: PPT algometry is a useful measurement tool with acceptable reliability and thus suitable for monitoring and quantifying pain in persons with conservatively managed wrist fractures.
Abstract: OBJECTIVE: Wrist fracture is a common injury in Norway. Pressure algometry is widely used to quantify patients' pain threshold in various anatomical locations. The aim of this study was to explore the reliability of pain pressure threshold (PPT) algometry in persons with conservatively managed distal radius fractures. METHODS: In this cross-sectional study, three raters (A, B, and C) tested the PPT of participants (18-97 years of age) with a unilateral distal radius fracture after removal of the cast. The raters conducted two measurements of both wrists. Intrarater reliability was examined in 75, 50, and 25 participants by Raters A, B, and C, respectively. Interrater reliability was tested in 50 and 25 participants by Rater Pairs A-B and A-C, respectively. Relative reliability was calculated with intraclass correlation coefficient (ICC1.1) and absolute reliability using within-subject standard deviation (Sw). RESULTS: There was a significant difference in the PPT between the participants' injured and noninjured wrists (p < .0001). The mean PPT was 29% lower in the injured than in the noninjured wrists, 175 kPa (SD ± 62) versus 248 kPa (SD ± 83). Intrarater reliability (A) of PPT algometry was better in injured wrists than in noninjured wrists (ICC1.1 = 0.825 vs. 0.765 and Sw = 27 vs. 43 kPa). Similarly, interrater reliability of PPT algometry was better in injured wrists than in noninjured wrists. In injured wrists, the interrater reliability of PPT algometry between Raters A and B was 0.617 (ICC1.1) and Sw was 51 kPa, and between Raters A and C, the interrater reliability was 0.706 (ICC1.1) and Sw was 48 kPa. CONCLUSION: PPT algometry is a useful measurement tool with acceptable reliability and thus suitable for monitoring and quantifying pain in persons with conservatively managed wrist fractures. To be more certain that a change has occurred, the same rater should perform the measurements.
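
The abstract above pairs a relative reliability measure (ICC1.1) with an absolute one (within-subject standard deviation, Sw). A minimal sketch of both, assuming the one-way random-effects definition ICC(1,1) = (MSB − MSW) / (MSB + (k − 1) × MSW) and Sw = √MSW, is given below. The function name and the PPT values are hypothetical.

```python
from math import sqrt


def icc_1_1_and_sw(scores):
    """Return (ICC(1,1), Sw) from per-subject lists of k repeated measures.

    Assumption: one-way random-effects ANOVA with equal k for all subjects.
    """
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need >= 2 subjects, each with the same k >= 2")
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    subject_means = [sum(row) / k for row in scores]
    # Between-subject and within-subject mean squares.
    msb = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    msw = sum(
        (x - m) ** 2 for row, m in zip(scores, subject_means) for x in row
    ) / (n * (k - 1))
    icc = (msb - msw) / (msb + (k - 1) * msw)
    return icc, sqrt(msw)


# Hypothetical PPT values (kPa): two measurements on each of four wrists.
ppt = [[170, 180], [240, 250], [200, 190], [300, 310]]
icc, sw = icc_1_1_and_sw(ppt)
print(f"ICC(1,1)={icc:.3f}, Sw={sw:.1f} kPa")
```

Sw is expressed in the measurement's own unit (kPa here), which makes it easier to judge against a clinically meaningful change than the unitless ICC.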

Journal ArticleDOI
TL;DR: Good reliability and validity of this new vaginal dynamometer to quantify pelvic floor muscle (PFM) strength in incontinent and continent women are obtained.
Abstract: AIMS: Assess the intrarater and interrater reliabilities and diagnostic accuracy of a new vaginal dynamometer to measure pelvic floor muscle (PFM) strength in incontinent and continent women. METHODS: A test-retest reliability study including 152 female patients. EXCLUSION CRITERIA: history of urge urinary incontinence, pelvic organ prolapse, pregnancy, previous urogynecological surgery, severe vaginal atrophy, or neurological conditions. The examination comprised digital assessment using the modified Oxford scale (MOS) and dynamometry measurements with a new prototype hand-held dynamometer. The MOS score ranges from 0 to 5: 0, no contraction; 1, flicker; 2, weak; 3, moderate; 4, good; 5, strong. Examinations were performed by a physiatrist, a physiotherapist and a midwife. The rest period between each rater measurement was 5 minutes. Intrarater and interrater reliability were calculated with the intraclass correlation coefficient. RESULTS: One hundred twenty-two incontinent women and 30 continent women were included. Scores between 0 and 2 in MOS were recorded in 72% of incontinent women versus 20% in continent patients (P < 0.001). Intrarater reliability of the dynamometer was 0.942 (95% confidence interval [CI], 0.920-0.958) and the interrater reliability was 0.937 (95% CI, 0.913-0.954). The analysis of variance showed significant differences in PFM strength across digital assessment categories. The post-hoc analysis showed statistical differences between adjacent categories of MOS 1-2 and 2-3. The diagnostic accuracy showed an area under the curve of 0.82 (95% CI, 0.75-0.89), 0.87 (95% CI, 0.81-0.92), and 0.83 (95% CI, 0.77-0.90) for the physiatrist, midwife, and physiotherapist, respectively. CONCLUSIONS: The results obtained show a good reliability and validity of this new vaginal dynamometer to quantify PFM strength.

Journal ArticleDOI
TL;DR: Overall, HHD is recommended as a reliable and valid tool for single individuals and for flexor muscles on a group level; for balance assessments, the dynamic balance tests are recommended as the most valid and reliable.
Abstract: Objective: To investigate intrarater reliability and concurrent and construct validity of muscle strength, balance, and functional mobility measures in individuals with noncongenital myotonic dystrophy type 1 (DM1). Methods: Seventy-eight adults with noncongenital DM1 participated in visit 1, and 73 of them participated in visit 2, separated by 1 to 2 weeks. The assessments consisted of muscle strength tests with handheld dynamometry (HHD) and stationary dynamometry in the lower limb. The balance tests consisted of the step test, Timed Up and Go test, feet-together stance, tandem stance, 1-leg stance, and modified Clinical Test of Sensory Integration and Balance on a balance platform. The functional mobility tests consisted of the 10-m walk test (10mWT) and 10-times Sit-to-Stand test. Results: The HHD and stationary dynamometry had sufficient intrarater reliability for most muscle groups on a group (SEM% ≤15%) and individual (minimal detectable difference [MDD95%] ≤30%) level, but the HHD was most reliable. Stationary dynamometry measured a higher torque than HHD for all extensor muscles, but for single individuals, none of the devices were favored. Overall, intrarater reliability and validity were sufficient only for the dynamic balance tests, not the static balance tests. Both functional mobility tests were sufficiently reliable and valid, but the 10mWT was most reliable. Conclusion: Overall, HHD is recommended as a reliable and valid tool for single individuals and for flexor muscles on a group level. For balance assessments, the dynamic balance tests are recommended as the most valid and reliable balance tests. Both functional mobility tests are recommended for valid and reliable outcomes, but the 10mWT was superior for reliability.

Journal ArticleDOI
TL;DR: The Brisbane Evidence-Based Language Test demonstrated almost perfect inter-rater reliability, intra-rater reliability and internal consistency, and high correlation coefficients and narrow confidence intervals indicated that test ratings vary minimally when administered by clinicians of different experience levels, or different levels of familiarity with the new measure.
Abstract: Purpose: To examine the inter-rater reliability, intra-rater reliability, internal consistency and practice effects associated with a new test, the Brisbane Evidence-Based Language Test. Methods: Re

Journal ArticleDOI
TL;DR: The Thai version of the screening tool for patients with lumbar instability achieved excellent content validity and interrater and intrarater reliability and is recommended for use with Thai patients with low back pain.