Showing papers on "Intra-rater reliability" published in 2019


Journal ArticleDOI
TL;DR: It is demonstrated that the Koos classification system for vestibular schwannoma is a reliable method for tumor classification, lending further support to the results of current literature using the Koos grading system.
Abstract: Background The Koos classification of vestibular schwannomas is designed to stratify tumors based on extrameatal extension and compression of the brainstem. While this classification system is widely reported in the literature, to date no study has assessed its reliability. Objective To assess the intra- and inter-rater reliability of the Koos classification system. Methods After institutional review board approval was obtained, a cross-sectional group of the magnetic resonance images of 40 patients with vestibular schwannomas varying in size comprised the study sample. Four raters were selected to assign a Koos grade to 50 total scans. Inter- and intrarater reliability were calculated and reported using Fleiss' kappa, Kendall's W, and intraclass correlation coefficient (ICC). Results Inter-rater reliability was found to be substantial when measured using Fleiss' kappa (0.71), extremely strong using Kendall's W (0.92), and excellent as calculated by ICC (0.88). Intrarater reliability was perfect for 3 out of 4 raters as assessed using weighted kappa, Kendall's W and ICC, with the intrarater agreement for the fourth rater measured as extremely high. Conclusion We have demonstrated that the Koos classification system for vestibular schwannoma is a reliable method for tumor classification. This study lends further support to the results of current literature using the Koos grading system. Further studies are required to evaluate its validity and utility in counseling patients with regard to outcomes.
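Fleiss' kappa, one of the statistics reported in the abstract above, compares the observed agreement among several raters with the agreement expected by chance. The following is a minimal sketch of the standard formula in plain Python, not the authors' analysis; the function name `fleiss_kappa` and the example grades are hypothetical and are not taken from the study.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of subjects, each rated once by every rater.

    ratings: list of lists; ratings[i] holds the category assigned to
    subject i by each rater (same number of raters for every subject).
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("every subject needs the same number (>= 2) of ratings")

    counts = [Counter(row) for row in ratings]
    categories = sorted({label for row in ratings for label in row})

    # Observed agreement: share of agreeing rater pairs, averaged over subjects.
    per_subject = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_subject) / n_subjects

    # Chance agreement: sum of squared overall category proportions.
    total = n_subjects * n_raters
    p_expected = sum((sum(c[cat] for c in counts) / total) ** 2 for cat in categories)

    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one category is ever used")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical example: 5 scans, each graded by 4 raters on a 1-4 scale.
example_grades = [
    [1, 1, 1, 1],
    [2, 2, 3, 2],
    [3, 3, 3, 3],
    [4, 4, 4, 3],
    [2, 2, 2, 2],
]
print(f"Fleiss' kappa: {fleiss_kappa(example_grades):.2f}")
```

Fleiss' kappa treats grades as unordered categories, which is one reason studies of ordinal scales often report it alongside rank-based or continuous measures such as Kendall's W and the ICC.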

63 citations


01 Jan 2019
TL;DR: Findings indicated that coding behavior changes both between and within individuals over time, emphasizing the importance of conducting regular and systematic IRR and intrarater reliability tests, especially when multiple coders are involved, to ensure consistency and clarity at the screening and coding stages.
Abstract: A methodologically sound systematic review is characterized by transparency, replicability and clear inclusion criteria. However, little attention has been paid to reporting the details of inter-rater reliability (IRR) when multiple coders are used to make decisions at various points in the screening and data extraction stages of a study. Prior research has mentioned the paucity of information on IRR, including number of coders involved, at what stages and how IRR tests were conducted, and how disagreements were resolved. This paper examines and reflects on the human factors that affect decision-making in systematic reviews via reporting on three IRR tests, conducted at three different points in the screening process, for two distinct reviews. Results of the two studies are discussed in the context of inter-rater and intra-rater reliability in terms of the accuracy, precision and reliability of coding behaviour of multiple coders. Findings indicated that coding behaviour changes both between and within individuals over time, emphasising the importance of conducting regular and systematic inter- and intra-rater reliability tests, especially when multiple coders are involved, to ensure consistency and clarity at the screening and coding stages. Implications for good practice while screening/coding for systematic reviews are discussed.

41 citations


Journal ArticleDOI
TL;DR: In this paper, the effect of training on interrater reliability with the Rat Grimace Scale (RGS) was evaluated. Two training sets (42 and 150 images) were prepared from acute pain models, and four trainee raters progressed through two rounds of training, scoring 42 images (set 1) followed by 150 images (set 2a). After each round, trainees reviewed the RGS and any problematic images with an experienced rater.
Abstract: Rodent grimace scales facilitate assessment of ongoing pain. Reported rater training using these scales varies considerably and may contribute to the observed variability in interrater reliability. This study evaluated the effect of training on interrater reliability with the Rat Grimace Scale (RGS). Two training sets (42 and 150 images) were prepared from acute pain models. Four trainee raters progressed through 2 rounds of training, scoring 42 images (set 1) followed by 150 images (set 2a). After each round, trainees reviewed the RGS and any problematic images with an experienced rater. The 150 images were then rescored (set 2b). Four years later, trainees rescored the 150 images (set 2c). A second group of raters (no-training group) scored the same image sets without review with the experienced rater. Inter- and intrarater reliability were evaluated by using the intraclass correlation coefficient (ICC), and ICC values were compared by using the Feldt test. In the trainee group, interrater reliability increased from moderate to very good between sets 1 and 2b and increased between sets 2a and 2b. Action units with the highest and lowest ICC at set 2b were orbital tightening and whiskers, respectively. In comparison to an experienced rater, the ICC for all trainees improved, ranging from 0.88 to 0.91 at set 2b. Four years later, very good interrater reliability was retained, and intrarater reliability was good or very good. The interrater reliability of the no-training group was moderate and did not improve from set 1 to set 2b. Training improved interrater reliability, with an associated reduction in 95%CI. In addition, training improved interrater reliability with an experienced rater, and performance was retained.

29 citations


Journal ArticleDOI
TL;DR: The findings indicate that the SLS test including the FSD and LSD tests can be suitable for clinical use regardless of number of observed segments and particularly with a ≤3-point rating scale.
Abstract: Single leg squat (SLS) is a common tool used in clinical examination to set and evaluate rehabilitation goals, but also to assess lower extremity function in active people. Objectives To conduct a review and meta-analysis on the inter-rater and intrarater reliability of the SLS, including the lateral step-down (LSD) and forward step-down (FSD) tests. Design Review with meta-analysis. Data sources CINAHL, Cochrane Library, Embase, Medline (OVID) and Web of Science were searched up until December 2018. Eligibility criteria Studies were eligible for inclusion if they were methodological studies which assessed the inter-rater and/or intrarater reliability of the SLS, FSD and LSD through observation of movement quality. Results Thirty-one studies were included. The reliability varied largely between studies (inter-rater: kappa/intraclass correlation coefficients (ICC) = 0.00–0.95; intrarater: kappa/ICC = 0.13–1.00), but most of the studies reached ‘moderate’ measures of agreement. The pooled results of ICC/kappa showed a ‘moderate’ agreement for inter-rater reliability, 0.58 (95% CI 0.50 to 0.65), and a ‘substantial’ agreement for intrarater reliability, 0.68 (95% CI 0.60 to 0.74). Subgroup analyses showed a higher pooled agreement for inter-rater reliability of ≤3-point rating scales while no difference was found for different numbers of segmental assessments. Conclusion Our findings indicate that the SLS test including the FSD and LSD tests can be suitable for clinical use regardless of number of observed segments and particularly with a ≤3-point rating scale. Since most of the included studies were affected by some form of methodological bias, our findings must be interpreted with caution. PROSPERO registration number CRD42018077822.

27 citations


Journal ArticleDOI
TL;DR: The purpose of this study was to evaluate test–retest reliability at the cerebral aqueduct and C2‐C3 area using PC‐MRI to assess the feasibility of investigating CSF and CBF flow dynamics.
Abstract: Purpose Pathological states occur when cerebrospinal fluid (CSF) and cerebral blood flow (CBF) dynamics become dysregulated in the brain. Phase-contrast MRI (PC-MRI) is a noninvasive imaging technique that enables quantitative measurements of CSF and CBF flow. While studies have validated PC-MRI as an imaging technique for flow, few studies have evaluated its reliability for CSF and CBF flow parameters commonly associated with neurological disease. The purpose of this study was to evaluate test-retest reliability at the cerebral aqueduct (CA) and C2-C3 area using PC-MRI to assess the feasibility of investigating CSF and CBF flow dynamics. Methods This study was performed on 27 cognitively normal young adults (ages 20-35 years). Flow data were acquired on a 3T Siemens Prisma using a 2D cine-PC pulse sequence. Three consecutive flow measurements were acquired at the CA and C2-C3 area. Intraclass correlation coefficient (ICC) and coefficient of variation (CV) were used to evaluate intrarater, inter-rater, and test-retest reliability. Results Among the 26 flow parameters analyzed, 22 had excellent reliability (ICC > 0.80), including measurements of CSF stroke volume, flush peak, and fill peak, and 4 parameters had good reliability (ICC 0.60-0.79). 16 flow parameters had a mean CV ≤ 10%, 7 had a CV ≤ 15%, and 3 had a CV ≤ 30%. All CSF and CBF flow measurements had excellent inter-rater and intrarater reliability (ICC > 0.80). Conclusion This study shows that CSF and CBF flow can be reliably measured at the CA and C2-C3 area using PC-MRI, making it a promising tool for studying flow dynamics in the central nervous system.

26 citations


Journal ArticleDOI
TL;DR: The 2MWT and TUG are highly reliable and responsive in the assessment of, respectively, the walking capacity and general mobility of patients with MS with mild disability, and could be reliably used in so-called mildly disabled patients with MS to assess mobility limitation.
Abstract: Background Mobility limitations are frequent in patients with multiple sclerosis (MS), and could already be present in patients with so-called mild neurological disability (Expanded Disability Status Scale≤4). Assessing mobility in these patients is therefore of paramount importance. Timed Up-and-Go Test (TUG) and 2-Minute Walk Test (2MWT) are two clinically feasible tests whose reliability and responsiveness are unknown among these patients. Whether fatigue, which is the number one symptom among these patients, is linked to these limitations remains unknown. Aim The aim of this study was to explore the intrarater reliability and minimal detectable change (MDC95), as an index of responsiveness, of TUG and 2MWT, and to explore their link with perceived fatigue among patients with MS. Design Cross-sectional observational study, including two measures. Setting Two university hospital outpatient centers. Population Patients (N.=63, 49 seen twice) with MS with mild disability (Expanded Disability Status Scale≤4). Methods 2MWT and TUG were performed twice on one occasion, and repeated 2 weeks later. Modified fatigue impact scale (MFIS) was used to assess fatigue. Intraclass correlation coefficients were calculated for immediate and 2-week reliability. MDC95 were computed. Correlations between mobility indices and fatigue were explored using Spearman's ρ. Results Mobility was impaired in comparison to normative values (2MWT: -4.9% from normative distance; TUG: +32% from normative time). The immediate reliability was excellent for both the 2MWT (ICC=0.98) and TUG (ICC=0.98). Reliability at 2 weeks was excellent for 2MWT (ICC=0.95) and very good for TUG (ICC=0.90). MDC95 were respectively 20m (2MWT) and 1.3s (TUG).
Both measures were significantly but weakly correlated to total MFIS (ρ=-0.37 and 0.39, respectively; P Conclusions The 2MWT and TUG are highly reliable and responsive in the assessment of, respectively, the walking capacity and general mobility of patients with MS with mild disability. Mobility impairments are linked to perceived fatigue among these patients. Clinical rehabilitation impact TUG and 2MWT are easy to administer and could be reliably used in so-called mildly disabled patients with MS to assess mobility limitation.

24 citations


Journal ArticleDOI
TL;DR: Self-measured arm circumference is reliable and valid among women with and without BCRL, and self-managed surveillance for BCRL can support self-efficacy without increasing anxiety.
Abstract: Background Prospective surveillance by physical therapists enables early detection and treatment of breast cancer-related lymphedema (BCRL). Strategies to increase access to prospective surveillance could reduce the burden of BCRL on patients and the health system. One potential solution is self-managed surveillance that does not require in-person assessment by a specialized physical therapist. Objective The objective was to develop and test the reliability and validity of a written and video-supported protocol for women with breast cancer to self-measure arm circumference. Design This was a cross-sectional reliability and validity study. Methods Participants with (n = 20) and without (n = 21) BCRL completed self-measurement of arm circumference on both arms at home (CIRself_home) and at the lab (CIRself_lab) (intrarater reliability). The CIRself_lab was subsequently compared to measures performed by a specialized physical therapist (CIRther) (interrater reliability). To test validity, arm volume calculated from the self-measurements (VOLself_lab) was compared to perometry measurements (VOLper). Participants completed a questionnaire to assess attitudes for performing self-managed surveillance for BCRL. Results The intrarater reliability between CIRself_home and CIRself_lab and the interrater reliability between CIRself_lab and CIRther were high to excellent for both arms in both groups (intraclass correlation coefficient ≥0.86). VOLself_lab correlated strongly with VOLper (r ≥ 0.95), demonstrating excellent validity. Participants reported strong intention, self-efficacy, and positive attitude toward the performance of self-managed surveillance for BCRL, which was not perceived to increase worry about having or getting BCRL. Limitations These findings need to be replicated in a clinical setting to confirm the reliability and acceptability of self-managed surveillance for BCRL among women newly diagnosed with breast cancer.
Conclusions Self-measured arm circumference is reliable and valid among women with and without BCRL. Self-managed surveillance for BCRL can support self-efficacy without increasing anxiety.

20 citations


Journal ArticleDOI
TL;DR: The results of the present study showed very good ICC values for both intrarater and interrater reliability measuring knee joint ROM with EasyAngle, although relatively high SDD values may indicate a problem monitoring small differences between measurements.
Abstract: Objectives Measurements of joint range of motion (ROM) are part of a physical therapist's daily work. Activities of daily living and exercises can be complicated to perform when ROM is limited, and depending on the demands in daily living, the knee joint requires different ROM. In sports, a few degrees in ROM may make the difference between getting injured or not. The goals for physical therapists are to help the patients to regain full ROM, mobility, strength, and function after sustaining an injury. To measure joints with the manual universal goniometer is considered time-consuming and difficult with respect to repeated measurements. Recently, a new digital instrument for measuring ROM was developed: EasyAngle (EA). A first objective of the study was to investigate the reliability of EA for measuring knee joint angles, considering intrarater and interrater reliability. A second objective was to investigate if there were any differences in the intrarater reliability between a novice and an experienced assessor. Method Passive knee angles were measured in fixed positions for 40 knee joints (20 subjects). Two registered physical therapists, one novice and one experienced, conducted the measurements. Both registered physical therapists were blinded to the measurements throughout the study. Results The results showed very good intrarater (intraclass correlation coefficient [ICC] 0.997-0.998, standard error of mean 1.15-1.48, smallest detectable difference [SDD] 3.19-4.09, limits of agreement -3.36-3.04, -4.66-4.09) and interrater reliability (ICC 0.994, standard error of mean 2.11, SDD 5.85, limits of agreement -4.75-6.95) for measurements of knee joint ROM. No statistical difference between a novice and an experienced assessor was detected (p = 0.86). Conclusion The results of the present study showed very good ICC values for both intrarater and interrater reliability measuring knee joint ROM with EasyAngle.
Relatively high SDD values were seen for both assessors and may indicate a problem monitoring small differences between measurements. Further studies are recommended to increase the generalizability of the results.
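The SDD figures quoted in this abstract are consistent with the commonly used relation SDD = 1.96 × √2 × SEM. The sketch below checks that arithmetic under two assumptions: that the abstract's "standard error of mean" values play the role of the standard error of measurement (SEM) in this formula, and that a 95% confidence level was used. The function name is hypothetical.

```python
import math


def smallest_detectable_difference(sem, z=1.96):
    """SDD at the confidence level implied by z (1.96 for 95%).

    The sqrt(2) factor accounts for measurement error being present in
    both of the two measurements that are compared.
    """
    return z * math.sqrt(2) * sem


# Error values as reported in the abstract (units follow the abstract).
reported = {
    "intrarater, lower bound": 1.15,
    "intrarater, upper bound": 1.48,
    "interrater": 2.11,
}
for label, sem in reported.items():
    print(f"{label}: SEM {sem} -> SDD {smallest_detectable_difference(sem):.2f}")
```

The lower intrarater value and the interrater value reproduce the reported 3.19 and 5.85; the upper intrarater value comes out near 4.10 rather than 4.09, a gap that could plausibly be explained by rounding of the reported error value.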

20 citations


Journal ArticleDOI
TL;DR: In participants with a low medial longitudinal arch, the Navicular Drop Test showed significant correlations with footprint parameters; correlations were good for the arch angle and Chippaux‐Smirak Index, and excellent for the Staheli Index.
Abstract: Background The medial longitudinal arch of the foot is a variable structure, and a decrease in its height could affect several functions and increase the risk of injuries in the lower limbs. There are many different techniques for evaluating it. Objective The objective of this study was to evaluate the correlations of the Navicular Drop Test, several footprint parameters, and the Foot Posture Index-6 in people with a low medial longitudinal arch. Intrarater reliability and interrater reliability were also estimated. Design This was a repeated-measures, observational descriptive study. Methods Seventy-one participants (53.5% women; mean age = 24.13 years; SD = 3.41) were included. All of the parameters were collected from the dominant foot. The correlation coefficients were calculated. The reliability was also calculated using the intraclass correlation coefficient, 95% CI, and kappa coefficient. Results Statistically significant correlations were obtained between the Navicular Drop Test and the footprint parameters, with absolute r values ranging from 0.722 to 0.788. The Navicular Drop Test and the Foot Posture Index-6 showed an excellent correlation (Spearman correlation coefficient = 0.8), and good correlations (Spearman correlation coefficient = |0.663-0.703|) were obtained between the footprint parameters and the Foot Posture Index-6. Excellent intrarater reliability and interrater reliability were obtained for all of the parameters. Limitations Radiographic parameters, the gold standard for evaluating the medial longitudinal arch height, were not used. In addition, the results of this research cannot be generalized to people with normal and high medial longitudinal arches. Conclusions In participants with a low medial longitudinal arch, the Navicular Drop Test showed significant correlations with footprint parameters; correlations were good for the arch angle and Chippaux-Smirak Index, and excellent for the Staheli Index.
The Foot Posture Index-6 showed an excellent correlation with the Navicular Drop Test and a good correlation with the footprint parameters evaluated. All of the parameters showed high reliability.

18 citations


Journal ArticleDOI
TL;DR: To determine the intrarater, interrater, and retest reliability of facial nerve grading of patients with facial palsy (FP) using standardized videos recorded synchronously during a self‐explanatory patient video tutorial.
Abstract: Objective To determine the intrarater, interrater, and retest reliability of facial nerve grading of patients with facial palsy (FP) using standardized videos recorded synchronously during a self-explanatory patient video tutorial. Study design Prospective, observational study. Methods The automated videos from 10 patients with varying degrees of FP (5 acute, 5 chronic FP) and videos without tutorial from eight patients (all chronic FP) were rated by five novices and five experts according to the House-Brackmann grading system (HB), the Sunnybrook Grading System (SB), and the Facial Nerve Grading System 2.0 (FNGS 2.0). Results Intrarater reliability for the three grading systems was very high using the automated videos (intraclass correlation coefficient [ICC]; SB: ICC = 0.967; FNGS 2.0: ICC = 0.931; HB: ICC = 0.931). Interrater reliability was also high (SB: ICC = 0.921; FNGS 2.0: ICC = 0.837; HB: ICC = 0.736), but for HB, Fleiss kappa (0.214) and Kendall W (0.231) were low. The interrater reliability was not different between novices and experts. Retest reliability was very high (SB: novices ICC = 0.979; experts ICC = 0.964; FNGS 2.0: novices ICC = 0.979; experts ICC = 0.969). The reliability of grading of chronic FP with SB was higher using automated videos with tutorial (ICC = 0.845) than without tutorial (ICC = 0.538). Conclusion The reliability of the grading using the automated videos is excellent, especially for the SB grading. We recommend using this automated video tool regularly in clinical routine and for clinical studies. Level of evidence 4. Laryngoscope, 129:2274-2279, 2019.

18 citations


Journal ArticleDOI
TL;DR: The reliability of the IFAC is established as a tool for classifying cells in the frontal recess among an international group of rhinologists; and communication and teaching of frontal endoscopic sinus surgery (ESS) is improved.
Abstract: Background Inconsistencies in the nomenclature of structures of the frontal sinus have impeded the development of a validated "reference standard" classification system that surgeons can reliably agree upon. The International Frontal Sinus Anatomy Classification (IFAC) system was developed as a consensus document, based on expert opinion, attempting to address this issue. The purposes of this study are to: establish the reliability of the IFAC as a tool for classifying cells in the frontal recess among an international group of rhinologists; and improve communication and teaching of frontal endoscopic sinus surgery (ESS). Methods Forty-two computed tomography (CT) scans, each with a marked frontal cell, were reviewed by 15 international fellowship-trained rhinologists. Each marked cell was classified into 1 of 7 categories described in the IFAC, on 2 occasions separated by 2 weeks. Inter- and intrarater reliability were evaluated using Light's kappa (κ), the intraclass correlation coefficient (ICC), and simple proportion of agreement. Results Interrater reliability showed pairwise κ values ranging from 0.7248 to 1.0, with a mean of 0.9162 (SD, 0.0537). The ICC was 0.98. Intrarater reliability showed κ values ranging from 0.8613 to 1.0, with a mean of 0.9407 (SD, 0.0376). The within-rater ICC was 0.98. Conclusion Among a diverse sample of rhinologists (raters), there was substantial to almost perfect agreement between raters, and among individual raters at different timepoints. The IFAC is a reliable tool for classification of cells in the frontal sinus. Further outcome studies are still needed to determine the validity of the IFAC.

Journal ArticleDOI
TL;DR: The m-CKCUEST allowed the production of two reliable outcome measures, which assessed the upper limb stability and the muscular endurance, which may be used in a follow-up to assess performance or rehabilitation level.

Journal ArticleDOI
TL;DR: The WHF echocardiographic criteria enable reproducible categorisation of echocardiograms as definite RHD versus no or borderline RHD and hence it would be a suitable tool for screening and monitoring disease progression.
Abstract: Objective Different definitions have been used for screening for rheumatic heart disease (RHD). This led to the development of the 2012 evidence-based World Heart Federation (WHF) echocardiographic criteria. The objective of this study is to determine the intra-rater and inter-rater reliability and agreement in differentiating no RHD from mild RHD using the WHF echocardiographic criteria. Methods A standard set of 200 echocardiograms was collated from prior population-based surveys and uploaded for blinded web-based reporting. Fifteen international cardiologists reported on and categorised each echocardiogram as no RHD, borderline or definite RHD. Intra-rater and inter-rater reliability was calculated using Cohen’s and Fleiss’ free-marginal multirater kappa (κ) statistics, respectively. Agreement assessment was expressed as percentages. Subanalyses assessed reproducibility and agreement parameters in detecting individual components of WHF criteria. Results Sample size from a statistical standpoint was 3000, based on repeated reporting of the 200 studies. The inter-rater and intra-rater reliability of diagnosing definite RHD was substantial with a kappa of 0.65 and 0.69, respectively. The diagnosis of pathological mitral and aortic regurgitation was reliable and almost perfect, kappa of 0.79 and 0.86, respectively. Agreement for morphological changes of RHD was variable ranging from 0.54 to 0.93 κ. Conclusions The WHF echocardiographic criteria enable reproducible categorisation of echocardiograms as definite RHD versus no or borderline RHD and hence it would be a suitable tool for screening and monitoring disease progression. The study highlights the strengths and limitations of the WHF echo criteria and provides a platform for future revisions.
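Intra-rater agreement of the kind described in this abstract, where one reader categorises the same studies twice, is typically summarised with Cohen's kappa. The following is a minimal sketch of the standard unweighted formula, not the study's analysis code; the function name and the example labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(first_reading, second_reading):
    """Unweighted Cohen's kappa for two sets of labels on the same items."""
    if len(first_reading) != len(second_reading) or not first_reading:
        raise ValueError("both readings must cover the same non-empty item list")
    n = len(first_reading)

    # Observed agreement: share of items given the same label both times.
    observed = sum(a == b for a, b in zip(first_reading, second_reading)) / n

    # Chance agreement from the marginal label frequencies of each reading.
    c1, c2 = Counter(first_reading), Counter(second_reading)
    expected = sum(c1[label] * c2[label] for label in set(c1) | set(c2)) / n**2

    if expected == 1.0:
        raise ValueError("kappa is undefined when only one label is ever used")
    return (observed - expected) / (1.0 - expected)


# Hypothetical repeat readings of four echocardiograms by one reader.
first = ["no RHD", "no RHD", "definite", "definite"]
second = ["no RHD", "definite", "definite", "definite"]
print(f"Cohen's kappa: {cohen_kappa(first, second):.2f}")
```

In this toy example the reader agrees with themselves on 3 of 4 studies (observed agreement 0.75) while chance agreement is 0.5, so kappa credits only the agreement beyond chance.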

Journal ArticleDOI
TL;DR: The Turkish TCMS appears to be a suitable evaluation tool to assess the qualitative performance of trunk control and sitting balance for children with CP, and it can be used for both clinical and research purposes.

Journal ArticleDOI
TL;DR: Results indicate that validity for strength testing of the serratus anterior muscle is optimal with subjects in a seated position and the shoulder flexed at 90° in the scapular plane.
Abstract: Strength testing of the serratus anterior muscle with hand held dynamometry (HHD) in supine subjects has low reproducibility, and is influenced by compensatory activity of other muscles like the pectoralis major and upper trapezius. Previously, two manual maximum voluntary isometric contraction tests of the serratus anterior muscle were reported that recruited optimal surface electromyography (sEMG) activity in a sitting position. We adapted three manual muscle tests to make them suitable for HHD and investigated their validity and reliability. Twenty-one healthy adults were examined by two assessors in one supine and two seated positions. Each test was repeated twice. Construct validity was determined by evaluating force production (assessed with HHD) in relation to sEMG of the serratus anterior, upper trapezius and pectoralis major muscles, comparing the three test positions. Intra- and interrater reliability were determined by calculating intra-class correlation coefficients (ICC), smallest detectable change (SDC) and standard error of measurement (SEM). Serratus anterior muscle sEMG activity was most isolated in a seated position with the humerus in 90° anteflexion in the scapular plane. This resulted in the lowest measured force levels in this position with a mean force of 296 N (SEM 15.8 N). Intrarater reliability yielded an ICC of 0.658 (95% CI 0.325; 0.846) and interrater reliability an ICC of 0.277 (95% CI -0.089; 0.605). SDC was 127 Newton, SEM 45.8 Newton. The results indicate that validity for strength testing of the serratus anterior muscle is optimal with subjects in a seated position and the shoulder flexed at 90° in the scapular plane. Intrarater reliability is moderate and interrater reliability of this procedure is poor. However, the high SDC values make it difficult to use the measurement in repeated measurements.

Journal ArticleDOI
TL;DR: Objective measurement of target lesions in vitiligo is important for clinical practice and trials, yet no preferred tool has been defined and studies are not yet based on ultraviolet photography.
Abstract: Background Objective measurement of target lesions in vitiligo is important for clinical practice and trials, yet no preferred tool has been defined. Reported digital tools have shortcomings related to feasibility aspects and often lack information on validity, reliability and responsiveness. Moreover, studies are not yet based on ultraviolet (UV) photography. Objectives To assess the reliability, validity and feasibility of two functions in ImageJ for measurement of target lesions, based on three different types of images including UV pictures. Methods Planimetric measurements were performed on photographs with and without UV, and lesion contours on transparent sheets of 52 vitiligo lesions from 10 patients with vitiligo. The ImageJ functions 'wand' and 'threshold' were used by three and four assessors, respectively. Inter- and intrarater reliability, hypothesis testing for construct validity, and feasibility were evaluated. Results The inter- and intrarater reliability for the 'wand' and 'threshold' functions were excellent [intraclass correlation coefficient (ICC) > 0·9] for measurement on pictures (with or without UV). The highest agreement (ICC > 0·95) and lowest variance were obtained for measurements on transparent sheets. All four hypotheses for construct validity were confirmed for all measurements. Overall, all measurement methods scored satisfactorily for user-friendliness. However, measurements on transparent sheets were preferred and the completion time was significantly faster. Conclusions This study confirmed the reliability, validity and feasibility of two functions in ImageJ to measure target lesions in vitiligo. Based on the feasibility and included three-dimensional aspects, transparent sheets measured with the ImageJ 'wand' function can be proposed for future trials as a reference method to investigate the criterion validity of other digital instruments.

Journal ArticleDOI
TL;DR: Manual palpation by physiotherapists was shown to have excellent interrater reliability when using a categorical scoring system, and the PA showed a lack of consistency in intrarater reliability conflicting with previous research findings, whereas the FFS showed greater reliability in comparison.

Journal Article
TL;DR: This study demonstrates the psychometric properties of the IcaBI in an Italian stroke population, and shows that the scale can be considered a valid and reliable assessment tool for measuring functional disability in Italian acute ischemic stroke survivors.
Abstract: The objective of this study was to assess and validate the psychometric properties of the Italian culturally adapted Barthel Index (IcaBI) in a cohort of people with ischemic stroke. The validation process was conducted in an Italian cohort of 99 stroke inpatients to whom the IcaBI was administered in order to test its structural validity, and inter- and intrarater reliability. The internal consistency (Cronbach's alpha) was 0.901. Factor analysis revealed a two-factor structure. The intraclass correlation coefficient (ICC 3,1) for intra-rater reliability was estimated at 0.987 (95% CI: 0.975-0.993), while the ICC for inter-rater reliability was 0.909 (95% CI: 0.852-0.948). This study demonstrates the psychometric properties of the IcaBI in an Italian stroke population, and therefore shows that the scale can be considered a valid and reliable assessment tool for measuring functional disability in Italian acute ischemic stroke survivors.
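The "3,1" form of the ICC named in this abstract is usually understood as the two-way mixed-effects, consistency, single-measurement coefficient, ICC(3,1) = (MSR − MSE) / (MSR + (k − 1)·MSE), where MSR is the between-subject mean square, MSE the residual mean square and k the number of ratings per subject. The sketch below computes it from a subjects × ratings table in plain Python; it is an illustration under that assumed definition, and the function name and scores are hypothetical.

```python
def icc_3_1(scores):
    """ICC(3,1): two-way mixed effects, consistency, single measurement.

    scores: list of rows, one per subject; each row holds that subject's
    score from each rating occasion (or rater), in the same column order.
    """
    n = len(scores)          # subjects
    k = len(scores[0])       # ratings per subject
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need >= 2 subjects and >= 2 complete ratings each")

    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_subjects = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_ratings = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_subjects - ss_ratings

    ms_subjects = ss_subjects / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_subjects - ms_error) / (ms_subjects + (k - 1) * ms_error)


# Hypothetical scores: the second session is always exactly 1 point higher.
two_sessions = [[1, 2], [2, 3], [3, 4]]
print(f"ICC(3,1): {icc_3_1(two_sessions):.2f}")
```

Because this form measures consistency rather than absolute agreement, a constant offset between sessions, as in the toy data, does not lower the coefficient.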

Journal ArticleDOI
TL;DR: The Cutaneous Lupus Erythematosus Disease Area and Severity Index (CLASI) is a reliable outcome measure for cutaneous lupus erythematosus (CLE) in adults used in clinical trials, but it has not been validated in children, limiting clinical trials for paediatric CLE.
Abstract: Background The Cutaneous Lupus Erythematosus Disease Area and Severity Index (CLASI) is a reliable outcome measure for cutaneous lupus erythematosus (CLE) in adults used in clinical trials. However, it has not been validated in children, limiting clinical trials for paediatric CLE. Objectives This study aimed to validate the CLASI in paediatrics. Methods Eleven paediatric patients with CLE, six dermatologists and six rheumatologists participated. The physicians were trained to use the CLASI and Physician's Global Assessment (PGA), and individually rated all patients using both tools. Each physician reassessed two randomly selected patients. Within each physician group, the intraclass correlation coefficient (ICC) was calculated to assess the reliability of each measure. Results CLASI activity scores demonstrated excellent inter- and intrarater reliability (ICC > 0·90), while the PGA activity scores had good inter-rater reliability (ICC 0·73-0·77) among both specialties. PGA activity scores showed excellent (ICC 0·89) and good intrarater reliability (ICC 0·76) for dermatologists and rheumatologists, respectively. Limitations of this study include the small sample size of patients and potential recall bias during the physician rerating session. Conclusions CLASI activity measurement showed excellent inter- and intrarater reliability in paediatric CLE and superiority over the PGA. These results demonstrate that the CLASI is a reliable and valid outcome instrument for paediatric CLE.

Journal ArticleDOI
TL;DR: In this paper, the reliability of the Actuarial Risk Assessment Instrument for Youth Protection (ARIJ) is evaluated by examining the inter- and intrarater reliability of four vignettes with an interval of at least three months.
Abstract: In the Netherlands, the Actuarial Risk Assessment Instrument for Youth Protection (ARIJ) is a widely used safety and risk assessment instrument in child welfare, although little is known about its reliability. Therefore, this study aimed to determine the reliability of the ARIJ by examining the inter- and intrarater reliability. For determining interrater reliability, professionals of two Dutch agencies (child and family support, n = 39, and child protection, n = 24) and master's students (n = 65) each rated a random selection of 4 out of 24 vignettes. The vignettes were based on actual cases that were handled by the two agencies. For determining intrarater reliability, the professionals rated four vignettes twice with an interval of at least 3 months. Three reliability measures were calculated for each of the three samples: percent agreement, Krippendorff’s alpha, and Gwet’s Agreement Coefficient. Overall, the items and outcome of the safety assessment instrument showed a moderate or higher than moderate interrater reliability, and a substantial to almost perfect intrarater reliability. In general, the risk assessment items showed a moderate interrater and a substantial-to-high intrarater reliability. The risk assessment outcome had a near perfect interrater reliability and a substantial to almost perfect intrarater reliability. The outcome of both the safety and risk assessment of the ARIJ proved to be reliable and justifies the use of the ARIJ in the Dutch child welfare by professionals with different levels of experience.
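Two of the measures named in this abstract, percent agreement and Gwet's agreement coefficient, can be sketched for the simplest case of two raters scoring the same items on a nominal scale. This is an illustrative simplification and not the ARIJ study's analysis, which involved many raters and subsets of vignettes. The function names and the example labels are hypothetical, and the AC1 form of Gwet's coefficient is assumed.

```python
from collections import Counter


def percent_agreement(a, b):
    """Share of items on which two raters give the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def gwet_ac1(a, b, categories=None):
    """Gwet's AC1 for two raters and nominal categories (assumed variant)."""
    pa = percent_agreement(a, b)
    cats = list(categories) if categories is not None else list(set(a) | set(b))
    q = len(cats)
    if q < 2:
        raise ValueError("AC1 needs at least two categories")
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    # pi_k: average of the two raters' marginal proportions for category k.
    pi = [(count_a[c] + count_b[c]) / (2 * n) for c in cats]
    # Chance agreement under Gwet's model.
    pe = sum(p * (1 - p) for p in pi) / (q - 1)
    return (pa - pe) / (1 - pe)
```

With the made-up ratings `["safe", "safe", "unsafe", "safe", "unsafe", "safe"]` and `["safe", "unsafe", "unsafe", "safe", "unsafe", "safe"]`, percent agreement is 5/6 and AC1 is 50/74, about 0.68.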

Journal ArticleDOI
TL;DR: Tests demonstrated good reliability and measurement precision, although ICCs and SEMs were different between limbs, and tests were correlated, but only one-third of the variance was shared between tests.

Journal ArticleDOI
TL;DR: This is the first study to investigate the inter‐ and intra‐rater reliability for UTC of the patellar tendon on a large scale.
Abstract: Purpose Ultrasound tissue characterization (UTC) is used in research and clinical practice to quantify tendon structure of the patellar tendon. This is the first study to investigate the inter- and intra-rater reliability for UTC of the patellar tendon on a large scale. Method Fifty participants (25 patellar tendinopathy, 25 asymptomatic) were recruited. The affected patellar tendons in symptomatic and right tendons in asymptomatic participants were scanned with UTC twice by one researcher and once by another. The same was done for contour marking (needed to analyze a UTC scan) of the tendon. Intraclass correlation coefficient (ICC (2,1)) for echo-types I, II, III, IV, aligned fibrillar structure (echo-types I + II), and disorganized structure (echo-types III + IV) were calculated. This was done for UTC scans as well as solely marking contours. Results Inter-rater reliability showed fair to good ICC values for echo-types I (0.65) and II (0.46) and excellent ICC values for echo-type III (0.81), echo-type IV (0.83), aligned fibrillar structure (0.82), and disorganized structure (0.82). Intra-rater reliability showed excellent ICC values for echo-types I (0.76), III (0.88), IV (0.85), aligned fibrillar structure (0.88), and disorganized fibrillar structure (0.88) and a fair to good value for echo-type II (0.61). Contour marking showed excellent ICC values for all echo-types. Conclusion This study showed that UTC scans for patellar tendons have overall good intra-rater and inter-rater reliability. To optimize reliability of UTC scans of the patellar tendon, using the same rater and using aligned fibrillar structure (echo-types I + II combined) and disorganized structure (echo-types III + IV combined) as outcome measures can be considered.

Journal ArticleDOI
TL;DR: This newly developed clinical, physical assessment of upper extremity lymphedema provides standardization and a single score that accounts for multiple constructs that shows excellent inter- and intrarater reliability and shows moderately strong concurrent validity with objective and subjective measures.

Journal ArticleDOI
TL;DR: The classifications used in the Swedish fracture register for vertebral fractures have an acceptable inter- and intra-rater reliability with a moderate strength of agreement.
Abstract: Inter- and intra-rater reliability of vertebral fracture classifications in the Swedish fracture register

Journal ArticleDOI
TL;DR: The Pain Behaviour Scale is a valid and reliable tool for measuring the presence and severity of pain behaviour, and the physical performance tests are reliable tests.
Abstract: Objectives. To examine the interrater and intrarater reliability and construct validity of the Pain Behaviour Scale during standard physical performance tests in people with chronic low back pain and to confirm the test-retest reliability of the physical performance tests in this population. The Pain Behaviour Scale (PaBS) is an observational scale that was recently designed to uniquely measure both the presence and severity of observed pain behaviours. Methods. Twenty-two participants with chronic low back pain were observed during performance of five physical performance tests by two raters. Pain behaviours were assessed using the Pain Behaviour Scale. The Visual Analogue Scale and Modified Oswestry Disability Index were used to measure pain and disability, respectively. Descriptive statistics were used to report demographic features of participants. Reliability was analyzed using ICCs. Rater agreement was analyzed using the weighted Cohen’s kappa. Correlations between PaBS, self-reported measures, and physical performance tests were calculated using Pearson’s product-moment correlations. Results. The PaBS demonstrated excellent interrater (ICC2,1 = 1.0, 95% CI: 0.9 to 1.0) and intrarater (ICC3,1 = 0.9, 95% CI: 0.8 to 1.0) reliability. Component physical performance tests (i.e., time and distance) demonstrated good test-retest (0.6–1.0) reliability. Perfect agreement in the reporting of pain behaviours was found (95–100%). Correlations between pain behaviour severity and pain intensity (r = 0.6) and disability (r = 0.6) were moderate. Moderate correlations were found between pain behaviours and physical performance tests in sit to stand (r = 0.5), trunk flexion (r = 0.4), timed up and go (r = 0.4), and 50-foot walk (r = 0.4). Conclusion. The Pain Behaviour Scale is a valid and reliable tool for measuring the presence and severity of pain behaviour, and the physical performance tests are reliable tests.
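Several abstracts in this listing report ICC(2,1) for interrater and ICC(3,1) for intrarater reliability. The sketch below shows how the two single-measure coefficients differ, using the usual two-way ANOVA mean squares: ICC(2,1) penalizes systematic differences between raters, while ICC(3,1) ignores them. The function name and the score matrix in the usage note are illustrative assumptions and are not data from any study listed here.

```python
def icc_single(scores):
    """Return (ICC(2,1), ICC(3,1)) for a subjects-by-raters score table.

    scores: list of rows, one row per subject, one column per rater
    (or per rating occasion, for intrarater use).
    """
    n, k = len(scores), len(scores[0])
    if n < 2 or k < 2:
        raise ValueError("need at least 2 subjects and 2 raters")
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares (no replication).
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-subjects mean square
    msc = ss_cols / (k - 1)              # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square

    icc21 = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    icc31 = (msr - mse) / (msr + (k - 1) * mse)
    return icc21, icc31
```

For the illustrative matrix `[[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8], [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]`, where raters differ strongly in their average level, the function gives an ICC(2,1) near 0.29 and an ICC(3,1) near 0.71.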

Journal ArticleDOI
TL;DR: The second EP version of Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V), with permission granted by American Speech-Language-Hearing Association, is concluded to be a reliable and valid instrument for auditory-perceptual evaluation of the EP population.

Journal ArticleDOI
TL;DR: The excellent concurrent validity and intra-rater reliability, and the small SEM and MDC of the QMT make this test a method of choice, in either a clinical or research setting, to precisely evaluate muscle strength impairments of the KE in men with DM1.
Abstract: Background: Myotonic dystrophy type 1 (DM1) is the most prevalent degenerative neuromuscular disease in adults. Knee extensor (KE) maximal strength loss is a strong indicator of physical limitations in DM1. A reliable, precise and accessible maximal strength evaluation method needs to be validated for this slowly progressive disease. Objective: This paper aims to assess the intra-rater reliability, the standard error of measurement (SEM), the minimal detectable change (MDC), and the concurrent validity of quantified muscle testing (QMT) using a handheld dynamometer with a gold standard: the Biodex isokinetic device. Methods: Nineteen men with the adult form of DM1 participated in this study by attending 2 visits spaced by one week. The evaluation of KE muscle strength with QMT was completed on the first visit and the same QMT evaluation in addition to the maximal muscle strength evaluation using an isokinetic device were performed on the second visit. Results: The intra-rater reliability was excellent with an intraclass correlation coefficient (ICC) of 0.98 (95% confidence interval: 0.96–0.99). SEM and MDC values were 1.05 Nm and 2.92 Nm, respectively. Concurrent validity of QMT of KE muscle group with the Biodex was also excellent with a Spearman’s correlation of ρ = 0.98. Conclusions: The excellent concurrent validity and intra-rater reliability, and the small SEM and MDC of the QMT make this test a method of choice, in either a clinical or research setting, to precisely evaluate muscle strength impairments of the KE in men with DM1.
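The SEM and MDC values in this abstract are linked by commonly used formulas: SEM = SD * sqrt(1 - ICC) and MDC95 = 1.96 * sqrt(2) * SEM. The sketch below assumes those conventions; the abstract does not state which formulas the authors applied, and the function names are hypothetical.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from a sample SD and an ICC."""
    if not 0 <= icc <= 1:
        raise ValueError("icc is expected to lie between 0 and 1")
    return sd * math.sqrt(1 - icc)


def mdc95(sem):
    """Minimal detectable change at the 95% level (assumed convention)."""
    return 1.96 * math.sqrt(2) * sem
```

Under these assumptions, `mdc95(1.05)` gives about 2.91 Nm, close to the reported 2.92 Nm; the small gap would be consistent with the authors working from an unrounded SEM.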

Journal ArticleDOI
TL;DR: The Turkish version of CAPE-V is a reliable and valid instrument for auditory-perceptual evaluation of the Turkish speaking population.

Journal ArticleDOI
TL;DR: The interarytenoid assessment protocol appears reliable in describing interarytenoid anatomy and may provide important steps toward improved standardization in the anatomic description of the interarytenoid region in pediatric dysphagia.
Abstract: OBJECTIVE While the Benjamin-Inglis classification system is widely used to categorize laryngeal clefts, it does not clearly differentiate a type 1 cleft from normal anatomy, and there is no widely accepted or validated protocol for systematically evaluating interarytenoid mucosal height. We sought to propose the interarytenoid assessment protocol as a method to standardize the description of the interarytenoid anatomy and to test its reliability. STUDY DESIGN Retrospective review of endoscopic videos. SETTING Pediatric academic center. SUBJECTS AND METHODS The interarytenoid assessment protocol comprises 4 steps for evaluation of the interarytenoid region relative to known anatomic landmarks in the supraglottis, glottis, and subglottis. Thirty consecutively selected videos of the protocol were reviewed by 4 otolaryngologists. The raters were blinded to identifying information, and the video order was randomized for each review. We assessed protocol completion times and calculated Cohen's linear-weighted κ coefficient between blinded expert raters and with the operating surgeon to evaluate interrater/intrarater reliability. RESULTS Median age was 4.9 years (59 months; range, 1 month to 20 years). Median completion time was 144 seconds. Interrater and intrarater reliability showed substantial agreement (interrater κ = 0.71 [95% confidence interval (CI), 0.55-0.87]; intrarater mean κ = 0.70 [95% CI, 0.59-0.92/rater 1, 0.47-0.85/rater 2]; P < .001). Comparing raters to the operating surgeon demonstrated substantial agreement (mean κ = 0.62; 95% CI, 0.31-0.79/rater 1, 0.48-0.89/rater 2; P < .001). CONCLUSION The interarytenoid assessment protocol appears reliable in describing interarytenoid anatomy. Rapid completion times and substantial interrater/intrarater reliability were demonstrated. Incorporation of this protocol may provide important steps toward improved standardization in the anatomic description of the interarytenoid region in pediatric dysphagia.
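The agreement statistic used in this abstract, Cohen's linear-weighted kappa, gives partial credit when two ordinal ratings are close. A minimal sketch follows, assuming two raters, an ordered list of categories, and disagreement weights proportional to the distance |i - j| between category ranks. The function name and the example grades are hypothetical and are not the study's data.

```python
def linear_weighted_kappa(r1, r2, categories):
    """Cohen's kappa with linear weights for two raters on an ordinal scale.

    categories: all possible grades, listed in their natural order.
    """
    if len(r1) != len(r2) or not r1:
        raise ValueError("ratings must be non-empty and of equal length")
    index = {c: i for i, c in enumerate(categories)}
    q, n = len(categories), len(r1)

    # Observed contingency table of rater 1 (rows) vs rater 2 (columns).
    obs = [[0] * q for _ in range(q)]
    for a, b in zip(r1, r2):
        obs[index[a]][index[b]] += 1
    row = [sum(obs[i]) for i in range(q)]
    col = [sum(obs[i][j] for i in range(q)) for j in range(q)]

    # Weighted observed and chance-expected disagreement.
    observed = sum(
        abs(i - j) * obs[i][j] for i in range(q) for j in range(q)
    ) / n
    expected = sum(
        abs(i - j) * row[i] * col[j] for i in range(q) for j in range(q)
    ) / n ** 2
    if expected == 0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1 - observed / expected
```

For the made-up grades `[1, 2, 3, 4, 2, 3, 1, 4]` and `[1, 2, 3, 3, 2, 4, 1, 4]` on a four-level scale, the two one-step disagreements yield a weighted kappa of 0.8.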

Journal ArticleDOI
TL;DR: It is suggested that the T-GAP may be an ideal approach to assessing the outcomes of pollicization and provide a means of ongoing assessment of children's grip and pinch function.
Abstract: Purpose The Thumb Grasp and Pinch (T-GAP) assessment quantifies functional hand use in children with congenital thumb hypoplasia by categorizing grasp and thumb use patterns during assessment activities that encourage a variety of grasp and pinch styles. This study aims to demonstrate interrater and intrarater reliability results of the T-GAP. Methods A retrospective review was performed of children who had undergone index finger pollicization for congenital thumb hypoplasia and subsequent evaluation with videotaping of the T-GAP assessment. Following a training period, 4 occupational therapists scored 11 T-GAP videos on 2 separate occasions, separated by at least 2 weeks. Intraclass correlation coefficients (ICCs), standard error of measurements, minimum detectable change (MDC), and Pearson correlation coefficients were calculated. Results The T-GAP raw scores were 16 to 55, demonstrating a range of mild to severe hand grasp differences. The ICCs for the interrater reliability trials were 0.887 and 0.901. Intrarater ICCs were all above 0.88. The MDC for each trial was 8.1 and 6.7 points. Pearson correlation coefficients calculated for each rater and each pair of raters were above 0.8 in all cases. Conclusions Interrater and intrarater reliability testing results for the T-GAP were excellent in all cases; this strongly suggests that results from T-GAP assessments are reliable. The high ICCs suggest that raters can classify and score children’s hand function consistently. Clinical relevance This study, in conjunction with previous work, suggests that the T-GAP may be an ideal approach to assessing the outcomes of pollicization and provide a means of ongoing assessment of children’s grip and pinch function.