
Showing papers on "Intra-rater reliability" published in 2013


Journal ArticleDOI
TL;DR: A comprehensive review of reliability assessment and improvement of power electronic systems at three levels: 1) metrics and methodologies for reliability assessment of existing systems; 2) reliability improvement of existing systems by means of algorithmic solutions without changing the hardware; and 3) reliability-oriented design solutions based on fault-tolerant operation of the overall system.
Abstract: With wide-spread application of power electronic systems across many different industries, their reliability is being studied extensively. This paper presents a comprehensive review of reliability assessment and improvement of power electronic systems from three levels: 1) metrics and methodologies of reliability assessment of existing system; 2) reliability improvement of existing system by means of algorithmic solutions without change of the hardware; and 3) reliability-oriented design solutions that are based on fault-tolerant operation of the overall systems. The intent of this review is to provide a clear picture of the landscape of reliability research in power electronics. The limitations of the current research have been identified and the direction for future research is suggested.

681 citations


Journal ArticleDOI
TL;DR: The Food Intake LEVEL Scale (FILS) seems to have fair reliability and validity as a practical tool for assessing the severity of dysphagia, and further study on the reliability, validity, and sensitivity of the FILS compared with the FOIS is needed.

241 citations


Journal ArticleDOI
TL;DR: Almost one-third of patients with newly diagnosed PD fulfill the consensus criteria for PD-MCI; after 5 years, this proportion is approximately 50% of patients without dementia.
Abstract: Objective: We examined the development of Parkinson disease (PD)–mild cognitive impairment (MCI) in patients with newly diagnosed PD over 5 years using recently proposed consensus criteria, and we assessed the reliability of the criteria. Methods: Patients with PD (n = 123) underwent extensive neuropsychological testing at baseline and after 3 (n = 93) and 5 years (n = 59). Two neuropsychologists independently applied the PD-MCI criteria to examine the interrater and intrarater reliability. Results: At baseline, 35% of patients had PD-MCI. Three years later, 53% of the patients had PD-MCI. At 5-year follow-up, 20 patients who had PD-MCI at an earlier assessment had converted to PD dementia and 50% of the remaining patients without dementia had MCI. The interrater reliability (kappa) was 0.91. The intrarater reliabilities were 0.85 and 0.96. Conclusion: Approximately one-third of patients with newly diagnosed PD fulfill the consensus criteria for PD-MCI; after 5 years, this proportion is approximately 50% of patients without dementia. The criteria have good interrater and intrarater reliability.

179 citations
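
The interrater and intrarater figures in the PD-MCI entry above are kappa values. As a minimal sketch of how Cohen's kappa is computed for two sets of categorical ratings, the following uses made-up labels (they are not data from the study), and the function name `cohen_kappa` is an illustrative choice.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(ratings_a)
    # Observed agreement: share of cases where both ratings match.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the two marginal proportions per category.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical classifications of 10 patients (illustrative only).
rater_1 = ["MCI", "MCI", "normal", "normal", "MCI",
           "normal", "normal", "MCI", "normal", "normal"]
rater_2 = ["MCI", "MCI", "normal", "normal", "normal",
           "normal", "normal", "MCI", "normal", "normal"]

kappa = cohen_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.2f}")
```

The same function applies to intrarater agreement if the two sequences are one rater's classifications on two occasions.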


Journal ArticleDOI
TL;DR: The Mini-BESTest is a reliable and valid tool for evaluating balance in people with chronic stroke; the study also examined its accuracy in categorizing people with stroke based on fall history.
Abstract: Background The Mini-Balance Evaluation Systems Test (Mini-BESTest) is a new balance assessment, but its psychometric properties have not been specifically tested in individuals with stroke. Objectives The purpose of this study was to examine the reliability and validity of the Mini-BESTest and its accuracy in categorizing people with stroke based on fall history. Design An observational measurement study with a test-retest design was conducted. Methods One hundred six people with chronic stroke were recruited. Intrarater reliability was evaluated by repeating the Mini-BESTest within 10 days by the same rater. The Mini-BESTest was administered by 2 independent raters to establish interrater reliability. Validity was assessed by correlating Mini-BESTest scores with scores of other balance measures (Berg Balance Scale, one-leg-standing, Functional Reach Test, and Timed “Up & Go” Test) in the stroke group and by comparing Mini-BESTest scores between the stroke group and 48 control participants, and between fallers (≥1 falls in the previous 12 months, n=25) and nonfallers (n=81) in the stroke group. Results The Mini-BESTest had excellent internal consistency (Cronbach alpha=.89–.94), intrarater reliability (intraclass correlation coefficient [3,1]=.97), and interrater reliability (intraclass correlation coefficient [2,1]=.96). The minimal detectable change at 95% confidence interval was 3.0 points. The Mini-BESTest was strongly correlated with other balance measures. Significant differences in Mini-BESTest total scores were found between the stroke and control groups and between fallers and nonfallers in the stroke group. In terms of floor and ceiling effects, the Mini-BESTest was significantly less skewed than other balance measures, except for one-leg-standing on the nonparetic side. The Berg Balance Scale showed significantly better ability to identify fallers (positive likelihood ratio=2.6) than the Mini-BESTest (positive likelihood ratio=1.8). 
Limitations The results are generalizable only to people with mild to moderate chronic stroke. Conclusions The Mini-BESTest is a reliable and valid tool for evaluating balance in people with chronic stroke.

178 citations
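
The Mini-BESTest entry above reports ICC(3,1) for intrarater and ICC(2,1) for interrater reliability. The sketch below shows one common way to obtain both single-measure coefficients from two-way ANOVA mean squares, assuming a complete subjects-by-raters table with no missing scores. The scores and the function name `icc_two_way` are illustrative assumptions, not material from the study.

```python
def icc_two_way(scores):
    """Return (ICC(2,1), ICC(3,1)) for a subjects x raters table of scores."""
    n = len(scores)                      # number of subjects (rows)
    if n < 2:
        raise ValueError("need at least 2 subjects")
    k = len(scores[0])                   # number of raters or sessions (columns)
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete table with at least 2 columns")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-subjects mean square
    msc = ss_cols / (k - 1)              # between-raters mean square
    mse = ss_error / ((n - 1) * (k - 1)) # residual mean square

    # ICC(2,1): raters treated as random, absolute agreement.
    icc21 = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    # ICC(3,1): raters treated as fixed, consistency.
    icc31 = (msr - mse) / (msr + (k - 1) * mse)
    return icc21, icc31


# Illustrative scores: 6 subjects (rows) rated by 4 raters (columns).
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]

icc21, icc31 = icc_two_way(example)
print(f"ICC(2,1) = {icc21:.2f}, ICC(3,1) = {icc31:.2f}")
```

In this example the raters differ systematically in their mean scores, so ICC(2,1), which penalizes such offsets, comes out well below ICC(3,1).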


Journal ArticleDOI
TL;DR: The Berg Balance Scale has acceptable reliability, although it might not detect modest, clinically important changes in balance in individual subjects; the authors were only able to comment on the absolute reliability of the Berg Balance Scale among people with moderately poor to normal balance.

157 citations


Journal ArticleDOI
TL;DR: The results showed that the FMS could be consistently scored by people with varying degrees of experience with the FMS after a 2-hour training session, with similar ICC regardless of education or previous experience with FMS.
Abstract: The purpose of this study was to investigate interrater and intrarater reliability of the Functional Movement Screen (FMS) with real-time administration with raters of different educational background and experience. The FMS was assessed with real-time administration in healthy injury-free men and women and included a certified FMS rater for comparison with other raters. A relatively new tool, the FMS, was developed to screen 7 individual movement patterns to classify subjects' injury risk. Previous reliability studies have been published with only one investigating intrarater reliability. These studies had limitations in study design and clinical applicability such as the use of only video to rate or the use of raters without comparison to a certified FMS rater. Raters (n = 4) with varying degrees of FMS experience and educational levels underwent a 2-hour FMS training session. Subjects (n = 19) were rated during 2 sessions, 1 week apart, using standard FMS protocol and equipment. Interrater reliability was good for session 1 (intraclass correlation coefficient [ICC] = 0.89) and for session 2 (ICC = 0.87). The individual FMS movements showed hurdle step as the least reliable (ICC = 0.30 for session 1 and 0.35 for session 2), whereas the most reliable was shoulder mobility (ICC = 0.98 for session 1 and 0.96 for session 2). Intrarater reliability was good for all raters (ICC = 0.81-0.91), with similar ICC regardless of education or previous experience with FMS. The results showed that the FMS could be consistently scored by people with varying degrees of experience with the FMS after a 2-hour training session. Intrarater reliability was not increased with FMS certification.

142 citations


Journal ArticleDOI
TL;DR: Osteoporosis is a significant risk factor for scapular fractures after reverse shoulder arthroplasty and the current classification has only moderate reliability, suggesting that an alternative classification method is needed.

128 citations


Journal ArticleDOI
TL;DR: This study indicates that intrarater reliability is strong and seems to strengthen when the individuals have experience using the FMS in addition to clinical experience.
Abstract: The Functional Movement Screen (FMS) is a tool that quantifies movement patterns as a way to detect performance asymmetries. Although previous study has investigated the reliability of FMS, no current research has examined intrarater reliability or how clinical experience plays a role in the reliability of this tool. In this controlled laboratory study design, repeated measures were used to investigate how experience using the FMS and clinical experience as an athletic trainer (AT) affects the intrarater reliability of FMS testing. Before the data collection, 3 individuals recruited from the university community provided signed informed consent to serve as videotaped models performing the FMS test. The participants (raters) in the study, with different levels of FMS and clinical experience, viewed each of the 3 videotaped models and rated the video models on each exercise of the FMS according to the script that was presented by one of the study investigators. A week later, the participants watched the same videos again, in a different randomized order, and rated each video model on each exercise. After the scores from the participants were collected from both sessions, the intersession scores of the FMS were examined to establish intrarater reliability of all the participants. Additionally, the intrarater reliability of different groups of clinicians and students was compared to make inferences about the influence of clinical experience as an AT along with previous experience using the FMS. The ATs with at least 6 months of experience using the FMS (ATExp group) had the strongest intrarater reliability [intraclass correlation coefficients, ICC (2,1): 0.946], followed by the AT group with moderate reliability [ICC (2,1): 0.771]. This study indicates that intrarater reliability is strong and seems to strengthen when the individuals have experience using the FMS in addition to clinical experience.

127 citations


Journal ArticleDOI
TL;DR: Both the reliability and the responsiveness of the GMFM-88 are reasonable for measuring gross motor function in children with CP.
Abstract: Background The Gross Motor Function Measure (GMFM-88) is commonly used in the evaluation of gross motor function in children with cerebral palsy (CP). The relative reliability of GMFM-88 has been assessed in children with CP. However, little information is available regarding the absolute reliability or responsiveness of GMFM-88. Objective The purpose of this study was to determine the absolute and relative reliability and the responsiveness of the GMFM-88 in evaluating gross motor function in children with CP. Design A clinical measurement design was used. Methods Ten raters scored the GMFM-88 in 84 children (mean age=3.7 years, SD=1.9, range=10 months to 9 years 9 months) from video records across all Gross Motor Function Classification System (GMFCS) levels to establish interrater reliability. Two raters participated to assess intrarater reliability. Responsiveness was determined from 3 additional assessments after the baseline assessment. The interrater and intrarater intraclass correlation coefficients (ICCs) with 95% confidence intervals, standard error of measurement (SEM), smallest real difference (SRD), effect size (ES), and standardized response mean (SRM) were calculated. Results The relative reliability of the GMFM was excellent (ICCs=.952–1.000). The SEM and SRD for total score of the GMFM were acceptable (1.60 and 3.14, respectively). Additionally, the ES and SRM of the dimension goal scores increased gradually in the 3 follow-up assessments (GMFCS levels I and II: ES=0.5, 0.6, and 0.8 and SRM=1.3, 1.8, and 2.0; GMFCS levels III–V: ES=0.4, 0.7, and 0.9 and SRM=1.5, 1.7, and 2.0). Limitations Children over 10 years of age with CP were not included in this study, so the results should not be generalized to all children with CP. Conclusions Both the reliability and the responsiveness of the GMFM-88 are reasonable for measuring gross motor function in children with CP.

123 citations


Journal ArticleDOI
TL;DR: The proposed cervical spine osteotomy nomenclature provides the surgeon with a simple, standard description of the various cervical osteotomies and the reliability analysis demonstrated that this system is consistent and directly applicable.
Abstract: Object Cervical spine osteotomies are powerful techniques to correct rigid cervical spine deformity. Many variations exist, however, and there is no current standardized system with which to describe and classify cervical osteotomies. This complicates the ability to compare outcomes across procedures and studies. The authors' objective was to establish a universal nomenclature for cervical spine osteotomies to provide a common language among spine surgeons. Methods A proposed nomenclature with 7 anatomical grades of increasing extent of bone/soft tissue resection and destabilization was designed. The highest grade of resection is termed the major osteotomy, and an approach modifier is used to denote the surgical approach(es), including anterior (A), posterior (P), anterior-posterior (AP), posterior-anterior (PA), anterior-posterior-anterior (APA), and posterior-anterior-posterior (PAP). For cases in which multiple grades of osteotomies were performed, the highest grade is termed the major osteotomy, and lower-grade osteotomies are termed minor osteotomies. The nomenclature was evaluated by 11 reviewers through 25 different radiographic clinical cases. The review was performed twice, separated by a minimum 1-week interval. Reliability was assessed using Fleiss kappa coefficients. Results The average intrarater reliability was classified as "almost perfect agreement" for the major osteotomy (0.89 [range 0.60-1.00]) and approach modifier (0.99 [0.95-1.00]); it was classified as "moderate agreement" for the minor osteotomy (0.73 [range 0.41-1.00]). The average interrater reliability for the 2 readings was the following: major osteotomy, 0.87 ("almost perfect agreement"); approach modifier, 0.99 ("almost perfect agreement"); and minor osteotomy, 0.55 ("moderate agreement"). 
Analysis of only major osteotomy plus approach modifier yielded a classification that was "almost perfect" with an average intrarater reliability of 0.90 (0.63-1.00) and an interrater reliability of 0.88 and 0.86 for the two reviews. Conclusions The proposed cervical spine osteotomy nomenclature provides the surgeon with a simple, standard description of the various cervical osteotomies. The reliability analysis demonstrated that this system is consistent and directly applicable. Future work will evaluate the relationship between this system and health-related quality of life metrics.

79 citations
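
The osteotomy nomenclature entry above assesses agreement among several reviewers with Fleiss kappa. A minimal sketch of that statistic follows, assuming every case is graded by the same number of raters. The grades and the function name `fleiss_kappa` are hypothetical and are not taken from the study.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa; `ratings` holds one list of rater labels per case."""
    n_cases = len(ratings)
    if n_cases == 0:
        raise ValueError("need at least one case")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("every case needs the same number (>= 2) of ratings")

    categories = sorted({label for row in ratings for label in row})
    counts = [Counter(row) for row in ratings]   # per-case category counts

    # Mean per-case agreement: proportion of agreeing rater pairs.
    p_bar = sum(
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ) / n_cases

    # Chance agreement from the overall category proportions.
    p_e = sum(
        (sum(c[cat] for c in counts) / (n_cases * n_raters)) ** 2
        for cat in categories
    )
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings are identical")
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical grades assigned by 3 reviewers to 4 cases (illustrative only).
grades = [
    [1, 1, 1],
    [2, 2, 2],
    [1, 1, 2],
    [2, 3, 3],
]

kappa_fleiss = fleiss_kappa(grades)
print(f"Fleiss kappa = {kappa_fleiss:.3f}")
```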


Book ChapterDOI
06 May 2013
TL;DR: The ARL-unbiased c-chart tackles the curse of the null LCL and detects decreases in ∏ in a timely fashion, by relying on the randomization probabilities.
Abstract: (b) Discuss the advantages of the ARL-unbiased c-chart, with in-control ARL equal to 370.4, when compared to the c-chart with 3-sigma control limits. • Advantages of the ARL-unbiased c-chart... [Following the slides (2018-04-04-Slides-CandS2 charts.pdf, p. 16), we can add that...] As opposed to the c-chart with 3-sigma control limits: – the ARL-unbiased c-chart can take a pre-specified in-control ARL, in this case 370.4; – the associated ARL curve attains a maximum when ∏ is on target; – the ARL-unbiased c-chart tackles the curse of the null LCL and detects decreases in ∏ in a timely fashion, by relying on the randomization probabilities.

Journal ArticleDOI
TL;DR: The finding suggests the current AKE test showed excellent interrater and intrarater reliability for assessing hamstring flexibility in healthy adults.
Abstract: [Purpose] The purpose of this study was to determine the reliability of the active knee extension (AKE) test among healthy adults. [Subjects] Fourteen healthy participants (10 men and 4 women) volunteered and gave informed consent. [Methods] Two raters conducted AKE tests independently with the aid of a simple and inexpensive stabilizing apparatus. Each knee was measured twice, and the AKE test was repeated one week later. [Results] The interrater reliability intraclass correlation coefficients (ICC2,1) were 0.87 for the dominant knee and 0.81 for the nondominant knee. In addition, the intrarater (test-retest) reliability ICC3,1 values range between 0.78-0.97 and 0.75-0.84 for raters 1 and 2 respectively. The percentages of agreement within 10° for AKE measurements were 93% for the dominant knee and 79% for the nondominant knee. [Conclusion] The finding suggests the current AKE test showed excellent interrater and intrarater reliability for assessing hamstring flexibility in healthy adults.

Journal ArticleDOI
TL;DR: The MTS is reliable for assessing spasticity in most lower limb muscles of adults with chronic neurologic injuries, and repeated MTS measurements of spasticity are best based on R1 measurements rather than spasticity angle or qualitative ratings of spasticity.

Journal ArticleDOI
TL;DR: The results of this study provide an indication that the ICF categories could be used as components of rehabilitation outcome measures.
Abstract: Purpose: The categories of the International Classification of Functioning, Disability and Health (ICF) could potentially be used as components of outcome measures. Literature demonstrating the psychometric properties of ICF categories is limited. Objective: Determine the agreement and reliability of ICF activities of daily living category scores and compare these to agreement and reliability of the Functional Independence Measure (FIM) item scores. Method: Two investigators independently reviewed the clinical notes of 100 patients to score the ICF activities of daily living categories using ICF qualifiers with additional scoring guidelines. The percentage agreement, interrater and intrarater reliability were compared with the matched FIM items scored by a separate set of two investigators using the same methodology. The kappa statistic was calculated using MedCalc. Results: ICF interrater reliability, as indicated by kappa values ranging from 0.42 to 0.81, was moderate or better for the eleven self-care and mobility categories. The language ICF categories and problem solving generally have fair agreement, with kappa values ranging from 0.21 for receiving verbal messages to 0.44 for basic social interactions. Absolute agreement was above 72% for all categories. Reliability and agreement of the FIM items were generally lower than those of the corresponding ICF categories. Conclusion: The inter-rater and intra-rater reliability and agreement of the ICF activities of daily living categories were comparable or better than the corresponding FIM items. The results of this study provide an indication that the ICF categories could be used as components of rehabilitation outcome measures.

Journal ArticleDOI
TL;DR: With minimal training, some surgeons can obtain valid and reliable measurements of knee osteoarthritis status in individuals who eventually undergo total knee arthroplasty and some surgeons require additional training to become proficient in the radiographic classification systems.
Abstract: Most orthopedic surgeons do not routinely use radiographic classification systems to grade the extent of joint space narrowing in patients considered for total knee arthroplasty. The authors compared the validity and reliability of radiographic measures of tibiofemoral osteoarthritis by 2 experienced and 2 inexperienced orthopedic surgeons on individuals who subsequently underwent total knee arthroplasty. The Kellgren-Lawrence and the Osteoarthritis Research Society International classification systems were used by all surgeons to score the radiographs in 116 individuals in the Osteoarthritis Initiative, a federally funded cohort study of individuals with or at risk of knee osteoarthritis. Validity was judged based on comparison with the criterion centrally adjudicated consensus measures obtained by Osteoarthritis Initiative investigators. Weighted kappa, a chance-corrected agreement index, was used to describe validity and reliability. Validity and intrarater reliability were substantial to almost perfect for 1 experienced and 1 inexperienced surgeon, with weighted kappas ranging from 0.76 to 0.96 for the surgical knees. The other experienced and inexperienced surgeons demonstrated moderate to substantial validity, with weighted kappas ranging from 0.43 to 0.70 and lower intrarater reliability. Interrater reliability was generally less than intrarater reliability. With minimal training, some surgeons can obtain valid and reliable measurements of knee osteoarthritis status in individuals who eventually undergo total knee arthroplasty. Measurement quality does not appear to be dependent on extent of surgeon experience. Some surgeons require additional training to become proficient in the radiographic classification systems, and future research should examine this issue.

Journal ArticleDOI
TL;DR: The standardized clinical tests exhibited moderate to substantial reliability in patients with axial neck pain referred for diagnostic facet joint blocks, and the incorporation of these tests into a clinical prediction model to screen patients before referral for diagnostic facet joint blocks is justified.

Journal ArticleDOI
TL;DR: The EFIP is a reliable and valid instrument to evaluate the effect of physical activity on frailty in research and in clinical practice; its reliability, validity, and other clinimetric properties will be assessed or reassessed in a larger study population.
Abstract: Background Physical activity is assumed to be important in the prevention and treatment of frailty. It is unclear, however, to what extent frailty can be influenced because instruments designed to assess frailty have not been validated as evaluative outcome instruments in clinical practice. Objectives The aims of this study were: (1) to develop a frailty index (ie, the Evaluative Frailty Index for Physical Activity [EFIP]) based on the method of deficit accumulation and (2) to test the clinimetric properties of the EFIP. Design The content of the EFIP was determined using a written Delphi procedure. Intrarater reliability, interrater reliability, and construct validity were determined in an observational study (n=24). Method Intrarater reliability and interrater reliability were calculated using Cohen kappa and intraclass correlation coefficients (ICCs). Construct validity was determined by correlating the score on the EFIP with those on the Timed “Up & Go” Test (TUG), the Performance-Oriented Mobility Assessment (POMA), and the Cumulative Illness Rating Scale for Geriatrics (CIRS-G). Results Fifty items were included in the EFIP. Interrater reliability (Cohen kappa=0.72, ICC=.96) and intrarater reliability (Cohen kappa=0.77 and 0.80, ICC=.93 and .98) were good. As expected, a fair to moderate correlation with the TUG, POMA, and CIRS-G was found (.61, −.70, and .66, respectively). Limitations Reliability and validity of the EFIP have been tested in a small sample. These and other clinimetric properties, such as responsiveness, will be assessed or reassessed in a larger study population. Conclusion The EFIP is a reliable and valid instrument to evaluate the effect of physical activity on frailty in research and in clinical practice.

Journal ArticleDOI
01 May 2013
TL;DR: The proposed grading system is shown to have good interrater and intrarater reliability and provides a reliable instrument for assessing lateral wall insufficiency.
Abstract: This study was designed to validate a grading scheme for lateral nasal wall insufficiency with interrater and intrarater reliability measures. Representative endoscopic videos depicting varied degrees of lateral nasal wall insufficiency were collated into a 30-clip video (15 clips in duplicate). This was rated by five reviewers for a total of 150 observations. Interrater and intrarater reliability were determined using Fleiss kappa and intraclass correlation coefficient (ICC) statistics, respectively. Good agreement was established between reviewers (interrater reliability), with a Fleiss kappa of 0.7733 (p < 0.01). Analysis of intrarater variability with the ICC revealed a very strong agreement (ICC = 0.88; p < 0.01). The proposed grading system is shown to have good interrater and intrarater reliability. It provides a reliable instrument for assessing lateral wall insufficiency.

Journal ArticleDOI
TL;DR: Overall, video RUSI is a reliable surrogate for static RUSI for multifidus muscle measurements and has the additional advantage of requiring shorter data collection time.
Abstract: Study Design Reliability study. Objectives To compare the within- and between-day intrarater reliability of rehabilitative ultrasound imaging (RUSI) using static images (static RUSI) and video clips (video RUSI) to quantify multifidus muscle thickness at rest and while contracted. Secondary objectives were to compare the measurement precision of averaging multiple measures and to estimate reliability in individuals with and without low back pain (LBP). Background Although intrarater reliability of static RUSI in measuring multifidus thickness has been established, using video RUSI may improve reliability estimates, as it allows examiners to select the optimal image from a video clip. Further, multiple measurements and LBP status may affect RUSI reliability estimates. Methods Static RUSI and video RUSI were used to quantify multifidus muscle thickness at rest and during contraction and percent thickness change in 27 volunteers (13 without LBP and 14 with LBP). Three static RUSI images and 3 video RUSI vide...

Journal ArticleDOI
TL;DR: The reliability of the Modified Tardieu Scale in the measurement of ankle plantarflexor spasticity in adult patients after stroke was insufficient for routine use in clinical settings and research.
Abstract: Primary objective: To evaluate the reliability of the Modified Tardieu Scale (MTS) in the measurement of ankle plantarflexor spasticity in patients after stroke. Research design: Inter- and intra-rater reliability study. Interventions: Not applicable. Methods and procedures: Adult patients after stroke participated. Patients were tested by two raters for inter-rater reliability. Patients were re-tested by one rater at least 1 week later for intra-rater reliability. The plantarflexors on the hemiparetic side were tested. Main outcomes and results: The ICCs of inter- and intra-rater reliability across all components of MTS were moderate and moderately high (range 0.40–0.71). Inter- and intra-rater reliability for the dynamic component of spasticity (R2–R1) were moderate (ICC = 0.57 and 0.40, respectively). The difference between the two raters for R2 was statistically significant (p = 0.001). Conclusions: The reliability of the Modified Tardieu Scale in the measurement of ankle plantarflexor spasticity in adult p...

Journal ArticleDOI
01 Feb 2013-Pm&r
TL;DR: This study assessed the reliability of ultrasound (US) measures of the transversus abdominis (TrA) muscle in subjects with and without specific chronic low back pain and tested whether reliability is enhanced by using different abdominal muscle activation tasks, by using a foam cube for US transducer stabilization, or by averaging 3 measures on the same image.
Abstract: Objective To assess the reliability of ultrasound (US) measures of the transversus abdominis (TrA) muscle in a sample of subjects with and without specific chronic low back pain and to test whether reliability is enhanced by using different abdominal muscle activation tasks, with use of a foam cube for US transducer stabilization or by averaging 3 measures on the same image. Design Cross-sectional repeated-measures design. Setting Laboratory setting. Patients Fifteen subjects with chronic low back pain and 15 control subjects. Methods Subjects (n = 30) performed 3 tasks in the supine position: (1) contralateral straight leg raise (SLR), (2) bilateral hook-lying leg raising (HLR), and (3) abdominal drawing-in maneuver (ADIM) (control subjects only). Two 7-second videos of the right and left abdominal wall (from rest to contraction) were collected, with and without use of the foam cube. One of the 2 raters repeated the testing 7 to 14 days later to assess intrarater reliability. Main outcome measurements US imaging of abdominal muscles thickness. Results The TrA muscle was recruited preferentially in the ADIM task compared with the automatic tasks (SLR and HLR). The reliability was comparable among the 3 tasks, with intrarater reliability results being better than interrater reliability results. The use of the foam cube or averaging measures on the same image was generally not effective to increase reliability. Conclusions Although they are not as preferential in TrA recruitment as the ADIM, the SLR and HLR tasks showed comparable reliability results. The foam cube used to control transducer orientation and pressure and averaging measures on the same image had limited effect on reliability.

Journal ArticleDOI
TL;DR: Several novel uncertainty propagation methods and reliability models, which are the basis of the reliability assessment, are given and recent developments on reliability evaluation and sensitivity analysis are highlighted.
Abstract: We review recent research activities on structural reliability analysis, reliability-based design optimization (RBDO) and applications in complex engineering structural design. Several novel uncertainty propagation methods and reliability models, which are the basis of the reliability assessment, are given. In addition, recent developments on reliability evaluation and sensitivity analysis are highlighted as well as implementation strategies for RBDO.


Journal ArticleDOI
01 Aug 2013-Pm&r
TL;DR: This study assessed the intra- and inter-rater reliability of different ultrasound measures of the lumbar multifidus muscle in subjects with and without chronic low back pain and tested 3 ways to enhance reliability: testing different tasks, using a template, and averaging trials within or between days.
Abstract: Objective. To (1) assess the intra- and inter-rater reliability of different ultrasound (US) measures of the lumbar multifidus muscle in subjects with and without chronic low back pain and (2) test 3 different ways to enhance reliability, that is, by testing different tasks, using a template, and averaging trials within or between days. Design. Cross-sectional repeated-measures design. Setting. Laboratory setting. Patients. Fifteen subjects with chronic low back pain and 15 control subjects. Methods. Subjects (n = 30) performed contralateral arm lifting and contralateral leg lifting while in the prone position. Two 7-second videos of the lumbar multifidus (from rest to contraction) were collected with and without a template (transparency) to reposition the transducer on the skin. One of the two raters repeated the testing 7 to 14 days later to assess intrarater reliability in addition to inter-rater reliability. Reliability was assessed with generalizability theory as a framework. Main outcome measurements. US imaging measures of the lumbar multifidus thickness were obtained in patients at rest and during standardized contractions (hereafter called primary measures) at 2 vertebral levels and on both sides. These primary measures were used to calculate different, potentially useful US parameters (hereafter called derived measures). Results. Intrarater reliability was better than inter-rater reliability, and primary measures were more reliable than derived measures. The tasks investigated showed comparable reliability results, and the use of the transducer position template was not effective in increasing reliability. Averaging the measures of 3 images increased reliability substantially. Conclusions. Optimal reliability requires the use of a single rater and the averaging of at least 3 images per visit. In these conditions, primary measures reach acceptable levels of reliability, which was more difficult to achieve for most derived measures. Arm or leg lifting tasks showed similar reliability, and thus the arm-lifting task is recommended for comparisons with previous studies. The use of a transducer position template is not recommended.
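
The reported gain from averaging 3 images can be illustrated with the Spearman-Brown formula from classical test theory, which predicts the reliability of the mean of k parallel measurements from the reliability r of a single measurement. This is only a sketch under stated assumptions: the study itself used generalizability theory, the function name is hypothetical, and the input value is made up for illustration rather than taken from the paper.

```python
def spearman_brown(single_reliability: float, k: int) -> float:
    """Predicted reliability of the mean of k parallel measurements."""
    if k < 1:
        raise ValueError("k must be at least 1")
    r = single_reliability
    return k * r / (1 + (k - 1) * r)


# Hypothetical single-image reliability, chosen only for illustration.
r_single = 0.5
for k in (1, 2, 3):
    print(k, round(spearman_brown(r_single, k), 3))
```

Under this formula, the benefit of each extra image shrinks as k grows, which is consistent with recommending a small fixed number of images per visit.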

Journal ArticleDOI
15 Nov 2013-Spine
TL;DR: The SedSign was shown to have high intrarater reliability and acceptable inter-rater reliability, yet may not aid in the differential diagnosis of LSS from LBP or vascular claudication, or add any specific diagnostic information beyond the traditional history, physical examination, and imaging studies that are standard in LSS diagnosis.
Abstract: Study Design. Retrospective review of magnetic resonance images. Objective. Examine the diagnostic accuracy, discriminative ability, and reliability of the sedimentation sign in a sample of patients with clinically diagnosed lumbar spinal stenosis (LSS), low back pain (LBP), and vascular claudication, and in asymptomatic controls. Summary of Background Data. The nerve root sedimentation sign (SedSign) was recently described as a new diagnostic test for LSS; however, the degree to which this sign is sensitive and specific in diagnosis of LSS is unknown. Methods. All LSS images were obtained from subjects who had clinically diagnosed LSS confirmed on imaging by a spine specialist. The other images were obtained from people with LBP but no LSS, people with severe vascular claudication, and asymptomatic participants. Three blinded raters independently assessed the images. A positive sign was defined as the absence of nerve root sedimentation at the level above or below the level of maximum stenosis. Results. Images from 148 subjects were reviewed (67 LSS, 31 LBP, 4 vascular, and 46 asymptomatic). Intrarater reliability for the sign ranged from κ = 0.87 to 0.97 and inter-rater reliability from 0.62 to 0.69. Sensitivity ranged from 42% to 66%, and specificity ranged from 49% to 78%. Sensitivity improved to a range of 60% to 96% when only images with a smallest cross-sectional area of the dural sac of less than 80 mm² were included. The sign was able to differentiate (P = 0.004) between LSS and asymptomatic controls but not between LSS and LBP or between LSS and vascular claudication. Conclusion. The SedSign was shown to have high intrarater reliability and acceptable inter-rater reliability. The sign appears most sensitive in defining severe LSS cases, yet may not aid in the differential diagnosis of LSS from LBP or vascular claudication, or add any specific diagnostic information beyond the traditional history, physical examination, and imaging studies that are standard in LSS diagnosis. Level of Evidence: 4
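
The statistics reported above (κ for rater agreement, plus sensitivity and specificity) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis: the function names are hypothetical, the abstract does not say how agreement among the three raters was computed, and the ratings below are toy data invented for the example.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two sets of categorical ratings of the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    # Proportion of items on which the two readings agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each reading's marginal frequencies.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def sensitivity_specificity(test_positive, has_condition):
    """Return (sensitivity, specificity) from paired boolean-like lists."""
    pairs = list(zip(test_positive, has_condition))
    tp = sum(1 for t, c in pairs if t and c)
    fn = sum(1 for t, c in pairs if not t and c)
    tn = sum(1 for t, c in pairs if not t and not c)
    fp = sum(1 for t, c in pairs if t and not c)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diseased and one healthy case")
    return tp / (tp + fn), tn / (tn + fp)


# Toy data only: 1 = positive sign, 0 = negative sign.
first_read = [1, 1, 0, 0]
second_read = [1, 0, 0, 0]
print(cohen_kappa(first_read, second_read))

sign = [1, 1, 0, 0, 1, 0]
stenosis = [1, 1, 1, 0, 0, 0]
print(sensitivity_specificity(sign, stenosis))
```

Passing the same rater's two readings gives an intrarater κ, while passing two different raters' readings gives an inter-rater κ.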

Journal ArticleDOI
TL;DR: Investigation of TJA interrater and intrarater reliability with raters of different educational and clinical backgrounds found that published protocol and training of raters were insufficient to allow consistent TJA scoring.
Abstract: Objective. The Tuck Jump Assessment (TJA), a clinical plyometric assessment, identifies 10 jumping and landing technique flaws. The study objective was to investigate TJA interrater and intrarater reliability with raters of different educational and clinical backgrounds. Methods. Forty participants were video recorded performing the TJA using the published protocol and instructions. Five raters of varied educational and clinical backgrounds scored the TJA. The scores for the 10 technique flaws were summed for the total TJA score. Approximately one month later, 3 raters scored the videos again. Intraclass correlation coefficients determined interrater (5 and 3 raters for the first and second sessions, respectively) and intrarater (3 raters) reliability. Results. Interrater reliability with 5 raters was poor (ICC = 0.47; 95% confidence interval (CI) 0.33–0.62). Interrater reliability between the 3 raters who completed 2 scoring sessions improved from 0.52 (95% CI 0.35–0.68) for session one to 0.69 (95% CI 0.55–0.81) for session two. Intrarater reliability was poor to moderate, ranging from 0.44 (95% CI 0.22–0.68) to 0.72 (95% CI 0.55–0.84). Conclusion. The published protocol and training of raters were insufficient to allow consistent TJA scoring. There may be a learned effect with the TJA since interrater reliability improved with repetition. TJA instructions and training should be modified and enhanced before clinical implementation.
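
An intraclass correlation coefficient of the kind reported above can be computed from a subjects-by-raters table of scores. The abstract does not say which ICC form was used, so the sketch below assumes ICC(2,1) (two-way random effects, absolute agreement, single rater), one common choice for interrater reliability. The function name and the scores are hypothetical and serve only as an illustration.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores is a list of rows: one row per subject, one column per rater.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least 2 subjects")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least 2 raters and equal-length rows")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Mean squares from a two-way ANOVA without replication.
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_error = sum(
        (scores[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Made-up total scores: 3 subjects (rows) rated by 2 raters (columns).
# The second rater scores every subject exactly one point higher.
toy_scores = [[1, 2], [2, 3], [3, 4]]
print(round(icc_2_1(toy_scores), 3))
```

Because this form measures absolute agreement, a systematic offset between raters lowers the coefficient even when the raters rank the subjects identically.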

Journal ArticleDOI
TL;DR: There is only fair agreement among observers when measuring the shortening of clavicle fractures in adolescents on digital clavicle radiographs by either method described, and a lack of standardization of measurement in previous studies on clavicle fracture treatment may not represent a significant problem.
Abstract: Background. A relative indication for surgical treatment of midshaft clavicle fractures is shortening ≥2.0 cm. A standard method for determining shortening with routine clavicle radiographs has not been established. This study evaluated the interobserver and intraobserver reliability when measuring shortening of midshaft clavicle fractures in adolescents. Methods. We identified all clavicle radiographs of simple midshaft clavicle fractures in adolescents from 2006 to 2010. Thirty-two radiographs were chosen following a power analysis for 7 observers. Each film was measured twice by each evaluator using 2 separate methods. Method 1 was the evaluator's method of choice to determine shortening on the digital radiographs. Method 2 was standardized. Intraclass correlation coefficient and confidence intervals (CI) were calculated to determine interrater reliability, and average differences between the 2 time points with 95% CI were calculated to determine intrarater reliability. Results. Interrater reliability for method 1 was 0.771 (95% CI, 0.655-0.865) and 0.743 (95% CI, 0.604-0.851) at the 2 time points, for fair agreement. Interrater reliability for method 2 was 0.741 (95% CI, 0.629-0.842) and 0.685 (95% CI, 0.554-0.805) at the 2 time points, for fair and poor agreement, respectively. Neither method was statistically superior to the other. For method 1, the SD for the measurements averaged 3.1 mm. For method 2, the average SD was 3.0 mm. Intrarater reliability for method 1 was 2.62 mm average difference between the 2 time points (95% CI, 2.24-3.00), and for method 2 it was 3.34 mm average (95% CI, 2.88-3.80). Method 2 had a significantly greater difference at the 2 time points than method 1 (P=0.027). Conclusions. There is only fair agreement among observers when measuring the shortening of clavicle fractures in adolescents on digital clavicle radiographs by either method described. However, as the average difference among measurers was only 3 mm, this is unlikely to influence clinical decision making. A lack of standardization of measurement in previous studies on clavicle fracture treatment may not represent a significant problem. Level of evidence: Level III diagnostic study.

Journal ArticleDOI
TL;DR: Given the increasing number of animal studies using videofluoroscopy to study dysphagia, this scale provides a valid and reliable measure of airway protection during swallowing in infant pigs that will give these animal models increased translational significance.
Abstract: A penetration–aspiration scale exists for assessing airway protection in adult videofluoroscopy and fiberoptic endoscopic swallowing studies; however, no such scale exists for animal models. The aim of this study was threefold: (1) develop a penetration–aspiration scale (PAS) for infant mammals, (2) test the scale’s intra- and interrater reliabilities, and (3) validate the use of the scale for distinguishing between abnormal and normal animals. After discussion and reviewing many videos, the result was a 7-point infant mammal PAS. Reliability was tested by having five judges score 90 swallows recorded with videofluoroscopy across two time points. In these videos, the frame rate was either 30 or 60 frames per second and the animals were either normal, had a unilateral superior laryngeal nerve (SLN) lesion, or had hard palate local anesthesia. The scale was validated by having one judge score videos of both normal and SLN lesioned pigs and testing the difference using a t test. Raters had a high intrarater reliability [average κ = 0.82, intraclass correlation coefficient (ICC) = 0.92] and high interrater reliability (average κ = 0.68, ICC = 0.66). There was a significant difference in reliability for videos captured at 30 and 60 frames per second for scores of 3 and 7 (P < 0.001). The scale was also validated for distinguishing between normal and abnormal pigs (P < 0.001). Given the increasing number of animal studies using videofluoroscopy to study dysphagia, this scale provides a valid and reliable measure of airway protection during swallowing in infant pigs that will give these animal models increased translational significance.

BookDOI
01 Jan 2013
TL;DR: The book presents an early software reliability prediction model that will help to grow the reliability of the software systems by monitoring it in each development phase, i.e. from requirement phase to testing phase.
Abstract: Developing a software system with an acceptable level of reliability and quality within the available time frame and budget is a challenging objective. This objective can be achieved to some extent through early prediction of the number of faults present in the software, which reduces the cost of development by providing an opportunity to make early corrections during the development process. The book presents an early software reliability prediction model that will help to grow the reliability of software systems by monitoring it in each development phase, i.e. from the requirement phase to the testing phase. Different approaches are discussed in this book to tackle this challenging issue. An important approach presented in this book is a model to classify modules into two categories: (a) fault-prone and (b) not fault-prone. The methods presented in this book for assessing the expected number of faults present in the software, assessing the expected number of faults present at the end of each phase, and classifying software modules as fault-prone or not fault-prone are easy for any practitioner to understand, develop, and use. Practitioners are expected to gain more information about their development process and product reliability, which can help to optimize the resources used.

Journal ArticleDOI
TL;DR: The TIMPSI can reliably be used to assess motor function in infants with type I SMA, and its scores are related to the ability to reach, an important functional skill in children with type I spinal muscular atrophy.
Abstract: Purpose. This study examined the reliability and validity of the Test of Infant Motor Performance Screening Items (TIMPSI) in infants with type I spinal muscular atrophy (SMA). Methods. After training, 12 evaluators scored 4 videos of infants with type I SMA to assess interrater reliability. Intrarater and test-retest reliability were further assessed during an SMA type I clinical trial, in which 9 evaluators tested a total of 38 infants twice. Relatedness of the TIMPSI score to ability to reach and ventilatory support was also examined. Results. Excellent interrater video score reliability was noted (intraclass correlation coefficient, 0.97-0.98). Intrarater reliability was excellent (intraclass correlation coefficient, 0.91-0.98) and test-retest reliability ranged from r = 0.82 to r = 0.95. The TIMPSI score was related to the ability to reach (P ≤ .05). Conclusion. The TIMPSI can reliably be used to assess motor function in infants with type I SMA. In addition, the TIMPSI scores are related to the ability to reach, an important functional skill in children with type I SMA.