scispace - formally typeset

Showing papers on "Intra-rater reliability" published in 2017


Journal Article
TL;DR: The FMS has excellent interrater and intrarater reliability and validity and has predictive value for musculoskeletal injuries.
Abstract: Background: The Functional Movement Screen (FMS) is used by professional and collegiate sports teams and the military for the prevention of musculoskeletal injuries. Hypothesis: The FMS demonstrates good interrater and intrarater reliability and validity and has predictive value for musculoskeletal injuries. Study Design: Systematic review and meta-analysis. Methods: A systematic review and meta-analysis were conducted using a computerized search of the electronic databases MEDLINE and ScienceDirect in adherence with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Relevant data extracted from each included study were recorded on a standardized form. The Cochran Q statistic was used to evaluate study heterogeneity. Pooled quantitative synthesis was performed to estimate the intraclass correlation coefficient (ICC) for interrater and intrarater reliability, with 95% CIs, and odds ratios with 95% CIs for the injury predictive value of a score of ≤14. Results: ...

195 citations
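The FMS meta-analysis pools odds ratios with 95% CIs and checks heterogeneity with Cochran's Q. The sketch below shows one common way to do both, assuming a fixed-effect inverse-variance model on the log-odds scale and recovering each study's standard error from its reported 95% CI (which assumes the CI is symmetric on the log scale). The review's actual pooling model and software are not stated in this excerpt, and `pool_odds_ratios` is a hypothetical helper.

```python
import math

def pool_odds_ratios(ors, ci_lows, ci_highs, z=1.96):
    """Fixed-effect, inverse-variance pooling of study odds ratios.

    Each study's standard error on the log scale is recovered from its
    95% CI as (ln(upper) - ln(lower)) / (2 * z).
    Returns (pooled OR, CI lower, CI upper, Cochran's Q, degrees of freedom).
    """
    log_ors = [math.log(o) for o in ors]
    ses = [(math.log(hi) - math.log(lo)) / (2 * z)
           for lo, hi in zip(ci_lows, ci_highs)]
    weights = [1 / se ** 2 for se in ses]
    total_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_ors)) / total_w
    pooled_se = math.sqrt(1 / total_w)
    # Cochran's Q: weighted squared deviations of each study from the pooled log OR
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_ors))
    df = len(ors) - 1
    return (math.exp(pooled),
            math.exp(pooled - z * pooled_se),
            math.exp(pooled + z * pooled_se),
            q, df)

# Illustrative inputs only (not values from the review).
pooled_or, lo, hi, q, df = pool_odds_ratios([3.0, 5.0, 4.0],
                                            [1.5, 2.0, 2.2],
                                            [6.0, 12.5, 7.3])
print(f"pooled OR {pooled_or:.2f} (95% CI {lo:.2f}-{hi:.2f}), Q = {q:.2f} on {df} df")
```

Q is compared against a chi-square distribution with k − 1 degrees of freedom; a large Q relative to its df suggests heterogeneity, in which case a random-effects model is usually preferred.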


Journal Article
TL;DR: This paper aims to increase insight into reliability studies by pointing to the assumptions of reliability coefficients, the similarities between various coefficients, and the resulting new applications of reliability coefficients.

120 citations


Journal Article
TL;DR: In this paper, the authors outline important factors to consider in test-retest reliability analyses, common errors, and some initial methods for conducting and reporting reliability analyses to avoid such errors.
Abstract: Psychological research and clinical practice rely heavily on psychometric testing for measuring psychological constructs that represent symptoms of psychopathology, individual difference characteristics, or cognitive profiles. Test-retest reliability assessment is crucial in the development of psychometric tools, helping to ensure that measurement variation is due to replicable differences between people regardless of time, target behavior, or user profile. While psychological studies testing the reliability of measurement tools are pervasive in the literature, many still discuss and assess this form of reliability inappropriately with regard to the specified aims of the study or the intended use of the tool. The current paper outlines important factors to consider in test-retest reliability analyses, common errors, and some initial methods for conducting and reporting reliability analyses to avoid such errors. The paper aims to highlight a persistently problematic area in psychological assessment...

64 citations


Journal Article
TL;DR: The results for visual shade matching exhibited a high to moderate level of inconsistency for both intrarater and inter-rater comparisons, while the VITA Easyshade Advance intraoral spectrophotometer exhibited significantly better reliability than visual shade selection.
Abstract: Objective The aim of this investigation was to evaluate intrarater and inter-rater reliability of visual and instrumental shade matching. Materials and Methods Forty individuals with normal color perception participated in this study. The right maxillary central incisor of a teaching model was prepared and restored with 10 feldspathic all-ceramic crowns of different shades. A shade matching session consisted of the observer (rater) visually selecting the best match by using VITA classical A1-D4 (VC) and VITA Toothguide 3D Master (3D) shade guides and the VITA Easyshade Advance intraoral spectrophotometer (ES) to obtain both VC and 3D matches. Three shade matching sessions were held with 4 to 6 weeks between sessions. Intrarater reliability was assessed based on the percentage of agreement for the three sessions for the same observer, whereas the inter-rater reliability was calculated as mean percentage of agreement between different observers. The Fleiss' Kappa statistical analysis was used to evaluate visual inter-rater reliability. Results The mean intrarater reliability for the visual shade selection was 64(11) for VC and 48(10) for 3D. The corresponding ES values were 96(4) for both VC and 3D. The percentages of observers who matched the same shade with VC and 3D were 55(10) and 43(12), respectively, while corresponding ES values were 88(8) for VC and 92(4) for 3D. Conclusions The results for visual shade matching exhibited a high to moderate level of inconsistency for both intrarater and inter-rater comparisons. The VITA Easyshade Advance intraoral spectrophotometer exhibited significantly better reliability compared with visual shade selection. CLINICAL SIGNIFICANCE This study evaluates the ability of observers to consistently match the same shade visually and with a dental spectrophotometer in different sessions. 
The intrarater and inter-rater reliability (agreement of repeated shade matching) of visual and instrumental tooth color matching strongly suggest the use of color matching instruments as a supplementary tool in everyday dental practice to enhance the esthetic outcome.

62 citations


Journal Article
TL;DR: The CHARTS instrument can be used to reliably and comprehensively map the anatomical location of spontaneous ICH, and may be helpful for studying important questions regarding causes, risk factors, prognosis, and for stratification in clinical trials.

59 citations


Journal Article
TL;DR: This study shows that a handheld dynamometer (HHD) used with a portable stabilization device has good within-tester reliability for measuring lower extremity muscle performance in an active healthy population.
Abstract: Handheld dynamometry (HHD) is a more objective way to quantify muscle force production (MP) than traditional manual muscle testing. HHD reliability can be negatively affected by the strength of both the tester and the subject, particularly in the lower extremities because of their larger muscle groups. The primary aim of this investigation was to assess the intrarater reliability of HHD with a portable stabilization device for lower extremity MP in an athletic population. Isometric strength of the hip abductors, external rotators, and adductors, the knee extensors, and the ankle plantar flexors was measured bilaterally in a sample of healthy recreational runners (8 males, 7 females; n = 30 limbs) training for a marathon. These measurements were assessed using an intrasession intrarater reliability design. Intraclass correlation coefficients (ICCs) were calculated using the (3,1) model, based on the single-rater design. The standard error of measurement (SEM) for each muscle group was also calculated. ICC(3,1) values were excellent, ranging from 0.93 to 0.98, with SEMs ranging from 0.58 to 17.2 N. This study shows that HHD with a portable stabilization device has good within-tester reliability for measuring lower extremity muscle performance in an active healthy population.

58 citations
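The HHD study reports ICC(3,1) and the SEM. A minimal sketch of the Shrout and Fleiss single-measure ICCs computed from a two-way ANOVA table is below; ICC(2,1), the two-way random-effects form, is included for comparison because it penalizes systematic offsets between trials or raters while ICC(3,1) does not. The SEM formula shown (SD × √(1 − ICC), with SD taken over all scores) is one common convention and is an assumption here, since the abstract does not say how its SEM was derived. All function names are hypothetical.

```python
import math
import statistics

def _mean_squares(data):
    """Two-way ANOVA mean squares for an n-subject x k-measurement table."""
    n, k = len(data), len(data[0])
    grand = sum(map(sum, data)) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols
    bms = ss_rows / (n - 1)             # between-subjects mean square
    jms = ss_cols / (k - 1)             # between-trials (or between-raters) mean square
    ems = ss_err / ((n - 1) * (k - 1))  # residual mean square
    return n, k, bms, jms, ems


def icc_3_1(data):
    """ICC(3,1): two-way mixed model, consistency, single measurement."""
    n, k, bms, jms, ems = _mean_squares(data)
    return (bms - ems) / (bms + (k - 1) * ems)


def icc_2_1(data):
    """ICC(2,1): two-way random model, absolute agreement, single measurement."""
    n, k, bms, jms, ems = _mean_squares(data)
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


def sem_from_scores(data, icc):
    """SEM as SD * sqrt(1 - ICC), with SD taken over all scores (one convention)."""
    sd = statistics.stdev([x for row in data for x in row])
    return sd * math.sqrt(max(0.0, 1 - icc))


# Shrout & Fleiss (1979) example: 6 subjects (rows) rated by 4 judges (columns).
ratings = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
           [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
print(round(icc_3_1(ratings), 2))  # 0.71
print(round(icc_2_1(ratings), 2))  # 0.29
```

For an intrarater design like the HHD study, the columns would be repeated trials by the same tester rather than different judges.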


Journal Article
TL;DR: TGMD-3 raw scores of children with ASD were significantly lower than those of typically developing peers but improved significantly with the TGMD-3 visual support protocol, indicating that this protocol is a valid and reliable assessment of gross motor performance for children with autism spectrum disorder.
Abstract: The validity and reliability of the Test of Gross Motor Development-3 (TGMD-3) were measured, taking into consideration the preference for visual learning of children with autism spectrum disorder (ASD). The TGMD-3 was administered to 14 children with ASD (4-10 years) and 21 age-matched typically developing children under two conditions: the traditional TGMD-3 protocol and the TGMD-3 visual support protocol. Excellent levels of internal consistency and test-retest, interrater, and intrarater reliability were achieved for the TGMD-3 visual support protocol. TGMD-3 raw scores of children with ASD were significantly lower than those of typically developing peers but improved significantly with the TGMD-3 visual support protocol. This demonstrates that the TGMD-3 visual support protocol is a valid and reliable assessment of gross motor performance for children with ASD.

55 citations


Journal Article
TL;DR: A reliability study of clinicians' interrater and intrarater reliability, and of the reliability of using high-quality macrophotographs of postoperative scars, found that the SCAR scale is a reliable standard rating scale for postoperative scar cosmesis.
Abstract: Importance Until recently, no ideal valid, feasible, and reliable scar scale existed to effectively assess the quality of postoperative linear scars. The Scar Cosmesis Assessment and Rating (SCAR) scale was developed and validated as a tool to assess the quality of postoperative scars in clinical and research settings. Objective To assess the reliability of using photographs in lieu of live patient scar rating assessments, and to determine the interrater and intrarater reliability of the SCAR scale. Design, Setting, and Participants This was a reliability study to assess clinicians’ interrater and intrarater reliability, as well as the reliability of using high-quality macrophotographs of postoperative scars. Patients were from a private practice dermatology clinic, with assessed scars representing a range of surgical procedures, including those performed by dermatologists, plastic surgeons, and general surgeons. Assessments were performed by an international multidisciplinary team from dermatology, plastic surgery, surgical oncology, emergency medicine, and physiatry, using photographs and live patient assessments. A single photograph was assessed for each patient’s scar. Data were obtained between August 3, 2015, and January 18, 2016. Data analysis occurred between January 18, 2016, and July 29, 2016. Using the intraclass correlation coefficient (ICC), the scale was tested for photographic equivalency as well as interrater and intrarater reliability by 5 raters on a set of 80 patient scars, 20 of which were analyzed for photographic equivalency and the remaining 60 for interrater and intrarater reliability. Main Outcomes and Measures The SCAR scale, which measures postoperative scar cosmesis with scores ranging from 0 (best possible scar) to 15 (worst possible scar) based on 6 clinician items and 2 patient items, was used.
Of those 60 in the photographic subgroup, 10 were rated using not only the SCAR scale but also the Patient and Observer Scar Assessment Scale and the Vancouver Scar Scale, and 10 were assessed twice by the same rater at different times to assess intrarater reliability. Results Patients’ ages ranged from 18 to 96 years, with Fitzpatrick skin types I through VI. Thirty-seven were male, and 43 were female. A set of 20 live patient scars with associated photographs, as well as a separate set of 60 photographs, were rated; 10 patients were assessed twice for intrarater reliability. The SCAR scale ratings using photographs were found to be largely equivalent to live patient assessments, with ICCs of 0.99 (95% CI, 0.96-0.99) and 0.98 (95% CI, 0.96-0.99). The interrater reliability of the overall scale showed an ICC of 0.95 (95% CI, 0.96-0.99) using a 2-sample random-effects model. Intrarater ICCs ranged from 0.96 to 0.99 across 5 separate raters. Modeling the overall SCAR score predicted whether the rater would consider the scar undesirable, with an odds ratio of association of 1.76 (95% CI, 1.24-2.2). A secondary analysis of Fitzpatrick skin types IV, V, and VI demonstrated sustained interrater reliability, with an ICC of 0.93 (95% CI, 0.86-0.98). Conclusions and Relevance The SCAR scale is a reliable rating scale for postoperative linear scars, and photographs may reliably be used in lieu of live patient assessments. The SCAR scale therefore represents a reliable standard rating scale for postoperative scar cosmesis.

49 citations


Journal Article
TL;DR: The evaluation of shoulder proprioception is most reliable when using a passive protocol with an isokinetic dynamometer for internal rotation at 90° of shoulder abduction.

49 citations


Journal Article
TL;DR: Three clinical tests were identified as having adequate interrater reliability, and no conclusions could be drawn about intrarater reliability. Further research should use better study designs, provide overall agreement on the uniformity and interpretation of clinical tests, and investigate validity.

43 citations


Journal Article
TL;DR: The findings suggest that the Knosp grading scale has acceptable interrater reliability overall, but raise important questions about the "very weak" reliability of the scale's middle grades.
Abstract: OBJECTIVE The goal of this study was to determine the interrater and intrarater reliability of the Knosp grading scale for predicting pituitary adenoma cavernous sinus (CS) involvement. METHODS Six independent raters (3 neurosurgery residents, 2 pituitary surgeons, and 1 neuroradiologist) participated in the study. Each rater scored 50 unique pituitary MRI scans (with contrast) of biopsy-proven pituitary adenoma. Reliabilities for the full scale were determined 3 ways: 1) using all 50 scans, 2) using scans with midrange scores versus end scores, and 3) using a dichotomized scale that reflects common clinical practice. The performance of resident raters was compared with that of faculty raters to assess the influence of training level on reliability. RESULTS Overall, the interrater reliability of the Knosp scale was "strong" (0.73, 95% CI 0.56-0.84). However, the percent agreement for all 6 reviewers was only 10% (26% for faculty members, 30% for residents). The reliability of the middle scores (i.e., average rated Knosp Grades 1 and 2) was "very weak" (0.18, 95% CI -0.27 to 0.56) and the percent agreement for all reviewers was only 5%. When the scale was dichotomized into tumors unlikely to have intraoperative CS involvement (Grades 0, 1, and 2) and those likely to have CS involvement (Grades 3 and 4), the reliability was "strong" (0.60, 95% CI 0.39-0.75) and the percent agreement for all raters improved to 60%. There was no significant difference in reliability between residents and faculty (residents 0.72, 95% CI 0.55-0.83 vs faculty 0.73, 95% CI 0.56-0.84). Intrarater reliability was moderate to strong and increased with the level of experience. CONCLUSIONS Although these findings suggest that the Knosp grading scale has acceptable interrater reliability overall, they raise important questions about the "very weak" reliability of the scale's middle grades.
By dichotomizing the scale into clinically useful groups, the authors were able to address the poor reliability and percent agreement of the intermediate grades and to isolate the most important grades for use in surgical decision making (Grades 3 and 4). Authors of future pituitary surgery studies should consider reporting Knosp grades as dichotomized results rather than as the full scale to optimize the reliability of the scale.
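The Knosp study reports percent agreement across all raters on both the full 0 to 4 scale and a dichotomized version (Grades 0 to 2 versus Grades 3 and 4). A minimal sketch, assuming "percent agreement for all reviewers" means the share of scans on which every rater assigned the same grade (the abstract does not define it); the function names and the ratings are hypothetical.

```python
def dichotomize_knosp(grade):
    """Collapse a Knosp grade (0-4) into the two groups used in the abstract:
    0 = cavernous sinus involvement unlikely (Grades 0, 1, 2),
    1 = involvement likely (Grades 3 and 4)."""
    if not 0 <= grade <= 4:
        raise ValueError(f"Knosp grade out of range: {grade}")
    return 1 if grade >= 3 else 0


def unanimous_agreement(cases):
    """Percentage of cases on which all raters assigned the same category.

    cases: one list per scan, holding each rater's grade for that scan.
    """
    unanimous = sum(1 for grades in cases if len(set(grades)) == 1)
    return 100.0 * unanimous / len(cases)


# Made-up grades from 3 raters on 4 scans (not study data).
full = [[1, 2, 2], [3, 4, 3], [0, 0, 0], [2, 1, 2]]
binary = [[dichotomize_knosp(g) for g in grades] for grades in full]
print(unanimous_agreement(full))    # 25.0
print(unanimous_agreement(binary))  # 100.0
```

Disagreements between adjacent middle grades (1 versus 2, or 3 versus 4) disappear once the scale is dichotomized, which is why agreement rises even though no rater changed a reading.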

Journal Article
TL;DR: The recently proposed AOSpine classification has better reliability for identifying fracture morphology than the existing Thoracolumbar Injury Classification and Severity Score (TLICS).
Abstract: The aim of this multicentre study was to determine whether the recently introduced AOSpine Classification and Injury Severity System has better interrater and intrarater reliability than the existing Thoracolumbar Injury Classification and Severity Score (TLICS) for thoracolumbar spine injuries. Clinical and radiological data of 50 consecutive patients admitted to a single centre with a diagnosis of acute traumatic thoracolumbar spine injury were distributed as a PowerPoint presentation to eleven attending spine surgeons from six institutions, who classified each case according to both systems. After 6 weeks, the cases were randomly rearranged and sent again to the same surgeons for reclassification. Interobserver and intraobserver reliability for each component of TLICS and the new AOSpine classification were evaluated using the Fleiss kappa coefficient (k value) and Spearman rank order correlation. Moderate interrater and intrarater reliability was seen for grading fracture type and integrity of the posterior ligamentous complex (fracture type: k = 0.43 ± 0.01 and 0.59 ± 0.16, respectively; PLC: k = 0.47 ± 0.01 and 0.55 ± 0.15, respectively), and fair to moderate reliability (k = 0.29 ± 0.01 interobserver and 0.44 ± 0.10 intraobserver) for the TLICS total score. Moderate interrater (k = 0.59 ± 0.01) and substantial intrarater (k = 0.68 ± 0.13) reliability were seen for grading fracture type regardless of subtype according to the AOSpine classification. Near-perfect interrater and intrarater agreement was seen for neurological status with both classification systems. The recently proposed AOSpine classification has better reliability for identifying fracture morphology than the existing TLICS.
Additional studies are clearly needed on the application of these classification systems across multiple physicians at different levels of training and at different trauma centers, to evaluate not only their reliability and reproducibility but also their other attributes, especially the clinical significance of a good classification system.
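The AOSpine/TLICS comparison uses Fleiss' kappa to measure agreement among the eleven surgeons. A minimal sketch of the standard Fleiss formula follows, assuming every case is classified by the same number of raters; `fleiss_kappa` is a hypothetical helper, not the study's software. The usage reproduces a widely used textbook example (10 subjects, 14 raters, 5 categories) rather than study data.

```python
from collections import Counter

def fleiss_kappa(ratings):
    """Fleiss' kappa for N cases, each classified by the same number n of raters.

    ratings: one list per case holding the category each rater assigned.
    """
    n = len(ratings[0])
    if n < 2 or any(len(case) != n for case in ratings):
        raise ValueError("each case needs the same number (>= 2) of ratings")
    n_cases = len(ratings)
    counts = [Counter(case) for case in ratings]
    categories = set().union(*counts)
    # p_j: share of all assignments that fall in category j
    p = {c: sum(cnt[c] for cnt in counts) / (n_cases * n) for c in categories}
    # P_i: proportion of agreeing rater pairs for case i
    p_i = [(sum(v * v for v in cnt.values()) - n) / (n * (n - 1)) for cnt in counts]
    p_bar = sum(p_i) / n_cases            # mean observed agreement
    p_e = sum(v * v for v in p.values())  # agreement expected by chance
    if p_e == 1:
        raise ValueError("kappa is undefined when every rating is the same category")
    return (p_bar - p_e) / (1 - p_e)


# Category counts per subject (10 subjects x 5 categories, 14 raters each),
# expanded into one label per rater.
table = [[0, 0, 0, 0, 14], [0, 2, 6, 4, 2], [0, 0, 3, 5, 6], [0, 3, 9, 2, 0],
         [2, 2, 8, 1, 1], [7, 7, 0, 0, 0], [3, 2, 6, 3, 0], [2, 5, 3, 2, 2],
         [6, 5, 2, 1, 0], [0, 2, 2, 3, 7]]
ratings = [[cat for cat, count in enumerate(row) for _ in range(count)] for row in table]
print(round(fleiss_kappa(ratings), 2))  # 0.21
```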

Journal Article
TL;DR: Despite the introduction of HN grading systems to standardize the interpretation and reporting of renal ultrasound in infants with HN, none have been proven superior in allowing clinicians to distinguish between "moderate" grades.

Journal Article
TL;DR: In this paper, the authors evaluated the reliability, repeatability, and accuracy of MRI-based femoral version measurements in an adolescent population and found that interrater reliability was 0.91 (95% CI, 0.86-0.95) for the CT technique compared with 0.90 (95% CI, 0.86-0.94) for the MRI technique, and intrarater reliability was 0.96 (95% CI, 0.91-0.98) for CT compared with 0.95 (95% CI, 0.90-0.97) for MRI.
Abstract: Background Femoral version measurement techniques based on magnetic resonance imaging (MRI) have been developed as an alternative to computed tomography (CT)-based methods, which involve high levels of ionizing radiation. Previous studies have not evaluated the reliability, repeatability, and accuracy of MRI-based femoral version measurements in an adolescent population. Methods Subjects who underwent MRI and CT studies for clinical suspicion of hip pain secondary to hip dysplasia or femoroacetabular impingement between 2011 and 2013 were identified. Rapid-sequence femoral version images were obtained from MRI Hip dGEMRIC and/or postarthrogram studies. Femoral version images were also obtained from bilateral noncontrast CT studies of the lower extremities. Measurements were made by 1 fellowship-trained pediatric hip preservation attending surgeon, 2 pediatric orthopaedic surgical fellows, and 1 fellowship-trained musculoskeletal radiologist on 2 separate occasions. Linear mixed models were used to estimate the reliability and repeatability of CT-based and MRI-based measurements (intraclass correlation coefficients) and the agreement (CT-MRI) between the 2 techniques. Results The mean age of the 36 subjects was 15.4 years (±4.1 y). Interrater reliability was 0.91 (95% CI, 0.86-0.95) for the CT technique compared with 0.90 (95% CI, 0.86-0.94) for the rapid-sequence MRI technique. Intrarater reliability for the CT technique was 0.96 (95% CI, 0.91-0.98) compared with 0.95 (95% CI, 0.90-0.97) for the MRI technique. The agreement between the MRI-based and CT-based techniques (bias: 1.9 degrees, limits of agreement: -11.3 to 14.9 degrees) was similar to the agreement between consecutive MRI measurements (bias: 0.4 degrees, limits of agreement: -7.8 to 8.6 degrees) as well as consecutive CT measurements (bias: 0.5 degrees, limits of agreement: -8.8 to 9.9 degrees).
Conclusions The interrater and intrarater reliability and repeatability estimates (intraclass correlation coefficient values) for both techniques were excellent (>0.90). Acquiring axial images at the pelvis and knee during MRI for the investigation of adolescents with hip pain allows reliable measurement of femoral version. Level of evidence Level II-diagnostic study.
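The femoral version study summarizes MRI-CT agreement as a bias with limits of agreement. The authors estimated these with linear mixed models that account for repeated measurements by several raters; the sketch below shows only the simpler classic Bland-Altman calculation for one set of paired measurements (bias = mean difference, limits = bias ± 1.96 SD of the differences). `bland_altman` is a hypothetical helper and the angles are made up.

```python
import statistics

def bland_altman(a, b, z=1.96):
    """Bias and 95% limits of agreement for paired measurements, using a - b.

    Returns (bias, lower limit, upper limit).
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - z * sd, bias + z * sd


# Made-up femoral version angles in degrees (not study data).
mri = [12.0, 20.5, 8.0, 15.0, 24.0]
ct = [10.0, 19.0, 9.5, 12.5, 22.0]
bias, lower, upper = bland_altman(mri, ct)
print(f"bias {bias:.1f} deg, limits of agreement {lower:.1f} to {upper:.1f} deg")
```

Because this version treats every pair as independent, it would understate the spread if applied directly to data with several raters and repeated readings per subject, which is one reason a mixed model is used in the study.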

Journal Article
TL;DR: In this article, the intrarater and interrater reliability of the Test of Gross Motor Development, 3rd Edition (TGMD-3) was examined in children aged 3 to 9 years.
Abstract: This study examined the intrarater and interrater reliability of the Test of Gross Motor Development, 3rd Edition (TGMD-3). Participants were 60 Finnish children aged between 3 and 9 years, divided into three separate samples of 20. Two samples of 20 were used to examine the intrarater reliability of two different assessors, and the third sample of 20 was used to establish interrater reliability. Children's TGMD-3 performances were video-recorded and later scored, and reliability was analyzed using an intraclass correlation coefficient, a kappa statistic, and a percent agreement calculation. The intrarater reliability of the locomotor subtest, ball skills subtest, and gross motor total score ranged from 0.69 to 0.77, and percent agreement ranged from 87 to 91%. The interrater reliability of the locomotor subtest, ball skills subtest, and gross motor total score ranged from 0.56 to 0.64. Percent agreement of 83% was observed for locomotor skills, ball skills, and total skills, respectively. Hop, horizontal jump, and two-hand strike as...


Journal Article
TL;DR: A lack of high-quality studies evaluating inter-rater as well as intrarater reliability of specific physical examination tests for the diagnosis of shoulder pathologies was identified, hindering proper cross-study comparisons.
Abstract: Background Shoulder pain in the general population is common, and to identify the aetiology of shoulder pain, history, motion and muscle testing, and physical examination tests are usually performed. Objective The aim of this systematic review was to summarise and evaluate intrarater and inter-rater reliability of physical examination tests in the diagnosis of shoulder pathologies. Methods A comprehensive systematic literature search was conducted using MEDLINE, EMBASE, Allied and Complementary Medicine Database (AMED) and Physiotherapy Evidence Database (PEDro) through 20 March 2015. Methodological quality was assessed using the Quality Appraisal of Reliability Studies (QAREL) tool by 2 independent reviewers. Results The search strategy revealed 3259 articles, of which 18 finally met the inclusion criteria. These studies evaluated the reliability of 62 tests and test variations used as specific physical examination tests for the diagnosis of shoulder pathologies. Methodological quality ranged from 2 to 7 positive criteria of the 11 items of the QAREL tool. Conclusions This review identified a lack of high-quality studies evaluating inter-rater as well as intrarater reliability of specific physical examination tests for the diagnosis of shoulder pathologies. In addition, reliability measures differed between included studies, hindering proper cross-study comparisons. Trial registration number PROSPERO CRD42014009018.

Journal Article
TL;DR: The test for shoulder muscle strength using elastic resistance bands has excellent validity and reliability but produces systematically lower torque values than MVC, possibly because the elastic band test has an initial concentric phase and is performed bilaterally while standing upright.
Abstract: Valid and reliable measurements of muscle strength are important in sports medicine. This study assesses concurrent validity and intrarater reliability (test-retest reliability) of elastic resistance bands for measuring shoulder muscle strength. Altogether, 50 healthy adults [mean age 36.0 (SD: 11.6), 29 women and 21 men] participated in testing and retesting 1-2 weeks later. The maximal elastic resistance (TheraBand) that each participant could hold for 3 s during standing bilateral shoulder abduction to 90° was converted into torque and validated against gold-standard maximal voluntary isometric contraction (MVC) (Vishay force transducer) performed unilaterally while lying supine. The intrarater reliability of both tests was high; for the MVC and elastic band test, respectively, ICC(3,1) was 0.98 (95% CI: 0.97-0.99) and 0.99 (95% CI: 0.98-1.00), and measurement error was 4.8% (95% CI: 3.7-5.9) and 4.7% (95% CI: 3.1-6.2). For concurrent validity, ICC(3,1) was 0.96 (95% CI: 0.95-0.98) and measurement error was 8.1% (95% CI: 6.6-9.6), and the elastic band test explained 93% of the variance in the MVC test. However, the elastic band test produced systematically lower torque values than the MVC [56.5 (SD: 26.8) vs 66.5 (SD: 25.5) Nm, P < 0.01]. In conclusion, the test for shoulder muscle strength using elastic resistance bands has excellent validity and reliability, but produces systematically lower torque values than MVC. The reason for the lower torque values may be that the elastic band test has an initial concentric phase and is performed bilaterally while standing upright.

Journal Article
TL;DR: The JTECH wireless inclinometer and dynamometer show moderate to excellent concurrent validity, and with their ease of use, portability, and ability to record multiple measurements without stopping, these devices can be applied in clinical and research settings.
Abstract: Kubas, C, Chen, Y-W, Echeverri, S, McCann, S, Denhoed, M, Walker, C, Kennedy, C, and Reid, WD. Reliability and validity of cervical range of motion and muscle strength testing. J Strength Cond Res 31(4): 1087-1096, 2017. Cervical range of motion (ROM) and strength are fundamental measures to assess treatment effectiveness. The JTECH wireless devices provide versatile means of quantifying these measurements. The purpose of this study was to determine intrarater and interrater reliabilities and concurrent validity of the JTECH wireless dual inclinometer and handheld dynamometer. This study included 20 healthy subjects (mean age = 28.7 ± 7.8 years). The directions of ROM movement measured were cervical flexion, extension, lateral flexion, and rotation. Isometric strength was measured for flexion, extension, and lateral flexion. Two testers measured cervical ROM and isometric strength for each subject using the JTECH devices during 2 or 3 sessions to determine reliability. The same ROM and muscle strength movements were measured using the CROM3 and MicroFET2, respectively, to assess concurrent validity. Reliability and validity were analyzed using intraclass correlation coefficient (ICC), along with SEM and minimal detectable change. The results of this study showed that the intrarater reliability of the JTECH inclinometer and dynamometer was moderate to excellent (ICC(3,1) = 0.53-0.90 and 0.74-0.91, respectively). The interrater reliability of the JTECH inclinometer was moderate to excellent (ICC(2,3) = 0.69-0.89), whereas the JTECH dynamometer showed excellent interrater reliability (ICC(2,3) = 0.84-0.88). The JTECH inclinometer and dynamometer showed moderate to excellent concurrent validity (ICC(3,2) = 0.65-0.91 and 0.91-0.96, respectively). With their ease of use, portability, and ability to record multiple measurements without stopping, these devices can be applied in clinical and research settings.
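The JTECH study reports SEM and minimal detectable change (MDC) alongside its ICCs. A minimal sketch of the conventional formulas, SEM = SD × √(1 − ICC) and MDC95 = 1.96 × √2 × SEM; these are common conventions assumed here, since the abstract does not give the study's exact formulas. The function names are hypothetical and the numbers in the usage lines are illustrative only.

```python
import math

def sem_from_icc(sd, icc):
    """Standard error of measurement from the between-subject SD and an ICC."""
    if not 0 <= icc <= 1:
        raise ValueError("ICC must lie in [0, 1] for this formula")
    return sd * math.sqrt(1 - icc)


def mdc95(sem):
    """Minimal detectable change at 95% confidence for a test-retest difference.

    The sqrt(2) accounts for measurement error in both the test and the retest.
    """
    return 1.96 * math.sqrt(2) * sem


# Illustrative numbers (not study data): SD of 10 degrees, ICC of 0.91.
s = sem_from_icc(10.0, 0.91)
print(f"SEM = {s:.1f} deg, MDC95 = {mdc95(s):.1f} deg")  # SEM = 3.0 deg, MDC95 = 8.3 deg
```

Under these assumptions, a change in cervical ROM smaller than the MDC95 cannot be distinguished from measurement error for an individual patient.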

Book
06 Jun 2017
TL;DR: Book file related to Probabilistic physics of failure approach to reliability: modeling, accelerated testing, prognosis and reliability assessment (Performability Engineering Series); Reliability evaluation of engineering systems: concepts and techniques.

Journal Article
TL;DR: Grading of TCVI with CTA using the Biffl Scale is reliable; intrarater reliability was perfect for 2 raters and near perfect for the remaining 5 raters.
Abstract: OBJECTIVE Blunt traumatic cerebrovascular injury (TCVI) represents structural injury to a vessel due to high-energy trauma. The Biffl Scale is a widely accepted grading scheme for these injuries that was developed using digital subtraction angiography. In recent years, screening CT angiography (CTA) has been used to identify patients with TCVI. The reliability of this scale, with injuries assessed using CTA, has not yet been determined. METHODS Seven independent raters, including 2 neurosurgeons, 2 neuroradiologists, 2 neurosurgical residents, and 1 neurosurgical vascular fellow, independently reviewed each presenting CTA of the neck performed in 40 patients with confirmed TCVI and assigned a Biffl grade. Ten images were repeated to assess intrarater reliability, for a total of 50 CTAs. Fleiss' multirater kappa (κ) and the intraclass correlation were calculated as measures of interrater reliability. Weighted Cohen's κ was used to assess intrarater reliability. RESULTS Fleiss' multirater κ was 0.65 (95% CI 0.61-0.69), indicating substantial agreement as to the Biffl grade assignment among the 7 raters. The intraclass correlation was 0.82, demonstrating excellent agreement among the raters. Intrarater reliability was perfect (weighted Cohen's κ = 1) for 2 raters and near perfect (weighted Cohen's κ > 0.8) for the remaining 5 raters. CONCLUSIONS Grading of TCVI with CTA using the Biffl Scale is reliable.
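The Biffl study uses weighted Cohen's kappa for intrarater reliability, with each rater reading 10 CTAs twice. The abstract does not say which weights were used; the sketch below assumes linear disagreement weights by default, with quadratic weights as an option. The helper name and the example grades are hypothetical.

```python
def weighted_cohen_kappa(first, second, categories, weights="linear"):
    """Weighted Cohen's kappa for two sets of ordinal ratings of the same items.

    categories: the possible grades in order (e.g. Biffl grades I to V).
    weights: "linear" or "quadratic" disagreement weights.
    """
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    if len(first) != len(second) or len(categories) < 2:
        raise ValueError("need paired ratings and at least two categories")
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    n = len(first)
    # observed joint proportions: rows = first reading, columns = second reading
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(first, second):
        obs[idx[a]][idx[b]] += 1 / n
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    def w(i, j):
        # disagreement weight: 0 on the diagonal, 1 for the most distant grades
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    observed = sum(w(i, j) * obs[i][j] for i in range(k) for j in range(k))
    expected = sum(w(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if expected == 0:
        raise ValueError("kappa is undefined when chance disagreement is zero")
    return 1 - observed / expected


# Hypothetical first and second readings of 10 CTAs by one rater (Biffl grades).
grades = ["I", "II", "III", "IV", "V"]
read1 = ["I", "II", "II", "III", "I", "IV", "V", "II", "I", "III"]
read2 = ["I", "II", "III", "III", "I", "IV", "V", "II", "II", "III"]
print(round(weighted_cohen_kappa(read1, read2, grades), 2))
```

Weighting treats a one-grade disagreement (II versus III) as less serious than a jump from I to V, which suits an ordinal injury scale like the Biffl grades.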

Journal Article
TL;DR: An efficient procedure employing the first-order reliability method (FORM) is proposed to evaluate the reliability of engineering systems governed by multiple limit state functions that are...
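The FORM entry is truncated, but the single-limit-state method it builds on can be sketched. The block below is a minimal Hasofer-Lind/Rackwitz-Fiessler (HL-RF) iteration, assuming independent normal random variables and a finite-difference gradient; it does not reproduce the paper's procedure for systems governed by multiple limit state functions, and `form_hlrf` is a hypothetical name.

```python
import math

def form_hlrf(g, mu, sigma, tol=1e-8, max_iter=100, h=1e-6):
    """FORM reliability index via the HL-RF iteration (independent normal inputs).

    g: limit state function of the physical variables x; failure when g(x) <= 0.
    mu, sigma: means and standard deviations of the variables in x.
    Returns (beta, probability of failure, design point in x-space).
    """
    n = len(mu)

    def to_x(u):
        # map standard normal u to physical x = mu + sigma * u
        return [m + s * ui for m, s, ui in zip(mu, sigma, u)]

    def G(u):
        return g(to_x(u))

    u = [0.0] * n
    for _ in range(max_iter):
        gu = G(u)
        grad = []
        for i in range(n):  # central-difference gradient in u-space
            up, dn = u.copy(), u.copy()
            up[i] += h
            dn[i] -= h
            grad.append((G(up) - G(dn)) / (2 * h))
        norm2 = sum(d * d for d in grad)
        if norm2 == 0:
            raise ValueError("zero gradient; HL-RF step is undefined")
        # HL-RF update: closest point on the linearized limit state surface
        scale = (sum(d * ui for d, ui in zip(grad, u)) - gu) / norm2
        u_new = [scale * d for d in grad]
        converged = math.dist(u_new, u) < tol
        u = u_new
        if converged:
            break
    beta = math.sqrt(sum(ui * ui for ui in u))
    if G([0.0] * n) < 0:  # mean point already fails, so beta is negative
        beta = -beta
    pf = 0.5 * math.erfc(beta / math.sqrt(2))  # Phi(-beta)
    return beta, pf, to_x(u)


# Resistance R ~ N(200, 20), load S ~ N(100, 15), limit state g = R - S.
beta, pf, x_star = form_hlrf(lambda x: x[0] - x[1], [200.0, 100.0], [20.0, 15.0])
print(f"beta = {beta:.2f}, Pf = {pf:.2e}")  # beta = 4.00, Pf = 3.17e-05
```

For this linear, all-normal case FORM is exact: β = (200 − 100)/√(20² + 15²) = 4, and the design point is R = S = 136. For nonlinear limit states the iteration gives a first-order approximation.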

Journal Article
TL;DR: Overall interrater reliability of both Hardy subscales on MRI was strong; however, reliability of the intermediate scores was weak, and percent agreement among raters was poor using the full scales.
Abstract: Objectives The Hardy classification is used to classify pituitary tumors for clinical and research purposes. The scale was developed using lateral skull radiographs and encephalograms, and its reliability has not been evaluated in the magnetic resonance imaging (MRI) era. Design Fifty preoperative MRI scans of biopsy-proven pituitary adenomas were reviewed using the sellar invasion and suprasellar extension components of the Hardy scale. Setting This was a single-institution cohort study. Participants There were six independent raters. Main Outcome Measures The main outcome measures were interrater reliability, intrarater reliability, and percent agreement. Results Overall interrater reliability of both Hardy subscales on MRI was strong. However, reliability of the intermediate scores was weak, and percent agreement among raters was poor (12–16%) using the full scales. Dichotomizing the scale into clinically useful groups maintained strong interrater reliability for the sellar invasion scale and increased the percent agreement for both scales. Conclusion This study raises important questions about the reliability of the original Hardy classification. Editing the measure to a clinically relevant dichotomous scale simplifies the rating process and may be useful for preoperative tumor characterization in the MRI era. Future research studies should use the dichotomized Hardy scale (sellar invasion Grades 0–III versus Grade IV, suprasellar extension Types 0–C versus Type D).


Journal Article
TL;DR: The FT test shows good intrarater reliability and validity as a screening measure for physical disability, frailty, and functional mobility, and may be useful in assessing readiness for independent living.
Abstract: BACKGROUND AND PURPOSE The ability to get up from the floor after a fall is a basic skill required for functional independence. Consequently, the inability to safely get down to and up from the floor or to perform a floor transfer (FT) may indicate decreased mobility and/or increased frailty. A reliable and valid test of FT ability is a critical part of the clinical decision-making process. The FT test is a simple, performance-based test that can be administered quickly and easily to determine a patient's ability to safely and successfully get down to and up from the floor using any movement strategy and without time restriction. The primary purpose of this cross-sectional study was to determine the intrarater reliability and validity of the FT test as a practical alternative to several widely used yet time-consuming measures of physical disability, frailty, and functional mobility. METHODS A total of 61 community-dwelling older adults (65-96 years of age) participated in the study, divided into 2 separate subsamples: intrarater reliability was studied with 15 participants, while concurrent validity was studied with the remaining 46 participants. In both subsamples, participants were stratified by self-reported level of FT ability as independent, assisted, or dependent. Intrarater reliability was assessed on 2 separate occasions, and scores were analyzed with intraclass correlation coefficients and κ statistics. Concurrent validity of the FT test was assessed against the self-reported FT ability questionnaire, Physical Functioning Scale, Phenotype of Physical Frailty, and the Short Physical Performance Battery. Known-groups validity was tested by determining whether the FT test distinguished between (1) community-dwelling older adults with physical disabilities versus those without physical disabilities; and (2) community-dwelling older adults who were functionally dependent versus those who were independent.
Participants were also categorized on the basis of FT test outcome as independent, assisted, or dependent. Spearman correlation coefficients were calculated to examine the strength of the relationships between the FT test and the physical status measures. The Kruskal-Wallis test was used to determine whether the FT test significantly discriminated between groups as categorized by the Physical Functioning Scale and Short Physical Performance Battery, and to test for differences in sociodemographic data across the 3 FT test outcome groups. RESULTS The intrarater reliabilities of the measures were good (0.73-1.00). There were strong, statistically significant positive correlations between the FT test and all physical status measures (ρ ranged from 0.86 to 0.93, P < .001). Older adults who passed the FT test were collectively categorized as those without physical disabilities and functionally independent, whereas older adults who failed the FT test were categorized as those with physical disabilities and functionally dependent (P < .001). CONCLUSION The FT test is a reliable and valid measure for screening for physical disability, frailty, and functional mobility. It can identify which older adults have physical disabilities and/or functional dependence and hence may be useful in assessing readiness for independent living. Inclusion of the FT test at initial evaluation may reveal the presence of these conditions and help address the safety of older adults in the community.
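The abstract reports κ statistics for intrarater reliability of the categorical FT outcome but does not say whether κ was weighted. The sketch below assumes unweighted Cohen's κ. It compares the observed agreement between the two occasions with the agreement expected by chance from each occasion's category proportions. The ratings for 15 participants and the function name are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(first: Sequence[Hashable], second: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for paired categorical ratings."""
    if not first or len(first) != len(second):
        raise ValueError("ratings must be non-empty and paired")
    n = len(first)
    observed = sum(a == b for a, b in zip(first, second)) / n
    c1, c2 = Counter(first), Counter(second)
    # Chance agreement: sum over categories of the product of marginal proportions.
    expected = sum(c1[cat] * c2[cat] for cat in c1) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both occasions use a single category")
    return (observed - expected) / (1.0 - expected)


# Hypothetical FT test outcomes for 15 participants on two occasions;
# one participant moves from "assisted" to "independent" on the second occasion.
occasion_1 = ["independent"] * 6 + ["assisted"] * 5 + ["dependent"] * 4
occasion_2 = ["independent"] * 6 + ["assisted"] * 4 + ["independent"] + ["dependent"] * 4
kappa = cohens_kappa(occasion_1, occasion_2)
print(f"kappa = {kappa:.3f}")  # kappa = 0.898
```

The three categories are ordered (independent, assisted, dependent). A weighted κ, which gives partial credit for disagreements between adjacent categories, would be a reasonable alternative.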

Journal Article
TL;DR: Work and maximum height demonstrated acceptable reliability and agreement that was at least equivalent to the traditional repetitions measure.

Journal Article
TL;DR: While raters experienced with the TGMD-2 can produce consistent scores for TGMD-3 total scale and subscales, additional training is needed to improve skill-specific reliability.
Abstract: The purpose of this study was to examine the inter- and intrarater reliabilities of the Test of Gross Motor Development-third edition (TGMD-3). The TGMD-3 was administered to 10 typically developing children. Five raters with experience using the Test of Gross Motor Development-second edition (TGMD-2) scored the digitally recorded performances and then rescored the same performances after a period of 2 weeks. Intraclass correlation coefficients (ICCs) were used to examine both inter- and intrarater reliabilities of scores. Interrater reliability was excellent for the total score, locomotor subscale, and ball skills subscale (ICC: 0.92-0.96), while individual skills (ICC: 0.51-0.93) had fair-to-excellent reliability. Intrarater reliability across all raters was also excellent (ICC: 0.77-0.98) but varied widely for individual raters (ICC: 0.28-1.00), including multiple examples of poor reliability. While raters experienced with the TGMD-2 can produce consistent scores for TGMD-3 total scale and subscales, additional training is needed to improve skill-specific reliability.
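The abstract does not specify which ICC form was used. The sketch below assumes ICC(2,1), the two-way random-effects, absolute-agreement, single-rater coefficient in Shrout and Fleiss's notation. It is computed from a two-way ANOVA on a subjects × raters table. The function name and the toy scores are illustrative only.

```python
from typing import Sequence


def icc_2_1(scores: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores[i][j] is rater j's score for subject i (complete table, no missing cells).
    """
    n = len(scores)  # subjects (rows)
    k = len(scores[0]) if scores else 0  # raters (columns)
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete table with at least 2 subjects and 2 raters")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares: total, subjects, raters, residual.
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Toy data: 6 subjects scored by 4 raters.
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(ratings), 2))  # 0.29
```

For intrarater reliability, the same layout applies with each column holding one rating session instead of one rater. The low value here arises even though the raters rank subjects similarly. It reflects large systematic differences between rater means, which the absolute-agreement form penalizes.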

Journal Article
TL;DR: MyotonPRO demonstrated acceptable reliability when used in a ward setting in patients with acute stroke; however, results should be interpreted with caution because of the study's limitations and the varying level of consistency observed between different muscles.
Abstract: A myotonometer can objectively quantify changes in muscle tone. The between-days intra-rater reliability of such a device in a ward setting for the acute stroke population remains unknown. This study aimed to investigate the device's between-days intra-rater reliability when used in a ward setting for acute stroke participants. Muscle tone of biceps brachii, brachioradialis, rectus femoris, and tibialis anterior was recorded at the bedside on the ward by one physiotherapist on two consecutive days. This study included participants who were within 1 month of their first stroke occurrence. Participants who were medically unstable or who suffered from brain stem injury were excluded. Reliability was assessed by the intraclass correlation coefficient (ICC), standard error of measurement (SEM), smallest real difference (SRD), and the Bland-Altman limits of agreement. The results indicated excellent between-days intra-rater reliability (ICC > 0.75). The SEM and SRD showed small differences between measurements. The Bland-Altman analysis indicated a tendency toward overestimation for the rectus femoris. MyotonPRO demonstrated acceptable reliability when used in a ward setting in patients with acute stroke. However, results should be interpreted with caution because of the study's limitations and the varying level of consistency observed between different muscles.
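The abstract names the SEM, SRD, and Bland-Altman limits of agreement without giving formulas. The sketch below uses common textbook definitions, assumed rather than taken from the study:

* SEM = SD × √(1 − ICC), using the SD of the day-1 scores here (a pooled SD across both days is also common).
* SRD = 1.96 × √2 × SEM.
* Limits of agreement = mean difference ± 1.96 × SD of the between-day differences.

The readings, the ICC of 0.90, and the function names are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def sem_from_icc(scores: Sequence[float], icc: float) -> float:
    """Standard error of measurement: SD of the scores times sqrt(1 - ICC)."""
    return statistics.stdev(scores) * math.sqrt(1.0 - icc)


def smallest_real_difference(sem: float) -> float:
    """95% smallest real difference: 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2.0) * sem


def bland_altman(day1: Sequence[float], day2: Sequence[float]) -> Tuple[float, float, float]:
    """Return (mean difference, lower limit, upper limit) for day2 - day1."""
    if len(day1) != len(day2) or len(day1) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [b - a for a, b in zip(day1, day2)]
    bias = statistics.mean(diffs)
    spread = 1.96 * statistics.stdev(diffs)
    return bias, bias - spread, bias + spread


# Hypothetical tone readings for four participants on two consecutive days.
day1 = [10.0, 12.0, 14.0, 16.0]
day2 = [11.0, 12.0, 15.0, 18.0]
sem = sem_from_icc(day1, icc=0.90)  # ICC value assumed for illustration
srd = smallest_real_difference(sem)
bias, lower, upper = bland_altman(day1, day2)
print(f"SEM={sem:.2f} SRD={srd:.2f} bias={bias:.2f} LoA=({lower:.2f}, {upper:.2f})")
```

A consistently positive mean difference is a systematic shift between days, such as the overestimation reported for rectus femoris. It shows up as bias in the Bland-Altman output even when the ICC is high.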

Journal Article
TL;DR: The use of expensive devices to measure ACROM in adults with nonspecific neck pain does not seem to improve the reliability of the assessment, and significant influences were identified with intrarater reliability of inexpensive devices.

Journal Article
TL;DR: Preoperative determination of the area using MRI is repeatable and enables planning of graft choice and size to optimally cover the tibial insertion site.
Abstract: Purpose: To determine the distribution of different sizes of the area of the tibial insertion site among the population and to evaluate whether preoperative MRI measurements correlate with intraoperative findings, enabling preoperative planning of the graft size required to cover the tibial insertion site sufficiently. The hypothesis was that the area of the tibial insertion site varies among individuals and that there is good agreement between MRI and intraoperative measurements. Methods: Intraoperative measurements of the tibial insertion site were taken on 117 patients. Three measurements were taken in each plane, forming a grid to cover the tibial insertion site as closely as possible. The mean of the three measurements in each plane was used to determine the area. Two orthopaedic surgeons, who were blinded to the intraoperative measurements, took magnetic resonance imaging (MRI) measurements of the area of the tibial insertion site at two different time points. Results: The mean intraoperatively measured area was 123.8 ± 21.5 mm². The mean area determined using MRI was 132.8 ± 15.7 mm² (rater 1) and 136.7 ± 15.4 mm² (rater 2). The size of the area was approximately normally distributed. Inter-rater reliability (0.89; 95% CI 0.84, 0.92; p < 0.001) and intrarater reliability (rater 1: 0.97; 95% CI 0.95, 0.98; p < 0.001; rater 2: 0.95; 95% CI 0.92, 0.96; p < 0.001) were excellent. There was good agreement between MRI and intraoperative measurement of tibial insertion site area (ICCs rater 1: 0.80; 95% CI 0.71, 0.87; p < 0.001; rater 2: 0.87; 95% CI 0.81, 0.91; p < 0.001). Conclusion: The tibial insertion site varies in size and shape. Preoperative determination of the area using MRI is repeatable and enables planning of graft choice and size to optimally cover the tibial insertion site. Level of evidence: III.