
Showing papers on "Intra-rater reliability published in 2018"


Journal ArticleDOI
TL;DR: Inter- and intra-rater agreement for Modified Ashworth Scale scores was satisfactory, and several characteristics of the studies were statistically associated with inter-rater reliability of the scores for lower and upper extremities.
Abstract: Introduction The Modified Ashworth Scale is the most widely used clinical scale for measuring increases in muscle tone. Reliability is not an immutable property of a scale and can vary as a function of the variability and composition of the sample to which it is administered. The best way to examine how the reliability of a test's scores varies is to conduct a systematic review and meta-analysis of the reliability coefficients obtained in different applications of the test with the data at hand. This systematic review addresses two questions: what is the mean inter- and intra-rater reliability of the Modified Ashworth Scale's scores in upper and lower extremities, and which study characteristics affect the reliability of the scores on this scale? Evidence acquisition The PubMed, Embase and CINAHL databases were searched from 1987 to February 2015. Two reviewers independently selected empirical studies published in English or Spanish that applied the Modified Ashworth Scale and reported a reliability coefficient computed with the data at hand in children, adolescents or adults with spasticity. Evidence synthesis Thirty-three studies reported at least one reliability estimate for Modified Ashworth Scale scores (N.=1065 participants). For lower extremities and inter-rater agreement, the mean intraclass correlation was ICC+=0.686 (95% CI: 0.563 and 0.780) and, for kappa coefficients, κ+=0.360 (95% CI: 0.241 and 0.468); for intra-rater agreement: ICC+=0.644 (95% CI: 0.543 and 0.726) and κ+=0.488 (95% CI: 0.370 and 0.591). For upper extremities and inter-rater agreement: ICC+=0.781 (95% CI: 0.679 and 0.853) and κ+=0.625 (95% CI: 0.350 and 0.801); for intra-rater agreement: ICC+=0.748 (95% CI: 0.671 and 0.809) and κ+=0.593 (95% CI: 0.467 and 0.696). The type of design, the study focus, and the number of raters showed statistically significant relationships with the ICC for both lower and upper extremities. Conclusions Inter- and intra-rater agreement for Modified Ashworth Scale scores was satisfactory. Modified Ashworth Scale scores exhibited better reliability when measuring upper extremities than lower extremities. Several characteristics of the studies were statistically associated with inter-rater reliability of the scores for lower and upper extremities.
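
For readers who want to see how reliability coefficients from several studies can be combined, the sketch below uses a generic Fisher-z, sample-size-weighted pooling of correlations with made-up numbers; it illustrates the general idea only and is not the meta-analytic model used in the review above.

```python
import numpy as np

def pool_reliability(coefficients, sample_sizes):
    """Pool reliability coefficients (e.g., ICCs) across studies.

    Generic approach: Fisher-z transform each coefficient, weight by
    (n - 3) as in standard correlation meta-analysis, back-transform.
    """
    r = np.asarray(coefficients, dtype=float)
    n = np.asarray(sample_sizes, dtype=float)
    z = np.arctanh(r)                    # Fisher z transform
    w = n - 3.0                          # approximate inverse-variance weights
    z_mean = np.sum(w * z) / np.sum(w)
    se = 1.0 / np.sqrt(np.sum(w))        # standard error of the pooled z
    lo, hi = np.tanh(z_mean - 1.96 * se), np.tanh(z_mean + 1.96 * se)
    return np.tanh(z_mean), (lo, hi)

# Hypothetical ICCs and sample sizes from three studies
pooled, (lo, hi) = pool_reliability([0.70, 0.65, 0.72], [30, 45, 25])
print(f"pooled ICC = {pooled:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
```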

188 citations


Journal ArticleDOI
27 Oct 2018-BMJ Open
TL;DR: The majority of gross motor assessments for children have good-excellent validity, and the Bayley-III has the best predictive validity at 2 years of age for later motor outcome.
Abstract: Objective Gross motor assessment tools have a critical role in identifying, diagnosing and evaluating motor difficulties in childhood. The objective of this review was to systematically evaluate the psychometric properties and clinical utility of gross motor assessment tools for children aged 2–12 years. Method A systematic search of MEDLINE, Embase, CINAHL and AMED was performed between May and July 2017. Methodological quality was assessed with the COnsensus-based Standards for the selection of health status Measurement INstruments checklist, and an outcome measures rating form was used to evaluate reliability, validity and clinical utility of assessment tools. Results Seven assessment tools from 37 studies/manuals met the inclusion criteria: Bayley Scale of Infant and Toddler Development-III (Bayley-III), Bruininks-Oseretsky Test of Motor Proficiency-2 (BOT-2), Movement Assessment Battery for Children-2 (MABC-2), McCarron Assessment of Neuromuscular Development (MAND), Neurological Sensory Motor Developmental Assessment (NSMDA), Peabody Developmental Motor Scales-2 (PDMS-2) and Test of Gross Motor Development-2 (TGMD-2). Methodological quality varied from poor to excellent. Validity and internal consistency varied from fair to excellent (α=0.5–0.99). The Bayley-III, NSMDA and MABC-2 have evidence of predictive validity. Test–retest reliability is excellent in the BOT-2 (intraclass correlation coefficient (ICC)=0.80–0.99), PDMS-2 (ICC=0.97), MABC-2 (ICC=0.83–0.96) and TGMD-2 (ICC=0.81–0.92). TGMD-2 has the highest inter-rater (ICC=0.88–0.93) and intrarater reliability (ICC=0.92–0.99). Conclusions The majority of gross motor assessments for children have good to excellent validity. Test–retest reliability is highest in the BOT-2, MABC-2, PDMS-2 and TGMD-2. The Bayley-III has the best predictive validity at 2 years of age for later motor outcome. None of the assessment tools demonstrate good evaluative validity. Further research on evaluative gross motor assessment tools is urgently needed.

108 citations


Journal ArticleDOI
TL;DR: This study evaluated the reliability of vaginal palpation, vaginal manometry, vaginal dynamometry, and surface (transperineal) electromyography (sEMG) for evaluating pelvic floor muscle strength and/or activation, and determined the associations among PFM strength measures obtained with these assessments.
Abstract: Aims The purposes of this study were: (i) to evaluate the reliability of vaginal palpation, vaginal manometry, vaginal dynamometry, and surface (transperineal) electromyography (sEMG) when evaluating pelvic floor muscle (PFM) strength and/or activation; and (ii) to determine the associations among PFM strength measured using these assessments. Methods One hundred and fifty women with pelvic floor disorders participated on one occasion, and 20 women returned for the same investigations by two different raters on 3 different days. At each session, PFM strength was assessed using palpation (both the modified Oxford Grading Scale and the Levator ani testing), manometry, and dynamometry; and PFM activation was assessed using sEMG. Results The interrater reliability of manometry, dynamometry, and sEMG (both root-mean-square [RMS] and integral average) was high (Lin's Concordance Correlation Coefficient [CCC] = 0.95, 0.93, 0.91, 0.86, respectively), whereas the interrater reliability of both palpation grading scales was low (Cohen's Kappa [k] = 0.27-0.38). The intrarater reliability of manometry (CCC = 0.96) and dynamometry (CCC = 0.96) was high, whereas intrarater reliability of both palpation scales (k = 0.78 for both) and of sEMG (CCC = 0.79 vs 0.80 for RMS vs integral average) was moderate. The Bland-Altman plot showed good inter- and intrarater agreement, with little random variability for all instruments. The correlations among palpation, manometry, and dynamometry were moderate (coefficient of determination [r2] ranged from 0.52 to 0.75); however, transperineal sEMG amplitude was only weakly correlated with all measures of strength (r2 = 0.23-0.30). Conclusions Manometry and dynamometry are more reliable tools than vaginal palpation for the assessment of PFM strength in women with pelvic floor disorders, especially when different raters are involved. The different PFM strength measures used clinically are moderately correlated, whereas PFM activation recorded using transperineal sEMG is only weakly correlated with PFM strength. Results from perineal sEMG should not be interpreted in the context of reporting PFM strength.
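
Lin's concordance correlation coefficient (CCC), used above for manometry, dynamometry, and sEMG, can be computed directly from its definition. A minimal sketch with hypothetical paired ratings (not data from the study):

```python
import numpy as np

def lins_ccc(x, y):
    """Lin's concordance correlation coefficient between two sets of scores.

    CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2),
    using population (ddof=0) variance and covariance estimators.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    mx, my = x.mean(), y.mean()
    cov = np.mean((x - mx) * (y - my))
    return 2 * cov / (x.var() + y.var() + (mx - my) ** 2)

# Hypothetical manometry readings from two raters
rater1 = [35, 42, 28, 50, 33, 47]
rater2 = [34, 45, 27, 52, 31, 46]
print(f"CCC = {lins_ccc(rater1, rater2):.3f}")
```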

72 citations


Journal ArticleDOI
29 Mar 2018-PLOS ONE
TL;DR: The MMT8 total score is a reliable assessment of general muscle weakness in people with myositis but not of single muscle groups; the results confirm that HHD can be recommended to evaluate the strength of single muscle groups.
Abstract: Manual muscle testing (MMT) and hand-held dynamometry (HHD) are commonly used in people with inflammatory myopathy (IM), but their clinimetric properties have not yet been sufficiently studied. To evaluate the reliability and validity of MMT and HHD, maximum isometric strength was measured in eight muscle groups across three measurement events. To evaluate reliability of HHD, intra-class correlation coefficients (ICC), the standard error of measurement (SEM) and smallest detectable changes (SDC) were calculated. To measure reliability of MMT, linear Cohen's kappa was computed for single muscle groups and ICC for the total score. Additionally, correlations between MMT8 and HHD were evaluated with Spearman correlation coefficients. Fifty people with myositis (56±14 years, 76% female) were included in the study. Intra- and interrater reliability of HHD yielded excellent ICCs (0.75-0.97) for all muscle groups, except for interrater reliability of ankle extension (0.61). The corresponding SEMs% ranged from 8 to 28% and the SDCs% from 23 to 65%. MMT8 total score revealed excellent intra- and interrater reliability (ICC>0.9). Intrarater reliability of single muscle groups was substantial for shoulder and hip abduction, elbow and neck flexion, and hip extension (0.64-0.69); moderate for wrist (0.53) and knee extension (0.49); and fair for ankle extension (0.35). Interrater reliability was moderate for neck flexion (0.54) and hip abduction (0.44); fair for shoulder abduction, elbow flexion, wrist and ankle extension (0.20-0.33); and slight for knee extension (0.08). Correlations between the two tests were low for wrist, knee, ankle, and hip extension; moderate for elbow flexion, neck flexion and hip abduction; and good for shoulder abduction. In conclusion, the MMT8 total score is a reliable assessment of general muscle weakness in people with myositis but not of single muscle groups. In contrast, our results confirm that HHD can be recommended to evaluate the strength of single muscle groups.
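
The SEM and SDC figures reported above follow from the ICC and the sample standard deviation. A small sketch of the usual formulas, with hypothetical strength values and an assumed ICC:

```python
import numpy as np

def sem_sdc(scores, icc):
    """Standard error of measurement (SEM) and smallest detectable change (SDC).

    SEM = SD * sqrt(1 - ICC); SDC = 1.96 * sqrt(2) * SEM.
    Percentages are expressed relative to the sample mean, as often reported.
    """
    scores = np.asarray(scores, dtype=float)
    sd, mean = scores.std(ddof=1), scores.mean()
    sem = sd * np.sqrt(1 - icc)
    sdc = 1.96 * np.sqrt(2) * sem
    return sem, sdc, 100 * sem / mean, 100 * sdc / mean

# Hypothetical dynamometry values (Newtons) and an assumed ICC of 0.90
sem, sdc, sem_pct, sdc_pct = sem_sdc([120, 135, 98, 150, 110, 142], icc=0.90)
print(f"SEM = {sem:.1f} N ({sem_pct:.0f}%), SDC = {sdc:.1f} N ({sdc_pct:.0f}%)")
```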

57 citations


Journal ArticleDOI
TL;DR: None of the identified clinical tests could be concluded to have good intrarater reliability; further investigation should focus on better overall study methodology and the use of identical protocols for describing clinical tests.

39 citations


Journal ArticleDOI
TL;DR: It is demonstrated that using a correlation coefficient is not appropriate for assessing the interchangeability of 2 such measurement methods, and an alternative approach is described: the now widely applied graphical Bland–Altman plot, which is based on a simple estimation of the mean and standard deviation of differences between measurements by the 2 methods.
Abstract: Correlation and agreement are 2 concepts that are widely applied in the medical literature and clinical practice to assess for the presence and strength of an association. However, because correlation and agreement are conceptually distinct, they require the use of different statistics. Agreement is a concept that is closely related to but fundamentally different from and often confused with correlation. The idea of agreement refers to the notion of reproducibility of clinical evaluations or biomedical measurements. The intraclass correlation coefficient is a commonly applied measure of agreement for continuous data. The intraclass correlation coefficient can be validly applied specifically to assess intrarater reliability and interrater reliability. As its name implies, the Lin concordance correlation coefficient is another measure of agreement or concordance. In undertaking a comparison of a new measurement technique with an established one, it is necessary to determine whether they agree sufficiently for the new to replace the old. Bland and Altman demonstrated that using a correlation coefficient is not appropriate for assessing the interchangeability of 2 such measurement methods. They in turn described an alternative approach, the now widely applied graphical Bland-Altman plot, which is based on a simple estimation of the mean and standard deviation of differences between measurements by the 2 methods. In reading a medical journal article that includes the interpretation of diagnostic tests and application of diagnostic criteria, attention is conventionally focused on aspects like sensitivity, specificity, predictive values, and likelihood ratios. However, if the clinicians who interpret the test cannot agree on its interpretation and the resulting (typically dichotomous or binary) diagnosis, the test results will be of little practical use. Such agreement between observers (interobserver agreement) about a dichotomous or binary variable is often reported as the kappa statistic. Assessing the interrater agreement between observers, in the case of ordinal variables and data, also has important biomedical applicability. Typically, this situation calls for use of the Cohen weighted kappa. Questionnaires, psychometric scales, and diagnostic tests are widespread and increasingly used by not only researchers but also clinicians in their daily practice. It is essential that these questionnaires, scales, and diagnostic tests have a high degree of agreement between observers. It is therefore vital that biomedical researchers and clinicians apply the appropriate statistical measures of agreement to assess the reproducibility and quality of these measurement instruments and decision-making processes.
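
The Bland-Altman approach described above reduces to computing the bias and 95% limits of agreement of the paired differences. A minimal sketch with hypothetical paired measurements:

```python
import numpy as np

def bland_altman(method_a, method_b):
    """Bias and 95% limits of agreement between two measurement methods.

    Bias = mean difference; limits of agreement = bias +/- 1.96 * SD of the
    differences. The usual plot charts differences against pairwise means.
    """
    a, b = np.asarray(method_a, dtype=float), np.asarray(method_b, dtype=float)
    diff = a - b
    bias, sd = diff.mean(), diff.std(ddof=1)
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)

# Hypothetical paired readings from an established and a new device
bias, (loa_low, loa_high) = bland_altman([100, 105, 98, 110, 102],
                                         [101, 103, 99, 113, 100])
print(f"bias = {bias:.2f}, limits of agreement ({loa_low:.2f}, {loa_high:.2f})")
```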

38 citations


Journal ArticleDOI
TL;DR: EOS showed excellent intrarater and interrater reliability for assessment of the sagittal alignment of the spine and pelvis in patients selected from the EOS recording system between November 2016 and April 2017.
Abstract: Background The sagittal alignment of the spine and pelvis is not only closely related to the overall posture of the body but also to the evaluation and treatment of spine disease. In the last few years, the EOS imaging system, a new low-dose radiation X-ray device, became available for sagittal alignment assessment. However, there has been little research on the reliability of EOS. The purpose of this study was to evaluate the intrarater and interrater reliability of EOS for the sagittal alignment assessment of the spine and pelvis. Methods Records of 46 patients were selected from the EOS recording system between November 2016 and April 2017. The exclusion criteria were congenital spinal anomaly and deformity, and previous history of spine and pelvis operation. Sagittal parameters of the spine and pelvis were measured by three examiners three times each using both manual and EOS methods. Means comparison t-test, Pearson bivariate correlation analysis, and reliability analysis by intraclass correlation coefficients (ICCs) for intrarater and interrater reliability were performed using R package "irr." Results We found excellent intrarater and interrater reliability of EOS measurements. For intrarater reliability, the ICC ranged from 0.898 to 0.982. For interrater reliability, the ICC ranged from 0.794 to 0.837. We used a paired t-test to compare the values measured by manual and EOS methods: there was no statistically significant difference between the two methods. Correlation analysis also showed a statistically significant positive correlation. Conclusions EOS showed excellent reliability for assessment of the sagittal alignment of the spine and pelvis.

29 citations


Journal ArticleDOI
TL;DR: The use of the portable heart rate monitor to measure HRV showed acceptable intra- and inter-rater reliability in individuals with type 2 diabetes mellitus, supporting the use of this method of evaluation in research and clinical practice.
Abstract: Heart rate variability (HRV), among other methods, can be used to assess diabetic cardiac autonomic neuropathy from recorded cardiac intervals; however, the amount of error associated with this measurement methodology is unclear. The aim was to evaluate the intra- and inter-rater reliability of calculated HRV indices, comparing different times and different trained examiners, in patients with type 2 diabetes mellitus (T2DM). Thirty individuals of both genders with T2DM, aged between 18 and 45 years, participated. RR intervals (RRi) were recorded over a 10 min period in the supine position using a portable heart rate monitor (Polar® S810i model). HRV indices were calculated with the Kubios® HRV analysis software (version 2.2). Linear (mean RRi; STD RR; mean HR; rMSSD; RR Tri; TINN; LF; HF; total power) and non-linear (SD1; SD2; DFα1; DFα2; ApEn; and SampEn) indices were calculated by two examiners with an interval of one week between them. Substantial to excellent intra-examiner reliability was found, with intraclass correlation coefficient (ICC) values ranging from 0.79 to 0.99, standard error of measurement (SEM) between 0.02 and 123.49 (in percentage: 1.83 and 16.67), and minimum detectable change (MDC) between 0.07 and 342.30. Regarding inter-examiner reliability, substantial to excellent reliability was also found, with ICC values ranging from 0.73 to 0.97, SEM between 0.04 and 178.13 (in percentage: 3.26 and 24.18), and MDC between 0.11 and 493.77. The use of the portable heart rate monitor to measure HRV showed acceptable intra- and inter-rater reliability in individuals with T2DM, supporting the use of this method of evaluation in research and clinical practice.
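
For context, the time-domain HRV indices named above (mean RRi, SDNN, rMSSD, mean HR) are simple functions of the RR-interval series. A rough sketch with a toy series; this is not the Kubios implementation:

```python
import numpy as np

def hrv_time_domain(rri_ms):
    """Basic time-domain HRV indices from an RR-interval series in milliseconds."""
    rri = np.asarray(rri_ms, dtype=float)
    diffs = np.diff(rri)
    return {
        "mean_rr_ms": rri.mean(),
        "sdnn_ms": rri.std(ddof=1),                  # SD of all RR intervals
        "rmssd_ms": np.sqrt(np.mean(diffs ** 2)),    # RMS of successive differences
        "mean_hr_bpm": 60000.0 / rri.mean(),
    }

# Toy RR-interval series (hypothetical, in milliseconds)
print(hrv_time_domain([812, 790, 805, 823, 798, 810, 801]))
```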

27 citations


Journal ArticleDOI
TL;DR: DrG was shown to be a valid and reliable tool for measuring knee ROM following arthroplasty; smartphone technology, in conjunction with patient-reported outcomes, offers an accurate and practical way to remotely monitor patients.
Abstract: Knee range of motion (ROM) following a knee arthroplasty is an important clinical outcome that directly relates to the patient's physical function. Smartphone technology has led to the creation of applications that can measure ROM. The aim was to determine the concurrent reliability and validity of the photo-based application 'Dr Goniometer' (DrG) compared with a universal goniometer performed by a clinician. A smartphone camera was used to take photographs of the knee in full flexion and full extension, and the images were sent by participants to a study phone. Participants then rated the ease of participation. To assess validity, the patient's knee was measured by a clinician using a goniometer. To examine reliability, four clinicians assessed each image using DrG on four separate occasions spaced 1 week apart. A total of 60 images of knee ROM for 30 unicondylar or total knee arthroplasties were assessed. The goniometer and DrG showed strong correlations for flexion (r=0.94) and extension (r=0.90). DrG showed good intrarater reliability and excellent inter-rater reliability for flexion (intraclass correlation coefficient=0.990 and 0.990) and good reliability for extension (intraclass correlation coefficient=0.897 and 0.899). All participants found the process easy. DrG was proven to be a valid and reliable tool for measuring knee ROM following arthroplasty. Smartphone technology, in conjunction with patient-reported outcomes, offers an accurate and practical way to remotely monitor patients. Benefit may be found in differentiating those who need a face-to-face clinical consultation from those who do not.

24 citations


Journal ArticleDOI
TL;DR: The objectives were to compare the interrater agreement of the WHOC 2017 with that of the WHOC 2005 and to test the intra-rater agreement of the WHOC 2005.
Abstract: OBJECTIVE The World Health Organization classification (WHOC) 2017 of low-grade versus high-grade laryngeal dysplasia recently replaced the previous WHOC 2005 of mild, moderate, and severe dysplasia and carcinoma in situ. Our objectives were to compare the interrater agreement of the WHOC 2017 with that of the WHOC 2005 and to test the intra-rater agreement of the WHOC 2005. METHODS Two expert head and neck pathologists rated 211 tissue samples that were initially diagnosed with laryngeal precursor lesions. The samples were rated twice according to the WHOC 2005 and once according to the WHOC 2017; estimates of interrater and intrarater agreements were calculated with kappa statistics. RESULTS The crude intrarater agreements using the WHOC 2005 were 0.93 for rater 1 and 0.62 for rater 2. The corresponding unweighted kappa values were 0.90 (95% confidence interval [CI], 0.86-0.95) for rater 1 and 0.43 (95% CI, 0.35-0.54) for rater 2, whereas the standard linear weighted kappa values were 0.93 (95% CI, 0.90-0.97) for rater 1 and 0.60 (95% CI, 0.53-0.69) for rater 2. The crude interrater agreement for the WHOC 2005 was 0.57, with a corresponding unweighted kappa value 0.38 (95% CI, 0.31-0.48) and a standard linear weighted kappa value 0.52 (95% CI, 0.42-0.60). The crude interrater agreement for the WHOC 2017 was 0.83, with a corresponding unweighted kappa value 0.45 (95% CI, 0.31-0.59) and a standard linear weighted kappa value 0.46 (95% CI, 0.30-0.60). CONCLUSION Our results indicate difficulties in providing reliable diagnosis of laryngeal precursor lesions, even with experienced head and neck pathologists and the application of a newly revised classification system. LEVEL OF EVIDENCE 4. Laryngoscope, 128:2375-2379, 2018.
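
The linear weighted kappa reported above penalizes disagreements in proportion to their distance on the ordinal scale. A hedged sketch of the standard formula with a made-up pair of raters (quadratic weighting is the same calculation with squared weights):

```python
import numpy as np

def weighted_kappa(r1, r2, n_categories, weighting="linear"):
    """Cohen's weighted kappa for two raters using category codes 0..k-1.

    Disagreement weights are |i-j|/(k-1) (linear) or their square (quadratic);
    kappa_w = 1 - sum(w * observed) / sum(w * expected).
    """
    r1, r2 = np.asarray(r1), np.asarray(r2)
    k, n = n_categories, len(r1)
    obs = np.zeros((k, k))
    for a, b in zip(r1, r2):
        obs[a, b] += 1
    obs /= n
    exp = np.outer(obs.sum(axis=1), obs.sum(axis=0))   # expected under independence
    i, j = np.indices((k, k))
    w = np.abs(i - j) / (k - 1)
    if weighting == "quadratic":
        w = w ** 2
    return 1 - np.sum(w * obs) / np.sum(w * exp)

# Hypothetical example: two pathologists grading 8 samples on a 3-level scale
print(round(weighted_kappa([0, 1, 2, 1, 0, 2, 1, 1],
                           [0, 1, 1, 1, 0, 2, 2, 1], n_categories=3), 3))
```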

24 citations


Journal ArticleDOI
TL;DR: The inter- and intra-rater reliability for the prevalence of positive hypermobility findings was fair to almost perfect for total scores and slight to almost perfect in single joints; following a structured protocol helps avoid assessment of GJH being based on chance.
Abstract: Comparisons across studies of generalized joint hypermobility are often difficult since there are several classification methods and methodological differences exist in how they are performed. The Beighton score is most commonly used and has been tested for inter- and intra-rater reliability. The Contompasis score and the Hospital del Mar criteria have not yet been evaluated for reliability. The aim of this study was to investigate the inter- and intra-rater reliability of measurements of range of motion in the joints included in these three hypermobility assessment methods using a structured protocol. The study was planned in accordance with guidelines for reporting reliability studies. Healthy adults were consecutively recruited (49 for inter- and 29 for intra-rater assessments). Intra-class correlations from a two-way random effects model (ICC 2.1) with 95% confidence intervals, standard error of measurement, percentage of agreement, Cohen's kappa (κ) and prevalence-adjusted bias-adjusted kappa were calculated for single joints measured in degrees and for total scores. The inter- and intra-rater reliability for total scores were ICC 2.1: 0.72–0.82 and 0.76–0.86, and for single-joint measurements in degrees 0.44–0.91 and 0.44–0.90, respectively. The difference between ratings was within 5 degrees in all but one joint. Standard error of measurement ranged from 1.0 to 6.9 degrees. For the prevalence of positive hypermobility findings, the inter- and intra-rater Cohen's κ for total scores were 0.54–0.78 and 0.27–0.78, and in single joints 0.21–1.00 and 0.19–1.00, respectively. The prevalence- and bias-adjusted Cohen's κ increased all but two values. Following a structured protocol, the inter- and intra-rater reliability was good-to-excellent for total scores and in all but two single joints measured in degrees. The inter- and intra-rater reliability for the prevalence of positive hypermobility findings was fair to almost perfect for total scores and slight to almost perfect in single joints. By using a structured protocol, we attempted to standardize the assessment of range of motion in clinical and research settings. This standardization could be helpful as a first step in standardizing the tests, thus avoiding assessment of GJH being based on chance.
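
The prevalence-adjusted bias-adjusted kappa (PABAK) mentioned above depends only on the observed agreement and the number of categories. A small sketch with hypothetical binary ratings:

```python
import numpy as np

def pabak(r1, r2, n_categories=2):
    """Prevalence-adjusted bias-adjusted kappa.

    PABAK = (k * observed_agreement - 1) / (k - 1); with binary ratings this
    reduces to 2 * observed_agreement - 1.
    """
    r1, r2 = np.asarray(r1), np.asarray(r2)
    observed_agreement = np.mean(r1 == r2)
    return (n_categories * observed_agreement - 1) / (n_categories - 1)

# Hypothetical positive (1) / negative (0) hypermobility findings from two raters
print(pabak([1, 1, 0, 1, 1, 1, 0, 1], [1, 1, 0, 1, 1, 1, 1, 1]))
```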

Journal ArticleDOI
TL;DR: This paper examines the psychometric characteristics of THINC-it, a cognitive assessment tool composed of four objective measures of cognition and a self-rated assessment, in subjects without mental disorders.
Abstract: Objectives There is a need for a brief, reliable, valid, and sensitive assessment tool for screening cognitive deficits in patients with Major Depressive Disorders. This paper examines the psychometric characteristics of THINC-it, a cognitive assessment tool composed of four objective measures of cognition and a self-rated assessment, in subjects without mental disorders. Methods N = 100 healthy controls with no current or past history of depression were tested on four sequential assessments to examine temporal stability, reliability, and convergent validity of the THINC-it tests. We examined temporal reliability across 1 week and stability via three consecutive assessments. Consistency of assessment by the study rater (intrarater reliability) was calculated using the data from the second and third of these consecutive assessments. Results Test-retest reliability correlations varied between Pearson's r = 0.75 and 0.80, and intrarater reliability between 0.70 and 0.93. Stability for the primary measure for each test yielded within-subject standard deviation values between 5.9 and 11.23 for accuracy measures and 0.735 and 17.3 seconds for latency measures. Convergent validity for three tasks was in the acceptable range, but low for the Symbol Check task. Conclusions Analysis shows high levels of reliability and stability. Levels of convergent validity were modest but acceptable in the case of all but one test.


Journal ArticleDOI
TL;DR: This study reviewed reference values for thoracic kyphosis and lumbar lordosis for radiography and photogrammetry analysis and searched for information about interrater and intrarater reliability.

Journal ArticleDOI
TL;DR: The MCD System demonstrates very good interrater and intrarater reliability following lower limb surgery in children with CP and may improve standardization of AE recording with a view to accurate audits and improved clarity in outcome studies for CP.
Abstract: BACKGROUND: The modified Clavien-Dindo (MCD) system is a reliable tool for classifying adverse events (AEs) in hip preservation surgery and has since been utilized in studies involving lower limb surgery for ambulant and nonambulant children with cerebral palsy (CP). However, the profile of AEs recorded in children with CP compared with typically developing children is different, and the reliability of the MCD in CP is unknown. This study aimed to evaluate the interrater and intrarater reliability of the MCD system for classifying AEs following lower limb surgery in children with CP. METHODS: Eighteen raters were invited to participate, including clinicians from surgical, nursing, and physical therapy professions, and individuals with CP. Following an MCD familiarization session, participants rated 40 clinical scenarios on 2 occasions, 2 weeks apart. Fleiss' κ statistics were used to calculate interrater and intrarater reliability. RESULTS: The overall Fleiss' κ value for interrater reliability in the first rating was 0.70 (95% confidence interval, 0.61-0.80), and increased to 0.75 (95% confidence interval, 0.66-0.84) in the second rating. The average Fleiss' κ value for intrarater reliability was 0.78 (range, 0.48 to 1.00). Grading of more severe AEs (MCD III to V) achieved near perfect agreement (κ, 0.87 to 1.00). There was a lower level of agreement for minor AEs (MCD I-II) (κ, 0.53 to 0.55). A κ score of 0 to 0.2 was deemed as poor, 0.21 to 0.4 as fair, 0.41 to 0.6 as good, 0.61 to 0.8 as very good, and 0.81 to 1.0 as almost perfect agreement. CONCLUSIONS: The MCD System demonstrates very good interrater and intrarater reliability following lower limb surgery in children with CP. The MCD can be used by clinicians from different health care professions with a high level of reliability. The MCD may improve standardization of AE recording with a view to accurate audits and improved clarity in outcome studies for CP. LEVEL OF EVIDENCE: Level II-diagnostic.
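
Fleiss' kappa, used above for the multi-rater scenario ratings, works from a subjects-by-categories count matrix. A minimal sketch with invented counts (every subject rated by the same number of raters):

```python
import numpy as np

def fleiss_kappa(counts):
    """Fleiss' kappa for multiple raters assigning subjects to categories.

    `counts` is an (N subjects x k categories) matrix; entry [i, j] is the
    number of raters who placed subject i in category j.
    """
    counts = np.asarray(counts, dtype=float)
    n = counts[0].sum()                          # raters per subject
    p_j = counts.sum(axis=0) / counts.sum()      # overall category proportions
    p_i = (np.sum(counts ** 2, axis=1) - n) / (n * (n - 1))
    p_bar, p_e = p_i.mean(), np.sum(p_j ** 2)
    return (p_bar - p_e) / (1 - p_e)

# Invented example: 5 clinical scenarios, 4 raters, 3 severity grades
counts = [[4, 0, 0], [2, 2, 0], [0, 3, 1], [0, 0, 4], [1, 2, 1]]
print(round(fleiss_kappa(counts), 3))
```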

Journal ArticleDOI
TL;DR: The TAI 4.0 provides reliable and valid quantitative assessment of an individual's transfer without the need for the comprehensive training required by the TAI 3.0.
Abstract: Background: Proper transfer technique is associated with improved biomechanics and decreased pain and pathology. However, many users do not use proper technique, and appropriate assessment and training are needed to address these deficits. The transfer assessment instrument (TAI) 4.0 was designed to meet those needs and improve on past versions by removing the need for clinician training, shortening administration time, and simplifying question content. Objectives: Evaluate the psychometric properties of the TAI 4.0. Methods: A convenience sample of full-time wheelchair users was scored on multiple transfers by four raters to assess interrater, intrarater, and test-retest reliability and concurrent validity of the TAI 4.0. Each user was also scored using a visual analog scale (VAS). Results: For 44 participants, the mean TAI 4.0 and VAS across all transfers were 7.58 ± 1.12 and 7.44 ± 1.78, respectively, and scores were significantly correlated (r = 0.52-0.7). VAS scores were more strongly influenced by the flight/landing and body setup phases of the transfer. There were no significant associations between TAI 4.0 score and demographics. Intraclass correlation coefficients (ICC) ranged from 0.80 to 0.85 for interrater reliability, 0.60 to 0.76 for intrarater reliability, and 0.55 to 0.76 for test-retest reliability. The minimum detectable change (MDC) for the total score ranged from 1.02 to 1.30. Conclusion: The TAI 4.0 provides reliable and valid quantitative assessment of an individual's transfer without the need for the comprehensive training required by the TAI 3.0. The tool can be completed in 3 minutes (average) in a clinical setting with only a ruler and goniometer.

Journal ArticleDOI
TL;DR: This study generated satisfactory evidence on the content validity, substantive validity, construct validity, inter- and intrarater reliability, and known-group discrimination of the OCS-P, supporting its application among poststroke patients who speak Putonghua.
Abstract: Background Oxford Cognitive Screen is designed for assessing cognitive functions of poststroke patients. This study aimed to assess the psychometric properties of the Chinese (Putonghua) version of the Oxford Cognitive Screen-Putonghua (OCS-P) for use among poststroke patients without neglect. Methods An expert review panel evaluated the content validity of the Chinese-translated items. After the translated items were pilot tested, patients and healthy participants completed the OCS-P as well as the Montreal Cognitive Assessment (MoCA-ChiB) and Goldenberg's test. A group of patients completed the OCS-P a second time within seven days. Data analyses included confirmatory factor analysis, item difficulty and item-total correlation, inter- and intrarater reliability, internal consistency, and between-group discrimination. Results One hundred patients and 120 younger (n = 60) or older (n = 60) healthy participants completed all the tests. Modifications were required for items in the "Picture Naming", "Orientation", and "Sentence Reading" subscales. Confirmatory factor analysis revealed a three-factor structure for the OCS-P subscales. The internal consistency coefficients for the three identified test dimensions were 0.30 to 0.52 (Cronbach's alpha). Construct validity coefficients between the OCS-P and MoCA-ChiB subscales were 0.45 < r < 0.79 (p < 0.001), and between the "Praxis" subscale of the OCS-P and Goldenberg's test r = 0.72 (p < 0.001). The interrater reliability coefficients for the subscales were in general higher than the intrarater reliability coefficients. The "Picture Naming" and "Numerical Cognition" subscales were the most significant (p = 0.003) for differentiating patient participants from their older healthy counterparts. Conclusion This study generated satisfactory evidence on the content validity, substantive validity, construct validity, inter- and intrarater reliability, and known-group discrimination of the OCS-P, supporting its application among poststroke patients who speak Putonghua. Future studies could review the existing five-dimension domains to improve its structural validity and internal consistency, as well as generate evidence for use of the OCS-P among poststroke patients with neglect.

Journal ArticleDOI
TL;DR: The Korean version of the Tinetti mobility test showed excellent validity and reliability for gait and balance and had high sensitivity and specificity for predicting falls among patients with Parkinson’s disease.
Abstract: Objective Postural instability and gait disturbance are the cardinal symptoms associated with falling among patients with Parkinson's disease (PD). The Tinetti mobility test (TMT) is a well-established measurement tool used to predict falls among elderly people. However, the TMT has not been established or widely used among PD patients in Korea. The purpose of this study was to evaluate the reliability and validity of the Korean version of the TMT for PD patients. Methods Twenty-four patients diagnosed with PD were enrolled in this study. For the interrater reliability test, thirteen clinicians scored the TMT after watching a video clip. We also used the test-retest method to determine intrarater reliability. For concurrent validation, the unified Parkinson's disease rating scale, Hoehn and Yahr staging, Berg Balance Scale, Timed-Up and Go test, 10-m walk test, and gait analysis by three-dimensional motion capture were also used. We analyzed receiver operating characteristic curve to predict falling. Results The interrater reliability and intrarater reliability of the Korean Tinetti balance scale were 0.97 and 0.98, respectively. The interrater reliability and intra-rater reliability of the Korean Tinetti gait scale were 0.94 and 0.96, respectively. The Korean TMT scores were significantly correlated with the other clinical scales and three-dimensional motion capture. The cutoff values for predicting falling were 14 points (balance subscale) and 10 points (gait subscale). Conclusion We found that the Korean version of the TMT showed excellent validity and reliability for gait and balance and had high sensitivity and specificity for predicting falls among patients with PD.

Journal ArticleDOI
TL;DR: The new ankle ROM measuring device was reliable and responsive for detecting IGC and the Silfverskiöld test had poor inter- and intrarater reliability.
Abstract: Background: Important aspects of the diagnostics of isolated gastrocnemius contractures (IGCs) have been poorly described. This study was designed to validate a new ankle range of motion (ROM) measuring device for diagnosing an IGC. In addition, we wanted to investigate the reliability of the clinical Silfverskiöld test. Methods: Twelve health care personnel (24 feet) were examined by 4 testers on 3 different occasions for the reliability testing of the new ankle ROM measuring device. The same participants were examined using the Silfverskiöld test to examine the reliability of the clinical test. Eleven patients (15 feet) with IGC were examined before gastrocnemius recession, immediately after surgery, and 3 months after surgery to examine the validity and responsiveness of the ankle ROM device. Results: An intraclass correlation coefficient (ICC) >0.85 was found for both inter- and intrarater reliability for the new ankle ROM device. The device confirmed an IGC in 13 of 15 feet before surgery and 3 of 13 feet...

Journal ArticleDOI
TL;DR: Evidence is provided to suggest that the SATS is suitable in trauma-only and mixed EDs in low-resource settings; the correlation between years of nursing experience and reliability of the SATS was also assessed.
Abstract: Objective The South African Triage Scale (SATS) has demonstrated good validity in the EDs of Medecins Sans Frontieres (MSF)-supported sites in Afghanistan and Haiti; however, corresponding reliability in these settings has not yet been reported on. This study set out to assess the inter-rater and intrarater reliability of the SATS in four MSF-supported EDs in Afghanistan and Haiti (two trauma-only EDs and two mixed (including both medical and trauma cases) EDs). Methods Under classroom conditions between December 2013 and February 2014, ED nurses at each site assigned triage ratings to a set of context-specific vignettes (written case reports of ED patients). Inter-rater reliability was assessed by comparing triage ratings among nurses; intrarater reliability was assessed by asking the nurses to retriage 10 random vignettes from the original set and comparing these duplicate ratings. Inter-rater reliability was calculated using the unweighted kappa, linearly weighted kappa and quadratically weighted kappa (QWK) statistics, and the intraclass correlation coefficient (ICC). Intrarater reliability was calculated according to the percentage of exact agreement and the percentage of agreement allowing for one level of discrepancy in triage ratings. The correlation between years of nursing experience and reliability of the SATS was assessed based on comparison of ICCs and the respective 95% CIs. Results A total of 67 nurses agreed to participate in the study: In Afghanistan there were 19 nurses from Kunduz Trauma Centre and nine from Ahmed Shah Baba; in Haiti, there were 20 nurses from Martissant Emergency Centre and 19 from Tabarre Surgical and Trauma Centre. Inter-rater agreement was moderate across all sites (ICC range: 0.50–0.60; QWK range: 0.50–0.59) apart from the trauma ED in Haiti where it was moderate to substantial (ICC: 0.58; QWK: 0.61). Intrarater agreement was similar across the four sites (68%–74% exact agreement); when allowing for a one-level discrepancy in triage ratings, intrarater reliability was near perfect across all sites (96%–99%). No significant correlation was found between years of nursing experience and reliability. Conclusion The SATS has moderate reliability in different EDs in Afghanistan and Haiti. These findings, together with concurrent findings showing that the SATS has good validity in the same settings, provide evidence to suggest that SATS is suitable in trauma-only and mixed EDs in low-resource settings.

Journal ArticleDOI
TL;DR: The intrarater reliability for each feature was consistently better than the interrater reliability; among the specific periocular soft tissue inflammatory features measured between raters in the Clinical Activity Score and the Vision, Inflammation, Strabismus, Appearance scales, eyelid and conjunctival edema could reliably be measured by both the 0-1 and 0-2 scales.
Abstract: PURPOSE: To determine the reliability of 3 scales for assessing soft tissue inflammatory and congestive signs associated with thyroid eye disease. METHODS: This was a multicentered prospective observational study, recruiting 55 adults with thyroid eye disease from 9 international centers. Six thyroid eye disease soft tissue features were measured; each sign graded using 3 scales (presence/absence [0-1], 3-point scale [0-2], and percentage [0-100]). Each eye was graded twice by 2 independent raters. Accuracy (fraction of agreement) was calculated between the 2 trials for each rater (intrarater reliability) and between raters for all trials (interrater reliability) to determine the most sensitive scale for each feature that maintained a threshold of agreement greater than 0.70. Trial, intrarater reliability, and interrater reliability were determined by accuracy measurement of agreement for each inflammatory/congestive feature. RESULTS: Fifty-five patients had 218 assessments for 6 thyroid eye disease metrics. The intrarater reliability for each feature was consistently better than the interrater reliabilities. Using an agreement of 0.70 or better, for the interrater tests, conjunctival and eyelid edema could be reliably measured using the 0-1 or 0-2 scale while conjunctival and eyelid redness could only be reliably measured with the binary 0-1 scale. Caruncular edema and superior conjunctival redness could not be measured reliably between 2 raters with any scale. The percentage scale had poor agreement unless slippage intervals of >20% were allowed on either side of the measurements. CONCLUSIONS: Of the specific periocular soft tissue inflammatory features measured between raters in the Clinical Activity Score and Vision, Inflammation, Strabismus, Appearance scales, edema of the eyelids and conjunctiva could reliably be measured by both 0-1 and 0-2 scales, erythema of the eyelid and bulbar conjunctiva could reliably be measured only by the 0-1 scale, and the other parameters of superior bulbar erythema and caruncular edema were not reliably measured by any scale.

Journal ArticleDOI
TL;DR: AIMS has acceptable reliability and concurrent validity for screening of motor developmental delay in high-risk infants in China and is well correlated with all PDMS-2 subtest scores.
Abstract: The Alberta Infant Motor Scale (AIMS) is widely used to screen for delays in motor development in high-risk infants, but its reliability and validity in Chinese infants have not been investigated. To examine the reliability and concurrent validity of AIMS in high-risk infants aged 0-9 months in China, this single-center study enrolled 50 high-risk infants aged 0-9 months (range, 0.17-9.27; average, 4.14±2.02), who were divided into two groups: 0-3 months (n=23) and 4-9 months (n=27). A physical therapist evaluated the infants with AIMS, with each evaluation video-recorded. To examine interrater reliability, two other evaluators calculated AIMS scores by observing the videos. To measure intrarater reliability, the two evaluators rescored AIMS after >1 month, using the videos. Concurrent validity was assessed by comparing results between AIMS and the Peabody Developmental Motor Scale-2 (PDMS-2). For all age groups analyzed (0-3, 4-9, and 0-9 months), intraclass correlation coefficients (ICCs) for AIMS total score were high for both intrarater comparisons (0.811-0.995) and interrater comparisons (0.982-0.997). AIMS total scores were well correlated with all PDMS-2 subtest scores (ICC=0.751-0.977 for reflexes, stationary, locomotion, grasping, and visual-motor integration subsets). However, the fifth percentile of AIMS total score was only moderately correlated with the gross motor quotient, fine motor quotient, and total motor quotient subtests of PDMS-2 (kappa=0.580, 0.601, and 0.724, respectively). AIMS has acceptable reliability and concurrent validity for screening of motor developmental delay in high-risk infants in China.

Journal ArticleDOI
TL;DR: A short version of the TGMD-2 with six motor skills is proposed in this paper; it has appropriate indices of confirmatory factorial validity (root mean square error of approximation: 0.06, 90% confidence interval [0.06, 0.07]) and test-retest reliability.
Abstract: Background Assessing children's motor skills is important for identifying children with delays, measuring learning, and determining teaching effectiveness. One popular assessment for measuring fundamental motor skills in children is the Test of Gross Motor Development-2 (TGMD-2). Although the TGMD-2 long form is widely known, a short form of the TGMD-2 has not yet been proposed and investigated. The aim of this study was to develop a short form of the TGMD-2 and to examine its validity, interrater reliability and test-retest reliability. Method Data from 2,463 Brazilian children were analyzed. Exploratory and confirmatory factor analysis was used to investigate the validity of reducing the number of TGMD-2 skills. Results The short-form version of the TGMD-2 with six skills has appropriate indices of confirmatory factorial validity (root mean square error of approximation: 0.06, 90% confidence interval [0.06, 0.07]; comparative fit index: 0.94; normed fit index: 0.94; Tucker-Lewis index: 0.83; goodness-of-fit index: 0.98; adjusted goodness-of-fit index: 0.95), internal consistency (α = 0.70 for the overall test), interrater and intrarater reliability (intraclass correlation coefficient values from 0.81 to 0.96) and test-retest reliability (r values from 0.55 to 0.95). Conclusions From these findings, practitioners now have a valid and reliable short form of the TGMD-2 for use in assessing children's motor skill competence, promoting wider use of the test for screening and pedagogical purposes.
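
The internal-consistency figure quoted above (Cronbach's α = 0.70) comes from the ratio of item variances to total-score variance. A brief sketch with hypothetical item scores:

```python
import numpy as np

def cronbach_alpha(item_scores):
    """Cronbach's alpha for internal consistency.

    `item_scores` is an (N respondents x k items) array;
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score).
    """
    x = np.asarray(item_scores, dtype=float)
    k = x.shape[1]
    item_var = x.var(axis=0, ddof=1).sum()
    total_var = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

# Hypothetical scores of 6 children on 6 motor skills
scores = [[3, 4, 2, 5, 3, 4],
          [2, 3, 2, 4, 2, 3],
          [5, 5, 4, 5, 4, 5],
          [1, 2, 1, 3, 2, 2],
          [4, 4, 3, 4, 4, 4],
          [3, 3, 3, 4, 3, 3]]
print(round(cronbach_alpha(scores), 3))
```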

Journal ArticleDOI
TL;DR: Ultrasound measurement is a reliable and precise method for measuring the internal subglottic diameter of the airway; it may provide clinicians with valuable information regarding airway diameter in adults and may help to guide treatment options.

Journal ArticleDOI
TL;DR: The Wheelchair Skills Test for Powered Wheelchair Users (WST-P 4.2) is a useful addition to the clinical tools available for clinicians who assess and train for powered wheelchair use and has excellent reliability and potential for clinical use as a pre-post measure of powered wheelchair skills.
Abstract: Purpose: The purpose of this study is to estimate the interrater and intrarater reliability of the Wheelchair Skills Test (WST) Version 4.2 for powered wheelchairs operated by adult users. Materials and methods: Cohort study with a convenience sample of occupational therapists (n = 10). For the main outcome measure, participants viewed and scored eight videos of adult power wheelchair users completing the 30 skills of the WST Version 4.2 on two occasions, a minimum of two weeks apart. Using these scores, we calculated intraclass correlation coefficients to estimate interrater and intrarater reliability. Results: The interrater reliability intraclass correlation coefficient was 0.940 (95%CI 0.862–0.985). Intrarater reliability intraclass correlation coefficients ranged from 0.923 to 0.998. Conclusions: The WST Version 4.2 has excellent interrater and intrarater reliability and is a reliable tool for use in clinical and research practice to evaluate a power wheelchair user's skill capacity. Implications...

Journal ArticleDOI
TL;DR: The CMT-SCS has good reliability for infants up to 12 months of age, can be used for initial assessment of infants suspected of having CMT, and should be standard documentation for infants with CMT.
Abstract: Purpose To establish inter- and intrarater reliability for determining severity grades of the congenital muscular torticollis severity classification system (CMT-SCS). Methods A prospective reliability study with 145 physical therapists recorded severity ratings on 24 randomly-ordered patient cases including age of infant, cervical range of motion, and presence or absence of sternocleidomastoid mass. To compute intrarater reliability, cases were randomly reordered and graded by 82 of the original raters. Results For the CMT-SCS, overall reliability was good with an interrater reliability intraclass correlation coefficient (ICC) (2,1) of 0.83 (95% confidence interval [CI], 0.74-0.91) and an intrarater reliability ICC (3,1) of 0.81 (95% CI, 0.66-0.91). Conclusions The CMT-SCS has good reliability for infants up to 12 months of age. Physical therapists can use the scale for initial assessment of infants suspected to have CMT. The CMT-SCS should be standard documentation for infants with CMT.
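
The ICC(2,1) and ICC(3,1) forms reported above come from a two-way ANOVA decomposition of a subjects-by-raters matrix (Shrout-Fleiss formulas). A minimal sketch with invented ratings, not the study's data:

```python
import numpy as np

def icc_two_way(ratings):
    """ICC(2,1) and ICC(3,1) from an (n subjects x k raters) rating matrix.

    Shrout-Fleiss formulas:
    ICC(2,1) = (MSR - MSE) / (MSR + (k-1)*MSE + k*(MSC - MSE)/n)
    ICC(3,1) = (MSR - MSE) / (MSR + (k-1)*MSE)
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_m, col_m = x.mean(axis=1), x.mean(axis=0)
    msr = k * np.sum((row_m - grand) ** 2) / (n - 1)    # between-subjects mean square
    msc = n * np.sum((col_m - grand) ** 2) / (k - 1)    # between-raters mean square
    sse = np.sum((x - row_m[:, None] - col_m[None, :] + grand) ** 2)
    mse = sse / ((n - 1) * (k - 1))                     # residual mean square
    icc21 = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    icc31 = (msr - mse) / (msr + (k - 1) * mse)
    return icc21, icc31

# Invented example: 5 infants rated by 3 therapists on a severity grade
icc21, icc31 = icc_two_way([[1, 1, 2], [3, 3, 3], [2, 2, 2], [4, 3, 4], [2, 3, 2]])
print(f"ICC(2,1) = {icc21:.3f}, ICC(3,1) = {icc31:.3f}")
```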

Journal ArticleDOI
TL;DR: The present classifications for some of the most common cervical degenerative findings yielded mainly substantial inter-rater reliability estimates and substantial to almost perfect intra-rater reliability estimates.
Abstract: Knowledge about the assessment reliability of common cervical spine changes is a prerequisite for precise and consistent communication about Magnetic Resonance Imaging (MRI) findings. The purpose of this study was to determine the inter- and intra-rater reliability of degenerative findings when assessing cervical spine MRI. Fifty cervical spine MRIs from subjects with neck pain were used. A radiologist, a chiropractor and a second-year resident of rheumatology independently assessed kyphosis, disc height, disc contour, vertebral endplate signal changes, spinal canal stenosis, neural foraminal stenosis, and osteoarthritis of the uncovertebral and zygapophyseal joints. An evaluation manual was composed containing classifications and illustrative examples, and ten of the MRIs were evaluated twice followed by consensus meetings to refine the classifications. Next, the three readers independently assessed the full sample. Reliability measures were reported using prevalence estimates and unweighted kappa (Κ) statistics. The overall inter-rater reliability was substantial (Κ ≥ 0.61) for the majority of variables and moderate only for zygapophyseal osteoarthritis (Κ = 0.56). Intra-rater reliability estimates were higher for all findings. The present classifications for some of the most common cervical degenerative findings yielded mainly substantial inter-rater reliability estimates and substantial to almost perfect intra-rater reliability estimates. Regional Data Protection Agency (J.no. 1–16–02-86-16). The letter of exemption from the Regional Ethical Committee is available from the author on request (case no. 86/2017).

Journal ArticleDOI
TL;DR: The algometer showed excellent inter- and intra-rater reliability on normal abdominal tissue and C-section scars, and the modified adheremeter showed moderate criterion validity when compared against the NPRS.

Journal ArticleDOI
TL;DR: The hidradenitis suppurativa clinical response (HiSCR) is a validated clinical end point for measuring response to treatment in patients with hidradenitis suppurativa; its validity, responsiveness and meaningfulness have been reported previously, and this study evaluates its inter- and intrarater reliability.
Abstract: Background Hidradenitis suppurativa clinical response (HiSCR) is a validated clinical end point for measuring response to treatment in patients with hidradenitis suppurativa (HS). Previous studies have reported on the validity, responsiveness and meaningfulness of the HiSCR. Objective To evaluate the HiSCR for inter- and intrarater reliability characteristics. Methods A stand-alone, two-site, prospective, non-interventional observational study included 22 patients with self-reported HS severity ranging from mild to severe. The Patient Global Impression of Change (PGI-C) scale was completed by patients at Timepoint 2. Descriptive statistics of Hurley Stage, total abscesses, total draining fistulas, total inflammatory nodules and total AN count (sum of inflammatory nodules and lesions) were reported at two timepoints. Inter-rater and intrarater reliability of the HS lesion count tool were evaluated at two timepoints (baseline and Day 7). Intraclass correlation (ICC) coefficients of lesion counts were calculated to evaluate inter- and intrarater reliability between pairs of dermatologists. Results The majority of patients demonstrated either no change or minimally worse PGI-C in HS scores. Descriptive statistics were similar between rater groups and timepoints assessed. Inter-rater ICC coefficients for abscess count at Timepoints 1 and 2 were 0.38 and 0.67. The ICC coefficients for draining fistula and AN count were ≥0.61 at both timepoints. In an exploratory model, ICC coefficients were ≥0.68 for all evaluated lesion counts. The test-retest reliability using ICC coefficients was ≥0.70 for total abscess, draining fistula, inflammatory nodule and AN count. Conclusion The HS lesion count tool had acceptable inter- and intrarater reliability, indicating that HiSCR has a strong degree of reproducibility and consistency in the evaluation of patients with HS.

Journal ArticleDOI
27 Jun 2018
TL;DR: As only certain devices returned valid step measurements, continued testing in applied environments is needed to have confidence in utilizing technology to track health and activity goals.
Abstract: Because wearable technology is ubiquitous, it is important to determine validity and reliability not only in a laboratory setting, but applied environments where the general population utilizes the...