
Showing papers on "Intra-rater reliability" published in 1997


Journal ArticleDOI
TL;DR: The goal of the Standardized Mini-Mental State Examination (SMMSE) was to impose strict guidelines for administration and scoring to improve the reliability of the instrument.
Abstract: The Mini-Mental State Examination (MMSE) is a widely used screening test for cognitive impairment in older adults. Because the guidelines for its application are brief, the administration and scoring of the test can vary between different individuals. This can diminish its reliability. Furthermore, some of the items must be changed to accommodate different settings, such as the clinic, home, or hospital. Because there are no time limits, it is not clear how long one should wait for a reply to a question. It is also not clear how one deals with answers that are "near misses." The goal of the Standardized Mini-Mental State Examination (SMMSE) was to impose strict guidelines for administration and scoring to improve the reliability of the instrument. The reliability of the MMSE was compared with the reliability of the SMMSE in 48 older adults who had the tests administered by university students on three different occasions to assess the interrater and intrarater reliability of the tests. The SMMSE had significantly better interrater and intrarater reliability compared with the MMSE: The interrater variance was reduced by 76% and the intrarater variance was reduced by 86%. It took less time to administer the SMMSE compared with the MMSE (average 10.5 minutes and 13.4 minutes, respectively). The intraclass correlation for the MMSE was .69, and .9 for the SMMSE. Administering and scoring the SMMSE on a task-by-task basis are discussed.

404 citations


Journal ArticleDOI
TL;DR: Clinical reliability was demonstrated for each technique; however, validity compared with the radiographic measurement could not be established and future research is necessary to establish interrater reliability and assess each technique's ability to detect postural changes over time.
Abstract: Clinicians often rely on visual inspection and descriptive terms to document a patient's forward shoulder posture. The purpose of this study was to assess the validity and intrarater reliability of four objective techniques to measure forward shoulder posture. Subjects were 25 males and 24 females. Subjects had a lateral cervical spine radiograph taken, from which the horizontal distance from the C7 spinous process to the anterior tip of the left anterior acromion process was measured. Subjects then proceeded twice through a random order of four measurements: the Baylor square, the double square, the Sahrmann technique, and scapular position. These results were then used to determine the intrarater reliability of each technique. Multiple regression analyses were performed on each measure's mean scores to determine both the correlation with and the predictive value for the radiographic measurement. The intraclass correlation coefficients for intrarater reliability ranged from .89 to .91. The correlation co...

101 citations


Journal ArticleDOI
TL;DR: The contention that some physical performance measures can be used to test individuals in the later stages of Alzheimer's disease given appropriate modification is supported.
Abstract: BACKGROUND: Investigation of the effects of exercise on frail, institutionalized individuals with dementia has been impeded by concerns about the reliability of physical performance measures when used in this population. METHODS: The physical performance of 33 institutionalized subjects with Alzheimer's disease was measured during both the morning and afternoon of day 1 by rater 1 and during both the morning and afternoon of day 2, one week later, by rater 1 and rater 2. Intraclass correlation coefficients (ICCs) were calculated to examine the inter- and intrarater reliability of "sit to stand," "25-foot walk," and "the distance walked in 6 minutes" and walking speed over 25 feet and for 6 minutes. An analysis of variance was performed to determine the components of variance for each test. RESULTS: ICCs for "distance walked in 6 minutes" ranged from .80 to .99 with 77% of the variance explained by inter-subject differences. The ICCs for "time to walk 25 feet" ranged from .57 to .97 with 25% of the variance explained by inter-subject differences. In contrast, the "sit to stand" measure produced ICCs ranging from -.07 to .85 with only 7% of the variance explained by inter-subject differences in this impaired population. CONCLUSION: Our results support the contention that some physical performance measures can be used to test individuals in the later stages of Alzheimer's disease given appropriate modification. Although subjects with Alzheimer's disease may have difficulty following commands and/or require physical assistance, this does not prohibit the reliable assessment of physical performance if measurements are made over longer (6-minute walk) rather than shorter periods (25-foot walk).

88 citations


Book
01 Jan 1997
TL;DR: The book covers mechanical system reliability: design variables and models, the mathematical basis of reliability, Monte Carlo simulation, cost integration, physics-based reliability models, system reliability analysis, and probabilistic crack growth and modelling.
Abstract: Overview of mechanical system reliability; mechanical reliability design variables and models; mathematical basis of reliability; Monte Carlo simulation; mechanical system reliability and cost integration; physics-based reliability models; system reliability analysis; probabilistic crack growth and modelling.

86 citations


Book
27 Jul 1997
TL;DR: The aim of this monograph is to establish a baseline for a model of Reliability Modeling for System Predictions, and to demonstrate the importance of knowing this baseline before proceeding with further studies.
Abstract: I. THE RELIABILITY OBJECTIVE. 1. I Want What I Want When I Want It! 2. Defining Reliability. 3. Computing Reliability Parameters. II. MEASURING AND EVALUATING RELIABILITY. 4. Reliability Predictions. 5. Evaluating Data for Failure Rate Estimation. 6. Graphical Evaluation for Reliability Prediction. 7. Restorability. 8. Reliability Modeling for System Predictions. 9. Reliability Modeling of Complex Systems. 10. System Availability and Dependability. III. RELIABILITY ASSURANCE AND IMPROVEMENT. 11. Reliability and Restorability Demonstration Testing. 12. Reliability Growth Testing. 13. Risk Assessment. 14. Epilogue. Appendix A: Review of Probability and Statistics. Appendix B: Probability Tables. Appendix C: System MTBF and Availability Tables. Index.

61 citations


Journal ArticleDOI
01 Jul 1997-Sleep
TL;DR: The hypothesis that infant SP and SS ratings can be reliably scored at substantial levels of agreement is tested and accepted, supporting the conclusion that the IPSG is a reliable source of clinical and research data when supported by significant kappas and CIs.
Abstract: Infant polysomnography (IPSG) is an increasingly important procedure for studying infants with sleep and breathing disorders. Since analyses of these IPSG data are subjective, an equally important issue is the reliability or strength of agreement among scorers (especially among experienced clinicians) of sleep parameters (SP) and sleep states (SS). One basic issue of this problem was examined by proposing and testing the hypothesis that infant SP and SS ratings can be reliably scored at substantial levels of agreement, that is, kappa (κ) ≥ 0.61. In light of the importance of IPSG reliability in the collaborative home infant monitoring evaluation (CHIME) study, a reliability training and evaluation process was developed and implemented. The bases for training on SP and SS scoring were CHIME criteria that were modifications and supplements to Anders, Emde, and Parmelee (10). The kappa statistic was adopted as the method for evaluating reliability between and among scorers. Scorers were three experienced investigators and four trainees. Inter- and intrarater reliabilities for SP codes and SSs were calculated for 408 randomly selected 30-second epochs of nocturnal IPSG recorded at five CHIME clinical sites from healthy full-term (n = 5), preterm (n = 4), apnea of infancy (n = 2), and siblings of the sudden infant death syndrome (SIDS) (n = 4) enrolled subjects. Infant PSG data set 1 was scored by both experienced investigators and trained scorers and was used to assess initial interrater reliability. Infant PSG data set 2 was scored twice by the trained scorers and was used to reassess interrater reliability and to assess intrarater reliability. The kappas for SS ranged from 0.45 to 0.58 for data set 1 and represented a moderate level of agreement. Therefore, rater disagreements were reviewed, and the scoring criteria were modified to clarify ambiguities. The kappas and confidence intervals (CIs) computed for data set 2 yielded substantial interrater and intrarater agreements for the four trained scorers; for SS, kappa = 0.68, and for SP the kappas ranged from 0.62 to 0.76. Acceptance of the hypothesis supports the conclusion that the IPSG is a reliable source of clinical and research data when supported by significant kappas and CIs. Reliability can be maximized with strictly detailed scoring guidelines and training.
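
The kappa statistic in the abstract above corrects raw agreement for the agreement expected by chance. The following is a minimal sketch of Cohen's kappa for two scorers, which is an assumption on our part: the abstract does not say which kappa variant was used for more than two scorers. The function name, the sleep-state labels, and the ten example epochs are hypothetical and are not data from the study.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: share of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement computed from each rater's marginal label frequencies.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    p_chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical sleep-state labels for ten 30-second epochs (illustration only).
scorer_1 = ["AS", "AS", "QS", "QS", "W", "AS", "QS", "W", "AS", "QS"]
scorer_2 = ["AS", "QS", "QS", "QS", "W", "AS", "QS", "AS", "AS", "QS"]

kappa = cohen_kappa(scorer_1, scorer_2)
# 0.61 is the "substantial agreement" threshold quoted in the abstract.
print(f"kappa = {kappa:.2f}, substantial: {kappa >= 0.61}")
```

Here the two scorers agree on 8 of 10 epochs, but kappa is lower than 0.8 because some of that agreement would be expected by chance alone.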

52 citations


Journal ArticleDOI
TL;DR: The intra-rater reliability of dial calipers for measurement of RAD was investigated by this study, using a repeated measures design, and high reliability was demonstrated for resting and active RAD measurements.
Abstract: To date, physiotherapists have relied upon the use of finger widths for measurement of rectus abdominis diastasis (RAD). This method has been proven unreliable, due to variations in finger widths. The intra-rater reliability of dial calipers for measurement of RAD was investigated by this study, using a repeated measures design. Measurements were taken at rest and during contraction on three occasions in 30 postpartum subjects. High reliability was demonstrated for resting and active RAD measurements (ICC = 0.93 and 0.95, respectively). In conclusion, dial calipers are a reliable measuring device when used by a single clinician. Further testing is required to determine inter-rater reliability.

51 citations


Journal ArticleDOI
TL;DR: Rigorous quality assurance standards and monitoring of clinical evaluators should be incorporated into the design of multicenter studies using MVIC, since low variability is necessary to detect a modest treatment effect.
Abstract: Maximal voluntary isometric contraction (MVIC) is becoming widely used for monitoring disease progression in amyotrophic lateral sclerosis (ALS). We evaluated the variability of MVIC in a large multicenter (29 sites) drug trial in ALS. Intra- and interrater variability were assessed twice during the 19-month study. Intrarater reliability increased from the first to the second test, approaching the reliability reported for a single experienced clinical evaluator, but interrater reliability did not. Multiple clinical evaluators in a single site increased the variability of MVIC measurements. Rigorous quality assurance standards and monitoring of clinical evaluators should be incorporated into the design of multicenter studies using MVIC, since low variability is necessary to detect a modest treatment effect.

43 citations


Journal ArticleDOI
TL;DR: The concepts of Reliability and Validity Explained with Examples Internal Consistency Validity || Reliability || Practicality What's the difference !?Measuring Reliability R Tutorial: Measurement, validity

38 citations


Journal ArticleDOI
TL;DR: In this paper, two types of analysis are proposed: the sensitivity analysis and the reliability analysis, which are applicable to problems related to reliability, availability, maintainability and safety (RAMS).

27 citations


Journal ArticleDOI
TL;DR: Intrarater reliability for DUE was good to excellent, however, interrater reliability exhibited only marginal reproducibility, particularly where evaluators were required to use subjective judgment (i.e., complications, clinical outcomes).
Abstract: OBJECTIVE: To test the reliability of drug utilization evaluation (DUE) applied to medications commonly used by the ambulatory elderly. METHODS: A DUE model was developed for four domains: (1) justification for use, (2) critical process indicators, (3) complications, and (4) clinical outcomes. DUE criteria specific to use in the elderly were developed for angiotensin-converting enzyme (ACE) inhibitors and histamine2 (H2)-antagonists, and consensus was reached by an external expert panel. After pilot testing, two clinical pharmacists independently evaluated these medications, applying the DUE criteria and rating each item as appropriate or inappropriate. Interrater and intrarater reliability was assessed by using κ statistics. RESULTS: In a sample of 208 ambulatory elderly veterans, 42 (20.2%) were taking an ACE inhibitor and 56 (26.9%) an H2-antagonist. The interrater agreement for individual domains, represented by κ statistics, were 0.10–0.58 and 0–0.83 for ACE inhibitors and H2-antagonists, respectively. Th...

Journal ArticleDOI
TL;DR: Reliability was found to be higher for men than for women and for subjects who claimed to have more rather than less experience in similar manual dexterity tasks, indicating that the assessment is most appropriate for a population of men with manual dexterity experience.
Abstract: OBJECTIVE: The purpose of this study was to compare the test-retest reliability of three administrative methods of the Work Box: (a) the original instructions, (b) a revised version of the original instructions, and (c) another revised version that was based on suggestions made by authors of the first two versions of the instructions. METHOD: Sixty subjects without disabilities were randomly grouped so that 20 subjects were tested per administrative method. The assessment was administered to each subject on two occasions, with a 7-day to 14-day period between tests. Scores were recorded as time in seconds, and intraclass correlation coefficients (ICCs) were used to calculate the reliability. RESULTS: The ICCs for assembly, disassembly, and total scores were .589, .604, and .654, respectively, for the original instructions; .424, .572, and .545 for the revised instructions; and .781, .579, and .717 for the second revised instructions. Reliability was found to be higher for men than for women and for subjects who claimed to have more rather than less experience in similar manual dexterity tasks. CONCLUSIONS: On the basis of the reliability of each administrative method and comments made by subjects about their understanding of the instructions, the second revised version of the instructions is recommended as the standard method. The results also indicate that the assessment is most appropriate for a population of men with manual dexterity experience. With further standardization, the Work Box could be a valuable assessment tool for therapists working in industrial rehabilitation settings.

Journal ArticleDOI
TL;DR: In this article, a building-in reliability approach based on identifying and controlling the causes for reduced reliability is described and compared to the traditional testing-in-reliability approach, which is shown to be no longer a viable response to the aggressive reliability and market-entry demands facing the semiconductor industry.

Journal ArticleDOI
TL;DR: The purpose of the current study was to investigate the concurrent validity of on-site to videotape timing of the TFMs, followed by interrater, intrarater, and test-retest reliability of the TFMs battery with a frail, elderly population.
Abstract: Light's Timed-Functional-Movements (TFMs) battery is a timed functional mobility assessment for older persons. The purpose of the current study was to investigate the concurrent validity of on-site to videotape timing of the TFMs, followed by interrater, intrarater, and test-retest reliability of the TFMs battery with a frail, elderly population. Twenty frail, elderly subjects participated in two test sessions. Test movements were timed and videotaped on-site by a primary examiner. After establishing the high concurrent validity of videotape timing, the interrater, intrarater, and test-retest reliability of the TFMs were examined by videotape timed measures. This methodology was chosen to allow the test-retest reliability of the client's performance to be separated from the therapist's intrarater reliability. Reliability measures were calculated using intraclass correlation coefficients (ICCs; 2,1). The concurrent validity of videotape timing to on-site timing ranged from ICC = .90-1.0. Interrater ICCs fo...
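
The "(ICCs; 2,1)" in the abstract above is the Shrout and Fleiss notation for a two-way random-effects, single-measure, absolute-agreement intraclass correlation. A minimal sketch of that calculation from a subjects-by-raters table follows; the function name and the small ratings table are illustrative assumptions, not the study's data or code.

```python
def icc_2_1(scores):
    """ICC(2,1) for a table with one row per subject and one column per rater."""
    n = len(scores)                      # number of subjects
    k = len(scores[0]) if scores else 0  # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a rectangular table with >= 2 subjects and raters")

    grand_mean = sum(sum(row) for row in scores) / (n * k)
    subject_means = [sum(row) / k for row in scores]
    rater_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares (one observation per cell).
    ss_subjects = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_raters = n * sum((m - grand_mean) ** 2 for m in rater_means)
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_subjects - ss_raters

    ms_subjects = ss_subjects / (n - 1)
    ms_raters = ss_raters / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Absolute agreement: systematic rater differences count against reliability.
    return (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_raters - ms_error) / n
    )


# Illustrative table: six subjects (rows) scored by four raters (columns).
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_2_1(ratings)
print(f"ICC(2,1) = {icc:.2f}")
```

In this table the raters rank subjects similarly but differ systematically in level, so the absolute-agreement form gives a low coefficient. A consistency-type ICC would be higher for the same data.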

Proceedings ArticleDOI
04 Jun 1997
TL;DR: A benchmark of rater consistency was obtained using conventional methodology; two interactive grading tools were then deployed into a CS1 environment, and their consistency was compared with the conventional methodology.
Abstract: Significant problems with reliability and consistency of student performance evaluation have been well documented and largely ignored. Software tools have strong potential for improving student performance evaluation by providing interactive support for graders. Two such evaluation environments were developed, and a benchmark of rater consistency was obtained using conventional methodology. These two interactive grading tools were deployed into a CS1 environment, and consistency was compared to the conventional methodology. Results of the study and implications and applicability to other fields are discussed. Use of these tools is not limited to computer science courses. 1 Background: Performance evaluation is critical to education. Rating gives students feedback about their performance so that they can learn from their successes and mistakes; rating also allows instructors to quantitatively determine how well students are performing and what areas and subjects need more attention in the classroom. Rating occurs in a variety of class environments; for example, some classes are small with only the instructor and no teaching assistant (TA), while other classes are large with one or more instructors and many teaching assistants. Larger classes with many raters face an important problem: increasing the number of raters (often TAs) in a class has the dangerous potential of decreasing the reliability of performance evaluation. From experience we know that students submit similar work to be evaluated by different raters and receive different feedback and scores even when the quality of the student responses is almost identical. This inconsistency in rating undermines the feedback that students and instructors receive; students believe that incorrect responses are correct or correct responses are incorrect, and instructors may build false notions of student understanding. Inconsistency in evaluation also undermines the students' confidence in the class and raters. Published in ITiCSE '97 Working Group Reports and Supplemental Proceedings.

Journal ArticleDOI
TL;DR: The Mandibular Excursiometer had higher intrarater and interrater reliability for measuring deviation and deflection during active mandibular opening than observation alone, based on a comparison with the literature.
Abstract: Measurement tools improve the reliability and validity of measurement. The purpose of this study was to test the intrarater and interrater reliability of a new instrument, the Mandibular Excursiometer, for measuring mandibular excursion on the X and Y axis in the coronal plane during active opening. Two raters measured 12 volunteers. Four ratio, three nominal, and one ordinal scale measurements were analyzed using percent agreement. The Mandibular Excursiometer had high intrarater reliability for vertical opening (100%) and for the categorization of the presence or absence and direction of lateral deviation at the maximum point during opening (92–100%). Overall, moderate intrarater reliability existed for the quantity of lateral deviation at the maximum point during opening (66–83%), presence and direction of deflection (66–83%), presence of deviation or deflection during opening (66–83%), and in which third of opening the maximum point of lateral deviation occurred (66–83%). Moderate interrater reliabili...


Book ChapterDOI
01 Jan 1997

Journal ArticleDOI
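TL;DR: This article extends the 1996 results of Williams et al. on the reliability and validity of difference (gain) scores, which covered the case where the pretest variance is less than the posttest variance, to the two remaining cases: approximately equal variances and pretest variance exceeding posttest variance.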
Abstract: In 1996 Williams, et al. presented results on the reliability and validity of difference (gain) scores for the case where the pretest variance is less than the posttest variance. We extend this work to the remaining two possible cases—wherein the two variances are approximately equal and wherein the pretest variance exceeds the posttest variance. Plausible applied scenarios are presented for these two cases. Using these scenarios and varying the pretest-posttest reliabilities, validities, and inter-correlation, the resulting reliabilities and validities for the gain score are delineated. Our results provide the applied researcher with additional insights into the psychometric properties of gain scores in various potential situations.
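
To make the three variance cases concrete, here is a minimal sketch using the classical test theory formula for the reliability of a difference score D = posttest − pretest. The abstract does not state which formula the authors used, so treating it as this one is an assumption; the function name and the scenario values are hypothetical and are not taken from the paper.

```python
def gain_score_reliability(sd_pre, sd_post, rel_pre, rel_post, r_pre_post):
    """Classical reliability of the gain score D = posttest - pretest.

    rho_DD = (s_x^2 * r_xx + s_y^2 * r_yy - 2 * r_xy * s_x * s_y)
             / (s_x^2 + s_y^2 - 2 * r_xy * s_x * s_y)
    """
    covariance_term = 2.0 * r_pre_post * sd_pre * sd_post
    numerator = sd_pre ** 2 * rel_pre + sd_post ** 2 * rel_post - covariance_term
    denominator = sd_pre ** 2 + sd_post ** 2 - covariance_term
    if denominator <= 0:
        raise ValueError("gain-score variance must be positive")
    return numerator / denominator


# Hypothetical scenarios: (pretest SD, posttest SD) for the three variance cases.
scenarios = {
    "pretest variance < posttest variance": (8.0, 12.0),
    "variances approximately equal": (10.0, 10.0),
    "pretest variance > posttest variance": (12.0, 8.0),
}
for label, (sd_pre, sd_post) in scenarios.items():
    rho = gain_score_reliability(sd_pre, sd_post, 0.80, 0.80, 0.50)
    print(f"{label}: gain-score reliability = {rho:.2f}")
```

With equal variances, the formula reduces to the mean of the two reliabilities minus the pretest-posttest correlation, divided by one minus that correlation. This is why gain scores become unreliable as the two tests become more highly correlated.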

Journal ArticleDOI
TL;DR: To develop and test quality of care process measures for three medical conditions of nursing home patients: fever, shortness of breath, and chest pain.
Abstract: OBJECTIVE: To develop and test quality of care process measures for three medical conditions of nursing home patients: fever, shortness of breath, and chest pain. DESIGN: Flowsheets designed to capture the critical elements of care for the above conditions were developed by an expert panel. Nursing home residents' charts were reviewed retrospectively using the flowsheets. The reviews were translated into clinical scenarios, and the quality of care the scenarios represented was rated by an expert panel. SETTING: All nursing homes in Hennepin County, MN, that care for Medicaid patients. PATIENTS: A random sample of 1405 Medicaid nursing home residents from 1984 and 1988. MEASURES: Measures of quality of physician assessment and intervention, quality of nurse assessment and intervention, and global quality were developed, and their intra- and interrater reliability was tested. The measures' validity was assessed by their ability to predict resident death. RESULTS: Intrarater reliability was measured as the correlation of the ratings of blinded duplicates. The correlation for the global scale and the four subscales ranged from .74 to .88 (P < .001 for all). Interrater reliability was tested by examining what percentage of the quality ratings were within one unit (1–5 scale) for all three raters. All three raters were within one unit for more than 72% of scenarios for all scales. The subscale of quality of physician assessment was able to predict resident death when the worst episode of care (OR = .47, 95% CI (.31−.74)) or the mean episode of care (OR = .54, 95% CI (.30−.99)) was used. None of the other subscales or the global measure predicted death. CONCLUSIONS: Through the use of an expert panel, measures of nursing home quality of care were developed for shortness of breath, fever, and chest pain. These measures have reasonable reliability and significant face validity. Their validity is supported further by the ability of one of the measures to predict resident death.


09 Jan 1997
TL;DR: This volume in the OHA-report series deals with the statistical reliability assessment of software based systems on the basis of dynamic test results and qualitative evidence from the system design process.
Abstract: Plant vendors nowadays propose software-based systems even for the most critical safety functions. The reliability estimation of safety critical software-based systems is difficult since the conventional modeling techniques do not necessarily apply to the analysis of these systems, and the quantification seems to be impossible. Due to lack of operational experience and due to the nature of software faults, the conventional reliability estimation methods can not be applied. New methods are therefore needed for the safety assessment of software-based systems. In the research project “Programmable automation systems in nuclear power plants (OHA)”, financed together by the Finnish Centre for Radiation and Nuclear Safety, the Ministry of Trade and Industry and the Technical Research Centre of Finland, various safety assessment methods and tools for software based systems are developed and evaluated. This volume in the OHA-report series deals with the statistical reliability assessment of software based systems on the basis of dynamic test results and qualitative evidence from the system design process. Other reports to be published later on in OHA-report series will handle the diversity requirements in safety critical software-based systems, generation of test data from operational profiles and handling of programmable automation in plant PSA-studies.

BookDOI
01 Jan 1997
TL;DR: Proceedings of the IFIP TC5 WG5.4 3rd International Conference on Reliability, Quality and Safety of Software-Intensive Systems (ENCRESS '97), held 29th-30th May 1997 in Athens, Greece.
Abstract: Reliability, Quality and Safety of Software-Intensive Systems: IFIP TC5 WG5.4 3rd International Conference on Reliability, Quality and Safety of Software-Intensive Systems (ENCRESS '97), 29th-30th May 1997, Athens, Greece.

Proceedings ArticleDOI
08 Dec 1997
TL;DR: A possible strategy for evaluating the reliability of an active control system is presented, moving from the evaluation of the reliability of each single component to a sensitivity analysis as a measure of the importance of each sub-system in the global reliability assessment.
Abstract: The use of control systems as a means of structural protection against environmental loads, such as wind and earthquakes, has received considerable attention in recent years. Reliability is the core component toward the best performance of a control system. A possible strategy for evaluating the reliability of an active control system is presented moving from the evaluation of the reliability of each single component. A sensitivity analysis is also performed as a measure of the importance of each sub-system in the global reliability assessment.

Book ChapterDOI
01 Jan 1997
TL;DR: This paper gives a short survey of approaches that use measures of test coverage in software reliability estimation and describes a case study investigating the relation between test coverage and reliability growth.
Abstract: In recent works some authors proposed to use measures of test coverage in software reliability estimation. They suggested to use these measures either to improve the predictive accuracy of classical reliability growth models or to provide a direct estimation of reliability. This paper provides a short survey of these approaches and describes a case study aimed at investigating the relation between test coverage and reliability growth. Results of the case study are analysed and used to discuss the validity of the proposed approaches.



Journal ArticleDOI
TL;DR: Psychometric data are reported from subgroups of 1397 5- to 9-year-old children on a nonverbal cognitive group-screening measure.
Abstract: Psychometric data from subgroups of 1397 5- to 9-yr. old children on a nonverbal cognitive group-screening measure are reported.