
Showing papers on "Reliability (statistics)" published in 2005


Journal ArticleDOI
TL;DR: The issue of statistical testing of kappa is considered, including the use of confidence intervals, and appropriate sample sizes for reliability studies using kappa are tabulated.
Abstract: Purpose. This article examines and illustrates the use and interpretation of the kappa statistic in musculoskeletal research. Summary of Key Points. The reliability of clinicians' ratings is an important consideration in areas such as diagnosis and the interpretation of examination findings. Often, these ratings lie on a nominal or an ordinal scale. For such data, the kappa coefficient is an appropriate measure of reliability. Kappa is defined, in both weighted and unweighted forms, and its use is illustrated with examples from musculoskeletal research. Factors that can influence the magnitude of kappa (prevalence, bias, and nonindependent ratings) are discussed, and ways of evaluating the magnitude of an obtained kappa are considered. The issue of statistical testing of kappa is considered, including the use of confidence intervals, and appropriate sample sizes for reliability studies using kappa are tabulated. Conclusions. The article concludes with recommendations for the use and interpretation of kappa.
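
As a companion to the definitions above, here is a minimal sketch of unweighted and weighted kappa for two raters. The example ratings are invented, and the linear/quadratic weighting schemes shown are the common conventions rather than anything specific to this article.

```python
import numpy as np

def cohen_kappa(r1, r2, weights=None):
    """Cohen's kappa for two raters; weights=None (unweighted),
    'linear', or 'quadratic' for ordinal categories."""
    cats = np.unique(np.concatenate([r1, r2]))
    k = len(cats)
    lut = {c: i for i, c in enumerate(cats)}
    obs = np.zeros((k, k))
    for a, b in zip(r1, r2):
        obs[lut[a], lut[b]] += 1
    obs /= obs.sum()
    exp = np.outer(obs.sum(1), obs.sum(0))    # expected under chance
    i, j = np.indices((k, k))
    if weights is None:
        w = (i != j).astype(float)            # every disagreement weighs 1
    elif weights == "linear":
        w = np.abs(i - j) / (k - 1)
    else:                                     # quadratic
        w = ((i - j) / (k - 1)) ** 2
    return 1 - (w * obs).sum() / (w * exp).sum()

# Two clinicians rating 10 patients on a 3-point ordinal scale
a = np.array([1, 2, 3, 2, 1, 3, 2, 2, 1, 3])
b = np.array([1, 2, 2, 2, 1, 3, 3, 2, 1, 3])
print(round(cohen_kappa(a, b), 3), round(cohen_kappa(a, b, "quadratic"), 3))
```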

3,427 citations


01 Jan 2005
TL;DR: The Index of Learning Styles (ILS) as mentioned in this paper is an instrument designed to assess preferences on the four dimensions of the Felder-Silverman learning style model and has been used hundreds of thousands of times per year.
Abstract: RICHARD M. FELDER and JONI SPURLIN North Carolina State University, Raleigh, North Carolina 27695-7905, USA. E-mail: rmfelder@mindspring.com The Index of Learning Styles (ILS) is an instrument designed to assess preferences on the four dimensions of the Felder-Silverman learning style model. The Web-based version of the ILS is taken hundreds of thousands of times per year and has been used in a number of published studies, some of which include data reflecting on the reliability and validity of the instrument. This paper seeks to provide the first comprehensive examination of the ILS, including answers to several questions: (1) What are the dimensions and underlying assumptions of the model upon which the ILS is based? (2) How should the ILS be used and what misuses should be avoided? (3) What research studies have been conducted using the ILS and what conclusions regarding its reliability and validity may be inferred from the data?

1,259 citations


Journal ArticleDOI
TL;DR: It can be shown that the average F-measure among pairs of experts is numerically identical to the average positive specific agreement among experts and that kappa approaches these measures as the number of negative cases grows large.
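
A quick numeric illustration of the claim, with invented contingency counts: the F-measure between two annotators (one treated as reference) equals positive specific agreement, 2a/(2a+b+c), and kappa converges to it as the both-negative count d grows.

```python
import numpy as np

def kappa(a, b, c, d):
    """Cohen's kappa from a 2x2 table: a both-positive, b/c discordant, d both-negative."""
    n = a + b + c + d
    po = (a + d) / n
    pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    return (po - pe) / (1 - pe)

a, b, c = 40, 10, 6
psa = 2 * a / (2 * a + b + c)     # = F-measure between the two annotators
print(f"positive specific agreement / F: {psa:.4f}")
for d in (10, 100, 1000, 100000): # grow the both-negative count
    print(f"d = {d:>6}: kappa = {kappa(a, b, c, d):.4f}")
```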

779 citations


Journal ArticleDOI
TL;DR: This paper formulates reliability models based on both the PMP and the UFLP, presents an optimal Lagrangian relaxation algorithm to solve them, and discusses how to use these models to generate a trade-off curve between the day-to-day operating cost and the expected cost, taking failures into account.
Abstract: Classical facility location models like the P-median problem (PMP) and the uncapacitated fixed-charge location problem (UFLP) implicitly assume that, once constructed, the facilities chosen will always operate as planned. In reality, however, facilities "fail" from time to time due to poor weather, labor actions, changes of ownership, or other factors. Such failures may lead to excessive transportation costs as customers must be served from facilities much farther than their regularly assigned facilities. In this paper, we present models for choosing facility locations to minimize cost, while also taking into account the expected transportation cost after failures of facilities. The goal is to choose facility locations that are both inexpensive under traditional objective functions and also reliable. This reliability approach is new in the facility location literature. We formulate reliability models based on both the PMP and the UFLP and present an optimal Lagrangian relaxation algorithm to solve them. We discuss how to use these models to generate a trade-off curve between the day-to-day operating cost and the expected cost, taking failures into account, and we use these trade-off curves to demonstrate empirically that substantial improvements in reliability are often possible with minimal increases in operating cost.
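
The Lagrangian relaxation algorithm itself is beyond a snippet, but the objective it optimizes can be sketched by brute force on a toy instance: the expected transportation cost sums over every failure scenario of the opened facilities. All numbers here (distances, fixed costs, the failure probability q, the lost-customer penalty) are invented.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
n_sites, n_custs = 4, 12
fixed = np.array([100., 90., 110., 95.])            # facility opening costs
dist = rng.uniform(1, 20, size=(n_sites, n_custs))  # site-to-customer distances
q = 0.1                                             # independent failure probability
PENALTY = 50.0                                      # per-customer cost if every open site fails

def expected_transport(open_sites):
    """Exact expectation over all failure scenarios of the opened sites."""
    exp_cost = 0.0
    for fail in itertools.product([0, 1], repeat=len(open_sites)):
        p = np.prod([q if f else 1 - q for f in fail])
        alive = [s for s, f in zip(open_sites, fail) if not f]
        if alive:
            cost = dist[alive].min(axis=0).sum()   # serve from nearest surviving site
        else:
            cost = PENALTY * n_custs
        exp_cost += p * cost
    return exp_cost

best = min(
    (fixed[list(S)].sum() + expected_transport(list(S)), S)
    for r in range(1, n_sites + 1)
    for S in itertools.combinations(range(n_sites), r)
)
print(f"best design: open sites {best[1]}, total expected cost {best[0]:.1f}")
```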

703 citations



Journal ArticleDOI
TL;DR: Methods that can be used to assess reliability and how data from reliability analyses can aid the interpretation of results from rehabilitation interventions are presented.
Abstract: To evaluate the effects of rehabilitation interventions, we need reliable measurements. The measurements should also be sufficiently sensitive to enable the detection of clinically important changes. In recent years, the assessment of reliability in clinical practice and medical research has developed from the use of correlation coefficients to a comprehensive set of statistical methods. In this review, we present methods that can be used to assess reliability and describe how data from reliability analyses can aid the interpretation of results from rehabilitation interventions.
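
As one concrete example of the methods referred to, here is a minimal hand-rolled ICC(2,1) (two-way random effects, absolute agreement, single measurement), computed from its ANOVA mean squares; the rating matrix is invented.

```python
import numpy as np

def icc_2_1(Y):
    """ICC(2,1) for an (n subjects x k raters) score matrix."""
    n, k = Y.shape
    grand = Y.mean()
    ms_rows = k * ((Y.mean(axis=1) - grand) ** 2).sum() / (n - 1)  # subjects
    ms_cols = n * ((Y.mean(axis=0) - grand) ** 2).sum() / (k - 1)  # raters
    resid = Y - Y.mean(1, keepdims=True) - Y.mean(0, keepdims=True) + grand
    ms_err = (resid ** 2).sum() / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )

# Three raters scoring eight patients on a functional test
scores = np.array([[9, 10, 8], [6, 6, 7], [8, 7, 8], [7, 9, 8],
                   [10, 10, 9], [6, 5, 6], [4, 5, 5], [8, 8, 9]], dtype=float)
print(f"ICC(2,1) = {icc_2_1(scores):.3f}")
```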

536 citations


Book
19 Aug 2005
TL;DR: This monograph is the first book that gives a comprehensive description of the universal generating function technique and its applications in binary and multi-state system reliability analysis.
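
Since this entry has no abstract, here is a minimal sketch of the universal generating function idea the monograph covers: each multi-state component is a probability mass over performance levels, and the system u-function is built by pairwise composition with structure-dependent operators (capacities add in parallel, min in series). The component data are invented.

```python
from collections import defaultdict
from itertools import product

def compose(u, v, op):
    """Compose two u-functions, each a dict {performance level: probability}."""
    out = defaultdict(float)
    for (x, p), (y, q) in product(u.items(), v.items()):
        out[op(x, y)] += p * q
    return dict(out)

line1 = {0: 0.10, 50: 0.20, 100: 0.70}
line2 = {0: 0.05, 80: 0.95}
feeder = {0: 0.02, 120: 0.98}

parallel = compose(line1, line2, lambda a, b: a + b)  # capacities add
system = compose(parallel, feeder, min)               # series: bottleneck

demand = 100
reliability = sum(p for cap, p in system.items() if cap >= demand)
print(f"P(capacity >= {demand}) = {reliability:.4f}")
```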

502 citations


Journal ArticleDOI
TL;DR: Evaluations of two recent road pricing demonstrations in southern California provide particularly useful opportunities for measuring commuters' values of time and reliability, and both sets of studies find that the value of time saved on the morning commute is quite high and reliability is also valued quite highly.
Abstract: This paper compares results from evaluations of two recent road pricing demonstrations in southern California. These projects provide particularly useful opportunities for measuring commuters’ values of time and reliability. Unlike most revealed preference studies of value of time, the choice to pay to use the toll facilities in these demonstrations is relatively independent from other travel choices such as whether to use public transit. Unlike most stated preference studies, the scenarios presented in these surveys are real ones that travelers have faced or know about from media coverage. By combining revealed and stated preference data, some of the studies have obtained enough independent variation in variables to disentangle effects of cost, time, and reliability, while still grounding the results in real behavior. Both sets of studies find that the value of time saved on the morning commute is quite high (between $20 and $40 per hour) when based on revealed behavior, and less than half that amount when based on hypothetical behavior. When satisfactorily identified, reliability is also valued quite highly. There is substantial heterogeneity in these values across the population, but it is difficult to isolate its exact origins.
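
A hedged sketch of how such values are typically recovered from choice data: fit a binary logit to toll-lane choices and take ratios of coefficients. The data below are simulated from assumed preference parameters, so the recovered dollars-per-hour figures only illustrate the arithmetic, not the studies' estimates.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 5000
# Hypothetical attributes of the tolled alternative
cost = rng.uniform(1, 8, n)          # toll ($)
time_saved = rng.uniform(5, 25, n)   # minutes saved
rel_gain = rng.uniform(0, 10, n)     # minutes of unreliability avoided

beta_true = np.array([-0.4, 0.2, 0.15])   # assumed cost/time/reliability tastes
X = np.column_stack([cost, time_saved, rel_gain])
choice = (X @ beta_true + rng.logistic(size=n) > 0).astype(float)

def neg_loglik(beta):
    v = X @ beta
    return -np.sum(choice * v - np.log1p(np.exp(v)))

b_cost, b_time, b_rel = minimize(neg_loglik, np.zeros(3), method="BFGS").x
print(f"value of time:        ${60 * b_time / -b_cost:.1f}/h")
print(f"value of reliability: ${60 * b_rel / -b_cost:.1f}/h")
```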

475 citations



Journal ArticleDOI
TL;DR: An assessment of the proposed methodology indicates that its adoption in MIS research would greatly improve the rigor of construct development projects and its performance when compared to a number of prominent standards for assessing construct development research.
Abstract: This paper presents a comprehensive methodology for developing constructs in MIS research. It is applicable to both individual and organizational levels of analysis, depending on the nature of the concept under study. The methodology is presented as a research guide progressing through three stages: (1) domain definition, (2) instrument construction, and (3) evaluation of measurement properties. The methodology addresses six key measurement properties (content validity, factorial validity, reliability, convergent validity, discriminant validity, nomological validity), which are discussed in detail. An assessment of the proposed methodology indicates that its adoption in MIS research would greatly improve the rigor of construct development projects. This is evidenced by the wide range of quality publications that have used its techniques and its performance when compared to a number of prominent standards for assessing construct development research.
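
Of the six measurement properties listed, reliability is the simplest to illustrate in code; a minimal Cronbach's alpha over invented Likert responses.

```python
import numpy as np

def cronbach_alpha(items):
    """items: (n respondents x k items) matrix for a single construct."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

# Hypothetical 5-point Likert responses: 6 respondents x 4 items
X = np.array([[4, 5, 4, 4], [2, 2, 3, 2], [5, 4, 5, 5],
              [3, 3, 3, 4], [4, 4, 5, 4], [1, 2, 1, 2]], dtype=float)
print(f"alpha = {cronbach_alpha(X):.3f}")
```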

443 citations



Proceedings ArticleDOI
15 Aug 2005
TL;DR: It is found that the t-test is highly reliable (more so than the sign or Wilcoxon test), and is far more reliable than simply showing a large percentage difference in effectiveness measures between IR systems.
Abstract: The effectiveness of information retrieval systems is measured by comparing performance on a common set of queries and documents. Significance tests are often used to evaluate the reliability of such comparisons. Previous work has examined such tests, but produced results with limited application. Other work established an alternative benchmark for significance, but the resulting test was too stringent. In this paper, we revisit the question of how such tests should be used. We find that the t-test is highly reliable (more so than the sign or Wilcoxon test), and is far more reliable than simply showing a large percentage difference in effectiveness measures between IR systems. Our results show that past empirical work on significance tests over-estimated the error of such tests. We also reconsider comparisons between the reliability of precision at rank 10 and mean average precision, arguing that past comparisons did not consider the assessor effort required to compute such measures. This investigation shows that assessor effort would be better spent building test collections with more topics, each assessed in less detail.
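
A minimal sketch of the three tests being compared, applied to hypothetical per-topic scores for two systems: scipy's paired t-test and Wilcoxon signed-rank test, plus a sign test via a binomial test on the direction of the differences.

```python
import numpy as np
from scipy.stats import ttest_rel, wilcoxon, binomtest

rng = np.random.default_rng(3)
# Hypothetical average-precision scores for two IR systems on 50 topics
sys_a = np.clip(rng.normal(0.30, 0.12, 50), 0, 1)
sys_b = np.clip(sys_a + rng.normal(0.02, 0.05, 50), 0, 1)  # b slightly better

diff = sys_b - sys_a
print(f"paired t-test p = {ttest_rel(sys_b, sys_a).pvalue:.4f}")
print(f"Wilcoxon      p = {wilcoxon(sys_b, sys_a).pvalue:.4f}")
print(f"sign test     p = {binomtest((diff > 0).sum(), (diff != 0).sum()).pvalue:.4f}")
```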


Journal ArticleDOI
TL;DR: In this article, a functional relationship between failure rate and maintenance measures has been developed for a cable component and the results show the value of using a systematic quantitative approach for investigating the effect of different maintenance strategies.
Abstract: This paper proposes a method for comparing the effect of different maintenance strategies on system reliability and cost. This method relates reliability theory with the experience gained from statistics and practical knowledge of component failures and maintenance measures. The approach has been applied to rural and urban distribution systems. In particular, a functional relationship between failure rate and maintenance measures has been developed for a cable component. The results show the value of using a systematic quantitative approach for investigating the effect of different maintenance strategies.
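
The paper's failure-rate-vs-maintenance relationship is fitted from statistics and practical experience; as a stand-in, here is a toy model with an assumed exponential form and invented costs, showing how expected annual cost trades preventive maintenance against failures.

```python
import numpy as np

# Illustrative-only cable-component model; form and numbers are assumptions.
LAM0 = 0.4          # failures/year with no preventive maintenance
K = 0.5             # assumed maintenance effectiveness
C_MAINT = 2000.0    # $/year per unit of maintenance effort
C_FAIL = 30000.0    # expected $ per failure (repair plus interruption)

def annual_cost(m):
    lam = LAM0 * np.exp(-K * m)   # failure rate as a function of maintenance level
    return m * C_MAINT + lam * C_FAIL, lam

for m in range(5):
    cost, lam = annual_cost(m)
    print(f"maintenance level {m}: lambda = {lam:.3f}/yr, expected cost = ${cost:,.0f}")
```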

Journal ArticleDOI
TL;DR: There was evidence that the BI might be less reliable in patients with cognitive impairment and when scores obtained by patient interview are compared with patient testing, and there remain important uncertainties concerning its reliability when used with older people.
Abstract: Objective: the Barthel Index (BI) has been recommended for the functional assessment of older people but the reliability of the measure for this patient group is uncertain. To investigate this issue we undertook a systematic review to identify relevant studies from which an overview is presented. Method: studies investigating the reliability of the BI were obtained by searching Medline, Cinahl and Embase to January 2003. Screening for potentially relevant papers and data extraction of the studies meeting the inclusion criteria were carried out independently by two researchers. Results: the scope of the 12 studies identified included all the common clinical settings relevant to older people. No study investigated test-retest reliability. Inter-rater reliability was reported as 'fair' to 'moderate' agreement for individual BI items, and a high percentage agreement for the total BI score. However, these findings were difficult to interpret as few studies reported the prevalence of the disability categories for the study populations. There may be considerable inter-observer disagreement (95% CI of +/-4 points). There was evidence that the BI might be less reliable in patients with cognitive impairment and when scores obtained by patient interview are compared with patient testing. The role of assessor training and/or guidelines on the reliability of the BI has not been investigated. Conclusions: although the BI is highly recommended, there remain important uncertainties concerning its reliability when used with older people. Further studies are justified to investigate this issue.

Proceedings ArticleDOI
Kinam Kim, Su Jin Ahn
17 Apr 2005
TL;DR: The reliability issues for high-density commercial memory products such as disturbance immunity, endurance, and data retention are addressed and evaluated by using a 64 Mb PRAM with 0.12 µm technology.
Abstract: In this paper, PRAM (phase-change memory), exploiting new memory materials called chalcogenides, is introduced. The reliability issues for high-density commercial memory products such as disturbance immunity, endurance, and data retention are addressed and evaluated by using a 64 Mb PRAM with 0.12 µm technology. Moreover, observed degradation modes and underlying physical mechanisms are investigated.

Journal ArticleDOI
TL;DR: In this article, an enriched performance measure approach is presented for reliability-based design optimization to substantially improve computational efficiency when applied to large-scale applications, where the authors show that deterministic design optimization helps improve numerical efficiency by reducing some reliability-based design optimization iterations.
Abstract: An enriched performance measure approach is presented for reliability-based design optimization to substantially improve computational efficiency when applied to large-scale applications. In the enriched performance measure approach, four improvements are made over the original performance measure approach: launching reliability-based design optimization from a deterministic optimum design, a new enhanced hybrid-mean value method, an efficient probabilistic feasibility check, and a fast reliability analysis under the condition of design closeness. It is found that deterministic design optimization helps improve numerical efficiency by reducing some reliability-based design optimization iterations. In reliability-based design optimization, the computational burden of the feasibility check of constraints can be significantly reduced by using a mean value first-order method and by carrying out the refined reliability analysis using the enhanced hybrid-mean value method for ε-active and violated constraints. The enhanced hybrid-mean value method is developed to handle nonlinear and/or nonmonotonic constraints in reliability analysis. The fast reliability analysis method is proposed to efficiently evaluate probabilistic constraints under the condition of design closeness. Moreover, two numerical examples are provided to compare the enriched performance measure approach to existing reliability-based design optimization methods from a numerical efficiency and stability point of view.
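
For independent normal inputs, the mean value first-order screen mentioned above reduces to beta = g(mu) / sqrt(sum((dg/dx_i * sigma_i)^2)). A minimal sketch with an invented limit state follows; this is the generic screen, not the paper's enhanced hybrid-mean value method.

```python
import numpy as np

def mean_value_beta(g, mu, sigma, h=1e-6):
    """Mean-value first-order reliability index for limit state g(x) > 0
    with independent normal variables (a quick feasibility screen)."""
    mu = np.asarray(mu, float)
    grad = np.array([(g(mu + h * e) - g(mu - h * e)) / (2 * h)
                     for e in np.eye(len(mu))])
    return g(mu) / np.sqrt(np.sum((grad * np.asarray(sigma)) ** 2))

# Hypothetical limit state: failure when load * geometry factor exceeds strength
g = lambda x: x[0] - x[1] * x[2]
beta = mean_value_beta(g, mu=[300.0, 20.0, 10.0], sigma=[30.0, 3.0, 0.5])
print(f"mean-value first-order reliability index beta = {beta:.2f}")
```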

Journal ArticleDOI
TL;DR: The reliability of electric transmission systems is examined using a scale-free model of network topology and failure propagation, and the results suggest that scale- free network models are usable to estimate aggregate electric grid reliability.
Abstract: The reliability of electric transmission systems is examined using a scale-free model of network topology and failure propagation. The topologies of the North American eastern and western electric grids are analyzed to estimate their reliability based on the Barabasi–Albert network model. A commonly used power system reliability index is computed using a simple failure propagation model. The results are compared to the values of power system reliability indices previously obtained using standard power engineering methods, and they suggest that scale-free network models are usable to estimate aggregate electric grid reliability.
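
A hedged sketch in the same spirit using networkx: build a Barabási-Albert topology, knock out random nodes, and score the fraction of demand left connected to the giant component. The failure model and the 5% outage rate are invented simplifications, not the paper's propagation model or reliability index.

```python
import networkx as nx
import numpy as np

rng = np.random.default_rng(4)
G = nx.barabasi_albert_graph(n=300, m=2, seed=4)  # scale-free topology
demand = {v: 1.0 for v in G}                      # unit load at every bus

def served_fraction(G, failed):
    """Fraction of demand still attached to the largest surviving component."""
    H = G.copy()
    H.remove_nodes_from(failed)
    if H.number_of_nodes() == 0:
        return 0.0
    giant = max(nx.connected_components(H), key=len)
    return sum(demand[v] for v in giant) / sum(demand.values())

losses = []
for _ in range(2000):
    failed = [v for v in G if rng.random() < 0.05]  # 5% random node outages
    losses.append(1 - served_fraction(G, failed))
print(f"mean fractional load loss: {np.mean(losses):.4f}")
```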

Journal ArticleDOI
TL;DR: It is suggested that it is the research procedure itself that is unreliable, and this lack of reliability may strongly contribute to the lack of convergence in empirical studies on software prediction models.
Abstract: Empirical studies on software prediction models do not converge with respect to the question "which prediction model is best?" The reason for this lack of convergence is poorly understood. In this simulation study, we have examined a frequently used research procedure comprising three main ingredients: a single data sample, an accuracy indicator, and cross validation. Typically, these empirical studies compare a machine learning model with a regression model. In our study, we use simulation to compare a machine learning model with a regression model. The results suggest that it is the research procedure itself that is unreliable. This lack of reliability may strongly contribute to the lack of convergence. Our findings thus cast some doubt on the conclusions of any study of competing software prediction models that used this research procedure as a basis of model comparison. Thus, we need to develop more reliable research procedures before we can have confidence in the conclusions of comparative studies of software prediction models.
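
A small simulation in the spirit of the study, under invented settings: the data come from a fixed linear process, yet repeating the standard procedure (one sample, one accuracy indicator, cross validation) can make the "winning" model flip between k-NN and linear regression across replications.

```python
import numpy as np

rng = np.random.default_rng(1)

def knn_predict(xtr, ytr, xte, k=5):
    """Plain k-nearest-neighbour regression in one dimension."""
    d = np.abs(xte[:, None] - xtr[None, :])
    idx = np.argsort(d, axis=1)[:, :k]
    return ytr[idx].mean(axis=1)

def linreg(xtr, ytr, xte):
    b1, b0 = np.polyfit(xtr, ytr, 1)
    return b0 + b1 * xte

def cv_mae(x, y, model, folds=10):
    idx = rng.permutation(len(x))
    errs = []
    for f in np.array_split(idx, folds):
        tr = np.setdiff1d(idx, f)
        errs.append(np.abs(model(x[tr], y[tr], x[f]) - y[f]).mean())
    return np.mean(errs)

wins, trials = 0, 200
for _ in range(trials):
    x = rng.uniform(0, 10, 60)          # one small "project data set"
    y = 2 * x + rng.normal(0, 4, 60)    # truly linear process, noisy
    if cv_mae(x, y, knn_predict) < cv_mae(x, y, linreg):
        wins += 1
print(f"k-NN judged 'best' in {wins}/{trials} replications")
```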

Journal ArticleDOI
TL;DR: A Monte-Carlo (MC) simulation methodology for estimating the reliability of a multi-state network is presented, and it is discussed that the MC approach consistently yields accurate results while the accuracy of the bounding methodologies can be dependent on components that have considerable impact on the system design.
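
Since only the summary is shown, here is a minimal Monte-Carlo sketch of the kind of estimate involved, on an invented three-line network whose component capacities are multi-state: two parallel lines feed one outgoing line, and reliability is the probability the max s-t flow meets demand.

```python
import numpy as np

rng = np.random.default_rng(5)

# (possible capacities, probabilities) per component; all values invented
states = {
    "line1": ([0, 50, 100], [0.05, 0.15, 0.80]),
    "line2": ([0, 50, 100], [0.10, 0.20, 0.70]),
    "line3": ([0, 100, 150], [0.02, 0.18, 0.80]),
}
DEMAND = 100

def sample_capacity(name):
    caps, probs = states[name]
    return rng.choice(caps, p=probs)

N, ok = 100_000, 0
for _ in range(N):
    # max s-t flow = min(parallel pair's total, series line's capacity)
    flow = min(sample_capacity("line1") + sample_capacity("line2"),
               sample_capacity("line3"))
    ok += flow >= DEMAND
p = ok / N
se = np.sqrt(p * (1 - p) / N)
print(f"P(demand met) = {p:.4f} +/- {1.96 * se:.4f}")
```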

Journal ArticleDOI
TL;DR: Although the author has achieved the purposes he sets out in Chapter 1, I can recommend this book only to experienced statisticians and analysts who either have access to SAS EM or are trying to justify its purchase to their management.
Abstract: Several of the logic structures and formats used leave much to be desired. Part I leaves the reader with only the barest explanation of which methods apply to which problems. There are rather large leaps in mathematical symbolism that spans algebra, integral calculus, matrices, sets, and directed graphs. Questions regarding the details of computation, code, algorithms, mathematics, and statistics are directed to references. This tactic would be appealing had the citations referred to specific chapters and pages. Another frequent irritation is that much terminology is introduced without definition and many terms are not explained until subsequent chapters. References to case studies in the methods section are few, and the case studies themselves do not obviously point back to the methods section that they support. Consequently, it is easy to get lost in tangential concepts, which could have been avoided with a more sequential presentation. In Part II, each case is presented according to the following outline: Objectives, description of the data, EDA, model building, model comparison, and summary report. Only the models change between cases. The individual cases are well organized, and the structure of the presentation will appeal to any scientifically inclined reader. My only serious criticism of Part II is that the defense of the “best” method chosen often seems highly subjective, especially when computational methods are being evaluated. Although the author has achieved the purposes he sets out in Chapter 1, I can recommend this book only to experienced statisticians and analysts who either have access to SAS EM or are trying to justify its purchase to their management. It might also be of value to users trying to assemble a set of DM procedures, because SAS has clearly defined the state of the art in DM packages. Those interested in the do-it-yourself SAS approach may wish to consult the texts by Fernandez (2003) and Rud (2001), reported in Technometrics by Caby (2004) and Ziegel (2002).

Journal ArticleDOI
TL;DR: It was determined that reliability coefficients for HRV measures were highly varied, and although no single HRV measurement appeared less reliable than another, there was evidence that optimal data collection conditions for specific frequency domain measures exist.

Journal ArticleDOI
TL;DR: In this paper, the authors proposed an approach based on consideration of an augmented reliability problem, where the design parameters are artificially considered as uncertain and the desired information about reliability sensitivity can be extracted through failure analysis of the augmented problem.
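
Since only the summary is shown, here is a hedged sketch of the idea as described: make the design parameter artificially random, run one Monte Carlo pass on the augmented problem, and read off P(failure | design) via Bayes' rule from the failure samples. The limit state and distributions below are invented.

```python
import numpy as np

rng = np.random.default_rng(9)
N = 500_000

# Augmented problem: design parameter d (e.g., a member size) is artificially
# uniform on [1, 3]; X is the genuinely random load.
d = rng.uniform(1, 3, N)
X = rng.normal(4, 1.5, N)
fail = X > 3 * d                      # stand-in failure criterion

# Bayes: P(F | d) = P(F) * p(d | F) / p(d); estimate p(d | F) by histogram.
edges = np.linspace(1, 3, 21)
hist_f, _ = np.histogram(d[fail], bins=edges, density=True)
p_f = fail.mean()
centers = 0.5 * (edges[:-1] + edges[1:])
for c, h in zip(centers[::4], hist_f[::4]):
    print(f"d = {c:.2f}: P(failure | d) ~ {p_f * h / 0.5:.4f}")  # p(d) = 1/2
```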

Journal ArticleDOI
TL;DR: In this article, a new response surface called ADAPRES is proposed, in which a weighted regression method is applied in place of normal regression, and the experimental points are also selected from the region where the design point is most likely to exist.
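
A minimal sketch of the two ingredients named in the summary, under invented functions: a quadratic response surface fitted by weighted least squares, where points whose limit-state value is near zero (where the design point is expected to lie) receive the most weight. The weighting scheme here is illustrative, not ADAPRES's actual formula.

```python
import numpy as np

rng = np.random.default_rng(7)
# Stand-in (normally expensive) limit-state function; failure surface is g = 0.
g = lambda X: 3 - X[:, 0] ** 2 - 0.5 * X[:, 1]

X = rng.normal(0, 1.5, size=(40, 2))   # experimental points
y = g(X)

# Down-weight points far from g = 0, where the design point is unlikely to be.
w = np.exp(-np.abs(y) / np.abs(y).mean())
sw = np.sqrt(w)[:, None]

# Quadratic response surface without cross terms: 1, x1, x2, x1^2, x2^2
A = np.column_stack([np.ones(len(X)), X, X ** 2])
coef, *_ = np.linalg.lstsq(A * sw, y * sw.ravel(), rcond=None)
print("weighted response-surface coefficients:", np.round(coef, 3))
```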

01 Jan 2005
TL;DR: In this paper, the authors analyse the components leading to the effectiveness of high reliability organizations by examining the teams that comprise them, and propose guidelines and developmental strategies that will help the healthcare community shift more quickly to high reliability status by not focusing solely on the organizational level.
Abstract: Many organizations have been using teams as a means of achieving organizational outcomes (such as productivity and safety). Research has indicated that teams, especially those operating in complex environments, are not always effective. There is a subset of organizations in which teams operate that are able to balance effectiveness and safety despite the complexities of the environment (for example, aviation, nuclear power). These high reliability organizations (HROs) have begun to be examined as a model for those in other complex domains, such as health care, that strive to reach a status of high reliability. In this paper we analyse the components leading to the effectiveness of HROs by examining the teams that comprise them. We use a systems perspective to uncover the behavioral markers by which high reliability teams (HRTs) are able to uphold the values of their parent organizations, thereby promoting safety. Using these markers, we offer guidelines and developmental strategies that will help the healthcare community to shift more quickly to high reliability status by not focusing solely on the organizational level.

Journal ArticleDOI
TL;DR: In this article, the authors investigated the application of multiobjective evolutionary algorithms to the identification of the payoff characteristic between total cost and reliability of a water distribution system using the well-known "AnyTown" network as an example.
Abstract: This paper investigates the application of multiobjective evolutionary algorithms to the identification of the payoff characteristic between total cost and reliability of a water distribution system using the well-known "Anytown" network as an example. An expanded rehabilitation problem is considered where the design variables are the pipe rehabilitation decisions, tank sizing, tank siting, and pump operation schedules. To provide flexibility, the network is designed and operated under multiple loading conditions. Inclusion of pump operation schedules requires consideration of water system operation over an extended period. The cost of the solution includes the capital costs of pipes and tanks as well as the present value of the energy consumed during a specified period. Optimization tends to reduce costs by reducing the diameter of, or completely eliminating, some pipes, thus leaving the system with insufficient capacity to respond to pipe breaks or demands that exceed design values without violating required performance levels. A resilience index is considered as a second objective to increase the hydraulic reliability and availability of water during pipe failures. Sensitivity analysis of solutions on the payoff curve generated by twin-objective optimization shows poor performance of these networks under random pipe failure or a pump being out of service. The minimum surplus head is added as a third objective to overcome the shortcomings of the resilience index. Results are presented for the payoff characteristics between total cost and reliability for 24 h design and five loading conditions.
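
The multiobjective machinery itself (evolutionary search plus hydraulic simulation) is beyond a snippet, but the payoff-curve notion reduces to non-dominated filtering; a minimal sketch over invented (cost, reliability-shortfall) pairs, both minimized.

```python
import numpy as np

rng = np.random.default_rng(6)
# Hypothetical candidate designs scored on (total cost, reliability shortfall)
designs = rng.uniform(size=(200, 2))

def pareto_front(points):
    """Indices of non-dominated points (minimization in every column)."""
    keep = []
    for i, p in enumerate(points):
        dominated = np.any(np.all(points <= p, axis=1) &
                           np.any(points < p, axis=1))
        if not dominated:
            keep.append(i)
    return keep

front = pareto_front(designs)
for i in sorted(front, key=lambda i: designs[i, 0]):
    print(f"cost = {designs[i, 0]:.3f}, shortfall = {designs[i, 1]:.3f}")
```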

Journal ArticleDOI
TL;DR: The model can be used not only to determine the reliability of the degraded systems in the context of multi-state functions, but also to obtain the states of the systems by calculating the system state probabilities.
Abstract: In this paper, we develop a generalized multi-state degraded system reliability model subject to multiple competing failure processes, including two degradation processes and random shocks. The operating condition of the multi-state systems is characterized by a finite number of states. We also present a methodology to generate the system states when there are multiple failure processes. The model can be used not only to determine the reliability of the degraded systems in the context of multi-state functions, but also to obtain the states of the systems by calculating the system state probabilities. Several numerical examples are given to illustrate the concepts.
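
Only the state-probability bookkeeping can be sketched briefly: a hedged discrete-time Markov example with an invented transition matrix, where both gradual degradation and shocks move the system toward an absorbing failed state.

```python
import numpy as np

# States 0 (as-new) .. 3 (failed); per-step transition matrix is assumed.
P = np.array([
    [0.90, 0.07, 0.02, 0.01],   # gradual wear plus a small shock jump
    [0.00, 0.88, 0.09, 0.03],
    [0.00, 0.00, 0.85, 0.15],
    [0.00, 0.00, 0.00, 1.00],   # failure is absorbing
])

pi = np.array([1.0, 0.0, 0.0, 0.0])   # start as-new
for t in (10, 50, 100):
    dist = pi @ np.linalg.matrix_power(P, t)
    print(f"t={t:>3}: state probabilities {np.round(dist, 3)}, "
          f"reliability (not failed) = {1 - dist[-1]:.3f}")
```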

Journal ArticleDOI
TL;DR: In this article, the authors presented three artificial neural network (ANN)-based reliability analysis methods, i.e., ANN-based Monte-Carlo simulation (MCS), the first-order reliability method (FORM), and the second-order reliability method (SORM), for structural safety evaluation.
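
A hedged sketch of the first variant (ANN surrogate plus Monte Carlo) with an invented limit state, using scikit-learn's MLPRegressor as the network; FORM/SORM on the surrogate would follow the same pattern but need gradient machinery.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(8)
g = lambda X: 5 - X[:, 0] ** 2 - X[:, 1]   # stand-in limit state (failure: g < 0)

# Train a small ANN surrogate on a modest design of experiments
X_train = rng.normal(0, 2, size=(400, 2))
ann = MLPRegressor(hidden_layer_sizes=(20, 20), max_iter=5000, random_state=0)
ann.fit(X_train, g(X_train))

# ANN-based Monte Carlo: the cheap surrogate replaces the expensive model
X_mc = rng.normal(0, 1, size=(100_000, 2))
pf = np.mean(ann.predict(X_mc) < 0)
print(f"estimated failure probability: {pf:.4f} "
      f"(direct evaluation gives {np.mean(g(X_mc) < 0):.4f})")
```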

Book
01 Jan 2005
TL;DR: This book discusses guidelines for identifying and correcting for error in measure development, generic issues in designing psychometric tests, and the implications of measurement error for research design, structural equation modeling, and applied research.
Abstract: Foreword (Richard Bagozzi); Preface; Acknowledgments
1. What Is Measurement? (Overview; What Is Measurement Error?; Overview of Traditional Measure Development Procedures; Conceptual and Operational Definitions; Domain Delineation; Measure Design and Item Generation; Internal Consistency Reliability; Test-Retest Reliability; Dimensionality - Exploratory Factor Analysis; Dimensionality - Confirmatory Factor Analysis and Structural Equation Modeling; Validity; General Issues in Measurement; Summary; Appendices)
2. What Is Measurement Error? (Overview; Random Error; Systematic Error; Types of Random and Systematic Error; Illustrations of Measurement Error Through Error Patterns; Patterns of Responses in Measurement Error; Summary; Appendix)
3. What Causes Measurement Error? (Overview; Sources of Measurement Error; Taxonomy of Error Sources; Summary)
4. Can Empirical Procedures Pinpoint Types of Measurement Error? (Overview; Internal Consistency Reliability Procedures; Test-Retest Reliability Procedures; Factor Analysis Procedures; Validity Tests; Summary)
5. How Can Measurement Error Be Identified and Corrected for in Measure Development? (Overview; Guidelines for Identifying and Correcting for Error in Measure Development; Generic Issues in Designing Psychometric Tests; Item-to-Total Correlations (Internal Consistency Procedures); Item Means; Test-Retest Correlations (Test-Retest Reliability); Factor Loadings (Exploratory Factor Analysis); Residuals (Confirmatory Factor Analysis); Cross-Construct Correlations (Validity Tests); Conditions of Future Use of Measures; Discussion; Summary)
6. How Can Error Be Identified Through Innovative Design and Analyses? (Overview; Using Internal Consistency and Test-Retest Reliability in Conjunction; Using Correlations Across Item-Level Correlations; Empirical Assessment of Item-Sequencing Effects; Summary)
7. How Do Measures Differ? (Overview; Stimulus-Centered Versus Respondent-Centered Scales; Formative and Reflective Indicators of Constructs; Summary)
8. What Are Examples of Measures and Measurement Across Various Disciplines? (Overview; Types of Measures; Types of Response Formats; Specific Examples of Scales From Different Disciplines; Cross-Cultural Measurement; Summary)
9. What Are the Implications of Understanding Measurement Error for Research Design and Analysis? (Overview; Implications for Using Measures in Research Design; Implications for Using Structural Equation Modeling; Implications for Applied Research; Summary)
10. How Does Measurement Error Affect Research Design? (Overview; Types of Research Designs; Measurement Error in Survey Designs; Measurement Error in Experimental Designs; Research Design and Measurement Error; Summary; Appendices)
11. What Is the Role of Measurement in Science? (Overview; Assumptions of Measurement; Qualitative Versus Quantitative Research; Measuring the "Measurable"; From Physical to Psychological Measurement; Informal Measurement; Ethics in Measurement; Summary)
12. What Are the Key Principles and Guiding Orientations of This Book? (Overview; Summary of Chapters; Implications for Measurement and Research Design; Summary of Orientations)
References; Index; About the Author

Journal ArticleDOI
TL;DR: In this article, the authors proposed a probabilistic branch and bound method for solving the transmission system expansion planning problem subject to practical future uncertainties, which minimizes the investment budget for constructing new transmission lines subject to the uncertainties of transmission system elements.
Abstract: This paper proposes a method for choosing the best transmission system expansion plan considering a probabilistic reliability criterion (LOLE). The method minimizes the investment budget for constructing new transmission lines subject to probabilistic reliability criteria, which consider the uncertainties of transmission system elements. Two probabilistic reliability criteria are used as constraints. One is a transmission system reliability criterion (LOLE_TS) constraint, and the other is a bus/nodal reliability criterion (LOLE_Bus) constraint. The proposed method models the transmission system expansion problem as an integer programming problem. It solves for the optimal strategy using a probabilistic branch and bound method that utilizes a network flow approach and the maximum flow-minimum cut set theorem. Test results on an existing 21-bus system are included in the paper. They demonstrate the suitability of the proposed method for solving the transmission system expansion planning problem subject to practical future uncertainties.
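
The network flow ingredient can be sketched on a toy system: the max-flow min-cut check tells you how much load a candidate expansion lets the grid deliver. The 4-bus layout and line ratings below are invented, and this ignores the paper's probabilistic criteria and branch and bound search.

```python
import networkx as nx

def peak_transfer(with_new_line):
    """Max deliverable power from generation bus g to load bus d."""
    G = nx.DiGraph()
    G.add_edge("g", "1", capacity=150)
    G.add_edge("g", "2", capacity=40)
    G.add_edge("1", "d", capacity=90)
    G.add_edge("2", "d", capacity=100)
    if with_new_line:
        G.add_edge("1", "2", capacity=60)   # candidate expansion line
    value, _ = nx.maximum_flow(G, "g", "d")
    return value

print("max deliverable power without new line:", peak_transfer(False))
print("max deliverable power with new line:   ", peak_transfer(True))
```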