Showing papers in "IEEE Transactions on Reliability in 1993"


Journal ArticleDOI
TL;DR: In this article, a simple generalization of the Weibull distribution is presented, which is well suited for modeling bathtub failure rate lifetime data and for testing goodness-of-fit of the Weibull and negative exponential models as subhypotheses.
Abstract: A simple generalization of the Weibull distribution is presented. The distribution is well suited for modeling bathtub failure rate lifetime data and for testing goodness-of-fit of the Weibull and negative exponential models as subhypotheses.

1,028 citations
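
As a rough illustration of this generalization (assuming the usual exponentiated-Weibull parameterization F(t) = (1 - exp(-(t/sigma)^beta))^theta; all parameter values below are illustrative), a short Python sketch shows the hazard dipping and then rising, i.e. a bathtub shape:

```python
import numpy as np

# Exponentiated-Weibull CDF (assumed parameterization):
#   F(t) = (1 - exp(-(t/sigma)**beta))**theta
def cdf(t, beta, theta, sigma=1.0):
    return (1.0 - np.exp(-(t / sigma) ** beta)) ** theta

def hazard(t, beta, theta, sigma=1.0, eps=1e-6):
    # numerical hazard h(t) = f(t) / (1 - F(t)) via a central difference
    f = (cdf(t + eps, beta, theta, sigma) - cdf(t - eps, beta, theta, sigma)) / (2 * eps)
    return f / (1.0 - cdf(t, beta, theta, sigma))

t = np.linspace(0.05, 1.5, 60)
h = hazard(t, beta=4.0, theta=0.1)   # beta > 1 with beta*theta < 1: bathtub shape
print("bathtub (high-low-high):", h[0] > h.min() < h[-1])
```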


Journal ArticleDOI
TL;DR: This work presents an algorithm to generate all d-MCs from each MC for each system capacity level d, and compares it with the algorithm given by J. Xue (1985).
Abstract: Many systems can be regarded as flow networks whose arcs have discrete and multi-valued random capacities. The probability of the maximum flow at each level and the reliability of such a flow network can be calculated in terms of K-lattices, which are generated from each subset of the family of all MCs (minimal cutsets). However, the size of such a family, 2^m - 1 (m = number of MCs), grows exponentially with m. Such a flow network can be considered as a multistate system with multistate components, so that its reliability can be evaluated in terms of upper boundary points of each level d (named d-MCs here). This work presents an algorithm to generate all d-MCs from each MC for each system capacity level d. The new algorithm is analyzed and compared with the algorithm given by J. Xue (1985). Examples show how all d-MCs are generated; the reliability of one example is computed.

231 citations
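
A toy sketch of the generation step only (the paper's algorithm additionally verifies candidates against all cuts; arc names, capacity levels, and the demand level here are hypothetical): for one minimal cut, enumerate capacity vectors whose total equals the level d.

```python
from itertools import product

# Candidate d-MC vectors for a single minimal cut (illustrative sketch).
def candidate_dMCs(cut_levels, d):
    """cut_levels: {arc: list of discrete capacity levels}; yield vectors
    whose total capacity over the cut equals d."""
    arcs = list(cut_levels)
    for combo in product(*(cut_levels[a] for a in arcs)):
        if sum(combo) == d:
            yield dict(zip(arcs, combo))

levels = {"a1": [0, 1, 2], "a2": [0, 1, 3]}   # hypothetical arc capacities
for x in candidate_dMCs(levels, d=3):
    print(x)
```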


Journal ArticleDOI
TL;DR: A software-reliability growth model incorporating the amount of test effort expended during the software testing phase is developed and the method of data analysis for software reliability measurement is developed.
Abstract: Software reliability measurement during the testing phase is essential for examining the degree of quality or reliability of a developed software system. A software-reliability growth model incorporating the amount of test effort expended during the software testing phase is developed. The time-dependent behavior of test-effort expenditures is described by a Weibull curve. Assuming that the error-detection rate per unit of test effort spent during the testing phase is proportional to the current error content, the model is formulated as a nonhomogeneous Poisson process. Using the model, a method of data analysis for software reliability measurement is developed. The model is applied to the prediction of additional test-effort expenditures needed to achieve the objective number of errors detected by software testing, and to the determination of the optimum time to stop software testing for release.

193 citations
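
A minimal sketch of this model class (assuming the common form in which cumulative test effort W(t) follows a Weibull curve and the NHPP mean value function is m(t) = a(1 - exp(-r W(t))); all parameter values are hypothetical):

```python
import numpy as np

# Cumulative Weibull-curve test effort and the resulting NHPP mean value
# function for expected errors detected by time t.
def W(t, alpha, beta, gamma):
    return alpha * (1.0 - np.exp(-beta * t ** gamma))

def m(t, a, r, alpha, beta, gamma):
    return a * (1.0 - np.exp(-r * W(t, alpha, beta, gamma)))

t = np.arange(0.0, 20.0)                        # testing weeks (hypothetical)
print(m(t, a=100, r=0.05, alpha=60, beta=0.02, gamma=1.8))
```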


Journal ArticleDOI
TL;DR: In this paper, a decomposition method based on branch and bound is used for solving the problem of network topological optimization with a reliability constraint, where the objective is to find the topological layout of links, at a minimal cost, under the constraint that the reliability is not less than a given level of system reliability.
Abstract: Network topological optimization with a reliability constraint is considered. The objective is to find the topological layout of links, at a minimal cost, under the constraint that the network reliability is not less than a given level of system reliability. A decomposition method, based on branch and bound, is used for solving the problem. In order to speed up the procedure, an upper bound on system reliability, in terms of node degrees, is applied. A numerical example illustrates the effectiveness of the method.

179 citations
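
Setting the branch-and-bound machinery aside, the problem statement can be made concrete by exhaustive search on a tiny graph (link costs, per-link reliability, and the reliability target below are hypothetical; all-terminal reliability is computed by brute-force state enumeration):

```python
from itertools import combinations, product

nodes = [0, 1, 2, 3]
links = {(0, 1): 2.0, (0, 2): 3.0, (1, 2): 1.5, (1, 3): 2.5, (2, 3): 2.0}
p, R_min = 0.9, 0.90                 # per-link reliability, reliability target

def connected(active):               # are all nodes connected by 'active' links?
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for a, b in active:
            v = b if a == u else a if b == u else None
            if v is not None and v not in seen:
                seen.add(v); stack.append(v)
    return len(seen) == len(nodes)

def all_terminal_reliability(chosen):
    r = 0.0
    for states in product([0, 1], repeat=len(chosen)):
        up = [l for l, s in zip(chosen, states) if s]
        if connected(up):
            r += p ** sum(states) * (1 - p) ** (len(chosen) - sum(states))
    return r

best = None
for k in range(len(nodes) - 1, len(links) + 1):
    for chosen in combinations(links, k):
        if all_terminal_reliability(chosen) >= R_min:
            cost = sum(links[l] for l in chosen)
            if best is None or cost < best[0]:
                best = (cost, chosen)
print(best)                          # cheapest layout meeting the constraint
```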


Journal ArticleDOI
TL;DR: In this paper, a general repairable system model that includes age and preventive maintenance dependent failure rates is proposed, which introduces the concept of system availability as a random variable, leading to availability measures for individual realizations, rather than for the average over many trials.
Abstract: A general repairable system model that includes age and preventive maintenance (PM) dependent failure rates is proposed. This model introduces the concept of system availability as a random variable, leading to availability measures for individual realizations, rather than for the average over many trials. Thus this model generalizes the classical definition, and serves as a performance index for system design. A design scheme is suggested to maximize the probability of achieving a specified stochastic cycle availability with respect to the duration of the operating interval between PMs. Numerical computations of the results for Weibull-like distributions illustrate the design criteria. Asymptotic behavior of the performance with no PM is also studied. Extension to other distributions is straightforward.

162 citations
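
A crude Monte-Carlo sketch of the idea of availability as a random variable (not the paper's model; the Weibull life, repair and PM durations, and the availability target are all hypothetical): per cycle, operate until PM at time T or until failure, and estimate P(A >= a0) as a function of T.

```python
import numpy as np

rng = np.random.default_rng(0)

def p_avail(T, a0=0.95, n=100_000):
    ttf = rng.weibull(2.0, n) * 100.0          # Weibull(shape 2, scale 100) lives
    up = np.minimum(ttf, T)                    # uptime within the cycle
    down = np.where(ttf < T, 8.0, 1.0)         # repair takes 8 h, PM takes 1 h
    A = up / (up + down)                       # per-realization availability
    return (A >= a0).mean()

for T in (20, 40, 80, 160):                    # candidate PM intervals
    print(T, round(p_avail(T), 3))
```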


Journal ArticleDOI
TL;DR: In this paper, a modification of the G-O model is proposed; the new model is applied to real software failure data and compared with the G-O and Jelinski-Moranda models.
Abstract: A stochastic model (G-O) for the software failure phenomenon based on a nonhomogeneous Poisson process (NHPP) was suggested by Goel and Okumoto (1979). This model has been widely used, but some important work remains undone on estimating the parameters. The authors present a necessary and sufficient condition for the likelihood estimates to be finite, positive, and unique. A modification of the G-O model is suggested. The performance measures and parametric inferences of the new model are discussed. The new model is applied to real software failure data and compared with the G-O and Jelinski-Moranda models.

143 citations
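
For the original G-O model, m(t) = a(1 - exp(-b t)), the maximum-likelihood equations reduce to a = n / (1 - exp(-b T)) plus a one-dimensional profile equation in b. A sketch with hypothetical failure times (using scipy; note that, as the paper shows, a root need not exist for every data set, which is exactly the finiteness issue studied):

```python
import numpy as np
from scipy.optimize import brentq

times = np.array([3., 8., 19., 32., 55., 86., 110., 140., 180., 240.])  # hypothetical
n, T = len(times), times[-1]

def g(b):   # profile likelihood equation for b after eliminating a
    return n / b - times.sum() - n * T * np.exp(-b * T) / (1 - np.exp(-b * T))

b_hat = brentq(g, 1e-6, 1.0)           # bracketing assumes a root exists here
a_hat = n / (1 - np.exp(-b_hat * T))   # expected total number of faults
print(a_hat, b_hat)
```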


Journal ArticleDOI
TL;DR: In this article, the authors introduced the joint reliability importance (JRI) of two edges in an undirected network, which is an appropriate quantitative measure of the interactions between two edges with respect to source-to-terminal reliability.
Abstract: Joint reliability importance (JRI) of two edges in an undirected network is introduced. Concepts of joint failure importance (JFI) and marginal failure importance (MFI), duals of JRI and marginal reliability importance (MRI), are also introduced. The JRI of two edges in an undirected network is represented by the MRI of each edge in subnetworks. Relationships between JRI and MRI, JRI and JFI, and JFI and MFI are presented. From these relationships, it is shown that the JRI is an appropriate quantitative measure of the interactions of two edges in a network, with respect to source-to-terminal reliability. It is also shown that, in some special cases, the sign of the JRI of two edges can be determined without computing it.

131 citations
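
The JRI of edges i and j is the second-order difference of the reliability function, JRI(i,j) = R(1_i,1_j) - R(1_i,0_j) - R(0_i,1_j) + R(0_i,0_j). A sketch on the standard 5-edge bridge network (edge reliabilities are hypothetical; source-to-terminal reliability is computed by enumeration):

```python
from itertools import product

EDGES = [("s","a"), ("s","b"), ("a","b"), ("a","t"), ("b","t")]
p = [0.9, 0.8, 0.7, 0.9, 0.8]          # hypothetical edge reliabilities

def connects(state):                   # is t reachable from s given edge states?
    up = [e for e, s in zip(EDGES, state) if s]
    seen, stack = {"s"}, ["s"]
    while stack:
        u = stack.pop()
        for x, y in up:
            v = y if x == u else x if y == u else None
            if v and v not in seen:
                seen.add(v); stack.append(v)
    return "t" in seen

def reliability(fixed):                # s-t reliability with some edges pinned
    free = [k for k in range(len(EDGES)) if k not in fixed]
    r = 0.0
    for bits in product([0, 1], repeat=len(free)):
        state, prob = [0] * len(EDGES), 1.0
        for k, s in zip(free, bits):
            state[k], prob = s, prob * (p[k] if s else 1 - p[k])
        for k, s in fixed.items():
            state[k] = s
        if connects(state):
            r += prob
    return r

i, j = 2, 3                            # the bridge edge a-b and edge a-t
jri = (reliability({i: 1, j: 1}) - reliability({i: 1, j: 0})
       - reliability({i: 0, j: 1}) + reliability({i: 0, j: 0}))
print("JRI =", jri)                    # negative here: the edges substitute
```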


Journal ArticleDOI
TL;DR: In this paper, two general approaches available for assessing the reliability of electronics during design, device failure rate prediction and physics-of-failure (PoF) prediction, are compared in a way that is readily understandable by a wide range of readers concerned with the design, manufacture, and support of electronic equipment.
Abstract: Two general approaches available for assessing the reliability of electronics during design are device failure rate prediction, and physics-of-failure. This work broadly compares these two approaches in a way that is readily understandable by the wide range of readers concerned with the design, manufacture, and support of electronic equipment.

119 citations


Journal ArticleDOI
S. H. Sim, J. Endrenyi
TL;DR: In this paper, a Markov model for a continuously operating service device whose condition deteriorates with time in service is proposed, which incorporates deterioration and Poisson failures, minimal repair, periodic minimal maintenance, and major maintenance after a given number of minimal maintenances.
Abstract: A Markov model for a continuously operating service device whose condition deteriorates with time in service is proposed. The model incorporates deterioration and Poisson failures, minimal repair, periodic minimal maintenance, and major maintenance after a given number of minimal maintenances. An exact recursive algorithm computes the steady-state probabilities of the device. A cost function is defined using different cost rates for the different types of outages. Based on minimal unavailability or minimal costs, optimal solutions of the model are derived. Major maintenance is seldom beneficial if optimal maintenance intervals are used. If a maintenance policy is based on nonoptimal intervals between maintenances, periodic major maintenance can reduce costs.

95 citations
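
A small illustration of the steady-state computation for a deterioration/maintenance Markov chain (the states and rates below are hypothetical, not the paper's recursive model): build the generator Q and solve pi Q = 0 with sum(pi) = 1.

```python
import numpy as np

# States: D1 -> D2 -> D3 deterioration, F = failed, M = in maintenance.
Q = np.array([
    #  D1     D2     D3     F      M
    [-0.12,  0.10,  0.00,  0.02,  0.00],   # D1: deteriorate or randomly fail
    [ 0.00, -0.22,  0.10,  0.02,  0.10],   # D2: deteriorate, fail, or maintain
    [ 0.00,  0.00, -0.32,  0.02,  0.30],   # D3
    [ 0.50,  0.00,  0.00, -0.50,  0.00],   # F: repair restores to D1
    [ 2.00,  0.00,  0.00,  0.00, -2.00],   # M: maintenance completes
])
A = np.vstack([Q.T, np.ones(5)])           # pi Q = 0 plus normalization
b = np.concatenate([np.zeros(5), [1.0]])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(dict(zip(["D1", "D2", "D3", "Failed", "Maint"], pi.round(4))))
```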


Journal ArticleDOI
TL;DR: The authors present a heuristic method for solving constrained redundancy optimization problems in complex systems that allows excursions over a bounded infeasible region, which can alleviate the risks of being trapped at a local optimum.
Abstract: The authors present a heuristic method for solving constrained redundancy optimization problems in complex systems. The proposed method allows excursions over a bounded infeasible region, which can alleviate the risks of being trapped at a local optimum. Computational results show that the method performs consistently better than other heuristic methods in terms of solution quality. If solution quality is of more concern and if one is willing to accept a moderate increase in computing time for better solutions, then the authors believe that this method is an attractive alternative to other heuristic methods.

91 citations
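
A random-walk sketch of the key idea (permitting bounded excursions into the infeasible region while tracking the best feasible point); this is not the paper's heuristic, and the series-parallel system data are hypothetical:

```python
import random

cost = [3.0, 5.0, 4.0]          # unit cost per added component
p = [0.8, 0.7, 0.9]             # component reliabilities
BUDGET, SLACK = 30.0, 3.0       # cost constraint and allowed excursion

def rel(x):                     # series system of parallel subsystems
    out = 1.0
    for pi, xi in zip(p, x):
        out *= 1.0 - (1.0 - pi) ** xi
    return out

def cst(x):
    return sum(c * xi for c, xi in zip(cost, x))

random.seed(1)
x, best = [1, 1, 1], None
for _ in range(5000):
    y = x[:]
    y[random.randrange(3)] += random.choice((-1, 1))
    if min(y) < 1 or cst(y) > BUDGET + SLACK:   # excursion stays bounded
        continue
    x = y                                       # move, feasible or not
    if cst(x) <= BUDGET and (best is None or rel(x) > rel(best)):
        best = x[:]                             # record feasible incumbent
print(best, round(rel(best), 4), cst(best))
```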


Journal ArticleDOI
TL;DR: Researchers have proposed cardinality-, lexicographic-, and Hamming-distance-order methods to preprocess the path terms in sum of disjoint products (SDP) techniques for network reliability analysis; the results show that preprocessing based on cardinality, or its combinations with lexicographic- and/or Hamming-distance-ordering, performs better.
Abstract: Researchers have proposed cardinality-, lexicographic-, and Hamming-distance-order methods to preprocess the path terms in sum of disjoint products (SDP) techniques for network reliability analysis. For cutsets, an ordering based on the node partition associated with each cut is suggested. Experimental results showing the number of disjoint products and the computer time involved in generating SDP terms are presented. Nineteen benchmark networks, containing paths varying from 4 to 780 and cuts from 4 to 7376, are considered. Several SDP techniques are generalized into three propositions to find their inherent merits and drawbacks. An efficient SDP technique is then used to run input files of paths/cuts preprocessed using cardinality-, lexicographic-, and Hamming-distance-ordering, and their combinations. The results are analyzed, showing that preprocessing based on cardinality, or its combinations with lexicographic- and/or Hamming-distance-ordering, performs better.
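
A small sketch of the orderings themselves (path sets are hypothetical): sort minimal paths by cardinality with lexicographic tie-breaking, then greedily reorder so each term is close in Hamming distance (symmetric-difference size) to its predecessor.

```python
paths = [{1, 4, 5}, {2, 5}, {1, 3, 5}, {2, 3, 4}, {1, 4}]   # minimal paths

by_card_lex = sorted(paths, key=lambda s: (len(s), sorted(s)))

def hamming_reorder(terms):
    out, rest = [terms[0]], terms[1:]
    while rest:
        prev = out[-1]
        rest.sort(key=lambda s: len(prev ^ s))   # symmetric difference size
        out.append(rest.pop(0))
    return out

print(hamming_reorder(by_card_lex))
```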

Journal ArticleDOI
TL;DR: The effectiveness of the revised NVP design paradigm in improving software reliability by providing fault tolerance is demonstrated.
Abstract: The application of the N-version programming (NVP) technique in a project that reused the revised specification of a real, automatic airplane-landing problem is described. The study involved 40 students, who formed 15 independent programming teams to design, program, test, and evaluate the application. The impact of the paradigm on the software development process, the improvement of the resulting N-version software (NVS) product, the insight, experience, and learning gained in conducting this project, the various testing procedures applied to the program versions, several quantitative measures of the resulting NVS product, and some comparisons with previous projects are discussed. The effectiveness of the revised NVP design paradigm in improving software reliability by providing fault tolerance is demonstrated.
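
At run time, NVP relies on a decision algorithm such as majority voting at cross-check points. A minimal sketch of such a voter (illustrative only; real NVS voters must also handle tolerances on numeric results):

```python
from collections import Counter

def vote(outputs):
    """Return the value produced by a majority of versions, else raise."""
    value, n = Counter(outputs).most_common(1)[0]
    if n * 2 > len(outputs):
        return value
    raise RuntimeError("no majority among versions")

print(vote([10.31, 10.31, 10.29]))   # -> 10.31 despite one divergent version
```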

Journal ArticleDOI
TL;DR: The evaluation results reveal the extent to which performance, dependability, and performability of a variant are improved relative to the basic NVP system, and demonstrate that such evaluations are indeed feasible and useful with regard to enhancing software performability.
Abstract: Model-based performability evaluation is used to assess and improve the effectiveness of fault-tolerant software. The evaluation employs a measure that combines quantifications of performance and dependability in a synergistic manner, thus capturing the interaction between these two important attributes. The specific systems evaluated are a basic realization of N-version programming (NVP) (N=3) along with variants thereof. For each system, its corresponding stochastic process model is constructed in two layers, with performance and dependability submodels residing in the lower layer. The evaluation results reveal the extent to which performance, dependability, and performability of a variant are improved relative to the basic NVP system. More generally, the investigation demonstrates that such evaluations are indeed feasible and useful with regard to enhancing software performability.
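
The flavor of a performability measure can be conveyed by a toy expected-reward computation over structural states of a 3-version system (a sketch of the measure only, not the paper's layered stochastic model; probabilities and rewards are hypothetical):

```python
p_ok = 0.98                                        # per-version success probability
# probability that exactly k of 3 versions produce a correct result:
prob = {3: p_ok**3, 2: 3 * p_ok**2 * (1 - p_ok),
        1: 3 * p_ok * (1 - p_ok)**2, 0: (1 - p_ok)**3}
reward = {3: 1.0, 2: 0.9, 1: 0.0, 0: 0.0}          # degraded service with 2 good
print("performability:", sum(prob[k] * reward[k] for k in prob))
```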

Journal ArticleDOI
TL;DR: In this paper, a new model is presented that accounts for the usage growth of commercial software during the operational phase, and that incorporates a factor to estimate (from field-failure reports) the usage growth.
Abstract: A new model that accounts for the usage growth of commercial software during the operational phase, and that incorporates a factor to estimate (from field-failure reports) the usage growth, is presented. The model can estimate the number of remaining unique defects in wide-distribution commercial software during the operational phase, and the anticipated arrival times of customer-reported failures attributable to these unique defects. The model is based on the Weibull distribution, which assumes that field usage of commercial software increases as a power function of time. The model was fit to the actual failure times for two commercial software products: one that runs on 10^5 systems, and the other that runs on 10^4 systems. The model fits the general shape of the arrival distribution for the actual defect discovery times, but there are minor peaks in the example data that are not explained by the model. Some of the minor modes correspond to peak defect discovery times for subsequent releases of the software.
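
A sketch of the fitting idea (assuming cumulative unique-defect discoveries follow N times a Weibull CDF in calendar time, consistent with power-law usage growth; the counts below are fabricated for illustration, using scipy):

```python
import numpy as np
from scipy.optimize import curve_fit

def model(t, N, scale, shape):
    return N * (1.0 - np.exp(-(t / scale) ** shape))

months = np.arange(1, 25)
found = np.array([1, 2, 4, 7, 11, 15, 20, 24, 29, 33, 36, 40,     # hypothetical
                  43, 45, 48, 50, 51, 53, 54, 55, 56, 56, 57, 57])
(N, scale, shape), _ = curve_fit(model, months, found, p0=[60, 12, 2])
print(f"estimated total unique defects ~ {N:.0f}, remaining ~ {N - found[-1]:.0f}")
```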

Journal ArticleDOI
TL;DR: In this article, the reliability bounds for the Birnbaum-Saunders distribution and point and interval estimates for the critical time of the failure (hazard-rate) were discussed under the assumption that both parameters are unknown.
Abstract: The Birnbaum-Saunders distribution, under certain conditions, can be used to model the fatigue failure-time caused by the catastrophic crack size. Reliability bounds for the Birnbaum-Saunders distribution and point and interval estimates for the critical time of the failure (hazard)-rate are discussed. The confidence intervals are constructed under the assumption that both parameters are unknown. This method is not affected by censoring, as long as confidence intervals for the parameters can be established. Numerical examples illustrate the procedure. >
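
The Birnbaum-Saunders CDF is F(t) = Phi((sqrt(t/beta) - sqrt(beta/t)) / alpha), and its hazard rises to a peak before declining; that peak location is the "critical time" being estimated. A numerical sketch (parameter values are illustrative, using scipy):

```python
import numpy as np
from scipy.stats import norm

def bs_cdf(t, alpha, beta):
    return norm.cdf((np.sqrt(t / beta) - np.sqrt(beta / t)) / alpha)

def bs_hazard(t, alpha, beta, eps=1e-6):
    f = (bs_cdf(t + eps, alpha, beta) - bs_cdf(t - eps, alpha, beta)) / (2 * eps)
    return f / (1.0 - bs_cdf(t, alpha, beta))

t = np.linspace(0.05, 10, 2000)
h = bs_hazard(t, alpha=0.8, beta=1.0)
print("critical time ~", t[np.argmax(h)])   # location of the hazard peak
```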

Journal ArticleDOI
TL;DR: In this article, the authors presented a method that uses accelerated life-test data to estimate the mean life at the service stress and the threshold stress below which a failure is unlikely to occur.
Abstract: The author presents a method that uses accelerated life-test data to estimate the mean life at the service stress and the threshold stress below which a failure is unlikely to occur. The relation between stress and mean life at that stress is assumed to follow an inverse power law that includes a threshold stress. The failure times at a given stress are assumed to follow a Weibull distribution in which the shape parameter varies with the stress. This model extends the well-known Weibull inverse power law model. If only the mean life, but not a specific percentile point at a service stress, is sought, the maximum likelihood method is useful for parameter estimation. This is a tradeoff in the parametric approach. For adoption of an appropriate probability model, the likelihood ratio test as well as the Akaike Information Criterion are used. Type I right-censored data are considered. Extensions of the method are discussed.
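
The stress-life relation is of the form mean life = K / (V - V0)^p for V > V0, with V0 the threshold stress. A least-squares sketch of fitting it to summarized ALT data (the paper uses maximum likelihood on Weibull failure times; stresses and lives below are fabricated, using scipy):

```python
import numpy as np
from scipy.optimize import curve_fit

def mean_life(V, K, p_exp, V0):
    return K / (V - V0) ** p_exp

stress = np.array([60.0, 70.0, 80.0, 90.0])       # hypothetical ALT stresses
life = np.array([4440.0, 1600.0, 815.0, 495.0])   # hypothetical mean lives
(K, p_exp, V0), _ = curve_fit(mean_life, stress, life,
                              p0=[1e5, 2.0, 30.0],
                              bounds=([0, 0.5, 0], [np.inf, 8, 55]))
print(f"threshold stress ~ {V0:.1f}; mean life at V = 50: "
      f"{mean_life(50.0, K, p_exp, V0):.0f}")
```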

Journal ArticleDOI
TL;DR: A one-step algorithm, GEAR (generalized evaluation algorithm for reliability), is introduced that computes the reliability of a distributed computing system (DCS), which usually consists of processing elements, memory units, input/output devices, data files, and programs as its shared resources.
Abstract: A one-step algorithm, GEAR (generalized evaluation algorithm for reliability), is introduced that computes the reliability of a distributed computing system (DCS), which usually consists of processing elements, memory units, input/output devices, data files, and programs as its shared resources. The probability that a task or an application can be computed successfully by sharing the required resources on the DCS is termed the system reliability. Some of the important reliabilities defined using the above concept are discussed, including terminal-pair, computer-network, distributed-program, and distributed-system reliability. GEAR is general enough to compute all four of these measures, and does not require any prior knowledge about multiterminal connections for computing the reliability expression. Many examples are included to illustrate the usefulness of GEAR for computing reliability measures of a DCS.

Journal ArticleDOI
TL;DR: In this paper, the authors extend the results of Usher and Hodgson (1988) by deriving exact maximum likelihood estimators (MLE) for the general case of a series system of three exponential components with independent masking.
Abstract: This work estimates component reliability from masked series-system life data, viz., data where the exact component causing system failure might be unknown. The authors extend the results of Usher and Hodgson (1988) by deriving exact maximum likelihood estimators (MLE) for the general case of a series system of three exponential components with independent masking. Their previous work shows that closed-form MLE are intractable, so they propose an iterative method for the solution of the system of three nonlinear likelihood equations.
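
Under independent masking, each observation (system life t, mask set S) contributes a likelihood term proportional to (sum of the rates in S) * exp(-(l1+l2+l3) t). A numerical-maximization sketch (data are fabricated; the paper derives exact MLE rather than using a generic optimizer):

```python
import numpy as np
from scipy.optimize import minimize

data = [(1.2, {1}), (0.4, {2}), (2.0, {1, 3}), (0.9, {1, 2, 3}),
        (1.5, {3}), (0.7, {2}), (1.1, {1, 2})]   # (life, mask set), hypothetical

def nll(log_rates):
    l = np.exp(log_rates)                 # keeps the rates positive
    s = 0.0
    for t, mask in data:
        s += np.log(sum(l[i - 1] for i in mask)) - l.sum() * t
    return -s

res = minimize(nll, np.log([0.3, 0.3, 0.3]))
print(np.exp(res.x))                      # component failure-rate estimates
```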

Journal ArticleDOI
TL;DR: In this article, the authors illustrate design situations where creep and creep rupture of components can compromise system performance over time, thereby acting as a wearout failure mechanism, and present analytic microstructural creep mechanisms leading to failure of these materials.
Abstract: This tutorial illustrates design situations where creep and creep rupture of components can compromise system performance over time, thereby acting as a wearout failure mechanism. Polycrystalline materials, such as metals and ceramics, and polymers are treated. Analytic microstructural creep mechanisms leading to failure of these materials are presented. Continuum microscale models for predicting long-term creep are explained for practical design purposes. Two examples illustrate these models for mechanical engineering and electronic packaging situations.
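
A standard continuum model of the kind such tutorials cover is Norton power-law (secondary) creep, strain rate = A * sigma^n * exp(-Q/(R T)). A sketch with hypothetical material constants:

```python
import numpy as np

A, n, Q = 10.0, 5.0, 300e3         # hypothetical constants (stress in MPa, Q in J/mol)
R = 8.314                          # gas constant, J/(mol*K)

def creep_rate(sigma_MPa, T_kelvin):
    return A * sigma_MPa ** n * np.exp(-Q / (R * T_kelvin))

# Time to accumulate 1% creep strain at 80 MPa and 900 K (constant-rate estimate):
rate = creep_rate(80.0, 900.0)
print(f"creep rate {rate:.2e} /s -> ~{0.01 / rate / 3600:.0f} h to 1% strain")
```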

Journal ArticleDOI
TL;DR: Phenomenological continuum length-scale models, based on micromechanical considerations, are presented to predict the onset (or initiation) of fatigue cracking in ductile materials, and the number of load cycles required to cause failure is predicted based on these models.
Abstract: This work illustrates design situations where mechanical fatigue under cyclic loading, of one or more components, can compromise system performance. In this failure mechanism, damage accumulates with each load cycle, thereby causing a physical wearout failure mechanism. Phenomenological continuum length-scale models, based on micromechanical considerations, are presented to predict the onset (or initiation) of fatigue cracking in ductile materials. Fatigue crack propagation is modeled with continuum fracture mechanics principles. The number of load cycles required to cause failure is predicted based on these models. Approaches for modeling creep-fatigue interactions are briefly discussed. Analytic physics-of-failure methods and examples are presented for designing against wearout failure due to cyclic fatigue. These models can be implemented in an engineering design environment. The associated stress analysis requires numerical finite element techniques in many cases. The associated material property characterization techniques have matured since the 1950s and are specified in engineering handbooks.
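
Crack propagation under fracture mechanics is commonly modeled with the Paris law, da/dN = C (dK)^m with dK = Y * dS * sqrt(pi a); integrating da/(C dK^m) gives the cycle count to grow a crack from its initial to its critical size. A numerical sketch (all constants hypothetical):

```python
import numpy as np

C, m, Y, dS = 1e-12, 3.0, 1.1, 120.0     # dS in MPa, dK in MPa*sqrt(m)
a0, a_crit = 0.5e-3, 10e-3               # initial and critical crack sizes, metres

a = np.linspace(a0, a_crit, 20000)
dK = Y * dS * np.sqrt(np.pi * a)
cycles = np.sum(np.diff(a) / (C * dK[:-1] ** m))   # integrate dN = da/(C*dK**m)
print(f"predicted crack-growth life ~ {cycles:.2e} cycles")
```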

Journal ArticleDOI
TL;DR: In this paper, lower and upper reliability bounds were derived for two-dimensional consecutive k-out-of-n:F systems with independent, but not necessarily identically distributed, components, and a Weibull limit theorem was proven for system time-to-failure for i.i.d. components.
Abstract: The authors derive lower and upper reliability bounds for the two-dimensional consecutive k-out-of-n:F system (Salvia & Lasher, 1990) with independent, but not necessarily identically distributed, components. A Weibull limit theorem is proven for system time-to-failure for i.i.d. components.
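
In this system an n x n grid of components fails iff some k x k square consists entirely of failed components. A Monte-Carlo sketch against which such bounds could be checked (failure probabilities are hypothetical and deliberately non-identical):

```python
import numpy as np

rng = np.random.default_rng(42)
n, k, trials = 5, 2, 50_000
q = np.full((n, n), 0.05)           # component failure probabilities
q[2, :] = 0.2                       # one less-reliable row, to break i.i.d.

failed_runs = 0
for _ in range(trials):
    down = rng.random((n, n)) < q
    if any(down[i:i+k, j:j+k].all()
           for i in range(n - k + 1) for j in range(n - k + 1)):
        failed_runs += 1
print("estimated system reliability:", 1 - failed_runs / trials)
```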

Journal ArticleDOI
TL;DR: Two new versions of component relevancy for multistate structure functions are introduced and they are compared with some existing component-relevance conditions and their general properties are investigated.
Abstract: Two new versions of component relevancy for multistate structure functions are introduced. They are compared with some existing component-relevance conditions and their general properties are investigated. Based on the two relevance conditions, two component-importance measures for multistate systems are defined; they are most appropriate for comparing components when a certain type of system improvement is considered. Series and parallel structures are characterized within the L-superadditive and L-subadditive structure functions by imposing the two new relevance conditions.

Journal ArticleDOI
TL;DR: In this article, a fully-Bayes approach is presented for analyzing product reliability during the development phase, where the product goes through a series of test/modification stages, where each product test yields attribute (pass-fail) data, and failure types are classified as fixable or nonfixable.
Abstract: A fully Bayes approach is presented for analyzing product reliability during the development phase. Based on a Bayes version of the Barlow-Scheuer reliability-growth model, it is assumed that the product goes through a series of test/modification stages, where each product test yields attribute (pass/fail) data, and failure types are classified as fixable or nonfixable. Relevant information on both the failure probabilities and the reliability-growth process is used to motivate the prior joint distribution for the probability of each failure type over the specified range of testing. Results at a particular test stage can be used to update the knowledge about the probability of each failure type (and thus product reliability) at the current test stage as well as at subsequent test stages, and at the end of the development phase. Ease of incorporating prior information and tractability of the posterior analysis are achieved by using a Dirichlet distribution as the prior distribution for a transformation of the failure probabilities.
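
The prior-to-posterior mechanics at a single test stage can be sketched with a conjugate Dirichlet update over the three outcome types (the parameters and counts below are hypothetical, and the paper's model additionally links stages through the growth process):

```python
import numpy as np

# Outcomes per trial: (pass, fixable failure, nonfixable failure)
prior = np.array([6.0, 2.0, 1.0])        # hypothetical Dirichlet parameters
counts = np.array([8, 2, 0])             # results observed at this stage
post = prior + counts                    # conjugate posterior
print("posterior mean probabilities:", post / post.sum())
```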

Journal ArticleDOI
TL;DR: This study extends the Freund bivariate model to a three-component model and a special N-component model, and a general form for system reliability is developed.
Abstract: A shared-load model of the exponential distribution is used to describe the characteristics of dependent redundancies. In the shared-load model, the redundant components equally share the workload. Upon failure of one or more components, the remaining components must carry an increased load. This study extends the Freund bivariate model to a three-component model and a special N-component model. A general form for system reliability is developed. Failure rates are estimated using maximum likelihood. Systems such as a multiprocessor computer, a multiple-engine system, and a paired system can be described by this model.
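
For the textbook two-unit Freund case (which this paper generalizes): both units fail at rate a while sharing load, and the survivor carries full load at rate b, giving system reliability R(t) = exp(-2at) + [2a/(2a-b)](exp(-bt) - exp(-2at)) for 2a != b. A sketch with a Monte-Carlo cross-check (rates are hypothetical):

```python
import numpy as np

def R(t, a, b):
    return np.exp(-2*a*t) + 2*a / (2*a - b) * (np.exp(-b*t) - np.exp(-2*a*t))

rng = np.random.default_rng(7)
a, b, t = 0.01, 0.025, 50.0
first = rng.exponential(1 / (2*a), 100_000)     # time of first unit failure
life = first + rng.exponential(1 / b, 100_000)  # plus survivor's remaining life
print(R(t, a, b), (life > t).mean())            # closed form vs simulation
```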

Journal ArticleDOI
TL;DR: In this article, upper and lower bounds for the reliability of a (linear or circular) consecutive k-within-m-out-of-n:F system with unequal component-failure probabilities are provided.
Abstract: Upper and lower bounds for the reliability of a (linear or circular) consecutive k-within-m-out-of-n:F system with unequal component-failure probabilities are provided. Numerical calculations indicate that, for systems with components of good enough reliability, these bounds quite adequately estimate system reliability. The estimate is easy to calculate, having computational complexity O(m^2 * n). For identically distributed components, a Weibull limit theorem for system time-to-failure is proved.

Journal ArticleDOI
TL;DR: This work analyzes the use of time and device redundancy in systems subject to correlated failure, and compares fault-tolerant designs with and without a rollback capability, accounting for the increased hardware-failure rate due to processor duplication when faults are detected in hardware, and the doubled execution times when detection is implemented in software.
Abstract: Real-time computers are often used in embedded, life-critical applications where high reliability is important. A common approach to making such systems dependable is to vote on redundant processors executing multiple copies of the same task. The processors which make up such voted systems are subjected not only to independently occurring permanent and transient failures, but also to correlated transients brought about by electromagnetic interference from the operating environment. To counteract these transients, checkpointing and time redundancy are required in addition to processor redundancy. This work analyzes the use of time and device redundancy in systems subject to correlated failure. The tradeoffs in checkpoint placement in such a system are found to be considerably different from those for non-redundant systems without real-time constraints. The authors compare fault-tolerant designs with and without a rollback capability, accounting for the increased hardware-failure rate due to processor duplication when faults are detected in hardware, and the doubled execution times when detection is implemented in software.
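
The basic checkpoint-placement tradeoff (illustrated here with Young's classic first-order approximation, not this paper's correlated-failure, real-time analysis): checkpointing too often wastes time saving state, too rarely wastes time in rework after rollback.

```python
import math

# Overhead per unit of useful work for checkpoint interval T, checkpoint
# cost c, failure rate lam: h(T) ~ c/T + lam*T/2, minimized at sqrt(2c/lam).
c, lam = 0.5, 1e-3        # hypothetical: 0.5 s checkpoint, 1 failure per 1000 s

def overhead(T):
    return c / T + lam * T / 2

T_opt = math.sqrt(2 * c / lam)
print(T_opt, overhead(T_opt), overhead(10.0), overhead(300.0))
```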

Journal ArticleDOI
TL;DR: The authors demonstrate a methodology for evaluating the fault-tolerance characteristics of operational software and illustrate it through case studies of three operating systems: the Tandem GUARDIAN fault-tolerant system, the VAX/VMS distributed system, and the IBM/MVS system.
Abstract: The authors demonstrate a methodology for evaluating the fault-tolerance characteristics of operational software and illustrate it through case studies of three operating systems: the Tandem GUARDIAN fault-tolerant system, the VAX/VMS distributed system, and the IBM/MVS system. Based on measurements from these systems, software error characteristics are investigated by analyzing error distributions and correlation. Two levels of models are developed to analyze the error and recovery processes inside an operating system and the interactions among multiple copies of an operating system running in a distributed environment. Reward analysis is used to evaluate the loss of service due to software errors and the effect of fault-tolerant techniques implemented in the systems.

Journal ArticleDOI
TL;DR: In this paper, a decision model is proposed to determine the waiting time to call the repair facility when the first piece of equipment fails in a two-unit standby system, where the failure and repair rates are assumed to be constant and elicited from an expert's prior beliefs.
Abstract: Obtaining a good maintenance strategy for a standby system is discussed. The problem is analyzed via decision theory to determine the waiting time to call the repair facility (for a two-unit standby system) when the first piece of equipment fails. Previous research into this kind of system is briefly described, and a need for constructing a decision model is explained. The uncertainty of the parameters is accounted for in a Bayes approach in order to consider expert prior knowledge. The failure and repair rates are assumed to be constant and are elicited from an expert's prior beliefs. When no data are available, expert guesses are used. A method is presented for solving the conflicting requirements of system availability and cost through a multiattribute utility function which can express cardinal values for the decision maker's preferences over the objective variables. The decision model derives the appropriate maintenance strategy; it corresponds to a set of actions, procedures, and resources, giving a consequent waiting time before calling the repair facility. The use of the model is demonstrated for a telecommunication system.

Journal ArticleDOI
TL;DR: Four implementations of fault-tolerant software techniques are evaluated with respect to hardware and design faults and the techniques are ranked using an application taxonomy.
Abstract: Four implementations of fault-tolerant software techniques are evaluated with respect to hardware and design faults. Project participants were divided into four groups, each of which developed fault-tolerant software based on a common specification. Each group applied one of the following techniques: N-version programming, recovery block, concurrent error-detection, and algorithm-based fault tolerance. Independent testing and modeling groups analyzed the software. The testing group subjected it to simulated design and hardware faults. The data were then mapped into a discrete-time Markov model developed by the modeling group. The effectiveness of each technique with respect to availability, correctness, and time to failure given an error, as shown by the model, is contrasted with measured data. The model is analyzed with respect to additional figures of merit identified during the modeling process, and the techniques are ranked using an application taxonomy.

Journal ArticleDOI
TL;DR: In this article, the definition of two-dimensional consecutive-k-out-of-n:F systems introduced by Salvia and Lasher is extended to rectangular and cylindrical two-dimensional systems.
Abstract: A.A. Salvia and W.C. Lasher (IEEE Trans. Reliability, vol. 39, no. 3, pp. 382-385, Aug. 1990) introduced the concept of two-dimensional consecutive-k-out-of-n:F systems and developed upper and lower bounds for system reliability. This work extends the definition of two-dimensional consecutive-k-out-of-n:F systems to rectangular and cylindrical two-dimensional consecutive-k-out-of-n systems. Invariant optimal designs of two-dimensional consecutive-k-out-of-n:G systems are developed.