Showing papers in "IEEE Transactions on Reliability in 1993"


Journal ArticleDOI
TL;DR: In this article, a simple generalization of the Weibull distribution is presented, which is well suited for modeling bathtub failure rate lifetime data and for testing goodness-of-fit of the Weibull and negative exponential models as subhypotheses.
Abstract: A simple generalization of the Weibull distribution is presented. The distribution is well suited for modeling bathtub failure rate lifetime data and for testing goodness-of-fit of the Weibull and negative exponential models as subhypotheses.

1,028 citations
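
As a rough illustration of this generalization (assuming the usual exponentiated-Weibull parameterization F(t) = (1 - exp(-(t/sigma)^beta))^theta; all parameter values below are illustrative), a short Python sketch shows the hazard dipping and then rising, i.e. a bathtub shape:

```python
import numpy as np

# Exponentiated-Weibull CDF (assumed parameterization):
#   F(t) = (1 - exp(-(t/sigma)**beta))**theta
def cdf(t, beta, theta, sigma=1.0):
    return (1.0 - np.exp(-(t / sigma) ** beta)) ** theta

def hazard(t, beta, theta, sigma=1.0, eps=1e-6):
    # numerical hazard h(t) = f(t) / (1 - F(t)) via a central difference
    f = (cdf(t + eps, beta, theta, sigma) - cdf(t - eps, beta, theta, sigma)) / (2 * eps)
    return f / (1.0 - cdf(t, beta, theta, sigma))

t = np.linspace(0.05, 1.5, 60)
h = hazard(t, beta=4.0, theta=0.1)   # beta > 1 with beta*theta < 1: bathtub shape
print("bathtub (high-low-high):", h[0] > h.min() < h[-1])
```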


Journal ArticleDOI
TL;DR: This work presents an algorithm to generate all d-MCs from each MC for each system capacity level d, and compares it with the algorithm given by J. Xue (1985).
Abstract: Many systems can be regarded as flow networks whose arcs have discrete and multi-valued random capacities. The probability of the maximum flow at each level and the reliability of such a flow network can be calculated in terms of K-lattices, which are generated from each subset of the family of all MCs (minimal cutsets). However, the size of such a family, 2^m - 1 (m = number of MCs), grows exponentially with m. Such a flow network can be considered as a multistate system with multistate components, so that its reliability can be evaluated in terms of upper boundary points of each level d (named d-MCs here). This work presents an algorithm to generate all d-MCs from each MC for each system capacity level d. The new algorithm is analyzed and compared with the algorithm given by J. Xue (1985). Examples show how all d-MCs are generated; the reliability of one example is computed.

231 citations
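
A toy sketch of the generation step only (the paper's algorithm additionally verifies candidates against all cuts; arc names, capacity levels, and the demand level here are hypothetical): for one minimal cut, enumerate capacity vectors whose total equals the level d.

```python
from itertools import product

# Candidate d-MC vectors for a single minimal cut (illustrative sketch).
def candidate_dMCs(cut_levels, d):
    """cut_levels: {arc: list of discrete capacity levels}; yield vectors
    whose total capacity over the cut equals d."""
    arcs = list(cut_levels)
    for combo in product(*(cut_levels[a] for a in arcs)):
        if sum(combo) == d:
            yield dict(zip(arcs, combo))

levels = {"a1": [0, 1, 2], "a2": [0, 1, 3]}   # hypothetical arc capacities
for x in candidate_dMCs(levels, d=3):
    print(x)
```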


Journal ArticleDOI
TL;DR: A software-reliability growth model incorporating the amount of test effort expended during the software testing phase is developed and the method of data analysis for software reliability measurement is developed.
Abstract: Software reliability measurement during the testing phase is essential for examining the degree of quality or reliability of a developed software system. A software-reliability growth model incorporating the amount of test effort expended during the software testing phase is developed. The time-dependent behavior of test-effort expenditures is described by a Weibull curve. Assuming that the error-detection rate per unit of test effort spent during the testing phase is proportional to the current error content, the model is formulated as a nonhomogeneous Poisson process. Using the model, a method of data analysis for software reliability measurement is developed. The model is applied to the prediction of additional test-effort expenditures needed to achieve the objective number of errors detected by software testing, and to the determination of the optimum time to stop software testing for release.

193 citations
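
A minimal sketch of this model class (assuming the common form in which cumulative test effort W(t) follows a Weibull curve and the NHPP mean value function is m(t) = a(1 - exp(-r W(t))); all parameter values are hypothetical):

```python
import numpy as np

# Cumulative Weibull-curve test effort and the resulting NHPP mean value
# function for expected errors detected by time t.
def W(t, alpha, beta, gamma):
    return alpha * (1.0 - np.exp(-beta * t ** gamma))

def m(t, a, r, alpha, beta, gamma):
    return a * (1.0 - np.exp(-r * W(t, alpha, beta, gamma)))

t = np.arange(0.0, 20.0)                        # testing weeks (hypothetical)
print(m(t, a=100, r=0.05, alpha=60, beta=0.02, gamma=1.8))
```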


Journal ArticleDOI
TL;DR: In this paper, a decomposition method based on branch and bound is used for solving the problem of network topological optimization with a reliability constraint, where the objective is to find the topological layout of links, at a minimal cost, under the constraint that the reliability is not less than a given level of system reliability.
Abstract: Network topological optimization with a reliability constraint is considered. The objective is to find the topological layout of links, at a minimal cost, under the constraint that the network reliability is not less than a given level of system reliability. A decomposition method, based on branch and bound, is used for solving the problem. In order to speed up the procedure, an upper bound on system reliability, in terms of node degrees, is applied. A numerical example illustrates the effectiveness of the method.

179 citations
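
Setting the branch-and-bound machinery aside, the problem statement can be made concrete by exhaustive search on a tiny graph (link costs, per-link reliability, and the reliability target below are hypothetical; all-terminal reliability is computed by brute-force state enumeration):

```python
from itertools import combinations, product

nodes = [0, 1, 2, 3]
links = {(0, 1): 2.0, (0, 2): 3.0, (1, 2): 1.5, (1, 3): 2.5, (2, 3): 2.0}
p, R_min = 0.9, 0.90                 # per-link reliability, reliability target

def connected(active):               # are all nodes connected by 'active' links?
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for a, b in active:
            v = b if a == u else a if b == u else None
            if v is not None and v not in seen:
                seen.add(v); stack.append(v)
    return len(seen) == len(nodes)

def all_terminal_reliability(chosen):
    r = 0.0
    for states in product([0, 1], repeat=len(chosen)):
        up = [l for l, s in zip(chosen, states) if s]
        if connected(up):
            r += p ** sum(states) * (1 - p) ** (len(chosen) - sum(states))
    return r

best = None
for k in range(len(nodes) - 1, len(links) + 1):
    for chosen in combinations(links, k):
        if all_terminal_reliability(chosen) >= R_min:
            cost = sum(links[l] for l in chosen)
            if best is None or cost < best[0]:
                best = (cost, chosen)
print(best)                          # cheapest layout meeting the constraint
```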


Journal ArticleDOI
TL;DR: In this paper, a general repairable system model that includes age and preventive maintenance dependent failure rates is proposed, which introduces the concept of system availability as a random variable, leading to availability measures for individual realizations, rather than for the average over many trials.
Abstract: A general repairable system model that includes age and preventive maintenance (PM) dependent failure rates is proposed. This model introduces the concept of system availability as a random variable, leading to availability measures for individual realizations, rather than for the average over many trials. Thus this model generalizes the classical definition, and serves as a performance index for system design. A design scheme is suggested to maximize the probability of achieving a specified stochastic cycle availability with respect to the duration of the operating interval between PMs. Numerical computations of the results for Weibull-like distributions illustrate the design criteria. Asymptotic behavior of the performance with no PM is also studied. Extension to other distributions is straightforward.

162 citations
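
A crude Monte-Carlo sketch of the idea of availability as a random variable (not the paper's model; the Weibull life, repair and PM durations, and the availability target are all hypothetical): per cycle, operate until PM at time T or until failure, and estimate P(A >= a0) as a function of T.

```python
import numpy as np

rng = np.random.default_rng(0)

def p_avail(T, a0=0.95, n=100_000):
    ttf = rng.weibull(2.0, n) * 100.0          # Weibull(shape 2, scale 100) lives
    up = np.minimum(ttf, T)                    # uptime within the cycle
    down = np.where(ttf < T, 8.0, 1.0)         # repair takes 8 h, PM takes 1 h
    A = up / (up + down)                       # per-realization availability
    return (A >= a0).mean()

for T in (20, 40, 80, 160):                    # candidate PM intervals
    print(T, round(p_avail(T), 3))
```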


Journal ArticleDOI
TL;DR: In this paper, a modification of the G-O model is proposed; the new model is applied to real software failure data and compared with the G-O and Jelinski-Moranda models.
Abstract: A stochastic model (G-O) for the software failure phenomenon based on a nonhomogeneous Poisson process (NHPP) was suggested by Goel and Okumoto (1979). This model has been widely used, but some important work remains undone on estimating the parameters. The authors present a necessary and sufficient condition for the likelihood estimates to be finite, positive, and unique. A modification of the G-O model is suggested. The performance measures and parametric inferences of the new model are discussed. The new model is applied to real software failure data and compared with the G-O and Jelinski-Moranda models.

143 citations
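
For the original G-O model, m(t) = a(1 - exp(-b t)), the maximum-likelihood equations reduce to a = n / (1 - exp(-b T)) plus a one-dimensional profile equation in b. A sketch with hypothetical failure times (using scipy; note that, as the paper shows, a root need not exist for every data set, which is exactly the finiteness issue studied):

```python
import numpy as np
from scipy.optimize import brentq

times = np.array([3., 8., 19., 32., 55., 86., 110., 140., 180., 240.])  # hypothetical
n, T = len(times), times[-1]

def g(b):   # profile likelihood equation for b after eliminating a
    return n / b - times.sum() - n * T * np.exp(-b * T) / (1 - np.exp(-b * T))

b_hat = brentq(g, 1e-6, 1.0)           # bracketing assumes a root exists here
a_hat = n / (1 - np.exp(-b_hat * T))   # expected total number of faults
print(a_hat, b_hat)
```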


Journal ArticleDOI
TL;DR: In this article, the authors introduced the joint reliability importance (JRI) of two edges in an undirected network, which is an appropriate quantitative measure of the interactions between two edges with respect to source-to-terminal reliability.
Abstract: Joint reliability importance (JRI) of two edges in an undirected network is introduced. Concepts of joint failure importance (JFI) and marginal failure importance (MFI), duals of JRI and marginal reliability importance (MRI), are also introduced. The JRI of two edges in an undirected network is represented by the MRI of each edge in subnetworks. Relationships between JRI and MRI, JRI and JFI, and JFI and MFI are presented. From these relationships, it is shown that the JRI is an appropriate quantitative measure of the interactions of two edges in a network, with respect to source-to-terminal reliability. It is also shown that, in some special cases, the sign of the JRI of two edges can be determined without computing it.

131 citations
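
The JRI of edges i and j is the second-order difference of the reliability function, JRI(i,j) = R(1_i,1_j) - R(1_i,0_j) - R(0_i,1_j) + R(0_i,0_j). A sketch on the standard 5-edge bridge network (edge reliabilities are hypothetical; source-to-terminal reliability is computed by enumeration):

```python
from itertools import product

EDGES = [("s","a"), ("s","b"), ("a","b"), ("a","t"), ("b","t")]
p = [0.9, 0.8, 0.7, 0.9, 0.8]          # hypothetical edge reliabilities

def connects(state):                   # is t reachable from s given edge states?
    up = [e for e, s in zip(EDGES, state) if s]
    seen, stack = {"s"}, ["s"]
    while stack:
        u = stack.pop()
        for x, y in up:
            v = y if x == u else x if y == u else None
            if v and v not in seen:
                seen.add(v); stack.append(v)
    return "t" in seen

def reliability(fixed):                # s-t reliability with some edges pinned
    free = [k for k in range(len(EDGES)) if k not in fixed]
    r = 0.0
    for bits in product([0, 1], repeat=len(free)):
        state, prob = [0] * len(EDGES), 1.0
        for k, s in zip(free, bits):
            state[k], prob = s, prob * (p[k] if s else 1 - p[k])
        for k, s in fixed.items():
            state[k] = s
        if connects(state):
            r += prob
    return r

i, j = 2, 3                            # the bridge edge a-b and edge a-t
jri = (reliability({i: 1, j: 1}) - reliability({i: 1, j: 0})
       - reliability({i: 0, j: 1}) + reliability({i: 0, j: 0}))
print("JRI =", jri)                    # negative here: the edges substitute
```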


Journal ArticleDOI
TL;DR: In this paper, two general approaches available for assessing the reliability of electronics during design, device failure rate prediction and physics-of-failure (PoF) prediction, are compared in a way that is readily understandable by a wide range of readers concerned with the design, manufacture, and support of electronic equipment.
Abstract: Two general approaches available for assessing the reliability of electronics during design are device failure rate prediction, and physics-of-failure. This work broadly compares these two approaches in a way that is readily understandable by the wide range of readers concerned with the design, manufacture, and support of electronic equipment.

119 citations


Journal ArticleDOI
S. H. Sim, J. Endrenyi
TL;DR: In this paper, a Markov model for a continuously operating service device whose condition deteriorates with time in service is proposed, which incorporates deterioration and Poisson failures, minimal repair, periodic minimal maintenance, and major maintenance after a given number of minimal maintenances.
Abstract: A Markov model for a continuously operating service device whose condition deteriorates with time in service is proposed. The model incorporates deterioration and Poisson failures, minimal repair, periodic minimal maintenance, and major maintenance after a given number of minimal maintenances. An exact recursive algorithm computes the steady-state probabilities of the device. A cost function is defined using different cost rates for the different types of outages. Based on minimal unavailability or minimal costs, optimal solutions of the model are derived. Major maintenance is seldom beneficial if optimal maintenance intervals are used. If a maintenance policy is based on nonoptimal intervals between maintenances, periodic major maintenance can reduce costs.

95 citations
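
A small illustration of the steady-state computation for a deterioration/maintenance Markov chain (the states and rates below are hypothetical, not the paper's recursive model): build the generator Q and solve pi Q = 0 with sum(pi) = 1.

```python
import numpy as np

# States: D1 -> D2 -> D3 deterioration, F = failed, M = in maintenance.
Q = np.array([
    #  D1     D2     D3     F      M
    [-0.12,  0.10,  0.00,  0.02,  0.00],   # D1: deteriorate or randomly fail
    [ 0.00, -0.22,  0.10,  0.02,  0.10],   # D2: deteriorate, fail, or maintain
    [ 0.00,  0.00, -0.32,  0.02,  0.30],   # D3
    [ 0.50,  0.00,  0.00, -0.50,  0.00],   # F: repair restores to D1
    [ 2.00,  0.00,  0.00,  0.00, -2.00],   # M: maintenance completes
])
A = np.vstack([Q.T, np.ones(5)])           # pi Q = 0 plus normalization
b = np.concatenate([np.zeros(5), [1.0]])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(dict(zip(["D1", "D2", "D3", "Failed", "Maint"], pi.round(4))))
```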


Journal ArticleDOI
TL;DR: The authors present a heuristic method for solving constrained redundancy optimization problems in complex systems that allows excursions over a bounded infeasible region, which can alleviate the risks of being trapped at a local optimum.
Abstract: The authors present a heuristic method for solving constrained redundancy optimization problems in complex systems. The proposed method allows excursions over a bounded infeasible region, which can alleviate the risks of being trapped at a local optimum. Computational results show that the method performs consistently better than other heuristic methods in terms of solution quality. If solution quality is of more concern and if one is willing to accept a moderate increase in computing time for better solutions, then the authors believe that this method is an attractive alternative to other heuristic methods.

91 citations
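
A random-walk sketch of the key idea (permitting bounded excursions into the infeasible region while tracking the best feasible point); this is not the paper's heuristic, and the series-parallel system data are hypothetical:

```python
import random

cost = [3.0, 5.0, 4.0]          # unit cost per added component
p = [0.8, 0.7, 0.9]             # component reliabilities
BUDGET, SLACK = 30.0, 3.0       # cost constraint and allowed excursion

def rel(x):                     # series system of parallel subsystems
    out = 1.0
    for pi, xi in zip(p, x):
        out *= 1.0 - (1.0 - pi) ** xi
    return out

def cst(x):
    return sum(c * xi for c, xi in zip(cost, x))

random.seed(1)
x, best = [1, 1, 1], None
for _ in range(5000):
    y = x[:]
    y[random.randrange(3)] += random.choice((-1, 1))
    if min(y) < 1 or cst(y) > BUDGET + SLACK:   # excursion stays bounded
        continue
    x = y                                       # move, feasible or not
    if cst(x) <= BUDGET and (best is None or rel(x) > rel(best)):
        best = x[:]                             # record feasible incumbent
print(best, round(rel(best), 4), cst(best))
```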


Journal ArticleDOI
TL;DR: Researchers have proposed cardinality-, lexicographic-, and Hamming-distance-order methods to preprocess the path terms in sum of disjoint products (SDP) techniques for network reliability analysis; the results show that preprocessing based on cardinality, or its combinations with lexicographic- and/or Hamming-distance-ordering, performs better.
Abstract: Researchers have proposed cardinality-, lexicographic-, and Hamming-distance-order methods to preprocess the path terms in sum of disjoint products (SDP) techniques for network reliability analysis. For cutsets, an ordering based on the node partition associated with each cut is suggested. Experimental results showing the number of disjoint products and the computer time involved in generating SDP terms are presented. Nineteen benchmark networks, containing paths varying from 4 to 780 and cuts from 4 to 7376, are considered. Several SDP techniques are generalized into three propositions to find their inherent merits and drawbacks. An efficient SDP technique is then used to run input files of paths/cuts preprocessed using cardinality-, lexicographic-, and Hamming-distance-ordering, and their combinations. The results are analyzed, showing that preprocessing based on cardinality, or its combinations with lexicographic- and/or Hamming-distance-ordering, performs better.
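
A small sketch of the orderings themselves (path sets are hypothetical): sort minimal paths by cardinality with lexicographic tie-breaking, then greedily reorder so each term is close in Hamming distance (symmetric-difference size) to its predecessor.

```python
paths = [{1, 4, 5}, {2, 5}, {1, 3, 5}, {2, 3, 4}, {1, 4}]   # minimal paths

by_card_lex = sorted(paths, key=lambda s: (len(s), sorted(s)))

def hamming_reorder(terms):
    out, rest = [terms[0]], terms[1:]
    while rest:
        prev = out[-1]
        rest.sort(key=lambda s: len(prev ^ s))   # symmetric difference size
        out.append(rest.pop(0))
    return out

print(hamming_reorder(by_card_lex))
```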

Journal ArticleDOI
TL;DR: The effectiveness of the revised NVP design paradigm in improving software reliability by providing fault tolerance is demonstrated.
Abstract: The application of the N-version programming (NVP) technique in a project that reused the revised specification of a real, automatic airplane-landing problem is described. The study involved 40 students, who formed 15 independent programming teams to design, program, test, and evaluate the application. The impact of the paradigm on the software development process, the improvement of the resulting N-version software (NVS) product, the insight, experience, and learning gained in conducting this project, the various testing procedures applied to the program versions, several quantitative measures of the resulting NVS product, and some comparisons with previous projects are discussed. The effectiveness of the revised NVP design paradigm in improving software reliability by providing fault tolerance is demonstrated.
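
At run time, NVP relies on a decision algorithm such as majority voting at cross-check points. A minimal sketch of such a voter (illustrative only; real NVS voters must also handle tolerances on numeric results):

```python
from collections import Counter

def vote(outputs):
    """Return the value produced by a majority of versions, else raise."""
    value, n = Counter(outputs).most_common(1)[0]
    if n * 2 > len(outputs):
        return value
    raise RuntimeError("no majority among versions")

print(vote([10.31, 10.31, 10.29]))   # -> 10.31 despite one divergent version
```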

Journal ArticleDOI
TL;DR: The evaluation results reveal the extent to which performance, dependability, and performability of a variant are improved relative to the basic NVP system, and demonstrate that such evaluations are indeed feasible and useful with regard to enhancing software performability.
Abstract: Model-based performability evaluation is used to assess and improve the effectiveness of fault-tolerant software. The evaluation employs a measure that combines quantifications of performance and dependability in a synergistic manner, thus capturing the interaction between these two important attributes. The specific systems evaluated are a basic realization of N-version programming (NVP) (N=3) along with variants thereof. For each system, its corresponding stochastic process model is constructed in two layers, with performance and dependability submodels residing in the lower layer. The evaluation results reveal the extent to which performance, dependability, and performability of a variant are improved relative to the basic NVP system. More generally, the investigation demonstrates that such evaluations are indeed feasible and useful with regard to enhancing software performability.
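
The flavor of a performability measure can be conveyed by a toy expected-reward computation over structural states of a 3-version system (a sketch of the measure only, not the paper's layered stochastic model; probabilities and rewards are hypothetical):

```python
p_ok = 0.98                                        # per-version success probability
# probability that exactly k of 3 versions produce a correct result:
prob = {3: p_ok**3, 2: 3 * p_ok**2 * (1 - p_ok),
        1: 3 * p_ok * (1 - p_ok)**2, 0: (1 - p_ok)**3}
reward = {3: 1.0, 2: 0.9, 1: 0.0, 0: 0.0}          # degraded service with 2 good
print("performability:", sum(prob[k] * reward[k] for k in prob))
```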

Journal ArticleDOI
TL;DR: In this paper, a new model is presented that accounts for the usage growth of commercial software during the operational phase, and that incorporates a factor to estimate (from field-failure reports) the usage growth.
Abstract: A new model that accounts for the usage growth of commercial software during the operational phase, and that incorporates a factor to estimate (from field-failure reports) the usage growth, is presented. The model can estimate the number of remaining unique defects in wide-distribution commercial software during the operational phase, and the anticipated arrival times of customer-reported failures attributable to these unique defects. The model is based on the Weibull distribution, which assumes that field usage of commercial software increases as a power function of time. The model was fit to the actual failure times for two commercial software products: one that runs on 10^5 systems, and the other that runs on 10^4 systems. The model fits the general shape of the arrival distribution for the actual defect discovery times, but there are minor peaks in the example data that are not explained by the model. Some of the minor modes correspond to peak defect discovery times for subsequent releases of the software.
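
A sketch of the fitting idea (assuming cumulative unique-defect discoveries follow N times a Weibull CDF in calendar time, consistent with power-law usage growth; the counts below are fabricated for illustration, using scipy):

```python
import numpy as np
from scipy.optimize import curve_fit

def model(t, N, scale, shape):
    return N * (1.0 - np.exp(-(t / scale) ** shape))

months = np.arange(1, 25)
found = np.array([1, 2, 4, 7, 11, 15, 20, 24, 29, 33, 36, 40,     # hypothetical
                  43, 45, 48, 50, 51, 53, 54, 55, 56, 56, 57, 57])
(N, scale, shape), _ = curve_fit(model, months, found, p0=[60, 12, 2])
print(f"estimated total unique defects ~ {N:.0f}, remaining ~ {N - found[-1]:.0f}")
```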

Journal ArticleDOI
TL;DR: In this article, the reliability bounds for the Birnbaum-Saunders distribution and point and interval estimates for the critical time of the failure (hazard-rate) were discussed under the assumption that both parameters are unknown.
Abstract: The Birnbaum-Saunders distribution, under certain conditions, can be used to model the fatigue failure-time caused by the catastrophic crack size. Reliability bounds for the Birnbaum-Saunders distribution and point and interval estimates for the critical time of the failure (hazard)-rate are discussed. The confidence intervals are constructed under the assumption that both parameters are unknown. This method is not affected by censoring, as long as confidence intervals for the parameters can be established. Numerical examples illustrate the procedure. >
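
The Birnbaum-Saunders CDF is F(t) = Phi((sqrt(t/beta) - sqrt(beta/t)) / alpha), and its hazard rises to a peak before declining; that peak location is the "critical time" being estimated. A numerical sketch (parameter values are illustrative, using scipy):

```python
import numpy as np
from scipy.stats import norm

def bs_cdf(t, alpha, beta):
    return norm.cdf((np.sqrt(t / beta) - np.sqrt(beta / t)) / alpha)

def bs_hazard(t, alpha, beta, eps=1e-6):
    f = (bs_cdf(t + eps, alpha, beta) - bs_cdf(t - eps, alpha, beta)) / (2 * eps)
    return f / (1.0 - bs_cdf(t, alpha, beta))

t = np.linspace(0.05, 10, 2000)
h = bs_hazard(t, alpha=0.8, beta=1.0)
print("critical time ~", t[np.argmax(h)])   # location of the hazard peak
```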

Journal ArticleDOI
TL;DR: In this article, the authors presented a method that uses accelerated life-test data to estimate the mean life at the service stress and the threshold stress below which a failure is unlikely to occur.
Abstract: The author presents a method that uses accelerated life-test data to estimate the mean life at the service stress and the threshold stress below which a failure is unlikely to occur. The relation between stress and mean life at that stress is assumed to follow an inverse power law that includes a threshold stress. The failure times at a given stress are assumed to follow a Weibull distribution in which the shape parameter varies with the stress. This model extends the well-known Weibull inverse power law model. If only the mean life, but not a specific percentile point at a service stress, is sought, the maximum likelihood method is useful for parameter estimation. This is a tradeoff in the parametric approach. For adoption of an appropriate probability model, the likelihood ratio test as well as the Akaike Information Criterion are used. Type I right-censored data are considered. Extensions of the method are discussed.
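
The stress-life relation is of the form mean life = K / (V - V0)^p for V > V0, with V0 the threshold stress. A least-squares sketch of fitting it to summarized ALT data (the paper uses maximum likelihood on Weibull failure times; stresses and lives below are fabricated, using scipy):

```python
import numpy as np
from scipy.optimize import curve_fit

def mean_life(V, K, p_exp, V0):
    return K / (V - V0) ** p_exp

stress = np.array([60.0, 70.0, 80.0, 90.0])       # hypothetical ALT stresses
life = np.array([4440.0, 1600.0, 815.0, 495.0])   # hypothetical mean lives
(K, p_exp, V0), _ = curve_fit(mean_life, stress, life,
                              p0=[1e5, 2.0, 30.0],
                              bounds=([0, 0.5, 0], [np.inf, 8, 55]))
print(f"threshold stress ~ {V0:.1f}; mean life at V = 50: "
      f"{mean_life(50.0, K, p_exp, V0):.0f}")
```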

Journal ArticleDOI
TL;DR: A one-step algorithm, GEAR (generalized evaluation algorithm for reliability), is introduced that computes the reliability of a distributed computing system (DCS), which usually consists of processing elements, memory units, input/output devices, data files, and programs as its shared resources.
Abstract: A one-step algorithm, GEAR (generalized evaluation algorithm for reliability), is introduced that computes the reliability of a distributed computing system (DCS), which usually consists of processing elements, memory units, input/output devices, data files, and programs as its shared resources. The probability that a task or an application can be computed successfully by sharing the required resources on the DCS is termed the system reliability. Some of the important reliabilities defined using the above concept are discussed, including terminal-pair, computer-network, distributed-program, and distributed-system reliability. GEAR is general enough to compute all four of these measures, and does not require any prior knowledge about multiterminal connections for computing the reliability expression. Many examples are included to illustrate the usefulness of GEAR for computing reliability measures of a DCS.

Journal ArticleDOI
TL;DR: In this paper, the authors extend the results of Usher and Hodgson (1988) by deriving exact maximum likelihood estimators (MLE) for the general case of a series system of three exponential components with independent masking.
Abstract: This work estimates component reliability from masked series-system life data, viz., data where the exact component causing system failure might be unknown. The authors extend the results of Usher and Hodgson (1988) by deriving exact maximum likelihood estimators (MLE) for the general case of a series system of three exponential components with independent masking. Their previous work shows that closed-form MLE are intractable, so they propose an iterative method for the solution of the system of three nonlinear likelihood equations.
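
Under independent masking, each observation (system life t, mask set S) contributes a likelihood term proportional to (sum of the rates in S) * exp(-(l1+l2+l3) t). A numerical-maximization sketch (data are fabricated; the paper derives exact MLE rather than using a generic optimizer):

```python
import numpy as np
from scipy.optimize import minimize

data = [(1.2, {1}), (0.4, {2}), (2.0, {1, 3}), (0.9, {1, 2, 3}),
        (1.5, {3}), (0.7, {2}), (1.1, {1, 2})]   # (life, mask set), hypothetical

def nll(log_rates):
    l = np.exp(log_rates)                 # keeps the rates positive
    s = 0.0
    for t, mask in data:
        s += np.log(sum(l[i - 1] for i in mask)) - l.sum() * t
    return -s

res = minimize(nll, np.log([0.3, 0.3, 0.3]))
print(np.exp(res.x))                      # component failure-rate estimates
```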

Journal ArticleDOI
TL;DR: In this article, the authors illustrate design situations where creep and creep rupture of components can compromise system performance over time, thereby acting as a wearout failure mechanism, and present analytic microstructural creep mechanisms leading to failure of these materials.
Abstract: This tutorial illustrates design situations where creep and creep rupture of components can compromise system performance over time, thereby acting as a wearout failure mechanism. Polycrystalline materials, such as metals and ceramics, and polymers are treated. Analytic microstructural creep mechanisms leading to failure of these materials are presented. Continuum microscale models for predicting long-term creep are explained for practical design purposes. Two examples illustrate these models for mechanical engineering and electronic packaging situations.
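
A standard continuum model of the kind such tutorials cover is Norton power-law (secondary) creep, strain rate = A * sigma^n * exp(-Q/(R T)). A sketch with hypothetical material constants:

```python
import numpy as np

A, n, Q = 10.0, 5.0, 300e3         # hypothetical constants (stress in MPa, Q in J/mol)
R = 8.314                          # gas constant, J/(mol*K)

def creep_rate(sigma_MPa, T_kelvin):
    return A * sigma_MPa ** n * np.exp(-Q / (R * T_kelvin))

# Time to accumulate 1% creep strain at 80 MPa and 900 K (constant-rate estimate):
rate = creep_rate(80.0, 900.0)
print(f"creep rate {rate:.2e} /s -> ~{0.01 / rate / 3600:.0f} h to 1% strain")
```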

Journal ArticleDOI
TL;DR: Phenomenological continuum length-scale models, based on micromechanical considerations, are presented to predict the onset (or initiation) of fatigue cracking in ductile materials, and the number of load cycles required to cause failure is predicted based on these models.
Abstract: This work illustrates design situations where mechanical fatigue under cyclic loading, of one or more components, can compromise system performance. In this failure mechanism, damage accumulates with each load cycle, thereby causing a physical wearout failure mechanism. Phenomenological continuum length-scale models, based on micromechanical considerations, are presented to predict the onset (or initiation) of fatigue cracking in ductile materials. Fatigue crack propagation is modeled with continuum fracture mechanics principles. The number of load cycles required to cause failure is predicted based on these models. Approaches for modeling creep-fatigue interactions are briefly discussed. Analytic physics-of-failure methods and examples are presented for designing against wearout failure due to cyclic fatigue. These models can be implemented in an engineering design environment. The associated stress analysis requires numerical finite element techniques in many cases. The associated material property characterization techniques have matured since the 1950s and are specified in engineering handbooks.
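
Crack propagation under fracture mechanics is commonly modeled with the Paris law, da/dN = C (dK)^m with dK = Y * dS * sqrt(pi a); integrating da/(C dK^m) gives the cycle count to grow a crack from its initial to its critical size. A numerical sketch (all constants hypothetical):

```python
import numpy as np

C, m, Y, dS = 1e-12, 3.0, 1.1, 120.0     # dS in MPa, dK in MPa*sqrt(m)
a0, a_crit = 0.5e-3, 10e-3               # initial and critical crack sizes, metres

a = np.linspace(a0, a_crit, 20000)
dK = Y * dS * np.sqrt(np.pi * a)
cycles = np.sum(np.diff(a) / (C * dK[:-1] ** m))   # integrate dN = da/(C*dK**m)
print(f"predicted crack-growth life ~ {cycles:.2e} cycles")
```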

Journal ArticleDOI
TL;DR: In this paper, lower and upper reliability bounds were derived for two-dimensional consecutive k-out-of-n:F systems with independent, but not necessarily identically distributed, components, and a Weibull limit theorem was proven for system time-to-failure for i.i.d. components.
Abstract: The authors derive lower and upper reliability bounds for the two-dimensional consecutive k-out-of-n:F system (Salvia & Lasher, 1990) with independent, but not necessarily identically distributed, components. A Weibull limit theorem is proven for system time-to-failure for i.i.d. components.
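
In this system an n x n grid of components fails iff some k x k square consists entirely of failed components. A Monte-Carlo sketch against which such bounds could be checked (failure probabilities are hypothetical and deliberately non-identical):

```python
import numpy as np

rng = np.random.default_rng(42)
n, k, trials = 5, 2, 50_000
q = np.full((n, n), 0.05)           # component failure probabilities
q[2, :] = 0.2                       # one less-reliable row, to break i.i.d.

failed_runs = 0
for _ in range(trials):
    down = rng.random((n, n)) < q
    if any(down[i:i+k, j:j+k].all()
           for i in range(n - k + 1) for j in range(n - k + 1)):
        failed_runs += 1
print("estimated system reliability:", 1 - failed_runs / trials)
```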

Journal ArticleDOI
TL;DR: Two new versions of component relevancy for multistate structure functions are introduced and they are compared with some existing component-relevance conditions and their general properties are investigated.
Abstract: Two new versions of component relevancy for multistate structure functions are introduced. They are compared with some existing component-relevance conditions and their general properties are investigated. Based on the two relevance conditions, two component-importance measures for multistate systems are defined; they are most appropriate for comparing components when a certain type of system improvement is considered. Series and parallel structures are characterized within the L-superadditive and L-subadditive structure functions by imposing the two new relevance conditions.

Journal ArticleDOI
TL;DR: In this article, a fully-Bayes approach is presented for analyzing product reliability during the development phase, where the product goes through a series of test/modification stages, where each product test yields attribute (pass-fail) data, and failure types are classified as fixable or nonfixable.
Abstract: A fully Bayes approach is presented for analyzing product reliability during the development phase. Based on a Bayes version of the Barlow-Scheuer reliability-growth model, it is assumed that the product goes through a series of test/modification stages, where each product test yields attribute (pass/fail) data, and failure types are classified as fixable or nonfixable. Relevant information on both the failure probabilities and the reliability-growth process is used to motivate the prior joint distribution for the probability of each failure type over the specified range of testing. Results at a particular test stage can be used to update the knowledge about the probability of each failure type (and thus product reliability) at the current test stage as well as at subsequent test stages, and at the end of the development phase. Ease of incorporating prior information and tractability of the posterior analysis are achieved by using a Dirichlet distribution as the prior distribution for a transformation of the failure probabilities.
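
The prior-to-posterior mechanics at a single test stage can be sketched with a conjugate Dirichlet update over the three outcome types (the parameters and counts below are hypothetical, and the paper's model additionally links stages through the growth process):

```python
import numpy as np

# Outcomes per trial: (pass, fixable failure, nonfixable failure)
prior = np.array([6.0, 2.0, 1.0])        # hypothetical Dirichlet parameters
counts = np.array([8, 2, 0])             # results observed at this stage
post = prior + counts                    # conjugate posterior
print("posterior mean probabilities:", post / post.sum())
```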

Journal ArticleDOI
TL;DR: This study extends the Freund bivariate model to a three-component model and a special N-component model, and a general form for system reliability is developed.
Abstract: A shared-load model of the exponential distribution is used to describe the characteristics of dependent redundancies. In the shared-load model, the redundant components equally share the workload. Upon failure of one or more components, the remaining components must carry an increased load. This study extends the Freund bivariate model to a three-component model and a special N-component model. A general form for system reliability is developed. Failure rates are estimated using maximum likelihood. Systems such as a multiprocessor computer, a multiple-engine system, and a paired system can be described by this model.
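
For the textbook two-unit Freund case (which this paper generalizes): both units fail at rate a while sharing load, and the survivor carries full load at rate b, giving system reliability R(t) = exp(-2at) + [2a/(2a-b)](exp(-bt) - exp(-2at)) for 2a != b. A sketch with a Monte-Carlo cross-check (rates are hypothetical):

```python
import numpy as np

def R(t, a, b):
    return np.exp(-2*a*t) + 2*a / (2*a - b) * (np.exp(-b*t) - np.exp(-2*a*t))

rng = np.random.default_rng(7)
a, b, t = 0.01, 0.025, 50.0
first = rng.exponential(1 / (2*a), 100_000)     # time of first unit failure
life = first + rng.exponential(1 / b, 100_000)  # plus survivor's remaining life
print(R(t, a, b), (life > t).mean())            # closed form vs simulation
```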

Journal ArticleDOI
TL;DR: In this article, upper and lower bounds for the reliability of a (linear or circular) consecutive k-within-m-out-of-n:F system with unequal component-failure probabilities are provided.
Abstract: Upper and lower bounds for the reliability of a (linear or circular) consecutive k-within-m-out-of-n:F system with unequal component-failure probabilities are provided. Numerical calculations indicate that, for systems with components of good enough reliability, these bounds quite adequately estimate system reliability. The estimate is easy to calculate, having computational complexity O(m^2 * n). For identically distributed components, a Weibull limit theorem for system time-to-failure is proved.

Journal ArticleDOI
TL;DR: This work analyzes the use of time and device redundancy in systems subject to correlated failure, and compares fault-tolerant designs with and without a rollback capability, accounting for the increased hardware-failure rate due to processor duplication when faults are detected in hardware, and the doubled execution times when detection is implemented in software.
Abstract: Real-time computers are often used in embedded, life-critical applications where high reliability is important. A common approach to making such systems dependable is to vote on redundant processors executing multiple copies of the same task. The processors which make up such voted systems are subjected not only to independently occurring permanent and transient failures, but also to correlated transients brought about by electromagnetic interference from the operating environment. To counteract these transients, checkpointing and time redundancy are required in addition to processor redundancy. This work analyzes the use of time and device redundancy in systems subject to correlated failure. The tradeoffs in checkpoint placement in such a system are found to be considerably different from those for non-redundant systems without real-time constraints. The authors compare fault-tolerant designs with and without a rollback capability, accounting for the increased hardware-failure rate due to processor duplication when faults are detected in hardware, and the doubled execution times when detection is implemented in software.
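
The basic checkpoint-placement tradeoff (illustrated here with Young's classic first-order approximation, not this paper's correlated-failure, real-time analysis): checkpointing too often wastes time saving state, too rarely wastes time in rework after rollback.

```python
import math

# Overhead per unit of useful work for checkpoint interval T, checkpoint
# cost c, failure rate lam: h(T) ~ c/T + lam*T/2, minimized at sqrt(2c/lam).
c, lam = 0.5, 1e-3        # hypothetical: 0.5 s checkpoint, 1 failure per 1000 s

def overhead(T):
    return c / T + lam * T / 2

T_opt = math.sqrt(2 * c / lam)
print(T_opt, overhead(T_opt), overhead(10.0), overhead(300.0))
```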

Journal ArticleDOI
TL;DR: The authors demonstrate a methodology for evaluating the fault-tolerance characteristics of operational software and illustrate it through case studies of three operating systems: the Tandem GUARDIAN fault-tolerant system, the VAX/VMS distributed system, and the IBM/MVS system.
Abstract: The authors demonstrate a methodology for evaluating the fault-tolerance characteristics of operational software and illustrate it through case studies of three operating systems: the Tandem GUARDIAN fault-tolerant system, the VAX/VMS distributed system, and the IBM/MVS system. Based on measurements from these systems, software error characteristics are investigated by analyzing error distributions and correlation. Two levels of models are developed to analyze the error and recovery processes inside an operating system and the interactions among multiple copies of an operating system running in a distributed environment. Reward analysis is used to evaluate the loss of service due to software errors and the effect of fault-tolerant techniques implemented in the systems.

Journal ArticleDOI
TL;DR: In this paper, a decision model is proposed to determine the waiting time to call the repair facility when the first piece of equipment fails in a two-unit standby system, where the failure and repair rates are assumed to be constant and elicited from an expert's prior beliefs.
Abstract: Obtaining a good maintenance strategy for a standby system is discussed. The problem is analyzed via decision theory to determine the waiting time to call the repair facility (for a two-unit standby system) when the first piece of equipment fails. Previous research into this kind of system is briefly described, and a need for constructing a decision model is explained. The uncertainty of the parameters is accounted for in a Bayes approach in order to consider expert prior knowledge. The failure and repair rates are assumed to be constant and are elicited from an expert's prior beliefs. When no data are available, expert guesses are used. A method is presented for solving the conflicting requirements of system availability and cost through a multiattribute utility function which can express cardinal values for the decision maker's preferences over the objective variables. The decision model derives the appropriate maintenance strategy; it corresponds to a set of actions, procedures, and resources, giving a consequent waiting time before calling the repair facility. The use of the model is demonstrated for a telecommunication system.

Journal ArticleDOI
TL;DR: Four implementations of fault-tolerant software techniques are evaluated with respect to hardware and design faults and the techniques are ranked using an application taxonomy.
Abstract: Four implementations of fault-tolerant software techniques are evaluated with respect to hardware and design faults. Project participants were divided into four groups, each of which developed fault-tolerant software based on a common specification. Each group applied one of the following techniques: N-version programming, recovery block, concurrent error-detection, and algorithm-based fault tolerance. Independent testing and modeling groups analyzed the software. The testing group subjected it to simulated design and hardware faults. The data were then mapped into a discrete-time Markov model developed by the modeling group. The effectiveness of each technique with respect to availability, correctness, and time to failure given an error, as shown by the model, is contrasted with measured data. The model is analyzed with respect to additional figures of merit identified during the modeling process, and the techniques are ranked using an application taxonomy.

Journal ArticleDOI
TL;DR: In this article, the definition of two-dimensional consecutive-k-out-of-n:F systems introduced by Salvia and Lasher is extended to rectangular and cylindrical two-dimensional systems.
Abstract: A.A. Salvia and W.C. Lasher (IEEE Trans. Reliability, vol. 39, no. 3, pp. 382-385, Aug. 1990) introduced the concept of two-dimensional consecutive-k-out-of-n:F systems and developed upper and lower bounds for system reliability. This work extends the definition of two-dimensional consecutive-k-out-of-n:F systems to rectangular and cylindrical two-dimensional consecutive-k-out-of-n systems. Invariant optimal designs of two-dimensional consecutive-k-out-of-n:G systems are developed.