
Showing papers in "IEEE Transactions on Reliability in 2003"


Journal ArticleDOI
TL;DR: The proposed model compares well with other competing models to fit data that exhibits a bathtub-shaped hazard-rate function and can be considered as another useful 3-parameter generalization of the Weibull distribution.
Abstract: A new lifetime distribution capable of modeling a bathtub-shaped hazard-rate function is proposed. The proposed model is derived as a limiting case of the Beta Integrated Model and has both the Weibull distribution and Type 1 extreme value distribution as special cases. The model can be considered as another useful 3-parameter generalization of the Weibull distribution. An advantage of the model is that the model parameters can be estimated easily based on a Weibull probability paper (WPP) plot that serves as a tool for model identification. Model characterization based on the WPP plot is studied. A numerical example is provided and comparison with another Weibull extension, the exponentiated Weibull, is also discussed. The proposed model compares well with other competing models to fit data that exhibits a bathtub-shaped hazard-rate function.
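As a small illustration of the WPP-based identification step described above, the sketch below (generic Python, not code from the paper) computes Weibull-probability-paper coordinates x = ln t and y = ln(-ln(1 - F)) from a complete sample using median-rank plotting positions; under a plain Weibull model the points fall roughly on a straight line, and systematic curvature is what motivates a bathtub-capable extension of this kind. The sample lifetimes are hypothetical.

```python
import numpy as np

def wpp_coordinates(failure_times):
    """Weibull probability paper (WPP) coordinates from a complete sample.

    Uses median-rank estimates F_i = (i - 0.3) / (n + 0.4); points from a
    2-parameter Weibull sample lie approximately on a straight line.
    """
    t = np.sort(np.asarray(failure_times, dtype=float))
    n = len(t)
    ranks = np.arange(1, n + 1)
    f_hat = (ranks - 0.3) / (n + 0.4)          # median-rank plotting positions
    x = np.log(t)                              # abscissa: ln t
    y = np.log(-np.log(1.0 - f_hat))           # ordinate: ln(-ln(1 - F))
    return x, y

# Example: hypothetical lifetimes (hours); curvature on the WPP plot would
# suggest a bathtub-capable Weibull extension rather than the plain Weibull.
x, y = wpp_coordinates([12, 35, 80, 150, 400, 900, 1500, 2100])
slope, intercept = np.polyfit(x, y, 1)         # crude straight-line check
print(slope, intercept)
```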

488 citations


Journal ArticleDOI
TL;DR: It is shown to be unlikely that any practical hazard function is decreasing near zero, and great care should be taken in interpreting the hazard function, particularly in applying quality-control practices, such as burn-in or environmental-stress-screening to manufactured products.
Abstract: This paper addresses some of the fundamental assumptions underlying the bathtub curve. It is shown to be unlikely that any practical hazard function is decreasing near zero. Great care should be taken in interpreting the hazard function, particularly in applying quality-control practices, such as burn-in or environmental-stress-screening, to manufactured products.

305 citations


Journal ArticleDOI
TL;DR: Three redundancy analysis algorithms which can be implemented on-chip based on the local-bitmap idea are presented: the local repair-most approach is efficient for a general spare architecture, and the local optimization approach has the best repair rate.
Abstract: With the advance of VLSI technology, the capacity and density of memories are rapidly growing. Yield improvement and testing issues have become the most critical challenges for memory manufacturing. Conventionally, redundancies are applied so that faulty cells can be repaired. Redundancy analysis using external memory testers is becoming inefficient as the chip density continues to grow, especially for system chips with large embedded memories. This paper presents three redundancy analysis algorithms which can be implemented on-chip. Among them, two are based on the local-bitmap idea: the local repair-most approach is efficient for a general spare architecture, and the local optimization approach has the best repair rate. The essential spare pivoting technique is proposed to reduce the control complexity. Furthermore, a simulator has been developed for evaluating the repair efficiency of different algorithms. It is also used for determining certain important parameters in redundancy design. The redundancy analysis circuit can easily be integrated with the built-in self-test circuit.
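As a rough illustration of the repair-most idea (a generic greedy bitmap heuristic, not the on-chip circuitry or the exact algorithms of the paper), the sketch below repeatedly assigns a spare row or spare column to whichever row or column of the fault bitmap currently covers the most remaining faulty cells.

```python
from collections import Counter

def repair_most(fault_cells, spare_rows, spare_cols):
    """Greedy repair-most allocation on a fault bitmap.

    fault_cells: set of (row, col) faulty-cell coordinates.
    Returns (repaired?, rows_used, cols_used).
    """
    faults = set(fault_cells)
    rows_used, cols_used = [], []
    while faults:
        row_count = Counter(r for r, _ in faults)
        col_count = Counter(c for _, c in faults)
        best_row, n_row = row_count.most_common(1)[0]
        best_col, n_col = col_count.most_common(1)[0]
        # Pick the line (row or column) covering the most remaining faults,
        # subject to spare availability.
        if (n_row >= n_col and spare_rows > len(rows_used)) or spare_cols <= len(cols_used):
            if spare_rows <= len(rows_used):
                return False, rows_used, cols_used   # out of spares
            rows_used.append(best_row)
            faults = {(r, c) for r, c in faults if r != best_row}
        else:
            cols_used.append(best_col)
            faults = {(r, c) for r, c in faults if c != best_col}
    return True, rows_used, cols_used

# Example: three faults clustered in row 2 plus one isolated fault.
print(repair_most({(2, 1), (2, 5), (2, 7), (6, 3)}, spare_rows=1, spare_cols=1))
```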

189 citations


Journal ArticleDOI
TL;DR: ECAY yields a less costly reliability allocation within a reasonable computing time on large systems, and optimizes the weight and space obstruction in system design through an optimal redundancy allocation.
Abstract: This paper considers the allocation of reliability and redundancy to parallel-series systems, while minimizing the cost of the system. It is proven that under usual conditions satisfied by cost functions, a necessary condition for optimal reliability allocation of parallel-series systems is that the reliabilities of the redundant components of a given subsystem be identical. An optimal algorithm is proposed to solve this optimization problem. This paper proves that the components in each stage of a parallel-series system must have identical reliability, under some nonrestrictive condition on the component's reliability cost functions. This demonstration provides a firm grounding for what many authors have hitherto taken as a working hypothesis. Using this result, an algorithm, ECAY, is proposed for the design of systems with parallel-series architecture, which allows the allocation of both reliability and redundancy to each subsystem for a target reliability while minimizing the system cost. ECAY has the added advantage of allowing the optimal reliability allocation in a very short time. A benchmark is used to compare the ECAY performance to LM-based algorithms. For a given reliability target, ECAY produced the lowest reliability costs and the optimum redundancy levels in the successive reliability allocation for all cases studied, viz, systems of 4, 5, 6, 7, 8, and 9 stages or subsystems. Thus ECAY, as compared with LM-based algorithms, yields a less costly reliability allocation within a reasonable computing time on large systems, and optimizes the weight and space obstruction in system design through an optimal redundancy allocation.
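The identical-reliability property lends itself to a compact numerical check. The sketch below (illustrative only, not the ECAY algorithm) evaluates the reliability and cost of a parallel-series system in which stage j uses n_j identical redundant components of reliability r_j; the exponential component-cost function c(r) = alpha*exp(beta*r) is an assumed placeholder.

```python
import math

def system_reliability(stage_reliabilities, redundancies):
    """Parallel-series system: stages in series, identical components in parallel."""
    r_sys = 1.0
    for r, n in zip(stage_reliabilities, redundancies):
        r_sys *= 1.0 - (1.0 - r) ** n      # stage survives if any component survives
    return r_sys

def system_cost(stage_reliabilities, redundancies, alpha=1.0, beta=3.0):
    """Hypothetical component cost c(r) = alpha * exp(beta * r), summed over all components."""
    return sum(n * alpha * math.exp(beta * r)
               for r, n in zip(stage_reliabilities, redundancies))

# Example: a 3-stage parallel-series design.
r = [0.90, 0.85, 0.95]
n = [2, 3, 1]
print(system_reliability(r, n), system_cost(r, n))
```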

170 citations


Journal ArticleDOI
TL;DR: This study applies, tests, and compares two EWMA techniques to detect anomalous changes in event intensity for intrusion detection: EWMA for autocorrelated data and EWMA for uncorrelated data.
Abstract: Reliability and quality of service from information systems has been threatened by cyber intrusions. To protect information systems from intrusions and thus assure reliability and quality of service, it is highly desirable to develop techniques that detect intrusions. Many intrusions manifest in anomalous changes in intensity of events occurring in information systems. In this study, we apply, test, and compare two EWMA techniques to detect anomalous changes in event intensity for intrusion detection: EWMA for autocorrelated data and EWMA for uncorrelated data. Different parameter settings and their effects on performance of these EWMA techniques are also investigated to provide guidelines for practical use of these techniques.
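For the uncorrelated-data case, the EWMA statistic and its time-varying control limits reduce to a few lines. The sketch below is a generic EWMA chart applied to an event-intensity series; the smoothing constant, control-limit width, and simulated data are illustrative choices, not the settings recommended in the paper.

```python
import numpy as np

def ewma_alarms(x, lam=0.2, L=3.0):
    """EWMA control chart for approximately uncorrelated observations.

    z_t = lam * x_t + (1 - lam) * z_{t-1}, started at the in-control mean;
    an alarm is raised when z_t leaves mu0 +/- L * sigma_z(t).
    """
    x = np.asarray(x, dtype=float)
    mu0, sigma0 = x[:50].mean(), x[:50].std(ddof=1)   # estimate in-control parameters
    z = mu0
    alarms = []
    for t, xt in enumerate(x):
        z = lam * xt + (1.0 - lam) * z
        var_z = sigma0**2 * (lam / (2 - lam)) * (1 - (1 - lam) ** (2 * (t + 1)))
        if abs(z - mu0) > L * np.sqrt(var_z):
            alarms.append(t)
    return alarms

# Example: event intensities with a shift (simulated intrusion) after t = 200.
rng = np.random.default_rng(0)
series = np.concatenate([rng.poisson(10, 200), rng.poisson(16, 50)])
print(ewma_alarms(series))
```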

145 citations


Journal ArticleDOI
TL;DR: A failure prediction method for PM by state estimation using the Kalman filter is presented; the proposed Petri net approach not only achieves early failure detection and isolation for fault diagnosis but also facilitates event counting, system-state description, and automatic shutdown or regulation.
Abstract: Preventive maintenance (PM) is an effective approach for reliability enhancement. Time-based and condition-based maintenance are the two major approaches for PM; condition-based maintenance can be a better and more cost-effective choice than time-based maintenance. However, irrespective of the approach adopted for PM, whether a failure can be detected early or even predicted is the key point. This paper presents a failure prediction method for PM by state estimation using the Kalman filter. To improve preventive maintenance, this study uses a hybrid Petri-net modeling method coupled with fault-tree analysis and Kalman filtering to perform failure prediction and processing. A Petri net arrangement, viz, the early failure detection and isolation arrangement (EFDIA), is used; it facilitates alarm, early failure detection, fault isolation, event counting, system-state description, and automatic shutdown or regulation. These functions are very useful for health monitoring and preventive maintenance of a system. This study implements the EFDIA as an application-specific integrated circuit on a Xilinx demonstration board, and a condition-monitoring system of a thermal power plant is used as an example to demonstrate the proposed scheme. The FPN (the Petri net dealing with system failure) has to be constructed beforehand. The next step is to obtain control charts for all fault places in the FPN in order to prescribe thresholds and increment times for every step in Kalman prediction. Afterwards, the system model of each place in the FPN must be derived to perform Kalman filtering. With these prerequisites, the method can be applied to any system. Since the triggering signal of the S_i place of the EFDIA (S_i is a place for the Kalman-predicted indicator value of the sensing signal for the FPN) indicates that subsystem #i performance is going to reach the prescribed failure threshold, the signal can be provided via the Kalman filtering method. Linking the Kalman filter to the EFDIA Petri net thus yields a condition-based failure prediction and processing scheme for preventive maintenance.
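The failure-prediction step can be illustrated with a minimal scalar Kalman filter: a degradation indicator is tracked with a constant-velocity state model, and an early warning (conceptually, the trigger for an S_i place) is raised when the filtered estimate, extrapolated a few steps ahead, crosses a prescribed failure threshold. This is a generic sketch under assumed dynamics and noise levels, not the plant model or Petri net of the paper.

```python
import numpy as np

def kalman_failure_warning(z, threshold, horizon=5, dt=1.0, q=1e-3, r=0.25):
    """Scalar constant-velocity Kalman filter with threshold-crossing prediction."""
    F = np.array([[1.0, dt], [0.0, 1.0]])        # state transition (level, drift)
    H = np.array([[1.0, 0.0]])                   # we observe the level only
    Q = q * np.eye(2)                            # assumed process noise
    R = np.array([[r]])                          # assumed measurement noise
    x = np.array([[z[0]], [0.0]])
    P = np.eye(2)
    warnings = []
    for k, zk in enumerate(z):
        x, P = F @ x, F @ P @ F.T + Q            # predict
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
        x = x + K @ (np.array([[zk]]) - H @ x)   # update
        P = (np.eye(2) - K @ H) @ P
        level, drift = float(x[0, 0]), float(x[1, 0])
        if level + horizon * dt * drift >= threshold:
            warnings.append(k)                   # predicted to exceed threshold soon
    return warnings

# Example: a slowly drifting sensor indicator approaching a failure threshold of 10.
rng = np.random.default_rng(1)
signal = 5 + 0.05 * np.arange(120) + rng.normal(0, 0.5, 120)
print(kalman_failure_warning(signal, threshold=10.0))
```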

123 citations


Journal ArticleDOI
TL;DR: It is demonstrated that the probability coverages of the pivotal quantities (for location and scale parameters) based on asymptotic s-normality are unsatisfactory, and particularly so when the effective sample size is small, and suggests using unconditional simulated percentage points of these pivotal quantities for constructing s-confidence intervals.
Abstract: The likelihood equations based on a progressively Type-II censored sample from a Gaussian distribution do not provide explicit solutions in any situation except the complete sample case. This paper examines numerically the bias and mean square error of the MLE, and demonstrates that the probability coverages of the pivotal quantities (for location and scale parameters) based on asymptotic s-normality are unsatisfactory, and particularly so when the effective sample size is small. Therefore, this paper suggests using unconditional simulated percentage points of these pivotal quantities for constructing s-confidence intervals. An approximation of the Gaussian hazard function is used to develop approximate estimators which are explicit and are almost as efficient as the MLE in terms of bias and mean square error; however, the probability coverages of the corresponding pivotal quantities based on asymptotic s-normality are also unsatisfactory. A wide range of sample sizes and progressive censoring schemes are used in this study.
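Because the suggested confidence intervals rest on simulated percentage points, a generator of progressively Type-II censored samples is a natural building block. The sketch below uses the direct sequential description of the scheme (observe the smallest remaining lifetime, then withdraw R_i surviving units at random); it is a simulation helper only, not the paper's estimation procedure.

```python
import numpy as np

def progressive_type2_sample(mu, sigma, n, scheme, rng=None):
    """Generate a progressively Type-II censored sample from N(mu, sigma^2).

    scheme: list (R_1, ..., R_m); after the i-th observed failure, R_i of the
    surviving units are withdrawn at random. Requires n = m + sum(scheme).
    """
    rng = np.random.default_rng(rng)
    m = len(scheme)
    assert n == m + sum(scheme), "scheme inconsistent with sample size"
    alive = list(rng.normal(mu, sigma, size=n))
    observed = []
    for R_i in scheme:
        alive.sort()
        observed.append(alive.pop(0))                      # smallest remaining lifetime fails
        drop = set(rng.choice(len(alive), size=R_i, replace=False))
        alive = [x for j, x in enumerate(alive) if j not in drop]
    return np.array(observed)

# Example: n = 20 units, m = 10 observed failures, censoring scheme (5, 0, ..., 0, 5).
print(progressive_type2_sample(0.0, 1.0, 20, [5, 0, 0, 0, 0, 0, 0, 0, 0, 5], rng=2))
```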

120 citations


Journal ArticleDOI
TL;DR: The probabilistic measure of component importance developed by Birnbaum (1969) is considered and an extension of this measure is proposed which enables noncoherent importance analysis and should make the derivation of other measures possible.
Abstract: Importance analysis of noncoherent systems is limited, and is generally inaccurate because all measures of importance that have been developed are strictly for coherent analysis. This paper considers the probabilistic measure of component importance developed by Birnbaum (1969). An extension of this measure is proposed which enables noncoherent importance analysis. As a result of the proposed extension, the average number of system failures in a given interval for noncoherent systems can be calculated more efficiently. Furthermore, because Birnbaum's measure of component importance is central to many other measures of importance, its extension should make the derivation of other measures possible.
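For a coherent system, Birnbaum's measure for component i is G_i = h(1_i, p) - h(0_i, p), the difference in system reliability with component i forced working versus forced failed. The sketch below evaluates it for an arbitrary structure function by exhaustive enumeration (practical only for small systems); the noncoherent extension proposed in the paper, which separates failure and repair contributions, is not reproduced here.

```python
from itertools import product

def system_reliability(structure, p):
    """Exact reliability by enumerating component states (small systems only)."""
    rel = 0.0
    for states in product([0, 1], repeat=len(p)):
        prob = 1.0
        for s, pi in zip(states, p):
            prob *= pi if s else (1.0 - pi)
        rel += prob * structure(states)
    return rel

def birnbaum_importance(structure, p, i):
    """G_i = h(1_i, p) - h(0_i, p)."""
    hi = system_reliability(lambda s: structure(s[:i] + (1,) + s[i + 1:]), p)
    lo = system_reliability(lambda s: structure(s[:i] + (0,) + s[i + 1:]), p)
    return hi - lo

# Example: 2-out-of-3 system.
two_of_three = lambda s: int(sum(s) >= 2)
p = [0.9, 0.8, 0.7]
print([round(birnbaum_importance(two_of_three, p, i), 4) for i in range(3)])
```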

105 citations


Journal ArticleDOI
TL;DR: A statistical model is developed of an operating/maintenance environment in which optimal timing of maintenance repairs depends fundamentally on the failure rate of the system, and appropriate parameter estimates for multiple phenomena can be obtained.
Abstract: A system (machine) is observed to operate in 1 of 2 modes. The most common mode is loaded (or regular) operation. Occasionally the system is placed in an unloaded state, wherein the system is still mechanically operating but the failure intensity is assumed to be lower due to the reduced operating intensity. A proportional hazards framework is used to capture this potential reduction in failure intensity due to switching of operating modes. In either operating condition, the analyzed maintenance records indicate that the system was occasionally shut down and either a minor or a major repair was undertaken. Furthermore, despite such repairs, both modes of operation (loaded or unloaded) resulted in random failures. On failure, 1 of 3 actions is taken: (1) the failure is minimally repaired, (2) a minor repair is performed, or (3) a major repair is performed. Both minor and major repairs are assumed to affect the failure intensity following a virtual-age process of the general form proposed by Kijima. This research develops a statistical model of such an operating/maintenance environment. Its purpose is to quantify the impact of these repair actions on the failure intensities. Field data from an industrial setting demonstrate that appropriate parameter estimates for such multiple phenomena can be obtained. Providing a richer, more detailed model of the failure intensity of a system, incorporating both operating conditions and repair effects, has important ramifications for maintenance planning. This paper refers to related research, in which the optimal timing of maintenance repairs depends fundamentally on the failure rate of the system.
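The repair-effect part of such a model can be sketched with a Kijima type-I virtual-age update: a repair of effectiveness q maps the age to v_n = v_{n-1} + q*x_n, and the post-repair intensity is the baseline hazard evaluated at the virtual age, scaled by a proportional-hazards factor for the operating mode. The snippet below is a generic illustration with an assumed Weibull baseline, not the fitted field model.

```python
def virtual_age_trajectory(inter_event_times, repair_effects, beta=2.2, eta=1000.0,
                           load_factor=1.0):
    """Kijima type-I virtual ages and post-repair failure intensities.

    repair_effects: q in [0, 1] per event (1 = minimal repair, small q = near-perfect repair).
    Baseline hazard: Weibull, h(v) = (beta / eta) * (v / eta) ** (beta - 1),
    scaled by a proportional-hazards load factor (1.0 = loaded mode).
    """
    v = 0.0
    out = []
    for x, q in zip(inter_event_times, repair_effects):
        v += q * x                                                  # only a fraction of the
        h = load_factor * (beta / eta) * (v / eta) ** (beta - 1)    # sojourn adds to the age
        out.append((v, h))
    return out

# Example: minimal repair (q=1), minor repair (q=0.5), major repair (q=0.1).
for v, h in virtual_age_trajectory([300.0, 400.0, 250.0], [1.0, 0.5, 0.1]):
    print(f"virtual age = {v:7.1f} h, intensity = {h:.6f} /h")
```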

102 citations


Journal ArticleDOI
TL;DR: This paper proposes an alternative method which seeks to identify those links or nodes whose failure would impair network performance the most, and suggests that the mixed strategy Nash equilibrium can be found by the MSA (method of successive averages).
Abstract: Conventional approaches to network reliability analysis are based on either connectivity or capacity. This paper proposes an alternative method which seeks to identify those links or nodes whose failure would impair network performance the most. It is assumed that all links have two costs, a normal cost and a failed cost, both of which can be traffic-dependent. A 2-player, noncooperative, zero-sum game is envisaged between a router, seeking a least-cost path, and a virtual network tester, seeking to maximize trip-cost by failing 1 link. At the mixed strategy Nash equilibrium, link-use probabilities are optimal for the router, and link-failure probabilities are optimal for the tester. Finding the equilibrium involves solving a maximin programming problem. When link costs are fixed (not traffic-dependent), the maximin problem can be recast as a linear programming problem. Two forms of the linear programming problem are presented, one requiring path enumeration, and the other not. The interpretation of the primal and dual variables is elucidated by two propositions. Where link costs are traffic-dependent (e.g., where queuing is a feature), the mixed strategy Nash equilibrium can be found by the MSA (method of successive averages). A numerical example illustrates the approach on a stochastic network with queuing. While the example relates to a single commodity, the approach extends to cases with multiple commodities, e.g., where there are multiple origins and destinations.
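With fixed link costs and enumerated paths, the router/tester game is a finite matrix game, and its mixed-strategy equilibrium follows from the standard linear-programming formulation of a zero-sum game. The sketch below solves a small, hypothetical path-versus-failed-link cost matrix with scipy.optimize.linprog; it corresponds to the path-enumeration form, not the paper's path-free formulation or the MSA procedure for traffic-dependent costs.

```python
import numpy as np
from scipy.optimize import linprog

def solve_router_game(cost):
    """Mixed-strategy equilibrium of the zero-sum router (min) vs tester (max) game.

    cost[i][j] = trip cost on path i when the tester fails link j.
    Returns (router path-use probabilities, game value).
    """
    cost = np.asarray(cost, dtype=float)
    m, n = cost.shape                                        # m paths, n failable links
    # Variables: p_1..p_m (path-use probabilities) and v (worst-case expected trip cost).
    c = np.concatenate([np.zeros(m), [1.0]])                 # minimize v
    A_ub = np.hstack([cost.T, -np.ones((n, 1))])             # cost^T p - v <= 0 for every link
    b_ub = np.zeros(n)
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)
    b_eq = [1.0]                                             # probabilities sum to 1
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[m]

# Hypothetical 3-path, 3-link example: entry = path cost when that link has failed.
cost = [[4.0, 9.0, 4.0],
        [7.0, 3.0, 7.0],
        [5.0, 5.0, 8.0]]
p, value = solve_router_game(cost)
print(np.round(p, 3), round(value, 3))
```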

93 citations


Journal ArticleDOI
TL;DR: Four commonly used measures of importance are extended, using the noncoherent extension of Birnbaum's measure of component reliability importance, to highlight that it is crucial to choose appropriate measures to analyze component importance.
Abstract: Component importance analysis is a key part of the system reliability quantification process. It enables the weakest areas of a system to be identified and indicates modifications, which will improve the system reliability. Although a wide range of importance measures have been developed, the majority of these measures are strictly for coherent system analysis. Noncoherent systems can occur and accurate importance analysis is essential. This paper extends four commonly used measures of importance, using the noncoherent extension of Birnbaum's measure of component reliability importance. Since both component failure and repair can contribute to system failure in a noncoherent system, both of these influences need to be considered. This paper highlights that it is crucial to choose appropriate measures to analyze component importance. First the aims of the analysis must be outlined, and then the roles that component failures and repairs can play in system state deterioration can be considered. For example, the failure/repair of components in safety systems can play only a passive role in system failure, since such a system is usually inactive; hence measures that consider initiator importance are not appropriate for analyzing the importance of these components. Measures of importance must be chosen carefully to ensure analysis is meaningful and useful conclusions can be drawn.

Journal ArticleDOI
TL;DR: The solution, in this paper, is to partition complex models into a hierarchy of submodels, to transform lower-level n-state, m-transition Markov reward models and stochastic reward nets into equivalent 2- state, 2-Transition models, and then to back-substitute the equivalent submodels into the higher-level models.
Abstract: Telecommunication systems are large and complex, consisting of multiple intelligent modules in shelves, multiple shelves in frames, and multiple frames to compose a single network element. In the availability and performability analysis of such a complex system, combinatorial models are computationally efficient but have limited expressive power. State-based models are expressive but computationally complex. Furthermore, this complexity grows exponentially with the size of the model. This state-space explosion problem must be solved in order to model complex systems using state-based models. The solution, in this paper, is to partition complex models into a hierarchy of submodels, to transform lower-level n-state, m-transition Markov reward models and stochastic reward nets into equivalent (with respect to their steady-state behavior) 2-state, 2-transition models, and then to back-substitute the equivalent submodels into the higher-level models. This paper also proposes a canonical form for the equivalent submodels. This technique is defined for availability models, where the state of the system is either up or down, and for performability models, where the state of the system may be up, down, or partially-up/partially-down. This paper also shows how this technique can be used to obtain common availability measures for telecommunication systems, and when to apply it to availability models and when to use it in performability models. For future work, it would be interesting to more tightly integrate this technique with modeling tools, perhaps coupled with a graphic front-end to facilitate the navigation of the model hierarchy.
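The aggregation step can be mimicked for a bare continuous-time Markov availability submodel: compute the steady-state vector, then build an equivalent 2-state (up/down) model whose failure and repair rates preserve the steady-state up probability and the probability flow across the up/down boundary. The generator matrix below is hypothetical, and the sketch ignores the reward structure of the Markov reward models and stochastic reward nets treated in the paper.

```python
import numpy as np

def steady_state(Q):
    """Steady-state distribution of a CTMC with generator Q (rows sum to 0)."""
    n = Q.shape[0]
    A = np.vstack([Q.T, np.ones(n)])            # solve pi Q = 0 with sum(pi) = 1
    b = np.concatenate([np.zeros(n), [1.0]])
    return np.linalg.lstsq(A, b, rcond=None)[0]

def two_state_equivalent(Q, up_states):
    """Equivalent failure/repair rates preserving steady-state up/down behavior."""
    pi = steady_state(Q)
    down = [i for i in range(Q.shape[0]) if i not in up_states]
    p_up = pi[up_states].sum()
    flow_up_to_down = sum(pi[i] * Q[i, j] for i in up_states for j in down)
    lam_eq = flow_up_to_down / p_up             # equivalent failure rate
    mu_eq = flow_up_to_down / (1.0 - p_up)      # equivalent repair rate (flow balance)
    return lam_eq, mu_eq, p_up

# Hypothetical 3-state submodel: 0 = up, 1 = degraded-but-up, 2 = down.
Q = np.array([[-0.02, 0.015, 0.005],
              [0.05, -0.06, 0.01],
              [0.0, 0.5, -0.5]])
print(two_state_equivalent(Q, up_states=[0, 1]))
```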

Journal ArticleDOI
TL;DR: The results of testing and of sensitivity analysis of the model prove that a trade-off exists between the replacement related cost and the inventory related cost, and indicates that separate optimization of preventive maintenance policy and spare-provisioning policy does not ensure minimal total cost of system maintenance.
Abstract: This paper considers the problem of joint optimization of "preventive maintenance" and "spare-provisioning policy" for system components subject to wear-out failures. A stochastic mathematical model is developed to determine the jointly optimal "block replacement" and "periodic review spare-provisioning policy." The objective function of the model represents the s-expected total cost of system maintenance per unit time, while the preventive replacement interval and the maximal inventory level are chosen as the decision variables. The objective function of the model is in an analytic form with parameters easily obtainable from field data. The model has been tested using field data on electric locomotives in Slovenian Railways. The calculated optimal values of the model decision variables are realistic. "Sensitivity analysis of the model" shows that the model is relatively insensitive to moderate changes of the parameter values. The results of testing and of sensitivity analysis of the model prove that a trade-off exists between the replacement related cost and the inventory related cost. The jointly optimal preventive replacement interval defined by this model differs appreciably from the corresponding interval determined by the conventional model where only replacement related costs are considered. Also, the results of the sensitivity analysis show that even minor modification of the value of each model decision variable (without the appropriate adjustment of the value of the other decision variable) can lead to an appreciable increase in the s-expected total cost of system maintenance. This indicates that separate optimization of preventive maintenance policy and spare-provisioning policy does not ensure minimal total cost of system maintenance. This model can be readily applied to optimize maintenance procedures for a variety of industrial systems, and to upgrade maintenance policy in situations where block replacement preventive maintenance is already in use.

Journal ArticleDOI
TL;DR: This paper defines a new utility importance of a state of a component in multi-state systems that overcomes some drawbacks of a well-known importance measure suggested by William S. Griffith (J. Applied Probability, 1980).
Abstract: This paper defines a new utility importance of a state of a component in multi-state systems. This utility importance overcomes some drawbacks of a well-known importance measure suggested by William S. Griffith (J. Applied Probability, 1980). The relationship between this new utility importance and the Griffith importance is studied and their difference is illustrated with examples. The contribution of an individual component to the performance utility of a multi-state system is discussed. Examples show that a meaningful index for measuring the performance of individual components in a multi-state system can hardly be defined in general, without considering the actual values of the utility levels and the distributions of the component-states in the system. An example illustrates how a genetic algorithm, simulated annealing, and tabu search can be used in selecting components and defining the position order of components so that the performance utility of a multi-state system is optimized.

Journal ArticleDOI
TL;DR: This paper studies a geometric-process maintenance-model for a deteriorating system under a random environment, using a compound Poisson process model as a particular case.
Abstract: This paper studies a geometric-process maintenance-model for a deteriorating system under a random environment. Assume that the number of random shocks, up to time t, produced by the random environment forms a counting process. Whenever a random shock arrives, the system operating time is reduced. The successive reductions in the system operating time are statistically independent and identically distributed random variables. Assume that the consecutive repair times of the system after failures form an increasing geometric process; under the condition that the system suffers no random shock, the successive operating times of the system after repairs constitute a decreasing geometric process. A replacement policy N, by which the system is replaced at the time of the Nth failure, is adopted. An explicit expression for the average cost rate (long-run average cost per unit time) is derived. Then, an optimal replacement policy is determined analytically. As a particular case, a compound Poisson process model is also studied.
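The average cost rate of a candidate policy N can be checked by simulation under the geometric-process assumptions: successive operating times form a decreasing geometric process and successive repair times an increasing one. The sketch below is a Monte Carlo evaluation with assumed exponential underlying distributions, illustrative cost parameters, and the random-shock reductions omitted for brevity; it is not the paper's analytic expression.

```python
import numpy as np

def average_cost_rate(N, a=1.05, b=0.95, lam=0.01, mu=0.05,
                      c_repair=20.0, c_op_reward=5.0, c_replace=400.0,
                      cycles=20000, rng=None):
    """Monte Carlo estimate of the long-run average cost rate under policy N.

    Operating times form a decreasing geometric process: X_k ~ Exp(lam) / a**(k-1), a > 1.
    Repair times form an increasing geometric process:  Z_k ~ Exp(mu) / b**(k-1), 0 < b < 1.
    The system earns c_op_reward per unit operating time, pays c_repair per unit
    repair time, and pays c_replace at the N-th failure, when it is replaced.
    (Random-shock reductions of the operating time are omitted in this sketch.)
    """
    rng = np.random.default_rng(rng)
    total_cost, total_time = 0.0, 0.0
    for _ in range(cycles):                       # one renewal cycle = life of one system
        for k in range(1, N + 1):
            x = rng.exponential(1.0 / lam) / a ** (k - 1)
            total_time += x
            total_cost -= c_op_reward * x         # operating reward counts as negative cost
            if k < N:                             # failures 1..N-1 are repaired
                z = rng.exponential(1.0 / mu) / b ** (k - 1)
                total_time += z
                total_cost += c_repair * z
        total_cost += c_replace                   # replacement at the N-th failure
    return total_cost / total_time

# Compare a few candidate policies.
for N in (2, 4, 6, 8):
    print(N, round(average_cost_rate(N, rng=0), 3))
```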

Journal ArticleDOI
TL;DR: The proposed system yields an SSMM characterized by high reliability and high speed, due to the intrinsic parallelism of the switching matrix, and the results show that the SCU is able to recover from transient faults.
Abstract: This paper describes a novel architecture of a fault tolerant solid state mass memory (SSMM) for satellite applications. Mass memories with low latency, high throughput, and large storage capability cannot be easily implemented using space-qualified components, due to the inevitable technological lag of this class of components. For this reason, the choice of commercial off-the-shelf (COTS) components is mandatory for this application, and the design of an electronic system for space applications based on commercial components must meet the reliability requirements through system-level methodologies. In the proposed architecture, error-correcting codes are used to strengthen the commercial dynamic random access memory (DRAM) chips, while the system controller is developed by applying fault tolerant design solutions. The main features of the SSMM are its dynamic reconfiguration capability and its high performance, which can be gracefully reduced in case of permanent faults while maintaining part of the system functionality. The paper presents the system design methodology, the architecture, and the simulation results of the SSMM. The properties of the building blocks are described in detail, both in their functionality and in their fault tolerant capabilities, and a detailed analysis of the system reliability and data integrity is reported. The graceful degradation capability of the system allows different levels of acceptable performance, in terms of active I/O link interfaces and storage capability. The results also show that the overall reliability of the SSMM is almost the same under different RS coding schemes, allowing dynamic reconfiguration of the coding to reduce the latency (shorter codewords) or to improve the data integrity (longer codewords). A scrubbing technique can be useful if a high SEU rate is expected, or if the data must be stored in the SSMM for a long period. The reported simulations show the behavior of the SSMM in the presence of permanent and transient faults: the SCU is able to recover from transient faults, and, by using a spare microcontroller, hard faults can also be tolerated. The distributed file system confines unrecoverable fault effects to a single I/O interface, so the SSMM maintains its capability to store and read data. The proposed approach thus yields an SSMM characterized by high reliability and high speed, due to the intrinsic parallelism of the switching matrix.

Journal ArticleDOI
TL;DR: This paper proposes a new formulation of the recursive variance-reduction Monte Carlo estimator of the κ-terminal unreliability parameter of communication systems that allows significant reduction in the simulation execution time, as demonstrated by experimental results.
Abstract: This paper proposes a new formulation of the recursive variance-reduction Monte Carlo estimator of the κ-terminal unreliability parameter of communication systems. This formulation allows significant reduction in the simulation execution time, as demonstrated by experimental results.
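For context, a plain (non-variance-reduced) Monte Carlo estimator of κ-terminal unreliability takes only a few lines; the recursive variance-reduction estimator reformulated in the paper is far more efficient, but this crude baseline shows the quantity being estimated. The bridge network and link reliabilities below are hypothetical.

```python
import random

def k_terminal_unreliability_mc(nodes, edges, terminals, samples=100000, seed=0):
    """Crude Monte Carlo estimate of K-terminal unreliability.

    edges: list of (u, v, p_up) with s-independent link failures.
    The system works if all terminals lie in one connected component of the
    surviving links; no variance reduction is applied in this baseline.
    """
    rng = random.Random(seed)
    failures = 0
    for _ in range(samples):
        # Sample the up/down state of every link.
        adj = {n: [] for n in nodes}
        for u, v, p in edges:
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
        # Check whether all terminals are connected (DFS from one terminal).
        stack, seen = [terminals[0]], {terminals[0]}
        while stack:
            for w in adj[stack.pop()]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        if not all(t in seen for t in terminals):
            failures += 1
    return failures / samples

# Hypothetical 4-node bridge network, terminal set {1, 4}.
nodes = [1, 2, 3, 4]
edges = [(1, 2, 0.9), (1, 3, 0.9), (2, 3, 0.9), (2, 4, 0.9), (3, 4, 0.9)]
print(k_terminal_unreliability_mc(nodes, edges, [1, 4]))
```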

Journal ArticleDOI
TL;DR: The proposed methodology aims to achieve processor data paths for VLIW architectures able to autonomously detect transient and permanent hardware faults while executing their applications by exploiting the intrinsic redundancy of this class of architectures.
Abstract: The proposed methodology aims to obtain processor data paths for VLIW architectures that are able to autonomously detect transient and permanent hardware faults while executing their applications. The approach, applied to the compiled application software, introduces additional instructions that check the correctness of the computation with respect to failures in one of the data-path functional units. A software approach to hardware fault detection is attractive because it can be applied only to the critical applications executed on the VLIW architecture, thus not delaying the execution of noncritical tasks. Furthermore, by exploiting the intrinsic redundancy of this class of architectures, no hardware modification of the data path is required, so no processor customization is necessary.

Journal ArticleDOI
TL;DR: A new model that generalizes the consecutive k-out-of-r-from-n:F system to the multi-state case is proposed and an algorithm for system reliability evaluation is based on an extended universal moment-generating function.
Abstract: This paper proposes a new model that generalizes the consecutive k-out-of-r-from-n:F system to the multi-state case. In this model (linear multi-state sliding window system), the system consists of n linearly ordered multi-state elements. Each element can have various states: from complete-failure up to perfect-functioning. A performance rate is associated with each state. The sliding window system fails if the sum of the performance rates of any r consecutive multi-state elements is lower than a minimum allowable level. An algorithm for system reliability evaluation is based on an extended universal moment-generating function. Examples of system reliability evaluation are presented.
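The system definition translates directly into a brute-force reliability computation for small cases, which is convenient for validating a UMGF implementation: enumerate all combinations of element states and fail the system whenever some window of r consecutive elements has total performance below the minimum allowable level. The element state distributions below are hypothetical.

```python
from itertools import product

def swz_reliability(elements, r, w_min):
    """Reliability of a linear multi-state sliding window system (brute force).

    elements: list of dicts {performance_rate: probability} for each of the n
    ordered multi-state elements. The system fails if the performance rates of
    any r consecutive elements sum to less than w_min.
    """
    rel = 0.0
    for combo in product(*[list(e.items()) for e in elements]):
        rates = [g for g, _ in combo]
        prob = 1.0
        for _, p in combo:
            prob *= p
        ok = all(sum(rates[i:i + r]) >= w_min for i in range(len(rates) - r + 1))
        rel += prob if ok else 0.0
    return rel

# Example: n = 4 elements, each with 3 states, window r = 2, minimum level 5.
element = {0: 0.1, 3: 0.3, 5: 0.6}          # performance rate -> probability
print(swz_reliability([element] * 4, r=2, w_min=5))
```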

Journal ArticleDOI
TL;DR: A hybrid intelligent algorithm is presented to solve both parallel and standby redundancy optimization problems to maximize the mean system-lifetime, α-system lifetime, or system reliability, and a spectrum of redundancy stochastic programming models is established.
Abstract: This paper provides a unified modeling idea for both parallel and standby redundancy optimization problems. A spectrum of redundancy stochastic programming models is constructed to maximize the mean system-lifetime, α-system lifetime, or system reliability. To solve these models, a hybrid intelligent algorithm is presented. Some numerical examples illustrate the effectiveness of the proposed algorithm. This paper considers both parallel redundant systems and standby redundant systems whose components are connected with each other in a logical configuration with a known system structure function. Three types of system performance (expected system lifetime, α-system lifetime, and system reliability) are introduced. A stochastic simulation is designed to estimate these system performances. In order to model general redundant systems, a spectrum of redundancy stochastic programming models is established. Stochastic simulation, NN and GA are integrated to produce a hybrid intelligent algorithm for solving the proposed models. Finally, the effectiveness of the hybrid intelligent algorithm is illustrated by some numerical examples.

Journal ArticleDOI
TL;DR: The R² test provides a very simple and useful objective approach for decision making with regard to model validation, and is reasonably powerful compared with the usual PLP GOF tests.
Abstract: The PLP (power-law process) or the Duane model is a simple model that can be used for both reliability growth and reliability deterioration. GOF (goodness-of-fit) tests for the PLP have attracted much attention. However, the practical use of the PLP model is its graphical analysis or the Duane plot, which is a log-log plot of the cumulative number of failures versus time. This has been commonly used for model validation and parameter estimation. When a plot is made and the coefficient of determination, R², of the regression line is computed, the model can be tested based on this value. This paper introduces a statistical test based on this simple procedure. The distribution of R² under the PLP hypothesis is shown not to depend on the true model parameters. Hence, it is possible to build a statistical GOF test for the PLP. The critical values of the test depend only on the sample size. Simulations show that this test is reasonably powerful compared with the usual PLP GOF tests. It is sometimes more powerful, especially for deteriorating systems. Implementing this test needs only the computation of a coefficient of determination. It is much easier than, for example, computing an Anderson-Darling statistic. Further study is needed to compare this new test more precisely with the existing ones. But the R² test provides a very simple and useful objective approach for decision making with regard to model validation.
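The test statistic itself is just the coefficient of determination of the Duane plot. The sketch below computes R² from the log-log regression of the cumulative number of failures on the failure times; the sample-size-dependent critical values tabulated in the paper are not reproduced here, and the simulated data are illustrative.

```python
import numpy as np

def duane_r2(failure_times):
    """R^2 of the Duane plot: log N(t_i) regressed on log t_i.

    failure_times: ordered cumulative times of the 1st, 2nd, ... failures.
    """
    t = np.asarray(sorted(failure_times), dtype=float)
    n = np.arange(1, len(t) + 1)
    x, y = np.log(t), np.log(n)
    slope, intercept = np.polyfit(x, y, 1)
    ss_res = np.sum((y - (slope * x + intercept)) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Example: simulated failure times with increasing inter-failure times,
# i.e., roughly a reliability-growth (improving) pattern.
rng = np.random.default_rng(3)
times = np.cumsum(rng.exponential(scale=np.arange(1, 16)))
print(round(duane_r2(times), 4))
```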

Journal ArticleDOI
TL;DR: This approach is presented in terms of a Markov chain which is used for solving a dynamic fault-tree, but the approach applies to any acyclic Markov reliability model.
Abstract: Acyclic Markov chains are frequently used for reliability analysis of nonmaintained mission-critical computer-based systems. Since traditional sensitivity (or importance) analysis using Markov chains can be computationally expensive, an approximate approach is presented which is easy to compute and which performs quite well in test cases. This approach is presented in terms of a Markov chain which is used for solving a dynamic fault-tree, but the approach applies to any acyclic Markov reliability model.

Journal ArticleDOI
TL;DR: Numerical experiments show that the PM policy based on the observed failure rate of the system is more robust to defective monitoring information and that it performs better than a policy which discards completely this imperfect information.
Abstract: We propose a state-based PM policy based on a stopping rule for an imperfectly monitored two-unit parallel system consisting of s-dependent units. The observed failure rate of the system is proposed as an efficient tool to integrate the imperfect monitoring information in the maintenance decision process: the possible nondetection of the failure of a unit and the monitoring quality are explicitly taken into account when optimizing the maintenance decisions. Using classical martingale results, a stochastic cost model is developed to assess the performance of the monitoring-maintenance policy for this system. Numerical experiments show that the PM policy based on the observed failure rate of the system is more robust to defective monitoring information and that it performs better than a policy which completely discards this imperfect information.

Journal ArticleDOI
TL;DR: It is observed that optimal preprocessing for SVI-SDP can be different from optimal preprocessing for SDP algorithms which use multiple-variable inversion; one reason for this is that MVI-SDP algorithms handle disjoint minpaths much more effectively than SVI-SDP algorithms do.
Abstract: Network reliability algorithms which produce sums of disjoint products (SDP) are sensitive to the order in which the minimal pathsets are analyzed. The minpaths are preprocessed by choosing this order in the hope that an SDP algorithm will then provide a relatively efficient analysis. The most commonly used preprocessing strategy is to list the minpaths in order of increasing size. This paper gives examples for which this strategy is not optimal. A new preprocessing strategy which works well for SDP algorithms with single-variable inversion (SVI) is introduced. It is also observed that optimal preprocessing for SVI-SDP can be different from optimal preprocessing for SDP algorithms which use multiple-variable inversion; one reason for this is that MVI-SDP algorithms handle disjoint minpaths much more effectively than SVI-SDP algorithms do. Both kinds of SDP algorithms profit from prior reduction of elements and of subsystems which are in parallel or in series.

Journal ArticleDOI
TL;DR: The system availability function is expressed as the weighted average of the survival functions and their shifts, and gives the limiting average availability.
Abstract: A periodically inspected system is maintained through a fixed number of imperfect-repairs before being replaced or perfectly repaired. The lifetime distribution of the system in its new and imperfectly repaired states is arbitrary. The times required for imperfect repairs and for perfect repair or replacement are random. The repaired system is restored to operation at the next scheduled inspection time. The system availability function is expressed as the weighted average of the survival functions and their shifts, and gives the limiting average availability.

Journal ArticleDOI
TL;DR: This paper proposes a logical framework to clarify the meaning of looped sets of Boolean equations; and a binary decision diagrams based method is proposed to assess them.
Abstract: It is often convenient, in reliability analyses, to describe the system under study by means of a set of Boolean equations. Fault trees can be seen as hierarchical sets of Boolean equations. In some cases, the model contains loops, because the system embeds at least two components whose states depend on one another. Reliability networks are a typical example of looped models. Classical fault tree assessment methods fail to assess this kind of model. This paper proposes a logical framework to clarify the meaning of looped sets of Boolean equations; and a binary decision diagrams based method is proposed to assess them. This approach is illustrated with experimental results on a benchmark of reliability networks.

Journal ArticleDOI
TL;DR: The Transparent Online Memory Test (TOMT) introduced here has been specifically developed for online testing of word-oriented memories with parity or Hamming protection and not only detects soft errors but also functional faults and reliably prevents fault accumulation.
Abstract: The Transparent Online Memory Test (TOMT) introduced here has been specifically developed for online testing of word-oriented memories with parity or Hamming protection. Careful interleaving of a word-oriented and a bit-oriented test facilitates a fault coverage and a test duration comparable to the widely used March C- algorithm. Unlike similar methods, TOMT actively exercises all bit cells in memory within one test period. Hence it detects not only soft errors but also functional faults, and reliably prevents fault accumulation. Different variants of the basic TOMT algorithm are investigated in terms of fault coverage and test time. A prototype implementation for SRAM is introduced which, integrated into a standard processor/memory interface, autonomously performs the transparent online memory test. The trade-offs in terms of hardware overhead and memory access delay caused by this system integration are explored.

Journal ArticleDOI
TL;DR: The paper suggests reliability measures for multi-state systems with 2 failure modes, and presents a procedure for evaluating these measures, based on the use of a universal moment generating function (UMGF), which allows one to estimate availability and s-expected performance of complex systems with series-parallel and bridge topology.
Abstract: Systems with two failure modes (STFM) consist of devices, which can fail in either of two modes. For example, switching systems can not only fail to close when commanded to close but can also fail to open when commanded to open. This paper considers systems consisting of different elements characterized by nominal performance level in each mode. Such systems are multi-state because they have multiple performance levels in both modes, depending on the combination of elements available at the moment. The system availability is defined as the probability of satisfaction of given constraints imposed on system performance in both modes. The paper suggests reliability measures for multi-state systems with 2 failure modes, and presents a procedure for evaluating these measures. The procedure is based on the use of a universal moment generating function (UMGF). It allows one to estimate availability and s-expected performance of complex systems with series-parallel and bridge topology. Basic UMGF technique operators are developed for two types of systems, based on transmitting-capacity and on operation-time.

Journal ArticleDOI
TL;DR: An algorithm based on the universal generating function method is suggested for the linear consecutively-connected system reliability determination and can handle cases in which any number of multistate elements are allocated in the same position while some positions remain empty.
Abstract: A linear consecutively-connected system consists of N+1 linear ordered positions. M s-independent multistate elements with different characteristics are to be allocated to the first N positions. Each element can provide a connection between the position to which it is allocated and the next few positions. The reliability of the connection for any given element depends on: (1) the position to which it is allocated; and (2) the number of positions it connects. The system fails if the first position (source) is not connected with the N+1 position (sink). An algorithm based on the universal generating function method is suggested for the linear consecutively-connected system reliability determination. This algorithm can handle cases in which any number of multistate elements are allocated in the same position while some positions remain empty. In many cases, such uneven allocation provides greater system reliability than the even one. A genetic algorithm is used as an optimization tool to solve the optimal element-allocation problem.

Journal ArticleDOI
TL;DR: Numerical results indicate that the number of infant mortality failures predicted by the clustering model can differ appreciably from calculations that ignore clustering; this is particularly apparent when wafer probe yields are low and clustering is high.
Abstract: The integrated yield-reliability model for integrated circuits allows one to estimate the yield following both wafer probe and burn-in testing. The model is based on the long observed clustering of defects and the experimentally verified relation between defects causing wafer probe failures, and defects causing infant mortality failures. The 2-parameter negative binomial distribution is used to describe the distribution of defects over a semiconductor wafer. The clustering parameter α, while known to play a key role in accurately determining wafer probe yields, is shown, for the first time, to play a similar role in determining burn-in fall-out. Numerical results indicate that the number of infant mortality failures predicted by the clustering model can differ appreciably from calculations that ignore clustering. This is particularly apparent when wafer probe yields are low and clustering is high.
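Under the negative binomial defect model, the wafer-probe yield takes the familiar clustered form Y = (1 + λ/α)^(-α), where λ is the mean number of fatal defects per chip and α the clustering parameter; the Poisson yield exp(-λ) is recovered as α grows large. The sketch below contrasts the two, and the burn-in illustration uses a placeholder fraction of latent defects rather than the paper's calibrated yield-reliability relation.

```python
import numpy as np

def negative_binomial_yield(lam, alpha):
    """Clustered (negative binomial) yield: probability of zero fatal defects."""
    return (1.0 + lam / alpha) ** (-alpha)

def poisson_yield(lam):
    """Yield when defect clustering is ignored (alpha -> infinity)."""
    return np.exp(-lam)

# lam: mean fatal defects per chip; alpha: clustering parameter (small = strong clustering).
lam, alpha = 1.2, 0.5
print("probe yield, clustered :", round(negative_binomial_yield(lam, alpha), 4))
print("probe yield, Poisson   :", round(poisson_yield(lam), 4))

# Hypothetical illustration: assume latent (infant-mortality) defects occur at a
# small fraction gamma of the fatal-defect density; this is a placeholder, not
# the paper's calibrated yield-reliability relation.
gamma = 0.02
print("burn-in survival, clustered :", round(negative_binomial_yield(gamma * lam, alpha), 4))
print("burn-in survival, Poisson   :", round(poisson_yield(gamma * lam), 4))
```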