Journal ArticleDOI
Analysis and modeling of correlated failures in multicomputer systems
D. Tang,Ravishankar K. Iyer +1 more
TLDR
Based on the measurements from two DEC VAX-cluster multicomputer systems, the issue of correlated failures is addressed and two validated models, the c-dependent model and the p-dependent model, are developed to evaluate the dependability of systems with correlated failures.
Citations
Proceedings ArticleDOI
What Supercomputers Say: A Study of Five System Logs
Adam J. Oliner,Jon Stearley +1 more
TL;DR: This paper examines system logs from five supercomputers with the aim of providing useful insight and direction for future research into the use of such logs, and proposes a simpler and more effective filtering algorithm.
Proceedings ArticleDOI
Glacier: highly durable, decentralized storage despite massive correlated failures
TL;DR: Glacier is described, a distributed storage system that relies on massive redundancy to mask the effect of large-scale correlated failures and is used as the storage layer for an experimental serverless email system.
Proceedings ArticleDOI
BlueGene/L Failure Analysis and Prediction Models
TL;DR: This study has collected RAS event logs from BlueGene/L over a period of more than 100 days, and investigated the characteristics of fatal failure events, as well as the correlation between fatal events and non-fatal events, leading to three simple yet effective failure prediction methods.
Journal ArticleDOI
A Continuum Approximation Approach to Reliable Facility Location Design under Correlated Probabilistic Disruptions
Xiaopeng Li,Yanfeng Ouyang +1 more
TL;DR: In this paper, the authors studied the reliable uncapacitated fixed charge location problem (RUFL) where facilities are subject to spatially correlated disruptions that occur with location-dependent probabilities (due to reasons such as natural or man-made disasters).
References
Journal ArticleDOI
Probability and Statistics with Reliability, Queuing, and Computer Science Applications.
Robert Geist,Kishor S. Trivedi +1 more
TL;DR: Probability and Statistics with Reliability, Queuing and Computer Science Applications, Second Edition, offers a comprehensive introduction to probability, stochastic processes, and statistics for students of computer science, electrical and computer engineering, and applied mathematics.
Book
Probability and Statistics With Reliability, Queuing and Computer Science Applications
TL;DR: Probability and Statistics with Reliability, Queuing and Computer Science Applications, Second Edition, is a comprehensive introduction to probability, stochastic processes, and statistics for students of computer science, electrical and computer engineering, and applied mathematics.
Book
Survival distributions : reliability applications in the biomedical sciences
Alan J. Gross,Virginia A. Clark +1 more
Journal ArticleDOI
VAXcluster: a closely-coupled distributed system
TL;DR: A VAXcluster is a highly available and extensible configuration of VAX computers that operate as a single system that uses a distributed version of the VAX/VMS operating system to achieve performance in a multicomputer environment.
Journal ArticleDOI
Reliability Modeling Using SHARPE
Robin Sahner,Kishor S. Trivedi +1 more
TL;DR: This paper presents an approach for avoiding the large state space problem and uses a hierarchical modeling technique for analyzing complex reliability models that allows the flexibility of Markov models where necessary and retains the efficiency of combinatorial solution where possible.
Related Papers (5)
Error log analysis: statistical modeling and heuristic trend analysis
T.-T.Y. Lin,Daniel P. Siewiorek +1 more