Journal Article

Analysis and modeling of correlated failures in multicomputer systems

TLDR
Based on measurements from two DEC VAX-cluster multicomputer systems, the issue of correlated failures is addressed, and two validated models, the c-dependent model and the p-dependent model, are developed to evaluate the dependability of systems with correlated failures.
Abstract
Based on the measurements from two DEC VAX-cluster multicomputer systems, the issue of correlated failures is addressed. In particular, the characteristics of correlated failures, their impact on dependability, and their modelling are discussed. It is found from the data that most correlated failures are related to errors in shared resources and propagate from one machine to another. Comparisons between measurement-based models and analytical models that assume failure independence show that the impact of correlated failures on dependability is significant. Two validated models, the c-dependent model and the p-dependent model, are developed to evaluate the dependability of systems with correlated failures.
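
The abstract's point about independence assumptions can be illustrated with a small sketch. This is not the paper's c-dependent or p-dependent model; it only uses the standard identity for two identically distributed failure indicators, P(both down) = p^2 + rho * p * (1 - p), where p is the marginal probability that one machine is down and rho is the correlation coefficient between the two indicators. The function name and the numeric values are hypothetical and are not taken from the paper's measurements.

```python
def joint_failure_probability(p: float, rho: float = 0.0) -> float:
    """Probability that two identical machines are down at the same time.

    p   -- marginal probability that one machine is down (illustrative)
    rho -- correlation coefficient between the two failure indicators;
           rho = 0 corresponds to the independence assumption
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must be in [0, 1] for this sketch")
    # For two Bernoulli(p) indicators, Cov = rho * p * (1 - p),
    # and P(both down) = p^2 + Cov.
    return p * p + rho * p * (1.0 - p)


if __name__ == "__main__":
    p = 0.01  # hypothetical per-machine unavailability, not from the paper
    for rho in (0.0, 0.05, 0.2):
        both = joint_failure_probability(p, rho)
        print(f"rho={rho:.2f}  P(both down)={both:.6f}")
```

Under these assumed numbers, even a small positive correlation raises the probability of a simultaneous outage well above the p^2 value that an independence-based model would predict, which is the qualitative effect the abstract describes.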

Citations
Proceedings Article

What Supercomputers Say: A Study of Five System Logs

TL;DR: This paper examines system logs from five supercomputers with the aim of providing useful insight and direction for future research into the use of such logs, and proposes a simpler and more effective filtering algorithm.
Proceedings Article

Glacier: highly durable, decentralized storage despite massive correlated failures

TL;DR: Glacier is described, a distributed storage system that relies on massive redundancy to mask the effect of large-scale correlated failures and is used as the storage layer for an experimental serverless email system.
Proceedings Article

BlueGene/L Failure Analysis and Prediction Models

TL;DR: This study has collected RAS event logs from BlueGene/L over a period of more than 100 days, and investigated the characteristics of fatal failure events, as well as the correlation between fatal events and non-fatal events, leading to three simple yet effective failure prediction methods.
Journal Article

A Continuum Approximation Approach to Reliable Facility Location Design under Correlated Probabilistic Disruptions

TL;DR: In this paper, the authors studied the reliable uncapacitated fixed charge location problem (RUFL) where facilities are subject to spatially correlated disruptions that occur with location-dependent probabilities (due to reasons such as natural or man-made disasters).
References
Journal Article

Probability and Statistics with Reliability, Queuing, and Computer Science Applications.

TL;DR: Probability and Statistics with Reliability, Queuing and Computer Science Applications, Second Edition, offers a comprehensive introduction to probability, stochastic processes, and statistics for students of computer science, electrical and computer engineering, and applied mathematics.
Book

Probability and Statistics With Reliability, Queuing and Computer Science Applications

TL;DR: Probability and Statistics with Reliability, Queuing and Computer Science Applications, Second Edition, is a comprehensive introduction to probability, stochastic processes, and statistics for students of computer science, electrical and computer engineering, and applied mathematics.
Journal Article

VAXcluster: a closely-coupled distributed system

TL;DR: A VAXcluster is a highly available and extensible configuration of VAX computers that operate as a single system that uses a distributed version of the VAX/VMS operating system to achieve performance in a multicomputer environment.
Journal Article

Reliability Modeling Using SHARPE

TL;DR: This paper presents an approach for avoiding the large state space problem and uses a hierarchical modeling technique for analyzing complex reliability models that allows the flexibility of Markov models where necessary and retains the efficiency of combinatorial solution where possible.