Topic

Failure transparency

About: Failure transparency is a research topic. To date, 9 publications on this topic have received a total of 261 citations.

Papers
Journal ArticleDOI
TL;DR: It is shown that a distributed system can be modeled as a single sequential execution sequence and this model is used to discuss simple techniques for implementing the various forms of transparency.
Abstract: The concepts of transaction and of data consistency are defined for a distributed system. The cases of partitioned data, where fragments of a file are stored at multiple nodes, and replicated data, where a file is replicated at several nodes, are discussed. It is argued that the distribution and replication of data should be transparent to the programs which use the data. That is, the programming interface should provide location transparency, replica transparency, concurrency transparency, and failure transparency. Techniques for providing such transparencies are abstracted and discussed. By extending the notions of system schedule and system clock to handle multiple nodes, it is shown that a distributed system can be modeled as a single sequential execution sequence. This model is then used to discuss simple techniques for implementing the various forms of transparency.

150 citations
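
The idea in the abstract above, that a multi-node system can be viewed as a single sequential execution sequence, can be illustrated with a small sketch. This is not the paper's construction. It assumes each node keeps a log already ordered by a logical clock and that clock ties are broken by node id; the names `Event` and `merge_schedules` are hypothetical.

```python
import heapq
from typing import Iterable, List, Tuple

# (logical_clock, node_id, operation); an illustrative layout, not from the paper.
Event = Tuple[int, str, str]


def merge_schedules(node_logs: Iterable[List[Event]]) -> List[Event]:
    """Merge per-node logs into one sequential schedule.

    Each log must already be sorted by logical clock. Tuples compare
    element by element, so equal clocks are ordered by node_id, which
    makes the resulting order total.
    """
    return list(heapq.merge(*node_logs))


node_a = [(1, "A", "write x"), (4, "A", "read y")]
node_b = [(2, "B", "write y"), (4, "B", "read x")]

schedule = merge_schedules([node_a, node_b])
for clock, node, op in schedule:
    print(clock, node, op)
```

A program that only ever sees `schedule` cannot tell how many nodes produced it, which is the sense in which distribution becomes transparent in this toy model.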

Proceedings ArticleDOI
22 Oct 2000
TL;DR: It is found that several real applications get failure transparency in the presence of simple stop failures with overhead of 0-12%, and that applications violate one invariant in the course of upholding the other for more than 90% of application faults and 3-15% of operating system faults, rendering transparent recovery impossible for these cases.
Abstract: We explore the abstraction of failure transparency in which the operating system provides the illusion of failure-free operation. To provide failure transparency, an operating system must recover applications after hardware, operating system, and application failures, and must do so without help from the programmer or unduly slowing failure-free performance. We describe two invariants that must be upheld to provide failure transparency: one that ensures sufficient application state is saved to guarantee the user cannot discern failures, and another that ensures sufficient application state is lost to allow recovery from failures affecting application state. We find that several real applications get failure transparency in the presence of simple stop failures with overhead of 0-12%. Less encouragingly, we find that applications violate one invariant in the course of upholding the other for more than 90% of application faults and 3-15% of operating system faults, rendering transparent recovery impossible for these cases.

77 citations

Journal ArticleDOI
TL;DR: A model of remote procedure call which reflects certain generic properties of the application layer that can be exploited by the RPC layer during failure recovery is presented and a technique of adopting orphans caused by failures, which is based on the model, is described.
Abstract: A model of remote procedure call (RPC) which reflects certain generic properties of the application layer that can be exploited by the RPC layer during failure recovery is presented. A technique of adopting orphans caused by failures, which is based on the model, is described. The technique minimizes the rollback which may be required in orphan-killing techniques. Algorithmic details of the adoption technique are described, and a quantitative analysis is presented. The model is implemented as a prototype on a local area network. The simplicity and generality of the failure recovery renders the RPC model useful in distributed systems, particularly those that are large and heterogeneous and hence have complex failure modes.

19 citations

Journal IssueDOI
TL;DR: This paper shows how the WS-Naming profile on WS-Addressing Endpoint References can be used for identity, transparent failover, replication, and migration in the Web Services realm.
Abstract: Naming transparencies, i.e. abstracting the name and binding of the entity being used from the endpoints that are actually doing the work, are used in distributed systems to simplify application development by hiding the complexity of the environment. In this paper, we demonstrate how to apply traditional distributed systems naming and binding techniques in the Web Services realm. Specifically, we show how the WS-Naming profile on WS-Addressing Endpoint References can be used for identity, transparent failover, replication, and migration. We begin with a discussion of the traditional distributed systems transparencies. We then present four detailed use cases. Next, we provide a brief background on both WS-Addressing and WS-Naming. Finally, we show how WS-Naming can be used to provide transparent implementations of our use cases. Copyright © 2009 John Wiley & Sons, Ltd.

5 citations
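
The transparent failover mentioned in the abstract above rests on separating an abstract name from the endpoints currently bound to it. The sketch below shows that general pattern only. It does not use or model the WS-Naming or WS-Addressing APIs, and `NameResolver`, `call_with_failover`, `flaky_send`, and the endpoint names are all hypothetical.

```python
from typing import Callable, Dict, List


class NameResolver:
    """Maps an abstract name to candidate endpoints (illustrative only)."""

    def __init__(self, bindings: Dict[str, List[str]]) -> None:
        self._bindings = bindings

    def resolve(self, name: str) -> List[str]:
        return list(self._bindings[name])


def call_with_failover(
    resolver: NameResolver, name: str, send: Callable[[str], str]
) -> str:
    """Try each endpoint bound to `name` until one answers.

    The caller only supplies the abstract name, so a failed endpoint is
    hidden as long as another binding is reachable.
    """
    last_error = None
    for endpoint in resolver.resolve(name):
        try:
            return send(endpoint)
        except ConnectionError as exc:
            last_error = exc  # remember the failure and rebind to the next endpoint
    raise ConnectionError(f"no reachable endpoint for {name}") from last_error


def flaky_send(endpoint: str) -> str:
    # Stand-in transport: the first replica is assumed to be down.
    if endpoint == "replica-1":
        raise ConnectionError("replica-1 is down")
    return f"handled by {endpoint}"


resolver = NameResolver({"orders-service": ["replica-1", "replica-2"]})
result = call_with_failover(resolver, "orders-service", flaky_send)
print(result)
```

Retrying on another endpoint is only safe when the request is idempotent or the transport can detect duplicates, a condition this sketch assumes rather than enforces.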

01 Jul 1991
TL;DR: This paper describes a method for recovering permanent object state in an object-oriented distributed system and recommends that the user be insulated to the greatest possible degree from failure and its recovery and that the resulting system be as efficient as possible under normal conditions.
Abstract: This paper describes a method for recovering permanent object state in an object-oriented distributed system. Inspiration for this work was derived from observation of the lengths to which programmers have traditionally been forced to go in order to make their programs resilient to failure. This experience led to the decision that such a burden was unacceptable and that the onus of recovery be shifted onto the underlying operating system. Further goals were that the user be insulated to the greatest possible degree from failure and its recovery (failure transparency) and that the resulting system be as efficient as possible under normal conditions.

4 citations

Network Information
Related Topics (5)
Locale (computer hardware): 599 papers, 9K citations, 93% related
Common Open Policy Service: 68 papers, 3.3K citations, 89% related
Scalable Source Routing: 16 papers, 10.3K citations, 86% related
Market fragmentation: 581 papers, 8.7K citations, 86% related
Logical unit number: 210 papers, 3.8K citations, 86% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2009    1
2002    1
2000    2
1999    1
1997    1
1991    1