Topic

Failure transparency

About: Failure transparency is a research topic. Over its lifetime, 9 publications have appeared within this topic, receiving 261 citations.

Papers

Journal Article

Transactions and consistency in distributed database systems

Irving L. Traiger¹, Jim Gray¹, Cesare A. Galtieri¹, Bruce G. Lindsay¹

IBM¹

01 Sep 1982-ACM Transactions on Database Systems

TL;DR: It is shown that a distributed system can be modeled as a single sequential execution sequence and this model is used to discuss simple techniques for implementing the various forms of transparency.

Abstract: The concepts of transaction and of data consistency are defined for a distributed system. The cases of partitioned data, where fragments of a file are stored at multiple nodes, and replicated data, where a file is replicated at several nodes, are discussed. It is argued that the distribution and replication of data should be transparent to the programs which use the data. That is, the programming interface should provide location transparency, replica transparency, concurrency transparency, and failure transparency. Techniques for providing such transparencies are abstracted and discussed. By extending the notions of system schedule and system clock to handle multiple nodes, it is shown that a distributed system can be modeled as a single sequential execution sequence. This model is then used to discuss simple techniques for implementing the various forms of transparency.
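
The idea of viewing several nodes as one sequential execution can be illustrated with a small sketch. It is not taken from the paper: it assumes each node stamps its steps with a logical clock value, that each per-node schedule is already in clock order, and that ties are broken by node name. The `Event` type and `global_schedule` function are hypothetical names introduced for illustration.

```python
import heapq
from typing import NamedTuple


class Event(NamedTuple):
    clock: int   # logical timestamp assigned at the node (assumed)
    node: str    # node identifier, used only to break clock ties
    action: str  # human-readable description of the step


def global_schedule(per_node: dict[str, list[Event]]) -> list[Event]:
    """Merge per-node schedules into one sequential schedule.

    Each input list must already be sorted by clock. Events compare as
    tuples, so the merged order is (clock, node, action).
    """
    return list(heapq.merge(*per_node.values()))


per_node = {
    "A": [Event(1, "A", "write x"), Event(4, "A", "commit")],
    "B": [Event(2, "B", "read x"), Event(3, "B", "write y")],
}
merged = global_schedule(per_node)
```

Here `merged` interleaves the two nodes' steps by clock value, so a reader can reason about the run as if it had happened on a single machine.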

150 citations

Proceedings Article

Exploring failure transparency and the limits of generic recovery

David E. Lowell, Subhachandra Chandra¹, Peter M. Chen¹

University of Michigan¹

22 Oct 2000

TL;DR: It is found that several real applications get failure transparency in the presence of simple stop failures with overhead of 0-12%, and that applications violate one invariant in the course of upholding the other for more than 90% of application faults and 3-15% of operating system faults, rendering transparent recovery impossible for these cases.

Abstract: We explore the abstraction of failure transparency in which the operating system provides the illusion of failure-free operation. To provide failure transparency, an operating system must recover applications after hardware, operating system, and application failures, and must do so without help from the programmer or unduly slowing failure-free performance. We describe two invariants that must be upheld to provide failure transparency: one that ensures sufficient application state is saved to guarantee the user cannot discern failures, and another that ensures sufficient application state is lost to allow recovery from failures affecting application state. We find that several real applications get failure transparency in the presence of simple stop failures with overhead of 0-12%. Less encouragingly, we find that applications violate one invariant in the course of upholding the other for more than 90% of application faults and 3-15% of operating system faults, rendering transparent recovery impossible for these cases.
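
The first invariant (save enough state that the user cannot discern a failure) can be illustrated with a toy sketch. This is not the authors' system: it assumes a single process whose whole state fits in a JSON file, and it commits that state before every user-visible result. The class name `CheckpointedCounter` and the file layout are hypothetical.

```python
import json
import os
import tempfile


class CheckpointedCounter:
    """Toy application that commits its state before each visible output."""

    def __init__(self, path: str):
        self.path = path
        self.state = {"count": 0}
        if os.path.exists(path):
            with open(path) as f:
                self.state = json.load(f)  # recover the last committed state

    def _commit(self) -> None:
        # Write to a temp file, then rename, so a crash mid-write never
        # leaves a partially written checkpoint at self.path.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(self.path) or ".")
        with os.fdopen(fd, "w") as f:
            json.dump(self.state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)

    def step(self) -> str:
        self.state["count"] += 1
        self._commit()                          # save work first
        return f"count={self.state['count']}"   # then expose it to the user
```

A restart that constructs a new `CheckpointedCounter` on the same path resumes from the last value the user saw. The sketch deliberately ignores the second invariant: if a fault had already corrupted `state` before `_commit` ran, the corrupted state would be preserved, which is the conflict the abstract reports for most application faults.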

77 citations

Journal Article

Failure transparency in remote procedure calls

Kaliappa Ravindran¹, Samuel T. Chanson²

Bell-Northern Research¹, University of British Columbia²

01 Aug 1989-IEEE Transactions on Computers

TL;DR: A model of remote procedure call which reflects certain generic properties of the application layer that can be exploited by the RPC layer during failure recovery is presented and a technique of adopting orphans caused by failures, which is based on the model, is described.

Abstract: A model of remote procedure call (RPC) which reflects certain generic properties of the application layer that can be exploited by the RPC layer during failure recovery is presented. A technique of adopting orphans caused by failures, which is based on the model, is described. The technique minimizes the rollback which may be required in orphan-killing techniques. Algorithmic details of the adoption technique are described, and a quantitative analysis is presented. The model is implemented as a prototype on a local area network. The simplicity and generality of the failure recovery renders the RPC model useful in distributed systems, particularly those that are large and heterogeneous and hence have complex failure modes.

19 citations

Journal Issue

WS-Naming: location migration, replication, and failure transparency support for Web Services

Andrew S. Grimshaw¹, Mark Morgan¹, Karolina Sarnowska¹

University of Virginia¹

01 Jun 2009-Concurrency and Computation: Practice and Experience

TL;DR: This paper shows how the WS-Naming profile on WS-Addressing Endpoint References can be used for identity, transparent failover, replication, and migration in the Web Services realm.

Abstract: Naming transparencies, i.e. abstracting the name and binding of the entity being used from the endpoints that are actually doing the work, are used in distributed systems to simplify application development by hiding the complexity of the environment. In this paper, we demonstrate how to apply traditional distributed systems naming and binding techniques in the Web Services realm. Specifically, we show how the WS-Naming profile on WS-Addressing Endpoint References can be used for identity, transparent failover, replication, and migration. We begin with a discussion of the traditional distributed systems transparencies. We then present four detailed use cases. Next, we provide a brief background on both WS-Addressing and WS-Naming. Finally, we show how WS-Naming can be used to provide transparent implementations of our use cases. Copyright © 2009 John Wiley & Sons, Ltd.
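
Transparent failover through naming can be sketched generically: the caller holds an abstract name, and a binder re-resolves that name to another address when the current endpoint stops responding. The code below is an assumption-laden illustration, not the WS-Naming or WS-Addressing API; `NameBinder`, `EndpointDown`, and the `resolve` and `send` callables are hypothetical stand-ins for a resolver service and a transport.

```python
from typing import Callable


class EndpointDown(Exception):
    """Raised by the (hypothetical) transport when an endpoint is unreachable."""


class NameBinder:
    """Maps an abstract name to a live address, rebinding on failure."""

    def __init__(
        self,
        resolve: Callable[[str], list[str]],
        send: Callable[[str, str], str],
    ):
        self._resolve = resolve  # abstract name -> candidate addresses
        self._send = send        # (address, request) -> reply
        self._bound: dict[str, str] = {}

    def call(self, name: str, request: str) -> str:
        # Try the cached binding first, then every address the resolver returns.
        candidates: list[str] = []
        if name in self._bound:
            candidates.append(self._bound[name])
        candidates += [a for a in self._resolve(name) if a not in candidates]
        for address in candidates:
            try:
                reply = self._send(address, request)
            except EndpointDown:
                self._bound.pop(name, None)  # drop the stale binding
                continue
            self._bound[name] = address      # remember the working endpoint
            return reply
        raise EndpointDown(f"no reachable endpoint for {name}")
```

The caller only ever passes the abstract name to `call`, so a replica taking over for a failed endpoint is invisible to application code as long as at least one candidate address answers.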

5 citations

Failure and its Recovery in an Object-Oriented Distributed System

S. Crane, Brendan Tangney

01 Jul 1991

TL;DR: This paper describes a method for recovering permanent object state in an object-oriented distributed system and recommends that the user be insulated to the greatest possible degree from failure and its recovery and that the resulting system be as efficient as possible under normal conditions.

Abstract: This paper describes a method for recovering permanent object state in an object-oriented distributed system. Inspiration for this work was derived from observation of the lengths to which programmers have traditionally been forced to go in order to make their programs resilient to failure. This experience led to the decision that such a burden was unacceptable and that the onus of recovery be shifted onto the underlying operating system. Further goals were that the user be insulated to the greatest possible degree from failure and its recovery (failure transparency) and that the resulting system be as efficient as possible under normal conditions.

4 citations

No. of papers in the topic in previous years
Year	Papers
2009	1
2002	1
2000	2
1999	1
1997	1
1991	1
