Fault tolerance for highly available internet services: concepts, approaches, and issues

doi:10.1109/COMST.2008.4564478

Journal Article

N. Ayari et al.

- 01 Apr 2008 -

IEEE Communications Surveys and Tutorial...

- Vol. 10, Iss: 2, pp 34-46

TLDR

It is shown how the redundancy of application servers can be exploited to ensure efficient failover of Internet services when the legitimate processing server goes down.

Abstract:

Fault-tolerant frameworks provide highly available services by means of fault detection and fault recovery mechanisms. These frameworks need to meet different constraints related to the fault model strength, performance, and resource consumption. One of the factors that led to this work is the observation that current fault-tolerant frameworks are not always adapted to existing Internet services. In fact, most of the proposed frameworks are not transport-level- or session-level-aware, although the concerned services range from regular services like HTTP and FTP to more recent Internet services such as multimodal conferencing and voice over IP. In this work we give a comprehensive overview of fault tolerance concepts, approaches, and issues. We show how the redundancy of application servers can be exploited to ensure efficient failover of Internet services when the legitimate processing server goes down.
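The detection-and-recovery loop the abstract describes can be illustrated with a minimal primary-backup sketch. It assumes a crash-fault model in which a server is suspected once no heartbeat has arrived within a fixed timeout. The names `Replica`, `FailoverManager`, `heartbeat`, and `primary`, as well as the timeout policy, are hypothetical and not taken from the paper. Time is passed in explicitly so the behaviour is deterministic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    """A hypothetical application-server replica (not an API from the paper)."""
    name: str
    # -inf means "never heard from", so the replica starts out suspected.
    last_heartbeat: float = float("-inf")


@dataclass
class FailoverManager:
    """Primary-backup failover with timeout-based fault detection (sketch).

    Assumes a crash-fault model: a replica is suspected once no heartbeat
    has arrived for more than `timeout` seconds. Byzantine faults are out of scope.
    """
    replicas: List[Replica]
    timeout: float
    primary_index: int = 0

    def heartbeat(self, name: str, now: float) -> None:
        """Fault detection input: record that `name` was alive at time `now`."""
        for replica in self.replicas:
            if replica.name == name:
                replica.last_heartbeat = now
                return
        raise KeyError(name)

    def is_alive(self, replica: Replica, now: float) -> bool:
        # A replica is considered alive if its last heartbeat is within the timeout.
        return now - replica.last_heartbeat <= self.timeout

    def primary(self, now: float) -> Replica:
        """Return the processing server, failing over if it is suspected."""
        current = self.replicas[self.primary_index]
        if self.is_alive(current, now):
            return current
        # Fault recovery: promote the first live backup in list order.
        for i, replica in enumerate(self.replicas):
            if i != self.primary_index and self.is_alive(replica, now):
                self.primary_index = i
                return replica
        raise RuntimeError("no live replica available")


if __name__ == "__main__":
    mgr = FailoverManager([Replica("A"), Replica("B")], timeout=1.0)
    mgr.heartbeat("A", 0.0)
    mgr.heartbeat("B", 0.0)
    print(mgr.primary(0.5).name)  # A
    mgr.heartbeat("B", 1.5)       # A stops sending heartbeats
    print(mgr.primary(2.0).name)  # B
```

With `timeout=1.0`, if both replicas report at t = 0 and only `B` reports again at t = 1.5, calling `primary(2.0)` promotes `B`. The sketch covers only detection and promotion. It does not migrate transport- or session-level state, such as open TCP connections or VoIP call sessions, which is the gap the abstract says most existing frameworks leave open.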

Citations

Journal Article

Energy Efficiency in the Future Internet: A Survey of Existing Approaches and Trends in Energy-Aware Fixed Network Infrastructures

Raffaele Bolla et al.

- 01 Jan 2011 -

IEEE Communications Surveys and Tutorial...

TL;DR: This paper explores current perspectives in power consumption for next generation networks, and provides a detailed survey on emerging technologies, projects, and work-in-progress standards, which can be adopted in networks and related infrastructures in order to reduce their carbon footprint.

Journal Article

Fault Tolerance Management in Cloud Computing: A System-Level Perspective

Ravi Jhawar et al.

- 01 Jun 2013 -

IEEE Systems Journal

TL;DR: An innovative, system-level, modular perspective on creating and managing fault tolerance in Clouds is introduced, and a comprehensive high-level approach to hiding the implementation details of the fault tolerance techniques from application developers and users by means of a dedicated service layer is proposed.

Journal Article

Improving Convergence Speed and Scalability in OSPF: A Survey

Mukul Goyal et al.

- 02 Apr 2012 -

IEEE Communications Surveys and Tutorial...

TL;DR: This paper presents a comprehensive survey of significant efforts aimed at improving OSPF's convergence speed and scalability, and at extending OSPF to achieve seamless integration of mobile ad hoc networks with conventional wired networks.

Book Chapter

Fault Tolerance and Resilience in Cloud Computing Environments

Ravi Jhawar and one co-author

TL;DR: In this paper, the authors focus on characterizing the recurrent failures in a typical cloud computing environment, analyzing the effects of failures on users' applications and surveying fault tolerance solutions corresponding to each class of failures.

Book Chapter

Cloud Standby: Disaster Recovery of Distributed Systems in the Cloud

Alexander Lenk and one co-author

TL;DR: It is shown that using Cloud Standby can reduce the recovery time and significantly reduce the long-term costs of disaster recovery.

References

Proceedings Article

Practical Byzantine fault tolerance

Miguel Castro and one co-author

TL;DR: A new replication algorithm that tolerates Byzantine faults, works in asynchronous environments like the Internet, and incorporates several important optimizations that improve the response time of previous algorithms by more than an order of magnitude.

Journal Article

Practical byzantine fault tolerance and proactive recovery

Miguel Castro and one co-author

- 01 Nov 2002 -

ACM Transactions on Computer Systems

TL;DR: A new replication algorithm, BFT, is described that can be used to build highly available systems that tolerate Byzantine faults and is used to implement the first Byzantine-fault-tolerant NFS file system, BFS.

Host extensions for IP multicasting

S. E. Deering

TL;DR: This memo specifies the extensions required of a host implementation of the Internet Protocol to support multicasting and obsoletes RFCs 998 and 1054.

Book

Fault tolerance, principles and practice

P. A. Lee et al.

TL;DR: Methodology and Framework for Fault Tolerance.- Idealised Fault Tolerant Components.- Failure Exceptions.- Critical Components.- The Future.

Journal Article

Optimistic recovery in distributed systems

Rob Strom and one co-author

- 01 Aug 1985 -

ACM Transactions on Computer Systems

TL;DR: Optimistic Recovery is a new technique supporting application-independent transparent recovery from processor failures in distributed systems that can tolerate the failure of an arbitrary number of processors and yields better throughput and response time than other general recovery techniques whenever failures are infrequent.

Indian Journal of Science and Technology

Broker's Communication for Service Oriented Network Architecture

A.P. Manu et al.

Related Papers (5)

Remus: high availability via asynchronous virtual machine replication

Characterizing cloud computing hardware reliability

T2CP-AR: A system for Transparent TCP Active Replication

Web Services Failures and Recovery Strategies: A Review

Broker's Communication for Service Oriented Network Architecture
