Journal Article

Fault tolerance for highly available internet services: concepts, approaches, and issues

TL;DR: It is shown how the redundancy of application servers can be leveraged to ensure efficient failover of Internet services when the legitimate processing server goes down.
Abstract
Fault-tolerant frameworks provide highly available services by means of fault detection and fault recovery mechanisms. These frameworks must satisfy constraints related to the strength of the fault model, performance, and resource consumption. One factor that motivated this work is the observation that current fault-tolerant frameworks are not always adapted to existing Internet services. In fact, most of the proposed frameworks are not aware of the transport or session level, although the services concerned range from regular services such as HTTP and FTP to more recent Internet services such as multimodal conferencing and voice over IP. In this work, we give a comprehensive overview of fault tolerance concepts, approaches, and issues. We show how the redundancy of application servers can be leveraged to ensure efficient failover of Internet services when the legitimate processing server goes down.
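The abstract names fault detection and fault recovery as the two mechanisms behind failover. As a minimal sketch of that idea, assuming heartbeat-based detection with a fixed timeout (a common choice, not necessarily the one this survey recommends), the Python below suspects a server whose last heartbeat is older than `timeout` and promotes the next live replica. The names `Replica`, `FailoverGroup`, and `current_primary` are hypothetical, and the sketch deliberately ignores the transport- and session-level state transfer that the abstract argues real frameworks must handle.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    """One application server in a hypothetical primary/backup group."""
    name: str
    last_heartbeat: float = 0.0  # time of the most recent heartbeat received


@dataclass
class FailoverGroup:
    """Timeout-based failure detection with failover to the next live replica."""
    replicas: List[Replica]
    timeout: float    # seconds without a heartbeat before a replica is suspected
    primary: int = 0  # index of the replica currently serving clients

    def heartbeat(self, name: str, now: float) -> None:
        for r in self.replicas:
            if r.name == name:
                r.last_heartbeat = now

    def is_suspected(self, r: Replica, now: float) -> bool:
        return now - r.last_heartbeat > self.timeout

    def current_primary(self, now: float) -> Optional[Replica]:
        """Return the serving replica, failing over if the primary is suspected."""
        n = len(self.replicas)
        for step in range(n):
            idx = (self.primary + step) % n
            if not self.is_suspected(self.replicas[idx], now):
                self.primary = idx  # promote the first live replica in ring order
                return self.replicas[idx]
        return None  # every replica is suspected: the service is down


if __name__ == "__main__":
    group = FailoverGroup([Replica("primary"), Replica("backup1"), Replica("backup2")], timeout=3.0)
    for name in ("primary", "backup1", "backup2"):
        group.heartbeat(name, now=0.0)
    print(group.current_primary(now=1.0).name)  # primary
    group.heartbeat("backup1", now=4.0)          # the primary has stopped sending heartbeats
    group.heartbeat("backup2", now=4.0)
    print(group.current_primary(now=5.0).name)  # backup1
```

A fixed timeout trades detection speed against false suspicions: a short `timeout` fails over quickly but may promote a backup while a slow primary is still alive.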

Citations
Journal Article

Energy Efficiency in the Future Internet: A Survey of Existing Approaches and Trends in Energy-Aware Fixed Network Infrastructures

TL;DR: This paper explores current perspectives on power consumption in next-generation networks, and provides a detailed survey of emerging technologies, projects, and work-in-progress standards that can be adopted in networks and related infrastructures to reduce their carbon footprint.
Journal Article

Fault Tolerance Management in Cloud Computing: A System-Level Perspective

TL;DR: An innovative, system-level, modular perspective on creating and managing fault tolerance in Clouds is introduced, and a comprehensive high-level approach to shielding application developers and users from the implementation details of the fault tolerance techniques by means of a dedicated service layer is proposed.
Journal Article

Improving Convergence Speed and Scalability in OSPF: A Survey

TL;DR: This paper presents a comprehensive survey of significant efforts aimed at improving OSPF's convergence speed and scalability, and at extending OSPF to achieve seamless integration of mobile ad hoc networks with conventional wired networks.
Book Chapter

Fault Tolerance and Resilience in Cloud Computing Environments

TL;DR: In this paper, the authors focus on characterizing the recurrent failures in a typical cloud computing environment, analyzing the effects of failures on users' applications and surveying fault tolerance solutions corresponding to each class of failures.
Book Chapter

Cloud Standby: Disaster Recovery of Distributed Systems in the Cloud

TL;DR: It is shown that Cloud Standby reduces the recovery time and the long-term costs of disaster recovery, and that the cost of recovery can be reduced significantly.
References
Proceedings Article

Practical Byzantine fault tolerance

TL;DR: A new replication algorithm that tolerates Byzantine faults, works in asynchronous environments like the Internet, and incorporates several important optimizations that improve the response time of previous algorithms by more than an order of magnitude.
Journal Article

Practical byzantine fault tolerance and proactive recovery

TL;DR: A new replication algorithm, BFT, is described that can be used to build highly available systems that tolerate Byzantine faults and is used to implement the first Byzantine-fault-tolerant NFS file system, BFS.
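The sizing rule behind BFT can be checked with a little arithmetic. To tolerate f Byzantine replicas, the algorithm uses n = 3f + 1 replicas and quorums of 2f + 1, so any two quorums overlap in at least 2(2f + 1) - (3f + 1) = f + 1 replicas, at least one of which is correct; a client accepts a result once it holds f + 1 matching replies. The helper names `pbft_sizes` and `max_faults` below are illustrative only, not part of any published API.

```python
def pbft_sizes(f: int) -> dict:
    """Group sizes BFT uses to tolerate f Byzantine replicas."""
    if f < 0:
        raise ValueError("f must be non-negative")
    n = 3 * f + 1
    quorum = 2 * f + 1
    return {
        "replicas": n,
        "quorum": quorum,
        # Minimum number of replicas shared by any two quorums (equals f + 1,
        # so at least one replica in the overlap is correct).
        "quorum_overlap": 2 * quorum - n,
        "matching_replies": f + 1,  # replies a client waits for before accepting
    }


def max_faults(n: int) -> int:
    """Largest f such that n >= 3f + 1."""
    if n < 1:
        raise ValueError("need at least one replica")
    return (n - 1) // 3


if __name__ == "__main__":
    print(pbft_sizes(1))
    print(max_faults(7))  # 2
```

Adding replicas only helps in steps of three: a group of 5 or 6 replicas still tolerates just one Byzantine fault.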

Host extensions for IP multicasting

S. E. Deering
TL;DR: This memo specifies the extensions required of a host implementation of the Internet Protocol to support multicasting; it obsoletes RFCs 988 and 1054.
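These host extensions (RFC 1112) are what let a host receive datagrams addressed to a class D group address. In the sockets API a host typically requests membership with the `IP_ADD_MEMBERSHIP` option, after which the kernel announces the membership via IGMP. The sketch below, assuming IPv4 and Python's standard `socket` module, shows how a backup server could join a group on which it would receive the same client traffic as the primary; that use is an illustrative assumption, and `build_mreq` and `join_group` are hypothetical helper names.

```python
import socket
import struct


def build_mreq(group: str, interface: str = "0.0.0.0") -> bytes:
    """Pack a struct ip_mreq: multicast group address, then local interface address."""
    group_bytes = socket.inet_aton(group)
    # RFC 1112 host groups use class D addresses, 224.0.0.0 through 239.255.255.255.
    if not 224 <= group_bytes[0] <= 239:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    return struct.pack("4s4s", group_bytes, socket.inet_aton(interface))


def join_group(group: str, port: int, interface: str = "0.0.0.0") -> socket.socket:
    """Open a UDP socket bound to `port` and join `group` on `interface`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # The kernel announces the new membership to local routers via IGMP.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, build_mreq(group, interface))
    return sock
```

For example, `join_group("239.1.2.3", 5007)` returns a socket whose `recvfrom` yields datagrams sent to that group; the group address and port are placeholders. Passing `0.0.0.0` as the interface lets the kernel choose which interface joins.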
Book

Fault tolerance, principles and practice

TL;DR: Methodology and Framework for Fault Tolerance; Idealised Fault Tolerant Components; Failure Exceptions; Critical Components; The Future.
Journal Article

Optimistic recovery in distributed systems

TL;DR: Optimistic Recovery is a new technique supporting application-independent transparent recovery from processor failures in distributed systems that can tolerate the failure of an arbitrary number of processors and yields better throughput and response time than other general recovery techniques whenever failures are infrequent.
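Optimistic Recovery combines checkpoints with asynchronously logged messages, so that recovery means restoring the last checkpoint and replaying the logged messages. The sketch below models only that single-process core: messages are applied by a deterministic handler, buffered in volatile memory, and moved to a simulated stable log by `flush()`. The class and method names are hypothetical, and the cross-process dependency tracking and orphan rollback that the full technique also requires are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Process:
    """Hypothetical process recovered from a checkpoint plus a message log.

    Simplification: delivered messages are buffered in volatile memory and
    reach the simulated stable log only on flush(), mimicking asynchronous
    logging. Cross-process dependency tracking and orphan rollback are omitted.
    """
    state: int = 0
    checkpoint: int = 0
    stable_log: List[int] = field(default_factory=list)
    volatile_log: List[int] = field(default_factory=list)

    def _apply(self, msg: int) -> None:
        # Deterministic handler: replaying the same messages yields the same state.
        self.state += msg

    def deliver(self, msg: int) -> None:
        self._apply(msg)
        self.volatile_log.append(msg)  # logged optimistically, not yet stable

    def flush(self) -> None:
        # Move buffered messages to stable storage (simulated by a list).
        self.stable_log.extend(self.volatile_log)
        self.volatile_log.clear()

    def take_checkpoint(self) -> None:
        self.flush()
        self.checkpoint = self.state
        self.stable_log.clear()  # messages before the checkpoint are not needed

    def crash_and_recover(self) -> int:
        # A crash wipes volatile memory; only the checkpoint and stable log survive.
        self.volatile_log.clear()
        self.state = self.checkpoint
        for msg in self.stable_log:
            self._apply(msg)
        return self.state


if __name__ == "__main__":
    p = Process()
    p.deliver(5)
    p.take_checkpoint()           # checkpoint = 5
    p.deliver(2)
    p.flush()                     # 2 is now stable
    p.deliver(10)                 # 10 is still only in volatile memory
    print(p.state)                # 17
    print(p.crash_and_recover())  # 7: the unflushed message is lost
```

In this run the unflushed message is lost on crash; in the full technique, any process that had already observed its effects would become an orphan and would have to roll back.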