Journal Article
Fault tolerance for highly available internet services: concepts, approaches, and issues
TL;DR: It is shown how the redundancy of application servers can be invested to ensure efficient failover of Internet services when the legitimate processing server goes down.
Abstract:
Fault-tolerant frameworks provide highly available services by means of fault detection and fault recovery mechanisms. These frameworks need to meet different constraints related to the fault model strength, performance, and resource consumption. One of the factors that led to this work is the observation that current fault-tolerant frameworks are not always adapted to existing Internet services. In fact, most of the proposed frameworks are not transport-level- or session-level-aware, although the concerned services range from regular services like HTTP and FTP to more recent Internet services such as multimodal conferencing and voice over IP. In this work we give a comprehensive overview of fault tolerance concepts, approaches, and issues. We show how the redundancy of application servers can be invested to ensure efficient failover of Internet services when the legitimate processing server goes down.
Citations
Journal Article
Energy Efficiency in the Future Internet: A Survey of Existing Approaches and Trends in Energy-Aware Fixed Network Infrastructures
TL;DR: This paper explores current perspectives in power consumption for next generation networks, and provides a detailed survey on emerging technologies, projects, and work-in-progress standards, which can be adopted in networks and related infrastructures in order to reduce their carbon footprint.
Journal Article
Fault Tolerance Management in Cloud Computing: A System-Level Perspective
TL;DR: An innovative, system-level, modular perspective on creating and managing fault tolerance in Clouds is introduced, and a comprehensive high-level approach is proposed that hides the implementation details of the fault tolerance techniques from application developers and users by means of a dedicated service layer.
Journal Article
Improving Convergence Speed and Scalability in OSPF: A Survey
Mukul Goyal, M. Soperi, Emmanuel Baccelli, G. Choudhury, Aman Shaikh, Hossein Hosseini, Kishor S. Trivedi
TL;DR: This paper presents a comprehensive survey of significant efforts aimed at improving OSPF's convergence speed as well as scalability and extending OSPF to achieve seamless integration of mobile ad hoc networks with conventional wired networks.
Book Chapter
Fault Tolerance and Resilience in Cloud Computing Environments
Ravi Jhawar, Vincenzo Piuri
TL;DR: In this paper, the authors focus on characterizing the recurrent failures in a typical cloud computing environment, analyzing the effects of failures on users' applications and surveying fault tolerance solutions corresponding to each class of failures.
Book Chapter
Cloud Standby: Disaster Recovery of Distributed Systems in the Cloud
Alexander Lenk, Stefan Tai
TL;DR: It is shown that using Cloud Standby can significantly reduce the recovery time and long-term costs of disaster recovery.
References
Proceedings Article
Practical Byzantine fault tolerance
Miguel Castro, Barbara Liskov
TL;DR: A new replication algorithm that tolerates Byzantine faults, works in asynchronous environments like the Internet, and incorporates several important optimizations that improve the response time of previous algorithms by more than an order of magnitude.
Journal Article
Practical byzantine fault tolerance and proactive recovery
Miguel Castro, Barbara Liskov
TL;DR: A new replication algorithm, BFT, is described that can be used to build highly available systems that tolerate Byzantine faults and is used to implement the first Byzantine-fault-tolerant NFS file system, BFS.
Host extensions for IP multicasting
TL;DR: This memo specifies the extensions required of a host implementation of the Internet Protocol to support multicasting and obsoletes RFCs 998 and 1054.
Book
Fault tolerance, principles and practice
TL;DR: Methodology and Framework for Fault Tolerance; Idealised Fault Tolerant Components; Failure Exceptions; Critical Components; The Future.
Journal Article
Optimistic recovery in distributed systems
Rob Strom, Shaula Yemini
TL;DR: Optimistic Recovery is a new technique supporting application-independent transparent recovery from processor failures in distributed systems that can tolerate the failure of an arbitrary number of processors and yields better throughput and response time than other general recovery techniques whenever failures are infrequent.