Journal Article

On the Resiliency of Static Forwarding Tables

TL;DR: This paper embarks on a systematic algorithmic study of the resiliency of forwarding tables in a variety of models (i.e., deterministic/probabilistic routing, with packet-header rewriting, with packet duplication) and shows that resiliency to four simultaneous link failures, with limited path stretch, can be achieved without any packet modification, duplication, or randomization.
Abstract: Fast reroute and other forms of immediate failover have long been used to recover from certain classes of failures without invoking the network control plane. While the set of such techniques is growing, the level of resiliency to failures that this approach can provide is not adequately understood. In this paper, we embark upon a systematic algorithmic study of the resiliency of forwarding tables in a variety of models (i.e., deterministic/probabilistic routing, with packet-header rewriting, with packet duplication). Our results show that the resiliency of a routing scheme depends on the “connectivity” $k$ of a network, i.e., the minimum number of link deletions that partition a network. We complement our theoretical results with extensive simulations. We show that resiliency to four simultaneous link failures, with limited path stretch, can be achieved without any packet modification/duplication or randomization. Furthermore, our routing schemes provide resiliency against $k-1$ failures, with limited path stretch, by storing $\log(k)$ bits in the packet header, with limited packet duplication, or with a randomized forwarding technique.
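The notion of connectivity used in the abstract can be made concrete with a small sketch. The Python below is not the paper's algorithm: it simply brute-forces the minimum number of link deletions that partition a toy topology, then reports the $k-1$ failures the abstract says can be tolerated. The function names, the example topologies, and the choice of base 2 with rounding up for the $\log(k)$ header bits are assumptions made for illustration.

```python
from itertools import combinations
from math import ceil, log2


def is_connected(nodes, edges):
    """Return True if the undirected graph (nodes, edges) is connected."""
    nodes = list(nodes)
    if not nodes:
        return True
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:  # depth-first search from an arbitrary start node
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(nodes)


def edge_connectivity(nodes, edges):
    """Minimum number of link deletions that partition the graph.

    Brute force over edge subsets: only suitable for tiny examples.
    """
    edges = list(edges)
    for size in range(len(edges) + 1):
        for removed in combinations(range(len(edges)), size):
            kept = [e for i, e in enumerate(edges) if i not in removed]
            if not is_connected(nodes, kept):
                return size
    return len(edges)  # reached only when no deletion can disconnect the graph


def header_bits(k):
    """Bits needed for log(k), ASSUMING base 2 and rounding up."""
    return ceil(log2(max(k, 1)))


# Hypothetical toy topologies: a 5-node ring and a 5-node full mesh.
ring_nodes = range(5)
ring_edges = [(i, (i + 1) % 5) for i in range(5)]
mesh_nodes = range(5)
mesh_edges = list(combinations(range(5), 2))

k_ring = edge_connectivity(ring_nodes, ring_edges)  # 2
k_mesh = edge_connectivity(mesh_nodes, mesh_edges)  # 4

print(k_ring, k_ring - 1, header_bits(k_ring))  # 2 1 1
print(k_mesh, k_mesh - 1, header_bits(k_mesh))  # 4 3 2
```

For realistic topologies, a max-flow based computation of edge connectivity would replace the exhaustive search; the brute-force version is used here only because it mirrors the definition directly.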
Citations
Proceedings Article
01 Mar 2019
TL;DR: An implementation of Blink in P4 together with an extensive evaluation on real and synthetic traffic traces indicate that Blink achieves sub-second rerouting for large fractions of Internet traffic and prevents unnecessary traffic shifts even in the presence of noise.
Abstract: We present Blink, a data-driven system that leverages TCP-induced signals to detect failures directly in the data plane. The key intuition behind Blink is that a TCP flow exhibits a predictable behavior upon disruption: retransmitting the same packet over and over, at epochs exponentially spaced in time. When compounded over multiple flows, this behavior creates a strong and characteristic failure signal. Blink efficiently analyzes TCP flows to: (i) select which ones to track; (ii) reliably and quickly detect major traffic disruptions; and (iii) recover connectivity, all of this completely in the data plane. We present an implementation of Blink in P4 together with an extensive evaluation on real and synthetic traffic traces. Our results indicate that Blink: (i) achieves sub-second rerouting for large fractions of Internet traffic; and (ii) prevents unnecessary traffic shifts even in the presence of noise. We further show the feasibility of Blink by running it on an actual Tofino switch.
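The intuition about exponentially spaced retransmissions can be illustrated with a toy model. The sketch below is not Blink's detection logic: it only generates retransmission epochs under an assumed doubling backoff and counts how many flows retransmit within a time window. The function names, the backoff factor of 2, and the per-flow timeout values are all hypothetical.

```python
def retransmission_times(failure_ms, initial_rto_ms, count, backoff=2):
    """Epochs (in ms) at which one flow retransmits after a failure.

    ASSUMPTION: the timeout is multiplied by `backoff` after every attempt.
    """
    times = []
    t = failure_ms
    rto = initial_rto_ms
    for _ in range(count):
        t += rto
        times.append(t)
        rto *= backoff
    return times


def flows_retransmitting(flow_rtos_ms, failure_ms, start_ms, end_ms, count=4):
    """Number of flows with at least one retransmission in [start_ms, end_ms)."""
    hits = 0
    for rto in flow_rtos_ms:
        epochs = retransmission_times(failure_ms, rto, count)
        if any(start_ms <= t < end_ms for t in epochs):
            hits += 1
    return hits


# One flow with a hypothetical 200 ms initial timeout, failure at t = 0.
print(retransmission_times(0, 200, 4))  # [200, 600, 1400, 3000]

# Four flows with hypothetical timeouts: all of them retransmit within the
# first 800 ms after the failure, which is the compounded signal.
rtos = [200, 250, 300, 400]
print(flows_retransmitting(rtos, 0, 0, 800))      # 4
print(flows_retransmitting(rtos, 0, 1000, 1300))  # 1
```

The gaps between consecutive epochs double (200, 400, 800, 1600 ms in the example), so retransmissions cluster shortly after the failure; this is why many flows retransmitting in the same short window stand out against background noise.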

86 citations


Cites background from "On the Resiliency of Static Forward..."

  • ...[11] consider generalizations of the DDC approach, and study the relationship between the resilience achieved through data-plane primitives and network connectivity....

    [...]

06 Mar 2006
TL;DR: This document specifies additional information that can be inserted in IS-IS LSPs to convey link capabilities that may be useful in certain applications, such as local protection, provided by a U-turn alternate, in the event of a link and/or node failure while the IS-IS area is reconverging onto a new topology.
Abstract: This document specifies additional information that can be inserted in IS-IS LSPs to convey link capabilities that may be useful in certain applications. In particular, an IS may convey that zero or more of its links are explicitly marked and/or implicit U-turn recipient capable, which may be described as capable of identifying traffic as U-turn traffic and redirecting the traffic to a suitable alternate. The immediate applicability for these two link capabilities is in support of local protection, provided by a U-turn alternate, in the event of a link and/or node failure while the IS-IS area is reconverging onto a new topology.

70 citations

Journal Article
TL;DR: This survey presents a systematic, tutorial-like overview of packet-based fast-recovery mechanisms in the data plane, focusing on concepts but structured around different networking technologies, from traditional link-layer and IP-based mechanisms, over BGP and MPLS to emerging software-defined networks and programmable data planes.
Abstract: In order to meet their stringent dependability requirements, most modern packet-switched communication networks support fast-recovery mechanisms in the data plane. While reactions to failures in the data plane can be significantly faster compared to control plane mechanisms, implementing fast recovery in the data plane is challenging, and has recently received much attention in the literature. This survey presents a systematic, tutorial-like overview of packet-based fast-recovery mechanisms in the data plane, focusing on concepts but structured around different networking technologies, from traditional link-layer and IP-based mechanisms, over BGP and MPLS to emerging software-defined networks and programmable data planes. We examine the evolution of fast-recovery standards and mechanisms over time, and identify and discuss the fundamental principles and algorithms underlying different mechanisms. We then present a taxonomy of the state of the art, summarize the main lessons learned, and propose a few concrete future directions.

42 citations


Cites background from "On the Resiliency of Static Forward..."

  • ...showed in [163] that generally, the approach can tolerate at least half of the maximally possible link failures....

    [...]

  • ...• With or Without Input Network Interface Matching: Can the forwarding action to be applied to a packet depend on the incoming link on which it arrived? Input interface matching can improve the resilience and quality of fast rerouting (in particular, by detecting and avoiding forwarding loops) [163], [212], but may render the forwarding logic more complex....

    [...]

  • ...1) Rerouting Along Arborescences: In order to achieve a very high degree of resilience, several previous works [163]– [165], [228] introduced an algorithmic approach based on the idea of covering the network with arc-disjoint directed arborescences rooted at the destination....

    [...]

  • ...For further detail on IPFRR refer to [23], [24], for the algorithmic aspects see [163]–[167], and for a comprehensive evaluation and comparison of different IPFRR techniques see [35], [168]....

    [...]

  • ...[163], [164]: the authors showed that there exist randomized static rerouting algorithms which tolerate k − 1 link failures if the underlying network is k-edge connected, even without header rewriting....

    [...]

Proceedings Article
29 Apr 2019
TL;DR: This paper presents CASA, an algorithm providing a high degree of robustness as well as a provable quality of fast rerouting, and shows that there exists an inherent tradeoff in terms of achievable locality and congestion of failover routes.
Abstract: To meet the stringent requirements on the maximally tolerable disruptions of traffic under link failures, many communication networks feature some sort of static failover mechanism for fast rerouting. However, configuring such static failover mechanisms to achieve a high degree of robustness is known to be challenging, in particular when packet tagging or dynamic node state cannot be used. This paper initiates the systematic study of such local fast failover mechanisms which not only provide connectivity guarantees, even under multiple link failures, but also account for the quality of the resulting failover routes, with respect to locality (i.e., route length) and congestion. Failover quality has received less attention in the literature so far, yet it is increasingly important to support emerging applications. We first show that there exists an inherent tradeoff in terms of achievable locality and congestion of failover routes. We then present CASA, an algorithm providing a high degree of robustness as well as a provable quality of fast rerouting. CASA combines two crucial static resilient routing techniques: combinatorial designs and arc-disjoint arborescences. We complement our formal analysis with a simulation study, in which we compare our algorithms with the state-of-the-art in different scenarios and show benefits in terms of stretch, load, and resilience.

31 citations


Cites background from "On the Resiliency of Static Forward..."

  • ..., consider a network with a dead-end, forcing the packets to return along the same link [16]....

    [...]

  • ...While tagging can improve the robustness of routing [16], [17], it is often undesirable in practice to change header fields....

    [...]

Proceedings Article
15 Apr 2018
TL;DR: This paper initiates the theoretical study of static fast failover mechanisms which do not depend on reconvergence and hence support a very fast reaction to failures, and introduces formal models and fundamental tradeoffs on what can and cannot be achieved in terms of static resilient routing.
Abstract: Segment Routing (SR) promises to provide scalable and fine-grained traffic engineering. However, little is known today on how to implement resilient routing in SR, i.e., routes which tolerate one or even multiple failures. This paper initiates the theoretical study of static fast failover mechanisms which do not depend on reconvergence and hence support a very fast reaction to failures. We introduce formal models and identify fundamental tradeoffs on what can and cannot be achieved in terms of static resilient routing. In particular, we identify an inherent price in terms of performance if routing paths need to be resilient, even in the absence of failures. Our main contribution is a first algorithm which is resilient even to multiple failures and which comes with provable resiliency and performance guarantees. We complement our formal analysis with simulations on real topologies, which show the benefits of our approach over existing algorithms.

24 citations


Cites background from "On the Resiliency of Static Forward..."

  • ...under many link failures [10], [11], [12], [29] ....

    [...]

References
Journal Article
31 Mar 2008
TL;DR: This whitepaper proposes OpenFlow: a way for researchers to run experimental protocols in the networks they use every day, based on an Ethernet switch, with an internal flow-table, and a standardized interface to add and remove flow entries.
Abstract: This whitepaper proposes OpenFlow: a way for researchers to run experimental protocols in the networks they use every day. OpenFlow is based on an Ethernet switch, with an internal flow-table, and a standardized interface to add and remove flow entries. Our goal is to encourage networking vendors to add OpenFlow to their switch products for deployment in college campus backbones and wiring closets. We believe that OpenFlow is a pragmatic compromise: on one hand, it allows researchers to run experiments on heterogeneous switches in a uniform way at line-rate and with high port-density; while on the other hand, vendors do not need to expose the internal workings of their switches. In addition to allowing researchers to evaluate their ideas in real-world traffic settings, OpenFlow could serve as a useful campus component in proposed large-scale testbeds like GENI. Two buildings at Stanford University will soon run OpenFlow networks, using commercial Ethernet switches and routers. We will work to encourage deployment at other schools, and we encourage you to consider deploying OpenFlow in your university network too.

9,138 citations

Book
01 May 1997
TL;DR: Graph Theory, Fourth Edition, is a standard textbook of modern graph theory that covers the core material of the subject with concise yet reliably complete proofs, while offering glimpses of more advanced methods in each chapter through one or two deeper results.
Abstract: Graph Theory, Fourth Edition. This standard textbook of modern graph theory, now in its fourth edition, combines the authority of a classic with the engaging freshness of style that is the hallmark of active mathematics. It covers the core material of the subject with concise yet reliably complete proofs, while offering glimpses of more advanced methods in each field by one or two deeper results, again with proofs given in full detail.

6,255 citations

Journal Article
17 Aug 2008
TL;DR: This paper shows how to leverage largely commodity Ethernet switches to support the full aggregate bandwidth of clusters consisting of tens of thousands of elements and argues that appropriately architected and interconnected commodity switches may deliver more performance at less cost than available from today's higher-end solutions.
Abstract: Today's data centers may contain tens of thousands of computers with significant aggregate bandwidth requirements. The network architecture typically consists of a tree of routing and switching elements with progressively more specialized and expensive equipment moving up the network hierarchy. Unfortunately, even when deploying the highest-end IP switches/routers, resulting topologies may only support 50% of the aggregate bandwidth available at the edge of the network, while still incurring tremendous cost. Non-uniform bandwidth among data center nodes complicates application design and limits overall system performance. In this paper, we show how to leverage largely commodity Ethernet switches to support the full aggregate bandwidth of clusters consisting of tens of thousands of elements. Similar to how clusters of commodity computers have largely replaced more specialized SMPs and MPPs, we argue that appropriately architected and interconnected commodity switches may deliver more performance at less cost than available from today's higher-end solutions. Our approach requires no modifications to the end host network interface, operating system, or applications; critically, it is fully backward compatible with Ethernet, IP, and TCP.

3,549 citations

Proceedings Article
16 Aug 2009
TL;DR: VL2 is a practical network architecture that scales to support huge data centers with uniform high capacity between servers, performance isolation between services, and Ethernet layer-2 semantics, and is built on a working prototype.
Abstract: To be agile and cost effective, data centers should allow dynamic resource allocation across large server pools. In particular, the data center network should enable any server to be assigned to any service. To meet these goals, we present VL2, a practical network architecture that scales to support huge data centers with uniform high capacity between servers, performance isolation between services, and Ethernet layer-2 semantics. VL2 uses (1) flat addressing to allow service instances to be placed anywhere in the network, (2) Valiant Load Balancing to spread traffic uniformly across network paths, and (3) end-system based address resolution to scale to large server pools, without introducing complexity to the network control plane. VL2's design is driven by detailed measurements of traffic and fault data from a large operational cloud service provider. VL2's implementation leverages proven network technologies, already available at low cost in high-speed hardware implementations, to build a scalable and reliable network architecture. As a result, VL2 networks can be deployed today, and we have built a working prototype. We evaluate the merits of the VL2 design using measurement, analysis, and experiments. Our VL2 prototype shuffles 2.7 TB of data among 75 servers in 395 seconds - sustaining a rate that is 94% of the maximum possible.

2,350 citations

Proceedings Article
16 Aug 2009
TL;DR: Experiments in the testbed demonstrate that BCube is fault tolerant and load balancing and it significantly accelerates representative bandwidth-intensive applications.
Abstract: This paper presents BCube, a new network architecture specifically designed for shipping-container based, modular data centers. At the core of the BCube architecture is its server-centric network structure, where servers with multiple network ports connect to multiple layers of COTS (commodity off-the-shelf) mini-switches. Servers act as not only end hosts, but also relay nodes for each other. BCube supports various bandwidth-intensive applications by speeding up one-to-one, one-to-several, and one-to-all traffic patterns, and by providing high network capacity for all-to-all traffic. BCube exhibits graceful performance degradation as the server and/or switch failure rate increases. This property is of special importance for shipping-container data centers, since once the container is sealed and operational, it becomes very difficult to repair or replace its components. Our implementation experiences show that BCube can be seamlessly integrated with the TCP/IP protocol stack and BCube packet forwarding can be efficiently implemented in both hardware and software. Experiments in our testbed demonstrate that BCube is fault tolerant and load balancing and it significantly accelerates representative bandwidth-intensive applications.

1,639 citations