
Resilient overlay networks

21 Oct 2001 · Vol. 35, Iss. 5, pp. 131-145
TL;DR: It is found that forwarding packets via at most one intermediate RON node is sufficient to overcome faults and improve performance in most cases, demonstrating the benefits of moving some of the control over routing into the hands of end-systems.
Abstract: A Resilient Overlay Network (RON) is an architecture that allows distributed Internet applications to detect and recover from path outages and periods of degraded performance within several seconds, improving over today's wide-area routing protocols that take at least several minutes to recover. A RON is an application-layer overlay on top of the existing Internet routing substrate. The RON nodes monitor the functioning and quality of the Internet paths among themselves, and use this information to decide whether to route packets directly over the Internet or by way of other RON nodes, optimizing application-specific routing metrics.Results from two sets of measurements of a working RON deployed at sites scattered across the Internet demonstrate the benefits of our architecture. For instance, over a 64-hour sampling period in March 2001 across a twelve-node RON, there were 32 significant outages, each lasting over thirty minutes, over the 132 measured paths. RON's routing mechanism was able to detect, recover, and route around all of them, in less than twenty seconds on average, showing that its methods for fault detection and recovery work well at discovering alternate paths in the Internet. Furthermore, RON was able to improve the loss rate, latency, or throughput perceived by data transfers; for example, about 5% of the transfers doubled their TCP throughput and 5% of our transfers saw their loss probability reduced by 0.05. We found that forwarding packets via at most one intermediate RON node is sufficient to overcome faults and improve performance in most cases. These improvements, particularly in the area of fault detection and recovery, demonstrate the benefits of moving some of the control over routing into the hands of end-systems.

Summary (8 min read)

1. Introduction

  • The Internet is organized as independently operating autonomous systems (AS’s) that peer together.
  • RON detects problems by aggressively probing and monitoring the paths connecting its nodes.
  • RON nodes exchange information about the quality of the paths among themselves via a routing protocol and build forwarding tables based on a variety of path metrics, including latency, packet loss rate, and available throughput.
  • Using RON’s routing machinery, an Overlay ISP can provide more resilient and failure-resistant Internet service to its customers.

2.1 Internet Performance Studies

  • Labovitz et al. [12] use a combination of measurement and analysis to show that inter-domain routers in the Internet may take tens of minutes to reach a consistent view of the network topology after a fault, primarily because of routing table oscillations during BGP’s rather complicated path selection process.
  • They find that during this period of “delayed convergence,” end-to-end communication is adversely affected.
  • In fact, outages on the order of minutes cause active TCP connections (i.e., connections in the ESTABLISHED state with outstanding data) to terminate when TCP does not receive an acknowledgment for its outstanding data.
  • Furthermore, they find that about 40% of all path outages take more than 30 minutes to repair and are heavy-tailed in their duration.
  • The Detour measurement study made the observation, using Paxson’s and their own data collected at various times between 1995 and 1999, that path selection in the wide-area Internet is suboptimal from the standpoint of end-to-end latency, packet loss rate, and TCP throughput [23].

2.2 Network-layer Techniques

  • Much work has been done on performance-based and fault-tolerant routing within a single routing domain, but practical mechanisms for wide-area Internet recovery from outages or badly performing paths are lacking.
  • Early ARPANET routing was more dynamic, responding to the current delay and utilization of the network.
  • An oft-cited “solution” to achieving fault-tolerant network connectivity for a small- or medium-sized customer is to multi-home, advertising a customer network through multiple ISPs.
  • This solution does not generally achieve fault detection and recovery within several seconds because of the degree of aggregation used to achieve wide-area routing scalability.
  • To limit the size of their routing tables, many ISPs will not accept routing announcements for fewer than 8192 contiguous addresses (a “/19” netblock); the arithmetic is worked out below.
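A quick check of that figure: a /19 prefix fixes the top 19 bits of a 32-bit IPv4 address, leaving 32 − 19 = 13 host bits, i.e., 2^13 = 8192 addresses.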

2.3 Overlay-based Techniques

  • Overlay networks are an old idea; in fact, the Internet itself was developed as an overlay on the telephone network.
  • It provides management functions and mechanisms to insert packets into the overlay, but does not yet support fault-tolerant operation or application-controlled path selection.
  • RON takes advantage of underlying Internet path redundancy on time-scales of a few seconds, reacting responsively to path outages and performance failures.
  • Porting a standard routing protocol onto a generic overlay infrastructure by itself will not improve the resilience of Internet communications, for two reasons: outage detection must be reliable and low-overhead, and generic network-level routing protocols do not use application-specific definitions of faults.
  • Various Content Delivery Networks (CDNs) use overlay techniques and caching to improve the performance of content delivery for specific applications such as HTTP and streaming video.

3.1 Fast Failure Detection and Recovery

  • Today’s wide-area Internet routing system based on BGP-4 does not handle failures well.
  • Applications perceive all failures in one of two ways: outages or performance failures.
  • Link failures and extreme path failures cause outages, when the average packet loss rate over a sustained period of several minutes is high (about 30% or higher), causing most protocols including TCP to degrade by several orders of magnitude.
  • Compounding this problem, IP-layer protocols like BGP-4 cannot detect problems such as packet floods and persistent congestion on links or paths that greatly degrade end-to-end performance.
  • As long as a link is deemed “live” (i.e., the BGP session is still alive), BGP’s AS-path-based routing will continue to route packets down the faulty path; unfortunately, such a path may not provide adequate performance for an application using it.

3.2 Tighter Integration with Applications

  • Failures and faults are application-specific notions: network conditions that are fatal for one application may be acceptable for another, more adaptive one.
  • At loss rates of 30% or more, TCP becomes essentially unusable because it times out for most packets [16].
  • RON allows applications to independently define and react to failures.
  • In addition, applications may prioritize some metrics over others (e.g., latency over throughput, or low loss over latency) in their path selection.
  • They may also construct their own metrics to select paths.

3.3 Expressive Policy Routing

  • BGP-4 is incapable of expressing fine-grained policies aimed at users or hosts.
  • This lack of precision not only reduces the set of paths available in the case of a failure, but also inhibits innovation in the use of carefully targeted policies, such as end-to-end per-user rate controls or enforcement of acceptable use policies (AUPs) based on packet classification.
  • Figure 2 shows the AS-level network connectivity between four of their RON hosts; the full graph for (only) 12 hosts traverses 36 different autonomous systems.
  • If the Aros-UUNET connection failed, users at Aros would be unable to reach MIT even if they were authorized to use Utah’s network resources to get there.

4. Design

  • RON nodes, deployed at various locations on the Internet, form an application-layer overlay to cooperatively route packets for each other.
  • Each RON node monitors the quality of the Internet paths between it and the other nodes, and uses this information to intelligently select paths for packets.
  • Each Internet path between two nodes is called a virtual link.
  • Most of RON’s design supports routing through multiple intermediate nodes, but their results (Section 6) show that using at most one intermediate RON node is sufficient most of the time.
  • Therefore, parts of their design focus on finding better paths via a single intermediate RON node.

4.1 Software Architecture

  • Each program that communicates with the RON software on a node is a RON client; the overlay is defined by a single group of clients that collaborate to provide a distributed service or application.
  • This group of clients can use service-specific routing metrics when deciding how to forward packets in the group.
  • A RON client interacts with RON across an API called a conduit, which the client uses to send and receive packets; a minimal sketch of this interface follows the list.
  • recv(pkt, via_ron) is a callback function that is called when a packet arrives for the client program.
  • RON routers and membership managers exchange packets using RON as their forwarding service, rather than over direct IP paths.
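To make the conduit interface concrete, here is a minimal sketch in Python (the real RON implementation is not in Python). The send/recv methods mirror the interface named in the text; Packet and Forwarder are simplified stand-ins of our own, not the paper's actual code.

```python
# Minimal sketch of the conduit API, under the assumptions stated above.

class Packet:
    def __init__(self, ron_type, dst, payload):
        self.ron_type = ron_type  # RON packet type: demultiplexing field
        self.dst = dst            # destination RON node
        self.payload = payload

class Forwarder:
    """Stand-in for the RON forwarder object; delivers locally for illustration."""
    def __init__(self):
        self.clients = {}

    def register(self, ron_type, conduit):
        # Clients pre-register the packet types they handle when joining the RON.
        self.clients[ron_type] = conduit

    def forward(self, pkt, dst, via_ron):
        # The real forwarder consults its router and sends over the overlay or
        # the direct Internet path; here we just demultiplex locally.
        conduit = self.clients.get(pkt.ron_type)
        if conduit is not None:
            conduit.recv(pkt, via_ron)

class Conduit:
    def __init__(self, forwarder, handled_types):
        self.forwarder = forwarder
        for t in handled_types:
            forwarder.register(t, self)

    def send(self, pkt, dst, via_ron=True):
        # Best-effort, unreliable delivery (like UDP).
        self.forwarder.forward(pkt, dst, via_ron)

    def recv(self, pkt, via_ron):
        # Callback invoked when a packet arrives for this client.
        print("got", pkt.payload, "via RON" if via_ron else "direct")
```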

4.2 Routing and Path Selection

  • Routing is the process of building up the forwarding tables that are used to choose paths for packets.
  • Tagging, like the IPv6 flow ID, helps support multi-hop routing by speeding up the forwarding path at intermediate nodes.
  • The small size of a RON relative to the Internet allows it to maintain information about multiple alternate routes and to select the path that best suits the RON client according to a client-specified routing metric.
  • By default, it maintains information about three specific metrics for each virtual link: (i) latency, (ii) packet loss rate, and (iii) throughput, as might be obtained by a bulk-transfer TCP connection between the end-points of the virtual link.
  • The router builds up forwarding tables for each combination of policy routing and chosen routing metric, as sketched below.
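A minimal sketch of that table organization, assuming a simple dictionary layout (the names and shapes are illustrative, not the paper's data structures):

```python
# One forwarding table per (policy tag, routing metric) combination.
class RouterTables:
    def __init__(self, policies, metrics):
        # tables[(policy, metric)][dst] -> next-hop RON node
        self.tables = {(p, m): {} for p in policies for m in metrics}

    def update(self, policy, metric, dst, next_hop):
        self.tables[(policy, metric)][dst] = next_hop

    def lookup(self, policy, metric, dst):
        # None means "no overlay route chosen": use the direct Internet path.
        return self.tables[(policy, metric)].get(dst)

tables = RouterTables(policies=["default", "no_internet2"],
                      metrics=["latency", "loss", "throughput"])
tables.update("default", "latency", "MIT", "Aros")
assert tables.lookup("default", "latency", "MIT") == "Aros"
```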

4.2.1 Link-State Dissemination

  • The default RON router uses a link-state routing protocol to disseminate topology information between routers, which in turn is used to build the forwarding tables.
  • Each node in an N-node RON has N−1 virtual links.
  • Each node’s router periodically requests summary information of the different performance metrics for its N−1 virtual links from its local performance database and disseminates its view to the others.
  • This information is sent via the RON forwarding mesh itself, to ensure that routing information is propagated in the event of path outages and heavy loss periods.
  • Thus, the RON routing protocol is itself a RON client, with a well-defined RON packet type (see the sketch after this list).
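A sketch of that idea, reusing the Packet and Conduit stand-ins from the earlier sketch: routing updates are ordinary RON packets of a reserved type, so they can still be forwarded when some direct IP paths are down. ROUTING_UPDATE, the summary format, and perf_db.summarize() are illustrative assumptions.

```python
ROUTING_UPDATE = 7  # hypothetical RON packet type for routing updates

def broadcast_link_state(conduit, self_addr, peers, perf_db):
    # Summarize this node's N-1 virtual links from its performance database.
    summary = {peer: perf_db.summarize(self_addr, peer) for peer in peers}
    for peer in peers:
        pkt = Packet(ROUTING_UPDATE, peer, summary)
        conduit.send(pkt, peer, via_ron=True)  # via the RON mesh, not direct IP
```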

4.2.2 Path Evaluation and Selection

  • The scores produced by each metric evaluator are relative, and are only compared to other numbers from the same evaluator.
  • Throughput optimization combines the latency and loss metrics using a simplified version of the TCP throughput equation [16], which provides an upper-bound on TCP throughput.
  • The authors’ granularity of loss-rate detection is 1%, and the throughput equation is more sensitive at lower loss rates.
  • While each of the evaluation metrics applies some smoothing, this is not enough to avoid “flapping” between two nearly equal routes; RON routers therefore employ hysteresis (see the sketch after this list).
  • Measurement data is often noisy, and different clients may have different ways in which they will use that data; for instance, an outage detector may want to know if any packets were successfully sent in the last 15 seconds, but a throughput improver may be interested in a longer-term packet loss average.
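The throughput scoring and hysteresis can be illustrated with a Mathis-style simplified TCP throughput bound, score ∝ sqrt(3/2)/(rtt·sqrt(p)). The clamp at 1% matches the stated loss granularity; the 50% switching margin follows the text's "at least 50% improvement" goal, but the exact trigger constant is an assumption.

```python
import math

def throughput_score(rtt, loss):
    # Simplified TCP throughput model (an upper bound): sqrt(3/2) / (rtt * sqrt(p)).
    p = max(loss, 0.01)  # loss granularity is 1%; avoids divide-by-zero on clean links
    return math.sqrt(1.5) / (rtt * math.sqrt(p))

def pick_path(current, candidates, score, margin=0.5):
    # Hysteresis: switch only when the best alternative beats the current path
    # by a clear margin, to avoid flapping between two nearly equal routes.
    best = max(candidates, key=score)
    if current is None or score(best) > (1 + margin) * score(current):
        return best
    return current
```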

4.3 Policy Routing

  • RON allows users or administrators to define the types of traffic allowed on particular network links.
  • RON separates policy routing into two components: classification and routing table formation.
  • When a packet enters the RON, it is classified and given a policy tag; this tag is used to perform lookups in the appropriate set of routing tables at each RON router.
  • The policy classifier produces a permits function that determines whether a given policy is allowed to use a particular virtual link.
  • The authors have designed two policy mechanisms, exclusive cliques and general policies; a sketch of a clique-style permits check follows this list.
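A minimal sketch of a clique-style permits() check: under an exclusive-clique policy (e.g., Internet2-only traffic), a virtual link is usable only if both endpoints belong to the clique. The membership data here is illustrative.

```python
def make_clique_policy(members):
    members = set(members)
    def permits(link_src, link_dst):
        # A virtual link is permitted under this policy only inside the clique.
        return link_src in members and link_dst in members
    return permits

internet2 = make_clique_policy({"MIT", "Utah", "Cornell", "CMU", "NYU"})
print(internet2("MIT", "Utah"))  # True: both endpoints in the clique
print(internet2("MIT", "Aros"))  # False: commercial site, link not permitted
```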

4.4 Data Forwarding

  • If a packet requires further delivery, the forwarder passes its RON packet header to the routing table, as shown in Figure 5.
  • RON also provides a policy tag that is interpreted by the forwarders to decide which network routing policies apply to the packet.
  • If the packet’s flow ID has a valid flow cache entry, the forwarder short-cuts the routing process with this entry.
  • There is one routing preference table for each known policy tag.
  • Next, the lookup procedure examines the routing preference flags to find a compatible route-selection metric for the packet; the full lookup path is sketched below.
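A sketch of that lookup path, reusing the RouterTables stand-in from the earlier sketch: flow-cache shortcut first, then the preference tables for the packet's policy tag. The header field names are illustrative assumptions.

```python
class LookupEngine:
    def __init__(self, tables):
        self.tables = tables   # RouterTables from the earlier sketch
        self.flow_cache = {}   # flow ID -> next hop; pins a flow to its path

    def next_hop(self, hdr):
        if hdr.flow_id in self.flow_cache:           # 1. flow-cache shortcut
            return self.flow_cache[hdr.flow_id]
        for metric in hdr.preferred_metrics:         # 2. preference flags, in order
            hop = self.tables.lookup(hdr.policy_tag, metric, hdr.dst)
            if hop is not None:
                self.flow_cache[hdr.flow_id] = hop   # remember for later packets
                return hop
        return None                                  # fall back to the direct path
```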

4.5 Bootstrap and Membership Management

  • In addition to allowing clients to define their own membership mechanisms, RON provides two system membership managers: a simple static membership mechanism that loads its peers from a file, and a dynamic announcement-based, soft-state membership protocol.
  • The main challenge in the dynamic membership protocol is to avoid confusing a path outage to a node from its having left the RON.
  • Each node builds up and periodically (every five minutes on average in their implementation) floods to all other nodes its list of peer RON nodes.
  • The overhead of broadcasting is minuscule compared to the traffic caused by active probing and routing updates, especially given the limited size of a RON.
  • This redundancy causes a node to be deleted from another node’s view of the RON only if the former node is genuinely partitioned for over an hour from every other node in the RON (see the sketch after this list).
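A sketch of that soft-state behavior. The five-minute flood interval and one-hour expiry follow the text; the data structures are illustrative.

```python
import time

FLOOD_INTERVAL = 5 * 60  # seconds between peer-list floods, on average
EXPIRY = 60 * 60         # drop a peer only after an hour of silence

class Membership:
    def __init__(self):
        self.last_heard = {}  # peer -> timestamp of most recent direct/indirect news

    def on_peer_list(self, sender, peer_list, now=None):
        now = time.time() if now is None else now
        self.last_heard[sender] = now
        for p in peer_list:
            # Transitive refresh: any node vouching for p keeps p alive, so p is
            # deleted only if genuinely partitioned from every other node.
            self.last_heard[p] = max(self.last_heard.get(p, 0), now)

    def live_peers(self, now=None):
        now = time.time() if now is None else now
        return [p for p, t in self.last_heard.items() if now - t < EXPIRY]
```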

5. Implementation

  • Each client can pick and choose the components that best suit its needs.
  • The RON core services run without special kernel support or elevated privileges.
  • This classification is client-specific; it labels the packet with information that decides what routing metric is later used to route the packet.
  • Non-entry RON nodes route the packet based only on the attached label and destination of the packet.
  • The conference node conduits implement the application-specific methods that label packets to tell the IP forwarder which routing metrics to use for conference packets.

5.1 The IP Forwarder

  • The authors implemented the resilient IP forwarder using FreeBSD’s divert sockets to automatically send IP traffic over the RON, and emit it at the other end.
  • The resilient IP forwarder provides classification, encapsulation, and decapsulation of IP packets through a special conduit called the ip conduit.

5.2 Routers

  • RON routers implement the router virtual interface, which has only a single function call, lookup(pkt *mypkt).
  • The RON library provides a trivial static router, and a dynamic router that routes based upon different metric optimizations.
  • Metric descriptions provide an evaluation function that returns the “score” of a link, and a list of metrics that the routing table needs to generate and propagate.
  • The implementation’s routing table creation is specific to single-hop indirection, which also eliminates the need for the flow cache.
  • Both policy mechanisms provide classifiers for the resilient IP forwarder.

6. Evaluation

  • The goal of the RON system is to overcome path outages and performance failures, without introducing excessive overhead or new failure modes.
  • The authors present an evaluation of how well RON meets these goals.
  • First, the authors study RON’s ability to detect outages and recover quickly from them.
  • Next, the authors investigate performance failures and RON’s ability to improve the loss rate, latency, and throughput of badly performing paths.
  • Finally, the authors investigate two important aspects of RON’s routing, showing the effectiveness of its one-intermediate-hop strategy compared to more general alternatives and the stability of RONgenerated routes.

6.1 Methodology

  • Most of their results come from experiments with a wide-area RON deployed at several Internet sites.
  • To demonstrate that their policy routing module works, and to make their measurements closer to what Internet hosts in general would observe, all the measurements reported herein were taken with a policy that prohibited sending traffic to or from commercial sites over the Internet2.
  • The raw measurement data used in this paper consists of probe packets, throughput samples, and traceroute results.
  • To probe, each RON node independently repeated the following steps: 1. Pick a random node.

6.2 Overcoming Path Outages

  • Precisely measuring a path outage is harder than one might think.
  • These statistics are obtained by calculating 13,650 30-minute loss-rate averages of a 51-hour subset of the packet trace, involving 132 different communication paths.
  • There were 5 “path-hours” of complete outage (100% loss rate) and 16 hours of TCP-perceived outage (≥30% loss rate); RON routed around all these situations.
  • The combination of the Internet2 policy and consideration of only single-hop indirection meant that this RON could not provide connectivity between Cisco-MA and other non-MIT educational institutions.

6.2.1 Overhead and Outage Detection Time

  • The implementation of the resilient IP forwarder adds about 220 μs of latency to each forwarded packet; the largest loss-rate improvement came when the direct path to an almost-partitioned site had a 99% loss rate.
  • The average time between two probes of the same path is 12 seconds.
  • A RON node sends a routing update to every other RON node every 14 seconds on average.
  • The time to detect a failed path suggests that passive monitoring of in-use links will improve the single-virtual-link failure recovery case considerably, since the traffic flowing on the virtual link can be treated as “probes.”
  • The authors believe that this overhead is reasonable for several classes of applications that require recovery from failures within several seconds; a back-of-the-envelope detection-time calculation follows this list.
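Working out a rough detection time from these parameters: PROBE_INTERVAL = 12 s matches the text, while the PROBE_TIMEOUT and OUTAGE_THRESH values below are assumptions of the right order, not quoted from the paper.

```python
PROBE_INTERVAL = 12  # s, average spacing of routine probes on a path
PROBE_TIMEOUT = 3    # s, spacing of the rapid follow-up probes (assumed)
OUTAGE_THRESH = 4    # consecutive losses before declaring an outage (assumed)

# On average a failure sits half a probe interval before the next routine
# probe notices it; then OUTAGE_THRESH rapid probes must all time out.
expected = PROBE_INTERVAL / 2 + OUTAGE_THRESH * PROBE_TIMEOUT
print(f"expected detection time ~ {expected:.0f} s")  # ~18 s, consistent with the conclusion
```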

6.2.2 Handling Packet Floods

  • To measure recovery time under controlled conditions and evaluate the effectiveness of RON in routing around a flood-induced outage, the authors conducted tests on the Utah Network Emulation Testbed, which has Intel PIII/600MHz machines on a quiescent 100Mbps switched Ethernet with Intel Etherexpress Pro/100 interfaces.
  • Indirect routing was possible through the third node, but the latencies made it less preferable than the direct path.
  • The rightmost trace (the horizontal dots) shows the non-RON TCP connection during the flooding attack.
  • These “overhead” packets are necessary for reliability and congestion control; similarly, RON’s active probes may be viewed as “overhead” that help achieve rapid recovery from failures.
  • RON still routed the returning ACK traffic along the flooded link; if BGP had declared the link “dead,” it would have eliminated a perfectly usable link.

6.3.1 Loss Rate

  • RON improved the loss rate by more than 0.05 in a little more than 5% of the samples.
  • Upon closer analysis, the authors found that the outage detection component of RON routing was instrumental in detecting bad situations promptly and in triggering a new path.
  • The authors found that the path between MIT and CCI had a highly asymmetric loss rate, which led to significant improvements due to RON on the MIT-to-CCI path, but also infrequent occurrences when the loss rate on the CCI-to-MIT path was made worse by RON.

6.3.2 Latency

  • Figure 13 shows the CDF of 39,683 five-minute-averaged round-trip latency samples, collected across the 132 communicating paths.
  • The points in the scatterplot appear in clustered bands, showing the improvements achieved on different node pairs at different times.

6.3.3 TCP Throughput

  • RON also improves TCP throughput between communicating nodes in many cases.
  • RON’s throughput-optimizing router does not attempt to detect or change routes to obtain small changes in throughput, since underlying Internet throughput is not particularly stable on most paths; rather, it seeks to obtain at least a 50% improvement in throughput on a RON path.
  • To compare a throughput-optimized RON path to the direct Internet path, the authors repeatedly took four sequential throughput samples— two with RON and two without—on all 132 paths, and compared the ratio of the average throughput achieved by RON to the average throughput achieved directly over the Internet.
  • Figure 15 shows the distribution of these ratios.
  • Out of 2,035 paired quartets of throughput samples, only 1% received less than 50% of the direct-path throughput with RON, while 5% of the samples doubled their throughput.

6.4 RON Routing Behavior

  • The authors instrumented a RON node to output its link-state routing table every 14 seconds on average, with a random jitter to avoid periodic effects.
  • The authors analyzed a 16-hour time-series trace containing 5,616 individual snapshots of the table, corresponding to 876,096 different pairwise routes.

6.4.1 RON Path Lengths

  • The authors outage results show that RON’s single-hop indirection worked well for avoiding problematic paths.
  • As noted in Section 6.2, policy routing could make certain links unusable and the consideration of longer paths will provide better recovery.
  • The direct Internet path provided the best average latency only part of the time.
  • The following simple (and idealized) model provides an explanation; a minimal version is sketched after this list.
  • The authors show that even small values of these probabilities, under independence assumptions (which are justifiable if RON nodes are in different AS’s), lead to at most one intermediate hop providing lowest-latency paths most of the time.
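A minimal version of that idealized model: suppose the direct path is "bad" with probability p and each of k candidate one-intermediate-hop paths is independently bad with probability q (independence being the assumption justified when RON nodes sit in different AS's). The parameter values below are illustrative.

```python
def p_no_working_path(p, q, k):
    # Communication fails only if the direct path AND all k one-hop
    # alternatives fail simultaneously.
    return p * (q ** k)

# With p = q = 0.1 and k = 10 alternatives, every option failing at once is
# essentially impossible, so one intermediate hop suffices almost always:
print(1 - p_no_working_path(0.1, 0.1, 10))  # ~1.0
```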

6.4.2 RON Route Stability

  • As a dynamic, measurement-based routing system, RON creates the potential for instability or route flapping.
  • The authors simulated RON’s path selection algorithms on the link-state trace to investigate route stability.
  • The “Changes” column of Table 5 shows the number of path changes that occurred as a function of the hysteresis before triggering a path change.
  • The other columns show the persistence of each RON route, obtained by calculating the number of consecutive samples over which the route remained unchanged.
  • The average time between samples was 14 seconds.

6.4.3 Application-specific Path Selection

  • The authors did not analyze the effects of the no-Internet2 policy here, and only considered latency optimization without outage avoidance.
  • There are situations where RON’s latency-, loss-, and throughput-optimizing routers pick different paths.
  • In contrast, between MIT and Cisco-MA, RON’s latency optimizer made the latency worse because the outage detector was triggered frequently.
  • The existence of these trade-offs (although the authors do not know how frequently they occur in the global Internet), and the lack of a single, obvious, optimal path reinforces their belief that a flexible, application-informed routing system can benefit applications.

7. Discussion

  • This section discusses three common criticisms of RONs relating to routing policy, scalability, and operation across Network Address Translators (NATs) and firewalls.
  • RON creates the possibility of misuse or violation of AUPs and BGP transit policies.
  • A corporation with tens of sites around the world could greatly improve the reachability between its sites by using a RON-based VPN, without using expensive dedicated links.
  • If RONs become popular, the authors do not expect this fundamental trade-off to change, but they expect to see many RONs co-existing and competing on Internet paths.
  • The second problem posed by NATs is that if two hosts are both behind NATs, they may not be able to communicate directly.

8. Conclusion

  • This paper showed that a Resilient Overlay Network (RON) can greatly improve the reliability of Internet packet delivery by detecting and recovering from outages and path failures more quickly than current inter-domain routing protocols.
  • A RON works by deploying nodes in different Internet routing domains, which cooperatively route packets for each other.
  • The authors found that RON was able to overcome 100% (in the first dataset) and 60% (in the second) of the several hundred significant observed outages.
  • The authors’ implementation takes 18 seconds, on average, to detect and recover from a fault, significantly better than the several minutes taken by BGP-4.
  • RONs also overcome performance failures, substantially improving the loss rate, latency, and TCP throughput of badly performing Internet paths.


Resilient Overlay Networks
David Andersen, Hari Balakrishnan, Frans Kaashoek, and Robert Morris
MIT Laboratory for Computer Science
ron@nms.lcs.mit.edu
http://nms.lcs.mit.edu/ron/
Abstract
A Resilient Overlay Network (RON) is an architecture that allows
distributed Internet applications to detect and recover from path
outages and periods of degraded performance within several sec-
onds, improving over today’s wide-area routing protocols that take
at least several minutes to recover. A RON is an application-layer
overlay on top of the existing Internet routing substrate. The RON
nodes monitor the functioning and quality of the Internet paths
among themselves, and use this information to decide whether to
route packets directly over the Internet or by way of other RON
nodes, optimizing application-specific routing metrics.
Results from two sets of measurements of a working RON de-
ployed at sites scattered across the Internet demonstrate the benefits
of our architecture. For instance, over a 64-hour sampling period in
March 2001 across a twelve-node RON, there were 32 significant
outages, each lasting over thirty minutes, over the 132 measured
paths. RON’s routing mechanism was able to detect, recover, and
route around all of them, in less than twenty seconds on average,
showing that its methods for fault detection and recovery work well
at discovering alternate paths in the Internet. Furthermore, RON
was able to improve the loss rate, latency, or throughput perceived
by data transfers; for example, about 5% of the transfers doubled
their TCP throughput and 5% of our transfers saw their loss prob-
ability reduced by 0.05. We found that forwarding packets via at
most one intermediate RON node is sufficient to overcome faults
and improve performance in most cases. These improvements, par-
ticularly in the area of fault detection and recovery, demonstrate the
benefits of moving some of the control over routing into the hands
of end-systems.
1. Introduction
The Internet is organized as independently operating au-
tonomous systems (AS’s) that peer together. In this architecture,
detailed routing information is maintained only within a single AS
and its constituent networks, usually operated by some network ser-
vice provider. The information shared with other providers and
AS’s is heavily filtered and summarized using the Border Gateway
Protocol (BGP-4) running at the border routers between AS’s [21],
which allows the Internet to scale to millions of networks.
This wide-area routing scalability comes at the cost of re-
duced fault-tolerance of end-to-end communication between Inter-
net hosts. This cost arises because BGP hides many topological
details in the interests of scalability and policy enforcement, has
little information about traffic conditions, and damps routing up-
dates when potential problems arise to prevent large-scale oscil-
lations. As a result, BGP’s fault recovery mechanisms sometimes
take many minutes before routes converge to a consistent form [12],
and there are times when path outages even lead to significant dis-
ruptions in communication lasting tens of minutes or more [3, 18,
19]. The result is that today’s Internet is vulnerable to router and
link faults, configuration errors, and malice—hardly a week goes
by without some serious problem affecting the connectivity pro-
vided by one or more Internet Service Providers (ISPs) [15].
Resilient Overlay Networks (RONs) are a remedy for some of
these problems. Distributed applications layer a “resilient overlay
network” over the underlying Internet routing substrate. The nodes
comprising a RON reside in a variety of routing domains, and co-
operate with each other to forward data on behalf of any pair of
communicating nodes in the RON. Because AS’s are independently
administrated and configured, and routing domains rarely share in-
terior links, they generally fail independently of each other. As
a result, if the underlying topology has physical path redundancy,
RON can often find paths between its nodes, even when wide-area
routing Internet protocols like BGP-4 cannot.
The main goal of RON is to enable a group of nodes to commu-
nicate with each other in the face of problems with the underlying
Internet paths connecting them. RON detects problems by aggres-
sively probing and monitoring the paths connecting its nodes. If
the underlying Internet path is the best one, that path is used and no
other RON node is involved in the forwarding path. If the Internet
path is not the best one, the RON will forward the packet by way of
other RON nodes. In practice, we have found that RON can route
around most failures by using only one intermediate hop.
RON nodes exchange information about the quality of the paths
among themselves via a routing protocol and build forwarding ta-
bles based on a variety of path metrics, including latency, packet
loss rate, and available throughput. Each RON node obtains the
path metrics using a combination of active probing experiments
and passive observations of on-going data transfers. In our imple-
mentation, each RON is explicitly designed to be limited in size—
between two and fifty nodes—to facilitate aggressive path main-
tenance via probing without excessive bandwidth overhead. This

[Figure 1: The current sixteen-node RON deployment. Five sites are at universities in the USA, two are European universities (not shown), three are “broadband” home Internet hosts connected by Cable or DSL, one is located at a US ISP, and five are at corporations in the USA.]
allows RON to recover from problems in the underlying Internet in
several seconds rather than several minutes.
The second goal of RON is to integrate routing and path selec-
tion with distributed applications more tightly than is traditionally
done. This integration includes the ability to consult application-
specific metrics in selecting paths, and the ability to incorporate
application-specific notions of what network conditions constitute a
“fault.” As a result, RONs can be used in a variety of ways. A mul-
timedia conferencing program may link directly against the RON
library, transparently forming an overlay between all participants
in the conference, and using loss rates, delay jitter, or application-
observed throughput as metrics on which to choose paths. An ad-
ministrator may wish to use a RON-based router application to
form an overlay network between multiple LANs as an “Overlay
VPN.” This idea can be extended further to develop an “Overlay
ISP,” formed by linking (via RON) points of presence in different
traditional ISPs after buying bandwidth from them. Using RON’s
routing machinery, an Overlay ISP can provide more resilient and
failure-resistant Internet service to its customers.
The third goal of RON is to provide a framework for the imple-
mentation of expressive routing policies, which govern the choice
of paths in the network. For example, RON facilitates classifying
packets into categories that could implement notions of acceptable
use, or enforce forwarding rate controls.
This paper describes the design and implementation of RON,
and presents several experiments that evaluate whether RON is a
good idea. To conduct this evaluation and demonstrate the ben-
efits of RON, we have deployed a working sixteen-node RON at
sites sprinkled across the Internet (see Figure 1). The RON client
we experiment with is a resilient IP forwarder, which allows us to
compare connections between pairs of nodes running over a RON
against running straight over the Internet.
We have collected a few weeks’ worth of experimental results of
path outages and performance failures and present a detailed analy-
sis of two separate datasets: the first with twelve nodes measured in
March 2001, and the second with sixteen nodes measured in May 2001.
In both datasets, we found that RON was able to route around be-
tween 60% and 100% of all significant outages. Our implementa-
tion takes 18 seconds, on average, to detect and route around a path
failure and is able to do so in the face of an active denial-of-service
attack on a path. We also found that these benefits of quick fault de-
tection and successful recovery are realized on the public Internet
and do not depend on the existence of non-commercial or private
networks (such as the Internet2 backbone that interconnects many
educational institutions); our ability to determine this was enabled
by RON’s policy routing feature that allows the expression and im-
plementation of sophisticated policies that determine how paths are
selected for packets.
We also found that RON successfully routed around performance
failures: in the first dataset, the loss probability improved by at least 0.05
in 5% of the samples, end-to-end communication latency reduced
by 40ms in 11% of the samples, and TCP throughput doubled in
5% of all samples. In addition, we found cases when RON’s loss,
latency, and throughput-optimizing path selection mechanisms all
chose different paths between the same two nodes, suggesting that
application-specific path selection techniques are likely to be use-
ful in practice. A noteworthy finding from the experiments and
analysis is that in most cases, forwarding packets via at most one
intermediate RON node is sufficient both for recovering from fail-
ures and for improving communication latency.
2. Related Work
To our knowledge, RON is the first wide-area network overlay
system that can detect and recover from path outages and periods of
degraded performance within several seconds. RON builds on pre-
vious studies that quantify end-to-end network reliability and per-
formance, on IP-based routing techniques for fault-tolerance, and
on overlay-based techniques to enhance performance.
2.1 Internet Performance Studies
Labovitz et al. [12] use a combination of measurement and anal-
ysis to show that inter-domain routers in the Internet may take tens
of minutes to reach a consistent view of the network topology after
a fault, primarily because of routing table oscillations during BGP’s
rather complicated path selection process. They find that during
this period of “delayed convergence,” end-to-end communication
is adversely affected. In fact, outages on the order of minutes cause
active TCP connections (i.e., connections in the ESTABLISHED
state with outstanding data) to terminate when TCP does not re-
ceive an acknowledgment for its outstanding data. They also find
that, while part of the convergence delays can be fixed with changes
to the deployed BGP implementations, long delays and temporary
oscillations are a fundamental consequence of the BGP path vector
routing protocol.
Paxson’s probe experiments show that routing pathologies pre-
vent selected Internet hosts from communicating up to 3.3% of the
time averaged over a long time period, and that this percentage has
not improved with time [18]. Labovitz et al. find, by examining
routing table logs at Internet backbones, that 10% of all considered
routes were available less than 95% of the time, and that less than
35% of all routes were available more than 99.99% of the time [13].
Furthermore, they find that about 40% of all path outages take more
than 30 minutes to repair and are heavy-tailed in their duration.
More recently, Chandra et al. find using active probing that 5%
of all detected failures last more than 10,000 seconds (2 hours, 45
minutes), and that failure durations are heavy-tailed and can last
for as long as 100,000 seconds before being repaired [3]. These
findings do not augur well for mission-critical services that require
a higher degree of end-to-end communication availability.
The Detour measurement study made the observation, using Pax-
son’s and their own data collected at various times between 1995
and 1999, that path selection in the wide-area Internet is sub-
optimal from the standpoint of end-to-end latency, packet loss rate,
and TCP throughput [23]. This study showed the potential long-
term benefits of “detouring” packets via a third node by comparing

the long-term average properties of detoured paths against Internet-
chosen paths.
2.2 Network-layer Techniques
Much work has been done on performance-based and fault-
tolerant routing within a single routing domain, but practical mech-
anisms for wide-area Internet recovery from outages or badly per-
forming paths are lacking.
Although today’s wide-area BGP-4 routing is based largely on
AS hop-counts, early ARPANET routing was more dynamic, re-
sponding to the current delay and utilization of the network. By
1989, the ARPANET evolved to using a delay- and congestion-
based distributed shortest path routing algorithm [11]. However,
the diversity and size of today’s decentralized Internet necessitated
the deployment of protocols that perform more aggregation and
fewer updates. As a result, unlike some interior routing protocols
within AS’s, BGP-4 routing between AS’s optimizes for scalable
operation over all else.
By treating vast collections of subnetworks as a single entity for
global routing purposes, BGP-4 is able to summarize and aggregate
enormous amounts of routing information into a format that scales
to hundreds of millions of hosts. To prevent costly route oscilla-
tions, BGP-4 explicitly damps changes in routes. Unfortunately,
while aggregation and damping provide good scalability, they in-
terfere with rapid detection and recovery when faults occur. RON
handles this by leaving scalable operation to the underlying Inter-
net substrate, moving fault detection and recovery to a higher layer
overlay that is capable of faster response because it does not have
to worry about scalability.
An oft-cited “solution” to achieving fault-tolerant network con-
nectivity for a small- or medium-sized customer is to multi-home,
advertising a customer network through multiple ISPs. The idea
is that an outage in one ISP would leave the customer connected
via the other. However, this solution does not generally achieve
fault detection and recovery within several seconds because of the
degree of aggregation used to achieve wide-area routing scalabil-
ity. To limit the size of their routing tables, many ISPs will not
accept routing announcements for fewer than 8192 contiguous ad-
dresses (a “/19” netblock). Small companies, regardless of their
fault-tolerance needs, do not often require such a large address
block, and cannot effectively multi-home. One alternative may be
“provider-based addressing,” where an organization gets addresses
from multiple providers, but this requires handling two distinct sets
of addresses on its hosts. It is unclear how on-going connections
on one address set can seamlessly switch on a failure in this model.
2.3 Overlay-based Techniques
Overlay networks are an old idea; in fact, the Internet itself was
developed as an overlay on the telephone network. Several Inter-
net overlays have been designed in the past for various purposes,
including providing OSI network-layer connectivity [10], easing
IP multicast deployment using the MBone [6], and providing IPv6
connectivity using the 6-Bone [9]. The X-Bone is a recent infras-
tructure project designed to speed the deployment of IP-based over-
lay networks [26]. It provides management functions and mecha-
nisms to insert packets into the overlay, but does not yet support
fault-tolerant operation or application-controlled path selection.
Few overlay networks have been designed for efficient fault de-
tection and recovery, although some have been designed for better
end-to-end performance. The Detour framework [5, 22] was mo-
tivated by the potential long-term performance benefits of indirect
routing [23]. It is an in-kernel packet encapsulation and routing
architecture designed to support alternate-hop routing, with an em-
phasis on high performance packet classification and routing. It
uses IP-in-IP encapsulation to send packets along alternate paths.
While RON shares with Detour the idea of routing via other
nodes, our work differs from Detour in three significant ways. First,
RON seeks to prevent disruptions in end-to-end communication in
the face of failures. RON takes advantage of underlying Internet
path redundancy on time-scales of a few seconds, reacting respon-
sively to path outages and performance failures. Second, RON is
designed as an application-controlled routing overlay; because each
RON is more closely tied to the application using it, RON more
readily integrates application-specific path metrics and path selec-
tion policies. Third, we present and analyze experimental results
from a real-world deployment of a RON to demonstrate fast re-
covery from failure and improved latency and loss-rates even over
short time-scales.
An alternative design to RON would be to use a generic overlay
infrastructure like the X-Bone and port a standard network routing
protocol (like OSPF or RIP) with low timer values. However, this
by itself will not improve the resilience of Internet communications
for two reasons. First, a reliable and low-overhead outage detection
module is required, to distinguish between packet losses caused by
congestion or error-prone links from legitimate problems with a
path. Second, generic network-level routing protocols do not utilize
application-specific definitions of faults.
Various Content Delivery Networks (CDNs) use overlay tech-
niques and caching to improve the performance of content delivery
for specific applications such as HTTP and streaming video. The
functionality provided by RON may ease future CDN development
by providing some routing components required by these services.
3. Design Goals
The design of RON seeks to meet three main design goals: (i)
failure detection and recovery in less than 20 seconds; (ii) tighter
integration of routing and path selection with the application; and
(iii) expressive policy routing.
3.1 Fast Failure Detection and Recovery
Today’s wide-area Internet routing system based on BGP-4 does
not handle failures well. From a network perspective, we define
two kinds of failures. Link failures occur when a router or a link
connecting two routers fails because of a software error, hardware
problem, or link disconnection. Path failures occur for a variety of
reasons, including denial-of-service attacks or other bursts of traffic
that cause a high degree of packet loss or high, variable latencies.
Applications perceive all failures in one of two ways: outages or
performance failures. Link failures and extreme path failures cause
outages, when the average packet loss rate over a sustained period
of several minutes is high (about 30% or higher), causing most pro-
tocols including TCP to degrade by several orders of magnitude.
Performance failures are less extreme; for example, throughput, la-
tency, or loss-rates might degrade by a factor of two or three.
BGP-4 takes a long time, on the order of several minutes, to con-
verge to a new valid route after a link failure causes an outage [12].
In contrast, RON’s goal is to detect and recover from outages and
performance failures within several seconds. Compounding this
problem, IP-layer protocols like BGP-4 cannot detect problems
such as packet floods and persistent congestion on links or paths
that greatly degrade end-to-end performance. As long as a link is
deemed “live” (i.e., the BGP session is still alive), BGP’s AS-path-
based routing will continue to route packets down the faulty path;
unfortunately, such a path may not provide adequate performance
for an application using it.

[Figure 2: Internet interconnections are often complex. The dotted links are private and are not announced globally.]
3.2 Tighter Integration with Applications
Failures and faults are application-specific notions: network con-
ditions that are fatal for one application may be acceptable for an-
other, more adaptive one. For instance, a UDP-based Internet audio
application not using good packet-level error correction may not
work at all at loss rates larger than 10%. At this loss rate, a bulk
transfer application using TCP will continue to work because of
TCP’s adaptation mechanisms, albeit at lower performance. How-
ever, at loss rates of 30% or more, TCP becomes essentially un-
usable because it times out for most packets [16]. RON allows
applications to independently define and react to failures.
In addition, applications may prioritize some metrics over oth-
ers (e.g., latency over throughput, or low loss over latency) in their
path selection. They may also construct their own metrics to select
paths. A routing system may not be able to optimize all of these
metrics simultaneously; for example, a path with a one-second la-
tency may appear to be the best throughput path, but this degree
of latency may be unacceptable to an interactive application. Cur-
rently, RON’s goal is to allow applications to influence the choice
of paths using a single metric. We plan to explore multi-criteria
path selection in the future.
3.3 Expressive Policy Routing
Despite the need for policy routing and enforcement of accept-
able use and other policies, today’s approaches are primitive and
cumbersome. For instance, BGP-4 is incapable of expressing fine-
grained policies aimed at users or hosts. This lack of precision
not only reduces the set of paths available in the case of a failure,
but also inhibits innovation in the use of carefully targeted policies,
such as end-to-end per-user rate controls or enforcement of accept-
able use policies (AUPs) based on packet classification. Because
RONs will typically run on relatively powerful end-points, we be-
lieve they are well-suited to providing fine-grained policy routing.
Figure 2 shows the AS-level network connectivity between four
of our RON hosts; the full graph for (only) 12 hosts traverses 36
different autonomous systems. The figure gives a hint of the con-
siderable underlying path redundancy available in the Internet—the
reason RON works—and shows situations where BGP’s blunt pol-
icy expression inhibits fail-over. For example, if the Aros-UUNET
connection failed, users at Aros would be unable to reach MIT even
if they were authorized to use Utah’s network resources to get there.
This is because it is impossible to announce a BGP route only to par-
ticular users, so the Utah-MIT link is kept completely private.
[Figure 3: The RON system architecture. Data enters the RON from RON clients via a conduit at an entry node. At each node, the RON forwarder consults with its router to determine the best path for the packet, and sends it to the next node. Path selection is done at the entry node, which also tags the packet, simplifying the forwarding path at other nodes. When the packet reaches the RON exit node, the forwarder there hands it to the appropriate output conduit, which passes the data to the client. To choose paths, RON nodes monitor the quality of their virtual links using active probing and passive observation. RON nodes use a link-state routing protocol to disseminate the topology and virtual-link quality of the overlay network.]
4. Design
The conceptual design of RON, shown in Figure 3, is quite sim-
ple. RON nodes, deployed at various locations on the Internet,
form an application-layer overlay to cooperatively route packets
for each other. Each RON node monitors the quality of the Internet
paths between it and the other nodes, and uses this information to
intelligently select paths for packets. Each Internet path between
two nodes is called a virtual link. To discover the topology of the
overlay network and obtain information about all virtual links in
the topology, every RON node participates in a routing protocol
to exchange information about a variety of quality metrics. Most
of RON’s design supports routing through multiple intermediate
nodes, but our results (Section 6) show that using at most one inter-
mediate RON node is sufficient most of the time. Therefore, parts
of our design focus on finding better paths via a single intermediate
RON node.
4.1 Software Architecture
Each program that communicates with the RON software on a
node is a RON client. The overlay network is defined by a sin-
gle group of clients that collaborate to provide a distributed service
or application. This group of clients can use service-specific rout-
ing metrics when deciding how to forward packets in the group.
Our design accommodates a variety of RON clients, ranging from
a generic IP packet forwarder that improves the reliability of IP
packet delivery, to a multi-party conferencing application that in-
corporates application-specific metrics in its route selection.
A RON client interacts with RON across an API called a conduit,
which the client uses to send and receive packets. On the data path,
the first node that receives a packet (via the conduit) classifies it
to determine the type of path it should use (e.g., low-latency, high-
throughput, etc.). This node is called the entry node: it determines
a path from its topology table, encapsulates the packet into a RON
header, tags it with some information that simplifies forwarding
by downstream RON nodes, and forwards it on. Each subsequent
RON node simply determines the next forwarding hop based on the
destination address and the tag. The final RON node that delivers
the packet to the RON application is called the exit node.
The conduits access RON via two functions:
1. send(pkt, dst, via_ron) allows a node to forward
a packet to a destination RON node either along the RON or
using the direct Internet path. RON’s delivery, like UDP, is
best-effort and unreliable.
2. recv(pkt, via_ron) is a callback function that is
called when a packet arrives for the client program. This
callback is invoked after the RON conduit matches the type
of the packet in the RON header to the set of types pre-
registered by the client when it joins the RON. The RON
packet type is a demultiplexing field for incoming packets.
The basic RON functionality is provided by the forwarder
object, which implements the above functions. It also provides a
timer registration and callback mechanism to perform periodic op-
erations, and a similar service for network socket data availability.
Each client must instantiate a forwarder and hand to it two mod-
ules: a RON router and a RON membership manager. The RON
router implements a routing protocol. The RON membership man-
ager implements a protocol to maintain the list of members of a
RON. By default, RON provides a few different RON router and
membership manager modules for clients to use.
RON routers and membership managers exchange packets using
RON as their forwarding service, rather than over direct IP paths.
This feature of our system is beneficial because it allows these mes-
sages to be forwarded even when some underlying IP paths fail.
4.2 Routing and Path Selection
Routing is the process of building up the forwarding tables that
are used to choose paths for packets. In RON, the entry node
has more control over subsequent path selection than in traditional
datagram networks. This node tags the packet’s RON header with
an identifier that identifies the flow to which the packet belongs;
subsequent routers attempt to keep a flow ID on the same path it
first used, barring significant link changes. Tagging, like the IPv6
flow ID, helps support multi-hop routing by speeding up the for-
warding path at intermediate nodes. It also helps tie a packet flow
to a chosen path, making performance more predictable, and pro-
vides a basis for future support of multi-path routing in RON. By
tagging at the entry node, the application is given maximum control
over what the network considers a “flow.”
The small size of a RON relative to the Internet allows it to main-
tain information about multiple alternate routes and to select the
path that best suits the RON client according to a client-specified
routing metric. By default, it maintains information about three
specific metrics for each virtual link: (i) latency, (ii) packet loss
rate, and (iii) throughput, as might be obtained by a bulk-transfer
TCP connection between the end-points of the virtual link. RON
clients can override these defaults with their own metrics, and the
RON library constructs the appropriate forwarding table to pick
good paths. The router builds up forwarding tables for each com-
bination of policy routing and chosen routing metric.
4.2.1 Link-State Dissemination
The default RON router uses a link-state routing protocol to disseminate topology information between routers, which in turn is used to build the forwarding tables. Each node in an N-node RON has N−1 virtual links. Each node’s router periodically requests summary information of the different performance metrics for the N−1 other nodes from its local performance database and disseminates its view to the others.
This information is sent via the RON forwarding mesh itself, to
ensure that routing information is propagated in the event of path
outages and heavy loss periods. Thus, the RON routing protocol
is itself a RON client, with a well-defined RON packet type. This
leads to an attractive property: The only time a RON router has
incomplete information about any other one is when all paths in
the RON from the other RON nodes to it are unavailable.
4.2.2 Path Evaluation and Selection
The RON routers need an algorithm to determine if a path is still
alive, and a set of algorithms with which to evaluate potential paths.
The responsibility of these metric evaluators is to provide a number
quantifying how “good” a path is according to that metric. These
numbers are relative, and are only compared to other numbers from
the same evaluator. The two important aspects of path evaluation
are the mechanism by which the data for two links are combined
into a single path, and the formula used to evaluate the path.
Every RON router implements outage detection, which it uses
to determine if the virtual link between it and another node is still
working. It uses an active probing mechanism for this. On detecting the loss of a probe, the normal low-frequency probing is replaced by a sequence of consecutive probes, sent in relatively quick succession spaced by PROBE_TIMEOUT seconds. If OUTAGE_THRESH probes in a row elicit no response, then the path is considered “dead.” If even one of them gets a response, then the subsequent higher-frequency probes are canceled. Paths experiencing outages are rated on their packet loss rate history; a path having an outage will always lose to a path not experiencing an outage. The OUTAGE_THRESH and the frequency of probing (PROBE_INTERVAL) permit a trade-off between outage detection time and the bandwidth consumed by the (low-frequency) probing process (Section 6.2 investigates this).
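
A sketch of this detection logic, using the parameter names above; probe_once is an assumed helper that sends one probe and waits up to the given timeout for a reply, and the constant values are illustrative rather than RON's defaults:

    PROBE_TIMEOUT = 3.0   # seconds to wait on each back-to-back probe (illustrative)
    OUTAGE_THRESH = 4     # consecutive losses before declaring an outage (illustrative)

    def check_outage(probe_once):
        """Run after a routine low-frequency probe is lost."""
        for _ in range(OUTAGE_THRESH):
            if probe_once(timeout=PROBE_TIMEOUT):
                return False   # even one response cancels the high-frequency burst
        return True            # OUTAGE_THRESH probes in a row unanswered: "dead"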
By default, every RON router implements three different routing
metrics: the latency-minimizer, the loss-minimizer, and the TCP
throughput-optimizer. The latency-minimizer forwarding table is
computed from an exponential weighted moving average (EWMA) of round-trip latency samples with parameter α. For any link l, its latency estimate lat_l is updated as:

    lat_l ← α · lat_l + (1 − α) · new_sample        (1)

We use α = 0.9, which means that 10% of the current latency estimate is based on the most recent sample. This number is similar to the values suggested for TCP's round-trip time estimator [20]. For a RON path, the overall latency is the sum of the individual virtual link latencies: lat_path = Σ_{l ∈ path} lat_l.
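
Equation (1) and the path sum are direct to express in code; treating a link's first sample as its initial estimate is our assumption, not something the text specifies:

    ALPHA = 0.9  # the EWMA parameter of Equation (1): 10% weight on the newest sample

    def update_latency(lat_estimate, new_sample, alpha=ALPHA):
        """Per-link EWMA update, Equation (1)."""
        if lat_estimate is None:       # assumed initialization on the first sample
            return new_sample
        return alpha * lat_estimate + (1 - alpha) * new_sample

    def path_latency(link_latencies):
        """A RON path's latency is the sum of its virtual links' estimates."""
        return sum(link_latencies)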
To estimate loss rates, RON uses the average of the last k = 100 probe samples as the current average. Like Floyd et al. [7], we found this to be a better estimator than EWMA, which retains some memory of samples obtained in the distant past as well. It might be possible to further improve our estimator by unequally weighting some of the k samples [7].
Loss metrics are multiplicative on a path: if we assume that losses are independent, the probability of success on the entire path is roughly equal to the probability of surviving all hops individually:

    lossrate_path = 1 − Π_{l ∈ path} (1 − lossrate_l)
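
Both estimators are straightforward to sketch: a sliding window over the last k = 100 probe outcomes for each virtual link, and the multiplicative combination across a path (the class interface is illustrative):

    from collections import deque

    K = 100  # recent probe samples averaged into the loss estimate

    class LossEstimator:
        """Sliding-window loss rate for one virtual link."""
        def __init__(self, k=K):
            self.samples = deque(maxlen=k)    # old samples fall out automatically

        def record(self, lost):
            self.samples.append(1.0 if lost else 0.0)

        def loss_rate(self):
            return sum(self.samples) / len(self.samples) if self.samples else 0.0

    def path_loss_rate(link_loss_rates):
        """Combine per-link rates assuming independent losses across hops."""
        survive = 1.0
        for p in link_loss_rates:
            survive *= 1.0 - p
        return 1.0 - survive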
RON does not attempt to find optimal throughput paths, but
strives to avoid paths of low throughput when good alternatives are
available. Given the time-varying and somewhat unpredictable na-
ture of available bandwidth on Internet paths [2, 19], we believe this
is an appropriate goal. From the standpoint of improving the reli-
ability of path selection in the face of performance failures, avoid-
ing bad paths is more important than optimizing to eliminate small
throughput differences between paths. While a characterization of
the utility received by programs at different available bandwidths
may help determine a good path selection threshold, we believe that
more than a 50% bandwidth reduction is likely to reduce the util-
ity of many programs. This threshold also falls outside the typical
variation observed on a given path over time-scales of tens of minutes.

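One way to act on this goal, sketched under our own assumptions (throughput_score is a placeholder for whatever estimator ranks paths, not RON's scoring function): switch away from the current path only when it delivers less than half of what the best alternative would:

    SWITCH_FRACTION = 0.5  # from the text: a >50% bandwidth reduction likely hurts utility

    def pick_throughput_path(current, candidates, throughput_score):
        """Avoid clearly bad paths rather than chase small throughput differences."""
        best = max(candidates, key=throughput_score)
        if throughput_score(current) < SWITCH_FRACTION * throughput_score(best):
            return best    # current path is well under half the best: move
        return current     # otherwise stay put, for stability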
References
01 Sep 1981

01 Jul 1994
Abstract: This document, together with its companion document, "Application of the Border Gateway Protocol in the Internet", defines an inter-autonomous-system routing protocol for the Internet.

Proceedings Article
01 Jan 2000
TL;DR: The potential benefits of transferring multicast functionality from routers to end systems significantly outweigh the performance penalty incurred; results indicate that the penalties are low from both the application and the network perspectives.

Proceedings ArticleDOI
01 Oct 1998
Abstract: In this paper we develop a simple analytic characterization of the steady-state throughput, as a function of loss rate and round-trip time, of a bulk-transfer TCP flow, i.e., a flow with an unlimited amount of data to send. Unlike earlier models, ours captures not only the behavior of TCP's fast retransmit mechanism but also the effect of TCP's timeout mechanism on throughput. Our measurements suggest that this latter behavior is important from a modeling perspective, as almost all of our TCP traces contained more timeout events than fast retransmit events, and they demonstrate that the model predicts TCP throughput more accurately over a wider range of loss rates.

Journal ArticleDOI
28 Aug 2000
Abstract: This paper proposes a mechanism for equation-based congestion control for unicast traffic. Most best-effort traffic in the current Internet is well served by the dominant transport protocol, TCP. However, traffic such as best-effort unicast streaming multimedia could find use for a TCP-friendly congestion control mechanism that refrains from reducing the sending rate in half in response to a single packet drop. With this mechanism, the sender explicitly adjusts its sending rate as a function of the measured rate of loss events, where a loss event consists of one or more packets dropped within a single round-trip time. Both simulations and experiments over the Internet are used to explore performance.
Frequently Asked Questions (2)
Q1. What are the contributions mentioned in the paper "Resilient overlay networks"?

The authors found that forwarding packets via at most one intermediate RON node is sufficient to overcome faults and improve performance in most cases. 

Understanding the interactions between multiple RONs and investigating routing stability in an Internet with many RONs is an area for future work.