Resilient overlay networks
Summary (8 min read)
1. Introduction
- The Internet is organized as independently operating autonomous systems (AS’s) that peer together.
- RON detects problems by aggressively probing and monitoring the paths connecting its nodes.
- RON nodes exchange information about the quality of the paths among themselves via a routing protocol and build forwarding tables based on a variety of path metrics, including latency, packet loss rate, and available throughput.
- Using RON’s routing machinery, an Overlay ISP can provide more resilient and failure-resistant Internet service to its customers.
2.1 Internet Performance Studies
- Labovitz et al. [12] use a combination of measurement and analysis to show that inter-domain routers in the Internet may take tens of minutes to reach a consistent view of the network topology after a fault, primarily because of routing table oscillations during BGP’s rather complicated path selection process.
- They find that during this period of “delayed convergence,” end-to-end communication is adversely affected.
- In fact, outages on the order of minutes cause active TCP connections (i.e., connections in the ESTABLISHED state with outstanding data) to terminate when TCP does not receive an acknowledgment for its outstanding data.
- Furthermore, they find that about 40% of all path outages take more than 30 minutes to repair and are heavy-tailed in their duration.
- The Detour measurement study made the observation, using Paxson’s and their own data collected at various times between 1995 and 1999, that path selection in the wide-area Internet is suboptimal from the standpoint of end-to-end latency, packet loss rate, and TCP throughput [23].
2.2 Network-layer Techniques
- Much work has been done on performance-based and fault-tolerant routing within a single routing domain, but practical mechanisms for wide-area Internet recovery from outages or badly performing paths are lacking.
- Early ARPANET routing was more dynamic, responding to the current delay and utilization of the network.
- An oft-cited “solution” to achieving fault-tolerant network connectivity for a small- or medium-sized customer is to multi-home, advertising a customer network through multiple ISPs.
- This solution does not generally achieve fault detection and recovery within several seconds because of the degree of aggregation used to achieve wide-area routing scalability.
- To limit the size of their routing tables, many ISPs will not accept routing announcements for fewer than 8192 contiguous addresses (a “/19” netblock).
2.3 Overlay-based Techniques
- Overlay networks are an old idea; in fact, the Internet itself was developed as an overlay on the telephone network.
- It provides management functions and mechanisms to insert packets into the overlay, but does not yet support fault-tolerant operation or application-controlled path selection.
- RON takes advantage of underlying Internet path redundancy on time-scales of a few seconds, reacting responsively to path outages and performance failures.
- This by itself will not improve the resilience of Internet communications for two reasons.
- Various Content Delivery Networks (CDNs) use overlay techniques and caching to improve the performance of content delivery for specific applications such as HTTP and streaming video.
3.1 Fast Failure Detection and Recovery
- Today’s wide-area Internet routing system based on BGP-4 does not handle failures well.
- Applications perceive all failures in one of two ways: outages or performance failures.
- Link failures and extreme path failures cause outages, when the average packet loss rate over a sustained period of several minutes is high (about 30% or higher), causing most protocols including TCP to degrade by several orders of magnitude.
- Compounding this problem, IP-layer protocols like BGP-4 cannot detect problems such as packet floods and persistent congestion on links or paths that greatly degrade end-to-end performance.
- As long as a link is deemed “live” (i.e., the BGP session is still alive), BGP’s AS-path-based routing will continue to route packets down the faulty path; unfortunately, such a path may not provide adequate performance for an application using it.
3.2 Tighter Integration with Applications
- Failures and faults are application-specific notions: network conditions that are fatal for one application may be acceptable for another, more adaptive one.
- At loss rates of 30% or more, TCP becomes essentially unusable because it times out for most packets [16].
- RON allows applications to independently define and react to failures.
- In addition, applications may prioritize some metrics over others (e.g., latency over throughput, or low loss over latency) in their path selection.
- They may also construct their own metrics to select paths.
3.3 Expressive Policy Routing
- BGP-4 is incapable of expressing fine-grained policies aimed at users or hosts.
- This lack of precision not only reduces the set of paths available in the case of a failure, but also inhibits innovation in the use of carefully targeted policies, such as end-to-end per-user rate controls or enforcement of acceptable use policies (AUPs) based on packet classification.
- Figure 2 shows the AS-level network connectivity between four of the authors’ RON hosts; the full graph for (only) 12 hosts traverses 36 different autonomous systems.
- If the Aros-UUNET connection failed, users at Aros would be unable to reach MIT even if they were authorized to use Utah’s network resources to get there.
4. Design
- RON nodes, deployed at various locations on the Internet, form an application-layer overlay to cooperatively route packets for each other.
- Each RON node monitors the quality of the Internet paths between it and the other nodes, and uses this information to intelligently select paths for packets.
- Each Internet path between two nodes is called a virtual link.
- Most of RON’s design supports routing through multiple intermediate nodes, but their results (Section 6) show that using at most one intermediate RON node is sufficient most of the time.
- Therefore, parts of their design focus on finding better paths via a single intermediate RON node.
4.1 Software Architecture
- Each program that communicates with the RON software on a node is a RON client.
- This group of clients can use service-specific routing metrics when deciding how to forward packets in the group.
- A RON client interacts with RON across an API called a conduit, which the client uses to send and receive packets.
- recv(pkt, via_ron) is a callback function that is called when a packet arrives for the client program.
- RON routers and membership managers exchange packets using RON as their forwarding service, rather than over direct IP paths.
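The conduit interaction described above can be sketched in a few lines. This is a minimal in-memory illustration, not the authors' API: only the `recv(pkt, via_ron)` callback comes from the text, while the `Packet` fields, the `Conduit` class, `register`, and the `send` signature are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Packet:
    src: str
    dst: str
    payload: bytes


# A client supplies a callback with the shape recv(pkt, via_ron).
RecvCallback = Callable[[Packet, bool], None]


class Conduit:
    """Hypothetical in-memory stand-in for the conduit between client and RON."""

    def __init__(self) -> None:
        self._clients: Dict[str, RecvCallback] = {}

    def register(self, node: str, recv: RecvCallback) -> None:
        # Remember which callback to invoke for packets addressed to `node`.
        self._clients[node] = recv

    def send(self, pkt: Packet, via_ron: bool = True) -> bool:
        # Deliver the packet by calling the destination client's callback.
        recv = self._clients.get(pkt.dst)
        if recv is None:
            return False
        recv(pkt, via_ron)
        return True
```

A client would register its callback once and then call `send` for each outgoing packet; the boolean result here only reports whether a destination client was known.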
4.2 Routing and Path Selection
- Routing is the process of building up the forwarding tables that are used to choose paths for packets.
- Tagging, like the IPv6 flow ID, helps support multi-hop routing by speeding up the forwarding path at intermediate nodes.
- The small size of a RON relative to the Internet allows it to maintain information about multiple alternate routes and to select the path that best suits the RON client according to a client-specified routing metric.
- By default, it maintains information about three specific metrics for each virtual link: (i) latency, (ii) packet loss rate, and (iii) throughput, as might be obtained by a bulk-transfer TCP connection between the end-points of the virtual link.
- The router builds up forwarding tables for each combination of policy routing and chosen routing metric.
4.2.1 Link-State Dissemination
- The default RON router uses a link-state routing protocol to disseminate topology information between routers, which in turn is used to build the forwarding tables.
- Each node in an N-node RON has N-1 virtual links.
- Each node’s router periodically requests, from its local performance database, summary information about the performance metrics of its links to the other nodes, and disseminates its view to the others.
- This information is sent via the RON forwarding mesh itself, to ensure that routing information is propagated in the event of path outages and heavy loss periods.
- Thus, the RON routing protocol is itself a RON client, with a well-defined RON packet type.
4.2.2 Path Evaluation and Selection
- These numbers are relative, and are only compared to other numbers from the same evaluator.
- Throughput optimization combines the latency and loss metrics using a simplified version of the TCP throughput equation [16], which provides an upper-bound on TCP throughput.
- The authors’ granularity of loss rate detection is 1%, and the throughput equation is more sensitive at lower loss rates.
- While each of the evaluation metrics applies some smoothing, this is not enough to avoid “flapping” between two nearly equal routes: RON routers therefore employ hysteresis.
- Measurement data is often noisy, and different clients may have different ways in which they will use that data; for instance, an outage detector may want to know if any packets were successfully sent in the last 15 seconds, but a throughput improver may be interested in a longer-term packet loss average.
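The path evaluation ideas above (combining latency and loss into a throughput score, and using hysteresis to avoid flapping) can be sketched as follows. This is an illustration under stated assumptions: the score uses the common simplified TCP throughput form sqrt(1.5) / (rtt * sqrt(p)), which may differ from the exact expression the authors used, and the 2% loss floor and 5% hysteresis margin are illustrative values, not figures from the text.

```python
import math


def combine_loss(p1: float, p2: float) -> float:
    """Loss rate of a two-hop path, assuming independent losses on each hop."""
    return 1.0 - (1.0 - p1) * (1.0 - p2)


def throughput_score(rtt_s: float, loss: float, min_loss: float = 0.02) -> float:
    """Relative throughput score; higher is better.

    Assumed form: sqrt(1.5) / (rtt * sqrt(p)). The loss floor (an assumption)
    keeps the score from exploding when the measured loss is near zero.
    """
    p = max(loss, min_loss)
    return math.sqrt(1.5) / (rtt_s * math.sqrt(p))


def choose_path(current: str, scores: dict, hysteresis: float = 0.05) -> str:
    """Switch paths only when the best score beats the current one by a margin."""
    best = max(scores, key=scores.get)
    if current not in scores:
        return best
    if scores[best] > scores[current] * (1.0 + hysteresis):
        return best
    return current
```

Scores are only meaningful relative to other scores from the same evaluator, which is why `choose_path` compares ratios rather than absolute values.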
4.3 Policy Routing
- RON allows users or administrators to define the types of traffic allowed on particular network links.
- RON separates policy routing into two components: classification and routing table formation.
- When a packet enters the RON, it is classified and given a policy tag; this tag is used to perform lookups in the appropriate set of routing tables at each RON router.
- The policy classifier produces the permits function that determines if a given policy is allowed to use a particular virtual link.
- The authors have designed two policy mechanisms: exclusive cliques and general policies.
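The split between classification and a permits function can be illustrated with a small sketch. The semantics here are an assumption made for the example: links between clique members are reserved for traffic whose source and destination are both in the clique. The tag names, `classify`, `make_permits`, and `usable_links` are all hypothetical.

```python
from typing import Callable, FrozenSet, List, Tuple

Link = Tuple[str, str]


def classify(src: str, dst: str, clique: FrozenSet[str]) -> str:
    """Assign a policy tag when a packet enters the overlay."""
    return "clique" if src in clique and dst in clique else "default"


def make_permits(clique: FrozenSet[str]) -> Callable[[str, Link], bool]:
    """Build a permits(tag, link) function for one exclusive clique."""

    def permits(tag: str, link: Link) -> bool:
        a, b = link
        inside = a in clique and b in clique
        # Assumed rule: intra-clique links carry only clique-tagged traffic.
        return tag == "clique" or not inside

    return permits


def usable_links(tag: str, links: List[Link], permits) -> List[Link]:
    """Links a router may consider when building tables for this tag."""
    return [link for link in links if permits(tag, link)]
```

Routing table formation would then run once per policy tag, using only the links that `permits` allows for that tag.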
4.4 Data Forwarding
- If it requires further delivery, the forwarder passes the RON packet header to the routing table, as shown in Figure 5.
- RON also provides a policy tag that is interpreted by the forwarders to decide which network routing policies apply to the packet.
- If the packet’s flow ID has a valid flow cache entry, the forwarder short-cuts the routing process with this entry.
- There is one routing preference table for each known policy tag.
- Next, the lookup procedure examines the routing preference flags to find a compatible route selection metric for the packet.
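The lookup order described above (flow cache first, then the table for the packet's policy tag, then the first compatible metric) might look like the sketch below. The nested-dictionary table layout, the `Forwarder` class, and its method names are assumptions for illustration, and cache invalidation is left out.

```python
from typing import Dict, List, Optional

# tables[policy_tag][metric][destination] -> next hop (hypothetical layout)
Tables = Dict[str, Dict[str, Dict[str, str]]]


class Forwarder:
    def __init__(self, tables: Tables) -> None:
        self.tables = tables
        self.flow_cache: Dict[int, str] = {}

    def lookup(self, flow_id: int, policy_tag: str,
               metric_prefs: List[str], dst: str) -> Optional[str]:
        # 1. A valid flow cache entry short-cuts the whole routing process.
        cached = self.flow_cache.get(flow_id)
        if cached is not None:
            return cached
        # 2. One routing preference table exists per known policy tag.
        by_metric = self.tables.get(policy_tag, {})
        # 3. Use the first preferred metric that has a route to dst.
        for metric in metric_prefs:
            hop = by_metric.get(metric, {}).get(dst)
            if hop is not None:
                self.flow_cache[flow_id] = hop
                return hop
        return None
```

Caching the result under the flow ID is what lets intermediate nodes skip the table walk for later packets of the same flow.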
4.5 Bootstrap and Membership Management
- In addition to allowing clients to define their own membership mechanisms, RON provides two system membership managers: a simple static membership mechanism that loads its peers from a file, and a dynamic announcement-based, soft-state membership protocol.
- The main challenge in the dynamic membership protocol is to avoid confusing a path outage to a node with that node having left the RON.
- Each node builds up and periodically (every five minutes on average in their implementation) floods to all other nodes its list of peer RON nodes.
- The overhead of broadcasting is minuscule compared to the traffic caused by active probing and routing updates, especially given the limited size of a RON.
- This redundancy causes a node to be deleted from another node’s view of the RON only if the former node is genuinely partitioned for over an hour from every other node in the RON.
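The soft-state behaviour described above can be sketched with a table of last-heard timestamps. The one-hour timeout follows the text; the `Membership` class and its methods are hypothetical, and the sketch ignores how a real protocol would keep stale entries from being re-announced indefinitely.

```python
from typing import Dict, Iterable, Set


class Membership:
    """Soft-state peer table: entries expire unless refreshed by announcements."""

    def __init__(self, timeout_s: float = 3600.0) -> None:
        self.timeout_s = timeout_s
        self.last_heard: Dict[str, float] = {}

    def on_announcement(self, sender: str, peers: Iterable[str], now: float) -> None:
        # Hearing about a peer from ANY node refreshes it, so a single path
        # outage to that peer does not remove it from our view.
        for node in {sender, *peers}:
            self.last_heard[node] = now

    def live_peers(self, now: float) -> Set[str]:
        # Drop peers that nobody has mentioned within the timeout.
        self.last_heard = {
            node: t for node, t in self.last_heard.items()
            if now - t <= self.timeout_s
        }
        return set(self.last_heard)
```

Because each flooded list refreshes every peer it mentions, a node disappears from this view only after a full timeout passes with no announcement naming it.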
5. Implementation
- Each client can pick and choose the components that best suit its needs.
- The RON core services run without special kernel support or elevated privileges.
- This classification is client-specific; it labels the packet with information that decides what routing metric is later used to route the packet.
- Non-entry RON nodes route the packet based only on the attached label and destination of the packet.
- The conference node conduits implement the application-specific methods that label packets to tell the IP forwarder which routing metrics to use for conference packets.
5.1 The IP Forwarder
- The authors implemented the resilient IP forwarder using FreeBSD’s divert sockets to automatically send IP traffic over the RON, and emit it at the other end.
- The resilient IP forwarder provides classification, encapsulation, and decapsulation of IP packets through a special conduit called the ip conduit.
5.2 Routers
- RON routers implement the router virtual interface, which has only a single function call, lookup(pkt *mypkt).
- The RON library provides a trivial static router, and a dynamic router that routes based upon different metric optimizations.
- Metric descriptions provide an evaluation function that returns the “score” of a link, and a list of metrics that the routing table needs to generate and propagate.
- The implementation’s routing table creation is specific to single-hop indirection, which also eliminates the need for the flow cache.
- Both provide classifiers for the resilient IP forwarder.
5.3 Monitoring Virtual Links
- Each RON node in an N-node RON monitors its N-1 virtual links using randomized periodic probes.
- The active prober component maintains a copy of a peers table with a next probe time field per peer.
- Each probe packet has a random 64-bit ID.
- When the originating node sees response 1, it sends response 2 back to the peer, so that both sides get reachability and RTT information from 3 packets.
- The probing protocol is implemented as a RON client, which communicates with the performance database (implemented as a standalone application running on the Berkeley DB3 backend) using a simple UDP-based protocol.
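The three-packet exchange described above lets both endpoints measure a round-trip time: the initiator times its probe against response 1, and the peer times response 1 against response 2. The sketch below simulates that bookkeeping with explicit timestamps; the `Prober` class and its method names are hypothetical, and no packets are actually sent.

```python
import secrets
from typing import Dict, List, Optional


class Prober:
    def __init__(self) -> None:
        self.sent_at: Dict[int, float] = {}  # probe ID -> time we last sent
        self.rtts: List[float] = []

    def start_probe(self, now: float) -> int:
        """Initiator: send packet 1 carrying a random 64-bit ID."""
        probe_id = secrets.randbits(64)
        self.sent_at[probe_id] = now
        return probe_id

    def on_probe(self, probe_id: int, now: float) -> int:
        """Peer: packet 1 arrived; send response 1 and remember when."""
        self.sent_at[probe_id] = now
        return probe_id

    def on_response(self, probe_id: int, now: float) -> Optional[float]:
        """Either side: a reply to something we sent arrived; record the RTT."""
        sent = self.sent_at.pop(probe_id, None)
        if sent is None:
            return None  # unknown or already-answered probe ID
        rtt = now - sent
        self.rtts.append(rtt)
        return rtt
```

In use, the initiator calls `start_probe`, the peer calls `on_probe` and replies, the initiator calls `on_response` and sends response 2, and the peer calls `on_response` when that arrives, so each side ends up with one RTT sample.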
6. Evaluation
- The goal of the RON system is to overcome path outages and performance failures, without introducing excessive overhead or new failure modes.
- The authors present an evaluation of how well RON meets these goals.
- First, the authors study RON’s ability to detect outages and recover quickly from them.
- Next, the authors investigate performance failures and RON’s ability to improve the loss rate, latency, and throughput of badly performing paths.
- Finally, the authors investigate two important aspects of RON’s routing, showing the effectiveness of its one-intermediate-hop strategy compared to more general alternatives and the stability of RON-generated routes.
6.1 Methodology
- Most of their results come from experiments with a wide-area RON deployed at several Internet sites.
- To demonstrate that their policy routing module works, and to make their measurements closer to what Internet hosts in general would observe, all the measurements reported herein were taken with a policy that prohibited sending traffic to or from commercial sites over the Internet2.
- The raw measurement data used in this paper consists of probe packets, throughput samples, and traceroute results.
- To probe, each RON node independently repeated a sequence of steps, starting by picking a random node.
6.2 Overcoming Path Outages
- Precisely measuring a path outage is harder than one might think.
- These statistics are obtained by calculating 13,650 30-minute loss-rate averages of a 51-hour subset of the packet trace, involving 132 different communication paths.
- There were 5 “path-hours” of complete outage (100% loss rate) and 16 hours of TCP-perceived outage (> 30% loss rate); RON routed around all these situations.
- The combination of the Internet2 policy and consideration of only single-hop indirection meant that this RON could not provide connectivity between Cisco-MA and other non-MIT educational institutions.
6.2.1 Overhead and Outage Detection Time
- The implementation of the resilient IP forwarder adds about 220 [...]
- [...] rate was when the direct path to an almost-partitioned site had a 99% loss rate.
- Thus, the average time between two probes is [...] seconds.
- A RON node sends a routing update to every other RON node every [...] seconds on average.
- The time to detect a failed path suggests that passive monitoring of in-use links will improve the single-virtual-link failure recovery case considerably, since the traffic flowing on the virtual link can be treated as “probes.”
- The authors believe that this overhead is reasonable for several classes of applications that require recovery from failures within several seconds.
6.2.2 Handling Packet Floods
- To measure recovery time under controlled conditions and evaluate the effectiveness of RON in routing around a flood-induced outage, the authors conducted tests on the Utah Network Emulation Testbed, which has Intel PIII/600MHz machines on a quiescent 100Mbps switched Ethernet with Intel Etherexpress Pro/100 interfaces.
- Indirect routing was possible through the third node, but the latencies made it less preferable than the direct path.
- The rightmost trace (the horizontal dots) shows the non-RON TCP connection during the flooding attack.
- These “overhead” packets are necessary for reliability and congestion control; similarly, RON’s active probes may be viewed as “overhead” that help achieve rapid recovery from failures.
- RON still routed the returning ACK traffic along the flooded link; if BGP had declared the link “dead,” it would have eliminated a perfectly usable link.
6.3.1 Loss Rate
- RON improved the loss rate by more than 0.05 a little more than 5% of the time.
- Upon closer analysis, the authors found that the outage detection component of RON routing was instrumental in detecting bad situations promptly and in triggering a new path.
- The authors found that the path between MIT and CCI had a highly asymmetric loss rate, which led to significant improvements due to RON on the MIT → CCI path, but also infrequent occurrences when the loss rate on the CCI → MIT path was made worse by RON.
6.3.2 Latency
- Figure 13 shows the CDF of 39,683 five-minute-averaged round-trip latency samples, collected across the 132 communicating paths.
- The points in the scatterplot appear in clustered bands, showing the improvements achieved on different node pairs at different times.
6.3.3 TCP Throughput
- RON also improves TCP throughput between communicating nodes in many cases.
- RON’s throughput-optimizing router does not attempt to detect or change routes to obtain small changes in throughput, since underlying Internet throughput is not particularly stable on most paths; rather, it seeks to obtain at least a 50% improvement in throughput on a RON path.
- To compare a throughput-optimized RON path to the direct Internet path, the authors repeatedly took four sequential throughput samples (two with RON and two without) on all 132 paths, and compared the ratio of the average throughput achieved by RON to the average throughput achieved directly over the Internet.
- Figure 15 shows the distribution of these ratios.
- Out of 2,035 paired quartets of throughput samples, only 1% received less than 50% of the direct-path throughput with RON, while 5% of the samples doubled their throughput.
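The quartet comparison above reduces to a ratio of two averages, followed by counting how many ratios fall below 0.5 or reach 2.0. A minimal sketch, with hypothetical function names and no data from the paper:

```python
from typing import Dict, List, Sequence


def quartet_ratio(ron_samples: Sequence[float],
                  direct_samples: Sequence[float]) -> float:
    """Average RON throughput divided by average direct-path throughput."""
    ron_avg = sum(ron_samples) / len(ron_samples)
    direct_avg = sum(direct_samples) / len(direct_samples)
    return ron_avg / direct_avg


def summarize(ratios: List[float]) -> Dict[str, float]:
    """Fraction of quartets where RON got under half, or at least double."""
    n = len(ratios)
    return {
        "below_half": sum(r < 0.5 for r in ratios) / n,
        "doubled": sum(r >= 2.0 for r in ratios) / n,
    }
```

A ratio above 1.0 means the RON path outperformed the direct path for that quartet; the two fractions correspond to the tails of the distribution in Figure 15.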
6.4 RON Routing Behavior
- The authors instrumented a RON node to output its link-state routing table every 14 seconds on average, with a random jitter to avoid periodic effects.
- The authors analyzed a 16-hour time-series trace containing 5,616 individual snapshots of the table, corresponding to 876,096 different pairwise routes.
6.4.1 RON Path Lengths
- The authors’ outage results show that RON’s single-hop indirection worked well for avoiding problematic paths.
- As noted in Section 6.2, policy routing could make certain links unusable and the consideration of longer paths will provide better recovery.
- The direct Internet path provided the best average latency about [...] of the time.
- The following simple (and idealized) model provides an explanation.
- The authors show that even small values of these probabilities, under independence assumptions (which are justifiable if RON nodes are in different AS’s), lead to at most one intermediate hop providing lowest-latency paths most of the time.
6.4.2 RON Route Stability
- As a dynamic, measurement-based routing system, RON creates the potential for instability or route flapping.
- The authors simulated RON’s path selection algorithms on the link-state trace to investigate route stability.
- The “Changes” column of Table 5 shows the number of path changes that occurred as a function of the hysteresis before triggering a path change.
- The other columns show the persistence of each RON route, obtained by calculating the number of consecutive samples over which the route remained unchanged.
- The average time between samples was 14 seconds.
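The two quantities described above, the number of path changes and the persistence of each route in consecutive samples, can be computed from a time series of chosen routes. A minimal sketch with hypothetical function names; the 14-second default matches the average sample spacing stated above.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def route_persistence(routes: Sequence[str]) -> Tuple[int, List[int]]:
    """Return (number of path changes, run lengths in consecutive samples)."""
    runs = [len(list(group)) for _, group in groupby(routes)]
    changes = max(len(runs) - 1, 0)
    return changes, runs


def persistence_seconds(runs: Sequence[int],
                        sample_interval_s: float = 14.0) -> List[float]:
    """Convert run lengths to approximate durations."""
    return [run * sample_interval_s for run in runs]
```

Running this over the routes selected at different hysteresis settings would give one row per setting, in the spirit of Table 5.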
6.4.3 Application-specific Path Selection
- The authors did not analyze the effects of the no-Internet2 policy here, and only considered latency optimization without outage avoidance.
- There are situations where RON’s latency-, loss-, and throughput-optimizing routers pick different paths.
- In contrast, between MIT and Cisco-MA, RON’s latency optimizer made the latency worse because the outage detector was triggered frequently.
- The existence of these trade-offs (although the authors do not know how frequently they occur in the global Internet) and the lack of a single, obvious, optimal path reinforce their belief that a flexible, application-informed routing system can benefit applications.
7. Discussion
- This section discusses three common criticisms of RONs relating to routing policy, scalability, and operation across Network Address Translators (NATs) and firewalls.
- RON creates the possibility of misuse or violation of AUPs and BGP transit policies.
- A corporation with tens of sites around the world could greatly improve the reachability between its sites by using a RON-based VPN, without using expensive dedicated links.
- If RONs become popular, the authors do not expect this fundamental trade-off to change, but they expect to see many RONs co-existing and competing on Internet paths.
- The second problem posed by NATs is that if two hosts are both behind NATs, they may not be able to communicate directly.
8. Conclusion
- This paper showed that a Resilient Overlay Network (RON) can greatly improve the reliability of Internet packet delivery by detecting and recovering from outages and path failures more quickly than current inter-domain routing protocols.
- A RON works by deploying nodes in different Internet routing domains, which cooperatively route packets for each other.
- The authors found that RON was able to overcome 100% and 60% of the several hundred significant observed outages in its two measurement datasets, respectively.
- The authors’ implementation takes 18 seconds, on average, to detect and recover from a fault, significantly better than the several minutes taken by BGP-4.
- RONs also overcome performance failures, substantially improving the loss rate, latency, and TCP throughput of badly performing Internet paths.
Frequently Asked Questions (2)
Q2. What future work does "Resilient overlay networks" identify?
Understanding the interactions between co-existing RONs and investigating routing stability in an Internet with many RONs is an area for future work.