What are the contributions mentioned in the paper "Resilient overlay networks" ?

The authors found that forwarding packets via at most one intermediate RON node is sufficient to overcome faults and improve performance in most cases.

What have the authors stated for future works in "Resilient overlay networks" ?

Understanding the interactions between them and investigating routing stability in an Internet with many RONs is an area for future work.

How much hysteresis is used to predict the last good route?

Based on an analysis of 5000 snapshots from a RON node’s link-state table, the authors chose to apply a simple 5% hysteresis bonus to the “last good” route for the three metrics.

Why do the authors think RONs are well-suited to providing fine-grained policy routing?

Because RONs will typically run on relatively powerful end-points, the authors believe they are well-suited to providing fine-grained policy routing.

What is the policy tag used to demultiplex the packet?

If the packet is destined for the local node, the forwarder uses the packet type field to demultiplex the packet to the RON client.

How long does it take to detect a failed path?

The time to detect a failed path suggests that passive monitoring of in-use links will improve the single-virtual-link failure recovery case considerably, since the traffic flowing on the virtual link can be treated as “probes.”

How many seconds is the next probe packet sent?

Their implementation uses the following values: 12 seconds, 3 seconds, and 14 seconds. When a probe loss occurs, the next probe packet is sent immediately, up to a maximum of 3 more “quick” probes.

What is the maximum error in the one-way loss rate estimate?

Assuming that losses are independent on the two paths and , then the maximum absolute error in the one-way loss rate estimate occurs when all of the loss is in only one direction, resulting in an error equal to# .

How many intermediate nodes were involved in the shortest path?

In addition, the remaining @ of the time when RON’s overlay routing was involved, the shortest path involved only one intermediate node essentially all the time: about 98%.

How long does it take to detect and recover from a fault?

Their implementation takes 18 seconds, on average, to detect and recover from a fault, significantly better than the several minutes taken by BGP-4.

How many times did the authors take samples from a MByte bulk transfer?

The authors also took 8,855 throughput samples from 1 MByte bulk transfers (or 30 seconds, whichever came sooner), recording the time at which each power of two’s worth of data was sent and the duration of the transfer.

What is the construction of a shortest-paths algorithm?

This construction applies to multi-hop or single-hop indirection; because a standard shortest-paths algorithm may not work for all metrics (specifically, TCP throughput), their implementation, described in Section 5.2, is specific to single-hop indirection.

(Open Access) Resilient overlay networks (2002) | David G. Andersen

Q: What is the importance of avoiding bad paths?

From the standpoint of improving the reliability of path selection in the face of performance failures, avoiding bad paths is more important than optimizing to eliminate small throughput differences between paths.

Resilient Overlay Networks

David Andersen, Hari Balakrishnan, Frans Kaashoek, and Robert Morris

MIT Laboratory for Computer Science

ron@nms.lcs.mit.edu

http://nms.lcs.mit.edu/ron/

Abstract

A Resilient Overlay Network (RON) is an architecture that allows

distributed Internet applications to detect and recover from path

outages and periods of degraded performance within several sec-

onds, improving over today’s wide-area routing protocols that take

at least several minutes to recover. A RON is an application-layer

overlay on top of the existing Internet routing substrate. The RON

nodes monitor the functioning and quality of the Internet paths

among themselves, and use this information to decide whether to

route packets directly over the Internet or by way of other RON

nodes, optimizing application-speciﬁc routing metrics.

Results from two sets of measurements of a working RON de-

ployed at sites scattered across the Internet demonstrate the beneﬁts

of our architecture. For instance, over a 64-hour sampling period in

March 2001 across a twelve-node RON, there were 32 signiﬁcant

outages, each lasting over thirty minutes, over the 132 measured

paths. RON’s routing mechanism was able to detect, recover, and

route around all of them, in less than twenty seconds on average,

showing that its methods for fault detection and recovery work well

at discovering alternate paths in the Internet. Furthermore, RON

was able to improve the loss rate, latency, or throughput perceived

by data transfers; for example, about 5% of the transfers doubled

their TCP throughput and 5% of our transfers saw their loss prob-

ability reduced by 0.05. We found that forwarding packets via at

most one intermediate RON node is sufﬁcient to overcome faults

and improve performance in most cases. These improvements, par-

ticularly in the area of fault detection and recovery, demonstrate the

beneﬁts of moving some of the control over routing into the hands

of end-systems.

1. Introduction

The Internet is organized as independently operating autonomous systems (AS’s) that peer together. In this architecture, detailed routing information is maintained only within a single AS and its constituent networks, usually operated by some network service provider. The information shared with other providers and AS’s is heavily filtered and summarized using the Border Gateway Protocol (BGP-4) running at the border routers between AS’s [21], which allows the Internet to scale to millions of networks.

This research was sponsored by the Defense Advanced Research Projects Agency (DARPA) and the Space and Naval Warfare Systems Center, San Diego, under contract N66001-00-1-8933.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

18th ACM Symp. on Operating Systems Principles (SOSP) October 2001, Banff, Canada.

This wide-area routing scalability comes at the cost of re-

duced fault-tolerance of end-to-end communication between Inter-

net hosts. This cost arises because BGP hides many topological

details in the interests of scalability and policy enforcement, has

little information about trafﬁc conditions, and damps routing up-

dates when potential problems arise to prevent large-scale oscil-

lations. As a result, BGP’s fault recovery mechanisms sometimes

take many minutes before routes converge to a consistent form [12],

and there are times when path outages even lead to signiﬁcant dis-

ruptions in communication lasting tens of minutes or more [3, 18,

19]. The result is that today’s Internet is vulnerable to router and

link faults, conﬁguration errors, and malice—hardly a week goes

by without some serious problem affecting the connectivity pro-

vided by one or more Internet Service Providers (ISPs) [15].

Resilient Overlay Networks (RONs) are a remedy for some of

these problems. Distributed applications layer a “resilient overlay

network” over the underlying Internet routing substrate. The nodes

comprising a RON reside in a variety of routing domains, and co-

operate with each other to forward data on behalf of any pair of

communicating nodes in the RON. Because AS’s are independently

administrated and conﬁgured, and routing domains rarely share in-

terior links, they generally fail independently of each other. As

a result, if the underlying topology has physical path redundancy,

RON can often find paths between its nodes, even when wide-area

Internet routing protocols like BGP-4 cannot.

The main goal of RON is to enable a group of nodes to commu-

nicate with each other in the face of problems with the underlying

Internet paths connecting them. RON detects problems by aggres-

sively probing and monitoring the paths connecting its nodes. If

the underlying Internet path is the best one, that path is used and no

other RON node is involved in the forwarding path. If the Internet

path is not the best one, the RON will forward the packet by way of

other RON nodes. In practice, we have found that RON can route

around most failures by using only one intermediate hop.

RON nodes exchange information about the quality of the paths

among themselves via a routing protocol and build forwarding ta-

bles based on a variety of path metrics, including latency, packet

loss rate, and available throughput. Each RON node obtains the

path metrics using a combination of active probing experiments

and passive observations of on-going data transfers. In our implementation, each RON is explicitly designed to be limited in size (between two and fifty nodes) to facilitate aggressive path maintenance via probing without excessive bandwidth overhead. This allows RON to recover from problems in the underlying Internet in several seconds rather than several minutes.

Figure 1: The current sixteen-node RON deployment. Five sites are at universities in the USA, two are European universities (not shown), three are “broadband” home Internet hosts connected by Cable or DSL, one is located at a US ISP, and five are at corporations in the USA. (Sites labeled on the map: CCI, Aros, Utah, CMU, MIT, MA-Cable, Cisco, Cornell, NYU, NC-Cable, OR-DSL, CA-T1, PDI, and Mazu; the European sites vu.nl and Lulea.se lie off the map.)

The second goal of RON is to integrate routing and path selec-

tion with distributed applications more tightly than is traditionally

done. This integration includes the ability to consult application-

speciﬁc metrics in selecting paths, and the ability to incorporate

application-speciﬁc notions of what network conditions constitute a

“fault.” As a result, RONs can be used in a variety of ways. A mul-

timedia conferencing program may link directly against the RON

library, transparently forming an overlay between all participants

in the conference, and using loss rates, delay jitter, or application-

observed throughput as metrics on which to choose paths. An ad-

ministrator may wish to use a RON-based router application to

form an overlay network between multiple LANs as an “Overlay

VPN.” This idea can be extended further to develop an “Overlay

ISP,” formed by linking (via RON) points of presence in different

traditional ISPs after buying bandwidth from them. Using RON’s

routing machinery, an Overlay ISP can provide more resilient and

failure-resistant Internet service to its customers.

The third goal of RON is to provide a framework for the imple-

mentation of expressive routing policies, which govern the choice

of paths in the network. For example, RON facilitates classifying

packets into categories that could implement notions of acceptable

use, or enforce forwarding rate controls.

This paper describes the design and implementation of RON,

and presents several experiments that evaluate whether RON is a

good idea. To conduct this evaluation and demonstrate the ben-

eﬁts of RON, we have deployed a working sixteen-node RON at

sites sprinkled across the Internet (see Figure 1). The RON client

we experiment with is a resilient IP forwarder, which allows us to

compare connections between pairs of nodes running over a RON

against running straight over the Internet.

We have collected a few weeks’ worth of experimental results of path outages and performance failures and present a detailed analysis of two separate datasets: RON_1 with twelve nodes measured in March 2001 and RON_2 with sixteen nodes measured in May 2001.

In both datasets, we found that RON was able to route around between 60% and 100% of all significant outages. Our implementation takes 18 seconds, on average, to detect and route around a path failure and is able to do so in the face of an active denial-of-service attack on a path. We also found that these benefits of quick fault detection and successful recovery are realized on the public Internet and do not depend on the existence of non-commercial or private networks (such as the Internet2 backbone that interconnects many educational institutions); our ability to determine this was enabled by RON’s policy routing feature that allows the expression and implementation of sophisticated policies that determine how paths are selected for packets.

We also found that RON successfully routed around performance failures: in RON_1, the loss probability improved by at least 0.05

in 5% of the samples, end-to-end communication latency reduced

by 40ms in 11% of the samples, and TCP throughput doubled in

5% of all samples. In addition, we found cases when RON’s loss,

latency, and throughput-optimizing path selection mechanisms all

chose different paths between the same two nodes, suggesting that

application-speciﬁc path selection techniques are likely to be use-

ful in practice. A noteworthy ﬁnding from the experiments and

analysis is that in most cases, forwarding packets via at most one

intermediate RON node is sufﬁcient both for recovering from fail-

ures and for improving communication latency.

2. Related Work

To our knowledge, RON is the ﬁrst wide-area network overlay

system that can detect and recover from path outages and periods of

degraded performance within several seconds. RON builds on pre-

vious studies that quantify end-to-end network reliability and per-

formance, on IP-based routing techniques for fault-tolerance, and

on overlay-based techniques to enhance performance.

2.1 Internet Performance Studies

Labovitz et al. [12] use a combination of measurement and anal-

ysis to show that inter-domain routers in the Internet may take tens

of minutes to reach a consistent view of the network topology after

a fault, primarily because of routing table oscillations during BGP’s

rather complicated path selection process. They ﬁnd that during

this period of “delayed convergence,” end-to-end communication

is adversely affected. In fact, outages on the order of minutes cause

active TCP connections (i.e., connections in the ESTABLISHED

state with outstanding data) to terminate when TCP does not re-

ceive an acknowledgment for its outstanding data. They also ﬁnd

that, while part of the convergence delays can be ﬁxed with changes

to the deployed BGP implementations, long delays and temporary

oscillations are a fundamental consequence of the BGP path vector

routing protocol.

Paxson’s probe experiments show that routing pathologies pre-

vent selected Internet hosts from communicating up to 3.3% of the

time averaged over a long time period, and that this percentage has

not improved with time [18]. Labovitz et al. ﬁnd, by examining

routing table logs at Internet backbones, that 10% of all considered

routes were available less than 95% of the time, and that less than

35% of all routes were available more than 99.99% of the time [13].

Furthermore, they ﬁnd that about 40% of all path outages take more

than 30 minutes to repair and are heavy-tailed in their duration.

More recently, Chandra et al. ﬁnd using active probing that 5%

of all detected failures last more than 10,000 seconds (2 hours, 45

minutes), and that failure durations are heavy-tailed and can last

for as long as 100,000 seconds before being repaired [3]. These

ﬁndings do not augur well for mission-critical services that require

a higher degree of end-to-end communication availability.

The Detour measurement study made the observation, using Pax-

son’s and their own data collected at various times between 1995

and 1999, that path selection in the wide-area Internet is sub-

optimal from the standpoint of end-to-end latency, packet loss rate,

and TCP throughput [23]. This study showed the potential long-

term beneﬁts of “detouring” packets via a third node by comparing

the long-term average properties of detoured paths against Internet-

chosen paths.

2.2 Network-layer Techniques

Much work has been done on performance-based and fault-

tolerant routing within a single routing domain, but practical mech-

anisms for wide-area Internet recovery from outages or badly per-

forming paths are lacking.

Although today’s wide-area BGP-4 routing is based largely on

AS hop-counts, early ARPANET routing was more dynamic, re-

sponding to the current delay and utilization of the network. By

1989, the ARPANET evolved to using a delay- and congestion-

based distributed shortest path routing algorithm [11]. However,

the diversity and size of today’s decentralized Internet necessitated

the deployment of protocols that perform more aggregation and

fewer updates. As a result, unlike some interior routing protocols

within AS’s, BGP-4 routing between AS’s optimizes for scalable

operation over all else.

By treating vast collections of subnetworks as a single entity for

global routing purposes, BGP-4 is able to summarize and aggregate

enormous amounts of routing information into a format that scales

to hundreds of millions of hosts. To prevent costly route oscilla-

tions, BGP-4 explicitly damps changes in routes. Unfortunately,

while aggregation and damping provide good scalability, they in-

terfere with rapid detection and recovery when faults occur. RON

handles this by leaving scalable operation to the underlying Inter-

net substrate, moving fault detection and recovery to a higher layer

overlay that is capable of faster response because it does not have

to worry about scalability.

An oft-cited “solution” to achieving fault-tolerant network con-

nectivity for a small- or medium-sized customer is to multi-home,

advertising a customer network through multiple ISPs. The idea

is that an outage in one ISP would leave the customer connected

via the other. However, this solution does not generally achieve

fault detection and recovery within several seconds because of the

degree of aggregation used to achieve wide-area routing scalabil-

ity. To limit the size of their routing tables, many ISPs will not

accept routing announcements for fewer than 8192 contiguous ad-

dresses (a “/19” netblock). Small companies, regardless of their

fault-tolerance needs, do not often require such a large address

block, and cannot effectively multi-home. One alternative may be

“provider-based addressing,” where an organization gets addresses

from multiple providers, but this requires handling two distinct sets

of addresses on its hosts. It is unclear how on-going connections

on one address set can seamlessly switch on a failure in this model.
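The address arithmetic behind the “/19” figure can be checked directly. The following is a minimal sketch using Python's standard `ipaddress` module; the `10.0.0.0` prefixes are arbitrary examples and the /24 allocation for a small company is an assumption made for illustration, not a figure from the paper.

```python
import ipaddress

# A /19 prefix leaves 32 - 19 = 13 host bits, i.e. 2**13 addresses.
block = ipaddress.ip_network("10.0.0.0/19")  # example prefix (assumption)
addresses_in_slash_19 = block.num_addresses

# Hypothetical small-company allocation: a /24 holds far fewer addresses than
# the filtering threshold, so its announcement would not be accepted.
small_block = ipaddress.ip_network("10.0.0.0/24")
would_be_filtered = small_block.num_addresses < addresses_in_slash_19
```

Under these assumptions a /24 falls a factor of 32 short of the threshold, which is why such a customer cannot effectively multi-home by announcing its own block.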

2.3 Overlay-based Techniques

Overlay networks are an old idea; in fact, the Internet itself was

developed as an overlay on the telephone network. Several Inter-

net overlays have been designed in the past for various purposes,

including providing OSI network-layer connectivity [10], easing

IP multicast deployment using the MBone [6], and providing IPv6

connectivity using the 6-Bone [9]. The X-Bone is a recent infras-

tructure project designed to speed the deployment of IP-based over-

lay networks [26]. It provides management functions and mecha-

nisms to insert packets into the overlay, but does not yet support

fault-tolerant operation or application-controlled path selection.

Few overlay networks have been designed for efﬁcient fault de-

tection and recovery, although some have been designed for better

end-to-end performance. The Detour framework [5, 22] was mo-

tivated by the potential long-term performance beneﬁts of indirect

routing [23]. It is an in-kernel packet encapsulation and routing

architecture designed to support alternate-hop routing, with an em-

phasis on high performance packet classiﬁcation and routing. It

uses IP-in-IP encapsulation to send packets along alternate paths.

While RON shares with Detour the idea of routing via other

nodes, our work differs from Detour in three signiﬁcant ways. First,

RON seeks to prevent disruptions in end-to-end communication in

the face of failures. RON takes advantage of underlying Internet

path redundancy on time-scales of a few seconds, reacting respon-

sively to path outages and performance failures. Second, RON is

designed as an application-controlled routing overlay; because each

RON is more closely tied to the application using it, RON more

readily integrates application-speciﬁc path metrics and path selec-

tion policies. Third, we present and analyze experimental results

from a real-world deployment of a RON to demonstrate fast re-

covery from failure and improved latency and loss-rates even over

short time-scales.

An alternative design to RON would be to use a generic overlay

infrastructure like the X-Bone and port a standard network routing

protocol (like OSPF or RIP) with low timer values. However, this

by itself will not improve the resilience of Internet communications

for two reasons. First, a reliable and low-overhead outage detection

module is required, to distinguish between packet losses caused by

congestion or error-prone links from legitimate problems with a

path. Second, generic network-level routing protocols do not utilize

application-speciﬁc deﬁnitions of faults.

Various Content Delivery Networks (CDNs) use overlay tech-

niques and caching to improve the performance of content delivery

for speciﬁc applications such as HTTP and streaming video. The

functionality provided by RON may ease future CDN development

by providing some routing components required by these services.

3. Design Goals

The design of RON seeks to meet three main design goals: (i)

failure detection and recovery in less than 20 seconds; (ii) tighter

integration of routing and path selection with the application; and

(iii) expressive policy routing.

3.1 Fast Failure Detection and Recovery

Today’s wide-area Internet routing system based on BGP-4 does

not handle failures well. From a network perspective, we deﬁne

two kinds of failures. Link failures occur when a router or a link

connecting two routers fails because of a software error, hardware

problem, or link disconnection. Path failures occur for a variety of

reasons, including denial-of-service attacks or other bursts of trafﬁc

that cause a high degree of packet loss or high, variable latencies.

Applications perceive all failures in one of two ways: outages or

performance failures. Link failures and extreme path failures cause

outages, when the average packet loss rate over a sustained period

of several minutes is high (about 30% or higher), causing most pro-

tocols including TCP to degrade by several orders of magnitude.

Performance failures are less extreme; for example, throughput, la-

tency, or loss-rates might degrade by a factor of two or three.

BGP-4 takes a long time, on the order of several minutes, to con-

verge to a new valid route after a link failure causes an outage [12].

In contrast, RON’s goal is to detect and recover from outages and

performance failures within several seconds. Compounding this

problem, IP-layer protocols like BGP-4 cannot detect problems

such as packet ﬂoods and persistent congestion on links or paths

that greatly degrade end-to-end performance. As long as a link is

deemed “live” (i.e., the BGP session is still alive), BGP’s AS-path-

based routing will continue to route packets down the faulty path;

unfortunately, such a path may not provide adequate performance

for an application using it.

Figure 2: Internet interconnections are often complex. The dotted links are private and are not announced globally. (Labels in the figure: BBN, Qwest, UUNET, AT&T, MediaOne, ArosNet, Utah, MIT, vBNS / Internet 2, and a cable modem link; annotated link speeds range from 1 Mbps to 155 Mbps.)

3.2 Tighter Integration with Applications

Failures and faults are application-speciﬁc notions: network con-

ditions that are fatal for one application may be acceptable for an-

other, more adaptive one. For instance, a UDP-based Internet audio

application not using good packet-level error correction may not

work at all at loss rates larger than 10%. At this loss rate, a bulk

transfer application using TCP will continue to work because of

TCP’s adaptation mechanisms, albeit at lower performance. How-

ever, at loss rates of 30% or more, TCP becomes essentially un-

usable because it times out for most packets [16]. RON allows

applications to independently deﬁne and react to failures.
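One way to picture application-defined failures is as a per-application predicate over measured path conditions. The sketch below is illustrative only: the `FaultPolicy` class and `faulted_apps` helper are hypothetical names, not part of the RON library, and the thresholds are simply the 10% and 30% loss rates used as examples in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaultPolicy:
    """Hypothetical per-application definition of a path fault."""
    name: str
    max_loss_rate: float  # loss rate at or above which the path counts as failed

    def is_fault(self, observed_loss_rate: float) -> bool:
        # Treating the boundary itself as a fault is an arbitrary choice here.
        return observed_loss_rate >= self.max_loss_rate


# Thresholds taken from the examples in the text.
AUDIO = FaultPolicy("udp-audio", max_loss_rate=0.10)
BULK_TCP = FaultPolicy("bulk-tcp", max_loss_rate=0.30)


def faulted_apps(observed_loss_rate, policies=(AUDIO, BULK_TCP)):
    """Return the names of applications that consider this loss rate a fault."""
    return [p.name for p in policies if p.is_fault(observed_loss_rate)]
```

With a 15% loss rate, only the audio application would declare the path failed, while the bulk transfer would keep using it at reduced performance.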

In addition, applications may prioritize some metrics over oth-

ers (e.g., latency over throughput, or low loss over latency) in their

path selection. They may also construct their own metrics to select

paths. A routing system may not be able to optimize all of these

metrics simultaneously; for example, a path with a one-second la-

tency may appear to be the best throughput path, but this degree

of latency may be unacceptable to an interactive application. Cur-

rently, RON’s goal is to allow applications to inﬂuence the choice

of paths using a single metric. We plan to explore multi-criteria

path selection in the future.

3.3 Expressive Policy Routing

Despite the need for policy routing and enforcement of accept-

able use and other policies, today’s approaches are primitive and

cumbersome. For instance, BGP-4 is incapable of expressing ﬁne-

grained policies aimed at users or hosts. This lack of precision

not only reduces the set of paths available in the case of a failure,

but also inhibits innovation in the use of carefully targeted policies,

such as end-to-end per-user rate controls or enforcement of accept-

able use policies (AUPs) based on packet classiﬁcation. Because

RONs will typically run on relatively powerful end-points, we be-

lieve they are well-suited to providing ﬁne-grained policy routing.

Figure 2 shows the AS-level network connectivity between four

of our RON hosts; the full graph for (only) 12 hosts traverses 36

different autonomous systems. The ﬁgure gives a hint of the con-

siderable underlying path redundancy available in the Internet—the

reason RON works—and shows situations where BGP’s blunt pol-

icy expression inhibits fail-over. For example, if the Aros-UUNET

connection failed, users at Aros would be unable to reach MIT even

if they were authorized to use Utah’s network resources to get there.

This is because it is impossible to announce a BGP route only to particular users, so the Utah-MIT link is kept completely private.

Figure 3: The RON system architecture. Data enters the RON from RON clients via a conduit at an entry node. At each node, the RON forwarder consults with its router to determine the best path for the packet, and sends it to the next node. Path selection is done at the entry node, which also tags the packet, simplifying the forwarding path at other nodes. When the packet reaches the RON exit node, the forwarder there hands it to the appropriate output conduit, which passes the data to the client. To choose paths, RON nodes monitor the quality of their virtual links using active probing and passive observation. RON nodes use a link-state routing protocol to disseminate the topology and virtual-link quality of the overlay network. (The diagram shows three nodes, each with conduits, a forwarder, a router, and probes, together with a performance database and external probes.)

4. Design

The conceptual design of RON, shown in Figure 3, is quite sim-

ple. RON nodes, deployed at various locations on the Internet,

form an application-layer overlay to cooperatively route packets

for each other. Each RON node monitors the quality of the Internet

paths between it and the other nodes, and uses this information to

intelligently select paths for packets. Each Internet path between

two nodes is called a virtual link. To discover the topology of the

overlay network and obtain information about all virtual links in

the topology, every RON node participates in a routing protocol

to exchange information about a variety of quality metrics. Most

of RON’s design supports routing through multiple intermediate

nodes, but our results (Section 6) show that using at most one inter-

mediate RON node is sufﬁcient most of the time. Therefore, parts

of our design focus on ﬁnding better paths via a single intermediate

RON node.
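The idea of searching only the direct path and paths through a single intermediate node can be sketched as follows. This is a minimal illustration assuming an additive latency metric; the function name `best_path`, the nested-dictionary table layout, and the example latencies are assumptions made here, not the paper's implementation.

```python
def best_path(latency, src, dst):
    """Return (path, total_latency) using the direct link or one intermediate.

    `latency[a][b]` is the measured latency of virtual link a -> b in ms;
    a missing entry or float("inf") means the link is currently unusable.
    """
    inf = float("inf")

    def cost(a, b):
        return latency.get(a, {}).get(b, inf)

    # Start from the direct Internet path, then try every single-hop detour.
    best = ([src, dst], cost(src, dst))
    for mid in latency:
        if mid in (src, dst):
            continue
        total = cost(src, mid) + cost(mid, dst)
        if total < best[1]:
            best = ([src, mid, dst], total)
    return best
```

For example, if the direct MIT to Utah link is down (infinite latency) while MIT to Aros and Aros to Utah are both up, the function returns the path through Aros. If no usable path exists, it returns the direct path with infinite cost, which a caller would have to treat as unreachable.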

4.1 Software Architecture

Each program that communicates with the RON software on a

node is a RON client. The overlay network is deﬁned by a sin-

gle group of clients that collaborate to provide a distributed service

or application. This group of clients can use service-speciﬁc rout-

ing metrics when deciding how to forward packets in the group.

Our design accommodates a variety of RON clients, ranging from

a generic IP packet forwarder that improves the reliability of IP

packet delivery, to a multi-party conferencing application that in-

corporates application-speciﬁc metrics in its route selection.

A RON client interacts with RON across an API called a conduit,

which the client uses to send and receive packets. On the data path,

the ﬁrst node that receives a packet (via the conduit) classiﬁes it

to determine the type of path it should use (e.g., low-latency, high-

throughput, etc.). This node is called the entry node: it determines

a path from its topology table, encapsulates the packet into a RON

header, tags it with some information that simpliﬁes forwarding

by downstream RON nodes, and forwards it on. Each subsequent

RON node simply determines the next forwarding hop based on the

destination address and the tag. The ﬁnal RON node that delivers

the packet to the RON application is called the exit node.

The conduits access RON via two functions:

1. send(pkt, dst, via_ron) allows a node to forward a packet to a destination RON node either along the RON or using the direct Internet path. RON’s delivery, like UDP, is best-effort and unreliable.

2. recv(pkt, via_ron) is a callback function that is called when a packet arrives for the client program. This callback is invoked after the RON conduit matches the type of the packet in the RON header to the set of types pre-registered by the client when it joins the RON. The RON packet type is a demultiplexing field for incoming packets.
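The two conduit functions described above can be mimicked with a toy in-memory class. This is a sketch under stated assumptions, not the RON library's API: the `Conduit` class, its `register` and `deliver` methods, and the integer packet types are hypothetical, and the flag is written `via_ron` here. Only the `send`/`recv` shapes and the type-based demultiplexing come from the text.

```python
from typing import Callable, Dict, List, Tuple

RecvCallback = Callable[[bytes, bool], None]  # shape of recv(pkt, via_ron)


class Conduit:
    """Toy conduit: registers packet types and demultiplexes arrivals."""

    def __init__(self) -> None:
        self._handlers: Dict[int, RecvCallback] = {}
        # Stands in for the network; a real conduit would hand off to a forwarder.
        self.sent: List[Tuple[bytes, str, bool]] = []

    def register(self, pkt_type: int, callback: RecvCallback) -> None:
        """Pre-register a packet type, as a client does when joining the RON."""
        self._handlers[pkt_type] = callback

    def send(self, pkt: bytes, dst: str, via_ron: bool) -> None:
        """Best-effort send: record the packet, with no acknowledgement or retry."""
        self.sent.append((pkt, dst, via_ron))

    def deliver(self, pkt_type: int, pkt: bytes, via_ron: bool) -> bool:
        """Match the header's packet type to a registered client callback."""
        handler = self._handlers.get(pkt_type)
        if handler is None:
            return False  # no client registered for this type: drop silently
        handler(pkt, via_ron)
        return True
```

A client would call `register` once per packet type it understands; arrivals with an unregistered type are dropped, consistent with best-effort delivery.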

The basic RON functionality is provided by the forwarder

object, which implements the above functions. It also provides a

timer registration and callback mechanism to perform periodic op-

erations, and a similar service for network socket data availability.

Each client must instantiate a forwarder and hand to it two mod-

ules: a RON router and a RON membership manager. The RON

router implements a routing protocol. The RON membership man-

ager implements a protocol to maintain the list of members of a

RON. By default, RON provides a few different RON router and

membership manager modules for clients to use.

RON routers and membership managers exchange packets using

RON as their forwarding service, rather than over direct IP paths.

This feature of our system is beneﬁcial because it allows these mes-

sages to be forwarded even when some underlying IP paths fail.

4.2 Routing and Path Selection

Routing is the process of building up the forwarding tables that are used to choose paths for packets. In RON, the entry node has more control over subsequent path selection than in traditional datagram networks. This node tags the packet’s RON header with an identifier that identifies the flow to which the packet belongs; subsequent routers attempt to keep a flow ID on the same path it first used, barring significant link changes. Tagging, like the IPv6 flow ID, helps support multi-hop routing by speeding up the forwarding path at intermediate nodes. It also helps tie a packet flow to a chosen path, making performance more predictable, and provides a basis for future support of multi-path routing in RON. By tagging at the entry node, the application is given maximum control over what the network considers a “flow.”
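One way to picture how a flow tag keeps a flow on the path it first used is a small per-node cache keyed by flow ID. The sketch below is only an assumption about how such pinning could work; FlowCache, next_hop, and invalidate are hypothetical names, and the full routing lookup is passed in as a plain function.

```python
from typing import Callable, Dict


class FlowCache:
    """Pins each flow ID to the next hop chosen for its first packet."""

    def __init__(self, lookup: Callable[[str], str]) -> None:
        self._lookup = lookup               # full routing lookup: dst -> next hop
        self._pinned: Dict[int, str] = {}   # flow ID -> pinned next hop

    def next_hop(self, flow_id: int, dst: str) -> str:
        hop = self._pinned.get(flow_id)
        if hop is None:
            hop = self._lookup(dst)         # slow path: first packet of the flow
            self._pinned[flow_id] = hop
        return hop

    def invalidate(self, hop: str) -> None:
        """Unpin every flow using `hop`, e.g. after a significant link change."""
        self._pinned = {f: h for f, h in self._pinned.items() if h != hop}
```

Later packets of a pinned flow skip the routing lookup entirely, which is the speed-up the text attributes to tagging; a new flow to the same destination can still pick up a newer route.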

The small size of a RON relative to the Internet allows it to maintain information about multiple alternate routes and to select the path that best suits the RON client according to a client-specified routing metric. By default, it maintains information about three specific metrics for each virtual link: (i) latency, (ii) packet loss rate, and (iii) throughput, as might be obtained by a bulk-transfer TCP connection between the end-points of the virtual link. RON clients can override these defaults with their own metrics, and the RON library constructs the appropriate forwarding table to pick good paths. The router builds up forwarding tables for each combination of policy routing and chosen routing metric.

4.2.1 Link-State Dissemination

The default RON router uses a link-state routing protocol to disseminate topology information between routers, which in turn is used to build the forwarding tables. Each node in an $N$-node RON has $N-1$ virtual links. Each node’s router periodically requests summary information of the different performance metrics to the $N-1$ other nodes from its local performance database and disseminates its view to the others.

This information is sent via the RON forwarding mesh itself, to ensure that routing information is propagated in the event of path outages and heavy loss periods. Thus, the RON routing protocol is itself a RON client, with a well-defined RON packet type. This leads to an attractive property: The only time a RON router has incomplete information about any other one is when all paths in the RON from the other RON nodes to it are unavailable.

4.2.2 Path Evaluation and Selection

The RON routers need an algorithm to determine if a path is still alive, and a set of algorithms with which to evaluate potential paths. The responsibility of these metric evaluators is to provide a number quantifying how “good” a path is according to that metric. These numbers are relative, and are only compared to other numbers from the same evaluator. The two important aspects of path evaluation are the mechanism by which the data for two links are combined into a single path, and the formula used to evaluate the path.

Every RON router implements outage detection, which it uses to determine if the virtual link between it and another node is still working. It uses an active probing mechanism for this. On detecting the loss of a probe, the normal low-frequency probing is replaced by a sequence of consecutive probes, sent in relatively quick succession spaced by PROBE_TIMEOUT seconds. If OUTAGE_THRESH probes in a row elicit no response, then the path is considered “dead.” If even one of them gets a response, then the subsequent higher-frequency probes are canceled. Paths experiencing outages are rated on their packet loss rate history; a path having an outage will always lose to a path not experiencing an outage. The OUTAGE_THRESH and the frequency of probing (PROBE_INTERVAL) permit a trade-off between outage detection time and the bandwidth consumed by the (low-frequency) probing process (Section 6.2 investigates this).
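The probing behavior described above can be sketched as a small replay function. This is a minimal sketch under stated assumptions: the three constants are placeholders for the normal probing period, the spacing between quick follow-up probes, and the number of consecutive losses that marks a path dead, and their values are illustrative only. Returning to the normal interval once a path is declared dead is also an assumption, and probe_schedule is a hypothetical name.

```python
from typing import Iterable, List, Tuple

PROBE_INTERVAL = 12.0  # seconds between normal probes (illustrative value)
PROBE_TIMEOUT = 3.0    # seconds between quick follow-up probes (illustrative value)
OUTAGE_THRESH = 4      # consecutive losses before the path is "dead" (illustrative value)


def probe_schedule(responses: Iterable[bool]) -> Tuple[List[float], bool]:
    """Replay probe outcomes (True = answered); return each probe's send time
    and whether the path ends up considered dead."""
    times: List[float] = []
    now = 0.0
    lost_in_a_row = 0
    dead = False
    for answered in responses:
        times.append(now)
        if answered:
            lost_in_a_row = 0      # any response cancels the quick probes
            dead = False
        else:
            lost_in_a_row += 1
            if lost_in_a_row >= OUTAGE_THRESH:
                dead = True
        # After a loss, probe again quickly; otherwise wait the normal interval.
        now += PROBE_TIMEOUT if lost_in_a_row and not dead else PROBE_INTERVAL
    return times, dead
```

Lowering either the probing period or the loss threshold shortens the time to declare an outage, at the cost of more probe traffic or more false alarms, which is the trade-off the paragraph describes.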

By default, every RON router implements three different routing metrics: the latency-minimizer, the loss-minimizer, and the TCP throughput-optimizer. The latency-minimizer forwarding table is computed from an exponential weighted moving average (EWMA) of round-trip latency samples with parameter $\alpha$. For any link $l$, its latency estimate $\mathit{lat}_l$ is updated as:

$$\mathit{lat}_l \leftarrow \alpha \cdot \mathit{lat}_l + (1 - \alpha) \cdot \mathit{new\_sample}_l \qquad (1)$$

We use $\alpha = 0.9$, which means that 10% of the current latency estimate is based on the most recent sample. This number is similar to the values suggested for TCP’s round-trip time estimator [20].

For a RON path, the overall latency is the sum of the individual virtual link latencies:

$$\mathit{lat}_{\mathit{path}} = \sum_{l \in \mathit{path}} \mathit{lat}_l$$

To estimate loss rates, RON uses the average of the last $k = 100$ probe samples as the current average. Like Floyd et al. [7], we found this to be a better estimator than EWMA, which retains some memory of samples obtained in the distant past as well. It might be possible to further improve our estimator by unequally weighting some of the $k$ samples [7].

Loss metrics are multiplicative on a path: if we assume that losses are independent, the probability of success on the entire path is roughly equal to the probability of surviving all hops individually:

$$\mathit{loss\_rate}_{\mathit{path}} = 1 - \prod_{l \in \mathit{path}} \left(1 - \mathit{loss\_rate}_l\right)$$

RON does not attempt to find optimal throughput paths, but strives to avoid paths of low throughput when good alternatives are available. Given the time-varying and somewhat unpredictable nature of available bandwidth on Internet paths [2, 19], we believe this is an appropriate goal. From the standpoint of improving the reliability of path selection in the face of performance failures, avoiding bad paths is more important than optimizing to eliminate small throughput differences between paths. While a characterization of the utility received by programs at different available bandwidths may help determine a good path selection threshold, we believe that more than a 50% bandwidth reduction is likely to reduce the utility of many programs. This threshold also falls outside the typical variation observed on a given path over time-scales of tens of min-
