IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 8, NO. 4, AUGUST 2000

Application-Layer Anycasting: A Server Selection Architecture and Use in a Replicated Web Service

Ellen W. Zegura, Member, IEEE, Mostafa H. Ammar, Senior Member, IEEE, Zongming Fei, and Samrat Bhattacharjee
Abstract--Server replication improves the ability of a service to handle a large number of clients. One of the important factors in the efficient utilization of replicated servers is the ability to direct client requests to the "best" server, according to some optimality criteria. In the anycasting communication paradigm, a sender communicates with a receiver chosen from an anycast group of equivalent receivers. As such, anycasting is well suited to the problem of directing clients to replicated servers.
This paper examines the definition and support of the anycasting paradigm at the application layer, providing a service that uses an anycast resolver to map an anycast domain name and a selection criteria into an IP address. By realizing anycasting in the application layer, we achieve flexibility in the optimization criteria and ease the deployment of the service.
As a case study, we examine the performance of our system for a key service: replicated web servers. To this end, we develop an approach for estimating the response time that a client will experience when accessing given servers. Such information is maintained in the anycast resolver that clients query to obtain the identity of the server with the best estimated response time. Our performance collection technique combines server push with resolver probes to estimate the expected response time without undue overhead. Our experiments show that selecting a server using our architecture and estimation technique can improve the client response time by a factor of two over nearest server selection and by a factor of four over random server selection.

Index Terms--Anycasting, replication, server selection.
I. INTRODUCTION

USERS increasingly view the Internet as providing more than simple connectivity, but rather a range of sophisticated and complex services. As this view becomes prevalent, it becomes important to provide explicit support for the efficient delivery of networked services. Such support must be scalable to a large number of geographically widespread users, while maintaining user-perceived quality of service (e.g., response time, throughput, reliability).
Server replication [11] provides scalability by deploying multiple copies of a server and sharing client load across the copies. Server replication is appealing because it offers a relatively straightforward method to potentially improve client performance and reduce network load. A key issue in realizing this potential is the method used for server selection. That is, given a set of servers, how does a client select the "best" server?

Manuscript received October 9, 1998; revised May 30, 1999; approved by IEEE/ACM TRANSACTIONS ON NETWORKING Editor S. McCanne.
E. W. Zegura and M. H. Ammar are with the College of Computing, Georgia Institute of Technology, Atlanta, GA 30332-0280 USA (e-mail: ewz@cc.gatech.edu; ammar@cc.gatech.edu).
Z. Fei was with the College of Computing, Georgia Institute of Technology, Atlanta, GA 30332-0280 USA. He is now with the Department of Computer Science, University of Kentucky, Lexington, KY 40506 USA.
S. Bhattacharjee is with the Department of Computer Science, University of Maryland, College Park, MD 20742 USA.
Publisher Item Identifier S 1063-6692(00)06790-X.
A server selection system has the obvious design goal of
improving client performance. In addition, a server selection
system should satisfy the following goals. First, it should be
flexible in the specification of selection criteria. The "best"
server will vary depending on the service and (potentially)
on the preferences of the clients. A server selection system
should support a rich and flexible set of selection criteria.
Second, it should be suitable for wide-area server replication.
Although servers can be replicated locally in server farms, our
interest is in server selection with global replication across
geographically widespread locations. Local replication is both
easier and more limited in ability to handle request load from
widespread clients. Third, it should be deployable in the current
Internet without modifications to the network infrastructure.
Last, it should be scalable to a large number of services, clients,
and client requests.
A number of services are currently replicated, using both local and global replication. The methods currently used for server selection include:
1) Domain name system (DNS) modifications [18] to return one IP address from a set of servers when the DNS server is queried. The DNS server typically uses a round-robin mechanism to allocate the servers to clients, thus this technique is best suited to local replication of servers with comparable capacity.
2) Network-layer anycasting [24], which associates a common IP anycast address with the group of replicated servers. The routing protocol routes datagrams to the closest server, using the routing distance metric. Standard intradomain unicast routing protocols can accomplish this, assuming each server advertises the common IP address. The limitations of network-layer anycasting include lack of flexibility in the selection criteria--the routing protocol determines the (single) criteria, typically hop count--and difficulty in extending to wide-area selection.
3) Router-assisted server selection, as in Cisco's DistributedDirector product [7], [12]. This product associates a Cisco router with each replicated server to act as the server's agent. Client requests are directed to a central location--the DistributedDirector (DD)--which queries the server agents to determine either hop count or link latency between each server and the client. The DD redirects the client to a server using the query results. This solution is best suited to server selection within a small to moderate-size domain, since it requires significant coordinated deployment of Cisco equipment and relies on routing tables to determine hop counts from a server to a client. For larger domains, scalability is likely to be an issue.
4) Combined caching and server selection systems, such as developed in several recent commercial systems (e.g., Akamai¹, Sandpiper²), which operate their own system of caches containing content from a large number of servers. Client requests are directed to a cache, based on cache content and measurements of network and server load. Relatively little information is available regarding the operation and performance details of such systems. The basic premise differs, however, from our focus on "pure" server selection without deployment of caches.
None of the current solutions meet all of the design criteria outlined above.
Our proposed solution begins with network-layer anycasting. We adopt a general view of anycasting as a communication paradigm that is analogous to the unicast, broadcast, and multicast communication paradigms. In particular, we differentiate between the anycasting service definition and the protocol layer providing the anycasting service. The original anycasting proposal [24] can, therefore, be viewed as providing an anycasting service definition and examining the provision of this service within the IP layer.
We move anycasting to the application layer, allowing us to achieve flexibility in selection criteria, extension to the wide-area, and ease of deployment. For scalability, we retain the best-effort nature of the original network-layer anycasting service definition. This paper describes our application-layer architecture and develops a case study using the architecture for replicated web servers. Our contributions are threefold:
1) We generalize the original definition of anycasting to design an anycasting service that offers considerable advantages in flexibility over the traditional network-layer anycasting service.
2) We develop an application-layer architecture to realize our anycasting service. Our architecture provides for scalability by using replicated resolvers to handle queries from a set of clients and by organizing the resolvers into a DNS-style hierarchy.
3) We examine the performance of our system for client access to replicated web servers. We develop an approach for estimating the client response time that combines server push with resolver probing. This metric is challenging to estimate because the response time is a function of both server load (relative to capacity) and of path load between the server and client. Our experiments show that selecting a server using our architecture and estimation technique can improve the client response time by a factor of two over nearest server selection and by a factor of four over random server selection.
The paper is structured as follows. In Section II we define anycasting as a paradigm and identify the components of our application-layer architecture. Section III describes a key aspect of the architecture, specifically maintenance of performance metric information. Sections IV and V consider the use of the system for replicated web access. Our technique for estimating response time is developed in Section IV, while Section V describes a set of performance evaluation experiments. We describe related work in Section VI and conclude in Section VII.

¹Available: http://www.akamai.com/
²Available: http://www.sandpiper.com/
II. APPLICATION-LAYER ANYCASTING: SYSTEM OVERVIEW

The anycast paradigm shares characteristics with both the multicast and unicast paradigms. Similar to multicast, the anycast paradigm consists of groups of destinations, with the semantics that each destination in a given anycast group is equivalent in some sense. Similar to unicast, a sender that communicates with an anycast group typically interacts with one destination, chosen from the anycast group. This section describes our anycasting service and the architecture for providing the service at the application layer. We conclude the section with an assessment of how well the architecture meets the design goals outlined in the Introduction.
A. Architecture

In our architecture, we define an anycast group to be a (potentially dynamic) set of unicast or multicast IP addresses. Such a definition allows considerable flexibility in the types of services that our selection method supports. We see two particularly useful consequences of this definition. First, a set of servers may be grouped together based on equivalence from a user's perspective. That is, "exact" replication is not required for membership in the same group. A user might define an anycast group to contain, for example, the web sites for CNN Interactive, Time Magazine, and USA Today. Second, allowing multicast IP addresses means we can support services that require multiple servers to provide a single instance of the service. For example, a client may wish to merge or edit video clips that can be found on different sets of replicated video servers. The desired service is provided by a group of servers, one per video clip.
In our architecture, a client interacts with an anycast group via a query-response protocol illustrated in Fig. 1. The anycast query contains the anycast domain name (ADN), which identifies the group, and the selection criteria to be used in choosing from the group. The anycast response contains the IP address for the selected server. As illustrated in Fig. 1, the architecture centers around the use of a hierarchy of anycast resolvers that perform the ADN to IP address mapping. The resolver receives the anycast query and applies a filter to control the selection. A filter operates on a set of anycast group members and returns a (possibly empty) subset. A second filter may be applied at the client. Filters may be content-independent (e.g., select any member at random), or based on performance metrics or policy information.
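To make the filter mechanism concrete, the following minimal Python sketch (ours, not the paper's implementation) shows a content-independent filter and a performance-based filter operating on an anycast group; the group membership, metric values, ADN string, and function names are illustrative assumptions.

import random

# Example anycast group: a (potentially dynamic) set of member IP addresses.
group = ["10.0.1.5", "10.0.2.7", "10.0.3.9"]

# Metric database the resolver keeps per member (estimated response time, seconds).
metrics = {"10.0.1.5": 0.42, "10.0.2.7": 0.18, "10.0.3.9": 0.65}

def random_filter(members):
    # Content-independent filter: pick any member at random.
    return [random.choice(members)] if members else []

def best_response_time_filter(members):
    # Performance-based filter: members with known metrics, best first.
    known = [m for m in members if m in metrics]
    return sorted(known, key=lambda m: metrics[m])

def resolve(adn, server_filter):
    # Map an anycast domain name to a (possibly empty) subset of its group.
    members = group  # a real resolver would look this up (or cache it) per ADN
    return server_filter(members)

# A client-side filter could further narrow the returned subset.
print(resolve("Daily-News%cc.gatech.edu", best_response_time_filter))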
Fig. 1. Anycast name resolution query/response cycle. [Figure: a client issues an anycast query (anycast domain name plus filter specification) through the anycast service interface to an anycast resolver, which returns an anycast response containing the selected IP address.]

To do the mapping, the resolvers maintain two types of information: 1) the list of IP addresses that form particular anycast groups; and 2) a metric database of information associated with each member of the anycast group. As described further below, authoritative resolvers maintain the definitive list of IP addresses for a group, whereas local resolvers cache this information. A membership protocol updates the anycast group information, and a service creation protocol defines new anycast groups. We do not discuss the details of such protocols here; some effort in this area has been undertaken in the IETF [31]. Many of the metrics are locally significant, thus they are maintained independently at each anycast resolver that has the ADN group membership information cached. The authoritative resolver may provide its locally maintained metric information as a "hint" whenever it receives a request from another resolver for the anycast group member list for a given ADN.
The structure of ADNs influences the operation of the anycasting system in general, and the anycast resolver architecture in particular. We use a DNS-style naming and directory service architecture for scalability and ease of integration into the existing Internet infrastructure. While the anycast resolver is logically distinct from other name servers like DNS [21], the functions of an anycast resolver could be integrated with the operation of DNS. In our scheme, an ADN is of the form <Service>%<DomainName>. Such a name will typically be used as an argument to a library call that invokes the anycasting service and results in the mapping of this ADN to an IP address. The DomainName part of the ADN indicates the location of the authoritative anycast resolver for this ADN. The Service part of the ADN identifies the service within the authoritative resolver.
The architecture for handling anycast requests is shown in Fig. 2. Each network location is preconfigured with the address of its local anycast resolver in the same way local DNS servers are configured. An anycast client makes its initial anycast query to its local resolver. If the resolver is authoritative for the ADN in the query or if it has cached information about the ADN, it can process the query immediately and return the appropriate response. Otherwise, the local resolver determines the address of the authoritative resolver for the DomainName part of the ADN and obtains the anycast group information from this resolver. Determining the address of the authoritative anycast resolver for a particular domain can be done using techniques similar to DNS authoritative name determination [21].
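As a purely illustrative sketch of this query path, the Python fragment below parses an ADN of the form <Service>%<DomainName>, answers from the local cache when possible, and otherwise fetches the group from the authoritative resolver; the function names, cache structure, and selection rule are our assumptions, not the paper's code.

local_cache = {}  # ADN -> (members, metrics), as cached by the local resolver

def parse_adn(adn):
    # <Service>%<DomainName>, e.g. "Daily-News%cc.gatech.edu"
    service, domain_name = adn.split("%", 1)
    return service, domain_name

def query_authoritative(adn):
    # Placeholder: contact the authoritative resolver for the DomainName part
    # of the ADN (found much as DNS locates authoritative name servers) and
    # return its member list together with any metric "hints".
    raise NotImplementedError

def handle_anycast_query(adn):
    service, domain_name = parse_adn(adn)
    if adn not in local_cache:
        members, metric_hints = query_authoritative(adn)
        local_cache[adn] = (members, dict(metric_hints))
        # A real resolver would also initiate its own metric collection here.
    members, metrics = local_cache[adn]
    # Apply a simple performance filter: best estimated response time first.
    return min(members, key=lambda m: metrics.get(m, float("inf")))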
Fig. 2. Anycast request-handling architecture. [Figure: (1) the anycast client sends an anycast request to its local anycast resolver; (2) the local resolver determines the authoritative resolver for ADN X; (3) it requests the members and metrics for ADN X; (4) the authoritative resolver returns the list of ADN X members and metrics; (5) the local resolver caches the members and metrics and initiates metric collection; (6) an anycast response is returned to the client.]
B. Design Goals Revisited

With respect to the design goals presented in the Introduction, the proposed architecture clearly meets the first three goals. The user specifies the selection criteria by way of the filters, thus supporting flexibility in selection. The resolvers maintain lists of servers and explicitly track metrics associated with each server. These metrics may include both path and server-load characteristics, as is necessary for wide-area server selection. The architecture does not rely upon changes to the network infrastructure, thus it is deployable in the current Internet. As we will see in the next section, modest changes to the servers can facilitate metric collection.
Whether the architecture meets the scalability design goal is less clear. The architecture attempts to achieve scalability in three ways. First, the service is best-effort, thus explicitly allowing techniques that improve scalability at some sacrifice in optimal performance. For example, a given resolver might only track the performance at a subset of servers that are deemed to be most promising, based on some (longer time-scale) mechanism. We have not, however, fully explored the performance trade-offs associated with such scalability techniques. Second, we use DNS-style replication and hierarchy in the resolvers, thus reducing the load on any one resolver. Third, we have developed a relatively efficient mechanism to track server response time, using a combination of light-weight server pushes and less frequent, heavier-weight probes. Various methods for metric maintenance are discussed next. The hybrid push-probe mechanism is discussed in detail in Section IV, and the performance is evaluated in Section V.
III. MAINTENANCE OF METRIC INFORMATION IN RESOLVERS

The methods used by the resolver to maintain selection metrics are key to the performance of the architecture. Metrics fall into three general categories: those that depend only on server characteristics, those that depend on characteristics of the server-client path, and those that depend on both server and path. A variety of techniques will be used to maintain metric information, depending on factors such as the category, the accuracy required, and the cost of burdening the network and/or the server. Examples of maintenance techniques include:
Remote Server Performance Probing:
In this technique, a probing agent makes periodic queries to the servers to estimate the performance that a client would experience. These queries appear to the server to be legitimate client requests, and thus they measure expected client performance. Probing agents would normally be co-located with resolvers but may also be running at other locations. Each probing agent acts as a proxy for real clients within a certain region, thus the farther away a client is (in Internet "distance") from a probing agent, the less useful the probe measurements. This technique measures network path performance and does not require server modification; on the other hand, the load on the network and servers may be significant.
Server Push:
In the server push technique [16], [17], the server monitors its performance and pushes this information to the resolvers when interesting changes occur. For additional scalability, the update information can be network-layer multicast to all resolvers that maintain information about the server. The anycast resolvers can join well-known multicast groups for each server that they are interested in, allowing the servers to disseminate performance information without knowing the identities of the resolvers.
The server can control the network traffic generated by this mechanism by adjusting the monitoring and push schedules. The primary advantages of this technique are scalability and accurate server measurements; the disadvantages are that the servers must be modified and the network path performance is not easily measured. Some properties of the one-way path from the server to each resolver could be measured as part of the multicast push. For example, the hop count from the server to the resolver could be determined via use of the TTL.
Probing for Locally Maintained Server Performance:
A variation on the probing technique allows the probing
agent to obtain server load information. Specifically,
each server can maintain its own locally monitored
performance metrics in a globally readable file. Remote
probing locations can then read the information in the
file (as opposed to attempting to exercise the server)
to obtain the desired information. Since probes merely
read from a locally maintained file, they may represent
less of a burden on the server than the probes that mimic
client requests.
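A minimal sketch of this variation follows (illustrative only; the "name value" line format and the helper names are our assumptions):

# Server side: periodically write locally monitored metrics to a globally
# readable file, instead of being exercised by client-like probes.
def write_metric_file(path, metric_values):
    with open(path, "w") as f:
        for name, value in metric_values.items():
            f.write(f"{name} {value}\n")

# Probing-agent side: fetch and parse that file; fetch_text is any callable
# returning the file contents as a string (e.g., over HTTP or a shared filesystem).
def read_metric_file(fetch_text):
    parsed = {}
    for line in fetch_text().splitlines():
        name, value = line.split()
        parsed[name] = float(value)
    return parsed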
User Experience:
Users currently make server access decisions based in
part on past experience. Collecting information about
past experience offers a coarse method of maintaining
server performance. The primary advantage of this
method is that the information is collected for free; no
additional burden is placed on the server or the network.
The quantity and accuracy of the information can be
increased by sharing of experience among clients. For
example, a gateway into a campus might maintain server
performance information based on the experience of all
clients on the campus. An architecture for collection
and sharing of such information is being developed in
the SPAND project [29].
Table I summarizes the four techniques based on performance and cost dimensions. The first three columns are measures of system overhead. The Net Load column represents the number of messages generated per unit time to obtain the metric data from one server, where P is the number of probing agents, Tp is the period of probing, and Ts is the period of server push. The Server Push messages can be multicast rather than unicast, reducing their burden. The Server Mod column indicates whether the server must be modified to allow the metric to be collected. The Server Load column expresses (relatively) how much additional load is placed on the server by the collection of the metric data. The last two columns are performance measures, indicating whether the method exercises the network path, and (relatively) how accurately the method is able to maintain the metrics that it can evaluate.

TABLE I
COMPARISON OF METRIC COLLECTION TECHNIQUES

                      Net Load   Server Mod   Server Load   Exercises Net Path   Accuracy
Probing               2P/Tp      No           High          Yes                  Moderate
Server Push           1/Ts       Yes          Low           No*                  High
Reading Server Log    2P/Tp      Yes          Moderate      Yes                  High
User Experience       None       No           None          Yes                  Low/Varies

(* Can measure one-way path information)
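As an illustrative calculation (the numbers are not from the paper): with P = 5 probing agents each probing a given server every Tp = 60 s, probing generates 2P/Tp = 10 messages per minute per server, since each probe is a query/response pair; server push with a push period of Ts = 60 s generates roughly one message (or one multicast transmission) per minute per server.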
The appropriate technique to use for maintaining perfor-
mance metric information is highly dependent on the service
details and context. In the next two sections we examine
in detail a technique that is well suited to selection among
replicated web servers.
IV. CASE STUDY: WEB SERVER RESPONSE TIME

We turn our focus to the issue of how our application-layer architecture can be used for selection amongst replicated web servers. In particular, we design and evaluate a performance monitoring system for the estimation of service response time experienced at a client and use the estimates to guide selection of servers within our system.
The response time metric is important because it directly correlates with a user's perception of the quality of service. In addition, it is a very difficult metric to monitor since it depends on server capabilities (e.g., speed and number of processors at the server), current server load (e.g., number of queries currently being served), network path characteristics (e.g., propagation delay on the path), and current path load. Thus, the metric collection technique must measure both server and path performance.
The metric collection technique should meet two basic goals. First, it should be scalable to a large number of servers, anycast groups, and clients. The load placed on any component of the system--servers, network resources, resolvers, clients--in collecting metric data must be kept "reasonable". Second, the metric collection should be relatively accurate. The service provided by anycasting can inherently deal with inaccuracy in the absolute values of the metrics, since the service makes a relative selection amongst servers. The service is also somewhat robust against errors in the relative values of the metrics, due to the best-effort nature of the service. The performance penalty associated with out-of-date or slightly inaccurate metric data will not typically be severe; rather than selecting the "best" server, the service may identify a "nearly-best" server.
The two goals constrain the design of the metric collection
technique in the following ways. First, metric updates should
occur primarily in response to significant changes in metric
value, rather than on a periodic basis. This implies monitoring
of metric values to determine when updates are needed. Second,
servers should have some control over the load incurred due to
metric collection. A server should be able to decrease metric
collection load, if desired.
A. Overview: Metric Collection Technique
To build a metric collection technique meeting the goals
and constraints outlined above, we combine the probing and
server-push techniques described in Section III. Probing gives
the most accurate estimate of what the probing agent expects in
terms of server response time. Probes, however, can represent
a significant overhead if performed frequently. Server pushes,
while more lightweight, are less accurate predictors of response
time since they only propagate server performance information.
Our technique combines server push with less frequent periodic
probing.
1) Server-Push Algorithm: The server will measure its performance and push performance information according to an update algorithm. To define the way the server measures its performance, consider the server response cycle:
    assign process to handle query
    parse query
    locate requested file
    repeat until file is written:
        read from file
        write to socket
To assess its performance, the server measures the time from just after assigning the process until just before doing the first read. These measured values are averaged and smoothed before being used in the update algorithm described below. (Note that this is the cycle used by the Apache server. We expect that other web servers will have a similar high-level processing structure. If this is not the case, the server measurements will need to be modified accordingly.)
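The paper does not specify how the measured values are averaged and smoothed; the sketch below assumes an exponentially weighted moving average purely for illustration.

class ServerTimeMonitor:
    # Smooths the per-request measurement (time from just after assigning the
    # process until just before the first file read). The EWMA and its weight
    # are assumptions; the text only says values are "averaged and smoothed."
    def __init__(self, alpha=0.3):
        self.alpha = alpha
        self.smoothed = None   # the server time value S reported in pushes

    def record(self, measured_seconds):
        if self.smoothed is None:
            self.smoothed = measured_seconds
        else:
            self.smoothed = (self.alpha * measured_seconds
                             + (1 - self.alpha) * self.smoothed)
        return self.smoothed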
We want the server to push performance information whenever its measured performance has changed sufficiently to be "interesting," with some constraint on the maximum frequency of updates so as to bound the overhead of the updating mechanism. The task of updating link state in a distributed routing environment has precisely the same criteria, thus we have adopted the link state update algorithm used in the ARPANET [25]. The update algorithm is parameterized by a measurement interval I, a maximum threshold T, and a reduction factor R. The algorithm maintains a current threshold C, initialized to T. The server measures its performance over each interval I. If the new measured value changes from the previous measurement by at least C, the new measurement is pushed, and C is reset to T. If the state does not change by at least C, C is reduced by R. When C becomes 0, the state will be pushed, and C will be reset to T. The algorithm will send updates at least every TI/R time units and at most every I time units.
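A compact sketch of this update rule follows (illustrative Python; measure, push, and sleep are abstract callables standing in for the server's measurement, its update transport, and a timer):

def run_push_loop(measure, push, interval_i, threshold_t, reduction_r, sleep):
    # ARPANET-style triggered updates: push on a change of at least the
    # current threshold C, otherwise reduce C by R; when C reaches 0 the
    # value is pushed anyway, so updates occur at least every T*I/R time
    # units and at most once per measurement interval I.
    current_c = threshold_t
    previous = measure()          # performance measured over one interval I
    while True:
        sleep(interval_i)
        value = measure()
        if abs(value - previous) >= current_c:
            push(value)
            current_c = threshold_t
        else:
            current_c -= reduction_r
            if current_c <= 0:
                push(value)
                current_c = threshold_t
        previous = value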
2) Agent Probe Mechanism: The probe is made to a well-known file that is maintained at anycast-aware servers specifically to service probe requests. The file contains the most recent measured performance value by the server and is padded with dummy data. Each probe results in a response time measurement, taken from just before sending the query to just after receiving the complete response. This time depends on server and path characteristics and on the size of the file being probed.
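For illustration, a probe of this kind might be implemented as below (a Python sketch; the URL path and the convention that the first line of the file carries the server-reported value are our assumptions):

import time
import urllib.request

def probe(server, path="/anycast/probe.html"):
    # Time the full request/response cycle for the well-known padded file,
    # and extract the most recent server-measured time S from its body.
    start = time.monotonic()
    with urllib.request.urlopen("http://" + server + path) as resp:
        body = resp.read()
    response_time_r = time.monotonic() - start
    server_time_s = float(body.decode().splitlines()[0])
    return response_time_r, server_time_s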
3) Hybrid Push/Probe Technique: We combine the performance value pushed by the server with the response time measured by the probes to keep an estimate of server response time. The idea is to use the probes to get a measurement of the response time that includes the network path. The measurement is then used to calibrate the more frequently pushed server time value to get an expected response time at a given resolver.
Specifically, let R denote the most recent measurement of response time when probing for the well-known file. As indicated earlier, the server includes in the well-known file the most recent performance value measured as described above. Let S denote the server time value reported in the file during the most recent probe. In between consecutive probes the server typically pushes a sequence of server values. Let S(i) denote the ith value pushed by the server. The resolver adjusts the server-reported value S(i) by multiplying by an adjustment factor A = R/S. Thus, the resolver estimates the current response time as R(i) = A * S(i). Typically, the probes will occur less frequently than the server pushes its measured time value, thus a given adjustment factor will be used to adjust a sequence of pushed server values, until the next probe occurs and updates A.
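In code form, the resolver-side bookkeeping amounts to the following (a minimal sketch; the class and method names are ours):

class ResponseTimeEstimator:
    # Combines infrequent probes with frequent server pushes: a probe fixes
    # A = R / S, and each pushed value S(i) yields the estimate
    # R(i) = A * S(i) until the next probe updates A.
    def __init__(self):
        self.adjustment_a = None

    def on_probe(self, measured_r, reported_s):
        if reported_s > 0:
            self.adjustment_a = measured_r / reported_s

    def on_push(self, pushed_s):
        if self.adjustment_a is None:
            return None            # no probe yet; no calibrated estimate
        return self.adjustment_a * pushed_s

Each probe would call on_probe with the (R, S) pair measured by the probing agent, and each subsequent server push would be converted to an estimated response time via on_push.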
As will be shown by the results of the experiments, this technique works quite well for our purposes. To understand the intuition behind it, we note that the value of S is the average time until the server begins to serve a page and includes delays incurred because of the need to process other requests at the server. In a sense S is the time required for the request to receive one unit of service and A = R/S is an estimate of the number of units of service required to service a page. While S is a function of server load, the value of R, and consequently A, is strongly dependent on the characteristics of the path from server to client.
B. Evaluation of Push-Probe Technique

In Section V, we examine the performance of the overall anycasting system. Prior to combining all parts of the system, we evaluate the accuracy of the metric collection technique in isolation. To do this, we experimented with various locations and capabilities of servers and resolvers, variations in server load, and alternative file sizes. Fig. 3 shows a typical result, plotting the estimated and actual values of response time over 270 queries. The x-axis indicates the index of each query.
In this particular experiment, the probing agent was at the University of Maryland, College Park, and the server was located 12 Internet hops away at Georgia Tech, Atlanta. The agent made requests to the server according to an access log file from a real server. That is, the access log file was used to determine the time of the query (relative to the starting time of the experiment) and the size of the particular file to request. Approximately 26 minutes elapsed from the first to the last access, thus the average interarrival time of accesses was 5.8 s. The server was loaded
