
A Workload Characterization Study of the 1998 World Cup Web Site

Martin Arlitt, +1 more
TLDR
In this article, a detailed workload characterization study of the 1998 World Cup Web site is presented, showing that improvements in the caching architecture of the World Wide Web are changing the workloads of Web servers, but major improvements to that architecture are still necessary.
Abstract
This article presents a detailed workload characterization study of the 1998 World Cup Web site. Measurements from this site were collected over a three-month period. During this time the site received 1.35 billion requests, making this the largest Web workload analyzed to date. By examining this extremely busy site and through comparison with existing characterization studies, we are able to determine how Web server workloads are evolving. We find that improvements in the caching architecture of the World Wide Web are changing the workloads of Web servers, but major improvements to that architecture are still necessary. In particular, we uncover evidence that a better consistency mechanism is required for World Wide Web caches.


A Workload Characterization Study of the 1998 World Cup Web Site
Martin Arlitt and Tai Jin, Hewlett-Packard Laboratories
The 16th Federation Internationale de Football Association (FIFA) World Cup was held in France from June 10 through July 12, 1998. France ’98, as the 16th FIFA World Cup was commonly called, was the most widely covered media event in history. An estimated cumulative television audience of 40 billion watched the 64 matches of France ’98, more than twice the cumulative television audience of the 1996 Summer Olympic Games in Atlanta. http://www.france98.com, the Web site for France ’98, was also popular, receiving more than 1 billion client requests during the tournament.
This article presents a detailed workload characterization of the France ’98 Web site (more information is available in [1]). Workload characterization plays an important role in systems design. It allows us to understand the current state of the system. By characterizing the system over time we can learn what effects changes to the system have had. Workload characterization is also crucial to the design of new system components. In this article we focus on the characterization of a Web server workload. We compare our results to those from previous studies (e.g., [2]) to determine how Web server workloads have changed over time. Furthermore, the extremely heavy workload of the World Cup site allows us to predict what the workloads of future Web servers may look like so that we may plan accordingly.
Some of the more significant characteristics we observed in the World Cup workload, and the performance implications of these characteristics, include:
- HTTP/1.1 clients are becoming more prevalent, accounting for 21 percent of all requests. Widespread deployment of HTTP/1.1-compliant clients and servers is necessary for the functionality of HTTP/1.1 to be fully utilized.
- Eighty-eight percent of all requests were for image files; an additional 10 percent were for HTML files, indicating that most user interest was in relatively static (i.e., cachable) files.
- Almost 19 percent of all responses were “Not Modified,” indicating that cache consistency traffic had a greater impact on the World Cup workload than on previous Web server workloads [2].
- The workload was quite bursty, although over longer time scales (e.g., hours or more) the arrival of these bursts was quite predictable.
- During periods of peak user interest in the World Cup site the volume of cache consistency traffic increased dramatically. This indicates that current consistency mechanisms do not allow Web caches to eliminate “flash-crowds” (i.e., sudden, unexpected increases in traffic to a site) in the network and at the servers, which is supposed to be one of the main benefits of Web caching.
Web server workload characterization is only one of the necessary steps for understanding the changes occurring in Web traffic. Research efforts on Web client workloads [3], Web proxy workloads [4], and network traffic characterizations [5], as well as HTTP analyses [6], are all required in order to better understand the Web.
The remainder of the article is organized as follows. We provide background information on the 1998 World Cup tournament, and introduce the France ’98 Web site. Next, we discuss the data set used in the workload characterization study and present the results of the study. Then, we analyze a particularly busy segment of the World Cup workload and compare the results to the overall study previously discussed. We then describe in more detail the performance implications of the results from the previous sections. Finally, we summarize the contributions of our article and list future work.
0890-8044/00/$10.00 © 2000 IEEE. IEEE Network, May/June 2000.

Background
The 1998 World Cup
To better understand the nature of the workload from the France ’98 Web site, knowledge of the tournament itself is required. This information is particularly useful for understanding the usage of the site.
The FIFA World Cup is a tournament held once every four years to determine the best football (soccer) team in the world. Due to the large number of teams interested in participating, a qualifying round is used to select the teams that will play in the World Cup tournament. Of the 172 countries that entered the qualifying round for France ’98, 30 were selected to compete for the World Cup, along with the host country, France, and the reigning champions, Brazil.
France ’98 began on June 10, 1998 and ended on July 12, 1998. The tournament consisted of several rounds of play. The opening round lasted from June 10 until June 26. The 32 participating teams were divided into eight groups. Each team then played one 90-min match against each of the other teams in its group. The top two finishers from each group qualified for the playoffs. During the playoffs only the winning team of each match advanced to the next round. If a playoff game was tied after 90 min, a 30-min sudden death overtime period was played. If the overtime failed to determine a winner, penalty kicks were used to decide which team would advance. The first round of the playoffs, known as the “Round of 16,” lasted from June 27 through July 1. The remaining rounds of the tournament were the Quarter Finals, held on July 3 and 4; the Semi Finals, held on July 7 and 8; and the Final, held on July 12. A match to determine the third place finisher was held on July 11.
Table 1. A description of Common Log Format entries.
Field        Description
remotehost   The IP address of the client issuing the request
rfc931       The remote login name of the user
authuser     The username by which the user has authenticated himself
[date]       The date and time of the request (1 s resolution)
The 1998 World Cup Web Site
The Web site of the 1998 World Cup, http://www.france98.com, provided Internet-savvy football fans around the world with a wide range of information. Besides being able to access the current scores of the football matches in real time, fans could also access previous match results, player statistics, player biographies, team histories, information on the stadiums, facts about local attractions and festivities, as well as a wide range of photos from the matches and interviews with players and coaches. Fans could also download free software, such as World Cup screensavers and wallpapers, from the France ’98 Web site. All of the information on the site was available in English and French.
The France ’98 Web site went online May 6, 1997. In anticipation of significant interest from the Internet community, emphasis was placed on developing an available, reliable, and low-latency platform to power the Web site. During the tournament 30 servers were used, distributed across four locations: Paris, France; Herndon, Virginia; Plano, Texas; and Santa Clara, California. All Web page creation and modifications occurred in France, and were distributed to the U.S. locations. A number of load balancers were used to distribute the requests across these four locations and among the servers at each location. In this article we examine only the aggregate workload. Information on the workload at each of these locations is available in [1].
Workload Characterization
This section presents the results of our workload characterization. The next subsection provides information on the collection and reduction of the data set used in our study. Since 99.88 percent of all requests contained the GET method, throughout the remainder of the article we analyze only these GET requests. We then discuss the protocol version, response status code, and file type distributions. We also analyze the usage of the World Cup site. We then describe the unique file size distribution, and later we look at the file referencing patterns.
Table 2. A summary of access log characteristics (raw data).
Duration                                  May 1-July 23, 1998
Total requests                            1,352,804,107
Average requests/min                      [...]96
Total bytes transferred (Gbytes)          4991
Average bytes transferred/min (Mbytes)    40.8
Collection and Reduction of Data
The data set used in this workload characterization study is composed of the access logs collected from each of the servers used in the World Cup Web site. The access logs from each server were archived on a daily basis. For this study all of the access logs from May 1 through July 23, 1998 were analyzed.
Each access log is in the Common Log Format. For every request received by the Web server, the information described in Table 1 is stored. The request line from the client includes the method (e.g., GET) to be applied to the requested resource [7], the name of the resource (e.g., /index.html), and the protocol version in use (e.g., HTTP/1.0).
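As an illustration of the log format just described, the following minimal sketch parses a single Common Log Format line and splits the request line into method, resource name, and protocol version. The trailing status and bytes fields are part of the standard Common Log Format; the regular expression, the function name `parse_clf_line`, and the sample line are assumptions made for this example and are not taken from the study's tooling or logs.

```python
import re
from datetime import datetime

# Field order: remotehost rfc931 authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<remotehost>\S+) (?P<rfc931>\S+) (?P<authuser>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_clf_line(line):
    """Parse one Common Log Format line into a dict, or return None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # The request line holds the method, the resource name, and the protocol version.
    parts = fields["request"].split()
    method, resource, protocol = (parts + [None, None, None])[:3]
    return {
        "remotehost": fields["remotehost"],
        # Month abbreviations (%b) are parsed using the current locale.
        "timestamp": datetime.strptime(fields["date"], "%d/%b/%Y:%H:%M:%S %z"),
        "method": method,
        "resource": resource,
        "protocol": protocol,
        "status": int(fields["status"]),
        "bytes": 0 if fields["bytes"] == "-" else int(fields["bytes"]),
    }


# Hypothetical log line, not taken from the World Cup logs.
SAMPLE = '192.0.2.1 - - [30/Jun/1998:21:30:00 +0200] "GET /index.html HTTP/1.0" 200 1024'
record = parse_clf_line(SAMPLE)
print(record["method"], record["resource"], record["protocol"], record["status"])
```

The rfc931 and authuser fields are matched but not returned, mirroring the observation that they carried no information in this data set.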
Table 2 summarizes the access logs that we acquired from the World Cup site. Our first concern was with the size of the raw access logs: 125 Gbytes in total, 14 Gbytes when compressed. In order to make our workload analyses more efficient, we chose to convert the logs to a more compact binary format. We reduced the storage requirements in two ways. One approach removed unnecessary data. For example, we deleted the rfc931 and authuser fields since they were not used by the servers and thus provided no information. The second tactic we used to reduce the size of the data set was to represent the remaining fields in more efficient ways when possible. For example, we mapped each distinct URL to a unique integer identifier. Finally, we collated the access logs of all the servers by request time. The resulting binary log file was 25 Gbytes in size, 9 Gbytes when compressed. Furthermore, each request is now in a fixed size structure, which also helps to improve the efficiency of our analyses.
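The reduction steps above can be sketched roughly as follows. The article does not give the actual binary layout, so the record structure, the field widths, and the names `RECORD`, `IdMap`, and `pack_request` are all hypothetical; the sketch only illustrates the two ideas of mapping each distinct URL to an integer identifier and storing every request in a fixed-size structure.

```python
import struct

# Hypothetical fixed-size layout (little-endian, no padding):
# timestamp, client id, URL id, bytes transferred (uint32 each), status (uint16).
RECORD = struct.Struct("<IIIIH")


class IdMap:
    """Assign each distinct key (e.g., a URL) a unique integer identifier."""

    def __init__(self):
        self._ids = {}

    def get(self, key):
        # A new key receives the next unused integer; a known key keeps its identifier.
        return self._ids.setdefault(key, len(self._ids))


def pack_request(timestamp, client, url, size, status, client_ids, url_ids):
    """Encode one request as a fixed-size binary record."""
    return RECORD.pack(timestamp, client_ids.get(client), url_ids.get(url), size, status)


client_ids, url_ids = IdMap(), IdMap()
# Hypothetical requests: (epoch seconds, client, URL, bytes, status).
requests = [
    (899242200, "192.0.2.1", "/index.html", 1024, 200),
    (899242201, "192.0.2.2", "/index.html", 0, 304),
]
log = b"".join(pack_request(*r, client_ids, url_ids) for r in requests)
print(RECORD.size, len(log))
```

Because every record has the same size, the n-th request can be located by seeking to offset n times `RECORD.size`, which is one reason a fixed-size structure makes later analyses cheaper than re-parsing text.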
Despite the vast amount of data that was collected by each of the servers, a lot of interesting information is still not available. For example, although the logs do have a timestamp that records when the request was received by the server, it has a one-second resolution, which is too coarse-grained to be of use for numerous analyses (e.g., interrequest times). This is just one example of useful information that could be added to a revised Web server log file format.
Table 3. Breakdown of server response codes.
Response code            Responses (%)   Content data (%)
200 (Successful)         80.52           97.86
206 (Partial Content)    0.09            2.08
304 (Not Modified)       18.75           0.00
4xx, 5xx (Errors)        0.64            0.06
Table 4. Breakdown by type.
Statistical Characteristics
Our first analysis examined the version of Hypertext Transfer Protocol (HTTP) supported by the client issuing the request. As expected, we found that HTTP/1.0 is still the version used by most (78.7 percent) of the clients. However, over 20 percent of the traffic came from clients that support HTTP/1.1. This suggests that browsers supporting HTTP/1.1 are slowly replacing browsers that do not. These results do not indicate what percentage of the requests to the World Cup site, if any, actually used HTTP/1.1 functionality.
Table 3 shows the breakdown of server response codes. Table 3 reveals that the majority of requests resulted in the successful transfer of an object (response status 200). The successful transfers account for almost all of the content data (97.86 percent) transferred from the Web site back to clients. The second most common status code was the Not Modified (304) response, which accounted for almost 19 percent of all responses to client requests. This represents a substantial increase over the fraction of Not Modified responses seen in earlier server workloads [2]. The reason for this increase can be attributed to the improved caching architecture in the Web, including persistent caches in browsers, and more recently in proxies and networks (e.g., transparent caches). This type of response indicates that the client issued a conditional GET request to verify that its cached copy of the file is consistent with the version being served at the Web site.
Table 4 shows the breakdown of responses by the type of file requested by the client. For the majority of the responses the file extension was used to determine the file type. For example, files ending with .jpg or .gif were placed in the Images category. We considered any URL that included a cgi-bin substring or a parameter list (e.g., /home.htm?parameter-list) to be a dynamic file. In some cases dynamic files can also be identified by file type (e.g., .asp). For all remaining (unique) requests we issued HEAD requests to the Web site and used the Content-type: response header to classify the file.
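The classification rules above can be sketched as a small function. This is a minimal illustration, not the authors' classifier: the extension sets contain only the examples named in the text (plus .html and .htm for HTML, which is an assumption), the category labels and the function name `classify` are hypothetical, and URLs that none of the rules cover are returned as `None` to stand for the cases the study resolved with HEAD requests.

```python
import posixpath
from urllib.parse import urlsplit

IMAGE_EXTENSIONS = {".jpg", ".gif"}    # examples given in the text
HTML_EXTENSIONS = {".html", ".htm"}    # assumption for this sketch
DYNAMIC_EXTENSIONS = {".asp"}          # example given in the text


def classify(url):
    """Return a file-type category for a URL, or None if the rules do not decide it."""
    parts = urlsplit(url)
    # A cgi-bin substring or a parameter list marks the file as dynamic.
    if "cgi-bin" in parts.path or "?" in url:
        return "Dynamic"
    extension = posixpath.splitext(parts.path)[1].lower()
    if extension in DYNAMIC_EXTENSIONS:
        return "Dynamic"
    if extension in IMAGE_EXTENSIONS:
        return "Images"
    if extension in HTML_EXTENSIONS:
        return "HTML"
    # Undecided: the study classified such files via the Content-type response header.
    return None


# Hypothetical URLs used only to exercise each rule.
for example in ["/images/logo.gif", "/english/index.html", "/home.htm?lang=en", "/download/saver.zip"]:
    print(example, classify(example))
```

Checking for dynamic content before looking at the extension matters: a URL such as /home.htm?parameter-list ends in an HTML extension but is treated as dynamic because of its parameter list.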
Table 4 reveals that almost all client requests (98.01 percent) were for either HTML (9.85 percent) or image (88.16 percent) files. A similar characteristic was observed in earlier Web server workloads [2]. HTML files had more impact than image files on the volume of data transferred from the Web site (38.60 percent for HTML compared with 35.02 percent for images). Most image requests were for small inline graphics, while the HTML requests were for substantially larger files. Compressed files (e.g., screensavers), which accounted for only 0.08 percent of all requests, were responsible for over 20 percent of the data traffic. The huge discrepancy between the percentages of requests and data transferred for the corresponding transfers is an indication of the effects large files can have on the workload of a Web server and of the network.
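The discrepancy can be made concrete with a little arithmetic on the percentages quoted above. Dividing a file type's share of the bytes by its share of the requests gives its mean bytes per request relative to the overall mean. This is a back-of-the-envelope reading of the reported figures, not a calculation taken from the article.

```latex
\[
\frac{\text{share of bytes}}{\text{share of requests}}:\qquad
\text{HTML} = \frac{38.60}{9.85} \approx 3.92, \qquad
\text{images} = \frac{35.02}{88.16} \approx 0.40, \qquad
\text{compressed} > \frac{20}{0.08} = 250 .
\]
```

On these figures an average HTML request moved roughly ten times as many bytes as an average image request, and an average request for a compressed file moved more than 250 times the overall mean.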
Usage Analysis
Figure 1 shows the daily traffic volume handled by the World Cup Web site. From the beginning of May until the start of the World Cup on June 10, the traffic volume is quite light, although clearly building in anticipation of the start of the event. Beginning on June 10, the volume of traffic grows enormously. This marks the beginning of a prolonged flash-crowd. That is, the site suddenly became very popular, remained popular for a period of time, and then quickly faded back into obscurity. Although the daily traffic volume is quite bursty during the World Cup, the traffic volume remains higher than it was at any time prior to the start of the event. The busiest day for the site was June 30, when over 73 million requests were handled by the France ’98 site.
In order to better understand the causes of this burstiness we analyzed the traffic in more detail. Figure 2 shows the hourly traffic volume of the World Cup Web site. The figure consists of six bar graphs, one for each week of the World Cup tournament. The solid black curve in each graph represents the hourly volume of requests (y-axis) for the given time (x-axis, normalized to local time in France). The scales of both the x- and y-axes are kept constant across all bar graphs to facilitate comparisons in traffic volume over time and by day of the week. The dashed vertical lines indicate the starting time of a World Cup football match. The teams involved in each match are also listed (the abbreviations are defined in Table 5). For example, at 5:30 p.m. (in France) on Wednesday June 10, the first match of the 1998 World Cup was played between Brazil (BRA) and Scotland (SCO). At this time requests were arriving at a rate of 6 million/hr.
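The hourly aggregation behind a plot like Fig. 2 can be sketched as follows. This is a minimal illustration under stated assumptions: request timestamps are taken to be epoch seconds, local time in France during the tournament is modeled as a fixed UTC+2 offset (Central European Summer Time), and the names `FRANCE_SUMMER` and `hourly_volume` are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Assumption: fixed UTC+2 offset for France during June and July 1998.
FRANCE_SUMMER = timezone(timedelta(hours=2))


def hourly_volume(epoch_seconds):
    """Count requests per hour, with each hour expressed in local time in France."""
    counts = Counter()
    for ts in epoch_seconds:
        local = datetime.fromtimestamp(ts, tz=FRANCE_SUMMER)
        # Truncate to the start of the hour to obtain the bucket key.
        counts[local.replace(minute=0, second=0, microsecond=0)] += 1
    return counts


# Hypothetical timestamps around 5:30 p.m. in France on June 10, 1998.
kickoff = int(datetime(1998, 6, 10, 17, 30, tzinfo=FRANCE_SUMMER).timestamp())
counts = hourly_volume([kickoff, kickoff + 60, kickoff + 3600])
for hour, n in sorted(counts.items()):
    print(hour.isoformat(), n)
```

Since the logs record timestamps at one-second resolution, hourly buckets are far coarser than the measurement granularity, so this aggregation loses nothing that the logs could have resolved.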
Figure 1. Daily traffic volume to the World Cup Web site (x-axis: date, May through August).

Table 5. Team name abbreviations.
ARG  Argentina   DEN  Denmark          ITA  Italy         NOR  Norway
AUT  Austria     ENG  England          JAM  Jamaica       PAR  Paraguay
BEL  Belgium     ESP  Spain            JPN  Japan         ROM  Romania
BGR  Bulgaria    FRA  France           KOR  South Korea   RSA  South Africa
BRA  Brazil      GER  Germany          KSA  Saudi Arabia  SCO  Scotland
CHI  Chile       HOL  The Netherlands  MEX  Mexico        TUN  Tunisia
CMR  Cameroon    HRV  Croatia          MOR  Morocco       USA  United States
COL  Colombia    IRN  Iran             NGA  Nigeria       YUG  Yugoslavia
The bar graphs also indicate the days on which each round of the tournament began (e.g., the Round of 16 began on Saturday June 27), as well as those matches that required penalty kicks to decide a victor, indicated by a (P) following the names of the teams.
Figure 2 reveals that there were many variables that affected the hourly traffic volume at the World Cup Web site. For example, the volume of traffic increased when matches were in progress and decreased once they had finished. These bursts represent flash-crowds on a smaller scale. The traffic volume was also affected by the teams involved in the matches (e.g., traditional football powers like Brazil and Germany are of interest to football fans everywhere, not just Brazilians and Germans), the number of matches in progress (e.g., from June 23 through June 26 matches were played in parallel), and the playoff implications of the match. The time differences between the user’s geographic location and France also contributed to the usage of the World Cup site.
One interesting observation to be made from Fig. 2 is that the volume of traffic to the World Cup Web site was quite low on weekends, even though a higher percentage of matches were played on Saturday and Sunday than on weekdays. The obvious reason for this reduction in traffic volume is that people preferred to watch the matches on television. When these fans were unable to watch the matches on television, such as when they were at work or school, or certain matches were not televised in their area, they relied on the Web to provide them with progress reports on the matches.
Unique File Size Distribution
In this section we analyze the distribution of sizes for all unique files requested from the World Cup Web site. Information on the successful transfer size and overall transfer size distributions is available in [1].
Our first analysis looks at the sizes of each of the unique files requested and successfully transferred at least once in the access log. For the purpose of this study we utilize the initial nonzero size recorded for each unique file. Twenty thousand, seven hundred twenty-eight unique files were requested (and successfully transferred) from the World Cup site during the measurement period. The total combined size of these files was 307 Mbytes. The mean size of these files was 15,524 bytes, the median size 4674 bytes, and the maximum size 61.2 Mbytes. The mean, median, and maximum values are consistent with those observed in other Web server workloads [2].
Figure 3a presents the frequency histogram; Fig. 3b provides the cumulative frequency histogram of the unique file sizes. We have applied a logarithmic transformation to the file sizes to enable us to identify patterns across the wide range of values [8]. For a $\log_2$ transformation, bin $i$ includes values in the range $2^i \le x \le 2^{i+1} - 1$. Similarly, for a $\log_{10}$ transformation, bin $i$ includes values in the range $10^i \le x \le 10^{i+1} - 1$. Figure 3a indicates that most unique files have sizes in the 256-byte to 64-kbyte range ($2^8$ to $2^{16}$ bytes). In other Web workload characterizations the file size distribution has been found to be lognormal [3, 4]. That is, after applying a logarithmic transformation to the data, the data appears to be normally distributed. We compare the unique file size distribution (the empirical data) to a synthetic lognormal distribution with parameters $\mu = 12.14$ and $\sigma = 1.73$. From Fig. 3a we can see that the empirical data deviates quite substantially from the synthetic model. These differences are due to the distinct nature of the World Cup site. For example, in Fig. 3a about 10 percent of all unique files were around 4 kbytes ($2^{12}$ bytes) in size. Sixty-five percent of these files are HTML objects that provided profiles on the individual players who participated in the World Cup tournament. The other large spikes in Fig. 3a are also the result of groups of related objects having very similar sizes. On “typical” Web sites we would not expect to see such large clusters of related objects making up a substantial percentage of all files on the Web site. Despite the number of spikes seen in Fig. 3a, the cumulative frequency histogram (shown in Fig. 3b) indicates that the lognormal distribution still provides a reasonable estimate for the body of the unique file size distribution.
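The log2 binning and the comparison against a synthetic lognormal model can be sketched as follows. This is an illustrative sketch, not the authors' analysis code: the function names are hypothetical, file sizes are assumed to be positive integers, and the lognormal parameters are assumed to describe the base-2 logarithm of the file size, which the article does not state explicitly.

```python
import math
from collections import Counter


def log2_bin(size):
    """Bin index i such that 2**i <= size <= 2**(i + 1) - 1, for a positive integer size."""
    return size.bit_length() - 1


def histogram(sizes):
    """Fraction of files falling in each log2 bin (zero-byte files are ignored)."""
    counts = Counter(log2_bin(s) for s in sizes if s > 0)
    total = sum(counts.values())
    return {i: counts[i] / total for i in sorted(counts)}


def lognormal_bin_probability(i, mu, sigma):
    """Probability that log2(size) falls in [i, i + 1) when it is Normal(mu, sigma)."""
    def phi(z):
        # Standard normal cumulative distribution function.
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return phi((i + 1 - mu) / sigma) - phi((i - mu) / sigma)


# Hypothetical file sizes in bytes, used only to exercise the binning.
sizes = [256, 300, 4096, 5000]
empirical = histogram(sizes)
for i, fraction in empirical.items():
    model = lognormal_bin_probability(i, mu=12.14, sigma=1.73)
    print(i, fraction, round(model, 4))
```

Plotting the empirical fractions next to the model probabilities bin by bin is what makes clusters of similarly sized files, such as the player profile pages, stand out as spikes above the smooth model curve.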
While most of the unique files are less than 64 kbytes in size, a few are substantially larger. Our next analysis examined the tail of the unique file size distribution to determine if it is heavy-tailed. A distribution is considered heavy-tailed if $P[X > x] \sim x^{-\alpha}$ as $x \to \infty$, with $0 < \alpha < 2$. This means that if the asymptotic shape of the distribution is hyperbolic, it is heavy-tailed, regardless of the behavior of the distribution for small values [9]. To determine if the unique file size distribution from the World Cup Web site is heavy-tailed, we plotted the complementary distribution (CD) function on log-log axes and examined the results for linear behavior on the upper tail. This method of analysis is described in [9]. The results of this analysis for the World Cup data are shown in Fig. 3c. The tail of the distribution does exhibit some linear behavior, which suggests that the distribution is indeed heavy-tailed. However, this linearity does not exist throughout the entire tail. Specifically, a spike exists in the 1-4-Mbyte range. This spike is caused by the existence of 44 files whose sizes are in the 1-4-Mbyte range. These files include 13 uncompressed high-resolution images, four audio clips, 15 screen savers (i.e., downloadable software), and 12 video clips.
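The log-log complementary distribution test can be sketched as follows. On log-log axes a tail of the form P[X > x] ~ x^(-α) appears as a straight line with slope -α, so a least-squares fit over the upper tail gives a rough estimate of α. This sketch uses a plain least-squares fit, which is a simpler technique than the aest scaling estimator; the function names and the cutoff parameter `x_min` are hypothetical, and all values are assumed to be positive.

```python
import math


def ccdf_points(values):
    """Return (x, P[X > x]) for each distinct x whose empirical probability is nonzero."""
    xs = sorted(values)
    n = len(xs)
    points = []
    for idx, x in enumerate(xs):
        if idx + 1 < n and xs[idx + 1] == x:
            continue  # use the last occurrence of each distinct value
        p = (n - idx - 1) / n
        if p > 0:
            points.append((x, p))
    return points


def tail_slope(values, x_min):
    """Least-squares slope of log10 P[X > x] against log10 x for x >= x_min."""
    pts = [(math.log10(x), math.log10(p)) for x, p in ccdf_points(values) if x >= x_min]
    if len(pts) < 2:
        raise ValueError("need at least two tail points")
    mean_x = sum(px for px, _ in pts) / len(pts)
    mean_y = sum(py for _, py in pts) / len(pts)
    sxx = sum((px - mean_x) ** 2 for px, _ in pts)
    sxy = sum((px - mean_x) * (py - mean_y) for px, py in pts)
    return sxy / sxx


# Synthetic data constructed so that its empirical tail follows x**(-1.2) exactly.
n, alpha = 1000, 1.2
synthetic = [((n - k - 1) / n) ** (-1.0 / alpha) for k in range(n - 1)] + [1e9]
print(-tail_slope(synthetic, x_min=1.0))
```

A spike such as the cluster of files in the 1-4-Mbyte range would show up as a departure from the straight line, so the choice of `x_min` and a visual check of the plot matter as much as the fitted slope.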
To verify that the unique file size distribution is indeed heavy-tailed, we utilized the scaling estimator tool aest created by Crovella and Taqqu [9]. This tool aggregates the data points in the distribution and then plots the CD of the aggre-

Figure 2. Hourly traffic volume to the World Cup Web site. (One panel per week; x-axis: time of day, marked at 6am, 12pm, and 6pm for each day from Sunday through Saturday; matches are labeled with team abbreviations, with (P) marking matches decided by penalty kicks.)

References

Web caching and Zipf-like distributions: evidence and implications (proceedings article)
TL;DR: This paper investigates the page request distribution seen by Web proxy caches using traces from a variety of sources and considers a simple model where the Web accesses are independent and the reference probability of the documents follows a Zipf-like distribution, suggesting that the various observed properties of hit-ratios and temporal locality are indeed inherent to Web accesses observed by proxies.

Self-similarity in World Wide Web traffic: evidence and possible causes (journal article)
TL;DR: It is shown that the self-similarity in WWW traffic can be explained based on the underlying distributions of WWW document sizes, the effects of caching and user preference in file transfer, the effect of user "think time", and the superimposition of many such transfers in a local-area network.

Generating representative Web workloads for network and server performance evaluation (proceedings article)
TL;DR: This paper applies a number of observations of Web server usage to create a realistic Web workload generation tool which mimics a set of real users accessing a server and addresses the technical challenges to satisfying this large set of simultaneous constraints on the properties of the reference stream.

Wide-area Internet traffic patterns and characteristics (journal article)
K. Thompson, +2 more, 01 Nov 1997
TL;DR: Observations on the patterns and characteristics of wide-area Internet traffic, as recorded by MCI's OC-3 traffic monitors are presented, revealing the characteristics of the traffic in terms of packet sizes, flow duration, volume, and percentage composition by protocol and application.

Internet Web servers: workload characterization and performance implications (journal article)
TL;DR: The paper concludes with a discussion of caching and performance issues, using the observed workload characteristics to suggest performance enhancements that seem promising for Internet Web servers.