
Peer-to-peer workload characterization: techniques and open issues

TL;DR: The contribution of this paper is to provide a classification of related studies on file-sharing workload by distinguishing the main considered information and the mechanisms and tools that have been used for data collection.

Summary

1. Introduction

  • The P2P phenomenon has received increasing attention in recent years.
  • There are many goals behind the workload studies of file sharing systems.
  • Each proposed approach has its pros and cons.

2. File sharing networks

  • File sharing networks are essentially peer-to-peer systems designed to allow users to exchange files.
  • The lookup process is generally based on queries that match the resource characteristics.
  • Query and download are the basic operations for the majority of file sharing networks.
  • Some researchers have also studied other file sharing networks such as E-Donkey and DirectConnect [18]; however, studies taking into account multiple networks are few, and their contribution is limited to a partial view of the network characteristics.
  • Even if the number of nodes and the amount of files shared in the Gnutella network is lower than that in the FastTrack network [19], the open nature of the network makes Gnutella an interesting basis for the study of file sharing characteristics.

3. Classification of workload analysis according to collection technique

  • There are two main approaches for data collection, that is: Active probing and Passive probing (traffic interception and analysis).
  • A crawler is a modified servent that issues queries to inspect the contents and the structure of the peer-to-peer network.
  • The file hash acts as a content digest (usually computed with the SHA1 algorithm) of the actual file content.
  • Passive probing collects data without issuing explicit queries but analyzing already available file sharing traffic.
  • The traffic analyzer is connected to a link connecting multiple physical networks.

4. Survey of workload analysis

  • The authors classify the literature on workload analysis of recent years into three broad categories; the first, characterization of the resource working set, focuses on what resources are shared over the network.
  • One can study, for example, the number of shared files and their popularity distribution.
  • These analyses allow the evaluation of the caching potential of file-sharing traffic and provide an evaluation of the magnitude of the file sharing phenomenon.
  • Time stability of these patterns has also been taken into account.
  • Characterization of the servents and of the overlay network investigates the connection characteristics of the servents belonging to the network.

4.1. Characterization of the resource working set

  • Analyses of the working set have focused on two main topics, that is, resource popularity and size.
  • Two analyses [11, 1] have addressed file type popularity, and their conclusions agree.
  • The variation of popularity over time has been studied by Leibowitz et al. [12] by examining changes in the popularity rank of the 400 most popular files.
  • A further analysis related to file size has been carried out through crawling by Andreolini et al. [1] and provides an analytical model for the resources shared by each node.
  • A final analysis of working set size concerns the relationship between a file's MIME type and its size.

4.2. Analysis of the user behavior

  • Studies focusing on user behavior belong to two categories: studies aiming to define a “file sharer user profile” and studies aiming to characterize user activity cycles.
  • A first important contribution focusing on user profiles aims to address the issue of freeloaders, that is, users downloading resources without sharing any file.
  • A first finding of the study is that “users are patient”: the researchers found that even for small files (less than 10 MB, typically audio files), 30% of the downloads take more than an hour and for 10% of the resources the download takes nearly a day.
  • Moreover, the study described in [17] is carried out on the Gnutella Network, while the analysis described in [7] is based on the FastTrack/Kazaa network.
  • These considerations can explain the different results of the two studies.

4.3. Characterization of the servents and of the overlay network

  • A final category of studies takes into account the network and the servents themselves.
  • The same study analyzes the impact of the power-law structure on the network resilience and concludes that the network is highly resilient to random node failure.
  • In particular, they have found that supernodes tend to select the least loaded neighbors.
  • Studies on servent connectivity have mainly focused on two main parameters, that is available bandwidth and network latency.
  • The study suggests that Gnutella is not a real peer-to-peer network because less than 15% of the nodes fit in the “server” profile and the large majority of peers are mainly “clients”.

5. Open issues and conclusions

  • The authors described the two main approaches to data collection and provided a taxonomic classification of the literature on peer-to-peer workload analysis according to three main categories: analysis of the file-sharing working set, characterization of user behavior, and analysis of network structure and characteristics.
  • Experimental results and conclusions should be based on multiple data collection techniques.
  • A second interesting problem is the lack of geography-related analysis of file-sharing downloads.
  • IP packet capture and analysis over high capacity links is required to obtain significant information for workload characterization.
  • It seems necessary to address the trade-off between accuracy in traffic analysis and computational load.


Peer-to-Peer workload characterization: techniques and open issues
Mauro Andreolini
University of Rome “Tor Vergata”
andreolini@ing.uniroma2.it
Michele Colajanni
University of Modena and Reggio Emilia
colajanni.michele@unimore.it
Riccardo Lancellotti
University of Modena and Reggio Emilia
lancellotti.riccardo@unimore.it
Abstract
The popularity of peer-to-peer file sharing networks has
attracted multiple interests even in the research commu-
nity. In this paper, we focus on workload characterization
of file-sharing systems that should be at the basis of perfor-
mance evaluation and investigations for possible improve-
ments. The contribution of this paper is twofold: first, we
provide a classification of related studies on file-sharing
workload by distinguishing the main considered informa-
tion and the mechanisms and tools that have been used for
data collection. We also point out open issues in file-sharing
workload characterization and suggest novel approaches to
workload studies.
1. Introduction
The P2P phenomenon has received an increasing amount
of attention in recent years. Thanks to their distributed na-
ture [19], these systems represent an innovative and promis-
ing paradigm to build scalable and fault tolerant systems.
Multiple applications of peer-to-peer systems have
been proposed. Examples include filesystems [10, 3], Web
caches [8] and streaming services [20]. However, the killer
application for peer-to-peer systems remains file sharing
over a large scale and with large volumes of data. The popularity
of file sharing applications is increasing over time, thanks
also to the growth in broadband connections that are avail-
able even to home users. A significant portion of the
traffic on network backbones is related to file sharing ac-
tivity [11]. By now, file sharing represents the main
test-bench for the scalability and fault tolerance proper-
ties of peer-to-peer systems.
There are many goals behind the workload studies of file
sharing systems. Let us mention the improvement of peer-
to-peer protocols [2, 14], the creation of realistic analytical
and simulation models [5], the introduction of caching so-
lutions [11], and the evaluation of the economic impact of
file sharing due to copyright infringements [15].
The literature on P2P file-sharing workload characteriza-
tion is recent, but already large. Each proposed approach has
its pros and cons. We propose the first survey that considers the
main studies in file-sharing workload analysis and classifies
them according to two main parameters: the techniques used
for data collection, that is, crawling or traffic interception,
and the information that has been analyzed, such as shared
contents, user behavior, and the structure and performance
of the interconnections.
Hence, the contribution of this paper is twofold: we pro-
pose a taxonomic scheme for workload analysis classifica-
tion; our classification allows us to point out discrepancies
among different studies and to identify open issues and ar-
eas for future research.
The paper is organized as follows. Section 2 outlines
the main characteristics of file sharing and the main ele-
ments of file sharing networks. Section 3 describes the two
techniques used for data collection. Section 4 provides an
analysis of the state-of-the-art in workload characterization.
Finally, Section 5 outlines open issues, future research di-
rections, and provides some concluding remarks.
2. File sharing networks
File sharing networks are essentially peer-to-peer sys-
tems designed to allow users to exchange files. The file shar-
ing application inherits from peer-to-peer systems two main
characteristics, that is: creation of a so-called overlay net-
work and the use of a decentralized approach to network
management. Both of them are key characteristics for the de-
ployment of a world-wide service such as file sharing.
The basic function of a file-sharing network is to allow a
node to advertise the shared files and to carry out a lookup
process over the overlay network to find a resource shared
by a remote node. The lookup process is generally based on
queries that match the resource characteristics. The most
common case is a query that matches filenames based on
regular expressions. The lookup process returns the list of
resources that match the query and the location of these re-
sources. Once a file is found, a download process can be ini-
tiated for the actual file retrieval.
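
To make the lookup flow concrete, the following is a minimal Python sketch of a query matching shared filenames through a regular expression; the index contents, hosts, and function names are hypothetical illustrations, not part of any specific protocol.

    import re

    # Hypothetical in-memory index a servent advertises: (name, size, location).
    shared_index = [
        ("symphony_no9.mp3", 4_800_000, "203.0.113.7:6346"),
        ("holiday_video.avi", 120_000_000, "198.51.100.2:6346"),
        ("piano_sonata.mp3", 3_900_000, "192.0.2.44:6346"),
    ]

    def lookup(pattern):
        # Return every shared file whose name matches the query pattern,
        # together with its location, mimicking the lookup result list.
        rx = re.compile(pattern, re.IGNORECASE)
        return [(n, s, loc) for (n, s, loc) in shared_index if rx.search(n)]

    for name, size, location in lookup(r".*\.mp3"):
        print(f"{name} ({size} bytes) at {location}")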

Query and download are the basic operations for the ma-
jority of file sharing networks. Although the basic princi-
ples are common, there are multiple incompatible networks,
each characterized by different protocols. We find it useful
to focus the analysis on two popular file sharing networks,
FastTrack [9] and Gnutella [6], because most workload char-
acterization results are based on them.
Some researchers have also directed their studies to other
file sharing networks such as E-Donkey and DirectCon-
nect [18]; however, studies taking into account multiple net-
works are few and their contribution is limited to a par-
tial view of the network characteristics. On the other hand,
Gnutella and FastTrack have been studied through differ-
ent techniques covering various aspects of their features.
The FastTrack network is used by the Kazaa [9] file shar-
ing software. The network uses two protocols. The former
is used for network management and for resource lookup;
it is characterized by a heavy use of cryptography that hin-
ders its reverse engineering. The latter protocol is used for
file download and can be easily analyzed because in prac-
tice it corresponds to the standard HTTP protocol extended
with a few headers.
The Gnutella network is based on open protocols. Even
if the number of nodes and the amount of files shared in the
Gnutella network is lower than that in the FastTrack net-
work [19], the open nature of the network makes Gnutella
an interesting basis for the study of file sharing characteris-
tics. Gnutella uses two protocols: HTTP for file download
and a network-specific protocol for network management
and resource lookup. The Gnutella protocol specification
is available in two versions: Gnutella v0.4 (the first avail-
able standard version of the protocol) and Gnutella v0.6 [6]
officially standardized in 2004 and now adopted by most
Gnutella servents.
3. Classification of workload analysis accord-
ing to collection technique
There are two main approaches for data collection, that
is: Active probing (crawling) and Passive probing (traffic in-
terception and analysis). Active probing is a technique for
data collection based on crawlers. A crawler is a modified
servent that issues queries to inspect the contents and the
structure of the peer-to-peer network. Passive probing col-
lects data without issuing explicit queries, but by intercepting
and analyzing actual file sharing traffic.
Fig. 1 represents the crawling approach to data collec-
tion. The small monitors are the servents of the file-sharing
network, and the clouds represent physical networks that
are connected through links shown as thick solid lines. The
crawler connects to the overlay network and issues queries
(shown as dashed arrows). The crawler creates a snapshot
of the overlay network based on the responses to its queries.
Figure 1. Crawling.

Figure 2. Traffic interception and analysis.
For each file stored in every node a crawler can collect the
name (that can be used also to determine the resource type),
the size and the file hash value. The latter acts as a con-
tent digest (usually computed with the SHA1 algorithm) of
the actual file content. Due to the strong non-collision prop-
erties of SHA1, the hash code can be assumed to be a unique
identifier of the file content. This allows a double analy-
sis of resources to detect different files with the same name
and identical files stored under different names.
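
As an illustration of this double analysis, the following Python sketch hashes file contents with SHA1 and groups crawl records both by digest and by name; the record list and variable names are hypothetical.

    import hashlib
    from collections import defaultdict

    def sha1_of_file(path):
        # Content digest computed over the file body, read in chunks.
        h = hashlib.sha1()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 16), b""):
                h.update(chunk)
        return h.hexdigest()

    # Hypothetical (filename, digest) records gathered by a crawler.
    records = [("song.mp3", "ab12"), ("track01.mp3", "ab12"), ("song.mp3", "ff00")]

    names_by_hash = defaultdict(set)
    hashes_by_name = defaultdict(set)
    for name, digest in records:
        names_by_hash[digest].add(name)
        hashes_by_name[name].add(digest)

    # Identical content stored under different names...
    aliases = {d: ns for d, ns in names_by_hash.items() if len(ns) > 1}
    # ...and different files sharing the same name.
    homonyms = {n: ds for n, ds in hashes_by_name.items() if len(ds) > 1}
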
From a technical point of view, crawlers are easy to
implement, and their implementation is further simplified
when open source servents are available. On the other hand,
when the overlay network protocol is not known (for exam-
ple, in the FastTrack/Kazaa network), the use of a crawler
is extremely difficult (and has not yet been done) because it
would require a previous step for the reverse engineering of
the network protocol. For this reason studies on file-sharing
networks based on crawling are carried out basically on the
Gnutella network.
Passive probing collects data without issuing explicit
queries, but by analyzing already available file sharing traffic.
This analysis is typically carried out by intercepting and
analyzing the traffic on a network link. Fig. 2 shows the
passive approach to data collection. The traffic analyzer is

connected to a link connecting multiple physical networks.
The analyzer can gather information on the overlay network
only based on the traffic observed over the link.
Traffic interception and analysis introduce two main is-
sues to be addressed: first, the file-sharing traffic on the
link under observation must be a significant sample of the
overall file-sharing traffic; second, only the traffic related
to file sharing is relevant and must be extracted from the rest.
The first issue requires a careful selection of the link
to be observed. An analysis carried out on a scarcely popu-
lar link can lead to wrong or inaccurate conclusions be-
cause the intercepted traffic deviates substantially from the
real workload. For this reason, studies using a traffic anal-
ysis approach take into account links such as ISP back-
bones [11, 12] or the outbound links of big organizations
(e.g., companies, universities) [7].
Extracting file sharing packets from the overall traffic re-
quires a classification of the traffic over the link. From the
point of view of a file-sharing workload characterization we
can recognize three types of traffic: (1) traffic directly re-
lated to resource downloading, (2) traffic related to over-
lay network management and queries, (3) traffic unrelated
to file sharing.
As for crawling, the traffic analysis requires open
and well-documented protocols. Download is carried out
through the HTTP protocol in both the Gnutella and the
FastTrack/Kazaa network. Download analysis is hence
straightforward for both networks.
Traffic analysis also introduces significant technical is-
sues. From Section 2 we know that multiple protocols are
used for overlay network management and signaling, hence
the traffic analyzer must be able to recognize specific traffic
signatures [18]. An alternative approach is to rely only on
specific well-known ports. However, this solution is not re-
liable because file sharing servents can be configured to use
non-standard ports. This behavior has become more popu-
lar since firewalls are configured to hinder the diffusion of
file sharing by blocking protocol-specific ports.
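
A signature-based classifier of the kind described above can be sketched in a few lines of Python; the byte patterns below are illustrative examples (the Gnutella handshake string and Kazaa-specific HTTP headers), not a complete or authoritative signature set.

    # Minimal sketch of payload-signature classification with a port-based
    # fallback; patterns, labels, and ports are illustrative only.
    SIGNATURES = {
        b"GNUTELLA CONNECT/": "gnutella-signaling",
        b"GET /uri-res/": "gnutella-download",
        b"X-Kazaa-": "fasttrack-download",
    }

    def classify(payload, dst_port):
        head = payload[:512]  # inspect only the first bytes of the stream
        for pattern, label in SIGNATURES.items():
            if pattern in head:
                return label
        # Fallback on well-known default ports: unreliable, since servents
        # can be configured to use non-standard ports.
        if dst_port in (6346, 6347, 1214):
            return "possible-file-sharing"
        return "unrelated"

    print(classify(b"GNUTELLA CONNECT/0.6\r\n", 5555))  # gnutella-signaling
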
4. Survey of workload analysis
We classify the literature on workload analysis of recent
years into three broad categories:
Characterization of the resource working set: it focuses
on what resources are shared over the network. For exam-
ple, we can study the number of shared files and their pop-
ularity distribution. These analyses allow the evaluation of
the caching potential of file-sharing traffic and provide an
evaluation of the magnitude of the file sharing phenomenon.
Other interesting studies are finalized to classify the wide
heterogeneity of shared files into a few profiles, generally
based on the MIME type.
Analysis of the user behavior: it is mainly related to the
dynamic aspects of the network. A non-exhaustive list of
user behavior studies includes analyses of download starts
and abortions, and time-related patterns in the population of
users, such as the download session duration, frequency of
servent joins and leaves. Time stability of these patterns has
also been taken into account.
Characterization of the servents and of the overlay net-
work investigates the connection characteristics of the ser-
vents belonging to the network. Moreover, some researchers
have focused on the relationship between the overlay net-
work and the physical network topologies.
4.1. Characterization of the resource working set
Analyses of the working set have focused on two main top-
ics, that is, resource popularity and size.
Studies on file popularity. If we consider the studies on
popularity, three main analyses have been carried out: pop-
ularity analysis on the global resource set, based on re-
source type, and as a function of time.
In [11], Leibowitz et al. found through traffic analysis a
very skewed popularity curve in which 80% of downloads
refer to 20% of the resources. The same authors con-
firmed the observation in a subsequent study [12]. In [1],
through crawling, Andreolini et al. found that the popularity
of shared resources follows a Zipf law. On the other hand,
another study by Gummadi et al. [7] suggests that if we fo-
cus on file downloads, popularity distributions can be bet-
ter described through truncated-Zipf curves. This difference
between the results of [1] and [7] is due to the different data
collection strategies. However, the topic seems interesting
and worthy of further study.
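
One way to compare the two hypotheses is to examine the popularity curve on a log-log scale: a pure Zipf law is a straight line, while a truncated-Zipf curve flattens at the top ranks. The Python sketch below illustrates the idea on synthetic counts (not trace data).

    import numpy as np

    def loglog_slope(counts, skip_top=0):
        # Least-squares slope of log(popularity) versus log(rank); a
        # straight line of slope -alpha suggests Zipf-like behavior.
        # Skipping the top ranks excludes the flattened head of a
        # truncated-Zipf curve.
        ranked = np.sort(np.asarray(counts, dtype=float))[::-1][skip_top:]
        ranks = np.arange(skip_top + 1, skip_top + len(ranked) + 1)
        slope, _ = np.polyfit(np.log(ranks), np.log(ranked), 1)
        return slope

    counts = np.random.zipf(2.0, 10_000)        # synthetic download counts
    print(loglog_slope(counts))                 # slope over the whole curve
    print(loglog_slope(counts, skip_top=100))   # body only, head excluded
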
Another interesting study is the analysis of file type
popularity. Two analyses [11, 1] have addressed this issue
and their conclusions are the same. Fig. 3 shows the num-
ber of shared files according to their type. We aggregated the
MIME types into four groups: audio, video, archives (cor-
responding to archival data) and documents (e.g. PDF, text,
postscript files). All studies confirm that audio clips are the
most popular files, accounting for nearly 50% of files, fol-
lowed by archives, video and documents, with the latter be-
ing rather uncommon.
A final study on popularity is how popularity rank
changes over time. This analysis has been carried out by
Leibowitz et al. in [12] by studying variations in the pop-
ularity rank of the 400 most popular files. The study
identified two categories of resources: a small group of re-
sources (nearly 20% of the working set) that are charac-
terized by a stable popularity rank and the remaining 80%
of the resources that are subject to fast changes in popular-
ity.

Figure 3. File type popularity (# of files).
Studies on working set size. We can distinguish three
main studies carried out on the working set and resource
size of file sharing networks: resource size on the global
working set, resources shared by each node, and resource
size according to its type.
In [11], Leibowitz et al. provided a histogram of the file
size for the working set of the FastTrack/Kazaa network.
The study shows that a high number of shared files have
sizes of nearly 5 MB. This is consistent with the previously
reported results on audio files popularity.
A further analysis related to file size has been carried
out through crawling by Andreolini et al. [1] and provides
an analytical model for the resources shared by each node.
The study suggests that the number of bytes shared by each
node follows a distribution with a lognormal body and a
Pareto tail.
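
A distribution with a lognormal body and a Pareto tail can be sampled as a simple mixture, as in the Python sketch below; every parameter value here is a placeholder for illustration, not the fitted values reported in [1].

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_shared_bytes(n, p_tail=0.05, mu=16.0, sigma=1.5,
                            tail_min=1e9, tail_alpha=1.2):
        # Mixture model: lognormal body with probability 1 - p_tail,
        # Pareto tail (minimum tail_min, shape tail_alpha) otherwise.
        body = rng.lognormal(mu, sigma, n)
        tail = tail_min * (1.0 + rng.pareto(tail_alpha, n))
        return np.where(rng.random(n) < p_tail, tail, body)

    samples = sample_shared_bytes(10_000)
    print(np.median(samples), samples.max())
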
A final analysis on working set size is the relationship
between file MIME type and its size. Leibowitz et al. found
that file size is strongly correlated with its type. For exam-
ple, audio clips tend to be rather small (a few MB in size),
while video and archive files are at least an order of magni-
tude bigger. If we consider Fig. 4 we see that archives ac-
count for more than 75% of the global working set size,
while audio clips account for less than 10% of it. These re-
sults are particularly interesting when compared with Fig. 3:
audio files are the most common resource, but their contribu-
tion to the working set size is very small. On the other hand,
archives are the main contributors to the working set size,
even if their number is reduced. These results are confirmed
in [1]. This latter study also provides an analytical model
file size according to file type using lognormal and Pareto
distributions.
4.2. Analysis of the user behavior
Studies focusing on user behavior belong to two cate-
gories: studies aiming to define a “file sharer user profile”
and studies aiming to characterize the user activity cycles.
Figure 4. File type popularity (size).
Definition of user profile. User profile has been de-
scribed focusing on particular behaviors (considered antiso-
cial or dangerous for the network), taking into account the
time required for file download, and evaluating time-related
modifications in user behavior.
A first important contribution focusing on user profiles
aims to address the issue of freeloaders, that is, users down-
loading resources without sharing any file. The common be-
lief is that these users have an antisocial behavior, wast-
ing resources (mainly bandwidth) available in the file shar-
ing network. A study [5] based on analytical models, how-
ever, suggests that freeloaders are not necessarily a negative
aspect of the network because, while they are connected
to the network, they contribute to routing query messages.
Moreover, partially downloaded files are made available to
the network and provide additional replicas of the file be-
ing downloaded.
A further contribution to describing the file sharing net-
work user is the study of Gummadi et al. [7]. The study
provides an interesting characterization of Kazaa users by
analyzing the file sharing traffic in a university campus. A
first finding of the study is that “users are patient”: the re-
searchers found that even for small files (less than 10 MB,
typically audio files), 30% of the downloads take more than
an hour and for 10% of the resources the download takes
nearly a day. For large requests (more than 100 MB), less
than 10% take less than one hour, 50% take more than one
day and 20% of the users wait for a week to complete their
download.
The same study also outlines an interesting aging effect
on users. As users get accustomed to the Kazaa tool (i.e.,
after 3-4 weeks), the number of downloads is nearly halved
and the amount of data downloaded is reduced to one fourth
with respect to new users.
User activity characterization. User activity can be de-
scribed based on download session length, that is, the time
during which the user is downloading at least one resource,
and activity fraction, that is, the fraction of time
spent by users downloading files from the network.

Table 1. User activity parameters.
                              median     90th percentile
Activity fraction [7]         66%        100%
Download session length [7]   2.40 min   28.33 min
Session length [17]           60 min     300 min

In [7] Gummadi et al. studied both metrics through traf-
fic analysis. The results of this study are shown in Tab. 1:
the activity fraction tends to be high, with a median value
of two-thirds of the time spent in downloads. On the other hand,
each download session tends to be rather short (lasting only
a few minutes). This suggests that one download is typically
split into multiple small chunks that are downloaded sepa-
rately.
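
Both metrics can be derived from per-user download intervals by coalescing overlaps into disjoint sessions, as in the following sketch; the interval data and observation window are hypothetical.

    def merge_intervals(intervals):
        # Coalesce overlapping (start, end) download intervals into
        # disjoint download sessions.
        sessions = []
        for start, end in sorted(intervals):
            if sessions and start <= sessions[-1][1]:
                sessions[-1][1] = max(sessions[-1][1], end)
            else:
                sessions.append([start, end])
        return sessions

    # Hypothetical download intervals for one user (seconds into the trace).
    intervals = [(0, 120), (100, 300), (1000, 1150)]
    observed_time = 2000.0  # total time the user was observed

    lengths = [end - start for start, end in merge_intervals(intervals)]
    activity_fraction = sum(lengths) / observed_time
    print(lengths, activity_fraction)
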
A similar analysis carried out by Saroiu et al. [17] sug-
gests much longer sessions. However, this latter study is
carried out with a crawler. As a consequence, the session
length does not refer to the download activity, but to the
standard join/leave cycle of a servent in a file sharing net-
work. Moreover, the study described in [17] is carried out
on the Gnutella Network, while the analysis described in [7]
is based on the FastTrack/Kazaa network. These consider-
ations can explain the different results of the two studies.
Our conclusion is that this discrepancy in results is worthy
of further investigation.
4.3. Characterization of the servents and of the
overlay network
A final interesting aspect takes into account the network and
the servents. Two main topics have been analyzed: network
topology and servent connectivity.
Studies on network topology. The main issues ad-
dressed by studies on network topology are the relationship
between overlay network topology and physical IP net-
work and the structure of the overlay network.
In [16], Ripeanu et al. use a crawler to demonstrate that
the Gnutella network topology is completely different from
the physical network topology. The authors argue that this
makes the Gnutella file sharing system inefficient. The same
study suggests that the network can be described as a power-
law network. This means that the Gnutella network is com-
posed of a small number of nodes with a high out-degree
and many nodes with a reduced number of connections.
A further study carried out by Saroiu et al. through crawl-
ing [17] confirms this observation. The same study analyzes
the impact of the power-law structure on the network re-
silience and concludes that the network is highly resilient
to random node failure. On the other hand, removing just a
small amount (less than 5%) of the best connected nodes
can lead to network partitioning.

Table 2. Servent connectivity parameters.
             median    90th percentile
Latency      100 ms    900 ms
Bandwidth    1 Mb/s    20 Mb/s

In [13], a detailed study
of the Kazaa network topology is given. The authors also
try to deduce the behavior of supernodes by injecting their
clients into the FastTrack network. In particular, they have
found that supernodes tend to select the least loaded neigh-
bors.
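
The resilience asymmetry of power-law topologies is easy to reproduce in simulation. The following sketch, which uses networkx and a Barabasi-Albert graph as a stand-in for the measured topology, contrasts random failures with removal of the best-connected 5% of nodes; all parameters are illustrative.

    import random
    import networkx as nx

    g = nx.barabasi_albert_graph(n=2000, m=2, seed=1)  # power-law-like topology

    def surviving_fraction(graph, victims):
        # Fraction of nodes left in the largest connected component
        # after removing the given victims.
        h = graph.copy()
        h.remove_nodes_from(victims)
        return max(len(c) for c in nx.connected_components(h)) / graph.number_of_nodes()

    k = int(0.05 * g.number_of_nodes())
    random_victims = random.sample(list(g.nodes), k)
    hub_victims = sorted(g.nodes, key=g.degree, reverse=True)[:k]

    print("random failures :", surviving_fraction(g, random_victims))
    print("hub removal     :", surviving_fraction(g, hub_victims))
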
Characterization of servent connectivity. Studies on
servent connectivity have focused on two main pa-
rameters, that is, available bandwidth and network latency.
The main results addressing this problem are by
Saroiu et al. [17]. The authors carried out a crawl of
the Gnutella network and for each servent collected
the servent-advertised bandwidth. An analysis of la-
tency and bottleneck bandwidth has been carried out for
every discovered servent and the data have been com-
pared with the advertised information. The authors point
out that advertised data tends to underestimate the ac-
tual network connectivity. Tab. 2 shows the latency and
bandwidth found in [17]. As we can see most users are char-
acterized by DSL-class networking, as testified by the me-
dian bandwidth and latency.
The authors combine the bandwidth and latency data
with node availability information and define two peer pro-
files: a “Server” profile characterized by high bandwidth,
low latency and high availability, and a “Client” profile with
reduced connectivity and availability. The study suggests
that Gnutella is not a real peer-to-peer network because less
than 15% of the nodes fit in the “server” profile and the large
majority of peers are mainly “clients”.
5. Open issues and conclusions
In this paper we proposed an analysis of the state of the
art in file sharing workload analysis. We described the two
main approaches to data collection and we provided a taxo-
nomic classification of the literature on peer-to-peer work-
load analysis according to three main categories, that is,
analysis of the file-sharing working set, characterization of user
behavior, and analysis of network structure and characteris-
tics.
There are many open issues in workload characterization
of file sharing.
Experimental results and conclusions should be based on
multiple data collection techniques. In many analyses (e.g.,
session duration, resource popularity) the use of just a
crawler or a traffic analyzer leads to quite different results.

References

Journal ArticleDOI, 12 Nov 2000
TL;DR: OceanStore monitoring of usage patterns allows adaptation to regional outages and denial of service attacks; monitoring also enhances performance through pro-active movement of data.

Proceedings ArticleDOI, 10 Dec 2001
TL;DR: This measurement study seeks to precisely characterize the population of end-user hosts that participate in Napster and Gnutella, and shows that there is significant heterogeneity and lack of cooperation across peers participating in these systems.

Proceedings ArticleDOI, 21 Oct 2001
TL;DR: The Cooperative File System is a new peer-to-peer read-only storage system that provides provable guarantees for the efficiency, robustness, and load-balance of file storage and retrieval with a completely decentralized architecture that can scale to large systems.

Proceedings ArticleDOI, 25 Aug 2003
TL;DR: This work proposes several modifications to Gnutella's design that dynamically adapt the overlay topology and the search algorithms in order to accommodate the natural heterogeneity present in most peer-to-peer systems.

Journal ArticleDOI, 19 Oct 2003
TL;DR: Unlike the Web, whose workload is driven by document change, it is demonstrated that clients' fetch-at-most-once behavior, the creation of new objects, and the addition of new clients to the system are the primary forces that drive multimedia workloads such as Kazaa.
Frequently Asked Questions

Q1. What have the authors contributed in "Peer-to-peer workload characterization: techniques and open issues"?

In this paper, the authors focus on workload characterization of file-sharing systems, which should be at the basis of performance evaluation and investigations for possible improvements. The contribution of the paper is twofold: first, the authors provide a classification of related studies on file-sharing workload by distinguishing the main considered information and the mechanisms and tools that have been used for data collection; second, they point out open issues in file-sharing workload characterization and suggest novel approaches to workload studies. In particular, the authors outline three fields that are worth additional study in the future:

  • analysis of file sharing workload carried out by combining both crawling and traffic analysis;
  • geography-related analysis of file sharing downloads;
  • IP packet capture and analysis over high-capacity links, addressing the trade-off between accuracy in traffic analysis and computational load.