Open Access

Workload and Failure Characterization on a Large-Scale Federated Testbed

Brent N. Chun, +1 more
TLDR
This paper presents a detailed characterization of the actual use of the PlanetLab network testbed. Using a variety of measurement tools, it reports a three-month study of the network, CPU, memory, and disk usage of individual PlanetLab nodes and sites.
Abstract: 
Recently, a number of federated distributed computational and communication infrastructures have emerged, including the Grid, PlanetLab, and Content Distribution Networks. In these environments, mutually distrustful autonomous domains pool resources for their mutual benefit, for instance to gain access to unique computational resources, multiple vantage points on the network, or more computation than is available locally. Key challenges for such federated infrastructures include resource allocation, scheduling, and constructing highly available services in the face of faulty end hosts and unpredictable network behavior. Developing appropriate mechanisms and policies requires an understanding of the usage characteristics and operating environment of the target system. In this paper, we present a detailed characterization of the actual use of the PlanetLab network testbed. PlanetLab consists of 240 nodes spread across 100 autonomous domains with over 500 active users. Using a variety of measurement tools, we present a three-month study of the network, CPU, memory, and disk usage of individual PlanetLab nodes and sites. On the consumer side, we further characterize the consumption of individual users. Next, we present results on the availability and reliability of system nodes and the network interconnecting them. Finally, we discuss the implications of our measurements for emerging federated environments.


Citations
Proceedings ArticleDOI

Exploring event correlation for failure prediction in coalitions of clusters

TL;DR: A spherical covariance model with an adjustable timescale parameter to quantify temporal correlation and a stochastic model to describe spatial correlation are developed to cluster failure events based on their correlations and predict their future occurrences.
Proceedings ArticleDOI

Mirage: a microeconomic resource allocation system for sensornet testbeds

TL;DR: It is argued that a microeconomic resource allocation scheme, specifically the combinatorial auction, is well suited to testbed resource management; to demonstrate this, the Mirage resource allocation system is presented.
Proceedings Article

Subtleties in tolerating correlated failures in wide-area storage systems

TL;DR: This paper systematically revisits previously proposed techniques for addressing correlated failures and identifies a set of design principles that system builders can use to tolerate correlated failures.

Beyond Availability: Towards a Deeper Understanding of Machine Failure Characteristics in Large Distributed Systems

TL;DR: This paper analyzes traces from three large distributed systems to answer several subtle questions regarding machine failure characteristics and derives a set of fundamental principles for designing highly available distributed systems.
Proceedings ArticleDOI

Multi-state grid resource availability characterization

TL;DR: This paper introduces five availability states, and characterizes a Condor pool trace that uncovers when, how, and why its resources reside in, and transition between, these states, which suggests resource categories that schedulers can use to make better mapping decisions.