Multi-class latency-bounded Web services

doi:10.1109/IWQOS.2000.847959

Open Access | Proceedings Article

V. Kanodia, +1 more

- pp 231-239

TLDR

The goal of this work is to design a general "front-end" algorithm that uses a general service abstraction not only to adaptively control the latency of a particular class but also to assess inter-class relationships.

Abstract:

Two recent advances have resulted in significant improvements in Web server quality of service. First, both centralized and distributed Web servers can provide isolation among service classes by fairly distributing system resources. Second, session admission control can protect classes from performance degradation due to overload. The goal of this work is to design a general "front-end" algorithm that uses these two building blocks to support a new Web service model, namely, multi-class services that control response latencies to within pre-specified targets. Our key technique is a general service abstraction that is used not only to adaptively control the latency of a particular class but also to assess inter-class relationships. In this way, we capture the extent to which classes are isolated or share system resources (as determined by the server architecture and system internals) and hence their effects on each other's QoS. For example, if the server provides class isolation (i.e., a minimum fraction of system resources independent of other classes), yet also allows a class to utilize unused resources from other classes, the algorithm infers and exploits this behavior without an explicit low-level model of the server. Thus, as new functionalities are incorporated into Web servers, the approach naturally exploits their properties to efficiently satisfy the classes' performance targets. We validate the scheme with trace-driven simulations.
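
As a rough illustration of the front-end idea described in the abstract, the sketch below shows a measurement-based admission controller that treats the server as a black box. It is not the authors' algorithm: the class name LatencyBoundedFrontEnd, the sliding-window mean, the headroom factor, and the example targets are all assumptions made for illustration. It admits a new session only while the recently measured latency of every class stays below a fraction of that class's target.

```python
from collections import deque
from statistics import mean


class LatencyBoundedFrontEnd:
    """Hypothetical black-box admission controller (illustrative only)."""

    def __init__(self, targets, window=100, headroom=0.9):
        # targets: class name -> latency target in seconds (assumed values)
        self.targets = dict(targets)
        # admit only while measured latency <= headroom * target (assumed rule)
        self.headroom = headroom
        # one sliding window of recent latency samples per class
        self.samples = {c: deque(maxlen=window) for c in self.targets}

    def record(self, cls, latency):
        """Store the measured response latency of a completed request."""
        self.samples[cls].append(latency)

    def within_target(self, cls):
        """True if the class has no samples yet, or if its recent mean
        latency is at or below headroom * target."""
        window = self.samples[cls]
        if not window:
            return True
        return mean(window) <= self.headroom * self.targets[cls]

    def admit(self, cls):
        """Admit a new session of `cls` only if every class is within target.

        Checking all classes is a crude stand-in for assessing inter-class
        effects: when classes share resources, load admitted for one class
        shows up in the measured latency of the others.
        """
        if cls not in self.targets:
            raise KeyError(f"unknown service class: {cls}")
        return all(self.within_target(c) for c in self.targets)


if __name__ == "__main__":
    fe = LatencyBoundedFrontEnd({"gold": 0.5, "silver": 2.0})
    fe.record("gold", 0.2)
    fe.record("silver", 1.0)
    print(fe.admit("silver"))  # both classes are within their targets
    fe.record("gold", 0.9)     # pushes the gold mean above 0.9 * 0.5
    print(fe.admit("silver"))  # silver is refused to protect gold
```

This rule is deliberately conservative. If the server isolates classes, rejecting one class because another is near its bound wastes capacity, and that is exactly the kind of inter-class behavior the abstract says the real scheme infers from measurements rather than assumes.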

Citations

Proceedings Article

An analytical model for multi-tier internet services and its applications

Bhuvan Urgaonkar, +4 more

TL;DR: This paper presents a model based on a network of queues, where the queues represent different tiers of the application, sufficiently general to capture the behavior of tiers with significantly different performance characteristics and application idiosyncrasies such as session-based workloads, concurrency limits, and caching at intermediate tiers.

Journal Article

Performance guarantees for Web server end-systems: a control-theoretical approach

Tarek Abdelzaher, +2 more

- 01 Jan 2002 -

IEEE Transactions on Parallel and Distri...

TL;DR: This paper uses feedback control theory to achieve overload protection, performance guarantees, and service differentiation in the presence of load unpredictability, and shows that control-theoretic techniques offer a sound way of achieving desired performance in performance-critical Internet applications.

Journal Article

Agile dynamic provisioning of multi-tier Internet applications

Bhuvan Urgaonkar, +4 more

- 27 Mar 2008 -

ACM Transactions on Autonomous and Adapt...

TL;DR: A novel dynamic provisioning technique for multi-tier Internet applications that employs a flexible queuing model to determine how much of the resources to allocate to each tier of the application, and a combination of predictive and reactive methods that determine when to provision these resources, both at large and small time scales is proposed.

Journal Article

Session-based admission control: a mechanism for peak load management of commercial Web sites

Ludmila Cherkasova, +1 more

- 01 Jun 2002 -

IEEE Transactions on Computers

TL;DR: It is shown that a Web server augmented with the admission control mechanism is able to provide a fair guarantee of completion, for any accepted session, independent of a session length, which is a critical requirement for any e-business.

Proceedings Article

Adaptive overload control for busy internet servers

Matt Welsh, +1 more

TL;DR: This paper presents a set of techniques for managing overload in complex, dynamic Internet services based on an adaptive admission control mechanism that attempts to bound the 90th-percentile response time of requests flowing through the service.
