Proceedings Article

Web caching and Zipf-like distributions: evidence and implications

TLDR
This paper investigates the page request distribution seen by Web proxy caches using traces from a variety of sources. It considers a simple model in which Web accesses are independent and the reference probability of documents follows a Zipf-like distribution, and the results suggest that the observed properties of hit-ratios and temporal locality are indeed inherent to Web accesses observed by proxies.
Abstract
This paper addresses two unresolved issues about Web caching. The first issue is whether Web requests from a fixed user community are distributed according to Zipf's (1929) law. The second issue relates to a number of studies on the characteristics of Web proxy traces, which have shown that the hit-ratios and temporal locality of the traces exhibit certain asymptotic properties that are uniform across the different sets of traces. In particular, the question is whether these properties are inherent to Web accesses or whether they are simply an artifact of the traces. An answer to these unresolved issues will facilitate both Web cache resource planning and cache hierarchy design. We show that the answers to the two questions are related. We first investigate the page request distribution seen by Web proxy caches using traces from a variety of sources. We find that the distribution does not follow Zipf's law precisely, but instead follows a Zipf-like distribution with the exponent varying from trace to trace. Furthermore, we find that there is only (i) a weak correlation between the access frequency of a Web page and its size and (ii) a weak correlation between the access frequency of a page and its rate of change. We then consider a simple model where the Web accesses are independent and the reference probability of the documents follows a Zipf-like distribution. We find that the model yields asymptotic behaviours that are consistent with the experimental observations, suggesting that the various observed properties of hit-ratios and temporal locality are indeed inherent to Web accesses observed by proxies. Finally, we revisit Web cache replacement algorithms and show that the algorithm that is suggested by this simple model performs best on real trace data.
The results indicate that while page requests do indeed reveal short-term correlations and other structures, a simple model for an independent request stream following a Zipf-like distribution is sufficient to capture certain asymptotic properties observed at Web proxies.
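The simple model described in the abstract (independent requests whose document popularity follows a Zipf-like distribution) can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the authors' code: the function names, the document count, and the exponent value 0.8 are all hypothetical choices made for the example, and the cache is an idealized one that permanently holds the most popular documents rather than any specific replacement algorithm from the paper.

```python
import random
from itertools import accumulate


def zipf_like_probs(n_docs, alpha):
    """Reference probabilities proportional to 1 / rank**alpha.

    alpha = 1 would be Zipf's law proper; other exponents give a
    "Zipf-like" distribution. The value passed in is an assumption.
    """
    weights = [1.0 / (rank ** alpha) for rank in range(1, n_docs + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def ideal_hit_ratio(probs, cache_size):
    """Hit ratio of an idealized cache holding the cache_size most popular
    documents, when requests are independent draws from probs."""
    return sum(probs[:cache_size])


def simulate_hit_ratio(probs, cache_size, n_requests, seed=0):
    """Draw independent requests and count hits against the same
    idealized cache (documents are indexed by popularity rank - 1)."""
    rng = random.Random(seed)
    cum_weights = list(accumulate(probs))
    requests = rng.choices(range(len(probs)), cum_weights=cum_weights, k=n_requests)
    cached = set(range(cache_size))
    hits = sum(1 for doc in requests if doc in cached)
    return hits / n_requests


if __name__ == "__main__":
    # Hypothetical parameters chosen only for illustration.
    probs = zipf_like_probs(n_docs=1000, alpha=0.8)
    for size in (10, 100, 500):
        print(size, ideal_hit_ratio(probs, size), simulate_hit_ratio(probs, size, 50000))
```

Running the script prints, for each cache size, the analytic hit ratio next to the simulated one; under the independence assumption the two should agree closely, and increasing the cache size shows how slowly the hit ratio grows once the most popular documents are already cached.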

