Class-based cache management for dynamic Web content
22 Apr 2001-Vol. 3, pp 1215-1224
TL;DR: The experimental results show that the proposed techniques are effective in supporting coarse-grain cache management and reducing server response times for tested applications.
Abstract: Caching dynamic pages at a server site is beneficial in reducing server resource demands and it also helps dynamic page caching at proxy sites. Previous work has used fine-grain dependence graphs among individual dynamic pages and underlying data sets to enforce result consistency. This paper proposes a complementary solution for applications that require coarse-grain cache management. The key idea is to partition dynamic pages into classes based on URL patterns so that an application can specify page identification and data dependence, and invoke invalidation for a class of dynamic pages. To make this scheme time-efficient with small space requirement, lazy invalidation is used to minimize slow disk accesses when IDs of dynamic pages are stored in memory with a digest format. Selective precomputing is further proposed to refresh stale pages and smoothen load peaks. A data structure is developed for efficient URL class searching during lazy or eager invalidation. This paper also presents design and implementation of a caching system called Cachuma which integrates the above techniques, runs in tandem with standard Web servers, and allows Web sites to add dynamic page caching capability with minimal changes. The experimental results show that the proposed techniques are effective in supporting coarse-grain cache management and reducing server response times for tested applications.
Citations
More filters
••
IBM1
TL;DR: This article classifies and describes main mechanisms to split the traffic load among the server nodes, discussing both the alternative architectures and the load sharing policies.
Abstract: The overall increase in traffic on the World Wide Web is augmenting user-perceived response times from popular Web sites, especially in conjunction with special events. System platforms that do not replicate information content cannot provide the needed scalability to handle large traffic volumes and to match rapid and dramatic changes in the number of clients. The need to improve the performance of Web-based services has produced a variety of novel content delivery architectures. This article will focus on Web system architectures that consist of multiple server nodes distributed on a local area, with one or more mechanisms to spread client requests among the nodes. After years of continual proposals of new system solutions, routing mechanisms, and policies (the first dated back to 1994 when the NCSA Web site had to face the first million of requests per day), many problems concerning multiple server architectures for Web sites have been solved. Other issues remain to be addressed, especially at the network application layer, but the main techniques and methodologies for building scalable Web content delivery architectures placed in a single location are settled now. This article classifies and describes main mechanisms to split the traffic load among the server nodes, discussing both the alternative architectures and the load sharing policies. To this purpose, it focuses on architectures, internal routing mechanisms, and dispatching request algorithms for designing and implementing scalable Web-server systems under the control of one content provider. It identifies also some of the open research issues associated with the use of distributed systems for highly accessed Web sites.
525 citations
••
21 Jul 2002TL;DR: This paper proposes and evaluates decentralized web caching algorithms for Squirrel, and discovers that it exhibits performance comparable to a centralized web cache in terms of hit ratio, bandwidth usage and latency.
Abstract: This paper presents a decentralized, peer-to-peer web cache called Squirrel. The key idea is to enable web browsers on desktop machines to share their local caches, to form an efficient and scalable web cache, without the need for dedicated hardware and the associated administrative cost. We propose and evaluate decentralized web caching algorithms for Squirrel, and discover that it exhibits performance comparable to a centralized web cache in terms of hit ratio, bandwidth usage and latency. It also achieves the benefits of decentralization, such as being scalable, self-organizing and resilient to node failures, while imposing low overhead on the participating nodes.
429 citations
Cites methods from "Class-based cache management for dy..."
...the study[13] has provided several methods to cache dynamic web contents....
[...]
•
31 May 2007TL;DR: In this article, the authors present methods and systems for configuring and managing class-based condensation, which includes an automated mechanism to receive a user request for a document, retrieve the document from a content server, and compare the document to a base file of a document class.
Abstract: The present application discloses methods and systems for configuring and managing class-based condensation. One aspect thereof includes an automated mechanism to receive a user request for a document, retrieve the document from a content server, and compare the document to a base file of a document class. If a delta-difference between the requested document and the base file is less than a predetermined threshold value, create a condensed document by abbreviating redundancy in the requested document relative to the base file of the associated document class, associate the document with the document class, and transmit the condensed document to the user. The documents within a document class possess similar layouts.
420 citations
••
TL;DR: This survey aims to provide a thorough review of a wide range of in-memory data management and processing proposals and systems, including both data storage systems and data processing frameworks.
Abstract: Growing main memory capacity has fueled the development of in-memory big data management and processing. By eliminating disk I/O bottleneck, it is now possible to support interactive data analytics. However, in-memory systems are much more sensitive to other sources of overhead that do not matter in traditional I/O-bounded disk-based systems. Some issues such as fault-tolerance and consistency are also more challenging to handle in in-memory environment. We are witnessing a revolution in the design of database systems that exploits main memory as its data storage layer. Many of these researches have focused along several dimensions: modern CPU and memory hierarchy utilization, time/space efficiency, parallelism, and concurrency control. In this survey, we aim to provide a thorough review of a wide range of in-memory data management and processing proposals and systems, including both data storage systems and data processing frameworks. We also give a comprehensive presentation of important technology in memory management, and some key factors that need to be considered in order to achieve efficient in-memory data management and processing.
391 citations
Cites methods from "Class-based cache management for dy..."
...Full-page caching [269], [270], [271] was adopted in...
[...]
•
11 Jun 2003
TL;DR: In this article, techniques for anticipating a user's request for documents or other content from a server (typically via a URL), precomputing the anticipated content, and caching the precomputed information at a cache in proximity to the content server are disclosed.
Abstract: Techniques are disclosed for anticipating a user's request for documents or other content from a server (typically via a URL), precomputing the anticipated content, and caching the precomputed information at a cache in proximity to the content server. The cache stores the response to the anticipated request, until the user requests the same content. The anticipated requests can be precomputed based on triggers reflecting users' historical access patterns.
180 citations
References
More filters
01 Jan 2001
TL;DR: Here the authors haven’t even started the project yet, and already they’re forced to answer many questions: what will this thing be named, what directory will it be in, what type of module is it, how should it be compiled, and so on.
Abstract: Writers face the blank page, painters face the empty canvas, and programmers face the empty editor buffer. Perhaps it’s not literally empty—an IDE may want us to specify a few things first. Here we haven’t even started the project yet, and already we’re forced to answer many questions: what will this thing be named, what directory will it be in, what type of module is it, how should it be compiled, and so on.
6,547 citations
"Class-based cache management for dy..." refers background in this paper
...trie [19] is a string tree in which common pre xes are represented by tree branches, and the unique su xes are stored at the leaves....
[...]
••
21 Mar 1999TL;DR: This paper investigates the page request distribution seen by Web proxy caches using traces from a variety of sources and considers a simple model where the Web accesses are independent and the reference probability of the documents follows a Zipf-like distribution, suggesting that the various observed properties of hit-ratios and temporal locality are indeed inherent to Web accesse observed by proxies.
Abstract: This paper addresses two unresolved issues about Web caching. The first issue is whether Web requests from a fixed user community are distributed according to Zipf's (1929) law. The second issue relates to a number of studies on the characteristics of Web proxy traces, which have shown that the hit-ratios and temporal locality of the traces exhibit certain asymptotic properties that are uniform across the different sets of the traces. In particular, the question is whether these properties are inherent to Web accesses or whether they are simply an artifact of the traces. An answer to these unresolved issues will facilitate both Web cache resource planning and cache hierarchy design. We show that the answers to the two questions are related. We first investigate the page request distribution seen by Web proxy caches using traces from a variety of sources. We find that the distribution does not follow Zipf's law precisely, but instead follows a Zipf-like distribution with the exponent varying from trace to trace. Furthermore, we find that there is only (i) a weak correlation between the access frequency of a Web page and its size and (ii) a weak correlation between access frequency and its rate of change. We then consider a simple model where the Web accesses are independent and the reference probability of the documents follows a Zipf-like distribution. We find that the model yields asymptotic behaviour that are consistent with the experimental observations, suggesting that the various observed properties of hit-ratios and temporal locality are indeed inherent to Web accesses observed by proxies. Finally, we revisit Web cache replacement algorithms and show that the algorithm that is suggested by this simple model performs best on real trace data. The results indicate that while page requests do indeed reveal short-term correlations and other structures, a simple model for an independent request stream following a Zipf-like distribution is sufficient to capture certain asymptotic properties observed at Web proxies.
3,582 citations
"Class-based cache management for dy..." refers background in this paper
...0, a distribution which has been observed in many human behaviors and Internet related scenarios [16], [17]....
[...]
•
01 Apr 1992
TL;DR: This document describes the MD5 message-digest algorithm, which takes as input a message of arbitrary length and produces as output a 128-bit "fingerprint" or "message digest" of the input.
Abstract: This document describes the MD5 message-digest algorithm. The
algorithm takes as input a message of arbitrary length and produces as
output a 128-bit "fingerprint" or "message digest" of the input. This
memo provides information for the Internet community. It does not
specify an Internet standard.
3,514 citations
"Class-based cache management for dy..." refers methods in this paper
...Squid uses the 16-byte MD5 digests [ 10 ] computed from page URLs as page IDs and the space saving by using MD5 could be tremendous because the length of a URL can be several hundred bytes....
[...]