Showing papers on "Cache algorithms published in 2001"

PDF

Open Access

Proceedings Article•DOI•

Cache decay: exploiting generational behavior to reduce cache leakage power

[...]

Stefanos Kaxiras¹, Zhigang Hu², Margaret Martonosi²•Institutions (2)

01 May 2001

TL;DR: This paper discusses policies and implementations for reducing cache leakage by invalidating and “turning off” cache lines when they hold data not likely to be reused, and proposes adaptive policies that effectively reduce LI cache leakage energy by 5x for the SPEC2000 with only negligible degradations in performance.

...read moreread less

Abstract: Power dissipation is increasingly important in CPUs ranging from those intended for mobile use, all the way up to high-performance processors for high-end servers. While the bulk of the power dissipated is dynamic switching power, leakage power is also beginning to be a concern. Chipmakers expect that in future chip generations, leakage's proportion of total chip power will increase significantly.This paper examines methods for reducing leakage power within the cache memories of the CPU. Because caches comprise much of a CPU chip's area and transistor counts, they are reasonable targets for attacking leakage. We discuss policies and implementations for reducing cache leakage by invalidating and “turning off” cache lines when they hold data not likely to be reused. In particular, our approach is targeted at the generational nature of cache line usage. That is, cache lines typically have a flurry of frequent use when first brought into the cache, and then have a period of “dead time” before they are evicted. By devising effective, low-power ways of deducing dead time, our results show that in many cases we can reduce LI cache leakage energy by 4x in SPEC2000 applications without impacting performance. Because our decay-based techniques have notions of competitive on-line algorithms at their roots, their energy usage can be theoretically bounded at within a factor of two of the optimal oracle-based policy. We also examine adaptive decay-based policies that make energy-minimizing policy choices on a per-application basis by choosing appropriate decay intervals individually for each cache line. Our proposed adaptive policies effectively reduce LI cache leakage energy by 5x for the SPEC2000 with only negligible degradations in performance.

...read moreread less

725 citations

Proceedings Article•

Weaving Relations for Cache Performance

[...]

Anastassia Ailamaki¹, David J. DeWitt², Mark D. Hill², Marios Skounakis²•Institutions (2)

Carnegie Mellon University¹, University of Wisconsin-Madison²

11 Sep 2001

TL;DR: This paper proposes a new data organization model called PAX (Partition Attributes Across), that significantly improves cache performance by grouping together all values of each attribute within each page, and demonstrates that in-page data placement is the key to high cache performance.

...read moreread less

Abstract: Relational database systems have traditionally optimzed for I/O performance and organized records sequentially on disk pages using the N-ary Storage Model (NSM) (a.k.a., slotted pages). Recent research, however, indicates that cache utilization and performance is becoming increasingly important on modern platforms. In this paper, we first demonstrate that in-page data placement is the key to high cache performance and that NSM exhibits low cache utilization on modern platforms. Next, we propose a new data organization model called PAX (Partition Attributes Across), that significantly improves cache performance by grouping together all values of each attribute within each page. Because PAX only affects layout inside the pages, it incurs no storage penalty and does not affect I/O behavior. According to our experimental results, when compared to NSM (a) PAX exhibits superior cache and memory bandwidth utilization, saving at least 75% of NSM’s stall time due to data cache accesses, (b) range selection queries and updates on memoryresident relations execute 17-25% faster, and (c) TPC-H queries involving I/O execute 11-48% faster.

...read moreread less

428 citations

Proceedings Article•DOI•

Reducing set-associative cache energy via way-prediction and selective direct-mapping

[...]

Michael D. Powell¹, Amit Agarwal¹, T. N. Vijaykumar¹, Babak Falsafi², Kaushik Roy¹ - Show less +1 more•Institutions (2)

Purdue University¹, Carnegie Mellon University²

01 Dec 2001

TL;DR: Two previously-proposed techniques, way-prediction and selective direct-mapping, are applied to reducing L1 cache dynamic energy while maintaining high performance, and caches achieve the energy-delay of sequential access while maintaining the performance of parallel access.

...read moreread less

Abstract: Set-associative caches achieve low miss rates for typical applications but result in significant energy dissipation. Set-associative caches minimize access time by probing all the data ways in parallel with the tag lookup, although the output of only the matching way is used. The energy spent accessing the other ways is wasted Eliminating the wasted energy by performing the data lookup sequentially following the tag lookup substantially increases cache access time, and is unacceptable for high-performance L1 caches. In this paper, we apply two previously-proposed techniques, way-prediction and selective direct-mapping, to reducing L1 cache dynamic energy while maintaining high performance. The techniques predict the matching way and probe only the predicted way and not all the ways, achieving energy savings. While these techniques were originally proposed to improve set-associative cache access times, this is the first paper to apply them to reducing cache energy. We evaluate the effectiveness of these techniques in reducing L1 d-cache, L1 i-cache, and overall processor energy. Using these techniques, our caches achieve the energy-delay of sequential access while maintaining the performance of parallel access. Relative to parallel access L1 i- and d-caches, the techniques achieve overall processor energy-delay reduction of 8%, while perfect way-prediction with no performance degradation achieves 10% reduction. The performance degradation of the techniques is less than 3%, compared to an aggressive,.1-cycle, 4-way, parallel access cache.

...read moreread less

310 citations

Patent•

Scalable architecture based on single-chip multiprocessing

[...]

Luiz Andre Barroso, Kourosh Gharachorloo, Andreas Nowatzyk

08 Jun 2001

TL;DR: The PIRANHA system as discussed by the authors is a scalable chip-multiprocessing system with scalable architecture, including on a single chip: a plurality of processor cores; a two-level cache hierarchy; an intra-chip switch; one or more memory controllers; a cache coherence protocol; and an interconnect subsystem.

...read moreread less

Abstract: A chip-multiprocessing system with scalable architecture, including on a single chip: a plurality of processor cores; a two-level cache hierarchy; an intra-chip switch; one or more memory controllers; a cache coherence protocol; one or more coherence protocol engines; and an interconnect subsystem The two-level cache hierarchy includes first level and second level caches In particular, the first level caches include a pair of instruction and data caches for, and private to, each processor core The second level cache has a relaxed inclusion property, the second-level cache being logically shared by the plurality of processor cores Each of the plurality of processor cores is capable of executing an instruction set of the ALPHA™ processing core The scalable architecture of the chip-multiprocessing system is targeted at parallel commercial workloads A showcase example of the chip-multiprocessing system, called the PIRANHA™ system, is a highly integrated processing node with eight simpler ALPHA™ processor cores A method for scalable chip-multiprocessing is also provided

...read moreread less

294 citations

Patent•

Apparatus and methods for providing coordinated and personalized application and data management for resource-limited mobile devices

[...]

Luosheng Peng

01 Feb 2001

TL;DR: In this article, the authors propose a method for managing information in a mobile device consisting of downloading a first set of files, determining whether a local cache has enough space to store the first set, storing the first sets of files into the local cache, selecting an out-dated record and removing a second set of records corresponding to the out-of-date record from the local caches if the local cached record does not have enough space.

...read moreread less

Abstract: An exemplary method for managing information in a mobile device comprises the steps of downloading a first set of files, determining whether a local cache has enough space to store the first set of files, storing the first set of files into the local cache if the local cache has enough space, selecting an out-dated record and removing a second set of files corresponding to the out-dated record from the local cache if the local cache does not have enough space, and repeating the determining step until the first set of files is stored into the local cache.

...read moreread less

235 citations

Patent•

System and method for using a mapping between client addresses and addresses of caches to support content delivery

[...]

J.J. Garcia-Luna-Aceves, Bradley R. Smith

26 Apr 2001

TL;DR: In this article, the authors present a selection procedure for information object repository selection procedures for determining which of a number of information object repositories should service a request for the information object, including a direct cache selection process, a redirect cache selection, a remote DNS cache, or a local DNS cache selection.

...read moreread less

Abstract: Various information object repository selection procedures for determining which of a number of information object repositories should service a request for the information object include a direct cache selection process, a redirect cache selection process, a remote DNS cache selection process, or a local DNS cache selection process. Different combinations of these procedures may also be used. For example different combination may be used depending on the type of content being requested. The direct cache selection process may be used for information objects that will be immediately loaded without user action, while any of the redirect cache selection process, the remote DNS cache selection process and/or the local DNS cache selection process may be used for information objects that will be loaded only after some user action.

...read moreread less

231 citations

Patent•

Multi-tier caching system

[...]

Lawrence Jacobs¹, Alan J. Demers¹, Norman C. Woo¹•Institutions (1)

Business International Corporation¹

13 Aug 2001

TL;DR: In this article, a content analysis engine determines which of the caches a data item should be stored in, based on an analysis of data requests or data items served in response to the requests, guidelines set by a system administrator, etc.

...read moreread less

Abstract: A multi-tier caching system and method of operating the same. The system comprises a first cache implemented in operating system or kernel space (e.g., in memory managed by or allocated to an operating system) and a second cache implemented in application or user space (e.g., in memory managed by or allocated to an application program). Data requests requiring little processing to identify responsive data may be served from the first cache, while those requiring further processing are served from the second. The first cache may therefore store frequently requested data items or items that can be served in response to requests having different forms, qualifiers or other indicia. A content analysis engine determines which of the caches a data item should be stored in, based on an analysis of data requests or data items served in response to the requests, guidelines set by a system administrator, etc.

...read moreread less

220 citations

Proceedings Article•DOI•

Adaptive precision setting for cached approximate values

[...]

Christopher Olston¹, Boon Thau Loo¹, Jennifer Widom¹•Institutions (1)

Stanford University¹

01 May 2001

TL;DR: A parameterized algorithm for adjusting the precision of cached approximations adaptively to achieve the best performance as data values, precision requirements, or workload vary, which easily outperforms previous algorithms for exact caching.

...read moreread less

Abstract: Caching approximate values instead of exact values presents an opportunity for performance gains in exchange for decreased precision. To maximize the performance improvement, cached approximations must be of appropriate precision: approximations that are too precise easily become invalid, requiring frequent refreshing, while overly imprecise approximations are likely to be useless to applications, which must then bypass the cache. We present a parameterized algorithm for adjusting the precision of cached approximations adaptively to achieve the best performance as data values, precision requirements, or workload vary. We consider interval approximations to numeric values but our ideas can be extended to other kinds of data and approximations. Our algorithm strictly generalizes previous adaptive caching algorithms for exact copies: we can set parameters to require that all approximations be exact, in which case our algorithm dynamically chooses whether or not to cache each data value.We have implemented our algorithm and tested it on synthetic and real-world data. A number of experimental results are reported, showing the effectiveness of our algorithm at maximizing performance, and also showing that in the special case of exact caching our algorithm performs as well as previous algorithms. In cases where bounded imprecision is acceptable, our algorithm easily outperforms previous algorithms for exact caching.

...read moreread less

219 citations

Proceedings Article•

Reuse Distance as a Metric for Cache Behavior.

[...]

Kristof Beyls¹, Erik H. D'Hollander¹•Institutions (1)

Ghent University¹

01 Jan 2001

TL;DR: The distribution of the conflict and capacity misses was measured in the execution of code generated by a state-of-the-art EPIC compiler and it is observed that some program transformations to enhance the parallelism may counter the optimizations to reduce the capacity misses.

...read moreread less

Abstract: The widening gap between memory and processor speed causes more and more programs to shift from CPUbounded to memory speed-bounded, even in the presence of multi-level caches. Powerful cache optimizations are needed to improve the cache behavior and increase the execution speed of these programs. Many optimizations have been proposed, and one can wonder what new optimizations should focus on. To answer this question, the distribution of the conflict and capacity misses was measured in the execution of code generated by a state-of-the-art EPIC compiler. The results show that cache conflict misses are reduced, but only a small fraction of the large number of capacity misses are eliminated. Furthermore, it is observed that some program transformations to enhance the parallelism may counter the optimizations to reduce the capacity misses. In order to minimize the capacity misses, the effect of program transformations and hardware solutions are explored and examples show that a directed approach can be very effective.

...read moreread less

200 citations

Patent•

Hardware and software co-simulation including simulating the cache of a target processor

[...]

Graham R. Hellestrand, King Yin Cheung, James R. Torossian, Ricky L. K. Chan, Ming Chi Kam, Foo Ngok Yong - Show less +2 more

24 Jan 2001

TL;DR: In this paper, a co-simulation design system that runs on a host computer system is described that includes a hardware simulator and a processor simulator coupled via an interface mechanism, and the analysis adds timing information to the user program so that the processor simulator provides accurate timing information whenever the simulator interacts with the hardware simulator.

...read moreread less

Abstract: A co-simulation design system that runs on a host computer system is described that includes a hardware simulator and a processor simulator coupled via an interface mechanism. The execution of a user program on a target processor that includes a cache is simulated by executing an analyzed version of the user program on the host computer system. The analysis adds timing information to the user program so that the processor simulator provides accurate timing information whenever the processor simulator interacts with the hardware simulator. The analysis also adds hooks to the user program such that executing the analyzed user program on the host computer system invokes a cache simulator that simulates operation of the cache.

...read moreread less

180 citations

Proceedings Article•DOI•

Exact analysis of the cache behavior of nested loops

[...]

Siddhartha Chatterjee¹, Erin Parker¹, Philip J. Hanlon², Alvin R. Lebeck³•Institutions (3)

University of North Carolina at Chapel Hill¹, University of Michigan², Duke University³

01 May 2001

TL;DR: An exact model of the behavior of loop nests executing in a memory hicrarchy is developed by using a nontraditional classification of misses that has the key property of composability, allowing the model to gain efficiency in counting cache misses by exploiting repetitive patterns of cache behavior.

...read moreread less

Abstract: We develop from first principles an exact model of the behavior of loop nests executing in a memory hicrarchy, by using a nontraditional classification of misses that has the key property of composability. We use Presburger formulas to express various kinds of misses as well as the state of the cache at the end of the loop nest. We use existing tools to simplify these formulas and to count cache misses. The model is powerful enough to handle imperfect loop nests and various flavors of non-linear array layouts based on bit interleaving of array indices. We also indicate how to handle modest levels of associativity, and how to perform limited symbolic analysis of cache behavior. The complexity of the formulas relates to the static structure of the loop nest rather than to its dynamic trip count, allowing our model to gain efficiency in counting cache misses by exploiting repetitive patterns of cache behavior. Validation against cache simulation confirms the exactness of our formulation. Our method can serve as the basis for a static performance predictor to guide program and data transformations to improve performance.

...read moreread less

Journal Article•DOI•

Speculative Versioning Cache

[...]

T. N. Vijaykumar¹, S. Gopal², James E. Smith³, Gurindar S. Sohi³•Institutions (3)

Purdue University¹, Alcatel-Lucent², University of Wisconsin-Madison³

01 Dec 2001-IEEE Transactions on Parallel and Distributed Systems

TL;DR: The proposed Speculative Versioning Cache uses distributed caches to eliminate the latency and bandwidth problems of the ARB and conceptually unifies cache coherence and speculative versioning by using an organization similar to snooping bus-based coherent caches.

...read moreread less

Abstract: Dependences among loads and stores whose addresses are unknown hinder the extraction of instruction level parallelism during the execution of a sequential program. Such ambiguous memory dependences can be overcome by memory dependence speculation which enables a load or store to be speculatively executed before the addresses of all preceding loads and stores are known. Furthermore, multiple speculative stores to a memory location create multiple speculative versions of the location. Program order among the speculative versions must be tracked to maintain sequential semantics. A previously proposed approach, the Address Resolution Buffer (ARB) uses a centralized buffer to support speculative versions. Our proposal, called the Speculative Versioning Cache (SVC), uses distributed caches to eliminate the latency and bandwidth problems of the ARB. The SVC conceptually unifies cache coherence and speculative versioning by using an organization similar to snooping bus-based coherent caches. Our evaluation for the Multiscalar architecture shows that hit latency is an important factor affecting performance and private cache solutions trade-off hit rate for hit latency.

...read moreread less

Patent•

Methods and apparatus for populating a network cache

[...]

Marvin Wexler¹, Sunil Gaitonde¹•Institutions (1)

Cisco Systems, Inc.¹

09 Jul 2001

TL;DR: In this article, a router associated with the cache is enabled to compile flow data relating to object traffic, and the flow data are analyzed to determine a first plurality of frequently requested objects.

...read moreread less

Abstract: Methods and apparatus for populating a network cache are described. A router associated with the cache is enabled to compile flow data relating to object traffic. The flow data are analyzed to determine a first plurality of frequently requested objects. The network cache is populated with the first plurality of frequently requested objects. Subsequent to populating the network cache, the network cache is operated in conjunction with the router to cache a second plurality of requested objects.

...read moreread less

Patent•

Method and apparatus for freeing memory from an extensible markup language document object model tree active in an application cache

[...]

Krishnendu Charkraborty¹, Jayashri Visvanathan¹•Institutions (1)

Sun Microsystems¹

01 Mar 2001

TL;DR: In this article, a garbage collector that uses an LRU algorithm to free memory from an XML DOM tree active in an application cache is described. But it is not shown how to remove the nodes from the DOM tree.

...read moreread less

Abstract: The present invention relates to a garbage collector that uses an LRU algorithm to free memory from an XML DOM tree active in an application cache. According to one or more embodiments of the present invention, a threshold for the amount of memory permitted to reside in an application cache is set. Then, a garbage collector removes entries from the cache until it falls below the threshold. In one or more embodiments, a node table is used. When nodes are added to the XML DOM tree in the application cache the node table is updated. When the threshold for the amount of memory permitted to reside in the application cache is exceeded, the garbage collector applies an LRU algorithm uses the node table to determine which nodes to remove from the application cache. In one embodiment, the LRU algorithm scans the node table to determine the least recently used node in the table by examining time stamp entries in the table. Then, the algorithm removes that node and repeats the process until the XML DOM tree uses less memory in the cache than the threshold.

...read moreread less

Book Chapter•DOI•

Semantic Caching in Location-Dependent Query Processing

[...]

Baihua Zheng¹, Dik Lun Lee¹•Institutions (1)

Hong Kong University of Science and Technology¹

12 Jul 2001

TL;DR: A Voronoi Diagram is constructed on the data objects to serve as an index for them and a semantic caching scheme is developed that records a cached item as well as its valid range.

...read moreread less

Abstract: A method is presented in this paper for answering location-dependent queries in a mobile computing environment. We investigate a common scenario where data objects (e.g., restaurants and gas stations) are stationary while clients that issue queries about the data objects are mobile. Our proposed technique constructs a Voronoi Diagram (VD) on the data objects to serve as an index for them. A VD defines, for each data object d, the region within which d is the nearest point to any mobile client within that region. As such, the VD can be used to answer nearest-neighbor queries directly. Furthermore, the area within which the answer is valid can be computed. Based on the VD, we develop a semantic caching scheme that records a cached item as well as its valid range. A simulation is conducted to study the performance of the proposed semantic cache in comparison with the traditional cache and the baseline case where no cache is used. We show that the semantic cache has a much better performance than the other two methods.

...read moreread less

Proceedings Article•DOI•

Analytical cache models with applications to cache partitioning

[...]

G. Edward Suh¹, Srinivas Devadas¹, Larry Rudolph¹•Institutions (1)

Massachusetts Institute of Technology¹

17 Jun 2001

TL;DR: In this paper, an analytical cache model for time-shared systems is presented, which estimates the overall cache miss-rate of a multiprocessing system with any cache size and time quanta.

...read moreread less

Abstract: An accurate, tractable, analytic cache model for time-shared systems is presented, which estimates the overall cache miss-rate of a multiprocessing system with any cache size and time quanta. The input to the model consists of the isolated miss-rate curves for each process, the time quanta for each of the executing processes, and the total cache size. The output is the overall miss-rate. Trace-driven simulations demonstrate that the estimated miss-rate is very accurate. Since the model provides a fast and accurate way to estimate the effect of context switching, it is useful for both understanding the effect of context switching on caches and optimizing cache performance for time-shared systems. A cache partitioning mechanism is also presented and is shown to improve the cache miss-rate up to 25% over the normal LRU replacement policy.

...read moreread less

Journal Article•DOI•

A strategy to manage cache consistency in a disconnected distributed environment

[...]

Sandeep K. S. Gupta¹, Pradip K. Srimani²•Institutions (2)

Cisco Systems, Inc.¹, Telcordia Technologies²

01 Jul 2001-IEEE Transactions on Parallel and Distributed Systems

TL;DR: A new cache maintenance scheme, called AS, to minimize the overhead for mobile hosts to validate their cache upon reconnection, to allow stateless servers, and to minimizing the bandwidth requirement, is presented.

...read moreread less

Abstract: Modern distributed systems involving large number of nonstationary clients (mobile hosts, MH) connected via unreliable low-bandwidth communication channels are very prone to frequent disconnections. This disconnection may occur because of different reasons: The clients may voluntarily switch off (to save battery power), or a client may be involuntarily disconnected due to its own movement in a mobile network (hand-off, wireless link failures, etc.). A mobile computing environment is characterized by slow wireless links and relatively underprivileged hosts with limited battery powers. Still, when data at the server changes, the client hosts must be made aware of this fact in order for them to invalidate their cache, otherwise the host would continue to answer queries with the cached values returning incorrect data. The nature of the physical medium coupled with the fact that disconnections from the network are very frequent in mobile computing environments demand a cache invalidation strategy with minimum possible overheads. In this paper, we present a new cache maintenance scheme, called AS. The objective of the proposed scheme is to minimize the overhead for the MHs to validate their cache upon reconnection, to allow stateless servers, and to minimize the bandwidth requirement. The general approach is (1) to use asynchronous invalidation messages and (2) to buffer invalidation messages from servers at the MH's Home Location Cache (HLC) while the MH is disconnected from the network and redeliver these invalidation messages to the MH when it gets reconnected to the network. Use of asynchronous invalidation messages minimizes access latency, buffering of invalidation messages minimizes the overhead of validating MH's cache after each disconnection and use of HLC off-loads the overhead of maintaining state of MH's cache from the servers. The MH can be disconnected from the server either voluntarily or involuntarily. We capture the effects of both by using a single parameter: The percentage of time a mobile host is disconnected from the network. We demonstrate the efficacy of our scheme through simulation and performance modeling. In particular, we show that the average data access latency and the number of uplink requests by a MH decrease by using the proposed strategy at the cost of using buffer space at the HLC. We provide analytical comparison between our proposed scheme and the existing scheme for cache management in a mobile environment. Extensive experimental results are provided to compare the schemes in terms of performance metrics like latency, number of uplink requests, etc., under both a high and a low rate of change of data at servers for various values of the parameters. A mathematical model for the scheme is developed which matches closely with the simulation results.

...read moreread less

Patent•

Method and system for fragment linking and fragment caching

[...]

Rajesh S. Agarwalla¹, James R. H. Challenger¹, George Copeland¹, Arun Iyengar¹, Mark H. Linehan¹, Subbarao Meduri¹ - Show less +2 more•Institutions (1)

IBM¹

19 Dec 2001

TL;DR: In this paper, a method, a system, an apparatus, and a computer program product are presented for fragment caching, where a message is received at a computing device that contains a cache management unit, a fragment in the message body of the message is cached.

...read moreread less

Abstract: A method, a system, an apparatus, and a computer program product are presented for fragment caching. After a message is received at a computing device that contains a cache management unit, a fragment in the message body of the message is cached. Subsequent requests for the fragment at the cache management unit result in a cache hit. The cache management unit operates equivalently in support of fragment caching operations without regard to whether the computing device acts as a client, a server, or a hub located throughout the network; in other words, the fragment caching technique is uniform throughout a network. Cache ID rules accompany a fragment from an origin server; the cache ID rules describe a method for forming a unique cache ID for the fragment such that dynamic content can be cached away from an origin server.

...read moreread less

Proceedings Article•DOI•

Analysis and design of hierarchical Web caching systems

[...]

Hao Che¹, Z. Wang¹, Ye Tung¹•Institutions (1)

Pennsylvania State University¹

22 Apr 2001

TL;DR: An analytical modeling technique is developed to characterize an uncooperative two-level hierarchical caching system where the least recently used (LRU) algorithm is locally run at each cache, and a cooperative hierarchical Web caching architecture is proposed based on these principles.

...read moreread less

Abstract: This paper aims at finding fundamental design principles for hierarchical Web caching. An analytical modeling technique is developed to characterize an uncooperative two-level hierarchical caching system where the least recently used (LRU) algorithm is locally run at each cache. With this modeling technique, we are able to identify a characteristic time for each cache, which plays a fundamental role in understanding the caching processes. In particular, a cache can be viewed roughly as a lowpass filter with its cutoff frequency equal to the inverse of the characteristic time. Documents with access frequencies lower than this cutoff frequency will have good chances to pass through the cache without cache hits. This viewpoint enables us to take any branch of the cache tree as a tandem of lowpass filters at different cutoff frequencies, which further results in the finding of two fundamental design principles. Finally, to demonstrate how to use the principles to guide the caching algorithm design, we propose a cooperative hierarchical Web caching architecture based on these principles. The simulation study shows that the proposed cooperative architecture results in 50% saving of the cache resource compared with the traditional uncooperative hierarchical caching architecture.

...read moreread less

Patent•

Method and system for exclusive two-level caching in a chip-multiprocessor

[...]

Luiz Andre Barroso, Kourosh Gharachorloo, Andreas Nowatzyk

08 Jun 2001

TL;DR: In this article, a method and system for exclusive two-level caching in a chip-multiprocessor is presented to maximize the effective use of on-chip cache.

...read moreread less

Abstract: To maximize the effective use of on-chip cache, a method and system for exclusive two-level caching in a chip-multiprocessor are provided. The exclusive two-level caching in accordance with the present invention involves method relaxing the inclusion requirement in a two-level cache system in order to form an exclusive cache hierarchy. Additionally, the exclusive two-level caching involves providing a first-level tag-state structure in a first-level cache of the two-level cache system. The first tag-state structure has state information. The exclusive two-level caching also involves maintaining in a second-level cache of the two-level cache system a duplicate of the first-level tag-state structure and extending the state information in the duplicate of the first tag-state structure, but not in the first-level tag-state structure itself, to include an owner indication. The exclusive two-level caching further involves providing in the second-level cache a second tag-state structure so that a simultaneous lookup at the duplicate of the first tag-state structure and the second tag-state structure is possible. Moreover, the exclusive two-level caching involves associating a single owner with a cache line at any given time of its lifetime in the chip-multiprocessor.

...read moreread less

Patent•

System and method for network caching

[...]

Mark Vange, Marc Plumb, Marco Clementoni

16 Apr 2001

TL;DR: In this article, the authors propose a system and method for caching network resources in an intermediary server topologically located between a client and a server in a network, where the intermediate server includes a cache and methods for loading content into the cache as according to rules specified by a site owner.

...read moreread less

Abstract: A system and method for caching network resources in an intermediary server topologically located between a client and a server in a network. The intermediate server preferably caches at both a back-end location and a front-end location. Intermediary server includes a cache and methods for loading content into the cache as according to rules specified by a site owner. Optionally, content can be proactively loaded into the cache to include content not yet requested. In another option, requests can be held at the cache when a prior request for similar content is pending.

...read moreread less

Patent•

Method and apparatus for broadcast delivery of content to a client-side cache based on user preferences

[...]

James P. Janniello¹, Christopher Ward¹•Institutions (1)

IBM¹

14 Aug 2001

TL;DR: In this paper, a method and apparatus for the selection of digital content for broadcast delivery to multiple users is described, where each user filters the received content for storage in a client-side cache based on user preferences.

...read moreread less

Abstract: A method and apparatus are disclosed for the selection of digital content for broadcast delivery to multiple users. A broadcast edge cache server selects content for broadcast distribution to multiple users. Each user filters the received content for storage in a client-side cache based on user preferences. Each client computer includes a local cache that records material that has been accessed by the user and a broadcast cache that records material that is predicted to be of interest to the user, in accordance with the present invention. Each client computer is connected to the network environment by a relatively high bandwidth uni-directional broadcast channel, and a second bi-directional channel, such as a lower bandwidth channel. A client initially determines if requested content is available local in a client cache or a broadcast cache before requesting the content over the network from an edge server or the content provider (such as a web site) on a lower bandwidth channel.

...read moreread less

Patent•

System boot time reduction method

[...]

Richard L. Coulson¹, John I. Garney, Jeanna Matthews, Robert J. Royer•Institutions (1)

Intel¹

27 Jun 2001

TL;DR: In this article, a system and method to reduce the time for system initializations is described, where data accessed during a system initialization is loaded into a non-volatile cache and is pinned to prevent eviction.

...read moreread less

Abstract: A system and method to reduce the time for system initializations is disclosed. In accordance with the invention, data accessed during a system initialization is loaded into a non-volatile cache and is pinned to prevent eviction. By pinning data into the cache, the data required for system initialization is pre-loaded into the cache on a system reboot, thereby eliminating the need to access a disk.

...read moreread less

Patent•

Distributed multicast caching technique

[...]

Julian Satran¹, Gidon Gershinsky¹•Institutions (1)

IBM¹

26 Jan 2001

TL;DR: In this paper, the first cache forms the root of a multilevel hierarchical tree and transmits the group directory to a plurality of subsidiary caches, and the subsidiary caches may reorganize the group directories and relay it to a lower level of subsidiary cache.

...read moreread less

Abstract: A caching arrangement for the content of multicast transmission across a data network utilizes a first cache which receives content from one or more content providers. Using the REMADE protocol, the first cache constructs a group directory. The first cache forms the root of a multilevel hierarchical tree. In accordance with configuration parameters, the first cache transmits the group directory to a plurality of subsidiary caches. The subsidiary caches may reorganize the group directory, and relay it to a lower level of subsidiary caches. The process is recursive, until a multicast group of end-user clients is reached. Requests for content by the end-user clients are received by the lowest level cache, and forwarded as necessary to higher levels in the hierarchy. The content is then returned to the requesters. Various levels of caches retain the group directory and content according to configuration options, which can be adaptive to changing conditions such as demand, loading, and the like. The behavior of the caches may optionally be modified by the policies of the content providers.

...read moreread less

Patent•

Distributed evalulation of directory queries using a topology cache

[...]

Sihem Amer-Yahia¹, Divesh Srivastava¹, Dan Suciu¹•Institutions (1)

AT&T¹

29 Aug 2001

TL;DR: In this paper, a technique for performing query evaluation on distributed directories utilizes the creation of a "topology cache" defining the hierarchical relationship between the various directory servers (i.e., identifying "subordinate" and "superior" knowledge references associated with each directory server in the system).

...read moreread less

Abstract: A technique for performing query evaluation on distributed directories utilizes the creation of a “topology cache” defining the hierarchical relationship between the various directory servers (i.e., identifying “subordinate” and “superior” knowledge references associated with each directory server in the system). The created topology cache is then stored at each directory server, and forwarded to a client upon submitting a query to the system. Using the topology cache information at the client, a distributed query evaluation plan can be developed for use with complex queries, such as hierarchical and aggregate queries.

...read moreread less

Patent•

Method and apparatus for adaptively bypassing one or more levels of a cache hierarchy

[...]

Jr. Simon C. Steely

25 Jan 2001

TL;DR: In this paper, a system for adaptive bypassing one or more higher cache levels following a miss in a lower level of a cache hierarchy is described, where each cache level preferably includes a tag store containing address and state information for each cache line resident in the respective cache.

...read moreread less

Abstract: A system for adaptively bypassing one or more higher cache levels following a miss in a lower level of a cache hierarchy is described. Each cache level preferably includes a tag store containing address and state information for each cache line resident in the respective cache. When an invalidate request is received at a given cache hierarchy, each cache level is searched for the address specified by the invalidate request. When an address match is detected, the state of the respective cache line is changed to the invalid state, although the address of the cache line is left in the tag store. Thereafter, if the processor or entity associated with this cache hierarchy issues its own request for this same cache line, the cache hierarchy begins searching the tag store of each level starting with the lowest cache level. Since the address of the invalidated cache line was left in the respective tag store, a match will be detected at one of the cache levels, although the corresponding state of this cache line is invalid. This condition is specifically detected and is considered to be an “inval_miss” occurrence. In response, to an inval_miss, the cache hierarchy calls off searching any higher levels, and instead, issues a memory reference request for the desired cache line. In a further embodiment, the entity that sourced an invalidate request is stored, and a subsequent memory reference request for the same cache line is sent directly to the source entity.

...read moreread less

Patent•

Cache system and method for generating uncached objects from cached and stored object components

[...]

Ron Abraham Gut, Alexis Paul Tzannes, Edmund Campion Reiter

08 Aug 2001

TL;DR: In this paper, the cache system determines that an object, such as an image file, is missing from the cache memory, locates sufficient components from cache memory and/or external storage, and constructs the object from the located components.

...read moreread less

Abstract: Methods and apparatus for constructing objects within a cache system thereby allowing the cache system to respond to requested objects that are not initially available within the cache system. One embodiment of the invention caches image files, where the images are divided into components and stored in a format that allows identification and access to the components. The cache system determines that an object, such as an image file, is missing from the cache memory, locates sufficient components from the cache memory and/or external storage, and constructs the object from the located components.

...read moreread less

Journal Article•DOI•

An evaluation of cache invalidation strategies in wireless environments

[...]

Kian-Lee Tian¹, Jun Cai¹, Beng Chin Ooi¹•Institutions (1)

National University of Singapore¹

01 Aug 2001-IEEE Transactions on Parallel and Distributed Systems

TL;DR: This study shows that the two proposed schemes are not only effective in salvaging the cache content but consume significantly less energy than their counterparts.

...read moreread less

Abstract: Caching can reduce the bandwidth requirement in a wireless computing environment as well as minimize the energy consumption of wireless portable computers. To facilitate mobile clients in ascertaining the validity of their cache content, servers periodically broadcast cache invalidation reports that contain information of data that has been updated. However, as mobile clients may operate in a doze or even totally disconnected mode (to conserve energy), it is possible that some reports may be missed and the clients are forced to discard the entire cache content. In this paper, we reexamine the issue of designing cache invalidation strategies. We identify the basic issues in designing cache invalidation strategies. From the solutions to these issues, a large set of cache invalidation schemes can be constructed. We evaluate the performance of four representative algorithms-two of which are known algorithms (i.e., Dual-Report Cache Invalidation and Bit-Sequences) while the other two are their counterparts that exploit selective tuning (namely, Selective Dual-Report Cache Invalidation and Bit-Sequences with Bit Count). Our study shows that the two proposed schemes are not only effective in salvaging the cache content but consume significantly less energy than their counterparts. While the Selective Dual-Report Cache Invalidation scheme performs best in most cases, it is inferior to the Bit-Sequences with the Bit-Count scheme under high update rates.

...read moreread less

Patent•

System and method for partitioning address space in a proxy cache server cluster

[...]

Robert Drew Major, Stephen R. Carter, Howard Davis, Brent Ray Christensen

07 Jun 2001

TL;DR: In this article, a proxy partition cache (PPC) architecture and a technique for address-partitioning a proxy cache consisting of a grouping of discrete, cooperating caches (servers) is provided.

...read moreread less

Abstract: A proxy partition cache (PPC) architecture and a technique for address-partitioning a proxy cache consisting of a grouping of discrete, cooperating caches (servers) is provided. Client requests for objects (files) of a given size are redirected or reassigned to a single cache in the grouping, notwithstanding the cache to which the request is made by the load-balancing mechanism (such as a Layer 4 switch) based upon load-balancing considerations. The file is then returned to the switch via the switch-designated cache for vending to the requesting client. The redirection/reassignment occurs according to a function within the cache to which the request is directed so that the switch remains freed from additional tasks that can compromise speed.

...read moreread less

Patent•

Computer performance improvement by adjusting a time used for preemptive eviction of cache entries

[...]

Blaine D. Gaither¹, Benjamin D. Osecky¹•Institutions (1)

Hewlett-Packard¹

31 Oct 2001

TL;DR: In this article, a cache memory system can determine that an entry is stale if the entry has not been accessed or modified for a predetermined time, and the predetermined time is made dynamically variable.

...read moreread less

Abstract: A cache memory system can determine that an entry is stale if the entry has not been accessed or modified for a predetermined time. If an entry is stale, the entry may be preemptively evicted. The predetermined time is made dynamically variable. A computer system can adjust the time to optimize a measure of performance. In a specific example, evicted lines are temporarily stored in an eviction queue. The time is adjusted to be as short as possible without substantially increasing the number of lines that must be recalled from the eviction queue.

...read moreread less

Collapse