
Showing papers on "Cache algorithms published in 2002"


Proceedings ArticleDOI
01 Oct 2002
TL;DR: This paper proposes physical designs for these Non-Uniform Cache Architectures (NUCAs) and extends these physical designs with logical policies that allow important data to migrate toward the processor within the same level of the cache.
Abstract: Growing wire delays will force substantive changes in the designs of large caches. Traditional cache architectures assume that each level in the cache hierarchy has a single, uniform access time. Increases in on-chip communication delays will make the hit time of large on-chip caches a function of a line's physical location within the cache. Consequently, cache access times will become a continuum of latencies rather than a single discrete latency. This non-uniformity can be exploited to provide faster access to cache lines in the portions of the cache that reside closer to the processor. In this paper, we evaluate a series of cache designs that provides fast hits to multi-megabyte cache memories. We first propose physical designs for these Non-Uniform Cache Architectures (NUCAs). We extend these physical designs with logical policies that allow important data to migrate toward the processor within the same level of the cache. We show that, for multi-megabyte level-two caches, an adaptive, dynamic NUCA design achieves 1.5 times the IPC of a Uniform Cache Architecture of any size, outperforms the best static NUCA scheme by 11%, outperforms the best three-level hierarchy--while using less silicon area--by 13%, and comes within 13% of an ideal minimal hit latency solution.

799 citations
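
A toy model can make the migration policy concrete: on each hit, a line is promoted one bank closer to the processor, so frequently used data ends up in the fastest banks. The bank latencies and the one-line-per-bank set below are illustrative assumptions, not the configurations evaluated in the paper.

```python
# Toy dynamic-NUCA set: a hit promotes the line one bank closer to the
# processor, so hot lines migrate into the low-latency banks over time.
BANK_LATENCY = [4, 8, 12, 16]          # cycles, nearest bank first (assumed values)

class DynamicNUCASet:
    def __init__(self, num_banks=4):
        self.banks = [None] * num_banks  # one resident line per bank (toy model)

    def access(self, tag):
        for i, resident in enumerate(self.banks):
            if resident == tag:
                latency = BANK_LATENCY[i]
                if i > 0:                # promote toward the processor
                    self.banks[i], self.banks[i - 1] = self.banks[i - 1], self.banks[i]
                return latency
        self.banks[-1] = tag             # miss: fill the farthest bank
        return None

s = DynamicNUCASet()
for _ in range(3):
    s.access("A")
print(s.banks)   # "A" has migrated toward the fast end of the set
```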


Journal ArticleDOI
TL;DR: Both model-based and real trace simulation studies show that the proposed cooperative architecture results in more than 50% memory saving and substantial central processing unit (CPU) power saving for the management and update of cache entries compared with the traditional uncooperative hierarchical caching architecture.
Abstract: This paper aims at finding fundamental design principles for hierarchical Web caching. An analytical modeling technique is developed to characterize an uncooperative two-level hierarchical caching system where the least recently used (LRU) algorithm is locally run at each cache. With this modeling technique, we are able to identify a characteristic time for each cache, which plays a fundamental role in understanding the caching processes. In particular, a cache can be viewed roughly as a low-pass filter with its cutoff frequency equal to the inverse of the characteristic time. Documents with access frequencies lower than this cutoff frequency have good chances to pass through the cache without cache hits. This viewpoint enables us to take any branch of the cache tree as a tandem of low-pass filters at different cutoff frequencies, which further results in the finding of two fundamental design principles. Finally, to demonstrate how to use the principles to guide the caching algorithm design, we propose a cooperative hierarchical Web caching architecture based on these principles. Both model-based and real trace simulation studies show that the proposed cooperative architecture results in more than 50% memory saving and substantial central processing unit (CPU) power saving for the management and update of cache entries compared with the traditional uncooperative hierarchical caching architecture.

512 citations
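
The characteristic-time view can be illustrated with the standard fixed-point approximation for an LRU cache under independent requests: the characteristic time T solves sum_i (1 - exp(-lambda_i * T)) = C, and 1/T acts as the cache's cutoff frequency. The sketch below is that generic approximation with assumed Zipf-like request rates, not the paper's exact model.

```python
# Solve sum_i (1 - exp(-lambda_i * T)) = C for the characteristic time T
# by bisection, then read 1/T as the LRU cache's "cutoff frequency".
import math

def characteristic_time(rates, cache_size, tol=1e-9):
    def filled(T):
        return sum(1.0 - math.exp(-lam * T) for lam in rates)
    lo, hi = 0.0, 1.0
    while filled(hi) < cache_size:       # grow the bracket until it covers C
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if filled(mid) < cache_size:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Zipf-like popularity over 10,000 documents, cache of 1,000 entries (assumed numbers)
rates = [1.0 / (i + 1) for i in range(10_000)]
T = characteristic_time(rates, 1_000)
print("characteristic time:", T, "cutoff frequency:", 1.0 / T)
# Documents requested much less often than 1/T are usually evicted before
# they are requested again, i.e. they fail to "pass" the low-pass filter.
```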


Proceedings ArticleDOI
21 Jul 2002
TL;DR: This paper proposes and evaluates decentralized web caching algorithms for Squirrel, and discovers that it exhibits performance comparable to a centralized web cache in terms of hit ratio, bandwidth usage and latency.
Abstract: This paper presents a decentralized, peer-to-peer web cache called Squirrel. The key idea is to enable web browsers on desktop machines to share their local caches, to form an efficient and scalable web cache, without the need for dedicated hardware and the associated administrative cost. We propose and evaluate decentralized web caching algorithms for Squirrel, and discover that it exhibits performance comparable to a centralized web cache in terms of hit ratio, bandwidth usage and latency. It also achieves the benefits of decentralization, such as being scalable, self-organizing and resilient to node failures, while imposing low overhead on the participating nodes.

429 citations
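
Squirrel's central mechanism is mapping each URL to a "home" peer whose browser cache serves the shared copy of that URL. The sketch below uses a plain hash ring in place of the Pastry overlay the paper builds on, and the peer names are made up for illustration.

```python
# Minimal home-node directory: hash each URL onto a ring of desktop peers;
# the peer owning that point of the ring caches the object for everyone.
import hashlib
from bisect import bisect

class HomeNodeDirectory:
    def __init__(self, peers):
        self.ring = sorted((self._h(p), p) for p in peers)

    @staticmethod
    def _h(key):
        return int(hashlib.sha1(key.encode()).hexdigest(), 16)

    def home_node(self, url):
        keys = [k for k, _ in self.ring]
        idx = bisect(keys, self._h(url)) % len(self.ring)
        return self.ring[idx][1]

peers = [f"desktop-{i}" for i in range(8)]
d = HomeNodeDirectory(peers)
print(d.home_node("http://example.com/index.html"))  # peer that caches this URL
```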


Proceedings ArticleDOI
02 Feb 2002
TL;DR: A scheme that enables an accurate estimate of the isolated miss-rates of each process as a function of cache size under the standard LRU replacement policy is described, which can be used to schedule jobs or to partition the cache to minimize the overall miss-rate.
Abstract: We propose a low overhead, online memory monitoring scheme utilizing a set of novel hardware counters. The counters indicate the marginal gain in cache hits as the size of the cache is increased, which gives the cache miss-rate as a function of cache size. Using the counters, we describe a scheme that enables an accurate estimate of the isolated miss-rates of each process as a function of cache size under the standard LRU replacement policy. This information can be used to schedule jobs or to partition the cache to minimize the overall miss-rate. The data collected by the monitors can also be used by an analytical model of cache and memory behavior to produce a more accurate overall miss-rate for the collection of processes sharing a cache in both time and space. This overall miss-rate can be used to improve scheduling and partitioning schemes.

325 citations
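
Under LRU, per-stack-position hit counters are exactly the marginal gain of adding one more cache entry, which is how the counters yield miss rate as a function of cache size. The sketch below computes the same quantity in software with a full LRU stack; the paper obtains it with a small set of hardware counters.

```python
# Marginal-gain counters via LRU stack distances: counters[d] counts hits at
# stack depth d, i.e. the extra hits gained by growing the cache from d to d+1.
def marginal_gain_counters(trace, max_size):
    stack, counters = [], [0] * max_size
    misses_beyond = 0
    for addr in trace:
        if addr in stack:
            d = stack.index(addr)          # stack distance (0 = MRU)
            if d < max_size:
                counters[d] += 1
            stack.remove(addr)
        else:
            misses_beyond += 1
        stack.insert(0, addr)              # move/insert at the MRU position
    return counters, misses_beyond

trace = ["a", "b", "c", "a", "b", "d", "a", "c"]
counters, cold = marginal_gain_counters(trace, max_size=4)
total = len(trace)
for size in range(1, 5):
    hits = sum(counters[:size])
    print(f"cache size {size}: miss rate {(total - hits) / total:.2f}")
```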


ReportDOI
10 Jun 2002
TL;DR: In this article, the authors explore the benefits of a simple scheme to achieve exclusive caching, in which a data block is cached at either a client or the disk array, but not both.
Abstract: Modern high-end disk arrays often have several gigabytes of cache RAM. Unfortunately, most array caches use management policies which duplicate the same data blocks at both the client and array levels of the cache hierarchy: they are inclusive. Thus, the aggregate cache behaves as if it was only as big as the larger of the client and array caches, instead of as large as the sum of the two. Inclusiveness is wasteful: cache RAM is expensive. We explore the benefits of a simple scheme to achieve exclusive caching, in which a data block is cached at either a client or the disk array, but not both. Exclusiveness helps to create the effect of a single, large unified cache. We introduce a DEMOTE operation to transfer data ejected from the client to the array, and explore its effectiveness with simulation studies. We quantify the benefits and overheads of demotions across both synthetic and real-life workloads. The results show that we can obtain useful—sometimes substantial—speedups. During our investigation, we also developed some new cache-insertion algorithms that show promise for multiclient systems, and report on some of their properties.

285 citations
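
The DEMOTE idea can be sketched directly: when the client cache evicts a block it demotes it to the array cache instead of silently dropping it, and a block read from the array into the client is removed from the array, keeping the two levels close to exclusive. Cache sizes and the insertion policy below are simplifying assumptions.

```python
# Two-level exclusive caching with DEMOTE: client evictions are sent to the
# array cache, and array hits move the block up to the client.
from collections import OrderedDict

class LRUCache:
    def __init__(self, size):
        self.size, self.data = size, OrderedDict()
    def touch(self, k):
        self.data.move_to_end(k)
    def insert(self, k):
        """Insert k at the MRU end; return the evicted key, if any."""
        self.data[k] = True
        self.data.move_to_end(k)
        if len(self.data) > self.size:
            return self.data.popitem(last=False)[0]
        return None

client, array = LRUCache(4), LRUCache(4)

def read(block):
    if block in client.data:
        client.touch(block); return "client hit"
    if block in array.data:
        del array.data[block]               # keep the caches exclusive
        hit = "array hit"
    else:
        hit = "disk read"
    evicted = client.insert(block)
    if evicted is not None:
        array.insert(evicted)               # DEMOTE instead of discarding
    return hit

for b in [1, 2, 3, 4, 5, 1]:
    print(b, read(b))
```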


Posted Content
TL;DR: In this article, the idea of cache memory being used as a side-channel which leaks information during the run of a cryptographic algorithm has been investigated, and it has been shown that an attacker may be able to reveal or narrow the possible values of secret information held on the target device.
Abstract: We expand on the idea, proposed by Kelsey et al., of cache memory being used as a side-channel which leaks information during the run of a cryptographic algorithm. By using this side-channel, an attacker may be able to reveal or narrow the possible values of secret information held on the target device. We describe an attack which encrypts 2 chosen plaintexts on the target processor in order to collect cache profiles and then performs around 2 computational steps to recover the key. As well as describing and simulating the theoretical attack, we discuss how hardware and algorithmic alterations can be used to defend against such techniques.

260 citations


Proceedings ArticleDOI
07 Nov 2002
TL;DR: This work proposes an enhanced-clustering cache replacement scheme for use in place of LRU, which improved the request hit ratio dramatically while keeping the small average hops per successful request comparable to LRU.
Abstract: Efficient data retrieval in a peer-to-peer system like Freenet is a challenging problem. We study the impact of cache replacement policy on the performance of Freenet. We find that, with Freenet's LRU (least recently used) cache replacement, there is a steep reduction in the hit ratio with increasing load. Based on intuition from the small-world models and the recent theoretical results by Kleinberg, we propose an enhanced-clustering cache replacement scheme for use in place of LRU. Such a replacement scheme forces the routing tables to resemble neighbor relationships in a small-world acquaintance graph - clustering with light randomness. In our simulation, this new scheme improved the request hit ratio dramatically while keeping the small average hops per successful request comparable to LRU. A simple, highly idealized model of Freenet under clustering with light randomness proves that the expected message delivery time in Freenet is O(log² n) if the routing tables satisfy the small-world model and have size Θ(log² n).

183 citations
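
One way to picture an enhanced-clustering replacement policy: on eviction, a node discards the cached key farthest from its own point in the key space, but with a small probability it evicts a random key instead ("light randomness"). The distance metric, the 10% randomness, and the key space below are illustrative assumptions, not the paper's exact scheme.

```python
# Clustering-with-light-randomness replacement: usually evict the key
# farthest from this node's "specialization", occasionally evict at random.
import random

KEY_SPACE = 2 ** 16

def ring_distance(a, b):
    d = abs(a - b) % KEY_SPACE
    return min(d, KEY_SPACE - d)

class ClusteringCache:
    def __init__(self, node_id, capacity, random_fraction=0.1):
        self.node_id, self.capacity = node_id, capacity
        self.random_fraction = random_fraction
        self.store = {}

    def insert(self, key, value):
        if len(self.store) >= self.capacity and key not in self.store:
            if random.random() < self.random_fraction:
                victim = random.choice(list(self.store))
            else:
                victim = max(self.store,
                             key=lambda k: ring_distance(k, self.node_id))
            del self.store[victim]
        self.store[key] = value

cache = ClusteringCache(node_id=1000, capacity=3)
for k in [100, 60000, 1200, 900, 32000]:
    cache.insert(k, "data")
print(sorted(cache.store))   # cached keys drift toward the node's region of the key space
```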


Journal ArticleDOI
TL;DR: A new performance criterion is introduced, called caching efficiency, and a generic method for location-dependent cache invalidation strategies is proposed, and two cache replacement policies, PA and PAID, are proposed.
Abstract: Mobile location-dependent information services (LDISs) have become increasingly popular in recent years. However, data caching strategies for LDISs have thus far received little attention. In this paper, we study the issues of cache invalidation and cache replacement for location-dependent data under a geometric location model. We introduce a new performance criterion, called caching efficiency, and propose a generic method for location-dependent cache invalidation strategies. In addition, two cache replacement policies, PA and PAID, are proposed. Unlike the conventional replacement policies, PA and PAID take into consideration the valid scope area of a data value. We conduct a series of simulation experiments to study the performance of the proposed caching schemes. The experimental results show that the proposed location-dependent invalidation scheme is very effective and the PA and PAID policies significantly outperform the conventional replacement policies.

172 citations
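
A replacement policy that accounts for the valid scope of location-dependent data can be sketched as a scoring function over cached items: each item is valued by its access probability and the area of its valid scope, and a distance-aware variant discounts items whose scope is far from the client. The exact PA and PAID cost functions are defined in the paper; the formulas below are simplified assumptions used only to show the shape of such a policy.

```python
# Evict the item with the lowest location-aware value. score_pa weighs access
# probability and valid-scope area; score_paid additionally discounts items
# whose valid scope is far from the client (illustrative formulas).
def score_pa(access_prob, valid_area):
    return access_prob * valid_area

def score_paid(access_prob, valid_area, distance_to_scope):
    return access_prob * valid_area / max(distance_to_scope, 1e-6)

def choose_victim(items, use_distance):
    # items: dict id -> (access_prob, valid_area, distance_to_scope)
    def score(entry):
        p, a, d = entry
        return score_paid(p, a, d) if use_distance else score_pa(p, a)
    return min(items, key=lambda k: score(items[k]))   # lowest-valued item is evicted

cache = {
    "restaurant_list": (0.40, 4.0, 5.0),
    "traffic_report":  (0.35, 1.0, 0.1),
    "weather":         (0.25, 9.0, 0.2),
}
print(choose_victim(cache, use_distance=False))  # PA-style choice
print(choose_victim(cache, use_distance=True))   # PAID-style choice
```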


Proceedings ArticleDOI
18 Nov 2002
TL;DR: The architectural control mechanism of the drowsy cache is extended to reduce leakage power consumption of instruction caches without significant impact on execution time, and the results show that data and instruction caches require different control strategies for efficient execution.
Abstract: On-chip caches represent a sizeable fraction of the total power consumption of microprocessors. Although large caches can significantly improve performance, they have the potential to increase power consumption. As feature sizes shrink, the dominant component of this power loss will be leakage. In our previous work we have shown how the drowsy circuit - a simple, state-preserving, low-leakage circuit that relies on voltage scaling for leakage reduction - can be used to reduce the total energy consumption of data caches by more than 50%. In this paper, we extend the architectural control mechanism of the drowsy cache to reduce leakage power consumption of instruction caches without significant impact on execution time. Our results show that data and instruction caches require different control strategies for efficient execution. To enable drowsy instruction caches, we propose a technique called cache sub-bank prediction which is used to selectively wake up only the necessary parts of the instruction cache, while allowing most of the cache to stay in a low leakage drowsy mode. This prediction technique reduces the negative performance impact by 76% compared to the no-prediction policy. Our technique works well even with small predictor sizes and enables an 86% reduction of leakage energy in a 64 K byte instruction cache.

170 citations


Proceedings ArticleDOI
03 Dec 2002
TL;DR: This paper proposes two low-complexity algorithms for selecting the contents of statically-locked caches and evaluates their performances and compares them with those of a state of the art static cache analysis method.
Abstract: Cache memories have been extensively used to bridge the gap between high speed processors and relatively slow main memories. However, they are a source of predictability problems because of their dynamic and adaptive behavior, and thus need special attention to be used in hard real-time systems. A lot of progress has been achieved in the last ten years to statically predict the worst-case behavior of applications with respect to caches in order to determine safe and precise bounds on task worst-case execution times (WCETs) and cache-related preemption delays. An alternative approach to cope with caches in real-time systems is to statically lock their contents such that memory access times and cache-related preemption times are predictable. In this paper, we propose two low-complexity algorithms for selecting the contents of statically-locked caches. We evaluate their performances and compare them with those of a state of the art static cache analysis method.

168 citations
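
A generic content-selection heuristic in the spirit of such low-complexity algorithms: rank memory blocks by how often a profiling run accesses them and greedily lock the hottest blocks, respecting each cache set's associativity. This is a sketch of the general idea, not either of the paper's two specific algorithms.

```python
# Greedy selection of blocks to lock: most-frequently-accessed first, limited
# by the number of ways available in each cache set.
def select_locked_contents(block_access_counts, num_sets, assoc):
    # block_access_counts: {block_number: access_count} from a profiling run
    per_set_free = [assoc] * num_sets
    locked = []
    for block, _count in sorted(block_access_counts.items(),
                                key=lambda kv: kv[1], reverse=True):
        s = block % num_sets                 # set index by simple modulo mapping
        if per_set_free[s] > 0:
            per_set_free[s] -= 1
            locked.append(block)
    return locked

profile = {17: 500, 4: 480, 9: 450, 12: 90, 3: 300}
print(select_locked_contents(profile, num_sets=4, assoc=1))  # [17, 4, 3]
```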


Journal ArticleDOI
01 May 2002
TL;DR: The extent to which detailed timing characteristics of past memory reference events are strongly predictive of future program reference behavior is shown, and a family of time-keeping techniques that optimize behavior based on observations about particular cache time durations, such as the cache access interval or the cache dead time are proposed.
Abstract: Techniques for analyzing and improving memory referencing behavior continue to be important for achieving good overall program performance due to the ever-increasing performance gap between processors and main memory. This paper offers a fresh perspective on the problem of predicting and optimizing memory behavior. Namely, we show quantitatively the extent to which detailed timing characteristics of past memory reference events are strongly predictive of future program reference behavior. We propose a family of time-keeping techniques that optimize behavior based on observations about particular cache time durations, such as the cache access interval or the cache dead time. Timekeeping techniques can be used to build small, simple, and high-accuracy (often 90% or more) predictors for identifying conflict misses, for predicting dead blocks, and even for estimating the time at which the next reference to a cache frame will occur and the address that will be accessed. Based on these predictors, we demonstrate two new and complementary time-based hardware structures: (1) a time-based victim cache that improves performance by only storing conflict miss lines with likely reuse, and (2) a time-based prefetching technique that hones in on the right address to prefetch, and the right time to schedule the prefetch. Our victim cache technique improves performance over previous proposals by better selections of what to place in the victim cache. Our prefetching technique outperforms similar prior hardware prefetching proposals, despite being orders of magnitude smaller. Overall, these techniques improve performance by more than 11% across the SPEC2000 benchmark suite.
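
The timekeeping intuition behind dead-block prediction can be sketched simply: if a block has gone much longer without a reference than the reuse intervals it exhibited while live, it is probably dead. The factor of 2 and the single-interval history below are illustrative assumptions, not the paper's tuned predictor.

```python
# Time-based dead-block predictor: a block is predicted dead once its idle
# time exceeds a multiple of its last observed reuse interval.
class DeadBlockTimer:
    def __init__(self, dead_factor=2.0):
        self.dead_factor = dead_factor
        self.last_access = {}     # block -> cycle of last access
        self.live_interval = {}   # block -> last observed reuse interval

    def access(self, block, now):
        if block in self.last_access:
            self.live_interval[block] = now - self.last_access[block]
        self.last_access[block] = now

    def predicted_dead(self, block, now):
        interval = self.live_interval.get(block)
        if interval is None:
            return False          # no history yet, assume live
        return (now - self.last_access[block]) > self.dead_factor * interval

t = DeadBlockTimer()
t.access("A", 100); t.access("A", 140)      # observed reuse interval: 40 cycles
print(t.predicted_dead("A", 160))           # False: only 20 idle cycles
print(t.predicted_dead("A", 300))           # True: 160 idle cycles > 2 * 40
```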

Proceedings ArticleDOI
07 Nov 2002
TL;DR: The problem of efficiently streaming a set of heterogeneous videos from a remote server through a proxy to multiple asynchronous clients so that they can experience playback with low startup delays is addressed.
Abstract: In this paper, we address the problem of efficiently streaming a set of heterogeneous videos from a remote server through a proxy to multiple asynchronous clients so that they can experience playback with low startup delays. We develop a technique to analytically determine the optimal proxy prefix cache allocation to the videos that minimizes the aggregate network bandwidth cost. We integrate proxy caching with traditional server-based reactive transmission schemes such as batching, patching and stream merging to develop a set of proxy-assisted delivery schemes. We quantitatively explore the impact of the choice of transmission scheme, cache allocation policy, proxy cache size, and availability of unicast versus multicast capability, on the resultant transmission cost. Our evaluations show that even a relatively small prefix cache (10%-20% of the video repository) is sufficient to realize substantial savings in transmission cost. We find that carefully designed proxy-assisted reactive transmission schemes can produce significant cost savings even in predominantly unicast environments such as the Internet.

Proceedings ArticleDOI
02 Feb 2002
TL;DR: A hybrid selective-sets-and-ways cache organization is proposed that always offers equal or better resizing granularity than both of previously proposed organizations, and the energy savings from resizing d-cache and i-cache together are investigated.
Abstract: Cache memories account for a significant fraction of a chip's overall energy dissipation. Recent research advocates using "resizable" caches to exploit cache requirement variability in applications to reduce cache size and eliminate energy dissipation in the cache's unused sections with minimal impact on performance. Current proposals for resizable caches fundamentally vary in two design aspects: (1) cache organization, where one organization, referred to as selective-ways, varies the cache's set-associativity, while the other, referred to as selective-sets, varies the number of cache sets, and (2) resizing strategy, where one proposal statically sets the cache size prior to an application's execution, while the other allows for dynamic resizing both within and across applications. In this paper, we compare and contrast, for the first time, the proposed design choices for resizable caches, and evaluate the effectiveness of cache resizings in reducing the overall energy-delay in deep-submicron processors. In addition, we propose a hybrid selective-sets-and-ways cache organization that always offers equal or better resizing granularity than both of the previously proposed organizations. We also investigate the energy savings from resizing d-cache and i-cache together to characterize the interaction between d-cache and i-cache resizings.

Patent
27 Dec 2002
TL;DR: A log-structured write cache for a data storage system, and a method for improving the performance of the storage system, are described; the write cache includes cache lines where write data is temporarily accumulated in a non-volatile state so that it can be sequentially written to the target storage locations at a later time.
Abstract: A log-structured write cache for a data storage system and method for improving the performance of the storage system are described. The system might be a RAID storage array, a disk drive, an optical disk, or a tape storage system. The write cache is preferably implemented in the main storage medium of the system, but can also be provided in other storage components of the system. The write cache includes cache lines where write data is temporarily accumulated in a non-volatile state so that it can be sequentially written to the target storage locations at a later time, thereby improving the overall performance of the system. Meta-data for each cache line is also maintained in the write cache. The meta-data includes the target sector address for each sector in the line and a sequence number that indicates the order in which data is posted to the cache lines. A buffer table entry is provided for each cache line. A hash table is used to search the buffer table for a sector address that is needed at each data read and write operation.
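
The lookup structure the patent describes can be sketched as follows: writes are appended to the current cache line, each line's meta-data records the target sector address of every slot plus a sequence number, and a hash table maps sector addresses to their position in the cache so reads and re-writes can find them. The line size and the Python dict standing in for the hash table are illustrative choices.

```python
# Log-structured write cache: append writes to cache lines, index them by
# target sector address for later reads and for de-staging in order.
class LogStructuredWriteCache:
    SECTORS_PER_LINE = 4

    def __init__(self):
        self.lines = []      # each line: {"seq": n, "sectors": [(addr, data), ...]}
        self.index = {}      # sector address -> (line number, slot)
        self.next_seq = 0

    def write(self, sector_addr, data):
        if not self.lines or len(self.lines[-1]["sectors"]) == self.SECTORS_PER_LINE:
            self.lines.append({"seq": self.next_seq, "sectors": []})
            self.next_seq += 1
        line = self.lines[-1]
        line["sectors"].append((sector_addr, data))
        self.index[sector_addr] = (len(self.lines) - 1, len(line["sectors"]) - 1)

    def read(self, sector_addr):
        pos = self.index.get(sector_addr)
        if pos is None:
            return None      # not in the write cache; go to the target location
        line_no, slot = pos
        return self.lines[line_no]["sectors"][slot][1]

wc = LogStructuredWriteCache()
wc.write(1234, b"new data")
print(wc.read(1234))         # served from the write cache
```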

Journal ArticleDOI
TL;DR: This paper proposes a proactive cache management scheme that not only improves the cache hit ratio, the throughput, and the bandwidth utilization, but also reduces the query delay and the power consumption.
Abstract: Recent work has shown that invalidation report (IR)-based cache management is an attractive approach for mobile environments. However, the IR-based cache invalidation solution has some limitations, such as long query delay, low bandwidth utilization, and it is not suitable for applications where data change frequently. In this paper, we propose a proactive cache management scheme to address these issues. Instead of passively waiting, the clients intelligently prefetch the data that are most likely used in the future. Based on a novel prefetch-access ratio concept, the proposed scheme can dynamically optimize performance or power based on the available resources and the performance requirements. To deal with frequently updated data, different techniques (indexing and caching) are applied to handle different components of the data based on their update frequency. Detailed simulation experiments are carried out to evaluate the proposed methodology. Compared to previous schemes, our solution not only improves the cache hit ratio, the throughput, and the bandwidth utilization, but also reduces the query delay and the power consumption.

Patent
06 Aug 2002
TL;DR: In this article, the cache content storage and replacement policies for a distributed plurality of network edge caches are centrally determined by a content selection server that executes a first process over a bounded content domain against a predefined set of domain content identifiers.
Abstract: A network edge cache management system centrally determines cache content storage and replacement policies for a distributed plurality of network edge caches. The management system includes a content selection server that executes a first process over a bounded content domain against a predefined set of domain content identifiers to produce a meta-content description of the bounded content domain, a second process against the meta-content description to define a plurality of content groups representing respective content sub-sets of the bounded content domain, a third process to associate respective sets of predetermined cache management attributes with the plurality of content groups, and a fourth process to generate a plurality of cache control rule bases selectively storing identifications of the plurality of content groups and corresponding associated sets of the predetermined cache management attributes. The cache control rule bases are distributed to the plurality of network edge cache servers.

Journal ArticleDOI
11 Nov 2002
TL;DR: This paper proposes a new data organization model called PAX (Partition Attributes Across), that significantly improves cache performance by grouping together all values of each attribute within each page, and shows that PAX performs well across different memory system designs.
Abstract: Relational database systems have traditionally optimized for I/O performance and organized records sequentially on disk pages using the N-ary Storage Model (NSM) (a.k.a., slotted pages). Recent research, however, indicates that cache utilization and performance is becoming increasingly important on modern platforms. In this paper, we first demonstrate that in-page data placement is the key to high cache performance and that NSM exhibits low cache utilization on modern platforms. Next, we propose a new data organization model called PAX (Partition Attributes Across), that significantly improves cache performance by grouping together all values of each attribute within each page. Because PAX only affects layout inside the pages, it incurs no storage penalty and does not affect I/O behavior. According to our experimental results (which were obtained without using any indices on the participating relations), when compared to NSM: (a) PAX exhibits superior cache and memory bandwidth utilization, saving at least 75% of NSM's stall time due to data cache accesses; (b) range selection queries and updates on memory-resident relations execute 17-25% faster; and (c) TPC-H queries involving I/O execute 11-48% faster. Finally, we show that PAX performs well across different memory system designs.
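
The layout difference can be sketched in a few lines: within a page, PAX groups values of the same attribute together ("minipages"), so a scan over one attribute pulls in cache lines full of useful values instead of whole records. The record schema and page representation below are illustrative, not the paper's on-disk format.

```python
# NSM (row-major) versus PAX (attribute-grouped) placement within a page.
records = [(1, "alice", 30), (2, "bob", 41), (3, "carol", 29)]

def nsm_page(rows):
    # N-ary storage model: records stored one after another
    return [value for row in rows for value in row]

def pax_page(rows):
    # PAX: one minipage per attribute, holding that attribute's values
    num_attrs = len(rows[0])
    return [[row[a] for row in rows] for a in range(num_attrs)]

print(nsm_page(records))   # [1, 'alice', 30, 2, 'bob', 41, 3, 'carol', 29]
print(pax_page(records))   # [[1, 2, 3], ['alice', 'bob', 'carol'], [30, 41, 29]]

# A predicate on the third attribute only touches the third minipage under PAX:
ages = pax_page(records)[2]
selected = [i for i, age in enumerate(ages) if age < 35]
print(selected)            # row positions 0 and 2
```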

Proceedings ArticleDOI
03 Jun 2002
TL;DR: Fractal prefetching B+-Trees (fpB+-Trees), as discussed by the authors, embed cache-optimized trees within disk-optimized trees in order to optimize both cache and I/O performance.
Abstract: B+-Trees have been traditionally optimized for I/O performance with disk pages as tree nodes. Recently, researchers have proposed new types of B+-Trees optimized for CPU cache performance in main memory environments, where the tree node sizes are one or a few cache lines. Unfortunately, due primarily to this large discrepancy in optimal node sizes, existing disk-optimized B+-Trees suffer from poor cache performance while cache-optimized B+-Trees exhibit poor disk performance. In this paper, we propose fractal prefetching B+-Trees (fpB+-Trees), which embed "cache-optimized" trees within "disk-optimized" trees, in order to optimize both cache and I/O performance. We design and evaluate two approaches to breaking disk pages into cache-optimized nodes: disk-first and cache-first. These approaches are somewhat biased in favor of maximizing disk and cache performance, respectively, as demonstrated by our results. Both implementations of fpB+-Trees achieve dramatically better cache performance than disk-optimized B+-Trees: a factor of 1.1-1.8 improvement for search, up to a factor of 4.2 improvement for range scans, and up to a 20-fold improvement for updates, all without significant degradation of I/O performance. In addition, fpB+-Trees accelerate I/O performance for range scans by using jump-pointer arrays to prefetch leaf pages, thereby achieving a speed-up of 2.5-5 on IBM's DB2 Universal Database.

Proceedings ArticleDOI
18 Nov 2002
TL;DR: This paper proposes the use of a pointer cache, which tracks pointer transitions, to aid prefetching, and examines using the pointer cache in a wide issue superscalar processor as a value predictor and to aidPrefetching when a chain of pointers is being traversed.
Abstract: Data prefetching effectively reduces the negative effects of long load latencies on the performance of modern processors. Hardware prefetchers employ hardware structures to predict future memory addresses based on previous patterns. Thread-based prefetchers use portions of the actual program code to determine future load addresses for prefetching. This paper proposes the use of a pointer cache, which tracks pointer transitions, to aid prefetching. The pointer cache provides, for a given pointer's effective address, the base address of the object pointed to by the pointer. We examine using the pointer cache in a wide issue superscalar processor as a value predictor and to aid prefetching when a chain of pointers is being traversed. When a load misses in the L1 cache, but hits in the pointer cache, the first two cache blocks of the pointed to object are prefetched. In addition, the load's dependencies are broken by using the pointer cache hit as a value prediction. We also examine using the pointer cache to allow speculative precomputation to run farther ahead of the main thread of execution than in prior studies. Previously proposed thread-based prefetchers are limited in how far they can run ahead of the main thread when traversing a chain of recurrent dependent loads. When combined with the pointer cache, a speculative thread can make better progress ahead of the main thread, rapidly traversing data structures in the face of cache misses caused by pointer transitions.
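
The pointer-cache mechanism can be sketched as a table that maps the effective address of a pointer field to the base address of the object it points to; on an L1 miss that hits in this table, the predicted target serves as a value prediction and the first couple of blocks of the target object are prefetched. The block size and table management below are illustrative assumptions.

```python
# Pointer cache sketch: remember pointer transitions (field address -> object
# base) so later misses on the same pointer can be value-predicted and the
# target object prefetched.
BLOCK_SIZE = 64

class PointerCache:
    def __init__(self):
        self.table = {}      # pointer effective address -> pointed-to base address

    def train(self, pointer_addr, loaded_value):
        self.table[pointer_addr] = loaded_value   # record the observed transition

    def on_l1_miss(self, pointer_addr, prefetch):
        target = self.table.get(pointer_addr)
        if target is None:
            return None                            # no prediction available
        prefetch(target)                           # first block of the object
        prefetch(target + BLOCK_SIZE)              # and the next one
        return target                              # usable as a value prediction

pc = PointerCache()
pc.train(pointer_addr=0x1000, loaded_value=0x8000)   # node->next observed once
predicted = pc.on_l1_miss(0x1000, prefetch=lambda a: print(f"prefetch {hex(a)}"))
print(hex(predicted))
```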

Patent
25 Jan 2002
TL;DR: In this article, a system and computer implementable method for updating content on servers coupled to a network is described, which includes updating an origin server with a version of files used to provide content, retrieving data that indicates an action to be performed on one or more cache servers in conjunction with updating the origin server, and performing the action to update entries in the one/more cache servers.
Abstract: A system and computer implementable method for updating content on servers coupled to a network. The method includes updating an origin server with a version of files used to provide content, retrieving data that indicates an action to be performed on one or more cache servers in conjunction with updating the origin server, and performing the action to update entries in the one or more cache servers. Each entry in each cache server is associated with a subset of the content on the origin server and may include an expiration field and/or a time to live field. An example of a subset of content to which a cache entry may be associated is a Web page. Cache servers are not required to poll origin servers to determine whether new content is available. Cache servers may be pre-populated using push or pull techniques.

Journal ArticleDOI
TL;DR: A novel caching scheme that integrates both object placement and replacement policies and which makes caching decisions on all candidate sites in a coordinated fashion is proposed.
Abstract: Web caching is an important technique for reducing Internet access latency, network traffic, and server load. This paper investigates cache management strategies for the en-route web caching environment, where caches are associated with routing nodes in the network. We propose a novel caching scheme that integrates both object placement and replacement policies and which makes caching decisions on all candidate sites in a coordinated fashion. In our scheme, cache status information along the routing path of a request is used in dynamically determining where to cache the requested object and what to replace if there is not enough space. The object placement problem is formulated as an optimization problem and the optimal locations to cache the object are obtained using a low-cost dynamic programming algorithm. Extensive simulation experiments have been performed to evaluate the proposed scheme in terms of a wide range of performance metrics. The results show that the proposed scheme significantly outperforms existing algorithms which consider either object placement or replacement at individual caches only.

Patent
25 Mar 2002
TL;DR: In this article, a second tier cache memory is coupled to each first tier cache memory in a set of first tier caches, and the second tier cache memory includes a data ring interface and a snoop ring interface.
Abstract: A set of cache memory includes a set of first tier cache memory and a second tier cache memory. In the set of first tier cache memory each first tier cache memory is coupled to a compute engine in a set of compute engines. The second tier cache memory is coupled to each first tier cache memory in the set of first tier cache memory. The second tier cache memory includes a data ring interface and a snoop ring interface.

Patent
23 Aug 2002
TL;DR: In this paper, the authors present a method and apparatus for shared cache coherency for a chip multiprocessor or a multi-core system. But they do not specify the cache lines themselves.
Abstract: A method and apparatus for shared cache coherency for a chip multiprocessor or a multiprocessor system. In one embodiment, a multicore processor includes a plurality of processor cores, each having a private cache, and a shared cache. An internal snoop bus is coupled to each private cache and the shared cache to communicate data from each private cache to other private caches and the shared cache. In another embodiment, an apparatus includes a plurality of processor cores and a plurality of caches. One of the plurality of caches maintains cache lines in two different modified states. The first modified state indicates a most recent copy of a modified cache line, and the second modified state indicates a stale copy of the modified cache line.

Patent
03 Dec 2002
TL;DR: In this article, a cache management system comprises a cache adapted store data corresponding to a data source and a cache manager adapted to access a set of rules to determine a frequency for automatically updating the data in the cache.
Abstract: A cache management system comprises a cache adapted store data corresponding to a data source. The cache management system also comprises a cache manager adapted to access a set of rules to determine a frequency for automatically updating the data in the cache. The cache manager is also adapted to automatically communicate with the data source to update the data in the cache corresponding to the determined frequency.

Proceedings ArticleDOI
01 Jan 2002
TL;DR: This work investigates the complexity of finding the optimal placement of objects (or code) in the memory, in the sense that this placement reduces the cache misses to the minimum, and shows that this problem is one of the toughest amongst the interesting algorithmic problems in computer science.
Abstract: The growing gap between the speed of memory access and cache access has made cache misses an influential factor in program efficiency. Much effort has been spent recently on reducing the number of cache misses during program run. This effort includes wise rearranging of program code, cache-conscious data placement, and algorithmic modifications that improve the program cache behavior. In this work we investigate the complexity of finding the optimal placement of objects (or code) in the memory, in the sense that this placement reduces the cache misses to the minimum. We show that this problem is one of the toughest amongst the interesting algorithmic problems in computer science. In particular, suppose one is given a sequence of memory accesses and one has to place the data in the memory so as to minimize the number of cache misses for this sequence. We show that if P ≠ NP, then one cannot efficiently approximate the optimal solution even up to a very liberal approximation ratio. Thus, this problem joins the small family of extremely inapproximable optimization problems. The other two famous members in this family are minimum coloring and maximum clique.

Patent
19 Apr 2002
TL;DR: In this article, a streaming delivery accelerator (SDA) receives content from a content provider, caches at least part of the content, forming a cache file, and streams the cache file to a user.
Abstract: Systems and methods for streaming of multimedia files over a network are described. A streaming delivery accelerator (SDA) receives content from a content provider, caches at least part of the content, forming a cache file, and streams the cache file to a user. The described systems and methods are directed to separate (shred) the content into contiguous cache files suitable for streaming. The shredded cache files may have different transmission bit rates and/or different content, such as audio, text, etc. Checksums can migrate from the content file to the shredded cache files and between different network protocols without the need for recomputing the checksums.

Journal ArticleDOI
TL;DR: The authors present the least-unified value algorithm, which performs better than existing algorithms for replacing nonuniform data objects in wide-area distributed environments.
Abstract: Cache performance depends heavily on replacement algorithms, which dynamically select a suitable subset of objects for caching in a finite space. Developing such algorithms for wide-area distributed environments is challenging because, unlike traditional paging systems, retrieval costs and object sizes are not necessarily uniform. In a uniform caching environment, a replacement algorithm generally seeks to reduce cache misses, usually by replacing an object with the least likelihood of re-reference. In contrast, reducing total cost incurred due to cache misses is more important in nonuniform caching environments. The authors present the least-unified value algorithm, which performs better than existing algorithms for replacing nonuniform data objects in wide-area distributed environments.
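
A value-based replacement policy of this kind can be sketched as follows: each object's worth is its miss penalty per byte (retrieval cost divided by size) weighted by how recently it was referenced, and the object with the least value is evicted. The recency weighting below (1 / age) is an illustrative assumption, not the paper's exact least-unified-value function.

```python
# Value-based replacement for nonuniform objects: evict the item whose
# (cost / size) * recency-weight is smallest.
class ValueBasedCache:
    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.items = {}    # key -> (size, cost, last_access_time)

    def value(self, key, now):
        size, cost, last = self.items[key]
        age = max(now - last, 1)
        return (cost / size) * (1.0 / age)

    def insert(self, key, size, cost, now):
        used = sum(s for s, _, _ in self.items.values())
        while self.items and used + size > self.capacity:
            victim = min(self.items, key=lambda k: self.value(k, now))
            used -= self.items.pop(victim)[0]
        self.items[key] = (size, cost, now)

c = ValueBasedCache(capacity_bytes=100)
c.insert("cheap_big", size=80, cost=10, now=0)
c.insert("costly_small", size=20, cost=50, now=5)
c.insert("new_object", size=40, cost=20, now=50)
print(list(c.items))   # the low cost-per-byte object is the one evicted
```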

Proceedings ArticleDOI
18 Nov 2002
TL;DR: This paper proposes the design of the Frequent Value Cache (FVC), a cache in which storing a frequent value requires few bits as they are stored in encoded form while all other values are stored in unencoded form using 32 bits.
Abstract: Recent work has shown that a small number of distinct frequently occurring values often account for a large portion of memory accesses. In this paper we demonstrate how this frequent value phenomenon can be exploited in designing a cache that trades off performance with energy efficiency. We propose the design of the Frequent Value Cache (FVC) in which storing a frequent value requires few bits as they are stored in encoded form while all other values are stored in unencoded form using 32 bits. The data array is partitioned into two arrays such that if a frequent value is accessed only the first data array is accessed; otherwise an additional cycle is needed to access the second data array. Experiments with some of the SPEC95 benchmarks show that on an average a 64 Kb/64-value FVC provides 28.8% reduction in L1 cache energy and 3.38% increase in execution time delay over a conventional 64 Kb cache.
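
The encoding idea can be sketched directly: values that appear in a small table of frequently occurring values are stored as a short index plus a flag, and everything else is stored as a full 32-bit word, which is why only the narrow data array needs to be read on a frequent-value access. The table contents below are an illustrative assumption; the paper derives them by profiling programs.

```python
# Frequent-value encoding: a 64-entry table turns common 32-bit values into
# 6-bit indices; uncommon values keep their full 32-bit representation.
FREQUENT_VALUES = [0, 1, 0xFFFFFFFF, 255, 1024, 4096]   # up to 64 entries (assumed)
FV_INDEX = {v: i for i, v in enumerate(FREQUENT_VALUES)}

def encode(value):
    if value in FV_INDEX:
        return ("encoded", FV_INDEX[value])     # few bits: fits the narrow data array
    return ("full", value & 0xFFFFFFFF)         # 32 bits: needs the second data array

def decode(entry):
    kind, payload = entry
    if kind == "encoded":
        return FREQUENT_VALUES[payload]         # one fast array access
    return payload                              # extra cycle for the wide array

words = [0, 1024, 7, 255]
stored = [encode(w) for w in words]
print(stored)
print([decode(e) for e in stored])
```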

Journal ArticleDOI
TL;DR: It is shown that the approaches proposed in this paper (referred to as selective caching), where only a few frames are cached, can also contribute to significant improvements in the overall performance.
Abstract: Proxy caching has been used to speed up Web browsing and reduce networking costs. In this paper, we study the extension of proxy caching techniques to streaming video applications. A trivial extension consists of storing complete video sequences in the cache. However, this may not be applicable in situations where the video objects are very large and proxy cache space is limited. We show that the approaches proposed in this paper (referred to as selective caching), where only a few frames are cached, can also contribute to significant improvements in the overall performance. In particular, we discuss two network environments for streaming video, namely, quality-of-service (QoS) networks and best-effort networks (Internet). For QoS networks, the video caching goal is to reduce the network bandwidth costs; for best-effort networks, the goal is to increase the robustness of continuous playback against poor network conditions (such as congestion, delay, and loss). Two different selective caching algorithms (SCQ and SCB) are proposed, one for each network scenario, to increase the relevant overall performance metric in each case, while requiring only a fraction of the video stream to be cached. The main contribution of our work is to provide algorithms that are efficient even when the buffer memory available at the client is limited. These algorithms are also scalable so that when changes in the environment occur it is possible, with low complexity, to modify the allocation of cache space to different video sequences.

Proceedings ArticleDOI
22 Sep 2002
TL;DR: This work presents several architectural techniques that exploit the data duplication across the different levels of cache hierarchy, and employs both state-preserving and state-destroying leakage control mechanisms to L2 subblocks when their data also exist in L1.
Abstract: Energy management is important for a spectrum of systems ranging from high-performance architectures to low-end mobile and embedded devices. With the increasing number of transistors, smaller feature sizes, lower supply and threshold voltages, the focus on energy optimization is shifting from dynamic to leakage energy. Leakage energy is of particular concern in dense cache memories that form a major portion of the transistor budget. In this work, we present several architectural techniques that exploit the data duplication across the different levels of cache hierarchy. Specifically, we employ both state-preserving (data-retaining) and state-destroying leakage control mechanisms to L2 subblocks when their data also exist in L1. Using a set of media and array-dominated applications, we demonstrate the effectiveness of the proposed techniques through cycle-accurate simulation. We also compare our schemes with the previously proposed cache decay policy. This comparison indicates that one of our schemes generates competitive results with cache decay.