
Showing papers on "Cache pollution published in 2003"


Proceedings Article
Nimrod Megiddo1, Dharmendra S. Modha1
31 Mar 2003
TL;DR: The problem of cache management in a demand paging scenario with uniform page sizes is considered and a new cache management policy, namely, Adaptive Replacement Cache (ARC), is proposed that has several advantages.
Abstract: We consider the problem of cache management in a demand paging scenario with uniform page sizes. We propose a new cache management policy, namely, Adaptive Replacement Cache (ARC), that has several advantages. In response to evolving and changing access patterns, ARC dynamically, adaptively, and continually balances between the recency and frequency components in an online and self-tuning fashion. The policy ARC uses a learning rule to adaptively and continually revise its assumptions about the workload. The policy ARC is empirically universal, that is, it empirically performs as well as a certain fixed replacement policy, even when the latter uses the best workload-specific tuning parameter that was selected in an offline fashion. Consequently, ARC works uniformly well across varied workloads and cache sizes without any need for workload-specific a priori knowledge or tuning. Various policies such as LRU-2, 2Q, LRFU, and LIRS require user-defined parameters, and, unfortunately, no single choice works uniformly well across different workloads and cache sizes. The policy ARC is simple to implement and, like LRU, has constant complexity per request. In comparison, policies LRU-2 and LRFU both require logarithmic time complexity in the cache size. The policy ARC is scan-resistant: it allows one-time sequential requests to pass through without polluting the cache. On 23 real-life traces drawn from numerous domains, ARC leads to substantial performance gains over LRU for a wide range of cache sizes. For example, for an SPC-1-like synthetic benchmark, at a 4 GB cache, LRU delivers a hit ratio of 9.19% while ARC achieves a hit ratio of 20%.
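As a concrete illustration of the recency/frequency balancing and the self-tuning parameter described in the abstract, the following Python sketch transcribes the published ARC pseudocode in simplified form. It tracks only page keys (T1/T2 hold cached pages, B1/B2 are "ghost" lists of recently evicted keys, and p is the adaptive target size of T1) and leaves out data storage and I/O entirely; it is a sketch for intuition, not a drop-in implementation.

```python
from collections import OrderedDict

class ARCSketch:
    """Simplified sketch of the Adaptive Replacement Cache policy."""

    def __init__(self, c):
        self.c = c                     # cache size in pages
        self.p = 0                     # adaptive target size of T1 (recency list)
        self.T1, self.T2 = OrderedDict(), OrderedDict()   # cached: recency / frequency
        self.B1, self.B2 = OrderedDict(), OrderedDict()   # ghost lists (keys only)

    def _replace(self, x):
        # Evict from T1 or T2 depending on the adaptive target p.
        if self.T1 and (len(self.T1) > self.p or
                        (x in self.B2 and len(self.T1) == self.p)):
            old, _ = self.T1.popitem(last=False)
            self.B1[old] = None
        else:
            old, _ = self.T2.popitem(last=False)
            self.B2[old] = None

    def access(self, x):
        """Record an access to page x; return True on a cache hit."""
        if x in self.T1 or x in self.T2:          # hit: promote to the frequency list
            self.T1.pop(x, None); self.T2.pop(x, None)
            self.T2[x] = None
            return True
        if x in self.B1:                          # ghost hit: recency list was too small
            self.p = min(self.c, self.p + max(len(self.B2) // max(len(self.B1), 1), 1))
            self._replace(x); del self.B1[x]
            self.T2[x] = None
            return False
        if x in self.B2:                          # ghost hit: frequency list was too small
            self.p = max(0, self.p - max(len(self.B1) // max(len(self.B2), 1), 1))
            self._replace(x); del self.B2[x]
            self.T2[x] = None
            return False
        # Complete miss: make room, then insert into the recency list.
        if len(self.T1) + len(self.B1) == self.c:
            if len(self.T1) < self.c:
                self.B1.popitem(last=False)
                self._replace(x)
            else:
                self.T1.popitem(last=False)
        elif len(self.T1) + len(self.T2) + len(self.B1) + len(self.B2) >= self.c:
            if len(self.T1) + len(self.T2) + len(self.B1) + len(self.B2) >= 2 * self.c:
                self.B2.popitem(last=False)
            self._replace(x)
        self.T1[x] = None
        return False
```

A one-time sequential scan only ever passes through T1, which is how the scan resistance mentioned above falls out of the list structure.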

938 citations


Proceedings ArticleDOI
01 May 2003
TL;DR: This work introduces a novel cache architecture intended for embedded microprocessor platforms that can be configured by software to be direct-mapped, two-way, or four-way set associative, using a technique the authors call way concatenation, having very little size or performance overhead.
Abstract: Energy consumption is a major concern in many embedded computing systems. Several studies have shown that cache memories account for about 50% of the total energy consumed in these systems. The performance of a given cache architecture is largely determined by the behavior of the application using that cache. Desktop systems have to accommodate a very wide range of applications and therefore the manufacturer usually sets the cache architecture as a compromise given current applications, technology and cost. Unlike desktop systems, embedded systems are designed to run a small range of well-defined applications. In this context, a cache architecture that is tuned for that narrow range of applications can have both increased performance as well as lower energy consumption. We introduce a novel cache architecture intended for embedded microprocessor platforms. The cache can be configured by software to be direct-mapped, two-way, or four-way set associative, using a technique we call way concatenation, having very little size or performance overhead. We show that the proposed cache architecture reduces energy caused by dynamic power compared to a way-shutdown cache. Furthermore, we extend the cache architecture to also support a way-shutdown method designed to reduce the energy from static power that is increasing in importance in newer CMOS technologies. Our study of 23 programs drawn from Powerstone, MediaBench and SPEC2000 shows that tuning the cache's configuration saves energy for every program compared to conventional four-way set-associative as well as direct-mapped caches, with average savings of 40% compared to a four-way conventional cache.
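A rough way to picture way concatenation is as an address-decoding change: the physical ways keep their size, and when software selects a lower associativity the spare index bits choose which ways are allowed to hold a given line, so total capacity is unchanged. The sketch below models only that indexing step; the cache size, line size, and interleaved way grouping are illustrative assumptions, not the paper's circuit.

```python
LINE_BYTES = 32
BANKS = 4                                            # four physical ways (banks)
CACHE_BYTES = 8 * 1024
SETS_PER_BANK = CACHE_BYTES // LINE_BYTES // BANKS   # 64 sets in each bank

def way_concat_lookup(addr: int, assoc: int):
    """Return (set_index, enabled_banks, tag) for a configured associativity of 4, 2, or 1.

    Lowering the associativity "concatenates" ways: extra address bits restrict
    which physical banks may hold the line, while total capacity stays the same."""
    assert assoc in (1, 2, 4)
    block = addr // LINE_BYTES
    index = block % SETS_PER_BANK                    # same per-bank index in every mode
    groups = BANKS // assoc                          # 1, 2, or 4 bank groups
    group = (block // SETS_PER_BANK) % groups        # selected by the extra index bits
    enabled = [w for w in range(BANKS) if w % groups == group]
    tag = block // (SETS_PER_BANK * groups)          # everything above index + group bits
    return index, enabled, tag

# The same address is a candidate for 4, 2, or 1 banks depending on the configuration.
for assoc in (4, 2, 1):
    print(assoc, way_concat_lookup(0x1F40, assoc))
```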

323 citations


Proceedings ArticleDOI
08 Feb 2003
TL;DR: Simulations show that for the best of the methods, the performance overhead is less than 25%, a significant decrease from the 10× overhead of a naive implementation.
Abstract: We study the hardware cost of implementing hash-tree based verification of untrusted external memory by a high performance processor. This verification could enable applications such as certified program execution. A number of schemes are presented with different levels of integration between the on-processor L2 cache and the hash-tree machinery. Simulations show that for the best of our methods, the performance overhead is less than 25%, a significant decrease from the 10× overhead of a naive implementation.
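The hash-tree machinery being integrated with the L2 cache can be pictured with a small software Merkle tree: each memory block is hashed, hashes are combined pairwise up to a root, and verifying one block only requires recomputing its path to the root. The sketch below is a plain in-memory illustration (SHA-256, power-of-two block count assumed); in the schemes studied above, the root stays on-chip and intermediate hashes may be cached alongside normal data.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(blocks):
    """Binary hash tree over memory blocks (block count assumed a power of two).
    levels[0] holds the leaf hashes; levels[-1][0] is the root."""
    assert len(blocks) & (len(blocks) - 1) == 0
    level = [h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def verify_block(block: bytes, index: int, levels) -> bool:
    """Recompute the path from one block up to the root and compare."""
    node = h(block)
    for level in levels[:-1]:
        sibling = level[index ^ 1]
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == levels[-1][0]

# Example: 8 untrusted memory blocks; block 2 verifies, a tampered block does not.
blocks = [bytes([i]) * 64 for i in range(8)]
tree = build_tree(blocks)
print(verify_block(blocks[2], 2, tree))        # True
print(verify_block(b"tampered " * 8, 2, tree)) # False
```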

244 citations


Proceedings ArticleDOI
03 Dec 2003
TL;DR: NuRAPID is proposed, which leverages sequential tag-data access to decouple data placement from tag placement, resulting in higher performance and substantially lower cache energy.
Abstract: Wire delays continue to grow as the dominant component of latency for large caches. A recent work proposed an adaptive, non-uniform cache architecture (NUCA) to manage large, on-chip caches. By exploiting the variation in access time across widely-spaced subarrays, NUCA allows fast access to close subarrays while retaining slow access to far subarrays. While the idea of NUCA is attractive, NUCA does not employ design choices commonly used in large caches, such as sequential tag-data access for low power. Moreover, NUCA couples data placement with tag placement, foregoing the flexibility of data placement and replacement that is possible in a non-uniform access cache. Consequently, NUCA can place only a few blocks within a given cache set in the fastest subarrays, and must employ a high-bandwidth switched network to swap blocks within the cache for high performance. In this paper, we propose the Non-uniform access with Replacement And Placement usIng Distance associativity cache, or NuRAPID, which leverages sequential tag-data access to decouple data placement from tag placement. Distance associativity, the placement of data at a certain distance (and latency), is separated from set associativity, the placement of tags within a set. This decoupling enables NuRAPID to place flexibly the vast majority of frequently-accessed data in the fastest subarrays, with fewer swaps than NUCA. Distance associativity fundamentally changes the trade-offs made by NUCA's best-performing design, resulting in higher performance and substantially lower cache energy. A one-ported, non-banked NuRAPID cache improves performance by 3% on average and up to 15% compared to a multi-banked NUCA with an infinite-bandwidth switched network, while reducing L2 cache energy by 77%.

210 citations


Patent
24 Mar 2003
TL;DR: In this article, a centralized cache server connected to a plurality of web servers provides a cached copy of the requested dynamic content if it is available in its cache and the cached copy is still fresh.
Abstract: A method and system for optimizing Internet applications. A centralized cache server connected to a plurality of web servers provides a cached copy of the requested dynamic content if it is available in its cache. Preferably, the centralized cache server determines if the cached copy is still fresh. If the requested content is unavailable from its cache, the centralized cache server directs the client request to the application server. The response is delivered to the client and a copy of the response is stored in the cache by the centralized cache server. Preferably, the centralized cache server utilizes pre-determined caching rules to selectively store the response from the application server.
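The request path in the abstract (serve from cache if a fresh copy exists, otherwise forward to the application server and selectively store the response) can be sketched as below. The `fetch_from_app_server` callable, the TTL-based freshness check, and the URL-prefix caching rule are illustrative stand-ins, not the patent's actual mechanisms.

```python
import time

class CentralCacheSketch:
    """Sketch of a centralized cache sitting in front of an application server."""

    def __init__(self, fetch_from_app_server, ttl_seconds=30):
        self.fetch = fetch_from_app_server      # callable: url -> response body
        self.ttl = ttl_seconds                  # stand-in freshness policy
        self.store = {}                         # url -> (response, cached_at)

    def cacheable(self, url, response):
        # Illustrative "pre-determined caching rule": never cache private pages.
        return not url.startswith("/private/")

    def handle(self, url):
        entry = self.store.get(url)
        if entry is not None:
            response, cached_at = entry
            if time.time() - cached_at < self.ttl:   # cached copy still fresh
                return response
        response = self.fetch(url)              # miss or stale: go to the app server
        if self.cacheable(url, response):       # selectively store the response
            self.store[url] = (response, time.time())
        return response

# Example with a dummy application server.
cache = CentralCacheSketch(lambda url: f"rendered page for {url}")
print(cache.handle("/catalog/42"))   # first request goes to the app server
print(cache.handle("/catalog/42"))   # second request is served from the cache
```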

163 citations


Proceedings ArticleDOI
10 Jun 2003
TL;DR: This paper combines compile-time cache analysis with data cache locking to estimate the worst-case memory performance (WCMP) in a safe, tight and fast way, and shows that this scheme is fully predictable, without compromising the performance of the transformed program.
Abstract: Caches have become increasingly important with the widening gap between main memory and processor speeds. However, they are a source of unpredictability due to their characteristics, resulting in programs behaving in a different way than expected. Cache locking mechanisms adapt caches to the needs of real-time systems. Locking the cache is a solution that trades performance for predictability: at a cost of generally lower performance, the time of accessing the memory becomes predictable. This paper combines compile-time cache analysis with data cache locking to estimate the worst-case memory performance (WCMP) in a safe, tight and fast way. In order to get predictable cache behavior, we first lock the cache for those parts of the code where the static analysis fails. To minimize the performance degradation, our method loads the cache, if necessary, with data likely to be accessed. Experimental results show that this scheme is fully predictable, without compromising the performance of the transformed program. When compared to an algorithm that assumes compulsory misses when the state of the cache is unknown, our approach eliminates all overestimation for the set of benchmarks, giving an exact WCMP of the transformed program without any significant decrease in performance.

155 citations


Book ChapterDOI
09 Sep 2003
TL;DR: This work introduces a new database object called Cache Table that enables persistent caching of the full or partial content of a remote database table; the solution supports transparent caching both at the edge of content-delivery networks and in the middle tier of an enterprise application infrastructure, improving the response time, throughput and scalability of transactional web applications.
Abstract: We introduce a new database object called Cache Table that enables persistent caching of the full or partial content of a remote database table. The content of a cache table is either defined declaratively and populated in advance at setup time, or determined dynamically and populated on demand at query execution time. Dynamic cache tables exploit the characteristics of typical transactional web applications with a high volume of short transactions, simple equality predicates, and 3-4 way joins. Based on federated query processing capabilities, we developed a set of new technologies for database caching: cache tables, "Janus" (two-headed) query execution plans, cache constraints, and asynchronous cache population methods. Our solution supports transparent caching both at the edge of content-delivery networks and in the middle-tier of an enterprise application infrastructure, improving the response time, throughput and scalability of transactional web applications.

151 citations


Patent
21 Apr 2003
TL;DR: In this article, the disk drive has a cache control system that is configured to efficiently respond to host commands by forming variable length segments of memory clusters for caching disk data in contiguous ranges of logical block addresses without regard to the sequential order of the memory clusters.
Abstract: The present invention is embodied in the disk drive having a cache control system that is configured to efficiently respond to host commands by forming variable length segments of memory clusters for caching disk data in contiguous ranges of logical block addresses without regard to the sequential order of the memory clusters. The cache control system has a tag memory usable only for defining the segments. The tag memory has a plurality of tag records pointing to cluster control blocks associated with the memory clusters for defining the segments. The tag memory may be accessed and updated by several state machines in the cache control system and by a microprocessor in the disk drive.

141 citations


Journal ArticleDOI
TL;DR: Simulations show that an average of 73% of I-cache lines and 54% of D-cache lines are put in sleep mode with an average IPC impact of only 1.7%, for 64 KB caches, and this work proposes applying sleep mode only to the data store and not the tag store.
Abstract: Lower threshold voltages in deep submicron technologies cause more leakage current, increasing static power dissipation. This trend, combined with the trend of larger/more cache memories dominating die area, has prompted circuit designers to develop SRAM cells with low-leakage operating modes (e.g., sleep mode). Sleep mode reduces static power dissipation, but data stored in a sleeping cell is unreliable or lost. So, at the architecture level, there is interest in exploiting sleep mode to reduce static power dissipation while maintaining high performance. Current approaches dynamically control the operating mode of large groups of cache lines or even individual cache lines. However, the performance monitoring mechanism that controls the percentage of sleep-mode lines, and identifies particular lines for sleep mode, is somewhat arbitrary. There is no way to know what the performance could be with all cache lines active, so arbitrary miss rate targets are set (perhaps on a per-benchmark basis using profile information), and the control mechanism tracks these targets. We propose applying sleep mode only to the data store and not the tag store. By keeping the entire tag store active the hardware knows what the hypothetical miss rate would be if all data lines were active, and the actual miss rate can be made to precisely track it. Simulations show that an average of 73% of I-cache lines and 54% of D-cache lines are put in sleep mode with an average IPC impact of only 1.7%, for 64 KB caches.
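The key point above is that an always-active tag store lets the hardware measure the miss rate it would have had with every data line awake and compare it with the actual miss rate under sleep mode. The toy direct-mapped model below captures only that bookkeeping; the idle-decay policy and its threshold are assumptions added for illustration, not the paper's control mechanism.

```python
class DrowsyDataCacheSketch:
    """Direct-mapped cache whose tag store is always powered.

    Every lookup checks the tag, so the controller knows both the hypothetical
    hit rate (all data lines active) and the actual hit rate (tag match AND the
    data line is awake). A tag match on a sleeping line counts as a data miss,
    since the line's contents must be refetched."""

    DECAY = 1000                        # accesses of inactivity before a line sleeps

    def __init__(self, n_lines):
        self.tags = [None] * n_lines    # tag store: always active
        self.awake = [False] * n_lines  # per-line power state of the data store
        self.last_use = [0] * n_lines
        self.clock = 0
        self.hyp_hits = self.real_hits = self.accesses = 0

    def access(self, addr):
        self.clock += 1
        self.accesses += 1
        idx, tag = addr % len(self.tags), addr // len(self.tags)
        if self.tags[idx] == tag:
            self.hyp_hits += 1          # would have hit with all lines active
            if self.awake[idx]:
                self.real_hits += 1     # actual hit
        else:
            self.tags[idx] = tag        # fill on miss
        self.awake[idx] = True          # wake the line (refetch if it was asleep)
        self.last_use[idx] = self.clock
        # Put long-idle data lines (never their tags) into sleep mode.
        for i, last in enumerate(self.last_use):
            if self.awake[i] and self.clock - last > self.DECAY:
                self.awake[i] = False
```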

140 citations


Proceedings ArticleDOI
22 Jun 2003
TL;DR: This paper proposes a novel solution to this problem by allowing in-cache replication, wherein reliability can be enhanced without excessively slowing down cache accesses or requiring significant area cost increases.
Abstract: Processor caches already play a critical role in the performance of today's computer systems. At the same time, the data integrity of words coming out of the caches can have serious consequences on the ability of a program to execute correctly, or even to proceed. The integrity checks need to be performed in a time-sensitive manner to not slow down the execution when there are no errors, as in the common case, and should not excessively increase the power budget of the caches, which is already high. ECC and parity-based protection techniques in use today fall at either extreme in terms of compromising one criterion for another, i.e., reliability for performance or vice versa. This paper proposes a novel solution to this problem by allowing in-cache replication, wherein reliability can be enhanced without excessively slowing down cache accesses or requiring significant area cost increases. The mechanism is fairly power efficient in comparison to other alternatives as well. In particular, the solution replicates data that is in active use within the cache itself while evicting those that may not be needed in the near future. Our experiments show that a large fraction of the data read from the cache have replicas available with this optimization.

129 citations


Patent
14 Oct 2003
TL;DR: In this article, a power saving cache includes circuitry to dynamically reduce the logical size of the cache in order to save power, using a variety of combinable hardware and software techniques.
Abstract: A power saving cache and a method of operating a power saving cache. The power saving cache includes circuitry to dynamically reduce the logical size of the cache in order to save power. Preferably, a method is used to determine the optimal cache size for balancing power and performance, using a variety of combinable hardware and software techniques. Also, in a preferred embodiment, steps are used for maintaining coherency during cache resizing, including the handling of modified (“dirty”) data in the cache, and steps are provided for partitioning a cache in one of several ways to provide an appropriate configuration and granularity when resizing.

Patent
29 Oct 2003
TL;DR: In this paper, a method and apparatus are provided for reliable diskless network-bootable computers using a local non-volatile memory (NVM) cache, which allows the user to continue operating during network outages; the computer can also be cold-booted using the data in the NVM cache if the network is unavailable.
Abstract: A method and apparatus are provided for reliable diskless network-bootable computers using a local non-volatile memory (NVM) cache. The NVM cache is used by the computer when the network is temporarily unavailable or slow. The cache is later synchronized with a remote boot server having remote storage volumes when network conditions improve. It is determined if data is to be stored in the NVM cache or the remote storage volume. Data sent to the remote storage volume is transactionally written, and the data is cached in the NVM cache if a network outage is occurring or a transaction complete message has not been received. The data stored in the NVM cache allows the user to continue operating during network outages, and the computer can be cold-booted using the data in the NVM cache if the network is unavailable.

Patent
William B. Boyle1
31 Jul 2003
TL;DR: In this article, a disk drive control system comprising a micro-controller, a micro-controller cache system adapted to store micro-controller data for access by the micro-controller, a buffer manager adapted to provide the micro-controller cache system with micro-controller requested data stored in a remote memory, and a cache demand circuit adapted to: a) receive a memory address and a memory access signal, and b) cause the micro-controller cache system to fetch data from the remote memory via the buffer manager based on the received memory address.
Abstract: A disk drive control system comprising a micro-controller, a micro-controller cache system adapted to store micro-controller data for access by the micro-controller, a buffer manager adapted to provide the micro-controller cache system with micro-controller requested data stored in a remote memory, and a cache demand circuit adapted to: a) receive a memory address and a memory access signal, and b) cause the micro-controller cache system to fetch data from the remote memory via the buffer manager based on the received memory address and memory access signal prior to a micro-controller request.

Patent
William B. Boyle1
31 Jul 2003
TL;DR: In this article, a method and system for improving fetch operations between a micro-controller and a remote memory via a buffer manager in a disk drive control system comprising a micro-controller, a micro-controller cache system having a cache memory and a cache-control subsystem, and a buffer manager communicating with the micro-controller cache system and the remote memory.
Abstract: A method and system for improving fetch operations between a micro-controller and a remote memory via a buffer manager in a disk drive control system comprising a micro-controller, a micro-controller cache system having a cache memory and a cache-control subsystem, and a buffer manager communicating with the micro-controller cache system and the remote memory. The invention includes receiving a data-request from the micro-controller in the cache-control subsystem, wherein the data-request comprises a request for at least one of instruction code and non-instruction data. The invention further includes providing the requested data to the micro-controller if the requested data resides in the cache memory, determining whether the received data-request is for non-instruction data if the requested data does not reside in the cache memory, fetching the non-instruction data from the remote memory by the micro-controller cache system via the buffer manager, and bypassing the cache memory to preserve its contents while providing the fetched non-instruction data to the micro-controller.

Patent
29 Aug 2003
TL;DR: In this paper, a disk drive for executing a program comprising a plurality of instructions is described, and the disk drive comprises a primary memory for storing the instructions and a cache memory for caching the instructions.
Abstract: A disk drive is disclosed for executing a program comprising a plurality of instructions. The disk drive comprises a primary memory for storing the instructions, and a cache memory for caching the instructions. Cache management is enhanced by not re-filling the cache when a non-sequential immediate operand is accessed.

Patent
05 May 2003
TL;DR: In this article, an edge server and caching system is proposed, where the edge server may have a cache, cache listing, profile data, multimedia server, and internet information server.
Abstract: The invention is directed to an edge server and caching system. The edge server may have a cache, cache listing, profile data, multimedia server, and internet information server. A viewer may request a file with a specific version. The edge server may determine if the file is stored locally. If the file is not stored locally, the edge server may simultaneously cache and stream the media. If the file is available, the media may be streamed from the cache. The cache may be managed with a cache listing. The cache listing may be ordered by time of last use and may have profile data. Storage capacity may be managed by deleting the last file in the list. The profile data may be used to manage and distribute streaming media.
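The cache listing described above amounts to an LRU list of files capped by storage capacity: requested files move to the front, and the last file in the list is deleted when space runs out. A minimal sketch, assuming byte-based capacity accounting and ignoring the streaming and profile-data aspects of the patent:

```python
from collections import OrderedDict

class EdgeFileCacheSketch:
    """LRU listing of cached media files, capped by total size in bytes."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.files = OrderedDict()            # (name, version) -> size; MRU at the end

    def request(self, name, version, size, fetch_from_origin):
        key = (name, version)
        if key in self.files:                 # stream from the local cache
            self.files.move_to_end(key)
            return "cache"
        fetch_from_origin(name, version)      # cache and stream simultaneously
        self.files[key] = size
        self.used += size
        while self.used > self.capacity:      # delete the least recently used file
            _, dropped_size = self.files.popitem(last=False)
            self.used -= dropped_size
        return "origin"
```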

Patent
18 Feb 2003
TL;DR: In this paper, a cache in the memory controller stores entries that indicate a current power state for a subset of the dynamic memory devices, and a cache update logic updates information stored in the cache in accordance with the at least one update control signal.
Abstract: A memory controller controls access to, and the power state of a plurality of dynamic memory devices. A cache in the memory controller stores entries that indicate a current power state for a subset of the dynamic memory devices. Device state lookup logic responds to a memory access request by retrieving first information from an entry, if any, in the cache corresponding to a device address in the memory access request. The device state lookup logic generates a miss signal when the cache has no entry corresponding to the device address. It also retrieves second information indicating whether the cache is currently storing a maximum allowed number of entries for devices in a predefined mid-power state. Additional logic converts the first and second information and miss signal into at least one command selection signal and at least one update control signal. Cache update logic updates information stored in the cache in accordance with the at least one update control signal. Command issue circuitry issues power state commands and access commands to the dynamic memory devices in accordance with the at least one command selection signal and the address in the memory access request.

Patent
17 Jan 2003
TL;DR: In this paper, the authors propose a memory architecture in which method frames of method calls are stored in two different memory circuits: one memory circuit stores the execution environment of each method call, and the second memory circuit stores parameters, variables or operands of the method calls.
Abstract: A memory architecture in accordance with an embodiment of the present invention improves the speed of method invocation. Specifically, method frames of method calls are stored in two different memory circuits. The first memory circuit stores the execution environment of each method call, and the second memory circuit stores parameters, variables or operands of the method calls. In one embodiment the execution environment includes a return program counter, a return frame, a return constant pool, a current method vector, and a current monitor address. In some embodiments, the memory circuits are stacks; therefore, the stack management unit can be used to cache either or both memory circuits. The stack management unit can include a stack cache to accelerate data transfers between a stack-based computing system and the stacks. In one embodiment, the stack management unit includes a stack cache, a dribble manager unit, and a stack control unit. The dribble manager unit includes a fill control unit and a spill control unit. Since the vast majority of memory accesses to the stack occur at or near the top of the stack, the dribble manager unit maintains the top portion of the stack in the stack cache. When the stack-based computing system is popping data off of the stack and a fill condition occurs, the fill control unit transfers data from the stack to the bottom of the stack cache to maintain the top portion of the stack in the stack cache. Typically, a fill condition occurs as the stack cache becomes empty and a spill condition occurs as the stack cache becomes full.
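The dribble manager's role can be sketched as a window over the top of the stack: when the cached window grows past its capacity, entries at its bottom spill to backing memory, and when it drains below a fill threshold, entries dribble back in, so the top of the stack stays cached. The capacity, threshold, and list-based layout below are illustrative assumptions, not the patent's hardware design.

```python
class StackCacheSketch:
    """Stack cache backed by a stack in slower memory, with spill/fill dribbling."""

    def __init__(self, capacity=8, fill_threshold=2):
        self.capacity = capacity
        self.fill_threshold = fill_threshold
        self.cache = []        # cached top portion of the stack (top = end of list)
        self.memory = []       # spilled lower portion (entry nearest the cache = end)

    def push(self, value):
        self.cache.append(value)
        while len(self.cache) > self.capacity:       # spill condition: cache is full
            self.memory.append(self.cache.pop(0))    # dribble the bottom entry out

    def pop(self):
        value = self.cache.pop()
        while self.memory and len(self.cache) < self.fill_threshold:   # fill condition
            self.cache.insert(0, self.memory.pop())  # dribble entries back in
        return value

# Pushes spill older entries to memory; pops pull them back transparently.
s = StackCacheSketch(capacity=4)
for v in range(10):
    s.push(v)
print([s.pop() for _ in range(10)])   # [9, 8, ..., 0]
```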

Patent
Richard L. Coulson1
22 Dec 2003
TL;DR: In this paper, the cache coherency administrator can include a display to indicate a cache coherency status of a non-volatile cache, which can be used to check the cache's integrity.
Abstract: Apparatus and methods relating to a cache coherency administrator. The cache coherency administrator can include a display to indicate a cache coherency status of a non-volatile cache.

Journal ArticleDOI
TL;DR: The authors propose several designs that treat the cache as a network of banks and facilitate nonuniform accesses to different physical regions that offer low-latency access, increased scalability, and greater performance stability than conventional uniform access cache architectures.
Abstract: Nonuniform cache access designs solve the on-chip wire delay problem for future large integrated caches. By embedding a network in the cache, NUCA designs let data migrate within the cache, clustering the working set nearest the processor. The authors propose several designs that treat the cache as a network of banks and facilitate nonuniform accesses to different physical regions. NUCA architectures offer low-latency access, increased scalability, and greater performance stability than conventional uniform access cache architectures.

Patent
Dharmendra S. Modha1
21 Oct 2003
TL;DR: In this article, the authors propose a method, system, and program storage medium for adaptively managing pages in a cache memory included within a system having a variable workload, comprising arranging the cache memory into a circular buffer; maintaining a pointer that rotates around the circular buffer; and maintaining a bit for each page in the circular buffer, wherein a bit value 0 indicates that the page was not accessed by the system since the last time the pointer traversed over the page, and a bit value 1 indicates that the page has been accessed since the last time the pointer traversed over the page.
Abstract: A method, system, and program storage medium for adaptively managing pages in a cache memory included within a system having a variable workload, comprising arranging a cache memory included within a system into a circular buffer; maintaining a pointer that rotates around the circular buffer; maintaining a bit for each page in the circular buffer, wherein a bit value 0 indicates that the page was not accessed by the system since a last time that the pointer traversed over the page, and a bit value 1 indicates that the page has been accessed since the last time the pointer traversed over the page; and dynamically controlling a distribution of a number of pages in the cache memory that are marked with bit 0 in response to a variable workload in order to increase a hit ratio of the cache memory.
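The rotating pointer plus one access bit per page described above is, at its core, the classic CLOCK replacement scheme. The sketch below shows only that core; the patent's adaptive control over how many pages carry bit 0 is not modeled.

```python
class ClockSketch:
    """CLOCK-style replacement over a circular buffer of page frames."""

    def __init__(self, n_frames):
        self.pages = [None] * n_frames   # circular buffer of cached pages
        self.bits = [0] * n_frames       # 1 = accessed since the pointer last passed
        self.hand = 0                    # the rotating pointer

    def access(self, page):
        if page in self.pages:                 # hit: mark the page as recently used
            self.bits[self.pages.index(page)] = 1
            return True
        while self.bits[self.hand] == 1:       # rotate, clearing bits, until a page
            self.bits[self.hand] = 0           # unused since the last pass is found
            self.hand = (self.hand + 1) % len(self.pages)
        self.pages[self.hand] = page           # replace that page
        self.bits[self.hand] = 1
        self.hand = (self.hand + 1) % len(self.pages)
        return False
```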

Patent
06 Aug 2003
TL;DR: In this article, a method for preloading data on a cache (210) in a local machine (235), where the cache is coupled to a data store (130) in a remote host machine (240), is described.
Abstract: A method (400) of preloading data on a cache (210) in a local machine (235). The cache (210) is operably coupled to a data store (130) in a remote host machine (240). The method includes the steps of determining a user behaviour profile for the local machine (235); retrieving data relating to the user behaviour profile from the data store (130); and preloading the retrieved data in the cache (210), such that the data is made available to the cache user when desired. A local machine, a host machine, a cache, a communication system and preloading functions are also described. In this manner, data within the cache is maintained and replaced in a substantially optimal manner, and configured to be available to a cache user when it is predicted that the user wishes to access the data.

01 Jan 2003
TL;DR: A recent body of work has developed cache-oblivious algorithms and data structures that perform as well or nearly as well as standard external-memory structures which require knowledge of the cache/memory size and block transfer size.
Abstract: A recent direction in the design of cache-efficient and disk-efficient algorithms and data structures is the notion of cache obliviousness, introduced by Frigo, Leiserson, Prokop, and Ramachandran in 1999. Cache-oblivious algorithms perform well on a multilevel memory hierarchy without knowing any parameters of the hierarchy, only knowing the existence of a hierarchy. Equivalently, a single cache-oblivious algorithm is efficient on all memory hierarchies simultaneously. While such results might seem impossible, a recent body of work has developed cache-oblivious algorithms and data structures that perform as well or nearly as well as standard external-memory structures which require knowledge of the cache/memory size and block transfer size. Here we describe several of these results with the intent of elucidating the techniques behind their design. Perhaps the most exciting of these results are the data structures, which form general building blocks immediately leading to several algorithmic results.
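A standard example of the cache-oblivious style surveyed above is divide-and-conquer matrix transposition: by always recursing on the larger dimension, the subproblem eventually fits in whatever cache level exists, yet no cache size or block size appears anywhere in the code. A minimal sketch (an illustration of the technique, not an algorithm taken from this survey):

```python
def co_transpose(A, B, r0=0, c0=0, rows=None, cols=None):
    """Cache-obliviously write the transpose of A into B (B[j][i] = A[i][j])."""
    if rows is None:
        rows, cols = len(A), len(A[0])
    if rows * cols <= 16:                     # small base case: plain loops
        for i in range(r0, r0 + rows):
            for j in range(c0, c0 + cols):
                B[j][i] = A[i][j]
    elif rows >= cols:                        # split the larger dimension
        half = rows // 2
        co_transpose(A, B, r0, c0, half, cols)
        co_transpose(A, B, r0 + half, c0, rows - half, cols)
    else:
        half = cols // 2
        co_transpose(A, B, r0, c0, rows, half)
        co_transpose(A, B, r0, c0 + half, rows, cols - half)

# Example on a 4x6 matrix.
A = [[r * 6 + c for c in range(6)] for r in range(4)]
B = [[0] * 4 for _ in range(6)]
co_transpose(A, B)
assert all(B[j][i] == A[i][j] for i in range(4) for j in range(6))
```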

Proceedings ArticleDOI
03 Dec 2003
TL;DR: In this paper, runtime data cache prefetching is implemented in the dynamic optimization system ADORE (ADaptive Object code Reoptimization) to overcome the limitations of traditional software-controlled prefetching, which is often ineffective due to the lack of runtime cache miss and miss address information.
Abstract: Traditional software controlled data cache prefetching is often ineffective due to the lack of runtime cache miss and miss address information. To overcome this limitation, we implement runtime data cache prefetching in the dynamic optimization system ADORE (ADaptive Object code Reoptimization). Its performance has been compared with static software prefetching on the SPEC2000 benchmark suite. Runtime cache prefetching shows better performance. On an Itanium 2 based Linux workstation, it can increase performance by more than 20% over static prefetching on some benchmarks. For benchmarks that do not benefit from prefetching, the runtime optimization system adds only 1%-2% overhead. We have also collected cache miss profiles to guide static data cache prefetching in the ORC compiler. With that information the compiler can effectively avoid generating prefetches for loops that hit well in the data cache.

Patent
25 Aug 2003
TL;DR: In this article, the authors proposed a cache management method that enables optimal cache space settings to be provided on a storage device in a computer system where database management systems (DBMSs) run.
Abstract: A cache management method disclosed herein enables optimal cache space settings to be provided on a storage device in a computer system where database management systems (DBMSs) run. Through the disclosed method, cache space partitions to be used per data set are set, based on information about processes to be executed by the DBMSs, which is given as design information. For example, based on estimated rerun time of processes required after DBMS abnormal termination, cache space is adjusted to serve the needs of logs to be output from the DBMS. In another example, initial cache space allocations for table and index data is optimized, based on process types and approximate access characteristics of data. In yet another example, from a combination of results of pre-analysis of processes and cache operating statistics information, a change in process execution time by cache space tuning is estimated and a cache effect is enhanced.

Proceedings Article
01 Jan 2003
TL;DR: This paper presents an eviction-based placement policy for a storage cache that usually sits in the lower level of a multi-level buffer cache hierarchy and thereby has different access patterns from upper levels, and presents a method of using a client content tracking table to obtain eviction information from client buffer caches.
Abstract: Most previous work on buffer cache management uses an access-based placement policy that places a data block into a buffer cache at the block's access time. This paper presents an eviction-based placement policy for a storage cache that usually sits in the lower level of a multi-level buffer cache hierarchy and thereby has different access patterns from upper levels. The main idea of the eviction-based placement policy is to delay a block's placement in the cache until it is evicted from the upper level. This paper also presents a method of using a client content tracking table to obtain eviction information from client buffer caches, which can avoid modifying client application source code. We have evaluated the performance of this eviction-based placement by using both simulations with real-world workloads, and implementations on a storage system connected to a Microsoft SQL Server database. Our simulation results show that the eviction-based cache placement improves cache hit ratios by up to 500% over the commonly used access-based placement policy. Our evaluation results using OLTP workloads have demonstrated that the eviction-based cache placement yields a speedup of 1.2 in OLTP transaction rates.
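The policy's core idea can be reproduced with a few lines of simulation: the storage cache admits a block only when the client cache evicts it, instead of at access time, so the two levels stop duplicating each other's contents. The sketch below compares the two placement policies on an arbitrary trace; the simple LRU client cache and the hit-ratio accounting are simplifying assumptions, not the paper's experimental setup.

```python
from collections import OrderedDict
import random

class LRU:
    def __init__(self, size):
        self.size, self.d = size, OrderedDict()

    def access(self, k):
        """Touch k; return (hit, evicted_key_or_None)."""
        if k in self.d:
            self.d.move_to_end(k)
            return True, None
        self.d[k] = None
        evicted = self.d.popitem(last=False)[0] if len(self.d) > self.size else None
        return False, evicted

def storage_hit_ratio(trace, client_size, storage_size, eviction_based):
    client, storage = LRU(client_size), LRU(storage_size)
    hits = refs = 0
    for blk in trace:
        client_hit, evicted = client.access(blk)
        if not client_hit:                    # a client miss is sent to the storage cache
            refs += 1
            if blk in storage.d:
                storage.d.move_to_end(blk)
                hits += 1
            elif not eviction_based:          # access-based: place at access time
                storage.access(blk)
        if eviction_based and evicted is not None:
            storage.access(evicted)           # place only when the upper level evicts
    return hits / max(refs, 1)

random.seed(1)
trace = [random.randint(0, 499) for _ in range(20000)]
for mode in (False, True):
    label = "eviction-based" if mode else "access-based "
    print(label, round(storage_hit_ratio(trace, 100, 200, eviction_based=mode), 3))
```

Because the eviction-based variant keeps the storage cache roughly exclusive of the client cache, the two levels together cover more distinct blocks, which is the effect the paper measures at much larger scale.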

Patent
16 May 2003
TL;DR: In this article, a cache memory management system for snapshot applications is presented, which includes a cache directory including a hash table, hash table elements, cache line descriptors, and cache line functional pointers.
Abstract: The present invention relates to a cache memory management system suitable for use with snapshot applications. The system includes a cache directory comprising a hash table, hash table elements, cache line descriptors, and cache line functional pointers; a cache manager running a hashing function that converts a request for data from an application into an index to a first hash table pointer in the hash table; and a cache memory including a plurality of cache lines. The first hash table pointer in turn points to a first hash table element in a linked list of hash table elements, one of which points to a first cache line descriptor in the cache directory, where the first cache line descriptor has a one-to-one association with a first cache line. The present invention also provides a method of converting a request for data into an input to a hashing function, addressing a hash table based on a first index output from the hashing function, searching the hash table elements pointed to by the first index for the requested data, determining that the requested data is not in cache memory, and allocating a first hash table element and a first cache line descriptor associated with a first cache line in the cache memory.
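The directory structures named in the abstract (a hash table, per-bucket linked lists of hash table elements, and cache line descriptors in one-to-one association with cache lines) map naturally onto the sketch below. The bucket count and the free-list allocation with no eviction handling are illustrative assumptions.

```python
from dataclasses import dataclass

N_BUCKETS = 64

@dataclass
class CacheLineDescriptor:
    line_index: int          # one-to-one association with a physical cache line
    block_id: int = -1       # which data block currently occupies the line
    valid: bool = False

@dataclass
class HashTableElement:
    block_id: int
    descriptor: CacheLineDescriptor
    next: "HashTableElement | None" = None   # linked list within one hash bucket

class CacheDirectorySketch:
    """A request's block id hashes to a bucket; the bucket's element list is
    searched; a matching element points at the descriptor of the cache line
    that holds the data."""

    def __init__(self, n_lines):
        self.buckets = [None] * N_BUCKETS
        self.free = [CacheLineDescriptor(i) for i in range(n_lines)]

    def lookup(self, block_id):
        elem = self.buckets[hash(block_id) % N_BUCKETS]
        while elem is not None:
            if elem.block_id == block_id:
                return elem.descriptor       # hit: the data is in this cache line
            elem = elem.next
        return None                          # miss: caller allocates a line

    def allocate(self, block_id):
        desc = self.free.pop()               # no eviction handling in this sketch
        desc.block_id, desc.valid = block_id, True
        bucket = hash(block_id) % N_BUCKETS
        self.buckets[bucket] = HashTableElement(block_id, desc, self.buckets[bucket])
        return desc
```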

Proceedings ArticleDOI
08 Feb 2003
TL;DR: This paper first explores the simple case of two static miss costs using trace-driven simulations to understand when cost-sensitive replacements are effective, and proposes several extensions of LRU which account for nonuniform miss costs.
Abstract: Cache replacement algorithms originally developed in the context of simple uniprocessor systems aim to reduce the miss count. However, in modern systems, cache misses have different costs. The cost may be latency, penalty, power consumption, bandwidth consumption, or any other ad-hoc numerical property attached to a miss. In many practical situations, it is desirable to inject the cost of a miss into the replacement policy. In this paper, we propose several extensions of LRU which account for nonuniform miss costs. These LRU extensions have simple implementations, yet they are very effective in various situations. We first explore the simple case of two static miss costs using trace-driven simulations to understand when cost-sensitive replacements are effective. We show that very large improvements of the cost function are possible in many practical cases. As an example of their effectiveness, we apply the algorithms to the second-level cache of a multiprocessor with superscalar processors, using the miss latency as the cost function. By applying our simple replacement policies sensitive to the latency of misses we can improve the execution time of some parallel applications by up to 18%.
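One simple way to "inject the cost of a miss into the replacement policy", in the spirit of the LRU extensions described above, is to restrict the victim search to the few least-recently-used lines and evict the cheapest of them, so that high-cost blocks (long-latency refetches, for instance) survive a little longer. This is an illustrative variant, not the paper's specific algorithms; the window size is an assumed parameter.

```python
from collections import OrderedDict

class CostAwareLRUSketch:
    """LRU with a cost-biased victim choice among the oldest `window` lines."""

    def __init__(self, size, window=4):
        self.size, self.window = size, window
        self.lines = OrderedDict()            # key -> estimated miss cost; MRU at end

    def access(self, key, cost):
        """Touch `key`, whose miss would cost `cost`; return True on a hit."""
        if key in self.lines:
            self.lines.move_to_end(key)
            return True
        if len(self.lines) >= self.size:
            oldest = list(self.lines.items())[: self.window]   # LRU candidates
            victim = min(oldest, key=lambda kv: kv[1])[0]      # cheapest miss cost
            del self.lines[victim]
        self.lines[key] = cost
        return False
```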

Patent
15 Dec 2003
TL;DR: In this article, the authors present a system that facilitates delaying interfering memory accesses from other threads during transactional execution by storing copy-back information for the cache line to enable the cache lines to be copied back to the requesting thread.
Abstract: One embodiment of the present invention provides a system that facilitates delaying interfering memory accesses from other threads during transactional execution. During transactional execution of a block of instructions, the system receives a request from another thread (or processor) to perform a memory access involving a cache line. If performing the memory access on the cache line will interfere with the transactional execution and if it is possible to delay the memory access, the system delays the memory access and stores copy-back information for the cache line to enable the cache line to be copied back to the requesting thread. At a later time, when the memory access will no longer interfere with the transactional execution, the system performs the memory access and copies the cache line back to the requesting thread.

Proceedings ArticleDOI
27 Sep 2003
TL;DR: This work uses the recently published locality analysis to generate a parameterized model of program cache behavior that predicts the miss rate for arbitrary data input set sizes and identifies critical data input sizes where cache behavior exhibits marked changes.
Abstract: Improving cache performance requires understanding cache behavior. However, measuring cache performance for one or two data input sets provides little insight into how cache behavior varies across all data input sets. This paper uses our recently published locality analysis to generate a parameterized model of program cache behavior. Given a cache size and associativity, this model predicts the miss rate for arbitrary data input set sizes. This model also identifies critical data input sizes where cache behavior exhibits marked changes. Experiments show this technique is within 2% of the hit rate for set associative caches on a set of integer and floating-point programs.