Showing papers on "Cache algorithms published in 2003"

PDF

Open Access

Proceedings Article•

ARC: a self-tuning, low overhead replacement cache

[...]

Nimrod Megiddo¹, Dharmendra S. Modha¹•Institutions (1)

31 Mar 2003

TL;DR: The problem of cache management in a demand paging scenario with uniform page sizes is considered and a new cache management policy, namely, Adaptive Replacement Cache (ARC), is proposed that has several advantages.

...read moreread less

Abstract: We consider the problem of cache management in a demand paging scenario with uniform page sizes. We propose a new cache management policy, namely, Adaptive Replacement Cache (ARC), that has several advantages. In response to evolving and changing access patterns, ARC dynamically, adaptively, and continually balances between the recency and frequency components in an online and selftuning fashion. The policy ARC uses a learning rule to adaptively and continually revise its assumptions about the workload. The policy ARC is empirically universal, that is, it empirically performs as well as a certain fixed replacement policy-even when the latter uses the best workload-specific tuning parameter that was selected in an offline fashion. Consequently, ARC works uniformly well across varied workloads and cache sizes without any need for workload specific a priori knowledge or tuning. Various policies such as LRU-2, 2Q, LRFU, and LIRS require user-defined parameters, and, unfortunately, no single choice works uniformly well across different workloads and cache sizes. The policy ARC is simple-to-implement and, like LRU, has constant complexity per request. In comparison, policies LRU-2 and LRFU both require logarithmic time complexity in the cache size. The policy ARC is scan-resistant: it allows one-time sequential requests to pass through without polluting the cache. On 23 real-life traces drawn from numerous domains, ARC leads to substantial performance gains over LRU for a wide range of cache sizes. For example, for a SPC1 like synthetic benchmark, at 4GB cache, LRU delivers a hit ratio of 9.19% while ARC achieves a hit ratio of 20.

...read moreread less

938 citations

Journal Article•DOI•

A survey of Web cache replacement strategies

[...]

Stefan Podlipnig¹, László Böszörményi¹•Institutions (1)

Alpen-Adria-Universität Klagenfurt¹

01 Dec 2003-ACM Computing Surveys

TL;DR: This article proposes a classification for proposals for cache replacement that subsumes prior classifications and discusses the importance of cache replacement strategies in modern proxy caches and outlines potential future research topics.

...read moreread less

Abstract: Web caching is an important technique to scale the Internet. One important performance factor of Web caches is the replacement strategy. Due to specific characteristics of the World Wide Web, there exist a huge number of proposals for cache replacement. This article proposes a classification for these proposals that subsumes prior classifications. Using this classification, different proposals and their advantages and disadvantages are described. Furthermore, the article discusses the importance of cache replacement strategies in modern proxy caches and outlines potential future research topics.

...read moreread less

767 citations

Book Chapter•DOI•

Cache-oblivious algorithms

[...]

Charles E. Leiserson¹•Institutions (1)

Massachusetts Institute of Technology¹

28 May 2003

TL;DR: It is proved that an optimal cache-oblivious algorithm designed for two levels of memory is also optimal across a multilevel cache hierarchy, and it is shown that the assumption of optimal replacement made by the ideal-cache model can be simulated efficiently by LRU replacement.

...read moreread less

Abstract: Computers with multiple levels of caching have traditionally required techniques such as data blocking in order for algorithms to exploit the cache hierarchy effectively. These "cache-aware" algorithms must be properly tuned to achieve good performance using so-called "voodoo" parameters which depend on hardware properties, such as cache size and cache-line length. Surprisingly, however, for a variety of problems - including matrix multiplication, FFT, and sorting - asymptotically optimal "cache-oblivious" algorithms do exist that contain no voodoo parameters. They perform an optimal amount of work and move data optimally among multiple levels of cache. Since they need not be tuned, cache-oblivious algorithms are more portable than traditional cache-aware algorithms. We employ an "ideal-cache" model to analyze these algorithms. We prove that an optimal cache-oblivious algorithm designed for two levels of memory is also optimal across a multilevel cache hierarchy. We also show that the assumption of optimal replacement made by the ideal-cache model can be simulated efficiently by LRU replacement. We also provide some empirical results on the effectiveness of cache-oblivious algorithms in practice.

...read moreread less

604 citations

Proceedings Article•DOI•

A highly configurable cache architecture for embedded systems

[...]

Chuanjun Zhang¹, Frank Vahid¹, Walid Najjar¹•Institutions (1)

University of California, Riverside¹

01 May 2003

TL;DR: This work introduces a novel cache architecture intended for embedded microprocessor platforms that can be configured by software to be direct-mapped, two-way, or four-way set associative, using a technique the authors call way concatenation, having very little size or performance overhead.

...read moreread less

Abstract: Energy consumption is a major concern in many embedded computing systems. Several studies have shown that cache memories account for about 50% of the total energy consumed in these systems. The performance of a given cache architecture is largely determined by the behavior of the application using that cache. Desktop systems have to accommodate a very wide range of applications and therefore the manufacturer usually sets the cache architecture as a compromise given current applications, technology and cost. Unlike desktop systems, embedded systems are designed to run a small range of well-defined applications. In this context, a cache architecture that is tuned for that narrow range of applications can have both increased performance as well as lower energy consumption. We introduce a novel cache architecture intended for embedded microprocessor platforms. The cache can be configured by software to be direct-mapped, two-way, or four-way set associative, using a technique we call way concatenation, having very little size or performance overhead. We show that the proposed cache architecture reduces energy caused by dynamic power compared to a way-shutdown cache. Furthermore, we extend the cache architecture to also support a way shutdown method designed to reduce the energy from static power that is increasing in importance in newer CMOS technologies. Our study of 23 programs drawn from Powerstone, MediaBench and Spec2000 show that tuning the cache's configuration saves energy for every program compared to conventional four-way set-associative as well as direct mapped caches, with average savings of 40% compared to a four-way conventional cache.

...read moreread less

323 citations

Journal Article•DOI•

A scalable low-latency cache invalidation strategy for mobile environments

[...]

Guohong Cao¹•Institutions (1)

Pennsylvania State University¹

01 Sep 2003-IEEE Transactions on Knowledge and Data Engineering

TL;DR: Compared to previous IR-based schemes, this scheme can significantly improve the throughput and reduce the query latency, the number of uplink request, and the broadcast bandwidth requirements.

...read moreread less

Abstract: Caching frequently accessed data items on the client side is an effective technique for improving performance in a mobile environment. Classical cache invalidation strategies are not suitable for mobile environments due to frequent disconnections and mobility of the clients. One attractive cache invalidation technique is based on invalidation reports (IRs). However, the IR-based cache invalidation solution has two major drawbacks, which have not been addressed in previous research. First, there is a long query latency associated with this solution since a client cannot answer the query until the next IR interval. Second, when the server updates a hot data item, all clients have to query the server and get the data from the server separately, which wastes a large amount of bandwidth. In this paper, we propose an IR-based cache invalidation algorithm, which can significantly reduce the query latency and efficiently utilize the broadcast bandwidth. Detailed analytical analysis and simulation experiments are carried out to evaluate the proposed methodology. Compared to previous IR-based schemes, our scheme can significantly improve the throughput and reduce the query latency, the number of uplink request, and the broadcast bandwidth requirements.

...read moreread less

210 citations

Proceedings Article•DOI•

Distance associativity for high-performance energy-efficient non-uniform cache architectures

[...]

Zeshan A. Chishti¹, Michael D. Powell¹, T. N. Vijaykumar¹•Institutions (1)

Purdue University¹

03 Dec 2003

TL;DR: NuRAPID is proposed, which averages sequential tag-data access to decouple data placement from tag placement, resulting in higher performance and substantially lower cache energy.

...read moreread less

Abstract: Wire delays continue to grow as the dominant component oflatency for large caches.A recent work proposed an adaptive,non-uniform cache architecture (NUCA) to manage large, on-chipcaches.By exploiting the variation in access time acrosswidely-spaced subarrays, NUCA allows fast access to closesubarrays while retaining slow access to far subarrays.Whilethe idea of NUCA is attractive, NUCA does not employ designchoices commonly used in large caches, such as sequential tag-dataaccess for low power.Moreover, NUCA couples dataplacement with tag placement foregoing the flexibility of dataplacement and replacement that is possible in a non-uniformaccess cache.Consequently, NUCA can place only a few blockswithin a given cache set in the fastest subarrays, and mustemploy a high-bandwidth switched network to swap blockswithin the cache for high performance.In this paper, we proposethe Non-uniform access with Replacement And PlacementusIng Distance associativity" cache, or NuRAPID, whichleverages sequential tag-data access to decouple data placementfrom tag placement.Distance associativity, the placementof data at a certain distance (and latency), is separated from setassociativity, the placement of tags within a set.This decouplingenables NuRAPID to place flexibly the vast majority offrequently-accessed data in the fastest subarrays, with fewerswaps than NUCA.Distance associativity fundamentallychanges the trade-offs made by NUCA's best-performingdesign, resulting in higher performance and substantiallylower cache energy.A one-ported, non-banked NuRAPIDcache improves performance by 3% on average and up to 15%compared to a multi-banked NUCA with an infinite-bandwidthswitched network, while reducing L2 cache energy by 77%.

...read moreread less

210 citations

Patent•

System and method to refresh proxy cache server objects

[...]

Gerard Marmigere¹, Joaquin Picon¹, Pierre Secondo¹•Institutions (1)

IBM¹

12 Aug 2003

TL;DR: In this paper, a method and computing system for refreshing objects stored by a proxy cache server from Web content servers was proposed, which offloads the computing resources involved for data transfer through the network connecting the servers, and the refreshed objects are not sent by the Web content server if the last modified date has changed but rather if, and only if, the object content, identified by a signature has been changed.

...read moreread less

Abstract: A method and computing systems for refreshing objects stored by a Proxy cache server from Web content servers. The refresh is requested by the Proxy cache server only if the expiration date attached to a stored object has expired. The refresh of one object is requested by the Proxy cache server to the Web content server upon a request from the browser of a client device. Additionally, the Proxy cache server can send a Refresh_request command to the Web content servers applying to a list of objects for which the expiration date has expired. The refreshed objects are not sent by the Web content server if the last modified date has changed but rather if, and only if, the object content, identified by a signature, has been changed. This method and system have the advantage of offloading the computing resources involved for data transfer through the network connecting the servers.

...read moreread less

182 citations

Patent•

System and method for optimizing internet applications

[...]

Kailai Chen, John Gnip, Richard Dubilier, Michael Corcoran

24 Mar 2003

TL;DR: In this article, a centralized cache server connected to a plurality of web servers provides a cached copy of the requested dynamic content if it is available in its cache, if the cached copy is still fresh.

...read moreread less

Abstract: A method and system for optimizing Internet applications. A centralized cache server connected to a plurality of web servers provides a cached copy of the requested dynamic content if it is available in its cache. Preferably, the centralized cache server determines if the cached copy is still fresh. If the requested content is unavailable from its cache, the centralized cache server directs the client request to the application server. The response is delivered to the client and a copy of the response is stored in the cache by the centralized cache server. Preferably, the centralized cache server utilizes a pre-determined caching rules to selectively store the response from the application server.

...read moreread less

163 citations

Book Chapter•DOI•

An Overview of Cache Optimization Techniques and Cache-Aware Numerical Algorithms

[...]

Markus Kowarschik¹, Christian Weiß²•Institutions (2)

University of Erlangen-Nuremberg¹, Technische Universität München²

01 Jan 2003-Lecture Notes in Computer Science

TL;DR: In this article, the authors focus on optimization techniques for enhancing cache performance by hiding both the low main memory bandwidth and the latency of main memory accesses which is slow in contrast to the floating-point performance of the CPUs.

...read moreread less

Abstract: In order to mitigate the impact of the growing gap between CPU speed and main memory performance, today’s computer architectures implement hierarchical memory structures. The idea behind this approach is to hide both the low main memory bandwidth and the latency of main memory accesses which is slow in contrast to the floating-point performance of the CPUs. Usually, there is a small and expensive high speed memory sitting on top of the hierarchy which is usually integrated within the processor chip to provide data with low latency and high bandwidth; i.e., the CPU registers. Moving further away from the CPU, the layers of memory successively become larger and slower. The memory components which are located between the processor core and main memory are called cache memories or caches. They are intended to contain copies of main memory blocks to speed up accesses to frequently needed data [378], [392]. The next lower level of the memory hierarchy is the main memory which is large but also comparatively slow. While external memory such as hard disk drives or remote memory components in a distributed computing environment represent the lower end of any common hierarchical memory design, this paper focuses on optimization techniques for enhancing cache performance.

...read moreread less

157 citations

Proceedings Article•DOI•

Data cache locking for higher program predictability

[...]

Xavier Vera¹, Björn Lisper¹, Jingling Xue²•Institutions (2)

Mälardalen University College¹, University of New South Wales²

10 Jun 2003

TL;DR: This paper combines compile-time cache analysis with data cache locking to estimate the worst-case memory performance (WCMP) in a safe, tight and fast way, and shows that this scheme is fully predictable, without compromising the performance of the transformed program.

...read moreread less

Abstract: Caches have become increasingly important with the widening gap between main memory and processor speeds. However, they are a source of unpredictability due to their characteristics, resulting in programs behaving in a different way than expected.Cache locking mechanisms adapt caches to the needs of real-time systems. Locking the cache is a solution that trades performance for predictability: at a cost of generally lower performance, the time of accessing the memory becomes predictable.This paper combines compile-time cache analysis with data cache locking to estimate the worst-case memory performance (WCMP) in a safe, tight and fast way. In order to get predictable cache behavior, we first lock the cache for those parts of the code where the static analysis fails. To minimize the performance degradation, our method loads the cache, if necessary, with data likely to be accessed.Experimental results show that this scheme is fully predictable, without compromising the performance of the transformed program. When compared to an algorithm that assumes compulsory misses when the state of the cache is unknown, our approach eliminates all overestimation for the set of benchmarks, giving an exact WCMP of the transformed program without any significant decrease in performance.

...read moreread less

155 citations

Book Chapter•DOI•

Cache tables: paving the way for an adaptive database cache

[...]

Mehmet Altinel¹, Christof Bornhövd¹, Sailesh Krishnamurthy², Chandrasekaran Mohan¹, Hamid Pirahesh¹, Berthold Reinwald¹ - Show less +2 more•Institutions (2)

IBM¹, University of California, Berkeley²

09 Sep 2003

TL;DR: This work introduces a new database object called Cache Table that enables persistent caching of the full or partial content of a remote database table that supports transparent caching both at the edge of content-delivery networks and in the middle of an enterprise application infrastructure, improving the response time, throughput and scalability of transactional web applications.

...read moreread less

Abstract: We introduce a new database object called Cache Table that enables persistent caching of the full or partial content of a remote database table. The content of a cache table is either defined declaratively and populated in advance at setup time, or determined dynamically and populated on demand at query execution time. Dynamic cache tables exploit the characteristics of typical transactional web applications with a high volume of short transactions, simple equality predicates, and 3-4 way joins. Based on federated query processing capabilities, we developed a set of new technologies for database caching: cache tables, "Janus" (two-headed) query execution plans, cache constraints, and asynchronous cache population methods. Our solution supports transparent caching both at the edge of content-delivery networks and in the middle-tier of an enterprise application infrastructure, improving the response time, throughput and scalability of transactional web applications.

...read moreread less

Journal Article•DOI•

Adaptive mode control: A static-power-efficient cache design

[...]

Huiyang Zhou¹, Mark C. Toburen¹, Eric Rotenberg¹, Thomas M. Conte¹•Institutions (1)

North Carolina State University¹

01 Aug 2003-ACM Transactions in Embedded Computing Systems

TL;DR: Simulations show that an average of 73% of I-cache lines and 54% of D- caches are put in sleep mode with an average IPC impact of only 1.7%, for 64 KB caches, and this work proposes applying sleep mode only to the data store and not the tag store.

...read moreread less

Abstract: Lower threshold voltages in deep submicron technologies cause more leakage current, increasing static power dissipation. This trend, combined with the trend of larger/more cache memories dominating die area, has prompted circuit designers to develop SRAM cells with low-leakage operating modes (e.g., sleep mode). Sleep mode reduces static power dissipation, but data stored in a sleeping cell is unreliable or lost. So, at the architecture level, there is interest in exploiting sleep mode to reduce static power dissipation while maintaining high performance.Current approaches dynamically control the operating mode of large groups of cache lines or even individual cache lines. However, the performance monitoring mechanism that controls the percentage of sleep-mode lines, and identifies particular lines for sleep mode, is somewhat arbitrary. There is no way to know what the performance could be with all cache lines active, so arbitrary miss rate targets are set (perhaps on a per-benchmark basis using profile information), and the control mechanism tracks these targets. We propose applying sleep mode only to the data store and not the tag store. By keeping the entire tag store active the hardware knows what the hypothetical miss rate would be if all data lines were active, and the actual miss rate can be made to precisely track it. Simulations show that an average of 73p of I-cache lines and 54p of D-cache lines are put in sleep mode with an average IPC impact of only 1.7p, for 64 KB caches.

...read moreread less

Proceedings Article•DOI•

ICR: in-cache replication for enhancing data cache reliability

[...]

Wei Zhang¹, Sudhanva Gurumurthi¹, Mahmut Kandemir, Anand Sivasubramaniam•Institutions (1)

Pennsylvania State University¹

22 Jun 2003

TL;DR: This paper proposes a novel solution to this problem by allowing in-cache replication, wherein reliability can be enhanced without excessively slowing down cache accesses or requiring significant area cost increases.

...read moreread less

Abstract: Processor caches already play a critical role in the performance of today’s computer systems. At the same time, the data integrity of words coming out of the caches can have serious consequences on the ability of a program to execute correctly, or even to proceed. The integrity checks need to be performed in a time-sensitive manner to not slow down the execution when there are no errors as in the common case, and should not excessively increase the power budget of the caches which is already high. ECC and parity-based protection techniques in use today fall at either extremes in terms of compromising one criteria for another, i.e., reliability for performance or vice-versa. This paper proposes a novel solution to this problem by allowing in-cache replication, wherein reliability can be enhanced without excessively slowing down cache accesses or requiring significant area cost increases. The mechanism is fairly power efficient in comparison to other alternatives as well. In particular, the solution replicates data that is in active use within the cache itself while evicting those that may not be needed in the near future. Our experiments show that a large fraction of the data read from the cache have replicas available with this optimization.

...read moreread less

Patent•

Method of dynamically controlling cache size

[...]

Erwin B. Cohen¹, Thomas E. Cook², Ian R. Govett², Paul D. Kartschoke², Stephen V. Kosonocky², Peter A. Sandon², Keith R. Williams² - Show less +3 more•Institutions (2)

IBM¹, GlobalFoundries²

14 Oct 2003

TL;DR: In this article, a power saving cache includes circuitry to dynamically reduce the logical size of the cache in order to save power, using a variety of combinable hardware and software techniques.

...read moreread less

Abstract: A power saving cache and a method of operating a power saving cache. The power saving cache includes circuitry to dynamically reduce the logical size of the cache in order to save power. Preferably, a method is used to determine optimal cache size for balancing power and performance, using a variety of combinable hardware and software techniques. Also, in a preferred embodiment, steps are used for maintaining coherency during cache resizing, including the handling of modified (“dirty”) data in the cache, and steps are provided for partitioning a cache in one of several way to provide an appropriate configuration and granularity when resizing.

...read moreread less

Patent•

Reliability of diskless network-bootable computers using non-volatile memory cache

[...]

Clark D. Nicholson¹, William J. Westerinen¹, Cenk Ergan¹, Michael R. Fortin¹, Mehmet Iyigun¹ - Show less +1 more•Institutions (1)

Microsoft¹

29 Oct 2003

TL;DR: In this paper, a method and apparatus is provided that provides a reliable diskless network-bootable computers using a local non-volatile memory (NVM) cache, which allows the user to continue operating during network outages and the computer can be cold booted using the data in the NVM cache if the network is unavailable.

...read moreread less

Abstract: A method and apparatus is provided that provides a reliable diskless network-bootable computers using a local non-volatile memory (NVM) cache. The NVM cache is used by the computer when the network is temporarily unavailable or slow. The cache is later synchronized with a remote boot server having remote storage volumes when network conditions improve. It is determined if data is to be stored in the NVM cache or the remote storage volume. Data sent to the remote storage volume is transactionally written and the data is cached in the NVM cache if a network outage is occurring or a transaction complete message has not been received. The data stored in the NVM cache allows the user to continue operating during network outages and the computer can be cold-booted using the data in the NVM cache if the network is unavailable.

...read moreread less

Patent•

Integrated web cache

[...]

Milind M. Buddhikot, Girish P. Chandranmenon, Seung-Jae Han, Yui-Wah Lee, Scott C. Miller, Luca Salgarelli - Show less +2 more

20 Oct 2003

TL;DR: A gateway for mobile communications comprises a cache for storing network data recently downloaded from a network, a foreign agent, and a packet filter that directs requests for the network data from a mobile node to the cache.

...read moreread less

Abstract: A gateway for mobile communications comprises a cache for storing network data recently downloaded from a network, a foreign agent, and a packet filter that directs requests for the network data from a mobile node to the cache. The packet filter directs the requested network data from the cache to the mobile node by way of the foreign agent, without forwarding the requested network data to a home agent of the mobile node.

...read moreread less

Patent•

Fetch operations in a disk drive control system

[...]

William B. Boyle¹•Institutions (1)

Western Digital¹

31 Jul 2003

TL;DR: In this article, a method and system for improving fetch operations between a microcontroller and a remote memory via a buffer manager in a disk drive control system comprising a micro-controller, a micro controller cache system having a cache memory and a cache-control subsystem, and buffer manager communicating with microcontroller cache system and remote memory.

...read moreread less

Abstract: A method and system for improving fetch operations between a micro-controller and a remote memory via a buffer manager in a disk drive control system comprising a micro-controller, a micro-controller cache system having a cache memory and a cache-control subsystem, and a buffer manager communicating with micro-controller cache system and remote memory. The invention includes receiving a data-request from micro-controller in cache control subsystem wherein the data-request comprises a request for at least one of instruction code and non-instruction data. The invention further includes providing the requested data to micro-controller if the requested data reside in cache memory, determining if the received data-request is for non-instruction data if requested data does not reside in cache memory, fetching the non-instruction data from remote memory by micro-controller cache system via buffer manager, and bypassing cache memory to preserve the contents of cache memory and provide the fetched non-instruction data to micro-controller.

...read moreread less

Patent•

Computer system implementing synchronized broadcast using timestamps

[...]

Robert E. Cypher¹, Darien Wood², Mark D. Hill¹, Thomas M. Wicki¹•Institutions (2)

Sun Microsystems¹, Oracle Corporation²

30 Jun 2003

TL;DR: In this paper, the cache is configured to inhibit receipt of the corresponding data packet based on a value of a timestamp associated with the corresponding packet, and the ownership responsibility for the given block transitions in response to a corresponding address packet being received by the cache.

...read moreread less

Abstract: A computer system may include a system memory, an active device configured to access data stored in the system memory, where the active device includes a cache configured to store data accessed by the active device, an address network for conveying address packets between the active device and the system memory, and a data network for conveying data packets between the active device and the system memory. An access right corresponding to a given block allocated in the cache transitions in response to a corresponding data packet being received by the cache. An ownership responsibility for the given block transitions in response to a corresponding address packet being received by the cache. The access right transitions at a different time than the ownership responsibility transitions. The cache is configured to inhibit receipt of the corresponding data packet based on a value of a timestamp associated with the corresponding data packet.

...read moreread less

Patent•

Node caching system for streaming media applications

[...]

Guy Huggins, Sergey Yasevich, David Kopaniky

05 May 2003

TL;DR: In this article, an edge server and caching system is proposed, where the edge server may have a cache, cache listing, profile data, multimedia server, and internet information server.

...read moreread less

Abstract: The invention is directed to an edge server and caching system. The edge server may have a cache, cache listing, profile data, multimedia server, and internet information server. A viewer may request a file with a specific version. The edge server may determine if the file is stored locally. If the file is not stored locally, the edge server may simultaneously cache and stream the media. If the file is available, the media may be streamed from the cache. The cache may be managed with a cache listing. The cache listing may be ordered by time of last use and may have profile data. Storage capacity may be managed by deleting the last file in the list. The profile data may be used to manage and distribute streaming media.

...read moreread less

Patent•

Non-volatile mass storage cache coherency apparatus

[...]

Richard L. Coulson¹•Institutions (1)

Intel¹

22 Dec 2003

TL;DR: In this paper, the cache coherency administrator can include a display to indicate a cache co-herency status of a non-volatile cache, which can be used to check the cache's integrity.

...read moreread less

Abstract: Apparatus and methods relating to a cache coherency administrator. The cache coherency administrator can include a display to indicate a cache coherency status of a non-volatile cache.

...read moreread less

Journal Article•DOI•

Nonuniform cache architectures for wire-delay dominated on-chip caches

[...]

Changkyu Kim, Doug Burger, Stephen W. Keckler

01 Nov 2003-IEEE Micro

TL;DR: The authors propose several designs that treat the cache as a network of banks and facilitate nonuniform accesses to different physical regions that offer low-latency access, increased scalability, and greater performance stability than conventional uniform access cache architectures.

...read moreread less

Abstract: Nonuniform cache access designs solve the on-chip wire delay problem for future large integrated caches. By embedding a network in the cache, NUCA designs let data migrate within the cache, clustering the working set nearest the processor. The authors propose several designs that treat the cache as a network of banks and facilitate nonuniform accesses to different physical regions. NUCA architectures offer low-latency access, increased scalability, and greater performance stability than conventional uniform access cache architectures.

...read moreread less

Proceedings Article•DOI•

Accurate estimation of cache-related preemption delay

[...]

Hemendra Singh Negi¹, Tulika Mitra¹, Abhik Roychoudhury¹•Institutions (1)

National University of Singapore¹

01 Oct 2003

TL;DR: This paper provides a program path analysis technique to estimate cache-related preemption delay (CRPD), and improves the accuracy of the analysis by estimating the possible states of the entire cache at each possible preemption point rather than estimating the states of each cache block independently.

...read moreread less

Abstract: Multitasked real-time systems often employ caches to boost performance. However the unpredictable dynamic behavior of caches makes schedulability analysis of such systems difficult. In particular, the effect of caches needs to be considered for estimating the inter-task interference. As the memory blocks of different tasks can map to the same cache blocks, preemption of a task may introduce additional cache misses. The time penalty introduced by these misses is called the cache-related preemption delay (CRPD). In this paper, we provide a program path analysis technique to estimate CRPD. Our technique performs path analysis of both the preempted and the preempting tasks. Furthermore, we improve the accuracy of the analysis by estimating the possible states of the entire cache at each possible preemption point rather than estimating the states of each cache block independently. To avoid incurring high space requirements, the cache states can be maintained symbolically as a binary decision diagram. Experimental results indicate that we obtain tight CRPD estimates for realistic benchmarks.

...read moreread less

Journal Article•DOI•

Defending against cache-based side-channel attacks

[...]

Daniel Page

01 Mar 2003-Information Security Technical Report

TL;DR: A number of hardware- and software-based approaches to defending against methods of attack using cache-based side-channel analysis are surveyed and evaluated using simulated results.

...read moreread less

Patent•

Method and system for a cache replacement technique with adaptive skipping

[...]

Dharmendra S. Modha¹•Institutions (1)

IBM¹

21 Oct 2003

TL;DR: In this article, the authors propose a method, system, and program storage medium for adaptively managing pages in a cache memory included within a system having a variable workload, comprising arranging cache memory including a pointer that rotates around a circular buffer; maintaining a bit for each page in the circular buffer, wherein a bit value 0 indicates that the page was not accessed by the system since a last time that the pointer traversed over the page, and a hit value 1 indicates that a page has been accessed since the last time the pointer was accessed.

...read moreread less

Abstract: A method, system, and program storage medium for adaptively managing pages in a cache memory included within a system having a variable workload, comprising arranging a cache memory included within a system into a circular buffer; maintaining a pointer that rotates around the circular buffer; maintaining a bit for each page in the circular buffer, wherein a bit value 0 indicates that the page was not accessed by the system since a last time that the pointer traversed over the page, and a hit value 1 indicates that the page has been accessed since the last time the pointer traversed over the page; and dynamically controlling a distribution of a number of pages in the cache memory that are marked with bit 0 in response to a variable workload in order to increase a hit ratio of the cache memory.

...read moreread less

Patent•

System and method for maintaining cache coherency without external controller intervention

[...]

Robert Horn, Biswajit Khandai

05 May 2003

TL;DR: In this article, a disk array includes a system and method for cache management and conflict detection, where incoming host commands are processed by a storage controller, which identifies a set of at least one cache segment descriptor associated with the requested address range.

...read moreread less

Abstract: A disk array includes a system and method for cache management and conflict detection. Incoming host commands are processed by a storage controller, which identifies a set of at least one cache segment descriptor (CSD) associated with the requested address range. Command conflict detection can be quickly performed by examining the state information of each CSD associated with the command. The use of CSDs therefore permits the present invention to rapidly and efficiently perform read and write commands and detect conflicts.

...read moreread less

Patent•

Method and apparatus for preloading caches

[...]

Simon Hugh Cassia, Keith Charles Day, Simon David Wood

06 Aug 2003

TL;DR: In this article, a method for preloading data on a cache (210) in a local machine (235), coupled to a data store (130), in a remote host machine (240), is described.

...read moreread less

Abstract: A method (400) of preloading data on a cache (210) in a local machine (235). The cache (210) is operably coupled to a data store (130), in a remote host machine (240). The method includes the steps of determining a user behaviour profile for the local machine (235); retrieving data relating to the user behaviour profile from the data store (130); and preloading the retrieved data in the cache (210), such that the data is made available to the cache user when desired. A local machine, a host machine, a cache, a communication system and preloading functions are also described. In this manner, data within the cache is maintained and replaced in a substantially optimal manner, and configured to be available to a cache user when it is predicted that the user wishes to access the data.

...read moreread less

Proceedings Article•DOI•

The performance of runtime data cache prefetching in a dynamic optimization system

[...]

Jiwei Lu¹, Howard Chen¹, Rao Fu¹, Wei-Chung Hsu¹, Bobbie Othmer¹, Pen-Chung Yew¹, Dong-Yuan Chen² - Show less +3 more•Institutions (2)

University of Minnesota¹, Intel²

03 Dec 2003

TL;DR: In this paper, a runtime data cache prefetching in the dynamic optimization system ADORE (ADaptive Object code Reoptimization) is proposed. But the performance of this approach is limited due to the lack of runtime cache miss and miss address information.

...read moreread less

Abstract: Traditional software controlled data cache prefetching is often ineffective due to the lack of runtime cache miss and miss address information. To overcome this limitation, we implement runtime data cache prefetching in the dynamic optimization system ADORE (ADaptive Object code Reoptimization). Its performance has been compared with static software prefetching on the SPEC2000 benchmark suite. Runtime cache prefetching shows better performance. On an Itanium 2 based Linux workstation, it can increase performance by more than 20% over static prefetching on some benchmarks. For benchmarks that do not benefit from prefetching, the runtime optimization system adds only 1%-2% overhead. We have also collected cache miss profiles to guide static data cache prefetching in the ORC compiler. With that information the compiler can effectively avoid generating prefetches for loops that hit well in the data cache.

...read moreread less

Patent•

Cache management method for storage device

[...]

Kazuhiko Mogi, Norifumi Nishikawa, Yoshiaki Eguchi

25 Aug 2003

TL;DR: In this article, the authors proposed a cache management method that enables optimal cache space settings to be provided on a storage device in a computer system where database management systems (DBMSs) run.

...read moreread less

Abstract: A cache management method disclosed herein enables optimal cache space settings to be provided on a storage device in a computer system where database management systems (DBMSs) run. Through the disclosed method, cache space partitions to be used per data set are set, based on information about processes to be executed by the DBMSs, which is given as design information. For example, based on estimated rerun time of processes required after DBMS abnormal termination, cache space is adjusted to serve the needs of logs to be output from the DBMS. In another example, initial cache space allocations for table and index data is optimized, based on process types and approximate access characteristics of data. In yet another example, from a combination of results of pre-analysis of processes and cache operating statistics information, a change in process execution time by cache space tuning is estimated and a cache effect is enhanced.

...read moreread less

Patent•

Systems and methods for adjusting caching policies for web service requests

[...]

Gregory L. Truty¹•Institutions (1)

IBM¹

26 Jun 2003

TL;DR: In this article, a mechanism for caching Web services requests and responses, including testing an incoming request against the cached requests and associated responses, is provided, where the requests are selectively tested against cached data in accordance with a set of policies.

...read moreread less

Abstract: A mechanism for caching Web services requests and responses, including testing an incoming request against the cached requests and associated responses is provided. The requests are selectively tested against the cached data in accordance with a set of policies. If a request selected hits in the cache, the response is served up from the cache. Otherwise, the request is passed to the corresponding Web-services server/application. Additionally, a set of predetermined cache specifications for generating request identifiers may be provided. The identifier specification may be autonomically adjusted by determining cache hit/cache miss ratios over the set of identifier specifications and over a set of sample requests. The set of specifications may then be sorted to reflect the performance of the respective cache specification algorithms for the current mix of requests.

...read moreread less

Proceedings Article•

Eviction-based Cache Placement for Storage Caches.

[...]

Zhifeng Chen, Yuanyuan Zhou, Kai Li

01 Jan 2003

TL;DR: This paper presents an eviction-based placement policy for a storage cache that usually sits in the lower level of a multi-level buffer cache hierarchy and thereby has different access patterns from upper levels, and presents a method of using a client content tracking table to obtain eviction information from client buffer caches.

...read moreread less

Abstract: Most previous work on buffer cache management uses an access-based placement policy that places a data block into a buffer cache at the block’s access time. This paper presents an eviction-based placement policy for a storage cache that usually sits in the lower level of a multi-level buffer cache hierarchy and thereby has different access patterns from upper levels. The main idea of the eviction-based placement policy is to delay a block’s placement in the cache until it is evicted from the upper level. This paper also presents a method of using a client content tracking table to obtain eviction information from client buffer caches, which can avoid modifying client application source code. We have evaluated the performance of this eviction-based placement by using both simulations with real-world workloads, and implementations on a storage system connected to a Microsoft SQL server database. Our simulation results show that the eviction-based cache placement has an up to 500% improvement on cache hit ratios over the commonly used access-based placement policy. Our evaluation results using OLTP workloads have demonstrated that the eviction-based cache placement has a speedup of 1.2 on OLTP transaction rates.

...read moreread less

Collapse