Topic

Cache pollution

About: Cache pollution is a research topic. Over the lifetime, 11,353 publications have been published within this topic, receiving 262,139 citations.


Papers
Patent
30 Sep 1999
TL;DR: In this paper, a multiple level cache structure and multiple level caching method that distributes I/O processing loads including caching operations between processors to provide higher performance processing, especially in a server environment is presented.
Abstract: This invention provides a multiple level cache structure and multiple level caching method that distributes I/O processing loads including caching operations between processors to provide higher performance I/O processing, especially in a server environment. A method of achieving optimal data throughput by taking full advantage of multiple processing resources is disclosed. A method for managing the allocation of the data caches to optimize the host access time and parity generation is disclosed. A cache allocation for RAID stripes guaranteed to provide fast access times for the XOR engine by ensuring that all cache lines are allocated from the same cache level is disclosed. Allocation of cache lines for RAID levels which do not require parity generation, in a manner that maximizes utilization of the memory bandwidth, is disclosed. Parity generation which is optimized for use of the processor least utilized at the time the cache lines are allocated, thereby providing for dynamic load balancing amongst the multiple processing resources, is disclosed. An inventive cache line descriptor for maintaining information about which cache data pool the cache line resides within, and an inventive cache line descriptor which includes enhancements to allow for movement of cache data from one cache level to another, are disclosed. A cache line descriptor with enhancements for tracking the cache within which RAID stripe cache line siblings reside is disclosed. System, apparatus, computer program product, and methods to support these aspects alone and in combination are also provided.
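
The cache line descriptor is the central bookkeeping structure in this design. Below is a minimal sketch of what such a descriptor might contain; every type, field, and the pool-selection policy are invented for illustration and are not taken from the patent.

```cpp
// Illustrative sketch only: a cache line descriptor recording which cache
// data pool a line lives in, which processor should generate parity for it,
// and where its RAID-stripe siblings reside, so lines can move between cache
// levels and XOR parity can be built from lines allocated at the same level.
#include <cstdint>
#include <vector>

enum class CachePool : uint8_t { Level1, Level2 };   // hypothetical pool names

struct CacheLineDescriptor {
    uint64_t   stripe_id;         // RAID stripe this line belongs to
    uint32_t   line_index;        // position of the line within the stripe
    CachePool  pool;              // cache data pool the line currently resides in
    uint8_t    owning_processor;  // processor whose XOR engine handles the stripe
    bool       dirty;             // line holds data not yet written to disk
    // Tracking siblings lets the allocator confirm that every line of a
    // stripe sits in the same cache level before parity generation starts.
    std::vector<CacheLineDescriptor*> stripe_siblings;
};

// Hypothetical load-balancing rule: allocate a parity-protected stripe in the
// pool of whichever processor is least utilized at allocation time.
CachePool choose_pool(double utilization_p0, double utilization_p1) {
    return utilization_p0 <= utilization_p1 ? CachePool::Level1
                                            : CachePool::Level2;
}
```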

84 citations

Patent
03 Apr 2003
TL;DR: In this paper, a cache memory system to support re-synchronisation of non-volatile cache memories following interruption in communication is presented, where the data to be transferred represents only the transactions which were in progress at the time of the reset or failure, rather than the entire non-volatile cache contents.
Abstract: An arrangement and methods for operation in a cache memory system to facilitate re-synchronising non-volatile cache memories (150B, 160B) following interruption in communication. A primary adapter (150) creates a non-volatile record (150C) of each cache update before it is applied to either cache. Each such record is cleared when the primary adapter knows that the cache update has been applied to both adapters' caches. In the event of a reset or other failure, the primary adapter can read the non-volatile list of transfers which were ongoing. For each entry in this list, the primary adapter negotiates with the secondary adapter (160) and transfers only the data which may be different. The amount of data to be transferred between the adapters following reset/failure is generally much lower than under previous solutions, since the data to be transferred represents only the transactions which were in progress at the time of the reset or failure, rather than the entire non-volatile cache contents; also, new transactions need not be suspended while even this reduced resynchronisation takes place: all that is necessary is for the (relatively short) list of in-doubt quanta of data to be searched (if the transaction does not overlap any entries in this list then it need not be suspended; if it does overlap then the transaction may be queued until the resynchronisation completes).
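
The core of the scheme is the non-volatile record kept per in-flight update. Here is a minimal sketch of the record-before-apply, clear-after-mirror, replay-on-reset flow; all names (InDoubtRecord, nv_list, must_queue) are hypothetical, and an in-memory list stands in for the non-volatile record area.

```cpp
// Sketch of the in-doubt record protocol, assuming invented types and names.
#include <cstdint>
#include <list>

struct InDoubtRecord {
    uint64_t lba;      // starting logical block of the update
    uint32_t blocks;   // length of the update
};

std::list<InDoubtRecord> nv_list;   // stand-in for the non-volatile record area

void apply_update(uint64_t lba, uint32_t blocks) {
    nv_list.push_back({lba, blocks});   // 1. persist the record first
    // 2. apply the update to the local cache and mirror it to the secondary...
    // 3. once both adapters have applied it, clear the record
    nv_list.remove_if([&](const InDoubtRecord& r) {
        return r.lba == lba && r.blocks == blocks;
    });
}

// After a reset, only entries still in nv_list must be renegotiated with the
// secondary adapter. A new transaction is queued only if it overlaps one.
bool must_queue(uint64_t lba, uint32_t blocks) {
    for (const auto& r : nv_list)
        if (lba < r.lba + r.blocks && r.lba < lba + blocks)
            return true;   // overlaps an in-doubt quantum
    return false;
}
```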

84 citations

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a way-halting cache, which is a four-way set-associative cache that stores the four lowest-order bits of all ways' tags into a fully associative memory.
Abstract: Caches contribute to much of a microprocessor system's power and energy consumption. Numerous new cache architectures, such as phased, pseudo-set-associative, way predicting, reactive-associative, way-shutdown, way-concatenating, and highly-associative, are intended to reduce power and/or energy, but they all impose some performance overhead. We have developed a new cache architecture, called a way-halting cache, that reduces energy further than previously mentioned architectures, while imposing no performance overhead. Our way-halting cache is a four-way set-associative cache that stores the four lowest-order bits of all ways' tags into a fully associative memory, which we call the halt tag array. The lookup in the halt tag array is done in parallel with, and is no slower than, the set-index decoding. The halt tag array predetermines which tags cannot match due to their low-order 4 bits mismatching. Further accesses to ways with known mismatching tags are then halted, thus saving power. Our halt tag array has an additional feature of using static logic only, rather than dynamic logic used in highly associative caches, making our cache simpler to design with existing tools. We provide data from experiments on 29 benchmarks drawn from Powerstone, Mediabench, and Spec 2000, based on our layouts in 0.18 micron CMOS technology. On average, we obtained 55% savings of memory-access related energy over a conventional four-way set-associative cache. We show that savings are greater than previous methods, and nearly twice that of highly associative caches, while imposing no performance overhead and only 2% cache area overhead.
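
A toy software model of the halt tag array lookup may help clarify the mechanism; the real design is combinational hardware operating in parallel with set-index decoding, and the structure and names below are illustrative only.

```cpp
// Toy model of the way-halting idea, assuming a simplified set structure.
#include <array>
#include <cstdint>

constexpr int WAYS = 4;

struct SetHaltTags {
    std::array<uint8_t, WAYS> low4;   // 4 lowest-order tag bits of each way
};

// Returns a bitmask of ways that still need a full tag compare. Ways whose
// low-order bits mismatch are "halted" and never fire their tag/data arrays,
// which is where the energy saving comes from.
uint8_t ways_to_probe(const SetHaltTags& set, uint32_t addr_tag) {
    const uint8_t addr_low4 = addr_tag & 0xF;
    uint8_t mask = 0;
    for (int w = 0; w < WAYS; ++w)
        if (set.low4[w] == addr_low4)
            mask |= (1u << w);
    return mask;
}
```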

83 citations

Proceedings ArticleDOI
01 May 1997
TL;DR: This paper discusses the implementation of the data prefetch generation algorithm in the Hewlett-Packard Precision Architecture (HP-PA) compiler and presents performance results for SPECfp95 on a PA-8000 system that show speedups, due to data prefetching, of up to 100%.
Abstract: Memory latency is a major issue for many modern microprocessor based systems, including the Hewlett-Packard PA-8000. Due to its fast clock rate and wide issue capability, cache misses in the PA-8000 are very expensive. The PA-8000 combines out-of-order execution with multiple outstanding memory requests to tolerate memory latency; however, this approach has its limitations. In order to substantially reduce much of the memory latency penalty, the PA-8000 uses software-based data cache prefetching. In this paper, we discuss the implementation of the data prefetch generation algorithm in the Hewlett-Packard Precision Architecture (HP-PA) compiler. We present performance results for SPECfp95 on a PA-8000 system that show speedups, due to data prefetching, of up to 100%.
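
In essence, the compiler inserts prefetch instructions a fixed distance ahead of the loads in a loop. Below is a hedged sketch of what the transformed code amounts to, using the GCC/Clang __builtin_prefetch intrinsic as a stand-in for the prefetch instructions the HP-PA compiler emits; the prefetch distance of 16 elements is an arbitrary illustrative choice, not the paper's tuned value.

```cpp
// Sketch of compiler-style software data prefetching in a simple loop.
void daxpy(double* x, const double* y, double a, int n) {
    const int DIST = 16;                       // iterations ahead to prefetch
    for (int i = 0; i < n; ++i) {
        if (i + DIST < n) {
            __builtin_prefetch(&x[i + DIST]);  // bring future cache lines in
            __builtin_prefetch(&y[i + DIST]);  // so the loads below tend to hit
        }
        x[i] += a * y[i];
    }
}
```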

83 citations

Proceedings ArticleDOI
06 Mar 2009
TL;DR: A fully hardwired coarse-grain data migration mechanism that dynamically monitors the access patterns of the cores at the granularity of a page to reduce the book-keeping overhead and decides when and where to migrate an entire page of data to amortize the performance overhead is proposed.
Abstract: As the last-level on-chip caches in chip-multiprocessors increase in size, the physical locality of on-chip data becomes important for delivering high performance. The non-uniform access latency seen by a core to different independent banks of a large cache spread over the chip necessitates active mechanisms for improving data locality. The central proposal of this paper is a fully hardwired coarse-grain data migration mechanism that dynamically monitors the access patterns of the cores at the granularity of a page to reduce the book-keeping overhead and decides when and where to migrate an entire page of data to amortize the performance overhead. The page-grain migration mechanism is compared against two variants of previously proposed cache block-grain dynamic migration mechanisms and two OS-assisted static locality management mechanisms. Our detailed execution-driven simulation of an eight-core chip-multiprocessor with a shared 16 MB L2 cache employing a bidirectional ring to connect the cores and the L2 cache banks shows that hardwired dynamic page migration, while using only 4.8% of extra storage out of the total L2 cache and book-keeping budget, delivers the best performance and energy-efficiency across a set of shared memory parallel applications selected from the SPLASH-2, SPEC OMP, DARPA DIS, and FFTW suites and multiprogrammed workloads prepared out of the SPEC 2000 and BioBench suites. It reduces execution time by 18.7% and 12.6% on average (geometric mean) respectively for the shared memory applications and the multiprogrammed workloads compared to a baseline architecture that distributes the pages round-robin across the L2 cache banks.
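
The following is a rough sketch of the per-page bookkeeping such a hardwired migration mechanism implies; the counter width, threshold, and bank-selection policy are illustrative assumptions, not the paper's actual parameters.

```cpp
// Sketch of page-grain migration bookkeeping, with invented names and values.
#include <array>
#include <cstdint>

constexpr int CORES = 8;
constexpr uint32_t MIGRATE_THRESHOLD = 64;   // illustrative value

struct PageEntry {
    std::array<uint32_t, CORES> accesses{};  // per-core access counters
    int home_bank = 0;                       // L2 bank currently holding the page
};

// Called on every L2 access to the page; returns the bank the page should be
// migrated to, or -1 if it should stay where it is.
int record_access(PageEntry& p, int core, int bank_near_core) {
    p.accesses[core]++;
    if (p.accesses[core] >= MIGRATE_THRESHOLD && p.home_bank != bank_near_core) {
        p.accesses.fill(0);            // reset counters after deciding to move
        p.home_bank = bank_near_core;
        return bank_near_core;         // migrate the whole page to this bank
    }
    return -1;
}
```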

83 citations


Network Information
Related Topics (5)
Cache: 59.1K papers, 976.6K citations (93% related)
Compiler: 26.3K papers, 578.5K citations (89% related)
Scalability: 50.9K papers, 931.6K citations (87% related)
Server: 79.5K papers, 1.4M citations (86% related)
Static routing: 25.7K papers, 576.7K citations (84% related)
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    42
2022    110
2021    12
2020    20
2019    15
2018    30