Topic

Cache invalidation

About: Cache invalidation is a research topic. Over its lifetime, 10,539 publications have been published on this topic, receiving 245,409 citations.


Papers
Journal ArticleDOI
TL;DR: An efficient compiler framework for cache bypassing on GPUs is proposed and efficient algorithms that judiciously select global load instructions for cache access or bypass are presented.
Abstract: Graphics processing units (GPUs) have become ubiquitous for general purpose applications due to their tremendous computing power. Initially, GPUs employed only scratchpad memory as on-chip memory. Though scratchpad memory benefits many applications, it is not ideal for general purpose applications with irregular memory accesses. Hence, GPU vendors have introduced caches in conjunction with scratchpad memory in recent generations of GPUs. The caches on GPUs are highly configurable: the programmer or compiler can explicitly control cache access or bypass for global load instructions. This configurability opens up opportunities for optimizing cache performance. In this paper, we propose an efficient compiler framework for cache bypassing on GPUs. Our objective is to efficiently utilize the configurable cache and improve the overall performance of general purpose GPU applications. To achieve this goal, we first characterize GPU cache utilization and develop performance metrics to estimate cache reuse and memory traffic. Next, we present efficient algorithms that judiciously select global load instructions for cache access or bypass. Finally, we present techniques to explore the unified cache and shared memory design space. We integrate our techniques into an automatic compiler framework that leverages the parallel thread execution (PTX) instruction set architecture to enable cache bypassing for GPUs. Experimental evaluation on an NVIDIA GTX680 using a variety of applications demonstrates that, compared to cache-all and bypass-all solutions, our techniques improve performance by 4.6% to 13.1% for a 16 KB L1 cache.

60 citations
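The selection step described in the abstract lends itself to a small illustration. The C++ sketch below shows how per-load reuse estimates might drive a cache-or-bypass decision; the GlobalLoad fields, the threshold, and the load names are hypothetical assumptions rather than the paper's actual framework, and in a real compiler the bypassed loads would be lowered to cache-bypassing PTX load variants rather than reported as flags.

```cpp
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical per-load profile: how a compiler framework might summarize
// a global load instruction before deciding to cache or bypass it.
struct GlobalLoad {
    std::string name;        // identifier of the load site (illustrative)
    double estimatedReuses;  // expected L1 hits per cache line brought in
    double bytesPerAccess;   // memory traffic generated by one access
};

// Sketch of the selection step: loads whose estimated reuse does not justify
// occupying L1 capacity are marked for bypass. The threshold is illustrative.
std::vector<bool> selectBypass(const std::vector<GlobalLoad>& loads,
                               double reuseThreshold = 1.0) {
    std::vector<bool> bypass;
    bypass.reserve(loads.size());
    for (const auto& ld : loads)
        bypass.push_back(ld.estimatedReuses < reuseThreshold);
    return bypass;
}

int main() {
    std::vector<GlobalLoad> loads = {
        {"load_A", 3.2, 4.0},   // reused often -> cache in L1
        {"load_B", 0.1, 4.0},   // streaming access -> bypass L1
    };
    auto decision = selectBypass(loads);
    for (std::size_t i = 0; i < loads.size(); ++i)
        std::cout << loads[i].name
                  << (decision[i] ? " -> bypass L1\n" : " -> cache in L1\n");
}
```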

Proceedings ArticleDOI
01 Oct 2006
TL;DR: This work deconstructs and compares the two dominant existing approaches for L1 data cache (L1D) error protection and presents a new error protection scheme, called the punctured ECC recovery cache (PERC), that achieves the best features of both existing schemes.
Abstract: We deconstruct and compare the two dominant existing approaches for L1 data cache (L1D) error protection, with respect to performance, L2 cache bandwidth, power, and area. The two approaches are: (1) parity on the L1D with write-through to an ECC-protected L2, and (2) ECC protection on the L1D. Qualitatively, the first approach requires a write-through L1D, which places a large bandwidth and power demand on the L2. The second approach adds more bits in the L1D for error protection, which adds to the L1D's area and power while degrading its performance. Our quantitative results show that the relative costs of the second approach are small and that its benefits outweigh these costs. We also present a new error protection scheme, called the Punctured ECC Recovery Cache (PERC), that achieves the best features of both existing schemes.

60 citations
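To make the trade-off concrete, here is a minimal C++ sketch of scheme (1): parity on the L1D detects (but cannot correct) an error, and the write-through policy guarantees a clean copy in the ECC-protected L2 to refill from. The word layout and function names are illustrative assumptions, not the paper's implementation or the PERC design.

```cpp
#include <cstdint>
#include <iostream>

// Even-parity bit over a 32-bit word: 1 if the number of set bits is odd.
// This is the detection-only protection that scheme (1) keeps on the L1D.
uint32_t parityBit(uint32_t word) {
    word ^= word >> 16;
    word ^= word >> 8;
    word ^= word >> 4;
    word ^= word >> 2;
    word ^= word >> 1;
    return word & 1u;
}

// Hypothetical L1D word with its stored parity bit (names are illustrative).
struct L1Word {
    uint32_t data;
    uint32_t storedParity;
};

// On a read, a parity mismatch signals an error. Because the L1D is
// write-through, the ECC-protected L2 always holds an up-to-date copy,
// so recovery is simply a refill from L2 (modeled here as a parameter).
uint32_t readWord(const L1Word& w, uint32_t cleanCopyFromL2) {
    if (parityBit(w.data) != w.storedParity) {
        std::cout << "parity error detected, refilling from L2\n";
        return cleanCopyFromL2;
    }
    return w.data;
}

int main() {
    uint32_t value = 0xDEADBEEF;
    L1Word good{value, parityBit(value)};
    L1Word flipped{value ^ 0x10u, parityBit(value)};  // single-bit upset
    std::cout << std::hex << readWord(good, value) << "\n";
    std::cout << std::hex << readWord(flipped, value) << "\n";
}
```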

Patent
Sandra K. Johnson
26 Sep 2003
TL;DR: In this article, the authors present a method, system, and program for maintaining data in distributed caches, where a copy of an object is maintained in at least one cache, wherein multiple caches may have different versions of the object, and wherein the objects are capable of having modifiable data units.
Abstract: Provided are a method, system, and program for maintaining data in distributed caches. A copy of an object is maintained in at least one cache, wherein multiple caches may have different versions of the object, and wherein the objects are capable of having modifiable data units. Update information is maintained for each object maintained in each cache, wherein the update information for each object in each cache indicates the object, the cache including the object, and indicates whether each data unit in the object was modified. After receiving a modification to a target data unit in one target object in one target cache, the update information for the target object and target cache is updated to indicate that the target data unit is modified, wherein the update information for the target object in any other cache indicates that the target data unit is not modified.

60 citations
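A minimal sketch of the update information described in the abstract, assuming an illustrative in-memory layout: one modified-flag per data unit, keyed by cache and object identifiers. All class and method names are hypothetical, not taken from the patent.

```cpp
#include <cstddef>
#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

// For each (cache, object) pair, a flag per data unit recording whether that
// unit was modified in that cache.
class UpdateInfo {
public:
    void registerCopy(const std::string& cache, const std::string& object,
                      std::size_t numUnits) {
        modified_[{cache, object}].assign(numUnits, false);
    }

    // After a write to one data unit in one target cache, only that cache's
    // entry marks the unit as modified; other caches still show it unmodified.
    void recordModification(const std::string& cache, const std::string& object,
                            std::size_t unit) {
        modified_.at({cache, object})[unit] = true;
    }

    bool isModified(const std::string& cache, const std::string& object,
                    std::size_t unit) const {
        return modified_.at({cache, object})[unit];
    }

private:
    std::map<std::pair<std::string, std::string>, std::vector<bool>> modified_;
};

int main() {
    UpdateInfo info;
    info.registerCopy("cacheA", "obj1", 4);
    info.registerCopy("cacheB", "obj1", 4);
    info.recordModification("cacheA", "obj1", 2);
    std::cout << info.isModified("cacheA", "obj1", 2) << " "    // 1: modified here
              << info.isModified("cacheB", "obj1", 2) << "\n";  // 0: not modified here
}
```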

Patent
08 May 2013
TL;DR: In this article, a way table is provided for identifying which of a plurality of cache ways stores required data, and each way table entry corresponds to one of the translation lookaside buffer (TLB) entries of the TLB and identifies, for each memory location of the page associated with the corresponding TLB entry, which cache way stores the data associated with that memory location.
Abstract: A data processing apparatus has a cache and a translation lookaside buffer (TLB). A way table is provided for identifying which of a plurality of cache ways stores required data. Each way table entry corresponds to one of the TLB entries of the TLB and identifies, for each memory location of the page associated with the corresponding TLB entry, which cache way stores the data associated with that memory location. Also, the cache may be capable of servicing M access requests in the same processing cycle. An arbiter may select pending access requests for servicing by the cache in a way that ensures that the selected pending access requests specify a maximum of N different virtual page addresses, where N < M.

60 citations
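A rough C++ model of the way-table idea, assuming an illustrative geometry of 4 KiB pages, 64-byte lines, and a sentinel value for lines not present in the cache; the structures and names are a sketch, not the patent's exact design.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <optional>

// Illustrative geometry: 4 KiB pages, 64-byte lines, small set-associative cache.
constexpr std::size_t kLinesPerPage = 4096 / 64;
constexpr uint8_t kNoWay = 0xFF;  // "not cached in any way"

// One way-table entry mirrors one TLB entry: for every line of the page it
// records which cache way (if any) currently holds that line.
struct WayTableEntry {
    std::array<uint8_t, kLinesPerPage> way;
    WayTableEntry() { way.fill(kNoWay); }
};

// On an access that hits in the TLB, the matching way-table entry tells the
// cache which single way to enable, instead of probing every way.
std::optional<unsigned> lookupWay(const WayTableEntry& e, uint64_t pageOffset) {
    uint8_t w = e.way[pageOffset / 64];
    if (w == kNoWay) return std::nullopt;
    return w;
}

int main() {
    WayTableEntry entry;
    entry.way[3] = 2;  // the line at page offsets 192..255 lives in way 2
    if (auto w = lookupWay(entry, 200))
        std::cout << "probe only way " << *w << "\n";
    if (!lookupWay(entry, 0))
        std::cout << "way table says not cached: probe all ways or miss\n";
}
```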

Patent
04 Feb 2013
TL;DR: In this article, a cache controller is coupled to at least two flash bricks, each comprising a flash memory; metadata indicates which flash brick's flash memory caches each data unit, and the metadata is used to determine the flash bricks on which the cache controller caches received data units.
Abstract: Provided is a method for managing cache memory to cache data units in at least one storage device. A cache controller is coupled to at least two flash bricks, each comprising a flash memory. Metadata indicates a mapping of the data units to the flash bricks caching the data units, wherein the metadata is used to determine the flash bricks on which the cache controller caches received data units. The metadata is updated to indicate the flash brick having the flash memory on which data units are cached.

60 citations
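As a rough illustration, the metadata could be modeled as a directory mapping each data unit to the flash brick caching it. The class below is a C++ sketch with a hypothetical placement policy (hashing the logical block address across bricks); it is not the claimed implementation.

```cpp
#include <cstdint>
#include <iostream>
#include <optional>
#include <unordered_map>

// Sketch of the cache controller's metadata: a map from a data unit's
// identifier (here, its logical block address) to the flash brick whose
// flash memory caches it. Names and policy are illustrative assumptions.
class FlashCacheDirectory {
public:
    explicit FlashCacheDirectory(unsigned numBricks) : numBricks_(numBricks) {}

    // Choose a brick for a newly cached data unit and record the mapping.
    unsigned cacheDataUnit(uint64_t lba) {
        unsigned brick = static_cast<unsigned>(lba % numBricks_);
        brickOf_[lba] = brick;
        return brick;
    }

    // Later reads consult the metadata to find which brick holds the unit.
    std::optional<unsigned> findBrick(uint64_t lba) const {
        auto it = brickOf_.find(lba);
        if (it == brickOf_.end()) return std::nullopt;
        return it->second;
    }

private:
    unsigned numBricks_;
    std::unordered_map<uint64_t, unsigned> brickOf_;
};

int main() {
    FlashCacheDirectory dir(2);  // a controller coupled to two flash bricks
    dir.cacheDataUnit(42);
    if (auto b = dir.findBrick(42))
        std::cout << "data unit 42 cached on flash brick " << *b << "\n";
}
```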


Network Information
Related Topics (5)
Cache: 59.1K papers, 976.6K citations, 93% related
Scalability: 50.9K papers, 931.6K citations, 88% related
Server: 79.5K papers, 1.4M citations, 88% related
Network packet: 159.7K papers, 2.2M citations, 83% related
Dynamic Source Routing: 32.2K papers, 695.7K citations, 83% related
Performance Metrics
No. of papers in the topic in previous years

Year: Papers
2023: 44
2022: 117
2021: 4
2020: 8
2019: 7
2018: 20