Topic
Smart Cache
About: Smart Cache is a research topic. Over its lifetime, 7,680 publications have been published within this topic, receiving 180,618 citations.
Papers
TL;DR: This paper shows how the Wolman model is applied to large-scale caching systems in which the interior nodes belong to third-party content distribution services and correlates the model's predictions of interior cache behavior with empirical observations from the root caches of the NLANR cache hierarchy.
105 citations
TL;DR: DAT is presented, a technique that augments loop tiling with data alignment, achieving improved efficiency (by ensuring that the cache is never under-utilized) as well as improved flexibility (by eliminating self-interference cache conflicts independent of the tile size), resulting in more stable and better cache performance.
Abstract: Loop blocking (tiling) is a well-known compiler optimization that helps improve cache performance by dividing the loop iteration space into smaller blocks (tiles); reuse of array elements within each tile is maximized by ensuring that the working set for the tile fits into the data cache. Padding is a data alignment technique that involves the insertion of dummy elements into a data structure for improving cache performance. In this work, we present DAT, a technique that augments loop tiling with data alignment, achieving improved efficiency (by ensuring that the cache is never under-utilized) as well as improved flexibility (by eliminating self-interference cache conflicts independent of the tile size). This results in more stable and better cache performance than existing approaches, in addition to maximizing cache utilization, eliminating self-interference, and minimizing cross-interference conflicts. Further, while all previous efforts are targeted at programs characterized by the reuse of a single array, we also address the issue of minimizing conflict misses when several tiled arrays are involved. To validate our technique, we ran extensive experiments using both simulations and actual measurements on SUN Sparc5 and Sparc10 workstations. The results on benchmarks exhibiting varying memory access patterns demonstrate the effectiveness of our technique through consistently high hit ratios and improved performance across varying problem sizes.
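The loop blocking the abstract describes can be sketched as follows. This is a minimal Python illustration of plain tiling only (not DAT's padding scheme); `tiled_matmul` and the `tile` parameter are our own names, chosen for the example:

```python
def tiled_matmul(A, B, n, tile=32):
    # Computes C = A @ B with loop blocking: the iteration space is split
    # into tile x tile blocks so that each block's working set fits in the
    # data cache, maximizing reuse of the loaded array elements.
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, n)):
                        a = A[i][k]
                        row_b = B[k]
                        row_c = C[i]
                        for j in range(jj, min(jj + tile, n)):
                            row_c[j] += a * row_b[j]
    return C
```

In a compiled language the tile size would be tuned to the cache geometry; DAT's contribution is to pad the arrays so that a tile never conflicts with itself regardless of that size.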
105 citations
22 Dec 2005
TL;DR: In this article, the authors present a method for run-time cache optimization based on profiling program code during run-time execution, logging the performance to produce a cache log, and rearranging a portion of the program code in view of the cache log to produce a rearranged portion.
Abstract: A method ( 400 ) and system ( 106 ) are provided for run-time cache optimization. The method includes profiling ( 402 ) the performance of program code during run-time execution, logging ( 408 ) the performance to produce a cache log, and rearranging ( 410 ) a portion of the program code in view of the cache log to produce a rearranged portion. The rearranged portion is supplied to a memory management unit ( 240 ) for managing at least one cache memory ( 110 - 140 ). The cache log can be collected during real-time operation of a communication device and is fed back to a linking process ( 244 ) to maximize cache locality at compile time. The method further includes loading a saved profile corresponding to a run-time operating mode, and reprogramming a new code image associated with the saved profile.
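The feedback step (profile at run time, then rearrange the code image) can be illustrated with a deliberately simplified sketch. The function name and the idea of sorting by hotness are our own illustration of the general technique, not the patent's actual linker algorithm:

```python
def rearrange_by_profile(layout, cache_log):
    # layout: ordered list of code units (e.g., function names) in the image.
    # cache_log: {unit: observed run-time count (calls or cache misses)}.
    # Place the hottest units first so they pack into adjacent cache lines,
    # improving instruction-cache locality when the image is relinked.
    return sorted(layout, key=lambda unit: cache_log.get(unit, 0), reverse=True)
```

A real implementation would feed this ordering back into the linker's section placement rather than merely sorting a list.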
104 citations
TL;DR: The authors present the least-unified value algorithm, which performs better than existing algorithms for replacing nonuniform data objects in wide-area distributed environments.
Abstract: Cache performance depends heavily on replacement algorithms, which dynamically select a suitable subset of objects for caching in a finite space. Developing such algorithms for wide-area distributed environments is challenging because, unlike traditional paging systems, retrieval costs and object sizes are not necessarily uniform. In a uniform caching environment, a replacement algorithm generally seeks to reduce cache misses, usually by replacing an object with the least likelihood of re-reference. In contrast, reducing total cost incurred due to cache misses is more important in nonuniform caching environments. The authors present the least-unified value algorithm, which performs better than existing algorithms for replacing nonuniform data objects in wide-area distributed environments.
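The cost-aware eviction idea above can be sketched as follows. This is a simplified stand-in for the least-unified value weighting, not the paper's exact formula; `choose_victim` and the value function are our own names for illustration:

```python
def choose_victim(cache, now):
    # cache: {key: (retrieval_cost, size, last_access_time)}.
    # In a nonuniform environment the goal is to minimize total miss cost,
    # not miss count, so the victim is the object with the lowest "value":
    # cost per byte, discounted by how long ago it was last referenced.
    def value(key):
        cost, size, last = cache[key]
        age = now - last
        return (cost / size) / (1 + age)
    return min(cache, key=value)
```

Note how a large, cheap-to-refetch object loses to a small, expensive one even when both were referenced recently, which is exactly where uniform policies like LRU go wrong.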
104 citations
30 Sep 2005
TL;DR: In this paper, instruction-assisted cache management for efficient use of cache and memory is discussed, where hints (e.g., modifiers) are added to read and write memory access instructions to identify that the memory access is for temporal data.
Abstract: Instruction-assisted cache management for efficient use of cache and memory. Hints (e.g., modifiers) are added to read and write memory access instructions to identify that the memory access is for temporal data. In view of such hints, alternative cache and allocation policies are implemented that minimize cache and memory accesses. Under one policy, a write cache miss may result in a write of data to a partial cache line without a memory read/write cycle to fill the remainder of the line. Under another policy, a read cache miss may result in a read from memory without allocating or writing the read data to a cache line. A cache line soft-lock mechanism is also disclosed, wherein cache lines may be selectably soft locked to indicate a preference for keeping those cache lines over non-locked lines.
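The read-around policy (a miss that bypasses allocation when the hint is set) can be modeled with a tiny direct-mapped cache sketch. `HintedCache` and the `non_temporal` flag are our own names; real hardware would carry the hint in the instruction encoding:

```python
class HintedCache:
    # Minimal direct-mapped cache model. A read miss with the non-temporal
    # hint set returns the data but does not fill a line, so streaming data
    # cannot evict lines holding temporal (reusable) data.
    def __init__(self, n_lines):
        self.lines = [None] * n_lines  # tag (address) stored per line

    def read(self, addr, memory, non_temporal=False):
        idx = addr % len(self.lines)
        if self.lines[idx] == addr:
            return memory[addr], True       # hit
        if not non_temporal:
            self.lines[idx] = addr          # normal allocate-on-miss
        return memory[addr], False          # miss; bypassed if hinted
```

The soft-lock mechanism from the abstract would extend this with a per-line lock bit consulted when choosing which line to replace.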
104 citations