Topic
Smart Cache
About: Smart Cache is a research topic. Over its lifetime, 7,680 publications on this topic have received 180,618 citations.
Papers published on a yearly basis
Papers
•
13 Nov 2010
TL;DR: A classification of applications into four cache usage categories is introduced and how applications from different categories affect each other's performance indirectly through cache sharing is discussed and a scheme to optimize such sharing is devised.
Abstract: Contention for shared cache resources has been recognized as a major bottleneck for multicores--especially for mixed workloads of independent applications. While most modern processors implement instructions to manage caches, these instructions are largely unused due to a lack of understanding of how to best leverage them. This paper introduces a classification of applications into four cache usage categories. We discuss how applications from different categories affect each other's performance indirectly through cache sharing and devise a scheme to optimize such sharing. We also propose a low-overhead method to automatically find the best per-instruction cache management policy. We demonstrate how the indirect cache-sharing effects of mixed workloads can be tamed by automatically altering some instructions to better manage cache resources. Practical experiments demonstrate that our software-only method can improve application performance up to 35% on x86 multicore hardware.
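The classification above hinges on whether an application's accesses have temporal reuse that a shared cache can exploit. A minimal sketch (the category names and parameters here are illustrative, not the paper's) contrasts a reuse-heavy pattern with a streaming one under a simple LRU cache model:

```python
from collections import OrderedDict

def lru_hit_rate(accesses, capacity):
    """Simulate a fully associative LRU cache and return the hit rate."""
    cache = OrderedDict()
    hits = 0
    for addr in accesses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)  # mark as most recently used
        else:
            cache[addr] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(accesses)

# A reuse-heavy application loops over a small working set;
# a streaming application touches each address exactly once.
reuse = [a % 32 for a in range(1000)]   # working set of 32 lines
streaming = list(range(1000))            # no temporal reuse at all

print(lru_hit_rate(reuse, 64))       # high: working set fits
print(lru_hit_rate(streaming, 64))   # zero: caching the data is wasted effort
```

A streaming application gains nothing from the cache yet still evicts its neighbors' data, which is why the paper's scheme steers such accesses away from shared cache space.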
46 citations
•
TL;DR: This work explores adaptable caching strategies which balance the resource demands of each application and in turn lead to improvements in throughput for the collective workload, which provides chip designers the opportunity to maintain high performance as cache size and power budgets become a concern in the CMP design space.
Abstract: Chip multi-processors (CMP) are rapidly emerging as an important design paradigm for both high performance and embedded processors. These machines provide an important performance alternative to increasing the clock frequency. In spite of the increase in potential performance, several issues related to resource sharing on the chip can negatively impact the performance of embedded applications. In particular, the shared on-chip caches make each job's memory access times dependent on the behavior of the other jobs sharing the cache. If not adequately managed, this can lead to problems in meeting hard real-time scheduling constraints. This work explores adaptable caching strategies which balance the resource demands of each application and in turn lead to improvements in throughput for the collective workload. Experimental results demonstrate speedups of up to 1.47X for workloads of two co-scheduled applications compared against a fully-shared two-level cache hierarchy. Additionally, the adaptable caching scheme is shown to achieve an average speedup of 1.10X over the leading cache partitioning model. By dynamically managing cache storage for multiple application threads at runtime, sizable performance levels are achieved, which provides chip designers the opportunity to maintain high performance as cache size and power budgets become a concern in the CMP design space.
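The benefit of partitioning a shared cache between co-scheduled jobs can be illustrated with a small LRU simulation (the workload shapes and split sizes below are hypothetical, not the paper's adaptable scheme):

```python
from collections import OrderedDict
from itertools import chain

def lru_hits(accesses, capacity):
    """Count hits for an access stream in a fully associative LRU cache."""
    cache, hits = OrderedDict(), 0
    for addr in accesses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)
        else:
            cache[addr] = True
            if len(cache) > capacity:
                cache.popitem(last=False)
    return hits

# Two co-scheduled "applications": one with reuse, one streaming.
reuse = [("A", a % 24) for a in range(600)]   # 24-line working set
stream = [("B", a) for a in range(600)]       # never reuses a line
# Interleave their accesses, as they would contend on a shared cache.
mixed = list(chain.from_iterable(zip(reuse, stream)))

shared = lru_hits(mixed, 32)                              # fully shared cache
partitioned = lru_hits(reuse, 28) + lru_hits(stream, 4)   # per-app split

print(shared, partitioned)
```

Under full sharing, the streaming job evicts the other job's working set before it can be reused, so both miss constantly; giving the reuse-friendly job most of the capacity recovers nearly all of its hits, which is the effect the partitioning comparison above measures.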
46 citations
•
10 Jul 2008
TL;DR: In this article, a device driver monitors which software applications currently running on a microprocessor are in a predetermined list and responsively dynamically writes the values to the microprocessor to configure its operating modes, such as data prefetching, branch prediction, instruction cache eviction, instruction execution suspension, sizes of cache memories, reorder buffer, store/load/fill queues, hashing algorithms related to data forwarding and branch target address cache indexing.
Abstract: A computing system includes a microprocessor that receives values for configuring operating modes thereof. A device driver monitors which software applications currently running on the microprocessor are in a predetermined list and responsively dynamically writes the values to the microprocessor to configure its operating modes. Examples of the operating modes the device driver may configure relate to the following: data prefetching; branch prediction; instruction cache eviction; instruction execution suspension; sizes of cache memories, reorder buffer, store/load/fill queues; hashing algorithms related to data forwarding and branch target address cache indexing; number of instruction translation, formatting, and issuing per clock cycle; load delay mechanism; speculative page tablewalks; instruction merging; out-of-order execution extent; caching of non-temporal hinted data; and serial or parallel access of an L2 cache and processor bus in response to an instruction cache miss.
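The driver's control logic reduces to a lookup: if a recognized application is running, write its tuned mode values; otherwise fall back to defaults. A hypothetical sketch (the application names, mode names, and values are invented for illustration and do not come from the patent):

```python
# Per-application operating-mode overrides the driver would write.
KNOWN_APPS = {
    "video_encoder": {"data_prefetch": "aggressive", "l2_access": "parallel"},
    "database":      {"data_prefetch": "off", "branch_prediction": "static"},
}
DEFAULTS = {"data_prefetch": "default", "branch_prediction": "dynamic",
            "l2_access": "serial"}

def modes_for(running_apps):
    """Return the mode values to write for the first recognized
    application found in the list of currently running programs."""
    config = dict(DEFAULTS)
    for app in running_apps:
        if app in KNOWN_APPS:
            config.update(KNOWN_APPS[app])
            break
    return config

print(modes_for(["shell", "database"]))
```

In the real mechanism these values would be written to model-specific configuration registers; the dictionary here only stands in for that table of per-application settings.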
46 citations
•
28 Dec 2004
TL;DR: In this paper, the authors present a system and method of common cache management, where a shared external memory is provided and populated by the VMs in the system with cache state information responsive to caching activity.
Abstract: A system and method of common cache management. Plural VMs each have a cache infrastructure component used by one or more additional components within each VM. An external cache is provided and shared by the components of each of the VMs. In one embodiment, a shared external memory is provided and populated by the VMs in the system with cache state information responsive to caching activity. This permits external monitoring of caching activity in the system.
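The key idea is that each VM publishes its cache state to a shared external memory on every caching event, so a monitor can observe activity without querying any VM. A minimal sketch, with a dict standing in for the shared memory segment (all class and field names are hypothetical):

```python
# Stand-in for the shared external memory populated by the VMs.
shared_memory = {}

class VMCache:
    """Cache infrastructure component inside one VM."""
    def __init__(self, vm_id):
        self.vm_id, self.hits, self.misses = vm_id, 0, 0
        self.store = {}

    def get(self, key):
        if key in self.store:
            self.hits += 1
        else:
            self.misses += 1
            self.store[key] = None  # fetch and cache (stubbed out)
        # Publish cache state responsive to the caching activity.
        shared_memory[self.vm_id] = {"hits": self.hits,
                                     "misses": self.misses,
                                     "entries": len(self.store)}

def monitor():
    """External monitor: aggregates state without touching any VM."""
    return {vm: s["hits"] + s["misses"] for vm, s in shared_memory.items()}

vm1, vm2 = VMCache("vm1"), VMCache("vm2")
for k in ["a", "b", "a"]:
    vm1.get(k)
vm2.get("x")
print(monitor())
```

The design choice worth noting is the direction of data flow: VMs push state outward, so monitoring adds no synchronization or request traffic inside the VMs.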
46 citations
•
06 Mar 2002
TL;DR: In this paper, the cache controllers and cache memory blocks are associated with a second-level cache; each processor accesses the second-level cache controllers upon missing in a first-level cache of fixed size.
Abstract: A processor integrated circuit capable of executing more than one instruction stream has two or more processors. Each processor accesses instructions and data through a cache controller. There are multiple blocks of cache memory. Some blocks of cache memory may optionally be directly attached to particular cache controllers. The cache controllers access at least some of the multiple blocks of cache memory through high speed interconnect, these blocks being dynamically allocable to more than one cache controller. A resource allocation controller determines which cache memory controller has access to the dynamically allocable cache memory block. In an embodiment the cache controllers and cache memory blocks are associated with second level cache, each processor accesses the second level cache controllers upon missing in a first level cache of fixed size.
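The resource allocation controller's job is bookkeeping: some cache blocks are fixed to a controller, the rest sit in a free pool and can be granted to whichever cache controller requests them. A hypothetical sketch of that bookkeeping (names and interface invented for illustration):

```python
class AllocationController:
    """Tracks which cache controller owns each cache memory block."""
    def __init__(self, fixed, allocable):
        self.owner = dict(fixed)   # block -> directly attached controller
        self.free = set(allocable) # dynamically allocable blocks

    def request(self, controller, n):
        """Grant up to n free allocable blocks to a cache controller."""
        granted = []
        while self.free and len(granted) < n:
            block = self.free.pop()
            self.owner[block] = controller
            granted.append(block)
        return granted

    def release(self, block):
        """Return a dynamically allocated block to the free pool."""
        self.free.add(block)
        del self.owner[block]

rac = AllocationController(fixed={"blk0": "ctl0"},
                           allocable=["blk1", "blk2"])
got = rac.request("ctl1", 2)  # e.g. ctl1's processor is missing heavily
print(sorted(got))
```

A real controller would decide *when* to reassign blocks from observed miss rates; this sketch only captures the ownership state such a policy would manipulate.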
46 citations