Topic
Smart Cache
About: Smart Cache is a research topic. Over its lifetime, 7,680 publications have been published on this topic, receiving 180,618 citations.
Papers published on a yearly basis
Papers
06 Aug 2003
TL;DR: In this patent, a method is described for preloading data into a cache (210) in a local machine (235), where the cache is coupled to a data store (130) in a remote host machine (240).
Abstract: A method (400) of preloading data into a cache (210) in a local machine (235). The cache (210) is operably coupled to a data store (130) in a remote host machine (240). The method includes the steps of determining a user behaviour profile for the local machine (235); retrieving data relating to the user behaviour profile from the data store (130); and preloading the retrieved data into the cache (210), so that the data is available to the cache user when desired. A local machine, a host machine, a cache, a communication system and preloading functions are also described. In this manner, data within the cache is maintained and replaced in a substantially optimal manner, and is available to the cache user when it is predicted that the user will wish to access it.
97 citations
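A minimal sketch of the idea in the entry above, assuming a frequency-count behaviour profile and a dict-backed LRU cache: the cache records which keys the user accesses, and preload() warms the cache from the remote store with the most frequently used ones. The class name, the profile representation and the fetch_from_store callable are illustrative assumptions, not the patent's design.

```python
from collections import Counter, OrderedDict

class PreloadingCache:
    """LRU cache on the local machine that can be preloaded from a
    behaviour profile (illustrative sketch, not the patented design)."""

    def __init__(self, capacity, fetch_from_store):
        self.capacity = capacity
        self.fetch = fetch_from_store      # pulls a key from the remote data store
        self.profile = Counter()           # per-key access counts (the "behaviour profile")
        self.entries = OrderedDict()       # key -> value, in LRU order

    def get(self, key):
        self.profile[key] += 1             # keep the behaviour profile up to date
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        value = self.fetch(key)            # miss: go to the remote data store
        self._put(key, value)
        return value

    def _put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict least recently used

    def preload(self):
        """Preload the keys the profile predicts the user will want next."""
        for key, _count in self.profile.most_common(self.capacity):
            if key not in self.entries:
                self._put(key, self.fetch(key))

# Usage: preload at session start so predicted data is already local.
cache = PreloadingCache(capacity=3, fetch_from_store=lambda k: f"value-of-{k}")
for k in ["inbox", "inbox", "calendar", "reports", "inbox"]:
    cache.get(k)
cache.preload()                            # warms the cache with the most-used keys
```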
03 Dec 2003
TL;DR: In this paper, runtime data cache prefetching is implemented in the dynamic optimization system ADORE (ADaptive Object code Reoptimization), overcoming the main limitation of traditional static prefetching: the lack of runtime cache-miss and miss-address information.
Abstract: Traditional software controlled data cache prefetching is often ineffective due to the lack of runtime cache miss and miss address information. To overcome this limitation, we implement runtime data cache prefetching in the dynamic optimization system ADORE (ADaptive Object code Reoptimization). Its performance has been compared with static software prefetching on the SPEC2000 benchmark suite. Runtime cache prefetching shows better performance. On an Itanium 2 based Linux workstation, it can increase performance by more than 20% over static prefetching on some benchmarks. For benchmarks that do not benefit from prefetching, the runtime optimization system adds only 1%-2% overhead. We have also collected cache miss profiles to guide static data cache prefetching in the ORC compiler. With that information the compiler can effectively avoid generating prefetches for loops that hit well in the data cache.
97 citations
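The core mechanism described above, using runtime miss addresses to drive prefetch decisions, can be approximated by a toy stride detector: once the same stride between consecutive miss addresses repeats a few times, targets are requested a few strides ahead. This is a simplified sketch under assumed parameters (min_repeats, distance), not the ADORE implementation.

```python
def plan_prefetches(miss_addresses, min_repeats=3, distance=4):
    """Toy stride prefetcher: watch the stream of runtime cache-miss
    addresses, and once the same stride repeats min_repeats times,
    emit prefetch targets `distance` strides ahead (an assumption-laden
    sketch of profile-guided prefetching, not ADORE itself)."""
    prefetches = []
    last_addr = None
    last_stride = None
    repeats = 0
    for addr in miss_addresses:
        if last_addr is not None:
            stride = addr - last_addr
            if stride == last_stride and stride != 0:
                repeats += 1
            else:
                repeats = 1
                last_stride = stride
            if repeats >= min_repeats:
                # Confident in the stride: request data several strides ahead.
                prefetches.append(addr + distance * stride)
        last_addr = addr
    return prefetches

# Example: misses every 64 bytes -> prefetch 4 cache lines ahead of each miss.
print(plan_prefetches([0x1000, 0x1040, 0x1080, 0x10c0, 0x1100]))
```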
27 Jun 2004
TL;DR: The paper gives the perhaps surprising result that, for sufficiently parallel computations, the shared cache need only be additively larger than the single-processor cache, providing some theoretical justification for designing machines with shared caches.
Abstract: We compare the number of cache misses M1 for running a computation on a single processor with cache size C1 to the total number of misses Mp for the same computation when using p processors or threads and a shared cache of size Cp. We show that for any computation, and with an appropriate (greedy) parallel schedule, if Cp ≥ C1 + pd then Mp ≤ M1. The depth d of the computation is the length of the critical path of dependences. This gives the perhaps surprising result that for sufficiently parallel computations the shared cache need only be an additive size larger than the single-processor cache, and gives some theoretical justification for designing machines with shared caches. We model a computation as a DAG and the sequential execution as a depth first schedule of the DAG. The parallel schedule we study is a parallel depth-first schedule (PDF schedule) based on the sequential one. The schedule is greedy and therefore work-efficient. Our main results assume the Ideal Cache model, but we also present results for other more realistic cache models.
97 citations
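Restating the bound from the abstract above, together with one purely illustrative numeric instance (the cache sizes, processor count, and the unit chosen for the depth d are assumptions, not figures from the paper):

```latex
% M_1: misses on one processor with cache size C_1
% M_p: misses on p processors sharing a cache of size C_p
% d  : depth (critical-path length) of the computation's DAG
\[
  C_p \;\ge\; C_1 + p\,d \quad\Longrightarrow\quad M_p \;\le\; M_1 .
\]
% Illustrative instance (assumed numbers): C_1 = 1\,\mathrm{MB}, p = 8,
% and d = 1000, measured in 64-byte cache lines, give
\[
  C_p \;\ge\; 1\,\mathrm{MB} + 8 \times 1000 \times 64\,\mathrm{B}
        \;=\; 1\,\mathrm{MB} + 512\,\mathrm{KB},
\]
% an additive increase rather than the p-fold (8 MB) growth one might expect.
```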
24 Jan 2004
TL;DR: In this patent, a method is provided for caching commands in microprocessors having a plurality of arithmetic units and in modules having a two- or multidimensional cell arrangement; it includes combining cells and arithmetic units into groups, assigning a cache unit to each group, and connecting the cache units to a higher-level unit via a tree structure.
Abstract: A method of caching commands in microprocessors having a plurality of arithmetic units, and in modules having a two- or multidimensional cell arrangement, is provided. The method includes combining a plurality of cells and arithmetic units to form a plurality of groups, assigning a cache unit to a group, and connecting the cache unit to a higher-level unit via a tree structure. The cache unit may send requests for required commands to the higher-level cache unit, which may return a command sequence containing the required command if such a sequence is held in the higher-level cache unit's local memory.
97 citations
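The tree-structured lookup summarized above might be sketched as follows, assuming a dict-backed "local memory" per cache unit and a single parent link; names and structure are illustrative, not the patented design. A miss in a group's cache is forwarded up the tree, and the returned command sequence is kept locally for subsequent hits.

```python
class CommandCache:
    """Node in a tree of command caches: each group of cells/ALUs has one,
    and misses are forwarded to the higher-level (parent) cache.
    Illustrative sketch only; names and structure are assumptions."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent          # higher-level cache unit (None at the root)
        self.local = {}               # command address -> command sequence

    def request(self, address):
        """Return the command sequence containing `address`, filling this
        cache from the parent on a miss."""
        if address in self.local:
            return self.local[address]
        if self.parent is None:
            raise KeyError(f"command {address!r} not present anywhere in the hierarchy")
        sequence = self.parent.request(address)    # forward the request up the tree
        self.local[address] = sequence             # keep a copy for the local group
        return sequence

# Usage: a root cache feeding two group-level caches.
root = CommandCache("root")
root.local["loop_body"] = ["load", "mul", "add", "store"]
group_a = CommandCache("group-A", parent=root)
group_b = CommandCache("group-B", parent=root)
print(group_a.request("loop_body"))   # miss in group-A, served by the root
print(group_a.request("loop_body"))   # now a local hit
```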
21 Dec 2012
TL;DR: In this patent, a caching priority designator is assigned to an address that addresses information stored in a memory system, and the information is stored in a cacheline of a first level of cache memory in the memory system.
Abstract: A method of managing cache memory includes assigning a caching priority designator to an address that addresses information stored in a memory system. The information is stored in a cacheline of a first level of cache memory in the memory system. The cacheline is evicted from the first level of cache memory. A second level in the memory system to which to write back the information is determined based at least in part on the caching priority designator. The information is written back to the second level.
97 citations
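A minimal sketch of the eviction policy described above, assuming a three-level hierarchy and a small integer priority designator (both assumptions for illustration): the designator selects how far down the memory hierarchy an evicted cacheline is written back.

```python
# Memory levels ordered from fastest to slowest; which one receives an
# evicted line depends on the line's caching priority designator.
LEVELS = ["L2", "L3", "DRAM"]

def write_back_level(priority, levels=LEVELS):
    """Map a caching priority designator (0 = lowest, 2 = highest) to the
    level that should receive the evicted cacheline. Hypothetical policy:
    higher-priority lines stay closer to the core."""
    index = max(0, len(levels) - 1 - priority)   # higher priority -> earlier level
    return levels[index]

def evict(cacheline):
    """Evict a line from the first-level cache and report its destination."""
    level = write_back_level(cacheline["priority"])
    return f"writing back address {cacheline['address']:#x} to {level}"

# Usage: a high-priority line stays in L2; a low-priority one goes to DRAM.
print(evict({"address": 0x2000, "priority": 2}))   # -> L2
print(evict({"address": 0x3000, "priority": 0}))   # -> DRAM
```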