scispace - formally typeset
Search or ask a question
Topic

Smart Cache

About: Smart Cache is a research topic. Over the lifetime, 7680 publications have been published within this topic receiving 180618 citations.


Papers
More filters
Proceedings ArticleDOI
08 Feb 2003
TL;DR: 5 different techniques that will reduce the data access times and power consumption in processors with multi-level caches are proposed that are able to abort 53.1% of the misses on average in SPEC2000 applications.
Abstract: As the performance gap between the processor cores and the memory subsystem increases, designers are forced to develop new latency hiding techniques. Arguably, the most common technique is to utilize multi-level caches. Each new generation of processors is equipped with higher levels of memory hierarchy with increasing sizes at each level. In this paper, we propose 5 different techniques that will reduce the data access times and power consumption in processors with multi-level caches. Using the information about the blocks placed into and replaced from the caches, the techniques quickly determine whether an access at any cache level will be a miss. The accesses that are identified to miss are aborted. The structures used to recognize misses are much smaller than the cache structures. Consequently the data access times and power consumption are reduced. Using the SimpleScalar simulator, we study the performance of these techniques for a processor with 5 cache levels. The best technique is able to abort 53.1% of the misses on average in SPEC2000 applications. Using these techniques, the execution time of the applications is reduced by up to 12.4% (5.4% on average), and the power consumption of the caches is reduced by as much as 11.6% (3.8% on average).

63 citations

Proceedings ArticleDOI
09 Dec 2006
TL;DR: Through simulation studies, this paper establishes the superiority of molecular cache (caches built as aggregations of molecules) that offers a 29% power advantage over that of an equivalently performing traditional cache.
Abstract: CMPs enable simultaneous execution of multiple applications on the same platforms that share cache resources. Diversity in the cache access patterns of these simultaneously executing applications can potentially trigger inter-application interference, leading to cache pollution. Whereas a large cache can ameliorate this problem, the issues of larger power consumption with increasing cache size, amplified at sub-100nm technologies, makes this solution prohibitive. In this paper, in order to address the issues relating to power-aware performance of caches, we propose a caching structure that addresses the following: 1. Definition of application-specific cache partitions as an aggregation of caching units (molecules). The parameters of each molecule namely size, associativity and line size are chosen so that the power consumed by it and access time are optimal for the given technology. 2. Application-Specific resizing of cache partitions with variable and adaptive associativity per cache line, way size and variable line size. 3. A replacement policy that is transparent to the partition in terms of size, heterogeneity in associativity and line size. Through simulation studies we establish the superiority of molecular cache (caches built as aggregations of molecules) that offers a 29% power advantage over that of an equivalently performing traditional cache.

63 citations

Proceedings ArticleDOI
04 Jul 2011
TL;DR: The first experiences gained while developing low-level software for message-passing and shared-memory programming on the Single-chip Cloud Computer (SCC) are detail and the potential of both programming models are evaluated and how these models can be improved especially with respect to the SCC's many-core architecture are evaluated.
Abstract: Since the beginning of the multicore era, parallel processing has become prevalent across the board. On a traditional multicore system, a single operating system manages all cores and schedules threads and processes among them, inherently supported by hardware-implemented cache coherence protocols. However, a further growth of the number of cores per system implies an increasing chip complexity, especially with respect to the cache coherence protocols. Therefore, a very attractive alternative for future many-core systems is to waive the hardware-based cache coherency and to introduce a software-oriented message-passing based architecture instead: a so-called Cluster-on-Chip architecture. Intel's Single-chip Cloud Computer (SCC), a many-core research processor with 48 non-coherent memory-coupled cores, is a very recent example for such a Cluster-on-Chip architecture. The SCC can be configured to run one operating system instance per core by partitioning the shared main memory in a strict manner. However, it is also possible to access the shared main memory in an unsplit and concurrent manner, provided that the cache coherency is then ensured by software. In this paper, we detail our first experiences gained while developing low-level software for message-passing and shared-memory programming on the SCC. In doing so, we evaluate the potential of both programming models and we show how these models can be improved especially with respect to the SCC's many-core architecture.

63 citations

Patent
Asit Dan1, Dinkar Sitaram1
31 Jul 1995
TL;DR: In this article, a system and method for caching sequential data streams in a cache storage device is presented, where each information stream is made as to whether its data blocks should be discarded from cache as they are read by a consuming process.
Abstract: A system and method for caching sequential data streams in a cache storage device. For each information stream, a determination is made as to whether its data blocks should discarded from cache as they are read by a consuming process. Responsive to a determination that the data blocks of a stream should be discarded from the cache are read by the consuming process, the data blocks associated with that stream are cached in accordance with an interval caching algorithm. Alternatively, responsive to a determination that the data blocks of a stream should not be discarded from the cache storage device as they are read by the consuming process, the data blocks of that stream are cached in accordance with a segment caching algorithm.

63 citations

Proceedings ArticleDOI
09 Jun 2003
TL;DR: In this article, the cache architecture includes an explorer component that efficiently searches the large space of possible configurations for the set of points representing meaningful tradeoffs between performance and energy -the Pareto-optimal set.
Abstract: We describe cache architecture, intended for prototype-oriented IC platforms, that automatically finds the best cache configuration for a particular application. The cache itself can be configured with respect to the total size, associativity, line size, and way prediction. The cache architecture includes an explorer component that efficiently searches the large space of possible configurations for the set of points representing meaningful tradeoffs between performance and energy - the Pareto-optimal set. We provide results of experiments showing that the architecture effectively finds a good set of Pareto points for numerous Powerstone and MediaBench embedded system benchmarks. Our architecture eliminates the need for time-consuming simulations to determine the best cache configuration, and imposes little power overhead and reasonable size overhead.

63 citations


Network Information
Related Topics (5)
Cache
59.1K papers, 976.6K citations
92% related
Server
79.5K papers, 1.4M citations
88% related
Scalability
50.9K papers, 931.6K citations
88% related
Network packet
159.7K papers, 2.2M citations
85% related
Quality of service
77.1K papers, 996.6K citations
84% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
202350
2022114
20215
20201
20198
201818