Proceedings ArticleDOI

Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers

01 May 1990 - Vol. 18, pp 364-373
TL;DR: In this article, hardware techniques to improve cache performance are presented: miss and victim caching, which place a small fully-associative cache between a cache and its refill path, and stream buffers, which hold prefetched lines outside the cache.
Abstract: Projections of computer technology forecast processors with peak performance of 1,000 MIPS in the relatively near future. These processors could easily lose half or more of their performance in the memory hierarchy if the hierarchy design is based on conventional caching techniques. This paper presents hardware techniques to improve the performance of caches.Miss caching places a small fully-associative cache between a cache and its refill path. Misses in the cache that hit in the miss cache have only a one cycle miss penalty, as opposed to a many cycle miss penalty without the miss cache. Small miss caches of 2 to 5 entries are shown to be very effective in removing mapping conflict misses in first-level direct-mapped caches.Victim caching is an improvement to miss caching that loads the small fully-associative cache with the victim of a miss and not the requested line. Small victim caches of 1 to 5 entries are even more effective at removing conflict misses than miss caching.Stream buffers prefetch cache lines starting at a cache miss address. The prefetched data is placed in the buffer and not in the cache. Stream buffers are useful in removing capacity and compulsory cache misses, as well as some instruction cache conflict misses. Stream buffers are more effective than previously investigated prefetch techniques at using the next slower level in the memory hierarchy when it is pipelined. An extension to the basic stream buffer, called multi-way stream buffers, is introduced. Multi-way stream buffers are useful for prefetching along multiple intertwined data reference streams.Together, victim caches and stream buffers reduce the miss rate of the first level in the cache hierarchy by a factor of two to three on a set of six large benchmarks.
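
To make the victim-caching mechanism concrete, here is a minimal trace-driven sketch in Python. It is an illustration under assumed parameters (a 64-set direct-mapped L1 with 32-byte lines and a 4-entry victim cache), not the paper's hardware design: an L1 miss that hits the small fully-associative victim cache swaps the two lines, while a full miss pushes the evicted L1 line, the victim, into that buffer.

```python
from collections import OrderedDict

class DirectMappedWithVictim:
    """Toy model: a direct-mapped L1 backed by a small fully-associative
    victim cache. An L1 miss that hits in the victim cache swaps the two
    lines (the one-cycle case in the paper); a full miss fetches from the
    next level and pushes the evicted L1 line, the victim, into the buffer."""

    def __init__(self, num_sets=64, line_size=32, victim_entries=4):
        self.num_sets, self.line_size = num_sets, line_size
        self.victim_entries = victim_entries
        self.l1 = [None] * num_sets          # one line (tag) per set
        self.victim = OrderedDict()          # line -> True, kept in LRU order
        self.hits = self.victim_hits = self.misses = 0

    def access(self, addr):
        line = addr // self.line_size
        idx = line % self.num_sets
        if self.l1[idx] == line:
            self.hits += 1
            return "hit"
        evicted = self.l1[idx]
        self.l1[idx] = line
        if line in self.victim:              # conflict miss caught by the victim cache
            self.victim_hits += 1
            del self.victim[line]
            result = "victim-hit"
        else:
            self.misses += 1                 # fetched from the next memory level
            result = "miss"
        if evicted is not None:
            self.victim[evicted] = True      # the displaced line becomes the new victim
            if len(self.victim) > self.victim_entries:
                self.victim.popitem(last=False)   # drop the LRU victim entry
        return result

# Two lines that map to the same set ping-pong; the victim cache absorbs the
# conflict misses a plain direct-mapped cache would keep taking.
cache = DirectMappedWithVictim()
for _ in range(100):
    for a in (0x0000, 0x8000):               # same set index, different tags
        cache.access(a)
print(cache.hits, cache.victim_hits, cache.misses)
```

The conflict loop at the end makes two lines that share a set ping-pong; after warm-up every reference is either an L1 hit or a victim-cache hit, which is the conflict-miss removal the paper measures for victim caches of 1 to 5 entries.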


Citations
Proceedings ArticleDOI
C.J. Choi, Gi-Ho Park, Ji Hyun Lee, Woo-Chan Park, Tack-Don Han
14 May 2000
TL;DR: Performance evaluation is carried out through trace-driven simulation using DineroIII, and the results reveal that a victim cache is also cost-effective for texture mapping.
Abstract: Texture mapping is commonly used to make images realistic in most current graphics systems. Texture mapping, however, requires high memory bandwidth and low memory latency to obtain good performance. Recently, a few studies have used a cache memory system for texture mapping in order to overcome these problems, and they show that a cache is useful for texture mapping. The miss distribution of the texture cache is analyzed, and we find that quite a few conflict misses occur periodically. Considering this fact, cache organizations known to be effective at reducing conflict misses, such as the victim cache, half-and-half cache, and cooperative cache, are evaluated and compared. Performance evaluation is carried out through trace-driven simulation using DineroIII, and the results reveal that the victim cache is also cost-effective for texture mapping.
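
As a rough intuition for why texture-cache conflict misses cluster periodically, the hypothetical snippet below computes direct-mapped set indices for a walk down one texture column; whenever the row pitch is a multiple of the cache's set count times line size, vertically adjacent texels collide in the same set. The geometry and pitch are illustrative assumptions, not values from the paper.

```python
# Illustrative only: direct-mapped set indices for a walk down one texture
# column. The cache geometry and row pitch are assumed values, not the paper's.
num_sets, line_size = 128, 32
row_pitch = 4096                                  # bytes per texture row (assumed)
column = [y * row_pitch for y in range(8)]        # vertically adjacent texels
print([(addr // line_size) % num_sets for addr in column])
# -> [0, 0, 0, 0, 0, 0, 0, 0]: every texel in the column maps to the same set,
#    so a plain direct-mapped texture cache thrashes on 2D access patterns while
#    a small victim cache can absorb the repeated conflicts.
```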

11 citations

Journal ArticleDOI
17 Sep 2005
TL;DR: It is shown that cache memories for embedded applications can be designed to increase performance while reducing area and energy consumption, and that such a split data cache can also benefit embedded applications.
Abstract: In this paper we show that cache memories for embedded applications can be designed to increase performance while reducing area and energy consumption. Previously we have shown that separating the data cache into an array cache and a scalar cache can lead to significant performance improvements for scientific benchmarks. In this paper we show that such a split data cache can also benefit embedded applications. To further improve the split cache organization, we augment the scalar cache with a small victim cache and the array cache with a small stream buffer. This "integrated" cache organization can lead to a 43% reduction in overall cache size, a 37% reduction in access time, and a 63% reduction in power consumption when compared to a unified 2-way set-associative data cache for media benchmarks from the MiBench suite.
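
One way to picture the "integrated" organization is a front end that steers each reference either to a scalar cache or to an array cache fed by a sequential stream buffer. The sketch below is an assumption-laden Python model (FIFO caches, a single-line stream buffer, an explicit is_array flag standing in for however the real design classifies references, and the victim cache omitted for brevity), meant only to show the steering and the prefetch restart, not the authors' implementation.

```python
LINE = 32

class TinyCache:
    """FIFO-replacement cache of a fixed number of lines (sketch only)."""
    def __init__(self, capacity_lines):
        self.capacity, self.store = capacity_lines, {}
        self.hits = self.misses = 0
    def access(self, addr):
        ln = addr // LINE
        if ln in self.store:
            self.hits += 1
            return True
        self.misses += 1
        if len(self.store) >= self.capacity:
            self.store.pop(next(iter(self.store)))   # FIFO: drop the oldest line
        self.store[ln] = True
        return False

scalar_cache, array_cache = TinyCache(32), TinyCache(64)
stream_next = None            # line number the stream buffer currently holds
stream_hits = memory_fetches = 0

def access(addr, is_array):
    global stream_next, stream_hits, memory_fetches
    if not is_array:
        if not scalar_cache.access(addr):
            memory_fetches += 1
        return
    ln = addr // LINE
    if not array_cache.access(addr):
        if ln == stream_next:
            stream_hits += 1              # miss serviced by the prefetched line
        else:
            memory_fetches += 1           # miss goes to the next memory level
        stream_next = ln + 1              # (re)start sequential prefetching

# Sequential array sweep interleaved with a recurring scalar reference.
for i in range(0, 16 * 1024, 4):
    access(0x100000 + i, is_array=True)
    access(0x1000, is_array=False)
print("stream-buffer hits:", stream_hits, "memory fetches:", memory_fetches)
```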

11 citations

Proceedings ArticleDOI
11 Oct 2004
TL;DR: The simulation results show that the proposed architecture can reduce per-access power consumption by 59% over conventional set-associative caches with a negligible average performance loss of 0.06%.
Abstract: This paper proposes a power-aware cache block allocation algorithm for a way-selective set-associative cache on embedded systems to reduce energy consumption without additional delay or performance degradation. To this end, way-selection logic and a specialized replacement policy are designed so that only one way of the set-associative cache is enabled, as in a direct-mapped cache. Overall cache access time remains almost the same as that of a conventional set-associative cache, even with the additional way-selection logic. Because the data array can be accessed without waiting for tag comparison, the multiplexer delay can be removed entirely. The simulation results show that the proposed architecture can reduce per-access power consumption by 59% over conventional set-associative caches with a negligible average performance loss of 0.06%.
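
The way-selection idea can be sketched as follows; the selection rule (low tag bits pick the way) and the sizes are assumptions made for illustration, since the paper's actual selection logic and replacement policy are not reproduced here. The point is that each access enables a single way, as a direct-mapped access would, while conflicting blocks can still coexist in different ways of the same set.

```python
# Toy model of a way-selective set-associative cache (assumed selection rule
# and sizes). Each access powers up only the selected way's tag and data
# arrays; replacement is confined to that same way.
LINE, SETS, WAYS = 32, 64, 4
tags = [[None] * WAYS for _ in range(SETS)]
hits = misses = 0

def access(addr):
    global hits, misses
    ln = addr // LINE
    s, tag = ln % SETS, ln // SETS
    way = tag % WAYS              # way-selection logic: chosen before any tag compare
    if tags[s][way] == tag:       # only this one way is enabled and compared
        hits += 1
    else:
        misses += 1
        tags[s][way] = tag        # replacement is confined to the selected way

# Three blocks that share set 0 land in different ways, so the second pass hits.
for a in (0x0000, 0x0800, 0x1000, 0x0000, 0x0800, 0x1000):
    access(a)
print(hits, misses)
```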

11 citations

Journal ArticleDOI
01 Oct 2012
TL;DR: An Algorithm-level Feedback-controlled Adaptive (AFA) data prefetcher is proposed; it provides algorithm-level adaptation, dynamically switching to appropriate prefetching algorithms at runtime, and achieves considerable IPC improvement on 21 representative SPEC CPU benchmarks.
Abstract: The rapid advance of processor architectures, such as the emergence of multicore designs, and the substantially increased on-chip computing capability have put more pressure than ever on sluggish memory systems. In the meantime, many applications are becoming more and more data intensive. Data-access delay, not processor speed, has become the leading performance bottleneck of high-performance computing. Data prefetching is an effective way to accelerate applications' data access and bridge the growing gap between computing speed and data-access speed. Existing prefetching schemes, however, are generally conservative, owing to earlier concerns about power consumption, and their effectiveness drops when an application's access pattern changes. In this study, we propose an Algorithm-level Feedback-controlled Adaptive (AFA) data prefetcher to address these issues. The AFA prefetcher is based on the Data-Access History Cache, a hardware structure specifically designed for data-access acceleration. It provides algorithm-level adaptation and is capable of dynamically switching to appropriate prefetching algorithms at runtime. We have conducted extensive simulation with the SimpleScalar simulator to validate the design and analyze the performance gain. The simulation results show that the AFA prefetcher is effective and achieves considerable IPC (Instructions Per Cycle) improvement for 21 representative SPEC CPU benchmarks.
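
The feedback-control loop can be sketched independently of the Data-Access History Cache: run one prefetching algorithm at a time, score how many of its prefetches were actually consumed over an epoch, and switch algorithms when accuracy drops. The candidate algorithms, epoch length, and threshold below are assumptions for illustration, not the AFA design.

```python
# Sketch of algorithm-level feedback control for prefetching (assumed candidate
# algorithms, epoch length, and threshold; the Data-Access History Cache is not
# modelled here).
class NextLine:
    def predict(self, addr, history):
        return addr + 64                       # next sequential line (64 B lines assumed)

class Stride:
    def predict(self, addr, history):
        if len(history) >= 2:
            return addr + (history[-1] - history[-2])
        return addr + 64

class AdaptivePrefetcher:
    def __init__(self, epoch=1000, threshold=0.5):
        self.algos = [NextLine(), Stride()]
        self.current = 0
        self.epoch, self.threshold = epoch, threshold
        self.accesses, self.useful = 0, 0
        self.issued, self.history = set(), []

    def on_access(self, addr):
        if addr in self.issued:                # a previously prefetched address is used
            self.useful += 1
            self.issued.discard(addr)
        self.issued.add(self.algos[self.current].predict(addr, self.history))
        self.history = (self.history + [addr])[-4:]
        self.accesses += 1
        if self.accesses % self.epoch == 0:    # feedback point: evaluate and maybe switch
            if self.useful / self.epoch < self.threshold:
                self.current = (self.current + 1) % len(self.algos)
            self.useful = 0
            self.issued.clear()

pf = AdaptivePrefetcher()
for a in range(0, 4_000_000, 256):             # a strided stream defeats next-line,
    pf.on_access(a)                            # so feedback switches to the stride algorithm
print(type(pf.algos[pf.current]).__name__)
```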

11 citations

Journal ArticleDOI
TL;DR: In this article, the on-line caching problem in a restricted cache where each memory item can be placed in only a restricted subset of cache locations has been studied, and the results show that restricted caches are significantly more complex than identical caches.
Abstract: We study the on-line caching problem in a restricted cache where each memory item can be placed in only a restricted subset of cache locations. Examples of restricted caches in practice include victim caches, assist caches, and skew caches. To the best of our knowledge, all previous on-line caching studies have considered on-line caching in identical or fully-associative caches where every memory item can be placed in any cache location. In this paper, we focus on companion caches, a simple restricted cache that includes victim caches and assist caches as special cases. Our results show that restricted caches are significantly more complex than identical caches. For example, we show that the commonly studied Least Recently Used algorithm is not competitive unless cache reorganization is allowed, while the performance of the First In First Out algorithm is competitive but not optimal. We also present two near-optimal algorithms for this problem as well as lower bound arguments.
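
The placement restriction under study can be made concrete with a small structural model: each item may live either in its one designated main-cache location or anywhere in a small fully-associative companion buffer. The sketch below encodes only that structure, with a placeholder FIFO spill policy; it does not model the competitive-analysis results.

```python
# Structural model of a companion cache: item x may occupy only its designated
# main-cache slot, or any slot of a small fully-associative companion buffer
# (victim and assist caches are special cases). The FIFO spill policy is a
# placeholder chosen only for the sketch.
from collections import deque

class CompanionCache:
    def __init__(self, main_slots=8, companion_slots=2):
        self.main = [None] * main_slots
        self.companion = deque(maxlen=companion_slots)   # oldest entry falls out when full
        self.main_slots = main_slots

    def loc(self, item):
        return hash(item) % self.main_slots              # the single allowed main slot

    def access(self, item):
        slot = self.loc(item)
        if self.main[slot] == item or item in self.companion:
            return "hit"
        if self.main[slot] is not None:                  # spill the displaced item
            self.companion.append(self.main[slot])
        self.main[slot] = item
        return "miss"

cc = CompanionCache()
print([cc.access(x) for x in (0, 8, 0, 8)])   # 0 and 8 share a main slot
# -> ['miss', 'miss', 'hit', 'hit']: the companion buffer holds the displaced item
```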

11 citations

References
Journal ArticleDOI
TL;DR: Specific aspects of cache memories investigated include: the cache fetch algorithm (demand versus prefetch), the placement and replacement algorithms, line size, store-through versus copy-back updating of main memory, cold-start versus warm-start miss ratios, multicache consistency, the effect of input/output through the cache, the behavior of split data/instruction caches, and cache size.
Abstract: This paper surveys cache memory design issues. Specific aspects of cache memories that are investigated include: the cache fetch algorithm (demand versus prefetch), the placement and replacement algorithms, line size, store-through versus copy-back updating of main memory, cold-start versus warm-start miss ratios, multicache consistency, the effect of input/output through the cache, the behavior of split data/instruction caches, and cache size. Our discussion includes other aspects of memory system architecture, including translation lookaside buffers. Throughout the paper, we use as examples the implementation of the cache in the Amdahl 470V/6 and 470V/7, the IBM 3081, 3033, and 370/168, and the DEC VAX 11/780. An extensive bibliography is provided.

1,614 citations

01 Jan 1990
TL;DR: This note evaluates several hardware platforms and operating systems using a set of benchmarks that test memory bandwidth and various operating system features such as kernel entry/exit and file systems to conclude that operating system performance does not seem to be improving at the same rate as the base speed of the underlying hardware.
Abstract: This note evaluates several hardware platforms and operating systems using a set of benchmarks that test memory bandwidth and various operating system features such as kernel entry/exit and file systems. The overall conclusion is that operating system performance does not seem to be improving at the same rate as the base speed of the underlying hardware.

467 citations

Journal ArticleDOI
01 Apr 1989
TL;DR: A parameterizable code reorganization and simulation system was developed and used to measure instruction-level parallelism; the average degree of superpipelining metric is introduced, and the simulations suggest that this metric is already high for many machines.
Abstract: Superscalar machines can issue several instructions per cycle. Superpipelined machines can issue only one instruction per cycle, but they have cycle times shorter than the latency of any functional unit. In this paper these two techniques are shown to be roughly equivalent ways of exploiting instruction-level parallelism. A parameterizable code reorganization and simulation system was developed and used to measure instruction-level parallelism for a series of benchmarks. Results of these simulations in the presence of various compiler optimizations are presented. The average degree of superpipelining metric is introduced. Our simulations suggest that this metric is already high for many machines. These machines already exploit all of the instruction-level parallelism available in many non-numeric applications, even without parallel instruction issue or higher degrees of pipelining.

316 citations

Journal ArticleDOI
TL;DR: It is shown that prefetching all memory references in very fast computers can increase the effective CPU speed by 10 to 25 percent.
Abstract: Memory transfers due to a cache miss are costly. Prefetching all memory references in very fast computers can increase the effective CPU speed by 10 to 25 percent.
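
The scheme summarized here is essentially always-prefetch: on every reference, the next sequential line is fetched as well if it is not already resident. A minimal sketch, with an assumed fully-associative LRU cache to keep it short:

```python
from collections import OrderedDict

LINE, CAPACITY = 32, 256          # assumed line size and cache size in lines
cache = OrderedDict()             # line -> True, kept in LRU order
demand_misses = prefetch_fetches = 0

def install(line):
    cache[line] = True
    cache.move_to_end(line)
    if len(cache) > CAPACITY:
        cache.popitem(last=False)            # evict the LRU line

def access(addr):
    global demand_misses, prefetch_fetches
    line = addr // LINE
    if line not in cache:
        demand_misses += 1                   # would stall the processor
    install(line)
    if line + 1 not in cache:                # prefetch on every reference
        prefetch_fetches += 1                # ideally overlapped with execution
        install(line + 1)

for a in range(0, 64 * 1024, 4):             # a sequential sweep
    access(a)
print("demand misses:", demand_misses, "prefetches issued:", prefetch_fetches)
```

On the sequential sweep the demand misses nearly vanish because every line after the first is brought in ahead of its first use, which is where the quoted 10 to 25 percent effective-speed gain comes from when the prefetch traffic can be overlapped.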

315 citations

Proceedings ArticleDOI
17 May 1988
TL;DR: The inclusion property is essential in reducing the cache coherence complexity for multiprocessors with multilevel cache hierarchies and a new inclusion-coherence mechanism for two-level bus-based architectures is proposed.
Abstract: The inclusion property is essential in reducing the cache coherence complexity for multiprocessors with multilevel cache hierarchies. We give some necessary and sufficient conditions for imposing the inclusion property for fully- and set-associative caches which allow different block sizes at different levels of the hierarchy. Three multiprocessor structures with a two-level cache hierarchy (single cache extension, multiport second-level cache, bus-based) are examined. The feasibility of imposing the inclusion property in these structures is discussed. This leads us to propose a new inclusion-coherence mechanism for two-level bus-based architectures.
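
The inclusion property requires that every block resident in an upper-level cache also be resident in the level below it; a common way to maintain it is to back-invalidate the upper level whenever the lower level evicts. The sketch below illustrates that invariant for equal block sizes, a simplifying assumption; the paper's conditions also cover different block sizes per level.

```python
from collections import OrderedDict

class Level:
    """One cache level holding whole blocks with LRU replacement (sketch only)."""
    def __init__(self, capacity):
        self.capacity, self.blocks = capacity, OrderedDict()
    def touch(self, block):
        self.blocks[block] = True
        self.blocks.move_to_end(block)
    def insert(self, block):
        self.touch(block)
        if len(self.blocks) > self.capacity:
            victim, _ = self.blocks.popitem(last=False)   # evict the LRU block
            return victim
        return None

l1, l2 = Level(4), Level(16)

def access(block):
    if block in l1.blocks:
        l1.touch(block)
        l2.touch(block)                       # keep L2's recency roughly in step
    else:
        if block not in l2.blocks:
            evicted = l2.insert(block)
            if evicted is not None and evicted in l1.blocks:
                del l1.blocks[evicted]        # back-invalidate L1 to preserve inclusion
        else:
            l2.touch(block)
        l1.insert(block)
    assert set(l1.blocks) <= set(l2.blocks)   # the inclusion invariant always holds

for b in list(range(20)) + list(range(20)):
    access(b)
print(sorted(l1.blocks), len(l2.blocks))
```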

236 citations