Proceedings ArticleDOI

Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers

01 May 1990 - Vol. 18, pp. 364-373
TL;DR: In this article, hardware techniques to improve cache performance are presented: a small fully-associative cache placed between a cache and its refill path (miss and victim caching), and stream buffers that hold prefetched data in a separate buffer rather than in the cache.
Abstract: Projections of computer technology forecast processors with peak performance of 1,000 MIPS in the relatively near future. These processors could easily lose half or more of their performance in the memory hierarchy if the hierarchy design is based on conventional caching techniques. This paper presents hardware techniques to improve the performance of caches. Miss caching places a small fully-associative cache between a cache and its refill path. Misses in the cache that hit in the miss cache have only a one cycle miss penalty, as opposed to a many cycle miss penalty without the miss cache. Small miss caches of 2 to 5 entries are shown to be very effective in removing mapping conflict misses in first-level direct-mapped caches. Victim caching is an improvement to miss caching that loads the small fully-associative cache with the victim of a miss and not the requested line. Small victim caches of 1 to 5 entries are even more effective at removing conflict misses than miss caching. Stream buffers prefetch cache lines starting at a cache miss address. The prefetched data is placed in the buffer and not in the cache. Stream buffers are useful in removing capacity and compulsory cache misses, as well as some instruction cache conflict misses. Stream buffers are more effective than previously investigated prefetch techniques at using the next slower level in the memory hierarchy when it is pipelined. An extension to the basic stream buffer, called multi-way stream buffers, is introduced. Multi-way stream buffers are useful for prefetching along multiple intertwined data reference streams. Together, victim caches and stream buffers reduce the miss rate of the first level in the cache hierarchy by a factor of two to three on a set of six large benchmarks.
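To make the two mechanisms concrete, here is a minimal Python sketch of a victim cache and a single stream buffer, assuming a simple line-addressed model; the class and method names, line size, and fetch callback are illustrative, not taken from the paper.

    from collections import OrderedDict, deque

    LINE = 32  # assumed cache line size in bytes

    class VictimCache:
        """Tiny fully-associative cache holding lines evicted from the main cache."""
        def __init__(self, entries=4):
            self.entries = entries
            self.lines = OrderedDict()        # tag -> data, kept in LRU order

        def swap_in(self, tag):
            """On a main-cache miss that hits here, remove and return the line."""
            return self.lines.pop(tag, None)

        def insert(self, tag, data):
            """Store the victim of a main-cache miss, evicting the LRU entry if full."""
            self.lines[tag] = data
            if len(self.lines) > self.entries:
                self.lines.popitem(last=False)

    class StreamBuffer:
        """FIFO of sequentially prefetched lines, filled starting at a miss address."""
        def __init__(self, depth=4):
            self.depth = depth
            self.fifo = deque()

        def allocate(self, miss_addr, fetch):
            """On a cache miss, prefetch the lines following the miss address."""
            self.fifo.clear()
            for i in range(1, self.depth + 1):
                addr = miss_addr + i * LINE
                self.fifo.append((addr, fetch(addr)))

        def lookup(self, addr, fetch):
            """Only the head entry is checked; a hit moves the line toward the cache."""
            if self.fifo and self.fifo[0][0] == addr:
                _, data = self.fifo.popleft()
                next_addr = (self.fifo[-1][0] if self.fifo else addr) + LINE
                self.fifo.append((next_addr, fetch(next_addr)))   # keep the buffer full
                return data
            return None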


Citations
Proceedings ArticleDOI
01 Dec 2010
TL;DR: A coordinated strategy, called Bimodal Set Balancing Cache, reduces both capacity and conflict misses by changing the placement and insertion policies of the cache; it reduced the average miss rate of a baseline 2MB 8-way second-level cache by 16%, which translated into an average IPC improvement of 4.8%.
Abstract: The well-known memory wall problem has motivated wide research in the design of caches. Last-level caches, whose misses can stall the processor for hundreds of cycles, have received particular attention. Strategies to adaptively modify the cache insertion, promotion, eviction and even placement policies have been proposed, some techniques being better at reducing different kinds of misses. For example, changes in the placement policy of a cache, which are a natural option to reduce conflict misses, can do little to fight capacity misses, which depend on the relation between the working set of the application and the cache size. Nevertheless, other techniques such as the recently proposed dynamic insertion policy (DIP), whose aim is to retain a fraction of the working set in the cache when it is larger than the cache size, attack primarily capacity misses. In this paper we present a coordinated strategy to reduce both capacity and conflict misses by changing the placement and insertion policies of the cache. Our strategy takes its decisions based on the concept of the Set Saturation Level (SSL), which tries to measure to which degree a set can hold its working set. Despite requiring less than 1% storage overhead, our proposal, called Bimodal Set Balancing Cache, reduced the average miss rate of a baseline 2MB 8-way second-level cache by 16%, which translated into an average IPC improvement of 4.8% in our experiments.
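A rough Python sketch of the saturation-tracking idea described in the abstract; the counter width, threshold, and the way the insertion/placement policy reacts are assumptions for illustration, not the mechanism defined in the paper.

    class SetSaturationLevel:
        """Per-set saturating counters estimating whether a set can hold its working set."""
        def __init__(self, num_sets, max_level=15, threshold=12):
            self.level = [0] * num_sets
            self.max_level = max_level
            self.threshold = threshold          # assumed cut-off for "saturated"

        def on_access(self, set_idx, hit):
            if hit:                             # hits suggest the set holds its working set
                self.level[set_idx] = max(0, self.level[set_idx] - 1)
            else:                               # misses suggest the set is over-subscribed
                self.level[set_idx] = min(self.max_level, self.level[set_idx] + 1)

        def saturated(self, set_idx):
            return self.level[set_idx] >= self.threshold

    # A saturated set could then borrow capacity from a lightly used partner set
    # (placement change) or insert new lines with low retention priority
    # (insertion change); both reactions are illustrative here.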

8 citations

Proceedings ArticleDOI
07 Nov 2002
TL;DR: A two-level cache system that exploits both temporal and spatial localities effectively is proposed as the cache structure for a RAID system and, according to the results of simulation, the hit ratio and hit times can be improved.
Abstract: In a RAID system, the cache is one of the important factors that can affect general system performance. As a two-level cache usually brings better performance than a one-level cache in the processors of personal computers and embedded systems, a two-level cache system that exploits both temporal and spatial localities effectively is proposed as the cache structure for a RAID system. The proposed cache system consists of two layers of caches, i.e., a set associative cache with small block size and a fully associative spatial cache with large block size. According to the results of simulation, the hit ratio and hit times can be improved with the two-level cache structure.
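A hedged sketch of how a lookup might flow through such a two-level structure, with a small-block cache for temporal locality backed by a large-block fully-associative spatial cache; the block sizes, dictionary-based caches, and disk_read callback are assumptions.

    SMALL_BLOCK = 4 * 1024      # assumed small block size for the set-associative level
    LARGE_BLOCK = 64 * 1024     # assumed large block size for the spatial cache

    def lookup(addr, temporal_cache, spatial_cache, disk_read):
        """temporal_cache / spatial_cache: dict-like tag -> bytes; disk_read(offset, size) -> bytes."""
        small_tag = addr // SMALL_BLOCK
        large_tag = addr // LARGE_BLOCK
        if small_tag in temporal_cache:                     # hit in the small-block cache
            return temporal_cache[small_tag]
        if large_tag in spatial_cache:                      # hit in the large-block spatial cache
            block = spatial_cache[large_tag]
            offset = small_tag * SMALL_BLOCK - large_tag * LARGE_BLOCK
            small = block[offset:offset + SMALL_BLOCK]
            temporal_cache[small_tag] = small               # promote for temporal reuse
            return small
        block = disk_read(large_tag * LARGE_BLOCK, LARGE_BLOCK)   # miss in both levels
        spatial_cache[large_tag] = block                    # large block captures spatial locality
        return lookup(addr, temporal_cache, spatial_cache, disk_read)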

8 citations

Proceedings ArticleDOI
04 Oct 2009
TL;DR: This paper investigates quantitatively the performance impact of faults using a superscalar, dynamically-scheduled, out-of-order, Alpha-like microprocessor on which the authors execute SPEC2000 integer benchmarks, and provides extensive fault-simulation-based experimental results.
Abstract: Towards improving performance, modern microprocessors incorporate a variety of architectural features, such as branch prediction and speculative execution, which are not critical to the correctness of their operation. While faults in the corresponding hardware may not necessarily affect functional correctness, they may, nevertheless, adversely impact performance. In this paper, we investigate quantitatively the performance impact of such faults using a superscalar, dynamically-scheduled, out-of-order, Alpha-like microprocessor, on which we execute SPEC2000 integer benchmarks. We provide extensive fault simulation-based experimental results and we discuss how this information may guide the inclusion of additional hardware for performance loss recovery and yield enhancement.

8 citations

Proceedings ArticleDOI
20 Apr 2009
TL;DR: Two complementary techniques are proposed to address the problem of harmful prefetches in the context of shared-L2-based CMPs; evaluated using two embedded application codes, they extract significant benefits from software prefetching even with large core counts.
Abstract: Chip multiprocessors (CMPs) present a unique scenario for software data prefetching with subtle tradeoffs between memory bandwidth and performance. In a shared L2 based CMP, multiple cores compete for the shared on-chip cache space and limited off-chip pin bandwidth. Purely software based prefetching techniques tend to increase this contention, leading to degradation in performance. In some cases, prefetches can become harmful by kicking out useful data from the shared cache whose next usage is earlier than the prefetched data, and the fraction of such harmful prefetches usually increases when we increase the number of cores used for executing a multi-threaded application code. In this paper, we propose two complementary techniques to address the problem of harmful prefetches in the context of shared L2 based CMPs. These techniques, namely, suppressing select data prefetches (if they are found to be harmful) and pinning select data in the L2 cache (if they are found to be frequent victims of harmful prefetches), are evaluated in this paper using two embedded application codes. Our experiments demonstrate that these two techniques are very effective in mitigating the impact of harmful prefetches, and as a result, we extract significant benefits from software prefetching even with large core counts.
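A small illustrative sketch of the two ideas (suppressing prefetch targets identified as harmful, and pinning their frequent victims in the shared L2); the counters and thresholds are assumptions, not the paper's mechanism.

    from collections import defaultdict

    SUPPRESS_AFTER = 3   # assumed: suppress a prefetch target after this many harmful uses
    PIN_AFTER = 3        # assumed: pin a line after it is victimized this many times

    harmful_count = defaultdict(int)   # prefetch target -> times it evicted sooner-needed data
    victim_count = defaultdict(int)    # cache line -> times it was evicted by a harmful prefetch
    pinned = set()                     # lines the L2 replacement policy must not evict

    def on_harmful_prefetch(prefetch_line, evicted_line):
        """Record that a prefetched line displaced data whose next use came first."""
        harmful_count[prefetch_line] += 1
        victim_count[evicted_line] += 1
        if victim_count[evicted_line] >= PIN_AFTER:
            pinned.add(evicted_line)           # keep frequent victims resident in L2

    def should_issue_prefetch(prefetch_line):
        """Suppress prefetches that have repeatedly proved harmful."""
        return harmful_count[prefetch_line] < SUPPRESS_AFTER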

8 citations

01 Jan 2009
TL;DR: This paper analyzes the performance of several alternatives that can be considered for a NUCA model according to the four policies that determine its behavior: bank placement, bank access, bank migration and bank replacement.
Abstract: Non-Uniform Cache Architectures (NUCA) have been proposed as a solution to overcome wire delays that will dominate on-chip latencies in Chip Multiprocessor designs in the near future. This novel means of organization divides the total memory area into a set of banks that provides non-uniform access latencies and thus faster access to those banks that are close to the processor. A NUCA model can be characterized according to the four policies that determine its behavior: bank placement, bank access, bank migration and bank replacement. Placement determines the first location of data, access defines the searching algorithm across the banks, migration decides data movements inside the memory and replacement deals with the evicted data. This paper analyzes the performance of several alternatives that can be considered for each of these four policies. Moreover, the Parsec benchmark suite has been used to handle this evaluation because it is a representative group of upcoming shared-memory programs for Chip Multiprocessors. The results may help researchers to identify key features of NUCA organizations and to open up new areas of investigation.
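A Python skeleton that names the four policies as hooks; the concrete choices below (static hash placement, nearest-first search, one-bank-closer migration, arbitrary eviction) are illustrative placeholders rather than the alternatives evaluated in the paper.

    class NUCABanks:
        def __init__(self, bank_latencies, bank_capacity=256):
            self.banks = [dict() for _ in bank_latencies]   # each bank: tag -> data
            self.latency = list(bank_latencies)             # per-bank access latency (non-uniform)
            self.capacity = bank_capacity

        def place(self, tag, data):
            """Placement policy: new data initially lands in a statically mapped bank."""
            bank_id = hash(tag) % len(self.banks)
            if len(self.banks[bank_id]) >= self.capacity:
                self.replace(bank_id)
            self.banks[bank_id][tag] = data
            return bank_id

        def access(self, tag):
            """Access policy: search banks nearest-first; return (data, latency) or None."""
            for bank_id, bank in enumerate(self.banks):
                if tag in bank:
                    data = bank[tag]
                    self.migrate(tag, bank_id)              # migration: pull hot data closer
                    return data, self.latency[bank_id]
            return None                                     # miss in all banks

        def migrate(self, tag, bank_id):
            """Migration policy: move a hit line one bank closer (capacity checks omitted)."""
            if bank_id > 0:
                self.banks[bank_id - 1][tag] = self.banks[bank_id].pop(tag)

        def replace(self, bank_id):
            """Replacement policy: evict an arbitrary line from a full bank."""
            victim = next(iter(self.banks[bank_id]))
            self.banks[bank_id].pop(victim)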

8 citations

References
Journal ArticleDOI
TL;DR: Specific aspects of cache memories investigated include: the cache fetch algorithm (demand versus prefetch), the placement and replacement algorithms, line size, store-through versus copy-back updating of main memory, cold-start versus warm-start miss ratios, multicache consistency, the effect of input/output through the cache, the behavior of split data/instruction caches, and cache size.
Abstract: Specific aspects of cache memories that are investigated include: the cache fetch algorithm (demand versus prefetch), the placement and replacement algorithms, line size, store-through versus copy-back updating of main memory, cold-start versus warm-start miss ratios, multicache consistency, the effect of input/output through the cache, the behavior of split data/instruction caches, and cache size. Our discussion includes other aspects of memory system architecture, including translation lookaside buffers. Throughout the paper, we use as examples the implementation of the cache in the Amdahl 470V/6 and 470V/7, the IBM 3081, 3033, and 370/168, and the DEC VAX 11/780. An extensive bibliography is provided.
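Several of the listed parameters (fetch policy, placement and replacement, line size) are exactly what a small trace-driven simulator varies; a minimal sketch, assuming demand fetch and LRU replacement, with all names illustrative:

    from collections import OrderedDict

    def miss_ratio(trace, cache_size, line_size, assoc):
        """Trace-driven miss ratio for a set-associative cache with LRU replacement."""
        num_sets = cache_size // (line_size * assoc)
        sets = [OrderedDict() for _ in range(num_sets)]     # one LRU-ordered dict per set
        misses = 0
        for addr in trace:
            line = addr // line_size
            s = sets[line % num_sets]                       # placement: index by line address
            if line in s:
                s.move_to_end(line)                         # hit: update LRU order
            else:
                misses += 1                                  # demand fetch on a miss
                s[line] = True
                if len(s) > assoc:
                    s.popitem(last=False)                    # replacement: evict the LRU line
        return misses / len(trace)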

1,614 citations

01 Jan 1990
TL;DR: This note evaluates several hardware platforms and operating systems using a set of benchmarks that test memory bandwidth and various operating system features such as kernel entry/exit and file systems to conclude that operating system performance does not seem to be improving at the same rate as the base speed of the underlying hardware.
Abstract: This note evaluates several hardware platforms and operating systems using a set of benchmarks that test memory bandwidth and various operating system features such as kernel entry/exit and file systems. The overall conclusion is that operating system performance does not seem to be improving at the same rate as the base speed of the underlying hardware.

467 citations

Journal ArticleDOI
01 Apr 1989
TL;DR: A parameterizable code reorganization and simulation system was developed and used to measure instruction-level parallelism, and the average degree of superpipelining metric is introduced; simulations suggest that this metric is already high for many machines.
Abstract: Superscalar machines can issue several instructions per cycle. Superpipelined machines can issue only one instruction per cycle, but they have cycle times shorter than the latency of any functional unit. In this paper these two techniques are shown to be roughly equivalent ways of exploiting instruction-level parallelism. A parameterizable code reorganization and simulation system was developed and used to measure instruction-level parallelism for a series of benchmarks. Results of these simulations in the presence of various compiler optimizations are presented. The average degree of superpipelining metric is introduced. Our simulations suggest that this metric is already high for many machines. These machines already exploit all of the instruction-level parallelism available in many non-numeric applications, even without parallel instruction issue or higher degrees of pipelining.
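As a hedged illustration of a frequency-weighted latency metric in the spirit of the average degree of superpipelining (the exact definition is given in the cited paper), with made-up latencies and instruction mix:

    # Illustrative only: operation latencies (cycles) weighted by dynamic frequency.
    op_latency = {"alu": 1, "load": 2, "branch": 2, "fp": 3}
    op_mix     = {"alu": 0.5, "load": 0.25, "branch": 0.15, "fp": 0.1}

    avg_degree = sum(op_latency[op] * op_mix[op] for op in op_mix)
    print(avg_degree)   # 1.6 for this example mix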

316 citations

Journal ArticleDOI
TL;DR: It is shown that prefetching all memory references in very fast computers can increase the effective CPU speed by 10 to 25 percent.
Abstract: Memory transfers due to a cache miss are costly. Prefetching all memory references in very fast computers can increase the effective CPU speed by 10 to 25 percent.
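A minimal sketch of the prefetch-on-every-reference (one-block-lookahead) idea, assuming a dictionary-based cache and a fetch callback; both are illustrative.

    LINE = 64   # assumed line size in bytes

    def reference(addr, cache, fetch):
        """Service a reference and always prefetch the sequentially next line."""
        line = addr // LINE
        if line not in cache:
            cache[line] = fetch(line)      # demand fetch on a miss
        nxt = line + 1
        if nxt not in cache:
            cache[nxt] = fetch(nxt)        # prefetch the next line on every reference
        return cache[line]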

315 citations

Proceedings ArticleDOI
17 May 1988
TL;DR: The inclusion property is essential in reducing the cache coherence complexity for multiprocessors with multilevel cache hierarchies, and a new inclusion-coherence mechanism for two-level bus-based architectures is proposed.
Abstract: The inclusion property is essential in reducing the cache coherence complexity for multiprocessors with multilevel cache hierarchies. We give some necessary and sufficient conditions for imposing the inclusion property for fully- and set-associative caches which allow different block sizes at different levels of the hierarchy. Three multiprocessor structures with a two-level cache hierarchy (single cache extension, multiport second-level cache, bus-based) are examined. The feasibility of imposing the inclusion property in these structures is discussed. This leads us to propose a new inclusion-coherence mechanism for two-level bus-based architectures.
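A short sketch of enforcing inclusion when block sizes differ between levels: every line present in L1 must also be present in L2, so an L2 eviction back-invalidates all L1 blocks it covers. The block sizes and dictionary-based caches are assumptions.

    L1_BLOCK, L2_BLOCK = 32, 128            # assumed block sizes, L2 blocks larger than L1

    def evict_from_l2(l2_tag, l1, l2):
        """Evict an L2 block and back-invalidate every L1 block it contains."""
        l2.pop(l2_tag, None)
        ratio = L2_BLOCK // L1_BLOCK
        for i in range(ratio):
            l1.pop(l2_tag * ratio + i, None)

    def fill_l1(addr, l1, l2, fetch):
        """Fill L2 before (or together with) L1 so inclusion always holds."""
        l1_tag, l2_tag = addr // L1_BLOCK, addr // L2_BLOCK
        if l2_tag not in l2:
            l2[l2_tag] = fetch(l2_tag)
        l1[l1_tag] = l2[l2_tag]             # the relevant sub-block would be extracted here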

236 citations