Proceedings ArticleDOI

Prefetching using Markov predictors

Doug Joseph, +1 more
- Vol. 25, Iss: 2, pp 252-263
TLDR
The Markov prefetcher acts as an interface between the on-chip and off-chip cache, and can be added to existing computer designs and reduces the overall execution stalls due to instruction and data memory operations by an average of 54% for various commercial benchmarks while only using two thirds the memory of a demand-fetch cache organization.
Abstract
Prefetching is one approach to reducing the latency of memory operations in modern computer systems. In this paper, we describe the Markov prefetcher. This prefetcher acts as an interface between the on-chip and off-chip cache, and can be added to existing computer designs. The Markov prefetcher is distinguished by prefetching multiple reference predictions from the memory subsystem, and then prioritizing the delivery of those references to the processor. This design results in a prefetching system that provides good coverage, is accurate, and produces timely results that can be effectively used by the processor. In our cycle-level simulations, the Markov prefetcher reduces the overall execution stalls due to instruction and data memory operations by an average of 54% for various commercial benchmarks while only using two thirds the memory of a demand-fetch cache organization.
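To make the mechanism concrete, the following is a minimal software sketch of a first-order Markov prefetcher, assuming a table indexed by miss address whose entries rank previously observed successor misses by frequency; the table organization, fanout, and replacement below are illustrative assumptions, not the paper's exact hardware design.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>
#include <algorithm>

// Toy model of a Markov prefetcher: a first-order Markov table keyed by
// miss address.  Each entry counts how often each successor miss followed
// it; on a new miss we emit the most frequent successors as prefetch
// candidates, highest priority first.  Sizes are illustrative.
class MarkovPrefetcher {
public:
    explicit MarkovPrefetcher(size_t predictions_per_entry = 4)
        : fanout_(predictions_per_entry) {}

    // Record a demand miss and return prioritized prefetch candidates.
    std::vector<uint64_t> onMiss(uint64_t miss_addr) {
        // Update the transition count from the previous miss to this one.
        if (have_prev_) {
            table_[prev_miss_][miss_addr]++;
        }
        prev_miss_ = miss_addr;
        have_prev_ = true;

        // Look up predicted successors of the current miss.
        std::vector<uint64_t> candidates;
        auto it = table_.find(miss_addr);
        if (it == table_.end()) return candidates;

        // Sort successors by observed frequency (descending) and keep the
        // top few; delivery order encodes prefetch priority.
        std::vector<std::pair<uint64_t, uint32_t>> succ(it->second.begin(),
                                                        it->second.end());
        std::sort(succ.begin(), succ.end(),
                  [](const auto& a, const auto& b) { return a.second > b.second; });
        for (size_t i = 0; i < succ.size() && i < fanout_; ++i)
            candidates.push_back(succ[i].first);
        return candidates;
    }

private:
    size_t fanout_;
    bool have_prev_ = false;
    uint64_t prev_miss_ = 0;
    // miss address -> (successor miss address -> count)
    std::unordered_map<uint64_t, std::unordered_map<uint64_t, uint32_t>> table_;
};
```

A hardware realization would bound the table and use approximate replacement rather than unbounded hash maps; the sketch only illustrates the address-to-successor transition table and the prioritized delivery of multiple predictions.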


Citations
Book

Memory Systems: Cache, DRAM, Disk

TL;DR: Is your memory hierarchy stopping your microprocessor from performing at the high level it should be?
Proceedings ArticleDOI

Phase tracking and prediction

TL;DR: This paper presents a unified profiling architecture that can efficiently capture, classify, and predict phase-based program behavior on the largest of time scales, and can capture phases that account for over 80% of execution using less than 500 bytes of on-chip memory.
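As a rough illustration of interval-based phase classification (a sketch under assumed parameters, not the paper's profiling hardware), the following accumulates a small signature vector of hashed branch counts over each interval and matches it against stored phase signatures using a distance threshold; the bucket count and threshold are assumptions.

```cpp
#include <array>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hedged sketch of interval-based phase classification: hash branch
// addresses into a small signature vector over each interval, then match
// the finished signature against previously seen phase signatures.
constexpr size_t kBuckets = 32;
using Signature = std::array<uint32_t, kBuckets>;

struct PhaseTracker {
    Signature current{};               // signature being accumulated
    std::vector<Signature> phases;     // one representative per phase

    void recordBranch(uint64_t pc) {
        current[(pc >> 2) % kBuckets]++;   // cheap hash of the branch PC
    }

    // Called at the end of each interval; returns the phase ID.
    int endInterval(uint64_t threshold = 1000) {
        int best = -1;
        uint64_t best_dist = ~0ull;
        for (size_t i = 0; i < phases.size(); ++i) {
            uint64_t dist = 0;   // Manhattan distance between signatures
            for (size_t b = 0; b < kBuckets; ++b)
                dist += std::abs((int64_t)current[b] - (int64_t)phases[i][b]);
            if (dist < best_dist) { best_dist = dist; best = (int)i; }
        }
        if (best < 0 || best_dist > threshold) {   // no close match: new phase
            phases.push_back(current);
            best = (int)phases.size() - 1;
        }
        current.fill(0);
        return best;
    }
};
```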
Proceedings ArticleDOI

Runahead execution: an alternative to very large instruction windows for out-of-order processors

TL;DR: This paper proposes runahead execution as an effective way to increase memory latency tolerance in an out-of-order processor without requiring an unreasonably large instruction window.
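The following is a heavily simplified, trace-level sketch of the runahead idea, assuming a toy cache model: on a blocking miss, the "processor" keeps scanning ahead to turn future misses into prefetches, then resumes from the checkpoint once the blocking miss completes. The runahead depth and cache abstraction are illustrative assumptions, not the paper's microarchitecture.

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Toy illustration of runahead execution over a simple address trace.
// A conventional out-of-order core stalls once its instruction window
// fills behind a long-latency miss; runahead instead checkpoints, keeps
// "executing" past the miss purely to discover future misses and turn
// them into prefetches, then rolls back and re-executes normally.
struct ToyCache {
    std::unordered_set<uint64_t> lines;
    bool hit(uint64_t line) const { return lines.count(line) != 0; }
    void fill(uint64_t line) { lines.insert(line); }
};

void runTrace(const std::vector<uint64_t>& trace, ToyCache& cache,
              size_t runahead_depth = 64) {
    for (size_t i = 0; i < trace.size(); ++i) {
        uint64_t line = trace[i] >> 6;        // 64-byte cache lines
        if (cache.hit(line)) continue;        // normal hit, no stall

        // Blocking miss: enter runahead mode from a checkpoint at i.
        // Scan ahead, issuing prefetches for lines that would also miss;
        // any results computed in this mode would be discarded.
        for (size_t j = i + 1; j < trace.size() && j <= i + runahead_depth; ++j) {
            uint64_t future = trace[j] >> 6;
            if (!cache.hit(future)) cache.fill(future);   // prefetch issued
        }

        cache.fill(line);   // blocking miss returns; resume at i + 1
    }
}
```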
Proceedings ArticleDOI

The predictability of data values

TL;DR: Comparison of context-based prediction and stride prediction shows that the higher accuracy of context-based prediction is due to relatively few static instructions giving large improvements; this suggests the usefulness of hybrid predictors.
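To illustrate the two predictor classes being compared (a hedged sketch, not the paper's exact configurations), a stride predictor extrapolates the last value plus a learned per-instruction delta, while a context-based predictor recalls which value followed a hashed history of recent values:

```cpp
#include <cstdint>
#include <unordered_map>

// Simplified per-static-instruction value predictors, keyed by PC.
// Table organization and hashing are illustrative assumptions.
struct StridePredictor {
    struct Entry { int64_t last = 0, stride = 0; bool valid = false; };
    std::unordered_map<uint64_t, Entry> table;

    int64_t predict(uint64_t pc) {
        const Entry& e = table[pc];
        return e.last + e.stride;            // extrapolate last value + delta
    }
    void update(uint64_t pc, int64_t value) {
        Entry& e = table[pc];
        if (e.valid) e.stride = value - e.last;
        e.last = value;
        e.valid = true;
    }
};

struct ContextPredictor {
    std::unordered_map<uint64_t, uint64_t> history;     // pc -> hashed value context
    std::unordered_map<uint64_t, int64_t> next_value;   // context -> value seen next

    int64_t predict(uint64_t pc) { return next_value[history[pc]]; }
    void update(uint64_t pc, int64_t value) {
        uint64_t& ctx = history[pc];
        next_value[ctx] = value;                             // learn "ctx -> value"
        ctx = (ctx * 1099511628211ull) ^ (uint64_t)value;    // fold value into context
    }
};
```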
Proceedings ArticleDOI

Managing Wire Delay in Large Chip-Multiprocessor Caches

TL;DR: This paper develops L2 cache designs for CMPs that incorporate block migration, stride-based prefetching between L1 and L2 caches, and on-chip transmission lines, and presents a hybrid design, combining all three techniques, that improves performance by an additional 2% to 19% over prefetching alone.
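As a hedged sketch of one of these techniques, block migration in a banked shared L2 can be modeled as moving a block one bank closer to the requesting core on each access, so hot blocks gravitate toward their users and see shorter wire delay; the linear bank arrangement and one-step migration policy below are illustrative assumptions.

```cpp
#include <cstdint>
#include <unordered_map>

// Hedged sketch of block migration in a banked (NUCA-style) shared L2.
// Banks are arranged in a line; on each access a block moves one bank
// toward the requester's home bank.  Geometry and policy are illustrative.
struct MigratingL2 {
    std::unordered_map<uint64_t, int> bank_of;   // cache line -> current bank index

    // Returns the bank that services the request (or -1 on a miss),
    // then migrates the block one step toward the requester's home bank.
    int access(uint64_t line, int requester_bank) {
        auto it = bank_of.find(line);
        if (it == bank_of.end()) {
            bank_of[line] = requester_bank;      // fill near the requester
            return -1;
        }
        int serviced_from = it->second;
        if (serviced_from < requester_bank)      ++it->second;
        else if (serviced_from > requester_bank) --it->second;
        return serviced_from;
    }
};
```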
References
Proceedings ArticleDOI

Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers

TL;DR: In this article, a hardware technique to improve the performance of caches is presented: a small fully-associative cache is placed between a cache and its refill path, and prefetched data is placed in this buffer rather than in the cache.
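A minimal sketch of such a prefetch buffer, assuming FIFO replacement and an explicit promote-on-reference step (both assumptions, not necessarily the paper's exact policy):

```cpp
#include <cstdint>
#include <deque>
#include <unordered_set>

// Hedged sketch of a small fully-associative prefetch buffer between a
// cache and its refill path: prefetched lines go into the buffer, not the
// cache, and are promoted into the cache only when actually referenced,
// so useless prefetches never pollute the cache.
struct PrefetchBuffer {
    size_t capacity;
    std::deque<uint64_t> fifo;            // FIFO replacement order
    std::unordered_set<uint64_t> lines;   // lines currently buffered

    explicit PrefetchBuffer(size_t cap) : capacity(cap) {}

    void insert(uint64_t line) {          // called when a prefetch returns
        if (lines.count(line)) return;
        if (fifo.size() == capacity) {    // evict the oldest prefetched line
            lines.erase(fifo.front());
            fifo.pop_front();
        }
        fifo.push_back(line);
        lines.insert(line);
    }

    // On a cache miss, probe the buffer; a hit removes the line so the
    // caller can move it into the cache proper.
    bool probeAndPromote(uint64_t line) {
        if (!lines.count(line)) return false;
        lines.erase(line);
        for (auto it = fifo.begin(); it != fifo.end(); ++it)
            if (*it == line) { fifo.erase(it); break; }
        return true;
    }
};
```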
Proceedings ArticleDOI

Design and evaluation of a compiler algorithm for prefetching

TL;DR: This paper proposes a compiler algorithm to insert prefetch instructions into code that operates on dense matrices, and shows that this algorithm significantly improves the execution speed of the benchmark programs; some of the programs improve by as much as a factor of two.
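For a concrete, hand-written picture of the kind of code such a compiler algorithm emits, the loop below prefetches dense-array elements a fixed number of iterations ahead using the GCC/Clang __builtin_prefetch intrinsic; the prefetch distance and locality hints are illustrative tuning assumptions, not values from the paper.

```cpp
#include <cstddef>

// Hand-written example of compiler-style prefetch insertion for a dense
// array kernel: fetch data a fixed number of iterations ahead so it
// arrives by the time the loop reaches it.
void scale(double* a, const double* b, size_t n, double k) {
    const size_t kDistance = 16;              // iterations of lookahead (assumed)
    for (size_t i = 0; i < n; ++i) {
        if (i + kDistance < n) {
            __builtin_prefetch(&b[i + kDistance], /*rw=*/0, /*locality=*/1);
            __builtin_prefetch(&a[i + kDistance], /*rw=*/1, /*locality=*/1);
        }
        a[i] = k * b[i];
    }
}
```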
Proceedings ArticleDOI

Evaluating stream buffers as a secondary cache replacement

TL;DR: The results show that, for the majority of the benchmarks, stream buffers can attain hit rates that are comparable to typical hit rates of secondary caches, and that as the data-set size of the scientific workload increases, the performance of streams typically improves relative to secondary cache performance, showing that streams are more scalable to large data-set sizes.
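A minimal sketch of a single sequential stream buffer, assuming a fixed depth and a restart-on-miss policy (real designs use several buffers and may filter allocation):

```cpp
#include <cstdint>
#include <deque>

// Hedged sketch of one sequential stream buffer behind the L1: a miss
// that also misses the buffer reallocates it to the next few sequential
// lines; a hit at the head supplies the line and the buffer fetches one
// more line to keep the stream going.  Depth is an assumption.
struct StreamBuffer {
    size_t depth;
    std::deque<uint64_t> pending;   // prefetched line addresses, in order

    explicit StreamBuffer(size_t d) : depth(d) {}

    // Returns true if the missing line is supplied by the stream buffer.
    bool onL1Miss(uint64_t line) {
        if (!pending.empty() && pending.front() == line) {
            pending.pop_front();                         // head hit: consume it
            pending.push_back(pending.empty() ? line + 1
                                              : pending.back() + 1);
            return true;
        }
        pending.clear();                                 // miss: restart the stream
        for (uint64_t next = line + 1; pending.size() < depth; ++next)
            pending.push_back(next);
        return false;
    }
};
```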
Proceedings ArticleDOI

An architecture for software-controlled data prefetching

TL;DR: Simulations based on a MIPS processor model show that this technique can dramatically reduce on-chip cache miss ratios and average observed memory latency for scientific loops at only slight cost in total memory traffic.
Proceedings ArticleDOI

A modified approach to data cache management

TL;DR: The bare minimum amount of local memories that programs require to run without delay is measured by using the Value Reuse Profile, which contains the dynamic value reuse information of a program's execution, and by assuming the existence of efficient memory systems.