Proceedings ArticleDOI

Domino Temporal Data Prefetcher

TLDR
This work identifies the lookup mechanism of existing temporal prefetchers as responsible for the large gap between what they offer and the opportunity, and proposes a practical design for the Domino prefetcher that employs an Enhanced Index Table indexed by just a single miss address.
Abstract
Big-data server applications frequently encounter data misses and hence lose significant performance potential. One way to reduce the number of data misses, or their effect, is data prefetching. As data accesses exhibit high temporal correlation, temporal prefetching techniques are promising for these applications. While state-of-the-art temporal prefetching techniques are effective at reducing the number of data misses, we observe that there is a significant gap between what they offer and the opportunity. This work aims to improve the effectiveness of temporal prefetching techniques. We identify the lookup mechanism of existing temporal prefetchers as responsible for the large gap between what they offer and the opportunity. Existing lookup mechanisms either do not choose the right stream in the history or unnecessarily delay stream selection, and hence miss the opportunity at the beginning of every stream. In this work, we introduce Domino prefetching to address the limitations of existing temporal prefetchers. The Domino prefetcher is a temporal data prefetching technique that logically looks up the history with both the last one and the last two miss addresses to find a match for prefetching. We propose a practical design for the Domino prefetcher that employs an Enhanced Index Table indexed by just a single miss address. We show that the Domino prefetcher captures more than 90% of the temporal opportunity. Through detailed evaluation targeting a quad-core processor and a set of server workloads, we show that the Domino prefetcher improves system performance by 16% over a baseline with no data prefetcher and by 6% over the state-of-the-art temporal data prefetcher.
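To make the lookup idea concrete, the sketch below is a minimal software model of the two-level match the abstract describes: the history is indexed by a single (last) miss address, and each entry keeps a few candidate streams tagged with the second-to-last miss, so a two-address match is preferred and a single-address match serves as a fallback. Structure names, table sizes, and the replacement policy are illustrative assumptions, not the paper's actual Enhanced Index Table design.

```c
/* Minimal software model of a Domino-style lookup (illustrative only). */
#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define EIT_SIZE 256   /* entries in the (hypothetical) index table          */
#define WAYS     4     /* candidate streams kept per last-miss address       */

typedef struct {
    uint64_t last_miss;            /* single miss address used as the index  */
    uint64_t prev_miss[WAYS];      /* second-to-last miss that began the pair*/
    uint64_t next_addr[WAYS];      /* first prefetch candidate of each stream*/
    int      valid[WAYS];
} EITEntry;

static EITEntry eit[EIT_SIZE];

static unsigned hash_addr(uint64_t a) { return (unsigned)((a ^ (a >> 16)) % EIT_SIZE); }

/* Record the observed miss pair (prev, last) followed by next. */
static void eit_update(uint64_t prev, uint64_t last, uint64_t next)
{
    EITEntry *e = &eit[hash_addr(last)];
    if (e->last_miss != last) { memset(e, 0, sizeof *e); e->last_miss = last; }
    for (int w = 0; w < WAYS; w++) {
        if (!e->valid[w] || e->prev_miss[w] == prev) {
            e->valid[w] = 1; e->prev_miss[w] = prev; e->next_addr[w] = next;
            return;
        }
    }
    e->prev_miss[0] = prev; e->next_addr[0] = next;   /* naive replacement   */
}

/* Look up with the last miss alone; prefer the way whose recorded
 * second-to-last miss also matches, otherwise fall back to any valid way. */
static int eit_lookup(uint64_t prev, uint64_t last, uint64_t *prefetch)
{
    EITEntry *e = &eit[hash_addr(last)];
    if (e->last_miss != last) return 0;
    int fallback = -1;
    for (int w = 0; w < WAYS; w++) {
        if (!e->valid[w]) continue;
        if (e->prev_miss[w] == prev) { *prefetch = e->next_addr[w]; return 1; }
        if (fallback < 0) fallback = w;
    }
    if (fallback >= 0) { *prefetch = e->next_addr[fallback]; return 1; }
    return 0;
}

int main(void)
{
    /* Train on the short miss sequence A, B, C; then replay A, B. */
    eit_update(0xA0, 0xB0, 0xC0);
    uint64_t p;
    if (eit_lookup(0xA0, 0xB0, &p))
        printf("prefetch 0x%llx\n", (unsigned long long)p);
    return 0;
}
```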


Citations
Proceedings ArticleDOI

Bingo Spatial Data Prefetcher

TL;DR: The Bingo spatial data prefetcher is proposed, in which short and long events are used to select the best access pattern for prefetching, along with a storage-efficient design in which just one history table maintains the association between access patterns and the long and short events. A rough sketch of this selection idea follows below.
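The sketch below illustrates that long/short selection with a single table: the lookup is done with the short event, and a stored long event that also matches is reported as the stronger match. The event definitions, table layout, and footprint encoding are assumptions for illustration, not Bingo's actual design.

```c
/* Illustrative single-table lookup preferring a long-event match over a
 * short-event match; event choices and layout are assumptions. */
#include <stdio.h>
#include <stdint.h>

#define TABLE_SIZE 128

typedef struct {
    uint64_t long_event;    /* e.g., hash of (PC, trigger address) */
    uint64_t short_event;   /* e.g., PC alone                      */
    uint32_t pattern;       /* bit-vector footprint over a region  */
    int      valid;
} HistEntry;

static HistEntry hist[TABLE_SIZE];

static unsigned idx(uint64_t short_ev) { return (unsigned)(short_ev % TABLE_SIZE); }

static void record(uint64_t long_ev, uint64_t short_ev, uint32_t pattern)
{
    HistEntry *e = &hist[idx(short_ev)];
    e->long_event = long_ev; e->short_event = short_ev;
    e->pattern = pattern; e->valid = 1;
}

/* Returns 0 on miss, 1 on a short-event match, 2 on the stronger long match. */
static int lookup(uint64_t long_ev, uint64_t short_ev, uint32_t *pattern)
{
    HistEntry *e = &hist[idx(short_ev)];
    if (!e->valid || e->short_event != short_ev) return 0;
    *pattern = e->pattern;
    return (e->long_event == long_ev) ? 2 : 1;
}

int main(void)
{
    record(0x1234ABCD, 0x1234, 0x0F0F);
    uint32_t pat;
    int kind = lookup(0x1234ABCD, 0x1234, &pat);
    printf("match=%d footprint=0x%x\n", kind, pat);
    return 0;
}
```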
Journal ArticleDOI

DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks

TL;DR: In this paper, the authors perform a large-scale characterization of a wide variety of applications, across a wide range of application domains, to identify fundamental program properties that lead to data movement to/from main memory.
Proceedings ArticleDOI

Bouquet of instruction pointers: instruction pointer classifier-based spatial hardware prefetching

TL;DR: IPCP is a simple, lightweight, and modular framework for L1 and multi-level spatial prefetching that outperforms already high-performing state-of-the-art prefetchers such as SPP with PPF and Bingo while demanding 30X to 50X less storage.
Proceedings ArticleDOI

Enhancing Server Efficiency in the Face of Killer Microseconds

TL;DR: Duplexity is proposed, a heterogeneous server architecture that employs aggressive multithreading to hide the latency of killer microseconds, without sacrificing the Quality-of-Service (QoS) of latency-sensitive microservices.
Proceedings ArticleDOI

A hierarchical neural model of data prefetching

TL;DR: Voyager, as discussed by the authors, proposes a hierarchical structure that separates addresses into pages and offsets and introduces a mechanism for learning important relations among pages and offsets; it can also learn address correlations, which are important for prefetching irregular sequences of memory accesses.
References
Book ChapterDOI

Introduction to Algorithms

Xin-She Yang
TL;DR: This chapter provides an overview of the fundamentals of algorithms and their links to self-organization, exploration, and exploitation.
Proceedings ArticleDOI

Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers

TL;DR: In this article, a hardware technique to improve cache performance is presented, in which a small fully-associative cache between a cache and its refill path is used to hold prefetched data rather than placing it directly in the cache.
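As a rough illustration of that placement idea, the sketch below models a small fully-associative buffer on the refill path: prefetched lines are inserted there instead of into the cache, and a line moves into the cache only when a demand access hits the buffer. The buffer size and FIFO allocation are illustrative assumptions.

```c
/* Minimal model of a fully-associative prefetch buffer on the refill path. */
#include <stdio.h>
#include <stdint.h>

#define BUF_ENTRIES 8

typedef struct {
    uint64_t tag;     /* cache-line address */
    int      valid;
} BufEntry;

static BufEntry buf[BUF_ENTRIES];
static int next_alloc;                    /* FIFO allocation pointer */

/* Place a prefetched line in the buffer instead of the cache. */
static void buffer_insert(uint64_t line_addr)
{
    buf[next_alloc].tag = line_addr;
    buf[next_alloc].valid = 1;
    next_alloc = (next_alloc + 1) % BUF_ENTRIES;
}

/* On a cache miss, probe the buffer before going to the next level.
 * Returns 1 on a buffer hit; the line would then be promoted to the cache. */
static int buffer_probe(uint64_t line_addr)
{
    for (int i = 0; i < BUF_ENTRIES; i++)
        if (buf[i].valid && buf[i].tag == line_addr) {
            buf[i].valid = 0;             /* promoted on demand hit */
            return 1;
        }
    return 0;
}

int main(void)
{
    buffer_insert(0x40);                  /* prefetched line 0x40 */
    printf("demand 0x40: %s\n", buffer_probe(0x40) ? "buffer hit" : "miss");
    printf("demand 0x80: %s\n", buffer_probe(0x80) ? "buffer hit" : "miss");
    return 0;
}
```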
Proceedings ArticleDOI

Clearing the clouds: a study of emerging scale-out workloads on modern hardware

TL;DR: This work identifies the key micro-architectural needs of scale-out workloads, calling for a change in the trajectory of server processors that would lead to improved computational density and power efficiency in data centers.
Proceedings ArticleDOI

Design and evaluation of a compiler algorithm for prefetching

TL;DR: This paper proposes a compiler algorithm to insert prefetch instructions into code that operates on dense matrices, and shows that this algorithm significantly improves the execution speed of the benchmark programs, with some programs improving by as much as a factor of two.
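For a flavor of what such inserted prefetches look like, the hand-written loop below issues prefetches a fixed distance ahead of the current iteration using GCC/Clang's __builtin_prefetch. The prefetch distance and the use of a builtin (rather than compiler-generated instructions) are illustrative choices, not the paper's algorithm.

```c
/* Hand-inserted software prefetching over dense matrices, mimicking what a
 * compiler pass might emit. AHEAD is an illustrative prefetch distance; a
 * real compiler would derive it from loop and memory latencies. */
#include <stdio.h>

#define N     512
#define AHEAD 8            /* prefetch this many elements ahead */

static double a[N][N], b[N][N];

int main(void)
{
    double sum = 0.0;

    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            /* Hint the hardware to fetch data needed a few iterations ahead. */
            if (j + AHEAD < N) {
                __builtin_prefetch(&a[i][j + AHEAD], 0 /* read */, 3);
                __builtin_prefetch(&b[i][j + AHEAD], 0 /* read */, 3);
            }
            sum += a[i][j] * b[i][j];
        }
    }

    printf("sum = %f\n", sum);
    return 0;
}
```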