Open Access · Posted Content
Bounding Cache Miss Costs of Multithreaded Computations Under General Schedulers
Richard Cole, Vijaya Ramachandran, et al.
TLDR
Bounds are obtained that match or improve upon the well-known $O(Q + S \cdot (M/B))$ caching cost for the randomized work stealing (RWS) scheduler.
Abstract
We analyze the caching overhead incurred by a class of multithreaded algorithms when scheduled by an arbitrary scheduler. We obtain bounds that match or improve upon the well-known $O(Q + S \cdot (M/B))$ caching cost for the randomized work stealing (RWS) scheduler, where $S$ is the number of steals, $Q$ is the sequential caching cost, and $M$ and $B$ are the cache size and block (or cache line) size, respectively.
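The abstract's bound can be read as: each steal may force the stolen thread to reload up to a full cache's worth of blocks, adding at most $M/B$ misses per steal on top of the sequential cost $Q$. A minimal sketch of evaluating that bound, with hypothetical parameter values (constant factors are ignored, as in the $O(\cdot)$ notation):

```python
def rws_cache_bound(Q, S, M, B):
    """Evaluate the O(Q + S * (M/B)) caching-cost bound for randomized
    work stealing: Q sequential cache misses, plus up to M/B extra
    misses (one cache's worth of blocks) for each of the S steals.
    Constant factors hidden by the O-notation are omitted."""
    return Q + S * (M // B)

# Hypothetical example: 1,000,000 sequential misses, 100 steals,
# a 1 MiB cache (M = 2**20 bytes) with 64-byte cache lines (B = 64).
print(rws_cache_bound(10**6, 100, 2**20, 64))
```

With these illustrative numbers, each steal contributes at most $2^{20}/64 = 16{,}384$ additional misses, so the steal term can dominate the sequential cost when steals are frequent relative to $Q$.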
Citations
Proceedings Article · DOI
How to Manage High-Bandwidth Memory Automatically
Rathish Das, Kunal Agrawal, Michael A. Bender, Jonathan W. Berry, Benjamin Moseley, Cynthia A. Phillips, et al.
TL;DR: This paper provides theoretical support for automatic HBM management by developing simple algorithms that automatically control HBM and deliver good performance on multicore systems: a priority-based approach that is simple, efficiently implementable, and makespan-competitive when all multicore threads are independent.
References
Book
Introduction to Algorithms, third edition
TL;DR: Pseudocode explanations of the algorithms, coupled with proofs of their correctness, make this book a great resource on the basic tools used to analyze the performance of algorithms.
Journal Article · DOI
The Data Locality of Work Stealing
TL;DR: A locality-guided work-stealing algorithm that improves the data locality of multithreaded computations by allowing a thread to have an affinity for a processor, improving the performance of work stealing by up to 80%.
Journal Article · DOI
A bridging model for multi-core computing
TL;DR: It is suggested that the considerable intellectual effort needed for designing efficient algorithms for multi-core architectures may be most fruitfully expended in designing portable algorithms, once and for all, for such a bridging model.
Journal Article · DOI
Cache-Oblivious Algorithms
TL;DR: It is proved that an optimal cache-oblivious algorithm designed for two levels of memory is also optimal for multiple levels and that the assumption of optimal replacement in the ideal-cache model can be simulated efficiently by LRU replacement.
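The LRU-simulates-ideal-cache result cited above rests on counting misses under LRU replacement. A minimal, illustrative sketch of such a miss count (the trace and capacity are hypothetical; the cited result states that an LRU cache incurs at most a constant factor more misses than an optimal-replacement cache of comparable size):

```python
from collections import OrderedDict

def lru_misses(trace, capacity):
    """Count cache misses for a sequence of block accesses under LRU
    replacement, using an OrderedDict as the recency queue: hits move
    the block to the most-recently-used end; on a miss with a full
    cache, the least-recently-used block is evicted."""
    cache = OrderedDict()
    misses = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)   # hit: mark as most recently used
        else:
            misses += 1                # miss: fetch the block
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict least recently used
            cache[block] = None
    return misses

# Hypothetical trace with a capacity-3 cache: 4 misses (1, 2, 3, 4),
# the repeated accesses to 1 and 2 hit.
print(lru_misses([1, 2, 3, 1, 2, 4, 1], capacity=3))
```

This is only a simulation sketch, not the paper's analysis; it illustrates the replacement policy whose competitiveness the cited result relies on.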
Proceedings Article
Provably good multicore cache performance for divide-and-conquer algorithms
Guy E. Blelloch, Rezaul Chowdhury, Phillip B. Gibbons, Vijaya Ramachandran, Shimin Chen, Michael Kozuch, et al.
TL;DR: It is shown that a separator-based algorithm for sparse-matrix-dense-vector-multiply achieves provably good cache performance in the multicore-cache model, as well as in the well-studied sequential cache-oblivious model.