Open Access · Posted Content
Bounding Cache Miss Costs of Multithreaded Computations Under General Schedulers
Richard Cole, Vijaya Ramachandran, et al.
TLDR
Bounds are obtained that match or improve upon the well-known $O(Q + S \cdot (M/B))$ caching cost for the randomized work stealing (RWS) scheduler.
Abstract
We analyze the caching overhead incurred by a class of multithreaded algorithms when scheduled by an arbitrary scheduler. We obtain bounds that match or improve upon the well-known $O(Q + S \cdot (M/B))$ caching cost for the randomized work stealing (RWS) scheduler, where $S$ is the number of steals, $Q$ is the sequential caching cost, and $M$ and $B$ are the cache size and block (or cache line) size, respectively.
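The abstract's bound can be read as: each steal may force the stolen thread to reload up to a full cache's worth of blocks, adding at most $M/B$ misses per steal on top of the sequential cost $Q$. A minimal sketch of evaluating that bound, with hypothetical parameter values (constant factors are ignored, as in the $O(\cdot)$ notation):

```python
def rws_cache_bound(Q, S, M, B):
    """Evaluate the O(Q + S * (M/B)) caching-cost bound for randomized
    work stealing: Q sequential cache misses, plus up to M/B extra
    misses (one cache's worth of blocks) for each of the S steals.
    Constant factors hidden by the O-notation are omitted."""
    return Q + S * (M // B)

# Hypothetical example: 1,000,000 sequential misses, 100 steals,
# a 1 MiB cache (M = 2**20 bytes) with 64-byte cache lines (B = 64).
print(rws_cache_bound(10**6, 100, 2**20, 64))
```

With these illustrative numbers, each steal contributes at most $2^{20}/64 = 16{,}384$ additional misses, so the steal term can dominate the sequential cost when steals are frequent relative to $Q$.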
Citations
Proceedings Article · DOI
How to Manage High-Bandwidth Memory Automatically
Rathish Das, Kunal Agrawal, Michael A. Bender, Jonathan W. Berry, Benjamin Moseley, Cynthia A. Phillips, et al.
TL;DR: This paper provides theoretical support for automatic HBM management by developing simple algorithms that automatically control HBM and deliver good performance on multicore systems: a priority-based approach that is simple, efficiently implementable, and makespan-competitive when all multicore threads are independent.
References
Book
Introduction to Algorithms, third edition
TL;DR: Pseudocode explanations of the algorithms, coupled with proofs of their correctness, make this book a great resource on the basic tools used to analyze the performance of algorithms.
Journal Article · DOI
The Data Locality of Work Stealing
TL;DR: A locality-guided work-stealing algorithm that improves the data locality of multithreaded computations by allowing a thread to have an affinity for a processor, improving the performance of work stealing by up to 80%.
Journal Article · DOI
A bridging model for multi-core computing
TL;DR: It is suggested that the considerable intellectual effort needed for designing efficient algorithms for multi-core architectures may be most fruitfully expended in designing portable algorithms, once and for all, for such a bridging model.
Journal Article · DOI
Cache-Oblivious Algorithms
TL;DR: It is proved that an optimal cache-oblivious algorithm designed for two levels of memory is also optimal for multiple levels and that the assumption of optimal replacement in the ideal-cache model can be simulated efficiently by LRU replacement.
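The LRU-simulates-ideal-cache result cited above rests on counting misses under LRU replacement. A minimal, illustrative sketch of such a miss count (the trace and capacity are hypothetical; the cited result states that an LRU cache incurs at most a constant factor more misses than an optimal-replacement cache of comparable size):

```python
from collections import OrderedDict

def lru_misses(trace, capacity):
    """Count cache misses for a sequence of block accesses under LRU
    replacement, using an OrderedDict as the recency queue: hits move
    the block to the most-recently-used end; on a miss with a full
    cache, the least-recently-used block is evicted."""
    cache = OrderedDict()
    misses = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)   # hit: mark as most recently used
        else:
            misses += 1                # miss: fetch the block
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict least recently used
            cache[block] = None
    return misses

# Hypothetical trace with a capacity-3 cache: 4 misses (1, 2, 3, 4),
# the repeated accesses to 1 and 2 hit.
print(lru_misses([1, 2, 3, 1, 2, 4, 1], capacity=3))
```

This is only a simulation sketch, not the paper's analysis; it illustrates the replacement policy whose competitiveness the cited result relies on.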
Proceedings Article
Provably good multicore cache performance for divide-and-conquer algorithms
Guy E. Blelloch, Rezaul Chowdhury, Phillip B. Gibbons, Vijaya Ramachandran, Shimin Chen, Michael Kozuch, et al.
TL;DR: It is shown that a separator-based algorithm for sparse-matrix-dense-vector-multiply achieves provably good cache performance in the multicore-cache model, as well as in the well-studied sequential cache-oblivious model.