scispace - formally typeset
Open Access · Posted Content

Bounding Cache Miss Costs of Multithreaded Computations Under General Schedulers

TLDR
Bounds are obtained that match or improve upon the well-known O(Q + S · (M/B)) caching cost bound for the randomized work stealing (RWS) scheduler.
Abstract
We analyze the caching overhead incurred by a class of multithreaded algorithms when scheduled by an arbitrary scheduler. We obtain bounds that match or improve upon the well-known $O(Q+S \cdot (M/B))$ caching cost for the randomized work stealing (RWS) scheduler, where $S$ is the number of steals, $Q$ is the sequential caching cost, and $M$ and $B$ are the cache size and block (or cache line) size respectively.
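As an illustrative sketch (not from the paper), the RWS bound can be written as a small helper that makes the roles of the parameters concrete; the function name and the constant `c` hiding the big-O factor are assumptions for illustration:

```python
def rws_cache_cost_bound(Q, S, M, B, c=1.0):
    """Upper bound (up to the constant c) on the caching cost of a
    multithreaded computation under randomized work stealing (RWS):
    O(Q + S * (M / B)), where
      Q -- sequential caching cost (misses of the one-processor run),
      S -- number of steals,
      M -- cache size,
      B -- block (cache line) size."""
    return c * (Q + S * (M / B))

# Each steal contributes up to M/B extra misses to the bound, so the
# steal-related term scales linearly with the number of steals S.
cost = rws_cache_cost_bound(Q=10**6, S=100, M=2**15, B=64)
```

The sketch shows why reducing steals (or shrinking M/B) tightens the bound toward the sequential cost Q.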


Citations
Proceedings Article · DOI

How to Manage High-Bandwidth Memory Automatically

TL;DR: This paper provides theoretical support for automatic HBM management by developing simple algorithms that automatically control HBM and deliver good performance on multicore systems: a priority-based approach that is simple, efficiently implementable, and makespan-competitive when all multicore threads are independent.
References
Book

Introduction to Algorithms, third edition

TL;DR: Pseudo-code explanations of the algorithms, coupled with proofs of their correctness, make this book a great resource on the basic tools used to analyze the performance of algorithms.
Journal Article · DOI

The Data Locality of Work Stealing

TL;DR: A locality-guided work-stealing algorithm improves the data locality of multithreaded computations by allowing a thread to have an affinity for a processor, improving the performance of work stealing by up to 80%.
Journal Article · DOI

A bridging model for multi-core computing

TL;DR: It is suggested that the considerable intellectual effort needed for designing efficient algorithms for multi-core architectures may be most fruitfully expended in designing portable algorithms, once and for all, for such a bridging model.
Journal Article · DOI

Cache-Oblivious Algorithms

TL;DR: It is proved that an optimal cache-oblivious algorithm designed for two levels of memory is also optimal for multiple levels and that the assumption of optimal replacement in the ideal-cache model can be simulated efficiently by LRU replacement.
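The LRU replacement policy referenced in this result can be illustrated with a minimal miss-counting simulator; this sketch is mine, not from the paper, and `lru_misses` is a hypothetical name (capacity corresponds to the number of blocks M/B in the ideal-cache model):

```python
from collections import OrderedDict

def lru_misses(trace, capacity):
    """Count cache misses for a block-access trace under LRU replacement.
    `capacity` is the number of blocks the cache holds."""
    cache = OrderedDict()
    misses = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)  # mark as most recently used
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict least recently used
            cache[block] = block
    return misses

# Streaming through 5 distinct blocks with capacity 2: every access misses.
lru_misses([0, 1, 2, 3, 4], capacity=2)
```

Simulators like this are how LRU's behavior is compared empirically against the optimal-replacement assumption of the ideal-cache model.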
Proceedings Article

Provably good multicore cache performance for divide-and-conquer algorithms

TL;DR: It is shown that a separator-based algorithm for sparse-matrix dense-vector multiply achieves provably good cache performance in the multicore-cache model, as well as in the well-studied sequential cache-oblivious model.