Proceedings ArticleDOI
Low depth cache-oblivious algorithms
Guy E. Blelloch, Phillip B. Gibbons, Harsha Vardhan Simhadri +2 more
- pp. 189-199
TL;DR: This paper describes several cache-oblivious algorithms with optimal work, polylogarithmic depth, and sequential cache complexities that match the best sequential algorithms, including the first such algorithms for sorting and for sparse-matrix vector multiply on matrices with good vertex separators.
Abstract: In this paper we explore a simple and general approach for developing parallel algorithms that lead to good cache complexity on parallel machines with private or shared caches. The approach is to design nested-parallel algorithms that have low depth (span, critical path length) and for which the natural sequential evaluation order has low cache complexity in the cache-oblivious model. We describe several cache-oblivious algorithms with optimal work, polylogarithmic depth, and sequential cache complexities that match the best sequential algorithms, including the first such algorithms for sorting and for sparse-matrix vector multiply on matrices with good vertex separators. Using known mappings, our results lead to low cache complexities on shared-memory multiprocessors with a single level of private caches or a single shared cache. We generalize these mappings to multi-level cache hierarchies of private or shared caches, implying that our algorithms also have low cache complexities on such hierarchies. The key factor in obtaining these low parallel cache complexities is the low depth of the algorithms we propose.
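The approach described in the abstract can be illustrated with a minimal sketch (not taken from the paper): a nested-parallel divide-and-conquer reduction whose natural sequential evaluation order scans the array contiguously, giving a cache-oblivious miss bound of O(n/B), while the two independent recursive calls give a parallel depth of O(log n). The function name and structure here are illustrative assumptions, not the paper's algorithms.

```python
def reduce_sum(a, lo, hi):
    """Divide-and-conquer sum over a[lo:hi].

    The two recursive calls are independent, so in a nested-parallel
    setting they could be forked, yielding depth O(log n). Evaluated
    sequentially, the recursion visits the array strictly left to
    right, so each cache block of B elements is loaded only once:
    O(n/B) misses in the cache-oblivious model.
    """
    if hi - lo == 0:
        return 0
    if hi - lo == 1:
        return a[lo]
    mid = (lo + hi) // 2
    # These two calls would be spawned in parallel by a scheduler.
    left = reduce_sum(a, lo, mid)
    right = reduce_sum(a, mid, hi)
    return left + right

print(reduce_sum(list(range(10)), 0, 10))  # 45
```

The same pattern (low depth plus a cache-friendly sequential order) is what the paper's mappings exploit to bound parallel cache complexity on hierarchies of private or shared caches.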
Citations
Proceedings ArticleDOI
Ligra: a lightweight graph processing framework for shared memory
Julian Shun, Guy E. Blelloch +1 more
TL;DR: This paper presents a lightweight graph processing framework specific to shared-memory parallel/multicore machines, which makes graph traversal algorithms easy to write and significantly more efficient than previously reported results using graph frameworks on machines with many more cores.
Proceedings ArticleDOI
Multicore triangle computations without tuning
Julian Shun, Kanat Tangwongsan +1 more
TL;DR: This paper describes the design and implementation of simple and fast multicore parallel algorithms for exact, as well as approximate, triangle counting and other triangle computations that scale to billions of nodes and edges, and is much faster than existing parallel approximate triangle counting implementations.
Proceedings ArticleDOI
Internally deterministic parallel algorithms can be fast
TL;DR: The main contribution is to demonstrate that for this wide body of problems, there exist efficient internally deterministic algorithms, and moreover that these algorithms are natural to reason about and not complicated to code.
Journal ArticleDOI
Can traditional programming bridge the Ninja performance gap for parallel computing applications
Nadathur Satish, Changkyu Kim, Jatin Chhugani, Hideki Saito, Rakesh Krishnaiyer, Mikhail Smelyanskiy, Milind B. Girkar, Pradeep Dubey +7 more
TL;DR: It is demonstrated that the otherwise uncontrolled growth of the Ninja gap can be contained and offer a more stable and predictable performance growth over future architectures, offering strong evidence that radical language changes are not required.
Proceedings ArticleDOI
Scheduling irregular parallel computations on hierarchical caches
TL;DR: The parallel cache-oblivious (PCO) model is presented, a relatively simple modification to the CO model that can be used to account for costs on a broad range of cache hierarchies, and a new scheduler is described, which attains provably good cache performance and runtime on parallel machine models with hierarchical caches.
References
Proceedings ArticleDOI
Thread scheduling for multiprogrammed multiprocessors
TL;DR: A user-level thread scheduler for shared-memory multiprocessors, which achieves linear speedup whenever P is small relative to the parallelism T1/T∞.
Proceedings ArticleDOI
External-memory graph algorithms
Yi-Jen Chiang, Michael T. Goodrich, Edward F. Grove, Roberto Tamassia, Darren Erik Vengroff, Jeffrey Scott Vitter +5 more
TL;DR: A collection of new techniques for designing and analyzing external-memory algorithms for graph problems, illustrating how these techniques can be applied to a wide variety of specific problems.
Proceedings ArticleDOI
A model for hierarchical memory
TL;DR: An algorithm that uses LRU policy at the successive “levels” of the memory hierarchy is shown to be optimal for arbitrary memory access time.
Proceedings ArticleDOI
The data locality of work stealing
TL;DR: Initial experiments on iterative data-parallel applications show that the work-stealing scheduling algorithm matches the performance of static partitioning under traditional workloads and improves performance by up to 50% over static partitioning under multiprogrammed workloads. The paper also presents a locality-guided work-stealing algorithm that improves the data locality of multithreaded computations by allowing a thread to have an affinity for a processor.