Low depth cache-oblivious algorithms

doi:10.1145/1810479.1810519

Proceedings Article


Guy E. Blelloch, +2 more

pp. 189-199


TLDR

This paper describes several cache-oblivious algorithms with optimal work, polylogarithmic depth, and sequential cache complexities that match the best sequential algorithms, including the first such algorithms for sorting and for sparse-matrix vector multiply on matrices with good vertex separators.
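The three measures named in the TLDR (work, depth, and sequential cache complexity) can be made concrete with a toy example. The sketch below is not one of the paper's algorithms. It evaluates a hypothetical divide-and-conquer sum, `dc_sum`, whose two recursive calls are logically parallel (a fork-join pair), in the natural sequential order (left child, then right child). It counts work, depth, and cache misses under an assumed LRU cache of `M` words with `B`-word blocks. `LRUBlockCache` is also a hypothetical helper, and LRU is used only as a stand-in for the ideal cache assumed by the cache-oblivious model.

```python
from collections import OrderedDict


class LRUBlockCache:
    """Hypothetical cache model: M words split into blocks of B words.

    LRU replacement is an assumption standing in for the ideal
    (optimal-replacement) cache of the cache-oblivious model.
    """

    def __init__(self, M, B):
        self.B = B
        self.capacity = M // B          # number of blocks that fit in the cache
        self.blocks = OrderedDict()     # block id -> None, ordered by recency
        self.misses = 0

    def access(self, addr):
        blk = addr // self.B
        if blk in self.blocks:
            self.blocks.move_to_end(blk)        # hit: mark most recently used
        else:
            self.misses += 1                    # miss: load the block
            self.blocks[blk] = None
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)  # evict least recently used


def dc_sum(a, lo, hi, cache):
    """Divide-and-conquer sum of a[lo:hi] (requires hi > lo).

    The two recursive calls are logically parallel; here they run in the
    natural sequential order (left, then right). Returns (value, work, depth).
    """
    if hi - lo == 1:
        cache.access(lo)                 # read one element
        return a[lo], 1, 1
    mid = (lo + hi) // 2
    lv, lw, ld = dc_sum(a, lo, mid, cache)   # forked child 1
    rv, rw, rd = dc_sum(a, mid, hi, cache)   # forked child 2
    # Join: work adds across branches; depth is the longer branch plus the combine step.
    return lv + rv, lw + rw + 1, max(ld, rd) + 1


if __name__ == "__main__":
    n, M, B = 1024, 64, 8
    cache = LRUBlockCache(M, B)
    value, work, depth = dc_sum(list(range(n)), 0, n, cache)
    print(value, work, depth, cache.misses)   # 523776 2047 11 128
```

Here work is 2n - 1, depth is log2(n) + 1, and, starting from a cold cache, the left-to-right order touches each block exactly once, so the sequential miss count is n/B. Low depth and a scan-like sequential order are the two properties the approach asks for, although the paper's sorting and sparse-matrix algorithms are far more involved than this reduction.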

Abstract:

In this paper we explore a simple and general approach for developing parallel algorithms that lead to good cache complexity on parallel machines with private or shared caches. The approach is to design nested-parallel algorithms that have low depth (span, critical path length) and for which the natural sequential evaluation order has low cache complexity in the cache-oblivious model. We describe several cache-oblivious algorithms with optimal work, polylogarithmic depth, and sequential cache complexities that match the best sequential algorithms, including the first such algorithms for sorting and for sparse-matrix vector multiply on matrices with good vertex separators.

Using known mappings, our results lead to low cache complexities on shared-memory multiprocessors with a single level of private caches or a single shared cache. We generalize these mappings to multi-level cache hierarchies of private or shared caches, implying that our algorithms also have low cache complexities on such hierarchies. The key factor in obtaining these low parallel cache complexities is the low depth of the algorithms we propose.
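The abstract does not spell out the "known mappings". For orientation only, they are commonly stated in roughly the following form. This is background from the scheduling literature, not reproduced from this page, and constants and probabilistic qualifiers vary between sources. Here Q_1 is the sequential cache complexity, Q_p the total misses on p processors, D the depth, M the cache size, and B the block size.

```latex
% Private caches of size M each, work-stealing scheduler (with high probability):
Q_p(n; M, B) \le Q_1(n; M, B) + O\!\left(\frac{p\, M\, D}{B}\right)

% One shared cache, parallel depth-first scheduler, enlarged by pBD words:
Q_p(n; M + pBD, B) \le Q_1(n; M, B)
```

In both forms the cost over the sequential bound grows with D. For example, with D = O(log^2 n) the private-cache overhead is O(pM log^2 n / B), and the extra shared-cache space is O(pB log^2 n). This is why polylogarithmic depth is the key property.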

