Toward Efficient Architecture-Independent Algorithms for Dynamic Programs

doi:10.1007/978-3-030-20656-7_8

Book Chapter

pp. 143-164

TLDR

It is shown that a class of grid-based parallel recursive divide-and-conquer algorithms can be run with provably optimal or near-optimal performance bounds on fat cores (cache complexity), thin cores (data movements), and purely distributed-memory machines (communication complexity) without changing the algorithm’s basic structure.

Abstract:

We argue that the recursive divide-and-conquer paradigm is highly suited for designing algorithms to run efficiently under both shared-memory (multi- and manycores) and distributed-memory settings. The depth-first recursive decomposition of tasks and data is known to allow computations with potentially high temporal locality, and automatic adaptivity when resource availability (e.g., available space in shared caches) changes during runtime. Higher data locality leads to better intra-node I/O and cache performance and lower inter-node communication complexity, which in turn can reduce running times and energy consumption. Indeed, we show that a class of grid-based parallel recursive divide-and-conquer algorithms (for dynamic programs) can be run with provably optimal or near-optimal performance bounds on fat cores (cache complexity), thin cores (data movements), and purely distributed-memory machines (communication complexity) without changing the algorithm’s basic structure.
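To make the idea of a grid-based recursive divide-and-conquer dynamic program concrete, here is a minimal sketch. It assumes Floyd-Warshall all-pairs shortest paths as the dynamic program and a square distance matrix whose side is a power of two; both are illustrative choices, not details taken from the abstract. The function names (`floyd_warshall_rdc`, `floyd_warshall_loops`) are hypothetical. Each call splits its quadrant of the grid into four sub-quadrants and handles the intermediate-vertex range in two halves, a forward pass and a backward pass, so that small subproblems are finished depth-first while their data is still likely to be in cache. It is sequential and is not the chapter's implementation.

```python
import math

INF = math.inf


def floyd_warshall_rdc(dist):
    """In-place recursive divide-and-conquer Floyd-Warshall (illustrative sketch).

    Assumes dist is an n x n list of lists, n a power of two,
    with no negative cycles.
    """
    n = len(dist)
    assert n > 0 and n & (n - 1) == 0, "sketch assumes n is a power of two"

    def solve(xi, xj, k0, size):
        # Updates the size x size block with top-left corner (xi, xj)
        # using intermediate vertices k0 .. k0 + size - 1.
        if size == 1:
            via = dist[xi][k0] + dist[k0][xj]
            if via < dist[xi][xj]:
                dist[xi][xj] = via
            return
        h = size // 2
        # Forward pass: first half of the intermediate vertices.
        solve(xi, xj, k0, h)
        solve(xi, xj + h, k0, h)
        solve(xi + h, xj, k0, h)
        solve(xi + h, xj + h, k0, h)
        # Backward pass: second half, quadrants visited in reverse order.
        solve(xi + h, xj + h, k0 + h, h)
        solve(xi + h, xj, k0 + h, h)
        solve(xi, xj + h, k0 + h, h)
        solve(xi, xj, k0 + h, h)

    solve(0, 0, 0, n)
    return dist


def floyd_warshall_loops(dist):
    """Standard triple-loop Floyd-Warshall, used here only as a reference."""
    n = len(dist)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via = dist[i][k] + dist[k][j]
                if via < dist[i][j]:
                    dist[i][j] = via
    return dist


if __name__ == "__main__":
    graph = [
        [0, 3, INF, 7],
        [8, 0, 2, INF],
        [5, INF, 0, 1],
        [2, INF, INF, 0],
    ]
    for row in floyd_warshall_rdc([r[:] for r in graph]):
        print(row)
```

The recursion never refers to a cache size or a processor count, which is the property the abstract relies on: the same structure can be analyzed under different machine models. A practical version would switch to an iterative kernel below some block size and run independent sub-quadrant calls in parallel.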

Citations

Parallel Divide-and-Conquer Algorithms for Bubble Sort, Selection Sort and Insertion Sort

Generalized Flow-Graph Programming Using Template Task-Graphs: Initial Implementation and Assessment

FOURST: A code generator for FFT-based fast stencil computations

An Algorithm for the Sequence Alignment with Gap Penalty Problem using Multiway Divide-and-Conquer and Matrix Transposition
