Book Chapter

Toward Efficient Architecture-Independent Algorithms for Dynamic Programs

Abstract
We argue that the recursive divide-and-conquer paradigm is well suited to designing algorithms that run efficiently in both shared-memory (multi- and manycore) and distributed-memory settings. The depth-first recursive decomposition of tasks and data is known to allow computations with potentially high temporal locality and to adapt automatically when resource availability (e.g., available space in shared caches) changes at runtime. Higher data locality leads to better intra-node I/O and cache performance and lower inter-node communication complexity, which in turn can reduce running times and energy consumption. Indeed, we show that a class of grid-based parallel recursive divide-and-conquer algorithms (for dynamic programs) can be run with provably optimal or near-optimal performance bounds on fat cores (cache complexity), thin cores (data movements), and purely distributed-memory machines (communication complexity) without changing the algorithm’s basic structure.
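
As a rough illustration of the kind of grid-based recursive divide-and-conquer computation the abstract refers to, the sketch below applies a 2-way recursive decomposition to the Floyd-Warshall all-pairs shortest-path dynamic program. It is not taken from the chapter: the function names `rdc_floyd_warshall` and `solve` are hypothetical, the recursion order follows a well-known cache-oblivious formulation of this problem, the input size is assumed to be a power of two, and the code is serial (a parallel version would run the independent quadrant updates within each pass concurrently).

```python
INF = float("inf")  # marker for "no edge"; an assumption of this sketch


def rdc_floyd_warshall(dist):
    """Update `dist` in place to all-pairs shortest-path lengths.

    `dist` is a square list of lists; dist[i][j] is the weight of edge
    i -> j, INF if absent, and 0 on the diagonal. The side length must
    be a power of two (an assumption made to keep the recursion simple).
    """
    n = len(dist)
    if n < 1 or n & (n - 1):
        raise ValueError("matrix side must be a power of two")

    def solve(i0, j0, k0, size):
        # Relax every dist[i][j] through intermediates k, where i, j, k
        # each range over a block of length `size` starting at i0, j0, k0.
        if size == 1:
            via = dist[i0][k0] + dist[k0][j0]
            if via < dist[i0][j0]:
                dist[i0][j0] = via
            return
        h = size // 2
        # Forward pass: first half of the intermediate vertices.
        solve(i0, j0, k0, h)
        solve(i0, j0 + h, k0, h)
        solve(i0 + h, j0, k0, h)
        solve(i0 + h, j0 + h, k0, h)
        # Backward pass: second half, quadrants visited in reverse order.
        solve(i0 + h, j0 + h, k0 + h, h)
        solve(i0 + h, j0, k0 + h, h)
        solve(i0, j0 + h, k0 + h, h)
        solve(i0, j0, k0 + h, h)

    solve(0, 0, 0, n)
    return dist
```

Because each recursive call touches only three sub-blocks of the matrix, the working set shrinks with the recursion depth until it fits in whatever cache level is available, without the code naming a cache size. That is the locality property the abstract attributes to depth-first recursive decomposition.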
