Book Chapter
Toward Efficient Architecture-Independent Algorithms for Dynamic Programs
Mohammad Mahdi Javanmard, Pramod Ganapathi, Rathish Das, Zafar Ahmad, Stephen Tschudi, Rezaul Chowdhury
- pp 143-164
TLDR
It is shown that a class of grid-based parallel recursive divide-and-conquer algorithms can be run with provably optimal or near-optimal performance bounds on fat cores (cache complexity), thin cores (data movements), and purely distributed-memory machines (communication complexity) without changing the algorithm’s basic structure.
Abstract:
We argue that the recursive divide-and-conquer paradigm is highly suited for designing algorithms to run efficiently under both shared-memory (multi- and manycores) and distributed-memory settings. The depth-first recursive decomposition of tasks and data is known to allow computations with potentially high temporal locality, and automatic adaptivity when resource availability (e.g., available space in shared caches) changes during runtime. Higher data locality leads to better intra-node I/O and cache performance and lower inter-node communication complexity, which in turn can reduce running times and energy consumption. Indeed, we show that a class of grid-based parallel recursive divide-and-conquer algorithms (for dynamic programs) can be run with provably optimal or near-optimal performance bounds on fat cores (cache complexity), thin cores (data movements), and purely distributed-memory machines (communication complexity) without changing the algorithm’s basic structure.
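As a rough illustration of the kind of recursive divide-and-conquer dynamic program the abstract refers to, the sketch below applies the well-known recursive formulation of Floyd-Warshall all-pairs shortest paths to a distance matrix. It is not taken from the chapter: the function names `fw_recursive` and `fw_iterative` are hypothetical, the code is serial, and it assumes the matrix side is a power of two. Each call works on one quadrant at a time, which is where the temporal locality comes from.

```python
import math

INF = math.inf  # marks "no edge"


def fw_recursive(d, i0=0, j0=0, k0=0, n=None):
    """Recursive divide-and-conquer Floyd-Warshall (serial sketch).

    Updates d[i][j] = min(d[i][j], d[i][k] + d[k][j]) for
    i in [i0, i0+n), j in [j0, j0+n), k in [k0, k0+n).
    Assumption: len(d) is a power of two.
    """
    if n is None:
        n = len(d)
    if n == 1:
        via = d[i0][k0] + d[k0][j0]
        if via < d[i0][j0]:
            d[i0][j0] = via
        return
    h = n // 2
    # First half of the k-range: visit quadrants in forward order.
    fw_recursive(d, i0, j0, k0, h)
    fw_recursive(d, i0, j0 + h, k0, h)
    fw_recursive(d, i0 + h, j0, k0, h)
    fw_recursive(d, i0 + h, j0 + h, k0, h)
    # Second half of the k-range: visit quadrants in reverse order.
    fw_recursive(d, i0 + h, j0 + h, k0 + h, h)
    fw_recursive(d, i0 + h, j0, k0 + h, h)
    fw_recursive(d, i0, j0 + h, k0 + h, h)
    fw_recursive(d, i0, j0, k0 + h, h)


def fw_iterative(d):
    """Standard triple-loop Floyd-Warshall, used only as a reference."""
    n = len(d)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


if __name__ == "__main__":
    graph = [
        [0, 3, INF, 7],
        [8, 0, 2, INF],
        [5, INF, 0, 1],
        [2, INF, INF, 0],
    ]
    dist = [row[:] for row in graph]
    fw_recursive(dist)
    for row in dist:
        print(row)
```

A practical version would stop the recursion at a larger base-case tile and run independent quadrant calls in parallel; both are omitted here to keep the recursive structure visible.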
Citations
Journal Article
Parallel Divide-and-Conquer Algorithms for Bubble Sort, Selection Sort and Insertion Sort
TL;DR: This work presents efficient parallel recursive divide-and-conquer algorithms for bubble sort, selection sort, and insertion sort that have excellent data locality and are highly parallel.
Proceedings Article
Generalized Flow-Graph Programming Using Template Task-Graphs: Initial Implementation and Assessment
J.A. Schuchart, Poornima Nookala, Mohammad Mahdi Javanmard, Thomas Herault, Edward F. Valeev, George Bosilca, Robert W. Harrison
TL;DR: It is demonstrated how TTG can address issues without sacrificing scalability or programmability by providing higher-level abstractions than conventionally provided by task-centric programming systems, without impeding the ability of these runtimes to manage task creation and execution as well as data and resource management efficiently.
Proceedings Article
FOURST: A code generator for FFT-based fast stencil computations
Zafar Nazir Ahmad, Mohammad Mahdi Javanmard, Gregory Croisdale, Aaron Gregory, Pramod Ganapathi, Louis-Noël Pouchet, Rezaul Chowdhury
TL;DR: This paper outlines the design and implementation of the code generation approach in FOURST, to automatically generate FFT-based stencil solvers and discusses the performance profiles, and limitations, of both approaches on high-end modern hardware.
Journal Article
An Algorithm for the Sequence Alignment with Gap Penalty Problem using Multiway Divide-and-Conquer and Matrix Transposition
TL;DR: In this paper, the authors present a cache-efficient parallel algorithm for the sequence alignment with gap penalty problem for shared-memory machines using multiway divide-and-conquer and not-in-place matrix transposition.
References
Journal Article
XSEDE: Accelerating Scientific Discovery
John Towns, Timothy M. Cockerill, Maytal Dahan, Ian Foster, Kelly Gaither, Andrew S. Grimshaw, Victor Hazlewood, Scott Lathrop, David Lifka, Gregory D. Peterson, Ralph Roskies, J. Ray Scott, Nancy Wilkins-Diehr
TL;DR: XSEDE's integrated, comprehensive suite of advanced digital services federates with other high-end facilities and with campus-based resources, serving as the foundation for a national e-science infrastructure ecosystem.
Book
Introduction to computational biology
Book Chapter
Accelerating large graph algorithms on the GPU using CUDA
Pawan Harish, P. J. Narayanan
TL;DR: This work presents a few fundamental algorithms - including breadth first search, single source shortest path, and all-pairs shortest path - using CUDA on large graphs using the G80 line of Nvidia GPUs.
Journal Article
Parallel Matrix and Graph Algorithms
TL;DR: It is shown that many graph problems can be solved efficiently using the matrix multiplication algorithms.