Showing papers by "Rezaul Chowdhury" published in 2008

Open Access

Proceedings Article

Provably good multicore cache performance for divide-and-conquer algorithms

Guy E. Blelloch¹, Rezaul Chowdhury², Phillip B. Gibbons³, Vijaya Ramachandran², Shimin Chen³, Michael Kozuch³•Institutions (3)

Carnegie Mellon University¹, University of Texas at Austin², Intel³

20 Jan 2008

TL;DR: It is shown that a separator-based algorithm for sparse-matrix-dense-vector-multiply achieves provably good cache performance in the multicore-cache model, as well as in the well-studied sequential cache-oblivious model.

Abstract: This paper presents a multicore-cache model that reflects the reality that multicore processors have both per-processor private (L1) caches and a large shared (L2) cache on chip. We consider a broad class of parallel divide-and-conquer algorithms and present a new on-line scheduler, CONTROLLED-PDF, that is competitive with the standard sequential scheduler in the following sense. Given any dynamically unfolding computation DAG from this class of algorithms, the cache complexity on the multicore-cache model under our new scheduler is within a constant factor of the sequential cache complexity for both L1 and L2, while the time complexity is within a constant factor of the sequential time complexity divided by the number of processors p. These are the first such asymptotically-optimal results for any multicore model. Finally, we show that a separator-based algorithm for sparse-matrix-dense-vector-multiply achieves provably good cache performance in the multicore-cache model, as well as in the well-studied sequential cache-oblivious model.
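For readers unfamiliar with the operation named in the abstract above, here is a minimal sketch of a sparse-matrix-dense-vector multiply over a matrix stored in compressed sparse row (CSR) form. It only illustrates the kernel itself; the separator-based ordering and the CONTROLLED-PDF scheduler from the paper are not shown. The function name `spmv_csr` and its parameter names are assumptions made for this example.

```python
def spmv_csr(values, col_index, row_ptr, x):
    """Compute y = A @ x for a sparse matrix A in CSR form (hypothetical helper).

    values:    non-zero entries of A, listed row by row
    col_index: column of each entry in values
    row_ptr:   row r owns entries row_ptr[r] .. row_ptr[r + 1] - 1
    x:         dense input vector
    """
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # x is read at whatever column the non-zero sits in, so the
            # access pattern into x depends on how A is laid out.
            y[row] += values[k] * x[col_index[k]]
    return y
```

The reads of `x[col_index[k]]` are the irregular accesses in this kernel, which is why the ordering of rows and columns matters for cache behavior.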

127 citations

Journal Article

Oracles for Distances Avoiding a Failed Node or Link

Camil Demetrescu, Mikkel Thorup¹, Rezaul Chowdhury², Vijaya Ramachandran²•Institutions (2)

AT&T¹, University of Texas at Austin²

01 Jan 2008 - SIAM Journal on Computing

TL;DR: A deterministic oracle with constant query time is described for shortest-path queries that avoid a given failed vertex or edge of a graph $G$; it uses $O(n^2\log n)$ space, where $n$ is the number of vertices in $G$, and its construction time is $O(mn^{2} + n^{3}\log n)$.

Abstract: We consider the problem of preprocessing an edge-weighted directed graph $G$ to answer queries that ask for the length and first hop of a shortest path from any given vertex $x$ to any given vertex $y$ avoiding any given vertex or edge. As a natural application, this problem models routing in networks subject to node or link failures. We describe a deterministic oracle with constant query time for this problem that uses $O(n^2\log n)$ space, where $n$ is the number of vertices in $G$. The construction time for our oracle is $O(mn^{2} + n^{3}\log n)$. However, if one is willing to settle for $\Theta (n^{2.5})$ space, we can improve the preprocessing time to $O(mn^{1.5}+n^{2.5}\log n)$ while maintaining the constant query time. Our algorithms can find the shortest path avoiding a failed node or link in time proportional to the length of the path.
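To make the query in the abstract above concrete, here is a brute-force baseline, not the paper's oracle: it reruns Dijkstra's algorithm with the failed vertex or edge skipped and returns the length and first hop of the resulting path. The function name `distance_avoiding` and the adjacency-dict graph format are assumptions made for this example, and edge weights are assumed non-negative.

```python
import heapq

def distance_avoiding(graph, x, y, failed_vertex=None, failed_edge=None):
    """Return (length, first_hop) of a shortest x -> y path that avoids
    failed_vertex and failed_edge, or (inf, None) if no such path exists.

    graph: dict mapping u -> list of (v, weight), weights non-negative.
    failed_edge: a directed pair (u, v), or None.
    """
    if x == failed_vertex or y == failed_vertex:
        return float("inf"), None
    dist = {x: 0}
    first_hop = {x: None}
    heap = [(0, x)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        if u == y:
            return d, first_hop[u]
        for v, w in graph.get(u, []):
            if v == failed_vertex or (u, v) == failed_edge:
                continue  # pretend the failed node or link is absent
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                # The first hop is v itself when leaving x, else inherited.
                first_hop[v] = v if u == x else first_hop[u]
                heapq.heappush(heap, (nd, v))
    return float("inf"), None
```

Each call here costs a full shortest-path computation, whereas the oracle described in the abstract answers the same query in constant time after preprocessing.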

120 citations

Proceedings Article

Cache-efficient dynamic programming algorithms for multicores

Rezaul Chowdhury¹, Vijaya Ramachandran¹•Institutions (1)

University of Texas at Austin¹

14 Jun 2008

TL;DR: This work develops a generic CMP algorithm with an associated tiling sequence and provides a parallel schedule that results in a cache-efficient parallel execution up to the critical path length of the underlying dynamic programming algorithm.

Abstract: We present cache-efficient chip multiprocessor (CMP) algorithms with good speed-up for some widely used dynamic programming algorithms. We consider three types of caching systems for CMPs: D-CMP with a private cache for each core, S-CMP with a single cache shared by all cores, and Multicore, which has private L1 caches and a shared L2 cache. We derive results for three classes of problems: local dependency dynamic programming (LDDP), Gaussian Elimination Paradigm (GEP), and the parenthesis problem. For each class of problems, we develop a generic CMP algorithm with an associated tiling sequence. We then tailor this tiling sequence to each caching model and provide a parallel schedule that results in a cache-efficient parallel execution up to the critical path length of the underlying dynamic programming algorithm. We present experimental results on an 8-core Opteron for two sequence alignment problems that are important examples of LDDP. Our experimental results show good speed-ups for simple versions of our algorithms.
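As a rough illustration of tiling a sequence-alignment style recurrence, the sketch below computes the longest common subsequence (LCS) length by filling the DP table one square tile at a time, visiting tiles along anti-diagonals. This is a sequential sketch under stated assumptions, not the paper's tiling sequence or schedule: LCS is chosen here as a representative alignment recurrence, and the function name `lcs_length_tiled` and the `tile` parameter are made up for the example.

```python
def lcs_length_tiled(a, b, tile=2):
    """LCS length of sequences a and b, filling the DP table tile by tile."""
    n, m = len(a), len(b)
    # table[i][j] = LCS length of a[:i] and b[:j]; row 0 and column 0 stay 0.
    table = [[0] * (m + 1) for _ in range(n + 1)]
    tiles_i = (n + tile - 1) // tile
    tiles_j = (m + tile - 1) // tile
    # Tiles with the same ti + tj do not depend on each other, so a parallel
    # schedule could hand the tiles of one anti-diagonal to different cores.
    for diag in range(tiles_i + tiles_j - 1):
        for ti in range(max(0, diag - tiles_j + 1), min(tiles_i, diag + 1)):
            tj = diag - ti
            for i in range(ti * tile + 1, min((ti + 1) * tile, n) + 1):
                for j in range(tj * tile + 1, min((tj + 1) * tile, m) + 1):
                    # Each cell reads only its left, upper, and upper-left
                    # neighbors, all of which are already filled.
                    if a[i - 1] == b[j - 1]:
                        table[i][j] = table[i - 1][j - 1] + 1
                    else:
                        table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]
```

The result does not depend on `tile`; only the order in which cells are visited changes, which is what a cache-aware schedule would tune.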

86 citations

Equivalence Between Priority Queues and Sorting.

Rezaul Chowdhury

01 Jan 2008

17 citations