Author

Rezaul Chowdhury

Bio: Rezaul Chowdhury is an academic researcher from Stony Brook University. His research topics include caching and cache-oblivious algorithms. He has an h-index of 17 and has co-authored 83 publications receiving 1,419 citations. His previous affiliations include the University of Texas at Austin and Boston University.


Papers
Journal ArticleDOI
TL;DR: The 'Dynamic Packing Grid' (DPG), a neighborhood data structure for maintaining and manipulating flexible molecules and assemblies, is presented for efficient computation of binding affinities in drug design or in molecular dynamics calculations.
Abstract: Motivation: We present the ‘Dynamic Packing Grid’ (DPG), a neighborhood data structure for maintaining and manipulating flexible molecules and assemblies, for efficient computation of binding affinities in drug design or in molecular dynamics calculations. Results: DPG can efficiently maintain the molecular surface using only linear space and supports quasi-constant time insertion, deletion and movement (i.e. updates) of atoms or groups of atoms. DPG also supports constant time neighborhood queries from arbitrary points. Our results for maintenance of molecular surface and polarization energy computations using DPG exhibit marked improvement in time and space requirements. Availability: http://www.cs.utexas.edu/~bajaj/cvc/software/DPG.shtml Contact: bajaj@cs.utexas.edu Supplementary information: Supplementary data are available at Bioinformatics online.

12 citations

Journal ArticleDOI
TL;DR: A new exact string-matching algorithm with sub-linear average-case complexity is presented that never performs more than n text-character comparisons while working on a text of length n.

Abstract: In this paper, a new exact string-matching algorithm with sub-linear average-case complexity is presented. Unlike other sub-linear string-matching algorithms, it never performs more than n text-character comparisons while working on a text of length n. It requires only O(m + s) extra pre-processing time and space, where m is the length of the pattern and s is the size of the alphabet.
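
The paper's exact algorithm is not reproduced here, but the flavor of sub-linear average-case matching with O(m + s) preprocessing is easy to illustrate. Below is a minimal Horspool-style sketch in C++ (an illustrative stand-in, not the paper's method): the bad-character shift table lets most alignments skip several text positions at once, though unlike the paper's algorithm, plain Horspool can exceed n comparisons in the worst case.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Horspool-style right-to-left matcher. The bad-character shift table makes
// the search sub-linear on average; preprocessing is O(m + s) time and space.
std::vector<std::size_t> horspool(const std::string& text,
                                  const std::string& pat) {
  std::vector<std::size_t> hits;
  const std::size_t n = text.size(), m = pat.size();
  if (m == 0 || n < m) return hits;
  std::array<std::size_t, 256> shift;
  shift.fill(m);                               // default shift: whole pattern
  for (std::size_t i = 0; i + 1 < m; ++i)      // all but the last pattern char
    shift[static_cast<unsigned char>(pat[i])] = m - 1 - i;
  for (std::size_t posn = 0; posn + m <= n;) {
    std::size_t k = m;                         // compare right to left
    while (k > 0 && text[posn + k - 1] == pat[k - 1]) --k;
    if (k == 0) hits.push_back(posn);
    posn += shift[static_cast<unsigned char>(text[posn + m - 1])];
  }
  return hits;
}
```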

10 citations

Proceedings ArticleDOI
20 Oct 2014
TL;DR: This paper presents a variant of the parallel trapezoidal decomposition algorithm called "cache-oblivious wavefront" (COW) that starts execution of recursive subtasks earlier than the start time prescribed by the original algorithm without violating any real dependencies implied by the underlying recurrences, thus reducing serialization due to artificial dependencies.
Abstract: The state-of-the-art "trapezoidal decomposition algorithm" for stencil computations on modern multicore machines uses recursive divide-and-conquer (DAC) to achieve asymptotically optimal cache complexity cache-obliviously. But the same DAC approach restricts parallelism by introducing artificial dependencies among subtasks in addition to those arising from the defining stencil equations. As a result, the trapezoidal decomposition algorithm has suboptimal parallelism. In this paper we present a variant of the parallel trapezoidal decomposition algorithm called "cache-oblivious wavefront" (COW) that starts execution of recursive subtasks earlier than the start time prescribed by the original algorithm without violating any real dependencies implied by the underlying recurrences, thus reducing serialization due to artificial dependencies. The reduction in serialization leads to an improvement in parallelism. Moreover, since we do not change the DAC-based decomposition of tasks used in the original algorithm, cache performance does not suffer. We provide experimental measurements of absolute running times, burdened span by Cilkview, and L1/L2 cache misses by PAPI to validate our claims.
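
For context, here is a minimal serial version of the underlying trapezoidal decomposition (after Frigo and Strumpen) for a 1D 3-point stencil; the COW variant keeps this same DAC structure but lets subtasks begin as soon as their true stencil dependencies are satisfied. The heat-equation kernel and the sizes in main are illustrative assumptions.

```cpp
#include <vector>

// Two time-alternating rows: a[t % 2][x] holds cell x at time step t.
std::vector<double> a[2];

void kernel(int t, int x) {               // one point update (1D heat equation)
  a[(t + 1) % 2][x] =
      0.25 * a[t % 2][x - 1] + 0.5 * a[t % 2][x] + 0.25 * a[t % 2][x + 1];
}

// Process the space-time trapezoid with time range [t0, t1) whose space
// interval at time t is [x0 + dx0*(t - t0), x1 + dx1*(t - t0)).
void trapezoid(int t0, int t1, int x0, int dx0, int x1, int dx1) {
  int dt = t1 - t0;
  if (dt == 1) {
    for (int x = x0; x < x1; ++x) kernel(t0, x);
  } else if (dt > 1) {
    if (2 * (x1 - x0) + (dx1 - dx0) * dt >= 4 * dt) {
      // Wide trapezoid: space cut. The left child must finish before the
      // right child starts; this is the serialization that COW relaxes.
      int xm = (2 * (x0 + x1) + (2 + dx0 + dx1) * dt) / 4;
      trapezoid(t0, t1, x0, dx0, xm, -1);
      trapezoid(t0, t1, xm, -1, x1, dx1);
    } else {
      // Tall trapezoid: time cut; lower half strictly before upper half.
      int s = dt / 2;
      trapezoid(t0, t0 + s, x0, dx0, x1, dx1);
      trapezoid(t0 + s, t1, x0 + dx0 * s, dx0, x1 + dx1 * s, dx1);
    }
  }
}

int main() {
  const int N = 1024, T = 512;            // illustrative sizes
  a[0].assign(N, 0.0);
  a[1].assign(N, 0.0);
  a[0][N / 2] = 1.0;                      // point heat source
  trapezoid(0, T, 1, 0, N - 1, 0);        // evolve interior cells for T steps
  return 0;
}
```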

10 citations

Proceedings ArticleDOI
05 Oct 2009
TL;DR: The "Dynamic Packing Grid" (DPG) data structure is presented along with details of the implementation and performance results, for maintaining and manipulating flexible molecular models and assemblies, and can additionally be utilized in efficiently maintaining multiple "rigid" domains of dynamic flexible molecules.
Abstract: We present the "Dynamic Packing Grid" (DPG) data structure along with details of our implementation and performance results, for maintaining and manipulating flexible molecular models and assemblies. DPG can efficiently maintain the molecular surface (e.g., van der Waals surface and the solvent contact surface) under insertion/deletion/movement (i.e., updates) of atoms or groups of atoms. DPG also permits the fast estimation of important molecular properties (e.g., surface area, volume, polarization energy, etc.) that are needed for computing binding affinities in drug design or in molecular dynamics calculations. DPG can additionally be utilized in efficiently maintaining multiple "rigid" domains of dynamic flexible molecules. In DPG, each up-date takes only O (log w) time w.h.p. on a RAM with w-bit words i.e., O (1) time in practice, and hence is extremely fast. DPG's queries include the reporting of all atoms within O (rmax) distance from any given atom center or point in 3-space in O (log log w) (= O (1)) time w.h.p., where rmax is the radius of the largest atom in the molecule. It can also answer whether a given atom is exposed or buried under the surface within the same time bound, and can return the entire molecular surface in O (m) worst-case time, where m is the number of atoms on the surface. The data structure uses space linear in the number of atoms in the molecule.

9 citations

Journal ArticleDOI
TL;DR: The notion of a slim data structure is introduced, capturing the situation when only a limited portion of the cache is available to the data structure to retain data between operations; a buffer heap automatically adapts to such an environment and supports all operations in O(1/λ + (1/B) log_2(N/λ)) amortized block transfers each when the size of the slim cache is λ.

Abstract: We present the buffer heap, a cache-oblivious priority queue that supports Delete-Min, Delete, and a hybrid Insert/Decrease-Key operation in O((1/B) log_2(N/M)) amortized block transfers from main memory, where M and B are the (unknown) cache size and block size, respectively, and N is the number of elements in the queue. We introduce the notion of a slim data structure, which captures the situation when only a limited portion of the cache, which we call a slim cache, is available to the data structure to retain data between operations. We show that a buffer heap automatically adapts to such an environment and supports all operations in O(1/λ + (1/B) log_2(N/λ)) amortized block transfers each when the size of the slim cache is λ. Our results provide substantial improvements over known trivial cache performance bounds for cache-oblivious priority queues with Decrease-Keys. Using the buffer heap, we present cache-oblivious implementations of Dijkstra’s algorithm for undirected and directed single-source shortest path (SSSP) problems for graphs with non-negative real edge-weights. On a graph with n vertices and m edges, our algorithm for the undirected case performs O(n + (m/B) log_2(n/M)) block transfers and for the directed case performs O((n + m/B) log_2(n/B)) block transfers. These results give the first non-trivial cache-oblivious bounds for the SSSP problem on general graphs. For the all-pairs shortest path (APSP) problem on weighted undirected graphs, we incorporate slim buffer heaps into multi-buffer-heaps and use these to improve the cache-aware cache complexity. We also present a simple cache-oblivious APSP algorithm for unweighted undirected graphs that performs O((mn/B) log_{M/B}(n/B)) block transfers. This matches the cache-aware bound and is a substantial improvement over the previous cache-oblivious bound for the problem.
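
To see where the buffer heap plugs in, here is a textbook in-memory Dijkstra in C++ with a binary heap and lazy deletion standing in for the hybrid Insert/Decrease-Key operation. The paper's cache-oblivious bounds come from replacing this priority queue (and the way vertex state is accessed) with the buffer heap, not from changing the loop structure.

```cpp
#include <limits>
#include <queue>
#include <utility>
#include <vector>

using Edge = std::pair<int, double>;              // (neighbor, weight)

// Textbook Dijkstra with a binary heap and lazy deletion. A stale heap
// entry is skipped on extraction; each push plays the role of the paper's
// hybrid Insert/Decrease-Key operation.
std::vector<double> dijkstra(const std::vector<std::vector<Edge>>& g, int s) {
  const double INF = std::numeric_limits<double>::infinity();
  std::vector<double> dist(g.size(), INF);
  using Item = std::pair<double, int>;            // (tentative distance, vertex)
  std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
  dist[s] = 0.0;
  pq.push({0.0, s});
  while (!pq.empty()) {
    auto [d, u] = pq.top();
    pq.pop();
    if (d > dist[u]) continue;                    // stale entry (lazy Delete)
    for (auto [v, w] : g[u])
      if (dist[u] + w < dist[v]) {
        dist[v] = dist[u] + w;
        pq.push({dist[v], v});                    // Insert/Decrease-Key
      }
  }
  return dist;
}
```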

8 citations


Cited by
01 May 1993
TL;DR: Comparing the results to the fastest reported vectorized Cray Y-MP and C90 algorithm shows that the current generation of parallel machines is competitive with conventional vector supercomputers even for small problems.
Abstract: Three parallel algorithms for classical molecular dynamics are presented. The first assigns each processor a fixed subset of atoms; the second assigns each a fixed subset of inter-atomic forces to compute; the third assigns each a fixed spatial region. The algorithms are suitable for molecular dynamics models which can be difficult to parallelize efficiently: those with short-range forces where the neighbors of each atom change rapidly. They can be implemented on any distributed-memory parallel machine which allows for message-passing of data between independently executing processors. The algorithms are tested on a standard Lennard-Jones benchmark problem for system sizes ranging from 500 to 100,000,000 atoms on several parallel supercomputers: the nCUBE 2, Intel iPSC/860 and Paragon, and Cray T3D. Comparing the results to the fastest reported vectorized Cray Y-MP and C90 algorithm shows that the current generation of parallel machines is competitive with conventional vector supercomputers even for small problems. For large problems, the spatial algorithm achieves parallel efficiencies of 90%, and a 1840-node Intel Paragon performs up to 165 times faster than a single Cray C90 processor. Trade-offs between the three algorithms and guidelines for adapting them to more complex molecular dynamics simulations are also discussed.
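
A minimal sketch of the first (atom-decomposition) algorithm in C++ with MPI: each rank keeps a replicated copy of all positions, computes Lennard-Jones forces for its fixed block of atoms, advances that block, and allgathers the new positions each step. The system size, lattice initialization, Euler integrator, and the absence of a force cutoff are illustrative assumptions, not the paper's benchmark setup.

```cpp
#include <mpi.h>

#include <vector>

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int rank, P;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &P);

  const int N = 1024;                        // atom count (assume P divides N)
  const int nlocal = N / P, lo = rank * nlocal;
  const double dt = 0.005;                   // timestep, reduced LJ units
  std::vector<double> x(3 * N), v(3 * N, 0.0);
  for (int i = 0; i < N; ++i) {              // identical lattice on all ranks
    x[3 * i] = i % 16;
    x[3 * i + 1] = (i / 16) % 16;
    x[3 * i + 2] = i / 256;
  }

  for (int step = 0; step < 100; ++step) {
    for (int i = lo; i < lo + nlocal; ++i) { // forces on my atoms only: O(N^2/P)
      double f[3] = {0.0, 0.0, 0.0};
      for (int j = 0; j < N; ++j) {
        if (j == i) continue;
        double d[3], r2 = 0.0;
        for (int k = 0; k < 3; ++k) {
          d[k] = x[3 * i + k] - x[3 * j + k];
          r2 += d[k] * d[k];
        }
        double inv6 = 1.0 / (r2 * r2 * r2);  // (1/r)^6 with sigma = 1
        double c = 24.0 * inv6 * (2.0 * inv6 - 1.0) / r2;  // LJ force over r
        for (int k = 0; k < 3; ++k) f[k] += c * d[k];
      }
      for (int k = 0; k < 3; ++k) {          // Euler update (unit mass)
        v[3 * i + k] += dt * f[k];
        x[3 * i + k] += dt * v[3 * i + k];
      }
    }
    // every rank broadcasts its block so positions stay fully replicated
    MPI_Allgather(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL,
                  x.data(), 3 * nlocal, MPI_DOUBLE, MPI_COMM_WORLD);
  }
  MPI_Finalize();
  return 0;
}
```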

29,323 citations

Book
02 Jan 1991

1,377 citations

Proceedings ArticleDOI
16 Jun 2013
TL;DR: A systematic model of the tradeoff space fundamental to stencil pipelines is presented, a schedule representation which describes concrete points in this space for each stage in an image processing pipeline, and an optimizing compiler for the Halide image processing language that synthesizes high performance implementations from a Halide algorithm and a schedule are presented.
Abstract: Image processing pipelines combine the challenges of stencil computations and stream programs. They are composed of large graphs of different stencil stages, as well as complex reductions, and stages with global or data-dependent access patterns. Because of their complex structure, the performance difference between a naive implementation of a pipeline and an optimized one is often an order of magnitude. Efficient implementations require optimization of both parallelism and locality, but due to the nature of stencils, there is a fundamental tension between parallelism, locality, and introducing redundant recomputation of shared values. We present a systematic model of the tradeoff space fundamental to stencil pipelines, a schedule representation which describes concrete points in this space for each stage in an image processing pipeline, and an optimizing compiler for the Halide image processing language that synthesizes high performance implementations from a Halide algorithm and a schedule. Combining this compiler with stochastic search over the space of schedules enables terse, composable programs to achieve state-of-the-art performance on a wide range of real image processing pipelines, and across different hardware architectures, including multicores with SIMD, and heterogeneous CPU+GPU execution. From simple Halide programs written in a few hours, we demonstrate performance up to 5x faster than hand-tuned C, intrinsics, and CUDA implementations optimized by experts over weeks or months, for image processing applications beyond the reach of past automatic compilers.
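
The algorithm/schedule split is concrete in Halide's C++ front end. The sketch below follows the well-known two-stage blur example: the two Func definitions are the algorithm, and the two schedule lines independently choose tiling, vectorization, parallelism, and where the intermediate stage is computed (the tile sizes and vector widths here are illustrative choices).

```cpp
#include "Halide.h"
using namespace Halide;

int main() {
  ImageParam input(UInt(16), 2);
  Var x("x"), y("y"), xi("xi"), yi("yi");

  // Algorithm (what is computed): a 3x3 box blur as two 3-tap stages.
  Func blur_x("blur_x"), blur_y("blur_y");
  blur_x(x, y) = (input(x - 1, y) + input(x, y) + input(x + 1, y)) / 3;
  blur_y(x, y) = (blur_x(x, y - 1) + blur_x(x, y) + blur_x(x, y + 1)) / 3;

  // Schedule (how it is computed): tile the output, vectorize and
  // parallelize it, and compute the intermediate stage per tile strip.
  blur_y.tile(x, y, xi, yi, 256, 32).vectorize(xi, 8).parallel(y);
  blur_x.compute_at(blur_y, x).vectorize(x, 8);

  blur_y.compile_jit();  // changing the tradeoff needs only schedule edits
  return 0;
}
```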

1,074 citations

Journal ArticleDOI
TL;DR: In this review, methods to adjust the polar solvation energy and to improve the performance of MM/PBSA and MM/GBSA calculations are reviewed and discussed, and guidance is provided for practically applying these methods in drug design and related research fields.
Abstract: Molecular mechanics Poisson-Boltzmann surface area (MM/PBSA) and molecular mechanics generalized Born surface area (MM/GBSA) are arguably very popular methods for binding free energy prediction since they are more accurate than most scoring functions of molecular docking and less computationally demanding than alchemical free energy methods. MM/PBSA and MM/GBSA have been widely used in biomolecular studies such as protein folding, protein-ligand binding, protein-protein interaction, etc. In this review, methods to adjust the polar solvation energy and to improve the performance of MM/PBSA and MM/GBSA calculations are reviewed and discussed. The latest applications of MM/GBSA and MM/PBSA in drug design are also presented. This review intends to provide readers with guidance for practically applying MM/PBSA and MM/GBSA in drug design and related research fields.
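
For reference, the quantity these end-point methods estimate decomposes as follows (standard MM/PBSA and MM/GBSA definitions; the PB or GB solver supplies the polar solvation term that the adjustments above target, and the nonpolar term is commonly fit to the solvent-accessible surface area):

$$\Delta G_{\mathrm{bind}} = \langle G_{\mathrm{complex}} \rangle - \langle G_{\mathrm{receptor}} \rangle - \langle G_{\mathrm{ligand}} \rangle$$

$$G = E_{\mathrm{MM}} + G_{\mathrm{polar}}^{\mathrm{PB/GB}} + G_{\mathrm{nonpolar}} - TS, \qquad E_{\mathrm{MM}} = E_{\mathrm{bonded}} + E_{\mathrm{elec}} + E_{\mathrm{vdW}}, \qquad G_{\mathrm{nonpolar}} \approx \gamma \, \mathrm{SASA} + b$$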

822 citations

Journal ArticleDOI
TL;DR: Docking against homology-modeled targets also becomes possible for proteins whose structures are not known, and the druggability of the compounds and their specificity against a particular target can be calculated for further lead optimization processes.
Abstract: Molecular docking methodology explores the behavior of small molecules in the binding site of a target protein. As more protein structures are determined experimentally using X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy, molecular docking is increasingly used as a tool in drug discovery. Docking against homology-modeled targets also becomes possible for proteins whose structures are not known. With these docking strategies, the druggability of compounds and their specificity against a particular target can be calculated for further lead optimization. Molecular docking programs perform a search algorithm in which the conformation of the ligand is evaluated iteratively until convergence to a minimum-energy conformation is reached. Finally, an affinity scoring function, ΔG (U_total, in kcal/mol), is employed to rank the candidate poses as the sum of the electrostatic and van der Waals energies. The driving forces for these specific interactions in biological systems aim toward complementarity between the shape and electrostatics of the binding-site surface and the ligand or substrate.
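
A generic pairwise form of the electrostatic-plus-van-der-Waals score described above is shown below; A_ij and B_ij are Lennard-Jones parameters, ε_r is an effective dielectric, and the exact functional form varies from program to program:

$$E_{\mathrm{inter}} = \sum_{i<j} \left( \frac{q_i q_j}{4\pi \varepsilon_0 \varepsilon_r \, r_{ij}} + \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} \right)$$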

817 citations