Author

Rezaul Chowdhury

Bio: Rezaul Chowdhury is an academic researcher from Stony Brook University. The author has contributed to research in topics: Cache & Cache-oblivious algorithm. The author has an h-index of 17 and has co-authored 83 publications receiving 1419 citations. Previous affiliations of Rezaul Chowdhury include University of Texas at Austin & Boston University.


Papers
Journal Article
TL;DR: A new mergesort algorithm is presented which can sort $n = 2^{h+1} - 1$ elements using no more than $n \log_2(n+1) - (13/12)n - 1$ element comparisons in the worst case.
Abstract: In this paper, we present a new mergesort algorithm which can sort $n = 2^{h+1} - 1$ elements using no more than $n \log_2(n+1) - (13/12)n - 1$ element comparisons in the worst case. This algorithm includes the heap (fine heap) creation phase as a pre-processing step, and for each internal node $v$, its left and right subheaps are merged into a sorted list of the elements under that node. Experimental results show that this algorithm requires only $n \log_2(n+1) - 1.2n$ element comparisons in the average case. However, it requires extra space for $n$ LINK fields.
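The merging idea in this abstract can be illustrated with a minimal Python sketch. It assumes an ordinary binary min-heap built with `heapq.heapify` rather than the paper's fine heap, uses plain lists instead of LINK fields, and makes no attempt to reach the stated comparison bound; the function names are hypothetical.

```python
import heapq


def merge(a, b):
    """Standard two-way merge of two sorted lists."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def heap_merge_sort(items):
    """Sort by building a min-heap, then merging subheaps bottom-up.

    In a min-heap the key at node v is no larger than any key below it,
    so the sorted list under v is heap[v] followed by the merge of the
    sorted lists under its two children.
    """
    heap = list(items)
    heapq.heapify(heap)  # pre-processing step: heap creation

    def sorted_under(v):
        if v >= len(heap):
            return []
        left = sorted_under(2 * v + 1)
        right = sorted_under(2 * v + 2)
        return [heap[v]] + merge(left, right)

    return sorted_under(0)
```

The recursion depth is the heap height, so it stays logarithmic in the input size.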
Proceedings Article
10 Nov 2012
TL;DR: An octree-based hierarchical algorithm, built on Greengard-Rokhlin type near and far decomposition of data points, which calculates the polarization energy of a molecule using the $r^6$ approximation of Generalized Born (GB) radii of atoms.
Abstract: When a molecule experiences an electric field, its charge distribution is relaxed in response to that field. The energy associated with this relaxation is known as the polarization energy. Computing the polarization energy between a ligand (i.e., a small molecule such as a drug molecule) and a receptor (e.g., a virus molecule) is of utmost importance in drug design, protein-protein docking, virus/bacterium cell analysis, and molecular dynamics simulations for determining the molecular conformation with minimal total free energy. We have implemented distributed-memory and distributed shared-memory parallel algorithms for approximating the polarization energy of a molecule by extending prior work for shared-memory (multicore) architectures. This is an octree-based hierarchical algorithm, built on Greengard-Rokhlin type near and far decomposition of data points (i.e., atoms and points sampled from the molecular surface), which calculates the polarization energy of a molecule using the $r^6$ approximation of Generalized Born (GB) radii of atoms. Both Poisson-Boltzmann (PB) and Generalized Born (GB) models can be used for approximating polarization energy. However, due to its high computational cost, the PB method is rarely used for large molecules such as proteins.
Journal Article
TL;DR: A work-efficient parallel level-synchronous Breadth First Search (BFS) algorithm for shared-memory architectures which achieves the theoretical lower bound on parallel running time; the optimality holds regardless of the shape of the graph.
Abstract: We present a work-efficient parallel level-synchronous Breadth First Search (BFS) algorithm for shared-memory architectures which achieves the theoretical lower bound on parallel running time. The optimality holds regardless of the shape of the graph. We also demonstrate empirically the implication of this optimality for the energy consumption of the program. The key idea is to never use more processing cores than necessary to complete the work in any computation step efficiently. We keep the rest of the cores idle to save energy and to reduce contention for other resources (e.g., bandwidth, shared caches, etc.). Our BFS does not use locks or atomic instructions and is easily extensible to shared-memory coprocessors.
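For readers unfamiliar with the term, "level-synchronous" means the search finishes one whole frontier before starting the next. The following is a serial Python sketch of that structure only; it does not implement the parallelism, the core-count selection, or the lock-free design described in the abstract, and the adjacency-dictionary input format is an assumption made for illustration.

```python
def level_synchronous_bfs(adj, source):
    """Return a dict mapping each reachable vertex to its BFS level.

    adj is assumed to be a dict from vertex to an iterable of neighbours.
    Each pass of the while loop processes exactly one level; in a parallel
    version the vertices of a frontier would be split among cores.
    """
    dist = {source: 0}
    frontier = [source]
    level = 0
    while frontier:
        level += 1
        next_frontier = []
        for u in frontier:
            for v in adj.get(u, ()):
                if v not in dist:  # first visit fixes the level
                    dist[v] = level
                    next_frontier.append(v)
        frontier = next_frontier  # barrier between levels
    return dist
```

The size of each `frontier` list is the amount of work available in that step, which is the quantity a scheme like the one described would use to decide how many cores to activate.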
Journal Article
TL;DR: In this article, option pricing problems under the binomial and Black-Scholes-Merton models are transformed into nonlinear stencil computation problems, and new FFT-based nonlinear stencil algorithms are shown to improve the work and span asymptotically.
Abstract: We study the binomial option pricing model and the Black-Scholes-Merton pricing model. In the binomial option pricing model, we concentrate on two widely-used call options: (1) European and (2) American. Under the Black-Scholes-Merton model, we investigate pricing American put options. Our contributions are two-fold: First, we transform the option pricing problems into nonlinear stencil computation problems and present efficient algorithms to solve them. Second, using our new FFT-based nonlinear stencil algorithms, we improve the work and span asymptotically for the option pricing problems we consider. In particular, we perform $O(T\log^2 T)$ work for both American call and put option pricing, where $T$ is the number of time steps.
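To show why binomial pricing is a stencil computation, here is a minimal Python sketch of the textbook quadratic-work backward induction. It is the baseline that such algorithms improve on, not the FFT-based method from the abstract. The Cox-Ross-Rubinstein parameterization and all parameter names (`S0`, `K`, `r`, `sigma`, `expiry`) are assumptions introduced for illustration.

```python
import math


def binomial_call(S0, K, r, sigma, expiry, T, american=False):
    """Price a call on a non-dividend stock with a T-step binomial tree.

    Assumes the Cox-Ross-Rubinstein up/down factors. Each time step
    updates every cell from its two neighbours in the next layer (a
    2-point stencil); the max() for early exercise makes it nonlinear.
    """
    dt = expiry / T
    u = math.exp(sigma * math.sqrt(dt))   # up factor
    d = 1.0 / u                           # down factor
    p = (math.exp(r * dt) - d) / (u - d)  # risk-neutral up probability
    disc = math.exp(-r * dt)              # one-step discount factor

    # Payoffs at maturity; node j has had j up-moves and T - j down-moves.
    values = [max(S0 * u**j * d**(T - j) - K, 0.0) for j in range(T + 1)]

    for t in range(T - 1, -1, -1):
        for j in range(t + 1):
            cont = disc * (p * values[j + 1] + (1 - p) * values[j])
            if american:
                exercise = S0 * u**j * d**(t - j) - K
                cont = max(cont, exercise)
            values[j] = cont  # safe in place: values[j + 1] is still old
    return values[0]
```

The double loop performs on the order of $T^2$ cell updates, which is the cost the $O(T\log^2 T)$ bound quoted above should be compared against.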
Journal Article
TL;DR: In this article, the authors present two cache-oblivious sorting-based convex hull algorithms in the Binary Forking Model; the first, for a presorted set of points, achieves $O(n)$ work, $O(\log n)$ span, and $O(n/B)$ serial cache complexity, where $B$ is the cache line size.
Abstract: We present two cache-oblivious sorting-based convex hull algorithms in the Binary Forking Model. The first is an algorithm for a presorted set of points which achieves $O(n)$ work, $O(\log n)$ span, and $O(n/B)$ serial cache complexity, where $B$ is the cache line size. These are all optimal worst-case bounds for cache-oblivious algorithms in the Binary Forking Model. The second adapts Cole and Ramachandran's cache-oblivious sorting algorithm, matching its properties including achieving $O(n \log n)$ work, $O(\log n \log \log n)$ span, and $O(n/B \log_M n)$ serial cache complexity. Here $M$ is the size of the private cache.
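As a point of reference for the presorted case, the classic serial monotone chain scan also does linear work once the points are sorted. The Python sketch below is that well-known serial technique, swapped in for illustration; it is not the parallel cache-oblivious algorithm described in the abstract, and the function names are hypothetical.

```python
def cross(o, a, b):
    """Cross product of vectors o->a and o->b; positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def hull_of_sorted(points):
    """Convex hull of points already sorted lexicographically by (x, y).

    Returns the hull vertices in counter-clockwise order. Each point is
    pushed once and popped at most once per chain, so the scan is linear.
    """
    if len(points) <= 2:
        return list(points)

    def half(seq):
        chain = []
        for p in seq:
            # Drop points that would make a clockwise or collinear turn.
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain

    lower = half(points)
    upper = half(reversed(points))
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]
```

The input must already be sorted; passing unsorted points gives an incorrect hull rather than an error.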

Cited by
01 May 1993
TL;DR: Comparing the results to the fastest reported vectorized Cray Y-MP and C90 algorithm shows that the current generation of parallel machines is competitive with conventional vector supercomputers even for small problems.
Abstract: Three parallel algorithms for classical molecular dynamics are presented. The first assigns each processor a fixed subset of atoms; the second assigns each a fixed subset of inter-atomic forces to compute; the third assigns each a fixed spatial region. The algorithms are suitable for molecular dynamics models which can be difficult to parallelize efficiently: those with short-range forces where the neighbors of each atom change rapidly. They can be implemented on any distributed-memory parallel machine which allows for message-passing of data between independently executing processors. The algorithms are tested on a standard Lennard-Jones benchmark problem for system sizes ranging from 500 to 100,000,000 atoms on several parallel supercomputers: the nCUBE 2, Intel iPSC/860 and Paragon, and Cray T3D. Comparing the results to the fastest reported vectorized Cray Y-MP and C90 algorithm shows that the current generation of parallel machines is competitive with conventional vector supercomputers even for small problems. For large problems, the spatial algorithm achieves parallel efficiencies of 90% and a 1840-node Intel Paragon performs up to 165 times faster than a single Cray C90 processor. Trade-offs between the three algorithms and guidelines for adapting them to more complex molecular dynamics simulations are also discussed.

29,323 citations

Book
02 Jan 1991

1,377 citations

Proceedings Article
16 Jun 2013
TL;DR: A systematic model of the tradeoff space fundamental to stencil pipelines, a schedule representation which describes concrete points in this space for each stage in an image processing pipeline, and an optimizing compiler for the Halide image processing language that synthesizes high performance implementations from a Halide algorithm and a schedule are presented.
Abstract: Image processing pipelines combine the challenges of stencil computations and stream programs. They are composed of large graphs of different stencil stages, as well as complex reductions, and stages with global or data-dependent access patterns. Because of their complex structure, the performance difference between a naive implementation of a pipeline and an optimized one is often an order of magnitude. Efficient implementations require optimization of both parallelism and locality, but due to the nature of stencils, there is a fundamental tension between parallelism, locality, and introducing redundant recomputation of shared values. We present a systematic model of the tradeoff space fundamental to stencil pipelines, a schedule representation which describes concrete points in this space for each stage in an image processing pipeline, and an optimizing compiler for the Halide image processing language that synthesizes high performance implementations from a Halide algorithm and a schedule. Combining this compiler with stochastic search over the space of schedules enables terse, composable programs to achieve state-of-the-art performance on a wide range of real image processing pipelines, and across different hardware architectures, including multicores with SIMD, and heterogeneous CPU+GPU execution. From simple Halide programs written in a few hours, we demonstrate performance up to 5x faster than hand-tuned C, intrinsics, and CUDA implementations optimized by experts over weeks or months, for image processing applications beyond the reach of past automatic compilers.

1,074 citations

Journal Article
TL;DR: In this review, methods to adjust the polar solvation energy and to improve the performance of MM/PBSA and MM/GBSA calculations are reviewed and discussed and guidance is provided for practically applying these methods in drug design and related research fields.
Abstract: Molecular mechanics Poisson-Boltzmann surface area (MM/PBSA) and molecular mechanics generalized Born surface area (MM/GBSA) are arguably very popular methods for binding free energy prediction since they are more accurate than most scoring functions of molecular docking and less computationally demanding than alchemical free energy methods. MM/PBSA and MM/GBSA have been widely used in biomolecular studies such as protein folding, protein-ligand binding, protein-protein interaction, etc. In this review, methods to adjust the polar solvation energy and to improve the performance of MM/PBSA and MM/GBSA calculations are reviewed and discussed. The latest applications of MM/GBSA and MM/PBSA in drug design are also presented. This review intends to provide readers with guidance for practically applying MM/PBSA and MM/GBSA in drug design and related research fields.

822 citations

Journal Article
TL;DR: Docking against homology-modeled targets also becomes possible for proteins whose structures are not known, and the druggability of the compounds and their specificity against a particular target can be calculated for further lead optimization processes.
Abstract: Molecular docking methodology explores the behavior of small molecules in the binding site of a target protein. As more protein structures are determined experimentally using X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy, molecular docking is increasingly used as a tool in drug discovery. Docking against homology-modeled targets also becomes possible for proteins whose structures are not known. With the docking strategies, the druggability of the compounds and their specificity against a particular target can be calculated for further lead optimization processes. Molecular docking programs perform a search algorithm in which the conformation of the ligand is evaluated recursively until the convergence to the minimum energy is reached. Finally, an affinity scoring function, ΔG [U total in kcal/mol], is employed to rank the candidate poses as the sum of the electrostatic and van der Waals energies. The driving forces for these specific interactions in biological systems aim toward complementarities between the shape and electrostatics of the binding site surfaces and the ligand or substrate.

817 citations