Author

Duncan Poole

Bio: Duncan Poole is an academic researcher from Nvidia. The author has contributed to research on topics including the Message Passing Interface and CUDA. The author has an h-index of 8 and has co-authored 9 publications receiving 3,324 citations.

Papers
Journal ArticleDOI
TL;DR: An implementation of explicit solvent all-atom classical molecular dynamics (MD) within the AMBER program package that runs entirely on CUDA-enabled GPUs, providing results that are statistically indistinguishable from the traditional CPU version of the software and performance that exceeds that achievable by the CPU version running on conventional CPU-based clusters and supercomputers.
Abstract: We present an implementation of explicit solvent all atom classical molecular dynamics (MD) within the AMBER program package that runs entirely on CUDA-enabled GPUs. First released publicly in April 2010 as part of version 11 of the AMBER MD package and further improved and optimized over the last two years, this implementation supports the three most widely used statistical mechanical ensembles (NVE, NVT, and NPT), uses particle mesh Ewald (PME) for the long-range electrostatics, and runs entirely on CUDA-enabled NVIDIA graphics processing units (GPUs), providing results that are statistically indistinguishable from the traditional CPU version of the software and with performance that exceeds that achievable by the CPU version of AMBER software running on all conventional CPU-based clusters and supercomputers. We briefly discuss three different precision models developed specifically for this work (SPDP, SPFP, and DPDP) and highlight the technical details of the approach as it extends beyond previously reported work [Götz et al., J. Chem. Theory Comput. 2012, DOI: 10.1021/ct200909j; Le Grand et al., Comp. Phys. Comm. 2013, DOI: 10.1016/j.cpc.2012.09.022]. We highlight the substantial improvements in performance that are seen over traditional CPU-only machines and provide validation of our implementation and precision models. We also provide evidence supporting our decision to deprecate the previously described fully single precision (SPSP) model from the latest release of the AMBER software package.
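The SPFP precision model mentioned above combines single-precision arithmetic with fixed-point force accumulation. The sketch below illustrates that idea in CUDA; the scale factor, kernel, and placeholder force are illustrative assumptions, not AMBER's actual code. Accumulated values are decoded on the host by reinterpreting them as signed 64-bit integers and dividing by the scale.

```cuda
// Sketch of SPFP-style force accumulation: pairwise contributions are computed in
// single precision and accumulated into 64-bit fixed-point integers with integer
// atomics, so the accumulated sum is exact and independent of summation order.
// The scale factor, the O(N^2) loop, and the placeholder "force" are illustrative.
#include <cuda_runtime.h>

__device__ const float FORCE_SCALE = 1099511627776.0f;   // 2^40 (assumed scale)

__device__ inline void accumulate(unsigned long long *acc, float f)
{
    // Round to a signed 64-bit fixed-point value; two's-complement wrap-around
    // makes the unsigned atomicAdd equivalent to a signed addition.
    long long fp = llrintf(f * FORCE_SCALE);
    atomicAdd(acc, (unsigned long long)fp);
}

__global__ void pair_forces_x(const float4 *pos, unsigned long long *fx, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    for (int j = 0; j < n; ++j) {                 // no cutoff, cell list, or PME here
        if (j == i) continue;
        float dx = pos[j].x - pos[i].x;           // x component only, for brevity
        float f  = dx * 1e-3f;                    // placeholder interaction, not a real potential
        accumulate(&fx[i], f);
    }
    // Host side would reinterpret fx[i] as long long and divide by FORCE_SCALE.
}
```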

2,418 citations

Journal ArticleDOI
TL;DR: An implementation of generalized Born implicit solvent all-atom classical molecular dynamics within the AMBER program package that runs entirely on CUDA-enabled NVIDIA graphics processing units (GPUs) and shows performance that is on par with, and in some cases exceeds, that of traditional supercomputers.
Abstract: We present an implementation of generalized Born implicit solvent all-atom classical molecular dynamics (MD) within the AMBER program package that runs entirely on CUDA-enabled NVIDIA graphics processing units (GPUs). We discuss the algorithms that are used to exploit the processing power of the GPUs and show the performance that can be achieved in comparison to simulations on conventional CPU clusters. The implementation supports three different precision models in which the contributions to the forces are calculated in single precision floating point arithmetic but accumulated in double precision (SPDP), or everything is computed in single precision (SPSP) or double precision (DPDP). In addition to performance, we have focused on understanding the implications of the different precision models on the outcome of implicit solvent MD simulations. We show results for a range of tests including the accuracy of single point force evaluations and energy conservation as well as structural properties pertaining to protein dynamics. The numerical noise due to rounding errors within the SPSP precision model is sufficiently large to lead to an accumulation of errors which can result in unphysical trajectories for long time scale simulations. We recommend the use of the mixed-precision SPDP model since the numerical results obtained are comparable with those of the full double precision DPDP model and the reference double precision CPU implementation but at significantly reduced computational cost. Our implementation provides performance for GB simulations on a single desktop that is on par with, and in some cases exceeds, that of traditional supercomputers.
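The distinction between the SPSP and SPDP models comes down to where rounding error accumulates. The toy host-side program below (values arbitrary, purely illustrative) shows how summing many small single-precision contributions into a float accumulator loses low-order bits, while a double accumulator preserves them; this is the effect behind the unphysical long-timescale trajectories reported for SPSP.

```cpp
// Toy illustration of the SPSP vs. SPDP accumulation issue: the same stream of
// small single-precision contributions is summed in a float (SPSP-style) and in
// a double (SPDP-style). Once the float accumulator grows large enough, further
// contributions fall below half an ULP and are silently dropped.
#include <cstdio>

int main()
{
    const long n = 100000000;            // number of per-step contributions (arbitrary)
    const float contrib = 1.0e-4f;       // a single small contribution (arbitrary)

    float  acc_spsp = 0.0f;              // single-precision accumulator
    double acc_spdp = 0.0;               // double-precision accumulator

    for (long i = 0; i < n; ++i) {
        acc_spsp += contrib;             // rounding error accumulates, then the sum stalls
        acc_spdp += (double)contrib;     // same inputs, accumulated in double
    }

    printf("SPSP-style sum: %.3f (exact would be %.1f)\n", acc_spsp, n * 1.0e-4);
    printf("SPDP-style sum: %.3f\n", acc_spdp);
    return 0;
}
```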

1,645 citations

Proceedings ArticleDOI
26 Aug 2015
TL;DR: This paper presents Unified Communication X (UCX), a set of network APIs and their implementations for high throughput computing, and evaluates the design by implementing the APIs and protocols and measuring the performance of overhead-critical network primitives fundamental to many parallel programming models and system libraries.
Abstract: This paper presents Unified Communication X (UCX), a set of network APIs and their implementations for high throughput computing. UCX comes from the combined effort of national laboratories, industry, and academia to design and implement a high-performing and highly scalable network stack for next generation applications and systems. UCX design provides the ability to tailor its APIs and network functionality to suit a wide variety of application domains and hardware. We envision these APIs to satisfy the networking needs of many programming models such as Message Passing Interface (MPI), OpenSHMEM, Partitioned Global Address Space (PGAS) languages, task-based paradigms and I/O bound applications. To evaluate the design we implement the APIs and protocols, and measure the performance of overhead-critical network primitives fundamental for implementing many parallel programming models and system libraries. Our results show that the latency, bandwidth, and message rate achieved by the portable UCX prototype are very close to those of the underlying driver. With UCX, we achieved a message exchange latency of 0.89 μs, a bandwidth of 6138.5 MB/s, and a message rate of 14 million messages per second. As far as we know, this is the highest bandwidth and message rate achieved by any network stack (publicly known) on this hardware.
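The small-message latency quoted above is typically measured with a ping-pong between two processes, taking half the averaged round-trip time. The sketch below does this at the MPI level, one of the programming models UCX targets; it is a generic benchmark outline, not the UCX-level measurement code used in the paper.

```cpp
// Generic small-message ping-pong: one-way latency is half the average round trip.
// Run with exactly two ranks, e.g. mpirun -np 2 ./pingpong
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int warmup = 1000, iters = 100000;
    char buf[8] = {0};                          // 8-byte message
    double t0 = 0.0;

    for (int i = 0; i < warmup + iters; ++i) {
        if (i == warmup) t0 = MPI_Wtime();      // start timing after warm-up
        if (rank == 0) {
            MPI_Send(buf, sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }

    if (rank == 0) {
        double rtt = (MPI_Wtime() - t0) / iters;            // average round-trip time
        printf("one-way latency: %.3f us\n", rtt / 2 * 1e6);
    }
    MPI_Finalize();
    return 0;
}
```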

129 citations

Journal ArticleDOI
TL;DR: A supercomputer-driven pipeline for in silico drug discovery using enhanced sampling molecular dynamics (MD) and ensemble docking is presented, with planned improvements including the use of quantum mechanical, machine learning, and artificial intelligence methods to cluster MD trajectories and rescore docking poses.
Abstract: We present a supercomputer-driven pipeline for in silico drug discovery using enhanced sampling molecular dynamics (MD) and ensemble docking. Ensemble docking makes use of MD results by docking compound databases into representative protein binding-site conformations, thus taking into account the dynamic properties of the binding sites. We also describe preliminary results obtained for 24 systems involving eight proteins of the proteome of SARS-CoV-2. The MD involves temperature replica exchange enhanced sampling, making use of massively parallel supercomputing to quickly sample the configurational space of protein drug targets. Using the Summit supercomputer at the Oak Ridge National Laboratory, more than 1 ms of enhanced sampling MD can be generated per day. We have ensemble-docked repurposing databases to 10 configurations of each of the 24 SARS-CoV-2 systems using AutoDock Vina. Comparison to experiment demonstrates remarkably high hit rates for the top-scoring tranches of compounds identified by our ensemble approach. We also demonstrate that, using AutoDock-GPU on Summit, it is possible to perform exhaustive docking of one billion compounds in under 24 h. Finally, we discuss preliminary results and planned improvements to the pipeline, including the use of quantum mechanical (QM), machine learning, and artificial intelligence (AI) methods to cluster MD trajectories and rescore docking poses.
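The ensemble-docking ranking step described above amounts to docking each compound against several MD-derived receptor conformations and ranking compounds by their best score over the ensemble. The sketch below shows that aggregation; the input file name and score-table format are hypothetical, not the pipeline's actual output.

```cpp
// Ensemble-docking consensus ranking sketch: for each compound, keep the best
// (most negative) docking score over all receptor conformations, then sort.
// The file "ensemble_scores.txt" and its layout are hypothetical.
#include <algorithm>
#include <fstream>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

int main()
{
    // Assumed input: one line per (compound, receptor conformation) docking run,
    // e.g. "ZINC000123 conf07 -9.4" (hypothetical format).
    std::ifstream in("ensemble_scores.txt");
    std::map<std::string, double> best;          // compound -> best score so far

    std::string line;
    while (std::getline(in, line)) {
        std::istringstream ss(line);
        std::string compound, conf;
        double score;
        if (!(ss >> compound >> conf >> score)) continue;
        auto it = best.find(compound);
        if (it == best.end() || score < it->second)
            best[compound] = score;              // keep the lowest (best) score
    }

    // Sort compounds by best ensemble score and print the top tranche.
    std::vector<std::pair<std::string, double>> ranked(best.begin(), best.end());
    std::sort(ranked.begin(), ranked.end(),
              [](const auto &a, const auto &b) { return a.second < b.second; });
    for (size_t i = 0; i < ranked.size() && i < 100; ++i)
        std::cout << ranked[i].first << " " << ranked[i].second << "\n";
    return 0;
}
```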

120 citations

Proceedings ArticleDOI
13 Sep 2011
TL;DR: A heterogeneous computation model and alternative host-GPU measurement approaches are discussed to set the stage for reporting new capabilities for heterogeneous parallel performance measurement in three leading HPC tools: PAPI, Vampir, and the TAU Performance System.
Abstract: The power of GPUs is giving rise to heterogeneous parallel computing, with new demands on programming environments, runtime systems, and tools to deliver high-performing applications. This paper studies the problems associated with performance measurement of heterogeneous machines with GPUs. A heterogeneous computation model and alternative host-GPU measurement approaches are discussed to set the stage for reporting new capabilities for heterogeneous parallel performance measurement in three leading HPC tools: PAPI, Vampir, and the TAU Performance System. Our work leverages the new CUPTI tool support in NVIDIA's CUDA device library. Heterogeneous benchmarks from the SHOC suite are used to demonstrate the measurement methods and tool support.
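The simplest of the host-GPU measurement approaches the paper contrasts is bracketing a kernel launch with CUDA events, as sketched below; CUPTI-based tracing, as used by the tools above, instead collects asynchronous activity records with per-kernel and per-copy detail. The kernel here is a stand-in.

```cuda
// Minimal host-driven GPU measurement using CUDA events: bracket a kernel launch
// with events on its stream and read the elapsed GPU time afterwards.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main()
{
    const int n = 1 << 20;
    float *x, *y;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);                       // enqueue start marker
    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
    cudaEventRecord(stop);                        // enqueue stop marker
    cudaEventSynchronize(stop);                   // wait for the kernel to finish

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);       // GPU-side elapsed time
    printf("kernel time: %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```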

83 citations


Cited by
01 May 1993
TL;DR: Comparing the results to the fastest reported vectorized Cray Y-MP and C90 algorithm shows that the current generation of parallel machines is competitive with conventional vector supercomputers even for small problems.
Abstract: Three parallel algorithms for classical molecular dynamics are presented. The first assigns each processor a fixed subset of atoms; the second assigns each a fixed subset of inter-atomic forces to compute; the third assigns each a fixed spatial region. The algorithms are suitable for molecular dynamics models which can be difficult to parallelize efficiently, namely those with short-range forces where the neighbors of each atom change rapidly. They can be implemented on any distributed-memory parallel machine which allows for message-passing of data between independently executing processors. The algorithms are tested on a standard Lennard-Jones benchmark problem for system sizes ranging from 500 to 100,000,000 atoms on several parallel supercomputers: the nCUBE 2, Intel iPSC/860 and Paragon, and Cray T3D. Comparing the results to the fastest reported vectorized Cray Y-MP and C90 algorithm shows that the current generation of parallel machines is competitive with conventional vector supercomputers even for small problems. For large problems, the spatial algorithm achieves parallel efficiencies of 90% and a 1840-node Intel Paragon performs up to 165 times faster than a single Cray C90 processor. Trade-offs between the three algorithms and guidelines for adapting them to more complex molecular dynamics simulations are also discussed.
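The first of the three algorithms, atom decomposition, assigns each processor a fixed subset of atoms and replicates coordinates so every processor can compute the forces on its own atoms. A minimal MPI sketch of one such step follows; the 1D coordinates and placeholder pair force are illustrative only.

```cpp
// Atom-decomposition sketch: each rank owns a fixed block of atoms, all positions
// are replicated with MPI_Allgather each step, and each rank computes forces only
// for the atoms it owns.
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int natoms = 1024;                    // assumed divisible by nprocs
    const int nlocal = natoms / nprocs;         // fixed subset owned by this rank
    double *x    = (double *)malloc(natoms * sizeof(double)); // all positions (1D toy)
    double *xloc = (double *)malloc(nlocal * sizeof(double)); // owned positions
    double *floc = (double *)calloc(nlocal, sizeof(double));  // forces on owned atoms

    for (int i = 0; i < nlocal; ++i)
        xloc[i] = rank * nlocal + i;            // toy coordinates

    // One "time step": replicate all positions, then compute forces locally.
    MPI_Allgather(xloc, nlocal, MPI_DOUBLE, x, nlocal, MPI_DOUBLE, MPI_COMM_WORLD);
    for (int i = 0; i < nlocal; ++i) {
        int gi = rank * nlocal + i;             // global index of owned atom i
        for (int j = 0; j < natoms; ++j) {
            if (j == gi) continue;
            double dx = x[j] - x[gi];
            floc[i] += dx / (dx * dx + 1.0);    // placeholder pair force, not a real potential
        }
    }

    free(x); free(xloc); free(floc);
    MPI_Finalize();
    return 0;
}
```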

29,323 citations

Journal ArticleDOI
TL;DR: An implementation of explicit solvent all-atom classical molecular dynamics (MD) within the AMBER program package that runs entirely on CUDA-enabled GPUs, providing results that are statistically indistinguishable from the traditional CPU version of the software and performance that exceeds that achievable by the CPU version running on conventional CPU-based clusters and supercomputers.
Abstract: We present an implementation of explicit solvent all atom classical molecular dynamics (MD) within the AMBER program package that runs entirely on CUDA-enabled GPUs. First released publicly in April 2010 as part of version 11 of the AMBER MD package and further improved and optimized over the last two years, this implementation supports the three most widely used statistical mechanical ensembles (NVE, NVT, and NPT), uses particle mesh Ewald (PME) for the long-range electrostatics, and runs entirely on CUDA-enabled NVIDIA graphics processing units (GPUs), providing results that are statistically indistinguishable from the traditional CPU version of the software and with performance that exceeds that achievable by the CPU version of AMBER software running on all conventional CPU-based clusters and supercomputers. We briefly discuss three different precision models developed specifically for this work (SPDP, SPFP, and DPDP) and highlight the technical details of the approach as it extends beyond previously reported work [Götz et al., J. Chem. Theory Comput. 2012, DOI: 10.1021/ct200909j; Le Grand et al., Comp. Phys. Comm. 2013, DOI: 10.1016/j.cpc.2012.09.022]. We highlight the substantial improvements in performance that are seen over traditional CPU-only machines and provide validation of our implementation and precision models. We also provide evidence supporting our decision to deprecate the previously described fully single precision (SPSP) model from the latest release of the AMBER software package.

2,418 citations

Journal ArticleDOI
TL;DR: The most recent developments, since version 9 was released in April 2006, of the Amber and AmberTools MD software packages, referred to here simply as the Amber package, are outlined.
Abstract: Molecular dynamics (MD) allows the study of biological and chemical systems at the atomistic level on timescales from femtoseconds to milliseconds. It complements experiment while also offering a way to follow processes difficult to discern with experimental techniques. Numerous software packages exist for conducting MD simulations, one of the most widely used of which is Amber. Here, we outline the most recent developments, since version 9 was released in April 2006, of the Amber and AmberTools MD software packages, referred to here as simply the Amber package. The latest release represents six years of continued development, since version 9, by multiple research groups and the culmination of over 33 years of work beginning with the first version in 1979. The latest release of the Amber package, version 12 released in April 2012, includes a substantial number of important developments in both the scientific and computer science arenas. We present here a condensed vision of what Amber currently supports and where things are likely to head over the coming years. Figure 1 shows the performance in ns/day of the Amber package version 12 on a single-core AMD FX-8120 8-Core 3.6 GHz CPU, the Cray XT5 system, and a single GTX 680 GPU. © 2012 John Wiley & Sons, Ltd.

1,734 citations

Journal ArticleDOI
TL;DR: OpenMM is a molecular dynamics simulation toolkit with a unique focus on extensibility, which makes it an ideal tool for researchers developing new simulation methods, and also allows those new methods to be immediately available to the larger community.
Abstract: OpenMM is a molecular dynamics simulation toolkit with a unique focus on extensibility. It allows users to easily add new features, including forces with novel functional forms, new integration algorithms, and new simulation protocols. Those features automatically work on all supported hardware types (including both CPUs and GPUs) and perform well on all of them. In many cases they require minimal coding, just a mathematical description of the desired function. They also require no modification to OpenMM itself and can be distributed independently of OpenMM. This makes it an ideal tool for researchers developing new simulation methods, and also allows those new methods to be immediately available to the larger community.
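A minimal sketch of that extensibility using OpenMM's C++ CustomBondForce, where the force is defined only by an energy expression string and OpenMM generates the corresponding CPU and GPU kernels; masses, parameters, and positions below are arbitrary illustrative values.

```cpp
// Defining a new bonded force from a mathematical expression with OpenMM.
// The harmonic form here is a toy example; any algebraic expression in r and
// the declared parameters would work the same way.
#include <vector>
#include "OpenMM.h"

int main()
{
    OpenMM::System system;
    system.addParticle(12.0);                     // two particles (arbitrary masses, amu)
    system.addParticle(12.0);

    // A custom bonded force defined only by its energy expression.
    auto *bond = new OpenMM::CustomBondForce("0.5*k*(r-r0)^2");
    bond->addPerBondParameter("k");               // kJ/mol/nm^2 (arbitrary value below)
    bond->addPerBondParameter("r0");              // nm (arbitrary value below)
    bond->addBond(0, 1, {1000.0, 0.15});
    system.addForce(bond);                        // the System takes ownership

    OpenMM::VerletIntegrator integrator(0.001);   // 1 fs step
    OpenMM::Context context(system, integrator);
    context.setPositions({OpenMM::Vec3(0, 0, 0), OpenMM::Vec3(0.2, 0, 0)});
    integrator.step(100);                         // runs on whichever platform is selected
    return 0;
}
```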

1,364 citations

Journal ArticleDOI
TL;DR: The AMBER lipid force field has been updated to create Lipid14, allowing tensionless simulation of a number of lipid types with the AMBER MD package, and is compatible with theAMBER protein, nucleic acid, carbohydrate, and small molecule force fields.
Abstract: The AMBER lipid force field has been updated to create Lipid14, allowing tensionless simulation of a number of lipid types with the AMBER MD package. The modular nature of this force field allows numerous combinations of head and tail groups to create different lipid types, enabling the easy insertion of new lipid species. The Lennard-Jones and torsion parameters of both the head and tail groups have been revised, and updated partial charges have been calculated. The force field has been validated by simulating bilayers of six different lipid types for a total of 0.5 μs each without applying a surface tension, with favorable comparison to experiment for properties such as area per lipid, volume per lipid, bilayer thickness, NMR order parameters, scattering data, and lipid lateral diffusion. As the derivation of this force field is consistent with the AMBER development philosophy, Lipid14 is compatible with the AMBER protein, nucleic acid, carbohydrate, and small molecule force fields.

973 citations