
http://www.diva-portal.org
Postprint
This is the accepted version of a paper published in Journal of Chemical Theory and Computation. This
paper has been peer-reviewed but does not include the final publisher proof-corrections or journal
pagination.
Citation for the original published paper (version of record):
Pronk, S., Pouya, I., Lundborg, M., Rotskoff, G., Wesén, B. et al. (2015)
Molecular Simulation Workflows as Parallel Algorithms: The Execution Engine of Copernicus, a
Distributed High-Performance Computing Platform.
Journal of Chemical Theory and Computation, 11(6): 2600-2608
http://dx.doi.org/10.1021/acs.jctc.5b00234
Access to the published version may require subscription.
N.B. When citing this work, cite the original published paper.
Permanent link to this version:
http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-170691

Molecular Simulation Workflows as Parallel
Algorithms: The Execution Engine of
Copernicus, a Distributed High-Performance
Computing Platform
Sander Pronk,§ Iman Pouya,§ Magnus Lundborg, Grant Rotskoff, Björn Wesén,
Peter Kasson, and Erik Lindahl*

Swedish eScience Research Center, Department of Theoretical Physics, KTH Royal
Institute of Technology, Stockholm, Sweden; Department of Biochemistry and Biophysics,
Science for Life Laboratory, Stockholm University; and Department of Molecular
Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA

*To whom correspondence should be addressed. E-mail: erik.lindahl@scilifelab.se
§These authors contributed equally to this work.

Abstract

Computational chemistry and other simulation fields depend critically on computing
resources, but few problems scale efficiently to the hundreds of thousands of processors
available in current supercomputers, in particular for molecular dynamics. This has turned
into a bottleneck as new hardware generations primarily provide more processing units
rather than making individual units much faster, which simulation applications are
addressing by increasingly focusing on sampling with algorithms such as free energy
perturbation, Markov state modeling, metadynamics or milestoning. All these rely on
combining results from multiple simulations into a single observation. They are potentially
powerful approaches that aim to directly predict experimental observables, but this comes
at the expense of added complexity in selecting sampling strategies and keeping track of
dozens to thousands of simulations and their dependencies. Here, we describe how the
distributed execution framework Copernicus allows the expression of such algorithms in
generic workflows: dataflow programs. Because dataflow algorithms explicitly state the
dependencies of each constituent part, algorithms only need to be described on a conceptual
level, after which the execution is maximally parallel. The fully automated execution
facilitates the optimization of these algorithms with adaptive sampling, where undersampled
regions are automatically detected and targeted without user intervention. We show how
several such algorithms can be formulated for computational chemistry problems, and how
they are executed efficiently with many loosely coupled simulations using either distributed
or parallel resources with Copernicus.
1 Introduction
The performance of statistical mechanics-based simulations in chemistry and many other
fields has increased by several orders of magnitude with faster hardware and highly tuned
simulation codes.[1–3]
Conceptually, algorithms such as molecular dynamics are inherently
parallelizable since particle interactions can be evaluated independently, but in practice it is
a very challenging problem when the evaluation has to be iterated for billions of dependent
time steps that only take a fraction of a millisecond each. Large efforts have been invested
in improving performance through simplified models, new algorithms, and better scaling of
simulations,[4–7] not to mention special-purpose hardware.[8,9]

Most force fields employed in molecular dynamics are based on representations developed
in the 1960s that only require a few dozen floating-point operations per interaction.[10]
This provides high simulation performance, but it limits scaling for small problems that
are common in biomolecular research. With a few thousand particles there are not enough
floating-point operations to spread over 100,000 cores in less than a millisecond, no matter
what algorithm or code is used. This limit to strong scaling is typically expressed in
a minimum number of atoms/core and is becoming an increasingly challenging barrier as
computing resources increase in core numbers. Computational power is expected to continue
to increase exponentially, but it will predominantly come from increased numbers of
processing units rather than faster individual units, including the use of GPUs and similar
accelerators.[11]
One potential solution to this problem derives from the higher-level analyses commonly
used for simulations. In computational chemistry and related disciplines, a study almost
never relies on a single simulation trajectory; multiple runs are used even in simple studies
for uncertainty quantification and for comparison between conditions. Furthermore, sampling
and ensemble techniques[12–17] are increasingly used to combine many simulation trajectories
into a higher-level model that is then compared to experimental data. This presents
an opportunity for increased parallelism across simulation trajectories as well as within each
trajectory. Simulation trajectories need not be completely independent, as some algorithms
rely upon data exchange between simulations, but they should be loosely coupled compared
to the tight coupling within simulations. This looser coupling permits efficient parallelization
over much larger core counts and potentially higher-latency interconnects than would
be practical for a single simulation trajectory with a comparable number of atoms.
In this paper, we describe the execution engine of Copernicus:[18] a parallel computation
platform for large-scale sampling applications. The execution is based on formulating
high-level workflows in a dataflow algorithm. These workflows are then analyzed for
dependencies, and all independent elements will automatically be executed in parallel. Copernicus has a
fully modular structure that is independent of the simulation engine used to run individual
trajectories. We have initially focused on writing plugins for the Gromacs molecular
simulation toolkit, but this can easily be exchanged for any other implementation. Similarly, the
core Copernicus framework is designed to permit easy implementation of a wide variety of
sampling algorithms, which are implemented as plugins for the dataflow execution engine.
As described below, the Copernicus formalism allows straightforward specification of any
sampling or statistical-mechanics approach; once this has been done, the dataflow engine
takes care of scheduling, executing, and processing the simulations required for the problem.
The advantage of Copernicus compared to a completely general-purpose dataflow engine is
that the structure of statistical-mechanics simulations is infused into the design of the engine,
so it works much better "out of the box" for such applications.
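
To make the execution model concrete, the following is a minimal Python sketch of how a
dataflow graph with explicit dependencies can be executed with maximal parallelism. It is an
illustration only, not the Copernicus implementation: the class DataflowGraph and its methods
are invented for this example. The engine repeatedly launches every node whose inputs are
already available, so independent nodes run concurrently.

from concurrent.futures import ThreadPoolExecutor

class DataflowGraph:
    """Toy dataflow graph: nodes are functions, edges are data dependencies."""

    def __init__(self):
        self.nodes = {}  # node name -> (function, list of upstream node names)

    def add(self, name, function, deps=()):
        self.nodes[name] = (function, list(deps))

    def run(self):
        done = {}  # node name -> result
        with ThreadPoolExecutor() as pool:
            while len(done) < len(self.nodes):
                # A node is ready once all of its upstream dependencies have results.
                ready = [name for name, (_, deps) in self.nodes.items()
                         if name not in done and all(d in done for d in deps)]
                if not ready:
                    raise RuntimeError("cyclic or unsatisfiable dependencies")
                futures = {}
                for name in ready:
                    function, deps = self.nodes[name]
                    futures[name] = pool.submit(function, *[done[d] for d in deps])
                for name, future in futures.items():
                    done[name] = future.result()
        return done
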
2 Formulating a Workflow as a Dataflow
The key to parallelism in Copernicus is formulating problems as dataflow networks. This is
illustrated in Fig. 1 for a simple example: free energy perturbation. In this calculation, the
enthalpy and entropy changes associated with an event such as the binding of a molecule to
a protein are calculated using a thermodynamic cycle composed of many individual
simulations. In general, a free energy difference cannot be computed directly since the start and end
conformations sample different parts of phase space. This problem is handled by artificially
separating the change into many stages:[19,20]
each of these requires an individual molecular
dynamics simulation so the difference between adjacent points is small enough for them to
sample overlapping states. When simulations are finished, post-processing of the combined
output yields the free energy. Clearly, the individual simulations can be run in parallel. This
is apparent from the diagram of Fig. 1, because the links between the nodes denote the flow
of data and explicitly show dependencies. The workflow therefore is a dataflow diagram and
thus can be executed by an algorithm that runs each individual component when its data
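As a usage sketch of the free-energy workflow described above (again illustrative only:
simulate_stage and free_energy are placeholder names rather than Copernicus functions, and
DataflowGraph is the toy class sketched in the introduction), each stage is an independent
node and the post-processing node depends on all of them, so the stages run in parallel and
the free energy is evaluated as soon as they finish.

def simulate_stage(lam):
    """Placeholder for one molecular dynamics run at coupling parameter lambda."""
    return {"lambda": lam, "du": 0.1 * lam}

def free_energy(*stage_outputs):
    """Placeholder for the post-processing that combines all stage outputs."""
    return sum(s["du"] for s in stage_outputs)

graph = DataflowGraph()                      # toy class from the sketch in the introduction
lambdas = [0.0, 0.25, 0.5, 0.75, 1.0]
for i, lam in enumerate(lambdas):
    graph.add(f"stage_{i}", lambda lam=lam: simulate_stage(lam))   # no dependencies
graph.add("delta_g", free_energy,
          deps=[f"stage_{i}" for i in range(len(lambdas))])        # depends on every stage
results = graph.run()
print(results["delta_g"])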

Frequently Asked Questions
Q1. What contributions have the authors mentioned in the paper "Molecular Simulation Workflows as Parallel Algorithms: The Execution Engine of Copernicus, a Distributed High-Performance Computing Platform"?

The authors describe how the distributed execution framework Copernicus allows sampling algorithms such as free energy perturbation, Markov state modeling, metadynamics or milestoning to be expressed as generic workflows (dataflow programs), and they show how these workflows are executed efficiently with many loosely coupled simulations using either distributed or parallel resources.

In order to enable dynamic execution (such as iterations and conditionals), two types of dynamism are supported in the dataflow network. 

Relaxation simulations of 25 ps at 300 K with dihedral restraints (4000 kJ mol⁻¹ rad⁻²) were used to generate 20 structures from each trajectory, all of which were run for 30 fs without restraints.

The dataflow network formalism also enables more sophisticated approaches such as altering the simulation setup to achieve more efficient overlap with a different distribution of stages based on short initial runs (known as adaptive lambda spacing). 
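
One generic way such a redistribution could be computed (a sketch under assumptions, not the scheme implemented in Copernicus; respace_lambdas is a hypothetical helper) is to place the stage boundaries so that each stage carries an equal share of the free-energy change estimated from the short initial runs.

import numpy as np

def respace_lambdas(lambdas, dg_per_stage, n_stages=None):
    """Place new lambda values so each stage carries roughly the same |dG|,
    which tends to even out the overlap between adjacent stages."""
    lambdas = np.asarray(lambdas, dtype=float)
    work = np.abs(np.asarray(dg_per_stage, dtype=float))
    cumulative = np.concatenate(([0.0], np.cumsum(work)))  # cumulative |dG| along lambda
    n_stages = n_stages or (len(lambdas) - 1)
    targets = np.linspace(0.0, cumulative[-1], n_stages + 1)
    # Interpolate lambda as a function of cumulative |dG| and sample it at equal steps.
    return np.interp(targets, cumulative, lambdas)

# Stages near lambda = 1 carried most of the change, so the new points cluster there.
new_lambdas = respace_lambdas([0.0, 0.25, 0.5, 0.75, 1.0], [1.0, 1.5, 3.0, 6.0])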

The data in the dataflow program flows from output sockets to input sockets, both of which are strongly typed: the type of an input socket on a function instance must match the type of the output socket to which it is connected. 
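
The strongly typed socket idea can be illustrated with a short schematic sketch (the names Socket, FunctionInstance and connect are invented for this illustration and are not the Copernicus API): connecting an output socket to an input socket of a different type fails immediately, before any simulation is run.

from dataclasses import dataclass, field

@dataclass
class Socket:
    name: str
    dtype: type           # declared type of the value this socket carries
    value: object = None

@dataclass
class FunctionInstance:
    name: str
    inputs: dict = field(default_factory=dict)    # socket name -> Socket
    outputs: dict = field(default_factory=dict)   # socket name -> Socket

def connect(src: Socket, dst: Socket) -> None:
    """Connect an output socket to an input socket, enforcing that the types match."""
    if src.dtype is not dst.dtype:
        raise TypeError(f"cannot connect {src.name} ({src.dtype.__name__}) "
                        f"to {dst.name} ({dst.dtype.__name__})")
    dst.value = src.value  # a real engine would record a data dependency here

# A float-valued output can feed a float-valued input, but not an int-valued one.
mdrun = FunctionInstance("mdrun", outputs={"energy": Socket("energy", float, -512.3)})
analysis = FunctionInstance("analysis", inputs={"energy": Socket("energy", float)})
connect(mdrun.outputs["energy"], analysis.inputs["energy"])   # ok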

An advantage of using explicit dataflow descriptions is that program execution becomes transparent to the user; any value can be examined or set at any time. 

For large solvated protein complexes, the Copernicus swarms module can simultaneously execute over 10,000 short simulations if given a sufficient pool of workers. 

The second type of dynamism is associated with arrays: instance arrays will instantiate as many copies of a function as there are inputs in its array of function inputs; the output is an array of function outputs (Fig. 4). 
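
Schematically, an instance array behaves like a parallel map over an array of function inputs, as in the following sketch (instance_array and run_stage are illustrative names, not part of Copernicus):

from concurrent.futures import ThreadPoolExecutor

def instance_array(function, input_array):
    """Schematic 'instance array': create one instance of `function` per element of
    `input_array`; the result is the array of the corresponding function outputs."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(function, input_array))

def run_stage(settings):
    """Placeholder for a function whose instances each run one simulation."""
    return {"lambda": settings["lambda"], "dg": 0.0}

# The number of instances is decided at run time by the length of the input array.
inputs = [{"lambda": lam} for lam in (0.0, 0.25, 0.5, 0.75, 1.0)]
outputs = instance_array(run_stage, inputs)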

Copernicus is also capable of using e.g. a 10,000-core worker allocation to execute 100 separate function instances each needing 100 cores. 

This, combined with the high level of parallelism inherent in many hundreds of trajectories, makes Markov state modeling (MSM) a very attractive sampling method for distributed computing.

The easiest way to illustrate this is to use an example:> cpcc get fe.iter_lj_1.out.dgHere, the authors use the top-level function fe, in which the authors access the instance called iter_lj_1, which is the first iteration of the Lennard-Jones decoupling.