
http://www.diva-portal.org
Postprint
This is the accepted version of a paper published in Journal of Chemical Theory and Computation. This
paper has been peer-reviewed but does not include the final publisher proof-corrections or journal
pagination.
Citation for the original published paper (version of record):
Pronk, S., Pouya, I., Lundborg, M., Rotskoff, G., Wesén, B. et al. (2015)
Molecular Simulation Workflows as Parallel Algorithms: The Execution Engine of Copernicus, a
Distributed High-Performance Computing Platform.
Journal of Chemical Theory and Computation, 11(6): 2600-2608
http://dx.doi.org/10.1021/acs.jctc.5b00234
Access to the published version may require subscription.
N.B. When citing this work, cite the original published paper.
Permanent link to this version:
http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-170691

Molecular Simulation Workflows as Parallel
Algorithms: The Execution Engine of
Copernicus, a Distributed High-Performance
Computing Platform
Sander Pronk,§ Iman Pouya,§ Magnus Lundborg, Grant Rotskoff, Björn Wesén,
Peter Kasson, and Erik Lindahl*

Swedish eScience Research Center, Department of Theoretical Physics, KTH Royal
Institute of Technology, Stockholm, Sweden; Department of Biochemistry and Biophysics,
Science for Life Laboratory, Stockholm University; and Department of Molecular
Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA

*To whom correspondence should be addressed. E-mail: erik.lindahl@scilifelab.se
§These authors contributed equally to this work.

Abstract

Computational chemistry and other simulation fields depend critically on computing
resources, but few problems scale efficiently to the hundreds of thousands of processors
available in current supercomputers, in particular for molecular dynamics. This has turned
into a bottleneck as new hardware generations primarily provide more processing units
rather than making individual units much faster, which simulation applications are
addressing by increasingly focusing on sampling with algorithms such as free energy
perturbation, Markov state modeling, metadynamics or milestoning. All these rely on
combining results from multiple simulations into a single observation. They are potentially
powerful approaches that aim to directly predict experimental observables, but this comes
at the expense of added complexity in selecting sampling strategies and keeping track of
dozens to thousands of simulations and their dependencies. Here, we describe how the
distributed execution framework Copernicus allows the expression of such algorithms in
generic workflows: dataflow programs. Because dataflow algorithms explicitly state the
dependencies of each constituent part, algorithms only need to be described on a conceptual
level, after which the execution is maximally parallel. The fully automated execution
facilitates the optimization of these algorithms with adaptive sampling, where undersampled
regions are automatically detected and targeted without user intervention. We show how
several such algorithms can be formulated for computational chemistry problems, and how
they are executed efficiently with many loosely coupled simulations using either distributed
or parallel resources with Copernicus.
1 Introduction
The performance of statistical mechanics-based simulations in chemistry and many other
fields has increased by several orders of magnitude with faster hardware and highly tuned
simulation codes.[1–3]
Conceptually, algorithms such as molecular dynamics are inherently
parallelizable since particle interactions can be evaluated independently, but in practice it is
a very challenging problem when the evaluation has to be iterated for billions of dependent
time steps that only take a fraction of a millisecond each. Large efforts have been invested
in improving performance through simplified models, new algorithms, and better scaling of
simulations,[4–7] not to mention special-purpose hardware.[8,9]

Most force fields employed in molecular dynamics are based on representations developed
in the 1960s that only require a few dozen floating-point operations per interaction.[10]
This provides high simulation performance, but it limits scaling for small problems that
are common in biomolecular research. With a few thousand particles there are not enough
floating-point operations to spread over 100,000 cores in less than a millisecond, no matter
what algorithm or code is used. This limit to strong scaling is typically expressed in
a minimum number of atoms/core and is becoming an increasingly challenging barrier as
computing resources increase in core numbers. Computational power is expected to continue
to increase exponentially, but it will predominantly come from increased numbers of
processing units rather than faster individual units, including the use of GPUs and similar
accelerators.[11]
One potential solution to this problem derives from the higher-level analyses commonly
used for simulations. In computational chemistry and related disciplines, a study almost
never relies on a single simulation trajectory; multiple runs are used even in simple studies
for uncertainty quantification and for comparison between conditions. Furthermore, sampling
and ensemble techniques[12–17] are increasingly used to combine many simulation trajectories
into a higher-level model that is then compared to experimental data. This presents
an opportunity for increased parallelism across simulation trajectories as well as within each
trajectory. Simulation trajectories need not be completely independent, as some algorithms
rely upon data exchange between simulations, but they should be loosely coupled compared
to the tight coupling within simulations. This looser coupling permits efficient parallelization
over much larger core counts and potentially higher-latency interconnects than would
be practical for a single simulation trajectory with a comparable number of atoms.
In this paper, we describe the execution engine of Copernicus:[18] a parallel computation
platform for large-scale sampling applications. The execution is based on formulating
high-level workflows in a dataflow algorithm. These workflows are then analyzed for
dependencies, and all independent elements will automatically be executed in parallel. Copernicus has a
fully modular structure that is independent of the simulation engine used to run individual
trajectories. We have initially focused on writing plugins for the Gromacs molecular
simulation toolkit, but this can easily be exchanged for any other implementation. Similarly, the
core Copernicus framework is designed to permit easy implementation of a wide variety of
sampling algorithms, which are implemented as plugins for the dataflow execution engine.
As described below, the Copernicus formalism allows straightforward specification of any
sampling or statistical-mechanics approach; once this has been done, the dataflow engine
takes care of scheduling, executing, and processing the simulations required for the problem.
The advantage of Copernicus compared to a completely general-purpose dataflow engine is
that the structure of statistical-mechanics simulations is infused into the design of the engine,
so it works much better "out of the box" for such applications.
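
To make the execution model concrete, the following is a minimal Python sketch of how a
dataflow graph with explicit dependencies can be executed with maximal parallelism. It is an
illustration only, not the Copernicus implementation: the class DataflowGraph and its methods
are invented for this example. The engine repeatedly launches every node whose inputs are
already available, so independent nodes run concurrently.

from concurrent.futures import ThreadPoolExecutor

class DataflowGraph:
    """Toy dataflow graph: nodes are functions, edges are data dependencies."""

    def __init__(self):
        self.nodes = {}  # node name -> (function, list of upstream node names)

    def add(self, name, function, deps=()):
        self.nodes[name] = (function, list(deps))

    def run(self):
        done = {}  # node name -> result
        with ThreadPoolExecutor() as pool:
            while len(done) < len(self.nodes):
                # A node is ready once all of its upstream dependencies have results.
                ready = [name for name, (_, deps) in self.nodes.items()
                         if name not in done and all(d in done for d in deps)]
                if not ready:
                    raise RuntimeError("cyclic or unsatisfiable dependencies")
                futures = {}
                for name in ready:
                    function, deps = self.nodes[name]
                    futures[name] = pool.submit(function, *[done[d] for d in deps])
                for name, future in futures.items():
                    done[name] = future.result()
        return done
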
2 Formulating a Workflow as a Dataflow
The key to parallelism in Copernicus is formulating problems as dataflow networks. This is
illustrated in Fig. 1 for a simple example: free energy perturbation. In this calculation, the
enthalpy and entropy changes associated with an event such as the binding of a molecule to
a protein are calculated using a thermodynamic cycle composed of many individual
simulations. In general, a free energy difference cannot be computed directly since the start and end
conformations sample different parts of phase space. This problem is handled by artificially
separating the change into many stages:[19,20]
each of these requires an individual molecular
dynamics simulation so the difference between adjacent points is small enough for them to
sample overlapping states. When simulations are finished, post-processing of the combined
output yields the free energy. Clearly, the individual simulations can be run in parallel. This
is apparent from the diagram of Fig. 1, because the links between the nodes denote the flow
of data and explicitly show dependencies. The workflow therefore is a dataflow diagram and
thus can be executed by an algorithm that runs each individual component when its data
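As a usage sketch of the free-energy workflow described above (again illustrative only:
simulate_stage and free_energy are placeholder names rather than Copernicus functions, and
DataflowGraph is the toy class sketched in the introduction), each stage is an independent
node and the post-processing node depends on all of them, so the stages run in parallel and
the free energy is evaluated as soon as they finish.

def simulate_stage(lam):
    """Placeholder for one molecular dynamics run at coupling parameter lambda."""
    return {"lambda": lam, "du": 0.1 * lam}

def free_energy(*stage_outputs):
    """Placeholder for the post-processing that combines all stage outputs."""
    return sum(s["du"] for s in stage_outputs)

graph = DataflowGraph()                      # toy class from the sketch in the introduction
lambdas = [0.0, 0.25, 0.5, 0.75, 1.0]
for i, lam in enumerate(lambdas):
    graph.add(f"stage_{i}", lambda lam=lam: simulate_stage(lam))   # no dependencies
graph.add("delta_g", free_energy,
          deps=[f"stage_{i}" for i in range(len(lambdas))])        # depends on every stage
results = graph.run()
print(results["delta_g"])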

Frequently Asked Questions
Q1. What contributions have the authors mentioned in the paper "Molecular Simulation Workflows as Parallel Algorithms: The Execution Engine of Copernicus, a Distributed High-Performance Computing Platform"?

The authors describe how the distributed execution framework Copernicus allows sampling algorithms such as free energy perturbation, Markov state modeling, metadynamics or milestoning to be expressed as generic workflows (dataflow programs), and they show how these workflows are executed efficiently with many loosely coupled simulations using either distributed or parallel resources.

In order to enable dynamic execution (such as iterations and conditionals), two types of dynamism are supported in the dataflow network. 

Relaxation simulations of 25 ps at 300 K with dihedral restraints (4000 kJ mol⁻¹ rad⁻²) were used to generate 20 structures from each trajectory, all of which were run for 30 fs without restraints.

The dataflow network formalism also enables more sophisticated approaches such as altering the simulation setup to achieve more efficient overlap with a different distribution of stages based on short initial runs (known as adaptive lambda spacing). 
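
One generic way such a redistribution could be computed (a sketch under assumptions, not the scheme implemented in Copernicus; respace_lambdas is a hypothetical helper) is to place the stage boundaries so that each stage carries an equal share of the free-energy change estimated from the short initial runs.

import numpy as np

def respace_lambdas(lambdas, dg_per_stage, n_stages=None):
    """Place new lambda values so each stage carries roughly the same |dG|,
    which tends to even out the overlap between adjacent stages."""
    lambdas = np.asarray(lambdas, dtype=float)
    work = np.abs(np.asarray(dg_per_stage, dtype=float))
    cumulative = np.concatenate(([0.0], np.cumsum(work)))  # cumulative |dG| along lambda
    n_stages = n_stages or (len(lambdas) - 1)
    targets = np.linspace(0.0, cumulative[-1], n_stages + 1)
    # Interpolate lambda as a function of cumulative |dG| and sample it at equal steps.
    return np.interp(targets, cumulative, lambdas)

# Stages near lambda = 1 carried most of the change, so the new points cluster there.
new_lambdas = respace_lambdas([0.0, 0.25, 0.5, 0.75, 1.0], [1.0, 1.5, 3.0, 6.0])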

The data in the dataflow program flows from output sockets to input sockets, both of which are strongly typed: the type of an input socket on a function instance must match the type of the output socket to which it is connected. 
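
The strongly typed socket idea can be illustrated with a short schematic sketch (the names Socket, FunctionInstance and connect are invented for this illustration and are not the Copernicus API): connecting an output socket to an input socket of a different type fails immediately, before any simulation is run.

from dataclasses import dataclass, field

@dataclass
class Socket:
    name: str
    dtype: type           # declared type of the value this socket carries
    value: object = None

@dataclass
class FunctionInstance:
    name: str
    inputs: dict = field(default_factory=dict)    # socket name -> Socket
    outputs: dict = field(default_factory=dict)   # socket name -> Socket

def connect(src: Socket, dst: Socket) -> None:
    """Connect an output socket to an input socket, enforcing that the types match."""
    if src.dtype is not dst.dtype:
        raise TypeError(f"cannot connect {src.name} ({src.dtype.__name__}) "
                        f"to {dst.name} ({dst.dtype.__name__})")
    dst.value = src.value  # a real engine would record a data dependency here

# A float-valued output can feed a float-valued input, but not an int-valued one.
mdrun = FunctionInstance("mdrun", outputs={"energy": Socket("energy", float, -512.3)})
analysis = FunctionInstance("analysis", inputs={"energy": Socket("energy", float)})
connect(mdrun.outputs["energy"], analysis.inputs["energy"])   # ok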

An advantage of using explicit dataflow descriptions is that program execution becomes transparent to the user; any value can be examined or set at any time. 

For large solvated protein complexes, the Copernicus swarms module can simultaneously execute over 10,000 short simulations if given a sufficient pool of workers. 

The second type of dynamism is associated with arrays: instance arrays will instantiate as many copies of a function as there are inputs in its array of function inputs; the output is an array of function outputs (Fig. 4). 
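
Schematically, an instance array behaves like a parallel map over an array of function inputs, as in the following sketch (instance_array and run_stage are illustrative names, not part of Copernicus):

from concurrent.futures import ThreadPoolExecutor

def instance_array(function, input_array):
    """Schematic 'instance array': create one instance of `function` per element of
    `input_array`; the result is the array of the corresponding function outputs."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(function, input_array))

def run_stage(settings):
    """Placeholder for a function whose instances each run one simulation."""
    return {"lambda": settings["lambda"], "dg": 0.0}

# The number of instances is decided at run time by the length of the input array.
inputs = [{"lambda": lam} for lam in (0.0, 0.25, 0.5, 0.75, 1.0)]
outputs = instance_array(run_stage, inputs)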

Copernicus is also capable of using e.g. a 10,000-core worker allocation to execute 100 separate function instances each needing 100 cores. 

This, combined with the high level of parallelism inherent in many hundreds of trajectories, makes Markov state modeling (MSM) a very attractive sampling method for distributed computing.

The easiest way to illustrate this is to use an example:> cpcc get fe.iter_lj_1.out.dgHere, the authors use the top-level function fe, in which the authors access the instance called iter_lj_1, which is the first iteration of the Lennard-Jones decoupling.