ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost

J. S. Smith,a O. Isayev*b and A. E. Roitberg*a
Deep learning is revolutionizing many areas of science and technology, especially image, text, and speech
recognition. In this paper, we demonstrate how a deep neural network (NN) trained on quantum
mechanical (QM) DFT calculations can learn an accurate and transferable potential for organic
molecules. We introduce ANAKIN-ME (Accurate NeurAl networK engINe for Molecular Energies) or ANI
for short. ANI is a new method designed with the intent of developing transferable neural network
potentials that utilize a highly-modified version of the Behler and Parrinello symmetry functions to build
single-atom atomic environment vectors (AEV) as a molecular representation. AEVs provide the ability to
train neural networks to data that spans both configurational and conformational space, a feat not
previously accomplished on this scale. We utilized ANI to build a potential called ANI-1, which was
trained on a subset of the GDB databases with up to 8 heavy atoms in order to predict total energies for
organic molecules containing four atom types: H, C, N, and O. To obtain an accelerated but physically
relevant sampling of molecular potential surfaces, we also proposed a Normal Mode Sampling (NMS)
method for generating molecular conformations. Through a series of case studies, we show that ANI-1 is
chemically accurate compared to reference DFT calculations on much larger molecular systems (up to
54 atoms) than those included in the training data set.
1 Introduction
Understanding the energetics of large molecules plays a central
role in the study of chemical and biological systems. However,
because of extreme computational cost, theoretical studies
of these complex systems are often limited to the use of approximate methods, compromising accuracy in exchange for
a speedup in the calculations. One of the grand challenges in
modern theoretical chemistry is designing and implementing
approximations that expedite ab initio methods without loss of
accuracy. Popular strategies include partition of the system of interest into fragments,1,2 linear scaling,3 semi-empirical4–6 (SE) methods or the construction of empirical potentials that have been parameterized to reproduce experimental or accurate ab initio data.
In SE methods, some of the computationally expensive integrals are replaced with empirically determined parameters. This results in a very large speed up. However, the accuracy is also substantially degraded compared to high-level ab initio methods due to the imposed approximations.7 Also, the computational cost of SE methods is still very high compared to classical force fields (FFs), potentially limiting the system size that can be studied.
Classical force fields or empirical interatomic potentials (EPs) simplify the description of interatomic interactions even further by summing components of the bonded, angular, dihedral, and non-bonded contributions fitted to a simple analytical form. EPs can be used in large-scale atomistic simulations with significantly reduced computational cost. More accurate EPs have long been sought after to improve the statistical sampling and accuracy of molecular dynamics (MD) and Monte Carlo (MC) simulations. However, EPs are generally reliable only near equilibrium. These typically nonreactive empirical potentials are widely used for drug design, condensed matter and polymer research.8–11 Thus, such potentials are usually not applicable for investigations of chemical reactions and transition states. One exception is the ReaxFF force field,12 which is capable of studying chemical reactions and transition states. However, ReaxFF, like most reactive force fields, must generally be reparameterized from system to system and therefore lacks an out-of-the-box level of transferability. Furthermore, each application of an FF or EP must be carefully considered, as their accuracy varies among different systems. In fact, performing benchmarks to determine the optimal FF combination for the problem at hand is usually unavoidable. Unfortunately, there are no systematic ways to improve or estimate the transferability of EPs.
a University of Florida, Department of Chemistry, PO Box 117200, Gainesville, FL, USA 32611-7200. E-mail: roitberg@ufl.edu
b University of North Carolina at Chapel Hill, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, Chapel Hill, NC, USA 27599. E-mail: olexandr@olexandrisayev.com
Electronic supplementary information (ESI) available. See DOI: 10.1039/c6sc05720a
Cite this: Chem. Sci., 2017, 8, 3192
Received 31st December 2016
Accepted 7th February 2017
DOI: 10.1039/c6sc05720a
rsc.li/chemical-science

Machine learning (ML) is emerging as a powerful approach to construct various forms of transferable13–15 and non-transferable16,17 atomistic potentials utilizing regression algorithms. ML methods have been successfully applied in a variety of applications in chemistry, including the prediction of reaction pathways,18 QM excited state energies,19 formation energies,20 atomic forces, nuclear magnetic resonance chemical shifts,21 and assisting in the search for novel materials.22 ML potentials have shown promise in predicting molecular energies with QM accuracy with a speed up of as much as 5 orders of magnitude. The key to transferable methods is finding a correct molecular representation that allows and improves learning in the chosen ML method. As discussed by Behler,23 there are three criteria that such representations must adhere to in order to ensure energy conservation and be useful for ML models: they must be rotationally and translationally invariant, the exchange of two identical atoms must yield the same result, and, given a set of atomic positions and types, the representation must describe a molecule's conformation in a unique way. Several such representations have been developed,24–27 but true transferability and extensibility to complex chemical environments, i.e. all degrees of freedom for arbitrary organic molecules, with chemical accuracy has yet to be accomplished.
In 2007, Behler and Parrinello (BP) developed an approximate molecular representation, called symmetry functions (SFs), that takes advantage of chemical locality in order to make neural network potentials25 (NNPs) transferable. These SFs have been successfully applied to chemical reaction studies for a single chemical system or the study of bulk systems such as water. Bartók et al. also suggested an alternative representation called smooth overlap of atomic positions (SOAP), where the similarity between two neighborhood environments is directly defined.28 Very recent work, which introduced a new method known as deep tensor neural networks (DTNNs),15 provides further evidence that NNPs can model a general QM molecular potential when trained to a diverse set of molecular energies. So far, the DTNN model was only trained to small test data sets to show the model could predict molecular energies in specific cases, i.e. equilibrium geometries of organic molecules or the energy along the path of short QM molecular dynamics trajectories. In our experience, training to trajectories can bias the fitness of a model to the specific trajectory used for training, especially along short trajectories. Also, DTNN potentials were not shown to predict energies for systems larger than those included in the training set.
Since the introduction of BP SFs, they have been employed in numerous studies where neural network potentials (NNPs) are trained to molecular total energies sampled from MD data to produce a function that can predict total energies of molecular conformations outside of the training set. In general, the NNPs developed in these studies are non-transferable, aside from bulk materials25,29 and water cases.30 None of the studies that utilize the SFs of Behler and Parrinello have presented a NNP that is truly transferable between complex chemical environments, such as those found in organic molecules, aside from one limited case of all trans-alkanes,31 where non-equilibrium structures and potential surface smoothness are not considered. We suggest two reasons for the lack of transferability of the SFs. Firstly, as originally defined, SFs lack the functional form to create recognizable features (spatial arrangements of atoms found in common organic molecules, e.g. a benzene ring, alkenes, functional groups) in the molecular representation, a problem that can prevent a neural network from learning interactions in one molecule and then transferring its knowledge to another molecule upon prediction. Secondly, the SFs have limited atomic number differentiation, which empirically hinders training in complex chemical environments. In general, the combination of these reasons limits the original SFs to studies of either chemically symmetric systems with one or two atom types or very small single-molecule data sets.
In this work, we present a transferable deep learning32,33 potential that is applicable to complex and diverse molecular systems well beyond the training data set. We introduce ANAKIN-ME (Accurate NeurAl networK engINe for Molecular Energies) or ANI for short. ANI is a new method for developing NNPs that utilizes a modified version of the original SFs to build single-atom atomic environment vectors (AEVs) as a molecular representation. AEVs solve the transferability problems that hindered the original Behler and Parrinello SFs in complex chemical environments. With AEVs, the next goal of ANI becomes to sample a statistically diverse set of molecular interactions, within a domain of interest, during the training of an ANI class potential to produce a transferable NNP. This requires a very large data set that spans molecular conformational and configurational space simultaneously. An ANI potential trained in this way is well suited to predict energies for molecules within the desired training set domain (organic molecules in this paper), and is shown to be extensible to molecules larger than those included in the training set.
ANI uses an inherently parallel computational algorithm. It is implemented in an in-house software package, called NeuroChem, which takes advantage of the computational power of graphics processing units (GPUs) to accelerate the training, testing, and prediction of molecular total energies via an ANI potential. Finally, we show the accuracy of ANI-1 compared to its reference DFT level of theory and, for context, three popular semi-empirical QM methods, AM1, PM6, and DFTB, through four case studies. All case studies consider only organic molecules larger than those ANI-1 was trained on, providing strong evidence of the transferability of ANI-1.
2 Theory and neural network potential design

2.1 Neural network potentials
Deep learning33 is a machine learning model that uses a network of computational neurons organized in layers. Specifically, ANI uses a fully-connected neural network (NN) model in this work. NNs are highly flexible, non-linear functions with optimizable parameters, called weights, which are updated through the computation of analytic derivatives of a cost function with respect to each weight. The data set used to optimize the weights of a NN is called a training set and consists
of inputs and a label, or reference value, for each input. Multi-layered NNs are known as universal function approximators34 because of their ability to fit arbitrary functions. A neural network potential35,36 (NNP) utilizes the regression capabilities of NNs to predict molecular potential surfaces, given only information about the structure and composition of a molecule.
Standard NNPs suffer from many problems that need to be solved before any generalized model can be built. Firstly, training neural networks on molecules with many degrees of freedom (DOF) is difficult because the data requirements grow with each DOF to obtain a good statistical sampling of the potential energy surface. Also, the typical inputs, such as internal coordinates or Coulomb matrices, lack transferability to different molecules since the input size to a neural network must remain constant. Finally, the exchange of two identical atoms in a molecule must lead to the same result.
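For concreteness, the following is a minimal NumPy sketch of such a fully-connected network: a forward pass through hidden layers with a nonlinear activation and a final linear layer producing a single scalar. The layer sizes, the tanh activation, and the random weights are illustrative assumptions, not the ANI-1 architecture.

```python
import numpy as np

def mlp_forward(x, weights, biases):
    """Forward pass of a fully-connected NN: hidden layers apply a
    nonlinear activation; the final layer is linear (one scalar out)."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = np.tanh(a @ W + b)              # illustrative activation choice
    return (a @ weights[-1] + biases[-1]).item()

# Illustrative shapes only: a 32-element input, two hidden layers, one output.
rng = np.random.default_rng(0)
sizes = [32, 16, 16, 1]
weights = [rng.normal(scale=0.1, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]
print(mlp_forward(rng.normal(size=32), weights, biases))
```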
2.2 The ANAKIN-ME model
Heavily modified Behler and Parrinello symmetry functions25 (BPSFs) and their high-dimensional neural network potential model, depicted in Fig. 1, form a base for our ANAKIN-ME (ANI) model. The original BPSFs are used to compute an atomic environment vector (AEV), $\vec{G}_i^X = \{G_1, G_2, G_3, \ldots, G_M\}$, composed of elements, $G_M$, which probe specific regions of an individual atom's radial and angular chemical environment. Each $\vec{G}_i^X$ for the $i$th atom of a molecule with atomic number $X$ is then used as input into a single NNP. The total energy of a molecule, $E_T$, is computed from the outputs, $E_i$, of the atomic number specific NNPs using:

$$E_T = \sum_i^{\text{all atoms}} E_i \quad (1)$$

In this way, $E_T$ has the form of a sum over all $i$ atomic contributions to the total energy. Aside from transferability, an added advantage of this simple summation is that it allows for a near linear scaling in computational complexity with added cores or GPUs, up to the number of atoms in the system of interest.
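A minimal sketch of how eqn (1) assembles the total energy: each atom's AEV is passed through the network for that atom's element, and the atomic energies are summed. The name `atomic_nets` and its callable interface are hypothetical stand-ins for the trained element-specific NNPs (e.g. `mlp_forward` above with trained weights).

```python
def total_energy(species, aevs, atomic_nets):
    """Eqn (1): E_T = sum_i E_i, one atomic contribution per atom.
    species     -- element symbol per atom, e.g. ['O', 'H', 'H']
    aevs        -- atomic environment vector per atom (same order)
    atomic_nets -- dict: element symbol -> callable(aev) -> atomic energy
    """
    return sum(atomic_nets[s](g) for s, g in zip(species, aevs))
```

Because each atomic term is independent, the sum parallelizes trivially over atoms, which is the near linear scaling noted above.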
The $\vec{G}_i^X$ vectors are key to allowing this functional form of the total energy to be utilized. For an atom $i$, $\vec{G}_i^X$ is designed to give a numerical representation, accounting for both radial and angular features, of $i$'s local chemical environment. The local atomic environment approximation is achieved with a piecewise cutoff function:

$$f_C(R_{ij}) = \begin{cases} 0.5 \cos\left(\dfrac{\pi R_{ij}}{R_C}\right) + 0.5 & \text{for } R_{ij} \le R_C \\ 0.0 & \text{for } R_{ij} > R_C \end{cases} \quad (2)$$

Here, $R_{ij}$ is the distance between atoms $i$ and $j$, while $R_C$ is a cutoff radius. As written, $f_C(R_{ij})$ is a continuous function with continuous first derivatives.
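Eqn (2) transcribes directly into code; the following NumPy sketch is vectorized over neighbor distances, with the function and argument names chosen here for illustration.

```python
import numpy as np

def f_cutoff(r_ij, r_c):
    """Eqn (2): 0.5*cos(pi*R_ij/R_c) + 0.5 for R_ij <= R_c, else 0.
    Accepts a scalar distance or an array of distances."""
    r_ij = np.asarray(r_ij, dtype=float)
    inside = 0.5 * np.cos(np.pi * r_ij / r_c) + 0.5
    return np.where(r_ij <= r_c, inside, 0.0)
```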
To probe the local radial environment for an atom $i$, the following radial symmetry function, introduced by Behler and Parrinello, produces the radial elements, $G_m^R$, of $\vec{G}_i^X$:

$$G_m^R = \sum_{j \ne i}^{\text{all atoms}} e^{-\eta (R_{ij} - R_s)^2} f_C(R_{ij}) \quad (3)$$

The index $m$ runs over a set of $\eta$ and $R_s$ parameters. The parameter $\eta$ is used to change the width of the Gaussian distribution, while the purpose of $R_s$ is to shift the center of the peak. In an ANI potential, only a single $\eta$ is used to produce thin Gaussian peaks, and multiple $R_s$ are used to probe outward from the atomic center. The reasoning behind this specific use of parameters is two-fold: firstly, when probing with many small $\eta$ parameters, vector elements can grow to very large values, which are detrimental to the training of NNPs. Secondly, using $R_s$ in this manner allows the probing of very specific regions of the radial environment, which helps with transferability. $G_m^R$, for a set of parameters $M = \{m_1, m_2, m_3, \ldots\} = \{(\eta_1, R_{s_1}), (\eta_2, R_{s_2}), (\eta_3, R_{s_3}), \ldots\}$, is plotted in Fig. 2A. $M$ consists of a constant $\eta$ for all $m$ and multiple $R_s$ parameters, to visualize how each vector element probes its own distinct region of an atom's radial environment.
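The radial elements of eqn (3) can be sketched as follows, reusing `f_cutoff` from above. Following the text, a single eta is paired with an array of R_s shifts; the function name and argument layout are illustrative assumptions, not the NeuroChem implementation.

```python
import numpy as np

def radial_aev(r_i, r_neighbors, eta, r_shifts, r_c):
    """Eqn (3): one radial AEV element per R_s shift, single eta.
    r_i         -- Cartesian position of the central atom, shape (3,)
    r_neighbors -- positions of all other atoms, shape (N, 3)
    """
    d = np.linalg.norm(r_neighbors - r_i, axis=1)   # R_ij for every j != i
    fc = f_cutoff(d, r_c)
    # Sum over neighbors of exp(-eta*(R_ij - R_s)^2) * f_C(R_ij) per shift.
    return np.array([np.sum(np.exp(-eta * (d - rs) ** 2) * fc)
                     for rs in r_shifts])
```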
We made two modifications to the original version of Behler and Parrinello's angular symmetry function to produce one better suited to probing the local angular environment of complex chemical systems. The first addition is $\theta_s$, which allows an arbitrary number of shifts in the angular environment, and the second is a modified exponential factor that allows an $R_s$ parameter to be added. The $R_s$ addition allows the angular environment to be considered within radial shells based on the average of the distances from the neighboring atoms. The effect of these two changes is that AEV elements are generally smaller, because they overlap atoms in different angular regions less, and they provide a distinctive image of various molecular features, a property that assists neural networks in learning the energetics of specific bonding patterns, ring patterns, functional groups, or other molecular features.
Given atoms $i$, $j$, and $k$, an angle $\theta_{ijk}$, centered on atom $i$, is computed along with two distances, $R_{ij}$ and $R_{ik}$. A single element, $G_m^{A_{\mathrm{mod}}}$, of $\vec{G}_i^X$, to probe the angular environment of atom $i$, takes the form of a sum, over all $j$ and $k$ neighboring atom pairs, of the product of a radial and an angular factor:

$$G_m^{A_{\mathrm{mod}}} = 2^{1-\zeta} \sum_{j,k \ne i}^{\text{all atoms}} \left(1 + \cos(\theta_{ijk} - \theta_s)\right)^{\zeta} \exp\left[-\eta \left(\frac{R_{ij} + R_{ik}}{2} - R_s\right)^{2}\right] f_C(R_{ij})\, f_C(R_{ik}) \quad (4)$$

The Gaussian factor, combined with the cutoff functions, allows chemical locality to be exploited in the angular symmetry functions, as in the radial symmetry functions. In this case, the index $m$ runs over four separate parameters: $\zeta$, $\theta_s$, $\eta$, and $R_s$. $\eta$ and $R_s$ serve a similar purpose as in eqn (3). Applying a $\theta_s$ parameter allows probing of specific regions of the angular environment in a similar way as is accomplished with $R_s$ in the radial part. Also, $\zeta$ changes the width of the peaks in the angular environment.
$G_m^{A_{\mathrm{mod}}}$ for several $m$ are plotted in Fig. 2B, while the original angular function is plotted in Fig. 2C.

Fig. 1 Behler and Parrinello's HDNN or HD-atomic NNP model. (A) A scheme showing the algorithmic structure of an atomic number specific neural network potential (NNP). The input molecular coordinates, $\vec{q}$, are used to generate the atomic environment vector, $\vec{G}_i^X$, for atom $i$ with atomic number $X$. $\vec{G}_i^X$ is then fed into a neural network potential (NNP) trained specifically to predict atomic contributions, $E_i^X$, to the total energy, $E_T$. Each $l_k$ represents a hidden layer of the neural network and is composed of nodes denoted $a_j^k$, where $j$ indexes the node. (B) The high-dimensional atomic NNP (HD-atomic NNP) model for a water molecule. $\vec{G}_i^X$ is computed for each atom in the molecule, then input into their respective NNP ($X$) to produce each atom's $E_i^X$, which are summed to give $E_T$.
Fig. 2 Examples of the symmetry functions with different parameter sets. (A) Radial symmetry functions, (B) modified angular symmetry functions and (C) the original Behler and Parrinello angular symmetry functions. These figures all depict the use of multiple shifting parameters for each function, while keeping the other parameters constant.

With the original Behler and Parrinello angular function, only two shifting values were possible in the angular environment, 0 and $\pi$. The modified angular function allows an arbitrary number to be chosen, allowing for better resolution of the angular environment. As with its radial analog, this helps to keep the elements of $\vec{G}_i^X$ small for better NNP performance and allows probing of specific regions of the angular chemical environment.
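A sketch of the modified angular elements in eqn (4) follows, again reusing `f_cutoff`. Following the text, single zeta and eta values are paired with arrays of theta_s and R_s shifts; each unordered neighbor pair (j, k) is summed once, a convention assumed here, and all names are illustrative.

```python
import numpy as np
from itertools import combinations

def angular_aev(r_i, r_neighbors, zeta, theta_shifts, eta, r_shifts, r_c):
    """Eqn (4): one element per (theta_s, R_s) pair, summed over all
    unordered neighbor pairs (j, k) of the central atom i."""
    elements = np.zeros((len(theta_shifts), len(r_shifts)))
    for j, k in combinations(range(len(r_neighbors)), 2):
        v_ij = r_neighbors[j] - r_i
        v_ik = r_neighbors[k] - r_i
        d_ij = np.linalg.norm(v_ij)
        d_ik = np.linalg.norm(v_ik)
        if d_ij > r_c or d_ik > r_c:
            continue                      # f_C vanishes beyond the cutoff
        cos_t = np.dot(v_ij, v_ik) / (d_ij * d_ik)
        theta = np.arccos(np.clip(cos_t, -1.0, 1.0))   # theta_ijk
        fc = f_cutoff(d_ij, r_c) * f_cutoff(d_ik, r_c)
        for a, th_s in enumerate(theta_shifts):
            ang = (1.0 + np.cos(theta - th_s)) ** zeta
            for b, r_s in enumerate(r_shifts):
                rad = np.exp(-eta * ((d_ij + d_ik) / 2.0 - r_s) ** 2)
                elements[a, b] += 2.0 ** (1.0 - zeta) * ang * rad * fc
    return elements.ravel()
```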
Fig. 3 Log–log plots of the training, validation, testing, and a random GDB-10 (molecules with 10 heavy atoms from the GDB-11 database) extensibility testing set of total energy errors vs. increasing number of data points in the training set. The sets of points converge to the final ANI-1 potential presented in this paper, trained on the full ANI-1 data set.
Fig. 4 Relative energy comparisons from random conformations of a random sampling of 134 molecules from GDB-11, all with 10 heavy atoms. There is an average of 62 conformations, and therefore energies, per molecule. Each set of energies for each molecule is shifted such that the lowest energy is at 0. None of the molecules from this set are included in any of the ANI training sets. (A–D) Correlation plots between DFT energies, $E_{\mathrm{ref}}$, and computed energies, $E_{\mathrm{cmp}}$, for ANI-1 and popular semi-empirical QM methods. Each individual molecule's set of energies is shifted such that the lowest energy is at zero. (E) RMS error (kcal mol$^{-1}$) of various ANI potentials, compared to DFT, trained to an increasing data set size. The x-axis represents the maximum size of GDB molecules included in the training set. For example, 4 represents an ANI potential trained to a data set built from the subset of GDB-11 containing all molecules up to 4 heavy atoms.
Fig. 5 The total energies, shifted such that the lowest is zero, calculated for various $\mathrm{C_{10}H_{20}}$ isomers, are compared between DFT with the ωB97X functional and 6-31G(d) basis set, the ANI-1 potential, AM1 semi-empirical, and PM6 semi-empirical methods.

References

Kingma, D. P. and Ba, J., Adam: A Method for Stochastic Optimization.
LeCun, Y., Bengio, Y. and Hinton, G., Deep learning.
Goodfellow, I., Bengio, Y. and Courville, A., Deep Learning.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. and Salakhutdinov, R., Dropout: a simple way to prevent neural networks from overfitting.
Hornik, K., Stinchcombe, M. and White, H., Multilayer feedforward networks are universal approximators.