ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost

J. S. Smith,a O. Isayev*b and A. E. Roitberg*a
Deep learning is revolutionizing many areas of science and technology, especially image, text, and speech
recognition. In this paper, we demonstrate how a deep neural network (NN) trained on quantum
mechanical (QM) DFT calculations can learn an accurate and transferable potential for organic
molecules. We introduce ANAKIN-ME (Accurate NeurAl networK engINe for Molecular Energies) or ANI
for short. ANI is a new method designed with the intent of developing transferable neural network
potentials that utilize a highly-modified version of the Behler and Parrinello symmetry functions to build
single-atom atomic environment vectors (AEV) as a molecular representation. AEVs provide the ability to
train neural networks to data that spans both configurational and conformational space, a feat not
previously accomplished on this scale. We utilized ANI to build a potential called ANI-1, which was
trained on a subset of the GDB databases with up to 8 heavy atoms in order to predict total energies for
organic molecules containing four atom types: H, C, N, and O. To obtain an accelerated but physically
relevant sampling of molecular potential surfaces, we also proposed a Normal Mode Sampling (NMS)
method for generating molecular conformations. Through a series of case studies, we show that ANI-1 is
chemically accurate compared to reference DFT calculations on much larger molecular systems (up to
54 atoms) than those included in the training data set.
1 Introduction
Understanding the energetics of large molecules plays a central
role in the study of chemical and biological systems. However,
because of extreme computational cost, theoretical studies
of these complex systems are often limited to the use of approximate methods, compromising accuracy in exchange for
a speedup in the calculations. One of the grand challenges in
modern theoretical chemistry is designing and implementing
approximations that expedite ab initio methods without loss of
accuracy. Popular strategies include partition of the system of interest into fragments,1,2 linear scaling,3 semi-empirical4–6 (SE) methods or the construction of empirical potentials that have been parameterized to reproduce experimental or accurate ab initio data.
In SE methods, some of the computationally expensive integrals are replaced with empirically determined parameters. This results in a very large speed up. However, the accuracy is also substantially degraded compared to high-level ab initio methods due to the imposed approximations.7 Also, the computational cost of SE methods is still very high compared to classical force fields (FFs), potentially limiting the system size that can be studied.
Classical force fields or empirical interatomic potentials (EPs) simplify the description of interatomic interactions even further by summing components of the bonded, angular, dihedral, and non-bonded contributions fitted to a simple analytical form. EPs can be used in large-scale atomistic simulations with significantly reduced computational cost. More accurate EPs have long been sought after to improve the statistical sampling and accuracy of molecular dynamics (MD) and Monte Carlo (MC) simulations. However, EPs are generally reliable only near equilibrium. These typically nonreactive empirical potentials are widely used for drug design, condensed matter and polymer research.8–11 Thus, such potentials are usually not applicable for investigations of chemical reactions and transition states. One exception is the ReaxFF force field,12 which is capable of studying chemical reactions and transition states. However, ReaxFF, like most reactive force fields, must generally be reparameterized from system to system and therefore lacks an out-of-the-box level of transferability. Furthermore, each application of an FF or EP must be carefully considered, as their accuracy varies among different systems. In fact, performing benchmarks to determine the optimal FF combination for the problem at hand is usually unavoidable. Unfortunately, there are no systematic ways to improve or estimate the transferability of EPs.
a University of Florida, Department of Chemistry, PO Box 117200, Gainesville, FL, USA 32611-7200. E-mail: roitberg@ufl.edu
b University of North Carolina at Chapel Hill, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, Chapel Hill, NC, USA 27599. E-mail: olexandr@olexandrisayev.com
Electronic supplementary information (ESI) available. See DOI: 10.1039/c6sc05720a
Cite this: Chem. Sci., 2017, 8, 3192
Received 31st December 2016
Accepted 7th February 2017
DOI: 10.1039/c6sc05720a
rsc.li/chemical-science

Machine learning (ML) is emerging as a powerful approach to construct various forms of transferable13–15 and non-transferable16,17 atomistic potentials utilizing regression algorithms. ML methods have been successfully applied in a variety of applications in chemistry, including the prediction of reaction pathways,18 QM excited state energies,19 formation energies,20 atomic forces, nuclear magnetic resonance chemical shifts,21 and assisting in the search for novel materials.22 ML potentials have shown promise in predicting molecular energies with QM accuracy with a speed up of as much as 5 orders of magnitude. The key to transferable methods is finding a correct molecular representation that allows and improves learning in the chosen ML method. As discussed by Behler,23 there are three criteria that such representations must adhere to in order to ensure energy conservation and be useful for ML models: they must be rotationally and translationally invariant, the exchange of two identical atoms must yield the same result, and, given a set of atomic positions and types, the representation must describe a molecule's conformation in a unique way. Several such representations have been developed,24–27 but true transferability and extensibility to complex chemical environments, i.e. all degrees of freedom for arbitrary organic molecules, with chemical accuracy has yet to be accomplished.
In 2007, Behler and Parrinello (BP) developed an approximate molecular representation, called symmetry functions (SFs), that takes advantage of chemical locality in order to make neural network potentials25 (NNPs) transferable. These SFs have been successfully applied to chemical reaction studies for a single chemical system or the study of bulk systems such as water. Bartók et al. also suggested an alternative representation called smooth overlap of atomic positions (SOAP), where the similarity between two neighborhood environments is directly defined.28 Very recent work, which introduced a new method known as deep tensor neural networks (DTNNs),15 provides further evidence that NNPs can model a general QM molecular potential when trained to a diverse set of molecular energies. So far, the DTNN model was only trained to small test data sets to show the model could predict molecular energies in specific cases, i.e. equilibrium geometries of organic molecules or the energy along the path of short QM molecular dynamics trajectories. In our experience, training to trajectories can bias the fitness of a model to the specific trajectory used for training, especially along short trajectories. Also, DTNN potentials were not shown to predict energies for systems larger than those included in the training set.
Since the introduction of BP SFs, they have been employed in numerous studies where neural network potentials (NNPs) are trained to molecular total energies sampled from MD data to produce a function that can predict total energies of molecular conformations outside of the training set. In general, the NNPs developed in these studies are non-transferable, aside from bulk materials25,29 and water cases.30 None of the studies that utilize the SFs of Behler and Parrinello have presented a NNP that is truly transferable between complex chemical environments, such as those found in organic molecules, aside from one limited case of all trans-alkanes,31 where non-equilibrium structures and potential surface smoothness are not considered. We suggest two reasons for the lack of transferability of the SFs. Firstly, as originally defined, SFs lack the functional form to create recognizable features (spatial arrangements of atoms found in common organic molecules, e.g. a benzene ring, alkenes, functional groups) in the molecular representation, a problem that can prevent a neural network from learning interactions in one molecule and then transferring its knowledge to another molecule upon prediction. Secondly, the SFs have limited atomic number differentiation, which empirically hinders training in complex chemical environments. In general, the combination of these reasons limits the original SFs to studies of either chemically symmetric systems with one or two atom types or very small single-molecule data sets.
In this work, we present a transferable deep learning32,33 potential that is applicable to complex and diverse molecular systems well beyond the training data set. We introduce ANAKIN-ME (Accurate NeurAl networK engINe for Molecular Energies) or ANI for short. ANI is a new method for developing NNPs that utilizes a modified version of the original SFs to build single-atom atomic environment vectors (AEVs) as a molecular representation. AEVs solve the transferability problems that hindered the original Behler and Parrinello SFs in complex chemical environments. With AEVs, the next goal of ANI becomes to sample a statistically diverse set of molecular interactions, within a domain of interest, during the training of an ANI class potential to produce a transferable NNP. This requires a very large data set that spans molecular conformational and configurational space simultaneously. An ANI potential trained in this way is well suited to predict energies for molecules within the desired training set domain (organic molecules in this paper), and is shown to be extensible to molecules larger than those included in the training set.
ANI uses an inherently parallel computational algorithm. It is implemented in an in-house software package, called NeuroChem, which takes advantage of the computational power of graphics processing units (GPUs) to accelerate the training, testing, and prediction of molecular total energies via an ANI potential. Finally, we show the accuracy of ANI-1 compared to its reference DFT level of theory and, for context, three popular semi-empirical QM methods, AM1, PM6, and DFTB, through four case studies. All case studies consider only organic molecules larger than those ANI-1 was trained on, providing strong evidence of the transferability of ANI-1.
2 Theory and neural network potential design

2.1 Neural network potentials
Deep learning33 is a machine learning model that uses a network of computational neurons organized in layers. Specifically, ANI uses a fully-connected neural network (NN) model in this work. NNs are highly flexible, non-linear functions with optimizable parameters, called weights, which are updated through the computation of analytic derivatives of a cost function with respect to each weight. The data set used to optimize the weights of a NN is called a training set and consists
of inputs and a label, or reference value, for each input. Multi-layered NNs are known as universal function approximators34 because of their ability to fit arbitrary functions. A neural network potential35,36 (NNP) utilizes the regression capabilities of NNs to predict molecular potential surfaces, given only information about the structure and composition of a molecule.
Standard NNPs suffer from many problems that need to be solved before any generalized model can be built. Firstly, training neural networks on molecules with many degrees of freedom (DOF) is difficult because the data requirements grow with each DOF to obtain a good statistical sampling of the potential energy surface. Also, the typical inputs, such as internal coordinates or Coulomb matrices, lack transferability to different molecules since the input size to a neural network must remain constant. Finally, the exchange of two identical atoms in a molecule must lead to the same result.
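For concreteness, the following is a minimal NumPy sketch of such a fully-connected network: a forward pass through hidden layers with a nonlinear activation and a final linear layer producing a single scalar. The layer sizes, the tanh activation, and the random weights are illustrative assumptions, not the ANI-1 architecture.

```python
import numpy as np

def mlp_forward(x, weights, biases):
    """Forward pass of a fully-connected NN: hidden layers apply a
    nonlinear activation; the final layer is linear (one scalar out)."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = np.tanh(a @ W + b)              # illustrative activation choice
    return (a @ weights[-1] + biases[-1]).item()

# Illustrative shapes only: a 32-element input, two hidden layers, one output.
rng = np.random.default_rng(0)
sizes = [32, 16, 16, 1]
weights = [rng.normal(scale=0.1, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]
print(mlp_forward(rng.normal(size=32), weights, biases))
```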
2.2 The ANAKIN-ME model
Heavily modified Behler and Parrinello symmetry functions25 (BPSFs) and their high-dimensional neural network potential model, depicted in Fig. 1, form a base for our ANAKIN-ME (ANI) model. The original BPSFs are used to compute an atomic environment vector (AEV), $\vec{G}_i^X = \{G_1, G_2, G_3, \ldots, G_M\}$, composed of elements, $G_M$, which probe specific regions of an individual atom's radial and angular chemical environment. Each $\vec{G}_i^X$ for the $i$th atom of a molecule with atomic number $X$ is then used as input into a single NNP. The total energy of a molecule, $E_T$, is computed from the outputs, $E_i$, of the atomic number specific NNPs using:

$$E_T = \sum_i^{\text{all atoms}} E_i \quad (1)$$

In this way, $E_T$ has the form of a sum over all $i$ atomic contributions to the total energy. Aside from transferability, an added advantage of this simple summation is that it allows for a near linear scaling in computational complexity with added cores or GPUs, up to the number of atoms in the system of interest.
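A minimal sketch of how eqn (1) assembles the total energy: each atom's AEV is passed through the network for that atom's element, and the atomic energies are summed. The name `atomic_nets` and its callable interface are hypothetical stand-ins for the trained element-specific NNPs (e.g. `mlp_forward` above with trained weights).

```python
def total_energy(species, aevs, atomic_nets):
    """Eqn (1): E_T = sum_i E_i, one atomic contribution per atom.
    species     -- element symbol per atom, e.g. ['O', 'H', 'H']
    aevs        -- atomic environment vector per atom (same order)
    atomic_nets -- dict: element symbol -> callable(aev) -> atomic energy
    """
    return sum(atomic_nets[s](g) for s, g in zip(species, aevs))
```

Because each atomic term is independent, the sum parallelizes trivially over atoms, which is the near linear scaling noted above.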
The $\vec{G}_i^X$ vectors are key to allowing this functional form of the total energy to be utilized. For an atom $i$, $\vec{G}_i^X$ is designed to give a numerical representation, accounting for both radial and angular features, of $i$'s local chemical environment. The local atomic environment approximation is achieved with a piecewise cutoff function:

$$f_C(R_{ij}) = \begin{cases} 0.5 \cos\left(\dfrac{\pi R_{ij}}{R_C}\right) + 0.5 & \text{for } R_{ij} \le R_C \\ 0.0 & \text{for } R_{ij} > R_C \end{cases} \quad (2)$$

Here, $R_{ij}$ is the distance between atoms $i$ and $j$, while $R_C$ is a cutoff radius. As written, $f_C(R_{ij})$ is a continuous function with continuous first derivatives.
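Eqn (2) transcribes directly into code; the following NumPy sketch is vectorized over neighbor distances, with the function and argument names chosen here for illustration.

```python
import numpy as np

def f_cutoff(r_ij, r_c):
    """Eqn (2): 0.5*cos(pi*R_ij/R_c) + 0.5 for R_ij <= R_c, else 0.
    Accepts a scalar distance or an array of distances."""
    r_ij = np.asarray(r_ij, dtype=float)
    inside = 0.5 * np.cos(np.pi * r_ij / r_c) + 0.5
    return np.where(r_ij <= r_c, inside, 0.0)
```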
To probe the local radial environment for an atom $i$, the following radial symmetry function, introduced by Behler and Parrinello, produces the radial elements, $G_m^R$, of $\vec{G}_i^X$:

$$G_m^R = \sum_{j \ne i}^{\text{all atoms}} e^{-\eta (R_{ij} - R_s)^2} f_C(R_{ij}) \quad (3)$$

The index $m$ runs over a set of $\eta$ and $R_s$ parameters. The parameter $\eta$ is used to change the width of the Gaussian distribution, while the purpose of $R_s$ is to shift the center of the peak. In an ANI potential, only a single $\eta$ is used to produce thin Gaussian peaks, and multiple $R_s$ are used to probe outward from the atomic center. The reasoning behind this specific use of parameters is two-fold: firstly, when probing with many small $\eta$ parameters, vector elements can grow to very large values, which are detrimental to the training of NNPs. Secondly, using $R_s$ in this manner allows the probing of very specific regions of the radial environment, which helps with transferability. $G_m^R$, for a set of parameters $M = \{m_1, m_2, m_3, \ldots\} = \{(\eta_1, R_{s_1}), (\eta_2, R_{s_2}), (\eta_3, R_{s_3}), \ldots\}$, is plotted in Fig. 2A. $M$ consists of a constant $\eta$ for all $m$ and multiple $R_s$ parameters, to visualize how each vector element probes its own distinct region of an atom's radial environment.
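The radial elements of eqn (3) can be sketched as follows, reusing `f_cutoff` from above. Following the text, a single eta is paired with an array of R_s shifts; the function name and argument layout are illustrative assumptions, not the NeuroChem implementation.

```python
import numpy as np

def radial_aev(r_i, r_neighbors, eta, r_shifts, r_c):
    """Eqn (3): one radial AEV element per R_s shift, single eta.
    r_i         -- Cartesian position of the central atom, shape (3,)
    r_neighbors -- positions of all other atoms, shape (N, 3)
    """
    d = np.linalg.norm(r_neighbors - r_i, axis=1)   # R_ij for every j != i
    fc = f_cutoff(d, r_c)
    # Sum over neighbors of exp(-eta*(R_ij - R_s)^2) * f_C(R_ij) per shift.
    return np.array([np.sum(np.exp(-eta * (d - rs) ** 2) * fc)
                     for rs in r_shifts])
```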
We made two modifications to the original version of Behler and Parrinello's angular symmetry function to produce one better suited to probing the local angular environment of complex chemical systems. The first addition is $\theta_s$, which allows an arbitrary number of shifts in the angular environment, and the second is a modified exponential factor that allows an $R_s$ parameter to be added. The $R_s$ addition allows the angular environment to be considered within radial shells based on the average of the distances from the neighboring atoms. The effect of these two changes is that AEV elements are generally smaller, because they overlap atoms in different angular regions less, and they provide a distinctive image of various molecular features, a property that assists neural networks in learning the energetics of specific bonding patterns, ring patterns, functional groups, or other molecular features.
Given atoms $i$, $j$, and $k$, an angle $\theta_{ijk}$, centered on atom $i$, is computed along with two distances, $R_{ij}$ and $R_{ik}$. A single element, $G_m^{A_{\mathrm{mod}}}$, of $\vec{G}_i^X$, to probe the angular environment of atom $i$, takes the form of a sum, over all $j$ and $k$ neighboring atom pairs, of the product of a radial and an angular factor:

$$G_m^{A_{\mathrm{mod}}} = 2^{1-\zeta} \sum_{j,k \ne i}^{\text{all atoms}} \left(1 + \cos(\theta_{ijk} - \theta_s)\right)^{\zeta} \exp\left[-\eta \left(\frac{R_{ij} + R_{ik}}{2} - R_s\right)^{2}\right] f_C(R_{ij})\, f_C(R_{ik}) \quad (4)$$

The Gaussian factor, combined with the cutoff functions, allows chemical locality to be exploited in the angular symmetry functions, as in the radial symmetry functions. In this case, the index $m$ runs over four separate parameters: $\zeta$, $\theta_s$, $\eta$, and $R_s$. $\eta$ and $R_s$ serve a similar purpose as in eqn (3). Applying a $\theta_s$ parameter allows probing of specific regions of the angular environment in a similar way as is accomplished with $R_s$ in the radial part. Also, $\zeta$ changes the width of the peaks in the angular environment.
$G_m^{A_{\mathrm{mod}}}$ for several $m$ are plotted in Fig. 2B, while the original angular function is plotted in Fig. 2C.

Fig. 1 Behler and Parrinello's HDNN or HD-atomic NNP model. (A) A scheme showing the algorithmic structure of an atomic number specific neural network potential (NNP). The input molecular coordinates, $\vec{q}$, are used to generate the atomic environment vector, $\vec{G}_i^X$, for atom $i$ with atomic number $X$. $\vec{G}_i^X$ is then fed into a neural network potential (NNP) trained specifically to predict atomic contributions, $E_i^X$, to the total energy, $E_T$. Each $l_k$ represents a hidden layer of the neural network and is composed of nodes denoted $a_j^k$, where $j$ indexes the node. (B) The high-dimensional atomic NNP (HD-atomic NNP) model for a water molecule. $\vec{G}_i^X$ is computed for each atom in the molecule, then input into their respective NNP ($X$) to produce each atom's $E_i^X$, which are summed to give $E_T$.
Fig. 2 Examples of the symmetry functions with different parameter sets. (A) Radial symmetry functions, (B) modified angular symmetry functions and (C) the original Behler and Parrinello angular symmetry functions. These figures all depict the use of multiple shifting parameters for each function, while keeping the other parameters constant.

With the original Behler and Parrinello angular function, only two shifting values were possible in the angular environment, 0 and $\pi$. The modified angular function allows an arbitrary number to be chosen, allowing for better resolution of the angular environment. As with its radial analog, this helps to keep the elements of $\vec{G}_i^X$ small for better NNP performance and allows probing of specific regions of the angular chemical environment.
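A sketch of the modified angular elements in eqn (4) follows, again reusing `f_cutoff`. Following the text, single zeta and eta values are paired with arrays of theta_s and R_s shifts; each unordered neighbor pair (j, k) is summed once, a convention assumed here, and all names are illustrative.

```python
import numpy as np
from itertools import combinations

def angular_aev(r_i, r_neighbors, zeta, theta_shifts, eta, r_shifts, r_c):
    """Eqn (4): one element per (theta_s, R_s) pair, summed over all
    unordered neighbor pairs (j, k) of the central atom i."""
    elements = np.zeros((len(theta_shifts), len(r_shifts)))
    for j, k in combinations(range(len(r_neighbors)), 2):
        v_ij = r_neighbors[j] - r_i
        v_ik = r_neighbors[k] - r_i
        d_ij = np.linalg.norm(v_ij)
        d_ik = np.linalg.norm(v_ik)
        if d_ij > r_c or d_ik > r_c:
            continue                      # f_C vanishes beyond the cutoff
        cos_t = np.dot(v_ij, v_ik) / (d_ij * d_ik)
        theta = np.arccos(np.clip(cos_t, -1.0, 1.0))   # theta_ijk
        fc = f_cutoff(d_ij, r_c) * f_cutoff(d_ik, r_c)
        for a, th_s in enumerate(theta_shifts):
            ang = (1.0 + np.cos(theta - th_s)) ** zeta
            for b, r_s in enumerate(r_shifts):
                rad = np.exp(-eta * ((d_ij + d_ik) / 2.0 - r_s) ** 2)
                elements[a, b] += 2.0 ** (1.0 - zeta) * ang * rad * fc
    return elements.ravel()
```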
Fig. 3 Log–log plots of the training, validation, testing, and a random GDB-10 (molecules with 10 heavy atoms from the GDB-11 database) extensibility testing set of total energy errors vs. increasing number of data points in the training set. The sets of points converge to the final ANI-1 potential presented in this paper, trained on the full ANI-1 data set.
Fig. 4 Relative energy comparisons from random conformations of a random sampling of 134 molecules from GDB-11, all with 10 heavy atoms. There is an average of 62 conformations, and therefore energies, per molecule. Each set of energies for each molecule is shifted such that the lowest energy is at 0. None of the molecules from this set are included in any of the ANI training sets. (A–D) Correlation plots between DFT energies, $E_{\mathrm{ref}}$, and computed energies, $E_{\mathrm{cmp}}$, for ANI-1 and popular semi-empirical QM methods. Each individual molecule's set of energies is shifted such that the lowest energy is at zero. (E) RMS error (kcal mol$^{-1}$) of various ANI potentials, compared to DFT, trained to an increasing data set size. The x-axis represents the maximum size of GDB molecules included in the training set. For example, 4 represents an ANI potential trained to a data set built from the subset of GDB-11 containing all molecules up to 4 heavy atoms.
Fig. 5 The total energies, shifted such that the lowest is zero, calculated for various $\mathrm{C_{10}H_{20}}$ isomers, are compared between DFT with the ωB97X functional and 6-31G(d) basis set, the ANI-1 potential, AM1 semi-empirical, and PM6 semi-empirical methods.

References

Kingma, D. P. and Ba, J., Adam: A Method for Stochastic Optimization.
LeCun, Y., Bengio, Y. and Hinton, G., Deep learning.
Goodfellow, I., Bengio, Y. and Courville, A., Deep Learning.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. and Salakhutdinov, R., Dropout: a simple way to prevent neural networks from overfitting.
Hornik, K., Stinchcombe, M. and White, H., Multilayer feedforward networks are universal approximators.