
APPLIED MATHEMATICS

2017 © The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. Distributed under a Creative Commons Attribution NonCommercial License 4.0 (CC BY-NC).
Machine learning of accurate energy-conserving
molecular force fields
Stefan Chmiela,¹ Alexandre Tkatchenko,²,³* Huziel E. Sauceda,³ Igor Poltavsky,² Kristof T. Schütt,¹ Klaus-Robert Müller¹,⁴,⁵*
Using conservation of energy, a fundamental property of closed classical and quantum mechanical systems, we develop an efficient gradient-domain machine learning (GDML) approach to construct accurate molecular force fields using a restricted number of samples from ab initio molecular dynamics (AIMD) trajectories. The GDML implementation is able to reproduce global potential energy surfaces of intermediate-sized molecules with an accuracy of 0.3 kcal mol⁻¹ for energies and 1 kcal mol⁻¹ Å⁻¹ for atomic forces using only 1000 conformational geometries for training. We demonstrate this accuracy for AIMD trajectories of molecules, including benzene, toluene, naphthalene, ethanol, uracil, and aspirin. The challenge of constructing conservative force fields is accomplished in our work by learning in a Hilbert space of vector-valued functions that obey the law of energy conservation. The GDML approach enables quantitative molecular dynamics simulations for molecules at a fraction of the cost of explicit AIMD calculations, thereby allowing the construction of efficient force fields with the accuracy and transferability of high-level ab initio methods.
INTRODUCTION
Within the Born-Oppenheimer (BO) approximation, predictive simulations of properties and functions of molecular systems require an accurate description of the global potential energy hypersurface V_BO(r_1, r_2, …, r_N), where r_i indicates the nuclear Cartesian coordinates. Although V_BO could, in principle, be obtained on the fly using explicit ab initio calculations, more efficient approaches that can access long time scales are required to understand relevant phenomena in large molecular systems. A plethora of classical mechanistic approximations to V_BO have been constructed, in which the parameters are typically fitted to a small set of ab initio calculations or experimental data. Unfortunately, these classical approximations may suffer from a lack of transferability and can yield accurate results only close to the conditions (geometries) they have been fitted to. Alternatively, sophisticated machine learning (ML) approaches that can accurately reproduce the global potential energy surface (PES) for elemental materials (1-9) and small molecules (10-16) have been recently developed (see Fig. 1, A and B) (17). Although potentially very promising, one particular challenge for direct ML fitting of molecular PESs is the large amount of data necessary to obtain an accurate model. Often, many thousands or even millions of atomic configurations are used as training data for ML models. This results in nontransparent models, which are difficult to analyze and may break consistency (18) between energies and forces.
A fundamental property that any force field F_i(r_1, r_2, …, r_N) must satisfy is the conservation of total energy, which implies that

$$\mathbf{F}_i(\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_N) = -\nabla_{\mathbf{r}_i} V(\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_N)$$

Any classical mechanistic
expressions for the potential energy (also denoted as classical force
field) or analytically derivable ML approaches trained on energies sat-
isfy energy conservation by construction. However, even if conserva-
tion of energy is satisfied implicitly within an approximation, this does
not imply that the model will be able to accurately follow the trajectory
of the true ab initio potential, which was used to fit the force field. In
particular, small energy/force inconsistencies between the force field
model and ab initio calculations can lead to unforeseen artifacts in
the PES topology, such as spurious critical points that can give rise
to incorrect molecular dynamics (MD) trajectories. Another fundamental problem is that classical and ML force fields focusing on energy as the main observable have to assume atomic energy additivity, an approximation that is hard to justify from quantum mechanics.
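The conservation requirement can be checked numerically: a force field derived from a potential does zero net work around any closed path, whereas an arbitrary vector field does not. A minimal sketch on a toy two-dimensional potential (the potential, loop, and step counts are our illustrative choices, not from the paper):

```python
import numpy as np

# Toy conservative force field: F = -grad V for an anharmonic 2D potential.
def V(r):
    x, y = r
    return 0.5 * x**2 + 0.25 * y**4

def F(r, h=1e-6):
    # central finite differences for the negative gradient
    g = np.array([(V(r + d) - V(r - d)) / (2 * h)
                  for d in (np.array([h, 0.0]), np.array([0.0, h]))])
    return -g

# Work along a closed circular loop (midpoint rule): ~0 for a conservative
# field, but clearly nonzero for a rotational (nonconservative) field.
t = np.linspace(0.0, 2 * np.pi, 2001)
loop = np.stack([1.5 * np.cos(t), 1.5 * np.sin(t)], axis=1)
work = sum(F(0.5 * (a + b)) @ (b - a) for a, b in zip(loop[:-1], loop[1:]))
work_bad = sum(np.array([-(0.5 * (a + b))[1], (0.5 * (a + b))[0]]) @ (b - a)
               for a, b in zip(loop[:-1], loop[1:]))  # field (-y, x), curl != 0
```

Here `work` vanishes to quadrature precision, while `work_bad` equals twice the enclosed area, which is exactly the kind of spurious behavior an energy-conserving model rules out by construction.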
Here, we present a robust solution to these challenges by construct-
ing an explicitly conservative ML force field, which uses exclusively
atomic gradient information in lieu of atomic (or total) energies. In this
manner, with any number of data samples, the proposed model fulfills
energy conservation by construction. Obviously, the developed ML
force field can be coupled to a heat bath, making the full system (molecule and bath) non-energy-conserving.
We remark that atomic forces are true quantum-mechanical observa-
bles within the BO approximation by virtue of the Hellmann-Feynman
theorem. The energy of a molecular system is recovered by analytic
integration of the force-field kernel (see Fig. 1C). We demonstrate that
our gradient-domain machine learning (GDML) approach is able to
accurately reproduce global PESs of intermediate-sized molecules within 0.3 kcal mol⁻¹ for energies and 1 kcal mol⁻¹ Å⁻¹ for atomic forces relative to the reference data. This accuracy is achieved when
using less than 1000 training geometries to construct the GDML model
and using energy conservation to avoid overfitting and artifacts. Hence,
the GDML approach paves the way for efficient and precise MD simulations with PESs that are obtained with arbitrarily high-level quantum-chemical approaches. We demonstrate the accuracy of GDML by
computing AIMD-quality thermodynamic observables using path-integral MD (PIMD) for eight organic molecules with up to 21 atoms and four chemical elements. Although we use density functional theory (DFT) calculations as reference in this development work, it is possible to use any higher-level quantum-chemical reference data. With state-of-the-art quantum chemistry codes running on current high-performance computers, it is possible to generate accurate reference data for molecules with a few dozen atoms. Here, we focus on intramolecular forces in small- and medium-sized molecules. However, in the future, the GDML model should be combined with an accurate model for intermolecular forces to enable predictive simulations of condensed molecular systems. Widely used classical mechanistic force fields are based on simple harmonic terms for intramolecular degrees of freedom. Our GDML model correctly treats anharmonicities by using no assumptions whatsoever on the analytic form of the interatomic potential energy functions within molecules.

¹Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany. ²Physics and Materials Science Research Unit, University of Luxembourg, L-1511 Luxembourg, Luxembourg. ³Fritz-Haber-Institut der Max-Planck-Gesellschaft, 14195 Berlin, Germany. ⁴Department of Brain and Cognitive Engineering, Korea University, Anam-dong, Seongbuk-gu, Seoul 136-713, Korea. ⁵Max Planck Institute for Informatics, Stuhlsatzenhausweg, 66123 Saarbrücken, Germany.
*Corresponding author. Email: alexandre.tkatchenko@uni.lu (A.T.); klaus-robert.mueller@tu-berlin.de (K.-R.M.)

SCIENCE ADVANCES | RESEARCH ARTICLE
Chmiela et al., Sci. Adv. 2017; 3: e1603015, 5 May 2017
Fig. 1. The construction of ML models: First, reference data from an MD trajectory are sampled. (A) The geometry of each molecule is encoded in a descriptor. This representation introduces elementary transformational invariances of energy and constitutes the first part of the prior. A kernel function then relates all descriptors to form the kernel matrix, the second part of the prior. The kernel function encodes similarity between data points. Our particular choice makes only weak assumptions: It limits the frequency spectrum of the resulting model and adds the energy conservation constraint. Hess, Hessian. (B) A comparable energy model is not able to reproduce the PES to the same level of detail (problem: an energy-based model lacks detail in undersampled regions). (C) These general priors are sufficient to reproduce good estimates from a restricted number of force samples (solution: training in the force domain accurately reproduces the PES topology).
Fig. 2. Modeling the true vector field (leftmost subfigure) based on a small number of vector samples. With GDML, a conservative vector field estimate $\hat{f}_F$ is obtained directly. A naïve estimator $\hat{f}_{\tilde{F}}$ with independent predictions for each element of the output vector is not capable of imposing energy conservation constraints. We perform a Helmholtz decomposition of this nonconservative vector field to show the error component that violates the law of energy conservation. This is the portion of the overall prediction error that was avoided with GDML because of the addition of the energy conservation constraint.

METHODS
The GDML approach explicitly constructs an energy-conserving force field, avoiding the application of the noise-amplifying derivative operator to a parameterized potential energy model (see the Supplementary Materials for details). This can be achieved by directly learning the functional relationship

$$\hat{f}_F : (\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_N) \xrightarrow{\ \mathrm{ML}\ } \mathbf{F}_i \qquad (1)$$

between atomic coordinates and interatomic forces, instead of computing the gradient of the PES (see Fig. 1, C and B). This requires constraining the solution space of all arbitrary vector fields to the subset of energy-conserving gradient fields. The PES can be obtained through direct integration of $\hat{f}_F$ up to an additive constant.
To construct $\hat{f}_F$, we used a generalization of the commonly used kernel ridge regression technique for structured vector fields (see the Supplementary Materials for details) (19-21). GDML solves the normal equation of the ridge estimator in the gradient domain using the Hessian matrix of a kernel as the covariance structure. It maps to all partial forces of a molecule simultaneously (see Fig. 1A)

$$\left(\mathbf{K}_{\mathrm{Hess}(\kappa)} + \lambda \mathbb{I}\right) \vec{\alpha} = \nabla V_{\mathrm{BO}} = -\mathbf{F} \qquad (2)$$
We resorted to the extensive body of research on suitable kernels and descriptors for the energy prediction task (10, 13, 17). For our application, we considered a subclass of the parametric Matérn family (22-24) of (isotropic) kernel functions

$$\kappa : C_{\nu=n+\frac{1}{2}}(d) = \exp\!\left(-\frac{\sqrt{2\nu}\,d}{\sigma}\right) P_n(d), \qquad P_n(d) = \sum_{k=0}^{n} \frac{(n+k)!}{(2n)!} \binom{n}{k} \left(\frac{2\sqrt{2\nu}\,d}{\sigma}\right)^{n-k} \qquad (3)$$

where $d = \|\mathbf{x} - \mathbf{x}'\|$ is the Euclidean distance between two molecule descriptors. It can be regarded as a generalization of the universal Gaussian kernel with an additional smoothness parameter n. Our parameterization n = 2 resembles the Laplacian kernel, as suggested by Hansen et al. (13), while being sufficiently differentiable.
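For n = 2 (ν = 5/2), Eq. 3 reduces to the familiar closed form of the Matérn 5/2 kernel. A short sketch evaluating the general half-integer expression and checking it against that closed form (function and variable names are ours):

```python
import numpy as np
from math import comb, factorial

def matern_half_integer(d, sigma=1.0, n=2):
    """Matern kernel C_{nu = n + 1/2}(d) evaluated directly via Eq. 3."""
    nu = n + 0.5
    s = np.sqrt(2.0 * nu) * d / sigma                  # sqrt(2 nu) d / sigma
    P = sum(factorial(n + k) / factorial(2 * n) * comb(n, k) * (2.0 * s) ** (n - k)
            for k in range(n + 1))                     # polynomial P_n(d)
    return np.exp(-s) * P

# n = 2 recovers the standard Matern-5/2 form: (1 + s + s^2/3) exp(-s)
d = np.linspace(0.0, 5.0, 200)
s = np.sqrt(5.0) * d
closed_form = (1.0 + s + s**2 / 3.0) * np.exp(-s)
```

The same check works for n = 1, which gives the Matérn 3/2 kernel (1 + √3 d/σ) exp(−√3 d/σ); larger n approaches the Gaussian kernel, which is the "generalization" the text refers to.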
To disambiguate Cartesian geometries that are physically
equivalent, we use an input descriptor derived from the Coulomb
matrix (see the Supplementary Materials for details) (10).
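The Coulomb matrix that the descriptor builds on can be sketched as follows. This is the standard definition from ref. 10 (off-diagonal entries Z_i Z_j / |r_i − r_j|, diagonal 0.5 Z_i^2.4); the exact variant GDML derives from it is detailed in the Supplementary Materials:

```python
import numpy as np

def coulomb_matrix(R, Z):
    """Standard Coulomb matrix (ref. 10).
    R: (N, 3) Cartesian positions, Z: (N,) nuclear charges."""
    R, Z = np.asarray(R, float), np.asarray(Z, float)
    D = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
    np.fill_diagonal(D, 1.0)                 # avoid divide-by-zero on diagonal
    M = np.outer(Z, Z) / D                   # Z_i Z_j / |r_i - r_j|
    np.fill_diagonal(M, 0.5 * Z ** 2.4)      # self-interaction term
    return M

# rigid translations and rotations leave all pairwise distances, and hence
# the descriptor, unchanged: physically equivalent geometries coincide
R = np.array([[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]])  # ~water
Z = np.array([8.0, 1.0, 1.0])
M0 = coulomb_matrix(R, Z)
M1 = coulomb_matrix(R + 5.0, Z)              # translated copy, same descriptor
```

This invariance to rigid motions is exactly the disambiguation of physically equivalent Cartesian geometries mentioned in the text.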
The trained force field estimator collects the contributions of the 3N partial derivatives of all M training points to compile the prediction. It takes the form

$$\hat{f}_F(\mathbf{x}) = \sum_{i=1}^{M} \sum_{j=1}^{3N} (\vec{\alpha}_i)_j \frac{\partial}{\partial x_j} \nabla \kappa(\mathbf{x}, \mathbf{x}_i) \qquad (4)$$

and a corresponding energy predictor is obtained by integrating $\hat{f}_F(\mathbf{x})$
with respect to the Cartesian geometry. Because the trained model is a
(fixed) linear combination of kernel functions, integration only affects
the kernel function itself. The expression
$$\hat{f}_E(\mathbf{x}) = \sum_{i=1}^{M} \sum_{j=1}^{3N} (\vec{\alpha}_i)_j \frac{\partial}{\partial x_j} \kappa(\mathbf{x}, \mathbf{x}_i) \qquad (5)$$
for the energy predictor is therefore neither problem-specific nor does it
require retraining.
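The mechanics of Eqs. 2, 4, and 5 can be illustrated in one dimension. The sketch below is a simplified stand-in, not the paper's implementation: it uses a Gaussian kernel instead of the Matérn kernel, a single coordinate instead of a molecular descriptor, and a synthetic potential E(x) = sin x with force f = −E′ = −cos x; hyperparameter values are illustrative.

```python
import numpy as np

# 1D gradient-domain learning: fit forces with the kernel's second derivative
# (Eqs. 2 and 4), then recover the energy, up to a constant, by swapping in the
# once-integrated kernel with the SAME coefficients (Eq. 5), no retraining.
sigma, lam = 1.0, 1e-8
x_train = np.linspace(-3.0, 3.0, 25)
f_train = -np.cos(x_train)                  # force samples only, no energy labels

def k_hess(a, b):
    # d^2/(da db) of kappa(a, b) = exp(-(a - b)^2 / (2 sigma^2))
    d = a - b
    return (1.0 / sigma**2 - d**2 / sigma**4) * np.exp(-d**2 / (2 * sigma**2))

def k_grad(a, b):
    # d/db of kappa; its derivative w.r.t. a equals k_hess, i.e. it is the
    # antiderivative of the force model in the prediction variable
    d = a - b
    return (d / sigma**2) * np.exp(-d**2 / (2 * sigma**2))

K = k_hess(x_train[:, None], x_train[None, :])               # Hessian kernel matrix
alpha = np.linalg.solve(K + lam * np.eye(len(x_train)), f_train)   # ridge solve, Eq. 2

def predict_force(x):                        # analog of Eq. 4
    return k_hess(np.asarray(x)[:, None], x_train[None, :]) @ alpha

def predict_energy(x):                       # analog of Eq. 5
    # E = -(integral of the force model), up to an additive constant
    return -(k_grad(np.asarray(x)[:, None], x_train[None, :]) @ alpha)
```

On a test grid inside the training interval, this reproduces both −cos x and sin x (up to a constant) from force data alone, mirroring how the full model recovers energies without retraining.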
We remark that our PES model is global in the sense that each
molecular descriptor is considered as a whole entity, bypassing the
need for arbitrary partitioning of energy into atomic contributions.
This allows the GDML framework to capture chemical and long-
range interactions. Obviously, long-range electrostatic and van der
Waals interactions that fall within the error of the GDML model
will have to be incorporated with explicit physical models. Other
approaches that use ML to fit PESs such as Gaussian approximation
potentials (3, 8) have been proposed. However, these approaches con-
sider an explicit localization of the contribution of individual atoms to
the total energy. The total energy is expressed as a linear combination
of local environments characterized by a descriptor that acts as a nonunique partitioning function to the total energy. Training on force samples similarly requires the evaluation of kernel derivatives, but with respect to those local environments.

Fig. 3. Efficiency of the GDML predictor versus a model that has been trained on energies. (A) Required number of samples for a force prediction performance of 1 kcal mol⁻¹ Å⁻¹ (MAE) with the energy-based model (gray) and GDML (blue). The energy-based model was not able to achieve the targeted performance with the maximum number of 63,000 samples for aspirin. (B) Force prediction errors for the converged models (same number of partial derivative samples and energy samples). (C) Energy prediction errors for the converged models. All reported prediction errors have been estimated via cross-validation.

Although any partitioning of the total
energy is arbitrary, our molecular total energy is physically meaningful
in that it is related to the atomic force, thus being a measure for the
deflection of every atom from its ground state.
We first demonstrate the impact of the energy conservation constraint on a toy model that can be easily visualized. A nonconservative force model $\hat{f}_{\tilde{F}}$ was trained alongside our GDML model $\hat{f}_F$ on a synthetic potential defined by a two-dimensional harmonic oscillator using the same samples, descriptor, and kernel.
We were interested in a qualitative assessment of the prediction error that
is introduced as a direct result of violating the law of energy conservation.
For this, we uniquely decomposed our naïve estimate

$$\hat{f}_{\tilde{F}} = -\nabla E + \nabla \times A \qquad (6)$$

into a sum of a curl-free (conservative) and a divergence-free (solenoidal) vector field, according to the Helmholtz theorem (see Fig. 2) (25). This was achieved by subsampling $\hat{f}_{\tilde{F}}$ on a regular grid and numerically projecting it onto the closest conservative vector field by solving Poisson's equation (26)

$$\nabla^2 E = -\nabla \cdot \hat{f}_{\tilde{F}} \qquad (7)$$
with Neumann boundary conditions. The remaining solenoidal
field represents the systematic error made by the naïve estimator.
Other than in this example, our GDML approach directly estimates
the conservative vector field and does not require a costly numer-
ical projection on a dense grid of regularly spaced samples.
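The Helmholtz projection can be sketched numerically. For simplicity, the sketch below works on a periodic grid and solves the Poisson equation spectrally (the paper uses Neumann boundary conditions on a finite grid instead); the vector field is a synthetic example of ours with known conservative and solenoidal parts:

```python
import numpy as np

# Helmholtz decomposition of a 2D vector field on a periodic grid:
# F = grad(E) + R with div(R) = 0, via a spectral Poisson solve.
n = 64
k = np.fft.fftfreq(n, d=1.0 / n)             # integer wavenumbers on [0, 2 pi)
KX, KY = np.meshgrid(k, k, indexing="ij")

def ddx(f): return np.real(np.fft.ifft2(1j * KX * np.fft.fft2(f)))
def ddy(f): return np.real(np.fft.ifft2(1j * KY * np.fft.fft2(f)))

x = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
X, Y = np.meshgrid(x, x, indexing="ij")
phi = np.sin(X) * np.cos(Y)                  # scalar potential (conservative part)
psi = np.cos(2 * X) * np.sin(Y)              # stream function (solenoidal part)
Fx, Fy = ddx(phi) - ddy(psi), ddy(phi) + ddx(psi)   # mixed "naive" field

# solve lap(E) = div(F); in Fourier space the Laplacian is -(kx^2 + ky^2)
div_hat = 1j * KX * np.fft.fft2(Fx) + 1j * KY * np.fft.fft2(Fy)
k2 = KX**2 + KY**2
k2[0, 0] = 1.0                               # avoid 0/0 for the mean mode
E_hat = div_hat / (-k2)
E_hat[0, 0] = 0.0
Gx = np.real(np.fft.ifft2(1j * KX * E_hat))  # conservative part = grad(E)
Gy = np.real(np.fft.ifft2(1j * KY * E_hat))
Rx, Ry = Fx - Gx, Fy - Gy                    # remaining solenoidal error field
```

The recovered gradient part matches ∇φ, the remainder is divergence-free, and the conservative part is curl-free, which is the error split shown in Fig. 2.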
RESULTS
We now proceed to evaluate the performance of the GDML ap-
proach by learning and then predicting AIMD trajectories for mol-
ecules, including benzene, uracil, naphthalene, aspirin, salicylic
acid, malonaldehyde, ethanol, and toluene (see table S1 for details
of these molecular data sets). These data sets range in size from 150 k to nearly 1 M conformational geometries with a resolution of 0.5 fs, although only a drastically reduced subset is necessary to train our energy and GDML predictors. The molecules have different sizes, and the molecular PESs exhibit different levels of complexity. The energy range across all data points within a set spans from 20 to 48 kcal mol⁻¹. Force components range from 266 to 570 kcal mol⁻¹ Å⁻¹. The total energy and force labels for each data set were computed using the PBE + vdW-TS electronic structure method (27, 28).
The GDML prediction results are contrasted with the output of a
model that has been trained on energies. Both models use the same
kernel and descriptor, but the hyperparameter search was performed
individually to ensure optimal model selection. The GDML model
for each data set was trained on ~1000 geometries, sampled uniformly
according to the MD@DFT trajectory energy distribution. For the energy model, we multiplied this amount by the number of atoms in one molecule times its three spatial degrees of freedom. This configuration yields equal kernel sizes for both models and therefore equal levels of complexity in terms of the optimization problem. We compare the models on the basis of the required number of samples (Fig. 3A) to achieve a force prediction accuracy of 1 kcal mol⁻¹ Å⁻¹. Furthermore, the prediction accuracies of the force and energy estimates for fully converged models (w.r.t. number of samples) (Fig. 3, B and C) are judged on the basis of the mean absolute error (MAE) and root mean square error performance measures.
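One plausible reading of "sampled uniformly according to the trajectory energy distribution" is stratified sampling that preserves the trajectory's energy histogram in the training subset. A hedged sketch of that reading (function name, bin count, and sample size are our illustrative choices):

```python
import numpy as np

def sample_preserving_energy_distribution(energies, n_samples=1000, n_bins=50, seed=0):
    """Pick frame indices so that the subset's energy histogram matches the
    full trajectory's (an interpretation of the sampling protocol, not a
    verbatim reimplementation)."""
    rng = np.random.default_rng(seed)
    edges = np.histogram_bin_edges(energies, bins=n_bins)
    bin_of = np.clip(np.digitize(energies, edges) - 1, 0, n_bins - 1)
    counts = np.bincount(bin_of, minlength=n_bins)
    # allocate samples to energy bins proportionally to their population
    quota = np.floor(n_samples * counts / counts.sum()).astype(int)
    picks = []
    for b in range(n_bins):
        members = np.flatnonzero(bin_of == b)
        if quota[b] and members.size:
            picks.append(rng.choice(members, size=quota[b], replace=False))
    return np.concatenate(picks)
```

Stratifying this way keeps rare high-energy conformations represented in proportion to how often the trajectory visits them, instead of letting a small random draw miss them entirely.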
It can be seen in Fig. 3A that the GDML model achieves a force accuracy of 1 kcal mol⁻¹ Å⁻¹ using only ~1000 samples from different data sets. Conversely, a pure energy-based model would require up to two orders of magnitude more samples to achieve a similar
accuracy. The superior performance of the GDML model cannot
be simply attributed to the greater information content of force
samples. We compare our results to those of a naïve force model
along the lines of the toy example shown in Fig. 2 (see tables S1
and S3 for details on the prediction accuracy of both models). The
naïve force model is nonconservative but identical to the GDML
model in all other aspects. Note that its performance deteriorates sig-
nificantly on all data sets compared to the full GDML model (see the
Supplementary Materials for details). We note here that we used DFT
calculations, but any other high-level quantum chemistry approach
could have been used to calculate forces for 1000 conformational
geometries. This allows AIMD simulations to be carried out at the
speed of ML models with the accuracy of correlated quantum chem-
istry calculations.
It is noticeable that the GDML model at convergence (w.r.t. number of samples) yields higher accuracy for forces than an equivalent energy-based model (see Fig. 3B). Here, we should remark that the energy-based model trained on a very large data set can reduce the energy error to below 0.1 kcal mol⁻¹, whereas the GDML energy error remains at 0.2 kcal mol⁻¹ for ~1000 training samples (see Fig. 3C). However, these errors are already significantly below thermal fluctuations (k_B T) at room temperature (~0.6 kcal mol⁻¹), indicating that the GDML model provides an excellent description of both energies and forces, fully preserves their consistency, and reduces the complexity of the ML model.
These are all desirable features of models that combine rigorous
physical laws with the power of data-driven machines.
The ultimate test of any force field model is to establish its aptitude
to predict statistical averages and fluctuations using MD simulations.
The quantitative performance of the GDML model is demonstrated in Fig. 4 for classical and quantum MD simulations of aspirin at T = 300 K.

Fig. 4. Results of classical and PIMD simulations. The recently developed estimators based on perturbation theory were used to evaluate structural and electronic observables (30). (A) Comparison of the interatomic distance distributions, $\langle h(r) \rangle = \left\langle \frac{2}{N(N-1)} \sum_{i<j}^{N} \delta(r - \|\mathbf{r}_i - \mathbf{r}_j\|) \right\rangle_{P,t}$, obtained from GDML (blue line) and DFT (dashed red line) with classical MD (main frame) and PIMD (inset). a.u., arbitrary units. (B) Probability distribution of the dihedral angles (corresponding to carboxylic acid and ester functional groups) using a 20-ps time interval from a total PIMD trajectory of 200 ps.
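The distribution h(r) compared in Fig. 4A is an ensemble-averaged, normalized histogram of all N(N−1)/2 pairwise interatomic distances. A minimal numpy sketch (bin count and distance cutoff are illustrative choices):

```python
import numpy as np

def distance_distribution(traj, bins=200, r_max=8.0):
    """h(r): normalized histogram of all pairwise interatomic distances,
    accumulated over the frames of a trajectory of shape (T, N, 3)."""
    T, N, _ = traj.shape
    iu = np.triu_indices(N, k=1)             # each unordered pair i < j once
    d = np.concatenate([
        np.sqrt(((R[:, None, :] - R[None, :, :]) ** 2).sum(-1))[iu] for R in traj
    ])
    h, edges = np.histogram(d, bins=bins, range=(0.0, r_max), density=True)
    return 0.5 * (edges[1:] + edges[:-1]), h  # bin centers and h(r)
```

Because the same estimator is applied to the MD@DFT and MD@GDML trajectories, any mismatch in the resulting curves reflects differences in the underlying dynamics rather than in the analysis.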
Figure 4A shows a comparison of interatomic distance distributions,
h(r), from MD@DFT and MD@GDML. Overall, we observe a quanti-
tative agreement in h(r) between DFT and GDML simulations. The
small differences in the distance range between 4.3 and 4.7 Å result
from slightly higher energy barriers of the GDML model in the
pathway from A to B corresponding to the collective motions of the
carboxylic acid and ester groups in aspirin. These differences vanish
once the quantum nature of the nuclei is introduced in the PIMD simulations (29). In addition, long time scale simulations are required to
completely understand the dynamics of molecular systems. Figure 4B
shows the probability distribution of the fluctuations of dihedral angles
of carboxylic acid and ester groups in aspirin. This plot shows the ex-
istence of two main metastable configurations A and B and a short-
lived configuration C, illustrating the nontrivial dynamics captured
by the GDML model. Finally, we remark that a similarly good performance as for aspirin is also observed for the other seven molecules shown in Fig. 3. The efficiency of the GDML model (which is three orders of magnitude faster than DFT) should enable long time scale PIMD simulations to obtain converged thermodynamic properties of intermediate-sized molecules with the accuracy and transferability of high-level ab initio methods.
In summary, the developed GDML model allows the construction
of complex multidimensional PESs by combining rigorous physical laws
with data-driven ML techniques. In addition to the presented success-
ful applications to the model systems and intermediate-sized mole-
cules, our work can be further developed in several directions, including
scaling with system size and complexity, incorporating additional physical
priors, describing reaction pathways, and enabling seamless coupling be-
tween GDML and ab initio calculations.
SUPPLEMENTARY MATERIALS
Supplementary material for this article is available at http://advances.sciencemag.org/cgi/
content/full/3/5/e1603015/DC1
section S1. Noise amplification by differentiation
section S2. Vector-valued kernel learning
section S3. Descriptors
section S4. Model analysis
section S5. Details of the PIMD simulation
fig. S1. The accuracy of the GDML model (in terms of the MAE) as a function of training set size:
Chemical accuracy of less than 1 kcal/mol is already achieved for small training sets.
fig. S2. Predicting energies and forces for consecutive time steps of an MD simulation of uracil at 500 K.
table S1. Properties of MD data sets that were used for numerical testing.
table S2. GDML prediction accuracy for interatomic forces and total energies for all data sets.
table S3. Accuracy of the naïve force predictor.
table S4. Accuracy of the converged energy-based predictor.
References (31-36)
REFERENCES AND NOTES
1. J. Behler, M. Parrinello, Generalized neural-network representation of high-dimensional
potential-energy surfaces. Phys. Rev. Lett. 98, 146401 (2007).
2. J. Behler, S. Lorenz, K. Reuter, Representing molecule-surface interactions with symmetry-
adapted neural networks. J. Chem. Phys. 127, 014705 (2007).
3. A. P. Bartók, M. C. Payne, R. Kondor, G. Csányi, Gaussian approximation potentials: The
accuracy of quantum mechanics, without the electrons. Phys. Rev. Lett. 104, 136403
(2010).
4. J. Behler, Atom-centered symmetry functions for constructing high-dimensional neural
network potentials. J. Chem. Phys. 134, 074106 (2011).
5. J. Behler, Neural network potential-energy surfaces in chemistry: A tool for large-scale simulations. Phys. Chem. Chem. Phys. 13, 17930–17955 (2011).
6. K. V. J. Jose, N. Artrith, J. Behler, Construction of high-dimensional neural network potentials using environment-dependent atom pairs. J. Chem. Phys. 136, 194111 (2011).
7. A. P. Bartók, R. Kondor, G. Csányi, On representing chemical environments. Phys. Rev. B 87, 184115 (2013).
8. A. P. Bartók, G. Csányi, Gaussian approximation potentials: A brief tutorial introduction. Int. J. Quantum Chem. 115, 1051–1057 (2015).
9. S. De, A. P. Bartók, G. Csányi, M. Ceriotti, Comparing molecules and solids across structural and alchemical space. Phys. Chem. Chem. Phys. 18, 13754–13769 (2016).
10. M. Rupp, A. Tkatchenko, K.-R. Müller, O. A. von Lilienfeld, Fast and accurate modeling of
molecular atomization energies with machine learning. Phys. Rev. Lett. 108, 058301
(2012).
11. G. Montavon, M. Rupp, V. Gobre, A. Vazquez-Mayagoitia, K. Hansen, A. Tkatchenko,
K.-R. Müller, O. A. von Lilienfeld, Machine learning of molecular electronic properties in
chemical compound space. New J. Phys. 15, 095003 (2013).
12. K. Hansen, G. Montavon, F. Biegler, S. Fazli, M. Rupp, M. Scheffler, O. A. von Lilienfeld, A. Tkatchenko, K.-R. Müller, Assessment and validation of machine learning methods for predicting molecular atomization energies. J. Chem. Theory Comput. 9, 3404–3419 (2013).
13. K. Hansen, F. Biegler, R. Ramakrishnan, W. Pronobis, O. A. von Lilienfeld, K.-R. Müller, A. Tkatchenko, Machine learning predictions of molecular properties: Accurate many-body potentials and nonlocality in chemical space. J. Phys. Chem. Lett. 6, 2326–2331 (2015).
14. M. Rupp, R. Ramakrishnan, O. A. von Lilienfeld, Machine learning for quantum mechanical properties of atoms in molecules. J. Phys. Chem. Lett. 6, 3309–3313 (2015).
15. V. Botu, R. Ramprasad, Learning scheme to predict atomic forces and accelerate materials
simulations. Phys. Rev. B 92, 094306 (2015).
16. M. Hirn, N. Poilvert, S. Mallat, Quantum energy regression using scattering transforms.
CoRR arXiv:1502.02077 (2015).
17. J. Behler, Perspective: Machine learning potentials for atomistic simulations. J. Chem.
Phys. 145, 170901 (2016).
18. Z. Li, J. R. Kermode, A. De Vita, Molecular dynamics with on-the-fly machine learning of quantum-mechanical forces. Phys. Rev. Lett. 114, 096405 (2015).
19. C. A. Micchelli, M. A. Pontil, On learning vector-valued functions. Neural Comput. 17, 177–204 (2005).
20. A. Caponnetto, C. A. Micchelli, M. Pontil, Y. Ying, Universal multi-task kernels. J. Mach. Learn. Res. 9, 1615–1646 (2008).
21. V. Sindhwani, H. Q. Minh, A. C. Lozano, Scalable matrix-valued kernel learning for high-dimensional nonlinear multivariate regression and granger causality, in Proceedings of the 29th Conference on Uncertainty in Artificial Intelligence (UAI'13), 12 to 14 July 2013.
22. B. Matérn, Spatial Variation, Lecture Notes in Statistics (Springer-Verlag, 1986).
23. I. S. Gradshteyn, I. M. Ryzhik, Table of Integrals, Series, and Products, A. Jeffrey, D. Zwillinger,
Eds. (Academic Press, ed. 7, 2007).
24. T. Gneiting, W. Kleiber, M. Schlather, Matérn cross-covariance functions for multivariate random fields. J. Am. Stat. Assoc. 105, 1167–1177 (2010).
25. H. Helmholtz, Über Integrale der hydrodynamischen Gleichungen, welche den Wirbelbewegungen entsprechen. J. Reine Angew. Math. 1858, 25–55 (2009).
26. W. H. Press, S. A. Teukolsky, W. T. Vetterling, B. P. Flannery, Numerical Recipes: The Art of
Scientific Computing (Cambridge Univ. Press, ed. 3, 2007).
27. J. P. Perdew, K. Burke, M. Ernzerhof, Generalized gradient approximation made simple. Phys. Rev. Lett. 77, 3865–3868 (1996).
28. A. Tkatchenko, M. Scheffler, Accurate molecular van der Waals interactions from ground-state electron density and free-atom reference data. Phys. Rev. Lett. 102, 073005 (2009).
29. M. Ceriotti, J. More, D. E. Manolopoulos, i-PI: A Python interface for ab initio path integral molecular dynamics simulations. Comput. Phys. Commun. 185, 1019–1026 (2014).
30. I. Poltavsky, A. Tkatchenko, Modeling quantum nuclei with perturbed path integral molecular dynamics. Chem. Sci. 7, 1368–1372 (2016).
31. A. J. Smola, B. Schölkopf, Learning with Kernels: Support Vector Machines, Regularization,
Optimization, and Beyond (MIT Press, 2001).
32. J. C. Snyder, M. Rupp, K. Hansen, K.-R. Müller, K. Burke, Finding density functionals with
machine learning. Phys. Rev. Lett. 108, 253002 (2012).
33. J. C. Snyder, M. Rupp, K.-R. Müller, K. Burke, Nonlinear gradient denoising: Finding accurate extrema from inaccurate functional derivatives. Int. J. Quantum Chem. 115, 1102–1114 (2015).
34. B. Schölkopf, A. Smola, K.-R. Müller, Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput. 10, 1299 (1998).
35. B. Schölkopf, S. Mika, C. J. C. Burges, P. Knirsch, K.-R. Müller, G. Rätsch, A. J. Smola, Input space versus feature space in kernel-based methods. IEEE Trans. Neural Netw. Learn. Syst. 10, 1000–1017 (1999).
36. K.-R. Müller, S. Mika, G. Rätsch, K. Tsuda, B. Schölkopf, An introduction to kernel-based learning algorithms. IEEE Trans. Neural Netw. Learn. Syst. 12, 181–201 (2001).
Acknowledgments
Funding: S.C., A.T., and K.-R.M. thank the Deutsche Forschungsgemeinschaft (project MU 987/
20-1) for funding this work. A.T. is funded by the European Research Council with ERC-CoG

Citations
More filters
Journal ArticleDOI
TL;DR: The second part of the tutorial focuses on the recently proposed layer-wise relevance propagation (LRP) technique, for which the author provides theory, recommendations, and tricks, to make most efficient use of it on real data.

1,939 citations


Cites methods from "Machine learning of accurate energy..."

  • ...In the domain of atomistic simulations, powerful machine learning models have been produced to link molecular structure to electronic properties [48,23,58,18]....

    [...]

Journal ArticleDOI
TL;DR: SchNet as mentioned in this paper is a deep learning architecture specifically designed to model atomistic systems by making use of continuous-filter convolutional layers, where the model learns chemically plausible embeddings of atom types across the periodic table.
Abstract: Deep learning has led to a paradigm shift in artificial intelligence, including web, text, and image search, speech recognition, as well as bioinformatics, with growing impact in chemical physics. Machine learning, in general, and deep learning, in particular, are ideally suitable for representing quantum-mechanical interactions, enabling us to model nonlinear potential-energy surfaces or enhancing the exploration of chemical compound space. Here we present the deep learning architecture SchNet that is specifically designed to model atomistic systems by making use of continuous-filter convolutional layers. We demonstrate the capabilities of SchNet by accurately predicting a range of properties across chemical space for molecules and materials, where our model learns chemically plausible embeddings of atom types across the periodic table. Finally, we employ SchNet to predict potential-energy surfaces and energy-conserving force fields for molecular dynamics simulations of small molecules and perform an exemplary study on the quantum-mechanical properties of C20-fullerene that would have been infeasible with regular ab initio molecular dynamics.

1,104 citations
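The continuous-filter convolution at the heart of SchNet can be illustrated in a few lines of NumPy: the filter applied between a pair of atoms is generated from their distance (here via a Gaussian radial basis and a single linear layer), so the layer works for arbitrary atom positions rather than a grid. This is a heavily simplified sketch, not the SchNet implementation; the names (`cfconv`, `gaussian_rbf`) and parameter values (`gamma`, the one-layer filter network) are our own assumptions.

```python
import numpy as np

def gaussian_rbf(d, centers, gamma=10.0):
    # Expand distances in a Gaussian radial basis: shape (..., n_rbf)
    return np.exp(-gamma * (d[..., None] - centers) ** 2)

def cfconv(x, dist, w_filter, centers):
    """Continuous-filter convolution, simplified:
    x: (n, f) atom-wise features; dist: (n, n) interatomic distances;
    w_filter: (n_rbf, f) weights of a one-layer filter-generating network."""
    filt = gaussian_rbf(dist, centers) @ w_filter   # (n, n, f) generated filters
    # atom i aggregates neighbor features weighted by the generated filters
    return np.einsum('jf,ijf->if', x, filt)

rng = np.random.default_rng(0)
pos = rng.normal(size=(4, 3))                       # 4 atoms in 3-D
dist = np.linalg.norm(pos[:, None] - pos[None, :], axis=-1)
x = rng.normal(size=(4, 8))                         # 8 features per atom
centers = np.linspace(0.0, 3.0, 16)                 # 16 RBF centers
w = 0.1 * rng.normal(size=(16, 8))
y = cfconv(x, dist, w, centers)
print(y.shape)                                      # (4, 8)
```

Because the output of atom i is a sum over neighbors j, relabeling the atoms just permutes the output rows, which is the permutation equivariance the abstract alludes to.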

Journal ArticleDOI
13 Dec 2017
TL;DR: This article attempts to provide an overview of some of the recent successful data-driven “materials informatics” strategies undertaken in the last decade, with particular emphasis on the fingerprint or descriptor choices.
Abstract: Propelled partly by the Materials Genome Initiative, and partly by the algorithmic developments and the resounding successes of data-driven efforts in other domains, informatics strategies are beginning to take shape within materials science. These approaches lead to surrogate machine learning models that enable rapid predictions based purely on past data rather than by direct experimentation or by computations/simulations in which fundamental equations are explicitly solved. Data-centric informatics methods are becoming useful to determine material properties that are hard to measure or compute using traditional methods—due to the cost, time or effort involved—but for which reliable data either already exists or can be generated for at least a subset of the critical cases. Predictions are typically interpolative, involving fingerprinting a material numerically first, and then following a mapping (established via a learning algorithm) between the fingerprint and the property of interest. Fingerprints, also referred to as “descriptors”, may be of many types and scales, as dictated by the application domain and needs. Predictions may also be extrapolative—extending into new materials spaces—provided prediction uncertainties are properly taken into account. This article attempts to provide an overview of some of the recent successful data-driven “materials informatics” strategies undertaken in the last decade, with particular emphasis on the fingerprint or descriptor choices. The review also identifies some challenges the community is facing and those that should be overcome in the near future.

1,021 citations


Cites background from "Machine learning of accurate energy..."

  • ...Several candidates, including those based on symmetry functions [76–78], bispectra of neighborhood atomic densities [79], Coulomb matrices (and its variants) [80, 81], smooth overlap of atomic positions (SOAP) [82–85], and others [86,87], have been proposed....


  • ...Although questions have arisen with respect to smoothness considerations and whether the representation is under/over-determined (depending on whether the eigenspectrum or the entire matrix is used as the fingerprint) [82], this approach has been shown to be able to predict various molecular properties accurately [81]....


Journal ArticleDOI
TL;DR: Deep potential molecular dynamics (DPMD) as discussed by the authors is based on a many-body potential and interatomic forces generated by a carefully crafted deep neural network trained with ab initio data.
Abstract: We introduce a scheme for molecular simulations, the deep potential molecular dynamics (DPMD) method, based on a many-body potential and interatomic forces generated by a carefully crafted deep neural network trained with ab initio data. The neural network model preserves all the natural symmetries in the problem. It is first-principles based in the sense that there are no ad hoc components aside from the network model. We show that the proposed scheme provides an efficient and accurate protocol in a variety of systems, including bulk materials and molecules. In all these cases, DPMD gives results that are essentially indistinguishable from the original data, at a cost that scales linearly with system size.

903 citations

Journal ArticleDOI
TL;DR: A package written in Python/C++ designed to minimize the effort required to build deep-learning-based representations of potential energies and force fields and to perform molecular dynamics; the resulting molecular dynamics model is demonstrated to reproduce accurately the structural information contained in the original model.

628 citations


Cites background from "Machine learning of accurate energy..."

  • ...[9] Stefan Chmiela, Alexandre Tkatchenko, Huziel E Sauceda, Igor Poltavsky, Kristof T Schütt, and Klaus-Robert Müller....


  • ...Some state-of-the-art examples (not a comprehensive list) include the Behler-Parrinello neural network (BPNN) [7], the Gaussian approximation potentials (GAP) [8], the Gradientdomain machine learning (GDML) [9], and the Deep potential for molecular dynamics (DeePMD) [10, 11]....


  • ...Some examples (not a comprehensive list) include the Behler-Parrinello neural network (BPNN) [9], the Gaussian approximation potentials (GAP) [11], the Gradientdomain machine learning (GDML) [14], and the Deep potential for molecular dynamics (DeePMD) [17, 18]....


References
Journal ArticleDOI
TL;DR: A simple derivation of a simple GGA is presented, in which all parameters (other than those in LSD) are fundamental constants, and only general features of the detailed construction underlying the Perdew-Wang 1991 (PW91) GGA are invoked.
Abstract: Generalized gradient approximations (GGA's) for the exchange-correlation energy improve upon the local spin density (LSD) description of atoms, molecules, and solids. We present a simple derivation of a simple GGA, in which all parameters (other than those in LSD) are fundamental constants. Only general features of the detailed construction underlying the Perdew-Wang 1991 (PW91) GGA are invoked. Improvements over PW91 include an accurate description of the linear response of the uniform electron gas, correct behavior under uniform scaling, and a smoother potential. [S0031-9007(96)01479-2] PACS numbers: 71.15.Mb, 71.45.Gm Kohn-Sham density functional theory [1,2] is widely used for self-consistent-field electronic structure calculations of the ground-state properties of atoms, molecules, and solids. In this theory, only the exchange-correlation energy E_XC = E_X + E_C as a functional of the electron spin densities n↑(r) and n↓(r) must be approximated. The most popular functionals have a form appropriate for slowly varying densities: the local spin density (LSD) approximation E_XC^LSD = ∫ d³r n ε_xc^unif(n↑, n↓)

146,533 citations

01 Jan 1917
TL;DR: Basic forms: ∫ xⁿ dx = xⁿ⁺¹/(n+1) (1); ∫ (1/x) dx = ln|x| (2); ∫ u dv = uv − ∫ v du (3); ∫ 1/(ax+b) dx = (1/a) ln|ax+b| (4); followed by integrals of rational functions.
Abstract: Basic forms: ∫ xⁿ dx = xⁿ⁺¹/(n+1) (1); ∫ (1/x) dx = ln|x| (2); ∫ u dv = uv − ∫ v du (3); ∫ 1/(ax+b) dx = (1/a) ln|ax+b| (4). Integrals of rational functions: ∫ 1/(x+a)² dx = −1/(x+a)

11,190 citations
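The entries above are the opening rows of a standard table of integrals. The differentiable forms can be sanity-checked numerically by finite-differencing each antiderivative; this is a stdlib-only sketch, and the helper `deriv` and the sample points are our own.

```python
import math

def deriv(F, x, h=1e-6):
    """Central finite-difference derivative of an antiderivative F at x."""
    return (F(x + h) - F(x - h)) / (2 * h)

n, a, b = 3, 2.0, 1.5
cases = [
    (lambda x: x**(n + 1) / (n + 1),          lambda x: x**n),           # (1) ∫ x^n dx
    (lambda x: math.log(abs(x)),              lambda x: 1 / x),          # (2) ∫ dx/x
    (lambda x: math.log(abs(a * x + b)) / a,  lambda x: 1 / (a * x + b)),# (4) ∫ dx/(ax+b)
]
for F, f in cases:
    for x in (0.7, 1.3, 2.9):
        # d/dx of the antiderivative should recover the integrand
        assert abs(deriv(F, x) - f(x)) < 1e-5
print("all antiderivatives verified")
```

Form (3), integration by parts, is an identity between integrals rather than a single antiderivative, so it is not checked here.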

Journal ArticleDOI
TL;DR: A new method for performing a nonlinear form of principal component analysis by the use of integral operator kernel functions is proposed and experimental results on polynomial feature extraction for pattern recognition are presented.
Abstract: A new method for performing a nonlinear form of principal component analysis is proposed. By the use of integral operator kernel functions, one can efficiently compute principal components in high-dimensional feature spaces, related to input space by some nonlinear map—for instance, the space of all possible five-pixel products in 16 × 16 images. We give the derivation of the method and present experimental results on polynomial feature extraction for pattern recognition.

8,175 citations
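The procedure the abstract describes (build a kernel Gram matrix, double-center it, eigendecompose, and project) can be sketched with NumPy. This is an illustrative reconstruction using an RBF kernel and our own function and parameter names, not the paper's code.

```python
import numpy as np

def kernel_pca(X, n_components, gamma=1.0):
    """Kernel PCA sketch: eigendecompose the centered Gram matrix
    of an RBF kernel in place of the covariance matrix."""
    sq = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)
    K = np.exp(-gamma * sq)                      # RBF Gram matrix (n, n)
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one   # double centering in feature space
    vals, vecs = np.linalg.eigh(Kc)              # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:n_components]  # keep the largest ones
    alphas = vecs[:, idx] / np.sqrt(vals[idx])   # normalize expansion coefficients
    return Kc @ alphas                           # projections of the training points

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))
Z = kernel_pca(X, 2)
print(Z.shape)                                   # (30, 2)
```

As in linear PCA, the extracted components are mutually orthogonal; the nonlinearity lives entirely in the kernel.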

BookDOI
01 Dec 2001
TL;DR: Learning with Kernels provides an introduction to SVMs and related kernel methods that provide all of the concepts necessary to enable a reader equipped with some basic mathematical knowledge to enter the world of machine learning using theoretically well-founded yet easy-to-use kernel algorithms.
Abstract: From the Publisher: In the 1990s, a new type of learning algorithm was developed, based on results from statistical learning theory: the Support Vector Machine (SVM). This gave rise to a new class of theoretically elegant learning machines that use a central concept of SVMs (kernels) for a number of learning tasks. Kernel machines provide a modular framework that can be adapted to different tasks and domains by the choice of the kernel function and the base algorithm. They are replacing neural networks in a variety of fields, including engineering, information retrieval, and bioinformatics. Learning with Kernels provides an introduction to SVMs and related kernel methods. Although the book begins with the basics, it also includes the latest research. It provides all of the concepts necessary to enable a reader equipped with some basic mathematical knowledge to enter the world of machine learning using theoretically well-founded yet easy-to-use kernel algorithms and to understand and apply the powerful algorithms that have been developed over the last few years.

7,880 citations

Journal ArticleDOI
TL;DR: It is shown that the effective atomic C6 coefficients depend strongly on the bonding environment of an atom in a molecule, and the van der Waals radii and the damping function in the C6R⁻⁶ correction method for density-functional theory calculations are analyzed.
Abstract: We present a parameter-free method for an accurate determination of long-range van der Waals interactions from mean-field electronic structure calculations. Our method relies on the summation of interatomic C6 coefficients, derived from the electron density of a molecule or solid and accurate reference data for the free atoms. The mean absolute error in the C6 coefficients is 5.5% when compared to accurate experimental values for 1225 intermolecular pairs, irrespective of the employed exchange-correlation functional. We show that the effective atomic C6 coefficients depend strongly on the bonding environment of an atom in a molecule. Finally, we analyze the van der Waals radii and the damping function in the C6R⁻⁶ correction method for density-functional theory calculations.

4,825 citations


"Machine learning of accurate energy..." refers methods in this paper

  • ...The total energy and force labels for each data set were computed using the PBE + vdW-TS electronic structure method (27, 28)....

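The pairwise correction summarized above adds attractive −C6/R⁶ terms that a Fermi-type function damps at short range. The sketch below is illustrative only: the parameter values (d = 20, s_R = 0.94) and the sample C6/R0 numbers are our assumptions, not values taken from this page.

```python
import math

def fermi_damping(r, r0, d=20.0, s_r=0.94):
    """Fermi-type damping: switches the correction off at short range,
    approaches 1 at large separation (assumed parameter values)."""
    return 1.0 / (1.0 + math.exp(-d * (r / (s_r * r0) - 1.0)))

def e_vdw(pairs):
    """Pairwise vdW energy: E = -sum_ij f_damp(R_ij) * C6_ij / R_ij**6.
    pairs: iterable of (R_ij, C6_ij, R0_ij) in consistent (illustrative) units."""
    return -sum(fermi_damping(r, r0) * c6 / r**6 for r, c6, r0 in pairs)

# hypothetical pair list: two separations of the same atom pair
pairs = [(4.0, 46.6, 3.59), (6.0, 46.6, 3.59)]
E = e_vdw(pairs)
print(E < 0.0)  # attractive correction: prints True
```

The method's key point, reflected in the `c6` argument, is that these coefficients are not fixed per element but derived from the molecular electron density.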