
ARTICLE
Received 24 Jun 2016 | Accepted 9 Nov 2016 | Published 9 Jan 2017
Quantum-chemical insights from deep tensor neural networks
Kristof T. Schütt¹, Farhad Arbabzadah¹, Stefan Chmiela¹, Klaus R. Müller¹,² & Alexandre Tkatchenko³,⁴
Learning from data has led to paradigm shifts in a multitude of disciplines, including web, text and image search, speech recognition, as well as bioinformatics. Can machine learning enable similar breakthroughs in understanding quantum many-body systems? Here we develop an efficient deep learning approach that enables spatially and chemically resolved insights into quantum-mechanical observables of molecular systems. We unify concepts from many-body Hamiltonians with purpose-designed deep tensor neural networks, which leads to size-extensive and uniformly accurate (1 kcal mol⁻¹) predictions in compositional and configurational chemical space for molecules of intermediate size. As an example of chemical relevance, the model reveals a classification of aromatic rings with respect to their stability. Further applications of our model for predicting atomic energies and local chemical potentials in molecules, reliable isomer energies, and molecules with peculiar electronic structure demonstrate the potential of machine learning for revealing insights into complex quantum-chemical systems.

DOI: 10.1038/ncomms13890 | OPEN
¹ Machine Learning Group, Technische Universität Berlin, Marchstr. 23, 10587 Berlin, Germany. ² Department of Brain and Cognitive Engineering, Korea University, Anam-dong, Seongbuk-gu, Seoul 136-713, Republic of Korea. ³ Theory Department, Fritz-Haber-Institut der Max-Planck-Gesellschaft, Faradayweg 4-6, D-14195 Berlin, Germany. ⁴ Physics and Materials Science Research Unit, University of Luxembourg, L-1511 Luxembourg. Correspondence and requests for materials should be addressed to K.R.M. (email: klaus-robert.mueller@tu-berlin.de) or to A.T. (email: alexandre.tkatchenko@uni.lu).

Chemistry permeates all aspects of our life, from the development of new drugs to the food that we consume and materials we use on a daily basis. Chemists rely on empirical observations based on creative and painstaking experimentation that leads to eventual discoveries of molecules and materials with desired properties and mechanisms to synthesize them. Many discoveries in chemistry can be guided by searching large databases of experimental or computational molecular structures and properties by using concepts based on chemical similarity. Because the structure and properties of molecules are determined by the laws of quantum mechanics, ultimately chemical discovery must be based on fundamental quantum principles. Indeed, electronic structure calculations and intelligent data analysis (machine learning) have recently been combined aiming towards the goal of accelerated discovery of chemicals with desired properties [1-8]. However, so far the majority of these pioneering efforts have focused on the construction of reduced models trained on large data sets of density-functional theory calculations.
In this work, we develop an efficient deep learning approach that enables spatially and chemically resolved insights into quantum-mechanical properties of molecular systems beyond those trivially contained in the training dataset. Obviously, computational models are not predictive if they lack accuracy. In addition to being interpretable, size-extensive and efficient, our deep tensor neural network (DTNN) approach is uniformly accurate (1 kcal mol⁻¹) throughout compositional and configurational chemical space. On the more fundamental side, the mathematical construction of the DTNN model provides statistically rigorous partitioning of extensive molecular properties into atomic contributions—a long-standing challenge for quantum-mechanical calculations of molecules.
Results
Molecular deep tensor neural networks. It is common to use a carefully chosen representation of the problem at hand as a basis for machine learning [9-11]. For example, molecules can be represented as Coulomb matrices [7,12,13], scattering transforms [14], bags of bonds [15], smooth overlap of atomic positions [16,17] or generalized symmetry functions [18,19]. Kernel-based learning of molecular properties transforms these representations non-linearly by virtue of kernel functions. In contrast, deep neural networks [20] are able to infer the underlying regularities and learn an efficient representation in a layer-wise fashion [21].
Molecular properties are governed by the laws of quantum mechanics, which yield the remarkable flexibility of chemical systems, but also impose constraints on the behaviour of bonding in molecules. The approach presented here utilizes the many-body Hamiltonian concept for the construction of the DTNN architecture (Fig. 1), embracing the principles of quantum chemistry, while maintaining the full flexibility of a complex data-driven learning machine.
DTNN receives molecular structures through a vector of nuclear charges Z and a matrix of atomic distances D, ensuring rotational and translational invariance by construction (Fig. 1a). The distances are expanded in a Gaussian basis, yielding a feature vector \hat{d}_{ij} \in R^G, which accounts for the different nature of interactions at various distance regimes. Similar approaches have been applied to the entries of the Coulomb matrix for the prediction of molecular properties before [12].
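To make the distance featurization concrete, the following minimal NumPy sketch (not the authors' code; the grid range, spacing and width are illustrative assumptions) expands a scalar interatomic distance into a Gaussian feature vector \hat{d}_{ij} \in R^G:

    import numpy as np

    def gaussian_expansion(d_ij, d_min=0.0, d_max=10.0, n_gaussians=50, width=0.2):
        """Return a feature vector d_hat in R^G for a scalar distance d_ij (Angstrom)."""
        centers = np.linspace(d_min, d_max, n_gaussians)              # Gaussian centres mu_k
        return np.exp(-((d_ij - centers) ** 2) / (2.0 * width ** 2))  # one feature per Gaussian

    d_hat = gaussian_expansion(1.09)   # e.g. a typical C-H bond length
    print(d_hat.shape)                 # (50,)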
The total energy E_M for the molecule M composed of N atoms is written as a sum over N atomic energy contributions E_i, thus satisfying permutational invariance with respect to atom indexing. Each atom i is represented by a coefficient vector c_i \in R^B, where B is the number of basis functions, or features. Motivated by quantum-chemical atomic basis set expansions, we assign an atom type-specific descriptor vector c_{Z_i} to these coefficients c_i^{(0)}. Subsequently, this atomic expansion is repeatedly refined by pairwise interactions with the surrounding atoms

c_i^{(t+1)} = c_i^{(t)} + \sum_{j \neq i} v_{ij},     (1)

where the interaction term v_{ij} reflects the influence of atom j at a distance D_{ij} on atom i. Note that this refinement step is seamlessly integrated into the architecture of the molecular DTNN, and is therefore adapted throughout the learning process. In Supplementary Discussion, we show the relation to convolutional neural networks that have been applied to images, speech and text with great success because of their ability to capture local structure [22-27]. Considering a molecule as a graph, T refinements of the coefficient vectors are comprised of all walks of length T through the molecule ending at the corresponding atom [28,29]. From the point of view of many-body interatomic interactions, subsequent refinement steps t correlate atomic neighbourhoods with increasing complexity.
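As an illustration of how the refinement of equation (1) could be iterated, here is a minimal NumPy sketch; the placeholder interaction is a crude stand-in (the learned form is given by equations (2)-(3) below), and all shapes are illustrative assumptions rather than the authors' implementation:

    import numpy as np

    def toy_interaction(c_j, d_ij):
        # Placeholder for the learned interaction of equations (2)-(3):
        # here simply a distance-damped copy of the neighbour's features.
        return np.exp(-d_ij) * c_j

    def refine(c, D, interaction=toy_interaction, T=3):
        """Sketch of equation (1): c has shape (N, B), D has shape (N, N)."""
        N = c.shape[0]
        for _ in range(T):
            v = np.zeros_like(c)
            for i in range(N):
                for j in range(N):
                    if j != i:
                        v[i] += interaction(c[j], D[i, j])
            c = c + v          # c_i^(t+1) = c_i^(t) + sum_{j != i} v_ij
        return c

    # Toy usage: 3 atoms with 5 features each
    rng = np.random.default_rng(0)
    c0 = rng.normal(size=(3, 5))
    D = np.array([[0.0, 1.1, 2.0],
                  [1.1, 0.0, 1.5],
                  [2.0, 1.5, 0.0]])
    print(refine(c0, D).shape)     # (3, 5)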
While the initial atomic representations only consider isolated
atoms, the interaction terms characterize how the basis functions
of two atoms overlap with each other at a certain distance. Each
refinement step is supposed to reduce these overlaps, thereby
embedding the atoms of the molecule into their chemical
environment. Following this procedure, the DTNN implicitly
learns an atom-centered basis that is unique and efficient with
respect to the property to be predicted.
Non-linear coupling between the atomic vector features and the interatomic distances is achieved by a tensor layer [30-32], such that the coefficient k of the refinement is given by

v_{ijk} = tanh[ c_j^{(t)} V_k \hat{d}_{ij} + (W^c c_j^{(t)})_k + (W^d \hat{d}_{ij})_k + b_k ],     (2)

where b_k is the bias of feature k and W^c and W^d are the weights of atom representation and distance, respectively. The slice V_k of the parameter tensor V \in R^{B x B x G} combines the inputs multiplicatively. Since V incorporates many parameters, using this kind of layer is both computationally expensive as well as prone to overfitting. Therefore, we employ a low-rank tensor factorization, as described in ref. 33, such that
v_{ij} = tanh[ W^{fc} ( (W^{cf} c_j^{(t)} + b^{f1}) ∘ (W^{df} \hat{d}_{ij} + b^{f2}) ) ],     (3)

where ∘ represents element-wise multiplication, while W^{cf}, b^{f1}, W^{df}, b^{f2} and W^{fc} are the weight matrices and corresponding biases of atom representations, distances and resulting factors, respectively. As the dimensionality of W^{cf} c_j and W^{df} \hat{d}_{ij} corresponds to the number of factors, choosing only a few drastically decreases the number of parameters, thus solving both issues of the tensor layer at once.
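The factorized interaction of equation (3) can be sketched in a few lines of NumPy. The weights below are random stand-ins for trained parameters, and the dimensions B, G and F are illustrative choices, not values from the paper:

    import numpy as np

    rng = np.random.default_rng(0)
    B, G, F = 30, 50, 60     # feature, Gaussian-basis and factor dimensions (illustrative)

    # Randomly initialised stand-ins for the trained parameters of equation (3)
    W_cf, b_f1 = rng.normal(size=(F, B)), np.zeros(F)
    W_df, b_f2 = rng.normal(size=(F, G)), np.zeros(F)
    W_fc = rng.normal(size=(B, F))

    def interaction(c_j, d_hat_ij):
        """v_ij = tanh( W_fc [ (W_cf c_j + b_f1) o (W_df d_hat_ij + b_f2) ] )."""
        factors = (W_cf @ c_j + b_f1) * (W_df @ d_hat_ij + b_f2)   # element-wise product
        return np.tanh(W_fc @ factors)

    v_ij = interaction(rng.normal(size=B), rng.normal(size=G))
    print(v_ij.shape)   # (30,): one additive correction per atomic feature

    # Parameter count for these toy dimensions: a full tensor V would need
    # B*B*G = 45,000 entries, the factorized layer only F*(B+G) + F*B = 6,600.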
Arriving at the final embedding after a given number of interaction refinements, two fully-connected layers predict an energy contribution from each atomic coefficient vector, such that their sum corresponds to the total molecular energy E_M. Therefore, the DTNN architecture scales with the number of atoms in a molecule, fully capturing the extensive nature of the energy. All weights, biases, as well as the atom type-specific descriptors were initialized randomly and trained using stochastic gradient descent.
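A minimal sketch of this atom-wise readout and size-extensive sum follows; the two-layer form mirrors the description above, but the tanh non-linearity, layer sizes and random weights are assumptions for illustration only:

    import numpy as np

    rng = np.random.default_rng(1)
    B, H = 30, 15                                   # embedding and hidden sizes (illustrative)
    W1, b1 = rng.normal(size=(H, B)), np.zeros(H)   # stand-ins for trained readout weights
    w2, b2 = rng.normal(size=H), 0.0

    def atomic_energy(c_i):
        """Two fully connected layers mapping a refined atomic embedding to E_i."""
        return float(w2 @ np.tanh(W1 @ c_i + b1) + b2)

    def molecular_energy(C):
        """E_M = sum_i E_i: summing over atoms makes the prediction size-extensive."""
        return sum(atomic_energy(c_i) for c_i in C)

    C_final = rng.normal(size=(7, B))               # e.g. 7 atoms after T refinement steps
    print(molecular_energy(C_final))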
Learning molecular energies. To demonstrate the versatility of the proposed DTNN, we train models with up to three interaction passes (T = 3) for both compositional and configurational degrees of freedom in molecular systems. The DTNN accuracy saturates at T = 3, and leads to a strong correlation between atoms in molecules, as can be visualized by the complexity of the potential learned by the network (Fig. 1e). For training, we employ chemically diverse data sets of equilibrium molecular structures, as well as molecular dynamics (MD) trajectories for small molecules. We employ two subsets of the GDB-13 database [34,35], referred to as GDB-7, including 47,000 molecules with up to seven heavy (C, N, O, F) atoms, and GDB-9, consisting of 133,885 molecules with up to nine heavy atoms [36]. In both cases, the learning task is to predict the molecular total energy calculated with density-functional theory (DFT). All GDB molecules are stable and synthetically accessible according to organic chemistry rules [35]. Molecular features such as functional groups or signatures include single, double and triple bonds; (hetero-)cycles, carboxy, cyanide, amide, amine, alcohol, epoxy, sulphide, ether, ester, chloride, aliphatic and aromatic groups. For each of the many possible stoichiometries, many constitutional isomers are considered, each being represented only by a low-energy conformational isomer.
As Supplementary Table 1 demonstrates, DTNN achieves a mean absolute error of 1.0 kcal mol⁻¹ on both GDB data sets, training on 5.8 k GDB-7 (80%) and 25 k (20%) GDB-9 reference calculations, respectively. Figure 1c shows the performance on GDB-9 depending on the size of the molecule.
Figure 1 | Prediction and explanation of molecular energies with a deep tensor neural network. (a) Molecules are encoded as input for the neural network by a vector of nuclear charges and an inter-atomic distance matrix. This description is complete and invariant to rotation and translation. (b) Illustration of the network architecture. Each atom type corresponds to a vector of coefficients c_i^{(0)}, which is repeatedly refined by interactions v_{ij}. The interactions depend on the current representation c_j^{(t)}, as well as the distance D_{ij} to an atom j. After T iterations, an energy contribution E_i is predicted for the final coefficient vector c_i^{(T)}. The molecular energy E is the sum over these atomic contributions. (c) Mean absolute errors of predictions for the GDB-9 dataset of 133,885 molecules as a function of the number of atoms. The employed neural network uses two interaction passes (T = 2) and 50,000 reference calculations during training. The inset shows the error of an equivalent network trained on 5,000 GDB-9 molecules with 20 or more atoms, as small molecules with 15 or fewer atoms are added to the training set. (d) Extract from the calculated (black) and predicted (orange) molecular dynamics trajectory of toluene. The curve on the right shows the agreement of the predicted and calculated energy distributions. (e) Energy contribution E_probe (or local chemical potential Ω_M^H(r), see text) of a hydrogen test charge on a Σ_i ||r − r_i||² isosurface for various molecules from the GDB-9 dataset for a DTNN model with T = 2.

We observe that larger molecules have lower errors because of their abundance in the training data. However, when predicting larger molecules than present in the training set, the errors increase. This is because the molecules in the GDB-9 set are quite small, so we considered all atoms to be in each other's chemical environment. Imposing a distance cutoff of 3 Å on interatomic interactions leads to a 0.1 kcal mol⁻¹ increase in the error. However, this distance cutoff restricts only the direct interactions considered in the refinement steps. With multiple refinements, the effective cutoff increases by a factor of T because of indirect interactions over multiple atoms. Given large enough molecules, so that a reasonable distance cutoff can be chosen, scaling to larger molecules will only require well-represented local environments. For now, we observe that at least a few larger molecules are needed to achieve a good prediction accuracy. Following this train of thought, we trained the network on a restricted subset of 5 k molecules with more than 20 atoms. By adding smaller molecules to the training set, we are able to reduce the test error from 2.1 kcal mol⁻¹ to below 1.5 kcal mol⁻¹ (see inset in Fig. 1c). This result demonstrates that our model is able to transfer knowledge learned from small molecules to larger molecules with diverse functional groups.
While only encompassing conformations of a single molecule, reproducing MD simulation trajectories poses a radically different challenge than predicting energies of purely equilibrium structures. We learned potential energies for MD trajectories of benzene, toluene, malonaldehyde and salicylic acid, carried out at a rather high temperature of 500 K to achieve exhaustive exploration of the potential-energy surface of such small molecules. The neural network yields mean absolute errors of 0.05, 0.18, 0.17 and 0.39 kcal mol⁻¹ for these molecules, respectively (Supplementary Table 1). Figure 1d shows the excellent agreement between the DFT and DTNN MD trajectory of toluene, as well as the corresponding energy distributions. The DTNN errors are much smaller than the energy of thermal fluctuations at room temperature (~0.6 kcal mol⁻¹), meaning that DTNN potential-energy surfaces can be utilized to calculate accurate molecular thermodynamic properties by virtue of Monte Carlo simulations. Supplementary Figs 1 and 2 illustrate how the performance of DTNN depends on the number of employed reference calculations and refinement steps (Supplementary Discussion). The ability of DTNN to accurately describe equilibrium structures within the GDB-9 database and MD trajectories of selected molecules of chemical relevance demonstrates the feasibility of developing a universal machine learning architecture that can capture compositional as well as configurational degrees of freedom in the vast chemical space. While the employed architecture of the DTNN is universal, the learned coefficients are different for GDB-9 and MD trajectories of single molecules.
Local chemical potential. Beyond predicting accurate energies, the true power of DTNN lies in its ability to provide novel quantum-chemical insights. In the context of DTNN, we define a local chemical potential Ω_M^A(r) as the energy of a certain atom type A, located at a position r in the molecule M. While the DTNN models the interatomic interactions, we only allow the atoms of the molecule to act on the probe atom; the probe does not influence the molecule. The spatial and chemical sensitivity provided by our DTNN approach is shown in Fig. 1e for a variety of fundamental molecular building blocks. In this case, we employed hydrogen as a test charge, while the results for Ω_M^{C,N,O}(r) are shown in Fig. 2. Despite being trained only on total energies of molecules, the DTNN approach clearly grasps fundamental chemical concepts such as bond saturation and different degrees of aromaticity. For example, the DTNN model predicts the C6O3H6 molecule to be 'more aromatic' than benzene or toluene (Fig. 1e). Remarkably, it turns out that C6O3H6 does have higher ring stability than both benzene and toluene, and DTNN predicts it to be the molecule with the most stable aromatic carbon ring among all molecules in the GDB-9 database (Fig. 3). Further chemical effects learned by the DTNN model are shown in Fig. 2, which demonstrates the differences in the chemical potential distribution of H, C, N and O atoms in benzene, toluene, salicylic acid and malonaldehyde. For example, the chemical potentials of different atoms over an aromatic ring are qualitatively different for H, C, N and O atoms—an evident fact for a trained chemist. However, the subtle chemical differences described by DTNN are accompanied by chemically accurate predictions—a challenging task for humans.
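To illustrate how such a probe could be evaluated in practice, the following toy NumPy sketch places a probe atom at a grid of positions and records its predicted energy contribution, with interactions acting only from the molecule onto the probe. The embedding and readout functions here are crude, randomly weighted stand-ins for the trained DTNN, introduced purely for illustration; they are not the authors' implementation:

    import numpy as np

    rng = np.random.default_rng(2)
    B = 8                              # toy embedding size
    feat = rng.normal(size=B)          # toy feature direction for the probe embedding
    w_out = rng.normal(size=B)         # toy readout weights standing in for the energy layers

    def embed_probe(d):
        # One-way interaction: the molecule's atoms act on the probe (distance-damped
        # here for illustration), while the probe does not influence the molecule.
        return np.exp(-d).sum() * feat

    def probe_energy(c):
        return float(w_out @ np.tanh(c))

    def local_chemical_potential(atom_positions, grid_points):
        """Omega_M^A(r): energy contribution of a probe atom placed at each position r."""
        omega = np.empty(len(grid_points))
        for n, r in enumerate(grid_points):
            d = np.linalg.norm(atom_positions - r, axis=1)   # distances molecule -> probe
            omega[n] = probe_energy(embed_probe(d))
        return omega

    positions = rng.normal(size=(12, 3))                     # a toy 12-atom geometry
    grid = rng.normal(size=(100, 3))                         # evaluation points, e.g. an isosurface
    print(local_chemical_potential(positions, grid).shape)   # (100,)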
Because DTNN provides atomic energies by construction, it allows us to classify molecules by the stability of different building blocks, for example aromatic rings or methyl groups. An example of such classification is shown in Fig. 3, where we plot the molecules with most stable and least stable carbon aromatic rings in GDB-9. The distribution of atomic energies is shown in Supplementary Fig. 3, while Supplementary Fig. 4 lists the full stability ranking. The DTNN classification leads to interesting stability trends, notwithstanding the intrinsic non-uniqueness of atomic energy partitioning. However, unlike atomic projections employed in electronic-structure calculations, the DTNN approach has a firm foundation in statistical learning theory. In quantum-chemical calculations, every molecule would correspond to a different partitioning depending on its self-consistent electron density. In contrast, the DTNN approach learns the partitioning on a large molecular dataset, generating a transferable and global 'dressed atom' representation of molecules in chemical space. Recalling that DTNN exhibits errors below 1 kcal mol⁻¹, the classification shown in Fig. 3 can provide useful guidance for the chemical discovery of molecules with desired properties. Analytical gradients of the DTNN model with respect to chemical composition or Ω_M^A(r) could also aid in the exploration of chemical compound space [37].
Energy predictions for isomers. The quantitative accuracy achieved by DTNN and its size extensivity pave the way to the calculation of configurational and conformational energy differences—a long-standing challenge for machine learning approaches [7,12,13,38]. The reliability of DTNN for isomer energy predictions is demonstrated by the energy distribution in Fig. 4 for molecular isomers with the C7O2H10 chemical formula (a total of 6,095 isomers in the GDB-9 data set).

Training a common model for chemical as well as conformational degrees of freedom requires a more complex model. Furthermore, it comes with technical challenges such as sampling and multiscale issues, since the MD trajectories form clusters of small variation within the chemical compound space. As a proof of principle, we trained the DTNN to predict various MD trajectories of the C7O2H10 isomers. To this end, we calculated short MD trajectories of 5,000 steps each for 113 randomly picked isomers, as well as consistent total energies for all equilibrium structures. The training set is composed of all isomers in equilibrium as well as 50% of each MD trajectory. The remaining MD calculations are used for validation and testing. Despite the added complexity, our model achieves a mean absolute error of 1.7 kcal mol⁻¹.
Discussion
DTNNs provide an efficient way to represent chemical environments, allowing for chemically accurate predictions. To this end, an implicit, atom-centered basis is learned from reference calculations. Employing this representation, atoms can be embedded in their chemical environment within a few refinement steps. Furthermore, DTNNs have the advantage that the embedding is built recursively from pairwise distances. Therefore, all necessary invariances (translation, rotation, permutation) are guaranteed to be exploited by the model. In addition, the learned embedding can be used to generate alchemical reaction paths (Supplementary Fig. 5).

In previous approaches, potential-energy surfaces were constructed by fitting many-body expansions with neural networks [39-41]. However, these methods require a separate NN for each non-equivalent many-body term in the expansion. Since DTNN learns a common basis in which the atoms interact, higher-order interactions can be obtained more efficiently without separate treatment.
Approaches like smooth overlap of atomic positions [16,17] or manually crafted atom-centered symmetry functions [18,19,42] are, like DTNN, based on representing chemical environments. All these approaches have in common that size-extensivity regarding the number of atoms is achieved by predicting atomic energy contributions using a non-linear regression method (for example, neural networks or kernel ridge regression). However, the previous approaches have a fixed set of basis functions describing the atomic environments. In contrast, DTNNs are able to adapt to the problem at hand in a
Figure 2 | Chemical potentials Ω_M^A(r) for A = {C, N, O, H} atoms (colour scale in kcal mol⁻¹). The isosurface was generated for Σ_i ||r − r_i||² = 3.8 Å² (the index i is used to sum over all atoms of the corresponding molecule). The molecules shown are (in order from top to bottom of the figure): benzene, toluene, salicylic acid and malonaldehyde. Atom colouring: carbon = black, hydrogen = white, oxygen = red.
Figure 3 | Classification of molecular carbon ring stability. Shown are 20 molecules (the 10 most stable, ranks 1-10, and the 10 least stable, ranks 281-290) with respect to the energy of the carbon ring (E_ring, in kcal mol⁻¹) predicted by the DTNN model. Atom colouring: carbon = black; hydrogen = white; oxygen = red; nitrogen = blue; fluorine = yellow.
Figure 4 | Isomer energies with chemical formula C7O2H10. DTNN trained on the GDB-9 database is able to accurately discriminate between 6,095 different isomers of C7O2H10, which exhibit a non-trivial spectrum of relative energies (predicted versus DFT atomization energies in kcal mol⁻¹; Kendall rank correlation coefficient = 0.969).
