Benchmarking Density-Functionals on
Structural Parameters of Small/Medium-Sized
Organic Molecules
Éric Brémond,† Marika Savarese,† Neil Qiang Su,‡ Ángel José Pérez-Jiménez,¶ Xin Xu,‡
Juan Carlos Sancho-García,¶ and Carlo Adamo*,§,‖,†
CompuNet, Istituto Italiano di Tecnologia, via Morego 30, I-16163 Genoa, Italy, Shanghai
Key Laboratory of Molecular Catalysis and Innovative Materials, MOE Laboratory for
Computational Physical Science, Department of Chemistry, Fudan University, Shanghai
200433, China, Departamento de Química Física, Universidad de Alicante, E-03080
Alicante, Spain, Chimie ParisTech, PSL Research University, CNRS, Institut de Recherche
de Chimie Paris IRCP, F-75005 Paris, France, and Institut Universitaire de France, 103
Boulevard Saint Michel, F-75005 Paris, France
E-mail: carlo.adamo@chimie-paristech.fr
* To whom correspondence should be addressed
† Istituto Italiano di Tecnologia
‡ Fudan University
¶ Universidad de Alicante
§ Chimie ParisTech
‖ Institut Universitaire de France
This is a previous version of the article published in the Journal of Chemical Theory and Computation. 2016, 12(2): 459-465. doi:10.1021/acs.jctc.5b01144
Abstract
In this letter we report the error analysis of 59 exchange-correlation functionals in
evaluating the structural parameters of small- and medium-sized organic molecules.
From this analysis, recently developed double-hybrids, such as xDH-PBE0, emerge as
the most reliable methods, while global-hybrids confirm their robustness in reproducing
molecular structures. Notably, the M06-L density-functional is the only semilocal
method reaching an accuracy comparable to that of hybrids. A comparison with errors
obtained on energetic databases (including thermochemistry, reaction barriers and
interaction energies) indicates that most of the functionals behave coherently, showing
low (or high) deviations on both energy and structure datasets. Only a few of them
perform markedly better on one of these two properties than on the other.
The quality of any method rooted in density functional theory (DFT) is (strongly) affected
by the choice of the exchange-correlation functional (ECF), which gives the unknown term
of the Kohn-Sham energy. On the one hand, the spread of DFT in chemistry and physics
has encouraged the search for new and better-performing density-functionals;[1] on the
other hand, their validation has become a necessary step before any routine application.
Such a benchmark relies on a careful evaluation (and subsequent statistical analysis) of the
errors on well-defined sets of properties and systems.
Starting from the nineties, a large effort has been made to define standard
benchmark sets allowing for a meaningful and fair comparison between different ECFs.[2–8]
Among the properties first targeted, atomization energies, ionization potentials and
electron affinities[2–4] as well as bond lengths and angles of (mostly) small organic systems
received particular attention.[5,6] Later, several other key properties were added to these,
such as different spectroscopic observables,[9–13] gaps in solids,[8,14] lattice constants,[8,15,16]
and reaction energies,[17] just to mention some.
Since performances on properties and structural parameters are generally believed to be
disconnected, it is a commonly accepted practice to carry out such benchmarks at given
molecular structures. However, some exceptions can be found in the literature,[18–20] mainly
concerning specific cases, like transition state structures,[21–23] weakly bound interacting
complexes,[24,25] conjugated polymers[26] or H-bonds.[27] Most of these latter studies showed
that ECFs performing well on a given non-structural property are not necessarily the best
candidates for an accurate determination of molecular geometries. Nevertheless, properties
and structures are often evaluated using the same ECF, which calls for more systematic
studies.
One of the main reasons for the current scarcity of benchmarks on molecular
structures is the lack of accurate reference values to perform these systematic
investigations. Within this framework, Barone and coworkers have recently developed two
reference databases of semiexperimental equilibrium structures of semirigid organic
molecules, named here CCse21 and B3se47.[28–30] Whereas the former gathers a collection of
21 small organic molecules ranging from tri- to octa-atomic systems, the latter is its
subsequent extension including 26 additional medium-sized organic systems dealing with
various types of covalent bonds and different molecular skeletons (see Figures S1 and S2 in
the Supporting Information). Both databases are an excellent diagnostic test to discriminate
density-functionals in modeling structural parameters of organic systems.
In this Letter, we use these two datasets to thoroughly benchmark the accuracy of 59
ECFs (reported in Table 1), and 3 post-Hartree-Fock (post-HF) approaches derived from the
second-order Møller-Plesset theory in its canonical (MP2) or spin-scaled versions (SCS- and
SOS-MP2). For the sake of completeness, the Hartree-Fock (HF) values are also reported.
The references and further details of all the considered computational methods involved in
this Letter are given in Table S1 of the Supporting Information.
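As a reminder of what "spin-scaled" means here, the following minimal sketch combines the opposite-spin and same-spin parts of the MP2 correlation energy with method-specific weights. The coefficients are the commonly published ones (6/5 and 1/3 for SCS-MP2, 1.3 and 0 for SOS-MP2); the text above does not list them, so treating them as the values used in this Letter is an assumption. The function name `scaled_mp2_correlation` and the dictionary `SPIN_SCALING` are hypothetical, and the energy components in the usage lines are placeholders, not computed values.

```python
# Spin-component weights (c_os, c_ss) for the three MP2 flavours.
# Assumption: standard literature values; the Letter itself does not quote them.
SPIN_SCALING = {
    "MP2": (1.0, 1.0),          # canonical MP2: both components unscaled
    "SCS-MP2": (6.0 / 5.0, 1.0 / 3.0),
    "SOS-MP2": (1.3, 0.0),      # same-spin part dropped entirely
}


def scaled_mp2_correlation(e_os, e_ss, method="MP2"):
    """Return c_os * E_os + c_ss * E_ss for the chosen MP2 variant.

    e_os, e_ss: opposite-spin and same-spin MP2 correlation energies
    (same units, e.g. hartree), as produced by a quantum chemistry code.
    """
    c_os, c_ss = SPIN_SCALING[method]
    return c_os * e_os + c_ss * e_ss


# Placeholder components, chosen only to exercise the function.
e_os_demo, e_ss_demo = -0.30, -0.10
for name in SPIN_SCALING:
    print(name, scaled_mp2_correlation(e_os_demo, e_ss_demo, name))
```

The scaled correlation energy is added to the HF energy in the same way as the canonical MP2 correction.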
In order to discriminate the accuracy of the selected approaches, we define a criterion
based on the matrix containing all the interatomic distances. For each system, we compute
the mean absolute deviation (MAD) over the distance matrix of the probed and the reference
geometries, and calculate the averaged deviation over the set. Figure 1 reports these statistics
for the 63 computational approaches considered in this Letter (see Tables S2 and S3 in the
Supporting Information for more details).
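The distance-matrix criterion described above can be sketched in a few lines of Python with NumPy. This is an illustration under stated assumptions, not the authors' script: it assumes that the probed and reference geometries list the atoms in the same order, and that the average runs over unique atom pairs (the upper triangle of the distance matrix), a detail the text does not specify. The function names are hypothetical.

```python
import numpy as np


def distance_matrix(coords):
    """Return the N x N matrix of interatomic distances for N x 3 coordinates."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def distance_matrix_mad(probe, reference):
    """Mean absolute deviation between two distance matrices (one molecule).

    Assumption: only unique atom pairs (i < j) enter the average.
    """
    d_probe = distance_matrix(probe)
    d_ref = distance_matrix(reference)
    if d_probe.shape != d_ref.shape:
        raise ValueError("geometries must contain the same number of atoms")
    i, j = np.triu_indices(d_ref.shape[0], k=1)
    return float(np.mean(np.abs(d_probe[i, j] - d_ref[i, j])))


def dataset_deviation(pairs):
    """Average the per-molecule MAD over (probe, reference) geometry pairs."""
    return float(np.mean([distance_matrix_mad(p, r) for p, r in pairs]))


# Toy three-atom geometry (in Å), invented for illustration only.
reference = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
probe = 1.01 * reference  # every interatomic distance stretched by 1 %
print(distance_matrix_mad(probe, reference))
```

Because the criterion depends only on interatomic distances, it is invariant to rigid translations and rotations of either geometry, so no prior structural alignment is needed.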
For the CCse21 dataset, the deviations span from 0.002 to 0.016 Å for the xDH-PBE0
and HF methods, respectively. Within this interval, a smooth transition from high to low
accuracy is observed. Apart from the worst performing ECFs like BLYP, B97D, B97D3 or
TPSS, most of the methods give a slight increase of the distance matrix deviation
(~1·10⁻³ Å) going from the CCse21 to the B3se47 database. In other words, most of the
methods show a coherent behavior for both small- and medium-sized semirigid organic compounds.
Going into the details, the top of the ranking is dominated by double-hybrids,
and more specifically by the xDH-PBE0 double-hybrid and some reparameterized
variants of the B2-PLYP density-functional, with deviations lower than 0.003 Å on the
distance matrix criterion. These ECFs (containing a fraction of nonlocal correlation)
outperform the other considered methods for the description of minimum energy structures.
They are followed by modern and highly parameterized range-separated hybrids belonging
to the ωB97 family of ECFs, which are often highlighted in the literature as promising
approaches to model several other chemical properties,[11,27,31] and by the large panel of
global-hybrids chosen to perform this investigation. Among these global-hybrids, SOGGA11-X,
an exchange-hybridized variant of the semilocal SOGGA11 already highlighted for its accuracy
in modeling bond lengths,[18–20] and the parameter-free PBE0 density-functional are among
the best performing approaches, with an error of 0.004 Å for both the CCse21 and B3se47
datasets. The popular B3LYP global-hybrid is in line with them, with an error of 0.005 Å on
the CCse21 dataset and of 0.007 Å on the B3se47 database. Distance matrix deviations are
larger than 0.007 Å for semilocal ECFs (CCse21 database), noting that mGGAs are generally
more accurate than GGAs. At this point, two remarkable results should be underlined.
On the one hand, GGA density-functionals including empirical dispersion corrections, such
as B97D and B97D3, perform worse than their parent approaches, thus suggesting problems in
their parameterization procedure. On the other hand, M06-L is the only semilocal ECF giving
results comparable to those obtained with the best performing (and more computationally
expensive) hybrid density-functionals.
Another interesting issue arising from Figure 1 concerns the performance of the Minnesota
density-functionals,[8] excluding the already mentioned SOGGA11-X. First-generation
density-functionals (e.g., M05, M06) are more accurate than those belonging to the second
generation (e.g., M11, N12-SX, MN12-SX). This effect could be related to the transition from
a global to a range-separated hybridization scheme. Note that the behavior of semilocal
approaches is more difficult to rationalize, even if some of the most recent ones are mainly
parameterized for thermochemistry (M11-L).
Figure 2 gives some insight into the accuracy of the computational approaches in modeling
the length of selected CH, CC and CO bonds extracted from the CCse21 database (see Table