
Showing papers in "Journal of Chemical Physics in 2018"


Journal ArticleDOI
TL;DR: SchNet is a deep learning architecture designed to model atomistic systems by making use of continuous-filter convolutional layers; the model learns chemically plausible embeddings of atom types across the periodic table.
Abstract: Deep learning has led to a paradigm shift in artificial intelligence, including web, text, and image search, speech recognition, as well as bioinformatics, with growing impact in chemical physics. Machine learning, in general, and deep learning, in particular, are ideally suitable for representing quantum-mechanical interactions, enabling us to model nonlinear potential-energy surfaces or enhancing the exploration of chemical compound space. Here we present the deep learning architecture SchNet that is specifically designed to model atomistic systems by making use of continuous-filter convolutional layers. We demonstrate the capabilities of SchNet by accurately predicting a range of properties across chemical space for molecules and materials, where our model learns chemically plausible embeddings of atom types across the periodic table. Finally, we employ SchNet to predict potential-energy surfaces and energy-conserving force fields for molecular dynamics simulations of small molecules and perform an exemplary study on the quantum-mechanical properties of C20-fullerene that would have been infeasible with regular ab initio molecular dynamics.
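The continuous-filter convolution at the heart of SchNet can be sketched in a few lines of NumPy: each atom aggregates the feature vectors of the other atoms, modulated element-wise by a filter generated from the interatomic distance. Everything below (the sizes, the Gaussian distance expansion, a single linear layer as the filter generator, the inclusion of the self term) is an illustrative toy, not SchNet's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_expand(d, centers, gamma=10.0):
    # expand interatomic distances on a grid of Gaussians
    return np.exp(-gamma * (d[..., None] - centers) ** 2)

n_atoms, n_feat, n_rbf = 5, 8, 16
centers = np.linspace(0.0, 5.0, n_rbf)
# toy filter-generating network: a single linear layer (SchNet uses an MLP)
W_filter = 0.1 * rng.normal(size=(n_rbf, n_feat))

pos = rng.normal(size=(n_atoms, 3))          # atomic positions
x = rng.normal(size=(n_atoms, n_feat))       # atom-type embeddings

d = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
filters = rbf_expand(d, centers) @ W_filter  # shape (n_atoms, n_atoms, n_feat)

# continuous-filter convolution: atom i aggregates features of all atoms j
# (including itself in this toy), modulated by a distance-dependent filter
x_new = np.einsum('jf,ijf->if', x, filters)
print(x_new.shape)
```

Because the filters depend only on interatomic distances, the updated features are invariant under rigid rotations and translations of the molecule.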

1,104 citations


Journal ArticleDOI
TL;DR: Active learning via query by committee (AL-QBC) uses the disagreement within an ensemble of ML potentials to infer the reliability of the ensemble's prediction; it improves the overall fitness of ANAKIN-ME (ANI) deep learning potentials by mitigating human biases in deciding what new training data to use.
Abstract: The development of accurate and transferable machine learning (ML) potentials for predicting molecular energetics is a challenging task. The process of data generation to train such ML potentials is a task neither well understood nor researched in detail. In this work, we present a fully automated approach for the generation of datasets with the intent of training universal ML potentials. It is based on the concept of active learning (AL) via Query by Committee (QBC), which uses the disagreement between an ensemble of ML potentials to infer the reliability of the ensemble's prediction. QBC allows the presented AL algorithm to automatically sample regions of chemical space where the ML potential fails to accurately predict the potential energy. AL improves the overall fitness of ANAKIN-ME (ANI) deep learning potentials in rigorous test cases by mitigating human biases in deciding what new training data to use. AL also reduces the training set size to a fraction of the data required when using naive random sampling techniques. To provide validation of our AL approach, we develop the COmprehensive Machine-learning Potential (COMP6) benchmark (publicly available on GitHub) which contains a diverse set of organic molecules. Active learning-based ANI potentials outperform the original random sampled ANI-1 potential with only 10% of the data, while the final active learning-based model vastly outperforms ANI-1 on the COMP6 benchmark after training to only 25% of the data. Finally, we show that our proposed AL technique develops a universal ANI potential (ANI-1x) that provides accurate energy and force predictions on the entire COMP6 benchmark. This universal ML potential achieves a level of accuracy on par with the best ML potentials for single molecules or materials, while remaining applicable to the general class of organic molecules composed of the elements CHNO.
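The query-by-committee step can be illustrated with a toy committee of bootstrap-resampled polynomial fits standing in for the ensemble of ANI networks; the candidate configurations where the committee disagrees most are the ones that would be sent for new reference calculations (all models and data here are hypothetical stand-ins, not the paper's actual procedure):

```python
import numpy as np

rng = np.random.default_rng(1)

def train_committee(X, y, n_models=5, degree=4):
    # bootstrap-resampled polynomial fits stand in for an ensemble of ML potentials
    models = []
    for _ in range(n_models):
        idx = rng.choice(len(X), size=len(X), replace=True)
        models.append(np.polyfit(X[idx], y[idx], degree))
    return models

def disagreement(models, X):
    # committee standard deviation: large where the ensemble is unreliable
    preds = np.stack([np.polyval(m, X) for m in models])
    return preds.std(axis=0)

X_train = rng.uniform(-1.0, 1.0, 30)         # sampled configurations
y_train = np.sin(3.0 * X_train)              # toy "potential energy"
committee = train_committee(X_train, y_train)

X_pool = np.linspace(-2.0, 2.0, 401)         # candidate configurations
sigma = disagreement(committee, X_pool)
queries = X_pool[np.argsort(sigma)[-5:]]     # most contested points -> new QM data
print(np.sort(np.abs(queries)).round(2))
```

On this toy problem the most contested candidates lie outside the sampled interval [-1, 1], which is exactly the region where new training data would be most informative.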

362 citations


Journal ArticleDOI
TL;DR: An improved perturbative triples correction (T) algorithm for domain based local pair-natural orbital singles and doubles coupled cluster (DLPNO-CCSD) theory is reported; by using triples natural orbitals to represent the virtual spaces for triples amplitudes, storage bottlenecks are avoided.
Abstract: In this communication, an improved perturbative triples correction (T) algorithm for domain based local pair-natural orbital singles and doubles coupled cluster (DLPNO-CCSD) theory is reported. In our previous implementation, the semi-canonical approximation was used and linear scaling was achieved for both the DLPNO-CCSD and (T) parts of the calculation. In this work, we refer to this previous method as DLPNO-CCSD(T0) to emphasize the semi-canonical approximation. It is well-established that the DLPNO-CCSD method can predict very accurate absolute and relative energies with respect to the parent canonical CCSD method. However, the (T0) approximation may introduce significant errors in absolute energies as the triples correction grows in magnitude. In the majority of cases, the relative energies from (T0) are as accurate as the canonical (T) results themselves. Unfortunately, in rare cases and in particular for small gap systems, the (T0) approximation breaks down and relative energies show large deviations from the parent canonical CCSD(T) results. To address this problem, an iterative (T) algorithm based on the previous DLPNO-CCSD(T0) algorithm has been implemented [abbreviated here as DLPNO-CCSD(T)]. Using triples natural orbitals to represent the virtual spaces for triples amplitudes, storage bottlenecks are avoided. Various carefully designed approximations ease the computational burden such that overall, the increase in the DLPNO-(T) calculation time over DLPNO-(T0) only amounts to a factor of about two (depending on the basis set). Benchmark calculations for the GMTKN30 database show that compared to DLPNO-CCSD(T0), the errors in absolute energies are greatly reduced and relative energies are moderately improved. The particularly problematic case of cumulene chains of increasing lengths is also successfully addressed by DLPNO-CCSD(T).

344 citations


Journal ArticleDOI
TL;DR: A revised version of the well-established B97-D density functional approximation with general applicability for chemical properties of large systems is proposed, based on Becke's power-series ansatz from 1997 and explicitly parametrized by including the standard D3 semi-classical dispersion correction.
Abstract: A revised version of the well-established B97-D density functional approximation with general applicability for chemical properties of large systems is proposed. Like B97-D, it is based on Becke’s power-series ansatz from 1997 and is explicitly parametrized by including the standard D3 semi-classical dispersion correction. The orbitals are expanded in a modified valence triple-zeta Gaussian basis set, which is available for all elements up to Rn. Remaining basis set errors are mostly absorbed in the modified B97 parametrization, while an established atom-pairwise short-range potential is applied to correct for the systematically too long bonds of main group elements which are typical for most semi-local density functionals. The new composite scheme (termed B97-3c) completes the hierarchy of “low-cost” electronic structure methods, which are all mainly free of basis set superposition error and account for most interactions in a physically sound and asymptotically correct manner. B97-3c yields excellent mol...

342 citations


Journal ArticleDOI
TL;DR: A representation of any atom in any chemical environment for the automatized generation of universal kernel ridge regression-based quantum machine learning (QML) models of electronic properties, trained throughout chemical compound space is introduced.
Abstract: We introduce a representation of any atom in any chemical environment for the automatized generation of universal kernel ridge regression-based quantum machine learning (QML) models of electronic properties, trained throughout chemical compound space. The representation is based on Gaussian distribution functions, scaled by power laws and explicitly accounting for structural as well as elemental degrees of freedom. The elemental components help us to lower the QML model’s learning curve, and, through interpolation across the periodic table, even enable “alchemical extrapolation” to covalent bonding between elements not part of training. This point is demonstrated for the prediction of covalent binding in single, double, and triple bonds among main-group elements as well as for atomization energies in organic molecules. We present numerical evidence that resulting QML energy models, after training on a few thousand random training instances, reach chemical accuracy for out-of-sample compounds. Compound dat...
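Kernel ridge regression itself is compact enough to sketch: a property is modeled as a kernel-weighted sum over training representations, with the weights obtained from a single regularized linear solve. The one-dimensional "representation" and target below are placeholders for the paper's atomic representation and electronic properties:

```python
import numpy as np

def gaussian_kernel(A, B, sigma=0.5):
    # Gaussian kernel on pairwise squared distances between representation vectors
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

# 1-d stand-in for atomic-environment representations and a target property
X = np.linspace(0.0, 2.0 * np.pi, 50)[:, None]
y = np.sin(X[:, 0])

lam = 1e-6                                   # ridge regularization strength
K = gaussian_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)   # training = one linear solve

X_new = np.array([[1.0], [4.0]])             # out-of-sample "compounds"
y_pred = gaussian_kernel(X_new, X) @ alpha   # KRR prediction
print(y_pred)
```

The regularization strength and kernel width are the two hyperparameters that, in the QML setting, control the slope of the learning curve.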

297 citations


Journal ArticleDOI
TL;DR: It is shown that the time-lagged autoencoder reliably finds low-dimensional embeddings for high-dimensional feature spaces which capture the slow dynamics of the underlying stochastic processes, beyond the capabilities of linear dimension reduction techniques.
Abstract: Inspired by the success of deep learning techniques in the physical and chemical sciences, we apply a modification of an autoencoder type deep neural network to the task of dimension reduction of molecular dynamics data. We can show that our time-lagged autoencoder reliably finds low-dimensional embeddings for high-dimensional feature spaces which capture the slow dynamics of the underlying stochastic processes—beyond the capabilities of linear dimension reduction techniques.
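A linear caricature of the time-lagged objective makes the mechanism visible: instead of reconstructing x_t, the network predicts x_{t+τ} from a low-dimensional encoding of x_t, so the latent coordinate must capture what is predictable over the lag time, i.e., the slow dynamics. Below, a rank-1 linear encoder/decoder is fitted by alternating least squares on a synthetic trajectory (the paper uses deep nonlinear networks; all parameters here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

# synthetic 2-d trajectory: a slow mode hidden under a high-variance fast mode
n, tau = 5000, 10
slow = np.zeros(n)
for t in range(1, n):                      # slowly decorrelating process
    slow[t] = 0.995 * slow[t - 1] + 0.05 * rng.normal()
fast = rng.normal(size=n)                  # fast mode: larger variance, no memory
traj = np.stack([slow + fast, slow - fast], axis=1)

X, Y = traj[:-tau], traj[tau:]             # training pairs (x_t, x_{t+tau})

# rank-1 linear "time-lagged autoencoder": minimize ||x_{t+tau} - d (e . x_t)||^2
# by alternating least squares over encoder e and decoder d
e = rng.normal(size=2)
for _ in range(50):
    z = X @ e                              # 1-d latent coordinate
    d = Y.T @ z / (z @ z)                  # optimal decoder for fixed encoder
    e = np.linalg.lstsq(X, Y @ d / (d @ d), rcond=None)[0]
e /= np.linalg.norm(e)
print(e)
```

The fast mode has the larger variance here, so a variance-maximizing projection (PCA) would pick the direction (1, -1); the time-lagged objective instead recovers the slow direction (1, 1)/√2, up to sign.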

295 citations


Journal ArticleDOI
TL;DR: HIP-NN achieves state-of-the-art performance on a dataset of 131k ground state organic molecules and predicts energies with 0.26 kcal/mol mean absolute error.
Abstract: We introduce the Hierarchically Interacting Particle Neural Network (HIP-NN) to model molecular properties from datasets of quantum calculations. Inspired by a many-body expansion, HIP-NN decomposes properties, such as energy, as a sum over hierarchical terms. These terms are generated from a neural network—a composition of many nonlinear transformations—acting on a representation of the molecule. HIP-NN achieves the state-of-the-art performance on a dataset of 131k ground state organic molecules and predicts energies with 0.26 kcal/mol mean absolute error. With minimal tuning, our model is also competitive on a dataset of molecular dynamics trajectories. In addition to enabling accurate energy predictions, the hierarchical structure of HIP-NN helps to identify regions of model uncertainty.

258 citations


Journal ArticleDOI
TL;DR: Automatic protocols to select a number of fingerprints out of a large pool of candidates, based on the correlations that are intrinsic to the training data, can greatly simplify the construction of neural network potentials that strike the best balance between accuracy and computational efficiency.
Abstract: Machine learning of atomic-scale properties is revolutionizing molecular modeling, making it possible to evaluate inter-atomic potentials with first-principles accuracy, at a fraction of the costs. The accuracy, speed, and reliability of machine learning potentials, however, depend strongly on the way atomic configurations are represented, i.e., the choice of descriptors used as input for the machine learning method. The raw Cartesian coordinates are typically transformed in "fingerprints," or "symmetry functions," that are designed to encode, in addition to the structure, important properties of the potential energy surface like its invariances with respect to rotation, translation, and permutation of like atoms. Here we discuss automatic protocols to select a number of fingerprints out of a large pool of candidates, based on the correlations that are intrinsic to the training data. This procedure can greatly simplify the construction of neural network potentials that strike the best balance between accuracy and computational efficiency and has the potential to accelerate by orders of magnitude the evaluation of Gaussian approximation potentials based on the smooth overlap of atomic positions kernel. We present applications to the construction of neural network potentials for water and for an Al-Mg-Si alloy and to the prediction of the formation energies of small organic molecules using Gaussian process regression.
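A minimal stand-in for such a selection protocol: given a pool of candidate fingerprints evaluated on the training structures, greedily keep the fingerprints that are least correlated with those already selected, so near-duplicates are skipped. The paper's CUR- and farthest-point-sampling-based protocols are more sophisticated; the fingerprint matrix below is synthetic:

```python
import numpy as np

rng = np.random.default_rng(4)

# toy fingerprint matrix: 6 candidate descriptors evaluated on 300 structures,
# with two pairs of nearly redundant columns
base = rng.normal(size=(300, 2))
F = np.column_stack([
    base[:, 0],
    base[:, 0] + 0.01 * rng.normal(size=300),   # near-duplicate of column 0
    base[:, 1],
    base[:, 1] + 0.01 * rng.normal(size=300),   # near-duplicate of column 2
    base.sum(axis=1),
    rng.normal(size=300),                       # genuinely independent information
])

def select_fingerprints(F, n_keep):
    # greedy selection: start from column 0, then repeatedly add the candidate
    # least correlated with everything already chosen
    C = np.abs(np.corrcoef(F, rowvar=False))
    chosen = [0]
    while len(chosen) < n_keep:
        similarity = C[:, chosen].max(axis=1)
        similarity[chosen] = np.inf             # never re-pick a chosen column
        chosen.append(int(np.argmin(similarity)))
    return chosen

print(select_fingerprints(F, 3))
```

The selection never picks both members of a redundant pair, which is the behavior that lets a small fingerprint set retain the spatial resolution of the full pool.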

248 citations


Journal ArticleDOI
TL;DR: The usefulness and reliability of RAVE are demonstrated by applying it to model potentials of increasing complexity, including computation of the binding free energy profile for a hydrophobic ligand-substrate system in explicit water with a dissociation time of more than 3 min, in at least twenty times less computer time than needed for umbrella sampling or metadynamics.
Abstract: Here we propose the reweighted autoencoded variational Bayes for enhanced sampling (RAVE) method, a new iterative scheme that uses the deep learning framework of variational autoencoders to enhance sampling in molecular simulations. RAVE involves iterations between molecular simulations and deep learning in order to produce an increasingly accurate probability distribution along a low-dimensional latent space that captures the key features of the molecular simulation trajectory. Using the Kullback-Leibler divergence between this latent space distribution and the distribution of various trial reaction coordinates sampled from the molecular simulation, RAVE determines an optimum, yet nonetheless physically interpretable, reaction coordinate and optimum probability distribution. Both then directly serve as the biasing protocol for a new biased simulation, which is once again fed into the deep learning module with appropriate weights accounting for the bias, the procedure continuing until estimates of desirable thermodynamic observables are converged. Unlike recent methods using deep learning for enhanced sampling purposes, RAVE stands out in that (a) it naturally produces a physically interpretable reaction coordinate, (b) it is independent of existing enhanced sampling protocols to enhance the fluctuations along the latent space identified via deep learning, and (c) it provides the ability to easily filter out spurious solutions learned by the deep learning procedure. The usefulness and reliability of RAVE are demonstrated by applying it to model potentials of increasing complexity, including computation of the binding free energy profile for a hydrophobic ligand-substrate system in explicit water with dissociation time of more than 3 min, in computer time at least twenty times less than that needed for umbrella sampling or metadynamics.
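The screening of trial reaction coordinates can be sketched directly: histogram the latent coordinate (simulated here in place of a trained variational autoencoder) and each candidate coordinate, then rank the candidates by Kullback-Leibler divergence. All coordinates and weights below are synthetic placeholders:

```python
import numpy as np

rng = np.random.default_rng(5)

def kl_divergence(p, q, eps=1e-12):
    # discrete KL divergence between two (unnormalized) histograms
    p, q = p + eps, q + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

# toy 2-d trajectory; pretend the VAE latent coordinate is 0.8*x + 0.2*y
traj = rng.normal(size=(20000, 2)) * [1.0, 0.3]
latent = traj @ np.array([0.8, 0.2])

# candidate physically interpretable reaction coordinates
trials = {"x": [1.0, 0.0], "y": [0.0, 1.0], "0.8x+0.2y": [0.8, 0.2]}

bins = np.linspace(-4.0, 4.0, 81)
p_latent, _ = np.histogram(latent, bins=bins)
kls = {}
for name, w in trials.items():
    p_trial, _ = np.histogram(traj @ np.array(w), bins=bins)
    kls[name] = kl_divergence(p_latent, p_trial)
print(min(kls, key=kls.get))   # -> 0.8x+0.2y
```

The trial coordinate whose distribution best matches the latent one minimizes the divergence and would serve as the biasing coordinate for the next RAVE round.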

225 citations


Journal ArticleDOI
TL;DR: For the wACSFs employed here, a simple empirical parametrization scheme is found to be sufficient to obtain HDNNPs with high accuracy, although the intrinsic parameters of the descriptors can in principle be optimized with a genetic algorithm in a highly automated manner.
Abstract: We introduce weighted atom-centered symmetry functions (wACSFs) as descriptors of a chemical system’s geometry for use in the prediction of chemical properties such as enthalpies or potential energies via machine learning. The wACSFs are based on conventional atom-centered symmetry functions (ACSFs) but overcome the undesirable scaling of the latter with an increasing number of different elements in a chemical system. The performance of these two descriptors is compared using them as inputs in high-dimensional neural network potentials (HDNNPs), employing the molecular structures and associated enthalpies of the 133 855 molecules containing up to five different elements reported in the QM9 database as reference data. A substantially smaller number of wACSFs than ACSFs is needed to obtain a comparable spatial resolution of the molecular structures. At the same time, this smaller set of wACSFs leads to a significantly better generalization performance in the machine learning potential than the large set of conventional ACSFs. Furthermore, we show that the intrinsic parameters of the descriptors can in principle be optimized with a genetic algorithm in a highly automated manner. For the wACSFs employed here, we find however that using a simple empirical parametrization scheme is sufficient in order to obtain HDNNPs with high accuracy.
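A radial wACSF can be written down in a few lines: a standard Gaussian radial symmetry function in which each neighbor's contribution is multiplied by an element-dependent weight, so one parameter set covers all elements instead of one set per element pair. The weight choice w(Z) = Z, the cutoff, and the geometry below are illustrative assumptions:

```python
import numpy as np

def cutoff(r, rc=6.0):
    # Behler-style cosine cutoff function
    return np.where(r < rc, 0.5 * (np.cos(np.pi * r / rc) + 1.0), 0.0)

def radial_wacsf(pos, Z, eta, mu):
    # one weighted radial symmetry function per atom: element identity enters
    # only through the neighbor weight w(Z) = Z, so a single (eta, mu) set
    # serves all elements at once
    d = np.linalg.norm(pos[:, None] - pos[None, :], axis=-1)
    np.fill_diagonal(d, 1e6)                 # exclude self-interaction
    g = Z[None, :] * np.exp(-eta * (d - mu) ** 2) * cutoff(d)
    return g.sum(axis=1)

# toy water geometry (angstrom): O first, then the two H atoms
pos = np.array([[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]])
Z = np.array([8, 1, 1])
print(radial_wacsf(pos, Z, eta=4.0, mu=1.0))
```

With conventional per-element-pair ACSFs the descriptor count grows with the number of element combinations; the weighting collapses this to a single descriptor per (η, μ) choice, which is the scaling advantage the paper exploits.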

196 citations


Journal ArticleDOI
TL;DR: Density functional theory is used to study the ensemble, ligand, and strain effects of close-packed surfaces alloyed by transition metals with a combination of strong and weak adsorption of H and O; the tunability of adsorbate binding on random alloys is found to be predominantly described by the ensemble effect.
Abstract: Alloying elements with strong and weak adsorption properties can produce a catalyst with optimally tuned adsorbate binding. A full understanding of this alloying effect, however, is not well-established. Here, we use density functional theory to study the ensemble, ligand, and strain effects of close-packed surfaces alloyed by transition metals with a combination of strong and weak adsorption of H and O. Specifically, we consider PdAu, RhAu, and PtAu bimetallics as ordered and randomly alloyed (111) surfaces, as well as randomly alloyed 140-atom clusters. In these alloys, Au is the weak-binding component and Pd, Rh, and Pt are characteristic strong-binding metals. In order to separate the different effects of alloying on binding, we calculate the tunability of H- and O-binding energies as a function of lattice constant (strain effect), number of alloy-substituted sublayers (ligand effect), and randomly alloyed geometries (ensemble effect). We find that on these alloyed surfaces, the ensemble effect more significantly tunes the adsorbate binding as compared to the ligand and strain effects, with the binding energies predominantly determined by the local adsorption environment provided by the specific triatomic ensemble on the (111) surface. However, we also find that tuning of adsorbate binding from the ligand and strain effects cannot be neglected in a quantitative description. Extending our studies to other bimetallics (PdAg, RhAg, PtAg, PdCu, RhCu, and PtCu), we find similar conclusions that the tunability of adsorbate binding on random alloys is predominately described by the ensemble effect.

Journal ArticleDOI
TL;DR: Recent progress in the theory and simulation of quantum transport in molecular junctions is discussed and challenges are identified, which appear crucial to achieve a comprehensive and quantitative understanding of transport in these systems.
Abstract: Molecular junctions, where single molecules are bound to metal or semiconductor electrodes, represent a unique architecture to investigate molecules in a distinct nonequilibrium situation and, in a broader context, to study basic mechanisms of charge and energy transport in a many-body quantum system at the nanoscale. Experimental studies of molecular junctions have revealed a wealth of interesting transport phenomena, the understanding of which necessitates theoretical modeling. The accurate theoretical description of quantum transport in molecular junctions is challenging because it requires methods that are capable to describe the electronic structure and dynamics of molecules in a condensed phase environment out of equilibrium, in some cases with strong electron-electron and/or electronic-vibrational interaction. This perspective discusses recent progress in the theory and simulation of quantum transport in molecular junctions. Furthermore, challenges are identified, which appear crucial to achieve a comprehensive and quantitative understanding of transport in these systems.

Journal ArticleDOI
TL;DR: This article gives an overview of excess-entropy scaling, the 1977 discovery by Rosenfeld that entropy determines properties of liquids like viscosity, diffusion constant, and heat conductivity, and gives examples from computer simulations confirming this intriguing connection between dynamics and thermodynamics.
Abstract: This article gives an overview of excess-entropy scaling, the 1977 discovery by Rosenfeld that entropy determines properties of liquids like viscosity, diffusion constant, and heat conductivity. We give examples from computer simulations confirming this intriguing connection between dynamics and thermodynamics, counterexamples, and experimental validations. Recent uses in application-related contexts are reviewed, and theories proposed for the origin of excess-entropy scaling are briefly summarized. It is shown that if two thermodynamic state points of a liquid have the same microscopic dynamics, they must have the same excess entropy. In this case, the potential-energy function exhibits a symmetry termed hidden scale invariance, stating that the ordering of the potential energies of configurations is maintained if these are scaled uniformly to a different density. This property leads to the isomorph theory, which provides a general framework for excess-entropy scaling and illuminates, in particular, why this does not apply rigorously and universally. It remains an open question whether all aspects of excess-entropy scaling and related regularities reflect hidden scale invariance in one form or other.

Journal ArticleDOI
TL;DR: In this article, a combination of physics-based potentials with machine learning (ML), coined IPML, is proposed; it handles new molecules and conformations without explicit prior parametrization and is transferable across small neutral organic and biologically relevant molecules.
Abstract: Classical intermolecular potentials typically require an extensive parametrization procedure for any new compound considered. To do away with prior parametrization, we propose a combination of physics-based potentials with machine learning (ML), coined IPML, which is transferable across small neutral organic and biologically relevant molecules. ML models provide on-the-fly predictions for environment-dependent local atomic properties: electrostatic multipole coefficients (significant error reduction compared to previously reported), the population and decay rate of valence atomic densities, and polarizabilities across conformations and chemical compositions of H, C, N, and O atoms. These parameters enable accurate calculations of intermolecular contributions—electrostatics, charge penetration, repulsion, induction/polarization, and many-body dispersion. Unlike other potentials, this model is transferable in its ability to handle new molecules and conformations without explicit prior parametrization: All l...

Journal ArticleDOI
TL;DR: This paper re-fits an accurate PES of formaldehyde and compares PES errors on the entire point set used to solve the vibrational Schrödinger equation, i.e., the only error that matters in quantum dynamics calculations.
Abstract: For molecules with more than three atoms, it is difficult to fit or interpolate a potential energy surface (PES) from a small number of (usually ab initio) energies at points. Many methods have been proposed in recent decades, each claiming a set of advantages. Unfortunately, there are few comparative studies. In this paper, we compare neural networks (NNs) with Gaussian process (GP) regression. We re-fit an accurate PES of formaldehyde and compare PES errors on the entire point set used to solve the vibrational Schrodinger equation, i.e., the only error that matters in quantum dynamics calculations. We also compare the vibrational spectra computed on the underlying reference PES and the NN and GP potential surfaces. The NN and GP surfaces are constructed with exactly the same points, and the corresponding spectra are computed with the same points and the same basis. The GP fitting error is lower, and the GP spectrum is more accurate. The best NN fits to 625/1250/2500 symmetry unique potential energy poin...

Journal ArticleDOI
Linfeng Zhang, Jiequn Han, Han Wang, Roberto Car, Weinan E
TL;DR: In this article, the Deep Coarse-Grained Potential (abbreviated DeePCG) model was proposed to construct a many-body coarse-grained potential.
Abstract: We introduce a general framework for constructing coarse-grained potential models without ad hoc approximations such as limiting the potential to two- and/or three-body contributions. The scheme, called the Deep Coarse-Grained Potential (abbreviated DeePCG), exploits a carefully crafted neural network to construct a many-body coarse-grained potential. The network is trained with full atomistic data in a way that preserves the natural symmetries of the system. The resulting model is very accurate and can be used to sample the configurations of the coarse-grained variables in a much faster way than with the original atomistic model. As an application, we consider liquid water and use the oxygen coordinates as the coarse-grained variables, starting from a full atomistic simulation of this system at the ab initio molecular dynamics level. We find that the two-body, three-body, and higher-order oxygen correlation functions produced by the coarse-grained and full atomistic models agree very well with each other, illustrating the effectiveness of the DeePCG model on a rather challenging task.

Journal ArticleDOI
TL;DR: An extension of the SNAP form that includes quadratic terms in the bispectrum components is proposed that is shown to provide a large increase in accuracy relative to the linear form, while incurring only a modest increase in computational cost.
Abstract: The Spectral Neighbor Analysis Potential (SNAP) is a classical interatomic potential that expresses the energy of each atom as a linear function of selected bispectrum components of the neighbor atoms. An extension of the SNAP form is proposed that includes quadratic terms in the bispectrum components. The extension is shown to provide a large increase in accuracy relative to the linear form, while incurring only a modest increase in computational cost. The mathematical structure of the quadratic SNAP form is similar to the embedded atom method (EAM), with the SNAP bispectrum components serving as counterparts to the two-body density functions in EAM. The effectiveness of the new form is demonstrated using an extensive set of training data for tantalum structures. Similar to artificial neural network potentials, the quadratic SNAP form requires substantially more training data in order to prevent overfitting. The quality of this new potential form is measured through a robust cross-validation analysis.
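The linear-to-quadratic extension amounts to augmenting the regression design matrix with all pairwise products of the bispectrum components. A toy regression (random stand-ins for bispectrum components and a synthetic per-atom energy with genuine quadratic structure) shows why the extra terms matter:

```python
import numpy as np

rng = np.random.default_rng(6)

# toy per-atom descriptors standing in for bispectrum components
B = rng.normal(size=(500, 4))
# synthetic "energy" with genuine quadratic structure in the descriptors
E = B @ [1.0, -0.5, 0.3, 0.1] + 0.5 * B[:, 0] * B[:, 1] + 0.2 * B[:, 2] ** 2

def design(B, quadratic=False):
    X = np.column_stack([np.ones(len(B)), B])    # linear SNAP: constant + B_k
    if quadratic:                                # quadratic SNAP: add B_k * B_l
        i, j = np.triu_indices(B.shape[1])
        X = np.column_stack([X, B[:, i] * B[:, j]])
    return X

rmse = {}
for quad in (False, True):
    X = design(B, quad)
    coef, *_ = np.linalg.lstsq(X, E, rcond=None)
    rmse[quad] = float(np.sqrt(np.mean((X @ coef - E) ** 2)))
print(rmse)   # the quadratic form fits the quadratic target essentially exactly
```

As the paper notes for the tantalum fits, the enlarged basis also raises the risk of overfitting, which is why the quadratic form needs substantially more training data and cross-validation.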

Journal ArticleDOI
TL;DR: In this article, the performance of permutationally invariant polynomials, neural networks, and Gaussian approximation potentials (GAPs) in representing water two-body and three-body interaction energies was investigated.
Abstract: The accurate representation of multidimensional potential energy surfaces is a necessary requirement for realistic computer simulations of molecular systems. The continued increase in computer power accompanied by advances in correlated electronic structure methods nowadays enables routine calculations of accurate interaction energies for small systems, which can then be used as references for the development of analytical potential energy functions (PEFs) rigorously derived from many-body (MB) expansions. Building on the accuracy of the MB-pol many-body PEF, we investigate here the performance of permutationally invariant polynomials (PIPs), neural networks, and Gaussian approximation potentials (GAPs) in representing water two-body and three-body interaction energies, denoting the resulting potentials PIP-MB-pol, Behler-Parrinello neural network-MB-pol, and GAP-MB-pol, respectively. Our analysis shows that all three analytical representations exhibit similar levels of accuracy in reproducing both two-body and three-body reference data as well as interaction energies of small water clusters obtained from calculations carried out at the coupled cluster level of theory, the current gold standard for chemical accuracy. These results demonstrate the synergy between interatomic potentials formulated in terms of a many-body expansion, such as MB-pol, that are physically sound and transferable, and machine-learning techniques that provide a flexible framework to approximate the short-range interaction energy terms.

Journal ArticleDOI
TL;DR: The results suggest that ωB97M(2) has the potential to serve as a powerful predictive tool for accurate and efficient electronic structure calculations of main-group chemistry.
Abstract: A meta-generalized gradient approximation, range-separated double hybrid (DH) density functional with VV10 non-local correlation is presented. The final 14-parameter functional form is determined by screening trillions of candidate fits through a combination of best subset selection, forward stepwise selection, and random sample consensus (RANSAC) outlier detection. The MGCDB84 database of 4986 data points is employed in this work, containing a training set of 870 data points, a validation set of 2964 data points, and a test set of 1152 data points. Following an xDH approach, orbitals from the ωB97M-V density functional are used to compute the second-order perturbation theory correction. The resulting functional, ωB97M(2), is benchmarked against a variety of leading double hybrid density functionals, including B2PLYP-D3(BJ), B2GPPLYP-D3(BJ), ωB97X-2(TQZ), XYG3, PTPSS-D3(0), XYGJ-OS, DSD-PBEP86-D3(BJ), and DSD-PBEPBE-D3(BJ). Encouragingly, the overall performance of ωB97M(2) on nearly 5000 data points clearly surpasses that of all of the tested density functionals. As a Rung 5 density functional, ωB97M(2) completes our family of combinatorially optimized functionals, complementing B97M-V on Rung 3, and ωB97X-V and ωB97M-V on Rung 4. The results suggest that ωB97M(2) has the potential to serve as a powerful predictive tool for accurate and efficient electronic structure calculations of main-group chemistry.

Journal ArticleDOI
TL;DR: In this paper, the authors extend SINDy to stochastic dynamical systems, which are frequently used to model biophysical processes, and prove the asymptotic correctness of stochastic SINDy in the infinite data limit.
Abstract: With the rapid increase of available data for complex systems, there is great interest in the extraction of physically relevant information from massive datasets. Recently, a framework called Sparse Identification of Nonlinear Dynamics (SINDy) has been introduced to identify the governing equations of dynamical systems from simulation data. In this study, we extend SINDy to stochastic dynamical systems which are frequently used to model biophysical processes. We prove the asymptotic correctness of stochastic SINDy in the infinite data limit, both in the original and projected variables. We discuss algorithms to solve the sparse regression problem arising from the practical implementation of SINDy and show that cross validation is an essential tool to determine the right level of sparsity. We demonstrate the proposed methodology on two test systems, namely, the diffusion in a one-dimensional potential and the projected dynamics of a two-dimensional diffusion process.
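For an overdamped diffusion, the stochastic variant boils down to regressing finite-difference increments (whose conditional mean is the drift) on a library of candidate functions, then iterating a sparsity threshold. A minimal sketch for a 1-d Ornstein-Uhlenbeck process follows; it is simpler than the paper's double-well and 2-d test systems, and the library and threshold are ad hoc choices:

```python
import numpy as np

rng = np.random.default_rng(7)

# simulate overdamped diffusion dx = -x dt + sqrt(2 D) dW
dt, n, D = 1e-3, 400_000, 1.0
noise = np.sqrt(2.0 * D * dt) * rng.normal(size=n)
x = np.zeros(n)
for t in range(n - 1):
    x[t + 1] = x[t] - x[t] * dt + noise[t]

# regress finite-difference drift estimates on a candidate function library
y = np.diff(x) / dt
Theta = np.column_stack([np.ones(n - 1), x[:-1], x[:-1] ** 2, x[:-1] ** 3])

# sequentially thresholded least squares: the sparsity step of SINDy
xi, *_ = np.linalg.lstsq(Theta, y, rcond=None)
for _ in range(10):
    small = np.abs(xi) < 0.4
    xi[small] = 0.0
    keep = ~small
    xi[keep], *_ = np.linalg.lstsq(Theta[:, keep], y, rcond=None)
print(np.round(xi, 2))   # only the linear term survives: drift ~ -x
```

The finite-difference targets are extremely noisy (variance 2D/dt per sample), but the regression averages over the whole trajectory, and the thresholding loop then zeroes the spurious library terms.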

Journal ArticleDOI
TL;DR: This work proposes a methodology to speed up the sampling of amorphous and disordered materials using a combination of a genetic algorithm and a specialized machine-learning potential based on artificial neural networks (ANNs).
Abstract: The atomistic modeling of amorphous materials requires structure sizes and sampling statistics that are challenging to achieve with first-principles methods. Here, we propose a methodology to speed up the sampling of amorphous and disordered materials using a combination of a genetic algorithm and a specialized machine-learning potential based on artificial neural networks (ANNs). We show for the example of the amorphous LiSi alloy that around 1000 first-principles calculations are sufficient for the ANN-potential assisted sampling of low-energy atomic configurations in the entire amorphous LixSi phase space. The obtained phase diagram is validated by comparison with the results from an extensive sampling of LixSi configurations using molecular dynamics simulations and a general ANN potential trained to ∼45 000 first-principles calculations. This demonstrates the utility of the approach for the first-principles modeling of amorphous materials.

Journal ArticleDOI
TL;DR: In this paper, a local model of interatomic interactions is proposed for predicting molecular properties, which provides high accuracy when trained on relatively small training sets and an active learning algorithm of optimally choosing the training set that maximizes the expected performance.
Abstract: In recent years, machine learning techniques have shown great potential in various problems from a multitude of disciplines, including materials design and drug discovery. The high computational speed on the one hand and the accuracy comparable to that of density functional theory on the other hand make machine learning algorithms efficient for high-throughput screening through chemical and configurational space. However, the machine learning algorithms available in the literature require large training datasets to reach chemical accuracy and also show large errors for the so-called outliers, i.e., the out-of-sample molecules not well represented in the training set. In the present paper, we propose a new machine learning algorithm for predicting molecular properties that addresses these two issues: it is based on a local model of interatomic interactions providing high accuracy when trained on relatively small training sets and an active learning algorithm of optimally choosing the training set that sig...

Journal ArticleDOI
TL;DR: This work analyzes the electric double layer in an approach beyond the point charge scheme by instead assessing charge polarizations at electrochemical metal-water interfaces from first principles and derives the electrode potential from the charge polarization.
Abstract: The description of electrode-electrolyte interfaces is based on the concept of the formation of an electric double layer. This concept was derived from continuum theories extended by introducing point charge distributions. Based on ab initio molecular dynamics simulations, we analyze the electric double layer in an approach beyond the point charge scheme by instead assessing charge polarizations at electrochemical metal-water interfaces from first principles. We show that the atomic structure of water layers at room temperature leads to an oscillatory behavior of the averaged electrostatic potential. We address the relation between the polarization distribution at the interface and the extent of the electric double layer and subsequently derive the electrode potential from the charge polarization.

Journal ArticleDOI
TL;DR: In this article, a fast semistochastic heat-bath configuration interaction (SHCI) method for solving the many-body Schrodinger equation is presented, which identifies and eliminates computational bottlenecks in both the variational and perturbative steps.
Abstract: This paper presents in detail our fast semistochastic heat-bath configuration interaction (SHCI) method for solving the many-body Schrödinger equation. We identify and eliminate computational bottlenecks in both the variational and perturbative steps of the SHCI algorithm. We also describe the parallelization and the key data structures in our implementation, such as the distributed hash table. The improved SHCI algorithm enables us to include in our variational wavefunction two orders of magnitude more determinants than has been reported previously with other selected configuration interaction methods. We use our algorithm to calculate an accurate benchmark energy for the chromium dimer with the X2C relativistic Hamiltonian in the cc-pVDZ-DK basis, correlating 28 electrons in 76 spatial orbitals. Our largest calculation uses two billion Slater determinants in the variational space and semistochastically includes perturbative contributions from at least trillions of additional determinants with better than 10^-5 Ha statistical uncertainty.

Journal ArticleDOI
TL;DR: A number of sophistications of the neural network architectures are described to improve and generalize the process of interleaved collective variable discovery and enhanced sampling and to support bespoke error functions for network training to incorporate prior knowledge.
Abstract: Auto-associative neural networks ("autoencoders") present a powerful nonlinear dimensionality reduction technique to mine data-driven collective variables from molecular simulation trajectories. This technique furnishes explicit and differentiable expressions for the nonlinear collective variables, making it ideally suited for integration with enhanced sampling techniques for accelerated exploration of configurational space. In this work, we describe a number of sophistications of the neural network architectures to improve and generalize the process of interleaved collective variable discovery and enhanced sampling. We employ circular network nodes to accommodate periodicities in the collective variables, hierarchical network architectures to rank-order the collective variables, and generalized encoder-decoder architectures to support bespoke error functions for network training to incorporate prior knowledge. We demonstrate our approach in blind collective variable discovery and enhanced sampling of the configurational free energy landscapes of alanine dipeptide and Trp-cage using an open-source plugin developed for the OpenMM molecular simulation package.
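
To make the autoencoder-as-CV idea concrete, here is a deliberately stripped-down sketch: a one-node linear autoencoder with tied weights, trained by gradient descent on synthetic 2-D "configurations" with one dominant slow direction. The paper uses deep nonlinear networks (with circular and hierarchical variants); this linear toy only recovers the leading principal direction, but the latent coordinate plays the same role of a data-driven collective variable.

```python
import numpy as np

rng = np.random.default_rng(1)
# toy "trajectory": 2-D configurations varying mainly along the (1, 1) direction
t = rng.normal(size=(500, 1))
X = t @ np.array([[2.0, 2.0]]) + 0.05 * rng.normal(size=(500, 2))
X -= X.mean(axis=0)

# tied-weight autoencoder: encoder z = X @ w, decoder Xhat = z @ w.T,
# trained on the mean-squared reconstruction error
w = np.array([[0.3], [-0.5]])
lr = 0.01
C = X.T @ X / len(X)                  # data covariance
for _ in range(2000):
    # gradient of ||X w w^T - X||^2 / n with respect to w
    grad = 2 * (C @ w * (w.T @ w) + w * (w.T @ C @ w) - 2 * C @ w)
    w -= lr * grad

z = X @ w                             # the learned collective variable
recon_err = np.mean((z @ w.T - X) ** 2) / np.mean(X ** 2)
```

At convergence w aligns with the leading eigenvector of C (here close to (1, 1)/sqrt(2)), and the relative reconstruction error is small; nonlinear encoders generalize this to curved manifolds.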

Journal ArticleDOI
TL;DR: This work shows how the decision functions in SML algorithms can be used as initial CVs (SMLcv ) for accelerated sampling, and illustrates how the distance to the support vector machines' decision hyperplane, the output probability estimates from logistic regression, the outputs from shallow or deep neural network classifiers, and other classifiers may be used to reversibly sample slow structural transitions.
Abstract: Selection of appropriate collective variables (CVs) for enhancing sampling of molecular simulations remains an unsolved problem in computational modeling. Picking initial CVs is especially challenging in higher dimensions. Which atomic coordinates or transforms thereof, from a list of thousands, should one pick for enhanced sampling runs? How does a modeler even begin to pick starting coordinates for investigation? This remains true even in the case of simple two-state systems and only increases in difficulty for multi-state systems. In this work, we solve the "initial" CV problem using a data-driven approach inspired by the field of supervised machine learning (SML). In particular, we show how the decision functions in SML algorithms can be used as initial CVs (SMLcv) for accelerated sampling. Using solvated alanine dipeptide and the Chignolin mini-protein as our test cases, we illustrate how the distance to the support vector machines' decision hyperplane, the output probability estimates from logistic regression, the outputs from shallow or deep neural network classifiers, and other classifiers may be used to reversibly sample slow structural transitions. We discuss the utility of other SML algorithms that might be useful for identifying CVs for accelerating molecular simulations.
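
A minimal version of the SMLcv idea can be sketched with logistic regression trained from scratch: label configurations sampled from two synthetic metastable states, fit the classifier, and use its decision function as the collective variable. The two Gaussian "states" and the training hyperparameters below are illustrative, not the paper's alanine dipeptide or Chignolin setups.

```python
import numpy as np

rng = np.random.default_rng(2)
# configurations sampled from two metastable "states" in a 2-D feature space
A = rng.normal(loc=[-2.0, 0.0], scale=0.3, size=(200, 2))
B = rng.normal(loc=[+2.0, 0.0], scale=0.3, size=(200, 2))
X = np.vstack([A, B])
y = np.concatenate([np.zeros(200), np.ones(200)])

# logistic regression by gradient descent; the decision function
# f(x) = x . w + b is then used directly as the collective variable
w, b = np.zeros(2), 0.0
lr = 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid probabilities
    w -= lr * (X.T @ (p - y)) / len(y)
    b -= lr * np.mean(p - y)

cv = X @ w + b   # negative in state A, positive in state B
```

The sign and magnitude of the decision function order configurations along the transition, which is what makes it usable as a biasing coordinate in enhanced sampling.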

Journal ArticleDOI
TL;DR: Two different classes of molecular representations for use in machine learning of thermodynamic and electronic properties are studied, including the Coulomb matrix and Bag of Bonds, and Encoded Bonds, which encode such lists into a feature vector whose length is independent of molecular size.
Abstract: Two different classes of molecular representations for use in machine learning of thermodynamic and electronic properties are studied. The representations are evaluated by monitoring the performance of linear and kernel ridge regression models on well-studied data sets of small organic molecules. One class of representations studied here counts the occurrence of bonding patterns in the molecule. These require only the connectivity of atoms in the molecule as may be obtained from a line diagram or a SMILES string. The second class utilizes the three-dimensional structure of the molecule. These include the Coulomb matrix and Bag of Bonds, which list the inter-atomic distances present in the molecule, and Encoded Bonds, which encode such lists into a feature vector whose length is independent of molecular size. The Encoded Bonds features introduced here have the advantage of leading to models that may be trained on smaller molecules and then used successfully on larger molecules. A wide range of feature sets are constructed by selecting, at each rank, either a graph or geometry-based feature. Here, rank refers to the number of atoms involved in the feature, e.g., atom counts are rank 1, while Encoded Bonds are rank 2. For atomization energies in the QM7 data set, the best graph-based feature set gives a mean absolute error of 3.4 kcal/mol. Inclusion of 3D geometry substantially enhances the performance, with Encoded Bonds giving 2.4 kcal/mol when used alone and 1.19 kcal/mol when combined with graph features.
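
Kernel ridge regression itself is a compact algorithm, and a minimal sketch helps fix ideas. The 1-D feature below is a toy stand-in for a molecular descriptor such as a Coulomb-matrix or Encoded Bonds vector; the Gaussian kernel, length scale, and regularization strength are illustrative choices.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """Pairwise Gaussian (RBF) kernel between rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def krr_fit(X, y, sigma=1.0, lam=1e-6):
    """Kernel ridge regression: solve (K + lam*I) alpha = y."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(Xtrain, alpha, Xtest, sigma=1.0):
    return gaussian_kernel(Xtest, Xtrain, sigma) @ alpha

# toy "property" of a 1-D feature, in place of a real molecular descriptor
X = np.linspace(-3, 3, 40).reshape(-1, 1)
y = np.sin(X).ravel()
alpha = krr_fit(X, y, sigma=1.0, lam=1e-6)
yhat = krr_predict(X, alpha, np.array([[0.5]]), sigma=1.0)
```

The regularizer lam trades training-set fit against smoothness; for noiseless data a small value nearly interpolates, so the prediction at 0.5 lands close to sin(0.5).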

Journal ArticleDOI
TL;DR: A screening procedure using a simple string representation for a promising class of donor-acceptor polymers in conjunction with a grammar variational autoencoder is proposed which increases the chance of finding suitable polymers by more than a factor of five in comparison to the randomised search used in gathering the training set.
Abstract: Polymer solar cells admit numerous potential advantages including low energy payback time and scalable high-speed manufacturing, but the power conversion efficiency is currently lower than for their inorganic counterparts. In a phenyl-C61-butyric acid methyl ester (PCBM)-based blended polymer solar cell, the optical gap of the polymer and the energetic alignment of the lowest unoccupied molecular orbital (LUMO) of the polymer and the PCBM are crucial for the device efficiency. Searching for new and better materials for polymer solar cells is a computationally costly affair using density functional theory (DFT) calculations. In this work, we propose a screening procedure using a simple string representation for a promising class of donor-acceptor polymers in conjunction with a grammar variational autoencoder. The model is trained on a dataset of 3989 monomers obtained from DFT calculations and is able to predict the LUMO and the lowest optical transition energy for unseen molecules with mean absolute errors of 43 and 74 meV, respectively, without knowledge of the atomic positions. We demonstrate the merit of the model for generating new molecules with the desired LUMO and optical gap energies, which increases the chance of finding suitable polymers by more than a factor of five in comparison to the randomised search used in gathering the training set.

Journal ArticleDOI
TL;DR: This perspective discusses the recent progress and current challenges in multireference wave function methods for dynamical electron correlation, focusing on systematically improvable methods that go beyond the limitations of configuration interaction and perturbation theory.
Abstract: Predicting the electronic structure and properties of molecular systems that display strong electron correlation effects continues to remain a fundamental theoretical challenge. This perspective discusses the recent progress and current challenges in multireference wave function methods for dynamical electron correlation, focusing on systematically improvable methods that go beyond the limitations of configuration interaction and perturbation theory.

Journal ArticleDOI
TL;DR: The present method revises the reference (internal) space under the effect of its interaction with the outer space via the construction of an effective Hamiltonian, following the shifted-Bk philosophy of Davidson and co-workers.
Abstract: Selected configuration interaction (sCI) methods including second-order perturbative corrections provide near full CI (FCI) quality energies with only a small fraction of the determinants of the FCI space. Here, we introduce both a state-specific and a multi-state sCI method based on the configuration interaction using a perturbative selection made iteratively (CIPSI) algorithm. The present method revises the reference (internal) space under the effect of its interaction with the outer space via the construction of an effective Hamiltonian, following the shifted-Bk philosophy of Davidson and co-workers. In particular, the multi-state algorithm removes the storage bottleneck of the effective Hamiltonian via a low-rank factorization of the dressing matrix. Illustrative examples are reported for the state-specific and multi-state versions.