scispace - formally typeset
Search or ask a question

Showing papers in "Journal of Chemical Theory and Computation in 2021"


Journal ArticleDOI
TL;DR: In this paper, the OPLS4 force field was developed and validated on a drug-like chemical space that includes molecular ions and sulfur-containing moieties, and a novel parametrization strategy for charged species, which can be extended to other systems.
Abstract: We report on the development and validation of the OPLS4 force field. OPLS4 builds upon our previous work with OPLS3e to improve model accuracy on challenging regimes of drug-like chemical space that includes molecular ions and sulfur-containing moieties. A novel parametrization strategy for charged species, which can be extended to other systems, is introduced. OPLS4 leads to improved accuracy on benchmarks that assess small-molecule solvation and protein-ligand binding.

334 citations


Journal ArticleDOI
TL;DR: Gmx_MMPBSA as discussed by the authors is a new tool to perform end-state free energy calculations from GROMACS molecular dynamics trajectories, which provides the user with several options, including bounding free energy calculation with different solvation models (PB, GB, or 3D-RISM), stability calculations, computational alanine scanning, entropy corrections, and binding free energy decomposition.
Abstract: Molecular mechanics/Poisson-Boltzmann (Generalized-Born) surface area is one of the most popular methods to estimate binding free energies. This method has been proven to balance accuracy and computational efficiency, especially when dealing with large systems. As a result of its popularity, several programs have been developed for performing MM/PB(GB)SA calculations within the GROMACS community. These programs, however, present several limitations. Here we present gmx_MMPBSA, a new tool to perform end-state free energy calculations from GROMACS molecular dynamics trajectories. gmx_MMPBSA provides the user with several options, including binding free energy calculations with different solvation models (PB, GB, or 3D-RISM), stability calculations, computational alanine scanning, entropy corrections, and binding free energy decomposition. Noteworthy, several promising methodologies to calculate relative binding free energies such as alanine scanning with variable dielectric constant and interaction entropy have also been implemented in gmx_MMPBSA. Two additional tools-gmx_MMPBSA_test and gmx_MMPBSA_ana-have been integrated within gmx_MMPBSA to improve its usability. Multiple illustrating examples can be accessed through gmx_MMPBSA_test, while gmx_MMPBSA_ana provides fast, easy, and efficient access to different graphics plotted from gmx_MMPBSA output files. The latest version (v1.4.3, 26/05/2021) is available free of charge (documentation, test files, and tutorials included) at https://github.com/Valdes-Tresanco-MS/gmx_MMPBSA.

299 citations


Journal ArticleDOI
TL;DR: A general strategy to fine-tune the potential energy function for molecular dynamics simulations of biopolymer phase separation is developed and illustrated by simulating liquid droplet formation of the FUS low-complexity domain (LCD) with a rebalanced MARTINI model.
Abstract: Disordered proteins and nucleic acids can condense into droplets that resemble the membraneless organelles observed in living cells. MD simulations offer a unique tool to characterize the molecular interactions governing the formation of these biomolecular condensates, their physicochemical properties, and the factors controlling their composition and size. However, biopolymer condensation depends sensitively on the balance between different energetic and entropic contributions. Here, we develop a general strategy to fine-tune the potential energy function for molecular dynamics simulations of biopolymer phase separation. We rebalance protein-protein interactions against solvation and entropic contributions to match the excess free energy of transferring proteins between dilute solution and condensate. We illustrate this formalism by simulating liquid droplet formation of the FUS low-complexity domain (LCD) with a rebalanced MARTINI model. By scaling the strength of the nonbonded interactions in the coarse-grained MARTINI potential energy function, we map out a phase diagram in the plane of protein concentration and interaction strength. Above a critical scaling factor of αc ≈ 0.6, FUS-LCD condensation is observed, where α = 1 and 0 correspond to full and repulsive interactions in the MARTINI model. For a scaling factor α = 0.65, we recover experimental densities of the dilute and dense phases, and thus the excess protein transfer free energy into the droplet and the saturation concentration where FUS-LCD condenses. In the region of phase separation, we simulate FUS-LCD droplets of four different sizes in stable equilibrium with the dilute phase and slabs of condensed FUS-LCD for tens of microseconds, and over one millisecond in aggregate. We determine surface tensions in the range of 0.01-0.4 mN/m from the fluctuations of the droplet shape and from the capillary-wave-like broadening of the interface between the two phases. From the dynamics of the protein end-to-end distance, we estimate shear viscosities from 0.001 to 0.02 Pa s for the FUS-LCD droplets with scaling factors α in the range of 0.625-0.75, where we observe liquid droplets. Significant hydration of the interior of the droplets keeps the proteins mobile and the droplets fluid.

90 citations


Journal ArticleDOI
TL;DR: In this article, the authors present a robust and efficient method to implicitly account for solvation effects in modern semi-empirical quantum mechanics and force fields, based on analytical linearized Poisson-Boltzmann (ALPB) model.
Abstract: We present a robust and efficient method to implicitly account for solvation effects in modern semiempirical quantum mechanics and force fields. A computationally efficient yet accurate solvation model based on the analytical linearized Poisson-Boltzmann (ALPB) model is parameterized for the extended tight binding (xTB) and density functional tight binding (DFTB) methods as well as for the recently proposed GFN-FF general force field. The proposed methods perform well over a broad range of systems and applications, from conformational energies over transition-metal complexes to large supramolecular association reactions of charged species. For hydration free energies of small molecules, GFN1-xTB(ALPB) is reaching the accuracy of sophisticated explicitly solvated approaches, with a mean absolute deviation of only 1.4 kcal/mol compared to the experiment. Logarithmic octanol-water partition coefficients (log Kow) are computed with a mean absolute deviation of about 0.65 using GFN2-xTB(ALPB) compared to experimental values indicating a consistent description of differential solvent effects. Overall, more than twenty solvents for each of the six semiempirical methods are parameterized and tested. They are readily available in the xtb and dftb+ programs for diverse computational applications.

87 citations


Journal ArticleDOI
TL;DR: In this paper, an OpenCL implementation of AutoDock4 is presented, which leverages the highly parallel architecture of GPU hardware to reduce docking runtime by up to 350-fold with respect to a single-threaded process.
Abstract: AutoDock4 is a widely used program for docking small molecules to macromolecular targets. It describes ligand-receptor interactions using a physics-inspired scoring function that has been proven useful in a variety of drug discovery projects. However, compared to more modern and recent software, AutoDock4 has longer execution times, limiting its applicability to large scale dockings. To address this problem, we describe an OpenCL implementation of AutoDock4, called AutoDock-GPU, that leverages the highly parallel architecture of GPU hardware to reduce docking runtime by up to 350-fold with respect to a single-threaded process. Moreover, we introduce the gradient-based local search method ADADELTA, as well as an improved version of the Solis-Wets random optimizer from AutoDock4. These efficient local search algorithms significantly reduce the number of calls to the scoring function that are needed to produce good results. The improvements reported here, both in terms of docking throughput and search efficiency, facilitate the use of the AutoDock4 scoring function in large scale virtual screening.

77 citations


Journal ArticleDOI
TL;DR: In this paper, the climbing image nudged elastic band method (CI-NEB) is used to identify reaction coordinates and to find saddle points representing transition states of reactions, which can make efficient use of parallel computing as the calculations of the discretization points, the so-called images can be carried out simultaneously.
Abstract: The climbing image nudged elastic band method (CI-NEB) is used to identify reaction coordinates and to find saddle points representing transition states of reactions. It can make efficient use of parallel computing as the calculations of the discretization points, the so-called images, can be carried out simultaneously. In typical implementations, the images are distributed evenly along the path by connecting adjacent images with equally stiff springs. However, for systems with a high degree of flexibility, this can lead to poor resolution near the saddle point. By making the spring constants increase with energy, the resolution near the saddle point is improved. To assess the performance of this energy-weighted CI-NEB method, calculations are carried out for a benchmark set of 121 molecular reactions. The performance of the method is analyzed with respect to the input parameters. Energy-weighted springs are found to greatly improve performance and result in successful location of the saddle points in less than a thousand energy and force evaluations on average (about a hundred per image) using the same set of parameter values for all of the reactions. Even better performance is obtained by stopping the calculation before full convergence and complete the saddle point search using an eigenvector following method starting from the location of the climbing image. This combination of methods, referred to as NEB-TS, turns out to be robust and highly efficient as it reduces the average number of energy and force evaluations down to a third, to 305. An efficient and flexible implementation of these methods has been made available in the ORCA software.

72 citations


Journal ArticleDOI
TL;DR: Both the accuracy and consistency obtained with the second-order wave function approaches, ADC(2) and CC2, do not clearly outperform those of TD-DFT, hinting that assessing the accuracy of the latter (or selecting a specific functional) on the basis of the results of the former is not systematically a well-settled strategy.
Abstract: Using a set of oscillator strengths and excited-state dipole moments of near full configuration interaction quality determined for small compounds, we benchmark the performances of several single-reference wave function methods [CC2, CCSD, CC3, CCSDT, ADC(2), and ADC(3/2)] and time-dependent density-functional theory (TD-DFT) with various functionals (B3LYP, PBE0, M06-2X, CAM-B3LYP, and ωB97X-D). We consider the impact of various gauges (length, velocity, and mixed) and formalisms: equation of motion versus linear response, relaxed versus unrelaxed orbitals, and so forth. Beyond the expected accuracy improvements and a neat decrease of formalism sensitivity when using higher-order wave function methods, the present contribution shows that, for both ADC(2) and CC2, the choice of gauge impacts more significantly the magnitude of the oscillator strengths than the choice of formalism and that CCSD yields a notable improvement on this transition property as compared to CC2. For the excited-state dipole moments, switching on orbital relaxation appreciably improves the accuracy of both ADC(2) and CC2 but has a rather small effect at the CCSD level. Going from ground to excited states, the typical errors on dipole moments for a given method tend to roughly triple. Interestingly, the ADC(3/2) oscillator strengths and dipoles are significantly more accurate than their ADC(2) counterparts, whereas the two models do deliver rather similar absolute errors for transition energies. Concerning TD-DFT, one finds: (i) a rather negligible impact of the gauge on oscillator strengths for all tested functionals (except for M06-2X); (ii) deviations of ca. 0.10 D on ground-state dipoles for all functionals; (iii) strong differences between excited-state dipoles obtained with, on the one hand, B3LYP and PBE0 and, on the other hand, M06-2X, CAM-B3LYP, and ωB97X-D, the latter group being markedly more accurate with the selected basis set; and (iv) the better overall performance of CAM-B3LYP for the two considered excited-state properties. Finally, for all investigated properties, both the accuracy and consistency obtained with the second-order wave function approaches, ADC(2) and CC2, do not clearly outperform those of TD-DFT, hinting that assessing the accuracy of the latter (or selecting a specific functional) on the basis of the results of the former is not systematically a well-settled strategy.

71 citations


Journal ArticleDOI
TL;DR: TorchMD as discussed by the authors is a framework for molecular simulations with mixed classical and machine learning potentials, including all force computations including bond, angle, dihedral, Lennard-Jones, and Coulomb interactions.
Abstract: Molecular dynamics simulations provide a mechanistic description of molecules by relying on empirical potentials. The quality and transferability of such potentials can be improved leveraging data-driven models derived with machine learning approaches. Here, we present TorchMD, a framework for molecular simulations with mixed classical and machine learning potentials. All force computations including bond, angle, dihedral, Lennard-Jones, and Coulomb interactions are expressed as PyTorch arrays and operations. Moreover, TorchMD enables learning and simulating neural network potentials. We validate it using standard Amber all-atom simulations, learning an ab initio potential, performing an end-to-end training, and finally learning and simulating a coarse-grained model for protein folding. We believe that TorchMD provides a useful tool set to support molecular simulations of machine learning potentials. Code and data are freely available at github.com/torchmd.

66 citations


Journal ArticleDOI
TL;DR: A Δ-learning scheme is extended, where the ML model learns the difference between a reference method and a cheaper semiempirical method, which reaches the accuracy of the DFT reference method while requiring significantly less parameters.
Abstract: Quantum mechanics/molecular mechanics (QM/MM) molecular dynamics (MD) simulations have been developed to simulate molecular systems, where an explicit description of changes in the electronic structure is necessary. However, QM/MM MD simulations are computationally expensive compared to fully classical simulations as all valence electrons are treated explicitly and a self-consistent field (SCF) procedure is required. Recently, approaches have been proposed to replace the QM description with machine-learned (ML) models. However, condensed-phase systems pose a challenge for these approaches due to long-range interactions. Here, we establish a workflow, which incorporates the MM environment as an element type in a high-dimensional neural network potential (HDNNP). The fitted HDNNP describes the potential-energy surface of the QM particles with an electrostatic embedding scheme. Thus, the MM particles feel a force from the polarized QM particles. To achieve chemical accuracy, we find that even simple systems require models with a strong gradient regularization, a large number of data points, and a substantial number of parameters. To address this issue, we extend our approach to a Δ-learning scheme, where the ML model learns the difference between a reference method (density functional theory (DFT)) and a cheaper semiempirical method (density functional tight binding (DFTB)). We show that such a scheme reaches the accuracy of the DFT reference method while requiring significantly less parameters. Furthermore, the Δ-learning scheme is capable of correctly incorporating long-range interactions within a cutoff of 1.4 nm. It is validated by performing MD simulations of retinoic acid in water and the interaction between S-adenoslymethioniat and cytosine in water. The presented results indicate that Δ-learning is a promising approach for (QM)ML/MM MD simulations of condensed-phase systems.

61 citations


Journal ArticleDOI
TL;DR: In this article, a full-length SARS-CoV-2 S protein was modeled in a viral membrane including both open and closed conformations of the receptor-binding domain (RBD) and different templates for the stalk region.
Abstract: The spike (S) protein of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) mediates host cell entry by binding to angiotensin-converting enzyme 2 (ACE2) and is considered the major target for drug and vaccine development. We previously built fully glycosylated full-length SARS-CoV-2 S protein models in a viral membrane including both open and closed conformations of the receptor-binding domain (RBD) and different templates for the stalk region. In this work, multiple μs-long all-atom molecular dynamics simulations were performed to provide deeper insights into the structure and dynamics of S protein and glycan functions. Our simulations reveal that the highly flexible stalk is composed of two independent joints and most probable S protein orientations are competent for ACE2 binding. We identify multiple glycans stabilizing the open and/or closed states of the RBD and demonstrate that the exposure of antibody epitopes can be captured by detailed antibody-glycan clash analysis instead of commonly used accessible surface area analysis that tends to overestimate the impact of glycan shielding and neglect possible detailed interactions between glycan and antibodies. Overall, our observations offer structural and dynamic insights into the SARS-CoV-2 S protein and potentialize for guiding the design of effective antiviral therapeutics.

55 citations


Journal ArticleDOI
TL;DR: This paper presents a meta-modelling system that automates the very labor-intensive and therefore time-heavy and expensive and therefore expensive and expensive process of manually cataloging and cataloging individual cells.
Abstract: We present a methodology for defining and optimizing a general force field for classical molecular simulations, and we describe its use to derive the Open Force Field 1.0.0 small-molecule force field, codenamed Parsley. Rather than using traditional atom typing, our approach is built on the SMIRKS-native Open Force Field (SMIRNOFF) parameter assignment formalism, which handles increases in the diversity and specificity of the force field definition without needlessly increasing the complexity of the specification. Parameters are optimized with the ForceBalance tool, based on reference quantum chemical data that include torsion potential energy profiles, optimized gas-phase structures, and vibrational frequencies. These quantum reference data are computed and are maintained with QCArchive, an open-source and freely available distributed computing and database software ecosystem. In this initial application of the method, we present essentially a full optimization of all valence parameters and report tests of the resulting force field against compounds and data types outside the training set. These tests show improvements in optimized geometries and conformational energetics and demonstrate that Parsley's accuracy for liquid properties is similar to that of other general force fields, as is accuracy on binding free energies. We find that this initial Parsley force field affords accuracy similar to that of other general force fields when used to calculate relative binding free energies spanning 199 protein-ligand systems. Additionally, the resulting infrastructure allows us to rapidly optimize an entirely new force field with minimal human intervention.

Journal ArticleDOI
TL;DR: A protocol for machine learning-enhanced molecular docking based on active learning to dramatically increase throughput over traditional docking, and strikes a balance between two objectives: identifying the best scoring compounds and exploring a large region of chemical space, demonstrating superior performance compared to a purely greedy approach.
Abstract: With the advent of make-on-demand commercial libraries, the number of purchasable compounds available for virtual screening and assay has grown explosively in recent years, with several libraries eclipsing one billion compounds. Today's screening libraries are larger and more diverse, enabling the discovery of more-potent hit compounds and unlocking new areas of chemical space, represented by new core scaffolds. Applying physics-based in silico screening methods in an exhaustive manner, where every molecule in the library must be enumerated and evaluated independently, is increasingly cost-prohibitive. Here, we introduce a protocol for machine learning-enhanced molecular docking based on active learning to dramatically increase throughput over traditional docking. We leverage a novel selection protocol that strikes a balance between two objectives: (1) identifying the best scoring compounds and (2) exploring a large region of chemical space, demonstrating superior performance compared to a purely greedy approach. Together with automated redocking of the top compounds, this method captures almost all the high scoring scaffolds in the library found by exhaustive docking. This protocol is applied to our recent virtual screening campaigns against the D4 and AMPC targets that produced dozens of highly potent, novel inhibitors, and a blind test against the MT1 target. Our protocol recovers more than 80% of the experimentally confirmed hits with a 14-fold reduction in compute cost, and more than 90% of the hit scaffolds in the top 5% of model predictions, preserving the diversity of the experimentally confirmed hit compounds.

Journal ArticleDOI
TL;DR: The ROST61 benchmark as mentioned in this paper is a single-reference open-shell (SOMC) benchmark set, which includes accurate coupled-cluster reference values for 61 reaction energies with a mean reaction energy of -42.8 kcal mol-1.
Abstract: Due to the principle lack of systematic improvement possibilities of density functional theory, careful assessment of the performance of density functional approximations (DFAs) on well-designed benchmark sets, for example, for reaction energies and barrier heights, is crucial. While main-group chemistry is well covered by several available sets, benchmark data for transition metal chemistry is sparse. This is especially the case for larger, chemically relevant molecules. Addressing this issue, we recently introduced the MOR41 benchmark which covers chemically relevant reactions of closed-shell complexes. In this work, we extend these efforts to single-reference open-shell systems and introduce the "reactions of open-shell single-reference transition metal complexes" (ROST61) benchmark set. ROST61 includes accurate coupled-cluster reference values for 61 reaction energies with a mean reaction energy of -42.8 kcal mol-1. Complexes with 13-93 atoms covering 20 d-block elements are included, but due to the restriction to single-reference open-shell systems, important elements such as iron or platinum could not be taken into account, or only to a small extent. We assess the performance of 31 DFAs in combination with three London dispersion (LD) correction schemes. Further, DFT-based composite methods, MP2, and a few semiempirical quantum chemical methods are evaluated. Consistent with the results for the MOR41 closed-shell benchmark, we find that the ordering of DFAs according to Jacob's ladder is preserved and that adding an LD correction is crucial, clearly improving almost all tested methods. The recently introduced r2SCAN-3c composite method stands out with a remarkable mean absolute deviation (MAD) of only 2.9 kcal mol-1, which is surpassed only by hybrid DFAs with low amounts of Fock exchange (e.g., 2.3 kcal mol-1 for TPSS0-D4/def2-QZVPP) and double-hybrid (DH) DFAs but at a significantly higher computational cost. The lowest MAD of only 1.6 kcal mol-1 is obtained with the DH DFA PWPB95-D4 in the def2-QZVPP basis set approaching the estimated accuracy of the reference method. Overall, the ROST61 set adds important reference data to a sparsely sampled but practically relevant area of chemistry. At this point, it provides valuable orientation for the application and development of new DFAs and electronic structure methods in general.

Journal ArticleDOI
TL;DR: In this article, the transferable, polarizable force field for Ionic liquids (J. Chem. Theory Comput. 2019,15, 5858) is extended to electrolytes, protic ionic liquids (PIL), deep eutectic solvents (DES), and glycols.
Abstract: The polarizable CL&Pol force field presented in our previous study, Transferable, Polarizable Force Field for Ionic Liquids (J. Chem. Theory Comput.2019,15, 5858, DOI: http://doi.org/10.1021/acs.jctc.9b0068910.1021/acs.jctc.9b00689), is extended to electrolytes, protic ionic liquids (PIL), deep eutectic solvents (DES), and glycols. These systems are problematic in polarizable simulations because they contain either small, highly charged ions or strong hydrogen bonds, which cause trajectory instabilities due to the pull exerted on the induced dipoles. We use a Tang-Toennies (TT) function to dampen, or smear, the interactions between charges and induced dipole at a short range involving small, highly charged atoms (such as hydrogen or lithium), thus preventing the "polarization catastrophe". The new force field gives stable trajectories and is validated through comparison with experimental data on density, viscosity, and ion diffusion coefficients of liquid systems of the above-mentioned classes. The results also shed light on the hydrogen-bonding pattern in ethylammonium nitrate, a PIL, for which the literature contains conflicting views. We describe the implementation of the TT damping function, of the temperature-grouped Nose-Hoover thermostat for polarizable molecular dynamics (MD) and of the periodic perturbation method for viscosity evaluation from non-equilibrium trajectories in the LAMMPS MD code. The main result of this work is the wider applicability of the CL&Pol polarizable force field to new, important classes of fluids, achieving robust trajectories and a good description of equilibrium and transport properties in challenging systems. The fragment-based approach of CL&Pol will allow ready extension to a wide variety of PILs, DES, and electrolytes.

Journal ArticleDOI
TL;DR: In this article, the authors present a reliable and accurate solution to the induced fit docking problem for protein-ligand binding by combining ligand-based pharmacophore docking, rigid receptor docking, and protein structure prediction with explicit solvent molecular dynamics simulations.
Abstract: We present a reliable and accurate solution to the induced fit docking problem for protein-ligand binding by combining ligand-based pharmacophore docking, rigid receptor docking, and protein structure prediction with explicit solvent molecular dynamics simulations. This novel methodology in detailed retrospective and prospective testing succeeded to determine protein-ligand binding modes with a root-mean-square deviation within 2.5 A in over 90% of cross-docking cases. We further demonstrate these predicted ligand-receptor structures were sufficiently accurate to prospectively enable predictive structure-based drug discovery for challenging targets, substantially expanding the domain of applicability for such methods.

Journal ArticleDOI
TL;DR: This work provides a refined discussion of quantum information theoretical concepts by introducing the physical correlation and its separation into classical and quantum parts as distinctive quantifiers of electronic structure, and succeeds in quantifying the entanglement.
Abstract: A recent development in quantum chemistry has established the quantum mutual information between orbitals as a major descriptor of electronic structure. This has already facilitated remarkable improvements in numerical methods and may lead to a more comprehensive foundation for chemical bonding theory. Building on this promising development, our work provides a refined discussion of quantum information theoretical concepts by introducing the physical correlation and its separation into classical and quantum parts as distinctive quantifiers of electronic structure. In particular, we succeed in quantifying the entanglement. Intriguingly, our results for different molecules reveal that the total correlation between orbitals is mainly classical, raising questions about the general significance of entanglement in chemical bonding. Our work also shows that implementing the fundamental particle number superselection rule, so far not accounted for in quantum chemistry, removes a major part of correlation and entanglement seen previously. In that respect, realizing quantum information processing tasks with molecular systems might be more challenging than anticipated.

Journal ArticleDOI
TL;DR: A way of training self-consistent models that are capable of taking large datasets from different systems and different kinds of labels is developed and it is demonstrated that the functional gives chemically accurate predictions on energy, force, dipole, and electron density for a large class of molecules.
Abstract: We propose a general machine learning-based framework for building an accurate and widely applicable energy functional within the framework of generalized Kohn-Sham density functional theory. To this end, we develop a way of training self-consistent models that are capable of taking large datasets from different systems and different kinds of labels. We demonstrate that the functional that results from this training procedure gives chemically accurate predictions on energy, force, dipole, and electron density for a large class of molecules. It can be continuously improved when more and more data are available.

Journal ArticleDOI
TL;DR: In this paper, a composite protocol consisting of linear-response CCSDT excitation energies determined with Dunning's double-$\zeta$ basis set corrected by CC3/CCSDT-3 energies obtained with the corresponding triple-$ zeta$ based basis set was applied to provide a series of highlyaccurate vertical transition energies for intramolecular charge transfer transitions occurring in molecular compounds.
Abstract: In the aim of completing our previous efforts devoted to local and Rydberg transitions in organic compounds, we provide a series of highly-accurate vertical transition energies for intramolecular charge-transfer transitions occurring in ($\pi$-conjugated) molecular compounds. To this end we apply a composite protocol consisting of linear-response CCSDT excitation energies determined with Dunning's double-$\zeta$ basis set corrected by CC3/CCSDT-3 energies obtained with the corresponding triple-$\zeta$ basis. Further basis set corrections (up to \emph{aug}-cc-pVQZ) are obtained at the CC2 level. We report 30 transitions obtained in 17 compounds. These reference values are then used to benchmark a series of wave function (CIS(D), EOM-MP2, CC2, CCSD, CCSD(T)(a)*, CCSDR(3), CCSDT-3, CC3, ADC(2), ADC(3), and ADC(2.5)), the Green's function-based Bethe-Salpeter equation (BSE) formalism performed on top of the partially self-consistent ev$GW$ scheme considering two different starting points (BSE/ev$GW$@HF and BSE/ev$GW$@PBE0), and TD-DFT combined with several exchange-correlation functionals (B3LYP, PBE0, M06-2X, CAM-B3LYP, LC-$\omega$HPBE, $\omega$B97X, $\omega$B97X-D, and M11). It turns out that CCSD(T)(a)*, CCSDR(3), CCSDT-3, CC3 as well as ADC(2.5) provide rather small average deviations ($\leq 0.10$ eV), CC3 emerging as the only chemically accurate approach. Both CC2 and BSE/ev$GW$@PBE0 also deliver satisfying results given their respective $\mathcal{O}(N^5)$ and $\mathcal{O}(N^4)$ computational scalings. In the TD-DFT context, the best performing functional is $\omega$B97X-D, closely followed by CAM-B3LYP and M06-2X, all providing mean absolute errors around $0.15$ eV relative to the theoretical best estimates.

Journal ArticleDOI
TL;DR: In this paper, the authors presented new LC double hybrid with spin-component and spin-opposite scaled (SCS/SOS) for singlet-singlet excitations.
Abstract: Following the work on spin-component and spin-opposite scaled (SCS/SOS) global double hybrids for singlet-singlet excitations by Schwabe and Goerigk [ J. Chem. Theory Comput. 2017, 13, 4307-4323] and our own works on new long-range corrected (LC) double hybrids for singlet-singlet and singlet-triplet excitations [ J. Chem. Theory Comput. 2019, 15, 4735-4744 and J. Chem. Phys. 2020, 153, 064106], we present new LC double hybrids with SCS/SOS that demonstrate further improvement over previously published results and methods. We introduce new unscaled and scaled versions of different global and LC double hybrids based on Becke88 or PBE exchange combined with LYP, PBE, or P86 correlation. For singlet-singlet excitations, we cross-validate them on six benchmark sets that cover small to medium-sized chromophores with different excitation types (local-valence, Rydberg, and charge transfer). For singlet-triplet excitations, we perform the cross-validation on three different benchmark sets following the same analysis as in our previous work in 2020. In total, 203 excitations are analyzed. Our results confirm and extend those of Schwabe and Goerigk regarding the superior performance of SCS and SOS variants compared to their unscaled parents by decreasing mean absolute deviations, root-mean-square deviations, or error spans by more than half and bringing absolute mean deviations closer to zero. Our SCS/SOS variants are shown to be highly efficient and robust for the computation of vertical excitation energies, which even outperform specialized double hybrids that also contain an LC in their perturbative part. In particular, our new SCS/SOS-ωPBEPP86 and SCS/SOS-ωB88PP86 functionals are four of the most accurate and robust methods tested in this work, and we fully recommend them for future applications. However, if the relevant SCS and SOS algorithms are not available to the user, we suggest ωPBEPP86 as the best unscaled method in this work.

Journal ArticleDOI
TL;DR: In this article, a random forest machine learning model is used to predict the partial atomic charges in MOFs using a small yet meaningful set of features that represent both the elemental properties and the local environment of each atom.
Abstract: Computational high-throughput screening using molecular simulations is a powerful tool for identifying top-performing metal-organic frameworks (MOFs) for gas storage and separation applications. Accurate partial atomic charges are often required to model the electrostatic interactions between the MOF and the adsorbate, especially when the adsorption involves molecules with dipole or quadrupole moments such as water and CO2. Although ab initio methods can be used to calculate accurate partial atomic charges, these methods are impractical for screening large material databases because of the high computational cost. We developed a random forest machine learning model to predict the partial atomic charges in MOFs using a small yet meaningful set of features that represent both the elemental properties and the local environment of each atom. The model was trained and tested on a collection of about 320 000 density-derived electrostatic and chemical (DDEC) atomic charges calculated on a subset of the Computation-Ready Experimental Metal-Organic Framework (CoRE MOF-2019) database and separately on charge model 5 (CM5) charges. The model predicts accurate atomic charges for MOFs at a fraction of the computational cost of periodic density functional theory (DFT) and is found to be transferable to other porous molecular crystals and zeolites. A strong correlation is observed between the partial atomic charge and the average electronegativity difference between the central atom and its bonded neighbors.

Journal ArticleDOI
TL;DR: This work has studied the performance of density-corrected density functional theory (HF-DFT), compared to self-consistent DFT, for several pure and hybrid GGA and meta-GGA exchange–correlation functionals (PBE, BLYP, TPSS, and SCAN) as a function of the percentage of HF exchange in the hybrid.
Abstract: For the large and chemically diverse GMTKN55 benchmark suite, we have studied the performance of density-corrected density functional theory (HF-DFT), compared to self-consistent DFT, for several pure and hybrid GGA and meta-GGA exchange-correlation (XC) functionals (PBE, BLYP, TPSS, and SCAN) as a function of the percentage of HF exchange in the hybrid. The D4 empirical dispersion correction has been added throughout. For subsets dominated by dynamical correlation, HF-DFT is highly beneficial, particularly at low HF exchange percentages. This is especially true for noncovalent interactions where the electrostatic component is dominant, such as hydrogen and halogen bonds: for π-stacking, HF-DFT is detrimental. For subsets with significant nondynamical correlation (i.e., where a Hartree-Fock determinant is not a good zero-order wavefunction), HF-DFT may do more harm than good. While the self-consistent series show optima at or near 37.5% (i.e., 3/8) for all four XC functionals-consistent with Grimme's proposal of the PBE38 functional-HF-BnLYP-D4, HF-PBEn-D4, and HF-TPSSn-D4 all exhibit minima nearer 25% (i.e., 1/4) as the use of HF orbitals greatly mitigates the error at 25% for barrier heights. Intriguingly, for HF-SCANn-D4, the minimum is near 10%, but the weighted mean absolute error (WTMAD2) for GMTKN55 is only barely lower than that for HF-SCAN-D4 (i.e., where the post-HF step is a pure meta-GGA). The latter becomes an attractive option, only slightly more costly than pure Hartree-Fock, and devoid of adjustable parameters other than the three in the dispersion correction. Moreover, its WTMAD2 is only surpassed by the highly empirical M06-2X and by the combinatorially optimized empirical range-separated hybrids ωB97X-V and ωB97M-V.

Journal ArticleDOI
TL;DR: In this paper, a single-point Hessian (SPH) method was proposed for the computation of HVF and thermodynamic contributions to the free energy within the modified rigid-rotor-harmonic-oscillator approximation for general nonequilibrium molecular geometries.
Abstract: The calculation of harmonic vibrational frequencies (HVF) to interpret infrared (IR) spectra and to convert molecular energies to free energies is one of the essential steps in computational chemistry. A prerequisite for accurate thermostatistics so far was to optimize the molecular input structures in order to avoid imaginary frequencies, which inevitably leads to changes in the geometry if different theoretical levels are applied for geometry optimization and frequency calculations. In this work, we propose a new method termed single-point Hessian (SPH) for the computation of HVF and thermodynamic contributions to the free energy within the modified rigid-rotor-harmonic-oscillator approximation for general nonequilibrium molecular geometries. The key ingredient is the application of a biasing potential given as Gaussian functions expressed with the root-mean-square-deviation (RMSD) in Cartesian space in order to retain the initial geometry. The theory derived herein is generally applicable to quantum mechanical (QM), semiempirical QM, and force-field (FF) methods. Besides a detailed description of the underlying theory including the important back-correction of the biased HVF, the SPH approach is tested for reaction paths, molecular dynamics snapshots of crambin, and supramolecular association free energies in comparison to high-level density functional theory (DFT) values. Furthermore, the effect on IR spectra is investigated for organic dimers and transition-metal complexes revealing improved spectra at low theoretical levels. On average, DFT reference free energies are better reproduced by the newly developed SPH scheme than by conventional calculations on freely optimized geometries or without any relaxation.

Journal ArticleDOI
TL;DR: A protocol for performing machine learning assisted free energy simulation of solution-phase and enzyme reactions at an ab initio quantum mechanical and molecular mechanical (ai-QM/MM) level of accuracy is reported.
Abstract: Despite recent advances in the development of machine learning potentials (MLPs) for biomolecular simulations, there has been limited effort on developing stable and accurate MLPs for enzymatic reactions. Here we report a protocol for performing machine-learning-assisted free energy simulation of solution-phase and enzyme reactions at the ab initio quantum-mechanical/molecular-mechanical (ai-QM/MM) level of accuracy. Within our protocol, the MLP is built to reproduce the ai-QM/MM energy and forces on both QM (reactive) and MM (solvent/enzyme) atoms. As an alternative strategy, a delta machine learning potential (ΔMLP) is trained to reproduce the differences between the ai-QM/MM and semiempirical (se) QM/MM energies and forces. To account for the effect of the condensed-phase environment in both MLP and ΔMLP, the DeePMD representation of a molecular system is extended to incorporate the external electrostatic potential and field on each QM atom. Using the Menshutkin and chorismate mutase reactions as examples, we show that the developed MLP and ΔMLP reproduce the ai-QM/MM energy and forces with errors that on average are less than 1.0 kcal/mol and 1.0 kcal mol-1 A-1, respectively, for representative configurations along the reaction pathway. For both reactions, MLP/ΔMLP-based simulations yielded free energy profiles that differed by less than 1.0 kcal/mol from the reference ai-QM/MM results at only a fraction of the computational cost.

Journal ArticleDOI
TL;DR: Polymer Builder as discussed by the authors is a web-based infrastructure that provides a generalized and automated process to build a relaxed polymer system using a coarse-grained model and all-atom replacement.
Abstract: Molecular modeling and simulations are invaluable tools for polymer science and engineering, which predict physicochemical properties of polymers and provide molecular-level insight into the underlying mechanisms However, building realistic polymer systems is challenging and requires considerable experience because of great variations in structures as well as length and time scales This work describes Polymer Builder in CHARMM-GUI (http://wwwcharmm-guiorg/input/polymer), a web-based infrastructure that provides a generalized and automated process to build a relaxed polymer system Polymer Builder not only provides versatile modeling methods to build complex polymer structures, but also generates realistic polymer melt and solution systems through the built-in coarse-grained model and all-atom replacement The coarse-grained model parametrization is generalized and extensively validated with various experimental data and all-atom simulations In addition, the capability of Polymer Builder for generating relaxed polymer systems is demonstrated by density calculations of 34 homopolymer melt systems, characteristic ratio calculations of 170 homopolymer melt systems, a morphology diagram of poly(styrene-b-methyl methacrylate) block copolymers, and self-assembly behavior of amphiphilic poly(ethylene oxide-b-ethylethane) block copolymers in water We hope that Polymer Builder is useful to carry out innovative and novel polymer modeling and simulation research to acquire insight into structures, dynamics, and underlying mechanisms of complex polymer-containing systems

Journal ArticleDOI
TL;DR: This work concludes this work by performing the most detailed DFT benchmark study for CB interactions to date, assessing 109 variations of dispersion-corrected and -uncorrected DFT methods, and carrying out a detailed analysis of 80 of them.
Abstract: We present the CHAL336 benchmark set-the most comprehensive database for the assessment of chalcogen-bonding (CB) interactions. After careful selection of suitable systems and identification of three high-level reference methods, the set comprises 336 dimers each consisting of up to 49 atoms and covers both σ- and π-hole interactions across four categories: chalcogen-chalcogen, chalcogen-π, chalcogen-halogen, and chalcogen-nitrogen interactions. In a subsequent study of DFT methods, we re-emphasize the need for using proper London dispersion corrections when treating noncovalent interactions. We also point out that the deterioration of results and systematic overestimation of interaction energies for some dispersion-corrected DFT methods does not hint at problems with the chosen dispersion correction but is a consequence of large density-driven errors. We conclude this work by performing the most detailed DFT benchmark study for CB interactions to date. We assess 109 variations of dispersion-corrected and dispersion-uncorrected DFT methods and carry out a detailed analysis of 80 of them. Double-hybrid functionals are the most reliable approaches for CB interactions, and they should be used whenever computationally feasible. The best three double hybrids are SOS0-PBE0-2-D3(BJ), revDSD-PBEP86-D3(BJ), and B2NCPLYP-D3(BJ). The best hybrids in this study are ωB97M-V, PW6B95-D3(0), and PW6B95-D3(BJ). We do not recommend using the popular B3LYP functional nor the MP2 approach, which have both been frequently used to describe CB interactions in the past. We hope to inspire a change in computational protocols surrounding CB interactions that leads away from the commonly used, popular methods to the more robust and accurate ones recommended herein. We would also like to encourage method developers to use our set for the investigation and reduction of density-driven errors in new density functional approximations.

Journal ArticleDOI
TL;DR: An all-electron G0W0 implementation for periodic systems with k-point sampling implemented in a crystalline Gaussian basis relies on efficient Gaussian density fitting integrals and includes both analytic continuation and contour deformation schemes.
Abstract: We describe an all-electron G0W0 implementation for periodic systems with k-point sampling implemented in a crystalline Gaussian basis. Our full-frequency G0W0 method relies on efficient Gaussian d...

Journal ArticleDOI
TL;DR: A low-scaling GW algorithm is presented with notably improved accuracy compared to the previous algorithm, demonstrated for frontier orbitals using the GW100 benchmark set, for which the algorithm yields a mean absolute deviation of only 6 meV with respect to canonical implementations.
Abstract: GW is an accurate method for computing electron addition and removal energies of molecules and solids. In a conventional GW implementation, however, its computational cost is O(N4) in the system size N, which prohibits its application to many systems of interest. We present a low-scaling GW algorithm with notably improved accuracy compared to our previous algorithm [J. Phys. Chem. Lett.2018, 9, 306-312]. This is demonstrated for frontier orbitals using the GW100 benchmark set, for which our algorithm yields a mean absolute deviation of only 6 meV with respect to canonical implementations. We show that also excitations of deep valence, semicore, and unbound states match conventional schemes within 0.1 eV. The high accuracy is achieved by using minimax grids with 30 grid points and the resolution of the identity with the truncated Coulomb metric. We apply the low-scaling GW algorithm with improved accuracy to phosphorene nanosheets of increasing size. We find that their fundamental gap is strongly size-dependent varying from 4.0 eV (1.8 nm × 1.3 nm, 88 atoms) to 2.4 eV (6.9 nm × 4.8 nm, 990 atoms) at the evGW0@PBE level.

Journal ArticleDOI
TL;DR: In this paper, a coarse-grained force field is proposed to model both ordered and disordered proteins with consistent accuracy, which combines maximum entropy biasing, least squares fitting, and basic principles of energy landscape theory to ensure that MOFF recreates experimental radii of gyration while predicting the folded structures for globular proteins with lower energy.
Abstract: Many proteins have been shown to function via liquid-liquid phase separation. Computational modeling could offer much needed structural details of protein condensates and reveal the set of molecular interactions that dictate their stability. However, the presence of both ordered and disordered domains in these proteins places a high demand on the model accuracy. Here, we present an algorithm to derive a coarse-grained force field, MOFF, which can model both ordered and disordered proteins with consistent accuracy. It combines maximum entropy biasing, least-squares fitting, and basic principles of energy landscape theory to ensure that MOFF recreates experimental radii of gyration while predicting the folded structures for globular proteins with lower energy. The theta temperature determined from MOFF separates ordered and disordered proteins at 300 K and exhibits a strikingly linear relationship with amino acid sequence composition. We further applied MOFF to study the phase behavior of HP1, an essential protein for post-translational modification and spatial organization of chromatin. The force field successfully resolved the structural difference of two HP1 homologues despite their high sequence similarity. We carried out large-scale simulations with hundreds of proteins to determine the critical temperature of phase separation and uncover multivalent interactions that stabilize higher-order assemblies. In all, our work makes significant methodological strides to connect theories of ordered and disordered proteins and provides a powerful tool for studying liquid-liquid phase separation with near-atomistic details.

Journal ArticleDOI
TL;DR: It is demonstrated that magnetizabilities can be calculated by numerical integration of magnetizability density; this approach is implemented as a new feature in the gauge-including magnetically induced current (GIMIC) method.
Abstract: We have assessed the accuracy of the magnetic properties of a set of 51 density functional approximations, including both recently published and already established functionals. The accuracy assessment considers a series of 27 small molecules and is based on comparing the predicted magnetizabilities to literature reference values calculated using coupled-cluster theory with full singles and doubles and perturbative triples [CCSD(T)] employing large basis sets. The most accurate magnetizabilities, defined as the smallest mean absolute error, are obtained with the BHandHLYP functional. Three of the six studied Berkeley functionals and the three range-separated Florida functionals also yield accurate magnetizabilities. Also, some older functionals like CAM-B3LYP, KT1, BHLYP (BHandH), B3LYP, and PBE0 perform rather well. In contrast, unsatisfactory performance is generally obtained with Minnesota functionals, which are therefore not recommended for calculations of magnetically induced current density susceptibilities and related magnetic properties such as magnetizabilities and nuclear magnetic shieldings. We also demonstrate that magnetizabilities can be calculated by numerical integration of magnetizability density; we have implemented this approach as a new feature in the gauge-including magnetically induced current (GIMIC) method. Magnetizabilities can be calculated from magnetically induced current density susceptibilities within this approach even when analytical approaches for magnetizabilities as the second derivative of the energy have not been implemented. The magnetizability density can also be visualized, providing additional information that is not otherwise easily accessible on the spatial origin of magnetizabilities.

Journal ArticleDOI
TL;DR: Long-range Lennard-Jones interactions have been incorporated into the CHARMM36 (C36) lipid force field (FF) using the LJ particle-mesh Ewald (LJ-PME) method in order to remove the inconsistenc...
Abstract: Long-range Lennard-Jones (LJ) interactions have been incorporated into the CHARMM36 (C36) lipid force field (FF) using the LJ particle-mesh Ewald (LJ-PME) method in order to remove the inconsistenc...