Showing papers by "Adrian E. Roitberg" published in 2020


Journal ArticleDOI
TL;DR: This Perspective summarizes recent technological advances in QSAR modeling and highlights the applicability of algorithms, modeling methods, and validation practices developed in QSAR to a wide range of research areas outside traditional QSAR boundaries, including synthesis planning, nanotechnology, materials science, biomaterials, and clinical informatics.
Abstract: Prediction of chemical bioactivity and physical properties has been one of the most important applications of statistical and more recently, machine learning and artificial intelligence methods in chemical sciences. This field of research, broadly known as quantitative structure–activity relationships (QSAR) modeling, has developed many important algorithms and has found a broad range of applications in physical organic and medicinal chemistry in the past 55+ years. This Perspective summarizes recent technological advances in QSAR modeling but it also highlights the applicability of algorithms, modeling methods, and validation practices developed in QSAR to a wide range of research areas outside of traditional QSAR boundaries including synthesis planning, nanotechnology, materials science, biomaterials, and clinical informatics. As modern research methods generate rapidly increasing amounts of data, the knowledge of robust data-driven modelling methods professed within the QSAR field can become essential for scientists working both within and outside of chemical research. We hope that this contribution highlighting the generalizable components of QSAR modeling will serve to address this challenge.

383 citations


Journal ArticleDOI
TL;DR: This Review will describe recent theoretical advances including treatment of electronic decoherence in surface-hopping methods, the role of solvent effects, trivial unavoided crossings, analysis of data based on transition densities, and efficient computational implementations of these numerical methods.
Abstract: Optically active molecular materials, such as organic conjugated polymers and biological systems, are characterized by strong coupling between electronic and vibrational degrees of freedom. Typically, simulations must go beyond the Born-Oppenheimer approximation to account for non-adiabatic coupling between excited states. Indeed, non-adiabatic dynamics is commonly associated with exciton dynamics and photophysics involving charge and energy transfer, as well as exciton dissociation and charge recombination. Understanding the photoinduced dynamics in such materials is vital to providing an accurate description of exciton formation, evolution, and decay. This interdisciplinary field has matured significantly over the past decades. Formulation of new theoretical frameworks, development of more efficient and accurate computational algorithms, and evolution of high-performance computer hardware has extended these simulations to very large molecular systems with hundreds of atoms, including numerous studies of organic semiconductors and biomolecules. In this Review, we will describe recent theoretical advances including treatment of electronic decoherence in surface-hopping methods, the role of solvent effects, trivial unavoided crossings, analysis of data based on transition densities, and efficient computational implementations of these numerical methods. We also emphasize newly developed semiclassical approaches, based on the Gaussian approximation, which retain phase and width information to account for significant decoherence and interference effects while maintaining the high efficiency of surface-hopping approaches. The above developments have been employed to successfully describe photophysics in a variety of molecular materials.

221 citations


Journal ArticleDOI
TL;DR: This work provides an extension of the ANI-1x model that is trained to three additional chemical elements: S, F, and Cl, and is shown to accurately predict molecular energies compared to DFT with a ~10^6 factor speedup and a negligible slowdown compared to ANI-1x.
Abstract: Machine learning (ML) methods have become powerful, predictive tools in a wide range of applications, such as facial recognition and autonomous vehicles. In the sciences, computational chemists and physicists have been using ML for the prediction of physical phenomena, such as atomistic potential energy surfaces and reaction pathways. Transferable ML potentials, such as ANI-1x, have been developed with the goal of accurately simulating organic molecules containing the chemical elements H, C, N, and O. Here, we provide an extension of the ANI-1x model. The new model, dubbed ANI-2x, is trained to three additional chemical elements: S, F, and Cl. Additionally, ANI-2x underwent torsional refinement training to better predict molecular torsion profiles. These new features open a wide range of new applications within organic chemistry and drug development. These seven elements (H, C, N, O, F, Cl, and S) make up ∼90% of drug-like molecules. To show that these additions do not sacrifice accuracy, we have tested this model across a range of organic molecules and applications, including the COMP6 benchmark, dihedral rotations, conformer scoring, and nonbonded interactions. ANI-2x is shown to accurately predict molecular energies compared to density functional theory with a ∼10^6 factor speedup and a negligible slowdown compared to ANI-1x and shows subchemical accuracy across most of the COMP6 benchmark. The resulting model is a valuable tool for drug development which can potentially replace both quantum calculations and classical force fields for a myriad of applications.

139 citations


Journal ArticleDOI
TL;DR: The ANI-1x and ANI-1ccx ML-based general-purpose potentials for organic molecules were developed through active learning, an automated data diversification process; the underlying data sets are provided to aid research and development of ML models for chemistry.
Abstract: Maximum diversification of data is a central theme in building generalized and accurate machine learning (ML) models. In chemistry, ML has been used to develop models for predicting molecular properties, for example quantum mechanics (QM) calculated potential energy surfaces and atomic charge models. The ANI-1x and ANI-1ccx ML-based general-purpose potentials for organic molecules were developed through active learning; an automated data diversification process. Here, we describe the ANI-1x and ANI-1ccx data sets. To demonstrate data diversity, we visualize it with a dimensionality reduction scheme, and contrast against existing data sets. The ANI-1x data set contains multiple QM properties from 5 M density functional theory calculations, while the ANI-1ccx data set contains 500 k data points obtained with an accurate CCSD(T)/CBS extrapolation. Approximately 14 million CPU core-hours were expended to generate this data. Multiple QM calculated properties for the chemical elements C, H, N, and O are provided: energies, atomic forces, multipole moments, atomic charges, etc. We provide this data to the community to aid research and development of ML models for chemistry. Machine-accessible metadata file describing the reported data: https://doi.org/10.6084/m9.figshare.12046440

117 citations


Journal ArticleDOI
TL;DR: This paper presents TorchANI, a PyTorch-based program for training/inference of ANI (ANAKIN-ME) deep learning models to obtain potential energy surfaces and other physical properties of molecular systems.
Abstract: This paper presents TorchANI, a PyTorch-based program for training/inference of ANI (ANAKIN-ME) deep learning models to obtain potential energy surfaces and other physical properties of molecular systems. ANI is an accurate neural network potential originally implemented using C++/CUDA in a program called NeuroChem. Compared with NeuroChem, TorchANI has a design emphasis on being lightweight, user friendly, cross platform, and easy to read and modify for fast prototyping, while allowing acceptable sacrifice on running performance. Because the computation of atomic environmental vectors and atomic neural networks are all implemented using PyTorch operators, TorchANI is able to use PyTorch's autograd engine to automatically compute analytical forces and Hessian matrices, as well as do force training without requiring any additional codes. TorchANI is open-source and freely available on GitHub: https://github.com/aiqm/torchani.
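In symbols, and with notation that is ours rather than the paper's, the quantities the abstract says autograd supplies are the first and second derivatives of the predicted energy E with respect to the atomic positions R_i. The force-training loss shown last is only one common form, written here as an assumption; the abstract does not specify the loss or the weight α.

```latex
\mathbf{F}_i = -\frac{\partial E}{\partial \mathbf{R}_i},
\qquad
H_{ij} = \frac{\partial^2 E}{\partial \mathbf{R}_i \, \partial \mathbf{R}_j},
\qquad
\mathcal{L} = \sum_{\text{mol}} \bigl(E - E^{\text{ref}}\bigr)^2
            + \alpha \sum_{\text{mol}} \sum_i \bigl\lVert \mathbf{F}_i - \mathbf{F}_i^{\text{ref}} \bigr\rVert^2
```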

106 citations


Posted ContentDOI
04 May 2020-ChemRxiv
TL;DR: TorchANI is able to use PyTorch’s autograd engine to automatically compute analytical forces and Hessian matrices, as well as do force training without additional codes required.
Abstract: This paper presents TorchANI, a PyTorch based software for training/inference of ANI (ANAKIN-ME) deep learning models to obtain potential energy surfaces and other physical properties of molecular systems. ANI is an accurate neural network potential originally implemented using C++/CUDA in a program called NeuroChem. Compared with NeuroChem, TorchANI has a design emphasis on being light weight, user friendly, cross platform, and easy to read and modify for fast prototyping, while allowing acceptable sacrifice on running performance. Because the computation of atomic environmental vectors (AEVs) and atomic neural networks are all implemented using PyTorch operators, TorchANI is able to use PyTorch’s autograd engine to automatically compute analytical forces and Hessian matrices, as well as do force training without additional codes required.

71 citations


Journal ArticleDOI
TL;DR: The primary intent behind the NEXMD was to simulate nonadiabatic molecular dynamics, but the code can also perform geometry optimizations, adiabatic excited state dynamics, and single-point calculations all in vacuum or in a simulated solvent.
Abstract: We present a versatile new code released for open community use, the nonadiabatic excited state molecular dynamics (NEXMD) package. This software aims to simulate nonadiabatic excited state molecular dynamics using several semiempirical Hamiltonian models. To model such dynamics of a molecular system, the NEXMD uses the fewest-switches surface hopping algorithm, where the probability of transition from one state to another depends on the strength of the derivative nonadiabatic coupling. In addition, there are a number of algorithmic improvements such as empirical decoherence corrections and tracking trivial crossings of electronic states. While the primary intent behind the NEXMD was to simulate nonadiabatic molecular dynamics, the code can also perform geometry optimizations, adiabatic excited state dynamics, and single-point calculations all in vacuum or in a simulated solvent. In this report, first, we lay out the basic theoretical framework underlying the code. Then we present the code's structure and workflow. To demonstrate the functionality of NEXMD in detail, we analyze the photoexcited dynamics of a polyphenylene ethynylene dendrimer (PPE, C30H18) in vacuum and in a continuum solvent. Furthermore, the PPE molecule example serves to highlight the utility of the getexcited.py helper script to form a streamlined workflow. This script, provided with the package, can both set up NEXMD calculations and analyze the results, including, but not limited to, collecting populations, generating an average optical spectrum, and restarting unfinished calculations.
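The dependence of the hopping probability on the derivative nonadiabatic coupling can be sketched with the textbook fewest-switches criterion in the adiabatic basis. This is an illustrative sketch, not NEXMD's code: the function names, argument layout, and units are assumptions, and decoherence corrections, trivial-crossing tracking, and velocity rescaling are all omitted.

```python
def hop_probabilities(c, k, v_dot_d, dt):
    """Fewest-switches probabilities of hopping from active state k to each state j.

    c       : list of complex amplitudes of the adiabatic states
    k       : index of the currently active state
    v_dot_d : v_dot_d[j][k] is the nuclear velocity dotted into the derivative
              nonadiabatic coupling vector d_jk (antisymmetric in j and k)
    dt      : time step, in units consistent with v_dot_d
    """
    pop_k = abs(c[k]) ** 2
    probs = []
    for j in range(len(c)):
        if j == k or pop_k == 0.0:
            probs.append(0.0)
            continue
        # Population flux from k into j: b_jk = -2 Re(c_j^* c_k (v . d_jk))
        b_jk = -2.0 * (c[j].conjugate() * c[k] * v_dot_d[j][k]).real
        # Negative flux means no hop; clamp to 1 for overly large time steps
        probs.append(min(1.0, max(0.0, dt * b_jk / pop_k)))
    return probs


def select_hop(probs, k, xi):
    """Return the target state for a uniform random number xi in [0, 1)."""
    cumulative = 0.0
    for j, g in enumerate(probs):
        cumulative += g
        if xi < cumulative:
            return j
    return k  # no hop: stay on the active state
```

In use, `xi` would be drawn once per time step (for example with `random.random()`), and a stronger coupling `v_dot_d[j][k]` directly raises the chance that the trajectory switches surfaces.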

46 citations


Posted ContentDOI
30 Jul 2020-bioRxiv
TL;DR: This work demonstrates how a new generation of hybrid machine learning / molecular mechanics (ML/MM) potentials can deliver significant accuracy improvements in modeling protein-ligand binding affinities and demonstrates the utility of ML/MM free energy calculations.
Abstract: Alchemical free energy methods with molecular mechanics (MM) force fields are now widely used in the prioritization of small molecules for synthesis in structure-enabled drug discovery projects because of their ability to deliver 1–2 kcal mol−1 accuracy in well-behaved protein-ligand systems. Surpassing this accuracy limit would significantly reduce the number of compounds that must be synthesized to achieve desired potencies and selectivities in drug design campaigns. However, MM force fields pose a challenge to achieving higher accuracy due to their inability to capture the intricate atomic interactions of the physical systems they model. A major limitation is the accuracy with which ligand intramolecular energetics—especially torsions—can be modeled, as poor modeling of torsional profiles and coupling with other valence degrees of freedom can have a significant impact on binding free energies. Here, we demonstrate how a new generation of hybrid machine learning / molecular mechanics (ML/MM) potentials can deliver significant accuracy improvements in modeling protein-ligand binding affinities. Using a nonequilibrium perturbation approach, we can correct a standard, GPU-accelerated MM alchemical free energy calculation in a simple post-processing step to efficiently recover ML/MM free energies and deliver a significant accuracy improvement with small additional computational effort. To demonstrate the utility of ML/MM free energy calculations, we apply this approach to a benchmark system for predicting kinase:inhibitor binding affinities—a congeneric ligand series for non-receptor tyrosine kinase TYK2 (Tyk2)—wherein state-of-the-art MM free energy calculations (with OPLS2.1) achieve inaccuracies of 0.93±0.12 kcal mol−1 in predicting absolute binding free energies. 
Applying an ML/MM hybrid potential based on the ANI2x ML model and AMBER14SB/TIP3P with the OpenFF 1.0.0 (“Parsley”) small molecule force field as an MM model, we show that it is possible to significantly reduce the error in absolute binding free energies from 0.97 [95% CI: 0.68, 1.21] kcal mol−1 (MM) to 0.47 [95% CI: 0.31, 0.63] kcal mol−1 (ML/MM).
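The post-processing idea can be sketched as a thermodynamic cycle: the MM binding free energy is corrected by the free energy of switching MM to ML/MM in the complex minus the same quantity in solvent. The sketch below estimates each switching free energy from nonequilibrium work values by one-sided exponential averaging (the Jarzynski equality). That estimator is an assumption made for illustration; the abstract does not say which estimator the authors use, and all function names here are hypothetical.

```python
import math

# Gas constant (kcal mol^-1 K^-1) times temperature (K)
KT_298K = 0.0019872041 * 298.15


def exp_average_free_energy(works, kT=KT_298K):
    """Return -kT * ln<exp(-W/kT)> over a list of work values W (kcal/mol).

    A log-sum-exp shift keeps the exponentials from overflowing.
    """
    if not works:
        raise ValueError("need at least one work value")
    scaled = [-w / kT for w in works]
    shift = max(scaled)
    mean_exp = sum(math.exp(s - shift) for s in scaled) / len(scaled)
    return -kT * (shift + math.log(mean_exp))


def mlmm_binding_free_energy(dg_mm, works_complex, works_solvent, kT=KT_298K):
    """Correct an MM binding free energy to ML/MM via a thermodynamic cycle.

    works_complex / works_solvent: work values for switching the ligand
    description from MM to ML/MM in the bound and unbound legs, respectively.
    """
    correction = (exp_average_free_energy(works_complex, kT)
                  - exp_average_free_energy(works_solvent, kT))
    return dg_mm + correction
```

Because the exponential average is dominated by low-work switches, it never exceeds the mean work, and it converges slowly when the work distribution is broad; bidirectional estimators are a common alternative.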

45 citations


Journal ArticleDOI
TL;DR: This is a correction to the Perspective ‘QSAR without borders’ by Eugene N. Muratov et al. (Chem. Soc. Rev., 2020).
Abstract: Correction for ‘QSAR without borders’ by Eugene N. Muratov et al., Chem. Soc. Rev., 2020, DOI: 10.1039/d0cs00098a.

18 citations


Journal ArticleDOI
TL;DR: The results can provide new insights into previous theoretical and experimental findings by using a fully force field-based and GPU-accelerated approach, which allows the simulations to be executed with high computational performance.
Abstract: Coupled redox and pH-driven processes are at the core of many important biological mechanisms. As the distribution of protonation and redox states in a system is associated with the pH and redox po...

12 citations


Journal ArticleDOI
TL;DR: This article concerns light-harvesting and intramolecular energy funneling, which are fundamental processes in natural photosynthesis, and the main structural, dynamic, and optical properties involved.
Abstract: Light-harvesting and intramolecular energy funneling are fundamental processes in natural photosynthesis. A comprehensive knowledge of the main structural, dynamic, and optical properties that regu...

Journal ArticleDOI
TL;DR: The results point to a potential mechanosensitive mechanism for fibrillin-1 in regulating extracellular transforming growth factor beta (TGFB) bioavailability and microfibril integrity, which may represent novel mechanisms for mechanical hemostasis regulation in extracellular matrix that are pathologically activated in MFS.
Abstract: Marfan syndrome (MFS) is a highly variable genetic connective tissue disorder caused by mutations in the calcium binding extracellular matrix glycoprotein fibrillin-1. Patients with the most severe form of MFS (neonatal MFS; nMFS) tend to have mutations that cluster in an internal region of fibrillin-1 called the neonatal region. This region is predominantly composed of eight calcium-binding epidermal growth factor-like (cbEGF) domains, each of which binds one calcium ion and is stabilized by three highly conserved disulfide bonds. Crucially, calcium plays a fundamental role in stabilizing cbEGF domains. Perturbed calcium binding caused by cbEGF domain mutations is thus thought to be a central driver of MFS pathophysiology. Using steered molecular dynamics (SMD) simulations, we demonstrate that cbEGF domain calcium binding decreases under mechanical stress (i.e. cbEGF domains are mechanosensitive). We further demonstrate the disulfide bonds in cbEGF domains uniquely orchestrate protein unfolding by showing that MFS disulfide bond mutations markedly disrupt normal mechanosensitive calcium binding dynamics. These results point to a potential mechanosensitive mechanism for fibrillin-1 in regulating extracellular transforming growth factor beta (TGFB) bioavailability and microfibril integrity. Such mechanosensitive "smart" features may represent novel mechanisms for mechanical hemostasis regulation in extracellular matrix that are pathologically activated in MFS.

Journal ArticleDOI
TL;DR: It is hypothesized that axially‐chiral‐cannabinols (ax‐CBNs), unnatural and unknown isomers of cannabinol (CBN) may be valuable scaffolds for cannabinoid‐inspired drug discovery.
Abstract: Phytocannabinoids (and synthetic analogs thereof) are gaining significant attention as promising leads in modern medicine. Considering this, new directions for the design of phytocannabinoid-inspired molecules are of immediate interest. In this regard, we have hypothesized that axially-chiral-cannabinols (ax-CBNs), unnatural and unknown isomers of cannabinol (CBN), may be valuable scaffolds for cannabinoid-inspired drug discovery. There are two main factors directing our interest to these scaffolds: (a) ax-CBNs would have ground-state three-dimensionality; ligand-receptor interactions can be more significant with complementary 3D-topology, and (b) ax-CBNs at their core structure are biaryl molecules, generally attractive platforms for pharmaceutical development due to their ease of functionalization and stability. Herein we report a synthesis of ax-CBNs, examine physical properties experimentally and computationally, and perform a comparative analysis of ax-CBN and THC in mouse behavioral studies.

Journal ArticleDOI
TL;DR: It was uncovered that ring-closing metathesis occurs exclusively on the tetraene-variant, yielding unique, stereochemically and functionally rich polycyclic bridged frameworks, whereas the reduced version (a triene) undergoes ring-rearrangement metathesis to 5-6-5 fused ring systems resembling the isoryanodane core.

Journal ArticleDOI
TL;DR: The theoretical findings suggest that the proposed mechanism of proteolysis catalyzed by HIV-1 PR agrees in principle with experimental data, and that the QM/MM MD method can serve as a reliable computational technique for rationalizing lead compounds against specific targets such as the HIV-1 protease.
Abstract: HIV-1 protease (HIV-1 PR) is an essential enzyme for the replication process of its virus, and therefore considered an important target for the development of drugs against the acquired immunodefic...

Journal ArticleDOI
TL;DR: This work investigates the atomistic cause of the highly shifted pKa of the internal Glu23 in the artificially mutated variant V23E of Staphylococcal Nuclease (SNase) using pH replica exchange molecular dynamics simulations and describes the coupling between the conformational and ionization equilibria.
Abstract: Ionizable residues are rarely present in the hydrophobic interior of proteins, but when they are, they play important roles in biological processes such as energy transduction and enzyme catalysis. Internal ionizable residues have anomalous experimental pKa values with respect to their pKa in bulk water. This work investigates the atomistic cause of the highly shifted pKa of the internal Glu23 in the artificially mutated variant V23E of Staphylococcal Nuclease (SNase) using pH replica exchange molecular dynamics (pH-REMD) simulations. The pKa of Glu23 obtained from our calculations is 6.55, which is elevated with respect to the glutamate pKa of 4.40 in bulk water. The calculated value is close to the experimental pKa of 7.10. Our simulations show that the highly shifted pKa of Glu23 is the product of a pH-dependent conformational change, which has been observed experimentally and also seen in our simulations. We carry out an analysis of this pH-dependent conformational change in response to the protonation state change of Glu23. Using a four-state thermodynamic model, we estimate the two conformation-specific pKa values of Glu23 and describe the coupling between the conformational and ionization equilibria.
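How conformation-specific pKa values combine into one apparent pKa can be sketched with a generic four-state linkage model: two conformations, here labelled A and B, each either protonated or deprotonated. The labels, function names, and parameterization are assumptions for illustration; the abstract does not give the authors' equations or fitted values.

```python
import math


def apparent_pka(pka_a, pka_b, k_b_over_a_protonated):
    """Apparent pKa of a residue that samples conformations A and B.

    k_b_over_a_protonated = [B, protonated] / [A, protonated], the
    conformational equilibrium constant of the protonated form.
    """
    ka_a = 10.0 ** (-pka_a)
    ka_b = 10.0 ** (-pka_b)
    # Population-weighted average of the two acid dissociation constants
    ka_app = (ka_a + k_b_over_a_protonated * ka_b) / (1.0 + k_b_over_a_protonated)
    return -math.log10(ka_app)


def fraction_in_b(ph, pka_a, pka_b, k_b_over_a_protonated):
    """Fraction of the population in conformation B at a given pH."""
    h = 10.0 ** (-ph)
    # Statistical weights relative to the A, protonated state
    w_a = 1.0 + 10.0 ** (-pka_a) / h
    w_b = k_b_over_a_protonated * (1.0 + 10.0 ** (-pka_b) / h)
    return w_b / (w_a + w_b)
```

In this model the apparent pKa always lies between the two conformation-specific values, and `fraction_in_b` changes with pH whenever those values differ, which is the coupling between conformational and ionization equilibria described above.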

Posted ContentDOI
07 Feb 2020-ChemRxiv
TL;DR: This work extends the ANI-1x model to three additional chemical elements (S, F, and Cl), so that the new model, ANI-2x, covers H, C, N, O, F, Cl, and S; it also underwent torsional refinement training to better predict molecular torsion profiles.
Abstract: Machine learning (ML) methods have become powerful, predictive tools in a wide range of applications, such as facial recognition and autonomous vehicles. In the sciences, computational chemists and physicists have been using ML for the prediction of physical phenomena, such as atomistic potential energy surfaces and reaction pathways. Transferable ML potentials, such as ANI-1x, have been developed with the goal of accurately simulating organic molecules containing the chemical elements H, C, N, and O. Here we provide an extension of the ANI-1x model. The new model, dubbed ANI-2x, is trained to three additional chemical elements: S, F, and Cl. Additionally, ANI-2x underwent torsional refinement training to better predict molecular torsion profiles. These new features open a wide range of new applications within organic chemistry and drug development. These seven elements (H, C, N, O, F, Cl, S) make up ~90% of drug-like molecules. To show that these additions do not sacrifice accuracy, we have tested this model across a range of organic molecules and applications, including the COMP6 benchmark, dihedral rotations, conformer scoring, and non-bonded interactions. ANI-2x is shown to accurately predict molecular energies compared to DFT with a ~10^6 factor speedup and a negligible slowdown compared to ANI-1x. The resulting model is a valuable tool for drug development that can potentially replace both quantum calculations and classical force fields for myriad applications.

Journal ArticleDOI
TL;DR: This study investigated E. coli GAR Tfase, which shares high sequence similarity with human GAR Tfase and conserves most functional residues; the computed population of CCPS with respect to pH matches the experimental pH-activity curve well.
Abstract: Human Glycinamide ribonucleotide transformylase (GAR Tfase) is a regulatory enzyme in the de novo purine biosynthesis pathway that has been extensively studied as an anti-cancer target. To some ext...