Posted Content

Quantum Mechanics and Machine Learning Synergies: Graph Attention Neural Networks to Predict Chemical Reactivity.

TL;DR: In this article, deep learning methods are trained on a curated dataset of MCA* and MAA* reactivity scores, in combination with different representations of molecular structure, to predict the reactivity of molecular structures; the models are evaluated using ten-fold cross-validation.
Abstract: There is a lack of scalable quantitative measures of reactivity for functional groups in organic chemistry. Measuring reactivity experimentally is costly and time-consuming and does not scale to the astronomical size of chemical space. In previous quantum chemistry studies, we have introduced Methyl Cation Affinities (MCA*) and Methyl Anion Affinities (MAA*), using a solvation model, as quantitative measures of reactivity for organic functional groups over the broadest range. Although MCA* and MAA* offer good estimates of reactivity parameters, their calculation through Density Functional Theory (DFT) simulations is time-consuming. To circumvent this problem, we first use DFT to calculate MCA* and MAA* for more than 2,400 organic molecules, thereby establishing a large dataset of chemical reactivity scores. We then design deep learning methods to predict the reactivity of molecular structures and train them using this curated dataset in combination with different representations of molecular structures. Using ten-fold cross-validation, we show that graph attention neural networks applied to informative input fingerprints produce the most accurate estimates of reactivity, achieving over 91% test accuracy for predicting the MCA* plus-minus 30 or MAA* plus-minus 30, over 50 orders of magnitude. Finally, we demonstrate the application of these reactivity scores to two tasks: (1) chemical reaction prediction; (2) combinatorial generation of reaction mechanisms. The curated dataset of MCA* and MAA* scores is available through the ChemDB chemoinformatics web portal at this http URL.
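The evaluation described in the abstract (test accuracy within a tolerance band, averaged over ten folds) can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the `fit`/`predict` callables, and the mean-predicting toy model are all hypothetical, and the tolerance `tol` is left as a parameter.

```python
import random

def within_tolerance_accuracy(y_true, y_pred, tol):
    """Fraction of predictions whose absolute error is at most `tol`."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= tol)
    return hits / len(y_true)

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k near-equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(xs, ys, fit, predict, tol, k=10, seed=0):
    """Average within-tolerance test accuracy over k train/test splits."""
    scores = []
    for test_idx in k_fold_indices(len(xs), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        preds = [predict(model, xs[i]) for i in test_idx]
        truth = [ys[i] for i in test_idx]
        scores.append(within_tolerance_accuracy(truth, preds, tol))
    return sum(scores) / len(scores)

# Hypothetical stand-in model: always predict the training-set mean.
def fit_mean(train_x, train_y):
    return sum(train_y) / len(train_y)

def predict_mean(model, x):
    return model
```

In practice `fit` and `predict` would wrap the network being evaluated; each sample is held out exactly once, so every score is measured on data the model did not see during that fold's training.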
Citations
Journal Article
TL;DR: In this article, the authors address the recent development of data-driven technologies for chemical reaction tasks, including forward reaction prediction, retrosynthesis, reaction optimization, catalysts design, inference of experimental procedures, and reaction classification.
Abstract: Discovering new reactions, optimizing their performance, and extending the synthetically accessible chemical space are critical drivers for major technological advances and more sustainable processes. The current wave of machine intelligence is revolutionizing all data‐rich disciplines. Machine intelligence has emerged as a potential game‐changer for chemical reaction space exploration and the synthesis of novel molecules and materials. Herein, we will address the recent development of data‐driven technologies for chemical reaction tasks, including forward reaction prediction, retrosynthesis, reaction optimization, catalysts design, inference of experimental procedures, and reaction classification. Accurate predictions of chemical reactivity are changing the R&D processes and, at the same time, promoting an accelerated discovery scheme both in academia and across chemical and pharmaceutical industries. This work will help to clarify the key contributions in the fields and the open challenges that remain to be addressed.

19 citations

Journal Article
TL;DR: In this article, the requirements for an electrophilic fragment library and the importance of differing warhead reactivity are discussed, and successful case studies from the world of drug discovery are examined.
Abstract: Fragment-based drug discovery has long been used for the identification of new ligands, and interest in targeted covalent inhibitors has continued to grow in recent years, with high-profile drugs such as osimertinib and sotorasib gaining FDA approval. It is therefore unsurprising that covalent fragment-based approaches have become popular and have recently led to the identification of novel targets and binding sites, as well as ligands for targets previously thought to be ‘undruggable’. Understanding the properties of such covalent fragments is important, and characterizing and/or predicting reactivity can be highly useful. This review aims to discuss the requirements for an electrophilic fragment library and the importance of differing warhead reactivity. Successful case studies from the world of drug discovery are then examined.

5 citations

Journal Article
TL;DR: In this article, a deep neural network structure was proposed to predict the solubility of ammonia in ionic liquids based on molecular structure, combined with support vector machine (SVM), random forest (RF) and deep neural networks (DNN) algorithm.
Abstract: The rapid selection of environmentally friendly and efficient solvents is critical for improving the safety, environmental protection, and efficiency of a process. In this study, a deep neural network structure was proposed to predict the solubility of ammonia in ionic liquids based on molecular structure, combined with support vector machine (SVM), random forest (RF) and deep neural network (DNN) algorithm. In this study, a group-based quantisation method for ionic liquids was proposed. On this basis, a feature preprocessing method integrating feature selection and data standardisation was proposed. Then, the eigenvectors extracted from the molecular structure were used to predict the solubility of ammonia in ionic liquids using SVM, RF and DNN models. Based on the cross-validation optimisation model structure, three models were evaluated. Results showed that the three models yielded high prediction accuracy, and that the prediction accuracy of the MLP model was higher than those of the SVM and RF models. For the MLP model, the coefficient of determination was 0.992. The model has good prediction performance and generalisation ability. Therefore, it can be used to select the best ionic liquid ammonia absorbent accurately and efficiently.

4 citations

Journal Article
TL;DR: Editorial for the Special Issue on Reaction Informatics and Chemical Space in J. Chem. Inf. Model.
Abstract: Editorial: Special Issue on Reaction Informatics and Chemical Space. Matthias Rarey (Universität Hamburg, ZBH − Center for Bioinformatics, 20146 Hamburg, Germany; https://orcid.org/0000-0002-9553-6531), Marc C. Nicklaus (NCI, NIH, CADD Group, NCI-Frederick, Frederick, Maryland 21702, United States; https://orcid.org/0000-0002-4775-7030), and Wendy Warr (Wendy Warr & Associates, Cheshire CW4 7HZ, U.K.; https://orcid.org/0000-0002-7064-4739). J. Chem. Inf. Model. 2022, 62, 9, 2009–2010. Published online 9 May 2022. https://doi.org/10.1021/acs.jcim.2c00390. Subjects: Algorithms, Biological databases, Chemical reactions, Chemoinformatics, Machine learning.

3 citations

Journal Article
TL;DR: In this paper, the authors present alternative approaches for the efficient generation of quantitative structure-reactivity relationships that are based on quantum chemistry, supervised learning, and uncertainty quantification, and observe a tendency for these relationships to become not only more predictive but also more interpretable over time.
Abstract: Reactivity scales are useful research tools for chemists, both experimental and computational. However, to determine the reactivity of a single molecule, multiple measurements need to be carried out, which is a time-consuming and resource-intensive task. In this Tutorial Review, we present alternative approaches for the efficient generation of quantitative structure-reactivity relationships that are based on quantum chemistry, supervised learning, and uncertainty quantification. First published in 2002, we observe a tendency for these relationships to become not only more predictive but also more interpretable over time.

2 citations

References
Journal Article
TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
Abstract: Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different "thinned" networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.

33,597 citations
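The mechanism summarized in the dropout abstract above (randomly drop units during training, then use a single unthinned network with scaled-down weights at test time) can be sketched in a few lines. This is a minimal illustration under assumed names (`dropout_train`, `dropout_test`, `p_keep`); scaling a unit's activation by `p_keep` is used here as a stand-in for scaling its outgoing weights, which has the same effect on the next layer's input.

```python
import random

def dropout_train(activations, p_keep, rng):
    """Training pass: keep each unit with probability p_keep, zero it otherwise."""
    return [a if rng.random() < p_keep else 0.0 for a in activations]

def dropout_test(activations, p_keep):
    """Test pass: keep every unit but scale by p_keep, which matches the
    expected value of the training-time output for each unit."""
    return [a * p_keep for a in activations]
```

Each call to `dropout_train` samples one "thinned" network; averaged over many samples, a unit's output approaches `p_keep` times its activation, which is what `dropout_test` returns directly.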

Journal Article
TL;DR: It is shown by an extensive benchmark on molecular energy data that the mathematical form of the damping function in DFT‐D methods has only a minor impact on the quality of the results and BJ‐damping seems to provide a physically correct short‐range behavior of correlation/dispersion even with unmodified standard functionals.
Abstract: It is shown by an extensive benchmark on molecular energy data that the mathematical form of the damping function in DFT-D methods has only a minor impact on the quality of the results. For 12 different functionals, a standard "zero-damping" formula and rational damping to finite values for small interatomic distances according to Becke and Johnson (BJ-damping) has been tested. The same (DFT-D3) scheme for the computation of the dispersion coefficients is used. The BJ-damping requires one fit parameter more for each functional (three instead of two) but has the advantage of avoiding repulsive interatomic forces at shorter distances. With BJ-damping better results for nonbonded distances and more clear effects of intramolecular dispersion in four representative molecular structures are found. For the noncovalently-bonded structures in the S22 set, both schemes lead to very similar intermolecular distances. For noncovalent interaction energies BJ-damping performs slightly better but both variants can be recommended in general. The exception to this is Hartree-Fock that can be recommended only in the BJ-variant and which is then close to the accuracy of corrected GGAs for non-covalent interactions. According to the thermodynamic benchmarks BJ-damping is more accurate especially for medium-range electron correlation problems and only small and practically insignificant double-counting effects are observed. It seems to provide a physically correct short-range behavior of correlation/dispersion even with unmodified standard functionals. In any case, the differences between the two methods are much smaller than the overall dispersion effect and often also smaller than the influence of the underlying density functional.

14,151 citations
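For orientation, the two damping schemes compared in the abstract above are commonly written as below. The formulas are recalled from the general DFT-D3 literature rather than taken from the abstract, so the symbols and the parameter bookkeeping in the comments are assumptions to check against the paper itself. Here $C_n^{AB}$ are the dispersion coefficients, $R_{AB}$ is the interatomic distance, and $R_{0}^{AB}$ denotes a pair-specific cutoff radius in the zero-damping form, whereas the BJ form uses $R_{\mathrm{BJ}}^{AB}=\sqrt{C_8^{AB}/C_6^{AB}}$.

```latex
% Zero-damping: each term is switched off as R_AB -> 0
% (typically fitted per functional: s_{r,6} and s_8)
E_{\mathrm{disp}}^{\mathrm{zero}}
  = -\frac{1}{2}\sum_{A \neq B}\sum_{n=6,8}
    s_n\,\frac{C_n^{AB}}{R_{AB}^{\,n}}\,
    \frac{1}{1 + 6\left(\dfrac{R_{AB}}{s_{r,n}\,R_{0}^{AB}}\right)^{-\alpha_n}}

% BJ (rational) damping: each term tends to a finite constant as R_AB -> 0
% (typically fitted per functional: s_8, a_1 and a_2)
E_{\mathrm{disp}}^{\mathrm{BJ}}
  = -\frac{1}{2}\sum_{A \neq B}\sum_{n=6,8}
    s_n\,\frac{C_n^{AB}}{R_{AB}^{\,n} + \left(a_1 R_{\mathrm{BJ}}^{AB} + a_2\right)^{n}},
\qquad
R_{\mathrm{BJ}}^{AB} = \sqrt{\frac{C_8^{AB}}{C_6^{AB}}}
```

In the BJ form the denominator never vanishes, so the pair energy levels off at short range instead of passing through a maximum and returning to zero; this is consistent with the abstract's remark that BJ-damping avoids repulsive interatomic forces at shorter distances.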

Journal Article
TL;DR: This chapter discusses the construction of Benzenoid and Coronoid Hydrocarbons through the stages of enumeration, classification, and topological properties in a number of computers used for this purpose.
Abstract: (1) Klarner, A. D. “Some Results Concerning Polyominoes”. Fibonacci Q. 1965, 3(1), 9-20. (2) Golomb, S. W. Polyominoes; Scribner: New York, 1965. (3) Harary, F.; Read, R. C. “The Enumeration of Tree-like Polyhexes”. Proc. Edinburgh Math. Soc. 1970, 17, 1-14. (4) Lunnon, W. F. “Counting Polyominoes” in Computers in Number Theory; Academic: London, 1971; pp 347-372. (5) Lunnon, W. F. “Counting Hexagonal and Triangular Polyominoes”. Graph Theory Comput. 1972, 87-100. (6) Brunvoll, J.; Cyvin, S. J.; Cyvin, B. N. “Enumeration and Classification of Benzenoid Hydrocarbons”. J. Comput. Chem. 1987, 8, 189-197. (7) Balaban, A. T., et al. “Enumeration of Benzenoid and Coronoid Hydrocarbons”. Z. Naturforsch., A: Phys., Phys. Chem., Kosmophys. 1987, 42A, 863-870. (8) Gutman, I. “Topological Properties of Benzenoid Systems”. Bull. Soc. Chim., Beograd 1982, 47, 453-471. (9) Gutman, I.; Polansky, O. E. Mathematical Concepts in Organic Chemistry; Springer: Berlin, 1986. (10) Tošić, R.; Doroslovački, R.; Gutman, I. “Topological Properties of Benzenoid Systems—The Boundary Code”. MATCH 1986, No. 19, 219-228. (11) Doroslovački, R.; Tošić, R. “A Characterization of Hexagonal Systems”. Rev. Res. Fac. Sci.-Univ. Novi Sad, Math. Ser. 1984, 14(2), 201-209. (12) Knop, J. V.; Szymanski, K.; Trinajstić, N. “Computer Enumeration of Substituted Polyhexes”. Comput. Chem. 1984, 8(2), 107-115. (13) Stojmenović, I.; Tošić, R.; Doroslovački, R. “Generating and Counting Hexagonal Systems”. Proc. Yugosl. Semin. Graph Theory, 6th, Dubrovnik 1985; pp 189-198. (14) Doroslovački, R.; Stojmenović, I.; Tošić, R. “Generating and Counting Triangular Systems”. BIT 1987, 27, 18-24. (15) Knop, J. V.; Miller, W. R.; Szymanski, K.; Trinajstić, N. Computer Generation of Certain Classes of Molecules; Association of Chemists and Technologists of Croatia: Zagreb, 1985.

4,541 citations

Journal Article
TL;DR: A description of their implementation has not previously been presented in the literature, and ECFPs can be very rapidly calculated and can represent an essentially infinite number of different molecular features.
Abstract: Extended-connectivity fingerprints (ECFPs) are a novel class of topological fingerprints for molecular characterization. Historically, topological fingerprints were developed for substructure and similarity searching. ECFPs were developed specifically for structure−activity modeling. ECFPs are circular fingerprints with a number of useful qualities: they can be very rapidly calculated; they are not predefined and can represent an essentially infinite number of different molecular features (including stereochemical information); their features represent the presence of particular substructures, allowing easier interpretation of analysis results; and the ECFP algorithm can be tailored to generate different types of circular fingerprints, optimized for different uses. While the use of ECFPs has been widely adopted and validated, a description of their implementation has not previously been presented in the literature.

4,173 citations
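The idea of a circular fingerprint described in the ECFP abstract above can be illustrated with a heavily simplified sketch. This is not the ECFP algorithm as published: it uses atomic numbers as the only atom invariant, ignores bond orders and stereochemistry, and skips duplicate-substructure removal. The function names and the toy molecule encoding are hypothetical.

```python
def circular_features(atom_ids, bonds, radius=2):
    """Collect hashed identifiers of growing circular atom neighborhoods.

    atom_ids: list of integer atom invariants (here simply atomic numbers).
    bonds: list of (i, j) index pairs for undirected bonds.
    """
    neighbors = {i: [] for i in range(len(atom_ids))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    current = list(atom_ids)      # iteration 0: one identifier per atom
    features = set(current)
    for _ in range(radius):
        updated = []
        for i in range(len(current)):
            # Sort neighbor identifiers so the result ignores atom ordering.
            env = (current[i], tuple(sorted(current[j] for j in neighbors[i])))
            updated.append(hash(env) & 0xFFFFFFFF)
        current = updated
        features.update(current)
    return features

def fold_to_bits(features, n_bits=1024):
    """Fold the open-ended feature set into a fixed-length bit vector."""
    bits = [0] * n_bits
    for f in features:
        bits[f % n_bits] = 1
    return bits
```

Because each identifier is derived from an atom and its neighbors' identifiers from the previous iteration, the feature set is not predefined: new substructures simply produce new integers, and folding is only applied when a fixed-length input is needed.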

Book Chapter
03 Jun 2018
TL;DR: It is shown that factorization models for link prediction such as DistMult can be significantly improved through the use of an R-GCN encoder model to accumulate evidence over multiple inference steps in the graph, demonstrating a large improvement of 29.8% on FB15k-237 over a decoder-only baseline.
Abstract: Knowledge graphs enable a wide variety of applications, including question answering and information retrieval. Despite the great effort invested in their creation and maintenance, even the largest (e.g., Yago, DBPedia or Wikidata) remain incomplete. We introduce Relational Graph Convolutional Networks (R-GCNs) and apply them to two standard knowledge base completion tasks: Link prediction (recovery of missing facts, i.e. subject-predicate-object triples) and entity classification (recovery of missing entity attributes). R-GCNs are related to a recent class of neural networks operating on graphs, and are developed specifically to handle the highly multi-relational data characteristic of realistic knowledge bases. We demonstrate the effectiveness of R-GCNs as a stand-alone model for entity classification. We further show that factorization models for link prediction such as DistMult can be significantly improved through the use of an R-GCN encoder model to accumulate evidence over multiple inference steps in the graph, demonstrating a large improvement of 29.8% on FB15k-237 over a decoder-only baseline.

3,168 citations