scispace - formally typeset
Search or ask a question
Journal ArticleDOI

Scaffold‐directed face selectivity Machine‐Learned from vectors of non‐covalent interactions

19 Apr 2021-Angewandte Chemie (John Wiley & Sons, Ltd)-Vol. 60, Iss: 28, pp 15230-15235
TL;DR: In this article, a method to vectorize and machine-learn non-covalent interactions responsible for scaffold-directed reactions important in synthetic chemistry is described, and models trained on this representation predict correct face of approach in ca. 90% of Michael additions or Diels-Alder cycloadditions.
Abstract: This work describes a method to vectorize and Machine-Learn, ML, non-covalent interactions responsible for scaffold-directed reactions important in synthetic chemistry. Models trained on this representation predict correct face of approach in ca. 90 % of Michael additions or Diels-Alder cycloadditions. These accuracies are significantly higher than those based on traditional ML descriptors, energetic calculations, or intuition of experienced synthetic chemists. Our results also emphasize the importance of ML models being provided with relevant mechanistic knowledge; without such knowledge, these models cannot easily "transfer-learn" and extrapolate to previously unseen reaction mechanisms.
Citations
More filters
Journal ArticleDOI
TL;DR: It is shown that ML models cannot offer any meaningful predictions of optimum reaction conditions, even if the search space is restricted to only solvents and bases, and highlighted the likely importance of systematically generating reliable and standardized data sets for algorithm training.
Abstract: Applications of machine learning (ML) to synthetic chemistry rely on the assumption that large numbers of literature-reported examples should enable construction of accurate and predictive models of chemical reactivity. This paper demonstrates that abundance of carefully curated literature data may be insufficient for this purpose. Using an example of Suzuki–Miyaura coupling with heterocyclic building blocks—and a carefully selected database of >10,000 literature examples—we show that ML models cannot offer any meaningful predictions of optimum reaction conditions, even if the search space is restricted to only solvents and bases. This result holds irrespective of the ML model applied (from simple feed-forward to state-of-the-art graph-convolution neural networks) or the representation to describe the reaction partners (various fingerprints, chemical descriptors, latent representations, etc.). In all cases, the ML methods fail to perform significantly better than naive assignments based on the sheer frequency of certain reaction conditions reported in the literature. These unsatisfactory results likely reflect subjective preferences of various chemists to use certain protocols, other biasing factors as mundane as availability of certain solvents/reagents, and/or a lack of negative data. These findings highlight the likely importance of systematically generating reliable and standardized data sets for algorithm training.

35 citations

Journal ArticleDOI
Li-Cheng Xu1, Shuo-Qing Zhang1, Xin Li1, Miao-Jiong Tang1, Pei-Pei Xie1, Xin Hong1 
TL;DR: In this article, the authors developed a hierarchical learning approach to achieve predictive machine leaning model using only dozens of enantioselectivity data with the target olefin, which offers a useful solution for the few-shot learning problem.
Abstract: Asymmetric hydrogenation of olefins is one of the most powerful asymmetric transformations in molecular synthesis. Although several privileged catalyst scaffolds are available, the catalyst development for asymmetric hydrogenation is still a time- and resource-consuming process due to the lack of predictive catalyst design strategy. Targeting the data-driven design of asymmetric catalysis, we herein report the development of a standardized database that contains the detailed information of over 12000 literature asymmetric hydrogenations of olefins. This database provides a valuable platform for the machine learning applications in asymmetric catalysis. Based on this database, we developed a hierarchical learning approach to achieve predictive machine leaning model using only dozens of enantioselectivity data with the target olefin, which offers a useful solution for the few-shot learning problem and will facilitate the reaction optimization with new olefin substrate in catalysis screening.

13 citations

Journal ArticleDOI
TL;DR: Trends in the function of time, reaction-type "popularity" and complexity based on the algorithm that extracts generalized reaction class templates are studied, useful in the context of computer-assisted synthesis, machine learning, and also for identifying erroneous entries in reaction databases.
Abstract: In terms of molecules and specific reaction examples, organic chemistry features an impressive, exponential growth. However, new reaction classes/types that fuel this growth are being discovered at a much slower and only linear (or even sublinear) rate. The proportion of newly discovered reaction types to all reactions being performed keeps decreasing, suggesting that synthetic chemistry becomes more reliant on reusing the well-known methods. The newly discovered chemistries are more complex than decades ago and allow for the rapid construction of complex scaffolds in fewer numbers of steps. We study these and other trends in the function of time, reaction-type popularity and complexity based on the algorithm that extracts generalized reaction class templates. These analyses are useful in the context of computer-assisted synthesis, machine learning (to estimate the numbers of models with sufficient reaction statistics), and identifying erroneous entries in reaction databases.

7 citations

Journal ArticleDOI
TL;DR: In this article , a machine learning workflow for the multi-objective optimization of catalytic reactions that employ chiral bisphosphine ligands was described through the optimization of two sequential reactions required in the asymmetric synthesis of an active pharmaceutical ingredient.
Abstract: Optimization of the catalyst structure to simultaneously improve multiple reaction objectives (e.g., yield, enantioselectivity, and regioselectivity) remains a formidable challenge. Herein, we describe a machine learning workflow for the multi-objective optimization of catalytic reactions that employ chiral bisphosphine ligands. This was demonstrated through the optimization of two sequential reactions required in the asymmetric synthesis of an active pharmaceutical ingredient. To accomplish this, a density functional theory-derived database of >550 bisphosphine ligands was constructed, and a designer chemical space mapping technique was established. The protocol used classification methods to identify active catalysts, followed by linear regression to model reaction selectivity. This led to the prediction and validation of significantly improved ligands for all reaction outputs, suggesting a general strategy that can be readily implemented for reaction optimizations where performance is controlled by bisphosphine ligands.

5 citations

Journal ArticleDOI
TL;DR: A recent review as mentioned in this paper summarizes the cutting-edge embedding techniques and model designs in synthetic performance prediction, elaborating how chemical knowledge can be incorporated into machine learning until June 2022.
Abstract: Abstract Recent years have witnessed a boom of machine learning (ML) applications in chemistry, which reveals the potential of data‐driven prediction of synthesis performance. Digitalization and ML modelling are the key strategies to fully exploit the unique potential within the synergistic interplay between experimental data and the robust prediction of performance and selectivity. A series of exciting studies have demonstrated the importance of chemical knowledge implementation in ML, which improves the model's capability for making predictions that are challenging and often go beyond the abilities of human beings. This Minireview summarizes the cutting‐edge embedding techniques and model designs in synthetic performance prediction, elaborating how chemical knowledge can be incorporated into machine learning until June 2022. By merging organic synthesis tactics and chemical informatics, we hope this Review can provide a guide map and intrigue chemists to revisit the digitalization and computerization of organic chemistry principles.

2 citations

References
More filters
Journal ArticleDOI
TL;DR: A description of their implementation has not previously been presented in the literature, and ECFPs can be very rapidly calculated and can represent an essentially infinite number of different molecular features.
Abstract: Extended-connectivity fingerprints (ECFPs) are a novel class of topological fingerprints for molecular characterization. Historically, topological fingerprints were developed for substructure and similarity searching. ECFPs were developed specifically for structure−activity modeling. ECFPs are circular fingerprints with a number of useful qualities: they can be very rapidly calculated; they are not predefined and can represent an essentially infinite number of different molecular features (including stereochemical information); their features represent the presence of particular substructures, allowing easier interpretation of analysis results; and the ECFP algorithm can be tailored to generate different types of circular fingerprints, optimized for different uses. While the use of ECFPs has been widely adopted and validated, a description of their implementation has not previously been presented in the literature.

4,173 citations

BookDOI
01 May 1998
TL;DR: This chapter discusses Reinforcement Learning with Self-Modifying Policies J. Schmidhuber, et al., and theoretical Models of Learning to Learn J. Baxter, a first step towards Continual Learning.
Abstract: Preface. Part I: Overview Articles. 1. Learning to Learn: Introduction and Overview S. Thrun, L. Pratt. 2. A Survey of Connectionist Network Reuse Through Transfer L. Pratt, B. Jennings. 3. Transfer in Cognition A. Robins. Part II: Prediction. 4. Theoretical Models of Learning to Learn J. Baxter. 5. Multitask Learning R. Caruana. 6. Making a Low-Dimensional Representation Suitable for Diverse Tasks N. Intrator, S. Edelman. 7. The Canonical Distortion Measure for Vector Quantization and Function Approximation J. Baxter. 8. Lifelong Learning Algorithms S. Thrun. Part III: Relatedness. 9. The Parallel Transfer of Task Knowledge Using Dynamic Learning Rates Based on a Measure of Relatedness D.L. Silver, R.E. Mercer. 10. Clustering Learning Tasks and the Selective Cross-Task Transfer of Knowledge S. Thrun, J. O'Sullivan. Part IV: Control. 11. CHILD: A First Step Towards Continual Learning M.B. Ring. 12. Reinforcement Learning with Self-Modifying Policies J. Schmidhuber, et al. 13. Creating Advice-Taking Reinforcement Learners R. Maclin, J.W. Shavlik. Contributing Authors. Index.

1,382 citations

Journal ArticleDOI
01 Mar 2018-Nature
TL;DR: This work combines Monte Carlo tree search with an expansion policy network that guides the search, and a filter network to pre-select the most promising retrosynthetic steps that solve for almost twice as many molecules, thirty times faster than the traditional computer-aided search method.
Abstract: To plan the syntheses of small organic molecules, chemists use retrosynthesis, a problem-solving technique in which target molecules are recursively transformed into increasingly simpler precursors. Computer-aided retrosynthesis would be a valuable tool but at present it is slow and provides results of unsatisfactory quality. Here we use Monte Carlo tree search and symbolic artificial intelligence (AI) to discover retrosynthetic routes. We combined Monte Carlo tree search with an expansion policy network that guides the search, and a filter network to pre-select the most promising retrosynthetic steps. These deep neural networks were trained on essentially all reactions ever published in organic chemistry. Our system solves for almost twice as many molecules, thirty times faster than the traditional computer-aided search method, which is based on extracted rules and hand-designed heuristics. In a double-blind AB test, chemists on average considered our computer-generated routes to be equivalent to reported literature routes.

1,146 citations

Journal ArticleDOI
TL;DR: This model has been applied to reactions of all types in both organic and inorganic chemistry, including substitutions and eliminations, cycloadditions, and several types of organometallic reactions.
Abstract: The activation strain or distortion/interaction model is a tool to analyze activation barriers that determine reaction rates. For bimolecular reactions, the activation energies are the sum of the energies to distort the reactants into geometries they have in transition states plus the interaction energies between the two distorted molecules. The energy required to distort the molecules is called the activation strain or distortion energy. This energy is the principal contributor to the activation barrier. The transition state occurs when this activation strain is overcome by the stabilizing interaction energy. Following the changes in these energies along the reaction coordinate gives insights into the factors controlling reactivity. This model has been applied to reactions of all types in both organic and inorganic chemistry, including substitutions and eliminations, cycloadditions, and several types of organometallic reactions.

880 citations

Journal ArticleDOI
TL;DR: In this article, molecular graph convolutions, a machine learning architecture for learning from undirected graphs, specifically small molecules, are described. But they do not outperform all fingerprint-based methods, and they represent a new paradigm in ligand-based virtual screening with exciting opportunities for future improvement.
Abstract: Molecular “fingerprints” encoding structural information are the workhorse of cheminformatics and machine learning in drug discovery applications. However, fingerprint representations necessarily emphasize particular aspects of the molecular structure while ignoring others, rather than allowing the model to make data-driven decisions. We describe molecular graph convolutions, a machine learning architecture for learning from undirected graphs, specifically small molecules. Graph convolutions use a simple encoding of the molecular graph—atoms, bonds, distances, etc.—which allows the model to take greater advantage of information in the graph structure. Although graph convolutions do not outperform all fingerprint-based methods, they (along with other graph-based methods) represent a new paradigm in ligand-based virtual screening with exciting opportunities for future improvement.

796 citations