Predicting glycosylation stereoselectivity using machine learning.

Abstract: Predicting the stereochemical outcome of chemical reactions is challenging in mechanistically ambiguous transformations. The stereoselectivity of glycosylation reactions is influenced by at least eleven factors across four chemical participants and temperature. A random forest algorithm was trained using a highly reproducible, concise dataset to accurately predict the stereoselective outcome of glycosylations. The steric and electronic contributions of all chemical reagents and solvents were quantified by quantum mechanical calculations. The trained model accurately predicts stereoselectivities for unseen nucleophiles, electrophiles, acid catalysts, and solvents across a wide temperature range (overall root mean square error 6.8%). All predictions were validated experimentally on a standardized microreactor platform. The model helped to identify novel ways to control glycosylation stereoselectivity and accurately predicts previously unknown means of stereocontrol. By quantifying the degree of influence of each variable, we begin to gain a better general understanding of the transformation, for example that environmental factors influence the stereoselectivity of glycosylations more than the coupling partners in this area of chemical space.
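The root mean square error quoted above can be illustrated with a minimal sketch. The selectivity values below are hypothetical, and expressing selectivity as a percentage of one anomer is an assumption made only for this example; neither comes from the study.

```python
import math


def rmse(predicted, measured):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical selectivities in percent of one anomer (illustration only).
predicted_alpha = [82.0, 45.0, 10.0, 67.0]
measured_alpha = [78.0, 51.0, 14.0, 60.0]

print(f"RMSE = {rmse(predicted_alpha, measured_alpha):.2f} percentage points")
```

Because the errors are squared before averaging, a single badly mispredicted reaction raises the RMSE more than several small deviations would.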

Journal Article. DOI: 10.1021/JACS.0C12096
Yue Fu, Leonardo Bernasconi, Peng Liu
Abstract: We report a computational approach to evaluate the reaction mechanisms of glycosylation using ab initio molecular dynamics (AIMD) simulations in explicit solvent. The reaction pathways are simulated via free energy calculations based on metadynamics and trajectory simulations using Born-Oppenheimer molecular dynamics. We applied this approach to investigate the mechanisms of the glycosylation of glucosyl α-trichloroacetimidate with three acceptors (EtOH, i-PrOH, and t-BuOH) in three solvents (ACN, DCM, and MTBE). The reactants and the solvents are treated explicitly using density functional theory. We show that the profile of the free energy surface, the synchronicity of the transition state structure, and the time gap between leaving group dissociation and nucleophile association can be used as three complementary indicators to describe the glycosylation mechanism within the SN1/SN2 continuum for a given reaction. This approach provides a reliable means to rationalize and predict reaction mechanisms and to estimate lifetimes of oxocarbenium intermediates and their dependence on the glycosyl donor, acceptor, and solvent environment.

Journal Article. DOI: 10.1002/ANGE.202101986
19 Apr 2021 - Angewandte Chemie
Abstract: This work describes a method to vectorize and Machine-Learn, ML, non-covalent interactions responsible for scaffold-directed reactions important in synthetic chemistry. Models trained on this representation predict correct face of approach in ca. 90 % of Michael additions or Diels-Alder cycloadditions. These accuracies are significantly higher than those based on traditional ML descriptors, energetic calculations, or intuition of experienced synthetic chemists. Our results also emphasize the importance of ML models being provided with relevant mechanistic knowledge; without such knowledge, these models cannot easily "transfer-learn" and extrapolate to previously unseen reaction mechanisms.

Open access. Journal Article. DOI: 10.1039/D1RE00184A
Abstract: Continuous flow synthesis of active pharmaceutical ingredients (APIs) can offer access to process conditions that are otherwise hazardous when operated in batch mode, resulting in improved mixing and heat transfer, which enables higher yields and greater reaction selectivity. Reaction kinetic parameter estimation from flow synthesis data is an essential activity for the development of process models for drug substance manufacturing unit operations and systems, facilitating a reduction of experimental effort and accelerating development. The flow synthesis of lomustine, an anti-cancer API, in two flow reactors (carbamylation + nitrosation stages) was recently demonstrated by Jaman et al. (Org. Process Res. Dev., 2019, 23, 334). In this study, we postulate kinetic rate laws based on hereby proposed reaction mechanisms presented for the first time in the literature for this API synthesis. We then perform kinetic parameter regression for the proposed rate laws, on the basis of published data, towards establishing reactor models. For the carbamylation (irreversible reaction), we compare two candidate reaction rate laws, an overall third-order rate law (first-order in each reagent) deriving best fit. For the nitrosation, we propose two substitution reactions on the basis of published mechanisms (a rate-limiting equilibrium step, followed by a fast irreversible reaction) with very good model fit.
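An overall third-order rate law that is first-order in each reagent can be sketched as follows. The generic species A, B, and C, the 1:1:1 stoichiometry, and the rate constant are assumptions for illustration; they are not the fitted values or species from the lomustine study.

```python
def third_order_rate(k, a, b, c):
    """Rate r = k [A][B][C], first-order in each of three species."""
    return k * a * b * c


def extent_at(k, a0, b0, c0, t_end, steps=1000):
    """Integrate d(xi)/dt = k (a0 - xi)(b0 - xi)(c0 - xi) with classical RK4.

    Assumes A + B + C -> product with 1:1:1 stoichiometry and returns the
    extent of reaction xi (in concentration units) at time t_end.
    """
    dt = t_end / steps
    xi = 0.0

    def f(x):
        return third_order_rate(k, a0 - x, b0 - x, c0 - x)

    for _ in range(steps):
        k1 = f(xi)
        k2 = f(xi + 0.5 * dt * k1)
        k3 = f(xi + 0.5 * dt * k2)
        k4 = f(xi + dt * k3)
        xi += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return xi


# Hypothetical inputs: equimolar feed, unit rate constant, unit residence time.
conversion = extent_at(k=1.0, a0=1.0, b0=1.0, c0=1.0, t_end=1.0) / 1.0
print(f"conversion of A: {conversion:.4f}")
```

For an equimolar feed the result can be checked against the closed form [A] = [A]0 / sqrt(1 + 2 k [A]0^2 t), which follows from integrating d[A]/dt = -k [A]^3.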

Journal Article. DOI: 10.1002/ANIE.202111540
23 Sep 2021 - Angewandte Chemie
Abstract: In terms of molecules and specific reaction examples, organic chemistry features an impressive, exponential growth. However, new reaction classes/types that fuel this growth are being discovered at a much slower and only linear (or even sublinear) rate. The proportion of newly discovered reaction types to all reactions being performed keeps decreasing, suggesting that synthetic chemistry becomes more reliant on reusing the well-known methods. The newly discovered chemistries are more complex than decades ago and allow for the rapid construction of complex scaffolds in fewer numbers of steps. We study these and other trends in the function of time, reaction-type popularity and complexity based on the algorithm that extracts generalized reaction class templates. These analyses are useful in the context of computer-assisted synthesis, machine learning (to estimate the numbers of models with sufficient reaction statistics), and identifying erroneous entries in reaction databases.

Journal Article. DOI: 10.1021/ACS.ORGLETT.1C01968
16 Jul 2021 - Organic Letters
Abstract: A multifunctional O-phenyl thiocarbonyl (O(C═S)OPh) group was introduced in glycosylation reactions. This auxiliary group exhibits three features (1) C6-long-range participation effect, (2) relay activation, and (3) switchable promoter-controlled carbonylation, which enables the facile synthesis of both 6-deoxy glucoside and 6-alcohol glucoside. In addition, we successfully quantified the extent of the C6-acyl participation effect and developed its application toward the α-trisaccharide motif.

Open access. Journal Article. DOI: 10.1162/153244303322753616
Isabelle Guyon, André Elisseeff
Abstract: Variable and feature selection have become the focus of much research in areas of application for which datasets with tens or hundreds of thousands of variables are available. These areas include text processing of internet documents, gene expression array analysis, and combinatorial chemistry. The objective of variable selection is three-fold: improving the prediction performance of the predictors, providing faster and more cost-effective predictors, and providing a better understanding of the underlying process that generated the data. The contributions of this special issue cover a wide range of aspects of such problems: providing a better definition of the objective function, feature construction, feature ranking, multivariate feature selection, efficient search methods, and feature validity assessment methods.
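Feature ranking, one of the aspects listed above, can be sketched with a simple univariate filter. Scoring each variable by the absolute Pearson correlation with the target is one common choice and is assumed here for illustration; the feature names and values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_features(columns, target):
    """Return feature names sorted by |correlation with target|, highest first."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical data: three candidate variables and one target.
columns = {
    "a": [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with the target
    "b": [1.0, 0.0, 1.0, 0.0],   # weakly anti-correlated
    "c": [5.0, 5.0, 5.0, 5.0],   # constant, carries no information
}
target = [1.0, 2.0, 3.0, 4.0]

print(rank_features(columns, target))
```

A univariate filter like this scores each variable in isolation, so it can miss variables that are only informative in combination; that limitation is what multivariate selection methods address.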

Open access. Journal Article. DOI: 10.1038/S41467-017-02088-W
Abstract: Many studies have shown how pigments and internal nanostructures generate color in nature. External surface structures can also influence appearance, such as by causing multiple scattering of light (structural absorption) to produce a velvety, super black appearance. Here we show that feathers from five species of birds of paradise (Aves: Paradisaeidae) structurally absorb incident light to produce extremely low-reflectance, super black plumages. Directional reflectance of these feathers (0.05-0.31%) approaches that of man-made ultra-absorbent materials. SEM, nano-CT, and ray-tracing simulations show that super black feathers have tilted arrays of highly modified barbules, which cause more multiple scattering, resulting in more structural absorption, than normal black feathers. Super black feathers have an extreme directional reflectance bias and appear darkest when viewed from the distal direction. We hypothesize that structurally absorbing, super black plumage evolved through sensory bias to enhance the perceived brilliance of adjacent color patches during courtship display.

Journal Article. DOI: 10.1002/SIM.4780030207
Frank E. Harrell, Kerry L. Lee, Robert M. Califf, David B. Pryor, and 1 more author
Abstract: Regression models such as the Cox proportional hazards model have had increasing use in modelling and estimating the prognosis of patients with a variety of diseases. Many applications involve a large number of variables to be modelled using a relatively small patient sample. Problems of overfitting and of identifying important covariates are exacerbated in analysing prognosis because the accuracy of a model is more a function of the number of events than of the sample size. We used a general index of predictive discrimination to measure the ability of a model developed on training samples of varying sizes to predict survival in an independent test sample of patients suspected of having coronary artery disease. We compared three methods of model fitting: (1) standard ‘step-up’ variable selection, (2) incomplete principal components regression, and (3) Cox model regression after developing clinical indices from variable clusters. We found regression using principal components to offer superior predictions in the test sample, whereas regression using indices offers easily interpretable models nearly as good as the principal components models. Standard variable selection has a number of deficiencies.
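An index of predictive discrimination for survival data can be sketched as a concordance calculation. The abstract does not define the index, so the pairwise concordance form below is an assumption, and the patient data are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Fraction of usable pairs in which the higher-risk subject fails first.

    times: follow-up time per subject
    events: 1 if the event was observed, 0 if censored
    risk_scores: model output, higher meaning worse predicted prognosis
    A pair (i, j) is usable when subject i has an observed event strictly
    before subject j's follow-up time. Tied risk scores count as half.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier failure
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Hypothetical test sample: four patients, the third one censored.
times = [1.0, 2.0, 3.0, 4.0]
events = [1, 1, 0, 1]
risk = [4.0, 3.0, 2.0, 1.0]

print(concordance_index(times, events, risk))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering. Only pairs anchored on an observed event contribute, which is consistent with the abstract's point that accuracy depends more on the number of events than on the sample size.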

Open access. Journal Article. DOI: 10.1038/S41586-018-0337-2
26 Jul 2018 - Nature
Abstract: Here we summarize recent progress in machine learning for the chemical sciences. We outline machine-learning techniques that are suitable for addressing research questions in this domain, as well as future directions for the field. We envisage a future in which the design, synthesis, characterization and application of molecules and materials is accelerated by artificial intelligence.

