Assigning the Origin of Microbial Natural Products by Chemical Space Map and Machine Learning.

doi:10.3390/BIOM10101385

Home
/
Papers
/
Assigning the Origin of Microbial Natural Products by Chemical Space Map and Machine Learning.

Journal Article•DOI•

Assigning the Origin of Microbial Natural Products by Chemical Space Map and Machine Learning.

Alice Capecchi¹, Jean-Louis Reymond¹•Institutions (1)

University of Bern¹

28 Sep 2020-Vol. 10, Iss: 10, pp 1385

TL;DR: The recently reported MinHashed Atom Pair fingerprint with diameter of four bonds (MAP4), a fingerprint suitable for molecules across very different sizes, is used to analyze the Natural Products Atlas (NPAtlas), a database of 25,523 NPs of bacterial or fungal origin and separates bacterial and fungal NPs from one another.

read less

Abstract: Microbial natural products (NPs) are an important source of drugs, however, their structural diversity remains poorly understood. Here we used our recently reported MinHashed Atom Pair fingerprint with diameter of four bonds (MAP4), a fingerprint suitable for molecules across very different sizes, to analyze the Natural Products Atlas (NPAtlas), a database of 25,523 NPs of bacterial or fungal origin. To visualize NPAtlas by MAP4 similarity, we used the dimensionality reduction method tree map (TMAP). The resulting interactive map organizes molecules by physico-chemical properties and compound families such as peptides and glycosides. Remarkably, the map separates bacterial and fungal NPs from one another, revealing that these two compound families are intrinsically different despite their related biosynthetic pathways. We used these differences to train a machine learning model capable of distinguishing between NPs of bacterial or fungal origin.

...read moreread less

Citations

PDF

Open Access

More filters

Journal Article•DOI•

NPClassifier: A Deep Neural Network-Based Structural Classification Tool for Natural Products.

[...]

Hyun-Woo Kim¹, Mingxun Wang², Christopher A. Leber¹, Louis-Félix Nothias², Raphael Reher³, Raphael Reher¹, Kyo Bin Kang⁴, Justin J. J. van der Hooft⁵, Pieter C. Dorrestein², William H. Gerwick², William H. Gerwick¹, Garrison W. Cottrell¹ - Show less +8 more•Institutions (5)

University of California, San Diego¹, University of Montana², Martin Luther University of Halle-Wittenberg³, Sookmyung Women's University⁴, Wageningen University and Research Centre⁵

18 Oct 2021-Journal of Natural Products

TL;DR: In this article, a deep learning tool for the automated structural classification of natural products (NPs) from their counted Morgan fingerprints is introduced. But it is not suitable for the classification of large numbers of NPs, and it cannot handle the massive amounts of data appearing for NP structures.

...read moreread less

Abstract: Computational approaches such as genome and metabolome mining are becoming essential to natural products (NPs) research. Consequently, a need exists for an automated structure-type classification system to handle the massive amounts of data appearing for NP structures. An ideal semantic ontology for the classification of NPs should go beyond the simple presence/absence of chemical substructures, but also include the taxonomy of the producing organism, the nature of the biosynthetic pathway, and/or their biological properties. Thus, a holistic and automatic NP classification framework could have considerable value to comprehensively navigate the relatedness of NPs, and especially so when analyzing large numbers of NPs. Here, we introduce NPClassifier, a deep-learning tool for the automated structural classification of NPs from their counted Morgan fingerprints. NPClassifier is expected to accelerate and enhance NP discovery by linking NP structures to their underlying properties.

...read moreread less

69 citations

Journal Article•DOI•

Natural product drug discovery in the artificial intelligence era

[...]

Fabien Gerard Christian Plisson, Michael A. Horst¹•Institutions (1)

National Autonomous University of Mexico¹

01 Jan 2022-Chemical Science

TL;DR: In this article , the rational applications of AI approaches developed to assist in discovering bioactive NPs and capturing the molecular "patterns" of these privileged structures for combinatorial design or target selectivity are discussed.

...read moreread less

Abstract: Natural products (NPs) are primarily recognized as privileged structures to interact with protein drug targets. Their unique characteristics and structural diversity continue to marvel scientists for developing NP-inspired medicines, even though the pharmaceutical industry has largely given up. High-performance computer hardware, extensive storage, accessible software and affordable online education have democratized the use of artificial intelligence (AI) in many sectors and research areas. The last decades have introduced natural language processing and machine learning algorithms, two subfields of AI, to tackle NP drug discovery challenges and open up opportunities. In this article, we review and discuss the rational applications of AI approaches developed to assist in discovering bioactive NPs and capturing the molecular "patterns" of these privileged structures for combinatorial design or target selectivity.

...read moreread less

44 citations

Journal Article•DOI•

Alkaloids in Contemporary Drug Discovery to Meet Global Disease Needs.

[...]

Sharna-kay Daley, Geoffrey A. Cordell¹•Institutions (1)

University of Florida¹

22 Jun 2021-Molecules

TL;DR: An overview of the role of alkaloids in drug discovery, the application of more sustainable chemicals, and biological approaches, and the implementation of information systems to address the current challenges faced in meeting global disease needs is presented in this article.

...read moreread less

Abstract: An overview is presented of the well-established role of alkaloids in drug discovery, the application of more sustainable chemicals, and biological approaches, and the implementation of information systems to address the current challenges faced in meeting global disease needs. The necessity for a new international paradigm for natural product discovery and development for the treatment of multidrug resistant organisms, and rare and neglected tropical diseases in the era of the Fourth Industrial Revolution and the Quintuple Helix is discussed.

...read moreread less

21 citations

Journal Article•DOI•

Progress on open chemoinformatic tools for expanding and exploring the chemical space

[...]

José L. Medina-Franco¹, Norberto Sánchez-Cruz¹, Edgar López-López², Edgar López-López¹, Bárbara I. Díaz-Eufracio¹ - Show less +1 more•Institutions (2)

National Autonomous University of Mexico¹, Instituto Politécnico Nacional²

18 Jun 2021-Journal of Computer-aided Molecular Design

TL;DR: In this article, the authors discuss the recent progress on chemoinformatic tools developed to expand and characterize the chemical space of compound data sets using different types of molecular representations, generate visual representations of such spaces, and explore structure-property relationships in the context of chemical spaces.

...read moreread less

Abstract: The concept of chemical space is a cornerstone in chemoinformatics, and it has broad conceptual and practical applicability in many areas of chemistry, including drug design and discovery. One of the most considerable impacts is in the study of structure-property relationships where the property can be a biological activity or any other characteristic of interest to a particular chemistry discipline. The chemical space is highly dependent on the molecular representation that is also a cornerstone concept in computational chemistry. Herein, we discuss the recent progress on chemoinformatic tools developed to expand and characterize the chemical space of compound data sets using different types of molecular representations, generate visual representations of such spaces, and explore structure-property relationships in the context of chemical spaces. We emphasize the development of methods and freely available tools focusing on drug discovery applications. We also comment on the general advantages and shortcomings of using freely available and easy-to-use tools and discuss the value of using such open resources for research, education, and scientific dissemination.

...read moreread less

17 citations

Journal Article•DOI•

Peptides in chemical space

[...]

Alice Capecchi¹, Jean-Louis Reymond¹•Institutions (1)

University of Bern¹

01 Mar 2021

TL;DR: An overview of the known peptide chemical space is presented in form of an interactive map representing 40,531 peptides collected from eleven open-access peptide and peptide-containing databases, accessible at https://tm.gdb.tools/map4/peptide_databases_tmap/ .

...read moreread less

Abstract: Peptides, defined as sequences of amino acids up to approximately 50 residues in length, represent an extremely large reservoir of potentially bioactive compounds, referred to here as the peptide chemical space. Recent advances in computer hardware and software have led to a wide application of computational methods to explore this chemical space. Here, we review different in silico approaches including structure-based design, genetic algorithms, and machine learning. We also review the use of molecular fingerprints to sample virtual libraries and to visualize the peptide chemical space. Finally, we present an overview of the known peptide chemical space in form of an interactive map representing 40,531 peptides collected from eleven open-access peptide and peptide-containing databases, accessible at https://tm.gdb.tools/map4/peptide_databases_tmap/ . These peptides are displayed as TMAP (Tree-Map) according to their molecular fingerprint similarity computed using MAP4, a MinHashed atom pair fingerprint well suited to analyze large molecules.

...read moreread less

14 citations

References

PDF

Open Access

More filters

Journal Article•

Scikit-learn: Machine Learning in Python

[...]

Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel¹, Peter Prettenhofer², Ron Weiss³, Vincent Dubourg, Jake Vanderplas⁴, Alexandre Passos⁵, David Cournapeau, Matthieu Brucher⁶, Matthieu Perrot, Edouard Duchesnay - Show less +12 more•Institutions (6)

Kobe University¹, Bauhaus University, Weimar², Google³, University of Washington⁴, University of Massachusetts Amherst⁵, Total S.A.⁶

01 Feb 2011-Journal of Machine Learning Research

TL;DR: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems, focusing on bringing machine learning to non-specialists using a general-purpose high-level language.

...read moreread less

Abstract: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from http://scikit-learn.sourceforge.net.

...read moreread less

47,974 citations

Journal Article•

Visualizing Data using t-SNE

[...]

Laurens van der Maaten, Geoffrey E. Hinton

01 Jan 2008-Journal of Machine Learning Research

TL;DR: A new technique called t-SNE that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map, a variation of Stochastic Neighbor Embedding that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.

...read moreread less

Abstract: We present a new technique called “t-SNE” that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map. The technique is a variation of Stochastic Neighbor Embedding (Hinton and Roweis, 2002) that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map. t-SNE is better than existing techniques at creating a single map that reveals structure at many different scales. This is particularly important for high-dimensional data that lie on several different, but related, low-dimensional manifolds, such as images of objects from multiple classes seen from multiple viewpoints. For visualizing the structure of very large datasets, we show how t-SNE can use random walks on neighborhood graphs to allow the implicit structure of all of the data to influence the way in which a subset of the data is displayed. We illustrate the performance of t-SNE on a wide variety of datasets and compare it with many other non-parametric visualization techniques, including Sammon mapping, Isomap, and Locally Linear Embedding. The visualizations produced by t-SNE are significantly better than those produced by the other techniques on almost all of the datasets.

...read moreread less

30,124 citations

Posted Content•

Scikit-learn: Machine Learning in Python

[...]

Fabian Pedregosa¹, Gaël Varoquaux¹, Alexandre Gramfort¹, Vincent Michel¹, Bertrand Thirion¹, Olivier Grisel, Mathieu Blondel, Andreas Müller², Joel Nothman, Gilles Louppe², Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, Edouard Duchesnay - Show less +15 more•Institutions (2)

French Institute for Research in Computer Science and Automation¹, University of Liège²

02 Jan 2012-arXiv: Learning

TL;DR: Scikit-learn as mentioned in this paper is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems.

...read moreread less

28,898 citations

Journal Article•DOI•

Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings

[...]

Christopher A. Lipinski¹, Franco Lombardo¹, Beryl W. Dominy¹, Paul J. Feeney¹•Institutions (1)

Pfizer¹

15 Jan 1997-Advanced Drug Delivery Reviews

TL;DR: Experimental and computational approaches to estimate solubility and permeability in discovery and development settings are described in this article, where the rule of 5 is used to predict poor absorption or permeability when there are more than 5 H-bond donors, 10 Hbond acceptors, and the calculated Log P (CLogP) is greater than 5 (or MlogP > 415).

...read moreread less

14,026 citations

Journal Article•DOI•

SciPy 1.0--Fundamental Algorithms for Scientific Computing in Python

[...]

Pauli Virtanen¹, Ralf Gommers, Travis E. Oliphant, Matt Haberland², Matt Haberland³, Tyler Reddy⁴, David Cournapeau, Evgeni Burovski⁵, Pearu Peterson, Warren Weckesser⁶, Jonathan Bright, Stefan van der Walt⁶, Matthew Brett⁷, Joshua Wilson, K. Jarrod Millman⁶, Nikolay Mayorov, Andrew Nelson⁸, Eric Jones, Robert Kern, Eric B. Larson⁹, CJ Carey¹⁰, Ilhan Polat, Yu Feng⁶, Eric Moore, Jake Vanderplas⁹, Denis Laxalde, Josef Perktold, Robert Cimrman¹¹, Ian Henriksen¹², Ian Henriksen¹³, E. A. Quintero, Charles R. Harris, Anne M. Archibald, Antônio H. Ribeiro¹⁴, Fabian Pedregosa¹⁵, Paul van Mulbregt¹⁵, SciPy . Contributors - Show less +33 more•Institutions (15)

University of Jyväskylä¹, University of California, Los Angeles², California Polytechnic State University³, Los Alamos National Laboratory⁴, National Research University – Higher School of Economics⁵, University of California, Berkeley⁶, University of Birmingham⁷, Australian Nuclear Science and Technology Organisation⁸, University of Washington⁹, University of Massachusetts Amherst¹⁰, University of West Bohemia¹¹, Brigham Young University¹², University of Texas at Austin¹³, Universidade Federal de Minas Gerais¹⁴, Google¹⁵

23 Jul 2019-arXiv: Mathematical Software

TL;DR: SciPy as discussed by the authors is an open source scientific computing library for the Python programming language, which includes functionality spanning clustering, Fourier transforms, integration, interpolation, file I/O, linear algebra, image processing, orthogonal distance regression, minimization algorithms, signal processing, sparse matrix handling, computational geometry, and statistics.

...read moreread less

Abstract: SciPy is an open source scientific computing library for the Python programming language. SciPy 1.0 was released in late 2017, about 16 years after the original version 0.1 release. SciPy has become a de facto standard for leveraging scientific algorithms in the Python programming language, with more than 600 unique code contributors, thousands of dependent packages, over 100,000 dependent repositories, and millions of downloads per year. This includes usage of SciPy in almost half of all machine learning projects on GitHub, and usage by high profile projects including LIGO gravitational wave analysis and creation of the first-ever image of a black hole (M87). The library includes functionality spanning clustering, Fourier transforms, integration, interpolation, file I/O, linear algebra, image processing, orthogonal distance regression, minimization algorithms, signal processing, sparse matrix handling, computational geometry, and statistics. In this work, we provide an overview of the capabilities and development practices of the SciPy library and highlight some recent technical developments.

...read moreread less

12,774 citations