Journal Article | DOI: 10.1039/D0NP00043D

Machine learning approaches for elucidating the biological effects of natural products.

04 Mar 2021 - Natural Product Reports (The Royal Society of Chemistry) - Vol. 38, Iss. 2, pp. 346-361
Abstract: Covering: 2000 to 2020. Machine learning (ML) is an efficient tool for the prediction of bioactivity and the study of structure–activity relationships. Over the past decade, an emerging trend for combining these approaches with the study of natural products (NPs) has developed in order to manage the challenge of the discovery of bioactive NPs. In the present review, we will introduce the basic principles and protocols for using the ML approach to investigate the bioactivity of NPs, citing a series of practical examples regarding the study of anti-microbial, anti-cancer, and anti-inflammatory NPs, etc. ML algorithms manage a variety of classification and regression problems associated with bioactive NPs, from those that are linear to non-linear and from pure compounds to plant extracts. Inspired by cases reported in the literature and our own experience, a number of key points have been emphasized for reducing modeling errors, including dataset preparation and applicability domain analysis.

Citations

14 results found


Open access | Journal Article | DOI: 10.1007/S13659-020-00293-7
Min Huang(1), Jin-Jian Lu(2), Jian Ding(1) | Institutions (2)
Abstract: Natural products, with remarkable chemical diversity, have been extensively investigated for their anticancer potential for more than half a century. The collective efforts of the community have achieved tremendous advancements, bringing natural products to clinical use and discovering new therapeutic opportunities, yet challenges remain ahead. With remarkable changes in the landscape of cancer therapy and the growing role of cutting-edge technologies, we may have come to a crossroads to revisit the strategies to understand natural products and to explore their therapeutic utility. This review summarizes the key advancements in natural product-centered cancer research and calls for the implementation of systematic approaches, new pharmacological models, and exploration of emerging directions to revitalize the search for natural products in cancer therapy.

14 Citations


Open access | Journal Article | DOI: 10.3390/BIOM10111566
17 Nov 2020
Abstract: Natural products have a significant role in drug discovery. Natural products have distinctive chemical structures that have contributed to identifying and developing drugs for different therapeutic areas. Moreover, natural products are significant sources of inspiration or starting points to develop new therapeutic agents. Natural products such as peptides and macrocycles, and other compounds with unique features, represent attractive sources to address complex diseases. Computational approaches that use chemoinformatics and molecular modeling methods help speed up natural product-based drug discovery. Several research groups have recently used computational methodologies to organize data, interpret results, generate and test hypotheses, filter large chemical databases before the experimental screening, and design experiments. This review discusses a broad range of chemoinformatics applications to support natural product-based drug discovery. We emphasize profiling natural product data sets in terms of diversity; complexity; acid/base; absorption, distribution, metabolism, excretion, and toxicity (ADME/Tox) properties; and fragment analysis. Novel techniques for the visual representation of the chemical space are also discussed.

Topics: Cheminformatics (60%), Chemical space (57.99%), Drug discovery (56%)

10 Citations


Open access | Journal Article | DOI: 10.1039/D0NP00061B
Abstract: Covering: up to the second quarter of 2020. Threat or treat? While pathogenic bacteria pose significant threats, they also represent a huge reservoir of potential pharmaceuticals to treat various diseases. The alarming antimicrobial resistance crisis and the dwindling clinical pipeline urgently call for the discovery and development of new antibiotics. Pathogenic bacteria have an enormous potential for natural products drug discovery, yet they remain untapped and understudied. Herein, we review the specialised metabolites isolated from entomopathogenic, phytopathogenic, and human pathogenic bacteria with antibacterial and antifungal activities, highlighting those currently in pre-clinical trials or with potential for drug development. Selected unusual biosynthetic pathways and the key roles they play (where known) in various ecological niches are described. We also provide an overview of the mode of action (molecular target), activity, and minimum inhibitory concentration (MIC) towards bacteria and fungi. The exploitation of pathogenic bacteria as a rich source of antimicrobials, combined with the recent advances in genomics and natural products research methodology, could pave the way for a new golden age of antibiotic discovery. This review should serve as a compendium to communities of medicinal chemists, organic chemists, natural product chemists, biochemists, clinical researchers, and many others interested in the subject.

Topics: Pathogenic bacteria (51%)

8 Citations


Open access | Journal Article | DOI: 10.3390/BIOM10101385
Alice Capecchi(1), Jean-Louis Reymond(1) | Institutions (1)
28 Sep 2020
Abstract: Microbial natural products (NPs) are an important source of drugs; however, their structural diversity remains poorly understood. Here we used our recently reported MinHashed Atom Pair fingerprint with diameter of four bonds (MAP4), a fingerprint suitable for molecules across very different sizes, to analyze the Natural Products Atlas (NPAtlas), a database of 25,523 NPs of bacterial or fungal origin. To visualize NPAtlas by MAP4 similarity, we used the dimensionality reduction method tree map (TMAP). The resulting interactive map organizes molecules by physico-chemical properties and compound families such as peptides and glycosides. Remarkably, the map separates bacterial and fungal NPs from one another, revealing that these two compound families are intrinsically different despite their related biosynthetic pathways. We used these differences to train a machine learning model capable of distinguishing between NPs of bacterial or fungal origin.
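
The MinHash idea behind a fingerprint such as MAP4 can be illustrated with a small Python sketch. This is not the MAP4 implementation: the feature strings below are hypothetical placeholders for atom-pair features, and the function names, the choice of BLAKE2 as the hash, and `n_perm` are assumptions made for illustration. The sketch only shows how a MinHash signature approximates the Jaccard similarity between two feature sets.

```python
import hashlib


def _hash(seed, token):
    """64-bit hash of a token under a given seed (stands in for one permutation)."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, n_perm=128):
    """For each seeded hash function, keep the smallest hash over the feature set."""
    tokens = set(tokens)
    if not tokens:
        raise ValueError("feature set must not be empty")
    return [min(_hash(seed, t) for t in tokens) for seed in range(n_perm)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; estimates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def exact_jaccard(a, b):
    """Exact Jaccard similarity of two sets, for comparison."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


# Hypothetical atom-pair style features for two molecules (placeholders only).
mol_a = {"C|1|C", "C|2|O", "C|1|O", "C|3|N"}
mol_b = {"C|1|C", "C|2|O", "C|1|N", "C|3|N"}

sig_a = minhash_signature(mol_a)
sig_b = minhash_signature(mol_b)
print("exact:", exact_jaccard(mol_a, mol_b))
print("estimated:", estimated_jaccard(sig_a, sig_b))
```

The signature has a fixed length regardless of how many features a molecule has, which is one reason this kind of encoding is convenient for comparing molecules of very different sizes.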

7 Citations


Open access | Journal Article | DOI: 10.1039/D0NP00055H
David Prihoda(1), Julia M. Maritz(2), Ondrej Klempir, David Dzamba, +4 more | Institutions (2)
Abstract: Covering: up to the end of 2020. The machine learning field can be defined as the study and application of algorithms that perform classification and prediction tasks through pattern recognition instead of explicitly defined rules. Among other areas, machine learning has excelled in natural language processing. As such methods have excelled at understanding written languages (e.g. English), they are also being applied to biological problems to better understand the “genomic language”. In this review we focus on recent advances in applying machine learning to natural products and genomics, and how those advances are improving our understanding of natural product biology, chemistry, and drug discovery. We discuss machine learning applications in genome mining (identifying biosynthetic signatures in genomic data), predictions of what structures will be created from those genomic signatures, and the types of activity we might expect from those molecules. We further explore the application of these approaches to data derived from complex microbiomes, with a focus on the human microbiome. We also review challenges in leveraging machine learning approaches in the field, and how the availability of other “omics” data layers provides value. Finally, we provide insights into the challenges associated with interpreting machine learning models and the underlying biology and promises of applying machine learning to natural product drug discovery. We believe that the application of machine learning methods to natural product research is poised to accelerate the identification of new molecular entities that may be used to treat a variety of disease indications.

5 Citations


References

124 results found


Open access | Journal Article | DOI: 10.1613/JAIR.953
Abstract: An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-world data sets are predominately composed of "normal" examples with only a small percentage of "abnormal" or "interesting" examples. It is also the case that the cost of misclassifying an abnormal (interesting) example as a normal example is often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the sensitivity of a classifier to the minority class. This paper shows that a combination of our method of over-sampling the minority (abnormal) class and under-sampling the majority (normal) class can achieve better classifier performance (in ROC space) than only under-sampling the majority class. This paper also shows that a combination of our method of over-sampling the minority class and under-sampling the majority class can achieve better classifier performance (in ROC space) than varying the loss ratios in Ripper or class priors in Naive Bayes. Our method of over-sampling the minority class involves creating synthetic minority class examples. Experiments are performed using C4.5, Ripper and a Naive Bayes classifier. The method is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
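
The over-sampling step described in this abstract, creating synthetic minority class examples, is commonly implemented by interpolating between a minority sample and one of its nearest minority-class neighbours. The following is a minimal Python sketch under that assumption; the function name `smote_like_oversample`, the parameter `k`, and the toy data are illustrative and are not taken from the paper.

```python
import random
from math import dist


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Toy minority class in a two-descriptor space (made-up values).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like_oversample(minority, n_new=6)
print(len(new_points), "synthetic minority examples")
```

In a full workflow this would be combined with under-sampling of the majority class, and synthetic points should be generated from the training split only so that no information leaks into the test set.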

11,512 Citations


Open access | Journal Article | DOI: 10.1613/JAIR.953
Abstract: An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-world data sets are predominately composed of "normal" examples with only a small percentage of "abnormal" or "interesting" examples. It is also the case that the cost of misclassifying an abnormal (interesting) example as a normal example is often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the sensitivity of a classifier to the minority class. This paper shows that a combination of our method of over-sampling the minority (abnormal) class and under-sampling the majority (normal) class can achieve better classifier performance (in ROC space) than only under-sampling the majority class. This paper also shows that a combination of our method of over-sampling the minority class and under-sampling the majority class can achieve better classifier performance (in ROC space) than varying the loss ratios in Ripper or class priors in Naive Bayes. Our method of over-sampling the minority class involves creating synthetic minority class examples. Experiments are performed using C4.5, Ripper and a Naive Bayes classifier. The method is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.

Topics: Naive Bayes classifier (55%)

11,077 Citations


Open access | Journal Article | DOI: 10.1023/A:1022689900470
03 Jan 1991 - Machine Learning
Abstract: Storing and using specific instances improves the performance of several supervised learning algorithms. These include algorithms that learn decision trees, classification rules, and distributed networks. However, no investigation has analyzed algorithms that use only specific instances to solve incremental learning tasks. In this paper, we describe a framework and methodology, called instance-based learning, that generates classification predictions using only specific instances. Instance-based learning algorithms do not maintain a set of abstractions derived from specific instances. This approach extends the nearest neighbor algorithm, which has large storage requirements. We describe how storage requirements can be significantly reduced with, at most, minor sacrifices in learning rate and classification accuracy. While the storage-reducing algorithm performs well on several real-world databases, its performance degrades rapidly with the level of attribute noise in training instances. Therefore, we extended it with a significance test to distinguish noisy instances. This extended algorithm's performance degrades gracefully with increasing noise levels and compares favorably with a noise-tolerant decision tree algorithm.
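
A rough Python sketch of the two ideas in this abstract: predicting from stored instances only, and reducing storage. The abstract does not spell out the storage-reduction rule, so the sketch assumes one simple rule (keep a training instance only if the instances stored so far misclassify it). The function names and the toy data are illustrative.

```python
from math import dist


def nearest_label(stored, x):
    """Return the label of the stored instance closest to x (1-nearest neighbour)."""
    return min(stored, key=lambda item: dist(item[0], x))[1]


def train_store_all(examples):
    """Keep every training instance (plain nearest-neighbour storage)."""
    return list(examples)


def train_store_errors(examples):
    """Assumed reduction rule: keep an instance only if the current store misclassifies it."""
    stored = []
    for x, y in examples:
        if not stored or nearest_label(stored, x) != y:
            stored.append((x, y))
    return stored


# Toy two-descriptor data with made-up activity labels.
examples = [
    ((0.0, 0.0), "inactive"), ((0.1, 0.2), "inactive"), ((0.2, 0.1), "inactive"),
    ((1.0, 1.0), "active"), ((0.9, 1.1), "active"), ((1.1, 0.9), "active"),
]
full = train_store_all(examples)
reduced = train_store_errors(examples)
print(len(full), "instances stored without reduction;", len(reduced), "with reduction")
print(nearest_label(reduced, (0.95, 1.0)))
```

Because the reduced store depends on the order in which instances arrive and keeps exactly the instances that were misclassified, mislabeled training points tend to be retained, which is consistent with the noise sensitivity the abstract reports.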

Topics: Instance-based learning (72%), Lazy learning (60%), Decision tree learning (56%)

4,492 Citations


Open access | Journal Article | DOI: 10.1093/NAR/GKY1049
Topics: UniProt (68%)

3,758 Citations


Journal Article | DOI: 10.1021/CI100050T
David Rogers(1), Mathew Hahn(1) | Institutions (1)
Abstract: Extended-connectivity fingerprints (ECFPs) are a novel class of topological fingerprints for molecular characterization. Historically, topological fingerprints were developed for substructure and similarity searching. ECFPs were developed specifically for structure−activity modeling. ECFPs are circular fingerprints with a number of useful qualities: they can be very rapidly calculated; they are not predefined and can represent an essentially infinite number of different molecular features (including stereochemical information); their features represent the presence of particular substructures, allowing easier interpretation of analysis results; and the ECFP algorithm can be tailored to generate different types of circular fingerprints, optimized for different uses. While the use of ECFPs has been widely adopted and validated, a description of their implementation has not previously been presented in the literature.
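
The circular-fingerprint idea in this abstract can be sketched in a few lines of Python. This is a deliberately simplified toy, not the ECFP algorithm as published: the atom invariants (atomic number, aromaticity flag, heavy-atom degree), the CRC32 hash, the function name `circular_fingerprint`, and the absence of duplicate-substructure removal are all assumptions made for illustration. Each atom starts with an identifier derived from its own properties, and each iteration rehashes that identifier together with the identifiers of its bonded neighbours, so the features describe progressively larger circular neighbourhoods.

```python
import zlib


def _hash(values):
    """Deterministic 32-bit hash of a tuple of integers."""
    return zlib.crc32(repr(values).encode("ascii"))


def circular_fingerprint(atoms, bonds, radius=2, n_bits=1024):
    """Return the set of on-bits for a toy circular fingerprint.

    atoms: list of (atomic_number, is_aromatic) tuples (assumed invariants)
    bonds: list of (i, j, bond_order) tuples with atom indices i and j
    """
    neighbours = {i: [] for i in range(len(atoms))}
    for i, j, order in bonds:
        neighbours[i].append((order, j))
        neighbours[j].append((order, i))

    # Iteration 0: identifiers depend only on each atom's own invariants
    # plus its number of heavy-atom neighbours.
    ids = [_hash((z, int(arom), len(neighbours[i])))
           for i, (z, arom) in enumerate(atoms)]
    features = set(ids)

    for layer in range(1, radius + 1):
        new_ids = []
        for i in range(len(atoms)):
            # Sort neighbour contributions so the result ignores atom ordering.
            env = sorted((order, ids[j]) for order, j in neighbours[i])
            flat = [layer, ids[i]] + [v for pair in env for v in pair]
            new_ids.append(_hash(tuple(flat)))
        ids = new_ids
        features.update(ids)

    # Fold the identifiers into a fixed-length bit space.
    return {f % n_bits for f in features}


# Ethanol heavy atoms written in two different atom orders: C-C-O and O-C-C.
ethanol = circular_fingerprint([(6, False), (6, False), (8, False)],
                               [(0, 1, 1), (1, 2, 1)])
ethanol_reordered = circular_fingerprint([(8, False), (6, False), (6, False)],
                                         [(0, 1, 1), (1, 2, 1)])
print(ethanol == ethanol_reordered)
```

For real work, an established cheminformatics toolkit should be used instead of this toy, since production implementations handle stereochemistry, ring membership, and duplicate environments.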

2,865 Citations


Performance
Metrics
No. of citations received by the Paper in previous years
Year    Citations
2021    10
2020    4
Network Information