Journal Article

A Deep Learning Approach to Antibiotic Discovery

TL;DR: A deep neural network is trained to predict molecules with antibacterial activity, and a molecule from the Drug Repurposing Hub, halicin, is discovered that is structurally divergent from conventional antibiotics and displays bactericidal activity against a wide phylogenetic spectrum of pathogens.
About: This article was published in Cell on 2020-02-20 and is currently open access. It has received 1002 citations to date.
Citations
Posted Content
TL;DR: The OGB datasets are large-scale, encompass multiple important graph ML tasks, and cover a diverse range of domains, ranging from social and information networks to biological networks, molecular graphs, source code ASTs, and knowledge graphs, indicating fruitful opportunities for future research.
Abstract: We present the Open Graph Benchmark (OGB), a diverse set of challenging and realistic benchmark datasets to facilitate scalable, robust, and reproducible graph machine learning (ML) research. OGB datasets are large-scale, encompass multiple important graph ML tasks, and cover a diverse range of domains, ranging from social and information networks to biological networks, molecular graphs, source code ASTs, and knowledge graphs. For each dataset, we provide a unified evaluation protocol using meaningful application-specific data splits and evaluation metrics. In addition to building the datasets, we also perform extensive benchmark experiments for each dataset. Our experiments suggest that OGB datasets present significant challenges of scalability to large-scale graphs and out-of-distribution generalization under realistic data splits, indicating fruitful opportunities for future research. Finally, OGB provides an automated end-to-end graph ML pipeline that simplifies and standardizes the process of graph data loading, experimental setup, and model evaluation. OGB will be regularly updated and welcomes inputs from the community. OGB datasets as well as data loaders, evaluation scripts, baseline code, and leaderboards are publicly available at this https URL .

1,097 citations

Posted Content
TL;DR: The TUDataset for graph classification and regression is introduced, which consists of over 120 datasets of varying sizes from a wide range of applications and provides Python-based data loaders, kernel and graph neural network baseline implementations, and evaluation tools.
Abstract: Recently, there has been an increasing interest in (supervised) learning with graph data, especially using graph neural networks. However, the development of meaningful benchmark datasets and standardized evaluation procedures is lagging, consequently hindering advancements in this area. To address this, we introduce the TUDataset for graph classification and regression. The collection consists of over 120 datasets of varying sizes from a wide range of applications. We provide Python-based data loaders, kernel and graph neural network baseline implementations, and evaluation tools. Here, we give an overview of the datasets, standardized evaluation procedures, and provide baseline experiments. All datasets are available at this http URL. The experiments are fully reproducible from the code available at this http URL.

346 citations

Journal Article
TL;DR: Key findings from a 2-year weekly effort to track and share key developments in medical AI are discussed, including prospective studies and advances in medical image analysis, which have reduced the gap between research and deployment.

346 citations

Journal Article
TL;DR: Machine learning is becoming a widely used tool for the analysis of biological data, but proper use of machine learning methods can be challenging for experimentalists; this Review gives an overview of key techniques and discusses best practices and points to consider when embarking on experiments involving machine learning.
Abstract: The expanding scale and inherent complexity of biological data have encouraged a growing use of machine learning in biology to build informative and predictive models of the underlying biological processes. All machine learning techniques fit models to data; however, the specific methods are quite varied and can at first glance seem bewildering. In this Review, we aim to provide readers with a gentle introduction to a few key machine learning techniques, including the most recently developed and widely used techniques involving deep neural networks. We describe how different techniques may be suited to specific types of biological data, and also discuss some best practices and points to consider when one is embarking on experiments involving machine learning. Some emerging directions in machine learning methodology are also discussed. Machine learning is becoming a widely used tool for the analysis of biological data. However, for experimentalists, proper use of machine learning methods can be challenging. This Review provides an overview of machine learning techniques and provides guidance on their applications in biology.

325 citations

Journal Article
19 Aug 2021
TL;DR: In this paper, the authors present a strategic blueprint to substantially improve our ability to discover and develop new antibiotics, and propose both short-term and long-term solutions to overcome the most urgent limitations in the various sectors of research and funding.
Abstract: An ever-increasing demand for novel antimicrobials to treat life-threatening infections caused by the global spread of multidrug-resistant bacterial pathogens stands in stark contrast to the current level of investment in their development, particularly in the fields of natural-product-derived and synthetic small molecules. New agents displaying innovative chemistry and modes of action are desperately needed worldwide to tackle the public health menace posed by antimicrobial resistance. Here, our consortium presents a strategic blueprint to substantially improve our ability to discover and develop new antibiotics. We propose both short-term and long-term solutions to overcome the most urgent limitations in the various sectors of research and funding, aiming to bridge the gap between academic, industrial and political stakeholders, and to unite interdisciplinary expertise in order to efficiently fuel the translational pipeline for the benefit of future generations.

255 citations

References
Journal Article
TL;DR: This work presents DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates, which enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.
Abstract: In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression. The DESeq2 package is available at http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html .

47,038 citations

Journal Article
TL;DR: Burrows-Wheeler Alignment tool (BWA) is implemented, a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps.
Abstract: Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies call for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals. Results: We implemented Burrows-Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is ~10–20× faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignment in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package. Availability: http://maq.sourceforge.net

43,862 citations
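
The BWA abstract above names backward search with the Burrows–Wheeler Transform as its core idea. The following is a minimal sketch of that idea for exact matching only; it is not BWA's implementation, which also handles mismatches and gaps and uses compact index structures. The function names `bwt_index` and `count_matches`, the naive suffix sorting, and the `$` sentinel are assumptions made for illustration.

```python
from collections import Counter


def bwt_index(text: str):
    """Build a toy BWT index (naive construction, for illustration only)."""
    text += "$"  # sentinel; assumed to sort before every other character
    # Suffix array by direct sorting of suffixes (quadratic, fine for a demo).
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps around for i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    # c_table[ch]: number of characters in the text strictly smaller than ch.
    counts = Counter(text)
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    # occ[ch][i]: number of occurrences of ch in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):
        for ch in counts:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return c_table, occ, len(bwt)


def count_matches(pattern: str, c_table, occ, n: int) -> int:
    """Count exact occurrences of pattern by backward search."""
    lo, hi = 0, n  # half-open range of suffix-array rows still matching
    for ch in reversed(pattern):  # consume the pattern right to left
        if ch not in c_table:
            return 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


c_table, occ, n = bwt_index("GATTACATTAG")
print(count_matches("TTA", c_table, occ, n))  # "TTA" occurs twice in the text
```

Each pattern character narrows the range of matching rows with two table lookups, so the search cost depends on the read length rather than on the reference length, which is what makes this family of indexes attractive for large references.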


Additional excerpts

  • ...REAGENT or RESOURCE SOURCE IDENTIFIER tgcaaaataatatgcaccacgacggcggtcagaaaaataa This study AB5046 gaagcgttacttcgcgatctgatcaacgattcgtggaatc This study AB5047 Software and Algorithms Chemprop Yang et al., 2019b https://github.com/swansonk14/chemprop RDKit Landrum, 2006 https://github.com/rdkit BWA Li and Durbin, 2009 https://github.com/lh3/bwa DESeq2 Love et al., 2014 https://bioconductor.org/packages/release/bioc/html/DESeq2.html edgeR Robinson et al., 2010 https://bioconductor.org/packages/release/bioc/html/edgeR.html GenomeView Abeel et al., 2012 https://genomeview.org EcoCyc Pathway Tools Keseler et al., 2013 https://ecocyc.org...

    [...]


Journal Article
TL;DR: edgeR is a Bioconductor software package for examining differential expression of replicated count data; it uses an overdispersed Poisson model to account for both biological and technical variability, and empirical Bayes methods to moderate the degree of overdispersion across transcripts, improving the reliability of inference.
Abstract: Summary: It is expected that emerging digital gene expression (DGE) technologies will overtake microarray technologies in the near future for many functional genomics applications. One of the fundamental data analysis tasks, especially for gene expression studies, involves determining whether there is evidence that counts for a transcript or exon are significantly different across experimental conditions. edgeR is a Bioconductor software package for examining differential expression of replicated count data. An overdispersed Poisson model is used to account for both biological and technical variability. Empirical Bayes methods are used to moderate the degree of overdispersion across transcripts, improving the reliability of inference. The methodology can be used even with the most minimal levels of replication, provided at least one phenotype or experimental condition is replicated. The software may have other applications beyond sequencing data, such as proteome peptide count data. Availability: The package is freely available under the LGPL licence from the Bioconductor web site (http://bioconductor.org).

29,413 citations

Journal Article
TL;DR: A simple and highly efficient method to disrupt chromosomal genes in Escherichia coli in which PCR primers provide the homology to the targeted gene(s), which should be widely useful, especially in genome analysis of E. coli and other bacteria.
Abstract: We have developed a simple and highly efficient method to disrupt chromosomal genes in Escherichia coli in which PCR primers provide the homology to the targeted gene(s). In this procedure, recombination requires the phage lambda Red recombinase, which is synthesized under the control of an inducible promoter on an easily curable, low copy number plasmid. To demonstrate the utility of this approach, we generated PCR products by using primers with 36- to 50-nt extensions that are homologous to regions adjacent to the gene to be inactivated and template plasmids carrying antibiotic resistance genes that are flanked by FRT (FLP recognition target) sites. By using the respective PCR products, we made 13 different disruptions of chromosomal genes. Mutants of the arcB, cyaA, lacZYA, ompR-envZ, phnR, pstB, pstCA, pstS, pstSCAB-phoU, recA, and torSTRCAD genes or operons were isolated as antibiotic-resistant colonies after the introduction into bacteria carrying a Red expression plasmid of synthetic (PCR-generated) DNA. The resistance genes were then eliminated by using a helper plasmid encoding the FLP recombinase which is also easily curable. This procedure should be widely useful, especially in genome analysis of E. coli and other bacteria because the procedure can be done in wild-type cells.

14,389 citations

Journal Article
TL;DR: Extended-connectivity fingerprints (ECFPs) are circular topological fingerprints developed for structure-activity modeling; they can be calculated very rapidly and can represent an essentially infinite number of different molecular features, and this paper gives the first published description of their implementation.
Abstract: Extended-connectivity fingerprints (ECFPs) are a novel class of topological fingerprints for molecular characterization. Historically, topological fingerprints were developed for substructure and similarity searching. ECFPs were developed specifically for structure−activity modeling. ECFPs are circular fingerprints with a number of useful qualities: they can be very rapidly calculated; they are not predefined and can represent an essentially infinite number of different molecular features (including stereochemical information); their features represent the presence of particular substructures, allowing easier interpretation of analysis results; and the ECFP algorithm can be tailored to generate different types of circular fingerprints, optimized for different uses. While the use of ECFPs has been widely adopted and validated, a description of their implementation has not previously been presented in the literature.

4,173 citations


"A Deep Learning Approach to Antibio..." cites this paper for background

  • ...An important development relates to how molecules are represented; traditionally, molecules were represented by their fingerprint vectors, which reflected the presence or absence of functional groups in the molecule, or by descriptors that include computable molecular properties and require expert knowledge to construct (Mauri et al., 2006; Moriwaki et al., 2018; Rogers and Hahn, 2010)....

    [...]

  • ...Excitingly, we observed that halicin resulted in C. difficile clearance at a greater rate than vehicle or the antibiotic metronidazole (Figure 5F), which is not only a first-line treatment for C. difficile infection, but also the antibiotic most similar to halicin based on Tanimoto score (Figure 2H; Table S2H)....

    [...]


  • ...Excitingly, halicin, which is structurally most similar to a family of nitro-containing antiparasitic compounds (Tanimoto similarity 0.37; Figures 2G and 2H; Table S2H) (Rogers and Hahn, 2010) and the antibiotic metronidazole (Tanimoto similarity 0.21), displayed excellent growth inhibitory activity against E. coli, achieving a minimum inhibitory concentration (MIC) of 2 µg/mL (Figure 2I)....

    [...]
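
The excerpts above report Tanimoto similarity scores between molecules. As a hedged illustration of what such a score measures, the sketch below computes it for two fingerprints represented as sets of "on" bit positions: the number of shared bits divided by the number of bits set in either fingerprint. The bit sets are made up for the example and do not correspond to halicin, metronidazole, or any real molecule; the function name `tanimoto` and the handling of two empty fingerprints are assumptions, and the paper's own scores come from its cheminformatics toolchain rather than from this code.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        # Both fingerprints empty: the ratio is undefined, so this sketch
        # returns 0.0 by assumption.
        return 0.0
    shared = len(a & b)
    # |A ∩ B| / |A ∪ B|, with the union size computed from the set sizes.
    return shared / (len(a) + len(b) - shared)


# Hypothetical fingerprints: positions of bits that are set.
fp_query = {1, 4, 7, 9, 12}
fp_reference = {1, 4, 8, 12, 15, 20}

# 3 shared bits out of 8 distinct bits overall.
print(tanimoto(fp_query, fp_reference))
```

A score of 1.0 means the two fingerprints set exactly the same bits and 0.0 means they share none, so low values such as those quoted in the excerpts indicate structures with few substructure features in common.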