Journal ArticleDOI

CARD 2020: antibiotic resistome surveillance with the comprehensive antibiotic resistance database

TL;DR: A new Resistomes & Variants module provides analysis and statistical summary of in silico predicted resistance variants from 82 pathogens and over 100 000 genomes, making it possible to summarize predicted resistance using the information included in CARD, identify trends in AMR mobility and determine previously undescribed and novel resistance variants.
Abstract: The Comprehensive Antibiotic Resistance Database (CARD; https://card.mcmaster.ca) is a curated resource providing reference DNA and protein sequences, detection models and bioinformatics tools on the molecular basis of bacterial antimicrobial resistance (AMR). CARD focuses on providing high-quality reference data and molecular sequences within a controlled vocabulary, the Antibiotic Resistance Ontology (ARO), designed by the CARD biocuration team to integrate with software development efforts for resistome analysis and prediction, such as CARD's Resistance Gene Identifier (RGI) software. Since 2017, CARD has expanded through extensive curation of reference sequences, revision of the ontological structure, curation of over 500 new AMR detection models, development of a new classification paradigm and expansion of analytical tools. Most notably, a new Resistomes & Variants module provides analysis and statistical summary of in silico predicted resistance variants from 82 pathogens and over 100 000 genomes. By adding these resistance variants to CARD, we are able to summarize predicted resistance using the information included in CARD, identify trends in AMR mobility and determine previously undescribed and novel resistance variants. Here, we describe updates and recent expansions to CARD and its biocuration process, including new resources for community biocuration of AMR molecular reference data.


Citations
Journal ArticleDOI
TL;DR: In this article, the authors showed that early-life exposure to environmental concentrations of antibiotics can increase the abundance of unfavorable bacteria, antibiotic resistance genes and associated pathways in the gut microbiome of zebrafish.

34 citations

Journal ArticleDOI
01 Aug 2021
TL;DR: In this paper, the authors compared the performance of ONT ligation and rapid read sets for bacterial whole-genome sequencing, with a specific aim of assessing their ability to recover small plasmid sequences.
Abstract: Oxford Nanopore Technologies (ONT) sequencing platforms currently offer two approaches to whole-genome native-DNA library preparation: ligation and rapid. In this study, we compared these two approaches for bacterial whole-genome sequencing, with a specific aim of assessing their ability to recover small plasmid sequences. To do so, we sequenced DNA from seven plasmid-rich bacterial isolates in three different ways: ONT ligation, ONT rapid and Illumina. Using the Illumina read depths to approximate true plasmid abundance, we found that small plasmids (<20 kbp) were underrepresented in ONT ligation read sets (by a mean factor of ~4) but were not underrepresented in ONT rapid read sets. This effect correlated with plasmid size, with the smallest plasmids being the most underrepresented in ONT ligation read sets. We also found lower rates of chimaeric reads in the rapid read sets relative to ligation read sets. These results show that when small plasmid recovery is important, ONT rapid library preparations are preferable to ligation-based protocols.
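The underrepresentation factor mentioned in this abstract can be illustrated with a small calculation. The sketch below assumes that plasmid abundance is expressed as read depth relative to chromosomal depth within each read set, and that the Illumina ratio stands in for the true value; the abstract does not state the exact normalisation, and the function names and depth values are hypothetical.

```python
def relative_depth(plasmid_depth: float, chromosome_depth: float) -> float:
    """Plasmid read depth expressed relative to the chromosome in the same read set."""
    return plasmid_depth / chromosome_depth


def underrepresentation_factor(
    illumina_plasmid: float,
    illumina_chromosome: float,
    ont_plasmid: float,
    ont_chromosome: float,
) -> float:
    """Ratio of Illumina-derived relative depth to ONT-derived relative depth.

    A value above 1 means the plasmid is underrepresented in the ONT read set
    (assumption: Illumina depth approximates true abundance).
    """
    expected = relative_depth(illumina_plasmid, illumina_chromosome)
    observed = relative_depth(ont_plasmid, ont_chromosome)
    return expected / observed


# Hypothetical depths for one small plasmid, not taken from the study.
factor = underrepresentation_factor(
    illumina_plasmid=800.0,
    illumina_chromosome=100.0,
    ont_plasmid=120.0,
    ont_chromosome=60.0,
)
print(f"underrepresented by a factor of {factor:.1f}")
```

With these made-up depths the plasmid appears at 8 copies per chromosome in the Illumina data and 2 in the ONT data, so it is underrepresented fourfold.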

33 citations

Journal ArticleDOI
TL;DR: The DRAGdb database is a compilation of all the major MTB drug resistance genes across bacterial species, which allows identification of homoplasy and pleiotropy phenomena of DRAGs.
Abstract: Tuberculosis treatment includes broad-spectrum antibiotics such as rifampicin, streptomycin and fluoroquinolones, which are also used against other pathogenic bacteria. We developed Drug Resistance Associated Genes database (DRAGdb), a manually curated repository of mutational data of drug resistance associated genes (DRAGs) across ESKAPE (i.e. Enterococcus faecium, Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa, and Enterobacter spp.) pathogens, and other bacteria with a special focus on Mycobacterium tuberculosis (MTB). Analysis of mutations in drug-resistant genes listed in DRAGdb suggested both homoplasy and pleiotropy to be associated with resistance. Homoplasy was observed in six genes namely gidB, gyrA, gyrB, rpoB, rpsL and rrs. For these genes, drug resistance-associated mutations at codon level were conserved in MTB, ESKAPE and many other bacteria. Pleiotropy was exemplified by a single nucleotide mutation that was associated with resistance to amikacin, gentamycin, rifampicin and vancomycin in Staphylococcus aureus. DRAGdb data also revealed that mutations in some genes such as pncA, inhA, katG and embA,B,C were specific to Mycobacterium species. For inhA and pncA, the mutations in the promoter region along with those in coding regions were associated with resistance to isoniazid and pyrazinamide respectively. In summary, the DRAGdb database is a compilation of all the major MTB drug resistance genes across bacterial species, which allows identification of homoplasy and pleiotropy phenomena of DRAGs.

32 citations

Journal ArticleDOI
TL;DR: COPLA is a bioinformatic tool for universal, species-independent plasmid classification, which can assign plasmids to known and novel taxonomic units based on their genomic sequence.
Abstract: Plasmids are mobile genetic elements, key in the dissemination of antibiotic resistance, virulence determinants and other adaptive traits in bacteria. Obtaining a robust method for plasmid classification is necessary to better understand the genetics and epidemiology of many pathogens. Until now, plasmid classification systems focused on specific traits, which limited their precision and universality. The definition of plasmid taxonomic units (PTUs), based on average nucleotide identity metrics, allows the generation of a universal plasmid classification scheme, applicable to all bacterial taxa. Here we present COPLA, a software able to assign plasmids to known and novel PTUs, based on their genomic sequence. We implemented an automated pipeline able to assign a given plasmid DNA sequence to its cognate PTU, and assessed its performance using a sample of 1000 unclassified plasmids. Overall, 41% of the samples could be assigned to a previously defined PTU, a number that reached 63% in well-known taxa such as the Enterobacterales order. The remaining plasmids represent novel PTUs, indicating that a large fraction of plasmid backbones is still uncharacterized. COPLA is a bioinformatic tool for universal, species-independent, plasmid classification. Offered both as an automatable pipeline and an open web service, COPLA will help bacterial geneticists and clinical microbiologists to quickly classify plasmids.

32 citations

Posted ContentDOI
02 Oct 2020-bioRxiv
TL;DR: This work presents MetaGraph, a versatile framework for the scalable analysis of extensive sequence repositories, and introduces the concept of differential assembly, which allows for the extraction of sequences present in a foreground set of samples but absent in a given background set.
Abstract: The amount of biological sequencing data available in public repositories is growing exponentially, forming an invaluable biomedical research resource. Yet, making all this sequencing data searchable and easily accessible to life science and data science researchers is an unsolved problem. We present MetaGraph, a versatile framework for the scalable analysis of extensive sequence repositories. MetaGraph efficiently indexes vast collections of sequences to enable fast search and comprehensive analysis. A wide range of underlying data structures offer different practically relevant trade-offs between the space taken by an index and its query performance. Achieving compression ratios of up to 1,000-fold over the already compressed raw input data, MetaGraph indexes can represent the content of large sequencing archives in the working memory of a single compute server. We demonstrate our framework’s scalability by indexing over 1.4 million whole genome sequencing (WGS) records from NCBI’s Sequence Read Archive, representing a total input of more than three petabases. MetaGraph provides a flexible methodological framework allowing for index construction to be scaled from consumer laptops to distribution onto a cloud compute cluster for processing terabases to petabases of input data. Notably, processing of data sets ranging from 1 TB of raw WGS reads to 20 TB of human RNA-sequencing data results in indexes whose memory footprints are small enough to host on standard desktop workstations. Besides demonstrating the utility of MetaGraph indexes on key applications, such as experiment discovery, sequence alignment, error correction, and differential assembly, we make a wide range of indexes available as a community resource, including indexes of over 450,000 microbial WGS records, more than 110,000 fungi WGS records, and more than 40,000 whole metagenome sequencing records. A subset of these indexes is made available online for interactive queries.
All indexes will be available for download and in the cloud. In total, indexes comprising more than 1 million sequencing records are available for download. As an example of our indexes’ integrative analysis capabilities, we introduce the concept of differential assembly, which allows for the extraction of sequences present in a foreground set of samples but absent in a given background set. We apply this technique to differentially assemble contigs to identify pathogenic agents transfected via human kidney transplants. In a second example, we indexed more than 20,000 human RNA-Seq records from the TCGA and GTEx cohorts and use them to extract transcriptome features that are hard to characterize using a classical linear reference. We discovered over 200 trans-splicing events in GTEx and found broad evidence for tissue-specific non-A-to-I RNA-editing in GTEx and TCGA.
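As a rough illustration of the differential assembly idea described in this abstract, the sketch below collects k-mers that occur in every foreground sample and discards any that occur in a background sample. It is a toy, set-based version and not MetaGraph's implementation or API: the function names, the choice of k, the requirement that a k-mer appear in all foreground samples, and the example reads are all assumptions made for the example.

```python
from typing import Iterable, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sample_kmers(reads: Iterable[str], k: int) -> Set[str]:
    """Union of k-mers over all reads belonging to one sample."""
    collected: Set[str] = set()
    for read in reads:
        collected |= kmers(read, k)
    return collected


def differential_kmers(
    foreground: List[List[str]], background: List[List[str]], k: int = 5
) -> Set[str]:
    """k-mers shared by all foreground samples and absent from every background sample."""
    fg_sets = [sample_kmers(sample, k) for sample in foreground]
    shared = set.intersection(*fg_sets) if fg_sets else set()
    bg_union: Set[str] = set()
    for sample in background:
        bg_union |= sample_kmers(sample, k)
    return shared - bg_union


# Hypothetical toy samples; each sample is a list of reads.
foreground = [["ACGTACGGA"], ["TTACGGAT"]]
background = [["ACGTACGT"]]
diff = differential_kmers(foreground, background, k=5)
print(sorted(diff))
```

A real workflow would go on to assemble the surviving k-mers into contigs; the sketch stops at the set difference, which is the part the abstract defines.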

32 citations


Cites background or methods from "CARD 2020: antibiotic resistome sur..."

  • ...# of AMR per sample In [1]: from metagraph....

    [...]

  • ...d) Same as in b, querying AMR gene DNA sequences from the CARD database [1]....

    [...]

  • ...In this analysis, the full CARD AMR database [1] is queried against a MetaGraph index containing more than 4,400 whole metagenome sequencing samples from the MetaSUB cohort....

    [...]

References
Journal ArticleDOI
TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations


"CARD 2020: antibiotic resistome sur..." refers background in this paper

  • ...The latter is described by CARD’s Model Ontology (MO, Supplementary Figure S1), which includes reference nucleotide and protein sequences, as well as additional search parameters including mutations conferring AMR (if applicable) and curated BLAST(P/N) (34,35) bit score cut-offs....

    [...]
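The snippet above describes a detection model as a bundle of reference sequences, optional resistance mutations and a curated BLAST bit score cut-off. A minimal sketch of such a record and of a cut-off check follows. The class, its field names, the example values and the "greater than or equal" comparison are assumptions for illustration only and do not reflect CARD's actual schema or RGI's code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DetectionModel:
    """Hypothetical stand-in for one curated AMR detection model."""

    name: str
    reference_protein: str
    bitscore_cutoff: float
    # Empty for models that do not rely on specific resistance mutations.
    resistance_mutations: List[str] = field(default_factory=list)


def passes_cutoff(model: DetectionModel, hit_bitscore: float) -> bool:
    """True if a BLAST hit reaches the model's curated bit score cut-off."""
    return hit_bitscore >= model.bitscore_cutoff


# Made-up model and bit scores, not values from CARD.
example_model = DetectionModel(
    name="example_homolog_model",
    reference_protein="MKKLLAVS",
    bitscore_cutoff=500.0,
)
strong_hit = passes_cutoff(example_model, 612.3)
weak_hit = passes_cutoff(example_model, 180.0)
print(strong_hit, weak_hit)
```

Keeping the cut-off on the model rather than in the search code means each gene family can carry its own curated threshold.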

Journal ArticleDOI
TL;DR: The Burrows-Wheeler Alignment tool (BWA) is a new read alignment package based on backward search with the Burrows–Wheeler Transform (BWT) that efficiently aligns short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps.
Abstract: Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies call for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals. Results: We implemented Burrows-Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is ~10–20× faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignment in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package. Availability: http://maq.sourceforge.net

43,862 citations

Journal ArticleDOI
TL;DR: Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.
Abstract: As the rate of sequencing increases, greater throughput is demanded from read aligners. The full-text minute index is often used to make alignment very fast and memory-efficient, but the approach is ill-suited to finding longer, gapped alignments. Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.

37,898 citations


"CARD 2020: antibiotic resistome sur..." refers methods in this paper

  • ...Metagenomics analysis (i.e. RGI bwt) uses Bowtie2 (40) or BWA (41) mapping of sequencing reads to CARD’s PHM reference sequences only, while annotation of genomes or assembly contigs predicts resistome using four of CARD’s AMR detection models: PHM, PVM, RVM and POM (note: RGI currently only scans for nonsynonymous substitutions; not frameshifts, deletions or insertions)....


    [...]
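The snippet above notes that RGI scans only for nonsynonymous substitutions, not frameshifts, deletions or insertions. The sketch below shows what a substitution-only check can look like for a pair of ungapped, equal-length protein sequences. It is not RGI's code: the label format (for example `L4F`), the function names and the toy sequences are assumptions for illustration.

```python
import re
from typing import Tuple


def parse_substitution(label: str) -> Tuple[str, int, str]:
    """Split a label such as 'L4F' into (reference residue, 1-based position, variant residue)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a simple substitution: {label!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def has_substitution(reference: str, query: str, label: str) -> bool:
    """True if the ungapped, equal-length query carries the labelled substitution."""
    ref, pos, alt = parse_substitution(label)
    if len(reference) != len(query):
        # Length changes imply indels, which this sketch deliberately ignores.
        raise ValueError("this sketch handles substitutions only, not indels")
    if pos < 1 or pos > len(reference) or reference[pos - 1] != ref:
        raise ValueError("label does not match the reference sequence")
    return query[pos - 1] == alt


# Toy sequences, not real resistance determinants.
reference_protein = "MSDLA"
mutant_protein = "MSDFA"
found = has_substitution(reference_protein, mutant_protein, "L4F")
absent = has_substitution(reference_protein, reference_protein, "L4F")
print(found, absent)
```

Rejecting unequal lengths makes the limitation explicit: a query with an insertion or deletion raises an error here instead of being silently misread.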

Journal ArticleDOI
TL;DR: This paper describes the goals of the PDB, the systems in place for data deposition and access, how to obtain further information and plans for the future development of the resource.
Abstract: The Protein Data Bank (PDB; http://www.rcsb.org/pdb/ ) is the single worldwide archive of structural data of biological macromolecules. This paper describes the goals of the PDB, the systems in place for data deposition and access, how to obtain further information, and near-term plans for the future development of the resource.

34,239 citations


"CARD 2020: antibiotic resistome sur..." refers methods in this paper

  • ...In 2017, we described the CARD*Shark text-mining algorithm (26) for computer-assisted literature triage, which we have expanded based on the new ARO Drug Class classification tags....

    [...]

Journal ArticleDOI
TL;DR: The new BLAST command-line applications, compared to the current BLAST tools, demonstrate substantial speed improvements for long queries as well as chromosome length database sequences.
Abstract: Sequence similarity searching is a very important bioinformatics task. While Basic Local Alignment Search Tool (BLAST) outperforms exact methods through its use of heuristics, the speed of the current BLAST software is suboptimal for very long queries or database sequences. There are also some shortcomings in the user-interface of the current command-line applications. We describe features and improvements of rewritten BLAST software and introduce new command-line applications. Long query sequences are broken into chunks for processing, in some cases leading to dramatically shorter run times. For long database sequences, it is possible to retrieve only the relevant parts of the sequence, reducing CPU time and memory usage for searches of short queries against databases of contigs or chromosomes. The program can now retrieve masking information for database sequences from the BLAST databases. A new modular software library can now access subject sequence data from arbitrary data sources. We introduce several new features, including strategy files that allow a user to save and reuse their favorite set of options. The strategy files can be uploaded to and downloaded from the NCBI BLAST web site. The new BLAST command-line applications, compared to the current BLAST tools, demonstrate substantial speed improvements for long queries as well as chromosome length database sequences. We have also improved the user interface of the command-line applications.

13,223 citations


"CARD 2020: antibiotic resistome sur..." refers background or methods in this paper

  • ...The website also includes a built-in BLAST instance for comparing sequences to CARD reference sequences and a web instance of RGI for resistome prediction with data visualization tools (https://card.mcmaster.ca/analyze)....

    [...]

  • ...The RVM is functionally similar to the PVM, except it works for rRNA mutations and therefore uses a nucleotide reference sequence and a BLASTN bit score cut-off....

    [...]

  • ...Briefly, RGI algorithmically predicts AMR genes and mutations from submitted genomes using a combination of open reading frame prediction with Prodigal (38), sequence alignment with BLAST (35) or DIAMOND (39), and curated resistance mutations included with the AMR detection model....

    [...]

  • ...In the same time period, the CARD website hosted ∼45 000 BLAST analyses, ∼220 000 RGI analyses, ∼64 000 data file downloads, and ∼10 000 RGI software downloads....

    [...]

  • ...We had determined that the asymptotic nature of the BLAST expectation value (E) gave it very low discriminatory power between different β-lactamase gene families (nearly 1/3 of CARD’s content), but that the linear nature of the BLAST bit score (S′) allowed this level of discrimination....

    [...]
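The contrast drawn in the last snippet, between an asymptotic expectation value and a linear bit score, can be made concrete with the standard Karlin-Altschul relation E = m · n · 2^(−S′), where m is the query length and n the database length. The sketch below evaluates it for a few strong hits. The query length, database size and bit scores are hypothetical, and the code only illustrates the formula; it does not reproduce how CARD chose its cut-offs.

```python
import math


def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Karlin-Altschul expectation value: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


def log10_expect(bit_score: float, query_len: int, db_len: int) -> float:
    """log10 of the same expectation value, computed without leaving log space."""
    return math.log10(query_len * db_len) - bit_score * math.log10(2.0)


# Hypothetical search space: a 300-residue query against 1,000,000 database residues.
QUERY_LEN, DB_LEN = 300, 1_000_000
scores = [400.0, 500.0, 600.0]
evalues = [expect_value(s, QUERY_LEN, DB_LEN) for s in scores]

for score, e in zip(scores, evalues):
    print(f"bit score {score:6.1f} -> E = {e:.3e}")
```

All three E values are vanishingly close to zero and therefore nearly indistinguishable in absolute terms, while the bit scores stay 100 units apart. That is the sense in which a bit score cut-off can separate closely related gene families where an E-value threshold cannot.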