Journal Article

In Silico Detection and Typing of Plasmids using PlasmidFinder and Plasmid Multilocus Sequence Typing

TL;DR: Two easy-to-use Web tools for in silico detection and characterization of whole-genome sequence (WGS) and whole-plasmid sequence data from members of the family Enterobacteriaceae are designed and developed.
Abstract: In the work presented here, we designed and developed two easy-to-use Web tools for in silico detection and characterization of whole-genome sequence (WGS) and whole-plasmid sequence data from members of the family Enterobacteriaceae. These tools will facilitate bacterial typing based on draft genomes of multidrug-resistant Enterobacteriaceae species by the rapid detection of known plasmid types. Replicon sequences from 559 fully sequenced plasmids associated with the family Enterobacteriaceae in the NCBI nucleotide database were collected to build a consensus database for integration into a Web tool called PlasmidFinder that can be used for replicon sequence analysis of raw, contig group, or completely assembled and closed plasmid sequencing data. The PlasmidFinder database currently consists of 116 replicon sequences that match, with at least 80% nucleotide identity, all replicon sequences identified in the 559 fully sequenced plasmids. For plasmid multilocus sequence typing (pMLST) analysis, a database that is updated weekly was generated from www.pubmlst.org and integrated into a Web tool called pMLST. Both databases were evaluated using draft genomes from a collection of Salmonella enterica serovar Typhimurium isolates. PlasmidFinder identified a total of 103 replicons and between zero and five different plasmid replicons within each of 49 S. Typhimurium draft genomes tested. The pMLST Web tool was able to subtype genomic sequencing data of plasmids, revealing both known plasmid sequence types (STs) and new alleles and ST variants. In conclusion, testing of the two Web tools using both fully assembled plasmid sequences and WGS-generated draft genomes showed them to be able to detect a broad variety of plasmids that are often associated with antimicrobial resistance in clinically relevant bacterial pathogens.
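The 80% nucleotide identity criterion mentioned in the abstract can be illustrated with a minimal sketch. This is not the PlasmidFinder implementation: it assumes the query and each reference replicon are already aligned and of equal length, and the function names and toy sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def replicon_hits(query: str, database: dict, threshold: float = 80.0) -> list:
    """Return (name, identity) pairs at or above the threshold, best hit first."""
    hits = []
    for name, reference in database.items():
        identity = percent_identity(query, reference)
        if identity >= threshold:
            hits.append((name, identity))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy database with made-up replicon names and sequences (assumptions).
toy_db = {
    "repA_toy": "ACGTACGTAC",
    "repB_toy": "TTTTTTTTTT",
}
hits = replicon_hits("ACGTACGTAA", toy_db)
print(hits)
```

A real search would align sequences of differing lengths first, for example with a BLAST-style local alignment, and would typically also consider how much of the reference is covered by the match.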

Citations
Journal Article
Ye Feng, Shengmei Zou, Hangfei Chen, Yunsong Yu, Zhi Ruan
TL;DR: The updated BacWGSTdb 2.0 bears great utility in continuing to provide users, including epidemiologists, clinicians and bench scientists, with a one-stop solution to bacterial genome sequence analysis.
Abstract: An increasing prevalence of hospital acquired infections and foodborne illnesses caused by pathogenic and multidrug-resistant bacteria has stimulated a pressing need for benchtop computational techniques to rapidly and accurately classify bacteria from genomic sequence data, and based on that, to trace the source of infection. BacWGSTdb (http://bacdb.org/BacWGSTdb) is a free publicly accessible database we have developed for bacterial whole-genome sequence typing and source tracking. This database incorporates extensive resources for bacterial genome sequencing data and the corresponding metadata, combined with specialized bioinformatics tools that enable the systematic characterization of the bacterial isolates recovered from infections. Here, we present BacWGSTdb 2.0, which encompasses several major updates, including (i) the integration of the core genome multi-locus sequence typing (cgMLST) approach, which is highly scalable and appropriate for typing isolates belonging to different lineages; (ii) the addition of a multiple genome analysis module that can process dozens of user uploaded sequences in a batch mode; (iii) a new source tracking module for comparing user uploaded plasmid sequences to those deposited in the public databases; (iv) the number of species encompassed in BacWGSTdb 2.0 has increased from 9 to 20, which represents bacterial pathogens of medical importance; (v) a newly designed, user-friendly interface and a set of visualization tools for providing a convenient platform for users are also included. Overall, the updated BacWGSTdb 2.0 bears great utility in continuing to provide users, including epidemiologists, clinicians and bench scientists, with a one-stop solution to bacterial genome sequence analysis.

115 citations

Journal Article
01 Nov 2018
TL;DR: Support-vector machine (SVM) models were selected as the best classifier for all three bacterial species and outperformed other existing plasmid prediction tools using a benchmarking set of isolates, demonstrating the scalability of the models and their applicability.
Abstract: Assembly of bacterial short-read whole-genome sequencing data frequently results in hundreds of contigs for which the origin, plasmid or chromosome, is unclear. Complete genomes resolved by long-read sequencing can be used to generate and label short-read contigs. These were used to train several popular machine learning methods to classify the origin of contigs from Enterococcus faecium, Klebsiella pneumoniae and Escherichia coli using pentamer frequencies. We selected support-vector machine (SVM) models as the best classifier for all three bacterial species (F1-score E. faecium=0.92, F1-score K. pneumoniae=0.90, F1-score E. coli=0.76), which outperformed other existing plasmid prediction tools using a benchmarking set of isolates. We demonstrated the scalability of our models by accurately predicting the plasmidome of a large collection of 1644 E. faecium isolates and illustrate its applicability by predicting the location of antibiotic-resistance genes in all three species. The SVM classifiers are publicly available as an R package and graphical-user interface called ‘mlplasmids’. We anticipate that this tool may significantly facilitate research on the dissemination of plasmids encoding antibiotic resistance and/or contributing to host adaptation.
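The pentamer-frequency features described in this abstract can be sketched as follows. This is a minimal illustration, not the mlplasmids code: the function name is hypothetical, windows containing characters other than A, C, G, or T are simply ignored, and details such as reverse-complement handling are not modeled.

```python
from collections import Counter
from itertools import product


def pentamer_frequencies(seq: str, k: int = 5) -> dict:
    """Relative frequency of every possible k-mer (default k=5) in a contig."""
    seq = seq.upper()
    # All 4**k possible k-mers over the DNA alphabet, in a fixed order.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only windows made purely of A/C/G/T contribute to the total.
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


freqs = pentamer_frequencies("AAAAAAC")
print(len(freqs), freqs["AAAAA"], freqs["AAAAC"])
```

The resulting fixed-length vector (1024 values for k=5) is the kind of input a classifier such as an SVM could be trained on to label contigs as plasmid- or chromosome-derived.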

113 citations

Journal Article
TL;DR: npScarf is presented, which can scaffold and complete short read assemblies while the long read sequencing run is in progress, and reports assembly metrics in real-time so the sequencing run can be terminated once an assembly of sufficient quality is obtained.
Abstract: Third generation sequencing technologies provide the opportunity to improve genome assemblies by generating long reads spanning most repeat sequences. However, current analysis methods require substantial amounts of sequence data and computational resources to overcome the high error rates. Furthermore, they can only perform analysis after sequencing has completed, resulting in either over-sequencing, or in a low quality assembly due to under-sequencing. Here we present npScarf, which can scaffold and complete short read assemblies while the long read sequencing run is in progress. It reports assembly metrics in real-time so the sequencing run can be terminated once an assembly of sufficient quality is obtained. In assembling four bacterial and one eukaryotic genomes, we show that npScarf can construct more complete and accurate assemblies while requiring less sequencing data and computational resources than existing methods. Our approach offers a time- and resource-effective strategy for completing short read assemblies.

112 citations


Cites background or methods from "In Silico Detection and Typing of P..."

  • ...For real-time detection of plasmid-encoded genes, we identified plasmid origin of replication sequences from the Illumina assembly using the PlasmidFinder database (Carattoli et al., 2014)....

    [...]

  • ...Plasmid origin of replication sequences in both K. pneumoniae assemblies were identified by uploading the assembly to the PlasmidFinder database (Carattoli et al., 2014)....

    [...]

  • ...We also ascertained the predicted plasmids in these assemblies by looking for the existence of plasmid origins of replication sequences from PlasmidFinder database (Carattoli et al., 2014)....

    [...]

Journal Article
TL;DR: IS1294-mediated transposition of the gene into an ISApl1 composite transposon may have represented the original import mechanism of mcr-1 into Enterobacteriaceae.
Abstract: ...consistent with it being nested within an ISApl1 composite transposon (figure). IS1294 uses a one-ended, rolling circle transposition mechanism capable of mobilising adjacent sequences. We postulate that mcr-1 was originally introduced into the ISApl1 element by IS1294, which was subsequently lost. We have identified mcr-1 in a human faecal E coli isolate in Cambodia taken 2 years before the human isolates were collected in the study by Liu and colleagues, and which is associated with different genetic structures than those reported by Liu and colleagues. IS1294-mediated transposition of the gene into an ISApl1 composite transposon may have represented the original import mechanism of mcr-1 into Enterobacteriaceae. Large, established databases of whole genome sequences represent a rich repository to investigate the historical presence of novel resistance mechanisms.

111 citations

Journal Article
TL;DR: The genomes of 10 Salmonella enterica serovar Infantis isolates obtained from chicken, cattle, and human sources collected between 2012 and 2015 in the United States through routine NARMS surveillance and product sampling programs revealed that all U.S. isolates were closely related, indicating a high likelihood that strains from humans, chickens, and cattle recently evolved from a common ancestor.
Abstract: We sequenced the genomes of 10 Salmonella enterica serovar Infantis isolates containing blaCTX-M-65 obtained from chicken, cattle, and human sources collected between 2012 and 2015 in the United States through routine National Antimicrobial Resistance Monitoring System (NARMS) surveillance and product sampling programs. We also completely assembled the plasmids from four of the isolates. All isolates had a D87Y mutation in the gyrA gene and harbored between 7 and 10 resistance genes [aph(4)-Ia, aac(3)-IVa, aph(3′)-Ic, blaCTX-M-65, fosA3, floR, dfrA14, sul1, tetA, aadA1] located in two distinct sites of a megaplasmid (∼316 to 323 kb) similar to that described in a blaCTX-M-65-positive S. Infantis isolate from a patient in Italy. High-quality single nucleotide polymorphism (hqSNP) analysis revealed that all U.S. isolates were closely related, separated by only 1 to 38 pairwise high-quality SNPs, indicating a high likelihood that strains from humans, chickens, and cattle recently evolved from a common ancestor. The U.S. isolates were genetically similar to the blaCTX-M-65-positive S. Infantis isolate from Italy, with a separation of 34 to 47 SNPs. This is the first report of the blaCTX-M-65 gene and the pESI (plasmid for emerging S. Infantis)-like megaplasmid from S. Infantis in the United States, and it illustrates the importance of applying a global One Health human and animal perspective to combat antimicrobial resistance.

107 citations


Cites methods from "In Silico Detection and Typing of P..."

  • ...3 (16 March 2016), a tool for in silico detection and typing of plasmids (23)....

    [...]

References
Journal Article
TL;DR: A web server providing a convenient way of identifying acquired antimicrobial resistance genes in completely sequenced isolates was created, and the method was evaluated on WGS chromosomes and plasmids of 30 isolates.
Abstract: Objectives: Identification of antimicrobial resistance genes is important for understanding the underlying mechanisms and the epidemiology of antimicrobial resistance. As the costs of whole-genome sequencing (WGS) continue to decline, it becomes increasingly available in routine diagnostic laboratories and is anticipated to substitute traditional methods for resistance gene identification. Thus, the current challenge is to extract the relevant information from the large amount of generated data.

3,956 citations


"In Silico Detection and Typing of P..." refers methods in this paper

  • ...To extract the relevant information from the large amount of data generated, a Web-based tool, ResFinder, for the identification of acquired or intrinsically present antimicrobial resistance genes in whole-genome data was recently developed (15)....

    [...]

Journal Article
TL;DR: NCBI’s Conserved Domain Database (CDD) is a resource for the annotation of protein sequences with the location of conserved domain footprints, and functional sites inferred from these footprints.
Abstract: NCBI's Conserved Domain Database (CDD) is a resource for the annotation of protein sequences with the location of conserved domain footprints, and functional sites inferred from these footprints. CDD includes manually curated domain models that make use of protein 3D structure to refine domain models and provide insights into sequence/structure/function relationships. Manually curated models are organized hierarchically if they describe domain families that are clearly related by common descent. As CDD also imports domain family models from a variety of external sources, it is a partially redundant collection. To simplify protein annotation, redundant models and models describing homologous families are clustered into superfamilies. By default, domain footprints are annotated with the corresponding superfamily designation, on top of which specific annotation may indicate high-confidence assignment of family membership. Pre-computed domain annotation is available for proteins in the Entrez/Protein dataset, and a novel interface, Batch CD-Search, allows the computation and download of annotation for large sets of protein queries. CDD can be accessed via http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml.

2,934 citations


"In Silico Detection and Typing of P..." refers background in this paper

  • ...In particular, the replicase proteins showing the pfam02387 or pfam01051 conserved domains were assigned to the FII and FIB groups, respectively (31)....

    [...]

Journal Article
TL;DR: Results indicated that the inc/rep PCR method demonstrates high specificity and sensitivity in detecting replicons on reference plasmids and also revealed the presence of recurrent and common plasmids in epidemiologically unrelated Salmonella isolates of different serotypes.

2,163 citations


"In Silico Detection and Typing of P..." refers methods in this paper

  • ...A collection of 24 previously characterized and fully... [FIG 1: Numbers of fully sequenced plasmids (y axis) classified into incompatibility groups occurring in the different bacterial species of the Enterobacteriaceae family.]...

    [...]

  • ...Since 2005, a PCR-based replicon typing (PBRT) scheme has been available that targets in multiplex PCRs the replicons of the major plasmid families occurring in members of the family Enterobacteriaceae (2)....

    [...]

  • ...Here, we present two free, easy-to-use Web tools, PlasmidFinder and pMLST, to analyze and classify plasmids from bacterial species of the family Enterobacteriaceae....

    [...]

  • ...Here, we describe the design of two new easy-to-use Web tools useful for the rapid identification of plasmids in Enterobacteriaceae species that are of interest for epidemiological and clinical microbiology investigations of the plasmid-associated spread of antimicrobial resistance....

    [...]

  • ...This method was initially developed to detect the replicons of plasmids belonging to the 18 major incompatibility (Inc) groups of Enterobacteriaceae species (3)....

    [...]

Journal Article
TL;DR: The Bacterial Isolate Genome Sequence Database (BIGSDB) represents a freely available resource that will assist the broader community in the elucidation of the structure and function of bacteria by means of a population genomics approach.
Abstract: The opportunities for bacterial population genomics that are being realised by the application of parallel nucleotide sequencing require novel bioinformatics platforms. These must be capable of the storage, retrieval, and analysis of linked phenotypic and genotypic information in an accessible, scalable and computationally efficient manner. The Bacterial Isolate Genome Sequence Database (BIGSDB) is a scalable, open source, web-accessible database system that meets these needs, enabling phenotype and sequence data, which can range from a single sequence read to whole genome data, to be efficiently linked for a limitless number of bacterial specimens. The system builds on the widely used mlstdbNet software, developed for the storage and distribution of multilocus sequence typing (MLST) data, and incorporates the capacity to define and identify any number of loci and genetic variants at those loci within the stored nucleotide sequences. These loci can be further organised into 'schemes' for isolate characterisation or for evolutionary or functional analyses. Isolates and loci can be indexed by multiple names and any number of alternative schemes can be accommodated, enabling cross-referencing of different studies and approaches. LIMS functionality of the software enables linkage to and organisation of laboratory samples. The data are easily linked to external databases and fine-grained authentication of access permits multiple users to participate in community annotation by setting up or contributing to different schemes within the database. Some of the applications of BIGSDB are illustrated with the genera Neisseria and Streptococcus. The BIGSDB source code and documentation are available at http://pubmlst.org/software/database/bigsdb/. Genomic data can be used to characterise bacterial isolates in many different ways but it can also be efficiently exploited for evolutionary or functional studies. BIGSDB represents a freely available resource that will assist the broader community in the elucidation of the structure and function of bacteria by means of a population genomics approach.

1,943 citations

Journal Article
TL;DR: A Web-based method for MLST of 66 bacterial species based on whole-genome sequencing data that enables investigators to determine the sequence types of their isolates on the basis of WGS data.
Abstract: Accurate strain identification is essential for anyone working with bacteria. For many species, multilocus sequence typing (MLST) is considered the “gold standard” of typing, but it is traditionally performed in an expensive and time-consuming manner. As the costs of whole-genome sequencing (WGS) continue to decline, it becomes increasingly available to scientists and routine diagnostic laboratories. Currently, the cost is below that of traditional MLST. The new challenges will be how to extract the relevant information from the large amount of data so as to allow for comparison over time and between laboratories. Ideally, this information should also allow for comparison to historical data. We developed a Web-based method for MLST of 66 bacterial species based on WGS data. As input, the method uses short sequence reads from four sequencing platforms or preassembled genomes. Updates from the MLST databases are downloaded monthly, and the best-matching MLST alleles of the specified MLST scheme are found using a BLAST-based ranking method. The sequence type is then determined by the combination of alleles identified. The method was tested on preassembled genomes from 336 isolates covering 56 MLST schemes, on short sequence reads from 387 isolates covering 10 schemes, and on a small test set of short sequence reads from 29 isolates for which the sequence type had been determined by traditional methods. The method presented here enables investigators to determine the sequence types of their isolates on the basis of WGS data. This method is publicly available at www.cbs.dtu.dk/services/MLST.
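The final step described in this abstract, deriving a sequence type from the combination of alleles identified, amounts to a profile lookup. The sketch below assumes the best-matching allele number for each locus has already been found (for example by a BLAST-based ranking); the locus names, profile table, and function name are hypothetical.

```python
def sequence_type(alleles: dict, loci: tuple, profiles: dict) -> str:
    """Look up the ST for an allele combination; report 'unknown' for novel profiles."""
    # Build the profile key in the scheme's fixed locus order.
    key = tuple(alleles[locus] for locus in loci)
    return profiles.get(key, "unknown")


# Toy three-locus scheme with made-up allele profiles (assumptions).
toy_loci = ("locusA", "locusB", "locusC")
toy_profiles = {
    (1, 1, 1): "ST1",
    (1, 2, 1): "ST2",
}

known = sequence_type({"locusA": 1, "locusB": 2, "locusC": 1}, toy_loci, toy_profiles)
novel = sequence_type({"locusA": 3, "locusB": 2, "locusC": 1}, toy_loci, toy_profiles)
print(known, novel)
```

An allele combination absent from the profile table is reported as "unknown" here, which corresponds to the situation where a new allele or ST variant would need to be reviewed and registered.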

1,620 citations


"In Silico Detection and Typing of P..." refers methods in this paper

  • ...If raw sequence reads are uploaded, they are first assembled (after the sequencing platform is given by the user) as described previously (16)....

    [...]