SRST2: Rapid genomic surveillance for public health and hospital microbiology labs

doi:10.1186/S13073-014-0090-6

Journal Article

Michael Inouye¹, Harriet Dashnow¹,², Lesley Raven¹, Mark B. Schultz¹, Bernard J. Pope¹,², Takehiro Tomita¹, Justin Zobel¹, Kathryn E. Holt¹

University of Melbourne¹, Victorian Life Sciences Computation Initiative²

20 Nov 2014, Genome Medicine (BioMed Central), Vol. 6, Iss. 11, p. 90

TL;DR: This work presents SRST2, a read mapping-based tool for fast and accurate detection of genes, alleles and multi-locus sequence types (MLST) from WGS data, which is highly accurate and outperforms assembly-based methods in terms of both gene detection and allele assignment.

Abstract: Rapid molecular typing of bacterial pathogens is critical for public health epidemiology, surveillance and infection control, yet routine use of whole genome sequencing (WGS) for these purposes poses significant challenges. Here we present SRST2, a read mapping-based tool for fast and accurate detection of genes, alleles and multi-locus sequence types (MLST) from WGS data. Using >900 genomes from common pathogens, we show SRST2 is highly accurate and outperforms assembly-based methods in terms of both gene detection and allele assignment. We include validation of SRST2 within a public health laboratory, and demonstrate its use for microbial genome surveillance in the hospital setting. In the face of rising threats of antimicrobial resistance and emerging virulence among bacterial pathogens, SRST2 represents a powerful tool for rapidly extracting clinically useful information from raw WGS data. Source code is available from http://katholt.github.io/srst2/.
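
The abstract lists multi-locus sequence types (MLST) among the outputs. Once an allele number has been called at every locus of a scheme, assigning the sequence type (ST) amounts to looking up that allele combination in the scheme's profile table. The Python sketch that follows shows only that final lookup step, under simplifying assumptions: a toy three-locus scheme (real schemes typically use seven housekeeping loci), hypothetical locus names and profiles, and a hypothetical `assign_st` helper. It is not SRST2's code.

```python
# Toy MLST scheme (hypothetical): fixed locus order and allele profile -> ST.
LOCI = ("locusA", "locusB", "locusC")
PROFILES = {
    (1, 1, 1): 1,
    (1, 2, 1): 2,
    (3, 2, 4): 7,
}


def assign_st(allele_calls, loci=LOCI, profiles=PROFILES):
    """Look up the sequence type for a set of per-locus allele calls.

    allele_calls maps locus name -> allele number, or None when no
    confident call was made at that locus.
    Returns the ST, "novel" for a complete profile absent from the table,
    or None when any locus lacks a call.
    """
    # An ST is only defined when every locus in the scheme has an allele.
    if any(allele_calls.get(locus) is None for locus in loci):
        return None
    profile = tuple(allele_calls[locus] for locus in loci)
    return profiles.get(profile, "novel")


print(assign_st({"locusA": 1, "locusB": 2, "locusC": 1}))  # prints 2
```

Keeping an incomplete profile (`None`) separate from a complete but unlisted one (`"novel"`) is a deliberate choice in this sketch: the first points to a locus that was missed, the second to a possible new ST.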

Citations

SPAdes, a new genome assembly algorithm and its applications to single-cell sequencing (7th Annual SFAF Meeting, 2012)

Glenn Tesler

01 Jun 2012

TL;DR: SPAdes is a new assembler for both single-cell and standard (multicell) assembly that improves on the recently released E+V-SC assembler and on the popular assemblers Velvet and SoapDeNovo (for multicell data).

Abstract: The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V-SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online (http://bioinf.spbau.ru/spades). It is distributed as open source software.

10,124 citations

Journal Article

Open-access bacterial population genomics: BIGSdb software, the PubMLST.org website and their applications.

Keith A. Jolley¹, James E. Bray¹, Martin C. J. Maiden¹

University of Oxford¹

24 Sep 2018

TL;DR: Developments in the BIGSdb software made from publication to June 2018 are described and it is shown how the platform realises microbial population genomics for a wide range of applications.

Abstract: The PubMLST.org website hosts a collection of open-access, curated databases that integrate population sequence data with provenance and phenotype information for over 100 different microbial species and genera. Although the PubMLST website was conceived as part of the development of the first multi-locus sequence typing (MLST) scheme in 1998, the software it uses, the Bacterial Isolate Genome Sequence database (BIGSdb, published in 2010), enables PubMLST to include all levels of sequence data, from single gene sequences up to and including complete, finished genomes. Here we describe developments in the BIGSdb software made from publication to June 2018 and show how the platform realises microbial population genomics for a wide range of applications. The system is based on the gene-by-gene analysis of microbial genomes, with each deposited sequence annotated and curated to identify the genes present and systematically catalogue their variation. Originally intended as a means of characterising isolates with typing schemes, the synthesis of sequences and records of genetic variation with provenance and phenotype data permits highly scalable (whole genome sequence data for tens of thousands of isolates) means of addressing a wide range of functional questions, including: the prediction of antimicrobial resistance; likely cross-reactivity with vaccine antigens; and the functional activities of different variants that lead to key phenotypes. There are no limitations to the number of sequences, genetic loci, allelic variants or schemes (combinations of loci) that can be included, enabling each database to represent an expanding catalogue of the genetic variation of the population in question.
In addition to providing web-accessible analyses and links to third-party analysis and visualisation tools, the BIGSdb software includes a RESTful application programming interface (API) that enables access to all the underlying data for third-party applications and data analysis pipelines.
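
The abstract describes a gene-by-gene approach in which each deposited sequence is annotated and its variation systematically catalogued. One way to picture the core bookkeeping: identical sequences at a locus share an allele number, and each new variant receives the next number. This Python sketch assumes exact sequence matching and sequential integer numbering; the `AlleleCatalogue` class and its methods are hypothetical and are not BIGSdb's API.

```python
class AlleleCatalogue:
    """Hypothetical per-locus allele catalogue: each distinct sequence seen
    at a locus receives the next integer allele identifier."""

    def __init__(self):
        # locus name -> {sequence: allele id}
        self._alleles = {}

    def designate(self, locus, sequence):
        """Return the allele id for sequence at locus, assigning a new id
        if this exact sequence has not been catalogued before."""
        seq = sequence.upper()  # treat case differences as the same allele
        known = self._alleles.setdefault(locus, {})
        if seq not in known:
            known[seq] = len(known) + 1
        return known[seq]

    def variants(self, locus):
        """Number of distinct alleles catalogued so far at locus."""
        return len(self._alleles.get(locus, {}))


catalogue = AlleleCatalogue()
catalogue.designate("locusA", "ACGT")   # 1 (new)
catalogue.designate("locusA", "ACGA")   # 2 (new variant)
catalogue.designate("locusA", "acgt")   # 1 (already catalogued)
```

Real curation also involves checks such as start/stop codons and sequence quality before an allele is accepted; the sketch leaves those out.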

1,349 citations

Journal Article

Genomic analysis of diversity, population structure, virulence, and antimicrobial resistance in Klebsiella pneumoniae, an urgent threat to public health

Kathryn E. Holt¹, Heiman F. L. Wertheim², Ruth N. Zadoks³, Stephen Baker², Chris A. Whitehouse⁴, David A. B. Dance², Adam Jenney¹, Thomas R. Connor⁵, Li Yang Hsu⁶, Juliëtte A. Severin⁷, Sylvain Brisse⁸, Hanwei Cao¹, Jonathan J. Wilksch¹, Claire L. Gorrie¹, Mark B. Schultz¹, David J. Edwards¹, Kinh Van Nguyen, Trung Vu Nguyen, Trinh Tuyet Dao, Martijn Mensink³, Vien Le Minh⁹, Nguyen Thi Khanh Nhu¹⁰, Constance Schultsz¹¹, Kuntaman Kuntaman¹², Paul N. Newton²,¹³, Catrin E. Moore²,¹³, Richard A. Strugnell¹, Nicholas R. Thomson¹⁴,¹⁵

University of Melbourne¹, University of Oxford², Cornell University³, United States Army Medical Research Institute of Infectious Diseases⁴, Cardiff University⁵, University Health System⁶, Erasmus University Medical Center⁷, Pasteur Institute⁸, University of California, San Francisco⁹, University of Queensland¹⁰, University of Amsterdam¹¹, Airlangga University¹², Mahosot Hospital¹³, Wellcome Trust¹⁴, University of London¹⁵

07 Jul 2015, Proceedings of the National Academy of Sciences of the United States of America

TL;DR: The DNA sequence of K. pneumoniae isolates from around the world is determined and it is shown that there is a wide spectrum of diversity, including variation within shared sequences and gain and loss of whole genes, and there is an unrecognized association between the possession of specific gene profiles associated with virulence and antibiotic resistance.

Abstract: Klebsiella pneumoniae is now recognized as an urgent threat to human health because of the emergence of multidrug-resistant strains associated with hospital outbreaks and hypervirulent strains associated with severe community-acquired infections. K. pneumoniae is ubiquitous in the environment and can colonize and infect both plants and animals. However, little is known about the population structure of K. pneumoniae, so it is difficult to recognize or understand the emergence of clinically important clones within this highly genetically diverse species. Here we present a detailed genomic framework for K. pneumoniae based on whole-genome sequencing of more than 300 human and animal isolates spanning four continents. Our data provide genome-wide support for the splitting of K. pneumoniae into three distinct species, KpI (K. pneumoniae), KpII (K. quasipneumoniae), and KpIII (K. variicola). Further, for K. pneumoniae (KpI), the entity most frequently associated with human infection, we show the existence of >150 deeply branching lineages including numerous multidrug-resistant or hypervirulent clones. We show K. pneumoniae has a large accessory genome approaching 30,000 protein-coding genes, including a number of virulence functions that are significantly associated with invasive community-acquired disease in humans. In our dataset, antimicrobial resistance genes were common among human carriage isolates and hospital-acquired infections, which generally lacked the genes associated with invasive disease. The convergence of virulence and resistance genes potentially could lead to the emergence of untreatable invasive K. pneumoniae infections; our data provide the whole-genome framework against which to track the emergence of such threats.

879 citations

Cites methods from "SRST2: Rapid genomic surveillance for public health and hospital microbiology labs"

...Inouye M, et al. (2014) SRST2: Rapid genomic surveillance for public health and hospital microbiology labs....
...STs were assigned to each genome according to the K. pneumoniae MLST database (13) by mapping to known alleles using SRST2 (68)....
...Read sets also were screened for known alleles of important genes using a read-mapping approach with SRST2 (68)....

Journal Article

Completing bacterial genome assemblies with multiplex MinION sequencing

Ryan R. Wick¹, Louise M. Judd¹, Claire L. Gorrie¹, Kathryn E. Holt¹

University of Melbourne¹

14 Sep 2017

TL;DR: This work advocates the use of Illumina sequencing as a first analysis step, followed by ONT reads as needed to resolve genomic structure, and demonstrates that multiplexed ONT sequencing is a valuable tool for high-throughput bacterial genome finishing.

Abstract: Illumina sequencing platforms have enabled widespread bacterial whole genome sequencing. While Illumina data is appropriate for many analyses, its short read length limits its ability to resolve genomic structure. This has major implications for tracking the spread of mobile genetic elements, including those which carry antimicrobial resistance determinants. Fully resolving a bacterial genome requires long reads, such as those generated by Oxford Nanopore Technologies (ONT) platforms. Here we describe our use of the ONT MinION to sequence 12 isolates of Klebsiella pneumoniae on a single flow cell. We assembled each genome using a combination of ONT reads and previously available Illumina reads, and little to no manual intervention was needed to achieve fully resolved assemblies using the Unicycler hybrid assembler. Assembling only ONT reads with Canu was less effective, resulting in fewer resolved genomes and higher error rates even following error correction with Nanopolish. We demonstrate that multiplexed ONT sequencing is a valuable tool for high-throughput bacterial genome finishing. Specifically, we advocate the use of Illumina sequencing as a first analysis step, followed by ONT reads as needed to resolve genomic structure.

540 citations

Journal Article

Rapid antibiotic-resistance predictions from genome sequence data for Staphylococcus aureus and Mycobacterium tuberculosis.

Phelim Bradley¹, N. Claire Gordon², Timothy M. Walker², Laura Dunn², Simon Heys¹, B. K. Huang¹, Sarah G. Earle², Louise Pankhurst², Luke Anson², M. de Cesare¹, Paolo Piazza¹, Antonina A. Votintseva², Tanya Golubchik², Daniel J. Wilson¹,², David H. Wyllie², Roland Diel, Stefan Niemann, Silke Feuerriegel, Thomas Kohl, Nazir Ahmed Ismail, Shaheed V. Omar, E. Grace Smith, David Buck¹, Gil McVean¹, A. Sarah Walker²,³, Tim E. A. Peto²,³, Derrick W. Crook²,³,⁴, Zamin Iqbal¹

Wellcome Trust Centre for Human Genetics¹, University of Oxford², National Institutes of Health³, Public Health England⁴

21 Dec 2015, Nature Communications

TL;DR: De Bruijn graph representation of bacterial diversity can be used to identify species and resistance profiles of clinical isolates and is implemented in a software package that takes raw sequence data as input, and generates a clinician-friendly report within 3 minutes on a laptop.

Abstract: The rise of antibiotic-resistant bacteria has led to an urgent need for rapid detection of drug resistance in clinical samples, and improvements in global surveillance. Here we show how de Bruijn graph representation of bacterial diversity can be used to identify species and resistance profiles of clinical isolates. We implement this method for Staphylococcus aureus and Mycobacterium tuberculosis in a software package ('Mykrobe predictor') that takes raw sequence data as input, and generates a clinician-friendly report within 3 minutes on a laptop. For S. aureus, the error rates of our method are comparable to gold-standard phenotypic methods, with sensitivity/specificity of 99.1%/99.6% across 12 antibiotics (using an independent validation set, n=470). For M. tuberculosis, our method predicts resistance with sensitivity/specificity of 82.6%/98.5% (independent validation set, n=1,609); sensitivity is lower here, probably because of limited understanding of the underlying genetic mechanisms. We give evidence that minor alleles improve detection of extremely drug-resistant strains, and demonstrate feasibility of the use of emerging single-molecule nanopore sequencing techniques for these purposes.
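
The sensitivity/specificity pairs reported here follow the usual definitions when resistance is treated as the positive class and phenotypic testing as the reference: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). A small Python helper makes the arithmetic concrete; the counts in the usage line are illustrative and are not taken from the paper.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) as fractions.

    sensitivity = TP / (TP + FN): resistant isolates correctly called resistant
    specificity = TN / (TN + FP): susceptible isolates correctly called susceptible
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one resistant and one susceptible isolate")
    return tp / (tp + fn), tn / (tn + fp)


# Illustrative counts only: 100 resistant and 400 susceptible isolates.
print(sensitivity_specificity(tp=95, fn=5, tn=380, fp=20))  # (0.95, 0.95)
```

Because the two measures are computed on different denominators, a method can have high specificity and noticeably lower sensitivity, as the abstract reports for M. tuberculosis.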

453 citations

References

Journal Article

The Sequence Alignment/Map format and SAMtools

Heng Li¹, Bob Handsaker², Alec Wysoker², T. J. Fennell², Jue Ruan³, Nils Homer², Gabor T. Marth⁴, Gonçalo R. Abecasis², Richard Durbin¹

Wellcome Trust Sanger Institute¹, University of California, Los Angeles², Chinese Academy of Sciences³, Boston College⁴

01 Aug 2009, Bioinformatics

TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, a variant caller and an alignment viewer, and thus provides universal tools for processing read alignments.

Abstract: Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net
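
The SAM format described here stores one alignment per line, with 11 mandatory tab-separated fields (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL) followed by optional TAG:TYPE:VALUE fields; header lines start with `@`. A minimal Python sketch of parsing one alignment line; the `SamRecord` and `parse_sam_line` names are illustrative, and production code would more likely rely on a maintained library such as pysam.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    qname: str
    flag: int
    rname: str
    pos: int      # 1-based leftmost mapping position (0 if unavailable)
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: tuple   # optional TAG:TYPE:VALUE fields, left unparsed here

    @property
    def is_unmapped(self):
        # FLAG bit 0x4 marks an unmapped segment.
        return bool(self.flag & 0x4)


def parse_sam_line(line):
    """Parse one alignment line (not a header line starting with '@')."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("SAM alignment lines have 11 mandatory fields")
    qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual = fields[:11]
    return SamRecord(qname, int(flag), rname, int(pos), int(mapq), cigar,
                     rnext, int(pnext), int(tlen), seq, qual, tuple(fields[11:]))


rec = parse_sam_line("read1\t0\tallele_1\t5\t42\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0")
print(rec.rname, rec.pos, rec.cigar, rec.is_unmapped)  # allele_1 5 4M False
```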

45,957 citations

"SRST2: Rapid genomic surveillance for public health and hospital microbiology labs" refers to methods in this paper

...18 [30] mpileup and parsed by SRST2 to determine percent coverage, divergence, and mismatches as well as to calculate a score for each possible allele....
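
The excerpt says SRST2 parses SAMtools mpileup output to determine percent coverage, divergence, and mismatches for each allele. The Python sketch that follows computes those three quantities under stated assumptions: the pileup has already been reduced to a per-position string of read bases (real mpileup output encodes matches, strands, read starts/ends and indels and needs more parsing), the consensus is the majority base, and divergence is the percentage of covered positions whose consensus differs from the reference. These definitions and the `allele_stats` name are illustrative; SRST2's own allele scoring is not reproduced here.

```python
from collections import Counter


def allele_stats(ref_seq, pileup, min_depth=1):
    """Summarise read support for one reference allele.

    ref_seq: reference allele sequence.
    pileup:  dict mapping 0-based position -> string of read bases observed
             there (a simplified stand-in for parsed mpileup output).
    Returns (percent_coverage, mismatches, percent_divergence).
    """
    covered = 0
    mismatches = 0
    for i, ref_base in enumerate(ref_seq.upper()):
        bases = pileup.get(i, "").upper()
        if not bases or len(bases) < min_depth:
            continue  # position not covered by enough reads
        covered += 1
        # Call the majority base at this position and compare with the reference.
        consensus, _ = Counter(bases).most_common(1)[0]
        if consensus != ref_base:
            mismatches += 1
    percent_coverage = 100.0 * covered / len(ref_seq)
    divergence = 100.0 * mismatches / covered if covered else 0.0
    return percent_coverage, mismatches, divergence


ref = "ACGTACGTAC"
pileup = {i: ref[i] * 5 for i in range(8)}  # last two positions uncovered
pileup[3] = "AAAAT"                          # majority A where reference has T
print(allele_stats(ref, pileup))             # (80.0, 1, 12.5)
```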

Journal Article

Fast gapped-read alignment with Bowtie 2

Ben Langmead¹, Steven L. Salzberg¹,²,³

University of Maryland, College Park¹, Johns Hopkins University², Johns Hopkins University School of Medicine³

01 Apr 2012, Nature Methods

TL;DR: Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.

Abstract: As the rate of sequencing increases, greater throughput is demanded from read aligners. The full-text minute index is often used to make alignment very fast and memory-efficient, but the approach is ill-suited to finding longer, gapped alignments. Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.
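
The full-text minute index (FM index) mentioned here supports fast exact matching through backward search: a pattern is processed from its last character to its first, and at each step the Burrows-Wheeler transform (BWT) is used to narrow an interval of the suffix array. A minimal Python sketch of exact-match backward search, assuming a DNA text without the `$` sentinel, a naive suffix-array construction and a full occurrence table (real indexes sample these structures to save memory). This is a textbook illustration, not Bowtie 2's implementation, which additionally finds gapped alignments with dynamic programming.

```python
def bwt_and_sa(text):
    """Burrows-Wheeler transform via a naive suffix array (fine for demos)."""
    text += "$"  # sentinel: assumed absent from the input and smallest character
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # text[-1] is '$' when i == 0
    return bwt, sa


class FMIndex:
    def __init__(self, text):
        self.bwt, self.sa = bwt_and_sa(text)
        counts = {}
        for ch in self.bwt:
            counts[ch] = counts.get(ch, 0) + 1
        # C[c]: number of characters in the text lexicographically smaller than c.
        self.C, total = {}, 0
        for ch in sorted(counts):
            self.C[ch] = total
            total += counts[ch]
        # occ[c][i]: occurrences of c in bwt[:i].
        self.occ = {c: [0] * (len(self.bwt) + 1) for c in counts}
        for i, ch in enumerate(self.bwt):
            for c in self.occ:
                self.occ[c][i + 1] = self.occ[c][i] + (c == ch)

    def locate(self, pattern):
        """Return sorted start positions of exact matches of pattern."""
        lo, hi = 0, len(self.bwt)  # current suffix-array interval [lo, hi)
        for ch in reversed(pattern):  # extend the match one character leftwards
            if ch not in self.C:
                return []
            lo = self.C[ch] + self.occ[ch][lo]
            hi = self.C[ch] + self.occ[ch][hi]
            if lo >= hi:
                return []
        return sorted(self.sa[lo:hi])


index = FMIndex("GATTACATTA")
print(index.locate("TTA"))  # [2, 7]
```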

37,898 citations

Journal Article

SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing

Anton Bankevich¹, Sergey Nurk, Dmitry Antipov, Alexey Gurevich, Mikhail Dvorkin, Alexander S. Kulikov, Valery M. Lesin, Sergey I. Nikolenko, Son Pham, Andrey D. Prjibelski, Alexey V. Pyshkin, Alexander Sirotkin, Nikolay Vyahhi, Glenn Tesler, Max A. Alekseyev, Pavel A. Pevzner

Saint Petersburg Academic University¹

07 May 2012, Journal of Computational Biology

TL;DR: SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies.

Abstract: The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V−SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online (http://bioinf.spbau.ru/spades). It is distributed as open source software.

16,859 citations

Additional excerpts

...There are several assemblers (for example, Velvet [24], SPAdes [25]) that can produce a bacterial genome assembly in minutes to hours with a few gigabytes of memory....
...Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA, Pevzner PA: SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing....

SPAdes, a new genome assembly algorithm and its applications to single-cell sequencing (7th Annual SFAF Meeting, 2012)

Glenn Tesler

01 Jun 2012

10,124 citations

Journal Article

Velvet: Algorithms for de novo short read assembly using de Bruijn graphs

Daniel R. Zerbino¹, Ewan Birney¹

European Bioinformatics Institute¹

01 May 2008, Genome Research

TL;DR: Velvet represents a new approach to assembly that can leverage very short reads in combination with read pairs to produce useful assemblies; on real Solexa data without read pairs, its contig lengths are in close agreement with simulated results.

Abstract: We have developed a new set of algorithms, collectively called "Velvet," to manipulate de Bruijn graphs for genomic sequence assembly. A de Bruijn graph is a compact representation based on short words (k-mers) that is ideal for high coverage, very short read (25-50 bp) data sets. Applying Velvet to very short reads and paired-ends information only, one can produce contigs of significant length, up to 50-kb N50 length in simulations of prokaryotic data and 3-kb N50 on simulated mammalian BACs. When applied to real Solexa data sets without read pairs, Velvet generated contigs of approximately 8 kb in a prokaryote and 2 kb in a mammalian BAC, in close agreement with our simulated results without read-pair information. Velvet represents a new approach to assembly that can leverage very short reads in combination with read pairs to produce useful assemblies.
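
The abstract describes de Bruijn graphs built from short words (k-mers) and reports assembly contiguity as N50. A minimal Python sketch of both ideas, using the textbook formulation in which nodes are (k-1)-mers and each k-mer contributes an edge from its prefix to its suffix. It ignores reverse complements, sequencing errors and read pairs, and it is not Velvet's internal graph representation; the function names are illustrative.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    if k < 2:
        raise ValueError("k must be at least 2")
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])  # edge: prefix -> suffix
    return dict(graph)


def n50(contig_lengths):
    """Largest length L such that contigs of length >= L cover at least
    half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # no contigs


print(de_bruijn_graph(["ACGTC"], 3))  # {'AC': {'CG'}, 'CG': {'GT'}, 'GT': {'TC'}}
print(n50([100, 80, 50, 20]))         # 80
```

In the example, the two longest contigs (100 + 80) already exceed half of the 250 bp total, so the N50 is the length of the second one.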

9,389 citations