A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples
Samia N. Naccache, Scot Federman, Narayanan Veeraraghavan, Matei Zaharia, Deanna Lee, Erik Samayoa, Jerome Bouquet, Alexander L. Greninger, Ka Cheung Luk, Barryett Enge, Debra A. Wadford, Sharon Messenger, Gillian Genrich, Kristen Pellegrino, Gilda Grard, Eric M. Leroy, Bradley S. Schneider, Joseph N. Fair, Miguel Ángel Martínez, Pavel Isa, John A. Crump, Joseph L. DeRisi, Taylor Sittler, John Hackett, Steve Miller, Charles Y. Chiu
TL;DR: SURPI is a computational pipeline for pathogen identification from complex metagenomic NGS data generated from clinical samples; its use is demonstrated in the analysis of 237 clinical samples comprising more than 1.1 billion sequences.
Abstract:
Unbiased next-generation sequencing (NGS) approaches enable comprehensive pathogen detection in the clinical microbiology laboratory and have numerous applications for public health surveillance, outbreak investigation, and the diagnosis of infectious diseases. However, practical deployment of the technology is hindered by the bioinformatics challenge of analyzing results accurately and in a clinically relevant timeframe. Here we describe SURPI ("sequence-based ultrarapid pathogen identification"), a computational pipeline for pathogen identification from complex metagenomic NGS data generated from clinical samples, and demonstrate use of the pipeline in the analysis of 237 clinical samples comprising more than 1.1 billion sequences. Deployable on both cloud-based and standalone servers, SURPI leverages two state-of-the-art aligners for accelerated analyses, SNAP and RAPSearch, which are as accurate as existing bioinformatics tools but orders of magnitude faster in performance. In fast mode, SURPI detects viruses and bacteria by scanning data sets of 7-500 million reads in 11 min to 5 h, while in comprehensive mode, all known microorganisms are identified, followed by de novo assembly and protein homology searches for divergent viruses in 50 min to 16 h. SURPI has also directly contributed to real-time microbial diagnosis in acutely ill patients, underscoring its potential key role in the development of unbiased NGS-based clinical assays in infectious diseases that demand rapid turnaround times.
Citations
Journal Article
VFDB 2016: hierarchical and refined dataset for big data analysis—10 years on
TL;DR: The virulence factor database (VFDB) recently improved two aspects of the infrastructural dataset of VFDB and promoted the usability of the database in the big data era for the bioinformatic mining of the explosively growing data regarding bacterial VFs.
Journal Article
Actionable Diagnosis of Neuroleptospirosis by Next-Generation Sequencing
Michael R. Wilson, Samia N. Naccache, Erik Samayoa, Mark Biagtan, Hiba Bashir, Guixia Yu, Shahriar M. Salamat, Sneha Somasekar, Scot Federman, Steve Miller, Robert A. Sokolic, Elizabeth Garabedian, Fabio Candotti, Rebecca H. Buckley, Kurt D. Reed, Teresa L. Meyer, Christine M. Seroogy, Renee Galloway, Sheryl L Henderson, James E. Gern, Joseph L. DeRisi, Charles Y. Chiu
TL;DR: A 14-year-old boy with severe combined immunodeficiency presented three times to a medical facility over a period of 4 months with fever and headache that progressed to hydrocephalus and status epilepticus necessitating a medically induced coma, confirming evidence of Leptospira santarosai infection.
Journal Article
Clinical Metagenomic Next-Generation Sequencing for Pathogen Detection.
TL;DR: This review focuses on the application of untargeted metagenomic next-generation sequencing to the clinical diagnosis of infectious diseases, particularly in areas in which conventional diagnostic approaches have limitations.
Journal Article
Clinical Metagenomic Sequencing for Diagnosis of Meningitis and Encephalitis
Michael R. Wilson, Hannah A. Sample, Kelsey C. Zorn, Shaun Arevalo, Guixia Yu, John Neuhaus, Scot Federman, Doug Stryke, Benjamin Briggs, Charles Langelier, Amy C. Berger, Vanja C. Douglas, S. Andrew Josephson, Felicia C. Chow, Brent D. Fulton, Joseph L. DeRisi, Jeffrey M. Gelfand, Samia N. Naccache, Jeffrey M. Bender, Jennifer Dien Bard, Jamie A. Murkey, Magrit Carlson, Paul M. Vespa, Tara Vijayan, Paul R Allyn, Shelley Campeau, Romney M. Humphries, Jeffrey D. Klausner, Czarina Ganzon, Fatemeh Memar, Nicolle Anne Ocampo, Lara Zimmermann, Stuart H. Cohen, Christopher R. Polage, Roberta L. DeBiasi, Barbara Haller, Ronald H. Dallas, Gabriela Maron, Randall T. Hayden, Kevin Messacar, Samuel R. Dominguez, Steve Miller, Charles Y. Chiu
TL;DR: Routine microbiologic testing is often insufficient to detect all neuroinvasive pathogens, so metagenomic NGS of CSF obtained from patients with meningitis or encephalitis improved diagnosis of neurologic infections and provided actionable information in some cases.
Journal Article
NCBI Viral Genomes Resource
TL;DR: The NCBI Viral Genomes Resource is a reference resource designed to bring order to this sequence shockwave and improve usability of viral sequence data.