PyroHMMsnp: an SNP caller for Ion Torrent and 454 sequencing data
Feng Zeng, Rui Jiang, Ting Chen
TL;DR: A hidden Markov model (HMM) is proposed to statistically and explicitly model homopolymer sequencing errors in terms of overcalls, undercalls, insertions and deletions, and a realignment-based SNP-calling program, termed PyroHMMsnp, is developed that realigns read sequences around homopolymers according to this error model and then infers the underlying genotype using a Bayesian approach.
Abstract:
Both 454 and Ion Torrent sequencers are capable of producing large amounts of long, high-quality sequencing reads. However, because both technologies sequence an entire homopolymer in a single cycle, they both suffer from homopolymer length uncertainty and incorporation asynchronization. During mapping, such sequencing errors can shift alignments around homopolymers and thus induce incorrect mismatches, which have become a critical barrier to the accurate detection of single nucleotide polymorphisms (SNPs). In this article, we propose a hidden Markov model (HMM) that statistically and explicitly models homopolymer sequencing errors in terms of overcalls, undercalls, insertions and deletions. We use a hierarchical model to describe the sequencing and base-calling processes, and we estimate the parameters of the HMM from resequencing data with an expectation-maximization (EM) algorithm. Based on the HMM, we develop a realignment-based SNP-calling program, termed PyroHMMsnp, which realigns read sequences around homopolymers according to the error model and then infers the underlying genotype using a Bayesian approach. Simulation experiments show that PyroHMMsnp outperforms the other tools compared, across various sequencing coverages, in terms of sensitivity, specificity and F1 measure. Analysis of human resequencing data shows that PyroHMMsnp predicts 12.9% more SNPs than Samtools while achieving a higher specificity. (http://code.google.com/p/pyrohmmsnp/)
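To make the error model concrete, here is a minimal sketch of a pair-HMM forward algorithm that works in homopolymer space: each sequence is collapsed into (base, run length) pairs, a matched run may be overcalled or undercalled by one base, and whole runs may be inserted or deleted. Everything here is an assumption for illustration: the function names (`homopolymer_runs`, `run_emission`, `forward_likelihood`) and all parameter values are hypothetical, and the model is far simpler than the hierarchical HMM described above. In particular, the parameters are fixed by hand rather than estimated by EM, and the likelihood is unnormalized, which is adequate only for comparing candidate haplotypes for the same read.

```python
from itertools import groupby

# Hypothetical toy parameters; PyroHMMsnp estimates its own parameters by EM.
P_OVER = 0.02      # matched read run is one base longer than the true run (overcall)
P_UNDER = 0.03     # matched read run is one base shorter than the true run (undercall)
P_SUB = 0.001      # matched read run carries a different base
P_FAR = 1e-6       # length error of two or more bases
T_OPEN = 0.01      # M -> I or M -> D (open a run-level insertion or deletion)
T_EXT = 0.1        # I -> I or D -> D (extend it)
P_INS_EMIT = 0.01  # flat emission probability of an inserted read run


def homopolymer_runs(seq):
    """Collapse a sequence into (base, length) runs, e.g. 'AAC' -> [('A', 2), ('C', 1)]."""
    return [(base, len(list(group))) for base, group in groupby(seq)]


def run_emission(read_run, hap_run):
    """P(observed read run | true haplotype run) under a toy overcall/undercall model."""
    (r_base, r_len), (h_base, h_len) = read_run, hap_run
    if r_base != h_base:
        return P_SUB
    if r_len == h_len:
        p = 1 - P_OVER - P_UNDER
    elif r_len == h_len + 1:
        p = P_OVER
    elif r_len == h_len - 1:
        p = P_UNDER
    else:
        p = P_FAR
    return (1 - P_SUB) * p


def forward_likelihood(read, hap):
    """Sum P(read | hap) over all run-level alignments (plain probabilities, short inputs only)."""
    R, H = homopolymer_runs(read), homopolymer_runs(hap)
    n, m = len(R), len(H)
    fM = [[0.0] * (m + 1) for _ in range(n + 1)]  # ends with a matched run pair
    fI = [[0.0] * (m + 1) for _ in range(n + 1)]  # ends with an inserted read run
    fD = [[0.0] * (m + 1) for _ in range(n + 1)]  # ends with a deleted haplotype run
    fM[0][0] = 1.0  # alignment starts in the match state before any run
    t_mm, t_gap_m = 1 - 2 * T_OPEN, 1 - T_EXT  # M -> M, and I -> M / D -> M
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                fM[i][j] = run_emission(R[i - 1], H[j - 1]) * (
                    t_mm * fM[i - 1][j - 1]
                    + t_gap_m * (fI[i - 1][j - 1] + fD[i - 1][j - 1]))
            if i > 0:
                fI[i][j] = P_INS_EMIT * (T_OPEN * fM[i - 1][j] + T_EXT * fI[i - 1][j])
            if j > 0:
                fD[i][j] = T_OPEN * fM[i][j - 1] + T_EXT * fD[i][j - 1]
    return fM[n][m] + fI[n][m] + fD[n][m]
```

For example, with the haplotype `"ACCCGT"`, the read `"ACCCCGT"` (one extra C) scores far higher than `"ACCCCCCGT"` (three extra C's), because a one-base overcall is cheap while larger length errors are not. Working in run space is the design choice that lets a homopolymer length error be scored as a single event instead of as a misplaced gap plus mismatches; a production caller would also work in log space or with scaling to avoid underflow on long reads.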
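The final Bayesian genotyping step can be illustrated with a standard diploid calculation: for each candidate genotype, every read is assumed to come from either haplotype with equal probability, per-read likelihoods are multiplied across reads, and the result is combined with a genotype prior. This is a generic sketch, not PyroHMMsnp's code: the function name `genotype_posteriors`, the input format (one pair of P(read | ref haplotype), P(read | alt haplotype) per read, e.g. from a realignment HMM) and the prior values are all hypothetical. Likelihoods are assumed to be strictly positive.

```python
import math

# Hypothetical genotype priors, loosely based on a per-site heterozygosity of about 1e-3.
PRIORS = {"ref/ref": 1 - 1.5e-3, "ref/alt": 1e-3, "alt/alt": 5e-4}
# Fraction of reads expected to come from the alt haplotype under each genotype.
ALT_FRACTION = {"ref/ref": 0.0, "ref/alt": 0.5, "alt/alt": 1.0}


def genotype_posteriors(read_likelihoods, priors=PRIORS):
    """read_likelihoods: list of (P(read | ref hap), P(read | alt hap)) pairs, all > 0."""
    log_post = {}
    for genotype, prior in priors.items():
        f = ALT_FRACTION[genotype]
        log_p = math.log(prior)
        for l_ref, l_alt in read_likelihoods:
            # Each read is drawn from the alt haplotype with probability f.
            log_p += math.log((1 - f) * l_ref + f * l_alt)
        log_post[genotype] = log_p
    # Normalize with log-sum-exp so deep coverage does not underflow.
    top = max(log_post.values())
    z = sum(math.exp(v - top) for v in log_post.values())
    return {g: math.exp(v - top) / z for g, v in log_post.items()}
```

With ten reads that strongly favor the reference haplotype and ten that favor the alternative, `genotype_posteriors([(0.9, 0.01)] * 10 + [(0.01, 0.9)] * 10)` places nearly all posterior mass on `"ref/alt"`, even though the heterozygous prior is small.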
Citations
Performance comparison of SNP detection tools with illumina exome sequencing data—an assessment using both family pedigree information and sample-matched SNP array data
TL;DR: The main purpose of the study was to establish a reusable procedure that applies high-throughput validation to compare the quality of SNP discovery tools with a focus on exome-seq, which can be used to compare any forthcoming tool(s) of interest.
Gene discovery through transcriptome sequencing for the invasive mussel Limnoperna fortunei
Marcela Uliano-Silva, Juliana Alves Americo, Rodrigo Brindeiro, Francesco Dondero, Francisco Prosdocimi, Mauro de Freitas Rebelo
TL;DR: The presence of toll-like receptors gives a first insight into an immune system that could be more complex than previously assumed and may be involved in preventing disease and extinction when population densities are high; the apparent lack of special adaptations to extremely low O2 levels is a target worth pursuing for the development of a molecular control approach.
Next Generation Sequencing in Non-Small Cell Lung Cancer: New Avenues Toward the Personalized Medicine
Simona Coco, Anna Truini, Irene Vanni, Maria Giovanna Dal Bello, Angela Alama, Erika Rijavec, Carlo Genova, Giulia Barletta, Claudio Sini, G. Burrafato, Federica Biello, Francesco Boccardo, Francesco Grossi
TL;DR: Although several problems must still be overcome on the way to personalized therapy, NGS represents a highly attractive system for identifying mutations that could improve the outcome of patients with this deadly disease, providing information about the mutational spectrum of this cancer.
PyroHMMvar: a sensitive and accurate method to call short indels and SNPs for Ion Torrent and 454 data.
Feng Zeng, Rui Jiang, Ting Chen
TL;DR: Building on the previously proposed hidden Markov model, PyroHMMvar simultaneously detects short indels and SNPs, as demonstrated on human resequencing data, and is less sensitive to mapping parameter settings than the other methods.
HECTOR: a parallel multistage homopolymer spectrum based error corrector for 454 sequencing data.
TL;DR: HECTOR is a practical 454 pyrosequencing read error corrector which is competitive in terms of both correction quality and speed, and theoretically capable of processing arbitrary-length homopolymer-length errors, with a linear time complexity.
References
Maximum likelihood from incomplete data via the EM algorithm
The Sequence Alignment/Map format and SAMtools
Heng Li, Bob Handsaker, Alec Wysoker, T. J. Fennell, Jue Ruan, Nils Homer, Gabor T. Marth, Gonçalo R. Abecasis, Richard Durbin
TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, a variant caller and an alignment viewer, and thus provides universal tools for processing read alignments.
Fast and accurate short read alignment with Burrows–Wheeler transform
Heng Li, Richard Durbin
TL;DR: The Burrows–Wheeler Alignment tool (BWA) is a new read alignment package, based on backward search with the Burrows–Wheeler Transform (BWT), that efficiently aligns short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps.
A framework for variation discovery and genotyping using next-generation DNA sequencing data
Mark A. DePristo, Eric Banks, Ryan Poplin, Kiran V. Garimella, Jared Maguire, Christopher Hartl, Anthony A. Philippakis, Guillermo del Angel, Manuel A. Rivas, Matt Hanna, Aaron McKenna, Timothy Fennell, Andrew Kernytsky, Andrey Sivachenko, Kristian Cibulskis, Stacey Gabriel, David Altshuler, Mark J. Daly
TL;DR: A unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs is presented.
Genome sequencing in microfabricated high-density picolitre reactors
Marcel Margulies, Michael Egholm, William E. Altman, Said Attiya, Joel S. Bader, Lisa A. Bemben, Jan Berka, Michael S. Braverman, Yi-Ju Chen, Zhoutao Chen, Scott Dewell, Lei Du, J. M. Fierro, Xavier V. Gomes, Brian C. Godwin, Wen He, Scott Edward Helgesen, Chun Heen Ho, Gerard P. Irzyk, Szilveszter C. Jando, Maria L. I. Alenquer, Thomas P. Jarvie, Kshama B. Jirage, Jong-Bum Kim, James R. Knight, Janna R. Lanza, John H. Leamon, Steven Lefkowitz, Ming Lei, Jing Li, Kenton Lohman, Hong Lu, Vinod Makhijani, Keith Mcdade, Michael P. McKenna, Eugene W. Myers, Elizabeth Nickerson, John Nobile, Ramona Plant, Bernard P. Puc, Michael T. Ronan, George T. Roth, Gary J. Sarkis, Jan Fredrik Simons, John Simpson, Maithreyan Srinivasan, Karrie R. Tartaro, Alexander Tomasz, Kari A. Vogt, Greg A. Volkmer, Shally H. Wang, Yong Wang, Michael P. Weiner, Pengguang Yu, Richard F. Begley, Jonathan M. Rothberg
TL;DR: A scalable, highly parallel sequencing system is described whose raw throughput is significantly greater than that of state-of-the-art capillary electrophoresis instruments, achieving 96% coverage at 99.96% accuracy in one run of the machine.
Related Papers
Mapping short DNA sequencing reads and calling variants using mapping quality scores
Heng Li, Jue Ruan, Richard Durbin
An integrated semiconductor device enabling non-optical genome sequencing
Jonathan M. Rothberg, Wolfgang Hinz, Todd Rearick, Jonathan Schultz, William J. Mileski, Melville Davey, John H. Leamon, Kim L. Johnson, Mark James Milgrew, Matthew D. Edwards, Jeremy Hoon, Jan Fredrik Simons, David Marran, Jason W. Myers, John F. Davidson, Annika Branting, John Nobile, Bernard P. Puc, David Light, Travis A. Clark, Martin Huber, Jeffrey T. Branciforte, Isaac B. Stoner, Simon Cawley, Michael R. Lyons, Yutao Fu, Nils Homer, Marina Sedova, Xin Miao, Brian Reed, Jeffrey Sabina, Erika Feierstein, Michelle Schorn, Mohammad Alanjary, Eileen T. Dimalanta, Devin Dressman, Rachel Kasinskas, Tanya Sokolsky, Jacqueline A. Fidanza, Eugeni Namsaraev, Kevin McKernan, Alan Williams, G. Thomas Roth, James Bustillo