Open Access · Journal Article

PyroHMMsnp: an SNP caller for Ion Torrent and 454 sequencing data

TLDR
A hidden Markov model (HMM) is proposed that statistically and explicitly formulates homopolymer sequencing errors as overcalls, undercalls, insertions and deletions. Based on this model, a realignment-based SNP-calling program, termed PyroHMMsnp, is developed; it realigns read sequences around homopolymers according to the error model and then infers the underlying genotype using a Bayesian approach.
Abstract
Both 454 and Ion Torrent sequencers are capable of producing large amounts of long, high-quality sequencing reads. However, because both methods sequence a homopolymer in a single cycle, they both suffer from homopolymer uncertainty and incorporation asynchronization. In mapping, such sequencing errors can shift alignments around homopolymers and thus induce incorrect mismatches, which have become a critical barrier to the accurate detection of single nucleotide polymorphisms (SNPs). In this article, we propose a hidden Markov model (HMM) that statistically and explicitly formulates homopolymer sequencing errors as overcalls, undercalls, insertions and deletions. We use a hierarchical model to describe the sequencing and base-calling processes, and we estimate the parameters of the HMM from resequencing data with an expectation-maximization algorithm. Based on the HMM, we develop a realignment-based SNP-calling program, termed PyroHMMsnp, which realigns read sequences around homopolymers according to the error model and then infers the underlying genotype using a Bayesian approach. Simulation experiments show that the performance of PyroHMMsnp is exceptional across various sequencing coverages in terms of sensitivity, specificity and F1 measure, compared with other tools. Analysis of human resequencing data shows that PyroHMMsnp predicts 12.9% more SNPs than Samtools while achieving a higher specificity. PyroHMMsnp is available at http://code.google.com/p/pyrohmmsnp/.
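The Bayesian genotype inference step mentioned in the abstract can be illustrated with a minimal sketch. This is not the PyroHMMsnp model: it assumes a single uniform substitution error rate instead of the homopolymer HMM, and the function name `genotype_posteriors`, the parameters `error_rate` and `theta`, and the reference-biased prior weights are all hypothetical choices made for illustration.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"


def genotype_posteriors(observed, ref_base, error_rate=0.01, theta=0.001):
    """Posterior over unordered diploid genotypes at one site.

    observed   : string of bases seen in the reads covering the site
    ref_base   : reference base at the site
    error_rate : assumed uniform per-base substitution error (hypothetical)
    theta      : assumed prior weight for a heterozygous site (hypothetical)
    """
    log_scores = {}
    for a1, a2 in combinations_with_replacement(BASES, 2):
        # Hypothetical reference-biased prior weights; normalised at the end.
        if a1 == a2 == ref_base:
            prior = 1.0
        elif ref_base in (a1, a2):
            prior = theta
        else:
            prior = theta / 2

        # Each read base is drawn from one of the two alleles with equal
        # probability, then miscalled to another base with rate error_rate.
        log_like = 0.0
        for b in observed:
            p1 = 1 - error_rate if b == a1 else error_rate / 3
            p2 = 1 - error_rate if b == a2 else error_rate / 3
            log_like += math.log(0.5 * p1 + 0.5 * p2)

        log_scores[(a1, a2)] = math.log(prior) + log_like

    # Normalise in log space (log-sum-exp) to avoid underflow at high coverage.
    top = max(log_scores.values())
    total = sum(math.exp(s - top) for s in log_scores.values())
    return {g: math.exp(s - top) / total for g, s in log_scores.items()}


posteriors = genotype_posteriors("AAAAAGGGGG", ref_base="A")
best = max(posteriors, key=posteriors.get)
print(best, posteriors[best])
```

A site would be reported as a SNP when the most probable genotype differs from homozygous reference. In the approach the abstract describes, the per-read likelihoods would come from realigning reads under the homopolymer error HMM, not from the fixed error rate assumed here.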


Citations
Journal ArticleDOI

Performance comparison of SNP detection tools with illumina exome sequencing data—an assessment using both family pedigree information and sample-matched SNP array data

TL;DR: The main purpose of the study was to establish a reusable procedure that applies high-throughput validation to compare the quality of SNP discovery tools with a focus on exome-seq, which can be used to compare any forthcoming tool(s) of interest.
Journal ArticleDOI

Gene discovery through transcriptome sequencing for the invasive mussel Limnoperna fortunei

TL;DR: The presence of toll-like receptors gives a first insight into an immune system that could be more complex than previously assumed and may be involved in the prevention of disease and extinction when population densities are high. The apparent lack of special adaptations to extremely low O2 levels is a target worth pursuing for the development of a molecular control approach.
Journal ArticleDOI

Next Generation Sequencing in Non-Small Cell Lung Cancer: New Avenues Toward the Personalized Medicine

TL;DR: Although several problems must still be overcome on the way to personalized therapy, NGS represents a highly attractive system for identifying mutations and improving the outcome of patients with this deadly disease, and it provides information about the mutational spectrum of this cancer.
Journal ArticleDOI

PyroHMMvar: a sensitive and accurate method to call short indels and SNPs for Ion Torrent and 454 data.

TL;DR: Based on the previously proposed hidden Markov model, a method called PyroHMMvar is developed, which can simultaneously detect short indels and SNPs, as demonstrated in human resequencing data and is less sensitive to mapping parameter settings than the other methods.
Journal ArticleDOI

HECTOR: a parallel multistage homopolymer spectrum based error corrector for 454 sequencing data.

TL;DR: HECTOR is a practical 454 pyrosequencing read error corrector which is competitive in terms of both correction quality and speed, and theoretically capable of processing arbitrary-length homopolymer-length errors, with a linear time complexity.
References
Journal ArticleDOI

The Sequence Alignment/Map format and SAMtools

TL;DR: SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant calling and alignment viewing, and thus provides universal tools for processing read alignments.
Journal ArticleDOI

Fast and accurate short read alignment with Burrows–Wheeler transform

TL;DR: The Burrows-Wheeler Alignment tool (BWA), a new read alignment package based on backward search with the Burrows–Wheeler Transform (BWT), is implemented to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps.