Highly improved homopolymer aware nucleotide-protein alignments with 454 data

Abstract
Roche 454 sequencing is the leading technology for producing long-read, high-throughput sequence data. Unlike most methods, where sequencing errors translate to base uncertainties, 454 sequencing inaccuracies create nucleotide gaps. These gaps are particularly troublesome for translated search tools such as BLASTx, where they introduce frame-shifts and result in regions of decreased identity and/or terminated alignments, which affect further analysis. To address this issue, the Homopolymer Aware Cross Alignment Tool (HAXAT) was developed. HAXAT uses a novel dynamic programming algorithm that finds the optimal local alignment between a 454 nucleotide sequence and a protein sequence by allowing frame-shifts, guided by 454 flowpeak values. The algorithm is an efficient minimal extension of the Smith-Waterman-Gotoh algorithm that easily fits into other tools. Experiments using HAXAT demonstrate that introducing 454-specific frame-shift penalties significantly increases the accuracy of alignments spanning homopolymer sequence errors. The full effect of the new parameters introduced with this novel alignment model is explored. Experimental results evaluating homopolymer inaccuracy through alignments show a two- to five-fold increase in Matthews Correlation Coefficient over previous algorithms for 454-derived data. The increased accuracy provided by HAXAT not only results in improved homologue estimations, but also provides uninterrupted reading frames, which greatly facilitate further analysis of protein space, for example phylogenetic analysis. The alignment tool is available at http://bioinfo.ifm.liu.se/454tools/haxat .
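
The idea of a frame-shift-aware nucleotide-protein alignment with position-specific penalties can be illustrated with a small sketch. This is not HAXAT's actual recurrence or parameterization, which the abstract does not give. It assumes linear gap costs instead of Gotoh's affine gaps, toy match/mismatch scores instead of a substitution matrix, and a hypothetical `flow_to_penalty` mapping that makes a frame-shift cheaper where a flow value lies close to halfway between two integers. All function names and default values below are illustrative assumptions.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def flow_to_penalty(flow, fs_min=2.0, fs_max=12.0):
    """Hypothetical mapping from a flow value to a frame-shift penalty.

    A flow value near an integer (confident homopolymer length) gets the
    full penalty fs_max; a value near x.5 (ambiguous length) gets fs_min.
    """
    d = abs(flow - round(flow))          # distance to nearest integer, 0..0.5
    return fs_max - (fs_max - fs_min) * (2.0 * d)


def frameshift_local_score(dna, fs_pen, protein, match=5, mismatch=-4, gap=8):
    """Best local alignment score of a DNA read against a protein.

    fs_pen[k] is the frame-shift penalty at nucleotide k (same length as dna).
    H[i][j] is the best score of an alignment ending after i nucleotides
    and j amino acids. Simplified: linear gaps, toy substitution scores.
    """
    n, m = len(dna), len(protein)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cands = [
                0.0,                          # start a new local alignment
                H[i][j - 1] - gap,            # amino acid with no codon
                H[i - 1][j] - fs_pen[i - 1],  # skip one extra nucleotide
            ]
            if i >= 2:
                # two nucleotides stand in for one amino acid (missing base)
                cands.append(H[i - 2][j - 1] - fs_pen[i - 1])
            if i >= 3:
                aa = CODON.get(dna[i - 3:i], "X")
                s = match if aa == protein[j - 1] else mismatch
                cands.append(H[i - 3][j - 1] + s)  # ordinary codon step
                cands.append(H[i - 3][j] - gap)    # codon with no amino acid
            H[i][j] = max(cands)
            best = max(best, H[i][j])
    return best


protein = "MKVLAAGIW"
# Read with the AAA codon for K undercalled as AA (one base missing).
read = "ATG" + "AA" + "GTTCTGGCTGCCGGTATCTGG"
uniform = [12.0] * len(read)
guided = list(uniform)
guided[3] = guided[4] = flow_to_penalty(2.5)  # ambiguous homopolymer flow
print(frameshift_local_score(read, uniform, protein))
print(frameshift_local_score(read, guided, protein))
```

With a uniform high penalty, the best local alignment in this toy case covers only the in-frame codons after the error. With the lowered penalty at the ambiguous homopolymer, the alignment bridges the frame-shift and keeps the reading frame intact across the whole read, which is the behaviour the abstract attributes to flowpeak-guided penalties.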

