Open Access Journal Article

Highly improved homopolymer aware nucleotide-protein alignments with 454 data

Abstract
Roche 454 sequencing is the leading sequencing technology for producing long-read, high-throughput sequence data. Unlike most methods, where sequencing errors translate to base uncertainties, 454 sequencing inaccuracies create nucleotide gaps. These gaps are particularly troublesome for translated search tools such as BLASTx, where they introduce frame-shifts and result in regions of decreased identity and/or terminated alignments, which affect further analysis. To address this issue, the Homopolymer Aware Cross Alignment Tool (HAXAT) was developed. HAXAT uses a novel dynamic programming algorithm to find the optimal local alignment between a 454 nucleotide sequence and a protein sequence by allowing frame-shifts, guided by 454 flowpeak values. The algorithm is an efficient minimal extension of the Smith-Waterman-Gotoh algorithm that fits easily into other tools. Experiments using HAXAT demonstrate that introducing 454-specific frame-shift penalties significantly increases the accuracy of alignments spanning homopolymer sequence errors. The full effect of the new parameters introduced with this novel alignment model is explored. Experimental results evaluating homopolymer inaccuracy through alignments show a two- to five-fold increase in Matthews Correlation Coefficient over previous algorithms for 454-derived data. The increased accuracy provided by HAXAT not only results in improved homologue estimations but also provides uninterrupted reading frames, which greatly facilitate further analysis of protein space, for example phylogenetic analysis. The alignment tool is available at http://bioinfo.ifm.liu.se/454tools/haxat
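The idea of position-specific frame-shift penalties in a translated local alignment can be illustrated with a small Python sketch. This is not HAXAT's actual recurrence: it uses linear rather than affine (Gotoh) gap costs, a plain match/mismatch score instead of a substitution matrix, and it approximates 454 flowpeak values with a hypothetical rule that makes frame-shifts cheap inside homopolymer runs. The function names and all numeric parameters are assumptions chosen for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINOS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINOS)}


def homopolymer_penalties(dna, normal=12, in_run=2, min_run=3):
    """Hypothetical stand-in for flowpeak-derived penalties:
    frame-shifts are cheap inside homopolymer runs of length >= min_run."""
    pen = [normal] * len(dna)
    start = 0
    for i in range(1, len(dna) + 1):
        if i == len(dna) or dna[i] != dna[start]:
            if i - start >= min_run:
                pen[start:i] = [in_run] * (i - start)
            start = i
    return pen


def frameshift_local_score(dna, protein, fs_pen, match=5, mismatch=-4, gap=8):
    """Best local DNA-vs-protein alignment score with frame-shifts allowed.
    H[i][j] is the best score of an alignment ending after i bases
    and j residues; fs_pen[k] is the frame-shift penalty at base k."""
    n, m = len(dna), len(protein)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cands = [0]  # local alignment: may restart anywhere
            if i >= 3:
                aa = CODON.get(dna[i - 3:i], "X")
                step = match if aa == protein[j - 1] else mismatch
                cands.append(H[i - 3][j - 1] + step)  # full codon vs residue
                cands.append(H[i - 3][j] - gap)       # codon with no residue
            if i >= 2:
                # undercall: two bases stand in for one residue
                cands.append(H[i - 2][j - 1] - fs_pen[i - 1])
            cands.append(H[i][j - 1] - gap)           # residue with no codon
            cands.append(H[i - 1][j] - fs_pen[i - 1])  # overcall: skip one base
            H[i][j] = max(cands)
            best = max(best, H[i][j])
    return best


protein = "MKFLVE"
clean = "ATGAAATTTCTGGTGGAA"      # ATG AAA TTT CTG GTG GAA
overcall = "ATGAAAATTTCTGGTGGAA"  # one extra A in the homopolymer run

aware = frameshift_local_score(overcall, protein, homopolymer_penalties(overcall))
flat = frameshift_local_score(overcall, protein, [12] * len(overcall))
print(aware, flat)
```

With the homopolymer-aware penalties the alignment spans the extra base and covers all six residues, whereas with a flat penalty the best-scoring alignment drops the first residue to avoid paying for the frame-shift.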

