Book Chapter

Computing similarity between RNA strings

TLDR
This paper defines a notion of alignment between two RNA strings and presents a method for optimally aligning a given RNA sequence with unknown secondary structure to one with known sequence and structure, thus attacking the structure prediction problem in the case when the structure of a closely related sequence is known.
Abstract
Ribonucleic acid (RNA) strings are strings over the four-letter alphabet {A, C, G, U} together with a secondary structure of base pairs, each pairing an A with a U or a C with a G. Edges are drawn between two bases that are paired in the secondary structure, and these edges have traditionally been assumed to be noncrossing. The noncrossing base-pairing naturally leads to a tree-like representation of the secondary structure of RNA strings. In this paper, we address several notions of similarity between two RNA strings that take into account both the primary sequence and the secondary base-pairing structure of the strings. We present efficient algorithms for exact matching and approximate matching between two RNA strings. We define a notion of alignment between two RNA strings and devise algorithms for it based on dynamic programming. We then present a method for optimally aligning a given RNA sequence with unknown secondary structure to one with known sequence and structure, thus attacking the structure prediction problem in the case when the structure of a closely related sequence is known. The techniques employed to prove our results include reductions to well-known string matching problems that allow wild cards and ranges, and speeding up dynamic programming by using the tree structures implicit in the secondary structure of RNA strings.
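
The noncrossing condition and the tree it induces can be made concrete with a short Python sketch. It is illustrative only, under these assumptions (none of them from the paper): the secondary structure is given as a list of 0-based index pairs (i, j) with i < j, only the A-U and C-G pairs named above are accepted, and `Node`, `validate`, `build_tree` and `render` are hypothetical helper names. Each arc becomes an internal node whose children are the bases and arcs nested directly inside it; unpaired bases are leaves.

```python
from dataclasses import dataclass, field

# Base pairs admitted by the model described above (A-U and C-G only).
COMPLEMENT = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


@dataclass
class Node:
    """A leaf (unpaired base, left == right) or an arc (left < right)."""
    left: int
    right: int
    children: list = field(default_factory=list)


def validate(seq, pairs):
    """Return a position -> partner map, raising ValueError on an invalid structure."""
    partner = {}
    for i, j in pairs:
        if not 0 <= i < j < len(seq):
            raise ValueError(f"pair {(i, j)} is out of range or not ordered")
        if i in partner or j in partner:
            raise ValueError(f"pair {(i, j)} reuses a paired position")
        if (seq[i], seq[j]) not in COMPLEMENT:
            raise ValueError(f"{seq[i]}-{seq[j]} is not an A-U or C-G pair")
        partner[i], partner[j] = j, i
    # Noncrossing test: scanning left to right, each right end must close the
    # most recently opened arc that is still open, as with balanced parentheses.
    stack = []
    for pos in range(len(seq)):
        p = partner.get(pos)
        if p is None:
            continue
        if p > pos:
            stack.append(pos)
        elif not stack or stack.pop() != p:
            raise ValueError("arcs cross")
    return partner


def build_tree(seq, pairs):
    """Tree of the secondary structure under a virtual root spanning the string."""
    partner = validate(seq, pairs)
    root = Node(-1, len(seq))
    stack = [root]                      # innermost open arc is on top
    for pos in range(len(seq)):
        p = partner.get(pos)
        if p is None:                   # unpaired base: leaf of the current arc
            stack[-1].children.append(Node(pos, pos))
        elif p > pos:                   # left end: open a new arc node
            node = Node(pos, p)
            stack[-1].children.append(node)
            stack.append(node)
        else:                           # right end: close the innermost arc
            stack.pop()
    return root


def render(node, seq):
    """Show an arc as left-base(children)right-base; leaves are plain bases."""
    inner = "".join(render(child, seq) for child in node.children)
    if node.left < 0:                   # virtual root
        return inner
    if node.left == node.right:
        return seq[node.left]
    return f"{seq[node.left]}({inner}){seq[node.right]}"
```

For example, `render(build_tree("GGAAACC", [(0, 6), (1, 5)]), "GGAAACC")` yields `G(G(AAA)C)C`, whereas the pairs `[(0, 2), (1, 3)]` on `GGCC` are rejected because the two arcs cross. The stack scan is what makes the tree well defined: it succeeds exactly when every arc closes inside the arc that encloses it.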
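
Aligning a sequence of unknown structure to one whose sequence and structure are known can be illustrated with a small dynamic program. This is not the authors' algorithm; it is a minimal sketch under stated assumptions: the known structure is written in dot-bracket notation, the scores `MATCH`, `MISMATCH`, `GAP` and `ARC_BONUS` are illustrative values, and `align_to_structure` is a hypothetical name. An arc of the known structure earns a bonus when both of its ends are aligned to bases that form an A-U or C-G pair, and the recurrence uses the nesting of arcs to split the problem at each arc.

```python
from functools import lru_cache

# Illustrative scores (assumptions for this sketch, not values from the paper).
MATCH, MISMATCH, GAP, ARC_BONUS = 1, -1, -1, 2
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def partner_map(structure):
    """Partner index for each position of a balanced dot-bracket string (None if unpaired)."""
    stack, partner = [], [None] * len(structure)
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            partner[i], partner[j] = j, i
    return partner


def align_to_structure(t_seq, t_struct, s_seq):
    """Best score aligning s_seq (structure unknown) to t_seq (structure t_struct known)."""
    if len(t_seq) != len(t_struct):
        raise ValueError("sequence and structure lengths differ")
    partner = partner_map(t_struct)

    def sub(x, y):
        return MATCH if x == y else MISMATCH

    @lru_cache(maxsize=None)
    def region(i, j, a, b):
        # Align t[i:j], a region closed under pairing, with s[a:b].
        if i == j:
            return (b - a) * GAP                 # leftover s bases are insertions
        if a == b:
            return (j - i) * GAP                 # leftover t bases are deletions
        best = region(i, j, a + 1, b) + GAP      # s[a] inserted before t[i]
        q = partner[i]
        if q is None:                            # t[i] is unpaired
            best = max(best,
                       region(i + 1, j, a, b) + GAP,
                       region(i + 1, j, a + 1, b) + sub(t_seq[i], s_seq[a]))
        else:                                    # t[i] opens the arc (i, q)
            for m in range(a, b + 1):            # s[a:m] is aligned with t[i..q]
                best = max(best, arc(i, q, a, m) + region(q + 1, j, m, b))
        return best

    @lru_cache(maxsize=None)
    def arc(i, q, a, m):
        # Align arc (i, q) and everything nested inside it with s[a:m].
        best = float("-inf")
        for left in (False, True):               # t[i] aligned to s[a]?
            for right in (False, True):          # t[q] aligned to s[m - 1]?
                start = a + 1 if left else a
                end = m - 1 if right else m
                if start > end:
                    continue
                score = region(i + 1, q, start, end)
                score += sub(t_seq[i], s_seq[a]) if left else GAP
                score += sub(t_seq[q], s_seq[m - 1]) if right else GAP
                if left and right and (s_seq[a], s_seq[m - 1]) in PAIRS:
                    score += ARC_BONUS           # arc kept: ends form an A-U or C-G pair
                best = max(best, score)
        return best

    return region(0, len(t_seq), 0, len(s_seq))
```

With these scores, `align_to_structure("GAAAC", "(...)", "CAAAG")` returns 3, while the same strings with the structure ignored (`"....."`) score only 1: the swapped ends mismatch, but C and G can still pair, so the arc of the known structure is preserved. The recursion depth and the memo tables grow with the input lengths, so the sketch suits short strings only.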

Citations
Journal Article

A general edit distance between RNA structures.

TL;DR: The notion of edit distance is proposed to measure the similarity between two RNA secondary and tertiary structures, by incorporating various edit operations performed on both bases and arcs (i.e., base-pairs).
Proceedings Article

Algorithmic aspects of protein structure similarity

TL;DR: These are the first approximation algorithms with guaranteed error bounds, and NP-completeness results in the literature in the area of protein structure alignment/fold recognition for measures of structure similarity of practical interest.
Journal Article

Accurate multiple sequence-structure alignment of RNA sequences using combinatorial optimization

TL;DR: A graph-based representation for sequence-structure alignments is presented, which is modeled as an integer linear program (ILP) using methods from combinatorial optimization, and results on a recently published benchmark set for RNA alignments are reported.
Book Chapter

Finding Common Subsequences with Arcs and Pseudoknots

TL;DR: The problem of finding the longest common subsequence, on which pairwise sequence comparison algorithms are frequently based, is modified to require common subsequences to preserve the arcs induced by the selected symbol positions, and the resulting problems are analyzed using classical and parameterized complexity.
Dissertation

Algorithms and complexity for annotated sequence analysis

TL;DR: This research describes schemes to combinatorially annotate information onto sequences so that it can be analyzed in tandem with the sequence and the overall result reflects both types of information about the sequence.