Computing similarity between RNA strings

doi:10.1007/3-540-60044-2_30

Book Chapter

Vineet Bafna, +2 more

- pp 1-16

TLDR

This paper defines a notion of alignment between two RNA strings and presents a method for optimally aligning a given RNA sequence with unknown secondary structure to one with known sequence and structure, thus attacking the structure prediction problem in the case when the structure of a closely related sequence is known.

Abstract:

Ribonucleic acid (RNA) strings are strings over the four-letter alphabet {A,C,G,U} with a secondary structure of base-pairing between A-U and C-G pairs in the string. Edges are drawn between two bases that are paired in the secondary structure, and these edges have traditionally been assumed to be noncrossing. The noncrossing base-pairing naturally leads to a tree-like representation of the secondary structure of RNA strings. In this paper, we address several notions of similarity between two RNA strings that take into account both the primary sequence and the secondary base-pairing structure of the strings. We present efficient algorithms for exact matching and approximate matching between two RNA strings. We define a notion of alignment between two RNA strings and devise algorithms based on dynamic programming. We then present a method for optimally aligning a given RNA sequence with unknown secondary structure to one with known sequence and structure, thus attacking the structure prediction problem in the case when the structure of a closely related sequence is known. The techniques employed to prove our results include reductions to well-known string matching problems, allowing wild cards and ranges, and speeding up dynamic programming by using the tree structures implicit in the secondary structure of RNA strings.
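
The tree-like representation mentioned in the abstract can be illustrated with a small sketch. It is not the paper's algorithm: the function name `arc_forest`, the 0-based `(i, j)` index pairs, and the dictionary-based node layout are all assumptions made for this example. The sketch only checks that a set of base pairs uses A-U and C-G pairings, that no base is paired twice, and that no two arcs cross, and then nests the arcs into a forest.

```python
from typing import Dict, List, Tuple

# Base pairings named in the abstract (A-U and C-G), in both orientations.
VALID_PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def arc_forest(seq: str, pairs: List[Tuple[int, int]]) -> List[dict]:
    """Nest noncrossing base-pair arcs into a forest (hypothetical helper).

    Each node is {"arc": (i, j), "children": [...]} with i < j.
    Raises ValueError for an invalid pairing, a reused base, or crossing arcs.
    """
    partner: Dict[int, int] = {}
    for a, b in pairs:
        i, j = min(a, b), max(a, b)
        if i == j or i in partner or j in partner:
            raise ValueError(f"base paired with itself or used twice: {(i, j)}")
        if (seq[i], seq[j]) not in VALID_PAIRS:
            raise ValueError(f"not an A-U or C-G pair: {(i, j)}")
        partner[i], partner[j] = j, i

    roots: List[dict] = []
    stack: List[dict] = []  # arcs whose left end was seen but not their right end
    for pos in range(len(seq)):
        if pos not in partner:
            continue  # unpaired base
        if partner[pos] > pos:
            # Left endpoint: open a new arc nested under the innermost open arc.
            node = {"arc": (pos, partner[pos]), "children": []}
            (stack[-1]["children"] if stack else roots).append(node)
            stack.append(node)
        else:
            # Right endpoint: it must close the innermost open arc,
            # otherwise two arcs interleave (i < k < j < l).
            if stack[-1]["arc"][1] != pos:
                raise ValueError(f"crossing arcs at position {pos}")
            stack.pop()
    return roots


if __name__ == "__main__":
    seq = "GAGAAACAUC"
    for root in arc_forest(seq, [(0, 9), (2, 6), (7, 8)]):
        print(root)
```

A single left-to-right sweep with a stack suffices here because, for noncrossing arcs, every right endpoint closes the most recently opened arc; this nesting is what makes a tree representation possible.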

Citations

Journal Article

A general edit distance between RNA structures.

Tao Jiang, +3 more

- 01 Jan 2002 -

Journal of Computational Biology

TL;DR: The notion of edit distance is proposed to measure the similarity between two RNA secondary and tertiary structures, by incorporating various edit operations performed on both bases and arcs (i.e., base-pairs).

Proceedings Article

Algorithmic aspects of protein structure similarity

Deborah Goldman, +2 more

TL;DR: These are the first approximation algorithms with guaranteed error bounds, and NP-completeness results in the literature in the area of protein structure alignment/fold recognition for measures of structure similarity of practical interest.

Journal Article

Accurate multiple sequence-structure alignment of RNA sequences using combinatorial optimization

Markus Bauer, +2 more

- 27 Jul 2007 -

BMC Bioinformatics

TL;DR: A graph-based representation for sequence-structure alignments is presented, which is modeled as an integer linear program (ILP) using methods from combinatorial optimization, and results on a recently published benchmark set for RNA alignments are presented.

Book Chapter

Finding Common Subsequences with Arcs and Pseudoknots

Patricia A. Evans

TL;DR: The problem of finding the longest common subsequence, on which pairwise sequence comparison algorithms are frequently based, is modified to require common subsequences to preserve the arcs induced by the selected symbol positions, and the modified problem is analyzed using classical and parameterized complexity.

Dissertation

Algorithms and complexity for annotated sequence analysis

Michael R. Fellows, +2 more

TL;DR: This research describes schemes to combinatorially annotate information onto sequences so that the annotation can be analyzed in tandem with the sequence and the overall result reflects both types of information.

References

Journal Article

A general method applicable to the search for similarities in the amino acid sequence of two proteins

Saul B. Needleman, +1 more

- 28 Mar 1970 -

Journal of Molecular Biology

TL;DR: A computer adaptable method for finding similarities in the amino acid sequences of two proteins has been developed and it is possible to determine whether significant homology exists between the proteins to trace their possible evolutionary development.

Journal Article

Identification of common molecular subsequences.

Temple F. Smith, +1 more

- 25 Mar 1981 -

Journal of Molecular Biology

TL;DR: This letter extends the heuristic homology algorithm of Needleman & Wunsch (1970) to find a pair of segments, one from each of two long sequences, such that there is no other pair of segments with greater similarity (homology).

Journal Article

Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information

Michael Zuker, +1 more

- 10 Jan 1981 -

Nucleic Acids Research

TL;DR: A dynamic programming algorithm from applied mathematics is used to fold an RNA molecule into a conformation of minimum free energy, using published values of stacking and destabilizing energies.

Journal Article

Fast Pattern Matching in Strings

Donald E. Knuth, +2 more

- 01 Jun 1977 -

SIAM Journal on Computing

TL;DR: An algorithm is presented which finds all occurrences of one given string within another, in running time proportional to the sum of the lengths of the strings; a theoretical application shows that the set of concatenations of even palindromes, i.e., the language $\{\alpha \alpha ^R\}^*$, can be recognized in linear time.

Journal Article

On finding all suboptimal foldings of an RNA molecule

Michael Zuker

- 07 Apr 1989 -

Science

TL;DR: The mathematical problem of determining how well defined a minimum energy folding is can now be solved and all predicted base pairs that can participate in suboptimal structures may be displayed and analyzed graphically.
