scispace - formally typeset
Open AccessJournal ArticleDOI

Algorithms for the Longest Common Subsequence Problem

Daniel S. Hirschberg
- 01 Oct 1977 - 
- Vol. 24, Iss: 4, pp 664-675
Reads0
Chats0
TLDR
A lgor i thm is appl icable in the genera l case and requi res O ( p n + n log n) t ime for any input strings o f lengths m and n even though the lower bound on T ime of O ( m n ) need not apply to all inputs.
Abstract
We start by def ining conven t ions and t e rmino logy that will be used th roughou t this paper . String C = clc~ ... cp is a subsequence of string A = aja2 "'" am if there is a mapp ing F : {1, 2 . . . . , p} ~ {1, 2, ... , m} such that F(i) = k only if c~ = ak and F is a m o n o t o n e strictly increasing funct ion (i .e. F(i) = u, F(]) = v, and i < j imply that u < v). C can be fo rmed by delet ing m p (not necessari ly ad jacen t ) symbols f rom A . F o r example , " c o u r s e " is a subsequence of " c o m p u t e r sc ience . " Str ing C is a c o m m o n s ubs equence of strings A and B if C is a s u b s e q u e n c e of A and also a subsequence of B. String C is a longest c o m m o n subsequence (abbrev ia ted LCS) of string A and B if C is a c o m m o n subsequence of A and B of maximal length , i .e. there is no c o m m o n subsequence of A and B that has grea te r length. Th roughou t this paper , we assume that A and B are strings of lengths m and n , m _< n , that have an LCS C of (unknown) length p . We assume that the symbols that may appea r in these strings c o m e f rom some a lphabet of size t . A symbol can be s tored in m e m o r y by using log t bits, which we assume will fit in one word of memory . Symbols can be c o m p a r e d (a -< b?) in one t ime unit . The n u m b e r of di f ferent symbols that actual ly appear in string B is def ined to be s (which must be less than n and t). The longest c o m m o n s u b s e q u e n c e prob lem has been solved by using a recurs ion re la t ionship on the length of the solut ion [7, 12, 16, 21]. These are general ly appl icable a lgor i thms that take O ( m n ) t ime for any input strings o f lengths m and n even though the lower bound on t ime of O ( m n ) need not apply to all inputs [2]. We present a lgor i thms that , depend ing on the na ture of the Input, may not requ i re quadra t ic t ime to r ecove r an LCS. The first a lgor i thm is appl icable in the genera l case and requi res O ( p n + n log n) t ime. T h e second a lgor i thm requi res t ime b o u n d e d by O((m + 1 p )p log n). In the c o m m o n special case where p is close to m , this a lgor i thm takes t ime

read more

Content maybe subject to copyright    Report

Citations
More filters
Journal ArticleDOI

The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances

TL;DR: This work implemented 18 recently proposed algorithms in a common Java framework and compared them against two standard benchmark classifiers (and each other) by performing 100 resampling experiments on each of the 85 datasets, indicating that only nine of these algorithms are significantly more accurate than both benchmarks.
Journal ArticleDOI

Computing the fréchet distance between two polygonal curves

TL;DR: As a measure for the resemblance of curves in arbitrary dimensions the authors consider the so-called Frechet-distance, which is compatible with parametrizations of the curves, and for polygonal chains P and Q consisting of p and q edges an algorithm of runtime O(pq log( pq))) measuring the Frechet Distance.
Journal ArticleDOI

The Tree-to-Tree Correction Problem

TL;DR: An algorithm is presented which solves the problem of determining the distance from T to T' as measured by the mlmmum cost sequence of edit operaUons needed to transform T into T'.
Journal ArticleDOI

An O ( ND ) difference algorithm and its variations

TL;DR: A simpleO(ND) time and space algorithm is developed whereN is the sum of the lengths of A andB andD is the size of the minimum edit script forA andB, and the algorithm performs well when differences are small and is consequently fast in typical applications.
Book

Data Mining: The Textbook

TL;DR: This textbook explores the different aspects of data mining from the fundamentals to the complex data types and their applications, capturing the wide diversity of problem domains for data mining issues.
References
More filters
Journal ArticleDOI

A general method applicable to the search for similarities in the amino acid sequence of two proteins

TL;DR: A computer adaptable method for finding similarities in the amino acid sequences of two proteins has been developed and it is possible to determine whether significant homology exists between the proteins to trace their possible evolutionary development.
Book

The Design and Analysis of Computer Algorithms

TL;DR: This text introduces the basic data structures and programming techniques often used in efficient algorithms, and covers use of lists, push-down stacks, queues, trees, and graphs.
Journal ArticleDOI

The String-to-String Correction Problem

TL;DR: An algorithm is presented which solves the string-to-string correction problem in time proportional to the product of the lengths of the two strings.
Related Papers (5)