
Showing papers on "Edit distance published in 1991"


Book ChapterDOI
09 Sep 1991
TL;DR: A scheme in which the text T is first preprocessed so that subsequent searches with different patterns P are fast, finding all approximate occurrences P′ of a pattern P in T such that the edit distance between P and P′ is ≤ k.
Abstract: The problem of finding all approximate occurrences P′ of a pattern string P in a text string T such that the edit distance between P and P′ is ≤ k is considered. We concentrate on a scheme in which T is first preprocessed to make the subsequent searches with different P fast. Two preprocessing methods and the corresponding search algorithms are described. The first is based on suffix automata and is applicable to edit distances with general edit operation costs. The second is a special design for the unit-cost edit distance and is based on q-gram lists. In both cases the preprocessing needs time and space O(|T|). The search algorithms run in the worst case in time O(|P||T|) or O(k|T|), and in the best case in time O(|P|).
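The paper's index-based methods (suffix automata, q-gram lists) are not reproduced here, but the problem they speed up can be sketched with the classical online dynamic program for approximate pattern matching, which matches the stated O(|P||T|) worst-case search bound. The function name and unit costs below are illustrative, not from the paper:

```python
def approx_occurrences(P, T, k):
    """Report end positions j in T where some substring P' ending at j
    has unit-cost edit distance <= k from P (classical column DP, O(|P||T|))."""
    m = len(P)
    # col[i] = min edit distance between P[:i] and some substring of T
    # ending at the current text position.
    col = list(range(m + 1))
    hits = []
    for j, c in enumerate(T):
        prev_diag = col[0]          # value from the previous column, row i-1
        col[0] = 0                  # an occurrence may start at any position
        for i in range(1, m + 1):
            cost = 0 if P[i - 1] == c else 1
            cur = min(col[i] + 1,        # insertion into the pattern
                      col[i - 1] + 1,    # deletion from the pattern
                      prev_diag + cost)  # match or substitution
            prev_diag, col[i] = col[i], cur
        if col[m] <= k:
            hits.append(j)          # an approximate occurrence P' ends here
    return hits
```

For example, `approx_occurrences("abc", "xxabcxx", 0)` reports the single exact occurrence ending at position 4.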

175 citations


Journal Article
TL;DR: Describes a modification of Hart's edit markup software, and a second variation based on a simple edit distance algorithm adapted to a general Southeast Asian font system devised by the author.
Abstract: Hart, Nesbit and Nakayama, and others have described answer markup methods for providing feedback on short answers entered by foreign language learners. These methods are not directly applicable to the languages of Southeast Asia, which are not written in a strictly linear fashion. Instead, these languages contain written tones and vowel fragments which appear above and/or below the main line. Thus a given column may contain several written letters, making unmodified columnwise edit markup misleading. This paper describes the modification of Hart's edit markup software and a second variation based on a simple edit distance algorithm adapted to a general Southeast Asian font system devised by the author.

4 citations


Journal ArticleDOI
TL;DR: Two parallel algorithms for sequence comparison on the Connection Machine 2 are given; the comparison measure is the edit distance, the minimum cost of transforming X into Y via a series of weighted insertions, deletions and substitutions of characters.
Abstract: We give two parallel algorithms for sequence comparison on the Connection Machine 2 (CM-2). The specific comparison measure we compute is the edit distance: given a finite alphabet Σ and two input sequences X ∈ Σ+ and Y ∈ Σ+, the edit distance d(X,Y) is the minimum cost of transforming X into Y via a series of weighted insertions, deletions and substitutions of characters. The edit distance comparison measure is equivalent to or subsumes a broad range of well-known sequence comparison measures. The CM-2 is very fast at performing parallel prefix operations, and our contribution consists of casting the problem in terms of these operations. Our first algorithm computes d(X,Y) using N processors and O(MS) time units, where M = min(|X|,|Y|) + 1, N = max(|X|,|Y|) + 1 and S is the time required for a parallel prefix operation. The second algorithm computes d(X,Y) using NM processors and O((log N log M)(S + R)) time units, where R is the time for a 'router' communication step, one in which each processor is able to read data, in parallel, from the memory of any other processor. Our algorithms can also be applied to several variants of the problem, such as subsequence comparisons, and one-to-many and many-to-many comparisons on 'sequence databases'.
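The CM-2 parallel-prefix formulation is not reproduced here, but the quantity both algorithms compute, the weighted edit distance d(X,Y) defined above, can be stated as a short sequential reference. This is a sketch with illustrative unit default costs, not the paper's implementation:

```python
def edit_distance(X, Y, ins=1, dele=1, sub=1):
    """Sequential reference for d(X, Y): minimum total cost of weighted
    insertions, deletions and substitutions transforming X into Y."""
    n, m = len(X), len(Y)
    # D[i][j] = cost of transforming the prefix X[:i] into the prefix Y[:j]
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + dele   # delete all of X[:i]
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + ins    # insert all of Y[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = 0 if X[i - 1] == Y[j - 1] else sub
            D[i][j] = min(D[i - 1][j] + dele,
                          D[i][j - 1] + ins,
                          D[i - 1][j - 1] + s)
    return D[n][m]
```

The parallel algorithms in the paper compute the same table, but evaluate its antidiagonals (first algorithm) or recursive block combinations (second algorithm) with parallel prefix operations instead of this doubly nested loop.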

2 citations