Search or ask a question

Showing papers on "Edit distance published in 1991"

PDF

Open Access

Book Chapter•DOI•

Two algorithms for approxmate string matching in static texts

[...]

Petteri Jokinen¹, Esko Ukkonen¹•Institutions (1)

University of Helsinki¹

09 Sep 1991

TL;DR: A scheme in which T is first preprocessed to make the subsequent searches with different P fast to find all approximate occurrences P′ of a pattern string P in a text string T such that the edit distance between P and P′ is ≤k.

...read moreread less

Abstract: The problem of finding all approximate occurrences P′ of a pattern string P in a text string T such that the edit distance between P and P′ is ≤k is considered We concentrate on a scheme in which T is first preprocessed to make the subsequent searches with different P fast Two preprocessing methods and the corresponding search algorithms are described The first is based suffix automata and is applicable for edit distances with general edit operation costs The second is a special design for unit cost edit distance and is based on q-gram lists The preprocessing needs in both cases time and space O(|T|) The search algorithms run in the worst case in time O(|P||T|) or O(k|T|), and in the best case in time O(|P|)

...read moreread less

175 citations

Journal Article•

Answer Markup Algorithms for Southeast Asian Languages

[...]

George M. Henry

01 Jan 1991-the CALICO Journal

TL;DR: The modification of Hart's edit markup software is described and a second variation based on a simple edit distance algorithm adapted to a general Southeast Asian font system devised by the author is described.

...read moreread less

Abstract: Hart, Nesbit and Nakayama, and others have described answer markup methods for providing feedback to short answers entered by foreign language learners. These methods are not directly applicable to the languages of Southeast Asia, which are not written in a strictly linear fashion. Instead, these languages contain written tones and vowel fragments which appear above and/ or below the main line. Thus a given column may have several written letters making unmodified columnwise edit markup misleading. This paper describes the modification of Hart's edit markup software and a second variation based on a simple edit distance algorithm adapted to a general Southeast Asian font system devised by the author.

...read moreread less

4 citations

Journal Article•DOI•

Sequence comparison on the connection machine

[...]

Mikhail J. Atallah¹, Scott McFaddin¹•Institutions (1)

Purdue University¹

01 May 1991-Concurrency and Computation: Practice and Experience

TL;DR: Two parallel algorithms for sequence comparison on the Connection Machine 2 are given and the specific comparison measure is the edit distance, which is the minimum cost of transforming X into Y via a series of weighted insertions, deletions and substitutions of characters.

...read moreread less

Abstract: We give two parallel algorithms for sequence comparison on the Connection Machine 2 (CM-2). The specific comparison measure we compute is the edit distance: given a finite alphabet ∑ and two input sequences X ϵ ∑+ and Y ϵ ∑+ the edit distance d(X,Y) is the minimum cost of transforming X into Y via a series of weighted insertions, deletions and substitutions of characters. The edit distance comparison measure is equivalent to or subsumes a broad range of well known sequence comparison measures. The CM-2 is very fast at performing parallel prefix operations. Our contribution consists of casting the problem in terms of these operations. Our first algorithm computes d(X,Y) using N processors and O(M S) time units, where M = min(|X|,||Y|) + 1, N = max(|X|,|Y|) + 1 and S is the time required for a parallel prefix operation. The second algorithm computes d(X,Y) using NM processors and O((log N log M)(S + R)) time units, where R is the time for a ‘router’ communication step—one in which each processor is able to read data, in parallel, from the memory of any other processor. Our algorithms can also be applied to several variants of the problem, such as subsequence comparisons, and one—many and many-many comparisons on 'sequence databases'.

...read moreread less

2 citations