Topic

Edit distance

About: Edit distance is a research topic. Over its lifetime, 2,887 publications have been published within this topic, receiving 71,491 citations.


Papers


Journal Article

On-Line Approximate String Searching Algorithms: Survey and Experimental Results

Panagiotis D. Michailidis¹, Konstantinos G. Margaritis¹

University of Macedonia¹

01 Jan 2002-International Journal of Computer Mathematics

TL;DR: This paper presents a short survey and experimental results for well known sequential approximate string searching algorithms based on different approaches including dynamic programming, deterministic finite automata, filtering, counting and bit parallelism.

Abstract: The problem of approximate string searching comprises two classes of problems: string searching with k mismatches and string searching with k differences. In this paper we present a short survey and experimental results for well known sequential approximate string searching algorithms. We consider algorithms based on different approaches including dynamic programming, deterministic finite automata, filtering, counting and bit parallelism. We compare these algorithms in terms of running time against pattern length and for several values of k for four different kinds of text: binary alphabet, alphabet of size 8, English alphabet and DNA alphabet. Finally, we compare the experimental results of the algorithms with their theoretical complexities.

28 citations
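
The two problem classes named in the survey abstract above can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names are hypothetical, the k-mismatches search is a naive scan, and the k-differences search is the plain dynamic-programming approach (one column per text character, with the top cell fixed at 0 so that an occurrence may start anywhere in the text).

```python
def search_k_mismatches(pattern: str, text: str, k: int) -> list[int]:
    """Start positions where pattern aligns with at most k mismatched characters."""
    m = len(pattern)
    hits = []
    for start in range(len(text) - m + 1):
        window = text[start:start + m]
        mismatches = sum(1 for p, t in zip(pattern, window) if p != t)
        if mismatches <= k:
            hits.append(start)
    return hits


def search_k_differences(pattern: str, text: str, k: int) -> list[int]:
    """End positions (exclusive) of text substrings within edit distance k of pattern."""
    m = len(pattern)
    # col[i]: smallest edit distance between pattern[:i] and any text
    # substring ending at the current text position.
    col = list(range(m + 1))
    hits = []
    for j, tc in enumerate(text, start=1):
        new = [0]  # row 0 stays 0: a match may begin at any text position
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == tc else 1
            new.append(min(col[i] + 1,          # text character is extra
                           new[i - 1] + 1,      # pattern character is missing
                           col[i - 1] + cost))  # match or substitution
        col = new
        if col[m] <= k:
            hits.append(j)
    return hits
```

With k = 0 both functions reduce to exact matching; larger k admits more positions, which is one reason running time is usually reported per value of k.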

Book Chapter

Measuring spelling similarity for cognate identification

Luís Gomes¹, José Gabriel Pereira Lopes¹

Universidade Nova de Lisboa¹

10 Oct 2011

TL;DR: SpSim is a new spelling similarity measure for cognate identification that is tolerant towards characteristic spelling differences that are automatically extracted from a set of cognates known a priori.

Abstract: The most commonly used measures of string similarity, such as the Longest Common Subsequence Ratio (LCSR) and those based on Edit Distance, only take into account the number of matched and mismatched characters. However, we observe that cognates belonging to a pair of languages exhibit recurrent spelling differences such as "ph" and "f" in English-Portuguese cognates "phase" and "fase". Those differences are attributable to the evolution of the spelling rules of each language over time, and thus they should not be penalized in the same way as arbitrary differences found in non-cognate words, if we are using word similarity as an indicator of cognaticity. This paper describes SpSim, a new spelling similarity measure for cognate identification that is tolerant towards characteristic spelling differences that are automatically extracted from a set of cognates known a priori. Compared to LCSR and EdSim (Edit Distance-based similarity), SpSim yields an F-measure 10% higher when used for cognate identification on five different language pairs.

28 citations
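
For reference, the LCSR baseline mentioned in the abstract above can be sketched as follows. This assumes the usual definition (length of the longest common subsequence divided by the length of the longer string); the function names and the convention for two empty strings are illustrative assumptions, and this is not an implementation of SpSim.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)          # extend a common subsequence
            else:
                curr.append(max(prev[j], curr[j - 1]))  # skip one character
        prev = curr
    return prev[-1]


def lcsr(a: str, b: str) -> float:
    """Longest Common Subsequence Ratio: LCS length over the longer string's length."""
    if not a and not b:
        return 1.0  # assumed convention: two empty strings are identical
    return lcs_length(a, b) / max(len(a), len(b))
```

For the cognate pair "phase" and "fase" the longest common subsequence is "ase", so the ratio is 3/5. The "ph"/"f" difference is penalized like any other mismatch, which is the behaviour the abstract argues against.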

Proceedings Article

Learning String Edit Distance

Eric Sven Ristad¹, Peter N. Yianilos¹

Princeton University¹

08 Jul 1997

TL;DR: In this paper, a stochastic model for string-edit distance is proposed, which is applicable to any string classification problem that may be solved using a similarity function against a database of labeled prototypes.

Abstract: In many applications, it is necessary to determine the similarity of two strings. A widely-used notion of string similarity is the edit distance: the minimum number of insertions, deletions, and substitutions required to transform one string into the other. In this report, we provide a stochastic model for string-edit distance. Our stochastic model allows us to learn a string-edit distance function from a corpus of examples. We illustrate the utility of our approach by applying it to the difficult problem of learning the pronunciation of words in conversational speech. In this application, we learn a string-edit distance with nearly one-fifth the error rate of the untrained Levenshtein distance. Our approach is applicable to any string classification problem that may be solved using a similarity function against a database of labeled prototypes.

28 citations
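
The definition of edit distance given in the abstract above (the minimum number of insertions, deletions, and substitutions) corresponds to the untrained Levenshtein distance. A minimal sketch of the standard unit-cost dynamic program follows; it is the baseline only, not the learned stochastic model, and the function name is an illustrative choice.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance: insertions, deletions, and substitutions."""
    # prev[j] is the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]
```

Keeping only two rows uses memory proportional to the shorter dimension's length plus one, at the cost of not being able to recover the edit script afterwards.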

Patent

Method for computing the minimum edit distance with fine granularity suitably quickly

Peter Klier

28 Apr 2005

TL;DR: This patent uses A* (A-star) search with a novel counting heuristic that gives a lower bound on the minimum edit distance for any given subproblem.

Abstract: This invention relates to a method for computing the minimum edit distance, measured as the number of insertions plus the number of deletions, between two sequences of data, which runs in an amount of time that is nearly proportional to the size of the input data under many circumstances. Utilizing the A* (or A-star) search, the invention searches for the answer using a novel counting heuristic that gives a lower bound on the minimum edit distance for any given subproblem. In addition, regions over which the heuristic matches the maximum value of the answer are optimized by eliminating the search over redundant paths. The invention can also be used to produce the edit script. The invention can be modified for other types of comparison and pattern recognition.

28 citations
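
The distance measured in the patent abstract above counts insertions plus deletions only, with no substitutions. As a point of comparison, here is a minimal sketch of the plain quadratic-time dynamic program for that measure. It is a baseline under the stated definition, not the patented A* method, whose counting heuristic is not specified here; the function name is hypothetical.

```python
def indel_distance(a: str, b: str) -> int:
    """Minimum number of insertions plus deletions turning a into b."""
    # prev[j] is the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # delete all of a[:i] to reach the empty string
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1])  # matching characters cost nothing
            else:
                # either delete ca or insert cb; substitution is not allowed
                curr.append(min(prev[j], curr[j - 1]) + 1)
        prev = curr
    return prev[-1]
```

Because a substitution has to be expressed as one deletion plus one insertion, this distance equals len(a) + len(b) minus twice the length of the longest common subsequence.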

Journal Article

VLSI algorithms for solving recurrence equations and applications

Oscar H. Ibarra¹, Michael A. Palis

University of Minnesota¹

01 Jul 1987-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: Optimal linear-time algorithms for solving recurrence equations on simple systolic arrays are presented and applications to some pattern recognition and sequence comparison problems are given.

Abstract: Optimal linear-time algorithms for solving recurrence equations on simple systolic arrays are presented. The systolic arrays use only one-way communication between processors and communicate with the external environment through only one I/O port. Because of their architectural simplicity, the arrays are well suited for direct VLSI implementation. Applications to some pattern recognition and sequence comparison problems are given. For example, it is shown that the set of (k + 2)-tuples of strings (x_1, ..., x_{k+1}, y) such that y is a shuffle of x_1, ..., x_{k+1} can be recognized by a one-way k-dimensional systolic array in (k + 1)n - k time. The longest common subsequence (LCS) problem and the string-to-string correction problem are also considered: the length of an LCS of k + 1 sequences can be computed by a one-way k-dimensional systolic array in (k + 1)n - k time; the edit distance between two strings can be computed by a one-way one-dimensional systolic array in 2n - 1 time. Applications to other related problems, e.g., dynamic time warping and optimum generalized alignment, as well as optimal-time simulations of multihead acceptors and multitape transducers are also given.

28 citations


Metrics

Papers: 3,030
Citations: 78,281

No. of papers in the topic in previous years
Year	Papers
2023	39
2022	96
2021	111
2020	149
2019	145
2018	139
