
Edit distance

About: Edit distance is a research topic. Over its lifetime, 2,887 publications have been published within this topic, receiving 71,491 citations.



Papers


Book Chapter

Faster and Space-Optimal Edit Distance 1 Dictionary


Djamal Belazzougui¹

École Normale Supérieure¹

18 Jun 2009

TL;DR: This paper proposes the first data structure for approximate dictionary search that occupies optimal space (up to a constant factor) and is able to answer an approximate query for edit distance "1" (report all strings of the dictionary that are at edit distance at most "1" from the query string) in time linear in the length of the query string.


Abstract: In the approximate dictionary search problem, we have to construct a data structure on a set of strings so that we can answer queries of the kind: find all strings of the set that are similar (according to some string distance) to a given string. In this paper, we propose the first data structure for approximate dictionary search that occupies optimal space (up to a constant factor) and is able to answer an approximate query for edit distance "1" (report all strings of the dictionary that are at edit distance at most "1" from the query string) in time linear in the length of the query string. Based on our new dictionary, we propose a full-text index for approximate queries with edit distance "1" (report all positions of all substrings of the text that are at edit distance at most "1" from the query string) that answers a query in time linear in the length of the query string using space $O(n(\lg(n)\lg\lg(n))^2)$ in the worst case on a text of length $n$. Our index is the first index that answers queries in time linear in the length of the query string while using space $O(n \cdot \mathrm{poly}(\log n))$ in the worst case and for any alphabet size.
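For orientation, the query semantics (not the paper's space-optimal structure) can be illustrated with a naive baseline: enumerate every string within edit distance 1 of the query (deletions, substitutions, insertions) and look each one up in a hash set of dictionary words. This is a sketch under stated assumptions: the function names and the explicit `alphabet` parameter are hypothetical, and for a query of length m over an alphabet of size σ it builds about O(mσ) candidates of length O(m), so it is far from the linear query time the paper achieves.

```python
def edit1_neighbors(query, alphabet):
    """All strings at edit distance <= 1 from `query` over `alphabet` (naive baseline)."""
    out = {query}  # distance 0
    for i in range(len(query)):
        out.add(query[:i] + query[i + 1:])  # delete query[i]
        for c in alphabet:
            if c != query[i]:
                out.add(query[:i] + c + query[i + 1:])  # substitute query[i] by c
    for i in range(len(query) + 1):
        for c in alphabet:
            out.add(query[:i] + c + query[i:])  # insert c before position i
    return out


def approx_lookup(dictionary, query, alphabet):
    """Report dictionary words at edit distance <= 1 from `query` (hypothetical helper)."""
    words = set(dictionary)
    return sorted(w for w in edit1_neighbors(query, alphabet) if w in words)


print(approx_lookup(["cat", "cart", "dog", "at", "cut"], "cat", "acdgortu"))
```

Here "cart" is found by an insertion, "at" by a deletion, and "cut" by a substitution, while "dog" is rejected.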


26 citations

Journal Article

Analysing differences among animal songs quantitatively by means of the Levenshtein distance measure


Jakob Tougaard, Nina Eriksen

01 Jan 2006, Behaviour

TL;DR: This analysis is extended, and a first approach to a robust statistical test is developed that addresses the central issue of whether two groups of songs belong to the same population of songs or are significantly different.


Abstract: The Levenshtein or string edit distance is an objective measure of the difference between two strings of elements. Levenshtein distance analysis has previously been applied to humpback whale songs, where it provided a quantitative measure of song change from year to year. This analysis is extended, and a first approach to a robust statistical test is developed. The statistical test addresses the central issue of whether two groups of songs (from different individuals, different groups, or different years) belong to the same population of songs or are significantly different. This is accomplished through derivation of the Kohonen median song sequence, which has the smallest possible summed Levenshtein distance to all songs of the group. A simple t-test or nonparametric equivalent is then used to test whether the median distance to the Kohonen median song sequence of a second group is significantly larger, which indicates that the groups are different. The test is expanded to handle multiple comparisons among several groups of songs.
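The ingredients of the test can be sketched as follows, assuming each song is encoded as a sequence of unit labels (a string or list). One simplification is made explicit: the Kohonen median is a generalized median that need not be one of the observed songs, whereas the hypothetical `set_median` below only searches within the group (the "set median"). The resulting distance lists could then be compared with a t-test or a nonparametric equivalent, for example from `scipy.stats`; that step is omitted here.

```python
def levenshtein(a, b):
    """Unit-cost edit distance between two sequences (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]


def set_median(songs):
    """Group member with the smallest summed distance to all songs (approximates the Kohonen median)."""
    return min(songs, key=lambda s: sum(levenshtein(s, t) for t in songs))


def distances_to_median(group, median):
    """Distances of each song in `group` to a reference median song."""
    return [levenshtein(s, median) for s in group]


group_a = ["ABC", "ABD", "ABC", "XBC"]
median_a = set_median(group_a)
print(median_a, distances_to_median(["ABD", "XYZ"], median_a))
```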


26 citations

Proceedings Article

Context dependent phonetic string edit distance for automatic speech recognition


Jasha Droppo¹, Alex Acero¹

Microsoft¹

14 Mar 2010

TL;DR: The paper shows how this phonetic string edit distance can be learned from data and that including context in the model is essential for good performance, and it demonstrates improved accuracy on a business search task.


Abstract: An automatic speech recognition system searches for the word transcription with the highest overall score for a given acoustic observation sequence. This overall score is typically a weighted combination of a language model score and an acoustic model score. We propose including a third score, which measures the similarity of the word transcription's pronunciation to the output of a less constrained phonetic recognizer. We show how this phonetic string edit distance can be learned from data, and that including context in the model is essential for good performance. We demonstrate improved accuracy on a business search task.
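To make the idea of a phonetic string edit distance concrete, here is a minimal weighted edit distance over phone sequences in which substitution costs come from a lookup table. This is an illustration under explicit assumptions, not the authors' model: the phone symbols and the `CONFUSION_COSTS` table are hypothetical hand-picked values (in the paper the costs are learned from data), and unlike the paper's model these costs ignore the surrounding phonetic context.

```python
# Hypothetical substitution costs for acoustically confusable phones (not from the paper).
CONFUSION_COSTS = {("p", "b"): 0.3, ("b", "p"): 0.3, ("t", "d"): 0.3, ("d", "t"): 0.3}


def sub_cost(x, y):
    """Context-independent substitution cost; unlisted pairs cost 1.0."""
    return CONFUSION_COSTS.get((x, y), 1.0)


def weighted_edit_distance(ref, hyp, sub_cost=sub_cost, ins_cost=1.0, del_cost=1.0):
    """Minimum total cost to turn phone sequence `ref` into `hyp`."""
    n, m = len(ref), len(hyp)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + del_cost
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + ins_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = 0.0 if ref[i - 1] == hyp[j - 1] else sub_cost(ref[i - 1], hyp[j - 1])
            d[i][j] = min(d[i - 1][j] + del_cost,  # delete ref[i-1]
                          d[i][j - 1] + ins_cost,  # insert hyp[j-1]
                          d[i - 1][j - 1] + s)     # match or substitute
    return d[n][m]


# A pronunciation differing only by a confusable phone scores closer than an arbitrary one.
print(weighted_edit_distance(["b", "ae", "t"], ["p", "ae", "t"]))
print(weighted_edit_distance(["b", "ae", "t"], ["k", "ae", "t"]))
```

With learned costs, a score like this could serve as the third term alongside the language and acoustic model scores described above.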


26 citations

Journal Article

Optimal Symbol Alignment Distance: A New Distance for Sequences of Symbols


Javier Herranz, Jordi Nin, Marc Solé

01 Oct 2011, IEEE Transactions on Knowledge and Data Engineering

TL;DR: This paper proposes a new distance for sequences of symbols (or strings), called the Optimal Symbol Alignment distance (OSA distance, for short), whose very low cost in practice makes it a suitable candidate for computing distances in applications with large amounts of sequences.


Abstract: Comparison functions for sequences (of symbols) are important components of many applications, for example, clustering, data cleansing, and integration. For years, many efforts have been made to improve the performance of such comparison functions. Improvements have been made either at the cost of reducing the accuracy of the comparison, or by compromising certain basic properties of the functions, such as the triangle inequality. In this paper, we propose a new distance for sequences of symbols (or strings) called the Optimal Symbol Alignment distance (OSA distance, for short). This distance has a very low cost in practice, which makes it a suitable candidate for computing distances in applications with large amounts of (very long) sequences. After providing a mathematical proof that the OSA distance is a true distance, we present experiments for different scenarios (DNA sequences, record linkage, etc.), showing that the proposed distance outperforms, in terms of execution time and/or accuracy, other well-known comparison functions such as the edit and Jaro-Winkler distances.


26 citations

Patent

Character string updated degree evaluation program


Masayuki Takahashi¹, Yoshiki Mikami¹, Katsuko T. Nakahira¹

Nagaoka University of Technology¹

18 May 2007

TL;DR: The authors provide a character string updated degree evaluation program that enables quantitative assessment of the amount of intellectual work performed through editing and updating of character strings; a text subjected to comparison is divided into common part character strings, each with a length greater than or equal to a threshold value, and non-common part character strings.


Abstract: A character string updated degree evaluation program is provided that enables quantitative assessment of the amount of intellectual work performed through editing and updating of character strings. A text subjected to comparison is divided into common part character strings, each having a length greater than or equal to a threshold value, and non-common part character strings. The number of edited points relative to the original text and a context edit distance are calculated based on the rate of the common part character strings and their occurrence pattern. The number of edited points is acquired from the number of elements contained in the common part character string set, and the context edit distance is acquired from a change in the order of occurrence of the common part character strings. Calculation of a new creation percentage and analysis by N-gram are performed on the non-common part character strings. The new creation percentage is acquired from the total length of the elements contained in the non-common part character string set, and a new creation novelty degree is acquired from a non-partial matching rate between the non-common part character string set and an element contained in the non-common part character string set. The calculations for the common part character string set and for the non-common part character string set are combined to calculate a text updated degree.
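The first step described above (splitting a revised text into common parts of at least a threshold length and non-common parts, then deriving a new creation percentage from the non-common length) can be approximated with Python's standard `difflib`. This is a hypothetical sketch, not the patented procedure: `SequenceMatcher` is a swapped-in matching technique, the function name and `threshold` default are illustrative, the count of common parts is only a rough proxy for "edited points", and the context edit distance and N-gram novelty analysis are not modeled.

```python
from difflib import SequenceMatcher


def update_degree_sketch(original, revised, threshold=3):
    """Split `revised` into common parts (shared with `original`, length >= threshold)
    and non-common parts; report a hypothetical new creation percentage."""
    sm = SequenceMatcher(None, original, revised, autojunk=False)
    # Matching blocks are (a, b, size) triples; the final dummy block has size 0.
    common = [revised[m.b:m.b + m.size]
              for m in sm.get_matching_blocks() if m.size >= threshold]
    common_len = sum(len(s) for s in common)
    new_len = len(revised) - common_len  # characters not covered by any common part
    pct = 100.0 * new_len / len(revised) if revised else 0.0
    return {"common_parts": common, "new_creation_percentage": pct}


result = update_degree_sketch("the quick brown fox", "the quick red fox")
print(result["common_parts"], round(result["new_creation_percentage"], 1))
```

Here "the quick " and " fox" survive as common parts, while the short incidental match "r" falls below the threshold and is counted as newly created text.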


26 citations


Metrics

Papers: 3,030
Citations: 78,281

No. of papers in the topic in previous years
Year	Papers
2023	39
2022	96
2021	111
2020	149
2019	145
2018	139
