Topic

Edit distance

About: Edit distance is a research topic. Over its lifetime, 2,887 publications have appeared within this topic, receiving 71,491 citations.

Papers

Proceedings Article

Indexing mixed types for approximate retrieval

Liang Jin¹, Chen Li¹, Nick Koudas², Anthony K. H. Tung³

University of California, Irvine¹, University of Toronto², National University of Singapore³

30 Aug 2005

TL;DR: The approach presented is based on representing sets of strings at higher levels of the index structure as tries suitably compressed in a way that reasoning about edit distance between a query string and a compressed trie at index nodes is still feasible.

Abstract: In various applications such as data cleansing, being able to retrieve categorical or numerical attributes based on notions of approximate match (e.g., edit distance, numerical distance) is of profound importance. Commonly, approximate match predicates are specified on combinations of attributes in conjunction. Existing database techniques for approximate retrieval, however, limit their applicability to single attribute retrieval through B-trees and their variants. In this paper, we propose a methodology that utilizes known multidimensional indexing structures for the problem of approximate multi-attribute retrieval. Our method enables indexing of a collection of string and/or numeric attributes to facilitate approximate retrieval using edit distance as an approximate match predicate for strings and numeric distance for numeric attributes. The approach presented is based on representing sets of strings at higher levels of the index structure as tries suitably compressed in a way that reasoning about edit distance between a query string and a compressed trie at index nodes is still feasible. We propose and evaluate various techniques to generate the compressed trie representation and fully specify our indexing methodology. Our experimental results show the benefits of our proposal when compared with various alternate strategies for the same problem.

24 citations
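
The entry above uses edit distance as an approximate match predicate over strings organized in tries. As a rough illustration of why a trie helps, the sketch below searches a plain, uncompressed trie and carries one dynamic-programming row per node, pruning a subtree once no entry in the row is within the threshold. This is a simpler, well-known technique, not the compressed-trie index the paper proposes; the names `TrieNode`, `build_trie`, and `search` are hypothetical.

```python
class TrieNode:
    """Node of a plain (uncompressed) trie over characters."""

    def __init__(self):
        self.children = {}
        self.word = None  # set to the stored string when one ends here


def build_trie(words):
    """Insert every string into a fresh trie and return its root."""
    root = TrieNode()
    for w in words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, TrieNode())
        node.word = w
    return root


def search(root, query, max_dist):
    """Return sorted (word, distance) pairs with edit distance <= max_dist."""
    results = []
    # Row for the empty prefix: distance from "" to each prefix of the query.
    first_row = list(range(len(query) + 1))

    def visit(node, ch, prev_row):
        # Extend the DP table by one row for the character on this trie edge.
        row = [prev_row[0] + 1]
        for j in range(1, len(query) + 1):
            cost = 0 if query[j - 1] == ch else 1
            row.append(min(row[j - 1] + 1,          # insertion
                           prev_row[j] + 1,         # deletion
                           prev_row[j - 1] + cost)) # match or substitution
        if node.word is not None and row[-1] <= max_dist:
            results.append((node.word, row[-1]))
        # The row minimum never decreases deeper in the trie, so a subtree
        # whose row minimum exceeds the threshold cannot contain a match.
        if min(row) <= max_dist:
            for c, child in node.children.items():
                visit(child, c, row)

    if root.word is not None and first_row[-1] <= max_dist:
        results.append((root.word, first_row[-1]))
    for c, child in root.children.items():
        visit(child, c, first_row)
    return sorted(results)
```

For example, with the stored strings `irvine`, `irving`, `toronto`, and `ervine`, a query for `irvine` with threshold 1 returns the three strings other than `toronto`. Shared prefixes are processed once, which is the property an index over sets of strings can exploit.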

Journal Article

Algorithms for transposition invariant string matching

Veli Mäkinen, Gonzalo Navarro, Esko Ukkonen

01 Jan 2003-Lecture Notes in Computer Science

TL;DR: In this paper, the problem of computing the transposition invariant distance for various distance functions d, that are different versions of the edit distance, was studied, and algorithms whose time complexities are close to the known upper bounds were given.

Abstract: Given strings A and B over an alphabet $\Sigma \subseteq U$, where U is some numerical universe closed under addition and subtraction, and a distance function d(A, B) that gives the score of the best (partial) matching of A and B, the transposition invariant distance is $\min_{t \in U} d(A + t, B)$, where $A + t = (a_1 + t)(a_2 + t) \cdots (a_m + t)$. We study the problem of computing the transposition invariant distance for various distance (and similarity) functions d, that are different versions of the edit distance. For all these problems we give algorithms whose time complexities are close to the known upper bounds without transposition invariance. In particular, we show how sparse dynamic programming can be used to solve transposition invariant problems.

24 citations
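
To make the definition in the abstract above concrete, here is a naive sketch of the transposition invariant distance when d is the unit-cost edit distance over integer sequences. It relies on one observation: a shift t that is not of the form b_j - a_i produces no matching symbols, so it cannot beat the candidates that are of that form. The paper's sparse dynamic programming algorithms are much faster than this; the function names are hypothetical.

```python
def edit_distance(a, b):
    """Unit-cost edit distance between two sequences, two-row DP."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,         # delete x
                            curr[j - 1] + 1,     # insert y
                            prev[j - 1] + cost)) # match or substitute
        prev = curr
    return prev[-1]


def transposition_invariant_distance(a, b):
    """Minimum over shifts t of edit_distance(a + t, b), by brute force.

    Only shifts that align some a_i with some b_j can create a match,
    so those are the only candidates tried. If either sequence is
    empty, the shift is irrelevant and t = 0 is used.
    """
    candidates = {y - x for x in a for y in b} or {0}
    return min(edit_distance([x + t for x in a], b) for t in candidates)
```

With pitch sequences such as `[60, 62, 64]` and `[65, 67, 69]`, the shift t = 5 makes the two identical, so the distance is 0. The brute force tries up to mn shifts with an O(mn) computation each, which is the cost the paper's algorithms are designed to avoid.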

Lei Chen¹

University of Waterloo¹

01 Jan 2005

TL;DR: Various similarity models are proposed to capture the similarities among time series and trajectory data under various circumstances and requirements, such as the appearance of noise and local time shifting.

Abstract: Time series data have been used in many applications, such as financial data analysis and weather forecasting. Similarly, trajectories of moving objects are often used to perform movement pattern analysis in surveillance video and sensor monitoring systems. All these applications are closely related to similarity-based time series or trajectory data retrieval. In this dissertation, various similarity models are proposed to capture the similarities among time series and trajectory data under various circumstances and requirements, such as the appearance of noise and local time shifting. A novel representation, called multi-scale time series histograms, is proposed to answer pattern existence queries and shape match queries. Earlier proposals generally address one or the other; multi-scale time series histograms can answer both types, which offers users more flexibility. A metric distance function, called Edit distance with Real Penalty (ERP), is proposed that can support local time shifting in time series and trajectory data. A second distance function, Edit Distance on Real sequence (EDR), is proposed to measure the similarity between time series or trajectories with local time shifting and noise. Since the proposed similarity models are computationally expensive, several indexing and pruning methods are proposed to improve the retrieval efficiency. For multi-scale time series histograms, a multi-step filtering process is introduced to improve the retrieval efficiency without introducing false dismissals. For ERP, a framework is developed to index time series or trajectory data under a metric distance function, which exploits the pruning power of lower bounding and the triangle inequality. For EDR, three pruning techniques (mean value Q-grams, near triangle inequality, and trajectory histograms) are developed to improve the retrieval efficiency.

24 citations
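
The abstract above names ERP and EDR without giving their recurrences. The sketch below follows the commonly cited definitions for one-dimensional sequences and is an assumption in that sense, since the listing does not spell them out: EDR counts unit-cost edits where two values match if they differ by at most a tolerance `eps`, and ERP charges the real difference for an alignment and the distance to a fixed reference value `gap` for an unmatched element. The function and parameter names are hypothetical.

```python
def edr(r, s, eps):
    """Edit Distance on Real sequence for 1-D sequences (assumed form).

    Two values match (cost 0) when they differ by at most eps; any
    mismatch, insertion, or deletion costs 1, so an isolated noisy
    point costs at most one edit.
    """
    m, n = len(r), len(s)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if abs(r[i - 1] - s[j - 1]) <= eps else 1
            d[i][j] = min(d[i - 1][j - 1] + sub,
                          d[i - 1][j] + 1,
                          d[i][j - 1] + 1)
    return d[m][n]


def erp(r, s, gap=0.0):
    """Edit distance with Real Penalty for 1-D sequences (assumed form).

    Aligned values cost |r_i - s_j|; an element left unmatched costs
    its absolute difference from the constant reference value gap.
    """
    m, n = len(r), len(s)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + abs(r[i - 1] - gap)
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + abs(s[j - 1] - gap)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j - 1] + abs(r[i - 1] - s[j - 1]),
                          d[i - 1][j] + abs(r[i - 1] - gap),
                          d[i][j - 1] + abs(s[j - 1] - gap))
    return d[m][n]
```

In this sketch an outlier such as the `100` in `[1, 100, 2, 3]` costs EDR a single edit against `[1, 2, 3]`, whereas under ERP the same outlier contributes its full magnitude relative to the reference value. Both run in O(mn) time, which is why the dissertation pairs them with indexing and pruning methods.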

Book Chapter

ELKI in Time: ELKI 0.2 for the Performance Evaluation of Distance Measures for Time Series

Elke Achtert¹, Thomas Bernecker¹, Hans-Peter Kriegel¹, Erich Schubert¹, Arthur Zimek¹

Ludwig Maximilian University of Munich¹

30 Jun 2009

TL;DR: ELKI 0.2 extends the framework to time series data and offers a selection of specialized distance measures; it can serve as a visualization and evaluation tool for the behavior of different distance measures on time series data.

Abstract: ELKI is a unified software framework, designed as a tool suitable for evaluation of different algorithms on high dimensional real-valued feature-vectors. A special case of high dimensional real-valued feature-vectors are time series data where traditional distance measures like $L_p$-distances can be applied. However, also a broad range of specialized distance measures like, e.g., dynamic time-warping, or generalized distance measures like second order distances, e.g., shared-nearest-neighbor distances, have been proposed. The new version ELKI 0.2 now is extended to time series data and offers a selection of these distance measures. It can serve as a visualization- and evaluation-tool for the behavior of different distance measures on time series data.

23 citations
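
The abstract above contrasts traditional $L_p$-distances with specialized measures such as dynamic time warping. The sketch below shows a minimal version of each to highlight the difference: an $L_p$-distance compares values position by position and needs equal lengths, while dynamic time warping may align one value with several neighbors. This is not ELKI code or its API (ELKI is a Java framework); the function names are hypothetical, and the warping cost assumes the absolute difference as the local cost.

```python
import math


def minkowski(x, y, p=2.0):
    """L_p distance between two equal-length real-valued vectors."""
    if len(x) != len(y):
        raise ValueError("L_p distances require equal-length inputs")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def dtw(x, y):
    """Dynamic time warping cost with |x_i - y_j| as the local cost."""
    m, n = len(x), len(y)
    d = [[math.inf] * (n + 1) for _ in range(m + 1)]
    d[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Each step advances in x, in y, or in both, so one value
            # may be matched against a run of values in the other series.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[m][n]
```

For instance, `dtw([1, 2, 3], [1, 2, 2, 3])` is 0 because the repeated 2 is absorbed by warping, while `minkowski` rejects the pair outright since the lengths differ.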

Journal Article

Unified Compression-Based Acceleration of Edit-Distance Computation

Danny Hermelin¹, Gad M. Landau², Shir Landau³, Oren Weimann²

Max Planck Society¹, University of Haifa², Tel Aviv University³

01 Feb 2013-Algorithmica

TL;DR: For two strings of total length N whose straight-line program representations have total size n, this paper presents an algorithm running in $O(nN \lg(N/n))$ time for computing their edit distance under any rational scoring function, and an $O(n^{2/3} N^{4/3})$ time algorithm for arbitrary scoring functions.

Abstract: The edit distance problem is a classical fundamental problem in computer science in general, and in combinatorial pattern matching in particular. The standard dynamic programming solution for this problem computes the edit-distance between a pair of strings of total length $O(N)$ in $O(N^2)$ time. To this date, this quadratic upper-bound has never been substantially improved for general strings. However, there are known techniques for breaking this bound in case the strings are known to compress well under a particular compression scheme. The basic idea is to first compress the strings, and then to compute the edit distance between the compressed strings. As it turns out, practically all known $o(N^2)$ edit-distance algorithms work, in some sense, under the same paradigm described above. It is therefore natural to ask whether there is a single edit-distance algorithm that works for strings which are compressed under any compression scheme. A rephrasing of this question is to ask whether a single algorithm can exploit the compressibility properties of strings under any compression method, even if each string is compressed using a different compression. In this paper we set out to answer this question by using straight line programs. These provide a generic platform for representing many popular compression schemes including the LZ-family, Run-Length Encoding, Byte-Pair Encoding, and dictionary methods. For two strings of total length N having straight-line program representations of total size n, we present an algorithm running in $O(nN \lg(N/n))$ time for computing the edit-distance of these two strings under any rational scoring function, and an $O(n^{2/3} N^{4/3})$ time algorithm for arbitrary scoring functions. Our new result, while providing a speed up for compressible strings, does not surpass the quadratic time bound even in the worst case scenario.

23 citations
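
For reference, the standard quadratic-time dynamic program that the abstract above takes as its baseline can be sketched as follows, assuming unit costs for insertion, deletion, and substitution. It is only the uncompressed baseline, not the straight-line-program algorithm the paper contributes; the function name is hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance via the classic DP, keeping two rows.

    Time is O(len(a) * len(b)); memory is O(min(len(a), len(b))).
    """
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string on the inner axis
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,         # delete ca
                            curr[j - 1] + 1,     # insert cb
                            prev[j - 1] + cost)) # match or substitute
        prev = curr
    return prev[-1]
```

Every cell of the implicit table is visited once, which gives the quadratic bound. Compression-based methods aim to avoid recomputing the parts of that table that correspond to repeated substrings.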

Metrics

Papers: 3,030
Citations: 78,281

No. of papers in the topic in previous years
Year	Papers
2023	39
2022	96
2021	111
2020	149
2019	145
2018	139
