Topic

Edit distance

About: Edit distance is a research topic. Over its lifetime, 2,887 publications have been published within this topic, receiving 71,491 citations.

Papers

Patent

Edit distance string search

Eric Theodore Bax¹, Ian Douglas Swett¹

Avaya¹

09 Feb 2004

TL;DR: A process determines, for a search string, which (if any) of the strings in a text list have edit distance from the search string less than a threshold, using dynamic programming.

Abstract: A process determines for a search string which, if any, of the strings in a text list have edit distance from the search string less than a threshold. The process uses dynamic programming on a grid with search string characters corresponding to rows and text characters corresponding to columns. For each text string, computation proceeds by columns. If successive text strings share a prefix, then the columns corresponding to the prefix are re-used. If the minimum value in a column is at least the threshold, then the prefix corresponding to that and previous columns causes edit distance to be at least the threshold. So the computation for the present text is abandoned, and computations for any other texts that share the prefix are avoided.
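
A minimal Python sketch of the column-wise scheme described in this abstract may help. It is an illustration under stated assumptions, not the patented process itself: the names `next_column` and `search_within_threshold` are hypothetical, unit costs are assumed for insertion, deletion, and substitution, and the text list is sorted here so that strings sharing a prefix become successive.

```python
def next_column(prev, query, ch):
    """Compute one DP column (rows = query characters) for text character ch."""
    col = [prev[0] + 1]  # row 0: distance from the empty query prefix
    for i in range(1, len(query) + 1):
        cost = 0 if query[i - 1] == ch else 1
        col.append(min(prev[i] + 1,          # extra text character
                       col[i - 1] + 1,       # extra query character
                       prev[i - 1] + cost))  # match or substitution
    return col


def search_within_threshold(query, texts, threshold):
    """Return the texts (in sorted order) whose edit distance to query is < threshold.

    Assumption: sorting makes texts with a common prefix adjacent, so the
    columns already computed for that prefix can be reused.
    """
    matches = []
    columns = [list(range(len(query) + 1))]  # column for the empty text prefix
    prev = ""
    dead_prefix = None  # prefix already known to force distance >= threshold
    for text in sorted(texts):
        if dead_prefix is not None and text.startswith(dead_prefix):
            continue  # avoided: shares a prefix that was already abandoned
        dead_prefix = None
        # Length of the prefix shared with the previous text that still has columns.
        limit = min(len(prev), len(text), len(columns) - 1)
        shared = 0
        while shared < limit and prev[shared] == text[shared]:
            shared += 1
        del columns[shared + 1:]  # keep only the reusable columns
        abandoned = False
        for j in range(shared, len(text)):
            col = next_column(columns[-1], query, text[j])
            columns.append(col)
            if min(col) >= threshold:
                dead_prefix = text[:j + 1]
                abandoned = True
                break
        if not abandoned and columns[-1][-1] < threshold:
            matches.append(text)
        prev = text
    return matches


words = ["sitten", "sitting", "mitten", "kitchen", "banana", "bandana"]
print(search_within_threshold("kitten", words, 3))
```

With a threshold of 3, the column for the prefix "ban" already has minimum value 3, so "banana" is abandoned at that point and "bandana" is skipped without any further computation.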

17 citations

Journal Article

Multi-sorting algorithm for finding pairs of similar short substrings from large-scale string data

Takeaki Uno¹

National Institute of Informatics¹

01 Nov 2010

TL;DR: This paper addresses the problem of finding pairs of strings with small Hamming distances from huge databases composed of short strings of a fixed length, and proposes an algorithm that runs in time almost linear in the input/output size.

Abstract: Finding similar substrings/substructures is a central task in analyzing huge string data such as genome sequences, Web documents, log data, feature vectors of pictures, photos, videos, etc. Although the existence of polynomial time algorithms for such problems is trivial since the number of substrings is bounded by the square of their lengths, straightforward algorithms do not work for huge databases because their computation time is a high-degree polynomial. This paper addresses the problem of finding pairs of strings with small Hamming distances from huge databases composed of short strings of a fixed length. Comparison of long strings can be handled by inputting all of their fixed-length substrings, which yields candidates for similar longer substrings. We focus on the practical efficiency of algorithms, and propose an algorithm that runs in time almost linear in the input/output size. We prove that the computation time of its variant is linear in the database size when the length of the short strings is constant, and computational experiments for genome sequences and Web texts show its practical efficiency. With slight modifications, the algorithm adapts to edit distance and mismatch tolerance computation. An implementation is available at the author’s homepage.
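
The abstract does not give the algorithm's details, so the following Python sketch only illustrates one block-based idea that fits the problem statement and should not be read as the author's implementation. It assumes each fixed-length string is split into `k` blocks: if two strings differ in at most `d` positions, at least `k - d` blocks must be identical, so candidates can be found by grouping strings on every choice of `k - d` blocks. Dictionary grouping stands in for sorting here, and the names `hamming` and `similar_pairs` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def similar_pairs(strings, d, k):
    """Return sorted index pairs (i, j), i < j, with Hamming distance <= d.

    Assumptions: all strings have the same length, and k > d.
    """
    if k <= d:
        raise ValueError("need more blocks than allowed mismatches (k > d)")
    length = len(strings[0])
    # Block b covers the half-open range bounds[b] = (start, end).
    bounds = [(b * length // k, (b + 1) * length // k) for b in range(k)]
    found = set()
    for chosen in combinations(range(k), k - d):
        buckets = defaultdict(list)
        for idx, s in enumerate(strings):
            key = tuple(s[bounds[b][0]:bounds[b][1]] for b in chosen)
            buckets[key].append(idx)
        # Only strings that agree on all chosen blocks are compared directly.
        for group in buckets.values():
            for i, j in combinations(group, 2):
                if hamming(strings[i], strings[j]) <= d:
                    found.add((i, j))
    return sorted(found)


reads = ["ACGTACGT", "ACGTACGA", "ACCTACGA", "TTTTTTTT", "ACGTACGT"]
print(similar_pairs(reads, d=1, k=4))
```

The grouping step keeps unrelated strings such as "TTTTTTTT" from ever being compared with the others, which is where the savings over the quadratic all-pairs comparison come from.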

17 citations

Book Chapter

Operator-Based distance for genetic programming: subtree crossover distance

Steven Gustafson¹, Leonardo Vanneschi²

University of Nottingham¹, University of Milano-Bicocca²

30 Mar 2005

TL;DR: This paper explores distance measures based on genetic operators for tree-based genetic programming, focusing on the subtree crossover operator, and makes progress toward improved algorithmic analysis by using appropriate measures of distance and similarity.

Abstract: This paper explores distance measures based on genetic operators for genetic programming using tree structures. The consistency between genetic operators and distance measures is a crucial point for analytical measures of problem difficulty, such as fitness distance correlation, and for measures of population diversity, such as entropy or variance. The contribution of this paper is the exploration of possible definitions and approximations of operator-based edit distance measures. In particular, we focus on the subtree crossover operator. An empirical study is presented to illustrate the features of an operator-based distance. This paper makes progress toward improved algorithmic analysis by using appropriate measures of distance and similarity.

17 citations

Proceedings Article

Thomas Bocek¹, E. Hunt², David Hausheer¹, Burkhard Stiller¹

University of Zurich¹, ETH Zurich²

07 Apr 2008

TL;DR: The new algorithm, called P2P fast similarity search (P2PFastSS), finds similar keys in any distributed hash table (DHT) using the edit distance metric, and is independent of the underlying P2P routing algorithm.

Abstract: Peer-to-peer (P2P) systems show numerous advantages over centralized systems, such as load balancing, scalability, and fault tolerance, and they require certain functionality, such as search, repair, and message and data transfer. In particular, structured P2P networks perform an exact search in time logarithmic in the number of peers. However, keyword similarity search in a structured P2P network remains a challenge. Similarity search for service discovery can significantly improve service management in a distributed environment. As services are often described informally in text form, keyword similarity search can find the required services or data items more reliably. This paper presents a fast similarity search algorithm for structured P2P systems. The new algorithm, called P2P fast similarity search (P2PFastSS), finds similar keys in any distributed hash table (DHT) using the edit distance metric, and is independent of the underlying P2P routing algorithm. Performance analysis shows that P2PFastSS carries out a similarity search in time proportional to the logarithm of the number of peers. Simulations on PlanetLab confirm these results and show that a similarity search with 34,000 peers performs in less than three seconds on average. Thus, P2PFastSS is suitable for similarity search in large-scale network infrastructures, such as service description matching in service discovery or searching for similar terms in P2P storage networks.
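
The abstract does not spell out how similarity search is mapped onto exact-match DHT lookups. One general technique that achieves this is a deletion-neighborhood index, sketched below in Python. Whether P2PFastSS works exactly this way is an assumption here, and the `ToyDHT` class (a plain in-memory dictionary standing in for a real DHT) and all function names are hypothetical. The sketch relies on a known property: two strings within edit distance `k` always share some string reachable from each by deleting at most `k` characters.

```python
from itertools import combinations


def deletion_neighborhood(word, k):
    """All strings obtained from word by deleting at most k characters."""
    out = set()
    for r in range(min(k, len(word)) + 1):
        for positions in combinations(range(len(word)), r):
            drop = set(positions)
            out.add("".join(c for i, c in enumerate(word) if i not in drop))
    return out


def edit_distance(a, b):
    """Unit-cost edit distance using two rows of the DP table."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


class ToyDHT:
    """Hypothetical stand-in for a DHT: exact-match put/get only."""

    def __init__(self):
        self.store = {}

    def put(self, key, value):
        self.store.setdefault(key, set()).add(value)

    def get(self, key):
        return self.store.get(key, set())


def publish(dht, word, k):
    """Store word under every key in its deletion neighborhood."""
    for key in deletion_neighborhood(word, k):
        dht.put(key, word)


def similar(dht, query, k):
    """Look up the query's neighborhood, then verify candidates exactly."""
    candidates = set()
    for key in deletion_neighborhood(query, k):
        candidates |= dht.get(key)
    return sorted(w for w in candidates if edit_distance(query, w) <= k)


dht = ToyDHT()
for w in ["service", "services", "servise", "server", "storage"]:
    publish(dht, w, 1)
print(similar(dht, "service", 1))
```

Each lookup in this sketch is an exact-match `get`, so the number of DHT operations per query depends on the query length and `k`, not on how many keys are stored.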

17 citations

Using Long-Term Structure to Retrieve Music: Representation and Matching

Jean-Julien Aucouturier, Mark Sandler¹

Queen Mary University of London¹

01 Jan 2001

TL;DR: A measure of the similarity of the long-term structure of musical pieces is presented: an abstract “texture score” learned from raw polyphonic data is matched to other scores using a generalized edit distance, in order to assess structural similarity.

Abstract: We present a measure of the similarity of the long-term structure of musical pieces. The system deals with raw polyphonic data. Through unsupervised learning, we generate an abstract representation of music, the “texture score”. This “texture score” can be matched to other similar scores using a generalized edit distance, in order to assess structural similarity. We notably apply this algorithm to the retrieval of different interpretations of the same song within a music database.
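
To make "generalized edit distance" concrete, here is a small Python sketch in which the insertion, deletion, and substitution costs are supplied by the caller rather than fixed at 1. The abstract does not state the costs the authors use, so the integer texture labels and the cost functions in the example are purely hypothetical.

```python
def generalized_edit_distance(a, b, sub_cost, ins_cost, del_cost):
    """Edit distance between sequences a and b with caller-supplied costs.

    sub_cost(x, y): cost of replacing x by y (normally 0 when x == y)
    ins_cost(y):    cost of inserting y
    del_cost(x):    cost of deleting x
    """
    # prev[j] = cost of turning the processed prefix of a into b[:j]
    prev = [0.0]
    for y in b:
        prev.append(prev[-1] + ins_cost(y))
    for x in a:
        cur = [prev[0] + del_cost(x)]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + del_cost(x),
                           cur[j - 1] + ins_cost(y),
                           prev[j - 1] + sub_cost(x, y)))
        prev = cur
    return prev[-1]


# Hypothetical example: sequences of integer texture labels, where swapping
# neighbouring labels is cheaper than swapping distant ones.
def label_sub(x, y):
    return min(1.0, 0.5 * abs(x - y))


def unit(_):
    return 1.0


score_a = [1, 1, 2, 3]
score_b = [1, 2, 2, 3]
print(generalized_edit_distance(score_a, score_b, label_sub, unit, unit))  # 0.5
```

With unit costs for all three operations, the same function reduces to the ordinary edit distance.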

17 citations

Performance

Metrics

Papers: 3,030

Citations: 78,281

No. of papers in the topic in previous years
Year	Papers
2023	39
2022	96
2021	111
2020	149
2019	145
2018	139
