Topic

Edit distance

About: Edit distance is a research topic. Over its lifetime, 2887 publications have been published within this topic, receiving 71491 citations.

Papers

Patent•

System and Method for Machine Learning using a Similarity Inverse Matrix

Christine Podilchuk

02 Jan 2007

TL;DR: A system and method of machine learning that uses the inverse of a reference similarity matrix as a transformation matrix. The transformation matrix is used to improve the performance of query vectors in classifying or identifying digital representations of an unknown object.

Abstract: A system and method of machine learning that uses an inverse matrix of a reference similarity matrix as a transformation matrix. The reference similarity matrix relates a reference set of objects to themselves using a distance metric such as an image edit distance. The transformation matrix is used to improve the performance of query vectors in classifying or identifying digital representations of an unknown object. The query vector is a measure of similarity between the unknown object and the members of the reference set. Multiplying the query vector by the transformation matrix produces an improved query vector having improved similarity scores. The highest improved similarity score indicates the best match member of the reference set. If the similarity score is high enough, the unknown object may either be classified as belonging to the same class, or recognized as being the same object, as the best match object.
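
The transformation step described in this abstract can be sketched in a few lines of pure Python. This is only an illustration under stated assumptions: the similarity values in `reference_similarity`, the helper names `invert` and `improve_query`, and the choice of Gauss-Jordan elimination are hypothetical and are not taken from the patent.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("similarity matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def improve_query(query, transform):
    """Multiply a query (row) vector by the transformation matrix."""
    n = len(query)
    return [sum(query[i] * transform[i][j] for i in range(n)) for j in range(n)]


# Hypothetical similarities among three reference objects (1.0 = identical).
reference_similarity = [
    [1.0, 0.6, 0.2],
    [0.6, 1.0, 0.3],
    [0.2, 0.3, 1.0],
]
transform = invert(reference_similarity)

# A query that is itself reference object 1 has that object's similarity row.
query = reference_similarity[1]
improved = improve_query(query, transform)
best_match = max(range(len(improved)), key=improved.__getitem__)
```

When the query equals a row of the reference matrix, the improved vector is, up to rounding, 1 at that object's index and 0 elsewhere, so the similarity scores of the non-matching reference members are suppressed.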

20 citations

Patent•

System and method for detecting matches of small edit distance

Ziv Bar-Yossef¹, Robert Krauthgamer¹, Shanmugasundaram Ravikumar¹, Jayram S. Thathachar¹•Institutions (1)

IBM¹

30 Sep 2005

TL;DR: In this article, a system and method of approximating edit distance for a set of character strings in a database includes producing a representative sketch for each of the character strings; and approximating an edit distance between two selected character strings based only on the representative sketch.

Abstract: A system and method of approximating edit distance for a set of character strings in a database includes producing a representative sketch for each of the character strings; and approximating an edit distance between two selected character strings based only on the representative sketch for each of the selected character strings. The character strings may comprise text, wherein the method further comprises encoding positions of substrings in the text using anchors, wherein the anchors comprise identical substrings occurring in two input character strings at a nearby position. A set of anchors may be used in a correlated manner, wherein character strings with a sufficiently small edit distance are likely to use a same sequence of anchors. The character strings may be substantially non-repetitive. The representative sketch of a first character string is preferably constructed absent knowledge of a second character string. A size of the representative sketch may be constant.
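
For context, the quantity that such sketches approximate is the exact edit distance, which the classic dynamic program computes from the full strings. The block below is that baseline only, not the patent's anchor-based sketching method; the function name `levenshtein` and the unit costs for insertion, deletion, and substitution are assumptions of this illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Exact edit distance with unit-cost insert, delete, and substitute."""
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(a) < len(b):
        a, b = b, a
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]
```

This baseline reads both strings in full and takes time proportional to the product of their lengths, which is the cost a sketch-based approximation is meant to avoid.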

20 citations

Proceedings Article•DOI•

FlExPat: flexible extraction of sequential patterns

P.-Y. Rolland¹•Institutions (1)

Université Paul Cézanne Aix-Marseille III¹

29 Nov 2001

TL;DR: The FlExPat algorithm is designed to cope with the trade-off between flexibility, particularly in sequence data representation and in associated similarity metrics, and computational efficiency. Some experimental results obtained with FlExPat on music data are presented and discussed.

Abstract: This paper addresses sequential data mining, a sub-area of data mining where the data to be analyzed is organized in sequences. In many problem domains a natural ordering exists over data. Examples of sequential databases (SDBs) include: (a) collections of temporal data sequences, such as chronological series of daily stock indices or multimedia data (sound, music, video, etc.); and (b) macromolecule banks, where amino acid or proteic sequences are represented as strings. In an SDB it is often valuable to detect regularities across one or several sequences. In particular, finding exact or approximate repetitions of segments can be utilized directly (e.g. for determining the biochemical activity of a protein region) or indirectly (e.g. for prediction in finance). To this end, we present concepts and an algorithm for automatically extracting sequential patterns from a sequential database. Such a pattern is defined as a group of significantly similar segments from one or several sequences. Appropriate functions for measuring similarity between sequence segments are proposed, generalizing the edit distance framework. There is a trade-off between flexibility, particularly in sequence data representation and in associated similarity metrics, and computational efficiency. We designed the FlExPat algorithm to cope with this trade-off. FlExPat's complexity is in practice less than quadratic in the total length of the SDB analyzed, while allowing high flexibility. Some experimental results obtained with FlExPat on music data are presented and discussed.
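
One common way to generalize the edit distance framework for sequence segments is to make the operation costs pluggable. The sketch below illustrates that idea only; it is not the FlExPat algorithm, and the function names, the MIDI-style pitch numbers, and the cost functions are all hypothetical choices made for this example.

```python
def generalized_edit_distance(a, b, sub_cost, indel_cost):
    """Edit distance over arbitrary sequences with caller-supplied costs."""
    # First row: cost of inserting every element of b into an empty segment.
    prev = [0.0]
    for y in b:
        prev.append(prev[-1] + indel_cost(y))
    for x in a:
        curr = [prev[0] + indel_cost(x)]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + indel_cost(x),       # delete x
                curr[j - 1] + indel_cost(y),   # insert y
                prev[j - 1] + sub_cost(x, y),  # replace x by y
            ))
        prev = curr
    return prev[-1]


def pitch_sub_cost(x, y):
    """Hypothetical cost: pitch difference in semitones, capped at one octave."""
    return min(abs(x - y), 12) / 12.0


def unit_indel(_element):
    """Hypothetical cost: every insertion or deletion costs 1."""
    return 1.0


# Two short melody segments as pitch numbers; they differ by one semitone.
melody_a = [60, 62, 64]
melody_b = [60, 63, 64]
distance = generalized_edit_distance(melody_a, melody_b, pitch_sub_cost, unit_indel)
```

With these costs the two segments are at distance 1/12, since a one-semitone substitution is much cheaper than an insertion or deletion. Passing a 0/1 substitution cost and unit insertion and deletion costs recovers the ordinary edit distance.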

20 citations

Journal Article•DOI•

A clique-based method using dynamic programming for computing edit distance between unordered trees.

Tomoya Mori¹, Takeyuki Tamura, Daiji Fukagawa, Atsuhiro Takasu, Etsuji Tomita, Tatsuya Akutsu•Institutions (1)

Kyoto University¹

11 Oct 2012-Journal of Computational Biology

TL;DR: The improved method is obtained by introducing a dynamic programming scheme and heuristic techniques to the previous clique-based method for the tree edit distance problem for unordered trees, and is much faster than the previous method.

Abstract: Many kinds of tree-structured data, such as RNA secondary structures, have become available due to the progress of techniques in the field of molecular biology. To analyze the tree-structured data, various measures for computing the similarity between them have been developed and applied. Among them, tree edit distance is one of the most widely used measures. However, the tree edit distance problem for unordered trees is NP-hard. Therefore, it is required to develop efficient algorithms for the problem. Recently, a practical method called clique-based algorithm has been proposed, but it is not fast for large trees. This article presents an improved clique-based method for the tree edit distance problem for unordered trees. The improved method is obtained by introducing a dynamic programming scheme and heuristic techniques to the previous clique-based method. To evaluate the efficiency of the improved method, we applied the method to comparison of real tree structured data such as glycan structure...

20 citations

Journal Article•DOI•

An image-based near-duplicate video retrieval and localization using improved Edit distance

Hao Liu¹, Qingjie Zhao¹, Hao Wang¹, Peng Lv¹, Yanming Chen¹•Institutions (1)

Beijing Institute of Technology¹

01 Nov 2017-Multimedia Tools and Applications

TL;DR: An image-based algorithm using improved Edit distance for near-duplicate video retrieval and localization and a detect-and-refine-strategy-based dynamic programming algorithm is proposed to generate the path matrix, which can be used to aggregate scores for video similarity measure and localize the similar parts.

Abstract: The rapid development of social networks in recent years has spurred enormous growth of near-duplicate videos. The huge volume of near-duplicates creates a rising demand for effective near-duplicate video retrieval techniques in copyright violation and search result reranking. In this paper, we propose an image-based algorithm using improved Edit distance for near-duplicate video retrieval and localization. By regarding video sequences as strings, Edit distance is used and extended to retrieve and localize near-duplicate videos. First, a bag-of-words (BOW) model, which is robust to spatial transformations, is used to measure frame similarities. Then, non-near-duplicate videos are filtered out by computing the proposed relative Edit distance similarity (REDS). Next, a detect-and-refine-strategy-based dynamic programming algorithm is proposed to generate the path matrix, which can be used to aggregate scores for video similarity measure and localize the similar parts. Experiments on CC_WEB_VIDEO and TREC CBCD 2011 datasets demonstrated the effectiveness and robustness of the proposed method in retrieval and localization tasks.
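
The idea of treating videos as strings of frames and localizing a similar part can be illustrated with a standard approximate substring matching dynamic program (Sellers-style). This is a swapped-in textbook technique, not the paper's detect-and-refine algorithm or its REDS measure; the names `cosine_similarity` and `locate_clip`, the tiny word-count histograms, and the 0.9 threshold are assumptions made for this sketch.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length bag-of-words histograms."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def locate_clip(query, video, threshold=0.9):
    """Return (distance, end): the lowest edit distance between the query clip
    and any contiguous part of the video, and the exclusive end index of that
    part. Two frames count as equal when their cosine similarity reaches the
    threshold."""
    # A zero first row lets a match start at any position in the video.
    prev = [0] * (len(video) + 1)
    for i, query_frame in enumerate(query, start=1):
        curr = [i]
        for j, video_frame in enumerate(video, start=1):
            same = cosine_similarity(query_frame, video_frame) >= threshold
            curr.append(min(
                prev[j] + 1,                     # query frame missing from video
                curr[j - 1] + 1,                 # extra frame in video
                prev[j - 1] + (0 if same else 1),
            ))
        prev = curr
    # The smallest value in the last row marks where the best match ends.
    end = min(range(len(prev)), key=prev.__getitem__)
    return prev[end], end


# Hypothetical frames, each a histogram over a three-word visual vocabulary.
video = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0)]
query = [(0, 2, 0), (0, 0, 3)]
distance, end = locate_clip(query, video)
```

Here the query matches video frames 1 and 2 exactly, so the distance is 0 and the match ends at index 3. Recovering the start of the match as well would need a traceback through the full table, which this two-row version does not keep.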

20 citations

Performance metrics

Papers: 3,030

Citations: 78,281

No. of papers in the topic in previous years
Year	Papers
2023	39
2022	96
2021	111
2020	149
2019	145
2018	139
