Topic: Edit distance

Edit distance is a research topic. Over its lifetime, 2,887 publications have been published on this topic, receiving 71,491 citations.

...read moreread less

Papers

Book Chapter

Karin Kailing¹, Hans-Peter Kriegel¹, Stefan Schönauer¹, Thomas Seidl²

Ludwig Maximilian University of Munich¹, RWTH Aachen University²

14 Mar 2004

TL;DR: The authors present a set of new filter methods for structural and content-based information in tree-structured data, together with ways to flexibly combine different filter criteria.

Abstract: Structured and semi-structured object representations are becoming increasingly important for modern database applications. Examples of such data are hierarchical structures, including chemical compounds, XML data, or image data. As a key feature, database systems have to support the search for similar objects, where it is important to take into account both the structure and the content features of the objects. A successful approach is to use the edit distance for tree-structured data. As the computation of this measure is NP-complete, constrained edit distances have been successfully applied to trees. While yielding good results, they are still computationally complex and, therefore, of limited benefit for searching in large databases. In this paper, we propose a filter and refinement architecture to overcome this problem. We present a set of new filter methods for structural and for content-based information in tree-structured data, as well as ways to flexibly combine different filter criteria. The efficiency of our methods, resulting from the good selectivity of the filters, is demonstrated in extensive experiments with real-world applications.

84 citations
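Kailing et al. describe a filter and refinement architecture: cheap filters prune the database, and the expensive exact distance is computed only for the objects that survive. The minimal sketch below illustrates that general pattern on plain strings rather than trees, assuming Levenshtein distance as a stand-in for the tree edit distance and a symbol-histogram lower bound as a stand-in for the paper's structural and content-based filters. Both substitutions, and the helper names `histogram_lower_bound` and `range_query`, are hypothetical and not the authors' method.

```python
from collections import Counter
from math import ceil


def levenshtein(a, b):
    """Exact unit-cost edit distance (insert, delete, substitute), O(len(a) * len(b))."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def histogram_lower_bound(a, b):
    """Cheap lower bound on levenshtein(a, b) (hypothetical stand-in filter).

    A substitution changes the L1 distance between the symbol histograms by at
    most 2, and an insertion or deletion changes it by exactly 1, so at least
    ceil(L1 / 2) operations are needed. The length difference is also a bound.
    """
    ha, hb = Counter(a), Counter(b)
    l1 = sum(abs(ha[s] - hb[s]) for s in ha.keys() | hb.keys())
    return max(abs(len(a) - len(b)), ceil(l1 / 2))


def range_query(query, database, eps):
    """Filter-and-refine range query: all objects within distance eps of query.

    Returns the result list and the number of candidates that needed refinement.
    """
    # Filter step: drop objects whose lower bound already exceeds eps.
    # A lower bound never overestimates, so no true result is dismissed.
    candidates = [obj for obj in database if histogram_lower_bound(query, obj) <= eps]
    # Refinement step: the expensive exact distance runs only on the candidates.
    results = [obj for obj in candidates if levenshtein(query, obj) <= eps]
    return results, len(candidates)
```

For example, with `db = ["kitten", "sitting", "mitten", "banana", "knitting"]`, `range_query("kitten", db, 2)` computes the exact distance for only three of the five strings and returns `["kitten", "mitten"]`. The tighter the lower bound, the fewer exact computations remain, which is the selectivity the abstract refers to.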

Support Vector Machines for Paraphrase Identification and Corpus Construction

Chris Brockett, William B. Dolan¹

Microsoft¹

01 Jan 2005

TL;DR: The authors describe the use of annotated datasets and Support Vector Machines to induce larger monolingual paraphrase corpora from a comparable corpus of news clusters found on the World Wide Web, an approach that dramatically reduces the Alignment Error Rate of the extracted corpora.

Abstract: The lack of readily available large corpora of aligned monolingual sentence pairs is a major obstacle to the development of Statistical Machine Translation-based paraphrase models. In this paper, we describe the use of annotated datasets and Support Vector Machines to induce larger monolingual paraphrase corpora from a comparable corpus of news clusters found on the World Wide Web. Features include: morphological variants; WordNet synonyms and hypernyms; log-likelihood-based word pairings dynamically obtained from baseline sentence alignments; and formal string features such as word-based edit distance. Use of this technique dramatically reduces the Alignment Error Rate of the extracted corpora over heuristic methods based on the position of the sentences in the text.

84 citations
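Brockett and Dolan list word-based edit distance among their formal string features. A minimal sketch of such a feature, assuming lowercasing and whitespace tokenization (the paper's exact preprocessing and normalization are not stated here), runs the usual Levenshtein recurrence over word tokens instead of characters. The normalized variant is a hypothetical addition that keeps values comparable across sentence pairs of different lengths.

```python
def word_edit_distance(s1, s2):
    """Levenshtein distance over whitespace-separated tokens instead of characters."""
    a, b = s1.lower().split(), s2.lower().split()
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        cur = [i]
        for j, wb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete word wa
                           cur[j - 1] + 1,            # insert word wb
                           prev[j - 1] + (wa != wb)))  # substitute or match
        prev = cur
    return prev[-1]


def normalized_word_edit_distance(s1, s2):
    """Hypothetical feature scaling: divide by the longer sentence's word count."""
    longest = max(len(s1.split()), len(s2.split()), 1)
    return word_edit_distance(s1, s2) / longest
```

For instance, `word_edit_distance("The company reported strong profits", "The firm reported strong profits")` is 1, since a single word is substituted, and the normalized value is 0.2.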

Journal Article

Efficiently Indexing Large Sparse Graphs for Similarity Search

Guoren Wang¹, Bin Wang¹, Xiaochun Yang¹, Ge Yu¹

Northeastern University (China)¹

01 Mar 2012, IEEE Transactions on Knowledge and Data Engineering

TL;DR: This paper focuses on index structures for similarity search over a set of large sparse graphs. It proposes an efficient indexing mechanism based on the Q-Gram idea and develops a series of techniques for inverted index construction and online query processing.

Abstract: The graph structure is a very important means of modeling schemaless data with complicated structures, such as protein-protein interaction networks, chemical compounds, knowledge query inferring systems, and road networks. This paper focuses on the index structure for similarity search on a set of large sparse graphs and proposes an efficient indexing mechanism by introducing the Q-Gram idea. By decomposing graphs into small grams (organized as κ-Adjacent Tree patterns) and pairing up these κ-Adjacent Tree patterns, a lower bound on the edit distance can be computed for candidate filtering. Furthermore, we have developed a series of techniques for inverted index construction and online query processing. By building the candidate set for the query graph before the exact edit distance calculation, the number of graphs that need to proceed to exact matching can be greatly reduced. Extensive experiments on real and synthetic data sets have been conducted to show the effectiveness and efficiency of the proposed indexing mechanism.

83 citations
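The Q-Gram idea that Wang et al. carry over to graphs originates in approximate string matching, where counting shared q-grams yields a cheap lower bound on edit distance for candidate filtering. The sketch below shows only that string-level count filter, assuming unit edit costs and unpadded q-grams; it is not the paper's κ-Adjacent Tree decomposition, and the function names are hypothetical. The bound rests on the fact that one edit operation destroys at most q of a string's q-grams, so strings within edit distance k share at least max(|s|, |t|) - q + 1 - k·q q-grams.

```python
from collections import Counter
from math import ceil


def qgrams(s, q):
    """Multiset of all contiguous substrings of length q (no padding)."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def qgram_lower_bound(s, t, q):
    """Lower bound on the unit-cost edit distance between s and t.

    If ed(s, t) <= k, then common >= max(|s|, |t|) - q + 1 - k*q, so
    k >= (max(|s|, |t|) - q + 1 - common) / q.
    """
    common = sum((qgrams(s, q) & qgrams(t, q)).values())  # multiset intersection
    missing = max(len(s), len(t)) - q + 1 - common
    return max(0, ceil(missing / q))


def filter_candidates(query, database, tau, q=2):
    """Keep only entries that might lie within edit distance tau of the query.

    Survivors still need an exact edit distance check (the refinement step).
    """
    return [s for s in database if qgram_lower_bound(query, s, q) <= tau]
```

With q = 2, `qgram_lower_bound("kitten", "sitting", 2)` returns 2, safely below the exact distance of 3, and `filter_candidates("kitten", ["kitten", "sitting", "banana"], 2)` discards `"banana"` without computing its edit distance.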

Book Chapter

An Error-Tolerant Approximate Matching Algorithm for Attributed Planar Graphs and Its Application to Fingerprint Classification

Michel Neuhaus¹, Horst Bunke¹

University of Bern¹

18 Aug 2004, Lecture Notes in Computer Science

TL;DR: An efficient algorithm is proposed for edit distance computation of planar graphs: given graphs embedded in the plane, it iteratively matches small subgraphs by locally optimizing structural correspondences, yielding a valid edit path and hence an upper bound on the edit distance.

Abstract: Graph edit distance is a powerful error-tolerant similarity measure for graphs. For pattern recognition problems involving large graphs, however, the high computational complexity sometimes makes it impossible to apply edit distance algorithms. In the present paper we propose an efficient algorithm for edit distance computation of planar graphs. Given graphs embedded in the plane, we iteratively match small subgraphs by locally optimizing structural correspondences. Eventually we obtain a valid edit path and hence an upper bound on the edit distance. To demonstrate the efficiency of our approach, we apply the proposed algorithm to the problem of fingerprint classification.

83 citations
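Neuhaus and Bunke's approach relies on the fact that any valid edit path yields an upper bound on the graph edit distance. The sketch below makes that step concrete under assumed unit costs and a simple representation (a node-label dict plus a set of undirected, unlabeled edges): any injective node mapping induces an edit path, and the cost of that path is always greater than or equal to the true distance. The `greedy_mapping` heuristic is a hypothetical stand-in for illustration, not the paper's planar subgraph matching procedure.

```python
def edit_path_cost(g1, g2, mapping):
    """Unit cost of the edit path induced by an injective node mapping g1 -> g2.

    Each graph is (labels, edges): labels maps node -> label, and edges is a set
    of frozenset({u, v}). Unmapped g1 nodes are deleted, unmapped g2 nodes are
    inserted, and an edge is kept only if it maps onto an edge of the other graph.
    """
    labels1, edges1 = g1
    labels2, edges2 = g2
    inverse = {v: u for u, v in mapping.items()}
    cost = sum(labels1[u] != labels2[v] for u, v in mapping.items())  # relabelings
    cost += len(labels1) - len(mapping)  # node deletions
    cost += len(labels2) - len(mapping)  # node insertions
    for e in edges1:  # edge deletions
        u, v = tuple(e)
        if not (u in mapping and v in mapping
                and frozenset((mapping[u], mapping[v])) in edges2):
            cost += 1
    for e in edges2:  # edge insertions
        x, y = tuple(e)
        if not (x in inverse and y in inverse
                and frozenset((inverse[x], inverse[y])) in edges1):
            cost += 1
    return cost


def greedy_mapping(g1, g2):
    """Hypothetical heuristic: pair equal labels first, then map leftovers in order."""
    labels1, labels2 = g1[0], g2[0]
    free = list(labels2)
    mapping = {}
    for u in labels1:
        match = next((v for v in free if labels2[v] == labels1[u]), None)
        if match is not None:
            mapping[u] = match
            free.remove(match)
    for u in labels1:
        if u not in mapping and free:
            mapping[u] = free.pop(0)
    return mapping
```

For a labeled triangle and a labeled path that differ in one node label and one edge, the greedy mapping gives an edit path of cost 2, while the empty mapping (delete everything, insert everything) gives 11. Both are valid upper bounds; a better mapping simply gives a tighter one.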

Journal Article

Private genome analysis through homomorphic encryption

Miran Kim, Kristin E. Lauter¹

Cryptography Research¹

21 Dec 2015, BMC Medical Informatics and Decision Making

TL;DR: The authors use homomorphic encryption for secure computation of minor allele frequencies and the χ² statistic in a genome-wide association study setting; the computation can be performed in an untrusted cloud without the decryption key or any interaction with the data owner.

Abstract: The rapid development of genome sequencing technology allows researchers to access large genome datasets. However, outsourcing the data processing to the cloud poses high risks for personal privacy. The aim of this paper is to give a practical solution for this problem using homomorphic encryption. In our approach, all the computations can be performed in an untrusted cloud without requiring the decryption key or any interaction with the data owner, which preserves the privacy of genome data. We present evaluation algorithms for secure computation of the minor allele frequencies and χ² statistic in a genome-wide association studies setting. We also describe how to privately compute the Hamming distance and approximate edit distance between encrypted DNA sequences. Finally, we compare performance details of using two practical homomorphic encryption schemes: the BGV scheme by Gentry, Halevi and Smart and the YASHE scheme by Bos, Lauter, Loftus and Naehrig. The approach with the YASHE scheme analyzes data from 400 people within about 2 seconds and picks a variant associated with disease from 311 spots. For another task, using the BGV scheme, it took about 65 seconds to securely compute the approximate edit distance for DNA sequences of size 5K and figure out the differences between them. The performance numbers for BGV are better than those for YASHE when homomorphically evaluating deep circuits (like the Hamming distance algorithm or approximate edit distance algorithm). On the other hand, it is more efficient to use the YASHE scheme for a low-degree computation, such as minor allele frequencies or the χ² test statistic in a case-control study.

83 citations
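Kim and Lauter evaluate the Hamming distance and an approximate edit distance on encrypted DNA sequences. The plaintext sketch below only shows what such functions compute, not how they are evaluated homomorphically. The banded dynamic program is one common way to approximate edit distance and is an assumption here, not necessarily the approximation used in the paper; the band width `w` is a hypothetical parameter.

```python
def hamming(s, t):
    """Number of positions at which two equal-length sequences differ."""
    if len(s) != len(t):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(a != b for a, b in zip(s, t))


def banded_edit_distance(s, t, w):
    """Edit distance restricted to alignments with |i - j| <= w.

    Cells outside the band are treated as unreachable, so the result is an upper
    bound on the exact edit distance and equals it once w >= max(len(s), len(t)).
    With w = 0 and equal lengths it reduces to the Hamming distance.
    """
    INF = float("inf")
    n, m = len(s), len(t)
    if abs(n - m) > w:
        return INF  # the final cell lies outside the band
    prev = [j if j <= w else INF for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [INF] * (m + 1)
        if i <= w:
            cur[0] = i
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cur[j] = min(prev[j] + 1,                           # deletion
                         cur[j - 1] + 1,                        # insertion
                         prev[j - 1] + (s[i - 1] != t[j - 1]))  # substitution or match
        prev = cur
    return prev[m]
```

A one-base shift shows why the two measures differ: `"ACGTACGT"` and `"CGTACGTA"` have Hamming distance 8, yet one deletion and one insertion align them, and `banded_edit_distance("ACGTACGT", "CGTACGTA", 1)` returns 2. A narrow band keeps the work roughly linear in sequence length, at the cost of possibly overestimating the distance when the optimal alignment drifts outside the band.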

Metrics

3,030 papers

78,281 citations

Number of papers on the topic in previous years
Year	Papers
2023	39
2022	96
2021	111
2020	149
2019	145
2018	139
