Topic

Edit distance

About: Edit distance is a research topic. Over its lifetime, 2,887 publications have been published within this topic, receiving 71,491 citations.


Papers
Journal ArticleDOI
TL;DR: A novel technique to address the problem of efficient privacy-preserving approximate record linkage by utilizing a secure blocking component based on phonetic algorithms statistically enhanced to improve security and a secure matching component where actual approximate matching is performed using the Levenshtein Distance algorithm.
Abstract: Performing approximate data matching has always been an intriguing problem for both industry and academia. This task becomes even more challenging when the requirement of data privacy arises. In this paper, we propose a novel technique to address the problem of efficient privacy-preserving approximate record linkage. The secure framework we propose consists of two basic components. First, we utilize a secure blocking component based on phonetic algorithms statistically enhanced to improve security. Second, we use a secure matching component where actual approximate matching is performed using a novel private approach of the Levenshtein Distance algorithm. Our goal is to combine the speed of private blocking with the increased accuracy of approximate secure matching. Category: Ubiquitous computing; Security and privacy

44 citations
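
For orientation, the matching step described above builds on the Levenshtein distance. The following is a minimal sketch of the standard, non-private dynamic-programming computation only; it is not the private variant the paper proposes, and the function name `levenshtein` is an illustrative choice rather than anything taken from the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain (non-private) Levenshtein distance with unit costs.

    Illustrative only: the paper's secure matching protocol is not
    reproduced here.
    """
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a

    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string costs i deletions
        for j, cb in enumerate(b, start=1):
            substitution = previous[j - 1] + (0 if ca == cb else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]
```

In a record-linkage setting, two name fields would typically be treated as a match when this distance falls below a chosen threshold; the threshold itself is an application-level assumption.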

Proceedings ArticleDOI
07 Dec 2015
TL;DR: The Hamming-Ipsen-Mikhailov (HIM) distance is introduced, a novel metric to quantitatively measure the difference between two graphs sharing the same vertices, to overcome the drawbacks affecting the two components when considered separately.
Abstract: Comparing and classifying graphs represent two essential steps for network analysis, across different scientific and applicative domains. Here we deal with both operations by introducing the Hamming-Ipsen-Mikhailov (HIM) distance, a novel metric to quantitatively measure the difference between two graphs sharing the same vertices. The new measure combines the local Hamming edit distance and the global Ipsen-Mikhailov spectral distance so as to overcome the drawbacks affecting the two components when considered separately. Building the kernel function derived from the HIM distance makes it possible to move from network comparison to network classification via the Support Vector Machine (SVM) algorithm. Applications of HIM-based methods on synthetic dynamical networks as well as in trade economy and diplomacy datasets demonstrate the effectiveness of HIM as a general purpose solution. An Open Source implementation is provided by the R package nettools (already configured for High Performance Computing) and the Django-Celery web interface ReNette http://renette.fbk.eu.

44 citations
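
The local component mentioned above, the Hamming edit distance between two graphs on the same vertex set, can be sketched as follows. This is a simplified illustration that assumes undirected, unweighted graphs without self-loops and normalizes by the number of possible edges; the Ipsen-Mikhailov spectral component and the way the two are combined into HIM are not shown. The function name `hamming_graph_distance` is hypothetical and is not the API of the nettools package.

```python
def hamming_graph_distance(n, edges_a, edges_b):
    """Normalized Hamming distance between two simple undirected graphs
    on the same n vertices (an illustrative sketch, not the nettools API).

    Counts the edges present in exactly one graph and divides by the
    number of possible edges, n * (n - 1) / 2.
    """
    if n < 2:
        return 0.0  # no possible edges, so the graphs cannot differ

    # Represent each undirected edge as a frozenset so (u, v) == (v, u).
    a = {frozenset(edge) for edge in edges_a}
    b = {frozenset(edge) for edge in edges_b}

    differing = len(a ^ b)  # edges that must be inserted or deleted
    possible = n * (n - 1) // 2
    return differing / possible
```

A value of 0 means the edge sets are identical, and 1 means every possible edge is present in exactly one of the two graphs.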

Journal ArticleDOI
TL;DR: This work presents an approach based on tree edit distance to compare merge trees, and results show the utility of the edit distance towards a feature-driven analysis of scalar fields.
Abstract: Topological structures such as the merge tree provide an abstract and succinct representation of scalar fields. They facilitate effective visualization and interactive exploration of feature-rich data. A merge tree captures the topology of sub-level and super-level sets in a scalar field. Estimating the similarity between merge trees is an important problem with applications to feature-directed visualization of time-varying data. We present an approach based on tree edit distance to compare merge trees. The comparison measure satisfies metric properties, it can be computed efficiently, and the cost model for the edit operations is both intuitive and captures well-known properties of merge trees. Experimental results on time-varying scalar fields, 3D cryo electron microscopy data, shape data, and various synthetic datasets show the utility of the edit distance towards a feature-driven analysis of scalar fields.

44 citations

Journal ArticleDOI
TL;DR: It is shown how the edit distances can be used to compute a matrix of pairwise affinities using χ2 statistics, and a maximum likelihood method for clustering the graphs by iteratively updating the elements of the affinity matrix is presented.
Abstract: This paper describes work aimed at the unsupervised learning of shape-classes from shock trees. We commence by considering how to compute the edit distance between weighted trees. We show how to transform the tree edit distance problem into a series of maximum weight clique problems, and show how to use relaxation labeling to find an approximate solution. This allows us to compute a set of pairwise distances between graph-structures. We show how the edit distances can be used to compute a matrix of pairwise affinities using χ2 statistics. We present a maximum likelihood method for clustering the graphs by iteratively updating the elements of the affinity matrix. This involves interleaved steps for updating the affinity matrix using an eigendecomposition method and updating the cluster membership indicators. We illustrate the new tree clustering framework on shock-graphs extracted from the silhouettes of 2D shapes.

43 citations

Journal ArticleDOI
TL;DR: An integer linear programming (ILP) formulation to compute the DCJ distance between two genomes with duplicate genes is proposed and it is demonstrated that this method outperforms MSOAR in computing the edit distance, especially when the genomes contain long duplicated segments.
Abstract: Computing the edit distance between two genomes is a basic problem in the study of genome evolution. The double-cut-and-join (DCJ) model has formed the basis for most algorithmic research on rearrangements over the last few years. The edit distance under the DCJ model can be computed in linear time for genomes without duplicate genes, while the problem becomes NP-hard in the presence of duplicate genes. In this article, we propose an integer linear programming (ILP) formulation to compute the DCJ distance between two genomes with duplicate genes. We also provide an efficient preprocessing approach to simplify the ILP formulation while preserving optimality. Comparison on simulated genomes demonstrates that our method outperforms MSOAR in computing the edit distance, especially when the genomes contain long duplicated segments. We also apply our method to assign orthologous gene pairs among human, mouse, and rat genomes, where once again our method outperforms MSOAR.

43 citations


Network Information
Related Topics (5)
Graph (abstract data type)
69.9K papers, 1.2M citations
86% related
Unsupervised learning
22.7K papers, 1M citations
81% related
Feature vector
48.8K papers, 954.4K citations
81% related
Cluster analysis
146.5K papers, 2.9M citations
81% related
Scalability
50.9K papers, 931.6K citations
80% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    39
2022    96
2021    111
2020    149
2019    145
2018    139