Topic

Edit distance

About: Edit distance is a research topic. To date, 2,887 publications have appeared on this topic, receiving 71,491 citations.


Papers
Proceedings Article
17 Jun 2006
TL;DR: A novel boosted distance metric is proposed that not only finds the best distance metric that fits the distribution of the underlying elements but also selects the most important feature elements with respect to similarity.
Abstract: In this paper, we present a general guideline to establish the relation between a distribution model and its corresponding similarity estimation. A rich set of distance metrics, such as harmonic distance and geometric distance, is derived according to Maximum Likelihood theory. These metrics can provide a more accurate feature model than the conventional Euclidean distance (SSD) and Manhattan distance (SAD). Because the feature elements are from heterogeneous sources and may have different influences on similarity estimation, the assumption of a single isotropic distribution model is often inappropriate. We propose a novel boosted distance metric that not only finds the best distance metric that fits the distribution of the underlying elements but also selects the most important feature elements with respect to similarity. We experiment with different distance metrics for similarity estimation and compute the accuracy of different methods in two applications: stereo matching and motion tracking in video sequences. The boosted distance metric is tested on fifteen benchmark data sets from the UCI repository and two image retrieval applications. In all the experiments, robust results are obtained based on the proposed methods.
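For reference, the two conventional baselines named in the abstract can be written in a few lines. This is a minimal sketch and not the paper's boosted metric; the function names `ssd` and `sad` are illustrative, and feature vectors are assumed to be equal-length sequences of numbers. Note that SSD is the squared Euclidean distance, so it ranks candidates in the same order as Euclidean distance does.

```python
from typing import Sequence


def ssd(x: Sequence[float], y: Sequence[float]) -> float:
    """Sum of squared differences (the squared Euclidean distance)."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(x, y))


def sad(x: Sequence[float], y: Sequence[float]) -> float:
    """Sum of absolute differences (the Manhattan distance)."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(x, y))


if __name__ == "__main__":
    u, v = [1, 2, 3], [2, 4, 6]
    print(ssd(u, v), sad(u, v))
```

SSD penalizes large element-wise deviations more heavily than SAD, which is one reason the choice of metric can matter when feature elements follow different distributions.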

52 citations

Journal Article
B.J. Oommen1
TL;DR: An algorithm is presented to compute the minimum distance associated with editing X to Y subject to the specified constraint, along with a technique to compute the optimal transformation.

51 citations

Journal Article
01 Dec 2013
TL;DR: Efficient algorithms are proposed to handle three types of graph similarity queries by exploiting both matching and mismatching features as well as degree information to improve the filtering and verification on candidates.
Abstract: Graphs are widely used to model complicated data semantics in many applications in bioinformatics, chemistry, social networks, pattern recognition, etc. A recent trend is to tolerate noise arising from various sources, such as erroneous data entries, and find similarity matches. In this paper, we study graph similarity queries with edit distance constraints. Inspired by the $q$-gram idea for string similarity problems, our solution extracts paths from graphs as features for indexing. We establish a lower bound of common features to generate candidates. Efficient algorithms are proposed to handle three types of graph similarity queries by exploiting both matching and mismatching features as well as degree information to improve the filtering and verification on candidates. Extensive experiments on real and synthetic datasets demonstrate that the proposed algorithms significantly outperform existing approaches.
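The string q-gram idea that the abstract cites as inspiration can be sketched as a count filter. This is the classical string version, not the paper's path-based graph method: if two strings are within edit distance k, they must share at least max(|s|, |t|) - q + 1 - k·q q-grams (counted as multisets), so any pair below that bound can be discarded before running an exact edit distance computation. The function names are illustrative, and q-grams are taken without padding.

```python
from collections import Counter


def qgrams(s: str, q: int) -> Counter:
    """Multiset of all length-q substrings of s (no padding)."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def common_qgrams(s: str, t: str, q: int) -> int:
    """Number of q-grams shared by s and t, counted with multiplicity."""
    return sum((qgrams(s, q) & qgrams(t, q)).values())


def passes_count_filter(s: str, t: str, q: int, k: int) -> bool:
    """Return False only when s and t cannot be within edit distance k.

    A True result means the pair is a candidate that still needs
    verification with an exact edit distance computation.
    """
    lower_bound = max(len(s), len(t)) - q + 1 - k * q
    return common_qgrams(s, t, q) >= lower_bound


if __name__ == "__main__":
    print(passes_count_filter("kitten", "sitten", q=2, k=1))
    print(passes_count_filter("kitten", "abcdef", q=2, k=1))
```

When the bound is zero or negative (short strings or large k), the filter accepts every pair and provides no pruning.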

51 citations

Book Chapter
01 Sep 2007
TL;DR: A new algorithm for automatic recognition of hand-drawn sketches based on the Levenshtein distance is presented, which is trainable by every user and improves the recognition performance of the techniques which were used before for widget recognition.
Abstract: In this paper we present a new algorithm for automatic recognition of hand-drawn sketches based on the Levenshtein distance. The purpose of drawing sketches in our application is to create graphical user interfaces in a manner similar to well-established paper sketching. The new algorithm is trainable by every user and improves the recognition performance of the techniques which were used before for widget recognition. In addition, this algorithm may serve for recognizing other types of sketches, such as letters, figures, and commands. In this way, there is no modality disruption at sketching time.
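The Levenshtein distance that the abstract builds on is the minimum number of single-element insertions, deletions, and substitutions needed to turn one sequence into another. The sketch below is the standard dynamic-programming formulation, kept to two rows of the table; it is not the paper's recognition algorithm, and how a sketch would be encoded as a sequence of symbols is not stated in the abstract, so the inputs here are plain strings.

```python
from typing import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Minimum number of insertions, deletions, and substitutions
    needed to transform sequence a into sequence b."""
    # prev[j] holds the distance between the first i-1 items of a
    # and the first j items of b.
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, item_b in enumerate(b, start=1):
            substitution_cost = 0 if item_a == item_b else 1
            curr.append(min(
                prev[j] + 1,                      # delete item_a
                curr[j - 1] + 1,                  # insert item_b
                prev[j - 1] + substitution_cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
```

Because the function only compares items for equality, it works on any sequences of comparable symbols, for example lists of stroke labels, as well as on strings.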

51 citations

Proceedings Article
30 Aug 2005
TL;DR: This paper develops a novel technique, called SEPIA, which groups strings into clusters, builds a histogram structure for each cluster, and constructs a global histogram for the database and discusses how to extend the techniques to other similarity functions.
Abstract: Many database applications have the emerging need to support fuzzy queries that ask for strings that are similar to a given string, such as "name similar to smith" and "telephone number similar to 412-0964." Query optimization needs the selectivity of such a fuzzy predicate, i.e., the fraction of records in the database that satisfy the condition. In this paper, we study the problem of estimating selectivities of fuzzy string predicates. We develop a novel technique, called SEPIA, to solve the problem. It groups strings into clusters, builds a histogram structure for each cluster, and constructs a global histogram for the database. It is based on the following intuition: given a query string q, a preselected string p in a cluster, and a string s in the cluster, based on the proximity between q and p, and the proximity between p and s, we can obtain a probability distribution from a global histogram about the similarity between q and s. We give a full specification of the technique using the edit distance function. We study challenges in adopting this technique, including how to construct the histogram structures, how to use them to do selectivity estimation, and how to alleviate the effect of non-uniform errors in the estimation. We discuss how to extend the techniques to other similarity functions. Our extensive experiments on real data sets show that this technique can accurately estimate selectivities of fuzzy string predicates.
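To make the quantity being estimated concrete, the sketch below computes the exact selectivity of a fuzzy predicate by brute force: the fraction of records whose edit distance to the query string is at most a threshold k. This is only the ground truth that an estimator such as SEPIA approximates, not the SEPIA technique itself; the function names, the threshold parameter `k`, and the sample records are illustrative assumptions.

```python
from typing import Iterable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def exact_selectivity(query: str, records: Iterable[str], k: int) -> float:
    """Fraction of records within edit distance k of the query string."""
    records = list(records)
    if not records:
        return 0.0
    matches = sum(1 for r in records if edit_distance(query, r) <= k)
    return matches / len(records)


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    names = ["smith", "smyth", "smithe", "jones"]
    print(exact_selectivity("smith", names, k=1))
```

The brute-force scan costs one edit distance computation per record, which is why a query optimizer needs a cheaper estimate of this fraction.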

51 citations


Network Information
Related Topics (5)
Graph (abstract data type): 69.9K papers, 1.2M citations, 86% related
Unsupervised learning: 22.7K papers, 1M citations, 81% related
Feature vector: 48.8K papers, 954.4K citations, 81% related
Cluster analysis: 146.5K papers, 2.9M citations, 81% related
Scalability: 50.9K papers, 931.6K citations, 80% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    39
2022    96
2021    111
2020    149
2019    145
2018    139