Topic
Edit distance
About: Edit distance is a research topic. Over its lifetime, 2887 publications have been published on this topic, receiving 71491 citations.
Papers
17 Jun 2006
TL;DR: A novel boosted distance metric is proposed that not only finds the best distance metric that fits the distribution of the underlying elements but also selects the most important feature elements with respect to similarity.
Abstract: In this paper, we present a general guideline to establish the relation between a distribution model and its corresponding similarity estimation. A rich set of distance metrics, such as harmonic distance and geometric distance, is derived according to Maximum Likelihood theory. These metrics can provide a more accurate feature model than the conventional Euclidean distance (SSD) and Manhattan distance (SAD). Because the feature elements are from heterogeneous sources and may have different influence on similarity estimation, the assumption of single isotropic distribution model is often inappropriate. We propose a novel boosted distance metric that not only finds the best distance metric that fits the distribution of the underlying elements but also selects the most important feature elements with respect to similarity. We experiment with different distance metrics for similarity estimation and compute the accuracy of different methods in two applications: stereo matching and motion tracking in video sequences. The boosted distance metric is tested on fifteen benchmark data sets from the UCI repository and two image retrieval applications. In all the experiments, robust results are obtained based on the proposed methods.
52 citations
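The two conventional baselines named in the abstract above, SSD and SAD, can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, and it does not attempt the harmonic, geometric, or boosted metrics from the paper, whose exact definitions are not given in this listing. Note that SSD as written is the squared Euclidean distance.

```python
def ssd(a, b):
    """Sum of squared differences (squared Euclidean distance)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sad(a, b):
    """Sum of absolute differences (Manhattan distance)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


# Two small, made-up feature vectors.
u = [1, 2, 3]
v = [2, 4, 6]
print(ssd(u, v), sad(u, v))
```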
TL;DR: An algorithm is presented that computes the minimum distance of editing X to Y subject to the specified constraint, along with a technique for computing the optimal transformation.
51 citations
01 Dec 2013
TL;DR: Efficient algorithms are proposed to handle three types of graph similarity queries by exploiting both matching and mismatching features as well as degree information to improve the filtering and verification on candidates.
Abstract: Graphs are widely used to model complicated data semantics in many applications in bioinformatics, chemistry, social networks, pattern recognition, etc. A recent trend is to tolerate noise arising from various sources such as erroneous data entries and find similarity matches. In this paper, we study graph similarity queries with edit distance constraints. Inspired by the q-gram idea for string similarity problems, our solution extracts paths from graphs as features for indexing. We establish a lower bound of common features to generate candidates. Efficient algorithms are proposed to handle three types of graph similarity queries by exploiting both matching and mismatching features as well as degree information to improve the filtering and verification on candidates. We demonstrate the proposed algorithms significantly outperform existing approaches with extensive experiments on real and synthetic datasets.
51 citations
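The abstract above says its path-based features were inspired by the q-gram idea for strings. As a rough sketch of that string-side idea only (not the paper's graph algorithm), the classic count filter states that two strings within edit distance tau must share at least max(|s|, |t|) - q + 1 - q*tau q-grams, so any pair below that bound can be discarded before the expensive verification step. All function names here are hypothetical.

```python
from collections import Counter


def qgrams(s, q):
    """Multiset of all length-q substrings of s."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def shared_qgrams(s, t, q):
    """Number of q-grams common to s and t, counted with multiplicity."""
    return sum((qgrams(s, q) & qgrams(t, q)).values())


def min_shared_qgrams(s, t, q, tau):
    """Count-filter lower bound on shared q-grams if edit distance <= tau."""
    return max(len(s), len(t)) - q + 1 - q * tau


def may_match(s, t, q, tau):
    """False means the pair is safely pruned; True means it still needs verification."""
    return shared_qgrams(s, t, q) >= min_shared_qgrams(s, t, q, tau)


# "kitten" and "sitting" are at edit distance 3, so tau=1 prunes the pair.
print(may_match("kitten", "sitting", q=2, tau=1))
print(may_match("kitten", "sitting", q=2, tau=3))
```

The filter is necessary but not sufficient: a pair that passes may still exceed the threshold, which is why a verification step follows candidate generation.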
01 Sep 2007
TL;DR: A new algorithm for automatic recognition of hand drawn sketches based on the Levenshtein distance is presented, which is trainable by every user and improves the recognition performance of the techniques which were used before for widget recognition.
Abstract: In this paper we present a new algorithm for automatic recognition of hand drawn sketches based on the Levenshtein distance. The purpose for drawing sketches in our application is to create graphical user interfaces in a similar manner as the well established paper sketching. The new algorithm is trainable by every user and improves the recognition performance of the techniques which were used before for widget recognition. In addition, this algorithm may serve for recognizing other types of sketches, such as letters, figures, and commands. In this way, there is no modality disruption at sketching time.
51 citations
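For readers unfamiliar with the Levenshtein distance that the entry above builds on, here is a minimal sketch of the standard dynamic-programming computation, followed by a nearest-template lookup. The lookup is an assumption made purely for illustration: the listing does not describe how the paper encodes sketches, so the stroke-direction strings and the `nearest_label` helper are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def nearest_label(query, templates):
    """Label of the stored template closest to query (hypothetical helper)."""
    return min(templates, key=lambda label: levenshtein(query, templates[label]))


# Hypothetical encoding: each sketch is a string of stroke directions (R, D, L, U).
templates = {"button": "RRDDLLUU", "slider": "RRRR"}
print(levenshtein("kitten", "sitting"))
print(nearest_label("RRDDLLU", templates))
```

Keeping only two rows of the table holds memory at O(len(b)) while the running time stays O(len(a) * len(b)).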
30 Aug 2005
TL;DR: This paper develops a novel technique, called SEPIA, which groups strings into clusters, builds a histogram structure for each cluster, and constructs a global histogram for the database and discusses how to extend the techniques to other similarity functions.
Abstract: Many database applications have the emerging need to support fuzzy queries that ask for strings that are similar to a given string, such as "name similar to smith" and "telephone number similar to 412-0964." Query optimization needs the selectivity of such a fuzzy predicate, i.e., the fraction of records in the database that satisfy the condition. In this paper, we study the problem of estimating selectivities of fuzzy string predicates. We develop a novel technique, called SEPIA, to solve the problem. It groups strings into clusters, builds a histogram structure for each cluster, and constructs a global histogram for the database. It is based on the following intuition: given a query string q, a preselected string p in a cluster, and a string s in the cluster, based on the proximity between q and p, and the proximity between p and s, we can obtain a probability distribution from a global histogram about the similarity between q and s. We give a full specification of the technique using the edit distance function. We study challenges in adopting this technique, including how to construct the histogram structures, how to use them to do selectivity estimation, and how to alleviate the effect of non-uniform errors in the estimation. We discuss how to extend the techniques to other similarity functions. Our extensive experiments on real data sets show that this technique can accurately estimate selectivities of fuzzy string predicates.
51 citations
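To make the quantity estimated in the entry above concrete, the sketch below computes the exact selectivity of a fuzzy predicate "edit distance to the query is at most k" by scanning every string. This is the brute-force ground truth that an estimator would approximate, not the SEPIA technique itself; the function names and the sample data are hypothetical.

```python
def within_edit_distance(a, b, k):
    """True if the edit distance between a and b is at most k."""
    if abs(len(a) - len(b)) > k:
        return False  # the length difference alone already needs more than k edits
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,
                            curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        if min(curr) > k:
            return False  # row minima never decrease, so the result must exceed k
        prev = curr
    return prev[-1] <= k


def exact_selectivity(query, strings, k):
    """Fraction of strings within edit distance k of query."""
    if not strings:
        return 0.0
    hits = sum(within_edit_distance(query, s, k) for s in strings)
    return hits / len(strings)


# Made-up sample column of names.
names = ["smith", "smyth", "smithe", "jones"]
print(exact_selectivity("smith", names, k=1))
```

A full scan like this costs one distance computation per record, which is the cost a histogram-based estimate is meant to avoid at query-optimization time.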