scispace - formally typeset
Search or ask a question
Topic

Edit distance

About: Edit distance is a research topic. Over the lifetime, 2887 publications have been published within this topic receiving 71491 citations.


Papers
More filters
Journal ArticleDOI
TL;DR: Experiments in hand-written digit recognition are presented, revealing that the normalized edit distance consistently provides better results than both unnormalized or post-normalized classical edit distances.
Abstract: Given two strings X and Y over a finite alphabet, the normalized edit distance between X and Y, d(X,Y) is defined as the minimum of W(P)/L(P), where P is an editing path between X and Y, W(P) is the sum of the weights of the elementary edit operations of P, and L(P) is the number of these operations (length of P). It is shown that in general, d(X,Y) cannot be computed by first obtaining the conventional (unnormalized) edit distance between X and Y and then normalizing this value by the length of the corresponding editing path. In order to compute normalized edit distances, an algorithm that can be implemented to work in O(m*n/sup 2/) time and O(n/sup 2/) memory space is proposed, where m and n are the lengths of the strings under consideration, and m>or=n. Experiments in hand-written digit recognition are presented, revealing that the normalized edit distance consistently provides better results than both unnormalized or post-normalized classical edit distances. >

339 citations

Proceedings ArticleDOI
03 Apr 2006
TL;DR: The concept of a graph closure, a generalized graph that represents a number of graphs, is introduced and the indexing technique, called Closure-tree, organizes graphs hierarchically where each node summarizes its descendants by a graphclosure.
Abstract: Graphs have become popular for modeling structured data. As a result, graph queries are becoming common and graph indexing has come to play an essential role in query processing. We introduce the concept of a graph closure, a generalized graph that represents a number of graphs. Our indexing technique, called Closure-tree, organizes graphs hierarchically where each node summarizes its descendants by a graph closure. Closure-tree can efficiently support both subgraph queries and similarity queries. Subgraph queries find graphs that contain a specific subgraph, whereas similarity queries find graphs that are similar to a query graph. For subgraph queries, we propose a technique called pseudo subgraph isomorphism which approximates subgraph isomorphism with high accuracy. For similarity queries, we measure graph similarity through edit distance using heuristic graph mapping methods. We implement two kinds of similarity queries: K-NN query and range query. Our experiments on chemical compounds and synthetic graphs show that for subgraph queries, Closuretree outperforms existing techniques by up to two orders of magnitude in terms of candidate answer set size and index size. For similarity queries, our experiments validate the quality and efficiency of the presented algorithms.

332 citations

Proceedings Article
01 May 2000
TL;DR: This paper defines evaluation criteria which are more adequate than pure edit distance and describes how the measurement along these quality criteria is performed semi-automatically in a fast, convenient and above all consistent way using this tool and the corresponding graphical user interface.
Abstract: In this paper we present a tool for the evaluation of translation quality. First, the typical requirements of such a tool in the framework of machine translation (MT) research are discussed. We define evaluation criteria which are more adequate than pure edit distance and we describe how the measurement along these quality criteria is performed semi-automatically in a fast, convenient and above all consistent way using our tool and the corresponding graphical user interface.

318 citations

Patent
25 Jan 1997
TL;DR: In this paper, a document browser for electronic filing systems, which supports pen-based markup and annotation, is described, where the user may electronically write notes (60-64) anywhere on a page (32, 38) and then later search for those notes using the approximate ink matching (AIM) technique.
Abstract: In summary there is disclosed a document browser for electronic filing systems, which supports pen-based markup and annotation. The user may electronically write notes (60-64) anywhere on a page (32, 38) and then later search for those notes using the approximate ink matching (AIM) technique. The technique segments (104) the user-drawn strokes, extracts (108) and vector quantizes (112) features contained in those strokes. An edit distance comparison technique (118) is used to query the database (120), rendering the system capable of performing approximate or partial matches to achieve fuzzy search capability.

299 citations

Journal ArticleDOI
TL;DR: It is shown that the similarity provided by TWED is a potentially useful metric in time series retrieval applications since it could benefit from the triangular inequality property to speed up the retrieval process while tuning the parameters of the elastic measure.
Abstract: In a way similar to the string-to-string correction problem, we address discrete time series similarity in light of a time-series-to-time-series-correction problem for which the similarity between two time series is measured as the minimum cost sequence of edit operations needed to transform one time series into another. To define the edit operations, we use the paradigm of a graphical editing process and end up with a dynamic programming algorithm that we call time warp edit distance (TWED). TWED is slightly different in form from dynamic time warping (DTW), longest common subsequence (LCSS), or edit distance with real penalty (ERP) algorithms. In particular, it highlights a parameter that controls a kind of stiffness of the elastic measure along the time axis. We show that the similarity provided by TWED is a potentially useful metric in time series retrieval applications since it could benefit from the triangular inequality property to speed up the retrieval process while tuning the parameters of the elastic measure. In that context, a lower bound is derived to link the matching of time series into down sampled representation spaces to the matching into the original space. The empiric quality of the TWED distance is evaluated on a simple classification task. Compared to edit distance, DTW, LCSS, and ERP, TWED has proved to be quite effective on the considered experimental task.

298 citations


Network Information
Related Topics (5)
Graph (abstract data type)
69.9K papers, 1.2M citations
86% related
Unsupervised learning
22.7K papers, 1M citations
81% related
Feature vector
48.8K papers, 954.4K citations
81% related
Cluster analysis
146.5K papers, 2.9M citations
81% related
Scalability
50.9K papers, 931.6K citations
80% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
202339
202296
2021111
2020149
2019145
2018139