scispace - formally typeset
Topic

Edit distance

About: Edit distance is a research topic. Over its lifetime, 2,887 publications have been published within this topic, receiving 71,491 citations.


Papers
Book Chapter · DOI
11 Mar 2001
TL;DR: It is argued that with these new concepts various well-established techniques from statistical pattern recognition become applicable in the structural domain, particularly to graph representations, including k-means clustering, vector quantization, and Kohonen maps.
Abstract: Two novel concepts in structural pattern recognition are discussed in this paper. The first, median of a set of graphs, can be used to characterize a set of graphs by just a single prototype. Such a characterization is needed in various tasks, for example, in clustering. The second novel concept is weighted mean of a pair of graphs. It can be used to synthesize a graph that has a specified degree of similarity, or distance, to each of a pair of given graphs. Such an operation is needed in many machine learning tasks. It is argued that with these new concepts various well-established techniques from statistical pattern recognition become applicable in the structural domain, particularly to graph representations. Concrete examples include k-means clustering, vector quantization, and Kohonen maps.

54 citations
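
The idea of characterizing a set of graphs by a single prototype can be illustrated with a minimal sketch. It makes two assumptions that are not taken from the paper: the prototype is restricted to members of the set (a "set median"), and graphs are reduced to sets of edges over uniquely labeled nodes, so that a unit-cost edge insert/delete distance is just the size of the symmetric difference. The names `set_median` and `edge_set_distance` are hypothetical.

```python
from typing import Callable, FrozenSet, Sequence, Tuple, TypeVar

T = TypeVar("T")
Edge = Tuple[int, int]


def edge_set_distance(g1: FrozenSet[Edge], g2: FrozenSet[Edge]) -> int:
    # Toy stand-in for graph edit distance (assumption): nodes are uniquely
    # labeled, only edge insertions and deletions are allowed, each at cost 1.
    return len(g1 ^ g2)


def set_median(items: Sequence[T], dist: Callable[[T, T], float]) -> T:
    # Pick the member whose summed distance to all members is smallest.
    if not items:
        raise ValueError("items must be non-empty")
    return min(items, key=lambda c: sum(dist(c, other) for other in items))


g1 = frozenset({(1, 2), (2, 3)})
g2 = frozenset({(1, 2), (2, 3), (3, 4)})
g3 = frozenset({(1, 2), (3, 4)})
prototype = set_median([g1, g2, g3], edge_set_distance)
```

Because `set_median` only needs a distance function, the same selection step could serve as the centroid update in a k-means-style loop over structured objects, which is the kind of transfer the abstract argues for.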

Journal Article · DOI
01 Nov 2013
TL;DR: A partition-based approach to graph similarity queries with edit distance constraints is presented: data graphs are divided into variable-size non-overlapping partitions, and the edit distance constraint is converted to a graph containment constraint for candidate generation.
Abstract: Graphs are widely used to model complex data in many applications, such as bioinformatics, chemistry, social networks, pattern recognition, etc. A fundamental and critical query primitive is to efficiently search similar structures in a large collection of graphs. This paper studies the graph similarity queries with edit distance constraints. Existing solutions to the problem utilize fixed-size overlapping substructures to generate candidates, and thus become susceptible to large vertex degrees or large distance thresholds. In this paper, we present a partition-based approach to tackle the problem. By dividing data graphs into variable-size non-overlapping partitions, the edit distance constraint is converted to a graph containment constraint for candidate generation. We develop efficient query processing algorithms based on the new paradigm. A candidate pruning technique and an improved graph edit distance algorithm are also developed to further boost the performance. In addition, a cost-aware graph partitioning technique is devised to optimize the index. Extensive experiments demonstrate our approach significantly outperforms existing approaches.

54 citations

Posted Content
TL;DR: A new method for recognition of offline Handwritten non-compound Devnagari Characters in two stages uses two well known and established pattern recognition techniques: one using neural networks and the other one using minimum edit distance.
Abstract: This paper deals with a new method for recognition of offline Handwritten non-compound Devnagari Characters in two stages. It uses two well known and established pattern recognition techniques: one using neural networks and the other one using minimum edit distance. Each of these techniques is applied on different sets of characters for recognition. In the first stage, two sets of features are computed and two classifiers are applied to get higher recognition accuracy. Two MLPs are used separately to recognize the characters. For one of the MLPs the characters are represented with their shadow features, and for the other the chain code histogram feature is used. The decisions of both MLPs are combined using a weighted majority scheme. The top three results produced by the combined MLPs in the first stage are used to calculate the relative difference values. In the second stage, based on these relative differences, the character set is divided into two. The first set consists of the characters with distinct shapes and the second set consists of confused characters, which appear very similar in shape. Characters of distinct shapes in the first set are classified using the MLP. Confused characters in the second set are classified using the minimum edit distance method. The minimum edit distance method makes use of corners detected in a character image using a modified Harris corner detection technique. Experiments on this method were carried out on a database of 7154 samples. The overall recognition rate is found to be 90.74%.

54 citations
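
The minimum edit distance step mentioned in the abstract above can be sketched as nearest-template classification under the standard unit-cost Levenshtein distance. This is only an illustration: the encoding of a character as a string of direction codes and the names `edit_distance`, `classify`, and `templates` are assumptions, not the corner-based representation the paper uses.

```python
from typing import Dict, Sequence


def edit_distance(a: Sequence, b: Sequence) -> int:
    # Levenshtein distance with unit costs, computed row by row.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def classify(sample: Sequence, templates: Dict[str, Sequence]) -> str:
    # Return the label of the template closest to the sample.
    return min(templates, key=lambda label: edit_distance(sample, templates[label]))


# Hypothetical feature strings: one symbol per detected feature point.
templates = {"A": "NESE", "B": "SWWN"}
label = classify("NESW", templates)
```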

Journal Article · DOI
TL;DR: This paper presents a user-centered system for signature verification, one of the first based on a direct comparison of the elementary neuromuscular strokes detected in the handwriting, to verify the identity of the user.
Abstract: When using tablet computers, smartphones, or digital pens, human users perform movements with a stylus or their fingers that can be analyzed by the kinematic theory of rapid human movements. In this paper, we present a user-centered system for signature verification that performs such a kinematic analysis to verify the identity of the user. It is one of the first systems that is based on a direct comparison of the elementary neuromuscular strokes which are detected in the handwriting. Taking into account the number of strokes, their similarity, and their timing, the string edit distance is employed to derive a dissimilarity measure for signature verification. On several benchmark datasets, we demonstrate that this neuromuscular analysis is complementary to a well-established verification using dynamic time warping. By combining both approaches, our verifier is able to outperform current state-of-the-art results in on-line signature verification.

54 citations

Proceedings Article · DOI
18 Sep 2011
TL;DR: The proposed approach effectively segments the alignment problem into small subproblems, which in turn yields dramatic time savings even when there are large pieces of inserted or deleted text and the OCR accuracy is poor.
Abstract: This paper aims to evaluate the accuracy of optical character recognition (OCR) systems on real scanned books. The ground truth e-texts are obtained from the Project Gutenberg website and aligned with their corresponding OCR output using a fast recursive text alignment scheme (RETAS). First, unique words in the vocabulary of the book are aligned with unique words in the OCR output. This process is recursively applied to each text segment in between matching unique words until the text segments become very small. In the final stage, an edit distance based alignment algorithm is used to align these short chunks of texts to generate the final alignment. The proposed approach effectively segments the alignment problem into small sub problems which in turn yields dramatic time savings even when there are large pieces of inserted or deleted text and the OCR accuracy is poor. This approach is used to evaluate the OCR accuracy of real scanned books in English, French, German and Spanish.

54 citations
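
The recursive anchoring described in the abstract above can be sketched roughly as follows. This is not the RETAS implementation: the use of `difflib.SequenceMatcher` to line up the unique words, the `min_len` cutoff, and the function names are all assumptions made for illustration. Segments that fall below the cutoff are simply left unaligned here; in the described scheme they would go to an edit-distance-based aligner.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def unique_positions(words: List[str], lo: int, hi: int) -> List[Tuple[int, str]]:
    # Words that occur exactly once inside words[lo:hi], with their indices.
    counts = Counter(words[lo:hi])
    return [(i, words[i]) for i in range(lo, hi) if counts[words[i]] == 1]


def anchor_align(truth: List[str], ocr: List[str],
                 t_lo: int = 0, t_hi: Optional[int] = None,
                 o_lo: int = 0, o_hi: Optional[int] = None,
                 min_len: int = 3) -> List[Tuple[int, int]]:
    # Returns (truth_index, ocr_index) pairs of matched unique words.
    t_hi = len(truth) if t_hi is None else t_hi
    o_hi = len(ocr) if o_hi is None else o_hi
    if t_hi - t_lo <= min_len or o_hi - o_lo <= min_len:
        return []  # small segment: left for a final edit-distance alignment
    t_uni = unique_positions(truth, t_lo, t_hi)
    o_uni = unique_positions(ocr, o_lo, o_hi)
    matcher = SequenceMatcher(a=[w for _, w in t_uni],
                              b=[w for _, w in o_uni], autojunk=False)
    anchors = [(t_uni[blk.a + k][0], o_uni[blk.b + k][0])
               for blk in matcher.get_matching_blocks()
               for k in range(blk.size)]
    if not anchors:
        return []
    result: List[Tuple[int, int]] = []
    prev_t, prev_o = t_lo, o_lo
    for t_i, o_i in anchors:
        # Recurse into the gap before this anchor, then record the anchor.
        result += anchor_align(truth, ocr, prev_t, t_i, prev_o, o_i, min_len)
        result.append((t_i, o_i))
        prev_t, prev_o = t_i + 1, o_i + 1
    result += anchor_align(truth, ocr, prev_t, t_hi, prev_o, o_hi, min_len)
    return result


truth = "the quick brown fox jumps over the lazy dog".split()
ocr = "the quick brovvn fox jumps ovcr the lazy dog".split()
pairs = anchor_align(truth, ocr)
```

Words that recur within a segment (such as "the" in the example) are never used as anchors at that level, but they can become unique, and therefore usable, once the recursion narrows the segment.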


Network Information
Related Topics (5)
Graph (abstract data type): 69.9K papers, 1.2M citations, 86% related
Unsupervised learning: 22.7K papers, 1M citations, 81% related
Feature vector: 48.8K papers, 954.4K citations, 81% related
Cluster analysis: 146.5K papers, 2.9M citations, 81% related
Scalability: 50.9K papers, 931.6K citations, 80% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    39
2022    96
2021    111
2020    149
2019    145
2018    139