scispace - formally typeset
Topic

Edit distance

About: Edit distance, the minimum number of edit operations (insertions, deletions, and substitutions) needed to transform one string into another, is a research topic. Over its lifetime, 2,887 publications have been published within this topic, receiving 71,491 citations.
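As background for the papers below, the classic Levenshtein formulation counts the minimum number of single-character insertions, deletions, and substitutions. A minimal Python sketch of the standard dynamic program:

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions
    turning string a into string b (classic row-by-row DP)."""
    # prev[j] = edit distance between the current prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution (free on match)
        prev = cur
    return prev[-1]
```

For example, `levenshtein("kitten", "sitting")` is 3 (two substitutions and one insertion).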


Papers
Journal ArticleDOI
TL;DR: This work proposes a novel approach for automatic anomaly detection in heterogeneous sensor networks based on coupling edge data analysis with cloud data analysis and shows how the combination of edge and cloud computing can mitigate the drawbacks of purely edge-based analysis or purely cloud-based solutions.

66 citations

Proceedings ArticleDOI
09 Jan 2001
TL;DR: This work defines costs for the edit-operations and gives an algorithm for computing them, and shows that this approach performs intuitively in categorization and indexing tasks, and its results are better than previous approaches.
Abstract: We report on our experience with the implementation of an algorithm for comparing shapes by computing the edit-distance between their medial axes. A shape-comparison method that is robust to various visual transformations has several applications in computer vision, including organizing and querying an image database, and object recognition. There are two components to research on this problem: mathematical formulation of the shape-comparison problem and the computational solution method. We have a clear, well-defined formulation and polynomial-time algorithms for solution. Previous research has involved either ill-defined formulations or heuristic methods for solution. Our starting-point for the implementation is the edit-distance algorithm of Klein et al. [6]. We discuss how we altered that algorithm to handle rotation-invariance while keeping down the time and storage requirements. Most important, we define costs for the edit-operations and give an algorithm for computing them. We use a database of shapes to illustrate that our approach performs intuitively in categorization and indexing tasks, and our results are better than previous approaches.

66 citations

Journal ArticleDOI
TL;DR: In this paper, the authors improve the time complexity of the problem from O(rn²m²) to O(rnm), where r, n, and m are the lengths of P, S1, and S2, respectively.
Abstract: Given strings S1, S2, and P, the constrained longest common subsequence problem for S1 and S2 with respect to P is to find a longest common subsequence lcs of S1 and S2 which contains P as a subsequence. We present an algorithm which improves the time complexity of the problem from the previously known O(rn²m²) to O(rnm), where r, n, and m are the lengths of P, S1, and S2, respectively. As a generalization of this, we extend the definition of the problem so that the lcs sought contains a subsequence whose edit distance from P is less than a given parameter d. For the latter problem, we propose an algorithm whose time complexity is O(drnm).

66 citations
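The O(rnm) algorithm itself is not reproduced on this page, but the problem statement admits a direct three-dimensional dynamic program. The following Python is an illustrative sketch of that formulation, not the authors' code:

```python
def constrained_lcs_length(s1, s2, p):
    """Length of the longest common subsequence of s1 and s2 that
    contains p as a subsequence; -1 if none exists. O(r*n*m) time,
    matching the bound stated in the abstract."""
    n, m, r = len(s1), len(s2), len(p)
    NEG = float("-inf")  # NEG + 1 stays NEG, so invalid states propagate
    # dp[k][i][j] = best constrained LCS of s1[:i], s2[:j] covering p[:k]
    dp = [[[NEG] * (m + 1) for _ in range(n + 1)] for _ in range(r + 1)]
    for i in range(n + 1):
        dp[0][i][0] = 0
    for j in range(m + 1):
        dp[0][0][j] = 0
    for k in range(r + 1):
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                best = max(dp[k][i - 1][j], dp[k][i][j - 1])   # skip a character
                if s1[i - 1] == s2[j - 1]:
                    best = max(best, dp[k][i - 1][j - 1] + 1)  # match, p unchanged
                    if k and s1[i - 1] == p[k - 1]:
                        # the matched character also advances the constraint p
                        best = max(best, dp[k - 1][i - 1][j - 1] + 1)
                dp[k][i][j] = best
    return dp[r][n][m] if dp[r][n][m] != NEG else -1
```

With an empty constraint the recurrence degenerates to the plain LCS, which is a useful sanity check.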

Posted Content
TL;DR: The Levenshtein distance method is improved by grouping similar-looking alphabets and reducing the weighted difference among members of the same group; the results showed marked improvement over the traditional Levenshtein distance technique.
Abstract: Dictionary lookup methods are popular in dealing with ambiguous letters which were not recognized by Optical Character Readers. However, a robust dictionary lookup method can be complex, as a priori probability calculation or a large dictionary size increases the overhead and the cost of searching. In this context, Levenshtein distance is a simple metric which can be an effective string approximation tool. After observing the effectiveness of this method, an improvement has been made to it by grouping similar-looking alphabets and reducing the weighted difference among members of the same group. The results showed marked improvement over the traditional Levenshtein distance technique.

66 citations
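The grouping idea above can be sketched as a weighted Levenshtein distance in which substitutions inside a group of visually similar characters cost less than a full edit. The groups and the 0.5 reduced cost below are illustrative assumptions, not the paper's actual values:

```python
# Illustrative groups of characters that OCR commonly confuses
# (assumed for this sketch; the paper's groupings may differ).
GROUPS = [set("O0Q"), set("Il1"), set("S5"), set("B8")]

def sub_cost(a, b, reduced=0.5):
    """Substitution cost: free on a match, reduced within a group."""
    if a == b:
        return 0.0
    for g in GROUPS:
        if a in g and b in g:
            return reduced  # similar-looking characters swap cheaply
    return 1.0

def weighted_levenshtein(s, t):
    """Levenshtein DP with the weighted substitution cost above."""
    prev = [float(j) for j in range(len(t) + 1)]
    for i, cs in enumerate(s, 1):
        cur = [float(i)]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1.0,                   # deletion
                           cur[j - 1] + 1.0,                # insertion
                           prev[j - 1] + sub_cost(cs, ct)))  # substitution
        prev = cur
    return prev[-1]
```

Under these assumed weights, the OCR-style confusion "I0" vs "10" scores 0.5 rather than a full edit of 1.0, so near-misses rank closer to their dictionary entries.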

Proceedings Article
03 Aug 2013
TL;DR: This paper provides a solution to automatic grading of the standard computation-theory problem that asks a student to construct a deterministic finite automaton (DFA) from the given description of its language, and provides algorithms for transforming MOSEL descriptions into DFAs and vice-versa.
Abstract: One challenge in making online education more effective is to develop automatic grading software that can provide meaningful feedback. This paper provides a solution to automatic grading of the standard computation-theory problem that asks a student to construct a deterministic finite automaton (DFA) from the given description of its language. We focus on how to assign partial grades for incorrect answers. Each student's answer is compared to the correct DFA using a hybrid of three techniques devised to capture different classes of errors. First, in an attempt to catch syntactic mistakes, we compute the edit distance between the two DFA descriptions. Second, we consider the entropy of the symmetric difference of the languages of the two DFAs, and compute a score that estimates the fraction of the number of strings on which the student answer is wrong. Our third technique is aimed at capturing mistakes in reading of the problem description. For this purpose, we consider a description language MOSEL, which adds syntactic sugar to the classical Monadic Second Order Logic, and allows defining regular languages in a concise and natural way. We provide algorithms, along with optimizations, for transforming MOSEL descriptions into DFAs and vice-versa. These allow us to compute the syntactic edit distance of the incorrect answer from the correct one in terms of their logical representations. We report an experimental study that evaluates hundreds of answers submitted by (real) students by comparing grades/feedback computed by our tool with human graders. Our conclusion is that the tool is able to assign partial grades in a meaningful way, and should be preferred over the human graders for both scalability and consistency.

66 citations
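The paper's second signal, based on the symmetric difference of the two DFAs' languages, can be approximated by counting the strings up to a length bound on which the automata disagree, using a product construction. This is an illustrative sketch under an assumed DFA encoding, not the authors' entropy-based score:

```python
def disagreement_fraction(dfa_a, dfa_b, alphabet, max_len):
    """Fraction of strings of length <= max_len accepted by exactly one
    of the two DFAs. Each DFA is (start, transitions, accepting) with
    transitions[state][symbol] -> state and every transition defined."""
    start_a, trans_a, acc_a = dfa_a
    start_b, trans_b, acc_b = dfa_b
    # count[(qa, qb)] = number of strings of the current length that
    # drive the product automaton to the state pair (qa, qb)
    count = {(start_a, start_b): 1}
    differing = total = 0
    for length in range(max_len + 1):
        if length > 0:  # advance every pair by one symbol
            nxt = {}
            for (qa, qb), c in count.items():
                for s in alphabet:
                    pair = (trans_a[qa][s], trans_b[qb][s])
                    nxt[pair] = nxt.get(pair, 0) + c
            count = nxt
        for (qa, qb), c in count.items():
            total += c
            if (qa in acc_a) != (qb in acc_b):  # exactly one accepts
                differing += c
    return differing / total
```

For example, comparing a DFA for "even number of a's" against one accepting every string over {a, b} yields 3/7 for strings of length at most 2 (the disagreements are "a", "ab", and "ba").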


Network Information
Related Topics (5)
- Graph (abstract data type): 69.9K papers, 1.2M citations, 86% related
- Unsupervised learning: 22.7K papers, 1M citations, 81% related
- Feature vector: 48.8K papers, 954.4K citations, 81% related
- Cluster analysis: 146.5K papers, 2.9M citations, 81% related
- Scalability: 50.9K papers, 931.6K citations, 80% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    39
2022    96
2021    111
2020    149
2019    145
2018    139