Topic

Edit distance

About: Edit distance is a research topic. Over its lifetime, 2,887 publications have been published within this topic, receiving 71,491 citations.


Papers
Proceedings Article
20 May 2007
TL;DR: This paper shows how the evolution of changes at source code line level can be inferred from CVS repositories, by combining information retrieval techniques and the Levenshtein edit distance.
Abstract: Observing the evolution of software systems at different levels of granularity has been a key issue for a number of studies, aiming at predicting defects or at studying certain phenomena, such as the presence of clones or of crosscutting concerns. Versioning systems such as CVS and SVN, however, only provide information about lines added or deleted by a contributor: any change is shown as a sequence of additions and deletions. This provides an erroneous estimate of the amount of code changed. This paper shows how the evolution of changes at source code line level can be inferred from CVS repositories, by combining information retrieval techniques and the Levenshtein edit distance. The application of the proposed approach to the ArgoUML case study indicates a high precision and recall.

102 citations
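
The abstract above names the Levenshtein edit distance but does not spell out how lines are matched. As a minimal sketch of the distance itself, the following Python computes it with the standard dynamic-programming recurrence. The function names and the length-normalized similarity are assumptions added for illustration, not the paper's procedure.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `a` into `b`."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of `a`
    # and the first j characters of `b`.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def normalized_similarity(a: str, b: str) -> float:
    """Hypothetical helper: 1.0 for identical strings, 0.0 for maximally
    different ones. The normalization by the longer length is an assumption."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest
```

A line that was lightly edited between two revisions would score close to 1.0 under this similarity, whereas a plain add/delete diff would report it as one deletion plus one addition.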

Patent
Hee-Jun Song1, Young-Hee Park1, Hyun Sik Shim1, Ham Jong Gyu1, Harksoo Kim1, Jooho Lee1, Se Hee Lee1 
07 Apr 2009
TL;DR: A spelling correction system and method automatically recognizes and corrects misspelled inputs in an electronic device with relatively low computing power. In a learning process, a misspelling correction dictionary is constructed on the basis of a corpus of accepted words, context-sensitive strings are selected from among all the strings registered in the dictionary, and context information about the context-sensitive strings is acquired.
Abstract: A spelling correction system and method automatically recognizes and corrects misspelled inputs in an electronic device with relatively low computing power. In a learning process, a misspelling correction dictionary is constructed on the basis of a corpus of accepted words, and context-sensitive strings are selected from among all the strings registered in the dictionary. Context information about the context-sensitive strings is acquired. In an applying process, at least one target string is selected from among all the strings in a user's input sentence through the dictionary. If the target string is one of the context-sensitive strings, the target string is corrected by use of the context information.

102 citations

01 Jan 2005
TL;DR: A novel automatic sentence segmentation method for evaluating machine translation output with possibly erroneous sentence boundaries that efficiently produces an optimal automatic segmentation of the hypotheses and thus allows application of existing well-established evaluation measures.
Abstract: This paper presents a novel automatic sentence segmentation method for evaluating machine translation output with possibly erroneous sentence boundaries. The algorithm can process translation hypotheses with segment boundaries which do not correspond to the reference segment boundaries, or a completely unsegmented text stream. Thus, the method is especially useful for evaluating translations of spoken language. The evaluation procedure takes advantage of the edit distance algorithm and is able to handle multiple reference translations. It efficiently produces an optimal automatic segmentation of the hypotheses and thus allows application of existing well-established evaluation measures. Experiments show that the evaluation measures based on the automatically produced segmentation correlate with the human judgement at least as well as the evaluation measures which are based on manual sentence boundaries.

98 citations

Book Chapter
09 Jul 2011
TL;DR: By mapping messages into a large context, the authors can compute the distances between them, and then classify them, which yields more accurate classification of a set of Twitter messages than alternative techniques using string edit distance and latent semantic analysis.
Abstract: By mapping messages into a large context, we can compute the distances between them, and then classify them. We test this conjecture on Twitter messages: Messages are mapped onto their most similar Wikipedia pages, and the distances between pages are used as a proxy for the distances between messages. This technique yields more accurate classification of a set of Twitter messages than alternative techniques using string edit distance and latent semantic analysis.

97 citations

Book Chapter
02 Jun 1993
TL;DR: This work considers the problem of computing the shortest series of reversals that transform one permutation to another, where a reversal takes an arbitrary substring of elements and reverses their order.
Abstract: Motivated by the problem in computational biology of reconstructing the series of chromosome inversions by which one organism evolved from another, we consider the problem of computing the shortest series of reversals that transform one permutation to another. The permutations describe the order of genes on corresponding chromosomes, and a reversal takes an arbitrary substring of elements and reverses their order.

96 citations
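
The reversal operation described in the abstract above can be sketched in a few lines of Python. The brute-force breadth-first search below is only an illustration for very small unsigned permutations; it is not the algorithm from the chapter, and the function names are assumptions.

```python
from collections import deque


def apply_reversal(perm, i, j):
    """Return a copy of `perm` with the elements at positions i..j
    (inclusive, 0-based) in reversed order."""
    return perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]


def reversal_distance(source, target):
    """Length of the shortest series of reversals turning `source` into
    `target`, found by exhaustive breadth-first search.
    Explores up to n! permutations, so it is practical only for tiny inputs."""
    source, target = tuple(source), tuple(target)
    if sorted(source) != sorted(target):
        raise ValueError("inputs must be permutations of the same elements")
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        perm, steps = queue.popleft()
        if perm == target:
            return steps
        # Try every reversal of a substring of length at least 2.
        for i in range(len(perm)):
            for j in range(i + 1, len(perm)):
                candidate = apply_reversal(perm, i, j)
                if candidate not in seen:
                    seen.add(candidate)
                    queue.append((candidate, steps + 1))
```

For example, `apply_reversal([1, 2, 3, 4, 5], 1, 3)` reverses the middle three elements, and `reversal_distance((2, 3, 1), (1, 2, 3))` needs two reversals because no single reversal sorts that permutation.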


Network Information
Related Topics (5)
Graph (abstract data type)
69.9K papers, 1.2M citations
86% related
Unsupervised learning
22.7K papers, 1M citations
81% related
Feature vector
48.8K papers, 954.4K citations
81% related
Cluster analysis
146.5K papers, 2.9M citations
81% related
Scalability
50.9K papers, 931.6K citations
80% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    39
2022    96
2021    111
2020    149
2019    145
2018    139