Topic
Edit distance
About: Edit distance is a research topic. Over its lifetime, 2887 publications have been published within this topic, receiving 71491 citations.
Papers
20 May 2007
TL;DR: This paper shows how the evolution of changes at source code line level can be inferred from CVS repositories, by combining information retrieval techniques and the Levenshtein edit distance.
Abstract: Observing the evolution of software systems at different levels of granularity has been a key issue for a number of studies, aiming at predicting defects or at studying certain phenomena, such as the presence of clones or of crosscutting concerns. Versioning systems such as CVS and SVN, however, only provide information about lines added or deleted by a contributor: any change is shown as a sequence of additions and deletions. This provides an erroneous estimate of the amount of code changed. This paper shows how the evolution of changes at source code line level can be inferred from CVS repositories, by combining information retrieval techniques and the Levenshtein edit distance. The application of the proposed approach to the ArgoUML case study indicates a high precision and recall.
102 citations
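The abstract above relies on the Levenshtein edit distance to decide whether a deleted line and an added line are really one changed line. The following is a minimal sketch of that distance, not the paper's implementation: the function names `levenshtein` and `normalized_similarity`, and the idea of scoring a line pair with a length-normalized similarity, are assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string costs i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (free on a match)
            ))
        previous = current
    return previous[-1]


def normalized_similarity(a: str, b: str) -> float:
    """Hypothetical helper: 1.0 for identical lines, 0.0 for maximally different ones."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    old_line = "int count = 0;"
    new_line = "int counter = 0;"
    print(levenshtein(old_line, new_line))
    print(round(normalized_similarity(old_line, new_line), 3))
```

A line pair whose similarity exceeds some threshold could be treated as a modification rather than an unrelated deletion plus addition; the choice of threshold is left open here because the abstract does not state one.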
07 Apr 2009
TL;DR: A spelling correction system and method automatically recognizes and corrects misspelled inputs on an electronic device with relatively low computing power. In a learning process, a misspelling correction dictionary is constructed from a corpus of accepted words, context-sensitive strings are selected from among the strings registered in the dictionary, and context information about those strings is acquired.
Abstract: A spelling correction system and method automatically recognizes and corrects misspelled inputs in an electronic device with relatively low computing power. In a learning process, a misspelling correction dictionary is constructed on the basis of a corpus of accepted words, and context-sensitive strings are selected from among all the strings registered in the dictionary. Context information about the context-sensitive strings is acquired. In an applying process, at least one target string is selected from among all the strings in a user's input sentence through the dictionary. If the target string is one of the context-sensitive strings, the target string is corrected by use of the context information.
102 citations
01 Jan 2005
TL;DR: A novel automatic sentence segmentation method for evaluating machine translation output with possibly erroneous sentence boundaries that efficiently produces an optimal automatic segmentation of the hypotheses and thus allows application of existing well-established evaluation measures.
Abstract: This paper presents a novel automatic sentence segmentation method for evaluating machine translation output with possibly erroneous sentence boundaries. The algorithm can process translation hypotheses with segment boundaries which do not correspond to the reference segment boundaries, or a completely unsegmented text stream. Thus, the method is especially useful for evaluating translations of spoken language. The evaluation procedure takes advantage of the edit distance algorithm and is able to handle multiple reference translations. It efficiently produces an optimal automatic segmentation of the hypotheses and thus allows application of existing well-established evaluation measures. Experiments show that the evaluation measures based on the automatically produced segmentation correlate with the human judgement at least as well as the evaluation measures which are based on manual sentence boundaries.
98 citations
09 Jul 2011
TL;DR: By mapping messages into a large context, the authors can compute the distances between them, and then classify them, which yields more accurate classification of a set of Twitter messages than alternative techniques using string edit distance and latent semantic analysis.
Abstract: By mapping messages into a large context, we can compute the distances between them, and then classify them. We test this conjecture on Twitter messages: Messages are mapped onto their most similar Wikipedia pages, and the distances between pages are used as a proxy for the distances between messages. This technique yields more accurate classification of a set of Twitter messages than alternative techniques using string edit distance and latent semantic analysis.
97 citations
02 Jun 1993
TL;DR: This work considers the problem of computing the shortest series of reversals that transform one permutation to another, where a reversal takes an arbitrary substring of elements and reverses their order.
Abstract: Motivated by the problem in computational biology of reconstructing the series of chromosome inversions by which one organism evolved from another, we consider the problem of computing the shortest series of reversals that transform one permutation to another. The permutations describe the order of genes on corresponding chromosomes, and a reversal takes an arbitrary substring of elements and reverses their order.
96 citations
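To make the reversal operation in the last abstract concrete, here is a small sketch that finds the length of a shortest series of reversals between two permutations by breadth-first search. It is an exhaustive search over all permutations, so it is only practical for very short inputs, and it is not the algorithm from the paper; the function name `reversal_distance` is an assumption made for illustration, and the permutations are treated as unsigned with distinct elements.

```python
from collections import deque


def reversal_distance(source, target):
    """Length of a shortest series of substring reversals turning source into target.

    Brute-force breadth-first search: the state space has n! permutations,
    so this is only usable for small n.
    """
    source, target = tuple(source), tuple(target)
    if sorted(source) != sorted(target):
        raise ValueError("source and target must contain the same elements")
    if source == target:
        return 0

    n = len(source)
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        perm, dist = queue.popleft()
        # Try every reversal of a substring perm[i:j] of length at least 2.
        for i in range(n - 1):
            for j in range(i + 2, n + 1):
                nxt = perm[:i] + perm[i:j][::-1] + perm[j:]
                if nxt == target:
                    return dist + 1
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
    raise RuntimeError("target not reachable")  # not expected for distinct elements


if __name__ == "__main__":
    # Gene order on one chromosome versus another, as small integer labels.
    print(reversal_distance([3, 1, 2], [1, 2, 3]))
```

Because breadth-first search visits permutations in order of increasing number of reversals, the first time the target appears its depth is the minimum number of reversals.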