scispace - formally typeset
Search or ask a question
Topic

Edit distance

About: Edit distance is a research topic. Over the lifetime, 2887 publications have been published within this topic receiving 71491 citations.


Papers
More filters
Journal ArticleDOI
TL;DR: In this paper, the effectiveness of the edit distance as a predictor of perceived rhythmic dissimilarity under simple rhythmic alterations was investigated, where rhythms were approached as a set of pulses that are either onsets or silences.
Abstract: The 'edit distance' (or 'Levenshtein distance') measure of distance between two data sets is defined as the minimum number of editing operations - insertions, deletions, and substitutions - that are required to transform one data set to the other (Orpen and Huron, 1992). This measure of distance has been applied frequently and successfully in music information retrieval, but rarely in predicting human perception of distance. In this study, we investigate the effectiveness of the edit distance as a predictor of perceived rhythmic dissimilarity under simple rhythmic alterations. Approaching rhythms as a set of pulses that are either onsets or silences, we study two types of alterations. The first experiment is designed to test the model's accuracy for rhythms that are relatively similar; whether rhythmic variations with the same edit distance to a source rhythm are also perceived as relatively similar by human subjects. In addition, we observe whether the salience of an edit operation is affected by its metric placement in the rhythm. Instead of using a rhythm that regularly subdivides a 4/4 meter, our source rhythm is a syncopated 16-pulse rhythm, the son. Results show a high correlation between the predictions by the edit distance model and human similarity judgments (r = 0.87); a higher correlation than for the well-known generative theory of tonal music (r = 0.64). In the second experiment, we seek to assess the accuracy of the edit distance model in predicting relatively dissimilar rhythms. The stimuli used are random permutations of the son's inter-onset intervals: 3-3-4-2-4. The results again indicate that the edit distance correlates well with the perceived rhythmic dissimilarity judgments of the subjects (r = 0.76). To gain insight in the relationships between the individual rhythms, the results are also presented by means of graphic phylogenetic trees.

22 citations

Book ChapterDOI
05 Sep 2011
TL;DR: A way to index population genotype information together with the complete genome sequence, so that one can use the index to efficiently align a given sequence to the genome with all plausible genotype recombinations taken into account.
Abstract: We propose a way to index population genotype information together with the complete genome sequence, so that one can use the index to efficiently align a given sequence to the genome with all plausible genotype recombinations taken into account. This is achieved through converting a multiple alignment of individual genomes into a finite automaton recognizing all strings that can be read from the alignment by switching the sequence at any time. The finite automaton is indexed with an extension of Burrows-Wheeler transform to allow pattern search inside the plausible recombinant sequences. The size of the index stays limited, because of the high similarity of individual genomes. The index finds applications in variation calling and in primer design.

22 citations

Proceedings ArticleDOI
30 Oct 2008
TL;DR: This paper presents a transliteration algorithm for mapping English named entities to their proper Tamil equivalents using a grapheme-based model, in which transliterations equivalents are identified by mapping the source language names to their equivalents in a target language database, instead of generating them.
Abstract: Transliteration of named entities in user queries is a vital step in any Cross-Language Information Retrieval (CLIR) system. Several methods for transliteration have been proposed till date based on the nature of the languages considered. In this paper, we present a transliteration algorithm for mapping English named entities to their proper Tamil equivalents. Our algorithm employs a grapheme-based model, in which transliteration equivalents are identified by mapping the source language names to their equivalents in a target language database, instead of generating them. The basic principle is to compress the source word into its minimal form and align it across an indexed list of target language words to arrive at the top n-equivalents based on the edit distance. We compare the performance of our approach with a statistical generation approach using Microsoft Research India (MSRI) transliteration corpus. Our approach has proved very effective in terms of accuracy and time.

22 citations

Patent
29 Nov 2006
TL;DR: In this article, a system embodiment includes logic to produce a gram from a string and logic to identify candidate documents based on identifying matches between query grams and document grams stored in an inverted index that relates grams to documents.
Abstract: Systems, methodologies, media, and other embodiments associated with efficiently computing document similarity are described. One exemplary system embodiment includes logic to produce a gram from a string and logic to identify candidate documents based on identifying matches between query grams and document grams stored in an inverted index that relates grams to documents. The example system may also include logic to selectively partially reconstruct a candidate document from entries in the inverted index and logic to compute an edit distance between a string associated with a query and a string associated with the partially reconstructed candidate document. The example system may also include a signal logic configured to provide a signal corresponding to the edit distance.

22 citations

Journal ArticleDOI
TL;DR: An index that stores a text of length n such that given a pattern of length m, all the substrings of the text that are within Hamming distance (or edit distance) at most k from the pattern are reported in O(m+loglogn+#matches) time.

22 citations


Network Information
Related Topics (5)
Graph (abstract data type)
69.9K papers, 1.2M citations
86% related
Unsupervised learning
22.7K papers, 1M citations
81% related
Feature vector
48.8K papers, 954.4K citations
81% related
Cluster analysis
146.5K papers, 2.9M citations
81% related
Scalability
50.9K papers, 931.6K citations
80% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
202339
202296
2021111
2020149
2019145
2018139