Topic

Edit distance

About: Edit distance is a research topic. Over its lifetime, 2,887 publications have been published within this topic, receiving 71,491 citations.


Papers
Book Chapter
10 Apr 2006
TL;DR: Empirical studies are presented that show the suitability of this measure to dynamically calculate the fitness distance correlation coefficient during the evolution, to construct a fitness sharing system for genetic programming and to measure genotypic diversity in the population.
Abstract: To analyse various properties of the search process of genetic programming it is useful to quantify the distance between two individuals. Using operator-based distance measures can make this analysis more accurate and reliable than using distance measures which have no relationship with the genetic operators. This paper extends a recent definition of a distance measure based on subtree crossover for genetic programming. Empirical studies are presented that show the suitability of this measure to dynamically calculate the fitness distance correlation coefficient during the evolution, to construct a fitness sharing system for genetic programming and to measure genotypic diversity in the population. These experiments confirm the accuracy of the new measure and its consistency with the subtree crossover genetic operator.

14 citations

Proceedings Article
08 Dec 2015
TL;DR: The main contribution of this work is to present a memory-access-efficient implementation for computing the ASM on a GPU called w-SCAN, which relies on warp shuffle for communication between threads without resorting to shared memory access.
Abstract: The approximate string matching (ASM) problem asks to find a substring of string Y of length n that is most similar to string X of length m. The ASM can be solved by a dynamic programming technique, which computes a table of size m × n. The main contribution of this work is to present a memory-access-efficient implementation for computing the ASM on a GPU. The key idea of our implementation relies on warp shuffle for communication between threads without resorting to shared memory access. Surprisingly, our implementation performs only O(mn/w) memory access operations, where w is the warp size, although O(mn) memory access operations are necessary to access all elements in the table of size m × n. Experimental results, carried out on a GeForce GTX 980 GPU, show that the proposed implementation, called w-SCAN, provides a speed-up factor of over 200 compared to a single CPU implementation. Also, w-SCAN computes the ASM in less than 40% of the time required by another prominent alternative.

14 citations
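
The dynamic programming formulation mentioned in the ASM abstract above can be illustrated with a small CPU sketch. This is not the w-SCAN GPU implementation; it is a plain Python version of the standard table recurrence, assuming unit costs for insertion, deletion, and substitution. The function name `best_substring_distance` is hypothetical.

```python
def best_substring_distance(x: str, y: str) -> tuple[int, int]:
    """Return (distance, end): the smallest edit distance between x and any
    substring of y, and the index in y just past where that substring ends."""
    m, n = len(x), len(y)
    # Row 0 is all zeros: a match may start at any position in y for free.
    prev = [0] * (n + 1)
    for i in range(1, m + 1):
        # Column 0 is i: matching x[:i] against an empty substring costs i deletions.
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if x[i - 1] == y[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete x[i-1]
                curr[j - 1] + 1,     # insert y[j-1]
                prev[j - 1] + cost,  # match or substitute
            )
        prev = curr
    # The answer is the minimum over the last row (the match may end anywhere).
    best_end = min(range(n + 1), key=lambda j: prev[j])
    return prev[best_end], best_end


print(best_substring_distance("abc", "xxabcxx"))
```

Only two rows are kept at a time, so the sketch uses O(n) memory while still visiting all m × n table cells.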

Journal Article
TL;DR: This work shows how to tackle online approximate matching when the distance function is non-local, and gives new solutions which are applicable to a wide variety of matching problems including function and parameterised matching, swap matching, swap-mismatch, k-difference with transpositions, overlap matching, edit distance/LCS and L1 and L2 rearrangement distances.

14 citations

Proceedings Article
P.H. Nguyen, Thuan Ngo, D.A. Phan, T. Dinh, T.Q. Huynh
13 Jul 2008
TL;DR: A new approach for Vietnamese spelling checking based on Vietnamese characteristics for each phase is presented, which includes the use of a syllable Bi-gram in combination with parts of speech (POS) to find out suspected syllables.
Abstract: The spelling checking problem is considered to contain two main phases: the detecting phase and the correcting phase. In this paper, we present a new approach for Vietnamese spelling checking based on Vietnamese characteristics for each phase. Our research approach includes the use of a syllable Bi-gram in combination with parts of speech (POS) to find suspected syllables. In the correcting phase, we rely on minimum edit distance, SoundEx algorithms, and some heuristics to build a weight function for assessing suggestion candidates. The training corpus and the test set were collected from e-newspapers.

14 citations
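
The minimum edit distance used in the correcting phase above can be sketched as follows. This shows only the edit distance component, assuming unit costs; the paper's weight function also combines SoundEx and heuristics, which are not reproduced here. The names `levenshtein` and `rank_candidates` and the candidate list are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def rank_candidates(word: str, candidates: list[str]) -> list[str]:
    """Order suggestion candidates by edit distance to the suspected word,
    breaking ties alphabetically."""
    return sorted(candidates, key=lambda c: (levenshtein(word, c), c))


print(rank_candidates("helo", ["help", "hello", "world"]))
```

A real spelling checker would replace the sort key with a weighted score; the tuple key here is just the simplest stand-in.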

Proceedings Article
03 Jul 2002
TL;DR: In this article, gapped q-grams with just one gap were used to filter the Levenshtein distance, and the resulting filters provided a significant improvement over the contiguous q-gram filters.
Abstract: We have recently shown that q-gram filters based on gapped q-grams instead of the usual contiguous q-grams can provide orders of magnitude faster and/or more efficient filtering for the Hamming distance. In this paper, we extend the results to the Levenshtein distance, which is more problematic for gapped q-grams because an insertion or deletion in a gap affects a q-gram while a replacement does not. To keep this effect under control, we concentrate on gapped q-grams with just one gap. We demonstrate with experiments that the resulting filters provide a significant improvement over the contiguous q-gram filters. We also develop new techniques for dealing with complex q-gram filters.

14 citations
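
The remark in the abstract above, that an insertion or deletion in a gap affects a gapped q-gram while a replacement does not, can be seen in a small sketch. The shape notation (`#` for a sampled position, `-` for a gap position) and the function name `gapped_qgrams` are assumptions made for illustration, not the paper's notation or filter.

```python
def gapped_qgrams(s: str, shape: str) -> list[str]:
    """Extract all gapped q-grams of s for a given shape.
    In shape, '#' marks a sampled position and '-' marks a gap position,
    so '##-#' samples offsets 0, 1 and 3 of each window of length 4."""
    offsets = [k for k, ch in enumerate(shape) if ch == "#"]
    span = len(shape)
    return [
        "".join(s[i + k] for k in offsets)
        for i in range(len(s) - span + 1)
    ]


shape = "##-#"  # a 3-gram with one gap of length 1
print(gapped_qgrams("abcde", shape))   # original string
print(gapped_qgrams("abXde", shape))   # replacement inside the gap
print(gapped_qgrams("abXcde", shape))  # insertion inside the gap
```

With this shape, the q-gram sampled from `a`, `b`, `d` survives the replacement of `c` but not the insertion before it, because the insertion shifts which characters fall on the sampled offsets.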


Network Information
Related Topics (5)
Graph (abstract data type): 69.9K papers, 1.2M citations, 86% related
Unsupervised learning: 22.7K papers, 1M citations, 81% related
Feature vector: 48.8K papers, 954.4K citations, 81% related
Cluster analysis: 146.5K papers, 2.9M citations, 81% related
Scalability: 50.9K papers, 931.6K citations, 80% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    39
2022    96
2021    111
2020    149
2019    145
2018    139