Topic

Edit distance

About: Edit distance is a research topic. Over its lifetime, 2,887 publications have appeared within this topic, receiving 71,491 citations.


Papers
Book Chapter
22 Jul 1998
TL;DR: An improved similarity measure for the first-order instance based learner Ribl is presented that employs the concept of edit distances to efficiently compute distances between lists and terms.
Abstract: The similarity measures used in first-order IBL so far have been limited to the function-free case. In this paper we show that a lot of predictive power can be gained by allowing lists and other terms in the input representation and designing similarity measures that work directly on these structures. We present an improved similarity measure for the first-order instance based learner Ribl that employs the concept of edit distances to efficiently compute distances between lists and terms, discuss its computational and formal properties, and show that it is empirically superior by a wide margin on a problem from the domain of biochemistry.
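To illustrate the general idea of an edit distance between lists, here is a minimal sketch of the standard unit-cost Levenshtein distance applied to arbitrary sequences. It is not Ribl's actual similarity measure, which the abstract does not spell out; the function names `list_edit_distance` and `normalized_distance` and the normalization by the longer length are assumptions made for illustration.

```python
from typing import Hashable, Sequence


def list_edit_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Unit-cost edit distance (insert, delete, substitute) between two sequences."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty prefix of b
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def normalized_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Hypothetical normalization to [0, 1] by the longer sequence length."""
    longest = max(len(a), len(b))
    return list_edit_distance(a, b) / longest if longest else 0.0
```

A distance in [0, 1] of this kind can be plugged into an instance-based learner as one component of a similarity measure; the table is filled in time proportional to the product of the two lengths.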

32 citations

Journal Article
10 Oct 2017-PLOS ONE
TL;DR: An efficient memory-access algorithm for parallel approximate string matching with k-differences on Graphics Processing Units (GPUs), in which all threads in the same GPU warp share data using warp-shuffle operations instead of accessing shared memory.
Abstract: Approximate string matching with k-differences has a number of practical applications, ranging from pattern recognition to computational biology. This paper proposes an efficient memory-access algorithm for parallel approximate string matching with k-differences on Graphics Processing Units (GPUs). In the proposed algorithm, all threads in the same GPU warp share data using warp-shuffle operations instead of accessing the shared memory. Moreover, we implement the proposed algorithm by exploiting the memory structure of GPUs to optimize its performance. Experimental results for real DNA packages revealed that the performance of the proposed algorithm and its implementation reached up to 122.64 and 1.53 times that of the sequential algorithm on a CPU and of a previous parallel approximate string matching algorithm on GPUs, respectively.
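For readers unfamiliar with the k-differences problem, the following is a minimal sequential sketch using the classic dynamic programming formulation in which a match may start at any text position. It is a plain CPU baseline under stated assumptions, not the paper's GPU algorithm or its warp-shuffle scheme; the function name `k_differences` and the choice to report 0-based end positions are illustrative.

```python
from typing import List


def k_differences(pattern: str, text: str, k: int) -> List[int]:
    """Return 0-based text positions where some substring ending there
    is within edit distance k of the pattern."""
    m = len(pattern)
    # Column for the empty text prefix: aligning pattern[:i] costs i deletions.
    prev = list(range(m + 1))
    ends = []
    for j, t in enumerate(text):
        curr = [0]  # top row is 0: a match may start anywhere in the text
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == t else 1
            curr.append(min(prev[i] + 1,          # skip the text character
                            curr[i - 1] + 1,      # skip the pattern character
                            prev[i - 1] + cost))  # match or substitute
        if curr[m] <= k:
            ends.append(j)
        prev = curr
    return ends
```

Each column depends only on the previous one, so memory use is linear in the pattern length. That column-to-column dependency is the kind of data that a parallel implementation has to exchange between threads.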

32 citations

Journal Article
TL;DR: A dynamic programming algorithm to compare two quotiented trees using a constrained edit distance, based on an adaptation of an algorithm recently proposed by Zhang for comparing unordered rooted trees.
Abstract: In this paper we propose a dynamic programming algorithm to compare two quotiented trees using a constrained edit distance. A quotiented tree is a tree defined with an additional equivalence relation on vertices and such that the quotient graph is also a tree. The core of the method relies on an adaptation of an algorithm recently proposed by Zhang for comparing unordered rooted trees. This method is currently being used in plant architecture modelling to quantify different types of variability between plants represented by quotiented trees.

32 citations

Patent
10 Mar 2009
TL;DR: An architecture for extracting document information from documents received as search results based on a query string, and for computing an edit distance between the data string and the query string.
Abstract: Architecture for extracting document information from documents received as search results based on a query string, and computing an edit distance between the data string and the query string. The edit distance is employed in determining relevance of the document as part of result ranking by detecting near-matches of a whole query or part of the query. The edit distance evaluates how close the query string is to a given data stream that includes document information such as TAUC (title, anchor text, URL, clicks) information, etc. The architecture includes the index-time splitting of compound terms in the URL to allow the more effective discovery of query terms. Additionally, index-time filtering of anchor text is utilized to find the top N anchors of one or more of the document results. The TAUC information can be input to a neural network (e.g., 2-layer) to improve relevance metrics for ranking the search results.
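As a rough illustration of using an edit distance as a ranking signal, the sketch below compares a query with candidate titles at the term level and sorts the titles by distance. It is not the patented architecture: the function names `term_edit_distance` and `rank_by_edit_distance`, the lowercase whitespace tokenization, and the unit costs are all assumptions, and the TAUC features and neural network stage are not modeled.

```python
from typing import List, Sequence


def term_edit_distance(query_terms: Sequence[str], data_terms: Sequence[str]) -> int:
    """Unit-cost edit distance where the edited units are whole terms."""
    prev = list(range(len(data_terms) + 1))
    for i, q in enumerate(query_terms, start=1):
        curr = [i]
        for j, d in enumerate(data_terms, start=1):
            cost = 0 if q == d else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def rank_by_edit_distance(query: str, titles: List[str]) -> List[str]:
    """Order titles from nearest to farthest match (hypothetical ranking rule)."""
    q = query.lower().split()
    return sorted(titles, key=lambda t: term_edit_distance(q, t.lower().split()))
```

For example, `rank_by_edit_distance("edit distance", ["graph clustering methods", "edit distance", "tree edit distance"])` places the exact title first and the near-match second.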

32 citations


Network Information
Related Topics (5)
Graph (abstract data type): 69.9K papers, 1.2M citations, 86% related
Unsupervised learning: 22.7K papers, 1M citations, 81% related
Feature vector: 48.8K papers, 954.4K citations, 81% related
Cluster analysis: 146.5K papers, 2.9M citations, 81% related
Scalability: 50.9K papers, 931.6K citations, 80% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    39
2022    96
2021    111
2020    149
2019    145
2018    139