Topic

Edit distance

About: Edit distance is a research topic. Over its lifetime, 2,887 publications have appeared within this topic, receiving 71,491 citations.


Papers
Journal Article
TL;DR: This work proposes a methodology for preserving privacy in various record linkage approaches; implements, examines and compares four pairs of privacy-preserving record linkage methods and protocols; and presents a blocking scheme as an extension to the privacy-preserving record linkage methodology.
Abstract: Privacy-preserving record linkage is an important task, mostly because of the highly sensitive nature of personal data. The main focus of this task is to match records across different organisations' data sets or databases without revealing competitive or personal information to non-owners. Several methods and protocols have been proposed to accomplish this. In this work, we propose a methodology for preserving privacy in various record linkage approaches, and we implement, examine and compare four pairs of privacy-preserving record linkage methods and protocols. Two of these protocols use n-gram based similarity comparison techniques, the third uses the well-known edit distance, and the fourth implements the Jaro-Winkler distance metric. All of the protocols are enhanced by private key cryptography and hash encoding. This paper also presents a blocking scheme as an extension to the privacy-preserving record linkage methodology. Our comparison is backed by an extended experimental evaluation that demonstrates the performance achieved by each of the proposed protocols.
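As a rough illustration of the n-gram based similarity comparison mentioned in the abstract above, the sketch below computes a Dice coefficient over character bigrams. It is an assumption-laden example, not the paper's protocol: the function names, the `_` padding character and the choice of the Dice coefficient are all hypothetical, and the private key cryptography and hash encoding the paper relies on are deliberately left out.

```python
def ngrams(s, n=2):
    """Return the set of character n-grams of s, padded with '_' on both ends."""
    padded = "_" * (n - 1) + s.lower() + "_" * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def dice_similarity(a, b, n=2):
    """Dice coefficient over n-gram sets: 2*|A & B| / (|A| + |B|), in [0, 1]."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga and not gb:
        # Both sets empty (only possible for n=1 with empty strings).
        return 1.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


if __name__ == "__main__":
    # Two spellings of a surname share most of their bigrams.
    print(dice_similarity("smith", "smyth"))
```

Because the comparison operates on sets of short substrings rather than on whole strings, a single typo only disturbs the few n-grams that overlap it, which is why this family of measures is popular for matching names.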

39 citations

Proceedings Article
11 Jan 2004
TL;DR: A data structure for approximate nearest neighbor search under the edit metric. To the authors' knowledge, it is the first data structure for this problem with both o(n) query time and storage subexponential in d; its space requirement is roughly O(n^(d^(1/(l+1)))), i.e., strongly subexponential.
Abstract: We present a data structure for the approximate nearest neighbor problem under the edit metric (which is defined as the minimum number of insertions, deletions and character substitutions needed to transform one string into another). For any l ≥ 1 and a set of n strings of length d, the data structure reports a 3^l-approximate nearest neighbor for any given query string q in O(d) time. The space requirement of this data structure is roughly O(n^(d^(1/(l+1)))), i.e., strongly subexponential. To our knowledge, this is the first data structure for this problem with both o(n) query time and storage subexponential in d.
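The edit metric defined in the abstract above (minimum number of insertions, deletions and substitutions) can be computed exactly with the classic dynamic program. The sketch below is the textbook quadratic-time algorithm, not the approximate nearest neighbor data structure the paper describes; the function name is an assumption chosen for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b, using two DP rows."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))  # 3
```

A brute-force nearest neighbor search would call this function once per stored string, costing O(n · d^2) per query for n strings of length d; that linear dependence on n is what the data structure in the abstract is designed to avoid.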

39 citations

Journal Article
TL;DR: Taking advantage of the metric property of ERP, an ERP-induced inner product and a Gaussian ERP kernel are developed and embedded into difference-weighted KNN classifiers, and the experimental results show that the proposed classifiers are effective for accurate classification of pulse waveforms.
Abstract: Advances in sensor and signal processing techniques have provided effective tools for quantitative research in traditional Chinese pulse diagnosis (TCPD). Because of the inevitable intraclass variation of pulse patterns, the automatic classification of pulse waveforms has remained a difficult problem. In this paper, by referring to the edit distance with real penalty (ERP) and the recent progress in k-nearest neighbors (KNN) classifiers, we propose two novel ERP-based KNN classifiers. Taking advantage of the metric property of ERP, we first develop an ERP-induced inner product and a Gaussian ERP kernel, then embed them into difference-weighted KNN classifiers, and finally develop two novel classifiers for pulse waveform classification. The experimental results show that the proposed classifiers are effective for accurate classification of pulse waveforms.
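For readers unfamiliar with the edit distance with real penalty (ERP) named in the abstract above, the following is a minimal sketch of the commonly stated ERP recurrence for two numeric sequences: aligned elements cost their absolute difference, and an element aligned to a gap costs its absolute difference from a constant gap value. The function name and the default gap value of 0 are assumptions; the paper's ERP-induced inner product, Gaussian kernel and KNN classifiers are not reproduced here.

```python
def erp_distance(x, y, gap=0.0):
    """Edit distance with real penalty between numeric sequences x and y."""
    n, m = len(x), len(y)
    # d[i][j] = ERP distance between x[:i] and y[:j]
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + abs(x[i - 1] - gap)  # all of x[:i] vs gaps
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + abs(y[j - 1] - gap)  # all of y[:j] vs gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j - 1] + abs(x[i - 1] - y[j - 1]),  # align x_i, y_j
                d[i - 1][j] + abs(x[i - 1] - gap),           # x_i vs gap
                d[i][j - 1] + abs(y[j - 1] - gap),           # y_j vs gap
            )
    return d[n][m]


if __name__ == "__main__":
    print(erp_distance([1, 2, 3], [1, 3]))  # 2.0
```

Penalising gaps against a fixed reference value, rather than against the neighbouring element, is what lets ERP satisfy the triangle inequality, the metric property the abstract refers to.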

39 citations

Proceedings Article
22 Jun 2020
TL;DR: This paper shows that the edit distance between two strings of length n can be computed via a randomized algorithm within a factor of f(ε) in n^(1+ε) time, as long as the edit distance is at least n^(1−δ) for some δ(ε) > 0.
Abstract: We show that the edit distance between two strings of length n can be computed via a randomized algorithm within a factor of f(ε) in n^(1+ε) time as long as the edit distance is at least n^(1−δ) for some δ(ε) > 0.

39 citations

Proceedings Article
01 May 2006
TL;DR: A methodology is proposed for the automatic detection of cognates between two languages based solely on the orthography of words; it improves the F-measure compared with detecting cognates based only on the edit distance between them.
Abstract: Present-day machine translation technologies crucially depend on the size and quality of lexical resources. Much recent research in the area has been concerned with methods for building bilingual dictionaries automatically. In this paper we propose a methodology for the automatic detection of cognates between two languages based solely on the orthography of words. From a set of known cognates, the method induces rules capturing regularities in the orthographic mutations that a word undergoes when migrating from one language into the other. The rules are then applied as a preprocessing step before measuring the orthographic similarity between putative cognates. As a result, the method achieves an improvement in F-measure of 11.86% compared with detecting cognates based only on the edit distance between them.
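The preprocessing idea in the abstract above (rewrite a word with orthographic rules, then measure similarity) might look roughly like the sketch below. Everything specific here is hypothetical: the single hand-written rule `("ph", "f")`, the word pair, the function names, and the length-normalised similarity score. The paper induces its rules automatically from known cognates, which this example does not attempt.

```python
def levenshtein(a, b):
    """Plain Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def apply_rules(word, rules):
    """Apply each (source, target) substring rewrite rule in order."""
    for src, dst in rules:
        word = word.replace(src, dst)
    return word


def orthographic_similarity(w1, w2, rules=()):
    """1 - normalised edit distance, after rewriting w1 with the given rules."""
    w1 = apply_rules(w1.lower(), rules)
    w2 = w2.lower()
    longest = max(len(w1), len(w2))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(w1, w2) / longest


if __name__ == "__main__":
    rules = [("ph", "f")]  # hypothetical hand-written rule, not from the paper
    print(orthographic_similarity("photograph", "fotografo"))         # no rules
    print(orthographic_similarity("photograph", "fotografo", rules))  # with rule
```

With the rule applied, the systematic `ph`/`f` correspondence no longer counts against the pair, so the remaining distance reflects only the idiosyncratic differences between the two words.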

39 citations


Network Information
Related Topics (5)
Graph (abstract data type)
69.9K papers, 1.2M citations
86% related
Unsupervised learning
22.7K papers, 1M citations
81% related
Feature vector
48.8K papers, 954.4K citations
81% related
Cluster analysis
146.5K papers, 2.9M citations
81% related
Scalability
50.9K papers, 931.6K citations
80% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    39
2022    96
2021    111
2020    149
2019    145
2018    139