Topic

Edit distance

About: Edit distance is a research topic. Over its lifetime, 2,887 publications have been published on this topic, receiving 71,491 citations.


Papers
Journal ArticleDOI
TL;DR: This work answers the uniqueness problem, whether two different functions may share the same distance transform, in a generality sufficient for all practical applications in imaging sciences; the full-scale problem remains open.

39 citations

Journal ArticleDOI
TL;DR: This work exploits a classical embedding of the edit distance into the Hamming distance to enable some flexibility on the tolerated edit distance when looking for close keywords while preserving the confidentiality of the queries.
Abstract: Our work focuses on fuzzy keyword search over encrypted data in cloud computing. We adapt results on private identification schemes by Bringer et al. to this new context, exploiting a classical embedding of the edit distance into the Hamming distance. Our approach allows some flexibility in the tolerated edit distance when looking for close keywords while preserving the confidentiality of the queries. Our proposal is proved secure in a security model that takes privacy into account.

39 citations

Journal ArticleDOI
TL;DR: The nearly 50-year-old quadratic time bound for computing Dynamic Time Warping or GED between two sequences of $n$ points in $\mathbb{R}$ is broken by presenting deterministic algorithms that run in $O(n^2 \log\log\log n / \log\log n)$ time.
Abstract: Dynamic Time Warping (DTW) and Geometric Edit Distance (GED) are basic similarity measures between curves or general temporal sequences (e.g., time series) that are represented as sequences of points in some metric space $(X, \mathrm{dist})$. The DTW and GED measures are massively used in various fields of computer science and computational biology. Consequently, the tasks of computing these measures are among the core problems in P. Despite extensive efforts to find more efficient algorithms, the best-known algorithms for computing the DTW or GED between two sequences of points in $X = \mathbb{R}^d$ are long-standing dynamic programming algorithms that require quadratic runtime, even for the one-dimensional case $d = 1$, which is perhaps one of the most used in practice. In this article, we break the nearly 50-year-old quadratic time bound for computing DTW or GED between two sequences of $n$ points in $\mathbb{R}$ by presenting deterministic algorithms that run in $O(n^2 \log\log\log n / \log\log n)$ time. Our algorithms can be extended to work also for higher-dimensional spaces $\mathbb{R}^d$, for any constant $d$, when the underlying distance metric dist is polyhedral (e.g., $L_1$, $L_\infty$).

39 citations
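
For orientation, the long-standing quadratic dynamic program that the abstract above refers to can be sketched as follows. This is the textbook DTW recurrence for the one-dimensional case, with |x - y| assumed as the ground distance. It is not the subquadratic algorithm of the paper, and the function name `dtw` is illustrative only.

```python
import math


def dtw(a, b):
    """Textbook O(len(a) * len(b)) DTW between two 1-D sequences.

    Assumption: the ground distance between points is |x - y|.
    This is the classical baseline, not the paper's faster algorithm.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    # prev[j] holds the DTW cost of aligning a[:i-1] with b[:j].
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0  # aligning two empty prefixes costs nothing

    for i in range(1, n + 1):
        cur = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessors.
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[m]
```

Only two rows of the table are kept at a time, so memory is linear in the length of `b` while the running time stays quadratic.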

Book ChapterDOI
30 Aug 2010
TL;DR: An alternative method to select small prefixes by exploiting the relationship between arithmetic mean and geometric mean of elements' weights is proposed, which dramatically reduces the average size of prefixes without much overhead and saves much computation time.
Abstract: Given a large collection of objects, finding all pairs of similar objects, known as similarity join, is widely used to solve problems in many application domains. The computation time of similarity join is a critical issue, since it requires computing similarity values for all possible pairs of objects. Several existing algorithms adopt prefix filtering to avoid unnecessary similarity computations; however, these algorithms filter out object pairs inefficiently, particularly when an aggregate weighted similarity function, such as cosine similarity, is used to quantify similarity between objects. This is mostly caused by the large prefixes the algorithms select. In this paper, we propose an alternative method to select small prefixes by exploiting the relationship between the arithmetic mean and geometric mean of elements' weights. A new algorithm, MMJoin, implementing the proposed method dramatically reduces the average size of prefixes without much overhead, which in turn saves much computation time. We demonstrate through empirical evaluation on large-scale real-world datasets that our algorithm outperforms a state-of-the-art one.

39 citations

Proceedings ArticleDOI
01 Dec 2010
TL;DR: This work proposes an asymmetric two-party protocol in which a lightweight client Bob with a string y interacts with a single powerful server Alice containing string x in its database, based on semantically secure homomorphic functions and additive secret sharing.
Abstract: Alice and Bob possess strings x and y of length m and n respectively and want to compute the Levenshtein distance L(x, y) between the strings under privacy and communication constraints. The Levenshtein distance, or edit distance, has a dynamic programming formulation that solves a series of minimum-finding problems. Based on this formulation, there are known symmetric privacy-preserving protocols for the computation of L(x, y), in which the two parties incur equal protocol overhead. In this work, we propose an asymmetric two-party protocol in which a lightweight client Bob with a string y interacts with a single powerful server Alice containing string x in its database. We present a privacy-preserving minimum-finding protocol based on semantically secure homomorphic functions and additive secret sharing. This protocol is executed repeatedly, to enable private computation of the edit distance. Our protocol supports arbitrary finite insertion/deletion costs and a variety of substitution costs. While Alice requires similar effort as in previous approaches, the advantage is that Bob incurs far fewer ciphertext operations and transmissions, making the protocol well-suited for client-server querying applications.

39 citations
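
The dynamic programming formulation mentioned in the abstract above, a series of minimum-finding problems with configurable insertion, deletion, and substitution costs, can be sketched in the clear as follows. This is a plain, non-private sketch: it contains none of the homomorphic encryption or secret sharing of the protocol, and the names `edit_distance`, `ins_cost`, `del_cost`, and `sub_cost` are illustrative assumptions.

```python
def edit_distance(x, y, ins_cost=1, del_cost=1,
                  sub_cost=lambda a, b: 0 if a == b else 1):
    """Weighted Levenshtein distance between sequences x and y.

    Plain (non-private) dynamic program; every cell is the minimum
    of three candidate costs. Parameter names are hypothetical.
    """
    m, n = len(x), len(y)

    # Row 0: turning the empty prefix of x into y[:j] takes j insertions.
    prev = [j * ins_cost for j in range(n + 1)]

    for i in range(1, m + 1):
        # Column 0: turning x[:i] into the empty string takes i deletions.
        cur = [i * del_cost] + [0] * n
        for j in range(1, n + 1):
            cur[j] = min(
                prev[j] + del_cost,                          # delete x[i-1]
                cur[j - 1] + ins_cost,                       # insert y[j-1]
                prev[j - 1] + sub_cost(x[i - 1], y[j - 1]),  # substitute or match
            )
        prev = cur
    return prev[n]
```

With the default unit costs, `edit_distance("kitten", "sitting")` evaluates to 3. In a privacy-preserving setting, each `min` call is the step that would be replaced by a private minimum-finding subprotocol.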


Network Information
Related Topics (5)
Graph (abstract data type)
69.9K papers, 1.2M citations
86% related
Unsupervised learning
22.7K papers, 1M citations
81% related
Feature vector
48.8K papers, 954.4K citations
81% related
Cluster analysis
146.5K papers, 2.9M citations
81% related
Scalability
50.9K papers, 931.6K citations
80% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    39
2022    96
2021    111
2020    149
2019    145
2018    139