Topic
Edit distance
About: Edit distance is a research topic. Over its lifetime, 2887 publications have appeared within this topic, receiving 71491 citations.
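For reference, the edit (Levenshtein) distance between two strings is the minimum number of single-character insertions, deletions, and substitutions needed to transform one into the other. A minimal sketch of the standard dynamic-programming computation:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic O(len(a) * len(b)) dynamic-programming edit distance."""
    # prev[j] holds the distance between a[:i-1] and b[:j] (previous row).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # → 3
```

The quadratic recurrence shown here is the baseline that much of the work listed below tries to speed up, approximate, or compute under privacy constraints.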
Papers published on a yearly basis
Papers
TL;DR: This work settles the uniqueness question of whether two different functions can share the same distance transform, in a generality sufficient for all practical applications in imaging sciences; the full-scale problem remains open.
39 citations
TL;DR: This work exploits a classical embedding of the edit distance into the Hamming distance to enable some flexibility on the tolerated edit distance when looking for close keywords while preserving the confidentiality of the queries.
Abstract: Our work is focused on fuzzy keyword search over encrypted data in Cloud Computing. We adapt results on private identification schemes by Bringer et al. to this new context, exploiting a classical embedding of the edit distance into the Hamming distance. This approach enables some flexibility on the tolerated edit distance when looking for close keywords while preserving the confidentiality of the queries. Our proposal is proved secure in a security model that takes privacy into account.
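One simple way to embed strings into Hamming space (illustrative only; not necessarily the specific embedding used by this paper) is to map each string to a binary occurrence vector over all q-grams of a fixed alphabet, so that strings at small edit distance, which share most of their q-grams, end up close in Hamming distance:

```python
from itertools import product

ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # assumed alphabet for this sketch

def qgram_vector(s: str, q: int = 2):
    """Binary occurrence vector of s over all q-grams of ALPHABET."""
    grams = {s[i:i + q] for i in range(len(s) - q + 1)}
    return ["".join(g) in grams for g in product(ALPHABET, repeat=q)]

def hamming(u, v) -> int:
    return sum(a != b for a, b in zip(u, v))

# "keyword" and "keywort" differ by one substitution; their bigram
# sets differ only in {"rd"} vs {"rt"}, so the Hamming distance is 2.
print(hamming(qgram_vector("keyword"), qgram_vector("keywort")))  # → 2
```

A fuzzy-search scheme can then run approximate matching in Hamming space, where privacy-preserving identification protocols are easier to build.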
39 citations
TL;DR: The nearly 50-year-old quadratic time bound for computing Dynamic Time Warping (DTW) or Geometric Edit Distance (GED) between two sequences of n points in R is broken by presenting deterministic algorithms that run in O(n^2 log log log n / log log n) time.
Abstract: Dynamic Time Warping (DTW) and Geometric Edit Distance (GED) are basic similarity measures between curves or general temporal sequences (e.g., time series) that are represented as sequences of points in some metric space (X, dist). The DTW and GED measures are massively used in various fields of computer science and computational biology. Consequently, the tasks of computing these measures are among the core problems in P. Despite extensive efforts to find more efficient algorithms, the best-known algorithms for computing the DTW or GED between two sequences of points in X = R^d are long-standing dynamic programming algorithms that require quadratic runtime, even for the one-dimensional case d = 1, which is perhaps one of the most used in practice. In this article, we break the nearly 50-year-old quadratic time bound for computing DTW or GED between two sequences of n points in R by presenting deterministic algorithms that run in O(n^2 log log log n / log log n) time. Our algorithms can be extended to work also for higher-dimensional spaces R^d, for any constant d, when the underlying distance-metric dist is polyhedral (e.g., L1, L∞).
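The quadratic dynamic program that the paper improves on can be sketched as follows for one-dimensional sequences, using |x_i - y_j| as the local cost (a minimal illustration, not the paper's subquadratic algorithm):

```python
from math import inf

def dtw(x, y) -> float:
    """Quadratic-time DTW between two 1-D sequences."""
    n, m = len(x), len(y)
    # D[i][j] = cost of the best warping path aligning x[:i] with y[:j].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = abs(x[i - 1] - y[j - 1]) + min(
                D[i - 1][j],      # x[i-1] matched again (repeat)
                D[i][j - 1],      # y[j-1] matched again (repeat)
                D[i - 1][j - 1],  # advance both sequences
            )
    return D[n][m]

# The repeated 2 in y can align to the single 2 in x at zero extra cost.
print(dtw([1, 2, 3], [1, 2, 2, 3]))  # → 0.0
```

The two nested loops are the source of the quadratic runtime; the paper's contribution is shaving a log log n / log log log n factor off this bound deterministically.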
39 citations
30 Aug 2010
TL;DR: An alternative method to select small prefixes by exploiting the relationship between the arithmetic mean and geometric mean of elements' weights is proposed, which dramatically reduces the average size of prefixes without much overhead and saves much computation time.
Abstract: Given a large collection of objects, finding all pairs of similar objects, namely similarity join, is widely used to solve various problems in many application domains. The computation time of similarity join is a critical issue, since similarity join requires computing similarity values for all possible pairs of objects. Several existing algorithms adopt prefix filtering to avoid unnecessary similarity computation; however, existing implementations of prefix filtering are inefficient at filtering out object pairs, in particular when an aggregate weighted similarity function, such as cosine similarity, is used to quantify the similarity between objects. This is mostly caused by the large prefixes these algorithms select. In this paper, we propose an alternative method that selects small prefixes by exploiting the relationship between the arithmetic mean and geometric mean of elements' weights. A new algorithm, MMJoin, implementing the proposed method dramatically reduces the average size of prefixes without much overhead, and thus saves substantial computation time. We demonstrate that our algorithm outperforms a state-of-the-art one in empirical evaluation on large-scale real-world datasets.
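To make the idea of prefix filtering concrete, here is a minimal sketch of the classic unweighted variant (for an overlap threshold t, not MMJoin's weighted-cosine bound): under a fixed global token ordering, two sets with overlap at least t must share a token inside each set's (len - t + 1)-prefix, so pairs with disjoint prefixes can be discarded without computing their full similarity. The token ordering below is a hypothetical example:

```python
def prefix(tokens, t, order):
    """Prefix-filtering prefix for overlap threshold t: any pair with
    overlap >= t must share a token within each set's (len - t + 1)-prefix
    under the fixed global ordering `order`."""
    s = sorted(tokens, key=order.__getitem__)
    return s[: len(s) - t + 1]

# Hypothetical global ordering: rarer tokens (lower document frequency) first.
order = {"rare": 0, "mid": 1, "common": 2, "stop": 3}
a = {"rare", "common", "stop"}
b = {"mid", "common", "stop"}

# Overlap(a, b) = 2 ({"common", "stop"}); with t = 2 the prefixes
# must intersect, so the pair survives the filter, as required.
print(prefix(a, 2, order))  # → ['rare', 'common']
print(prefix(b, 2, order))  # → ['mid', 'common']
```

MMJoin's contribution is to shrink the analogous prefixes in the weighted setting, using the arithmetic-mean/geometric-mean inequality to bound how much similarity the unseen suffix can still contribute.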
39 citations
01 Dec 2010
TL;DR: This work proposes an asymmetric two-party protocol in which a lightweight client Bob with a string y interacts with a single powerful server Alice containing string x in its database, based on semantically secure homomorphic functions and additive secret sharing.
Abstract: Alice and Bob possess strings x and y of length m and n respectively and want to compute the Levenshtein distance L(x, y) between the strings under privacy and communication constraints. The Levenshtein distance, or edit distance, has a dynamic programming formulation that solves a series of minimum-finding problems. Based on this formulation, there are known symmetric privacy-preserving protocols for the computation of L(x, y), in which the two parties incur equal protocol overhead. In this work, we propose an asymmetric two-party protocol in which a lightweight client Bob with a string y interacts with a single powerful server Alice containing string x in its database. We present a privacy-preserving minimum-finding protocol based on semantically secure homomorphic functions and additive secret sharing. This protocol is executed repeatedly, to enable private computation of the edit distance. Our protocol supports arbitrary finite insertion/deletion costs and a variety of substitution costs. While Alice requires similar effort as in previous approaches, the advantage is that Bob incurs far fewer ciphertext operations and transmissions, making the protocol well-suited for client-server querying applications.
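Additive secret sharing, one of the two building blocks named above, splits a value into random shares that reveal nothing individually but sum to the value, and lets each party add shares of two secrets locally to obtain shares of the sum. A minimal sketch (the modulus and two-party setting are illustrative assumptions; the paper combines this primitive with homomorphic encryption in a full minimum-finding protocol):

```python
import secrets

P = 2**61 - 1  # public modulus for the sketch (any large prime works)

def share(value: int):
    """Split value into two additive shares modulo P."""
    r = secrets.randbelow(P)       # uniformly random mask
    return r, (value - r) % P      # each share alone is uniform

def reconstruct(s1: int, s2: int) -> int:
    return (s1 + s2) % P

a1, a2 = share(7)
b1, b2 = share(5)
# Shares of a sum are computed locally: each party adds its own shares.
assert reconstruct((a1 + b1) % P, (a2 + b2) % P) == 12
```

In the protocol above, intermediate dynamic-programming cells of the Levenshtein table can be held in shared form so that neither party learns the other's string while the minimum-finding steps proceed.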
39 citations