Topic

Edit distance

About: Edit distance is a research topic. Over its lifetime, 2,887 publications have been published on this topic, receiving 71,491 citations.

Papers

Book Chapter•DOI•

Processing compressed texts: a tractability border

Yury Lifshits¹•Institutions (1)

Steklov Mathematical Institute¹

09 Jul 2007

TL;DR: The paper identifies a pair of similar problems (equivalence checking, Hamming distance computation) that have radically different complexity on compressed texts.

Abstract: What kind of operations can we perform effectively (without full unpacking) with compressed texts? In this paper we consider three fundamental problems: (1) check the equality of two compressed texts, (2) check whether one compressed text is a substring of another compressed text, and (3) compute the number of different symbols (Hamming distance) between two compressed texts of the same length. We present an algorithm that solves the first problem in O(n^3) time and the second problem in O(n^2 m) time. Here n is the size of the compressed representation (we consider representations by straight-line programs) of the text and m is the size of the compressed representation of the pattern. Next, we prove that the third problem is actually #P-complete. Thus, we indicate a pair of similar problems (equivalence checking, Hamming distance computation) that have radically different complexity on compressed texts. Our algorithmic technique used for problems (1) and (2) helps for computing minimal periods and covers of compressed texts.
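To make the setting concrete, here is a minimal Python sketch of a straight-line program (SLP), assuming a simple dictionary encoding in which each rule is either one terminal character or a pair of rule names to concatenate. The rule names and helper functions are hypothetical and are not taken from the paper. The sketch computes the text length without unpacking, and a Hamming distance only by full unpacking, which is the naive baseline and not the paper's algorithm.

```python
from typing import Dict, Tuple, Union

# A rule is either one terminal character (str) or a pair of rule names
# whose expansions are concatenated. This encoding is an assumption.
Rule = Union[str, Tuple[str, str]]
Grammar = Dict[str, Rule]


def slp_length(rules: Grammar, start: str) -> int:
    """Length of the generated text, computed without unpacking it."""
    memo: Dict[str, int] = {}

    def length(name: str) -> int:
        if name not in memo:
            rule = rules[name]
            if isinstance(rule, str):
                memo[name] = 1
            else:
                memo[name] = length(rule[0]) + length(rule[1])
        return memo[name]

    return length(start)


def slp_expand(rules: Grammar, start: str) -> str:
    """Fully unpack the text; its size can be exponential in the SLP size."""
    rule = rules[start]
    if isinstance(rule, str):
        return rule
    return slp_expand(rules, rule[0]) + slp_expand(rules, rule[1])


def hamming_naive(rules_a: Grammar, start_a: str,
                  rules_b: Grammar, start_b: str) -> int:
    """Hamming distance by full unpacking (baseline only)."""
    a = slp_expand(rules_a, start_a)
    b = slp_expand(rules_b, start_b)
    if len(a) != len(b):
        raise ValueError("texts must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Each doubling rule adds one entry to the grammar but doubles the text.
G1: Grammar = {"A": "a", "B": "b", "X1": ("A", "B"),
               "X2": ("X1", "X1"), "X3": ("X2", "X2")}
G2: Grammar = {"A": "a", "Y1": ("A", "A"),
               "Y2": ("Y1", "Y1"), "Y3": ("Y2", "Y2")}
```

Here `G1` generates `abababab` from five rules and `G2` generates `aaaaaaaa` from four, which illustrates why unpacking is the step the paper tries to avoid.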

129 citations

Proceedings Article•DOI•

Efficient Genome-Wide, Privacy-Preserving Similar Patient Query based on Private Edit Distance

Xiao Shaun Wang¹, Yan Huang², Yongan Zhao², Haixu Tang², XiaoFeng Wang², Diyue Bu²•Institutions (2)

University of Maryland, College Park¹, Indiana University²

12 Oct 2015

TL;DR: This paper proposes GENSETS, a genome-wide, privacy-preserving similar patient query system able to support searching large-scale, distributed genome databases across the nation, and implements a prototype of it; the underlying construction combines a novel genomic edit distance approximation algorithm with a new construction of private set difference size protocols.

Abstract: Edit distance has been proven to be an important and frequently used metric in much human genomic research, with Similar Patient Query (SPQ) being a particularly promising and attractive example. However, due to widespread privacy concerns about revealing personal genomic data, the scope and scale of many novel uses of genome edit distance are substantially limited. While the problem of private genomic edit distance has been studied by the research community for over a decade [6], the state-of-the-art solution [31] is far from applicable to real genome sequences. In this paper, we propose several private edit distance protocols that feature unprecedentedly high efficiency and precision. Our construction is a combination of a novel genomic edit distance approximation algorithm and a new construction of private set difference size protocols. With the private edit distance based secure SPQ primitive, we propose GENSETS, a genome-wide, privacy-preserving similar patient query system. It is able to support searching large-scale, distributed genome databases across the nation. We have implemented a prototype of GENSETS. The experimental results show that, with a 100 Mbps network connection, it would take GENSETS less than 200 minutes to search through 1 million breast cancer patients (distributed nation-wide in 250 hospitals, each having 4000 patients), based on edit distances between their genomes of lengths about 75 million nucleotides each.

128 citations

Patent•

A search system and method for retrieval of data, and the use thereof in a search engine

Knut Magne Risvik

09 Jul 1999

TL;DR: A search system for information retrieval includes a data structure in the form of a non-evenly spaced sparse suffix tree for storing suffixes of words and/or symbols, or sequences thereof, in a text T and a query Q.

Abstract: A search system for information retrieval includes a data structure in the form of a non-evenly spaced sparse suffix tree for storing suffixes of words and/or symbols, or sequences thereof, in a text T; a metric M including combined edit distance metrics for an approximate degree of matching respectively between words and/or symbols, or between sequences thereof, in the text T and a query Q, the latter distance metric including weighting cost functions for edit operations which transform a sequence S of the text into a sequence P of the query Q; and search algorithms for determining the degree of matching respectively between words and/or symbols, or between sequences thereof, in respectively the text T and the query Q, such that information R is retrieved with a specified degree of matching with the query Q. Optionally, the search system also includes algorithms for determining exact matching, such that information R may be retrieved with an exact degree of matching with the query Q.
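The idea of an edit distance with weighting cost functions can be sketched with the standard dynamic-programming recurrence. The Python below is a generic illustration, not the patented method: the function name `weighted_edit_distance` and the three cost callbacks are hypothetical, and the suffix-tree search is not modeled at all.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def weighted_edit_distance(
    s: Sequence[T],
    p: Sequence[T],
    ins_cost: Callable[[T], float],
    del_cost: Callable[[T], float],
    sub_cost: Callable[[T, T], float],
) -> float:
    """Minimum total cost of edit operations transforming s into p."""
    n, m = len(s), len(p)
    # d[i][j] is the cheapest way to turn s[:i] into p[:j].
    d: List[List[float]] = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + del_cost(s[i - 1])
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + ins_cost(p[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + del_cost(s[i - 1]),      # delete s[i-1]
                d[i][j - 1] + ins_cost(p[j - 1]),      # insert p[j-1]
                d[i - 1][j - 1] + sub_cost(s[i - 1], p[j - 1]),
            )
    return d[n][m]


def unit_sub(a: object, b: object) -> float:
    """Substitution costs nothing when the symbols already match."""
    return 0.0 if a == b else 1.0


def unit(_: object) -> float:
    """Unit cost for an insertion or a deletion."""
    return 1.0
```

With the unit costs above this reduces to the classic Levenshtein distance. Because the function accepts any sequence, the same code works on lists of words as well as on strings of symbols, which matches the abstract's distinction between the two levels.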

128 citations

Journal Article•DOI•

Structural entropy and metamorphic malware

Donabelle Baysa¹, Richard M. Low¹, Mark Stamp¹•Institutions (1)

San Jose State University¹

14 Apr 2013-Journal of Computer Virology and Hacking Techniques

TL;DR: Previous work on structural entropy is applied to the metamorphic detection problem; the technique relies on an analysis of variations in the complexity of data within a file and is shown to obtain strong results in certain challenging cases.

Abstract: Metamorphic malware is capable of changing its internal structure without altering its functionality. A common signature is nonexistent in highly metamorphic malware and, consequently, such malware can remain undetected under standard signature scanning. In this paper, we apply previous work on structural entropy to the metamorphic detection problem. This technique relies on an analysis of variations in the complexity of data within a file. The process consists of two stages, namely, file segmentation and sequence comparison. In the segmentation stage, we use entropy measurements and wavelet analysis to segment files. The second stage measures the similarity of file pairs by computing an edit distance between the sequences of segments obtained in the first stage. We apply this similarity measure to the metamorphic detection problem and show that we obtain strong results in certain challenging cases.
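The entropy measurement behind the segmentation stage can be illustrated with a short Python sketch. It assumes fixed-size, non-overlapping windows and plain byte-level Shannon entropy; the window size and function names are hypothetical, and the wavelet analysis the authors mention is not modeled here.

```python
import math
from collections import Counter
from typing import List


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def window_entropies(data: bytes, window: int = 256) -> List[float]:
    """Entropy of each consecutive, non-overlapping window of the input."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        shannon_entropy(data[i:i + window])
        for i in range(0, len(data), window)
    ]
```

A run of identical bytes scores 0 bits and a window containing every byte value once scores 8 bits, so changes in this series mark places where the complexity of the file's data shifts.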

128 citations

Journal Article•DOI•

Somayeh Dodge¹, Patrick Laube¹, Robert Weibel¹•Institutions (1)

University of Zurich¹

01 Sep 2012-International Journal of Geographical Information Science

TL;DR: A novel approach for finding similar trajectories is described, using trajectory segmentation based on movement parameters (MPs) such as speed, acceleration, or direction; a modified version of edit distance called normalized weighted edit distance (NWED) is introduced as a similarity measure.

Abstract: This article describes a novel approach for finding similar trajectories, using trajectory segmentation based on movement parameters (MPs) such as speed, acceleration, or direction. First, a segmentation technique is applied to decompose trajectories into a set of segments with homogeneous characteristics with respect to a particular MP. Each segment is assigned to a movement parameter class (MPC), representing the behavior of the MP. Accordingly, the segmentation procedure transforms a trajectory to a sequence of class labels, that is, a symbolic representation. A modified version of edit distance called normalized weighted edit distance (NWED) is introduced as a similarity measure between different sequences. As an application, we demonstrate how the method can be employed to cluster trajectories. The performance of the approach is assessed in two case studies using real movement datasets from two different application domains, namely, North Atlantic Hurricane trajectories and GPS tracks of couriers in London. Three different experiments have been conducted that respond to different facets of the proposed techniques and that compare our NWED measure to a related method.
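The step from a trajectory to a sequence of class labels can be sketched as follows. This Python example assumes fixed thresholds on a single movement parameter and collapses consecutive samples of the same class into one segment; the thresholds, labels, and function names are hypothetical, and the article's actual segmentation technique and the NWED measure are not reproduced.

```python
from itertools import groupby
from typing import List, Sequence


def classify(value: float, thresholds: Sequence[float],
             labels: Sequence[str]) -> str:
    """Map a value to a class label using ascending upper-bound thresholds.

    `labels` needs one more entry than `thresholds`; the last label
    covers every value at or above the final threshold.
    """
    if len(labels) != len(thresholds) + 1:
        raise ValueError("need exactly one more label than thresholds")
    for limit, label in zip(thresholds, labels):
        if value < limit:
            return label
    return labels[-1]


def to_symbolic(values: Sequence[float], thresholds: Sequence[float],
                labels: Sequence[str]) -> List[str]:
    """Turn a series of parameter values into one label per homogeneous run."""
    classes = (classify(v, thresholds, labels) for v in values)
    # groupby merges consecutive equal labels, giving one label per segment.
    return [label for label, _ in groupby(classes)]


# Hypothetical speed samples with classes low (L), medium (M), high (H).
speeds = [1.0, 2.0, 8.0, 9.0, 20.0, 3.0]
symbols = to_symbolic(speeds, thresholds=[5.0, 15.0], labels=["L", "M", "H"])
```

The resulting label sequence is the kind of symbolic representation that an edit distance can then compare across trajectories.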

128 citations

Performance

Metrics

Papers: 3,030

Citations: 78,281

No. of papers in the topic in previous years
Year	Papers
2023	39
2022	96
2021	111
2020	149
2019	145
2018	139
