Topic

Edit distance

About: Edit distance is a research topic. Over its lifetime, 2,887 publications on this topic have been published, receiving 71,491 citations.


Papers
Proceedings ArticleDOI
15 Sep 2015
TL;DR: An algorithm that predicts the driver's intention using Fuzzy Logic and Edit Distance is presented; such intention detection helps parameterize advanced driver assistance systems to reduce the warning dilemma and thus raise the driver's acceptance of such systems.
Abstract: Driver intention detection is helpful for parameterizing advanced driver assistance systems to reduce the warning dilemma and thus to raise the driver's acceptance of such systems. An algorithm to predict the driver's intention with Fuzzy Logic and Edit Distance is presented. The main features and the functionality are explained. The necessary steps for training and validation of the algorithm are presented. The performance of the first configuration is discussed, and future steps for improving it are outlined.

15 citations

Proceedings ArticleDOI
10 Jul 2016
TL;DR: It is shown that if the reads are long enough and there are sufficiently many of them, then approximate reconstruction is possible: there is a simple algorithm such that, for almost all original sequences, its output is a sequence whose edit distance from the original is at most O(ε) times the length of the original sequence.
Abstract: The prevalent technique for DNA sequencing consists of two main steps: shotgun sequencing, where many randomly located fragments, called reads, are extracted from the overall sequence, followed by an assembly algorithm that aims to reconstruct the original sequence. There are many different technologies that generate the reads: widely-used second-generation methods create short reads with low error rates, while emerging third-generation methods create long reads with high error rates. Both error rates and error profiles differ among methods, so reconstruction algorithms are often tailored to specific shotgun sequencing technologies. As these methods change over time, a fundamental question is whether there exist reconstruction algorithms which are robust, i.e., which perform well under a wide range of error distributions. Here we study this question of sequence assembly from corrupted reads. We make no assumption on the types of errors in the reads, but only assume a bound on their magnitude. More precisely, for each read we assume that instead of receiving the true read with no errors, we receive a corrupted read which has edit distance at most ε times the length of the read from the true read. We show that if the reads are long enough and there are sufficiently many of them, then approximate reconstruction is possible: we construct a simple algorithm such that for almost all original sequences the output of the algorithm is a sequence whose edit distance from the original one is at most O(ε) times the length of the original sequence.

15 citations

Patent
30 Sep 1994
TL;DR: A VLSI circuit structure for computing the edit distance between two strings over a given alphabet is presented; it can perform approximate string matching with variable edit costs and places no constraint on the lengths of the strings that can be compared.
Abstract: The edit distance between two strings a1, ..., am and b1, ..., bn is the minimum cost s of a sequence of editing operations (insertions, deletions and substitutions) that convert one string into the other. This invention provides a VLSI circuit structure for computing the edit distance between two strings over a given alphabet. The circuit structure can perform approximate string matching for variable edit costs. More importantly, the circuit structure does not place any constraint on the lengths of the strings that can be compared. It makes use of simple basic cells and requires regular nearest-neighbor communication, which makes it suitable for VLSI implementation.
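The recurrence such a circuit evaluates in parallel is the standard Wagner-Fischer dynamic program; below is a minimal sequential sketch with variable edit costs (the function and parameter names are illustrative, not taken from the patent):

```python
def edit_distance(a: str, b: str, ins: int = 1, delete: int = 1, sub: int = 1) -> int:
    """Minimum cost of insertions, deletions and substitutions turning a into b
    (Wagner-Fischer dynamic programming, O(len(a) * len(b)) time)."""
    n = len(b)
    prev = [j * ins for j in range(n + 1)]        # row 0: "" -> b[:j] by insertions
    for i, ca in enumerate(a, 1):
        curr = [i * delete] + [0] * n             # column 0: a[:i] -> "" by deletions
        for j, cb in enumerate(b, 1):
            curr[j] = min(
                prev[j] + delete,                 # delete ca
                curr[j - 1] + ins,                # insert cb
                prev[j - 1] + (0 if ca == cb else sub),  # match or substitute
            )
        prev = curr
    return prev[n]

print(edit_distance("kitten", "sitting"))  # 3 with unit costs
```

Each cell depends only on its left, upper, and upper-left neighbors, which is why a systolic array with regular nearest-neighbor communication, as in the patent, can compute an entire anti-diagonal of the table simultaneously.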

15 citations

Journal Article
TL;DR: This paper describes how to implement and test the accuracy of the transformations provided in [11] for estimating the block edit distance on controlled data sets, and presents a data structure for computing approximate nearest neighbors in Hamming space that is simpler than the well-known ones in [9, 6].
Abstract: The sequence nearest neighbors problem can be defined as follows. Given a database D of n sequences, preprocess D so that, given any query sequence Q, one can quickly find a sequence S in D for which d(S, Q) ≤ d(T, Q) for any other sequence T in D. Here d(S, Q) denotes the distance between sequences S and Q, defined as the minimum number of edit operations needed to transform one sequence into the other. The edit operations considered in this paper include single-character edits (insertions, deletions, replacements) as well as block (substring) edits (copying, uncopying and relocating blocks). One of the main application domains for the sequence nearest neighbors problem is computational genomics, where available tools for sequence comparison and search usually focus on edit operations involving single characters only. While such tools are useful for capturing certain evolutionary mechanisms (mainly point mutations), they may have limited applicability for understanding mechanisms of segmental rearrangement (duplications, translocations and deletions) underlying genome evolution. Recent improvements in resolving the composition of the human genome suggest that such segmental rearrangements are much more common than previously estimated. Thus there is substantial need for incorporating similarity measures that capture block edit operations into genomic sequence comparison and search. Unfortunately, even computing a block edit distance between two sequences under any non-trivial set of edit operations is NP-hard. The first efficient data structure for approximate sequence nearest neighbor search under any non-trivial set of edit operations was described in [11]; the measure considered in this paper is the block edit distance. This method achieves preprocessing time and space polynomial in the size of D and query time near-linear in the size of Q, with an approximation factor of O(log ℓ (log* ℓ)²).
The approach embeds sequences into Hamming space so that approximating Hamming distances estimates block edit distances within the approximation ratio above. In this study we focus on simplifying and experimentally evaluating the method of [11]. We first describe how we implement and test the accuracy of the transformations provided in [11] for estimating the block edit distance on controlled data sets. Then, based on the Hamming distance estimator described in [3], we present a data structure for computing approximate nearest neighbors in Hamming space that is simpler than the well-known ones of [9, 6]. We finally report on how well the combined data structure performs for sequence nearest neighbor search under the block edit distance.
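The target of such an embedding, nearest neighbor search in Hamming space, can be illustrated with a naive exact linear scan (the function names are hypothetical; the cited data structures replace this O(n) scan with sublinear approximate search):

```python
def hamming(x: int, y: int) -> int:
    """Hamming distance between two equal-width bit vectors packed as ints."""
    return bin(x ^ y).count("1")

def nearest_in_hamming_space(query: int, database: list[int]) -> int:
    """Exact nearest neighbor by linear scan over the embedded sequences."""
    return min(database, key=lambda s: hamming(query, s))

db = [0b0000, 0b1011, 0b0110]                 # toy embedded sequences
best = nearest_in_hamming_space(0b1111, db)   # 0b1011, at distance 1
print(bin(best))
```

Once sequences are embedded as bit vectors, any Hamming-space index can serve block edit distance queries up to the embedding's approximation factor.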

14 citations

Journal ArticleDOI
TL;DR: A normalized chain edit distance metric is proposed to incrementally mine a controlled vocabulary from cross-document coreference chains; the vocabulary is then used to unify terms across different coreference chains.
Abstract: Unifying terminology usage, which captures more term semantics, is useful for event clustering. This paper proposes a normalized chain edit distance metric to incrementally mine a controlled vocabulary from cross-document coreference chains. The controlled vocabulary is employed to unify terms across different coreference chains. A novel threshold model that incorporates both a time decay function and a spanning window uses the controlled vocabulary for event clustering on streaming news. With correct coreference chains, the proposed system achieves a 15.97% performance increase over the baseline system and a 5.93% increase over the system without the controlled vocabulary. Furthermore, a Chinese coreference resolution system with a chain filtering mechanism is used to test the robustness of the proposed event clustering system. Even with noisy coreference chains, the clustering system still achieves a 10.55% performance increase over the baseline. These results show that the approach is promising.
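The paper's chain-based normalization is not spelled out in the abstract; as a rough illustration, one common way to normalize edit distance is to divide the Levenshtein distance by the longer string's length, giving a score in [0, 1]. The function and normalization choice below are illustrative assumptions, not the authors' definition:

```python
from functools import lru_cache

def normalized_edit_distance(a: str, b: str) -> float:
    """Levenshtein distance scaled to [0, 1] by the longer string's length
    (a common normalization, used here only as a sketch)."""
    @lru_cache(maxsize=None)
    def d(i: int, j: int) -> int:
        if i == 0 or j == 0:
            return i + j                      # only insertions/deletions remain
        return min(d(i - 1, j) + 1,           # delete a[i-1]
                   d(i, j - 1) + 1,           # insert b[j-1]
                   d(i - 1, j - 1) + (a[i - 1] != b[j - 1]))  # substitute/match
    if not a and not b:
        return 0.0
    return d(len(a), len(b)) / max(len(a), len(b))

print(normalized_edit_distance("color", "colour"))  # 1/6 ≈ 0.167
```

Normalizing by length makes distances between short and long coreference chains comparable, which is the property a clustering threshold model needs.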

14 citations


Network Information
Related Topics (5)
Graph (abstract data type)
69.9K papers, 1.2M citations
86% related
Unsupervised learning
22.7K papers, 1M citations
81% related
Feature vector
48.8K papers, 954.4K citations
81% related
Cluster analysis
146.5K papers, 2.9M citations
81% related
Scalability
50.9K papers, 931.6K citations
80% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    39
2022    96
2021    111
2020    149
2019    145
2018    139