Topic

Edit distance

About: Edit distance is a research topic. Over its lifetime, 2,887 publications have been published within this topic, receiving 71,491 citations.

Papers

Journal Article

Time series classification with ensembles of elastic distance measures

Jason Lines¹, Anthony J. Bagnall¹

University of East Anglia¹

01 May 2015-Data Mining and Knowledge Discovery

TL;DR: This work defines an ensemble of elastic distance measures that the authors believe is the first classifier to significantly outperform DTW, raising the bar for future work in this area, and demonstrates that the ensemble is more accurate than approaches not based in the time domain.

Abstract: Several alternative distance measures for comparing time series have recently been proposed and evaluated on time series classification (TSC) problems. These include variants of dynamic time warping (DTW), such as weighted and derivative DTW, and edit distance-based measures, including longest common subsequence, edit distance with real penalty, time warp with edit, and move-split-merge. These measures have the common characteristic that they operate in the time domain and compensate for potential localised misalignment through some elastic adjustment. Our aim is to experimentally test two hypotheses related to these distance measures. Firstly, we test whether there is any significant difference in accuracy for TSC problems between nearest neighbour classifiers using these distance measures. Secondly, we test whether combining these elastic distance measures through simple ensemble schemes gives significantly better accuracy. We test these hypotheses by carrying out one of the largest experimental studies ever conducted into time series classification. Our first key finding is that there is no significant difference between the elastic distance measures in terms of classification accuracy on our data sets. Our second finding, and the major contribution of this work, is to define an ensemble classifier that significantly outperforms the individual classifiers. We also demonstrate that the ensemble is more accurate than approaches not based in the time domain. Nearly all TSC papers in the data mining literature cite DTW (with warping window set through cross validation) as the benchmark for comparison. We believe that our ensemble is the first ever classifier to significantly outperform DTW and as such raises the bar for future work in this area.
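
The DTW benchmark mentioned in the abstract (a nearest neighbour classifier using DTW with a warping window) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names `dtw_distance` and `nearest_neighbour_label`, the squared-difference point cost, and the Sakoe-Chiba style band are all assumptions made here, and the window is passed in by hand rather than set through cross validation.

```python
import math


def dtw_distance(a, b, window=None):
    """DTW distance between two numeric sequences.

    `window` is an optional band half-width (an assumption of this sketch);
    None means unconstrained warping.
    """
    n, m = len(a), len(b)
    # A band narrower than |n - m| could never reach the final cell.
    w = max(n, m) if window is None else max(window, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible warping moves.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


def nearest_neighbour_label(query, train, window=None):
    """1-NN classification: `train` is a list of (series, label) pairs."""
    return min(train, key=lambda pair: dtw_distance(query, pair[0], window))[1]
```

An ensemble in the spirit of the abstract would combine several such nearest neighbour classifiers, one per elastic distance measure; the combination scheme is not specified in the text above, so it is left out here.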

443 citations

Journal Article

Finding approximate patterns in strings

Esko Ukkonen¹

University of Helsinki¹

01 Mar 1985-Journal of Algorithms

TL;DR: An algorithm is presented to construct a deterministic finite-state automaton that solves the problem of locating in any string a substring whose edit distance from p is at most a given constant t.
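
The problem stated here (locate a substring whose edit distance from p is at most t) can be illustrated with a column-by-column dynamic program. Note that this is a swapped-in technique, not the finite-state automaton construction the paper presents; the function name `approximate_match_ends` and the convention of reporting exclusive end positions are assumptions of this sketch.

```python
def approximate_match_ends(pattern, text, t):
    """Return every end position j (exclusive) such that some substring of
    `text` ending at j is within edit distance t of `pattern`."""
    m = len(pattern)
    # col[i] = best edit distance between pattern[:i] and a substring of the
    # text ending at the current position; initially the text prefix is empty.
    col = list(range(m + 1))
    ends = [0] if col[m] <= t else []
    for j, ch in enumerate(text, start=1):
        prev_diag = col[0]
        col[0] = 0  # a match may start at any text position, at no cost
        for i in range(1, m + 1):
            old = col[i]
            col[i] = min(
                prev_diag + (pattern[i - 1] != ch),  # match or substitution
                old + 1,                             # extra text character
                col[i - 1] + 1,                      # missing pattern character
            )
            prev_diag = old
        if col[m] <= t:
            ends.append(j)
    return ends
```

The sketch runs in time proportional to the product of the pattern and text lengths and keeps only one column in memory.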

413 citations

Journal Article

Comparing stars: on approximating graph edit distance

Zhiping Zeng¹, Anthony K. H. Tung², Jianyong Wang¹, Jianhua Feng¹, Lizhu Zhou¹

Tsinghua University¹, National University of Singapore²

01 Aug 2009

TL;DR: Three novel methods to compute upper and lower bounds for the edit distance between two graphs in polynomial time are introduced, and the results show that these methods scale well in both the number of graphs and the size of graphs.

Abstract: Graph data have become ubiquitous and manipulating them based on similarity is essential for many applications. Graph edit distance is one of the most widely accepted measures to determine similarities between graphs and has extensive applications in fields such as pattern recognition and computer vision. Unfortunately, the problem of graph edit distance computation is NP-Hard in general. Accordingly, in this paper we introduce three novel methods to compute the upper and lower bounds for the edit distance between two graphs in polynomial time. Applying these methods, two algorithms AppFull and AppSub are introduced to perform different kinds of graph search on graph databases. Comprehensive experimental studies are conducted on both real and synthetic datasets to examine various aspects of the methods for bounding graph edit distance. Results show that these methods achieve good scalability in terms of both the number of graphs and the size of graphs. The effectiveness of these algorithms also confirms the usefulness of using our bounds in filtering and searching of graphs.
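
To give a feel for how a polynomial-time bound can be used for filtering, here is a deliberately simple lower bound on graph edit distance. It is not one of the three methods from the paper: it only counts vertex labels and edges. It assumes unit costs for every operation, unlabeled edges, and that deleting a vertex does not implicitly delete its incident edges; the function name `ged_lower_bound` and the graph representation (a list of vertex labels plus a list of edges) are hypothetical.

```python
from collections import Counter


def ged_lower_bound(labels1, edges1, labels2, edges2):
    """Lower bound on unit-cost graph edit distance from simple counts."""
    # Vertices that can be kept unchanged must agree on their labels, so at
    # most the multiset intersection of the label lists can be kept.
    common = sum((Counter(labels1) & Counter(labels2)).values())
    vertex_ops = max(len(labels1), len(labels2)) - common
    # Each edge insertion or deletion changes the edge count by exactly one.
    edge_ops = abs(len(edges1) - len(edges2))
    return vertex_ops + edge_ops
```

In a similarity search with threshold tau, any database graph whose lower bound already exceeds tau can be discarded without computing the exact distance.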

413 citations

Proceedings Article

Detecting algorithmically generated malicious domain names

Sandeep Yadav¹, Ashwath Kumar Krishna Reddy¹, A. L. Narasimha Reddy¹, Supranamaya Ranjan²

Texas A&M University¹, Narus²

01 Nov 2010

TL;DR: This paper develops a methodology to detect domain fluxing as used by Conficker botnet with minimal false positives and applies it to packet traces collected at a Tier-1 ISP.

Abstract: Recent Botnets such as Conficker, Kraken and Torpig have used DNS based "domain fluxing" for command-and-control, where each Bot queries for existence of a series of domain names and the owner has to register only one such domain name. In this paper, we develop a methodology to detect such "domain fluxes" in DNS traffic by looking for patterns inherent to domain names that are generated algorithmically, in contrast to those generated by humans. In particular, we look at distribution of alphanumeric characters as well as bigrams in all domains that are mapped to the same set of IP-addresses. We present and compare the performance of several distance metrics, including KL-distance, Edit distance and Jaccard measure. We train by using a good data set of domains obtained via a crawl of domains mapped to all IPv4 address space and modeling bad data sets based on behaviors seen so far and expected. We also apply our methodology to packet traces collected at a Tier-1 ISP and show we can automatically detect domain fluxing as used by Conficker botnet with minimal false positives.
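
Two of the measures named in the abstract, the Jaccard measure over bigrams and edit distance, can be sketched for a pair of domain labels as below. This is only an illustration of the measures themselves: how the paper groups domains, builds distributions, and sets thresholds is not described here, and the function names and example strings are made up for the sketch.

```python
def bigrams(name):
    """Set of adjacent character pairs in a domain label."""
    return {name[i:i + 2] for i in range(len(name) - 1)}


def jaccard_index(a, b):
    """Jaccard index of the bigram sets of two labels (1.0 = identical sets)."""
    sa, sb = bigrams(a), bigrams(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def levenshtein(a, b):
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]
```

Both functions compare two labels at a time; any aggregation over a set of domains mapped to the same IP addresses would be built on top of them.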

405 citations

Journal Article

An Extension of the String-to-String Correction Problem

Robert A. Wagner¹, Roy Lowrance¹

Vanderbilt University¹

01 Apr 1975-Journal of the ACM

TL;DR: The set of allowable edit operations is extended to include the operation of interchanging the positions of two adjacent characters under certain restrictions on edit-operation costs, and it is shown that the extended problem can still be solved in time proportional to the product of the lengths of the given strings.

Abstract: The string-to-string correction problem asks for a sequence S of "edit operations" of minimal cost such that S(A) = B, for given strings A and B. The edit operations previously investigated allow changing one symbol of a string into another single symbol, deleting one symbol from a string, or inserting a single symbol into a string. This paper extends the set of allowable edit operations to include the operation of interchanging the positions of two adjacent characters. Under certain restrictions on edit-operation costs, it is shown that the extended problem can still be solved in time proportional to the product of the lengths of the given strings.
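
A common way to realise this extended problem in code is the dynamic program usually called Damerau-Levenshtein distance. The sketch below assumes unit cost for all four operations (one choice that fits "certain restrictions on edit-operation costs", not the paper's general cost model); the function name and the padded table layout are choices made for this illustration.

```python
def damerau_levenshtein(a, b):
    """Edit distance with insertion, deletion, substitution, and interchange
    of adjacent characters, all at unit cost."""
    n, m = len(a), len(b)
    big = n + m  # larger than any real distance; guards the padding row/column
    last_row = {}  # last row of `a` in which each character was seen
    d = [[0] * (m + 2) for _ in range(n + 2)]
    d[0][0] = big
    for i in range(n + 1):
        d[i + 1][0] = big
        d[i + 1][1] = i
    for j in range(m + 1):
        d[0][j + 1] = big
        d[1][j + 1] = j
    for i in range(1, n + 1):
        last_col = 0  # last column in this row where the characters matched
        for j in range(1, m + 1):
            k = last_row.get(b[j - 1], 0)
            l = last_col
            cost = 0 if a[i - 1] == b[j - 1] else 1
            if cost == 0:
                last_col = j
            d[i + 1][j + 1] = min(
                d[i][j] + cost,      # match or substitution
                d[i + 1][j] + 1,     # insertion
                d[i][j + 1] + 1,     # deletion
                # interchange, plus edits between the two swapped characters
                d[k][l] + (i - k - 1) + 1 + (j - l - 1),
            )
        last_row[a[i - 1]] = i
    return d[n + 1][m + 1]
```

The two nested loops each do constant work per cell, so the running time is proportional to the product of the two string lengths, matching the bound stated in the abstract.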

350 citations

Metrics

Papers: 3,030
Citations: 78,281

No. of papers in the topic in previous years
Year	Papers
2023	39
2022	96
2021	111
2020	149
2019	145
2018	139
