Topic

Edit distance

About: Edit distance is a research topic. Over its lifetime, 2,887 publications within this topic have received 71,491 citations.


Papers

Journal Article

UESTS: An Unsupervised Ensemble Semantic Textual Similarity Method

Basma Hassan¹, Samir E. AbdelRahman², Reem Bahgat², Ibrahim Farag²

Fayoum University¹, Cairo University²

26 Jun 2019, IEEE Access

TL;DR: The experimental results show that incorporating the proposed aligner into STS is effective and that the proposed UESTS outperforms state-of-the-art unsupervised approaches, which is a promising result.

Abstract: Semantic textual similarity (STS) is the task of assessing the degree of similarity between two texts in terms of meaning. Several approaches have been proposed in the literature to determine the semantic similarity between texts. The most promising recent work in the literature has used supervised approaches. Unsupervised STS approaches have the advantage of not requiring training data, but they still suffer from some limitations. Word alignment has been widely used in state-of-the-art approaches. Building on this, the paper makes three contributions. First, a new synset-oriented word aligner is presented, which relies on a huge multilingual semantic network named BabelNet. Second, three unsupervised STS approaches are proposed: string kernel-based (SK), alignment-based (AL), and weighted alignment-based (WAL). Third, some limitations of the state-of-the-art approaches are addressed, and different similarity methods are shown to complement each other through a proposed unsupervised ensemble STS (UESTS) approach. UESTS combines the merits of four similarity measures: the proposed alignment-based measure, a surface-based measure, a corpus-based measure, and an enhanced edit distance. The experimental results show that incorporating the proposed aligner into STS is effective. Across all evaluation datasets, the proposed UESTS outperforms the state-of-the-art unsupervised approaches, which is a promising result.

16 citations
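The UESTS abstract above combines several unsupervised similarity measures, one of them an enhanced edit distance, but does not say how the enhancement or the combination works. The sketch below only illustrates the general idea of an unsupervised ensemble under stated assumptions: a word-level Levenshtein distance normalized to a similarity in [0, 1], a token Jaccard similarity, and an unweighted mean of the two. The helpers `edit_similarity`, `jaccard_similarity`, and `ensemble_similarity` are hypothetical; the paper's BabelNet aligner, corpus-based measure, and enhanced edit distance are not reproduced here.

```python
def levenshtein(a, b):
    """Classic Levenshtein distance (unit-cost insert, delete, substitute) over any sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute (or match at cost 0)
        prev = curr
    return prev[-1]


def edit_similarity(s, t):
    """Hypothetical measure: word-level edit distance mapped to [0, 1] as 1 - d / max length."""
    a, b = s.lower().split(), t.lower().split()
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def jaccard_similarity(s, t):
    """Hypothetical surface measure: overlap of lowercased word sets."""
    a, b = set(s.lower().split()), set(t.lower().split())
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def ensemble_similarity(s, t, measures=(edit_similarity, jaccard_similarity)):
    """Assumed combination rule: unweighted mean of the individual similarity scores."""
    scores = [m(s, t) for m in measures]
    return sum(scores) / len(scores)


print(ensemble_similarity("a cat sat on the mat", "a cat sat on a mat"))
```

An unweighted mean needs no training data, which is the property the abstract attributes to unsupervised approaches; a real system might weight or normalize the measures differently.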

Approximate Median of Strings Based on Edit Operations

J. Abreu, J. R. Rico-Juan

01 Jan 2013

TL;DR: This paper proposes a new algorithm for computing an approximation to the median of a set of strings. The approximate median is obtained by successively improving a partial solution, taking into account the frequency of each edit operation at every position of the approximate median.

Abstract: This paper presents a new algorithm for computing an approximation to the median of a set of strings. The approximate median is obtained by successively improving a partial solution. In each iteration, the edit distance from the partial solution to every string in the set is computed, which accounts for the frequency of each edit operation at every position of the approximate median. A goodness index for each edit operation is then computed by multiplying its frequency by its cost. The operations are tested in turn, starting from the one with the highest index, to verify whether applying it to the partial solution leads to an improvement. If it does, a new iteration begins from the new approximate median. The algorithm finishes when all the operations have been examined without a better solution being found. Comparative experiments on Freeman chain codes encoding 2D shapes and on the Copenhagen chromosome database show that the quality of the approximate median string is similar to that of benchmark approaches, while convergence is much faster.

16 citations
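The iterative improvement described in the abstract above can be illustrated with a simplified local search. This is a sketch, not the authors' algorithm: instead of ranking operations by the paper's goodness index (frequency times cost), it exhaustively tries every single substitution, insertion, and deletion of the current partial solution and accepts the first one that lowers the total edit distance to the set. Starting from the set median (the input string with the smallest total distance) is an assumption, as are the helper names `total_distance`, `candidate_edits`, and `approximate_median`.

```python
def levenshtein(a, b):
    """Unit-cost edit distance, computed with a single rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


def total_distance(candidate, strings):
    """Sum of edit distances from the candidate to every string in the set."""
    return sum(levenshtein(candidate, s) for s in strings)


def candidate_edits(w, alphabet):
    """Yield every string reachable from w by one insertion, deletion, or substitution."""
    for i in range(len(w) + 1):
        for c in alphabet:
            yield w[:i] + c + w[i:]                  # insert c before position i
        if i < len(w):
            yield w[:i] + w[i + 1:]                  # delete w[i]
            for c in alphabet:
                if c != w[i]:
                    yield w[:i] + c + w[i + 1:]      # substitute w[i] with c


def approximate_median(strings):
    """Greedy improvement from the set median; stops when no single edit helps."""
    alphabet = sorted(set("".join(strings)))
    best = min(strings, key=lambda s: total_distance(s, strings))
    best_cost = total_distance(best, strings)
    improved = True
    while improved:
        improved = False
        for cand in candidate_edits(best, alphabet):
            cost = total_distance(cand, strings)
            if cost < best_cost:
                best, best_cost = cand, cost
                improved = True
                break                                # restart from the improved solution
    return best, best_cost


print(approximate_median(["100", "010", "001"]))
```

Because the total distance is a non-negative integer that strictly decreases on every accepted edit, the loop always terminates. For the input `["100", "010", "001"]` the set median has total distance 4, while the search reaches a string with total distance 3 that is not in the set. Exhaustive testing is much slower than the paper's ranked operations, which is exactly the cost the goodness index is meant to avoid.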

Proceedings Article

OCR correction and query expansion for retrieval on OCR data: CLARIT TREC-5 confusion track report

Xiang Tong, ChengXiang Zhai, Natasa Milic-Frayling, David A. Evans

01 Jan 1996

TL;DR: In the CLARIT TREC-5 confusion track experiments, the authors explored two techniques for improving retrieval performance on corrupted data: (1) OCR word error correction to improve OCR text accuracy, and (2) query expansion by adding query term variants found in the corrupted text.

Abstract: In the CLARIT TREC-5 confusion track experiments, the authors explored two techniques for improving retrieval performance on corrupted data: (1) OCR word error correction to improve OCR text accuracy, and (2) query expansion by adding query term variants found in the corrupted text. The OCR word correction technique is based on statistical word bigram modeling (Tong & Evans 1996). The variants of a query term are terms similar to the query term, as measured by edit distance (Wagner 1974). While the official runs were based on the first approach, the authors also tested the second approach in follow-up experiments. This report briefly describes the OCR correction and query expansion techniques and then discusses the experimental results.

16 citations
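The query-expansion idea in the CLARIT abstract above, finding variants of a query term in corrupted text by edit distance, can be sketched with the Wagner-Fischer dynamic program. The vocabulary, the distance threshold `max_distance=1`, and the helper `query_variants` are illustrative assumptions; the report's actual thresholds and term-selection rules are not given in the abstract.

```python
def wagner_fischer(s, t):
    """Wagner-Fischer DP: d[i][j] is the edit distance between s[:i] and t[:j]."""
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                          # delete all of s[:i]
    for j in range(n + 1):
        d[0][j] = j                          # insert all of t[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[m][n]


def query_variants(term, vocabulary, max_distance=1):
    """Hypothetical helper: vocabulary terms within max_distance edits of the query term."""
    return sorted(w for w in set(vocabulary)
                  if w != term and wagner_fischer(term, w) <= max_distance)


# Toy "OCR-corrupted" vocabulary, invented for illustration.
vocab = ["retrieval", "retrleval", "retrievai", "relieval", "query"]
expanded = ["retrieval"] + query_variants("retrieval", vocab)
print(expanded)
```

With a threshold of 1, the single-character OCR confusions `retrleval` and `retrievai` are added to the query, while `relieval` (two edits away) is not. A larger threshold would recover more corrupted forms at the risk of adding unrelated terms.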

Journal Article

Human Action Recognition Based on Template Matching

Chengyou Li¹, Tao Hua¹

Liaocheng University¹

01 Jan 2011, Procedia Engineering

TL;DR: This paper presents a new human action recognition method based on the ℜ transform and template matching, applied after the key frame is extracted from a cycle. It uses a novel string-matching scheme based on edit distance to analyze different human actions.

16 citations
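The TL;DR above describes matching actions as strings under edit distance. A minimal sketch of nearest-template classification, assuming each action has already been reduced to a string of posture symbols: the single-letter symbols, the template dictionary, and `classify_action` are all hypothetical, and the paper's ℜ-transform features and its specific matching scheme are not reproduced.

```python
def edit_distance(a, b):
    """Unit-cost Levenshtein distance between two symbol strings."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


def classify_action(observed, templates):
    """Return the label whose template string is closest to the observed string.

    Ties go to the label that appears first in the dictionary.
    """
    return min(templates, key=lambda label: edit_distance(observed, templates[label]))


# Hypothetical templates: each letter stands for a quantized key-frame posture.
templates = {"walk": "ABCB", "run": "ADED", "wave": "FGFG"}
print(classify_action("ABCC", templates))
```

Edit distance tolerates a missing, extra, or misrecognized key frame, which is one reason string matching suits sequences whose length varies from one performance of an action to the next.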

Proceedings Article

Matching for run-length encoded strings

Alberto Apostolico¹, Gad M. Landau², Steven Skiena³

Purdue University¹, University of Haifa², State University of New York System³

11 Jun 1997, Sequence

TL;DR: This work considers the problem of finding the longest common subsequence of two strings, and develops significantly faster algorithms for a special class of strings which emerge frequently in pattern matching problems.

...read moreread less

Abstract: Measuring the similarity between two strings, through such standard measures as Hamming distance, edit distance, and longest common subsequence, is one of the fundamental problems in pattern matching. We consider the problem of finding the longest common subsequence of two strings. A well-known dynamic programming algorithm computes the longest common subsequence of strings X and Y in O(|X|·|Y|) time. We develop significantly faster algorithms for a special class of strings which emerge frequently in pattern matching problems. A string S is run-length encoded if it is described as an ordered sequence of pairs (σ, i), each consisting of an alphabet symbol σ and an integer i. Each pair corresponds to a run in S consisting of i consecutive occurrences of σ. For example, the string aaaabbbbcccabbbbcc can be encoded as a⁴b⁴c³a¹b⁴c². Such a run-length encoded string can be significantly shorter than the expanded string representation. Indeed, run-length coding serves as a popular image compression technique, since many classes of images, such as binary images in facsimile transmission, typically contain large patches of identically valued pixels.

16 citations
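The abstract above contrasts the standard O(|X|·|Y|) dynamic program for the longest common subsequence with faster algorithms on run-length encoded strings. The sketch below shows only the two ingredients it defines: run-length encoding into (σ, i) pairs, as in the a⁴b⁴c³a¹b⁴c² example, and the classic quadratic LCS table on expanded strings. It does not implement the paper's faster run-length algorithms; `run_length_encode`, `run_length_decode`, and `lcs_length` are illustrative names.

```python
from itertools import groupby


def run_length_encode(s):
    """Encode s as a list of (symbol, run length) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


def run_length_decode(pairs):
    """Expand (symbol, run length) pairs back into the original string."""
    return "".join(ch * count for ch, count in pairs)


def lcs_length(x, y):
    """Standard O(|x|*|y|) LCS dynamic program, keeping one table row at a time."""
    prev = [0] * (len(y) + 1)
    for xi in x:
        curr = [0]
        for j, yj in enumerate(y, start=1):
            if xi == yj:
                curr.append(prev[j - 1] + 1)            # extend a common subsequence
            else:
                curr.append(max(prev[j], curr[j - 1]))  # drop a symbol from x or from y
        prev = curr
    return prev[-1]


pairs = run_length_encode("aaaabbbbcccabbbbcc")
print(pairs)  # [('a', 4), ('b', 4), ('c', 3), ('a', 1), ('b', 4), ('c', 2)]
print(lcs_length(run_length_decode(pairs), "abcabc"))
```

The DP here runs over every expanded symbol, so its cost grows with the full string lengths even when the encodings are short; avoiding that expansion is the motivation the abstract gives for working on the run-length representation directly.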

Metrics

Papers: 3,030

Citations: 78,281

No. of papers in the topic in previous years
Year	Papers
2023	39
2022	96
2021	111
2020	149
2019	145
2018	139
