Topic

Approximate string matching

About: Approximate string matching is a research topic. Over its lifetime, 1903 publications have been published within this topic, receiving 62352 citations. The topic is also known as: fuzzy string-searching algorithm & fuzzy string-matching algorithm.


Papers
01 Jan 2014
TL;DR: An improved Levenshtein distance algorithm is proposed to calculate the similarity of strings. It improves the similarity formula and the Levenshtein matrix, and achieves higher accuracy and more flexible searching at the same space complexity.
Abstract: When calculating the similarity of strings, the Levenshtein distance (LD) algorithm considers only the number of edit operations and ignores the substrings that the two strings have in common. To address this problem, an improved Levenshtein distance algorithm is proposed to calculate the similarity. The new algorithm improves both the similarity formula and the Levenshtein matrix. When calculating the distance, it computes the longest common substring and all LD backtracking paths in the original matrix at the same time. In the experiment, one word is selected as the source string and a set of words with varying degrees of similarity to it serve as target strings. Compared with existing string similarity measures, the new formula reduces the number of target strings admitted to the winner table, with a similarity sample range and standard deviation of 0.331 and 0.150, respectively. Experimental results show that the new algorithm achieves higher accuracy and more flexible searching at the same space complexity. Key words: Levenshtein distance (LD); LD algorithm; backtracking path; longest common substring; similarity; fuzzy query. DOI: 10.3969/j.issn.1000-3428.2014.01.047. Computer Engineering, Vol. 40, No. 1, January 2014.

10 citations
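
The two ingredients named in the abstract above, the Levenshtein distance and the longest common substring, can be sketched as follows. This is a minimal illustration only: the paper's improved similarity formula is not given on this page, so `baseline_similarity` below is the common normalized-distance measure, chosen here as an assumption, and all function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insert, delete, substitute), two-row DP."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest contiguous substring shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            # Extend the common run ending at the previous characters, or reset.
            run = prev[j - 1] + 1 if ca == cb else 0
            curr.append(run)
            best = max(best, run)
        prev = curr
    return best


def baseline_similarity(a: str, b: str) -> float:
    """Assumed baseline: 1 - LD / max(len); NOT the paper's improved formula."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

For "kitten" and "sitting" the distance is 3 and the longest common substring is "itt". The baseline measure uses only the first number, which is the limitation the abstract describes.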

Proceedings ArticleDOI
01 Sep 2006
TL;DR: This work shows that the pessimistic error pruning method gives better generalization in a coreference resolution task than that reported by W.M. Soon et al. (2001) when the weights of positive and negative examples are properly chosen.
Abstract: Coreference resolution is the process of determining whether two expressions in natural language refer to the same entity in the world. We adopt a machine learning approach, using decision trees, to coreference resolution of general noun phrases in unrestricted text, based on well-defined features. We also use approximate matching algorithms for a string match feature, and databases of American last names and of male and female first names for the gender agreement and alias features. For the evaluation we use the MUC-6 coreference corpora. We show that the pessimistic error pruning method gives better generalization in a coreference resolution task than that reported by W.M. Soon et al. (2001) when the weights of positive and negative examples are properly chosen.

9 citations

Journal ArticleDOI
TL;DR: It is proved that String-to-String Correction is fixed-parameter tractable, for parameter k, and a simple fixed-parameter algorithm is presented that solves the problem in O(2^k n) time.

9 citations

Proceedings Article
01 Jan 2006
TL;DR: A new algorithm for pattern matching when both a text T and a pattern P are presented by SLPs, and it is shown how to count all occurrences and check whether any given position is an occurrence or not in time O(n^2 m).
Abstract: Here we study the complexity of string problems as a function of the size of a program that generates input. We consider straight-line programs (SLP), since all algorithms on SLP-generated strings could be applied to processing LZ-compressed texts. The main result is a new algorithm for pattern matching when both a text T and a pattern P are presented by SLPs (so-called fully compressed pattern matching problem). We show how to find a first occurrence, count all occurrences, check whether any given position is an occurrence or not in time O(n^2 m). Here m, n are the sizes of straight-line programs generating correspondingly P and T. Then we present polynomial algorithms for computing fingerprint table and compressed representation of all covers (for the first time) and for finding periods of a given compressed string (our algorithm is faster than previously known). On the other hand, we show that computing the Hamming distance between two SLP-generated strings is NP- and coNP-hard.

9 citations
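
To make the straight-line program (SLP) setting of the abstract above concrete, here is a small sketch of how an SLP can be represented and queried. The encoding (a list of rules, each either a single character or a pair of indices of earlier rules) and all names are assumptions for illustration, not the paper's notation. The Hamming distance shown is the naive approach that decompresses both strings first, so it can take time exponential in the SLP size; it is not one of the paper's algorithms.

```python
from typing import List, Tuple, Union

# A rule is a terminal character or a concatenation of two earlier rules.
Rule = Union[str, Tuple[int, int]]


def lengths(slp: List[Rule]) -> List[int]:
    """Length of the string generated by each rule, without expanding it."""
    out: List[int] = []
    for rule in slp:
        if isinstance(rule, str):
            out.append(1)  # assumed: terminals are single characters
        else:
            left, right = rule
            out.append(out[left] + out[right])
    return out


def char_at(slp: List[Rule], lens: List[int], pos: int) -> str:
    """Character at index pos of the string generated by the last rule."""
    idx = len(slp) - 1
    while not isinstance(slp[idx], str):
        left, right = slp[idx]
        if pos < lens[left]:
            idx = left            # position falls in the left half
        else:
            pos -= lens[left]     # skip the left half
            idx = right
    return slp[idx]


def expand(slp: List[Rule]) -> str:
    """Fully decompress; output may be exponentially longer than the SLP."""
    texts: List[str] = []
    for rule in slp:
        if isinstance(rule, str):
            texts.append(rule)
        else:
            texts.append(texts[rule[0]] + texts[rule[1]])
    return texts[-1]


def hamming_naive(slp_a: List[Rule], slp_b: List[Rule]) -> int:
    """Hamming distance by decompressing both inputs (naive baseline)."""
    a, b = expand(slp_a), expand(slp_b)
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))
```

For example, the rules `["b", "a", (1, 0), (2, 1), (3, 2)]` generate "abaab": rule 2 is "ab", rule 3 is "aba", and rule 4 concatenates them. `lengths` and `char_at` work on the compressed form directly, whereas `hamming_naive` has to decompress, in line with the hardness result stated in the abstract.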


Network Information
Related Topics (5)
Server
79.5K papers, 1.4M citations
81% related
Cluster analysis
146.5K papers, 2.9M citations
80% related
Scheduling (computing)
78.6K papers, 1.3M citations
79% related
Network packet
159.7K papers, 2.2M citations
78% related
Optimization problem
96.4K papers, 2.1M citations
78% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    8
2022    30
2021    32
2020    30
2019    48
2018    39