Topic

Approximate string matching

About: Approximate string matching is a research topic. Over the lifetime, 1903 publications have been published within this topic receiving 62352 citations. The topic is also known as: fuzzy string-searching algorithm & fuzzy string-matching algorithm.

...read moreread less

Papers published on a yearly basis

Papers

PDF

Open Access

More filters

Book•

Levenshtein Distance: Information theory, Computer science, String (computer science), String metric, Damerau?Levenshtein distance, Spell checker, Hamming distance

[...]

Frederic P. Miller, Agnes F. Vandome, John McBrewster

24 Nov 2009

TL;DR: The Levenshtein distance is a metric for measuring the amount of difference between two sequences (i.e., the so called edit distance), often used in applications that need to determine how similar, or different, two strings are, such as spell checkers.

...read moreread less

Abstract: In information theory and computer science, the Levenshtein distance is a metric for measuring the amount of difference between two sequences (i.e., the so called edit distance). The Levenshtein distance between two strings is given by the minimum number of operations needed to transform one string into the other, where an operation is an insertion, deletion, or substitution of a single character. A generalization of the Levenshtein distance (Damerau?Levenshtein distance) allows the transposition of two characters as an operation. Some Translation Environment Tools, such as translation memory leveraging applications, use the Levenhstein algorithm to measure the edit distance between two fuzzy matching content segments.The metric is named after Vladimir Levenshtein, who considered this distance in 1965. It is often used in applications that need to determine how similar, or different, two strings are, such as spell checkers

...read moreread less

66 citations

Proceedings Article•DOI•

[...]

Jiaheng Lu¹, Chunbin Lin¹, Wei Wang², Chen Li³, Haiyong Wang¹ - Show less +1 more•Institutions (3)

Renmin University of China¹, University of New South Wales², University of California, Irvine³

22 Jun 2013

TL;DR: An expansion-based framework to measure string similarities efficiently while considering synonyms is presented, and an estimator to approximate the size of candidates to enable an online selection of signature filters to further improve the efficiency.

...read moreread less

Abstract: A string similarity measure quantifies the similarity between two text strings for approximate string matching or comparison. For example, the strings "Sam" and "Samuel" can be considered similar. Most existing work that computes the similarity of two strings only considers syntactic similarities, e.g., number of common words or q-grams. While these are indeed indicators of similarity, there are many important cases where syntactically different strings can represent the same real-world object. For example, "Bill" is a short form of "William". Given a collection of predefined synonyms, the purpose of the paper is to explore such existing knowledge to evaluate string similarity measures more effectively and efficiently, thereby boosting the quality of string matching.In particular, we first present an expansion-based framework to measure string similarities efficiently while considering synonyms. Because using synonyms in similarity measures is, while expressive, computationally expensive (NP-hard), we propose an efficient algorithm, called selective-expansion, which guarantees the optimality in many real scenarios. We then study a novel indexing structure called SI-tree, which combines both signature and length filtering strategies, for efficient string similarity joins with synonyms. We develop an estimator to approximate the size of candidates to enable an online selection of signature filters to further improve the efficiency. This estimator provides strong low-error, high-confidence guarantees while requiring only logarithmic space and time costs, thus making our method attractive both in theory and in practice. Finally, the results from an empirical study of the algorithms verify the effectiveness and efficiency of our approach.

...read moreread less

65 citations

Journal Article•

A Comparison and Analysis of Name Matching Algorithms

[...]

Chakkrit Snae

27 Jan 2007-World Academy of Science, Engineering and Technology, International Journal of Computer, Electrical, Automation, Control and Information Engineering

TL;DR: Name variations and some basic description of various name matching algorithms developed to overcome name variation and to find reasonable variants of names which can be used to further increasing mismatches for record linkage and name search are described.

...read moreread less

Abstract: Names are important in many societies, even in technologically oriented ones which use e.g. ID systems to identify individual people. Names such as surnames are the most important as they are used in many processes, such as identifying of people and genealogical research. On the other hand variation of names can be a major problem for the identification and search for people, e.g. web search or security reasons. Name matching presumes a-priori that the recorded name written in one alphabet reflects the phonetic identity of two samples or some transcription error in copying a previously recorded name. We add to this the lode that the two names imply the same person. This paper describes name variations and some basic description of various name matching algorithms developed to overcome name variation and to find reasonable variants of names which can be used to further increasing mismatches for record linkage and name search. The implementation contains algorithms for computing a range of fuzzy matching based on different types of algorithms, e.g. composite and hybrid methods and allowing us to test and measure algorithms for accuracy. NYSIIS, LIG2 and Phonex have been shown to perform well and provided sufficient flexibility to be included in the linkage/matching process for optimising name searching. Keywords—Data mining, name matching algorithm, nominal data, searching system.

...read moreread less

65 citations

Journal Article•DOI•

Approximate string matching with suffix automata

[...]

Esko Ukkonen¹, Derick Wood²•Institutions (2)

University of Helsinki¹, University of Waterloo²

01 Nov 1993-Algorithmica

TL;DR: A newO(kn) algorithm for approximate string matching problem, wheren is the length of the text, based on the suffix automaton with failure transitions and on the diagonalwise monotonicity of the edit distance table is given.

...read moreread less

Abstract: Theapproximate string matching problem is, given a text string, a pattern string, and an integerk, to find in the text all approximate occurrences of the pattern. An approximate occurrence means a substring of the text with edit distance at mostk from the pattern. We give a newO(kn) algorithm for this problem, wheren is the length of the text. The algorithm is based on the suffix automaton with failure transitions and on the diagonalwise monotonicity of the edit distance table. Some experiments showing that the algorithm has a small overhead are reported.

...read moreread less

65 citations

Patent•

Speech search device and speech search method

[...]

Hanazawa Toshiyuki¹•Institutions (1)

Mitsubishi Electric¹

06 Feb 2014

TL;DR: In this article, a speech search device including a recognizer and language models having different learning data and performs voice recognition on an input speech, to acquire a recognized character string for each language model, a character string comparator, and a search result determinator.

...read moreread less

Abstract: Disclosed is a speech search device including a recognizer 2 that refers to an acoustic model and language models having different learning data and performs voice recognition on an input speech, to acquire a recognized character string for each language model, a character string comparator 6 that compares the recognized character string for each language models with the character strings of search target words stored in a character string dictionary, and calculates a character string matching score showing the degree of matching of the recognized character string with respect to each of the character strings of the search target words, to acquire both a character string having the highest character string matching score and this character string matching score for each recognized character strings, and a search result determinator 8 that refers to the acquired score and outputs one or more search target words in descending order of the scores.

...read moreread less

64 citations

Collapse

Network Information

Performance

Metrics

1,942

Papers

64,998

Citations

No. of papers in the topic in previous years
Year	Papers
2023	8
2022	30
2021	32
2020	30
2019	48
2018	39

Approximate string matching

Papers published on a yearly basis

Papers

Trending Questions (10)

Network Information

Related Topics (5)

Performance

Metrics