Topic

Approximate string matching

About: Approximate string matching is a research topic. Over its lifetime, 1,903 publications have been published on this topic, receiving 62,352 citations. The topic is also known as: fuzzy string-searching algorithm & fuzzy string-matching algorithm.


Papers
Book Chapter
18 Jun 2008
TL;DR: This paper considers the case where bits of the index i may be erroneously flipped, either in a consistent or transient manner. It formally defines the corresponding approximate pattern matching problems and provides efficient algorithms for their resolution.
Abstract: A string S ∈ Σ^m can be viewed as a set of pairs S = {(σ_i, i) : i ∈ {0, ..., m − 1}}. We consider approximate pattern matching problems arising from the setting where errors are introduced to the location component (i), rather than the more traditional setting, where errors are introduced to the content itself (σ_i). In this paper, we consider the case where bits of i may be erroneously flipped, either in a consistent or transient manner. We formally define the corresponding approximate pattern matching problems, and provide efficient algorithms for their resolution, while introducing some novel techniques.
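The pair view in the abstract above can be illustrated with a minimal sketch. This is not the paper's algorithm. It assumes one simple reading of a "consistent" error, in which the same address bit is flipped in every index, and it assumes the string length is a power of two so that flipped indices stay in range. All function names are hypothetical.

```python
def as_pairs(s):
    """View a string as a set of (symbol, index) pairs."""
    return {(ch, i) for i, ch in enumerate(s)}


def flip_address_bit(pairs, bit):
    """Flip one bit in every index (assumed 'consistent' error model)."""
    return {(ch, i ^ (1 << bit)) for ch, i in pairs}


def from_pairs(pairs):
    """Rebuild the string by reading the symbols in index order."""
    return "".join(ch for ch, _ in sorted(pairs, key=lambda p: p[1]))


# Assumption: len(text) is a power of two, so i ^ (1 << bit) is a
# permutation of the valid indices when bit < log2(len(text)).
text = "abcdefgh"
print(from_pairs(flip_address_bit(as_pairs(text), 0)))  # adjacent symbols swap
print(from_pairs(flip_address_bit(as_pairs(text), 2)))  # the two halves swap
```

The content of each pair is untouched; only the location component changes, which is what distinguishes this setting from the traditional one.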

22 citations

Journal Article
TL;DR: An algorithm to compute the mean shape, when the shape is represented by a string, is presented as a modification of the well-known string edit algorithm; it converts sets of mapped symbols into piecewise linear functions and computes their mean.
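For reference, the baseline string edit algorithm that the summary above refers to can be sketched as the standard dynamic program for Levenshtein distance. This is only the unmodified baseline with unit costs (an assumption); the mean-shape modification described in the paper is not reproduced here.

```python
def edit_distance(a, b):
    """Levenshtein distance between a and b using two rows of the DP table."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))  # 3
```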

22 citations

Journal Article
TL;DR: The study attempts to create a single string-matching measure that can perform toponym matching regardless of language, and introduces an ASM measure called DAS, which comprises name similarity, word similarity and sentence similarity phases.
Abstract: Approximate string matching (ASM) is a challenging problem, which aims to match different string expressions representing the same object. In this paper, detailed experimental studies were conducted on the subject of toponym matching, which is a new domain where ASM can be performed, and the creation of a single string-matching measure that can perform the toponym matching process regardless of the language was attempted. For this purpose, an ASM measure called DAS, which comprises name similarity, word similarity and sentence similarity phases, was created. Considering the experimental results, the retrieval performance and system accuracy of DAS were much better than those of five other well-known measures that were compared on toponym test datasets. In addition, DAS had the best metric values of mean average precision in six languages, and precision/recall graphs confirm this result.

22 citations

Journal Article
01 Dec 2016
TL;DR: Query expansion based on approximate string matching was superior to using the inflectional forms of the query words, showing that coverage of the different types of variation is more important than precision in handling one type of variation.
Abstract: The aim of the study was to test whether query expansion by approximate string matching methods is beneficial in retrieval from historical newspaper collections in a language rich with compounds and inflectional forms (Finnish). First, approximate string matching methods were used to generate lists of index words most similar to contemporary query terms in a digitized newspaper collection from the 1800s. Top index word variants were categorized to estimate the appropriate query expansion ranges in the retrieval test. Second, the effectiveness of approximate string matching methods, automatically generated inflectional forms, and their combinations was measured in a Cranfield-style test. Finally, a detailed topic-level analysis of test results was conducted. In the index of a historical newspaper collection, the occurrences of a word typically spread to many linguistic and historical variants, along with optical character recognition (OCR) errors. All query expansion methods improved the baseline results. Extensive expansion of around 30 variants for each query word was required to achieve the highest performance improvement. Query expansion based on approximate string matching was superior to using the inflectional forms of the query words, showing that coverage of the different types of variation is more important than precision in handling one type of variation.
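The first step described above, listing the index words most similar to a query term, might look roughly like the sketch below. It uses Python's standard `difflib.get_close_matches`, which ranks by a sequence-similarity ratio; the study's own matching methods are not specified here, so this is a stand-in. The function name, the `cutoff` value, and the tiny vocabulary are illustrative assumptions; only the default of 30 variants echoes the abstract.

```python
import difflib


def expand_query(term, index_words, n=30, cutoff=0.6):
    """Return up to n index words most similar to term, best first.

    cutoff is an assumed similarity threshold, not a value from the study.
    """
    return difflib.get_close_matches(term, index_words, n=n, cutoff=cutoff)


# Hypothetical index: an exact form, an OCR-style variant, an inflected
# form, and an unrelated word.
index_words = ["sanomalehti", "sanomalehtj", "sanomalehden", "kirja"]
variants = expand_query("sanomalehti", index_words)
print(variants)
```

The expanded query is then the union of the original term and its variants, so OCR errors and inflected forms in the index can still be retrieved.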

22 citations

Patent
11 Aug 2010
TL;DR: This patent describes a method and a device for fuzzy speech retrieval. Speech recognition is performed on the obtained speech signals using a preset acoustic model and a language model to obtain recognition results, and retrieval is then performed in a preset text entry database using a preset index table according to those results.
Abstract: The invention discloses a method and a device for fuzzy speech retrieval. The method comprises the following steps: speech recognition is performed on the obtained speech signals by utilizing a preset acoustic model and a language model, and recognition results are obtained; retrieval is performed in a preset text entry database by utilizing a preset index table according to the recognition results, and preliminary candidate entries are obtained; fuzzy character-string matching is performed between the preliminary candidate entries and the recognition results, entries whose matching degree falls within a preset threshold range are selected as refined entries, and the matching position of each entry is recorded; the posterior probability between the text of the matching part and the refined entries and the voice signals is calculated; and finally, a plurality of entries are selected as the retrieval results for the voice signals by utilizing the posterior probability and the matching proportion obtained from the matching positions. With the invention, text entries matching the voice signals can be retrieved quickly and accurately from a large-capacity text entry database.
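The fuzzy matching step, in which entries are kept when their matching degree is within a threshold and the matching position is recorded, can be sketched with approximate substring matching. The patent text does not say which matching measure is used; the sketch below assumes unit-cost edit distance, treats each entry as the pattern and the recognized text as the text to search, and uses hypothetical function names and example strings.

```python
def best_fuzzy_match(pattern, text):
    """Return (distance, end) for the substring of text closest to pattern.

    end is the position in text just after the best match; ties keep the
    earliest end position.
    """
    m = len(pattern)
    col = list(range(m + 1))  # pattern prefixes against an empty text
    best = (col[m], 0)
    for j, tc in enumerate(text, start=1):
        new = [0]  # cost 0 in row 0: a match may start anywhere in text
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == tc else 1
            new.append(min(col[i] + 1, new[i - 1] + 1, col[i - 1] + cost))
        col = new
        if col[m] < best[0]:
            best = (col[m], j)
    return best


def select_entries(recognized, entries, max_dist):
    """Keep entries within max_dist edits of some substring of recognized."""
    selected = []
    for entry in entries:
        dist, end = best_fuzzy_match(entry, recognized)
        if dist <= max_dist:  # assumed threshold test on the matching degree
            selected.append((entry, dist, end))
    return selected


print(select_entries("fuzzy strng matching", ["string", "pattern"], 1))
```

The posterior-probability scoring in the later steps depends on the acoustic model and is not sketched here.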

21 citations


Network Information
Related Topics (5)
Server: 79.5K papers, 1.4M citations, 81% related
Cluster analysis: 146.5K papers, 2.9M citations, 80% related
Scheduling (computing): 78.6K papers, 1.3M citations, 79% related
Network packet: 159.7K papers, 2.2M citations, 78% related
Optimization problem: 96.4K papers, 2.1M citations, 78% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    8
2022    30
2021    32
2020    30
2019    48
2018    39