Topic
Approximate string matching
About: Approximate string matching is a research topic. Over its lifetime, 1,903 publications have been published within this topic, receiving 62,352 citations. The topic is also known as fuzzy string-searching algorithm and fuzzy string-matching algorithm.
Papers
••
18 Jun 2008
TL;DR: This paper considers the case where bits of the location index i may be erroneously flipped, either in a consistent or transient manner; the corresponding approximate pattern matching problems are formally defined and efficient algorithms for their resolution are provided.
Abstract: A string $S \in \Sigma^m$ can be viewed as a set of pairs $S = \{(\sigma_i, i) : i \in \{0, \ldots, m-1\}\}$. We consider approximate pattern matching problems arising from the setting where errors are introduced to the location component ($i$), rather than the more traditional setting, where errors are introduced to the content itself ($\sigma_i$). In this paper, we consider the case where bits of $i$ may be erroneously flipped, either in a consistent or transient manner. We formally define the corresponding approximate pattern matching problems, and provide efficient algorithms for their resolution, while introducing some novel techniques.
22 citations
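To make the idea of errors in the location component concrete, here is a minimal brute-force sketch. It assumes one plausible reading of "consistent" flipping: the same set of address bits is flipped for every position, so position i is read from i XOR mask. The function name `consistent_flip_masks` and the power-of-two length restriction are illustrative assumptions, and this quadratic check is not the efficient algorithm the paper provides.

```python
from typing import List


def consistent_flip_masks(pattern: str, window: str) -> List[int]:
    """Return every bit mask under which `window` reproduces `pattern`.

    Assumption (not taken from the paper): a consistent error flips the
    same address bits everywhere, so pattern[i] is found at window[i ^ mask].
    """
    m = len(pattern)
    # Require a power-of-two length so that i ^ mask always stays in range.
    if len(window) != m or m == 0 or m & (m - 1):
        raise ValueError("both strings must share a power-of-two length")
    return [
        mask
        for mask in range(m)
        if all(window[i ^ mask] == pattern[i] for i in range(m))
    ]
```

For example, `consistent_flip_masks("abcd", "badc")` reports mask 1, meaning the lowest address bit was flipped at every position.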
••
TL;DR: An algorithm to compute the mean shape, when the shape is represented by a string, is presented as a modification of the well-known string edit algorithm: it converts sets of mapped symbols into piecewise linear functions and computes their mean.
22 citations
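For readers unfamiliar with the string edit algorithm mentioned in this summary, the classic dynamic-programming edit distance (unit costs for insertion, deletion, and substitution) can be sketched as follows. This is only the textbook baseline; the mean-shape modification described in the paper is not reproduced here, and the function name `edit_distance` is an illustrative choice.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic edit (Levenshtein) distance with unit costs."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # transforming a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if ca == cb else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]
```

Keeping only two rows of the table reduces memory to O(len(b)) while the running time stays O(len(a) * len(b)).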
••
TL;DR: The authors attempt to create a single string-matching measure that can perform toponym matching regardless of language, and propose an ASM measure called DAS, which comprises name similarity, word similarity and sentence similarity phases.
Abstract: Approximate string matching (ASM) is a challenging problem, which aims to match different string expressions representing the same object. In this paper, detailed experimental studies were conducted on the subject of toponym matching, which is a new domain where ASM can be performed, and the creation of a single string-matching measure that can perform the toponym matching process regardless of the language was attempted. For this purpose, an ASM measure called DAS, which comprises name similarity, word similarity and sentence similarity phases, was created. Considering the experimental results, the retrieval performance and system accuracy of DAS were much better than those of five other well-known measures that were compared on toponym test datasets. In addition, DAS had the best metric values of mean average precision in six languages, and precision/recall graphs confirm this result.
22 citations
••
01 Dec 2016
TL;DR: Query expansion based on approximate string matching was superior to using the inflectional forms of the query words, showing that coverage of the different types of variation is more important than precision in handling one type of variation.
Abstract: The aim of the study was to test whether query expansion by approximate string matching methods is beneficial in retrieval from historical newspaper collections in a language rich with compounds and inflectional forms (Finnish). First, approximate string matching methods were used to generate lists of index words most similar to contemporary query terms in a digitized newspaper collection from the 1800s. Top index word variants were categorized to estimate the appropriate query expansion ranges in the retrieval test. Second, the effectiveness of approximate string matching methods, automatically generated inflectional forms, and their combinations were measured in a Cranfield-style test. Finally, a detailed topic-level analysis of test results was conducted. In the index of the historical newspaper collection, the occurrences of a word typically spread to many linguistic and historical variants along with optical character recognition (OCR) errors. All query expansion methods improved the baseline results. Extensive expansion of around 30 variants for each query word was required to achieve the highest performance improvement. Query expansion based on approximate string matching was superior to using the inflectional forms of the query words, showing that coverage of the different types of variation is more important than precision in handling one type of variation.
22 citations
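The expansion step described in this abstract, picking the index words most similar to a query term, can be illustrated with Python's standard `difflib` module. This is a sketch under stated assumptions: the study's own similarity methods are not specified here, so `difflib.get_close_matches` is a stand-in, and the helper name `expand_query`, the 0.6 cutoff, and the sample vocabulary are hypothetical. Only the default of 30 variants per term comes from the abstract.

```python
from difflib import get_close_matches
from typing import Iterable, List


def expand_query(term: str, index_words: Iterable[str],
                 max_variants: int = 30, cutoff: float = 0.6) -> List[str]:
    """Return up to `max_variants` index words that resemble `term`.

    difflib's ratio-based similarity is a stand-in for the approximate
    string matching methods evaluated in the study.
    """
    return get_close_matches(term, list(index_words),
                             n=max_variants, cutoff=cutoff)


# Hypothetical index vocabulary containing an inflected form and an
# OCR-style misreading of the query term.
vocabulary = ["helsinki", "helsingin", "helfinki", "turku"]
variants = expand_query("helsinki", vocabulary)
```

The expanded list would then be combined into a single disjunctive query, so that documents containing any of the variants can be retrieved.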
•
11 Aug 2010
TL;DR: A method and a device for fuzzy speech retrieval are disclosed, in which speech recognition is performed on the obtained speech signals using a preset acoustic model and a language model to obtain recognition results, and retrieval is then performed in a preset text entry database using a preset index table according to the recognition results.
Abstract: The invention discloses a method and a device for fuzzy speech retrieval. The method comprises the following steps: speech recognition is performed on the obtained speech signals using a preset acoustic model and a language model, and recognition results are obtained; retrieval is performed in a preset text entry database using a preset index table according to the recognition results, and preliminary candidate entries are obtained; fuzzy string matching is performed between the preliminary candidate entries and the recognition results, entries whose matching degree falls within a preset threshold range are selected as refined entries, and the matching position of each entry is recorded; the posterior probability between the text of the matching part, the refined entries and the voice signals is calculated; and finally, a plurality of entries are selected as the retrieval results for the voice signals using the posterior probability and the matching proportion obtained from the matching positions. With the invention, text entries matching the voice signals can be retrieved quickly and accurately from a large-capacity text entry database.
21 citations
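The two text-side stages of this pipeline, index lookup followed by threshold-based fuzzy matching, can be sketched as below. Everything in the sketch is an assumption made for illustration: the word-level inverted index, the `difflib.SequenceMatcher` ratio as the matching degree, the 0.6 threshold, and the use of the longest common block as the recorded matching position. The speech recognition and posterior probability steps are omitted because they depend on acoustic models not described here.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Set, Tuple


def build_index(entries: List[str]) -> Dict[str, Set[int]]:
    """Hypothetical index table: word -> ids of the entries containing it."""
    index: Dict[str, Set[int]] = {}
    for entry_id, text in enumerate(entries):
        for token in set(text.lower().split()):
            index.setdefault(token, set()).add(entry_id)
    return index


def fuzzy_retrieve(recognized: str, entries: List[str],
                   index: Dict[str, Set[int]],
                   threshold: float = 0.6) -> List[Tuple[str, float, int]]:
    """Return (entry, matching degree, match position), best match first."""
    # Stage 1: preliminary candidates share at least one word with the query.
    candidate_ids: Set[int] = set()
    for token in recognized.lower().split():
        candidate_ids |= index.get(token, set())

    # Stage 2: keep candidates whose fuzzy matching degree passes the threshold.
    results = []
    for entry_id in sorted(candidate_ids):
        entry = entries[entry_id]
        matcher = SequenceMatcher(None, recognized.lower(), entry.lower())
        score = matcher.ratio()
        if score >= threshold:
            # Stand-in for the matching position: start of the longest
            # common block inside the entry.
            block = matcher.find_longest_match(0, len(recognized), 0, len(entry))
            results.append((entry, score, block.b))
    return sorted(results, key=lambda item: item[1], reverse=True)
```

A call such as `fuzzy_retrieve("main stret station", entries, build_index(entries))` would rank an entry like "main street station" first, while entries sharing no word with the recognized text never reach the fuzzy matching stage.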