
Approximate string matching

About: Approximate string matching is a research topic. Over its lifetime, 1903 publications have appeared within this topic, receiving 62352 citations. The topic is also known as: fuzzy string-searching algorithm & fuzzy string-matching algorithm.


Papers
Proceedings Article
16 Dec 2012
TL;DR: A character recognition mechanism based on a syntactic pattern recognition (PR) approach that uses the trie data structure for efficient recognition and matches strings approximately rather than exactly, making the approach robust in the presence of noise.
Abstract: This paper shows a character recognition mechanism based on a syntactic PR approach that uses the trie data structure for efficient recognition. It uses approximate string matching for classification. During preprocessing, an input character image is transformed into a skeletonized image, and discrete curves are found using a 3 x 3 pixel region. A trie, which we call a sequence trie, is used as a lower-level lookup structure to encode a discrete curve pattern of pixels. The sequence of such discrete curves from the input pattern is looked up in the sequence trie. Encoding several such sequence numbers for the thinned character constructs a pattern string. Approximate string matching is used to compare the encoded pattern string from a template character with the pattern string obtained from the input character. We use approximate rather than exact string matching to make the approach robust in the presence of noise. Another trie data structure (called the pattern trie) is used for efficient storage and retrieval during approximate string matching. We use the trie because it takes O(m) time in the worst case, where m is the length of the longest string in the trie. For the approximate string matching, we use look-ahead with a branch-and-bound scheme in the trie. For demonstration, we apply our method to 43 of the basic Telugu characters. The proposed approach recognised all the test characters given here correctly; however, more extensive testing on realistic data is required.

9 citations
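
The abstract above describes approximate lookup in a trie with a branch-and-bound scheme but does not give the distance measure or the look-ahead rule. The following is a minimal sketch under stated assumptions: it uses Levenshtein edit distance as the measure and prunes a branch when no cell of the current dynamic-programming row is within the bound. The names `TrieNode`, `insert`, and `search` are hypothetical and are not taken from the paper.

```python
class TrieNode:
    """One trie node; `word` is set only where a stored string ends."""
    def __init__(self):
        self.children = {}
        self.word = None


def insert(root, word):
    """Add `word` to the trie rooted at `root`."""
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.word = word


def search(root, query, max_dist):
    """Return sorted (word, distance) pairs with edit distance <= max_dist."""
    first_row = list(range(len(query) + 1))  # distance from the empty prefix
    results = []
    for ch, child in root.children.items():
        _walk(child, ch, query, first_row, max_dist, results)
    return sorted(results)


def _walk(node, ch, query, prev_row, max_dist, results):
    # Extend the Levenshtein table by one row for the trie edge labelled `ch`.
    row = [prev_row[0] + 1]
    for j in range(1, len(query) + 1):
        cost = 0 if query[j - 1] == ch else 1
        row.append(min(row[j - 1] + 1,        # insertion
                       prev_row[j] + 1,       # deletion
                       prev_row[j - 1] + cost))  # match or substitution
    if node.word is not None and row[-1] <= max_dist:
        results.append((node.word, row[-1]))
    # Bound: descend only if some cell can still lead to a match (assumed rule).
    if min(row) <= max_dist:
        for next_ch, child in node.children.items():
            _walk(child, next_ch, query, row, max_dist, results)
```

Because stored strings that share a prefix also share the table rows computed for that prefix, one traversal compares the query against every stored string without recomputing common work.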

Journal Article
TL;DR: A system which enables users to retrieve MIDI format music by whistling a melodic fragment by extending an existing search engine into a fast approximate melodic matching engine and achieving a real-time and robust whistle-to-MIDI converter.
Abstract: In this paper, we present a "Whistle for Music" system which enables users to retrieve MIDI format music by whistling a melodic fragment. Three essential components are query processing, MIDI preprocessing and an approximate search engine. For query processing, we have achieved a real-time and robust whistle-to-MIDI converter. For feature extraction, the proposed MIDI preprocessing can extract individual, local and global melodic descriptions from MIDI files. In order to match query with target, we extend an existing search engine into a fast approximate melodic matching engine. Based on the integration of those three components, the system can return a list of MIDI files that are ranked by how closely they match the whistling. The systematic evaluation for the query-by-whistling system is finally performed. The results show that careful measurement and objective comparisons can lead us to know the scaling trend about query and target. One encouraging aspect is that the performance can be predicted based on the evaluation methods.

9 citations

Journal Article
TL;DR: In this research work, the Soundex algorithm is used for the Hindi and Gujarati languages and applied to names along with their variations in order to retrieve the output with minimum false hits.
Abstract: In a system with a large database, there has always been the problem that names may be misspelled or may not be spelled in the way one expected. As a result, the data in the database degrades. In this case, it is necessary to search for duplicates and merge them into a single entity. One problem in doing so is how the strings should be compared. In such cases, approximate string matching is preferable to looking for an exact match. One string matching technique is phonetic matching, which compares names based on the pronunciation of the words. Similar-sounding words can be retrieved from a large database using different phonetic matching algorithms, the best known of which is the Soundex algorithm. Phonetic matching is needed when many people from different cultures come together. They either speak with different pronunciations or have different writing habits. This scenario is very common in India, which has many different languages such as Hindi, Gujarati, Marathi, and Tamil. In this research work, the Soundex algorithm is used for the Hindi and Gujarati languages and applied to names along with their variations in order to retrieve the output with minimum false hits.

9 citations
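
The entry above applies Soundex to Hindi and Gujarati names; the adapted letter groupings are not given in the abstract. As an illustration of the underlying idea only, here is a minimal sketch of the classic Soundex code for Latin-alphabet names. The function name `soundex` is an assumption, and the Hindi and Gujarati mappings used in the paper are not reproduced.

```python
# Classic Soundex letter groups for the Latin alphabet (not the paper's mapping).
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _letter in _letters:
        _CODES[_letter] = _digit


def soundex(name):
    """Return the four-character Soundex code of `name` ('' if no A-Z letters)."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]                 # the first letter is kept as-is
    prev = _CODES.get(letters[0], "")   # its code still suppresses repeats
    for c in letters[1:]:
        if c in "HW":
            continue                    # H and W do not separate equal codes
        code = _CODES.get(c, "")        # vowels and Y map to "" and reset prev
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]         # pad or truncate to four characters
```

Two spellings count as a phonetic match when their codes are equal, for example `soundex("Robert") == soundex("Rupert")`.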

01 Jan 2005
TL;DR: The authors used the Smith-Waterman algorithm with affine gap penalty as a method for biomedical literature retrieval and found that the optimum performance was at string identity of 88%, at which the recall and precision were 96.9% and 97.3%, respectively.
Abstract: Text-based search is widely used for biomedical data mining and knowledge discovery. Character errors in the literature affect the accuracy of data mining. Methods for solving this problem are being explored. This work tests the usefulness of the Smith–Waterman algorithm with affine gap penalty as a method for biomedical literature retrieval. Names of medicinal herbs collected from the herbal medicine literature are matched with those from the medicinal chemistry literature by using this algorithm at different string identity levels (80–100%). The optimum performance is at string identity of 88%, at which the recall and precision are 96.9% and 97.3%, respectively. Our study suggests that the Smith–Waterman algorithm is useful for improving the success rate of biomedical text retrieval. 2004 Elsevier Ltd. All rights reserved.

9 citations
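
The entry above relies on Smith–Waterman local alignment with an affine gap penalty. The abstract gives neither the scoring parameters nor the definition of string identity, so the sketch below only computes the best local alignment score, using the standard three-matrix (Gotoh) recurrence. The function name and the default scores are assumptions chosen for illustration.

```python
def smith_waterman_affine(a, b, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Best local alignment score of strings a and b.

    A gap of length k costs gap_open + (k - 1) * gap_extend (assumed convention).
    """
    neg_inf = float("-inf")
    n, m = len(a), len(b)
    # H: best score of an alignment ending at (i, j); 0 means "start fresh".
    H = [[0] * (m + 1) for _ in range(n + 1)]
    # E: best score ending at (i, j) with a gap that skips characters of b.
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    # F: best score ending at (i, j) with a gap that skips characters of a.
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] + gap_open, E[i][j - 1] + gap_extend)
            F[i][j] = max(H[i - 1][j] + gap_open, F[i - 1][j] + gap_extend)
            score = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + score, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best
```

With these defaults, opening a gap is costlier than extending one, so a single long insertion is penalised less than several scattered ones. A retrieval threshold such as the 88% identity level reported above would have to be derived from the alignment itself, which this sketch does not attempt.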

Proceedings Article
23 Mar 2012
TL;DR: The KMP algorithm, the Rabin-Karp algorithm, and their combination are presented and compared through a number of tests at diverse data scales to validate the efficiency of these three algorithms.
Abstract: String matching is a special kind of pattern recognition problem, which finds all occurrences of a given pattern string in a given text string. Two-dimensional string matching is applied broadly in many information processing domains. A good two-dimensional string matching algorithm can effectively enhance the searching speed. In this paper, the KMP algorithm, the Rabin-Karp algorithm, and their combination are presented and compared through a number of tests at diverse data scales to validate the efficiency of these three algorithms.

9 citations
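
The entry above compares KMP, Rabin-Karp, and a combination of the two for two-dimensional matching; the abstract does not describe the two-dimensional variants or the combination. As a point of reference only, here is a minimal sketch of the standard one-dimensional versions, each returning every start index of the pattern in the text. The function names and the hash parameters `base` and `mod` are assumptions.

```python
def kmp_search(text, pattern):
    """Knuth-Morris-Pratt: all start indices of pattern in text."""
    if not pattern:
        return []
    # fail[i] = length of the longest proper prefix of pattern[:i+1]
    # that is also a suffix of it.
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    hits = []
    k = 0
    for i, ch in enumerate(text):
        while k and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]  # keep going to find overlapping matches
    return hits


def rabin_karp_search(text, pattern, base=256, mod=1_000_000_007):
    """Rabin-Karp: all start indices of pattern in text, via a rolling hash."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the window's leading character
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    hits = []
    for i in range(n - m + 1):
        # Compare the substrings on a hash hit to rule out collisions.
        if p_hash == t_hash and text[i:i + m] == pattern:
            hits.append(i)
        if i < n - m:
            t_hash = ((t_hash - ord(text[i]) * high) * base
                      + ord(text[i + m])) % mod
    return hits
```

KMP never re-reads a text character, while Rabin-Karp trades that guarantee for a cheap hash comparison per window; both find exact occurrences only.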


Network Information
Related Topics (5)
Server: 79.5K papers, 1.4M citations, 81% related
Cluster analysis: 146.5K papers, 2.9M citations, 80% related
Scheduling (computing): 78.6K papers, 1.3M citations, 79% related
Network packet: 159.7K papers, 2.2M citations, 78% related
Optimization problem: 96.4K papers, 2.1M citations, 78% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  8
2022  30
2021  32
2020  30
2019  48
2018  39