Topic

Approximate string matching

About: Approximate string matching is a research topic. Over its lifetime, 1,903 publications have appeared on this topic, receiving 62,352 citations. The topic is also known as fuzzy string-searching algorithm and fuzzy string-matching algorithm.

Papers

Proceedings Article

A syntactic PR approach to Telugu handwritten character recognition

Samit Kumar Pradhan¹, Atul Negi¹

University of Hyderabad¹

16 Dec 2012

TL;DR: A character recognition mechanism based on a syntactic pattern recognition (PR) approach that uses the trie data structure for efficient recognition and relies on approximate rather than exact string matching to stay robust in the presence of noise.

Abstract: This paper presents a character recognition mechanism based on a syntactic PR approach that uses the trie data structure for efficient recognition. It uses approximate string matching for classification. During preprocessing, an input character image is transformed into a skeletonized image, and discrete curves are found using a 3 x 3 pixel region. A trie, which we call a sequence trie, is used as a low-level lookup to encode each discrete curve pattern of pixels. The sequence of such discrete curves from the input pattern is looked up in the sequence trie. The encoding of several such sequence numbers for the thinned character constructs a pattern string. Approximate string matching is used to compare the encoded pattern string from a template character with the pattern string obtained from the input character. We consider approximate rather than exact matching of the string to make the approach robust in the presence of noise. Another trie (called the pattern trie) is used for efficient storage and retrieval during approximate matching. We use the trie because a lookup takes O(m) time in the worst case, where m is the length of the longest string in the trie. For the approximate string matching we use look-ahead with a branch and bound scheme in the trie. We apply our method to 43 basic Telugu characters for demonstration. The proposed approach recognised all of the test characters given here correctly; however, more extensive testing on realistic data is required.
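The abstract does not spell out the distance measure or the exact look-ahead rule, so the following is only a minimal sketch of one common way to do approximate lookup in a trie with a branch and bound cutoff. It assumes Levenshtein (edit) distance, and the names `TrieNode`, `insert`, `search` and `max_dist` are hypothetical, not taken from the paper.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # maps a symbol to the child node
        self.word = None     # set to the stored string when one ends here


def insert(root, word):
    """Add a pattern string to the trie."""
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.word = word


def search(root, query, max_dist):
    """Return sorted (word, distance) pairs with edit distance <= max_dist.

    Assumption: Levenshtein distance; the paper's own measure may differ.
    Stored strings are assumed to be non-empty.
    """
    results = []
    first_row = list(range(len(query) + 1))  # distance from the empty prefix
    for ch, child in root.children.items():
        _walk(child, ch, query, first_row, max_dist, results)
    return sorted(results)


def _walk(node, ch, query, prev_row, max_dist, results):
    # Extend the dynamic-programming table by one row for symbol `ch`.
    row = [prev_row[0] + 1]
    for j in range(1, len(query) + 1):
        cost = 0 if query[j - 1] == ch else 1
        row.append(min(row[j - 1] + 1,          # insertion
                       prev_row[j] + 1,         # deletion
                       prev_row[j - 1] + cost))  # match or substitution
    if node.word is not None and row[-1] <= max_dist:
        results.append((node.word, row[-1]))
    # Bound: every entry in deeper rows is at least min(row), so a subtree
    # whose current row already exceeds max_dist everywhere can be skipped.
    if min(row) <= max_dist:
        for next_ch, child in node.children.items():
            _walk(child, next_ch, query, row, max_dist, results)
```

For example, after inserting "cat", "cart", "car" and "dog", calling `search(root, "cat", 1)` returns the three strings within one edit of the query and prunes the "dog" branch once its row exceeds the bound.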

9 citations

Journal Article

Whistle for music: using melody transcription and approximate string matching for content-based query over a MIDI database

Hung-Che Shen¹, Chung-Nan Lee¹

National Sun Yat-sen University¹

01 Dec 2007-Multimedia Tools and Applications

TL;DR: A system that lets users retrieve MIDI-format music by whistling a melodic fragment, built by extending an existing search engine into a fast approximate melodic matching engine and adding a real-time, robust whistle-to-MIDI converter.

Abstract: In this paper, we present a "Whistle for Music" system which enables users to retrieve MIDI format music by whistling a melodic fragment. Three essential components are query processing, MIDI preprocessing and an approximate search engine. For query processing, we have achieved a real-time and robust whistle-to-MIDI converter. For feature extraction, the proposed MIDI preprocessing can extract individual, local and global melodic descriptions from MIDI files. In order to match query with target, we extend an existing search engine into a fast approximate melodic matching engine. Based on the integration of those three components, the system can return a list of MIDI files that are ranked by how closely they match the whistling. The systematic evaluation for the query-by-whistling system is finally performed. The results show that careful measurement and objective comparisons can lead us to know the scaling trend about query and target. One encouraging aspect is that the performance can be predicted based on the evaluation methods.

9 citations

Journal Article

Improvement of Soundex algorithm for Indian language based on phonetic matching

Rima Shah, Dheeraj Kumar Singh

01 Jan 2014-International Journal of Computer Science, Engineering and Applications

TL;DR: In this research work, the Soundex algorithm is used for the Hindi and Gujarati languages and applied to names along with their variations in order to retrieve the output with minimum false hits.

Abstract: In a system with a large database, there has always been the problem that names may be misspelled or not spelled the way one expected, so the data in the database gets degraded. In this case it is necessary to search for the duplicates and merge them into a single entity. One problem in doing so is how the strings should be compared. In such cases, rather than looking for an exact match, approximate string matching is preferable. One string matching technique is phonetic matching, which compares names based on the pronunciation of the words. Similar-sounding words can be retrieved from a large database using different phonetic matching algorithms, of which the best known is the Soundex algorithm. Phonetic matching is needed when many people from different cultures come together: they either speak with different pronunciations or have different writing habits. This scenario is very common in India, which has many different languages such as Hindi, Gujarati, Marathi and Tamil. In this research work the Soundex algorithm is used for the Hindi and Gujarati languages and applied to names along with their variations in order to retrieve the output with minimum false hits.
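For orientation, here is a minimal sketch of the classic Soundex code for names written in the Latin alphabet. It is not the paper's improved variant: the Hindi and Gujarati letter mappings are not given in the abstract and are not reproduced here. The function name `soundex` and the table `CODES` are illustrative choices.

```python
# Standard American Soundex letter groups; vowels, Y, H and W carry no digit.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the 4-character Soundex code of a Latin-alphabet name."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]                    # the first letter is kept as-is
    prev = CODES.get(letters[0], "")     # its digit still blocks repeats
    for ch in letters[1:]:
        digit = CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "HW":               # H and W do not separate equal codes
            prev = digit
    return (code + "000")[:4]            # pad with zeros, cut to 4 characters
```

Names that sound alike map to the same key, so `soundex("Robert")` and `soundex("Rupert")` both give `R163`, and a database lookup can compare keys instead of raw spellings.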

9 citations

Assessment of approximate string matching in a biomedical text retrieval problem

J.F. Wang, Z.R. Li, C.Z. Cai, Y.Z. Chen¹

National University of Singapore¹

01 Jan 2005

TL;DR: The authors used the Smith-Waterman algorithm with affine gap penalty as a method for biomedical literature retrieval and found that the optimum performance was at string identity of 88%, at which the recall and precision were 96.9% and 97.3%, respectively.

Abstract: Text-based search is widely used for biomedical data mining and knowledge discovery. Character errors in literatures affect the accuracy of data mining. Methods for solving this problem are being explored. This work tests the usefulness of the Smith–Waterman algorithm with affine gap penalty as a method for biomedical literature retrieval. Names of medicinal herbs collected from herbal medicine literatures are matched with those from medicinal chemistry literatures by using this algorithm at different string identity levels (80–100%). The optimum performance is at string identity of 88%, at which the recall and precision are 96.9% and 97.3%, respectively. Our study suggests that the Smith–Waterman algorithm is useful for improving the success rate of biomedical text retrieval. © 2004 Elsevier Ltd. All rights reserved.
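As a rough illustration of the technique named in the abstract, the sketch below computes a Smith–Waterman local alignment score with an affine gap penalty (the Gotoh recurrences). The scoring values are placeholder assumptions, not the paper's settings, and the way the authors derive "string identity" from an alignment is not stated in the abstract, so it is not modelled here. The function name `smith_waterman_affine` is hypothetical.

```python
def smith_waterman_affine(a, b, match=2, mismatch=-1,
                          gap_open=-3, gap_extend=-1):
    """Best local alignment score between strings a and b.

    A gap of length k contributes gap_open + (k - 1) * gap_extend.
    All four scoring parameters are illustrative assumptions.
    """
    neg = float("-inf")
    n, m = len(a), len(b)
    # H: best score of a local alignment ending at (i, j)
    # E: same, but ending with a gap that consumes a character of b
    # F: same, but ending with a gap that consumes a character of a
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[neg] * (m + 1) for _ in range(n + 1)]
    F = [[neg] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] + gap_open, E[i][j - 1] + gap_extend)
            F[i][j] = max(H[i - 1][j] + gap_open, F[i - 1][j] + gap_extend)
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The 0 lets an alignment restart anywhere (local alignment).
            H[i][j] = max(0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best
```

With these placeholder scores, two identical 7-letter names score 14, while a copy with one dropped letter scores 9, so a threshold on the score (or on a normalised version of it) can separate likely spelling variants from unrelated names.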

9 citations

Proceedings Article

Comparison of Two-Dimensional String Matching Algorithms

Chengguo Chang, Hui Wang

23 Mar 2012

TL;DR: The KMP algorithm, the Rabin-Karp algorithm and their combination are presented and compared through tests at diverse data scales to assess the efficiency of the three algorithms.

Abstract: String matching is a special kind of pattern recognition problem, which finds all occurrences of a given pattern string in a given text string. The technology of two-dimensional string matching is applied broadly in many information processing domains. A good two-dimensional string matching algorithm can effectively enhance the searching speed. In this paper, the KMP algorithm, the Rabin-Karp algorithm and their combination are presented and compared through a number of tests at diverse data scales to validate the efficiency of these three algorithms.
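The abstract does not describe how the paper lifts these algorithms to two dimensions, so the sketch below shows only the one-dimensional, exact-match KMP search that such methods usually build on. The function names `failure_table` and `kmp_search` are illustrative.

```python
def failure_table(pattern):
    """fail[i] = length of the longest proper prefix of pattern[:i + 1]
    that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]          # fall back to a shorter border
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text, pattern):
    """Return the start index of every occurrence of pattern in text."""
    if not pattern:
        return []
    fail = failure_table(pattern)
    hits, k = [], 0                  # k = number of pattern symbols matched
    for i, ch in enumerate(text):
        while k and ch != pattern[k]:
            k = fail[k - 1]          # reuse the matched prefix, never rescan text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]          # continue, allowing overlapping matches
    return hits
```

Because the text index never moves backwards, the search runs in time linear in the combined length of text and pattern, which is the property that makes KMP attractive as a row-level building block.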

9 citations

Performance

Metrics

Papers: 1,942
Citations: 64,998

No. of papers in the topic in previous years
Year	Papers
2023	8
2022	30
2021	32
2020	30
2019	48
2018	39