Topic

Approximate string matching

About: Approximate string matching is a research topic. Over its lifetime, 1903 publications have appeared within this topic, receiving 62352 citations. The topic is also known as fuzzy string searching or fuzzy string matching.


Papers
Book ChapterDOI
01 May 2008
TL;DR: Results indicate that longer queries tend to perform similarly to short ones, that PRF improves performance considerably, and that queries tend to fare better with WSD than with maximal expansion of terms by taking all the translations given in the MRD.
Abstract: We describe cross-language retrieval experiments using Amharic queries and an English-language document collection. Two monolingual and eight bilingual runs were submitted, with variations in the use of long and short queries, the presence of pseudo relevance feedback (PRF), and approaches to word sense disambiguation (WSD). We used an Amharic-English machine-readable dictionary (MRD) and an online Amharic-English dictionary for lookup translation of query terms. Out-of-dictionary Amharic query terms were considered as possible named entities, and further filtering was attained through restricted fuzzy matching based on edit distance, calculated against automatically extracted English proper names. The results indicate that longer queries tend to perform similarly to short ones, that PRF improves performance considerably, and that queries tend to fare better with WSD than with maximal expansion of terms by taking all the translations given in the MRD.

8 citations
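
The restricted fuzzy matching step in the abstract above can be illustrated with a minimal sketch. The abstract does not give the distance threshold, the transliteration step, or the name list, so `max_dist`, the case folding, and the sample names below are assumptions made purely for illustration; the function names are hypothetical as well.

```python
def edit_distance(a, b):
    """Levenshtein distance with unit costs, keeping only two table rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # drop a character of a
                            curr[j - 1] + 1,      # drop a character of b
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def fuzzy_candidates(term, names, max_dist=2):
    """Return the names within max_dist edits of term, closest first.

    max_dist is an assumed threshold, not a value taken from the paper.
    """
    scored = [(edit_distance(term.lower(), name.lower()), name) for name in names]
    return [name for dist, name in sorted(scored) if dist <= max_dist]


# Hypothetical list of extracted proper names and a misspelled query term.
names = ["Addis Ababa", "Asmara", "Nairobi"]
print(fuzzy_candidates("Adis Abeba", names))
```

Restricting the candidates to a small edit-distance budget keeps an out-of-dictionary term from being matched to unrelated names, which is presumably the purpose of the "restricted" qualifier in the abstract.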

Journal ArticleDOI
TL;DR: Bit parallelism speeds up approximate string matching by taking advantage of the bit operations that the processor performs in parallel.
Abstract: String matching finds all occurrences of a given pattern in a large text, both being sequences of characters drawn from a finite alphabet. Approximate string matching detects exact occurrences of the pattern along with occurrences that contain some errors. Bit parallelism is a technique that can be used to detect patterns inside the text and is reported to result in more efficient approximate string matching. It speeds up the approximate string matching algorithm by taking advantage of the bit operations that the processor performs in parallel within a machine word. The bit-parallel method has also been compared with the traditional Aho-Corasick algorithm, which consumes more time and memory. In general, bit-parallel methods are both memory and time efficient.

8 citations
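
The abstract above does not say which bit-parallel algorithm was evaluated. As one possible illustration, the sketch below follows the Wu-Manber Shift-And scheme extended to k errors: one bit vector per error level, updated with shifts, ANDs, and ORs for each text character. The function name `bitap_search` is hypothetical. Python integers are unbounded, so this shows the logic only; the speed benefit described in the abstract comes from fitting each vector into a machine word in a lower-level language.

```python
def bitap_search(pattern, text, k):
    """Return the end positions in text where pattern matches with <= k edits.

    Bit i of state[d] is set when pattern[:i + 1] matches some suffix of the
    text read so far with at most d insertions, deletions, or substitutions.
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    full = (1 << m) - 1
    accept = 1 << (m - 1)
    # With d errors allowed, the first d pattern characters can be skipped.
    state = [((1 << d) - 1) & full for d in range(k + 1)]
    ends = []
    for j, ch in enumerate(text):
        b = masks.get(ch, 0)
        prev_old = state[0]                      # level d-1 before this character
        state[0] = ((state[0] << 1) | 1) & b     # exact Shift-And step
        for d in range(1, k + 1):
            old = state[d]
            state[d] = ((((old << 1) | 1) & b)   # character matches
                        | prev_old               # extra character in the text
                        | (prev_old << 1)        # substituted character
                        | (state[d - 1] << 1)    # pattern character skipped
                        | 1) & full
            prev_old = old
        if state[k] & accept:
            ends.append(j)
    return ends


print(bitap_search("abc", "xxabcxx", 0))
print(bitap_search("abc", "xxabdxx", 1))
```

Each text character costs a constant number of word operations per error level, which is where the time savings over a cell-by-cell dynamic programming table come from.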

Journal ArticleDOI
TL;DR: This paper shows how to optimize dictionary-based syntactic pattern recognition of strings by incorporating breadth-first search schemes on the underlying graph structure, and demonstrates a reduction of up to 21% in the number of operations needed while maintaining the same accuracy.
Abstract: Dictionary-based syntactic pattern recognition of strings attempts to recognize a transmitted string X*, by processing its noisy version, Y, without sequentially comparing Y with every element X in the finite (but possibly large) dictionary, H. The best estimate X+ of X* is defined as that element of H which minimizes the generalized Levenshtein distance (GLD) D(X, Y) between X and Y, for all X ∈ H. The non-sequential PR computation of X+ involves a compact trie-based representation of H. In this paper, we show how we can optimize this computation by incorporating breadth-first search schemes on the underlying graph structure. This heuristic emerges from the trie-based dynamic programming recursive equations, which can be effectively implemented using a new data structure called the linked list of prefixes that can be built separately or "on top of" the trie representation of H. The new scheme does not restrict the number of errors in Y to be merely a small constant, as is done in most of the available methods. The main contribution is that our new approach can be used for generalized GLDs and not merely for 0/1 costs. It is also applicable when all possible correct candidates need to be known, and not just the best match. These constitute the cases when the "cutoffs" cannot be used in the DFS trie-based technique (Shang and Merrett in IEEE Trans Knowl Data Eng 8(4):540-547, 1996). The new technique is compared with the DFS trie-based technique (Risvik in United States Patent 6377945 B1, 23 April 2002; Shang and Merrett in IEEE Trans Knowl Data Eng 8(4):540-547, 1996) using three large and small benchmark dictionaries with different errors. In each case, we demonstrate a reduction of up to 21% in the number of operations needed, while at the same time maintaining the same accuracy. Additionally, some further improvements can be obtained by introducing the knowledge of the maximum number or percentage of errors in Y.

8 citations
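
The general idea of sharing dynamic programming work across dictionary words with a common prefix can be sketched as follows. This is not the paper's linked list of prefixes, and it uses unit costs rather than generalized ones; it is a simplified breadth-first walk over a nested-dict trie in which every node carries one row of the Levenshtein table. The names `build_trie` and `trie_search`, and the optional `max_errors` cutoff, are assumptions for illustration.

```python
from collections import deque


def build_trie(words):
    """Nested-dict trie; the key None marks the end of a dictionary word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[None] = word
    return root


def trie_search(root, noisy, max_errors=None):
    """Return (distance, word) pairs for the noisy string, closest first.

    Nodes are visited breadth-first. Each node's row is derived from its
    parent's row, so a shared prefix is processed only once.
    """
    queue = deque([(root, list(range(len(noisy) + 1)))])
    results = []
    while queue:
        node, row = queue.popleft()
        if None in node and (max_errors is None or row[-1] <= max_errors):
            results.append((row[-1], node[None]))
        for ch, child in node.items():
            if ch is None:
                continue
            new_row = [row[0] + 1]
            for j, c in enumerate(noisy, start=1):
                new_row.append(min(new_row[j - 1] + 1,      # extra character in noisy
                                   row[j] + 1,              # character missing from noisy
                                   row[j - 1] + (c != ch))) # match or substitute
            # Optional cutoff: with unit costs the row minimum never decreases
            # with depth, so a branch above the budget can be dropped.
            if max_errors is None or min(new_row) <= max_errors:
                queue.append((child, new_row))
    return sorted(results)


trie = build_trie(["apple", "maple", "ample", "banana"])
print(trie_search(trie, "aple", max_errors=1))
```

Leaving `max_errors` as `None` scores every dictionary word, which corresponds to the setting the abstract highlights, where cutoffs are unavailable and all candidates are needed.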

Posted Content
TL;DR: A comprehensive bibliography for the online exact string matching problem is presented, containing a comprehensive list of (almost) all string matching algorithms proposed since 1970.
Abstract: In this short note we present a comprehensive bibliography for the online exact string matching problem. The problem consists in finding all occurrences of a given pattern in a text. It is an extensively studied problem in computer science, mainly due to its direct applications to such diverse areas as text, image and signal processing, speech analysis and recognition, data compression, information retrieval, computational biology and chemistry. Since 1970 more than 120 string matching algorithms have been proposed. In this note we present a comprehensive list of (almost) all string matching algorithms. The list is updated to May 2016.

8 citations

Book
01 Jan 2010
TL;DR: Some Applications of String Algorithms in Human-Computer Interaction and Approximate String Matching with Reduced Alphabet are explored.
Abstract: String Rearrangement Metrics: A Survey.- Maximal Words in Sequence Comparisons Based on Subword Composition.- Fast Intersection Algorithms for Sorted Sequences.- Indexing and Searching a Mass Spectrometry Database.- Extended Compact Web Graph Representations.- A Parallel Algorithm for Fixed-Length Approximate String-Matching with k-mismatches.- Covering Analysis of the Greedy Algorithm for Partial Cover.- From Nondeterministic Suffix Automaton to Lazy Suffix Tree.- Clustering the Normalized Compression Distance for Influenza Virus Data.- An Evolutionary Model of DNA Substring Distribution.- Indexing a Dictionary for Subset Matching Queries.- Transposition and Time-Scale Invariant Geometric Music Retrieval.- Unified View of Backward Backtracking in Short Read Mapping.- Some Applications of String Algorithms in Human-Computer Interaction.- Approximate String Matching with Reduced Alphabet.- ICT4D: A Computer Science Perspective.- Searching for Linear Dependencies between Heart Magnetic Resonance Images and Lipid Profiles.- The Support Vector Tree.

8 citations


Network Information
Related Topics (5)
Server: 79.5K papers, 1.4M citations, 81% related
Cluster analysis: 146.5K papers, 2.9M citations, 80% related
Scheduling (computing): 78.6K papers, 1.3M citations, 79% related
Network packet: 159.7K papers, 2.2M citations, 78% related
Optimization problem: 96.4K papers, 2.1M citations, 78% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  8
2022  30
2021  32
2020  30
2019  48
2018  39