Topic

Approximate string matching

About: Approximate string matching is a research topic. Over its lifetime, 1,903 publications have been published within this topic, receiving 62,352 citations. The topic is also known as: fuzzy string-searching algorithm and fuzzy string-matching algorithm.

Papers

Book Chapter•DOI•

Amharic-English Information Retrieval with Pseudo Relevance Feedback

Atelach Alemu Argaw¹•Institutions (1)

Stockholm University¹

01 May 2008

TL;DR: Results indicate that longer queries tend to perform similarly to short ones, PRF improves performance considerably, and that queries tend to fare better with WSD rather than using maximal expansion of terms by taking all the translations given in the MRD.

Abstract: We describe cross-language retrieval experiments using Amharic queries and an English-language document collection. Two monolingual and eight bilingual runs were submitted with variations in terms of usage of long and short queries, presence of pseudo relevance feedback (PRF), and approaches for word sense disambiguation (WSD). We used an Amharic-English machine readable dictionary (MRD), and an online Amharic-English dictionary for lookup translation of query terms. Out-of-dictionary Amharic query terms were considered as possible named entities, and further filtering was attained through restricted fuzzy matching based on edit distance, which is calculated against automatically extracted English proper names. The obtained results indicate that longer queries tend to perform similarly to short ones, PRF improves performance considerably, and that queries tend to fare better with WSD rather than using maximal expansion of terms by taking all the translations given in the MRD.
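
The edit-distance filtering step mentioned in this abstract can be illustrated with a minimal sketch. It assumes plain Levenshtein distance with unit costs and a fixed distance threshold; the abstract does not say how the matching was "restricted", so the `max_distance` parameter and the function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs, computed one DP row at a time."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def fuzzy_candidates(term, names, max_distance=2):
    """Return the names within max_distance of term, closest first.

    max_distance is an assumed threshold, not a value from the paper.
    """
    scored = [(edit_distance(term.lower(), n.lower()), n) for n in names]
    return [n for d, n in sorted(scored) if d <= max_distance]
```

For example, `fuzzy_candidates("adis", ["Addis", "Paris", "Oslo"], max_distance=1)` keeps only "Addis", which is one insertion away from the query term.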

8 citations

Journal Article•DOI•

Approximate Multiple Pattern String Matching using Bit Parallelism: A Review

Syed Danish Ali, Zuber Farooqui

30 Jul 2013-International Journal of Computer Applications

TL;DR: Bit parallelism enhances the processing speed of the approximate string matching algorithm, as it takes advantage of the internal bit operations taking place in parallel inside the system.

Abstract: String matching is the task of finding all occurrences of a given pattern in a large text, both being sequences of characters drawn from a finite alphabet. Approximate string matching involves the detection of correct patterns along with the detection of some wrong patterns inside the text. Bit parallelism is a feature that can be used to detect patterns inside the text and is reported to result in more efficient approximate string matching. Bit parallelism enhances the processing speed of the approximate string matching algorithm, as it takes advantage of the internal bit operations taking place in parallel inside the system. The bit-parallel method has also been compared with the traditional Aho-Corasick algorithm, which consumes more time and memory. In general, bit-parallel methods are both memory and time efficient.
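
To make the idea of bit parallelism concrete, here is a minimal sketch of a Shift-And search with up to k errors, in the style of the Wu-Manber bitap algorithm. It handles a single pattern only and is not one of the multiple-pattern algorithms the review covers; the function name and return convention are assumptions made for illustration.

```python
def bitap_fuzzy_search(text, pattern, k):
    """Return end indices in text where pattern matches with <= k edits.

    Bit j of R[d] is set when pattern[:j + 1] matches some suffix of the
    text read so far with at most d errors. One integer update advances
    all prefixes of the pattern at once.
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    masks = {}  # masks[c] has bit i set when pattern[i] == c
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    full = (1 << m) - 1
    accept = 1 << (m - 1)
    # Before reading any text, a prefix of length d matches via d deletions.
    R = [((1 << d) - 1) & full for d in range(k + 1)]
    ends = []
    for pos, ch in enumerate(text):
        b = masks.get(ch, 0)
        prev_old = R[0]                      # R[d - 1] before this character
        R[0] = ((R[0] << 1) | 1) & b         # exact-match automaton
        for d in range(1, k + 1):
            old = R[d]
            R[d] = ((((old << 1) | 1) & b)          # match
                    | prev_old                        # extra character in text
                    | (((prev_old | R[d - 1]) << 1) | 1)  # substitution or skipped pattern character
                    ) & full
            prev_old = old
        if R[k] & accept:
            ends.append(pos)
    return ends
```

With k = 0 this reduces to exact Shift-And matching. Python integers are unbounded, so the sketch works for any pattern length; in a language with fixed-width words the pattern would have to fit in one machine word or be split across several.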

8 citations

Journal Article•DOI•

Breadth-first search strategies for trie-based syntactic pattern recognition

B. John Oommen¹, Ghada Badr¹•Institutions (1)

Carleton University¹

01 Feb 2007-Pattern Analysis and Applications

TL;DR: This paper shows how to optimize dictionary-based syntactic pattern recognition of strings by incorporating breadth-first search schemes on the underlying graph structure, and demonstrates improvements of up to 21% in the number of operations needed, while maintaining the same accuracy.

Abstract: Dictionary-based syntactic pattern recognition of strings attempts to recognize a transmitted string X*, by processing its noisy version, Y, without sequentially comparing Y with every element X in the finite (but possibly large) dictionary, H. The best estimate X+ of X* is defined as that element of H which minimizes the generalized Levenshtein distance (GLD) D(X, Y) between X and Y, for all X ∈ H. The non-sequential PR computation of X+ involves a compact trie-based representation of H. In this paper, we show how we can optimize this computation by incorporating breadth-first search schemes on the underlying graph structure. This heuristic emerges from the trie-based dynamic programming recursive equations, which can be effectively implemented using a new data structure called the linked list of prefixes that can be built separately or "on top of" the trie representation of H. The new scheme does not restrict the number of errors in Y to be merely a small constant, as is done in most of the available methods. The main contribution is that our new approach can be used for generalized GLDs and not merely for 0/1 costs. It is also applicable when all possible correct candidates need to be known, and not just the best match. These constitute the cases when the "cutoffs" cannot be used in the DFS trie-based technique (Shang and Merrett in IEEE Trans Knowl Data Eng 8(4):540-547, 1996). The new technique is compared with the DFS trie-based technique (Risvik in United States Patent 6377945 B1, 23 April 2002; Shang and Merrett in IEEE Trans Knowl Data Eng 8(4):540-547, 1996) using three large and small benchmark dictionaries with different errors. In each case, we demonstrate marked improvements of up to 21% in the number of operations needed, while at the same time maintaining the same accuracy. Additionally, some further improvements can be obtained by introducing the knowledge of the maximum number or percentage of errors in Y.
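
The general idea of sharing dynamic-programming work across a trie can be sketched as follows. This is an illustration under stated assumptions, not the authors' method: it walks the trie breadth-first with a plain queue rather than their linked list of prefixes, applies no pruning, and all names (`build_trie`, `best_match`, the cost parameters) are hypothetical. Words that share a prefix share the DP rows computed for that prefix, and the substitution cost is a parameter so that costs other than 0/1 can be used.

```python
from collections import deque

END = None  # key under which a trie node stores the complete word


def build_trie(words):
    """Nested-dict trie; a node that ends a word holds it under END."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node[END] = w
    return root


def unit_sub_cost(a, b):
    return 0 if a == b else 1


def best_match(trie, noisy, sub_cost=unit_sub_cost, ins_cost=1.0, del_cost=1.0):
    """Return (word, distance) minimising D(X, noisy) over dictionary words X."""
    # Row for the empty prefix: build each prefix of noisy by insertions.
    start = [j * ins_cost for j in range(len(noisy) + 1)]
    queue = deque([(trie, start)])
    best_word, best_dist = None, float("inf")
    while queue:
        node, row = queue.popleft()
        for key, child in node.items():
            if key is END:
                # row[-1] is the distance between this word and noisy.
                if row[-1] < best_dist:
                    best_word, best_dist = child, row[-1]
                continue
            # Extend the prefix by one character: one new DP row.
            new_row = [row[0] + del_cost]
            for j, ch in enumerate(noisy, start=1):
                new_row.append(min(row[j] + del_cost,
                                   new_row[j - 1] + ins_cost,
                                   row[j - 1] + sub_cost(key, ch)))
            queue.append((child, new_row))
    return best_word, best_dist
```

For example, `best_match(build_trie(["hello", "help", "world"]), "wrld")` returns "world" at distance 1. Because the traversal visits every node, it also suits the case the abstract mentions where all candidates, not just the best one, are needed; collecting every `(word, row[-1])` pair instead of tracking the minimum would give that.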

8 citations

Posted Content•

Exact Online String Matching Bibliography.

Simone Faro

17 May 2016-arXiv: Data Structures and Algorithms

TL;DR: A comprehensive bibliography for the online exact string matching problem is presented, containing a comprehensive list of (almost) all string matching algorithms proposed since 1970.

Abstract: In this short note we present a comprehensive bibliography for the online exact string matching problem. The problem consists in finding all occurrences of a given pattern in a text. It is an extensively studied problem in computer science, mainly due to its direct applications to such diverse areas as text, image and signal processing, speech analysis and recognition, data compression, information retrieval, computational biology and chemistry. Since 1970 more than 120 string matching algorithms have been proposed. In this note we present a comprehensive list of (almost) all string matching algorithms. The list is updated to May 2016.

8 citations

Book•

Algorithms and Applications: Essays Dedicated to Esko Ukkonen on the Occasion of His 60th Birthday

Tapio Elomaa¹, Heikki Mannila², Pekka Orponen²•Institutions (2)

Tampere University of Technology¹, Aalto University²

01 Jan 2010

TL;DR: Some Applications of String Algorithms in Human-Computer Interaction and Approximate String Matching with Reduced Alphabet are explored.

Abstract: String Rearrangement Metrics: A Survey.- Maximal Words in Sequence Comparisons Based on Subword Composition.- Fast Intersection Algorithms for Sorted Sequences.- Indexing and Searching a Mass Spectrometry Database.- Extended Compact Web Graph Representations.- A Parallel Algorithm for Fixed-Length Approximate String-Matching with k-mismatches.- Covering Analysis of the Greedy Algorithm for Partial Cover.- From Nondeterministic Suffix Automaton to Lazy Suffix Tree.- Clustering the Normalized Compression Distance for Influenza Virus Data.- An Evolutionary Model of DNA Substring Distribution.- Indexing a Dictionary for Subset Matching Queries.- Transposition and Time-Scale Invariant Geometric Music Retrieval.- Unified View of Backward Backtracking in Short Read Mapping.- Some Applications of String Algorithms in Human-Computer Interaction.- Approximate String Matching with Reduced Alphabet.- ICT4D: A Computer Science Perspective.- Searching for Linear Dependencies between Heart Magnetic Resonance Images and Lipid Profiles.- The Support Vector Tree.

8 citations

Network Information

Performance

Metrics

1,942 papers

64,998 citations

No. of papers in the topic in previous years
Year	Papers
2023	8
2022	30
2021	32
2020	30
2019	48
2018	39
