Topic

Approximate string matching

About: Approximate string matching is a research topic. Over its lifetime, 1903 publications have appeared within this topic, receiving 62352 citations. The topic is also known as: fuzzy string-searching algorithm and fuzzy string-matching algorithm.


Papers
Journal Article
TL;DR: This paper proposes a novel probabilistic correlation-based similarity measure that enriches the information in records by considering correlations between tokens. It computes token weights and discovers correlations between records based on the probabilistic correlations of tokens.

18 citations

Book Chapter
11 Jul 1989
TL;DR: A new algorithm for finding all occurrences of the pattern string in the text string with at most k differences is presented and both its theoretical and practical variants improve the known algorithms.
Abstract: Given a text string, a pattern string, and an integer k, a new algorithm for finding all occurrences of the pattern string in the text string with at most k differences is presented. Both its theoretical and practical variants improve the known algorithms.

18 citations
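
To make the k-differences problem in the entry above concrete, here is a minimal sketch of the classic dynamic-programming formulation (often attributed to Sellers). It is not the new algorithm that the chapter presents, only the textbook baseline for the same problem. The function name `k_difference_ends` and the choice to report end positions are assumptions made for illustration.

```python
def k_difference_ends(pattern: str, text: str, k: int) -> list[int]:
    """Return every end position j (1-based, i.e. text[:j]) such that some
    substring of text ending at j is within edit distance k of pattern."""
    m = len(pattern)
    # col[i] = fewest edits needed to match pattern[:i] against some
    # substring of the text ending at the current text position.
    col = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text, start=1):
        prev_diag = col[0]  # value of the cell up-left of the one being filled
        # col[0] stays 0 because a match is allowed to start anywhere.
        for i in range(1, m + 1):
            old = col[i]
            cost = 0 if pattern[i - 1] == ch else 1
            col[i] = min(
                prev_diag + cost,  # match or substitute
                old + 1,           # extra character in the text
                col[i - 1] + 1,    # pattern character missing from the text
            )
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends


if __name__ == "__main__":
    print(k_difference_ends("abc", "xxabdxx", 1))
```

This sketch runs in O(mn) time and O(m) space for a pattern of length m and a text of length n. Only the end positions are reported; recovering the start of each occurrence would need extra bookkeeping.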

Patent
07 May 2004
TL;DR: This patent describes a method for matching patterns of symbols within computer systems by composing a pattern matching expression and embedding a function using storage means within the expression to form a character matching string.
Abstract: Methods, apparati, and computer-readable media for matching patterns of symbols within computer systems. A method embodiment of the present invention comprises composing (11) a pattern matching expression; and embedding (12) a function using storage means within the expression to form a character matching string. The expression may be a regular expression. The character matching string is compared (13) against a target string. The target string may be one that is suspected to contain malicious computer code.

18 citations

Journal Article
TL;DR: The theoretical results are validated by an empirical study with real-world data, which shows that the proposed optimal O(n) time and space algorithm, which can find an SUS for every location of a string of size n, is at least 8 times faster and uses at least 20 times less memory.

18 citations

01 Jan 2003
TL;DR: This thesis focuses on unit-cost edit distance, which defines the distance between two strings as the minimum number of edit operations needed to transform one of the strings into the other.
Abstract: Given a pattern string and a text, the task of approximate string matching is to find all locations in the text that are similar to the pattern. This type of search may be done, for example, in applications of spelling error correction or bioinformatics. Typically edit distance is used as the measure of similarity (or distance) between two strings. In this thesis we concentrate on unit-cost edit distance, which defines the distance between two strings as the minimum number of edit operations needed to transform one of the strings into the other. More specifically, we discuss the Levenshtein and the Damerau edit distances. Approximate string matching algorithms can be divided into off-line and on-line algorithms depending on whether they may or may not, respectively, preprocess the text. In this thesis we propose practical algorithms for both types of approximate string matching as well as for computing edit distance. Our main contributions are a new variant of the bit-parallel approximate string matching algorithm of Myers, a method that makes it easy to modify many existing Levenshtein edit distance algorithms to use the Damerau edit distance, a bit-parallel algorithm for computing edit distance, a more error-tolerant version of the ABNDM algorithm, a two-phase filtering scheme, a tuned indexed approximate string matching method for genome searching, and an improved and extended version of the hybrid index of Navarro and Baeza-Yates. To evaluate their practicality, we compare most of the proposed methods with previously existing algorithms. The test results support the claim of the title of this thesis that our proposed algorithms work well in practice.

17 citations
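
The unit-cost edit distances named in the abstract above can be illustrated with a short sketch. It uses the plain quadratic dynamic-programming table, not the bit-parallel methods the thesis proposes. The function name `edit_distance` and the `transpositions` flag are assumptions made for illustration, and the Damerau variant shown is the restricted form (often called optimal string alignment), in which a transposed pair of adjacent characters is not edited further.

```python
def edit_distance(a: str, b: str, transpositions: bool = False) -> int:
    """Unit-cost edit distance between a and b.

    With transpositions=False this is the Levenshtein distance (insert,
    delete, substitute). With transpositions=True, swapping two adjacent
    characters also costs 1 (restricted Damerau distance).
    """
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] = distance between the prefixes a[:i] and b[:j]
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            best = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            if (
                transpositions
                and i > 1
                and j > 1
                and a[i - 1] == b[j - 2]
                and a[i - 2] == b[j - 1]
            ):
                best = min(best, d[i - 2][j - 2] + 1)  # adjacent swap
            d[i][j] = best
    return d[-1][-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
    print(edit_distance("ca", "ac"), edit_distance("ca", "ac", transpositions=True))
```

The only difference between the two distances in this sketch is the one extra case in the inner loop, so a Levenshtein routine of this shape can be extended to the restricted Damerau distance by adding that single check.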


Network Information
Related Topics (5)
Server: 79.5K papers, 1.4M citations, 81% related
Cluster analysis: 146.5K papers, 2.9M citations, 80% related
Scheduling (computing): 78.6K papers, 1.3M citations, 79% related
Network packet: 159.7K papers, 2.2M citations, 78% related
Optimization problem: 96.4K papers, 2.1M citations, 78% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    8
2022    30
2021    32
2020    30
2019    48
2018    39