Topic

Approximate string matching

About: Approximate string matching is a research topic. Over its lifetime, 1,903 publications have appeared within this topic, receiving 62,352 citations. The topic is also known as fuzzy string-searching algorithm and fuzzy string-matching algorithm.


Papers
Proceedings ArticleDOI
01 Nov 1986
TL;DR: Given a text of length n, a pattern of length m and an integer k, this paper presents parallel and serial algorithms for finding all occurrences of the pattern in the text with at most k differences.
Abstract: Consider the string matching problem, where differences between characters of the pattern and characters of the text are allowed. Each difference is due to either a mismatch between a character of the text and a character of the pattern, a superfluous character in the text, or a superfluous character in the pattern. Given a text of length n, a pattern of length m and an integer k, we present parallel and serial algorithms for finding all occurrences of the pattern in the text with at most k differences. The first part of the parallel algorithm consists of analysis of the pattern and takes O(log m) time using m^2 processors. The rest of the algorithm consists of handling the text. The text han... (Footnote 1: The research of this author was supported by NSF grants NSF-DCR-8318874 and NSF-DCR-8413359 and ONR grant ...)

136 citations
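
The k-differences problem stated in the abstract above can be illustrated with the classic column-by-column dynamic program (often attributed to Sellers). This is a minimal sketch, not the parallel or serial algorithm of the paper itself: it runs in O(mn) time, and the function name `find_with_k_differences` is a hypothetical name chosen for illustration. It reports the end positions in the text of substrings that match the pattern with at most k differences (mismatch, extra text character, or extra pattern character).

```python
def find_with_k_differences(pattern, text, k):
    """Return 0-based end indices j such that some substring of text
    ending at j matches pattern with at most k differences."""
    m = len(pattern)
    # column[i] = fewest differences between pattern[:i] and the best
    # substring of text ending at the current text position.
    column = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text):
        prev_diag = column[0]  # value for (i - 1, j - 1)
        column[0] = 0          # a match may start at any text position
        for i in range(1, m + 1):
            old = column[i]    # value for (i, j - 1)
            cost = 0 if pattern[i - 1] == ch else 1
            column[i] = min(
                prev_diag + cost,   # match or mismatch
                old + 1,            # superfluous character in the text
                column[i - 1] + 1,  # superfluous character in the pattern
            )
            prev_diag = old
        if column[m] <= k:
            ends.append(j)
    return ends


print(find_with_k_differences("abc", "xabcx", 0))  # [3]
print(find_with_k_differences("abc", "xabcx", 1))  # [2, 3, 4]
```

With k = 1 the pattern "abc" is reported at three end positions: "ab" (one missing pattern character), "abc" (exact), and "abcx" (one extra text character).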

Proceedings ArticleDOI
01 Aug 1999
TL;DR: A content-based retrieval model for tackling the mismatch problems specific to music data and a distinct function that extracts key melodies for query suggestion is developed, which improves performance over direct search of the music database.
Abstract: A content-based retrieval model for tackling the mismatch problems specific to music data is proposed and implemented. The system uses a pitch profile encoding for queries in any key and an n-note indexing method for approximate matching in sub-linear time. A distinct function that extracts key melodies for query suggestion is developed. The Web-based system provides flexible user interface for query formulation and result browsing. Users can search the system by a short sequence of notes, by uploading a file created by singing, or by clicking suggested key melodies without input. Experiments show that the pitch profile encoding and a 3-note indexing are able to overcome the key mismatch problem and the random errors caused by pitch error, note deletion and insertion. The use of extracted key melodies improves performance over direct search of the music database. For the type of burst mismatch, a query expansion approach is applied.

135 citations

01 Jan 1985
TL;DR: In this paper, a new structural approach to shape recognition using attributed string matching with merging is proposed, where each attributed string is an ordered sequence of shape boundary primitives, each representing a basic boundary structural unit, line segment, with two types of numerical attributes, length and direction.
Abstract: A new structural approach to shape recognition using attributed string matching with merging is proposed. After illustrating the disadvantages of conventional symbolic string matching using changes, deletions, and insertions, attributed strings are suggested for matching. Each attributed string is an ordered sequence of shape boundary primitives, each representing a basic boundary structural unit, line segment, with two types of numerical attributes, length and direction. A new type of primitive edit operation, called merge, is then introduced, which can be used to combine and then match any number of consecutive boundary primitives in one shape with those in another. The resulting attributed string matching with merging approach is shown useful for recognizing distorted shapes. Experimental results prove the feasibility of the proposed approach for general shape recognition. Some possible extensions of the approach are also included. Index Terms: Attributed strings, boundary primitives, combined primitives, edit distances, matching with merging, shape recognition, string edit operations, string matching.

134 citations

Journal ArticleDOI
TL;DR: Nrgrep is a new pattern‐matching tool designed for efficient search of complex patterns based on a single and uniform concept: the bit‐parallel simulation of a non‐deterministic suffix automaton that can find from simple patterns to regular expressions, exactly or allowing errors in the matches.
Abstract: We present nrgrep (‘non-deterministic reverse grep’), a new pattern-matching tool designed for efficient search of complex patterns. Unlike previous tools of the grep family, such as agrep and Gnu grep, nrgrep is based on a single and uniform concept: the bit-parallel simulation of a non-deterministic suffix automaton. As a result, nrgrep can find from simple patterns to regular expressions, exactly or allowing errors in the matches, with an efficiency that degrades smoothly as the complexity of the searched pattern increases. Another concept that is fully integrated into nrgrep and that contributes to this smoothness is the selection of adequate subpatterns for fast scanning, which is also absent in many current tools. We show that the efficiency of nrgrep is similar to that of the fastest existing string-matching tools for the simplest patterns, and is by far unmatched for more complex patterns. Copyright © 2001 John Wiley & Sons, Ltd.

134 citations
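
The core concept named in the nrgrep abstract, bit-parallel simulation of a non-deterministic suffix automaton, can be shown for the simplest case of exact matching of a plain string. The sketch below follows the well-known BNDM scheme and is not nrgrep's actual code: nrgrep additionally handles complex patterns, regular expressions and errors, none of which appear here. The function name `bndm_search` is hypothetical.

```python
def bndm_search(pattern, text):
    """Return 0-based start positions of exact matches of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    full = (1 << m) - 1
    high = 1 << (m - 1)
    # masks[c]: bit (m - 1 - i) is set when pattern[i] == c.
    masks = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << (m - 1 - i))
    hits = []
    pos = 0
    while pos <= n - m:
        j, last = m, m
        state = full  # every pattern position is still a candidate
        # Read the window right to left while it is still a pattern factor.
        while state:
            state &= masks.get(text[pos + j - 1], 0)
            j -= 1
            if state & high:      # the suffix read so far is a pattern prefix
                if j > 0:
                    last = j      # remember the shift that aligns this prefix
                else:
                    hits.append(pos)
                    break
            state = (state << 1) & full
        pos += last
    return hits


print(bndm_search("aba", "ababa"))      # [0, 2]
print(bndm_search("abc", "xxabcxabc"))  # [2, 6]
```

The whole automaton state lives in one integer, so each text character costs a few bitwise operations, and windows that stop being a factor of the pattern are skipped without reading all of their characters.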

Journal ArticleDOI
TL;DR: A class of algorithms is presented for very rapid on-line detection of occurrences of a fixed set of pattern arrays as embedded subarrays in an input array by reducing the array problem to a string matching problem in a natural way and it is shown that efficient string matching algorithms may be applied to arrays.
Abstract: A class of algorithms is presented for very rapid on-line detection of occurrences of a fixed set of pattern arrays as embedded subarrays in an input array. By reducing the array problem to a string matching problem in a natural way, it is shown that efficient string matching algorithms may be applied to arrays. This is illustrated by use of the string-matching algorithm of Knuth, Morris and Pratt [7]. Depending on the data structure used for the preprocessed pattern graph, this algorithm may be made to run “real-time” or merely in linear time. Extensions can be made to nonrectangular arrays, multiple arrays of dissimilar sizes, and arrays of more than two dimensions. Possible applications are foreseen to problems such as detection of edges in digital pictures and detection of local conditions in board games.

134 citations
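
The reduction from array matching to string matching described in the abstract above can be sketched for a single rectangular pattern. This is a simplified illustration under stated assumptions, not the paper's on-line algorithm: rows are recognized here with a dictionary lookup on slices rather than with a preprocessed pattern graph, and only one pattern array is handled. Each text row segment is replaced by the identifier of the pattern row it equals, and Knuth-Morris-Pratt matching is then run down each column of identifiers. The names `kmp_search` and `find_subarray` are hypothetical.

```python
def kmp_search(pattern, text):
    """Knuth-Morris-Pratt: start indices of pattern in text (sequences)."""
    m = len(pattern)
    if m == 0:
        return []
    # failure[i] = length of the longest proper border of pattern[:i + 1]
    failure = [0] * m
    k = 0
    for i in range(1, m):
        while k and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    hits = []
    k = 0
    for i, x in enumerate(text):
        while k and x != pattern[k]:
            k = failure[k - 1]
        if x == pattern[k]:
            k += 1
        if k == m:
            hits.append(i - m + 1)
            k = failure[k - 1]
    return hits


def find_subarray(pattern, array):
    """Return (row, col) of the top-left corner of each occurrence."""
    cols = len(pattern[0])
    # Give each distinct pattern row a small integer identifier.
    row_ids = {}
    for row in pattern:
        row_ids.setdefault(tuple(row), len(row_ids))
    id_sequence = [row_ids[tuple(row)] for row in pattern]
    hits = []
    for c in range(len(array[0]) - cols + 1):
        # Label every row segment starting at column c; -1 means no pattern row.
        column = [row_ids.get(tuple(r[c:c + cols]), -1) for r in array]
        hits.extend((top, c) for top in kmp_search(id_sequence, column))
    return sorted(hits)


grid = ["xabx", "acdb", "abcd", "cdab"]
print(find_subarray(["ab", "cd"], grid))  # [(0, 1), (2, 0)]
```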


Network Information
Related Topics (5)
Server: 79.5K papers, 1.4M citations, 81% related
Cluster analysis: 146.5K papers, 2.9M citations, 80% related
Scheduling (computing): 78.6K papers, 1.3M citations, 79% related
Network packet: 159.7K papers, 2.2M citations, 78% related
Optimization problem: 96.4K papers, 2.1M citations, 78% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    8
2022    30
2021    32
2020    30
2019    48
2018    39