
Approximate string matching

About: Approximate string matching is a research topic. Over its lifetime, 1,903 publications have been published on this topic, receiving 62,352 citations. The topic is also known as: fuzzy string-searching algorithm and fuzzy string-matching algorithm.


Papers
Journal Article
TL;DR: A stacked ensemble approach for biomedical named entity recognition of disease names is implemented, combined with fuzzy string matching to tag rare disease names from the authors' in-house disease dictionary.

46 citations

Book Chapter
21 Jun 2000
TL;DR: The algorithm can be adapted to run in $O(k^2 n + \min(mkn, m^2(m\sigma)^k) + R)$ average time, where $\sigma$ is the alphabet size, and results show a speedup over the basic approach for moderate $m$ and small $k$.
Abstract: We present a solution to the problem of performing approximate pattern matching on compressed text. The format we choose is the Ziv-Lempel family, specifically the LZ78 and LZW variants. Given a text of length $u$ compressed into length $n$, and a pattern of length $m$, we report all the $R$ occurrences of the pattern in the text allowing up to $k$ insertions, deletions and substitutions, in $O(mkn + R)$ time. The existence problem needs $O(mkn)$ time. We also show that the algorithm can be adapted to run in $O(k^2 n + \min(mkn, m^2(m\sigma)^k) + R)$ average time, where $\sigma$ is the alphabet size. The experimental results show a speedup over the basic approach for moderate $m$ and small $k$.

46 citations
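
For orientation, the problem stated in the entry above (reporting occurrences of a pattern with up to k insertions, deletions and substitutions) has a classical dynamic-programming baseline on plain, uncompressed text. The Python sketch below shows that baseline only. It is not the chapter's LZ78/LZW algorithm, and the function name `approximate_occurrences` is a hypothetical one chosen for illustration.

```python
def approximate_occurrences(pattern: str, text: str, k: int) -> list[int]:
    """Return the 1-based text positions where an occurrence of `pattern`
    with at most k edits ends (column-wise dynamic programming, O(m * n)).

    Illustrative baseline on uncompressed text; names are assumptions.
    """
    m = len(pattern)
    # column[i] = smallest edit distance between pattern[:i] and any
    # substring of the text that ends at the current text position.
    column = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text, start=1):
        prev_diag = column[0]  # column[i - 1] as it was one text position ago
        # column[0] stays 0 because an occurrence may start anywhere.
        for i in range(1, m + 1):
            old = column[i]
            cost = 0 if pattern[i - 1] == ch else 1
            column[i] = min(
                prev_diag + cost,   # match or substitution
                old + 1,            # extra character in the text
                column[i - 1] + 1,  # pattern character missing from the text
            )
            prev_diag = old
        if column[m] <= k:
            ends.append(j)
    return ends


if __name__ == "__main__":
    print(approximate_occurrences("abc", "xxabcxx", 0))
    print(approximate_occurrences("abc", "xxabcxx", 1))
```

Only end positions are reported here. Recovering start positions or the alignment itself would need extra bookkeeping that this sketch leaves out.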

Journal Article
Gene Myers
TL;DR: An $O(PN^2(N + \log P))$ algorithm for approximately matching a string of length $N$ and a context-free language specified by a grammar of size $P$ is given, which generalizes the Cocke-Younger-Kasami algorithm for determining membership in a context-free language.

45 citations

Patent
29 Jun 2007
TL;DR: A field of an input digital representation, such as the street name in an address, is checked against a list of valid choices; if no exact match is found, the most closely matching valid choice is determined by a fuzzy match comparison and may replace the input field, thereby correcting the input information.
Abstract: Systems, methods, and software determine whether a field of an input digital representation of information, such as the street name field in an address, is correct by quickly comparing the field to a list of valid choices for that field. The list of valid choices is generated based on information from the input digital representation, such as a character string. If an exact match is not found, a fuzzy match comparison determines the most closely matching valid choice. If a suitable fuzzy match is not found, then the input information is invalid. Otherwise, another field of the input information, such as the building number field of an address, is tested for validity. If the second field passes the validity check, then the fuzzy match (or exact match) for the field is valid. A fuzzy matching field may replace the input field, thereby correcting the input information.

45 citations
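
The validation flow described in the abstract above (exact match first, then the closest fuzzy match, otherwise invalid) can be sketched in a few lines of Python. This is a minimal illustration, not the patented method: the abstract does not say which similarity measure is used, so the standard library's `difflib.get_close_matches` is swapped in here, and the function name `validate_field`, the `cutoff` value and the street names are all assumptions.

```python
import difflib
from typing import Optional, Sequence


def validate_field(value: str, valid_choices: Sequence[str],
                   cutoff: float = 0.8) -> Optional[str]:
    """Return the valid choice matching `value`, or None if the field is invalid.

    Hypothetical helper: tries an exact match first, then falls back to
    the single closest choice whose difflib similarity ratio is >= cutoff.
    """
    if value in valid_choices:
        return value  # exact match, nothing to correct
    close = difflib.get_close_matches(value, valid_choices, n=1, cutoff=cutoff)
    return close[0] if close else None


if __name__ == "__main__":
    streets = ["Main Street", "Maple Avenue", "Oak Lane"]  # made-up data
    print(validate_field("Main Stret", streets))  # corrected street name
    print(validate_field("Zzzz", streets))        # no suitable fuzzy match
```

The second-field check mentioned in the abstract (for example, testing the building number once a street name has matched) is left out. It would be a further lookup keyed on the returned choice.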

Patent
26 Jul 2001
TL;DR: In this patent, a pattern is partitioned into context and value components, and candidate matches for each of the components are identified by calculating an edit distance between that component and each potentially matching set (sub-string) of symbols within the string.
Abstract: A system and method for examining a string of symbols and identifying portions of the string which match a predetermined pattern using adaptively weighted, partitioned context edit distances. A pattern is partitioned into context and value components, and candidate matches for each of the components are identified by calculating an edit distance between that component and each potentially matching set (sub-string) of symbols within the string. One or more candidate matches having the lowest edit distances are selected as matches for the pattern. The weighting of each of the component matches may be adapted to optimize the pattern matching and, in one embodiment, the context components may be heavily weighted to obtain matches of a value for which the corresponding pattern is not well defined. In one embodiment, an edit distance matrix is evaluated for each of a prefix component, a value component and a suffix component of a pattern. The evaluation of the prefix matrix provides a basis for identifying indicators of the beginning of a value window, while the evaluation of the suffix matrix provides a basis for identifying the alignment of the end of the value window. The value within the value window can then be evaluated via the value matrix to determine a corresponding value match score.

45 citations


Network Information
Related Topics (5)
Server: 79.5K papers, 1.4M citations, 81% related
Cluster analysis: 146.5K papers, 2.9M citations, 80% related
Scheduling (computing): 78.6K papers, 1.3M citations, 79% related
Network packet: 159.7K papers, 2.2M citations, 78% related
Optimization problem: 96.4K papers, 2.1M citations, 78% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    8
2022    30
2021    32
2020    30
2019    48
2018    39