scispace - formally typeset
Search or ask a question
Topic

Approximate string matching

About: Approximate string matching is a research topic. Over the lifetime, 1903 publications have been published within this topic receiving 62352 citations. The topic is also known as: fuzzy string-searching algorithm & fuzzy string-matching algorithm.


Papers
More filters
Journal ArticleDOI
TL;DR: A special case where deletion is the only allowed edition operation is shown to have the longest common subsequence of the strings as its solution.
Abstract: The string merging problem is to determine a merged string from a given set of strings. The distinguishing property of a solution is that the total cost of editing all of the given strings into this solution is minimal. Necessary and sufficient conditions are presented for the case where this solution matches the solution to the string-to-string correction problem. A special case where deletion is the only allowed edition operation is shown to have the longest common subsequence of the strings as its solution.

36 citations

Proceedings ArticleDOI
25 Oct 2008
TL;DR: A discriminative approach for generating candidate strings that uses substring substitution rules as features and scores them using an L1-regularized logistic regression model and demonstrates the remarkable performance of the proposed method in normalizing inflected words and spelling variations.
Abstract: String transformation, which maps a source string s into its desirable form t*, is related to various applications including stemming, lemmatization, and spelling correction. The essential and important step for string transformation is to generate candidates to which the given string s is likely to be transformed. This paper presents a discriminative approach for generating candidate strings. We use substring substitution rules as features and score them using an L1-regularized logistic regression model. We also propose a procedure to generate negative instances that affect the decision boundary of the model. The advantage of this approach is that candidate strings can be enumerated by an efficient algorithm because the processes of string transformation are tractable in the model. We demonstrate the remarkable performance of the proposed method in normalizing inflected words and spelling variations.

35 citations

Proceedings ArticleDOI
03 Jul 2014
TL;DR: Experiments show that the new system along with novel algorithms, comparing with two representative mathematics retrieval systems, provides more efficient mathematical formula index and retrieval, while simplifying user query input for PDF documents.
Abstract: The semantics of mathematical formulae depend on their spatial structure, and they usually exist in layout presentations such as PDF, LaTeX, and Presentation MathML, which challenges previous text index and retrieval methods. This paper proposes an innovative mathematics retrieval system along with the novel algorithms, which enables efficient formula index and retrieval from both webpages and PDF documents. Unlike prior studies, which require users to manually input formula markup language as query, the new system enables users to "copy" formula queries directly from PDF documents. Furthermore, by using a novel indexing and matching model, the system is aimed at searching for similar mathematical formulae based on both textual and spatial similarities. A hierarchical generalization technique is proposed to generate sub-trees from the semi-operator tree of formulae and support substructure match and fuzzy match. Experiments based on massive Wikipedia and CiteSeer repositories show that the new system along with novel algorithms, comparing with two representative mathematics retrieval systems, provides more efficient mathematical formula index and retrieval, while simplifying user query input for PDF documents.

35 citations

Book ChapterDOI
Tadao Takaoka1
25 Aug 1994
TL;DR: A more general analysis of expected time with the simplified algorithm for the one-dimensional case under a non-uniform probability distribution, and it is shown that the method can easily be generalized to the two-dimensional approximate pattern matching problem with sublinear expected time.
Abstract: We simplify in this paper the algorithm by Chang and Lawler for the approximate string matching problem, by adopting the concept of sampling. We have a more general analysis of expected time with the simplified algorithm for the one-dimensional case under a non-uniform probability distribution, and we show that our method can easily be generalized to the two-dimensional approximate pattern matching problem with sublinear expected time.

35 citations

01 Jan 1999
TL;DR: In this paper, the authors introduce two new notions of approximate matching with application in computer assisted music analysis, and present algorithms for each notion of approximation: for approximate string matching and for computing approximate squares.
Abstract: Here we introduce two new notions of approximate matching with application in computer assisted music analysis. We present algorithms for each notion of approximation: for approximate string matching and for computing approximate squares.

35 citations


Network Information
Related Topics (5)
Server
79.5K papers, 1.4M citations
81% related
Cluster analysis
146.5K papers, 2.9M citations
80% related
Scheduling (computing)
78.6K papers, 1.3M citations
79% related
Network packet
159.7K papers, 2.2M citations
78% related
Optimization problem
96.4K papers, 2.1M citations
78% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
20238
202230
202132
202030
201948
201839