Topic

Approximate string matching

About: Approximate string matching is a research topic. Over its lifetime, 1,903 publications have been published within this topic, receiving 62,352 citations. The topic is also known as fuzzy string-searching algorithm and fuzzy string-matching algorithm.


Papers
Posted Content
TL;DR: In this article, the authors consider the problem of reconstructing a string from the multiset of its substring compositions and derive lower and upper bounds on the largest number of strings with given substring composition.
Abstract: Motivated by mass-spectrometry protein sequencing, we consider a simply-stated problem of reconstructing a string from the multiset of its substring compositions. We show that all strings of length 7, one less than a prime, or one less than twice a prime, can be reconstructed uniquely up to reversal. For all other lengths we show that reconstruction is not always possible and provide sometimes-tight bounds on the largest number of strings with given substring compositions. The lower bounds are derived by combinatorial arguments and the upper bounds by algebraic considerations that precisely characterize the set of strings with the same substring compositions in terms of the factorization of bivariate polynomials. The problem can be viewed as a combinatorial simplification of the turnpike problem, and its solution may shed light on this long-standing problem as well. Using well-known results on transience of multi-dimensional random walks, we also provide a reconstruction algorithm that reconstructs random strings over alphabets of size $\ge 4$ in optimal near-quadratic time.
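To make the notion of a "multiset of substring compositions" concrete, here is a minimal Python sketch. It is an illustration only, not the reconstruction algorithm from the paper: the function name `composition_multiset` and the choice to represent a composition as a sorted tuple of characters are assumptions made for this example.

```python
from collections import Counter


def composition_multiset(s: str) -> Counter:
    """Multiset of compositions of all contiguous substrings of s.

    A composition records how often each symbol occurs and ignores order,
    so it is represented here (an illustrative choice) as the sorted tuple
    of the substring's characters.
    """
    comps = Counter()
    n = len(s)
    for i in range(n):
        for j in range(i + 1, n + 1):
            comps[tuple(sorted(s[i:j]))] += 1
    return comps


# A string and its reversal always share the same multiset, which is why
# reconstruction can only ever be unique "up to reversal".
same = composition_multiset("abca") == composition_multiset("acba")

# Two different arrangements of the same letters can still be told apart.
different = composition_multiset("aab") != composition_multiset("aba")
```

The brute-force enumeration visits all n(n+1)/2 substrings and sorts each one, so it is only meant for small experiments.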

8 citations

Patent
10 Mar 2010
TL;DR: A method and a system for counting machine translation based on phrases are presented; the method comprises a step of fuzzy-matching the phrases of an input sentence against a preset phrase list.
Abstract: The invention provides a method and a system for counting machine translation based on phrases. The method comprises a step of performing a fuzzy match, against a preset phrase list, for the phrases of an input sentence. By fuzzy-matching the phrases, the method and the system can generate high-quality translations for longer phrases in the input sentence, and can effectively improve the quality of the translation compared with a phrase-based machine translation system that relies on precise matching.

8 citations

Journal Article
TL;DR: It is shown that the problem can be approximated in linear time for general patterns, and efficient exact solutions for different variants of the problem are provided, as well as a faster approximation.

8 citations

Proceedings Article
Kensuke Baba
28 Jun 2017
TL;DR: A plagiarism detection algorithm based on approximate string matching to be specified in “copy and paste”-type plagiarisms, and a speed improvement to an implementation of the algorithm are proposed.
Abstract: Plagiarism detection in a large number of documents requires efficient methods. This paper proposes a plagiarism detection algorithm based on approximate string matching to be specified in “copy and paste”-type plagiarisms, and a speed improvement to an implementation of the algorithm. Most of the computations required in the algorithm are omitted by two kinds of approximations of the output used for plagiarism detection, while the decrease of accuracy caused by the approximations is acceptable. The effect of the improvement on the processing time and accuracy of the algorithm is evaluated by conducting experiments with a data set. The experimental results show that the improvement can reduce the processing time to approximately one-twentieth for a 6.4% decrease of the accuracy from those for the normal implementation of the algorithm.
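For readers unfamiliar with approximate string matching itself, the following is a minimal sketch of the classical dynamic-programming formulation (often attributed to Sellers), which reports where a pattern occurs in a text with at most `k` edits. It is not the plagiarism detection algorithm or the speed-up proposed in this paper; the function name `approx_find` and its parameters are assumptions made for this example.

```python
def approx_find(pattern: str, text: str, k: int) -> list[int]:
    """Return end positions j (exclusive) such that some substring of
    text ending at j is within edit distance k of pattern."""
    m = len(pattern)
    # col[i] = smallest edit distance between pattern[:i] and any
    # substring of the text ending at the current text position.
    col = list(range(m + 1))
    hits = []
    for j, ch in enumerate(text, start=1):
        prev_diag = col[0]  # previous column's value at row i - 1
        col[0] = 0          # a match may start anywhere in the text
        for i in range(1, m + 1):
            prev = col[i]   # previous column's value at row i
            cost = 0 if pattern[i - 1] == ch else 1
            col[i] = min(
                prev_diag + cost,  # match or substitution
                prev + 1,          # extra character in the text
                col[i - 1] + 1,    # pattern character missing from text
            )
            prev_diag = prev
        if col[m] <= k:
            hits.append(j)
    return hits


exact = approx_find("abc", "xxabcxx", 0)  # only the exact occurrence
fuzzy = approx_find("abc", "xxabdxx", 1)  # tolerates one edit
```

This sketch runs in time proportional to the product of the pattern and text lengths, which illustrates why faster approximations matter when many documents are compared.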

8 citations

Proceedings Article
01 Dec 2019
TL;DR: This work extends existing filtering-based subgraph matching algorithms and proposes a new set of filters leveraging the monotone function properties in the multiplex setting that enables effective pruning of irrelevant subgraph regions and expedites the overall matching process.
Abstract: We study the problem of detecting matching subgraphs in a large multiplex background network based on predefined subgraph templates. Our approach extends existing filtering-based subgraph matching algorithms and proposes a new set of filters leveraging the monotone function properties in the multiplex setting. This enables effective pruning of irrelevant subgraph regions and expedites the overall matching process. In addition, our approach proposes a new strategy based on maximum likelihood estimate to identify “closely matched” subgraphs that are not isomorphic to the given templates from a noisy background network. This allows us to generalize this approach to real-world networks, which are often noisy, incomplete and ambiguous. We demonstrate the effectiveness of the proposed method on a real-world multiplex network provided by the DARPA Modeling Adversarial Activity (MAA) program. Our approach obtains highly accurate subgraph matching results for both the clean and noisy versions of the network, which significantly outperforms the baseline filtering methods. Furthermore, our proposed approach is parallelizable such that it can scale up to handle large input networks.

8 citations


Network Information
Related Topics (5)
Server: 79.5K papers, 1.4M citations, 81% related
Cluster analysis: 146.5K papers, 2.9M citations, 80% related
Scheduling (computing): 78.6K papers, 1.3M citations, 79% related
Network packet: 159.7K papers, 2.2M citations, 78% related
Optimization problem: 96.4K papers, 2.1M citations, 78% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    8
2022    30
2021    32
2020    30
2019    48
2018    39