Topic

Approximate string matching

About: Approximate string matching is a research topic. Over its lifetime, 1,903 publications have been published within this topic, receiving 62,352 citations. The topic is also known as: fuzzy string-searching algorithm and fuzzy string-matching algorithm.

Papers

Journal Article

EPMA: Efficient pattern matching algorithm for DNA sequences

Muhammad Tahir¹, Muhammad Sardaraz¹, Ataul Aziz Ikram

COMSATS Institute of Information Technology¹

01 Sep 2017-Expert Systems With Applications

TL;DR: A novel DNA sequence pattern matching algorithm called EPMA is presented, which utilizes fixed-length 2-bit binary encoding, segmentation and multi-threading and can be extended to general string matching and its applications.

Abstract: The use of computer technology to solve, manage and analyze biological problems is called bioinformatics. With the rapid evolution of the computing era, the volume of biological data has increased significantly. These large amounts of data have increased the need to analyze them in reasonable time and space. DNA sequences contain basic information about species, and pattern matching between different species is an important and challenging issue. The literature contains both general string matching algorithms and some specialized DNA pattern matching algorithms. There is still a need to develop fast and space-efficient pattern matching algorithms that take new hardware developments into account. In this paper, we present a novel DNA sequence pattern matching algorithm called EPMA. The proposed algorithm utilizes fixed-length 2-bit binary encoding, segmentation and multi-threading. The idea is to find the pattern with multiple searcher agents concurrently. The proposed algorithm is validated with comparative experimental results. The results show that the new algorithm is a good candidate for DNA sequence pattern matching applications. The algorithm effectively utilizes modern hardware and will help researchers in tasks such as sequence alignment, short-read error correction and phylogenetic inference. Furthermore, the proposed method can be extended to general string matching and its applications.

21 citations
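
The three ingredients named in the EPMA abstract (2-bit encoding, segmentation, concurrent searchers) can be illustrated with a minimal Python sketch. It assumes the mapping A=00, C=01, G=10, T=11; the paper's actual encoding and segmentation scheme may differ, and the function names are hypothetical. Each thread scans one segment, and segments overlap by m-1 bases so that a match straddling a boundary is not lost. Because of CPython's global interpreter lock, the threads here show the structure of the approach rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed 2-bit code per nucleotide; EPMA's actual mapping may differ.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def encode(seq):
    """Map each nucleotide to its 2-bit code."""
    return [CODE[base] for base in seq.upper()]


def search_segment(text, key, m, start, end):
    """Report match positions s with start <= s < end.

    The scan reads m-1 bases past `end`, so segments overlap and a match
    straddling a segment boundary is still found exactly once.
    """
    mask = (1 << (2 * m)) - 1  # keep only the last m bases (2m bits)
    window, hits = 0, []
    for j in range(start, min(end + m - 1, len(text))):
        window = ((window << 2) | text[j]) & mask  # slide the window by one base
        if j - start + 1 >= m and window == key:
            hits.append(j - m + 1)
    return hits


def epma_like_search(seq, pat, workers=4):
    """Hypothetical helper: split the text into segments, one searcher per thread."""
    text, m = encode(seq), len(pat)
    if m == 0 or m > len(text):
        return []
    key = 0
    for c in encode(pat):  # pack the pattern's 2-bit codes into one integer
        key = (key << 2) | c
    n_starts = len(text) - m + 1  # number of candidate start positions
    seg = -(-n_starts // workers)  # ceiling division
    bounds = [(s, min(s + seg, n_starts)) for s in range(0, n_starts, seg)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda b: search_segment(text, key, m, b[0], b[1]), bounds)
    return sorted(pos for part in parts for pos in part)
```

For example, `epma_like_search("ACGTACGTTACG", "ACG")` returns `[0, 4, 9]`. Comparing packed integers lets one comparison test a whole window; in a compiled implementation up to 32 bases fit in a 64-bit word.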

Journal Article

A memory-efficient parallel string matching for intrusion detection systems

HyunJin Kim¹, Hyejeong Hong¹, Hong-Sik Kim¹, Sungho Kang¹

Yonsei University¹

15 Dec 2009-IEEE Communications Letters

TL;DR: A parallel string matching scheme based on the Aho-Corasick algorithm is proposed; in evaluations using Snort rules, it outperforms the existing bit-split string matching.

Abstract: As the variety of hazardous packet payload contents increases, an intrusion detection system (IDS) should be able to detect numerous patterns in real time. For this reason, this paper proposes a parallel string matching scheme based on the Aho-Corasick algorithm. In order to balance memory usage between homogeneous finite-state machine (FSM) tiles for each string matcher, an optimal set of bit position groups is determined. Target patterns are sorted by binary-reflected Gray code (BRGC), which reduces bit transitions in patterns mapped onto a string matcher. In evaluations using Snort rules, the proposed string matching outperforms the existing bit-split string matching.

20 citations
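
For reference, the textbook Aho-Corasick automaton that this matcher builds on can be sketched in Python as follows. This is the plain dictionary-based version, not the paper's bit-split FSM tiles or BRGC pattern ordering, and the function names are hypothetical.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie goto edges, failure links and output sets (textbook Aho-Corasick)."""
    goto, fail, out = [{}], [0], [set()]  # state 0 is the root
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(p)
    # Breadth-first pass; depth-1 states keep fail = 0 (the root).
    queue = deque(goto[0].values())
    while queue:
        r = queue.popleft()
        for ch, u in goto[r].items():
            queue.append(u)
            f = fail[r]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[u] = goto[f].get(ch, 0)
            out[u] |= out[fail[u]]  # also report patterns that are suffixes of this state
    return goto, fail, out


def search(text, automaton):
    """Return sorted (start_index, pattern) pairs for every occurrence in one pass."""
    goto, fail, out = automaton
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        hits.extend((i - len(p) + 1, p) for p in out[s])
    return sorted(hits)
```

For example, `search("ushers", build_automaton(["he", "she", "his", "hers"]))` returns `[(1, 'she'), (2, 'he'), (2, 'hers')]`. All patterns are found in a single left-to-right scan of the payload, which is why the automaton is a common basis for IDS matchers; the memory cost lies in the transition storage that hardware designs such as bit-split matching try to reduce.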

Journal Article

Perfect Hashing Based Parallel Algorithms for Multiple String Matching on Graphic Processing Units

Cheng-Hung Lin¹, Jin-Cheng Li², Chen-Hsiung Liu², Shih-Chieh Chang²

National Taiwan Normal University¹, National Tsing Hua University²

01 Sep 2017-IEEE Transactions on Parallel and Distributed Systems

TL;DR: Two parallel string matching algorithms that adopt perfect hashing to compact a state transition table are proposed; they reduce the memory required to store the state transition table by up to 99.5 percent compared to the traditional two-dimensional memory architecture.

Abstract: Multiple string matching has a wide range of applications such as network intrusion detection systems, spam filters, information retrieval systems, and bioinformatics. Many hardware approaches have been proposed to accelerate multiple string matching. Among the hardware approaches, memory architectures have been widely adopted because of their flexibility and scalability. A conventional memory architecture compiles multiple string patterns into a state machine and performs string matching by traversing the corresponding state transition table. Due to the ever-increasing number of attack patterns, the memory used for storing the state transition table has increased tremendously. Therefore, memory reduction has become a crucial issue in optimizing memory architectures. In this paper, we propose two parallel string matching algorithms that adopt perfect hashing to compact a state transition table. Unlike most state-of-the-art approaches, which are implemented on specific hardware such as TCAM, FPGA, or ASIC, our proposed approaches are easily implemented on commodity DRAM and are well suited to GPUs. The proposed algorithms reduce the memory required to store the state transition table by up to 99.5 percent compared to the traditional two-dimensional memory architecture. Compared with existing approaches, our results show significant improvements in memory efficiency.

20 citations
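
The memory argument can be illustrated with a small Python sketch: only the valid goto edges of a pattern trie are stored, in a table addressed by a collision-free hash of (state, byte), instead of a states × 256 array. The hash function, the brute-force seed search, and the assumed 2× table size are illustrative choices; this is neither minimal perfect hashing nor the authors' GPU construction. Each slot keeps its key so that a lookup can detect a missing edge.

```python
# Hypothetical mixing hash: the constant, XOR-shift and seed scheme are illustrative.
def slot(key, seed, size):
    h = ((key ^ seed) * 2654435761) & 0xFFFFFFFF
    h ^= h >> 15
    return h % size


def trie_transitions(patterns):
    """Return ({(state, byte): next_state}, number_of_states) for a pattern trie."""
    edges, n_states = {}, 1  # state 0 is the root
    for p in patterns:
        s = 0
        for b in p.encode():
            if (s, b) not in edges:
                edges[(s, b)] = n_states
                n_states += 1
            s = edges[(s, b)]
    return edges, n_states


def build_perfect_table(edges, max_seeds=10_000):
    """Search for a seed that maps every stored edge key to a distinct slot."""
    keys = [s * 256 + b for (s, b) in edges]
    size = max(1, 2 * len(keys))  # assumed load factor of 0.5
    while True:
        for seed in range(1, max_seeds):
            if len({slot(k, seed, size) for k in keys}) == len(keys):
                table = [None] * size
                for (s, b), nxt in edges.items():
                    k = s * 256 + b
                    table[slot(k, seed, size)] = (k, nxt)  # keep the key to verify lookups
                return seed, table
        size *= 2  # no collision-free seed found: enlarge the table and retry


def lookup(state, byte, seed, table):
    """Return the next state, or None if (state, byte) has no goto edge."""
    k = state * 256 + byte
    entry = table[slot(k, seed, len(table))]
    return entry[1] if entry is not None and entry[0] == k else None
```

With the patterns `["he", "she", "his", "hers"]`, the trie has 10 states, so a two-dimensional table needs 2,560 cells, while the hashed table needs only about twice as many slots as there are edges (9 here). In a complete Aho-Corasick matcher, a `None` result would trigger a failure-link transition.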

Proceedings Article

Effective text extraction and recognition for WWW images

Jun Sun¹, Zhulong Wang¹, Hao Yu¹, Fumihito Nishino¹, Yutaka Katsuyama¹, Satoshi Naoi¹

Fujitsu¹

20 Nov 2003

TL;DR: A novel stroke verification algorithm is used to effectively remove non-character strokes and build the binary text line image, which is segmented and recognized by dynamic programming.

Abstract: Images play a very important role in web content delivery. Many WWW images contain text information that can be used for web indexing and searching. A new text extraction and recognition algorithm is proposed in this paper. The character strokes in the image are first extracted by color clustering and connected component analysis. A novel stroke verification algorithm is used to effectively remove non-character strokes. The verified strokes are then used to build the binary text line image, which is segmented and recognized by dynamic programming. Since text in WWW images is usually closely related to the webpage content, approximate string matching is used to revise the recognition result by matching the content of the webpage with the content of the image. This effective post-processing step not only improves recognition performance but can also be used in other applications, such as establishing correspondence between images and webpage paragraphs.

20 citations
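
The abstract does not specify how the matching step is performed. One plausible word-level version, sketched here in Python under that assumption, replaces each recognized word with the closest webpage word by Levenshtein (edit) distance, provided the distance is at most a threshold. The threshold of 2 and the function names are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def revise(ocr_words, page_text, max_dist=2):
    """Hypothetical post-processing: snap OCR words to nearby webpage words."""
    vocab = sorted(set(page_text.lower().split()))
    revised = []
    for w in ocr_words:
        # Closest vocabulary word; ties are broken alphabetically.
        best = min(vocab, key=lambda v: (levenshtein(w.lower(), v), v), default=None)
        if best is not None and levenshtein(w.lower(), best) <= max_dist:
            revised.append(best)  # close enough: trust the webpage spelling
        else:
            revised.append(w)  # no nearby webpage word: keep the OCR output
    return revised
```

For example, `revise(["Approxlmate", "strlng", "matchinq", "zzzz"], "Approximate string matching on the web")` returns `["approximate", "string", "matching", "zzzz"]`: each misrecognized word is one substitution away from a page word, while `"zzzz"` has no close match and is left unchanged.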

Journal Article

A string pattern regression algorithm and its application to pattern discovery in long introns.

Hideo Bannai¹, Shunsuke Inenaga¹, Ayumi Shinohara¹, Masayuki Takeda¹, Satoru Miyano¹

University of Tokyo¹

01 Jan 2002-Genome Informatics

TL;DR: A new approach to pattern discovery called string pattern regression is presented, where a data set is given that consists of a string attribute and an objective numerical attribute, and an exact but efficient branch-and-bound algorithm is presented which is applicable to various pattern classes.

Abstract: We present a new approach to pattern discovery called string pattern regression, where we are given a data set that consists of a string attribute and an objective numerical attribute. The problem is to find the best string pattern that divides the data set in such a way that the distribution of the numerical attribute values of the set for which the pattern matches the string attribute is most distinct, with respect to some appropriate measure, from the distribution of the numerical attribute values of the set for which the pattern does not match the string attribute. By solving this problem, we are able to discover, at the same time, a subset of the data whose objective numerical attributes are significantly different from the rest of the data, as well as the splitting rule in the form of a string pattern that is conserved in the subset. Although the problem can be solved in linear time for the substring pattern class, it is NP-hard in the general case (i.e., for more complex patterns), and we present an exact but efficient branch-and-bound algorithm that is applicable to various pattern classes. We apply our algorithm to intron sequences of human, mouse, fly, and zebrafish, and show the practicality of our approach and algorithm. We also discuss possible extensions of our algorithm, as well as promising applications, such as microarray gene expression data.

20 citations
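
To make the problem statement concrete, the Python sketch below solves it by brute force for the substring pattern class: it enumerates every substring occurring in the data set and scores each split by the absolute difference of the two group means. The scoring measure is a hypothetical stand-in for the paper's "appropriate measure", and the exhaustive enumeration is neither the linear-time substring algorithm nor the branch-and-bound search the authors describe.

```python
from statistics import mean


def best_substring_pattern(data, min_len=1):
    """data: list of (string, value) pairs. Returns (score, pattern).

    Hypothetical brute-force search; the score is |mean(match) - mean(no match)|.
    """
    candidates = {s[i:j] for s, _ in data
                  for i in range(len(s))
                  for j in range(i + min_len, len(s) + 1)}
    best = (float("-inf"), None)
    for pat in sorted(candidates):  # sorted so ties resolve deterministically
        inside = [v for s, v in data if pat in s]
        outside = [v for s, v in data if pat not in s]
        if not inside or not outside:
            continue  # the pattern does not split the data set
        score = abs(mean(inside) - mean(outside))
        if score > best[0]:
            best = (score, pat)
    return best
```

On a toy set such as `[("ACGTTT", 10.0), ("GGTTTA", 9.0), ("ACGAAA", 1.0), ("CCCAAA", 2.0)]`, the best score is 8.0, achieved by any substring that separates the two high-valued strings from the two low-valued ones. The number of candidates grows quadratically with string length, which is why efficient algorithms matter for long introns.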

Metrics

Papers: 1,942

Citations: 64,998

No. of papers in the topic in previous years
Year	Papers
2023	8
2022	30
2021	32
2020	30
2019	48
2018	39
