Topic

Approximate string matching

About: Approximate string matching is a research topic. Over its lifetime, 1,903 publications have been published on this topic, receiving 62,352 citations. The topic is also known as: fuzzy string-searching algorithm & fuzzy string-matching algorithm.


Papers
Proceedings ArticleDOI
20 Sep 1999
TL;DR: The proposed method converts a two-dimensional image into a one-dimensional string and computes the edit distance by the modified approximate string matching algorithm and presents the details of applications in handwriting analysis and both online and offline character recognition.
Abstract: Given two character images, we would like to measure their similarity or difference. Such a similarity or difference measure facilitates the solution to character recognition and handwriting analysis problems. There is, however, no universal definition for similarity measure satisfying a wide range of characteristics such as the slant, deformation or other invariant constraints. For this reason, we propose a new definition for the character similarity measure. First, the proposed method converts a two-dimensional image into a one-dimensional string. Next, it computes the edit distance by the modified approximate string matching algorithm. We describe how to extract the string information and compute the distance and then present the details of applications in handwriting analysis and both online and offline character recognition.

27 citations
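
The entry above computes an edit distance between one-dimensional strings. As a rough illustration only, here is a minimal sketch of the textbook Levenshtein distance with unit costs. It is not the paper's modified algorithm, and the function name `edit_distance` and the example strings are assumptions made for this sketch.

```python
def edit_distance(a: str, b: str) -> int:
    """Textbook Levenshtein distance with unit-cost insert, delete, substitute.

    Illustrative only: the paper's modified matching algorithm is not
    reproduced here.
    """
    # prev[j] is the distance between the prefix of `a` processed so far
    # and the first j characters of `b`.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))
```

Keeping only two rows of the table reduces memory to O(len(b)) while the running time stays O(len(a) * len(b)).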

Proceedings ArticleDOI
01 Mar 2010
TL;DR: This paper proposes an off-line, data-driven, bottom-up approach that mines query logs for instances where Web content creators and Web users apply a variety of strings to refer to the same Web pages and generates an expanded set of equivalent strings for each entity.
Abstract: Recognizing the alternative ways people use to reference an entity is important for many Web applications that query structured data. In such applications, there is often a mismatch between how content creators describe entities and how different users try to retrieve them. In this paper, we consider the problem of determining whether a candidate query approximately matches an entity. We propose an off-line, data-driven, bottom-up approach that mines query logs for instances where Web content creators and Web users apply a variety of strings to refer to the same Web pages. This way, given a set of strings that reference entities, we generate an expanded set of equivalent strings for each entity. The proposed method is verified with experiments on real-life data sets showing that we can dramatically increase the queries that can be matched.

27 citations

Book ChapterDOI
27 Jun 2011
TL;DR: A simple observation about the locations of critical factorizations is used to derive a real-time variation of the Crochemore-Perrin constant-space string matching algorithm that has a simple and efficient control structure.
Abstract: We use a simple observation about the locations of critical factorizations to derive a real-time variation of the Crochemore-Perrin constant-space string matching algorithm. The real-time variation has a simple and efficient control structure.

27 citations

Journal Article
TL;DR: In this paper, a reconfigurable systolic architecture is presented for the efficient treatment of several dynamic programming methods for resolving well-known problems, such as global and local sequence alignment, approximate string matching and longest common subsequence.
Abstract: Reconfigurable systolic arrays can be adapted to efficiently resolve a wide spectrum of computational problems; parallelism is naturally explored in systolic arrays and reconfigurability allows for redefinition of the interconnections and operations even during run time (dynamically). We present a reconfigurable systolic architecture that can be applied for the efficient treatment of several dynamic programming methods for resolving well-known problems, such as global and local sequence alignment, approximate string matching and longest common subsequence. The dynamicity of the reconfigurability was found to be useful for practical applications in the construction of sequence alignments. A VHDL (VHSIC hardware description language) version of this new architecture was implemented on an APEX FPGA (field-programmable gate array). It would be several orders of magnitude faster than the software algorithm alternatives.

27 citations
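
The entry above lists approximate string matching among the dynamic programming problems its architecture targets. As a software-only illustration, here is a minimal sketch of the classic column-wise dynamic program for finding a pattern in a text with at most `k` errors. It says nothing about the systolic or FPGA design itself; the function name `approx_find` and the example inputs are assumptions made for this sketch.

```python
from typing import List, Tuple


def approx_find(pattern: str, text: str, k: int) -> List[Tuple[int, int]]:
    """Return (end_index, distance) for each text position where some
    substring ending there is within edit distance k of `pattern`."""
    m = len(pattern)
    # col[i] = smallest edit distance between pattern[:i] and any
    # substring of the text ending at the current position.
    col = list(range(m + 1))
    hits = []
    for j, tc in enumerate(text):
        prev_diag = col[0]  # cell (i-1, j-1) of the DP table
        # col[0] stays 0 because a match may start at any text position.
        for i in range(1, m + 1):
            old = col[i]  # cell (i, j-1)
            cost = 0 if pattern[i - 1] == tc else 1
            col[i] = min(old + 1,           # extra character in the text
                         col[i - 1] + 1,    # pattern character skipped
                         prev_diag + cost)  # match or substitution
            prev_diag = old
        if col[m] <= k:
            hits.append((j, col[m]))
    return hits


print(approx_find("abc", "xxabcxx", 0))
```

Each cell depends only on its left, upper, and upper-left neighbours, so all cells on one anti-diagonal can be computed at the same time; that independence is what makes this recurrence a natural fit for parallel hardware.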

Patent
Jeremy S. De Bonet
13 Jul 1999
TL;DR: This patent describes a lossless data compression system that employs an approximate string matching scheme: source data is characterized as pointers and associated blocks of residual data, where the residual data represents the difference between each value of an earlier occurring block of source data, whose location and length is identified by a pointer, and an equal-sized block of the source data associated with the pointer; the encoded data is then compressed using an entropy-based compression technique.
Abstract: A system and process for lossless data compression employing a unique approximate string matching scheme. The encoder of the system characterizes source data as a set of pointers and associated blocks of residual data. Each pointer identifies a location earlier in the source data, as well as the number of source data values associated with the identified location. The residual data represents the difference between each value of an earlier occurring block of source data, whose location and length is identified by a pointer, and an equal-sized block of source data associated with the pointer. The choice of a block of earlier occurring source data for use in forming a residual data block is based on a cost analysis which is designed to minimize the entropy of the differences between the previous block and the new block of source data to a desired degree. The encoded data, which will exhibit a significantly lower entropy, can be compressed effectively using an entropy-based compression technique. The decoder portion of the system operates by initially decompressing the encoded data. Next, the first data value is decoded by adding the first residual to a predetermined constant. Once the first data value has been decoded, subsequent data values are decoded by first finding the block in the previously decoded data indicated by a pointer, and then adding each data value in the block to its corresponding data element in the residual data block associated with the pointer. The process is repeated until all the data is decoded.

27 citations
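
The decoding steps in the abstract above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the entropy decompression stage is omitted, values are plain integers with no wrap-around, and the `(start, length, residuals)` block layout, the `decode` name, and the default constant of 0 are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical layout for one encoded unit: `start` indexes the
# already-decoded output, `length` is the block size, and `residuals`
# holds one difference per value. The abstract does not specify this.
Block = Tuple[int, int, List[int]]


def decode(first_residual: int, blocks: List[Block], constant: int = 0) -> List[int]:
    """Rebuild source values from pointers plus residual blocks."""
    # The first value is the first residual added to a predetermined constant.
    out = [constant + first_residual]
    for start, length, residuals in blocks:
        if len(residuals) != length or not 0 <= start < len(out):
            raise ValueError("malformed block")
        # Add each earlier value to its corresponding residual. Reading
        # element by element lets a block run into values decoded in this
        # same loop (an assumption; the abstract does not say whether
        # overlapping blocks are allowed).
        for offset in range(length):
            out.append(out[start + offset] + residuals[offset])
    return out


print(decode(5, [(0, 1, [1]), (0, 2, [0, 0])]))
```

When an earlier block closely resembles the new one, the residuals cluster near zero, which is what gives the encoded stream its lower entropy.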


Network Information
Related Topics (5)
Server: 79.5K papers, 1.4M citations, 81% related
Cluster analysis: 146.5K papers, 2.9M citations, 80% related
Scheduling (computing): 78.6K papers, 1.3M citations, 79% related
Network packet: 159.7K papers, 2.2M citations, 78% related
Optimization problem: 96.4K papers, 2.1M citations, 78% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    8
2022    30
2021    32
2020    30
2019    48
2018    39