Topic
Approximate string matching
About: Approximate string matching is a research topic. Over its lifetime, 1903 publications have appeared within this topic, receiving 62352 citations. The topic is also known as fuzzy string-searching algorithm and fuzzy string-matching algorithm.
Papers
20 Sep 1999 TL;DR: The proposed method converts a two-dimensional image into a one-dimensional string and computes the edit distance with a modified approximate string matching algorithm; the paper details applications in handwriting analysis and in both online and offline character recognition.
Abstract: Given two character images, we would like to measure their similarity or difference. Such a similarity or difference measure facilitates the solution to character recognition and handwriting analysis problems. There is, however, no universal definition for similarity measure satisfying a wide range of characteristics such as the slant, deformation or other invariant constraints. For this reason, we propose a new definition for the character similarity measure. First, the proposed method converts a two-dimensional image into a one-dimensional string. Next, it computes the edit distance by the modified approximate string matching algorithm. We describe how to extract the string information and compute the distance and then present the details of applications in handwriting analysis and both online and offline character recognition.
27 citations
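The edit-distance step mentioned in the abstract above can be illustrated with a minimal sketch. This is the standard unit-cost Levenshtein distance, not the paper's modified algorithm, whose details are not given here. The digit strings in the usage note are hypothetical stand-ins for whatever one-dimensional encoding is extracted from a character image.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance between two strings.

    Assumption: every insertion, deletion and substitution costs 1.
    The paper's modified cost scheme is not reproduced here.
    """
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]
```

For example, `edit_distance("00112233", "0011223")` compares two hypothetical stroke-direction strings that differ by one trailing symbol. Only two rows of the dynamic programming table are kept, so memory grows with the length of the second string rather than with the product of both lengths.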
01 Mar 2010 TL;DR: This paper proposes an off-line, data-driven, bottom-up approach that mines query logs for instances where Web content creators and Web users apply a variety of strings to refer to the same Web pages and generates an expanded set of equivalent strings for each entity.
Abstract: Recognizing the alternative ways people reference an entity is important for many Web applications that query structured data. In such applications, there is often a mismatch between how content creators describe entities and how different users try to retrieve them. In this paper, we consider the problem of determining whether a candidate query approximately matches an entity. We propose an off-line, data-driven, bottom-up approach that mines query logs for instances where Web content creators and Web users apply a variety of strings to refer to the same Web pages. This way, given a set of strings that reference entities, we generate an expanded set of equivalent strings for each entity. The proposed method is verified with experiments on real-life data sets showing that we can dramatically increase the number of queries that can be matched.
27 citations
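The idea in the abstract above, that different strings leading to the same Web pages are candidates for equivalence, can be sketched roughly as follows. This is an illustrative simplification, not the paper's method: the function name `expand_equivalents`, the `(query, url)` click-log layout, the `entity_pages` mapping and the `min_clicks` threshold are all assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def expand_equivalents(
    entity_pages: Dict[str, Set[str]],
    click_log: Iterable[Tuple[str, str]],
    min_clicks: int = 2,
) -> Dict[str, Set[str]]:
    """Collect query strings that repeatedly lead to an entity's pages.

    entity_pages: entity name -> URLs known to describe it (hypothetical input).
    click_log: (query, clicked_url) pairs (hypothetical log layout).
    min_clicks: assumed noise threshold; rarer pairs are ignored.
    """
    # Count how often each normalized query led to each URL.
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for query, url in click_log:
        counts[(query.strip().lower(), url)] += 1

    expanded: Dict[str, Set[str]] = {entity: set() for entity in entity_pages}
    for (query, url), n in counts.items():
        if n < min_clicks:
            continue
        for entity, pages in entity_pages.items():
            # Skip the entity's own canonical string.
            if url in pages and query != entity.lower():
                expanded[entity].add(query)
    return expanded
```

A query that reaches an entity's page only once is dropped here as likely noise. A real system would need a more careful evidence model than a fixed click threshold.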
27 Jun 2011 TL;DR: A simple observation about the locations of critical factorizations is used to derive a real-time variation of the Crochemore-Perrin constant-space string matching algorithm that has a simple and efficient control structure.
Abstract: We use a simple observation about the locations of critical factorizations to derive a real-time variation of the Crochemore-Perrin constant-space string matching algorithm. The real-time variation has a simple and efficient control structure.
27 citations
TL;DR: In this paper, a reconfigurable systolic architecture is presented for the efficient treatment of several dynamic programming methods for resolving well-known problems, such as global and local sequence alignment, approximate string matching and longest common subsequence.
Abstract: Reconfigurable systolic arrays can be adapted to efficiently resolve a wide spectrum of computational problems; parallelism is naturally explored in systolic arrays and reconfigurability allows for redefinition of the interconnections and operations even during run time (dynamically). We present a reconfigurable systolic architecture that can be applied for the efficient treatment of several dynamic programming methods for resolving well-known problems, such as global and local sequence alignment, approximate string matching and longest common subsequence. The dynamicity of the reconfigurability was found to be useful for practical applications in the construction of sequence alignments. A VHDL (VHSIC hardware description language) version of this new architecture was implemented on an APEX FPGA (field-programmable gate array). It would be several orders of magnitude faster than the software algorithm alternatives.
27 citations
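The parallelism that systolic arrays exploit in these dynamic programming problems comes from the dependency pattern of the table: every cell on one anti-diagonal depends only on the two previous anti-diagonals. The sketch below shows this wavefront ordering for longest common subsequence in plain sequential Python. It is a software illustration under that assumption, not the VHDL architecture described in the abstract above, and the function name is hypothetical.

```python
def lcs_length_wavefront(a: str, b: str) -> int:
    """Length of the longest common subsequence, filled by anti-diagonals.

    All cells with the same i + j are independent of each other, so a
    systolic array could compute each anti-diagonal in one parallel step.
    Here the inner loop simply visits them one by one.
    """
    n, m = len(a), len(b)
    # table[i][j] = LCS length of a[:i] and b[:j]; row 0 and column 0 stay 0.
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for d in range(2, n + m + 1):  # d = i + j indexes the anti-diagonal
        lo = max(1, d - m)  # keeps j = d - i at most m
        hi = min(n, d - 1)  # keeps j = d - i at least 1
        for i in range(lo, hi + 1):
            j = d - i
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]
```

The same traversal order applies to edit distance and to global or local alignment, since their recurrences read the same three neighbouring cells.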
13 Jul 1999 TL;DR: An approximate string matching scheme is proposed for lossless data compression in combination with an entropy-based compression technique; the residual data represents the difference between each value of an earlier occurring block of source data, whose location and length are identified by a pointer, and an equal-sized block of the source data associated with the pointer.
Abstract: A system and process for lossless data compression employing a unique approximate string matching scheme. The encoder of the system characterizes source data as a set of pointers and associated blocks of residual data. Each pointer identifies a location earlier in the source data, as well as the number of source data values associated with the identified location. The residual data represents the difference between each value of an earlier occurring block of source data, whose location and length are identified by a pointer, and an equal-sized block of source data associated with the pointer. The choice of a block of earlier occurring source data for use in forming a residual data block is based on a cost analysis which is designed to minimize the entropy of the differences between the previous block and the new block of source data to a desired degree. The encoded data, which will exhibit a significantly lower entropy, can be compressed effectively using an entropy-based compression technique. The decoder portion of the system operates by initially decompressing the encoded data. Next, the first data value is decoded by adding the first residual to a predetermined constant. Once the first data value has been decoded, subsequent data values are decoded by first finding the block in the previously decoded data indicated by a pointer, and then adding each data value in the block to its corresponding data element in the residual data block associated with the pointer. The process is repeated until all the data is decoded.
27 citations
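The decoding steps described in the abstract above can be sketched as follows. The entropy decompression stage is omitted, and the in-memory layout is an assumption made for this example: each block is a `(start, length, residuals)` tuple with `start` as an absolute index into the already decoded data, and `constant` stands for the predetermined constant.

```python
from typing import List, Tuple

# Hypothetical layout: (start index in decoded data, block length, residuals).
Block = Tuple[int, int, List[int]]


def decode(first_residual: int, blocks: List[Block], constant: int = 0) -> List[int]:
    """Rebuild source values from pointers and residual blocks."""
    # The first value is the predetermined constant plus the first residual.
    out = [constant + first_residual]
    for start, length, residuals in blocks:
        # Assumption: a pointer may only reference fully decoded data.
        if len(residuals) != length or start < 0 or start + length > len(out):
            raise ValueError("pointer must reference already decoded data")
        reference = out[start:start + length]
        # Each new value is the referenced earlier value plus its residual.
        out.extend(r + d for r, d in zip(reference, residuals))
    return out
```

For instance, `decode(10, [(0, 1, [1]), (0, 2, [2, 2]), (2, 2, [0, -1])])` rebuilds a six-value sequence from one starting residual and three pointer blocks. When the referenced block is close to the new block, the residuals cluster near zero, which is what makes the later entropy coding effective.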