Topic

Approximate string matching

About: Approximate string matching is a research topic. Over its lifetime, 1,903 publications have been published within this topic, receiving 62,352 citations. The topic is also known as fuzzy string-searching algorithm and fuzzy string-matching algorithm.

Papers

Proceedings Article

Approximate stroke sequence string matching algorithm for character recognition and analysis

Sung-Hyuk Cha¹, Yong-Chul Shin², Sargur N. Srihari²

State University of New York System¹, University at Buffalo²

20 Sep 1999

TL;DR: The proposed method converts a two-dimensional image into a one-dimensional string and computes the edit distance with a modified approximate string matching algorithm; the paper also details applications in handwriting analysis and in both online and offline character recognition.

Abstract: Given two character images, we would like to measure their similarity or difference. Such a similarity or difference measure facilitates the solution to character recognition and handwriting analysis problems. There is, however, no universal definition for similarity measure satisfying a wide range of characteristics such as the slant, deformation or other invariant constraints. For this reason, we propose a new definition for the character similarity measure. First, the proposed method converts a two-dimensional image into a one-dimensional string. Next, it computes the edit distance by the modified approximate string matching algorithm. We describe how to extract the string information and compute the distance and then present the details of applications in handwriting analysis and both online and offline character recognition.
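
As a rough illustration of the distance step, the sketch below computes the classic Levenshtein edit distance between two strings. It is not the paper's modified algorithm, which the abstract does not spell out, and the direction-code strings in the usage note are hypothetical stand-ins for stroke sequences extracted from an image.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance with unit costs (an assumption here)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # delete ca
                           cur[j - 1] + 1,       # insert cb
                           prev[j - 1] + cost))  # match or substitute
        prev = cur
    return prev[-1]
```

For example, with made-up stroke codes (R = right, D = down, L = left), `edit_distance("RRDDL", "RRDL")` is 1, since the two sequences differ by a single stroke.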

27 citations

Proceedings Article

Fuzzy matching of Web queries to structured data

Tao Cheng¹, Hady W. Lauw², Stelios Paparizos²

University of Illinois at Urbana–Champaign¹, Microsoft²

01 Mar 2010

TL;DR: This paper proposes an off-line, data-driven, bottom-up approach that mines query logs for instances where Web content creators and Web users apply a variety of strings to refer to the same Web pages and generates an expanded set of equivalent strings for each entity.

Abstract: Recognizing the alternative ways people use to reference an entity is important for many Web applications that query structured data. In such applications, there is often a mismatch between how content creators describe entities and how different users try to retrieve them. In this paper, we consider the problem of determining whether a candidate query approximately matches an entity. We propose an off-line, data-driven, bottom-up approach that mines query logs for instances where Web content creators and Web users apply a variety of strings to refer to the same Web pages. This way, given a set of strings that reference entities, we generate an expanded set of equivalent strings for each entity. The proposed method is verified with experiments on real-life data sets showing that we can dramatically increase the queries that can be matched.
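
The general idea of treating strings that lead to the same Web page as equivalent can be sketched as follows. This is a simplified illustration, not the paper's method: the function name, the input shapes (an entity-to-URL map and a list of query/click pairs), and the `min_clicks` threshold are all assumptions made for the example.

```python
from collections import defaultdict

def expand_entity_strings(entity_urls, query_clicks, min_clicks=2):
    """Collect queries that repeatedly led to an entity's page.

    entity_urls:  dict mapping an entity string to the URL describing it
    query_clicks: iterable of (query, clicked_url) pairs from a query log
    min_clicks:   hypothetical noise threshold, not taken from the paper
    """
    # Count how often each normalized query led to each URL.
    counts = defaultdict(lambda: defaultdict(int))
    for query, url in query_clicks:
        counts[url][query.strip().lower()] += 1

    expanded = {}
    for entity, url in entity_urls.items():
        # Keep only queries seen at least min_clicks times for this page.
        strings = {q for q, n in counts.get(url, {}).items() if n >= min_clicks}
        strings.add(entity.lower())  # the original string always counts
        expanded[entity] = strings
    return expanded
```

A candidate query could then be matched against each entity's expanded set instead of only its original string.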

27 citations

Book Chapter

Simple real-time constant-space string matching

Dany Breslauer¹, Roberto Grossi², Filippo Mignosi

University of Haifa¹, University of Pisa²

27 Jun 2011

TL;DR: A simple observation about the locations of critical factorizations is used to derive a real-time variation of the Crochemore-Perrin constant-space string matching algorithm that has a simple and efficient control structure.

Abstract: We use a simple observation about the locations of critical factorizations to derive a real-time variation of the Crochemore-Perrin constant-space string matching algorithm. The real-time variation has a simple and efficient control structure.

27 citations

Journal Article

Reconfigurable systems for sequence alignment and for general dynamic programming

Ricardo P. Jacobi¹, Mauricio Ayala-Rincón, Luis G. A. Carvalho, Carlos H. Llanos, Reiner W. Hartenstein (+1 more)

University of Brasília¹

30 Sep 2005, Genetics and Molecular Research

TL;DR: In this paper, a reconfigurable systolic architecture is presented for the efficient treatment of several dynamic programming methods for resolving well-known problems, such as global and local sequence alignment, approximate string matching and longest common subsequence.

Abstract: Reconfigurable systolic arrays can be adapted to efficiently resolve a wide spectrum of computational problems; parallelism is naturally explored in systolic arrays and reconfigurability allows for redefinition of the interconnections and operations even during run time (dynamically). We present a reconfigurable systolic architecture that can be applied for the efficient treatment of several dynamic programming methods for resolving well-known problems, such as global and local sequence alignment, approximate string matching and longest common subsequence. The dynamicity of the reconfigurability was found to be useful for practical applications in the construction of sequence alignments. A VHDL (VHSIC hardware description language) version of this new architecture was implemented on an APEX FPGA (Field programmable gate array). It would be several magnitudes faster than the software algorithm alternatives.
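
For orientation, a plain software version of the dynamic programming recurrence for approximate string matching might look like the sketch below. It is a sequential illustration only and says nothing about the paper's hardware design; the function name and unit edit costs are assumptions. Cells on the same anti-diagonal of the table do not depend on one another, which is the kind of parallelism a systolic array can exploit.

```python
def approx_find(pattern: str, text: str, k: int) -> list[int]:
    """Return 1-based end positions in text where some substring
    matches pattern with at most k edits (unit costs assumed)."""
    m = len(pattern)
    # col[i] = best distance between pattern[:i] and a substring of the
    # text ending at the current position; initially no text is consumed.
    col = list(range(m + 1))
    hits = []
    for j, c in enumerate(text, start=1):
        prev_diag = col[0]
        col[0] = 0  # a match may start at any text position
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == c else 1
            cur = min(col[i] + 1,        # skip the text character
                      col[i - 1] + 1,    # skip the pattern character
                      prev_diag + cost)  # match or substitute
            prev_diag = col[i]
            col[i] = cur
        if col[m] <= k:
            hits.append(j)
    return hits
```

Setting the top row to zero is what turns the global edit distance table into a search: the pattern is allowed to start anywhere in the text at no cost.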

27 citations

Patent

Approximate string matching system and process for lossless data compression

Jeremy S. De Bonet¹

Microsoft¹

13 Jul 1999

TL;DR: This patent describes a lossless data compression system that combines an approximate string matching scheme with an entropy-based compression technique. The residual data represents the difference between each value of an earlier occurring block of source data, whose location and length are identified by a pointer, and an equal-sized block of the source data associated with the pointer.

Abstract: A system and process for lossless data compression employing a unique approximate string matching scheme. The encoder of the system characterizes source data as a set of pointers and associated blocks of residual data. Each pointer identifies a location earlier in the source data, as well as the number of source data values associated with the identified location. The residual data represents the difference between each value of an earlier occurring block of source data, whose location and length is identified by a pointer, and an equal-sized block of source data associated with the pointer. The choice of a block of earlier occurring source data for use in forming a residual data block is based on a cost analysis which is designed to minimize the entropy of the differences between the previous block and the new block of source data to a desired degree. The encoded data, which will exhibit a significantly lower entropy, can be compressed effectively using an entropy-based compression technique. The decoder portion of the system operates by initially decompressing the encoded data. Next, the first data value is decoded by adding the first residual to a predetermined constant. Once the first data value has been decoded, subsequent data values are decoded by first finding the block in the previously decoded data indicated by a pointer, and then adding each data value in the block to its corresponding data element in the residual data block associated with the pointer. The process is repeated until all the data is decoded.
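
The pointer-plus-residual representation described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the patented process: the fixed block size, the sum of absolute residuals used as a stand-in for the entropy-based cost analysis, and the function names are all hypothetical, and the final entropy coding stage is omitted.

```python
def encode(data, block=4, constant=0):
    """Return (first_residual, [(start, length, residuals), ...])."""
    if not data:
        raise ValueError("data must be non-empty")
    first_residual = data[0] - constant
    blocks = []
    pos = 1
    while pos < len(data):
        # The block cannot be longer than the data already seen.
        n = min(block, len(data) - pos, pos)
        target = data[pos:pos + n]
        best = None
        for start in range(pos - n + 1):  # every earlier block of length n
            residual = [t - r for t, r in zip(target, data[start:start + n])]
            cost = sum(abs(x) for x in residual)  # crude proxy for entropy
            if best is None or cost < best[0]:
                best = (cost, start, residual)
        _, start, residual = best
        blocks.append((start, n, residual))
        pos += n
    return first_residual, blocks

def decode(first_residual, blocks, constant=0):
    """Rebuild the data by adding residuals to earlier decoded blocks."""
    data = [constant + first_residual]
    for start, length, residual in blocks:
        reference = data[start:start + length]
        data.extend(r + d for r, d in zip(reference, residual))
    return data
```

When the data repeats with small variations, most residuals are zero or near zero, which is what would make a subsequent entropy coder effective.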

27 citations

Metrics

Papers: 1,942
Citations: 64,998

No. of papers in the topic in previous years
Year	Papers
2023	8
2022	30
2021	32
2020	30
2019	48
2018	39
