SciSpace (formerly Typeset)
Topic

Approximate string matching

About: Approximate string matching is a research topic. Over its lifetime, 1,903 publications have appeared within this topic, receiving 62,352 citations. The topic is also known as fuzzy string-searching algorithm and fuzzy string-matching algorithm.


Papers
01 Jan 1988
TL;DR: An algorithm that runs in \(O(\log m \log n)\) time and uses \(mn\) processors on a CRCW PRAM, where \(m\) and \(n\) are the lengths of the strings; the problem of finding the largest common submatrix of two matrices is also considered and shown to be NP-hard.
Abstract: We consider the problem of determining in parallel the cost of converting a source string to a destination string by a sequence of insert, delete and transform operations. Each operation has an integer cost in some fixed range. We present an algorithm that runs in \(O(\log m \log n)\) time and uses \(mn\) processors on a CRCW PRAM, where \(m\) and \(n\) are the lengths of the strings. The best known sequential algorithm [MP83] runs in time \(O(n^2/\log n)\) for strings of length \(n\), indicating that our parallel algorithm (with time-processor product equal to \(O(mn \log m \log n)\)) is nearly optimal. An instance of the edit distance problem is represented as a graph. The algorithm finds the shortest path in the graph using a path-doubling method with efficient pruning due to the structure of the problem. Extensions of the algorithm solve approximate string matching and local best-fit problems. The problem of finding the largest common submatrix of two matrices is considered and shown to be NP-hard. Finally, we present an algorithm for exact two-dimensional pattern matching that runs in \(O(\log n)\) time using \(n\) processors for an \(n \times n\) search matrix.
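For reference, the sequential problem this paper parallelizes can be written as the textbook quadratic dynamic program (Wagner–Fischer style). Each table cell corresponds to a vertex of a grid graph, and each insert, delete or transform operation to an edge, so the final cell holds the shortest-path cost from corner to corner. This is a minimal sketch, not the paper's path-doubling PRAM algorithm and not the [MP83] speed-up; the function name and the default unit costs are assumptions for illustration.

```python
def edit_distance(source: str, target: str,
                  ins_cost: int = 1, del_cost: int = 1, sub_cost: int = 1) -> int:
    """Minimum total cost of turning source into target using insert,
    delete and transform (substitute) operations; equal characters cost 0."""
    m, n = len(source), len(target)
    # dist[i][j] = cost of converting source[:i] into target[:j]
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dist[i][0] = i * del_cost          # delete every source character
    for j in range(1, n + 1):
        dist[0][j] = j * ins_cost          # insert every target character
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + del_cost,                        # delete source[i-1]
                dist[i][j - 1] + ins_cost,                        # insert target[j-1]
                dist[i - 1][j - 1] + (0 if same else sub_cost),   # keep or transform
            )
    return dist[m][n]


print(edit_distance("kitten", "sitting"))        # 3
print(edit_distance("abc", "abd", sub_cost=3))   # 2 (delete 'c', insert 'd')
```

The second call shows why the operation costs matter: when a transform is expensive, the shortest path takes a delete edge plus an insert edge instead.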

20 citations

Proceedings Article
17 Oct 2008
TL;DR: This paper develops a methodology to detect frequently occurring hot-spots in pre-OPC as well as post-OPC designs and to separate them from the rest of the design, which provides the opportunity to treat them differently early in the OPC flow.
Abstract: Foundry companies encounter the same or similar lithography-unfriendly patterns (hot-spots) again and again in different designs, both within the same technology node and across different technology nodes; these patterns escape design rule check (DRC) but are detected again and again in the OPC verification step. Since a model-based OPC tool applies OPC on a whole-chip design basis, individual hot-spot patterns are treated the same as the rest of the design patterns, regardless of their severity. We have developed a methodology to detect frequently occurring hot-spots in pre-OPC as well as post-OPC designs and to separate them from the rest of the design, which provides the opportunity to treat them differently early in the OPC flow. The methodology combines rule-based and pattern-based detection algorithms. Some hot-spot patterns can be detected using a rule-based algorithm, which offers the flexibility of detecting similar patterns within predefined ranges. However, not all patterns can be detected (or defined) by rules. Thus, a pattern-based approach is developed using a defect pattern library concept. Hot-spot patterns in GDS/OASIS format can be saved into a defect pattern library. A fast pattern-matching algorithm is used to detect hot-spot patterns in a design, using the library as a pattern template database. Even though the pattern-matching approach lacks the flexibility to detect similar patterns, it can detect any pattern as long as a template exists. The pattern-matching algorithm can perform either an exact match or a fuzzy match. The rule-based and pattern-based hot-spot detection algorithms complement each other and offer both speed and flexibility in hot-spot pattern detection in pre-OPC and post-OPC designs. In this paper, we will demonstrate the methodology in our OPC flow and the benefits of applying it in a production environment for 90nm designs. After hot-spot pattern detection, examples of special treatment applied to selected hot-spot patterns will be shown.
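The abstract does not describe its fast pattern-matching algorithm, so the following is only a brute-force illustration of the library-driven idea, under loudly stated assumptions: layout and templates are pre-rasterized into hypothetical 0/1 grids (no GDS/OASIS parsing), and "fuzzy" is modeled as a budget of mismatched cells, with a budget of 0 meaning exact match. The names `match_template`, `scan_library` and `L_corner` are hypothetical.

```python
from typing import Dict, List, Tuple

Grid = List[List[int]]  # hypothetical raster: 1 = drawn polygon, 0 = empty


def match_template(layout: Grid, template: Grid,
                   max_mismatch: int = 0) -> List[Tuple[int, int]]:
    """Return the top-left (row, col) of every window of `layout` that differs
    from `template` in at most `max_mismatch` cells (0 = exact match)."""
    rows, cols = len(layout), len(layout[0])
    h, w = len(template), len(template[0])
    hits = []
    for r in range(rows - h + 1):
        for c in range(cols - w + 1):
            # Count cells where the window and the template disagree.
            mismatches = sum(
                layout[r + dr][c + dc] != template[dr][dc]
                for dr in range(h)
                for dc in range(w)
            )
            if mismatches <= max_mismatch:
                hits.append((r, c))
    return hits


def scan_library(layout: Grid, library: Dict[str, Grid],
                 max_mismatch: int = 0) -> Dict[str, List[Tuple[int, int]]]:
    """Check every template in a (hypothetical) defect pattern library."""
    return {name: match_template(layout, tpl, max_mismatch)
            for name, tpl in library.items()}


layout = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
library = {"L_corner": [[1, 1], [1, 0]]}
print(scan_library(layout, library))                  # {'L_corner': [(1, 1)]}
print(scan_library(layout, library, max_mismatch=1))  # {'L_corner': [(1, 1), (3, 3)]}
```

Raising the mismatch budget from 0 to 1 picks up the solid block at (3, 3), which differs from the template in one cell: a crude stand-in for the trade-off the abstract describes between exact matching and similarity-tolerant detection.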

20 citations

Book Chapter
02 Mar 2015
TL;DR: A new algorithm for approximate circular string matching under the edit distance model with optimal average-case search time \(\mathcal {O}(n(k + \log m) /m)\).
Abstract: Approximate string matching is the problem of finding all factors of a text \(t\) of length \(n\) that are at a distance at most \(k\) from a pattern \(x\) of length \(m\). Approximate circular string matching is the problem of finding all factors of \(t\) that are at a distance at most \(k\) from \(x\) or from any of its rotations. In this article, we present a new algorithm for approximate circular string matching under the edit distance model with optimal average-case search time \(\mathcal {O}(n(k + \log m) /m)\). Optimal average-case search time can also be achieved by the algorithms for multiple approximate string matching (Fredriksson and Navarro, 2004) using \(x\) and its rotations as the set of multiple patterns. Here we reduce the preprocessing time and space requirements compared to that approach.
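Both definitions above can be made concrete with a naive baseline: Sellers-style dynamic programming reports every end position in \(t\) where some factor is within edit distance \(k\) of the pattern, and the circular variant simply reruns it for each rotation of \(x\). This sketch costs \(O(m \cdot mn)\) time in the worst case. It is neither the chapter's optimal average-case algorithm nor the Fredriksson–Navarro filter, only the brute-force definition made executable. Reporting end positions rather than whole factors is an assumption of this sketch, and the function names are hypothetical.

```python
from typing import List, Set


def approx_match_ends(text: str, pattern: str, k: int) -> Set[int]:
    """Sellers-style DP: return every end position j such that some factor
    text[i:j] is within edit distance k of pattern."""
    m = len(pattern)
    # col[i] = least edit distance between pattern[:i] and a factor of text
    # ending at the current position (one column of the DP matrix).
    col = list(range(m + 1))
    ends = {0} if col[m] <= k else set()
    for j, ch in enumerate(text, start=1):
        diag = col[0]  # previous column's col[i-1]
        col[0] = 0     # a factor may start anywhere, so row 0 is always 0
        for i in range(1, m + 1):
            best = min(
                col[i] + 1,                         # extra text character
                col[i - 1] + 1,                     # missing pattern character
                diag + int(pattern[i - 1] != ch),   # match or substitute
            )
            diag = col[i]
            col[i] = best
        if col[m] <= k:
            ends.add(j)
    return ends


def circular_approx_match_ends(text: str, pattern: str, k: int) -> List[int]:
    """Naive baseline: run the matcher once per distinct rotation of pattern."""
    rotations = {pattern[r:] + pattern[:r] for r in range(len(pattern))} or {pattern}
    ends: Set[int] = set()
    for rot in rotations:
        ends |= approx_match_ends(text, rot, k)
    return sorted(ends)


print(approx_match_ends("xxcabyy", "abc", 0))           # set()
print(circular_approx_match_ends("xxcabyy", "abc", 0))  # [5]
```

The rotation "cab" occurs exactly as `text[2:5]`, so only the circular search reports it. Treating the rotations as a set of patterns mirrors the multiple-pattern reduction mentioned in the abstract, without any of the filtering that makes that approach fast on average.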

19 citations

Patent
Li Yiqiang, Ma Guoyao, Cai Jun, Sun Yongtao, Xiao Hua 
19 Nov 2014
TL;DR: This patent presents a mapping processing system and method for solving the problem of standard code control of medical data; the system comprises a resource word bank, a target value range bank, a simulation semantic word segmentation unit, a fuzzy matching unit, an accurate matching unit, a matching table and a labor management unit.
Abstract: The invention provides a mapping processing system and method for solving the problem of standard code control of medical data. The mapping processing system comprises a resource word bank, a target value range bank, a simulation semantic word segmentation unit, a fuzzy matching unit, an accurate matching unit, a matching table and a labor management unit. The mapping processing method includes the steps of acquiring data from the medical data; mapping the acquired data by exact matching according to the standard codes stored in the target value range bank or the mapping matching rule information stored in the matching table; analyzing the fuzzy semantics of the data acquired from a data source; subjecting the simulation semantic word segmentation results to fuzzy matching to generate a mapping matching result; and mapping the medical data into the standard codes according to the mapping matching rule information in the matching table and generating a medical data mapping processing result. The mapping processing system and method build an automatic mapping matching process, and high accuracy of the mapping matching results is achieved by combining fuzzy matching, manual checking, machine training and the like.
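The cascade described in the abstract (stored rules and exact matching first, fuzzy matching next, human review as the fallback) can be sketched as follows. Every specific here is an assumption, not the patent's method. The vocabulary, the placeholder codes `C001`/`C002`, the 0.8 threshold and the function name are hypothetical. Word segmentation is skipped entirely, and Python's `difflib.SequenceMatcher.ratio()` stands in for whatever similarity measure the fuzzy matching unit actually uses.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Tuple

# Hypothetical vocabulary: normalized term -> standard code (placeholder codes).
STANDARD_CODES: Dict[str, str] = {
    "type 2 diabetes mellitus": "C001",
    "essential hypertension": "C002",
}
# Hypothetical matching table: previously confirmed term -> code rules.
MATCHING_TABLE: Dict[str, str] = {"htn": "C002"}


def map_term(term: str, codes: Dict[str, str], table: Dict[str, str],
             threshold: float = 0.8) -> Tuple[Optional[str], str]:
    """Map a raw term to a standard code and report which step produced it."""
    key = " ".join(term.lower().split())  # normalize case and whitespace
    if key in table:                      # 1. stored mapping rule
        return table[key], "table"
    if key in codes:                      # 2. exact match on the vocabulary
        return codes[key], "exact"
    # 3. fuzzy match: closest vocabulary entry by difflib similarity ratio
    best_name, best_score = None, 0.0
    for name in codes:
        score = SequenceMatcher(None, key, name).ratio()
        if score > best_score:
            best_name, best_score = name, score
    if best_name is not None and best_score >= threshold:
        return codes[best_name], "fuzzy"
    return None, "manual"                 # 4. leave it for human review


for raw in ["Essential Hypertension", "essential hypertenson", "HTN", "fracture"]:
    print(raw, "->", map_term(raw, STANDARD_CODES, MATCHING_TABLE))
```

Returning the step name alongside the code is a design choice for this sketch: it makes it easy to route low-confidence results (here, the "manual" ones) to a reviewer, and confirmed reviews could then be added to the matching table.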

19 citations

Patent
23 Jun 2009
TL;DR: This patent describes techniques for error-tolerant autocompletion: characters of an input string are displayed as the user enters them, and whenever the user adds a character, matching strings are selected from a set of candidate strings by determining which candidates have a prefix that matches the input string within a given edit distance.
Abstract: Techniques for error-tolerant autocompletion are described. While the characters of an input string are displayed as a user enters them, each time the user adds a character, matching strings may be selected from a set of candidate strings by determining which candidates have a prefix whose characters match the characters of the input string within a given edit distance.
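One straightforward way to realize "a prefix within a given edit distance, re-evaluated whenever a character is added" is to keep, per candidate, the last row of the edit-distance table between the typed text and every prefix of the candidate. Each keystroke then adds exactly one row, and the candidate matches if the row's minimum is within the threshold. The patent abstract does not say how it computes this. The class name, the linear scan over candidates (a real system would more likely share work through a trie) and the `max_dist` parameter are assumptions of this sketch.

```python
from typing import Dict, List


class ErrorTolerantCompleter:
    """Hypothetical helper: suggests candidates that have some prefix within
    max_dist edits of the text typed so far (the prefix edit distance)."""

    def __init__(self, candidates: List[str], max_dist: int) -> None:
        self.max_dist = max_dist
        self.typed = ""
        # rows[cand][j] = edit distance between the typed text and cand[:j].
        # With nothing typed yet, reaching cand[:j] costs j insertions.
        self.rows: Dict[str, List[int]] = {
            cand: list(range(len(cand) + 1)) for cand in candidates
        }

    def add_char(self, ch: str) -> List[str]:
        """Extend every candidate's DP table by one row for the new character."""
        self.typed += ch
        i = len(self.typed)
        for cand, prev in list(self.rows.items()):
            row = [i] + [0] * len(cand)  # typed text vs empty prefix: i deletions
            for j in range(1, len(cand) + 1):
                row[j] = min(
                    prev[j] + 1,                           # drop the typed char
                    row[j - 1] + 1,                        # skip cand[j-1]
                    prev[j - 1] + int(cand[j - 1] != ch),  # match or substitute
                )
            self.rows[cand] = row
        return self.matches()

    def matches(self) -> List[str]:
        # A candidate matches if its best prefix is within max_dist edits.
        return [c for c, row in self.rows.items() if min(row) <= self.max_dist]


completer = ErrorTolerantCompleter(["apple", "application", "banana"], max_dist=1)
for ch in "apl":
    suggestions = completer.add_char(ch)
print(suggestions)  # ['apple', 'application']
```

Typing "apl" still suggests "apple" and "application" because their prefix "appl" is one insertion away, while "banana" is dropped. Each keystroke costs time proportional to the total length of the candidates.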

19 citations


Network Information
Related Topics (5)
Server
79.5K papers, 1.4M citations
81% related
Cluster analysis
146.5K papers, 2.9M citations
80% related
Scheduling (computing)
78.6K papers, 1.3M citations
79% related
Network packet
159.7K papers, 2.2M citations
78% related
Optimization problem
96.4K papers, 2.1M citations
78% related
Performance Metrics
No. of papers in the topic in previous years
Year  Papers
2023  8
2022  30
2021  32
2020  30
2019  48
2018  39