Journal ArticleDOI

Fast Pattern Matching in Strings

01 Jun 1977 - SIAM Journal on Computing (Society for Industrial and Applied Mathematics) - Vol. 6, Iss. 2, pp. 323-350
TL;DR: An algorithm is presented which finds all occurrences of one given string within another, in running time proportional to the sum of the lengths of the strings. A theoretical application shows that the set of concatenations of even palindromes, i.e., the language $\{\alpha \alpha ^R\}^*$, can be recognized in linear time.
Abstract: An algorithm is presented which finds all occurrences of one given string within another, in running time proportional to the sum of the lengths of the strings. The constant of proportionality is low enough to make this algorithm of practical use, and the procedure can also be extended to deal with some more general pattern-matching problems. A theoretical application of the algorithm shows that the set of concatenations of even palindromes, i.e., the language $\{\alpha \alpha ^R\}^*$, can be recognized in linear time. Other algorithms which run even faster on the average are also considered.
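
The linear-time search described in the abstract can be illustrated with a minimal Python sketch. It uses the common textbook formulation with a prefix ("failure") table rather than the paper's own notation; the names `failure_table` and `find_all` are illustrative assumptions, not taken from the paper.

```python
def failure_table(pattern):
    """fail[i] is the length of the longest proper prefix of
    pattern[:i + 1] that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0  # length of the currently matched prefix
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]  # fall back to a shorter border
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def find_all(pattern, text):
    """Return the start index of every (possibly overlapping)
    occurrence of pattern in text. An empty pattern yields []."""
    if not pattern:
        return []
    fail = failure_table(pattern)
    hits = []
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]  # reuse what is already known to match
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]  # keep going to find overlapping matches
    return hits


print(find_all("aba", "ababa"))  # [0, 2]
```

The text index `i` never moves backward, and `k` can only fall as often as it has risen, which is where the bound proportional to the sum of the two lengths comes from.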
Citations
Book
01 Jan 2009
TL;DR: This text can be used as the basis for an advanced undergraduate or a graduate course on the subject, or for self-study, and is certain to become the definitive reference on the topic.
Abstract: Analytic Combinatorics is a self-contained treatment of the mathematics underlying the analysis of discrete structures, which has emerged over the past several decades as an essential tool in the understanding of properties of computer programs and scientific models with applications in physics, biology and chemistry. Thorough treatment of a large number of classical applications is an essential aspect of the presentation. Written by the leaders in the field of analytic combinatorics, this text is certain to become the definitive reference on the topic. The text is complemented with exercises, examples, appendices and notes to aid understanding; therefore, it can be used as the basis for an advanced undergraduate or a graduate course on the subject, or for self-study.

3,616 citations


Cites background from "Fast Pattern Matching in Strings"

  • ...(The corresponding automaton is in fact known as a Knuth–Morris–Pratt automaton [382]....

    [...]

Journal ArticleDOI
TL;DR: A simple, efficient algorithm to locate all occurrences of any of a finite number of keywords in a string of text that has been used to improve the speed of a library bibliographic search program by a factor of 5 to 10.
Abstract: This paper describes a simple, efficient algorithm to locate all occurrences of any of a finite number of keywords in a string of text. The algorithm consists of constructing a finite state pattern matching machine from the keywords and then using the pattern matching machine to process the text string in a single pass. Construction of the pattern matching machine takes time proportional to the sum of the lengths of the keywords. The number of state transitions made by the pattern matching machine in processing the text string is independent of the number of keywords. The algorithm has been used to improve the speed of a library bibliographic search program by a factor of 5 to 10.
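
The two phases named in the abstract, building a pattern matching machine from the keywords and then running the text through it in a single pass, can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the names `build_machine` and `search`, the dictionary-based goto function, and the skipping of empty keywords are choices made here, not details from the paper.

```python
from collections import deque


def build_machine(keywords):
    """Build goto, failure, and output tables for the keyword set."""
    goto = [{}]    # goto[state][char] -> next state (trie edges)
    fail = [0]     # failure link for each state
    output = [[]]  # keywords recognized on entering each state
    for word in keywords:
        if not word:
            continue  # assumption: empty keywords are ignored
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].append(word)
    # Breadth-first pass: shallower states get their failure links first.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output


def search(keywords, text):
    """Return (start_index, keyword) for every occurrence in text."""
    goto, fail, output = build_machine(keywords)
    state = 0
    hits = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]  # failure transition
        state = goto[state].get(ch, 0)
        for word in output[state]:
            hits.append((i - len(word) + 1, word))
    return hits


print(search(["he", "she", "his", "hers"], "ushers"))
# [(1, 'she'), (2, 'he'), (2, 'hers')]
```

Each text character causes exactly one goto transition plus some failure transitions, and the failure transitions are bounded overall by the number of characters read, so the scan does not slow down as keywords are added.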

3,270 citations


Cites background or methods from "Fast Pattern Matching in Strings"

  • ...Hopcroft and Karp (unpublished) have suggested a scheme similar to Algorithm 1 for finding the first occurrence of any of a finite set of keywords in a text string [13]....

    [...]

  • ...Our approach combines the ideas in the Knuth-Morris-Pratt algorithm [13] with those of finite state machines....

    [...]

  • ...[13] shows that, if there is only one keyword in K, O(log d) is the maximum number of failure transitions which can be made in one operating cycle....

    [...]

  • ...To avoid making unnecessary failure transitions we can use f', a generalization of the next function from [13], in place of f in Algorithm 1....

    [...]

  • ...Algorithm 1 is patterned after the Knuth-Morris-Pratt algorithm for finding one keyword in a text string [13] and can be viewed as an extension of the "trie" search discussed in [11]....

    [...]

Journal ArticleDOI
TL;DR: This work surveys the current techniques to cope with the problem of string matching that allows errors, and focuses on online searching and mostly on edit distance, explaining the problem and its relevance, its statistical behavior, its history and current developments, and the central ideas of the algorithms.
Abstract: We survey the current techniques to cope with the problem of string matching that allows errors. This is becoming a more and more relevant issue for many fast growing areas such as information retrieval and computational biology. We focus on online searching and mostly on edit distance, explaining the problem and its relevance, its statistical behavior, its history and current developments, and the central ideas of the algorithms and their complexities. We present a number of experiments to compare the performance of the different algorithms and show which are the best choices. We conclude with some directions for future work and open problems.
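
As a small illustration of online searching under edit distance, the sketch below uses the classical column-by-column dynamic programming approach, in which a match is allowed to start at any text position. It is one basic technique of the kind such a survey covers, not a summary of the survey's recommended algorithms; the function name `approx_ends` and its parameters are assumptions made for this example.

```python
def approx_ends(pattern, text, k):
    """Return every text index j such that some substring of text
    ending at j is within edit distance k of pattern."""
    m = len(pattern)
    # col[i] = best edit distance between pattern[:i] and a suffix of
    # the text read so far. Before any text is read, col[i] = i.
    col = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text):
        prev_diag = col[0]  # col[0] stays 0: a match may start anywhere
        for i in range(1, m + 1):
            old = col[i]
            cost = 0 if pattern[i - 1] == ch else 1
            col[i] = min(prev_diag + cost,  # match or substitution
                         old + 1,           # extra character in the text
                         col[i - 1] + 1)    # pattern character skipped
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends


print(approx_ends("abc", "xabcx", 0))  # [3]
print(approx_ends("abc", "xabcx", 1))  # [2, 3, 4]
```

The sketch keeps a single column of length `len(pattern) + 1` and does work proportional to the pattern length for each text character.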

2,723 citations


Cites background from "Fast Pattern Matching in Strings"

  • ...Good references about the relation of approximate string matching and information retrieval are Wagner and Fisher [1974], Lowrance and Wagner [1975], Nesbit [1986], Owolabi and McGregor [1988], Kukich [1992], Zobel and Dart [1996], French et al....

    [...]

Journal ArticleDOI
TL;DR: The algorithm has the unusual property that, in most cases, not all of the first i characters of string are inspected.
Abstract: An algorithm is presented that searches for the location, “i,” of the first occurrence of a character string, “pat,” in another string, “string.” During the search operation, the characters of pat are matched starting with the last character of pat. The information gained by starting the match at the end of the pattern often allows the algorithm to proceed in large jumps through the text being searched. Thus the algorithm has the unusual property that, in most cases, not all of the first i characters of string are inspected. The number of characters actually inspected (on the average) decreases as a function of the length of pat. For a random English pattern of length 5, the algorithm will typically inspect i/4 characters of string before finding a match at i. Furthermore, the algorithm has been implemented so that (on the average) fewer than i + patlen machine instructions are executed. These conclusions are supported with empirical evidence and a theoretical analysis of the average behavior of the algorithm. The worst case behavior of the algorithm is linear in i + patlen, assuming the availability of array space for tables linear in patlen plus the size of the alphabet.
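
The idea of matching from the last character of pat and jumping ahead on a mismatch can be sketched in Python. Note that this is the simplified Horspool variant, which uses a single shift table keyed on the text character under the pattern's last position. It is not the full algorithm of the abstract, and it does not carry the same worst-case guarantee. The name `find_first` and the 0-based return value are assumptions made here.

```python
def find_first(pat, string):
    """Return the 0-based index of the first occurrence of pat in
    string, or -1 if there is none (Horspool-style simplification)."""
    m, n = len(pat), len(string)
    if m == 0:
        return 0
    # For each character of pat except the last, the distance from its
    # rightmost occurrence to the end of pat. Other characters shift by m.
    shift = {c: m - 1 - j for j, c in enumerate(pat[:-1])}
    i = m - 1  # position in string aligned with the last character of pat
    while i < n:
        j, k = m - 1, i
        while j >= 0 and string[k] == pat[j]:  # compare right to left
            j -= 1
            k -= 1
        if j < 0:
            return k + 1
        i += shift.get(string[i], m)  # jump, possibly by the whole pattern
    return -1


print(find_first("abc", "xxabcxx"))  # 2
```

When the character under the pattern's last position does not occur in pat at all, the window jumps by the full pattern length, which is why many characters of string are never inspected.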

2,542 citations

Book
12 Jun 1992
TL;DR: For programmers and students interested in parsing text and automated indexing, it's the first collection in book form of the basic data structures and algorithms that are critical to the storage and retrieval of documents.
Abstract: An edited volume containing data structures and algorithms for information retrieval, including a disk with examples written in C. For programmers and students interested in parsing text and automated indexing, it's the first collection in book form of the basic data structures and algorithms that are critical to the storage and retrieval of documents.

2,359 citations

References
Book
01 Jan 1974
TL;DR: This text introduces the basic data structures and programming techniques often used in efficient algorithms, and covers use of lists, push-down stacks, queues, trees, and graphs.
Abstract: From the Publisher: With this text, you gain an understanding of the fundamental concepts of algorithms, the very heart of computer science. It introduces the basic data structures and programming techniques often used in efficient algorithms. Covers use of lists, push-down stacks, queues, trees, and graphs. Later chapters go into sorting, searching and graphing algorithms, the string-matching algorithms, and the Schönhage-Strassen integer-multiplication algorithm. Provides numerous graded exercises at the end of each chapter.

9,262 citations

Journal ArticleDOI
TL;DR: A simple, efficient algorithm to locate all occurrences of any of a finite number of keywords in a string of text that has been used to improve the speed of a library bibliographic search program by a factor of 5 to 10.
Abstract: This paper describes a simple, efficient algorithm to locate all occurrences of any of a finite number of keywords in a string of text. The algorithm consists of constructing a finite state pattern matching machine from the keywords and then using the pattern matching machine to process the text string in a single pass. Construction of the pattern matching machine takes time proportional to the sum of the lengths of the keywords. The number of state transitions made by the pattern matching machine in processing the text string is independent of the number of keywords. The algorithm has been used to improve the speed of a library bibliographic search program by a factor of 5 to 10.

3,270 citations

Journal ArticleDOI
TL;DR: The algorithm has the unusual property that, in most cases, not all of the first i characters of string are inspected.
Abstract: An algorithm is presented that searches for the location, “i,” of the first occurrence of a character string, “pat,” in another string, “string.” During the search operation, the characters of pat are matched starting with the last character of pat. The information gained by starting the match at the end of the pattern often allows the algorithm to proceed in large jumps through the text being searched. Thus the algorithm has the unusual property that, in most cases, not all of the first i characters of string are inspected. The number of characters actually inspected (on the average) decreases as a function of the length of pat. For a random English pattern of length 5, the algorithm will typically inspect i/4 characters of string before finding a match at i. Furthermore, the algorithm has been implemented so that (on the average) fewer than i + patlen machine instructions are executed. These conclusions are supported with empirical evidence and a theoretical analysis of the average behavior of the algorithm. The worst case behavior of the algorithm is linear in i + patlen, assuming the availability of array space for tables linear in patlen plus the size of the alphabet.

2,542 citations

Proceedings ArticleDOI
15 Oct 1973
TL;DR: A linear time algorithm for obtaining a compacted version of a bi-tree associated with a given string is presented and indicated how to solve several pattern matching problems, including some from [4] in linear time.
Abstract: In 1970, Knuth, Pratt, and Morris [1] showed how to do basic pattern matching in linear time. Related problems, such as those discussed in [4], have previously been solved by efficient but sub-optimal algorithms. In this paper, we introduce an interesting data structure called a bi-tree. A linear time algorithm for obtaining a compacted version of a bi-tree associated with a given string is presented. With this construction as the basic tool, we indicate how to solve several pattern matching problems, including some from [4] in linear time.

1,985 citations

Journal ArticleDOI
TL;DR: LR(k) grammars are defined, which are perhaps the most general ones of this type, and they provide the basis for understanding all of the special tricks which have been used in the construction of parsing algorithms for languages with simple structure, e.g. algebraic languages.
Abstract: There has been much recent interest in languages whose grammar is sufficiently simple that an efficient left-to-right parsing algorithm can be mechanically produced from the grammar. In this paper, we define LR(k) grammars, which are perhaps the most general ones of this type, and they provide the basis for understanding all of the special tricks which have been used in the construction of parsing algorithms for languages with simple structure, e.g. algebraic languages. We give algorithms for deciding if a given grammar satisfies the LR(k) condition, for given k, and also give methods for generating recognizers for LR(k) grammars. It is shown that the problem of whether or not a grammar is LR(k) for some k is undecidable, and the paper concludes by establishing various connections between LR(k) grammars and deterministic languages. In particular, the LR(k) condition is a natural analogue, for grammars, of the deterministic condition, for languages.

819 citations