Margaret J. Corasick

Bio: Margaret J. Corasick is an academic researcher from Bell Labs. The author has contributed to research in topics: Commentz-Walter algorithm & String (computer science). The author has an h-index of 1 and has co-authored 1 publication receiving 3174 citations.

Papers

Journal Article

Efficient string matching: an aid to bibliographic search

Alfred V. Aho¹, Margaret J. Corasick¹•Institutions (1)

Bell Labs¹

01 Jun 1975-Communications of The ACM

TL;DR: A simple, efficient algorithm locates all occurrences of any of a finite number of keywords in a string of text; it has been used to improve the speed of a library bibliographic search program by a factor of 5 to 10.

Abstract: This paper describes a simple, efficient algorithm to locate all occurrences of any of a finite number of keywords in a string of text. The algorithm consists of constructing a finite state pattern matching machine from the keywords and then using the pattern matching machine to process the text string in a single pass. Construction of the pattern matching machine takes time proportional to the sum of the lengths of the keywords. The number of state transitions made by the pattern matching machine in processing the text string is independent of the number of keywords. The algorithm has been used to improve the speed of a library bibliographic search program by a factor of 5 to 10.

3,270 citations
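
The two phases the abstract describes (building a pattern matching machine from the keywords, then scanning the text in a single pass) can be sketched roughly as follows. This is a minimal illustration, not the paper's own listing: the names `build_machine` and `find_all` and the list-of-dicts state layout are assumptions made for this example.

```python
from collections import deque


def build_machine(keywords):
    """Build goto, failure and output tables for a keyword set."""
    goto = [{}]   # goto[state][char] -> next state (a trie)
    fail = [0]    # fail[state] -> state to fall back to on a mismatch
    out = [[]]    # out[state] -> keywords recognised on entering state

    # Phase 1: insert every keyword into the trie.
    for word in keywords:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(word)

    # Phase 2: breadth-first pass that fills in failure links.
    queue = deque(goto[0].values())  # depth-1 states fail to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit keywords that end at the failure state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, keywords):
    """Return (start_index, keyword) for every occurrence in text."""
    goto, fail, out = build_machine(keywords)
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            matches.append((i - len(word) + 1, word))
    return matches


print(find_all("ushers", ["he", "she", "his", "hers"]))
```

In this sketch the text is read once from left to right, and the failure links let the scan continue after a mismatch without re-reading earlier characters.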

Cited by

Journal Article

Fast Pattern Matching in Strings

Donald E. Knuth, James Morris, Vaughan R. Pratt

01 Jun 1977-SIAM Journal on Computing

TL;DR: An algorithm is presented which finds all occurrences of one given string within another, in running time proportional to the sum of the lengths of the strings; a theoretical application shows that the set of concatenations of even palindromes, i.e., the language $\{\alpha \alpha ^R\}^*$, can be recognized in linear time.

Abstract: An algorithm is presented which finds all occurrences of one given string within another, in running time proportional to the sum of the lengths of the strings. The constant of proportionality is low enough to make this algorithm of practical use, and the procedure can also be extended to deal with some more general pattern-matching problems. A theoretical application of the algorithm shows that the set of concatenations of even palindromes, i.e., the language $\{\alpha \alpha ^R\}^*$, can be recognized in linear time. Other algorithms which run even faster on the average are also considered.

3,156 citations
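
As a rough illustration of matching in time proportional to the sum of the two string lengths, the sketch below uses the commonly taught failure-table formulation of this algorithm. The function names `failure_table` and `kmp_find_all` are assumptions for this example and do not come from the paper.

```python
def failure_table(pattern):
    """table[i] = length of the longest proper prefix of pattern[:i + 1]
    that is also a suffix of it."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def kmp_find_all(text, pattern):
    """Return the start index of every occurrence of pattern in text."""
    if not pattern:
        return []
    table = failure_table(pattern)
    hits = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k and ch != pattern[k]:
            k = table[k - 1]  # fall back without moving i backwards
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = table[k - 1]  # keep going to find overlapping matches
    return hits


print(kmp_find_all("abababa", "aba"))
```

The text index only ever moves forward, which is where the linear bound comes from in this formulation.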

Journal Article

A guided tour to approximate string matching

Gonzalo Navarro¹•Institutions (1)

University of Chile¹

01 Mar 2001-ACM Computing Surveys

TL;DR: This work surveys the current techniques to cope with the problem of string matching that allows errors, and focuses on online searching and mostly on edit distance, explaining the problem and its relevance, its statistical behavior, its history and current developments, and the central ideas of the algorithms.

Abstract: We survey the current techniques to cope with the problem of string matching that allows errors. This is becoming a more and more relevant issue for many fast growing areas such as information retrieval and computational biology. We focus on online searching and mostly on edit distance, explaining the problem and its relevance, its statistical behavior, its history and current developments, and the central ideas of the algorithms and their complexities. We present a number of experiments to compare the performance of the different algorithms and show which are the best choices. We conclude with some directions for future work and open problems.

2,723 citations

Journal Article

A fast string searching algorithm

Robert S. Boyer¹, J. Strother Moore²•Institutions (2)

SRI International¹, PARC²

01 Oct 1977-Communications of The ACM

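TL;DR: An algorithm is presented that searches for the first occurrence of a character string in another string, matching from the last character of the pattern; it has the unusual property that, in most cases, not all of the first i characters of the string are inspected.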

Abstract: An algorithm is presented that searches for the location, “i,” of the first occurrence of a character string, “pat,” in another string, “string.” During the search operation, the characters of pat are matched starting with the last character of pat. The information gained by starting the match at the end of the pattern often allows the algorithm to proceed in large jumps through the text being searched. Thus the algorithm has the unusual property that, in most cases, not all of the first i characters of string are inspected. The number of characters actually inspected (on the average) decreases as a function of the length of pat. For a random English pattern of length 5, the algorithm will typically inspect i/4 characters of string before finding a match at i. Furthermore, the algorithm has been implemented so that (on the average) fewer than i + patlen machine instructions are executed. These conclusions are supported with empirical evidence and a theoretical analysis of the average behavior of the algorithm. The worst case behavior of the algorithm is linear in i + patlen, assuming the availability of array space for tables linear in patlen plus the size of the alphabet.

2,542 citations
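
The idea of matching from the last character of pat and jumping ahead on a mismatch can be illustrated with the sketch below. Note that this is the simpler Horspool variant, which keeps only a single character-based shift table; it is not the full algorithm from the paper and does not carry the paper's worst-case guarantee. The name `horspool_find` is an assumption for this example.

```python
def horspool_find(text, pat):
    """Return the index of the first occurrence of pat in text, or -1."""
    m, n = len(pat), len(text)
    if m == 0:
        return 0

    # For each character of pat except the last, the distance from its
    # rightmost occurrence to the end of pat. Later entries overwrite
    # earlier ones, so the smallest (safe) shift is kept.
    shift = {ch: m - 1 - j for j, ch in enumerate(pat[:-1])}

    i = m - 1  # text position aligned with the last character of pat
    while i < n:
        j, k = m - 1, i
        # Compare right to left, starting at the end of pat.
        while j >= 0 and text[k] == pat[j]:
            j -= 1
            k -= 1
        if j < 0:
            return k + 1
        # Characters absent from the table allow a jump of the full length.
        i += shift.get(text[i], m)
    return -1


print(horspool_find("here is a simple example", "example"))
```

Because a text character that does not occur in pat allows a jump of the full pattern length, many text characters are never inspected, which is the property the abstract highlights.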

Book

Information Retrieval: Data Structures and Algorithms

William B. Frakes, Ricardo Baeza-Yates¹•Institutions (1)

University of Chile¹

12 Jun 1992

TL;DR: For programmers and students interested in parsing text and automated indexing, this is the first collection in book form of the basic data structures and algorithms that are critical to the storage and retrieval of documents.

Abstract: An edited volume containing data structures and algorithms for information retrieval, including a disk with examples written in C. For programmers and students interested in parsing text and automated indexing, it is the first collection in book form of the basic data structures and algorithms that are critical to the storage and retrieval of documents.

2,359 citations

Journal Article

Automated generation of heuristics for biological sequence comparison

Guy Slater¹, Ewan Birney¹•Institutions (1)

European Bioinformatics Institute¹

15 Feb 2005-BMC Bioinformatics

TL;DR: Bounded sparse dynamic programming (BSDP) is introduced to allow rapid implementation of heuristics approximating to many complex alignment models, and has been incorporated into the freely available sequence alignment program, exonerate.

Abstract: Exhaustive methods of sequence alignment are accurate but slow, whereas heuristic approaches run quickly, but their complexity makes them more difficult to implement. We introduce bounded sparse dynamic programming (BSDP) to allow rapid approximation to exhaustive alignment. This is used within a framework whereby the alignment algorithms are described in terms of their underlying model, to allow automated development of efficient heuristic implementations which may be applied to a general set of sequence comparison problems. The speed and accuracy of this approach compares favourably with existing methods. Examples of its use in the context of genome annotation are given. This system allows rapid implementation of heuristics approximating to many complex alignment models, and has been incorporated into the freely available sequence alignment program, exonerate.

2,292 citations
