
Showing papers on "Approximate string matching published in 1994"


Book
15 Jan 1994
TL;DR: String Matching; String Distance and Common Sequences; Suffix Trees; Approximate String Matching; Repeated Substrings.
Abstract: String Matching; String Distance and Common Sequences; Suffix Trees; Approximate String Matching; Repeated Substrings.

346 citations


Proceedings ArticleDOI
24 May 1994
TL;DR: This paper presents an example of combinatorial pattern discovery: the discovery of patterns in protein databases, which give information that is complementary to the best protein classifier available today.
Abstract: Suppose you are given a set of natural entities (e.g., proteins, organisms, weather patterns, etc.) that possess some important common externally observable properties. You also have a structural description of the entities (e.g., sequence, topological, or geometrical data) and a distance metric. Combinatorial pattern discovery is the activity of finding patterns in the structural data that might explain these common properties based on the metric. This paper presents an example of combinatorial pattern discovery: the discovery of patterns in protein databases. The structural representations we consider are strings, and the distance metric is string edit distance permitting variable-length don't cares. Our techniques incorporate string matching algorithms and novel heuristics for discovery and optimization, most of which generalize to other combinatorial structures. Experimental results of applying the techniques to both generated data and functionally related protein families obtained from the Cold Spring Harbor Laboratory show the effectiveness of the proposed techniques. When we apply the discovered patterns to perform protein classification, they give information that is complementary to the best protein classifier available today.
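For readers unfamiliar with edit distance under variable-length don't cares, the sketch below is a minimal illustration, not the authors' discovery algorithm: a unit-cost dynamic program in which an assumed wildcard symbol '*' in the pattern may absorb any run of text characters at zero cost. Function and parameter names are invented for the example.

    def vldc_edit_distance(pattern, text, wildcard="*"):
        # Unit-cost edit distance where `wildcard` in the pattern matches any
        # (possibly empty) run of text characters for free.
        m, n = len(pattern), len(text)
        INF = float("inf")
        d = [[INF] * (n + 1) for _ in range(m + 1)]
        d[0][0] = 0
        for j in range(1, n + 1):
            d[0][j] = j                      # unmatched text characters cost 1 each
        for i in range(1, m + 1):
            for j in range(n + 1):
                if pattern[i - 1] == wildcard:
                    # absorb nothing (cell above) or one more text character for free
                    d[i][j] = min(d[i - 1][j], d[i][j - 1] if j else INF)
                else:
                    best = d[i - 1][j] + 1                  # delete the pattern character
                    if j:
                        best = min(best,
                                   d[i][j - 1] + 1,         # insert the text character
                                   d[i - 1][j - 1] + (pattern[i - 1] != text[j - 1]))
                    d[i][j] = best
        return d[m][n]

    print(vldc_edit_distance("AB*KL", "ABCDEFGKL"))   # -> 0
    print(vldc_edit_distance("AB*KL", "AXCDEFGKL"))   # -> 1 (one substitution)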

193 citations


Journal ArticleDOI
TL;DR: It is shown how to speed up two string-matching algorithms: the Boyer-Moore algorithm (BM algorithm), and its version called here the reverse factor algorithm (RF algorithm), based on factor graphs for the reverse of the pattern.
Abstract: We show how to speed up two string-matching algorithms: the Boyer-Moore algorithm (BM algorithm), and its version called here the reverse factor algorithm (RF algorithm). The RF algorithm is based on factor graphs for the reverse of the pattern. The main feature of both algorithms is that they scan the text right-to-left from the supposed right position of the pattern. The BM algorithm scans as long as the scanned segment (factor) is a suffix of the pattern. The RF algorithm scans while the segment is a factor of the pattern. Both algorithms make a shift of the pattern, forget the history, and start again. The RF algorithm usually makes bigger shifts than BM, but is quadratic in the worst case. We show that it is enough to remember the last matched segment (represented by two pointers to the text) to speed up the RF algorithm considerably (to make a linear number of inspections of text symbols, with a small coefficient), and to speed up the BM algorithm (to make at most 2·n comparisons). Only constant additional memory is needed for the search phase. We give alternative versions of an accelerated RF algorithm: the first one is based on combinatorial properties of primitive words, and the other two use the power of suffix trees extensively. The paper demonstrates the techniques to transform algorithms, and also shows interesting new applications of data structures representing all subwords of the pattern in compact form.
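As background to the scan-and-shift behaviour described above, here is a minimal sketch of the classical right-to-left Boyer-Moore scan with only the bad-character shift; the paper's accelerated variants, which additionally remember the last matched segment, are not reproduced here.

    def boyer_moore_bad_char(text, pattern):
        # Right-to-left window scan with the bad-character shift only
        # (the simplest Boyer-Moore variant).
        m, n = len(pattern), len(text)
        if m == 0 or m > n:
            return []
        last = {c: i for i, c in enumerate(pattern)}   # rightmost position of each character
        hits, s = [], 0                                # s = current alignment of the pattern
        while s <= n - m:
            j = m - 1
            while j >= 0 and pattern[j] == text[s + j]:
                j -= 1                                 # scan the window right to left
            if j < 0:
                hits.append(s)
                s += 1
            else:
                # shift so the rightmost occurrence of the mismatched text character
                # lines up with it (or skip past it entirely)
                s += max(1, j - last.get(text[s + j], -1))
        return hits

    print(boyer_moore_bad_char("abracadabra", "abra"))   # -> [0, 7]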

190 citations


Journal ArticleDOI
TL;DR: This work gives an algorithm that runs in sublinear time O((n/m) k log_b m) when the text is random and k is bounded by the threshold m/(log_b m + O(1)).
Abstract: Given a text string of length n and a pattern string of length m over a b-letter alphabet, the k differences approximate string matching problem asks for all locations in the text where the pattern occurs with at most k differences (substitutions, insertions, deletions). We treat k not as a constant but as a fraction of m (not necessarily a constant fraction). Previous algorithms require at least O(kn) time (or exponential space). We give an algorithm that runs in sublinear time O((n/m) k log_b m) when the text is random and k is bounded by the threshold m/(log_b m + O(1)). In particular, when k = o(m/log_b m) the expected running time is o(n). In the worst case our algorithm is O(kn), but it is still an improvement in that it is practical and uses O(m) space compared with O(n) or O(m^2). We define three problems motivated by molecular biology and describe efficient algorithms based on our techniques: (1) approximate substring matching, (2) approximate-overlap detection, and (3) approximate codon matching. Respectively, the applications to biology are local similarity search, sequence assembly, and DNA-protein matching.
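The O(kn) dynamic program that the paper improves upon can be sketched as follows; this is a textbook Sellers-style column update, not the sublinear-expected-time algorithm itself, and the names are illustrative.

    def k_differences(text, pattern, k):
        # Report every end position in `text` where `pattern` matches
        # with at most k differences.
        m = len(pattern)
        col = list(range(m + 1))              # column for the empty text prefix
        ends = []
        for j, c in enumerate(text, 1):
            prev_diag = col[0]
            col[0] = 0                        # an occurrence may start at any text position
            for i in range(1, m + 1):
                cur = min(col[i] + 1,                          # extra text character
                          col[i - 1] + 1,                      # unmatched pattern character
                          prev_diag + (pattern[i - 1] != c))   # match or substitution
                prev_diag, col[i] = col[i], cur
            if col[m] <= k:
                ends.append(j)                # pattern ends here with <= k differences
        return ends

    print(k_differences("surgery", "survey", 2))   # -> [5, 6, 7]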

183 citations


Book ChapterDOI
01 Oct 1994

111 citations


Journal ArticleDOI
TL;DR: A new data structure is presented that allows such queries to be answered very quickly even for huge sets, provided the words are not too long and the query word is close to a stored word.

98 citations


Book ChapterDOI
05 Jun 1994
TL;DR: This paper describes how the distance-based sublinear expected time algorithm of Chang and Lawler can be extended to solve the local similarity problem efficiently, and presents a new theoretical result: polynomial-space, constant-fraction-error matching that is provably optimal.
Abstract: The best known rigorous method for biological sequence comparison has been the algorithm of Smith and Waterman. It computes in quadratic time the highest scoring local alignment of two sequences given a (nonmetric) similarity measure and gap penalty. In this paper, we describe how the distance-based sublinear expected time algorithm of Chang and Lawler can be extended to solve the local similarity problem efficiently. We present both a new theoretical result, polynomial-space, constant-fraction-error matching that is provably optimal, and a practical adaptation of it that produces nearly identical results to Smith-Waterman, at speedups of 2X (PAM 120, roughly corresponding to 33% identity) to 10X (PAM 90, 50% identity) or better. Further improvements are anticipated. What makes this possible is the addition of a new constraint on unit score (average score per residue), which filters out both very short alignments and very long alignments with an unacceptably low average. This program is part of a package called Genome Analyst that is being developed at CSHL.
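For context, the quadratic-time Smith-Waterman recurrence that the paper's filter approximates looks roughly like the sketch below; the match/mismatch/gap scores are placeholders rather than the PAM matrices mentioned in the abstract.

    def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
        # Textbook Smith-Waterman: highest-scoring local alignment score
        # under a simple linear gap penalty.
        rows, cols = len(a) + 1, len(b) + 1
        prev = [0] * cols
        best = 0
        for i in range(1, rows):
            cur = [0] * cols
            for j in range(1, cols):
                score = match if a[i - 1] == b[j - 1] else mismatch
                cur[j] = max(0,                      # a local alignment may restart anywhere
                             prev[j - 1] + score,    # align a[i-1] with b[j-1]
                             prev[j] + gap,          # gap in b
                             cur[j - 1] + gap)       # gap in a
                best = max(best, cur[j])
            prev = cur
        return best

    print(smith_waterman("HEAGAWGHEE", "PAWHEAE"))   # best local alignment score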

89 citations


Journal ArticleDOI
TL;DR: This paper presents algorithms for three problems having to do with approximate matching for such trees with variable length don't cares (VLDCs), with time complexity O(|P| × |D| × min(depth(P), leaves(P)) × min(depth(D), leaves(D))).

89 citations


Patent
Andreas Arning1
11 Jul 1994
TL;DR: In this paper, a system for checking the spelling of words and character strings without the need for a stored dictionary of words was proposed, where the system selects an error-free string and modifies it according to one or more rules which change the error-free string to a possible error string.
Abstract: A system for checking the spelling of words and character strings without the need for a stored dictionary of words and the memory required thereby. The system selects an error-free string and modifies it according to one or more rules which change the error-free string to a possible error string. The rules creating the possible error string can modify the error-free string by predictable character manipulation to yield usual and common errors of the character string. The frequencies of occurrence of both the error and error-free strings within the text are determined. These frequencies are compared to each other and, based upon the comparison, the system decides whether the possible error string is an actual error string. The system can use modifying rules which are psychologically or technically related to the computer system or operator, and rules which correspond to errors common with specialized input methods, such as character and speech recognition.
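A minimal sketch of the idea, assuming two invented error-generation rules and an arbitrary frequency ratio of 10; the patent's actual rules and decision criterion are not reproduced.

    from collections import Counter
    import re

    # Hypothetical error-generation rules (not the patent's rule set):
    # each turns an error-free string into one plausible misspelling.
    RULES = [
        lambda w: w.replace("ei", "ie", 1),                          # "ei" mistyped as "ie"
        lambda w: w[:1] + w[2] + w[1] + w[3:] if len(w) > 3 else w,  # transposed 2nd/3rd letters
    ]

    def probable_errors(text, ratio=10):
        # Flag rule-generated variants that occur in the text but are far rarer
        # than the error-free string they were derived from.
        freq = Counter(re.findall(r"[a-z]+", text.lower()))
        findings = []
        for word, count in freq.items():
            for rule in RULES:
                variant = rule(word)
                if variant != word and variant in freq and count >= ratio * freq[variant]:
                    findings.append((variant, word))   # (probable error, probable correction)
        return findings

    sample = "receive " * 20 + "recieve"
    print(probable_errors(sample))        # -> [('recieve', 'receive')]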

64 citations


Book ChapterDOI
10 Jun 1994
TL;DR: The structure of finite automata recognizing sets of the form A*p, for some word p, is studied, and the results obtained are used to improve the Knuth-Morris-Pratt string searching algorithm.
Abstract: In this paper we study the structure of finite automata recognizing sets of the form A*p, for some word p, and use the results obtained to improve the Knuth-Morris-Pratt string searching algorithm. We also determine the average number of nontrivial edges of the above automata.
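The automaton in question is, in essence, the string-matching DFA for A*p; a textbook construction (not the paper's refinement) might look like this.

    def matcher_automaton(pattern):
        # Transition table of the DFA recognizing A*p: state q is the length of
        # the longest prefix of p that is a suffix of the input read so far.
        m = len(pattern)
        fail = [0] * (m + 1)                    # classic KMP border table
        k = 0
        for q in range(2, m + 1):
            while k and pattern[k] != pattern[q - 1]:
                k = fail[k]
            if pattern[k] == pattern[q - 1]:
                k += 1
            fail[q] = k
        delta = [dict() for _ in range(m + 1)]
        for q in range(m + 1):
            for c in set(pattern):
                k = fail[q] if q == m else q
                while k and pattern[k] != c:
                    k = fail[k]
                delta[q][c] = k + 1 if pattern[k] == c else 0
        return delta

    def find_all(text, pattern):
        delta, q, m, hits = matcher_automaton(pattern), 0, len(pattern), []
        for i, c in enumerate(text):
            q = delta[q].get(c, 0)              # characters outside the pattern reset to state 0
            if q == m:
                hits.append(i - m + 1)
        return hits

    print(find_all("ababcabab", "abab"))        # -> [0, 5]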

56 citations


Book
30 May 1994
TL;DR: The text describes and evaluates the BF, KMP, BM, and KR algorithms, discusses improvements for string pattern matching machines, and details a technique for detecting and removing the redundant operation of the AC machine.
Abstract: From the Publisher: Introduces the basic concepts and characteristics of string pattern matching strategies and provides numerous references for further reading. The text describes and evaluates the BF, KMP, BM, and KR algorithms, discusses improvements for string pattern matching machines, and details a technique for detecting and removing the redundant operation of the AC machine. Also explored are typical problems in approximate string matching. In addition, the reader will find a description for applying string pattern matching algorithms to multidimensional matching problems, an investigation of numerous hardware-based solutions for pattern matching, and an examination of hardware approaches for full text search. The first chapter's survey paper describes the basic concepts of algorithm classifications. The five chapters that follow include 15 papers further illustrating these classifications: single keyword matching, matching sets of keywords, approximate string matching, multidimensional matching, and hardware matching.

Patent
Richard Hull1
28 Oct 1994
TL;DR: In this paper, an improved method of matching a query string against a plurality of candidate strings replaces a highly computationally intensive string edit distance calculation with a less computationally intensive lower bound estimate.
Abstract: An improved method of matching a query string against a plurality of candidate strings replaces a highly computationally intensive string edit distance calculation with a less computationally intensive lower bound estimate. The lower bound estimate of the string edit distance between the two strings is calculated by equalising the lengths of the two strings by adding padding elements to the shorter one. The elements of the strings are then sorted and the substitution costs between corresponding elements are summed.
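The following is a direct transcription of the estimate as described above; the element type, padding element, and substitution-cost function are placeholders, and the lower-bound guarantee depends on the cost model assumed in the patent.

    def lower_bound_estimate(query, candidate, sub_cost, pad):
        # Equalise lengths with padding, sort both element sequences,
        # and sum the pairwise substitution costs.
        n = max(len(query), len(candidate))
        a = sorted(list(query) + [pad] * (n - len(query)))
        b = sorted(list(candidate) + [pad] * (n - len(candidate)))
        return sum(sub_cost(x, y) for x, y in zip(a, b))

    # Toy usage with numeric elements and an absolute-difference cost;
    # real substitution costs and the padding element come from the application.
    cost = lambda x, y: abs(x - y)
    print(lower_bound_estimate([3, 1, 2], [1, 2, 5, 3], cost, pad=0))   # -> 5

A candidate whose estimate already exceeds the best edit distance found so far can then be discarded without running the full dynamic program.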

Journal ArticleDOI
TL;DR: The experiments show that performing approximate string matching for a large dictionary in real-time on an ordinary sequential computer under the multiple fault model is feasible.
Abstract: An approach to designing very fast algorithms for approximate string matching in a dictionary is proposed. Multiple spelling errors corresponding to insert, delete, change, and transpose operations on character strings are considered in the fault model. The design of very fast approximate string matching algorithms through a four-step reduction procedure is described. The final and most effective step uses hashing techniques to avoid comparing the given word with words at large distances. The technique has been applied to a library book catalog textbase. The experiments show that performing approximate string matching for a large dictionary in real-time on an ordinary sequential computer under our multiple fault model is feasible.
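The four-step reduction itself is not reproduced here; the sketch below only illustrates the general flavor of hashing words into buckets so that a query never reaches the full distance computation for words that are obviously too far away. Bucketing by length is an assumption made for the example.

    from collections import defaultdict

    def levenshtein(a, b):
        # Plain quadratic edit distance, used only on the surviving candidates.
        prev = list(range(len(b) + 1))
        for i, x in enumerate(a, 1):
            cur = [i]
            for j, y in enumerate(b, 1):
                cur.append(min(prev[j] + 1, cur[-1] + 1, prev[j - 1] + (x != y)))
            prev = cur
        return prev[-1]

    class ApproxDictionary:
        # Words are bucketed in a hash table keyed by length, so a query only
        # reaches the dynamic program for words whose length is within k.
        def __init__(self, words):
            self.by_len = defaultdict(list)
            for w in words:
                self.by_len[len(w)].append(w)

        def lookup(self, query, k):
            out = []
            for L in range(len(query) - k, len(query) + k + 1):
                for w in self.by_len.get(L, ()):
                    if levenshtein(query, w) <= k:
                        out.append(w)
            return out

    d = ApproxDictionary(["catalog", "catalyst", "dialog", "epilogue"])
    print(d.lookup("catalogg", 1))    # -> ['catalog']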

Book ChapterDOI
Tadao Takaoka1
25 Aug 1994
TL;DR: A more general analysis of expected time is given for the simplified algorithm in the one-dimensional case under a non-uniform probability distribution, and it is shown that the method can easily be generalized to the two-dimensional approximate pattern matching problem with sublinear expected time.
Abstract: We simplify in this paper the algorithm by Chang and Lawler for the approximate string matching problem, by adopting the concept of sampling. We have a more general analysis of expected time with the simplified algorithm for the one-dimensional case under a non-uniform probability distribution, and we show that our method can easily be generalized to the two-dimensional approximate pattern matching problem with sublinear expected time.

Journal ArticleDOI
TL;DR: An implementation of the dynamic programming algorithm for this problem is given that packs several characters and mod-4 integers into a computer word, and a 21-fold parallelism over the conventional algorithm can be obtained.
Abstract: Given a text string, a pattern string, and an integer k, the problem of approximate string matching with k differences is to find all substrings of the text string whose edit distance from the pattern string is less than k. The edit distance between two strings is defined as the minimum number of differences, where a difference can be a substitution, insertion, or deletion of a single character. An implementation of the dynamic programming algorithm for this problem is given that packs several characters and mod-4 integers into a computer word. Thus, it is a parallelization of the conventional implementation that runs on ordinary processors. Since a small alphabet means that characters have short binary codes, the degree of parallelism is greatest for small alphabets and for processors with long words. For an alphabet of size 8 or smaller and a 64 bit processor, a 21-fold parallelism over the conventional algorithm can be obtained. Empirical comparisons to the basic dynamic programming algorithm, to a version of Ukkonen's algorithm, to the algorithm of Galil and Park, and to a limited implementation of the Wu-Manber algorithm are given.
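The packing rests on the fact that vertically adjacent cells of the dynamic-programming table differ by at most one, so each cell can be encoded in two bits (mod 4). The sketch below only demonstrates that property on an ordinary column update; the paper's actual word-packing arithmetic is not reproduced.

    def column_deltas(pattern, text_char, prev_col):
        # One column update of the k-differences DP, returning the new column and
        # the vertical differences D[i][j] - D[i-1][j], which always lie in
        # {-1, 0, 1} and can therefore be stored in 2 bits (mod 4) and packed.
        col = [0]                       # D[0][j] = 0: a match may start anywhere
        diag = prev_col[0]
        for i in range(1, len(pattern) + 1):
            cur = min(prev_col[i] + 1, col[-1] + 1,
                      diag + (pattern[i - 1] != text_char))
            diag = prev_col[i]
            col.append(cur)
        deltas = [col[i] - col[i - 1] for i in range(1, len(col))]
        assert all(d in (-1, 0, 1) for d in deltas)
        return col, deltas

    pattern, text = "survey", "surgery"
    col = list(range(len(pattern) + 1))
    for c in text:
        col, deltas = column_deltas(pattern, c, col)
    print(col[-1], deltas)              # final distance and its 2-bit-encodable deltas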

Patent
30 Sep 1994
TL;DR: In this paper, a VLSI circuit structure for computing the edit distance between two strings over a given alphabet is presented, which can perform approximate string matching for variable edit costs, and does not place any constraint on the lengths of the strings that can be compared.
Abstract: The edit distance between two strings a_1, ..., a_m and b_1, ..., b_n is the minimum cost of a sequence of editing operations (insertions, deletions and substitutions) that convert one string into the other. This invention provides a VLSI circuit structure for computing the edit distance between two strings over a given alphabet. The circuit structure can perform approximate string matching for variable edit costs. More importantly, the circuit structure does not place any constraint on the lengths of the strings that can be compared. It makes use of simple basic cells and requires regular nearest-neighbor communication, which makes it suitable for VLSI implementation.
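Below is a plain sequential rendering of the recurrence with variable operation costs that such a circuit evaluates cell by cell; the cost functions are illustrative, and the systolic, nearest-neighbor evaluation order of the invention is not modeled.

    def weighted_edit_distance(a, b, ins, dele, sub):
        # Edit distance with arbitrary per-character insertion, deletion
        # and substitution costs.
        m, n = len(a), len(b)
        D = [[0.0] * (n + 1) for _ in range(m + 1)]
        for i in range(1, m + 1):
            D[i][0] = D[i - 1][0] + dele(a[i - 1])
        for j in range(1, n + 1):
            D[0][j] = D[0][j - 1] + ins(b[j - 1])
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                D[i][j] = min(D[i - 1][j] + dele(a[i - 1]),          # delete a_i
                              D[i][j - 1] + ins(b[j - 1]),           # insert b_j
                              D[i - 1][j - 1] + sub(a[i - 1], b[j - 1]))
        return D[m][n]

    # Toy cost model: substitutions between vowels are cheap.
    vowels = set("aeiou")
    sub = lambda x, y: 0 if x == y else (0.5 if x in vowels and y in vowels else 1)
    print(weighted_edit_distance("color", "colour", lambda c: 1, lambda c: 1, sub))
    # -> 1.0 (a single insertion of 'u')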


Book ChapterDOI
05 Jun 1994
TL;DR: It can be decided in O(||T|| + l^2·|S − T| + l·δ·||S − T||) time whether or not there exists a δ-characteristic string of T under S, where l denotes the length of a shortest string in T, |S − T| the cardinality of S − T, and ||T|| the size of T.
Abstract: The difference between two strings is the minimum number of editing steps (insertions, deletions, changes) that convert one string into the other. Let S be a finite set of strings, let T be a subset of S, and let δ be a positive integer. A δ-characteristic string of T under S is a string that is a common substring of T and that has at least δ differences from any substring of any string in S − T. In this paper, the following result is presented. It can be decided in O(||T|| + l^2·|S − T| + l·δ·||S − T||) time whether or not there exists a δ-characteristic string of T under S, where l denotes the length of a shortest string in T, |S − T| the cardinality of S − T, and ||T|| the size of T. If such a string exists, then all the shortest δ-characteristic strings of T under S can also be obtained in that time.

Proceedings Article
01 Jan 1994
TL;DR: In this article, the authors show how to break symmetries that occur in the process of assigning labels using the Deterministic Coin Tossing (DCT) technique, and thereby reduce the number of labeled substrings to linear.
Abstract: Suffix trees are the main data structure in string matching algorithms. There are several serial algorithms for suffix tree construction which run in linear time, but the number of operations in the only parallel algorithm available, due to Apostolico, Iliopoulos, Landau, Schieber and Vishkin, is proportional to n log n. The algorithm is based on labeling substrings, similar to a classical serial algorithm, with the same operations bound, by Karp, Miller and Rosenberg. We show how to break symmetries that occur in the process of assigning labels using the Deterministic Coin Tossing (DCT) technique, and thereby reduce the number of labeled substrings to linear.

Book ChapterDOI
05 Jun 1994
TL;DR: In this conference version, only the Bernoulli model (i.e., a memoryless channel) is considered, but the results hold under much weaker probabilistic assumptions.
Abstract: A practical suboptimal algorithm (source coding) for lossy (non-faithful) data compression is discussed. This scheme is based on approximate string matching, and it naturally extends the lossless (faithful) Lempel-Ziv data compression scheme. The construction of the algorithm is based on a careful probabilistic analysis of an approximate string matching problem that is of interest in its own right. This extends the Wyner-Ziv model to a lossy environment. In this conference version, we consider only the Bernoulli model (i.e., a memoryless channel), but our results hold under much weaker probabilistic assumptions.

Book ChapterDOI
25 Aug 1994
TL;DR: This work presents a linear-time algorithm for deciding whether or not there exists a characteristic string of T under S; if such a string exists, the algorithm returns all the shortest characteristic strings of T under S in that time.
Abstract: Let S be a finite set of strings and let T be a subset of S. A characteristic string of T under S is a string that is a common substring of T and that is not a substring of any string in S-T. We present a linear-time algorithm for deciding whether or not there exists a characteristic string of T under S. If such a string exists, then the algorithm returns all the shortest characteristic strings of T under S in that time.
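For small inputs the definition can be checked by brute force, as in the sketch below; the paper's contribution is doing this in linear time (with suffix-tree machinery), which this illustration does not attempt.

    def shortest_characteristic_strings(S, T):
        # Brute force: enumerate substrings of the shortest member of T,
        # shortest candidates first.
        members = list(T)
        others = [s for s in S if s not in members]
        base = min(members, key=len)          # a characteristic string must occur in every member
        n = len(base)
        for length in range(1, n + 1):
            found = set()
            for i in range(n - length + 1):
                c = base[i:i + length]
                if all(c in t for t in members) and not any(c in s for s in others):
                    found.add(c)
            if found:
                return sorted(found)
        return []                             # no characteristic string exists

    S = ["GATTACA", "GATTAGA", "CATTAGA"]
    print(shortest_characteristic_strings(S, ["GATTACA", "GATTAGA"]))   # -> ['GAT']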

Book ChapterDOI
Tatsuya Akutsu1
05 Jun 1994
TL;DR: This paper presents parallel and serial approximate matching algorithms for strings with don't care characters, based on Landau and Vishkin's approximate string matching algorithm and Fischer and Paterson's exact string matching algorithm with don't care characters.
Abstract: This paper presents parallel and serial approximate matching algorithms for strings with don't care characters. They are based on Landau and Vishkin's approximate string matching algorithm and Fischer and Paterson's exact string matching algorithm with don't care characters. The serial algorithm works in O(√(km) · n · log|Σ| · log^2(m/k) · log log(m/k)) time, and the parallel algorithm works in O(k log m) time using O(√(m/k) · n · log|Σ| · log(m/k) · log log(m/k)) processors on a CRCW-PRAM, where n denotes the length of a text string, m denotes the length of a pattern string, k denotes the maximum number of differences, and Σ denotes the alphabet (i.e., the set of characters). Several extensions are also described.

Journal ArticleDOI
01 Jan 1994
TL;DR: Infrared spectra are identified by matching to a standard database using a fuzzy peak matching algorithm, and the results obtained can be compared to those obtained from conventional X-ray diffraction analysis.
Abstract: Infrared spectra are identified by matching to a standard database using a fuzzy peak matching algorithm.

Book ChapterDOI
26 Sep 1994
TL;DR: In this article, the exact comparison complexity of the string prefix-matching problem in the deterministic sequential comparison model with equality tests was studied, and almost tight lower and upper bounds on the number of comparisons required in the worst case by on-line prefix matching algorithms for any fixed pattern and variable text were derived.
Abstract: In this paper we study the exact comparison complexity of the string prefix-matching problem in the deterministic sequential comparison model with equality tests. We derive almost tight lower and upper bounds on the number of comparisons required in the worst case by on-line prefix-matching algorithms for any fixed pattern and variable text. Unlike previous results on the comparison complexity of string-matching and prefix-matching algorithms, our bounds are almost tight for any particular pattern.

Book
01 May 1994
TL;DR: A space efficient algorithm for finding the best non-overlapping alignment score and a lossy data compression scheme that allows fast searching directly in the compressed file.
Abstract: Contents: A space efficient algorithm for finding the best non-overlapping alignment score; The parameterized complexity of sequence alignment and consensus; Computing all suboptimal alignments in linear space; Approximation algorithms for multiple sequence alignment; A context dependent method for comparing sequences; Fast identification of approximately matching substrings; Alignment of trees - An alternative to tree edit; Parametric recomputing in alignment graphs; A lossy data compression based on string matching: Preliminary analysis and suboptimal algorithms; A text compression scheme that allows fast searching directly in the compressed file; An alphabet-independent optimal parallel search for three dimensional pattern; Unit route upper bound for string-matching on hypercube; Computation of squares in a string; Minimization of sequential transducers; Shortest common superstrings for strings of random letters; Maximal common subsequences and minimal common supersequences; Dictionary-matching on unbounded alphabets: Uniform length dictionaries; Proximity matching using fixed-queries trees; Query primitives for tree-structured data; Multiple matching of parameterized patterns; Approximate string matching with don't care characters; Matching with matrix norm minimization; Approximate string matching and local similarity; Polynomial-time algorithms for computing characteristic strings; Recent methods for RNA modeling using stochastic context-free grammars; Efficient bounds for oriented chromosome inversion distance.

Book ChapterDOI
01 Jan 1994
TL;DR: A method for search in a dictionary is proposed, based on knowledge about error statistics at the output of an HR classifier, and its applicability to HR is assessed in terms of the Damerau-Levenshtein metric that is frequently used to define similarity of HR strings.
Abstract: A brief analysis of existing methods for performance improvement of handwriting recognition (HR) that are based on text-to-lexicon matching postprocessing is provided in the paper. A method for search in a dictionary is proposed, based on knowledge about error statistics at the output of an HR classifier. The method is developed for the most probable misspellings, namely character substitution, omission, insertion, and neighboring character reversal. The method's applicability to HR is assessed in terms of the Damerau-Levenshtein metric, which is frequently used to define similarity of HR strings. The proposed error modelling is distributed between the dictionary structure and the processing algorithm. Character deletion is the only operation utilized; thus, a substantial number of errors is implicitly represented, resulting in comparatively low processing time and space complexity. The method is implemented in a software subsystem for fault-tolerant keyboard input processing of natural-language names. The experimental results are briefly reported.
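One common way to realize a deletion-only error model is a deletion-variant index, sketched below under the assumption of at most one deletion per side; this is an illustration of the idea, not the authors' exact dictionary structure.

    from collections import defaultdict
    from itertools import combinations

    def deletes(word, d):
        # All strings obtainable from `word` by deleting up to d characters.
        out = {word}
        for k in range(1, min(d, len(word)) + 1):
            for idx in combinations(range(len(word)), k):
                out.add("".join(c for i, c in enumerate(word) if i not in idx))
        return out

    class DeletionDictionary:
        # Index every dictionary word under its deletion variants; a recognized
        # string is corrected by intersecting its own deletion variants with the
        # index. One deletion on each side covers a substitution, omission,
        # insertion, or neighboring-character reversal.
        def __init__(self, words, d=1):
            self.d = d
            self.index = defaultdict(set)
            for w in words:
                for v in deletes(w, d):
                    self.index[v].add(w)

        def candidates(self, recognized):
            hits = set()
            for v in deletes(recognized, self.d):
                hits |= self.index.get(v, set())
            return hits

    lexicon = DeletionDictionary(["hamburg", "homburg", "hamberg"])
    print(sorted(lexicon.candidates("hanburg")))   # -> ['hamburg'] ('m' read as 'n')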

Proceedings ArticleDOI
01 Jan 1994
TL;DR: The authors provide a data structure for approximate string searching and discuss the searching algorithm.
Abstract: Summary form only given. The problem of searching for approximate occurrences of a pattern in a set of strings is called the approximate string searching problem. The recent interest in this problem comes from DNA sequence analysis: whenever a sequence investigator determines a new sequence, one of the first things he must do is to compare it with all available sequences to see if it resembles something already known. The authors provide a data structure for approximate string searching and discuss the searching algorithm.


Journal Article
TL;DR: A string check function is proposed that removes extraneous neighboring characters from recognized character strings based on notation rules and determines whether error correction is required; experiments show the function to be effective.
Abstract: An algorithm for recognizing numeric strings with notation rules by using string checking has been developed. The proposed string check function removes extraneous characters from recognized character strings by using the notation rules. This function also determines whether to carry out recognition error correction. In this correction process, recognized characters are compared with strings in a dictionary. Errors in character strings are automatically corrected to meaningful letters by using the notation rules and dictionary. The string space of the dictionary to be compared is restricted based on the notation rules; this reduces processing time. Systems that recognize strings such as an I.D. code simply as a numeric string are thus unsuitable; these strings should be read as a word. This task is addressed by utilizing notation rules and a dictionary. We propose a string check function that removes extraneous neighboring characters from recognized character strings based on notation rules. It also determines whether error correction is required. Correction is done by comparing recognized character strings with strings in a dictionary. This function was determined to be effective by conducting experiments using a sample set of 3,983 input pages: the check function improved the string recognition rate from 98.5% to 99.7% and decreased the error rate by 98%.

Journal ArticleDOI
TL;DR: A data type called padded string is presented: a string type whose operations run faster than those of traditional strings such as char* in the C language.
Abstract: A string is a sequence of characters. Operations such as copy and comparison on strings are usually performed character by character. This note presents a data type called padded string, a string type with faster operations. A padded string is a sequence of machine words. For 32-bit machines, four characters can be operated on in one machine instruction. Operations on padded strings can then run faster than on traditional strings such as char* in the C language. An experiment sorting an array of strings shows a 24% speedup using padded strings.
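A rough Python rendering of the idea (the note itself targets char* strings and real 32-bit instructions in C): characters are packed four to a word and compared word by word; the ASCII and big-endian packing assumptions are mine.

    import struct

    def pad_to_words(s, word_bytes=4):
        # Pack a byte string into fixed-size machine words, padding with NULs.
        data = s.encode("ascii")
        data += b"\0" * ((-len(data)) % word_bytes)
        n = len(data) // word_bytes
        return struct.unpack(f">{n}I", data)   # big-endian words preserve lexicographic order

    def padded_compare(a, b):
        # Compare two padded strings word by word instead of character by character.
        wa, wb = pad_to_words(a), pad_to_words(b)
        for x, y in zip(wa, wb):
            if x != y:
                return -1 if x < y else 1
        return (len(wa) > len(wb)) - (len(wa) < len(wb))

    print(padded_compare("approximate", "approximation"))   # -> -1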