Showing papers on "String (computer science)" published in 1987


Journal ArticleDOI
Dana Angluin1
TL;DR: In this article, the problem of identifying an unknown regular set from examples of its members and nonmembers is addressed, where the regular set is presented by a minimally adequate Teacher, which can answer membership queries about the set and can also test a conjecture, indicating whether it is equal to the unknown set and providing a counterexample if not.
Abstract: The problem of identifying an unknown regular set from examples of its members and nonmembers is addressed. It is assumed that the regular set is presented by a minimally adequate Teacher, which can answer membership queries about the set and can also test a conjecture and indicate whether it is equal to the unknown set and provide a counterexample if not. (A counterexample is a string in the symmetric difference of the correct set and the conjectured set.) A learning algorithm L* is described that correctly learns any regular set from any minimally adequate Teacher in time polynomial in the number of states of the minimum dfa for the set and the maximum length of any counterexample provided by the Teacher. It is shown that in a stochastic setting the ability of the Teacher to test conjectures may be replaced by a random sampling oracle, EX( ). A polynomial-time learning algorithm is shown for a particular problem of context-free language identification.

2,157 citations


Journal ArticleDOI
TL;DR: Randomized algorithms are presented that find the first occurrence of a string X as a consecutive block within a text Y by comparing short fingerprints instead of the strings themselves; they require a constant number of storage locations and essentially run in real time.
Abstract: We present randomized algorithms to solve the following string-matching problem and some of its generalizations: Given a string X of length n (the pattern) and a string Y (the text), find the first occurrence of X as a consecutive block within Y. The algorithms represent strings of length n by much shorter strings called fingerprints, and achieve their efficiency by manipulating fingerprints instead of longer strings. The algorithms require a constant number of storage locations, and essentially run in real time. They are conceptually simple and easy to implement. The method readily generalizes to higher-dimensional pattern-matching problems.

1,400 citations
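The fingerprint idea in the abstract above can be illustrated with a rolling hash. The following is a minimal sketch, not the authors' algorithm: the function name `find_first`, the fixed `base` and `modulus`, and the explicit confirmation step are all assumptions made for illustration (the abstract describes a randomized choice of fingerprint, which this sketch does not reproduce).

```python
def find_first(pattern: str, text: str,
               base: int = 256, modulus: int = 1_000_000_007) -> int:
    """Return the index of the first occurrence of pattern in text, or -1.

    Illustrative only: base and modulus are fixed here (an assumption),
    and every fingerprint hit is confirmed by a direct comparison.
    """
    n, m = len(pattern), len(text)
    if n == 0:
        return 0
    if n > m:
        return -1

    # Weight of the leading character in an n-character window.
    high = pow(base, n - 1, modulus)

    # Fingerprints of the pattern and of the first window of the text.
    fp_pattern = 0
    fp_window = 0
    for i in range(n):
        fp_pattern = (fp_pattern * base + ord(pattern[i])) % modulus
        fp_window = (fp_window * base + ord(text[i])) % modulus

    for start in range(m - n + 1):
        # Equal fingerprints can be a false match, so confirm directly.
        if fp_window == fp_pattern and text[start:start + n] == pattern:
            return start
        if start + n < m:
            # Slide the window: drop text[start], append text[start + n].
            fp_window = (fp_window - ord(text[start]) * high) % modulus
            fp_window = (fp_window * base + ord(text[start + n])) % modulus
    return -1
```

Each window update touches only two characters, so the fingerprint comparison costs constant time per text position; the direct comparison is paid only when fingerprints agree.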


Journal ArticleDOI
TL;DR: This approach allows an efficient and natural way to construct iconic indexes for pictures and proves the necessary and sufficient conditions to characterize ambiguous pictures for reduced 2-D strings as well as normal 2-D strings.
Abstract: In this paper, we describe a new way of representing a symbolic picture by a two-dimensional string. A picture query can also be specified as a 2-D string. The problem of pictorial information retrieval then becomes a problem of 2-D subsequence matching. We present algorithms for encoding a symbolic picture into its 2-D string representation, reconstructing a picture from its 2-D string representation, and matching a 2-D string with another 2-D string. We also prove the necessary and sufficient conditions to characterize ambiguous pictures for reduced 2-D strings as well as normal 2-D strings. This approach thus allows an efficient and natural way to construct iconic indexes for pictures.

674 citations


Journal ArticleDOI
TL;DR: Auslander-Reiten sequences with few middle terms are studied, with applications to string algebras.
Abstract: (1987). Auslander-Reiten sequences with few middle terms and applications to string algebras. Communications in Algebra: Vol. 15, No. 1-2, pp. 145-179.

555 citations


Journal ArticleDOI
TL;DR: A generalization of string matching, in which the pattern is a sequence of pattern elements, each compatible with a set of symbols, is investigated, which shows that generalized string matching requires a time-space product of $\Omega ({{n^2 } / {\log n}})$ on a powerful model of computation, when the alphabet is restricted to n symbols.
Abstract: Given a pattern string of length n and an object string of length m, the string matching problem asks for the positions of all occurrences of the pattern in the object string. This paper investigates a generalization of string matching, in which the pattern is a sequence of pattern elements, each compatible with a set of symbols. The alphabet of symbols is infinite, with its members encoded in a finite alphabet. In contrast to standard string matching, which can be solved in simultaneous linear time and constant space, it is shown that generalized string matching requires a time-space product of $\Omega ({{n^2 } / {\log n}})$ on a powerful model of computation, when the alphabet is restricted to n symbols. Our proof uses a method of Borodin. The obvious algorithm for generalized string matching requires time $O(NM)$, where N is the length of the encoding of the pattern, and M is that of the object string. We describe an algorithm which solves generalized string matching in time $O(N + M + mN^{{1 / 2}} {\o...

351 citations


Proceedings Article
01 Jan 1987
TL;DR: It is shown that knowledge complexity can be used to show that a language is easy to prove and that there are not any perfect zero-knowledge protocols for NP-complete languages unless the polynomial time hierarchy collapses.
Abstract: A Perfect Zero-Knowledge interactive proof system convinces a verifier that a string is in a language without revealing any additional knowledge in an information-theoretic sense. We show that for any language that has a perfect zero-knowledge proof system, its complement has a short interactive protocol. This result implies that there are not any perfect zero-knowledge protocols for NP-complete languages unless the polynomial time hierarchy collapses. This paper demonstrates that knowledge complexity can be used to show that a language is easy to prove.

203 citations


Proceedings ArticleDOI
01 Jan 1987
TL;DR: This paper shows that for any language that has a perfect zero-knowledge proof system, its complement has a short interactive protocol, which implies that there are not any perfect zero-knowledge protocols for NP-complete languages unless the polynomial time hierarchy collapses.

190 citations


Patent
30 Jun 1987
TL;DR: A system is described for remotely controlling multiple light strings to achieve a wide variety of visual effects including, on the same string, variations in color, blink rate, and brightness.
Abstract: A system for remotely controlling multiple light strings to achieve a wide variety of visual effects including, on the same string, variations in color, blink rate, and brightness.

113 citations


Journal ArticleDOI
TL;DR: A new class of Hermite normal form solution procedures which perform modulo determinant arithmetic throughout the computation is described which is shown to possess a polynomial time complexity bound which is a function of the length of the input string.
Abstract: This paper describes a new class of Hermite normal form solution procedures which perform modulo determinant arithmetic throughout the computation. This class of procedures is shown to possess a polynomial time complexity bound which is a function of the length of the input string. Computational results are also given.

108 citations


Journal ArticleDOI
TL;DR: In this paper, modular invariant partition functions for strings which propagate on a group manifold are constructed upon automorphisms of affine Kac-Moody algebras.

87 citations


Journal ArticleDOI
D.M.W. Evans1
TL;DR: An elegant algorithm has been found that performs this "perfect shuffle" more efficiently and, according to timing experiments, runs about eight times faster than the fastest other algorithm known to the author.
Abstract: All radix-B fast Fourier transforms (FFT) or fast Hartley transforms (FHT) performed "in-place" require at some point that the sequence elements be permuted such that, indexing the elements 0 to N - 1, the element with index i is swapped with the element whose index is j. The permutation is called digit-reversing, because if i is represented as a string of digits, base B, then j is that index whose representation is the same string of digits written in reverse order. N is a power of B and $B \geq 2$. An elegant algorithm has been found that performs this "perfect shuffle" more efficiently and, according to timing experiments, runs about eight times faster than the fastest other algorithm known to the author. The algorithm is of order O(N) and led, for example, to a saving of 7 percent in the total (radix-2) FFT running time for N = 1024.
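The digit-reversing permutation defined in this abstract can be written down directly. The sketch below is a naive reference version, not the faster O(N) algorithm the paper reports; the function names `digit_reverse` and `digit_reverse_permute` are hypothetical, and this version spends time proportional to the number of digits on every index.

```python
def digit_reverse(i: int, base: int, num_digits: int) -> int:
    """Return the index whose base-`base` digits are those of i, reversed."""
    j = 0
    for _ in range(num_digits):
        i, digit = divmod(i, base)   # peel off the lowest digit of i
        j = j * base + digit         # push it onto j as the next higher digit
    return j


def digit_reverse_permute(seq: list, base: int = 2) -> None:
    """Permute seq in place so element i is swapped with element j = reverse(i).

    Naive illustration: len(seq) must be a power of `base`.
    """
    n = len(seq)
    num_digits, size = 0, 1
    while size < n:
        size *= base
        num_digits += 1
    if size != n:
        raise ValueError("length must be a power of the base")
    for i in range(n):
        j = digit_reverse(i, base, num_digits)
        if i < j:                    # swap each pair exactly once
            seq[i], seq[j] = seq[j], seq[i]
```

For N = 8 and B = 2, index 1 (binary 001) is exchanged with index 4 (binary 100), and index 3 (011) with index 6 (110), while palindromic indices such as 5 (101) stay in place.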

Journal ArticleDOI
TL;DR: An algorithm to compute the constrained edit distance subject to any arbitrary edit constraint involving the number and type of edit operations to be performed has been presented and demonstrates remarkable accuracy.
Abstract: Let X* be any unknown word from a finite dictionary H. Let U be any arbitrary subsequence of X*. We consider the problem of estimating X* by processing Y, which is a noisy version of U. We do this by defining the constrained edit distance between X ∈ H and Y subject to any arbitrary edit constraint involving the number and type of edit operations to be performed. An algorithm to compute this constrained edit distance has been presented. Although in general the algorithm has a cubic time complexity, within the framework of our solution the algorithm possesses a quadratic time complexity. Recognition using the constrained edit distance as a criterion demonstrates remarkable accuracy. Experimental results which involve strings of lengths between 40 and 80 and which contain an average of 26.547 errors per string demonstrate that the scheme has about 99.5 percent accuracy.

Journal ArticleDOI
TL;DR: A VLSI architecture based on the space-time domain expansion approach which can compute the string distance and also give the matching index-pairs which correspond to the edit sequence is proposed and can obtain high throughput by using extensive pipelining and parallelism.

Journal ArticleDOI
TL;DR: A brief introduction to object-oriented programming and how it is supported by the C++ programming language is given, and two of the class library's more interesting features, object I/O and processes, are described.
Abstract: The Object-Oriented Program Support (OOPS) class library is a portable collection of classes similar to those of Smalltalk-80 that has been developed using the C++ programming language under the UNIX operating system. The OOPS library includes generally useful data types, such as String, Date and Time, and most of the Smalltalk-80 collection classes such as OrderedCtn (indexed arrays), LinkedList (singly linked lists), Set (hash tables), and Dictionary (associative arrays). Arbitrarily complex data structures comprised of OOPS and user-defined objects can be stored on disk files or moved between UNIX processes by means of an object I/O facility. The classes Process, Scheduler, Semaphore and SharedQueue provide multiprogramming with coroutines. This paper gives a brief introduction to object-oriented programming and how it is supported by the C++ programming language. An overview of the OOPS library is also presented, followed by a programming example. The implementation details of two of the class library's more interesting features, object I/O and processes, are described. The paper concludes with a discussion of the differences between the OOPS library and Smalltalk-80 and some observations based on our programming experience with C++ and OOPS.

01 Jan 1987
TL;DR: The Object-Oriented Program Support (OOPS) class library as discussed by the authors is a portable collection of classes similar to those of Smalltalk-80 that has been developed using the C++ programming language under the UNIX operating system.

Journal ArticleDOI
M. Bush1, G. Kopec
TL;DR: A system for speaker-independent connected digit recognition is described in which explicit acoustic-phonetic features and constraints play a significant role; the best configurations of the recognizer achieve string recognition accuracies of approximately 96 and 97 percent when the length of the input string is unknown and known, respectively.
Abstract: A system for speaker-independent connected digit recognition is described in which explicit acoustic-phonetic features and constraints play a significant role. The digit vocabulary is modeled using a finite-state pronunciation network whose branches correspond to meaningful acoustic-phonetic units. Each branch is associated with an acoustic pattern matcher which employs a combination of whole-spectrum and feature-based metrics. The system has been evaluated using 17 000 utterances from the Texas Instruments (TI) multidialect, connected digits database. The best configurations of the recognizer achieve string recognition accuracies of approximately 96 and 97 percent when the length of the input string is unknown and known, respectively, and when different talkers are used for training and testing.

Patent
Jose Pastor1
31 Dec 1987
TL;DR: In this article, a system for authenticating a plurality of documents includes a device for solving a set of polynomial equations to develop a string of characters and having a decryption key therein that reveals not only a plain text message indicating the source of the authentication but, in addition, provides the decoding key for use with the information provided by the mailer.
Abstract: A system for conveying information for the reliable authentication of a plurality of documents includes a device for solving a set of polynomial equations to develop a string of characters and having a decryption key therein that, upon application to the string of characters provided, reveals not only a plain text message indicating the source of the authentication but, in addition, provides the decryption key for use with the information provided by the mailer. The solution of the set of polynomial equations requires the accumulation of individual documents, each having a random x i and a solution f(x i ) associated therewith.

Patent
Nukiyama Tomoji1
17 Feb 1987
TL;DR: In this article, a shift control circuit comprising an arithmetic circuit, a logic circuit, and a single-bit shifter circuit is proposed to detect the positive or negative sign of the bit string.
Abstract: A shift control circuit comprising an arithmetic circuit (20) for producing a string of a predetermined number of data bits, a logic circuit (22) for detecting the positive or negative sign of the bit string and producing a first switch signal responsive to the positive sign of the bit string or a second switch signal responsive to the negative sign of the bit string, a ones complement generator circuit (24) for producing a signal representative of the ones complement of the bit string, a first selective signal transfer circuit (26) such as a multiplexer which is transparent directly to the bit string in response to the first switch signal or to the signal from the ones complement generator circuit in response to the second switch signal, a decoder circuit (28) for decoding the bit string or the signal passed through the first selective signal transfer circuit for producing a decoded output signal, a single-bit shifter circuit (30) for shifting the bit of the decoded output signal by a single bit in a predetermined direction for producing a single-bit shifted output signal, and a second selective signal transfer circuit (32) such as a multiplexer which is transparent directly to the decoded output signal in response to the first switch signal or to the signal from the single-bit shifter circuit (30) in response to the second switch signal.

Patent
13 Mar 1987
TL;DR: In this paper, structural features from a perceived target are extracted by producing a compact one-dimensional description of the perceived target's boundary, and the structural features are classified by using string-to-string matching.
Abstract: A method for determining whether a perceived target is acceptably close to a model target. The perceived target is first segmented using a relaxation based procedure. Structural features from the perceived target are extracted by producing a compact one-dimensional description of the perceived target's boundary. Said structural features are classified by using string-to-string matching, wherein one of two symbolic strings is a representation of the compact one-dimensional description of the boundary of the perceived target, and the other of said two symbolic strings is a pre-stored representation of the model target. The string-to-string matching entails measuring the distance between the two strings based upon deletion, insertion, and substitution of symbols from one string to the other. Performing the string-to-string matching measures how closely local structural features of the perceived target resemble local structural features of the model target.
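The string-to-string distance mentioned in this abstract, built from deletions, insertions, and substitutions, is commonly computed by dynamic programming. The sketch below assumes unit cost for every operation, which the abstract does not state; the function name `edit_distance` is hypothetical, and the patent's actual weights and symbol alphabet may differ.

```python
def edit_distance(source: str, target: str) -> int:
    """Minimum number of deletions, insertions, and substitutions
    needed to turn source into target (unit costs are an assumption)."""
    # prev[j] holds the distance between the processed prefix of source
    # and the first j symbols of target.
    prev = list(range(len(target) + 1))
    for i, s in enumerate(source, start=1):
        curr = [i]  # deleting all i symbols of source yields the empty string
        for j, t in enumerate(target, start=1):
            substitution_cost = 0 if s == t else 1
            curr.append(min(
                prev[j] + 1,                      # delete s from source
                curr[j - 1] + 1,                  # insert t into source
                prev[j - 1] + substitution_cost,  # substitute s by t (or match)
            ))
        prev = curr
    return prev[-1]
```

A perceived boundary string would then be accepted when its distance to the stored model string falls below some threshold; the choice of threshold is outside what the abstract specifies.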

Patent
30 Dec 1987
TL;DR: A document storage and retrieval system is described that stores a document body in the form of an image, stores text information in the form of a character code string for retrieval, executes a retrieval with reference to the text information, and displays the related document image on a retrieval terminal according to the retrieval result.
Abstract: A document storage and retrieval system for storing a document body in the form of image, means for storing text information in the form of a character code string for retrieval, apparatus for executing a retrieval with reference to the text information, and apparatus for displaying a document image relating thereto on a retrieval terminal according to the retrieval result. Such a form of the system is available for retrieving the full contents of a document and also for displaying the document body printed in a format easy to read straight in the form of image. Users are capable of retrieving documents with arbitrary words and also capable of reading even such a document as is complicated to include mathematical expressions and charts through a terminal in the form of image, the same as on paper. A system is provided wherein the text information for retrieval is extracted automatically from the document image through character recognition. Since a precision of the character recognition has not been satisfactory hitherto, a visual retrieval and correction have been carried out without fail by operators. However, there is no necessity for the operators to attend therefor.

Journal ArticleDOI
TL;DR: An algorithm has been developed for the identification of unknown patterns which are distinctive for a set of short DNA sequences believed to be functionally equivalent and allows a 'fair' simultaneous testing of patterns of all degrees of degeneracy.
Abstract: An algorithm has been developed for the identification of unknown patterns which are distinctive for a set of short DNA sequences believed to be functionally equivalent. A pattern is defined as being a string, containing fully or partially specified nucleotides at each position of the string. The advantage of this 'vague' definition of the pattern is that it imposes minimum constraints on the characterization of patterns. A new feature of the approach developed here is that it allows a 'fair' simultaneous testing of patterns of all degrees of degeneracy. This analysis is based on an evaluation of inhomogeneity in the empirical occurrence distribution of any such pattern within a set of sequences. The use of the nonparametric kernel density estimation of Parzen allows one to assess small disturbances among the sequence alignments. The method also makes it possible to identify sequence subsets with different characteristic patterns. This algorithm was implemented in the analysis of patterns characteristic of sets of promoters, terminators and splice junction sequences. The results are compared with those obtained by other methods.

Book ChapterDOI
01 Mar 1987
TL;DR: This paper presents a technique for synthesizing systolic arrays which have non-uniform data flow governed by control signals and discusses how it is possible to automatically derive control signals that govern the data flow by applying the same pipelining transformations to these linear conditional expressions.
Abstract: We present a technique for synthesizing systolic arrays which have non-uniform data flow governed by control signals. The starting point for the synthesis is a Recurrence Equation with Linear Dependencies (RELD) which is a generalization of the simple recurrences encountered in mathematics. A large class of programs, including all (single and multiple) nested-loop programs can be described by such recurrences. In this paper we extend some of our earlier work [17] in two principal directions. Firstly, we describe a transformation called multistage pipelining and show that it yields recurrences that have linear conditional expressions governing the computation. Secondly, we discuss how it is possible to automatically derive control signals that govern the data flow by applying the same pipelining transformations to these linear conditional expressions. The approach is illustrated by deriving the Guibas-Kung-Thompson architecture for optimum string parenthesization.

Journal ArticleDOI
TL;DR: Algorithms of lower complexity are obtained for solving the problem of whether or not a given finite string-rewriting system R is confluent on a given congruence class [w]R, when only length-reducing systems are considered.

Journal ArticleDOI
TL;DR: A character string search engine for rapid text retrieval has been developed which accommodates a novel string-search architecture which combines a 512-stage finite-state automaton (FSA) logic with a recently developed content addressable memory (CAM).
Abstract: A character string search engine (SSE) for rapid text retrieval has been developed. The SSE accommodates a novel string-search architecture which combines a 512-stage finite-state automaton (FSA) logic with a recently developed content addressable memory (CAM) to achieve an approximate string comparison of 80 million strings per second. The CAM cell consists of four conventional static RAM (SRAM) cells and a read/write circuit. Concurrent comparison of 64 stored strings with variable length has been achieved in 50 ns for an input text stream of 10 million characters/s, permitting performance despite the presence of single character errors in the form of character codes. Furthermore, this chip allows nonanchor string search and variable-length 'don't care' (VLDC) string search. The SSE chip has 217600 transistors in an 8.62 × 12.76-mm die area. The technology used was a double-metal 1.6-μm n-well CMOS process.

Journal ArticleDOI
01 Jun 1987
TL;DR: A preliminary computer implementation of the procedure has been used to prove a theorem about minimal presentations of free nilpotent groups of class 3 and may be combined with work of Baumslag et al. (1981) to prove that the polycyclicity of a finitely presented group can be verified.
Abstract: This paper describes a new procedure, based on string rewriting rules, for verifying that a finitely presented group G is nilpotent. If G is not nilpotent, the procedure may not terminate. A preliminary computer implementation of the procedure has been used to prove a theorem about minimal presentations of free nilpotent groups of class 3. Finally, it is shown that the ideas presented here may be combined with work of Baumslag et al. (1981) to prove that the polycyclicity of a finitely presented group can be verified.

PatentDOI
TL;DR: The authors propose a method for synthesizing word baseforms for words not spoken during a training session, where each synthesized baseform represents a series of models from a first set of models.
Abstract: Apparatus and method for synthesizing word baseforms for words not spoken during a training session, wherein each synthesized baseform represents a series of models from a first set of models, which include: (a) uttering speech during a training session and representing the uttered speech as a sequence of models from a second set of models; (b) for each of at least some of the second set models spoken in a given phonetic model context during the training session, storing a respective string of first set models; and (c) constructing a word baseform of first set models for a word not spoken during the training session, including the step of representing each piece of a word that corresponds to a second set model in a given context by the stored respective string, if any, corresponding thereto.

Patent
24 Jun 1987
TL;DR: In this paper, a plurality of code books corresponding to the sorts of features of input voices are prepared, respective code books are quantized and recognition is executed by using a plurality of found code strings.
Abstract: PURPOSE: To reduce learning samples and to shorten calculation time by using separative vector quantization for individually generating a code book in each feature as vector quantization and executing individual vector quantization. CONSTITUTION: A plurality of code books corresponding to the sorts of features of input voices are prepared, respective code books are quantized and recognition is executed by using a plurality of found code strings. Namely a voice signal is amplified by an amplifier, a return noise is removed by a low pass filter 2, the noise-removed voice signal is converted into a digital signal by an A/D converter 3 and the feature of the voice is extracted by a computer 5. A feature string in each extracted feature is collated with an already stored reference pattern by a matching part, based upon a split method, the matching distance is sent to a result judging part 5, whether the result is suitable for a recognition candidate or not is judged, and the recognition result is outputted. Consequently, learning samples can be reduced and calculation volume can be reduced. COPYRIGHT: (C)1989,JPO

Book ChapterDOI
01 Jul 1987
TL;DR: In this article, a parallel algorithm for constructing a suffix tree is presented, which runs in O(log n) time and uses n processors; applications to efficient parallel algorithms for several string problems are also given.
Abstract: Weiner's [We-73] suffix tree is known to be a powerful tool for string manipulations. We present a parallel algorithm for constructing a suffix tree. The algorithm runs in O(log n) time and uses n processors. We also present applications for designing efficient parallel algorithms for several string problems.

Patent
10 Feb 1987
TL;DR: In this paper, a skip table is prepared from which a state of a subsequent symbol string and an address of one or plural symbols to be subsequently inputted can be readily determined by making reference to a set of a current symbol string search state and one or multiple symbols to subsequently be input into the skip table.
Abstract: A skip table is prepared from which a state of a subsequent symbol string and an address of one or plural symbols to be subsequently inputted can be readily determined by making reference to a set of a current symbol string search state and one or plural symbols to be subsequently inputted of the symbol string. When executing searching for the symbol string, data stored in the skip table are looked up to assure the symbol string search by inputting only a minimized number of necessary characters of the symbol string. Necessity of inputting all the characters of the symbol string for searching is eliminated and the processing speed can be increased considerably. A plurality of symbol strings may be searched for.

Patent
30 Apr 1987
TL;DR: In this article, the authors proposed a method to recognize characters written in the running hand even if the number of character patterns stored in a character dictionary table is small, by absorbing variation of the numbers of strokes and a pattern even if it is generated in a partial pattern.
Abstract: PURPOSE: To recognize characters written in the running hand even if the number of character patterns stored in a character dictionary table is small, by absorbing variation of the number of strokes and a pattern even if it is generated in a partial pattern, by an identification operation in a partial identification means. CONSTITUTION: As for an input character pattern from an information input part 1, the feature of the number of strokes and a feature point coordinate of each stroke, etc. is extracted by a feature extracting means 3, and when this feature is inputted, a partial identification means 5 compares the input character pattern and all partial patterns which have been registered in advance. A character identification means 8 limits the number of strokes of all character patterns with regard to the partial pattern which has been identified by the identification means 5, and decides whether they are similar or not. In this way, prior to comparison of an input stroke string and a pattern which has been registered in a character dictionary table 7, classification is executed by deriving the similarity of the input stroke string and the partial pattern of the character which has been determined in advance, so the processing quantity is decreased remarkably; also, variation of the number of strokes caused by characters written in the running hand in a sub-pattern is corrected, and many characters can be recognized without increasing the capacity of a dictionary table.