scispace - formally typeset

Showing papers on "String (computer science)" published in 1988


Journal Article
TL;DR: An algorithm for automated text string separation is developed and implemented that is relatively independent of changes in text font style and size and of string orientation; it showed superior performance compared to other techniques.
Abstract: The development and implementation of an algorithm for automated text string separation that is relatively independent of changes in text font style and size and of string orientation are described. It is intended for use in an automated system for document analysis. The principal parts of the algorithm are the generation of connected components and the application of the Hough transform in order to group components into logical character strings that can then be separated from the graphics. The algorithm outputs two images, one containing text strings and the other graphics. These images can then be processed by suitable character recognition and graphics recognition systems. The performance of the algorithm, both in terms of its effectiveness and computational efficiency, was evaluated using several test images and showed superior performance compared to other techniques.

664 citations
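The two principal stages named above, connected-component generation and Hough-transform grouping, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a binary image given as nested lists of 0/1, 8-connectivity, one centroid per component, and a coarse (theta, rho) accumulator with 1-pixel rho bins. The helper names are hypothetical, and the paper's later steps (such as separating the grouped strings from the graphics) are omitted.

```python
import math
from collections import deque


def connected_components(img):
    """Label 8-connected foreground pixels (value 1); return one pixel list per component."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue, pixels = deque([(y, x)]), []
                while queue:  # breadth-first flood fill
                    cy, cx = queue.popleft()
                    pixels.append((cy, cx))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                comps.append(pixels)
    return comps


def centroid(pixels):
    """Return the (x, y) centroid of a component."""
    ys, xs = zip(*pixels)
    return sum(xs) / len(xs), sum(ys) / len(ys)


def best_line_group(centroids, n_theta=180, rho_step=1.0):
    """Vote each centroid into (theta, rho) cells; return the indices in the fullest cell."""
    votes = {}
    for i, (x, y) in enumerate(centroids):
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes.setdefault((t, round(rho / rho_step)), []).append(i)
    return max(votes.values(), key=len)
```

Components whose centroids fall in the fullest accumulator cell are candidate members of one character string, whatever the string's orientation; an isolated component elsewhere in the image is left out of that group.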


Journal Article
TL;DR: It is shown that there exist exponentially many square-free and cube-free strings of each length over these alphabets and arguments for the nonexistence of various RT(n)th power-free homomorphisms are provided.

184 citations
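To make the objects being counted concrete, the sketch below enumerates square-free strings (strings containing no factor of the form xx with x non-empty) by backtracking, checking only the newly created suffix at each step. It is a brute-force illustration of the definition, not the counting argument of the paper; the ternary alphabet "abc" is an arbitrary choice, since the alphabets studied in the paper are not named in this summary.

```python
def ends_with_square(w):
    """True if the sequence w ends with some square xx (x non-empty)."""
    n = len(w)
    return any(w[n - 2 * h:n - h] == w[n - h:] for h in range(1, n // 2 + 1))


def count_square_free(alphabet, length):
    """Count square-free strings of the given length by extending square-free prefixes."""
    def extend(w):
        if len(w) == length:
            return 1
        total = 0
        for a in alphabet:
            w.append(a)
            # A square-free prefix can only gain a square at its new end.
            if not ends_with_square(w):
                total += extend(w)
            w.pop()
        return total
    return extend([])
```

Over a two-letter alphabet the count drops to zero at length 4, since every binary string of length 4 or more contains a square; over three letters the counts keep growing.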


Journal Article
TL;DR: This paper surveys techniques for designing efficient sequential and parallel approximate string matching algorithms, with special attention to methods for constructing data structures that efficiently support the primitive operations needed in approximate string matching.

153 citations
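A common sequential starting point for approximate string matching is the dynamic program that reports every text position where some substring ends within k edit operations of the pattern (often attributed to Sellers). The sketch below is that textbook baseline with hypothetical names; it is not a specific algorithm from the survey.

```python
def approximate_occurrences(pattern, text, k):
    """Return 1-based end positions j in text where some substring text[i:j]
    is within edit distance k of pattern (one DP column kept at a time)."""
    m = len(pattern)
    col = list(range(m + 1))  # D[i][0] = i: pattern prefix vs. empty text
    hits = []
    for j, c in enumerate(text, start=1):
        prev_diag = col[0]      # D[0][j-1]
        col[0] = 0              # D[0][j] = 0: a match may start anywhere
        for i in range(1, m + 1):
            old = col[i]        # D[i][j-1]
            cost = 0 if pattern[i - 1] == c else 1
            col[i] = min(old + 1,             # extra text character
                         col[i - 1] + 1,      # missing pattern character
                         prev_diag + cost)    # match or substitution
            prev_diag = old
        if col[m] <= k:
            hits.append(j)
    return hits
```

The table costs O(mn) time; the surveyed techniques aim to beat this with better data structures or parallelism.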


Journal Article
TL;DR: This paper presents a CRCW parallel RAM algorithm that constructs the suffix tree associated with a string of $n$ symbols in $O(\log n)$ time with $n$ processors, using $\Theta(n^2)$ space.
Abstract: Many string manipulations can be performed efficiently on suffix trees. In this paper a CRCW parallel RAM algorithm is presented that constructs the suffix tree associated with a string of $n$ symbols in $O(\log n)$ time with $n$ processors. The algorithm requires $\Theta(n^2)$ space. However, the space needed can be reduced to $O(n^{1+\varepsilon})$ for any $0 < \varepsilon \le 1$, with a corresponding slowdown proportional to $1/\varepsilon$. Efficient parallel procedures are also given for some string problems that can be solved with suffix trees.

152 citations
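The sketch below builds a naive suffix trie sequentially, inserting every suffix into nested dictionaries in O(n^2) time and space, then answers substring and occurrence-count queries by walking from the root. It only illustrates what a suffix structure supports: it is neither a compacted suffix tree nor the paper's parallel CRCW construction, and the terminator "$" is an assumption that must not occur in the input.

```python
def build_suffix_trie(s, end="$"):
    """Naive O(n^2) suffix trie: every suffix of s + end becomes a root-to-leaf path."""
    s += end
    root = {}
    for i in range(len(s)):
        node = root
        for ch in s[i:]:
            node = node.setdefault(ch, {})
    return root


def _walk(trie, pattern):
    """Follow pattern from the root; return the node reached, or None."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return None
        node = node[ch]
    return node


def count_leaves(node):
    """Each leaf ends exactly one suffix (thanks to the terminator)."""
    return 1 if not node else sum(count_leaves(child) for child in node.values())


def contains(trie, pattern):
    """pattern is a substring of s iff it spells a path from the root."""
    return _walk(trie, pattern) is not None


def count_occurrences(trie, pattern):
    """Occurrences = number of suffixes (leaves) below the node reached by pattern."""
    node = _walk(trie, pattern)
    return 0 if node is None else count_leaves(node)
```

For example, the trie of "banana" reports "nan" as a substring and "ana" as occurring twice.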


Proceedings Article
11 Apr 1988
TL;DR: This investigation provides an estimate of an appropriate number of Japanese phoneme sequences for a synthesis scheme with two advantages in the use of speech segment units: flexible use of nonuniform synthesis units, and optimal choice of a unit sequence for an input phoneme string using appropriateness measures.
Abstract: A synthesis scheme is proposed for the optimal selection and extraction of synthesis units. This synthesis scheme has two advantages in the use of speech segment units. One is the flexible use of nonuniform synthesis units and the other is the optimal choice of a unit sequence for an input phoneme string using appropriateness measures. In this synthesis system, all phoneme subsequences in each synthesis unit are also used as synthesis units. To facilitate the search for appropriate unit candidates, a phoneme sequence entry dictionary is built for a given speech set. Using this entry dictionary, all candidate unit templates are listed for a given input phoneme string, and the most appropriate one is selected based on an appropriateness measure. In addition, the author presents the results of a statistical analysis of Japanese phoneme sequences based on their distributions in the word dictionary and text corpora. This investigation provides an estimate of an appropriate number of Japanese phoneme sequences in this synthesis scheme.

127 citations
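A minimal sketch of the selection step, under loudly stated assumptions: phonemes are single characters, the entry dictionary maps every contiguous phoneme subsequence of each stored unit to the units that contain it (mirroring the statement that subsequences of units are also usable), and the paper's appropriateness measure is replaced by a hypothetical stand-in that simply prefers the fewest, hence longest, units. Dynamic programming over prefix positions then finds an optimal unit sequence for that stand-in measure.

```python
def build_entry_dictionary(units):
    """Map every contiguous phoneme subsequence to the set of units it can be cut from."""
    entries = {}
    for name, phonemes in units.items():
        p = tuple(phonemes)
        for i in range(len(p)):
            for j in range(i + 1, len(p) + 1):
                entries.setdefault(p[i:j], set()).add(name)
    return entries


def select_units(target, entries):
    """best[j] = fewest segments covering target[:j] (hypothetical appropriateness measure)."""
    target = tuple(target)
    n = len(target)
    INF = float("inf")
    best = [0] + [INF] * n
    back = [None] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            if best[i] + 1 < best[j] and target[i:j] in entries:
                best[j] = best[i] + 1
                back[j] = i
    if best[n] == INF:
        return None  # some phoneme never occurs in the stored units
    segments, j = [], n
    while j > 0:
        i = back[j]
        segments.append((target[i:j], sorted(entries[target[i:j]])))
        j = i
    return segments[::-1]
```

A real system would replace the constant per-segment cost `1` with a score computed for each candidate template.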


Patent
24 Aug 1988
TL;DR: In this article, a user-interactive speech recognition control system is disclosed for recognizing a complete sequence of keywords (e.g., a telephone number such as 123-4567) via entering, verifying, and editing variable-length utterance strings separated by the user-defined placement of pauses.
Abstract: A user-interactive speech recognition control system is disclosed for recognizing a complete sequence of keywords (e.g., a telephone number such as 123-4567) via entering, verifying, and editing variable-length utterance strings (e.g., 1-2-3; 4-5; 6-7) separated by the user-defined placement of pauses. The device controller (120) utilizes timers (124) to monitor the pause time between partial-sequence digit strings recognized by the speech recognizer (110). When a string of digits is followed by a predetermined pause time interval, the recognized digits are repeated back via the speech synthesizer (130). An additional string of digits can then be entered, and only that subsequent string is repeated back after the next pause. Furthermore, the user has the flexibility to correct only the last digit string entered, or the entire sequence. Hence, if there is an error in only one digit, the erroneous digit string can be corrected without having to re-enter the entire digit sequence. The invention is well suited for use in a hands-free voice command dialing system for a mobile radiotelephone, where vehicular background noise may affect recognition accuracy.

123 citations
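The pause-driven entry protocol can be sketched in a few lines. The (timestamp, digit) events, the 1-second threshold, and the `DigitEntry` class are hypothetical; the sketch only shows how pauses delimit partial digit strings and why correcting the last string avoids re-entering the whole number.

```python
def segment_by_pauses(events, pause):
    """Split (timestamp_seconds, digit) events into strings wherever the gap >= pause."""
    groups, last_t = [], None
    for t, d in events:
        if last_t is None or t - last_t >= pause:
            groups.append([])  # a long enough pause closes the previous string
        groups[-1].append(d)
        last_t = t
    return ["".join(g) for g in groups]


class DigitEntry:
    """Accumulates partial digit strings; only the newest one is echoed back."""

    def __init__(self):
        self.strings = []

    def enter(self, s):
        self.strings.append(s)
        return s  # what the synthesizer would repeat back

    def correct_last(self, s):
        self.strings[-1] = s  # fix one partial string without re-entering the rest

    def clear(self):
        self.strings = []

    def number(self):
        return "".join(self.strings)
```

With the events for "1-2-3 / 4-5 / 6-7", three partial strings are produced, and a misrecognized "67" can be replaced on its own.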


Patent
10 Feb 1988
TL;DR: A parallel string processor for use in a minicomputer for searching portions of text or binary bit strings for the presence of desired words or bit strings is described in this article.
Abstract: A parallel string processor for use in a minicomputer for searching portions of text or binary bit strings for the presence of desired words or bit strings. The processor includes a first register (136) in which a keyword string is stored and a pair of interconnected shift registers (186 and 188) in which the string to be searched for the presence of the keyword is stored. An arithmetic logic unit (140) compares the contents of the first register with one of the shift registers to determine whether the keyword is present in the portion of the string being searched. After each such comparison, the contents of the interconnected shift registers are shifted with respect to the keyword stored in the first register. When the processor is searching for the presence of a keyword having a predetermined number of bytes, the contents of the shift registers are shifted one byte at a time, and when the processor is searching for the presence of a keyword having a predetermined number of bits, the contents of the shift registers are shifted one bit at a time.

120 citations
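In software, the shift-and-compare loop described in the patent reduces to sliding a k-bit window across an integer, one bit or one byte (8 bits) per step. The sketch below is an assumption-laden analogue of the register arrangement, not a model of the circuit: the searched string and the keyword are Python integers, and offsets are counted from the most significant bit.

```python
def find_bit_pattern(haystack, n_bits, needle, k_bits, step=1):
    """Return offsets (from the most significant end) where the k-bit needle occurs in
    the n-bit haystack; step=1 shifts one bit at a time, step=8 one byte at a time."""
    mask = (1 << k_bits) - 1
    hits = []
    for offset in range(0, n_bits - k_bits + 1, step):
        shift = n_bits - k_bits - offset  # bring the current window to the low bits
        if ((haystack >> shift) & mask) == needle:
            hits.append(offset)
    return hits
```

Byte-wise search works on encoded text: converting `b"xabyab"` and `b"ab"` with `int.from_bytes(..., "big")` and using `step=8` reports bit offsets 8 and 32, that is, bytes 1 and 4.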


Proceedings ArticleDOI
11 Apr 1988
TL;DR: An enhanced analysis feature set consisting of both instantaneous and transitional spectral information is used and the hidden-Markov-model-based connected digit recognizer is tested in speaker-trained, multispeaker, and speaker-independent modes.
Abstract: Algorithms for connected-word recognition based on whole-word reference patterns have become increasingly sophisticated and have been shown capable of achieving high recognition performance for small or syntax-constrained moderate-size vocabularies in a speaker-trained mode. An enhanced analysis feature set consisting of both instantaneous and transitional spectral information is used and the hidden-Markov-model-based connected digit recognizer is tested in speaker-trained, multispeaker, and speaker-independent modes. The performance achieved was 0.35, 1.65 and 1.75% string error rates, respectively, for known length strings and 0.78, 2.85 and 2.94% string error rates, respectively, for unknown length strings. >

110 citations


Patent
09 Aug 1988
TL;DR: A computer input system for Chinese and Japanese characters is described in this article, where the different strokes for composing the characters are classified into different groups, each identified by a code number, and the strings of code numbers are stored in memory where the strings contain only as many code numbers as are necessary to identify the characters.
Abstract: A computer input system for Chinese and Japanese characters. The different strokes for composing the characters are classified into different groups, each identified by a code number. The strings of code numbers are stored in memory, where each string contains only as many code numbers as are necessary to identify its character. Strings for two or more characters used together as compounds are also stored. When the code numbers entered by an operator match a stored string, a controller causes the shape of the actual character to be fetched from memory and displayed. For some characters, partial characters are also stored; to aid beginners, these are fetched and displayed when the string of code numbers for such a partial character matches the code numbers entered. The string of code numbers representing each character follows exactly the traditional writing sequence of the character, from the very first stroke to the end of the string of code numbers. However, the computer usually identifies the character before the entire writing sequence is completed, particularly for characters used together in compounds.

102 citations
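Storing only as many code numbers as are needed can be sketched as a shortest-unique-prefix table. The characters and stroke codes below are hypothetical placeholders (the patent's stroke groups are not listed here); a character whose full code is a prefix of another character's code keeps its full code.

```python
def shortest_unique_prefixes(codes):
    """codes: dict character -> full stroke-code string in writing order (non-empty).
    Keep only as many leading codes as needed to tell each character apart."""
    table = {}
    for ch, seq in codes.items():
        others = [s for c, s in codes.items() if c != ch]
        for k in range(1, len(seq) + 1):
            if not any(o.startswith(seq[:k]) for o in others):
                break  # seq[:k] already identifies ch uniquely
        table[seq[:k]] = ch  # if no break, k == len(seq): the full code is kept
    return table


def identify(table, entered):
    """Return the character whose stored (shortened) code equals the entered codes."""
    return table.get(entered)
```

With placeholder codes A = "123", B = "124", C = "35", character C is identified after a single code, before its writing sequence is complete.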


Journal Article
TL;DR: A new similarity measure based on the Levenshtein metric is defined for this comparison and the resulting method is both computationally fast and storage‐efficient.
Abstract: Approximate string matching is an important operation in information systems because an input string is often an inexact match to the strings already stored. Commonly known accurate methods are computationally expensive as they compare the input string to every entry in the stored dictionary. This paper describes a two-stage process. The first uses a very compact n-gram table to preselect sets of roughly similar strings. The second stage compares these with the input string using an accurate method to give an accurately matched set of strings. A new similarity measure based on the Levenshtein metric is defined for this comparison. The resulting method is both computationally fast and storage-efficient.

70 citations
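The two-stage process can be sketched with a bigram index as the preselection table and plain Levenshtein distance as the accurate second stage. Both are stand-ins: the paper's compact n-gram table and its new Levenshtein-based similarity measure are not reproduced, and the thresholds `min_shared` and `max_dist` are hypothetical parameters.

```python
from collections import defaultdict


def ngrams(s, n=2):
    padded = f"#{s}#"  # pad so the first and last letters form their own n-grams
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def levenshtein(a, b):
    """Classic unit-cost edit distance, one DP row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


class TwoStageMatcher:
    def __init__(self, words, n=2):
        self.n = n
        self.index = defaultdict(set)  # n-gram -> stored strings containing it
        for w in words:
            for g in ngrams(w, n):
                self.index[g].add(w)

    def match(self, query, min_shared=2, max_dist=2):
        # Stage 1: cheap preselection of roughly similar stored strings.
        counts = defaultdict(int)
        for g in ngrams(query, self.n):
            for w in self.index.get(g, ()):
                counts[w] += 1
        candidates = [w for w, c in counts.items() if c >= min_shared]
        # Stage 2: accurate comparison only on the preselected candidates.
        scored = sorted((levenshtein(query, w), w) for w in candidates)
        return [w for d, w in scored if d <= max_dist]
```

The expensive distance is computed for four of the five stored words in the example, and a larger dictionary would see a far smaller fraction reach stage 2.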


Patent
Kousuke Takahashi
20 Jan 1988
TL;DR: In this article, a character identification device for identifying an input character to produce an identified code was proposed, where a memory circuit (40) decides a match between the input character and stored characters preliminarily stored therein to produce a character match signal.
Abstract: In a character identification device for identifying an input character to produce an identified code, a memory circuit (40) decides a match between the input character and stored characters preliminarily stored therein to produce a character match signal. An encoder (50) encodes the character match signal into the identified code. The memory circuit (40) includes a plurality of memory areas (41 to 44). A memory area selector (54) selects a specific memory area in response to a selection signal produced from a signal producing circuit (52) to supply the input character to the specific memory area. In a character string identification device, a processing circuit (90) uses the character match signal to generate a string match signal which is encoded into the identified code by the encoder (51). The processing circuit (90) may include several processing sections (91 to 94) equal in number to the memory areas. An activating arrangement (182) activates a particular one of the processing sections that corresponds to the specific memory area. The identified code may be provided to the memory circuit as the input character through an interruption switch circuit (226).

Patent
03 Jun 1988
TL;DR: In this article, a data compression/decompression apparatus employs common circuitry and a single string table for compression and decompression, with a throttle control to prevent data under-runs and an optimizing start-up control that delays the start of the recording device until the compression apparatus has compressed sufficient data to efficiently reduce throttling and loss of compression when the output device is started.
Abstract: A data compression/decompression apparatus employs common circuitry and a single string table for compression and decompression. A throttle control is provided to prevent data under-runs, and an optimizing start-up control delays the start-up of the recording device until the compression apparatus has compressed sufficient data to efficiently reduce throttling and loss of compression when the output device is started. The decompression apparatus may operate to decompress compressed data when the compressed data is read in either the same direction as it was recorded, or in the direction reverse to that in which it was recorded. A further feature is the provision of a counter which is incremented by one after a predetermined number of string codes have been written into the string table. The output of the counter is stored in the string table with each string code and prefix code. When searching for an empty or usable location in the string table, the count value read from the location is compared with the count in the counter. If the two counts are not equal, the location is considered "empty" and may be written into. This arrangement avoids the usual procedure of intermittently clearing each location of the string table individually, since stepping the counter is equivalent to clearing the entire table.

Proceedings Article
01 Dec 1988
TL;DR: It is shown that the problem of learning a subfamily of regular languages can be reduced to the problem of learning its finite members, and this reduction shows that the family of κ-bounded regular languages is learnable in polynomial time.
Abstract: We study the problem of learning an unknown language given a teacher which can only answer equivalence queries. The teacher presenting a language L can test (in unit time) whether a conjectured language L′ is equal to L and, if L′ ≠ L, provide a counterexample (i.e., a string in the symmetric difference of L and L′). It has recently been shown that the family of regular languages and the family of pattern languages are not learnable in polynomial time under this protocol. We consider the learnability of subfamilies of regular languages. It is shown that the problem of learning a subfamily of regular languages can be reduced to the problem of learning its finite members. Using this reduction, we show that the family of κ-bounded regular languages is learnable in polynomial time. We investigate how a partial ordering on counterexamples affects the learnability of the family of regular languages and the family of pattern languages. Two partial orderings are considered: ordering by length and lexicographical ordering. We show that the first ordering on counterexamples does not reduce the complexity of learning the family of regular languages. In contrast, the family of pattern languages is learnable in polynomial time if the teacher always provides counterexamples of minimal length, and the family of regular languages is learnable in polynomial time if the teacher always provides the lexicographically first counterexamples.
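The equivalence-query protocol is easy to simulate for finite languages, the case the reduction targets. In the hypothetical sketch below, the teacher returns the shortest, then lexicographically first, string in the symmetric difference. Because the learner's conjecture never contains a non-member, every counterexample is a new member, so a finite language L is identified after at most |L| + 1 queries. This illustrates the protocol only, not the paper's reduction or its κ-bounded result.

```python
def equivalence_query(target, conjecture):
    """Simulated teacher: a shortest-then-lexicographically-first string in the
    symmetric difference, or None if the conjecture is correct."""
    diff = target ^ conjecture
    return min(diff, key=lambda s: (len(s), s)) if diff else None


def learn_finite_language(target, max_queries=1000):
    """Conjecture = set of counterexamples seen so far (always a subset of target)."""
    conjecture = set()
    for queries in range(1, max_queries + 1):
        counterexample = equivalence_query(target, conjecture)
        if counterexample is None:
            return conjecture, queries
        conjecture.add(counterexample)  # must be a member, since conjecture is a subset
    raise RuntimeError("query budget exhausted")
```
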

Journal Article
TL;DR: In this article, the composite string representation and the generalized Ocneanu's trace lead to a sequence of two-variable link polynomials, and algebraic aspects of composite string representations are studied in some detail.
Abstract: New link polynomials, reported in I and II of the series, are extended into those with two variables. A concept of composite string is introduced. It is shown that the composite string representation and the generalized Ocneanu's trace lead to a sequence of two-variable link polynomials. In addition, algebraic aspects of the composite string representation are studied in some detail.

Journal Article
TL;DR: Experimental results are given which indicate that, with the exception of the don't-care method, each of these methods has a problem class in which it is clearly superior to the others.
Abstract: A description is given of a theory for, and the application of, a general algorithm for determining whether a given multilevel Boolean function is a tautology or whether two given multilevel Boolean functions are equivalent. Four specific cases of this general algorithm are examined. These are termed the flattening method, the don't-care method, the simulation method, and the algebraic string comparison method. A single unifying algorithm frame is given for the implementation of any of these four methods, depending on parameterization. Experimental results are given which indicate that, with the exception of the don't-care method, each of these methods has a problem class in which it is clearly superior to the others. The primary application of these algorithms is as a verification tool for silicon compilation systems. However, these algorithms are also being used as the foundation for multilevel logic minimization and automatic test pattern generation programs.
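In its simplest form, a simulation approach decides equivalence by evaluating both functions on every input assignment. The sketch below shows that exhaustive version, which is exponential in the number of inputs; it assumes the functions are given as Python callables and is not the paper's parameterized algorithm frame.

```python
from itertools import product


def equivalent_by_simulation(f, g, n_inputs):
    """f and g agree on all 2**n_inputs assignments iff they are equivalent.
    Returns (True, None) or (False, first distinguishing assignment)."""
    for bits in product((False, True), repeat=n_inputs):
        if f(*bits) != g(*bits):
            return False, bits
    return True, None


def is_tautology(f, n_inputs):
    """A tautology is a function equivalent to the constant True."""
    return equivalent_by_simulation(f, lambda *args: True, n_inputs)[0]
```

The returned counterexample is also a test vector, which hints at why such checkers connect naturally to test pattern generation.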

Proceedings Article
22 Aug 1988
TL;DR: It is argued that an SSTC is in fact composed of two interrelated correspondences, one between nodes and substrings and the other between subtrees and substrings, the substrings possibly being discontinuous in both cases.
Abstract: The correspondence between a string of a language and its abstract representation, usually a (decorated) tree, is not straightforward. However, it is desirable to maintain it, for example to build structured editors for texts written in natural language. As such correspondences must be compositional, we call them "Structured String-Tree Correspondences" (SSTC). We argue that an SSTC is in fact composed of two interrelated correspondences, one between nodes and substrings and the other between subtrees and substrings, the substrings possibly being discontinuous in both cases. We then proceed to show how to define an SSTC with a Structural Correspondence Static Grammar (SCSG), and which constraints to put on the rules of the SCSG to get a "natural" SSTC.
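One way to picture the two interrelated correspondences is to attach a list of token intervals to each node (the node-substring correspondence, possibly discontinuous) and to compute the subtree-substring correspondence as the union over the whole subtree. The `Node` type, the half-open token intervals, and the phrasal-verb example are hypothetical and do not reproduce the SCSG formalism.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    own: list  # half-open token intervals (start, end) tied directly to this node
    children: list = field(default_factory=list)


def merge(intervals):
    """Sorted union of half-open intervals; the result may still be discontinuous."""
    out = []
    for s, e in sorted(intervals):
        if out and s <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], e))
        else:
            out.append((s, e))
    return out


def subtree_span(node):
    """Subtree-substring correspondence: union of node-substring intervals below node."""
    spans = list(node.own)
    for child in node.children:
        spans.extend(subtree_span(child))
    return merge(spans)
```

For "he picked the book up" (tokens 0 to 4), a verb node owning "picked ... up" has the discontinuous node substring [(1, 2), (4, 5)], while its subtree, which includes "the book", covers the continuous span [(1, 5)].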

Journal Article
TL;DR: The addition of certain information to a string descriptor and enhancements to existing copying garbage collection algorithms that permit linked data structures and strings to be allocated and garbage collected from a shared region of memory in real time are described.
Abstract: Modern high-level languages frequently need to collect garbage not only from regions of linked structures similar to LISP's dotted pairs, but also from string regions where data is organized as an array of characters. Some characteristics of string regions that make garbage collection particularly difficult are as follows: multiple pointers to the same characters within the array are allowed and encouraged; all possible character values are legitimate as data, so it is not possible to ‘mark’ a string by overwriting with a reserved character; and a character is generally much smaller than a pointer, so it is not possible to overwrite a single character value with a forwarding pointer to a new location for a particular string. This paper describes the addition of certain information to a string descriptor and enhancements to existing copying garbage collection algorithms that permit linked data structures and strings to be allocated and garbage collected from a shared region of memory in real time. This algorithm is real-time in the sense that the time required for allocation of each basic unit of memory is bounded by a constant. An analysis of performance is reported, and comparisons are made with traditional garbage collection.
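The difficulties listed above (shared characters, no spare marking value, no room for forwarding pointers) are why string regions are usually compacted through their descriptors rather than through the characters themselves. The stop-and-copy sketch below assumes descriptors are (start, length) pairs: it sorts them by start, copies each maximal run of live characters once, and rewrites every descriptor so that shared and overlapping strings stay shared. It is not the paper's real-time algorithm, which bounds the work per allocation.

```python
def compact_strings(heap, descriptors):
    """heap: bytes of the string region; descriptors: (start, length) of live strings.
    Returns (new_heap, new_descriptors), the latter in the original order."""
    order = sorted(range(len(descriptors)), key=lambda i: descriptors[i][0])
    new_heap = bytearray()
    new_desc = [None] * len(descriptors)
    run_start = run_end = None  # current run of live characters in the old heap
    run_base = 0                # where that run starts in the new heap
    for i in order:
        start, length = descriptors[i]
        end = start + length
        if run_end is None or start > run_end:
            # Gap of dead characters: start a new run and copy its characters.
            run_start, run_end, run_base = start, end, len(new_heap)
            new_heap += heap[start:end]
        elif end > run_end:
            new_heap += heap[run_end:end]  # overlapping string extends the run
            run_end = end
        new_desc[i] = (run_base + (start - run_start), length)
    return bytes(new_heap), new_desc
```

Substrings such as "llo" inside "hello" keep pointing into the single surviving copy of "hello".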

Patent
10 Aug 1988
TL;DR: In this paper, a highly efficient string search algorithm and circuit are disclosed, which utilizes candidate-data-parallel, target data serial comparisons along with an early mismatch detection mechanism to locate a target in a candidate data base.
Abstract: A highly efficient string search algorithm and circuit are disclosed. The string search algorithm utilizes candidate-data-parallel, target data serial comparisons along with an early mismatch detection mechanism to locate a target in a candidate data base in a highly efficient manner.
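Candidate-parallel, target-serial comparison with early mismatch detection can be sketched with one activity flag per candidate start position. The inner loop stands in for a step the circuit would perform on all positions at once; the function name and list-based flags are assumptions, not the disclosed circuit.

```python
def parallel_search(candidate, target):
    """Return all start positions of target in candidate. Target characters are consumed
    one at a time; all start positions are tested against each one together."""
    n, m = len(candidate), len(target)
    if m == 0:
        return list(range(n + 1))
    active = [True] * max(n - m + 1, 0)  # one flag per possible start position
    for k, ch in enumerate(target):       # target-serial
        for p in range(len(active)):       # candidate-parallel (conceptually one step)
            if active[p] and candidate[p + k] != ch:
                active[p] = False
        if not any(active):
            return []  # early mismatch detection: nothing can still match
    return [p for p, a in enumerate(active) if a]
```
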

Proceedings Article
07 Jun 1988
TL;DR: The authors analyzed definitions from Webster's Seventh New Collegiate Dictionary using Sager's Linguistic String Parser and, separately, basic UNIX text processing utilities such as grep and awk; the paper evaluates both procedures and discusses possible future lines of research exploiting and combining their respective strengths.
Abstract: We have analyzed definitions from Webster's Seventh New Collegiate Dictionary using Sager's Linguistic String Parser and again using basic UNIX text processing utilities such as grep and awk. This paper evaluates both procedures, compares their results, and discusses possible future lines of research exploiting and combining their respective strengths.

Patent
TL;DR: In this paper, the authors describe a guitar tuner that includes a manually graspable body with a head projecting outwardly from one end driven by a motor inside the body to rotate about an axis.
Abstract: A guitar tuner includes a manually graspable body with a head projecting outwardly from one end, driven by a motor inside the body to rotate about an axis. The head includes a slot-shaped opening for grasping the key of a conventional instrument such as a guitar. An input sensor in the body detects a tone from the string of the instrument and converts it to a square wave of the detected frequency. A microprocessor compares this frequency with the closest intended frequency of the selected instrument and drives the motor to tighten or loosen the string as required to attain that frequency. A digital display provides a readout of the number and letter of the string concerned. The user can select any one of a number of different states of the device for use with different instruments or different tunings. The user can also override the automatic string selection in cases where the string is far from its required frequency.
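The comparison step can be sketched as picking the nearest intended string frequency, measured in cents, and deciding the motor direction. The standard-tuning table (six-string guitar, A4 = 440 Hz), the 5-cent tolerance, and the rising-edge frequency estimate are assumptions for illustration; the patent's instrument tables and display logic are not described here.

```python
import math

# Assumption: six-string guitar in standard tuning, A4 = 440 Hz.
STANDARD_TUNING = {
    "6 (E2)": 82.41,
    "5 (A2)": 110.00,
    "4 (D3)": 146.83,
    "3 (G3)": 196.00,
    "2 (B3)": 246.94,
    "1 (E4)": 329.63,
}


def square_wave_frequency(rising_edges):
    """Estimate frequency (Hz) from the times (s) of successive rising edges."""
    periods = [b - a for a, b in zip(rising_edges, rising_edges[1:])]
    return len(periods) / sum(periods)


def cents(freq, ref):
    return 1200 * math.log2(freq / ref)


def tuning_action(freq, tuning=STANDARD_TUNING, tolerance_cents=5.0):
    """Pick the closest intended string frequency and the motor direction."""
    name, target = min(tuning.items(), key=lambda kv: abs(cents(freq, kv[1])))
    offset = cents(freq, target)
    if abs(offset) <= tolerance_cents:
        return name, "in tune"
    return name, "tighten" if offset < 0 else "loosen"  # flat -> raise the tension
```

Measuring the distance in cents rather than hertz keeps the "closest string" decision consistent between low and high strings.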

Patent
Yoshiyuki Murata, Hajime Manabe
28 Dec 1988
TL;DR: In this patent, the state of each string of an electronic stringed instrument is examined by picking it before the performance; during the performance, the extracted performance-pitch data are converted, according to that string state, into data defining a properly tuned sound frequency, from which a musical tone is generated.
Abstract: In an electronic tuning apparatus used in electronic stringed instruments such as an electronic guitar or an electronic violin, at least one string is extended along the fingerboard. Before the performance, the present state of the extended string is examined by picking the string, and reference pitch data extracted through this picking are preferably stored. During a live performance, the extracted performance-pitch data are converted into data defining a properly tuned sound frequency in accordance with the state of the extended string. A musical tone with the corresponding sound frequency is then generated from these data.

Patent
29 Apr 1988
TL;DR: In this patent, a character recognition apparatus extracts a character pattern as a rectangle from input image data and computes two features for each sub-region of the pattern: a boundary direction density, from the direction codes of the boundary points of the character, and a background density, obtained by scanning from each of the four sides of the rectangle toward the opposite side.
Abstract: A character recognition apparatus is arranged such that: a character pattern is extracted as a rectangle from the input image data; for each sub-region obtained by dividing the character pattern, the number of picture elements corresponding to each direction code of the boundary points of the character portion is taken as the boundary direction density for that region; scanning is performed from each of the four sides of the extracted rectangle toward the opposite side; a picture element at which the scan changes from the background to the character portion is defined as a change point; a picture element string number is incremented by one each time a change point is detected and is assigned to each picture element; for each sub-region, the number of picture elements corresponding to each picture element string number is taken as the background density for that region; and character recognition is performed using the boundary direction density and the background density of each region.

Book Chapter
11 Feb 1988
TL;DR: A pattern is a string consisting of terminals and variables; one pattern simulates another when its language contains the language of the other.
Abstract: A pattern is a string consisting of terminals and variables. The language defined by a pattern is the set of terminal strings obtained by substituting (consistently) terminal strings for its variables. A pattern simulates another pattern when its language contains that of the other one.
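Membership in the language of a pattern can be tested by backtracking over consistent substitutions. The sketch assumes that uppercase letters are variables, that every other character is a terminal, and that substitutions are non-empty (as in Angluin's pattern languages); the source does not say whether empty substitutions are allowed. The general membership problem is NP-complete, so this search is exponential in the worst case.

```python
def matches(pattern, word, is_var=str.isupper):
    """Decide whether word is in L(pattern): each variable is replaced consistently
    by the same non-empty terminal string; terminals must match exactly."""
    def go(pi, wi, binding):
        if pi == len(pattern):
            return wi == len(word)
        sym = pattern[pi]
        if not is_var(sym):
            return wi < len(word) and word[wi] == sym and go(pi + 1, wi + 1, binding)
        if sym in binding:  # consistency: reuse the earlier substitution
            val = binding[sym]
            return word.startswith(val, wi) and go(pi + 1, wi + len(val), binding)
        for end in range(wi + 1, len(word) + 1):  # try every non-empty substitution
            binding[sym] = word[wi:end]
            if go(pi + 1, end, binding):
                return True
            del binding[sym]
        return False
    return go(0, 0, {})
```

Under these assumptions, pattern `XX` simulates pattern `XaXa`, because every string `uaua` is also of the form `ww` with `w = ua`.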

Journal Article
TL;DR: The main result of this paper establishes the equivalence between the sequences that appear random to all FPMs and the ∞-distributed sequences, in which every string of length $k$ occurs with frequency $2^{-k}$, for all positive integers $k$.
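The ∞-distribution condition can be checked empirically on a finite or cyclic binary string by counting overlapping length-k windows. A finite prefix can only approximate the limiting frequencies; the de Bruijn sequence in the example is an assumption chosen because, read cyclically, it contains every length-3 binary string exactly once.

```python
from collections import Counter
from itertools import product


def block_frequencies(bits, k, cyclic=False):
    """Relative frequency of each binary string of length k among the overlapping
    length-k windows of bits (wrapping around if cyclic). For an infinity-distributed
    sequence these frequencies tend to 2**-k on longer and longer prefixes."""
    s = bits + bits[:k - 1] if cyclic else bits
    windows = [s[i:i + k] for i in range(len(s) - k + 1)]
    counts = Counter(windows)
    total = len(windows)
    return {"".join(w): counts["".join(w)] / total for w in product("01", repeat=k)}
```
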

Patent
25 May 1988
TL;DR: In this article, a look-up table is used to encode all character sets of one or two letters into a coded string, where search of the memory is limited to only a few main branches of the tree.
Abstract: Improvements in a hand-held spelling machine increase the speed with which a query word is compared against the words in memory. One technique is to provide a look-up table that encodes all character sets of one or two letters into a coded string; where the set of letters is three or more characters, a previously known algorithm is employed. Search of the memory is limited to only a few main branches of the tree, the limitation being a function of the first letter of the query word. The time it takes to calculate the similarity function is saved in two circumstances. When a similarity function is calculated at a particular level of the tree and found to be great enough that the tree is not pruned, that decision not to prune is carried forward for other tree branches having the same letters prior to the level involved.
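The carry-forward idea resembles a standard technique for approximate dictionary lookup: walking a letter trie while extending one Levenshtein row per edge, so that words sharing leading letters share that work, and pruning a branch once no entry of its row is within the threshold. The sketch uses that technique as a stand-in; the patent's letter-pair coding and similarity function are not specified here, and the names are hypothetical.

```python
class TrieNode:
    __slots__ = ("children", "word")

    def __init__(self):
        self.children = {}
        self.word = None  # set when a stored word ends at this node


def build_trie(words):
    root = TrieNode()
    for w in words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, TrieNode())
        node.word = w
    return root


def fuzzy_search(root, query, max_dist):
    """Return (word, distance) pairs within max_dist edits of query, closest first."""
    results = []

    def walk(node, ch, prev_row):
        # One new DP row per trie edge; the row for a prefix is shared by all
        # words below it.
        row = [prev_row[0] + 1]
        for j in range(1, len(query) + 1):
            row.append(min(row[j - 1] + 1, prev_row[j] + 1,
                           prev_row[j - 1] + (query[j - 1] != ch)))
        if node.word is not None and row[-1] <= max_dist:
            results.append((node.word, row[-1]))
        if min(row) <= max_dist:  # otherwise no extension can get close enough
            for next_ch, child in node.children.items():
                walk(child, next_ch, row)

    first_row = list(range(len(query) + 1))
    for ch, child in root.children.items():
        walk(child, ch, first_row)
    return sorted(results, key=lambda r: (r[1], r[0]))
```

In the example, the whole "d" branch is abandoned after two letters, so "dog" is never fully compared.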

Patent
TL;DR: In this article, a tremolo device for stringed musical instruments is described, comprising a string support assembly consisting of at least two individually pivoted segments, each individually carrying a string anchoring member, a spring for biasing the string support assembly segment in the direction that applies tension to the string, and a handle bar for pivoting the segment against the biasing force of the spring to reduce the tension of the string.
Abstract: To obtain favorable vibrato effects, a tremolo device for stringed musical instruments is provided, comprising a string support assembly consisting of at least two individually pivoted segments, each individually carrying a string anchoring member, a spring for biasing the string support assembly segment in the direction that applies tension to the string, and a handle bar for pivoting the segment against the biasing force of the spring in the direction that reduces the tension of the string. By selectively or jointly moving the handle bars of the different segments, hitherto unknown special vibrato effects can be created. Optionally, provision may be made for the different segments to be coupled into a single body for producing conventional vibrato effects. Since the string support assembly is segmented, breakage of a single string affects only the corresponding segment and not the other segments, which simplifies retuning after the broken string is replaced.

01 Jan 1988
TL;DR: An edit distance algorithm is presented that runs in $O(\log m \log n)$ time and uses $mn$ processors on a CRCW PRAM, where $m$ and $n$ are the lengths of the strings; in addition, the problem of finding the largest common submatrix of two matrices is considered and shown to be NP-hard.
Abstract: We consider the problem of determining in parallel the cost of converting a source string to a destination string by a sequence of insert, delete and transform operations. Each operation has an integer cost in some fixed range. We present an algorithm that runs in $O(\log m \log n)$ time and uses $mn$ processors on a CRCW PRAM, where $m$ and $n$ are the lengths of the strings. The best known sequential algorithm [MP83] runs in time $O(n^2/\log n)$ for strings of length $n$, indicating that our parallel algorithm (with time-processor product equal to $O(mn \log m \log n)$) is nearly optimal. An instance of the edit distance problem is represented as a graph. The algorithm finds the shortest path in the graph using a path-doubling method with efficient pruning due to the structure of the problem. Extensions of the algorithm solve approximate string matching and local best fit problems. The problem of finding the largest common submatrix of two matrices is considered and shown to be NP-hard. Finally, we present an algorithm for exact two-dimensional pattern matching that runs in $O(\log n)$ time using $n$ processors for an $n \times n$ search matrix.
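The sequential computation that such parallel algorithms are measured against can be sketched as the classic dynamic program. Each cell d[i][j] holds the cheapest cost of turning the first i source characters into the first j destination characters; a natural graph view treats cells as vertices and the three operations as weighted edges, so the edit distance is a shortest-path length from (0, 0) to (m, n). Operation costs are parameters here, and the parallel path-doubling method itself is not shown.

```python
def edit_distance(source, dest, ins=1, delete=1,
                  transform=lambda a, b: 0 if a == b else 1):
    """O(mn) dynamic program for the cheapest sequence of insert, delete and
    transform operations converting source into dest."""
    m, n = len(source), len(dest)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + delete   # delete every source character
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + ins      # insert every destination character
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + delete,
                          d[i][j - 1] + ins,
                          d[i - 1][j - 1] + transform(source[i - 1], dest[j - 1]))
    return d[m][n]
```

With unequal costs the cheapest path can change shape: if a mismatched transform costs more than a delete plus an insert, the program routes around the diagonal edge.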

Proceedings Article
Allen Louis Gorin, D.B. Roe
11 Apr 1988
TL;DR: The authors describe a parallel frame-synchronous level-building algorithm, utilizing HMM word-models, for connected-speech recognition on a tree-structured parallel computer that achieves 98.3% string accuracy on the Texas Instruments digit data base.
Abstract: The authors describe a parallel frame-synchronous level-building algorithm, utilizing HMM word-models, for connected-speech recognition on a tree-structured parallel computer. The algorithm is scalable in the sense that the source code and execution time remain essentially the same as vocabulary size increases, so long as the hardware is scaled proportionally. An illustrative sizing and timing analysis of a speaker-independent connected-digit recognizer on the ASPEN tree-machine is described. This algorithm executes in real time and achieves 98.3% string accuracy on the Texas Instruments digit data base.


Patent
08 Apr 1988
TL;DR: A sorting technique which relies on the operating system collating weights of characters to the extent that a collating weight difference exists in any of the pairs of corresponding characters of two different strings of characters being compared is described in this article.
Abstract: A sorting technique which relies on the operating system collating weights of characters to the extent that a collating weight difference exists in any of the pairs of corresponding characters of two different strings of characters being compared. While this comparison is being made, the first tie of collating weights for a pair of nonidentical corresponding characters triggers a comparison of the ASCII code values of the two corresponding characters which tied. Assuming that such a tie has occurred, if no differences in the collating weights of corresponding characters are found by the end of this process, then the string whose tied character has the lower ASCII value is considered to precede the other string in the alphabetic sequence. This results in an automated alphabetizing procedure which is consistent regardless of the order in which the character strings are sorted, while retaining the flavor of the language conventions when possible, and while providing a solution not requiring substantial extra computing power.
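The comparison rule can be sketched as a comparator. The collating weights are a hypothetical stand-in (case-insensitive code points) for the operating system's tables, and placing a string before a longer string whose weights it prefixes is an assumption, since the abstract does not say how strings of different lengths are handled.

```python
import functools


def collating_weight(ch):
    """Hypothetical stand-in for OS collating weights: letters compare case-insensitively."""
    return ord(ch.lower())


def compare(a, b, weight=collating_weight):
    first_tie = 0  # ASCII verdict from the first weight tie between distinct characters
    for x, y in zip(a, b):
        wx, wy = weight(x), weight(y)
        if wx != wy:
            return -1 if wx < wy else 1  # a collating-weight difference decides
        if x != y and first_tie == 0:
            first_tie = -1 if ord(x) < ord(y) else 1
    if len(a) != len(b):
        return -1 if len(a) < len(b) else 1  # assumption: weight-prefix sorts first
    return first_tie  # no weight difference: fall back to the recorded ASCII tie


def alphabetize(strings):
    return sorted(strings, key=functools.cmp_to_key(compare))
```

Because the tiebreak depends only on the two strings being compared, the result does not depend on the order in which the input is presented.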