Showing papers on "String (computer science)" published in 1993

Patent•

Method for extracting from a text corpus answers to questions stated in natural language by using linguistic analysis and hypothesis generation

Julian M. Kupiec¹•Institutions (1)

Xerox¹

24 Jun 1993

TL;DR: In this article, a computerized method for organizing information retrieval based on the content of a set of primary documents is presented, which generates answer hypotheses based on text found in the primary documents and, typically, a natural language input string such as a question.

Abstract: A computerized method for organizing information retrieval based on the content of a set of primary documents. The method generates answer hypotheses based on text found in the primary documents and, typically, a natural-language input string such as a question. The answer hypotheses can include phrases or words not present in the input string. Answer hypotheses are verified and ranked based on their verification evidence. A text corpus can be queried to provide verification evidence not present in the primary documents. In another aspect the method is implemented in the context of a larger two-phase method, of which the first phase comprises the method of the invention and the second phase of the method comprises answer extraction.

242 citations

Patent•DOI•

Recognition unit model training based on competing word and word string models

Wu Chou¹, Biing-Hwang Juang¹•Institutions (1)

Alcatel-Lucent¹

15 Mar 1993-Journal of the Acoustical Society of America

TL;DR: The principle of minimum recognition error rate is applied by the present invention using discriminative training and various issues related to the special structure of HMMs are presented.

Abstract: A system for pattern-based speech recognition, e.g., a hidden Markov model (HMM) based speech recognizer using Viterbi scoring. The principle of minimum recognition error rate is applied by the present invention using discriminative training. Various issues related to the special structure of HMMs are presented. Parameter update expressions for HMMs are provided.

238 citations

Patent•DOI•

Automatic speech recognizer

Enrico Bocchieri¹, Sedat Ibrahim Gahanna Gokcen¹, Rajendra Prasad Gahanna Mikkilineni¹, David B. Roe¹, Jay G. Wilpon¹•Institutions (1)

AT&T¹

25 Mar 1993-Journal of the Acoustical Society of America

TL;DR: In this paper, the authors present a method for recording data in a speech recognition system and recognizing spoken data corresponding to the recorded data using phonetic transcriptions from the entered data.

Abstract: Apparatus and method for recording data in a speech recognition system and recognizing spoken data corresponding to the recorded data. The apparatus and method responds to entered data by generating a string of phonetic transcriptions from the entered data. The data and generated phonetic transcription string associated therewith is recorded in a vocabulary lexicon of the speech recognition system. The apparatus and method responds to receipt of spoken data by constructing a model of subwords characteristic of the spoken data and compares the constructed subword model with ones of the recorded lexicon vocabulary recorded phonetic transcription strings to recognize the spoken data as the data identified by and associated with a phonetic transcription string matching the constructed subword string.

213 citations

Journal Article•DOI•

Efficient methods for multiple sequence alignment with guaranteed error bounds

Dan Gusfield¹•Institutions (1)

University of California, Davis¹

01 Jan 1993-Bulletin of Mathematical Biology

TL;DR: This paper considers two previously proposed measures and gives two computationally efficient multiple alignment methods whose deviation from the optimal value is guaranteed to be less than a factor of two, along with a related randomized method that gives, with high probability, multiple alignments with fairly small error bounds.

198 citations

Patent•DOI•

Method for recognizing alphanumeric strings spoken over a telephone network

Alan K. Hunt, Thomas B. Schalk

22 Jun 1993-Journal of the Acoustical Society of America

TL;DR: In this article, a method for recognizing alphanumeric strings spoken over a telephone network is described, where individual character recognition need not be uniformly high in order to achieve high string recognition accuracy.

Abstract: The present invention describes a method for recognizing alphanumeric strings spoken over a telephone network (10) wherein individual character recognition need not be uniformly high in order to achieve high string recognition accuracy. Preferably, the method uses a processing system (14) having a digital processor, an interface (42) to the telephone network (10), and a database (32) for storing a predetermined set of reference alphanumeric strings. In operation, the system (10) prompts the caller to speak the characters of a string, and characters are recognized using a speaker-independent voice recognition algorithm (48). The method calculates recognition distances between each spoken input character and the corresponding letter or digit in the same position within each reference alphanumeric string. After each character is spoken (206), captured and analyzed (208), each reference string distance is incremented (204) and the process is continued, accumulating distances for each reference string, until the last character is spoken. The reference string with the lowest cumulative distance is then declared to be the recognized string (210).
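
The cumulative-distance decision described in this abstract can be illustrated with a minimal Python sketch. The function name `recognize_string`, the per-position distance dictionaries, and the `missing_penalty` parameter are hypothetical, chosen only for illustration; the patent's actual distance measure and data layout are not given in the abstract. The sketch assumes every reference string has the same length as the spoken input.

```python
def recognize_string(char_distances, reference_strings, missing_penalty=1.0):
    """Pick the reference string with the lowest cumulative distance.

    char_distances: one dict per spoken position, mapping a candidate
    character to its recognition distance (lower means a better match).
    missing_penalty: assumed distance for characters the recognizer
    did not score at a given position (an illustrative choice).
    """
    totals = {ref: 0.0 for ref in reference_strings}
    for position, distances in enumerate(char_distances):
        for ref in reference_strings:
            # Compare the spoken character only against the character
            # in the same position of each reference string.
            totals[ref] += distances.get(ref[position], missing_penalty)
    return min(totals, key=totals.get)


# At position 0 the single best character is "A", yet the string "81B"
# still wins because its cumulative distance is lowest.
spoken = [
    {"A": 0.4, "8": 0.5},
    {"7": 0.9, "1": 0.1},
    {"B": 0.1},
]
best = recognize_string(spoken, ["A7B", "81B"])
```

The usage lines show why per-character accuracy need not be uniformly high: the decision is made over whole reference strings rather than character by character.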

184 citations

Patent•

Language-sensitive text searching system with modified Boyer-Moore process

Mark E. Davis, Judy Lin

25 Mar 1993

TL;DR: In this paper, a method and system for providing a language-sensitive text search that performs text comparison of any Unicode strings is presented. The string is examined and a compare is performed one or more characters at a time based on a predefined character precedence.

Abstract: A method and system for providing a language-sensitive text search that performs text comparison of any Unicode strings. For any language an ordering is defined based on features of the language. Then, an interactive compare function is performed to determine the relationship of a pair of strings. The string is examined and a compare is performed one or more characters at a time based on a predefined character precedence.

164 citations

Patent•

Forward and reverse Boyer-Moore string searching of multilingual text having a defined collation order

Mark Edward Davis

25 Mar 1993

TL;DR: In this article, a method and system for providing a language-sensitive text compare is presented that performs text comparison of any Unicode strings, and an interactive compare function is performed to determine the relationship of a pair of strings.

Abstract: A method and system for providing a language-sensitive text compare. An innovative system and method for performing the compare is presented that performs text comparison of any Unicode strings. For any language an ordering is defined based on features of the language. Then, an interactive compare function is performed to determine the relationship of a pair of strings. The string is examined and a compare is performed one character at a time based on a predefined character precedence.

163 citations

Book Chapter•DOI•

Approximate String-Matching over Suffix Trees

Esko Ukkonen¹•Institutions (1)

University of Helsinki¹

02 Jun 1993

TL;DR: It is shown how the searches can be done fast using the suffix tree of T augmented with the suffix links as the preprocessed form of T and applying dynamic programming over the tree.

Abstract: The classical approximate string-matching problem of finding the locations of approximate occurrences P′ of pattern string P in text string T such that the edit distance between P and P′ is ≤ k is considered. We concentrate on the special case in which T is available for preprocessing before the searches with varying P and k. It is shown how the searches can be done fast using the suffix tree of T augmented with the suffix links as the preprocessed form of T and applying dynamic programming over the tree. Three variations of the search algorithm are developed with running times $O(mq + n)$, $O(mq \log q + \text{size of the output})$, and $O(m^2 q + \text{size of the output})$. Here $n = |T|$, $m = |P|$, and $q$ varies depending on the problem instance between 0 and $n$. In the case of the unit cost edit distance it is shown that $q = O(\min(n, m^{k+1}|\Sigma|^k))$, where $\Sigma$ is the alphabet.
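
To make the problem statement concrete, here is a minimal Python sketch of the plain column-by-column dynamic program for approximate matching under unit-cost edit distance. It is not the suffix-tree method of the chapter: it scans T directly in O(mn) time and serves only as a baseline that shows what "approximate occurrence with edit distance ≤ k" means. The function name `approximate_occurrences` is an assumption made for this example.

```python
def approximate_occurrences(pattern, text, k):
    """Return 1-based end positions j in text such that some substring
    ending at j has unit-cost edit distance <= k from pattern."""
    m = len(pattern)
    # column[i] holds D[i][j]: the best edit distance between
    # pattern[:i] and any substring of text ending at position j.
    column = list(range(m + 1))
    ends = []
    for j, text_char in enumerate(text, start=1):
        prev_diag = column[0]  # D[0][j-1]
        column[0] = 0          # an occurrence may start anywhere in text
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == text_char else 1
            new_value = min(
                prev_diag + cost,   # match or substitution
                column[i] + 1,      # extra character in text
                column[i - 1] + 1,  # missing character of pattern
            )
            prev_diag = column[i]
            column[i] = new_value
        if column[m] <= k:
            ends.append(j)
    return ends


# "ab" (one deletion) ends at position 4; "abd" (one change) ends at 5.
ends = approximate_occurrences("abc", "xxabdxx", 1)
```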

159 citations

Journal Article•DOI•

Applications of approximate string matching to 2D shape recognition

Horst Bunke¹, Urs Bühler¹•Institutions (1)

University of Bern¹

01 Dec 1993-Pattern Recognition

TL;DR: A new method for the recognition of arbitrary two-dimensional shapes based on string edit distance computation is described, which is invariant under translation, rotation, scaling and partial occlusion.

154 citations

A String Representation of LDAP Search Filters

T. Howes

01 Dec 1993

TL;DR: This document defines a human-readable string format for representing LDAP search filters, as a counterpart to the network representation of a search filter that LDAP transmits to an LDAP server.

Abstract: The Lightweight Directory Access Protocol (LDAP) [1] defines a network representation of a search filter transmitted to an LDAP server. Some applications may find it useful to have a common way of representing these search filters in a human-readable form. This document defines a human-readable string format for representing LDAP search filters.
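
As an illustration of what such a human-readable filter string looks like, the following Python sketch composes filters in the parenthesized prefix notation commonly associated with LDAP filter strings (`&` for and, `|` for or, `!` for not, and items such as `attr=value`). The helper names are hypothetical, and escaping of special characters inside values is deliberately left out, so this is a sketch rather than a conforming encoder.

```python
ITEM_OPERATORS = {"=", "~=", ">=", "<="}


def filter_item(attr, op, value):
    """Build a single comparison such as (cn=Babs Jensen)."""
    if op not in ITEM_OPERATORS:
        raise ValueError(f"unsupported filter operator: {op!r}")
    return f"({attr}{op}{value})"


def filter_and(*filters):
    """All sub-filters must match: (&(...)(...))."""
    return "(&" + "".join(filters) + ")"


def filter_or(*filters):
    """At least one sub-filter must match: (|(...)(...))."""
    return "(|" + "".join(filters) + ")"


def filter_not(inner):
    """Negate one sub-filter: (!(...))."""
    return "(!" + inner + ")"


# Persons whose surname is Jensen or whose common name starts with "Babs J".
example = filter_and(
    filter_item("objectClass", "=", "Person"),
    filter_or(
        filter_item("sn", "=", "Jensen"),
        filter_item("cn", "=", "Babs J*"),
    ),
)
```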

140 citations

Patent•

Universal symbolic handwriting recognition system

Frank C. Carman II

17 Sep 1993

TL;DR: In this article, a universal symbolic handwriting recognition system for converting user entered time ordered stroke sequences into computer readable text is described, which operates on two levels: (1) a word-level recognizer, which recognizes the entire group of strokes as a unit, and (2) a parser-level recognizer, which breaks the strokes into segments and recognizes groups of stroke segments within a word.

Abstract: A universal symbolic handwriting recognition system for converting user entered time ordered stroke sequences into computer readable text is described. The system operates on two levels: (1) a word-level recognizer, which recognizes the entire group of strokes as a unit, and (2) a parser-level recognizer, which breaks the strokes into segments and recognizes groups of stroke segments within a word, thus recognizing separate characters or character sequences within a word to build a complete recognition string. In both recognition levels, the system trains on actual user samples, either on an entire word, or on a character or character sequence within a word. It does so by building a user specific sample recognition data-base file of text/pattern pairs, where the text is specified by the user in a word confirmation process and the pattern, composed of an index and a feature vector, is created from the actual user input strokes. Thus, as the user continues to use the recognition system and augments his/her user specific sample recognition data-base file, the correct recognition rate climbs approaching 100 percent in normal usage. The word-level recognizer can also be used to train on abbreviations, custom shorthands, and pictographic characters, such as the Japanese Kanji, or Chinese. An abbreviated Japanese Kanji or Chinese handwritten entry can even be trained for recognition. The text in the user specific sample data-base file is maintained in the Unicode format, and the user can specify the recognized return string format as either Unicode, ANSI, or JIS.

Journal Article•DOI•

Approximate Boyer-Moore string matching

Jorma Tarhio, Esko Ukkonen

01 Apr 1993-SIAM Journal on Computing

TL;DR: The generalized Boyer–Moore algorithm is shown to solve the k mismatches problem and a related algorithm is developed for the k differences problem, where the task is to find all approximate occurrences of a pattern in a text with k differences.

Abstract: The Boyer–Moore idea applied in exact string matching is generalized to approximate string matching. Two versions of the problem are considered. The k mismatches problem is to find all approximate occurrences of a pattern string (length m) in a text string (length n) with at most k mismatches. The generalized Boyer–Moore algorithm is shown (under a mild independence assumption) to solve the problem in expected time $O\left(kn\left(\frac{1}{m-k} + \frac{k}{c}\right)\right)$, where c is the size of the alphabet. A related algorithm is developed for the k differences problem, where the task is to find all approximate occurrences of a pattern in a text with $\le k$ differences (insertions, deletions, changes). Experimental evaluation of the algorithms is reported, showing that the new algorithms are often significantly faster than the old ones. Both algorithms are functionally equivalent with the Horspool version of the Boyer–Moore algorithm when $k = 0$.
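
Since the abstract takes the Horspool version of Boyer–Moore as the $k = 0$ reference point, a minimal Python sketch of that exact-matching baseline may help. It covers only exact matching and is not the paper's generalized algorithm; the function name `horspool_search` is an assumption made for this example.

```python
def horspool_search(pattern, text):
    """Return 0-based start positions of exact occurrences of pattern."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # For each character except the pattern's last one, record the
    # distance from its rightmost occurrence to the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    hits = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            hits.append(pos)
        # Shift by the table entry of the text character aligned with
        # the pattern's last position; unseen characters shift by m.
        pos += shift.get(text[pos + m - 1], m)
    return hits


hits = horspool_search("abc", "zabcabc")
```

The shift depends only on the text character under the last pattern position, which is what distinguishes Horspool's variant from the original Boyer–Moore heuristics.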

NESL: A Nested Data-Parallel Language (Version 2.6)

Guy E. Blelloch

01 Apr 1993

TL;DR: NESL is intended to be used as a portable interface for programming a variety of parallel and vector supercomputers, and as a basis for teaching parallel algorithms, and several examples of algorithms coded in the language are described.

Abstract: This report describes NESL, a strongly-typed, applicative, data-parallel language. NESL is intended to be used as a portable interface for programming a variety of parallel and vector supercomputers, and as a basis for teaching parallel algorithms. Parallelism is supplied through a simple set of data-parallel constructs based on sequences (ordered sets), including a mechanism for applying any function over the elements of a sequence in parallel and a rich set of parallel functions that manipulate sequences. NESL fully supports nested sequences and nested parallelism -- the ability to take a parallel function and apply it over multiple instances in parallel. Nested parallelism is important for implementing algorithms with complex and dynamically changing data structures, such as required in many graph and sparse matrix algorithms. NESL also provides a mechanism for calculating the asymptotic running time for a program on various parallel machine models, including the parallel random access machine (PRAM). This is useful for estimating running times of algorithms on actual machines and, when teaching algorithms, for supplying a close correspondence between the code and the theoretical complexity. This report defines NESL and describes several examples of algorithms coded in the language. The examples include algorithms for median finding, sorting, string searching, finding prime numbers, and finding a planar convex hull. NESL currently compiles to an intermediate language called Vcode, which runs on the Cray Y-MP, Connection Machine CM-2, and Encore Multimax. For many algorithms, the current implementation gives performance close to optimized machine-specific code for these machines. Note: This report is an updated version of CMU-CS-92-103, which described version 2.4 of the language. The most significant changes in version 2.6 are that it supports polymorphic types, has an ML-like syntax instead of a lisp-like syntax, and includes support for I/O.

Patent•

Object-oriented rule-based text input transliteration system

Mark E. Davis, Judy Lin

26 Apr 1993

TL;DR: In this article, a computer implemented system and method utilizing rules instantiated in objects of an object-oriented operating system to transliterate text as it is input into a computer is disclosed.

Abstract: A computer implemented system and method utilizing rules instantiated in objects of an object-oriented operating system to transliterate text as it is input into a computer is disclosed. A number of transliterator objects are created in the storage of the computer, each one of the transliterator objects include transliteration rules arranged in the storage in a preferred order. Each of the transliteration rules contain a first language character string, a second language character string, and logic for comparing the first language character string in each of the transliteration rules to a text string that is entered into a computer to determine a subset of transliteration rules which match the entered text string. The entered text is displayed on a computer display as it is input into a computer and a particular one of the plurality of transliterator objects' logic is utilized in response to the preferred order for selecting one of the subset of transliteration rules and applying it to the first text string to display the second language character string of the selected transliteration rule on the display.

Proceedings Article•

Towards High-Quality Sound Synthesis of the Guitar and String Instruments

Matti Karjalainen, Vesa Välimäki, Zoltán Jánosy

01 Jan 1993

TL;DR: New principles to make model-based sound synthesis of the guitar and other plucked string instruments more attractive from the viewpoint of sound quality are introduced.

Abstract: The sound quality of real-time synthesis based on physical models has so far been inferior to sampling techniques. In this paper we introduce new principles to make model-based sound synthesis of the guitar and other plucked string instruments more attractive from the viewpoint of sound quality. A major improvement is achieved by estimating the model parameters and the excitation signal from the sound of an acoustic instrument. It is shown that the impulse response of the body is included in this excitation. More complex string behavior, including nonlinearities in some instruments, is briefly studied. Furthermore, different aspects of controlling the real-time synthesis model are discussed. High-quality real-time synthesis is shown to be feasible by using a single digital signal processor.

Proceedings Article•DOI•

Minimum error rate training based on N-best string models

Wu Chou¹, Chin-Hui Lee¹, Biing-Hwang Juang¹•Institutions (1)

Bell Labs¹

27 Apr 1993

TL;DR: A minimum string error rate training algorithm, segmental minimum string error rate training, is described, which takes a further step in modeling the basic speech recognition units by directly applying discriminative analysis to string level acoustic model matching.

Abstract: The authors study issues related to string level acoustic modeling in continuous speech recognition. They derive the formulation of minimum string error rate training. A minimum string error rate training algorithm, segmental minimum string error rate training, is described. It takes a further step in modeling the basic speech recognition units by directly applying discriminative analysis to string level acoustic model matching. One of the advantages of this training algorithm lies in its ability to model strings which are competitive with the correct string but are unseen in the training material. The robustness and acoustic resolution of the unit model set can therefore be significantly improved. Various experimental results have shown that significant error rate reduction can be achieved using this approach.

Proceedings Article•DOI•

On the power of polynomial time bit-reductions

Ulrich Hertrampf¹, Clemens Lautemann, Thomas Schwentick, Heribert Vollmer, Klaus W. Wagner•Institutions (1)

University of Trier¹

18 May 1993

TL;DR: The question of how complex a leaf language must be in order to characterize some given class C is investigated, which leads to the examination of the closure of different language classes under bit-reducibility.

Abstract: For a nondeterministic polynomial-time Turing machine M and an input string x, the leaf string of M on x is the 0-1-sequence of leaf-values (0 for reject, 1 for accept) of the computation tree of M with input x. The set A is said to be bit-reducible to B if there exists an M as above such that every input x is in A if and only if the leaf string of M on x is in B. A class C is definable via leaf language B, if C is the class of all languages that are bit-reducible to B. The question of how complex a leaf language must be in order to characterize some given class C is investigated. This question leads to the examination of the closure of different language classes under bit-reducibility. The question is settled for subclasses of regular languages, context free languages, and a number of time and space bounded classes, resulting in a number of surprising characterizations for PSPACE.
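
The leaf-string definition can be made concrete with a small Python sketch. It simplifies heavily: the nondeterministic machine is modeled as a deterministic function of the input and a fixed-length tuple of guess bits, so the computation tree is a full binary tree of known depth. The names `leaf_string`, `sat_machine`, and `contains_a_one` are hypothetical. The example uses the leaf language "strings containing at least one 1", which corresponds to acceptance by some accepting path and therefore to NP-style acceptance.

```python
from itertools import product


def leaf_string(machine, x, depth):
    """Concatenate the leaf values (0 or 1) of all computation paths,
    visiting guess-bit sequences in lexicographic order."""
    return "".join(
        str(machine(x, bits)) for bits in product((0, 1), repeat=depth)
    )


def sat_machine(clauses, bits):
    """One path of a guess-and-check machine for CNF satisfiability.
    clauses: lists of nonzero ints, where 2 means x2 and -2 means not x2.
    bits: the guessed assignment, with bits[i - 1] the value of x_i."""
    satisfied = all(
        any((bits[abs(lit) - 1] == 1) == (lit > 0) for lit in clause)
        for clause in clauses
    )
    return int(satisfied)


def contains_a_one(leaves):
    """Leaf language: at least one accepting path exists."""
    return "1" in leaves


# (x1 or x2) and (not x1): only the assignment x1=0, x2=1 satisfies it.
formula = [[1, 2], [-1]]
leaves = leaf_string(sat_machine, formula, depth=2)
is_satisfiable = contains_a_one(leaves)
```

Swapping `contains_a_one` for a different membership test over the same leaf string yields a different class, which is the sense in which a leaf language characterizes a complexity class.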

Proceedings Article•

On the Power of Polynomial Time Bit-Reductions (Extended Abstract)

Ulrich Hertrampf, Clemens Lautemann, Thomas Schwentick, Heribert Vollmer, Klaus W. Wagner

01 Jan 1993

TL;DR: In this paper, the question of how complex a leaf language must be in order to characterize some given class C is investigated, which leads to examining the closure of different language classes under bit-reducibility.

Abstract: For a nondeterministic polynomial time Turing machine M and an input string x, the leaf string of M on x is the 0-1-sequence of leaf-values (0 - reject, 1 - accept) of the computation tree of M with input x. The set A is said to be bit-reducible to B if there exists an M as above such that for every input x, x is in A if and only if the leaf string of M on x is in B. A class C is definable via leaf language B, if C is the class of all languages that are bit-reducible to B. We are interested in the question how complex a leaf language must be in order to characterize some given class C. This question leads to the examination of the closure of different language classes under bit-reducibility. We settle this question for subclasses of regular languages, context free languages, and a number of time and space bounded classes. As consequences we get a number of surprising characterizations for PSPACE.

Patent•

Computer system and method for converting a conversational statement to computer command language

Yasuharu Namba¹, Hiroshi Kinukawa¹, Hiroshi Tsuji¹, Satoshi Wakayama¹•Institutions (1)

Hitachi¹

14 May 1993

TL;DR: In this paper, a natural language or conversational statement in the form of a character string is input into a processor to convert the input natural language into a command language instruction for a computer program.

Abstract: An input device (12) receives a natural language or conversational statement in the form of a character string (1). A processor (14) performs a natural language analysis (2) to convert the input natural language into a command language instruction for a computer program. A morphological analysis (3) compares words of the input character string with contents of a dictionary (10) to convert the input words into preselected words indicated by the dictionary which are output as another character string. In a semantic or syntax analysis (4; FIG. 9), one of the inputted and another character strings are analyzed to generate a corresponding chained functions structure (FIGS. 2, 3). From knowledge (FIG. 5, 7) described by the plurality of chained function structures and from rules stored in knowledge memory (11), a new character string is generated. If the new character string is in command language, the command is executed (7). If the new character string is not in command language, it is reanalyzed (8, 9) to generate yet another character string. In this manner, instructions can be input by a user in any of a multiplicity of national languages in conversational format and converted into appropriate command instructions for an executed computer program.

Suffix trees and their applications in string algorithms

Roberto Grossi¹, Giuseppe F. Italiano²•Institutions (2)

University of Pisa¹, Ca' Foscari University of Venice²

01 Jan 1993

TL;DR: Special emphasis is given to the most recent developments in this area, such as parallel algorithms for suffix tree construction and generalizations of suffix trees to higher dimensions, which are important in multidimensional pattern matching.

Abstract: The suffix tree is a compacted trie that stores all suffixes of a given text string. This data structure has been intensively employed in pattern matching on strings and trees, with a wide range of applications, such as molecular biology, data processing, text editing, term rewriting, interpreter design, information retrieval, abstract data types and many others. In this paper, we survey some applications of suffix trees and some algorithmic techniques for their construction. Special emphasis is given to the most recent developments in this area, such as parallel algorithms for suffix tree construction and generalizations of suffix trees to higher dimensions, which are important in multidimensional pattern matching.

Work partially supported by the ESPRIT BRA ALCOM II under contract no. 7141 and by the Italian MURST Project “Algoritmi, Modelli di Calcolo e Strutture Informative”. Part of this work was done while the author was visiting AT&T Bell Laboratories. Email: grossi@di.unipi.it

Work supported in part by the Commission of the European Communities under ESPRIT LTR Project no. 20244 (ALCOM–IT), by the Italian MURST Project “Efficienza di Algoritmi e Progetto di Strutture Informative”, and by a Research Grant from University of Venice “Ca’ Foscari”. Part of this work was done while at University of Salerno. Email: italiano@dsi.unive.it. URL: http://www.dsi.unive.it/~italiano.
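
The abstract's definition (a compacted trie storing all suffixes of a text) can be illustrated with a minimal Python sketch of the uncompacted version, a plain suffix trie built by naive insertion. This is a simplification for illustration only: it uses quadratic time and space, whereas a true suffix tree merges chains of single-child nodes into labeled edges and can be built in linear time. The function names are assumptions made for this example.

```python
def build_suffix_trie(text):
    """Insert every suffix of text into a trie of nested dicts."""
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            # Follow the existing child or create a new one.
            node = node.setdefault(ch, {})
    return root


def contains_substring(trie, pattern):
    """A pattern occurs in the text exactly when it labels a path
    from the root, because every substring is a prefix of a suffix."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


trie = build_suffix_trie("banana")
found = contains_substring(trie, "nan")
missing = contains_substring(trie, "nab")
```

Once the structure is built, a lookup takes time proportional to the pattern length regardless of the text length, which is the property that makes suffix trees attractive for repeated pattern matching.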

Proceedings Article•DOI•

Using an annotated corpus as a stochastic grammar

Rens Bod¹•Institutions (1)

University of Amsterdam¹

21 Apr 1993

TL;DR: It is proved that it is not possible to create for every DOP-model a strongly equivalent stochastic CFG which also assigns the same probabilities to the parses, and it is shown that the maximum probability parse can be estimated in polynomial time by applying Monte Carlo techniques.

Abstract: In Data Oriented Parsing (DOP), an annotated corpus is used as a stochastic grammar. An input string is parsed by combining subtrees from the corpus. As a consequence, one parse tree can usually be generated by several derivations that involve different subtrees. This leads to a statistics where the probability of a parse is equal to the sum of the probabilities of all its derivations. In (Scha, 1990) an informal introduction to DOP is given, while (Bod, 1992a) provides a formalization of the theory. In this paper we compare DOP with other stochastic grammars in the context of Formal Language Theory. It is proved that it is not possible to create for every DOP-model a strongly equivalent stochastic CFG which also assigns the same probabilities to the parses. We show that the maximum probability parse can be estimated in polynomial time by applying Monte Carlo techniques. The model was tested on a set of hand-parsed strings from the Air Travel Information System (ATIS) spoken language corpus. Preliminary experiments yield 96% test set parsing accuracy.

A dynamic routing control based on a genetic algorithm

Norio Shimamoto, Atsushi Hiramatsu, Kimiyoshi Yamasaki

01 Jan 1993

TL;DR: It is demonstrated that dynamic routing control based on a genetic algorithm can provide flexible real-time management of the dynamic traffic changes in broadband networks and a string structure is proposed, each of whose elements represents paths between each pair of origin and destination terminal nodes.

Abstract: It is demonstrated that dynamic routing control based on a genetic algorithm can provide flexible real-time management of the dynamic traffic changes in broadband networks. A string structure is proposed, each of whose elements represents paths between each pair of origin and destination terminal nodes, together with a new technique using the past solutions as the initial data for new searches. These techniques dramatically improve the efficiency and convergence speed of the genetic algorithm. Computer simulations show that the genetic algorithm using the proposed techniques can generate the exact solution of path arrangement and can find a routing arrangement that keeps the traffic loss-rate below a target value, even after changes in traffic.

Patent•

Software-efficient pseudorandom function and the use thereof for encryption

Don Coppersmith¹, Phillip Rogaway¹•Institutions (1)

IBM¹

06 Dec 1993

TL;DR: In this article, a software-efficient pseudorandom function maps an index and an encryption key to a pseudorandom bit string useful for constructing a stream cipher. The key is preprocessed into a table of pseudorandom values, the index and values from the table are used to generate initial values for a set of registers, and at least some of the register values are then modified by replacing a register's current value with a function of that value and a value retrieved from the table.

Abstract: A software-efficient pseudorandom function maps an index and an encryption key to a pseudorandom bit string useful for constructing a stream cipher. The method begins by preprocessing the encryption key into a table of pseudorandom values. The index and a set of values from the table is then used to generate a set of initial values for the registers. At least some of the register values are modified in part by taking a current value of a register and replacing the current value with a function of the current value and a value retrieved from the table, the latter value being determined by the values in one or more other registers. After modifying the register values in this fashion, the values are masked using other values from the table and the results then concatenated into the pseudorandom bit string. The modification step is repeated and a new masked function of the register values is then concatenated into the pseudorandom bit string. The modification and concatenation steps are repeated to continue growing the pseudorandom bit string until the string reaches some desired length.

Patent•

Character inputting method allowing input of a plurality of different types of character species, and information processing equipment adopting the same

Ito Jun¹, Yasumasa Matsuda¹, Hiroyuki Kumai¹, Nakajima Akira¹, Shigeki Taniguchi¹, Hirobumi Kashiwa¹, Toyokazu Suzuki¹, Masaki Kawase¹, Hiromi Tomita¹•Institutions (1)

Hitachi¹

11 Jun 1993

TL;DR: In this article, information processing equipment is presented that makes it easy to switch between character species and lets characters be entered without the user consciously designating a character mode.

Abstract: Information processing equipment that makes it easy to change over between character species, and in which characters can be entered without the user being conscious of designating a character mode. Signals entered from an input device are handled in conformity with all of a romaji (Roman character) input system, a kana (Japanese syllabary) input system and an alphanumeric input system. Results obtained with the respective input systems are all displayed in a predetermined part of a display screen. The character mode intended by the user is estimated from the entered character string and is selected automatically. In another aspect, a controller determines whether or not to start a character mode likelihood decision unit and a character code translation unit, according to the results of comparing the key code string with the registered contents of a learning information buffer. The position in the entered key code string that corresponds to the length of the longest one of the key code strings is detected as a boundary position; the entered key code string is translated into character codes, with the unit of translation being the key code string extending up to the detected boundary position, and the translated character codes are displayed.

Patent•

Processor for processing data string by byte-by-byte

Robert M. Dinkjian¹, Lisa C. Heller¹, Steven R. Kordus¹, Kenneth A. Lauricella¹, Thomas W. Seigendall¹, Robert A. Skaggs¹, Nelson S. Xu¹•Institutions (1)

IBM¹

12 Jan 1993

TL;DR: A data processor handles strings that do not begin or end at a memory boundary by generating a byte count mask, shifting it by the string's byte offset so that positions outside the string hold 0s, and combining it with the output of a byte-by-byte end-condition comparator to produce a string write mask.

Abstract: A data processor processes data strings from memory where the data strings do not begin or end at a memory boundary. A string is defined in memory by a starting address, a byte count defining the total number of bytes in the string, and a byte offset defining the position of the first byte in the starting address location. The processor stores the byte count and decrements the byte count as each multi-byte word is processed. A byte count mask circuit generates a byte count mask which has all 1s for each byte count greater than the number of bytes per memory word. When the number of bytes remaining to be processed is below the number of bytes in a memory word, the byte count mask generates 1s only for the positions corresponding to the positions of bytes of the string in the last memory word. An offset register stores the offset defining the position of the first byte in the first memory word of the string. The offset is used to shift the byte count mask by a number of positions corresponding to the position of the first byte of the string and inserts 0s in the byte count mask for positions not belonging to the string. A byte-by-byte comparator determines string end conditions and provides an output word with a significant bit indication for each byte for which an end condition has been detected. The output of the byte-by-byte comparator is combined with the shifted byte count mask, and the result is decoded by means of a prioritized decoder which generates a string write mask.
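
The mask logic described above can be modelled in software roughly as follows. This is a hedged sketch, not the patented circuit: the 8-byte word width, the choice of a zero byte as the end condition, the decision to include the terminating byte in the write mask, and the function names are all assumptions made for illustration. Masks are lists of 0/1 flags, one per byte position in the memory word.

```python
WORD_BYTES = 8  # assumed memory word width


def byte_count_mask(remaining):
    """All 1s while at least a full word remains, else 1s only for the leftover bytes."""
    ones = min(max(remaining, 0), WORD_BYTES)
    return [1] * ones + [0] * (WORD_BYTES - ones)


def shift_mask(mask, offset):
    """Shift the mask by the byte offset, inserting 0s for positions before the string."""
    return ([0] * offset + mask)[:WORD_BYTES]


def end_flags(word, terminator=0):
    """Byte-by-byte comparator: flag every byte that meets the (assumed) end condition."""
    return [1 if b == terminator else 0 for b in word]


def write_mask(word, remaining, offset=0, terminator=0):
    """Combine comparator output with the shifted mask and keep bytes up to the first end."""
    valid = shift_mask(byte_count_mask(remaining), offset)
    ends = [v & e for v, e in zip(valid, end_flags(word, terminator))]
    # Prioritized decode: the first flagged position inside the string wins.
    first_end = ends.index(1) if 1 in ends else WORD_BYTES
    return [v if i <= first_end else 0 for i, v in enumerate(valid)]
```

For a first word `b"\x00\x00\x00AB\x00CD"` with offset 3 and 20 bytes remaining, `write_mask` returns `[0, 0, 0, 1, 1, 1, 0, 0]`: the three leading bytes are outside the string, and everything after the terminator at position 5 is masked off.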

Proceedings Article•DOI•

Optimally fast parallel algorithms for preprocessing and pattern matching in one and two dimensions

Richard Cole¹, Maxime Crochemore², Z. Galil², Leszek Gasieniec², R. Hariharan², S. Muthukrishnan², Kunsoo Park², Wojciech Rytter²•Institutions (2)

Courant Institute of Mathematical Sciences¹, New York University²

03 Nov 1993

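TL;DR: An optimal parallel algorithm computes a deterministic sample of a sufficiently long substring in constant time, removing a bottleneck in pattern preprocessing; among the results built on it is a constant expected time Las Vegas algorithm for computing the period and all witnesses of the pattern, solving the main open problem remaining in string matching.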

Abstract: All algorithms below are optimal alphabet-independent parallel CRCW PRAM algorithms. In one dimension: Given a pattern string of length $m$ for the string-matching problem, we design an algorithm that computes a deterministic sample of a sufficiently long substring in constant time. This problem used to be a bottleneck in the pattern preprocessing for one- and two-dimensional pattern matching. The best previous time bound was $O(\log^2 m / \log\log m)$. We use this algorithm to obtain the following results. 1. Improving the preprocessing of the constant-time text search algorithm from $O(\log^2 m / \log\log m)$ to $O(\log\log m)$, which is now best possible. 2. A constant-time deterministic string-matching algorithm in the case that the text length $n$ satisfies $n = \Omega(m^{1+\epsilon})$ for a constant $\epsilon > 0$. 3. A simple probabilistic string-matching algorithm that has constant time with high probability for random input. 4. A constant expected time Las-Vegas algorithm for computing the period of the pattern and all witnesses and thus string matching itself, solving the main open problem remaining in string matching.
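
To make the terms "period" and "witness" in result 4 concrete, here is a small sequential sketch. It only illustrates the definitions and is in no way the parallel CRCW PRAM algorithm of the paper; the function names and the convention that a witness against shift d is the first index j with pattern[j] != pattern[j + d] are assumptions for the example.

```python
def witness(pattern, d):
    """Return an index j with pattern[j] != pattern[j + d], or None if d is a period."""
    for j in range(len(pattern) - d):
        if pattern[j] != pattern[j + d]:
            return j
    return None


def period_and_witnesses(pattern):
    """Return the smallest period and a witness against every smaller shift."""
    m = len(pattern)
    witnesses = {}
    for d in range(1, m):
        w = witness(pattern, d)
        if w is None:
            return d, witnesses  # d is the smallest shift with no mismatch
        witnesses[d] = w
    return m, witnesses  # no proper period: the pattern's length is its period
```

For `"abcabcab"` this returns period 3 with witnesses `{1: 0, 2: 0}`: shifting by 1 or 2 already mismatches at position 0, while shifting by 3 aligns the pattern with itself.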

Patent•

Incremental search content addressable memory for increased data compression efficiency

Terry Parks

07 Jul 1993

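TL;DR: A content addressable memory (CAM) performs string searches in hardware using a comparator and a single-bit flip-flop per byte. The comparison used for the first character of a search ignores the previous flip-flop, while the second type of comparison takes into account the value latched for the preceding byte of the CAM.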

Abstract: A content addressable memory (CAM) which is capable of performing string search functions in hardware. The implementation of string search in hardware eliminates the requirement of software to perform this function and thus significantly increases data compression performance. Each byte or memory storage unit of the CAM includes a comparator and a single bit flip-flop. The comparator asserts a match signal to the flip-flop if the contents of a memory storage unit match external data and a prior memory storage unit match signal is asserted. Two types of comparison operations are provided by each memory storage unit. The first ignores the contents of a previous flip-flop, and this comparison operation is used for the first character of a string search. The second type of comparison operation takes into account the value latched in the preceding byte of the CAM. For example, the comparison for byte N only matches if the previous comparison for byte N-1 matched. Thus long sequences of bytes can be searched for with a byte wide CAM. Therefore, string searches can be performed in hardware, thus increasing data compression performance.
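
The two comparison types can be simulated in software as below. This is a minimal behavioural sketch under stated assumptions, not the patented hardware: the class name `IncrementalSearchCAM`, the method names, and the helper `find_all` are hypothetical, and all cells are updated "in parallel" by building the new flag list from the previous one.

```python
class IncrementalSearchCAM:
    """Software model: one stored byte and one match flip-flop per cell."""

    def __init__(self, data):
        self.cells = bytes(data)
        self.flags = [False] * len(self.cells)

    def search_first(self, byte):
        """First comparison type: ignore the previous cell's flip-flop."""
        self.flags = [c == byte for c in self.cells]
        return any(self.flags)

    def search_next(self, byte):
        """Second type: cell N matches only if cell N-1 matched on the previous step."""
        prev = self.flags
        self.flags = [
            i > 0 and prev[i - 1] and self.cells[i] == byte
            for i in range(len(self.cells))
        ]
        return any(self.flags)

    def match_ends(self):
        """Indices of cells whose flip-flop is currently set."""
        return [i for i, f in enumerate(self.flags) if f]


def find_all(cam, needle):
    """Feed the needle one byte at a time and return all match start positions."""
    if not needle:
        return []
    ok = cam.search_first(needle[0])
    for b in needle[1:]:
        if not ok:
            break
        ok = cam.search_next(b)
    if not ok:
        return []
    return [end - len(needle) + 1 for end in cam.match_ends()]
```

With the CAM loaded with `b"abracadabra"`, `find_all(cam, b"abra")` returns `[0, 7]`. Each input byte costs one step regardless of how many cells hold candidate matches, which is the property the abstract credits for faster compression.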

Patent•

Comparing prioritizing memory for string searching in a data compression system

Bijan Eskandari-Gharnin¹, Galen G. Kerber¹•Institutions (1)

Storage Technology Corporation¹

22 Dec 1993

TL;DR: A Comparing and Prioritizing (CAP) memory allows stored data to be string searched at high speeds by sequentially determining the locations of stored strings that are identical to a string in an incoming data stream; in a preferred embodiment, its output is used for data compression.

Abstract: A method and apparatus that allows very fast string searches, wherein a new type of data structure called a Comparing and Prioritizing (CAP) Memory is utilized. The CAP memory allows data stored therein to be string searched at high speeds. That is, the CAP memory provides the ability to sequentially determine one or more locations of strings that exist in its data memory that are identical to a string in an incoming data stream. In a preferred embodiment, the output of the CAP memory is used for data compression.

Journal Article•DOI•

Dynamic programming alignment of sequences representing cyclic patterns

Jens Gregor¹, Michael G. Thomason¹•Institutions (1)

University of Tennessee¹

01 Feb 1993-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A guided search algorithm uses bounds on alignment costs to find all optimal cyclic shifts and corresponding optimal alignment cost for strings representing cyclic patterns.

Abstract: String alignment by dynamic programming is generalized to include cyclic shift and corresponding optimal alignment cost for strings representing cyclic patterns. A guided search algorithm uses bounds on alignment costs to find all optimal cyclic shifts. The bounds are derived from submatrices of an initial dynamic programming matrix. Algorithmic complexity is analyzed for major stages in the search. The applicability of the method is illustrated with satellite DNA sequences and circularly permuted protein sequences.
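
As a point of reference for what the guided search computes, the following brute-force baseline aligns every cyclic shift of one string against the other and reports the minimum cost with all shifts that achieve it. It is not the paper's bounded guided search: unit edit costs and the function names are assumptions, and it runs a full dynamic programming pass per shift, which is exactly the work the bounds are meant to avoid.

```python
def edit_distance(a, b):
    """Standard dynamic programming alignment cost with unit edit costs (assumed)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = cur
    return prev[-1]


def cyclic_alignment(a, b):
    """Return (best cost, all cyclic shifts k of a that achieve it) by exhaustive search."""
    n = len(a)
    costs = [edit_distance(a[k:] + a[:k], b) for k in range(max(n, 1))]
    best = min(costs)
    return best, [k for k, c in enumerate(costs) if c == best]
```

For example, `cyclic_alignment("cdab", "abcd")` returns `(0, [2])`, since rotating `"cdab"` by two positions reproduces `"abcd"` exactly.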

Using the Domain Name System To Store Arbitrary String Attributes

R. Rosenbaum

01 May 1993

TL;DR: While the Domain Name System (DNS) is generally used to store predefined types of information (e.g., addresses of hosts), it is possible to use it to store information that has not been previously classified.

Abstract: While the Domain Name System (DNS) [2,3] is generally used to store predefined types of information (e.g., addresses of hosts), it is possible to use it to store information that has not been previously classified.
