Showing papers on "String (computer science)" published in 1995


Journal ArticleDOI
13 Oct 1995-Science
TL;DR: The results suggest that the representation of different parts of the body in the primary somatosensory cortex of humans depends on use and changes to conform to the current needs and experiences of the individual.
Abstract: Magnetic source imaging revealed that the cortical representation of the digits of the left hand of string players was larger than that in controls. The effect was smallest for the left thumb, and no such differences were observed for the representations of the right hand digits. The amount of cortical reorganization in the representation of the fingering digits was correlated with the age at which the person had begun to play. These results suggest that the representation of different parts of the body in the primary somatosensory cortex of humans depends on use and changes to conform to the current needs and experiences of the individual.

1,821 citations


Journal ArticleDOI
TL;DR: An on-line algorithm is presented for constructing the suffix tree for a given string in time linear in the length of the string, developed as a linear-time version of a very simple algorithm for (quadratic size) suffix tries.
Abstract: An on-line algorithm is presented for constructing the suffix tree for a given string in time linear in the length of the string. The new algorithm has the desirable property of processing the string symbol by symbol from left to right. It always has the suffix tree for the scanned part of the string ready. The method is developed as a linear-time version of a very simple algorithm for (quadratic size) suffix tries. Regardless of its quadratic worst case, this latter algorithm can be a good practical method when the string is not too long. Another variation of this method is shown to give, in a natural way, the well-known algorithms for constructing suffix automata (DAWGs).

1,528 citations
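The "very simple algorithm" for suffix tries mentioned in the abstract above can be sketched roughly as follows. This is a minimal illustration of the quadratic-size suffix trie built symbol by symbol with suffix links, not the linear-time suffix tree construction itself; the names `Node`, `build_suffix_trie`, and `contains` are chosen here for illustration and do not come from the paper.

```python
class Node:
    """One state of the suffix trie (illustrative structure)."""
    def __init__(self):
        self.children = {}       # symbol -> child Node
        self.suffix_link = None  # node for the string minus its first symbol


def build_suffix_trie(text):
    """Build the suffix trie of `text` on-line, one symbol at a time."""
    root = Node()  # root.suffix_link stays None and acts as a sentinel
    top = root     # deepest node: spells the whole prefix scanned so far
    for ch in text:
        node = top
        prev_new = None
        # Follow suffix links from the longest suffix towards the root,
        # extending every suffix that cannot yet be followed by `ch`.
        while node is not None and ch not in node.children:
            new = Node()
            node.children[ch] = new
            if prev_new is not None:
                prev_new.suffix_link = new
            prev_new = new
            node = node.suffix_link
        if prev_new is not None:
            # Either we walked past the root, or we hit an existing transition.
            prev_new.suffix_link = root if node is None else node.children[ch]
        top = top.children[ch]
    return root


def contains(root, pattern):
    """Return True if `pattern` is a substring of the indexed text."""
    node = root
    for ch in pattern:
        node = node.children.get(ch)
        if node is None:
            return False
    return True


trie = build_suffix_trie("cacao")
print(contains(trie, "aca"), contains(trie, "oc"))
```

After each symbol the trie is complete for the scanned prefix, which is the on-line property the abstract describes. The trie can have a number of nodes quadratic in the string length, which is what the suffix tree avoids.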


Book
Robert P. Kurshan
06 Feb 1995
TL;DR: Theories of L-automaton/L-process, L-matrix, and String Acceptors are compared to Boolean Algebra, which describes the construction of language-based Algebra.
Abstract: Preface; 1 Introduction; 2 Boolean Algebra; 3 L-matrix; 4 L-language; 5 String Acceptors; 6 ω-theory: L-automaton/L-process; 7 The Selection/Resolution Model; 8 Reduction of Verification; 9 Structural Induction; 10 Binary Decision Diagrams; Appendices; Bibliography; Glossary; Index

882 citations


Journal Article
TL;DR: An extension of Earley's parser for stochastic context-free grammars that computes probabilities of successive prefixes being generated by the grammar and an input string and posterior expected number of applications of each grammar production, as required for reestimating rule probabilities.
Abstract: We describe an extension of Earley's parser for stochastic context-free grammars that computes the following quantities given a stochastic context-free grammar and an input string: a) probabilities of successive prefixes being generated by the grammar; b) probabilities of substrings being generated by the nonterminals, including the entire string being generated by the grammar; c) most likely (Viterbi) parse of the string; d) posterior expected number of applications of each grammar production, as required for reestimating rule probabilities. Probabilities (a) and (b) are computed incrementally in a single left-to-right pass over the input. Our algorithm compares favorably to standard bottom-up parsing methods for SCFGs in that it works efficiently on sparse grammars by making use of Earley's top-down control structure. It can process any context-free rule format without conversion to some normal form, and combines computations for (a) through (d) in a single algorithm. Finally, the algorithm has simple extensions for processing partially bracketed inputs, and for finding partial parses and their likelihoods on ungrammatical inputs.

319 citations


Journal ArticleDOI
TL;DR: The fragment assembly problem is reformulated as one of finding a maximum-likelihood reconstruction with respect to the two-sided Kolmogorov-Smirnov statistic, and it is argued that this is a better formulation of the problem.
Abstract: The fragment assembly problem is that of reconstructing a DNA sequence from a collection of randomly sampled fragments. Traditionally, the objective of this problem has been to produce the shortest string that contains all the fragments as substrings, but in the case of repetitive target sequences this objective produces answers that are overcompressed. In this paper, the problem is reformulated as one of finding a maximum-likelihood reconstruction with respect to the two-sided Kolmogorov–Smirnov statistic, and it is argued that this is a better formulation of the problem. Next the fragment assembly problem is recast in graph-theoretic terms as one of finding a noncyclic subgraph with certain properties and the objectives of being shortest or maximally likely are also recast in this framework. Finally, a series of graph reduction transformations are given that dramatically reduce the size of the graph to be explored in practical instances of the problem. This reduction is very important as the un...

258 citations


Journal ArticleDOI
TL;DR: In the present study, genetic algorithms are proposed to automatically configure RBF networks and the network configuration is formed as a subset selection problem to find an optimal subset of nc terms from the Nt training data samples.

242 citations


Posted Content
Kemal Oflazer
TL;DR: Error-tolerant recognition as mentioned in this paper enables the recognition of strings that deviate mildly from any string in the regular set recognized by the underlying finite state recognizer, which can be applied to morphological analysis of any language whose morphology is fully captured by a single (and possibly very large) finite state transducer.
Abstract: Error-tolerant recognition enables the recognition of strings that deviate mildly from any string in the regular set recognized by the underlying finite state recognizer. Such recognition has applications in error-tolerant morphological processing, spelling correction, and approximate string matching in information retrieval. After a description of the concepts and algorithms involved, we give examples from two applications: In the context of morphological analysis, error-tolerant recognition allows misspelled input word forms to be corrected, and morphologically analyzed concurrently. We present an application of this to error-tolerant analysis of agglutinative morphology of Turkish words. The algorithm can be applied to morphological analysis of any language whose morphology is fully captured by a single (and possibly very large) finite state transducer, regardless of the word formation processes and morphographemic phenomena involved. In the context of spelling correction, error-tolerant recognition can be used to enumerate correct candidate forms from a given misspelled string within a certain edit distance. Again, it can be applied to any language with a word list comprising all inflected forms, or whose morphology is fully described by a finite state transducer. We present experimental results for spelling correction for a number of languages. These results indicate that such recognition works very efficiently for candidate generation in spelling correction for many European languages such as English, Dutch, French, German, Italian (and others) with very large word lists of root and inflected forms (some containing well over 200,000 forms), generating all candidate solutions within 10 to 45 milliseconds (with edit distance 1) on a SparcStation 10/41. For spelling correction in Turkish, error-tolerant

190 citations


Journal ArticleDOI
TL;DR: This paper shows how to use string matching techniques in conjunction with lexicon indexes to find approximate matches in a large lexicon, and proposes methods for combining these techniques, and shows experimentally that these combinations yield good retrieval effectiveness while keeping index size and retrieval time low.
Abstract: Approximate string matching is used for spelling correction and personal name matching. In this paper we show how to use string matching techniques in conjunction with lexicon indexes to find approximate matches in a large lexicon. We test several lexicon indexing techniques, including n-grams and permuted lexicons, and several string matching techniques, including string similarity measures and phonetic coding. We propose methods for combining these techniques, and show experimentally that these combinations yield good retrieval effectiveness while keeping index size and retrieval time low. Our experiments also suggest that, in contrast to previous claims, phonetic codings are markedly inferior to string distance measures, which are demonstrated to be suitable for both spelling correction and personal name matching.

173 citations
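The combination described in the abstract above (a lexicon index for coarse candidate selection, followed by a string distance measure for ranking) might look something like the sketch below. It assumes character bigrams with `$` padding, a shared-gram threshold, and plain Levenshtein distance; these choices, and all function names, are illustrative assumptions rather than the configuration evaluated in the paper.

```python
from collections import defaultdict


def ngrams(word, n=2):
    """Set of character n-grams of `word`, padded with '$' at both ends."""
    padded = f"${word}$"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def build_index(lexicon, n=2):
    """Inverted index: n-gram -> set of lexicon words containing it."""
    index = defaultdict(set)
    for word in lexicon:
        for gram in ngrams(word, n):
            index[gram].add(word)
    return index


def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def approximate_matches(query, index, n=2, min_shared=3, max_dist=2):
    """Coarse n-gram filter, then fine ranking by edit distance."""
    shared = defaultdict(int)
    for gram in ngrams(query, n):
        for word in index.get(gram, ()):
            shared[word] += 1
    candidates = [w for w, count in shared.items() if count >= min_shared]
    scored = ((edit_distance(query, w), w) for w in candidates)
    return sorted((d, w) for d, w in scored if d <= max_dist)


lexicon = ["string", "strong", "spring", "sting", "ring", "matching"]
index = build_index(lexicon)
print(approximate_matches("strng", index))
# → [(1, 'sting'), (1, 'string'), (1, 'strong'), (2, 'spring')]
```

Only words sharing enough n-grams with the query reach the comparatively expensive distance computation, which is how an index keeps retrieval time low on a large lexicon.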


Journal ArticleDOI
01 Oct 1995-The Auk
TL;DR: The authors presented four groups of Common Ravens (Corvus corax) with a problem that they had never encountered before: reaching of food suspended by a string, which required perching above the string and the food, reaching down, pulling up a loop of string, setting the looped-up string onto the perch, stepping onto the string, releasing the string with the bill while simultaneously applying pressure with the foot onto it, then reaching down again to repeat the cycle six to eight times in that precise order before finally securing a piece of dried meat.
Abstract: I presented four groups of Common Ravens (Corvus corax) with a problem that they had never encountered before. Could they demonstrate the solution to this problem without first practicing or learning the correct sequence of intermediary steps? The problem posed was the reaching of food suspended by a string. The solution required perching above the string and the food, reaching down, pulling up a loop of string, setting the looped-up string onto the perch, stepping onto the string, releasing the string with the bill while simultaneously applying pressure with the foot onto it, then reaching down again to repeat the cycle six to eight times in that precise order before finally securing a piece of dried meat. The results varied enormously between individuals. However, typically a bird approached the string nervously, pecked or briefly yanked on the string, repeated the approach when given another opportunity, extinguished the approach behavior, or suddenly did the entire string-pulling sequence correctly. One of the wild birds performed the entire sequence correctly on his first approach to the string, even though no other bird of that group had shown the behavior. After a bird had acquired the behavior it thereafter performed the behavior correctly without fail. Other behaviors were associated with successful string pulling. From their first trial, the four hand-reared individuals dropped the meat attached to string (and perch) if they were shooed from the perch. In contrast, other birds that were handed the food attached to string attempted to fly off with it, and it required five to nine trials before they refused to do so, apparently learning the consequences of this behavior. Other problems related to food presented on the dangling string also were solved without first overtly trying out the alternatives.
These problems involved: (1) crossing the string with food with another string that held a rock; (2) using a novel string with food, next to the previously-rewarded "old" string; and (3) having food on string next to a rock on string, but with insertion of the string on the perch now displaced laterally. In contrast, the birds performed very poorly at some tasks where simple trial-and-error learning quickly would have resulted in appropriate responses. For example, three birds never once (in 79 trials) pulled the correct string in the crossed-strings experiments that another mustered with no trials. The results are discussed in terms of possible insight and alternative mechanisms, including innate behavior and learning. Received 22 June 1994, accepted 31 August 1994.

145 citations


Patent
16 Jun 1995
TL;DR: In this paper, the distance between two handwritten strings in a database is determined by extracting global features from each string, including a number of points, maximum angle between a first point in the string and a corner of the tallest bounding box, positive inversions, and negative inversions.
Abstract: Apparatus for determining a distance between two handwritten strings in a database. A processor extracts global features from each string. The processor divides the string into strokes, and identifies a plurality of bounding boxes. Each box contains a different stroke. The processor extracts global features from the string, including: (1) a number of points; (2) a maximum angle between a first point in the string and a corner of the tallest bounding box; (3) a number of positive inversions; and (4) a number of negative inversions. The apparatus calculates the distance between the strings based on all of the numbers of points, maximum angles, numbers of positive inversions and numbers of negative inversions. A fixed query tree index may be formed. The tree has leaves and internal nodes belonging to multiple levels. A different key is associated with each level. Each key is a handwritten string. Each string is associated with one of the leaves, such that each child of each internal node in any of the levels between the one leaf and the root node is a root of a respective subtree. Each string associated with any leaf in the subtree which includes the one leaf is equally distant from the key associated with the one level, based on the global features. The tree is queried to search for a subset of the strings, such that each string in the subset is within a threshold distance of an input string, according to the distance function.

134 citations


Patent
Eric M. Visser
31 Aug 1995
TL;DR: A retrieving unit retrieves the dictionary entry that corresponds to an input character string by comparing the input characters, one by one, with entries of TRIE tables stored in a dictionary storing unit.
Abstract: A retrieving unit retrieves an entry of a dictionary which corresponds to an input character string while comparing input characters, one by one, with entries of TRIE tables stored in a dictionary storing unit. When a character of the input character string does not coincide with any of the entries in the currently-used TRIE table, a skipping unit locates a next effective pseudo-syllable border in the input character string to find candidates of those TRIE tables which correspond to the effective pseudo-syllable border. The retrieving unit retrieves a character string consisting of those characters which follow the pseudo-syllable border thus located, while using the candidates of these TRIE tables, and retrieves an entry in the dictionary which corresponds to the input character string to thereby output it as a recognized word.

Patent
Mitsuru Akizawa, Kouki Noguchi, Takehisa Hayashi, Kanji Kato, Hitoshi Matsushima
13 Mar 1995
TL;DR: In this paper, a symbol string search arithmetic operation is performed at high speed with a small hardware scale processing module, such as a symbol search module, which is connected to a CPU through address and data buses and includes a function definition section for defining a function of the apparatus in accordance with a command from the CPU.
Abstract: A character string search arithmetic operation is performed at high speed with a small hardware scale processing module, such as a symbol string search module. The search module is connected to a CPU through address and data buses and includes a function definition section for defining a function of the apparatus in accordance with a command from the CPU, a data input/output section for receiving a symbol string to be searched through the data bus and for outputting the result of a search. A search processing section performs the search based on a function defined by the function definition section. A symbol string to be searched for, which is internally stored, is compared with the symbol string data input to the module's data input/output means. A condition holding section holds data indicative of an internal condition corresponding to the result of the search processing. Thereby, the CPU and the symbol string search module can perform the search at high speed.

Journal ArticleDOI
TL;DR: A faster algorithm for dynamic string dictionary matching with bounded alphabets, and a novel method to efficiently manipulate failure links for two-dimensional patterns.
Abstract: In the dynamic dictionary matching problem, a dictionary D contains a set of patterns that can change over time by insertion and deletion of individual patterns. The user also presents text strings and asks for all occurrences of any patterns in the text. The two main contributions of this paper are: (1) a faster algorithm for dynamic string dictionary matching with bounded alphabets, and (2) a dynamic dictionary matching algorithm for two-dimensional texts and patterns. The first contribution is based on an algorithm that solves the general problem of maintaining a sequence of well-balanced parentheses under the operations insert, delete, and find nearest enclosing parenthesis pair. The main new idea behind the second contribution is a novel method to efficiently manipulate failure links for two-dimensional patterns.

Proceedings ArticleDOI
29 May 1995
TL;DR: The theory of string matching has a long association with compression algorithms, and data structures from string matching can be used to derive fast implementations of many important compression schemes, most notably the Lempel-Ziv (LZ77) algorithm.
Abstract: String matching and compression are two widely studied areas of computer science. The theory of string matching has a long association with compression algorithms. Data structures from string matching can be used to derive fast implementations of many important compression schemes, most notably the Lempel-Ziv (LZ77) algorithm. Intuitively, once a string has been compressed, and therefore its repetitive nature has been elucidated, one might be tempted to exploit this knowledge to speed up string matching. The Compressed Matching Problem is that of performing string matching in a compressed text, without uncompressing it. More formally, let T be a text, let Z be the compressed string representing T, and let P be a pattern. The Compressed Matching Problem is that of deciding if P occurs in T, given only P and Z. Compressed matching algorithms have been given for several compression schemes such as LZW.

Patent
27 Jan 1995
TL;DR: In this paper, a method for accessing a database server using pass-through queries includes parsing a database query to separate a passthrough string, and then sending a pass through string to retrieve information regarding the structure of a remote table.
Abstract: A method for accessing a database server using pass-through queries includes parsing a database query to separate a pass-through string, and then sending a pass-through string to retrieve information regarding the structure of a remote table. The method further includes fetching data as needed from the remote table, and caching the remote data in a temporary table in memory of the local computer system. The system includes a query processor to compile the database query, a remote engine to retrieve table structure information and fetch data as needed, and a temporary table manager to manage caching of the fetched data.

Proceedings ArticleDOI
13 Dec 1995
TL;DR: This paper proposes a new spacing policy in which the time headway varies linearly with the velocity error, which significantly reduces the transient errors and allows us to use much smaller spacing in the autonomous mode of platoon operation.
Abstract: We present adaptive nonlinear schemes for longitudinal control of automated heavy duty vehicles. An important control objective is string stability, which ensures that errors decrease as they propagate downstream through the platoon. It is well known that string stability requires intervehicle communication if a constant spacing policy is adopted. When vehicles operate autonomously, string stability can be achieved if speed-dependent spacing with constant time headway is used. This, however, results in larger steady-state spacing, which increases the platoon length, hence decreasing traffic throughput. In this paper we propose a new spacing policy in which the time headway varies linearly with the velocity error. Our simulation results demonstrate that this modification significantly reduces the transient errors and allows us to use much smaller spacing in the autonomous mode of platoon operation.

Patent
21 Jul 1995
TL;DR: In this paper, a list of candidate recognized words is identified as a function of both comparison of dictionary entries to various combinations of recognized character combinations, and through a most likely character string and most likely string of digits analysis as developed without reference to the dictionary.
Abstract: In a handwriting recognition process, a list of candidate recognized words is identified (202) as a function of both comparison of dictionary entries to various combinations of recognized character combinations, and through a most likely character string and most likely string of digits analysis as developed without reference to the dictionary. The process selects (301) a word from the list and presents (302) this word to the user. The user then has the option of displaying (303) this list. When displaying the list, candidate words developed with reference to the dictionary are displayed in a segregated manner from the most likely character string words and the most likely string of digits. The user can change the selected word by choosing from the list, or edit the selected word. When the user selects the most likely character string as the correct representation of the handwritten input to be recognized, the process automatically updates (310) the dictionary to include the most likely character string. The same process can occur when the user selects the most likely string of digits.

Patent
William W. Luciw
15 May 1995
TL;DR: A method and apparatus for processing natural language and deducing meaning from a natural language input, characterized by the steps of (a) receiving an ordered string of word objects having natural language meaning, (b) selecting a word window length, and (c) successively moving the word window along the ordered string and analyzing the meaning of a substring of word objects that fall within the window.
Abstract: A method and apparatus for processing natural language and deducing meaning from a natural language input characterized by the steps of (a) receiving an ordered string of word objects having a natural language meaning, (b) selecting a word window length, and (c) successively moving the word window along the ordered string and analyzing the meaning of a substring of word objects that fall within the word window. The substring is removed from the ordered string if the substring has a recognized meaning, until all substrings of the ordered string that fit within the window have been analyzed. In a step (d), the word window length is reduced and step (c) is repeated until only an unrecognized residual of the ordered string remains. The meaning of the substring is analyzed by mapping the substring against a database using one or more mapping routines. The mapping routines are preferably arranged in a hierarchy, wherein a successive mapping routine is used to analyze the substring when a previous mapping routine in the hierarchy cannot map the substring. A computer-implemented task is determined from the recognized substrings and performed by the computer system. The apparatus of the present invention implements the method on a pen-based computer system, and the ordered string is preferably received from strokes entered by a stylus on a display screen of the pen-based computer or from a microphone receiving speech input.
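Steps (a) through (d) of the abstract above can be sketched as a shrinking sliding window over a list of words. In this rough illustration the "database" is a plain dictionary keyed by word tuples and there is a single mapping routine (a dictionary lookup) instead of a hierarchy of routines; `extract_phrases`, `known_phrases`, and the example labels are hypothetical names, not taken from the patent.

```python
def extract_phrases(words, known_phrases, max_window=3):
    """Remove recognized substrings from `words`, longest window first.

    Returns (recognized, residual), where `recognized` lists
    (phrase, meaning) pairs in the order they were found and `residual`
    is the unrecognized remainder of the ordered string.
    """
    remaining = list(words)
    recognized = []
    for window in range(max_window, 0, -1):      # step (d): shrink the window
        i = 0
        while i + window <= len(remaining):      # step (c): slide the window
            candidate = tuple(remaining[i:i + window])
            if candidate in known_phrases:
                recognized.append((candidate, known_phrases[candidate]))
                del remaining[i:i + window]      # drop the recognized substring
            else:
                i += 1
    return recognized, remaining


# Hypothetical phrase database for illustration only.
known_phrases = {
    ("lunch", "with", "bob"): "MEETING",
    ("at", "noon"): "TIME",
    ("tomorrow",): "DATE",
}
words = "schedule lunch with bob tomorrow at noon please".split()
recognized, residual = extract_phrases(words, known_phrases)
print(recognized)
print(residual)  # → ['schedule', 'please']
```

Trying the longest window first means a multi-word phrase is matched as a unit before any of its individual words can be consumed by a shorter match.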

Proceedings ArticleDOI
09 May 1995
TL;DR: A discriminative training procedure is proposed for verifying the occurrence of string hypotheses produced by a hidden Markov model (HMM) based continuous speech recognizer to increase the power of a hypothesis test for utterance verification.
Abstract: A procedure is proposed for verifying the occurrence of string hypotheses produced by a hidden Markov model (HMM) based continuous speech recognizer. Most existing procedures verify word hypotheses through likelihood ratio scoring procedures computed using ad hoc approximations for the density of the alternative hypothesis in the denominator of the likelihood ratio statistic. The discriminative training procedure described in this paper attempts to adjust the parameters of the null hypothesis and the alternate hypothesis models to increase the power of a hypothesis test for utterance verification. The training procedure was evaluated for its ability to detect a twenty word vocabulary in a subset of the Switchboard conversational speech corpus. Experimental results show that the use of this procedure results in significant improvement in the word verification operating characteristic, as well as an improvement in the overall system performance.

Patent
Liang Li
07 Jun 1995
TL;DR: In this article, a system and method for more efficiently comparing an unverified string to a lexicon, which filters the lexicon through multiple steps to reduce the number of entries to be directly compared with the unverified strings, is presented.
Abstract: A system and method for more efficiently comparing an unverified string to a lexicon, which filters the lexicon through multiple steps to reduce the number of entries to be directly compared with the unverified string. The method begins by preparing the lexicon with an n-gram encoding, partitioning and hashing process, which can be accomplished in advance of any processing of unverified strings. The unknown is compared first by partitioning and hashing it in the same way to reduce the lexicon in a computationally inexpensive manner. This is followed by an encoded vector comparison step, and finally by a direct string comparison step, which is the most computationally expensive. The reduction of the lexicon is accomplished without arbitrarily eliminating any large portions of the lexicon that might contain relevant candidates. At the same time, the method avoids the need to compare the unverified string directly or indirectly with all the entries in the lexicon. The final candidate list includes only highly possible and ranked candidates for the unverified string, and the size of the final list is adjustable.

Journal ArticleDOI
TL;DR: Experimental results for string matching algorithms which are known to be fast in practice show that for large alphabets and small patterns the Quick Search algorithm of Sunday is the most efficient and that for small alphabets and large patterns it is the Reverse Factor algorithm of Crochemore et al. which is the most efficient.
Abstract: We present experimental results for string matching algorithms which are known to be fast in practice. We compare these algorithms through two aspects: the number of text character inspections and the running time. These experiments show that for large alphabets and small patterns the Quick Search algorithm of Sunday is the most efficient and that for small alphabets and large patterns it is the Reverse Factor algorithm of Crochemore et al. which is the most efficient.
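For readers unfamiliar with the Quick Search algorithm named above, a minimal Python sketch follows. It uses the usual bad-character rule on the text symbol immediately after the current window; the function name and the choice to compare windows with a slice comparison are illustrative, and the sketch says nothing about the measurements reported in the paper.

```python
def quick_search(pattern, text):
    """Return the start positions of all occurrences of `pattern` in `text`."""
    m, n = len(pattern), len(text)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    # Shift for a symbol c: distance from its rightmost occurrence in the
    # pattern to one past the pattern's end. Later entries overwrite earlier
    # ones, so the rightmost occurrence wins. Unseen symbols shift by m + 1.
    shift = {ch: m - i for i, ch in enumerate(pattern)}
    matches = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            matches.append(pos)
        if pos + m >= n:  # no symbol after the window: nothing left to scan
            break
        pos += shift.get(text[pos + m], m + 1)
    return matches


print(quick_search("GCAGAGAG", "GCATCGCAGAGAGTATACAGTACG"))  # → [5]
```

With a large alphabet most text symbols do not occur in a short pattern, so the window usually jumps by `m + 1` positions, which is consistent with the regime in which the abstract reports Quick Search performing best.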

Journal Article
TL;DR: In this article, the authors provide an exposition of three lemmas that relate general properties of distributions over bit strings to the exclusive-or (xor) of values of certain bit locations.
Abstract: We provide an exposition of three lemmas that relate general properties of distributions over bit strings to the exclusive-or (xor) of values of certain bit locations. The first XOR-Lemma, commonly attributed to Umesh Vazirani (1986), relates the statistical distance of a distribution from the uniform distribution over bit strings to the maximum bias of the xor of certain bit positions. The second XOR-Lemma, due to Umesh and Vijay Vazirani (19th STOC, 1987), is a computational analogue of the first. It relates the pseudorandomness of a distribution to the difficulty of predicting the xor of bits in particular or random positions. The third Lemma, due to Goldreich and Levin (21st STOC, 1989), relates the difficulty of retrieving a string and the unpredictability of the xor of random bit positions. The most notable XOR Lemma - that is the so-called Yao XOR Lemma - is not discussed here. We focus on the proofs of the aforementioned three lemmas. Our exposition deviates from the original proofs, yielding proofs that are believed to be simpler, of wider applicability, and establishing somewhat stronger quantitative results. Credits for these improved proofs are due to several researchers.

Journal ArticleDOI
TL;DR: It turns out that the new subalgorithm called COMPUTE-COVERS is itself sufficient to solve the original problem — that is, to compute all the covers of a given string in time linear in the string length — and so it is presented here as a self-contained algorithm in its own right.
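To make the notion of a cover concrete: a string w covers x when every position of x lies inside some occurrence of w in x, which forces w to be both a prefix and a suffix of x. The sketch below is a straightforward check over all borders and is not the linear-time COMPUTE-COVERS algorithm of the paper; it can take quadratic time, and the function names are illustrative.

```python
def borders(s):
    """Lengths of all proper borders of `s` (prefixes that are also suffixes)."""
    fail = [0] * len(s)  # KMP failure function
    k = 0
    for i in range(1, len(s)):
        while k and s[i] != s[k]:
            k = fail[k - 1]
        if s[i] == s[k]:
            k += 1
        fail[i] = k
    lengths = []
    k = fail[-1] if s else 0
    while k:
        lengths.append(k)
        k = fail[k - 1]
    return lengths


def covers(s):
    """Return all covers of `s`, shortest first (naive check per border)."""
    if not s:
        return []
    result = []
    for length in sorted(borders(s)) + [len(s)]:
        w = s[:length]
        covered_to = 0  # every position below this index is already covered
        pos = s.find(w)
        while pos != -1 and pos <= covered_to:
            covered_to = pos + length
            pos = s.find(w, pos + 1)  # next occurrence, overlaps allowed
        if covered_to == len(s):
            result.append(w)
    return result


print(covers("abaababaaba"))  # → ['aba', 'abaaba', 'abaababaaba']
```

Only borders need to be tested because a cover must start at position 0 and end at the last position of the string.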

Journal ArticleDOI
TL;DR: This work claims that the traditional implementations of strings, and often the supported functionality, are not well suited to general‐purpose use and presents ‘ropes’ or ‘heavyweight’ strings as an alternative that leads to systems that are more robust, both in functionality and in performance.
Abstract: Programming languages generally provide a ‘string’ or ‘text’ type to allow manipulation of sequences of characters. This type is usually of crucial importance, since it is normally mentioned in most interfaces between system components. We claim that the traditional implementations of strings, and often the supported functionality, are not well suited to such general-purpose use. They should be confined to applications with specific, and unusual, performance requirements. We present ‘ropes’ or ‘heavyweight’ strings as an alternative that, in our experience, leads to systems that are more robust, both in functionality and in performance. Ropes have been in use in the Cedar environment almost since its inception, but this appears to be neither well-known, nor discussed in the literature. The algorithms have been gradually refined. We have also recently built a second similar, but somewhat lighter weight, C-language implementation, which is included in our publicly released garbage collector distribution. We describe the algorithms used in both, and give some performance measurements for the C version.
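As a rough illustration of the idea behind ropes, the sketch below represents a string as an immutable tree whose leaves hold flat text and whose internal nodes denote concatenation, so that concatenation and substring extraction can share structure instead of copying. It omits rebalancing and every other refinement of the Cedar and C implementations; the class and function names are chosen here for illustration.

```python
class Leaf:
    """A flat piece of text."""
    def __init__(self, text):
        self.text = text
        self.length = len(text)


class Concat:
    """Concatenation of two ropes; built in O(1) without copying text."""
    def __init__(self, left, right):
        self.left, self.right = left, right
        self.length = left.length + right.length


def char_at(rope, i):
    """Return the character at index `i` by descending the tree."""
    while isinstance(rope, Concat):
        if i < rope.left.length:
            rope = rope.left
        else:
            i -= rope.left.length
            rope = rope.right
    return rope.text[i]


def substring(rope, start, end):
    """Rope for rope[start:end]; untouched subtrees are shared, not copied."""
    if start <= 0 and end >= rope.length:
        return rope
    if isinstance(rope, Leaf):
        return Leaf(rope.text[max(start, 0):end])
    mid = rope.left.length
    if end <= mid:
        return substring(rope.left, start, end)
    if start >= mid:
        return substring(rope.right, start - mid, end - mid)
    return Concat(substring(rope.left, start, mid),
                  substring(rope.right, 0, end - mid))


def to_str(rope):
    """Flatten a rope into an ordinary string (iterative, left to right)."""
    parts, stack = [], [rope]
    while stack:
        node = stack.pop()
        if isinstance(node, Leaf):
            parts.append(node.text)
        else:
            stack.append(node.right)
            stack.append(node.left)
    return "".join(parts)


r = Concat(Concat(Leaf("Hello, "), Leaf("rope ")), Leaf("world"))
print(to_str(r), char_at(r, 7), to_str(substring(r, 4, 12)))
```

Without rebalancing, repeated concatenation can produce a deep tree and slow indexing, so a practical implementation needs some balancing strategy on top of this sketch.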

Journal ArticleDOI
TL;DR: It is proved that learning with limited memory is exactly the same as learning via set driven machines (when the order of the input string is not taken into account), and it is shown that every language learnable via a set driven machine is learnable through a conservative machine.
Abstract: The paper explores language learning in the limit under various constraints on the number of mindchanges, memory, and monotonicity. We define language learning with limited (long term) memory and prove that learning with limited memory is exactly the same as learning via set driven machines (when the order of the input string is not taken into account). Further we show that every language learnable via a set driven machine is learnable via a conservative machine (making only justifiable mindchanges). We get a variety of separation results for learning with bounded number of mindchanges or limited memory under restrictions on monotonicity. A surprising result is that there are families of languages that can be monotonically learned with at most one mindchange, but can neither be weak-monotonically nor conservatively learned. Many separation results have a variant: If a criterion A can be separated from B, then often it is possible to find a family L of languages such that L is A and B learnable, but while it is possible to restrict the number of mindchanges or long term memory on criterion A, this is impossible for B.

Patent
14 Jul 1995
TL;DR: In this paper, a remote access server limits access to a local computer network and allows the remote computer to access the local computer and to communicate on the local network, but the remote user is prevented from communicating with a predetermined resource because of the access filter associated with the user identification string.
Abstract: A remote access server limits access to a local computer network. The server includes at least one communication port for allowing communication with a remote computer and at least one network port for coupling to a local computer network to allow communication with the local computer network. The server also includes processing electronics which control the communication and network ports. The processing electronics also receive a user identification string from the communication port. The string is entered by a remote user at a remote computer and identifies the remote user. The server uses the string to access a database and determine at least one access filter associated with the string. The access filter is used to prevent the remote computer from communicating with at least one predetermined resource on the local computer network. The database includes a user identification string for each remote user and at least one access filter for each user identification string. The server allows the remote computer to access the local computer network and to communicate on the local computer network, but the remote computer is prevented from communicating with the predetermined resource because of the access filter associated with the remote user.
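
The lookup described in the abstract can be sketched as a mapping from user identification strings to sets of blocked resources. Everything named here (`ACCESS_FILTERS`, `filters_for`, `may_communicate`, the user names and resource names) is hypothetical and only illustrates the flow; the patent abstract does not specify a data format, and rejecting unknown users is an assumption of this sketch.

```python
# Hypothetical database: user identification string -> access filters,
# modelled here as the set of resources that user must not reach.
ACCESS_FILTERS = {
    "alice": {"payroll-server"},
    "bob": {"payroll-server", "build-farm"},
}


def filters_for(user_id: str) -> set:
    """Look up the access filters for a user identification string."""
    if user_id not in ACCESS_FILTERS:
        # Assumption: a string with no database entry gets no access at all.
        raise PermissionError(f"unknown user identification string: {user_id!r}")
    return ACCESS_FILTERS[user_id]


def may_communicate(user_id: str, resource: str) -> bool:
    """Allow traffic to any resource not blocked by the user's filters."""
    return resource not in filters_for(user_id)
```

With this table, `may_communicate("alice", "build-farm")` is true while `may_communicate("bob", "build-farm")` is false: both users are on the local network, but each is kept away from the resources named in their own filters.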

Journal ArticleDOI
TL;DR: Two efficient concurrent-read concurrent-write parallel algorithms that find all palindromes in a given string by using smaller auxiliary space and either by making fewer operations or by achieving a faster running time are presented.
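
For orientation, the problem of finding all palindromes in a string can be solved sequentially in linear time with Manacher's algorithm. The sketch below is that sequential stand-in, not the concurrent-read concurrent-write parallel algorithms the paper presents; the function names are illustrative.

```python
def palindrome_radii(s: str) -> list:
    """Manacher's algorithm on the '#'-interleaved string.

    Entry i of the result is the length (in characters of s) of the
    longest palindrome centered at position i of the interleaved string.
    """
    t = "#" + "#".join(s) + "#" if s else "#"
    n = len(t)
    radius = [0] * n
    center = right = 0  # rightmost palindrome found so far spans up to `right`
    for i in range(n):
        if i < right:
            # Reuse the mirror position's radius, clipped to the known span.
            radius[i] = min(right - i, radius[2 * center - i])
        lo, hi = i - radius[i] - 1, i + radius[i] + 1
        while lo >= 0 and hi < n and t[lo] == t[hi]:
            radius[i] += 1
            lo -= 1
            hi += 1
        if i + radius[i] > right:
            center, right = i, i + radius[i]
    return radius


def maximal_palindromes(s: str) -> list:
    """Return (start, length) of the maximal palindrome at every center."""
    return [((i - r) // 2, r) for i, r in enumerate(palindrome_radii(s)) if r > 0]


def longest_palindrome(s: str) -> str:
    start, length = max(maximal_palindromes(s), key=lambda p: p[1], default=(0, 0))
    return s[start:start + length]
```

Every palindromic substring is nested inside the maximal palindrome at its center, so the list of maximal palindromes implicitly describes all of them.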

PatentDOI
David Nahamoo, Mukund Padmanabhan
TL;DR: In this article, a method for estimating the probability of phone boundaries and the accuracy of the acoustic modelling in reducing a search space in a speech recognition system is presented, which includes a microphone for converting an utterance into an electrical signal, which is processed by an acoustic processor and label match which finds the best matched acoustic label prototype.
Abstract: A method for estimating the probability of phone boundaries and the accuracy of the acoustic modelling in reducing a search-space in a speech recognition system. The accuracy of the acoustic modelling is quantified by the rank of the correct phone. The system includes a microphone for converting an utterance into an electrical signal, which is processed by an acoustic processor and label match which finds the best-matched acoustic label prototype. A probability distribution on phone boundaries is produced for every time frame using a first decision tree. These probabilities are compared to a threshold and some time frames are identified as boundaries between phones. An acoustic score is computed for all phones between every given pair of hypothesized boundaries, and the phones are ranked on the basis of this score. A second decision tree is traversed for every time frame to obtain the worst case rank of the correct phone at that time, and a short list of allowed phones is made for every time frame. A fast acoustic word match processor matches the label string from the acoustic processor to produce an utterance signal which includes at least one word. From recognition candidates produced by the fast acoustic match and the language model, the detailed acoustic match matches the label string from the acoustic processor against acoustic word models and outputs a word string corresponding to an utterance.

Journal ArticleDOI
TL;DR: It is shown that building an optimal decision tree is NP-complete, then an approximation algorithm is given that gives trees within a constant multiplicative factor of optimal, and it is demonstrated that subsequence queries are significantly more powerful than substring queries, matching the information theoretic lower bound.
Abstract: We consider an interactive approach to DNA sequencing by hybridization, where we are permitted to ask questions of the form "is s a substring of the unknown sequence S?", where s is a specific query string. We are not told where s occurs in S, nor how many times it occurs, just whether or not s is a substring of S. Our goal is to determine the exact contents of S using as few queries as possible. Through interaction, far fewer queries are necessary than using conventional fixed sequencing by hybridization (SBH) sequencing chips. We provide tight bounds on the complexity of reconstructing unknown strings from substring queries. Our lower bound, which holds even for a stronger model that returns the number of occurrences of s as a substring of S, relies on interesting arguments based on de Bruijn sequences. We also demonstrate that subsequence queries are significantly more powerful than substring queries, matching the information theoretic lower bound. Finally, in certain applications, something may already be known about the unknown string, and hence it can be determined faster than an arbitrary string. We show that building an optimal decision tree is NP-complete, then give an approximation algorithm that gives trees within a constant multiplicative factor of optimal.
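
To make the query model concrete, here is a naive reconstruction strategy that uses only yes/no substring queries. It is a simple baseline, not the paper's algorithm, and it makes no attempt to meet the tight bounds the abstract mentions: it asks up to one query per alphabet symbol per extension step. The names `reconstruct` and `is_substring` are illustrative.

```python
def reconstruct(is_substring, alphabet: str) -> str:
    """Recover the hidden string S using only 'is s a substring of S?' queries.

    Phase 1 grows a known substring w to the right until no extension is a
    substring; w can then occur only at the very end of S, so it is a suffix.
    Phase 2 grows w to the left until no extension is a substring; at that
    point w is a suffix with nothing before it, so w == S.
    """
    w = ""
    extended = True
    while extended:  # phase 1: extend right
        extended = False
        for c in alphabet:
            if is_substring(w + c):
                w += c
                extended = True
                break
    extended = True
    while extended:  # phase 2: extend left
        extended = False
        for c in alphabet:
            if is_substring(c + w):
                w = c + w
                extended = True
                break
    return w


# Usage: the oracle hides the sequence and only answers yes or no.
secret = "ACGTTGCA"
recovered = reconstruct(lambda s: s in secret, "ACGT")
```

The oracle here is simulated with Python's `in` operator; in the setting of the abstract each call would correspond to one hybridization experiment, which is why the number of queries is the cost measure of interest.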

Book ChapterDOI
01 Jan 1995
TL;DR: In this article, for a finite-dimensional monomial algebra, an algebraically compact module C(x) is constructed for every ℕ-word or ℤ-word x satisfying suitable periodicity conditions; C(x) is obtained from the corresponding string module M(x) as a kind of completion.
Abstract: Given a finite dimensional monomial algebra, one knows that some finite dimensional indecomposable modules may be described by words (finite sequences of letters) using as letters the arrows of the quiver and their formal inverses. To every word w, one can attach a so-called string module M(w). Here, we are going to construct certain infinite dimensional modules: We will consider ℕ-words and ℤ-words (thus infinite sequences of letters) satisfying suitable periodicity conditions. To every such ℕ-word or ℤ-word x, we describe an algebraically compact module C(x). This module C(x) is obtained from the corresponding string module M(x) as a kind of completion.