Showing papers on "String (computer science)" published in 1999


Journal ArticleDOI
TL;DR: The practical implementation of this procedure yielded satisfactory results when the EP-based algorithm was tested on a reported UC problem previously addressed by some existing techniques such as Lagrange relaxation (LR), dynamic programming (DP), and genetic algorithms (GAs).
Abstract: The work was conducted with the aim of finding a general method for solving the unit commitment (UC) problem. The proposed algorithm employs the evolutionary programming (EP) technique in which populations of contending solutions are evolved through random changes, competition, and selection. In the subject algorithm an overall UC schedule is coded as a string of symbols and viewed as a candidate for reproduction. Initial populations of such candidates are randomly produced to form the basis of subsequent generations. The practical implementation of this procedure yielded satisfactory results when the EP-based algorithm was tested on a reported UC problem previously addressed by some existing techniques such as Lagrange relaxation (LR), dynamic programming (DP), and genetic algorithms (GAs). Numerical results for systems of up to 100 units are given and commented on.

523 citations


Journal ArticleDOI
TL;DR: This work introduces a new text-indexing data structure, the String B-Tree, that can be seen as a link between some traditional external-memory and string-matching data structures that is made more effective by adding extra pointers to speed up search and update operations.
Abstract: We introduce a new text-indexing data structure, the String B-Tree, that can be seen as a link between some traditional external-memory and string-matching data structures. In a short phrase, it is a combination of B-trees and Patricia tries for internal-node indices that is made more effective by adding extra pointers to speed up search and update operations. Consequently, the String B-Tree overcomes the theoretical limitations of inverted files, B-trees, prefix B-trees, suffix arrays, compacted tries and suffix trees. String B-trees have the same worst-case performance as B-trees but they manage unbounded-length strings and perform much more powerful search operations such as the ones supported by suffix trees. String B-trees are also effective in main memory (RAM model) because they improve the online suffix tree search on a dynamic set of strings. They also can be successfully applied to database indexing and software duplication.

364 citations


Journal Article
TL;DR: This work shows that for the simplest form of statistical models, this problem is NP-complete, i.e., probably exponential in the length of the observed sentence, and traces this complexity to factors not present in other decoding problems.
Abstract: Statistical machine translation is a relatively new approach to the long-standing problem of translating human languages by computer. Current statistical techniques uncover translation rules from bilingual training texts and use those rules to translate new texts. The general architecture is the source-channel model: an English string is statistically generated (source), then statistically transformed into French (channel). In order to translate (or "decode") a French string, we look for the most likely English source. We show that for the simplest form of statistical models, this problem is NP-complete, i.e., probably exponential in the length of the observed sentence. We trace this complexity to factors not present in other decoding problems.

353 citations


Patent
08 Apr 1999
TL;DR: In this paper, a system for tokenization and named entity recognition of ideographic language is described, where a word lattice is generated for a string of characters using finite state grammars and a system lexicon.
Abstract: A system (100, 200) for tokenization and named entity recognition of ideographic language is disclosed. In the system, a word lattice is generated for a string of ideographic characters using finite state grammars (150) and a system lexicon (240). Segmented text is generated by determining word boundaries in the string of ideographic characters using the word lattice dependent upon a contextual language model (152A) and one or more entity language models (152B). One or more named entities is recognized in the string of ideographic characters using the word lattice dependent upon the contextual language model (152A) and the one or more entity language models (152B). The contextual language model (152A) and the one or more entity language models (152B) are each class-based language models. The lexicon (240) includes single ideographic characters, words, and predetermined features of the characters and words.

236 citations


Journal ArticleDOI
TL;DR: An experiment is described assessing whether or not parsing of a string requiring coercion—in addition to syntactic composition—is more computationally costly than parsing a syntactically transparent counterpart, a string that provides for an interpretable representation via syntactic compositions alone.
Abstract: This study reports results on the real-time consequences of aspectual coercion. We define aspectual coercion as a combinatorial semantic operation requiring computation over and above that provided by combining lexical items through expected syntactic processes. An experiment is described assessing whether or not parsing of a string requiring coercion--in addition to syntactic composition--is more computationally costly than parsing a syntactically transparent counterpart, a string that provides for an interpretable representation via syntactic composition alone. The prediction of a higher computational cost for this process is borne out by the results.

154 citations


Patent
03 Aug 1999
TL;DR: In this article, a tokenizer is proposed to generate from an input string information retrieval tokens that characterize the semantic relationship expressed in the input string, which can be used for both constructing an index representing target documents and processing a query against that index.
Abstract: The present invention is directed to performing information retrieval utilizing semantic representation of text. In a preferred embodiment, a tokenizer generates from an input string information retrieval tokens that characterize the semantic relationship expressed in the input string. The tokenizer first creates from the input string a primary logical form characterizing a semantic relationship between selected words in the input string. The tokenizer then identifies hypernyms that each have an “is a” relationship with one of the selected words in the input string. The tokenizer then constructs from the primary logical form one or more alternative logical forms. The tokenizer constructs each alternative logical form by, for each of one or more of the selected words in the input string, replacing the selected word in the primary logical form with an identified hypernym of the selected word. Finally, the tokenizer generates tokens representing both the primary logical form and the alternative logical forms. The tokenizer is preferably used to generate tokens for both constructing an index representing target documents and processing a query against that index.

144 citations


Journal ArticleDOI
TL;DR: It is proved that computing the median string corresponds to an NP-complete decision problem, thus proving that this problem is NP-hard.

137 citations


Patent
Richard Theodore Gillam
01 Sep 1999
TL;DR: A system, method, and program are disclosed for generating a data structure in computer memory for storing strings, such as words in a dictionary, where each string includes at least one character from a set of characters.
Abstract: Disclosed is a system, method, and program for generating a data structure in computer memory for storing strings, such as words in a dictionary. Each string includes at least one character from a set of characters. An arrangement of nodes is determined to store the characters such that the arrangement of the nodes is capable of defining a tree structure. An array data structure is generated to store the nodes. The array includes a row for each node and a column for each character in the set of characters. A non-empty cell identifies a node for the character indicated in the column of the cell that has descendant nodes in the row indicated in the cell content for the node. The array data structure is processed to eliminate at least one row in the array data structure to reduce a number of bytes needed to represent the array data structure. In this way, the array data structure following the processing requires fewer bytes of storage space than before the processing.

137 citations


Patent
30 Apr 1999
TL;DR: An apparatus and method are described for obtaining samples of pristine formation or formation fluid using a work string designed for performing other downhole work such as drilling, workover operations, or re-entry operations.
Abstract: An apparatus and method for obtaining samples of pristine formation or formation fluid, using a work string designed for performing other downhole work such as drilling, workover operations, or re-entry operations. An extendable element extends against the formation wall to obtain the pristine formation or fluid sample. While the test tool is in a standby condition, the extendable element is withdrawn within the work string, protected by other structure from damage during operation of the work string. The test apparatus is mounted on a sliding, non-rotating, sleeve on the work string.

135 citations


Journal ArticleDOI
TL;DR: Understanding the basic requirements for successful string parsing helps to resolve the conflict between mainly negative reports of imitation in experiments and more positive evidence from natural conditions.
Abstract: A theory of imitation is proposed, string parsing, which separates the copying of behavioural organization by observation from an understanding of the cause of its effectiveness. In string parsing, recurring patterns in the visible stream of behaviour are detected and used to build a statistical sketch of the underlying hierarchical structure. This statistical sketch may in turn aid the subsequent comprehension of cause and effect. Three cases of social learning of relatively complex skills are examined, as potential cases of imitation by string parsing. Understanding the basic requirements for successful string parsing helps to resolve the conflict between mainly negative reports of imitation in experiments and more positive evidence from natural conditions. Since string parsing does not depend on comprehension of the intentions of other agents or the everyday physics of objects, separate tests of these abilities are needed even in animals shown to learn by imitation.

132 citations


Patent
Richard Theodore Gillam
01 Sep 1999
TL;DR: A system, method, and program are disclosed for determining boundaries in a string of characters using a dictionary whose substrings may comprise words; an initial substring is selected such that all the characters following it can also be divided into dictionary substrings, and a boundary follows each of those substrings.
Abstract: Disclosed is a system, method, and program for determining boundaries in a string of characters using a dictionary, wherein the substrings in the dictionary may comprise words. A determination is made of all possible initial substrings of the string in the dictionary. One initial substring is selected such that all the characters following the initial substring can be divided into at least one substring in the dictionary. The boundaries follow each of the initial substring and the at least one substring that includes all the characters following the initial substring.
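The selection rule in this abstract (pick an initial dictionary substring only if the rest of the string can also be split into dictionary entries) can be sketched as a small backtracking search. This is a minimal illustration, not the patented implementation: the function name `segment`, the memoization, and the sample dictionary are all assumptions made for the example.

```python
from functools import lru_cache


def segment(text, dictionary):
    """Split text into dictionary substrings, or return None if impossible.

    Hypothetical helper: tries each initial substring found in the
    dictionary and keeps it only if the remainder can also be split.
    """
    words = frozenset(dictionary)

    @lru_cache(maxsize=None)
    def solve(start):
        # Reaching the end of the text means every character is covered.
        if start == len(text):
            return ()
        # Enumerate all initial substrings of text[start:] in the dictionary.
        for end in range(start + 1, len(text) + 1):
            head = text[start:end]
            if head in words:
                rest = solve(end)
                if rest is not None:
                    return (head,) + rest
        return None

    result = solve(0)
    return None if result is None else list(result)


# "ab" is tried first but leaves "cd", which cannot be split, so the
# search backs up and selects "abc" instead.
print(segment("abcd", {"ab", "abc", "d"}))
```

The boundaries are then the positions after each returned substring. The sketch returns the first valid split it finds; how the patent ranks competing splits is not stated in the abstract.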

Patent
12 May 1999
TL;DR: In this article, the authors propose a method for the automated apprehension of textual information conveyed in an input string, where the input string is segmented to generate segments and/or semantical units, and then these subsets are combined to form a resulting semantic network.
Abstract: Scheme for the automated apprehension of textual information conveyed in an input string. The input string is segmented to generate segments and/or semantical units. The following steps are repeated for each segment in the input string until a subset for each segment in said input string is identified: a. identifying a matching semantical unit in a fractal hierarchical knowledge database of semantical units and pointers, said matching semantical units being deemed to be related to a segment of said input string, b. determining the fitness of said matching semantical unit by taking into consideration said semantical unit's associations, c. defining a subset of information related to said matching semantical unit within said fractal hierarchical knowledge database. Then these subsets are combined to form a resulting semantic network.

Book ChapterDOI
TL;DR: The experiments show that for the problem of searching many exact patterns in a fixed input string, the lazy top-down construction is often faster and more space efficient than other methods.
Abstract: We present an efficient implementation of a write-only top-down construction for suffix trees. Our implementation is based on a new, space-efficient representation of suffix trees which requires only 12 bytes per input character in the worst case, and 8.5 bytes per input character on average for a collection of files of different type. We show how to efficiently implement the lazy evaluation of suffix trees such that a subtree is evaluated not before it is traversed for the first time. Our experiments show that for the problem of searching many exact patterns in a fixed input string, the lazy top-down construction is often faster and more space efficient than other methods.

Patent
29 Oct 1999
TL;DR: In this paper, a serializer receives a data element for serialization, wherein the data element includes a class name string, and the serializer replaces it with a code having a smaller size than the class name to form a modified data element.
Abstract: A method and apparatus in a data processing system for serializing data. A serializer receives a data element for serialization, wherein the data element includes a class name string. Responsive to receiving the data element, the serializer replaces the class name string with a code having a smaller size than the class name string to form a modified data element. Responsive to forming the modified data element, the serializer serializes the modified data element. The serialized data is transmitted and deserialized by a deserializer, which replaces the code with the class name.
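The class-name substitution described here can be illustrated with a short round trip. This is a sketch under stated assumptions: the abstract does not name a wire format, so JSON is used purely for convenience, and the `codes` table, the dictionary layout of a data element, and the function names are hypothetical.

```python
import json


def serialize(elements, codes):
    """Replace each element's class name with its short code, then serialize.

    `codes` is an assumed mapping from class name string to a small integer.
    """
    modified = [[codes[e["class"]], e["fields"]] for e in elements]
    return json.dumps(modified)


def deserialize(payload, codes):
    """Parse the payload and restore the class name for each code."""
    names = {code: name for name, code in codes.items()}
    return [{"class": names[code], "fields": fields}
            for code, fields in json.loads(payload)]


codes = {"com.example.inventory.WarehouseItem": 0}  # hypothetical table
items = [{"class": "com.example.inventory.WarehouseItem", "fields": {"qty": 3}}]
payload = serialize(items, codes)
print(payload)                      # the long class name does not appear
print(deserialize(payload, codes))  # the class name is restored
```

Both sides must share the same code table for the substitution to be reversible; how that table is agreed on is outside what the abstract states.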

Patent
12 Oct 1999
TL;DR: In this article, a lossless bandwidth compression method for use in a distributed processor system for communicating graphical text data from a remote application server to a user workstation over a low bandwidth transport mechanism enables the workstation display to support the illusion that the application program is running locally rather than at the remote application server.
Abstract: A lossless bandwidth compression method for use in a distributed processor system for communicating graphical text data from a remote application server to a user workstation over a low bandwidth transport mechanism enables the workstation display to support the illusion that the application program is running locally rather than at the remote application server. At the application server, the graphical text data is represented by a string of glyphs, each glyph being a bit mask representing the foreground/background state of the graphical text data pixels. Each unique glyph is encoded by assigning a unique identification code (IDC). Each IDC is compared with the previous IDCs in the string and, if a match is found, the IDC is transmitted to the workstation. If a match with a prior IDC is not found, the IDC and the corresponding glyph pattern are transmitted to the workstation. At the workstation, the IDCs are queued in the order received while the glyph patterns are cached using the corresponding IDCs as addresses. The string of glyphs is reconstructed by using the queued IDCs in their natural order for accessing the cached glyph patterns as required to reproduce the original string of glyphs.
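The send-once-then-reference scheme in this abstract can be sketched in a few lines. This is an illustration only: glyph bit masks are modeled as `bytes` objects, identification codes are assigned as consecutive integers, and the message shape `(idc, pattern_or_None)` is an assumption rather than the patent's wire format.

```python
def encode(glyphs):
    """Server side: send the pattern only the first time a glyph appears."""
    idc_of = {}    # glyph pattern -> identification code (IDC)
    messages = []
    for glyph in glyphs:
        if glyph in idc_of:
            # Seen before: the IDC alone is enough.
            messages.append((idc_of[glyph], None))
        else:
            # First sighting: assign a new IDC and ship the pattern with it.
            idc = len(idc_of)
            idc_of[glyph] = idc
            messages.append((idc, glyph))
    return messages


def decode(messages):
    """Workstation side: queue IDCs in arrival order, cache patterns by IDC."""
    cache = {}
    queue = []
    for idc, pattern in messages:
        if pattern is not None:
            cache[idc] = pattern
        queue.append(idc)
    # Rebuild the original glyph string from the queued IDCs.
    return [cache[idc] for idc in queue]


glyphs = [b"\x18\x24", b"\x3c\x42", b"\x18\x24"]
messages = encode(glyphs)
print(messages)
print(decode(messages) == glyphs)
```

The saving comes from repeated glyphs: each repeat costs one short code instead of a full bit mask, and the reconstruction is exact, which is what makes the method lossless.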

Proceedings ArticleDOI
06 Jul 1999
TL;DR: An EP method with representation-specific variation operators is proposed and tested on several data sets; comparisons to other algorithms suggest that it is well suited to the multiple sequence alignment problem.
Abstract: Multiple sequence alignment can be used as a tool for the identification of common structure in an ordered string of nucleotides (in DNA or RNA) or amino acids (in proteins). Current multiple sequence alignment algorithms work well for sequences with high similarity but do not scale well when either the length or number of the sequences is large or if the similarity is low. The focus of the paper is to develop an evolutionary programming (EP) algorithm for multiple sequence alignment. An EP method with representation specific variation operators is proposed and tested on several data sets. Comparisons to other algorithms suggest that this algorithm is well suited to the multiple sequence alignment problem.

Patent
14 Apr 1999
TL;DR: In this paper, the deployment valve is positioned in a tubular string, such as casing, at a well bore depth at or preferably substantially below the string light point of the drilling string.
Abstract: Apparatus and methods for a deployment valve used with an underbalanced drilling system to enhance the advantages of underbalanced drilling. The underbalanced drilling system may typically comprise elements such as a rotating blow out preventer and drilling recovery system. The deployment valve is positioned in a tubular string, such as casing, at a well bore depth at or preferably substantially below the string light point of the drilling string. When the drilling string is above the string light point then the upwardly acting forces on the drilling string become greater than downwardly acting forces such that the drilling string begins to accelerate upwardly. The deployment valve has a bore sufficiently large to allow passage of the drill bit therethrough in the open position. The deployment valve may be closed when the drill string is pulled within the casing as may be necessary to service the drill string after drilling into a reservoir having a reservoir pressure. To allow the drill string to be removed from the casing, the pressure produced by the formation can be bled off and the drill string removed for servicing. The drill string can then be reinserted, the pressure in the casing above the deployment valve applied to preferably equalize pressure above and below the valve and the drill string run into the hole for further drilling.

Patent
Ya-Cherng Chu
24 Nov 1999
TL;DR: In this article, a tree structure representing word sequence(s) in the input string is constructed in an iterative manner, and each word of a dictionary is compared with the beginning of the working string.
Abstract: A system 100 is capable of segmenting a connected text, such as Japanese or Chinese sentence, into words. The system includes means 110 for reading an input string representing the connected text. Segmentation means 120 identifies at least one word sequence in the connected text by building a tree structure representing word sequence(s) in the input string in an iterative manner. Initially the input string is taken as a working string. Each word of a dictionary 122 is compared with the beginning of the working string. A match is represented by a node in the tree, and the process is continued with the remaining part of the input string. The system further includes means 130 for outputting at least one of the identified word sequences. A language model may be used to select between candidate sequences. Preferably the system is used in a speech recognition system to update the lexicon based on representative texts.

Patent
13 Apr 1999
TL;DR: In this article, a class bi-multigram model is proposed to generate a statistical class sequence model from input training strings of discrete-valued units, where bigram dependencies are assumed between adjacent variable length sequences of maximum length N units, and where class labels are assigned to the sequences.
Abstract: An apparatus generates a statistical class sequence model called a class bi-multigram model from input training strings of discrete-valued units, where bigram dependencies are assumed between adjacent variable length sequences of maximum length N units, and where class labels are assigned to the sequences. The number of times each sequence of units occurs is counted, as well as the number of times each pair of sequences of units co-occurs in the input training strings. An initial bigram probability distribution of all the pairs of sequences is computed as the number of times the two sequences co-occur, divided by the number of times the first sequence occurs in the input training string. Then, the input sequences are classified into a pre-specified desired number of classes. Further, an estimate of the bigram probability distribution of the sequences is calculated by using an EM algorithm to maximize the likelihood of the input training string computed with the input probability distributions. The above processes are then iteratively performed to generate the statistical class sequence model.
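The initialization step described in this abstract (pair count divided by the count of the first sequence) can be sketched as follows. This is a simplified illustration: it assumes the training string has already been cut into one fixed list of sequences, whereas the abstract counts variable-length sequences of up to N units; the function name is hypothetical, and the classification and EM steps are not shown.

```python
from collections import Counter


def initial_bigram_probs(sequences):
    """Estimate p(b | a) as count(a followed by b) / count(a)."""
    occurrences = Counter(sequences)
    # Adjacent pairs stand in for "co-occurring" sequences in this sketch.
    pairs = Counter(zip(sequences, sequences[1:]))
    return {(a, b): n / occurrences[a] for (a, b), n in pairs.items()}


training = ["a", "b", "a", "b", "a", "c"]
for pair, p in sorted(initial_bigram_probs(training).items()):
    print(pair, round(p, 3))
```

Because the final sequence in the training list has no successor, the estimates for a given first sequence need not sum to one; the later EM re-estimation in the abstract is what refines these initial values.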

Proceedings ArticleDOI
21 Mar 1999
TL;DR: This paper presents a new dynamic SPT algorithm that makes use of the structure of the previously computed SPT by recasting the SPT problem into an optimization problem in a dual linear programming framework, which can also be interpreted using a ball-and-string model.
Abstract: A key functionality in today's widely used interior gateway routing protocols such as OSPF and IS-IS involves the computation of a shortest path tree (SPT). In many existing commercial routers, the computation of an SPT is done from scratch following changes in the link states of the network. As there may coexist multiple SPTs in a network with a set of given link states, such recomputation of an entire SPT not only is inefficient but also causes frequent unnecessary changes in the topology of an existing SPT and creates routing instability. This paper presents a new dynamic SPT algorithm that makes use of the structure of the previously computed SPT. This algorithm is derived by recasting the SPT problem into an optimization problem in a dual linear programming framework, which can also be interpreted using a ball-and-string model. In this model, the increase (or decrease) of an edge weight in the tree corresponds to the lengthening (or shortening) of a string. By stretching the strings until each node is attached to a tight string, the resulting topology of the model defines an (or multiple) SPT(s). By emulating the dynamics of the ball-and-string model, we can derive an efficient algorithm that propagates changes in distances to all affected nodes in a natural order and in a most economical way. Compared with existing results, our algorithm has the best-known performance in terms of computational complexity as well as minimum changes made to the topology of an SPT. Rigorous proofs for correctness of our algorithm and simulation results illustrating its complexity are also presented.

Patent
04 Sep 1999
TL;DR: A method and system for locating an address of a space based upon latitude and longitude coordinates are presented, in which a global positioning satellite measures the coordinates of the space to be addressed and a device generates a unique variable string from the measured latitude and longitude coordinates.
Abstract: A method and system for locating an address of a space based upon latitude and longitude coordinates. The latitude and longitude positioning system includes a global positioning satellite for measuring the latitude and longitude coordinates of the space to be addressed and a device for generating a unique variable string based upon the measured latitude and longitude coordinates. The variable string is stored in a storage device and a device for selectively disseminating the variable string is provided for informing persons desirous of learning the location of the space of the variable address string. The variable string may be represented as a numerical string. A keypad is also provided connected to the generating device for inputting data related to special features related to the location of the space such as a height of the space and floor on which the space is located within a structure. The dissemination device includes a modem for connection with a telephone line for transmitting the address string to a party on an opposite end of a telephone communication channel established on the telephone line, a send button for initiating transmission of the variable string across the telephone line and an emergency button for initiating a telephone call by the modem to an emergency services station for transmission of the variable string across the telephone line to the emergency services station.

Proceedings Article
01 Jan 1999
TL;DR: Lanctot et al. present a collection of string algorithms that are at the core of several biological problems such as discovering potential drug targets, creating diagnostic probes, universal primers or unbiased consensus sequences.
Abstract: This paper presents a collection of string algorithms that are at the core of several biological problems such as discovering potential drug targets, creating diagnostic probes, universal primers or unbiased consensus sequences. All these problems reduce to the task of finding a pattern that, with some error, occurs in one set of strings (Closest Substring Problem) and does not occur in another set (Farthest String Problem). In this paper, we break down the problem into several subproblems and prove the following results. 1. The following are all NP-Hard: the Farthest String Problem, the Closest Substring Problem, and the Closest String Problem of finding a string that is close to each string in a set. 2. There is a PTAS for the Farthest String Problem based on a linear programming relaxation technique. 3. There is a polynomial-time (4/3 + ε)-approximation algorithm for the Closest String Problem for any small constant ε > 0. Using this algorithm, we also provide an efficient heuristic algorithm for the Closest Substring Problem. 4. The problem of finding a string that is at least Hamming distance d from as many strings in a set as possible cannot be approximated within n^ε in polynomial time for some fixed constant ε unless NP = P, where n is the number of strings in the set. 5. There is a polynomial-time 2-approximation for finding a string that is both the Closest Substring to one set, and the Farthest String from another set. An extended abstract of this paper appeared in Proceedings of the 10th Annual ACM-SIAM Symposium on Discrete Algorithms.
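To make the Closest String objective concrete, the sketch below finds a string that minimizes the largest Hamming distance to every string in a set. It is an exhaustive search over all candidates, so it runs in exponential time and only illustrates the problem definition; it is not the approximation algorithm from the paper. The function names and the sample input are assumptions for the example.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def closest_string(strings, alphabet):
    """Brute force: try every string of the right length over the alphabet."""
    length = len(strings[0])
    best = None
    for candidate in product(alphabet, repeat=length):
        # The radius is the distance to the farthest string in the set.
        radius = max(hamming(candidate, s) for s in strings)
        if best is None or radius < best[1]:
            best = ("".join(candidate), radius)
    return best


print(closest_string(["aab", "abb", "bab"], "ab"))
```

The search space has |alphabet|^length candidates, which is why the NP-hardness result and the approximation guarantees listed in the abstract matter in practice.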

Patent
29 Dec 1999
TL;DR: In this paper, a highly accurate technique for recognizing spoken digit strings is described, in which a spoken digit string is received and analyzed by a speech recognizer, which generates a list of hypothesized digit strings arranged in ranked order based on a likelihood of matching the spoken string.
Abstract: A highly accurate technique for recognizing spoken digit strings is described. A spoken digit string is received (14) and analyzed by a speech recognizer (18), which generates a list of hypothesized digit strings arranged in ranked order (16) based on a likelihood of matching the spoken digit string (20). The individual hypothesized strings are then analyzed in order beginning with the hypothesized string having the greatest likelihood of matching the spoken string to determine whether they satisfy a given constraint. The first hypothesized string in the list satisfying the constraint is selected as the recognized string (22).
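The selection step in this abstract can be sketched as a scan over the ranked hypothesis list that stops at the first string passing the constraint. The abstract does not say what the constraint is, so the checksum rule below (last digit equals the sum of the other digits modulo 10) is a hypothetical stand-in, as are the function names and the sample hypotheses.

```python
def first_valid(hypotheses, constraint):
    """Return the highest-ranked hypothesis that satisfies the constraint.

    `hypotheses` is assumed to be sorted from most to least likely.
    """
    for digits in hypotheses:
        if constraint(digits):
            return digits
    return None


def checksum_ok(digits):
    """Hypothetical constraint: last digit is the sum of the others mod 10."""
    if len(digits) < 2 or not digits.isdigit():
        return False
    *body, check = (int(c) for c in digits)
    return sum(body) % 10 == check


ranked = ["4121", "4127", "4727"]  # recognizer output, best guess first
print(first_valid(ranked, checksum_ok))
```

The top hypothesis fails the check here, so the second one is chosen; this is how a constraint can correct a recognizer whose best guess is wrong but whose list still contains the right string.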

Patent
24 Mar 1999
TL;DR: In this article, a dynamic shortest path tree (SPT) algorithm for a router determines a new SPT for a root node in response to a link-state or other network topology change.
Abstract: A dynamic shortest path tree (SPT) algorithm for a router determines a new SPT for a root node in response to a link-state or other network topology change. The dynamic SPT algorithm determines the new SPT as an optimization problem in a linear programming framework based on an existing SPT in the router. The dynamic SPT algorithm emulates maximum decrement of a ball and string model by iteratively selecting nodes of the existing SPT for consideration and update of parent node, child nodes, and distance attributes based on the maximum decrement. For the maximum decrement, a node in the existing SPT is selected by each iteration based on the greatest potential decrease (or least increase) in its distance attribute. The ball and string model that may be employed for the dynamic SPT algorithm represents a network of nodes and links with a ball representing a node and a string representing a link or edge. The length of a string is defined by its link's weight. The set of strings connecting the balls defines a path between the root node and a particular node. The shortest path is the path defined by the strings from a root node to a particular node that are tight. For the dynamic SPT algorithm, an increase (or decrease) in an edge weight in an existing SPT corresponds to a lengthening (or shortening) of a string. By sequentially pulling balls away in a single direction from the ball of the root node, the new SPT becomes defined by the balls and tight strings.

Journal ArticleDOI
TL;DR: This paper develops significantly faster algorithms for a special class of strings which emerge frequently in pattern matching problems, and compares this run-length encoded string against the ith row or column of each of the character image-models.

Journal ArticleDOI
TL;DR: The investigation described in this report documents the types of behavior that take place in the studios of nationally and/or regionally acclaimed string teachers whose instruction is based on the...
Abstract: The investigation described in this report documents the types of behavior that take place in the studios of nationally and/or regionally acclaimed string teachers whose instruction is based on the...

Patent
Anders Heie
28 May 1999
TL;DR: In this paper, a user interface is provided that permits a replacement of text to be made if a defined term is detected, followed by further replacements of text if any defined terms are found within the first replacement text.
Abstract: A translator for wirelessly provided messages. A user interface is provided that permits a replacement of text to be made if a defined term is detected, followed by further replacements of text if any defined terms are found within the first replacement text. Replacement of text may also be staged, such that a first part of a replacement may be made, suspending the replacement until a condition is met. A user may fulfill the condition by entering in a string including a delimiter, whereupon a second part of the replacement is completed.

Journal ArticleDOI
TL;DR: This study surveyed full-time string music education professors at 17 universities to identify teacher strategies for attracting school orchestra students to string teaching.
Abstract: The objective of this study was to identify teacher strategies for attracting school orchestra students to string teaching. Full-time string music education professors at 17 universities surveyed t...

Patent
Jason Zien
19 Jan 1999
TL;DR: In this paper, a method, system, and article of manufacture for generating a list of candidate objects for a requested object is presented, wherein an identifier for the desired object is accepted, wherein the identifier comprises a target string.
Abstract: A method, system, and article of manufacture for generating a list of candidate objects for a requested object. An identifier for the requested object is accepted, wherein the identifier comprises a target string. A list of candidate objects is generated when the requested object cannot be found by performing a hierarchical string match for the target string against a set of source strings using multi-path dynamic programming, wherein the set of source strings represent a set of objects from which the list of candidate objects is generated.

Patent
09 Mar 1999
TL;DR: An apparatus and method for displaying a page of a user interface of a computer system in multiple languages are presented, in which a memory object representing the page holds a set of variables, and a processor selectively associates the variables with the string values that correspond to the language selected by the user.
Abstract: An apparatus and method for displaying a page of a user interface of a computer system in multiple languages. The computer system includes a memory that contains a plurality of string values that correspond to text in different languages. A memory object represents a page of the user interface and includes a set of variables formatted to be associated with string values. A processor selectively associates the variables with the string values that correspond to the language selected by the user.