scispace - formally typeset
Topic

String (computer science)

About: String (computer science) is a research topic. Over its lifetime, 19,430 publications have been published on this topic, receiving 333,247 citations. The topic is also known as: str & s.


Papers
Patent
27 May 1998
TL;DR: This patent presents a device and method for converting the product-specific identification numbers associated with bar code indicia on pharmaceutical products into an industry-standard identification number; the device can include a removable member for interchanging and updating bar code indicia information rather than reprogramming the device.
Abstract: A device and method are provided for converting product-specific identification numbers associated with bar code indicia on pharmaceutical products into an industry-standard identification number. The process involves reading the bar code indicia, converting the indicia into an input string, and standardizing the input string by adding or subtracting characters according to rules based on the bar code type and the length of the input string. By means of the invention, pharmaceutical products from two different sources can be compared to determine whether they contain the same drug, as determined by the standard identification number. The device can include a removable member for interchanging and updating bar code indicia information rather than reprogramming the device.
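The standardization step can be sketched as a rule table keyed by bar code type and input length, where each rule says how many characters to strip and how far to zero-pad. This is a minimal Python illustration under stated assumptions: the types, lengths and values in `RULES` are hypothetical and are not the patent's rules (nor any official pharmaceutical coding scheme). The point is only that two scans from different sources compare equal once standardized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    strip_leading: int = 0   # characters removed from the front (e.g. a number-system digit)
    strip_trailing: int = 0  # characters removed from the end (e.g. a check digit)
    pad_to: int = 0          # left-pad with zeros to this length; 0 means no padding


# Hypothetical rule table keyed by (bar code type, input length).
# These values are illustrative only; they are NOT the patent's rules.
RULES = {
    ("UPC-A", 12): Rule(strip_leading=1, strip_trailing=1, pad_to=11),
    ("EAN-13", 13): Rule(strip_leading=2, strip_trailing=1, pad_to=11),
    ("CODE128", 10): Rule(pad_to=11),
}


def standardize(barcode_type: str, raw: str) -> str:
    """Turn a decoded bar code input string into a standard identifier."""
    rule = RULES.get((barcode_type, len(raw)))
    if rule is None:
        raise ValueError(f"no rule for {barcode_type} input of length {len(raw)}")
    # Subtract characters according to the rule...
    core = raw[rule.strip_leading:len(raw) - rule.strip_trailing]
    # ...then add characters (leading zeros) to reach the standard length.
    return core.zfill(rule.pad_to) if rule.pad_to else core


def same_product(a: tuple[str, str], b: tuple[str, str]) -> bool:
    """Compare two (type, raw) scans by their standardized identifiers."""
    return standardize(*a) == standardize(*b)
```

Keeping the rules in data rather than code loosely mirrors the abstract's idea of updating bar code information without reprogramming the device.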

66 citations

Patent
13 Jul 1994
TL;DR: This patent discloses a method of making a speech recognition model database, which is formed from a training string utterance signal and a plurality of sets of current speech recognition models.
Abstract: A method of making a speech recognition model database is disclosed. The database is formed based on a training string utterance signal and a plurality of sets of current speech recognition models. The sets of current speech recognition models may include acoustic models, language models, and other knowledge sources. In accordance with an illustrative embodiment of the invention, a set of confusable string models is generated, each confusable string model comprising speech recognition models from two or more sets of speech recognition models (such as acoustic and language models). A first scoring signal is generated based on the training string utterance signal and a string model for that utterance, wherein the string model for the utterance comprises speech recognition models from two or more sets of speech recognition models. One or more second scoring signals are also generated, wherein a second scoring signal is based on the training string utterance signal and a confusable string model. A misrecognition signal is generated based on the first scoring signal and the one or more second scoring signals. Current speech recognition models are modified, based on the misrecognition signal, to increase the probability that a correct string model will be ranked higher than other confusable string models.
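The abstract does not give formulas for the scoring or misrecognition signals. A common choice in discriminative training of this kind is the minimum classification error (MCE) misrecognition measure, d = -g_0 + (1/eta) log[(1/N) sum_k exp(eta g_k)], where g_0 scores the correct string model and g_1..g_N score the N confusable string models. The Python sketch below assumes that form, with log-likelihood-like scores (larger is better) and a sigmoid loss with slope gamma. The function names and parameters are illustrative, not taken from the patent, and the model-update step is not shown.

```python
import math
from typing import Sequence


def misrecognition(correct_score: float, competitor_scores: Sequence[float],
                   eta: float = 1.0) -> float:
    """MCE-style misrecognition measure (an assumed form, not the patent's).

    A positive value means the confusable string models outscore the
    correct string model, i.e. the utterance tends to be misrecognized.
    """
    if not competitor_scores:
        raise ValueError("need at least one confusable string model score")
    n = len(competitor_scores)
    m = max(competitor_scores)
    # Soft maximum of competitor scores, computed with the log-sum-exp trick
    # for numerical stability; it approaches max() as eta grows.
    soft_max = m + math.log(sum(math.exp(eta * (g - m)) for g in competitor_scores) / n) / eta
    return -correct_score + soft_max


def mce_loss(d: float, gamma: float = 1.0) -> float:
    """Smooth 0/1 loss: a numerically stable sigmoid of the misrecognition measure."""
    z = gamma * d
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

Because the loss is differentiable in the scores, its gradient can drive the model modification the abstract describes; with a single competitor, the soft maximum reduces to that competitor's score.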

66 citations

Journal Article
TL;DR: This work claims that the traditional implementations of strings, and often the supported functionality, are not well suited to general‐purpose use and presents ‘ropes’ or ‘heavyweight’ strings as an alternative that leads to systems that are more robust, both in functionality and in performance.
Abstract: Programming languages generally provide a ‘string’ or ‘text’ type to allow manipulation of sequences of characters. This type is usually of crucial importance, since it is normally mentioned in most interfaces between system components. We claim that the traditional implementations of strings, and often the supported functionality, are not well suited to such general-purpose use. They should be confined to applications with specific, and unusual, performance requirements. We present ‘ropes’ or ‘heavyweight’ strings as an alternative that, in our experience, leads to systems that are more robust, both in functionality and in performance. Ropes have been in use in the Cedar environment almost since its inception, but this appears to be neither well known nor discussed in the literature. The algorithms have been gradually refined. We have also recently built a second, similar but somewhat lighter-weight, C-language implementation, which is included in our publicly released garbage collector distribution. We describe the algorithms used in both, and give some performance measurements for the C version.
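The abstract does not spell out the rope algorithms, so the following is only a minimal Python sketch of the general idea, assuming a plain binary concatenation tree. Leaves hold flat text, interior nodes cache their length, concatenation allocates one node instead of copying characters, and substrings share untouched subtrees. The names `Leaf`, `Concat`, `concat`, `char_at`, `substring` and `to_str` are hypothetical, and the sketch omits the rebalancing and other refinements a production rope would need.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Leaf:
    text: str

    @property
    def length(self) -> int:
        return len(self.text)


@dataclass(frozen=True)
class Concat:
    left: Rope
    right: Rope
    length: int  # cached total length of both children


Rope = Union[Leaf, Concat]


def concat(a: Rope, b: Rope) -> Rope:
    """O(1) concatenation: build a new node instead of copying characters."""
    if a.length == 0:
        return b
    if b.length == 0:
        return a
    return Concat(a, b, a.length + b.length)


def char_at(r: Rope, i: int) -> str:
    """Walk down the tree, steering by the cached left-child lengths."""
    if not 0 <= i < r.length:
        raise IndexError(i)
    while isinstance(r, Concat):
        if i < r.left.length:
            r = r.left
        else:
            i -= r.left.length
            r = r.right
    return r.text[i]


def substring(r: Rope, start: int, end: int) -> Rope:
    """Rope for r[start:end] (0 <= start <= end <= length), sharing whole subtrees."""
    if start <= 0 and end >= r.length:
        return r
    if isinstance(r, Leaf):
        return Leaf(r.text[start:end])
    split = r.left.length
    left = substring(r.left, start, min(end, split)) if start < split else Leaf("")
    right = substring(r.right, max(start - split, 0), end - split) if end > split else Leaf("")
    return concat(left, right)


def to_str(r: Rope) -> str:
    """Flatten left to right with an explicit stack (no deep recursion)."""
    parts, stack = [], [r]
    while stack:
        node = stack.pop()
        if isinstance(node, Leaf):
            parts.append(node.text)
        else:
            stack.append(node.right)
            stack.append(node.left)
    return "".join(parts)
```

For example, `concat(concat(Leaf("Hello, "), Leaf("rope")), Leaf(" world"))` builds a 17-character rope without copying any text, and taking a substring of it reuses the `Leaf("rope")` node rather than duplicating it.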

66 citations

Book Chapter
18 Jun 2008
TL;DR: This paper substantially improves the bound on the number of occurrences of maximal repetitions (runs) in a string of length n, using a combination of theory and computer verification.
Abstract: The "runs" conjecture, proposed by [Kolpakov and Kucherov, 1999], states that the number of occurrences of maximal repetitions (runs) in a string of length n is at most n. The best bound to date, due to [Crochemore and Ilie, 2007], is 1.6n. Here we substantially improve this bound using a combination of theory and computer verification. Our best bound is 1.048n, but actually solving the conjecture now seems to be only a matter of time.
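To make the counted quantity concrete, here is a brute-force Python sketch that enumerates the runs of a string. It is not the paper's method, which combines theory with computer verification. For each candidate period p, it finds maximal stretches where s[k] == s[k+p] and keeps those spanning at least 2p characters. A run is identified by its interval, and a set removes intervals found under more than one period (e.g. "aaaa" has periods 1 and 2). The function name `runs` is illustrative; it takes O(n²) time and is only meant for checking small strings against the bound n.

```python
def runs(s: str) -> set[tuple[int, int]]:
    """Return the runs of s as inclusive (start, end) intervals, by brute force."""
    n = len(s)
    found: set[tuple[int, int]] = set()
    for p in range(1, n // 2 + 1):
        k = 0
        while k < n - p:
            if s[k] != s[k + p]:
                k += 1
                continue
            # Maximal block a..b of positions where s[k] == s[k + p]:
            # s[a .. b + p] then has period p and length (b - a + 1) + p.
            a = k
            while k < n - p and s[k] == s[k + p]:
                k += 1
            b = k - 1
            # A run must contain at least two full periods.
            if b - a + 1 >= p:
                found.add((a, b + p))
    return found
```

For instance, "aabaab" has three runs: the two "aa" blocks and the whole string with period 3, comfortably within the conjectured bound of n = 6.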

66 citations

Proceedings Article
01 Jan 2002
TL;DR: This paper describes the question answering approach first applied at the TREC-10 QA track and developed systematically in the TREC 2002 experiments. It is based on the assumption that answers can be identified by their correspondence to formulas describing the structure of strings that carry certain (generalized) semantics implied by the question type.
Abstract: The paper describes the question answering approach first applied at the TREC-10 QA track and developed systematically in the TREC 2002 experiments. The approach is based on the assumption that answers can be identified by their correspondence to formulas describing the structure of strings that carry certain (generalized) semantics implied by the question type. These formulas, or patterns, are like regular expressions but include elements corresponding to predefined lists of terms. Complex patterns can be constructed from blocks corresponding to semantic entities such as names of persons or organizations, posts, dates, and locations. Combining various blocks and intermediate syntactic elements makes it possible to build a great variety of patterns. The exact position of the elements corresponding to the "exact answer" was localized within the structure of each pattern. Because each pattern is characterized by generalized semantics, a string matching the pattern must be checked for correlation with the question terms and/or their synonyms or substitutes.

Essentials of the Approach

In the TREC 2002 QA track tests, we further developed the approach described in [Soubbotin, 2001]. In general, our method belongs to the family of approaches that examine the potential of information extraction (IE) for question answering tasks [Srihari, Wei Li, 1999; De Boni, 2001]. The evolution of IE systems, as represented in particular at the Message Understanding Conferences (MUCs), shows a shift from deep text analysis based on computational linguistics and NLP methods to surface techniques [Eagles, 1998]. Our approach is in line with this tendency. More specifically, it is based on formulas describing the structure of strings likely to bear certain semantic information.
For example, the string "FBI Director Louis Freeh" can be recognized, according to one such formula, as likely bearing the following information: a person, represented by his or her first and last names, occupies a (leading) post in an organization. The formula for this string is: a word composed of capital letters; an item from the list of posts in an organization; an item from the list of first names; a capitalized word. We can mark the first two items in this formula as the "exact answer" if we want to answer the question "Who is Louis Freeh?", and the last two items if the question is "Who is FBI head?" (question 1583 at TREC 2002). When first used at the TREC-10 QA track, formulas of this kind were called "patterns" [Soubbotin M.M and Soubbotin S.M, 2001]. The term "pattern" is widely used in the field of information extraction. Our concept of patterns as structural formulas for strings obviously differs from that of the "traditional" IE field, but, keeping this difference in mind, we consider the term convenient. Each pattern is characterized by a certain generalized semantics, because the formula's items refer to semantic categories (e.g., "posts") rather than to specific semantic units (e.g., "president", "head", "director"). Therefore, after a string corresponding to a formula is recognized, the next step is to identify the question terms (or their synonyms or substitutes) within it or in its surroundings. To increase the likelihood of getting the right answer, the surroundings of the found string must be checked for expressions that negate its semantics (e.g., "former", "-elect", "deputy", located before or after the term from the list of posts). Once a question's type is defined (e.g., a question about a person occupying a certain post in an organization, about the husband, wife, or relative of a person, or about an acronym), a set of formulas prepared for this type is applied to match strings in question-relevant passages.
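The "FBI Director Louis Freeh" formula can be approximated with an ordinary regular expression whose list elements are alternations over predefined terms. The Python sketch below assumes small hypothetical lists (`POSTS`, `FIRST_NAMES`, `NEGATORS`) in place of the paper's, marks the formula's items with named groups so either pair can serve as the "exact answer", and, as a simplification, checks for negating expressions only in a fixed window before the match.

```python
import re

# Hypothetical term lists; the paper's actual lists are not reproduced here.
POSTS = ["Director", "President", "Chairman", "Head"]
FIRST_NAMES = ["Louis", "John", "Mary"]
NEGATORS = ["former", "deputy"]


def one_of(words: list[str]) -> str:
    """Regex alternation matching exactly one item from a predefined list."""
    return "(?:" + "|".join(re.escape(w) for w in words) + ")"


# Formula: a word of capital letters; a post; a first name; a capitalized word.
PERSON_POST_ORG = re.compile(
    r"\b(?P<org>[A-Z]{2,})\s+"
    + r"(?P<post>" + one_of(POSTS) + r")\s+"
    + r"(?P<first>" + one_of(FIRST_NAMES) + r")\s+"
    + r"(?P<last>[A-Z][a-z]+)\b"
)


def match_person_post(text: str, window: int = 30) -> list[re.Match]:
    """Return matches whose preceding context contains no negating expression."""
    found = []
    for m in PERSON_POST_ORG.finditer(text):
        before = text[max(0, m.start() - window):m.start()].lower()
        if not any(neg in before for neg in NEGATORS):
            found.append(m)
    return found


def exact_answer(m: re.Match, asked_about_person: bool) -> str:
    """Pick which items of the formula form the 'exact answer'."""
    if asked_about_person:  # "Who is Louis Freeh?" -> first two items
        return f"{m.group('org')} {m.group('post')}"
    return f"{m.group('first')} {m.group('last')}"  # "Who is FBI head?" -> last two items
```

Checking the question terms or their synonyms against the match (for example, treating "head" as a substitute for "Director") would be a further step not shown here.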
Our approach does not need to distinguish linguistic entities in the text. We handle the source text strictly as a string, i.e., as consisting only of characters. The patterns used in our QA approach aim only to recognize sequences of elements that correspond to the predefined formulas. As surface patterns, our formulas for strings are similar to wrappers [Adams, 2001; Kushmerick, 2000] and look like regular expressions. However, the patterns used by wrapper techniques are mostly resource-specific: they relate to document formats rather than to the ways information is presented in written texts per se. As for the difference from regular expressions, the patterns we use include elements referring to lists of predefined words and phrases. Surface approaches are currently receiving increased attention in QA, and some recent publications discuss surface patterns similar to ours [Magnini, et al., 2002; Brill, et al., 2002; Brill, et al., 2001; Ravichandran and Hovy, 2002; Hovy et al., 2002].

Patterns and Question Types

The IE task, as presented at its main forum, the Message Understanding Conferences (MUCs), focuses on certain topics or domains (terrorism, management successions, natural disasters, outbreaks of infectious diseases, etc.). The QA task requires another way to categorize the information addressed. The usual practice of TREC QA track participants is to predefine a set of potential question types. The questions accumulated from several TRECs are a good source for defining question types at a more or less detailed level. The paradigm of "information categories" defined by question types (in contrast to the "topic/domain" paradigm) makes it possible to create a variety of patterns systematically, based on the potential semantic relationships within each question category. So, for the question type "Who is person X?", we can presuppose several main alternatives: this person is known for the (top-level) position he or she occupies in an organization, company, or government; for his or her contributions as an author, inventor, founder, etc.; as an outstanding figure in a professional area; as the wife, husband, or relative of a well-known person; or as someone involved in a well-known event (e.g., as a criminal or perpetrator). In each case, a relationship is established between two or more entities: person, post, and organization or company; author and work; and so on. The same entities are present when Who-questions refer to posts, authors, etc. (e.g., "Who occupies the post Y in the organization Z?"). For most Where-questions, we can suggest geographical items as answers. This is achieved by constructing structural formulas such as: an item from the list of cities, towns, counties, etc.; a comma; an item from the list of countries or states. Other question types suggest combinations of digits with units of measurement or currency names as answers. The completeness of the lists corresponding to "semantic" pattern elements is evidently important (e.g., the list of currencies must include infrequently used words such as "dlrs"). The type of the processed question is determined based both on its interrogative word and on the presence of words or expressions included in the list of characteristic terms for the corresponding question type.
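Question typing and the Where and quantity formulas can be sketched the same way. In this Python illustration the question-type table, the term lists and the helper names are all hypothetical. A question is typed by its leading interrogative plus characteristic terms, and the formulas prepared for that type (city, comma, country or state; digits followed by a currency name) are then applied to a passage.

```python
import re
from typing import List, Optional

# Hypothetical lists and question-type table, for illustration only.
CITIES = ["Paris", "Springfield", "Austin"]
REGIONS = ["France", "Illinois", "Texas"]
CURRENCIES = ["dollars", "dlrs", "euros"]  # includes the rare abbreviation "dlrs"

# (question type, interrogative, characteristic terms); an empty list means "any".
QUESTION_TYPES = [
    ("location", "where", []),
    ("money", "how much", ["cost", "price", "pay"]),
    ("person_post", "who", ["director", "president", "head"]),
]


def one_of(words: List[str]) -> str:
    """Regex alternation matching exactly one item from a predefined list."""
    return "(?:" + "|".join(re.escape(w) for w in words) + ")"


FORMULAS = {
    # item from the city list; comma; item from the country/state list
    "location": re.compile(r"\b" + one_of(CITIES) + r",\s*" + one_of(REGIONS) + r"\b"),
    # digits followed by a currency name
    "money": re.compile(r"\b\d+(?:[.,]\d+)*\s+" + one_of(CURRENCIES) + r"\b"),
}


def classify(question: str) -> Optional[str]:
    """Use the interrogative plus characteristic terms to pick a question type."""
    q = question.lower()
    for qtype, interrogative, terms in QUESTION_TYPES:
        if q.startswith(interrogative) and (not terms or any(t in q for t in terms)):
            return qtype
    return None


def candidate_answers(question: str, passage: str) -> List[str]:
    """Apply the formulas prepared for the question's type to a passage."""
    pattern = FORMULAS.get(classify(question) or "")
    return [m.group(0) for m in pattern.finditer(passage)] if pattern else []
```

Here "How much did the deal cost?" is typed as a money question, so only the digits-plus-currency formula is applied to the passage; omitting "dlrs" from `CURRENCIES` would make that formula miss "40 dlrs", which illustrates why the completeness of the lists matters.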

66 citations


Network Information
Related Topics (5)
Time complexity
36K papers, 879.5K citations
88% related
Tree (data structure)
44.9K papers, 749.6K citations
86% related
Graph (abstract data type)
69.9K papers, 1.2M citations
85% related
Computational complexity theory
30.8K papers, 711.2K citations
82% related
Supervised learning
20.8K papers, 710.5K citations
80% related
Performance
Metrics
No. of papers in the topic in previous years
Year | Papers
2022 | 2
2021 | 491
2020 | 704
2019 | 759
2018 | 816
2017 | 806