Topic

String (computer science)

About: String (computer science) is a research topic. Over the lifetime, 19,430 publications have been published within this topic, receiving 333,247 citations. The topic is also known as: str & s.


Papers
Patent
Eric D. Brill
29 Jul 2003
TL;DR: In this patent, a linguistic disambiguation system and method creates a knowledge base by training on patterns in strings that contain ambiguity sites; the string patterns are described by a set of reduced regular expressions (RREs) or very reduced regular expressions (VRREs).
Abstract: A linguistic disambiguation system and method creates a knowledge base by training on patterns in strings that contain ambiguity sites. The string patterns are described by a set of reduced regular expressions (RREs) or very reduced regular expressions (VRREs). The knowledge base utilizes the RREs or VRREs to resolve ambiguity based upon the strings in which the ambiguity occurs. The system is trained on a training set, such as a properly labeled corpus. Once trained, the system may then apply the knowledge base to raw input strings that contain ambiguity sites. The system uses the RRE- and VRRE-based knowledge base to disambiguate the sites.
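
The patent does not publish a concrete rule format, but the flavor of a pattern-based knowledge base can be sketched roughly as follows (Python; the rules, readings, and *SITE* marker are invented for illustration, and the real RRE/VRRE machinery is considerably richer). In the patented system such rules would be learned from a labeled corpus rather than written by hand.

import re

# Hypothetical knowledge base: each rule pairs a regular expression over
# the context of an ambiguity site with the reading that the training
# corpus most often supported for that pattern.
RULES = [
    (re.compile(r"\bto \*SITE\*"),  "verb"),   # e.g. "to record" -> verb reading
    (re.compile(r"\bthe \*SITE\*"), "noun"),   # e.g. "the record" -> noun reading
]

def disambiguate(sentence, site_word, default="noun"):
    # Mark the ambiguity site, then return the reading suggested by the
    # first matching context pattern.
    marked = sentence.replace(site_word, "*SITE*", 1)
    for pattern, reading in RULES:
        if pattern.search(marked):
            return reading
    return default

print(disambiguate("I want to record the song", "record"))      # verb
print(disambiguate("She broke the record yesterday", "record")) # noun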

80 citations

Journal ArticleDOI
25 Oct 2009
TL;DR: PRuby is presented, an extension to Diamondback Ruby (DRuby), a static type inference system for Ruby; PRuby augments DRuby with a novel dynamic analysis and transformation that allows precise typing of uses of highly dynamic constructs.
Abstract: Many popular scripting languages such as Ruby, Python, and Perl include highly dynamic language constructs, such as an eval method that evaluates a string as program text. While these constructs allow terse and expressive code, they have traditionally obstructed static analysis. In this paper we present PRuby, an extension to Diamondback Ruby (DRuby), a static type inference system for Ruby. PRuby augments DRuby with a novel dynamic analysis and transformation that allows us to precisely type uses of highly dynamic constructs. PRuby's analysis proceeds in three steps. First, we use run-time instrumentation to gather per-application profiles of dynamic feature usage. Next, we replace dynamic features with statically analyzable alternatives based on the profile. We also add instrumentation to safely handle cases when subsequent runs do not match the profile. Finally, we run DRuby's static type inference on the transformed code to enforce type safety. We used PRuby to gather profiles for a benchmark suite of sample Ruby programs. We found that dynamic features are pervasive throughout the benchmarks and the libraries they include, but that most uses of these features are highly constrained and hence can be effectively profiled. Using the profiles to guide type inference, we found that DRuby can generally statically type our benchmarks modulo some refactoring, and we discovered several previously unknown type errors. These results suggest that profiling and transformation is a lightweight but highly effective approach to bring static typing to highly dynamic languages.
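
PRuby itself works on Ruby source; as a rough, language-shifted sketch of the profile-then-transform idea, the Python snippet below records what a dynamic eval call actually receives and then shows how that call site could be rewritten into a statically analyzable dispatch with a run-time guard (every name here is invented for illustration):

profile = set()

def instrumented_eval(code_string):
    # Step 1: record every string actually passed to eval at run time.
    profile.add(code_string)
    return eval(code_string)

# Step 2: after profiling, a dynamic call site such as
#     instrumented_eval("obj." + attr + "()")
# can be rewritten into a dispatch over the cases the profile observed,
# with a guard for inputs the profile never saw.
def profiled_dispatch(obj, attr):
    if attr == "start":   # case observed during profiling
        return obj.start()
    if attr == "stop":    # case observed during profiling
        return obj.stop()
    raise RuntimeError(f"unprofiled dynamic call: {attr!r}")

# Step 3: a static type checker can now analyze profiled_dispatch,
# since it no longer contains eval.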

80 citations

Patent
03 Jan 1992
TL;DR: In this patent, a method and apparatus for extracting multi-word technical terms from a text file in a computer system is presented, where word strings that occur fewer than a specified minimum number of times in the text file are deleted.
Abstract: A method and apparatus for extracting multi-word technical terms from a text file in a computer system. Word strings are selected from the text that have at least two words, that have at most a specified maximum number of words, that include none of a special set of selected tokens, and that only include selected characters. Word strings which occur fewer than a specified minimum number of times in the text file are deleted. The remaining strings form a set of word strings very likely to be multi-word technical terms. The quality of the set of word strings can be improved by deleting word strings which do not satisfy certain grammatical constraints.
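
The selection and filtering steps translate almost directly into code. A minimal sketch follows, with an illustrative stop-token list, character filter, and thresholds standing in for whatever the patent actually specifies:

import re
from collections import Counter

STOP_TOKENS = {"the", "a", "an", "of", "and", "or", "in", "to", "is"}
WORD_RE = re.compile(r"^[A-Za-z][A-Za-z\-]*$")   # only selected characters

def extract_terms(text, max_len=4, min_count=2):
    words = text.lower().split()
    counts = Counter()
    for n in range(2, max_len + 1):                      # at least two words
        for i in range(len(words) - n + 1):
            window = words[i:i + n]
            if any(w in STOP_TOKENS for w in window):    # exclude stop tokens
                continue
            if not all(WORD_RE.match(w) for w in window):
                continue
            counts[" ".join(window)] += 1
    # delete word strings occurring fewer than min_count times
    return {term: c for term, c in counts.items() if c >= min_count}

sample = ("regular expression matching is fast and regular expression "
          "matching uses finite automata")
print(extract_terms(sample))
# {'regular expression': 2, 'expression matching': 2, 'regular expression matching': 2}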

80 citations

Book ChapterDOI
28 Mar 1998
TL;DR: A new technique is presented that allows the application of well-known and established interprocedural analysis theory to loops; it is implemented in the Program Analyzer Generator PAG, which is used to demonstrate the findings by applying the techniques to several real-world programs.
Abstract: Programs spend most of their time in loops and procedures. Therefore, most program transformations and the necessary static analyses deal with these. It has long been recognized that different execution contexts for procedures may induce different execution properties. There are well-established techniques for interprocedural analysis, like the call string approach. Loops have not received similar attention in the area of data flow analysis and abstract interpretation. All executions are treated in the same way, although typically the first and later executions may exhibit very different properties. In this paper a new technique is presented that allows the application of the well-known and established interprocedural analysis theory to loops. It turns out that the call string approach has limited flexibility in its possibilities to group several calling contexts together for the analysis. An extension to overcome this problem is presented that relies on a similar approach but gives more useful results in practice. The classical and the new techniques are implemented in our Program Analyzer Generator PAG, which is used to demonstrate our findings by applying the techniques to several real-world programs.
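
As a toy illustration of the call-string machinery the paper builds on: analysis facts are kept per calling context, where a context is a bounded string of the most recent call sites. The paper's contribution is to reuse this bookkeeping for loop iteration contexts (for example, separating the first iteration from all later ones); the sketch below shows only the context bookkeeping, not a real data flow analysis, and every name in it is invented:

from collections import defaultdict

K = 2                      # keep at most the K most recent call sites
facts = defaultdict(set)   # (procedure, call_string) -> set of analysis facts

def enter(procedure, call_string, call_site, fact):
    # Entering `procedure` from `call_site` extends (and truncates) the
    # call string; the fact is recorded under the new context.
    new_context = (call_string + (call_site,))[-K:]
    facts[(procedure, new_context)].add(fact)
    return new_context

ctx = ()
ctx = enter("f", ctx, "main:3", "x is constant")
ctx = enter("g", ctx, "f:7", "x is constant")
print(dict(facts))
# Distinguishing the first from later iterations of a loop amounts to the
# same idea, with iteration contexts in place of call sites.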

80 citations

Proceedings ArticleDOI
Rens Bod
21 Apr 1993
TL;DR: It is proved that it is not possible to create for every DOP-model a strongly equivalent stochastic CFG which also assigns the same probabilities to the parses, and it is shown that the maximum probability parse can be estimated in polynomial time by applying Monte Carlo techniques.
Abstract: In Data Oriented Parsing (DOP), an annotated corpus is used as a stochastic grammar. An input string is parsed by combining subtrees from the corpus. As a consequence, one parse tree can usually be generated by several derivations that involve different subtrees. This leads to a statistical model in which the probability of a parse is equal to the sum of the probabilities of all its derivations. In (Scha, 1990) an informal introduction to DOP is given, while (Bod, 1992a) provides a formalization of the theory. In this paper we compare DOP with other stochastic grammars in the context of Formal Language Theory. It is proved that it is not possible to create for every DOP-model a strongly equivalent stochastic CFG which also assigns the same probabilities to the parses. We show that the maximum probability parse can be estimated in polynomial time by applying Monte Carlo techniques. The model was tested on a set of hand-parsed strings from the Air Travel Information System (ATIS) spoken language corpus. Preliminary experiments yield 96% test set parsing accuracy.
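
The Monte Carlo estimation step can be sketched independently of the DOP details: sample many derivations, map each to the parse it yields, and return the parse generated most often. In the snippet below the derivation sampler is a stand-in; in DOP it would combine corpus subtrees according to their probabilities:

import random
from collections import Counter

def sample_derivation():
    # Stand-in sampler: pretend two out of three random derivations yield
    # parse "A" and one yields parse "B".
    return random.choice(["A", "A", "B"])

def estimate_best_parse(num_samples=10000):
    # Count the parses generated by the sampled derivations and return the
    # most frequent one as the maximum probability parse estimate.
    counts = Counter(sample_derivation() for _ in range(num_samples))
    parse, _ = counts.most_common(1)[0]
    return parse

print(estimate_best_parse())   # "A" with overwhelming probability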

80 citations


Network Information
Related Topics (5)
Topic                               Papers    Citations    Related
Time complexity                     36K       879.5K       88%
Tree (data structure)               44.9K     749.6K       86%
Graph (abstract data type)          69.9K     1.2M         85%
Computational complexity theory     30.8K     711.2K       82%
Supervised learning                 20.8K     710.5K       80%
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2022    2
2021    491
2020    704
2019    759
2018    816
2017    806