Topic

String (computer science)

About: String (computer science) is a research topic. Over the lifetime, 19,430 publications have been published within this topic, receiving 333,247 citations. The topic is also known as: str & s.


Papers
Patent
Eric D. Brill
29 Jul 2003
TL;DR: In this patent, a linguistic disambiguation system and method creates a knowledge base by training on patterns in strings that contain ambiguity sites; the string patterns are described by a set of reduced regular expressions (RREs) or very reduced regular expressions (VRREs).
Abstract: A linguistic disambiguation system and method creates a knowledge base by training on patterns in strings that contain ambiguity sites. The string patterns are described by a set of reduced regular expressions (RREs) or very reduced regular expressions (VRREs). The knowledge base utilizes the RREs or VRREs to resolve ambiguity based upon the strings in which the ambiguity occurs. The system is trained on a training set, such as a properly labeled corpus. Once trained, the system may then apply the knowledge base to raw input strings that contain ambiguity sites. The system uses the RRE- and VRRE-based knowledge base to disambiguate the sites.
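
The patent does not publish a concrete rule format, but the flavor of a pattern-based knowledge base can be sketched roughly as follows (Python; the rules, readings, and *SITE* marker are invented for illustration, and the real RRE/VRRE machinery is considerably richer). In the patented system such rules would be learned from a labeled corpus rather than written by hand.

import re

# Hypothetical knowledge base: each rule pairs a regular expression over
# the context of an ambiguity site with the reading that the training
# corpus most often supported for that pattern.
RULES = [
    (re.compile(r"\bto \*SITE\*"),  "verb"),   # e.g. "to record" -> verb reading
    (re.compile(r"\bthe \*SITE\*"), "noun"),   # e.g. "the record" -> noun reading
]

def disambiguate(sentence, site_word, default="noun"):
    # Mark the ambiguity site, then return the reading suggested by the
    # first matching context pattern.
    marked = sentence.replace(site_word, "*SITE*", 1)
    for pattern, reading in RULES:
        if pattern.search(marked):
            return reading
    return default

print(disambiguate("I want to record the song", "record"))      # verb
print(disambiguate("She broke the record yesterday", "record")) # noun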

80 citations

Journal ArticleDOI
25 Oct 2009
TL;DR: PRuby is presented, an extension to Diamondback Ruby (DRuby), a static type inference system for Ruby; PRuby augments DRuby with a novel dynamic analysis and transformation that allows precise typing of uses of highly dynamic constructs.
Abstract: Many popular scripting languages such as Ruby, Python, and Perl include highly dynamic language constructs, such as an eval method that evaluates a string as program text. While these constructs allow terse and expressive code, they have traditionally obstructed static analysis. In this paper we present PRuby, an extension to Diamondback Ruby (DRuby), a static type inference system for Ruby. PRuby augments DRuby with a novel dynamic analysis and transformation that allows us to precisely type uses of highly dynamic constructs. PRuby's analysis proceeds in three steps. First, we use run-time instrumentation to gather per-application profiles of dynamic feature usage. Next, we replace dynamic features with statically analyzable alternatives based on the profile. We also add instrumentation to safely handle cases when subsequent runs do not match the profile. Finally, we run DRuby's static type inference on the transformed code to enforce type safety. We used PRuby to gather profiles for a benchmark suite of sample Ruby programs. We found that dynamic features are pervasive throughout the benchmarks and the libraries they include, but that most uses of these features are highly constrained and hence can be effectively profiled. Using the profiles to guide type inference, we found that DRuby can generally statically type our benchmarks modulo some refactoring, and we discovered several previously unknown type errors. These results suggest that profiling and transformation is a lightweight but highly effective approach to bring static typing to highly dynamic languages.
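
PRuby itself works on Ruby source; as a rough, language-shifted sketch of the profile-then-transform idea, the Python snippet below records what a dynamic eval call actually receives and then shows how that call site could be rewritten into a statically analyzable dispatch with a run-time guard (every name here is invented for illustration):

profile = set()

def instrumented_eval(code_string):
    # Step 1: record every string actually passed to eval at run time.
    profile.add(code_string)
    return eval(code_string)

# Step 2: after profiling, a dynamic call site such as
#     instrumented_eval("obj." + attr + "()")
# can be rewritten into a dispatch over the cases the profile observed,
# with a guard for inputs the profile never saw.
def profiled_dispatch(obj, attr):
    if attr == "start":   # case observed during profiling
        return obj.start()
    if attr == "stop":    # case observed during profiling
        return obj.stop()
    raise RuntimeError(f"unprofiled dynamic call: {attr!r}")

# Step 3: a static type checker can now analyze profiled_dispatch,
# since it no longer contains eval.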

80 citations

Patent
03 Jan 1992
TL;DR: In this patent, a method and apparatus for extracting multi-word technical terms from a text file in a computer system is presented, where word strings that occur fewer than a specified minimum number of times in the text file are deleted.
Abstract: A method and apparatus for extracting multi-word technical terms from a text file in a computer system. Word strings are selected from the text that have at least two words, that have at most a specified maximum number of words, that include none of a special set of selected tokens, and that only include selected characters. Word strings which occur fewer than a specified minimum number of times in the text file are deleted. The remaining strings form a set of word strings very likely to be multi-word technical terms. The quality of the set of word strings can be improved by deleting word strings which do not satisfy certain grammatical constraints.
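
The selection and filtering steps translate almost directly into code. A minimal sketch follows, with an illustrative stop-token list, character filter, and thresholds standing in for whatever the patent actually specifies:

import re
from collections import Counter

STOP_TOKENS = {"the", "a", "an", "of", "and", "or", "in", "to", "is"}
WORD_RE = re.compile(r"^[A-Za-z][A-Za-z\-]*$")   # only selected characters

def extract_terms(text, max_len=4, min_count=2):
    words = text.lower().split()
    counts = Counter()
    for n in range(2, max_len + 1):                      # at least two words
        for i in range(len(words) - n + 1):
            window = words[i:i + n]
            if any(w in STOP_TOKENS for w in window):    # exclude stop tokens
                continue
            if not all(WORD_RE.match(w) for w in window):
                continue
            counts[" ".join(window)] += 1
    # delete word strings occurring fewer than min_count times
    return {term: c for term, c in counts.items() if c >= min_count}

sample = ("regular expression matching is fast and regular expression "
          "matching uses finite automata")
print(extract_terms(sample))
# {'regular expression': 2, 'expression matching': 2, 'regular expression matching': 2}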

80 citations

Book ChapterDOI
28 Mar 1998
TL;DR: A new technique is presented that allows the application of well-known and established interprocedural analysis theory to loops; it is implemented in the Program Analyzer Generator PAG, which is used to demonstrate the findings by applying the techniques to several real-world programs.
Abstract: Programs spend most of their time in loops and procedures. Therefore, most program transformations and the necessary static analyses deal with these. It has long been recognized that different execution contexts for procedures may induce different execution properties. There are well-established techniques for interprocedural analysis, like the call string approach. Loops have not received similar attention in the area of data flow analysis and abstract interpretation. All executions are treated in the same way, although typically the first and later executions may exhibit very different properties. In this paper a new technique is presented that allows the application of the well-known and established interprocedural analysis theory to loops. It turns out that the call string approach has limited flexibility in its possibilities to group several calling contexts together for the analysis. An extension to overcome this problem is presented that relies on a similar approach but gives more useful results in practice. The classical and the new techniques are implemented in our Program Analyzer Generator PAG, which is used to demonstrate our findings by applying the techniques to several real-world programs.
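
As a toy illustration of the call-string machinery the paper builds on: analysis facts are kept per calling context, where a context is a bounded string of the most recent call sites. The paper's contribution is to reuse this bookkeeping for loop iteration contexts (for example, separating the first iteration from all later ones); the sketch below shows only the context bookkeeping, not a real data flow analysis, and every name in it is invented:

from collections import defaultdict

K = 2                      # keep at most the K most recent call sites
facts = defaultdict(set)   # (procedure, call_string) -> set of analysis facts

def enter(procedure, call_string, call_site, fact):
    # Entering `procedure` from `call_site` extends (and truncates) the
    # call string; the fact is recorded under the new context.
    new_context = (call_string + (call_site,))[-K:]
    facts[(procedure, new_context)].add(fact)
    return new_context

ctx = ()
ctx = enter("f", ctx, "main:3", "x is constant")
ctx = enter("g", ctx, "f:7", "x is constant")
print(dict(facts))
# Distinguishing the first from later iterations of a loop amounts to the
# same idea, with iteration contexts in place of call sites.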

80 citations

Proceedings ArticleDOI
Rens Bod
21 Apr 1993
TL;DR: It is proved that it is not possible to create for every DOP-model a strongly equivalent stochastic CFG which also assigns the same probabilities to the parses, and it is shown that the maximum probability parse can be estimated in polynomial time by applying Monte Carlo techniques.
Abstract: In Data Oriented Parsing (DOP), an annotated corpus is used as a stochastic grammar. An input string is parsed by combining subtrees from the corpus. As a consequence, one parse tree can usually be generated by several derivations that involve different subtrees. This leads to a statistical model in which the probability of a parse is equal to the sum of the probabilities of all its derivations. In (Scha, 1990) an informal introduction to DOP is given, while (Bod, 1992a) provides a formalization of the theory. In this paper we compare DOP with other stochastic grammars in the context of Formal Language Theory. It is proved that it is not possible to create for every DOP-model a strongly equivalent stochastic CFG which also assigns the same probabilities to the parses. We show that the maximum probability parse can be estimated in polynomial time by applying Monte Carlo techniques. The model was tested on a set of hand-parsed strings from the Air Travel Information System (ATIS) spoken language corpus. Preliminary experiments yield 96% test set parsing accuracy.
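
The Monte Carlo estimation step can be sketched independently of the DOP details: sample many derivations, map each to the parse it yields, and return the parse generated most often. In the snippet below the derivation sampler is a stand-in; in DOP it would combine corpus subtrees according to their probabilities:

import random
from collections import Counter

def sample_derivation():
    # Stand-in sampler: pretend two out of three random derivations yield
    # parse "A" and one yields parse "B".
    return random.choice(["A", "A", "B"])

def estimate_best_parse(num_samples=10000):
    # Count the parses generated by the sampled derivations and return the
    # most frequent one as the maximum probability parse estimate.
    counts = Counter(sample_derivation() for _ in range(num_samples))
    parse, _ = counts.most_common(1)[0]
    return parse

print(estimate_best_parse())   # "A" with overwhelming probability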

80 citations


Network Information
Related Topics (5)
Topic                               Papers    Citations    Related
Time complexity                     36K       879.5K       88%
Tree (data structure)               44.9K     749.6K       86%
Graph (abstract data type)          69.9K     1.2M         85%
Computational complexity theory     30.8K     711.2K       82%
Supervised learning                 20.8K     710.5K       80%
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2022    2
2021    491
2020    704
2019    759
2018    816
2017    806