Topic

String (computer science)

About: String (computer science) is a research topic. Over its lifetime, 19,430 publications have been published within this topic, receiving 333,247 citations. The topic is also known as: str & s.

Papers published on a yearly basis

Papers

Proceedings Article • DOI

Z3-str: a z3-based string solver for web application analysis

Yunhui Zheng¹, Xiangyu Zhang¹, Vijay Ganesh²

Purdue University¹, University of Waterloo²

18 Aug 2013

TL;DR: A general purpose string solver, called Z3-str, is developed as an extension of the Z3 SMT solver through its plug-in interface, which treats strings as a primitive type, thus avoiding the inherent limitations observed in many existing solvers that encode strings in terms of other primitives.

Abstract: Analyzing web applications requires reasoning about strings and non-strings cohesively. Existing string solvers either ignore non-string program behavior or support only a limited set of string operations. In this paper, we develop a general-purpose string solver, called Z3-str, as an extension of the Z3 SMT solver through its plug-in interface. Z3-str treats strings as a primitive type, thus avoiding the inherent limitations observed in many existing solvers that encode strings in terms of other primitives. The logic of the plug-in has three sorts, namely, bool, int and string. The string-sorted terms include string constants and variables of arbitrary length, with functions such as concatenation, substring, and replace. The int-sorted terms are standard, with the exception of the length function over string terms. The atomic formulas are equations over string terms and (in)equalities over integer terms. Not only does our solver have features that enable whole-program symbolic, static, and dynamic analysis, but it also performs better than other solvers in our experiments. The application of Z3-str to remote code execution detection shows that its support for a wide spectrum of string operations is key to reducing false positives.

205 citations
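
To make the plug-in's three sorts concrete, the sketch below checks a mixed string/integer constraint (concatenation, replace, and a length comparison) by brute-force enumeration over a tiny alphabet. It only illustrates what a satisfying assignment looks like under the logic the abstract describes; Z3-str itself is a Z3 theory plug-in and does not work by enumeration, and the helper names `strings_up_to` and `solve_bounded` are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, Iterator, Optional, Sequence


def strings_up_to(alphabet: str, max_len: int) -> Iterator[str]:
    """Yield every string over `alphabet` of length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)


def solve_bounded(
    variables: Sequence[str],
    constraint: Callable[[Dict[str, str]], bool],
    alphabet: str = "ab",
    max_len: int = 3,
) -> Optional[Dict[str, str]]:
    """Return the first assignment of the string variables that satisfies
    `constraint`, or None. None only means 'unsatisfiable within the bound',
    not unsatisfiable in general."""
    candidates = list(strings_up_to(alphabet, max_len))
    for values in product(candidates, repeat=len(variables)):
        env = dict(zip(variables, values))
        if constraint(env):
            return env
    return None


# String-sorted terms (concatenation, replace), an int-sorted term (length),
# combined as a Boolean formula of equations and an integer inequality.
model = solve_bounded(
    ["x", "y"],
    lambda e: e["x"] + e["y"] == "aab"
    and len(e["x"]) > len(e["y"])
    and e["x"].replace("a", "b") == "bb",
)
print(model)  # {'x': 'aa', 'y': 'b'}
```

The length bound makes this procedure incomplete, which is one reason practical string solvers reason symbolically about string terms instead of enumerating values.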

Journal Article • DOI

High performance connected digit recognition using hidden Markov models

Lawrence R. Rabiner¹, Jay G. Wilpon¹, F.K. Soong¹

Bell Labs¹

01 Aug 1989, IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: An enhanced analysis feature set combining instantaneous and transitional spectral information is used, and the hidden-Markov-model (HMM)-based connected-digit recognizer is tested in speaker-trained, multispeaker, and speaker-independent modes.

Abstract: The authors use an enhanced analysis feature set consisting of both instantaneous and transitional spectral information and test the hidden-Markov-model (HMM)-based connected-digit recognizer in speaker-trained, multispeaker, and speaker-independent modes. For the evaluation, both a 50-talker connected-digit database recorded over local, dialed-up telephone lines and the Texas Instruments 225-adult-talker connected-digits database are used. On these databases, the recognizer achieved string error rates of 0.35%, 1.65%, and 1.75% for known-length strings in the speaker-trained, multispeaker, and speaker-independent modes, respectively, and 0.78%, 2.85%, and 2.94% for unknown-length strings of up to seven digits in the same three modes. Several experiments were carried out to determine the best set of conditions (e.g., training, recognition, parameters) for recognizing digits. The results and the interpretation of these experiments are described.

205 citations
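
The results above are reported as string error rates. Under the usual definition (assumed here, since the abstract does not spell out the scoring), a recognized digit string counts as one error if it differs from the reference in any way, so the rate is the percentage of strings not recognized exactly. A minimal sketch with the hypothetical helper `string_error_rate`:

```python
from typing import Sequence


def string_error_rate(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Percentage of digit strings that were not recognized exactly.

    A single substitution, insertion, or deletion anywhere in a string
    makes the whole string count as one error."""
    if len(references) != len(hypotheses):
        raise ValueError("need exactly one hypothesis per reference string")
    if not references:
        raise ValueError("no strings to score")
    errors = sum(ref != hyp for ref, hyp in zip(references, hypotheses))
    return 100.0 * errors / len(references)


refs = ["1234", "907", "5551212", "88"]
hyps = ["1234", "901", "555212", "88"]  # one substitution, one deletion
print(string_error_rate(refs, hyps))  # 50.0
```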

Patent

Method and device for phonetically encoding Chinese textual data for data processing entry

Colman Bernath

26 May 1989

TL;DR: A method and device for coded entry of Chinese character text data into a word processing, display, printing, telecommunication, or similar system, using an electronic input keyboard with keys marked with phonetic notations for Chinese speech sounds plus a set of "character position keys," operated according to a set of encoding rules.

Abstract: A method and device for coded entry of Chinese character text data into a word processing, display, printing, telecommunication, etc. system. In the principal embodiment of the invention, an electronic input keyboard is used that has keys marked with phonetic notations suitable to represent Chinese speech sounds, as well as a set of "character position keys," operated by the following encoding rules: (1) the text is divided into blocks of characters, where one block may contain one or more characters, each block to be encoded by one uninterrupted typing sequence; (2) if the pronunciation of a block is unique, encoding is done simply by entering on the keyboard the phonetic data of the character(s) making up the block; (3) if the pronunciation of a block is not unique, first the phonetic data of a string of characters making up a longer block is entered, the pronunciation of that longer block being unique and the block to be encoded being a part of the longer block, and then by using the "character position keys" the operator enters the "position data," that is, the position(s) which the character(s) of the block to be encoded occupy within that longer block. In an alternative embodiment, part or all of the phonetic data of the characters are entered into the encoding apparatus, not by keyboard means, but by the use of an acoustic speech sound analyzer.

204 citations
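
Rules (2) and (3) amount to a lookup keyed on phonetic data, with position data used to pick characters out of a longer block whose pronunciation is unique. The sketch below assumes a tiny hypothetical lexicon and uses tone-numbered pinyin as a stand-in for the keyboard's phonetic notation; the patent prescribes neither, and `encode_block` is an illustrative name.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical lexicon: phonetic key -> every character block with that pronunciation.
LEXICON: Dict[str, List[str]] = {
    "shi4": ["世", "是", "事", "市"],  # ambiguous on its own
    "shi4 jie4": ["世界"],            # unique pronunciation
    "zhong1 wen2": ["中文"],          # unique pronunciation
}


def encode_block(phonetic: str, positions: Optional[Sequence[int]] = None) -> str:
    """Resolve phonetic data (plus optional 1-based position data) to characters."""
    blocks = LEXICON.get(phonetic, [])
    if len(blocks) != 1:
        # Rule (3): the operator must instead type a longer block whose
        # pronunciation is unique, then enter position data.
        raise ValueError(f"no unique block for pronunciation {phonetic!r}")
    block = blocks[0]
    if positions is None:
        return block  # Rule (2): a unique pronunciation needs no position data
    if any(not 1 <= p <= len(block) for p in positions):
        raise ValueError("position outside the longer block")
    return "".join(block[p - 1] for p in positions)


print(encode_block("zhong1 wen2"))     # 中文
print(encode_block("shi4 jie4", [1]))  # 世
```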

Proceedings Article • DOI

Word identification for Mandarin Chinese sentences

Keh-Jiann Chen¹, Shing-Huan Liu¹

Academia Sinica¹

23 Aug 1992

TL;DR: This paper proposes a matching algorithm with 6 different heuristic rules to resolve the ambiguities of Chinese sentences, and the statistical data show that the maximal matching algorithm is the most effective heuristic.

Abstract: Chinese sentences are composed of strings of characters without blanks to mark words. However, the basic unit for sentence parsing and understanding is the word. Therefore, the first step of processing Chinese sentences is to identify the words. The difficulties of identifying words include (1) the identification of complex words, such as Determinative-Measure compounds, reduplications, and derived words; (2) the identification of proper names; and (3) the resolution of ambiguous segmentations. In this paper, we propose possible solutions for the above difficulties. We adopt a matching algorithm with 6 different heuristic rules to resolve the ambiguities and achieve a 99.77% success rate. The statistical data show that the maximal matching algorithm is the most effective heuristic.

204 citations
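
The abstract singles out maximal matching as the most effective heuristic. The sketch below shows only a basic forward maximum-matching pass over a toy dictionary assumed for illustration; it does not reproduce the paper's six heuristic rules or its handling of proper names and complex words. Characters not covered by any dictionary word fall back to single-character words.

```python
from typing import List, Set


def forward_max_match(text: str, lexicon: Set[str]) -> List[str]:
    """Greedy left-to-right segmentation: at each position take the longest
    dictionary word, falling back to a single character if nothing matches."""
    max_len = max((len(w) for w in lexicon), default=1)
    words: List[str] = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in lexicon:
                words.append(candidate)
                i += length
                break
    return words


lexicon = {"我们", "学习", "中文", "我", "学", "中", "文"}
print(forward_max_match("我们学习中文", lexicon))  # ['我们', '学习', '中文']
```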

Patent

Method and system for approximate string matching

Alexei Nevidomski¹, Pavel Volkov¹

IBM¹

16 Jun 2005

TL;DR: A trie data structure, with a root node and generations of child nodes each representing at least one character in an alphabet, provides a lexicon of words and word fragments for approximate string matching.

Abstract: A method and system are provided for approximate string matching of a target string to a trie data structure. The trie data structure has a root node and generations of child nodes, each node representing at least one character in an alphabet, to provide a lexicon of words and word fragments. The method involves traversing the trie data structure starting from the root node, comparing each node of a branch of the trie data structure to characters in the target string, and adding the characters traversed in a branch to a gathered string to provide suggestions of approximate matches. If the method reaches a node flagged as a word or word fragment and the target string is longer than the gathered string, the method loops back to the root node and continues the traversal from there. This enables the trie data structure to use word fragments for compound words and to split non-delimited words where appropriate. The method also includes, at each node, determining whether there is a correction rule for one or more characters in the remainder of the target string from the current node and, if so, applying the correction rule to the target string to obtain a modified target string.

204 citations
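
A simplified sketch of the traversal described in this patent abstract: walk the trie alongside the target string, loop back to the root when a flagged word or fragment ends before the target does, and at each node try correction rules on the remainder of the target. The `(wrong, right)` substitution-rule format, the `max_corrections` cap, and all names are assumptions for this illustration, not the patented implementation; it returns segmentations rather than ranked suggestions.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Sequence, Set, Tuple


@dataclass
class TrieNode:
    children: Dict[str, "TrieNode"] = field(default_factory=dict)
    terminal: bool = False  # flagged as the end of a word or word fragment


def build_trie(entries: Iterable[str]) -> TrieNode:
    root = TrieNode()
    for entry in entries:
        node = root
        for ch in entry:
            node = node.children.setdefault(ch, TrieNode())
        node.terminal = True
    return root


def approximate_matches(
    root: TrieNode,
    target: str,
    rules: Sequence[Tuple[str, str]],
    max_corrections: int = 1,
) -> Set[Tuple[str, ...]]:
    """Return every way to spell `target` (after at most `max_corrections`
    rule applications) as a sequence of trie entries."""
    results: Set[Tuple[str, ...]] = set()

    def walk(node: TrieNode, pos: int, text: str, gathered: str,
             segments: List[str], used: int) -> None:
        if pos == len(text):
            if node.terminal:
                results.add(tuple(segments + [gathered]))
            return
        # Flagged node but the target is longer: loop back to the root.
        if node.terminal and node is not root:
            walk(root, pos, text, "", segments + [gathered], used)
        # Follow the branch that matches the next target character.
        child = node.children.get(text[pos])
        if child is not None:
            walk(child, pos + 1, text, gathered + text[pos], segments, used)
        # Try correction rules on the remainder of the target at this node.
        if used < max_corrections:
            for wrong, right in rules:
                if text.startswith(wrong, pos):
                    fixed = text[:pos] + right + text[pos + len(wrong):]
                    walk(node, pos, fixed, gathered, segments, used + 1)

    walk(root, 0, target, "", [], 0)
    return results


trie = build_trie(["foot", "ball", "football"])
print(sorted(approximate_matches(trie, "fotball", [("ot", "oot")])))
# [('foot', 'ball'), ('football',)]
```

Keeping each loop-back as a separate segment shows both behaviours the abstract mentions: a compound assembled from fragments (`foot` + `ball`) alongside the whole-word entry.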

Metrics

Papers: 19,430
Citations: 362,272

No. of papers in the topic in previous years
Year	Papers
2022	2
2021	491
2020	704
2019	759
2018	816
2017	806
