String (computer science)

About: String (computer science) is a research topic. Over its lifetime, 19,430 publications have been published within this topic, receiving 333,247 citations. The topic is also known as: str & s.


Papers
Proceedings Article
18 Aug 2013
TL;DR: A general purpose string solver, called Z3-str, is developed as an extension of the Z3 SMT solver through its plug-in interface, which treats strings as a primitive type, thus avoiding the inherent limitations observed in many existing solvers that encode strings in terms of other primitives.
Abstract: Analyzing web applications requires reasoning about strings and non-strings cohesively. Existing string solvers either ignore non-string program behavior or support only a limited set of string operations. In this paper, we develop a general purpose string solver, called Z3-str, as an extension of the Z3 SMT solver through its plug-in interface. Z3-str treats strings as a primitive type, thus avoiding the inherent limitations observed in many existing solvers that encode strings in terms of other primitives. The logic of the plug-in has three sorts, namely, bool, int and string. The string-sorted terms include string constants and variables of arbitrary length, with functions such as concatenation, sub-string, and replace. The int-sorted terms are standard, with the exception of the length function over string terms. The atomic formulas are equations over string terms, and (in)-equalities over integer terms. Not only does our solver have features that enable whole program symbolic, static and dynamic analysis, but it also performs better than other solvers in our experiments. The application of Z3-str in remote code execution detection shows that its support of a wide spectrum of string operations is key to reducing false positives.

205 citations

Journal Article
TL;DR: An enhanced analysis feature set consisting of both instantaneous and transitional spectral information is used and the hidden-Markov-model (HMM)-based connected-digit recognizer in speaker-trained, multispeaker, and speaker-independent modes is tested.
Abstract: The authors use an enhanced analysis feature set consisting of both instantaneous and transitional spectral information and test the hidden-Markov-model (HMM)-based connected-digit recognizer in speaker-trained, multispeaker, and speaker-independent modes. For the evaluation, both a 50-talker connected-digit database recorded over local, dialed-up telephone lines, and the Texas Instruments, 225-adult-talker, connected-digits database are used. Using these databases, the performance achieved was 0.35, 1.65, and 1.75% string error rates for known-length strings, for speaker-trained, multispeaker, and speaker-independent modes, respectively, and 0.78, 2.85, and 2.94% string error rates for unknown-length strings of up to seven digits in length for the three modes. Several experiments were carried out to determine the best set of conditions (e.g., training, recognition, parameters, etc.) for recognition of digits. The results and the interpretation of these experiments are described.

205 citations

Patent
26 May 1989
TL;DR: In this article, a method and device for coded entry of Chinese character text data into a word processing, display, printing, telecommunication, etc. system is presented, where an electronic input keyboard is used that has keys marked with phonetic notations suitable to represent Chinese speech sounds, as well as a set of "character position keys," operated by the following encoding rules: (1) the text is divided into blocks of characters, where one block may contain one or more characters, each block to be encoded by one uninterrupted typing sequence; (2) if the pronunciation of
Abstract: A method and device for coded entry of Chinese character text data into a word processing, display, printing, telecommunication, etc. system. In the principal embodiment of the invention, an electronic input keyboard is used that has keys marked with phonetic notations suitable to represent Chinese speech sounds, as well as a set of "character position keys," operated by the following encoding rules: (1) the text is divided into blocks of characters, where one block may contain one or more characters, each block to be encoded by one uninterrupted typing sequence; (2) if the pronunciation of a block is unique, encoding is done simply by entering on the keyboard the phonetic data of the character(s) making up the block; (3) if the pronunciation of a block is not unique, first the phonetic data of a string of characters making up a longer block is entered, the pronunciation of that longer block being unique and the block to be encoded being a part of the longer block, and then by using the "character position keys" the operator enters the "position data," that is, the position(s) which the character(s) of the block to be encoded occupy within that longer block. In an alternative embodiment, part or all of the phonetic data of the characters are entered into the encoding apparatus, not by keyboard means, but by the use of an acoustic speech sound analyzer.

204 citations

Proceedings Article
23 Aug 1992
TL;DR: This paper proposes a matching algorithm with 6 different heuristic rules to resolve the ambiguities of Chinese sentences and reports that maximal matching is the most effective heuristic.
Abstract: Chinese sentences are composed of strings of characters without blanks to mark words. However, the basic unit for sentence parsing and understanding is the word. Therefore, the first step in processing Chinese sentences is to identify the words. The difficulties of identifying words include (1) the identification of complex words, such as Determinative-Measure, reduplications, derived words, etc., (2) the identification of proper names, and (3) resolving ambiguous segmentations. In this paper, we propose possible solutions for the above difficulties. We adopt a matching algorithm with 6 different heuristic rules to resolve the ambiguities and achieve a 99.77% success rate. The statistical data support the conclusion that the maximal matching algorithm is the most effective heuristic.

204 citations
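
As a rough illustration of the maximal matching idea named in the abstract above, the following Python sketch greedily takes the longest lexicon entry at each position. It is a minimal forward longest-match segmenter only, not the paper's algorithm with its 6 heuristic rules; the function name max_match, the max_len parameter, and the toy lexicon are assumptions made for this example.

```python
def max_match(text, lexicon, max_len=None):
    """Greedy forward longest-match segmentation (illustrative sketch only)."""
    if max_len is None:
        # Longest entry in the lexicon bounds the window we need to try.
        max_len = max((len(word) for word in lexicon), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink the window.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy lexicon, chosen to show an ambiguous segmentation.
lexicon = {"研究", "研究生", "生命", "的", "起源"}
segments = max_match("研究生命的起源", lexicon)
print(segments)
```

On this input the greedy choice takes "研究生" first and leaves "命" as a single character, where "研究 / 生命" is the other possible reading. This is the kind of ambiguous segmentation that additional heuristic rules are meant to resolve.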

Patent
Alexei Nevidomski1, Pavel Volkov1
16 Jun 2005
TL;DR: A trie data structure has a root node and generations of child nodes, each node representing at least one character in an alphabet, to provide a lexicon of words and word fragments.
Abstract: A method and system are provided for approximate string matching of a target string to a trie data structure. The trie data structure has a root node and generations of child nodes each node representing at least one character in an alphabet to provide a lexicon of words and word fragments. The method involves traversing the trie data structure starting from the root node by comparing each node of a branch of the trie data structure to characters in the target string and adding characters traversed in a branch of the trie data structure to a gathered string to provide suggestions of approximate matches. If the method reaches a node flagged as a node for a word or a word fragment and, if the target string is longer than the gathered string, the method loops back to the root node, and continues the traverse from the root node. This enables the trie data structure to use word fragments for compound words and to split non-delimited words where appropriate. The method also includes, at each node, determining if there is a correction rule for one or more characters in the remainder of the target string from the current node, and if so, applying the correction rule to the target string to obtain a modified target string.

204 citations
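
The loop-back step described in the abstract above (returning to the root when a flagged node is reached and the target string is longer than the gathered string) can be sketched in Python as follows. This is a simplified sketch under stated assumptions, not the patented method: it handles exact matching only and omits the correction rules, and the names TrieNode, build_trie, and split_compound are hypothetical.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # maps a character to the child node
        self.is_word = False  # flags a node that ends a word or fragment


def build_trie(words):
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def split_compound(root, target):
    """Return one split of target into lexicon entries, or None if none exists."""
    def walk(node, pos, gathered, parts):
        if pos == len(target):
            # Succeed only if the gathered string ends on a flagged node.
            return parts + [gathered] if node.is_word else None
        # Option 1: keep following the current branch.
        child = node.children.get(target[pos])
        if child is not None:
            found = walk(child, pos + 1, gathered + target[pos], parts)
            if found is not None:
                return found
        # Option 2: at a flagged node with input remaining, loop back to the root.
        if node.is_word and node is not root:
            return walk(root, pos, "", parts + [gathered])
        return None

    return walk(root, 0, "", [])


# Hypothetical lexicon for illustration.
trie = build_trie(["foot", "ball", "football", "room"])
print(split_compound(trie, "footballroom"))
print(split_compound(trie, "footbal"))
```

Because the sketch tries the deeper branch before looping back, it prefers longer entries and falls back to shorter fragments only when the longer path fails.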


Network Information
Related Topics (5)
Time complexity
36K papers, 879.5K citations
88% related
Tree (data structure)
44.9K papers, 749.6K citations
86% related
Graph (abstract data type)
69.9K papers, 1.2M citations
85% related
Computational complexity theory
30.8K papers, 711.2K citations
82% related
Supervised learning
20.8K papers, 710.5K citations
80% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2022    2
2021    491
2020    704
2019    759
2018    816
2017    806