Topic

String (computer science)

About: String (computer science) is a research topic. Over its lifetime, 19,430 publications have been published within this topic, receiving 333,247 citations. The topic is also known as: str & s.


Papers
Proceedings Article
07 Apr 2008
TL;DR: Proposes a programmatic framework for record matching that takes user-defined string transformations (synonyms, abbreviations) as input; to the authors' knowledge, it is the first such framework.
Abstract: Today's record matching infrastructure does not allow a flexible way to account for synonyms such as "Robert" and "Bob" which refer to the same name, and more general forms of string transformations such as abbreviations. We propose a programmatic framework of record matching that takes such user-defined string transformations as input. To the best of our knowledge, this is the first proposal for such a framework. This transformational framework, while expressive, poses significant computational challenges which we address. We empirically evaluate our techniques over real data.
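To make the idea of transformation-aware matching concrete, here is a minimal sketch, not the paper's algorithm. It assumes a hypothetical token-level rule table (`TRANSFORMATIONS`), uses Jaccard similarity as a stand-in similarity function, and enumerates every rewriting by brute force, which is exactly the kind of cost the abstract says needs dedicated techniques.

```python
from itertools import product

# Hypothetical user-defined transformations: a token maps to the set of
# tokens it may be rewritten to (illustrative rules, not from the paper).
TRANSFORMATIONS = {
    "bob": {"robert"},
    "st": {"street"},
    "intl": {"international"},
}

def variants(token):
    """Return the lowercased token plus every token it can be rewritten to."""
    token = token.lower()
    return {token} | TRANSFORMATIONS.get(token, set())

def jaccard(a, b):
    """Jaccard similarity between two token collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def best_similarity(record_a, record_b):
    """Maximum Jaccard similarity over all rewritings of both records.

    Brute force: the number of rewritings grows exponentially with the
    number of tokens that have applicable transformations.
    """
    choices_a = [sorted(variants(t)) for t in record_a.split()]
    choices_b = [sorted(variants(t)) for t in record_b.split()]
    best = 0.0
    for rewritten_a in product(*choices_a):
        for rewritten_b in product(*choices_b):
            best = max(best, jaccard(rewritten_a, rewritten_b))
    return best
```

With these assumed rules, `best_similarity("Bob Smith", "Robert Smith")` treats the two records as a perfect match, whereas plain Jaccard on the raw tokens scores them at one third.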

151 citations

Proceedings Article
Fan Bai, Zhanzhan Cheng, Yi Niu, Shiliang Pu, Shuigeng Zhou
18 Jun 2018
TL;DR: Proposes edit probability (EP), a method for scene text recognition that estimates the probability of generating a string from the output sequence of probability distributions conditioned on the input image, while accounting for possible missing or superfluous characters.
Abstract: We consider the scene text recognition problem under the attention-based encoder-decoder framework, which is the state of the art. The existing methods usually employ a frame-wise maximal likelihood loss to optimize the models. When we train the model, the misalignment between the ground truth strings and the attention's output sequences of probability distribution, which is caused by missing or superfluous characters, will confuse and mislead the training process, and consequently make the training costly and degrade the recognition accuracy. To handle this problem, we propose a novel method called edit probability (EP) for scene text recognition. EP tries to effectively estimate the probability of generating a string from the output sequence of probability distribution conditioned on the input image, while considering the possible occurrences of missing/superfluous characters. The advantage lies in that the training process can focus on the missing, superfluous and unrecognized characters, and thus the impact of the misalignment problem can be alleviated or even overcome. We conduct extensive experiments on standard benchmarks, including the IIIT-5K, Street View Text and ICDAR datasets. Experimental results show that the EP can substantially boost scene text recognition performance.
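The contrast between a frame-wise likelihood and an alignment-tolerant one can be illustrated with a small dynamic program. This is a simplified sketch in the spirit of the description above, not the authors' EP formulation: it assumes fixed, hand-picked edit probabilities (`p_consume`, `p_delete`, `p_insert`) and a uniform prior over an assumed alphabet for inserted characters, whereas a trained model would predict such quantities.

```python
def edit_probability(frames, target, p_consume=0.8, p_delete=0.1,
                     p_insert=0.1, alphabet_size=26):
    """Sum the probability of all alignments that turn `frames` into `target`.

    frames: list of dicts mapping character -> probability (one per output step).
    target: ground-truth string.
    The three edit probabilities are illustrative constants (assumption).
    """
    n, m = len(frames), len(target)
    # prob[i][j]: probability of producing target[:j] from frames[:i].
    prob = [[0.0] * (m + 1) for _ in range(n + 1)]
    prob[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            total = 0.0
            if i > 0 and j > 0:
                # Consume one frame to emit the next target character.
                emit = frames[i - 1].get(target[j - 1], 0.0)
                total += prob[i - 1][j - 1] * p_consume * emit
            if i > 0:
                # Skip a superfluous frame.
                total += prob[i - 1][j] * p_delete
            if j > 0:
                # Insert a character that the frames missed.
                total += prob[i][j - 1] * p_insert / alphabet_size
            prob[i][j] = total
    return prob[n][m]
```

Setting `p_consume=1.0` and the other two probabilities to zero recovers the strict frame-wise product, which assigns zero probability as soon as the number of frames differs from the length of the target string.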

151 citations

Journal Article
01 Feb 1997
TL;DR: Describes an off-line handwritten word recognition system in which neural networks assign confidence that pairs of segments are compatible with character confidence assignments, and this confidence is integrated into the dynamic programming match.
Abstract: An off-line handwritten word recognition system is described. Images of handwritten words are matched to lexicons of candidate strings. A word image is segmented into primitives. The best match between sequences of unions of primitives and a lexicon string is found using dynamic programming. Neural networks assign match scores between characters and segments. Two particularly unique features are that neural networks assign confidence that pairs of segments are compatible with character confidence assignments and that this confidence is integrated into the dynamic programming. Experimental results are provided on data from the U.S. Postal Service.
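The matching step described above (unions of consecutive primitives aligned to the characters of a lexicon string) can be sketched as a short dynamic program. This is an illustration under stated assumptions: `SHAPES` and `toy_score` are hypothetical stand-ins for the neural-network match scores, `max_union` is an assumed cap on how many primitives one character may absorb, and the pairwise segment-compatibility confidence is not modeled.

```python
# Toy stand-in for learned match scores: each character has an assumed
# "ideal" sequence of primitive stroke labels (hypothetical data).
SHAPES = {"d": ["c", "l"], "o": ["o"], "g": ["o", "j"]}

def toy_score(primitives, start, end, char):
    """Score 1.0 if primitives[start:end] exactly form `char`, else 0.0."""
    return 1.0 if primitives[start:end] == SHAPES.get(char) else 0.0

def best_match(primitives, word, score=toy_score, max_union=3):
    """Best total score for covering all primitives with the characters of `word`.

    Each character must absorb between 1 and `max_union` consecutive primitives.
    Returns -inf when no such segmentation exists.
    """
    neg_inf = float("-inf")
    n, m = len(primitives), len(word)
    # best[i][j]: best score matching the first i primitives to the first j characters.
    best = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for j in range(1, m + 1):
        for i in range(1, n + 1):
            for k in range(1, min(max_union, i) + 1):
                prev = best[i - k][j - 1]
                if prev != neg_inf:
                    cand = prev + score(primitives, i - k, i, word[j - 1])
                    best[i][j] = max(best[i][j], cand)
    return best[n][m]

def rank_lexicon(primitives, lexicon):
    """Order candidate strings from best to worst match."""
    return sorted(lexicon, key=lambda w: best_match(primitives, w), reverse=True)
```

For example, with the primitives `["c", "l", "o"]`, `rank_lexicon` places `"do"` ahead of `"go"` and `"dog"`, because only `"do"` lets every character absorb a union of primitives that fits it.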

151 citations

Journal Article
TL;DR: Three experiments examined the role of orthographic and phonotactic rules in the tachistoscopic recognition of letter strings and demonstrated that the perceptual accuracy for a string is correlated with the number of recoding steps needed to convert that string into speech.
Abstract: Three experiments examined the role of orthographic and phonotactic rules in the tachistoscopic recognition of letter strings. Experiment 1 showed that the presence of a vowel or multiletter spelling patterns facilitates perceptual accuracy. To account for these results a model was proposed in which an input string is first parsed into syllablelike units, which are then recoded into speech. It was demonstrated that the perceptual accuracy for a string is correlated with the number of recoding steps needed to convert that string into speech. Experiment 2 further demonstrated that this recoding process can predict perceptibility differences among strings with varying numbers of phonotactic violations, and Experiment 3 assessed some of the specific assumptions of the recoding process.

150 citations

Book Chapter
18 Jul 2014
TL;DR: Presents a set of algebraic techniques for solving constraints over the theory of unbounded strings natively, without reduction to other problems; the techniques are implemented in the SMT solver cvc4, extending its already large set of built-in theories with a theory of strings with concatenation, length, and membership in regular languages.
Abstract: An increasing number of applications in verification and security rely on or could benefit from automatic solvers that can check the satisfiability of constraints over a rich set of data types that includes character strings. Unfortunately, most string solvers today are standalone tools that can reason only about (some fragment of) the theory of strings and regular expressions, sometimes with strong restrictions on the expressiveness of their input language. These solvers are based on reductions to satisfiability problems over other data types, such as bit vectors, or to automata decision problems. We present a set of algebraic techniques for solving constraints over the theory of unbounded strings natively, without reduction to other problems. These techniques can be used to integrate string reasoning into general, multi-theory SMT solvers based on the DPLL(T) architecture. We have implemented them in our SMT solver cvc4 to expand its already large set of built-in theories to a theory of strings with concatenation, length, and membership in regular languages. Our initial experimental results show that, in addition, over pure string problems, cvc4 is highly competitive with specialized string solvers with a comparable input language.

148 citations


Network Information
Related Topics (5)
Time complexity: 36K papers, 879.5K citations, 88% related
Tree (data structure): 44.9K papers, 749.6K citations, 86% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 85% related
Computational complexity theory: 30.8K papers, 711.2K citations, 82% related
Supervised learning: 20.8K papers, 710.5K citations, 80% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2022    2
2021    491
2020    704
2019    759
2018    816
2017    806