Topic

String (computer science)

About: String (computer science) is a research topic. Over its lifetime, 19,430 publications have been published within this topic, receiving 333,247 citations. The topic is also known as: str & s.


Papers
Journal Article
TL;DR: This work shows that for the simplest form of statistical models, this problem is NP-complete, i.e., probably exponential in the length of the observed sentence, and traces this complexity to factors not present in other decoding problems.
Abstract: Statistical machine translation is a relatively new approach to the long-standing problem of translating human languages by computer. Current statistical techniques uncover translation rules from bilingual training texts and use those rules to translate new texts. The general architecture is the source-channel model: an English string is statistically generated (source), then statistically transformed into French (channel). In order to translate (or "decode") a French string, we look for the most likely English source. We show that for the simplest form of statistical models, this problem is NP-complete, i.e., probably exponential in the length of the observed sentence. We trace this complexity to factors not present in other decoding problems.

353 citations
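
The source-channel formulation described in the abstract above is commonly written as a Bayes-rule decision. The notation below is an assumption chosen for illustration ($e$ for the English string, $f$ for the observed French string); it is not taken from the paper itself:

```latex
\hat{e} \;=\; \arg\max_{e} P(e \mid f)
        \;=\; \arg\max_{e} \frac{P(e)\, P(f \mid e)}{P(f)}
        \;=\; \arg\max_{e} P(e)\, P(f \mid e)
```

Here $P(e)$ plays the role of the source model and $P(f \mid e)$ the channel model; $P(f)$ can be dropped because it does not depend on $e$. Decoding is the search for $\hat{e}$.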

Proceedings ArticleDOI
01 Nov 2016
TL;DR: This work investigates whether a neural, encoder-decoder translation system learns syntactic information on the source side as a by-product of training and proposes two methods to detect whether the encoder has learned local and global source syntax.
Abstract: We investigate whether a neural, encoder-decoder translation system learns syntactic information on the source side as a by-product of training. We propose two methods to detect whether the encoder has learned local and global source syntax. A fine-grained analysis of the syntactic structure learned by the encoder reveals which kinds of syntax are learned and which are missing.

352 citations

Journal ArticleDOI
TL;DR: This paper investigates a generalization of string matching in which the pattern is a sequence of pattern elements, each compatible with a set of symbols, and shows that generalized string matching requires a time-space product of $\Omega(n^2 / \log n)$ on a powerful model of computation when the alphabet is restricted to n symbols.
Abstract: Given a pattern string of length n and an object string of length m, the string matching problem asks for the positions of all occurrences of the pattern in the object string. This paper investigates a generalization of string matching, in which the pattern is a sequence of pattern elements, each compatible with a set of symbols. The alphabet of symbols is infinite, with its members encoded in a finite alphabet. In contrast to standard string matching, which can be solved in simultaneous linear time and constant space, it is shown that generalized string matching requires a time-space product of $\Omega(n^2 / \log n)$ on a powerful model of computation, when the alphabet is restricted to n symbols. Our proof uses a method of Borodin. The obvious algorithm for generalized string matching requires time $O(NM)$, where N is the length of the encoding of the pattern, and M is that of the object string. We describe an algorithm which solves generalized string matching in time $O(N + M + mN^{{1 / 2}} {\o...

351 citations
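
As a rough illustration of the "obvious algorithm" mentioned in the abstract above, the following sketch tries every alignment of the pattern against the object string. It assumes each pattern element is given as a Python set of compatible symbols, which sidesteps the paper's finite-alphabet encoding; the function name `generalized_match` is hypothetical.

```python
from typing import Hashable, List, Sequence, Set


def generalized_match(
    pattern: Sequence[Set[Hashable]], text: Sequence[Hashable]
) -> List[int]:
    """Return every start index where each text symbol is compatible
    with the corresponding pattern element.

    Assumption: a pattern element is modeled as a set of symbols, so
    compatibility is plain set membership. This is the brute-force
    approach, not the faster algorithm the paper describes.
    """
    n, m = len(pattern), len(text)
    positions = []
    # Try each alignment; range is empty when the pattern is longer
    # than the text.
    for start in range(m - n + 1):
        if all(text[start + k] in pattern[k] for k in range(n)):
            positions.append(start)
    return positions


# Pattern: first symbol is "a" or "b", second symbol is "c".
matches = generalized_match([{"a", "b"}, {"c"}], "acbcxc")
```

With set-based elements the sketch performs at most one membership test per pattern element per alignment, mirroring the product-of-lengths cost of the naive method.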

Journal ArticleDOI
TL;DR: The set of allowable edit operations is extended to include interchanging the positions of two adjacent characters. Under certain restrictions on edit-operation costs, the extended problem can still be solved in time proportional to the product of the lengths of the given strings.
Abstract: The string-to-string correction problem asks for a sequence S of "edit operations" of minimal cost such that S(A) = B, for given strings A and B. The edit operations previously investigated allow changing one symbol of a string into another single symbol, deleting one symbol from a string, or inserting a single symbol into a string. This paper extends the set of allowable edit operations to include the operation of interchanging the positions of two adjacent characters. Under certain restrictions on edit-operation costs, it is shown that the extended problem can still be solved in time proportional to the product of the lengths of the given strings.

350 citations
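
The extended problem in the abstract above can be sketched with the well-known dynamic program for edit distance with adjacent transpositions (often called Damerau-Levenshtein distance). The version below assumes unit cost for all four operations, which is an illustrative choice; the abstract does not state the paper's cost restrictions, and the function name `damerau_levenshtein` is hypothetical.

```python
def damerau_levenshtein(a: str, b: str) -> int:
    """Edit distance allowing substitution, insertion, deletion, and
    interchange of two adjacent characters.

    Assumption: every operation costs 1. Runs in O(len(a) * len(b)) time.
    """
    last_row = {}  # last (1-based) row in which each character of a was seen
    big = len(a) + len(b)  # sentinel larger than any real distance

    # d is shifted by one in each dimension to hold a sentinel border.
    d = [[0] * (len(b) + 2) for _ in range(len(a) + 2)]
    d[0][0] = big
    for i in range(len(a) + 1):
        d[i + 1][0] = big
        d[i + 1][1] = i
    for j in range(len(b) + 1):
        d[0][j + 1] = big
        d[1][j + 1] = j

    for i in range(1, len(a) + 1):
        last_col = 0  # last column in this row where the characters matched
        for j in range(1, len(b) + 1):
            k = last_row.get(b[j - 1], 0)
            l = last_col
            if a[i - 1] == b[j - 1]:
                cost = 0
                last_col = j
            else:
                cost = 1
            d[i + 1][j + 1] = min(
                d[i][j] + cost,  # substitution (or match)
                d[i + 1][j] + 1,  # insertion
                d[i][j + 1] + 1,  # deletion
                # transposition, plus edits between the swapped characters
                d[k][l] + (i - k - 1) + 1 + (j - l - 1),
            )
        last_row[a[i - 1]] = i

    return d[len(a) + 1][len(b) + 1]


swap = damerau_levenshtein("ab", "ba")
```

A single adjacent interchange turns "ab" into "ba", so the call above counts one edit where plain insert/delete/substitute distance would count two.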

Proceedings ArticleDOI
17 Jul 2006
TL;DR: This work presents a novel translation model based on tree-to-string alignment templates (TATs), which describe the alignment between a source parse tree and a target string; the model significantly outperforms Pharaoh, a state-of-the-art decoder for phrase-based models.
Abstract: We present a novel translation model based on tree-to-string alignment template (TAT) which describes the alignment between a source parse tree and a target string. A TAT is capable of generating both terminals and non-terminals and performing reordering at both low and high levels. The model is linguistically syntax-based because TATs are extracted automatically from word-aligned, source side parsed parallel texts. To translate a source sentence, we first employ a parser to produce a source parse tree and then apply TATs to transform the tree into a target string. Our experiments show that the TAT-based model significantly outperforms Pharaoh, a state-of-the-art decoder for phrase-based models.

350 citations


Network Information
Related Topics (5)
Time complexity: 36K papers, 879.5K citations, 88% related
Tree (data structure): 44.9K papers, 749.6K citations, 86% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 85% related
Computational complexity theory: 30.8K papers, 711.2K citations, 82% related
Supervised learning: 20.8K papers, 710.5K citations, 80% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2022  2
2021  491
2020  704
2019  759
2018  816
2017  806