
Showing papers on "Phrase" published in 2011


Proceedings Article
27 Jul 2011
TL;DR: It is found that mapping conversational stimuli onto responses is more difficult than translating between languages, due to the wider range of possible responses, the larger fraction of unaligned words/phrases, and the presence of large phrase pairs whose alignment cannot be further decomposed.
Abstract: We present a data-driven approach to generating responses to Twitter status posts, based on phrase-based Statistical Machine Translation. We find that mapping conversational stimuli onto responses is more difficult than translating between languages, due to the wider range of possible responses, the larger fraction of unaligned words/phrases, and the presence of large phrase pairs whose alignment cannot be further decomposed. After addressing these challenges, we compare approaches based on SMT and Information Retrieval in a human evaluation. We show that SMT outperforms IR on this task, and its output is preferred over actual human responses in 15% of cases. As far as we are aware, this is the first work to investigate the use of phrase-based SMT to directly translate a linguistic stimulus into an appropriate response.

686 citations
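
As a rough illustration of the phrase-table view of response generation described above, the sketch below greedily covers a stimulus with known source phrases and emits the best response phrase for each. The table entries and scores are invented; a real system would add a language model, reordering, and tuned feature weights.

```python
# Toy sketch of response generation as phrase-based "translation".
# The phrase table and probabilities are invented for illustration.
phrase_table = {
    "good morning": [("morning!", 0.4), ("hey, good morning", 0.3)],
    "so tired": [("get some sleep!", 0.5), ("same here", 0.2)],
}

def respond(stimulus):
    """Greedily cover the stimulus with known source phrases, longest
    first; unaligned words contribute nothing, mirroring the large
    unaligned fraction the paper reports."""
    words = stimulus.lower().split()
    response, i = [], 0
    while i < len(words):
        for j in range(len(words), i, -1):  # longest match first
            src = " ".join(words[i:j])
            if src in phrase_table:
                response.append(max(phrase_table[src], key=lambda t: t[1])[0])
                i = j
                break
        else:
            i += 1  # skip unaligned word
    return " ".join(response)

print(respond("good morning so tired"))  # -> "morning! get some sleep!"
```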


Proceedings ArticleDOI
20 Jun 2011
TL;DR: It is shown that a visual phrase detector significantly outperforms a baseline which detects component objects and reasons about relations, even though visual phrase training sets tend to be smaller than those for objects.
Abstract: In this paper we introduce visual phrases, complex visual composites like “a person riding a horse”. Visual phrases often display significantly reduced visual complexity compared to their component objects, because the appearance of those objects can change profoundly when they participate in relations. We introduce a dataset suitable for phrasal recognition that uses familiar PASCAL object categories, and demonstrate significant experimental gains resulting from exploiting visual phrases. We show that a visual phrase detector significantly outperforms a baseline which detects component objects and reasons about relations, even though visual phrase training sets tend to be smaller than those for objects. We argue that any multi-class detection system must decode detector outputs to produce final results; this is usually done with non-maximum suppression. We describe a novel decoding procedure that can account accurately for local context without solving difficult inference problems. We show this decoding procedure outperforms the state of the art. Finally, we show that decoding a combination of phrasal and object detectors produces real improvements in detector results.

518 citations
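
For context on the decoding discussion above: the standard procedure that multi-class detection systems usually rely on is greedy non-maximum suppression. A minimal sketch of that baseline (not the paper's context-aware decoding procedure) follows.

```python
# Greedy non-maximum suppression over (box, score) detections, where a
# box is (x1, y1, x2, y2). This is the baseline decoding step; the
# paper's procedure additionally accounts for local context.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def nms(detections, thresh=0.5):
    """Keep high-scoring boxes that overlap no kept box by > thresh IoU."""
    kept = []
    for box, score in sorted(detections, key=lambda d: -d[1]):
        if all(iou(box, k) <= thresh for k, _ in kept):
            kept.append((box, score))
    return kept
```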


Proceedings Article
19 Jun 2011
TL;DR: A new data set is introduced that pairs English Wikipedia with Simple English Wikipedia and is orders of magnitude larger than any previously examined for sentence simplification and contains the full range of simplification operations including rewording, reordering, insertion and deletion.
Abstract: In this paper we examine the task of sentence simplification which aims to reduce the reading complexity of a sentence by incorporating more accessible vocabulary and sentence structure. We introduce a new data set that pairs English Wikipedia with Simple English Wikipedia and is orders of magnitude larger than any previously examined for sentence simplification. The data contains the full range of simplification operations including rewording, reordering, insertion and deletion. We provide an analysis of this corpus as well as preliminary results using a phrase-based translation approach for simplification.

237 citations


Journal ArticleDOI
TL;DR: This work examines activity in humans generated at the visual presentation of target nouns, such as "boat," while varying the combinatorial operations induced by the surrounding context, and suggests that the left anterior temporal lobe and ventromedial prefrontal cortex play roles in basic syntactic and semantic composition, respectively.
Abstract: The expressive power of language lies in its ability to construct an infinite array of ideas out of a finite set of pieces. Surprisingly, few neurolinguistic investigations probe the basic processes that constitute the foundation of this ability, choosing instead to focus on relatively complex combinatorial operations. Contrastingly, in the present work, we investigate the neural circuits underlying simple linguistic composition, such as required by the minimal phrase "red boat." Using magnetoencephalography, we examined activity in humans generated at the visual presentation of target nouns, such as "boat," and varied the combinatorial operations induced by its surrounding context. Nouns in minimal compositional contexts ("red boat") were compared with those appearing in matched non-compositional contexts, such as after an unpronounceable consonant string ("xkq boat") or within a list ("cup, boat"). Source analysis did not implicate traditional language areas (inferior frontal gyrus, posterior temporal regions) in such basic composition. Instead, we found increased combinatorial-related activity in the left anterior temporal lobe (LATL) and ventromedial prefrontal cortex (vmPFC). These regions have been linked previously to syntactic (LATL) and semantic (vmPFC) combinatorial processing in more complex linguistic contexts. Thus, we suggest that these regions play a role in basic syntactic and semantic composition, respectively. Importantly, the temporal ordering of the effects, in which LATL activity (∼225 ms) precedes vmPFC activity (∼400 ms), is consistent with many processing models that posit syntactic composition before semantic composition during the construction of linguistic representations.

235 citations


Journal ArticleDOI
TL;DR: A Bayesian framework for grammar induction is used to address a version of this argument and shows that, given typical child-directed speech and certain innate domain-general capacities, an ideal learner could recognize the hierarchical phrase structure of language without having this knowledge innately specified as part of the language faculty.

187 citations


Journal ArticleDOI
TL;DR: An investigation of the role of hierarchical structure in sentence processing, carried out by implementing a range of probabilistic language models, suggests that a sentence's hierarchical structure, unlike many other sources of information, does not noticeably affect the generation of expectations about upcoming words.
Abstract: Although it is generally accepted that hierarchical phrase structures are instrumental in describing human language, their role in cognitive processing is still debated. We investigated the role of hierarchical structure in sentence processing by implementing a range of probabilistic language models, some of which depended on hierarchical structure, and others of which relied on sequential structure only. All models estimated the occurrence probabilities of syntactic categories in sentences for which reading-time data were available. Relating the models' probability estimates to the data showed that the hierarchical-structure models did not account for variance in reading times over and above the amount of variance accounted for by all of the sequential-structure models. This suggests that a sentence's hierarchical structure, unlike many other sources of information, does not noticeably affect the generation of expectations about upcoming words.

186 citations
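
The general evaluation logic is worth making concrete: a sequential-structure model assigns each word a probability given the preceding material, and the negative log of that probability (surprisal) is then related to reading times. A minimal bigram sketch, not the paper's actual models, is shown below.

```python
# Surprisal from a purely sequential (bigram) model with add-one
# smoothing. Reading times would then be regressed on these values and
# the fit compared against hierarchical-structure models.
import math
from collections import Counter

def bigram_surprisal(train_sents, test_sents):
    uni, bi = Counter(), Counter()
    for sent in train_sents:
        toks = ["<s>"] + sent
        uni.update(toks)
        bi.update(zip(toks, toks[1:]))
    surprisals = []
    for sent in test_sents:
        toks = ["<s>"] + sent
        for prev, w in zip(toks, toks[1:]):
            p = (bi[(prev, w)] + 1) / (uni[prev] + len(uni))
            surprisals.append(-math.log2(p))  # in bits
    return surprisals
```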


Journal ArticleDOI
TL;DR: The authors found that native and nonnative speakers are sensitive to the frequency with which phrases occur, and that native speakers and higher-proficiency nonnatives are also sensitive to whether a phrase occurs in a particular configuration (binomial vs. reversed) in English, highlighting the contribution of entrenchment of a particular phrase in memory.
Abstract: Are speakers sensitive to the frequency with which phrases occur in language? The authors report an eye-tracking study that investigates this by examining the processing of multiword sequences that differ in phrasal frequency by native and proficient nonnative English speakers. Participants read sentences containing 3-word binomial phrases (bride and groom) and their reversed forms (groom and bride), which are identical in syntax and meaning but that differ in phrasal frequency. Mixed-effects modeling revealed that native speakers and nonnative speakers, across a range of proficiencies, are sensitive to the frequency with which phrases occur in English. Results also indicate that native speakers and higher proficiency nonnatives are sensitive to whether a phrase occurs in a particular configuration (binomial vs. reversed) in English, highlighting the contribution of entrenchment of a particular phrase in memory.

186 citations
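
The mixed-effects analysis might look roughly like the sketch below, using statsmodels; the data file and column names (reading_time, log_phrase_freq, is_binomial, proficiency, subject) are hypothetical stand-ins for the study's actual variables.

```python
# Hypothetical mixed-effects analysis of phrasal frequency effects on
# reading times, with random intercepts by participant.
import pandas as pd
import statsmodels.formula.api as smf

data = pd.read_csv("eyetracking_binomials.csv")  # hypothetical file

model = smf.mixedlm(
    "reading_time ~ log_phrase_freq + is_binomial * proficiency",
    data,
    groups=data["subject"],  # random intercept per participant
)
print(model.fit().summary())
```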


Journal ArticleDOI
TL;DR: It is found that 6-mo-old infants can simultaneously segment a nonce auditory word form from prosodically organized continuous speech and associate it to a visual referent, suggesting that learning is enhanced when the language input is well matched to the learner's expectations.
Abstract: Human infants are predisposed to rapidly acquire their native language. The nature of these predispositions is poorly understood, but is crucial to our understanding of how infants unpack their speech input to recover the fundamental word-like units, assign them referential roles, and acquire the rules that govern their organization. Previous researchers have demonstrated the role of general distributional computations in prelinguistic infants’ parsing of continuous speech. We extend these findings to more naturalistic conditions, and find that 6-mo-old infants can simultaneously segment a nonce auditory word form from prosodically organized continuous speech and associate it to a visual referent. Crucially, however, this mapping occurs only when the word form is aligned with a prosodic phrase boundary. Our findings suggest that infants are predisposed very early in life to hypothesize that words are aligned with prosodic phrase boundaries, thus facilitating the word learning process. Further, and somewhat paradoxically, we observed successful learning in a more complex context than previously studied, suggesting that learning is enhanced when the language input is well matched to the learner's expectations.

180 citations


Proceedings Article
19 Jun 2011
TL;DR: Experiments demonstrate that the proposed phrase-based translation model significantly outperforms the state-of-the-art word-based translation model for question retrieval.
Abstract: Community-based question answering (Q&A) has become increasingly important due to the popularity of Q&A archives on the web. This paper is concerned with the problem of question retrieval. Question retrieval in Q&A archives aims to find historical questions that are semantically equivalent or relevant to the queried questions. In this paper, we propose a novel phrase-based translation model for question retrieval. Compared to traditional word-based translation models, the phrase-based translation model is more effective because it captures contextual information by modeling the translation of phrases as a whole, rather than translating single words in isolation. Experiments conducted on real Q&A data demonstrate that our proposed phrase-based translation model significantly outperforms the state-of-the-art word-based translation model.

159 citations
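
A simplified, word-level version of translation-based retrieval scoring is sketched below; the probabilities are invented, and the paper's contribution is precisely to estimate and apply such probabilities over phrases rather than isolated words.

```python
# Score a historical question by how probably it "translates" into the
# query, mixed with a background probability for smoothing.
def translation_score(query_words, cand_words, t_prob, lam=0.8, p_bg=1e-4):
    score = 1.0
    for q in query_words:
        p_trans = sum(t_prob.get((q, c), 0.0) for c in cand_words) / len(cand_words)
        score *= lam * p_trans + (1 - lam) * p_bg
    return score

# invented translation probabilities P(query word | candidate word)
t_prob = {("install", "setup"): 0.3, ("install", "install"): 0.6}
print(translation_score(["install"], ["how", "to", "setup", "python"], t_prob))
```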


Proceedings Article
24 Jun 2011
TL;DR: A new translation model for text simplification is introduced that extends a phrase-based machine translation approach to include phrasal deletion in a corpus of 137K aligned sentence pairs extracted by aligning English Wikipedia and Simple English Wikipedia.
Abstract: In this paper we examine the sentence simplification problem as an English-to-English translation problem, utilizing a corpus of 137K aligned sentence pairs extracted by aligning English Wikipedia and Simple English Wikipedia. This data set contains the full range of transformation operations including rewording, reordering, insertion and deletion. We introduce a new translation model for text simplification that extends a phrase-based machine translation approach to include phrasal deletion. Evaluated on three metrics that compare against a human reference (BLEU, word-F1 and SSA), our new approach performs significantly better than two text compression techniques (including T3) and the phrase-based translation system without deletion.

Proceedings Article
19 Jun 2011
TL;DR: It is shown that unseen words account for a large part of the translation error when moving to new domains; several approaches to integrating mined translations for such words into a phrase-based translation system yield consistent improvements in translation quality.
Abstract: We show that unseen words account for a large part of the translation error when moving to new domains. Using an extension of a recent approach to mining translations from comparable corpora (Haghighi et al., 2008), we are able to find translations for otherwise OOV terms. We show several approaches to integrating such translations into a phrase-based translation system, yielding consistent improvements in translation quality (between 0.5 and 1.5 BLEU points) on four domains and two language pairs.

Proceedings Article
01 Nov 2011
TL;DR: This paper collects and analyses compositionality judgments for a range of compound nouns using Mechanical Turk, and evaluates two different types of distributional models for compositionality detection – constituent based models and composition function based models.
Abstract: A multiword is compositional if its meaning can be expressed in terms of the meaning of its constituents. In this paper, we collect and analyse compositionality judgments for a range of compound nouns using Mechanical Turk. Unlike existing compositionality datasets, our dataset has judgments on the contribution of constituent words as well as judgments for the phrase as a whole. We use this dataset to study the relation between judgments at the constituent level and those for the whole phrase. We then evaluate two different types of distributional models for compositionality detection – constituent based models and composition function based models. Both types show competitive performance, though the composition function based models perform slightly better. In both types, additive models perform better than their multiplicative counterparts.
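
The two composition functions compared are simple to state; on toy vectors (invented here), the comparison against an observed phrase vector looks like this:

```python
# Additive (v1 + v2) vs. multiplicative (elementwise v1 * v2) composition,
# scored against an observed phrase vector by cosine similarity. Model
# scores would then be correlated with the human judgments.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

v_swimming = np.array([0.9, 0.1, 0.3])  # toy distributional vectors
v_pool     = np.array([0.2, 0.8, 0.4])
v_phrase   = np.array([0.5, 0.6, 0.4])  # observed "swimming pool" vector

print("additive:      ", cosine(v_swimming + v_pool, v_phrase))
print("multiplicative:", cosine(v_swimming * v_pool, v_phrase))
```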

Journal ArticleDOI
TL;DR: The ChemicalTagger parser is developed as a medium-depth, phrase-based semantic NLP tool for the language of chemical experiments, showing that it is possible to parse chemical experimental text using rule-based techniques in conjunction with a formal grammar parser.
Abstract: The primary method for scientific communication is in the form of published scientific articles and theses, which use natural language combined with domain-specific terminology. As such, they contain free-flowing unstructured text. Given the usefulness of data extraction from unstructured literature, we aim to show how this can be achieved for the discipline of chemistry. The highly formulaic style of writing most chemists adopt makes their contributions well suited to high-throughput Natural Language Processing (NLP) approaches. We have developed the ChemicalTagger parser as a medium-depth, phrase-based semantic NLP tool for the language of chemical experiments. Tagging is based on a modular architecture and uses a combination of OSCAR, domain-specific regexes and English taggers to identify parts of speech. An ANTLR grammar is used to structure this into tree-based phrases. Using a metric that allows for overlapping annotations, we achieved machine-annotator agreements of 88.9% for phrase recognition and 91.9% for phrase-type identification (action names). It is thus possible to parse chemical experimental text using rule-based techniques in conjunction with a formal grammar parser. ChemicalTagger has been deployed over 10,000 patents and has identified solvents from their linguistic context with >99.5% precision.
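
ChemicalTagger itself is a Java tool built on OSCAR and an ANTLR grammar; the fragment below is only a toy Python illustration of the underlying idea of combining domain-specific regexes with word lists to tag phrases in experimental text.

```python
# Toy illustration (not ChemicalTagger): regexes for amounts and
# temperatures plus a word list for action verbs.
import re

ACTION_WORDS = {"add", "stir", "heat", "dissolve", "filter", "wash"}
AMOUNT = re.compile(r"\b\d+(?:\.\d+)?\s*(?:mg|g|mL|ml|mmol)\b")
TEMP = re.compile(r"\b\d+\s*°?C\b")

def tag(sentence):
    spans = [(m.group(), "AMOUNT") for m in AMOUNT.finditer(sentence)]
    spans += [(m.group(), "TEMPERATURE") for m in TEMP.finditer(sentence)]
    spans += [(w, "ACTION") for w in sentence.split()
              if w.strip(".,;").lower() in ACTION_WORDS]
    return spans

print(tag("Add 25 mL of ethanol and stir at 60 C for 2 h."))
# [('25 mL', 'AMOUNT'), ('60 C', 'TEMPERATURE'), ('Add', 'ACTION'), ('stir', 'ACTION')]
```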

Proceedings ArticleDOI
30 Aug 2011
TL;DR: This paper presents a collection of mobile email sentences written by actual users on actual mobile devices, obtained from emails written by Enron employees on their BlackBerry devices, and uses them to construct a series of phrase sets for text entry evaluations.
Abstract: Mobile text entry methods are typically evaluated by having study participants copy phrases. However, currently there is no available phrase set that has been composed by mobile users. Instead researchers have resorted to using invented phrases that probably suffer from low external validity. Further, there is no available phrase set whose phrases have been verified to be memorable. In this paper we present a collection of mobile email sentences written by actual users on actual mobile devices. We obtained our sentences from emails written by Enron employees on their BlackBerry mobile devices. We provide empirical data on how easy the sentences were to remember and how quickly and accurately users could type these sentences on a full-sized keyboard. Using this empirical data, we construct a series of phrase sets we suggest for use in text entry evaluations.


Proceedings Article
19 Jun 2011
TL;DR: A novel machine translation model that models translation as a linear sequence of operations including not only translation but also reordering operations, together with a joint sequence model for the translation and reordering probabilities that is more flexible than standard phrase-based MT.
Abstract: We present a novel machine translation model which models translation by a linear sequence of operations. In contrast to the "N-gram" model, this sequence includes not only translation but also reordering operations. Key ideas of our model are (i) a new reordering approach which better restricts the position to which a word or phrase can be moved, and is able to handle short and long distance re-orderings in a unified way, and (ii) a joint sequence model for the translation and reordering probabilities which is more flexible than standard phrase-based MT. We observe statistically significant improvements in BLEU over Moses for German-to-English and Spanish-to-English tasks, and comparable results for a French-to-English task.

Journal ArticleDOI
TL;DR: The evaluation of the Affect Analysis Model algorithm showed promising results regarding its capability to accurately recognize fine-grained emotions reflected in sentences from diary-like blog posts, fairy tales and news headlines, and the algorithm outperformed eight other systems on several measures.
Abstract: In this paper, we address the tasks of recognition and interpretation of affect communicated through text messaging in online communication environments. Specifically, we focus on Instant Messaging (IM) or blogs, where people use an informal or garbled style of writing. We introduced a novel rule-based linguistic approach for affect recognition from text. Our Affect Analysis Model (AAM) was designed to deal with not only grammatically and syntactically correct textual input, but also informal messages written in an abbreviated or expressive manner. The proposed rule-based approach processes each sentence in stages, including symbolic cue processing, detection and transformation of abbreviations, sentence parsing and word/phrase/sentence-level analyses. Our method is capable of processing sentences of different complexity, including simple, compound, complex (with complement and relative clauses) and complex–compound sentences. Affect in text is classified into nine emotion categories (or neutral). The strength of the resulting emotional state depends on vectors of emotional words, relations among them, tense of the analysed sentence and availability of first person pronouns. The evaluation of the Affect Analysis Model algorithm showed promising results regarding its capability to accurately recognize fine-grained emotions reflected in sentences from diary-like blog posts (averaged accuracy is up to 77 per cent), fairy tales (averaged accuracy is up to 70.2 per cent) and news headlines (our algorithm outperformed eight other systems on several measures).
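
A heavily simplified sketch of the kind of rule-based pipeline described (abbreviation expansion, lexicon lookup, surface intensity cues) is given below; the lexicon entries and weights are hypothetical, and the real model additionally performs sentence parsing and nine-category classification.

```python
# Hypothetical miniature of rule-based affect analysis: expand informal
# abbreviations, look words up in an emotion lexicon, then adjust
# strength using surface cues such as exclamation marks.
EMOTION_LEXICON = {"love": ("joy", 0.8), "hate": ("anger", 0.9),
                   "scared": ("fear", 0.7)}
ABBREVIATIONS = {"gr8": "great", "luv": "love", "u": "you"}

def analyze(message):
    words = [ABBREVIATIONS.get(w, w)
             for w in (t.lower().strip("!.,") for t in message.split())]
    hits = [EMOTION_LEXICON[w] for w in words if w in EMOTION_LEXICON]
    if not hits:
        return ("neutral", 0.0)
    emotion, strength = max(hits, key=lambda h: h[1])
    if "!" in message:                       # intensity cue
        strength = min(1.0, round(strength + 0.1, 2))
    return (emotion, strength)

print(analyze("I luv this song!"))  # -> ('joy', 0.9)
```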

01 Jan 2011
TL;DR: This paper describes the integration of morpho-syntactic information in phrase-based and syntax-based Machine Translation systems and proposes enhancements of a two-step translation setup for dealing with morphological richness and word order differences.
Abstract: This paper describes the integration of morpho-syntactic information in phrase-based and syntax-based Machine Translation systems. We mainly focus on translating in the hard direction, that is, from morphologically poorer to morphologically richer languages, and also between language pairs that have significant word order differences. We intend to use hierarchical or surface syntactic models for languages of large vocabulary size and improve the translation quality using a two-step approach (Fraser, 2009). The two-step scheme reduces the complexity of hypothesis construction and selection by separating the task of source-to-target reordering from the task of generating fully inflected target-side word forms. In the first step, reordering is performed on the source data to make it structurally similar to the target language, and in the second step, lemmatized target words are mapped to fully inflected target words. We first introduce the reader to the detailed architecture of the two-step translation setup and later to its proposed enhancements for dealing with the above-mentioned issues. We plan to conduct experiments for two language pairs: English-Urdu and English-Czech.

09 Dec 2011
TL;DR: This paper compares techniques to combine diverse parallel corpora for domain-specific phrase-based SMT system training and focuses on phrase table fill-up: a method that effectively exploits background knowledge to improve model coverage, while preserving the more reliable information coming from the in-domain corpus.
Abstract: This paper compares techniques to combine diverse parallel corpora for domain-specific phrase-based SMT system training. We address a common scenario where little in-domain data is available for the task, but where large background models exist for the same language pair. In particular, we focus on phrase table fill-up: a method that effectively exploits background knowledge to improve model coverage, while preserving the more reliable information coming from the in-domain corpus. We present experiments on an emerging transcribed speech translation task – the TED talks. While performing similarly in terms of BLEU and NIST scores to the popular log-linear and linear interpolation techniques, filled-up translation models are more compact and easy to tune by minimum error training.
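
The fill-up idea reduces to a simple merge rule, sketched below under the simplifying assumption of one entry per source phrase; the extra binary provenance feature is what minimum error training can then weight.

```python
# Phrase table fill-up (simplified): keep all in-domain entries, add
# background entries only for uncovered source phrases, and append a
# provenance feature (1.0 = in-domain, 0.0 = background).
def fill_up(in_domain, background):
    """Both tables map source phrase -> (target phrase, feature list)."""
    merged = {src: (tgt, feats + [1.0]) for src, (tgt, feats) in in_domain.items()}
    for src, (tgt, feats) in background.items():
        if src not in merged:          # fill coverage gaps only
            merged[src] = (tgt, feats + [0.0])
    return merged
```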

Journal ArticleDOI
TL;DR: An illusion is explored in which a spoken phrase is perceptually transformed to sound like song rather than speech, simply by repeating it several times over.
Abstract: An illusion is explored in which a spoken phrase is perceptually transformed to sound like song rather than speech, simply by repeating it several times over. In experiment I, subjects listened to ten presentations of the phrase and judged how it sounded on a five-point scale with endpoints marked “exactly like speech” and “exactly like singing.” The initial and final presentations of the phrase were identical. When the intervening presentations were also identical, judgments moved solidly from speech to song. However, this did not occur when the intervening phrases were transposed slightly or when the syllables were presented in jumbled orderings. In experiment II, the phrase was presented either once or ten times, and subjects repeated it back as they finally heard it. Following one presentation, the subjects repeated the phrase back as speech; however, following ten presentations they repeated it back as song. The pitch values of the subjects’ renditions following ten presentations were closer to those of the original spoken phrase than were the pitch values following a single presentation. Furthermore, the renditions following ten presentations were even closer to a hypothesized representation in terms of a simple tonal melody than they were to the original spoken phrase.

Proceedings Article
27 Jul 2011
TL;DR: This paper presents three kinds of caches to store relevant document-level information: a dynamic cache, which stores bilingual phrase pairs from the best translation hypotheses of previous sentences in the test document; a static cache, which stores relevant bilingual phrase pairs extracted from similar bilingual document pairs in the training parallel corpus; and a topic cache, which stores target-side topic words related to the source side of the test document.
Abstract: Statistical machine translation systems are usually trained on a large amount of bilingual sentence pairs and translate one sentence at a time, ignoring document-level information. In this paper, we propose a cache-based approach to document-level translation. Since caches mainly depend on relevant data to supervise subsequent decisions, it is critical to fill the caches with highly relevant data of a reasonable size. We present three kinds of caches to store relevant document-level information: 1) a dynamic cache, which stores bilingual phrase pairs from the best translation hypotheses of previous sentences in the test document; 2) a static cache, which stores relevant bilingual phrase pairs extracted from similar bilingual document pairs (i.e., source documents similar to the test document and their corresponding target documents) in the training parallel corpus; 3) a topic cache, which stores target-side topic words related to the source side of the test document. In particular, three new features are designed to explore these kinds of document-level information in the three caches. Evaluation shows the effectiveness of our cache-based approach to document-level translation, with a performance improvement of 0.81 BLEU over Moses. Detailed analysis and discussion are also presented to give new insights into document-level translation.
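
The dynamic cache, at least, is easy to sketch: after each sentence is translated, phrase pairs from its best hypothesis enter a bounded cache, and a binary cache feature rewards reusing them later in the document. The class below is an illustrative simplification, not the paper's implementation.

```python
# Simplified dynamic cache for document-level translation.
from collections import deque

class DynamicCache:
    def __init__(self, max_size=500):
        self.pairs = deque(maxlen=max_size)  # oldest pairs evicted first

    def update(self, best_hypothesis_phrase_pairs):
        self.pairs.extend(best_hypothesis_phrase_pairs)

    def feature(self, src_phrase, tgt_phrase):
        # binary cache-hit feature consulted by the decoder's scorer
        return 1.0 if (src_phrase, tgt_phrase) in self.pairs else 0.0

cache = DynamicCache()
cache.update([("la maison", "the house")])
print(cache.feature("la maison", "the house"))  # 1.0
```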

Patent
05 Jul 2011
TL;DR: In this article, a method and system for providing a representative phrase corresponding to a real-time (current-time) popular keyword are presented; the popular keyword and the representative phrases may be displayed on a web page or the like.
Abstract: A method and system for providing a representative phrase corresponding to a real time (current time) popular keyword. The method and system may extend a representative criterion word, determined by analyzing morphemes of words in documents grouped into a cluster, and may combine the extended representative criterion word and the popular keyword, thereby providing the representative phrases. The method and system may display the popular keyword and the representative phrases on a web page, or the like.

Proceedings Article
19 Jun 2011
TL;DR: An unsupervised model for joint phrase alignment and extraction using non-parametric Bayesian methods and inversion transduction grammars (ITGs) is presented, which matches the accuracy of traditional two-step word alignment/phrase extraction approach while reducing the phrase table to a fraction of the original size.
Abstract: We present an unsupervised model for joint phrase alignment and extraction using non-parametric Bayesian methods and inversion transduction grammars (ITGs). The key contribution is that phrases of many granularities are included directly in the model through the use of a novel formulation that memorizes phrases generated not only by terminal, but also non-terminal symbols. This allows for a completely probabilistic model that is able to create a phrase table that achieves competitive accuracy on phrase-based machine translation tasks directly from unaligned sentence pairs. Experiments on several language pairs demonstrate that the proposed model matches the accuracy of traditional two-step word alignment/phrase extraction approach while reducing the phrase table to a fraction of the original size.

Patent
Steven D. Baker1, John Lamping1
17 Mar 2011
TL;DR: In this article, the authors present a system that identifies a synonym with N-gram agreement for a query phrase, which is then used to improve synonym mappings for query terms and phrases.
Abstract: One embodiment of the present invention provides a system that identifies a synonym with N-gram agreement for a query phrase. During operation, the system receives a candidate synonym for the query phrase. Then, for each term in the query phrase, the system determines whether the term is a lexical synonym of a corresponding term in the candidate synonym or the term shares meaning with the corresponding term in the candidate synonym. If this is true for all terms in the query phrase, the system identifies the candidate synonym as an N-gram agreement synonym for the query phrase. The system then uses this identified N-gram agreement synonym to improve synonym mappings for query terms and/or query phrases.
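
The per-term test the patent describes can be sketched directly; the two helper predicates below are stand-ins for the real lexical-synonym and shared-meaning resources.

```python
# N-gram agreement check: every query term must be a lexical synonym of,
# or share meaning with, the aligned term of the candidate synonym.
LEXICAL_SYNONYMS = {("car", "auto"), ("quick", "fast")}

def is_lexical_synonym(a, b):
    return a == b or (a, b) in LEXICAL_SYNONYMS or (b, a) in LEXICAL_SYNONYMS

def shares_meaning(a, b):
    return is_lexical_synonym(a, b)  # stand-in, e.g. embedding similarity

def ngram_agreement(query_phrase, candidate):
    q, c = query_phrase.split(), candidate.split()
    return len(q) == len(c) and all(
        is_lexical_synonym(a, b) or shares_meaning(a, b)
        for a, b in zip(q, c))

print(ngram_agreement("quick car", "fast auto"))  # True
```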

Proceedings Article
01 Sep 2011
TL;DR: This work automatically annotates the English version of a multi-parallel corpus and projects the annotations into all the other language versions, and uses a phrase-based statistical machine translation system as well as a lookup of known names from a multilingual name database for the translation of English entities.
Abstract: As developers of a highly multilingual named entity recognition (NER) system, we face an evaluation resource bottleneck problem: we need evaluation data in many languages, the annotation should not be too time-consuming, and the evaluation results across languages should be comparable. We solve the problem by automatically annotating the English version of a multi-parallel corpus and by projecting the annotations into all the other language versions. For the translation of English entities, we use a phrase-based statistical machine translation system as well as a lookup of known names from a multilingual name database. For the projection, we incrementally apply different methods: perfect string matching, perfect consonant signature matching and edit distance similarity. The resulting annotated parallel corpus will be made available for reuse.
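
The incremental projection cascade can be sketched as below, with difflib's ratio standing in for the edit-distance similarity step; thresholds and tokenization are simplified.

```python
# Project an (already translated) entity onto a target sentence: perfect
# string match, then perfect consonant-signature match, then
# edit-distance similarity.
import difflib
import re

def consonants(s):
    return re.sub(r"[aeiouAEIOU\W]", "", s).lower()

def project(entity, target_tokens, min_sim=0.8):
    for tok in target_tokens:                      # 1) perfect match
        if tok == entity:
            return tok
    for tok in target_tokens:                      # 2) consonant signature
        if consonants(tok) and consonants(tok) == consonants(entity):
            return tok
    ratio = lambda t: difflib.SequenceMatcher(None, entity, t).ratio()
    best = max(target_tokens, key=ratio)           # 3) edit-distance ratio
    return best if ratio(best) >= min_sim else None

print(project("Barroso", ["Herr", "Barrosos", "Rede"]))  # "Barrosos"
```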

Journal ArticleDOI
TL;DR: Two subject-verb agreement error elicitation studies indicate that agreement processes are strongly constrained by grammatical-level scope of planning, with local nouns planned closer to the head having a greater chance of interfering with agreement computation.

Patent
20 Sep 2011
TL;DR: In this paper, a method for real-time monitoring of changes in a sentiment respective of an input non-sentiment phrase was proposed, where the data storage contains a plurality of phrases.
Abstract: A method for real-time monitoring of changes in a sentiment respective of an input non-sentiment phrase. The method comprises receiving the input non-sentiment phrase and at least one tendency parameter respective of the input non-sentiment phrase; identifying in a data storage at least one of a term taxonomy that includes the input non-sentiment phrase, wherein the data storage contains a plurality of phrases including sentiment phrases, non-sentiment phrases, and a plurality of term taxonomies; computing a sentiment trend for the at least one term taxonomy; monitoring the sentiment trend to detect real-time changes in a direction of the sentiment trend with respect to the at least one tendency parameter; and generating at least a notification when a change in the direction of the sentiment trend with respect to the input tendency parameter has occurred.

Proceedings Article
27 Jul 2011
TL;DR: A source dependency structure based model that requires none of the heuristics or separate ordering models of previous works to control the word order of translations, and that performs well on long-distance reordering.
Abstract: Dependency structure, as a first step towards semantics, is believed to be helpful for improving translation quality. However, previous works on dependency structure based models typically resort to insertion operations to complete translations, which makes it difficult to specify ordering information in translation rules. In our model, we handle this problem by directly specifying the ordering information in head-dependents rules, which represent the source side as head-dependents relations and the target side as strings. The head-dependents rules require only the substitution operation, so our model needs no heuristics or separate ordering models of previous works to control the word order of translations. Large-scale experiments show that our model performs well on long distance reordering and outperforms the state-of-the-art constituency-to-string model (+1.47 BLEU on average) and hierarchical phrase-based model (+0.46 BLEU on average) on two Chinese-English NIST test sets, without resorting to phrases or a parse forest. For the first time, a source dependency structure based model catches up with and surpasses state-of-the-art translation models.

Proceedings Article
John DeNero1, Jakob Uszkoreit1
27 Jul 2011
TL;DR: This paper presents a method for inducing parse trees automatically from a parallel corpus, instead of using a supervised parser trained on a tree-bank, showing that the syntactic structure which is relevant to MT pre-ordering can be learned automatically from parallel text, thus establishing a new application for unsupervised grammar induction.
Abstract: When translating among languages that differ substantially in word order, machine translation (MT) systems benefit from syntactic pre-ordering---an approach that uses features from a syntactic parse to permute source words into a target-language-like order. This paper presents a method for inducing parse trees automatically from a parallel corpus, instead of using a supervised parser trained on a tree-bank. These induced parses are used to pre-order source sentences. We demonstrate that our induced parser is effective: it not only improves a state-of-the-art phrase-based system with integrated reordering, but also approaches the performance of a recent pre-ordering method based on a supervised parser. These results show that the syntactic structure which is relevant to MT pre-ordering can be learned automatically from parallel text, thus establishing a new application for unsupervised grammar induction.