
Showing papers on "Phrase" published in 2011


Proceedings Article
27 Jul 2011
TL;DR: It is found that mapping conversational stimuli onto responses is more difficult than translating between languages, due to the wider range of possible responses, the larger fraction of unaligned words/phrases, and the presence of large phrase pairs whose alignment cannot be further decomposed.
Abstract: We present a data-driven approach to generating responses to Twitter status posts, based on phrase-based Statistical Machine Translation. We find that mapping conversational stimuli onto responses is more difficult than translating between languages, due to the wider range of possible responses, the larger fraction of unaligned words/phrases, and the presence of large phrase pairs whose alignment cannot be further decomposed. After addressing these challenges, we compare approaches based on SMT and Information Retrieval in a human evaluation. We show that SMT outperforms IR on this task, and its output is preferred over actual human responses in 15% of cases. As far as we are aware, this is the first work to investigate the use of phrase-based SMT to directly translate a linguistic stimulus into an appropriate response.

686 citations
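
As a rough illustration of the phrase-table view of response generation described above, the sketch below greedily covers a stimulus with known source phrases and emits the best response phrase for each. The table entries and scores are invented; a real system would add a language model, reordering, and tuned feature weights.

```python
# Toy sketch of response generation as phrase-based "translation".
# The phrase table and probabilities are invented for illustration.
phrase_table = {
    "good morning": [("morning!", 0.4), ("hey, good morning", 0.3)],
    "so tired": [("get some sleep!", 0.5), ("same here", 0.2)],
}

def respond(stimulus):
    """Greedily cover the stimulus with known source phrases, longest
    first; unaligned words contribute nothing, mirroring the large
    unaligned fraction the paper reports."""
    words = stimulus.lower().split()
    response, i = [], 0
    while i < len(words):
        for j in range(len(words), i, -1):  # longest match first
            src = " ".join(words[i:j])
            if src in phrase_table:
                response.append(max(phrase_table[src], key=lambda t: t[1])[0])
                i = j
                break
        else:
            i += 1  # skip unaligned word
    return " ".join(response)

print(respond("good morning so tired"))  # -> "morning! get some sleep!"
```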


Proceedings ArticleDOI
20 Jun 2011
TL;DR: It is shown that a visual phrase detector significantly outperforms a baseline which detects component objects and reasons about relations, even though visual phrase training sets tend to be smaller than those for objects.
Abstract: In this paper we introduce visual phrases, complex visual composites like “a person riding a horse”. Visual phrases often display significantly reduced visual complexity compared to their component objects, because the appearance of those objects can change profoundly when they participate in relations. We introduce a dataset suitable for phrasal recognition that uses familiar PASCAL object categories, and demonstrate significant experimental gains resulting from exploiting visual phrases. We show that a visual phrase detector significantly outperforms a baseline which detects component objects and reasons about relations, even though visual phrase training sets tend to be smaller than those for objects. We argue that any multi-class detection system must decode detector outputs to produce final results; this is usually done with non-maximum suppression. We describe a novel decoding procedure that can account accurately for local context without solving difficult inference problems. We show this decoding procedure outperforms the state of the art. Finally, we show that decoding a combination of phrasal and object detectors produces real improvements in detector results.

518 citations
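
For context on the decoding discussion above: the standard procedure that multi-class detection systems usually rely on is greedy non-maximum suppression. A minimal sketch of that baseline (not the paper's context-aware decoding procedure) follows.

```python
# Greedy non-maximum suppression over (box, score) detections, where a
# box is (x1, y1, x2, y2). This is the baseline decoding step; the
# paper's procedure additionally accounts for local context.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def nms(detections, thresh=0.5):
    """Keep high-scoring boxes that overlap no kept box by > thresh IoU."""
    kept = []
    for box, score in sorted(detections, key=lambda d: -d[1]):
        if all(iou(box, k) <= thresh for k, _ in kept):
            kept.append((box, score))
    return kept
```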


Proceedings Article
19 Jun 2011
TL;DR: A new data set is introduced that pairs English Wikipedia with Simple English Wikipedia and is orders of magnitude larger than any previously examined for sentence simplification and contains the full range of simplification operations including rewording, reordering, insertion and deletion.
Abstract: In this paper we examine the task of sentence simplification which aims to reduce the reading complexity of a sentence by incorporating more accessible vocabulary and sentence structure. We introduce a new data set that pairs English Wikipedia with Simple English Wikipedia and is orders of magnitude larger than any previously examined for sentence simplification. The data contains the full range of simplification operations including rewording, reordering, insertion and deletion. We provide an analysis of this corpus as well as preliminary results using a phrase-based translation approach for simplification.

237 citations


Journal ArticleDOI
TL;DR: This work examines activity in humans generated at the visual presentation of target nouns, such as "boat," while varying the combinatorial operations induced by the surrounding context, and suggests that the left anterior temporal lobe and ventromedial prefrontal cortex play roles in basic syntactic and semantic composition, respectively.
Abstract: The expressive power of language lies in its ability to construct an infinite array of ideas out of a finite set of pieces. Surprisingly, few neurolinguistic investigations probe the basic processes that constitute the foundation of this ability, choosing instead to focus on relatively complex combinatorial operations. Contrastingly, in the present work, we investigate the neural circuits underlying simple linguistic composition, such as required by the minimal phrase "red boat." Using magnetoencephalography, we examined activity in humans generated at the visual presentation of target nouns, such as "boat," and varied the combinatorial operations induced by its surrounding context. Nouns in minimal compositional contexts ("red boat") were compared with those appearing in matched non-compositional contexts, such as after an unpronounceable consonant string ("xkq boat") or within a list ("cup, boat"). Source analysis did not implicate traditional language areas (inferior frontal gyrus, posterior temporal regions) in such basic composition. Instead, we found increased combinatorial-related activity in the left anterior temporal lobe (LATL) and ventromedial prefrontal cortex (vmPFC). These regions have been linked previously to syntactic (LATL) and semantic (vmPFC) combinatorial processing in more complex linguistic contexts. Thus, we suggest that these regions play a role in basic syntactic and semantic composition, respectively. Importantly, the temporal ordering of the effects, in which LATL activity (∼225 ms) precedes vmPFC activity (∼400 ms), is consistent with many processing models that posit syntactic composition before semantic composition during the construction of linguistic representations.

235 citations


Journal ArticleDOI
TL;DR: A Bayesian framework for grammar induction is used to address a version of this argument and shows that, given typical child-directed speech and certain innate domain-general capacities, an ideal learner could recognize the hierarchical phrase structure of language without having this knowledge innately specified as part of the language faculty.

187 citations


Journal ArticleDOI
TL;DR: An investigation of the role of hierarchical structure in sentence processing, carried out by implementing a range of probabilistic language models, suggests that a sentence's hierarchical structure, unlike many other sources of information, does not noticeably affect the generation of expectations about upcoming words.
Abstract: Although it is generally accepted that hierarchical phrase structures are instrumental in describing human language, their role in cognitive processing is still debated. We investigated the role of hierarchical structure in sentence processing by implementing a range of probabilistic language models, some of which depended on hierarchical structure, and others of which relied on sequential structure only. All models estimated the occurrence probabilities of syntactic categories in sentences for which reading-time data were available. Relating the models' probability estimates to the data showed that the hierarchical-structure models did not account for variance in reading times over and above the amount of variance accounted for by all of the sequential-structure models. This suggests that a sentence's hierarchical structure, unlike many other sources of information, does not noticeably affect the generation of expectations about upcoming words.

186 citations
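
The general evaluation logic is worth making concrete: a sequential-structure model assigns each word a probability given the preceding material, and the negative log of that probability (surprisal) is then related to reading times. A minimal bigram sketch, not the paper's actual models, is shown below.

```python
# Surprisal from a purely sequential (bigram) model with add-one
# smoothing. Reading times would then be regressed on these values and
# the fit compared against hierarchical-structure models.
import math
from collections import Counter

def bigram_surprisal(train_sents, test_sents):
    uni, bi = Counter(), Counter()
    for sent in train_sents:
        toks = ["<s>"] + sent
        uni.update(toks)
        bi.update(zip(toks, toks[1:]))
    surprisals = []
    for sent in test_sents:
        toks = ["<s>"] + sent
        for prev, w in zip(toks, toks[1:]):
            p = (bi[(prev, w)] + 1) / (uni[prev] + len(uni))
            surprisals.append(-math.log2(p))  # in bits
    return surprisals
```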


Journal ArticleDOI
TL;DR: The authors found that native and nonnative speakers are sensitive to the frequency with which phrases occur, and that native speakers and higher-proficiency nonnatives are also sensitive to whether a phrase occurs in a particular configuration (binomial vs. reversed) in English, highlighting the contribution of entrenchment of a particular phrase in memory.
Abstract: Are speakers sensitive to the frequency with which phrases occur in language? The authors report an eye-tracking study that investigates this by examining the processing of multiword sequences that differ in phrasal frequency by native and proficient nonnative English speakers. Participants read sentences containing 3-word binomial phrases (bride and groom) and their reversed forms (groom and bride), which are identical in syntax and meaning but that differ in phrasal frequency. Mixed-effects modeling revealed that native speakers and nonnative speakers, across a range of proficiencies, are sensitive to the frequency with which phrases occur in English. Results also indicate that native speakers and higher proficiency nonnatives are sensitive to whether a phrase occurs in a particular configuration (binomial vs. reversed) in English, highlighting the contribution of entrenchment of a particular phrase in memory.

186 citations
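
The mixed-effects analysis might look roughly like the sketch below, using statsmodels; the data file and column names (reading_time, log_phrase_freq, is_binomial, proficiency, subject) are hypothetical stand-ins for the study's actual variables.

```python
# Hypothetical mixed-effects analysis of phrasal frequency effects on
# reading times, with random intercepts by participant.
import pandas as pd
import statsmodels.formula.api as smf

data = pd.read_csv("eyetracking_binomials.csv")  # hypothetical file

model = smf.mixedlm(
    "reading_time ~ log_phrase_freq + is_binomial * proficiency",
    data,
    groups=data["subject"],  # random intercept per participant
)
print(model.fit().summary())
```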


Journal ArticleDOI
TL;DR: It is found that 6-mo-old infants can simultaneously segment a nonce auditory word form from prosodically organized continuous speech and associate it to a visual referent, suggesting that learning is enhanced when the language input is well matched to the learner's expectations.
Abstract: Human infants are predisposed to rapidly acquire their native language. The nature of these predispositions is poorly understood, but is crucial to our understanding of how infants unpack their speech input to recover the fundamental word-like units, assign them referential roles, and acquire the rules that govern their organization. Previous researchers have demonstrated the role of general distributional computations in prelinguistic infants’ parsing of continuous speech. We extend these findings to more naturalistic conditions, and find that 6-mo-old infants can simultaneously segment a nonce auditory word form from prosodically organized continuous speech and associate it to a visual referent. Crucially, however, this mapping occurs only when the word form is aligned with a prosodic phrase boundary. Our findings suggest that infants are predisposed very early in life to hypothesize that words are aligned with prosodic phrase boundaries, thus facilitating the word learning process. Further, and somewhat paradoxically, we observed successful learning in a more complex context than previously studied, suggesting that learning is enhanced when the language input is well matched to the learner's expectations.

180 citations


Proceedings Article
19 Jun 2011
TL;DR: Experiments demonstrate that the proposed phrase-based translation model significantly outperforms the state-of-the-art word-based translation model for question retrieval.
Abstract: Community-based question answering (Q&A) has become increasingly important due to the popularity of Q&A archives on the web. This paper is concerned with the problem of question retrieval. Question retrieval in Q&A archives aims to find historical questions that are semantically equivalent or relevant to the queried questions. In this paper, we propose a novel phrase-based translation model for question retrieval. Compared to traditional word-based translation models, the phrase-based translation model is more effective because it captures contextual information by modeling the translation of phrases as a whole, rather than translating single words in isolation. Experiments conducted on real Q&A data demonstrate that our proposed phrase-based translation model significantly outperforms the state-of-the-art word-based translation model.

159 citations
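
A simplified, word-level version of translation-based retrieval scoring is sketched below; the probabilities are invented, and the paper's contribution is precisely to estimate and apply such probabilities over phrases rather than isolated words.

```python
# Score a historical question by how probably it "translates" into the
# query, mixed with a background probability for smoothing.
def translation_score(query_words, cand_words, t_prob, lam=0.8, p_bg=1e-4):
    score = 1.0
    for q in query_words:
        p_trans = sum(t_prob.get((q, c), 0.0) for c in cand_words) / len(cand_words)
        score *= lam * p_trans + (1 - lam) * p_bg
    return score

# invented translation probabilities P(query word | candidate word)
t_prob = {("install", "setup"): 0.3, ("install", "install"): 0.6}
print(translation_score(["install"], ["how", "to", "setup", "python"], t_prob))
```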


Proceedings Article
24 Jun 2011
TL;DR: A new translation model for text simplification is introduced that extends a phrase-based machine translation approach to include phrasal deletion in a corpus of 137K aligned sentence pairs extracted by aligning English Wikipedia and Simple English Wikipedia.
Abstract: In this paper we examine the sentence simplification problem as an English-to-English translation problem, utilizing a corpus of 137K aligned sentence pairs extracted by aligning English Wikipedia and Simple English Wikipedia. This data set contains the full range of transformation operations including rewording, reordering, insertion and deletion. We introduce a new translation model for text simplification that extends a phrase-based machine translation approach to include phrasal deletion. Evaluated on three metrics that compare against a human reference (BLEU, word-F1 and SSA), our new approach performs significantly better than two text compression techniques (including T3) and the phrase-based translation system without deletion.

Proceedings Article
19 Jun 2011
TL;DR: It is shown that unseen words account for a large part of the translation error when moving to new domains; several approaches to integrating mined translations for such words into a phrase-based translation system yield consistent improvements in translation quality.
Abstract: We show that unseen words account for a large part of the translation error when moving to new domains. Using an extension of a recent approach to mining translations from comparable corpora (Haghighi et al., 2008), we are able to find translations for otherwise OOV terms. We show several approaches to integrating such translations into a phrase-based translation system, yielding consistent improvements in translation quality (between 0.5 and 1.5 BLEU points) on four domains and two language pairs.

Proceedings Article
01 Nov 2011
TL;DR: This paper collects and analyses compositionality judgments for a range of compound nouns using Mechanical Turk, and evaluates two different types of distributional models for compositionality detection – constituent based models and composition function based models.
Abstract: A multiword is compositional if its meaning can be expressed in terms of the meaning of its constituents. In this paper, we collect and analyse compositionality judgments for a range of compound nouns using Mechanical Turk. Unlike existing compositionality datasets, our dataset has judgments on the contribution of constituent words as well as judgments for the phrase as a whole. We use this dataset to study the relation between judgments at the constituent level and those for the whole phrase. We then evaluate two different types of distributional models for compositionality detection – constituent based models and composition function based models. Both types show competitive performance, though the composition function based models perform slightly better. In both types, additive models perform better than their multiplicative counterparts.
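
The two composition functions compared are simple to state; on toy vectors (invented here), the comparison against an observed phrase vector looks like this:

```python
# Additive (v1 + v2) vs. multiplicative (elementwise v1 * v2) composition,
# scored against an observed phrase vector by cosine similarity. Model
# scores would then be correlated with the human judgments.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

v_swimming = np.array([0.9, 0.1, 0.3])  # toy distributional vectors
v_pool     = np.array([0.2, 0.8, 0.4])
v_phrase   = np.array([0.5, 0.6, 0.4])  # observed "swimming pool" vector

print("additive:      ", cosine(v_swimming + v_pool, v_phrase))
print("multiplicative:", cosine(v_swimming * v_pool, v_phrase))
```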

Journal ArticleDOI
TL;DR: The ChemicalTagger parser is developed as a medium-depth, phrase-based semantic NLP tool for the language of chemical experiments, showing that it is possible to parse chemical experimental text using rule-based techniques in conjunction with a formal grammar parser.
Abstract: The primary method for scientific communication is in the form of published scientific articles and theses, which use natural language combined with domain-specific terminology. As such, they contain free-flowing unstructured text. Given the usefulness of data extraction from unstructured literature, we aim to show how this can be achieved for the discipline of chemistry. The highly formulaic style of writing most chemists adopt makes their contributions well suited to high-throughput Natural Language Processing (NLP) approaches. We have developed the ChemicalTagger parser as a medium-depth, phrase-based semantic NLP tool for the language of chemical experiments. Tagging is based on a modular architecture and uses a combination of OSCAR, domain-specific regexes and English taggers to identify parts of speech. An ANTLR grammar is used to structure this into tree-based phrases. Using a metric that allows for overlapping annotations, we achieved machine-annotator agreements of 88.9% for phrase recognition and 91.9% for phrase-type identification (action names). It is thus possible to parse chemical experimental text using rule-based techniques in conjunction with a formal grammar parser. ChemicalTagger has been deployed over 10,000 patents and has identified solvents from their linguistic context with >99.5% precision.
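
ChemicalTagger itself is a Java tool built on OSCAR and an ANTLR grammar; the fragment below is only a toy Python illustration of the underlying idea of combining domain-specific regexes with word lists to tag phrases in experimental text.

```python
# Toy illustration (not ChemicalTagger): regexes for amounts and
# temperatures plus a word list for action verbs.
import re

ACTION_WORDS = {"add", "stir", "heat", "dissolve", "filter", "wash"}
AMOUNT = re.compile(r"\b\d+(?:\.\d+)?\s*(?:mg|g|mL|ml|mmol)\b")
TEMP = re.compile(r"\b\d+\s*°?C\b")

def tag(sentence):
    spans = [(m.group(), "AMOUNT") for m in AMOUNT.finditer(sentence)]
    spans += [(m.group(), "TEMPERATURE") for m in TEMP.finditer(sentence)]
    spans += [(w, "ACTION") for w in sentence.split()
              if w.strip(".,;").lower() in ACTION_WORDS]
    return spans

print(tag("Add 25 mL of ethanol and stir at 60 C for 2 h."))
# [('25 mL', 'AMOUNT'), ('60 C', 'TEMPERATURE'), ('Add', 'ACTION'), ('stir', 'ACTION')]
```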

Proceedings ArticleDOI
30 Aug 2011
TL;DR: This paper presents a collection of mobile email sentences written by actual users on actual mobile devices, obtained from emails written by Enron employees on their BlackBerry devices, and uses them to construct a series of phrase sets for text entry evaluations.
Abstract: Mobile text entry methods are typically evaluated by having study participants copy phrases. However, currently there is no available phrase set that has been composed by mobile users. Instead researchers have resorted to using invented phrases that probably suffer from low external validity. Further, there is no available phrase set whose phrases have been verified to be memorable. In this paper we present a collection of mobile email sentences written by actual users on actual mobile devices. We obtained our sentences from emails written by Enron employees on their BlackBerry mobile devices. We provide empirical data on how easy the sentences were to remember and how quickly and accurately users could type these sentences on a full-sized keyboard. Using this empirical data, we construct a series of phrase sets we suggest for use in text entry evaluations.


Proceedings Article
19 Jun 2011
TL;DR: A novel machine translation model that models translation as a linear sequence of operations including not only translation but also reordering operations, together with a joint sequence model for the translation and reordering probabilities that is more flexible than standard phrase-based MT.
Abstract: We present a novel machine translation model which models translation by a linear sequence of operations. In contrast to the "N-gram" model, this sequence includes not only translation but also reordering operations. Key ideas of our model are (i) a new reordering approach which better restricts the position to which a word or phrase can be moved, and is able to handle short and long distance re-orderings in a unified way, and (ii) a joint sequence model for the translation and reordering probabilities which is more flexible than standard phrase-based MT. We observe statistically significant improvements in BLEU over Moses for German-to-English and Spanish-to-English tasks, and comparable results for a French-to-English task.

Journal ArticleDOI
TL;DR: The evaluation of the Affect Analysis Model algorithm showed promising results regarding its capability to accurately recognize fine-grained emotions reflected in sentences from diary-like blog posts, fairy tales and news headlines, and the algorithm outperformed eight other systems on several measures.
Abstract: In this paper, we address the tasks of recognition and interpretation of affect communicated through text messaging in online communication environments. Specifically, we focus on Instant Messaging (IM) or blogs, where people use an informal or garbled style of writing. We introduced a novel rule-based linguistic approach for affect recognition from text. Our Affect Analysis Model (AAM) was designed to deal with not only grammatically and syntactically correct textual input, but also informal messages written in an abbreviated or expressive manner. The proposed rule-based approach processes each sentence in stages, including symbolic cue processing, detection and transformation of abbreviations, sentence parsing and word/phrase/sentence-level analyses. Our method is capable of processing sentences of different complexity, including simple, compound, complex (with complement and relative clauses) and complex–compound sentences. Affect in text is classified into nine emotion categories (or neutral). The strength of the resulting emotional state depends on vectors of emotional words, relations among them, tense of the analysed sentence and availability of first person pronouns. The evaluation of the Affect Analysis Model algorithm showed promising results regarding its capability to accurately recognize fine-grained emotions reflected in sentences from diary-like blog posts (averaged accuracy is up to 77 per cent), fairy tales (averaged accuracy is up to 70.2 per cent) and news headlines (our algorithm outperformed eight other systems on several measures).
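
A heavily simplified sketch of the kind of rule-based pipeline described (abbreviation expansion, lexicon lookup, surface intensity cues) is given below; the lexicon entries and weights are hypothetical, and the real model additionally performs sentence parsing and nine-category classification.

```python
# Hypothetical miniature of rule-based affect analysis: expand informal
# abbreviations, look words up in an emotion lexicon, then adjust
# strength using surface cues such as exclamation marks.
EMOTION_LEXICON = {"love": ("joy", 0.8), "hate": ("anger", 0.9),
                   "scared": ("fear", 0.7)}
ABBREVIATIONS = {"gr8": "great", "luv": "love", "u": "you"}

def analyze(message):
    words = [ABBREVIATIONS.get(w, w)
             for w in (t.lower().strip("!.,") for t in message.split())]
    hits = [EMOTION_LEXICON[w] for w in words if w in EMOTION_LEXICON]
    if not hits:
        return ("neutral", 0.0)
    emotion, strength = max(hits, key=lambda h: h[1])
    if "!" in message:                       # intensity cue
        strength = min(1.0, round(strength + 0.1, 2))
    return (emotion, strength)

print(analyze("I luv this song!"))  # -> ('joy', 0.9)
```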

01 Jan 2011
TL;DR: This paper describes the integration of morpho-syntactic information in phrase-based and syntax-based Machine Translation systems and proposes enhancements of a two-step translation setup for dealing with morphological richness and word order differences.
Abstract: This paper describes the integration of morpho-syntactic information in phrase-based and syntax-based Machine Translation systems. We mainly focus on translating in the hard direction, that is, from morphologically poorer to morphologically richer languages, and also between language pairs that have significant word order differences. We intend to use hierarchical or surface syntactic models for languages of large vocabulary size and improve the translation quality using a two-step approach (Fraser, 2009). The two-step scheme reduces the complexity of hypothesis construction and selection by separating the task of source-to-target reordering from the task of generating fully inflected target-side word forms. In the first step, reordering is performed on the source data to make it structurally similar to the target language, and in the second step, lemmatized target words are mapped to fully inflected target words. We first introduce the reader to the detailed architecture of the two-step translation setup and later to its proposed enhancements for dealing with the above-mentioned issues. We plan to conduct experiments for two language pairs: English-Urdu and English-Czech.

09 Dec 2011
TL;DR: This paper compares techniques to combine diverse parallel corpora for domain-specific phrase-based SMT system training and focuses on phrase table fill-up: a method that effectively exploits background knowledge to improve model coverage, while preserving the more reliable information coming from the in-domain corpus.
Abstract: This paper compares techniques to combine diverse parallel corpora for domain-specific phrase-based SMT system training. We address a common scenario where little in-domain data is available for the task, but where large background models exist for the same language pair. In particular, we focus on phrase table fill-up: a method that effectively exploits background knowledge to improve model coverage, while preserving the more reliable information coming from the in-domain corpus. We present experiments on an emerging transcribed speech translation task – the TED talks. While performing similarly in terms of BLEU and NIST scores to the popular log-linear and linear interpolation techniques, filled-up translation models are more compact and easy to tune by minimum error training.
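
The fill-up idea reduces to a simple merge rule, sketched below under the simplifying assumption of one entry per source phrase; the extra binary provenance feature is what minimum error training can then weight.

```python
# Phrase table fill-up (simplified): keep all in-domain entries, add
# background entries only for uncovered source phrases, and append a
# provenance feature (1.0 = in-domain, 0.0 = background).
def fill_up(in_domain, background):
    """Both tables map source phrase -> (target phrase, feature list)."""
    merged = {src: (tgt, feats + [1.0]) for src, (tgt, feats) in in_domain.items()}
    for src, (tgt, feats) in background.items():
        if src not in merged:          # fill coverage gaps only
            merged[src] = (tgt, feats + [0.0])
    return merged
```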

Journal ArticleDOI
TL;DR: An illusion is explored in which a spoken phrase is perceptually transformed to sound like song rather than speech, simply by repeating it several times over.
Abstract: An illusion is explored in which a spoken phrase is perceptually transformed to sound like song rather than speech, simply by repeating it several times over. In experiment I, subjects listened to ten presentations of the phrase and judged how it sounded on a five-point scale with endpoints marked “exactly like speech” and “exactly like singing.” The initial and final presentations of the phrase were identical. When the intervening presentations were also identical, judgments moved solidly from speech to song. However, this did not occur when the intervening phrases were transposed slightly or when the syllables were presented in jumbled orderings. In experiment II, the phrase was presented either once or ten times, and subjects repeated it back as they finally heard it. Following one presentation, the subjects repeated the phrase back as speech; however, following ten presentations they repeated it back as song. The pitch values of the subjects’ renditions following ten presentations were closer to those of the original spoken phrase than were the pitch values following a single presentation. Furthermore, the renditions following ten presentations were even closer to a hypothesized representation in terms of a simple tonal melody than they were to the original spoken phrase.

Proceedings Article
27 Jul 2011
TL;DR: This paper presents three kinds of caches to store relevant document-level information: a dynamic cache, which stores bilingual phrase pairs from the best translation hypotheses of previous sentences in the test document; a static cache, which stores relevant bilingual phrase pairs extracted from similar bilingual document pairs in the training parallel corpus; and a topic cache, which stores target-side topic words related to the source side of the test document.
Abstract: Statistical machine translation systems are usually trained on a large amount of bilingual sentence pairs and translate one sentence at a time, ignoring document-level information. In this paper, we propose a cache-based approach to document-level translation. Since caches mainly depend on relevant data to supervise subsequent decisions, it is critical to fill the caches with highly relevant data of a reasonable size. We present three kinds of caches to store relevant document-level information: 1) a dynamic cache, which stores bilingual phrase pairs from the best translation hypotheses of previous sentences in the test document; 2) a static cache, which stores relevant bilingual phrase pairs extracted from similar bilingual document pairs (i.e., source documents similar to the test document and their corresponding target documents) in the training parallel corpus; 3) a topic cache, which stores target-side topic words related to the source side of the test document. In particular, three new features are designed to explore these kinds of document-level information in the three caches. Evaluation shows the effectiveness of our cache-based approach to document-level translation, with a performance improvement of 0.81 BLEU over Moses. Detailed analysis and discussion are also presented to give new insights into document-level translation.
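
The dynamic cache, at least, is easy to sketch: after each sentence is translated, phrase pairs from its best hypothesis enter a bounded cache, and a binary cache feature rewards reusing them later in the document. The class below is an illustrative simplification, not the paper's implementation.

```python
# Simplified dynamic cache for document-level translation.
from collections import deque

class DynamicCache:
    def __init__(self, max_size=500):
        self.pairs = deque(maxlen=max_size)  # oldest pairs evicted first

    def update(self, best_hypothesis_phrase_pairs):
        self.pairs.extend(best_hypothesis_phrase_pairs)

    def feature(self, src_phrase, tgt_phrase):
        # binary cache-hit feature consulted by the decoder's scorer
        return 1.0 if (src_phrase, tgt_phrase) in self.pairs else 0.0

cache = DynamicCache()
cache.update([("la maison", "the house")])
print(cache.feature("la maison", "the house"))  # 1.0
```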

Patent
05 Jul 2011
TL;DR: In this article, a method and system for providing a representative phrase corresponding to a real-time (current-time) popular keyword are presented; the popular keyword and the representative phrases may be displayed on a web page or the like.
Abstract: A method and system for providing a representative phrase corresponding to a real time (current time) popular keyword. The method and system may extend a representative criterion word, determined by analyzing morphemes of words in documents grouped into a cluster, and may combine the extended representative criterion word and the popular keyword, thereby providing the representative phrases. The method and system may display the popular keyword and the representative phrases on a web page, or the like.

Proceedings Article
19 Jun 2011
TL;DR: An unsupervised model for joint phrase alignment and extraction using non-parametric Bayesian methods and inversion transduction grammars (ITGs) is presented, which matches the accuracy of traditional two-step word alignment/phrase extraction approach while reducing the phrase table to a fraction of the original size.
Abstract: We present an unsupervised model for joint phrase alignment and extraction using non-parametric Bayesian methods and inversion transduction grammars (ITGs). The key contribution is that phrases of many granularities are included directly in the model through the use of a novel formulation that memorizes phrases generated not only by terminal, but also non-terminal symbols. This allows for a completely probabilistic model that is able to create a phrase table that achieves competitive accuracy on phrase-based machine translation tasks directly from unaligned sentence pairs. Experiments on several language pairs demonstrate that the proposed model matches the accuracy of traditional two-step word alignment/phrase extraction approach while reducing the phrase table to a fraction of the original size.

Patent
Steven D. Baker1, John Lamping1
17 Mar 2011
TL;DR: In this article, the authors present a system that identifies a synonym with N-gram agreement for a query phrase, which is then used to improve synonym mappings for query terms and phrases.
Abstract: One embodiment of the present invention provides a system that identifies a synonym with N-gram agreement for a query phrase. During operation, the system receives a candidate synonym for the query phrase. Then, for each term in the query phrase, the system determines whether the term is a lexical synonym of a corresponding term in the candidate synonym or the term shares meaning with the corresponding term in the candidate synonym. If this is true for all terms in the query phrase, the system identifies the candidate synonym as an N-gram agreement synonym for the query phrase. The system then uses this identified N-gram agreement synonym to improve synonym mappings for query terms and/or query phrases.
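
The per-term test the patent describes can be sketched directly; the two helper predicates below are stand-ins for the real lexical-synonym and shared-meaning resources.

```python
# N-gram agreement check: every query term must be a lexical synonym of,
# or share meaning with, the aligned term of the candidate synonym.
LEXICAL_SYNONYMS = {("car", "auto"), ("quick", "fast")}

def is_lexical_synonym(a, b):
    return a == b or (a, b) in LEXICAL_SYNONYMS or (b, a) in LEXICAL_SYNONYMS

def shares_meaning(a, b):
    return is_lexical_synonym(a, b)  # stand-in, e.g. embedding similarity

def ngram_agreement(query_phrase, candidate):
    q, c = query_phrase.split(), candidate.split()
    return len(q) == len(c) and all(
        is_lexical_synonym(a, b) or shares_meaning(a, b)
        for a, b in zip(q, c))

print(ngram_agreement("quick car", "fast auto"))  # True
```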

Proceedings Article
01 Sep 2011
TL;DR: This work automatically annotates the English version of a multi-parallel corpus and projects the annotations into all the other language versions, and uses a phrase-based statistical machine translation system as well as a lookup of known names from a multilingual name database for the translation of English entities.
Abstract: As developers of a highly multilingual named entity recognition (NER) system, we face an evaluation resource bottleneck problem: we need evaluation data in many languages, the annotation should not be too time-consuming, and the evaluation results across languages should be comparable. We solve the problem by automatically annotating the English version of a multi-parallel corpus and by projecting the annotations into all the other language versions. For the translation of English entities, we use a phrase-based statistical machine translation system as well as a lookup of known names from a multilingual name database. For the projection, we incrementally apply different methods: perfect string matching, perfect consonant signature matching and edit distance similarity. The resulting annotated parallel corpus will be made available for reuse.
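
The incremental projection cascade can be sketched as below, with difflib's ratio standing in for the edit-distance similarity step; thresholds and tokenization are simplified.

```python
# Project an (already translated) entity onto a target sentence: perfect
# string match, then perfect consonant-signature match, then
# edit-distance similarity.
import difflib
import re

def consonants(s):
    return re.sub(r"[aeiouAEIOU\W]", "", s).lower()

def project(entity, target_tokens, min_sim=0.8):
    for tok in target_tokens:                      # 1) perfect match
        if tok == entity:
            return tok
    for tok in target_tokens:                      # 2) consonant signature
        if consonants(tok) and consonants(tok) == consonants(entity):
            return tok
    ratio = lambda t: difflib.SequenceMatcher(None, entity, t).ratio()
    best = max(target_tokens, key=ratio)           # 3) edit-distance ratio
    return best if ratio(best) >= min_sim else None

print(project("Barroso", ["Herr", "Barrosos", "Rede"]))  # "Barrosos"
```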

Journal ArticleDOI
TL;DR: Two subject-verb agreement error elicitation studies indicate that agreement processes are strongly constrained by grammatical-level scope of planning, with local nouns planned closer to the head having a greater chance of interfering with agreement computation.

Patent
20 Sep 2011
TL;DR: In this paper, a method for real-time monitoring of changes in a sentiment respective of an input non-sentiment phrase was proposed, where the data storage contains a plurality of phrases.
Abstract: A method for real-time monitoring of changes in a sentiment respective of an input non-sentiment phrase. The method comprises receiving the input non-sentiment phrase and at least one tendency parameter respective of the input non-sentiment phrase; identifying in a data storage at least one of a term taxonomy that includes the input non-sentiment phrase, wherein the data storage contains a plurality of phrases including sentiment phrases, non-sentiment phrases, and a plurality of term taxonomies; computing a sentiment trend for the at least one term taxonomy; monitoring the sentiment trend to detect real-time changes in a direction of the sentiment trend with respect to the at least one tendency parameter; and generating at least a notification when a change in the direction of the sentiment trend with respect to the input tendency parameter has occurred.

Proceedings Article
27 Jul 2011
TL;DR: A source dependency structure based model that requires none of the heuristics or separate ordering models of previous works to control the word order of translations, and that performs well on long-distance reordering.
Abstract: Dependency structure, as a first step towards semantics, is believed to be helpful for improving translation quality. However, previous works on dependency structure based models typically resort to insertion operations to complete translations, which makes it difficult to specify ordering information in translation rules. In our model, we handle this problem by directly specifying the ordering information in head-dependents rules, which represent the source side as head-dependents relations and the target side as strings. The head-dependents rules require only the substitution operation, so our model needs no heuristics or separate ordering models of previous works to control the word order of translations. Large-scale experiments show that our model performs well on long distance reordering and outperforms the state-of-the-art constituency-to-string model (+1.47 BLEU on average) and hierarchical phrase-based model (+0.46 BLEU on average) on two Chinese-English NIST test sets, without resorting to phrases or a parse forest. For the first time, a source dependency structure based model catches up with and surpasses state-of-the-art translation models.

Proceedings Article
John DeNero1, Jakob Uszkoreit1
27 Jul 2011
TL;DR: This paper presents a method for inducing parse trees automatically from a parallel corpus, instead of using a supervised parser trained on a tree-bank, showing that the syntactic structure which is relevant to MT pre-ordering can be learned automatically from parallel text, thus establishing a new application for unsupervised grammar induction.
Abstract: When translating among languages that differ substantially in word order, machine translation (MT) systems benefit from syntactic pre-ordering---an approach that uses features from a syntactic parse to permute source words into a target-language-like order. This paper presents a method for inducing parse trees automatically from a parallel corpus, instead of using a supervised parser trained on a tree-bank. These induced parses are used to pre-order source sentences. We demonstrate that our induced parser is effective: it not only improves a state-of-the-art phrase-based system with integrated reordering, but also approaches the performance of a recent pre-ordering method based on a supervised parser. These results show that the syntactic structure which is relevant to MT pre-ordering can be learned automatically from parallel text, thus establishing a new application for unsupervised grammar induction.