Topic
Phrase
About: Phrase is a research topic. Over the lifetime, 12,580 publications have been published within this topic, receiving 317,823 citations. The topic is also known as: syntagma & phrases.
Papers published on a yearly basis
Papers
TL;DR: The proposed system allows for a more explanatory analysis of GEN-Q assignment and accounts for several distinctions between QP and NP subjects within Russian, while also motivating the absence of these distinctions in Serbo-Croatian.
Abstract: Numeral phrases in Russian display many unusual morphosyntactic properties, e.g., (i) the numeral sometimes assigns genitive (GEN-Q) to the following noun and sometimes agrees with it and (ii) the numeral phrase sometimes induces subject-verb agreement and sometimes does not. In this paper existing analyses of these properties are parametrized to accommodate related phenomena in other Slavic languages. First, Babby's (1987) proposal that GEN-Q is structural in Russian is shown not to extend to Serbo-Croatian, where it must be analyzed as inherent. Second, Pesetsky's (1982) idea that Russian numeral phrases may be either QPs or NPs also does not extend to Serbo-Croatian, where these are only NPs. This set of assumptions explains a range of seemingly unrelated facts about the behavior of numeral phrases in the two languages. Pesetsky's analysis is recast in terms of more recent hypotheses about phrase structure: (i) NPs are actually embedded in DPs and (ii) subjects are D-Structure VP-specifiers. Proposal (i) allows for a more explanatory analysis of GEN-Q assignment and proposal (ii) accounts for several distinctions between QP and NP subjects within Russian, also motivating the absence of these distinctions in Serbo-Croatian. Finally, it is shown that Polish can be assimilated to the proposed system.
104 citations
08 Sep 2014
TL;DR: This thesis addresses the technical and linguistic aspects of discourse-level processing in phrase-based statistical machine translation (SMT) with a focus on connected texts.
Abstract: This thesis addresses the technical and linguistic aspects of discourse-level processing in phrase-based statistical machine translation (SMT). Connected texts can have complex text-level linguisti ...
104 citations
27 Jul 2011
TL;DR: This paper presents three kinds of caches to store relevant document-level information: a dynamic cache, which stores bilingual phrase pairs from the best translation hypotheses of previous sentences in the test document; a static cache, which stores relevant bilingual phrase pairs extracted from similar bilingual document pairs in the training parallel corpus; and a topic cache, which stores target-side topic words related to the source side of the test document.
Abstract: Statistical machine translation systems are usually trained on a large number of bilingual sentence pairs and translate one sentence at a time, ignoring document-level information. In this paper, we propose a cache-based approach to document-level translation. Since caches mainly depend on relevant data to supervise subsequent decisions, it is critical to fill the caches with highly relevant data of a reasonable size. In this paper, we present three kinds of caches to store relevant document-level information: 1) a dynamic cache, which stores bilingual phrase pairs from the best translation hypotheses of previous sentences in the test document; 2) a static cache, which stores relevant bilingual phrase pairs extracted from similar bilingual document pairs (i.e., source documents similar to the test document and their corresponding target documents) in the training parallel corpus; 3) a topic cache, which stores target-side topic words related to the source side of the test document. In particular, three new features are designed to explore the various kinds of document-level information in the above three caches. Evaluation shows the effectiveness of our cache-based approach to document-level translation, with a performance improvement of 0.81 in BLEU score over Moses. In addition, detailed analysis and discussion are presented to give new insights into document-level translation.
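The three-cache design described in the abstract can be sketched as follows. This is a minimal illustration of the idea, not the authors' implementation: the class and method names are hypothetical, and the per-cache binary features are a simplified stand-in for the three new features the paper designs.

```python
from collections import deque

class DocumentLevelCaches:
    """Sketch of the dynamic, static, and topic caches described above
    (hypothetical names; simplified relative to the paper's system)."""

    def __init__(self, max_dynamic=1000):
        # dynamic cache: phrase pairs from the best hypotheses of
        # previous sentences in the same test document (bounded FIFO)
        self.dynamic = deque(maxlen=max_dynamic)
        # static cache: phrase pairs extracted from training document
        # pairs similar to the test document
        self.static = set()
        # topic cache: target-side topic words for the test document
        self.topic = set()

    def update_dynamic(self, phrase_pairs):
        """Add phrase pairs from the 1-best translation of a sentence."""
        self.dynamic.extend(phrase_pairs)

    def fill_static(self, phrase_pairs):
        self.static.update(phrase_pairs)

    def fill_topic(self, target_topic_words):
        self.topic.update(target_topic_words)

    def cache_features(self, src_phrase, tgt_phrase):
        """One simple binary feature per cache for a candidate pair."""
        pair = (src_phrase, tgt_phrase)
        return (
            1.0 if pair in self.dynamic else 0.0,
            1.0 if pair in self.static else 0.0,
            1.0 if any(w in self.topic for w in tgt_phrase.split()) else 0.0,
        )
```

In a decoder, these features would be added to the log-linear model alongside the standard phrase-table scores, so that hypotheses consistent with the document context are rewarded.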
104 citations
TL;DR: The results suggest that readers encode focused information more carefully, either upon first encountering it or during a second-pass reading of it, and that the enhanced memory representations for focused information found in previous studies may be due in part to differences in reading patterns.
Abstract: In two experiments, we explored how readers encode information that is linguistically focused. Subjects read sentences in which a word or phrase was focused by a syntactic manipulation (Experiment 1) or by a preceding context (Experiment 2) while their eye movements were monitored. Readers had longer reading times while reading a region of the sentence that was focused than when the same region was not focused. The results suggest that readers encode focused information more carefully, either upon first encountering it or during a second-pass reading of it. We conclude that the enhanced memory representations for focused information found in previous studies may be due in part to differences in reading patterns for focused information.
104 citations
26 Oct 2010
TL;DR: This paper provides a quantitative analysis of the language discrepancy issue, and explores the use of clickthrough data to bridge documents and queries, and demonstrates that standard statistical machine translation techniques can be adapted for building a better Web document retrieval system.
Abstract: Web search is challenging partly due to the fact that search queries and Web documents use different language styles and vocabularies. This paper provides a quantitative analysis of the language discrepancy issue, and explores the use of clickthrough data to bridge documents and queries. We assume that a query is parallel to the titles of documents clicked on for that query. Two translation models are trained and integrated into retrieval models: a word-based translation model that learns the translation probability between single words, and a phrase-based translation model that learns the translation probability between multi-term phrases. Experiments are carried out on a real-world data set. The results show that the retrieval systems that use the translation models significantly outperform the systems that do not. The paper also demonstrates that standard statistical machine translation techniques such as word alignment, bilingual phrase extraction, and phrase-based decoding can be adapted for building a better Web document retrieval system.
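The word-based model in the abstract scores a document title by how well its words "translate" into the query terms. A minimal sketch of this kind of scoring, in the style of IBM Model 1: the function name and the toy probability table are assumptions for illustration; in the paper's setting the translation table would be learned from query-title clickthrough pairs.

```python
import math

def translation_score(query, title, t_prob, smooth=1e-4):
    """Word-based translation log-score of a title for a query:
    each query term is generated as a mixture over title terms.
    t_prob maps (query_word, title_word) -> translation probability
    (toy values here; learned from clickthrough data in practice)."""
    title_words = title.lower().split()
    log_p = 0.0
    for q in query.lower().split():
        # average translation probability over title words,
        # with smoothing for unseen word pairs
        p = sum(t_prob.get((q, w), smooth) for w in title_words) / len(title_words)
        log_p += math.log(p)
    return log_p
```

Because the model can give "cheap" probability mass under "discount", a title that shares no words with the query can still score well, which is exactly the vocabulary-gap problem the paper targets.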
104 citations