
Showing papers on "Phrase published in 2004"


Journal ArticleDOI
16 Apr 2004-Science
TL;DR: Electroencephalogram data are presented that show the rapid parallel integration of both semantic and world knowledge during the interpretation of a sentence and indicate that the brain keeps a record of what makes a sentence hard to interpret.
Abstract: Although the sentences that we hear or read have meaning, this does not necessarily mean that they are also true. Relatively little is known about the critical brain structures for, and the relative time course of, establishing the meaning and truth of linguistic expressions. We present electroencephalogram data that show the rapid parallel integration of both semantic and world knowledge during the interpretation of a sentence. Data from functional magnetic resonance imaging revealed that the left inferior prefrontal cortex is involved in the integration of both meaning and world knowledge. Finally, oscillatory brain responses indicate that the brain keeps a record of what makes a sentence hard to interpret.

1,017 citations


Book ChapterDOI
28 Sep 2004
TL;DR: Pharaoh, a freely available decoder for phrase-based statistical machine translation models, is described; it implements an efficient dynamic programming search algorithm with lattice generation and XML markup for external components.
Abstract: We describe Pharaoh, a freely available decoder for phrase-based statistical machine translation models. The decoder is the implementation of an efficient dynamic programming search algorithm with lattice generation and XML markup for external components.

722 citations


Book
02 Aug 2004
TL;DR: A grammar of Turkish that covers the alphabet and writing conventions, phonology (the sound system), morphology (the structure of words), and syntax (the structure of sentences).
Abstract: Acknowledgements. Introduction. Abbreviations. List of Conventions Observed in this Book. The Turkish Alphabet and Writing Conventions. Part 1: Phonology: The Sound System 1. Phonological Units 2. Sound Changes in the Stem after Suffixation 3. Vowel Harmony 4. Word Stress 5. Intonation Part 2. Morphology: The Structure of Words 6. Principles of Suffixation 7. Word Classes and Derivational Suffixes 9. Reduplication 10. Noun Compounds Part 3. Syntax: The Structure of Sentences 12. Simple and Complex Sentences 13. The Verb Phrase 14. The Noun Phrase 15. Adjectival Constructions, Determiners and Numerals 16. Adverbial Constructions 17. The Postpositional Phrase 18. Pronouns and Reference 19. Questions 20. Negation 21. Tense, Aspect and Modality 22. Definiteness, Specificity and Generic Reference 23. Word Order 24. Noun Clauses 25. Relative Clauses 26. Adverbial Clauses 27. Conditional Sentences 28. Conjunctions, Co-ordination and Discourse Connection. Appendix 1. Reduplicated Stems. Appendix 2. Tense/Aspect/Modality Suffixes. Glossary. Bibliography. Index

485 citations


Journal ArticleDOI
TL;DR: Overall, the data indicate that readers anticipate and attend to the gender of both articles and nouns, and use gender in real time to maintain agreement and to build sentence meaning.
Abstract: Recent studies indicate that the human brain attends to and uses grammatical gender cues during sentence comprehension. Here, we examine the nature and time course of the effect of gender on word-by-word sentence reading. Event-related brain potentials were recorded to an article and noun, while native Spanish speakers read medium- to high-constraint Spanish sentences for comprehension. The noun either fit the sentence meaning or not, and matched the preceding article in gender or not; in addition, the preceding article was either expected or unexpected based on prior sentence context. Semantically anomalous nouns elicited an N400. Gender-disagreeing nouns elicited a posterior late positivity (P600), replicating previous findings for words. Gender agreement and semantic congruity interacted in both the N400 window—with a larger negativity frontally for double violations—and the P600 window—with a larger positivity for semantic anomalies, relative to the prestimulus baseline. Finally, unexpected articles elicited an enhanced positivity (500–700 msec post onset) relative to expected articles. Overall, our data indicate that readers anticipate and attend to the gender of both articles and nouns, and use gender in real time to maintain agreement and to build sentence meaning.

427 citations


Proceedings ArticleDOI
14 Apr 2004
TL;DR: The short Computer Architecture News note that coined the phrase "Memory Wall" is reviewed, including the motivation behind the note, the context in which it was written, and the controversy it sparked.
Abstract: This paper looks at the evolution of the "Memory Wall" problem over the past decade. It begins by reviewing the short Computer Architecture News note that coined the phrase, including the motivation behind the note, the context in which it was written, and the controversy it sparked. What has changed over the years? Are we hitting the Memory Wall? And if so, for what types of applications?

366 citations


Book ChapterDOI
26 Feb 2004
TL;DR: The author distinguishes utterance-type meaning, the general range of possible meanings that a word, phrase, or structure has, from utterance-token meaning, a distinction that is important from a linguistic point of view.
Abstract: At the outset, I want to make a distinction that is important from a linguistic point of view: a distinction between utterance-type meaning and utterance-token meaning (Levinson, 2000). Any word, phrase, or structure has a general range of possible meanings, what we might call its meaning range. This is its utterance-type meaning. For example, the word “cat” has to do, broadly, with the felines, and the (syntactic) structure “subject of a sentence” has to do, broadly, with naming a “topic” in the sense of “what is being talked about.”

364 citations


Journal ArticleDOI
TL;DR: Readers' eye movements were monitored as they read sentences describing events in which an individual performed an action with an implement, indicating that when a word is anomalous, it has an immediate effect on eye movements, but that the effect of implausibility is not as immediate.
Abstract: Readers’ eye movements were monitored as they read sentences describing events in which an individual performed an action with an implement. The noun phrase arguments of the verbs in the sentences were such that when thematic assignment occurred at the critical target word, the sentence was plausible (likely theme), implausible (unlikely theme), or anomalous (an inappropriate theme). Whereas the target word in the anomalous condition provided evidence of immediate disruption, the effect of the target word in the implausible condition was considerably delayed. The results thus indicate that when a word is anomalous, it has an immediate effect on eye movements, but that the effect of implausibility is not as immediate.

335 citations


Proceedings ArticleDOI
Fei Xia, Michael C. McCord
23 Aug 2004
TL;DR: This work proposes to use automatically learned rewrite patterns to preprocess the source sentences so that they have a word order similar to that of the target language.
Abstract: Current clump-based statistical MT systems have two limitations with respect to word ordering: First, they lack a mechanism for expressing and using generalization that accounts for reorderings of linguistic phrases. Second, the ordering of target words in such systems does not respect linguistic phrase boundaries. To address these limitations, we propose to use automatically learned rewrite patterns to preprocess the source sentences so that they have a word order similar to that of the target language. Our system is a hybrid one. The basic model is statistical, but we use broad-coverage rule-based parsers in two ways - during training for learning rewrite patterns, and at runtime for reordering the source sentences. Our experiments show 10% relative improvement in Bleu measure.

306 citations


Journal ArticleDOI
TL;DR: The ERP experiment suggests that lexico-semantic fit can be more important for word processing than the meaning of the sentence as determined by the syntactic structure, at least initially.

294 citations


Proceedings Article
01 Jan 2004
TL;DR: This paper proposes an indexing procedure for spoken utterance retrieval that works on lattices rather than just single-best text, and demonstrates that this procedure can improve F scores by over five points compared to single-best retrieval on tasks with poor WER and low redundancy.
Abstract: Recent work on spoken document retrieval has suggested that it is adequate to take the single-best output of ASR, and perform text retrieval on this output. This is reasonable enough for the task of retrieving broadcast news stories, where word error rates are relatively low, and the stories are long enough to contain much redundancy. But it is patently not reasonable if one’s task is to retrieve a short snippet of speech in a domain where WER’s can be as high as 50%; such would be the situation with teleconference speech, where one’s task is to find if and when a participant uttered a certain phrase. In this paper we propose an indexing procedure for spoken utterance retrieval that works on lattices rather than just single-best text. We demonstrate that this procedure can improve F scores by over five points compared to single-best retrieval on tasks with poor WER and low redundancy. The representation is flexible so that we can represent both word lattices, as well as phone lattices, the latter being important for improving performance when searching for phrases containing OOV words.

275 citations


Proceedings ArticleDOI
25 Jul 2004
TL;DR: This work utilizes WordNet to disambiguate word senses of query terms and shows that its approach yields between 23% and 31% improvements over the best-known results on the TREC 9, 10 and 12 collections for short (title only) queries, without using Web data.
Abstract: Noun phrases in queries are identified and classified into four types: proper names, dictionary phrases, simple phrases and complex phrases. A document has a phrase if all content words in the phrase are within a window of a certain size. The window sizes for different types of phrases are different and are determined using a decision tree. Phrases are more important than individual terms. Consequently, documents in response to a query are ranked with matching phrases given a higher priority. We utilize WordNet to disambiguate word senses of query terms. Whenever the sense of a query term is determined, its synonyms, hyponyms, words from its definition and its compound words are considered for possible additions to the query. Experimental results show that our approach yields between 23% and 31% improvements over the best-known results on the TREC 9, 10 and 12 collections for short (title only) queries, without using Web data.
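The window criterion in this abstract ("a document has a phrase if all content words in the phrase are within a window of a certain size") can be illustrated with a minimal Python sketch. The function name `has_phrase`, the whitespace tokenization, and the fixed window sizes are assumptions made for illustration; the abstract states that window sizes are chosen per phrase type by a decision tree, which is not modeled here.

```python
def has_phrase(doc_tokens, phrase_words, window):
    """Return True if every word of the phrase occurs inside some span
    of `window` consecutive document tokens (hypothetical helper)."""
    needed = set(phrase_words)
    for start in range(len(doc_tokens)):
        # Slide a fixed-size window over the document, one token at a time.
        span = doc_tokens[start:start + window]
        if needed.issubset(span):
            return True
    return False


doc = "the new york stock exchange opened".split()
print(has_phrase(doc, ["york", "new"], window=2))      # word order is ignored
print(has_phrase(doc, ["new", "exchange"], window=3))  # too far apart for this window
print(has_phrase(doc, ["new", "exchange"], window=4))
```

A wider window trades precision for recall, which is presumably why the abstract assigns different window sizes to different phrase types.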

Journal ArticleDOI
TL;DR: In this article, a series of self-paced reading time experiments was performed to assess how characteristics of noun phrases (NPs) contribute to the difference in processing difficulty between object- and subject-extracted relative clauses.

Journal ArticleDOI
TL;DR: This paper found that infants turn their heads for isolated bisyllabic words when presented with sentences that either contained the familiarized words or contained both their syllables separated by a phonological phrase boundary.

Journal ArticleDOI
TL;DR: The findings indicate that syntactic decisions are guided by the listener's situation-specific evaluation of how to achieve the behavioral goal of an utterance.
Abstract: In 2 experiments, eye movements were monitored as participants followed instructions containing temporary syntactic ambiguities (e.g., "Pour the egg in the bowl over the flour"). The authors varied the affordances of task-relevant objects with respect to the action required by the instruction (e.g., whether 1 or both eggs in the visual workspace were in liquid form, allowing them to be poured). The number of candidate objects that could afford the action was found to determine whether listeners initially misinterpreted the ambiguous phrase ("in the bowl") as specifying a location. The findings indicate that syntactic decisions are guided by the listener's situation-specific evaluation of how to achieve the behavioral goal of an utterance.

Proceedings Article
01 Jan 2004
TL;DR: A highly efficient monotone search algorithm with a complexity linear in the input sentence length is described, and translation results for three tasks (Verbmobil, Xerox and the Canadian Hansards) are presented.
Abstract: In statistical machine translation, the currently best performing systems are based in some way on phrases or word groups. We describe the baseline phrase-based translation system and various refinements. We describe a highly efficient monotone search algorithm with a complexity linear in the input sentence length. We present translation results for three tasks: Verbmobil, Xerox and the Canadian Hansards. For the Xerox task, it takes less than 7 seconds to translate the whole test set consisting of more than 10K words. The translation results for the Xerox and Canadian Hansards task are very promising. The system even outperforms the alignment template system.
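To see why a monotone search can be linear in sentence length, consider the following Python sketch. It is not the algorithm from the paper, whose details the abstract does not give: it assumes a toy phrase table with one translation and one log-probability per source phrase, uses no language model, and bounds phrase length by a hypothetical `max_len` parameter. Under those assumptions each source position is extended by at most `max_len` candidate phrases, so the work grows linearly with the number of source words.

```python
import math


def monotone_decode(source, phrase_table, max_len=3):
    """Translate left to right without reordering (toy sketch).

    best[j] holds the best log-score for translating source[:j];
    back[j] remembers the last phrase used to reach position j.
    """
    n = len(source)
    best = [-math.inf] * (n + 1)
    back = [None] * (n + 1)
    best[0] = 0.0
    for j in range(1, n + 1):
        # Only phrases of at most max_len words can end at position j.
        for i in range(max(0, j - max_len), j):
            entry = phrase_table.get(tuple(source[i:j]))
            if entry is None or best[i] == -math.inf:
                continue
            target, logp = entry
            if best[i] + logp > best[j]:
                best[j] = best[i] + logp
                back[j] = (i, target)
    if best[n] == -math.inf:
        return None  # no segmentation is covered by the phrase table
    pieces, j = [], n
    while j > 0:
        i, target = back[j]
        pieces.append(target)
        j = i
    return " ".join(reversed(pieces)), best[n]


# Hypothetical toy phrase table: source phrase -> (target phrase, log-probability).
TABLE = {
    ("das",): ("the", -0.5),
    ("haus",): ("house", -0.4),
    ("das", "haus"): ("the house", -0.6),
    ("ist",): ("is", -0.2),
    ("klein",): ("small", -0.3),
}
print(monotone_decode("das haus ist klein".split(), TABLE))
```

The two-word entry wins over the two single-word entries here because its log-probability is higher than their sum.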

Book
01 Jan 2004
TL;DR: The purpose of this edited volume is to study the structure of the inflectional field and the left peripheral field of clauses, often described as the systems of IP and CP.
Abstract: The purpose of this edited volume is to study the structure of the inflectional field and the left peripheral field of clauses, often described as the systems of IP (Inflection Phrase, a syntactic category used to describe clauses without complement clauses) and CP (Complementizer Phrase, a word or phrase marking a complement clause). With contributions by a select group of syntacticians, The Structure of CP and IP will be useful to scholars with an interest in Italian and Romance languages and of substantial value to all linguists interested in contemporary research in generative grammar.

Patent
09 Mar 2004
TL;DR: In this paper, a computer program product for controlling the computer's processor to perform actions responsive to a natural language input has: (1) vocabulary, phrase and concept databases of words, phrases and concepts, respectively, that can be recognized in the inputted communication, wherein each of these database elements is representable by a designated semantic symbol, and (2) searching the input text to identify the words in the communication that are contained within the vocabulary database, (3) means for expressing the communication in terms of the word semantic symbols that correspond to each of the words identified in the
Abstract: A computer program product for controlling the computer's processor to perform actions responsive to a natural language input has: (1) vocabulary, phrase and concept databases of words, phrases and concepts, respectively, that can be recognized in the inputted communication, wherein each of these database elements is representable by a designated semantic symbol, (2) means for searching the inputted communication to identify the words in the communication that are contained within the vocabulary database, (3) means for expressing the communication in terms of the word semantic symbols that correspond to each of the words identified in the inputted communication, (4) means for searching the communication when expressed in terms of its corresponding word semantic symbols so as to identify the phrases in the communication that are contained within the phrase database, (5) means for expressing the communication in terms of the phrase semantic symbols that correspond to each of the phrases identified in the communication, (6) means for searching the communication when expressed in terms of its corresponding phrase semantic symbols so as to identify the concepts in the communication that are contained within the concept database, and (7) means for expressing the communication in terms of the concept semantic symbols that correspond to each of the concepts identified in the inputted communication, wherein these concept semantic symbols are recognizable by the processor and can cause the processor to take action responsive to the inputted communication.

Proceedings ArticleDOI
Young-Suk Lee
02 May 2004
TL;DR: A novel morphological analysis technique which induces a morphological and syntactic symmetry between two languages with highly asymmetrical morphological structures to improve statistical machine translation qualities.
Abstract: We present a novel morphological analysis technique which induces a morphological and syntactic symmetry between two languages with highly asymmetrical morphological structures to improve statistical machine translation qualities. The technique pre-supposes fine-grained segmentation of a word in the morphologically rich language into the sequence of prefix(es)-stem-suffix(es) and part-of-speech tagging of the parallel corpus. The algorithm identifies morphemes to be merged or deleted in the morphologically rich language to induce the desired morphological and syntactic symmetry. The technique improves Arabic-to-English translation qualities significantly when applied to IBM Model 1 and Phrase Translation Models trained on the training corpus size ranging from 3,500 to 33 million sentence pairs.

Journal ArticleDOI
TL;DR: The authors investigated the processing of long-distance filler-gap dependencies in Japanese, a strongly head-final language, and found that Japanese readers associate a fronted wh-phrase with the most deeply embedded clause of a multi-clause sentence.

Journal ArticleDOI
TL;DR: The authors showed that partial parses which are syntactically compatible with only a proper subpart of the input are sometimes constructed, at least temporarily, by using a bottom-up dynamical model.

Journal ArticleDOI
TL;DR: It is argued that the word order possibilities of a language are partly determined by the parts-of-speech system of that language, and there is a balanced trade-off between the syntactic, morphological, and lexical structure of a language.
Abstract: This paper argues that the word order possibilities of a language are partly determined by the parts-of-speech system of that language. In languages in which lexical items are specialized for certain functionally defined syntactic slots (e.g. the modifier slot within a noun phrase), the identifiability of these slots is ensured by the nature of the lexical items (e.g. adjectives) themselves. As a result, word order possibilities are relatively unrestricted in these languages. In languages in which lexical items are not specialized for certain syntactic slots, in that these items combine the functions of two or more of the traditional word classes, other strategies have to be invoked to enhance identifiability. In these languages word order constraints are used to make syntactic slots identifiable on the basis of their position within the clause or phrase. Hence the word order possibilities are rather restricted in these languages. Counterexamples to the latter claim all involve cases in which identifiability is ensured by morphological rather than syntactic means. This shows that there is a balanced trade-off between the syntactic, morphological, and lexical structure of a language.

Patent
Endong Xun
15 Oct 2004
TL;DR: In this article, a computer-aided reading system for reading in a non-native language is presented, which allows the user to select a word, phrase, sentence, or other grouping of words in the nonnative text.
Abstract: A computer-aided reading system offers assistance to a user who is reading in a non-native language, as the user needs help, without requiring the user to divert attention away from the text. In one implementation, the reading system is implemented as a reading wizard for a browser program. The reading wizard is exposed via a graphical user interface (UI) that allows the user to select a word, phrase, sentence, or other grouping of words in the non-native text. The reading wizard automatically determines whether a selected word is part of a phrase and, in response to the user's selection of that single word, allows the user to choose whether to view a translation of the word alone or of a phrase that includes it. The multiple translations are presented in a scrollable pop-up window located near the selected text to minimize distraction of the user.

Journal ArticleDOI
01 Aug 2004
TL;DR: By framing problems for utterance generation and synthesis so that they can draw closely on a talented performance, the techniques support the rapid construction of animated characters with rich and appropriate expression.
Abstract: We describe a method for using a database of recorded speech and captured motion to create an animated conversational character. People's utterances are composed of short, clearly-delimited phrases; in each phrase, gesture and speech go together meaningfully and synchronize at a common point of maximum emphasis. We develop tools for collecting and managing performance data that exploit this structure. The tools help create scripts for performers, help annotate and segment performance data, and structure specific messages for characters to use within application contexts. Our animations then reproduce this structure. They recombine motion samples with new speech samples to recreate coherent phrases, and blend segments of speech and motion together phrase-by-phrase into extended utterances. By framing problems for utterance generation and synthesis so that they can draw closely on a talented performance, our techniques support the rapid construction of animated characters with rich and appropriate expression.

Journal ArticleDOI
TL;DR: Two experiments used a sentence–picture verification task in which statements about photographs of natural scenes were read in order to make a true/false decision about the validity of the sentence, and in which eye movements were recorded.
Abstract: When we see combinations of text and graphics, such as photographs and their captions in printed media, how do we compare the information in the two components? Two experiments used a sentence-picture verification task in which statements about photographs of natural scenes were read in order to make a true/false decision about the validity of the sentence, and in which eye movements were recorded. In Experiment 1 the sentence and the picture were presented concurrently, and objects and words could be inspected in any order. In Experiment 2 the two components were presented one after the other, either picture first or sentence first. Fixation durations on pictures were characteristically longer than those on sentences in both experiments, and fixations on sentences varied according to whether they were being encoded as abstract propositions or as coreferents of objects depicted in a previously inspected picture. The decision time data present a difficulty for existing models of sentence verification tasks, with an inconsistent pattern of differences between true and false trials.

Proceedings ArticleDOI
23 Aug 2004
TL;DR: This work investigates different reordering constraints for phrase-based statistical machine translation, namely the IBM constraints and the ITG constraints and presents efficient dynamic programming algorithms for both constraints.
Abstract: In statistical machine translation, the generation of a translation hypothesis is computationally expensive. If arbitrary reorderings are permitted, the search problem is NP-hard. On the other hand, if we restrict the possible reorderings in an appropriate way, we obtain a polynomial-time search algorithm. We investigate different reordering constraints for phrase-based statistical machine translation, namely the IBM constraints and the ITG constraints. We present efficient dynamic programming algorithms for both constraints. We evaluate the constraints with respect to translation quality on two Japanese-English tasks. We show that the reordering constraints improve translation quality compared to an unconstrained search that permits arbitrary phrase reorderings. The ITG constraints perform best on both tasks and yield statistically significant improvements compared to the unconstrained search.
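As an illustration of how a reordering constraint shrinks the search space, the sketch below checks a translation order against the IBM constraints as they are commonly described: at each step, the next source position to be translated must be among the first k positions not yet covered. The abstract does not define the constraints, so this formulation, the window parameter `k`, and the function names are assumptions. The code is a brute-force check over permutations, not the paper's dynamic programming algorithm.

```python
from itertools import permutations


def satisfies_ibm(order, k):
    """Check a translation order (a permutation of source positions)
    against the IBM-style constraint with window size k."""
    uncovered = sorted(order)
    for pos in order:
        # The next position must be one of the first k still-uncovered ones.
        if pos not in uncovered[:k]:
            return False
        uncovered.remove(pos)
    return True


def count_allowed(n, k):
    """Count how many of the n! orders survive the constraint (brute force)."""
    return sum(satisfies_ibm(p, k) for p in permutations(range(n)))


print(satisfies_ibm((1, 0, 2), k=2))  # swap of two neighbours
print(satisfies_ibm((2, 0, 1), k=2))  # jumps past two uncovered positions
print(count_allowed(4, 2), "of 24 orders allowed")
```

Even on this tiny example most permutations are excluded, which is the effect that makes a polynomial-time search possible.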

Patent
Anna Patterson
26 Jul 2004
TL;DR: In this article, a method of personalizing a search of a document collection to a user comprises monitoring documents accessed by a user, identifying first phrases present in one or more of the accessed documents, identifying corresponding first related phrases related to the corresponding identified first phrase, selecting search results comprising documents responsive to the query, identifying by operation of a processor configured to manipulate data within a computer system, one or more second phrases related to the query and that are present in a user model.
Abstract: A method of personalizing a search of a document collection to a user comprises monitoring documents accessed by a user, identifying first phrases present in one or more of the accessed documents, identifying one or more corresponding first related phrases related to the corresponding identified first phrase, receiving a query including one or more second phrases from the user, selecting search results comprising documents responsive to the query, identifying by operation of a processor configured to manipulate data within a computer system, one or more second phrases related to one or more second phrases of the query and that are present in a user model, weighting scores of corresponding search results according to the identified one or more second related phrases, ranking the search results according to their weighted scores to provide personalized search results and presenting them to the user.

Journal ArticleDOI
TL;DR: Five experiments and three meta-analyses ruled out alternative accounts based on plausibility, argumenthood, conceptual number, clause packaging, or hierarchical feature-passing, reinforcing the general finding that error rates increase with degree of semantic integration.

Patent
Hua-Jun Zeng, Benyu Zhang, Zheng Chen, Wei-Ying Ma, Li Li, Ying Li, Tarek Najm
15 Apr 2004
TL;DR: In this paper, a system and methods for related term suggestion are described, in which term clusters are generated as a function of calculated similarity of term vectors, each term vector having been generated from search results associated with a set of high frequency of occurrence (FOO) historical queries previously submitted to a search engine.
Abstract: Systems and methods for related term suggestion are described. In one aspect, term clusters are generated as a function of calculated similarity of term vectors, each term vector having been generated from search results associated with a set of high frequency of occurrence (FOO) historical queries previously submitted to a search engine. Responsive to receiving a term/phrase from an entity, the term/phrase is evaluated in view of terms/phrases in the term clusters to identify one or more related term suggestions.
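One way to picture "term clusters generated as a function of calculated similarity of term vectors" is the Python sketch below. The abstract does not name a similarity measure or a clustering method, so cosine similarity, the greedy single-pass grouping, the `threshold` value, and the toy vectors are all assumptions chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {feature: weight}."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_terms(vectors, threshold=0.5):
    """Greedy clustering: a term joins the first cluster whose seed term
    is similar enough, otherwise it starts a new cluster."""
    clusters = []
    for term, vec in vectors.items():
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(term)
                break
        else:
            clusters.append([term])
    return clusters


# Hypothetical term vectors, e.g. weights of words seen in each query's search results.
VECTORS = {
    "laptop": {"computer": 1.0, "notebook": 1.0},
    "notebook pc": {"computer": 1.0, "notebook": 0.8},
    "pizza": {"food": 1.0},
}
print(cluster_terms(VECTORS))
```

Given a new term, a suggestion step could then return the other members of the cluster it falls into.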

Journal ArticleDOI
TL;DR: This article argues that English noun-plus-noun constructions (‘NNs’) originate both in the lexicon and in the syntax, and distinguishes between complement–head and attribute–head NNs, as well as between fore-stressed and end-stressed NNs.
Abstract: This article argues that English noun-plus-noun constructions (‘NNs’) originate both in the lexicon and in the syntax. It distinguishes between complement–head and attribute–head NNs, as well as between fore-stressed and end-stressed NNs. It argues that complement-head NNs are fore-stressed and originate in the lexicon while attribute–head NNs typically have end-stress and syntactic provenance. The latter are, however, potentially subject to diachronic lexicalization, which may moreover involve the adoption of fore-stress. Hence, lexical NNs may be fore-stressed or end-stressed while phrasal NNs must be end-stressed. Although further potential sources of irregularity are identified, it is demonstrated that the model's predictive power is fairly robust and that, where it fails to predict firm stress patterns, it predicts their variability.

Patent
22 Jun 2004
TL;DR: A search engine system uses information about historical query submissions to suggest previously submitted, related search phrases to users, preferably based on a most recent set of query submission data (e.g., the last two weeks of submissions).
Abstract: A search engine system uses information about historical query submissions to a search engine to suggest previously-submitted, related search phrases to users. The related search phrases are preferably suggested based on a most recent set of query submission data (e.g., the last two weeks of submissions), and thus strongly reflect the current searching patterns or interests of users. In one embodiment, the related search phrases are scored and selected for display based at least in-part on (a) a frequency with which each search phrase has been submitted, and/or (b) an evaluation of the “usefulness” of each search phrase, as reflected by actions performed by prior users while viewing corresponding search results.