
Showing papers on "Phrase published in 2002"


Posted Content
TL;DR: A simple unsupervised learning algorithm classifies a review as recommended (thumbs up) if the average semantic orientation of its phrases is positive and as not recommended (thumbs down) otherwise.
Abstract: This paper presents a simple unsupervised learning algorithm for classifying reviews as recommended (thumbs up) or not recommended (thumbs down). The classification of a review is predicted by the average semantic orientation of the phrases in the review that contain adjectives or adverbs. A phrase has a positive semantic orientation when it has good associations (e.g., "subtle nuances") and a negative semantic orientation when it has bad associations (e.g., "very cavalier"). In this paper, the semantic orientation of a phrase is calculated as the mutual information between the given phrase and the word "excellent" minus the mutual information between the given phrase and the word "poor". A review is classified as recommended if the average semantic orientation of its phrases is positive. The algorithm achieves an average accuracy of 74% when evaluated on 410 reviews from Epinions, sampled from four different domains (reviews of automobiles, banks, movies, and travel destinations). The accuracy ranges from 84% for automobile reviews to 66% for movie reviews.

4,526 citations
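For the thumbs-up/thumbs-down review classifier listed above, the scoring rule can be sketched in Python. This is a minimal sketch under stated assumptions, not the author's implementation. It assumes that, for each extracted phrase, the counts of its co-occurrence with "excellent" and with "poor" (plus the overall counts of those two words) are already available from some corpus or search engine. Since PMI(x, y) = log2(hits(x NEAR y) * N / (hits(x) * hits(y))), the phrase's own count hits(x) and the corpus size N cancel when one PMI is subtracted from the other, so four counts suffice. The `smoothing` constant and all counts are hypothetical.

```python
import math


def semantic_orientation(near_excellent: float, near_poor: float,
                         hits_excellent: float, hits_poor: float,
                         smoothing: float = 0.01) -> float:
    """SO(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor").

    The phrase count and corpus size cancel in the difference, leaving a
    ratio of four hit counts. `smoothing` is a hypothetical constant that
    avoids division by zero when a phrase never co-occurs with "poor".
    """
    return math.log2(((near_excellent + smoothing) * hits_poor) /
                     ((near_poor + smoothing) * hits_excellent))


def classify_review(orientations: list[float]) -> str:
    """Thumbs up if the average orientation of the review's phrases is positive."""
    if not orientations:
        raise ValueError("review has no extracted phrases")
    average = sum(orientations) / len(orientations)
    return "recommended" if average > 0 else "not recommended"


# Hypothetical counts for two phrases extracted from one review.
so_1 = semantic_orientation(near_excellent=120, near_poor=15,
                            hits_excellent=2_000, hits_poor=1_500)
so_2 = semantic_orientation(near_excellent=3, near_poor=9,
                            hits_excellent=2_000, hits_poor=1_500)
print(classify_review([so_1, so_2]))  # prints "recommended"
```

Selecting the adjective/adverb phrases is not shown; the sketch starts from per-phrase counts.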


Proceedings Article
01 Jan 2002
TL;DR: This article proposes an unsupervised learning algorithm for classifying reviews as recommended (thumbs up) or not recommended (thumbs down) based on the average semantic orientation of the phrases in the review that contain adjectives or adverbs.
Abstract: This paper presents a simple unsupervised learning algorithm for classifying reviews as recommended (thumbs up) or not recommended (thumbs down). The classification of a review is predicted by the average semantic orientation of the phrases in the review that contain adjectives or adverbs. A phrase has a positive semantic orientation when it has good associations (e.g., “subtle nuances”) and a negative semantic orientation when it has bad associations (e.g., “very cavalier”). In this paper, the semantic orientation of a phrase is calculated as the mutual information between the given phrase and the word “excellent” minus the mutual information between the given phrase and the word “poor”. A review is classified as recommended if the average semantic orientation of its phrases is positive. The algorithm achieves an average accuracy of 74% when evaluated on 410 reviews from Epinions, sampled from four different domains (reviews of automobiles, banks, movies, and travel destinations). The accuracy ranges from 84% for automobile reviews to 66% for movie reviews.

3,814 citations


Proceedings Article
06 Jul 2002
TL;DR: In this article, an unsupervised learning algorithm for classifying reviews as recommended (thumbs up) or not recommended (thumbs down) is presented. The classification of a review is predicted by the average semantic orientation of the phrases in the review that contain adjectives or adverbs.
Abstract: This paper presents a simple unsupervised learning algorithm for classifying reviews as recommended (thumbs up) or not recommended (thumbs down). The classification of a review is predicted by the average semantic orientation of the phrases in the review that contain adjectives or adverbs. A phrase has a positive semantic orientation when it has good associations (e.g., "subtle nuances") and a negative semantic orientation when it has bad associations (e.g., "very cavalier"). In this paper, the semantic orientation of a phrase is calculated as the mutual information between the given phrase and the word "excellent" minus the mutual information between the given phrase and the word "poor". A review is classified as recommended if the average semantic orientation of its phrases is positive. The algorithm achieves an average accuracy of 74% when evaluated on 410 reviews from Epinions, sampled from four different domains (reviews of automobiles, banks, movies, and travel destinations). The accuracy ranges from 84% for automobile reviews to 66% for movie reviews.

1,904 citations


Journal Article
TL;DR: This review argues that sentence processing is supported by a temporo-frontal network; within this network, temporal regions subserve aspects of identification and frontal regions the building of syntactic and semantic relations.

1,760 citations


Journal Article
TL;DR: A system for identifying the semantic relationships, or semantic roles, filled by constituents of a sentence within a semantic frame, based on statistical classifiers trained on roughly 50,000 sentences that were hand-annotated with semantic roles by the FrameNet semantic labeling project.
Abstract: We present a system for identifying the semantic relationships, or semantic roles, filled by constituents of a sentence within a semantic frame. Given an input sentence and a target word and frame, the system labels constituents with either abstract semantic roles, such as AGENT or PATIENT, or more domain-specific semantic roles, such as SPEAKER, MESSAGE, and TOPIC. The system is based on statistical classifiers trained on roughly 50,000 sentences that were hand-annotated with semantic roles by the FrameNet semantic labeling project. We then parsed each training sentence into a syntactic tree and extracted various lexical and syntactic features, including the phrase type of each constituent, its grammatical function, and its position in the sentence. These features were combined with knowledge of the predicate verb, noun, or adjective, as well as information such as the prior probabilities of various combinations of semantic roles. We used various lexical clustering algorithms to generalize across possible fillers of roles. Test sentences were parsed, were annotated with these features, and were then passed through the classifiers. Our system achieves 82% accuracy in identifying the semantic role of presegmented constituents. At the more difficult task of simultaneously segmenting constituents and identifying their semantic role, the system achieved 65% precision and 61% recall. Our study also allowed us to compare the usefulness of different features and feature combination methods in the semantic role labeling task. We also explore the integration of role labeling with statistical syntactic parsing and attempt to generalize to predicates unseen in the training data.

1,666 citations
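The FrameNet role labeler above combines constituent features (phrase type, grammatical function, position) with knowledge of the predicate. A heavily simplified Python sketch of that idea follows; it is an illustration, not the authors' system. It assumes only three features (predicate, phrase type, and position before or after the predicate), estimates role probabilities by relative frequency, and, as a hypothetical design choice, backs off to statistics that ignore the predicate when a combination was never seen in training. The training tuples are invented.

```python
from collections import Counter, defaultdict


class RoleClassifier:
    """Toy semantic-role classifier: relative frequencies with one backoff level."""

    def __init__(self):
        # (predicate, phrase_type, position) -> counts of roles seen in training
        self.full = defaultdict(Counter)
        # (phrase_type, position) -> counts of roles, ignoring the predicate
        self.backoff = defaultdict(Counter)

    def train(self, examples):
        for predicate, phrase_type, position, role in examples:
            self.full[(predicate, phrase_type, position)][role] += 1
            self.backoff[(phrase_type, position)][role] += 1

    def predict(self, predicate, phrase_type, position):
        """Most frequent role for the most specific feature combination seen."""
        # .get() on a defaultdict does not create empty entries.
        counts = self.full.get((predicate, phrase_type, position))
        if not counts:
            counts = self.backoff.get((phrase_type, position))
        if not counts:
            return None  # no evidence at either level
        return counts.most_common(1)[0][0]


# Hypothetical hand-labelled constituents: (predicate, phrase type, position, role).
training = [
    ("tell", "NP", "before", "SPEAKER"),
    ("tell", "PP", "after", "TOPIC"),
    ("say", "NP", "before", "SPEAKER"),
    ("say", "SBAR", "after", "MESSAGE"),
]
clf = RoleClassifier()
clf.train(training)
print(clf.predict("tell", "SBAR", "after"))  # unseen for "tell": backs off to MESSAGE
```

The backoff step is a crude stand-in for the feature combination methods the abstract mentions; it lets the toy model label constituents of predicates with sparse or missing training data.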


Proceedings Article
06 Jul 2002
TL;DR: A joint probability model for statistical machine translation is presented that automatically learns word and phrase equivalents from bilingual corpora; translations produced with its parameters are more accurate than those produced using IBM Model 4.
Abstract: We present a joint probability model for statistical machine translation, which automatically learns word and phrase equivalents from bilingual corpora. Translations produced with parameters estimated using the joint model are more accurate than translations produced using IBM Model 4.

698 citations


Book Chapter
TL;DR: A translation model that is based on bilingual phrases to explicitly model the local context is presented and it is shown that this model performs better than the single-word based model.
Abstract: This paper is based on the work carried out in the framework of the VERBMOBIL project, which is a limited-domain speech translation task (German-English). In the final evaluation, the statistical approach was found to perform best among five competing approaches. In this paper, we will further investigate the statistical translation models used. A shortcoming of the single-word based model is that it does not take contextual information into account for the translation decisions. We will present a translation model that is based on bilingual phrases to explicitly model the local context. We will show that this model performs better than the single-word based model. We will compare monotone and non-monotone search for this model and we will investigate the benefit of using the sum criterion instead of the maximum approximation.

408 citations
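A phrase-based model like the one in the VERBMOBIL entry above needs bilingual phrase pairs, which are commonly read off a word-aligned sentence pair by keeping every source/target span pair that is consistent with the alignment. The Python sketch below illustrates that generic consistency criterion; it is not necessarily the paper's exact procedure, and it simplifies by capping span length at a hypothetical `max_len` and by not extending phrases over unaligned target words.

```python
def extract_phrase_pairs(src_len, alignment, max_len=3):
    """Span pairs ((i1, i2), (j1, j2)) consistent with a word alignment.

    `alignment` is a set of (i, j) links between source word i and target
    word j (0-based, inclusive spans). A pair is kept when it contains at
    least one link and no link joins a word inside it to a word outside it.
    """
    pairs = []
    for i1 in range(src_len):
        for i2 in range(i1, min(src_len, i1 + max_len)):
            linked = [j for (i, j) in alignment if i1 <= i <= i2]
            if not linked:
                continue  # source span is entirely unaligned
            j1, j2 = min(linked), max(linked)
            if j2 - j1 + 1 > max_len:
                continue
            # Reject if any target word in [j1, j2] links outside [i1, i2].
            if any(j1 <= j <= j2 and not (i1 <= i <= i2) for (i, j) in alignment):
                continue
            pairs.append(((i1, i2), (j1, j2)))
    return pairs


src = "das Haus ist klein".split()
tgt = "the house is small".split()
links = {(0, 0), (1, 1), (2, 2), (3, 3)}
for (i1, i2), (j1, j2) in extract_phrase_pairs(len(src), links):
    print(" ".join(src[i1:i2 + 1]), "->", " ".join(tgt[j1:j2 + 1]))
```

With a crossing alignment such as {(0, 0), (1, 2), (2, 1)}, the span pair covering source words 0-1 is rejected because target word 1 is linked to source word 2, which lies outside the span; the pairs covering source words 1-2 and 0-2 survive.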


Journal Article
TL;DR: The results of this experiment support a discourse-processing-based distance metric for computing locality and provide evidence against a pure similarity-based account of structural complexity such as that proposed by Bever.

360 citations


Journal Article
TL;DR: Eye movement patterns clearly established that the initial interpretation of the ambiguous phrase was the one consistent with the context, a result that fits a broad theoretical framework in which real-time language comprehension immediately takes into account a rich array of relevant nonlinguistic context.

318 citations


Book
01 Jan 2002
TL;DR: The book describes the topic-predicate articulation of the sentence and covers the minimal predicate, focusing, quantification, negation, noun phrases, the postpositional phrase, non-finite and semi-finite verb phrases, and the subordinate clause.
Abstract:
1. Introduction
2. The topic-predicate articulation of the sentence
3. The minimal predicate
4. Focusing
5. Quantification
6. Negation
7. The noun phrases
8. The postpositional phrase
9. Non-finite and semi-finite verb phrases
10. The subordinate clause

315 citations


Journal Article
26 Jan 2002-Probus
TL;DR: The authors provide a detailed account of the various realizations of the accentual phrase in their phonological model of French intonation and introduce a slight revision in tone-syllable association.
Abstract: In this paper we provide a detailed account of the various realizations of the accentual phrase in our phonological model of French intonation (Jun & Fougeron 1995, 2000), and introduce a slight revision in tone-syllable association. In addition to the default and unmarked phrases, we examine the intonational contour of long polymorphemic words and utterances containing a sequence of several clitics. We discuss the status of additional H tones found in the marked phrases and the constraints on the distribution of these H tones.

Journal Article
TL;DR: In this study, participants processed case-unambiguous German subject and object WH-questions with either a long or short distance between the WH-filler and its gap; a sustained left anterior negativity was observed for object questions with a long filler-gap distance but not for those with a short one.

Journal Article
TL;DR: Agreement errors were more frequent following an intermediate modifier than an immediately preverbal modifier, suggesting that attraction is determined by the syntactic distance between the interfering noun and the head noun at a stage of grammatical encoding during which syntactic units are organised into a hierarchical structure.
Abstract: We report two parallel experiments conducted in French and in English in which we induced subject-verb agreement errors to explore the role of syntactic structure during sentence production. Previous studies have shown that attraction errors (i.e., a tendency of the verb to agree with an immediately preceding noun instead of with the subject of the sentence) occur when a preverbal local noun disagrees in number with the subject head noun. The attraction effect was accounted for either by the proximity of the local noun to the verb in the linearised sentence (linear distance hypothesis) or by the processing simultaneity of the head and local nouns situated in the same clause (clause packaging hypothesis). In the current experiments, speakers were asked to complete complex sentential preambles. Contrary to the predictions of these two hypotheses, we found that agreement errors were more frequent following an intermediate modifier (e.g., *The threat-S to the presidents-P of the company-S ARE serious) than an...

Journal Article
TL;DR: This paper assesses the extent to which information processing is encapsulated between different processing stages and proposes an alternative framework that does not assume strict encapsulation but maintains multiple levels of integration for production.
Abstract: A discussion of modularity in language production processes, with special emphasis on processes for retrieving words and building syntactic structures for a to-be-uttered sentence, is presented. The authors' 1st goal was to assess the extent to which information processing is encapsulated between different processing stages. In particular, they assessed whether the input from one processing stage to the next is minimal and whether the flow of information in the system is strictly unidirectional. On the basis of the reviewed evidence, they conclude that both assumptions have to be revised. Their 2nd goal was to propose an alternative framework that does not assume strict encapsulation but that maintains multiple levels of integration for production.

Proceedings Article
Dragomir R. Radev, Weiguo Fan, Hong Qi, Harris Wu, Amardeep Grewal
07 May 2002
TL;DR: NSIR, an architecture that augments existing search engines so that they support natural language question answering, is developed, and probabilistic approaches to the last three of its five stages (passage extraction, phrase extraction, and answer ranking) are described.
Abstract: Web-based search engines such as Google and NorthernLight return documents that are relevant to a user query, not answers to user questions. We have developed an architecture that augments existing search engines so that they support natural language question answering. The process entails five steps: query modulation, document retrieval, passage extraction, phrase extraction, and answer ranking. In this paper we describe some probabilistic approaches to the last three of these stages. We show how our techniques apply to a number of existing search engines and we also present results contrasting three different methods for question answering. Our algorithm, probabilistic phrase reranking (PPR) using proximity and question type features achieves a total reciprocal document rank of .20 on the TREC 8 corpus. Our techniques have been implemented as a Web-accessible system, called NSIR.
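The NSIR entry reports its result as a total reciprocal document rank (TRDR) of .20. A minimal Python sketch, assuming TRDR sums the reciprocal ranks of all correct answers returned for a question (whereas mean reciprocal rank counts only the first) and is then averaged over questions; both the exact definition and the example ranks are assumptions here. With correct answers at ranks 2, 8 and 10, TRDR is 1/2 + 1/8 + 1/10 = 0.725.

```python
def trdr(correct_ranks: list[int]) -> float:
    """Sum of 1/rank over every correct answer in a ranked list (ranks start at 1)."""
    if any(rank < 1 for rank in correct_ranks):
        raise ValueError("ranks are 1-based")
    return sum(1.0 / rank for rank in correct_ranks)


def mean_trdr(per_question: list[list[int]]) -> float:
    """Average TRDR over questions; a question with no correct answer scores 0."""
    return sum(trdr(ranks) for ranks in per_question) / len(per_question)


# Hypothetical ranks: three correct answers for one question, none for another.
print(trdr([2, 8, 10]))
print(mean_trdr([[2, 8, 10], []]))
```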

Journal Article
TL;DR: A large-scale distributed neural network covering the left mid-inferior frontal and mid-superior temporal cortices was found to be responsible for the processing of Chinese phrases, and the overall pattern of results indicates that syntactic processing is less independent in reading Chinese.
Abstract: A functional magnetic resonance imaging (fMRI) study was conducted to map syntactic and semantic processes onto the brain. Chinese-English bilingual subjects performed two experimental tasks: a syntactic plausibility judgment task in which they decided whether a viewed verb phrase was syntactically legal, and a semantic plausibility judgment task in which they decided whether a viewed phrase was semantically acceptable. A font size judgment task was used as baseline. It is found that a large-scale distributed neural network covering the left mid-inferior frontal and mid-superior temporal cortices was responsible for the processing of Chinese phrases. The right homologue areas of these left cortical sites were also active, although the brain activity was obviously left-lateralized. Unlike previous research with monolingual English speakers that showed that distinct brain regions mediate syntactic and semantic processing of English, the cortical sites contributing to syntactic analysis of Chinese phrases coincided with the cortical sites relevant to semantic analysis. Stronger brain activity, however, was seen in the left middle frontal cortex for syntactic processing (relative to semantic processing), whereas for semantic processing stronger cortical activations were shown in the left inferior prefrontal cortex and the left mid-superior temporal gyri. The overall pattern of results indicates that syntactic processing is less independent in reading Chinese. This is attributable to the linguistic nature of the Chinese language that semantics and syntax are not always clearly demarcated. Equally interesting, we discovered that when our bilingual subjects performed syntactic and semantic acceptability judgments of English phrases, they applied the cerebral systems underlying Chinese reading to the processing of English.

Patent
22 Feb 2002
TL;DR: A pop-up edictionary, as described in this patent, is a digital pop-up space (or spaces) that appears when a cursor is placed over a difficult or hard-to-understand word or phrase on a computer screen.
Abstract: A digital pop-up space or spaces that appears when a cursor is placed over a difficult or hard-to-understand word or phrase on a computer screen (34). The pop-up space contains dictionary elements and other relevant elements that help the reader understand the difficult word (42). The dictionary elements include but are not limited to definitions, synonyms, antonyms, quotations, and etymologies. The pop-up edictionary can contain images and moving images of all kinds (54). These elements represent options that may be used and organized to best assist the readers of a text. The sources of the pop-up edictionary elements can either be from published works or from extemporaneous origins or mixed together in combination. There can be pop-up edictionaries within pop-up edictionaries. If there are multiple definitions presented, the intended meaning will be distinctly indicated (42). Finally, the language or dialect in a pop-up edictionary can be different from the language in the main text.

01 Jan 2002
TL;DR: An overview of work carried out on the intonation of Standard German is provided, both in auditory phonetic studies and in the instrumentally-based phonological accounts within the autosegmental-metrical framework, and a surface-oriented annotation framework, GToBI, is proposed.
Abstract: In this paper we provide an overview of work carried out on the intonation of Standard German, both in auditory phonetic studies and in the instrumentally-based phonological accounts within the autosegmental-metrical framework. We examine how far the different accounts shed light on controversial issues such as leading tones, levels of phrasing, and phrase accents, and propose a surface-oriented annotation framework, GToBI, which aims to capture all empirically observed distinctive intonation patterns. For illustration purposes, the contours which are reported to occur most commonly are given in schematic form, along with their GToBI transcription and examples of their usage.

Journal Article
TL;DR: A web-based replication of Pickering and Branigan (1998), Experiment 1, used a typed sentence completion paradigm that recorded not only the responses made but also the response onset latency for each sentence completion; the findings strengthen the support for an architectural account of syntactic priming as envisaged by Pickering and Branigan.
Abstract: To date, syntactic priming in sentence production has been investigated categorically, in terms of the probabilities of reusing particular syntactic structures. In this paper, we report a web-based replication of Pickering and Branigan (1998), Experiment 1, using a typed sentence completion paradigm that made it possible to record not only the responses made but also the response onset latency for each sentence completion. In conditions where priming occurred (as determined categorically), responses took less time when target completions were of the same type as preceding prime completions than when they differed. As well as validating Internet-based research by direct comparison with laboratory-based work, our findings strengthen the support for an architectural account of syntactic priming as envisaged by Pickering and Branigan.

Journal Article
TL;DR: This article found that the syntactic form of a prime sentence affected the form of participants' target completions and, by comparing priming conditions with a baseline condition containing an intransitive verb, that priming is a two-way process.

Journal Article
TL;DR: The authors presented memory-based learning approaches to shallow parsing and applied these to five tasks: base noun phrase identification, arbitrary base phrase recognition, clause detection, noun phrase parsing and full parsing.
Abstract: We present memory-based learning approaches to shallow parsing and apply these to five tasks: base noun phrase identification, arbitrary base phrase recognition, clause detection, noun phrase parsing and full parsing. We use feature selection techniques and system combination methods for improving the performance of the memory-based learner. Our approach is evaluated on standard data sets and the results are compared with that of other systems. This reveals that our approach works well for base phrase identification while its application towards recognizing embedded structures leaves some room for improvement.
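Memory-based learning, as in the shallow parsing entry above, stores training instances verbatim and labels a new instance after its nearest stored neighbours. The Python sketch below casts base noun phrase identification as IOB tagging over a window of part-of-speech tags and classifies with an unweighted overlap distance and k = 1. The window size, the metric, and the training sentence are assumptions for illustration; the feature selection and system combination methods the abstract mentions are not shown.

```python
from collections import Counter


def window_features(tags, i, size=1):
    """POS tags in a window of `size` on each side of position i, padded with '_'."""
    padded = ["_"] * size + list(tags) + ["_"] * size
    return tuple(padded[i : i + 2 * size + 1])


def overlap_distance(a, b):
    """Overlap metric: number of feature positions on which two instances differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_label(memory, instance, k=1):
    """Majority label among the k stored instances closest to `instance`."""
    nearest = sorted(memory, key=lambda item: overlap_distance(item[0], instance))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical training sentence: POS tags with IOB base-NP labels.
train_tags = ["DT", "JJ", "NN", "VBD", "DT", "NN"]
train_iob = ["B-NP", "I-NP", "I-NP", "O", "B-NP", "I-NP"]
memory = [(window_features(train_tags, i), label) for i, label in enumerate(train_iob)]

test_tags = ["DT", "NN", "VBD"]
print([knn_label(memory, window_features(test_tags, i)) for i in range(len(test_tags))])
```

With this toy memory the test tags DT NN VBD are labelled B-NP, I-NP, O. Distance ties are resolved by the order of the stored instances because `sorted` is stable, a simplification a real memory-based learner would handle more carefully.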

Journal Article
TL;DR: It is argued that structural case is the manifestation on the noun phrase of features which are semantically interpretable on verbal projections: Icelandic case does not encode features of noun phrase interpretation, but it is not uninterpretable either; instead, case is properly seen as reflecting (interpretable) tense, aspect, or Aktionsart features.
Abstract: I argue in this paper for a novel analysis of case in Icelandic, with implications for case theory in general. I argue that structural case is the manifestation on the noun phrase of features which are semantically interpretable on verbal projections. Thus, Icelandic case does not encode features of noun phrase interpretation, but it is not uninterpretable either; case is properly seen as reflecting (interpretable) tense, aspect, or Aktionsart features. Accusative case in Icelandic is available when the two subevents introduced in a transitive verb phrase are temporally identified with each other, and dative case is available when the two parts are distinct. This analysis bears directly on the theory of feature checking in the Minimalist Program. Specifically, it is consistent with a restrictive theory of feature checking in which no features are strictly uninterpretable: all formal features come in interpretable-uninterpretable pairs, and feature checking is the matching of such pairs, driven by legibility conditions at Spell-Out.


Patent
28 Jun 2002
TL;DR: This invention matches fragments of a source language sentence to be translated against source language portions of examples in an example base, aligns the target language phrases in each matched example with those fragments, and then substitutes the aligned target language phrases for the matched fragments in the source language sentence.
Abstract: The present invention performs machine translation by matching fragments of a source language sentence to be translated to source language portions of an example in an example base. When all relevant examples have been identified in the example base, the examples are subjected to phrase alignment in which fragments of the target language sentence in each example are aligned against the matched fragments of the source language sentence in the same example. A translation component then substitutes the aligned target language phrases from the matched examples for the matched fragments in the source language sentence.
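The example-based translation patent above matches source fragments against an example base and substitutes the aligned target phrases. A toy Python sketch under strong assumptions: the phrase alignment inside each example is taken as already computed (each example is a list of aligned fragment pairs), matching is greedy longest-match from left to right, target phrases are emitted in source order, and unmatched words are copied through unchanged. The function names and example data are hypothetical, not the patent's.

```python
def build_fragment_table(examples):
    """Map each source fragment (as a word tuple) to its aligned target phrase.

    Each example is a list of (source_fragment, target_phrase) pairs whose
    alignment is assumed to be precomputed. The first example seen wins.
    """
    table = {}
    for example in examples:
        for source, target in example:
            table.setdefault(tuple(source.split()), target)
    return table


def translate(sentence, table, max_len=4):
    """Cover the sentence left to right with the longest matching fragments."""
    words = sentence.split()
    output, i = [], 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            fragment = tuple(words[i:i + n])
            if fragment in table:
                output.append(table[fragment])  # substitute the aligned target phrase
                i += n
                break
        else:
            output.append(words[i])  # no example covers this word: copy it through
            i += 1
    return " ".join(output)


examples = [
    [("the house", "das Haus"), ("is small", "ist klein")],
    [("the garden", "der Garten"), ("is big", "ist groß")],
]
table = build_fragment_table(examples)
print(translate("the garden is small", table))
```

Here "the garden is small" is covered by fragments from two different examples and becomes "der Garten ist klein", while "the garden is green" yields "der Garten is green" because no example covers "is green".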

Journal Article
TL;DR: This paper argues that the apparently paradoxical behavior of these two types of clausal comparative constructions is due to a derivational distinction between them: comparative deletion involves overt movement plus deletion of a compared phrase, while comparative subdeletion involves covert movement of the compared phrase.
Abstract: This paper investigates the syntax of comparative deletion and comparative subdeletion in English and argues that the apparently paradoxical behavior of these two types of clausal comparative constructions is due to a derivational distinction between them: comparative deletion involves overt movement plus deletion of a compared phrase, while comparative subdeletion involves covert movement of the compared phrase. Although this derivational difference must be stipulated in standard approaches, it follows from general constraints on the relation between movement and deletion in English in a model of syntax in which syntactic constraints are ranked and violable, and well-formedness is determined by evaluating competing representations against the set of constraints, as in Optimality Theory. The analysis receives independent support from the interaction of comparatives and ellipsis, and achieves a higher level of descriptive and explanatory adequacy than alternative analyses that do not make reference to ranked and violable constraints.

Journal Article
TL;DR: It is shown that online processing difficulties induced by word order variations in German cannot be attributed to the relative infrequency of the constructions in question, but rather appear to reflect the application of grammatical principles during parsing.

Journal Article
TL;DR: PD patients' sensitivity to grammatical agreement in a word detection procedure, alongside their sentence comprehension difficulty on a traditional measure, suggests that executive resource limitations contribute to that difficulty.

Journal Article
TL;DR: It is argued that both types of strategies used during the reading of locally ambiguous but globally unambiguous Spanish sentences are consistent with a selective reanalysis process as described by Frazier and Rayner (1982).
Abstract: In an eye movement experiment, we examined the use of reanalysis strategies during the reading of locally ambiguous but globally unambiguous Spanish sentences. Among other measures, we examined regressive eye movements made while readers were recovering in reading mild garden path sentences. The sentences had an adverbial clause that, depending on the mood (indicative vs. subjunctive) of the subordinate clause verb, could attach high (to the main verb of the sentence) or low (to the verb in the subordinate clause). Although Spanish speakers favor low attachment, the high attachment version was quite easy to understand. Readers predominately used two alternative strategies to recover from the mild garden path in our sentences. In the more common reanalysis strategy, their eyes regressed from the last region (disambiguation+1) directly to the main verb in the sentence. Following this, they reread the rest of the sentence, fixating the next region and the adverb (the beginning of the ambiguous part of the sentence). Less frequently, readers regressed from the last region (disambiguation+1) directly to the adverb. We argue that both types of strategies are consistent with a selective reanalysis process as described by Frazier and Rayner (1982).

Journal Article
TL;DR: The authors examined the role of language, specifically the scope of noun phrases used to convey novel property information, in children's category-based induction and found that children made fewer category-base inferences than adults.
Abstract: What conditions foster or constrain children's category-based induction? This study examined the role of language, specifically the scope of noun phrases used to convey novel property information. We focused on generic noun phrases, which are especially common in child-directed speech and have been argued to play an important role in the growth of category knowledge. In Study 1, 4-year-olds and adults were taught novel properties about familiar categories, using 1 of 3 types of wording: a generic noun phrase (e.g., "Bears like to eat ants"), a universally quantified noun phrase (e.g., "All bears like to eat ants"), or an indefinite plural noun phrase (e.g., "Some bears like to eat ants"). Study 2 was a follow-up study with adults using a more sensitive task. Results indicated sensitivity to type of wording among both preschoolers and adults, with "all" eliciting the most inferences, "some" eliciting the fewest inferences, and generics in between "all" and "some." However, children made fewer category-base...

01 Jan 2002
TL;DR: This study shows that the diverse surface patterns can be accounted for by two consistent gestures: 1. Interrogative intonation has a higher phrase curve than declarative intonation; and 2. Sentence final syllables have more careful intonation and wider pitch swings in interrogative sentences.
Abstract: We model the differences between declarative and interrogative intonation in Chinese with Stem-ML, an intonation description language combined with an algorithm for translating tags into quantitative prosody. Our study shows that the diverse surface patterns can be accounted for by two consistent gestures: 1. Interrogative intonation has a higher phrase curve than declarative intonation; 2. Sentence final syllables have more careful intonation and wider pitch swings in interrogative sentences. Phrase curves of the two intonation types tend to be parallel and boundary tones are not necessary for modeling the differences between the two intonation types in Chinese.