scispace - formally typeset
Search or ask a question
Proceedings Article

Determining Term Subjectivity and Term Orientation for Opinion Mining

TL;DR: The task of deciding whether a given term has a positive connotations, or a negative connotation, or has no subjective connotation at all is confronted, and it is shown that determining subjectivity and orientation is a much harder problem than determining orientation alone.
Abstract: Opinion mining is a recent subdiscipline of computational linguistics which is concerned not with the topic a document is about, but with the opinion it expresses To aid the extraction of opinions from text, recent work has tackled the issue of determining the orientation of “subjective” terms contained in text, ie deciding whether a term that carries opinionated content has a positive or a negative connotation This is believed to be of key importance for identifying the orientation of documents, ie determining whether a document expresses a positive or negative opinion about its subject matter We contend that the plain determination of the orientation of terms is not a realistic problem, since it starts from the nonrealistic assumption that we already know whether a term is subjective or not; this would imply that a linguistic resource that marks terms as “subjective” or “objective” is available, which is usually not the case In this paper we confront the task of deciding whether a given term has a positive connotation, or a negative connotation, or has no subjective connotation at all; this problem thus subsumes the problem of determining subjectivity and the problem of determining orientation We tackle this problem by testing three different variants of a semi-supervised method previously proposed for orientation detection Our results show that determining subjectivity and orientation is a much harder problem than determining orientation alone

Content maybe subject to copyright    Report

Citations
More filters
Book
08 Jul 2008
TL;DR: This survey covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems and focuses on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis.
Abstract: An important part of our information-gathering behavior has always been to find out what other people think. With the growing availability and popularity of opinion-rich resources such as online review sites and personal blogs, new opportunities and challenges arise as people now can, and do, actively use information technologies to seek out and understand the opinions of others. The sudden eruption of activity in the area of opinion mining and sentiment analysis, which deals with the computational treatment of opinion, sentiment, and subjectivity in text, has thus occurred at least in part as a direct response to the surge of interest in new systems that deal directly with opinions as a first-class object. This survey covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems. Our focus is on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis. We include material on summarization of evaluative text and on broader issues regarding privacy, manipulation, and economic impact that the development of opinion-oriented information-access services gives rise to. To facilitate future work, a discussion of available resources, benchmark datasets, and evaluation campaigns is also provided.

7,452 citations


Cites background or methods from "Determining Term Subjectivity and T..."

  • ...Each synset of WordNet [95], a publicly available thesaurus-like resource, is assigned one of three sentiment scores — positive, negative, or objective — where these scores were automatically generated using a semi-supervised method described in Esuli and Sebastiani [90]....

    [...]

  • ...[209] summarize the evidence of several projects on subsentential analysis [12, 90, 290, 320] as follows: “the problem of distinguishing subjective versus objective instances has often proved to be more difficult than subsequent polarity classification, so improvements in subjectivity classification promise to positively impact sentiment classification”....

    [...]

  • ...orientationin the literature) or subjectivity status [12, 45, 89, 90, 91, 92, 119, 131, 143, 146, 258, 287, 289, 290, 291, 300, 304, 306]....

    [...]

  • ...WordNet-defined relations, or other related words (and, along the same lines, opposite labels can be given based on similar information) [12, 20, 89, 90, 131, 146, 148, 155, 289, 299, 300]....

    [...]

Book
01 May 2012
TL;DR: Sentiment analysis and opinion mining is the field of study that analyzes people's opinions, sentiments, evaluations, attitudes, and emotions from written language as discussed by the authors and is one of the most active research areas in natural language processing and is also widely studied in data mining, Web mining, and text mining.
Abstract: Sentiment analysis and opinion mining is the field of study that analyzes people's opinions, sentiments, evaluations, attitudes, and emotions from written language. It is one of the most active research areas in natural language processing and is also widely studied in data mining, Web mining, and text mining. In fact, this research has spread outside of computer science to the management sciences and social sciences due to its importance to business and society as a whole. The growing importance of sentiment analysis coincides with the growth of social media such as reviews, forum discussions, blogs, micro-blogs, Twitter, and social networks. For the first time in human history, we now have a huge volume of opinionated data recorded in digital form for analysis. Sentiment analysis systems are being applied in almost every business and social domain because opinions are central to almost all human activities and are key influencers of our behaviors. Our beliefs and perceptions of reality, and the choices we make, are largely conditioned on how others see and evaluate the world. For this reason, when we need to make a decision we often seek out the opinions of others. This is true not only for individuals but also for organizations. This book is a comprehensive introductory and survey text. It covers all important topics and the latest developments in the field with over 400 references. It is suitable for students, researchers and practitioners who are interested in social media analysis in general and sentiment analysis in particular. Lecturers can readily use it in class for courses on natural language processing, social media analysis, text mining, and data mining. Lecture slides are also available online.

4,515 citations

Proceedings Article
01 Jan 2006
TL;DR: SENTIWORDNET is a lexical resource in which each WORDNET synset is associated to three numerical scores Obj, Pos and Neg, describing how objective, positive, and negative the terms contained in the synset are.
Abstract: Opinion mining (OM) is a recent subdiscipline at the crossroads of information retrieval and computational linguistics which is concerned not with the topic a document is about, but with the opinion it expresses. OM has a rich set of applications, ranging from tracking users’ opinions about products or about political candidates as expressed in online forums, to customer relationship management. In order to aid the extraction of opinions from text, recent research has tried to automatically determine the “PNpolarity” of subjective terms, i.e. identify whether a term that is a marker of opinionated content has a positive or a negative connotation. Research on determining whether a term is indeed a marker of opinionated content (a subjective term) or not (an objective term) has been instead much scarcer. In this work we describe SENTIWORDNET, a lexical resource in which each WORDNET synset sis associated to three numerical scores Obj(s), Pos(s) and Neg(s), describing how objective, positive, and negative the terms contained in the synset are. The method used to develop SENTIWORDNET is based on the quantitative analysis of the glosses associated to synsets, and on the use of the resulting vectorial term representations for semi-supervised synset classi.cation. The three scores are derived by combining the results produced by a committee of eight ternary classi.ers, all characterized by similar accuracy levels but different classification behaviour. SENTIWORDNET is freely available for research purposes, and is endowed with a Web-based graphical user interface.

2,625 citations


Cites background or methods from "Determining Term Subjectivity and T..."

  • ...The effectiveness results reported in (Esuli and Sebastiani, 2006) may thus be considered only approximately indicative of the accuracy of the SENTIWORDNET labelling....

    [...]

  • ...Each ternary classifier is generated using the semisupervised method described in (Esuli and Sebastiani, 2006)....

    [...]

  • ...The reader should however bear in mind a few differences between the method used in (Esuli and Sebastiani, 2006) and the one used here: (i) we here classify entire synsets, while in (Esuli and Sebastiani, 2006) we classified terms, which can sometimes be ambiguous and thus more difficult to…...

    [...]

  • ...In (Esuli and Sebastiani, 2006) we point out how different combinations of training set and learner perform differently, even though with similar accuracy....

    [...]

  • ...The task of determining whether a term is indeed a marker of opinionated content (i.e. is Subjective or Objective) has instead received much less attention (Esuli and Sebastiani, 2006; Riloff et al., 2003; Vegnaduzzo, 2004)....

    [...]

01 Jan 2010
TL;DR: In this article, the authors focus on opinion expressions that convey people's positive or negative sentiments, i.e., opinions are subjective expressions that describe people's sentiments, appraisals or feelings toward entities, events and their properties.
Abstract: Textual information in the world can be broadly categorized into two main types: facts and opinions. Facts are objective expressions about entities, events and their properties. Opinions are usually subjective expressions that describe people’s sentiments, appraisals or feelings toward entities, events and their properties. The concept of opinion is very broad. In this chapter, we only focus on opinion expressions that convey people’s positive or negative sentiments. Much of the existing research on textual information processing has been focused on mining and retrieval of factual information, e.g., information retrieval, Web search, text classification, text clustering and many other text mining and natural language processing tasks. Little work had been done on the processing of opinions until only recently. Yet, opinions are so important that whenever we need to make a decision we want to hear others’ opinions. This is not only true for individuals but also true for organizations. One of the main reasons for the lack of study on opinions is the fact that there was little opinionated text available before the World Wide Web. Before the Web, when an individual needed to make a decision, he/she typically asked for opinions from friends and families. When an organization wanted to find the opinions or sentiments of the general public about its products and services, it conducted opinion polls, surveys, and focus groups. However, with the Web, especially with the explosive growth of the usergenerated content on the Web in the past few years, the world has been transformed. The Web has dramatically changed the way that people express their views and opinions. They can now post reviews of products at merchant sites and express their views on almost anything in Internet forums, discussion groups, and blogs, which are collectively called the user-generated content. This online wordof-mouth behavior represents new and measurable sources of information with many practical applications. Now if one wants to buy a product, he/she is no longer limited to asking his/her friends and families because there are many product reviews on the Web which give opinions of existing users of the product. For a company, it may no longer be necessary to conduct surveys, organize focus groups or employ external consultants in order to find consumer opinions about its products and those of its competitors because the user-generated content on the Web can already give them such information.

1,575 citations

Proceedings ArticleDOI
11 Feb 2008
TL;DR: This paper proposes a holistic lexicon-based approach to solving the problem of determining the semantic orientations (positive, negative or neutral) of opinions expressed on product features in reviews by exploiting external evidences and linguistic conventions of natural language expressions.
Abstract: One of the important types of information on the Web is the opinions expressed in the user generated content, e.g., customer reviews of products, forum posts, and blogs. In this paper, we focus on customer reviews of products. In particular, we study the problem of determining the semantic orientations (positive, negative or neutral) of opinions expressed on product features in reviews. This problem has many applications, e.g., opinion mining, summarization and search. Most existing techniques utilize a list of opinion (bearing) words (also called opinion lexicon) for the purpose. Opinion words are words that express desirable (e.g., great, amazing, etc.) or undesirable (e.g., bad, poor, etc) states. These approaches, however, all have some major shortcomings. In this paper, we propose a holistic lexicon-based approach to solving the problem by exploiting external evidences and linguistic conventions of natural language expressions. This approach allows the system to handle opinion words that are context dependent, which cause major difficulties for existing algorithms. It also deals with many special words, phrases and language constructs which have impacts on opinions based on their linguistic patterns. It also has an effective function for aggregating multiple conflicting opinion words in a sentence. A system, called Opinion Observer, based on the proposed technique has been implemented. Experimental results using a benchmark product review data set and some additional reviews show that the proposed technique is highly effective. It outperforms existing methods significantly

1,404 citations

References
More filters
Book ChapterDOI
21 Apr 1998
TL;DR: This paper explores the use of Support Vector Machines for learning text classifiers from examples and analyzes the particular properties of learning with text data and identifies why SVMs are appropriate for this task.
Abstract: This paper explores the use of Support Vector Machines (SVMs) for learning text classifiers from examples. It analyzes the particular properties of learning with text data and identifies why SVMs are appropriate for this task. Empirical results support the theoretical findings. SVMs achieve substantial improvements over the currently best performing methods and behave robustly over a variety of different learning tasks. Furthermore they are fully automatic, eliminating the need for manual parameter tuning.

8,658 citations


"Determining Term Subjectivity and T..." refers methods in this paper

  • ...As a consequence, while for the former we use well-known learners for binary classification (the naive Bayesian learner using the multinomial model (NB) [9], support vector machines using linear kernels [6], the Rocchio learner, and its PrTFIDF probabilistic version [5]), for Approach III we use their multiclass versions....

    [...]

  • ...…binary classification (the naive Bayesian learner using the multinomial model (McCallum and Nigam, 1998), support vector machines using linear kernels (Joachims, 1998), the Rocchio learner, and its PrTFIDF probabilistic version (Joachims, 1997)), for Approach III we use their multiclass versions9....

    [...]

Posted Content
TL;DR: A simple unsupervised learning algorithm for classifying reviews as recommended (thumbs up) or not recommended (Thumbs down) if the average semantic orientation of its phrases is positive.
Abstract: This paper presents a simple unsupervised learning algorithm for classifying reviews as recommended (thumbs up) or not recommended (thumbs down). The classification of a review is predicted by the average semantic orientation of the phrases in the review that contain adjectives or adverbs. A phrase has a positive semantic orientation when it has good associations (e.g., "subtle nuances") and a negative semantic orientation when it has bad associations (e.g., "very cavalier"). In this paper, the semantic orientation of a phrase is calculated as the mutual information between the given phrase and the word "excellent" minus the mutual information between the given phrase and the word "poor". A review is classified as recommended if the average semantic orientation of its phrases is positive. The algorithm achieves an average accuracy of 74% when evaluated on 410 reviews from Epinions, sampled from four different domains (reviews of automobiles, banks, movies, and travel destinations). The accuracy ranges from 84% for automobile reviews to 66% for movie reviews.

4,526 citations


"Determining Term Subjectivity and T..." refers background in this paper

  • ...latter problem is probably Turney’s (2002), who...

    [...]

  • ...determining document orientation (or polarity), as in deciding if a given Subjective text expresses a Positive or a Negative opinion on its subject matter (Pang and Lee, 2004; Turney, 2002);...

    [...]

Proceedings Article
01 Jan 2002
TL;DR: This article proposed an unsupervised learning algorithm for classifying reviews as recommended (thumbs up) or not recommended(thumbs down) based on the average semantic orientation of phrases in the review that contain adjectives or adverbs.
Abstract: This paper presents a simple unsupervised learning algorithm for classifying reviews as recommended (thumbs up) or not recommended (thumbs down) The classification of a review is predicted by the average semantic orientation of the phrases in the review that contain adjectives or adverbs A phrase has a positive semantic orientation when it has good associations (eg, “subtle nuances”) and a negative semantic orientation when it has bad associations (eg, “very cavalier”) In this paper, the semantic orientation of a phrase is calculated as the mutual information between the given phrase and the word “excellent” minus the mutual information between the given phrase and the word “poor” A review is classified as recommended if the average semantic orientation of its phrases is positive The algorithm achieves an average accuracy of 74% when evaluated on 410 reviews from Epinions, sampled from four different domains (reviews of automobiles, banks, movies, and travel destinations) The accuracy ranges from 84% for automobile reviews to 66% for movie reviews

3,814 citations

Proceedings Article
01 Jan 1998
TL;DR: It is found that the multi-variate Bernoulli performs well with small vocabulary sizes, but that the multinomial performs usually performs even better at larger vocabulary sizes--providing on average a 27% reduction in error over the multi -variateBernoulli model at any vocabulary size.
Abstract: Recent work in text classification has used two different first-order probabilistic models for classification, both of which make the naive Bayes assumption. Some use a multi-variate Bernoulli model, that is, a Bayesian Network with no dependencies between words and binary word features (e.g. Larkey and Croft 1996; Koller and Sahami 1997). Others use a multinomial model, that is, a uni-gram language model with integer word counts (e.g. Lewis and Gale 1994; Mitchell 1997). This paper aims to clarify the confusion by describing the differences and details of these two models, and by empirically comparing their classification performance on five text corpora. We find that the multi-variate Bernoulli performs well with small vocabulary sizes, but that the multinomial performs usually performs even better at larger vocabulary sizes--providing on average a 27% reduction in error over the multi-variate Bernoulli model at any vocabulary size.

3,601 citations


"Determining Term Subjectivity and T..." refers methods in this paper

  • ...As a consequence, while for the former we use well-known learners for binary classification (the naive Bayesian learner using the multinomial model (McCallum and Nigam, 1998), support vector machines using linear kernels (Joachims, 1998), the Roc-...

    [...]

Proceedings ArticleDOI
Bo Pang1, Lillian Lee1
21 Jul 2004
TL;DR: This paper proposed a machine learning method that applies text-categorization techniques to just the subjective portions of the document, extracting these portions can be implemented using efficient techniques for finding minimum cuts in graphs; this greatly facilitates incorporation of cross-sentence contextual constraints.
Abstract: Sentiment analysis seeks to identify the viewpoint(s) underlying a text span; an example application is classifying a movie review as "thumbs up" or "thumbs down". To determine this sentiment polarity, we propose a novel machine-learning method that applies text-categorization techniques to just the subjective portions of the document. Extracting these portions can be implemented using efficient techniques for finding minimum cuts in graphs; this greatly facilitates incorporation of cross-sentence contextual constraints.

3,459 citations