Self-Training from Labeled Features for Sentiment Analysis

Yulan He (a), Deyu Zhou (b)

(a) Knowledge Media Institute, Open University, Walton Hall, Milton Keynes MK6 6AA, UK
(b) School of Computer Science and Engineering, Southeast University, Nanjing, China
Abstract
Sentiment analysis is concerned with automatically identifying the sentiment or opinion expressed in a given piece of text. Most prior work either uses prior lexical knowledge, defined as the sentiment polarity of words, or views the task as a text classification problem and relies on labeled corpora to train a sentiment classifier. While lexicon-based approaches do not adapt well to different domains, corpus-based approaches require expensive manual annotation effort.
In this paper, we propose a novel framework in which an initial classifier is learned by incorporating prior information extracted from an existing sentiment lexicon, with preferences on the expected sentiment labels of those lexicon words expressed using generalized expectation criteria. Documents classified with high confidence are then used as pseudo-labeled examples for automatic acquisition of domain-specific features. The word-class distributions of such self-learned features are estimated from the pseudo-labeled examples and are used to train another classifier by constraining the model's predictions on unlabeled instances. Experiments on both the movie review data and the multi-domain sentiment dataset show that our approach attains comparable or better performance than existing weakly-supervised sentiment classification methods despite using no labeled documents.

Corresponding author. Tel.: +44 1908 858215; Fax: +44 1908 653169.
Email addresses: y.he@cantab.net (Yulan He), d.zhou@seu.edu.cn (Deyu Zhou)
Preprint submitted to Information Processing & Management, June 10, 2011
Keywords: Sentiment analysis, Opinion mining, Self-training, Generalized
expectation, Self-learned features.
1. Introduction
With the explosion of people's attitudes and opinions expressed in social media, including blogs, discussion forums, tweets, etc., detecting sentiment or opinion on the Web is becoming an increasingly popular way of interpreting data. The objective of sentiment analysis is to determine the overall attitude, either positive, negative, or neutral, expressed in a given piece of text. Most prior work in sentiment analysis (Pang et al., 2002; Kim and Hovy, 2004; Pang and Lee, 2004; Choi et al., 2005; Blitzer et al., 2007; Zhao et al., 2008; Narayanan et al., 2009) views sentiment classification as a text classification problem in which an annotated corpus of documents labeled with their sentiment orientation is required to train the classifiers. As such, these approaches lack portability across different domains. Moreover, the rapid evolution of user-generated content demands sentiment classifiers that can easily adapt to new domains with minimum supervision. This motivates the investigation of weakly-supervised or unsupervised sentiment analysis approaches.
While supervision for a sentiment classifier can come from labeled documents, it can also come from labeled words. For example, the word "excellent" typically conveys positive sentiment. A simple approach to using such polarity words for sentiment classification is to compare the frequency of occurrence of positive and negative terms in a document. However, this does not normally give good results. In recent years, much effort has been devoted to incorporating prior belief about word-sentiment associations from a sentiment lexicon into classifier learning by combining such lexical knowledge with a small set of labeled documents (Andreevskaia and Bergler, 2008; Li et al., 2009; Melville et al., 2009).
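The term-counting baseline mentioned above can be sketched in a few lines. The tiny lexicon and the function name below are purely illustrative, not the lexicon or method used in the paper:

```python
# Minimal sketch of the polarity-word counting baseline: a document is
# positive if it contains more positive than negative lexicon words.
# The lexicon here is a toy example for illustration only.

POSITIVE = {"excellent", "good", "great", "wonderful"}
NEGATIVE = {"poor", "bad", "terrible", "boring"}

def count_based_sentiment(text: str) -> str:
    tokens = text.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

print(count_based_sentiment("an excellent film with a great cast"))  # positive
print(count_based_sentiment("poor plot and terrible acting"))        # negative
```

As the text notes, this baseline is weak in practice: it ignores negation, context, and domain-specific vocabulary, which is precisely what motivates incorporating lexical knowledge into classifier learning instead.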
Other weakly-supervised sentiment analysis approaches typically adopt a self-training strategy (Zagibalov and Carroll, 2008b,a; Qiu et al., 2009). They start with some initial seed sentiment lexicon and then use iterative training to enlarge the lexicon. Documents classified at the current iteration are used as self-labeled instances to train a classifier for the next iteration. Other approaches use ensemble techniques that combine lexicon-based and corpus-based algorithms (Tan et al., 2008). Nevertheless, all these approaches are either complex or require careful tuning of domain- and data-specific parameters. More recently, Dasgupta and Ng (2009) proposed a weakly-supervised sentiment classification algorithm that integrates user feedback into a spectral clustering algorithm. Features induced for each dimension of spectral clustering can be considered as sentiment-oriented topics. Nevertheless, human judgement is still required to identify the most important dimensions during spectral clustering.
In this paper [1], we propose a simple and robust strategy that works by providing weak supervision at the level of features rather than instances. We obtain an initial classifier by incorporating prior information extracted from an existing sentiment lexicon into sentiment classifier model learning, where preferences on the expected sentiment labels of those lexicon words are expressed using generalized expectation criteria (McCallum et al., 2007; Druck et al., 2008). Documents classified with high confidence by this initial classifier are used to derive a set of self-learned, domain-specific features that are related to the distribution of the target classes. Such self-learned features are then used to train another classifier by constraining the model's predictions on unlabeled instances.

[1] This paper is a substantial extension of (He, 2010).
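The two-stage pipeline described above can be sketched schematically. Note the stand-ins: the paper trains MaxEnt models with generalized expectation criteria, whereas the toy code below uses a simple lexicon vote counter and a word-class-distribution scorer, so only the control flow (initial classifier, confident pseudo-labels, self-learned features, second classifier) mirrors the text. All names, the lexicon, and the confidence threshold are illustrative:

```python
# Schematic sketch of the two-stage self-training pipeline. Stand-in
# components replace the paper's GE-trained MaxEnt classifiers; only the
# control flow mirrors the described framework.

from collections import Counter, defaultdict

LEXICON = {"excellent": "pos", "great": "pos", "poor": "neg", "boring": "neg"}

def initial_classify(doc):
    """Stage 1 stand-in: vote by lexicon hits; confidence = vote margin."""
    votes = Counter(LEXICON[t] for t in doc if t in LEXICON)
    label = "pos" if votes["pos"] >= votes["neg"] else "neg"
    return label, abs(votes["pos"] - votes["neg"])

def self_train(docs, min_conf=1):
    # Keep only documents the initial classifier labels with high confidence.
    pseudo = [(d, l) for d in docs
              for l, c in [initial_classify(d)] if c >= min_conf]
    # Estimate word-class distributions from the pseudo-labeled examples:
    # these act as the domain-specific "self-learned features".
    dist = defaultdict(Counter)
    for doc, label in pseudo:
        for tok in doc:
            dist[tok][label] += 1
    # Stage 2 stand-in: classify by summing self-learned feature evidence.
    def second_classifier(doc):
        score = Counter()
        for tok in doc:
            score.update(dist[tok])
        return "pos" if score["pos"] >= score["neg"] else "neg"
    return second_classifier

docs = [["excellent", "plot", "twist"], ["boring", "slow", "plot"]]
clf = self_train(docs)
# "twist" is not in the lexicon but was acquired from pseudo-labeled data.
print(clf(["twist", "ending"]))  # pos
```

The point of the second stage is that words like "twist" above, which never appear in the seed lexicon, acquire class associations from the confidently classified documents and can then drive classification on their own.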
We evaluate our proposed framework on the movie review data and the multi-domain sentiment dataset and show that our method attains comparable or better performance than other previously proposed weakly-supervised or semi-supervised methods for sentiment classification despite using no labeled instances. The rest of the paper is structured as follows. Related work on weakly-supervised and semi-supervised sentiment classification is discussed in Section 2. The proposed framework is introduced in Section 3. The experimental setup and results are presented in Section 4. Finally, Section 5 concludes the paper and outlines directions for future research.
2. Related Work
The pioneering work on sentiment classification that does not require labeled data is that of Turney (2002), which classifies a document as positive or negative by the average semantic orientation of the phrases in the document that contain adjectives or adverbs. The semantic orientation of a phrase is calculated as its pointwise mutual information (PMI) with the positive word "excellent" minus its PMI with the negative word "poor". His approach achieved an accuracy of 84% for automobile reviews and 66% for movie reviews. In the same vein, Read and Carroll (2009) proposed three different ways of measuring the similarity between words and polarity prototypes (such as "excellent" or "good"): lexical association (using PMI), semantic spaces, and distributional similarity. While Turney only used one polarity prototype for each class, Read and Carroll chose seven polarity prototypes, which were obtained from Roget's Thesaurus and WordNet and selected based on their respective frequency in the Gigaword corpus. Still, the best result was achieved using PMI, with 69.1% accuracy on the movie review data.
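Turney's semantic-orientation score reduces to a single log-ratio of co-occurrence counts, since the marginal probability of the phrase cancels when the two PMI terms are subtracted. The counts below are invented for illustration; Turney obtained his from search-engine NEAR queries:

```python
import math

# Sketch of Turney's semantic orientation:
#   SO(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor")
# which simplifies to a log2 ratio of hit counts. All counts here are
# made-up illustrative values, not real query results.

def semantic_orientation(near_excellent, near_poor, hits_excellent, hits_poor):
    # Small smoothing constant avoids log(0) for unseen co-occurrences.
    eps = 0.01
    return math.log2(((near_excellent + eps) * hits_poor) /
                     ((near_poor + eps) * hits_excellent))

# A phrase seen far more often near "excellent" than near "poor" scores > 0
# and is classified positive; the reverse scores < 0.
so = semantic_orientation(near_excellent=80, near_poor=5,
                          hits_excellent=1000, hits_poor=1000)
print(so > 0)  # True
```

A document is then labeled by averaging this score over its extracted adjective/adverb phrases and thresholding at zero.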
There has also been much interest in incorporating prior information from a sentiment lexicon, containing a list of words bearing positive or negative polarity, into sentiment model learning, which we call weakly-supervised sentiment classification. Sentiment lexicons can be constructed in many different ways, ranging from manual approaches (Whitelaw et al., 2005) to semi-automated approaches (Kim and Hovy, 2004; Argamon et al., 2007; Abbasi et al., 2008), and even almost fully automated approaches (Turney and Littman, 2002; Kaji and Kitsuregawa, 2006; Kanayama and Nasukawa, 2006). When incorporating such prior information into model learning, Andreevskaia and Bergler (2008) integrate a corpus-based classifier trained on a small set of annotated in-domain data and a lexicon-based system trained on WordNet for sentence-level sentiment annotation across different domains. Li et al. (2009) employ lexical prior knowledge for semi-supervised sentiment classification based on non-negative matrix tri-factorization, where the domain-independent prior knowledge is incorporated in conjunction with domain-dependent unlabeled data and a few labeled documents. Melville et al. (2009) also combine lexical information from a sentiment lexicon with labeled documents, where word-class probabilities in Naïve Bayes classifier learning are calculated as a weighted combination of word-class distributions.
References

Pang, B., Lee, L., Vaithyanathan, S. (2002). Thumbs up? Sentiment Classification using Machine Learning Techniques. In Proceedings of EMNLP.

Pang, B., Lee, L. (2004). A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. In Proceedings of ACL.

Turney, P. (2002). Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews. In Proceedings of ACL.
Frequently Asked Questions
Q1. How many polarity words are removed from the MR dataset?

After removing the polarity words that occurred less than 5 times in MR, the total number of matched polarity words is reduced to 1500 and the classification accuracy using either Heuristic labeling or Self-learned features improves. 


By filtering the polarity words that occurred less than 5 times in the corpus, the number of matched polarity words drops dramatically with only about 500 matched words for Books and DVDs, and 160 for Electronics and Kitchen. 


The authors also observe other domain-specific terms for the MR dataset, such as the actress name winslet (kate Winslet) with positive polarity and the movie name batman bearing negative polarity. 

As mentioned earlier, Books and DVDs are larger corpora, and thus the number of matched polarity words without filtering is about 2000.

By adding a normalization term $z_k = \sum_{d=1}^{D} \delta(k \in \mathbf{w}_d)$ into $f_{jk}$, the feature expectation becomes the predicted label distribution on the set of instances containing feature $k$, i.e. $\tilde{P}(j|k; \Lambda) = \frac{\sum_{d=1}^{D} \delta(s_d = j)\,\delta(k \in \mathbf{w}_d)}{z_k}$ (Eq. 4). The authors define a criterion that minimizes the KL divergence between the expected label distribution and a target expectation $\hat{f}$, which is essentially an instance of generalized expectation criteria that penalizes the divergence of a specific model expectation from a target value.

Since the authors are dealing with a binary classification problem here, the target expectation of a feature for its prior polarity (or associated class label) is 0.9, and 0.1 for its non-associated class.
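The feature expectation and KL-divergence criterion described above can be illustrated with a small numeric sketch. The documents, labels, and feature below are invented, and the document labels stand in for the model's predicted labels:

```python
import math

# Sketch of Eq. (4): for a feature k, the label distribution over the
# documents containing k, and its KL divergence from a target expectation
# such as (0.9, 0.1). All data below is made up for illustration.

def label_distribution(docs, labels, k, classes=("pos", "neg")):
    hit = [l for d, l in zip(docs, labels) if k in d]   # delta(k in w_d)
    z_k = len(hit)                                      # normalizer z_k
    return [sum(l == j for l in hit) / z_k for j in classes]

def kl_divergence(target, predicted, eps=1e-12):
    # KL(target || predicted); eps guards against log of zero.
    return sum(t * math.log(t / (p + eps))
               for t, p in zip(target, predicted) if t > 0)

docs = [{"excellent", "plot"}, {"excellent", "cast"},
        {"excellent", "dull"}, {"poor", "plot"}]
labels = ["pos", "pos", "neg", "neg"]

p = label_distribution(docs, labels, "excellent")
# "excellent" occurs in 3 docs, 2 labeled pos: distribution is [2/3, 1/3].
print(p)
print(kl_divergence([0.9, 0.1], p))  # penalty: divergence from the target
```

Minimizing this divergence over all constrained features is what pulls the model's predictions toward the lexicon's (or the self-learned features') expected label distributions.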

Instead of incorporating prior information into model learning through sentiment lexicons, Dasgupta and Ng (2009) proposed an unsupervised sentiment classification algorithm where user feedback is provided interactively during the spectral clustering process to ensure that texts are clustered along the sentiment dimension.

An initial classifier is trained by incorporating prior information from the sentiment lexicon which consists of a list of words marked with their respective polarity. 

It performs fairly stably and only drops dramatically when too few polarity words are incorporated as prior knowledge, for example, when only 23 polarity words are selected at the cutoff point of 40 for the Kitchen dataset.
