Self-Training from Labeled Features for Sentiment Analysis

Yulan He (a), Deyu Zhou (b)

(a) Knowledge Media Institute, Open University, Walton Hall, Milton Keynes MK6 6AA, UK
(b) School of Computer Science and Engineering, Southeast University, Nanjing, China
Abstract
Sentiment analysis is concerned with automatically identifying the sentiment or opinion expressed in a given piece of text. Most prior work either uses prior lexical knowledge, defined as the sentiment polarity of words, or views the task as a text classification problem and relies on labeled corpora to train a sentiment classifier. While lexicon-based approaches do not adapt well to different domains, corpus-based approaches require expensive manual annotation effort.
In this paper, we propose a novel framework in which an initial classifier is learned by incorporating prior information extracted from an existing sentiment lexicon, with preferences on the expected sentiment labels of those lexicon words expressed using generalized expectation criteria. Documents classified with high confidence are then used as pseudo-labeled examples for automatic acquisition of domain-specific features. The word-class distributions of such self-learned features are estimated from the pseudo-labeled examples and are used to train another classifier by constraining the model's predictions on unlabeled instances. Experiments on both the movie review data and the multi-domain sentiment dataset show that our approach attains comparable or better performance than existing weakly-supervised sentiment classification methods despite using no labeled documents.

Corresponding author. Tel.: +44 1908 858215; Fax: +44 1908 653169.
Email addresses: y.he@cantab.net (Yulan He), d.zhou@seu.edu.cn (Deyu Zhou)
Preprint submitted to Information Processing & Management, June 10, 2011
Keywords: Sentiment analysis, Opinion mining, Self-training, Generalized
expectation, Self-learned features.
1. Introduction
With the explosion of people's attitudes and opinions expressed in social media, including blogs, discussion forums, tweets, etc., detecting sentiment or opinion on the Web is becoming an increasingly popular way of interpreting data. The objective of sentiment analysis is to determine the overall attitude, either positive, negative, or neutral, expressed in a given piece of text. Most prior work in sentiment analysis (Pang et al., 2002; Kim and Hovy, 2004; Pang and Lee, 2004; Choi et al., 2005; Blitzer et al., 2007; Zhao et al., 2008; Narayanan et al., 2009) views sentiment classification as a text classification problem in which an annotated corpus of documents labeled with their sentiment orientation is required to train the classifiers. As such, these approaches lack portability across different domains. Moreover, the rapid evolution of user-generated content demands sentiment classifiers that can easily adapt to new domains with minimum supervision. This motivates the investigation of weakly-supervised or unsupervised sentiment analysis approaches.
While supervision for a sentiment classifier can come from labeled documents, it can also come from labeled words. For example, the word "excellent" typically conveys positive sentiment. A simple approach to using such polarity words for sentiment classification is to compare the frequency of occurrence of positive and negative terms in a document. However, this does not normally give good results. In recent years, much effort has been devoted to incorporating prior belief about word-sentiment associations from a sentiment lexicon into classifier learning by combining such lexical knowledge with a small set of labeled documents (Andreevskaia and Bergler, 2008; Li et al., 2009; Melville et al., 2009).
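The term-counting baseline mentioned above can be sketched in a few lines. The tiny lexicon and the function name below are purely illustrative, not the lexicon or method used in the paper:

```python
# Minimal sketch of the polarity-word counting baseline: a document is
# positive if it contains more positive than negative lexicon words.
# The lexicon here is a toy example for illustration only.

POSITIVE = {"excellent", "good", "great", "wonderful"}
NEGATIVE = {"poor", "bad", "terrible", "boring"}

def count_based_sentiment(text: str) -> str:
    tokens = text.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

print(count_based_sentiment("an excellent film with a great cast"))  # positive
print(count_based_sentiment("poor plot and terrible acting"))        # negative
```

As the text notes, this baseline is weak in practice: it ignores negation, context, and domain-specific vocabulary, which is precisely what motivates incorporating lexical knowledge into classifier learning instead.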
Other weakly-supervised sentiment analysis approaches typically adopt a self-training strategy (Zagibalov and Carroll, 2008b,a; Qiu et al., 2009). They start with some initial seed sentiment lexicon and then use iterative training to enlarge the lexicon. Documents classified at the current iteration are used as self-labeled instances to train a classifier for the next iteration. Other approaches use ensemble techniques that combine lexicon-based and corpus-based algorithms (Tan et al., 2008). Nevertheless, all these approaches are either complex or require careful tuning of domain- and data-specific parameters. More recently, Dasgupta and Ng (2009) proposed a weakly-supervised sentiment classification algorithm that integrates user feedback into a spectral clustering algorithm. Features induced for each dimension of spectral clustering can be considered as sentiment-oriented topics. Nevertheless, human judgement is still required to identify the most important dimensions during spectral clustering.
In this paper [1], we propose a simple and robust strategy that works by providing weak supervision at the level of features rather than instances. We obtain an initial classifier by incorporating prior information extracted from an existing sentiment lexicon into sentiment classifier model learning, where preferences on the expected sentiment labels of those lexicon words are expressed using generalized expectation criteria (McCallum et al., 2007; Druck et al., 2008). Documents classified with high confidence by this initial classifier are used to derive a set of self-learned, domain-specific features that are related to the distribution of the target classes. Such self-learned features are then used to train another classifier by constraining the model's predictions on unlabeled instances.

[1] This paper is a substantial extension of (He, 2010).
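The two-stage pipeline described above can be sketched schematically. Note the stand-ins: the paper trains MaxEnt models with generalized expectation criteria, whereas the toy code below uses a simple lexicon vote counter and a word-class-distribution scorer, so only the control flow (initial classifier, confident pseudo-labels, self-learned features, second classifier) mirrors the text. All names, the lexicon, and the confidence threshold are illustrative:

```python
# Schematic sketch of the two-stage self-training pipeline. Stand-in
# components replace the paper's GE-trained MaxEnt classifiers; only the
# control flow mirrors the described framework.

from collections import Counter, defaultdict

LEXICON = {"excellent": "pos", "great": "pos", "poor": "neg", "boring": "neg"}

def initial_classify(doc):
    """Stage 1 stand-in: vote by lexicon hits; confidence = vote margin."""
    votes = Counter(LEXICON[t] for t in doc if t in LEXICON)
    label = "pos" if votes["pos"] >= votes["neg"] else "neg"
    return label, abs(votes["pos"] - votes["neg"])

def self_train(docs, min_conf=1):
    # Keep only documents the initial classifier labels with high confidence.
    pseudo = [(d, l) for d in docs
              for l, c in [initial_classify(d)] if c >= min_conf]
    # Estimate word-class distributions from the pseudo-labeled examples:
    # these act as the domain-specific "self-learned features".
    dist = defaultdict(Counter)
    for doc, label in pseudo:
        for tok in doc:
            dist[tok][label] += 1
    # Stage 2 stand-in: classify by summing self-learned feature evidence.
    def second_classifier(doc):
        score = Counter()
        for tok in doc:
            score.update(dist[tok])
        return "pos" if score["pos"] >= score["neg"] else "neg"
    return second_classifier

docs = [["excellent", "plot", "twist"], ["boring", "slow", "plot"]]
clf = self_train(docs)
# "twist" is not in the lexicon but was acquired from pseudo-labeled data.
print(clf(["twist", "ending"]))  # pos
```

The point of the second stage is that words like "twist" above, which never appear in the seed lexicon, acquire class associations from the confidently classified documents and can then drive classification on their own.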
We evaluate our proposed framework on the movie review data and the multi-domain sentiment dataset and show that our method attains comparable or better performance than other previously proposed weakly-supervised or semi-supervised methods for sentiment classification despite using no labeled instances. The rest of the paper is structured as follows. Related work on weakly-supervised and semi-supervised sentiment classification is discussed in Section 2. The proposed framework is introduced in Section 3. The experimental setup and results are presented in Section 4. Finally, Section 5 concludes the paper and outlines directions for future research.
2. Related Work
The pioneering work on sentiment classification that does not require labeled data is that of Turney (2002), which classifies a document as positive or negative by the average semantic orientation of the phrases in the document that contain adjectives or adverbs. The semantic orientation of a phrase is calculated as its pointwise mutual information (PMI) with the positive word "excellent" minus its PMI with the negative word "poor". His approach achieved an accuracy of 84% for automobile reviews and 66% for movie reviews. In the same vein, Read and Carroll (2009) proposed three different ways of measuring the similarity between words and polarity prototypes (such as "excellent" or "good"): lexical association (using PMI), semantic spaces, and distributional similarity. While Turney only used one polarity prototype for each class, Read and Carroll chose seven polarity prototypes, which were obtained from Roget's Thesaurus and WordNet and selected based on their respective frequency in the Gigaword corpus. Still, the best result was achieved using PMI, with 69.1% accuracy on the movie review data.
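Turney's semantic-orientation score reduces to a single log-ratio of co-occurrence counts, since the marginal probability of the phrase cancels when the two PMI terms are subtracted. The counts below are invented for illustration; Turney obtained his from search-engine NEAR queries:

```python
import math

# Sketch of Turney's semantic orientation:
#   SO(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor")
# which simplifies to a log2 ratio of hit counts. All counts here are
# made-up illustrative values, not real query results.

def semantic_orientation(near_excellent, near_poor, hits_excellent, hits_poor):
    # Small smoothing constant avoids log(0) for unseen co-occurrences.
    eps = 0.01
    return math.log2(((near_excellent + eps) * hits_poor) /
                     ((near_poor + eps) * hits_excellent))

# A phrase seen far more often near "excellent" than near "poor" scores > 0
# and is classified positive; the reverse scores < 0.
so = semantic_orientation(near_excellent=80, near_poor=5,
                          hits_excellent=1000, hits_poor=1000)
print(so > 0)  # True
```

A document is then labeled by averaging this score over its extracted adjective/adverb phrases and thresholding at zero.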
There has also been much interest in incorporating prior information from a sentiment lexicon, containing a list of words bearing positive or negative polarity, into sentiment model learning, which we call weakly-supervised sentiment classification. Sentiment lexicons can be constructed in many different ways, ranging from manual approaches (Whitelaw et al., 2005) to semi-automated approaches (Kim and Hovy, 2004; Argamon et al., 2007; Abbasi et al., 2008), and even almost fully automated approaches (Turney and Littman, 2002; Kaji and Kitsuregawa, 2006; Kanayama and Nasukawa, 2006). When incorporating such prior information into model learning, Andreevskaia and Bergler (2008) integrate a corpus-based classifier trained on a small set of annotated in-domain data and a lexicon-based system trained on WordNet for sentence-level sentiment annotation across different domains. Li et al. (2009) employ lexical prior knowledge for semi-supervised sentiment classification based on non-negative matrix tri-factorization, where the domain-independent prior knowledge is incorporated in conjunction with domain-dependent unlabeled data and a few labeled documents. Melville et al. (2009) also combine lexical information from a sentiment lexicon with labeled documents, where word-class probabilities in Naïve Bayes classifier learning are calculated as a weighted combination of word-class distributions.
References

Pang, B., Lee, L., Vaithyanathan, S. (2002). Thumbs up? Sentiment Classification using Machine Learning Techniques. In Proceedings of EMNLP.

Pang, B., Lee, L. (2004). A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. In Proceedings of ACL.

Turney, P. (2002). Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews. In Proceedings of ACL.
Frequently Asked Questions
Q1. How many polarity words are removed from the MR dataset?

After removing the polarity words that occurred less than 5 times in MR, the total number of matched polarity words is reduced to 1500 and the classification accuracy using either Heuristic labeling or Self-learned features improves. 


By filtering the polarity words that occurred less than 5 times in the corpus, the number of matched polarity words drops dramatically with only about 500 matched words for Books and DVDs, and 160 for Electronics and Kitchen. 


The authors also observe other domain-specific terms for the MR dataset, such as the actress name winslet (kate Winslet) with positive polarity and the movie name batman bearing negative polarity. 

As mentioned earlier, Books and DVDs are larger corpora, and thus the number of matched polarity words without filtering is about 2000.

By adding a normalization term $z_k = \sum_{d=1}^{D} \delta(k \in \mathbf{w}_d)$ into $f_{jk}$, the feature expectation becomes the predicted label distribution on the set of instances containing feature $k$, i.e. $\tilde{P}(j|k; \Lambda) = \frac{\sum_{d=1}^{D} \delta(s_d = j)\,\delta(k \in \mathbf{w}_d)}{z_k}$ (Eq. 4). The authors define a criterion that minimizes the KL divergence between the expected label distribution and a target expectation $\hat{f}$, which is essentially an instance of generalized expectation criteria that penalizes the divergence of a specific model expectation from a target value.

Since the authors are dealing with a binary classification problem here, the target expectation of a feature for its prior polarity (or associated class label) is 0.9, and 0.1 for its non-associated class.
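The feature expectation and KL-divergence criterion described above can be illustrated with a small numeric sketch. The documents, labels, and feature below are invented, and the document labels stand in for the model's predicted labels:

```python
import math

# Sketch of Eq. (4): for a feature k, the label distribution over the
# documents containing k, and its KL divergence from a target expectation
# such as (0.9, 0.1). All data below is made up for illustration.

def label_distribution(docs, labels, k, classes=("pos", "neg")):
    hit = [l for d, l in zip(docs, labels) if k in d]   # delta(k in w_d)
    z_k = len(hit)                                      # normalizer z_k
    return [sum(l == j for l in hit) / z_k for j in classes]

def kl_divergence(target, predicted, eps=1e-12):
    # KL(target || predicted); eps guards against log of zero.
    return sum(t * math.log(t / (p + eps))
               for t, p in zip(target, predicted) if t > 0)

docs = [{"excellent", "plot"}, {"excellent", "cast"},
        {"excellent", "dull"}, {"poor", "plot"}]
labels = ["pos", "pos", "neg", "neg"]

p = label_distribution(docs, labels, "excellent")
# "excellent" occurs in 3 docs, 2 labeled pos: distribution is [2/3, 1/3].
print(p)
print(kl_divergence([0.9, 0.1], p))  # penalty: divergence from the target
```

Minimizing this divergence over all constrained features is what pulls the model's predictions toward the lexicon's (or the self-learned features') expected label distributions.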

Instead of incorporating prior information into model learning through sentiment lexicons, Dasgupta and Ng (2009) proposed an unsupervised sentiment classification algorithm where user feedback is provided interactively during the spectral clustering process to ensure that texts are clustered along the sentiment dimension.

An initial classifier is trained by incorporating prior information from the sentiment lexicon which consists of a list of words marked with their respective polarity. 

It performs fairly stably and only drops dramatically when too few polarity words are incorporated as prior knowledge, for example, when only 23 polarity words are selected at the cutoff point of 40 for the Kitchen dataset.
