Mining and Summarizing Customer Reviews
Minqing Hu and Bing Liu
Department of Computer Science
University of Illinois at Chicago
851 South Morgan Street
Chicago, IL 60607-7053
{mhu1, liub}@cs.uic.edu
ABSTRACT
Merchants selling products on the Web often ask their customers
to review the products that they have purchased and the
associated services. As e-commerce is becoming more and more
popular, the number of customer reviews that a product receives
grows rapidly. For a popular product, the number of reviews can
be in hundreds or even thousands. This makes it difficult for a
potential customer to read them to make an informed decision on
whether to purchase the product. It also makes it difficult for the
manufacturer of the product to keep track and to manage customer
opinions. For the manufacturer, there are additional difficulties
because many merchant sites may sell the same product and the
manufacturer normally produces many kinds of products. In this
research, we aim to mine and to summarize all the customer
reviews of a product. This summarization task is different from
traditional text summarization because we only mine the features
of the product on which the customers have expressed their
opinions and whether the opinions are positive or negative. We do
not summarize the reviews by selecting a subset or rewriting some
of the original sentences from the reviews to capture the main
points as in the classic text summarization. Our task is performed
in three steps: (1) mining product features that have been
commented on by customers; (2) identifying opinion sentences in
each review and deciding whether each opinion sentence is
positive or negative; (3) summarizing the results. This paper
proposes several novel techniques to perform these tasks. Our
experimental results using reviews of a number of products sold
online demonstrate the effectiveness of the techniques.
Categories and Subject Descriptors
H.2.8 [Database Management]: Database Applications – data
mining. I.2.7 [Artificial Intelligence]: Natural Language
Processing – text analysis.
General Terms
Algorithms, Experimentation, Human Factors.
Keywords
Text mining, sentiment classification, summarization, reviews.
1. INTRODUCTION
With the rapid expansion of e-commerce, more and more products
are sold on the Web, and more and more people are also buying
products online. In order to enhance customer satisfaction and
shopping experience, it has become a common practice for online
merchants to enable their customers to review or to express
opinions on the products that they have purchased. With more and
more common users becoming comfortable with the Web, an
increasing number of people are writing reviews. As a result, the
number of reviews that a product receives grows rapidly. Some
popular products can get hundreds of reviews at some large
merchant sites. Furthermore, many reviews are long and have
only a few sentences containing opinions on the product. This
makes it hard for a potential customer to read them to make an
informed decision on whether to purchase the product. If he/she
only reads a few reviews, he/she may get a biased view. The large
number of reviews also makes it hard for product manufacturers
to keep track of customer opinions of their products. For a product
manufacturer, there are additional difficulties because many
merchant sites may sell its products, and the manufacturer may
(almost always) produce many kinds of products.
In this research, we study the problem of generating feature-based
summaries of customer reviews of products sold online. Here,
features broadly mean product features (or attributes) and
functions. Given a set of customer reviews of a particular product,
the task involves three subtasks: (1) identifying features of the
product that customers have expressed their opinions on (called
product features); (2) for each feature, identifying review
sentences that give positive or negative opinions; and (3)
producing a summary using the discovered information.
Let us use an example to illustrate a feature-based summary.
Assume that we summarize the reviews of a particular digital
camera, digital_camera_1. The summary looks like the following:
Digital_camera_1:
    Feature: picture quality
        Positive: 253
            <individual review sentences>
        Negative: 6
            <individual review sentences>
    Feature: size
        Positive: 134
            <individual review sentences>
        Negative: 10
            <individual review sentences>
Figure 1: An example summary

In Figure 1, picture quality and (camera) size are the product
features. There are 253 customer reviews that express positive
opinions about the picture quality, and only 6 that express
negative opinions. The <individual review sentences> link points
to the specific sentences and/or the whole reviews that give
positive or negative comments about the feature.
With such a feature-based summary, a potential customer can
easily see how the existing customers feel about the digital
camera. If he/she is very interested in a particular feature, he/she
can drill down by following the <individual review sentences>
link to see why existing customers like it and/or what they
complain about. For a manufacturer, it is possible to combine
summaries from multiple merchant sites to produce a single report
for each of its products.
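To make the summary format concrete, the following is a minimal sketch (ours, not part of the paper's FBS system) of how sentence-level results could be aggregated into the structure of Figure 1; the example triples and function names are illustrative.

from collections import defaultdict

# Hypothetical per-sentence results from steps (1) and (2):
# (feature, opinion sentence, orientation) triples.
results = [
    ("picture quality", "The pictures are very clear.", "positive"),
    ("size", "While light, it will not easily fit in pockets.", "negative"),
]

def build_summary(triples):
    """Group opinion sentences by product feature and orientation."""
    summary = defaultdict(lambda: {"positive": [], "negative": []})
    for feature, sentence, orientation in triples:
        summary[feature][orientation].append(sentence)
    return summary

def print_summary(product, summary):
    """Render the summary in the layout of Figure 1."""
    print(product + ":")
    for feature, opinions in summary.items():
        print("  Feature: " + feature)
        for orientation in ("positive", "negative"):
            sentences = opinions[orientation]
            print("    %s: %d" % (orientation.capitalize(), len(sentences)))
            for s in sentences:
                print("      <" + s + ">")

print_summary("digital_camera_1", build_summary(results))
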
Our task is different from traditional text summarization [15, 39,
36] in a number of ways. First of all, a summary in our case is
structured rather than another (but shorter) free text document as
produced by most text summarization systems. Second, we are
only interested in features of the product that customers have
opinions on and also whether the opinions are positive or
negative. We do not summarize the reviews by selecting or
rewriting a subset of the original sentences from the reviews to
capture their main points as in traditional text summarization.
As indicated above, our task is performed in three main steps:
(1) Mining product features that have been commented on by
customers. We make use of both data mining and natural
language processing techniques to perform this task. This
part of the study has been reported in [19]. However, for
completeness, we will summarize its techniques in this paper
and also present a comparative evaluation.
(2) Identifying opinion sentences in each review and deciding
whether each opinion sentence is positive or negative. Note
that these opinion sentences must contain one or more
product features identified above. To decide the opinion
orientation of each sentence (whether the opinion expressed
in the sentence is positive or negative), we perform three
subtasks. First, a set of adjective words (which are normally
used to express opinions) is identified using a natural
language processing method. These words are also called
opinion words in this paper. Second, for each opinion word,
we determine its semantic orientation, e.g., positive or
negative. A bootstrapping technique is proposed to perform
this task using WordNet [29, 12]. Finally, we decide the
opinion orientation of each sentence. An effective algorithm
is also given for this purpose.
(3) Summarizing the results. This step aggregates the results of
previous steps and presents them in the format of Figure 1.
Section 3 presents the detailed techniques for performing these
tasks. A system, called FBS (Feature-Based Summarization), has
also been implemented. Our experimental results with a large
number of customer reviews of 5 products sold online show that
FBS and its techniques are highly effective.
2. RELATED WORK
Our work is closely related to Dave, Lawrence and Pennock’s
work in [9] on semantic classification of reviews. Using available
training corpus from some Web sites, where each review already
has a class (e.g., thumbs-up and thumbs-downs, or some other
quantitative or binary ratings), they designed and experimented with a
number of methods for building sentiment classifiers. They show
that such classifiers perform quite well with test reviews. They
also used their classifiers to classify sentences obtained from Web
search results returned by a search engine with a product name as
the search query. However, the performance was
limited because a sentence contains much less information than a
review. Our work differs from theirs in three main aspects: (1)
Our focus is not on classifying each review as a whole but on
classifying each sentence in a review. Within a review some
sentences may express positive opinions about certain product
features while some other sentences may express negative
opinions about some other product features. (2) The work in [9]
does not mine product features from reviews on which the
reviewers have expressed their opinions. (3) Our method does not
need a corpus to perform the task.
In [30], Morinaga et al. compare reviews of different products in
one category to find the reputation of the target product.
However, their work does not summarize reviews, and it does not mine
product features on which the reviewers have expressed their
opinions. Although they do find some frequent phrases indicating
reputations, these phrases may not be product features (e.g.,
“doesn’t work”, “benchmark result” and “no problem(s)”). In [5],
Cardie et al. discuss opinion-oriented information extraction. They
aim to create summary representations of opinions to perform
question answering. They propose to use opinion-oriented
“scenario templates” to act as summary representations of the
opinions expressed in a document, or a set of documents. Our task
is different. We aim to identify product features and user opinions
on these features to automatically produce a summary. Also, no
template is used in our summary generation.
Our work is also related to but different from subjective genre
classification, sentiment classification, text summarization and
terminology finding. We discuss each of them below.
2.1 Subjective Genre Classification
Genre classification classifies texts into different styles, e.g.,
“editorial”, “novel”, “news”, “poem” etc. Although some
techniques for genre classification can recognize documents that
express opinions [23, 24, 14], they do not tell whether the
opinions are positive or negative. In our work, we need to
determine whether an opinion is positive or negative and to
perform opinion classification at the sentence level rather than at
the document level.
A more closely related work is [17], in which the authors
investigate sentence subjectivity classification and conclude that
the presence and type of adjectives in a sentence are indicative of
whether the sentence is subjective or objective. However, their
work does not address our specific task of determining the
semantic orientations of those subjective sentences. Neither do
they find features on which opinions have been expressed.
2.2 Sentiment Classification
The works of Hearst [18] and Sack [35] on sentiment-based
classification of entire documents use models inspired by
cognitive linguistics. Das and Chen [8] use a manually crafted
lexicon in conjunction with several scoring methods to classify
stock postings on an investor bulletin. Huettner and Subasic [20]
also manually construct a discriminant-word lexicon and use
fuzzy logic to classify sentiments. Tong [41] generates sentiment
timelines. It tracks online discussions about movies and displays a
plot of the number of positive and negative sentiment messages
over time. Messages are classified by looking for specific phrases
that indicate the author’s sentiment towards the movie (e.g.,
“great acting”, “wonderful visuals”, “uneven editing”). Each
phrase must be manually added to a special lexicon and manually
tagged as indicating positive or negative sentiment. The lexicon is
domain dependent (e.g., movies) and must be rebuilt for each new
domain. In contrast, in our work, we only manually create a small
list of seed adjectives tagged with positive or negative labels. Our
seed adjective list is also domain independent. An effective
technique is proposed to grow this list using WordNet.
Turney’s work in [42] applies a specific unsupervised learning
technique based on the mutual information between document
phrases and the words “excellent” and “poor”, where the mutual
information is computed using statistics gathered by a search
engine. Pang et al. [33] examine several supervised machine
learning methods for sentiment classification of movie reviews
and conclude that machine learning techniques outperform the
method that is based on human-tagged features, although none of
the existing methods could handle sentiment classification with
reasonable accuracy. Our work differs from these works on
sentiment classification in that we perform classification at the
sentence level while they determine the sentiment of each
document. They also do not find features on which opinions have
been expressed, which is very important in practice.
2.3 Text Summarization
Existing text summarization techniques mainly fall into one of two
categories: template instantiation and passage extraction. Work in
the former framework includes [10, 39], which emphasizes the
identification and extraction of certain core entities and facts in
a document, which are packaged in a template. This framework
requires background knowledge in order to instantiate a template
to a suitable level of detail. Therefore, it is not domain or genre
independent [37, 38]. This is different from our work as our
techniques do not fill any template and are domain independent.
The passage extraction framework [e.g., 32, 25, 36] identifies
certain segments of the text (typically sentences) that are the most
representative of the document’s content. Our work is different in
that we do not extract representative sentences, but identify and
extract those specific product features and the opinions related to
them.
Boguraev and Kennedy [2] propose to find a few very prominent
expressions, objects or events in a document and use them to help
summarize the document. Our work is again different as we find
all product features in a set of customer reviews regardless of
whether they are prominent or not. Thus, our summary is not a
traditional text summary.
Most existing works on text summarization focus on a single
document. Some researchers also studied summarization of
multiple documents covering similar information. Their main
purpose is to summarize the similarities and differences in the
information content among these documents [27]. Our work is
related but quite different because we aim to find the key features
that are talked about in multiple reviews. We do not summarize
similarities and differences of reviews.
2.4 Terminology Finding
In terminology finding, there are basically two techniques for
discovering terms in corpora: symbolic approaches that rely on
syntactic description of terms, namely noun phrases, and
statistical approaches that exploit the fact that the words
composing a term tend to be found close to each other and
reoccurring [21, 22, 7, 6]. However, using noun phrases tends to
produce too many non-terms (low precision), while using
reoccurring phrases misses many low frequency terms, terms with
variations, and terms with only one word. Our association mining
based technique does not have these problems, and we can also
find infrequent features by exploiting the fact that we are only
interested in features that the users have expressed opinions on.
3. THE PROPOSED TECHNIQUES
Figure 2 gives the architectural overview of our opinion
summarization system.
The inputs to the system are a product name and an entry Web
page for all the reviews of the product. The output is the summary
of the reviews, like the one shown in the introduction section.
The system performs the summarization in three main steps (as
discussed before): (1) mining product features that have been
commented on by customers; (2) identifying opinion sentences in
each review and deciding whether each opinion sentence is
positive or negative; (3) summarizing the results. These steps are
performed in multiple sub-steps.
Given the inputs, the system first downloads (or crawls) all the
reviews and puts them in the review database. It then finds the
"hot" (or frequent) features on which many people have expressed
their opinions. After that, the opinion words are extracted using
the resulting frequent features, and the semantic orientations of
the opinion words are identified with the help of WordNet. Using
the extracted opinion words, the system then finds the infrequent
features. In the last two steps, the orientation of each opinion
sentence is identified and a final summary is produced. Note that
POS tagging is the part-of-speech tagging [28] from natural
language processing, which helps us to find opinion features.

Figure 2: Feature-based opinion summarization. (Pipeline: Crawl
Reviews → Review Database → POS Tagging → Frequent Feature
Identification → Feature Pruning → Frequent Features → Opinion Word
Extraction → Opinion Words → Opinion Orientation Identification and
Infrequent Feature Identification → Infrequent Features; Opinion
Sentence Orientation Identification → Summary Generation → Summary.)
Below, we discuss each of the sub-steps in turn.
3.1 Part-of-Speech Tagging (POS)
Product features are usually nouns or noun phrases in review
sentences. Thus the part-of-speech tagging is crucial. We used the
NLProcessor linguistic parser [31] to parse each review to split
text into sentences and to produce the part-of-speech tag for each
word (whether the word is a noun, verb, adjective, etc). The
process also identifies simple noun and verb groups (syntactic
chunking). The following shows a sentence with POS tags.
<S> <NG><W C='PRP' L='SS' T='w' S='Y'> I </W> </NG>
<VG> <W C='VBP'> am </W><W C='RB'> absolutely
</W></VG> <W C='IN'> in </W> <NG> <W C='NN'> awe
</W> </NG> <W C='IN'> of </W> <NG> <W C='DT'> this
</W> <W C='NN'> camera </W></NG><W C='.'> .
</W></S>
NLProcessor generates XML output. For instance, <W C=‘NN’>
indicates a noun and <NG> indicates a noun group/noun phrase.
Each sentence is saved in the review database along with the POS
tag information of each word in the sentence. A transaction file is
then created for the generation of frequent features in the next
step. In this file, each line contains words from one sentence,
which includes only the identified nouns and noun phrases of the
sentence. Other components of the sentence are unlikely to be
product features. Some pre-processing of words is also performed,
which includes removal of stopwords, stemming and fuzzy
matching. Fuzzy matching is used to deal with word variants and
misspellings [19].
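As an illustrative stand-in for the NLProcessor pipeline described above (not the authors' implementation), the sketch below uses NLTK to split sentences, tag parts of speech, chunk noun groups, and emit the per-sentence transactions of stemmed nouns; the fuzzy matching of word variants from [19] is omitted.

import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

# Assumes the punkt, averaged_perceptron_tagger and stopwords NLTK data
# packages are installed (via nltk.download).
stemmer = PorterStemmer()
stop = set(stopwords.words("english"))

# Noun groups: one or more consecutive nouns, optionally preceded by adjectives.
grammar = "NG: {<JJ>*<NN.*>+}"
chunker = nltk.RegexpParser(grammar)

def sentence_transactions(review_text):
    """Yield, per sentence, the set of (stemmed) nouns/noun phrases --
    one transaction line for the frequent-feature mining step."""
    for sent in nltk.sent_tokenize(review_text):
        tagged = nltk.pos_tag(nltk.word_tokenize(sent))
        tree = chunker.parse(tagged)
        items = set()
        for subtree in tree.subtrees(filter=lambda t: t.label() == "NG"):
            words = [w.lower() for w, tag in subtree.leaves()
                     if tag.startswith("NN") and w.lower() not in stop]
            if words:
                items.add(" ".join(stemmer.stem(w) for w in words))
        yield items

text = "The pictures are very clear. The battery life is too short."
for transaction in sentence_transactions(text):
    print(transaction)
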
3.2 Frequent Features Identification
This sub-step identifies product features on which many people
have expressed their opinions. Before discussing frequent feature
identification, we first give some example sentences from the
reviews to describe what kinds of opinions we will be handling.
Since our system aims to find what people like and
dislike about a given product, how to find the product features
that people talk about is the crucial step. However, due to the
difficulty of natural language understanding, some types of
sentences are hard to deal with. Let us see an easy and a hard
sentence from the reviews of a digital camera:
“The pictures are very clear.”
In this sentence, the user is satisfied with the picture quality of
the camera; picture is the feature that the user talks about. While
the feature in this sentence is mentioned explicitly, some features
are implicit and hard to find. For example,
“While light, it will not easily fit in pockets.”
This customer is talking about the size of the camera, but the word
size does not appear in the sentence. In this work, we focus on
finding features that appear explicitly as nouns or noun phrases in
the reviews. We leave finding implicit features to our future work.
Here, we focus on finding frequent features, i.e., those features
that are talked about by many customers (finding infrequent
features will be discussed later). For this purpose, we use
association mining [1] to find all frequent itemsets. In our context,
an itemset is simply a set of words or a phrase that occurs together
in some sentences.
The main reason for using association mining is the
following observation. It is common that a customer review
contains many things that are not directly related to product
features. Different customers usually have different stories.
However, when they comment on product features, the words that
they use converge. Thus using association mining to find frequent
itemsets is appropriate because those frequent itemsets are likely
to be product features. Those noun/noun phrases that are
infrequent are likely to be non-product features.
We run the association miner CBA [26], which is based on the
Apriori algorithm in [1], on the transaction set of noun/noun
phrases produced in the previous step. Each resulting frequent
itemset is a possible feature. In our work, we define an itemset as
frequent if it appears in more than 1% (minimum support) of the
review sentences. The generated frequent itemsets are also called
candidate frequent features in this paper.
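As a hedged illustration of this step, the sketch below mines candidate frequent features with a plain Apriori implementation from mlxtend as a stand-in for the CBA miner, using the paper's 1% minimum support; the example transactions are made up.

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori

# Illustrative transactions: the noun words of each review sentence,
# as produced by the POS-tagging step (Section 3.1).
transactions = [
    ["picture", "quality"],
    ["picture"],
    ["battery", "life"],
    ["picture", "quality", "camera"],
    ["size"],
]

# One-hot encode the transactions and run Apriori.
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions),
                      columns=te.columns_)

# An itemset is frequent if it appears in more than 1% (minimum support)
# of the review sentences, as in the paper.
candidate_features = apriori(onehot, min_support=0.01, use_colnames=True)
print(candidate_features.sort_values("support", ascending=False))
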
However, not all candidate frequent features generated by
association mining are genuine features. Two types of pruning are
used to remove those unlikely features.
Compactness pruning: This method checks features that contain
at least two words, which we call feature phrases, and removes
those that are likely to be meaningless.
The association mining algorithm does not consider the position
of an item (or word) in a sentence. However, in a sentence, words
that appear together in a specific order are more likely to be
meaningful phrases. Therefore, some of the frequent feature
phrases generated by association mining may not be genuine
features. Compactness pruning aims to prune those candidate
features whose words do not appear together in a specific order.
See [19] for the detailed definition of compactness and also the
pruning procedure.
Redundancy pruning: In this step, we focus on removing
redundant features that contain single words. To describe the
meaning of redundant features, we use the concept of p-support
(pure support). p-support of feature ftr is the number of sentences
that ftr appears in as a noun or noun phrase, and these sentences
must contain no feature phrase that is a superset of ftr.
We use a minimum p-support value to prune those redundant
features. If a feature has a p-support lower than the minimum p-
support (in our system, we set it to 3) and the feature is a subset of
another feature phrase (which suggests that the feature alone may
not be interesting), it is pruned. For instance, life by itself is not a
useful feature while battery life is a meaningful feature phrase.
See [19] for more explanations.
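Based only on the definition of p-support given above, a minimal sketch of redundancy pruning might look like the following; the sentence representation (each sentence as the set of nouns and noun phrases it contains) and the helper names are our assumptions.

def p_support(feature, sentences, feature_phrases):
    """p-support of `feature`: the number of sentences in which it appears
    as a noun/noun phrase and which contain no candidate feature phrase
    that is a superset of it."""
    supersets = [p for p in feature_phrases
                 if p != feature and set(feature.split()) < set(p.split())]
    return sum(1 for nps in sentences
               if feature in nps and not any(p in nps for p in supersets))

def redundancy_prune(single_word_features, feature_phrases, sentences,
                     min_p_support=3):
    """Prune a single-word feature if its p-support is below the threshold
    and it is a subset of some multi-word feature phrase."""
    kept = []
    for f in single_word_features:
        subsumed = any(f in p.split() for p in feature_phrases)
        low = p_support(f, sentences, feature_phrases) < min_p_support
        if not (low and subsumed):
            kept.append(f)
    return kept

# Each sentence is represented by the set of nouns and noun phrases it contains.
sentences = [
    {"battery", "life", "battery life"},
    {"battery", "life", "battery life", "camera"},
    {"life"},
    {"picture", "quality", "picture quality"},
    {"picture"},
    {"picture", "camera"},
    {"picture"},
]
# "life" is pruned (low p-support and subsumed by "battery life"); "picture" is kept.
print(redundancy_prune(["life", "picture"],
                       ["battery life", "picture quality"], sentences))
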
3.3 Opinion Words Extraction
We now identify opinion words. These are words that are
primarily used to express subjective opinions. Clearly, this is
related to existing work on distinguishing sentences used to
express subjective opinions from sentences used to objectively
describe some factual information [43]. Previous work on
subjectivity [44, 4] has established a positive statistically
significant correlation with the presence of adjectives. Thus the
presence of adjectives is useful for predicting whether a sentence
is subjective, i.e., expressing an opinion. This paper uses
adjectives as opinion words. We also limit the opinion words
extraction to those sentences that contain one or more product
features, as we are only interested in customers’ opinions on these
product features. Let us first define an opinion sentence.
Definition: opinion sentence
If a sentence contains one or more product features and one or
more opinion words, then the sentence is called an opinion
sentence.
We extract opinion words in the following manner (Figure 3):
for each sentence in the review database
    if it contains a frequent feature, extract all the adjective
        words as opinion words
    for each feature in the sentence
        the nearby adjective is recorded as its effective opinion
        /* a nearby adjective is the adjacent adjective that modifies
           the noun/noun phrase that is the frequent feature */
Figure 3: Opinion word extraction
For example, horrible is the effective opinion of strap in “The
strap is horrible and gets in the way of parts of the camera you
need access to.” Effective opinions will be useful when we
predict the orientation of opinion sentences.
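A rough rendering of the Figure 3 procedure in code, assuming sentences are already POS-tagged, is sketched below; using the nearest adjective as the effective opinion is a simplification of the paper's "adjacent adjective that modifies the feature", and all names are ours.

def extract_opinions(tagged_sentence, frequent_features):
    """For one POS-tagged sentence (list of (word, tag) pairs), return the
    adjectives it contains (opinion words) and, for each frequent feature
    present, the nearest adjective as its effective opinion."""
    words = [w.lower() for w, _ in tagged_sentence]
    adjectives = [(i, w.lower()) for i, (w, t) in enumerate(tagged_sentence)
                  if t.startswith("JJ")]
    opinion_words = [w for _, w in adjectives]

    effective = {}
    for feature in frequent_features:
        head = feature.split()[-1]          # head noun of the feature phrase
        if head in words and adjectives:
            pos = words.index(head)
            _, nearest = min(adjectives, key=lambda a: abs(a[0] - pos))
            effective[feature] = nearest
    return opinion_words, effective

tagged = [("The", "DT"), ("strap", "NN"), ("is", "VBZ"), ("horrible", "JJ")]
print(extract_opinions(tagged, ["strap"]))
# (['horrible'], {'strap': 'horrible'})
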
3.4 Orientation Identification for Opinion
Words
For each opinion word, we need to identify its semantic
orientation, which will be used to predict the semantic orientation
of each opinion sentence. The semantic orientation of a word
indicates the direction that the word deviates from the norm for its
semantic group. Words that encode a desirable state (e.g.,
beautiful, awesome) have a positive orientation, while words that
represent undesirable states have a negative orientation (e.g.,
disappointing). While orientations apply to many adjectives, there
are also those adjectives that have no orientation (e.g., external,
digital) [17]. In this work, we are interested in only positive and
negative orientations.
Unfortunately, dictionaries and similar sources, e.g., WordNet
[29], do not include semantic orientation information for each
word. Hatzivassiloglou and McKeown [16] use a supervised
learning algorithm to infer the semantic orientation of adjectives
from constraints on conjunctions. Although their method achieves
high precision, it relies on a large corpus, and needs a large
amount of manually tagged training data. In Turney’s work [42],
the semantic orientation of a phrase is calculated as the mutual
information between the given phrase and the word “excellent”
minus the mutual information between the given phrase and the
word “poor”. The mutual information is estimated by issuing
queries to a search engine and noting the number of hits. The
paper [42], however, does not report the results of semantic
orientations of individual words/phrases. Instead it only gives the
classification results of reviews. We do not use these techniques
in this paper as both works rely on statistical information from a
rather big corpus. Their methods are also inefficient. For example,
in [42], for each word or phrase, a Web search and a substantial
processing of the returned results are needed.
In this research, we propose a simple and yet effective method by
utilizing the adjective synonym set and antonym set in WordNet
[29] to predict the semantic orientations of adjectives.
In WordNet, adjectives are organized into bipolar clusters, as
illustrated in Figure 4. The cluster for fast/slow consists of two
half clusters, one for senses of fast and one for senses of slow.
Each half cluster is headed by a head synset, in this case fast and
its antonym slow. Following the head synset are the satellite
synsets, which represent senses that are similar to the sense of the
head adjective. The other half cluster is headed by the reverse
antonymous pair slow/fast, followed by satellite synsets for senses
of slow [12].
In general, adjectives share the same orientation as their
synonyms and opposite orientations as their antonyms. We use
this idea to predict the orientation of an adjective. To do this, the
synset of the given adjective and the antonym set are searched. If
a synonym/antonym has a known orientation, then the orientation
of the given adjective can be set accordingly. As the synset
of an adjective always contains a sense that links to the head
synset, the search range is rather large. Given enough seed
adjectives with known orientations, we can predict the orientations
of almost all the adjective words in the review collection.
Thus, our strategy is to use a set of seed adjectives whose
orientations we know, and then grow this set by searching WordNet.
To have a reasonably broad range of adjectives, we first manually
come up with a set of very common adjectives (in our experiment, we
used 30) as the seed list, e.g., positive adjectives: great,
fantastic, nice, cool; and negative adjectives: bad, dull.
Then we resort to WordNet to predict the orientations of all the
adjectives in the opinion word list. Once an adjective’s orientation
is predicted, it is added to the seed list. Therefore, the list grows
in the process.
The complete procedure for predicting semantic orientations for
all the adjectives in the opinion list is shown in Figure 5.
Procedure OrientationPrediction takes the adjective seed list and
a set of opinion words whose orientations need to be determined.
Figure 4: Bipolar adjective structure (links denote similarity
within a half cluster and antonymy between the head adjectives).
Fast half cluster: fast, swift, prompt, alacritous, quick, rapid.
Slow half cluster: slow, dilatory, sluggish, leisurely, tardy,
laggard.
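Figure 5 itself is not reproduced in this excerpt. The sketch below illustrates the kind of seed-growing prediction described above, using NLTK's WordNet interface; the seed list shown is only a small subset of the 30 seeds used in the paper, and the search covers only direct synonyms and antonyms, a narrower range than the full bipolar-cluster traversal.

from nltk.corpus import wordnet as wn
# Assumes the WordNet corpus is installed (nltk.download('wordnet')).

# Illustrative seed adjectives with known orientations.
seeds = {"great": "positive", "fantastic": "positive", "nice": "positive",
         "cool": "positive", "bad": "negative", "dull": "negative"}

def predict_orientations(opinion_words, seeds):
    """Grow the seed list through WordNet synonym/antonym links:
    synonyms share a seed's orientation, antonyms take the opposite."""
    orientations = dict(seeds)
    opposite = {"positive": "negative", "negative": "positive"}
    remaining = set(w for w in opinion_words if w not in orientations)

    changed = True
    while changed and remaining:
        changed = False
        for word in list(remaining):
            for synset in wn.synsets(word, pos=wn.ADJ):
                for lemma in synset.lemmas():
                    name = lemma.name().lower()
                    if name in orientations:
                        orientations[word] = orientations[name]
                    for ant in lemma.antonyms():
                        ant_name = ant.name().lower()
                        if ant_name in orientations:
                            orientations[word] = opposite[orientations[ant_name]]
                if word in orientations:
                    break
            if word in orientations:
                remaining.discard(word)
                changed = True   # newly labelled words can help label the rest
    return orientations

print(predict_orientations(["horrible", "amazing", "awful"], seeds))
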

The authors extract infrequent features using the procedure in Figure 6:The authors use the nearest noun/noun phrase as the noun/noun phrase that the opinion word modifies because that is what happens most of the time.