Proceedings ArticleDOI

A Framework for Automated Rating of Online Reviews Against the Underlying Topics

TL;DR: A framework for extracting prevalent topics from online reviews and automatically rating them on a 5-star scale is presented, including linguistic pre-processing, topic modelling, text classification, sentiment analysis, and rating.
Abstract: Even though most online review systems offer star rating in addition to free text reviews, this only applies to the overall review. However, different users may have different preferences in relation to different aspects of a product or a service and may struggle to extract relevant information from a massive amount of consumer reviews available online. In this paper, we present a framework for extracting prevalent topics from online reviews and automatically rating them on a 5-star scale. It consists of five modules, including linguistic pre-processing, topic modelling, text classification, sentiment analysis, and rating. Topic modelling is used to extract prevalent topics, which are then used to classify individual sentences against these topics. A state-of-the-art word embedding method is used to measure the sentiment of each sentence. The two types of information associated with each sentence, its topic and sentiment, are combined to aggregate the sentiment associated with each topic. The overall topic sentiment is then projected onto the 5-star rating scale. We use a dataset of Airbnb online reviews to demonstrate a proof of concept. The proposed framework is simple and fully unsupervised. It is also domain independent, and, therefore, applicable to any other domains of products and services.

Summary (2 min read)

1 INTRODUCTION

  • Online reviews are valuable sources of relevant information that can support users in their decision making.
  • The process of manually annotating large training datasets is labour- and time-intensive.
  • Moreover, such approaches are not readily portable to other domains.
  • A variety of studies on rating online reviews have been published for a wide range of domains.
  • These studies did not take into account the sentiment or semantic content of text reviews.

2 CHALLENGES

  • The manual annotation process is time- and labour-intensive.
  • The vast majority of sentiment classification approaches rely on the bag-of-words model, which disregards context, grammar and even word order [3].
  • Any implementation should ideally be portable from one domain to another.

3 FRAMEWORK DESIGN AND METHODOLOGY

  • The authors present a framework for rating online reviews, which extracts the underlying topics automatically and rates each review against these topics.
  • The framework consists of five modules, including linguistic pre-processing, topic modeling, text classification, sentiment analysis and rating (Fig. 1).
  • The following subsections provide details about each module.

3.1 Linguistic Pre-processing

  • The authors have previously discussed the challenges associated with automated analysis of online reviews, including the lack of formal structure and informal style of writing.
  • To prepare the raw text for further analysis, including topic modelling and sentiment analysis, the authors employed linguistic pre-processing steps [3] [12] such as removing stop words and converting slang and abbreviations to the corresponding words.

3.2 Topic Modelling

  • The authors use Latent Dirichlet Allocation (LDA), an unsupervised probabilistic method that is widely used to automatically discover underlying topics from a set of text documents based on word distribution [6][7].
  • To demonstrate the approach, the authors used a publicly available dataset of Airbnb online reviews [13].
  • Each review may be associated with multiple topics.
  • Intuitively, according to the given words, one may assume that the topic T1 is related to amenities, whereas T2 and T3 are more about the location.
  • Some automatically extracted topics might be similar (e.g., both T2 and T3 are related to the location aspect), and the authors aggregated such topics manually.

3.4 Sentiment Analysis

  • The authors operate under an assumption that the rating is correlated with the sentiment strength.
  • To calculate the overall sentiment, each sentence is analyzed separately using the weighted word embeddings method [3] [5].
  • Negation handling: negation words and punctuation marks are used to determine the context affected by negation.
  • To automatically identify such words, the authors use NLTK toolkit for parts-of-speech tagging [22].
  • The sentiment score also reflects the strength of the overall sentiment, e.g. the first sentence and the third sentence are both positive, but the sentiment of the first sentence is stronger than that of the third sentence.

3.5 Topic Rating

  • Once the topic model has been extracted from a corpus of reviews, each sentence is classified into an appropriate topic.
  • To rate a review on a 5-star scale (1 star being very negative and 5 stars being very positive), the authors first normalize the sentiment score of each sentence using the minimum and maximum sentiment scores within the text review.
  • The normalization effectively maps the sentiment of each sentence to a real number between 0 and 5.
  • For each topic in turn, the authors aggregate the normalized scores of all sentences within the topic to obtain the average score.

4.1 Data

  • The authors used the Boston Airbnb Open Data, a publicly available set of reviews [13].
  • As part of the Airbnb Inside initiative, this dataset describes the listing activity of home stays in Boston, MA.
  • Google News Dataset (Word2vec Model): Google’s pre-trained vector set [4] is used in the sentiment analysis module.
  • The model contains 300-dimensional vectors for 3 million words and phrases.

4.2 Implementation and Results

  • The core algorithms were implemented in Python with the NLTK toolkit [22] and the Gensim library [23].
  • All reviews were stored in MongoDB [24] for easy access and processing (see the sketch after this list).
  • The front-end pages used to visualize the results were developed with HTML5, JavaScript and CSS.
  • Different topics are highlighted in different colors.
  • The overall ratings of the given review in terms of location and amenities were calculated as 4 stars and 3 stars, respectively.
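Since the paper names MongoDB but does not describe its schema or access code, the following is a minimal illustrative sketch of storing and querying reviews with pymongo; the database, collection and field names are assumptions.

```python
# Illustrative sketch only: the paper stores reviews in MongoDB [24] but does
# not describe the schema; "airbnb", "reviews" and the field names are assumed.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
collection = client["airbnb"]["reviews"]

collection.insert_one({"listing_id": 12345, "comments": "Great location, tiny room."})
for doc in collection.find({"listing_id": 12345}):
    print(doc["comments"])
```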

5 CONCLUSIONS

  • The authors presented a framework for rating online reviews against automatically extracted underlying topics.
  • The proposed framework consists of five modules: (1) linguistic pre-processing, (2) topic modeling, (3) sentence classification against the topics extracted in the previous module, (4) sentiment analysis, (5) rating against the topics based on the sentiment of the corresponding sentences.
  • The proposed method is unsupervised, i.e. does not require an annotated training dataset.
  • It is also domain independent, and, therefore, can be applied across different domains for which online reviews are available.



A Framework for Automated Rating of Online Reviews against the Underlying Topics
Xiangfeng Dai, Irena Spasić, Frédéric Andrès
ABSTRACT: Even though most online review systems offer star rating in addition to free text reviews, this only applies to
the overall review. However, different users may have different preferences in relation to different aspects of a product or a
service and may struggle to extract relevant information from a massive amount of consumer reviews available online. In this
paper, we present a framework for extracting prevalent topics from online reviews and automatically rating them on a 5-star
scale. It consists of five modules, including linguistic pre-processing, topic modelling, text classification, sentiment analysis,
and rating. Topic modelling is used to extract prevalent topics, which are then used to classify individual sentences against
these topics. A state-of-the-art word embedding method is used to measure the sentiment of each sentence. The two types of
information associated with each sentence, its topic and sentiment, are combined to aggregate the sentiment associated
with each topic. The overall topic sentiment is then projected onto the 5-star rating scale. We use a dataset of Airbnb online
reviews to demonstrate a proof of concept. The proposed framework is simple and fully unsupervised. It is also domain
independent, and, therefore, applicable to any other domains of products and services.
KEYWORDS: natural language processing, topic modelling, machine learning, visualization, sentiment analysis, latent
Dirichlet allocation, weighted word embeddings, data mining, big data
1 INTRODUCTION
Online reviews are valuable sources of relevant information that can support users in their decision making. An estimated
92% of online shoppers read online reviews, 88% trust online reviews as much as personal recommendations and they
typically read more than 10 reviews to form an opinion [1]. The objective of this study is to propose a framework aimed at
improving user experience when faced with an otherwise unmanageable amount of online reviews. This is achieved by
automatically extracting the underlying topics (e.g. a review of an item of clothing may contain different opinions about
different aspects of the product such as fit, fabric, color or pattern, craftsmanship, etc.) and rating reviews with respect to
these topics. The rating framework combines algorithms for topic modelling, text classification, and sentiment analysis.
Most approaches to rating of online reviews use supervised learning approaches. For instance, Hu and Liu [2] manually
annotated 2,006 positive words and 4,783 negative words to train classifiers used to analyze customer reviews. Similarly,
Ganu et al. [9] rated 52,264 restaurant reviews after manually annotating a set of 3,400 sentences with category and sentiment
labels in order to train an SVM classifier. However, the process of manually annotating large training datasets is labour- and
time-intensive. Moreover, such approaches are not readily portable to other domains.
A variety of studies on rating online reviews have been published for a wide range of domains. Chevalier et al. [8]
predicted ratings of online book reviews. Dellarocas and Zhang [10] demonstrated a case of rating movie reviews. However,
these studies did not take into account the sentiment or semantic content of text reviews. Furthermore, the studies [9] [10]
[11] [19] [25] [26] [27] [28] [29] focused on predicting ratings in a specific domain such as: restaurants, tourism, movies,

hotels, healthcare, etc. These approaches are domain-dependent and cannot be easily implemented and transferred to other
products and services.
2 CHALLENGES
Large Volume: The large volume of online reviews creates significant information overload [2] [9] [10] [16] [17] [18]
[20] [21]. It is challenging to uncover underlying topics from a massive amount of online reviews and especially to rate
them against these topics.
Informality: Online reviews are informal documents in terms of style and structure [2] [3] [9] [12] [18] [21]. The
language used may contain abbreviations, slang, spelling mistakes, typographical errors, special characters, hyperlinks,
redundant whitespaces, etc.
Supervision: Sentiment analysis plays an important role in predicting ratings from text reviews [2] [9] [12] [18].
Supervised and semi-supervised classification methods require a large amount of manually annotated instances to train a
sentiment classifier. The manual annotation process is time- and labour-intensive.
Context-awareness: The vast majority of sentiment classification approaches rely on the bag-of-words model, which
disregards context, grammar and even word order [3]. Approaches that analyse the sentiment based on how words
compose the meaning of longer phrases have shown better results [31], but they incur an additional annotation overhead.
Domain independence: Any implementation should ideally be portable from one domain to another. In particular, the
performance should not depend significantly on lexical resources that need to be hand-crafted for a particular domain
[18] [19].
3 FRAMEWORK DESIGN AND METHODOLOGY
We present a framework for rating online reviews, which extracts the underlying topics automatically and rates each review
against these topics. The framework consists of five modules, including linguistic pre-processing, topic modeling, text
classification, sentiment analysis and rating (Fig. 1). The following subsections provide details about each module.
Figure 1: Framework for Rating Online Reviews

3.1 Linguistic Pre-processing
We have previously discussed the challenges associated with automated analysis of online reviews, including the lack of
formal structure and informal style of writing. To prepare the raw text for further analysis, including topic modelling and
sentiment analysis, we employed the following linguistic pre-processing steps [3] [12]; a minimal code sketch of these steps follows the list:
• Removing stop words.
• Correcting spelling mistakes and typographical errors.
• Converting slang and abbreviations to the corresponding words.
• Stemming to aggregate words with related meaning.
• Tokenization.
• Removing punctuation, special characters, hyperlinks, etc.
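A minimal sketch of these steps with NLTK; the paper does not name its exact tools, so the slang dictionary below is a toy stand-in and the spelling-correction step is omitted.

```python
# Sketch of the pre-processing steps above (assumptions: NLTK for tokenization,
# stop words and stemming; a toy slang map; spelling correction omitted).
import re

import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)

STOP_WORDS = set(stopwords.words("english"))
STEMMER = PorterStemmer()
SLANG = {"gr8": "great", "thx": "thanks"}  # illustrative slang/abbreviation map

def preprocess(review: str) -> list[str]:
    text = re.sub(r"https?://\S+", " ", review.lower())  # remove hyperlinks
    tokens = word_tokenize(text)
    tokens = [SLANG.get(t, t) for t in tokens]           # expand slang/abbreviations
    tokens = [t for t in tokens if t.isalpha()]          # drop punctuation, special characters
    return [STEMMER.stem(t) for t in tokens if t not in STOP_WORDS]

print(preprocess("Gr8 location!! see http://example.com"))
```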
3.2 Topic Modelling
In this module, we use Latent Dirichlet Allocation (LDA), an unsupervised probabilistic method that is widely used to
automatically discover underlying topics from a set of text documents based on word distribution [6][7]. To demonstrate the
approach, we used a publicly available dataset of Airbnb online reviews [13]. Each topic is represented as a collection of
words with Dirichlet distribution [7]. Each review may be associated with multiple topics. Table 1 shows three examples of
topics represented by 10 most relevant words within a topic. Intuitively, according to the given words, one may assume that
the topic T1 is related to amenities, whereas T2 and T3 are more about the location.
Table 1: Example of LDA topic modeling

T1: 0.033*bathroom + 0.027*floor + 0.023*unit + 0.021*door + 0.017*building + 0.016*fine + 0.016*location + 0.015*bedroom + 0.014*air + 0.014*people

T2: 0.057*station + 0.055*walk + 0.048*arrival + 0.044*house + 0.043*day + 0.036*subway + 0.031*center + 0.026*city + 0.026*train + 0.023*neighborhood

T3: 0.133*location + 0.036*distance + 0.035*joe + 0.026*walk + 0.022*check + 0.020*neighborhood + 0.018*city + 0.017*close + 0.016*airport + 0.015*convenient
The number of topics is an input parameter to the LDA method, which is related to their coverage and their
comprehensibility. In a series of experiments and manual inspection of the generated topics, we decided to restrict the
number of topics to 10 and the number of feature words to 3000 most frequent ones [30]. Some automatically extracted
topics might be similar (e.g., both T2 and T3 are related to the location aspect), and we aggregated such topics manually.
Overall, we arranged 10 topics into four themes: Location, Amenities, Family-friendliness and Other.
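This module can be sketched with Gensim's LDA implementation using the settings reported above (10 topics, 3000 most frequent feature words); `reviews`, a list of raw review strings, and the `preprocess` helper from the Section 3.1 sketch are assumed.

```python
# Sketch of topic modelling with Gensim LDA (assumes `reviews` and the
# `preprocess` helper from the Section 3.1 sketch).
from gensim import corpora, models

docs = [preprocess(r) for r in reviews]
dictionary = corpora.Dictionary(docs)
dictionary.filter_extremes(no_below=1, no_above=1.0, keep_n=3000)  # 3000 most frequent features
corpus = [dictionary.doc2bow(d) for d in docs]

lda = models.LdaModel(corpus, num_topics=10, id2word=dictionary, passes=10)
for topic_id, words in lda.show_topics(num_topics=10, num_words=10):
    print(topic_id, words)  # e.g. 0.033*"bathroom" + 0.027*"floor" + ...
```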
3.3 Topic Classification
Once the topic model has been generated, each sentence can be checked against the model to obtain information on topic
distribution, which can be used to classify the sentence into an appropriate topic [6][7] (see Table 2 for examples).

Table 2: Examples of Topic Classification

"This spot has a perfect location as it's nestled in a quiet neighborhood yet only a 3 min walk to the St Mary's Green Line C stop." → Location

"Dirty grubby apartment with single AC in one bedroom so rest of apartment was boiling, broken TV Screen, kitchen smelt of Gas, broken closet door, cracked tiles in bathroom that smelt damp, filthy stairs carpet, and dirty paint work." → Amenities

"The kitchenette was perfect for medium cooking as it has a stove, refrigerator and has cleaning supplies to cleanup after!" → Amenities
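A sketch of this classification step: the sentence's bag-of-words is scored against the trained LDA model and the most probable topic is kept, then mapped to one of the manually aggregated themes. The `THEMES` mapping below is illustrative; the actual topic-to-theme aggregation was done manually by the authors.

```python
# Sketch of sentence classification against the LDA model from Section 3.2
# (`lda`, `dictionary` and `preprocess` are assumed from the earlier sketches;
# the topic-id-to-theme mapping is illustrative, not the authors' actual one).
THEMES = {0: "Amenities", 1: "Location", 2: "Location"}

def classify_sentence(sentence: str) -> str:
    bow = dictionary.doc2bow(preprocess(sentence))
    topics = lda.get_document_topics(bow)
    if not topics:
        return "Other"
    topic_id, _ = max(topics, key=lambda t: t[1])
    return THEMES.get(topic_id, "Other")

print(classify_sentence("Only a 3 min walk to the St Mary's Green Line C stop."))
```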
3.4 Sentiment Analysis
We operate under an assumption that the rating is correlated with the sentiment strength. To calculate the overall sentiment,
each sentence is analyzed separately using the weighted word embeddings method [3] [5]. The word embedding algorithm
can capture semantic relationships from the surrounding words and has the advantage of being unsupervised, i.e. not
requiring manual annotation of a large training dataset. Once all sentences have been analyzed, the sentiment associated with
each topic is aggregated across the relevant sentences. The following steps provide more detail about our sentiment analysis
approach.
Step 1: The sentiment score of each word is calculated based on the cosine similarity between the word's vector and the vectors of the positive and negative sentiment seed words defined in [3].

$$\mathrm{similarity}(A, B) = \cos(\theta) = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}} \qquad (1)$$

where A and B are vectors of length n.
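A sketch of Step 1 under stated assumptions: the vectors come from the pre-trained Google News word2vec model (Section 4.1), and a word's sentiment score is taken as its average cosine similarity to positive seed words minus its average similarity to negative seed words. The seed lists below are illustrative; the actual seed words are defined in [3].

```python
# Sketch of word-level sentiment scoring via cosine similarity to seed words
# (assumptions: Google News word2vec vectors; toy seed lists standing in for
# the seed words defined in [3]; difference of average similarities as score).
import numpy as np
from gensim.models import KeyedVectors

w2v = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)
POS_SEEDS = ["good", "excellent", "great"]
NEG_SEEDS = ["bad", "terrible", "awful"]

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def word_sentiment(word: str) -> float:
    if word not in w2v:
        return 0.0  # out-of-vocabulary words carry no sentiment
    v = w2v[word]
    pos = np.mean([cosine(v, w2v[s]) for s in POS_SEEDS])
    neg = np.mean([cosine(v, w2v[s]) for s in NEG_SEEDS])
    return float(pos - neg)
```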
Step 2: Negation Handling. Negation words and punctuation marks are used to determine the context affected by negation.
We predefined a list of negation words such as “no” or “not”. If a negation word appears within a predefined distance (e.g.
one token before and two tokens after the negation word), the sentiment polarity of words within the negated context is
inverted.
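A minimal sketch of this rule, assuming the example window given above (one token before to two tokens after the negation word); the negation list is a small illustrative subset.

```python
# Sketch of negation handling: invert the polarity of word scores that fall
# within a fixed window around a negation word (window size is the example
# from the text; the negation list is an illustrative subset).
NEGATIONS = {"no", "not", "never"}

def apply_negation(tokens: list[str], scores: list[float]) -> list[float]:
    adjusted = scores[:]
    for i, token in enumerate(tokens):
        if token in NEGATIONS:
            for j in range(max(0, i - 1), min(len(tokens), i + 3)):
                adjusted[j] = -adjusted[j]  # invert polarity in the negated context
    return adjusted
```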
Step 3: Part-of-Speech Tagging. Not every word is equally important for sentiment analysis, e.g. most sentiment words are
adjectives, adverbs, nouns and verbs [32]. To automatically identify such words, we use the NLTK toolkit for parts-of-speech
tagging [22].
Step 4: Having calculated the sentiment of individual words as described in Step 1, the sentiment of a sentence is calculated
using the following formula:

$$PSS = \sum_{j=1}^{K} W(POS_j)\, SS(w_j) \qquad (2)$$

where K is the total number of words in the sentence, $W(POS_j)$ is the part-of-speech weight of the j-th word, and $SS(w_j)$ is the sentiment score of the j-th word.
Over 2M experiments using 256 combinations of part-of-speech weights were previously conducted [3], based on which 9
combinations were recommended for the weighted word embeddings for sentiment classification. In this study, we used the
weights 1, 3, 2, and 2 for nouns, verbs, adjectives, and adverbs respectively. Table 3 shows examples of automatically
scored sentiment for the given sentences. The sentiment score indicates the polarity of the sentence: the first and third
sentences are positive, the second sentence is negative. The sentiment score also reflects the strength of the overall sentiment,
e.g. the first sentence and the third sentence are both positive, but the sentiment of the first sentence is stronger than that of
the third sentence.
Table 3: Examples of Automatically Computed Sentiment Score

"This spot has a perfect location as it's nestled in a quiet neighborhood yet only a 3 min walk to the St Mary's Green Line C stop." → PSS = 3.46

"Dirty grubby apartment with single AC in one bedroom so rest of apartment was boiling, broken TV Screen, kitchen smelt of Gas, broken closet door, cracked tiles in bathroom that smelt damp, filthy stairs carpet, and dirty paint work." → PSS = -0.58

"The kitchenette was perfect for medium cooking as it has a stove, refrigerator and has cleaning supplies to cleanup after!" → PSS = 1.72
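Steps 3 and 4 can be sketched together: NLTK's POS tagger selects each word's weight (1, 3, 2 and 2 for nouns, verbs, adjectives and adverbs, as above) and the weighted, negation-adjusted word scores are summed into the sentence score PSS per eq. (2). The helpers `word_sentiment` and `apply_negation` come from the earlier sketches; treating all other POS categories as weight 0 is an assumption.

```python
# Sketch combining Steps 3 and 4: POS weights (1, 3, 2, 2 for nouns, verbs,
# adjectives, adverbs) scale negation-adjusted word scores, summed per eq. (2).
import nltk

nltk.download("averaged_perceptron_tagger", quiet=True)
POS_WEIGHTS = {"N": 1, "V": 3, "J": 2, "R": 2}  # noun, verb, adjective, adverb

def sentence_sentiment(sentence: str) -> float:
    tokens = nltk.word_tokenize(sentence.lower())
    scores = apply_negation(tokens, [word_sentiment(t) for t in tokens])
    tags = [tag for _, tag in nltk.pos_tag(tokens)]
    return sum(POS_WEIGHTS.get(tag[0], 0) * s for tag, s in zip(tags, scores))
```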
3.5 Topic Rating
Once the topic model has been extracted from a corpus of reviews, each sentence is classified into an appropriate topic. To
rate a whole review against the given topics, we used the sentiment of all sentences associated with each topic. To rate a
review on a 5-star scale (1 star being very negative and 5 stars being very positive), we first normalize the sentiment
score of each sentence as follows:

$$NSS = 5 \times \frac{PSS - PSS_{\min}}{PSS_{\max} - PSS_{\min}} \qquad (3)$$

where $PSS_{\min}$ and $PSS_{\max}$ are the minimum and maximum sentiment scores in a text review, and PSS is the sentiment score of the given sentence. The normalization effectively maps the sentiment of each sentence to a real number between 0 and 5. For each topic in turn, we aggregate the normalized scores of all sentences within the topic to obtain the average score, denoted as $\overline{NSS}$. We then map the average score to a 5-star rating using the rules given in Table 4.
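Putting the pieces together, a sketch of the rating step: sentence scores are min-max normalized within a review to [0, 5] per eq. (3), averaged per topic, and mapped to stars. Table 4's mapping rules are not reproduced in this excerpt, so simple rounding floored at 1 star stands in for them.

```python
# Sketch of topic rating (assumes `classify_sentence` and `sentence_sentiment`
# from the earlier sketches; rounding stands in for Table 4, which is not
# reproduced in this excerpt).
def rate_topics(review_sentences: list[str]) -> dict[str, int]:
    scored = [(classify_sentence(s), sentence_sentiment(s)) for s in review_sentences]
    lo = min(pss for _, pss in scored)
    hi = max(pss for _, pss in scored)
    by_topic: dict[str, list[float]] = {}
    for topic, pss in scored:
        nss = 5 * (pss - lo) / (hi - lo) if hi > lo else 2.5  # eq. (3)
        by_topic.setdefault(topic, []).append(nss)
    return {t: max(1, round(sum(v) / len(v))) for t, v in by_topic.items()}
```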

Citations
Journal ArticleDOI
TL;DR: This study provides a comprehensive overview of work on the accommodation sharing platform Airbnb, to the best of the authors’ knowledge, representing the first systematic literature review on Airbnb.
Abstract: A growing body of research from various domains has investigated Airbnb, a two-sided market platform for peer-based accommodation sharing. The authors suggest that it is due time to take a step back and assess the current state of affairs. This paper aims to conflate and synthesize research on Airbnb. To facilitate research on Airbnb and its underlying principles in electronic commerce, the authors present a structured literature review on Airbnb. The findings are based on 118 articles from the fields of tourism, information and management, law and economics between 2013 and 2018. Based on this broad basis, the authors find that: research on Airbnb is highly diverse in terms of domains, methods and scope; motives for using Airbnb are manifold (e.g. financial, social and environmental); trust and reputation are considered crucial by almost all scholars; the platform’s variety is reflected in prices; and the majority of work is based on surveys and empirical data while experiments are scarce. Based on the present assessment of studied topics, domains, methods and combinations thereof, the authors suggest that research should move toward building atop a common ground of data structures and vocabulary, and that attention should focus on the identified gaps and hitherto scarcely used combinations. The set of under-represented areas includes cross-cultural investigations, field experiments and audit studies, the consideration of dynamic processes (e.g. based on panel data), Airbnb’s “experiences” and automated pricing algorithms and the rating distribution’s skewness. This study provides a comprehensive overview of work on the accommodation sharing platform Airbnb, to the best of the authors’ knowledge, representing the first systematic literature review. The authors hope that researchers and practitioners alike will find this review useful as a reference for future research on Airbnb and as a guide for the development of innovative applications based on the platform’s peculiarities and paradigms in electronic commerce practice. From a practical perspective, the general tenor suggests that hotel and tourism operators may benefit from: focusing on their core advantages over Airbnb and differentiating features and aligning their marketing communication with their users’ aspirations.

133 citations

Journal ArticleDOI
TL;DR: In this paper, the authors quantified own price, cross price, and income elasticities of Airbnb demand to New York City within an empirical tourism demand framework using spatial panel data comprising a cross section of 1,461 continuously active Airbnb listings obtained from AirDNA and time series data from NYC and Company and the OECD covering the time period September 2014 to June 2016.

43 citations

Book ChapterDOI
06 Feb 2021
TL;DR: This paper used natural language processing to evaluate feedback from the language processing community and found that when such online reviews are too broad and/or extremely detailed, both buyers and sellers benefit from a mechanism that quickly extracts key insights from them.
Abstract: Artificial Intelligence is about how computers and humans communicate, and how we interact with language abbreviations. The ultimate purpose of the NLP is to connect in a way people can understand and reciprocate. Social networking messages have become a key source of consumer education. Sellers take online feedback to know if a potential buyer is a big part of their market. However, when such online reviews are too broad and/or extremely detailed, both buyers and sellers benefit from a mechanism that quickly extracts key insights from them. In this research paper, we used natural language processing to evaluate feedback from the language-processing community. Other data are included in the assessment of our peers.

4 citations

Book ChapterDOI
02 Jul 2021
TL;DR: In this paper, different multi-class classification methods were applied to assign automatic ratings for consumer reviews based on a 5-star rating scale, where the original review ratings were inconsistent with the content.
Abstract: Consumer reviews show inconsistent ratings when compared to their contents as a result of sarcastic feedback. Consequently, they cannot provide valuable feedback to improve products and services of the firms. One possible solution is to utilize consumer review contents to identify the true ratings. In this work, different multi-class classification methods were applied to assign automatic ratings for consumer reviews based on a 5-star rating scale, where the original review ratings were inconsistent with the content. Two term weighting schemes (i.e. tf-idf and tf-igm) and five supervised machine learning algorithms (i.e. k-NN, MNB, RF, XGBoost and SVM) were compared. The dataset was downloaded from the Amazon website, and language experts helped to correct the real rating for each consumer review. After verifying the effectiveness of the proposed methods, the multi-class classifier model developed by SVM along with tf-igm returned the best results for automatic ratings of consumer reviews, with average improved scores of accuracies and F1 over the other methods at 11.7% and 10.5%, respectively.

3 citations

Proceedings ArticleDOI
15 Jul 2019
TL;DR: A lexicon-based sentiment analysis algorithm that uses a unified approach for determining the sentiment of comments written in both languages and incorporates techniques that exploit the distinctive features of the language used in microblogs in order to accurately predict the sentiment expressed in microblog comments.
Abstract: Social media and microblogs have become an integral part of everyday life. People use microblogs to communicate with each other, express their opinion about a wide range of topics and inform themselves about issues they are interested in. The increasing volume of information generated in microblogs has led to the need of automatically determining the sentiment expressed in microblog comments. Researchers have worked in systematically analyzing microblog comments in order to identify the sentiment expressed in them. Most work in sentiment analysis of microblog comments has been focused on comments written in the English language, whereas fewer efforts have been made in predicting the sentiment of Greek microblog comments. In this paper, we propose a lexicon-based sentiment analysis algorithm for the sentiment classification of both Greek and English microblog comments. The proposed method uses a unified approach for determining the sentiment of comments written in both languages and incorporates techniques that exploit the distinctive features of the language used in microblogs in order to accurately predict the sentiment expressed in microblog comments. Our approach achieves promising results for the sentiment classification of microblog comments into positive, negative or neutral.

1 citation


Cites methods from "A Framework for Automated Rating of..."

  • ...methods is the use of linguistic processing such as modifiers detection [16], [17], negation handling [13], [16], [17], [19] and part-of-speech tagging [19]....


References
Journal ArticleDOI
TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Abstract: We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.

30,570 citations

Proceedings Article
03 Jan 2001
TL;DR: This paper proposed a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
Abstract: We propose a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams [6], and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI) [3]. In the context of text modeling, our model posits that each document is generated as a mixture of topics, where the continuous-valued mixture proportions are distributed as a latent Dirichlet random variable. Inference and learning are carried out efficiently via variational algorithms. We present empirical results on applications of this model to problems in text modeling, collaborative filtering, and text classification.

25,546 citations

Posted Content
TL;DR: This paper proposed two novel model architectures for computing continuous vector representations of words from very large data sets, and the quality of these representations is measured in a word similarity task and the results are compared to the previously best performing techniques based on different types of neural networks.
Abstract: We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6 billion words data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.

20,077 citations

Proceedings ArticleDOI
22 Aug 2004
TL;DR: This research aims to mine and to summarize all the customer reviews of a product, and proposes several novel techniques to perform these tasks.
Abstract: Merchants selling products on the Web often ask their customers to review the products that they have purchased and the associated services. As e-commerce is becoming more and more popular, the number of customer reviews that a product receives grows rapidly. For a popular product, the number of reviews can be in hundreds or even thousands. This makes it difficult for a potential customer to read them to make an informed decision on whether to purchase the product. It also makes it difficult for the manufacturer of the product to keep track and to manage customer opinions. For the manufacturer, there are additional difficulties because many merchant sites may sell the same product and the manufacturer normally produces many kinds of products. In this research, we aim to mine and to summarize all the customer reviews of a product. This summarization task is different from traditional text summarization because we only mine the features of the product on which the customers have expressed their opinions and whether the opinions are positive or negative. We do not summarize the reviews by selecting a subset or rewrite some of the original sentences from the reviews to capture the main points as in the classic text summarization. Our task is performed in three steps: (1) mining product features that have been commented on by customers; (2) identifying opinion sentences in each review and deciding whether each opinion sentence is positive or negative; (3) summarizing the results. This paper proposes several novel techniques to perform these tasks. Our experimental results using reviews of a number of products sold online demonstrate the effectiveness of the techniques.

7,330 citations


"A Framework for Automated Rating of..." refers background in this paper

  • ...For instance, Hu and Liu [2] manually annotated 2,006 positive words and 4,783 negative words to train classifiers used to analyze customer reviews....


  • ...Supervision: Sentiment analysis plays an important role in predicting ratings from text reviews [2] [9] [12] [18]....


  • ...Informality: Online reviews are informal documents in terms of style and structure [2] [3] [9] [12] [18] [21]....


  • ...Large Volume: The large volume of online reviews creates significant information overload [2] [9] [10] [16] [17] [18] [20] [21]....


Proceedings Article
01 Oct 2013
TL;DR: A Sentiment Treebank that includes fine grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences and presents new challenges for sentiment compositionality, and introduces the Recursive Neural Tensor Network.
Abstract: Semantic word spaces have been very useful but cannot express the meaning of longer phrases in a principled way. Further progress towards understanding compositionality in tasks such as sentiment detection requires richer supervised training and evaluation resources and more powerful models of composition. To remedy this, we introduce a Sentiment Treebank. It includes fine grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences and presents new challenges for sentiment compositionality. To address them, we introduce the Recursive Neural Tensor Network. When trained on the new treebank, this model outperforms all previous methods on several metrics. It pushes the state of the art in single sentence positive/negative classification from 80% up to 85.4%. The accuracy of predicting fine-grained sentiment labels for all phrases reaches 80.7%, an improvement of 9.7% over bag of features baselines. Lastly, it is the only model that can accurately capture the effects of negation and its scope at various tree levels for both positive and negative phrases.

6,792 citations

Frequently Asked Questions (2)
Q1. What have the authors contributed in "A framework for automated rating of online reviews against the underlying topics" ?

Even though most online review systems offer star rating in addition to free text reviews, this only applies to the overall review. In this paper, the authors present a framework for extracting prevalent topics from online reviews and automatically rating them on a 5-star scale. The overall topic sentiment is then projected onto the 5-star rating scale. The authors use a dataset of Airbnb online reviews to demonstrate a proof of concept. 

As part of future work, the authors will formally evaluate the effectiveness of this method on a variety of domains and datasets.