
Showing papers on "Sentiment analysis" published in 2009


Journal IssueDOI
TL;DR: It is found that microblogging is an online tool for customer word-of-mouth communications, and the implications for corporations using microblogging as part of their overall marketing strategy are discussed.
Abstract: In this paper we report research results investigating microblogging as a form of electronic word-of-mouth for sharing consumer opinions concerning brands. We analyzed more than 150,000 microblog postings containing branding comments, sentiments, and opinions. We investigated the overall structure of these microblog postings, the types of expressions, and the movement in positive or negative sentiment. We compared automated methods of classifying sentiment in these microblogs with manual coding. Using a case study approach, we analyzed the range, frequency, timing, and content of tweets in a corporate account. Our research findings show that 19% of microblogs contain mention of a brand. Of the branding microblogs, nearly 20% contained some expression of brand sentiments. Of these, more than 50% were positive and 33% were critical of the company or product. Our comparison of automated and manual coding showed no significant differences between the two approaches. In analyzing microblogs for structure and composition, the linguistic structure of tweets approximates the linguistic patterns of natural language expressions. We find that microblogging is an online tool for customer word of mouth communications and discuss the implications for corporations using microblogging as part of their overall marketing strategy. © 2009 Wiley Periodicals, Inc.

1,753 citations


Proceedings ArticleDOI
02 Nov 2009
TL;DR: A novel, fully unsupervised probabilistic modeling framework based on Latent Dirichlet Allocation (LDA), called the joint sentiment/topic model (JST), is proposed; it detects sentiment and topic simultaneously from text.
Abstract: Sentiment analysis or opinion mining aims to use automated tools to detect subjective information such as opinions, attitudes, and feelings expressed in text. This paper proposes a novel probabilistic modeling framework based on Latent Dirichlet Allocation (LDA), called joint sentiment/topic model (JST), which detects sentiment and topic simultaneously from text. Unlike other machine learning approaches to sentiment classification which often require labeled corpora for classifier training, the proposed JST model is fully unsupervised. The model has been evaluated on the movie review dataset to classify the review sentiment polarity, and minimum prior information has also been explored to further improve the sentiment classification accuracy. Preliminary experiments have shown promising results achieved by JST.

983 citations


Posted Content
TL;DR: This article performed a sentiment analysis of all public tweets broadcasted by Twitter users between August 1 and December 20, 2008 and found that events in the social, political, cultural and economic sphere do have a significant, immediate and highly specific effect on the various dimensions of public mood.
Abstract: Microblogging is a form of online communication by which users broadcast brief text updates, also known as tweets, to the public or a selected circle of contacts. A variegated mosaic of microblogging uses has emerged since the launch of Twitter in 2006: daily chatter, conversation, information sharing, and news commentary, among others. Regardless of their content and intended use, tweets often convey pertinent information about their author's mood status. As such, tweets can be regarded as temporally-authentic microscopic instantiations of public mood state. In this article, we perform a sentiment analysis of all public tweets broadcasted by Twitter users between August 1 and December 20, 2008. For every day in the timeline, we extract six dimensions of mood (tension, depression, anger, vigor, fatigue, confusion) using an extended version of the Profile of Mood States (POMS), a well-established psychometric instrument. We compare our results to fluctuations recorded by stock market and crude oil price indices and major events in media and popular culture, such as the U.S. Presidential Election of November 4, 2008 and Thanksgiving Day. We find that events in the social, political, cultural and economic sphere do have a significant, immediate and highly specific effect on the various dimensions of public mood. We speculate that large scale analyses of mood can provide a solid platform to model collective emotive trends in terms of their predictive value with regards to existing social as well as economic indicators.

939 citations


Journal ArticleDOI
TL;DR: This paper combines rule-based classification, supervised learning and machine learning into a new combined method, and proposes a semi-automatic, complementary approach in which each classifier can contribute to other classifiers to achieve a good level of effectiveness.

700 citations


Journal ArticleDOI
TL;DR: The goal of this work is to automatically distinguish between prior and contextual polarity, with a focus on understanding which features are important for this task, and it is shown that the presence of neutral instances greatly degrades the performance of features for distinguishing between positive and negative polarity.
Abstract: Many approaches to automatic sentiment analysis begin with a large lexicon of words marked with their prior polarity (also called semantic orientation). However, the contextual polarity of the phrase in which a particular instance of a word appears may be quite different from the word's prior polarity. Positive words are used in phrases expressing negative sentiments, or vice versa. Also, quite often words that are positive or negative out of context are neutral in context, meaning they are not even being used to express a sentiment. The goal of this work is to automatically distinguish between prior and contextual polarity, with a focus on understanding which features are important for this task. Because an important aspect of the problem is identifying when polar terms are being used in neutral contexts, features for distinguishing between neutral and polar instances are evaluated, as well as features for distinguishing between positive and negative contextual polarity. The evaluation includes assessing the performance of features across multiple machine learning algorithms. For all learning algorithms except one, the combination of all features together gives the best performance. Another facet of the evaluation considers how the presence of neutral instances affects the performance of features for distinguishing between positive and negative polarity. These experiments show that the presence of neutral instances greatly degrades the performance of these features, and that perhaps the best way to improve performance across all polarity classes is to improve the system's ability to identify when an instance is neutral.

677 citations


Proceedings ArticleDOI
28 Jun 2009
TL;DR: This paper presents a unified framework in which one can use background lexical information in terms of word-class associations, and refine this information for specific domains using any available training examples, and shows that this approach performs better than using background knowledge or training data in isolation.
Abstract: The explosion of user-generated content on the Web has led to new opportunities and significant challenges for companies that are increasingly concerned about monitoring the discussion around their products. Tracking such discussion on weblogs provides useful insight on how to improve products or market them more effectively. An important component of such analysis is to characterize the sentiment expressed in blogs about specific brands and products. Sentiment Analysis focuses on this task of automatically identifying whether a piece of text expresses a positive or negative opinion about the subject matter. Most previous work in this area uses prior lexical knowledge in terms of the sentiment-polarity of words. In contrast, some recent approaches treat the task as a text classification problem, where they learn to classify sentiment based only on labeled training data. In this paper, we present a unified framework in which one can use background lexical information in terms of word-class associations, and refine this information for specific domains using any available training examples. Empirical results on diverse domains show that our approach performs better than using background knowledge or training data in isolation, as well as alternative approaches to using lexical knowledge with text classification.

511 citations


Proceedings ArticleDOI
Xiaojun Wan1
02 Aug 2009
TL;DR: A co-training approach is proposed to make use of unlabeled Chinese data for cross-lingual sentiment classification, which leverages an available English corpus for Chinese sentiment classification by using the English corpus as training data.
Abstract: The lack of Chinese sentiment corpora limits the research progress on Chinese sentiment classification. However, there are many freely available English sentiment corpora on the Web. This paper focuses on the problem of cross-lingual sentiment classification, which leverages an available English corpus for Chinese sentiment classification by using the English corpus as training data. Machine translation services are used for eliminating the language gap between the training set and test set, and English features and Chinese features are considered as two independent views of the classification problem. We propose a cotraining approach to making use of unlabeled Chinese data. Experimental results show the effectiveness of the proposed approach, which can outperform the standard inductive classifiers and the transductive classifiers.

499 citations


Journal ArticleDOI
TL;DR: This survey discusses related issues and main approaches to these problems, namely, subjectivity classification, word sentiment classification, document sentiment classification and opinion extraction.
Abstract: The sentiment detection of texts has witnessed a booming interest in recent years, due to the increased availability of online reviews in digital form and the ensuing need to organize them. Until now, there have been mainly four different problems predominating in this research community, namely, subjectivity classification, word sentiment classification, document sentiment classification and opinion extraction. In fact, there are inherent relations between them. Subjectivity classification can prevent the sentiment classifier from considering irrelevant or even potentially misleading text. Document sentiment classification and opinion extraction have often involved word sentiment classification techniques. This survey discusses related issues and main approaches to these problems.

447 citations


Journal ArticleDOI
TL;DR: This paper presents machine learning experiments with regard to sentiment analysis in blog, review and forum texts found on the World Wide Web and written in English, Dutch and French and investigates the role of active learning techniques for reducing the number of examples to be manually annotated.
Abstract: Sentiment analysis, also called opinion mining, is a form of information extraction from text of growing research and commercial interest. In this paper we present our machine learning experiments with regard to sentiment analysis in blog, review and forum texts found on the World Wide Web and written in English, Dutch and French. We train from a set of example sentences or statements that are manually annotated as positive, negative or neutral with regard to a certain entity. We are interested in the feelings that people express with regard to certain consumption products. We learn and evaluate several classification models that can be configured in a cascaded pipeline. We have to deal with several problems, namely the noisy character of the input texts, the attribution of the sentiment to a particular entity and the small size of the training set. We succeed in identifying positive, negative and neutral feelings toward the entity under consideration with ca. 83% accuracy for English texts based on unigram features augmented with linguistic features. The accuracy results of processing the Dutch and French texts are ca. 70 and 68% respectively due to the larger variety of the linguistic expressions that more often diverge from standard language, thus demanding more training patterns. In addition, our experiments give us insights into the portability of the learned models across domains and languages. A substantial part of the article investigates the role of active learning techniques for reducing the number of examples to be manually annotated.

418 citations


Journal ArticleDOI
TL;DR: This article explores the development of user-generated content, specifically web logs (blogs), as a way to improve market intelligence and market research for private and public tourism organizations and to facilitate timely consumer decision-making.
Abstract: As the World Wide Web has developed, considerable bargaining power has been transferred from suppliers to consumers; there is a real need to improve market intelligence and market research for private and public tourism organisations and facilitate timely consumer decision making. This article explores the development of user generated content and specifically the use of web logs or blogs. Tourism organisations cannot afford to ignore the development of user generated content, peer-to-peer web applications and virtual communities. A recent survey found that consumers trusted websites with reviews more than professional guides and travel agencies and, far from being an irrelevance, blogs are often perceived to be more credible and trustworthy than traditional marketing communications. But there is a problem: given the sheer number of possibly relevant travel blogs there is a need to locate, extract and interpret blog content and this has proven so far to be time consuming, exhausting and costly, thus negating the relative value of the information obtained. A way forward may be the use of artificial intelligence and “opinion mining” or a blog visualisation system.

401 citations


Proceedings Article
11 Jul 2009
TL;DR: A novel propagation approach is proposed that exploits the relations between sentiment words and the topics or product features that the sentiment words modify, as well as the relations among sentiment words and among product features themselves, to extract new sentiment words.
Abstract: In most sentiment analysis applications, the sentiment lexicon plays a key role. However, it is hard, if not impossible, to collect and maintain a universal sentiment lexicon for all application domains because different words may be used in different domains. The main existing technique extracts such sentiment words from a large domain corpus based on different conjunctions and the idea of sentiment coherency in a sentence. In this paper, we propose a novel propagation approach that exploits the relations between sentiment words and topics or product features that the sentiment words modify, and also sentiment words and product features themselves to extract new sentiment words. As the method propagates information through both sentiment words and features, we call it double propagation. The extraction rules are designed based on relations described in dependency trees. A new method is also proposed to assign polarities to newly discovered sentiment words in a domain. Experimental results show that our approach is able to extract a large number of new sentiment words. The polarity assignment method is also effective.
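
Illustrative sketch (not from the paper): the propagation idea described above can be shown on a toy input. The sketch assumes the dependency parse has already been reduced to hypothetical (modifier, head) pairs, uses only two simplified rules, and leaves out polarity assignment entirely; the function name and example data are invented for illustration.

```python
def double_propagation(pairs, seed_sentiment_words):
    """Grow sentiment words and product features from a small seed set.

    pairs: list of (modifier, head) tuples such as ("great", "screen").
    Hypothetical simplification: each pair stands in for one dependency
    relation in which an adjective modifies a noun.
    """
    sentiment_words = set(seed_sentiment_words)
    features = set()
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for modifier, head in pairs:
            # Rule A: a known sentiment word modifies a noun,
            # so the noun is taken to be a product feature.
            if modifier in sentiment_words and head not in features:
                features.add(head)
                changed = True
            # Rule B: an unknown word modifies a known feature,
            # so the word is taken to be a new sentiment word.
            if head in features and modifier not in sentiment_words:
                sentiment_words.add(modifier)
                changed = True
    return sentiment_words, features


pairs = [
    ("great", "screen"),
    ("crisp", "screen"),
    ("crisp", "sound"),
    ("tinny", "sound"),
    ("blue", "case"),
]
words, feats = double_propagation(pairs, {"great"})
print(sorted(words), sorted(feats))
```

Starting from the single seed "great", information flows from word to feature and back again; "blue" and "case" are never reached because nothing links them to the seed.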

Proceedings ArticleDOI
17 May 2009
TL;DR: Delta TFIDF is presented, an intuitive general-purpose technique for efficiently weighting word scores before classification that significantly improves accuracy for sentiment analysis problems on three well-known data sets.
Abstract: Mining opinions and sentiment from social networking sites is a popular application for social media systems. Common approaches use a machine learning system with a bag of words feature set. We present Delta TFIDF, an intuitive general purpose technique to efficiently weight word scores before classification. Delta TFIDF is easy to compute, implement, and understand. We use Support Vector Machines to show that Delta TFIDF significantly improves accuracy for sentiment analysis problems using three well known data sets.
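
Illustrative sketch (not from the paper): Delta TFIDF is usually described as weighting a term by the difference between its TF-IDF scores computed against the positive and the negative training documents. The abstract gives no formula, so the add-one smoothing and the sign convention below (positive weights for terms that lean positive) are assumptions of this sketch, as are the function name and toy data.

```python
import math
from collections import Counter


def delta_tfidf(doc_tokens, pos_docs, neg_docs):
    """Weight each term in doc_tokens by tf * (idf_neg - idf_pos).

    pos_docs / neg_docs are lists of token lists from labeled training data.
    Add-one smoothing is an assumption of this sketch, used to avoid log(0).
    """
    def doc_freq(docs):
        df = Counter()
        for tokens in docs:
            df.update(set(tokens))  # count each term once per document
        return df

    pos_df, neg_df = doc_freq(pos_docs), doc_freq(neg_docs)
    weights = {}
    for term, tf in Counter(doc_tokens).items():
        idf_pos = math.log2((len(pos_docs) + 1) / (pos_df[term] + 1))
        idf_neg = math.log2((len(neg_docs) + 1) / (neg_df[term] + 1))
        # Terms spread evenly across both classes get a weight near zero.
        weights[term] = tf * (idf_neg - idf_pos)
    return weights


pos_docs = [["good", "movie"], ["good", "plot"]]
neg_docs = [["bad", "movie"], ["bad", "plot"]]
weights = delta_tfidf(["good", "bad", "movie"], pos_docs, neg_docs)
print(weights)
```

The resulting weights would then replace raw counts in the feature vector handed to a classifier such as an SVM; the point of the weighting is that class-neutral words like "movie" contribute almost nothing.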

Proceedings ArticleDOI
28 Jun 2009
TL;DR: The OpinionMiner system designed in this work aims to mine customer reviews of a product and extract highly detailed product entities on which reviewers express their opinions.
Abstract: Merchants selling products on the Web often ask their customers to share their opinions and hands-on experiences on products they have purchased. Unfortunately, reading through all customer reviews is difficult, especially for popular items, where the number of reviews can be up to hundreds or even thousands. This makes it difficult for a potential customer to read them to make an informed decision. The OpinionMiner system designed in this work aims to mine customer reviews of a product and extract highly detailed product entities on which reviewers express their opinions. Opinion expressions are identified and opinion orientations for each recognized product entity are classified as positive or negative. Different from previous approaches that employed rule-based or statistical techniques, we propose a novel machine learning approach built under the framework of lexicalized HMMs. The approach naturally integrates multiple important linguistic features into automatic learning. In this paper, we describe the architecture and main components of the system. The evaluation of the proposed method is presented based on processing the online product reviews from Amazon and other publicly available datasets.

DOI
01 Jan 2009
TL;DR: This research presents the results of applying the SentiWordNet lexical resource to the problem of automatic sentiment classification of film reviews, and finds that results obtained are in line with similar approaches using manual lexicons seen in the literature.
Abstract: Sentiment classification concerns the use of automatic methods for predicting the orientation of subjective content on text documents, with applications on a number of areas including recommender and advertising systems, customer intelligence and information retrieval. SentiWordNet is an opinion lexicon derived from the WordNet database where each term is associated with numerical scores indicating positive and negative sentiment information. This research presents the results of applying the SentiWordNet lexical resource to the problem of automatic sentiment classification of film reviews. Our approach comprises counting positive and negative term scores to determine sentiment orientation, and an improvement is presented by building a data set of relevant features using SentiWordNet as source, and applied to a machine learning classifier. We find that results obtained with SentiWordNet are in line with similar approaches using manual lexicons seen in the literature. In addition, our feature set approach yielded improvements over the baseline term counting method. The results indicate SentiWordNet could be used as an important resource for sentiment classification tasks. Additional considerations are made on possible further improvements to the method and its use in conjunction with other techniques.
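
Illustrative sketch (not from the paper): the baseline term counting method mentioned above can be approximated by summing positive and negative scores over the terms of a review and comparing the totals. The lexicon below is a hypothetical toy with made-up numbers, not actual SentiWordNet data, and tie handling is an assumption of this sketch.

```python
# Hypothetical toy lexicon: term -> (positive score, negative score).
# These numbers are made up for illustration and are NOT SentiWordNet values.
TOY_LEXICON = {
    "brilliant": (0.75, 0.0),
    "enjoyable": (0.5, 0.0),
    "dull": (0.0, 0.625),
    "predictable": (0.125, 0.375),
}


def classify_by_term_counting(tokens, lexicon=TOY_LEXICON):
    """Compare summed positive and negative scores of the known terms."""
    pos_total = sum(lexicon[t][0] for t in tokens if t in lexicon)
    neg_total = sum(lexicon[t][1] for t in tokens if t in lexicon)
    if pos_total > neg_total:
        return "positive"
    if neg_total > pos_total:
        return "negative"
    return "undecided"  # tie or no lexicon terms found


review = "a brilliant if predictable film".split()
label = classify_by_term_counting(review)
print(label)
```

The feature-based improvement described in the abstract would feed such scores to a machine learning classifier instead of comparing the two totals directly.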

Proceedings ArticleDOI
14 Jun 2009
TL;DR: ACM has shut off access to this paper because a Joint ACM Conference Committee determined that its authors violated ACM's publication policy on simultaneous submissions.
Abstract: NOTE FROM ACM: A Joint ACM Conference Committee has determined that the authors of this article violated ACM's publication policy on simultaneous submissions. Therefore ACM has shut off access to this paper.

Proceedings ArticleDOI
06 Aug 2009
TL;DR: This work proposes a simple approach to generate a high-coverage semantic orientation lexicon, which includes both individual words and multi-word expressions, using only a Roget-like thesaurus and a handful of affixes; the lexicon has properties that support the Pollyanna Hypothesis.
Abstract: Sentiment analysis often relies on a semantic orientation lexicon of positive and negative words. A number of approaches have been proposed for creating such lexicons, but they tend to be computationally expensive, and usually rely on significant manual annotation and large corpora. Most of these methods use WordNet. In contrast, we propose a simple approach to generate a high-coverage semantic orientation lexicon, which includes both individual words and multi-word expressions, using only a Roget-like thesaurus and a handful of affixes. Further, the lexicon has properties that support the Pollyanna Hypothesis. Using the General Inquirer as gold standard, we show that our lexicon has 14 percentage points more correct entries than the leading WordNet-based high-coverage lexicon (SentiWordNet). In an extrinsic evaluation, we obtain significantly higher performance in determining phrase polarity using our thesaurus-based lexicon than with any other. Additionally, we explore the use of visualization techniques to gain insight into our algorithm beyond the evaluations mentioned above.

Proceedings ArticleDOI
06 Aug 2009
TL;DR: This paper first presents a linguistic analysis of conditional sentences, and then builds some supervised learning models to determine if sentiments expressed on different topics in a conditional sentence are positive, negative or neutral.
Abstract: This paper studies sentiment analysis of conditional sentences. The aim is to determine whether opinions expressed on different topics in a conditional sentence are positive, negative or neutral. Conditional sentences are one of the commonly used language constructs in text. In a typical document, around 8% of the sentences are conditional. Due to the condition clause, sentiments expressed in a conditional sentence can be hard to determine. For example, in the sentence "if your Nokia phone is not good, buy this great Samsung phone", the author is positive about "Samsung phone" but does not express an opinion on "Nokia phone" (although the owner of the "Nokia phone" may be negative about it). However, if the sentence does not have "if", the first clause is clearly negative. Although "if" commonly signifies a conditional sentence, there are many other words and constructs that can express conditions. This paper first presents a linguistic analysis of such sentences, and then builds some supervised learning models to determine if sentiments expressed on different topics in a conditional sentence are positive, negative or neutral. Experimental results on conditional sentences from 5 diverse domains are given to demonstrate the effectiveness of the proposed approach.

Patent
09 Apr 2009
TL;DR: The sentimental significance of a group of historical documents related to a topic is assessed with respect to change in an extrinsic metric for the topic, and a unique sentiment binding label is added to the content of action documents that are determined to have sentimental significance.
Abstract: The sentimental significance of a group of historical documents related to a topic is assessed with respect to change in an extrinsic metric for the topic. A unique sentiment binding label is added to the content of action documents that are determined to have sentimental significance, and the group of documents is inserted into a historical document sentiment vector space for the topic. Action areas in the vector space are defined from the locations of action documents, and a singular sentiment vector may be created that describes the cumulative action area. Newly published documents are sentiment-scored by semantically comparing them to documents in the space and/or to the singular sentiment vector. The sentiment scores for the newly published documents are supplemented by human sentiment assessment of the documents, and a sentiment time decay factor is applied to the supplemented sentiment score of each newly published document. User queries are received and a set of sentiment-ranked documents is returned with the highest age-adjusted sentiment scores.
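
Illustrative sketch (not from the patent): the last two steps above, applying a time decay factor and returning the documents with the highest age-adjusted scores, can be shown in a few lines. The abstract does not say what form the decay takes; the exponential half-life used here, the function names, and the sample data are all assumptions of this sketch.

```python
def age_adjusted_score(score, age_days, half_life_days=7.0):
    """Exponential decay; the functional form and half-life are assumptions."""
    return score * 0.5 ** (age_days / half_life_days)


def top_documents(docs, k=2):
    """Return the ids of the k documents with the highest decayed score.

    docs: list of (doc_id, sentiment_score, age_days) tuples.
    """
    ranked = sorted(
        docs,
        key=lambda d: age_adjusted_score(d[1], d[2]),
        reverse=True,
    )
    return [doc_id for doc_id, _, _ in ranked[:k]]


docs = [("old-strong", 0.9, 28), ("new-mild", 0.4, 0), ("recent", 0.6, 7)]
best = top_documents(docs)
print(best)
```

With a seven-day half-life, a strongly scored document that is four weeks old ranks below a mildly scored one published today.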

Patent
09 Jun 2009
TL;DR: Apparatuses and methods are provided that use natural language processing to automatically determine the sentiment expressed in answers to survey questions and to present the sentiment of survey respondents as actionable data.
Abstract: In one aspect, the invention provides apparatuses and methods for determining the sentiment expressed in answers to survey questions. Advantageously, the sentiment may be automatically determined using natural language processing. In another aspect, the invention provides apparatuses and methods for analyzing the sentiment of survey respondents and presenting the information as actionable data.

Proceedings ArticleDOI
02 Nov 2009
TL;DR: Experimental results show that the identification of the scope of negation improves both the accuracy of sentiment analysis and the retrieval effectiveness of opinion retrieval.
Abstract: We investigate the problem of determining the polarity of sentiments when one or more occurrences of a negation term such as "not" appear in a sentence. The concept of the scope of a negation term is introduced. By using a parse tree and typed dependencies generated by a parser and special rules proposed by us, we provide a procedure to identify the scope of each negation term. Experimental results show that the identification of the scope of negation improves both the accuracy of sentiment analysis and the retrieval effectiveness of opinion retrieval.
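
Illustrative sketch (not the paper's procedure): the paper derives negation scope from a parse tree and typed dependencies. The code below swaps in a much cruder stand-in, a fixed window of tokens after each negation term, only to show why scope matters for polarity. The lexicon, the negator list, and the window size are hypothetical.

```python
NEGATORS = {"not", "no", "never"}
# Hypothetical toy polarity lexicon.
POLARITY = {"good": 1, "great": 1, "bad": -1, "boring": -1}


def sentence_polarity(tokens, window=3):
    """Sum word polarities, flipping those within `window` tokens after a negator."""
    total = 0
    negate_until = -1  # index of the last token inside the current scope
    for i, tok in enumerate(tokens):
        if tok in NEGATORS:
            negate_until = i + window
            continue
        value = POLARITY.get(tok, 0)
        total += -value if i <= negate_until else value
    return total


negated = sentence_polarity("the plot is not very good".split())
mixed = sentence_polarity("not good at all but the cast is great".split())
print(negated, mixed)
```

In the second sentence "great" falls outside the window and keeps its polarity, so the two clauses cancel out; a parser-based scope as described in the abstract would draw that boundary from syntax rather than from a token count.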

Book ChapterDOI
18 Apr 2009
TL;DR: The experimental results indicate that the proposed approach could improve the performance of the base classifier dramatically, and even provide much better performance than the transfer-learning baseline, i.e. the Naive Bayes Transfer Classifier (NTBC).
Abstract: In the community of sentiment analysis, supervised learning techniques have been shown to perform very well. When transferred to another domain, however, a supervised sentiment classifier often performs extremely badly. This is the so-called domain-transfer problem. In this work, we attempt to attack this problem by making the maximum use of both the old-domain data and the unlabeled new-domain data. To leverage knowledge from the old-domain data, we propose an effective measure, i.e., Frequently Co-occurring Entropy (FCE), to pick out generalizable features that occur frequently in both domains and have similar occurring probability. To gain knowledge from the new-domain data, we propose Adapted Naive Bayes (ANB), a weighted transfer version of the Naive Bayes Classifier. The experimental results indicate that the proposed approach could improve the performance of the base classifier dramatically, and even provide much better performance than the transfer-learning baseline, i.e. the Naive Bayes Transfer Classifier (NTBC).

Proceedings ArticleDOI
02 Aug 2009
TL;DR: A novel approach to learn from lexical prior knowledge in the form of domain-independent sentiment-laden terms, in conjunction with domain-dependent unlabeled data and a few labeled documents is proposed.
Abstract: Sentiment classification refers to the task of automatically identifying whether a given piece of text expresses positive or negative opinion towards a subject at hand. The proliferation of user-generated web content such as blogs, discussion forums and online review sites has made it possible to perform large-scale mining of public opinion. Sentiment modeling is thus becoming a critical component of market intelligence and social media technologies that aim to tap into the collective wisdom of crowds. In this paper, we consider the problem of learning high-quality sentiment models with minimal manual supervision. We propose a novel approach to learn from lexical prior knowledge in the form of domain-independent sentiment-laden terms, in conjunction with domain-dependent unlabeled data and a few labeled documents. Our model is based on a constrained non-negative tri-factorization of the term-document matrix which can be implemented using simple update rules. Extensive experimental studies demonstrate the effectiveness of our approach on a variety of real-world sentiment prediction tasks.

Proceedings ArticleDOI
06 Aug 2009
TL;DR: A novel method based on integer linear programming is proposed that can adapt an existing lexicon into a new one to reflect the characteristics of the data more directly; experimental results show that the lexicon adaptation technique improves the performance of fine-grained polarity classification.
Abstract: Polarity lexicons have been a valuable resource for sentiment analysis and opinion mining. There are a number of such lexical resources available, but it is often suboptimal to use them as is, because general purpose lexical resources do not reflect domain-specific lexical usage. In this paper, we propose a novel method based on integer linear programming that can adapt an existing lexicon into a new one to reflect the characteristics of the data more directly. In particular, our method collectively considers the relations among words and opinion expressions to derive the most likely polarity of each lexical item (positive, neutral, negative, or negator) for the given domain. Experimental results show that our lexicon adaptation technique improves the performance of fine-grained polarity classification.

Journal IssueDOI
TL;DR: A rule-based approach including two phases: determining each sentence's sentiment based on word dependency, and aggregating sentences to predict the document sentiment is proposed to address the unique challenges posed by Chinese sentiment analysis.
Abstract: User-generated content on the Web has become an extremely valuable source for mining and analyzing user opinions on any topic. Recent years have seen an increasing body of work investigating methods to recognize favorable and unfavorable sentiments toward specific subjects from online text. However, most of these efforts focus on English and there have been very few studies on sentiment analysis of Chinese content. This paper aims to address the unique challenges posed by Chinese sentiment analysis. We propose a rule-based approach including two phases: (1) determining each sentence's sentiment based on word dependency, and (2) aggregating sentences to predict the document sentiment. We report the results of an experimental study comparing our approach with three machine learning-based approaches using two sets of Chinese articles. These results illustrate the effectiveness of our proposed method and its advantages against learning-based approaches. © 2009 Wiley Periodicals, Inc.

Proceedings Article
01 Sep 2009
TL;DR: The results indicate that, although language-independent methods provide a decent baseline performance, there is also a significant cost to automation, and thus the best path to long-term improvement is through the inclusion of language-specific knowledge and resources.
Abstract: We explore the adaptation of English resources and techniques for text sentiment analysis to a new language, Spanish. Our main focus is the modification of an existing English semantic orientation calculator and the building of dictionaries; however we also compare alternate approaches, including machine translation and Support Vector Machine classification. The results indicate that, although language-independent methods provide a decent baseline performance, there is also a significant cost to automation, and thus the best path to long-term improvement is through the inclusion of language-specific knowledge and resources.

Proceedings ArticleDOI
20 Jul 2009
TL;DR: A large dataset was collected from a group within YouTube identified as potentially having a radicalising agenda; the analysis focuses on gender differences in this group of users and suggests most extreme and less tolerant views among female users.
Abstract: The increased online presence of jihadists has raised the possibility of individuals being radicalised via the Internet. To date, the study of violent radicalisation has focused on dedicated jihadist websites and forums. This may not be the ideal starting point for such research, as participants in these venues may be described as "already made-up minds". Crawling a global social networking platform, such as YouTube, on the other hand, has the potential to unearth content and interaction aimed at radicalisation of those with little or no apparent prior interest in violent jihadism. This research explores whether such an approach is indeed fruitful. We collected a large dataset from a group within YouTube that we identified as potentially having a radicalising agenda. We analysed this data using social network analysis and sentiment analysis tools, examining the topics discussed and what the sentiment polarity (positive or negative) is towards these topics. In particular, we focus on gender differences in this group of users, suggesting most extreme and less tolerant views among female users.

Journal ArticleDOI
TL;DR: This work proposes an automatic summarization approach based on the analysis of review articles' internal topic structure to assemble customer concerns and shows that the proposed approach outperforms the peer approaches, i.e. opinion mining and clustering-summarization, in terms of users' responsiveness and its ability to discover the most important topics.
Abstract: Product reviews possess critical information regarding customers' concerns and their experience with the product. Such information is considered essential to firms' business intelligence, which can be utilized for conceptual design, personalization, product recommendation, better customer understanding, and, ultimately, attracting more loyal customers. Previous studies of deriving useful information from customer reviews focused mainly on numerical and categorical data. Textual data have been somewhat ignored although they are deemed valuable. Existing opinion mining methods for processing customer reviews concentrate on counting the positive and negative comments of review writers, which is not enough to cover all important topics and concerns across different review articles. Instead, we propose an automatic summarization approach based on the analysis of review articles' internal topic structure to assemble customer concerns. Unlike existing summarization approaches centered on sentence ranking and clustering, our approach discovers and extracts salient topics from a set of online reviews and further ranks these topics. The final summary is then generated based on the ranked topics. The experimental study and evaluation show that the proposed approach outperforms the peer approaches, i.e. opinion mining and clustering-summarization, in terms of users' responsiveness and its ability to discover the most important topics.
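
The pipeline sketched in this abstract (discover topics, rank them, build the summary from the ranked topics) can be illustrated as follows. This is a deliberately simplified stand-in, not the paper's algorithm: single words play the role of topics, salience is assumed to be the number of reviews mentioning a word, and the stopword list and function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list, kept tiny for illustration.
STOPWORDS = {"the", "a", "is", "and", "but", "it", "of", "to",
             "very", "i", "this", "was", "are", "in"}


def tokenize(text):
    """Lowercase word tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def rank_topics(reviews, k=2):
    """Rank candidate topic words by how many reviews mention them."""
    doc_freq = Counter()
    for review in reviews:
        doc_freq.update(set(tokenize(review)))
    # Sort by document frequency, then alphabetically for a stable order.
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def summarize(reviews, k=2):
    """Build a summary with the first unused sentence for each ranked topic."""
    sentences = [s.strip() for r in reviews
                 for s in re.split(r"[.!?]+", r) if s.strip()]
    summary = []
    for topic in rank_topics(reviews, k):
        for sentence in sentences:
            if topic in tokenize(sentence) and sentence not in summary:
                summary.append(sentence)
                break
    return summary
```

Ranking topics first and selecting sentences second mirrors the ordering described in the abstract, in contrast to approaches that rank or cluster sentences directly.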

Proceedings ArticleDOI
31 May 2009
TL;DR: A strong predictive connection between linguistically well motivated features and implicit sentiment is established, and it is shown how computational approximations of these features can be used to improve on existing state-of-the-art sentiment classification results.
Abstract: Work on sentiment analysis often focuses on the words and phrases that people use in overtly opinionated text. In this paper, we introduce a new approach to the problem that focuses not on lexical indicators, but on the syntactic "packaging" of ideas, which is well suited to investigating the identification of implicit sentiment, or perspective. We establish a strong predictive connection between linguistically well motivated features and implicit sentiment, and then show how computational approximations of these features can be used to improve on existing state-of-the-art sentiment classification results.

Book ChapterDOI
18 Apr 2009
TL;DR: In this paper, the authors focus on multi-facet review rating, i.e., on the case in which the review of a product (e.g., a hotel) must be rated several times, according to several aspects of the product (for a hotel: cleanliness, centrality of location, etc.), and explore the vectorial representations of the text by means of POS tagging, sentiment analysis, and feature selection for ordinal regression learning.
Abstract: Online product reviews are becoming increasingly available, and are being used more and more frequently by consumers in order to choose among competing products. Tools that rank competing products in terms of the satisfaction of consumers that have purchased the product before are thus also becoming popular. We tackle the problem of rating (i.e., attributing a numerical score of satisfaction to) consumer reviews based on their textual content. We here focus on multi-facet review rating, i.e., on the case in which the review of a product (e.g., a hotel) must be rated several times, according to several aspects of the product (for a hotel: cleanliness, centrality of location, etc.). We explore several aspects of the problem, with special emphasis on how to generate vectorial representations of the text by means of POS tagging, sentiment analysis, and feature selection for ordinal regression learning. We present the results of experiments conducted on a dataset of more than 15,000 reviews that we have crawled from a popular hotel review site.

Proceedings ArticleDOI
Honglei Guo, Huijia Zhu, Zhili Guo, Xiaoxun Zhang, Zhong Su
02 Nov 2009
TL;DR: This paper proposes an unsupervised product-feature categorization method with multilevel latent semantic association that achieves better performance compared with the existing approaches and is language- and domain-independent.
Abstract: In recent years, the number of freely available online reviews has been increasing rapidly. Aspect-based opinion mining techniques have been employed to find out reviewers' opinions toward different product aspects. Such finer-grained opinion mining helps potential customers make their purchase decisions. Product-feature extraction and categorization is very important for better mining of aspect-oriented opinions. Since people usually use different words to describe the same aspect in reviews, product-feature extraction and categorization becomes more challenging. Manual product-feature extraction and categorization is tedious and time-consuming, and practically infeasible for the massive number of products. In this paper, we propose an unsupervised product-feature categorization method with multilevel latent semantic association. After extracting product-features from the semi-structured reviews, we construct the first latent semantic association (LaSA) model to group words into a set of concepts according to their virtual context documents. It generates the latent semantic structure for each product-feature. The second LaSA model is constructed to categorize the product-features according to their latent semantic structures and context snippets in the reviews. Experimental results demonstrate that our method achieves better performance compared with the existing approaches. Moreover, the proposed method is language- and domain-independent.