TL;DR: Emotionally charged Twitter messages tend to be retweeted more often and more quickly than neutral ones. Companies should therefore pay more attention to analyzing sentiment about their brands and products in social media communication, and to designing advertising content that triggers emotions.

Abstract: As a new communication paradigm, social media has promoted information dissemination in social networks. Previous research has identified several content-related features as well as user and network characteristics that may drive information diffusion. However, little research has focused on the relationship between emotions and information diffusion in a social media setting. In this paper, we examine whether sentiment occurring in social media content is associated with a user's information sharing behavior. We carry out our research in the context of political communication on Twitter. Based on two data sets of more than 165,000 tweets in total, we find that emotionally charged Twitter messages tend to be retweeted more often and more quickly compared to neutral ones. As a practical implication, companies should pay more attention to the analysis of sentiment related to their brands and products in social media communication as well as in designing advertising content that triggers emotions.

1,146 citations

Journal Article•DOI•

New Avenues in Opinion Mining and Sentiment Analysis

Erik Cambria¹, Björn Schuller², Yunqing Xia³, Catherine Havasi⁴•Institutions (4)

National University of Singapore¹, Ludwig Maximilian University of Munich², Tsinghua University³, Massachusetts Institute of Technology⁴

01 Mar 2013-IEEE Intelligent Systems

TL;DR: The history, current use, and future of opinion mining and sentiment analysis are discussed, along with relevant techniques and tools.

Abstract: The Web holds valuable, vast, and unstructured information about public opinion. Here, the history, current use, and future of opinion mining and sentiment analysis are discussed, along with relevant techniques and tools.

1,042 citations

Proceedings Article•

NRC-Canada: Building the State-of-the-Art in Sentiment Analysis of Tweets

Saif M. Mohammad¹, Svetlana Kiritchenko¹, Xiaodan Zhu¹•Institutions (1)

National Research Council¹

28 Aug 2013

TL;DR: This paper presents two state-of-the-art SVM classifiers: one to detect the sentiment of messages such as tweets and SMS (message-level task) and another to detect the sentiment of a term within a message (term-level task).

Abstract: In this paper, we describe how we created two state-of-the-art SVM classifiers, one to detect the sentiment of messages such as tweets and SMS (message-level task) and one to detect the sentiment of a term within a message (term-level task). Among submissions from 44 teams in a competition, our submissions stood first in both tasks on tweets, obtaining an F-score of 69.02 in the message-level task and 88.93 in the term-level task. We implemented a variety of surface-form, semantic, and sentiment features. We also generated two large word-sentiment association lexicons, one from tweets with sentiment-word hashtags, and one from tweets with emoticons. In the message-level task, the lexicon-based features provided a gain of 5 F-score points over all others. Both of our systems can be replicated using freely available resources.

854 citations

Proceedings Article•DOI•

Large-scale visual sentiment ontology and detectors using adjective noun pairs

Damian Borth¹, Rongrong Ji², Tao Chen², Thomas M. Breuel¹, Shih-Fu Chang²•Institutions (2)

Kaiserslautern University of Technology¹, Columbia University²

21 Oct 2013

TL;DR: This work presents a method built upon psychological theories and web mining to automatically construct a large-scale Visual Sentiment Ontology (VSO) consisting of more than 3,000 Adjective Noun Pairs (ANP) and proposes SentiBank, a novel visual concept detector library that can be used to detect the presence of 1,200 ANPs in an image.

Abstract: We address the challenge of sentiment analysis from visual content. In contrast to existing methods which infer sentiment or emotion directly from visual low-level features, we propose a novel approach based on understanding of the visual concepts that are strongly related to sentiments. Our key contribution is two-fold: first, we present a method built upon psychological theories and web mining to automatically construct a large-scale Visual Sentiment Ontology (VSO) consisting of more than 3,000 Adjective Noun Pairs (ANP). Second, we propose SentiBank, a novel visual concept detector library that can be used to detect the presence of 1,200 ANPs in an image. The VSO and SentiBank are distinct from existing work and will open a gate towards various applications enabled by automatic sentiment analysis. Experiments on detecting sentiment of image tweets demonstrate significant improvement in detection accuracy when comparing the proposed SentiBank based predictors with the text-based approaches. The effort also leads to a large publicly available resource consisting of a visual sentiment ontology, a large detector library, and the training/testing benchmark for visual sentiment analysis.

692 citations

Journal Article•DOI•

Document-level sentiment classification: An empirical comparison between SVM and ANN

Rodrigo Moraes¹, João Francisco Valiati¹, Wilson Pires Gavião Neto¹•Institutions (1)

Universidade do Vale do Rio dos Sinos¹

01 Feb 2013-Expert Systems With Applications

TL;DR: An empirical comparison between SVM and ANN for document-level sentiment analysis is presented. ANN produces results superior or at least comparable to those of SVM in most settings, and outperforms SVM on the movie reviews benchmark even in the context of unbalanced data.

Abstract: Document-level sentiment classification aims to automate the task of classifying a textual review, which is given on a single topic, as expressing a positive or negative sentiment. In general, supervised methods consist of two stages: (i) extraction/selection of informative features and (ii) classification of reviews by using learning models like Support Vector Machines (SVM) and Naïve Bayes (NB). SVM have been extensively and successfully used as a sentiment learning approach, while Artificial Neural Networks (ANN) have rarely been considered in comparative studies in the sentiment analysis literature. This paper presents an empirical comparison between SVM and ANN regarding document-level sentiment analysis. We discuss requirements, resulting models and contexts in which both approaches achieve better levels of classification accuracy. We adopt a standard evaluation context with popular supervised methods for feature selection and weighting in a traditional bag-of-words model. Except for some unbalanced data contexts, our experiments indicated that ANN produce results superior or at least comparable to those of SVM. In particular, on the benchmark dataset of movie reviews, ANN outperformed SVM by a statistically significant difference, even in the context of unbalanced data. Our results have also confirmed some potential limitations of both models, which have rarely been discussed in the sentiment classification literature, such as the computational cost of SVM at running time and of ANN at training time.

616 citations

Journal Article•DOI•

More than words: Social networks' text mining for consumer brand sentiments

Mohamed M. Mostafa

01 Aug 2013-Expert Systems With Applications

TL;DR: This study uses a random sample of 3516 tweets to evaluate consumers' sentiment towards well-known brands such as Nokia, T-Mobile, IBM, KLM and DHL and indicates a generally positive consumer sentiment towards several famous brands.

Abstract: Blogs and social networks have recently become a valuable resource for mining sentiments in fields as diverse as customer relationship management, public opinion tracking and text filtering. In fact knowledge obtained from social networks such as Twitter and Facebook has been shown to be extremely valuable to marketing research companies, public opinion organizations and other text mining entities. However, Web texts have been classified as noisy as they represent considerable problems both at the lexical and the syntactic levels. In this research we used a random sample of 3516 tweets to evaluate consumers' sentiment towards well-known brands such as Nokia, T-Mobile, IBM, KLM and DHL. We used an expert-predefined lexicon including around 6800 seed adjectives with known orientation to conduct the analysis. Our results indicate a generally positive consumer sentiment towards several famous brands. By using both a qualitative and quantitative methodology to analyze brands' tweets, this study adds breadth and depth to the debate over attitudes towards cosmopolitan brands.
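
The lexicon-based analysis described in this abstract can be illustrated with a minimal sketch. The six-word lexicon, the `polarity` function, and the sum-of-orientations rule below are assumptions made for illustration; they are not the roughly 6800-adjective lexicon or the exact procedure used in the study.

```python
import re

# Hypothetical toy lexicon: +1 for positive orientation, -1 for negative.
# The study's lexicon of around 6800 seed adjectives is NOT reproduced here.
LEXICON = {
    "great": 1, "reliable": 1, "happy": 1,
    "slow": -1, "terrible": -1, "broken": -1,
}

def polarity(text: str) -> str:
    """Label a text by summing the orientations of the lexicon words it contains."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(LEXICON.get(tok, 0) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

labels = [polarity(t) for t in (
    "Great phone, reliable battery",
    "Terrible and slow delivery",
    "Package arrived today",
)]
```

A real system would also need to handle negation, slang, and misspellings, which the abstract notes make Web texts noisy.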

576 citations

Posted Content•

NRC-Canada: Building the State-of-the-Art in Sentiment Analysis of Tweets

Saif M. Mohammad¹, Svetlana Kiritchenko¹, Xiaodan Zhu¹•Institutions (1)

National Research Council¹

28 Aug 2013-arXiv: Computation and Language

TL;DR: This paper describes the creation of two state-of-the-art SVM classifiers, one to detect the sentiment of messages such as tweets and SMS (message-level task) and one to detect the sentiment of a term within a message (term-level task).

Abstract: In this paper, we describe how we created two state-of-the-art SVM classifiers, one to detect the sentiment of messages such as tweets and SMS (message-level task) and one to detect the sentiment of a term within a message (term-level task). Among submissions from 44 teams in a competition, our submissions stood first in both tasks on tweets, obtaining an F-score of 69.02 in the message-level task and 88.93 in the term-level task. We implemented a variety of surface-form, semantic, and sentiment features. We also generated two large word-sentiment association lexicons, one from tweets with sentiment-word hashtags, and one from tweets with emoticons. In the message-level task, the lexicon-based features provided a gain of 5 F-score points over all others. Both of our systems can be replicated using freely available resources.

528 citations

Proceedings Article•

SemEval-2013 Task 2: Sentiment Analysis in Twitter

Preslav Nakov¹, Sara Rosenthal², Zornitsa Kozareva³, Veselin Stoyanov⁴, Alan Ritter⁵, Theresa Wilson⁴•Institutions (5)

Qatar Foundation¹, Columbia University², Information Sciences Institute³, Johns Hopkins University⁴, University of Washington⁵

01 Jun 2013

TL;DR: SemEval-2013 Task 2: Sentiment Analysis in Twitter included two subtasks: A, an expression-level subtask, and B, a message-level subtask.

Abstract: In recent years, sentiment analysis in social media has attracted a lot of research interest and has been used for a number of applications. Unfortunately, research has been hindered by the lack of suitable datasets, complicating the comparison between approaches. To address this issue, we have proposed SemEval-2013 Task 2: Sentiment Analysis in Twitter, which included two subtasks: A, an expression-level subtask, and B, a message-level subtask. We used crowdsourcing on Amazon Mechanical Turk to label a large Twitter training dataset along with additional test sets of Twitter and SMS messages for both subtasks. All datasets used in the evaluation are released to the research community. The task attracted significant interest and a total of 149 submissions from 44 teams. The best-performing team achieved an F1 of 88.9% and 69% for subtasks A and B, respectively.

483 citations

Journal Article•DOI•

The Role of Text Pre-processing in Sentiment Analysis

Emma Haddi¹, Xiaohui Liu¹, Yong Shi²•Institutions (2)

Brunel University London¹, Chinese Academy of Sciences²

01 Jan 2013-Procedia Computer Science

TL;DR: The role of text pre-processing in sentiment analysis is explored, and it is demonstrated that with appropriate feature selection and representation, sentiment analysis accuracies using support vector machines (SVM) in this area may be significantly improved.

458 citations

Journal Article•DOI•

Twitter brand sentiment analysis: A hybrid system using n-gram analysis and dynamic artificial neural network

M. Ghiassi¹, J. Skinner², David Zimbra¹•Institutions (2)

Santa Clara University¹, Amazon.com²

01 Nov 2013-Expert Systems With Applications

TL;DR: This research introduces an approach to supervised feature reduction using n-grams and statistical analysis to develop a Twitter-specific lexicon for sentiment analysis, and develops sentiment classification models using this reduced lexicon and the DAN2 machine learning approach, which has demonstrated success in other text classification problems.

Abstract: Twitter messages are increasingly used to determine consumer sentiment towards a brand. The existing literature on Twitter sentiment analysis uses various feature sets and methods, many of which are adapted from more traditional text classification problems. In this research, we introduce an approach to supervised feature reduction using n-grams and statistical analysis to develop a Twitter-specific lexicon for sentiment analysis. We augment this reduced Twitter-specific lexicon with brand-specific terms for brand-related tweets. We show that the reduced lexicon set, while significantly smaller (only 187 features), reduces modeling complexity, maintains a high degree of coverage over our Twitter corpus, and yields improved sentiment classification accuracy. To demonstrate the effectiveness of the devised Twitter-specific lexicon compared to a traditional sentiment lexicon, we develop comparable sentiment classification models using SVM. We show that the Twitter-specific lexicon is significantly more effective in terms of classification recall and accuracy metrics. We then develop sentiment classification models using the Twitter-specific lexicon and the DAN2 machine learning approach, which has demonstrated success in other text classification problems. We show that DAN2 produces more accurate sentiment classification results than SVM while using the same Twitter-specific lexicon.

Journal Article•DOI•

The impact of social and conventional media on firm equity value: A sentiment analysis approach

Yang Yu¹, Wenjing Duan², Qing Cao¹•Institutions (2)

Texas Tech University¹, George Washington University²

01 Nov 2013

TL;DR: The findings suggest that overall social media has a stronger relationship with firm stock performance than conventional media while social and conventional media have a strong interaction effect on stock performance.

Abstract: This study aims to investigate the effect of social media and conventional media, their relative importance, and their interrelatedness on short-term firm stock market performance. We use a novel and large-scale dataset that features daily media content across various conventional media and social media outlets for 824 publicly traded firms across 6 industries. Social media outlets include blogs, forums, and Twitter. Conventional media includes major newspapers, television broadcasting companies, and business magazines. We apply an advanced sentiment analysis technique that goes beyond the number of mentions (counts) to analyze the overall sentiment of each media resource toward a specific company on a daily basis. We use stock return and risk as the indicators of companies' short-term performance. Our findings suggest that overall social media has a stronger relationship with firm stock performance than conventional media, while social and conventional media have a strong interaction effect on stock performance. More interestingly, we find that the impact of different types of social media varies significantly. Different types of social media also interrelate with conventional media to influence stock movement in various directions and degrees. Our study is among the first to examine the effect of multiple sources of social media along with the effect of conventional media and to investigate their relative importance and their interrelatedness. Our findings suggest the importance for firms to differentiate and leverage the unique impact of various sources of media outlets in implementing their social media marketing strategies.

Journal Article•DOI•

A multidimensional approach for detecting irony in Twitter

Antonio Reyes¹, Paolo Rosso¹, Tony Veale²•Institutions (2)

Polytechnic University of Valencia¹, University College Dublin²

01 Mar 2013

TL;DR: A new model of irony detection is constructed and assessed along two dimensions: representativeness and relevance. Initial results are largely positive and provide valuable insights into the figurative issues facing tasks such as sentiment analysis, assessment of online reputations, or decision making.

Abstract: Irony is a pervasive aspect of many online texts, one made all the more difficult by the absence of face-to-face contact and vocal intonation. As our media increasingly become more social, the problem of irony detection will become even more pressing. We describe here a set of textual features for recognizing irony at a linguistic level, especially in short texts created via social media such as Twitter postings or "tweets". Our experiments concern four freely available data sets that were retrieved from Twitter using content words (e.g. "Toyota") and user-generated tags (e.g. "#irony"). We construct a new model of irony detection that is assessed along two dimensions: representativeness and relevance. Initial results are largely positive, and provide valuable insights into the figurative issues facing tasks such as sentiment analysis, assessment of online reputations, or decision making.

Proceedings Article•DOI•

Unsupervised sentiment analysis with emotional signals

Xia Hu¹, Jiliang Tang¹, Huiji Gao¹, Huan Liu¹•Institutions (1)

Arizona State University¹

13 May 2013

TL;DR: This work investigates whether the signals in social media can potentially help sentiment analysis by providing a unified way to model two main categories of emotional signals, i.e., emotion indication and emotion correlation and incorporates the signals into an unsupervised learning framework for sentiment analysis.

Abstract: The explosion of social media services presents a great opportunity to understand the sentiment of the public via analyzing its large-scale and opinion-rich data. In social media, it is easy to amass vast quantities of unlabeled data, but very costly to obtain sentiment labels, which makes unsupervised sentiment analysis essential for various applications. It is challenging for traditional lexicon-based unsupervised methods due to the fact that expressions in social media are unstructured, informal, and fast-evolving. Emoticons and product ratings are examples of emotional signals that are associated with sentiments expressed in posts or words. Inspired by the wide availability of emotional signals in social media, we propose to study the problem of unsupervised sentiment analysis with emotional signals. In particular, we investigate whether the signals can potentially help sentiment analysis by providing a unified way to model two main categories of emotional signals, i.e., emotion indication and emotion correlation. We further incorporate the signals into an unsupervised learning framework for sentiment analysis. In the experiment, we compare the proposed framework with the state-of-the-art methods on two Twitter datasets and empirically evaluate our proposed framework to gain a deep understanding of the effects of emotional signals.

Proceedings Article•DOI•

Comparing and combining sentiment analysis methods

Pollyanna Gonçalves¹, Matheus Araújo¹, Fabrício Benevenuto¹, Meeyoung Cha²•Institutions (2)

Universidade Federal de Minas Gerais¹, KAIST²

07 Oct 2013

TL;DR: A new method is developed that combines existing approaches, providing the best coverage results and competitive agreement. A free Web service called iFeel is also presented, which provides an open API for accessing and comparing results across different sentiment methods for a given text.

Abstract: Several messages express opinions about events, products, and services, political views or even their author's emotional state and mood. Sentiment analysis has been used in several applications including analysis of the repercussions of events in social networks, analysis of opinions about products and services, and simply to better understand aspects of social communication in Online Social Networks (OSNs). There are multiple methods for measuring sentiments, including lexical-based approaches and supervised machine learning methods. Despite the wide use and popularity of some methods, it is unclear which method is better for identifying the polarity (i.e., positive or negative) of a message as the current literature does not provide a method of comparison among existing methods. Such a comparison is crucial for understanding the potential limitations, advantages, and disadvantages of popular methods in analyzing the content of OSNs messages. Our study aims at filling this gap by presenting comparisons of eight popular sentiment analysis methods in terms of coverage (i.e., the fraction of messages whose sentiment is identified) and agreement (i.e., the fraction of identified sentiments that are in tune with ground truth). We develop a new method that combines existing approaches, providing the best coverage results and competitive agreement. We also present a free Web service called iFeel, which provides an open API for accessing and comparing results across different sentiment methods for a given text.
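
The two evaluation measures defined in this abstract, coverage and agreement, can be sketched as follows. The function names, the use of `None` to mark messages for which a method identifies no sentiment, and the four-message sample are assumptions for illustration, not the authors' code or data.

```python
from typing import Optional, Sequence

def coverage(predictions: Sequence[Optional[str]]) -> float:
    """Fraction of messages whose sentiment the method identified at all."""
    if not predictions:
        return 0.0
    identified = sum(1 for p in predictions if p is not None)
    return identified / len(predictions)

def agreement(predictions: Sequence[Optional[str]], gold: Sequence[str]) -> float:
    """Among identified sentiments, the fraction that matches ground truth."""
    pairs = [(p, g) for p, g in zip(predictions, gold) if p is not None]
    if not pairs:
        return 0.0
    return sum(1 for p, g in pairs if p == g) / len(pairs)

# Hypothetical output of one method on four messages (None = not identified).
preds = ["positive", None, "negative", "positive"]
gold = ["positive", "negative", "positive", "positive"]

cov = coverage(preds)         # 3 of 4 messages identified
agr = agreement(preds, gold)  # 2 of the 3 identified labels match
```

Reporting both numbers matters because a method can reach high agreement simply by abstaining on hard messages, which lowers its coverage.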

Proceedings Article•DOI•

Exploiting social relations for sentiment analysis in microblogging

Xia Hu¹, Lei Tang², Jiliang Tang¹, Huan Liu¹•Institutions (2)

Arizona State University¹, Walmart Labs²

04 Feb 2013

TL;DR: This work proposes a Sociological Approach to handling Noisy and short Texts (SANT) for sentiment classification and presents a mathematical optimization formulation that incorporates the sentiment consistency and emotional contagion theories into the supervised learning process.

Abstract: Microblogging, like Twitter and Sina Weibo, has become a popular platform of human expressions, through which users can easily produce content on breaking news, public events, or products. The massive amount of microblogging data is a useful and timely source that carries mass sentiment and opinions on various topics. Existing sentiment analysis approaches often assume that texts are independent and identically distributed (i.i.d.), usually focusing on building a sophisticated feature space to handle noisy and short texts, without taking advantage of the fact that the microblogs are networked data. Inspired by the social sciences findings that sentiment consistency and emotional contagion are observed in social networks, we investigate whether social relations can help sentiment analysis by proposing a Sociological Approach to handling Noisy and short Texts (SANT) for sentiment classification. In particular, we present a mathematical optimization formulation that incorporates the sentiment consistency and emotional contagion theories into the supervised learning process; and utilize sparse learning to tackle noisy texts in microblogging. An empirical study of two real-world Twitter datasets shows the superior performance of our framework in handling noisy and short tweets.

Proceedings Article•DOI•

Sentiment analysis in twitter using machine learning techniques

M. S. Neethu¹, R. Rajasree¹•Institutions (1)

College of Engineering, Trivandrum¹

04 Jul 2013

TL;DR: A new feature vector is presented for classifying tweets as positive or negative and for extracting people's opinions about products using a machine learning approach.

Abstract: Sentiment analysis deals with identifying and classifying opinions or sentiments expressed in source text. Social media is generating a vast amount of sentiment-rich data in the form of tweets, status updates, blog posts, etc. Sentiment analysis of this user-generated data is very useful in knowing the opinion of the crowd. Twitter sentiment analysis is difficult compared to general sentiment analysis due to the presence of slang words and misspellings. The maximum number of characters allowed in a tweet is 140. The knowledge-base approach and the machine learning approach are the two strategies used for analyzing sentiments from text. In this paper, we analyze Twitter posts about electronic products such as mobiles and laptops using a machine learning approach. By doing sentiment analysis in a specific domain, it is possible to identify the effect of domain information on sentiment classification. We present a new feature vector for classifying tweets as positive or negative and for extracting people's opinions about products.

Journal Article•DOI•

Ontology-based sentiment analysis of twitter posts

Efstratios Kontopoulos¹, Christos Berberidis¹, Theologos Dergiades¹, Nick Bassiliades²•Institutions (2)

International Hellenic University¹, Aristotle University of Thessaloniki²

01 Aug 2013-Expert Systems With Applications

TL;DR: This paper proposes the deployment of original ontology-based techniques towards a more efficient sentiment analysis of Twitter posts, where posts are not simply characterized by a sentiment score, as is the case with machine learning-based classifiers, but instead receive a sentiment grade for each distinct notion in the post.

Abstract: The emergence of Web 2.0 has drastically altered the way users perceive the Internet, by improving information sharing, collaboration and interoperability. Micro-blogging is one of the most popular Web 2.0 applications and related services, like Twitter, have evolved into a practical means for sharing opinions on almost all aspects of everyday life. Consequently, micro-blogging web sites have since become rich data sources for opinion mining and sentiment analysis. Towards this direction, text-based sentiment classifiers often prove inefficient, since tweets typically do not consist of representative and syntactically consistent words, due to the imposed character limit. This paper proposes the deployment of original ontology-based techniques towards a more efficient sentiment analysis of Twitter posts. The novelty of the proposed approach is that posts are not simply characterized by a sentiment score, as is the case with machine learning-based classifiers, but instead receive a sentiment grade for each distinct notion in the post. Overall, our proposed architecture results in a more detailed analysis of post opinions regarding a specific topic.

Journal Article•DOI•

YouTube Movie Reviews: Sentiment Analysis in an Audio-Visual Context

Martin Wöllmer, Felix Weninger, T. Knaup, Bjoern Schuller, Congkai Sun¹, Kenji Sagae², L-P Morency²•Institutions (2)

Shanghai Jiao Tong University¹, University of Southern California²

01 May 2013-IEEE Intelligent Systems

TL;DR: Experimental results indicate that training on written movie reviews is a promising alternative to exclusively using (spoken) in-domain data for building a system that analyzes spoken movie review videos, and that language-independent audio-visual analysis can compete with linguistic analysis.

Abstract: This work focuses on automatically analyzing a speaker's sentiment in online videos containing movie reviews. In addition to textual information, this approach considers adding audio features as typically used in speech-based emotion recognition as well as video features encoding valuable valence information conveyed by the speaker. Experimental results indicate that training on written movie reviews is a promising alternative to exclusively using (spoken) in-domain data for building a system that analyzes spoken movie review videos, and that language-independent audio-visual analysis can compete with linguistic analysis.

Journal Article•DOI•

Using Twitter to Examine Smoking Behavior and Perceptions of Emerging Tobacco Products

Mark Myslín¹, Shu-Hong Zhu¹, Wendy W. Chapman¹, Mike Conway¹•Institutions (1)

University of California, San Diego¹

29 Aug 2013-Journal of Medical Internet Research

TL;DR: Novel insights available through Twitter for tobacco surveillance are attested through the high prevalence of positive sentiment, correlated in complex ways with social image, personal experience, and recently popular products such as hookah and electronic cigarettes.

Abstract: Background: Social media platforms such as Twitter are rapidly becoming key resources for public health surveillance applications, yet little is known about Twitter users’ levels of informedness and sentiment toward tobacco, especially with regard to the emerging tobacco control challenges posed by hookah and electronic cigarettes. Objective: To develop a content and sentiment analysis of tobacco-related Twitter posts and build machine learning classifiers to detect tobacco-relevant posts and sentiment towards tobacco, with a particular focus on new and emerging products like hookah and electronic cigarettes. Methods: We collected 7362 tobacco-related Twitter posts at 15-day intervals from December 2011 to July 2012. Each tweet was manually classified using a triaxial scheme, capturing genre, theme, and sentiment. Using the collected data, machine-learning classifiers were trained to detect tobacco-related vs irrelevant tweets as well as positive vs negative sentiment, using Naive Bayes, k-nearest neighbors, and Support Vector Machine (SVM) algorithms. Finally, phi contingency coefficients were computed between each of the categories to discover emergent patterns. Results: The most prevalent genres were first- and second-hand experience and opinion, and the most frequent themes were hookah, cessation, and pleasure. Sentiment toward tobacco was overall more positive (1939/4215, 46% of tweets) than negative (1349/4215, 32%) or neutral among tweets mentioning it, even excluding the 9% of tweets categorized as marketing. Three separate metrics converged to support an emergent distinction between, on one hand, hookah and electronic cigarettes corresponding to positive sentiment, and on the other hand, traditional tobacco products and more general references corresponding to negative sentiment. 
These metrics included correlations between categories in the annotation scheme (phi hookah-positive = 0.39; phi e-cigs-positive = 0.19); correlations between search keywords and sentiment (χ²₄ = 414.50, P < .001, Cramér's V = 0.36); and the most discriminating unigram features for positive and negative sentiment ranked by log odds ratio in the machine learning component of the study. In the automated classification tasks, SVMs using a relatively small number of unigram features (500) achieved best performance in discriminating tobacco-related from unrelated tweets (F score = 0.85). Conclusions: Novel insights available through Twitter for tobacco surveillance are attested through the high prevalence of positive sentiment. This positive sentiment is correlated in complex ways with social image, personal experience, and recently popular products such as hookah and electronic cigarettes. Several apparent perceptual disconnects between these products and their health effects suggest opportunities for tobacco control education. Finally, machine classification of tobacco-related posts shows a promising edge over strictly keyword-based approaches, yielding an improved signal-to-noise ratio in Twitter data and paving the way for automated tobacco surveillance applications. [J Med Internet Res 2013;15(8):e174]
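
The phi coefficients reported in this abstract measure association between two binary categories, for example whether a tweet mentions hookah and whether it is labeled positive. A minimal sketch of the standard 2x2 phi formula follows; the function name and the cell counts are invented for illustration and are not taken from the study.

```python
import math

def phi_coefficient(a: int, b: int, c: int, d: int) -> float:
    """Phi for a 2x2 contingency table laid out as:

                    positive   not positive
    mentions X         a            b
    no mention         c            d
    """
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        return 0.0  # a degenerate table (an empty row or column) has no defined phi
    return (a * d - b * c) / denom

# Hypothetical counts, NOT the study's data.
phi = phi_coefficient(30, 10, 20, 40)
```

Phi ranges from -1 to 1, with 0 indicating no association between the two categories.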

Evaluation Datasets for Twitter Sentiment Analysis: A survey and a new dataset, the STS-Gold.

Hassan Saif, Miriam Fernandez, Yulan He, Harith Alani

01 Jan 2013

TL;DR: A comparative study of the various datasets along several dimensions including: total number of tweets, vocabulary size and sparsity is provided and the pair-wise correlation among these dimensions as well as their correlations to the sentiment classification performance on different datasets are investigated.

Abstract: Sentiment analysis over Twitter offers organisations and individuals a fast and effective way to monitor the public's feelings towards them and their competitors. To assess the performance of sentiment analysis methods over Twitter, a small set of evaluation datasets has been released in the last few years. In this paper we present an overview of eight publicly available and manually annotated evaluation datasets for Twitter sentiment analysis. Based on this review, we show that a common limitation of most of these datasets, when assessing sentiment analysis at target (entity) level, is the lack of distinctive sentiment annotations among the tweets and the entities contained in them. For example, the tweet "I love iPhone, but I hate iPad" can be annotated with a mixed sentiment label, but the entity iPhone within this tweet should be annotated with a positive sentiment label. Aiming to overcome this limitation, and to complement current evaluation datasets, we present STS-Gold, a new evaluation dataset where tweets and targets (entities) are annotated individually and therefore may present different sentiment labels. This paper also provides a comparative study of the various datasets along several dimensions including: total number of tweets, vocabulary size and sparsity. We also investigate the pair-wise correlation among these dimensions as well as their correlations to the sentiment classification performance on different datasets.
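
The distinction between tweet-level and entity-level labels can be sketched as a small data structure. This is an illustrative sketch only: the class, field, and function names are hypothetical and do not reflect the actual STS-Gold file format, and the "negative" label for iPad is inferred from the example tweet rather than stated in the abstract.

```python
from dataclasses import dataclass, field

@dataclass
class AnnotatedTweet:
    text: str
    tweet_label: str                                    # label for the tweet as a whole
    entity_labels: dict = field(default_factory=dict)   # entity -> its own label

def entities_differing_from_tweet(tweet):
    """Return the entities whose label differs from the tweet-level label."""
    return {
        entity: label
        for entity, label in tweet.entity_labels.items()
        if label != tweet.tweet_label
    }

tweet = AnnotatedTweet(
    text="I love iPhone, but I hate iPad",
    tweet_label="mixed",
    entity_labels={"iPhone": "positive", "iPad": "negative"},
)
print(entities_differing_from_tweet(tweet))
```

A dataset with only one label per tweet cannot express this case, which is the limitation the abstract describes.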

Proceedings Article•DOI•

Arabic sentiment analysis: Lexicon-based and corpus-based

Nawaf A. Abdulla¹, Nizar A. Ahmed¹, Mohammed A. Shehab¹, Mahmoud Al-Ayyoub¹•Institutions (1)

Jordan University of Science and Technology¹

01 Dec 2013

TL;DR: This paper addresses both the corpus-based and lexicon-based approaches to SA for the Arabic language: it starts by building a manually annotated dataset and then takes the reader through the detailed steps of building the lexicon.

Abstract: The emergence of Web 2.0 technology generated a massive amount of raw data by enabling Internet users to post their opinions, reviews, and comments on the web. Processing this raw data to extract useful information can be a very challenging task. An example of important information that can be automatically extracted from users' posts and comments is their opinions on different issues, events, services, products, etc. This problem of Sentiment Analysis (SA) has been well studied for the English language, and two main approaches have been devised: corpus-based and lexicon-based. This paper addresses both approaches to SA for the Arabic language. Since there is a limited number of publicly available Arabic datasets and Arabic lexicons for SA, this paper starts by building a manually annotated dataset and then takes the reader through the detailed steps of building the lexicon. Experiments are conducted throughout the different stages of this process to observe the improvements gained in the accuracy of the system and to compare them to the corpus-based approach.
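
The lexicon-based approach named in this abstract can be illustrated with a toy scorer. This is a minimal sketch under stated assumptions, not the paper's method: the word lists are tiny, hypothetical, and in English rather than Arabic, and the one-token negation rule is an assumption added for illustration.

```python
# Hypothetical toy lexicon; a real lexicon would be far larger.
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "hate"}
NEGATORS = {"not", "never", "no"}

def lexicon_polarity(text):
    """Sum word polarities; a negator flips the polarity of the next token."""
    score = 0
    negate = False
    for raw in text.lower().split():
        token = raw.strip(".,!?;:")
        if token in NEGATORS:
            negate = True
            continue
        value = (token in POSITIVE) - (token in NEGATIVE)  # +1, -1, or 0
        score += -value if negate else value
        negate = False  # negation only reaches the immediately following token
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(lexicon_polarity("The service was great!"))  # positive
print(lexicon_polarity("This is not good."))       # negative
```

A corpus-based approach would instead train a classifier on labeled examples, which is the comparison the abstract refers to.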

Journal Article•DOI•

Cross-Domain Sentiment Classification Using a Sentiment Sensitive Thesaurus

Danushka Bollegala¹, David J. Weir², John M. Carroll²•Institutions (2)

University of Tokyo¹, University of Sussex²

01 Aug 2013-IEEE Transactions on Knowledge and Data Engineering

TL;DR: The proposed method significantly outperforms numerous baselines and returns results that are comparable with previously proposed cross-domain sentiment classification methods on a benchmark data set containing Amazon user reviews for different types of products.

Abstract: Automatic classification of sentiment is important for numerous applications such as opinion mining, opinion summarization, contextual advertising, and market analysis. Typically, sentiment classification has been modeled as the problem of training a binary classifier using reviews annotated for positive or negative sentiment. However, sentiment is expressed differently in different domains, and annotating corpora for every possible domain of interest is costly. Applying a sentiment classifier trained using labeled data for a particular domain to classify sentiment of user reviews on a different domain often results in poor performance because words that occur in the train (source) domain might not appear in the test (target) domain. We propose a method to overcome this problem in cross-domain sentiment classification. First, we create a sentiment sensitive distributional thesaurus using labeled data for the source domains and unlabeled data for both source and target domains. Sentiment sensitivity is achieved in the thesaurus by incorporating document level sentiment labels in the context vectors used as the basis for measuring the distributional similarity between words. Next, we use the created thesaurus to expand feature vectors during train and test times in a binary classifier. The proposed method significantly outperforms numerous baselines and returns results that are comparable with previously proposed cross-domain sentiment classification methods on a benchmark data set containing Amazon user reviews for different types of products. We conduct an extensive empirical analysis of the proposed method on single- and multisource domain adaptation, unsupervised and supervised domain adaptation, and numerous similarity measures for creating the sentiment sensitive thesaurus. 
Moreover, our comparisons against the SentiWordNet, a lexical resource for word polarity, show that the created sentiment-sensitive thesaurus accurately captures words that express similar sentiments.

Journal Article•DOI•

CSR communication strategies for organizational legitimacy in social media

Elanor Colleoni¹•Institutions (1)

Copenhagen Business School¹

10 Nov 2013-Corporate Communications: An International Journal

TL;DR: In this paper, the authors investigate which corporate communication strategy adopted in online social media is more effective to create convergence between corporations' corporate social responsibility (CSR) agenda and stakeholders' social expectations, and thereby, to increase corporate legitimacy.

Abstract: Purpose – Organization legitimacy is a general reflection of the relationship between an organization and its environment. By adopting an institutional approach and defining moral legitimacy as “a positive normative evaluation of the organization and its activities”, the goal of this paper is to investigate which corporate communication strategy adopted in online social media is more effective to create convergence between corporations' corporate social responsibility (CSR) agenda and stakeholders' social expectations, and thereby, to increase corporate legitimacy. Design/methodology/approach – Using the entire Twitter social graph, a network analysis was carried out to study the structural properties of the CSR community, such as the level of reciprocity, and advanced data mining techniques, i.e. topic and sentiment analysis, were carried out to investigate the communication dynamics. Findings – Evidence was found that neither the engaging nor the information strategies lead to alignment. The assumption of...

Journal Article•DOI•

Harnessing the cloud of patient experience: using social media to detect poor quality healthcare

Felix Greaves¹, Daniel Ramirez-Cano¹, Christopher Millett¹, Ara Darzi¹, Liam Donaldson¹•Institutions (1)

Imperial College London¹

01 Mar 2013-BMJ Quality & Safety

TL;DR: This commentary outlines the ways in which the collection and aggregation of patients’ descriptions of their experiences on the internet could be used to detect poor clinical care and suggests using the techniques of natural language processing and sentiment analysis to transform unstructured descriptions of patient experience on the internet into usable measures of healthcare performance.

Abstract: Recent years have seen increasing interest in patient-centred care and calls to focus on improving the patient experience. At the same time, a growing number of patients are using the internet to describe their experiences of healthcare. We believe the increasing availability of patients’ accounts of their care on blogs, social networks, Twitter and hospital review sites presents an intriguing opportunity to advance the patient-centred care agenda and provide novel quality of care data. We describe this concept as a ‘cloud of patient experience’. In this commentary, we outline the ways in which the collection and aggregation of patients’ descriptions of their experiences on the internet could be used to detect poor clinical care. Over time, such an approach could also identify excellence and allow it to be built on. We suggest using the techniques of natural language processing and sentiment analysis to transform unstructured descriptions of patient experience on the internet into usable measures of healthcare performance. We consider the various sources of information that could be used, the limitations of the approach and discuss whether these new techniques could detect poor performance before conventional measures of healthcare quality.

Journal Article•DOI•

Use of Sentiment Analysis for Capturing Patient Experience From Free-Text Comments Posted Online

Felix Greaves¹, Daniel Ramirez-Cano, Christopher Millett, Ara Darzi, Liam Donaldson•Institutions (1)

Imperial College London¹

01 Nov 2013-Journal of Medical Internet Research

TL;DR: This paper used machine learning techniques to predict whether a patient would recommend a hospital, whether the hospital was clean, and whether they were treated with dignity from their free-text description, compared to the patient's own quantitative rating of their care.

Abstract: Background: There are large amounts of unstructured, free-text information about quality of health care available on the Internet in blogs, social networks, and on physician rating websites that are not captured in a systematic way. New analytical techniques, such as sentiment analysis, may allow us to understand and use this information more effectively to improve the quality of health care. Objective: We attempted to use machine learning to understand patients’ unstructured comments about their care. We used sentiment analysis techniques to categorize online free-text comments by patients as either positive or negative descriptions of their health care. We tried to automatically predict whether a patient would recommend a hospital, whether the hospital was clean, and whether they were treated with dignity from their free-text description, compared to the patient’s own quantitative rating of their care. Methods: We applied machine learning techniques to all 6412 online comments about hospitals on the English National Health Service website in 2010 using Weka data-mining software. We also compared the results obtained from sentiment analysis with the paper-based national inpatient survey results at the hospital level using Spearman rank correlation for all 161 acute adult hospital trusts in England. Results: There was 81%, 84%, and 89% agreement between quantitative ratings of care and those derived from free-text comments using sentiment analysis for cleanliness, being treated with dignity, and overall recommendation of hospital respectively (kappa scores: .40–.74, P<.001 for all). We observed mild to moderate associations between our machine learning predictions and responses to the large patient survey for the three categories examined (Spearman rho 0.37–0.51, P<.001 for all). Conclusions: The prediction accuracy that we have achieved using this machine learning process suggests that we are able to predict, from free-text, a reasonably accurate assessment of patients’ opinion about different performance aspects of a hospital and that these machine learning predictions are associated with results of more conventional surveys. [J Med Internet Res 2013;15(11):e239]
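
The abstract reports both raw agreement and kappa scores, which differ because kappa corrects for agreement expected by chance. The following is a minimal sketch of Cohen's kappa for two label sequences, assuming this is the kappa variant intended; the function name and the ten example labels are invented for illustration and are not the study's data.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both raters pick the same class independently.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both sequences use a single identical class
    return (observed - expected) / (1 - expected)

# Invented labels: patient ratings vs. labels predicted from free text.
rated     = ["pos"] * 6 + ["neg"] * 4
predicted = ["pos"] * 5 + ["neg"] * 4 + ["pos"]
print(round(cohens_kappa(rated, predicted), 3))  # 0.583, with 80% raw agreement
```

In this toy case 80% raw agreement corresponds to a kappa of about 0.58, which shows why the two figures in the abstract are not directly comparable.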

Journal Article•DOI•

Multi-aspect sentiment analysis for Chinese online social reviews based on topic modeling and HowNet lexicon

Fu Xianghua¹, Liu Guo¹, Guo Yanyan¹, Wang Zhiqiang¹•Institutions (1)

Shenzhen University¹

01 Jan 2013-Knowledge Based Systems

TL;DR: This paper proposes an unsupervised approach to automatically discover the aspects discussed in Chinese social reviews and also the sentiments expressed in different aspects, and applies the Latent Dirichlet Allocation model to discover multi-aspect global topics of social reviews.

Abstract: User-generated reviews on the Web reflect users' sentiment about products, services and social events. Existing research mostly focuses on the sentiment classification of product and service reviews at the document level. Reviews of social events such as economic and political activities, which are called social reviews, have specific characteristics different from those of product and service reviews. In this paper, we propose an unsupervised approach to automatically discover the aspects discussed in Chinese social reviews and also the sentiments expressed in different aspects. The approach is called Multi-aspect Sentiment Analysis for Chinese Online Social Reviews (MSA-COSRs). We first apply the Latent Dirichlet Allocation (LDA) model to discover multi-aspect global topics of social reviews, and then extract the local topic and associated sentiment based on a sliding window context over the review text. The aspect of the local topic is identified by a trained LDA model, and the polarity of the associated sentiment is classified by the HowNet lexicon. The experimental results show that MSA-COSR can not only obtain good topic partitioning results, but also help to improve sentiment analysis accuracy. It helps to simultaneously discover multi-aspect fine-grained topics and associated sentiment.

Proceedings Article•

Exploiting Topic based Twitter Sentiment for Stock Prediction

Jianfeng Si¹, Arjun Mukherjee², Bing Liu², Qing Li¹, Huayi Li², Xiaotie Deng³•Institutions (3)

City University of Hong Kong¹, University of Illinois at Chicago², Shanghai Jiao Tong University³

01 Jan 2013

TL;DR: This paper proposes a technique to leverage topic based sentiments from Twitter to help predict the stock market by utilizing a continuous Dirichlet Process Mixture model to learn the daily topic set and regress the stock index and the Twitter sentiment time series to predict the market.

Abstract: This paper proposes a technique to leverage topic based sentiments from Twitter to help predict the stock market. We first utilize a continuous Dirichlet Process Mixture model to learn the daily topic set. Then, for each topic we derive its sentiment according to its opinion words distribution to build a sentiment time series. We then regress the stock index and the Twitter sentiment time series to predict the market. Experiments on real-life S&P100 Index show that our approach is effective and performs better than existing state-of-the-art non-topic based methods.

Proceedings Article•DOI•

A sentiment-enhanced personalized location recommendation system

Dingqi Yang¹, Daqing Zhang¹, Zhiyong Yu¹, Zhu Wang²•Institutions (2)

Telecom SudParis¹, Northwestern Polytechnical University²

01 May 2013

TL;DR: This research proposes a hybrid user location preference model by combining the preference extracted from check-ins and text-based tips, which are processed using sentiment analysis techniques, and develops a location based social matrix factorization algorithm that takes both user social influence and venue similarity influence into account in location recommendation.

Abstract: Although online recommendation systems such as recommendation of movies or music have been systematically studied in the past decade, location recommendation in Location Based Social Networks (LBSNs) has not yet been well investigated. In LBSNs, users can check in and leave tips commenting on a venue. These two heterogeneous data sources both describe users' preference of venues. However, current research considers only users' check-in behavior in the user location preference model; users' tips on venues are seldom investigated. Moreover, while existing work mainly considers social influence in recommendation, we argue that considering venue similarity can further improve the recommendation performance. In this research, we improve location recommendation by enhancing not only the user location preference model but also the recommendation algorithm. First, we propose a hybrid user location preference model by combining the preference extracted from check-ins and text-based tips which are processed using sentiment analysis techniques. Second, we develop a location based social matrix factorization algorithm that takes both user social influence and venue similarity influence into account in location recommendation. Using two datasets extracted from the location based social network Foursquare, experimental results demonstrate that the proposed hybrid preference model can better characterize user preference by maintaining the preference consistency, and the proposed algorithm outperforms the state-of-the-art methods.

Posted Content•

Sentiment Analysis in the News

Alexandra Balahur¹, Ralf Steinberger², Mijail Kabadjov², Vanni Zavarella², Erik van der Goot, Matina Halkia, Bruno Pouliquen³, Jenya Belyaeva⁴•Institutions (4)

University of Alicante¹, International Practical Shooting Confederation², Institute for the Protection and Security of the Citizen³, European Food Safety Authority⁴

24 Sep 2013-arXiv: Computation and Language

TL;DR: The authors identified three subtasks that need to be addressed: definition of the target, separation of the good and bad news content from the good or bad sentiment expressed on the target; and analysis of clearly marked opinion that is expressed explicitly, not needing interpretation or the use of world knowledge.

Abstract: Recent years have brought a significant growth in the volume of research in sentiment analysis, mostly on highly subjective text types (movie or product reviews). The main difference these texts have with news articles is that their target is clearly defined and unique across the text. Following different annotation efforts and the analysis of the issues encountered, we realised that news opinion mining is different from that of other text types. We identified three subtasks that need to be addressed: definition of the target; separation of the good and bad news content from the good and bad sentiment expressed on the target; and analysis of clearly marked opinion that is expressed explicitly, not needing interpretation or the use of world knowledge. Furthermore, we distinguish three different possible views on newspaper articles - author, reader and text, which have to be addressed differently at the time of analysing sentiment. Given these definitions, we present work on mining opinions about entities in English language news, in which (a) we test the relative suitability of various sentiment dictionaries and (b) we attempt to separate positive or negative opinion from good or bad news. In the experiments described here, we tested whether or not subject domain-defining vocabulary should be ignored. Results showed that this idea is more appropriate in the context of news opinion mining and that the approaches taking this into consideration produce a better performance.
