Showing papers on "Sentiment analysis" published in 2013


Journal ArticleDOI
TL;DR: The main applications and challenges of one of the hottest research areas in computer science are revealed.
Abstract: The main applications and challenges of one of the hottest research areas in computer science.

1,229 citations


Journal ArticleDOI
TL;DR: It is found that emotionally charged Twitter messages tend to be retweeted more often and more quickly compared to neutral ones, and companies should pay more attention to the analysis of sentiment related to their brands and products in social media communication as well as in designing advertising content that triggers emotions.
Abstract: As a new communication paradigm, social media has promoted information dissemination in social networks. Previous research has identified several content-related features as well as user and network characteristics that may drive information diffusion. However, little research has focused on the relationship between emotions and information diffusion in a social media setting. In this paper, we examine whether sentiment occurring in social media content is associated with a user's information sharing behavior. We carry out our research in the context of political communication on Twitter. Based on two data sets of more than 165,000 tweets in total, we find that emotionally charged Twitter messages tend to be retweeted more often and more quickly compared to neutral ones. As a practical implication, companies should pay more attention to the analysis of sentiment related to their brands and products in social media communication as well as in designing advertising content that triggers emotions.

1,146 citations


Journal ArticleDOI
TL;DR: The history, current use, and future of opinion mining and sentiment analysis are discussed, along with relevant techniques and tools.
Abstract: The Web holds valuable, vast, and unstructured information about public opinion. Here, the history, current use, and future of opinion mining and sentiment analysis are discussed, along with relevant techniques and tools.

1,042 citations


Proceedings Article
28 Aug 2013
TL;DR: This paper presents two state-of-the-art SVM classifiers, one to detect the sentiment of messages such as tweets and SMS (message-level task) and another to detect the sentiment of a term within a message (term-level task).
Abstract: In this paper, we describe how we created two state-of-the-art SVM classifiers, one to detect the sentiment of messages such as tweets and SMS (message-level task) and one to detect the sentiment of a term within a message (term-level task). Among submissions from 44 teams in a competition, our submissions stood first in both tasks on tweets, obtaining an F-score of 69.02 in the message-level task and 88.93 in the term-level task. We implemented a variety of surface-form, semantic, and sentiment features. We also generated two large word-sentiment association lexicons, one from tweets with sentiment-word hashtags, and one from tweets with emoticons. In the message-level task, the lexicon-based features provided a gain of 5 F-score points over all others. Both of our systems can be replicated using freely available resources.

854 citations


Proceedings ArticleDOI
21 Oct 2013
TL;DR: This work presents a method built upon psychological theories and web mining to automatically construct a large-scale Visual Sentiment Ontology (VSO) consisting of more than 3,000 Adjective Noun Pairs (ANP) and proposes SentiBank, a novel visual concept detector library that can be used to detect the presence of 1,200 ANPs in an image.
Abstract: We address the challenge of sentiment analysis from visual content. In contrast to existing methods which infer sentiment or emotion directly from visual low-level features, we propose a novel approach based on understanding of the visual concepts that are strongly related to sentiments. Our key contribution is two-fold: first, we present a method built upon psychological theories and web mining to automatically construct a large-scale Visual Sentiment Ontology (VSO) consisting of more than 3,000 Adjective Noun Pairs (ANP). Second, we propose SentiBank, a novel visual concept detector library that can be used to detect the presence of 1,200 ANPs in an image. The VSO and SentiBank are distinct from existing work and will open a gate towards various applications enabled by automatic sentiment analysis. Experiments on detecting sentiment of image tweets demonstrate significant improvement in detection accuracy when comparing the proposed SentiBank based predictors with the text-based approaches. The effort also leads to a large publicly available resource consisting of a visual sentiment ontology, a large detector library, and the training/testing benchmark for visual sentiment analysis.

692 citations


Journal ArticleDOI
TL;DR: An empirical comparison between SVM and ANN for document-level sentiment analysis is presented, indicating that ANN produce results superior or at least comparable to those of SVM, except in some unbalanced data contexts.
Abstract: Document-level sentiment classification aims to automate the task of classifying a textual review, which is given on a single topic, as expressing a positive or negative sentiment. In general, supervised methods consist of two stages: (i) extraction/selection of informative features and (ii) classification of reviews by using learning models like Support Vector Machines (SVM) and Naïve Bayes (NB). SVM have been extensively and successfully used as a sentiment learning approach while Artificial Neural Networks (ANN) have rarely been considered in comparative studies in the sentiment analysis literature. This paper presents an empirical comparison between SVM and ANN regarding document-level sentiment analysis. We discuss requirements, resulting models and contexts in which both approaches achieve better levels of classification accuracy. We adopt a standard evaluation context with popular supervised methods for feature selection and weighting in a traditional bag-of-words model. Except for some unbalanced data contexts, our experiments indicated that ANN produce results superior or at least comparable to those of SVM. Especially on the benchmark dataset of movie reviews, ANN outperformed SVM by a statistically significant difference, even in the context of unbalanced data. Our results have also confirmed some potential limitations of both models, which have been rarely discussed in the sentiment classification literature, like the computational cost of SVM at the running time and ANN at the training time.

616 citations


Journal ArticleDOI
TL;DR: This study uses a random sample of 3516 tweets to evaluate consumers' sentiment towards well-known brands such as Nokia, T-Mobile, IBM, KLM and DHL and indicates a generally positive consumer sentiment towards several famous brands.
Abstract: Blogs and social networks have recently become a valuable resource for mining sentiments in fields as diverse as customer relationship management, public opinion tracking and text filtering. In fact knowledge obtained from social networks such as Twitter and Facebook has been shown to be extremely valuable to marketing research companies, public opinion organizations and other text mining entities. However, Web texts have been classified as noisy as they represent considerable problems both at the lexical and the syntactic levels. In this research we used a random sample of 3516 tweets to evaluate consumers' sentiment towards well-known brands such as Nokia, T-Mobile, IBM, KLM and DHL. We used an expert-predefined lexicon including around 6800 seed adjectives with known orientation to conduct the analysis. Our results indicate a generally positive consumer sentiment towards several famous brands. By using both a qualitative and quantitative methodology to analyze brands' tweets, this study adds breadth and depth to the debate over attitudes towards cosmopolitan brands.

576 citations
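
The lexicon-based analysis described in the abstract above can be illustrated with a minimal sketch. The toy lexicon, the tokenizer, and the function name `lexicon_polarity` are hypothetical stand-ins; they are not the study's expert-predefined lexicon of around 6800 seed adjectives, nor its actual procedure.

```python
import re

# Hypothetical toy lexicon: adjective -> orientation (+1 positive, -1 negative).
# The study's real lexicon is not reproduced here.
TOY_LEXICON = {
    "great": 1, "reliable": 1, "fast": 1,
    "poor": -1, "slow": -1, "awful": -1,
}


def lexicon_polarity(text, lexicon=TOY_LEXICON):
    """Sum the orientations of known adjectives and map the total to a label."""
    # Lowercase and keep alphabetic tokens only (a deliberately crude tokenizer).
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(lexicon.get(tok, 0) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A tweet containing no lexicon words is labelled neutral here, which is one of several possible design choices; negation and sarcasm are not handled in this sketch.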


Posted Content
TL;DR: This paper describes how the authors created two state-of-the-art SVM classifiers, one to detect the sentiment of messages such as tweets and SMS (message-level task) and one to detect the sentiment of a term within a message (term-level task).
Abstract: In this paper, we describe how we created two state-of-the-art SVM classifiers, one to detect the sentiment of messages such as tweets and SMS (message-level task) and one to detect the sentiment of a term within a message (term-level task). Among submissions from 44 teams in a competition, our submissions stood first in both tasks on tweets, obtaining an F-score of 69.02 in the message-level task and 88.93 in the term-level task. We implemented a variety of surface-form, semantic, and sentiment features. We also generated two large word-sentiment association lexicons, one from tweets with sentiment-word hashtags, and one from tweets with emoticons. In the message-level task, the lexicon-based features provided a gain of 5 F-score points over all others. Both of our systems can be replicated using freely available resources.

528 citations


Proceedings Article
01 Jun 2013
TL;DR: SemEval-2013 Task 2: Sentiment Analysis in Twitter as discussed by the authors included two subtasks: A, an expression-level subtask, and B, a message-level subtask.
Abstract: In recent years, sentiment analysis in social media has attracted a lot of research interest and has been used for a number of applications. Unfortunately, research has been hindered by the lack of suitable datasets, complicating the comparison between approaches. To address this issue, we have proposed SemEval-2013 Task 2: Sentiment Analysis in Twitter, which included two subtasks: A, an expression-level subtask, and B, a message-level subtask. We used crowdsourcing on Amazon Mechanical Turk to label a large Twitter training dataset along with additional test sets of Twitter and SMS messages for both subtasks. All datasets used in the evaluation are released to the research community. The task attracted significant interest and a total of 149 submissions from 44 teams. The best-performing team achieved an F1 of 88.9% and 69% for subtasks A and B, respectively.

483 citations
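
The F1 figures quoted in the abstract above are computed from precision and recall. The sketch below shows one way such a score could be obtained for a three-class labelling task. Averaging over only the positive and negative classes is an assumption made here for illustration, since the abstract does not spell out how the F1 was aggregated; the function names are hypothetical.

```python
def f1_for_class(gold, pred, label):
    """F1 of a single class, computed from parallel lists of gold and predicted labels."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so report 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_f1(gold, pred, labels=("positive", "negative")):
    """Unweighted mean of per-class F1 over the chosen labels (assumed aggregation)."""
    return sum(f1_for_class(gold, pred, lab) for lab in labels) / len(labels)
```

Neutral messages still influence the result in this sketch, because a neutral message predicted as positive or negative counts as a false positive for that class.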


Journal ArticleDOI
TL;DR: The role of text pre-processing in sentiment analysis is explored, and it is demonstrated that with appropriate feature selection and representation, sentiment analysis accuracies using support vector machines (SVM) in this area may be significantly improved.

458 citations


Journal ArticleDOI
TL;DR: This research introduces an approach to supervised feature reduction using n-grams and statistical analysis to develop a Twitter-specific lexicon for sentiment analysis, and develops sentiment classification models using this reduced lexicon and the DAN2 machine learning approach, which has demonstrated success in other text classification problems.
Abstract: Twitter messages are increasingly used to determine consumer sentiment towards a brand. The existing literature on Twitter sentiment analysis uses various feature sets and methods, many of which are adapted from more traditional text classification problems. In this research, we introduce an approach to supervised feature reduction using n-grams and statistical analysis to develop a Twitter-specific lexicon for sentiment analysis. We augment this reduced Twitter-specific lexicon with brand-specific terms for brand-related tweets. We show that the reduced lexicon set, while significantly smaller (only 187 features), reduces modeling complexity, maintains a high degree of coverage over our Twitter corpus, and yields improved sentiment classification accuracy. To demonstrate the effectiveness of the devised Twitter-specific lexicon compared to a traditional sentiment lexicon, we develop comparable sentiment classification models using SVM. We show that the Twitter-specific lexicon is significantly more effective in terms of classification recall and accuracy metrics. We then develop sentiment classification models using the Twitter-specific lexicon and the DAN2 machine learning approach, which has demonstrated success in other text classification problems. We show that DAN2 produces more accurate sentiment classification results than SVM while using the same Twitter-specific lexicon.

Journal ArticleDOI
01 Nov 2013
TL;DR: The findings suggest that overall social media has a stronger relationship with firm stock performance than conventional media while social and conventional media have a strong interaction effect on stock performance.
Abstract: This study aims to investigate the effect of social media and conventional media, their relative importance, and their interrelatedness on short-term firm stock market performance. We use a novel and large-scale dataset that features daily media content across various conventional media and social media outlets for 824 publicly traded firms across 6 industries. Social media outlets include blogs, forums, and Twitter. Conventional media includes major newspapers, television broadcasting companies, and business magazines. We apply an advanced sentiment analysis technique that goes beyond the number of mentions (counts) to analyze the overall sentiment of each media resource toward a specific company on a daily basis. We use stock return and risk as the indicators of companies' short-term performance. Our findings suggest that overall social media has a stronger relationship with firm stock performance than conventional media while social and conventional media have a strong interaction effect on stock performance. More interestingly, we find that the impact of different types of social media varies significantly. Different types of social media also interrelate with conventional media to influence stock movement in various directions and degrees. Our study is among the first to examine the effect of multiple sources of social media along with the effect of conventional media and to investigate their relative importance and their interrelatedness. Our findings suggest the importance for firms to differentiate and leverage the unique impact of various sources of media outlets in implementing their social media marketing strategies.

Journal ArticleDOI
01 Mar 2013
TL;DR: A new model of irony detection that is assessed along two dimensions: representativeness and relevance is constructed, and initial results are largely positive, and provide valuable insights into the figurative issues facing tasks such as sentiment analysis, assessment of online reputations, or decision making.
Abstract: Irony is a pervasive aspect of many online texts, one made all the more difficult by the absence of face-to-face contact and vocal intonation. As our media increasingly become more social, the problem of irony detection will become even more pressing. We describe here a set of textual features for recognizing irony at a linguistic level, especially in short texts created via social media such as Twitter postings or "tweets". Our experiments concern four freely available data sets that were retrieved from Twitter using content words (e.g. "Toyota") and user-generated tags (e.g. "#irony"). We construct a new model of irony detection that is assessed along two dimensions: representativeness and relevance. Initial results are largely positive, and provide valuable insights into the figurative issues facing tasks such as sentiment analysis, assessment of online reputations, or decision making.

Proceedings ArticleDOI
13 May 2013
TL;DR: This work investigates whether the signals in social media can potentially help sentiment analysis by providing a unified way to model two main categories of emotional signals, i.e., emotion indication and emotion correlation and incorporates the signals into an unsupervised learning framework for sentiment analysis.
Abstract: The explosion of social media services presents a great opportunity to understand the sentiment of the public via analyzing its large-scale and opinion-rich data. In social media, it is easy to amass vast quantities of unlabeled data, but very costly to obtain sentiment labels, which makes unsupervised sentiment analysis essential for various applications. It is challenging for traditional lexicon-based unsupervised methods due to the fact that expressions in social media are unstructured, informal, and fast-evolving. Emoticons and product ratings are examples of emotional signals that are associated with sentiments expressed in posts or words. Inspired by the wide availability of emotional signals in social media, we propose to study the problem of unsupervised sentiment analysis with emotional signals. In particular, we investigate whether the signals can potentially help sentiment analysis by providing a unified way to model two main categories of emotional signals, i.e., emotion indication and emotion correlation. We further incorporate the signals into an unsupervised learning framework for sentiment analysis. In the experiment, we compare the proposed framework with the state-of-the-art methods on two Twitter datasets and empirically evaluate our proposed framework to gain a deep understanding of the effects of emotional signals.

Proceedings ArticleDOI
07 Oct 2013
TL;DR: A new method that combines existing approaches, providing the best coverage results and competitive agreement is developed and a free Web service called iFeel is presented, which provides an open API for accessing and comparing results across different sentiment methods for a given text.
Abstract: Several messages express opinions about events, products, and services, political views or even their author's emotional state and mood. Sentiment analysis has been used in several applications including analysis of the repercussions of events in social networks, analysis of opinions about products and services, and simply to better understand aspects of social communication in Online Social Networks (OSNs). There are multiple methods for measuring sentiments, including lexical-based approaches and supervised machine learning methods. Despite the wide use and popularity of some methods, it is unclear which method is better for identifying the polarity (i.e., positive or negative) of a message as the current literature does not provide a method of comparison among existing methods. Such a comparison is crucial for understanding the potential limitations, advantages, and disadvantages of popular methods in analyzing the content of OSNs messages. Our study aims at filling this gap by presenting comparisons of eight popular sentiment analysis methods in terms of coverage (i.e., the fraction of messages whose sentiment is identified) and agreement (i.e., the fraction of identified sentiments that are in tune with ground truth). We develop a new method that combines existing approaches, providing the best coverage results and competitive agreement. We also present a free Web service called iFeel, which provides an open API for accessing and comparing results across different sentiment methods for a given text.
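
The two evaluation measures defined in the abstract above, coverage and agreement, can be sketched as follows. The convention that a method returns `None` when it identifies no sentiment, and the function name `coverage_and_agreement`, are assumptions for illustration and are not taken from the paper or from the iFeel API.

```python
def coverage_and_agreement(predictions, ground_truth):
    """Compute (coverage, agreement) for one sentiment method.

    predictions: list of labels, with None where the method identified no sentiment.
    ground_truth: list of gold labels of the same length.
    """
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have the same length")
    # Keep only the messages for which the method produced a label.
    identified = [(p, g) for p, g in zip(predictions, ground_truth) if p is not None]
    # Coverage: fraction of messages whose sentiment is identified.
    coverage = len(identified) / len(predictions) if predictions else 0.0
    # Agreement: fraction of identified sentiments that match the ground truth.
    agreement = (
        sum(1 for p, g in identified if p == g) / len(identified) if identified else 0.0
    )
    return coverage, agreement
```

Reporting the two numbers separately matters because a method can reach high agreement simply by abstaining on difficult messages, which lowers its coverage.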

Proceedings ArticleDOI
04 Feb 2013
TL;DR: This work proposes a Sociological Approach to handling Noisy and short Texts (SANT) for sentiment classification and presents a mathematical optimization formulation that incorporates the sentiment consistency and emotional contagion theories into the supervised learning process.
Abstract: Microblogging, like Twitter and Sina Weibo, has become a popular platform of human expressions, through which users can easily produce content on breaking news, public events, or products. The massive amount of microblogging data is a useful and timely source that carries mass sentiment and opinions on various topics. Existing sentiment analysis approaches often assume that texts are independent and identically distributed (i.i.d.), usually focusing on building a sophisticated feature space to handle noisy and short texts, without taking advantage of the fact that the microblogs are networked data. Inspired by the social sciences findings that sentiment consistency and emotional contagion are observed in social networks, we investigate whether social relations can help sentiment analysis by proposing a Sociological Approach to handling Noisy and short Texts (SANT) for sentiment classification. In particular, we present a mathematical optimization formulation that incorporates the sentiment consistency and emotional contagion theories into the supervised learning process; and utilize sparse learning to tackle noisy texts in microblogging. An empirical study of two real-world Twitter datasets shows the superior performance of our framework in handling noisy and short tweets.

Proceedings ArticleDOI
04 Jul 2013
TL;DR: A new feature vector is presented for classifying tweets as positive or negative and for extracting people's opinions about products using a Machine Learning approach.
Abstract: Sentiment analysis deals with identifying and classifying opinions or sentiments expressed in source text. Social media is generating a vast amount of sentiment-rich data in the form of tweets, status updates, blog posts etc. Sentiment analysis of this user-generated data is very useful in knowing the opinion of the crowd. Twitter sentiment analysis is difficult compared to general sentiment analysis due to the presence of slang words and misspellings. The maximum limit of characters that are allowed in Twitter is 140. Knowledge base approach and Machine learning approach are the two strategies used for analyzing sentiments from the text. In this paper, we try to analyze the Twitter posts about electronic products like mobiles, laptops etc. using a Machine Learning approach. By doing sentiment analysis in a specific domain, it is possible to identify the effect of domain information in sentiment classification. We present a new feature vector for classifying the tweets as positive or negative and for extracting people's opinions about products.

Journal ArticleDOI
TL;DR: This paper proposes the deployment of original ontology-based techniques towards a more efficient sentiment analysis of Twitter posts, where posts are not simply characterized by a sentiment score, as is the case with machine learning-based classifiers, but instead receive a sentiment grade for each distinct notion in the post.
Abstract: The emergence of Web 2.0 has drastically altered the way users perceive the Internet, by improving information sharing, collaboration and interoperability. Micro-blogging is one of the most popular Web 2.0 applications and related services, like Twitter, have evolved into a practical means for sharing opinions on almost all aspects of everyday life. Consequently, micro-blogging web sites have since become rich data sources for opinion mining and sentiment analysis. Towards this direction, text-based sentiment classifiers often prove inefficient, since tweets typically do not consist of representative and syntactically consistent words, due to the imposed character limit. This paper proposes the deployment of original ontology-based techniques towards a more efficient sentiment analysis of Twitter posts. The novelty of the proposed approach is that posts are not simply characterized by a sentiment score, as is the case with machine learning-based classifiers, but instead receive a sentiment grade for each distinct notion in the post. Overall, our proposed architecture results in a more detailed analysis of post opinions regarding a specific topic.

Journal ArticleDOI
TL;DR: Experimental results indicate that training on written movie reviews is a promising alternative to exclusively using (spoken) in-domain data for building a system that analyzes spoken movie review videos, and that language-independent audio-visual analysis can compete with linguistic analysis.
Abstract: This work focuses on automatically analyzing a speaker's sentiment in online videos containing movie reviews. In addition to textual information, this approach considers adding audio features as typically used in speech-based emotion recognition as well as video features encoding valuable valence information conveyed by the speaker. Experimental results indicate that training on written movie reviews is a promising alternative to exclusively using (spoken) in-domain data for building a system that analyzes spoken movie review videos, and that language-independent audio-visual analysis can compete with linguistic analysis.

Journal ArticleDOI
TL;DR: Novel insights available through Twitter for tobacco surveillance are attested through the high prevalence of positive sentiment, correlated in complex ways with social image, personal experience, and recently popular products such as hookah and electronic cigarettes.
Abstract: Background: Social media platforms such as Twitter are rapidly becoming key resources for public health surveillance applications, yet little is known about Twitter users’ levels of informedness and sentiment toward tobacco, especially with regard to the emerging tobacco control challenges posed by hookah and electronic cigarettes. Objective: To develop a content and sentiment analysis of tobacco-related Twitter posts and build machine learning classifiers to detect tobacco-relevant posts and sentiment towards tobacco, with a particular focus on new and emerging products like hookah and electronic cigarettes. Methods: We collected 7362 tobacco-related Twitter posts at 15-day intervals from December 2011 to July 2012. Each tweet was manually classified using a triaxial scheme, capturing genre, theme, and sentiment. Using the collected data, machine-learning classifiers were trained to detect tobacco-related vs irrelevant tweets as well as positive vs negative sentiment, using Naive Bayes, k-nearest neighbors, and Support Vector Machine (SVM) algorithms. Finally, phi contingency coefficients were computed between each of the categories to discover emergent patterns. Results: The most prevalent genres were first- and second-hand experience and opinion, and the most frequent themes were hookah, cessation, and pleasure. Sentiment toward tobacco was overall more positive (1939/4215, 46% of tweets) than negative (1349/4215, 32%) or neutral among tweets mentioning it, even excluding the 9% of tweets categorized as marketing. Three separate metrics converged to support an emergent distinction between, on one hand, hookah and electronic cigarettes corresponding to positive sentiment, and on the other hand, traditional tobacco products and more general references corresponding to negative sentiment. These metrics included correlations between categories in the annotation scheme (phi hookah-positive = 0.39; phi e-cigs-positive = 0.19); correlations between search keywords and sentiment (χ²(4) = 414.50, P < .001, Cramér’s V = 0.36), and the most discriminating unigram features for positive and negative sentiment ranked by log odds ratio in the machine learning component of the study. In the automated classification tasks, SVMs using a relatively small number of unigram features (500) achieved best performance in discriminating tobacco-related from unrelated tweets (F score = 0.85). Conclusions: Novel insights available through Twitter for tobacco surveillance are attested through the high prevalence of positive sentiment. This positive sentiment is correlated in complex ways with social image, personal experience, and recently popular products such as hookah and electronic cigarettes. Several apparent perceptual disconnects between these products and their health effects suggest opportunities for tobacco control education. Finally, machine classification of tobacco-related posts shows a promising edge over strictly keyword-based approaches, yielding an improved signal-to-noise ratio in Twitter data and paving the way for automated tobacco surveillance applications. [J Med Internet Res 2013;15(8):e174]
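
The phi contingency coefficients mentioned in the abstract above measure the association between two binary categories, for example "mentions hookah" and "positive sentiment". A minimal sketch of the standard 2x2 formula follows; the function name and argument order are hypothetical, and the study's underlying contingency counts are not given in the abstract, so none are reproduced here.

```python
import math


def phi_coefficient(n11, n10, n01, n00):
    """Phi coefficient of a 2x2 contingency table.

    n11: items in both categories, n10: first category only,
    n01: second category only, n00: neither category.
    """
    row1, row0 = n11 + n10, n01 + n00
    col1, col0 = n11 + n01, n10 + n00
    denom = math.sqrt(row1 * row0 * col1 * col0)
    if denom == 0:
        # A row or column is empty, so the association is undefined; report 0.
        return 0.0
    return (n11 * n00 - n10 * n01) / denom


def share(count, total):
    """Percentage of `total` represented by `count`, rounded to a whole percent."""
    return round(100 * count / total)
```

The `share` helper reproduces the rounded percentages quoted in the abstract from their stated counts: 1939 of 4215 tweets and 1349 of 4215 tweets.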

01 Jan 2013
TL;DR: A comparative study of the various datasets along several dimensions including: total number of tweets, vocabulary size and sparsity is provided and the pair-wise correlation among these dimensions as well as their correlations to the sentiment classification performance on different datasets are investigated.
Abstract: Sentiment analysis over Twitter offers organisations and individuals a fast and effective way to monitor the public's feelings towards them and their competitors. To assess the performance of sentiment analysis methods over Twitter, a small set of evaluation datasets have been released in the last few years. In this paper we present an overview of eight publicly available and manually annotated evaluation datasets for Twitter sentiment analysis. Based on this review, we show that a common limitation of most of these datasets, when assessing sentiment analysis at target (entity) level, is the lack of distinctive sentiment annotations among the tweets and the entities contained in them. For example, the tweet "I love iPhone, but I hate iPad" can be annotated with a mixed sentiment label, but the entity iPhone within this tweet should be annotated with a positive sentiment label. Aiming to overcome this limitation, and to complement current evaluation datasets, we present STS-Gold, a new evaluation dataset where tweets and targets (entities) are annotated individually and therefore may present different sentiment labels. This paper also provides a comparative study of the various datasets along several dimensions including: total number of tweets, vocabulary size and sparsity. We also investigate the pair-wise correlation among these dimensions as well as their correlations to the sentiment classification performance on different datasets.
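
The distinction drawn in the abstract above between tweet-level and entity-level annotation can be made concrete with a small data structure. The class and field names below are hypothetical and do not reflect the actual STS-Gold file format. The negative label for iPad is inferred from the example tweet's wording; the abstract itself only states the label for iPhone.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotatedTweet:
    """A tweet carrying one label for the whole message and one per entity."""
    text: str
    tweet_label: str  # sentiment of the tweet as a whole
    entity_labels: dict = field(default_factory=dict)  # entity -> sentiment


# The example tweet from the abstract: mixed overall, positive toward iPhone.
example = AnnotatedTweet(
    text="I love iPhone, but I hate iPad",
    tweet_label="mixed",
    entity_labels={"iPhone": "positive", "iPad": "negative"},  # iPad label inferred
)


def entities_disagreeing_with_tweet(tweet):
    """Return entities whose label differs from the tweet-level label."""
    return sorted(e for e, lab in tweet.entity_labels.items() if lab != tweet.tweet_label)
```

Keeping the two annotation levels separate lets an evaluation score a target-level classifier against entity labels without the tweet-level label leaking in.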

Proceedings ArticleDOI
01 Dec 2013
TL;DR: This paper starts by building a manually annotated dataset and then takes the reader through the detailed steps of building the lexicon, which addresses both approaches to SA for the Arabic language.
Abstract: The emergence of the Web 2.0 technology generated a massive amount of raw data by enabling Internet users to post their opinions, reviews, comments on the web. Processing this raw data to extract useful information can be a very challenging task. An example of important information that can be automatically extracted from the users' posts and comments is their opinions on different issues, events, services, products, etc. This problem of Sentiment Analysis (SA) has been well studied for the English language and two main approaches have been devised: corpus-based and lexicon-based. This paper addresses both approaches to SA for the Arabic language. Since there is a limited number of publicly available Arabic datasets and Arabic lexicons for SA, this paper starts by building a manually annotated dataset and then takes the reader through the detailed steps of building the lexicon. Experiments are conducted throughout the different stages of this process to observe the improvements gained on the accuracy of the system and compare them to the corpus-based approach.

Journal ArticleDOI
TL;DR: The proposed method significantly outperforms numerous baselines and returns results that are comparable with previously proposed cross-domain sentiment classification methods on a benchmark data set containing Amazon user reviews for different types of products.
Abstract: Automatic classification of sentiment is important for numerous applications such as opinion mining, opinion summarization, contextual advertising, and market analysis. Typically, sentiment classification has been modeled as the problem of training a binary classifier using reviews annotated for positive or negative sentiment. However, sentiment is expressed differently in different domains, and annotating corpora for every possible domain of interest is costly. Applying a sentiment classifier trained using labeled data for a particular domain to classify sentiment of user reviews on a different domain often results in poor performance because words that occur in the train (source) domain might not appear in the test (target) domain. We propose a method to overcome this problem in cross-domain sentiment classification. First, we create a sentiment sensitive distributional thesaurus using labeled data for the source domains and unlabeled data for both source and target domains. Sentiment sensitivity is achieved in the thesaurus by incorporating document level sentiment labels in the context vectors used as the basis for measuring the distributional similarity between words. Next, we use the created thesaurus to expand feature vectors during train and test times in a binary classifier. The proposed method significantly outperforms numerous baselines and returns results that are comparable with previously proposed cross-domain sentiment classification methods on a benchmark data set containing Amazon user reviews for different types of products. We conduct an extensive empirical analysis of the proposed method on single- and multisource domain adaptation, unsupervised and supervised domain adaptation, and numerous similarity measures for creating the sentiment sensitive thesaurus. Moreover, our comparisons against the SentiWordNet, a lexical resource for word polarity, show that the created sentiment-sensitive thesaurus accurately captures words that express similar sentiments.
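
Editorial sketch (not the authors' code): the abstract above describes expanding a document's feature vector with related words drawn from a thesaurus, so that a classifier trained on one domain can still match vocabulary from another. The snippet below is a minimal illustration of that idea only. The thesaurus entries, the `EXP=` prefix, the scoring rule (term count times similarity) and the cut-off `k` are all assumptions made for the example and are not taken from the paper.

```python
from collections import Counter

# Hypothetical toy thesaurus: word -> [(related word, similarity)].
# In the paper the thesaurus is learned from data; these entries are made up.
TOY_THESAURUS = {
    "excellent": [("delicious", 0.8), ("great", 0.9)],
    "boring": [("bland", 0.7), ("dull", 0.9)],
}


def expand_features(tokens, thesaurus, k=2):
    """Return bag-of-words features plus the k best-scoring related words."""
    features = Counter(tokens)
    scores = Counter()
    for token, count in features.items():
        for neighbour, similarity in thesaurus.get(token, []):
            if neighbour not in features:  # only add words the document lacks
                scores[neighbour] += count * similarity
    expanded = dict(features)
    # Rank candidates by score (ties broken alphabetically) and keep the top k.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    for neighbour, score in ranked[:k]:
        expanded["EXP=" + neighbour] = score
    return expanded


example = expand_features(["excellent", "plot", "boring"], TOY_THESAURUS, k=2)
```

Prefixing expanded features keeps them distinct from words that actually occur in the review, so a downstream classifier can weight the two kinds differently.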

Journal ArticleDOI
TL;DR: In this paper, the authors investigate which corporate communication strategy adopted in online social media is more effective to create convergence between corporations' corporate social responsibility (CSR) agenda and stakeholders' social expectations, and thereby, to increase corporate legitimacy.
Abstract: Purpose – Organization legitimacy is a general reflection of the relationship between an organization and its environment. By adopting an institutional approach and defining moral legitimacy as “a positive normative evaluation of the organization and its activities”, the goal of this paper is to investigate which corporate communication strategy adopted in online social media is more effective to create convergence between corporations' corporate social responsibility (CSR) agenda and stakeholders' social expectations, and thereby, to increase corporate legitimacy. Design/methodology/approach – Using the entire Twitter social graph, a network analysis was carried out to study the structural properties of the CSR community, such as the level of reciprocity, and advanced data mining techniques, i.e. topic and sentiment analysis, were carried out to investigate the communication dynamics. Findings – Evidence was found that neither the engaging nor the information strategies lead to alignment. The assumption of...

Journal ArticleDOI
TL;DR: This commentary outlines the ways in which the collection and aggregation of patients’ descriptions of their experiences on the internet could be used to detect poor clinical care and suggests using the techniques of natural language processing and sentiment analysis to transform unstructured descriptions of patient experience on the internet into usable measures of healthcare performance.
Abstract: Recent years have seen increasing interest in patient-centred care and calls to focus on improving the patient experience. At the same time, a growing number of patients are using the internet to describe their experiences of healthcare. We believe the increasing availability of patients’ accounts of their care on blogs, social networks, Twitter and hospital review sites presents an intriguing opportunity to advance the patient-centred care agenda and provide novel quality of care data. We describe this concept as a ‘cloud of patient experience’. In this commentary, we outline the ways in which the collection and aggregation of patients’ descriptions of their experiences on the internet could be used to detect poor clinical care. Over time, such an approach could also identify excellence and allow it to be built on. We suggest using the techniques of natural language processing and sentiment analysis to transform unstructured descriptions of patient experience on the internet into usable measures of healthcare performance. We consider the various sources of information that could be used, the limitations of the approach and discuss whether these new techniques could detect poor performance before conventional measures of healthcare quality.

Journal ArticleDOI
TL;DR: This paper used machine learning techniques to predict whether a patient would recommend a hospital, whether the hospital was clean, and whether they were treated with dignity from their free-text description, compared to the patient's own quantitative rating of their care.
Abstract: Background: There are large amounts of unstructured, free-text information about quality of health care available on the Internet in blogs, social networks, and on physician rating websites that are not captured in a systematic way. New analytical techniques, such as sentiment analysis, may allow us to understand and use this information more effectively to improve the quality of health care. Objective: We attempted to use machine learning to understand patients’ unstructured comments about their care. We used sentiment analysis techniques to categorize online free-text comments by patients as either positive or negative descriptions of their health care. We tried to automatically predict whether a patient would recommend a hospital, whether the hospital was clean, and whether they were treated with dignity from their free-text description, compared to the patient’s own quantitative rating of their care. Methods: We applied machine learning techniques to all 6412 online comments about hospitals on the English National Health Service website in 2010 using Weka data-mining software. We also compared the results obtained from sentiment analysis with the paper-based national inpatient survey results at the hospital level using Spearman rank correlation for all 161 acute adult hospital trusts in England. Results: There was 81%, 84%, and 89% agreement between quantitative ratings of care and those derived from free-text comments using sentiment analysis for cleanliness, being treated with dignity, and overall recommendation of hospital respectively (kappa scores: .40–.74, P<.001 for all). We observed mild to moderate associations between our machine learning predictions and responses to the large patient survey for the three categories examined (Spearman rho 0.37–0.51, P<.001 for all). Conclusions: The prediction accuracy that we have achieved using this machine learning process suggests that we are able to predict, from free-text, a reasonably accurate assessment of patients’ opinion about different performance aspects of a hospital and that these machine learning predictions are associated with results of more conventional surveys. [J Med Internet Res 2013;15(11):e239]
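
Editorial sketch (not the authors' code): the results above are reported as percent agreement together with kappa scores. The snippet below shows how those two statistics relate for a pair of label sequences, using the standard definition of Cohen's kappa. The study itself used Weka; the function names and the eight example labels here are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from collections import Counter


def percent_agreement(rated, predicted):
    """Fraction of items on which the two label sequences agree."""
    if len(rated) != len(predicted) or not rated:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(r == p for r, p in zip(rated, predicted)) / len(rated)


def cohens_kappa(rated, predicted):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rated)
    observed = percent_agreement(rated, predicted)
    rated_counts, predicted_counts = Counter(rated), Counter(predicted)
    labels = set(rated_counts) | set(predicted_counts)
    # Chance agreement: probability both pick the same label independently.
    expected = sum(rated_counts[l] * predicted_counts[l] for l in labels) / (n * n)
    if expected == 1.0:
        # Both sequences use one identical label throughout; kappa is
        # undefined, and returning 1.0 is a convention assumed for this sketch.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Made-up example: a patient's own rating vs. a label predicted from free text.
patient_rating = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "pos"]
text_prediction = ["pos", "pos", "neg", "pos", "pos", "neg", "neg", "pos"]
agreement = percent_agreement(patient_rating, text_prediction)
kappa = cohens_kappa(patient_rating, text_prediction)
```

In this toy case the two sequences agree on 6 of 8 items (75%), yet kappa is well below that because a large share of the agreement would be expected by chance. This is why high percent agreement can coexist with moderate kappa.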

Journal ArticleDOI
TL;DR: This paper proposes an unsupervised approach to automatically discover the aspects discussed in Chinese social reviews and also the sentiments expressed in different aspects, and applies the Latent Dirichlet Allocation model to discover multi-aspect global topics of social reviews.
Abstract: User-generated reviews on the Web reflect users' sentiment about products, services and social events. Existing research mostly focuses on document-level sentiment classification of product and service reviews. Reviews of social events such as economic and political activities, which are called social reviews, have specific characteristics that differ from those of product and service reviews. In this paper, we propose an unsupervised approach to automatically discover the aspects discussed in Chinese social reviews and also the sentiments expressed in different aspects. The approach is called Multi-aspect Sentiment Analysis for Chinese Online Social Reviews (MSA-COSRs). We first apply the Latent Dirichlet Allocation (LDA) model to discover multi-aspect global topics of social reviews, and then extract the local topic and associated sentiment based on a sliding window context over the review text. The aspect of the local topic is identified by a trained LDA model, and the polarity of the associated sentiment is classified using the HowNet lexicon. The experimental results show that MSA-COSRs not only obtains good topic partitioning results, but also helps to improve sentiment analysis accuracy. It helps to simultaneously discover multi-aspect fine-grained topics and associated sentiment.
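
Editorial sketch (not the authors' code): the abstract above pairs each local topic with the sentiment found in a sliding window of surrounding text. The snippet below illustrates only that windowing step. It replaces the trained LDA model with a hand-written term-to-aspect mapping and the HowNet lexicon with a two-word English polarity dictionary; both substitutions, the function name and the window size are assumptions made for the example.

```python
def window_sentiment(tokens, aspect_terms, lexicon, window=3):
    """Sum lexicon polarity in a +/- `window` token context around each aspect term.

    aspect_terms maps a trigger word to an aspect name (a stand-in for LDA).
    lexicon maps a word to a polarity score (a stand-in for HowNet).
    """
    totals = {}
    for position, token in enumerate(tokens):
        aspect = aspect_terms.get(token)
        if aspect is None:
            continue
        start = max(0, position - window)
        end = min(len(tokens), position + window + 1)
        score = sum(lexicon.get(word, 0) for word in tokens[start:end])
        totals[aspect] = totals.get(aspect, 0) + score
    return totals


# Made-up review sentence and toy resources.
review = "the tax policy is unfair while the new housing plan looks good".split()
toy_aspects = {"tax": "economy", "housing": "housing"}
toy_lexicon = {"unfair": -1, "good": 1}
aspect_scores = window_sentiment(review, toy_aspects, toy_lexicon, window=3)
```

The window size controls the trade-off: a narrow window can miss the opinion word, while a wide one can attribute one aspect's sentiment to a neighbouring aspect.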

Proceedings Article
01 Jan 2013
TL;DR: This paper proposes a technique to leverage topic based sentiments from Twitter to help predict the stock market by utilizing a continuous Dirichlet Process Mixture model to learn the daily topic set and regress the stock index and the Twitter sentiment time series to predict the market.
Abstract: This paper proposes a technique to leverage topic based sentiments from Twitter to help predict the stock market. We first utilize a continuous Dirichlet Process Mixture model to learn the daily topic set. Then, for each topic we derive its sentiment according to its opinion words distribution to build a sentiment time series. We then regress the stock index and the Twitter sentiment time series to predict the market. Experiments on real-life S&P100 Index show that our approach is effective and performs better than existing state-of-the-art non-topic based methods.
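
Editorial sketch (not the authors' code): the abstract above regresses a stock index on a Twitter sentiment time series. The snippet below shows the simplest possible version of that step, a one-variable least-squares fit of the next day's index change on today's sentiment score. The one-day lag, the single predictor and all numbers are assumptions for illustration; the paper's actual regression setup is not described in the abstract and is probably richer.

```python
def lagged_pairs(sentiment, index_change, lag=1):
    """Pair sentiment on day t with the index change on day t + lag."""
    if lag < 1 or len(sentiment) != len(index_change) or len(sentiment) <= lag:
        raise ValueError("series must have equal length greater than lag >= 1")
    return sentiment[:-lag], index_change[lag:]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variance; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up series constructed so that change[t + 1] = 1 + 2 * sentiment[t].
daily_sentiment = [0.0, 1.0, 2.0, 3.0, 4.0]
daily_change = [9.9, 1.0, 3.0, 5.0, 7.0]
x, y = lagged_pairs(daily_sentiment, daily_change, lag=1)
intercept, slope = fit_line(x, y)
next_day_forecast = intercept + slope * daily_sentiment[-1]
```

Lagging the predictor matters: regressing same-day values would measure co-movement, not the predictive relationship the abstract is interested in.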

Proceedings ArticleDOI
01 May 2013
TL;DR: This research proposes a hybrid user location preference model by combining the preference extracted from check-ins and text-based tips which are processed using sentiment analysis techniques and develops a location based social matrix factorization algorithm that takes both user social influence and venue similarity influence into account in location recommendation.
Abstract: Although online recommendation systems such as recommendation of movies or music have been systematically studied in the past decade, location recommendation in Location Based Social Networks (LBSNs) has not yet been well investigated. In LBSNs, users can check in and leave tips commenting on a venue. These two heterogeneous data sources both describe users' preference of venues. However, current research considers only users' check-in behavior in the location preference model; users' tips on venues are seldom investigated. Moreover, while existing work mainly considers social influence in recommendation, we argue that considering venue similarity can further improve the recommendation performance. In this research, we improve location recommendation by enhancing not only the user location preference model but also the recommendation algorithm. First, we propose a hybrid user location preference model by combining the preference extracted from check-ins and text-based tips which are processed using sentiment analysis techniques. Second, we develop a location based social matrix factorization algorithm that takes both user social influence and venue similarity influence into account in location recommendation. Using two datasets extracted from the location based social network Foursquare, experiment results demonstrate that the proposed hybrid preference model can better characterize user preference by maintaining the preference consistency, and the proposed algorithm outperforms the state-of-the-art methods.

Posted Content
TL;DR: The authors identified three subtasks that need to be addressed: definition of the target, separation of the good and bad news content from the good or bad sentiment expressed on the target; and analysis of clearly marked opinion that is expressed explicitly, not needing interpretation or the use of world knowledge.
Abstract: Recent years have brought a significant growth in the volume of research in sentiment analysis, mostly on highly subjective text types (movie or product reviews). The main difference these texts have with news articles is that their target is clearly defined and unique across the text. Following different annotation efforts and the analysis of the issues encountered, we realised that news opinion mining is different from that of other text types. We identified three subtasks that need to be addressed: definition of the target; separation of the good and bad news content from the good and bad sentiment expressed on the target; and analysis of clearly marked opinion that is expressed explicitly, not needing interpretation or the use of world knowledge. Furthermore, we distinguish three different possible views on newspaper articles - author, reader and text, which have to be addressed differently at the time of analysing sentiment. Given these definitions, we present work on mining opinions about entities in English language news, in which (a) we test the relative suitability of various sentiment dictionaries and (b) we attempt to separate positive or negative opinion from good or bad news. In the experiments described here, we tested whether or not subject domain-defining vocabulary should be ignored. Results showed that this idea is more appropriate in the context of news opinion mining and that the approaches taking this into consideration produce a better performance.