Showing papers on "Sentiment analysis" published in 2015

Journal Article•DOI•

A survey on opinion mining and sentiment analysis

Kumar Satish Ravi¹, Vadlamani Ravi•Institutions (1)

01 Nov 2015-Knowledge Based Systems

TL;DR: A rigorous survey on sentiment analysis is presented, which portrays views presented by over one hundred articles published in the last decade regarding necessary tasks, approaches, and applications of sentiment analysis.

Abstract: With the advent of Web 2.0, people became more eager to express and share their opinions on the web regarding day-to-day activities as well as global issues. The evolution of social media has also contributed immensely to these activities, providing a transparent platform for sharing views across the world. These electronic Word of Mouth (eWOM) statements are prevalent in the business and service industries, where they enable customers to share their points of view. Over the last decade and a half, research communities, academia, and the public and service industries have been working rigorously on sentiment analysis, also known as opinion mining, to extract and analyze public mood and views. In this regard, this paper presents a rigorous survey on sentiment analysis, which portrays views presented by over one hundred articles published in the last decade regarding necessary tasks, approaches, and applications of sentiment analysis. Several sub-tasks need to be performed for sentiment analysis, each of which can be accomplished using various approaches and techniques. This survey, covering literature published during 2002-2015, is organized on the basis of the sub-tasks to be performed, the machine learning and natural language processing techniques used, and the applications of sentiment analysis. The paper also presents open issues, along with a summary table of one hundred and sixty-one articles.

1,011 citations

Proceedings Article•DOI•

Deep Unordered Composition Rivals Syntactic Methods for Text Classification

Mohit Iyyer¹, Varun Manjunatha¹, Jordan Boyd-Graber¹, Hal Daumé²•Institutions (2)

University of Maryland, College Park¹, University of Colorado Boulder²

01 Jul 2015

TL;DR: This work presents a simple deep neural network that competes with and, in some cases, outperforms such models on sentiment analysis and factoid question answering tasks while taking only a fraction of the training time.

Abstract: Many existing deep learning models for natural language processing tasks focus on learning the compositionality of their inputs, which requires many expensive computations. We present a simple deep neural network that competes with and, in some cases, outperforms such models on sentiment analysis and factoid question answering tasks while taking only a fraction of the training time. While our model is syntactically-ignorant, we show significant improvements over previous bag-of-words models by deepening our network and applying a novel variant of dropout. Moreover, our model performs better than syntactic models on datasets with high syntactic variance. We show that our model makes similar errors to syntactically-aware models, indicating that for the tasks we consider, nonlinearly transforming the input is more important than tailoring a network to incorporate word order and syntax.
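
The unordered composition described in this abstract can be sketched roughly as follows: average the word embeddings, pass the average through a few nonlinear layers, and randomly drop whole words before averaging. This is a minimal illustration, not the authors' implementation. The function names, the ReLU nonlinearity, the toy weights, and the reading of the "novel variant of dropout" as dropping entire words are all assumptions made for the example.

```python
import math
import random


def word_dropout(tokens, p, rng):
    """Drop each token independently with probability p (assumed dropout variant)."""
    kept = [t for t in tokens if rng.random() >= p]
    # Keep at least one token so the average below is always defined.
    return kept if kept else [rng.choice(tokens)]


def average(vectors):
    """Element-wise mean of equal-length vectors; word order is ignored."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def dense(x, weights, bias):
    """Affine layer; `weights` holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(x):
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def dan_forward(tokens, embeddings, hidden_layers, out_layer, p_drop=0.0, rng=None):
    """Forward pass: optional word dropout, average, hidden layers, softmax."""
    rng = rng or random.Random(0)
    if p_drop > 0.0:
        tokens = word_dropout(tokens, p_drop, rng)
    h = average([embeddings[t] for t in tokens])
    for weights, bias in hidden_layers:
        h = relu(dense(h, weights, bias))
    weights, bias = out_layer
    return softmax(dense(h, weights, bias))


# Hypothetical two-dimensional embeddings and hand-picked weights.
embeddings = {"good": [1.0, 0.0], "bad": [0.0, 1.0], "movie": [0.5, 0.5]}
hidden = [([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0])]
output = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])

print(dan_forward(["good", "movie"], embeddings, hidden, output))
print(dan_forward(["movie", "good"], embeddings, hidden, output))
```

Because the composition is a plain average, the two calls above receive the same input vector regardless of word order, which is the sense in which such a model is syntactically ignorant.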

824 citations

Proceedings Article•DOI•

SemEval-2015 Task 12: Aspect Based Sentiment Analysis

Maria Pontiki, Dimitris Galanis, Haris Papageorgiou, Suresh Manandhar¹, Ion Androutsopoulos²•Institutions (2)

University of York¹, Athens University of Economics and Business²

01 Jun 2015

TL;DR: The task provided manually annotated reviews in three domains (restaurants, laptops and hotels), and a common evaluation procedure, to foster research beyond sentence- or text-level sentiment classification towards Aspect Based Sentiment Analysis.

Abstract: SemEval-2015 Task 12, a continuation of SemEval-2014 Task 4, aimed to foster research beyond sentence- or text-level sentiment classification towards Aspect Based Sentiment Analysis. The goal is to identify opinions expressed about specific entities (e.g., laptops) and their aspects (e.g., price). The task provided manually annotated reviews in three domains (restaurants, laptops and hotels), and a common evaluation procedure. It attracted 93 submissions from 16 teams.

807 citations

Journal Article•DOI•

Sentiment of Emojis.

Petra Kralj Novak¹, Jasmina Smailović¹, Borut Sluban¹, Igor Mozetič¹•Institutions (1)

Jožef Stefan Institute¹

07 Dec 2015-PLOS ONE

TL;DR: The first emoji sentiment lexicon is provided, called the Emoji Sentiment Ranking, and a sentiment map of the 751 most frequently used emojis is drawn, which indicates that most of the emoji are positive, especially the most popular ones.

Abstract: There is a new generation of emoticons, called emojis, that is increasingly being used in mobile communications and social media. In the past two years, over ten billion emojis were used on Twitter. Emojis are Unicode graphic symbols, used as a shorthand to express concepts and ideas. In contrast to the small number of well-known emoticons that carry clear emotional contents, there are hundreds of emojis. But what are their emotional contents? We provide the first emoji sentiment lexicon, called the Emoji Sentiment Ranking, and draw a sentiment map of the 751 most frequently used emojis. The sentiment of the emojis is computed from the sentiment of the tweets in which they occur. We engaged 83 human annotators to label over 1.6 million tweets in 13 European languages by the sentiment polarity (negative, neutral, or positive). About 4% of the annotated tweets contain emojis. The sentiment analysis of the emojis allows us to draw several interesting conclusions. It turns out that most of the emojis are positive, especially the most popular ones. The sentiment distribution of the tweets with and without emojis is significantly different. The inter-annotator agreement on the tweets with emojis is higher. Emojis tend to occur at the end of the tweets, and their sentiment polarity increases with the distance. We observe no significant differences in the emoji rankings between the 13 languages and the Emoji Sentiment Ranking. Consequently, we propose our Emoji Sentiment Ranking as a European language-independent resource for automated sentiment analysis. Finally, the paper provides a formalization of sentiment and a novel visualization in the form of a sentiment bar.
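
As a rough illustration of computing an emoji's sentiment from the tweets in which it occurs, the sketch below turns per-emoji counts of negative, neutral, and positive tweets into a score in [-1, +1] and ranks emojis by it. The additive smoothing, the function names, and the counts are assumptions for the example and are not taken from the paper's data.

```python
def sentiment_score(neg, neu, pos, smoothing=1.0):
    """Mean of the label distribution over {-1, 0, +1}, with additive smoothing."""
    counts = (neg, neu, pos)
    total = sum(counts) + smoothing * len(counts)
    if total == 0:
        return 0.0  # no occurrences and no smoothing: treat as neutral
    p_neg = (neg + smoothing) / total
    p_pos = (pos + smoothing) / total
    # The neutral class contributes 0 to the mean, so only p_pos and p_neg remain.
    return p_pos - p_neg


def rank_emojis(occurrences, smoothing=1.0):
    """Sort emojis from most positive to most negative score."""
    scored = {
        emoji: sentiment_score(*counts, smoothing=smoothing)
        for emoji, counts in occurrences.items()
    }
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Hypothetical (negative, neutral, positive) tweet counts per emoji name.
occurrences = {
    "red_heart": (40, 160, 800),
    "crying_face": (500, 300, 200),
    "thinking_face": (90, 220, 90),
}

for emoji, score in rank_emojis(occurrences):
    print(f"{emoji}: {score:+.3f}")
```

Smoothing keeps rarely seen emojis from receiving extreme scores on the basis of one or two tweets.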

629 citations

Book•

Sentiment Analysis: Mining Opinions, Sentiments, and Emotions

Bing Liu¹•Institutions (1)

University of Illinois at Chicago¹

01 Jun 2015

TL;DR: This book introduces sentiment analysis, the computational study of people's opinions, sentiments, emotions, moods, and attitudes, a problem that offers numerous research challenges but promises insight useful to anyone interested in opinion analysis and social media analysis.

Abstract: Sentiment analysis is the computational study of people's opinions, sentiments, emotions, moods, and attitudes. This fascinating problem offers numerous research challenges, but promises insight useful to anyone interested in opinion analysis and social media analysis. This comprehensive introduction to the topic takes a natural-language-processing point of view to help readers understand the underlying structure of the problem and the language constructs commonly used to express opinions, sentiments, and emotions. The book covers core areas of sentiment analysis and also includes related topics such as debate analysis, intention mining, and fake-opinion detection. It will be a valuable resource for researchers and practitioners in natural language processing, computer science, management sciences, and the social sciences. In addition to traditional computational methods, this second edition includes recent deep learning methods to analyze and summarize sentiments and opinions, and also new material on emotion and mood analysis techniques, emotion-enhanced dialogues, and multimodal emotion analysis.

587 citations

Proceedings Article•DOI•

Twitter Sentiment Analysis with Deep Convolutional Neural Networks

Aliaksei Severyn¹, Alessandro Moschitti²•Institutions (2)

Google¹, Qatar Computing Research Institute²

09 Aug 2015

TL;DR: A comparison between the results of the approach and the systems participating in the challenge on the official test sets suggests that the model could be ranked in the first two positions in both the phrase-level subtask A and the message-level subtask B on Twitter Sentiment Analysis.

Abstract: This paper describes our deep learning system for sentiment analysis of tweets. The main contribution of this work is a new model for initializing the parameter weights of the convolutional neural network, which is crucial to train an accurate model while avoiding the need to inject any additional features. Briefly, we use an unsupervised neural language model to train initial word embeddings that are further tuned by our deep learning model on a distant supervised corpus. At a final stage, the pre-trained parameters of the network are used to initialize the model. We train the latter on the supervised training data recently made available by the official system evaluation campaign on Twitter Sentiment Analysis organized by SemEval-2015. A comparison between the results of our approach and the systems participating in the challenge on the official test sets suggests that our model could be ranked in the first two positions in both the phrase-level subtask A (among 11 teams) and the message-level subtask B (among 40 teams). This is important evidence of the practical value of our solution.

582 citations

Book•

Python Machine Learning

Sebastian Raschka

01 Sep 2015

TL;DR: Covering a wide range of powerful Python libraries, including scikit-learn, Theano, and Keras, and featuring guidance and tips on everything from sentiment analysis to neural networks, you'll soon be able to answer some of the most important questions facing you and your organization.

Abstract: Unlock deeper insights into Machine Learning with this vital guide to cutting-edge predictive analytics.

About This Book: Leverage Python's most powerful open-source libraries for deep learning, data wrangling, and data visualization. Learn effective strategies and best practices to improve and optimize machine learning systems and algorithms. Ask and answer tough questions of your data with robust statistical models, built for a range of datasets.

Who This Book Is For: If you want to find out how to use Python to start answering critical questions of your data, pick up Python Machine Learning. Whether you want to get started from scratch or want to extend your data science knowledge, this is an essential and unmissable resource.

What You Will Learn: Explore how to use different machine learning models to ask different questions of your data. Learn how to build neural networks using Keras and Theano. Find out how to write clean and elegant Python code that will optimize the strength of your algorithms. Discover how to embed your machine learning model in a web application for increased accessibility. Predict continuous target outcomes using regression analysis. Uncover hidden patterns and structures in data with clustering. Organize data using effective pre-processing techniques. Get to grips with sentiment analysis to delve deeper into textual and social media data.

In Detail: Machine learning and predictive analytics are transforming the way businesses and other organizations operate. Being able to understand trends and patterns in complex data is critical to success, becoming one of the key strategies for unlocking growth in a challenging contemporary marketplace. Python can help you deliver key insights into your data; its unique capabilities as a language let you build sophisticated algorithms and statistical models that can reveal new perspectives and answer key questions that are vital for success. Python Machine Learning gives you access to the world of predictive analytics and demonstrates why Python is one of the world's leading data science languages. If you want to ask better questions of data, or need to improve and extend the capabilities of your machine learning systems, this practical data science book is invaluable. Covering a wide range of powerful Python libraries, including scikit-learn, Theano, and Keras, and featuring guidance and tips on everything from sentiment analysis to neural networks, you'll soon be able to answer some of the most important questions facing you and your organization.

Style and approach: Python Machine Learning connects the fundamental theoretical principles behind machine learning to their practical application in a way that focuses you on asking and answering the right questions. It walks you through the key elements of Python and its powerful machine learning libraries, while demonstrating how to get to grips with a range of statistical models.

546 citations

Journal Article•DOI•

Sentiment analysis using product review data

Xing Fang¹, Justin Zhan¹•Institutions (1)

North Carolina Agricultural and Technical State University¹

16 Jun 2015-Journal of Big Data

TL;DR: A general process for sentiment polarity categorization is proposed with detailed process descriptions and insight into the future work on sentiment analysis is given.

Abstract: Sentiment analysis, or opinion mining, is one of the major tasks of NLP (Natural Language Processing). Sentiment analysis has gained much attention in recent years. In this paper, we aim to tackle the problem of sentiment polarity categorization, which is one of the fundamental problems of sentiment analysis. A general process for sentiment polarity categorization is proposed with detailed process descriptions. Data used in this study are online product reviews collected from Amazon.com. Experiments for both sentence-level categorization and review-level categorization are performed with promising outcomes. Finally, we also give insight into our future work on sentiment analysis.

523 citations

Posted Content•

Text Understanding from Scratch

Xiang Zhang¹, Yann LeCun¹•Institutions (1)

New York University¹

05 Feb 2015-arXiv: Learning

TL;DR: It is shown that temporal ConvNets can achieve astonishing performance without knowledge of words, phrases, sentences, or any other syntactic or semantic structures of a human language.

Abstract: This article demonstrates that we can apply deep learning to text understanding from character-level inputs all the way up to abstract text concepts, using temporal convolutional networks (ConvNets). We apply ConvNets to various large-scale datasets, including ontology classification, sentiment analysis, and text categorization. We show that temporal ConvNets can achieve astonishing performance without knowledge of words, phrases, sentences, or any other syntactic or semantic structures of a human language. Evidence shows that our models can work for both English and Chinese.
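
One common way to present character-level input to a convolutional network is to one-hot encode each character over a fixed alphabet and pad or truncate to a fixed length. The sketch below illustrates that idea only. The alphabet, the frame length, and the function name are assumptions chosen for the example and are not the paper's settings.

```python
import string

# Hypothetical alphabet: lowercase letters, digits, and a little punctuation.
ALPHABET = string.ascii_lowercase + string.digits + " .,!?'"
CHAR_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def quantize(text, length=16):
    """Encode text as `length` one-hot rows over ALPHABET.

    Characters outside the alphabet, and padding positions past the end of
    the text, become all-zero rows.
    """
    rows = []
    for ch in text.lower()[:length]:
        row = [0] * len(ALPHABET)
        idx = CHAR_INDEX.get(ch)
        if idx is not None:
            row[idx] = 1
        rows.append(row)
    # Pad short inputs with all-zero rows so every input has the same shape.
    rows.extend([0] * len(ALPHABET) for _ in range(length - len(rows)))
    return rows


matrix = quantize("Great film!", length=16)
print(len(matrix), len(matrix[0]))  # frame length and alphabet size
```

A temporal ConvNet would then slide one-dimensional filters along the first axis of this matrix, so no tokenizer or vocabulary of words is needed.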

507 citations

Journal Article•DOI•

Social media analytics: a survey of techniques, tools and platforms

Bogdan Batrinca¹, Philip Treleaven¹•Institutions (1)

University College London¹

01 Feb 2015-Ai & Society

TL;DR: A comprehensive review of software tools for social networking media, wikis, really simple syndication feeds, blogs, newsgroups, chat and news feeds, and the system architecture of a social media (analytics) platform built by University College London is presented.

Abstract: This paper is written for (social science) researchers seeking to analyze the wealth of social media now available. It presents a comprehensive review of software tools for social networking media, wikis, really simple syndication feeds, blogs, newsgroups, chat and news feeds. For completeness, it also includes introductions to social media scraping, storage, data cleaning and sentiment analysis. Although principally a review, the paper also provides a methodology and a critique of social media tools. Analyzing social media, in particular Twitter feeds for sentiment analysis, has become a major research and business activity due to the availability of web-based application programming interfaces (APIs) provided by Twitter, Facebook and News services. This has led to an 'explosion' of data services, software tools for scraping and analysis, and social media analytics platforms. It is also a research area undergoing rapid change and evolution due to commercial pressures and the potential for using social media data for computational (social science) research. Using a simple taxonomy, this paper provides a review of leading software tools and how to use them to scrape, cleanse and analyze the spectrum of social media. In addition, it discusses the requirement of an experimental computational environment for social media research and presents as an illustration the system architecture of a social media (analytics) platform built by University College London. The principal contribution of this paper is to provide an overview (including code fragments) for scientists seeking to utilize social media scraping and analytics either in their research or business. The data retrieval techniques presented in this paper are valid at the time of writing (June 2014), but they are subject to change since social media data scraping APIs are rapidly changing.

471 citations

Journal Article•DOI•

Sentiment analysis

Jesus Serrano-Guerrero¹, José A. Olivas¹, Francisco P. Romero¹, Enrique Herrera-Viedma²•Institutions (2)

University of Castilla–La Mancha¹, King Abdulaziz University²

01 Aug 2015-Information Sciences

TL;DR: The goal of this work is to review and compare some free access web services, analyzing their capabilities to classify and score different pieces of text with respect to the sentiments contained therein.

Proceedings Article•DOI•

Deep Convolutional Neural Network Textual Features and Multiple Kernel Learning for Utterance-level Multimodal Sentiment Analysis

Soujanya Poria¹, Erik Cambria¹, Alexander Gelbukh•Institutions (1)

Nanyang Technological University¹

01 Jan 2015

TL;DR: A novel way of extracting features from short texts, based on the activation values of an inner layer of a deep convolutional neural network, is presented and a parallelizable decision-level data fusion method is presented, which is much faster, though slightly less accurate.

Abstract: We present a novel way of extracting features from short texts, based on the activation values of an inner layer of a deep convolutional neural network. We use the extracted features in multimodal sentiment analysis of short video clips representing one sentence each. We use the combined feature vectors of textual, visual, and audio modalities to train a classifier based on multiple kernel learning, which is known to be good at handling heterogeneous data. We obtain a 14% performance improvement over the state of the art and present a parallelizable decision-level data fusion method, which is much faster, though slightly less accurate.

39. Opinion mining and sentiment analysis

Eric Breck, Claire Cardie

01 Jan 2015

TL;DR: This chapter introduces an idealised, end-to-end opinion analysis system and describes its components, including constructing opinion lexica, performing sentiment analysis, and producing opinion summaries.

Abstract: Opinions are ubiquitous in text, and readers of on-line text (from consumers to sports fans to news addicts to governments) can benefit from automatic methods that synthesise useful opinion-orientated information from the sea of data. In this chapter on opinion mining and sentiment analysis, we introduce an idealised, end-to-end opinion analysis system and describe its components, including constructing opinion lexica, performing sentiment analysis, and producing opinion summaries.

Journal Article•DOI•

Sentiment analysis on social media for stock movement prediction

Thien Hai Nguyen¹, Kiyoaki Shirai¹, Julien Velcin²•Institutions (2)

Japan Advanced Institute of Science and Technology¹, University of Lyon²

30 Dec 2015-Expert Systems With Applications

TL;DR: This paper presents a novel method for predicting stock price movement using sentiment from social media and evaluates the effectiveness of sentiment analysis in the stock prediction task via a large-scale experiment.

Abstract: A novel method for predicting stock price movement was presented. Topics and their sentiments were extracted from social media as features. Two methods were proposed to capture the topic-sentiment feature. Integration of the sentiments was investigated via a large-scale experiment. Our model outperformed other methods in average accuracy over 18 stocks.

The goal of this research is to build a model to predict stock price movement using the sentiment from social media. Unlike previous approaches, where the overall moods or sentiments are considered, the sentiments of the specific topics of the company are incorporated into the stock prediction model. Topics and related sentiments are automatically extracted from the texts in a message board by using our proposed method as well as existing topic models. In addition, this paper shows an evaluation of the effectiveness of the sentiment analysis in the stock prediction task via a large-scale experiment. Comparing the average accuracy over 18 stocks in one year of transactions, our method achieved 2.07% better performance than the model using historical prices only. Furthermore, when comparing the methods only for the stocks that are difficult to predict, our method achieved 9.83% better accuracy than the historical price method, and 3.03% better than the human sentiment method.

Proceedings Article•DOI•

Bug report, feature request, or simply praise? On automatically classifying app reviews

Walid Maalej¹, Hadeer Nabil¹•Institutions (1)

University of Hamburg¹

01 Aug 2015

TL;DR: This paper introduces several probabilistic techniques to classify app reviews into four types (bug reports, feature requests, user experiences, and ratings) and reports a series of experiments that compare the accuracy of the techniques with one another and with simple string matching.

Abstract: App stores like Google Play and Apple AppStore have over 3 million apps covering nearly every kind of software and service. Billions of users regularly download, use, and review these apps. Recent studies have shown that reviews written by the users represent a rich source of information for the app vendors and the developers, as they include information about bugs, ideas for new features, or documentation of released features. This paper introduces several probabilistic techniques to classify app reviews into four types: bug reports, feature requests, user experiences, and ratings. For this we use review metadata such as the star rating and the tense, as well as text classification, natural language processing, and sentiment analysis techniques. We conducted a series of experiments to compare the accuracy of the techniques and compared them with simple string matching. We found that metadata alone results in poor classification accuracy. When combined with natural language processing, the classification precision reached 70–95% and the recall 80–90%. Multiple binary classifiers outperformed single multiclass classifiers. Our results impact the design of review analytics tools which help app vendors, developers, and users to deal with the large amount of reviews, filter critical reviews, and assign them to the appropriate stakeholders.

Proceedings Article•DOI•

How can I improve my app? Classifying user reviews for software maintenance and evolution

Sebastiano Panichella¹, Andrea Di Sorbo², Emitza Guzman³, Corrado Aaron Visaggio², Gerardo Canfora², Harald C. Gall¹•Institutions (3)

University of Zurich¹, University of Sannio², Technische Universität München³

29 Sep 2015

TL;DR: This paper presents a taxonomy to classify app reviews into categories relevant to software maintenance and evolution, as well as an approach that merges three techniques, (1) Natural Language Processing, (2) Text Analysis and (3) Sentiment Analysis, to automatically classify app reviews into the proposed categories.

Abstract: App Stores, such as Google Play or the Apple Store, allow users to provide feedback on apps by posting review comments and giving star ratings. These platforms constitute a useful electronic means by which application developers and users can productively exchange information about apps. Previous research showed that user feedback contains usage scenarios, bug reports and feature requests that can help app developers to accomplish software maintenance and evolution tasks. However, in the case of the most popular apps, the large amount of received feedback, its unstructured nature and varying quality can make the identification of useful user feedback a very challenging task. In this paper we present a taxonomy to classify app reviews into categories relevant to software maintenance and evolution, as well as an approach that merges three techniques, (1) Natural Language Processing, (2) Text Analysis and (3) Sentiment Analysis, to automatically classify app reviews into the proposed categories. We show that the combined use of these techniques achieves better results (a precision of 75% and a recall of 74%) than results obtained using each technique individually (a precision of 70% and a recall of 67%).

Journal Article•DOI•

Insights from hashtag #supplychain and Twitter Analytics: Considering Twitter and Twitter data for supply chain practice and research

Bongsug Chae¹•Institutions (1)

College of Business Administration¹

01 Jul 2015-International Journal of Production Economics

TL;DR: In this article, the authors proposed a novel analytical framework (Twitter Analytics) for analyzing supply chain tweets, highlighting the current use of Twitter in supply chain contexts, and further developing insights into the potential role of Twitter for supply chain practice and research.

Journal Article•DOI•

Using Hashtags to Capture Fine Emotion Categories from Tweets

Saif M. Mohammad¹, Svetlana Kiritchenko¹•Institutions (1)

National Research Council¹

01 May 2015

TL;DR: It is shown that emotion‐word hashtags are good manual labels of emotions in tweets and a method to generate a large lexicon of word–emotion associations from this emotion‐labeled tweet corpus is proposed, which is the first lexicon with real‐valued word‐emotion association scores.

Abstract: Detecting emotions in microblogs and social media posts has applications for industry, health, and security. Statistical, supervised automatic methods for emotion detection rely on text that is labeled for emotions, but such data are rare and available for only a handful of basic emotions. In this article, we show that emotion-word hashtags are good manual labels of emotions in tweets. We also propose a method to generate a large lexicon of word-emotion associations from this emotion-labeled tweet corpus. This is the first lexicon with real-valued word-emotion association scores. We begin with experiments for six basic emotions and show that the hashtag annotations are consistent and match with the annotations of trained judges. We also show how the extracted tweet corpus and word-emotion associations can be used to improve emotion classification accuracy in a different nontweet domain.
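
The idea of treating emotion-word hashtags as labels and deriving real-valued word-emotion scores from the labeled tweets can be sketched as below. The abstract does not state the scoring formula, so the smoothed log-ratio used here (equivalent to the difference between a word's pointwise mutual information with the emotion and with all other emotions) is an assumption, as are the hashtag set, the function names, and the toy tweets.

```python
import math
from collections import Counter

# Hypothetical emotion-word hashtags mapped to emotion labels.
EMOTION_TAGS = {"#joy": "joy", "#anger": "anger", "#sadness": "sadness"}


def label_tweet(tweet):
    """Return (emotion, words) if exactly one emotion is tagged, else None."""
    tokens = tweet.lower().split()
    emotions = {EMOTION_TAGS[t] for t in tokens if t in EMOTION_TAGS}
    if len(emotions) != 1:
        return None
    words = [t for t in tokens if t not in EMOTION_TAGS]
    return emotions.pop(), words


def association_scores(tweets, emotion, smoothing=0.5):
    """Score each word by log2 of P(word | emotion) / P(word | other emotions)."""
    in_counts, out_counts = Counter(), Counter()
    for tweet in tweets:
        labeled = label_tweet(tweet)
        if labeled is None:
            continue  # untagged or ambiguously tagged tweets are skipped
        label, words = labeled
        (in_counts if label == emotion else out_counts).update(words)
    vocab = set(in_counts) | set(out_counts)
    n_in = sum(in_counts.values()) + smoothing * len(vocab)
    n_out = sum(out_counts.values()) + smoothing * len(vocab)
    return {
        w: math.log2(((in_counts[w] + smoothing) / n_in)
                     / ((out_counts[w] + smoothing) / n_out))
        for w in vocab
    }


tweets = [
    "so happy today #joy",
    "happy happy #joy",
    "traffic again #anger",
    "rain all week #sadness",
]
scores = association_scores(tweets, "joy")
for word in sorted(scores, key=scores.get, reverse=True):
    print(f"{word}: {scores[word]:+.2f}")
```

Positive scores mark words that occur relatively more often in tweets tagged with the target emotion, and negative scores mark words more typical of the other emotions.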

Journal Article•DOI•

A Lexicon-based Approach for Hate Speech Detection

Njagi Dennis Gitari, Zhang Zuping¹, Zuping Zhang¹, Hanyurwimfura Damien, Jun Long¹•Institutions (1)

Central South University¹

30 Apr 2015

TL;DR: The goal of the research is to create a model classifier that uses sentiment analysis techniques and in particular subjectivity detection to not only detect that a given sentence is subjective but also to identify and rate the polarity of sentiment expressions.

Abstract: We explore the idea of creating a classifier that can be used to detect the presence of hate speech in web discourses such as web forums and blogs. In this work, the hate speech problem is abstracted into three main thematic areas: race, nationality and religion. The goal of our research is to create a model classifier that uses sentiment analysis techniques, and in particular subjectivity detection, to not only detect that a given sentence is subjective but also to identify and rate the polarity of sentiment expressions. We begin by whittling down the document size by removing objective sentences. Then, using subjectivity and semantic features related to hate speech, we create a lexicon that is employed to build a classifier for hate speech detection. Experiments with a hate corpus show significant practical application for a real-world web discourse.

Proceedings Article•DOI•

SemEval-2015 Task 10: Sentiment Analysis in Twitter

Sara Rosenthal¹, Preslav Nakov², Svetlana Kiritchenko², Saif M. Mohammad³, Alan Ritter³, Veselin Stoyanov⁴•Institutions (4)

Columbia University¹, Qatar Foundation², University of Washington³, Facebook⁴

01 Jun 2015

TL;DR: The 2015 iteration of the SemEval shared task on Sentiment Analysis in Twitter was the most popular sentiment analysis shared task to date with more than 40 teams participating in each of the last three years.

Abstract: In this paper, we describe the 2015 iteration of the SemEval shared task on Sentiment Analysis in Twitter. This was the most popular sentiment analysis shared task to date with more than 40 teams participating in each of the last three years. This year’s shared task competition consisted of five sentiment prediction subtasks. Two were reruns from previous years: (A) sentiment expressed by a phrase in the context of a tweet, and (B) overall sentiment of a tweet. We further included three new subtasks asking to predict (C) the sentiment towards a topic in a single tweet, (D) the overall sentiment towards a topic in a set of tweets, and (E) the degree of prior polarity of a phrase.

Journal Article•DOI•

Survey of review spam detection using machine learning techniques

Michael Crawford¹, Taghi M. Khoshgoftaar¹, Joseph D. Prusa¹, Aaron N. Richter¹, Hamzah Al Najada¹•Institutions (1)

Florida Atlantic University¹

05 Oct 2015-Journal of Big Data

TL;DR: This paper provides a strong and comprehensive comparative study of current research on detecting review spam using various machine learning techniques, and devises a methodology for conducting further investigation.

Abstract: Online reviews are often the primary factor in a customer’s decision to purchase a product or service, and are a valuable source of information that can be used to determine public opinion on these products or services. Because of their impact, manufacturers and retailers are highly concerned with customer feedback and reviews. Reliance on online reviews gives rise to the potential concern that wrongdoers may create false reviews to artificially promote or devalue products and services. This practice is known as Opinion (Review) Spam, where spammers manipulate and poison reviews (i.e., making fake, untruthful, or deceptive reviews) for profit or gain. Since not all online reviews are truthful and trustworthy, it is important to develop techniques for detecting review spam. By extracting meaningful features from the text using Natural Language Processing (NLP), it is possible to conduct review spam detection using various machine learning techniques. Additionally, reviewer information, apart from the text itself, can be used to aid in this process. In this paper, we survey the prominent machine learning techniques that have been proposed to solve the problem of review spam detection and the performance of different approaches for classification and detection of review spam. The majority of current research has focused on supervised learning methods, which require labeled data, a scarcity when it comes to online review spam. Research on methods for Big Data is of interest, since there are millions of online reviews, with many more being generated daily. To date, we have not found any papers that study the effects of Big Data analytics for review spam detection. The primary goal of this paper is to provide a strong and comprehensive comparative study of current research on detecting review spam using various machine learning techniques and to devise a methodology for conducting further investigation.

Proceedings Article•DOI•

Learning Semantic Representations of Users and Products for Document Level Sentiment Classification

Duyu Tang¹, Bing Qin¹, Ting Liu¹•Institutions (1)

Harbin Institute of Technology¹

01 Jul 2015

TL;DR: By combining evidence at user-, product- and document-level in a unified neural framework, the proposed model achieves state-of-the-art performances on IMDB and Yelp datasets.

Abstract: Neural network methods have achieved promising results for sentiment classification of text. However, these models only use semantics of texts, while ignoring users who express the sentiment and products which are evaluated, both of which have great influences on interpreting the sentiment of text. In this paper, we address this issue by incorporating user- and product-level information into a neural network approach for document level sentiment classification. Users and products are modeled using vector space models, the representations of which capture important global clues such as individual preferences of users or overall qualities of products. Such global evidence in turn facilitates embedding learning procedure at document level, yielding better text representations. By combining evidence at user-, product- and document-level in a unified neural framework, the proposed model achieves state-of-the-art performances on IMDB and Yelp datasets.

Journal Article•DOI•

Recommender systems based on user reviews: the state of the art

Li Chen¹, Guanliang Chen¹, Feng Wang¹•Institutions (1)

Hong Kong Baptist University¹

01 Jun 2015-User Modeling and User-adapted Interaction

TL;DR: This article provides a comprehensive overview of how the review elements have been exploited to improve standard content-based recommending, collaborative filtering, and preference-based product ranking techniques and classifies state-of-the-art studies into two principal branches: review-based user profile building and review-based product profile building.

Abstract: In recent years, a variety of review-based recommender systems have been developed, with the goal of incorporating the valuable information in user-generated textual reviews into the user modeling and recommending process. Advanced text analysis and opinion mining techniques enable the extraction of various types of review elements, such as the discussed topics, the multi-faceted nature of opinions, contextual information, comparative opinions, and reviewers' emotions. In this article, we provide a comprehensive overview of how the review elements have been exploited to improve standard content-based recommending, collaborative filtering, and preference-based product ranking techniques. The review-based recommender system's ability to alleviate the well-known rating sparsity and cold-start problems is emphasized. This survey classifies state-of-the-art studies into two principal branches: review-based user profile building and review-based product profile building. In the user profile sub-branch, the reviews are not only used to create term-based profiles, but also to infer or enhance ratings. Multi-faceted opinions can further be exploited to derive the weight/value preferences that users place on particular features. In another sub-branch, the product profile can be enriched with feature opinions or comparative opinions to better reflect its assessment quality. The merit of each branch of work is discussed in terms of both algorithm development and the way in which the proposed algorithms are evaluated. In addition, we discuss several future trends based on the survey, which may inspire investigators to pursue additional studies in this area.

Posted Content•

Document embedding with paragraph vectors

Andrew M. Dai, Chris Olah, Quoc V. Le

29 Jul 2015-arXiv: Computation and Language

TL;DR: This work observes that the Paragraph Vector method performs significantly better than other methods, and proposes a simple improvement to enhance embedding quality, and shows that much like word embeddings, vector operations on Paragraph Vectors can perform useful semantic results.

Abstract: Paragraph Vectors has been recently proposed as an unsupervised method for learning distributed representations for pieces of texts. In their work, the authors showed that the method can learn an embedding of movie review texts which can be leveraged for sentiment analysis. That proof of concept, while encouraging, was rather narrow. Here we consider tasks other than sentiment analysis, provide a more thorough comparison of Paragraph Vectors to other document modelling algorithms such as Latent Dirichlet Allocation, and evaluate performance of the method as we vary the dimensionality of the learned representation. We benchmarked the models on two document similarity data sets, one from Wikipedia, one from arXiv. We observe that the Paragraph Vector method performs significantly better than other methods, and propose a simple improvement to enhance embedding quality. Somewhat surprisingly, we also show that much like word embeddings, vector operations on Paragraph Vectors can perform useful semantic results.

Journal Article•DOI•

Chinese comments sentiment classification based on word2vec and SVMperf

Dongwen Zhang¹, Hua Xu², Zengcai Su¹, Yunfeng Xu¹•Institutions (2)

Hebei University of Science and Technology¹, Tsinghua University²

01 Mar 2015-Expert Systems With Applications

TL;DR: A method for sentiment classification based on word2vec and SVMperf is proposed; SVMperf trains faster and predicts more accurately than other SVM packages, and the classification result can reach more than 90% accuracy.

Abstract: We achieve similar features clustering using word2vec. A method for sentiment classification based on word2vec and SVMperf is proposed. Word2vec can extract deep semantic features between words. SVMperf trains faster and predicts more accurately than other SVM packages. Our classification result can reach more than 90% accuracy. With the booming development of e-commerce in the last decade, researchers have begun to pay more attention to extracting valuable information from consumers' comments. Sentiment classification, which focuses on classifying comments into a positive class and a negative class according to the polarity of sentiment, is one such line of study. Machine learning-based methods for sentiment classification have become mainstream due to their outstanding performance. Most existing research centers on the extraction of lexical features and syntactic features, while the semantic relationships between words are ignored. In this paper, in order to get the semantic features, we propose a method for sentiment classification based on word2vec and SVMperf. Our research consists of two parts. First, we use word2vec to cluster similar features in order to show the capability of word2vec to capture semantic features in the selected domain and in the Chinese language. Then, we train and classify the comment texts using word2vec again and SVMperf. In the process, lexicon-based and part-of-speech-based feature selection methods are each adopted to generate the training file. We conduct the experiments on a data set of Chinese comments on clothing products. The experimental results show the superior performance of our method in sentiment classification.

Proceedings Article•

Target-dependent twitter sentiment classification with rich automatic features

Duy Tin Vo¹, Yue Zhang¹•Institutions (1)

Singapore University of Technology and Design¹

25 Jul 2015

TL;DR: This paper shows that competitive results can be achieved without the use of syntax, by extracting a rich set of automatic features from a tweet, using distributed word representations and neural pooling functions to extract features.

Abstract: Target-dependent sentiment analysis on Twitter has attracted increasing research attention. Most previous work relies on syntax, such as automatic parse trees, which are subject to noise for informal text such as tweets. In this paper, we show that competitive results can be achieved without the use of syntax, by extracting a rich set of automatic features. In particular, we split a tweet into a left context and a right context according to a given target, using distributed word representations and neural pooling functions to extract features. Both sentiment-driven and standard embeddings are used, and a rich set of neural pooling functions are explored. Sentiment lexicons are used as an additional source of information for feature extraction. In standard evaluation, the conceptually simple method gives a 4.8% absolute improvement over the state-of-the-art on three-way targeted sentiment classification, achieving the best reported results for this task.

Proceedings Article•DOI•

ASTD: Arabic Sentiment Tweets Dataset

Mahmoud Nabil¹, Mohamed Aly², Amir F. Atiya³•Institutions (3)

Cairo University¹, Google², California Institute of Technology³

01 Sep 2015

TL;DR: ASTD, an Arabic social sentiment analysis dataset gathered from Twitter, consists of about 10,000 tweets which are classified as objective, subjective positive, subjective negative, and subjective mixed.

Abstract: This paper introduces ASTD, an Arabic social sentiment analysis dataset gathered from Twitter. It consists of about 10,000 tweets which are classified as objective, subjective positive, subjective negative, and subjective mixed. We present the properties and the statistics of the dataset, and run experiments using standard partitioning of the dataset. Our experiments provide benchmark results for 4 way sentiment classification on the dataset.

Proceedings Article•DOI•

PhraseRNN: Phrase Recursive Neural Network for Aspect-based Sentiment Analysis

Thien Hai Nguyen¹, Kiyoaki Shirai¹•Institutions (1)

Japan Advanced Institute of Science and Technology¹

01 Sep 2015

TL;DR: A new method is presented that takes both dependency and constituent trees of a sentence into account and significantly outperforms previous methods to identify sentiment of an aspect of an entity.

Abstract: This paper presents a new method to identify sentiment of an aspect of an entity. It is an extension of RNN (Recursive Neural Network) that takes both dependency and constituent trees of a sentence into account. Results of an experiment show that our method significantly outperforms previous methods.

Journal Article•DOI•

Sentiment, emotion, purpose, and style in electoral tweets

Saif M. Mohammad¹, Xiaodan Zhu¹, Svetlana Kiritchenko¹, Joel Martin¹•Institutions (1)

National Research Council¹

01 Jul 2015-Information Processing and Management

TL;DR: This work automatically annotates a set of 2012 US presidential election tweets for a number of attributes pertaining to sentiment, emotion, purpose, and style by crowdsourcing, and shows that the tweets convey negative emotions twice as often as positive.

Abstract: We automatically compile a dataset of 2012 US presidential election tweets. We annotate the tweets for sentiment, emotion, style, and purpose. We show that the tweets convey negative emotions twice as often as positive. We describe two automatic systems that predict emotion and purpose in tweets. Social media is playing a growing role in elections world-wide. Thus, automatically analyzing electoral tweets has applications in understanding how public sentiment is shaped, tracking public sentiment and polarization with respect to candidates and issues, understanding the impact of tweets from various entities, etc. Here, for the first time, we automatically annotate a set of 2012 US presidential election tweets for a number of attributes pertaining to sentiment, emotion, purpose, and style by crowdsourcing. Overall, more than 100,000 crowdsourced responses were obtained for 13 questions on emotions, style, and purpose. Additionally, we show through an analysis of these annotations that purpose, even though correlated with emotions, is significantly different. Finally, we describe how we developed automatic classifiers, using features from state-of-the-art sentiment analysis systems, to predict emotion and purpose labels, respectively, in new unseen tweets. These experiments establish baseline results for automatic systems on this new data.

Journal Article•DOI•

Text mining of news-headlines for FOREX market prediction

Arman Khadjeh Nassirtoussi¹, Saeed Aghabozorgi¹, Teh Ying Wah¹, David Chek Ling Ngo²•Institutions (2)

Information Technology University¹, Sunway University²

01 Jan 2015-Expert Systems With Applications

TL;DR: A novel approach is proposed to predict intraday directional-movements of a currency-pair in the foreign exchange market based on the text of breaking financial news-headlines and produces a multi-layer algorithm that tackles each of the mentioned aspects of the text-mining problem at a designated layer.

Abstract: FOREX prediction through text mining of news is viable and effective. Feature-selection by abstraction of word-hypernyms increases prediction accuracy. Feature-weighting based on the sum of pos and neg sentiment scores is effective. Feature-reduction based on maximum optimization for prediction-target is crucial. In this paper a novel approach is proposed to predict intraday directional-movements of a currency-pair in the foreign exchange market based on the text of breaking financial news-headlines. The motivation behind this work is twofold: First, although market-prediction through text-mining is shown to be a promising area of work in the literature, the text-mining approaches utilized in it at this stage are not much beyond basic ones as it is still an emerging field. This work is an effort to put more emphasis on the text-mining methods and tackle some specific aspects thereof that are weak in previous works, namely: the problem of high dimensionality as well as the problem of ignoring sentiment and semantics in dealing with textual language. This research assumes that addressing these aspects of text-mining have an impact on the quality of the achieved results. The proposed system proves this assumption to be right. The second part of the motivation is to research a specific market, namely, the foreign exchange market, which seems not to have been researched in the previous works based on predictive text-mining. Therefore, results of this work also successfully demonstrate a predictive relationship between this specific market-type and the textual data of news.
Besides the above two main components of the motivation, there are other specific aspects that make the setup of the proposed system and the conducted experiment unique, for example, the use of news article-headlines only and not news article-bodies, which enables usage of short pieces of text rather than long ones; or the use of general financial breaking news without any further filtration. In order to accomplish the above, this work produces a multi-layer algorithm that tackles each of the mentioned aspects of the text-mining problem at a designated layer. The first layer is termed the Semantic Abstraction Layer and addresses the problem of co-reference in text mining that is contributing to sparsity. Co-reference occurs when two or more words in a text corpus refer to the same concept. This work produces a custom approach by the name of Heuristic-Hypernyms Feature-Selection which creates a way to recognize words with the same parent-word to be regarded as one entity. As a result, prediction accuracy increases significantly at this layer which is attributed to appropriate noise-reduction from the feature-space. The second layer is termed Sentiment Integration Layer, which integrates sentiment analysis capability into the algorithm by proposing a sentiment weight by the name of SumScore that reflects investors' sentiment. Additionally, this layer reduces the dimensions by eliminating those that are of zero value in terms of sentiment and thereby improves prediction accuracy. The third layer encompasses a dynamic model creation algorithm, termed Synchronous Targeted Feature Reduction (STFR). It is suitable for the challenge at hand whereby the mining of a stream of text is concerned. It updates the models with the most recent information available and, more importantly, it ensures that the dimensions are reduced to the absolute minimum. The algorithm and each of its layers are extensively evaluated using real market data and news content across multiple years and have proven to be solid and superior to any other comparable solution. The proposed techniques implemented in the system, result in significantly high directional-accuracies of up to 83.33%. On top of a well-rounded multifaceted algorithm, this work contributes a much needed research framework for this context with a test-bed of data that must make future research endeavors more convenient. The produced algorithm is scalable and its modular design allows improvement in each of its layers in future research. This paper provides ample details to reproduce the entire system and the conducted experiments.
