Journal Article

Sentiment Analysis and Text Mining for Social Media Microblogs using Open Source Tools: An Empirical Study

18 Feb 2015-International Journal of Computer Applications (Foundation of Computer Science (FCS))-Vol. 112, Iss: 5, pp 44-48
TL;DR: This paper presents an open source approach in which Twitter microblog data were collected, pre-processed, analyzed and visualized using open source tools, applying text mining and sentiment analysis to user-contributed online reviews of two major UK retailers, Tesco and Asda, over the Christmas 2014 period.
Abstract: Social media has emerged not only as a medium for personal communication but also as a medium for sharing opinions about products and services, and even about political and general events. Because of its wide reach and popularity, a massive volume of user reviews and opinions is produced and shared daily. Twitter is one of the most widely used microblogging sites. Mining user opinions from social media data is not a straightforward task, and it can be accomplished in different ways. In this work, an open source approach is presented in which Twitter microblog data were collected, pre-processed, analyzed and visualized using open source tools, applying text mining and sentiment analysis to user-contributed online reviews of two major UK retailers, Tesco and Asda, over the Christmas 2014 period. Collecting customer opinions with conventional methods such as surveys can be expensive and time-consuming. Sentiment analysis of customer opinions makes it easier for businesses to understand their competitive value in a changing market and their customers' views of their products and services, which in turn provides insight into future marketing strategies and decision-making policies.

Citations
Journal ArticleDOI
TL;DR: This work presents the first attempt at fusing and modelling data from environmental and physiological sources collected from sensors in a real-world setting, and predicts emotions from on-body sensor and environmental data.

92 citations

Journal ArticleDOI
TL;DR: In this paper, the proposed improved RBF kernel for SVM achieved an accuracy of 98.8% when compared with the existing SVM-RBF classifier and other models.
Abstract: Sentiment analysis has gained importance in recent years. People have improved the way they express their opinions about products, services, celebrities, and current topics on internet portals, blogs and social networks. Social networking services such as Facebook, Twitter, WhatsApp, LinkedIn and Hike Messenger let users express their feelings with symbols such as smileys and funny faces. These social media websites provide a platform for people's opinions on topics such as movies, products, fashion trends, politics and technologies. E-commerce portals such as Amazon, Flipkart and Snapdeal help people express their opinions on products. A framework is proposed in this work to find the scores of the opinions and derive conclusions. The classification of opinions is called opinion mining, whereas deriving scores for those opinions is called sentiment analysis. Here, classification techniques are used for opinion mining, and the opinions are scored on a scale from –5 to +5. In this work, a movie review data set has been collected from the twitter reviews (http://ai.stanford.edu/~amaas/data/sentiment/) between the years 2003 and 2012. The WordNet lexicon dictionary is used to compare the emotions for obtaining the score. In this paper, the proposed improved RBF kernel for SVM achieved an accuracy of 98.8% when compared with the existing SVM-RBF classifier and other models.

80 citations

Journal ArticleDOI
TL;DR: In this article, the wisdom pyramid methodology is used to conduct a systematic review of current digital frontiers in Healthcare 4.0.
Abstract: Healthcare 4.0 is a term that has emerged recently and derived from Industry 4.0. Today, the health care sector is more digital than in past decades; for example, spreading from x‐rays and magnetic resonance imaging to computed tomography and ultrasound scans to electric medical records. With the wide spectrum of digital technologies underpinning Healthcare 4.0 to deliver more effective and efficient health care services, in this article, we use the wisdom pyramid methodology to conduct a systematic review of current digital frontiers in Healthcare 4.0.

79 citations

Proceedings ArticleDOI
13 Nov 2014
TL;DR: A new method of UAV path planning for avoiding obstacles, based on a combination of Genetic Algorithms and Artificial Neural Networks, is proposed, in which the output generated by the Genetic Algorithm is used to train the Artificial Neural Network.
Abstract: Path planning for an Unmanned Aerial Vehicle (UAV) is always considered a vital task. Path planning that lets a UAV avoid obstacles in its path can be accomplished by solving an optimization problem. The Genetic Algorithm, a global optimization tool, can be of great use in solving the optimization problem for UAV path planning. An Artificial Neural Network (ANN) performs function fitting quickly and can be used to approximate almost any function. Genetic Algorithms are good at converging to the globally optimum solution generation by generation. Each generation is expected to be better than the previous one. Neural Networks find the solution to a given problem faster than Genetic Algorithms but may converge to a local optimum instead of the global optimum. In this paper, a new method of UAV path planning for avoiding obstacles, based on a combination of Genetic Algorithms and Artificial Neural Networks, is proposed, in which the output generated by the Genetic Algorithm is used to train the Artificial Neural Network. The model for path planning is based on a 3D digital map.

45 citations


Cites background from "Sentiment Analysis and Text Mining ..."

  • ...Younis [12] work was focused on the users reviews and opinions that were being produced on the micro blogging websites on regular basis ....

Journal ArticleDOI
01 Dec 2020
TL;DR: In the derived approach, an analysis is performed on Twitter data for World Cup soccer 2014 to detect the sentiment of the people throughout the world using machine learning techniques and sentiment polarity was calculated based on the emotion words detected in the user tweets.
Abstract: In the derived approach, an analysis is performed on Twitter data for World Cup soccer 2014 held in Brazil to detect the sentiment of the people throughout the world using machine learning techniques. By filtering and analyzing the data using natural language processing techniques, sentiment polarity was calculated based on the emotion words detected in the user tweets. The dataset is normalized for use by machine learning algorithms and prepared using natural language processing techniques such as word tokenization, stemming and lemmatization, part-of-speech (POS) tagging, named entity recognition (NER), and parsing to extract emotions from the text of each tweet. This approach is implemented using the Python programming language and the Natural Language Toolkit (NLTK). A derived algorithm uses WordNet to extract emotional words, together with the POS of each word that has a meaning in the current context of the sentence, and assigns sentiment polarity using the SentiWordNet dictionary or a lexicon-based method. The resultant polarity is further analyzed using naive Bayes, support vector machine (SVM), K-nearest neighbor (KNN), and random forest machine learning algorithms and visualized on the Weka platform. Naive Bayes gives the best accuracy of 88.17%, whereas random forest gives the best area under the receiver operating characteristic curve (AUC) of 0.97.
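
The classification step named in this abstract can be illustrated with a minimal sketch of a multinomial naive Bayes classifier with add-one (Laplace) smoothing. This is not the cited paper's implementation (which reports using NLTK and Weka); the toy training tweets, the function names `train_naive_bayes` and `predict`, and the choice to skip unseen words are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labelled_docs):
    """Fit a multinomial naive Bayes model from (tokens, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)  # label -> word -> count
    vocab = set()
    for tokens, label in labelled_docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    total_docs = sum(class_counts.values())
    log_priors = {c: math.log(n / total_docs) for c, n in class_counts.items()}
    return log_priors, word_counts, vocab


def predict(model, tokens):
    """Return the label with the highest posterior log-probability."""
    log_priors, word_counts, vocab = model
    best_label, best_score = None, -math.inf
    for label, log_prior in log_priors.items():
        total_words = sum(word_counts[label].values())
        score = log_prior
        for tok in tokens:
            if tok not in vocab:
                continue  # assumption: words never seen in training are ignored
            # Add-one smoothing keeps unseen (word, class) pairs from zeroing the score.
            score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical, hand-made training data (already tokenized and lower-cased).
TRAIN = [
    (["great", "match", "love", "it"], "positive"),
    (["amazing", "goal", "happy"], "positive"),
    (["terrible", "referee", "sad"], "negative"),
    (["awful", "defeat", "angry"], "negative"),
]
model = train_naive_bayes(TRAIN)
print(predict(model, ["happy", "great", "goal"]))
print(predict(model, ["sad", "defeat"]))
```

In a fuller pipeline the token lists would come from the tokenization, stemming and lemmatization steps listed in the abstract rather than from hand-written lists.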

38 citations

References
Journal ArticleDOI
TL;DR: The Semantic Orientation CALculator (SO-CAL) uses dictionaries of words annotated with their semantic orientation (polarity and strength), and incorporates intensification and negation, and is applied to the polarity classification task.
Abstract: We present a lexicon-based approach to extracting sentiment from text. The Semantic Orientation CALculator (SO-CAL) uses dictionaries of words annotated with their semantic orientation (polarity and strength), and incorporates intensification and negation. SO-CAL is applied to the polarity classification task, the process of assigning a positive or negative label to a text that captures the text's opinion towards its main subject matter. We show that SO-CAL's performance is consistent across domains and in completely unseen data. Additionally, we describe the process of dictionary creation, and our use of Mechanical Turk to check dictionaries for consistency and reliability.
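
A minimal sketch of lexicon-based scoring with intensification and negation may make the idea concrete. It is not SO-CAL itself: the toy dictionary, the intensifier multipliers, the sign-flip treatment of negation, and the names `score_text` and `classify` are all assumptions chosen for illustration.

```python
import re

# Hypothetical toy lexicon: sign is polarity, magnitude is strength.
LEXICON = {"good": 3, "great": 4, "excellent": 5, "bad": -3, "terrible": -5, "slow": -2}
# Hypothetical intensifiers: multiplier applied to the next opinion word.
INTENSIFIERS = {"very": 1.5, "really": 1.5, "slightly": 0.5}
NEGATORS = {"not", "no", "never"}


def score_text(text):
    """Sum the semantic orientation of opinion words, adjusted by modifiers."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0.0
    multiplier = 1.0
    negated = False
    for tok in tokens:
        if tok in INTENSIFIERS:
            multiplier *= INTENSIFIERS[tok]
        elif tok in NEGATORS:
            negated = True
        elif tok in LEXICON:
            value = LEXICON[tok] * multiplier
            if negated:
                value = -value  # simplification: negation flips the sign
            total += value
            # Modifiers are consumed by the opinion word they precede.
            multiplier, negated = 1.0, False
    return total


def classify(text):
    """Map the summed score to a positive, negative or neutral label."""
    score = score_text(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(score_text("The delivery was very good"))
print(classify("The checkout was not good"))
print(classify("I went to the store"))
```

A modifier here stays active until the next opinion word, however far away that word is; a real system would normally limit its scope to a short window.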

2,798 citations


"Sentiment Analysis and Text Mining ..." refers methods in this paper

  • ...The first is opinion lexicon-based approach [14], in which, the lexicon is composed of a set of positive and negative opinion words, used to score the opinion sentences either, positive, negative or neutral....

Proceedings Article
16 May 2010
TL;DR: It is found that the mere number of messages mentioning a party reflects the election result, and joint mentions of two parties are in line with real world political ties and coalitions.
Abstract: Twitter is a microblogging website where users read and write millions of short messages on a variety of topics every day. This study uses the context of the German federal election to investigate whether Twitter is used as a forum for political deliberation and whether online messages on Twitter validly mirror offline political sentiment. Using LIWC text analysis software, we conducted a content-analysis of over 100,000 messages containing a reference to either a political party or a politician. Our results show that Twitter is indeed used extensively for political deliberation. We find that the mere number of messages mentioning a party reflects the election result. Moreover, joint mentions of two parties are in line with real world political ties and coalitions. An analysis of the tweets’ political sentiment demonstrates close correspondence to the parties' and politicians’ political positions, indicating that the content of Twitter messages plausibly reflects the offline political landscape. We discuss the use of microblogging message content as a valid indicator of political sentiment and derive suggestions for further research.

2,718 citations


"Sentiment Analysis and Text Mining ..." refers methods in this paper

  • ...al [10, 13], proposed a method for mining opinions from twitter about presidential elections candidates and predicting the election results....

Proceedings Article
01 May 2010
TL;DR: This paper shows how to automatically collect a corpus for sentiment analysis and opinion mining purposes and builds a sentiment classifier, that is able to determine positive, negative and neutral sentiments for a document.
Abstract: Microblogging today has become a very popular communication tool among Internet users. Millions of users share opinions on different aspects of life every day. Therefore microblogging websites are rich sources of data for opinion mining and sentiment analysis. Because microblogging has appeared relatively recently, there are only a few research works devoted to this topic. In our paper, we focus on using Twitter, the most popular microblogging platform, for the task of sentiment analysis. We show how to automatically collect a corpus for sentiment analysis and opinion mining purposes. We perform linguistic analysis of the collected corpus and explain discovered phenomena. Using the corpus, we build a sentiment classifier that is able to determine positive, negative and neutral sentiments for a document. Experimental evaluations show that our proposed techniques are efficient and perform better than previously proposed methods. In our research, we worked with English; however, the proposed technique can be used with any other language.

2,570 citations


"Sentiment Analysis and Text Mining ..." refers methods in this paper

  • ...al [5], used supervised technique to build a classifier using Part of speech tagger and N-gram methods and used the classifier to classify opinions....
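
The N-gram features mentioned in this excerpt can be sketched as follows. This is a simplified illustration, not the cited authors' feature extractor: the regular-expression tokenizer, the function names `tokenize` and `ngram_features`, and the default of unigrams plus bigrams are assumptions, and part-of-speech tagging is left out.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngram_features(text, max_n=2):
    """Count every n-gram of length 1..max_n in the text."""
    tokens = tokenize(text)
    features = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            features[" ".join(tokens[i:i + n])] += 1
    return features


features = ngram_features("Not good, not good at all")
print(features.most_common(3))
```

Bigrams such as "not good" keep a negation attached to the word it modifies, which unigram counts alone would lose. The resulting counts could be passed to any supervised classifier.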

Proceedings ArticleDOI
31 Aug 2010
TL;DR: It is shown that a simple model built from the rate at which tweets are created about particular topics can outperform market-based predictors and improve the forecasting power of social media.
Abstract: In recent years, social media has become ubiquitous and important for social networking and content sharing. And yet, the content that is generated from these websites remains largely untapped. In this paper, we demonstrate how social media content can be used to predict real-world outcomes. In particular, we use the chatter from Twitter.com to forecast box-office revenues for movies. We show that a simple model built from the rate at which tweets are created about particular topics can outperform market-based predictors. We further demonstrate how sentiments extracted from Twitter can be utilized to improve the forecasting power of social media.

1,909 citations

01 Jan 2010
TL;DR: In this chapter, the authors focus on opinion expressions that convey people's positive or negative sentiments, where opinions are subjective expressions that describe people's sentiments, appraisals or feelings toward entities, events and their properties.
Abstract: Textual information in the world can be broadly categorized into two main types: facts and opinions. Facts are objective expressions about entities, events and their properties. Opinions are usually subjective expressions that describe people’s sentiments, appraisals or feelings toward entities, events and their properties. The concept of opinion is very broad. In this chapter, we only focus on opinion expressions that convey people’s positive or negative sentiments. Much of the existing research on textual information processing has been focused on mining and retrieval of factual information, e.g., information retrieval, Web search, text classification, text clustering and many other text mining and natural language processing tasks. Little work had been done on the processing of opinions until only recently. Yet, opinions are so important that whenever we need to make a decision we want to hear others’ opinions. This is not only true for individuals but also true for organizations. One of the main reasons for the lack of study on opinions is the fact that there was little opinionated text available before the World Wide Web. Before the Web, when an individual needed to make a decision, he/she typically asked for opinions from friends and families. When an organization wanted to find the opinions or sentiments of the general public about its products and services, it conducted opinion polls, surveys, and focus groups. However, with the Web, especially with the explosive growth of the user-generated content on the Web in the past few years, the world has been transformed. The Web has dramatically changed the way that people express their views and opinions. They can now post reviews of products at merchant sites and express their views on almost anything in Internet forums, discussion groups, and blogs, which are collectively called the user-generated content. This online word-of-mouth behavior represents new and measurable sources of information with many practical applications. Now if one wants to buy a product, he/she is no longer limited to asking his/her friends and families because there are many product reviews on the Web which give opinions of existing users of the product. For a company, it may no longer be necessary to conduct surveys, organize focus groups or employ external consultants in order to find consumer opinions about its products and those of its competitors because the user-generated content on the Web can already give them such information.

1,575 citations