Comparison Research on Text Pre-processing Methods on Twitter Sentiment Analysis
Zhao Jianqiang, Gui Xiaolin
TLDR
The experiments show that the accuracy and F1-measure of Twitter sentiment classifiers improve when acronyms are expanded and negations are replaced, but barely change when URLs, numbers, or stop words are removed.
Abstract:
Twitter sentiment analysis offers organizations the ability to monitor, in real time, public feeling towards the products and events related to them. The first step of sentiment analysis is the text pre-processing of Twitter data. Most existing research on Twitter sentiment analysis focuses on the extraction of new sentiment features, while the selection of the pre-processing method is ignored. This paper discusses the effects of text pre-processing methods on sentiment classification performance in two types of classification tasks, and summarizes the classification performance of six pre-processing methods using two feature models and four classifiers on five Twitter datasets. The experiments show that the accuracy and F1-measure of the Twitter sentiment classifiers improve when using the pre-processing methods of expanding acronyms and replacing negation, but barely change when removing URLs, removing numbers, or removing stop words. The Naive Bayes and Random Forest classifiers are more sensitive than the Logistic Regression and support vector machine classifiers when the various pre-processing methods are applied.
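As a rough illustration of the pre-processing methods named in the abstract (expanding acronyms, replacing negation, removing URLs, removing numbers, removing stop words), a minimal Python sketch could look like the following. It is not the authors' implementation: the lookup tables `ACRONYMS`, `NEGATIONS`, and `STOP_WORDS`, the regular expressions, and the order of the steps are all assumptions made for the example.

```python
import re

# Hypothetical, tiny lookup tables for illustration only; the abstract does
# not give the acronym dictionary, negation rules, or stop-word list used.
ACRONYMS = {"gr8": "great", "lol": "laughing out loud", "omg": "oh my god"}
NEGATIONS = {"won't": "will not", "can't": "cannot"}
STOP_WORDS = {"a", "an", "the", "is", "to", "of", "and"}

# Assumed patterns: URLs start with http(s):// or www.; numbers are
# standalone digit runs (so "gr8" is left for the acronym step).
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
NUMBER_RE = re.compile(r"\b\d+(?:\.\d+)?\b")


def remove_urls(text):
    """Replace every URL with a space."""
    return URL_RE.sub(" ", text)


def remove_numbers(text):
    """Replace standalone numbers with a space."""
    return NUMBER_RE.sub(" ", text)


def replace_negation(text):
    """Rewrite contracted negations as explicit 'not' forms."""
    for short, full in NEGATIONS.items():
        text = re.sub(re.escape(short), full, text, flags=re.IGNORECASE)
    # Any remaining "n't" suffix (e.g. "don't") becomes " not".
    return re.sub(r"n't\b", " not", text, flags=re.IGNORECASE)


def expand_acronyms(tokens):
    """Replace each token found in ACRONYMS with its expansion."""
    return [ACRONYMS.get(token, token) for token in tokens]


def remove_stop_words(tokens):
    """Drop tokens that appear in STOP_WORDS."""
    return [token for token in tokens if token not in STOP_WORDS]


def preprocess(tweet):
    """Apply all five steps and return the cleaned, lower-cased text."""
    text = tweet.lower()
    text = remove_urls(text)
    text = replace_negation(text)
    text = remove_numbers(text)
    tokens = re.findall(r"[a-z0-9']+", text)
    tokens = expand_acronyms(tokens)
    tokens = remove_stop_words(tokens)
    return " ".join(tokens)


if __name__ == "__main__":
    print(preprocess("OMG the show is gr8, can't wait 2 see it http://t.co/abc"))
```

Each step is a separate function so that it can be switched on or off independently, which is how one would compare the effect of a single pre-processing method on a classifier's accuracy and F1-measure.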
Citations
Book
Information retrieval
TL;DR: The major change in the second edition of this book is the addition of a new chapter on probabilistic retrieval, which I think is one of the most interesting and active areas of research in information retrieval.
Journal Article
A comparative evaluation of pre-processing techniques and their interactions for twitter sentiment analysis
TL;DR: It is found that techniques such as lemmatization, removing numbers, and replacing contractions improve accuracy, while others, such as removing punctuation, do not; the significance of techniques such as replacing numbers and replacing repetitions of punctuation is also shown.
Journal Article
A survey of Twitter research: Data model, graph structure, sentiment analysis and attacks
TL;DR: An effort to map the current research topics in Twitter focusing on three major areas: the structure and properties of the social graph, sentiment analysis and threats such as spam, bots, fake news and hate speech is presented.
Journal Article
Systematic literature review of sentiment analysis on Twitter using soft computing techniques
Akshi Kumar, Arunima Jaiswal
TL;DR: This work presents a systematic literature review to collate, explore, understand, and analyze the efforts and trends in a well‐structured manner to identify research gaps defining the future prospects of this coupling of soft computing techniques for sentiment analysis on Twitter.
Journal Article
SentiDiff: Combining Textual Information and Sentiment Diffusion Patterns for Twitter Sentiment Analysis
Lei Wang, Jianwei Niu, Shui Yu
TL;DR: This work considers the inter-relationships between textual information of Twitter messages and sentiment diffusion patterns, and proposes an iterative algorithm called SentiDiff to predict sentiment polarities expressed in Twitter messages to help improve Twitter sentiment analysis.