Open Access, Journal Article

Comparison Research on Text Pre-processing Methods on Twitter Sentiment Analysis

Zhao Jianqiang, +1 more
22 Feb 2017, Vol. 5, pp. 2870-2879
Abstract
Twitter sentiment analysis offers organizations the ability to monitor, in real time, public feeling towards the products and events related to them. The first step of sentiment analysis is the text pre-processing of Twitter data. Most existing research on Twitter sentiment analysis focuses on the extraction of new sentiment features, while the choice of pre-processing method is ignored. This paper discusses the effects of text pre-processing methods on sentiment classification performance in two types of classification tasks, and summarizes the classification performance of six pre-processing methods using two feature models and four classifiers on five Twitter datasets. The experiments show that the accuracy and F1-measure of Twitter sentiment classifiers improve when acronyms are expanded and negation is replaced, but barely change when URLs, numbers, or stop words are removed. The Naive Bayes and Random Forest classifiers are more sensitive to the choice of pre-processing method than the Logistic Regression and support vector machine classifiers.
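
The pre-processing methods named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the lookup tables (ACRONYMS, NEGATIONS, STOP_WORDS), the regular expressions, the function names, and the order of the steps are all assumptions made for illustration, since the abstract does not specify the dictionaries or rules used in the paper.

```python
import re

# Hypothetical lookup tables for illustration only; the paper's actual
# acronym dictionary, negation rules, and stop-word list are not given
# in the abstract.
ACRONYMS = {"lol": "laughing out loud", "omg": "oh my god", "gr8": "great"}
NEGATIONS = {"won't": "will not", "can't": "can not",
             "don't": "do not", "isn't": "is not"}
STOP_WORDS = {"a", "an", "the", "is", "to", "of"}

# Assumed patterns: a URL starts with http(s):// or www.; a number is a
# standalone run of digits with an optional decimal part.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
NUMBER_RE = re.compile(r"\b\d+(?:\.\d+)?\b")


def remove_urls(text):
    """Delete URLs and collapse the whitespace left behind."""
    return " ".join(URL_RE.sub(" ", text).split())


def expand_acronyms(text):
    """Replace each token found in ACRONYMS with its expansion."""
    return " ".join(ACRONYMS.get(tok.lower(), tok) for tok in text.split())


def replace_negation(text):
    """Replace contracted negations such as "can't" with "can not"."""
    return " ".join(NEGATIONS.get(tok.lower(), tok) for tok in text.split())


def remove_numbers(text):
    """Delete standalone numbers; digits inside words (gr8) are kept."""
    return " ".join(NUMBER_RE.sub(" ", text).split())


def remove_stop_words(text):
    """Drop tokens that appear in STOP_WORDS."""
    return " ".join(t for t in text.split() if t.lower() not in STOP_WORDS)


def preprocess(text, steps):
    """Apply the chosen pre-processing steps in the given order."""
    for step in steps:
        text = step(text)
    return text


tweet = "omg the new phone is gr8 but I can't pay 999 http://t.co/abc"
cleaned = preprocess(tweet, [remove_urls, expand_acronyms, replace_negation,
                             remove_numbers, remove_stop_words])
print(cleaned)
```

Passing the steps as a list makes it easy to switch individual methods on or off, which is the kind of comparison the abstract describes: the same classifier can be trained on text produced with and without a given step.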

