Comparison Research on Text Pre-processing Methods on Twitter Sentiment Analysis

doi:10.1109/ACCESS.2017.2672677

Zhao Jianqiang, +1 more

- 22 Feb 2017 -

IEEE Access

- Vol. 5, pp 2870-2879

TLDR

The experiments show that the accuracy and F1-measure of the Twitter sentiment classifier improve when the pre-processing methods of expanding acronyms and replacing negation are used, but barely change when URLs, numbers, or stop words are removed.

Abstract:

Twitter sentiment analysis offers organizations the ability to monitor, in real time, public feeling towards the products and events related to them. The first step of sentiment analysis is the text pre-processing of Twitter data. Most existing research on Twitter sentiment analysis focuses on the extraction of new sentiment features, while the selection of the pre-processing method is ignored. This paper discusses the effects of text pre-processing methods on sentiment classification performance in two types of classification tasks, and summarizes the classification performance of six pre-processing methods using two feature models and four classifiers on five Twitter datasets. The experiments show that the accuracy and F1-measure of the Twitter sentiment classifier improve when the pre-processing methods of expanding acronyms and replacing negation are used, but barely change when URLs, numbers, or stop words are removed. The Naive Bayes and Random Forest classifiers are more sensitive than the Logistic Regression and support vector machine classifiers to the choice of pre-processing method.
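The pre-processing methods named in the abstract (expanding acronyms, replacing negation, removing URLs, removing numbers, removing stop words) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the lookup tables `ACRONYMS`, `NEGATION_SPECIAL`, and `STOP_WORDS`, the regular expressions, and every function name are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical, tiny lookup tables; the paper's actual resources are not given here.
ACRONYMS = {"omg": "oh my god", "lol": "laughing out loud", "gr8": "great"}
NEGATION_SPECIAL = {"won't": "will not", "can't": "cannot", "shan't": "shall not"}
STOP_WORDS = {"a", "an", "the", "is", "to", "of"}  # deliberately excludes "not"

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NUMBER_RE = re.compile(r"\b\d+(?:\.\d+)?\b")
NEGATION_RE = re.compile(r"\b(\w+)n't\b", re.IGNORECASE)


def remove_urls(text):
    """Replace every URL with a space."""
    return URL_RE.sub(" ", text)


def remove_numbers(text):
    """Replace standalone integers and decimals with a space."""
    return NUMBER_RE.sub(" ", text)


def replace_negation(text):
    """Rewrite contracted negations, e.g. "don't" becomes "do not"."""
    def _expand(match):
        word = match.group(0).lower()
        if word in NEGATION_SPECIAL:
            return NEGATION_SPECIAL[word]
        return match.group(1) + " not"
    return NEGATION_RE.sub(_expand, text)


def expand_acronyms(text):
    """Swap known acronyms or slang tokens for their long forms."""
    return " ".join(ACRONYMS.get(tok.lower(), tok) for tok in text.split())


def remove_stop_words(text):
    """Drop tokens that appear in the stop-word list."""
    return " ".join(tok for tok in text.split() if tok.lower() not in STOP_WORDS)


def preprocess(text, steps):
    """Apply the chosen steps in order, then normalize whitespace."""
    for step in steps:
        text = step(text)
    return " ".join(text.split())


tweet = "OMG I don't like the new phone 2 much http://t.co/abc"
steps = [remove_urls, replace_negation, expand_acronyms,
         remove_numbers, remove_stop_words]
print(preprocess(tweet, steps))
# prints: oh my god I do not like new phone much
```

Keeping each method as a separate function makes it straightforward to switch a single step on or off, which is the kind of per-method comparison the abstract describes.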

Citations

Book

Information retrieval

C. J. Van Rijsbergen

TL;DR: The major change in the second edition of this book is the addition of a new chapter on probabilistic retrieval, which I think is one of the most interesting and active areas of research in information retrieval.

Journal Article

A comparative evaluation of pre-processing techniques and their interactions for twitter sentiment analysis

Symeon Symeonidis, +2 more

- 15 Nov 2018 -

Expert Systems With Applications

TL;DR: It is found that techniques like lemmatization, removing numbers, and replacing contractions improve accuracy, while others like removing punctuation do not, and the significance of techniques such as replacing numbers and replacing repetitions of punctuation is shown.

Journal Article

A survey of Twitter research: Data model, graph structure, sentiment analysis and attacks

Despoina Antonakaki, +3 more

- 01 Feb 2021 -

Expert Systems With Applications

TL;DR: An effort to map the current research topics in Twitter focusing on three major areas: the structure and properties of the social graph, sentiment analysis and threats such as spam, bots, fake news and hate speech is presented.

Journal Article

Systematic literature review of sentiment analysis on Twitter using soft computing techniques

Akshi Kumar, +1 more

- 10 Jan 2020 -

Concurrency and Computation: Practice an...

TL;DR: This work presents a systematic literature review to collate, explore, understand and analyze the efforts and trends in a well‐structured manner to identify research gaps defining the future prospects of this coupling of soft computing techniques for sentiment analysis on Twitter.

Journal Article

SentiDiff: Combining Textual Information and Sentiment Diffusion Patterns for Twitter Sentiment Analysis

Lei Wang, +2 more

- 01 Oct 2020 -

IEEE Transactions on Knowledge and Data ...

TL;DR: This work considers the inter-relationships between textual information of Twitter messages and sentiment diffusion patterns, and proposes an iterative algorithm called SentiDiff to predict sentiment polarities expressed in Twitter messages to help improve Twitter sentiment analysis.

References

Proceedings Article

SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining.

Stefano Baccianella, +2 more

TL;DR: This work discusses SENTIWORDNET 3.0, a lexical resource explicitly devised for supporting sentiment classification and opinion mining applications, and reports on the improvements concerning aspect (b) that it embodies with respect to version 1.0.

Proceedings Article

Twitter as a Corpus for Sentiment Analysis and Opinion Mining

Alexander Pak, +1 more

TL;DR: This paper shows how to automatically collect a corpus for sentiment analysis and opinion mining purposes and builds a sentiment classifier, that is able to determine positive, negative and neutral sentiments for a document.

Book

Information Retrieval: Data Structures and Algorithms

William B. Frakes, +1 more

TL;DR: For programmers and students interested in parsing text, automated indexing, it's the first collection in book form of the basic data structures and algorithms that are critical to the storage and retrieval of documents.

Sentiment Analysis of Twitter Data

Apoorv Agarwal, +4 more

TL;DR: This article introduced POS-specific prior polarity features and explored the use of a tree kernel to obviate the need for tedious feature engineering for sentiment analysis on Twitter data, which outperformed the state-of-the-art baseline.

Proceedings Article

Twitter Sentiment Analysis: The Good the Bad and the OMG!

Efthymios Kouloumpis, +2 more

TL;DR: This paper evaluates the usefulness of existing lexical resources as well as features that capture information about the informal and creative language used in microblogging, and uses existing hashtags in the Twitter data for building training data.

Related Papers (5)

Sentiment analysis algorithms and applications: A survey

Glove: Global Vectors for Word Representation

Opinion Mining and Sentiment Analysis

Twitter as a Corpus for Sentiment Analysis and Opinion Mining

Convolutional Neural Networks for Sentence Classification