A performance comparison of supervised machine learning models for Covid-19 tweets sentiment analysis.

doi:10.1371/JOURNAL.PONE.0245909

Open Access, Journal Article

Furqan Rustam, +5 more

- 25 Feb 2021 -

PLOS ONE

- Vol. 16, Iss: 2

TLDR

In this paper, the authors perform Covid-19 tweets sentiment analysis with a supervised machine learning approach, using a feature set formed by concatenating bag-of-words and term frequency-inverse document frequency (TF-IDF) features.

Abstract:

The spread of Covid-19 has resulted in worldwide health concerns. Social media is increasingly used to share news and opinions about it. A realistic assessment of the situation is necessary to utilize resources optimally and appropriately. In this research, we perform Covid-19 tweets sentiment analysis using a supervised machine learning approach. Identifying Covid-19 sentiments from tweets would allow informed decisions for better handling of the current pandemic situation. The dataset is extracted from Twitter using IDs provided by IEEE DataPort. Tweets are extracted by an in-house crawler that uses the Tweepy library. The dataset is cleaned using preprocessing techniques, and sentiments are extracted using the TextBlob library. The contribution of this work is the performance evaluation of various machine learning classifiers using our proposed feature set. This set is formed by concatenating the bag-of-words and the term frequency-inverse document frequency (TF-IDF) features. Tweets are classified as positive, neutral, or negative. The performance of the classifiers is evaluated in terms of accuracy, precision, recall, and F1 score. For completeness, the dataset is further investigated using the Long Short-Term Memory (LSTM) deep learning architecture. The results show that Extra Trees Classifiers outperform all other models, achieving an accuracy score of 0.93 with our proposed concatenated feature set. The LSTM achieves lower accuracy than the machine learning classifiers. To demonstrate the effectiveness of our proposed feature set, the results are compared with the VADER sentiment analysis technique based on the GloVe feature extraction approach.
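
The concatenated feature set described in the abstract can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, the function names, and the three sample tweets are all assumptions made for illustration, and a real pipeline would more likely rely on a library vectorizer and feed the resulting rows to a classifier.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple, assumed tokenizer)."""
    return text.lower().split()


def build_vocabulary(docs):
    """Map each distinct token to a column index, in sorted order."""
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def bow_vector(doc, vocab):
    """Bag-of-words: raw term counts for one document, one entry per vocabulary token."""
    counts = Counter(tokenize(doc))
    return [float(counts.get(tok, 0)) for tok in vocab]


def idf_weights(docs, vocab):
    """Smoothed inverse document frequency (assumed form): ln((1 + N) / (1 + df)) + 1."""
    n_docs = len(docs)
    doc_sets = [set(tokenize(doc)) for doc in docs]
    return [
        math.log((1 + n_docs) / (1 + sum(tok in s for s in doc_sets))) + 1.0
        for tok in vocab
    ]


def concatenated_features(docs):
    """Return the vocabulary and one row per document: [BoW counts | TF-IDF weights]."""
    vocab = build_vocabulary(docs)
    idf = idf_weights(docs, vocab)
    rows = []
    for doc in docs:
        bow = bow_vector(doc, vocab)
        tfidf = [tf * w for tf, w in zip(bow, idf)]  # unnormalized TF-IDF
        rows.append(bow + tfidf)  # list concatenation doubles the feature width
    return vocab, rows


# Hypothetical sample tweets, not taken from the paper's dataset.
tweets = [
    "stay safe stay home",
    "vaccine news is good news",
    "lockdown is hard",
]
vocab, features = concatenated_features(tweets)
print(len(vocab), len(features[0]))
```

Each row is twice as wide as the vocabulary: the first half holds raw counts and the second half holds the same counts reweighted by IDF, so a downstream classifier sees both representations at once.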

Citations

Impact of SMOTE on Imbalanced Text Features for Toxic Comments Classification Using RVVC Model

Deep Learning-Based Methods for Sentiment Analysis on Nepali COVID-19-Related Tweets.

Sentiment Analysis and Topic Modeling on Tweets about Online Education during COVID-19

Sentiment Analysis of Nepali COVID19 Tweets Using NB, SVM AND LSTM

Reliability of Google Trends: Analysis of the Limits and Potential of Web Infoveillance During COVID-19 Pandemic and for Future Research

References

GloVe: Global Vectors for Word Representation

XGBoost: A Scalable Tree Boosting System

VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text

The origin, transmission and clinical therapies on coronavirus disease 2019 (COVID-19) outbreak - an update on the status.

A survey of decision tree classifier methodology

Related Papers (5)

A Tweet Sentiment Classification Approach Using a Hybrid Stacked Ensemble Technique

Tweets Classification on the Base of Sentiments for US Airline Companies

Emotion Recognition by Textual Tweets Classification Using Voting Classifier (LR-SGD)

Improving the Prediction of Heart Failure Patients’ Survival Using SMOTE and Effective Data Mining Techniques

The Effect of Dataset Size on Training Tweet Sentiment Classifiers