Open Access Journal Article

A performance comparison of supervised machine learning models for Covid-19 tweets sentiment analysis.

TL;DR: In this paper, the authors perform sentiment analysis of Covid-19 tweets with supervised machine learning classifiers, using a feature set that concatenates bag-of-words and term frequency-inverse document frequency (TF-IDF) features.
Abstract
The spread of Covid-19 has resulted in worldwide health concerns, and social media is increasingly used to share news and opinions about it. A realistic assessment of the situation is necessary to utilize resources optimally and appropriately. In this research, we perform sentiment analysis of Covid-19 tweets using a supervised machine learning approach. Identifying Covid-19 sentiments from tweets would allow informed decisions for better handling of the current pandemic situation. The dataset is extracted from Twitter using the tweet IDs provided by IEEE DataPort. Tweets are extracted by an in-house crawler that uses the Tweepy library. The dataset is cleaned using preprocessing techniques, and sentiments are extracted using the TextBlob library. The contribution of this work is the performance evaluation of various machine learning classifiers using our proposed feature set, which is formed by concatenating bag-of-words and term frequency-inverse document frequency (TF-IDF) features. Tweets are classified as positive, neutral, or negative. Classifier performance is evaluated in terms of accuracy, precision, recall, and F1 score. For completeness, the dataset is further investigated using the Long Short-Term Memory (LSTM) deep learning architecture. The results show that the Extra Trees Classifier outperforms all other models, achieving an accuracy score of 0.93 with our proposed concatenated feature set. The LSTM achieves lower accuracy than the machine learning classifiers. To demonstrate the effectiveness of our proposed feature set, the results are compared with the VADER sentiment analysis technique based on the GloVe feature extraction approach.
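The feature concatenation described in the abstract can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the function names `label_from_polarity` and `concat_features`, the whitespace tokenizer, the zero polarity threshold, and the smoothed IDF formula (the one scikit-learn's `TfidfVectorizer` uses by default, here without its L2 normalization) are all assumptions made for the example.

```python
import math
from collections import Counter


def label_from_polarity(polarity):
    """Map a polarity score in [-1, 1] to one of three classes.

    Assumption: a score above 0 is positive, below 0 is negative,
    and exactly 0 is neutral. The source does not state its thresholds.
    """
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


def concat_features(docs):
    """Return (vocab, rows); each row is the BoW vector followed by the TF-IDF vector."""
    # Hypothetical tokenizer: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each token.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed inverse document frequency (assumed variant).
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab}

    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        bow = [float(counts[tok]) for tok in vocab]        # raw term counts
        tfidf = [counts[tok] * idf[tok] for tok in vocab]  # counts weighted by IDF
        rows.append(bow + tfidf)                           # concatenated feature row
    return vocab, rows


docs = ["stay safe stay home", "vaccine news is good", "lockdown is bad"]
vocab, rows = concat_features(docs)
print(len(vocab), len(rows[0]))
```

Each row has twice as many columns as the vocabulary has terms: the first half holds raw counts and the second half holds the IDF-weighted counts. Rows of this shape, paired with the three-class labels, are what a classifier such as Extra Trees would be trained on.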



Citations
Journal Article

Impact of SMOTE on Imbalanced Text Features for Toxic Comments Classification Using RVVC Model

TL;DR: In this paper, an ensemble approach called the regression vector voting classifier (RVVC), which merges logistic regression and a support vector classifier under soft voting criteria, is introduced to identify toxic comments on social media platforms.
Journal Article

Deep Learning-Based Methods for Sentiment Analysis on Nepali COVID-19-Related Tweets.

TL;DR: In this paper, the authors analyzed people's sentiment by classifying tweets collected from Twitter in Nepal, and used three different feature extraction methods for the representation of tweets: fastText-based (ft), domain-specific (ds), and domain-agnostic (da).
Journal Article

Sentiment Analysis and Topic Modeling on Tweets about Online Education during COVID-19

TL;DR: In this paper, the authors investigated the effectiveness of e-learning by analyzing people's sentiments about online learning using a Twitter dataset containing 17,155 tweets about e-learning, and found that uncertainty about the campus opening date, children's difficulty in grasping online education, and the lack of efficient networks for online education are the top three problems.
Journal Article

Sentiment Analysis of Nepali COVID19 Tweets Using NB, SVM AND LSTM

TL;DR: Four language-based models for sentiment analysis of Nepali COVID-19 tweets are designed and evaluated; the authors state that the results will greatly assist firms in adapting to the changing climate.
Journal Article

Reliability of Google Trends: Analysis of the Limits and Potential of Web Infoveillance During COVID-19 Pandemic and for Future Research

TL;DR: In this article, the authors analyze relative search volumes (RSVs), quantifying their dependence on the day they are collected, and use Welch's t-test to assess the statistical significance of the differences between the average RSVs of the various countries, regions, or cities of a given dataset.
References
Proceedings Article

Glove: Global Vectors for Word Representation

TL;DR: A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
Proceedings Article

XGBoost: A Scalable Tree Boosting System

TL;DR: XGBoost, as discussed by the authors, uses a sparsity-aware algorithm for sparse data and a weighted quantile sketch for approximate tree learning to achieve state-of-the-art results on many machine learning challenges.
Proceedings Article

VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text

TL;DR: Interestingly, using the authors' parsimonious rule-based model to assess the sentiment of tweets, it is found that VADER outperforms individual human raters, and generalizes more favorably across contexts than any of their benchmarks.
Journal Article

The origin, transmission and clinical therapies on coronavirus disease 2019 (COVID-19) outbreak - an update on the status.

TL;DR: The latest research progress on the epidemiology, pathogenesis, and clinical characteristics of COVID-19 is summarized, and the current treatment and scientific advancements to combat the epidemic novel coronavirus are discussed.
Journal Article

A survey of decision tree classifier methodology

TL;DR: The subjects of tree structure design, feature selection at each internal node, and decision and search strategies are discussed, along with the relation between decision trees and neural networks (NN).