
Hope Speech Detection for Dravidian Languages Using Cross-Lingual Embeddings with Stacked Encoder Architecture.

TLDR
In this paper, a multilingual model, with main emphasis on Dravidian languages, was proposed to automatically detect hope speech, which achieved an F1-score of 0.61 and 0.85 for Tamil and Malayalam, respectively.
Abstract
The task of hope speech detection has gained traction in natural language processing owing to the need for increased positive reinforcement online during the COVID-19 pandemic. Hope speech detection focuses on identifying texts among social media comments that could invoke positive emotions in people. Students and working adults alike report experiencing significant work-induced stress, further demonstrating a need for external inspiration, which in the current scenario is mostly found online. In this paper, we propose a multilingual model, with an emphasis on Dravidian languages, to automatically detect hope speech. We employ a stacked encoder architecture that makes use of language-agnostic cross-lingual word embeddings, as the dataset consists of code-mixed YouTube comments. Additionally, we carried out an empirical analysis, testing our architecture against various traditional, transformer, and transfer learning methods. Furthermore, a k-fold paired t-test was conducted, which corroborates that our model outperforms the other approaches. Our methodology achieved F1-scores of 0.61 and 0.85 for Tamil and Malayalam, respectively, and is competitive with state-of-the-art methods. The code for our work can be found in our GitHub repository (https://github.com/arunimasundar/Hope-Speech-LT-EDI).


Citations
Journal ArticleDOI

Language-agnostic deep learning framework for automatic monitoring of population-level mental health from social networks

TL;DR: This article presents a framework for monitoring real-time mental health indicators from social media data without using labeled datasets in low-resource languages. Because of the limits of fundamental natural language processing tools and labeled corpora in countries with limited natural language resources, implementing social media systems to monitor mental health signals can be challenging.
Journal ArticleDOI

Hope speech detection in Spanish

TL;DR: The authors define hope speech as speech that can relax a hostile environment and that helps, gives suggestions, and inspires good in people in times of illness, stress, loneliness, or depression.
Posted ContentDOI

On finetuning Adapter-based Transformer models for classifying Abusive Social Media Tamil Comments

TL;DR: The authors use the abusive Tamil-language comments released by the “Tamil DravidianLangTech@ACL 2022” workshop and develop adapter-based multilingual transformer models, namely MuRIL, XLM-RoBERTa, and mBERT, to classify the abusive comments.
References
Journal Article

Scikit-learn: Machine Learning in Python

TL;DR: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems, focusing on bringing machine learning to non-specialists using a general-purpose high-level language.
Journal ArticleDOI

SMOTE: synthetic minority over-sampling technique

TL;DR: This article presents a method of over-sampling the minority class by creating synthetic minority-class examples, evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
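The core SMOTE step summarized above is linear interpolation between a minority-class sample and one of its minority-class nearest neighbors. An illustrative sketch of just that step (not the reference implementation; real usage would go through a library such as imbalanced-learn):

```python
import numpy as np

def smote_interpolate(x, neighbor, rng):
    """Create one synthetic sample on the line segment between a minority
    sample and one of its minority-class neighbors (SMOTE's core step)."""
    gap = rng.random()  # random position along the segment, in [0, 1)
    return x + gap * (neighbor - x)

rng = np.random.default_rng(0)
x = np.array([1.0, 2.0])        # a minority-class sample
nb = np.array([3.0, 4.0])       # one of its minority-class neighbors
synth = smote_interpolate(x, nb, rng)
```

Repeating this for many sample/neighbor pairs balances the class distribution without duplicating points exactly.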
Posted Content

Attention Is All You Need

TL;DR: A new, simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed; it generalizes well to other tasks, applying successfully to English constituency parsing with both large and limited training data.
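The attention mechanism at the heart of the Transformer is scaled dot-product attention, softmax(QKᵀ/√d_k)·V. A minimal numpy sketch (illustrative only; the random Q, K, V below are placeholders):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V, the Transformer's core operation."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.standard_normal((2, 4))  # 2 queries, dimension 4
K = rng.standard_normal((3, 4))  # 3 keys
V = rng.standard_normal((3, 4))  # 3 values
out = scaled_dot_product_attention(Q, K, V)  # shape (2, 4)
```

Each output row is a convex combination of the value rows, weighted by query-key similarity.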
Posted Content

Bidirectional LSTM-CRF Models for Sequence Tagging

TL;DR: This work is the first to apply a bidirectional LSTM-CRF model to NLP benchmark sequence tagging data sets; the BI-LSTM-CRF model can efficiently use both past and future input features thanks to its bidirectional LSTM component.
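The "past and future features" idea in the summary above comes from running a recurrent encoder in both directions and concatenating the two states at each position. A toy sketch using a plain tanh recurrence rather than an LSTM (illustrative assumption, not the paper's model; weights and inputs are random placeholders):

```python
import numpy as np

def simple_rnn_pass(xs, W, reverse=False):
    """Minimal recurrent pass (a tanh RNN, not an LSTM):
    h_t = tanh(W @ [h_{t-1}; x_t])."""
    seq = xs[::-1] if reverse else xs
    h = np.zeros(W.shape[0])
    hs = []
    for x in seq:
        h = np.tanh(W @ np.concatenate([h, x]))
        hs.append(h)
    return hs[::-1] if reverse else hs  # realign to original time order

rng = np.random.default_rng(1)
xs = [rng.standard_normal(3) for _ in range(4)]  # 4 time steps, input dim 3
W = rng.standard_normal((2, 5))                  # hidden dim 2, input 2+3
fwd = simple_rnn_pass(xs, W)                     # sees the past at each step
bwd = simple_rnn_pass(xs, W, reverse=True)       # sees the future at each step
bi = [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]  # both contexts
```

In the BI-LSTM-CRF model, these concatenated states would then feed a CRF layer that scores whole tag sequences.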