Author

Arunima Sundar

Bio: Arunima Sundar is an academic researcher from Sri Sivasubramaniya Nadar College of Engineering. The author has contributed to research in topics: Voice activity detection & Dravidian languages. The author has co-authored 1 publication.

Papers
01 Jan 2022
TL;DR: This paper proposes a multilingual model, with main emphasis on Dravidian languages, to automatically detect hope speech; the model achieves F1-scores of 0.61 and 0.85 for Tamil and Malayalam, respectively.
Abstract: The task of hope speech detection has gained traction in the natural language processing field owing to the need for an increase in positive reinforcement online during the COVID-19 pandemic. Hope speech detection focuses on identifying texts among social media comments that could invoke positive emotions in people. Students and working adults alike posit that they experience a lot of work-induced stress, further proving that there exists a need for external inspiration, which, in the current scenario, is mostly found online. In this paper, we propose a multilingual model, with main emphasis on Dravidian languages, to automatically detect hope speech. We have employed a stacked encoder architecture which makes use of language-agnostic cross-lingual word embeddings, as the dataset consists of code-mixed YouTube comments. Additionally, we have carried out an empirical analysis and tested our architecture against various traditional, transformer, and transfer learning methods. Furthermore, a k-fold paired t-test was conducted, which corroborates that our model outperforms the other approaches. Our methodology achieved F1-scores of 0.61 and 0.85 for Tamil and Malayalam, respectively. Our methodology is quite competitive with the state-of-the-art methods. The code for our work can be found in our GitHub repository (https://github.com/arunimasundar/Hope-Speech-LT-EDI).
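The k-fold paired t-test mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `paired_t_statistic` and the per-fold F1 scores are hypothetical, and the abstract does not say which variant of the test was used. The sketch assumes the common form in which two models are scored on the same k folds and the t statistic is computed from the per-fold score differences.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for a paired t-test on per-fold scores."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both models need one score per fold")
    k = len(scores_a)
    if k < 2:
        raise ValueError("need at least two folds")
    # Pair the scores fold by fold and work with the differences.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = stdev(diffs)  # sample standard deviation (k - 1 in the denominator)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(k))
    return t, k - 1


# Hypothetical per-fold F1 scores, invented for illustration only.
proposed = [0.62, 0.60, 0.63, 0.59, 0.61]
baseline = [0.58, 0.57, 0.60, 0.56, 0.57]

t, df = paired_t_statistic(proposed, baseline)
print(f"t = {t:.2f}, df = {df}")
```

The resulting t value would then be compared against a t distribution with k - 1 degrees of freedom to obtain a p-value; a large positive t suggests the first model's advantage is consistent across folds rather than due to a single favorable split.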

4 citations


Cited by
Journal Article
TL;DR: This article presents a framework for monitoring real-time mental health indicators from social media data in low-resource languages without using labeled datasets. Because fundamental natural language processing tools and labeled corpora are limited in countries with few natural language resources, implementing social media systems to monitor mental health signals can be challenging.

3 citations

Journal Article
TL;DR: In this paper, the authors define hope speech as the type of speech that is able to relax a hostile environment and that helps, gives suggestions and inspires for good a number of people when they are in times of illness, stress, loneliness or depression.
Abstract: In recent years, systems have been developed to monitor online content and remove abusive, offensive or hateful content. Comments in online social media have been analyzed to find and stop the spread of negativity using methods such as hate speech detection, identification of offensive language or detection of abusive language. We define hope speech as the type of speech that is able to relax a hostile environment and that helps, gives suggestions and inspires for good a number of people when they are in times of illness, stress, loneliness or depression. Detecting it automatically, in order to give greater diffusion to positive comments, can have a very significant effect when it comes to fighting against sexual or racial discrimination or when we intend to foster less bellicose environments. In this article, we perform a complete study on hope speech, analyzing existing solutions and available resources. In addition, we have generated a quality resource, SpanishHopeEDI, a new Spanish Twitter dataset on the LGBT community, and we have conducted some experiments that can serve as a baseline for further research.
Posted Content
22 Feb 2023
TL;DR: In this paper, the authors make use of abusive Tamil language comments released by the workshop “Tamil DravidianLangTech@ACL 2022” and develop adapter-based multilingual transformer models, namely MuRIL, XLMRoBERTa and mBERT, to classify the abusive comments.
Abstract: Speaking or expressing oneself in an abusive manner is a form of verbal abuse that targets individuals or groups on the basis of their membership in a particular social group, which is differentiated by traits such as culture, gender, sexual orientation, religious affiliation, etc. In today's world, the dissemination of evil and depraved content on social media has increased exponentially. Abusive language on the internet has been linked to an increase in violence against minorities around the world, including mass shootings, murders, and ethnic cleansing. People who use social media in places where English is not the main language often use a code-mixed form of text. This makes it harder to find abusive texts, and when combined with the fact that there aren't many resources for languages like Tamil, the task becomes significantly challenging. This work makes use of abusive Tamil language comments released by the workshop “Tamil DravidianLangTech@ACL 2022” and develops adapter-based multilingual transformer models, namely MuRIL, XLMRoBERTa and mBERT, to classify the abusive comments. These transformers have been utilized as fine-tuners and adapters. This study shows that in low-resource languages like Tamil, adapter-based strategies work better than fine-tuned models. In addition, we use Optuna, a hyperparameter optimization framework, to find the ideal values of the hyper-parameters that lead to better classification. Of all the proposed models, MuRIL (Large) gives 74.7%, which is comparatively better than other models proposed for the same dataset.