Sajeetha Thavareesan

Bio: Sajeetha Thavareesan is an academic researcher from Eastern University (United States). The author has contributed to research on Tamil and Dravidian languages. The author has an h-index of 6 and has co-authored 21 publications receiving 194 citations.

Papers

Proceedings Article•DOI•

Sentiment Analysis in Tamil Texts: A Study on Machine Learning Techniques and Feature Representation

Sajeetha Thavareesan¹, Sinnathamby Mahesan²•Institutions (2)

Eastern University (United States)¹, University of Jaffna²

01 Dec 2019

TL;DR: Basic features such as word count and punctuation count are used in addition to traditional features, including Bag of Words and Term Frequency-Inverse Document Frequency, to check their influence on the prediction.

Abstract: Sentiment Analysis (SA) is an application of Natural Language Processing (NLP) that extracts the sentiments expressed in text. In this paper, we experimented with five approaches to SA: a lexicon-based approach, a supervised machine learning-based approach, a hybrid approach, K-means with Bag of Words (BoW), and K-modes with BoW. We evaluated these approaches on five corpora with different feature representation techniques to identify the best approach to SA in Tamil texts. In addition to traditional features such as BoW and Term Frequency-Inverse Document Frequency (TF-IDF), we included basic features such as word count and punctuation count to check their influence on the prediction. We compared the approaches, features, and corpora. The highest accuracy, 79%, was obtained on the UJ_Corpus_Opinions_Nouns corpus with fastText using the supervised machine learning-based approach.

91 citations

Proceedings Article•DOI•

Sentiment Lexicon Expansion using Word2vec and fastText for Sentiment Prediction in Tamil texts

Sajeetha Thavareesan¹, Sinnathamby Mahesan²•Institutions (2)

Eastern University (United States)¹, University of Jaffna²

28 Jul 2020

TL;DR: A sentiment lexicon expansion method using Word2vec and fastText word embeddings is proposed, along with a rule-based Sentiment Analysis method that uses the expanded lexicons and lists of conjunctions and negation words to predict the sentiments expressed in Tamil texts.

Abstract: Sentiment Analysis is the process of identifying and categorising the sentiments expressed in a text as positive or negative. The words that carry sentiment are key to sentiment prediction. SentiWordNet is the sentiment lexicon used to determine the sentiment of texts. A huge number of sentiment terms are missing from SentiWordNet, which limits the performance of Sentiment Analysis. Gathering and grouping such sentiment words manually is a tedious task. In this paper we propose a sentiment lexicon expansion method using Word2vec and fastText word embeddings, along with a rule-based Sentiment Analysis method. We expand the sentiment lexicon from an initial seed list of 2951 positive and 5598 negative words in two steps: (i) gathering related words using Word2vec word embedding and (ii) gathering lexically similar words using fastText word embedding. Our final lexicons, UJ_Lex_Pos and UJ_Lex_Neg, contain 10537 positive and 12664 negative words respectively, which are labelled using Word2vec word embedding. Furthermore, the rule-based Sentiment Analysis method uses the expanded lexicons (UJ_Lex_Pos and UJ_Lex_Neg) and lists of conjunctions and negation words to predict the sentiments expressed in Tamil texts. The method is evaluated on UJ_MovieReviews and an accuracy of 88 ± 0.14% is obtained.

88 citations

Proceedings Article•DOI•

Word embedding-based Part of Speech tagging in Tamil texts

Sajeetha Thavareesan¹, Sinnathamby Mahesan²•Institutions (2)

Eastern University (United States)¹, University of Jaffna²

26 Nov 2020

TL;DR: In this article, a word embedding-based POS tagger for the Tamil language is proposed, and experiments are conducted with different word embeddings: BoW, TF-IDF, Word2vec, fastText and GloVe.

Abstract: This paper proposes a word embedding-based Part of Speech (POS) tagger for the Tamil language. The experiments are conducted with different word embeddings (BoW, TF-IDF, Word2vec, fastText and GloVe) that are created using the UJ-Tamil corpus. Different combinations of eight features with three classifiers (linear SVM, Extreme Gradient Boosting and k-Nearest Neighbor) are used to build the POS tagger. The results are compared against a Viterbi algorithm-based POS tagger. The results show that word embedding can be used for POS tagging with good performance. BoW, TF-IDF and fastText give an impressive performance compared with Word2vec and GloVe. An accuracy of 99% is obtained with the BoW and TF-IDF word embeddings, with unigrams as well as bigrams, and with the linear SVM classifier. The POS tag of a given word can be identified with 99% accuracy using the word embedding-based POS tagger in Tamil.

79 citations

Proceedings Article•DOI•

Findings of the Shared Task on Emotion Analysis in Tamil

Anbukkarasi Sampath, Thenmozhi Durairaj, Bharathi Raja Chakravarthi, Ruba Priyadharshini, Subalalitha Cn, Kogilavani Shanmugavadivel, Sajeetha Thavareesan, S. Thangasamy, Parameswari Krishnamurthy, Adeep Hande, Sean Benhur, Kishore Ponnusamy, Santhiya Pandiyan

01 Jan 2022

TL;DR: The dataset used in the shared task, the task description, the methodology used by the participants, and the evaluation results of the submissions are presented.

Abstract: This paper presents an overview of the shared task on emotion analysis in Tamil. The results of the shared task are presented at the workshop. This paper presents the dataset used in the shared task, the task description, the methodology used by the participants, and the evaluation results of the submissions. The task is organized as two subtasks. Task A uses social media comments in Tamil annotated with 11 emotions, and Task B uses social media comments in Tamil annotated with 31 fine-grained emotions. For conducting experiments, training and development datasets were provided to the participants, and results were evaluated on unseen data. In total, we received around 24 submissions from 13 teams. The models were evaluated using precision, recall, and micro-average metrics.

45 citations

Journal Article•DOI•

Findings of the Shared Task on Offensive Span Identification from Code-Mixed Tamil-English Comments

Manikandan Ravikiran, Bharathi Raja Chakravarthi, Anand Kumar Madasamy, Sangeetha Sivanesan, R. Rajalakshmi, Sajeetha Thavareesan, R. Ponnusamy, Shankar Mahadevan

12 May 2022

TL;DR: This paper outlines the released dataset, the methods, and the results of the submitted systems for a shared task that provides Tamil-English code-mixed social comments annotated with offensive spans.

Abstract: Offensive content moderation is vital in social media platforms to support healthy online discussions. However, existing work on code-mixed Dravidian languages is limited to classifying whole comments without identifying the parts that contribute to offensiveness. This limitation is primarily due to the lack of annotated data for offensive spans. Accordingly, in this shared task, we provide Tamil-English code-mixed social comments with offensive spans. This paper outlines the released dataset, the methods, and the results of the submitted systems.

37 citations

Cited by

Synthesis Lectures on Human Language Technologies

Ido Dagan, Dan Roth, Mark Sammons, Fabio Massimo Zanzotto, Web Corpus Construction, Roland Schäfer, Felix Bildhauer

01 Jan 2013

TL;DR: This book gives a comprehensive view of state-of-the-art techniques that are used to build spoken dialogue systems and presents dialogue modelling and system development issues relevant in both academic and industrial environments and also discusses requirements and challenges for advanced interaction management and future research.

Abstract: Considerable progress has been made in recent years in the development of dialogue systems that support robust and efficient human–machine interaction using spoken language. Spoken dialogue technology allows various interactive applications to be built and used for practical purposes, and research focuses on issues that aim to increase the system’s communicative competence by including aspects of error correction, cooperation, multimodality, and adaptation in context. This book gives a comprehensive view of state-of-the-art techniques that are used to build spoken dialogue systems. It provides an overview of the basic issues such as system architectures, various dialogue management methods, system evaluation, and also surveys advanced topics concerning extensions of the basic model to more conversational setups. The goal of the book is to provide an introduction to the methods, problems, and solutions that are used in dialogue system development and evaluation. It presents dialogue modelling and system development issues relevant in both academic and industrial environments and also discusses requirements and challenges for advanced interaction management and future research. Keywords: spoken dialogue systems, multimodality, evaluation, error-handling, dialogue management, statistical method

304 citations

Findings of the Shared Task on Hope Speech Detection for Equality, Diversity, and Inclusion

Bharathi Raja Chakravarthi, Vigneshwaran Muralidaran

01 Apr 2021

TL;DR: The shared task of hope speech detection for Tamil, English, and Malayalam languages was conducted as a part of the EACL 2021 workshop on Language Technology for Equality, Diversity, and Inclusion.

Abstract: Hope is considered significant for the well-being, recuperation and restoration of human life by health professionals. Hope speech reflects the belief that one can discover pathways to their desired objectives and become roused to utilise those pathways. To encourage research in natural language processing towards a positive reinforcement approach, we created a hope speech detection dataset. This paper reports on the shared task of hope speech detection for Tamil, English, and Malayalam languages. The shared task was conducted as a part of the EACL 2021 workshop on Language Technology for Equality, Diversity, and Inclusion (LT-EDI-2021). We summarize here the datasets for this challenge, which are openly available at https://competitions.codalab.org/competitions/27653, and present an overview of the methods and the results of the competing systems. To the best of our knowledge, this is the first shared task to conduct hope speech detection.

107 citations

Proceedings Article•DOI•

Sentiment Lexicon Expansion using Word2vec and fastText for Sentiment Prediction in Tamil texts

Sajeetha Thavareesan¹, Sinnathamby Mahesan²•Institutions (2)

Eastern University (United States)¹, University of Jaffna²

28 Jul 2020

88 citations

Proceedings Article•DOI•

Overview of the track on Sentiment Analysis for Dravidian Languages in Code-Mixed Text

Bharathi Raja Chakravarthi¹, Ruba Priyadharshini², Vigneshwaran Muralidaran³, Shardul Suryawanshi¹, Navya Jose⁴, Elizabeth Sherly⁴, John P. McCrae⁵•Institutions (5)

National University of Ireland¹, ULTra², Cardiff University³, Indian Institute of Information Technology and Management, Kerala⁴, National University of Ireland, Galway⁵

16 Dec 2020

TL;DR: The Dravidian-CodeMix-FIRE 2020 track focused on sentiment analysis of code-mixed text for Tamil and Malayalam. Participants were given a dataset of YouTube comments, and the goal of the shared task submissions was to recognise the sentiment of each comment by classifying it as positive, negative, neutral, or mixed-feeling, or by recognising that the comment is not in the intended language.

Abstract: Sentiment analysis of Dravidian languages has received attention in recent years. However, most social media text is code-mixed and there is no research available on sentiment analysis of code-mixed Dravidian languages. The Dravidian-CodeMix-FIRE 2020, a track on Sentiment Analysis for Dravidian Languages in Code-Mixed Text, focused on creating a platform for researchers to come together and investigate the problem. There were two languages for this track: (i) Tamil, and (ii) Malayalam. The participants were given a dataset of YouTube comments and the goal of the shared task submissions was to recognise the sentiment of each comment by classifying them into positive, negative, neutral, mixed-feeling classes or by recognising whether the comment is not in the intended language. The performance of the systems was evaluated by weighted-F1 score.

87 citations

Proceedings Article•DOI•

Findings of the Shared Task on Emotion Analysis in Tamil

01 Jan 2022

TL;DR: The dataset used in the shared task, the task description, the methodology used by the participants, and the evaluation results of the submissions are presented.

45 citations
