Proceedings Article

Sentiment Analysis in Tamil Texts: A Study on Machine Learning Techniques and Feature Representation

TLDR
Basic features such as word count and punctuation count are used in addition to traditional features, including Bag of Words and Term Frequency-Inverse Document Frequency, to check their influence on the prediction.
Abstract
Sentiment Analysis (SA) is an application of Natural Language Processing (NLP) that extracts the sentiments expressed in text. In this paper, we experimented with five approaches to SA: a lexicon-based approach, a supervised machine learning-based approach, a hybrid approach, K-means with Bag of Words (BoW), and K-modes with BoW. We evaluated these approaches on five corpora with different feature representation techniques to identify the best approach for SA in Tamil texts. In addition to traditional features such as BoW and Term Frequency-Inverse Document Frequency (TF-IDF), we used basic features such as word count and punctuation count to check their influence on the prediction. We compared the approaches, the features, and the corpora. In the evaluation, the highest accuracy, 79%, was obtained on the UJ_Corpus_Opinions_Nouns corpus with fastText using the supervised machine learning-based approach.
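
The feature representations named in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the whitespace tokenizer, the unsmoothed IDF formula log(N/df), the helper names, and the choice to concatenate all features into one vector are assumptions made here for illustration, and the toy sentences are hypothetical stand-ins for corpus entries. fastText embeddings, which gave the best reported result, are not shown.

```python
import math
import string
from collections import Counter


def tokenize(text):
    """Split on whitespace and strip ASCII punctuation from token edges (assumed tokenizer)."""
    tokens = (tok.strip(string.punctuation) for tok in text.split())
    return [tok for tok in tokens if tok]


def basic_features(text):
    """Basic features: [word count, punctuation count]."""
    word_count = len(tokenize(text))
    punct_count = sum(1 for ch in text if ch in string.punctuation)
    return [word_count, punct_count]


def build_vocabulary(docs):
    """Sorted list of every distinct token in the corpus."""
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def bow_vector(text, vocab):
    """Bag of Words: raw count of each vocabulary term in the text."""
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocab]


def idf_weights(docs, vocab):
    """Unsmoothed IDF, log(N / df). Assumes vocab was built from the same docs, so df >= 1."""
    n_docs = len(docs)
    doc_sets = [set(tokenize(doc)) for doc in docs]
    return [
        math.log(n_docs / sum(1 for terms in doc_sets if term in terms))
        for term in vocab
    ]


def tfidf_vector(text, vocab, idf):
    """TF-IDF: relative term frequency multiplied by the IDF weight."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty text
    return [counts[term] / total * weight for term, weight in zip(vocab, idf)]


def feature_vector(text, vocab, idf):
    """Concatenate basic, BoW, and TF-IDF features (an illustrative design choice)."""
    return basic_features(text) + bow_vector(text, vocab) + tfidf_vector(text, vocab, idf)


# Hypothetical toy corpus; a real experiment would use the Tamil corpora.
docs = ["good movie, good songs!", "bad movie."]
vocab = build_vocabulary(docs)
idf = idf_weights(docs, vocab)
features = [feature_vector(doc, vocab, idf) for doc in docs]
```

Each resulting vector could then be passed to a supervised classifier or to a clustering method such as K-means. Because `string.punctuation` covers only ASCII symbols, punctuation specific to other scripts would need to be added to the set.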

