Sentiment mining: An approach for Bengali and Tamil tweets

doi:10.1109/IC3.2016.7880246

Proceedings Article•DOI•

Sentiment mining: An approach for Bengali and Tamil tweets

Sudha Shanker Prasad¹, Jitendra Kumar¹, Dinesh Kumar Prabhakar¹, Sachin Tripathi¹•Institutions (1)

Indian Institutes of Technology¹

01 Aug 2016-pp 1-4

TL;DR: The aim is to classify a given Bengali or Tamil tweet into one of three sentiment classes (positive, negative, or neutral) using unigram and bigram models along with different supervised machine learning techniques.

Abstract: This paper presents a system for extracting sentiment from tweets in Indian languages, specifically Bengali and Tamil. Our aim is to classify a given Bengali or Tamil tweet into one of three sentiment classes: positive, negative, or neutral. Twitter has recently gained much attention from NLP researchers because it is one of the most widely used platforms that allow users to share their opinions in the form of tweets. The proposed methodology uses unigram and bigram models along with different supervised machine learning techniques. We also consider features generated from lexical resources such as WordNets and an emoticon tagger.
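To make the unigram and bigram features mentioned in the abstract concrete, here is a minimal sketch of word n-gram counting. The whitespace tokenizer, the function name `ngram_features`, and the romanized sample tokens are assumptions for illustration; the listing does not describe the paper's actual preprocessing.

```python
from collections import Counter


def ngram_features(text, n_values=(1, 2)):
    """Count word n-grams (unigrams and bigrams by default).

    Assumption: tokens are separated by whitespace. Real tweets would
    likely need handling for hashtags, mentions, URLs, and emoticons.
    """
    tokens = text.lower().split()
    features = Counter()
    for n in n_values:
        # Slide a window of n tokens across the tweet.
        for i in range(len(tokens) - n + 1):
            features[" ".join(tokens[i:i + n])] += 1
    return features


# Hypothetical romanized tokens, used only to show the output shape.
feats = ngram_features("khub bhalo cinema khub bhalo")
for gram, count in sorted(feats.items()):
    print(gram, count)
```

The resulting counts could then be passed as a feature vector to any supervised classifier, which is the general setup the abstract describes.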

Citations

Proceedings Article•DOI•

Corpus creation for sentiment analysis in code-mixed Tamil-English text

Bharathi Raja Chakravarthi¹, Vigneshwaran Muralidaran², Ruba Priyadharshini³, John P. McCrae⁴•Institutions (4)

National University of Ireland, Galway¹, Cardiff University², ULTra³, National University of Ireland⁴

11 May 2020

TL;DR: A gold standard Tamil-English code-switched, sentiment-annotated corpus containing 15,744 comment posts from YouTube is created and inter-annotator agreement is presented, and the results of sentiment analysis trained on this corpus are shown.

Abstract: Understanding the sentiment of a comment from a video or an image is an essential task in many applications. Sentiment analysis of a text can be useful for various decision-making processes. One such application is to analyse the popular sentiments of videos on social media based on viewer comments. However, comments from social media do not follow strict rules of grammar, and they contain mixing of more than one language, often written in non-native scripts. Non-availability of annotated code-mixed data for a low-resourced language like Tamil also adds difficulty to this problem. To overcome this, we created a gold standard Tamil-English code-switched, sentiment-annotated corpus containing 15,744 comment posts from YouTube. In this paper, we describe the process of creating the corpus and assigning polarities. We present inter-annotator agreement and show the results of sentiment analysis trained on this corpus as a benchmark.

168 citations

Cites background from "Sentiment mining: An approach for B..."

...Several research activities on sentiment analysis in Tamil (Padmamala and Prema, 2017) and other Indian languages (Ranjan et al., 2016; Das and Bandyopadhyay, 2010; A.R. et al., 2012; Phani et al., 2016; Prasad et al., 2016; Priyadharshini et al., 2020; Chakravarthi et al., 2020) are happening because the sheer number of native speakers are a potential market for commercial NLP applications....

Posted Content•

Corpus Creation for Sentiment Analysis in Code-Mixed Tamil-English Text

Bharathi Raja Chakravarthi¹, Vigneshwaran Muralidaran², Ruba Priyadharshini³, John P. McCrae⁴•Institutions (4)

National University of Ireland, Galway¹, Cardiff University², ULTra³, National University of Ireland⁴

30 May 2020-arXiv: Computation and Language

TL;DR: In this article, the authors created a gold standard Tamil-English code-switched, sentiment-annotated corpus containing 15,744 comment posts from YouTube, presented inter-annotator agreement, and showed the results of sentiment analysis trained on this corpus as a benchmark.

29 citations

Book Chapter•DOI•

BEmoD: Development of Bengali Emotion Dataset for Classifying Expressions of Emotion in Texts

Avishek Das¹, Md. Asif Iqbal¹, Omar Sharif¹, Mohammed Moshiul Hoque¹•Institutions (1)

Chittagong University of Engineering & Technology¹

17 Dec 2020

TL;DR: In this article, the authors presented an emotional dataset (hereafter called "BEmoD") for analysis of emotion in Bengali texts and described its development process, including data crawling, pre-processing, labeling, and verification.

Abstract: Recently, emotion detection in language has attracted increasing attention from NLP researchers due to the massive availability of people’s expressions, opinions, and emotions in comments on Web 2.0 platforms. Developing an automatic sentiment analysis system for Bengali is very challenging due to the scarcity of resources and the unavailability of standard corpora. Therefore, the development of a standard dataset is a prerequisite for analyzing emotional expressions in Bengali texts. This paper presents an emotional dataset (hereafter called ‘BEmoD’) for analysis of emotion in Bengali texts and describes its development process, including data crawling, pre-processing, labeling, and verification. BEmoD contains 5200 texts, which are labeled with six basic emotion categories: anger, fear, surprise, sadness, joy, and disgust. Dataset evaluation with a Cohen’s κ score of 0.920 shows the agreement among annotators. The evaluation analysis also shows that the distribution of emotion words follows Zipf’s law.
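For readers unfamiliar with the agreement score quoted above, the following is a minimal sketch of Cohen's κ for two annotators, computed as (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function name and the toy labels are assumptions; the abstract does not say how the authors aggregated annotators when computing their score.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label proportions.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Toy labels (hypothetical, not taken from BEmoD).
annotator_1 = ["joy", "joy", "anger", "fear"]
annotator_2 = ["joy", "anger", "anger", "fear"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

A value near 1 indicates agreement well beyond chance, while a value near 0 indicates agreement no better than chance.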

17 citations

Journal Article•DOI•

BEmoC: A Corpus for Identifying Emotion in Bengali Texts

Md. Asif Iqbal, Avishek Das, Omar Sharif, Mohammed Moshiul Hoque, Iqbal H. Sarker

17 Jan 2022-SN computer science

TL;DR: In this article, the authors describe the development of an emotional corpus (hereafter called "BEmoC") for classifying six emotions in Bengali texts, i.e., anger, fear, surprise, sadness, joy, and disgust.

Abstract: Emotion classification in text is of growing interest among NLP experts due to the enormous availability of people's emotions and its emergence on various Web 2.0 applications and services. Emotion classification in Bengali texts is also gradually being considered an important task for sports, e-commerce, entertainment, and security applications. However, it is a very critical task to develop an automatic emotion classification system for low-resource languages such as Bengali. Scarcity of resources and deficiency of benchmark corpora make the task more complicated. Thus, the development of a benchmark corpus is a prerequisite for developing an emotion classifier for Bengali texts. This paper describes the development of an emotional corpus (hereafter called 'BEmoC') for classifying six emotions in Bengali texts. The corpus development process consists of four key steps: data crawling, pre-processing, labelling, and verification. A total of 7000 texts are labelled with six basic emotion categories: anger, fear, surprise, sadness, joy, and disgust. Dataset evaluation with a Cohen's κ score of 0.969 indicates close agreement between the corpus annotators and the expert. The evaluation analysis also shows that the distribution of emotion words obeys Zipf's law. Moreover, the results of the BEmoC analysis are shown in terms of coding reliability, emotion density, and most frequent emotion words.
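One rough way to check a Zipf's law claim like the one above is to fit a line to log frequency against log rank; a slope near -1 is consistent with Zipf's law. The sketch below uses an ordinary least-squares fit. The function name and the synthetic tokens are assumptions, and this is not necessarily how the authors performed their analysis.

```python
import math
from collections import Counter


def zipf_slope(tokens):
    """Least-squares slope of log(frequency) versus log(rank)."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct tokens")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# Synthetic tokens whose frequencies are exactly 120 / rank.
tokens = ["w1"] * 120 + ["w2"] * 60 + ["w3"] * 40 + ["w4"] * 30 + ["w5"] * 24
slope = zipf_slope(tokens)
print(round(slope, 3))
```

On a real corpus the fit is only approximate, so the slope should be read alongside a rank-frequency plot rather than on its own.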

11 citations

Book Chapter•DOI•

Indian Language Identification for Short Text

Sreebha Bhaskaran¹, Geetika Paul¹, Deepa Gupta¹, J. Amudha¹•Institutions (1)

Amrita Vishwa Vidyapeetham¹

01 Jan 2021-Advances in intelligent systems and computing

TL;DR: This work classifies each line of text to a particular language, focusing on short phrases of length 2–6 words for 15 Indian languages, in order to detect that a given document is multilingual and identify the appropriate Indian languages.

Abstract: Language identification is used to categorize the language of a given document. Language identification categorizes the contents and can yield better search results for a multilingual document. In this work, we classify each line of text to a particular language and focus on short phrases of length 2–6 words for 15 Indian languages. It detects that a given document is multilingual and identifies the appropriate Indian languages. The approach used is the combination of an n-gram technique and a list of short distinctive words. The n-gram model applied is language independent, whereas the short word method uses less computation. The results show the effectiveness of our approach over the synthetic data.
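The combination described in the abstract, character n-grams plus a list of short distinctive words, can be illustrated with a small sketch. Everything below is an assumption for illustration: the scoring rule, the `word_bonus` weight, the function names, and the toy romanized samples are hypothetical and are not taken from the paper.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Count character n-grams, padding with spaces to mark word edges."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def identify_language(line, profiles, short_words, word_bonus=2.0):
    """Pick the language whose profile best matches the line.

    Score = n-gram overlap with the language profile, plus a bonus for
    every token found in that language's short distinctive word list.
    """
    grams = char_ngrams(line)
    tokens = set(line.lower().split())
    scores = {}
    for lang, profile in profiles.items():
        overlap = sum(min(count, profile[g]) for g, count in grams.items())
        word_hits = len(tokens & short_words.get(lang, set()))
        scores[lang] = overlap + word_bonus * word_hits
    return max(scores, key=scores.get)


# Toy romanized training samples (hypothetical, far too small for real use).
profiles = {
    "hi": char_ngrams("yeh ek accha din hai"),
    "ta": char_ngrams("idhu oru nalla naal"),
}
short_words = {"hi": {"hai", "ek"}, "ta": {"oru", "idhu"}}

print(identify_language("idhu nalla", profiles, short_words))
print(identify_language("ek din", profiles, short_words))
```

Applying `identify_language` to each line of a document and collecting the distinct results would be one way to flag a document as multilingual.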

5 citations

References

Thumbs up? Sentiment Classification using Machine Learning Techniques

Bo Pang, Lillian Lee, Shivakumar Vaithyanathan

01 Jan 2002

TL;DR: In this paper, the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative, was considered and three machine learning methods (Naive Bayes, maximum entropy classification, and support vector machines) were employed.

Abstract: We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three machine learning methods we employed (Naive Bayes, maximum entropy classification, and support vector machines) do not perform as well on sentiment classification as on traditional topic-based categorization. We conclude by examining factors that make the sentiment classification problem more challenging.

6,980 citations

Proceedings Article•DOI•

Thumbs up? Sentiment Classification using Machine Learning Techniques

Bo Pang¹, Lillian Lee¹, Shivakumar Vaithyanathan²•Institutions (2)

Cornell University¹, IBM²

06 Jul 2002

TL;DR: This work considers the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative, and concludes by examining factors that make the sentiment classification problem more challenging.

Abstract: We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three machine learning methods we employed (Naive Bayes, maximum entropy classification, and support vector machines) do not perform as well on sentiment classification as on traditional topic-based categorization. We conclude by examining factors that make the sentiment classification problem more challenging.
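Of the three methods named in the abstract, Naive Bayes is the simplest to show in a few lines. The sketch below is a generic multinomial Naive Bayes classifier with add-one smoothing over whitespace tokens. The class name and the toy reviews are assumptions, and the sketch does not reproduce the features or data used in the paper.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in doc.lower().split():
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy reviews (hypothetical, not from the paper's movie review data).
docs = [
    "a great moving film",
    "great acting great plot",
    "a dull boring film",
    "boring plot dull acting",
]
labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayes().fit(docs, labels)
print(model.predict("great film"), model.predict("dull plot"))
```

Working in log space avoids numeric underflow when a document contains many words.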

6,626 citations

Posted Content•

Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews

Peter D. Turney¹•Institutions (1)

National Research Council¹

11 Dec 2002-arXiv: Learning

TL;DR: A simple unsupervised learning algorithm for classifying reviews as recommended (thumbs up) or not recommended (thumbs down); a review is classified as recommended if the average semantic orientation of its phrases is positive.

Abstract: This paper presents a simple unsupervised learning algorithm for classifying reviews as recommended (thumbs up) or not recommended (thumbs down). The classification of a review is predicted by the average semantic orientation of the phrases in the review that contain adjectives or adverbs. A phrase has a positive semantic orientation when it has good associations (e.g., "subtle nuances") and a negative semantic orientation when it has bad associations (e.g., "very cavalier"). In this paper, the semantic orientation of a phrase is calculated as the mutual information between the given phrase and the word "excellent" minus the mutual information between the given phrase and the word "poor". A review is classified as recommended if the average semantic orientation of its phrases is positive. The algorithm achieves an average accuracy of 74% when evaluated on 410 reviews from Epinions, sampled from four different domains (reviews of automobiles, banks, movies, and travel destinations). The accuracy ranges from 84% for automobile reviews to 66% for movie reviews.
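The semantic orientation defined in the abstract can be written as a short calculation: the pointwise mutual information (PMI) of the phrase with "excellent" minus its PMI with "poor", followed by averaging over the phrases of a review. The sketch below assumes co-occurrence counts are already available and strictly positive; the function names and the numbers are hypothetical, and how the counts are obtained is outside this sketch.

```python
import math


def pmi(joint, count_x, count_y, total):
    """Pointwise mutual information (base 2) from raw counts."""
    return math.log2((joint / total) / ((count_x / total) * (count_y / total)))


def semantic_orientation(phrase_count, near_excellent, near_poor,
                         excellent_count, poor_count, total):
    """SO(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor")."""
    return (pmi(near_excellent, phrase_count, excellent_count, total)
            - pmi(near_poor, phrase_count, poor_count, total))


def classify_review(phrase_orientations):
    """Recommend the item if the average orientation is positive."""
    average = sum(phrase_orientations) / len(phrase_orientations)
    return "recommended" if average > 0 else "not recommended"


# Hypothetical counts: the phrase co-occurs with "excellent" four times
# as often as with "poor", and both reference words are equally frequent.
so = semantic_orientation(
    phrase_count=100, near_excellent=8, near_poor=2,
    excellent_count=1000, poor_count=1000, total=1_000_000,
)
print(so)
print(classify_review([so, -0.5, 1.0]))
```

Because the phrase and corpus totals cancel in the difference, the orientation depends only on the ratio of co-occurrence counts and the ratio of the two reference word counts.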

4,526 citations

"Sentiment mining: An approach for B..." refers methods in this paper

...Early work in this area includes work done by Turney [2] and Pang [3] for detecting the polarity of product reviews....

Proceedings Article•

Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews

Peter D. Turney

01 Jan 2002

TL;DR: This article proposed an unsupervised learning algorithm for classifying reviews as recommended (thumbs up) or not recommended (thumbs down) based on the average semantic orientation of phrases in the review that contain adjectives or adverbs.

Abstract: This paper presents a simple unsupervised learning algorithm for classifying reviews as recommended (thumbs up) or not recommended (thumbs down). The classification of a review is predicted by the average semantic orientation of the phrases in the review that contain adjectives or adverbs. A phrase has a positive semantic orientation when it has good associations (e.g., “subtle nuances”) and a negative semantic orientation when it has bad associations (e.g., “very cavalier”). In this paper, the semantic orientation of a phrase is calculated as the mutual information between the given phrase and the word “excellent” minus the mutual information between the given phrase and the word “poor”. A review is classified as recommended if the average semantic orientation of its phrases is positive. The algorithm achieves an average accuracy of 74% when evaluated on 410 reviews from Epinions, sampled from four different domains (reviews of automobiles, banks, movies, and travel destinations). The accuracy ranges from 84% for automobile reviews to 66% for movie reviews.

3,814 citations

Proceedings Article•DOI•

Seeing Stars: Exploiting Class Relationships for Sentiment Categorization with Respect to Rating Scales

Bo Pang¹, Lillian Lee¹•Institutions (1)

Carnegie Mellon University¹

25 Jun 2005

TL;DR: A meta-algorithm is applied, based on a metric labeling formulation of the rating-inference problem, that alters a given n-ary classifier's output in an explicit attempt to ensure that similar items receive similar labels.

Abstract: We address the rating-inference problem, wherein rather than simply decide whether a review is "thumbs up" or "thumbs down", as in previous sentiment analysis work, one must determine an author's evaluation with respect to a multi-point scale (e.g., one to five "stars"). This task represents an interesting twist on standard multi-class text categorization because there are several different degrees of similarity between class labels; for example, "three stars" is intuitively closer to "four stars" than to "one star". We first evaluate human performance at the task. Then, we apply a meta-algorithm, based on a metric labeling formulation of the problem, that alters a given n-ary classifier's output in an explicit attempt to ensure that similar items receive similar labels. We show that the meta-algorithm can provide significant improvements over both multi-class and regression versions of SVMs when we employ a novel similarity measure appropriate to the problem.

2,544 citations

"Sentiment mining: An approach for B..." refers methods in this paper

...Early work in this area includes work done by Turney [2] and Pang [3] for detecting the polarity of product reviews....
...A multiway document classification on polarity basis is attempted by Pang [4] and Synder [5]....