Proceedings Article

Context-based Sarcasm Detection in Hindi Tweets

TL;DR: This article proposes a context-based pattern, “sarcasm as a contradiction between a tweet and the context of its related news”, for sarcasm detection in Hindi tweets; the approach attained an accuracy of 87%.
Abstract: Sentiment analysis is the task of finding a person's opinion towards a specific target. Sarcasm is a special type of sentiment that implies the opposite of what the text literally conveys. It is often expressed using positive or intensified positive words. Posting sarcastic messages on social media such as Twitter, Facebook, and WhatsApp has become a trend for avoiding direct negativity. In the presence of sarcasm, sentiment analysis of these social media texts becomes a highly challenging task. Therefore, an automated system is required for sarcasm detection in textual data. Researchers have proposed several techniques to identify sarcastic text. These techniques are designed to detect sarcasm in text written in English, since it is the most popular language in social networking groups. However, sarcasm detection in Asian languages such as Hindi, Telugu, Tamil, Urdu, and Bengali has not yet been explored. One reason these languages are less explored for sarcastic sentiment analysis is the lack of annotated corpora, even though they are popular in a large networked society. In this article, we propose a context-based pattern, “sarcasm as a contradiction between a tweet and the context of its related news”, for sarcasm detection in Hindi tweets. The proposed approach uses Hindi news within the same timestamp as the context of a tweet and attained an accuracy of 87%.
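The contradiction pattern described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy lexicon (English stand-ins for Hindi words), the keyword-overlap test for "related" news, the one-day window, and every function name below are assumptions made only for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical toy polarity lexicon (English stand-ins for Hindi words).
POSITIVE = {"great", "wonderful", "love", "thanks"}
NEGATIVE = {"crash", "delay", "failure", "loss"}

def polarity(text):
    """Return +1, -1 or 0 from a simple positive/negative word count."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return (score > 0) - (score < 0)

def is_sarcastic(tweet, tweet_time, news_items, window=timedelta(days=1)):
    """Flag a tweet whose polarity contradicts a related, time-aligned headline."""
    tweet_words = set(tweet.lower().split())
    tweet_pol = polarity(tweet)
    for headline, news_time in news_items:
        headline_words = set(headline.lower().split())
        # Assumption: "related" means sharing at least one non-sentiment word.
        topic_words = (tweet_words & headline_words) - POSITIVE - NEGATIVE
        same_period = abs(tweet_time - news_time) <= window
        if same_period and topic_words and tweet_pol * polarity(headline) < 0:
            return True
    return False

news = [("train delay causes loss for commuters", datetime(2020, 1, 1, 9))]
print(is_sarcastic("love the train service today", datetime(2020, 1, 1, 12), news))  # True
print(is_sarcastic("train delay again", datetime(2020, 1, 1, 12), news))             # False
```

A positive tweet about a topic whose same-day news is negative is flagged; a tweet that agrees with the news, or falls outside the time window, is not.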
Citations
Journal Article
TL;DR: Five variants of synthetic minority oversampling are proposed to mitigate class imbalance, which can severely affect classifier performance in social media sarcasm detection.
Abstract: Recent developments in sarcasm detection have emerged as extremely successful tools in social media opinion mining. With the advent of machine learning tools, accurate detection has become possible. However, the social media data used to train machine learning models is often ill-suited because of highly imbalanced classes. In the absence of any thorough study of the effect of imbalanced classes on sarcasm detection for social media opinion mining, this article proposes synthetic minority oversampling based methods to mitigate class imbalance, which can severely affect classifier performance. Five variants of the synthetic minority oversampling technique are applied to two datasets of different sizes. Their trustworthiness is judged by training and testing six well-known classifiers and measuring their performance with confusion-matrix-based metrics from the test phase. The experimental results indicate that SMOTE and BorderlineSMOTE-1 are extremely successful in improving classifier performance. A thorough analysis is performed to better understand the effect of imbalanced classes in social media sarcasm detection.

21 citations
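As a rough illustration of the oversampling idea named in this abstract, the sketch below implements the core interpolation step of plain SMOTE in pure Python. It is not the paper's code and does not cover the Borderline variants; the function name `smote`, the default `k`, and the sample points are hypothetical.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        # k nearest neighbours of x by squared Euclidean distance
        neighbours = sorted(
            others, key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p))
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic

# Hypothetical minority-class feature vectors
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority, n_new=5)
print(len(new_points))  # 5
```

Each synthetic point lies on a line segment between two existing minority samples, so the minority class grows without exact duplication.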

Journal Article
TL;DR: An overview of sentiment analysis, sarcasm and related work on sarcasm detection is provided, along with training for health-care professionals to make decisions on patients' sentiments.
Abstract: Purpose: Sentiment analysis has seen nascent interest over the past decade in the field of social media analytics. With major advances in the volume, rationality and veracity of social networking data, the misunderstanding, uncertainty and inaccuracy within the data have multiplied. In textual data, locating sarcasm is a challenging task. Sarcasm is a different way of expressing sentiment, in which people write or say something different from what they actually intend. Researchers are therefore showing interest in developing techniques for detecting sarcasm in text to boost the performance of sentiment analysis. This paper aims to give an overview of sentiment analysis, sarcasm and related work on sarcasm detection. Further, this paper provides training to health-care professionals to make decisions on patients' sentiments. Design/methodology/approach: This paper compares the performance of five classifiers (support vector machine, naïve Bayes, decision tree, AdaBoost and K-nearest neighbour) on the Twitter data set. Findings: Naïve Bayes performed best, with the highest accuracy of 61.18%, and decision tree performed worst, with an accuracy of 54.27%. The accuracies of AdaBoost, K-nearest neighbour and support vector machine were 56.13%, 54.81% and 59.55%, respectively. Originality/value: This research work is original.

16 citations

Proceedings Article
06 Mar 2020
TL;DR: This paper details the general architecture of sarcasm detection, existing methods, different types of sarcasm, issues, challenges and future scope.
Abstract: Sarcasm is a way of expressing feelings in which people say or write something completely different from what they actually intend. Because of its obscure nature, sarcasm is hard to detect. Sarcasm is a type of irony, and criticism is one of its main purposes. People often use sarcasm to express their opinions or feelings, especially on social networking sites such as Twitter and Facebook. Accurate analysis and understanding of sarcastic sentences can improve the accuracy of sentiment analysis. Sentiment analysis means understanding the attitudes or opinions of individuals or society about a particular event or topic. In this paper, we detail the general architecture of sarcasm detection, existing methods, different types of sarcasm, issues, challenges and future scope.

6 citations


Cites methods from "Context-based Sarcasm Detection in ..."

  • ...Santhosh Kumar Bharati et al [27] proposed a context-based method to identify sarcasm from Hindi tweets....


Proceedings Article
10 Aug 2022
TL;DR: The research validates that automated feature engineering facilitates an efficient and repeatable predictive model for detecting sarcasm in indigenous, low-resource languages.
Abstract: Automated sarcasm detection is regarded as a complex natural language processing task, and extending it to Hindi, a morphologically rich, free-word-order indigenous Indian language, is a further challenge. The scarcity of resources and tools such as annotated corpora, lexicons, dependency parsers, part-of-speech taggers, and benchmark datasets compounds the linguistic challenges of sarcasm detection in low-resource languages like Hindi. Furthermore, as context incongruity is imperative for detecting sarcasm, various linguistic, aural and visual cues can be used to predict whether a target utterance is sarcastic. While pre-trained word embeddings capture meanings, semantic relationships and different types of context in the form of word representations, emojis can also provide useful contextual information, analogous to human facial expressions, for gauging sarcasm. Thus, the goal of this research is to demonstrate the use of a hybrid deep learning model trained using two embeddings, namely word and emoji embeddings, to detect sarcasm. The model is validated on a Hindi tweets dataset, Sarc-H, manually annotated with sarcastic and non-sarcastic labels. The preliminary results clearly show the importance of using emojis for sarcasm detection, with our model attaining an accuracy of 97.35% and an F-score of 0.9708. The research validates that automated feature engineering facilitates an efficient and repeatable predictive model for detecting sarcasm in indigenous, low-resource languages.

5 citations

Journal Article
16 Nov 2020
TL;DR: Trends in sentiment analysis, especially sarcasm detection, over the last ten years and their future direction are explained; a critical aspect of future research is the use of datasets in various languages that address the unstructured-data problem with contextual information, which should help detect sarcastic sentences and improve existing performance.
Abstract: Sarcasm recognition and detection are now simplified by knowledge from various domains, including computer science, social science, psychology, and mathematics. This article aims to explain trends in sentiment analysis, especially sarcasm detection, over the last ten years and their direction in the future. We review journal articles with the keyword “sarcasm” in the title published from 2008 to 2018. The articles were classified by the most frequently discussed topics, including the dataset, pre-processing, annotations, approaches, features, context, and methods used. The significant increase in the number of articles on “sarcasm” in recent years indicates that research in this area still has enormous opportunities. Research on “sarcasm” is also of interest because only a few researchers offer solutions for unstructured language. Some hybrid approaches using classification and feature extraction identify sarcastic sentences with deep learning models. This article further explains the most widely used algorithms for sarcasm detection on social media. Finally, the article shows that a critical aspect of future research on sarcastic sentences is the use of datasets in various languages that address the unstructured-data problem with contextual information, which should help detect sarcastic sentences effectively and improve existing performance.

2 citations


Additional excerpts

  • ...Sentiment [25], [15], [32], [33], [34], [33], [35] Pragmatic [33], [26], [36], [16]...


  • ...[34] The context-based pattern in Hindi Tweets Sufficient dataset and language for training and testing...


References
01 Jan 2002
TL;DR: In this paper, the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative, was considered and three machine learning methods (Naive Bayes, maximum entropy classification, and support vector machines) were employed.
Abstract: We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three machine learning methods we employed (Naive Bayes, maximum entropy classification, and support vector machines) do not perform as well on sentiment classification as on traditional topic-based categorization. We conclude by examining factors that make the sentiment classification problem more challenging.

6,980 citations

Proceedings Article
06 Jul 2002
TL;DR: This work considers the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative, and concludes by examining factors that make the sentiment classification problem more challenging.
Abstract: We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three machine learning methods we employed (Naive Bayes, maximum entropy classification, and support vector machines) do not perform as well on sentiment classification as on traditional topic-based categorization. We conclude by examining factors that make the sentiment classification problem more challenging.

6,626 citations

Book
01 May 2012
TL;DR: Sentiment analysis and opinion mining is the field of study that analyzes people's opinions, sentiments, evaluations, attitudes, and emotions from written language as discussed by the authors and is one of the most active research areas in natural language processing and is also widely studied in data mining, Web mining, and text mining.
Abstract: Sentiment analysis and opinion mining is the field of study that analyzes people's opinions, sentiments, evaluations, attitudes, and emotions from written language. It is one of the most active research areas in natural language processing and is also widely studied in data mining, Web mining, and text mining. In fact, this research has spread outside of computer science to the management sciences and social sciences due to its importance to business and society as a whole. The growing importance of sentiment analysis coincides with the growth of social media such as reviews, forum discussions, blogs, micro-blogs, Twitter, and social networks. For the first time in human history, we now have a huge volume of opinionated data recorded in digital form for analysis. Sentiment analysis systems are being applied in almost every business and social domain because opinions are central to almost all human activities and are key influencers of our behaviors. Our beliefs and perceptions of reality, and the choices we make, are largely conditioned on how others see and evaluate the world. For this reason, when we need to make a decision we often seek out the opinions of others. This is true not only for individuals but also for organizations. This book is a comprehensive introductory and survey text. It covers all important topics and the latest developments in the field with over 400 references. It is suitable for students, researchers and practitioners who are interested in social media analysis in general and sentiment analysis in particular. Lecturers can readily use it in class for courses on natural language processing, social media analysis, text mining, and data mining. Lecture slides are also available online.

4,515 citations

Proceedings Article
01 Jan 2006
TL;DR: SENTIWORDNET is a lexical resource in which each WORDNET synset is associated to three numerical scores Obj, Pos and Neg, describing how objective, positive, and negative the terms contained in the synset are.
Abstract: Opinion mining (OM) is a recent subdiscipline at the crossroads of information retrieval and computational linguistics which is concerned not with the topic a document is about, but with the opinion it expresses. OM has a rich set of applications, ranging from tracking users’ opinions about products or about political candidates as expressed in online forums, to customer relationship management. In order to aid the extraction of opinions from text, recent research has tried to automatically determine the “PN-polarity” of subjective terms, i.e. identify whether a term that is a marker of opinionated content has a positive or a negative connotation. Research on determining whether a term is indeed a marker of opinionated content (a subjective term) or not (an objective term) has been instead much scarcer. In this work we describe SENTIWORDNET, a lexical resource in which each WORDNET synset s is associated to three numerical scores Obj(s), Pos(s) and Neg(s), describing how objective, positive, and negative the terms contained in the synset are. The method used to develop SENTIWORDNET is based on the quantitative analysis of the glosses associated to synsets, and on the use of the resulting vectorial term representations for semi-supervised synset classification. The three scores are derived by combining the results produced by a committee of eight ternary classifiers, all characterized by similar accuracy levels but different classification behaviour. SENTIWORDNET is freely available for research purposes, and is endowed with a Web-based graphical user interface.

2,625 citations


"Context-based Sarcasm Detection in ..." refers background in this paper


  • ...It is widely used for speaking in countries like India, Mauritius, Fiji, Suriname, Guyana, Trinidad & Tobago and Nepal [15]....


Journal Article
TL;DR: The statistics kappa (Cohen, 1960) and weighted kappa (Cohen, 1968) were introduced to provide coefficients of agreement between two raters for nominal scales; weighted kappa applies when the relative seriousness of the different possible disagreements can be specified.
Abstract: The statistics kappa (Cohen, 1960) and weighted kappa (Cohen, 1968) were introduced to provide coefficients of agreement between two raters for nominal scales. Kappa is appropriate when all disagreements may be considered equally serious, and weighted kappa is appropriate when the relative seriousness of the different possible disagreements can be specified. The papers describing these two statistics also present expressions for their standard errors. These expressions are incorrect, having been derived from the contradictory assumptions of fixed marginal totals and binomial variation of cell frequencies. Everitt (1968) derived the exact variances of weighted and unweighted kappa when the parameters are zero by assuming a generalized hypergeometric distribution. He found these expressions to be far too complicated for routine use, and offered, as alternatives, expressions derived by assuming binomial distributions. These alternative expressions are incorrect, essentially for the same reason as above. Assume that N subjects are distributed into k² cells by each of them being assigned to one of k categories by one rater and, independently, to one of the same k categories by a second

1,443 citations
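For readers unfamiliar with the statistic, the sketch below computes unweighted kappa (Cohen, 1960) for two raters as (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance from each rater's label frequencies. The function name and the "s"/"n" labels (standing in for sarcastic and non-sarcastic annotations) are hypothetical, and the standard-error expressions discussed in the abstract are not covered.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of nominal labels."""
    n = len(rater_a)
    # Observed proportion of items on which the raters agree
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from the product of the raters' marginal frequencies
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical annotations from two raters on four tweets
a = ["s", "s", "n", "n"]
b = ["s", "n", "n", "n"]
print(cohen_kappa(a, b))  # 0.5
```

Here the raters agree on 3 of 4 items (p_o = 0.75) and chance agreement is 0.5, giving kappa = 0.5. The sketch does not guard against the degenerate case where both raters use a single identical label throughout, in which p_e = 1 and the ratio is undefined.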


"Context-based Sarcasm Detection in ..." refers methods in this paper

  • ...[22] is used as it is more suitable when the number of...
