Author

Anirudh Dahiya

Bio: Anirudh Dahiya is an academic researcher from the International Institute of Information Technology, Hyderabad. The author has contributed to research on the topics of Hindi and sentiment analysis, has an h-index of 2, and has co-authored 3 publications receiving 6 citations.

Papers
Posted Content
TL;DR: This paper introduces curriculum learning strategies for semantic tasks in code-mixed Hindi-English (Hi-En) texts and investigates various training strategies for enhancing model performance; the proposed method outperforms state-of-the-art methods for Hi-En code-mixed sentiment analysis.
Abstract: Sentiment analysis and other semantic tasks are commonly used in social media text analysis to gauge public opinion and make sense of the noise on social media. The language used on social media not only diverges from formal language but is further compounded by code-mixing between languages, especially in large multilingual societies like India. Traditional methods for learning semantic NLP tasks have long relied on end-to-end, task-specific training, which requires an expensive data-creation process, even more so for deep learning methods. The challenge is even more severe for resource-scarce texts such as code-mixed language pairs, which lack well-learnt representations to serve as model priors and whose task-specific datasets are often too few and too small to efficiently exploit recent deep learning approaches. To address these challenges, we introduce curriculum learning strategies for semantic tasks in code-mixed Hindi-English (Hi-En) texts and investigate various training strategies for enhancing model performance. Our method outperforms state-of-the-art methods for Hi-En code-mixed sentiment analysis by 3.31% accuracy, and also shows better model robustness in terms of convergence and variance in test performance.
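As a rough illustration of the curriculum learning idea described in the abstract (not the authors' actual implementation), the sketch below sorts training examples from easy to hard with a hypothetical length-based difficulty score and trains an incremental classifier on progressively larger easy-to-hard slices of the data; the toy texts, labels, and choice of classifier are all assumptions.

    # Minimal curriculum-learning sketch (illustrative only; not the paper's code).
    from sklearn.feature_extraction.text import HashingVectorizer
    from sklearn.linear_model import SGDClassifier

    # Toy code-mixed sentiment examples (invented for illustration).
    texts = ["movie acchi thi", "bohot hi bakwaas film thi yaar", "loved it", "worst acting ever dekha maine"]
    labels = [1, 0, 1, 0]

    def difficulty(text):
        # Hypothetical proxy: longer sentences are treated as harder.
        return len(text.split())

    vectorizer = HashingVectorizer(n_features=2**12)
    clf = SGDClassifier(loss="log_loss")

    # Sort the whole training set once, then expose a larger easy-to-hard prefix each stage.
    ordered = sorted(zip(texts, labels), key=lambda ex: difficulty(ex[0]))
    num_stages = 2
    for stage in range(1, num_stages + 1):
        subset = ordered[: int(len(ordered) * stage / num_stages)]
        X = vectorizer.transform([t for t, _ in subset])
        y = [label for _, label in subset]
        clf.partial_fit(X, y, classes=[0, 1])  # one incremental pass per curriculum stage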

4 citations

Book Chapter
10 Aug 2019
TL;DR: This work introduces curriculum learning strategies for semantic tasks in code-mixed Hindi-English (Hi-En) texts, and investigates various training strategies for enhancing model performance.
Abstract: Sentiment analysis and other semantic tasks are commonly used in social media text analysis to gauge public opinion and make sense of the noise on social media. The language used on social media not only diverges from formal language but is further compounded by code-mixing between languages, especially in large multilingual societies like India.

3 citations

Book Chapter
08 Sep 2020
TL;DR: This work explores various cross-lingual transfer techniques on the Hindi Discourse Relation Bank (HDRB), a Penn Discourse Treebank-style dataset for discourse analysis in Hindi, and observes performance gains in both zero-shot and fine-tuning settings on the Hindi discourse relation classification task.
Abstract: Discourse relations between two textual spans in a document attempt to capture the coherent structure that emerges in language use. Automatic classification of these relations remains a challenging task, especially in the case of implicit discourse relations, where there is no explicit textual cue marking the relation. In low-resource languages, this motivates the exploration of transfer learning approaches, particularly cross-lingual techniques, for discourse relation classification. In this work, we explore various cross-lingual transfer techniques on the Hindi Discourse Relation Bank (HDRB), a Penn Discourse Treebank-style dataset for discourse analysis in Hindi, and observe performance gains in both zero-shot and fine-tuning settings on the Hindi discourse relation classification task. To the best of our knowledge, this is the first effort towards exploring transfer learning for Hindi discourse relation classification.
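A minimal sketch of the cross-lingual transfer setting described above, assuming a multilingual encoder (XLM-RoBERTa) from the Hugging Face transformers library; the coarse label set, the Hindi argument pair, and the model choice are assumptions for illustration, not the paper's actual pipeline.

    # Cross-lingual transfer sketch (assumed setup; not the authors' exact code).
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    # Four coarse PDTB-style senses are assumed as the label set.
    labels = ["Comparison", "Contingency", "Expansion", "Temporal"]

    tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
    model = AutoModelForSequenceClassification.from_pretrained(
        "xlm-roberta-base", num_labels=len(labels)
    )

    # The two discourse arguments are encoded as a sentence pair.
    arg1 = "वह बहुत थका हुआ था"        # "He was very tired"
    arg2 = "फिर भी वह काम करता रहा"    # "Still, he kept working"
    inputs = tokenizer(arg1, arg2, return_tensors="pt", truncation=True)

    # Zero-shot transfer would apply a model fine-tuned on English discourse data
    # directly to Hindi input; the fine-tuning setting would instead continue
    # training this classification head on HDRB examples.
    with torch.no_grad():
        logits = model(**inputs).logits
    print(labels[logits.argmax(dim=-1).item()])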

Cited by
More filters
Proceedings Article
10 Dec 2020
TL;DR: The authors propose an ensemble approach that hybridizes Naive Bayes, SVM, Linear Regression, and SGD classifiers for sentiment classification of code-mixed Hindi-English text.
Abstract: India is a multilingual and multi-script country, and a large part of its population speaks more than one language. It has been noted that such multilingual speakers switch between languages while communicating informally. Code-mixed language is very common in informal communication and social media, and extracting sentiment from these code-mixed sentences is a challenging task. In this work, we address sentiment classification for one of the most common code-mixed language pairs in India, i.e., Hindi-English. Conventional sentiment analysis techniques designed for a single language do not provide satisfactory results for such texts. We propose two approaches for better sentiment classification: an ensembling-based approach that hybridizes Naive Bayes, SVM, Linear Regression, and SGD classifiers, and a novel bidirectional-LSTM-based approach. Both approaches provide quite satisfactory results for code-mixed Hindi-English text.
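For illustration, the sketch below hybridizes the classifiers named in the abstract with scikit-learn's hard-voting ensemble over TF-IDF features; "Linear Regression" is interpreted here as logistic regression for the classification setting, and the toy data and hyperparameters are invented, so this is only an assumed approximation of the paper's approach.

    # Voting-ensemble sketch (illustrative; not the authors' implementation).
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.svm import LinearSVC
    from sklearn.linear_model import LogisticRegression, SGDClassifier
    from sklearn.ensemble import VotingClassifier
    from sklearn.pipeline import make_pipeline

    # Toy code-mixed Hindi-English examples (invented for illustration).
    texts = ["movie bahut acchi thi", "total bakwaas film", "kya mast gaana hai", "boring story yaar"]
    labels = [1, 0, 1, 0]

    ensemble = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)),
        VotingClassifier(
            estimators=[
                ("nb", MultinomialNB()),
                ("svm", LinearSVC()),
                ("lr", LogisticRegression(max_iter=1000)),  # stands in for "Linear Regression"
                ("sgd", SGDClassifier()),
            ],
            voting="hard",  # majority vote over the four base classifiers
        ),
    )
    ensemble.fit(texts, labels)
    print(ensemble.predict(["film ekdum mast thi"]))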

9 citations

Proceedings Article
01 Dec 2019
TL;DR: A large-scale code-mixed corpus is generated to aid further research on code-mixed social media text, and machine learning models are trained that improve upon the previous state of the art using a much lighter and more explainable architecture.
Abstract: As an increasing number of people embrace social media, mining the data they generate has become an important task. Possible applications range from opinion mining and sentiment analysis to hate speech detection. More importantly, analyzing code-mixed multilingual text has gained popularity because it holds important socio-cultural clues that may be lost in translation. This paper explores methods to effectively analyse code-mixed Hindi/English (Hinglish) text. First, we generate a large-scale code-mixed corpus to aid further research on code-mixed social media text. High-quality word embeddings are then trained on this code-mixed text. Finally, we demonstrate the efficacy of our proposed method by training machine learning models that improve upon the previous state of the art using a much lighter and more explainable architecture. Our main intention behind training the classifier model was not only high performance but also good model explainability and speed.
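As a rough sketch of the corpus-to-embeddings step described above, assuming gensim's subword-aware FastText (the corpus lines and hyperparameters are placeholders, not the paper's settings):

    # Training word embeddings on a code-mixed corpus (illustrative sketch).
    from gensim.models import FastText

    # Placeholder corpus: each line of the (much larger) scraped code-mixed
    # corpus would be tokenized into a list of tokens.
    corpus = [
        "yeh movie bahut acchi thi".split(),
        "traffic was so bad aaj".split(),
        "kal ka match dekha kya".split(),
    ]

    # Subword-aware embeddings help with spelling variation in romanized Hindi.
    model = FastText(sentences=corpus, vector_size=100, window=5, min_count=1, epochs=10)
    print(model.wv.most_similar("movie", topn=3))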

1 citation

Journal Article
TL;DR: The authors propose a novel approach that calculates feature values using the Kullback-Leibler (KL) divergence for Neuro-Fuzzy sentiment analysis of low-resource languages like Hindi.
Abstract: This work proposes sentiment analysis for low-resource languages like Hindi using a Neuro-Fuzzy technique. Low-resource languages suffer from a scarcity of resources; consequently, we propose a method that can be implemented for any language. We use information theory to establish a relation between the terms that exist in a sentence. This work proposes a novel approach for calculating feature values using the Kullback-Leibler (KL) divergence. The feature values are employed to calculate the membership values associated with the fuzzy logic in the Neuro-Fuzzy technique. The novelty of this method lies in its predictive nature, which can mitigate the impact of unlabeled, unknown, or multi-domain data. We report results for multi-domain data in our experiments. We evaluate our results using accuracy, precision, recall, and F1-score. Our experiments show the efficacy of the proposed approach: it achieved 93.01% accuracy on the English dataset and 91.18% accuracy on the Hindi dataset, which is higher than other state-of-the-art techniques such as Naive Bayes and SVM. Additionally, we found that our approach provides satisfactory results with multi-domain data, as the two datasets come from different domains.
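The abstract does not spell out the exact KL-divergence formulation, so the sketch below only illustrates the general idea of scoring a term by the divergence D_KL(P || Q) = Σ_i P(i) · log(P(i) / Q(i)) between the term's distribution over sentiment classes and an assumed overall class distribution; the counts, the prior, and the scoring choice are all assumptions, not the paper's feature definition.

    # KL-divergence term-scoring sketch (assumed interpretation; illustrative only).
    from scipy.stats import entropy  # entropy(p, q) computes KL(p || q)

    # Hypothetical counts of each term's occurrences in [positive, negative] documents.
    term_class_counts = {"accha": [40, 5], "bakwaas": [3, 30], "movie": [25, 22]}
    class_prior = [0.5, 0.5]  # assumed overall class distribution

    for term, counts in term_class_counts.items():
        total = sum(counts)
        p_class_given_term = [c / total for c in counts]
        score = entropy(p_class_given_term, class_prior)   # KL(P(class|term) || P(class))
        print(f"{term}: feature value = {score:.3f}")       # higher = more class-indicative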

1 citation

Proceedings Article
TL;DR: In this article, the authors propose an AWD-LSTM model for the code-mixed (Tamil-English) dataset and Logistic Regression for the Tamil, Malayalam, and English languages.
Abstract: This paper presents our submission to the shared task “Homophobia, Transphobia Detection of YouTube Comments” organized by DravidianLangTech. Our team participated in Task B, which aims to identify whether YouTube comments are non-anti-LGBTQ+ content, homophobic, or transphobic in code-mixed (Tamil-English), Tamil, Malayalam, and English. We propose an AWD-LSTM model for the code-mixed (Tamil-English) dataset and Logistic Regression for the Tamil, Malayalam, and English languages. Our AWD-LSTM model achieved a 0.33 macro-average F1 score for the code-mixed (Tamil-English) data, while Logistic Regression achieved macro-average F1 scores of 0.55 for Tamil, 0.98 for Malayalam, and 0.91 for English.
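A minimal sketch of the Logistic Regression track mentioned above, assuming a TF-IDF character n-gram representation and scikit-learn with macro-F1 evaluation; the example comments, labels, and hyperparameters are placeholders rather than the shared-task data or the team's exact configuration.

    # Logistic Regression baseline sketch with macro-F1 evaluation (illustrative).
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.metrics import f1_score

    # Placeholder three-class data: 0 = non-anti-LGBTQ+, 1 = homophobic, 2 = transphobic.
    train_texts = ["great video, very supportive", "hateful comment example one",
                   "hateful comment example two", "nice and informative content"]
    train_labels = [0, 1, 2, 0]
    test_texts = ["really supportive comment", "hateful comment example one"]
    test_labels = [0, 1]

    clf = make_pipeline(
        TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),  # char n-grams cope with code-mixing
        LogisticRegression(max_iter=1000, class_weight="balanced"),
    )
    clf.fit(train_texts, train_labels)
    predictions = clf.predict(test_texts)
    print("macro F1:", f1_score(test_labels, predictions, average="macro"))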