Author

Amitava Das

Other affiliations: Jadavpur University, University of North Texas, University of California, Santa Barbara ...

Bio: Amitava Das is an academic researcher at Wipro who has contributed to research on Bengali and sentiment analysis. The author has an h-index of 25 and has co-authored 115 publications that have received 2,268 citations. Previous affiliations include Jadavpur University and the University of North Texas.

Papers published on a yearly basis (chart omitted; axis years range from 1989 to 2022)

Papers

Proceedings Article

Code Mixing: A Challenge for Language Identification in the Language of Social Media

Utsab Barman¹, Amitava Das¹, Joachim Wagner¹, Jennifer Foster²

Dublin City University¹, University of North Texas²

01 Oct 2014

TL;DR: A new dataset is described, which contains Facebook posts and comments that exhibit code mixing between Bengali, English and Hindi, and it is found that the dictionary-based approach is surpassed by supervised classification and sequence labelling, and that it is important to take contextual clues into consideration.

Abstract: In social media communication, multilingual speakers often switch between languages, and, in such an environment, automatic language identification becomes both a necessary and challenging task. In this paper, we describe our work in progress on the problem of automatic language identification for the language of social media. We describe a new dataset that we are in the process of creating, which contains Facebook posts and comments that exhibit code mixing between Bengali, English and Hindi. We also present some preliminary word-level language identification experiments using this dataset. Different techniques are employed, including a simple unsupervised dictionary-based approach, supervised word-level classification with and without contextual clues, and sequence labelling using Conditional Random Fields. We find that the dictionary-based approach is surpassed by supervised classification and sequence labelling, and that it is important to take contextual clues into consideration.

273 citations

Book Chapter

Fighting an Infodemic: COVID-19 Fake News Dataset.

Parth Patwa, Shivam Sharma, Srinivas Pykl, Vineeth Guptha, Gitanjali Kumari, Shad Akhtar, Asif Ekbal, Amitava Das, Tanmoy Chakraborty

06 Nov 2020 - arXiv: Computation and Language

TL;DR: A manually annotated dataset of 10,700 social media posts and articles of real and fake news on COVID-19 is curated and released, and four machine learning baselines are benchmarked.

Abstract: Along with COVID-19 pandemic we are also fighting an 'infodemic'. Fake news and rumors are rampant on social media. Believing in rumors can cause significant harm. This is further exacerbated at the time of a pandemic. To tackle this, we curate and release a manually annotated dataset of 10,700 social media posts and articles of real and fake news on COVID-19. We benchmark the annotated dataset with four machine learning baselines - Decision Tree, Logistic Regression, Gradient Boost, and Support Vector Machine (SVM). We obtain the best performance of 93.46% F1-score with SVM. The data and code is available at: this https URL

178 citations

Proceedings Article

Identifying Languages at the Word Level in Code-Mixed Indian Social Media Text

Amitava Das¹, Björn Gambäck²

Dublin City University¹, Norwegian University of Science and Technology²

01 Dec 2014

TL;DR: A code-mixing index is introduced to evaluate the level of blending in the corpora and the performance of a system developed to separate multiple languages is described.

Abstract: Language identification at the document level has been considered an almost solved problem in some application areas, but language detectors fail in the social media context due to phenomena such as utterance internal code-switching, lexical borrowings, and phonetic typing; all implying that language identification in social media has to be carried out at the word level. The paper reports a study to detect language boundaries at the word level in chat message corpora in mixed English-Bengali and English-Hindi. We introduce a code-mixing index to evaluate the level of blending in the corpora and describe the performance of a system developed to separate multiple languages.

146 citations

SentiWordNet for Indian Languages

Amitava Das, Sivaji Bandyopadhyay

01 Aug 2010

TL;DR: Multiple computational techniques, such as WordNet-based, dictionary-based, corpus-based, or generative approaches, are proposed for generating SentiWordNet(s) for Indian languages.

Abstract: The discipline where sentiment/ opinion/ emotion has been identified and classified in human written text is well known as sentiment analysis. A typical computational approach to sentiment analysis starts with prior polarity lexicons where entries are tagged with their prior out of context polarity as human beings perceive using their cognitive knowledge. Till date, all research efforts found in sentiment lexicon literature deal mostly with English texts. In this article, we propose multiple computational techniques like, WordNet based, dictionary based, corpus based or generative approaches for generating SentiWordNet(s) for Indian languages. Currently, SentiWordNet(s) are being developed for three Indian languages: Bengali, Hindi and Telugu. An online intuitive game has been developed to create and validate the developed SentiWordNet(s) by involving Internet population. A number of automatic, semi-automatic and manual validations and evaluation methodologies have been adopted to measure the coverage and credibility of the developed SentiWordNet(s).

119 citations

Proceedings Article

SemEval-2020 Task 9: Overview of Sentiment Analysis of Code-Mixed Tweets

Parth Patwa¹, Gustavo Aguilar², Sudipta Kar, Suraj Pandey³, Srinivas Pykl¹, Björn Gambäck⁴, Tanmoy Chakraborty⁵, Thamar Solorio⁶, Amitava Das⁷

Indian Institutes of Information Technology¹, University of Houston², Open University³, Norwegian University of Science and Technology⁴, Indraprastha Institute of Information Technology⁵, Adobe Systems⁶, Wipro⁷

06 Aug 2020

TL;DR: This paper presents the results of SemEval-2020 Task 9 on Sentiment Analysis of Code-Mixed Tweets (SentiMix 2020) and releases Hinglish and Spanglish corpora annotated with word-level language identification and sentence-level sentiment labels.

Abstract: In this paper, we present the results of the SemEval-2020 Task 9 on Sentiment Analysis of Code-Mixed Tweets (SentiMix 2020). We also release and describe our Hinglish (Hindi-English) and Spanglish (Spanish-English) corpora annotated with word-level language identification and sentence-level sentiment labels. These corpora are comprised of 20K and 19K examples, respectively. The sentiment labels are - Positive, Negative, and Neutral. SentiMix attracted 89 submissions in total including 61 teams that participated in the Hinglish contest and 28 submitted systems to the Spanglish competition. The best performance achieved was 75.0% F1 score for Hinglish and 80.6% F1 for Spanglish. We observe that BERT-like models and ensemble methods are the most common and successful approaches among the participants.

98 citations

Cited by

Journal Article

Data Mining: Practical Machine Learning Tools and Techniques

อนิรุธ สืบสิงห์

01 Jan 2014 - Journal of management science

9,185 citations

Social Vulnerability to Environmental Hazards

Daniels Sheridan, Judith Basin, Golden Valley

01 Jan 2010

1,006 citations

Journal Article

Current State of Text Sentiment Analysis from Opinion to Emotion Mining

Ali Yadollahi¹, Ameneh Gholipour Shahraki¹, Osmar R. Zaïane¹

University of Alberta¹

25 May 2017 - ACM Computing Surveys

TL;DR: This work presents the state-of-the-art methods and proposes the following contributions: a taxonomy of sentiment analysis; a survey on polarity classification methods and resources, especially those related to emotion mining; a complete survey on emotion theories and emotion-mining research; and some useful resources, including lexicons and datasets.

Abstract: Sentiment analysis from text consists of extracting information about opinions, sentiments, and even emotions conveyed by writers towards topics of interest. It is often equated to opinion mining, but it should also encompass emotion mining. Opinion mining involves the use of natural language processing and machine learning to determine the attitude of a writer towards a subject. Emotion mining is also using similar technologies but is concerned with detecting and classifying writers' emotions toward events or topics. Textual emotion-mining methods have various applications, including gaining information about customer satisfaction, helping in selecting teaching materials in e-learning, recommending products based on users' emotions, and even predicting mental-health disorders. In surveys on sentiment analysis, which are often old or incomplete, the strong link between opinion mining and emotion mining is understated. This motivates the need for a different and new perspective on the literature on sentiment analysis, with a focus on emotion mining. We present the state-of-the-art methods and propose the following contributions: (1) a taxonomy of sentiment analysis; (2) a survey on polarity classification methods and resources, especially those related to emotion mining; (3) a complete survey on emotion theories and emotion-mining research; and (4) some useful resources, including lexicons and datasets.

331 citations

Proceedings Article

TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification.

Francesco Barbieri¹, Jose Camacho-Collados², Luis Espinosa Anke², Leonardo Neves²

Pompeu Fabra University¹, Cardiff University²

01 Nov 2020

TL;DR: This paper proposes a new evaluation framework (TweetEval) consisting of seven heterogeneous Twitter-specific classification tasks, and shows the effectiveness of starting with existing pre-trained generic language models and continuing to train them on Twitter corpora.

Abstract: The experimental landscape in natural language processing for social media is too fragmented. Each year, new shared tasks and datasets are proposed, ranging from classics like sentiment analysis to irony detection or emoji prediction. Therefore, it is unclear what the current state of the art is, as there is no standardized evaluation protocol, neither a strong set of baselines trained on such domain-specific data. In this paper, we propose a new evaluation framework (TweetEval) consisting of seven heterogeneous Twitter-specific classification tasks. We also provide a strong set of baselines as starting point, and compare different language modeling pre-training strategies. Our initial experiments show the effectiveness of starting off with existing pre-trained generic language models, and continue training them on Twitter corpora.

328 citations

Book Chapter

Sentiment Analysis: Detecting Valence, Emotions, and Other Affectual States from Text

Saif M. Mohammad¹

National Research Council¹

01 Jan 2016

TL;DR: Sentiment analysis is the task of automatically determining from text the attitude, emotion, or some other affectual state of the author. This chapter summarizes its tasks and applications and outlines key challenges stemming from the complexity and subtlety of language use.

Abstract: Sentiment analysis is the task of automatically determining from text the attitude, emotion, or some other affectual state of the author. This chapter summarizes the diverse landscape of tasks and applications associated with sentiment analysis. We outline key challenges stemming from the complexity and subtlety of language use, the prevalence of creative and non-standard language, and the lack of paralinguistic information, such as tone and stress markers. We describe automatic systems and datasets commonly used in sentiment analysis. We summarize several manual and automatic approaches to creating valence- and emotion-association lexicons. We also discuss preliminary approaches for sentiment composition (how smaller units of text combine to express sentiment) and approaches for detecting sentiment in figurative and metaphoric language—these are the areas where we expect to see significant work in the near future.

315 citations
