Author

Manish Shrivastava

International Institute of Information Technology, Hyderabad

Other affiliations: Indian Institutes of Technology, Microsoft, International Institute of Information Technology

Bio: Manish Shrivastava is an academic researcher at the International Institute of Information Technology, Hyderabad. The author has contributed to research on the topics of computer science and sentiment analysis. The author has an h-index of 19 and has co-authored 94 publications, which have received 1284 citations. Previous affiliations of Manish Shrivastava include the Indian Institutes of Technology and Microsoft.

Papers published on a yearly basis (years shown on the chart): 2006, 2008, 2011, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023

Papers

Proceedings Article

A Dataset of Hindi-English Code-Mixed Social Media Text for Hate Speech Detection.

Aditya Bohra, Deepanshu Vijay, Vinay Singh, Syed Sarfaraz Akhtar, Manish Shrivastava

01 Jun 2018

TL;DR: This work presents a Hindi-English code-mixed dataset consisting of tweets posted online on Twitter and proposes a supervised classification system for detecting hate speech in the text using various character level, word level, and lexicon based features.

Abstract: Hate speech detection in social media texts is an important natural language processing task, which has several crucial applications like sentiment analysis, investigating cyberbullying and examining socio-political controversies. While relevant research has been done independently on code-mixed social media texts and hate speech detection, our work is the first attempt at detecting hate speech in Hindi-English code-mixed social media text. In this paper, we analyze the problem of hate speech detection in code-mixed texts and present a Hindi-English code-mixed dataset consisting of tweets posted online on Twitter. The tweets are annotated with the language at word level and the class they belong to (Hate Speech or Normal Speech). We also propose a supervised classification system for detecting hate speech in the text using various character level, word level, and lexicon based features.

175 citations

Proceedings Article

Towards Sub-Word Level Compositions for Sentiment Analysis of Hindi-English Code Mixed Text

Aditya Joshi¹, Ameya Prabhu², Manish Shrivastava², Vasudeva Varma²

Commonwealth Scientific and Industrial Research Organisation¹, International Institute of Information Technology, Hyderabad²

01 Dec 2016

TL;DR: In this article, a Hindi-English (Hi-En) code-mixed dataset was introduced for sentiment analysis and the authors performed empirical analysis comparing the suitability and performance of various state-of-the-art SA methods in social media.

Abstract: Sentiment analysis (SA) using code-mixed data from social media has several applications in opinion mining ranging from customer satisfaction to social campaign analysis in multilingual societies. Advances in this area are impeded by the lack of a suitable annotated dataset. We introduce a Hindi-English (Hi-En) code-mixed dataset for sentiment analysis and perform empirical analysis comparing the suitability and performance of various state-of-the-art SA methods in social media. In this paper, we introduce learning sub-word level representations in our LSTM (Subword-LSTM) architecture instead of character-level or word-level representations. This linguistic prior in our architecture enables us to learn the information about sentiment value of important morphemes. This also seems to work well on highly noisy text containing misspellings, as shown in our experiments and demonstrated in the morpheme-level feature maps learned by our model. Also, we hypothesize that encoding this linguistic prior in the Subword-LSTM architecture leads to the superior performance. Our system attains an accuracy 4-5% greater than traditional approaches on our dataset, and also outperforms the available system for sentiment analysis in Hi-En code-mixed text by 18%.

148 citations

Proceedings Article

Together we stand: Siamese Networks for Similar Question Retrieval

Arpita Das¹, Harish Yenala², Manoj Kumar Chinnakotla³, Manish Shrivastava³

International Institute of Information Technology, Hyderabad¹, Indian Institutes of Information Technology², Microsoft³

01 Aug 2016

TL;DR: This paper proposes a novel approach called “Siamese Convolutional Neural Network for cQA (SCQA)” to find the semantic similarity between the current and the archived questions; SCQA outperforms current state-of-the-art approaches based on translation models, topic models and deep neural networks.

Abstract: Community Question Answering (cQA) services like Yahoo! Answers (https://answers.yahoo.com/), Baidu Zhidao (http://zhidao.baidu.com/), Quora (http://www.quora.com/), StackOverflow (http://stackoverflow.com/) etc. provide a platform for interaction with experts and help users to obtain precise and accurate answers to their questions. The time lag between the user posting a question and receiving its answer could be reduced by retrieving similar historic questions from the cQA archives. The main challenge in this task is the “lexico-syntactic” gap between the current and the previous questions. In this paper, we propose a novel approach called “Siamese Convolutional Neural Network for cQA (SCQA)” to find the semantic similarity between the current and the archived questions. SCQA consists of twin convolutional neural networks with shared parameters and a contrastive loss function joining them. SCQA learns the similarity metric for question-question pairs by leveraging the question-answer pairs available in cQA forum archives. The model projects semantically similar question pairs nearer to each other and dissimilar question pairs farther away from each other in the semantic space. Experiments on the large-scale real-life “Yahoo! Answers” dataset reveal that SCQA outperforms current state-of-the-art approaches based on translation models, topic models and deep neural network based models which use non-shared parameters.
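
The contrastive loss mentioned in the abstract can be illustrated with a minimal sketch. This is the commonly used margin-based form over Euclidean distance, not necessarily the exact formulation or distance measure used in SCQA; the function names, the `margin` default and the toy vectors are assumptions made for illustration only.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(u, v, is_similar, margin=1.0):
    """Margin-based contrastive loss for one pair of embeddings.

    Similar pairs are penalised by their squared distance, so they are
    pulled together. Dissimilar pairs are penalised only while they are
    closer than `margin`, so they are pushed apart up to that margin.
    (Hypothetical helper; the paper's exact loss may differ.)
    """
    d = euclidean_distance(u, v)
    if is_similar:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


# Toy embeddings standing in for the outputs of the twin networks.
close_pair_loss = contrastive_loss([0.0], [0.5], is_similar=False)
far_pair_loss = contrastive_loss([0.0, 0.0], [3.0, 4.0], is_similar=False)
```

In a Siamese setup both embeddings would come from the same network applied to each question, so minimising this loss shapes a single shared embedding space.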

100 citations

Proceedings Article

Morphological Richness Offsets Resource Demand -- Experiences in Constructing a POS Tagger for Hindi

Smriti Singh¹, Kuhoo Gupta¹, Manish Shrivastava¹, Pushpak Bhattacharyya¹

Indian Institutes of Technology¹

17 Jul 2006

TL;DR: A methodology of POS tagging for resource-disadvantaged languages that makes use of locally annotated, modestly sized corpora, exhaustive morphological analysis backed by a high-coverage lexicon, and a decision tree based learning algorithm (CN2).

Abstract: In this paper we report our work on building a POS tagger for a morphologically rich language, Hindi. The theme of the research is to vindicate the stand that if morphology is strong and harnessable, then lack of training corpora is not debilitating. We establish a methodology of POS tagging which the resource disadvantaged (lacking annotated corpora) languages can make use of. The methodology makes use of locally annotated modestly-sized corpora (15,562 words), exhaustive morphological analysis backed by high-coverage lexicon and a decision tree based learning algorithm (CN2). The evaluation of the system was done with 4-fold cross validation of the corpora in the news domain (www.bbc.co.uk/hindi). The current accuracy of POS tagging is 93.45% and can be further improved.
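
The 4-fold cross validation mentioned in the abstract can be sketched as follows. This is a generic illustration of how k-fold splits are formed over contiguous blocks of item indices; the function name `k_fold_splits` and the choice of contiguous, unshuffled folds are assumptions, and the abstract does not say how the authors partitioned their corpus.

```python
def k_fold_splits(n_items, k=4):
    """Return k (train_indices, test_indices) pairs covering n_items items.

    Each item lands in exactly one test fold. When n_items is not a
    multiple of k, the first folds receive one extra item each.
    """
    indices = list(range(n_items))
    base, extra = divmod(n_items, k)
    splits = []
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((train, test))
        start += size
    return splits


# Ten toy items split into four folds.
splits = k_fold_splits(10, k=4)
fold_sizes = [len(test) for _, test in splits]
```

A tagger would be trained on each `train` portion and scored on the matching `test` portion, with the reported accuracy typically being the average over the folds.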

80 citations

Proceedings Article

IIIT-H System Submission for FIRE2014 Shared Task on Transliterated Search

Irshad Ahmad Bhat¹, Vandan Mujadia¹, Aniruddha Tammewar¹, Riyaz Ahmad Bhat¹, Manish Shrivastava¹

International Institute of Information Technology, Hyderabad¹

05 Dec 2014

TL;DR: This paper describes the submission for FIRE 2014 Shared Task on Transliterated Search, which features two sub-tasks: Query word labeling and Mixed-script Ad hoc retrieval for Hindi Song Lyrics.

Abstract: This paper describes our submission for FIRE 2014 Shared Task on Transliterated Search. The shared task features two sub-tasks: Query word labeling and Mixed-script Ad hoc retrieval for Hindi Song Lyrics. Query Word Labeling covers token-level language identification of query words in code-mixed queries and back-transliteration of identified Indian language words into their native scripts. We have developed letter based language models for the token level language identification of query words and a structured perceptron model for back-transliteration of Indic words. The second subtask, Mixed-script Ad hoc retrieval for Hindi Song Lyrics, is to retrieve a ranked list of songs from a corpus of Hindi song lyrics given an input query in Devanagari or transliterated Roman script. We have used edit distance based query expansion and language modeling followed by relevance based reranking for the retrieval of relevant Hindi Song lyrics for a given query.
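
Edit distance based query expansion, as named in the abstract, can be sketched roughly as below. This is a minimal illustration that uses Levenshtein distance against a small vocabulary; the function names, the `max_distance` threshold and the toy word list are assumptions, not details of the IIIT-H system.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def expand_query_term(term, vocabulary, max_distance=1):
    """Return vocabulary entries within max_distance edits of term,
    in vocabulary order (hypothetical helper for illustration)."""
    return [w for w in vocabulary if levenshtein(term, w) <= max_distance]


# Toy vocabulary of Roman-script spellings.
variants = expand_query_term("dil", ["dil", "dill", "pyar", "del"])
```

Expanding each query word this way lets spelling variants of transliterated words match the same documents, at the cost of admitting some unrelated near-neighbours that a later reranking step would need to demote.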

71 citations

Cited by

Journal Article

Deep learning for sentiment analysis: A survey

Lei Zhang¹, Shuai Wang², Bing Liu²

LinkedIn¹, University of Illinois at Urbana–Champaign²

01 Jul 2018 - Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery

TL;DR: Deep learning has emerged as a powerful machine learning technique that learns multiple layers of representations or features of the data and produces state-of-the-art prediction results; it has also been widely used in sentiment analysis in recent years.

Abstract: Deep learning has emerged as a powerful machine learning technique that learns multiple layers of representations or features of the data and produces state-of-the-art prediction results. Along with the success of deep learning in many other application domains, deep learning is also popularly used in sentiment analysis in recent years. This paper first gives an overview of deep learning and then provides a comprehensive survey of its current applications in sentiment analysis.

917 citations

Book

Sentiment Analysis: Mining Opinions, Sentiments, and Emotions

Bing Liu¹

University of Illinois at Chicago¹

01 Jun 2015

TL;DR: Sentiment analysis is the computational study of people's opinions, sentiments, emotions, moods, and attitudes; the problem offers numerous research challenges but promises insight useful to anyone interested in opinion analysis and social media analysis.

Abstract: Sentiment analysis is the computational study of people's opinions, sentiments, emotions, moods, and attitudes. This fascinating problem offers numerous research challenges, but promises insight useful to anyone interested in opinion analysis and social media analysis. This comprehensive introduction to the topic takes a natural-language-processing point of view to help readers understand the underlying structure of the problem and the language constructs commonly used to express opinions, sentiments, and emotions. The book covers core areas of sentiment analysis and also includes related topics such as debate analysis, intention mining, and fake-opinion detection. It will be a valuable resource for researchers and practitioners in natural language processing, computer science, management sciences, and the social sciences. In addition to traditional computational methods, this second edition includes recent deep learning methods to analyze and summarize sentiments and opinions, and also new material on emotion and mood analysis techniques, emotion-enhanced dialogues, and multimodal emotion analysis.

587 citations

Proceedings Article

How multilingual is Multilingual BERT

Telmo Pires¹, Eva Schlinger¹, Dan Garrette¹

Google¹

01 Jul 2019

TL;DR: This article showed that M-BERT is surprisingly good at zero-shot cross-lingual model transfer, in which task-specific annotations in one language are used to fine-tune the model for evaluation in another language.

Abstract: In this paper, we show that Multilingual BERT (M-BERT), released by Devlin et al. (2018) as a single language model pre-trained from monolingual corpora in 104 languages, is surprisingly good at zero-shot cross-lingual model transfer, in which task-specific annotations in one language are used to fine-tune the model for evaluation in another language. To understand why, we present a large number of probing experiments, showing that transfer is possible even to languages in different scripts, that transfer works best between typologically similar languages, that monolingual corpora can train models for code-switching, and that the model can find translation pairs. From these results, we can conclude that M-BERT does create multilingual representations, but that these representations exhibit systematic deficiencies affecting certain language pairs.

543 citations

Posted Content

How multilingual is Multilingual BERT

Telmo Pires¹, Eva Schlinger¹, Dan Garrette¹

Google¹

04 Jun 2019 - arXiv: Computation and Language

TL;DR: It is concluded that M-BERT does create multilingual representations, but that these representations exhibit systematic deficiencies affecting certain language pairs, and that the model can find translation pairs.

471 citations

Journal Article

Deep Learning-based Text Classification: A Comprehensive Review

Shervin Minaee, Nal Kalchbrenner¹, Erik Cambria², Narjes Nikzad³, Meysam Chenaghlu³, Jianfeng Gao⁴

Google¹, Nanyang Technological University², University of Tabriz³, Microsoft⁴

17 Apr 2021 - ACM Computing Surveys

TL;DR: This paper provided a comprehensive review of more than 150 deep learning-based models for text classification developed in recent years, and discussed their technical contributions, similarities, and strengths, and provided a quantitative analysis of the performance of different deep learning models on popular benchmarks.

Abstract: Deep learning-based models have surpassed classical machine learning-based approaches in various text classification tasks, including sentiment analysis, news categorization, question answering, and natural language inference. In this article, we provide a comprehensive review of more than 150 deep learning-based models for text classification developed in recent years, and we discuss their technical contributions, similarities, and strengths. We also provide a summary of more than 40 popular datasets widely used for text classification. Finally, we provide a quantitative analysis of the performance of different deep learning models on popular benchmarks, and we discuss future research directions.

457 citations
