Home
/
Authors
/
M. Anand Kumar

Author

M. Anand Kumar

National Institute of Technology, Karnataka

Other affiliations: Amrita Vishwa Vidyapeetham

Bio: M. Anand Kumar is an academic researcher from National Institute of Technology, Karnataka. The author has contributed to research in topics: Tamil & Word embedding. The author has an hindex of 16, co-authored 140 publications receiving 1015 citations. Previous affiliations of M. Anand Kumar include Amrita Vishwa Vidyapeetham.

Papers published on a yearly basis

2023
2022
2021
2020
2019
2018
2017
2016
2015
2014
2012
2011
2010
2009
2008

Papers

PDF

Open Access

More filters

A sequence labeling approach to morphological analyzer for Tamil language

[...]

M. Anand Kumar, Dhanalakshmi, K P Soman, S. Rajendran

01 Jan 2010

TL;DR: A novel approach is proposed to solve the morphological analyzer problem using machine learning methodology based on sequence labeling and training by kernel methods that captures the non linear relationships of the Morphological features from training data samples in a better and simpler way.

...read moreread less

Abstract: Morphological analysis is the basic process for any Natural Language Processing task. Morphology is the study of internal structure of the word. Morphological analysis retrieves the grammatical features and properties of a morphologically inflected word. Capturing the agglutinative structure of Tamil words by an automatic system is a challenging job. Generally rule based approaches are used for building morphological analyzer. In this paper we propose a novel approach to solve the morphological analyzer problem using machine learning methodology. Here morphological analyzer problem is redefined as classification problem. This approach is based on sequence labeling and training by kernel methods that captures the non linear relationships of the morphological features from training data samples in a better and simpler way. Keywordsmorphology; morphological analyzer; machine learning; sequence labeling.

...read moreread less

47 citations

Journal Article•DOI•

A deep learning approach for Malayalam morphological analysis at character level

[...]

B. Premjith¹, K. P. Soman¹, M. Anand Kumar¹•Institutions (1)

Amrita Vishwa Vidyapeetham¹

01 Jan 2018-Procedia Computer Science

TL;DR: A deep learning approach is proposed for learning the rules for identifying the morphemes automatically and segmenting them from the original word, to identify the grammatical structure of the word.

...read moreread less

36 citations

Proceedings Article•DOI•

Morphological Analyzer for Agglutinative Languages Using Machine Learning Approaches

[...]

V. Dhanalakshmi¹, M. Anand Kumar¹, R. U. Rekha¹, C. Arun Kumar¹, K. P. Soman¹, S. Rajendran² - Show less +2 more•Institutions (2)

Amrita Vishwa Vidyapeetham¹, Tamil University²

27 Oct 2009

TL;DR: This new and state of the art machine learning approach based on sequence labeling and training by kernel methods captures the non-linear relationships in the different aspect of morphological features of natural languages in a better and simpler way.

...read moreread less

Abstract: This paper is based on morphological analyzer using machine learning approach for complex agglutinative natural languages. Morphological analysis is concerned with retrieving the structure, the syntactic and morphological properties or the meaning of a morphologically complex word. The morphology structure of agglutinative language is unique and capturing its complexity in a machine analyzable and generatable format is a challenging job. Generally rule based approaches are used for building morphological analyzer system. In rule based approaches what works in the forward direction may not work in the backward direction. This new and state of the art machine learning approach based on sequence labeling and training by kernel methods captures the non-linear relationships in the different aspect of morphological features of natural languages in a better and simpler way. The overall accuracy obtained for the morphologically rich agglutinative language (Tamil) was really encouraging.

...read moreread less

34 citations

Journal Article•DOI•

Predicting the Sentimental Reviews in Tamil Movie using Machine Learning Algorithms

[...]

Shriya Se¹, R. Vinayakumar¹, M. Anand Kumar¹, K. P. Soman¹•Institutions (1)

Amrita Vishwa Vidyapeetham¹

07 Dec 2016-Indian journal of science and technology

TL;DR: SVM algorithm performs well in classifying the Tamil movie reviews when compared with other machine learning algorithms and both cross validation and accuracy of the algorithm shows that SVM performs well.

...read moreread less

Abstract: Objective: This paper aims at classifying the Tamil movie reviews as positive and negative using supervised machine learning algorithms. Methods/Analysis: A novel machine learning approaches are needed for analyzing the Social media text where the data are increasing exponentially. Here, in this work, Machine learning algorithms such as SVM, Maxent classifier, Decision tree and Naive Bayes are used for classifying Tamil movie reviews into positive and negative. Features are also extracted from TamilSentiwordnet. Findings: The dataset for this work has been prepared. SVM algorithm performs well in classifying the Tamil movie reviews when compared with other machine learning algorithms. Both cross validation and accuracy of the algorithm shows that SVM performs well. Other than SVM, Decision tree perform well in classifying the Tamil reviews. Novelty/Improvement: SVM gives an accuracy of 75.9% for classifying Tamil movie reviews which is a good milestone in the research field of Tamil language.

...read moreread less

31 citations

Journal Article•DOI•

Neural Machine Translation System for English to Indian Language Translation Using MTIL Parallel Corpus

[...]

B. Premjith, M. Anand Kumar, K. P. Soman

26 Jul 2019-Journal of intelligent systems

TL;DR: A neural machine translation system for four language pairs, designed with long short-term memory (LSTM) networks and bi-directional recurrent neural networks (Bi-RNN) and able to perceive long-term contexts in the sentences.

...read moreread less

Abstract: Abstract Introduction of deep neural networks to the machine translation research ameliorated conventional machine translation systems in multiple ways, specifically in terms of translation quality. The ability of deep neural networks to learn a sensible representation of words is one of the major reasons for this improvement. Despite machine translation using deep neural architecture is showing state-of-the-art results in translating European languages, we cannot directly apply these algorithms in Indian languages mainly because of two reasons: unavailability of the good corpus and Indian languages are morphologically rich. In this paper, we propose a neural machine translation (NMT) system for four language pairs: English–Malayalam, English–Hindi, English–Tamil, and English–Punjabi. We also collected sentences from different sources and cleaned them to make four parallel corpora for each of the language pairs, and then used them to model the translation system. The encoder network in the NMT architecture was designed with long short-term memory (LSTM) networks and bi-directional recurrent neural networks (Bi-RNN). Evaluation of the obtained models was performed both automatically and manually. For automatic evaluation, the bilingual evaluation understudy (BLEU) score was used, and for manual evaluation, three metrics such as adequacy, fluency, and overall ranking were used. Analysis of the results showed the presence of lengthy sentences in English–Malayalam, and the English–Hindi corpus affected the translation. Attention mechanism was employed with a view to addressing the problem of translating lengthy sentences (sentences contain more than 50 words), and the system was able to perceive long-term contexts in the sentences.

...read moreread less

29 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31

Collapse

Cited by

PDF

Open Access

More filters

Journal Article•

Learner English: A Teacher's Guide to Interference and Other Problems Second Edition [Book Review]

[...]

Anj Foley

01 Feb 2002-TESOL in context

TL;DR: Review(s) of: Learner English: A Teacher's Guide to Interference and Other Problems, Second Edition, by Michael Swan and Bernard Smith.

...read moreread less

Abstract: Review(s) of: Learner English: A Teacher's Guide to Interference and Other Problems, Second Edition, by Michael Swan and Bernard Smith. Cambridge: Cambridge University Press, 2001.

...read moreread less

292 citations

Proceedings Article•DOI•

IndicNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages

[...]

Divyanshu Kakwani, Anoop Kunchukuttan¹, Satish Golla, N C Gokul, Avik Bhattacharyya², Mitesh M. Khapra², Pratyush Kumar² - Show less +3 more•Institutions (2)

Microsoft¹, Indian Institute of Technology Madras²

01 Nov 2020

TL;DR: This paper introduces NLP resources for 11 major Indian languages from two major language families, and creates datasets for the following tasks: Article Genre Classification, Headline Prediction, Wikipedia Section-Title Prediction, Cloze-style Multiple choice QA, Winograd NLI and COPA.

...read moreread less

Abstract: In this paper, we introduce NLP resources for 11 major Indian languages from two major language families. These resources include: (a) large-scale sentence-level monolingual corpora, (b) pre-trained word embeddings, (c) pre-trained language models, and (d) multiple NLU evaluation datasets (IndicGLUE benchmark). The monolingual corpora contains a total of 8.8 billion tokens across all 11 languages and Indian English, primarily sourced from news crawls. The word embeddings are based on FastText, hence suitable for handling morphological complexity of Indian languages. The pre-trained language models are based on the compact ALBERT model. Lastly, we compile the (IndicGLUE benchmark for Indian language NLU. To this end, we create datasets for the following tasks: Article Genre Classification, Headline Prediction, Wikipedia Section-Title Prediction, Cloze-style Multiple choice QA, Winograd NLI and COPA. We also include publicly available datasets for some Indic languages for tasks like Named Entity Recognition, Cross-lingual Sentence Retrieval, Paraphrase detection, etc. Our embeddings are competitive or better than existing pre-trained embeddings on multiple tasks. We hope that the availability of the dataset will accelerate Indic NLP research which has the potential to impact more than a billion people. It can also help the community in evaluating advances in NLP over a more diverse pool of languages. The data and models are available at https://indicnlp.ai4bharat.org.

...read moreread less

257 citations

Proceedings Article•DOI•

Corpus creation for sentiment analysis in code-mixed Tamil-English text

[...]

Bharathi Raja Chakravarthi¹, Vigneshwaran Muralidaran², Ruba Priyadharshini³, John P. McCrae⁴•Institutions (4)

National University of Ireland, Galway¹, Cardiff University², ULTra³, National University of Ireland⁴

11 May 2020

TL;DR: A gold standard Tamil-English code-switched, sentiment-annotated corpus containing 15,744 comment posts from YouTube is created and inter-annotator agreement is presented, and the results of sentiment analysis trained on this corpus are shown.

...read moreread less

Abstract: Understanding the sentiment of a comment from a video or an image is an essential task in many applications. Sentiment analysis of a text can be useful for various decision-making processes. One such application is to analyse the popular sentiments of videos on social media based on viewer comments. However, comments from social media do not follow strict rules of grammar, and they contain mixing of more than one language, often written in non-native scripts. Non-availability of annotated code-mixed data for a low-resourced language like Tamil also adds difficulty to this problem. To overcome this, we created a gold standard Tamil-English code-switched, sentiment-annotated corpus containing 15,744 comment posts from YouTube. In this paper, we describe the process of creating the corpus and assigning polarities. We present inter-annotator agreement and show the results of sentiment analysis trained on this corpus as a benchmark.

...read moreread less

168 citations

Proceedings Article•DOI•

Overview of the HASOC Track at FIRE 2020: Hate Speech and Offensive Language Identification in Tamil, Malayalam, Hindi, English and German

[...]

Thomas Mandl¹, Sandip Modha, Anand Kumar M², Bharathi Raja Chakravarthi³•Institutions (3)

University of Hildesheim¹, National Institute of Technology, Karnataka², National University of Ireland³

16 Dec 2020

TL;DR: The HASOC track as mentioned in this paper is dedicated to evaluate technology for finding offensive language and hate speech, which has attracted much interest and over 40 research groups have participated as well as described their approaches in papers.

...read moreread less

Abstract: This paper presents the HASOC track and its two parts. HASOC is dedicated to evaluate technology for finding Offensive Language and Hate Speech. HASOC is creating test collections for languages with few resources and English for comparison. The first track within HASOC has continued work from 2019 and provided a testbed of Twitter posts for Hindi, German and English. The second track within HASOC has created test resources for Tamil and Malayalam in native and Latin script. Posts were extracted mainly from Youtube and Twitter. Both tracks have attracted much interest and over 40 research groups have participated as well as described their approaches in papers. In this overview, we present the tasks, the data and the main results.

...read moreread less

127 citations

Journal Article•DOI•

Hatred and trolling detection transliteration framework using hierarchical LSTM in code-mixed social media text

[...]

Shashi Shekhar¹, Hitendra Garg¹, Rohit Agrawal¹, Shivendra Shivani², Bhisham Sharma³ - Show less +1 more•Institutions (3)

GLA University¹, Thapar University², Chitkara University³

17 Aug 2021-Complex & Intelligent Systems

TL;DR: The paper describes the usage of self-learning Hierarchical LSTM technique for classifying hatred and trolling contents in social media code-mixed data and the method developed based on HLSTM model helps in recognizing the hatred word context by mining the intention of the user for using that word in the sentence.

...read moreread less

Abstract: The paper describes the usage of self-learning Hierarchical LSTM technique for classifying hatred and trolling contents in social media code-mixed data. The Hierarchical LSTM-based learning is a novel learning architecture inspired from the neural learning models. The proposed HLSTM model is trained to identify the hatred and trolling words available in social media contents. The proposed HLSTM systems model is equipped with self-learning and predicting mechanism for annotating hatred words in transliteration domain. The Hindi–English data are ordered into Hindi, English, and hatred labels for classification. The mechanism of word embedding and character-embedding features are used here for word representation in the sentence to detect hatred words. The method developed based on HLSTM model helps in recognizing the hatred word context by mining the intention of the user for using that word in the sentence. Wide experiments suggests that the HLSTM-based classification model gives the accuracy of 97.49% when evaluated against the standard parameters like BLSTM, CRF, LR, SVM, Random Forest and Decision Tree models especially when there are some hatred and trolling words in the social media data.

...read moreread less

111 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188

Collapse