Kalika Bali
Researcher at Microsoft
Publications - 86
Citations - 2284
Kalika Bali is an academic researcher at Microsoft. The author has contributed to research in the topics of computer science and Hindi. The author has an h-index of 20 and has co-authored 72 publications receiving 1,637 citations. Previous affiliations of Kalika Bali include Hewlett-Packard.
Papers
Posted Content
The State and Fate of Linguistic Diversity and Inclusion in the NLP World
TL;DR: Examines the relation between types of languages, their resources, and their representation at NLP conferences to understand the trajectory different languages have followed over time, and underlines the disparity between languages.
Proceedings ArticleDOI
POS Tagging of English-Hindi Code-Mixed Social Media Content
TL;DR: Describes initial efforts to create a multi-level annotated corpus of Hindi-English code-mixed text collated from Facebook forums, and explores language identification, back-transliteration, normalization, and POS tagging of this data.
Proceedings ArticleDOI
The State and Fate of Linguistic Diversity and Inclusion in the NLP World
TL;DR: In this article, the authors look at the relation between the types of languages, resources, and their representation in NLP conferences to understand the trajectory that different languages have followed over time.
Proceedings ArticleDOI
“I am borrowing ya mixing?” An Analysis of English-Hindi Code Mixing in Facebook
TL;DR: A classification of code-mixed words based on frequency and linguistic typology underlines the fact that, while there are easily identifiable cases of borrowing and mixing at the two ends, a large majority of the words form a continuum in the middle, emphasizing the need to handle them at different levels for automatic processing of the data.
Proceedings ArticleDOI
Language Modeling for Code-Mixing: The Role of Linguistic Theory based Synthetic Data
Adithya Pratapa,Gayatri Bhat,Monojit Choudhury,Sunayana Sitaram,Sandipan Dandapat,Kalika Bali +5 more
TL;DR: Presents a computational technique for creating grammatically valid artificial code-mixed (CM) data based on the Equivalence Constraint Theory, and shows that when training examples are sampled appropriately from this synthetic data and presented in a certain order, they can significantly reduce the perplexity of an RNN-based language model.