
Kalika Bali

Researcher at Microsoft

Publications: 86
Citations: 2284

Kalika Bali is an academic researcher at Microsoft. The author has contributed to research in the topics of Computer science and Hindi. The author has an h-index of 20 and has co-authored 72 publications receiving 1,637 citations. Previous affiliations of Kalika Bali include Hewlett-Packard.

Papers
Posted Content

The State and Fate of Linguistic Diversity and Inclusion in the NLP World

TL;DR: Examines the relation between types of languages, their resources, and their representation at NLP conferences to understand the trajectory that different languages have followed over time, underlining the disparity between languages.
Proceedings ArticleDOI

POS Tagging of English-Hindi Code-Mixed Social Media Content

TL;DR: Describes initial efforts to create a multi-level annotated corpus of Hindi-English code-mixed text collated from Facebook forums, and explores language identification, back-transliteration, normalization, and POS tagging of this data.
Proceedings ArticleDOI

The State and Fate of Linguistic Diversity and Inclusion in the NLP World

TL;DR: In this article, the authors look at the relation between the types of languages, resources, and their representation in NLP conferences to understand the trajectory that different languages have followed over time.
Proceedings ArticleDOI

“I am borrowing ya mixing?” An Analysis of English-Hindi Code Mixing in Facebook

TL;DR: A classification of code-mixed words based on frequency and linguistic typology underlines the fact that while there are easily identifiable cases of borrowing and mixing at the two ends, a large majority of the words form a continuum in the middle, emphasizing the need to handle these at different levels for automatic processing of the data.
Proceedings ArticleDOI

Language Modeling for Code-Mixing: The Role of Linguistic Theory based Synthetic Data

TL;DR: Presents a computational technique for creating grammatically valid artificial code-mixed (CM) data based on the Equivalence Constraint Theory, and shows that when training examples are sampled appropriately from this synthetic data and presented in a certain order, they can significantly reduce the perplexity of an RNN-based language model.