Open Access Proceedings Article

The TDIL Program and the Indian Language Corpora Initiative (ILCI).

TLDR
The government of India, through various ministries and a think tank of eminent linguists and policy makers, has done a commendable job of language development and maintenance in the age of technology despite the obvious roadblocks.
Abstract
India is considered a linguistic ocean, with 4 language families, 22 scheduled national languages, and 100 un-scheduled languages reported by the 2001 census. This puts tremendous pressure on the Indian government not only to have comprehensive language policies, but also to create resources for the maintenance and development of these languages. In the age of information technology, there is a greater need to strike a fine balance in allocating resources to each language, keeping in view political compulsions, the electoral potential of a linguistic community, and other issues. In this connection, the government of India, through various ministries and a think tank of eminent linguists and policy makers, has done a commendable job despite the obvious roadblocks. This paper describes the Indian government's policies towards language development and maintenance in the age of technology, as carried out by the Ministry of HRD through its various agencies and by the Ministry of Communications & Information Technology (MCIT) through its dedicated program, TDIL (Technology Development for Indian Languages). The paper also describes some of the recent activities of TDIL in general and, in particular, an innovative corpora project called ILCI (Indian Languages Corpora Initiative).


Citations
Journal Article

A Survey of Multilingual Neural Machine Translation

TL;DR: The authors present a survey of multilingual neural machine translation (MNMT), a field that has gained a lot of traction in recent years and in which many approaches have been proposed to exploit multilingual parallel corpora for improving translation quality.
Proceedings Article

Exploiting Multilingualism through Multistage Fine-Tuning for Low-Resource Neural Machine Translation

TL;DR: This paper reports on a systematic comparison of multistage fine-tuning configurations, confirming that multi-parallel corpora are extremely useful despite their scarcity and content-wise redundancy, thus exhibiting the true power of multilingualism.
Proceedings Article

Shata-Anuvadak: Tackling Multiway Translation of Indian Languages

TL;DR: A compendium of 110 Statistical Machine Translation systems built from parallel corpora of 11 Indian languages belonging to both Indo-Aryan and Dravidian families is presented and the relationship between translation accuracy and the language families involved is analyzed.
Proceedings Article

Universal Dependency Parsing for Hindi-English Code-switching

TL;DR: This paper presents a treebank of Hindi-English code-switching tweets under the Universal Dependencies scheme and proposes a neural stacking model for parsing that efficiently leverages the part-of-speech tag and syntactic tree annotations in the code-switching treebank and the preexisting Hindi and English treebanks.
Proceedings Article

Meaningless yet meaningful: Morphology grounded subword-level NMT

TL;DR: This paper proposes Morfessor-BPE (M-BPE), a combination of the two segmentation algorithms that outperforms both baseline systems in terms of BLEU score.