Journal Article

Indic language computing

TLDR
India’s Ministry of Human Resource Development (MHRD) wants lectures on Swayam and NPTEL, the online teaching platforms, to be translated into all Indian languages; automatic speech recognition and machine translation into Indian languages will be great enablers for the marginalized sections of society.
Abstract
In April 2019, following the Easter Sunday bomb attacks, the Government of Sri Lanka had to shut down Facebook and YouTube for nine days to stop the spread of hate speech and false news, posted mainly in the local languages Sinhala and Tamil. This came about simply because these social media platforms did not have the capability to detect and warn about the provocative content. India’s Ministry of Human Resource Development (MHRD) wants lectures on Swayam and NPTEL, the online teaching platforms, to be translated into all Indian languages. Approximately 2.5 million students use the Swayam lectures on computer science alone. The lectures are in English, which students find difficult to understand. A large number of lectures are manually subtitled in English. Automatic speech recognition and machine translation into Indian languages will be great enablers for the marginalized sections of society. Requirements like these are real and abundant.


Citations
Journal Article

ThamizhiMorph: A morphological parser for the Tamil language

TL;DR: In this article, an open source and extendable Morphological Analyser cum Generator (MAG) for Tamil named ThamizhiMorph is presented, which is designed using a Finite State Transducer (FST) and implemented using Foma.
Proceedings Article

How low is too low? A monolingual take on lemmatisation in Indian languages

TL;DR: It is shown that monolingual approaches with data augmentation can give competitive accuracy even in the low-resource setting, which augurs well for NLP in low-resource settings.
Proceedings Article

Finite-state script normalization and processing utilities: The Nisaba Brahmic library

TL;DR: In this article, the authors present an open-source library for efficient low-level processing of ten major South Asian Brahmic scripts, supporting operations such as NFC normalization, visual normalization, reversible transliteration, and validity checks, implemented in Python within a finite-state transducer formalism.
References
Posted Content

Neural Machine Translation by Jointly Learning to Align and Translate

TL;DR: In this paper, the authors propose to use a soft-searching model to find the parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
Book

Sentiment Analysis and Opinion Mining

TL;DR: Sentiment analysis and opinion mining is the field of study that analyzes people's opinions, sentiments, evaluations, attitudes, and emotions from written language. It is one of the most active research areas in natural language processing and is also widely studied in data mining, Web mining, and text mining.

Europarl: A Parallel Corpus for Statistical Machine Translation

Philipp Koehn
TL;DR: A corpus of parallel text in 11 languages from the proceedings of the European Parliament is collected, with a focus on its acquisition and its application as training data for statistical machine translation (SMT).
Book

South Asian Languages: A Syntactic Typology

TL;DR: The authors explored the similarities and differences of about forty languages from the four different language families (Austro-Asiatic, Dravidian, Indo-Aryan (Indo-European) and Tibeto-Burman (Sino-Tibetan)).
Proceedings Article

Shata-Anuvadak: Tackling Multiway Translation of Indian Languages

TL;DR: A compendium of 110 Statistical Machine Translation systems built from parallel corpora of 11 Indian languages belonging to both Indo-Aryan and Dravidian families is presented and the relationship between translation accuracy and the language families involved is analyzed.