Deep Learning Based Unsupervised POS Tagging for Sanskrit

doi:10.1145/3302425.3302487

Proceedings Article


Prakhar Srivastava and five co-authors


TLDR

An unsupervised, deep learning based approach that assigns POS tags to the words of an input text. It is trained on the untagged Sanskrit corpus prepared by JNU and uses JNU's tagged Sanskrit corpus for tag assignment and for determining model accuracy.

Abstract:

In this paper, we present a deep learning based approach that assigns POS tags to the words of an input text. Owing to the lack of a large annotated Sanskrit corpus, we propose an unsupervised approach and use the untagged Sanskrit corpus prepared by JNU. The only tagged corpus for Sanskrit, also created by JNU, contains 115,000 words, which is not enough to apply supervised deep learning approaches; we use this tagged corpus instead for tag assignment and for determining model accuracy. We explore various methods for representing each Sanskrit word as a point in a multi-dimensional vector space whose position captures the word's meaning and its associated semantic information. We also explore other data sources to improve the performance and robustness of these vector representations. We then explore autoencoder based approaches for dimensionality reduction, compressing these rich vector representations into encodings that are suitable for clustering in the vector space. We experiment with different dimensions for these compressed representations and present the one that offered the best clustering performance. To model the sequence and preserve its semantic information, we feed these embeddings to a bidirectional LSTM autoencoder. Finally, we assign a POS tag to each of the obtained clusters and report results by testing the model on the tagged corpus.
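The abstract does not name the clustering algorithm or say exactly how a POS tag is attached to each cluster. A common convention in unsupervised POS tagging is "many-to-one" evaluation: each cluster is labelled with the gold tag it co-occurs with most often in a tagged corpus, and accuracy is the fraction of tokens whose cluster label matches their gold tag. The sketch below assumes plain k-means on the compressed word encodings followed by that majority-vote mapping. It is an illustration under those assumptions, not the authors' implementation; the function names, the toy 2-D encodings, and the tag labels (NOUN, VERB, ADJ) are hypothetical and do not come from the paper or from the JNU tagset.

```python
import math
from collections import Counter, defaultdict


def kmeans(points, k, iterations=20):
    """Plain k-means (assumed clusterer); centroids start at the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each encoding goes to its nearest centroid (Euclidean).
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its current members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def many_to_one_mapping(cluster_ids, gold_tags):
    """Label each cluster with the gold tag it co-occurs with most often."""
    counts = defaultdict(Counter)
    for cluster, tag in zip(cluster_ids, gold_tags):
        counts[cluster][tag] += 1
    # most_common(1) returns [(tag, count)]; ties go to the tag counted first.
    return {cluster: c.most_common(1)[0][0] for cluster, c in counts.items()}


def tagging_accuracy(cluster_ids, gold_tags, mapping):
    """Fraction of tokens whose cluster's mapped tag equals the gold tag."""
    if not gold_tags:
        return 0.0
    correct = sum(1 for c, t in zip(cluster_ids, gold_tags) if mapping.get(c) == t)
    return correct / len(gold_tags)


if __name__ == "__main__":
    # Hypothetical compressed encodings for six tokens and their gold tags.
    encodings = [(0.0, 0.1), (5.0, 5.1), (9.8, 0.1),
                 (0.2, 0.0), (5.2, 4.9), (0.1, 0.2)]
    gold = ["NOUN", "VERB", "ADJ", "NOUN", "VERB", "VERB"]

    clusters = kmeans(encodings, k=3)
    mapping = many_to_one_mapping(clusters, gold)
    print(clusters)   # [0, 1, 2, 0, 1, 0]
    print(mapping)    # {0: 'NOUN', 1: 'VERB', 2: 'ADJ'}
    print(round(tagging_accuracy(clusters, gold, mapping), 3))  # 0.833
```

In this toy run the last token falls in the "NOUN" cluster but is gold-tagged VERB, so five of six tokens are correct. Note that many-to-one accuracy tends to rise as the number of clusters grows (in the limit, one cluster per token scores perfectly), which is one reason the number of clusters and the encoding dimension need to be tuned together rather than optimised for accuracy alone.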



