Proceedings Article

Deep Learning Based Unsupervised POS Tagging for Sanskrit

TLDR
A deep learning based approach that assigns POS tags to the words of an input text. It learns from the untagged Sanskrit corpus prepared by JNU and uses JNU's tagged corpus for tag assignment and for determining model accuracy.
Abstract
In this paper, we present a deep learning based approach that assigns POS tags to the words of an input text. Because no large annotated Sanskrit corpus exists, we propose an unsupervised approach and use the untagged Sanskrit corpus prepared by JNU. The only tagged Sanskrit corpus, also created by JNU, contains 115,000 words, which is not enough to apply supervised deep learning approaches, so we use it for assigning tags and for determining model accuracy. We explore several methods for representing each Sanskrit word as a point in a multi-dimensional vector space whose position captures the word's meaning and the semantic information associated with it, and we draw on additional data sources to improve the performance and robustness of these vector representations. We then explore autoencoder-based dimensionality reduction to compress these rich representations into encodings suitable for clustering in the vector space. We experiment with different dimensions for the compressed representations and present the one that offered the best clustering performance. To model the sequence and preserve its semantic information, we feed these embeddings to a bidirectional LSTM autoencoder. Finally, we assign a POS tag to each of the resulting clusters and report results by testing the model on the tagged corpus.
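
The abstract does not specify the autoencoder architecture used to compress the word vectors before clustering. As a minimal sketch, assuming a single linear encoder and decoder with no biases or nonlinearities (a simplification; the paper's autoencoders may well be deeper), the pure-Python code below trains the compression step by stochastic gradient descent on squared reconstruction error. The function names and the toy 4-dimensional vectors are hypothetical stand-ins for real Sanskrit word embeddings.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def train_linear_autoencoder(data, code_dim, epochs=200, lr=0.01, seed=0):
    """Hypothetical helper: fit encoder E (code_dim x d) and decoder D (d x code_dim)
    by SGD on the per-example loss 0.5 * ||D E x - x||^2."""
    rng = random.Random(seed)
    d = len(data[0])
    enc = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(code_dim)]
    dec = [[rng.uniform(-0.1, 0.1) for _ in range(code_dim)] for _ in range(d)]
    for _ in range(epochs):
        for x in data:
            z = matvec(enc, x)                           # encoding z = E x
            x_hat = matvec(dec, z)                       # reconstruction D z
            err = [xh - xi for xh, xi in zip(x_hat, x)]  # dLoss/dx_hat
            # Backpropagate to the code with the old decoder: D^T err.
            grad_z = [sum(dec[i][j] * err[i] for i in range(d)) for j in range(code_dim)]
            for i in range(d):                           # dLoss/dD[i][j] = err[i] * z[j]
                for j in range(code_dim):
                    dec[i][j] -= lr * err[i] * z[j]
            for j in range(code_dim):                    # dLoss/dE[j][k] = grad_z[j] * x[k]
                for k in range(d):
                    enc[j][k] -= lr * grad_z[j] * x[k]
    return enc, dec


def encode(enc, x):
    """Compressed representation that would be handed on to clustering."""
    return matvec(enc, x)


def reconstruction_loss(data, enc, dec):
    """Mean squared reconstruction error over the data set."""
    total = 0.0
    for x in data:
        x_hat = matvec(dec, matvec(enc, x))
        total += sum((a - b) ** 2 for a, b in zip(x_hat, x))
    return total / len(data)


def toy_vectors(n, seed=1):
    """Toy 4-d vectors lying on a 2-d subspace (stand-ins for word embeddings)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
        out.append([a, b, a + b, a - b])
    return out


data = toy_vectors(50)
untrained = train_linear_autoencoder(data, code_dim=2, epochs=0)  # same seed, no training
enc, dec = train_linear_autoencoder(data, code_dim=2)
print("loss before:", reconstruction_loss(data, *untrained))
print("loss after: ", reconstruction_loss(data, enc, dec))
print("code for first vector:", encode(enc, data[0]))
```

At its optimum, a linear autoencoder like this spans the same subspace as PCA, so a nonlinear encoder is needed to gain anything beyond that. Varying `code_dim` corresponds to the abstract's experiment with different dimensions of the compressed representations.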
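
The final step, assigning a POS tag to each cluster and scoring the result against the tagged corpus, can be sketched as follows. The abstract does not say which clustering algorithm or which cluster-to-tag rule is used; this sketch assumes plain k-means (with a simple deterministic farthest-point initialization) and a majority-vote, many-to-one mapping in which each cluster takes the most frequent gold tag among its members. The toy 2-D points stand in for compressed word encodings, and NOUN/VERB are placeholder tags, not the JNU tagset.

```python
from collections import Counter, defaultdict


def sq_dist(p, q):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_point_init(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the point
    farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, max_iters=100):
    """Plain Lloyd's k-means; returns a cluster id for every point."""
    centroids = farthest_point_init(points, k)
    assign = None
    for _ in range(max_iters):
        new_assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_assign == assign:   # converged: assignments stopped changing
            break
        assign = new_assign
        for c in range(k):         # move each centroid to the mean of its members
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def map_clusters_to_tags(cluster_ids, gold_tags):
    """Many-to-one mapping: each cluster gets the most frequent gold tag of its members."""
    votes = defaultdict(Counter)
    for c, t in zip(cluster_ids, gold_tags):
        votes[c][t] += 1
    return {c: counts.most_common(1)[0][0] for c, counts in votes.items()}


def tagging_accuracy(cluster_ids, gold_tags, mapping):
    """Fraction of tokens whose cluster's tag matches the gold tag."""
    correct = sum(mapping[c] == t for c, t in zip(cluster_ids, gold_tags))
    return correct / len(gold_tags)


points = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [11, 11]]
gold = ["NOUN", "NOUN", "NOUN", "VERB", "VERB", "VERB", "VERB", "VERB"]
clusters = kmeans(points, k=2)
mapping = map_clusters_to_tags(clusters, gold)
print(mapping, tagging_accuracy(clusters, gold, mapping))
```

On this toy data the script prints `{0: 'NOUN', 1: 'VERB'} 0.875`: the one point in the first group whose gold tag is VERB is outvoted. Because the mapping is fitted on the same tagged data used for scoring, many-to-one accuracy tends to rise as the number of clusters grows, so results for different cluster counts are not directly comparable.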

Citations
Journal Article

Part of speech tagging: a systematic review of deep learning and machine learning approaches

TL;DR: A comprehensive review of recent POS tagging articles that discusses the strengths and weaknesses of the proposed approaches, highlights various research gaps, and presents recommendations for future research on DL- and ML-based POS tagging.
Book Chapter

Peer Analysis of “Sanguj” with Other Sanskrit Morphological Analyzers

TL;DR: Here, 328 Sanskrit words are tested on four morphological analyzers, namely Samsaadhanii, the JNU and TDIL morphological analyzers (available online), and the locally developed and installed Sanguj morphological analyzer.
Journal Article

Rule based approach for compound segmentation and paraphrase generation in Sanskrit

TL;DR: A rule-based approach is implemented that uses a set of rules for compound type identification and paraphrase generation; it achieves an accuracy of 83%, compared with 77% for the existing machine learning based system.
Proceedings Article

Kannada Grammar Checker Using LSTM Neural Network

TL;DR: A model is proposed that uses deep learning to train an LSTM (Long Short-Term Memory) neural network on a large data set to perform the required classification, relying on context-based retention of the data obtained through Word2Vec and implemented with the TensorFlow and Keras packages.