scispace - formally typeset
Search or ask a question
Author

K. Usha

Bio: K. Usha is an academic researcher from Pondicherry Engineering College. The author has contributed to research in topics: Hidden Markov model & Malayalam. The author has co-authored 1 publications.

Papers
More filters
Book ChapterDOI
TL;DR: This work proposed a comparison of the Malayalam POS tagger using the SVM and Hidden Markov model (HMM); the tagset used was the popular Bureau of Indian Standard (BIS) tag set.
Abstract: Many Parts Of Speech (POS) taggers for the Malayalam language has been implemented using Support Vector Machine (SVM), Memory-Based Language Processing (MBLP), Hidden Markov Model (HMM) and other similar techniques. The objective was to find an improved POS tagger for the Malayalam language. This work proposed a comparison of the Malayalam POS tagger using the SVM and Hidden Markov model (HMM). The tagset used was the popular Bureau of Indian Standard (BIS) tag set. A manually created data set which has around 52,000 words has been taken from various Malayalam news sites. The preprocessing steps that have done for news text are also mentioned. Then POS tagging has been done using SVM and HMM. As POS tagging requires the extraction of multiple class labels, a multi-class SVM is used. It also performs feature extraction, feature selection, and classification. The word sense disambiguation and misclassification of words are the two major issues identified in SVM. Hidden Markov Model predicts the hidden sequence based on maximum observation likelihood which reduces ambiguity and misclassification rate.

1 citations


Cited by
More filters
Proceedings ArticleDOI
01 Jun 2022
TL;DR: This paper presents the effort in developing a Maithili Part of Speech (POS) tagger, which employs several recurrent neural networks (RNN) based models, including Long-short Term Memory (LSTM), Gated Recurrent Unit (GRU), LSTM with a CRF layer (L STM-CRF), and GRU with aCRF layer(s) and performs a comparative study.
Abstract: This article presents our effort in developing a Maithili Part of Speech (POS) tagger. Substantial effort has been devoted to developing POS taggers in several Indian languages, including Hindi, Bengali, Tamil, Telugu, Kannada, Punjabi, and Marathi; but Maithili did not achieve much attention from the research community. Maithili is one of the official languages of India, with around 50 million native speakers. So, we worked on developing a POS tagger in Maithili. For the development, we use a manually annotated in-house Maithili corpus containing 56,126 tokens. The tagset contains 27 tags. We train a conditional random fields (CRF) classifier to prepare a baseline system that achieves an accuracy of 82.67%. Then, we employ several recurrent neural networks (RNN)-based models, including Long-short Term Memory (LSTM), Gated Recurrent Unit (GRU), LSTM with a CRF layer (LSTM-CRF), and GRU with a CRF layer (GRU-CRF) and perform a comparative study. We also study the effect of both word embedding and character embedding in the task. The highest accuracy of the system is 91.53%.

1 citations