Author

Diganta Baishya

Bio: Diganta Baishya is an academic researcher from Jorhat Engineering College. The author has contributed to research on the topics of deep learning and bigrams. The author has an h-index of 1 and has co-authored 3 publications receiving 1 citation.

Papers
Proceedings Article
27 Jan 2021
TL;DR: This article reviews advances in natural language processing for Assamese, a low-resource Indian language that is the official language of Assam, the gateway to North East India.
Abstract: Natural language processing has been a challenging area for researchers in recent times. Much research has been conducted to improve the performance of natural language processing, which has made many applications available for day-to-day life. Most of these developments have happened only for a few Western languages. Researchers have also published some quality work on Indian languages in recent times. However, research on most of the other languages spoken in India, the country with the second largest population in the world, is at a very early stage. This paper reviews the advances in this area for one low-resource Indian language, Assamese, the official language of Assam, the gateway to North East India. It is also spoken in many other states of the north-eastern part of India. In this work, we highlight research on the various methods that have been applied to Assamese text processing. It is observed that certain popular natural language processing methods are particularly well suited to certain characteristics of the Assamese language. The purpose of this review is to survey and report the present state and future possibilities of research in Assamese text processing. This paper should serve as a good starting point for anyone interested in carrying forward computational work on Assamese. We conclude the paper with a brief discussion of the future scope of Assamese text processing.

2 citations

Journal Article
TL;DR: This paper proposes a stochastic method and a deep neural network model to improve the accuracy of part-of-speech tagging for low-resource languages, with results reported for English and Assamese.
Abstract: Over the years, many different algorithms have been proposed to improve the accuracy of automatic part-of-speech tagging. High tagging accuracy is very important for any NLP application. Powerful models used for this purpose, such as the Hidden Markov Model (HMM), require a huge amount of training data and are also less accurate on unknown (unseen in training) words. Most of the world's languages lack enough resources in computable form to train such models. NLP applications for such languages also encounter many unknown words during execution, which results in low accuracy. Improving accuracy for such low-resource languages is an open problem. In this paper, one stochastic method and one deep learning model are proposed to improve accuracy for such languages. The proposed language-independent methods improve unknown-word accuracy and overall accuracy with a small amount of training data. First, character bigrams and trigrams that already occur in the training samples are used to calculate the maximum likelihood for tagging unknown words using the Viterbi algorithm and an HMM. With training datasets smaller than 10K, an accuracy improvement of 12% to 14% has been achieved. Next, a deep neural network model is proposed to work with a very small amount of training data. It uses word-level, character-level, character-bigram-level, and character-trigram-level representations to perform part-of-speech tagging with little available training data. The model improves the overall accuracy of the tagger as well as the accuracy for unknown words. Results for English and Assamese, a low-resource Indian language, are discussed in detail. Performance is better than that of many state-of-the-art techniques for low-resource languages. The method is generic and can be used for any language with very little training data.

1 citation
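
The character n-gram idea mentioned in the abstract above (using character bigrams and trigrams seen in training to tag unknown words) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy corpus, and the add-one smoothing are all assumptions made for illustration, and the sketch leaves out the HMM transition probabilities and Viterbi decoding that the paper combines with this step.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(word, sizes=(2, 3)):
    """Return all character bigrams and trigrams of a word."""
    return [word[i:i + n] for n in sizes for i in range(len(word) - n + 1)]


def train(tagged_words):
    """Count character n-grams per tag from (word, tag) pairs."""
    counts = defaultdict(Counter)
    for word, tag in tagged_words:
        counts[tag].update(char_ngrams(word.lower()))
    return counts


def guess_tag(word, counts):
    """Pick the tag whose n-gram counts best explain the word.

    Uses add-one smoothing (an assumption, not taken from the paper).
    """
    vocab = {g for c in counts.values() for g in c}
    grams = char_ngrams(word.lower())

    def log_likelihood(tag):
        total = sum(counts[tag].values())
        return sum(
            math.log((counts[tag][g] + 1) / (total + len(vocab)))
            for g in grams
        )

    return max(counts, key=log_likelihood)


# Hypothetical toy training data, far smaller than any real corpus.
toy_corpus = [
    ("running", "VERB"), ("jumping", "VERB"), ("singing", "VERB"),
    ("quickly", "ADV"), ("slowly", "ADV"), ("happily", "ADV"),
]
model = train(toy_corpus)
print(guess_tag("walking", model))
```

On this toy data the unseen word "walking" is assigned VERB because it shares the n-grams "in", "ng", and "ing" with the verb examples. In a full tagger, a score of this kind would stand in for the emission probability of an unknown word.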

Proceedings Article
05 Nov 2020
TL;DR: This article discusses various methods for reading data from unstructured prescriptions and converting it to a structured form, as well as the design and implementation of an application that reminds users to take their medication on time, as prescribed by the doctor.
Abstract: Health and medication are equally important for every section of society, rich or poor, educated or uneducated. However, doctors' prescriptions are not easy for many people to use. The challenge is to automatically read the data from advice slips or prescriptions and to make it understandable to all sections of society. With the advancement of computers, smartphones, and similar devices, and their use by all sections of society, designing applications that read unstructured data and present it to consumers in a structured and useful way is of great benefit to society at large. The information is not only presented to consumers but is also used to alert them about their medications and bodily needs in time. This article discusses various methods for reading data from unstructured prescriptions and converting it to a structured form. It also discusses the design and implementation of an application that reminds users to take their medication on time, as prescribed by the doctor.

1 citation


Cited by
Proceedings Article
27 Jan 2021
TL;DR: This article reviews advances in natural language processing for Assamese, a low-resource Indian language that is the official language of Assam, the gateway to North East India.
Abstract: Natural language processing has been a challenging area for researchers in recent times. Much research has been conducted to improve the performance of natural language processing, which has made many applications available for day-to-day life. Most of these developments have happened only for a few Western languages. Researchers have also published some quality work on Indian languages in recent times. However, research on most of the other languages spoken in India, the country with the second largest population in the world, is at a very early stage. This paper reviews the advances in this area for one low-resource Indian language, Assamese, the official language of Assam, the gateway to North East India. It is also spoken in many other states of the north-eastern part of India. In this work, we highlight research on the various methods that have been applied to Assamese text processing. It is observed that certain popular natural language processing methods are particularly well suited to certain characteristics of the Assamese language. The purpose of this review is to survey and report the present state and future possibilities of research in Assamese text processing. This paper should serve as a good starting point for anyone interested in carrying forward computational work on Assamese. We conclude the paper with a brief discussion of the future scope of Assamese text processing.

2 citations

Proceedings Article
23 Mar 2023
TL;DR: This paper investigates the effect of training size on part-of-speech tagging for English and Assamese, with experiments conducted to determine the training size required for standard techniques to perform with high accuracy.
Abstract: Advances in natural language processing research have made people's daily lives much easier, with numerous applications at their disposal. Traditional methods such as the Hidden Markov Model, the CRF classifier, and the Naive Bayes classifier are increasingly being replaced by neural networks. However, most of these methods work well only with huge amounts of training data and hence are not suitable for languages that are poor in trainable resources. The challenge is to make a system work well with minimal training. This paper presents research on the effect of training size on part-of-speech tagging, one of the preliminary tasks for any NLP application. Experiments are conducted to determine the training size required for standard techniques to perform with high accuracy. The results of the experiments conducted for English and Assamese are presented in this paper.