
Showing papers on "Devanagari published in 2022"


Journal ArticleDOI
TL;DR: In this article, the authors focus on analyzing hate speech in Hindi-English code-switched language and explore transformation techniques to capture precise text representation, which shows a significant improvement in state-of-the-art scores on all three datasets.
Abstract: Warning: This manuscript may contain upsetting language. Social media has become a bedrock for people to voice their opinions worldwide. Due to the greater sense of freedom that comes with anonymity, it is possible to disregard social etiquette online and attack others without facing severe consequences, inevitably propagating hate speech. The current measures to sift online content and offset the spread of hatred do not go far enough. One contributing factor is the prevalence of regional languages on social media and the paucity of language-flexible hate speech detectors. The proposed work focuses on analyzing hate speech in Hindi–English code-switched language. Our method explores transformation techniques to capture precise text representation. To preserve the structure of the data and yet use it with existing algorithms, we developed ‘MoH’ (Map Only Hindi), which means ‘love’ in Hindi. The ‘MoH’ pipeline consists of language identification and Roman-to-Devanagari Hindi transliteration using a knowledge base of Roman Hindi words, and finally employs the fine-tuned Multilingual BERT and MuRIL language models. We conducted several quantitative experimental studies on three datasets and evaluated performance using Precision, Recall, and F1 metrics. The first experiment studies the performance of ‘MoH’-mapped text with classical machine learning models and shows an average increase of 13% in F1 scores. The second compares the proposed work’s scores with those of the baseline models and shows a rise in performance of 6%. Finally, the third compares the proposed ‘MoH’ technique with various data simulations using an existing transliteration library. Here, ‘MoH’ outperforms the rest by 15%. Our results demonstrate a significant improvement in the state-of-the-art scores on all three datasets.

18 citations
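As a rough illustration of the pipeline described above, the sketch below strings together a word-level Roman-to-Devanagari lookup and a MuRIL sequence classifier. The lookup table, the language-mapping heuristic, and the two-label head are placeholders standing in for the paper's knowledge base and fine-tuned models; this is not the authors' implementation.

```python
# Sketch of a MoH-style pipeline: Roman->Devanagari mapping from a small
# lookup table, followed by classification with MuRIL.
# ROMAN_TO_DEVANAGARI and the untrained 2-label head are illustrative
# stand-ins for the paper's knowledge base and fine-tuned models.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical knowledge base: Roman Hindi word -> Devanagari spelling.
ROMAN_TO_DEVANAGARI = {"pyaar": "प्यार", "nahi": "नहीं", "dost": "दोस्त"}

def transliterate(tokens):
    """Map Roman Hindi tokens to Devanagari; leave English/unknown tokens as-is."""
    return [ROMAN_TO_DEVANAGARI.get(tok.lower(), tok) for tok in tokens]

tokenizer = AutoTokenizer.from_pretrained("google/muril-base-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "google/muril-base-cased", num_labels=2  # hate / non-hate
)

def classify(sentence):
    text = " ".join(transliterate(sentence.split()))
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return int(logits.argmax(dim=-1))

print(classify("tum mere dost nahi ho"))  # head is untrained: output is illustrative only
```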


Journal ArticleDOI
TL;DR: In this paper, a Convolutional Neural Network (CNN) was used for the digitization and recognition of Devanagari handwritten text (DHTR), using a dataset of 46 character classes with two thousand different images per class.

12 citations
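A minimal Keras sketch of a CNN classifier for a 46-class handwritten character dataset of the kind described above; the 32×32 grayscale input size and the layer widths are assumptions for illustration, not the paper's reported architecture.

```python
# Small CNN for 46-class Devanagari handwritten character recognition (sketch).
# Assumes 32x32 grayscale inputs; not the paper's exact architecture.
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(32, 32, 1)),
    layers.Conv2D(32, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(46, activation="softmax"),  # one unit per character class
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit(x_train, y_train, epochs=10, validation_split=0.1)  # with your data
```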


Book ChapterDOI
01 Jan 2022
TL;DR: This article evaluated CNN, LSTM, ULMFiT, and BERT based models on two publicly available Marathi text classification datasets and presented a comparative analysis, using the pre-trained fastText word embeddings from Facebook and IndicNLP in conjunction with the word-based models.
Abstract: The Marathi language is one of the prominent languages used in India. It is predominantly spoken by the people of Maharashtra. Over the past decade, the usage of the language on online platforms has tremendously increased. However, research on Natural Language Processing (NLP) approaches for Marathi text has not received much attention. Marathi is a morphologically rich language and uses a variant of the Devanagari script in its written form. This work aims to provide a comprehensive overview of available resources and models for Marathi text classification. We evaluate CNN, LSTM, ULMFiT, and BERT based models on two publicly available Marathi text classification datasets and present a comparative analysis. The pre-trained Marathi fastText word embeddings by Facebook and IndicNLP are used in conjunction with word-based models. We show that basic single-layer models based on CNN and LSTM coupled with fastText embeddings perform on par with the BERT based models on the available datasets. We hope our paper aids focused research and experiments in the area of Marathi NLP.

7 citations
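The single-layer CNN plus pre-trained embeddings setup evaluated in the chapter can be sketched as below; the embedding matrix is a random stand-in for vectors that would normally be loaded from the Facebook or IndicNLP fastText releases, and the vocabulary size, sequence length, and class count are assumed values.

```python
# Single-layer CNN text classifier over pre-trained word embeddings (sketch).
# `embedding_matrix` is a random stand-in for fastText vectors; vocab size,
# sequence length, and class count are illustrative, not the chapter's setup.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

vocab_size, embed_dim, max_len, num_classes = 20000, 300, 128, 3
embedding_matrix = np.random.normal(size=(vocab_size, embed_dim))  # placeholder

model = models.Sequential([
    layers.Input(shape=(max_len,), dtype="int32"),
    layers.Embedding(
        vocab_size, embed_dim,
        embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
        trainable=False),                      # frozen pre-trained embeddings
    layers.Conv1D(128, 5, activation="relu"),  # single convolutional layer
    layers.GlobalMaxPooling1D(),
    layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```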


Journal ArticleDOI
30 Apr 2022
TL;DR: This work collected 3900 distorted Hindi characters, extracted six different types of features from them to analyze recognition accuracy, and achieved a maximum recognition accuracy of 91.1%.

6 citations


Journal ArticleDOI
TL;DR: In this article, the authors propose a new view of handwritten numerals based on the hypothesis that handwritten characters are distinct deformations of their printed forms, which leads to an easier recognition task with higher accuracy when handwritten numeral images are superimposed onto the corresponding printed numeral images.

4 citations
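One way to read the superimposition hypothesis is as a pixel-overlap comparison between a handwritten numeral and printed templates; the sketch below illustrates that reading only and is not the authors' algorithm.

```python
# Illustrative template-matching sketch: score a handwritten numeral against
# printed numeral templates by pixel overlap after binarization.
# This is an interpretation of the superimposition idea, not the paper's method.
import numpy as np

def overlap_score(handwritten, template, threshold=0.5):
    """Overlap (intersection over union) of 'on' pixels in two same-sized images."""
    h = handwritten > threshold
    t = template > threshold
    union = np.logical_or(h, t).sum()
    return np.logical_and(h, t).sum() / union if union else 0.0

def recognize(handwritten, printed_templates):
    """printed_templates: dict mapping digit label -> printed numeral image."""
    scores = {label: overlap_score(handwritten, tmpl)
              for label, tmpl in printed_templates.items()}
    return max(scores, key=scores.get)  # label of the best-matching template
```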




Book ChapterDOI
01 Jan 2022
TL;DR: An attempt has been made to construct and evaluate a simple individual learning algorithm using Keras to recognize isolated Devanagari handwritten character datasets and to assess the impact of parameter variations in the learning phase.
Abstract: Recognizing handwritten characters and scanned data/images remains a complicated task. The different sizes and writing styles of the characters play a critical role in identifying handwritten characters correctly. The massive prevalence of this script must be addressed with advanced technologies that connect it to the real world at greater depth. Machine learning is one of the most popular technologies and has attracted much recent research on handwritten character recognition using A.I. techniques. Various new technologies have been developed to execute fast neural networks with little exhaustive knowledge requirement. Here, we use the Keras and Python libraries to build our model. The main aim of the CNN is to learn from the training data and fit it into models that can help human beings. In this paper, an attempt has been made to construct and evaluate a simple individual learning algorithm (such as k-means and SVM) using Keras to recognize isolated Devanagari handwritten character datasets and to assess the impact of parameter variations in the learning phase. The proposed methodology gives better results: the accuracy is better than the performance of the individual algorithms.

Book ChapterDOI
13 Jul 2022
TL;DR: In this article, a no-segmentation approach was proposed for handwritten Devanagari word recognition, using a method analogous to the human reading strategy.
Abstract: Handwritten script recognition is an important application in the machine learning domain and is gaining more importance due to its numerous uses, such as automatic postal card sorting, digital signature verification, and processing of historical documents; it also helps in developing applications for visually impaired people. English is the most widely spoken language and has witnessed much research on reading script by machine. Devanagari is another such script, used by a great number of people in the Indian subcontinent. This chapter proposes Handwritten Devanagari Word Recognition (HDWR) using a method that is more analogous to the human reading strategy. In contrast to the traditional method, this chapter encourages a no-segmentation approach. Two novel approaches, Scan Profile and Sliding Window, are introduced to eliminate segmentation. First, the input word is passed to the proposed no-segmentation approach, where the window is defined using different methods. Second, each window is passed to the classifier for recognition and the result is saved. There is no need to wait for the complete image to be segmented: each window is recognized immediately, without knowing in advance what it contains, by our designed state-of-the-art ResNet classifier, much like human reading. After one window is recognized, the window moves on by the calculated stride to read the complete word. The proposed HDWR model successfully recognized Devanagari words with an accuracy of 86%.
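The sliding-window part of the approach can be sketched as a loop that crops fixed-width windows at a given stride and classifies each crop independently; the window width, stride, and classifier interface below are assumptions, not the chapter's Scan Profile or ResNet settings.

```python
# Sliding-window recognition sketch for a handwritten Devanagari word image.
# `classifier` is any model with a predict() method over (1, H, W, 1) crops;
# the window width and stride are illustrative values.
import numpy as np

def sliding_window_recognize(word_image, classifier, win_width=32, stride=16):
    """word_image: 2D grayscale array of shape (H, W), values in [0, 1]."""
    height, width = word_image.shape
    predictions = []
    for x in range(0, max(width - win_width, 0) + 1, stride):
        window = word_image[:, x:x + win_width]
        window = window[np.newaxis, ..., np.newaxis]   # shape (1, H, win_width, 1)
        probs = classifier.predict(window, verbose=0)
        predictions.append(int(np.argmax(probs)))      # class index per window
    return predictions  # post-processing (merging repeated labels, etc.) omitted
```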

Book ChapterDOI
01 Jan 2022
TL;DR: In this paper, a convolutional neural network model was used for the recognition of handwritten Devanagari compound characters and achieved a highest accuracy of 100% on the authors' dataset.
Abstract: Character recognition is a challenging research topic due to its diverse applicable environments. Numerous studies on Devanagari basic characters have been conducted, but due to the associated difficulties, research on handwritten compound characters has received very little attention. The problem becomes much more complicated as a result of different authors' writing styles and moods. The traditional machine learning approach to character recognition focuses more on feature extraction, whereas the deep learning approach is a subset of machine learning that uses deep neural networks for learning. For the current research work, we created our own dataset of handwritten Devanagari compound characters. Our dataset has 5000 instances of 50 classes of compound characters collected from various writers of different age groups. This paper presents a convolutional neural network model for the recognition of Devanagari compound characters. We implemented the ResNet model of CNN and used ReLU as the activation function, as it effectively trains deep neural networks. We implemented three-layer, four-layer, and five-layer CNNs on our dataset and compared their results. We achieved a highest accuracy of 100% on our dataset. Keywords: Handwritten character recognition, Devanagari compound characters, CNN, ResNet, ReLU
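A minimal Keras sketch of a ResNet-style residual block with ReLU of the kind the paper builds on; the filter counts and input size are assumptions, and only the 50-class output head follows the dataset description, not the authors' exact layer configuration.

```python
# ResNet-style residual block with ReLU, as a sketch of the kind of model
# used for 50-class Devanagari compound-character recognition.
# Filter counts and input size are assumptions, not the paper's exact design.
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters):
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    if shortcut.shape[-1] != filters:                   # match channel count
        shortcut = layers.Conv2D(filters, 1, padding="same")(shortcut)
    y = layers.Add()([y, shortcut])                     # skip connection
    return layers.Activation("relu")(y)

inputs = tf.keras.Input(shape=(32, 32, 1))
x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
x = residual_block(x, 32)
x = layers.MaxPooling2D()(x)
x = residual_block(x, 64)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(50, activation="softmax")(x)     # 50 compound-character classes
model = tf.keras.Model(inputs, outputs)
```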




Journal ArticleDOI
01 Dec 2022
TL;DR: In this paper, the authors propose an efficient compact classification model called "DS-P3SNet", along with knowledge distillation (KD) and transfer learning (TL), to mitigate problems such as computational complexity and overfitting.
Abstract: Deep convolutional neural network and ensemble-variant-based classification methods for P300 in the Devanagari script (DS)-based P300 speller (DS-P3S) have generated numerous training parameters. This is likely to increase problems such as computational complexity and overfitting. Recent attempts by researchers to overcome these problems further deteriorate the accuracy due to dense connectivity and channel-mix group convolution. Moreover, compressing the deep models in these attempts has also been found to lose vital information. Therefore, to mitigate these problems, an efficient compact classification model called “DS-P3SNet”, along with knowledge distillation (KD) and transfer learning (TL), is proposed in this article. It includes: 1) extraction of rich morphological information across the temporal region; 2) a combination of channelwise and channel-mix depthwise convolution (C2-DwCN) for efficient channel selection and extraction of spatial information with fewer trainable parameters; 3) channelwise convolution (Cw-CN) for classification to provide sparse connectivity; 4) knowledge distillation to reduce the tradeoff between accuracy and the number of trainable parameters; and 5) subject-to-subject transfer of learning to reduce subject variability. Trial-to-trial transfer of learning reduces the tradeoff between the number of trials and accuracy. The experiments were performed on a self-generated dataset of 20 words comprising 79 DS characters, collected from ten healthy volunteer subjects. Average accuracies of 95.32 ± 0.85% and 94.64 ± 0.68% were obtained for subject-dependent and subject-independent experiments, respectively. The trainable parameters were also reduced approximately 2–34 times compared to existing models, with improved or equivalent performance.
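The depthwise and channelwise (pointwise) convolution ingredients mentioned above can be sketched in Keras as follows; the EEG channel count, window length, and filter numbers are illustrative assumptions, not the DS-P3SNet configuration.

```python
# Depthwise + pointwise (1x1) convolution block over EEG data (sketch).
# Input layout assumed as (EEG channels, time samples, 1); the channel count,
# window length, and filter sizes are illustrative, not DS-P3SNet's values.
import tensorflow as tf
from tensorflow.keras import layers

n_channels, n_samples = 8, 160                      # assumed EEG montage and window
inputs = tf.keras.Input(shape=(n_channels, n_samples, 1))
# Temporal convolution: extracts morphological information along the time axis.
x = layers.Conv2D(8, (1, 32), padding="same", activation="elu")(inputs)
# Depthwise convolution across EEG channels (spatial filtering, few parameters).
x = layers.DepthwiseConv2D((n_channels, 1), depth_multiplier=2, activation="elu")(x)
# Pointwise (1x1) convolution mixes feature maps channel-wise.
x = layers.Conv2D(16, (1, 1), activation="elu")(x)
x = layers.AveragePooling2D((1, 4))(x)
x = layers.Flatten()(x)
outputs = layers.Dense(2, activation="softmax")(x)  # P300 vs non-P300
model = tf.keras.Model(inputs, outputs)
```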




Journal ArticleDOI
TL;DR: A finger-point-based sign language symbol-to-text identification and classification algorithm based on RGB image datasets is presented, which provides an excellent classification rate and promises progress for research in the near future.
Abstract: In this paper, we present a finger-point-based sign language symbol-to-text identification and classification algorithm based on RGB image datasets. Palm-sized images with different sizes, backgrounds, and orientations are captured and preprocessed as required for developing a convolutional neural network based algorithm. The algorithm utilizes AlexNet for the preprocessing requisites, wherein 47 symbols of the Devanagari script are augmented based on the reference rulebook created for our requirements, as highlighted in the paper. At the primary level, this algorithm provides an excellent classification rate, which promises progress for our research in the near future. We have provided detailed steps and a discussion of the classification parameters considered for our algorithm, which is implemented on the MATLAB platform with the help of machine learning solution libraries.

Journal ArticleDOI
TL;DR: In this paper, the authors present handwritten isolated characters of the Devanagari script, which contains ten numerals, 13 vowels, and 33 consonants; the collected samples are digitized and pre-processed.

Proceedings ArticleDOI
24 Nov 2022
TL;DR: In this article, single-trial P300 detection using a compact CNN architecture with dilated convolution (D-EEGNet) was proposed, which achieved a classification accuracy of 80.86% for a Devanagari script-based P300 speller.
Abstract: The P300 speller is a well-known Brain-Computer Interface (BCI) application that allows users to spell words using cognitive ability and establishes a pathway between the human mind and a computer. P300 detection is the most crucial stage in the design of the P300 character speller. However, present Convolutional Neural Network (CNN) architectures hinder the use of CNNs in portable BCIs, as they restrict future accuracy improvements of P300 detection and require significant complexity to attain competitive accuracy. Furthermore, the multi-trial approach adopted in most recent works is a major bottleneck in the real-time implementation of such a speller. To deal with both issues, the authors propose single-trial P300 detection using a compact CNN architecture with dilated convolution (D-EEGNet). The proposed model, with 1066 parameters, achieves a classification accuracy of 80.86% for a Devanagari script-based P300 speller. Apart from lessening the trainable parameters, D-EEGNet also reduces computational complexity. Moreover, the proposed model demonstrates the ability to deal with the high variance often encountered in single-trial detection.
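A minimal sketch of how dilated convolution widens the temporal receptive field cheaply, in the spirit of D-EEGNet; the shapes and dilation rate are assumed for illustration and do not reproduce the 1066-parameter model.

```python
# Dilated temporal convolution over EEG (sketch): a dilation rate of 4 widens
# the receptive field along the time axis without adding parameters.
# Shapes and rates are illustrative; this is not the D-EEGNet architecture.
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(8, 160, 1))            # (EEG channels, samples, 1)
x = layers.Conv2D(4, (1, 16), dilation_rate=(1, 4),   # dilated along time only
                  padding="same", activation="elu")(inputs)
x = layers.DepthwiseConv2D((8, 1), activation="elu")(x)  # spatial filter over channels
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(2, activation="softmax")(x)    # P300 vs non-P300
model = tf.keras.Model(inputs, outputs)
```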



Proceedings ArticleDOI
26 Nov 2022
TL;DR: In this article, transfer learning is used for feature extraction and classification of handwritten Devanagari characters: a Deep CNN combined with a Support Vector Machine (SVM) is used, where the features extracted by the Deep CNN are fed to the SVM.
Abstract: This paper shows how transfer learning can be utilized in the recognition of handwritten Devanagari characters. A Deep Convolutional Neural Network (Deep CNN) is used for feature extraction and classification, and a Deep CNN combined with a Support Vector Machine (SVM) is used as a transfer-learning approach, where the features extracted by the Deep CNN are fed to the SVM. Experiments conducted on a real-world dataset show that the proposed approach not only achieved 99.413% accuracy on the test set but also reduced the number of parameters used for classification.
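The feature-extractor-plus-SVM idea can be sketched by taking activations from an intermediate layer of a CNN and fitting a scikit-learn SVM on them; the CNN below is an untrained stand-in with placeholder data and layer names, not the paper's trained network.

```python
# Deep-CNN features fed to an SVM classifier (sketch).
# `cnn` stands in for a trained Devanagari character CNN; here it is untrained
# and used only to show the feature-extraction wiring.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers
from sklearn.svm import SVC

cnn = tf.keras.Sequential([
    layers.Input(shape=(32, 32, 1)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu", name="feature_layer"),
    layers.Dense(46, activation="softmax"),
])
# Feature extractor: everything up to the penultimate dense layer.
extractor = tf.keras.Model(cnn.input, cnn.get_layer("feature_layer").output)

x_train = np.random.rand(100, 32, 32, 1)   # placeholder images
y_train = np.random.randint(0, 46, 100)    # placeholder labels
features = extractor.predict(x_train, verbose=0)
svm = SVC(kernel="rbf").fit(features, y_train)
print(svm.score(features, y_train))        # training accuracy on placeholder data
```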

Book ChapterDOI
01 Jan 2022
TL;DR: In this article, the performance of three thinning algorithms developed by Zhang-Suen [ZSu], Guo-Hall [GHa], and Lee-Kashyap-Chu [LKC] is analyzed to check their suitability for skeletonizing handwritten Devanagari words in terms of various objective (reduction rate, sensitivity measurement, and thinness measurement) and subjective (mean opinion score) performance metrics.
Abstract: Thinning, or skeletonization, is a useful preprocessing step in pattern recognition systems such as character or word recognition systems. In the past few decades, various thinning algorithms have been reported in the literature for such systems, to make them more reliable, independent of font variations, and efficient. In this paper, the performance of three thinning algorithms developed by Zhang-Suen [ZSu], Guo-Hall [GHa], and Lee-Kashyap-Chu [LKC] has been analyzed to check their suitability for skeletonizing handwritten Devanagari words, in terms of various objective (reduction rate, sensitivity measurement, and thinness measurement) and subjective (mean opinion score) performance metrics. For the present work, the performance of these algorithms has been tested using a handwritten Devanagari word database with 15 word classes, collected from hundreds of writers. It has been observed that the [LKC] thinning algorithm achieved a higher reduction rate, thinness measurement, and mean opinion score compared with the [ZSu] and [GHa] algorithms, which shows its greater suitability for skeletonizing handwritten Devanagari words. Moreover, the slightly higher value of the sensitivity measurement also indicates that the resultant skeleton may contain some artifacts, redundant branches, and lines caused by noise. Keywords: Thinning, Skeleton, Feature extraction, Classification, Handwritten word recognition
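All three algorithms are available in scikit-image, so the reduction-rate comparison can be sketched directly: skeletonize() uses the Zhang-Suen method, skeletonize(method="lee") the Lee-Kashyap-Chu method, and thin() the Guo-Hall method. The word image below is a placeholder, and the reduction-rate formula is one common definition rather than necessarily the chapter's exact metric.

```python
# Compare thinning algorithms on a binary word image using scikit-image.
# skeletonize() -> Zhang-Suen, skeletonize(method="lee") -> Lee-Kashyap-Chu,
# thin() -> Guo-Hall. The reduction-rate formula is one common definition,
# not necessarily the exact metric used in the chapter.
import numpy as np
from skimage.morphology import skeletonize, thin

def reduction_rate(image, skeleton):
    """Fraction of foreground pixels removed by thinning."""
    image, skeleton = image > 0, skeleton > 0   # force both to strictly binary
    return 1.0 - skeleton.sum() / image.sum()

word = np.zeros((40, 120), dtype=bool)
word[12:28, 10:110] = True                      # placeholder "stroke", not real data

skeletons = {
    "Zhang-Suen": skeletonize(word),
    "Lee-Kashyap-Chu": skeletonize(word, method="lee"),
    "Guo-Hall": thin(word),
}
for name, skel in skeletons.items():
    print(f"{name}: reduction rate = {reduction_rate(word, skel):.3f}")
```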