
Showing papers on "Devanagari" published in 2023



Posted ContentDOI
22 Mar 2023
TL;DR: In this article, a point-based algorithm for recognizing dynamic air writing and free-hand gestures for Devanagari Hindi characters is presented; it uses Mahalanobis distance to measure the similarity of the drawn gesture to the training samples.
Abstract: Air writing and gesture recognition has been one of the most intensively researched areas due to its wide variety of applications in several domains of Human-Computer Interaction (HCI), such as gaming, medicine, and automobiles. This article presents a highly accurate point-based algorithm for identifying dynamic air writing and free-hand gesture recognition for Devanagari Hindi characters. Usually, the written gestures are limited in nature. To create a training data set, we use a Leap Motion Controller (LMC) that generates twenty-four sample points from the gesture drawn. In the pre-processing stage, a trimesh is generated using these sample points; the developed trimesh is then fed as input to PointNet. The algorithm uses Mahalanobis distance to measure the similarity of the drawn gesture to the training samples in order to identify and recognize the gesture. The validation test indicates that the approach is quite accurate, giving a recognition rate of more than 97%. Comparative studies also show that the methodology performs better when a trimesh is used as input for PointNet and more training samples are used.
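The similarity measure named in this abstract can be illustrated with a minimal sketch. It assumes 2-D feature vectors and a nearest-class rule, neither of which is stated in the abstract; the function names `mean_and_cov`, `mahalanobis`, and `classify` are hypothetical and this is not the authors' pipeline (which feeds a trimesh to PointNet).

```python
import math

def mean_and_cov(points):
    """Return the mean and 2x2 sample covariance of a list of (x, y) points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))

def mahalanobis(x, mean, cov):
    """Mahalanobis distance sqrt((x - m)^T S^-1 (x - m)) for the 2-D case."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular covariance matrix")
    # Closed-form inverse of a 2x2 matrix.
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    q = dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)
    return math.sqrt(q)

def classify(x, training):
    """Assumed decision rule: pick the class whose samples are closest to x."""
    stats = {label: mean_and_cov(pts) for label, pts in training.items()}
    return min(stats, key=lambda label: mahalanobis(x, *stats[label]))
```

With an identity covariance the distance reduces to the Euclidean distance, which is a quick sanity check on the formula.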

1 citations


Journal ArticleDOI
TL;DR: In this paper, the authors used a CapsNet-based method to recognize handwritten Devanagari characters from manuscripts and achieved a best recognition accuracy of 94.6%.
Abstract: Manuscripts serve as a wealth of knowledge for future generations and are a useful source of information for locating material from the Middle Ages. Ancient manuscripts are found in handwritten form, so they must be converted into digital form so that computing equipment can access them and indexing and search operations can be performed with ease. Manuscript recognition is already possible using a variety of methods. Regional languages and scripts like Devanagari, Gurmukhi, Sanskrit, etc., however, have very few methods available. In this study, the Devanagari characters from the manuscripts are recognised using a CapsNet-based method. 33 fundamental characters, 3 conjuncts, and 12 modifiers make up the Devanagari alphabet. The complete dataset is divided into 399 classes for the recognition of basic, modifier, and conjunct characters. CapsNet is used to recognize the handwritten characters because of its handling of spatial relationships. The proposed model was run using 10:70, 20:80, and 30:70 as test:train ratios of characters. The number of epochs was also varied for better recognition accuracy. The authors observed a best recognition accuracy of 94.6% for Devanagari characters using CapsNet.

1 citations


Proceedings ArticleDOI
03 Mar 2023
TL;DR: This paper analyzes text representation for code-mixed and code-switched text in two languages written in separate scripts, such as English in Roman script and Hindi in Devanagari, within a mixed script.
Abstract: As social media networks have grown in prominence in recent years, we have seen a transformation in how we live our lives. People in multilingual societies are increasingly using social media platforms. Research communities have recently begun using code-mixed data to accomplish NLP tasks involving multiple languages. This paper analyzes text representation for code-mixed and code-switched text in two languages written in separate scripts, such as English in Roman script and Hindi in Devanagari, within a mixed script. It covers word-level language identification; quantifying which language a sentence or text is written in and which words are ambiguous (the same spelling with two or three meanings in a mixed script); and normalization of spelling variation in Romanized Hindi, where users write the same word with different spellings in daily communication, using word embedding techniques (word2vec, TF-IDF, skip-gram, continuous bag of words (CBOW)). An approach that uses character-based embedding to process ambiguous words in code-mixed text has been proposed and shows promising results in terms of spelling variation and language identification.
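As a point of reference for word-level identification in mixed-script text, the sketch below tags each token by script using the Unicode Devanagari block (U+0900 to U+097F). This is only a baseline assumed here for illustration: it cannot tell Romanized Hindi from English, which is the ambiguity the paper addresses with embeddings. The names `script_of` and `tag_sentence` are hypothetical.

```python
def script_of(word):
    """Label a token by the script most of its letters belong to."""
    counts = {"devanagari": 0, "latin": 0}
    for ch in word:
        if "\u0900" <= ch <= "\u097f":      # Unicode Devanagari block
            counts["devanagari"] += 1
        elif ch.isascii() and ch.isalpha():  # basic Latin letters
            counts["latin"] += 1
    if counts["devanagari"] == 0 and counts["latin"] == 0:
        return "other"                       # digits, punctuation, emoji, ...
    return "devanagari" if counts["devanagari"] >= counts["latin"] else "latin"

def tag_sentence(sentence):
    """Split on whitespace and attach a script label to every token."""
    return [(word, script_of(word)) for word in sentence.split()]
```

A token such as "office" inside a Hindi sentence is tagged `latin`, while the surrounding Devanagari tokens are tagged `devanagari`.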


Proceedings ArticleDOI
05 May 2023
TL;DR: In this article, a deep learning model was used to transliterate ancient Tamil inscriptions (Vatteluttu script); the approach can be extended to other languages.
Abstract: Ancient inscriptions, palm scripts, manuscripts, etc., have vital information about India's rich culture. Recognition and understanding of these inscriptions have been challenging for epigraphers and professionals. The goal of the proposed research is to advance optical character recognition methods for archival Vatteluttu script inscriptions, which date back to the 4th or 5th century AD. This paper discusses a deep learning model to transliterate the ancient Tamil inscriptions (Vatteluttu Script), which can be extended further to other languages. The proposed work is beneficial to epigraphists, archaeological researchers, and the general public who are interested in this topic. The developed deep learning model has achieved an accuracy of 84.12%.

Proceedings ArticleDOI
03 Mar 2023
TL;DR: In this article, a machine learning-based Hindi character recognition system using deep learning techniques is presented, in which the user gesticulates a Hindi letter in front of a webcam and the machine recognizes which letter is being written.
Abstract: A convolutional neural network (CNN) is a type of deep neural network generally applied to analyzing visual images. The project, entitled “Devnagari Lipi Recognition using Deep Learning Techniques”, is a machine learning-based project in which Hindi characters are recognized: the user gesticulates a Hindi “Akshar” (letter) in front of a webcam, and the machine recognizes which letter is being written. For the project, we train the machine on all the Devanagari letters; after training, the machine can recognize Hindi letters quickly. The machine not only recognizes the character but also tells the user how to pronounce it by writing its English pronunciation on the screen.

Proceedings ArticleDOI
10 Apr 2023
TL;DR: In this article, the authors proposed a Bidirectional Encoder Representations from Transformers (BERT) based contextual embedding technique, concatenated with emoji2vec embeddings, to classify social media posts in Hindi Devanagari script as hostile or non-hostile.
Abstract: Detection of hostile content in social media posts (Facebook™, Twitter™, etc.) is a demanding task in the field of Natural Language Processing (NLP). The daily growth of hostile content in different electronic media has opened up new challenges in language understanding. The task becomes more difficult in regional languages. An AI-based solution is required to identify hostile content on a large scale. Though a satisfactory amount of research has been carried out for the English language, finding hostile content in regional languages is still in progress due to the unavailability of suitable datasets and tools. In terms of the number of speakers, Hindi ranks third in the world and first in the Indian Subcontinent. The objective of the article is to design a hostile content detection system for the Hindi language using coarse-grained (binary) classification and fine-grained (multi-class, multi-label) classification. We noted that different baseline learning methods with different pre-trained language models perform differently. Using the Constraint 2021 Hindi Dataset, this research proposes a Bidirectional Encoder Representations from Transformers (BERT) based contextual embedding technique, concatenated with emoji2vec embeddings, to classify social media posts in Hindi Devanagari script as hostile or non-hostile. Additionally, for the fine-grained tasks, where hostile posts are sub-categorized as defamation, fake, hate, and offensive, we develop an Ensemble Classifier that varies learning methods and embedding models. With an F1-Score of 0.9721, our proposed Indic-BERT+emoji model outperforms the baseline model and other existing models for the coarse-grained task. We have also observed that our proposed Ensemble method gives better results than the existing models and the baseline model for the fine-grained tasks, with F1-Scores of 0.43, 0.82, 0.58 and 0.62 for the defamation, fake, hate, and offensive classes respectively.
The code and the data are available at https://github.com/skarifahmed/hostile.
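The results above are reported as F1-Scores, one per class for the fine-grained task. As a reminder of what that metric measures, here is a minimal sketch computed from true-positive, false-positive, and false-negative counts. The helper names and the toy counts in the usage note are assumptions for illustration and are unrelated to the paper's data or code.

```python
def f1_score(tp, fp, fn):
    """F1 = 2PR / (P + R), written directly in terms of the counts."""
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom

def macro_f1(per_class_counts):
    """Unweighted mean of per-class F1 values (one common way to summarize)."""
    scores = [f1_score(*counts) for counts in per_class_counts.values()]
    return sum(scores) / len(scores)
```

For example, a class with 8 true positives, 2 false positives, and 2 false negatives has precision 0.8, recall 0.8, and therefore F1 0.8.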

Book ChapterDOI
01 Jan 2023
TL;DR: In this article, a convolutional neural network (CNN) architecture was proposed to identify six word-level handwritten scripts: Arabic, Latin, Chinese, Bangla, Devanagari and Telugu.
Abstract: In this work, we propose a convolutional neural network (CNN) architecture to identify six word-level handwritten scripts: Arabic, Latin, Chinese, Bangla, Devanagari and Telugu. A large dataset of 14k word images per script was constructed from several public handwritten datasets. Three architectures are then proposed and compared on standard performance metrics and execution time. Experiments conducted on both test and validation sets show high performance that exceeds state-of-the-art techniques. The best result was obtained by the CNN model with three convolution-pooling layer pairs, which achieved an average script identification accuracy of 97.67% and ran in a sufficiently fast time of 2 ms per frame during the test phase.


Journal ArticleDOI
TL;DR: In this article, the shifted window (SWIN) transformer method was used to recognize handwritten Devanagari numerals for the first time, achieving a recognition accuracy of 99.20% with only 0.218 million trainable parameters and 0.0912 giga floating-point operations per second (FLOPs).
Abstract: The broad application area and accompanying challenges make machine learning-based recognition of handwritten scripts a demanding field. Individuals’ writing practices and inherent variations in the size, shape, and tilt of characters may increase the difficulty level. Deep convolutional neural network (DCNN) models have been successful in solving pattern recognition problems, but at the expense of a considerable number of trainable parameters and heavy computational loads. The proposed work addresses these problems by using the shifted window (SWIN) transformer method to recognize handwritten Devanagari numerals for the first time. In the presented model, the SWIN transformer is finely tuned to withstand popular DCNN models, such as VGG-16Net, ResNet-50, and DenseNet-121, in terms of recognition accuracy, space requirement, and computational complexity. The model successfully attained a recognition accuracy of 99.20% with only 0.218 million trainable parameters and 0.0912 giga floating-point operations per second (FLOPs). This indicates the validity and soundness of the proposed model for recognizing handwritten Devanagari numerals.



Proceedings ArticleDOI
03 Mar 2023
TL;DR: In this paper, a hybrid approach is used to transliterate Sindhi text from the Devanagari script to the Perso-Arabic script: part of the text is converted using a rule base, and when an ambiguity arises a probabilistic model resolves it.
Abstract: In this paper, we have shown a script conversion (transliteration) technique that converts Sindhi text in the Devanagari script to the Perso-Arabic script. We showed this by incorporating a hybrid approach where some part of the text is converted using a rule base and in case an ambiguity arises then a probabilistic model is used to resolve the same. Using this approach, the system achieved an overall accuracy of 99.64%.
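The hybrid idea described here (rules first, a probabilistic model only for ambiguous cases) can be sketched as follows. The abstract does not specify the rule base or the model, so this sketch assumes a character-level rule table and a bigram count model over target characters, and it uses ASCII placeholder symbols instead of real Devanagari and Perso-Arabic letters. All names and data are hypothetical.

```python
from collections import Counter

def train_bigrams(target_words):
    """Count (previous char, char) pairs in target-script words; '^' marks word start."""
    counts = Counter()
    for word in target_words:
        prev = "^"
        for ch in word:
            counts[(prev, ch)] += 1
            prev = ch
    return counts

def transliterate(word, rules, bigrams):
    """Apply one-to-one rules directly; resolve one-to-many rules with bigram counts."""
    out = []
    prev = "^"
    for ch in word:
        candidates = rules.get(ch, [ch])   # unknown symbols pass through unchanged
        if len(candidates) == 1:
            best = candidates[0]           # unambiguous: pure rule-based step
        else:
            # Ambiguous: prefer the candidate seen most often after `prev`.
            best = max(candidates, key=lambda c: bigrams[(prev, c)])
        out.append(best)
        prev = best
    return "".join(out)
```

In a real system the rule table would map source-script characters to one or more target-script characters, and the counts would come from a target-script corpus.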

Journal ArticleDOI
23 May 2023
TL;DR: In this paper, the authors used a Convolutional Neural Network (CNN) for character recognition in Devanagari script and achieved 98% accuracy on a dataset of 49 Devanagari characters.
Abstract: Devanagari script is widely used across India. It is used to write many languages, such as Hindi, Marathi, Nepali, and Sanskrit. Because Devanagari is the script of Hindi, an official language of India, it is important to recognize the characters to understand the message a text conveys. An automatic character recognition system is thus being developed for the Devanagari script. The character recognition process converts an image of a character into a machine-readable format and its English equivalent. In this paper, we use a Convolutional Neural Network to develop the character recognition system. A Convolutional Neural Network learns directly from data. It is a type of deep learning neural network architecture. A CNN is useful because it does not require human intervention and identifies important features on its own. The proposed paper applies a CNN algorithm to a dataset of 49 characters of Devanagari script. The dataset contains a total of 4018 images. The Convolutional Neural Network algorithm is applied to train on the dataset. The input image to be predicted is first preprocessed, and then the model predicts the output. The system is designed in Jupyter Lab using Python. The Convolutional Neural Network model's overall accuracy is 98%.

Journal ArticleDOI
Zahra Cheraghi
TL;DR: Sayyid Ahmad Khan (1817-98), founder of the MAO College of Aligarh, argued throughout his journey that the use of Urdu was even more extensive than that of French in Europe, contrasting it with Hindi.
Abstract: In 1869–70, the celebrated South Asian Muslim intellectual Sayyid Ahmad Khan (1817–98) visited Egypt on his way to England. Khan, one of South Asia's most renowned Muslim thinkers, was the founder of the Muhammadan Anglo-Oriental College (est. 1875; hereafter MAO College), a higher education institution in the North Indian town of Aligarh modeled after Oxbridge. Responding to intensified efforts by Hindu organizations to elevate the status of Devanagari-script Hindi to that of Urdu in Indian provincial courts, Khan argued throughout his journey that the use of Urdu was even more extensive than that of French in Europe, contrasting it with Hindi, which he “did not find anywhere.” In his view, Urdu was a clear and simple language that facilitated connections between diverse peoples, unlike Hindi.

Journal ArticleDOI
TL;DR: In this paper, prior work on MODI text recognition is reviewed and explained with the help of a generalized text recognition system model, which includes image acquisition, normalization, binarization, segmentation, feature extraction, training, classification, and finally the recognized image.
Abstract: MODI is an ancient script of the Marathi people. It is used to write the Marathi language, which is the mother tongue of Maharashtra, India. To understand this ancient script, we analyze text recognition techniques here. MODI script was used primarily by administrative staff to keep their accounts, and most revenue documents were written in MODI. A number of image processing techniques are used to recognize such text. The official scriptures of Goa were previously written in this 17th-century Balbodh style of Devanagari, which is currently being restored. It is now a practical visual reminder of the former Maratha era and a specialized research skill; it is a technological key required primarily to access the Maratha state's empirical history through these archive resources. This paper explains previous analyses of MODI text recognition. The MODI text recognition system is explained with the help of a generalized text recognition system model. The model includes image acquisition, normalization, binarization, segmentation, feature extraction, training, classification, and finally the recognized image. Numerous optional strategies that have been used in various identification systems are available for each stage. The history of the Maratha dynasty and other significant facts in numerous MODI manuscripts can be revealed by using these techniques to identify MODI characters. Some applications of text recognition are also explained in this paper.

Journal ArticleDOI
TL;DR: In this paper, the authors surveyed various methods to detect and recognize vehicle number plates depending on the specified application and proposed a machine learning approach to recognize characters from fancy number plates.
Abstract: It is important to ensure that all vehicles have a standard number plate as required by law to ensure proper identification and safety on the roads. Vehicle number plate recognition systems are a useful tool for improving security, traffic management, and overall efficiency in various industries. Many techniques have been developed to detect vehicle number plates and recognize the characters, but fancy number plate detection, specifically in Devanagari script, has not yet been addressed. This paper surveys various methods to detect and recognize vehicle number plates depending on the specified application. We implemented the system using embedded devices and developed the number plate localization method. Finally, we proposed a machine learning approach to recognize characters from fancy number plates.


Posted ContentDOI
14 Mar 2023
TL;DR: In this paper, a generic video camera-aided convolutional neural network (CNN) based air-writing framework was proposed to recognize isolated unistroke numerals of multiple languages.
Abstract: Air-writing refers to virtually writing linguistic characters through hand gestures in three-dimensional space with six degrees of freedom. This paper proposes a generic video camera-aided convolutional neural network (CNN) based air-writing framework. Gestures are performed using a marker of fixed color in front of a generic video camera, followed by color-based segmentation to identify the marker and track the trajectory of the marker tip. A pre-trained CNN is then used to classify the gesture. The recognition accuracy is further improved using transfer learning with the newly acquired data. The performance of the system varies significantly on the illumination condition due to color-based segmentation. In a less fluctuating illumination condition, the system is able to recognize isolated unistroke numerals of multiple languages. The proposed framework has achieved 97.7%, 95.4% and 93.7% recognition rates in person independent evaluations on English, Bengali and Devanagari numerals, respectively.


Journal ArticleDOI
TL;DR: In this paper, a multinomial Naive Bayes algorithm is used to detect languages written in Devanagari (Marathi, Sanskrit and Hindi) and three European languages (French, Italian and English).
Abstract: Language identification is among the crucial steps in any NLP-based application. Text-based documents and webpages are rapidly increasing on the modern Internet. It is simple to locate documents written in different languages from all across the world with just one click. Therefore, a language identifier is necessary to help the user interpret the content. Language identification has so far tended to concentrate on European languages and is still rather limited for Indian traditional languages. Many researchers have become more interested in the study of language identification for similar languages. In this paper, the Multinomial Naïve Bayes algorithm is used to detect languages written in Devanagari (Marathi, Sanskrit and Hindi) and three European languages (French, Italian and English). An experiment on datasets of each language produced satisfactorily accurate results after training and testing the model.
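A minimal sketch of multinomial Naive Bayes for language identification follows. The abstract does not state which features were used, so character bigrams with add-one (Laplace) smoothing are assumed here; character n-grams are a common choice when several languages share a script, as Marathi, Sanskrit and Hindi do. The class and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def char_ngrams(text, n=2):
    """Character n-grams of the lower-cased text, padded with spaces."""
    text = f" {text.lower()} "
    return [text[i:i + n] for i in range(len(text) - n + 1)]

class MultinomialNB:
    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.feature_counts = defaultdict(Counter)   # n-gram counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            grams = char_ngrams(text)
            self.feature_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            total = sum(counts.values())
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            for gram in char_ngrams(text):
                score += math.log((counts[gram] + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Trained on a handful of English and French sentences, the model assigns a short unseen phrase to the language whose bigram statistics it matches best; the same code works unchanged on Devanagari text because it operates on Unicode characters.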

Posted ContentDOI
04 Apr 2023
TL;DR: The authors developed language models for the Sanskrit language, namely Bidirectional Encoder Representations from Transformers (BERT) and its variants, A Lite BERT (ALBERT) and Robustly Optimized BERT (RoBERTa), using a Devanagari Sanskrit text corpus.
Abstract: In this work, we develop language models for the Sanskrit language, namely Bidirectional Encoder Representations from Transformers (BERT) and its variants: A Lite BERT (ALBERT), and Robustly Optimized BERT (RoBERTa) using Devanagari Sanskrit text corpus. Then we extracted the features for the given text from these models. We applied the dimensional reduction and clustering techniques on the features to generate an extractive summary for a given Sanskrit document. Along with the extractive text summarization techniques, we have also created and released a Sanskrit Devanagari text corpus publicly.

Book ChapterDOI
01 Jan 2023
TL;DR: In this paper, handwritten Hindi characters are recognized using a convolutional neural network (CNN) based technique; once recognized, the characters can be used in a variety of ways and stored digitally.
Abstract: In this study, handwritten Hindi characters are recognized using a convolutional neural network (CNN) based technique. After being recognized, the characters can be used in a variety of ways and stored digitally on a computer. The characters in the images are all written in the Devanagari script. Each of the 46 character classes has 2000 patterns. The training set makes up 85% of the data set, and the test set makes up 15%. Such image data sets can be used to test the classification algorithms of OCR systems. The highest accuracy score on the test set was 98.47. For the development and testing of handwritten text recognition systems, the data set offers a sizable collection of Devanagari handwriting styles created by numerous writers. Three fully connected layers are added after the four CNN layers. The input takes the form of a grayscale handwritten image. Filters extract distinctive information from the images at each layer; this is achieved through convolution. The pooling and flattening steps are also crucial. A fully connected layer receives the output of the CNN layers and processes it. After the probability value for each character is computed, the character with the highest score is shown as the outcome. The recognition accuracy is 98.94. Similar models already exist for this goal, but the new model was more effective and precise than some of the earlier versions.
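The final step described in this abstract, computing a probability for each character and showing the one with the highest score, is commonly done with a softmax over the network's output scores followed by an argmax. The sketch below assumes that convention (the abstract does not name the output function); the function names and the three example labels are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities; subtracting the max avoids overflow."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(scores, labels):
    """Return the label with the highest probability and that probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]
```

For a 46-class model, `scores` would hold 46 values and `labels` the 46 character names.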

Journal ArticleDOI
TL;DR: In this paper, the authors design a text classifier for Hindi text documents, apply various stemmers to Hindi text classification models, and show how stemming affects classifier performance.
Abstract: Text classification is very useful for searching the large amount of textual data available online by dividing it into smaller relevant units. Nowadays a large number of digital documents are available in Indian languages. Designing text classifiers for Indian languages is an active research area, so that people can search and read the documents they need in their local languages. In the proposed work, we design a text classifier for Hindi text documents and show how a stemmer affects the performance of Hindi text classifiers. Stemming is the process of converting words in any language to their base or root forms. Stemmers are used for written documents, not for spoken language. The performance of many applications, such as text summarization, Information Retrieval (IR) systems, text classification systems, and syntactic parsing, can be improved by applying stemmers. A stemmer removes the suffix or prefix of a word to recover the original root word. These root words help in the preprocessing step required by many algorithms. We applied various stemmers to Hindi text classification models. Experiments and results show that the performance of the classifiers is improved by applying stemmers.
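Suffix stripping, as described in this abstract, can be sketched with a small longest-match rule list. The suffixes below are a few common Hindi plural and case endings chosen for illustration only; the abstract does not list the stemmers that were evaluated, and the name `stem` and the minimum stem length are assumptions.

```python
# Illustrative suffix list (an assumption), tried longest first.
SUFFIXES = sorted(["ियों", "ियाँ", "ों", "ें", "ी", "े", "ा"], key=len, reverse=True)

def stem(word, min_stem_len=2):
    """Strip the longest matching suffix, keeping at least `min_stem_len` code points."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word
```

With these rules, the plural forms किताबें and किताबों both reduce to किताब, so a classifier sees them as the same feature.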

Proceedings ArticleDOI
23 Mar 2023
TL;DR: In this article, a variety of characteristics of text-to-speech (TTS) model training and transfer learning with a phonetic language script are studied. The authors emphasize that even with commercial TTS solutions, the pronunciation of Indian proper nouns contained in English text is a difficulty, which is addressed in the current framework.
Abstract: This study looks into a variety of characteristics of text-to-speech (TTS) model training and transfer learning with a phonetic language script. When compared to training using a non-phonetic script, training with a phonetic language script appears to speed up the process. In both Hindi and Indian English evaluation data, transfer learning with phonetic writing (in Devanagari script) leads to remarkable improvements in the number of pronunciation errors (87.5% and 16.67% relative). As a result, a case is made for publishing models pre-trained on phonetic language transcripts rather than the models that have been pretrained on non-phonetic transcripts for the purpose of transfer learning. Going further, a mix-lingual training approach is also explored to solve the problem of transforming code-switched text to speech. It is emphasized that even with commercial text-to-speech solutions, the pronunciation of Indian proper nouns contained in English text is a difficulty, which has been addressed in the current framework.

Proceedings ArticleDOI
03 Mar 2023
TL;DR: In this article, an innovative, efficient, and real-time handwritten text-to-speech conversion technique for the Devanagari script was introduced, which combines Optical Character Recognition (OCR) and a Text to Speech Synthesizer (TTS).
Abstract: In this paper, we are introducing an innovative, efficient, and real-time handwritten text-to-speech conversion technique for the Devanagari script. It combines the concept of Optical Character Recognition (OCR) and Text to Speech Synthesizer (TTS). This type of system can be helpful for visually impaired persons, for reading the number plates of a vehicle, and for medical applications. Text extraction from colored images is a challenging task in computer vision, for that, we have trained our model using Convolution Neural Network (CNN) with 1000s of different handwriting styles written by different people and of different age groups for each character. A trained model has been used for the recognition of input characters which was taken from the gesture movement of a fingertip with blue ink. The system is developed in Python 3.6.0.

Book ChapterDOI
01 Jan 2023
TL;DR: In this article, the authors highlight machine learning techniques applied to handwritten character recognition using network models built with deep learning techniques. They experimented on a Devanagari character database with two architectures, GoogLeNet and AlexNet, to evaluate performance and achieve the highest accuracy.
Abstract: Deep Learning is a growing set of approaches for extracting useful information and knowledge from large amounts of data. Deep Learning research and tools have focussed on commercial sector applications. Only a few Deep Learning studies have focussed on scientific data. This paper highlights the Machine Learning techniques applied to handwritten character recognition using network models built with Deep Learning techniques. Handwriting recognition is now attracting researchers' attention as an assistive technology for visually impaired and blind people and for human-robot handling of business documents. We experimented on a Devanagari character database with two different architectures, GoogLeNet and AlexNet, to evaluate the performance and achieve the highest accuracy.

Proceedings ArticleDOI
23 Mar 2023
TL;DR: This article used GANs to transliterate ancient Indian script characters into present-day Devanagari script characters, which can be useful for preserving and digitizing ancient texts and making them more accessible to modern readers.
Abstract: This study aims to investigate the application of Generative Adversarial Networks (GANs) for transliterating ancient Indian script characters into present day devanagari script characters. The ancient scripts Nandinagari and Sharda are known for their ornate and complex forms, making it a challenge to read and understand for modern readers. The goal of this research is to use GANs to train a model that can convert ancient Indian script images like Nandinagari and Sharda to present day Devanagari script images, which can be useful for preserving and digitizing ancient texts, and making them more accessible to modern readers. The study will evaluate the ability of Generative Adversarial Networks to accurately recognize and transliterate ancient script characters, and will provide insights into the potential of this approach for preserving cultural heritage and digitizing and transliterating historical texts to modern scripts.

Posted ContentDOI
27 Apr 2023
TL;DR: In this paper, an online digital handwritten dataset of Devanagari alphabets and numerals, consisting of 44,000 character images in JPEG format, was built and recognized using a CNN.
Abstract: The offline technique has been used to build datasets of Devanagari alphabets and numerals. In an offline character dataset, the writer writes on pages, which are then optically or digitally scanned, requiring image processing to remove noise. In this paper, we (I) designed an online digital handwritten Devanagari dataset and (II) recognized the dataset using the deep learning technique CNN. Dataset design: (i) 34 Devanagari alphabets (4 vowels and 30 consonants) and 10 numerals, a total of 44 characters, were selected. (ii) A canvas was designed in Python Jupyter for drawing an image; the writer uses an input tool, such as a mouse, and the data is recorded in digital form. (iii) One hundred individuals of varied ages were chosen, and each person produced a total of 440 characters. A dataset of 44,000 character images was produced. Each image is grayscale and in JPEG format, requires 2 KB of storage, and has a resolution of 65 by 65 pixels. The image dataset was converted into Comma Separated Values (CSV) files, and the dataset is divided into 70% for training and 30% for testing. It was published on IEEE DataPort and the Mendeley repository. To recognize the dataset, a Convolutional Neural Network is applied to extract features and classify the input images, recognizing digital Devanagari characters. There are 44 classes, and the image size is (65, 65, 1). In this model, two Conv2D layers, a MaxPooling2D layer, a Flatten layer, a Dropout layer, and a Dense (fully connected) layer are used. The CNN-trained model achieved recognition accuracies of 95.00%, 95.45%, and 96.00% using 10, 20, and 30 epochs. The model with 30 epochs has the highest recognition accuracy; 100 images were tested against 1000 trained images of each class, and a confusion matrix was built, resulting in accuracy ratings ranging from 0.994 to 1.0.
A multi-class accuracy score above 0.99 is regarded as outstanding, and a high accuracy score is generally desired in any classification task.
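The dataset figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below derives the 10 samples per character per writer from the stated 440 characters per person and 44 classes; the function names are hypothetical and the split uses plain integer arithmetic rather than any particular library.

```python
def dataset_size(n_classes, chars_per_writer, n_writers):
    """Return (samples per class per writer, total images)."""
    per_class = chars_per_writer // n_classes
    return per_class, chars_per_writer * n_writers

def split_counts(total, train_percent):
    """Return (train, test) image counts for a percentage split."""
    train = total * train_percent // 100
    return train, total - train

per_class, total = dataset_size(n_classes=44, chars_per_writer=440, n_writers=100)
train, test = split_counts(total, train_percent=70)
print(per_class, total, train, test)  # 10 44000 30800 13200
```

So each of the 44 classes has 1000 images overall, and a 70/30 split yields 30,800 training and 13,200 test images.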