Devanagari

About: Devanagari is a research topic. Over its lifetime, 655 publications have been published within this topic, receiving 7,428 citations. The topic is also known as Deva nagari and Hindi Script.


Papers
Proceedings Article
01 Oct 2016
TL;DR: A comparative study of word recognition results shows that dominant-point-based local feature extraction provides the best accuracies for both Bengali and Devanagari scripts.
Abstract: This paper presents a comparative study of three feature extraction approaches for online handwritten word recognition of two major Indic scripts, Bengali and Devanagari, using a Hidden Markov Model (HMM). The first approach segments the word into its basic strokes and extracts features from each whole stroke without local zone division. The other two approaches also segment the word into its basic strokes but analyze each online stroke zone by zone. Of these two zone-wise local feature sets, one takes into account structural and directional features, and the other uses dominant points, detected from strokes using slope angles, to find the local features. These features are studied in an HMM-based word recognition platform. From the comparative study of the word recognition results, we have noted that dominant-point-based local feature extraction provides the best accuracies for both Bengali and Devanagari scripts. We have obtained 90.23% and 93.82% accuracies for Bengali and Devanagari scripts, respectively.

13 citations
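
The abstract above mentions dominant points detected from strokes using slope angles but gives no algorithm. The following is only a minimal sketch of one common way to do this, assuming a stroke is a list of (x, y) pen samples and that a point is "dominant" when the direction of travel turns by at least a threshold angle. The function name `dominant_points` and the 30-degree default are illustrative assumptions, not the authors' method or parameters.

```python
import math

def dominant_points(stroke, angle_threshold_deg=30.0):
    """Return indices of stroke points where the slope angle changes sharply.

    Hypothetical helper: the threshold and the turn-angle criterion are
    assumptions for illustration, not taken from the paper.
    """
    if len(stroke) < 3:
        return list(range(len(stroke)))
    keep = [0]  # always keep the pen-down point
    for i in range(1, len(stroke) - 1):
        (x0, y0), (x1, y1), (x2, y2) = stroke[i - 1], stroke[i], stroke[i + 1]
        angle_in = math.atan2(y1 - y0, x1 - x0)    # slope angle entering point i
        angle_out = math.atan2(y2 - y1, x2 - x1)   # slope angle leaving point i
        turn = abs(math.degrees(angle_out - angle_in))
        turn = min(turn, 360.0 - turn)             # wrap into [0, 180]
        if turn >= angle_threshold_deg:
            keep.append(i)
    keep.append(len(stroke) - 1)  # always keep the pen-up point
    return keep

# An L-shaped stroke: straight right, then straight up.
corner_stroke = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
print(dominant_points(corner_stroke))
```

On the L-shaped example, only the two endpoints and the corner are retained; local features would then be computed around those retained points.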

Posted Content
TL;DR: The authors evaluated CNN, LSTM, ULMFiT, and BERT-based models on two publicly available Marathi text classification datasets and presented a comparative analysis, pairing the word-based models with pre-trained FastText word embeddings from Facebook and IndicNLP.
Abstract: The Marathi language is one of the prominent languages used in India. It is predominantly spoken by the people of Maharashtra. Over the past decade, the usage of the language on online platforms has increased tremendously. However, research on Natural Language Processing (NLP) approaches for Marathi text has not received much attention. Marathi is a morphologically rich language and uses a variant of the Devanagari script in its written form. This work aims to provide a comprehensive overview of available resources and models for Marathi text classification. We evaluate CNN, LSTM, ULMFiT, and BERT-based models on two publicly available Marathi text classification datasets and present a comparative analysis. The pre-trained Marathi FastText word embeddings by Facebook and IndicNLP are used in conjunction with word-based models. We show that basic single-layer models based on CNN and LSTM coupled with FastText embeddings perform on par with the BERT-based models on the available datasets. We hope our paper aids focused research and experiments in the area of Marathi NLP.

13 citations

01 Jan 2009
TL;DR: A two-stage system for handling the extra space insertion problem in Urdu is presented; according to the authors, it is the first such system developed for Urdu script.
Abstract: Hindi and Urdu are variants of the same language, but while Hindi is written in the Devanagari script from left to right, Urdu is written from right to left in a script derived from a Persian modification of Arabic script. To break the script barrier, an Urdu-Devnagri transliteration system has been developed. The transliteration system faced many problems related to word segmentation of Urdu script, as in many cases spaces are not placed properly between Urdu words. Sometimes a space is deleted, resulting in many Urdu words being jumbled together, and at other times an extra space is inserted within a word, resulting in over-segmentation of that word. In this paper, a two-stage system for handling the extra space insertion problem in Urdu is presented. In the first stage, Urdu grammar rules are applied, while a statistical approach is employed in the second stage. For statistical analysis, lexical resources from both Urdu and Hindi, including Urdu and Hindi unigram and bigram probabilities, are used. In addition, the Urdu-Devnagri transliteration module is executed in parallel to help in decision making. The system was tested on a 1.84 million word Urdu corpus, and the success rate was 98.57%. This is the first time such a system has been developed for Urdu script.

13 citations
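
The abstract above says the second stage uses unigram and bigram probabilities but does not state the decision rule. As a rough sketch only, one plausible rule is to merge two adjacent tokens when the joined form is more probable as a single word than the pair is as a two-word sequence. The function `merge_extra_spaces`, the comparison rule, the romanized tokens, and the probability values below are all hypothetical placeholders, not the authors' system or data.

```python
def merge_extra_spaces(tokens, unigram_p, bigram_p):
    """Greedily merge adjacent tokens that look like one over-segmented word.

    Assumed rule (not from the paper): merge when the unigram probability of
    the joined form exceeds the bigram probability of the separate pair.
    """
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens):
            joined = tokens[i] + tokens[i + 1]
            p_joined = unigram_p.get(joined, 0.0)
            p_pair = bigram_p.get((tokens[i], tokens[i + 1]), 0.0)
            if p_joined > p_pair:
                merged.append(joined)
                i += 2  # consume both fragments
                continue
        merged.append(tokens[i])
        i += 1
    return merged

# Toy romanized example with made-up probabilities.
unigram_p = {"kitab": 0.01}
bigram_p = {("acchi", "hai"): 0.002}
print(merge_extra_spaces(["kita", "b", "acchi", "hai"], unigram_p, bigram_p))
```

A real system would also need the rule-based first stage and the parallel transliteration check described in the abstract; neither is modeled here.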

Journal Article
TL;DR: A method of extracting root words from Devanagari script documents, which can be used for information retrieval, text summarization, text categorization, ontology building, etc.; the reported accuracy of the designed morphological analyzer is up to 96%.
Abstract: In India, more than 300 million people use Devanagari script for documentation. In Devanagari script, Marathi and Hindi are mainly used, as the primary language of Maharashtra state and the national language of India, respectively. Compared with English script, Devanagari script is rich in morphemes. Thus the lemmatization of Devanagari script is more complex than that of English script. There is a lack of resources for Devanagari script, such as WordNet, ontology representations, and tools for parsing keywords and their parts of speech. Thus the overall task of information retrieval becomes complex and time consuming. Words in Devanagari script documents commonly carry suffixes, which may hinder accurate information retrieval. We propose a method of extracting root words from Devanagari script documents which can be used for information retrieval, text summarization, text categorization, ontology building, etc. An attempt is made to design a morphological analyzer for Devanagari script. We have designed a corpus containing more than 3000 possible stop words and suffixes for the Marathi language. The morphological analyzer can act as a preliminary stage for developing any information retrieval application in Devanagari script. We have conducted experiments on randomly selected Marathi documents and found that the accuracy of the designed morphological analyzer is up to 96%.

13 citations
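
The abstract above describes root-word extraction driven by a list of stop words and suffixes. A minimal sketch of that general idea, assuming simple longest-suffix stripping, could look like the following. The tiny `STOP_WORDS` and `SUFFIXES` lists and the `min_stem_len` guard are illustrative assumptions; they stand in for, and are far smaller than, the 3000-entry corpus the authors mention, and the paper's actual rules are not given in the abstract.

```python
# Illustrative lists only; not the authors' corpus.
STOP_WORDS = {"आणि", "आहे"}            # "and", "is"
SUFFIXES = ["ांना", "ात", "ला", "ने"]   # a few common Marathi case endings

def strip_suffix(word, suffixes=SUFFIXES, min_stem_len=2):
    """Remove the longest matching suffix, keeping at least min_stem_len code points."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word

def extract_roots(text):
    """Drop stop words, then strip suffixes from the remaining tokens."""
    return [strip_suffix(token) for token in text.split() if token not in STOP_WORDS]

print(extract_roots("घरात आणि आहे"))
```

Pure suffix stripping yields stems rather than true lemmas, so a fuller analyzer would need additional rules for stem alternations.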

Journal Article
TL;DR: A comprehensive review of online handwriting recognition for non-Indic and Indic scripts, including a list of publicly available online handwriting datasets for various scripts.
Abstract: Handwriting recognition is one of the challenging tasks in the area of pattern recognition and machine learning. Handwriting recognition has two flavors, namely offline handwriting recognition and online handwriting recognition. Recognition of machine-printed (offline) characters has already reached a saturation level. Presently, due to dramatic development in the IT sector, touch-based devices with efficient processing capabilities are available in the market. With this revolution, research on handwriting recognition in real-time (online) mode has become more popular. In this paper, a comprehensive review is reported for online handwriting recognition of non-Indic and Indic scripts. Six non-Indic scripts (Arabic, Chinese, Japanese, Persian, Roman, and Thai) and eight Indic scripts (Assamese, Bangla, Devanagari, Gurmukhi, Kannada, Malayalam, Tamil, and Telugu) are considered in this article. This study comprises an introduction to the online handwriting recognition process, its challenges and motivations, and the feature extraction and classification methodologies used for recognizing the various scripts. Moreover, an effort has been made to provide a list of publicly available online handwriting datasets for various scripts. This study also assists novice researchers in the field of handwriting recognition by providing a nutshell survey of the various feature extraction strategies and classification techniques used for recognizing both Indic and non-Indic scripts.

13 citations


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 77% related
Support vector machine: 73.6K papers, 1.7M citations, 75% related
Image segmentation: 79.6K papers, 1.8M citations, 74% related
Convolutional neural network: 74.7K papers, 2M citations, 74% related
Encryption: 98.3K papers, 1.4M citations, 73% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    42
2022    98
2021    48
2020    61
2019    38
2018    43