Topic
Devanagari
About: Devanagari is a research topic. Over its lifetime, 655 publications have been published within this topic, receiving 7,428 citations. The topic is also known as: Deva nagari & Hindi Script.
Papers
••
05 Dec 2014
TL;DR: The authors use back transliteration and a set of hand-tailored consonant-mapping rules to reduce spelling variations, sub-word indexing to handle the breaking and joining of transliterated words, and a supervised approach to query labelling of mixed-script content in which an SVM classifier is trained on character n-gram features for language identification.
Abstract: Much user-generated content on the internet is written in transliterated form rather than in its native script. As a result, search engines receive a large number of transliterated search queries. This paper presents our approach to labelling queries and to ad hoc retrieval of documents based on these queries, as part of the FIRE2014 shared task on transliterated search. The content of each document is written in the native Devanagari script, in its transliterated form in Roman script, or in a combination of both. The queries used to retrieve these documents can also be in mixed script. The task is challenging primarily because of the spelling variations that occur in the transliterated form of search queries. We address this problem by using back transliteration to reduce spelling variations, together with a set of hand-tailored rules for consonant mapping. Sub-word indexing handles the breaking and joining of transliterated words. Query labelling of the mixed-script content uses a supervised learning approach in which an SVM classifier is trained on character n-gram features for language identification. A Naive Bayes classifier is used to classify transliterated words that can belong to both Hindi and English when looked at individually. The 2 runs submitted by our team (BITS-Lipyantaran) perform best across all metrics for Subtask 2 among all participating teams, with an MRR score of 0.8171 and a MAP score of 0.6421.
5 citations
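To illustrate the character n-gram features mentioned in the abstract above, here is a minimal sketch in plain Python. It is not the authors' system: it swaps the SVM for a small multinomial Naive Bayes classifier (the abstract names Naive Bayes only for ambiguous words), and the function `char_ngrams`, the class `NgramNaiveBayes`, the labels `hi`/`en`, and the toy word list are all hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(word, n_min=1, n_max=3):
    """Return all character n-grams of the padded word for n in [n_min, n_max]."""
    padded = f"^{word.lower()}$"  # boundary markers keep prefixes and suffixes distinct
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


class NgramNaiveBayes:
    """Multinomial Naive Bayes over character n-grams with add-one smoothing."""

    def fit(self, words, labels):
        self.ngram_counts = defaultdict(Counter)
        self.label_counts = Counter(labels)
        for word, label in zip(words, labels):
            self.ngram_counts[label].update(char_ngrams(word))
        self.vocab = {g for counts in self.ngram_counts.values() for g in counts}
        return self

    def predict(self, word):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_label in self.label_counts.items():
            counts = self.ngram_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_label / total)  # log prior
            for gram in char_ngrams(word):
                score += math.log((counts[gram] + 1) / denom)  # smoothed log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data: Romanized Hindi ("hi") versus English ("en") words.
train_words = ["namaste", "kahan", "pyaar", "zindagi", "hello", "where", "love", "life"]
train_labels = ["hi", "hi", "hi", "hi", "en", "en", "en", "en"]
model = NgramNaiveBayes().fit(train_words, train_labels)
```

With eight training words the model is far too small to generalize; a realistic labeller would need a much larger word list, and the same n-gram counts could instead be fed to an SVM as the abstract describes.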
••
01 Sep 2017
TL;DR: Script identification based on the Scale-Invariant Feature Transform (SIFT) is proposed; the reported overall accuracy is 97.65% for bi-script and 96.71% for tri-script identification.
Abstract: Automatic identification of scripts in document images helps select the appropriate OCR engine for character recognition and content retrieval. This paper proposes script identification based on the Scale-Invariant Feature Transform (SIFT). Features are extracted with SIFT at the word level (words of two, three, or more characters), and a KNN classifier is used to recognize the script. Experiments are performed on words extracted from document images containing English, Kannada, and Devanagari scripts. The overall accuracy reported for the proposed system is 97.65% for bi-script and 96.71% for tri-script identification.
5 citations
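The KNN classification step named in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the 2-D vectors are hypothetical placeholders, not real SIFT descriptors (which are 128-dimensional per keypoint), and the abstract does not say how per-word descriptors are aggregated, so the helper `knn_predict` simply assumes one fixed-length vector per word image.

```python
import math
from collections import Counter


def knn_predict(train_vectors, train_labels, query, k=3):
    """Label the query by majority vote among its k nearest training vectors."""
    distances = sorted(
        (math.dist(vec, query), label)  # Euclidean distance to each training vector
        for vec, label in zip(train_vectors, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical placeholder features, three word images per script.
train_vectors = [
    (0.10, 0.20), (0.20, 0.10), (0.15, 0.25),   # English
    (0.90, 0.80), (0.80, 0.90), (0.85, 0.95),   # Kannada
    (0.10, 0.90), (0.20, 0.80), (0.15, 0.85),   # Devanagari
]
train_labels = ["English"] * 3 + ["Kannada"] * 3 + ["Devanagari"] * 3

predicted = knn_predict(train_vectors, train_labels, (0.12, 0.88))
```

The choice of `k` and of the distance measure are assumptions here; the abstract does not report the values used.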
••
01 Jan 2015
TL;DR: The paper applies minibatch stochastic gradient descent (SGD) based learning to a Multilayer Perceptron for isolated Devanagari handwritten character/numeral recognition.
Abstract: The paper applies minibatch stochastic gradient descent (SGD) based learning to a Multilayer Perceptron for isolated Devanagari handwritten character/numeral recognition. This technique reduces the variance in the estimate of the gradient and often makes better use of the hierarchical memory organization in modern computers. L2 weight decay is added to minibatch SGD to avoid overfitting. The experiments are first conducted with direct pixel intensity values as features and then with the proposed flexible zone-based gradient feature extraction algorithm. The results are promising on most of the standard datasets of Devanagari characters/numerals.
5 citations
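The update rule described in the abstract above, minibatch SGD with L2 weight decay, can be sketched on a much simpler model. The example below fits a one-variable linear model rather than a Multilayer Perceptron, and the function names, learning rate, decay strength, batch size, and synthetic data are all assumptions chosen for illustration, not values from the paper.

```python
import random


def mse(xs, ys, w, b):
    """Mean squared error of the linear model w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def minibatch_sgd(xs, ys, lr=0.1, weight_decay=1e-3, batch_size=4, epochs=200, seed=0):
    """Fit y = w * x + b with minibatch SGD and L2 weight decay on w."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # new minibatch partition every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            grad_w = grad_b = 0.0
            for i in batch:
                err = (w * xs[i] + b) - ys[i]
                grad_w += err * xs[i]
                grad_b += err
            # Averaging over the batch is what lowers gradient variance
            # compared with single-sample updates.
            grad_w = grad_w / len(batch) + weight_decay * w  # L2 penalty on the weight only
            grad_b /= len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Synthetic, noise-free data from y = 2x + 1 on [-1, 1].
xs = [i / 10 for i in range(-10, 11)]
ys = [2 * x + 1 for x in xs]
w, b = minibatch_sgd(xs, ys)
```

Because of the weight decay term, the fitted `w` settles slightly below the true slope; that small shrinkage is the regularization effect the abstract relies on to limit overfitting.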
••
16 Dec 2020
TL;DR: This article implements a machine learning approach to the classification of offline Odia handwritten characters using Naive Bayes and Decision Table classifiers in the Waikato Environment for Knowledge Analysis (WEKA).
Abstract: Optical Character Recognition (OCR) is currently a prominent technology for recognizing text inside images such as scanned documents and photos. OCR is used to convert virtually any kind of image containing written text (handwritten or printed) into machine-readable text data. Several studies have addressed the recognition of foreign scripts such as Chinese, English, and Japanese. India has 22 official languages, such as Marathi, Punjabi, Bangla, and Odia. Odia is one of the major languages spoken in Odisha, an eastern state of India. Several Indian scripts, such as Devanagari and Telugu, have been researched with good accuracy rates. There is a need for research on the languages of the eastern part of the country, such as Odia. This paper implements data preprocessing and a classification model for offline Odia handwritten characters with and without noise. The work builds a machine learning approach to the classification of offline Odia handwritten characters using Naive Bayes and Decision Table classifiers in the Waikato Environment for Knowledge Analysis (WEKA). It is observed that noise-free characters are classified better than noisy characters by both techniques, Naive Bayes and Decision Table.
5 citations