scispace - formally typeset
Topic

Devanagari

About: Devanagari is a research topic. Over its lifetime, 655 publications have appeared within this topic, receiving 7,428 citations. The topic is also known as Deva nagari and Hindi Script.


Papers
Proceedings Article
05 Dec 2014
TL;DR: The authors used back transliteration to reduce spelling variations, and a set of hand-tailored rules for consonant mapping to take care of breaking and joining of transliterated words, and implemented query labeling of mixed script content using a supervised learning approach where an SVM classifier was trained using character n-grams as features for language identification.
Abstract: Much user-generated content on the internet is written in transliterated form rather than in its native script. As a result, search engines receive a large number of transliterated search queries. This paper presents our approach to labelling queries and to ad hoc retrieval of documents based on these queries, as part of the FIRE2014 shared task on transliterated search. The content of each document is written in the native Devanagari script, in its transliterated form in Roman script, or in a combination of both. The queries used to retrieve these documents can also be in mixed script. The task is challenging primarily because of the spelling variations that occur in the transliterated form of search queries. This problem is addressed by using back transliteration to reduce spelling variations, together with a set of hand-tailored rules for consonant mapping. Sub-word indexing is used to handle the breaking and joining of transliterated words. Query labelling of the mixed-script content was implemented with a supervised learning approach in which an SVM classifier was trained using character n-grams as features for language identification. A Naive Bayes classifier was used for classifying transliterated words that can belong to both Hindi and English when looked at individually. The two runs submitted by our team (BITS-Lipyantaran) performed best across all metrics for Subtask 2 among all participating teams, with an MRR score of 0.8171 and a MAP score of 0.6421.

5 citations
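The character n-gram idea in the abstract above can be illustrated with a small sketch. This is not the authors' system: the abstract describes an SVM, whereas the example below swaps in a tiny Naive Bayes classifier so that it runs with the Python standard library alone. The word lists, the labels `hi` and `en`, and the names `char_ngrams` and `NgramNaiveBayes` are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def char_ngrams(word, n_min=1, n_max=3):
    """Return character n-grams of a word padded with boundary markers."""
    padded = f"^{word.lower()}$"
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]


class NgramNaiveBayes:
    """Multinomial Naive Bayes over character n-grams (stand-in for the SVM)."""

    def __init__(self):
        self.counts = {}        # label -> Counter of n-gram frequencies
        self.totals = {}        # label -> total number of n-grams seen
        self.words = Counter()  # label -> number of training words
        self.vocab = set()      # all n-grams seen during training

    def fit(self, words, labels):
        for word, label in zip(words, labels):
            grams = char_ngrams(word)
            self.counts.setdefault(label, Counter()).update(grams)
            self.words[label] += 1
            self.vocab.update(grams)
        self.totals = {lab: sum(c.values()) for lab, c in self.counts.items()}
        return self

    def predict(self, word):
        vocab_size = len(self.vocab)
        n_words = sum(self.words.values())

        def log_score(label):
            # Log prior plus add-one-smoothed log likelihood of each n-gram.
            score = math.log(self.words[label] / n_words)
            for gram in char_ngrams(word):
                score += math.log((self.counts[label][gram] + 1)
                                  / (self.totals[label] + vocab_size))
            return score

        return max(self.counts, key=log_score)


# Hypothetical toy data: romanized Hindi words versus English words.
hindi = ["namaste", "dhanyavaad", "kitaab", "paani", "ladka", "khaana"]
english = ["hello", "thanks", "book", "water", "boy", "food"]
model = NgramNaiveBayes().fit(hindi + english,
                              ["hi"] * len(hindi) + ["en"] * len(english))

for query_word in ["paani", "book"]:
    print(query_word, model.predict(query_word))
```

With a real corpus, the same n-gram features could be fed to an SVM as the abstract describes; the toy lists here are far too small to generalize to unseen words.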

Proceedings Article
01 Sep 2017
TL;DR: Script identification based on the Scale-Invariant Feature Transform (SIFT) is proposed; the reported overall accuracy is 97.65% for bi-script and 96.71% for tri-script identification.
Abstract: Automatic identification of scripts from document images helps in selecting an appropriate OCR for character recognition and content retrieval. In this paper, script identification based on the Scale-Invariant Feature Transform (SIFT) is proposed. Features are extracted with the SIFT approach at word level (words of two, three, or more characters), and a KNN classifier is used to recognize the script. Experiments are performed by extracting words from document images containing English, Kannada, and Devanagari scripts. The overall accuracy reported for the proposed system is 97.65% for bi-script and 96.71% for tri-script identification.

5 citations

Journal Article
01 Jan 2015
TL;DR: The paper applies minibatch stochastic gradient descent (SGD) based learning to a Multilayer Perceptron for isolated Devanagari handwritten character/numeral recognition.
Abstract: The paper applies minibatch stochastic gradient descent (SGD) based learning to a Multilayer Perceptron in the domain of isolated Devanagari handwritten character/numeral recognition. This technique reduces the variance in the estimate of the gradient and often makes better use of the hierarchical memory organization in modern computers. L2 weight decay is added to minibatch SGD to avoid overfitting. The experiments are first conducted with the direct pixel intensity values as features. After that, the experiments are performed with the proposed flexible zone based gradient feature extraction algorithm. The results are promising on most of the standard datasets of Devanagari characters/numerals.

5 citations
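The minibatch SGD update with L2 weight decay mentioned in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: it fits a one-dimensional linear model instead of a Multilayer Perceptron, and the synthetic data, hyperparameters, and function names (`minibatch_sgd`, `mse`) are hypothetical.

```python
import random


def mse(xs, ys, w, b):
    """Mean squared error of the linear model y = w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def minibatch_sgd(xs, ys, lr=0.05, weight_decay=1e-3,
                  batch_size=4, epochs=200, seed=0):
    """Fit y = w * x + b with minibatch SGD and L2 weight decay on w."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # new minibatch composition every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Average the gradient over the minibatch; averaging several
            # examples lowers the variance of the gradient estimate.
            grad_w = sum((w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum((w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            # Weight decay adds weight_decay * w to the gradient of w,
            # shrinking the weight toward zero to discourage overfitting.
            w -= lr * (grad_w + weight_decay * w)
            b -= lr * grad_b
    return w, b


# Hypothetical synthetic data drawn from y = 2x + 1 on [-1, 1].
xs = [i / 10 for i in range(-10, 11)]
ys = [2 * x + 1 for x in xs]

w, b = minibatch_sgd(xs, ys)
print("w =", round(w, 3), "b =", round(b, 3))
print("loss before:", mse(xs, ys, 0.0, 0.0), "loss after:", mse(xs, ys, w, b))
```

In a Multilayer Perceptron the same update would be applied to every weight matrix, with the gradients obtained by backpropagation; the decay term slightly biases the learned weight toward zero, which is why the fitted `w` ends up marginally below 2.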

Proceedings Article
16 Dec 2020
TL;DR: In this article, a narrative machine learning algorithm for classification of Odia handwritten characters using Naive Bayes and Decision Table in Waikato Environment for Knowledge Analysis (WEKA) environment has been implemented.
Abstract: Optical Character Recognition (OCR) is a prominent current technology for recognizing text inside images such as scanned documents and photos. OCR technology is used to convert virtually any kind of image containing written text (handwritten or printed) into machine-readable text data. Considerable research has been done on the recognition of scripts of foreign languages such as Chinese, English, and Japanese. India has 22 official languages, such as Marathi, Punjabi, Bangla, and Odia. Odia is one of the most widely spoken languages of Odisha, a premier eastern state of India. Several Indian scripts, such as Devanagari and Telugu, have been researched and have yielded good accuracy rates. There is a need for research on the languages of the eastern part of the country, such as Odia. This paper implements data preprocessing and a classification model for offline Odia handwritten characters with and without noise. The work aims to build a narrative machine learning algorithm for classification of offline Odia handwritten characters using Naive Bayes and Decision Table in the Waikato Environment for Knowledge Analysis (WEKA). Noiseless characters were observed to be classified better than noisy characters by both classification techniques, Naive Bayes and Decision Table.

5 citations


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 77% related
Support vector machine: 73.6K papers, 1.7M citations, 75% related
Image segmentation: 79.6K papers, 1.8M citations, 74% related
Convolutional neural network: 74.7K papers, 2M citations, 74% related
Encryption: 98.3K papers, 1.4M citations, 73% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    42
2022    98
2021    48
2020    61
2019    38
2018    43