Topic

Devanagari

About: Devanagari is a research topic. Over its lifetime, 655 publications have been published within this topic, receiving 7428 citations. The topic is also known as: Deva nagari & Hindi Script.


Papers
Journal ArticleDOI
TL;DR: Techniques such as particle swarm optimization and support vector machines are implemented and compared. Classification methods based on learning from examples have been widely applied to character recognition since the 1990s and have brought significant improvements in recognition accuracy.

8 citations

01 Jan 2016
TL;DR: The research in document recognition is extended from modern Latin scripts to Old Latin, Greek, and other "under-privileged" scripts such as Devanagari and Urdu Nastaleeq, to address the challenge of OCR of historical documents.
Abstract: The task of printed Optical Character Recognition (OCR), though considered "solved" by many, still poses several challenges. The complex grapheme structure of many scripts, such as Devanagari and Urdu Nastaleeq, greatly lowers the performance of state-of-the-art OCR systems. Moreover, the digitization of historical and multilingual documents still requires much probing. The lack of benchmark datasets further complicates the development of reliable OCR systems. This thesis aims to answer some of these challenges using contemporary machine learning technologies. Specifically, Long Short-Term Memory (LSTM) networks have been employed to OCR modern as well as historical monolingual documents. The excellent OCR results obtained on these have led us to extend their application to multilingual documents. The first major contribution of this thesis is to demonstrate the usability of LSTM networks for monolingual documents. The LSTM networks yield very good OCR results on various modern and historical scripts, without using sophisticated features and post-processing techniques. The set of modern scripts includes modern English, Urdu Nastaleeq and Devanagari. To address the challenge of OCR of historical documents, this thesis focuses on Old German Fraktur script, medieval Latin script of the 15th century, and Polytonic Greek script. LSTM-based systems outperform the contemporary OCR systems on all of these scripts. To cater for the lack of ground-truth data, this thesis proposes a new methodology, combining segmentation-based and segmentation-free OCR approaches, to OCR scripts for which no transcribed training data is available. Another major contribution of this thesis is the development of a novel multilingual OCR system. A unified framework for dealing with different types of multilingual documents has been proposed.
The core motivation behind this generalized framework is the human ability to read multilingual documents without any explicit script identification. In this design, the LSTM networks recognize multiple scripts simultaneously without the need to identify different scripts. The first step in building this framework is the realization of a language-independent OCR system which recognizes multilingual text in a single step. This language-independent approach is then extended to script-independent OCR that can recognize multiscript documents using a single OCR model. The proposed generalized approach yields a low error rate (1.2%) on a test corpus of English-Greek bilingual documents. In summary, this thesis aims to extend the research in document recognition from modern Latin scripts to Old Latin, Greek, and other "under-privileged" scripts such as Devanagari and Urdu Nastaleeq. It also attempts to add a different perspective on dealing with multilingual documents.
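
The abstract above reports an OCR error rate without saying how it is computed. A common convention, assumed here and not confirmed by the source, is the character error rate: the edit distance between the reference transcription and the OCR output, divided by the reference length. The function names below are illustrative only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, rc in enumerate(ref, 1):
        cur = [i]
        for j, hc in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (rc != hc)))  # substitution or match
        prev = cur
    return prev[-1]


def char_error_rate(ref, hyp):
    """Edit distance normalised by the reference length (assumed metric)."""
    if not ref:
        raise ValueError("reference transcription must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

Python strings are sequences of Unicode code points, so for Devanagari a dropped vowel sign (matra) counts as one error on its own. A system that scores whole grapheme clusters instead would report different numbers for the same output.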

8 citations

Journal ArticleDOI
TL;DR: This paper presents a review of Urdu handwritten character recognition methods with special reference to neural networks and includes information regarding the various operations that may be performed on the image for the recognition of Urdu characters.
Abstract: Character recognition, one of the most interesting and attractive areas of pattern recognition and artificial intelligence, has received additional attention during the last decade due to its wide range of applications. It contributes immensely to computerization and to enhancing man-machine interaction in many applications. It is the art of detecting and recognizing characters in an input image and converting them into ASCII or another machine-editable form. There are four main phases of character recognition: data acquisition and preprocessing, segmentation, feature extraction, and classification. Several research studies have been carried out on the recognition of scripts such as Chinese, Japanese, English and Devanagari, but research on Urdu script is still immature due to the cursive, variable and overlapping nature of Urdu characters and the variety of writing styles. Research studies on printed Urdu characters have shown good recognition rates, but handwritten Urdu script recognition is still an open and challenging area for researchers. This paper presents a review of Urdu handwritten character recognition methods with special reference to neural networks and includes information regarding the various operations that may be performed on the image for the recognition of Urdu characters. In the literature, B-spline curves have not yet been applied in combination with neural networks for Urdu script recognition. The current research work intends to use B-spline curves for feature extraction with a feed-forward neural network as the classifier, and focuses on isolated characters in the offline domain.

8 citations

Proceedings ArticleDOI
01 Nov 2014
TL;DR: A text and script independent method is proposed for identification of writer for handwritten scripts/languages using correlation and homogeneity properties of Gray Level Co-occurrence Matrices of the handwritten document images.
Abstract: If a set of writers can write in more than one script/language, identifying such writers is a difficult and challenging research problem. One method is to design a script-independent writer identification algorithm to identify the writer of the underlying handwritten document. Hence a text- and script-independent method is proposed for identification of the writer of handwritten scripts/languages using correlation and homogeneity properties of Gray Level Co-occurrence Matrices of the handwritten document images. A feature vector of size 40 is obtained from each input handwritten document image. Handwritten documents are collected from the same 100 writers in Roman, Kannada and Devanagari scripts. Using a nearest neighbor classifier with modified 4-fold cross validation, the results for writer identification are obtained. Identification accuracies are 82.75%, 82.75% and 85.25% when the handwritten documents are in only one script, Roman, Kannada and Devanagari respectively. The writer identification rates are 80.6250%, 83.75% and 84% respectively for Roman-Kannada, Roman-Devanagari and Kannada-Devanagari handwritten input documents. The writer identification rate is 82.1995% for input documents in Roman-Kannada-Devanagari.
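
The feature pipeline described above can be sketched roughly as follows: build a gray-level co-occurrence matrix (GLCM) for a pixel offset, read off its correlation and homogeneity, and assign a query document to the writer whose feature vector is nearest. This is a minimal sketch, not the authors' implementation. The offsets in `OFFSETS`, the number of gray levels, the inverse-difference form of homogeneity, and all function names are assumptions; the abstract does not say how its 40 features are composed.

```python
import math

def glcm(img, dr, dc, levels):
    """Symmetric, normalised co-occurrence matrix for the offset (dr, dc)."""
    counts = [[0] * levels for _ in range(levels)]
    h, w = len(img), len(img[0])
    for r in range(h):
        for c in range(w):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < h and 0 <= c2 < w:
                a, b = img[r][c], img[r2][c2]
                counts[a][b] += 1
                counts[b][a] += 1  # count the pair in both directions
    total = sum(map(sum, counts)) or 1
    return [[v / total for v in row] for row in counts]

def homogeneity(p):
    """Inverse-difference form; other definitions square the difference."""
    n = len(p)
    return sum(p[i][j] / (1 + abs(i - j)) for i in range(n) for j in range(n))

def correlation(p):
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    var_i = sum((i - mu_i) ** 2 * v for i, _, v in cells)
    var_j = sum((j - mu_j) ** 2 * v for _, j, v in cells)
    if var_i == 0 or var_j == 0:
        return 0.0  # arbitrary choice for a constant image
    cov = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
    return cov / math.sqrt(var_i * var_j)

# Hypothetical offsets: right, down, and the two diagonals.
OFFSETS = [(0, 1), (1, 0), (1, 1), (1, -1)]

def glcm_features(img, levels, offsets=OFFSETS):
    """Two features (correlation, homogeneity) per offset."""
    feats = []
    for dr, dc in offsets:
        p = glcm(img, dr, dc, levels)
        feats += [correlation(p), homogeneity(p)]
    return feats

def nearest_writer(query, gallery):
    """1-nearest-neighbour over a {writer_id: feature_vector} mapping."""
    return min(gallery, key=lambda w: math.dist(query, gallery[w]))
```

Because the features are computed from pixel statistics and not from recognised characters, the same code applies unchanged to Roman, Kannada or Devanagari pages, which is what makes an approach of this kind script independent.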

8 citations

Journal ArticleDOI
31 Mar 2016
TL;DR: A new approach for Devanagari handwritten character/digit recognition has been proposed and tested on a large handwritten character and numeral database; empirical results reveal that the proposed method yields very good accuracy.
Abstract: Automated offline handwritten character recognition of Devanagari script is a growing area of research in the field of pattern recognition. A new approach for Devanagari handwritten character/digit recognition is proposed in this paper. This approach employs the Uniform Local Binary Pattern (ULBP) operator as the feature extraction method. This operator performs well in research areas such as texture classification and object recognition, but it has not been used for the Devanagari handwritten character/digit recognition problem. The proposed method extracts both local and global features. The proposed method has two steps. In the first step, the image is preprocessed to remove noise, converted to a binary image, and resized to a fixed size of 48x48. In the second step, the ULBP operator is applied to the whole image to extract global features; the input image is then divided into 9 blocks and the ULBP operator is applied to each block to extract local features. Finally, the global and local features are used to train a Support Vector Machine (SVM). The proposed method has been tested on a large handwritten character and numeral database, and empirical results reveal that it yields very good accuracy (98.77%). To establish the superiority of the proposed method, it has also been compared with contemporary algorithms. The comparative analysis shows that the proposed method outperforms the existing methods.
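
The second step described above (one global ULBP histogram plus one per block) can be sketched as below. This is an illustration under stated assumptions, not the paper's code: it uses the standard 8-neighbour, radius-1 operator with 58 uniform bins plus one shared bin for non-uniform patterns, and a 3x3 grid for the 9 blocks. The abstract confirms neither the neighbourhood nor the block layout, and all names are hypothetical.

```python
def transitions(code):
    """Number of circular 0/1 transitions in an 8-bit pattern."""
    bits = [(code >> i) & 1 for i in range(8)]
    return sum(bits[i] != bits[(i + 1) % 8] for i in range(8))

# Each uniform pattern (at most 2 transitions) gets its own bin;
# all non-uniform patterns share the last bin.
UNIFORM = [c for c in range(256) if transitions(c) <= 2]
LABEL = {c: i for i, c in enumerate(UNIFORM)}
N_BINS = len(UNIFORM) + 1

# 8 neighbours at radius 1, clockwise from the top-left pixel.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]

def ulbp_histogram(img, r0, r1, c0, c1):
    """Normalised ULBP histogram over rows [r0, r1) and columns [c0, c1).

    Pixels on the outer border of the image are skipped because they
    lack a full neighbourhood.
    """
    h, w = len(img), len(img[0])
    hist = [0] * N_BINS
    for r in range(max(r0, 1), min(r1, h - 1)):
        for c in range(max(c0, 1), min(c1, w - 1)):
            code = 0
            for i, (dr, dc) in enumerate(NEIGHBOURS):
                if img[r + dr][c + dc] >= img[r][c]:
                    code |= 1 << i
            hist[LABEL.get(code, N_BINS - 1)] += 1
    total = sum(hist) or 1
    return [v / total for v in hist]

def ulbp_features(img, grid=3):
    """Global histogram followed by one histogram per grid block."""
    h, w = len(img), len(img[0])
    feats = ulbp_histogram(img, 0, h, 0, w)  # global features
    for i in range(grid):
        for j in range(grid):
            feats += ulbp_histogram(img,
                                    i * h // grid, (i + 1) * h // grid,
                                    j * w // grid, (j + 1) * w // grid)
    return feats
```

With these assumptions a 48x48 binary image yields 10 histograms of 59 bins each, a 590-dimensional vector. Such vectors could then be passed to any SVM implementation for the training step the abstract mentions.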

8 citations


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 77% related
Support vector machine: 73.6K papers, 1.7M citations, 75% related
Image segmentation: 79.6K papers, 1.8M citations, 74% related
Convolutional neural network: 74.7K papers, 2M citations, 74% related
Encryption: 98.3K papers, 1.4M citations, 73% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    42
2022    98
2021    48
2020    61
2019    38
2018    43