Journal ArticleDOI

Handwritten Numeral Databases of Indian Scripts and Multistage Recognition of Mixed Numerals

01 Mar 2009-IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE Computer Society)-Vol. 31, Iss: 3, pp 444-457
TL;DR: Pioneering development of two databases for handwritten numerals of the two most popular Indian scripts, a multistage cascaded recognition scheme using wavelet-based multiresolution representations and multilayer perceptron classifiers, and its application to the recognition of mixed handwritten numerals of three Indian scripts: Devanagari, Bangla and English.
Abstract: This article primarily concerns the problem of isolated handwritten numeral recognition of major Indian scripts. The principal contributions presented here are (a) pioneering development of two databases for handwritten numerals of two most popular Indian scripts, (b) a multistage cascaded recognition scheme using wavelet based multiresolution representations and multilayer perceptron classifiers and (c) application of (b) for the recognition of mixed handwritten numerals of three Indian scripts Devanagari, Bangla and English. The present databases include respectively 22,556 and 23,392 handwritten isolated numeral samples of Devanagari and Bangla collected from real-life situations and these can be made available free of cost to researchers of other academic Institutions. In the proposed scheme, a numeral is subjected to three multilayer perceptron classifiers corresponding to three coarse-to-fine resolution levels in a cascaded manner. If rejection occurred even at the highest resolution, another multilayer perceptron is used as the final attempt to recognize the input numeral by combining the outputs of three classifiers of the previous stages. This scheme has been extended to the situation when the script of a document is not known a priori or the numerals written on a document belong to different scripts. Handwritten numerals in mixed scripts are frequently found in Indian postal mails and table-form documents.
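The cascade described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the rejection criterion, so the confidence threshold `reject_below` is an assumption, and the names `cascade_classify`, `stage_classifiers` and `combiner` are hypothetical. Each classifier is any callable that returns one score per class; in the paper these would be trained multilayer perceptrons.

```python
from typing import Callable, List, Sequence, Tuple

Scores = List[float]
Classifier = Callable[[Sequence[float]], Scores]


def argmax(scores: Sequence[float]) -> int:
    """Index of the largest score (first one wins on ties)."""
    return max(range(len(scores)), key=scores.__getitem__)


def cascade_classify(
    features_by_level: Sequence[Sequence[float]],
    stage_classifiers: Sequence[Classifier],
    combiner: Classifier,
    reject_below: float = 0.9,  # assumed rejection rule: top score too low
) -> Tuple[int, int]:
    """Return (label, stage). stage == len(stage_classifiers) means the
    final combiner made the decision after every stage rejected."""
    collected: Scores = []
    # Walk the stages from the coarsest to the finest resolution.
    for stage, (features, clf) in enumerate(
        zip(features_by_level, stage_classifiers)
    ):
        scores = clf(features)
        collected.extend(scores)
        if max(scores) >= reject_below:
            return argmax(scores), stage  # accepted at this stage
    # Every stage rejected: the combiner sees all stage outputs at once.
    return argmax(combiner(collected)), len(stage_classifiers)


# Toy stand-ins for trained classifiers over two classes.
coarse = lambda x: [0.5, 0.5]
medium = lambda x: [0.95, 0.05]
fine = lambda x: [0.1, 0.9]
# Stand-in combiner: sums the per-class scores of the three stages.
sum_scores = lambda s: [sum(s[0::2]), sum(s[1::2])]

label, stage = cascade_classify([[0], [0], [0]], [coarse, medium, fine], sum_scores)
```

In this toy run the coarse stage rejects and the medium stage accepts, so the fine stage is never evaluated. That early exit is what makes a coarse-to-fine cascade cheaper than always classifying at the highest resolution.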
Citations
Journal ArticleDOI
TL;DR: An extremely fast learning algorithm called ELM for single hidden layer feedforward networks (SLFN), which randomly chooses the input weights and analytically determines the output weights of the SLFN, and which learns much faster than traditional learning algorithms for feedforward neural networks.
Abstract: This paper deals with the recognition of handwritten Malayalam characters using the wavelet energy feature (WEF) and the extreme learning machine (ELM). The wavelet energy (WE) is a new and robust parameter derived using the wavelet transform. It can reduce the influence of different types of noise at different levels. WEF can reflect the WE distribution of characters in several directions at different scales. For a non-oscillating pattern, the amplitudes of the wavelet coefficients increase when the scale of wavelet decomposition increases. The WE of different decomposition levels has different power to discriminate the character images. These features constitute patterns of handwritten characters for classification. The traditional learning algorithms of the different classifiers are far slower than required. So we have used an extremely fast learning algorithm called ELM for single hidden layer feedforward networks (SLFN), which randomly chooses the input weights and analytically determines the output weights of the SLFN. This algorithm learns much faster than traditional popular learning algorithms for feedforward neural networks. This combination of feature vector and classifier gave good recognition accuracy at level 6 of the wavelet decomposition.
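The ELM idea summarized above (random, untrained input weights; output weights obtained in closed form) can be sketched in plain Python. This is a minimal sketch under stated assumptions rather than the cited paper's code: the tanh activation, the weight range, the small ridge term, and the names `train_elm` and `predict_elm` are all illustrative choices. The output weights are computed as beta = H^T (H H^T + ridge * I)^-1 y, which for a tiny ridge approximates the minimum-norm least-squares solution when there are more hidden units than training samples.

```python
import math
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def hidden(x, weights, biases):
    """Hidden-layer activations for one input vector."""
    return [math.tanh(sum(w * v for w, v in zip(ws, x)) + b)
            for ws, b in zip(weights, biases)]


def train_elm(xs, ys, n_hidden=20, ridge=1e-8, seed=0):
    rng = random.Random(seed)
    dim = len(xs[0])
    # Step 1: input weights and biases are drawn at random and never trained.
    weights = [[rng.uniform(-3, 3) for _ in range(dim)] for _ in range(n_hidden)]
    biases = [rng.uniform(-3, 3) for _ in range(n_hidden)]
    # Step 2: hidden-layer output matrix H, one row per training sample.
    h = [hidden(x, weights, biases) for x in xs]
    n = len(xs)
    # Step 3: output weights in closed form via the n x n matrix H H^T + ridge I.
    gram = [[sum(p * q for p, q in zip(h[i], h[j])) + (ridge if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    alpha = solve(gram, list(ys))
    beta = [sum(alpha[i] * h[i][k] for i in range(n)) for k in range(n_hidden)]
    return weights, biases, beta


def predict_elm(model, x):
    weights, biases, beta = model
    return sum(b * a for b, a in zip(beta, hidden(x, weights, biases)))


# XOR is not linearly separable, yet no iterative training is needed.
xor_inputs = [[0, 0], [0, 1], [1, 0], [1, 1]]
xor_targets = [0.0, 1.0, 1.0, 0.0]
model = train_elm(xor_inputs, xor_targets)
```

Training reduces to one linear solve, which is why ELM is described as much faster than gradient-based training; the trade-off is that the hidden layer is never adapted to the data.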

175 citations


Cites background or methods from "Handwritten Numeral Databases of In..."

  • ...The state of the art classifiers include statistical classifier such as modified quadratic discriminant function (MQDF), neural classifiers such as multi layer perceptron (MLP), radial basis function (RBF) classifier, polynomial classifier (PC), learning vector quantization (LVQ) and support vector machine (SVM) [5]....

    [...]

  • ...In [5], the wavelet filter is applied three times, and the chain code histogram feature is extracted from each of the detailed image components DL/L (k) , k = 1, 2, 3....

    [...]

  • ...Ujjwal and Chaudhuri [5] presented a multistage cascaded recognition scheme using wavelet based multi resolution representations and MLP classifiers for the recognition of mixed handwritten numerals Bengla, Devanagari and English....

    [...]

Journal ArticleDOI
01 Nov 2011
TL;DR: In this paper, the state of the art from 1970s of machine printed and handwritten Devanagari optical character recognition (OCR) is discussed in various sections of the paper.
Abstract: In India, more than 300 million people use Devanagari script for documentation. There has been a significant improvement in the research related to the recognition of printed as well as handwritten Devanagari text in the past few years. The state of the art since the 1970s in machine-printed and handwritten Devanagari optical character recognition (OCR) is discussed in this paper. All feature-extraction techniques as well as training, classification and matching techniques useful for the recognition are discussed in various sections of the paper. An attempt is made to address the most important results reported so far and to highlight the beneficial directions of the research to date. Moreover, the paper also contains a comprehensive bibliography of selected papers that appeared in reputed journals and conference proceedings, as an aid for researchers working in the field of Devanagari OCR.

159 citations


Cites background or methods from "Handwritten Numeral Databases of In..."

  • ...Pal and Chaudhuri [12] and [99] also proposed a suffix- and prefix-based error correction technique, which can take care of different inflectional languages....

    [...]

  • ...To get an idea about the occurrence frequency of different Devanagari characters, Chaudhuri and Pal [5] provided the occurrence statistics of 20 frequent characters in Devanagari script, as shown in Table I, based on a study of three million words....

    [...]

  • ...An approach based on the detection of “shirorekha” is proposed by Chaudhuri and Pal [13] with the assumption that the skew of such header lines show the skew of the whole document....

    [...]

  • ...Bhattacharya and Chaudhuri [2] use a distinct MLP classifier at each stage of their recognition scheme for handwritten numerals....

    [...]

Journal ArticleDOI
TL;DR: This review article serves the purpose of presenting state of the art results and techniques on OCR and also provides research directions by highlighting research gaps.
Abstract: Given the ubiquity of handwritten documents in human transactions, Optical Character Recognition (OCR) of documents has invaluable practical worth. Optical character recognition is a science that enables the translation of various types of documents or images into analyzable, editable and searchable data. During the last decade, researchers have used artificial intelligence/machine learning tools to automatically analyze handwritten and printed documents in order to convert them into electronic format. The objective of this review paper is to summarize research that has been conducted on character recognition of handwritten documents and to provide research directions. In this Systematic Literature Review (SLR) we collected, synthesized and analyzed research articles on the topic of handwritten OCR (and closely related topics) which were published between 2000 and 2019. We searched widely used electronic databases following a pre-defined review protocol. Articles were searched using keywords, forward reference searching and backward reference searching in order to find all the articles related to the topic. After carefully following the study selection process, 176 articles were selected for this SLR. This review article serves the purpose of presenting state of the art results and techniques on OCR and also provides research directions by highlighting research gaps.

139 citations


Cites background or methods from "Handwritten Numeral Databases of In..."

  • ...multilayer perceptron classifier gave better accuracy on Devanagri, and Bangla numerals [25], [140] but gave...

    [...]

  • ...Another research carried out on Hindi numerals [25] used a relatively large dataset of 22,556 isolated numeral samples of Devanagari and 23,392 samples of Bangla scripts....

    [...]

Journal ArticleDOI
TL;DR: An off-line handwritten alphabetical character recognition system using multilayer feed forward neural network that will be suitable for converting handwritten documents into structural text form and recognizing handwritten names is described in the paper.
Abstract: An off-line handwritten alphabetical character recognition system using multilayer feed forward neural network is described in the paper. A new method, called, diagonal based feature extraction is introduced for extracting the features of the handwritten alphabets. Fifty data sets, each containing 26 alphabets written by various people, are used for training the neural network and 570 different handwritten alphabetical characters are used for testing. The proposed recognition system performs quite well yielding higher levels of recognition accuracy compared to the systems employing the conventional horizontal and vertical methods of feature extraction. This system will be suitable for converting handwritten documents into structural text form and recognizing handwritten names.

135 citations


Cites methods from "Handwritten Numeral Databases of In..."

  • ...KEYWORDS Handwritten character recognition, Image processing, Feature extraction, feed forward neural networks....

    [...]

Journal ArticleDOI
TL;DR: The Histogram of Oriented Gradient (HOG) is extended and two new feature descriptors are proposed, Co-occurrence HOG (Co-HOG) and Convolutional Co-HOG (ConvCo-HOG), for accurate recognition of scene texts of different languages.

130 citations

References
Proceedings ArticleDOI
31 Aug 2005
TL;DR: The wavelet transform is used to obtain a multi-resolution representation of each input character image for handwritten character recognition, producing a 99.10% correct recognition rate on the test set of a Bangla numeral database.
Abstract: In the handwritten character recognition problem, the input images are often affected by distortions and noise. Thus such images at different resolutions include different variations in the input data. In the present work, we considered the wavelet transform to obtain a multi-resolution representation of each input character image. At each resolution level, we considered three MLPs with different numbers of nodes in their hidden layers and combined the outputs produced by all the MLPs of the whole ensemble by using the weighted sum rule, the product rule and majority voting. The set of misclassified samples produced by one combination rule is neither a subset nor a superset of a similar set produced by another rule. So, majority voting has been used for the second and final round to produce final outputs after combining the results of the three combinations of the first stage. The proposed approach produced a 99.10% correct recognition rate on the test set of a Bangla (a major Indian script) numeral database.
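The two-round combination described in this abstract can be illustrated with a short sketch. It assumes each MLP outputs one non-negative score per class; the function names, the equal weights and the toy scores are hypothetical and not taken from the paper.

```python
from collections import Counter
from math import prod


def argmax(scores):
    """Index of the largest score (first one wins on ties)."""
    return max(range(len(scores)), key=scores.__getitem__)


def weighted_sum_rule(outputs, weights):
    """Pick the class with the largest weighted sum of classifier scores."""
    n_classes = len(outputs[0])
    return argmax([sum(w * out[c] for w, out in zip(weights, outputs))
                   for c in range(n_classes)])


def product_rule(outputs):
    """Pick the class with the largest product of classifier scores."""
    n_classes = len(outputs[0])
    return argmax([prod(out[c] for out in outputs) for c in range(n_classes)])


def majority_vote(labels):
    """Most frequent label; ties go to the label encountered first."""
    return Counter(labels).most_common(1)[0][0]


# Toy outputs of three classifiers over three classes.
outputs = [[0.3, 0.4, 0.3],
           [0.1, 0.5, 0.4],
           [0.1, 0.3, 0.6]]

# Round 1: three different combination rules, which may disagree.
by_sum = weighted_sum_rule(outputs, weights=[1.0, 1.0, 1.0])
by_product = product_rule(outputs)
by_vote = majority_vote([argmax(out) for out in outputs])

# Round 2: majority voting over the three first-round decisions.
final = majority_vote([by_sum, by_product, by_vote])
```

Here the sum and product rules select class 2 while plain voting selects class 1, so the second round resolves the disagreement in favour of class 2. This mirrors the abstract's observation that the rules misclassify different samples.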

14 citations


Proceedings Article
01 Jan 2002
TL;DR: A novel off-line handprinted Bangla (a major Indian script) numeral recognition scheme using a multistage classifier system comprising multilayer perceptron (MLP) neural networks is proposed.
Abstract: This paper proposes a novel off-line handprinted Bangla (a major Indian script) numeral recognition scheme using a multistage classifier system comprising multilayer perceptron (MLP) neural networks. In this scheme we consider multiresolution features based on wavelet transforms. We start from a certain coarse resolution level of the wavelet representation, and if rejection occurs at this level of the classifier, the input pattern is passed to a larger MLP network corresponding to the next higher resolution level. For simplicity and efficiency we considered only three coarse-to-fine resolution levels in the present work. The system was trained and tested on a database of 9000 samples of handprinted Bangla numerals. For improved generalization and to avoid overtraining, the whole available data set was divided into three subsets: training set, validation set and test set. We achieved 94.96% and 93.025% correct recognition rates on the training and test sets respectively. The proposed recognition scheme is robust with respect to various writing styles and sizes as well as the presence of considerable noise. Moreover, the present scheme is sufficiently fast for real-life applications.
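To make the coarse-to-fine representation concrete, here is a minimal sketch of a three-level multiresolution pyramid. The abstract does not name the wavelet, so the Haar wavelet is an assumption; the sketch keeps only the low-pass (approximation) band, computed as 2x2 block averages, which equals the orthonormal Haar approximation band up to a constant scale factor. The function names are hypothetical.

```python
def haar_approximation(image):
    """One level of Haar low-pass filtering: average each 2x2 block.

    `image` is a list of equal-length rows; a trailing odd row or
    column is ignored.
    """
    rows, cols = len(image), len(image[0])
    return [[(image[r][c] + image[r][c + 1]
              + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
             for c in range(0, cols - 1, 2)]
            for r in range(0, rows - 1, 2)]


def coarse_to_fine(image, levels=3):
    """Approximation images ordered from coarsest to finest resolution."""
    pyramid = []
    current = image
    for _ in range(levels):
        current = haar_approximation(current)
        pyramid.append(current)
    return pyramid[::-1]  # coarsest first, matching the cascade order


# An 8x8 ramp image gives approximations of size 1x1, 2x2 and 4x4.
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
pyramid = coarse_to_fine(image)
```

The coarsest image is the cheapest to classify, so a cascade can start there and move to the larger inputs only when the classifier rejects the pattern.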

13 citations


Proceedings ArticleDOI
11 Dec 2005
TL;DR: A study showing how the recognition performance of an MLP based classifier varies with variation in the training set size is presented in this paper.
Abstract: A study showing how the recognition performance of an MLP based classifier varies with variation in the training set size is presented in this paper. The training set for the work is formed with samples of handwritten Bangla numerals. For recognition of handwritten Bangla numerals, we have used directional features extracted from the contour of each numeral. To extract these features, the minimum bounding box containing the image of each numeral is first segmented into a few blocks and then the direction code histogram is computed for each of these blocks. Peak values of each such histogram are considered as the feature values of the corresponding blocks. Considering all the blocks, a 100 element feature set is formed for representation of each image pattern, and a database of 12000 numerals is used for the same.
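A block-wise direction code histogram of the kind described above might be computed as in the following sketch. Several details are assumptions because the abstract does not give them: the contour is taken as an already-traced, ordered, closed list of 8-connected pixel coordinates, directions use the 8-way Freeman chain code, and the block grid size is a free parameter. Taking the single largest bin per block is only one possible reading of "peak values", and the sketch does not attempt to reproduce the 100-element feature set.

```python
# Freeman 8-direction chain codes keyed by (dx, dy); y grows downward.
DIRECTIONS = {(1, 0): 0, (1, -1): 1, (0, -1): 2, (-1, -1): 3,
              (-1, 0): 4, (-1, 1): 5, (0, 1): 6, (1, 1): 7}


def direction_histograms(contour, grid=2):
    """Per-block histograms of chain codes along a closed contour.

    Returns hist[block_row][block_col][code] as step counts, where each
    step is assigned to the block containing its starting pixel.
    """
    xs = [x for x, _ in contour]
    ys = [y for _, y in contour]
    x0, y0 = min(xs), min(ys)
    width = max(xs) - x0 + 1   # minimum bounding box size
    height = max(ys) - y0 + 1
    hist = [[[0] * 8 for _ in range(grid)] for _ in range(grid)]
    # Pair every contour pixel with its successor, wrapping at the end.
    for (x, y), (nx, ny) in zip(contour, contour[1:] + contour[:1]):
        code = DIRECTIONS[(nx - x, ny - y)]
        block_col = (x - x0) * grid // width
        block_row = (y - y0) * grid // height
        hist[block_row][block_col][code] += 1
    return hist


def peak_features(hist):
    """One value per block: the largest bin of that block's histogram."""
    return [max(block) for row in hist for block in row]


# Outline of a 4x4 square, traced clockwise from the top-left corner.
square = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2),
          (3, 3), (2, 3), (1, 3), (0, 3), (0, 2), (0, 1)]
features = peak_features(direction_histograms(square))
```

Splitting the bounding box into blocks keeps coarse positional information that a single global histogram would lose, which matters for numerals that share stroke directions but differ in where those strokes occur.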

11 citations

Book ChapterDOI
Fumitaka Kimura
01 Jan 2007

9 citations


"Handwritten Numeral Databases of In..." refers background in this paper

  • ...Handwritten numeral samples from the two databases (10 samples per class are shown)....

    [...]