Journal ArticleDOI

Handwritten Numeral Databases of Indian Scripts and Multistage Recognition of Mixed Numerals

01 Mar 2009-IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE Computer Society)-Vol. 31, Iss: 3, pp 444-457
TL;DR: Pioneering development of two databases for handwritten numerals of the two most popular Indian scripts, a multistage cascaded recognition scheme using wavelet-based multiresolution representations and multilayer perceptron classifiers, and application of that scheme to the recognition of mixed handwritten numerals of three Indian scripts: Devanagari, Bangla and English.
Abstract: This article primarily concerns the problem of isolated handwritten numeral recognition of major Indian scripts. The principal contributions presented here are (a) pioneering development of two databases for handwritten numerals of the two most popular Indian scripts, (b) a multistage cascaded recognition scheme using wavelet-based multiresolution representations and multilayer perceptron classifiers and (c) application of (b) to the recognition of mixed handwritten numerals of three Indian scripts: Devanagari, Bangla and English. The databases include 22,556 and 23,392 isolated handwritten numeral samples of Devanagari and Bangla, respectively, collected from real-life situations; these can be made available free of cost to researchers at other academic institutions. In the proposed scheme, a numeral is subjected to three multilayer perceptron classifiers corresponding to three coarse-to-fine resolution levels in a cascaded manner. If rejection occurs even at the highest resolution, another multilayer perceptron is used as the final attempt to recognize the input numeral by combining the outputs of the three classifiers of the previous stages. This scheme has been extended to the situation when the script of a document is not known a priori or the numerals written on a document belong to different scripts. Handwritten numerals in mixed scripts are frequently found in Indian postal mail and table-form documents.
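The cascade described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the classifiers are passed in as plain callables standing in for trained multilayer perceptrons, and the rejection rule (a threshold on the largest output score), the name `cascade_predict` and the parameter `reject_threshold` are assumptions made only for illustration.

```python
from typing import Callable, Sequence, Tuple

import numpy as np

# A classifier maps a feature vector to a vector of class scores.
# In the paper these are MLPs; here any callable will do (assumption).
Classifier = Callable[[np.ndarray], np.ndarray]


def cascade_predict(
    features_by_level: Sequence[np.ndarray],
    stage_classifiers: Sequence[Classifier],
    combiner: Classifier,
    reject_threshold: float = 0.9,  # hypothetical rejection criterion
) -> Tuple[int, int]:
    """Return (predicted_class, stage_index).

    Stages run from the coarsest to the finest resolution. A stage accepts
    when its top score reaches the threshold; otherwise the input is
    rejected and passed on. If every stage rejects, the combiner receives
    the concatenated outputs of all stages and makes the final attempt;
    that case is reported with stage_index == len(stage_classifiers).
    """
    outputs = []
    for stage, (feats, clf) in enumerate(zip(features_by_level, stage_classifiers)):
        scores = np.asarray(clf(feats), dtype=float)
        outputs.append(scores)
        if scores.max() >= reject_threshold:
            return int(scores.argmax()), stage
    combined = np.asarray(combiner(np.concatenate(outputs)), dtype=float)
    return int(combined.argmax()), len(stage_classifiers)
```

With three stage classifiers, a confident coarse-level decision returns at stage 0 without evaluating the finer levels, which is the point of a coarse-to-fine cascade.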
Citations
Journal ArticleDOI
TL;DR: An extremely fast learning algorithm called ELM for single hidden layer feed forward networks (SLFN), which randomly chooses the input weights and analytically determines the output weights of the SLFN, and which learns much faster than traditional popular learning algorithms for feed forward neural networks.
Abstract: This paper deals with the recognition of handwritten Malayalam characters using the wavelet energy feature (WEF) and the extreme learning machine (ELM). The wavelet energy (WE) is a new and robust parameter, derived using the wavelet transform. It can reduce the influence of different types of noise at different levels. WEF can reflect the WE distribution of characters in several directions at different scales. For a non-oscillating pattern, the amplitudes of the wavelet coefficients increase as the scale of the wavelet decomposition increases. The WE of different decomposition levels has different power to discriminate the character images. These features constitute patterns of handwritten characters for classification. The traditional learning algorithms of the different classifiers are far slower than required. So we have used an extremely fast learning algorithm called ELM for single hidden layer feed forward networks (SLFN), which randomly chooses the input weights and analytically determines the output weights of the SLFN. This algorithm learns much faster than traditional popular learning algorithms for feed forward neural networks. This combination of feature vector and classifier gave good recognition accuracy at level 6 of the wavelet decomposition.
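The ELM idea summarized above (random, untrained input weights and output weights solved in closed form) can be sketched in a few lines of Python. This is an illustrative sketch, not the cited paper's code: the `tanh` activation, the function names `elm_train` and `elm_predict`, and the use of NumPy's pseudoinverse for the least-squares solve are assumptions.

```python
import numpy as np


def elm_train(X, T, n_hidden, rng):
    """Fit a single-hidden-layer network the ELM way.

    X: (n_samples, n_features) inputs; T: (n_samples, n_outputs) targets.
    The input weights W and biases b are drawn at random and never updated.
    Only the output weights beta are computed, as the least-squares solution
    of H @ beta = T obtained with the Moore-Penrose pseudoinverse of H.
    """
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights
    b = rng.standard_normal(n_hidden)                # random hidden biases
    H = np.tanh(X @ W + b)                           # hidden-layer outputs (tanh is an assumption)
    beta = np.linalg.pinv(H) @ T                     # analytic output weights
    return W, b, beta


def elm_predict(X, W, b, beta):
    """Apply the fixed random hidden layer, then the solved output weights."""
    return np.tanh(X @ W + b) @ beta
```

Because training reduces to one pseudoinverse rather than iterative gradient descent, the cost is dominated by a single linear-algebra solve, which is where the speed advantage claimed in the abstract comes from.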

175 citations


Cites background or methods from "Handwritten Numeral Databases of In..."

  • ...The state of the art classifiers include statistical classifier such as modified quadratic discriminant function (MQDF), neural classifiers such as multi layer perceptron (MLP), radial basis function (RBF) classifier, polynomial classifier (PC), learning vector quantization (LVQ) and support vector machine (SVM) [5]....

    [...]

  • ...In [5], the wavelet filter is applied three times, and the chain code histogram feature is extracted from each of the detailed image components DL/L (k) , k = 1, 2, 3....

    [...]

  • ...Ujjwal and Chaudhuri [5] presented a multistage cascaded recognition scheme using wavelet based multi resolution representations and MLP classifiers for the recognition of mixed handwritten numerals Bengla, Devanagari and English....

    [...]

Journal ArticleDOI
01 Nov 2011
TL;DR: In this paper, the state of the art, from the 1970s onward, in machine-printed and handwritten Devanagari optical character recognition (OCR) is discussed in various sections of the paper.
Abstract: In India, more than 300 million people use Devanagari script for documentation. There has been a significant improvement in the research related to the recognition of printed as well as handwritten Devanagari text in the past few years. The state of the art, from the 1970s onward, in machine-printed and handwritten Devanagari optical character recognition (OCR) is discussed in this paper. All feature-extraction techniques as well as training, classification and matching techniques useful for the recognition are discussed in various sections of the paper. An attempt is made to address the most important results reported so far and to highlight the beneficial directions of the research to date. Moreover, the paper also contains a comprehensive bibliography of selected papers that appeared in reputed journals and conference proceedings, as an aid for researchers working in the field of Devanagari OCR.

159 citations


Cites background or methods from "Handwritten Numeral Databases of In..."

  • ...Pal and Chaudhuri [12] and [99] also proposed a suffix- and prefix-based error correction technique, which can take care of different inflectional languages....

    [...]

  • ...To get an idea about the occurrence frequency of different Devanagari characters, Chaudhuri and Pal [5] provided the occurrence statistics of 20 frequent characters in Devanagari script, as shown in Table I, based on a study of three million words....

    [...]

  • ...An approach based on the detection of “shirorekha” is proposed by Chaudhuri and Pal [13] with the assumption that the skew of such header lines show the skew of the whole document....

    [...]

  • ...Bhattacharya and Chaudhuri [2] use a distinct MLP classifier at each stage of their recognition scheme for handwritten numerals....

    [...]

Journal ArticleDOI
TL;DR: This review article serves the purpose of presenting state of the art results and techniques on OCR and also provides research directions by highlighting research gaps.
Abstract: Given the ubiquity of handwritten documents in human transactions, Optical Character Recognition (OCR) of documents has invaluable practical worth. Optical character recognition is a science that enables the translation of various types of documents or images into analyzable, editable and searchable data. During the last decade, researchers have used artificial intelligence/machine learning tools to automatically analyze handwritten and printed documents in order to convert them into electronic format. The objective of this review paper is to summarize research that has been conducted on character recognition of handwritten documents and to provide research directions. In this Systematic Literature Review (SLR) we collected, synthesized and analyzed research articles on the topic of handwritten OCR (and closely related topics) which were published between 2000 and 2019. We searched widely used electronic databases following a pre-defined review protocol. Articles were searched using keywords, forward reference searching and backward reference searching in order to find all the articles related to the topic. After carefully following the study selection process, 176 articles were selected for this SLR. This review article serves the purpose of presenting state of the art results and techniques on OCR and also provides research directions by highlighting research gaps.

139 citations


Cites background or methods from "Handwritten Numeral Databases of In..."

  • ...multilayer perceptron classifier gave better accuracy on Devanagri, and Bangla numerals [25], [140] but gave...

    [...]

  • ...Another research carried out on Hindi numerals [25] used a relatively large dataset of 22,556 isolated numeral samples of Devanagari and 23,392 samples of Bangla scripts....

    [...]

Journal ArticleDOI
TL;DR: An off-line handwritten alphabetical character recognition system using multilayer feed forward neural network that will be suitable for converting handwritten documents into structural text form and recognizing handwritten names is described in the paper.
Abstract: An off-line handwritten alphabetical character recognition system using multilayer feed forward neural network is described in the paper. A new method, called, diagonal based feature extraction is introduced for extracting the features of the handwritten alphabets. Fifty data sets, each containing 26 alphabets written by various people, are used for training the neural network and 570 different handwritten alphabetical characters are used for testing. The proposed recognition system performs quite well yielding higher levels of recognition accuracy compared to the systems employing the conventional horizontal and vertical methods of feature extraction. This system will be suitable for converting handwritten documents into structural text form and recognizing handwritten names.

135 citations


Cites methods from "Handwritten Numeral Databases of In..."

  • ...KEYWORDS Handwritten character recognition, Image processing, Feature extraction, feed forward neural networks....

    [...]

Journal ArticleDOI
TL;DR: The Histogram of Oriented Gradient is extended and two new feature descriptors are proposed: Co-occurrence HOG (Co-HOG) and Convolutional Co-HOG (ConvCo-HOG) for accurate recognition of scene texts of different languages.

130 citations

References
Journal ArticleDOI
TL;DR: These methods are reviewed from the two points of view: feature projection and feature density equalization and a systematic comparison of them has been made based on the following criteria: recognition rate, processing speed, computational complexity and degree of variation.

104 citations


Journal ArticleDOI
TL;DR: The hierarchical OCR dynamically adapts to factors such as the quality of the input pattern, its intrinsic similarities and differences from patterns of other classes it is being compared against, and the processing time available, which leads to optimal use of computational resources.
Abstract: This paper describes hierarchical OCR, a character recognition methodology that achieves high speed and accuracy by using a multiresolution and hierarchical feature space. Features at different resolutions, from coarse to fine-grained, are implemented by means of a recursive classification scheme. Typically, recognizers have to balance the use of features at many resolutions (which yields a high accuracy) with the burden on computational resources in terms of storage space and processing time. We present in this paper a method that adaptively determines the degree of resolution necessary in order to classify an input pattern. This leads to optimal use of computational resources. The hierarchical OCR dynamically adapts to factors such as the quality of the input pattern, its intrinsic similarities and differences from patterns of other classes it is being compared against, and the processing time available. Furthermore, the finer resolution is accorded to only certain "zones" of the input pattern which are deemed important given the classes that are being discriminated. Experimental results support the methodology presented. When tested on standard NIST data sets, the hierarchical OCR proves to be 300 times faster than a traditional K-nearest-neighbor classification method, and 10 times faster than a neural network method. The comparison uses the same feature set for all methods. A recognition rate of about 96 percent is achieved by the hierarchical OCR. This is on par with the other two traditional methods.

83 citations


Proceedings ArticleDOI
20 Sep 1999
TL;DR: Three procedures, based on the curvature coefficient, biquadratic interpolation and gradient vector interpolation, are proposed for calculating the curvatures of the equi-gray-scale curves of an input image.
Abstract: Studies the use of curvature in addition to the gradient of gray-scale character images in order to improve the accuracy of handwritten numeral recognition. Three procedures, based on the curvature coefficient, biquadratic interpolation and gradient vector interpolation, are proposed for calculating the curvature of the equi-gray-scale curves of an input image. The efficiency of the feature vector is tested by recognition experiments for the handwritten numeral database IPTP CDROM1, which is a ZIP code database provided by the Institute for Posts and Telecommunications Policy (IPTP). The experimental results show the usefulness of the curvature feature, and a recognition rate of 99.40%, which is the highest that has ever been reported for this database, is achieved.

81 citations


Journal ArticleDOI
TL;DR: Two new features based on distance information are proposed which contains rich information encoding both the black/white and directional distance distributions and a new concept of map tiling is introduced and applied to the DDD feature to improve its discriminative power.
Abstract: Features play an important role in OCR systems. In this paper, we propose two new features which are based on distance information. In the first feature (called DT, Distance Transformation), each white pixel has a distance value to the nearest black pixel. The second feature is called DDD (Directional Distance Distribution) which contains rich information encoding both the black/white and directional distance distributions. A new concept of map tiling is introduced and applied to the DDD feature to improve its discriminative power. For an objective evaluation and comparison of the proposed and conventional features, three distinct sets of characters (i.e., numerals, English capital letters, and Hangul initial sounds) have been tested using standard databases. Based on the results, three propositions can be derived to confirm the superiority of both the DDD feature and the map tilings.
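The DT feature described above assigns each white pixel its distance to the nearest black pixel. A brute-force Python sketch of that idea follows; the abstract does not state which distance metric is used, so the Euclidean distance here is an assumption, as are the function name `distance_transform` and the convention that nonzero input values mark black pixels.

```python
import numpy as np


def distance_transform(binary_image):
    """Map every pixel to its distance from the nearest black pixel.

    binary_image: 2-D array-like, nonzero = black (ink), zero = white.
    Black pixels get distance 0. Euclidean distance is an assumption.
    This brute-force version costs O(pixels * black_pixels), which is
    fine for small character images but not tuned for large ones.
    """
    img = np.asarray(binary_image, dtype=bool)
    black = np.argwhere(img)                      # (n_black, 2) row/col coordinates
    if black.size == 0:
        raise ValueError("image contains no black pixels")
    rows, cols = np.indices(img.shape)
    coords = np.stack([rows.ravel(), cols.ravel()], axis=1)
    diffs = coords[:, None, :] - black[None, :, :]  # every pixel against every black pixel
    dists = np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)
    return dists.reshape(img.shape)
```

The resulting map has the same shape as the input image and could be flattened or subsampled into a feature vector; how the cited paper does that step is not described in the abstract.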

78 citations


Book ChapterDOI
22 Nov 2004
TL;DR: A moderately large database of Bangla handwritten character images is used for the recognition purpose and an MLP classifier is trained using a variant of the backpropagation algorithm that uses self-adaptive learning rates.
Abstract: A recognition scheme for handwritten basic Bangla (an Indian script) characters is proposed. No such work has been reported before on a reasonably large representative database. Here a moderately large database of Bangla handwritten character images is used for the recognition purpose. A handwritten character is composed of several strokes whose characteristics depend on the handwriting style. The strokes present in a character image are identified in a simple fashion and 10 certain features are extracted from each of them. These stroke features are concatenated in an appropriate order to form the feature vector of a character image on the basis of which an MLP classifier is trained using a variant of the backpropagation algorithm that uses self-adaptive learning rates. The training and test sets consist respectively of 350 and 90 sample images for each of 50 Bangla basic characters. A separate validation set is used for termination of training of the MLP.

76 citations

