Journal Article

Indian script character recognition: a survey

01 Sep 2004, Pattern Recognition (Pergamon), Vol. 37, Iss. 9, pp. 1887-1899
TL;DR: A review of OCR work on Indian language scripts is presented, together with the scope for future work and the further steps needed for Indian script OCR development.
Abstract: Intensive research has been done on optical character recognition (OCR), and a large number of articles have been published on this topic during the last few decades. Many commercial OCR systems are now available in the market, but most of these systems work for Roman, Chinese, Japanese and Arabic characters. There is not a sufficient amount of work on Indian language character recognition, although there are 12 major scripts in India. In this paper, we present a review of the OCR work done on Indian language scripts. The review is organized into 5 sections. Sections 1 and 2 cover the introduction and the properties of Indian scripts. In Section 3, we discuss different methodologies in OCR development as well as research work done on Indian script recognition. In Section 4, we discuss the scope of future work and the further steps needed for Indian script OCR development. In Section 5, we conclude the paper.
Citations
Journal Article
TL;DR: The objective of this paper is to present a survey of existing methods, developed during the last decade and dedicated to documents of historical interest.
Abstract: There is a huge amount of historical documents in libraries and in various National Archives that have not been exploited electronically. Although automatic reading of complete pages remains, in most cases, a long-term objective, tasks such as word spotting, text/image alignment, authentication and extraction of specific fields are in use today. For all these tasks, a major step is document segmentation into text lines. Because of the low quality and the complexity of these documents (background noise, artifacts due to aging, interfering lines), automatic text line segmentation remains an open research field. The objective of this paper is to present a survey of existing methods, developed during the last decade and dedicated to documents of historical interest.

416 citations


Cites background from "Indian script character recognition..."

  • ...In the alphabets of some Indian scripts (like Devnagari, Bangla and Gurumukhi), many basic characters have a horizontal line (the head line) in the upper part [42]....
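Both the text line segmentation problem surveyed above and the head line mentioned in the quoted snippet are commonly approached with a horizontal projection profile (ink pixels per row). The sketch below is a minimal illustration under loud assumptions: a clean, binarized, deskewed image stored as a list of rows (1 = ink), blank rows separating text lines, and the head line taken as the densest row in the upper half of each line. The names `segment_lines`, `find_headline`, `min_ink` and `upper_fraction` are hypothetical. Degraded historical pages with touching or curved lines are exactly where this simple approach breaks down.

```python
from typing import List, Optional, Tuple

# Binary page image: 1 = ink, 0 = background (assumed binarized and deskewed).
Image = List[List[int]]


def horizontal_projection(img: Image) -> List[int]:
    """Number of ink pixels in each row."""
    return [sum(row) for row in img]


def segment_lines(img: Image, min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) row ranges, end exclusive, of consecutive rows
    holding at least `min_ink` ink pixels; blank rows separate text lines."""
    profile = horizontal_projection(img)
    lines: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for r, count in enumerate(profile):
        if count >= min_ink and start is None:
            start = r
        elif count < min_ink and start is not None:
            lines.append((start, r))
            start = None
    if start is not None:
        lines.append((start, len(profile)))
    return lines


def find_headline(img: Image, start: int, end: int, upper_fraction: float = 0.5) -> int:
    """Inside the text line [start, end), return the densest row within the
    upper `upper_fraction` of the line, taken here as the head line."""
    profile = horizontal_projection(img)
    limit = start + max(1, int((end - start) * upper_fraction))
    return max(range(start, limit), key=lambda r: profile[r])


if __name__ == "__main__":
    page = [
        [0, 0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 1],  # head line of the first text line
        [0, 1, 0, 0, 1, 0],
        [0, 1, 0, 0, 1, 0],
        [0, 0, 0, 0, 0, 0],  # blank gap between lines
        [1, 1, 1, 1, 1, 0],
        [0, 0, 1, 0, 0, 0],
    ]
    for s, e in segment_lines(page):
        print((s, e), "head line at row", find_headline(page, s, e))
```

On the toy page this finds lines (1, 4) and (5, 7), with head lines at rows 1 and 5. In Devnagari or Bangla text, the detected head line is what later steps would typically remove or use to separate the characters of a word.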

Journal Article
TL;DR: An overview of the different script identification methodologies under each of the two broad categories, structure-based and visual-appearance-based techniques, is given.
Abstract: A variety of different scripts are used in writing languages throughout the world. In a multiscript, multilingual environment, it is essential to know the script used in writing a document before an appropriate character recognition and document analysis algorithm can be chosen. In view of this, several methods for automatic script identification have been developed so far. They mainly belong to two broad categories: structure-based and visual-appearance-based techniques. This survey report gives an overview of the different script identification methodologies under each of these categories. Methods for script identification in online data and video-texts are also presented. It is noted that the research in this field is relatively thin and still more research is to be done, particularly in the case of handwritten documents.

234 citations


Cites methods from "Indian script character recognition..."

  • ...Index Terms—Document analysis, Optical character recognition, Script identification, Multi-script document....

  • ...Script identification methods that are based on extraction and analysis of connected components fall under the category of structure-based methods....
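The structure-based methods quoted above start from connected components. As a minimal sketch, assuming a binary image stored as a list of rows (1 = ink) and 8-connectivity, components can be labeled with a breadth-first search that records each component's bounding box and pixel count. The function name `connected_components` is hypothetical, and the script-specific features that the surveyed methods then compute from the components are not shown.

```python
from collections import deque
from typing import List, Tuple

# (top, left, bottom, right, area); bottom and right are inclusive.
Component = Tuple[int, int, int, int, int]


def connected_components(img: List[List[int]]) -> List[Component]:
    """Label 8-connected ink regions of a binary image (1 = ink) by BFS
    and return one bounding box plus pixel count per region, in scan order."""
    rows = len(img)
    cols = len(img[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    components: List[Component] = []
    for r in range(rows):
        for c in range(cols):
            if not img[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right, area = r, c, r, c, 0
            while queue:
                y, x = queue.popleft()
                area += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                # Visit all 8 neighbours (the (0, 0) offset is already seen).
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append((top, left, bottom, right, area))
    return components
```

Bounding boxes and areas are the raw material for simple structural cues such as component height distributions or the presence of long horizontal strokes; which cues a given method uses varies from method to method.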

Journal Article
TL;DR: A neural network is proposed for Gujarati handwritten digit identification, with a multi-layered feed-forward neural network used to classify the digits.
Abstract: This paper deals with an optical character recognition (OCR) system for handwritten Gujarati numbers. One may find a good deal of work for Indian languages like Hindi, Kannada, Tamil, Bangla, Malayalam, Gurumukhi, etc., but Gujarati is a language for which hardly any work is traceable, especially for handwritten characters. In this work, a neural network is proposed for Gujarati handwritten digit identification. A multi-layered feed-forward neural network is used for classification of the digits. The features of Gujarati digits are extracted using four different profiles of the digits. Thinning and skew correction are also performed to preprocess the handwritten numerals before their classification. This work achieved a success rate of approximately 82% for Gujarati handwritten digit identification.

176 citations


Cites background from "Indian script character recognition..."

  • ...[13] presented a survey of character recognition in 2004....
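The Gujarati digit abstract above extracts features from "four different profiles" but does not define them. One common reading, assumed here rather than taken from that paper, is the distance from each side of the bounding box to the first ink pixel: left and right profiles per row, top and bottom profiles per column. The sketch assumes the digit has already been cropped, thinned or not as desired, and size-normalized, so every image yields a vector of the same length for the network's input layer. The names `profile_features` and `_first_ink` are hypothetical.

```python
from typing import List

Image = List[List[int]]  # binary digit image, 1 = ink, assumed size-normalized


def _first_ink(seq: List[int]) -> int:
    """Index of the first ink pixel, or len(seq) if the sequence is blank."""
    for i, v in enumerate(seq):
        if v:
            return i
    return len(seq)


def profile_features(img: Image) -> List[float]:
    """Concatenate left, right, top and bottom profiles, each scaled to [0, 1].

    left/right: per row, distance from that side to the first ink pixel.
    top/bottom: per column, distance from that side to the first ink pixel.
    """
    rows, cols = len(img), len(img[0])
    columns = [[img[r][c] for r in range(rows)] for c in range(cols)]
    left = [_first_ink(row) / cols for row in img]
    right = [_first_ink(row[::-1]) / cols for row in img]
    top = [_first_ink(col) / rows for col in columns]
    bottom = [_first_ink(col[::-1]) / rows for col in columns]
    return left + right + top + bottom
```

For an R x C image this gives 2R + 2C values, which would be the input width of the feed-forward network.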

Proceedings Article
23 Sep 2007
TL;DR: A modified quadratic classifier based scheme is proposed for the recognition of off-line handwritten numerals of six popular Indian scripts: Devnagari, Bangla, Telugu, Oriya, Kannada and Tamil.
Abstract: India is a multi-lingual, multi-script country, but there is not much work on handwritten character recognition of Indian languages. In this paper we propose a modified quadratic classifier based scheme for the recognition of off-line handwritten numerals of six popular Indian scripts. Here we consider Devnagari, Bangla, Telugu, Oriya, Kannada and Tamil scripts for our experiment. The features used in the classifier are obtained from the directional information of the numerals. For feature computation, the bounding box of a numeral is segmented into blocks and the directional features are computed in each of the blocks. These blocks are then down-sampled by a Gaussian filter, and the features obtained from the down-sampled blocks are fed to a modified quadratic classifier for recognition. Here we have used two sets of features: 64-dimensional features for high-speed recognition and 400-dimensional features for high-accuracy recognition in our proposed system. A five-fold cross-validation technique has been used for result computation, and we obtained 99.56%, 98.99%, 99.37%, 98.40%, 98.71% and 98.51% accuracy for Devnagari, Bangla, Telugu, Oriya, Kannada and Tamil scripts, respectively.

174 citations


Cites background from "Indian script character recognition..."

  • ...Moreover, Hindi is the national language of India and the third most popular language in the world [ 3 ]....

  • ...Some pieces of work have been done on the recognition of Indian printed characters [ 3 ] but only a few attempts have been made towards the recognition of Indian handwritten numerals [4-10]....
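The six-script numeral abstract above names a "modified quadratic classifier" without giving its formula. Assuming it is the modified quadratic discriminant function (MQDF) associated with Kimura et al., a feature vector $\mathbf{x}$ of dimension $n$ (64 or 400 here) is scored against each class $\ell$ with mean $\boldsymbol{\mu}_\ell$, where $\lambda_{\ell i}$ and $\boldsymbol{\varphi}_{\ell i}$ are the $k$ largest eigenvalues and corresponding eigenvectors of the class covariance, and the constant $\delta$ stands in for the remaining $n-k$ eigenvalues. The values of $k$ and $\delta$ are tuning parameters not stated in the abstract.

```latex
g_\ell(\mathbf{x}) = \sum_{i=1}^{k} \frac{\big[\boldsymbol{\varphi}_{\ell i}^{\top}(\mathbf{x}-\boldsymbol{\mu}_\ell)\big]^2}{\lambda_{\ell i}}
  + \frac{1}{\delta}\Big(\lVert \mathbf{x}-\boldsymbol{\mu}_\ell \rVert^2 - \sum_{i=1}^{k} \big[\boldsymbol{\varphi}_{\ell i}^{\top}(\mathbf{x}-\boldsymbol{\mu}_\ell)\big]^2\Big)
  + \sum_{i=1}^{k} \ln \lambda_{\ell i} + (n-k)\ln\delta,
\qquad
\hat{\ell} = \arg\min_{\ell}\, g_\ell(\mathbf{x})
```

Replacing the $n-k$ smallest eigenvalues by a single constant is the "modification": those eigenvalues are poorly estimated from limited training samples, and inverting them directly would make the ordinary quadratic discriminant unstable, especially with 400-dimensional features.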

Journal Article
01 Nov 2011
TL;DR: The state of the art in machine-printed and handwritten Devanagari optical character recognition (OCR) since the 1970s is discussed across the sections of the paper.
Abstract: In India, more than 300 million people use the Devanagari script for documentation. There has been a significant improvement in research on the recognition of printed as well as handwritten Devanagari text in the past few years. The state of the art in machine-printed and handwritten Devanagari optical character recognition (OCR) since the 1970s is discussed in this paper. All feature-extraction techniques, as well as training, classification and matching techniques useful for recognition, are discussed in various sections of the paper. An attempt is made to address the most important results reported so far and to highlight the promising directions of research to date. Moreover, the paper contains a comprehensive bibliography of selected papers that appeared in reputed journals and conference proceedings, as an aid for researchers working in the field of Devanagari OCR.

159 citations


Cites background or methods from "Indian script character recognition..."

  • ...for line segmentation in printed documents [3]....

  • ...Most of the Indian scripts including Devanagari originated from ancient Brahmi script through various transformations [3]....

  • ...The recognition or classification process of characters, symbols, or words is normally carried out using template or feature-based approaches [3], [5], [7]....

  • ...The same technology has been transferred to the Centre for Development of Advanced Computing (CDAC) in 2001 for commercialization and is marketed as “Chitrankan” [3]....
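The quoted snippet contrasts template-based and feature-based recognition. In its simplest form, template matching compares a segmented glyph pixel by pixel with one stored template per class and picks the closest. The sketch assumes glyphs are already segmented, binarized and normalized to the template size, and uses Hamming distance (count of disagreeing pixels) as the dissimilarity; correlation or distance-transform measures are common alternatives. The names `mismatch` and `classify_by_template` are hypothetical.

```python
from typing import Dict, List

Image = List[List[int]]  # binary glyph, 1 = ink


def mismatch(a: Image, b: Image) -> int:
    """Count pixels where two equally sized binary glyphs differ (Hamming distance)."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("glyph and template must have the same size")
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def classify_by_template(glyph: Image, templates: Dict[str, Image]) -> str:
    """Return the label whose stored template has the fewest mismatched pixels."""
    return min(templates, key=lambda label: mismatch(glyph, templates[label]))
```

A feature-based recognizer keeps the same nearest-match structure but compares feature vectors (such as profile or directional features) instead of raw pixels, which tolerates stroke variation far better than exact pixel agreement.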

References
Book Chapter
01 Jan 1988
TL;DR: This chapter contains sections titled: The Problem, The Generalized Delta Rule, Simulation Results, Some Further Generalizations, Conclusion.
Abstract: This chapter contains sections titled: The Problem, The Generalized Delta Rule, Simulation Results, Some Further Generalizations, Conclusion

17,604 citations

Book
03 Jan 1986
TL;DR: The chapter presents the generalized delta rule, reports simulation results, and discusses some further generalizations.
Abstract: This chapter contains sections titled: The Problem, The Generalized Delta Rule, Simulation Results, Some Further Generalizations, Conclusion

13,579 citations
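Both entries above list the chapter that introduces the generalized delta rule, the training rule behind multi-layer feed-forward networks such as the one used for Gujarati digits earlier on this page. For reference, in the notation commonly used for it ($p$ indexes training patterns, $t_{pj}$ is the target and $o_{pj}$ the output of unit $j$, $w_{ji}$ the weight from unit $i$ to unit $j$, $\eta$ the learning rate), the update is:

```latex
\begin{aligned}
\mathrm{net}_{pj} &= \sum_i w_{ji}\, o_{pi}, \qquad o_{pj} = f_j(\mathrm{net}_{pj}) \\
\Delta_p w_{ji} &= \eta\, \delta_{pj}\, o_{pi} \\
\delta_{pj} &= (t_{pj} - o_{pj})\, f_j'(\mathrm{net}_{pj}) && \text{output unit } j \\
\delta_{pj} &= f_j'(\mathrm{net}_{pj}) \sum_k \delta_{pk}\, w_{kj} && \text{hidden unit } j
\end{aligned}
```

With the logistic activation, $f_j'(\mathrm{net}_{pj}) = o_{pj}(1 - o_{pj})$, so every error term can be computed from unit outputs alone, propagating backwards from the output layer.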

Journal Article
TL;DR: The nature of handwritten language, how it is transduced into electronic data, and the basic concepts behind written language recognition algorithms are described.
Abstract: Handwriting has continued to persist as a means of communication and recording information in day-to-day life even with the introduction of new technologies. Given its ubiquity in human transactions, machine recognition of handwriting has practical significance, as in reading handwritten notes in a PDA, in postal addresses on envelopes, in amounts in bank checks, in handwritten fields in forms, etc. This overview describes the nature of handwritten language, how it is transduced into electronic data, and the basic concepts behind written language recognition algorithms. Both the online case (which pertains to the availability of trajectory data during writing) and the off-line case (which pertains to scanned images) are considered. Algorithms for preprocessing, character and word recognition, and performance with practical systems are indicated. Other fields of application, like signature verification, writer authentication, and handwriting learning tools, are also considered.

2,653 citations

Journal Article
TL;DR: This paper presents an overview of feature extraction methods for off-line recognition of segmented (isolated) characters in terms of invariance properties, reconstructability and expected distortions and variability of the characters.
Abstract: This paper presents an overview of feature extraction methods for off-line recognition of segmented (isolated) characters. Selection of a feature extraction method is probably the single most important factor in achieving high recognition performance in character recognition systems. Different feature extraction methods are designed for different representations of the characters, such as solid binary characters, character contours, skeletons (thinned characters) or gray-level subimages of each individual character. The feature extraction methods are discussed in terms of invariance properties, reconstructability and expected distortions and variability of the characters. The problem of choosing the appropriate feature extraction method for a given application is also discussed. When a few promising feature extraction methods have been identified, they need to be evaluated experimentally to find the best method for the given application.

1,376 citations
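As one simple example of a feature extraction method for solid binary characters of the kind the overview above covers, zoning splits the character image into a grid of cells and records the ink density of each cell. The sketch assumes an isolated, binarized character at least as large as the grid; the name `zoning_features` and the default 2 x 2 grid are illustrative choices, not values from the surveyed work.

```python
from typing import List


def zoning_features(img: List[List[int]], zones: int = 2) -> List[float]:
    """Split a binary character image (1 = ink) into zones x zones cells
    and return the fraction of ink pixels in each cell, row-major.

    Cell borders use integer division, so sizes that do not divide evenly
    give slightly unequal cells rather than dropping pixels.
    """
    rows, cols = len(img), len(img[0])
    if rows < zones or cols < zones:
        raise ValueError("image is smaller than the zoning grid")
    feats: List[float] = []
    for zr in range(zones):
        r0, r1 = zr * rows // zones, (zr + 1) * rows // zones
        for zc in range(zones):
            c0, c1 = zc * cols // zones, (zc + 1) * cols // zones
            cell = [img[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            feats.append(sum(cell) / len(cell))
    return feats
```

Seen through the overview's criteria, zone densities absorb small local shifts of strokes within a cell but are not rotation-invariant, and they depend on how the character was size-normalized beforehand.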