Journal ArticleDOI

A survey on optical character recognition for Bangla and Devanagari scripts

TL;DR: A review of OCR work on Indian scripts, mainly Bangla and Devanagari, the two most popular scripts in India, covering the various methodologies and their reported results.
Abstract: The past few decades have witnessed intensive research on optical character recognition (OCR) for Roman, Chinese, and Japanese scripts. A lot of work has also been reported on OCR for various Indian scripts, such as Devanagari, Bangla, Oriya, Tamil, Telugu, Malayalam, Kannada, Gurmukhi, and Gujarati. In this paper, we present a review of OCR work on Indian scripts, mainly on Bangla and Devanagari, the two most popular scripts in India. We have summarized most of the published papers on this topic and have also analysed the various methodologies and their reported results. Future directions of research in OCR for Indian scripts are also given.


Citations
Journal ArticleDOI
TL;DR: A novel deep learning technique for the recognition of handwritten Bangla isolated compound character is presented and a new benchmark of recognition accuracy on the CMATERdb 3.3.1.3 dataset is reported.

113 citations


Cites background from "A survey on optical character recog..."

  • ...There are various research works which have attempted to deal with the history of Bangla character recognition [49, 5, 6, 19, 20, 50, 3]....


Journal Article
TL;DR: Multilayer perceptrons (MLP) trained by the backpropagation (BP) algorithm are used as classifiers, and results on the recognition of handwritten Bangla basic characters are reported.
Abstract: Recently, a few works on recognition of handwritten Bangla characters have been reported in the literature. However, there is scope for further research in this area. In the present article, we report the results of our recent study on recognition of handwritten Bangla basic characters. This is a 50-class problem, since the Bangla alphabet has 50 basic characters. In this study, features are obtained by computing local chain code histograms of the input character shape. Recognition results are compared between features computed on the contour and on the one-pixel skeletal representation of the input character image. In both cases, classification results are also obtained after downsampling the histogram feature by applying a Gaussian filter. Multilayer perceptrons (MLP) trained by the backpropagation (BP) algorithm are used as classifiers in the present study. Near-exhaustive studies are done to select the hidden layer size. An analysis of the misclassified samples shows an interesting error pattern, which has been used to further improve the recognition results. Final recognition accuracies on the training and test sets are 94.65% and 92.14%, respectively.
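As a rough illustration of the chain code histogram feature mentioned in this abstract, the sketch below computes a Freeman 8-direction chain code for a contour and a normalised histogram of the directions. It is not the authors' implementation: the function names, the direction numbering (0 = east, counter-clockwise, with y increasing upward), and the toy contour are all assumptions made for this example. It also computes a single global histogram, whereas the abstract describes local histograms, which would presumably apply the same idea to separate regions of the character image.

```python
from collections import Counter

# Assumed Freeman numbering: 0 = east, counter-clockwise, y grows upward.
DIRECTIONS = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}


def chain_codes(contour):
    """Return the direction code for each step between consecutive points."""
    codes = []
    for (x0, y0), (x1, y1) in zip(contour, contour[1:]):
        step = (x1 - x0, y1 - y0)
        if step not in DIRECTIONS:
            raise ValueError(f"{(x0, y0)} and {(x1, y1)} are not 8-connected neighbours")
        codes.append(DIRECTIONS[step])
    return codes


def chain_code_histogram(codes):
    """Return an 8-bin histogram of direction codes, normalised to sum to 1."""
    if not codes:
        return [0.0] * 8
    counts = Counter(codes)
    return [counts.get(d, 0) / len(codes) for d in range(8)]


# Toy contour (hypothetical): a unit square traversed counter-clockwise.
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
codes = chain_codes(square)
hist = chain_code_histogram(codes)
print(codes)
print(hist)
```

The resulting 8-value vector is the kind of feature that could be fed to a classifier such as an MLP; a real system would first extract the contour or skeleton from the character image.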

84 citations

Journal ArticleDOI
TL;DR: This paper proposes a novel shape decomposition-based segmentation technique that decomposes compound characters into prominent shape components, which reduces classification complexity by lowering the number of classes to recognize and at the same time improves recognition accuracy.

76 citations

Journal ArticleDOI
TL;DR: The writing style is a unique characteristic of a human being as it varies from one person to another and handwritten character recognition under the purv...
Abstract: The writing style is a unique characteristic of a human being as it varies from one person to another. Due to such diversity in writing style, handwritten character recognition (HCR) under the purv...

41 citations

Journal ArticleDOI
TL;DR: The authors propose using a deep learning model as both a feature extractor and a classifier for the recognition of 33 classes of basic characters of Devanagari ancient manuscripts, and the accuracy achieved is better than other state-of-the-art techniques.
Abstract: Devanagari script is the most widely used script in India and other Asian countries. There is a rich collection of ancient Devanagari manuscripts, which is a wealth of knowledge. To make these manuscripts available to people, efforts are being made to digitize these documents. Optical Character Recognition (OCR) plays an important role in recognizing these documents. The Convolutional Neural Network (CNN) is a powerful model that gives very promising results in fields such as character recognition and pattern recognition. CNN has never been used for the recognition of Devanagari ancient manuscripts. Our aim in the proposed work is to use the power of CNN for extracting the wealth of knowledge from Devanagari handwritten ancient manuscripts. In addition, we aim to experiment with various design options, such as the number of layers, stride size, number of filters, kernel size, and different functions in various layers, and to select the best of these. In this paper, the authors propose using a deep learning model as both a feature extractor and a classifier for the recognition of 33 classes of basic characters of Devanagari ancient manuscripts. A dataset containing 5484 characters has been used for the experimental work. Various experiments show that the accuracy achieved using CNN as a feature extractor is better than other state-of-the-art techniques. A recognition accuracy of 93.73% has been achieved using the model proposed in this paper for Devanagari ancient character recognition.

35 citations

References
Journal ArticleDOI
TL;DR: A computer program that emulates the distributed optimization process represented by the activity of social bacterial foraging is presented, applied to a simple multiple-extremum function minimization problem, and briefly related to some existing optimization algorithms.
Abstract: We explain the biology and physics underlying the chemotactic (foraging) behavior of E. coli bacteria. We explain a variety of bacterial swarming and social foraging behaviors and discuss the control system on the E. coli that dictates how foraging should proceed. Next, a computer program that emulates the distributed optimization process represented by the activity of social bacterial foraging is presented. To illustrate its operation, we apply it to a simple multiple-extremum function minimization problem and briefly discuss its relationship to some existing optimization algorithms. The article closes with a brief discussion on the potential uses of biomimicry of social foraging to develop adaptive controllers and cooperative control strategies for autonomous vehicles. For this, we provide some basic ideas and invite the reader to explore the concepts further.

2,917 citations

Journal ArticleDOI
TL;DR: The nature of handwritten language, how it is transduced into electronic data, and the basic concepts behind written language recognition algorithms are described.
Abstract: Handwriting has continued to persist as a means of communication and recording information in day-to-day life even with the introduction of new technologies. Given its ubiquity in human transactions, machine recognition of handwriting has practical significance, as in reading handwritten notes in a PDA, in postal addresses on envelopes, in amounts in bank checks, in handwritten fields in forms, etc. This overview describes the nature of handwritten language, how it is transduced into electronic data, and the basic concepts behind written language recognition algorithms. Both the online case (which pertains to the availability of trajectory data during writing) and the off-line case (which pertains to scanned images) are considered. Algorithms for preprocessing, character and word recognition, and performance with practical systems are indicated. Other fields of application, like signature verification, writer authentification, handwriting learning tools are also considered.

2,653 citations


"A survey on optical character recog..." refers to background in this paper

  • ...Various designers have been actively involved in developing perfect optical character recognition (OCR) systems (Mantas 1986; Govindan & Shivaprasad 1990; Mori et al 1992; Plamondon & Srihari 2000); still the state-of-the-art accuracy levels have room for improvement....


Journal ArticleDOI
TL;DR: Various forms of line drawing representation are described, different schemes of quantization are compared, and the manner in which a line drawing can be extracted from a tracing or a photographic image is reviewed.
Abstract: This paper describes various forms of line drawing representation, compares different schemes of quantization, and reviews the manner in which a line drawing can be extracted from a tracing or a photographic image. The subjective aspects of a line drawing are examined. Different encoding schemes are compared, with emphasis on the so-called chain code which is convenient for highly irregular line drawings. The properties of chain-coded line drawings are derived, and algorithms are developed for analyzing line drawings to determine various geometric features. Procedures are described for rotating, expanding, and smoothing line structures, and for establishing the degree of similarity between two contours by a correlation technique. Three applications are described in detail: automatic assembly of jigsaw puzzles, map matching, and optimum two-dimensional template layout.

1,485 citations

Journal ArticleDOI
01 Jul 1992
TL;DR: Both template matching and structure analysis approaches to R&D are considered and it is noted that the two approaches are coming closer and tending to merge.
Abstract: Research and development of OCR systems are considered from a historical point of view. The historical development of commercial systems is included. Both template matching and structure analysis approaches to R&D are considered. It is noted that the two approaches are coming closer and tending to merge. Commercial products are divided into three generations, for each of which some representative OCR systems are chosen and described in some detail. Some comments are made on recent techniques applied to OCR, such as expert systems and neural networks, and some open problems are indicated. The authors' views and hopes regarding future trends are presented.

892 citations


"A survey on optical character recog..." refers to background in this paper

  • ...Various designers have been actively involved in developing perfect optical character recognition (OCR) systems (Mantas 1986; Govindan & Shivaprasad 1990; Mori et al 1992; Plamondon & Srihari 2000); still the state-of-the-art accuracy levels have room for improvement....


Journal ArticleDOI
TL;DR: A review of the OCR work done on Indian language scripts and the scope of future work and further steps needed for Indian script OCR development is presented.

592 citations


"A survey on optical character recog..." refers to background in this paper

  • ...Optical character recognition is a process of automatic computer recognition of optically scanned and digitized character images to produce an electronic text document....
