Author
Veena Bansal
Other affiliations: Indian Institutes of Technology
Bio: Veena Bansal is an academic researcher from Indian Institute of Technology Kanpur. The author has contributed to research in topics: Devanagari & Optical character recognition. The author has an hindex of 11, co-authored 19 publications receiving 631 citations. Previous affiliations of Veena Bansal include Indian Institutes of Technology.
Papers
More filters
TL;DR: A two pass algorithm for the segmentation and decomposition of Devanagari composite characters/symbols into their constituent symbols and a recognition rate has been achieved on the segmented conjuncts.
Abstract: Devanagari script is a two dimensional composition of symbols It is highly cumbersome to treat each composite character as a separate atomic symbol because such combinations are very large in number This paper presents a two pass algorithm for the segmentation and decomposition of Devanagari composite characters/symbols into their constituent symbols The proposed algorithm extensively uses structural properties of the script In the first pass, words are segmented into easily separable characters/composite characters Statistical information about the height and width of each separated box is used to hypothesize whether a character box is composite In the second pass, the hypothesized composite characters are further segmented A recognition rate of 85 percent has been achieved on the segmented conjuncts The algorithm is designed to segment a pair of touching characters
143 citations
01 Jul 2000
TL;DR: The reading process has been widely studied and there is a general agreement among researchers that knowledge in different forms and at different levels plays a vital role, which is the underlying philosophy of the Devanagari document recognition system described in this work.
Abstract: The reading process has been widely studied and there is a general agreement among researchers that knowledge in different forms and at different levels plays a vital role. This is the underlying philosophy of the Devanagari document recognition system described in this work. The knowledge sources we use are mostly statistical in nature or in the form of a word dictionary tailored specifically for optical character recognition (OCR). We do not perform any reasoning on these. However, we explore their relative importance and role in the hierarchy. Some of the knowledge sources are acquired a priori by an automated training process while others are extracted from the text as it is processed. A complete Devanagari OCR system has been designed and tested with real-life printed documents of varying size and font. Most of the documents used were photocopies of the original. A performance of approximately 90% correct recognition is achieved.
132 citations
01 Sep 2001
TL;DR: A complete OCR for printed Hindi text in Devanagari script is presented and a performance of 93% at character level is obtained.
Abstract: In this paper, we present a complete OCR for printed Hindi text in Devanagari script. A performance of 93% at character level is obtained.
74 citations
20 Sep 1999
TL;DR: A schema for the description of shapes of Devanagari characters and its application in their recognition is presented, which exploits certain features of the script in both reducing the search space and creating a reference with respect to which correspondence could be established, during the matching process.
Abstract: The paper presents a schema for the description of shapes of Devanagari characters and its application in their recognition. It exploits certain features of the script in both reducing the search space and creating a reference with respect to which correspondence could be established, during the matching process. The description prototypes are constructed using the real-life script after segmentation so that the aberrations introduced during the inevitable process of segmentation get accounted for in the description. This has been tested on printed Devanagari text with a success of approximately 70% without any post-processing and 88% correct recognition with the help of a word dictionary.
62 citations
01 Jan 2004
TL;DR: PATSEEK is an image based search system for US patent database that is to let the user check the similarity of his query image with the images that exist in US patents.
Abstract: A patent always contains some images along with the text. Many text based systems have been developed to search the patent database. In this paper, we describe PATSEEK that is an image based search system for US patent database. The objective is to let the user check the similarity of his query image with the images that exist in US patents. The user can specify a set of key words that must exist in the text of the patents whose images will be searched for similarity. PATSEEK automatically grabs images from the US patent database on the request of the user and represents them through an edge orientation autocorrelogram. L1 and L2 distance measures are used to compute the distance between the images. A recall rate of 100% for 61% of query images and an average 32% recall rate for rest of the images has been observed.
52 citations
Cited by
More filters
09 Mar 2012
TL;DR: Artificial neural networks (ANNs) constitute a class of flexible nonlinear models designed to mimic biological neural systems as mentioned in this paper, and they have been widely used in computer vision applications.
Abstract: Artificial neural networks (ANNs) constitute a class of flexible nonlinear models designed to mimic biological neural systems. In this entry, we introduce ANN using familiar econometric terminology and provide an overview of ANN modeling approach and its implementation methods. † Correspondence: Chung-Ming Kuan, Institute of Economics, Academia Sinica, 128 Academia Road, Sec. 2, Taipei 115, Taiwan; ckuan@econ.sinica.edu.tw. †† I would like to express my sincere gratitude to the editor, Professor Steven Durlauf, for his patience and constructive comments on early drafts of this entry. I also thank Shih-Hsun Hsu and Yu-Lieh Huang for very helpful suggestions. The remaining errors are all mine.
2,069 citations
TL;DR: A review of the OCR work done on Indian language scripts and the scope of future work and further steps needed for Indian script OCR development is presented.
Abstract: Intensive research has been done on optical character recognition (OCR) and a large number of articles have been published on this topic during the last few decades. Many commercial OCR systems are now available in the market. But most of these systems work for Roman, Chinese, Japanese and Arabic characters. There are no sufficient number of work on Indian language character recognition although there are 12 major scripts in India. In this paper, we present a review of the OCR work done on Indian language scripts. The review is organized into 5 sections. Sections 1 and 2 cover introduction and properties on Indian scripts. In Section 3, we discuss different methodologies in OCR development as well as research work done on Indian scripts recognition. In Section 4, we discuss the scope of future work and further steps needed for Indian script OCR development. In Section 5 we conclude the paper.
592 citations
TL;DR: P pioneering development of two databases for handwritten numerals of two most popular Indian scripts, a multistage cascaded recognition scheme using wavelet based multiresolution representations and multilayer perceptron classifiers and application for the recognition of mixed handwritten numeral recognition of three Indian scripts Devanagari, Bangla and English.
Abstract: This article primarily concerns the problem of isolated handwritten numeral recognition of major Indian scripts. The principal contributions presented here are (a) pioneering development of two databases for handwritten numerals of two most popular Indian scripts, (b) a multistage cascaded recognition scheme using wavelet based multiresolution representations and multilayer perceptron classifiers and (c) application of (b) for the recognition of mixed handwritten numerals of three Indian scripts Devanagari, Bangla and English. The present databases include respectively 22,556 and 23,392 handwritten isolated numeral samples of Devanagari and Bangla collected from real-life situations and these can be made available free of cost to researchers of other academic Institutions. In the proposed scheme, a numeral is subjected to three multilayer perceptron classifiers corresponding to three coarse-to-fine resolution levels in a cascaded manner. If rejection occurred even at the highest resolution, another multilayer perceptron is used as the final attempt to recognize the input numeral by combining the outputs of three classifiers of the previous stages. This scheme has been extended to the situation when the script of a document is not known a priori or the numerals written on a document belong to different scripts. Handwritten numerals in mixed scripts are frequently found in Indian postal mails and table-form documents.
328 citations
01 Nov 2011
TL;DR: In this paper, the state of the art from 1970s of machine printed and handwritten Devanagari optical character recognition (OCR) is discussed in various sections of the paper.
Abstract: In India, more than 300 million people use Devanagari script for documentation. There has been a significant improvement in the research related to the recognition of printed as well as handwritten Devanagari text in the past few years. State of the art from 1970s of machine printed and handwritten Devanagari optical character recognition (OCR) is discussed in this paper. All feature-extraction techniques as well as training, classification and matching techniques useful for the recognition are discussed in various sections of the paper. An attempt is made to address the most important results reported so far and it is also tried to highlight the beneficial directions of the research till date. Moreover, the paper also contains a comprehensive bibliography of many selected papers appeared in reputed journals and conference proceedings as an aid for the researchers working in the field of Devanagari OCR.
159 citations
13 Dec 2006
TL;DR: A quadratic classifier based scheme for the recognition of off-line Devnagari handwritten characters using chain code information of the contour points of the characters and using five-fold cross-validation technique for result computation.
Abstract: Recognition of handwritten characters is a challenging task because of the variability involved in the writing styles of different individuals. In this paper we propose a quadratic classifier based scheme for the recognition of off-line Devnagari handwritten characters. The features used in the classifier are obtained from the directional chain code information of the contour points of the characters. The bounding box of a character is segmented into blocks and the chain code histogram is computed in each of the blocks. Based on the chain code histogram, here we have used 64 dimensional features for recognition. These chain code features are fed to the quadratic classifier for recognition. From the proposed scheme we obtained 98.86% and 80.36% recognition accuracy on Devnagari numerals and characters, respectively. We used five-fold cross-validation technique for result computation.
157 citations