Author

R.M.K. Sinha

Bio: R.M.K. Sinha is an academic researcher from Indian Institutes of Technology. The author has contributed to research in topics: Devanagari & Hindi. The author has an h-index of 1 and has co-authored 1 publication receiving 67 citations.

Papers
Proceedings ArticleDOI
01 Sep 2001
TL;DR: A complete OCR for printed Hindi text in Devanagari script is presented and a performance of 93% at character level is obtained.
Abstract: In this paper, we present a complete OCR for printed Hindi text in Devanagari script. A performance of 93% at character level is obtained.

74 citations


Cited by
Journal ArticleDOI
01 Nov 2011
TL;DR: This paper discusses the state of the art, from the 1970s onward, of machine-printed and handwritten Devanagari optical character recognition (OCR).
Abstract: In India, more than 300 million people use Devanagari script for documentation. Research on the recognition of printed as well as handwritten Devanagari text has improved significantly in the past few years. This paper discusses the state of the art, from the 1970s onward, of machine-printed and handwritten Devanagari optical character recognition (OCR). All feature-extraction techniques as well as training, classification and matching techniques useful for recognition are discussed in various sections of the paper. The paper attempts to address the most important results reported so far and to highlight beneficial directions of the research to date. It also contains a comprehensive bibliography of selected papers that appeared in reputed journals and conference proceedings, as an aid for researchers working in the field of Devanagari OCR.

159 citations

Proceedings ArticleDOI
25 Jul 2009
TL;DR: Effort has been concentrated on enabling generic multi-lingual operation such that negligible customization is required for a new language beyond providing a corpus of text.
Abstract: We describe efforts to adapt the Tesseract open source OCR engine for multiple scripts and languages. Effort has been concentrated on enabling generic multi-lingual operation such that negligible customization is required for a new language beyond providing a corpus of text. Although change was required to various modules, including physical layout analysis, and linguistic post-processing, no change was required to the character classifier beyond changing a few limits. The Tesseract classifier has adapted easily to Simplified Chinese. Test results on English, a mixture of European languages, and Russian, taken from a random sample of books, show a reasonably consistent word error rate between 3.72% and 5.78%, and Simplified Chinese has a character error rate of only 3.77%.

117 citations
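
The word and character error rates quoted in the abstract above are commonly computed as an edit distance divided by the reference length. The abstract does not say how its figures were measured, so the following is only a minimal sketch under that assumption; the function names `edit_distance`, `word_error_rate`, and `char_error_rate` are hypothetical and not taken from Tesseract.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # delete r from the reference
                cur[j - 1] + 1,          # insert h into the reference
                prev[j - 1] + (r != h),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalised by reference length (assumed definition)."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def word_error_rate(ref, hyp):
    # Tokens are whitespace-separated words.
    return error_rate(ref.split(), hyp.split())


def char_error_rate(ref, hyp):
    # Tokens are Unicode code points, which may differ from visual characters.
    return error_rate(list(ref), list(hyp))


if __name__ == "__main__":
    print(word_error_rate("the quick brown fox", "the quick brown box"))
    print(char_error_rate("abcd", "abed"))
```

Normalising by the reference length means the rate can exceed 100% when the OCR output contains many spurious insertions.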

Journal ArticleDOI
01 Jan 2007
TL;DR: This paper addresses multi-font Bangla basic character recognition and proposes a novel feature extraction scheme based on the digital curvelet transform.
Abstract: This paper addresses the problem of Bangla basic character recognition. Multi-font Bangla character recognition has not been attempted previously. Twenty popular Bangla fonts have been used for the purpose of character recognition. A novel feature extraction scheme based on the digital curvelet transform is proposed. The curvelet transform, although heavily utilized in various areas of image processing, has not been used as the feature extraction scheme for character recognition. The curvelet coefficients of an original image as well as its morphologically altered versions are used to train separate k-nearest neighbor classifiers. The output values of these classifiers are fused using a simple majority voting scheme to arrive at a final decision.

93 citations
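
The fusion step described in the abstract above (separate k-nearest neighbor classifiers whose outputs are combined by simple majority voting) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: plain numeric tuples stand in for curvelet coefficients, Euclidean distance is assumed, and the names `knn_predict`, `majority_vote`, and `fuse_predictions` are hypothetical.

```python
from collections import Counter


def knn_predict(train, query, k=3):
    """Label a query by majority among its k nearest training samples.

    train: list of (feature_vector, label) pairs.
    Distance is squared Euclidean (an assumption; the abstract does not specify).
    """
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def majority_vote(predictions):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]


def fuse_predictions(train_sets, query_views, k=3):
    """One classifier per image version (original plus altered versions)."""
    votes = [knn_predict(t, q, k) for t, q in zip(train_sets, query_views)]
    return majority_vote(votes)


if __name__ == "__main__":
    # Toy stand-in features; real inputs would be curvelet coefficients.
    train = [((0, 0), "ka"), ((0, 1), "ka"), ((5, 5), "kha"), ((5, 6), "kha")]
    views = [(0, 0.5), (5, 5.5), (0.2, 0.2)]
    print(fuse_predictions([train, train, train], views))
```

Using an odd number of classifiers reduces the chance of a tied vote in the two-class case.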

01 Jan 2010
TL;DR: A technique is proposed for an OCR system that handles five different fonts and sizes of printed Devnagari script using an artificial neural network, and its recognition rate on Devnagari document images is reported to be quite high.
Abstract: There are about 300 million people in India who speak Hindi and write Devnagari script. Research in Optical Character Recognition (OCR) is popular for its application potential in banks, post offices, defense organizations, library automation, etc. However, most OCR systems are available for European texts. In this paper, we propose a technique for an OCR system for five different fonts and sizes of printed Devnagari script using an Artificial Neural Network. The recognition rate of the proposed OCR system on document images in Devnagari script has been found to be quite high.

71 citations

Journal ArticleDOI
TL;DR: A review of OCR work on Indian scripts, mainly on Bangla and Devanagari—the two most popular scripts in India, and the various methodologies and their reported results are presented.
Abstract: The past few decades have witnessed intensive research on optical character recognition (OCR) for Roman, Chinese, and Japanese scripts. A lot of work has also been reported on OCR efforts for various Indian scripts, like Devanagari, Bangla, Oriya, Tamil, Telugu, Malayalam, Kannada, Gurmukhi, Gujarati, etc. In this paper, we present a review of OCR work on Indian scripts, mainly on Bangla and Devanagari, the two most popular scripts in India. We have summarized most of the published papers on this topic and have also analysed the various methodologies and their reported results. Future directions of research in OCR for Indian scripts are also given.

70 citations