Author

R.M.K. Sinha

Bio: R.M.K. Sinha is an academic researcher from the Indian Institute of Technology Kanpur. The author has contributed to research in the topics of Devanagari and natural language, has an h-index of 13, and has co-authored 32 publications receiving 766 citations. Previous affiliations of R.M.K. Sinha include Université du Québec and Institut national de la recherche scientifique.

Papers
Journal ArticleDOI
TL;DR: A two-pass algorithm for the segmentation and decomposition of Devanagari composite characters/symbols into their constituent symbols is presented, and the recognition rate achieved on the segmented conjuncts is reported.

143 citations
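
The projection-profile flavor of such two-pass segmentation can be sketched in a few lines. The code below is a hypothetical illustration, not the paper's algorithm: pass one erases the header line (shirorekha) located from the horizontal projection and cuts symbols at empty columns of the vertical projection; pass two re-cuts boxes wide enough to be candidate composites. All names and thresholds are assumptions.

import numpy as np

def segment_two_pass(word_img, wide_factor=1.5):
    # word_img: binarized numpy array, 1 = ink, 0 = background (assumed).
    # Pass 1: take the row with the strongest horizontal projection as
    # the header line (shirorekha) and erase a small band around it so
    # that the vertical projection separates the constituent symbols.
    h_proj = word_img.sum(axis=1)
    header_row = int(np.argmax(h_proj))
    img = word_img.copy()
    img[max(0, header_row - 1):header_row + 2, :] = 0

    v_proj = img.sum(axis=0)
    boxes, start = [], None
    for x, ink in enumerate(v_proj):
        if ink > 0 and start is None:
            start = x
        elif ink == 0 and start is not None:
            boxes.append((start, x))
            start = None
    if start is not None:
        boxes.append((start, len(v_proj)))

    # Pass 2: a box much wider than the median is treated as a candidate
    # composite (e.g. a conjunct) and is re-cut at its weakest column.
    widths = [b - a for a, b in boxes] or [0]
    median_w = float(np.median(widths))
    final = []
    for a, b in boxes:
        if b - a > max(3, wide_factor * median_w):
            cut = a + 1 + int(np.argmin(v_proj[a + 1:b - 1]))
            final.extend([(a, cut), (cut, b)])
        else:
            final.append((a, b))
    return header_row, final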

Journal ArticleDOI
01 Jul 2000
TL;DR: The reading process has been widely studied, and researchers generally agree that knowledge in different forms and at different levels plays a vital role; this is the underlying philosophy of the Devanagari document recognition system described in this work.
Abstract: The reading process has been widely studied and there is a general agreement among researchers that knowledge in different forms and at different levels plays a vital role. This is the underlying philosophy of the Devanagari document recognition system described in this work. The knowledge sources we use are mostly statistical in nature or in the form of a word dictionary tailored specifically for optical character recognition (OCR). We do not perform any reasoning on these. However, we explore their relative importance and role in the hierarchy. Some of the knowledge sources are acquired a priori by an automated training process while others are extracted from the text as it is processed. A complete Devanagari OCR system has been designed and tested with real-life printed documents of varying size and font. Most of the documents used were photocopies of the original. A performance of approximately 90% correct recognition is achieved.

132 citations
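
The two kinds of knowledge sources named in the abstract, one trained a priori and one extracted from the text as it is processed, can be pictured with a small sketch. Everything below (the sources chosen, the scoring rule, the weights 100 and 1000) is a hypothetical illustration of the general idea, not the system's actual design.

from collections import Counter

def build_knowledge_sources(training_corpus, document_words_so_far):
    # A priori source, acquired by automated training: character bigram
    # counts over a training corpus of correctly spelled words.
    bigrams = Counter(w[i:i + 2] for w in training_corpus
                      for i in range(len(w) - 1))
    # Online source, extracted from the text as it is processed: word
    # frequencies seen so far in the current document.
    doc_freq = Counter(document_words_so_far)
    return bigrams, doc_freq

def score_hypothesis(word, bigrams, doc_freq, dictionary):
    # Combine the sources without any reasoning over them: statistical
    # plausibility, document-level evidence, and a flat bonus for
    # membership in the OCR-tailored word dictionary (assumed weights).
    stat = sum(bigrams[word[i:i + 2]] for i in range(len(word) - 1))
    return stat + 100 * doc_freq[word] + (1000 if word in dictionary else 0)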

Journal ArticleDOI
TL;DR: This paper presents the design of a post-processor that corrects recognized Devanagari symbol strings; the accumulated penalty value for a word gives a measure of its confidence level.

73 citations
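
The penalty-accumulation idea can be sketched as a weighted match against a dictionary: each substitution needed to turn the OCR output into a legal word adds a cost, cheap for known confusions and expensive otherwise, and the total doubles as an inverse confidence measure. This is a minimal sketch of that idea under assumed names and costs, limited to substitutions.

def correct_with_penalty(symbols, dictionary, confusion_cost):
    # Find the dictionary word reachable with the smallest accumulated
    # substitution penalty; the penalty measures (lack of) confidence.
    best_word, best_penalty = None, float("inf")
    for word in dictionary:
        if len(word) != len(symbols):
            continue  # sketch keeps to substitutions; no inserts/deletes
        penalty = sum(confusion_cost.get((s, w), 5.0)  # 5.0 = assumed default
                      for s, w in zip(symbols, word) if s != w)
        if penalty < best_penalty:
            best_word, best_penalty = word, penalty
    return best_word, best_penalty

# e.g. confusion_cost = {("gha", "dha"): 0.5} would make that (hypothetical)
# mix-up cheap to undo, so words differing only there score high confidence.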

Proceedings ArticleDOI
20 Sep 1999
TL;DR: A schema for the description of the shapes of Devanagari characters, and its application to their recognition, is presented; it exploits certain features of the script both to reduce the search space and to create a reference with respect to which correspondence can be established during matching.
Abstract: The paper presents a schema for the description of shapes of Devanagari characters and its application in their recognition. It exploits certain features of the script in both reducing the search space and creating a reference with respect to which correspondence could be established, during the matching process. The description prototypes are constructed using the real-life script after segmentation so that the aberrations introduced during the inevitable process of segmentation get accounted for in the description. This has been tested on printed Devanagari text with a success of approximately 70% without any post-processing and 88% correct recognition with the help of a word dictionary.

62 citations
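
One way to picture a script-feature-based description with a built-in reference is to encode strokes relative to the header line and compare descriptions set-wise. The following is a loose, hypothetical sketch of that general idea, not the paper's schema; the feature encoding and similarity measure are assumptions.

def describe(strokes, header_y):
    # Encode each stroke (x0, y0, x1, y1) by its dominant direction and
    # its position relative to the header line, the script-given
    # reference frame that anchors correspondence during matching.
    desc = set()
    for x0, y0, x1, y1 in strokes:
        direction = "V" if abs(x1 - x0) < abs(y1 - y0) else "H"
        zone = "above" if max(y0, y1) < header_y else "below"
        desc.add((direction, zone))
    return desc

def match_score(desc_a, desc_b):
    # Jaccard similarity between a test description and a prototype;
    # prototypes built from segmented real-life script already carry
    # the aberrations that segmentation introduces.
    union = desc_a | desc_b
    return len(desc_a & desc_b) / len(union) if union else 0.0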

Proceedings ArticleDOI
22 Oct 1995
TL;DR: An English-to-Indian-languages machine-aided translation system named ANGLABHARTI has been developed; its strategy is better than the transfer approach but falls short of a genuine interlingua, in the sense that it forgoes complete disambiguation/understanding of the text to be translated.
Abstract: An English-to-Indian-languages machine-aided translation system named ANGLABHARTI has been developed. It uses a pattern-directed approach with context-free-grammar-like structures. A 'pseudo-target' is generated which is applicable to a group of Indian languages. A set of rules is acquired through corpus analysis to identify the plausible constituents with respect to which movement rules for the 'pseudo-target' are constructed. A number of semantic tags are used to resolve sense ambiguity in the source language. Alternative meanings for unresolved ambiguities are retained in the pseudo-target language code. A text generator module for each target language transforms the pseudo-target language into the target language. A corrector for ill-formed sentences is applied for each target language, and finally a human-engineered post-editing package is used to make the final corrections; the post-editor needs to know only the target language. The strategy used in ANGLABHARTI lies between the transfer and the interlingua approaches. It is better than the transfer approach, as one translation is valid for a host of target languages, but it falls short of a genuine interlingua in the sense that it forgoes complete disambiguation/understanding of the text to be translated.

58 citations
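
The pseudo-target idea, analyze the English input once and generate per language, can be illustrated with a toy pipeline. The rule, lexicon entries, transliterations, and SOV reordering below are all invented for illustration and are not ANGLABHARTI's actual rules or data.

# Hypothetical sketch: one pattern-directed analysis of English, then a
# per-language text generator over a shared pseudo-target.
LEXICON = {
    "child": {"hi": ["baccha"], "ta": ["kuzhandhai"]},
    "eats":  {"hi": ["khata hai"], "ta": ["saapidugiradhu"]},
    "rice":  {"hi": ["chawal", "bhaat"], "ta": ["saadham"]},  # ambiguity kept
}

def to_pseudo_target(english_svo):
    subject, verb, obj = english_svo
    # Pseudo-target: language-neutral roles reordered SOV, which suits the
    # assumed group of target languages; unresolved senses stay as lists.
    return [("SUBJ", subject), ("OBJ", obj), ("VERB", verb)]

def generate(pseudo, lang):
    # Text generator module: pick the first retained sense per slot; a
    # corrector and human post-editing would follow in the full pipeline.
    return " ".join(LEXICON[word][lang][0] for _, word in pseudo)

pseudo = to_pseudo_target(("child", "eats", "rice"))
print(generate(pseudo, "hi"))  # -> "baccha chawal khata hai"
print(generate(pseudo, "ta"))  # same pseudo-target, second language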


Cited by
Journal ArticleDOI
TL;DR: A comprehensive survey of thinning methodologies, including iterative deletion of pixels and nonpixel-based methods, is presented and the relationships among them are explored.
Abstract: A comprehensive survey of thinning methodologies is presented. A wide range of thinning algorithms, including iterative deletion of pixels and nonpixel-based methods, is covered. Skeletonization algorithms based on medial axis and other distance transforms are not considered. An overview of the iterative thinning process and the pixel-deletion criteria needed to preserve the connectivity of the image pattern is given first. Thinning algorithms are then considered in terms of these criteria and their modes of operation. Nonpixel-based methods that usually produce a center line of the pattern directly in one pass without examining all the individual pixels are discussed. The algorithms are considered in great detail and scope, and the relationships among them are explored.

1,827 citations
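
The survey's central family, iterative deletion of pixels under connectivity-preserving criteria, is typified by the classical Zhang-Suen algorithm. Below is a compact sketch of it as an illustration of the genre, not of any specific algorithm singled out by the survey; it assumes a binarized numpy image with a one-pixel background border.

import numpy as np

def zhang_suen_thin(img):
    # Iteratively delete boundary pixels in two sub-iterations whose
    # conditions preserve connectivity. img is 0/1, 1 = ink.
    img = img.copy().astype(np.uint8)
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for y in range(1, img.shape[0] - 1):
                for x in range(1, img.shape[1] - 1):
                    if img[y, x] == 0:
                        continue
                    # Neighbors P2..P9, clockwise from north.
                    p = [img[y-1, x], img[y-1, x+1], img[y, x+1],
                         img[y+1, x+1], img[y+1, x], img[y+1, x-1],
                         img[y, x-1], img[y-1, x-1]]
                    b = sum(p)                         # nonzero neighbors
                    a = sum((p[i] == 0) and (p[(i+1) % 8] == 1)
                            for i in range(8))         # 0->1 transitions
                    if not (2 <= b <= 6 and a == 1):
                        continue  # deletion would break connectivity
                    if step == 0:
                        if p[0]*p[2]*p[4] == 0 and p[2]*p[4]*p[6] == 0:
                            to_delete.append((y, x))
                    else:
                        if p[0]*p[2]*p[6] == 0 and p[0]*p[4]*p[6] == 0:
                            to_delete.append((y, x))
            for y, x in to_delete:
                img[y, x] = 0
            changed = changed or bool(to_delete)
    return img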

Book
25 Nov 1996
TL;DR: Algorithms for Image Processing and Computer Vision, 2nd Edition provides the tools to speed development of image processing applications.
Abstract: A cookbook of algorithms for common image processing applications. Thanks to advances in computer hardware and software, algorithms have been developed that support sophisticated image processing without requiring an extensive background in mathematics. This bestselling book has been fully updated with the newest of these, including 2D vision methods in content-based searches, details on modern classifier methods, and the use of graphics cards as image processing computational aids. It is an ideal reference for software engineers and developers, advanced programmers, graphics programmers, scientists, and other specialists who require highly specialized image processing. The book saves hours of mathematical calculation by using distributed processing and GPU programming, and gives non-mathematicians the shortcuts needed to program relatively sophisticated applications. Algorithms for Image Processing and Computer Vision, 2nd Edition provides the tools to speed development of image processing applications.

1,517 citations

Journal ArticleDOI
TL;DR: Research aimed at correcting words in text has focused on three progressively more difficult problems: (1) nonword error detection; (2) isolated-word error correction; and (3) context-dependent word correction. The article surveys documented findings on spelling error patterns and the techniques developed for each problem.
Abstract: Research aimed at correcting words in text has focused on three progressively more difficult problems: (1) nonword error detection; (2) isolated-word error correction; and (3) context-dependent word correction. In response to the first problem, efficient pattern-matching and n-gram analysis techniques have been developed for detecting strings that do not appear in a given word list. In response to the second problem, a variety of general and application-specific spelling correction techniques have been developed. Some of them were based on detailed studies of spelling error patterns. In response to the third problem, a few experiments using natural-language-processing tools or statistical-language models have been carried out. This article surveys documented findings on spelling error patterns, provides descriptions of various nonword detection and isolated-word error correction techniques, reviews the state of the art of context-dependent word correction techniques, and discusses research issues related to all three areas of automatic error correction in text.

1,417 citations
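
Problems (1) and (2) have standard textbook instances: word-list lookup for nonword detection, and minimum edit distance for isolated-word correction. A minimal sketch of both follows; the function names and the max_dist cutoff are assumptions, and the surveyed literature covers far more refined variants.

def edit_distance(a, b):
    # Classic dynamic-programming (Levenshtein) edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb))) # substitution
        prev = cur
    return prev[-1]

def detect_and_correct(token, word_list, max_dist=2):
    # Problem (1): a token absent from the word list is a nonword.
    # Problem (2): correct it in isolation via the nearest dictionary word.
    if token in word_list:
        return token, False                 # not an error
    best = min(word_list, key=lambda w: edit_distance(token, w))
    if edit_distance(token, best) <= max_dist:
        return best, True                   # corrected
    return token, True                      # flagged, no confident fix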

Journal ArticleDOI
TL;DR: This work presents algorithms for detecting and tracking text in digital video; the system implements a scale-space feature extractor that feeds an artificial neural processor to detect text blocks.
Abstract: Text that appears in a scene or is graphically added to video can provide an important supplemental source of index information as well as clues for decoding the video's structure and for classification. In this work, we present algorithms for detecting and tracking text in digital video. Our system implements a scale-space feature extractor that feeds an artificial neural processor to detect text blocks. Our text tracking scheme consists of two modules: a sum of squared difference (SSD) based module to find the initial position and a contour-based module to refine the position. Experiments conducted with a variety of video sources show that our scheme can detect and track text robustly.

635 citations
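
The SSD module's role, finding a text block's new position by template match, is simple to sketch: slide the block from the previous frame over a small search window in the current frame and keep the offset with the minimum sum of squared differences. This is a generic SSD search under assumed names, not the paper's implementation, and the contour-based refinement stage is omitted.

import numpy as np

def ssd_track(prev_frame, cur_frame, box, search=8):
    # box = (y, x, h, w): text block location in the previous frame.
    y, x, h, w = box
    template = prev_frame[y:y + h, x:x + w].astype(np.float64)
    best, best_dy, best_dx = np.inf, 0, 0
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ny, nx = y + dy, x + dx
            if (ny < 0 or nx < 0 or ny + h > cur_frame.shape[0]
                    or nx + w > cur_frame.shape[1]):
                continue  # candidate window falls outside the frame
            patch = cur_frame[ny:ny + h, nx:nx + w].astype(np.float64)
            ssd = float(((patch - template) ** 2).sum())
            if ssd < best:
                best, best_dy, best_dx = ssd, dy, dx
    # Initial position estimate; a refinement module would adjust it.
    return (y + best_dy, x + best_dx, h, w), best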

Journal ArticleDOI
TL;DR: A review of the OCR work done on Indian language scripts is presented, along with the scope of future work and the further steps needed for Indian-script OCR development.

592 citations