Journal Article

Segmentation of touching and fused Devanagari characters

01 Apr 2002-Pattern Recognition (Pergamon)-Vol. 35, Iss: 4, pp 875-893
TL;DR: A two-pass algorithm is presented for the segmentation and decomposition of Devanagari composite characters/symbols into their constituent symbols, and a recognition rate is reported on the segmented conjuncts.
About: This article was published in Pattern Recognition on 2002-04-01. It has received 143 citations to date. The article focuses on the topic: Devanagari.
Citations
Journal Article
TL;DR: A review of the OCR work done on Indian language scripts and the scope of future work and further steps needed for Indian script OCR development is presented.

592 citations

Journal Article
01 Nov 2011
TL;DR: This paper discusses the state of the art, from the 1970s onward, in machine-printed and handwritten Devanagari optical character recognition (OCR).
Abstract: In India, more than 300 million people use the Devanagari script for documentation. Research on the recognition of printed and handwritten Devanagari text has improved significantly in the past few years. This paper discusses the state of the art, from the 1970s onward, in machine-printed and handwritten Devanagari optical character recognition (OCR). All feature-extraction techniques, as well as the training, classification, and matching techniques useful for recognition, are discussed in various sections of the paper. An attempt is made to address the most important results reported so far and to highlight beneficial directions of research to date. The paper also contains a comprehensive bibliography of selected papers that appeared in reputed journals and conference proceedings, as an aid for researchers working in the field of Devanagari OCR.

159 citations


Cites background from "Segmentation of touching and fused ..."

  • ...The system described by Sinha and Mahabala [10] for printed Devanagari characters stores structural descriptions for each symbol of the script in terms of primitives and their relationships....

    [...]

  • ...A syntactic pattern analysis system for Devanagari script recognition is presented in Sinha’s Ph.D. thesis [9]....

    [...]

  • ...Bansal and Sinha [20] considered several statistical classifying features like horizontal zero crossings, moments, vertex points, and pixel density in different zones for Devanagari characters....

    [...]

  • ...Sinha [24] also demonstrated how the spatial association among the constituent symbols of Devanagari script plays an important role in understanding Devanagari words....

    [...]

  • ...Bansal and Sinha [18] presented a two-pass algorithm for the segmentation of machine-printed composite characters into their constituent symbols....

    [...]

Journal Article
TL;DR: A review of OCR work on Indian scripts, mainly on Bangla and Devanagari, the two most popular scripts in India, and the various methodologies and their reported results are presented.
Abstract: The past few decades have witnessed intensive research on optical character recognition (OCR) for Roman, Chinese, and Japanese scripts. A lot of work has also been reported on OCR efforts for various Indian scripts, like Devanagari, Bangla, Oriya, Tamil, Telugu, Malayalam, Kannada, Gurmukhi, Gujarati, etc. In this paper, we present a review of OCR work on Indian scripts, mainly on Bangla and Devanagari, the two most popular scripts in India. We have summarized most of the published papers on this topic and have also analysed the various methodologies and their reported results. Future directions of research in OCR for Indian scripts are also given.

70 citations


Cites background from "Segmentation of touching and fused ..."

  • ...OCR systems have to segment the word into individual characters (Bansal & Sinha 2002; Chowdhury et al 2008; Ma & Doermann 2003; Pal & Datta 2003)....

    [...]

  • ...The comparison is done with respect to feature set, classifier, and reported accuracy rate....

    [...]

Journal Article
TL;DR: This paper is the first survey that focuses on touched character segmentation and provides segmentation rates, descriptions of the test data for the approaches discussed, and the main trends in the field of touched character segmentation.
Abstract: Character segmentation is a challenging problem in the field of optical character recognition. The presence of touched characters makes this problem more acute. The goal of this paper is to present the major concepts and progress in the domain of off-line cursive touched character segmentation. Accordingly, two broad classes of technique are identified: methods that perform explicit character segmentation and methods that perform implicit character segmentation. The basic methods used by each class of technique are presented and the contributions of individual algorithms within each class are discussed. It is the first survey that focuses on touched character segmentation and provides segmentation rates and descriptions of the test data for the approaches discussed. Finally, the main trends in the field of touched character segmentation are examined, important contributions are presented, and future directions are suggested.

69 citations


Cites methods from "Segmentation of touching and fused ..."

  • ...Such problems are investigated using recognition-based segmentation in Bansal and Sinha (2002)....

    [...]

Journal Article
TL;DR: In this paper, robust algorithms for character segmentation and recognition are presented for multilingual Indian document images of Latin and Devanagari scripts, where primary segmentation paths are obtained using structural properties of characters, whereas overlapped and joined characters are separated using graph distance theory.
Abstract: In this paper, robust algorithms for character segmentation and recognition are presented for multilingual Indian document images of Latin and Devanagari scripts. These documents generally suffer from problems of layout organization, local skew, and low print quality, and contain intermixed texts (machine-printed and handwritten). In the proposed character segmentation algorithm, primary segmentation paths are obtained using structural properties of characters, whereas overlapped and joined characters are separated using graph distance theory. Finally, segmentation results are validated using a highly accurate support vector machine classifier. For the proposed character recognition algorithm, three new geometrical shape-based features are computed. The first and second features are formed with respect to the center pixel of the character, whereas neighborhood information of text pixels is used to calculate the third feature. For recognizing the input character, a $k$-Nearest Neighbor classifier is used, as it has intrinsically zero training time. Comprehensive experiments are carried out on different databases containing printed as well as handwritten texts. Benchmarking results show that the proposed algorithms perform better than other contemporary approaches, with highest segmentation and recognition rates of 98.86% and 99.84%, respectively.

69 citations


Cites methods from "Segmentation of touching and fused ..."

  • ...Bansal and Sinha [9] worked on Devanagari characters segmentation using projections and statistical dimensional information of the characters....

    [...]

  • ...[9] V. Bansal and R. M. K. Sinha, ‘‘Segmentation of touching and fused Devanagari characters,’’ Pattern Recognit., vol. 35, no. 4, pp. 875–893, Apr. 2002....

    [...]

References
Journal Article
TL;DR: A review of character segmentation methods under four main headings, ranging from the "classical" approach of partitioning the input image into subimages, which are then classified, to holistic approaches that avoid segmentation by recognizing entire character strings as units.
Abstract: Character segmentation has long been a critical area of the OCR process. The higher recognition rates for isolated characters vs. those obtained for words and connected character strings well illustrate this fact. A good part of recent progress in reading unconstrained printed and written text may be ascribed to more insightful handling of segmentation. This paper provides a review of these advances. The aim is to provide an appreciation for the range of techniques that have been developed, rather than to simply list sources. Segmentation methods are listed under four main headings. What may be termed the "classical" approach consists of methods that partition the input image into subimages, which are then classified. The operation of attempting to decompose the image into classifiable units is called "dissection." The second class of methods avoids dissection, and segments the image either explicitly, by classification of prespecified windows, or implicitly by classification of subsets of spatial features collected from the image as a whole. The third strategy is a hybrid of the first two, employing dissection together with recombination rules to define potential segments, but using classification to select from the range of admissible segmentation possibilities offered by these subimages. Finally, holistic approaches that avoid segmentation by recognizing entire character strings as units are described.

880 citations

Journal Article
TL;DR: A complete Optical Character Recognition (OCR) system for printed Bangla, the fourth most popular script in the world, is presented, and extension of the work to Devnagari, the third most popular script in the world, is discussed.

381 citations

Journal Article
TL;DR: The current state of a system that recognizes printed text of various fonts and sizes for the Roman alphabet is described, which combines several techniques in order to improve the overall recognition rate.
Abstract: We describe the current state of a system that recognizes printed text of various fonts and sizes for the Roman alphabet. The system combines several techniques in order to improve the overall recognition rate. Thinning and shape extraction are performed directly on a graph of the run-length encoding of a binary image. The resulting strokes and other shapes are mapped, using a shape-clustering approach, into binary features which are then fed into a statistical Bayesian classifier. Large-scale trials have shown better than 97 percent top choice correct performance on mixtures of six dissimilar fonts, and over 99 percent on most single fonts, over a range of point sizes. Certain remaining confusion classes are disambiguated through contour analysis, and characters suspected of being merged are broken and reclassified. Finally, layout and linguistic context are applied. The results are illustrated by sample pages.

381 citations

Proceedings Article
18 Aug 1997
TL;DR: An OCR system is proposed that can read two Indian language scripts: Bangla and Devnagari (Hindi), the most popular ones in the Indian subcontinent, and shows a good performance for single font scripts printed on clear documents.
Abstract: An OCR system is proposed that can read two Indian language scripts: Bangla and Devnagari (Hindi), the most popular ones in the Indian subcontinent. These scripts, having the same origin in ancient Brahmi script, have many features in common and hence a single system can be modeled to recognize them. In the proposed model, document digitization, skew detection, text line segmentation and zone separation, word and character segmentation, character grouping into basic, modifier and compound character category are done for both scripts by the same set of algorithms. The feature sets and classification tree as well as the knowledge base required for error correction (such as lexicon) differ for Bangla and Devnagari. The system shows a good performance for single font scripts printed on clear documents.

198 citations


"Segmentation of touching and fused ..." refers background in this paper

  • ...hary et al. (12; 13; 14), no attempt has been made so far in isolating the touching and fused...

    [...]

Book
01 Nov 1992
TL;DR: This is the first book to offer a broad selection of state-of-the-art research papers, including authoritative critical surveys of the literature, and parallel studies of the architecture of complete high-performance printed-document reading systems.
Abstract: Document image analysis is the automatic computer interpretation of images of printed and handwritten documents, including text, drawings, maps, music scores, etc. Research in this field supports a rapidly growing international industry. This is the first book to offer a broad selection of state-of-the-art research papers, including authoritative critical surveys of the literature, and parallel studies of the architecture of complete high-performance printed-document reading systems. A unique feature is the extended section on music notation, an ideal vehicle for international sharing of basic research. Also, the collection includes important new work on line drawings, handwriting, character and symbol recognition, and basic methodological issues. The IAPR 1990 Workshop on Syntactic and Structural Pattern Recognition is summarized, including the reports of its expert working groups, whose debates provide a fascinating perspective on the field. The book is an excellent text for a first-year graduate seminar in document image analysis, and is likely to remain a standard reference in the field for years.

185 citations