Segmentation of touching and fused Devanagari characters

doi:10.1016/S0031-3203(01)00081-4

Journal Article•DOI•

Veena Bansal¹, R.M.K. Sinha¹•Institutions (1)

Indian Institute of Technology Kanpur¹

01 Apr 2002-Pattern Recognition (Pergamon)-Vol. 35, Iss: 4, pp 875-893

TL;DR: A two-pass algorithm is presented for segmenting and decomposing Devanagari composite characters/symbols into their constituent symbols, and a recognition rate is reported on the segmented conjuncts.

About: This article was published in Pattern Recognition on 2002-04-01. It has received 143 citations to date. The article focuses on the topic: Devanagari.

Citations

Journal Article•DOI•

Indian script character recognition: a survey

Umapada Pal¹, Bidyut B. Chaudhuri¹•Institutions (1)

Indian Statistical Institute¹

01 Sep 2004-Pattern Recognition

TL;DR: A review of the OCR work done on Indian language scripts and the scope of future work and further steps needed for Indian script OCR development is presented.

592 citations

Journal Article•DOI•

Offline Recognition of Devanagari Script: A Survey

R. Jayadevan¹, Satish R. Kolhe, Pradeep M. Patil², Umapada Pal•Institutions (2)

Pune Institute of Computer Technology¹, Vishwakarma Institute of Technology²

01 Nov 2011

TL;DR: The state of the art, from the 1970s onward, in machine-printed and handwritten Devanagari optical character recognition (OCR) is discussed.

Abstract: In India, more than 300 million people use Devanagari script for documentation. There has been a significant improvement in the research related to the recognition of printed as well as handwritten Devanagari text in the past few years. State of the art from 1970s of machine printed and handwritten Devanagari optical character recognition (OCR) is discussed in this paper. All feature-extraction techniques as well as training, classification and matching techniques useful for the recognition are discussed in various sections of the paper. An attempt is made to address the most important results reported so far and it is also tried to highlight the beneficial directions of the research till date. Moreover, the paper also contains a comprehensive bibliography of many selected papers appeared in reputed journals and conference proceedings as an aid for the researchers working in the field of Devanagari OCR.

159 citations

Cites background from "Segmentation of touching and fused ..."

...The system described by Sinha and Mahabala [10] for printed Devanagari characters stores structural descriptions for each symbol of the script in terms of primitives and their relationships....
[...]
...A syntactic pattern analysis system for Devanagari script recognition is presented in Sinha’s Ph.D. thesis [9]....
[...]
...Bansal and Sinha [20] considered several statistical classifying features like horizontal zero crossings, moments, vertex points, and pixel density in different zones for Devanagari characters....
[...]
...Sinha [24] also demonstrated how the spatial association among the constituent symbols of Devanagari script plays an important role in understanding Devanagari words....
[...]
...Bansal and Sinha [18] presented a two-pass algorithm for the segmentation of machine-printed composite characters into their constituent symbols....
[...]

Journal Article•DOI•

A survey on optical character recognition for Bangla and Devanagari scripts

Soumen Bag¹, Gaurav Harit²•Institutions (2)

Indian Institute of Technology Kharagpur¹, Indian Institutes of Technology²

02 Apr 2013-Sadhana-academy Proceedings in Engineering Sciences

TL;DR: A review of OCR work on Indian scripts, mainly on Bangla and Devanagari—the two most popular scripts in India, and the various methodologies and their reported results are presented.

Abstract: The past few decades have witnessed an intensive research on optical character recognition (OCR) for Roman, Chinese, and Japanese scripts. A lot of work has been also reported on OCR efforts for various Indian scripts, like Devanagari, Bangla, Oriya, Tamil, Telugu, Malayalam, Kannada, Gurmukhi, Gujarati, etc. In this paper, we present a review of OCR work on Indian scripts, mainly on Bangla and Devanagari—the two most popular scripts in India. We have summarized most of the published papers on this topic and have also analysed the various methodologies and their reported results. Future directions of research in OCR for Indian scripts have been also given.

70 citations

Cites background from "Segmentation of touching and fused ..."

...OCR systems have to segment the word into individual characters (Bansal & Sinha 2002; Chowdhury et al 2008; Ma & Doermann 2003; Pal & Datta 2003)....
[...]
...The comparison is done with respect to feature set, classifier, and reported accuracy rate....
[...]

Journal Article•DOI•

Methods and strategies on off-line cursive touched characters segmentation: a directional review

Tanzila Saba¹, Amjad Rehman², Mohamed Elarbi-Boudihir²•Institutions (2)

Universiti Teknologi Malaysia¹, Islamic University²

01 Dec 2014-Artificial Intelligence Review

TL;DR: This paper is the first survey that focuses on touched character segmentation and provides segmentation rates, descriptions of the test data for the approaches discussed, and the main trends in the field of touched character segmentation.

Abstract: Character segmentation is a challenging problem in the field of optical character recognition. Presence of touched characters make this dilemma more crucial. The goal of this paper is to provide major concepts and progress in domain of off-line cursive touched character segmentation. Accordingly, two broad classes of technique are identified. These include methods that perform explicit or implicit character segmentation. The basic methods used by each class of technique are presented and the contributions of individual algorithms within each class are discussed. It is the first survey that focuses on touched character segmentation and provides segmentation rates, descriptions of the test data for the approaches discussed. Finally, the main trends in the field of touched character segmentation are examined, important contributions are presented and future directions are also suggested.

69 citations

Cites methods from "Segmentation of touching and fused ..."

...Such problems are investigated using recognition-based segmentation in Bansal and Sinha (2002)....
[...]

Journal Article•DOI•

Multilingual Character Segmentation and Recognition Schemes for Indian Document Images

Parul Sahare¹, Sanjay B. Dhok¹•Institutions (1)

Visvesvaraya National Institute of Technology¹

18 Jan 2018-IEEE Access

TL;DR: In this paper, robust algorithms for character segmentation and recognition are presented for multilingual Indian document images of Latin and Devanagari scripts, where primary segmentation paths are obtained using structural property of characters, whereas overlapped and joined characters are separated using graph distance theory.

Abstract: In this paper, robust algorithms for character segmentation and recognition are presented for multilingual Indian document images of Latin and Devanagari scripts. These documents generally suffer from their layout organizations, local skews, and low print quality and contain intermixed texts (machine-printed and handwritten). In the proposed character segmentation algorithm, primary segmentation paths are obtained using structural property of characters, whereas overlapped and joined characters are separated using graph distance theory. Finally, segmentation results are validated using highly accurate support vector machine classifier. For the proposed character recognition algorithm, three new geometrical shape-based features are computed. First and second features are formed with respect to the center pixel of character, whereas neighborhood information of text pixels is used for the calculation of third feature. For recognizing the input character, $k$-Nearest Neighbor classifier is used, as it has intrinsically zero training time. Comprehensive experiments are carried out on different databases containing printed as well as handwritten texts. Benchmarking results illustrate that proposed algorithms have better performances compared to other contemporary approaches, where highest segmentation and recognition rates of 98.86% and 99.84%, respectively, are obtained.

69 citations

Cites methods from "Segmentation of touching and fused ..."

...Bansal and Sinha [9] worked on Devanagari characters segmentation using projections and statistical dimensional information of the characters....
[...]
...[9] V. Bansal and R. M. K. Sinha, ‘‘Segmentation of touching and fused Devanagari characters,’’ Pattern Recognit., vol. 35, no. 4, pp. 875–893, Apr. 2002....
[...]

References

Journal Article•DOI•

A survey of methods and strategies in character segmentation

R.G. Casey¹, Eric Lecolinet²•Institutions (2)

IBM¹, École Normale Supérieure²

01 Jul 1996-IEEE Transactions on Pattern Analysis and Machine Intelligence

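TL;DR: A review of character segmentation methods under four headings: classical dissection, which partitions the input image into subimages that are then classified; methods that avoid dissection and segment by classification; hybrid strategies that combine the two; and holistic approaches that avoid segmentation by recognizing entire character strings as units.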

Abstract: Character segmentation has long been a critical area of the OCR process. The higher recognition rates for isolated characters vs. those obtained for words and connected character strings well illustrate this fact. A good part of recent progress in reading unconstrained printed and written text may be ascribed to more insightful handling of segmentation. This paper provides a review of these advances. The aim is to provide an appreciation for the range of techniques that have been developed, rather than to simply list sources. Segmentation methods are listed under four main headings. What may be termed the "classical" approach consists of methods that partition the input image into subimages, which are then classified. The operation of attempting to decompose the image into classifiable units is called "dissection." The second class of methods avoids dissection, and segments the image either explicitly, by classification of prespecified windows, or implicitly by classification of subsets of spatial features collected from the image as a whole. The third strategy is a hybrid of the first two, employing dissection together with recombination rules to define potential segments, but using classification to select from the range of admissible segmentation possibilities offered by these subimages. Finally, holistic approaches that avoid segmentation by recognizing entire character strings as units are described.

880 citations

Journal Article•DOI•

A complete printed Bangla OCR system

Bidyut B. Chaudhuri¹, Umapada Pal¹•Institutions (1)

Indian Statistical Institute¹

01 Mar 1998-Pattern Recognition

TL;DR: A complete Optical Character Recognition (OCR) system for printed Bangla, the fourth most popular script in the world, is presented, and extension of the work to Devnagari, the third most popular script in the world, is discussed.

381 citations

Journal Article•DOI•

On the Recognition of Printed Characters of Any Font and Size

Simon Kahan¹, Theo Pavlidis², Henry S. Baird³•Institutions (3)

University of Washington¹, Stony Brook University², Bell Labs³

01 Feb 1987-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: The current state of a system that recognizes printed text of various fonts and sizes for the Roman alphabet is described, which combines several techniques in order to improve the overall recognition rate.

Abstract: We describe the current state of a system that recognizes printed text of various fonts and sizes for the Roman alphabet. The system combines several techniques in order to improve the overall recognition rate. Thinning and shape extraction are performed directly on a graph of the run-length encoding of a binary image. The resulting strokes and other shapes are mapped, using a shape-clustering approach, into binary features which are then fed into a statistical Bayesian classifier. Large-scale trials have shown better than 97 percent top choice correct performance on mixtures of six dissimilar fonts, and over 99 percent on most single fonts, over a range of point sizes. Certain remaining confusion classes are disambiguated through contour analysis, and characters suspected of being merged are broken and reclassified. Finally, layout and linguistic context are applied. The results are illustrated by sample pages.

381 citations

Proceedings Article•DOI•

An OCR system to read two Indian language scripts: Bangla and Devnagari (Hindi)

Bidyut B. Chaudhuri, Umapada Pal¹•Institutions (1)

Indian Statistical Institute¹

18 Aug 1997

TL;DR: An OCR system is proposed that can read two Indian language scripts: Bangla and Devnagari (Hindi), the most popular ones in the Indian subcontinent, and shows a good performance for single font scripts printed on clear documents.

Abstract: An OCR system is proposed that can read two Indian language scripts: Bangla and Devnagari (Hindi), the most popular ones in the Indian subcontinent. These scripts, having the same origin in ancient Brahmi script, have many features in common and hence a single system can be modeled to recognize them. In the proposed model, document digitization, skew detection, text line segmentation and zone separation, word and character segmentation, character grouping into basic, modifier and compound character category are done for both scripts by the same set of algorithms. The feature sets and classification tree as well as the knowledge base required for error correction (such as lexicon) differ for Bangla and Devnagari. The system shows a good performance for single font scripts printed on clear documents.

198 citations

"Segmentation of touching and fused ..." refers background in this paper

...hary et al. (12; 13; 14), no attempt has been made so far in isolating the touching and fused...
[...]

Book•

Structured Document Image Analysis

Henry S. Baird, Horst Bunke, Kazuhiko Yamamoto

01 Nov 1992

TL;DR: This is the first book to offer a broad selection of state-of-the-art research papers, including authoritative critical surveys of the literature, and parallel studies of the architecture of complete high-performance printed-document reading systems.

Abstract: Document image analysis is the automatic computer interpretation of images of printed and handwritten documents, including text, drawings, maps, music scores, etc. Research in this field supports a rapidly growing international industry. This is the first book to offer a broad selection of state-of-the-art research papers, including authoritative critical surveys of the literature, and parallel studies of the architecture of complete high-performance printed-document reading systems. A unique feature is the extended section on music notation, an ideal vehicle for international sharing of basic research. Also, the collection includes important new work on line drawings, handwriting, character and symbol recognition, and basic methodological issues. The IAPR 1990 Workshop on Syntactic and Structural Pattern Recognition is summarized, including the reports of its expert working groups, whose debates provide a fascinating perspective on the field. The book is an excellent text for a first-year graduate seminar in document image analysis, and is likely to remain a standard reference in the field for years.

185 citations