Showing papers by Umapada Pal published in 2004

Open Access

Journal Article

Indian script character recognition: a survey

Umapada Pal, Bidyut B. Chaudhuri

01 Sep 2004, Pattern Recognition

TL;DR: A review of OCR work on Indian-language scripts is presented, together with the scope of future work and the further steps needed for Indian-script OCR development.

592 citations

Journal Article

Multioriented and curved text lines extraction from Indian documents

Umapada Pal¹, Partha Pratim Roy²

Indian Statistical Institute¹, Tata Consultancy Services²

01 Aug 2004

TL;DR: A novel scheme, based mainly on the concept of a water reservoir analogy, is proposed to extract individual text lines from printed Indian documents containing multioriented and/or curved text lines.

Abstract: There are printed artistic documents in which the text lines of a single page are not parallel to each other: the lines may have different orientations, or they may be curved. For optical character recognition (OCR) of such documents, these lines must be extracted properly. In this paper, we propose a novel scheme, based mainly on the concept of a water reservoir analogy, to extract individual text lines from printed Indian documents containing multioriented and/or curved text lines. A reservoir is a metaphor for the cavity region of a character where water can be stored. In the proposed scheme, connected components are first labeled and identified as either isolated or touching. Next, each touching component is classified as either straight type (S-type) or curve type (C-type), depending on the reservoir base-area and the envelope points of the component. Based on its type (S-type or C-type), two candidate points are computed from each touching component. Finally, the candidate regions (neighborhoods of the candidate points) of each component are detected, and after these regions are analyzed, the components are grouped to form individual text lines.
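
The reservoir idea above can be made concrete with a small sketch. This is a simplified illustration, not the authors' procedure: it assumes a binary image (1 = ink) cropped to one component, approximates the "top reservoir" from the component's top profile, and lets water above each column rise to the lower of the tallest rims on its left and right (the classic trapping-rain-water formulation). The function names are hypothetical.

```python
from typing import Dict, List

Image = List[List[int]]  # binary image of one component: 1 = ink, 0 = background


def top_profile_heights(img: Image) -> List[int]:
    """Height of the topmost ink pixel in each column (0 for an empty column)."""
    rows = len(img)
    heights = []
    for c in range(len(img[0])):
        top = next((r for r in range(rows) if img[r][c]), None)
        heights.append(0 if top is None else rows - top)
    return heights


def top_reservoir_depths(img: Image) -> List[int]:
    """Water depth per column when water is poured from the top.

    Simplification (assumption): water above column c rises to
    min(tallest rim to the left, tallest rim to the right).
    """
    h = top_profile_heights(img)
    n = len(h)
    left, right = [0] * n, [0] * n
    running = 0
    for i in range(n):
        running = max(running, h[i])
        left[i] = running
    running = 0
    for i in range(n - 1, -1, -1):
        running = max(running, h[i])
        right[i] = running
    return [min(l, r) - x for l, r, x in zip(left, right, h)]


def reservoir_summary(img: Image) -> Dict[str, int]:
    """Area, width (number of columns holding water) and height (max depth)."""
    d = top_reservoir_depths(img)
    return {"area": sum(d), "width": sum(1 for v in d if v > 0), "height": max(d, default=0)}


if __name__ == "__main__":
    u_shape = [
        [1, 0, 0, 0, 1],
        [1, 0, 0, 0, 1],
        [1, 1, 1, 1, 1],
    ]
    print(top_reservoir_depths(u_shape))  # [0, 2, 2, 2, 0]
    print(reservoir_summary(u_shape))     # {'area': 6, 'width': 3, 'height': 2}
```

Quantities like the reservoir base-area mentioned in the abstract could be derived from such depths, but how the paper combines them with envelope points to decide S-type versus C-type is not specified here.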

83 citations

Proceedings Article

Handwriting segmentation of unconstrained Oriya text

N. Tripathy¹, Umapada Pal¹

Indian Statistical Institute¹

26 Oct 2004

TL;DR: A water-reservoir-concept-based scheme is proposed for the segmentation of unconstrained Oriya handwritten text into individual characters; touching characters within a word are segmented using structural, topological and water-reservoir-concept-based features.

Abstract: Segmentation of handwritten text into lines, words and characters is one of the important steps in a handwriting recognition system. In this paper, a water-reservoir-concept-based scheme is proposed for segmenting unconstrained Oriya handwritten text into individual characters. The text image is first segmented into lines, the lines are then segmented into individual words, and the words are segmented into individual characters. For line segmentation, the document is divided into vertical stripes; the stripe width is calculated by analyzing the heights of the water reservoirs obtained from different components of the document. Stripe-wise horizontal histograms are then computed, and the relationship between the peak-valley points of the histograms is used for line segmentation. Based on the vertical projection profile and the structural features of Oriya characters, text lines are segmented into words. For character segmentation, isolated and connected (touching) characters in a word are first detected. The touching characters of the word are then segmented using structural, topological and water-reservoir-concept-based features.
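
The stripe-wise histogram step can be sketched as follows. This is a minimal illustration under stated assumptions: the stripe width is passed in as a free parameter (the paper derives it from reservoir heights, which is not reproduced), and each stripe's candidate line separators are taken simply as the middle rows of empty histogram runs that have text above and below them, rather than the paper's full peak-valley analysis across stripes. The function names are hypothetical.

```python
from typing import List

Image = List[List[int]]  # binary document image: 1 = ink, 0 = background


def stripe_histograms(img: Image, stripe_width: int) -> List[List[int]]:
    """Horizontal projection (ink count per row) for each vertical stripe."""
    rows, cols = len(img), len(img[0])
    hists = []
    for start in range(0, cols, stripe_width):
        end = min(start + stripe_width, cols)
        hists.append([sum(img[r][start:end]) for r in range(rows)])
    return hists


def valley_midpoints(hist: List[int]) -> List[int]:
    """Middle row of every empty run that has text both above and below it."""
    mids: List[int] = []
    r, n, seen_text = 0, len(hist), False
    while r < n:
        if hist[r] == 0:
            start = r
            while r < n and hist[r] == 0:
                r += 1
            if seen_text and r < n:  # bounded by text on both sides
                mids.append((start + r - 1) // 2)
        else:
            seen_text = True
            r += 1
    return mids


if __name__ == "__main__":
    page = [
        [0, 0, 0, 0],
        [1, 1, 0, 1],  # first text line
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 1, 1, 0],  # second text line
    ]
    for hist in stripe_histograms(page, stripe_width=2):
        print(hist, valley_midpoints(hist))
```

Working per stripe rather than over the full page width is what lets a slightly skewed or undulating handwritten line still produce clean valleys within each narrow stripe.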

74 citations

Proceedings Article

A system towards Indian postal automation

Kaushik Roy¹, Szilárd Vajda, Umapada Pal¹, Bidyut B. Chaudhuri¹

Indian Statistical Institute¹

26 Oct 2004

TL;DR: A two-stage MLP-based classifier is employed to recognise Bangla and Arabic numerals, so that postal documents written in Arabic and in a local language (Bangla) can be sorted for postal automation in India.

Abstract: In this paper, we present a system towards Indian postal automation. In the proposed system, the image is first decomposed into blocks using the run length smoothing algorithm (RLSA). Non-text blocks (postal stamps, postal seals, etc.) are detected based on the black pixel density and the number of components inside a block. Using positional information, the destination address block (DAB) is identified among the text blocks. Next, the pin-code box is detected within the DAB, and the numerals are extracted from it. Since India is a multi-lingual and multi-script country, the address may be written in a combination of two languages: Arabic and a local language. For sorting postal documents written in Arabic and the local language Bangla, a two-stage MLP-based classifier is employed to recognise Bangla and Arabic numerals. At present, the accuracy of the handwritten numeral recognition module is 92.10%.
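
RLSA, the block-decomposition step named above, is a standard technique: short white gaps between black pixels are filled so that nearby ink merges into blocks. The sketch below shows one common variant (horizontal and vertical smoothing combined with a logical AND); the thresholds are free parameters chosen for illustration, since the abstract does not give the values used.

```python
from typing import List

Image = List[List[int]]  # binary image: 1 = black, 0 = white


def rlsa_1d(seq: List[int], threshold: int) -> List[int]:
    """Fill white runs of length <= threshold that lie between two black pixels."""
    out = list(seq)
    n, i = len(seq), 0
    while i < n:
        if seq[i] == 0:
            j = i
            while j < n and seq[j] == 0:
                j += 1
            # only gaps bounded by black on both sides are filled
            if i > 0 and j < n and j - i <= threshold:
                out[i:j] = [1] * (j - i)
            i = j
        else:
            i += 1
    return out


def rlsa(img: Image, h_threshold: int, v_threshold: int) -> Image:
    """Row-wise and column-wise smoothing, combined with a logical AND."""
    horiz = [rlsa_1d(row, h_threshold) for row in img]
    cols = [rlsa_1d(list(col), v_threshold) for col in zip(*img)]
    vert = [list(row) for row in zip(*cols)]
    return [[a & b for a, b in zip(hr, vr)] for hr, vr in zip(horiz, vert)]


if __name__ == "__main__":
    print(rlsa_1d([1, 0, 0, 1, 0, 0, 0, 0, 1], threshold=2))
    print(rlsa([[1, 1, 1], [1, 0, 1], [1, 1, 1]], h_threshold=1, v_threshold=1))
```

Each resulting black region could then be labeled as a block, and features such as its black-pixel density computed to separate stamps and seals from text, as the abstract describes; the decision thresholds for that step are not given in the source.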

59 citations

Proceedings Article

A system for word-wise handwritten script identification for Indian postal automation

Kaushik Roy¹, Ansuman Banerjee¹, Umapada Pal¹

Indian Statistical Institute¹

20 Dec 2004

TL;DR: In the proposed scheme, document skew is first detected and corrected, non-text parts are then segmented from the document using the run length smoothing algorithm (RLSA), and a tree classifier is generated for word-wise Bangla/Devnagari and English script identification.

Abstract: Postal automation has been a topic of research over the last few years. There has been considerable work on postal automation in the USA, UK, Japan and Australia, but there is no significant work on Indian postal automation. This paper deals with word-wise handwritten script identification for Indian postal automation. In the proposed scheme, document skew is first detected and corrected. Non-text parts are then segmented from the document using the run length smoothing algorithm (RLSA). Next, using a piece-wise projection method, the destination address block (DAB) is segmented into lines and the lines into words. The busy-zone of each word is computed using the water reservoir concept. Finally, using the matra/Shirorekha feature, water-reservoir-concept-based features, etc., a tree classifier is generated for word-wise Bangla/Devnagari and English script identification.

41 citations

Book Chapter

Word–Wise Script Identification from Indian Documents

Suranjit Sinha, Umapada Pal, Bidyut B. Chaudhuri

08 Sep 2004

TL;DR: A robust technique is proposed for word-wise script identification in Indian doublet-form documents, using different topological and structural features to separate words of different scripts.

Abstract: In a country like India, a single text line in most official documents contains words in two different scripts: under the two-language formula, Indian documents are written in English and the official language of the state. For Optical Character Recognition (OCR) of such a document page, words of different scripts must be separated before they are fed to the OCRs of the individual scripts. In this paper, a robust technique is proposed for word-wise script identification in Indian doublet-form documents. The document is first segmented into lines, and the lines are then segmented into words. Individual script words are identified using different topological and structural features (such as the number of loops, the headline feature, water-reservoir-concept-based features, profile features, etc.). The proposed scheme was tested on 24210 words from different doublets, and an average accuracy of more than 97% was obtained.
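
Word-wise identification presupposes that each text line has already been split into words. A common way to do this, sketched below under assumptions (the abstract does not say how the split is done), is to take the line's vertical projection profile and cut wherever a run of blank columns is at least `min_gap` wide; shorter blank runs are treated as inter-character spacing. `split_words` and `min_gap` are hypothetical names.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary text-line image: 1 = ink, 0 = background


def split_words(line: Image, min_gap: int) -> List[Tuple[int, int]]:
    """Return (start_col, end_col) spans of the words in a text-line image.

    A column is blank when its vertical projection is zero. A blank run of at
    least `min_gap` columns separates words; shorter runs are kept inside a word.
    """
    proj = [sum(col) for col in zip(*line)]
    words: List[Tuple[int, int]] = []
    start, end, gap = None, -1, 0
    for c, v in enumerate(proj):
        if v:
            if start is None:
                start = c
            gap = 0
            end = c
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                words.append((start, end))
                start = None
    if start is not None:
        words.append((start, end))
    return words


if __name__ == "__main__":
    line = [[1, 1, 0, 1, 0, 0, 0, 1, 1]]
    print(split_words(line, min_gap=2))
```

The gap threshold is the key design choice: too small and characters are split apart, too large and adjacent words merge, so a practical system would estimate it from the line itself (for example from typical character height).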

24 citations

Proceedings Article

A System for Joining and Recognition of Broken Bangla Numerals for Indian Postal Automation.

Kaushik Roy¹, Umapada Pal, Bidyut B. Chaudhuri

Indian Statistical Institute¹

01 Jan 2004

TL;DR: A system towards recognition of Bangla pincode numerals for Indian postal automation, combining Neural Network and tree-classifier-based approaches, with an overall accuracy of 94.21% at present.

Abstract: In this paper, we present a system towards recognition of Bangla pincode numerals for Indian postal automation. In the proposed system, broken numerals are first joined using structural features. Next, the numerals are recognized by combining Neural Network (NN) and tree-classifier-based approaches. Considering similarly shaped numerals, the NN first classifies the 10 numerals into six groups, and a tree classifier is then used for final recognition. The features used for NN-based recognition include the number and position of end points and junction points, the position of the centre of gravity, and the distance between the centre of the bounding box and the centre of gravity of a numeral. The features used for the tree classifier are based on the water reservoir concept, structural features, and topological features. The overall accuracy of the proposed system is at present 94.21%.
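
Several of the NN features listed above can be computed directly from a thinned numeral. The sketch below assumes the input is already a one-pixel-wide skeleton and locates end and junction points with the crossing-number test (a standard skeleton-analysis technique, substituted here because the abstract does not say how the points are found): a pixel with one 0-to-1 transition around its 8-neighbour ring is an end point, and one with three or more is a junction. The centre of gravity and its distance to the bounding-box centre follow their usual definitions. All names are hypothetical.

```python
import math
from typing import Dict, List

Image = List[List[int]]  # thinned (one-pixel-wide) numeral: 1 = ink, 0 = background

# 8-neighbour offsets in clockwise order, starting from north
_RING = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def _crossing_number(img: Image, r: int, c: int) -> int:
    """Number of 0->1 transitions around the 8-neighbour ring of (r, c)."""
    rows, cols = len(img), len(img[0])
    ring = [img[r + dr][c + dc] if 0 <= r + dr < rows and 0 <= c + dc < cols else 0
            for dr, dc in _RING]
    return sum(1 for i in range(8) if ring[i] == 0 and ring[(i + 1) % 8] == 1)


def numeral_features(img: Image) -> Dict[str, float]:
    """End/junction counts, centre of gravity, and its distance to the bbox centre."""
    ink = [(r, c) for r, row in enumerate(img) for c, v in enumerate(row) if v]
    cn = [_crossing_number(img, r, c) for r, c in ink]
    rs = [r for r, _ in ink]
    cs = [c for _, c in ink]
    cg_r, cg_c = sum(rs) / len(ink), sum(cs) / len(ink)
    bb_r, bb_c = (min(rs) + max(rs)) / 2, (min(cs) + max(cs)) / 2
    return {
        "end_points": sum(1 for n in cn if n == 1),
        "junction_points": sum(1 for n in cn if n >= 3),
        "cg_row": cg_r,
        "cg_col": cg_c,
        "cg_to_bbox_centre": math.hypot(cg_r - bb_r, cg_c - bb_c),
    }


if __name__ == "__main__":
    plus = [[0, 0, 1, 0, 0], [0, 0, 1, 0, 0], [1, 1, 1, 1, 1], [0, 0, 1, 0, 0], [0, 0, 1, 0, 0]]
    print(numeral_features(plus))
```

The crossing number is used instead of a plain neighbour count because, on a skeleton, pixels next to a junction can have three or more ink neighbours without being junctions themselves.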

14 citations

Proceedings Article

Recognition of Unconstrained Malayalam Handwritten Numeral.

Umapada Pal, Sayani Kundu, Y. Ali, H. Islam, N. Tripathy

01 Jan 2004

TL;DR: A recognition scheme for isolated off-line unconstrained Malayalam handwritten numerals, based on the water-reservoir concept, is proposed; its structural features consider the morphological pattern of the numeral.

Abstract: The main problem in handwriting recognition is the huge variability and distortion of patterns. To take care of the writing variability of different individuals, a recognition scheme for isolated off-line unconstrained Malayalam handwritten numerals is proposed here. The main features used in the scheme are based on the water-reservoir concept. A reservoir is a metaphor for the cavity region of the numeral where water can be stored if water is poured from one side of the numeral. The important reservoir-based features used in the scheme are: (i) the number of reservoirs, (ii) the positions of the reservoirs with respect to the bounding box of the touching pattern, (iii) the height and width of the reservoirs, (iv) the water flow direction, etc. Topological and structural features are also used for recognition along with the water-reservoir-concept-based features. Closed-loop features (the number of closed loops and the position of the loops with respect to the bounding box of the component) are the main topological features used here. For the structural features, the morphological pattern of the numeral is considered. At present, an overall recognition accuracy of 96.34% has been obtained.
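
The closed-loop features above can be illustrated with a short sketch. It assumes a binary numeral image cropped to its bounding box and treats a closed loop as a background region (4-connected, the usual complement of 8-connected ink) that never reaches the image border. Each loop's position is reported coarsely as upper, middle or lower relative to the bounding-box midline; this three-way coding is an illustrative assumption, not the paper's exact feature.

```python
from collections import deque
from typing import Dict, List

Image = List[List[int]]  # binary numeral cropped to its bounding box: 1 = ink


def loop_features(img: Image) -> Dict[str, object]:
    """Count closed loops (background regions not touching the border)
    and report each loop's vertical position within the bounding box."""
    rows, cols = len(img), len(img[0])
    mid = (rows - 1) / 2  # bounding-box midline in row coordinates
    seen = [[False] * cols for _ in range(rows)]
    positions: List[str] = []
    for r0 in range(rows):
        for c0 in range(cols):
            if img[r0][c0] or seen[r0][c0]:
                continue
            # flood-fill one background region with 4-connectivity
            queue, region, touches_border = deque([(r0, c0)]), [], False
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                region.append((r, c))
                if r in (0, rows - 1) or c in (0, cols - 1):
                    touches_border = True
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if 0 <= nr < rows and 0 <= nc < cols and not img[nr][nc] and not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            if not touches_border:
                centroid_row = sum(r for r, _ in region) / len(region)
                if centroid_row < mid:
                    positions.append("upper")
                elif centroid_row > mid:
                    positions.append("lower")
                else:
                    positions.append("middle")
    return {"num_loops": len(positions), "positions": positions}


if __name__ == "__main__":
    eight = [[1, 1, 1], [1, 0, 1], [1, 1, 1], [1, 0, 1], [1, 1, 1]]
    nine = [[1, 1, 1], [1, 0, 1], [1, 1, 1], [0, 0, 1], [0, 0, 1]]
    print(loop_features(eight))
    print(loop_features(nine))
```

Loop position is what separates shapes with the same loop count, for example a loop in the upper half versus one in the lower half, which is why the abstract lists it alongside the count.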

3 citations