Proceedings ArticleDOI

Syntactic and Semantic Labeling of Hierarchically Organized Document Image Components of Indian Scripts

04 Feb 2009 - pp. 314-317

TL;DR: A document image analysis system which performs segmentation, content characterization, and semantic labeling of components; promising results are obtained for semantic segmentation of over 30 categories of documents in Indian scripts.

Abstract: In this paper we describe our document image analysis system, which performs segmentation, content characterization, and semantic labeling of components. Segmentation is done using white spaces and yields the segmented components arranged in a hierarchy. Semantic labeling is done using domain knowledge, specified where possible in the form of a document model applicable to a class of documents. The novelty of the system lies in the suite of methods it employs, which are capable of handling documents in Indian scripts. We have obtained promising results for semantic segmentation of over 30 categories of documents in Indian scripts.
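The white-space segmentation step described above can be pictured with a small sketch. This is not the authors' code: the recursion strategy, the `min_gap` threshold, and the node representation are all hypothetical, but it shows how splitting a binary page at its widest white gap naturally produces a hierarchy of components.

```python
import numpy as np

def widest_gap(profile, min_gap):
    """Widest run of zeros (white space) in a 1-D projection profile,
    returned as (start, end), or None if no run is at least min_gap long."""
    best, start = None, None
    for i, v in enumerate(list(profile) + [1]):     # sentinel closes a trailing run
        if v == 0:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_gap:
                if best is None or i - start > best[1] - best[0]:
                    best = (start, i)
            start = None
    return best

def split_tree(img, y0=0, y1=None, x0=0, x1=None, min_gap=5):
    """Recursively split a binary page image (1 = ink) at the widest white
    gap, trying horizontal cuts before vertical ones; returns a hierarchy of
    {"box": (y0, x0, y1, x1), "children": [...]} nodes."""
    y1 = img.shape[0] if y1 is None else y1
    x1 = img.shape[1] if x1 is None else x1
    region = img[y0:y1, x0:x1]
    if region.sum() > 0:
        for axis in (0, 1):                         # 0: horizontal cut, 1: vertical cut
            gap = widest_gap(region.sum(axis=1 - axis), min_gap)
            if gap is not None:
                cut = (gap[0] + gap[1]) // 2        # cut through the middle of the gap
                if axis == 0:
                    kids = [split_tree(img, y0, y0 + cut, x0, x1, min_gap),
                            split_tree(img, y0 + cut, y1, x0, x1, min_gap)]
                else:
                    kids = [split_tree(img, y0, y1, x0, x0 + cut, min_gap),
                            split_tree(img, y0, y1, x0 + cut, x1, min_gap)]
                return {"box": (y0, x0, y1, x1), "children": kids}
    return {"box": (y0, x0, y1, x1), "children": []}
```

Unlike XY-cut, the paper's decomposed blocks need not be rectangular; this sketch keeps rectangles purely for brevity.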



Citations
Proceedings ArticleDOI
24 Aug 2013
TL;DR: A novel framework is proposed for learning optimal parameters for text-graphic separation in the presence of the complex layouts of Indian newspapers.
Abstract: Digitization of newspaper articles is important for registering historical events. Layout analysis of Indian newspapers is a challenging task due to the presence of different font sizes, font styles, and the random placement of text and non-text regions. In this paper we propose a novel framework for learning optimal parameters for text-graphic separation in the presence of complex layouts. The learning problem is formulated as an optimization problem, using the EM algorithm to learn optimal parameters depending on the nature of the document content.

3 citations


Cites methods from "Syntactic and Semantic Labeling of ..."

  • ...The method used for evaluating the performance of our algorithm is based on counting the number of matches between the pixels segmented by the algorithm and the pixels in the ground truth [11]....

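The pixel-counting evaluation mentioned in the snippet above can be sketched as follows. This is an illustrative reading, not the paper's exact protocol: it simply scores a predicted segmentation mask against a ground-truth mask by counting matching foreground pixels.

```python
import numpy as np

def pixel_match_scores(pred, gt):
    """Pixel-level precision, recall, and F1 between a predicted
    segmentation mask and a ground-truth mask (same-shape binary arrays),
    counting matching foreground pixels."""
    pred = np.asarray(pred).astype(bool)
    gt = np.asarray(gt).astype(bool)
    tp = np.logical_and(pred, gt).sum()          # correctly segmented pixels
    precision = tp / pred.sum() if pred.sum() else 0.0
    recall = tp / gt.sum() if gt.sum() else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return float(precision), float(recall), float(f1)
```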

Proceedings ArticleDOI
01 May 2014
TL;DR: A model is proposed which labels the different components of a printed document image, i.e., identifies headings, subheadings, captions, articles, and photos, and it gives promising results on printed documents of different scripts.
Abstract: A document image contains text and non-text regions; it may be printed, handwritten, or a hybrid of both. In this paper we deal with printed documents, where the textual regions consist of printed characters and the non-text regions are mainly photo images. We propose a model which labels the different components of a printed document image, i.e., identifies headings, subheadings, captions, articles, and photos. Our method consists of a preprocessing stage in which fuzzy c-means clustering is used to segment the document image into printed (object) regions and background. The Hough transform is then used to find white-line dividers of the object region, and grid-structure examination is used to extract the non-text portion. After that, we use a horizontal histogram to find text lines and then label the different components. Our method gives promising results on printed documents of different scripts.
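The horizontal-histogram step for finding text lines can be sketched as below. This is an illustration of the general technique, not the authors' implementation; the `thresh` parameter is hypothetical.

```python
import numpy as np

def text_lines(binary, thresh=0):
    """Locate text lines as maximal runs of rows whose ink count exceeds
    `thresh` in the horizontal projection histogram; returns (top, bottom)
    row ranges, bottom exclusive."""
    profile = (binary > 0).sum(axis=1)           # ink pixels per row
    lines, start = [], None
    for i, v in enumerate(list(profile) + [0]):  # sentinel closes a trailing run
        if v > thresh and start is None:
            start = i
        elif v <= thresh and start is not None:
            lines.append((start, i))
            start = None
    return lines
```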

References
Journal ArticleDOI
TL;DR: The requirements and components for a proposed Document Analysis System, which assists a user in encoding printed documents for computer processing, are outlined; several critical functions have been investigated and the technical approaches are discussed.
Abstract: This paper outlines the requirements and components for a proposed Document Analysis System, which assists a user in encoding printed documents for computer processing. Several critical functions have been investigated and the technical approaches are discussed. The first is the segmentation and classification of digitized printed documents into regions of text and images. A nonlinear, run-length smoothing algorithm has been used for this purpose. By using the regular features of text lines, a linear adaptive classification scheme discriminates text regions from others. The second technique studied is an adaptive approach to the recognition of the hundreds of font styles and sizes that can occur on printed documents. A preclassifier is constructed during the input process and used to speed up a well-known pattern-matching method for clustering characters from an arbitrary print source into a small sample of prototypes. Experimental results are included.
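The run-length smoothing idea described above can be given as a minimal sketch. It is illustrative only: the threshold values are hypothetical, and the original (nonlinear) algorithm combines tuned horizontal and vertical smearing with a logical AND to form candidate blocks.

```python
import numpy as np

def rlsa_1d(row, threshold):
    """Fill interior background runs (0s) shorter than `threshold` with
    foreground (1s); leading and trailing runs are left untouched."""
    out = row.copy()
    n = len(out)
    i = 0
    while i < n:
        if out[i] == 0:
            j = i
            while j < n and out[j] == 0:
                j += 1
            if 0 < i and j < n and j - i < threshold:   # interior short gap
                out[i:j] = 1
            i = j
        else:
            i += 1
    return out

def rlsa(img, h_thresh=30, v_thresh=50):
    """Run-length smoothing: smear each row and each column, then AND the
    two results so only mutually agreed regions survive (1 = ink)."""
    horiz = np.array([rlsa_1d(r, h_thresh) for r in img])
    vert = np.array([rlsa_1d(c, v_thresh) for c in img.T]).T
    return horiz & vert
```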

701 citations


"Syntactic and Semantic Labeling of ..." refers to methods in this paper

  • ...Well known methods are XY-cut [10], the smearing algorithm [15], white space analysis [2], Docstrum [11], the Voronoi-diagram based approach [5], and other variants like [6], [8], [13]....


Book
01 Jan 1995
Lawrence O'Gorman
TL;DR: The document spectrum (or docstrum), which is a method for structural page layout analysis based on bottom-up, nearest-neighbor clustering of page components, yields an accurate measure of skew, within-line, and between-line spacings and locates text lines and text blocks.
Abstract: Page layout analysis is a document processing technique used to determine the format of a page. This paper describes the document spectrum (or docstrum), which is a method for structural page layout analysis based on bottom-up, nearest-neighbor clustering of page components. The method yields an accurate measure of skew, within-line, and between-line spacings and locates text lines and text blocks. It is advantageous over many other methods in three main ways: independence from skew angle, independence from different text spacings, and the ability to process local regions of different text orientations within the same image. Results of the method shown for several different page formats and for randomly oriented subpages on the same image illustrate the versatility of the method. We also discuss the differences, advantages, and disadvantages of the docstrum with respect to other layout methods.
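The docstrum's nearest-neighbour statistics can be sketched roughly as below. This is a simplified illustration, not O'Gorman's implementation: the original estimates skew from the peak of an angle histogram and reads the two spacing estimates off peaks in the distance histogram, whereas this sketch just takes a median.

```python
import numpy as np

def docstrum_stats(centroids, k=5):
    """Docstrum-style statistics: for each connected-component centroid,
    take its k nearest neighbours and record the neighbour angle and
    distance.  The dominant angle estimates page skew; peaks in the
    distance distribution estimate within-line and between-line spacing."""
    pts = np.asarray(centroids, dtype=float)
    angles, dists = [], []
    for i, p in enumerate(pts):
        d = np.linalg.norm(pts - p, axis=1)
        d[i] = np.inf                          # exclude the point itself
        for j in np.argsort(d)[:min(k, len(pts) - 1)]:
            dx, dy = pts[j] - p
            ang = np.degrees(np.arctan2(dy, dx))
            ang = (ang + 90.0) % 180.0 - 90.0  # fold direction into [-90, 90)
            angles.append(ang)
            dists.append(d[j])
    skew = float(np.median(angles))            # crude stand-in for the
                                               # histogram-peak estimate
    return skew, np.array(dists)
```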

628 citations


"Syntactic and Semantic Labeling of ..." refers to methods in this paper

  • ...Well known methods are XY-cut [10], the smearing algorithm [15], white space analysis [2], Docstrum [11], the Voronoi-diagram based approach [5], and other variants like [6], [8], [13]....



Journal ArticleDOI
TL;DR: The document image acquisition process and the knowledge base that must be entered into the system to process a family of page images are described, and the process by which the X-Y tree data structure converts a 2-D page-segmentation problem into a series of 1-D string-parsing problems that can be tackled using conventional compiler tools.
Abstract: Gobbledoc, a system providing remote access to stored documents, which is based on syntactic document analysis and optical character recognition (OCR), is discussed. In Gobbledoc, image processing, document analysis, and OCR operations take place in batch mode when the documents are acquired. The document image acquisition process and the knowledge base that must be entered into the system to process a family of page images are described. The process by which the X-Y tree data structure converts a 2-D page-segmentation problem into a series of 1-D string-parsing problems that can be tackled using conventional compiler tools is also described. Syntactic analysis is used in Gobbledoc to divide each page into labeled rectangular blocks. Blocks labeled text are converted by OCR to obtain a secondary (ASCII) document representation. Since such symbolic files are better suited for computerized search than for human access to the document content and because too many visual layout clues are lost in the OCR process (including some special characters), Gobbledoc preserves the original block images for human browsing. Storage, networking, and display issues specific to document images are also discussed.
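The X-Y-tree idea of reducing 2-D layout labeling to 1-D string parsing can be illustrated with a toy grammar. The symbols and the grammar below are entirely hypothetical, not Gobbledoc's actual rules: each block in a top-to-bottom cut sequence is encoded as one character, and a regular expression plays the role of the compiler-style parser.

```python
import re

# Each block type maps to one symbol so a vertical block sequence becomes a string.
BLOCK_SYMBOLS = {"title": "T", "author": "A", "image": "I", "text": "X"}

# Toy page grammar: a title, an optional author line, then one or more text
# blocks, each optionally preceded by an image.
PAGE_GRAMMAR = re.compile(r"^TA?(X|IX)+$")

def is_valid_page(block_labels):
    """Check a top-to-bottom block-label sequence against the page grammar."""
    s = "".join(BLOCK_SYMBOLS[b] for b in block_labels)
    return bool(PAGE_GRAMMAR.match(s))
```

In the real system each level of the X-Y tree gets its own 1-D grammar, so nested cuts are parsed recursively rather than by a single flat pattern.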

456 citations


"Syntactic and Semantic Labeling of ..." refers to methods in this paper

  • ...Our method differs from XY-cut since the decomposed blocks need not be rectangles but can be general polygons....


  • ...Recursive XY-cut produces a hierarchical organization of document components....


  • ...Well known methods are XY-cut [10], the smearing algorithm [15], white space analysis [2], Docstrum [11], the Voronoi-diagram based approach [5], and other variants like [6], [8], [13]....


  • ...The output of the recursive procedure is a hierarchical arrangement of the segmented blocks, as can be seen in Fig 2. It is worthwhile to compare our approach with the XY-cut algorithm, which also makes use of white spaces....


Journal ArticleDOI
TL;DR: It is confirmed that the proposed method of page segmentation based on the approximated area Voronoi diagram is effective for extraction of body text regions, and it is as efficient as other methods based on connected component analysis.
Abstract: This paper presents a method of page segmentation based on the approximated area Voronoi diagram. The characteristics of the proposed method are as follows: (1) The Voronoi diagram enables us to obtain the candidates of boundaries of document components from page images with non-Manhattan layout and a skew. (2) The candidates are utilized to estimate the intercharacter and interline gaps without the use of domain-specific parameters to select the boundaries. From the experimental results for 128 images with non-Manhattan layout and the skew of 0°~45° as well as 98 images with Manhattan layout, we have confirmed that the method is effective for extraction of body text regions, and it is as efficient as other methods based on connected component analysis.
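The grouping effect of pruning area-Voronoi edges can be approximated with a simple sketch. This is illustrative only: a distance-threshold union-find over component centroids stands in for the Voronoi analysis, and `gap_thresh` is a hypothetical hand-set parameter, whereas the paper's point is precisely that the gap thresholds are estimated from the diagram without domain-specific parameters.

```python
import numpy as np

def group_components(centroids, gap_thresh):
    """Merge components whose centroids lie within `gap_thresh` of each
    other (a crude stand-in for keeping only short area-Voronoi edges);
    returns one group label per centroid."""
    pts = np.asarray(centroids, dtype=float)
    n = len(pts)
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    for i in range(n):                      # O(n^2) pair scan; fine for a sketch
        for j in range(i + 1, n):
            if np.linalg.norm(pts[i] - pts[j]) <= gap_thresh:
                parent[find(i)] = find(j)
    return [find(i) for i in range(n)]
```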

275 citations


"Syntactic and Semantic Labeling of ..." refers to methods in this paper

  • ...Well known methods are XY-cut [10], the smearing algorithm [15], white space analysis [2], Docstrum [11], the Voronoi-diagram based approach [5], and other variants like [6], [8], [13]....
