Showing papers by "Gaurav Harit published in 2001"


Proceedings Article
10 Sep 2001
TL;DR: A new model-based document image segmentation scheme that uses XML-DTDs (eXtensible Markup Language Document Type Definitions) as document models and a wavelet-based tool for separating text from non-text regions to identify the logical components of a document image.
Abstract: This paper presents a new model-based document image segmentation scheme that uses XML-DTDs (eXtensible Markup Language Document Type Definitions). Given a document image, the algorithm can select the appropriate model. A new wavelet-based tool has been designed for distinguishing text from non-text regions and characterizing font sizes. Our model-based analysis scheme uses this tool to identify the logical components of a document image.

16 citations


Journal Article
TL;DR: A novel scheme is proposed for constructing and delivering legacy documents in Indian and other languages as e-books, with hyperlinking and indexing of the logical components in the document image.
Abstract: An e-book is a book in electronic form. In this paper we review the technological issues involved in the design and construction of e-books. We also address the issues involved in designing e-books in Indian languages. We propose a novel scheme for constructing and delivering legacy documents (in Indian and other languages) as e-books, with hyperlinking and indexing of the logical components in the document image.