Showing papers on "Document layout analysis" published in 1988

Open Access

Journal Article

High level document analysis guided by geometric aspects

Andreas Dengel¹, Gerhard Barth¹•Institutions (1)

01 Dec 1988-International Journal of Pattern Recognition and Artificial Intelligence

TL;DR: This article proposes an approach to identify the layout of a document page by dividing it recursively into nested rectangular areas, and uses the procedure as a basis for a document layout model that can control an automatic interpretation mechanism for deriving a high-level representation of the contents of a document.

Abstract: The realization of the paper-free office seems to be more difficult than expected. Therefore, good paper-computer interfaces are necessary to transform paper documents into an electronic form, which allows the use of a filing and retrieval system. An electronic document page is an optically scanned and digitized representation of a printed page. Document analysis is the problem of interpreting and labeling the constituents of the document. Although there are very reliable optical character recognition (OCR) methods, the process could be very inefficient. To prune the search space and to become more efficient, some search-supporting methods have to be developed. This article proposes an approach to identify the layout of a document page by dividing it recursively into nested rectangular areas. The procedure is used as a basis for a document layout model, which is able to control an automatic interpretation mechanism for deriving a high-level representation of the contents of a document. We have implemented our method in Common Lisp on a Symbolics 3640 workstation and have run it on a large population of office documents. The results obtained have been very encouraging and have convincingly confirmed the soundness of our approach.
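The abstract does not spell out how the page is divided, so the following is only a minimal sketch of one well-known way to produce nested rectangles: a recursive cut along empty rows and columns of a binary image. The `Region` type, the `split` function, and the `min_gap` parameter are hypothetical names chosen for illustration, and the code is in Python rather than the authors' Common Lisp.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Image = List[List[int]]  # binary raster: 1 = ink, 0 = background


@dataclass
class Region:
    """A rectangle with half-open bounds and its nested sub-rectangles."""
    top: int
    left: int
    bottom: int
    right: int
    children: List["Region"] = field(default_factory=list)


def _profile(img: Image, r: Region, axis: int) -> List[int]:
    """Ink count per row (axis=0) or per column (axis=1) inside r."""
    if axis == 0:
        return [sum(img[y][r.left:r.right]) for y in range(r.top, r.bottom)]
    return [sum(img[y][x] for y in range(r.top, r.bottom))
            for x in range(r.left, r.right)]


def _runs(profile: List[int], min_gap: int) -> List[Tuple[int, int]]:
    """Inked spans (half-open), merging spans closer than min_gap."""
    spans, start = [], None
    for i, v in enumerate(profile):
        if v and start is None:
            start = i
        elif not v and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    merged: List[Tuple[int, int]] = []
    for s, e in spans:
        if merged and s - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged


def split(img: Image, region: Region, min_gap: int = 2, axis: int = 0) -> Region:
    """Recursively cut region at empty bands, alternating the cut axis."""
    for ax in (axis, 1 - axis):
        runs = _runs(_profile(img, region, ax), min_gap)
        if len(runs) > 1:
            for s, e in runs:
                if ax == 0:  # horizontal cuts: children stacked vertically
                    child = Region(region.top + s, region.left,
                                   region.top + e, region.right)
                else:        # vertical cuts: children side by side
                    child = Region(region.top, region.left + s,
                                   region.bottom, region.left + e)
                region.children.append(split(img, child, min_gap, 1 - ax))
            return region
    return region  # no cut possible in either direction: a leaf
```

Calling `split(page, Region(0, 0, height, width))` returns the root rectangle with its nested children. A larger `min_gap` keeps closely spaced lines together in one rectangle, while a smaller value cuts more finely.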

43 citations

Journal Article

Knowledge based document classification supporting integrated document handling

Helmut Eirund, Klaus Kreplin

01 Apr 1988-ACM SIGOIS Bulletin

TL;DR: An experimental office system currently being developed at Olivetti research integrates two major requirements of office work: content-based document retrieval and mail distribution. Its classification system closes the gap between electronic document entry systems and processing of (semi-)structured document content.

Abstract: An experimental office system currently being developed at Olivetti research integrates two major requirements of office work: content-based document retrieval and mail distribution. In this system, documents are described and classified by their semantic structure, which provides access to abstract concepts contained in the document. The derivation of the semantic structure of a document supports both efficient retrieval by content and intelligent mail filtering through document semantics. A knowledge-based classification system automatically generates the conceptual description of a document to be inserted into the system by means of content analysis, and associates the document with an appropriate predefined type. The classification system closes the gap between electronic document entry systems and processing of (semi-)structured document content.

34 citations

Proceedings Article

Document image analysis for generating syntactic structure description

Y. Tsuji¹•Institutions (1)

NEC¹

14 Nov 1988

TL;DR: Experimental results showed that this proposed method can be appropriately used to automatically describe an input image as a layout structure, and both the elements and their relations in the generated tree were finally determined by the bottom-up strategy, based on the general document layout property.

Abstract: A document image analysis method is described that automatically converts an input image into a syntactic document tree structure, while simultaneously representing the elements and their relative relations. Top-down image segmentation, using projection profiles, was greatly improved by systematically using a feedback process. As a result, the tree structure, including the blocks and their relative relations, was generated. Both the elements and their relations in the generated tree were finally determined by the bottom-up strategy, based on the general document layout property. Experimental results showed that this proposed method can be appropriately used to automatically describe an input image as a layout structure.
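To make the idea of segmenting with a projection profile concrete, here is a minimal sketch in Python. It sums the ink in each row and treats unusually wide empty bands as block boundaries. The abstract does not describe the feedback process, so the two-pass threshold used here (comparing each gap with the median gap width) is only a stand-in assumption, and `row_profile`, `gaps`, `blocks`, and `factor` are hypothetical names.

```python
from statistics import median
from typing import List, Tuple

Image = List[List[int]]  # binary raster: 1 = ink, 0 = background


def row_profile(img: Image) -> List[int]:
    """Horizontal projection profile: number of ink pixels in each row."""
    return [sum(row) for row in img]


def gaps(profile: List[int]) -> List[Tuple[int, int]]:
    """Half-open runs of empty rows lying strictly between inked rows."""
    out, start, seen_ink = [], None, False
    for i, v in enumerate(profile):
        if v == 0:
            if seen_ink and start is None:
                start = i
        else:
            if start is not None:
                out.append((start, i))
                start = None
            seen_ink = True
    return out  # a trailing empty margin is deliberately not reported


def blocks(img: Image, factor: float = 2.0) -> List[Tuple[int, int]]:
    """Row ranges of blocks, cut at gaps at least `factor` times the median gap.

    Assumption for illustration: the median gap approximates line spacing,
    so only clearly wider gaps separate blocks.
    """
    profile = row_profile(img)
    inked = [i for i, v in enumerate(profile) if v]
    if not inked:
        return []
    found = gaps(profile)
    cuts: List[Tuple[int, int]] = []
    if found:
        typical = median(e - s for s, e in found)
        cuts = [(s, e) for s, e in found if (e - s) >= factor * typical]
    result, start = [], inked[0]
    for s, e in cuts:
        result.append((start, s))
        start = e
    result.append((start, inked[-1] + 1))
    return result
```

With uniformly spaced lines every gap equals the median, so no cut is made and the page is reported as a single block. Applying the same function to the transposed image would give a vertical profile for column detection.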

19 citations

Proceedings Article

Model Based Segmentation And Hypothesis Generation For The Recognition Of Printed Documents

Andreas Dengel¹, Achim Luhn², Birgit Ueberreiter²•Institutions (2)

University of Stuttgart¹, Siemens²

11 Apr 1988

TL;DR: The concept of model driven segmentation allows quick focussing of the analysis on important regions of a document without necessarily requiring CPU-intensive preprocessing steps for the whole document.

Abstract: The task of document recognition requires the scanning of a paper document and the analysis of its content and structure. The resulting electronic representation has to capture the content as well as the logic and layout structure of the document. The first step in the recognition process is scanning, filtering and binarization of the paper document. Based on the preprocessing results we delineate key areas like address or signature for a letter, or the abstract for a report. This segmentation procedure uses a specific document layout model. The validity of this segmentation can be verified in a second step by using the results of more time-consuming procedures like text/graphic classification, optical character recognition (OCR) and the comparison with more elaborate models for specific document parts. Thus our concept of model driven segmentation allows quick focussing of the analysis on important regions. The segmentation is able to operate directly on the raster image of a document without necessarily requiring CPU-intensive preprocessing steps for the whole document. A test version for the analysis of simple business letters has been implemented.
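As a rough illustration of model-driven segmentation, the sketch below checks only the areas that a layout model predicts, reading ink density straight from the raster instead of processing the whole page first. The model format, the region names, and the density thresholds are hypothetical, since the abstract does not give the authors' actual model. The later verification step (text/graphic classification, OCR) is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Image = List[List[int]]  # binary raster: 1 = ink, 0 = background


@dataclass(frozen=True)
class ExpectedRegion:
    """Where the model expects a key area, as fractions of the page size."""
    top: float
    left: float
    bottom: float
    right: float
    min_density: float  # hypothetical threshold for "something is printed here"


@dataclass(frozen=True)
class Finding:
    box: Tuple[int, int, int, int]  # (top, left, bottom, right), half-open pixels
    density: float
    present: bool


def locate(img: Image, model: Dict[str, ExpectedRegion]) -> Dict[str, Finding]:
    """Form one hypothesis per modelled region by measuring ink inside it only."""
    height, width = len(img), len(img[0])
    findings = {}
    for name, r in model.items():
        top, bottom = round(r.top * height), round(r.bottom * height)
        left, right = round(r.left * width), round(r.right * width)
        area = (bottom - top) * (right - left)
        ink = sum(sum(img[y][left:right]) for y in range(top, bottom))
        density = ink / area if area else 0.0
        findings[name] = Finding((top, left, bottom, right), density,
                                 density >= r.min_density)
    return findings


# Hypothetical model for a simple business letter (values are illustrative).
LETTER_MODEL = {
    "address": ExpectedRegion(0.0, 0.0, 0.25, 0.5, 0.5),
    "signature": ExpectedRegion(0.75, 0.5, 1.0, 1.0, 0.5),
}
```

Each `Finding` is only a hypothesis: a region marked `present` would still need to be confirmed by the slower procedures the abstract mentions. The cost of `locate` grows with the area of the modelled regions, not with the size of the page.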

9 citations