Showing papers on "Document layout analysis" published in 1990

Journal Article•DOI•

Automated entry system for printed documents

T. Akiyama, Norihiro Hagita¹•Institutions (1)

01 Oct 1990-Pattern Recognition

TL;DR: Recognition experiments with a prototype system on a variety of complex printed documents show that the proposed system can read different types of printed documents with an accuracy of 94.8–97.2%.

258 citations

Patent•

Document storage and retrieval system

Hiromichi Fujisawa¹, Atsushi Hatakeyama¹, Yasuaki Nakano¹, Junichi Higashino¹, Toshihiro Hananoi¹•Institutions (1)

Hitachi¹

30 Jul 1990

TL;DR: A document storage and retrieval system stores a document body as an image, stores text information as a character code string for retrieval, executes retrieval against the text information, and then displays the related document image on a retrieval terminal according to the retrieval result.

Abstract: A document storage and retrieval system stores a document body as an image, stores text information as a character code string for retrieval, executes retrieval against the text information, and then displays the related document image on a retrieval terminal according to the retrieval result. Such a system supports retrieval over the full contents of a document and also displays the document body as an image, in its original, easy-to-read printed format.

160 citations

Proceedings Article•DOI•

Understanding multi-articled documents

S. Tsujimoto¹, H. Asada¹•Institutions (1)

Toshiba¹

16 Jun 1990

TL;DR: Experimental results on a variety of document formats have shown that the proposed method is applicable to most of the documents commonly encountered in daily use, although there is still room for further refinement of the transformation rules.

Abstract: A document understanding method based on the tree representation of document structures is proposed. It is shown that documents have an obvious hierarchical structure in their geometry which is represented by a tree. A small number of rules are introduced to transform the geometric structure into the logical structure which represents the semantics. The virtual field separator technique is employed to utilize the information carried by special constituents of documents such as field separators and frames, keeping the number of transformation rules small. Experimental results on a variety of document formats have shown that the proposed method is applicable to most of the documents commonly encountered in daily use, although there is still room for further refinement of the transformation rules.

122 citations

Proceedings Article•DOI•

Using constraints to achieve stability in automatic graph layout algorithms

Karl F. Böhringer¹, Frances Paulisch²•Institutions (2)

Cornell University¹, Karlsruhe Institute of Technology²

01 Mar 1990

TL;DR: This paper shows how user-specified layout constraints may be easily added to many automatic graph layout algorithms and allows a continuum between manual and automatic layout by allowing the user to specify how stable the graph's layout should be.

Abstract: Automatic layout algorithms are commonly used when displaying graphs on the screen because they provide a “nice” drawing of the graph without user intervention. There are, however, a couple of disadvantages to automatic layout. Without user intervention, an automatic layout algorithm is only capable of producing an aesthetically pleasing drawing of the graph. User- or application-specified layout constraints (often concerning the semantics of a graph) are difficult or impossible to specify. A second problem is that automatic layout algorithms seldom make use of information in the current layout when calculating the new layout. This can also be frustrating to the user because whenever a new layout is done, the user's orientation in the graph is lost. This paper suggests using layout constraints to solve both of these problems. We show how user-specified layout constraints may be easily added to many automatic graph layout algorithms. Additionally, the constraints specified by the current layout are used when calculating the new layout to achieve a more stable layout. This approach allows a continuum between manual and automatic layout by allowing the user to specify how stable the graph's layout should be.

92 citations

Proceedings Article•DOI•

An experimental page layout recognition system for office document automatic classification: an integrated approach for inductive generalization

Floriana Esposito, Donato Malerba, Giovanni Semeraro, E. Annese, G. Scafuro

16 Jun 1990

TL;DR: A novel approach to automatic classification of digitized office documents, based on the inductive generalization of their layout style, is presented; it is supported by the observation that for a number of printed documents it is possible to find a set of relevant and invariant layout features.

Abstract: A novel approach to automatic classification of digitized office documents, based on the inductive generalization of their layout style, is presented. It is supported by the observation that for a number of printed documents it is possible to find a set of relevant and invariant layout features. These are geometrical characteristics automatically detected through a segmentation and layout analysis process. The learning step, in which significant examples of document classes are used to train the classification system, involves the novel idea of integrating parametric (numerical) and conceptual (symbolic) learning methods.

75 citations

Patent•

System for automatically processing a document including text and associated image information

Isamu Iwai¹, Miwako Doi¹, Mika Fukui¹•Institutions (1)

Toshiba¹

29 Mar 1990

TL;DR: A document processing system includes an input section, a memory section, a text analyzing section, an image identifying section, an image size identifying section, a layout processing section, and an output section.

Abstract: A document processing system includes an input section, a memory section, a text analyzing section, an image identifying section, an image size identifying section, a layout processing section, and an output section. Document data is constituted by text data and image data. The text data includes key information corresponding to the image data, and the image data is laid out in the document data. The text data and image data input through the input section are stored in the memory section. The text analyzing section identifies a position in the document data at which the image data is to be laid out, based on a position of key information in the text data. The image identifying section identifies image data corresponding to the key information. The image size identifying section identifies an image size of the image data identified by the image identifying section. The layout processing section lays out the identified image data at the identified image layout position in accordance with a predetermined layout rule.

41 citations

Proceedings Article•

Techniques for Line Drawing Interpretation: An Overview

Rangachar Kasturi, Senthil Siva, Lawrence O'Gorman

01 Jan 1990

TL;DR: An overview of techniques for document image analysis is presented, with an emphasis on those for graphics recognition and interpretation; the techniques are derived from the fields of image processing, pattern recognition, and machine vision.

Abstract: An overview is presented of algorithms and techniques for document image analysis, with an emphasis on those for graphics recognition and interpretation. The techniques are derived from the fields of image processing, pattern recognition, and machine vision. The objective in document image analysis is to recognize page contents, including layout, text, and figures. Although optical character recognition (OCR) falls within the context of document image analysis, we do not cover this area, since OCR techniques have been covered extensively in the literature. We also limit the focus to images containing binary information. Topics covered are segmentation of document images into text and graphics regions, vectorization to obtain lines, identification of graphical primitives, and generation of succinct image interpretations.

29 citations

Proceedings Article•DOI•

Segmentation of document images

Philip J. Bones¹, Todd C. Griffin¹, Chris Carey-Smith¹•Institutions (1)

University of Canterbury¹

01 Aug 1990

TL;DR: Applications foreseen for the image segmentation include modified facsimile systems, achievement of artifact-free OCR and conversion of document images into files with separate formats for text, graphics and pictures.

Abstract: Document scanning is now an accepted part of office procedure, allowing the incorporation of digitized images into new documents and the conversion of scanned print into ASCII by optical character recognition (OCR). Often document pages contain more than one form of information: textual, graphical and/or pictorial. Segmentation of document images into these three categories is feasible with the aid of image processing. Projections of the thresholded document images in conjunction with autocorrelation are used to check text alignment. Then the edge-shifting properties of the rank filter are used to coalesce image regions containing text into solid near-rectangular blocks. Pyramidal reduction is combined with the filtering to ease the computational burden. Horizontal and vertical projections are used to segment whole pages recursively into homogeneous blocks whose properties are then analysed. Applications foreseen for the image segmentation include modified facsimile systems, achievement of artifact-free OCR and conversion of document images into files with separate formats for text, graphics and pictures.

20 citations

Patent•

Method and apparatus for document formatting

Mika Fukui¹, Isamu Iwai¹, Koji Yamaguchi¹, Miwako Doi¹•Institutions (1)

Toshiba¹

07 Dec 1990

TL;DR: A method and an apparatus for document formatting are disclosed that can reflect the preference of the operator and the overall balance, so that the desired formatting can be obtained efficiently without tedious post-processing operations.

Abstract: A method and an apparatus for document formatting are disclosed that can reflect the preference of the operator and the overall balance, so that the desired formatting can be obtained efficiently without tedious post-processing operations. In the apparatus, document data representing the document, including figure data representing figure elements of the document, and region data indicating the layout region in which the document is to be laid out are input; candidate layouts for each figure element to be laid out are generated; one of the generated candidate layouts is selected; and the document is formatted in the layout region according to the selected candidate layout.

20 citations

Proceedings Article•DOI•

Document image processing based on enhanced border following algorithm

M. Yamada, K. Hasuike

16 Jun 1990

TL;DR: An enhanced border-following algorithm and its application to document image processing are presented; various kinds of components in a document image can be flexibly segmented and extracted with a variable-size mask for border following instead of the conventional 3×3 mask.

Abstract: An enhanced border-following algorithm and its application to document image processing are presented. Various kinds of components (characters, text lines, text blocks, figures, tables, etc.) in a document image can be flexibly segmented and extracted with a variable-size mask for border following instead of the conventional 3×3 mask. An automatic document image structuring process to construct a multimedia document and a raster/geometric conversion method for the segmented graphic parts of the image, such as diagrams and tables, are discussed.

19 citations

Journal Article•

Model Based Understanding of Document Images.

Koichi Kise, Ken'ichi Momota, Masaki Yamaoka, Jun'ichi Sugiyama, Noboru Babaguchi, Yoshikazu Tezuka

01 Jan 1990-Journal of Machine Vision and Applications

TL;DR: This paper proposes a new method of document image understanding which employs a domain-specific knowledge base called a document model, and introduces a strategy of hypothesis generation and testing.

Abstract: Document image understanding is the task of generating a structured description of the contents of a document. In this paper, we propose a new method of document image understanding which employs a domain-specific knowledge base called a document model. A document model is a structural representation of constraints on the layout structure as well as the logical structure of a target document. Since variation in the structure can be described in the document model, intermediate results of understanding generally include multiple candidates. In order to generate a plausible description from such candidates, we introduce a strategy of hypothesis generation and testing. Experiments on 100 visiting cards demonstrate the effectiveness of our method.

Journal Article•

Recognition of Document Structure on the Basis of Spatial and Geometric Relationships between Document Items.

Qin Luo, Toyohide Watanabe, Yuuji Yoshida, Yasuyoshi Inagaki

01 Jan 1990-Journal of Machine Vision and Applications

TL;DR: The basic idea in the method is to utilize the spatial and geometric relationships between document items to extract and classify the meaningful information from documents automatically.

Abstract: This paper introduces a new method to extract and classify meaningful information from documents automatically. The basic idea of our method is to utilize the spatial and geometric relationships between document items. Our approach remains applicable even if the layout structure is modified to some extent, because the coordinate values of positions, sizes, lengths, and so on are not specified directly. Additionally, experiments on typical documents such as library cataloging cards, name cards, and letters are presented.

Patent•

Document logical structure generating method

Hiromichi Fujisawa, Masashi Koga, Tatsuya Murakami, Yoshihiro Shima

09 Nov 1990

TL;DR: In this article, the authors propose to convert a text file which is represented with linear character strings into a hierarchical tree structure by analyzing index character strings corresponding to the chapters, paragraphs, and clauses in the main body of a document and automatically generating the tree-shaped logical structure.

Abstract: PURPOSE: To convert a text file which is represented with linear character strings into a hierarchical tree structure by analyzing index character strings corresponding to the chapters, paragraphs, and clauses in the main body of a document and automatically generating the tree-shaped logical structure. CONSTITUTION: A document read part 101 recognizes the characters of inputted document image data, and the recognized document data are stored, document by document, in a document data storage part 103. An index symbol analytic part 102 extracts index symbols and generates the logical structures of the documents from the meaning of the index symbols, and the generated logical structures are stored in the logical structure data storage part 104. A display control part 105 displays the logical structure of a document on a terminal device 106 with a screen according to the stored logical structure data. Consequently, the document file which is represented with linear character strings can be converted into the hierarchical tree structure.
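
As a rough illustration of turning linear text into a tree from its index strings, the sketch below assumes that index symbols are dotted section numbers such as "1", "1.2" or "2.1.3" at the start of a line. That convention, the regular expression, and the names `Node` and `build_tree` are assumptions made for this example; the patent abstract does not specify which index symbols are analysed or how.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, List

# Assumed index format: dotted numbers followed by a heading, e.g. "1.2 Scope".
INDEX_RE = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+(.*)$")


@dataclass
class Node:
    index: str
    title: str
    children: List["Node"] = field(default_factory=list)


def build_tree(lines: Iterable[str]) -> Node:
    """Build a logical-structure tree from lines that start with an index string."""
    root = Node("", "document")
    stack = [(0, root)]  # (depth, node) path from the root to the current heading
    for line in lines:
        match = INDEX_RE.match(line.strip())
        if not match:
            continue  # body text is ignored in this sketch
        index, title = match.group(1), match.group(2)
        depth = index.count(".") + 1  # "1" -> 1, "1.2" -> 2, ...
        while stack[-1][0] >= depth:
            stack.pop()  # climb back up to the parent level
        node = Node(index, title)
        stack[-1][1].children.append(node)
        stack.append((depth, node))
    return root
```

The depth of each heading is read directly from its index string, so the tree follows the numbering rather than the order or indentation of the text. A body line that happens to start with a number would be misread as a heading here; a real system would need stronger cues.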

Journal Article•

Document recognition system with layout structure generator

Yoshitake Tsuji, Hiroyuki Kami, Masaaki Mizuno, Toshiyuki Tanaka, Haruhiko Tanaka, Masao Iwashita, Tsutomu Temma

01 Jan 1990-Journal of Machine Vision and Applications

TL;DR: The authors have developed a document image structure analysis method to generate a layout structure, as well as to detect such document elements as characters, pictures and figures.

Abstract: A document input system with a character recognition technique is used for converting printed matter, such as books and magazines, into code-format information. In order to improve this document input system's performance, an appropriate document structure analysis technique is indispensable. When storing data from general printed documents into a database, it is necessary to represent the document structure. Therefore, a document layout structure generation method is especially important. For this purpose, the authors have developed a document image structure analysis method to generate a layout structure, as well as to detect such document elements as characters, pictures and figures. This method was developed on a personal computer. Its usability is described in this paper.

Journal Article•DOI•

Logical and layout structures of documents

Jürgen Eickel

01 Nov 1990-Computer Physics Communications

TL;DR: More flexible, interactive formatting editors for structured document preparation presuppose a strong distinction between logical and layout structure and incorporate a formal description of the mapping by which the layout is derived from the logical structure.

Patent•

Utilization of a presentation document structure for interchange.

Barbara Ann Barker¹, Thomas R. Edel¹, Jeffrey A. Stark¹•Institutions (1)

IBM¹

03 Jan 1990

TL;DR: In this paper, a general layout structure of a document is used to optimize its processing by identifying the possible layout presentation constructs appearing in the subsequent specific instance of the conforming document.

Abstract: A method is disclosed for utilizing a general layout structure of a document which contains relationships within its layout constructs that offer choices when creating the document and conforming instances of logical elements with the general layout structure, taking into account specific device characteristics, to generate the final-form document. The relationships are defined as expressions similar to those existing in general logical document structure definitions. Thus, an intermediate phase of document interchange between revision and final-form is introduced which saves data transmission time and gives the receiver some flexibility in presentation options while still conforming to a general layout definition. Further, the general layout definition may be used by a receiver to optimize its processing by identifying the possible layout presentation constructs appearing in the subsequent specific instance of the conforming document.

Patent•

Document formatting apparatus

Isamu Iwai¹, Koji Yamaguchi¹, Mika Fukui¹•Institutions (1)

Toshiba¹

13 Dec 1990

TL;DR: A document formatting apparatus automatically arranges input document data to match a pre-formatted document by extracting the layout structure, including character properties and logical properties of each text data item, from the pre-formatted document and applying the matching character properties to the input document.

Abstract: A document formatting apparatus automatically arranges input document data so as to match a pre-formatted document. Firstly, the layout structure, including character properties and logical properties of each text data item, is extracted from a pre-formatted document. Secondly, logical properties of each text data item are extracted from an input document. Thirdly, each of the logical properties of the input document is compared with the corresponding logical properties of the pre-formatted document. When the logical properties of input text data match logical properties of the pre-formatted document, the corresponding character properties of the pre-formatted document are applied to the input text data. Therefore, each text data item of the input document is automatically arranged in accordance with the preset layout structure and corresponding character properties.
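
The three steps in this abstract amount to a lookup from logical property to character properties. The sketch below shows that matching step only, under the assumption that both documents have already been reduced to lists of items with a `logical` label. The dictionary keys (`logical`, `style`, `text`) and the function name `apply_template` are hypothetical, and how the patent extracts logical properties is not stated in the abstract.

```python
from typing import Any, Dict, List

Item = Dict[str, Any]


def apply_template(template_items: List[Item], input_items: List[Item]) -> List[Item]:
    """Copy character properties from a pre-formatted document onto input items."""
    # Step 1: record the character properties used for each logical property
    # in the pre-formatted document (the first occurrence wins).
    styles: Dict[str, Any] = {}
    for item in template_items:
        styles.setdefault(item["logical"], item["style"])

    # Steps 2 and 3: for each input item, compare its logical property with the
    # template and apply the matching character properties (None if no match).
    return [{**item, "style": styles.get(item["logical"])} for item in input_items]
```

For example, an input item labelled `title` would receive the font and size recorded for the template's `title` item, while an item whose label does not occur in the template is left with `style` set to `None` in this sketch.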

DOI•

A step towards understanding paper documents

Andreas Dengel

01 Jan 1990

TL;DR: A knowledge-based approach, developed for the identification of logical objects in a document image, is described, which has been implemented for the analysis of single-sided business letters in Common Lisp on a SUN 3/60 Workstation.

Abstract: This report focuses on the analysis steps necessary for paper document processing. It is divided into three major parts: document image preprocessing, knowledge-based geometric classification of the image, and expectation-driven text recognition. It first illustrates several low-level image processing procedures that provide the physical document structure of a scanned document image. Furthermore, it describes a knowledge-based approach, developed for the identification of logical objects (e.g., the sender or the footnote of a letter) in a document image. The logical identifiers provide a context-restricted consideration of the contained text. Using specific logical dictionaries, expectation-driven text recognition can identify text parts of specific interest. The system has been implemented for the analysis of single-sided business letters in Common Lisp on a SUN 3/60 Workstation. It runs on a large population of different letters. The report also illustrates and discusses examples of typical results obtained by the system.
