Showing papers on "Document layout analysis" published in 1993


Journal ArticleDOI
Lawrence O'Gorman
TL;DR: The document spectrum (or docstrum) as discussed by the authors is a method for structural page layout analysis based on bottom-up, nearest-neighbor clustering of page components, which yields an accurate measure of skew, within-line, and between-line spacings and locates text lines and text blocks.
Abstract: Page layout analysis is a document processing technique used to determine the format of a page. This paper describes the document spectrum (or docstrum), which is a method for structural page layout analysis based on bottom-up, nearest-neighbor clustering of page components. The method yields an accurate measure of skew, within-line, and between-line spacings and locates text lines and text blocks. It is advantageous over many other methods in three main ways: independence from skew angle, independence from different text spacings, and the ability to process local regions of different text orientations within the same image. Results of the method shown for several different page formats and for randomly oriented subpages on the same image illustrate the versatility of the method. We also discuss the differences, advantages, and disadvantages of the docstrum with respect to other layout methods.

654 citations
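
The nearest-neighbor idea in the abstract above can be illustrated with a small sketch. This is not the paper's actual procedure: it assumes each page component has already been reduced to a centroid, and the function names, the `k` default, the 1-degree histogram bins, and the angle tolerance are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def nearest_neighbor_pairs(centroids, k=5):
    """Return (distance, angle_degrees) for each centroid's k nearest neighbors."""
    pairs = []
    for i, (x1, y1) in enumerate(centroids):
        neighbors = sorted(
            (math.hypot(x2 - x1, y2 - y1), j)
            for j, (x2, y2) in enumerate(centroids)
            if j != i
        )[:k]
        for dist, j in neighbors:
            x2, y2 = centroids[j]
            angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
            # Fold into [-90, 90) so a pair gives the same angle in both directions.
            if angle >= 90:
                angle -= 180
            elif angle < -90:
                angle += 180
            pairs.append((dist, angle))
    return pairs


def estimate_skew(pairs, bin_width=1.0):
    """Take the most populated bin of the angle histogram as the skew (assumed rule)."""
    bins = Counter(round(angle / bin_width) for _, angle in pairs)
    peak_bin, _ = bins.most_common(1)[0]
    return peak_bin * bin_width


def within_line_spacing(pairs, skew, tolerance=10.0):
    """Median distance of pairs roughly parallel to the skew direction.

    Simplification: angles that wrap around +/-90 degrees are not handled.
    """
    dists = sorted(d for d, a in pairs if abs(a - skew) <= tolerance)
    return dists[len(dists) // 2]
```

Calling `estimate_skew(nearest_neighbor_pairs(centroids))` on the centroids of a page's connected components gives a rough skew angle in the coordinate system of the centroids. Between-line spacing could be read off in the same way from pairs roughly perpendicular to the skew.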


Patent
23 Sep 1993
TL;DR: A method and apparatus for formatting a document and creating a best document layout from an input list of picture and text objects is disclosed; the method calculates multiple document layouts while maintaining the correct reading order of the picture and text objects at all times.
Abstract: A method and apparatus for formatting a document and creating a best document layout from an input list of picture and text objects is disclosed. The method includes calculating multiple document layouts while maintaining the correct reading order of the picture and text objects at all times. The method positions each picture and text object at multiple anchor points to create multiple document layouts, and then selects a best document layout which is the layout using the least number of pages to display the entire list of objects. If more than one layout uses the least number of pages, the layout positioning the least number of objects on the last page is the best layout.

100 citations
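
The selection rule stated in the abstract above (fewest pages, then fewest objects on the last page) can be sketched as follows. The data representation is an assumption made for illustration: a layout is a non-empty list of pages and each page is a list of objects. Generating the candidate layouts from anchor points is not shown, and `select_best_layout` is a hypothetical name.

```python
def select_best_layout(layouts):
    """Pick the layout with the fewest pages; break ties by the
    fewest objects on the last page.

    Each layout is assumed to be a non-empty list of pages, and each
    page a list of picture/text objects in reading order.
    """
    return min(layouts, key=lambda pages: (len(pages), len(pages[-1])))


# Three candidate layouts of the same objects "a", "b", "c".
candidates = [
    [["a"], ["b"], ["c"]],   # 3 pages
    [["a"], ["b", "c"]],     # 2 pages, 2 objects on the last page
    [["a", "b"], ["c"]],     # 2 pages, 1 object on the last page
]
best = select_best_layout(candidates)
```

Using a tuple as the sort key keeps the tie-break in one place: the second element is consulted only when the page counts are equal.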


Proceedings ArticleDOI
20 Oct 1993
TL;DR: A syntactic approach to deducing the logical structure of printed documents from their physical layout is described: page layout is described by a two-dimensional grammar, similar to a context-free string grammar, and a chart parser is used to parse segmented page images according to the grammar.
Abstract: Describes a syntactic approach to deducing the logical structure of printed documents from their physical layout. Page layout is described by a two-dimensional grammar, similar to a context-free string grammar, and a chart parser is used to parse segmented page images according to the grammar. This process is part of a system which reads scanned document images and produces computer-readable text in a logical mark-up format such as SGML. The system is briefly outlined, the grammar formalism and the parsing algorithm are described in detail, and some experimental results are reported.

68 citations


Patent
24 Aug 1993
TL;DR: A document storage and retrieval system is provided with means for storing a document body in the form of an image, means for storing text information in the form of a character code string for retrieval, means for executing a retrieval with reference to the text information, and means for displaying the related document image on a retrieval terminal according to the retrieval result.
Abstract: A document storage and retrieval system is provided with means for storing a document body in the form of an image, means for storing text information in the form of a character code string for retrieval, means for executing a retrieval with reference to the text information, and means for displaying the related document image on a retrieval terminal according to the retrieval result. Such a system can retrieve the full contents of a document and can also display the document body as an image, printed in an easy-to-read format. Accordingly, users can retrieve documents with arbitrary words and can also read through a terminal, in image form and the same as on paper, even a document complicated by mathematical expressions and charts. Further, the invention provides a system wherein the text information for retrieval is extracted automatically from the document image through character recognition. Since the precision of character recognition has not been satisfactory hitherto, visual review and correction have always been carried out by operators. According to the invention, however, operators need not attend to this. Thus, the text information for retrieval can be generated at a practical cost in time and money even for large volumes of documents.

55 citations


Journal ArticleDOI
01 Mar 1993
TL;DR: Experimental methods of recognizing the document structures of various types of documents in the framework of document understanding are described; the methods are effective under the knowledge-based framework and complementarily integrate the top-down (model-driven) and bottom-up (data-driven) approaches.
Abstract: In this paper, we describe experimental methods of recognizing the document structures of various types of documents in the framework of document understanding. Namely, we interpret document structures with individually characterized document knowledge. The document understanding process is divided into three procedures: the first is the recognition of document structures from a two-dimensional point of view; the second is the recognition of item relationships from a one-dimensional point of view; and the third is the recognition of characters from a zero-dimensional point of view. The procedure for recognizing structures plays the most important role in document understanding. This procedure extracts and classifies the logical item blocks from paper-based documents distinctly.

53 citations


Patent
Eisaku Nakatani
11 Mar 1993
TL;DR: Document data stored in a document storage area are extracted line by line to analyze the structure of the document data, and the document layout information is extracted from the analysis result.
Abstract: Document data stored in a document storage area are extracted line by line to analyze the structure of the document data. The document layout information is extracted from the analysis result. The extracted layout information is stored, as learning data, in a document layout information learning area. In format conversion, the document data to be output, which is extracted in the same manner as described above, is converted on the basis of the learning data. Document data having a consistent layout is output to a CRT or a printer in accordance with the converted layout information.

38 citations


Proceedings ArticleDOI
20 Oct 1993
TL;DR: A layout editor is proposed, which allows the interactive generation of a layout query using object-oriented drawing functions; the layout search can be done in a relational database, which was filled with the help of layout recognition methods.
Abstract: Document image archives are increasingly used to replace paper and microfilm filing. Usually those archives are combined with a database management system or with full text retrieval to search the documents. An additional retrieval method to search already known images in personal document image archives using layout knowledge is presented. This knowledge can be the size, position, and color of layout objects, but also the position of keywords. A layout editor is proposed, which allows the interactive generation of a layout query using object-oriented drawing functions. The layout search can be done in a relational database, which was filled with the help of layout recognition methods. For better results and performance special search methods and structures are necessary. Examples of those methods are quad trees to locate layout objects at absolute positions, neighborhood tables to find object pairs with certain spatial relationships, and full text search in page or object areas.

34 citations


Journal ArticleDOI
TL;DR: An advanced document image analysis is described, which involves a multi-layer description of a document and leads to a semantic analysis of its content for an adaptive coding orientation in order to optimize the archiving.

33 citations


Proceedings ArticleDOI
20 Oct 1993
TL;DR: A new approach to the layout analysis, called nested segmentation, is introduced and an ordered labeled tree structure (L-S-Tree) is introduced to represent the segmented document for document classification.
Abstract: Office information systems (OISs) are employed to support office workers in their management of information and to assist them in their daily work. In OISs, document classification is one of the major functional capabilities. Classifying a document can be facilitated through the layout analysis of the document. A new approach to the layout analysis, called nested segmentation, is introduced. The layout relationships of components of a document are defined in terms of the adjacency of blocks. Given the adjacency of blocks, an adjacent block graph is introduced where the problem of the nested segmentation is transformed to a classic minimal cut problem for the graph. Also, an ordered labeled tree structure (L-S-Tree) is introduced to represent the segmented document for document classification.

30 citations


Proceedings ArticleDOI
20 Oct 1993
TL;DR: The authors mainly discuss a structural extraction technique called "text area extraction" which locates all text areas in 255 of the 309 (83%) evaluation documents.
Abstract: The development of a computational method for extracting character portions from a complicated document is described. The goal of the proposed method is to realize a practical document structure analysis for advanced optical character recognition systems with document databases. The authors mainly discuss a structural extraction technique called "text area extraction" which locates all text areas in 255 of the 309 (83%) evaluation documents.

25 citations


Proceedings ArticleDOI
20 Oct 1993
TL;DR: The authors first briefly describe the makeup of a database of scanned document images of scientific and technical documents written in English which are being produced in a CD-ROM format and concentrate on the implementation methodology used to prepare the database.
Abstract: Producing a database of scanned document images for development or evaluation of OCR and document image understanding algorithms is neither easy nor inexpensive. The authors first briefly describe the makeup of a database of scanned document images of scientific and technical documents written in English which are being produced in a CD-ROM format. Then, the authors concentrate on the implementation methodology used to prepare the database. The methodology gives the protocols for each step of the database preparation, and the error model used for the estimation of the ground-truth errors that may exist in the database is discussed.

Proceedings ArticleDOI
20 Oct 1993
TL;DR: A survey which contains a collection of many methods of describing document structures is presented and several novel concepts and theoretical analyses are also presented.
Abstract: Knowing the structure of a document is the key to successful processing of that document. From different points of view, there exist different definitions for document structures. A survey which contains a collection of many methods of describing document structures is presented. Several novel concepts and theoretical analyses are also presented in this survey.

Patent
Tsuyoshi Tanaka
01 Jun 1993
TL;DR: In this article, a document processing system which, in response to a simple instruction given by a user, is able to change and modify automatically the design of an input document is presented.
Abstract: A document processing system which, in response to a simple instruction given by a user, is able to change and modify automatically the design of an input document. The document processing system includes target area instructor for instructing an area serving as an edit target to a document image stored in document image holding unit, target area extraction unit for extracting the instructed edit target area out of the document image, design instruction unit for instructing a desired document design to an output document, design parameter decision unit, responsive to the instruction from the design instruction unit, for deciding a parameter value relating to the document design of the target area, and output image generation unit for processing the document image of the edit target area on the basis of the decided parameter value to thereby generate an output image.

Proceedings ArticleDOI
20 Oct 1993
TL;DR: It is shown how drawing analysis can be seen as a reading of the document in terms of discrete and non-discrete signs, the reading strategies being activated by specific symbols.
Abstract: Two kinds of knowledge can be distinguished in technical document interpretation: structural and syntactic knowledge, referring to the representation rules of objects, and semantic knowledge, referring to the objects themselves. It is shown how drawing analysis can be seen as a reading of the document in terms of discrete and non-discrete signs, the reading strategies being activated by specific symbols. An example of such a reading for a specific layer, annotations such as dimensioning, is presented.

Proceedings ArticleDOI
G.E. Kopec, P.A. Chou
20 Oct 1993
TL;DR: A framework for document image recognition, called document image decoding (DID), that supports the automatic generation of custom document recognition systems from user-specified document models is discussed and use of an automatically generated decoder to analyze telephone yellow pages is described.
Abstract: A framework for document image recognition, called document image decoding (DID), that supports the automatic generation of custom document recognition systems from user-specified document models is discussed. A document recognition problem is viewed as consisting of a message source, an imager, a noisy channel, and an image decoder (recognizer). The inputs to a decoder generator are explicit models for the message source, imager and channel; the output is a specialized program that decodes an image in terms of these models. The models used in DID are based on a stochastic attribute grammar model of document production. Use of an automatically generated decoder to analyze telephone yellow pages is described.

Proceedings ArticleDOI
20 Oct 1993
TL;DR: The authors propose a layout analysis method based on a pattern classification scheme that defines the feature space in terms of low-level image processing features such as connected components and projection profiles and assigns each connected component its logical label according to its features.
Abstract: Layout analysis is both the segmentation and labeling of document images for automatic document input systems. The authors propose a layout analysis method based on a pattern classification scheme. They define the feature space in terms of low-level image processing features such as connected components and projection profiles. The classifier assigns each connected component its logical label according to its features. Publication specific information is kept in the reference vector dictionary. An experiment using technical journal title pages gives connected component level recognition rates of 95% and 81% for learning and unknown samples, respectively.
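
One of the low-level features named in the abstract above, the projection profile, can be sketched as follows. This is only an illustration of the feature, not the authors' classifier: it assumes a binary image given as a list of rows with 1 for ink and 0 for background, and the function names and the `threshold` parameter are hypothetical.

```python
def horizontal_profile(image):
    """Count ink pixels in each row of a binary image (1 = ink, 0 = background)."""
    return [sum(row) for row in image]


def find_runs(profile, threshold=0):
    """Return (start, end) row ranges, end exclusive, where the profile exceeds threshold."""
    runs = []
    start = None
    for index, value in enumerate(profile):
        if value > threshold and start is None:
            start = index                    # a run of inked rows begins
        elif value <= threshold and start is not None:
            runs.append((start, index))      # the run ended on the previous row
            start = None
    if start is not None:
        runs.append((start, len(profile)))   # run reaches the bottom of the image
    return runs


# A tiny 5x3 image with two bands of ink separated by an empty row.
image = [
    [0, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [0, 1, 1],
]
bands = find_runs(horizontal_profile(image))
```

Here `bands` is `[(1, 3), (4, 5)]`: each run of non-empty rows is a candidate text line or block, and quantities such as run height could then serve as inputs to a classifier.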

Proceedings ArticleDOI
20 Oct 1993
TL;DR: Current work to develop an intelligent document understanding system prototype is described; the system attempts to produce document representations which can be stored in a database which then supports a variety of document understanding applications.
Abstract: Starting with scanned image(s) of a document's pages (either from hardcopy or from a fax source), the authors attempt to produce document representations which can be stored in a database which then supports a variety of document understanding applications. Current work to develop an intelligent document understanding system prototype, aimed towards this goal, is described.

Proceedings ArticleDOI
20 Oct 1993
TL;DR: A block image labeling method is presented, based on connected component analysis, that labels the blocks' contents as small letter text, medium letter text, large letter text, graphics or photographs, giving the percentage of each of these components with respect to the surface area it occupies.
Abstract: A block image labeling method is presented. It does not assume that the blocks to be treated are already segmented or that they contain homogeneous data. It is based on connected component analysis to label the blocks' contents as small letter text, medium letter text, large letter text, graphics or photographs, giving the percentage of each of these components with respect to the surface area it occupies. It uses a recursive algorithm that allows one to improve on the result of segmentation. The performance of the method is given.

Proceedings ArticleDOI
28 Mar 1993
TL;DR: The authors present a new approach to document recognition using fuzzy rules that provides a compact rule base with accuracy and efficiency and is powerful enough to recognize a variety of layout structures observed in technical papers.
Abstract: The authors present a new approach to document recognition using fuzzy rules. The system uses information such as relative locations, relative sizes, and positions. A prototype DOCREC-III is described, which takes bitmap scanned images as input and uses spatial knowledge (layout structure) to reason about the rectangular segments (logical structure) in technical papers. The system provides a compact rule base with accuracy and efficiency. The rules are concise but powerful enough to recognize a variety of layout structures observed in technical papers.

Patent
15 Jul 1993
TL;DR: In this article, the authors propose a structure converting method of document information which is capable of converting a document example conforming to a document type A into the document example conforming to document type B by designating the correspondence relation between two kinds of A and B of document type definitions which define the structure of the document.
Abstract: PURPOSE: To provide a structure converting method of document information which is capable of converting a document example conforming to a document type A into the document example conforming to a document type B by designating the correspondence relation between two kinds of A and B of document type definitions which define the structure of the document, preparing correspondence information and utilizing this. CONSTITUTION: A conversion origin document type definition 131 and a conversion destination document type definition 132 including the description of the document type definition sentence of various kinds of documents are held, document type correspondence information 133 is prepared by the corresponding operation processing 113 between document type definitions for a document type correspondence designated example designating the correspondence relation between these definitions and the preparation processing 114 of document type correspondence information, this information and a conversion origin document example 134 are made input information, a conversion destination document example 135 is prepared by the structure conversion processing 123 of the contents of the document example and the document example is imparted to a structure document edition processing 124.

Book ChapterDOI
20 Sep 1993
TL;DR: The design and implementation of a user-friendly graphical package for the interactive layout and content description of document images are discussed; the document image description assists the automatic processing of handwritten forms using OCR techniques by enabling appropriate OCR algorithms and local syntax constraints to be applied to each field to produce the best recognition performance.
Abstract: The design and implementation of a user-friendly graphical package for the interactive layout and content description of document images are discussed. The specific problems addressed are those associated with predefined forms, such as proposal forms used in the insurance business, which have been completed by hand. The document image description allows position and content of the handwritten fields to be defined and labeled and their interrelationships to be specified. This information assists the automatic processing of handwritten forms using OCR techniques by enabling appropriate OCR algorithms and local syntax constraints to be applied to each field to produce the best recognition performance.

Journal ArticleDOI
TL;DR: This paper will discuss the integration of document image processing and text retrieval principles in order to process and load existing paper documents automatically in an electronic document database that broadens the user's capability to retrieve relevant information more accurately, without going through costly processes to get paper documents into electronic text.
Abstract: This paper will discuss the integration of document image processing and text retrieval principles in order to process and load existing paper documents automatically in an electronic document database that broadens the user's capability to retrieve relevant information more accurately, without going through costly processes to get paper documents into electronic text. The principles of document image processing systems, as well as the problems and shortcomings of most of today's document image processing systems, will be discussed. Then concept retrieval as the latest development in text retrieval will be discussed, with specific reference to the ability of the TOPIC intelligent text retrieval system to allow users to build up a knowledge base of search objects or concepts that can be used at any point in time by all users of the system. This paper will further specifically look at the automatic processing of paper documents by converting the scanned document image pages through to electronic text. The use of optical character recognition technology, the indexing and loading of the documents in a text database, the automatic linking of the documents to the related document images and the retrieval technology available in TOPIC, specifically the TYPO operator that was developed to handle so‐called dirty data such as the common misspellings, character transpositions and ‘dirty’ text received as output from the OCR process, will be discussed. A possible solution to load paper documents quickly and cost‐effectively into an electronic document database will be discussed and demonstrated in detail. The advantages and disadvantages of this approach will be discussed with specific reference to an electronic news clipping service application.

Patent
Teruyuki Satake
15 Apr 1993
TL;DR: A word processor with an automatic layout function is described, which automatically moves a figure related to a text that is to be moved in an automatic layout process, in conjunction with the movement of the text.
Abstract: A word processor with an automatic layout function which automatically moves a possible figure, related to a text of a document to be moved in an automatic layout process, in conjunction with the movement of the text. For example, when a text surrounded in a horizontally long rectangular frame figure and related in position to the figure is to be shifted rightward, a CPU analyses the layout structure of the document the data on which is stored in a document memory 12-1, stores data on the result of the analysis in an analysis data buffer 12-3 and determines whether the result of the analysis meets layout rules the data on which is set in an edition rule dictionary memory 12-2. In this case, if the text data does not meet the layout rules and the text data is to be moved rightward, the horizontally long frame figure related in position to the text is also moved automatically rightward in conjunction with the movement of the text. In summary, when the text is to be moved in an automatic layout process, the figure also is moved automatically in conjunction with the movement of the text.

Patent
02 Aug 1993
TL;DR: In this paper, a common document file stores the appearance order rule of the logical structure of a document and an input document file 16 stores the document prepared based on a general purpose markup language rule.
Abstract: PURPOSE: To efficiently prepare a routine document corresponding to a fixed rule. CONSTITUTION: A common document file 15 stores the appearance order rule of the logical structure of a document and an input document file 16 stores the document prepared based on a general purpose markup language rule. A program 174 develops the rule of the file 15 in a chain form common logical structure table and stores it in an area 172 and the program 175 acquires document tags from markup documents fetched to an operation area 171 in order and sets them in the area 173 as a tag table. The program 176 refers to the logical structure table, judges whether or not the appearance order of the document tags of the tag table matches with the rule, extends the chain of the tag table when it matches with the rule and points out the effect to a user when it does not match with the rule. After the matching of the entire document tags is judged, the program 177 generates the logical structure of the document.
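
The tag-order check described in the abstract above can be sketched roughly as follows. The representation is an assumption, not the patent's data structure: the appearance order rule is modeled as a dictionary mapping each tag to the set of tags allowed to follow it, and the `START` marker, tag names, and `check_tag_order` function are hypothetical.

```python
def check_tag_order(tags, allowed_next, start="START"):
    """Check a sequence of document tags against an appearance-order rule.

    allowed_next maps a tag (or the start marker) to the set of tags that
    may follow it. Returns None if the whole sequence matches, otherwise
    the index of the first tag that violates the rule.
    """
    current = start
    for index, tag in enumerate(tags):
        if tag not in allowed_next.get(current, ()):
            return index          # report the offending position to the user
        current = tag             # extend the chain with the accepted tag
    return None


# Hypothetical rule: a title, an optional author, then one or more body tags.
rule = {
    "START": {"title"},
    "title": {"author", "body"},
    "author": {"body"},
    "body": {"body"},
}
ok = check_tag_order(["title", "author", "body", "body"], rule)
bad = check_tag_order(["title", "body", "author"], rule)
```

Here `ok` is `None` and `bad` is `2`, the position of the out-of-order `author` tag. Returning the index rather than a boolean makes it possible to tell the user where the document departs from the rule.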

Proceedings ArticleDOI
20 Oct 1993
TL;DR: An approach to image based typographic analysis of documents is provided and a hierarchical representation of the page layout is developed which preserves the two-dimensional layout, the read-order and the attributes of document components.
Abstract: An approach to image based typographic analysis of documents is provided. The problem requires a spatial understanding of the document layout as well as knowledge of the proper syntax. The system performs a page synthesis from the stream of formatting commands defined in a DVI file. Since the two-dimensional relationships between document components are not explicit in the page language, the authors develop a representation which preserves the two-dimensional layout, the read-order and the attributes of document components. From this hierarchical representation of the page layout we extract and analyze relevant typographic features such as margins, line and character spacing, and figure placement. >

Patent
27 Dec 1993
TL;DR: The connection components of document pictures inputted from a document picture input means are extracted by the extraction means 11 and classified according to size information by the classification means 12, and picture planes provided with only the connection components belonging to each resulting class are generated.
Abstract: PURPOSE: To perform highly accurate layout analysis to a document in which the characters of different point numbers coexist. CONSTITUTION: The connection components of document pictures inputted from a document picture input means are extracted by the extraction means 11 of the connection components and classified corresponding to size information by the classification means 12 of the connection components and picture planes provided with only the connection components belonging to a class obtained by performing the classification are respectively generated by a partial picture generation means 13. The layout analysis is performed to the respective divided picture planes by layout analysis means 14 and 15, layout information extracted from the respective picture planes is synthesized in a layout information synthesis means 16 and a layout analyzed result over the entire document pictures is obtained. When discrepancy is generated at the time of synthesizing the layout information, the layout information of the plane provided with a lot of the connection components and provided with the connection components in a size appropriate for the characters is preferentially synthesized and the final layout analyzed result is obtained. COPYRIGHT: (C)1995,JPO
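
The size-based split described in the abstract above can be sketched as follows. This covers only the classification step and rests on assumptions: components are given as bounding boxes `(x, y, width, height)`, size is taken to mean box height, and the class boundaries and the `split_by_size` name are hypothetical. Extracting the components, analyzing each plane, and merging the results are not shown.

```python
import bisect


def split_by_size(components, boundaries):
    """Group connected components into planes by bounding-box height.

    components: list of (x, y, width, height) boxes.
    boundaries: ascending height thresholds; n thresholds give n + 1 classes.
    Returns one list of components per size class, smallest class first.
    """
    planes = [[] for _ in range(len(boundaries) + 1)]
    for component in components:
        height = component[3]
        size_class = bisect.bisect_right(boundaries, height)
        planes[size_class].append(component)
    return planes


boxes = [
    (0, 0, 8, 10),      # body-text sized character
    (10, 0, 8, 11),     # body-text sized character
    (0, 30, 20, 24),    # heading sized character
    (0, 80, 100, 90),   # large graphic
]
planes = split_by_size(boxes, boundaries=[16, 40])
```

With these boundaries the two small boxes land in the first plane, the heading box in the second, and the graphic in the third, so each plane can be analyzed with spacing assumptions suited to a single character size.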

Journal ArticleDOI
TL;DR: The layout language based on the general-purpose C language and the processing system is discussed and it is demonstrated that the layout pattern is generated automatically from the layout description.
Abstract: With the rapid progress of VLSI microfabrication technology, the functional cell library must be updated frequently, so it is important to develop a functional cell generation technique. For this purpose, a technique that uses a language to represent the layout pattern efficiently and independently of the process technology is of interest. This paper discusses the layout language based on the general-purpose C language and the processing system. The layout pattern is described using C language and newly defined functions for layout description. Then the layout data are generated automatically from the description. First, the assumed layout pattern structure is considered. Then the proposed layout description technique, as well as the realization of the processing system, are discussed. Finally, the processing system is actually constructed and it is demonstrated that the layout pattern is generated automatically from the layout description.

Patent
04 Mar 1993
TL;DR: In this article, the authors replace less relevant sections of a document with a pattern that can be described in a limited data space, and the original document is re-displayed as a combination of the relevant sections against a background of the patterns.
Abstract: This invention teaches replacing less relevant sections of a document with a pattern that can be described in a limited data space. The original document is re-displayed as a combination of the relevant sections against a background of the patterns. This approach leads to a reduction in the space occupied by the document in memory, leads to more rapid retrieval and display of selected portions of the document and facilitates rapid reading comprehension of the outline meaning of the document or finding the location of desired sections within the document.


Proceedings Article
24 Oct 1993
TL;DR: This paper will provide a description of the MEDIADOC model including a discussion of hierarchical structures, rendering scenarios, and multimedia document creation tools.
Abstract: Multimedia documents differ significantly from traditional documents, which are composed of text and possibly geometric graphics. The inclusion of continuous media such as audio and video imposes new requirements on document representation. The MEDIADOC model supports the creation of multimedia documents by means of a logical structure, a layout structure, and a rendering scenario that is a schedule for document playback. Special document creation tools are also required to facilitate the potentially complicated nature of multimedia document design. In this paper we will provide a description of the MEDIADOC model including a discussion of hierarchical structures, rendering scenarios, and multimedia document creation tools.