
Showing papers on "Document layout analysis" published in 1992


Journal ArticleDOI
TL;DR: The document image acquisition process and the knowledge base that must be entered into the system to process a family of page images are described, along with the process by which the X-Y tree data structure converts a 2-D page-segmentation problem into a series of 1-D string-parsing problems that can be tackled using conventional compiler tools.
Abstract: Gobbledoc, a system providing remote access to stored documents, which is based on syntactic document analysis and optical character recognition (OCR), is discussed. In Gobbledoc, image processing, document analysis, and OCR operations take place in batch mode when the documents are acquired. The document image acquisition process and the knowledge base that must be entered into the system to process a family of page images are described. The process by which the X-Y tree data structure converts a 2-D page-segmentation problem into a series of 1-D string-parsing problems that can be tackled using conventional compiler tools is also described. Syntactic analysis is used in Gobbledoc to divide each page into labeled rectangular blocks. Blocks labeled text are converted by OCR to obtain a secondary (ASCII) document representation. Since such symbolic files are better suited for computerized search than for human access to the document content and because too many visual layout clues are lost in the OCR process (including some special characters), Gobbledoc preserves the original block images for human browsing. Storage, networking, and display issues specific to document images are also discussed.
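
The idea of turning 2-D page segmentation into a series of 1-D problems can be illustrated with a recursive X-Y cut. The following is a minimal sketch, not Gobbledoc's implementation: the function names (`profile`, `split_runs`, `xy_cut`), the `min_gap` threshold, and the nested-list page representation are all assumptions made for illustration. Each projection profile is reduced to a string of `1` (ink) and `0` (blank) symbols, and that string is split at long blank runs. The abstract describes parsing such strings with compiler tools; the sketch uses a simple run scanner in their place.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def profile(page: List[List[int]], box: Box, horizontal: bool) -> str:
    """Project ink onto one axis as a string of '1' (ink) and '0' (blank)."""
    top, left, bottom, right = box
    if horizontal:  # one symbol per row
        return "".join(
            "1" if any(page[r][left:right]) else "0" for r in range(top, bottom)
        )
    # one symbol per column
    return "".join(
        "1" if any(page[r][c] for r in range(top, bottom)) else "0"
        for c in range(left, right)
    )


def split_runs(symbols: str, min_gap: int) -> List[Tuple[int, int]]:
    """Return (start, end) ink spans separated by blank runs of >= min_gap."""
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    gap = 0
    for i, symbol in enumerate(symbols):
        if symbol == "1":
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                spans.append((start, i - gap + 1))
                start, gap = None, 0
    if start is not None:
        spans.append((start, len(symbols) - gap))
    return spans


def xy_cut(
    page: List[List[int]],
    box: Optional[Box] = None,
    horizontal: bool = True,
    min_gap: int = 2,
    tried_other: bool = False,
) -> List[Box]:
    """Recursively cut the page, alternating cut direction; return leaf blocks."""
    if box is None:
        box = (0, 0, len(page), len(page[0]))
    top, left, bottom, right = box
    spans = split_runs(profile(page, box, horizontal), min_gap)
    if not spans:
        return []
    if horizontal:
        children = [(top + a, left, top + b, right) for a, b in spans]
    else:
        children = [(top, left + a, bottom, left + b) for a, b in spans]
    if len(children) == 1:
        if tried_other:
            return children  # no cut possible in either direction: a leaf
        return xy_cut(page, children[0], not horizontal, min_gap, True)
    leaves: List[Box] = []
    for child in children:
        leaves.extend(xy_cut(page, child, not horizontal, min_gap, False))
    return leaves
```

On a small page with a full-width header above two columns, `xy_cut` would first cut between the header and the body, then cut the body into its columns. Labeling the resulting blocks (for example as title or text) is a separate step that this sketch does not attempt.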

466 citations


Journal ArticleDOI
01 Jul 1992
TL;DR: In this paper, two-dimensional Gabor filters are used to extract texture features for each text region in a given document image, and the text in the document is considered as a textured region.
Abstract: There is a considerable interest in designing automatic systems that will scan a given paper document and store it on electronic media for easier storage, manipulation, and access. Most documents contain graphics and images in addition to text. Thus, the document image has to be segmented to identify the text regions, so that OCR techniques may be applied only to those regions. In this paper, we present a simple method for document image segmentation in which text regions in a given document image are automatically identified. The proposed segmentation method for document images is based on a multichannel filtering approach to texture segmentation. The text in the document is considered as a textured region. Nontext contents in the document, such as blank spaces, graphics, and pictures, are considered as regions with different textures. Thus, the problem of segmenting document images into text and nontext regions can be posed as a texture segmentation problem. Two-dimensional Gabor filters are used to extract texture features for each of these regions. These filters have been extensively used earlier for a variety of texture segmentation tasks. Here we apply the same filters to the document image segmentation problem. Our segmentation method does not assume any a priori knowledge about the content or font styles of the document, and is shown to work even for skewed images and handwritten text. Results of the proposed segmentation method are presented for several test images which demonstrate the robustness of this technique.
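
To make the texture view of text concrete, here is a minimal single-channel sketch in plain Python. It is not the authors' method: the paper describes a multichannel filter bank, while this example builds only one even-symmetric Gabor kernel and measures its mean squared response. The function names and the parameters `size`, `wavelength`, `theta`, and `sigma` are assumptions chosen for illustration.

```python
import math
from typing import List

Image = List[List[float]]


def gabor_kernel(size: int, wavelength: float, theta: float, sigma: float) -> Image:
    """Even-symmetric Gabor kernel: Gaussian envelope times a cosine carrier.

    `size` is assumed to be odd. The mean is subtracted so that a uniform
    region (blank paper or solid ink) produces no response.
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Coordinate along the carrier direction.
            xr = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            row.append(envelope * math.cos(2.0 * math.pi * xr / wavelength))
        kernel.append(row)
    mean = sum(sum(row) for row in kernel) / (len(kernel) ** 2)
    return [[value - mean for value in row] for row in kernel]


def texture_energy(image: Image, kernel: Image) -> float:
    """Mean squared filter response over all fully covered window positions."""
    k = len(kernel)
    height, width = len(image), len(image[0])
    total, count = 0.0, 0
    for r in range(height - k + 1):
        for c in range(width - k + 1):
            response = sum(
                kernel[i][j] * image[r + i][c + j]
                for i in range(k)
                for j in range(k)
            )
            total += response * response
            count += 1
    return total / count if count else 0.0
```

A patch of regularly spaced strokes gives a high energy when the kernel's wavelength matches the stroke spacing, while a uniform patch gives an energy near zero. A segmentation along the lines described in the abstract would compute such features for several frequencies and orientations and then group pixels by feature similarity; that grouping step is outside this sketch.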

326 citations


Journal ArticleDOI
Guy A. Story1, Lawrence O'Gorman1, David S. Fox1, L.L. Schaper1, H. V. Jagadish1 
TL;DR: The RightPages electronic library prototype system, which gives users full online library services, is described, and the system's image and document processing, including noise reduction, document layout analysis, text processing, and display processing, are discussed.
Abstract: The RightPages electronic library prototype system, which gives users full online library services, is described. The prototype takes advantage of fast hardware, multimedia workstations, and broadband networks to process scientific and technical journals for users and to offer a service that: alerts them to the arrival of new journal articles matching their interest profiles; lets them immediately examine images of pages in the alerted articles and browse through other articles in the database; and enables them to order paper copies of any articles in the database. The system runs on a local area network that connects one or more scanning stations, a centralized document database server, and multiple user stations running X Windows servers. The RightPages interface runs as an X Windows application on Sun workstations or X terminals. The system's image and document processing, including noise reduction, document layout analysis, text processing, and display processing, are discussed.

186 citations


Patent
John F Cullen1, Koichi Ejiri1
06 Apr 1992
TL;DR: A method and apparatus for segmenting a document image into areas containing text and non-text is presented. It comprises the steps of: providing a bit-mapped representation of the document image; extracting run lengths for each scanline from the bit-mapped representation of the document image; constructing rectangles from the run lengths; initially classifying each of the rectangles as either text or non-text; correcting for the skew in the rectangles; merging associated text into one or more text blocks; and logically ordering the text blocks.
Abstract: In a character recognition system, a method and apparatus for segmenting a document image into areas containing text and non-text. Document segmentation in the present invention is comprised generally of the steps of: providing a bit-mapped representation of the document image; extracting run lengths for each scanline from the bit-mapped representation of the document image; constructing rectangles from the run lengths; initially classifying each of the rectangles as either text or non-text; correcting for the skew in the rectangles; merging associated text into one or more text blocks; and logically ordering the text blocks.
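
The first two steps listed above (extracting run lengths per scanline and constructing rectangles from them) can be sketched as follows. This is an illustrative assumption, not the patented procedure: the abstract does not state the merging rule, so the sketch simply extends a rectangle whenever a run on the next scanline overlaps it horizontally. The names `run_lengths` and `build_rectangles` are hypothetical.

```python
from typing import List, Tuple

Run = Tuple[int, int]  # (start_col, end_col); end exclusive


def run_lengths(scanline: List[int]) -> List[Run]:
    """Return the runs of set (ink) pixels in one scanline."""
    runs: List[Run] = []
    start = None
    for col, pixel in enumerate(scanline):
        if pixel and start is None:
            start = col
        elif not pixel and start is not None:
            runs.append((start, col))
            start = None
    if start is not None:
        runs.append((start, len(scanline)))
    return runs


def build_rectangles(bitmap: List[List[int]]) -> List[Tuple[int, int, int, int]]:
    """Group runs into rectangles (top, left, bottom, right); exclusive ends.

    Assumed rule: a run extends an open rectangle from the previous scanline
    if the two overlap horizontally; otherwise it starts a new rectangle.
    """
    open_rects: List[List[int]] = []
    done: List[List[int]] = []
    for row, scanline in enumerate(bitmap):
        runs = run_lengths(scanline)
        used = set()
        still_open: List[List[int]] = []
        for rect in open_rects:
            extended = False
            for idx, (start, end) in enumerate(runs):
                if idx not in used and start < rect[3] and end > rect[1]:
                    rect[1] = min(rect[1], start)
                    rect[3] = max(rect[3], end)
                    rect[2] = row + 1
                    used.add(idx)
                    extended = True
            (still_open if extended else done).append(rect)
        for idx, (start, end) in enumerate(runs):
            if idx not in used:
                still_open.append([row, start, row + 1, end])
        open_rects = still_open
    done.extend(open_rects)
    return sorted((r[0], r[1], r[2], r[3]) for r in done)
```

The later steps in the list (text or non-text classification, skew correction, merging, and ordering) would operate on the rectangles returned here; they depend on details the abstract does not give and are left out.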

186 citations


Journal ArticleDOI
TL;DR: The principles of the model-based document analysis system called Pi ODA (paper interface to office document architecture), which was developed as a prototype for the analysis of single-sided business letters in German, are presented.
Abstract: The principles of the model-based document analysis system called Pi ODA (paper interface to office document architecture), which was developed as a prototype for the analysis of single-sided business letters in German, are presented. Initially, Pi ODA extracts a part-of hierarchy of nested layout objects such as text-blocks, lines, and words based on their presentation on the page. Subsequently, in a step called logical labeling, the layout objects and their compositions are geometrically analyzed to identify corresponding logical objects that can be related to a human perceptible meaning, such as sender, recipient, and date in a letter. A context-sensitive text recognition for logical objects is then applied using logical vocabularies and syntactic knowledge. As a result, Pi ODA produces a document representation that conforms to the ODA international standard.

168 citations


Journal ArticleDOI
Shuichi Tsujimoto1, Haruo Asada1
01 Jul 1992
TL;DR: Experiments have proved that the proposed approaches to document analysis and document understanding are robust even for multicolumned and multiarticle documents containing graphics and photographs, and that the proposed character segmentation/recognition method is robust enough to cope with omnifont characters which frequently touch each other.
Abstract: The document image processes used in a recently developed text reading system are described. The system consists of three major components: document analysis, document understanding, and character segmentation/recognition. The document analysis component extracts lines of text from a page for recognition. The document understanding component extracts logical relationships between the document constituents. The character segmentation/recognition component extracts characters from a text line and recognizes them. Experiments on more than a hundred documents have proved that the proposed approaches to document analysis and document understanding are robust even for multicolumned and multiarticle documents containing graphics and photographs, and that the proposed character segmentation/recognition method is robust enough to cope with omnifont characters which frequently touch each other.

100 citations


Proceedings ArticleDOI
Lawrence O'Gorman1
30 Aug 1992
TL;DR: Three techniques are described: noise reduction on binary document pages to improve page appearance and subsequent optical character recognition and compression; subsampling of the text image to fit on the computer screen while maintaining readability; and document layout analysis to determine text blocks.
Abstract: Describes some of the document processing techniques used in the RightPages electronic library system. Since the system deals with scanned images of document pages, these techniques are critical to the use and appearance of the system. The author describes three techniques: (1) for noise reduction from binary document pages to improve page appearance and subsequent optical character recognition and compression; (2) for subsampling the text image to fit on the computer screen while maintaining readability; and (3) a document layout analysis technique to determine text blocks.

95 citations


Proceedings ArticleDOI
30 Aug 1992
TL;DR: Outlines a fast and efficient method for extracting graphics and text paragraphs from printed documents, based on a bottom-up approach to document analysis, which achieves very good performance in most cases.
Abstract: Outlines a fast and efficient method for extracting graphics and text paragraphs from printed documents. The method presented is based on a bottom-up approach to document analysis, and it achieves very good performance in most cases. During preprocessing, characters are linked together to form blocks. The created blocks are segmented, labelled, and merged into paragraphs. Simultaneously, graphics are extracted from the image. Algorithms for each step of processing are presented, and the experimental results obtained are included.
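
The bottom-up step of linking characters into blocks can be illustrated with a small grouping routine. This is a sketch under stated assumptions rather than the paper's algorithm: the abstract does not give the linking criterion, so the example links two character bounding boxes whenever the gap between them is at most a hypothetical `max_gap`, and uses a union-find structure to collect linked boxes into blocks.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def gap(a: Box, b: Box) -> int:
    """Largest axis-wise separation between two boxes (0 if they overlap)."""
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return max(dx, dy)


def link_boxes(boxes: List[Box], max_gap: int) -> List[Box]:
    """Merge boxes that are chained together by gaps of at most max_gap."""
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        # Path-halving lookup of the group representative.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if gap(boxes[i], boxes[j]) <= max_gap:
                parent[find(i)] = find(j)

    groups: Dict[int, List[Box]] = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box)

    # Each block is the bounding box of its member characters.
    return sorted(
        (
            min(b[0] for b in group),
            min(b[1] for b in group),
            max(b[2] for b in group),
            max(b[3] for b in group),
        )
        for group in groups.values()
    )
```

With a small `max_gap`, neighbouring characters on a line merge into one block while a distant component stays separate. The pairwise comparison is quadratic in the number of boxes, which is acceptable for a sketch but would likely need a spatial index on full pages.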

59 citations


Patent
01 Sep 1992
TL;DR: A method and apparatus for excerpting and summarizing an undecoded document image, without first converting the document image to optical character codes such as ASCII text, identifies significant words, phrases, and graphics in the document image using automatic or interactive morphological image recognition techniques.
Abstract: A method and apparatus for excerpting and summarizing an undecoded document image, without first converting the document image to optical character codes such as ASCII text, identifies significant words, phrases, and graphics in the document image using automatic or interactive morphological image recognition techniques; document summaries or indices are produced based on the identified significant portions of the document image. The disclosed method is particularly suited to the improvement of reading machines for the blind.

54 citations


Journal ArticleDOI
01 Jul 1992
TL;DR: The authors present a conceptual framework for solving the task of document analysis, which, in essence, consists in the conversion of the document's pixel representation into an equivalent knowledge network representation holding the document's content and layout.
Abstract: The authors present a conceptual framework for solving the task of document analysis, which, in essence, consists in the conversion of the document's pixel representation into an equivalent knowledge network representation holding the document's content and layout. Starting on the pixel level, the formation of elementary geometric objects on which layout analysis as well as the definition of character objects is based is described. Character recognition accomplishes the mapping from geometric object to character meaning in ASCII representation. On the next level of abstraction words are formed and verified by contextual processing. Modeled knowledge about complete documents and about how their constituents are related to the application forms the highest level of abstraction. The various problems arising at each stage are discussed. The dependencies between the different levels are exemplified and technical solutions put forward.

49 citations


Patent
02 Dec 1992
TL;DR: In this article, a text/image separation method is proposed which handles the text segment and the image of a document to be printed in a separate and parallel fashion to provide a better printing quality of the document.
Abstract: A text/image separation method is disclosed which handles the text segment and the image of a document to be printed in a separate and parallel fashion to provide a better printing quality of the document.

Patent
17 Jul 1992
TL;DR: In this paper, a layout method for formatting and reprocessing of structured documents containing text and graphics is employed by a formatting module, which first performs reading of the generic logical structure, the specific logical structure and the generic layout structure, in order to create a specific layout structure gradually by a plurality of recursive layout processes.
Abstract: A layout method for formatting and reprocessing of structured documents containing text and graphics is employed by a formatting module. The method first performs reading of the generic logical structure, the specific logical structure, and the generic layout structure, in order to create a specific layout structure gradually by a plurality of recursive layout processes.

Patent
01 Dec 1992
TL;DR: A document reading apparatus which can determine an effective image pickup area containing no object such as operator's hands or fingers pressing a document and rectify image data prior to imaging operation, making use of a difference of the object from the document in chromaticity, luminous density, and the like.
Abstract: A document reading apparatus which can determine an effective image pickup area containing no object such as operator's hands or fingers pressing a document and rectify image data prior to imaging operation, making use of a difference of the object from the document in chromaticity, luminous density, and the like.

Patent
31 Mar 1992
TL;DR: An apparatus and method for editing a document to automatically produce a satisfactory, well ordered layout is described, which includes the steps of extracting characteristic quantities which characterize different elements of the document, deriving relationships among the different elements in accordance with the characteristic quantities, determining a layout of the different elements of the document in accordance with the relationships, and processing the document in accordance with the layout.
Abstract: An apparatus and method for editing a document to automatically produce a satisfactory, well ordered layout which includes the steps of (a) extracting characteristic quantities which characterize different elements of the document; (b) deriving relationships among the different elements of the document in accordance with the characteristic quantities; (c) determining a layout of the different elements of the document in accordance with the relationships; and (d) processing the document in accordance with the layout.


Journal ArticleDOI
TL;DR: A workstation-based prototype document analysis system that uses optical character recognition (OCR) and provides functions for image capture, block segmentation, page structure analysis, and character recognition with contextual postprocessing, as well as a user interface for error correction.
Abstract: Document recognition system (DRS), a workstation-based prototype document analysis system that uses optical character recognition (OCR), is described. The system provides functions for image capture, block segmentation, page structure analysis, and character recognition with contextual postprocessing, as well as a user interface for error correction. All the functions except image capture and character recognition have been implemented by means of software for the Japanese edition of OS/2.


Proceedings ArticleDOI
30 Aug 1992
TL;DR: This paper proposes a more advanced method for document understanding, based on the spatial relationships among neighboring segments of compositive items in addition to their geometric aspects.
Abstract: The main objective of document understanding is to extract and classify the meaningful data automatically from documents. Some research on this issue has already been reported. However, these methods are not always successful because the recognition procedures analyze document images on the basis of only the physical coordinate values of compositive items. This paper proposes a more advanced method based on the spatial relationships among neighboring segments of compositive items, in addition to the geometric aspects. The knowledge about documents is not a single layer, but is organized in multiple layers: knowledge about layout structures, knowledge about item sequences, and knowledge about item properties. These three kinds of knowledge are not only specified hierarchically, but are also mutually interrelated across the layout recognition, item recognition, and character recognition procedures.

Proceedings ArticleDOI
15 Sep 1992
TL;DR: This paper presents a new approach to automatic layout of diagrams: layout-by-example, in which a layout is produced by applying the layout rules which are generated from layout examples called stereotypes.
Abstract: This paper presents a new approach to automatic layout of diagrams: layout-by-example. In this approach, a layout is produced by applying the layout rules which are generated from layout examples called stereotypes. A fuzzy visual language is proposed for specifying stereotypes of diagram layout. The concept of fuzzy theory is incorporated into parsing visual sentences representing stereotypes and generating layout rules from the stereotypes. A layout produced by applying layout rules may be modified manually and such modifications on the layout can be used as counterexamples to the existing layout rules so that the tool can tune the layout rules. A prototype of an automatic layout tool based on layout-by-example is implemented in Common Lisp.

Patent
Akio Yamashita1, Kazuharu Toyokawa1
02 Dec 1992
TL;DR: A tree structure and layout model are generated by automatically extracting the tree structure through document image analysis before the user makes graphical corrections; the input document image is physically analyzed to extract separators that are likely to separate the objects of the document, and the image is segmented into a plurality of areas (51A through 51G) accordingly.
Abstract: To provide a method for extracting a tree structure by using image analysis results of an actual document and generating a flexible layout model. A tree structure and layout model are newly generated by automatically extracting the tree structure in accordance with document image analysis before a user executes graphical correction. That is, the inputted document image 51 is physically analyzed to extract a separator with a high possibility to separate the objects of the document and segment the above document image into a plurality of areas (51A through 51G) in accordance with the information for the separator. Then, the area segmentation is displayed on a display unit 13 together with the document image 51 and interactively corrected by the user to define a desired tree structure and complete a flexible layout model 80 by setting a parameter to each of the nodes (61A through 61G) of the tree structure.

Journal ArticleDOI
TL;DR: This paper formalizes the concept of document recognition and presents a High Level Document Recognition method, together with the experience gained in developing and using a number of implementations of the method.
Abstract: Document recognition is a task in which a document in its physical presentation format is transformed into a structured author-oriented model of the document. The presentation format can be bitmaps of document pages, a description of the document in a Page Description Language (PDL), or encoding of the document in a printer or graphics language. The structured model is a format allowing for addition to the document, manipulation of the document, and reformatting the layout and the output appearance of the document. Fully automatic document recognition is not possible, in general, for the same reason that it is not possible to de-translate computer programs automatically. However, it is possible to develop a man-assisted semi-automatic document recognition method. This method uses two passes. The first pass is completely automatic; it produces a document format called Interactive Document Model. The Interactive Document Model comprises recognized typesetting and descriptive structures together with derived ODA logical and layout structures for the document. The model generated in the first pass is enough for most purposes and applications. However, if it is not acceptable, the user can then enter the second pass and interactively edit the logical structure. This paper has three objectives. The first is to formalize the concept of document recognition. The second is to subdivide the problem of document recognition and classify it into a number of subproblems, each dealing with different aspects of the problem. The third objective is to introduce a problem which we wish to solve, and then to present a High Level Document Recognition method and the experience in developing and using a number of implementations of the method.


Proceedings ArticleDOI
30 Aug 1992
TL;DR: A generalized contour following technique is presented that can constitute the main tool of document analysis software and is applied to the segmentation of documents in order to isolate blocks of text or other document components.
Abstract: A generalized contour following technique is presented that can constitute the main tool of document analysis software. Progression along the contour proceeds by zone testing instead of the classical pixel-to-pixel displacement. This detects a modified contour, which is in fact the envelope of an area containing elements that are sufficiently close together. This characteristic is then applied to the segmentation of documents in order to isolate blocks of text or other document components. The selection of parameters is also discussed.

Proceedings ArticleDOI
30 Aug 1992
TL;DR: A knowledge-based system for document image analysis is proposed that is applicable to various kinds of documents and aims at high expressivity and maintainability of the knowledge description.
Abstract: Document image analysis is the process of deriving a logically structured representation of a document by analyzing the layout structure of its image. This paper proposes a knowledge-based system for document image analysis which is applicable to various kinds of documents. The characteristics of the system are as follows: (1) the knowledge base, called the document model, encodes only object-level knowledge hierarchically, declaratively, and symbolically, aiming at high expressivity and maintainability of the knowledge description; (2) the document model is automatically constructed by referring to samples of document images, and incrementally refined by feedback of error information from the analysis.

Patent
25 Mar 1992
TL;DR: A format feature is detected from document data inputted from an input part 1, the logical structure of the document data is analyzed by a logical structure analyzing part 5 and stored in a logical structure storing part 6, and at least one of a character interval and a line pitch in the stored logical structure is changed based upon a shaping rule stored in a shaping rule dictionary 7.
Abstract: PURPOSE: To determine an output format restricted by a specified page condition by determining a document output format based upon the volume of document data. CONSTITUTION: A format feature is detected from document data inputted from an input part 1, the logical structure of the document data is analyzed by a logical structure analyzing part 5 and stored in a logical structure storing part 6, and at least one of a character interval and a line pitch in the stored logical structure is changed based upon a shaping rule stored in a shaping rule dictionary 7 so that the document data are included in a prescribed page so as to be easily observed.

Patent
Thomas Acquaviva1
14 Dec 1992
TL;DR: In this paper, a system and method of designating edit information for an original document is presented, which includes a mechanism for designating a location on the original document while the document is in a document feeder tray.
Abstract: A system and method of designating edit information for an original document. The system includes a mechanism for designating a location on the original document while the document is in a document feeder tray. The mechanism does not deface the original document.

Proceedings ArticleDOI
01 Aug 1992
TL;DR: A model for the control strategy of a document image analysis system is presented, together with mechanisms for its interpretation that describe which specialist can be applied to which object in which analysis state; the model comprises all sequences of processing steps that are relevant for the analysis tasks.
Abstract: Generally, document analysis and understanding involves many processing steps, like unskewing, segmentation, logical labeling, text recognition, and text analysis. Most of these steps can be subdivided into different tasks depending on the problem-solving methods available. All of the techniques are more or less specialized to certain input, but some are also competitive. As a consequence, a document analysis system incorporating many analysis methods must properly schedule and control these methods to obtain an optimal result. In this paper, we present a model for the control strategy of a document image analysis system as well as mechanisms for its interpretation that describe three important aspects: which specialist can be applied to which object in which analysis state. The analysis model comprises all possible sequences of processing steps which are relevant for the analysis tasks. The underlying document architecture supports the analysis specialists by corresponding knowledge and provides a framework for representing the analysis results.

01 Jan 1992
TL;DR: The Centaur system automatically generates structured environments for Tioga and Latex documents and conversions between them from the specifications of the logical and physical structures of the Article document class.
Abstract: This paper proposes an application of programming environment generation to structured document manipulation. We use Centaur as a formal tool to model and implement logical and physical structure, logical editing and layout processing, document analysis, re-use, and conversion for a sample class of documents: scientific articles including equations and figures. To make connections with real document systems, we choose to give two particular external forms to the logical structure: Tioga source and Latex source. From the specifications of the logical and physical structures of the Article document class on the one hand, and the specification of the layout processing (viewed as its semantics according to the Tioga or the Latex layout model) and other semantic tools on the other, the Centaur system automatically generates structured environments for Tioga and Latex documents and conversions between them.

Patent
25 Mar 1992
TL;DR: In this article, a header sentence is extracted from document data by referring to a header dictionary 6a and a header rule dictionary 7a for document data given from an input part 2.
Abstract: PURPOSE: To appropriately distribute the development of the output of document data into plural frames so as to execute a document processing. CONSTITUTION: A header sentence is extracted from document data by referring to a header dictionary 6a and a header rule dictionary 7a for document data given from an input part 2. The document structure of respective sentences divided into the header sentences and the following texts is judged by referring to a document structure rule dictionary 8a, and hierarchical logical structure shown by the document structure of document data is obtained in accordance with the document structure. When developed document data exceeds the storage range of an output destination frame in accordance with a layout rule corresponding to the document structure of the respective sentences shown by the logical structure, control for switching the output destination of data which is successively developed to the equal frame of a next attribute is executed.

Book ChapterDOI
01 Jan 1992
TL;DR: Much research is being carried out worldwide on the segmentation of characters and graphics, as well as on the search for a layout structure.
Abstract: Much research is being carried out worldwide on the segmentation of characters and graphics, as well as on the search for a layout structure. Various approaches are introduced in the following section.