Showing papers on "Document layout analysis" published in 1992

Journal Article•DOI•

A prototype document image analysis system for technical journals

George Nagy¹, Sharad C. Seth², Mahesh Viswanathan³•Institutions (3)

Rensselaer Polytechnic Institute¹, University of Nebraska–Lincoln², IBM³

01 Jul 1992-IEEE Computer

TL;DR: The document image acquisition process and the knowledge base that must be entered into the system to process a family of page images are described, along with the process by which the X-Y tree data structure converts a 2-D page-segmentation problem into a series of 1-D string-parsing problems that can be tackled using conventional compiler tools.

Abstract: Gobbledoc, a system providing remote access to stored documents, which is based on syntactic document analysis and optical character recognition (OCR), is discussed. In Gobbledoc, image processing, document analysis, and OCR operations take place in batch mode when the documents are acquired. The document image acquisition process and the knowledge base that must be entered into the system to process a family of page images are described. The process by which the X-Y tree data structure converts a 2-D page-segmentation problem into a series of 1-D string-parsing problems that can be tackled using conventional compiler tools is also described. Syntactic analysis is used in Gobbledoc to divide each page into labeled rectangular blocks. Blocks labeled text are converted by OCR to obtain a secondary (ASCII) document representation. Since such symbolic files are better suited for computerized search than for human access to the document content and because too many visual layout clues are lost in the OCR process (including some special characters), Gobbledoc preserves the original block images for human browsing. Storage, networking, and display issues specific to document images are also discussed.

466 citations
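
The X-Y tree idea in the Gobbledoc abstract above can be illustrated with a small recursive cut. This is only a sketch under stated assumptions: the page is a binary grid (1 = ink), and blocks are separated by a plain gap threshold (`min_gap`, a hypothetical parameter) rather than by the grammar-driven parsing of profiles that the abstract describes. The names `Node`, `profile`, `runs`, `xy_cut`, and `leaves` are illustrative and do not come from the paper. The point it shows is that each step reduces a 2-D region to a 1-D projection profile, which is then split as a string of ink and gap symbols.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Page = List[List[int]]          # binary page image: 1 = ink, 0 = background
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


@dataclass
class Node:
    """One X-Y tree node: a rectangle plus the sub-rectangles it was cut into."""
    box: Box
    children: List["Node"] = field(default_factory=list)


def profile(page: Page, box: Box, axis: str) -> List[int]:
    """Ink count per row (axis='y') or per column (axis='x') inside box."""
    top, left, bottom, right = box
    if axis == "y":
        return [sum(page[r][left:right]) for r in range(top, bottom)]
    return [sum(page[r][c] for r in range(top, bottom)) for c in range(left, right)]


def runs(prof: List[int], min_gap: int) -> List[Tuple[int, int]]:
    """Split a 1-D profile into ink spans separated by >= min_gap empty slots."""
    spans, start, gap = [], None, 0
    for i, v in enumerate(prof):
        if v:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                spans.append((start, i - gap + 1))
                start, gap = None, 0
    if start is not None:
        spans.append((start, len(prof) - gap))
    return spans


def xy_cut(page: Page, box: Optional[Box] = None, axis: str = "y",
           min_gap: int = 1, stalled: bool = False) -> Node:
    """Recursively cut a region, alternating horizontal and vertical cuts."""
    if box is None:
        box = (0, 0, len(page), len(page[0]))
    top, left, bottom, right = box
    other = "x" if axis == "y" else "y"
    spans = runs(profile(page, box, axis), min_gap)

    def sub(span: Tuple[int, int]) -> Box:
        a, b = span
        if axis == "y":
            return (top + a, left, top + b, right)
        return (top, left + a, bottom, left + b)

    if not spans:                       # blank region
        return Node(box)
    if len(spans) == 1:                 # no cut possible along this axis
        tight = sub(spans[0])
        if stalled:                     # no cut along the other axis either: leaf
            return Node(tight)
        return xy_cut(page, tight, other, min_gap, stalled=True)
    return Node(box, [xy_cut(page, sub(s), other, min_gap) for s in spans])


def leaves(node: Node) -> List[Box]:
    """Collect the leaf rectangles in reading order of the cuts."""
    if not node.children:
        return [node.box]
    return [b for child in node.children for b in leaves(child)]
```

On a toy page with one full-width title row above two columns, `leaves(xy_cut(page))` yields one rectangle for the title and one per column.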

Journal Article•DOI•

Text segmentation using Gabor filters for automatic document processing

Anil K. Jain¹, Sushil Bhattacharjee¹•Institutions (1)

Michigan State University¹

01 Jul 1992

TL;DR: In this paper, the text in a document image is treated as a textured region, and two-dimensional Gabor filters are used to extract texture features that separate text regions from nontext regions.

Abstract: There is considerable interest in designing automatic systems that will scan a given paper document and store it on electronic media for easier storage, manipulation, and access. Most documents contain graphics and images in addition to text. Thus, the document image has to be segmented to identify the text regions, so that OCR techniques may be applied only to those regions. In this paper, we present a simple method for document image segmentation in which text regions in a given document image are automatically identified. The proposed segmentation method for document images is based on a multichannel filtering approach to texture segmentation. The text in the document is considered as a textured region. Nontext contents in the document, such as blank spaces, graphics, and pictures, are considered as regions with different textures. Thus, the problem of segmenting document images into text and nontext regions can be posed as a texture segmentation problem. Two-dimensional Gabor filters are used to extract texture features for each of these regions. These filters have been extensively used earlier for a variety of texture segmentation tasks. Here we apply the same filters to the document image segmentation problem. Our segmentation method does not assume any a priori knowledge about the content or font styles of the document, and is shown to work even for skewed images and handwritten text. Results of the proposed segmentation method are presented for several test images which demonstrate the robustness of this technique.

326 citations
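
To make the texture view in the abstract above concrete, here is a minimal single-channel sketch of a Gabor texture feature. It is not the authors' multichannel filter bank or their clustering step. The function names (`gabor_kernel`, `energy`) and all parameter values are assumptions chosen for illustration. The kernel is a Gaussian envelope multiplied by a cosine wave, made zero-mean so that uniform regions respond with almost no energy, while periodic line-like patterns at the tuned wavelength respond strongly.

```python
import math
from typing import List

Image = List[List[float]]


def gabor_kernel(size: int, wavelength: float, theta: float, sigma: float) -> Image:
    """Real (even-symmetric) Gabor kernel; size is assumed to be odd."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Coordinate along the direction in which the cosine varies.
            xr = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            row.append(envelope * math.cos(2.0 * math.pi * xr / wavelength))
        kernel.append(row)
    # Subtract the mean so that flat regions give (near) zero response.
    n = len(kernel)
    mean = sum(map(sum, kernel)) / (n * n)
    return [[v - mean for v in row] for row in kernel]


def energy(image: Image, kernel: Image, r: int, c: int) -> float:
    """Squared filter response at pixel (r, c); pixels outside the image count as 0."""
    half = len(kernel) // 2
    total = 0.0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            rr, cc = r + dy, c + dx
            if 0 <= rr < len(image) and 0 <= cc < len(image[0]):
                total += kernel[dy + half][dx + half] * image[rr][cc]
    return total * total
```

A full segmenter would compute such energies for several orientations and wavelengths at every pixel and then group pixels by their feature vectors; this sketch only shows why a striped, text-like patch and a blank patch end up with different feature values.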

Journal Article•DOI•

The RightPages image-based electronic library for alerting and browsing

Guy A. Story¹, Lawrence O'Gorman¹, David S. Fox¹, L.L. Schaper¹, H. V. Jagadish¹•Institutions (1)

Bell Labs¹

01 Sep 1992-IEEE Computer

TL;DR: The RightPages electronic library prototype system, which gives users full online library services, is described, and the system's image and document processing, including noise reduction, document layout analysis, text processing, and display processing are discussed.

Abstract: The RightPages electronic library prototype system, which gives users full online library services, is described. The prototype takes advantage of fast hardware, multimedia workstations, and broadband networks to process scientific and technical journals for users and to offer a service that: alerts them to the arrival of new journal articles matching their interest profiles; lets them immediately examine images of pages in the alerted articles and browse through other articles in the database; and enables them to order paper copies of any articles in the database. The system runs on a local area network that connects one or more scanning stations, a centralized document database server and multiple user stations running X Windows servers. The RightPages interface runs as an X Windows application on Sun workstations or X terminals. The system's image and document processing, including noise reduction, document layout analysis, text processing, and display processing are discussed.

186 citations

Patent•

Segmentation of text, picture and lines of a document image

John F Cullen¹, Koichi Ejiri¹•Institutions (1)

Ricoh¹

06 Apr 1992

TL;DR: A method and apparatus for segmenting a document image into areas containing text and non-text is presented. The method comprises the steps of: providing a bit-mapped representation of the document image; extracting run lengths for each scanline from that representation; constructing rectangles from the run lengths; initially classifying each rectangle as either text or non-text; correcting for the skew in the rectangles; merging associated text into one or more text blocks; and logically ordering the text blocks.

Abstract: In a character recognition system, a method and apparatus for segmenting a document image into areas containing text and non-text. Document segmentation in the present invention is comprised generally of the steps of: providing a bit-mapped representation of the document image; extracting run lengths for each scanline from the bit-mapped representation of the document image; constructing rectangles from the run lengths; initially classifying each of the rectangles as either text or non-text; correcting for the skew in the rectangles; merging associated text into one or more text blocks; and logically ordering the text blocks.

186 citations
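
The first steps listed in the patent abstract above (run lengths per scanline, then rectangles built from those runs) can be sketched as follows. This is a simplified illustration, not the patented procedure: the merging rule (a run joins a rectangle that reached the previous scanline and overlaps it horizontally) and the names `run_lengths` and `build_rectangles` are assumptions, and the later classification, skew correction, and ordering steps are left out.

```python
from typing import List, Tuple

Bitmap = List[List[int]]  # 1 = black pixel, 0 = white pixel


def run_lengths(scanline: List[int]) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, end exclusive, for each run of black pixels."""
    runs, start = [], None
    for x, v in enumerate(scanline):
        if v and start is None:
            start = x
        elif not v and start is not None:
            runs.append((start, x))
            start = None
    if start is not None:
        runs.append((start, len(scanline)))
    return runs


def build_rectangles(bitmap: Bitmap) -> List[Tuple[int, int, int, int]]:
    """Grow bounding rectangles (left, top, right, bottom) from scanline runs.

    right and bottom are exclusive. A run extends the first rectangle that
    reached the previous scanline (or was already extended on this one) and
    overlaps the run horizontally; otherwise it starts a new rectangle.
    """
    rects: List[List[int]] = []
    for y, line in enumerate(bitmap):
        for x0, x1 in run_lengths(line):
            for r in rects:
                reaches_here = r[3] == y or r[3] == y + 1
                if reaches_here and x0 < r[2] and r[0] < x1:
                    r[0], r[2], r[3] = min(r[0], x0), max(r[2], x1), y + 1
                    break
            else:
                rects.append([x0, y, x1, y + 1])
    return [tuple(r) for r in rects]
```

Because each run extends only the first matching rectangle, two rectangles that later turn out to be connected are not fused; a fuller implementation would add a merge pass.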

Journal Article•DOI•

From paper to office document standard representation

Andreas Dengel¹, Rainer Bleisinger¹, Rainer Hoch¹, Frank Fein¹, Frank Hönes¹•Institutions (1)

German Research Centre for Artificial Intelligence¹

01 Jul 1992-IEEE Computer

TL;DR: The principles of the model-based document analysis system called Pi ODA (paper interface to office document architecture), which was developed as a prototype for the analysis of single-sided business letters in German, are presented.

Abstract: The principles of the model-based document analysis system called Pi ODA (paper interface to office document architecture), which was developed as a prototype for the analysis of single-sided business letters in German, are presented. Initially, Pi ODA extracts a part-of hierarchy of nested layout objects such as text-blocks, lines, and words based on their presentation on the page. Subsequently, in a step called logical labeling, the layout objects and their compositions are geometrically analyzed to identify corresponding logical objects that can be related to a human perceptible meaning, such as sender, recipient, and date in a letter. A context-sensitive text recognition for logical objects is then applied using logical vocabularies and syntactic knowledge. As a result, Pi ODA produces a document representation that conforms to the ODA international standard.

168 citations

Journal Article•DOI•

Major components of a complete text reading system

Shuichi Tsujimoto¹, Haruo Asada¹•Institutions (1)

Toshiba¹

01 Jul 1992

TL;DR: Experiments have proved that the proposed approaches to document analysis and document understanding are robust even for multicolumned and multiarticle documents containing graphics and photographs, and that the proposed character segmentation/recognition method is robust enough to cope with omnifont characters which frequently touch each other.

Abstract: The document image processes used in a recently developed text reading system are described. The system consists of three major components: document analysis, document understanding, and character segmentation/recognition. The document analysis component extracts lines of text from a page for recognition. The document understanding component extracts logical relationships between the document constituents. The character segmentation/recognition component extracts characters from a text line and recognizes them. Experiments on more than a hundred documents have proved that the proposed approaches to document analysis and document understanding are robust even for multicolumned and multiarticle documents containing graphics and photographs, and that the proposed character segmentation/recognition method is robust enough to cope with omnifont characters which frequently touch each other.

100 citations

Proceedings Article•DOI•

Image and document processing techniques for the RightPages electronic library system

Lawrence O'Gorman¹•Institutions (1)

Bell Labs¹

30 Aug 1992

TL;DR: Three techniques are described: noise reduction for binary document pages to improve page appearance and subsequent optical character recognition and compression; subsampling of the text image to fit on the computer screen while maintaining readability; and document layout analysis to determine text blocks.

Abstract: Describes some of the document processing techniques used in the RightPages electronic library system. Since the system deals with scanned images of document pages, these techniques are critical to the use and appearance of the system. The author describes three techniques: (1) for noise reduction from binary document pages to improve page appearance and subsequent optical character recognition and compression; (2) for subsampling the text image to fit on the computer screen while maintaining readability; and (3) a document layout analysis technique to determine text blocks.

95 citations

Proceedings Article•DOI•

A fast and efficient method for extracting text paragraphs and graphics from unconstrained documents

Franck Lebourgeois¹, Z. Bublinski, H. Emptoz•Institutions (1)

Institut National des Sciences Appliquées de Lyon¹

30 Aug 1992

TL;DR: Outlines a fast and efficient method for extracting graphics and text paragraphs from printed documents, based on a bottom-up approach to document analysis, which achieves very good performance in most cases.

Abstract: Outlines a fast and efficient method for extracting graphics and text paragraphs from printed documents. The method presented is based on a bottom-up approach to document analysis and it achieves very good performance in most cases. During preprocessing, characters are linked together to form blocks. The created blocks are segmented, labelled and merged into paragraphs. Simultaneously, graphics are extracted from the image. Algorithms for each step of processing are presented. The experimental results obtained are also included.

59 citations
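
The bottom-up step mentioned in the abstract above, where characters are linked together to form blocks, can be illustrated with a simple proximity grouping. The abstract does not say how the linking is done, so the rule used here (two boxes are linked when the gap between them is at most `max_gap` in both directions) and the name `link_boxes` are purely assumptions for the sake of a runnable sketch.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def link_boxes(boxes: List[Box], max_gap: int) -> List[Box]:
    """Group character boxes lying within max_gap of each other into blocks.

    Returns the bounding box of each group, sorted for stable output.
    """
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def near(a: Box, b: Box) -> bool:
        dx = max(a[0], b[0]) - min(a[2], b[2])  # horizontal gap, negative if overlapping
        dy = max(a[1], b[1]) - min(a[3], b[3])  # vertical gap, negative if overlapping
        return dx <= max_gap and dy <= max_gap

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if near(boxes[i], boxes[j]):
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i, b in enumerate(boxes):
        g = groups.setdefault(find(i), list(b))
        g[0], g[1] = min(g[0], b[0]), min(g[1], b[1])
        g[2], g[3] = max(g[2], b[2]), max(g[3], b[3])
    return sorted(tuple(g) for g in groups.values())
```

The pairwise comparison is quadratic in the number of boxes, which is fine for a demonstration; a method described as fast would presumably rely on something cheaper, such as working on a reduced-resolution image or a spatial index.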

Patent•

Method and apparatus for summarizing a document without document image decoding

M. Margaret Withgott¹, Steven C. Bagley¹, Dan S. Bloomberg¹, Per-Kristian Halvorsen¹, Daniel P. Huttenlocher¹, Todd A. Cass¹, Ronald M. Kaplan¹, Ramana B. Rao¹•Institutions (1)

Xerox¹

01 Sep 1992

TL;DR: A method and apparatus for excerpting and summarizing an undecoded document image, without first converting the document image to optical character codes such as ASCII text, identifies significant words, phrases and graphics in the document image using automatic or interactive morphological image recognition techniques.

Abstract: A method and apparatus for excerpting and summarizing an undecoded document image, without first converting the document image to optical character codes such as ASCII text, identifies significant words, phrases and graphics in the document image using automatic or interactive morphological image recognition techniques. Document summaries or indices are produced based on the identified significant portions of the document image. The disclosed method is particularly suited to the improvement of reading machines for the blind.

54 citations

Journal Article•DOI•

Document analysis-from pixels to contents

Jürgen Schürmann¹, Norbert Bartneck¹, Thomas Bayer¹, Jürgen Franke¹, E. Mandler¹, Matthias Oberländer¹•Institutions (1)

Daimler AG¹

01 Jul 1992

TL;DR: The authors present a conceptual framework for solving the task of document analysis, which, in essence, consists in the conversion of the document's pixel representation into an equivalent knowledge network representation holding the document's content and layout.

Abstract: The authors present a conceptual framework for solving the task of document analysis, which, in essence, consists in the conversion of the document's pixel representation into an equivalent knowledge network representation holding the document's content and layout. Starting on the pixel level, the formation of elementary geometric objects on which layout analysis as well as the definition of character objects is based is described. Character recognition accomplishes the mapping from geometric object to character meaning in ASCII representation. On the next level of abstraction words are formed and verified by contextual processing. Modeled knowledge about complete documents and about how their constituents are related to the application forms the highest level of abstraction. The various problems arising at each stage are discussed. The dependencies between the different levels are exemplified and technical solutions put forward.

49 citations

Patent•

Text/image separation method

Te-Mei Wang¹, Po-Chih Wang¹, King-Lung Huang¹•Institutions (1)

Industrial Technology Research Institute¹

02 Dec 1992

TL;DR: In this article, a text/image separation method is proposed which handles the text segment and the image of a document to be printed in a separate and parallel fashion to provide a better printing quality of the document.

Abstract: A text/image separation method is disclosed which handles the text segment and the image of a document to be printed in a separate and parallel fashion to provide a better printing quality of the document.

Patent•

Layout method for structured documents

Sylvia Allouche, Francoise Lopez, Rachid Charquaoui

17 Jul 1992

TL;DR: In this paper, a layout method for formatting and reprocessing of structured documents containing text and graphics is employed by a formatting module, which first performs reading of the generic logical structure, the specific logical structure and the generic layout structure, in order to create a specific layout structure gradually by a plurality of recursive layout processes.

Abstract: A layout method for formatting and reprocessing of structured documents containing text and graphics is employed by a formatting module. The method first performs reading of the generic logical structure, the specific logical structure, and the generic layout structure, in order to create a specific layout structure gradually by a plurality of recursive layout processes.

Patent•

Document reading apparatus having a function of determining effective document region based on a detected data

Noriyuki Okisu¹, Shinya Matsuda¹, Satoshi Nakamura¹, Jun Minakuti¹•Institutions (1)

Minolta¹

01 Dec 1992

TL;DR: A document reading apparatus is described that can determine an effective image pickup area, free of objects such as the operator's hands or fingers pressing the document, and rectify image data prior to the imaging operation, making use of the difference between such objects and the document in chromaticity, luminous density, and the like.

Abstract: A document reading apparatus which can determine an effective image pickup area containing no object such as operator's hands or fingers pressing a document and rectify image data prior to imaging operation, making use of a difference of the object from the document in chromaticity, luminous density, and the like.

Patent•

Method and apparatus for editing documents

Mika Fukui¹, Isamu Iwai¹, Miwako Doi¹, Yoichi Takebayashi¹•Institutions (1)

Toshiba¹

31 Mar 1992

TL;DR: An apparatus and method for editing a document to automatically produce a satisfactory, well-ordered layout is described. It includes the steps of extracting characteristic quantities which characterize different elements of the document, deriving relationships among the different elements in accordance with the characteristic quantities, determining a layout of the different elements in accordance with the relationships, and processing the document in accordance with the layout.

Abstract: An apparatus and method for editing a document to automatically produce a satisfactory, well ordered layout which includes the steps of (a) extracting characteristic quantities which characterize different elements of the document; (b) deriving relationships among the different elements of the document in accordance with the characteristic quantities; (c) determining a layout of the different elements of the document in accordance with the relationships; and (d) processing the document in accordance with the layout.

Book Chapter•DOI•

Document Image Analysis and Recognition

Sargur N. Srihari, Stephen W. Lam, Peter B. Cullen, Tin Kam Ho

01 Dec 1992

Journal Article•DOI•

DRS: a workstation-based document recognition system for text entry

Tomio Amano¹, Akio Yamashita¹, N. Itoh¹, Y. Kobayashi¹, Shin Katoh¹, Kazuharu Toyokawa¹, Hiroyasu Goh Greenhill Takahashi¹•Institutions (1)

IBM¹

01 Jul 1992-IEEE Computer

TL;DR: A workstation-based prototype document analysis system that uses optical character recognition (OCR) and provides functions for image capture, block segmentation, page structure analysis, and character recognition with contextual postprocessing, as well as a user interface for error correction.

Abstract: Document recognition system (DRS), a workstation-based prototype document analysis system that uses optical character recognition (OCR), is described. The system provides functions for image capture, block segmentation, page structure analysis, and character recognition with contextual postprocessing, as well as a user interface for error correction. All the functions except image capture and character recognition have been implemented by means of software for the Japanese edition of OS/2.

Journal Article•DOI•

Document image analysis techniques

Rangachar Kasturi, Lawrence O'Gorman

01 Jun 1992

Proceedings Article•DOI•

A cooperative document understanding method among multiple recognition procedures

Toyohide Watanabe¹, Q. Luo¹, Noboru Sugie¹•Institutions (1)

Nagoya University¹

30 Aug 1992

TL;DR: This paper proposes a more advanced method based on the spatial relationships among neighboring segments of compositive items, in addition to the geometric aspects, for document understanding.

Abstract: The main objective of document understanding is to extract and classify the meaningful data automatically from documents. Some research on this issue has already been reported. However, these methods are not always successful because the recognition procedures analyze document images on the basis of only the physical coordinate values of compositive items. This paper proposes a more advanced method based on the spatial relationships among neighboring segments of compositive items, in addition to the geometric aspects. The knowledge about documents is not a single layer, but is organized in multiple levels: knowledge about layout structures, knowledge about item sequences and knowledge about item properties. These three kinds of knowledge are not only specified hierarchically, but also interrelated across the layout recognition, item recognition and character recognition procedures.

Proceedings Article•DOI•

Layout-by-example: a fuzzy visual language for specifying stereotypes of diagram layout

K. Sugihara¹, K. Yamamoto, K. Takeda, Mitsuyuki Inaba•Institutions (1)

University of Hawaii¹

15 Sep 1992

TL;DR: This paper presents a new approach to automatic layout of diagrams: layout-by-example, in which a layout is produced by applying the layout rules which are generated from layout examples called stereotypes.

Abstract: This paper presents a new approach to automatic layout of diagrams: layout-by-example. In this approach, a layout is produced by applying the layout rules which are generated from layout examples called stereotypes. A fuzzy visual language is proposed for specifying stereotypes of diagram layout. The concept of fuzzy theory is incorporated into parsing visual sentences representing stereotypes and generating layout rules from the stereotypes. A layout produced by applying layout rules may be modified manually and such modifications on the layout can be used as counterexamples to the existing layout rules so that the tool can tune the layout rules. A prototype of an automatic layout tool based on layout-by-example is implemented in Common Lisp.

Patent•

Method and apparatus for generating a layout model to define objects of a document image

Akio Yamashita¹, Kazuharu Toyokawa¹•Institutions (1)

IBM¹

02 Dec 1992

TL;DR: A tree structure and layout model are generated by automatically extracting the tree structure through document image analysis before the user makes graphical corrections: the input document image is physically analyzed to extract separators that are highly likely to separate the objects of the document, and the image is segmented into a plurality of areas (51A through 51G).

Abstract: To provide a method for extracting a tree structure by using image analysis results of an actual document and generating a flexible layout model. A tree structure and layout model are newly generated by automatically extracting the tree structure in accordance with document image analysis before a user executes graphical correction. That is, the inputted document image 51 is physically analyzed to extract a separator with a high possibility to separate the objects of the document and segment the above document image into a plurality of areas (51A through 51G) in accordance with the information for the separator. Then, the area segmentation is displayed on a display unit 13 together with the document image 51 and interactively corrected by the user to define a desired tree structure and complete a flexible layout model 80 by setting a parameter to each of the nodes (61A through 61G) of the tree structure.

Journal Article•DOI•

Document recognition: concepts and implementations

Nenad Marovac

01 Dec 1992-ACM SIGOIS Bulletin

TL;DR: This paper formalizes the concept of document recognition and presents a High Level Document Recognition method, along with the experience gained in developing and using a number of implementations of the method.

Abstract: Document recognition is a task in which a document in its physical presentation format is transformed into a structured author-oriented model of the document. The presentation format can be bitmaps of document pages, a description of the document in a Page Description Language (PDL), or encoding of the document in a printer or graphics language. The structured model is a format allowing for addition to the document, manipulation of the document, and reformatting the layout and the output appearance of the document. Fully automatic document recognition is not possible, in general, for the same reason that it is not possible to de-translate computer programs automatically. However, it is possible to develop a man-assisted semi-automatic document recognition method. This method uses two passes. The first pass is completely automatic; it produces a document format called Interactive Document Model. The Interactive Document Model comprises recognized typesetting and descriptive structures together with derived ODA logical and layout structures for the document. The model generated in the first pass is enough for most purposes and applications. However, if it is not acceptable, the user can then enter the second pass and interactively edit the logical structure. This paper has three objectives. The first is to formalize the concept of document recognition. The second is to subdivide the problem of document recognition and classify it into a number of subproblems, each dealing with different aspects of the problem. The third objective is to introduce a problem which we wish to solve, and then to present a High Level Document Recognition method and the experience in developing and using a number of implementations of the method.

Journal Article•

A Model Based Layout Understanding Method for Document Images

Akio Yamashita, Tomio Amano

25 Oct 1992-Transactions of the Institute of Electronics, Information and Communication Engineers

Proceedings Article•DOI•

A modified contour following algorithm applied to document segmentation

E. Trupin, Y. Lecourtier

30 Aug 1992

TL;DR: A generalized contour following technique is presented that can constitute the main tool of document analysis software and is applied to the segmentation of documents in order to isolate blocks of text or other document components.

Abstract: A generalized contour following technique is presented that can constitute the main tool of document analysis software. Progression along the contour proceeds by zone testing instead of the classical pixel-to-pixel displacement. This detects a modified contour which is in fact the envelope of an area containing elements that are sufficiently close to one another. This characteristic is then applied to the segmentation of documents in order to isolate blocks of text or other document components. The selection of parameters is also discussed.

Proceedings Article•DOI•

Model based system for analyzing document images

Koichi Kise, M. Yamaoka, Noboru Babaguchi, Yoshikazu Tezuka

30 Aug 1992

TL;DR: A knowledge-based system for document image analysis is proposed that is applicable to various kinds of documents and aims at high expressivity and maintainability of the knowledge description.

Abstract: Document image analysis is the process of deriving a logically structured representation of a document by analyzing the layout structure of its image. This paper proposes a knowledge-based system for document image analysis which is applicable to various kinds of documents. The characteristics of the system are as follows: (1) the knowledge base, called the document model, encodes only object-level knowledge hierarchically, declaratively and symbolically, aiming at high expressivity and maintainability of the knowledge description; (2) the document model is automatically constructed by referring to samples of document images, and incrementally refined by feedback of error information from the analysis.

Patent•

Method for shaping document

Miwako Doi, Miyoshi Fukui, Isamu Iwai

25 Mar 1992

TL;DR: A format feature is detected from document data supplied by an input part 1, the logical structure of the document data is analyzed by a logical structure analyzing part 5 and stored in a logical structure storing part 6, and at least one of the character interval and the line pitch in the stored logical structure is changed according to a shaping rule stored in a shaping rule dictionary.

Abstract: PURPOSE: To determine an output format restricted by a specified page condition by determining a document output format based upon the volume of document data. CONSTITUTION: A format feature is detected from document data inputted from an input part 1; the logical structure of the document data is analyzed by a logical structure analyzing part 5 and stored in a logical structure storing part 6; and at least one of a character interval and a line pitch in the stored logical structure is changed based upon a shaping rule stored in a shaping rule dictionary 7, so that the document data fit within a prescribed page and are easy to read.

Patent•

System and method for editing a document image

Thomas Acquaviva¹•Institutions (1)

Xerox¹

14 Dec 1992

TL;DR: In this paper, a system and method of designating edit information for an original document is presented, which includes a mechanism for designating a location on the original document while the document is in a document feeder tray.

Abstract: A system and method of designating edit information for an original document. The system includes a mechanism for designating a location on the original document while the document is in a document feeder tray. The mechanism does not deface the original document.

Proceedings Article•DOI•

Model-based control strategy for document image analysis

Frank Fein, Frank Hoenes

01 Aug 1992

TL;DR: A model for the control strategy of a document image analysis system is presented, together with mechanisms for its interpretation. The model describes which specialist can be applied to which object in which analysis state, and comprises all possible sequences of processing steps that are relevant for the analysis tasks.

Abstract: Generally, document analysis and understanding involves many processing steps, like unskewing, segmentation, logical labeling, text recognition, and text analysis. Most of these steps can be subdivided into different tasks depending on the problem-solving methods available. All of the techniques are more or less specialized to certain input, but some are also competitive. As a consequence, a document analysis system incorporating many analysis methods must properly schedule and control these methods to obtain an optimal result. In this paper, we present a model for the control strategy of a document image analysis system as well as mechanisms for its interpretation that describe three important aspects: which specialist can be applied to which object in which analysis state. The analysis model comprises all possible sequences of processing steps which are relevant for the analysis tasks. The underlying document architecture supports the analysis specialists by corresponding knowledge and provides a framework for representing the analysis results. © (1992) COPYRIGHT SPIE--The International Society for Optical Engineering. Downloading of the abstract is permitted for personal use only.

Language-based document processing

Dennis S. Arnon, Isabelle Attali, Paul Franchi-Zannettacci

01 Jan 1992

TL;DR: The Centaur system automatically generates structured environments for Tioga and LaTeX documents and conversions between them from the specifications of the logical and physical structures of the Article document class.

Abstract: This paper proposes an application of programming environment generation to structured document manipulation. We use Centaur as a formal tool to model and implement logical and physical structure, logical editing and layout processing, document analysis, re-use and conversion for a sample class of documents: scientific articles including equations and figures. To make connections with real document systems, we choose to give two particular external forms to the logical structure: Tioga source and LaTeX source. From the specifications of the logical and physical structures of the Article document class on the one hand, and, on the other hand, the specification of the layout processing (viewed as its semantics according to the Tioga or the LaTeX layout model) and other semantic tools, the Centaur system automatically generates structured environments for Tioga and LaTeX documents and conversions between them.

Patent•

Document output method

Miwako Doi, Isamu Iwai, Toshio Okamoto

25 Mar 1992

TL;DR: In this article, a header sentence is extracted from document data by referring to a header dictionary 6a and a header rule dictionary 7a for document data given from an input part 2.

Abstract: PURPOSE: To appropriately distribute the development of the output of document data into plural frames so as to execute a document processing. CONSTITUTION: A header sentence is extracted from document data by referring to a header dictionary 6a and a header rule dictionary 7a for document data given from an input part 2. The document structure of respective sentences divided into the header sentences and the following texts is judged by referring to a document structure rule dictionary 8a, and hierarchical logical structure shown by the document structure of document data is obtained in accordance with the document structure. When developed document data exceeds the storage range of an output destination frame in accordance with a layout rule corresponding to the document structure of the respective sentences shown by the logical structure, control for switching the output destination of data which is successively developed to the equal frame of a next attribute is executed.

Book Chapter•DOI•

Layout and Logical Structure Recognition

Pascal Lobbrecht, Xavier Blanca, Luc Sonke

01 Jan 1992

TL;DR: Much research is being carried out worldwide on the segmentation of characters and graphics as well as the search for a layout structure.

Abstract: Much research is being carried out worldwide on the segmentation of characters and graphics as well as the search for a layout structure. Various approaches are introduced in the following section.