Topic

Document layout analysis

About: Document layout analysis is a research topic. Over its lifetime, 1462 publications have been published within this topic, receiving 34021 citations.


Papers
Proceedings Article
H. Yashiro, Takuhiro Murakami, Yoshihiro Shima, Yasushi Nakano, Hiromichi Fujisawa
10 Apr 1989
TL;DR: It is shown experimentally that the logical structure of a technical paper can be extracted. The principal components of the method are extraction of logical structure elements using a rectangular set operation and generation of hierarchical links of the logical structure between the extracted document elements.
Abstract: A method of document structure extraction using generic layout knowledge is described. With this method, it is possible to translate images of multimedia documents, i.e. documents that include pictures, graphics, and color information, to hypertext. Hypertext consists of decomposed elements linked with each other through some logical relationship. The principal components of the method are extraction of logical structure elements using a rectangular set operation and generation of hierarchical links of the logical structure between the extracted document elements. It is shown experimentally that the logical structure of a technical paper can be extracted.

20 citations

Patent
07 Dec 1990
TL;DR: A method and an apparatus for document formatting are described that reflect the preference of the operator and the overall balance, so that the desired formatting can be obtained efficiently without tedious post-processing operations.
Abstract: A method and an apparatus for document formatting, capable of reflecting the preference of the operator and the overall balance such that the desired formatting can be obtained efficiently without tedious post-processing operations. In the apparatus, document data representing the document including figure data representing figure elements of the document, and region data indicating layout region to which the document is to be laid out are inputted, candidate layouts for each figure element to be laid out are generated, one of the generated candidate layouts is selected, and the document is formatted in the layout region, according to the selected one of the candidate layouts.

20 citations

Proceedings Article
20 Sep 1999
TL;DR: This paper presents the two-phased skew estimation algorithm and the adaptive document block segmentation and classification techniques for WISDOM++.
Abstract: WISDOM++ is a document analysis system whose main design requirements are real-time user interaction and adaptivity. This paper presents the two-phased skew estimation algorithm and the adaptive document block segmentation and classification techniques. An evaluation of the performance of some of these tasks is also conducted according to a benchmarking procedure.

19 citations

01 Jan 2013
TL;DR: A framework for classifying document image retrieval approaches is proposed, and these approaches are then evaluated based on important measures.
Abstract: Over the last decades, advances in information and communication technology and the growing volume of printed documents in many applications have made document image databases increasingly important. Document images are documents that normally begin on paper and are then scanned electronically and stored as images, as part of the move towards a paperless office. Document image retrieval is an important research area in the field of document image databases, and many approaches have been proposed for indexing and retrieving document images. Traditionally, optical character recognition (OCR) has been used to convert the manuscript completely to an electronic version that can be indexed automatically. Keyword spotting was later proposed for indexing document images; it has a lower cost than OCR. However, both methods have problems indexing document images with non-text components. Three approaches have been presented to solve this problem: the signature-based approach, the layout-structure-based approach, and the logo-based approach. In this paper we propose a framework for classifying document image retrieval approaches, and then we evaluate these approaches based on important measures.

19 citations

Proceedings Article
26 Jul 2009
TL;DR: A way to semi-automatically generate ground-truthed datasets for newspapers and provide a comprehensive dataset for layout analysis ground truth is proposed.
Abstract: In document image understanding, public datasets with ground truth are an important part of scientific work. They are not only helpful for developing new methods, but also provide a way of comparing performance. Generating these datasets, however, is time-consuming and cost-intensive work, requiring a lot of manual effort. In this paper we both propose a way to semi-automatically generate ground-truthed datasets for newspapers and provide a comprehensive dataset. The focus of this paper is layout analysis ground truth. The proposed two-step approach consists of a module that automatically creates layouts and an image matching module that maps the ground truth information from the synthetic layout to the scanned version. In the first step, layouts are generated automatically from a news corpus. The output consists of a digital newspaper (PDF file) and an XML file containing geometric and logical layout information. In the second step, the PDF files are printed, scanned and aligned with the synthetic image obtained by rendering the PDF. Finally, the geometric and logical layout ground truth is mapped onto the scanned image.

19 citations
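
The final step of the newspaper ground-truth abstract above, mapping layout regions from the synthetic image onto the scanned image, can be illustrated with a small sketch. It assumes that the alignment between the two images has already been estimated and can be expressed as a 2D affine transform; the abstract does not say which transform model or matching method the authors use, so the names `apply_affine` and `map_box` and the six-parameter representation are hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
# Assumed affine model: x' = a*x + b*y + tx, y' = c*x + d*y + ty
Affine = Tuple[float, float, float, float, float, float]  # (a, b, tx, c, d, ty)


def apply_affine(m: Affine, p: Point) -> Point:
    """Map one point from synthetic-image to scanned-image coordinates."""
    a, b, tx, c, d, ty = m
    x, y = p
    return (a * x + b * y + tx, c * x + d * y + ty)


def map_box(m: Affine, box: Box) -> Box:
    """Map a ground-truth region and return its axis-aligned bounding box.

    All four corners are transformed because rotation or shear can move
    any of them to the extremes of the mapped region.
    """
    x0, y0, x1, y1 = box
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    mapped = [apply_affine(m, c) for c in corners]
    xs = [p[0] for p in mapped]
    ys = [p[1] for p in mapped]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical alignment: the scan is twice the resolution of the
# rendered PDF and shifted by (10, 5) pixels.
scan_from_synthetic: Affine = (2, 0, 10, 0, 2, 5)
headline_region: Box = (0, 0, 100, 50)
print(map_box(scan_from_synthetic, headline_region))
```

Returning an axis-aligned box keeps the mapped ground truth in the same rectangular form as the input; a dataset that needs exact region outlines under rotation would store the four mapped corners instead.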


Network Information
Related Topics (5)
Feature extraction
111.8K papers, 2.1M citations
82% related
Feature (computer vision)
128.2K papers, 1.7M citations
82% related
Object detection
46.1K papers, 1.3M citations
81% related
Image segmentation
79.6K papers, 1.8M citations
80% related
Convolutional neural network
74.7K papers, 2M citations
79% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    5
2022    19
2021    34
2020    19
2019    14
2018    9