Topic

Document layout analysis

About: Document layout analysis is a research topic. Over its lifetime, 1,462 publications have appeared within this topic, receiving 34,021 citations.


Papers
Book Chapter
Shoubin Li, Xuyan Ma, Shuaiqun Pan, Jun Hu, Lin Shi, Qing Wang
08 Nov 2021
TL;DR: The authors propose VTLayout, a model that fuses a document's deep visual, shallow visual, and text features to localize and identify blocks of different categories.
Abstract: Documents often contain complex physical structures, which make the Document Layout Analysis (DLA) task challenging. As a pre-processing step for content extraction, DLA has the potential to capture rich information in historical or scientific documents on a large scale. Although many deep-learning-based methods from computer vision have already achieved excellent performance in detecting Figure from documents, they are still unsatisfactory in recognizing the List, Table, Text and Title category blocks in DLA. This paper proposes a VTLayout model fusing the documents’ deep visual, shallow visual, and text features to localize and identify different category blocks. The model mainly includes two stages, and the three feature extractors are built in the second stage. In the first stage, the Cascade Mask R-CNN model is applied directly to localize all category blocks of the documents. In the second stage, the deep visual, shallow visual, and text features are extracted for fusion to identify the category blocks of documents. As a result, we strengthen the classification power of different category blocks based on the existing localization technique. The experimental results show that the identification capability of the VTLayout is superior to the most advanced method of DLA based on the PubLayNet dataset, and the F1 score is as high as 0.9599.

7 citations
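
The F1 score quoted in the VTLayout abstract is the harmonic mean of precision and recall. As a minimal sketch, assuming per-category counts of true positives, false positives, and false negatives are available: the function name and the sample counts below are illustrative and not taken from the paper, and the abstract does not say how scores are averaged across categories.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for one block category."""
    if tp == 0:
        # No correct detections: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for a single category (not from the paper).
example = f1_score(tp=8, fp=2, fn=2)
```

Because the harmonic mean is dominated by the smaller of the two values, a high F1 requires both precision and recall to be high.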

Patent
03 Jan 1990
TL;DR: A general layout structure of a document lets a receiver optimize its processing by identifying the layout presentation constructs that may appear in a subsequent specific instance of the conforming document.
Abstract: A method is disclosed for utilizing a general layout structure of a document which contains relationships within its layout constructs that offer choices when creating the document and conforming instances of logical elements with the general layout structure, taking into account specific device characteristics, to generate the final-form document. The relationships are defined as expressions similar to those existing in general logical document structure definitions. Thus, an intermediate phase of document interchange between revision and final-form is introduced which saves data transmission time and gives the receiver some flexibility in presentation options while still conforming to a general layout definition. Further, the general layout definition may be used by a receiver to optimize its processing by identifying the possible layout presentation constructs appearing in the subsequent specific instance of the conforming document.

7 citations

Dissertation
06 Dec 2012
TL;DR: A number of improvements are demonstrated: on separating text columns when one is situated very close to the other; on preventing the contents of a table cell from being merged with the contents of adjacent cells; and on preventing regions inside a frame from being merged with surrounding text regions, especially side notes, even when the latter are written in a font similar to that of the text body.
Abstract: Document page segmentation is one of the most crucial steps in document image analysis. It ideally aims to explain the full structure of any document page, distinguishing text zones, graphics, photographs, halftones, figures, tables, etc. Although several attempts have been made to date to achieve correct page segmentation results, many difficulties remain. The leader of the project in the framework of which this PhD work has been funded (*) uses a complete processing chain in which page segmentation mistakes are manually corrected by human operators. Aside from the costs it represents, this demands tuning of a large number of parameters; moreover, some segmentation mistakes sometimes escape the vigilance of the operators. Current automated page segmentation methods are well accepted for clean printed documents, but they often fail to separate regions in handwritten documents when the document layout structure is loosely defined or when side notes are present inside the page. Moreover, tables and advertisements bring additional challenges for region segmentation algorithms. Our method addresses these problems. The method is divided into four parts: (1) Unlike most popular page segmentation methods, we first separate text and graphics components of the page using a boosted decision tree classifier. (2) The separated text and graphics components are used, among other features, to separate columns of text in a two-dimensional conditional random fields framework. (3) A text line detection method based on piecewise projection profiles is then applied to detect text lines with respect to text region boundaries. (4) Finally, a new paragraph detection method, which is trained on the common models of paragraphs, is applied to text lines to find paragraphs based on the geometric appearance of text lines and their indentations. Our contribution over existing work lies in essence in the use, or adaptation, of algorithms borrowed from the machine learning literature to solve difficult cases. Indeed, we demonstrate a number of improvements: on separating text columns when one is situated very close to the other; on preventing the contents of a cell in a table from being merged with the contents of other adjacent cells; and on preventing regions inside a frame from being merged with other surrounding text regions, especially side notes, even when the latter are written in a font similar to that of the text body. Quantitative assessment, and comparison of the performance of our method with competing algorithms using widely acknowledged metrics and evaluation methodologies, is also provided to a large extent. (*) This PhD thesis has been funded by Conseil General de Seine-Saint-Denis, through the FUI6 project Demat-Factory, led by Safig SA.

7 citations
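
The text line detection step in the dissertation above relies on piecewise projection profiles. The sketch below illustrates the general idea only, under stated assumptions: the page is a binary image given as a list of rows (1 = ink), it is cut into fixed-width vertical strips, and a text line is any run of rows with ink inside a strip. The function names, the fixed-width strips, and the zero threshold are illustrative choices, not the thesis's actual method.

```python
def row_profile(image, x0, x1):
    """Count ink pixels in each row, restricted to columns [x0, x1)."""
    return [sum(row[x0:x1]) for row in image]


def runs_above(profile, threshold=0):
    """Return (start, end) row ranges, end exclusive, where profile > threshold."""
    runs, start = [], None
    for y, value in enumerate(profile):
        if value > threshold and start is None:
            start = y
        elif value <= threshold and start is not None:
            runs.append((start, y))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def piecewise_text_lines(image, n_strips=2):
    """Detect text-line row ranges separately in each vertical strip."""
    width = len(image[0])
    result = []
    for i in range(n_strips):
        x0 = i * width // n_strips
        x1 = (i + 1) * width // n_strips
        result.append(((x0, x1), runs_above(row_profile(image, x0, x1))))
    return result


# Toy page: one text line in the left half, one lower down in the right half.
page = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
lines = piecewise_text_lines(page, n_strips=2)
```

Computing the profile per strip rather than over the full page width keeps lines in adjacent regions from blurring into one another, which is the usual motivation for the piecewise variant.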

Journal Article
TL;DR: Algorithms for the automated segmentation and classification of layout structures in electronic documents are presented and the key idea is to use the patterns in the distribution of white space in a document to recognize and interpret its components.

7 citations

Proceedings Article
J. Tatemura
23 Apr 1997
TL;DR: An interactive document keyword layout technique that enables visual browsing and manipulation of a collection of documents. It applies a force-directed graph drawing algorithm and clusters documents and keywords by reacting dynamically to the user's interaction.
Abstract: We propose an interactive document keyword layout technique that enables browsing and manipulation of a collection of documents visually. This layout technique applies a force directed graph drawing algorithm and clusters documents and keywords by reacting to a user's interaction dynamically. An example of visual interaction is demonstrated on an experimental system.

7 citations
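
The keyword layout entry above builds on force-directed graph drawing. As a rough sketch of that family of algorithms, and not of the authors' interactive system, the code below assumes a generic spring embedder: every pair of nodes repels with force k²/d, and linked document-keyword pairs attract with force d²/k. The function name, parameters, and sample graph are hypothetical.

```python
import math
import random


def force_directed_layout(nodes, edges, iterations=200, k=1.0, step=0.05, seed=0):
    """Place nodes in 2D: all pairs repel, linked pairs attract."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}
    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair keeps unrelated items apart.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                force[a][0] += dx / d * f
                force[a][1] += dy / d * f
                force[b][0] -= dx / d * f
                force[b][1] -= dy / d * f
        # Attraction along links pulls related items together.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            force[a][0] -= dx / d * f
            force[a][1] -= dy / d * f
            force[b][0] += dx / d * f
            force[b][1] += dy / d * f
        # Move each node a small, capped distance along its net force.
        for n in nodes:
            fx, fy = force[n]
            mag = math.hypot(fx, fy)
            if mag > 0:
                move = min(step * mag, 0.1)
                pos[n][0] += fx / mag * move
                pos[n][1] += fy / mag * move
    return pos


# Hypothetical collection: two documents, each linked to one keyword.
nodes = ["doc1", "doc2", "layout", "graph"]
edges = [("doc1", "layout"), ("doc2", "graph")]
pos = force_directed_layout(nodes, edges)
```

In an interactive setting, the same loop could be re-run from the current positions after a user drags a node or changes a link, so the clusters rearrange incrementally; that extension is not shown here.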


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 82% related
Feature (computer vision): 128.2K papers, 1.7M citations, 82% related
Object detection: 46.1K papers, 1.3M citations, 81% related
Image segmentation: 79.6K papers, 1.8M citations, 80% related
Convolutional neural network: 74.7K papers, 2M citations, 79% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  5
2022  19
2021  34
2020  19
2019  14
2018  9