Topic

Document layout analysis

About: Document layout analysis is a research topic. Over its lifetime, 1462 publications have appeared within this topic, receiving 34021 citations.

Papers

Book Chapter•DOI•

VTLayout: Fusion of Visual and Text Features for Document Layout Analysis

Shoubin Li¹, Xuyan Ma¹, Shuaiqun Pan¹, Jun Hu¹, Lin Shi¹, Qing Wang¹•Institutions (1)

Chinese Academy of Sciences¹

08 Nov 2021

TL;DR: The authors propose VTLayout, a model that fuses a document's deep visual, shallow visual, and text features to localize and identify different category blocks.

Abstract: Documents often contain complex physical structures, which make the Document Layout Analysis (DLA) task challenging. As a pre-processing step for content extraction, DLA has the potential to capture rich information in historical or scientific documents on a large scale. Although many deep-learning-based methods from computer vision have already achieved excellent performance in detecting Figure from documents, they are still unsatisfactory in recognizing the List, Table, Text and Title category blocks in DLA. This paper proposes a VTLayout model fusing the documents’ deep visual, shallow visual, and text features to localize and identify different category blocks. The model mainly includes two stages, and the three feature extractors are built in the second stage. In the first stage, the Cascade Mask R-CNN model is applied directly to localize all category blocks of the documents. In the second stage, the deep visual, shallow visual, and text features are extracted for fusion to identify the category blocks of documents. As a result, we strengthen the classification power of different category blocks based on the existing localization technique. The experimental results show that the identification capability of the VTLayout is superior to the most advanced method of DLA based on the PubLayNet dataset, and the F1 score is as high as 0.9599.
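
The second stage described above (extract three kinds of features for an already localized block, fuse them, then identify the block's category) can be illustrated with a minimal sketch. The abstract does not say how the features are fused or classified, so everything here is an assumption: fusion is done by concatenating L2-normalized vectors, a nearest-centroid rule stands in for a trained classifier, and the feature values, the `fuse` and `classify` helpers, and the centroids are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else list(vec)


def fuse(deep: Sequence[float], shallow: Sequence[float],
         text: Sequence[float]) -> List[float]:
    """Assumed fusion rule: concatenate the three normalized feature vectors."""
    return l2_normalize(deep) + l2_normalize(shallow) + l2_normalize(text)


def classify(fused: Sequence[float], centroids: Dict[str, List[float]]) -> str:
    """Nearest-centroid label; a stand-in for whatever classifier is trained."""
    return min(centroids, key=lambda label: math.dist(fused, centroids[label]))


# Hypothetical class centroids in the fused feature space.
centroids = {
    "Title": fuse([1, 0], [1, 0], [1, 0]),
    "List": fuse([0, 1], [0, 1], [0, 1]),
}

# Hypothetical features for one block localized in the first stage.
block = fuse(deep=[0.9, 0.1], shallow=[0.2, 0.8], text=[0.7, 0.3])
label = classify(block, centroids)
print(label)
```

Normalizing each vector before concatenation keeps one feature type from dominating the distance purely because of its scale; this is a design choice of the sketch, not something stated in the abstract.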

7 citations

Patent•

Utilization of a presentation document structure for interchange.

Barbara Ann Barker¹, Thomas R. Edel¹, Jeffrey A. Stark¹•Institutions (1)

IBM¹

03 Jan 1990

TL;DR: A general layout definition of a document can be used by a receiver to optimize its processing by identifying the possible layout presentation constructs appearing in the subsequent specific instance of the conforming document.

Abstract: A method is disclosed for utilizing a general layout structure of a document which contains relationships within its layout constructs that offer choices when creating the document and conforming instances of logical elements with the general layout structure, taking into account specific device characteristics, to generate the final-form document. The relationships are defined as expressions similar to those existing in general logical document structure definitions. Thus, an intermediate phase of document interchange between revision and final-form is introduced which saves data transmission time and gives the receiver some flexibility in presentation options while still conforming to a general layout definition. Further, the general layout definition may be used by a receiver to optimize its processing by identifying the possible layout presentation constructs appearing in the subsequent specific instance of the conforming document.

7 citations

Dissertation•

Segmentation of heterogeneous document images : an approach based on machine learning, connected components analysis, and texture analysis

Omid Bonakdar Sakhi

06 Dec 2012

TL;DR: A number of improvements are demonstrated: on separating text columns when one is situated very close to the other; on preventing the contents of a table cell from being merged with the contents of adjacent cells; and on preventing regions inside a frame from being merged with surrounding text regions, especially side notes, even when the latter are written in a font similar to that of the text body.

Abstract: Document page segmentation is one of the most crucial steps in document image analysis. It ideally aims to explain the full structure of any document page, distinguishing text zones, graphics, photographs, halftones, figures, tables, etc. Although there have been several attempts to date at achieving correct page segmentation results, many difficulties remain. The leader of the project in the framework of which this PhD work has been funded (*) uses a complete processing chain in which page segmentation mistakes are manually corrected by human operators. Aside from the costs it represents, this demands tuning of a large number of parameters; moreover, some segmentation mistakes sometimes escape the vigilance of the operators. Current automated page segmentation methods are well accepted for clean printed documents, but they often fail to separate regions in handwritten documents when the document layout structure is loosely defined or when side notes are present inside the page. Moreover, tables and advertisements bring additional challenges for region segmentation algorithms. Our method addresses these problems. The method is divided into four parts:

1. Unlike most popular page segmentation methods, we first separate text and graphics components of the page using a boosted decision tree classifier.
2. The separated text and graphics components are used, among other features, to separate columns of text in a two-dimensional conditional random fields framework.
3. A text line detection method, based on piecewise projection profiles, is then applied to detect text lines with respect to text region boundaries.
4. Finally, a new paragraph detection method, which is trained on the common models of paragraphs, is applied on text lines to find paragraphs based on the geometric appearance of text lines and their indentations.

Our contribution over existing work lies in essence in the use, or adaptation, of algorithms borrowed from the machine learning literature to solve difficult cases. Indeed, we demonstrate a number of improvements: on separating text columns when one is situated very close to the other; on preventing the contents of a cell in a table from being merged with the contents of other adjacent cells; and on preventing regions inside a frame from being merged with other text regions around, especially side notes, even when the latter are written using a font similar to that of the text body. Quantitative assessment, and comparison of the performance of our method with competitive algorithms using widely acknowledged metrics and evaluation methodologies, is also provided to a large extent.

(*) This PhD thesis has been funded by Conseil General de Seine-Saint-Denis, through the FUI6 project Demat-Factory, led by Safig SA.
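
The third part of the method, text line detection from piecewise projection profiles, can be illustrated with a small sketch. This is a simplified reading rather than the thesis's implementation: it assumes a binarized page (1 = ink, 0 = background), cuts the page into equal-width vertical strips, and reports the runs of rows containing ink in each strip. The function names and the `min_ink` threshold are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary page image: 1 = ink, 0 = background


def row_profile(image: Image, x0: int, x1: int) -> List[int]:
    """Count ink pixels per row inside the column range [x0, x1)."""
    return [sum(row[x0:x1]) for row in image]


def line_bands(profile: List[int], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges where the profile reaches min_ink."""
    bands: List[Tuple[int, int]] = []
    start = None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                      # a text line begins
        elif ink < min_ink and start is not None:
            bands.append((start, y - 1))   # the line ended on the previous row
            start = None
    if start is not None:                  # a line touching the bottom edge
        bands.append((start, len(profile) - 1))
    return bands


def piecewise_line_bands(image: Image, n_strips: int) -> List[List[Tuple[int, int]]]:
    """Detect line bands separately in n_strips vertical strips of the page."""
    width = len(image[0])
    edges = [round(i * width / n_strips) for i in range(n_strips + 1)]
    return [line_bands(row_profile(image, edges[i], edges[i + 1]))
            for i in range(n_strips)]


page = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [1, 1, 1, 0, 0, 0],
]
print(piecewise_line_bands(page, 2))
# → [[(1, 2), (4, 4)], [(2, 3)]]
```

Working strip by strip is what lets the profile cope with lines that are offset between the left and right halves of the page, as in the toy image; a single profile over the full width would merge rows 1 to 4 into one band.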

7 citations

Journal Article•DOI•

Geometric algorithms and experiments for automated document structuring

Daniela Rus¹, K. Summers²•Institutions (2)

Dartmouth College¹, Cornell University²

01 Jul 1997-Mathematical and Computer Modelling

TL;DR: Algorithms for the automated segmentation and classification of layout structures in electronic documents are presented and the key idea is to use the patterns in the distribution of white space in a document to recognize and interpret its components.
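
The white-space idea can be sketched in a few lines. The summary above does not describe the paper's actual algorithms, so this is only an assumed illustration of the general principle: given the vertical extents of text lines, a sufficiently large white gap between consecutive lines is taken as a block boundary. The `split_on_white_space` helper and the `min_gap` threshold are hypothetical.

```python
from typing import List, Tuple

Line = Tuple[int, int]  # (top, bottom) of one text line, in page units


def split_on_white_space(lines: List[Line], min_gap: int) -> List[List[Line]]:
    """Group lines into blocks; a vertical gap of min_gap or more starts a new block."""
    blocks: List[List[Line]] = []
    for line in sorted(lines):
        if blocks and line[0] - blocks[-1][-1][1] < min_gap:
            blocks[-1].append(line)    # small gap: same block
        else:
            blocks.append([line])      # large gap (or first line): new block
    return blocks


lines = [(0, 10), (12, 22), (24, 34), (60, 70), (72, 82)]
blocks = split_on_white_space(lines, min_gap=10)
print(blocks)
# → [[(0, 10), (12, 22), (24, 34)], [(60, 70), (72, 82)]]
```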

7 citations

Proceedings Article•DOI•

Visualizing document space by force-directed dynamic layout

J. Tatemura¹•Institutions (1)

University of Tokyo¹

23 Apr 1997

TL;DR: An interactive document keyword layout technique that enables visual browsing and manipulation of a collection of documents by applying a force-directed graph drawing algorithm and clustering documents and keywords dynamically in response to a user's interaction.

Abstract: We propose an interactive document keyword layout technique that enables browsing and manipulation of a collection of documents visually. This layout technique applies a force directed graph drawing algorithm and clusters documents and keywords by reacting to a user's interaction dynamically. An example of visual interaction is demonstrated on an experimental system.
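
A force-directed layout of documents and keywords can be sketched as a basic spring embedder. This is a generic illustration, not the system described in the abstract: it assumes a graph whose edges link each document to its keywords, uses a linear spring on linked pairs and inverse-square repulsion between all pairs, and leaves out the interactive part. The node names, the `force_layout` function, and its parameters are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def force_layout(nodes: List[str], edges: List[Tuple[str, str]],
                 steps: int = 200, rest: float = 1.0,
                 step_size: float = 0.05) -> Dict[str, Point]:
    """Spring embedder: linked nodes attract, every pair of nodes repels."""
    count = len(nodes)
    # Deterministic start: nodes evenly spaced on a circle of radius 2.
    pos = {n: (2.0 * math.cos(2 * math.pi * i / count),
               2.0 * math.sin(2 * math.pi * i / count))
           for i, n in enumerate(nodes)}
    linked = {frozenset(edge) for edge in edges}
    for _ in range(steps):
        force = {n: [0.0, 0.0] for n in nodes}
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[b][0] - pos[a][0], pos[b][1] - pos[a][1]
                dist = max(math.hypot(dx, dy), 1e-3)
                pull = -(rest * rest) / (dist * dist)  # repulsion: a moves away from b
                if frozenset((a, b)) in linked:
                    pull += dist - rest                # spring toward the rest length
                fx, fy = pull * dx / dist, pull * dy / dist
                force[a][0] += fx
                force[a][1] += fy
                force[b][0] -= fx
                force[b][1] -= fy
        pos = {n: (pos[n][0] + step_size * force[n][0],
                   pos[n][1] + step_size * force[n][1]) for n in nodes}
    return pos


def gap(layout: Dict[str, Point], a: str, b: str) -> float:
    """Euclidean distance between two placed nodes."""
    return math.hypot(layout[a][0] - layout[b][0], layout[a][1] - layout[b][1])


nodes = ["doc1", "kw_a", "doc2", "kw_b"]
edges = [("doc1", "kw_a"), ("doc2", "kw_b")]
layout = force_layout(nodes, edges)
print(gap(layout, "doc1", "kw_a") < gap(layout, "doc1", "doc2"))
```

In an interactive setting, a user action such as pinning a keyword or adding an edge would change the graph or fix a node's position, and the same iteration would then be rerun from the current layout.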

7 citations

Metrics

Papers: 1,488

Citations: 35,779

No. of papers in the topic in previous years
Year	Papers
2023	5
2022	19
2021	34
2020	19
2019	14
2018	9
