Topic

Document layout analysis

About: Document layout analysis is a research topic. Over its lifetime, 1,462 publications have been published within this topic, receiving 34,021 citations.

Patent•

Document visualization device and document visualization program

Kameshiro Taizou, Fumiko Takahashi, 泰三亀代, 史子高橋

14 Mar 2006

TL;DR: In this patent, a document collection part 1 collects an electronic document, a document attribute extraction part 2 extracts document attributes from the electronic document, and a document significance calculation part 3 calculates the document significance of the electronic document based on a significance calculation rule.

Abstract: PROBLEM TO BE SOLVED: To achieve efficient operation for the purpose of use such as the tendency research of every region from various electronic documents existing in the Internet or the like. SOLUTION: A document collection part 1 collects an electronic document. A document attribute extraction part 2 extracts document attributes from the electronic document. A document significance calculation part 3 calculates the document significance of the electronic document based on a significance calculation rule. A keyword/attribute retrieval part 6 retrieves document attributes having the name of a place on the electronic map from the document attributes extracted by the document attribute extraction part 2, and extracts the document name, place name, and significance from the hit document attributes. A document attribute/map association part 7 converts the name of a place into latitude/longitude information. A document processing part 8 prepares a document list of every region constituted of the document name, place name, and significance extracted by the keyword/attribute retrieval part 6 by arranging it in the order of significance, and marks showing document positions and linking with the document list and number of documents are displayed so as to be overlapped on the electronic map at a display device 13 based on the latitude/longitude information. COPYRIGHT: (C)2007,JPO&INPIT

9 citations

Proceedings Article•DOI•

Semi-automated document image clustering and retrieval

Markus Diem, Florian Kleber, Stefan Fiel, Robert Sablatnig

27 Dec 2013

TL;DR: The methods presented allow for the analysis of heterogeneous documents that contain printed and handwritten text and allow for hierarchical clustering with different feature subsets in different layers.

Abstract: This paper presents a semi-automated document image clustering and retrieval method that creates links between different documents based on their content. Ideally, the initial bundling of shuffled document images can be reproduced to explore large document databases. Structural and textural features, which describe the visual similarity, are extracted and used by experts (e.g. registrars) to interactively cluster the documents with a manually defined feature subset (e.g. checked paper, handwritten). The methods presented allow for the analysis of heterogeneous documents that contain printed and handwritten text and allow for hierarchical clustering with different feature subsets in different layers.

9 citations

Patent•

Digital original layout producing method for e.g. graphical web browser, involves assigning unique names to image tiles based on position of tiles in original layout, zooming degree and page number of print document

Ludwig Märthesheimer, Dirk Morgenroth

14 Sep 2006

TL;DR: The original layout is divided into a number of image tiles, and each tile is assigned a unique name based on its position in the original layout, the zooming degree, and the page number of the print document.

Abstract: The method involves converting a print document into a digital original layout. The original layout is scaled into different desired zooming degrees. The original layout is divided into a number of image tiles, which form a matrix of image tiles. Unique names are assigned to each tile based on position of the tiles in the original layout, the zooming degrees and a page number of the print document. An interactive element e.g. animated display, film data or music data, is assigned to the original layout. An independent claim is also included for a computer program for performing a digital original layout producing method.
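The tile-naming step described in this abstract can be illustrated with a minimal Python sketch. The patent abstract does not give a naming format, so the `p{page}_z{zoom}_r{row}_c{col}.png` pattern, the `tile_names` function, and the 256-pixel default tile size are all hypothetical choices made for illustration only.

```python
import math


def tile_names(page, zoom, width, height, tile_size=256):
    """Return a matrix (list of rows) of tile names for one page at one zoom level.

    Each name encodes the page number, the zoom degree, and the tile's
    row/column position, so no two tiles share a name.
    The name pattern and tile_size default are assumptions, not from the patent.
    """
    cols = math.ceil(width / tile_size)   # tiles needed to cover the width
    rows = math.ceil(height / tile_size)  # tiles needed to cover the height
    return [
        [f"p{page}_z{zoom}_r{r}_c{c}.png" for c in range(cols)]
        for r in range(rows)
    ]


# Usage: a 600x300 pixel rendering of page 3 at zoom degree 2.
names = tile_names(page=3, zoom=2, width=600, height=300)
print(names[1][2])  # bottom-right tile of the 2x3 matrix
```

Because the page, zoom degree, and position are all part of the name, a viewer could request a tile directly from its coordinates without a lookup table, which is one plausible motivation for such a scheme.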

9 citations

Proceedings Article•DOI•

docExtractor: An off-the-shelf historical document element extraction

Tom Monnier¹, Mathieu Aubry¹

École Normale Supérieure¹

01 Sep 2020

TL;DR: It is argued that the performance obtained without fine-tuning on a specific dataset is critical for applications, in particular in digital humanities, and that the line-level page segmentation is the most relevant for a general purpose element extraction engine.

Abstract: We present docExtractor, a generic approach for extracting visual elements such as text lines or illustrations from historical documents without requiring any real data annotation. We demonstrate it provides high-quality performances as an off-the-shelf system across a wide variety of datasets and leads to results on par with state-of-the-art when fine-tuned. We argue that the performance obtained without fine-tuning on a specific dataset is critical for applications, in particular in digital humanities, and that the line-level page segmentation we address is the most relevant for a general purpose element extraction engine. We rely on a fast generator of rich synthetic documents and design a fully convolutional network, which we show to generalize better than a detection-based approach. Furthermore, we introduce a new public dataset dubbed IlluHisDoc dedicated to the fine evaluation of illustration segmentation in historical documents.

9 citations

Proceedings Article•DOI•

The Delaunay Document Layout Descriptor

Sebastien Eskenazi¹, Petra Gomez-Krämer¹, Jean-Marc Ogier¹

University of La Rochelle¹

08 Sep 2015

TL;DR: A new layout descriptor is presented that achieves 100% accuracy and retrieval in a document retrieval scheme on a database of 960 document images and reduces the size of the database index by a factor of 400.

Abstract: Security applications related to document authentication require an exact match between an authentic copy and the original of a document. This implies that the document analysis algorithms that are used to compare two documents (original and copy) should provide the same output. This kind of algorithm includes the computation of layout descriptors from the segmentation result, as the layout of a document is a part of its semantic content. To this end, this paper presents a new layout descriptor that significantly improves the state of the art. The basis of this descriptor is the use of a Delaunay triangulation of the centroids of the document regions. This triangulation is seen as a graph and the adjacency matrix of the graph forms the descriptor. While most layout descriptors have a stability of 0% with regard to an exact match, our descriptor has a stability of 74%, which can be brought up to 100% with the use of an appropriate matching algorithm. It also achieves 100% accuracy and retrieval in a document retrieval scheme on a database of 960 document images. Furthermore, this descriptor is extremely efficient as it performs a search in constant time with respect to the size of the document database and it reduces the size of the index of the database by a factor of 400.
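The core idea in this abstract (triangulate the region centroids, then use the graph's adjacency matrix as the descriptor) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: it uses a brute-force empty-circumcircle test that is practical only for a handful of regions, it assumes the centroids are already given in a fixed order and in general position (no four on one circle), and it omits the matching algorithm the abstract mentions. The function names are hypothetical.

```python
from itertools import combinations


def _orient(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _in_circumcircle(a, b, c, d):
    """True if point d lies strictly inside the circumcircle of triangle abc."""
    if _orient(a, b, c) < 0:
        b, c = c, b  # the determinant test below expects counter-clockwise order
    (ax, ay), (bx, by), (cx, cy) = [(p[0] - d[0], p[1] - d[1]) for p in (a, b, c)]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    return det > 0


def delaunay_adjacency(centroids):
    """Adjacency matrix of the Delaunay triangulation of the given centroids.

    Brute force: a triangle is kept if no other centroid falls inside its
    circumcircle. Roughly O(n^4), so intended for small illustrative inputs.
    """
    n = len(centroids)
    adj = [[0] * n for _ in range(n)]
    for i, j, k in combinations(range(n), 3):
        a, b, c = centroids[i], centroids[j], centroids[k]
        if _orient(a, b, c) == 0:
            continue  # collinear points do not form a triangle
        others = (centroids[m] for m in range(n) if m not in (i, j, k))
        if any(_in_circumcircle(a, b, c, d) for d in others):
            continue  # circumcircle is not empty, so not a Delaunay triangle
        for u, v in ((i, j), (j, k), (i, k)):
            adj[u][v] = adj[v][u] = 1
    return adj


# Usage: four region centroids forming a convex quadrilateral.
descriptor = delaunay_adjacency([(0, 0), (4, 0), (5, 3), (0, 3)])
for row in descriptor:
    print(row)
# Regions 0 and 2 are not linked here: the triangulation uses the 1-3 diagonal.
```

Two layouts could then be compared by checking their matrices for equality, which requires both documents to number their regions consistently; how the paper resolves that ordering is not stated in the abstract.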

9 citations

Metrics

Papers: 1,488
Citations: 35,779

No. of papers in the topic in previous years
Year	Papers
2023	5
2022	19
2021	34
2020	19
2019	14
2018	9
