Topic

Document layout analysis

About: Document layout analysis is a research topic. Over its lifetime, 1,462 publications have been published within this topic, receiving 34,021 citations.

Papers

Proceedings Article

A TaLISMAN: automatic text and line segmentation of historical manuscripts

[...]

Ruggero Pintus, Ying Yang¹, Enrico Gobbetti, Holly Rushmeier¹

Yale University¹

06 Oct 2014

TL;DR: A completely automatic algorithm is presented to perform a robust text segmentation of old handwritten manuscripts on a per-book basis, and it is shown how to exploit this outcome to find two layout elements, i.e., text blocks and text lines.

Abstract: Historical and artistic handwritten books are valuable cultural heritage (CH) items, as they provide information about tangible and intangible cultural aspects of the past. Massive digitization projects have made this kind of data available to a worldwide audience, posing real challenges for automatic processing. In this scenario, document layout analysis plays a significant role, being a fundamental step of any document image understanding system. In this paper, we present a completely automatic algorithm that performs robust text segmentation of old handwritten manuscripts on a per-book basis, and we show how to exploit this outcome to find two layout elements, i.e., text blocks and text lines. Our proposed technique has been evaluated on a large and heterogeneous corpus, and our experimental results demonstrate that this approach is efficient and reliable, even when applied to very noisy and damaged books.
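
The abstract does not spell out how text lines are extracted. As a rough illustration of the kind of step involved, the sketch below uses the classic horizontal projection-profile technique, a much simpler method swapped in here and not TaLISMAN's per-book algorithm. It assumes an already binarized, deskewed page given as rows of 0/1 pixels; `text_line_bands`, `min_ink`, and `min_height` are hypothetical names.

```python
from typing import List, Tuple


def text_line_bands(binary: List[List[int]], min_ink: int = 1,
                    min_height: int = 1) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges (bottom exclusive) of horizontal text bands.

    `binary` is a page image as rows of 0/1 pixels, where 1 = ink.
    A row belongs to a text band if it holds at least `min_ink` ink pixels;
    bands shorter than `min_height` rows are discarded as noise.
    """
    # Horizontal projection profile: number of ink pixels per row.
    profile = [sum(row) for row in binary]
    bands: List[Tuple[int, int]] = []
    start = None
    for y, count in enumerate(profile):
        if count >= min_ink and start is None:
            start = y  # entering a text band
        elif count < min_ink and start is not None:
            if y - start >= min_height:
                bands.append((start, y))
            start = None  # leaving the band
    # Close a band that runs to the bottom edge of the page.
    if start is not None and len(profile) - start >= min_height:
        bands.append((start, len(profile)))
    return bands


page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
]
print(text_line_bands(page))  # [(1, 3), (4, 5)]
```

Projection profiles break down on skewed or touching lines, which is part of why noisy, damaged manuscripts call for more robust methods like the one the paper describes.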

8 citations

Proceedings Article

Graphics extraction in PDF document

[...]

Hui Chao¹

Hewlett-Packard¹

20 Jan 2003

TL;DR: This paper presents a bottom-up approach to recognizing graphic illustrations in PDF documents and shows how this technique can be used for automatic figure extraction, document reflow, and document transformation.

Abstract: PDF is a document format for final presentation. It preserves the original document layout but often not the document's logical structure. Graphic illustrations such as figures and tables in PDF often consist of ungrouped graphic primitives such as lines, curves, and small text elements. In this paper, we present a bottom-up approach to recognizing graphic illustrations in PDF documents. The proximity of page elements, both in 2D space and in their layer indices, is used to infer the logical connections between elements. Recognizing graphics and grouping elements into illustrations is an important part of understanding a document's logical structure. This technique can be used for automatic figure extraction, document reflow, and document transformation.
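
A minimal sketch of the bottom-up grouping idea, assuming each graphic primitive has already been reduced to a bounding box plus its index in the page's drawing order. Two primitives are linked when they are close both on the page and in drawing order, and linked primitives are merged with union-find. This is an illustrative reconstruction, not Chao's published procedure; `Primitive`, `group_primitives`, `max_gap`, and `max_layer_gap` are hypothetical, and the thresholds are arbitrary.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Primitive:
    x0: float
    y0: float
    x1: float
    y1: float
    layer: int  # hypothetical: position of the primitive in the drawing order


def box_gap(a: Primitive, b: Primitive) -> float:
    """Euclidean gap between two axis-aligned bounding boxes (0 if they overlap)."""
    dx = max(a.x0 - b.x1, b.x0 - a.x1, 0.0)
    dy = max(a.y0 - b.y1, b.y0 - a.y1, 0.0)
    return (dx * dx + dy * dy) ** 0.5


def group_primitives(prims: List[Primitive], max_gap: float = 5.0,
                     max_layer_gap: int = 3) -> List[List[int]]:
    """Cluster primitive indices that are close in 2D space AND in layer index."""
    parent = list(range(len(prims)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(prims)):
        for j in range(i + 1, len(prims)):
            a, b = prims[i], prims[j]
            if box_gap(a, b) <= max_gap and abs(a.layer - b.layer) <= max_layer_gap:
                parent[find(i)] = find(j)  # union the two clusters

    groups: Dict[int, List[int]] = {}
    for i in range(len(prims)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


prims = [
    Primitive(0, 0, 10, 10, layer=0),
    Primitive(12, 0, 20, 10, layer=1),       # 2 units away, adjacent layer: grouped
    Primitive(100, 100, 110, 110, layer=2),  # far away on the page
    Primitive(21, 0, 30, 10, layer=50),      # close in 2D, far in drawing order
]
print(group_primitives(prims))  # [[0, 1], [2], [3]]
```

The layer test is what keeps the last primitive separate even though it touches the second one on the page.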

8 citations

Proceedings Article

Fuzzy approach to document recognition

[...]

H. Fujihara¹, E. Babiker¹, Dick B. Simmons¹

Texas A&M University¹

28 Mar 1993

TL;DR: The authors present a new approach to document recognition using fuzzy rules; it yields a compact, accurate, and efficient rule base that is powerful enough to recognize a variety of layout structures observed in technical papers.

Abstract: The authors present a new approach to document recognition using fuzzy rules. The system uses information such as relative locations, relative sizes, and positions. A prototype, DOCREC-III, is described; it takes scanned bitmap images as input and uses spatial knowledge (layout structure) to reason about the rectangular segments (logical structure) in technical papers. The system provides a compact rule base that is both accurate and efficient. The rules are concise but powerful enough to recognize a variety of layout structures observed in technical papers.
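
To make "fuzzy rules over relative locations and sizes" concrete, here is a minimal sketch with hypothetical trapezoidal membership functions and three made-up rules (title, body, footnote). Each rule's strength is the minimum of its conditions (fuzzy AND), and the label with the strongest rule wins. None of the thresholds or labels come from DOCREC-III. Features are assumed to be normalized: `y` is the segment's vertical position as a fraction of page height, `h` its text height relative to body-text height, and `w` its width as a fraction of page width.

```python
from typing import Callable, Dict, Tuple


def trapezoid(x: float, a: float, b: float, c: float, d: float) -> float:
    """Trapezoidal membership: 0 outside (a, d), 1 on [b, c], linear in between."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


# Hypothetical linguistic terms over the normalized features.
def near_top(y: float) -> float: return trapezoid(y, -1.0, 0.0, 0.10, 0.25)
def near_bottom(y: float) -> float: return trapezoid(y, 0.75, 0.90, 1.0, 2.0)
def large_font(h: float) -> float: return trapezoid(h, 1.1, 1.5, 100.0, 101.0)
def normal_font(h: float) -> float: return trapezoid(h, 0.7, 0.9, 1.1, 1.3)
def small_font(h: float) -> float: return trapezoid(h, 0.0, 0.1, 0.75, 0.9)
def wide(w: float) -> float: return trapezoid(w, 0.3, 0.6, 1.0, 1.1)


# Each rule: label -> strength = min of its conditions (fuzzy AND).
RULES: Dict[str, Callable[[Dict[str, float]], float]] = {
    "title": lambda f: min(near_top(f["y"]), large_font(f["h"])),
    "body": lambda f: min(wide(f["w"]), normal_font(f["h"])),
    "footnote": lambda f: min(near_bottom(f["y"]), small_font(f["h"])),
}


def classify(features: Dict[str, float]) -> Tuple[str, float]:
    """Return the label whose rule fires most strongly, or 'unknown' if none fire."""
    scores = {label: rule(features) for label, rule in RULES.items()}
    best = max(scores, key=scores.get)
    if scores[best] == 0.0:
        return "unknown", 0.0
    return best, scores[best]


print(classify({"y": 0.05, "h": 2.0, "w": 0.5}))  # ('title', 1.0)
```

Because the conditions are graded rather than crisp, a segment that only partly matches a rule still gets a partial score, which is what lets a small rule base cover varied layouts.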

8 citations

Patent

Structure converting method of document information

[...]

Tetsuzo Uehara, 徹三上原

15 Jul 1993

TL;DR: The patent proposes a method for converting the structure of document information: a document instance conforming to document type A is converted into one conforming to document type B by specifying the correspondence between the two document type definitions (A and B) that define the document structure.

Abstract: PURPOSE: To provide a method for converting the structure of document information that can convert a document instance conforming to document type A into a document instance conforming to document type B by specifying the correspondence between the two document type definitions, A and B, that define the document structure, preparing correspondence information, and using it. CONSTITUTION: A source document type definition 131 and a target document type definition 132, which contain the document type definition statements for various kinds of documents, are stored. Document type correspondence information 133 is prepared from a user-specified correspondence example between these definitions, through inter-definition correspondence processing 113 and correspondence-information preparation processing 114. This information and a source document instance 134 are taken as input, a target document instance 135 is produced by structure conversion processing 123 of the document instance's contents, and the result is passed to structured-document editing processing 124.
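
The core idea, converting an instance of type A into type B by way of a correspondence between the two type definitions, can be sketched with Python's standard `xml.etree.ElementTree`. The sketch assumes the correspondence information reduces to a simple element-name mapping; the `CORRESPONDENCE` table and `convert` function are hypothetical. A real converter would likely also need to reorder, merge, or split elements and validate the result against the target definition.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical correspondence information between document type A (source)
# and document type B (target): source element name -> target element name.
CORRESPONDENCE: Dict[str, str] = {
    "report": "article",
    "heading": "title",
    "para": "p",
}


def convert(src: ET.Element, mapping: Dict[str, str]) -> ET.Element:
    """Recursively rebuild `src` with element names translated via `mapping`.

    Elements without an entry keep their original name; attributes, text,
    and tail text are copied unchanged.
    """
    dst = ET.Element(mapping.get(src.tag, src.tag), dict(src.attrib))
    dst.text, dst.tail = src.text, src.tail
    for child in src:
        dst.append(convert(child, mapping))
    return dst


source = ET.fromstring("<report><heading>Intro</heading><para>Text</para></report>")
target = convert(source, CORRESPONDENCE)
print(ET.tostring(target, encoding="unicode"))
# <article><title>Intro</title><p>Text</p></article>
```

Keeping the mapping as data separate from the conversion routine mirrors the split the abstract describes between preparing correspondence information and running the structure conversion.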

8 citations

Patent

Method of authenticating a printed document

[...]

Tian Yibin, Ming Wei

28 Dec 2012

TL;DR: A method is presented for authenticating a printed document that carries barcodes encoding authentication data, including word bounding boxes for each word in the original document image and data for reconstructing the original image.

Abstract: A method for authenticating a printed document that carries barcodes encoding authentication data, including word bounding boxes for each word in the original document image and data for reconstructing the original image. The printed document is scanned to generate a target document image, which is then segmented into text words. The word bounding boxes of the original and target document images are used to align the target document image. Each word in the original document image is then compared to the corresponding word in the target document image using a word difference map and the Hausdorff distance between them. Symbols of the original document image are further compared to the corresponding symbols in the target document image using feature comparison, a symbol difference map, Hausdorff distance comparison, and point matching. The combined comparison results identify alterations in the target document relative to the original document, and these alterations can be visualized.
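
The word-level comparison can be illustrated with a pixel-wise difference map and the Hausdorff distance between ink-pixel sets. This sketch assumes the original and target word images are already binarized, aligned (for example, via the word bounding boxes), and the same size; `ink_points`, `difference_map`, and the 0/1 image layout are hypothetical, and the patent's decision thresholds are not given here.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[int, int]


def ink_points(img: List[List[int]]) -> List[Point]:
    """(x, y) coordinates of ink pixels (value 1) in a binary image."""
    return [(x, y) for y, row in enumerate(img) for x, v in enumerate(row) if v]


def difference_map(a: List[List[int]], b: List[List[int]]) -> List[List[int]]:
    """Pixel-wise XOR of two aligned, same-size binary images."""
    return [[pa ^ pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def directed_hausdorff(p: Sequence[Point], q: Sequence[Point]) -> float:
    """Largest distance from a point in p to its nearest point in q (both non-empty)."""
    return max(min(math.dist(s, t) for t in q) for s in p)


def hausdorff(p: Sequence[Point], q: Sequence[Point]) -> float:
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(p, q), directed_hausdorff(q, p))


original = [[1, 1, 0],
            [0, 1, 0]]
scanned = [[1, 1, 0],
           [0, 1, 1]]  # one extra ink pixel, e.g. an added stroke

changed = sum(map(sum, difference_map(original, scanned)))
dist = hausdorff(ink_points(original), ink_points(scanned))
print(changed, dist)  # 1 1.0
```

The difference map counts how many pixels changed, while the Hausdorff distance measures how far the most displaced ink lies from the original shape; using both helps separate scanning noise from real edits.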

7 citations

Metrics

Papers: 1,488
Citations: 35,779

Number of papers in the topic in previous years
Year	Papers
2023	5
2022	19
2021	34
2020	19
2019	14
2018	9
