Topic

Document layout analysis

About: Document layout analysis is a research topic. Over its lifetime, 1,462 publications have been published within this topic, receiving 34,021 citations.


Papers
Proceedings Article
06 Oct 2014
TL;DR: A completely automatic algorithm is presented to perform a robust text segmentation of old handwritten manuscripts on a per-book basis, and it is shown how to exploit this outcome to find two layout elements, i.e., text blocks and text lines.
Abstract: Historical and artistic handwritten books are valuable cultural heritage (CH) items, as they provide information about tangible and intangible cultural aspects from the past. Massive digitization projects have made this kind of data available to a worldwide population, and pose real challenges for automatic processing. In this scenario, document layout analysis plays a significant role, being a fundamental step of any document image understanding system. In this paper, we present a completely automatic algorithm to perform a robust text segmentation of old handwritten manuscripts on a per-book basis, and we show how to exploit this outcome to find two layout elements, i.e., text blocks and text lines. Our proposed technique has been evaluated on a large and heterogeneous corpus, and our experimental results demonstrate that this approach is efficient and reliable, even when applied to very noisy and damaged books.

8 citations

Proceedings Article
Hui Chao1
20 Jan 2003
TL;DR: This paper presents a bottom-up approach to recognizing graphic illustrations in PDF documents and shows how this technique can be used in automatic figure extraction, document re-flow and document transformation.
Abstract: PDF is a document format for final presentation. It preserves the original document layout but often not the document's logical structure. Graphic illustrations such as figures and tables in PDF often consist of ungrouped graphic primitives such as lines, curves and small text elements. In this paper, we present a bottom-up approach to recognizing graphic illustrations in PDF documents. The vicinity of page elements, both in 2D space and in layer index, is used to understand the logical connection between elements. Recognizing graphics and grouping elements into illustrations is an important part of understanding the document's logical structure. This technique can be used in automatic figure extraction, document re-flow and document transformation.

8 citations
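
The abstract above describes grouping ungrouped graphic primitives by their vicinity on the page. The following is a minimal sketch of that general idea, not the paper's algorithm: it assumes each primitive is reduced to an axis-aligned bounding box and merges boxes whose gap is below a threshold, using a union-find structure. The names `gap`, `group_by_vicinity` and `max_gap`, and the sample coordinates, are hypothetical; the layer-index cue mentioned in the abstract is not modeled here.

```python
def gap(a, b):
    """Gap between two boxes (x0, y0, x1, y1); 0 if they touch or overlap."""
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return max(dx, dy)


def group_by_vicinity(boxes, max_gap):
    """Return lists of box indices; boxes closer than max_gap share a group."""
    parent = list(range(len(boxes)))

    def find(i):
        # Follow parent links to the group root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Bottom-up step: merge every pair of primitives that lie close together.
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if gap(boxes[i], boxes[j]) <= max_gap:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical primitives: two axis lines and a label, plus a distant rule.
primitives = [
    (0, 0, 10, 1),         # horizontal axis line
    (0, 2, 1, 10),         # vertical axis line
    (2, 3, 6, 5),          # small text label inside the plot
    (100, 100, 120, 101),  # unrelated line elsewhere on the page
]
figure_groups = group_by_vicinity(primitives, max_gap=2)
print(figure_groups)
```

With these sample boxes, the first three primitives end up in one candidate illustration and the distant line stays on its own. The pairwise loop is quadratic, which is acceptable for a sketch; a real page with many primitives would likely need a spatial index.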

Proceedings Article
28 Mar 1993
TL;DR: The authors present a new approach to document recognition using fuzzy rules that provides a compact rule base with accuracy and efficiency and is powerful enough to recognize a variety of layout structures observed in technical papers.
Abstract: The authors present a new approach to document recognition using fuzzy rules. The system uses information such as relative locations, relative sizes, and positions. A prototype DOCREC-III is described, which takes bitmap scanned images as input and uses spatial knowledge (layout structure) to reason about the rectangular segments (logical structure) in technical papers. The system provides a compact rule base with accuracy and efficiency. The rules are concise but powerful enough to recognize a variety of layout structures observed in technical papers.

8 citations
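
To illustrate what a fuzzy rule over relative location and size might look like, here is a small sketch. It is not taken from DOCREC-III: the rule ("a segment near the top of the page and wide is a title"), the membership breakpoints, and the names `ramp_down`, `ramp_up` and `title_score` are all assumptions made for illustration. Positions and widths are assumed to be normalized to the range 0 to 1 relative to the page.

```python
def ramp_down(x, full, zero):
    """Membership 1 at or below `full`, falling linearly to 0 at `zero`."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)


def ramp_up(x, zero, full):
    """Membership 0 at or below `zero`, rising linearly to 1 at `full`."""
    if x <= zero:
        return 0.0
    if x >= full:
        return 1.0
    return (x - zero) / (full - zero)


def title_score(top, width):
    """Hypothetical rule: IF near the top AND wide THEN title.

    `top` is the segment's top edge and `width` its width, both as
    fractions of the page. Fuzzy AND is taken as the minimum.
    """
    near_top = ramp_down(top, full=0.05, zero=0.25)  # assumed breakpoints
    wide = ramp_up(width, zero=0.3, full=0.7)        # assumed breakpoints
    return min(near_top, wide)


print(title_score(0.05, 0.8))  # segment at the very top, spanning most of the page
print(title_score(0.15, 0.5))  # borderline segment, partial membership
print(title_score(0.60, 0.9))  # wide but in the middle of the page
```

Unlike a hard threshold, the rule returns a degree of membership, so a segment that is only roughly near the top still receives partial support and can be compared against the scores from competing rules.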

Patent
15 Jul 1993
TL;DR: In this article, the authors propose a method for converting the structure of document information, capable of converting a document example conforming to document type A into a document example conforming to document type B by designating the correspondence between the two document type definitions, A and B, which define the structure of the document.
Abstract: PURPOSE: To provide a method for converting the structure of document information, capable of converting a document example conforming to document type A into a document example conforming to document type B by designating the correspondence between the two document type definitions, A and B, which define the structure of the document, and by preparing and using correspondence information. CONSTITUTION: A conversion origin document type definition 131 and a conversion destination document type definition 132, which include the document type definition statements of various kinds of documents, are held. Document type correspondence information 133 is prepared by the correspondence operation processing 113 between document type definitions, applied to a designated example of the correspondence between these definitions, and by the preparation processing 114 of document type correspondence information. With this information and a conversion origin document example 134 as input, a conversion destination document example 135 is prepared by the structure conversion processing 123 of the contents of the document example, and the document example is passed to a structured document editing processing 124.

8 citations

Patent
28 Dec 2012
TL;DR: A method is presented for authenticating a printed document that carries a barcode encoding authentication data, including word bounding boxes for each word in the original document image and data for reconstructing the original image.
Abstract: A method for authenticating a printed document that carries a barcode encoding authentication data, including word bounding boxes for each word in the original document image and data for reconstructing the original image. The printed document is scanned to generate a target document image, which is then segmented into text words. The word bounding boxes of the original and target document images are used to align the target document image. Then, each word in the original document image is compared to the corresponding word in the target document image using a word difference map and the Hausdorff distance between them. Symbols of the original document image are further compared to corresponding symbols in the target document image using feature comparison, a symbol difference map, Hausdorff distance comparison, and point matching. These comparison results can identify alterations in the target document with respect to the original document, which can be visualized.

7 citations
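
The abstract above relies on the Hausdorff distance to compare a word in the original image with its counterpart in the scanned image. As a rough sketch of that measure only, and not of the patented method, the code below computes the symmetric Hausdorff distance between two sets of foreground pixel coordinates. The sample coordinates and the `THRESHOLD` value are hypothetical.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from any point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Hypothetical foreground pixels (x, y) of the same word in both images.
original = [(0, 0), (1, 0), (2, 0)]
target = [(0, 0), (1, 0), (2, 0), (2, 3)]  # an extra stroke was added

THRESHOLD = 1.5  # assumed tolerance for scanning noise, in pixels
distance = hausdorff(original, target)
altered = distance > THRESHOLD
print(distance, altered)
```

A single stray pixel far from every original pixel is enough to raise the distance, which is why the measure is sensitive to small additions such as an inserted stroke. This brute-force version compares every pair of points; for real word images a library routine or a distance transform would be the more practical choice.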


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 82% related
Feature (computer vision): 128.2K papers, 1.7M citations, 82% related
Object detection: 46.1K papers, 1.3M citations, 81% related
Image segmentation: 79.6K papers, 1.8M citations, 80% related
Convolutional neural network: 74.7K papers, 2M citations, 79% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    5
2022    19
2021    34
2020    19
2019    14
2018    9