
Document layout analysis

About: Document layout analysis is a research topic. Over its lifetime, 1,462 publications have been published within this topic, receiving 34,021 citations.


Papers
Proceedings Article
14 Aug 1995
TL;DR: A computational model for document logical structure derivation is developed, in which a rule-based control strategy utilizes the data obtained from analyzing a digitized document image, and makes inferences using a multi-level knowledge base of document layout rules.
Abstract: The analysis of a document image to derive a symbolic description of its structure and contents involves using spatial domain knowledge to classify the different printed blocks (e.g., text paragraphs), group them into logical units (e.g., newspaper stories), and determine the reading order of the text blocks within each unit. These steps describe the conversion of the physical structure of a document into its logical structure. We have developed a computational model for document logical structure derivation, in which a rule-based control strategy utilizes the data obtained from analyzing a digitized document image, and makes inferences using a multi-level knowledge base of document layout rules. The knowledge-based document logical structure derivation system (DeLoS) based on this model consists of a hierarchical rule-based control system to guide the block classification, grouping and read-ordering operations; a global data structure to store the document image data and incremental inferences; and a domain knowledge base to encode the rules governing document layout.

72 citations
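
Illustrative sketch (not from the paper): the abstract describes rule-based classification of printed blocks but does not list the rules. The minimal Python example below assumes a hypothetical `Block` record and three made-up rules keyed on font size, checked in order, only to show what a rule-driven classifier of this kind could look like.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A printed block found on the page (all fields are hypothetical)."""
    x: int
    y: int
    width: int
    height: int
    font_size: float


# Hypothetical layout rules as (label, predicate) pairs, checked in order.
# The final rule always matches, so every block receives a label.
RULES = [
    ("headline", lambda b: b.font_size >= 18),
    ("caption", lambda b: b.font_size < 9),
    ("paragraph", lambda b: True),
]


def classify(block):
    """Return the label of the first rule whose predicate accepts the block."""
    for label, predicate in RULES:
        if predicate(block):
            return label


headline = Block(x=0, y=0, width=400, height=40, font_size=24.0)
body = Block(x=0, y=60, width=400, height=300, font_size=10.0)
print(classify(headline), classify(body))
```

Keeping the rules in a plain ordered list separates the knowledge base from the control loop, which loosely mirrors the split the abstract describes.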

Proceedings Article
01 Oct 2017
TL;DR: This paper takes advantage of the inherently one-dimensional pattern observed in text and table blocks to reduce the analysis from two-dimensional document images to 1D signatures, significantly improving overall performance.
Abstract: Automatic document layout analysis is a crucial step in cognitive computing and in processes that extract information from document images, such as domain-specific knowledge database creation, graph and image understanding, and extraction of structured data from tables. Despite the progress in this field in recent years, open challenges remain, ranging from accurately detecting content boxes to classifying them into semantically meaningful classes. With the popularization of mobile devices and cloud-based services, there is a need for approaches that are both fast and economical in data usage. In this paper we propose a fast one-dimensional approach for automatic document layout analysis considering text, figures and tables, based on convolutional neural networks (CNNs). We take advantage of the inherently one-dimensional pattern observed in text and table blocks to reduce the analysis from two-dimensional document images to 1D signatures, significantly improving overall performance: we present considerably faster execution times and more compact data usage with no loss in overall accuracy compared with a classical two-dimensional CNN approach.

72 citations
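
Illustrative sketch (not from the paper): the abstract does not say how the 1D signatures are computed. As an assumption, the Python example below treats them as horizontal and vertical projection profiles of a binarized block, which is one common way to collapse a 2D image into 1D sequences. The function name and toy data are hypothetical.

```python
def projection_signatures(block):
    """Collapse a binary block (rows of 0/1 pixels, 1 = ink) into two 1D signatures.

    Returns (horizontal, vertical): ink count per row and ink count per column.
    """
    horizontal = [sum(row) for row in block]
    vertical = [sum(col) for col in zip(*block)]
    return horizontal, vertical


# Toy 4x4 "text-like" block: inked rows alternate with blank rows.
block = [
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]
horizontal, vertical = projection_signatures(block)
print(horizontal)
print(vertical)
```

In a pipeline of the kind the abstract describes, such sequences (rather than the full image) would be the input to a one-dimensional CNN.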

Journal Article
TL;DR: Experimental results show that the proposed identification technique is accurate, easy to extend, and tolerant to noise and various types of document degradation.
Abstract: This paper reports an identification technique that detects the scripts and languages of noisy and degraded document images. In the proposed technique, scripts and languages are identified through document vectorization, which converts each document image into a document vector that characterizes the shape and frequency of the character or word images it contains. Document images are vectorized using vertical component cuts and character extremum points, both of which are tolerant to variation in text fonts and styles, noise, and various types of document degradation. For each script or language under study, a script or language template is first constructed through a training process. The scripts and languages of document images are then determined according to the distances between the converted document vectors and the preconstructed script and language templates. Experimental results show that the proposed technique is accurate, easy to extend, and tolerant to noise and various types of document degradation.

72 citations
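
Illustrative sketch (not from the paper): the abstract says documents are matched to templates by distance but names neither the distance nor the way templates are built. The Python example below assumes mean-vector templates and Euclidean distance. The vectors and the labels `script_a` and `script_b` are made-up toy values.

```python
import math


def build_template(vectors):
    """Average the training vectors of one script into a template (assumed scheme)."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(vector, templates):
    """Return the name of the template closest to the document vector."""
    return min(templates, key=lambda name: euclidean(vector, templates[name]))


templates = {
    "script_a": build_template([[1.0, 0.0], [0.8, 0.2]]),
    "script_b": build_template([[0.0, 1.0], [0.2, 0.8]]),
}
print(identify([0.7, 0.3], templates))
```

Adding a new script under this scheme only requires building one more template, which is consistent with the extensibility the abstract claims.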

Proceedings Article
01 Jan 2001
TL;DR: In this paper, three efficient techniques that can be used to discriminate between text written in Arabic script and text written in English script are presented and evaluated.
Abstract: Because of the different characteristics of the Arabic language and the Romance and Anglo-Saxon languages, recognition of documents written in hybrids of these languages requires that the language of the text be identified prior to the recognition phase. In this paper, three efficient techniques that can be used to discriminate between text written in Arabic script and text written in English script are presented and evaluated. These techniques address the language identification problem at the word level and at the text level. The characteristics of horizontal projection profiles as well as run-length histograms for text written in both languages are the basic features underlying these techniques. Solving this problem is very important in building bilingual document image analysis systems capable of processing documents containing hybrid Arabic/Romance and Anglo-Saxon languages.

72 citations
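
Illustrative sketch (not from the paper): the abstract names run-length histograms as a basic feature without defining them. The Python example below assumes the usual definition, a count of how often each length of consecutive ink pixels occurs along the rows of a binary image. The function names and toy image are hypothetical.

```python
from collections import Counter


def row_run_lengths(row):
    """Histogram of lengths of consecutive ink pixels (1s) in a single row."""
    histogram = Counter()
    run = 0
    for pixel in row:
        if pixel:
            run += 1
        elif run:
            histogram[run] += 1
            run = 0
    if run:  # close a run that reaches the end of the row
        histogram[run] += 1
    return histogram


def image_run_lengths(image):
    """Accumulate the row histograms over every row of the image."""
    total = Counter()
    for row in image:
        total.update(row_run_lengths(row))
    return total


image = [
    [1, 1, 0, 1],
    [0, 1, 1, 0],
]
print(image_run_lengths(image))
```

A classifier would then compare such histograms between scripts. Which bins separate Arabic from English text is an empirical question that this sketch does not answer.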

Patent
Takashi Saito
18 Aug 1995
TL;DR: An extracting step extracts text regions from an input document image, and a classifying step classifies the text regions into in-order reading regions, to be read successively in a predetermined order, and different-attribute regions.
Abstract: An extracting step extracts text regions from an input document image. A classifying step classifies the text regions into in-order reading regions, to be read successively in a predetermined order, and different-attribute regions. A detecting step detects the construction of the in-order reading regions. A determining step determines the reading order, in which the in-order reading regions are to be read, using the construction. The detecting step detects the construction in the same manner whether the input document image comprises a vertically typeset document or a horizontally typeset document. The detecting step further includes a tree graph formation step that forms a tree graph representing the construction, with nodes respectively representing the in-order reading regions.

70 citations
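
Illustrative sketch (not from the patent): the abstract says a tree graph of in-order reading regions is used to determine the reading order but does not say how the tree is traversed. The Python example below assumes a pre-order (parent first, then children left to right) traversal. The `Region` class and the region names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """A node in the tree graph: one reading region plus its child regions."""
    name: str
    children: list = field(default_factory=list)


def reading_order(node):
    """Pre-order traversal: visit the region itself, then each child in turn."""
    order = [node.name]
    for child in node.children:
        order.extend(reading_order(child))
    return order


# Toy layout: a title followed by two columns of paragraphs.
tree = Region("title", [
    Region("column_1", [Region("para_1"), Region("para_2")]),
    Region("column_2", [Region("para_3")]),
])
print(reading_order(tree))
```

Because the order depends only on the tree and not on pixel coordinates, the same traversal could serve vertically and horizontally typeset pages, provided the children are stored in the appropriate order for each.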


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 82% related
Feature (computer vision): 128.2K papers, 1.7M citations, 82% related
Object detection: 46.1K papers, 1.3M citations, 81% related
Image segmentation: 79.6K papers, 1.8M citations, 80% related
Convolutional neural network: 74.7K papers, 2M citations, 79% related
Performance Metrics
No. of papers in the topic in previous years

Year    Papers
2023    5
2022    19
2021    34
2020    19
2019    14
2018    9