Topic

Document layout analysis

About: Document layout analysis is a research topic. Over its lifetime, 1,462 publications have appeared on this topic, receiving 34,021 citations.


Papers
01 Jan 2013
TL;DR: A new approach is proposed to segment document regions and classify them as text, image, graphics, or table. A multilayer perceptron, a supervised learning technique, is used to construct the classifier, which achieves 97.49% classification accuracy.
Abstract: Documents contain a great deal of knowledge and are a common means of sharing information. Extracting information from documents by hand takes considerable human effort and time, which can severely restrict the usefulness of information systems. Automatic information extraction from documents has therefore become an important problem, and document segmentation has been shown to help. Document segmentation is the process of splitting a document into distinct regions. This paper proposes a new approach to segment document regions and classify them as text, image, graphics, or table. The document image is segmented into blocks using the run-length smearing algorithm, and features are extracted from each block. A multilayer perceptron, a supervised learning technique, is used to construct the classifier, which achieves 97.49% classification accuracy.
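
The run-length smearing step mentioned in this abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a binary image stored as lists of 0 (white) and 1 (black), fills only white runs bounded by black pixels on both sides, and combines the horizontal and vertical passes with a logical AND. The function names and thresholds are hypothetical.

```python
def smear_row(row, threshold):
    """Fill white runs (0) no longer than `threshold` that lie between two black pixels (1)."""
    out = list(row)
    last_black = None  # index of the most recent black pixel seen
    for i, pixel in enumerate(row):
        if pixel == 1:
            if last_black is not None:
                gap = i - last_black - 1
                if 0 < gap <= threshold:
                    for j in range(last_black + 1, i):
                        out[j] = 1
            last_black = i
    return out


def rlsa(image, h_threshold, v_threshold):
    """Smear horizontally and vertically, then AND the two results (assumed variant)."""
    horizontal = [smear_row(r, h_threshold) for r in image]
    # Transpose so that columns can be smeared with the same row routine.
    columns = [list(c) for c in zip(*image)]
    vertical_t = [smear_row(c, v_threshold) for c in columns]
    vertical = [list(r) for r in zip(*vertical_t)]
    return [[h & v for h, v in zip(hr, vr)] for hr, vr in zip(horizontal, vertical)]


if __name__ == "__main__":
    # The two-pixel gap is filled, the three-pixel gap is left alone.
    print(smear_row([1, 0, 0, 1, 0, 0, 0, 1], 2))
```

Connected regions of the smeared image would then serve as candidate blocks for feature extraction and classification.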

4 citations

Proceedings Article
10 Apr 1989
TL;DR: The author presents the knowledge-based document-analysis system ANASTASIL, which uses a formalism for document layout description and modeling. The model is realized as a tree structure that describes the layout of a document page at different levels of layout abstraction.
Abstract: The author presents the knowledge-based document-analysis system ANASTASIL. The system uses a formalism for document layout description and modeling. The model is realized by a tree structure, which describes the layout of a document page in different layout abstraction levels. The tree is used to initiate a best-first search in combination with a hypothesize-and-test strategy to establish a high-level electronic representation of the contents of a document. Results obtained in the different analysis phases are shown and confirm the soundness of the approach.
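
The combination of a layout tree, best-first search, and hypothesize-and-test can be sketched roughly as follows. This is an illustrative toy, not ANASTASIL's formalism: the `LayoutNode` class, the vertical-overlap score, and the acceptance threshold are all assumptions made for the example.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class LayoutNode:
    label: str
    region: tuple  # expected (top, bottom) as fractions of the page height
    children: list = field(default_factory=list)


def match_score(node, block):
    """Hypothetical test: fraction of the observed block covered by the expected region."""
    top = max(node.region[0], block[0])
    bottom = min(node.region[1], block[1])
    height = block[1] - block[0]
    return max(0.0, bottom - top) / height if height > 0 else 0.0


def best_first_label(root, block, accept=0.5):
    """Expand the most promising hypothesis first; return the first accepted leaf label."""
    counter = 0  # tie-breaker so heap entries never compare nodes directly
    frontier = [(-match_score(root, block), counter, root)]
    while frontier:
        neg_score, _, node = heapq.heappop(frontier)
        if -neg_score < accept:
            continue  # hypothesis rejected by the test
        if not node.children:
            return node.label
        for child in node.children:
            counter += 1
            heapq.heappush(frontier, (-match_score(child, block), counter, child))
    return None


page = LayoutNode("page", (0.0, 1.0), [
    LayoutNode("header", (0.0, 0.1)),
    LayoutNode("body", (0.1, 0.9), [
        LayoutNode("title", (0.1, 0.2)),
        LayoutNode("text", (0.2, 0.9)),
    ]),
    LayoutNode("footer", (0.9, 1.0)),
])
```

Calling `best_first_label(page, (0.12, 0.18))` descends from the page hypothesis through the body to the title, refining the label one abstraction level at a time.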

4 citations

Posted Content
TL;DR: Within the computer vision domain, layout analysis is the process by which regions of interest in a document image are classified.
Abstract: Within the computer vision domain, layout analysis is the process by which regions of interest in a document image are classified. A scanned file is an example of such a document. The layout analysis process has two components: geometrical analysis and logical layout.

4 citations

Patent
12 May 2005
TL;DR: In this patent, a character string is built by numbering the characters so that no character with a number larger than those of the selected characters is included, and then adding the characters one by one in that number order.
Abstract: PROBLEM TO BE SOLVED: To provide a document analysis program capable of accurately extracting the document layout structure of an electronic document, a computer-readable storage medium storing the document layout analysis program, a document layout analysis method, and a document layout analysis device. SOLUTION: Coordinate information for each character in a document image is acquired, a character string in the document image is detected from the acquired coordinate information, and the characters in the detected character string are selected one by one. Within a rectangular inspection area that shares a predetermined corner with the circumscribing rectangle of the character string and contains the circumscribing rectangle of the selected characters, the characters are numbered so that no character with a number larger than those of the selected characters is included, and a character string is built by adding the characters one by one in that number order. If the rectangular inspection area containing the already added characters and a newly added character also contains any other character, the newly added character is removed and the already added characters are combined and set as one sentence. COPYRIGHT: (C)2005,JPO&NCIPI

4 citations

Book Chapter
04 Nov 1998
TL;DR: An Extended Split Detection Method is described that hierarchically segments a machine-printed page image with a complex layout into smaller layout elements and represents the analyzed layout as a tree data structure.
Abstract: This paper describes an Extended Split Detection Method that can hierarchically segment a machine-printed page image with a complex layout into smaller layout elements. The method performs piecewise-linear segmentation using many kinds of separator elements, such as field separators, lines, edges of figures, and edges of white background areas. Furthermore, the method represents the analyzed hierarchical layout in a tree data structure, in which all nodes are traversed according to simple rules to generate the reading sequence. We demonstrated that the new method increases the correct character line segmentation rate by 15.5%, to 95.5%, and we achieved a correct reading sequence generation rate of 88.1%.
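
Generating a reading sequence by traversing a layout tree can be sketched as below. The abstract does not state its traversal rules, so this example assumes one plausible set: children produced by a horizontal split are read top to bottom, and children produced by a vertical split are read left to right. The `Region` class and its fields are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    name: str
    x: float  # left edge of the region
    y: float  # top edge of the region
    split: str = "leaf"  # "rows": children stacked vertically; "cols": side by side
    children: list = field(default_factory=list)


def reading_order(node):
    """Depth-first traversal that orders children by position along the split axis."""
    if not node.children:
        return [node.name]
    key = (lambda c: c.y) if node.split == "rows" else (lambda c: c.x)
    order = []
    for child in sorted(node.children, key=key):
        order.extend(reading_order(child))
    return order


page = Region("page", 0, 0, "rows", [
    Region("footer", 0, 9),
    Region("body", 0, 1, "cols", [
        Region("right column", 5, 1),
        Region("left column", 0, 1),
    ]),
    Region("title", 0, 0),
])

if __name__ == "__main__":
    print(reading_order(page))
```

Because the order is derived from positions rather than from insertion order, the children can be stored in whatever sequence the segmentation step produces them.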

4 citations


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 82% related
Feature (computer vision): 128.2K papers, 1.7M citations, 82% related
Object detection: 46.1K papers, 1.3M citations, 81% related
Image segmentation: 79.6K papers, 1.8M citations, 80% related
Convolutional neural network: 74.7K papers, 2M citations, 79% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    5
2022    19
2021    34
2020    19
2019    14
2018    9