Topic

Document layout analysis

About: Document layout analysis is a research topic. Over its lifetime, 1,462 publications have been published within this topic, receiving 34,021 citations.

Papers

Patent

Image filing system for memorizing images read from a given document together with small characterizing image

Masamichi Sugiura¹, Kaoru Tada¹, Hiroya Sugawa¹

Minolta¹

07 Jun 1995

TL;DR: An image filing system that stores the image data for the whole document in one storage area of a memory and the image data for a characterizing portion of that image in another storage area, using the latter as auxiliary document identification data alongside keywords.

Abstract: An image filing system which uses, in addition to main document identification data in the form of a keyword assigned to a given document, auxiliary document identification data formulated to be representative of a characterizing portion of the image read from a given document, wherein the image data representative of the whole image read from the document is stored in one data storage area of a memory and the image data representative of the characterizing portion of the image is stored in another data storage area of the memory for use as the auxiliary document identification data. When documents are accessed as a result of searching on the basis of a keyword or keywords, the auxiliary document identification data is displayed for each of the documents in addition to the keyword or keywords assigned to each of the documents listed on the display. This helps the operator of the system select and call the target document from among the listed documents by referencing the auxiliary document identification data as well as the main document identification data. The auxiliary document identification data is thus useful for distinguishing one document from others when the documents have a common keyword as the main document identification data.

5 citations

Patent

Device, method and program for creating print data

Satoru Yokota

26 Jan 2004

TL;DR: A print-data creation device that automatically converts or divides document page data according to the page size of an output book and controls page layout without considering the size of document pages or the writing position on an open page.

Abstract: PROBLEM TO BE SOLVED: To automatically convert or divide document page data according to the page size of an output book and control page layout without considering the size of document pages and a writing position on an open page. SOLUTION: In this device for creating print data, a page layout control part 102 determines, based on a document size conversion rule 60, conversion information for every document page inside document data 111 created by a document data creating part 101, and determines the order in which the document pages are written. The conversion information and the order are set in a document layout management table 70 to create an output layout management table 80 in which the result of allocating pages for printing matching the style of binding for weekly magazines, the order of printing, and information on printing surfaces are set. A document data conversion part 103 creates an intermediate file 112 in the units of book pages on the basis of the setting information in the output layout management table 80, and using the intermediate file 112 creates the raster data of printed pages for output to a printer 200. COPYRIGHT: (C)2005,JPO&NCIPI
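The page allocation step mentioned in this abstract can be illustrated with a minimal sketch. It assumes that the "style of binding for weekly magazines" is saddle stitching (folded sheets nested and stapled at the spine) and that the book reads left to right; the abstract confirms neither, and the function name `saddle_stitch_imposition` is hypothetical. It is not the patented method, only the standard imposition order for that binding style.

```python
def saddle_stitch_imposition(page_count):
    """Return one entry per sheet: ((front_left, front_right), (back_left, back_right)).

    Pages are numbered from 1. None marks a blank page added so that the
    total is a multiple of 4 (each folded sheet carries 4 book pages).
    Assumption: saddle-stitch binding, left-to-right reading order.
    """
    total = -(-page_count // 4) * 4  # round up to a multiple of 4

    def page(n):
        # Pages beyond the real page count are padding blanks.
        return n if n <= page_count else None

    sheets = []
    for i in range(total // 4):
        front = (page(total - 2 * i), page(2 * i + 1))
        back = (page(2 * i + 2), page(total - 2 * i - 1))
        sheets.append((front, back))
    return sheets


eight_page_book = saddle_stitch_imposition(8)
six_page_book = saddle_stitch_imposition(6)
```

For an 8-page book, the outer sheet carries pages 8 and 1 on its front and 2 and 7 on its back, and the inner sheet carries 6 and 3, then 4 and 5. A 6-page book is padded to 8, so the two positions for pages 7 and 8 come back as `None`.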

5 citations

Journal Article

Identification of scripts and orientations of degraded document images

Shijian Lu¹, Linlin Li², Chew Lim Tan²

Institute for Infocomm Research Singapore¹, National University of Singapore²

01 Nov 2010 - Pattern Analysis and Applications

TL;DR: A document script and orientation identification method that tolerates document skew and low image resolution by converting a document image into a pair of document vectors using the density and distribution of character strokes.

Abstract: Document scripts and document orientations are important information for document digitization. Prior work has addressed the identification of document scripts and document orientations, but most reported methods are very sensitive to document skew and low image resolution. This paper reports a document script and document orientation identification method that addresses this issue by converting a document image into a pair of document vectors using the density and distribution of character strokes. Experiments over 3,024 document images of 12 scripts show that the proposed methods are accurate and tolerant to various types of document degradation.

5 citations

Proceedings Article

Logical Entity Recognition in Multi-Style Document Page Images

Song Mao, Zheng Xu¹, T. Tjahjadi¹, George R. Thoma²

University of Warwick¹, National Institutes of Health²

20 Aug 2006

TL;DR: This paper proposes a novel method in which style information is used in both logical entity classifier training and recognition phases, and shows that the use of the style information significantly improves the accuracy of logical entity recognition in multi-style document page images.

Abstract: Logical entity recognition in document page images is an essential part of a document image analysis system. A heterogeneous collection of document pages usually has many layout styles. Features extracted from the same logical entities in different styles may have very different values and vice versa. Therefore, logical entity classifiers learned from a training set of multi-style document pages may not be reliable due to possible feature overlap of different logical entities in different styles. In this paper, we propose a novel method in which style information is used in both logical entity classifier training and recognition phases. In the training phase, training data are first classified into distinct styles, and a dedicated support vector machine (SVM) is then learned for each style. In the recognition phase, the style of a new document page image is first identified and its logical entities are then recognized using the corresponding SVM. We show in our experiments that the use of the style information significantly improves the accuracy of logical entity recognition in multi-style document page images.
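The two-phase scheme described in this abstract can be sketched roughly as follows. This is not the authors' implementation: a nearest-centroid classifier stands in for the SVM so that the example needs only the standard library, the style of each training page is assumed to be given, and the class names, features (column count, font size), and labels are all hypothetical.

```python
from collections import defaultdict
from math import dist


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(col) / n for col in zip(*vectors))


class NearestCentroid:
    """Stand-in for an SVM: predicts the label whose class centroid is closest."""

    def fit(self, samples, labels):
        groups = defaultdict(list)
        for x, y in zip(samples, labels):
            groups[y].append(x)
        self.centroids = {y: centroid(xs) for y, xs in groups.items()}
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda y: dist(x, self.centroids[y]))


class StyleAwareRecognizer:
    """One style classifier plus one dedicated entity classifier per style."""

    def fit(self, page_features, styles, zones):
        # Training phase: group zone samples by page style, then train
        # a separate entity classifier for each style.
        self.style_clf = NearestCentroid().fit(page_features, styles)
        by_style = defaultdict(lambda: ([], []))
        for style, page_zones in zip(styles, zones):
            for x, y in page_zones:
                by_style[style][0].append(x)
                by_style[style][1].append(y)
        self.entity_clf = {
            s: NearestCentroid().fit(xs, ys) for s, (xs, ys) in by_style.items()
        }
        return self

    def predict(self, page_feature, zone_features):
        # Recognition phase: identify the style first, then use its classifier.
        style = self.style_clf.predict(page_feature)
        clf = self.entity_clf[style]
        return style, [clf.predict(x) for x in zone_features]


# Hypothetical data: page feature = (column count,), zone feature = (font size,).
page_features = [(1.0,), (1.0,), (2.0,), (2.0,)]
styles = ["A", "A", "B", "B"]
zones = [
    [((18.0,), "title"), ((12.0,), "body")],
    [((17.0,), "title"), ((11.0,), "body")],
    [((12.0,), "title"), ((8.0,), "body")],
    [((13.0,), "title"), ((9.0,), "body")],
]
recognizer = StyleAwareRecognizer().fit(page_features, styles, zones)
```

The toy data reproduces the feature overlap the abstract describes: a 12-point zone is body text in style A but a title in style B, so a single classifier trained on both styles would confuse the two, while the style-specific classifiers do not.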

5 citations

Journal Article

Combination of Different Layout Approaches

Sonja Maier, Mark Minas

01 Jan 2010 - Electronic Communication of The European Association of Software Science and Technology

TL;DR: Presents an approach capable of combining diverse layout approaches, such as standard graph drawing algorithms, constraint-based algorithms, or rule-based layout algorithms, along with an algorithm that automatically computes the complete layout in a straightforward way.

Abstract: In an interactive environment such as a visual language editor, it is not sufficient to apply the same layout algorithm in every situation. Instead, the user often wants to select the layout behavior at runtime. With the approach presented, the user can control the layout behavior by choosing different layout patterns for different parts of a diagram, e.g., a graph drawing algorithm may be applied to some selected components while others are aligned vertically or horizontally. To enable the specification of layout behavior, we introduced the concept of layout patterns in previous work. Each layout pattern encapsulates certain layout behavior, and hence enables modularization and reuse. To specify user-controlled layout behavior, a flexible combination of arbitrary layout patterns needs to be enabled. Therefore, we introduce an approach that is capable of combining diverse layout approaches, such as standard graph drawing algorithms, constraint-based algorithms, or rule-based layout algorithms. More specifically, an algorithm is presented that automatically computes the complete layout in a straightforward way.
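The idea of assigning different layout patterns to different parts of a diagram can be illustrated with a minimal sketch. The abstract does not specify how the patterns are combined, so applying them one after another in the order given is an assumption here, and the pattern functions and the `apply_patterns` helper are hypothetical names, not the authors' API.

```python
from typing import Callable, Dict, List, Tuple

Position = Tuple[float, float]
Layout = Dict[str, Position]
Pattern = Callable[[Layout, List[str]], Layout]


def align_horizontally(layout: Layout, names: List[str]) -> Layout:
    """Move the selected components onto a common horizontal line (their mean y)."""
    y = sum(layout[n][1] for n in names) / len(names)
    return {**layout, **{n: (layout[n][0], y) for n in names}}


def align_vertically(layout: Layout, names: List[str]) -> Layout:
    """Move the selected components onto a common vertical line (their mean x)."""
    x = sum(layout[n][0] for n in names) / len(names)
    return {**layout, **{n: (x, layout[n][1]) for n in names}}


def apply_patterns(
    layout: Layout, assignments: List[Tuple[Pattern, List[str]]]
) -> Layout:
    """Apply each (pattern, selected components) pair in order.

    Assumption: patterns are combined sequentially, so later patterns
    see the positions produced by earlier ones.
    """
    for pattern, names in assignments:
        layout = pattern(layout, names)
    return layout


diagram = {"a": (0.0, 0.0), "b": (4.0, 2.0), "c": (9.0, 4.0), "d": (3.0, 10.0)}
result = apply_patterns(
    diagram,
    [(align_horizontally, ["a", "b"]), (align_vertically, ["c", "d"])],
)
```

Because each pattern is an ordinary function over a layout and a selection, a graph drawing or constraint-based algorithm could be plugged in with the same signature. Patterns whose selections overlap would need a conflict-resolution rule, which this sketch leaves out.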

5 citations

Metrics

Papers: 1,488

Citations: 35,779

No. of papers in the topic in previous years
Year	Papers
2023	5
2022	19
2021	34
2020	19
2019	14
2018	9
