Document layout analysis

About: Document layout analysis is a research topic. Over its lifetime, 1,462 publications have been published within this topic, receiving 34,021 citations.

Papers published on a yearly basis

[Chart: number of papers per year; x-axis years 1969 and 1982 to 2023]

Papers

Patent

White space graphs and trees for content-adaptive scaling of document images

Kathrin Berkner¹

Ricoh¹

30 Jun 2005

TL;DR: A method, article of manufacture, and apparatus for content-adaptive scaling of document images is described. The method identifies spatial relationships between document objects of a document image, determines the space separating pairs of neighboring document objects, and determines at least one scaling factor based on that space and on display device characteristics.

Abstract: A method, article of manufacture, and apparatus for content-adaptive scaling of document images is described. In one embodiment, the method comprises identifying spatial relationships between document objects of a document image, determining space separating pairs of neighboring document objects, and determining at least one scaling factor based on the space separating the document objects in the document image and based on display device characteristics.

61 citations

Patent

Image-based data management method and system

Yung-Chun Huang

15 Nov 2007

TL;DR: A system for storing, organizing, and accessing image-based documents is presented. It runs an OCR conversion process to produce an equivalent document in text format, identifies the keywords of the equivalent document, links the keywords with the image-based document, and stores the image-based document, the corresponding equivalent document, and the keywords in a relational database.

Abstract: Methods and systems are provided for storing, organizing, and accessing image-based documents. The method includes receiving an image-based document, conducting an OCR conversion process to produce an equivalent document in text format, identifying keywords of the equivalent document in text format, linking the keywords with the image-based document and the corresponding equivalent document in text format, and storing the image-based document, the corresponding equivalent document in text format, and the keywords in a relational database.

61 citations
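
The storage steps named in the abstract above (OCR text, keywords, and the link back to the image in a relational database) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented method: the OCR step is assumed to have already produced the text, the keyword picker is a naive word-frequency heuristic, and the table names, function names, and file paths are all hypothetical.

```python
import re
import sqlite3
from collections import Counter
from typing import List

# Assumption: a tiny stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "is", "on"}


def extract_keywords(text: str, top_n: int = 3) -> List[str]:
    """Naive keyword picker (an assumption): most frequent non-stopword tokens."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    return [word for word, _ in Counter(tokens).most_common(top_n)]


def store_document(conn: sqlite3.Connection, image_path: str, ocr_text: str) -> int:
    """Store the image reference and its text equivalent, then link keywords to it."""
    cur = conn.execute(
        "INSERT INTO documents (image_path, ocr_text) VALUES (?, ?)",
        (image_path, ocr_text),
    )
    doc_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO keywords (doc_id, keyword) VALUES (?, ?)",
        [(doc_id, kw) for kw in extract_keywords(ocr_text)],
    )
    conn.commit()
    return doc_id


def find_by_keyword(conn: sqlite3.Connection, keyword: str) -> List[str]:
    """Return image paths of documents linked to the given keyword."""
    rows = conn.execute(
        "SELECT d.image_path FROM documents d "
        "JOIN keywords k ON k.doc_id = d.id "
        "WHERE k.keyword = ? ORDER BY d.id",
        (keyword.lower(),),
    )
    return [row[0] for row in rows]


# Hypothetical schema: one row per document, one row per (document, keyword) link.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE documents (
        id INTEGER PRIMARY KEY,
        image_path TEXT NOT NULL,
        ocr_text TEXT NOT NULL
    );
    CREATE TABLE keywords (
        doc_id INTEGER NOT NULL REFERENCES documents(id),
        keyword TEXT NOT NULL
    );
    """
)

# The OCR output below is made-up sample text, standing in for a real OCR run.
store_document(conn, "scans/invoice_001.png",
               "Invoice for printer toner. Invoice total due on receipt.")
store_document(conn, "scans/memo_007.png",
               "Memo on the printer maintenance schedule and printer repairs.")
```

Calling `find_by_keyword(conn, "printer")` then returns the paths of both sample scans, since each has "printer" among its stored keywords.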

Journal Article

Comparison and Classification of Documents Based on Layout Similarity

Jianying Hu¹, Ramanujan S. Kashi¹, Gordon Wilfong¹

Alcatel-Lucent¹

01 May 2000, Information Retrieval

TL;DR: The usefulness of the features derived from interval coding is demonstrated in a hidden Markov model based page layout classification system that is trainable and extendible.

Abstract: This paper describes features and methods for document image comparison and classification at the spatial layout level. The methods are useful for visual similarity based document retrieval as well as fast algorithms for initial document type classification without OCR. A novel feature set called interval encoding is introduced to capture elements of spatial layout. This feature set encodes region layout information in fixed-length vectors by capturing structural characteristics of the image. These fixed-length vectors are then compared to each other through a Manhattan distance computation for fast page layout comparison. The paper describes experiments and results to rank-order a set of document pages in terms of their layout similarity to a test document. We also demonstrate the usefulness of the features derived from interval coding in a hidden Markov model based page layout classification system that is trainable and extendible. The methods described in the paper can be used in various document retrieval tasks including visual similarity based retrieval, categorization and information extraction.

59 citations
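
The comparison step described in the abstract above (fixed-length layout vectors compared by Manhattan distance, then rank-ordered against a test document) can be illustrated with a minimal sketch. It does not implement interval encoding itself: the vectors, page names, and function names below are hypothetical placeholders for whatever the feature extractor would produce.

```python
from typing import Dict, List, Tuple


def manhattan(a: List[float], b: List[float]) -> float:
    """Manhattan (L1) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_by_layout(query: List[float],
                   pages: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Return (page_id, distance) pairs, most similar layout first."""
    scored = [(page_id, manhattan(query, vec)) for page_id, vec in pages.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical fixed-length layout vectors (not real interval-encoding output).
pages = {
    "letter": [0, 2, 2, 5, 1, 0],
    "two_column": [3, 3, 0, 0, 3, 3],
    "memo": [0, 2, 3, 4, 2, 0],
}
query = [0, 2, 2, 4, 1, 0]

ranking = rank_by_layout(query, pages)
for page_id, distance in ranking:
    print(f"{page_id}: {distance}")
```

Because every page is reduced to a vector of the same length, each comparison is a single pass over the vector, which is consistent with the abstract's emphasis on fast layout comparison without OCR.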

Patent

Method and apparatus for document formatting with efficient figure element layout manipulation

Mika Fukui¹, Isamu Iwai¹, Koji Yamaguchi¹, Miwako Doi¹

Toshiba¹

26 Aug 1994

TL;DR: A method and an apparatus for document formatting are described that reflect the preference of the operator and the overall balance, so that the desired formatting can be obtained efficiently without tedious post-processing operations.

Abstract: A method and an apparatus for document formatting, capable of reflecting the preference of the operator and overall balance, such that the desired formatting can be obtained efficiently without tedious post-processing operations. In the apparatus, document data representing the document, including figure data representing figure elements of the document, and region data indicating layout regions to which the document is to be laid out, are inputted, candidate layouts for each figure element to be laid out are generated, one of the generated candidate layouts is selected, and the document is formatted in the layout region, according to the selected one of the candidate layouts.

59 citations

Proceedings Article

Extraction of text areas in printed document images

Jean Duong¹, Myriam Côte¹, Hubert Emptoz², Ching Y. Suen³

École de technologie supérieure¹, Institut national des sciences appliquées², Concordia University³

09 Nov 2001

TL;DR: A document analysis system which is expected to extract regions of interest in greyscale document images using geometric and texture features and some entropic heuristic is presented.

Abstract: In this paper, we present a document analysis system which is expected to extract regions of interest in greyscale document images. Collected areas are then clustered into text zones and non-text areas using geometric and texture features. The system works in two steps. Regions of interest are retrieved via cumulative gradient considerations. In the classification module, we introduce an entropic heuristic. Experiments are done on the MediaTeam Document Database to show the relevance of this criterion.

59 citations

Network Information

Performance

Metrics

Papers: 1,488
Citations: 35,779

No. of papers in the topic in previous years
Year	Papers
2023	5
2022	19
2021	34
2020	19
2019	14
2018	9
