Document layout analysis

About: Document layout analysis is a research topic. Over its lifetime, 1,462 publications have been published within this topic, receiving 34,021 citations.

Papers published on a yearly basis (chart; years shown: 1969 and 1982–2023)

Papers

Patent

Method of compound document comparison

Deepak Massand

30 May 2006

TL;DR: An original compound document and a modified compound document are analyzed to determine and mark the location of embedded objects; the original and modified primary documents are then compared, producing a comparison output document.

Abstract: A method and system for comparing compound documents. An original compound document and a modified compound document are analyzed to determine and mark the location of embedded objects. A comparison is performed between an original primary document and the modified primary document, ignoring the embedded objects, the output of which is a comparison output document. The embedded objects are compared by copying the contents of the embedded objects to compatible documents, comparing the embedded object from the original compound document and the embedded object from the modified compound document, the output of which is inserted into the comparison output document using the location markers of the embedded objects.

31 citations
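
The flow described in the abstract above (mark embedded objects, compare the primary documents, compare the objects separately, then splice the object comparison back in at the markers) can be illustrated with a minimal sketch. This is not the patented method: the `<<OBJ:id>>` marker syntax, the function names, and the use of Python's `difflib` for line comparison are all assumptions made for illustration.

```python
import difflib


def diff_lines(a, b):
    """Line-level diff of two lists of strings, without ndiff's '? ' hint lines."""
    return [d for d in difflib.ndiff(a, b) if not d.startswith("? ")]


def compare_compound(orig_text, orig_objs, mod_text, mod_objs):
    """Compare two compound documents (illustrative sketch only).

    *_text: primary document as a list of lines, with each embedded object
            already replaced by a marker line (hypothetical "<<OBJ:id>>" syntax).
    *_objs: dict mapping each marker line to that object's content as lines.
    """
    result = []
    for entry in diff_lines(orig_text, mod_text):
        tag, body = entry[:2], entry[2:]
        if tag == "  " and body in orig_objs and body in mod_objs:
            # Marker present in both versions: compare the embedded objects
            # separately and splice their diff in at the marker's position.
            result.append(entry + " (embedded object)")
            result.extend("    " + d for d in diff_lines(orig_objs[body], mod_objs[body]))
        else:
            result.append(entry)
    return result


if __name__ == "__main__":
    old = ["Intro", "<<OBJ:table1>>", "End"]
    new = ["Intro", "<<OBJ:table1>>", "The end"]
    old_objs = {"<<OBJ:table1>>": ["item,qty", "bolts,2"]}
    new_objs = {"<<OBJ:table1>>": ["item,qty", "bolts,3"]}
    for line in compare_compound(old, old_objs, new, new_objs):
        print(line)
```

Keeping the marker as an ordinary line means the primary comparison never sees the object's contents, which mirrors the "ignoring the embedded objects" step in the abstract.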

Proceedings Article

Combining DOM tree and geometric layout analysis for online medical journal article segmentation

Daniel Le¹, George R. Thoma¹, Jie Zou¹

National Institutes of Health¹

11 Jun 2006

TL;DR: An HTML web page segmentation algorithm is applied to online medical journal articles (regular HTML and PDF-converted HTML files); segmenting the entire web page into zones can significantly expedite the subsequent information retrieval steps and increase their accuracy.

Abstract: We describe an HTML web page segmentation algorithm, which is applied to segment online medical journal articles (regular HTML and PDF-converted HTML files). The web page content is modeled by a zone tree structure based primarily on the geometric layout of the web page. For a given journal article, a zone tree is generated by combining DOM tree analysis with the recursive X-Y cut algorithm. Using additional visual cues, such as background color, font size, and font color, the page is segmented into homogeneous regions. The evaluation uses 104 articles from 11 journals. Out of 9726 ground-truth zones, 9376 are correctly segmented, for an accuracy of 96.40%. Segmenting the entire web page into zones can significantly expedite the subsequent information retrieval steps and increase their accuracy.

31 citations
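
The recursive X-Y cut named in the abstract above can be sketched on rendered element bounding boxes. This is a minimal illustration, not the authors' implementation: it omits the DOM tree analysis and the visual cues (background color, font size, font color), and the `min_gap` threshold, the box format, and the function names are assumptions.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels (assumed format)


def gap_cuts(intervals: List[Tuple[int, int]], min_gap: int) -> List[int]:
    """Positions where the 1-D projection of the intervals has a gap >= min_gap."""
    ordered = sorted(intervals)
    cuts, reach = [], ordered[0][1]
    for start, end in ordered[1:]:
        if start - reach >= min_gap:
            cuts.append(start)  # the next zone begins here
        reach = max(reach, end)
    return cuts


def xy_cut(boxes: List[Box], min_gap: int = 10) -> List[List[Box]]:
    """Recursively split boxes into zones along whitespace gaps."""
    if len(boxes) <= 1:
        return [boxes] if boxes else []
    # Try horizontal cuts (gaps along y) first, then vertical cuts (gaps along x).
    for lo, hi in ((1, 3), (0, 2)):
        cuts = gap_cuts([(b[lo], b[hi]) for b in boxes], min_gap)
        if cuts:
            groups: List[List[Box]] = [[] for _ in range(len(cuts) + 1)]
            for b in boxes:
                # A box belongs to the group after every cut it starts at or beyond.
                groups[sum(b[lo] >= c for c in cuts)].append(b)
            return [zone for g in groups for zone in xy_cut(g, min_gap)]
    return [boxes]  # no gap wide enough in either direction: one zone


if __name__ == "__main__":
    title = (0, 0, 100, 10)
    left_col = (0, 30, 45, 100)
    right_col = (55, 30, 100, 100)
    for zone in xy_cut([title, left_col, right_col]):
        print(zone)
```

On the toy page in the usage example, the first pass separates the title from the body along the vertical whitespace, and the second pass separates the two columns.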

Proceedings Article

Page segmentation for Manhattan and non-Manhattan layout documents via selective CRLA

Hung-Ming Sun¹

Kainan University¹

31 Aug 2005

TL;DR: The proposed method, named selective CRLA, can process documents with both Manhattan and non-Manhattan layouts and has been successfully applied to the extraction of text from commercial magazine pages with complicated layouts.

Abstract: The constrained run-length algorithm (CRLA) is a well-known technique for page segmentation. The algorithm is fast and can be used to partition documents with Manhattan layouts. It is not, however, suited to deal with pages with layouts beyond the Manhattan format, e.g. irregular halftone images embedded in text paragraphs. A modified version of the CRLA, named selective CRLA, is presented in this paper. The selective CRLA is capable of processing documents with both Manhattan and non-Manhattan layouts. The selective CRLA is performed twice with different sets of parameters on a label image derived from the input document image. After both of its executions, the yielded text regions are extracted. The proposed method has been successfully applied to extraction of text from commercial magazine pages with complicated layouts.

31 citations
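
For orientation, the basic run-length smoothing idea behind the CRLA mentioned in the abstract above can be sketched as follows. This shows only a plain, unmodified smoothing step, not the selective CRLA the paper proposes. The thresholds `c_h` and `c_v`, the rule that a white run must be bounded by black pixels on both sides, and stopping at the AND step are assumptions of this sketch.

```python
from typing import List

Image = List[List[int]]  # binary image: 1 = black (ink), 0 = white


def smooth_row(row: List[int], c: int) -> List[int]:
    """Fill runs of 0s of length <= c that lie between two 1s."""
    out = list(row)
    last_one = None
    for i, v in enumerate(row):
        if v == 1:
            if last_one is not None and 0 < i - last_one - 1 <= c:
                for j in range(last_one + 1, i):
                    out[j] = 1
            last_one = i
    return out


def run_length_smooth(image: Image, c_h: int, c_v: int) -> Image:
    """Smooth horizontally and vertically, then AND the two results."""
    horiz = [smooth_row(r, c_h) for r in image]
    cols = [smooth_row(list(col), c_v) for col in zip(*image)]
    vert = [list(r) for r in zip(*cols)]  # transpose back to row order
    return [[a & b for a, b in zip(hr, vr)] for hr, vr in zip(horiz, vert)]


if __name__ == "__main__":
    print(smooth_row([1, 0, 0, 1, 0, 0, 0, 0, 1], 2))
```

Short white gaps (between characters or words) are closed while wider gaps (between columns or blocks) survive, which is why the result tends to form rectangular blocks on Manhattan layouts.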

Patent

Combined image and text document

Oliver H. Foehr¹, Alan John Michaelis¹

Microsoft¹

08 Jan 2009

TL;DR: A combined image and text document is described: a scanned image of a document is generated with a scanning application, and text representations of the text in the document are generated with a character recognition application.

Abstract: A combined image and text document is described. In embodiment(s), a scanned image of a document can be generated utilizing a scanning application, and text representations of text that is included in the document can be generated utilizing a character recognition application. Position data of the text representations can be correlated with locations of corresponding text in the scanned image of the document. The scanned image can then be rendered for display overlaid with the text representations as a transparent overlay, where the scanned image and the text representations are independently user-selectable for display. A user-selectable input can be received to display the text representations without the scanned image, the scanned image without the text representations, or to display the text representations adjacent the scanned image.

30 citations
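
One way to picture the correlation between recognized text and image positions described in the abstract above is a small data model. This is a hypothetical sketch, not the patented design: the `TextSpan` fields, the `DisplayMode` names, and the helper functions are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class DisplayMode(Enum):
    OVERLAY = "overlay"            # image with transparent text on top
    TEXT_ONLY = "text_only"        # text representations without the image
    IMAGE_ONLY = "image_only"      # scanned image without the text
    SIDE_BY_SIDE = "side_by_side"  # text shown adjacent to the image


@dataclass(frozen=True)
class TextSpan:
    """Recognized text plus the pixel rectangle it occupies in the scan."""
    text: str
    x: int
    y: int
    width: int
    height: int

    def contains(self, px: int, py: int) -> bool:
        return self.x <= px < self.x + self.width and self.y <= py < self.y + self.height


def text_at(spans: List[TextSpan], px: int, py: int) -> Optional[str]:
    """Return the recognized text under an image coordinate, if any."""
    for span in spans:
        if span.contains(px, py):
            return span.text
    return None


def visible_layers(mode: DisplayMode) -> Dict[str, bool]:
    """Which of the two independently selectable layers a mode shows."""
    return {
        "image": mode is not DisplayMode.TEXT_ONLY,
        "text": mode is not DisplayMode.IMAGE_ONLY,
    }


if __name__ == "__main__":
    spans = [TextSpan("Invoice", 10, 10, 80, 20)]
    print(text_at(spans, 15, 15), visible_layers(DisplayMode.OVERLAY))
```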

Patent

Method of analyzing a layout structure of an image using character recognition, and displaying or modifying the layout

Yoshiaki Kurosawa¹, Katsumi Kato¹

Toshiba¹

17 Mar 1999

TL;DR: A layout analysis section analyzes a layout structure of an input image and a layout information memory section stores layout information representing a relationship between the layout structure and a corresponding area in the input image.

Abstract: A document image processing apparatus. A layout analysis section analyzes a layout structure of an input image. A layout information memory section stores layout information representing a relationship between the layout structure and a corresponding area in the input image. An image display section displays the corresponding area in the input image according to the layout information. An indication input section inputs an indication to modify the corresponding area in the input image displayed. A modification section modifies the corresponding area in the input image and the layout structure of the corresponding area in the layout information according to the indication.

30 citations

Network Information

Performance

Metrics

Papers: 1,488

Citations: 35,779

No. of papers in the topic in previous years
Year	Papers
2023	5
2022	19
2021	34
2020	19
2019	14
2018	9
