Document layout analysis

About: Document layout analysis is a research topic. Over its lifetime, 1,462 publications have been published within this topic, receiving 34,021 citations.

[Chart: Papers published on a yearly basis, 1969 and 1982 to 2023]

Papers

Patent

Document structure identifier

David Slocombe

20 May 2003

TL;DR: A method of automated document structure identification based on visual cues is proposed. It can be applied to generating extensible markup language (XML) files, to natural language parsing, and to search engine ranking mechanisms.

Abstract: A method of automated document structure identification based on visual cues is disclosed herein. The two-dimensional layout of the document is analyzed to discern visual cues related to the structure of the document, and the text of the document is tokenized so that similarly structured elements are treated similarly. The method can be applied in the generation of extensible mark-up language files, natural language parsing and search engine ranking mechanisms.

91 citations

Patent

System for document layout analysis

Robert Cooperman (Xerox)

23 May 1996

TL;DR: A system is presented that provides information on the structure of a document page to complement the textual information provided by an optical character recognition system. It can be used to produce a file editable in a native word-processing environment from input data including the content and characteristics of regions of at least one page of the document.

Abstract: The present invention is a system for providing information on the structure of a document page so as to complement the textual information provided in an optical character recognition system. The system employs a method that can be used to produce a file editable in a native word-processing environment from input data including the content and characteristics of regions of at least one page forming the document. The method includes the steps of: (a) identifying sections within the page; (b) identifying captions; (c) determining boundaries of at least one column on the page, and optionally (d) resizing at least one element of the page of the document so that all pages of the document are of a common size.

91 citations
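
The Cooperman abstract lists determining column boundaries as step (c) but does not say how it is done. As a minimal sketch, assuming the page is already a binarized image (rows of 0/1 values, 1 = ink), a classic vertical projection profile can locate columns by treating sufficiently wide all-blank vertical strips as gutters. The projection-profile approach, the function name `find_column_boundaries`, and the `min_gap` parameter are illustrative assumptions, not the patented method.

```python
from typing import List, Optional, Sequence, Tuple


def find_column_boundaries(
    page: Sequence[Sequence[int]], min_gap: int = 2
) -> List[Tuple[int, int]]:
    """Hypothetical helper (assumption): return (start, end) x-ranges of columns.

    page: binary image as rows of 0/1 values, where 1 = ink.
    min_gap: minimum width, in pixels, of an all-blank vertical strip that
             counts as a gutter between columns. Narrower blank runs (for
             example, spaces between letters) do not split a column.
    The end of each range is exclusive.
    """
    if not page:
        return []
    width = len(page[0])
    # Vertical projection profile: number of ink pixels in each pixel column.
    profile = [sum(row[x] for row in page) for x in range(width)]

    columns: List[Tuple[int, int]] = []
    start: Optional[int] = None  # x where the current column began
    blank_run = 0  # length of the current run of blank pixel columns
    for x, count in enumerate(profile):
        if count > 0:
            if start is None:
                start = x
            blank_run = 0
        elif start is not None:
            blank_run += 1
            if blank_run >= min_gap:
                # Gutter found: the column ends at the first blank x of this run.
                columns.append((start, x - blank_run + 1))
                start = None
                blank_run = 0
    if start is not None:
        # Close the last column, excluding any short trailing blank run.
        columns.append((start, width - blank_run))
    return columns


if __name__ == "__main__":
    page = [
        [1, 1, 0, 0, 0, 1, 1, 0, 1, 0],
        [0, 1, 0, 0, 0, 0, 1, 1, 1, 0],
    ]
    print(find_column_boundaries(page))  # [(0, 2), (5, 9)]
```

In practice `min_gap` would have to be tuned to the scan resolution, and real pages usually need deskewing and noise removal before a projection profile is reliable.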

Patent

Automated document layout design

Kathrin Berkner, Siddharth Joshi, Edward L. Schwartz, Andrea Mariotti (Ricoh)

06 Mar 2006

TL;DR: A method and apparatus for automated document layout creation are described. The method comprises receiving a first layout of document image objects and creating a second layout of image objects subject to placement constraints corresponding to the placement of the image objects.

Abstract: A method and apparatus for automated document layout creation is disclosed. In one embodiment, the method comprises receiving a first layout of document image objects and creating a second layout of document image objects subject to placement constraints corresponding to placement of document image objects, at least one of the placement constraints being based on object content in one or more of the document image objects.

90 citations

Patent

Method and apparatus for adapting web contents to different display area dimensions

Vincent Wen-Jeng Lue

14 Jan 2004

TL;DR: A method is proposed for generating, from a single web document that is oversized for smaller target devices, a minimum set of simplified and navigable web contents, while preserving text, image, transactional and embedded presentation constraint information.

Abstract: A method is disclosed to generate, while preserving text, image, transactional and embedded presentation constraint information, a minimum set of simplified and navigable web contents from a single web document that is oversized for targeted smaller devices. The method includes a parser, a content tree builder, a document tree builder, a document simplifier, a virtual layout engine, a document partitioner, a content scaler and a markup generator. The parser generates markup and data tags from an HTML source document. The content tree builder constructs a content tree. The simplifier transforms the document tree into an intermediate one defined by a subset of XHTML tags and attributes. Layout constraints, including size, area, placement order, and column/row relationships, are calculated for partitioning and scaling the document tree into sub-document trees with assigned navigation order and hierarchical hyperlinks. A simplified HTML document is then generated with the markup generator.

90 citations
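
The Lue abstract describes partitioning and scaling a document tree into sub-document trees with an assigned navigation order. The sketch below is a hypothetical, simplified illustration of that idea, not the patented algorithm: it assumes the simplified tree has already been flattened into leaf blocks in placement order, scales down blocks wider than the target screen, greedily packs blocks into sub-documents by area, and links them in a linear previous/next order. The `Block` and `SubDocument` types and the `partition` function are invented for illustration, and hierarchical hyperlinks are not modeled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    """Hypothetical leaf block of a simplified document tree (assumption)."""
    name: str
    width: int
    height: int

    @property
    def area(self) -> int:
        return self.width * self.height


@dataclass
class SubDocument:
    """Hypothetical sub-document with linear navigation links (-1 = none)."""
    index: int
    blocks: List[Block] = field(default_factory=list)
    prev_index: int = -1
    next_index: int = -1


def partition(blocks: List[Block], screen_w: int, screen_h: int) -> List[SubDocument]:
    """Greedily pack blocks, in placement order, into screen-sized sub-documents."""
    budget = screen_w * screen_h  # area available on one target screen
    pages: List[SubDocument] = []
    current: List[Block] = []
    used = 0
    for block in blocks:
        # Scale down any block wider than the screen, preserving aspect ratio.
        if block.width > screen_w:
            scale = screen_w / block.width
            block = Block(block.name, screen_w, max(1, round(block.height * scale)))
        # Start a new sub-document when this block would exceed the area budget.
        # A single block larger than the budget still gets its own sub-document.
        if current and used + block.area > budget:
            pages.append(SubDocument(len(pages), current))
            current, used = [], 0
        current.append(block)
        used += block.area
    if current:
        pages.append(SubDocument(len(pages), current))
    # Assign a linear navigation order via previous/next links.
    for i, page in enumerate(pages):
        page.prev_index = i - 1
        page.next_index = i + 1 if i + 1 < len(pages) else -1
    return pages


if __name__ == "__main__":
    demo = [Block("A", 100, 50), Block("B", 100, 50), Block("C", 200, 100)]
    for page in partition(demo, 100, 100):
        print(page.index, [b.name for b in page.blocks], page.prev_index, page.next_index)
    # 0 ['A', 'B'] -1 1
    # 1 ['C'] 0 -1
```

Packing by area alone ignores the column/row relationships the abstract mentions; it is only meant to show how size, area, and placement order can drive the split.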

Book

Document analysis—from pixels to contents

Jürgen Schürmann, Norbert Bartneck, Thomas Bayer, Jürgen Franke, E. Mandler, Matthias Oberländer

01 Jan 1995

TL;DR: A conceptual framework is presented for document analysis, the task of converting a document's pixel representation into an equivalent knowledge network representation that holds the document's content and layout.

Abstract: The authors present a conceptual framework for solving the task of document analysis, which in essence consists of converting the document's pixel representation into an equivalent knowledge network representation holding the document's content and layout. Starting at the pixel level, they describe the formation of elementary geometric objects, on which both layout analysis and the definition of character objects are based. Character recognition maps each geometric object to its character meaning in ASCII representation. At the next level of abstraction, words are formed and verified by contextual processing. Modeled knowledge about complete documents, and about how their constituents relate to the application, forms the highest level of abstraction. The problems arising at each stage are discussed, the dependencies between the levels are illustrated with examples, and technical solutions are put forward.

89 citations
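
Schürmann et al. start at the pixel level with the formation of elementary geometric objects. One common way to form such objects, assumed here rather than taken from the source, is connected-component labeling: group touching ink pixels and keep each group's bounding box. The sketch below uses 8-connectivity on a binary image (rows of 0/1 values, 1 = ink); the function name `connected_components` and the bounding-box tuple layout are hypothetical.

```python
from collections import deque
from typing import List, Sequence, Tuple

# Bounding box as (top, left, bottom, right), all coordinates inclusive.
BBox = Tuple[int, int, int, int]


def connected_components(image: Sequence[Sequence[int]]) -> List[BBox]:
    """Hypothetical helper (assumption): group 8-connected ink pixels (value 1)
    and return one bounding box per group, in row-major discovery order."""
    h = len(image)
    w = len(image[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes: List[BBox] = []
    for y in range(h):
        for x in range(w):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited ink pixel.
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                # Visit all 8 neighbours (the pixel itself is already marked seen).
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


if __name__ == "__main__":
    image = [
        [1, 1, 0, 0],
        [0, 1, 0, 1],
        [0, 0, 0, 1],
    ]
    print(connected_components(image))  # [(0, 0, 1, 1), (1, 3, 2, 3)]
```

Choosing 8-connectivity keeps diagonally touching strokes in one object; 4-connectivity would split them, which can matter for thin or slanted characters.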

Metrics

Papers: 1,488
Citations: 35,779

No. of papers in the topic in previous years:
Year	Papers
2023	5
2022	19
2021	34
2020	19
2019	14
2018	9