Document layout analysis

About: Document layout analysis is a research topic. Over its lifetime, 1462 publications have been published within this topic, receiving 34021 citations.

Papers published on a yearly basis (chart; years covered: 1969 and 1982 to 2023)

Papers

Patent•

System and method for processing structured document

Shigehisa Kawabe¹, Setsu Kunitake¹, Ichiro Yamashita¹•Institutions (1)

Fuji Xerox¹

25 Jan 2000

TL;DR: The patent proposes synthesizing documents by extracting document components from a structured document and inserting them into, or using them to replace components of, a model document, without a script that describes the procedure.

Abstract: PROBLEM TO BE SOLVED: To perform document synthesis by extracting document components from a structured document and inserting them into, or using them to replace components of, a model document, without using a script that describes the procedure. SOLUTION: Structured documents are given extraction instructions for taking out document components, together with repetitive copying and insertion/replacement instructions. These instructions specify which document components to take out, which to copy repeatedly, and which document components (parts) are to be inserted or replaced. The instructions taken out of the multiple input structured documents are dynamically synthesized into a document processing description, which eliminates the need for a document processing description script. This saves the time and labor of managing a script separately from the original documents.

10 citations

Journal Article•DOI•

A methodology for document processing: separating text from images

Nikolaos Bourbakis¹•Institutions (1)

Technical University of Crete¹

01 Feb 2001-Engineering Applications of Artificial Intelligence

TL;DR: The methodology is based on the recognition of text characters and words for the efficient separation of text paragraphs from images, keeping their relationships for a possible reconstruction of the original page.

10 citations

Patent•

Method for layout based document zone querying

Boris Chidlovskii¹•Institutions (1)

Xerox¹

02 Jul 2010

TL;DR: In this patent, a method and a system are disclosed for querying a document collection based on the layout of only a fragment of the content of a document, specified as a query zone.

Abstract: A method and a system are disclosed for querying a document collection based on the layout of only a fragment of the content of a document, specified as a query zone. The method includes providing an index for a collection of documents. In the index, content of a document page in the collection that has been decomposed into layout blocks is indexed according to representations of the blocks and one or more geometric relations between the blocks. A query is generated which is based on representations of blocks determined to be within the query zone and geometric relations between them. This is used to query the index to retrieve pages of documents in the collection which can each be expected to include a layout zone somewhere in the page that is similar in layout to the query zone.

10 citations
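
The layout-based zone querying described in the abstract above can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration, not the patented method: the `Block` type, the coarse `relation` encoding (above/below/left/right between block centers), and the scoring by number of shared features are all hypothetical choices.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Block:
    kind: str   # hypothetical block representation, e.g. "text" or "image"
    x: float
    y: float
    w: float
    h: float


def relation(a, b):
    """Coarse position of b relative to a, based on block centers (assumed encoding)."""
    ax, ay = a.x + a.w / 2, a.y + a.h / 2
    bx, by = b.x + b.w / 2, b.y + b.h / 2
    if abs(by - ay) >= abs(bx - ax):
        return "below" if by > ay else "above"
    return "right" if bx > ax else "left"


def features(blocks):
    """Describe a group of blocks as (kind, relation, kind) triples."""
    feats = set()
    for a, b in combinations(blocks, 2):
        feats.add((a.kind, relation(a, b), b.kind))
        feats.add((b.kind, relation(b, a), a.kind))
    return feats


def build_index(pages):
    """Map each feature triple to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, blocks in pages.items():
        for feat in features(blocks):
            index[feat].add(page_id)
    return index


def in_zone(block, zone):
    """True if the block lies entirely inside zone = (x, y, w, h)."""
    zx, zy, zw, zh = zone
    return (block.x >= zx and block.y >= zy
            and block.x + block.w <= zx + zw
            and block.y + block.h <= zy + zh)


def query(index, page_blocks, zone):
    """Rank indexed pages by how many query-zone features they share."""
    zone_blocks = [b for b in page_blocks if in_zone(b, zone)]
    scores = defaultdict(int)
    for feat in features(zone_blocks):
        for page_id in index.get(feat, ()):
            scores[page_id] += 1
    return sorted(scores, key=lambda p: (-scores[p], p))
```

In this sketch a query zone is reduced to the same feature triples that the index stores, so retrieval is a lookup plus a count; a real system would need a richer block representation and tolerance for partial matches.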

Journal Article•DOI•

Image-based logical document structure recognition

Grzegorz Kamola¹, Michał Spytkowski¹, Mariusz Paradowski¹, Urszula Markowska-Kaczmar¹•Institutions (1)

Wrocław University of Technology¹

01 Aug 2015-Pattern Analysis and Applications

TL;DR: The results of the proposed method for paragraph structure recognition are comparable to the referenced methods which offer segmentation only.

Abstract: The paper presents a complete solution for recognition of textual and graphic structures in various types of documents acquired from the Internet. In the proposed approach, the document structure recognition problem is divided into sub-problems. The first one is localizing logical structure elements within the document. The second one is recognizing segmented logical structure elements. The input to the method is an image of a document page; the output is an XML file containing all graphic and textual elements included in the document, preserving the reading order of document blocks. This file contains information about the identity and position of all logical elements in the document image. The paper describes all details of the proposed method and shows the results of the experiments validating its effectiveness. The results of the proposed method for paragraph structure recognition are comparable to the referenced methods which offer segmentation only.

10 citations
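
The output stage described in the abstract above (an XML file listing each logical element with its identity and position, in reading order) might look roughly like the following Python sketch. The element and attribute names (`page`, `element`, `order`, `type`) and the top-to-bottom, left-to-right ordering rule are assumptions for illustration; the paper's actual schema and ordering method are not given here.

```python
import xml.etree.ElementTree as ET


def to_xml(elements):
    """Serialize recognized elements to XML in an assumed reading order.

    elements: list of dicts with 'kind', 'x', 'y', 'w', 'h' and optional 'text'.
    Reading order is approximated by sorting on (y, x); this is a
    simplification that only suits single-column pages.
    """
    root = ET.Element("page")
    ordered = sorted(elements, key=lambda e: (e["y"], e["x"]))
    for order, e in enumerate(ordered, start=1):
        node = ET.SubElement(root, "element", {
            "order": str(order),
            "type": e["kind"],
            "x": str(e["x"]),
            "y": str(e["y"]),
            "w": str(e["w"]),
            "h": str(e["h"]),
        })
        node.text = e.get("text", "")
    return ET.tostring(root, encoding="unicode")
```

Multi-column layouts would need a proper reading-order analysis before serialization; the sort key is only a placeholder for that step.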

Patent•

Printing and validation of self validating security documents

Jonathan Scott Carr, Burt W. Perry, Geoffrey B. Rhoads

13 Nov 1999

TL;DR: The patent presents security documents whose fields carry both hidden and visual information, together with an automatic validation system that reads multiple fields on the document and also automatically detects information about the user.

Abstract: Security documents which have multiple fields, each of which contains information that is perceptible in more than one way. One field can contain a visually perceptible image (23, 24, 25) and a digital watermark (22) that can be detected when the image is scanned (302) and processed, another field can contain machine readable OCR text (24) that can be read by both a human and by a programmed computer, and still another field can contain watermark data (22). Documents are produced by beginning with a template (21) which defines the placement of elements on the document and the interrelationships between hidden and visual information on the document. Pictures, graphics and digital data are extracted from a data bank, and watermark data is embedded (27) in the pictures and graphics as appropriate. An automatic validation system (312) of the present invention reads multiple fields on the document, and it also automatically detects information about the user.

10 citations


Performance

Metrics

Papers: 1,488

Citations: 35,779

No. of papers in the topic in previous years
Year	Papers
2023	5
2022	19
2021	34
2020	19
2019	14
2018	9
