Document layout analysis

About: Document layout analysis is a research topic. Over its lifetime, 1,462 publications have been published within this topic, receiving 34,021 citations.

Papers published on a yearly basis (chart not reproduced; the year axis lists 1969 and 1982 through 2023)

Papers

Proceedings Article

Using local features for efficient layout analysis of ancient manuscripts

Angelika Garz¹, Robert Sablatnig¹, Markus Diem¹

Vienna University of Technology¹

01 Aug 2011

TL;DR: A binarization-free layout analysis method for ancient manuscripts is proposed, which identifies and localizes layout entities exploiting their structural similarities on the local level.

Abstract: A binarization-free layout analysis method for ancient manuscripts is proposed, which identifies and localizes layout entities exploiting their structural similarities on the local level. Hence, the textual entities are disassembled into segments, and a part-based detection is done which employs local gradient features known from the field of object recognition, the Scale Invariant Feature Transform (SIFT), to describe these structures. Layout analysis is the first step in the process of document understanding; it identifies regions of interest and, hence, serves as input for other algorithms such as Optical Character Recognition (OCR). Moreover, the document layout allows scholars to establish the spatio-temporal origin, authenticate, or index a document. The layout entities considered in this approach include the body text, embellished initials, plain initials and headings.

8 citations
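
The abstract above describes text entities by local gradient features (SIFT). As a rough illustration of the underlying idea only, the sketch below computes a magnitude-weighted histogram of gradient orientations over a small grayscale patch. This is not SIFT itself (there is no scale-space keypoint detection, no 4x4 spatial grid, and no Gaussian weighting) and it is not the authors' code; the function name `orientation_histogram` and the 8-bin choice are assumptions made for this example.

```python
import math

def orientation_histogram(patch, bins=8):
    """Magnitude-weighted histogram of gradient orientations.

    `patch` is a 2D list of grayscale values. Border pixels are skipped
    because central differences need both neighbours.
    Hypothetical helper for illustration, not a SIFT implementation.
    """
    rows, cols = len(patch), len(patch[0])
    hist = [0.0] * bins
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]  # horizontal central difference
            dy = patch[y + 1][x] - patch[y - 1][x]  # vertical central difference
            mag = math.hypot(dx, dy)
            if mag == 0:
                continue  # flat region, no orientation to record
            angle = math.atan2(dy, dx) % (2 * math.pi)
            idx = int(angle / (2 * math.pi) * bins) % bins
            hist[idx] += mag
    total = sum(hist)
    # Normalise so patches with different contrast become comparable.
    return [h / total for h in hist] if total else hist
```

A patch containing a single vertical stroke edge puts all of its weight into one bin, whereas ornamented regions such as embellished initials would be expected to spread weight across many orientations.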

Patent

A means for document security tracking

William John Bailey, Simon Jeffrey Pringle, John Karl Atkinson

17 Feb 2004

TL;DR: A document coding system is described that lets a single person, company or organisation track leaked or copied confidential documents, issued to different departments or associates within an organisation, back to the department or person from whom the non-approved copy originated.

Abstract: The invention described here consists of a document coding system that will give a single person, company or organisation the ability to track leaked or copied confidential documents, issued to different departments or associates within an organisation, back to the department or person from whom the non-approved document copy originated. It will give each printed copy of a document a unique fingerprint. The invention described herein may achieve this by encoding information in a word-processed document using subtle changes in font, spacing and page layout, for example, to reflect the time, user and printer information. This will provide a way of tracking a document to the time and place of creation. The invention includes means either for automatically decoding information in documents, for example by using optical character recognition (OCR) or document image analysis, or for providing a visual aid to assist manual document decoding.

8 citations
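
The patent abstract above mentions encoding information through subtle changes in spacing. As a toy illustration of that general idea, and not the patented scheme, the sketch below hides a short bit string in the gaps between words: one space stands for 0 and two spaces for 1. The function names `embed_bits` and `extract_bits` and the one-versus-two-space convention are assumptions made for this example.

```python
def embed_bits(text, bits):
    """Hide bits in inter-word gaps: one space encodes 0, two spaces encode 1.

    Hypothetical scheme for illustration only. Gaps beyond the payload
    are written as single spaces.
    """
    words = text.split()
    if len(bits) > len(words) - 1:
        raise ValueError("not enough word gaps to carry the payload")
    out = [words[0]]
    for i, word in enumerate(words[1:]):
        bit = bits[i] if i < len(bits) else 0
        out.append(" " * (2 if bit else 1) + word)
    return "".join(out)


def extract_bits(marked, count):
    """Recover the first `count` bits by measuring the width of each gap."""
    bits = []
    gap = 0
    for ch in marked:
        if ch == " ":
            gap += 1
        elif gap:
            bits.append(1 if gap >= 2 else 0)
            gap = 0
    return bits[:count]


# Usage: tag one copy with a 4-bit recipient number (a made-up payload).
copy = embed_bits("confidential quarterly report for the finance team", [1, 0, 1, 1])
recovered = extract_bits(copy, 4)
```

A real system would need a carrier that survives printing and scanning; plain spaces are used here only because they keep the round trip easy to follow.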

Patent

Document processor and method of processing document

Kazuyuki Saito, 和之齋藤

24 Aug 2000

TL;DR: This patent addresses extracting contents such as text, pictures, and tables from an electronic document and making them integrally handleable and reusable.

Abstract: Problem to be solved: To extract contents such as a 'text', a 'picture' and a 'table' from an electronic document and to make them integrally handleable and reusable. Solution: In a document processor that processes the electronic document, an electronic document preparation part 103 analyzes the layout of picture data, divides it into areas with prescribed attributes, and prepares the electronic document 104, which includes the contents of each divided area together with a designated attribute so that the contents can be extracted. A contents detection part 109 detects the contents in the electronic document 104, and a content management part 110 registers and manages the detected contents based on the information indicating the attribute.

8 citations

Patent

Detecting content-rich text

Einat Amitay¹, Nadav Har'El¹

IBM¹

19 Jan 2005

TL;DR: A method is presented for finding content-rich text in a document by identifying areas of narrative in the document.

Abstract: A method includes finding content-rich text in a document by identifying areas of narrative in the document. An apparatus includes a detector and a content-rich text indicator. The detector detects linguistic parameters which characterize narrative text in an input document and the content-rich text indicator provides the locations of narrative text in the input document.

8 citations
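
The abstract above speaks of linguistic parameters that characterize narrative text without naming them. Purely as an assumption for illustration, the sketch below uses one plausible signal, the share of common function words in a block, together with a minimum length. The stop-word list, the thresholds, and the names `stop_word_ratio` and `narrative_blocks` are all hypothetical and are not taken from the patent.

```python
import re

# A tiny, hand-picked stop-word list (assumption; real lists are much longer).
STOP_WORDS = frozenset({
    "the", "a", "an", "of", "to", "in", "and", "is", "was", "that",
    "it", "for", "on", "with", "as", "by", "at", "this", "but", "from",
})

WORD_RE = re.compile(r"[A-Za-z']+")


def stop_word_ratio(text):
    """Fraction of alphabetic tokens that are stop words (0.0 for empty text)."""
    words = WORD_RE.findall(text.lower())
    if not words:
        return 0.0
    return sum(w in STOP_WORDS for w in words) / len(words)


def narrative_blocks(blocks, min_words=8, min_ratio=0.25):
    """Return the indices of blocks that look like running prose.

    Menus, price tags and similar fragments tend to be short and to
    contain few function words, so they fall below both thresholds.
    """
    hits = []
    for i, block in enumerate(blocks):
        n_words = len(WORD_RE.findall(block))
        if n_words >= min_words and stop_word_ratio(block) >= min_ratio:
            hits.append(i)
    return hits
```

The returned indices play the role of the "locations of narrative text" mentioned in the abstract; a production detector would likely combine several signals rather than rely on a single ratio.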

Book Chapter

An Approach for Processing Mathematical Expressions in Printed Document

Bidyut B. Chaudhuri¹, Utpal Garain¹

Indian Statistical Institute¹

04 Nov 1998

TL;DR: The system consists of three main components, namely detection of mathematical expressions in a document, recognition of the symbols present in the expressions, and meaningful arrangement of the recognized symbols.

Abstract: In this paper, we propose an approach for understanding mathematical expressions in printed documents. The system consists of three main components, namely (i) detection of mathematical expressions in a document, (ii) recognition of the symbols present in the expression and (iii) meaningful arrangement of the recognized symbols. However, detection of mathematical expressions is done through recognition of symbols. Moreover, some structural features of the expressions are also used for this purpose. For recognition of the symbols, a hybrid of feature-based and template-based recognition techniques is used. The bounding-box coordinates and the size information of the symbols help to determine the spatial relationships among the symbols. A set of predefined grammar rules is used to form the meaningful symbol groups to properly arrange the symbols. Experiments conducted using these approaches on a large number of documents show high accuracy.

8 citations
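
The abstract above notes that bounding-box coordinates and symbol sizes help determine spatial relationships among symbols. A minimal sketch of that kind of rule follows, assuming image coordinates with y growing downward. The `Box` type, the `relation` function and both thresholds are hypothetical choices for this example, not the authors' grammar rules.

```python
from dataclasses import dataclass


@dataclass
class Box:
    symbol: str
    x0: float
    y0: float  # top edge (image coordinates, y grows downward)
    x1: float
    y1: float  # bottom edge

    @property
    def height(self):
        return self.y1 - self.y0

    @property
    def center_y(self):
        return (self.y0 + self.y1) / 2


def relation(base, other, size_ratio=0.8, shift=0.25):
    """Classify `other` relative to `base` as superscript, subscript or inline.

    A symbol counts as a script only if it is clearly smaller than the base
    and its vertical centre is displaced by more than `shift` base heights.
    Both thresholds are illustrative assumptions.
    """
    if base.height <= 0:
        raise ValueError("base box must have positive height")
    smaller = other.height < size_ratio * base.height
    offset = (other.center_y - base.center_y) / base.height
    if smaller and offset < -shift:
        return "superscript"
    if smaller and offset > shift:
        return "subscript"
    return "inline"


# Usage: an 'x' followed by a small raised '2', as in x squared.
x = Box("x", 0, 10, 10, 30)
two = Box("2", 11, 5, 17, 15)
kind = relation(x, two)
```

Relations like these would then feed whatever grammar groups the symbols into a structured expression.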

Network Information

Performance Metrics

Papers: 1,488
Citations: 35,779

No. of papers in the topic in previous years
Year	Papers
2023	5
2022	19
2021	34
2020	19
2019	14
2018	9
