Document layout analysis

About: Document layout analysis is a research topic. Over its lifetime, 1,462 publications have appeared within this topic, receiving 34,021 citations.

Papers published on a yearly basis (chart, 1969 to 2023)

Papers

Patent

Character recognition system for reading a document edited with handwritten symbols

Frank Alan I

30 Oct 1969

TL;DR: In this patent, a font of editing symbols is provided that can be handwritten yet recognized by a character recognition system; each symbol represents an editing instruction, and an appropriate symbol is inserted adjacent to each portion of the textual material that is in error.

Abstract: A method and apparatus for editing a document having textual material thereon. A unique font of editing symbols is provided which are handwritable yet recognizable by a character recognition system. Each of the symbols is representative of an editing instruction. An appropriate symbol is inserted adjacent each portion of the textual material which is in error. The document is then inserted into a character recognition system without requiring reproduction of the document with the alterations incorporated.
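The editing scheme described in this abstract can be illustrated with a minimal sketch that applies recognized editing symbols to a recognized character stream. The abstract does not name the symbols or define their exact semantics, so the two glyphs below (a hypothetical "delete the previous character" mark and a hypothetical "replace the previous character with the next one" mark) and the function name apply_edits are assumptions for illustration only, not the patented symbol font.

```python
# Hypothetical stand-ins for handwritten editing marks; the patent's actual
# symbol font and instruction set are not given in the abstract.
DELETE_PREV = "\u2421"   # assumed meaning: delete the preceding character
REPLACE_PREV = "\u241a"  # assumed meaning: replace the preceding character with the next one


def apply_edits(recognized: str) -> str:
    """Apply editing symbols found in an OCR character stream (illustrative sketch)."""
    out: list[str] = []
    i = 0
    while i < len(recognized):
        ch = recognized[i]
        if ch == DELETE_PREV:
            # Drop the character the mark was written next to, if any.
            if out:
                out.pop()
            i += 1
        elif ch == REPLACE_PREV:
            # Swap the preceding character for the one that follows the mark;
            # a mark at the very end of the input is simply ignored.
            if i + 1 < len(recognized):
                if out:
                    out.pop()
                out.append(recognized[i + 1])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(apply_edits("cat\u2421r"))  # car
print(apply_edits("hat\u241am"))  # ham
```

The point of the scheme, as the abstract describes it, is that the corrected text is produced by the recognition step itself, so the document never has to be retyped with the alterations incorporated.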

17 citations

Journal Article

An adaptive over-split and merge algorithm for page segmentation

Ha Dai-Ton, Nguyen Duc-Dung¹, Le Duc-Hieu¹

Vietnam Academy of Science and Technology¹

01 Sep 2016, Pattern Recognition Letters

TL;DR: Experiments show that the proposed algorithm is effective at reducing both under- and over-segmentation errors and significantly boosts performance compared with popular page segmentation algorithms.

17 citations

Proceedings Article

Text Line Extraction Using Adaptive Partial Projection for Palm Leaf Manuscripts from Thailand

Rapeeporn Chamchong¹, Chun Che Fung¹

Murdoch University¹

18 Sep 2012

TL;DR: The results suggest that, on practical palm leaf manuscript data, the proposed approach performs better at solving the line segmentation problem.

Abstract: Text line extraction is one of the critical steps in document analysis and optical character recognition (OCR) systems. The purpose of this study is to address the problem of text line extraction from ancient Thai manuscripts written on palm leaves, using an Adaptive Partial Projection (APP) technique that integrates a modified partial projection and a smoothed histogram with recursion. The proposed approach was compared with a Modified Partial Projection (MPP), looking at vowel analysis and touching components of two consecutive lines. The results suggest that, on practical palm leaf manuscript data, the proposed approach performs better at solving the line segmentation problem.
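The general idea behind projection-based line extraction can be shown with a small sketch: count ink pixels per row (optionally within a vertical strip, which is what makes a projection "partial"), smooth the resulting histogram, and treat runs of rows above a threshold as text lines. This is not the authors' APP algorithm; the recursion, vowel analysis, and handling of touching components are omitted, and the function names, the moving-average smoothing, and the default threshold are assumptions.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def smoothed_profile(img: Image, x0: int, x1: int, window: int = 3) -> List[float]:
    """Row-wise ink counts over columns [x0, x1), smoothed by a centred moving average."""
    raw = [sum(row[x0:x1]) for row in img]
    half = window // 2
    smoothed = []
    for y in range(len(raw)):
        lo, hi = max(0, y - half), min(len(raw), y + half + 1)
        smoothed.append(sum(raw[lo:hi]) / (hi - lo))
    return smoothed


def line_bands(profile: List[float], threshold: float) -> List[Tuple[int, int]]:
    """(first_row, end_row_exclusive) for every run of rows whose value exceeds threshold."""
    bands: List[Tuple[int, int]] = []
    start = None
    for y, value in enumerate(profile):
        if value > threshold and start is None:
            start = y
        elif value <= threshold and start is not None:
            bands.append((start, y))
            start = None
    if start is not None:
        bands.append((start, len(profile)))
    return bands


def extract_lines(img: Image, strips: int = 2, window: int = 3,
                  threshold: float = 0.5) -> List[List[Tuple[int, int]]]:
    """Partial projection: find line bands independently in each vertical strip."""
    width = len(img[0])
    step = -(-width // strips)  # ceiling division so every column is covered
    return [
        line_bands(smoothed_profile(img, x0, min(x0 + step, width), window), threshold)
        for x0 in range(0, width, step)
    ]


page = [[1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0],
        [0, 0, 0, 0], [1, 1, 1, 1], [1, 1, 1, 1]]
print(extract_lines(page, strips=1, window=1))  # [[(0, 2), (5, 7)]]
print(extract_lines(page, strips=1, window=3))  # [[(0, 3), (4, 7)]]
```

A wider smoothing window tends to bridge small vertical gaps inside a line, such as those between a base character and marks written above or below it, at the cost of widening each band; projecting strip by strip lets line boundaries differ across the page width.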

16 citations

Patent

Apparatus, method, and computer program for analyzing document layout

Hiroaki Takebe¹, Katsuhito Fujimoto¹, Satoshi Naoi¹

Fujitsu¹

05 Jul 2005

TL;DR: In this patent, a plurality of different extraction conditions are stored in an extraction condition memory for use in extracting text blocks from a given document image; in accordance with those extraction conditions, a text block extractor extracts a plurality of sets of text blocks.

Abstract: A document layout analysis program capable of extracting an appropriate set of text blocks from a given document image, even when the document layout is so complicated that conventional extraction methods with a single extraction condition would not work well. A plurality of different extraction conditions are stored in an extraction condition memory for use in extracting text blocks from a given document image. In accordance with those extraction conditions, a text block extractor extracts a plurality of sets of text blocks from the document image. A text block consolidator produces a consolidated set of text blocks by performing character recognition on each extracted text block, evaluating the validity of each text block based on the result of the character recognition, and selecting the most valid text blocks from among the plurality of sets of text blocks.
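The consolidation step can be sketched as follows, under stated assumptions: every candidate block from every extraction condition is pooled, blocks are ranked by a validity score, and the highest-scoring blocks that do not overlap an already selected block are kept. The abstract says validity is judged from character recognition results but does not give a formula, so validity here is a caller-supplied function (for example, a hypothetical mean OCR confidence), and the greedy non-overlap rule, the Block type, and the min_validity cutoff are assumptions, not the patented method.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Block:
    """Axis-aligned text block with half-open pixel bounds [x0, x1) x [y0, y1)."""
    x0: int
    y0: int
    x1: int
    y1: int


def overlaps(a: Block, b: Block) -> bool:
    """True if the two half-open rectangles share any pixel."""
    return a.x0 < b.x1 and b.x0 < a.x1 and a.y0 < b.y1 and b.y0 < a.y1


def consolidate(candidate_sets: Sequence[Sequence[Block]],
                validity: Callable[[Block], float],
                min_validity: float = 0.0) -> List[Block]:
    """Greedily keep the most valid, mutually non-overlapping blocks from all candidate sets."""
    # Pool blocks from every extraction condition, dropping exact duplicates but keeping order.
    pool = list(dict.fromkeys(blk for blocks in candidate_sets for blk in blocks))
    chosen: List[Block] = []
    for blk in sorted(pool, key=validity, reverse=True):
        if validity(blk) < min_validity:
            break  # every remaining block scores even lower
        if not any(overlaps(blk, kept) for kept in chosen):
            chosen.append(blk)
    return chosen


merged = Block(0, 0, 100, 50)                             # condition A: two columns fused
left, right = Block(0, 0, 48, 50), Block(52, 0, 100, 50)  # condition B: columns split
scores = {merged: 0.4, left: 0.9, right: 0.8}             # made-up validity values
print(consolidate([[merged], [left, right]], scores.__getitem__))
# [Block(x0=0, y0=0, x1=48, y1=50), Block(x0=52, y0=0, x1=100, y1=50)]
```

Because blocks are compared across sets rather than whole sets against each other, this sketch can mix blocks from different extraction conditions in one result, which is one reading of "selecting the most valid text blocks from among the plurality of sets".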

16 citations

Journal Article

Arabic document layout analysis

Amany M. Hesham¹, Mohsen A. Rashwan¹, Hassanin M. Al-Barhamtoshy², Sherif M. Abdou¹, Amr Badr¹, Ibrahim Farag¹

Cairo University¹, King Abdulaziz University²

08 Feb 2017, Pattern Analysis and Applications

TL;DR: The proposed system was evaluated against two other systems that represent the best available tools for Arabic document analysis; the results show that it works well on multi-font and multi-size documents with a variety of layouts, and even on some historical documents.

Abstract: Document layout analysis is a key step in the process of converting document images into text. Arabic script is cursive and written in different styles, which causes challenges in the analysis of Arabic text documents. In this paper, we introduce an approach for Arabic document layout analysis. In that approach, the document is segmented into a set of zones using morphological operations. The segmented zones are classified as text or non-text using a support vector machine classifier. The features used in zone classification are a combination of texture-based features and connected-component-based features. The size of the texture-based feature vector is reduced using a genetic algorithm. Classified text zones are clustered into lines using an adaptive sample set clustering algorithm. Each segmented line is segmented into words by clustering inter-word and intra-word spaces. The proposed system was evaluated against two other systems that represent the best available tools for Arabic document analysis, and the evaluation results show that the proposed system works well on multi-font and multi-size documents with a variety of layouts, and even on some historical documents.
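The first stage, segmenting the page into zones with morphological operations, can be illustrated by dilating the ink so that nearby characters fuse, then labelling connected regions and reporting each region's tight bounding box. The abstract does not specify which operations or structuring elements the authors use, so the rectangular dilation, the radii ry and rx, and the function names are assumptions; the later SVM classification, genetic feature selection, and line and word clustering are not shown.

```python
from collections import deque
from typing import List, Tuple

Image = List[List[int]]          # binary image: 1 = ink, 0 = background
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def dilate(img: Image, ry: int, rx: int) -> Image:
    """Binary dilation with a (2*ry+1) x (2*rx+1) rectangular structuring element."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                for yy in range(max(0, y - ry), min(h, y + ry + 1)):
                    for xx in range(max(0, x - rx), min(w, x + rx + 1)):
                        out[yy][xx] = 1
    return out


def zones(img: Image, ry: int = 1, rx: int = 2) -> List[Box]:
    """Label 4-connected regions of the dilated image; return tight boxes of the original ink."""
    mask = dilate(img, ry, rx)
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = h, w, -1, -1
            while queue:
                cy, cx = queue.popleft()
                if img[cy][cx]:  # only original ink pixels define the box
                    top, left = min(top, cy), min(left, cx)
                    bottom, right = max(bottom, cy), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


page = [[0] * 10 for _ in range(5)]
for x in (0, 1, 3, 4):
    page[0][x] = 1              # two nearby "words" on the top row
page[4][7] = page[4][8] = 1     # a separate mark lower down
print(zones(page))              # [(0, 0, 0, 4), (4, 7, 4, 8)]
print(zones(page, ry=0, rx=0))  # [(0, 0, 0, 1), (0, 3, 0, 4), (4, 7, 4, 8)]
```

With rx larger than the inter-word gap, the two words on the top row merge into one zone; with no dilation, every connected piece of ink becomes its own zone, which shows how the structuring element size controls zone granularity.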

16 citations

Metrics

1,488 papers

35,779 citations

No. of papers in the topic in previous years
Year	Papers
2023	5
2022	19
2021	34
2020	19
2019	14
2018	9