Topic

Document layout analysis

About: Document layout analysis is a research topic. Over its lifetime, 1,462 publications have been published on this topic, receiving 34,021 citations.

Papers

Patent

High-speed document verification system

William N Stratigos, Stephen R Landau

02 Aug 1991

TL;DR: A high-speed document verification system includes a document printed with a pattern having a predetermined arrangement of different reflectivity due to varying densities, line resolutions, or fluorescence.

Abstract: A high-speed document verification system includes a document printed with a pattern having a predetermined arrangement of different reflectivity due to varying densities, line resolutions, or fluorescence. The arrangement represents information about the document. The document is fed into a high-speed document scanner that is sensitive to the varying ink densities or line resolutions. The scanner produces a graphic image of the document, and this image, or a graphic file of it, is checked to see whether the proper pattern exists. A comparison unit, such as an optical character recognition system, may be used to compare the scanned document's image with the known density arrangements of valid documents to determine what information, if any, the arrangement represents. Alternatively, the graphic image may be sent to an operator's workstation for visual checking instead of being compared by the comparison unit, or it may be sent to the operator after the comparison unit has rejected it.

67 citations
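The abstract describes the comparison step only in general terms. As a rough illustration, the Python sketch below assumes the printed pattern is a rectangular grid of cells, measures the mean ink density of each cell in a grayscale scan, quantizes it into a few density classes, and looks the result up in a table of known valid arrangements. A failed lookup returns `None`, standing in for routing the image to an operator. The grid size, the number of density levels, and the names `cell_densities`, `quantize`, and `verify` are hypothetical and not taken from the patent.

```python
from typing import Dict, List, Optional, Tuple

# Grayscale image as rows of pixels: 0 = black ink, 255 = white paper.
Image = List[List[int]]

def cell_densities(image: Image, rows: int, cols: int) -> List[float]:
    """Mean darkness (0.0 = white, 1.0 = black) of each cell in a rows x cols grid.

    Assumption: the pattern fills the whole image as an evenly spaced grid.
    """
    h, w = len(image), len(image[0])
    ch, cw = h // rows, w // cols
    densities = []
    for r in range(rows):
        for c in range(cols):
            pixels = [image[y][x]
                      for y in range(r * ch, (r + 1) * ch)
                      for x in range(c * cw, (c + 1) * cw)]
            densities.append(1.0 - sum(pixels) / (255.0 * len(pixels)))
    return densities

def quantize(densities: List[float], levels: int = 3) -> Tuple[int, ...]:
    """Map each density to one of `levels` discrete ink-density classes."""
    return tuple(min(int(d * levels), levels - 1) for d in densities)

def verify(image: Image, known: Dict[Tuple[int, ...], str],
           rows: int = 2, cols: int = 2) -> Optional[str]:
    """Return the information encoded by a known valid arrangement, or None
    (hypothetically: send the image to an operator's workstation)."""
    key = quantize(cell_densities(image, rows, cols))
    return known.get(key)
```

For example, a 4x4 scan whose quadrants are black, white, mid-gray, and black maps to the key `(2, 0, 1, 2)` with three density levels; any arrangement missing from the table is rejected.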

Journal Article

Design of a mathematical expression understanding system

Hsi-Jian Lee¹, Jiumn-Shine Wang¹

National Chiao Tung University¹

01 Mar 1997, Pattern Recognition Letters

TL;DR: A system for segmenting and understanding text and mathematical expressions in a document can be divided into six stages: page segmentation and labeling, character segmentation, feature extraction, character recognition, expression formation, and error correction and expression extraction.

66 citations
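The TL;DR describes the system as six sequential stages. A minimal sketch of that architecture, assuming each stage is a function that takes the previous stage's output and returns a new state, is shown below. The stages are hypothetical placeholders that only record their names; they are not the authors' algorithms, and `placeholder_stage` and `run_pipeline` are illustrative names.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Stage = Callable[[State], State]

# The six stages named in the TL;DR, in order.
STAGE_NAMES = [
    "page segmentation and labeling",
    "character segmentation",
    "feature extraction",
    "character recognition",
    "expression formation",
    "error correction and expression extraction",
]

def placeholder_stage(name: str) -> Stage:
    """Hypothetical stand-in for a real stage: it appends its name to a trace
    and passes every other field through unchanged."""
    def stage(state: State) -> State:
        return {**state, "trace": state.get("trace", []) + [name]}
    return stage

def run_pipeline(stages: List[Stage], state: State) -> State:
    """Apply each stage to the output of the previous one."""
    for stage in stages:
        state = stage(state)
    return state

PIPELINE = [placeholder_stage(name) for name in STAGE_NAMES]
```

Replacing a placeholder with a real implementation changes only one list entry, which is the main benefit of keeping the stages independent.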

Journal Article

Document cleanup using page frame detection

Faisal Shafait¹, Joost van Beusekom², Daniel Keysers¹, Thomas M. Breuel²

German Research Centre for Artificial Intelligence¹, Kaiserslautern University of Technology²

08 Oct 2008, International Journal on Document Analysis and Recognition

TL;DR: The paper uses a geometric matching algorithm to find the optimal page frame of structured documents (journal articles, books, magazines) by exploiting their text alignment property, and shows that removing characters outside the computed page frame reduces the OCR error rate.

Abstract: When a page of a book is scanned or photocopied, textual noise (extraneous symbols from the neighboring page) and/or non-textual noise (black borders, speckles, ...) appear along the border of the document. Existing document analysis methods handle non-textual noise reasonably well, whereas textual noise remains a major issue for document analysis systems. Textual noise may result in unwanted text in optical character recognition (OCR) output that must be removed afterwards. Existing document cleanup methods try to explicitly detect and remove marginal noise. This paper presents a new perspective on document image cleanup: detecting the page frame of the document. The goal of page frame detection is to find the actual page content area, ignoring marginal noise along the page border. We use a geometric matching algorithm to find the optimal page frame of structured documents (journal articles, books, magazines) by exploiting their text alignment property. We evaluate the algorithm on the UW-III database. The results show that the error rates are below 4% for each of the performance measures used. Further tests were run on a dataset of magazine pages and on a set of camera-captured document images. To demonstrate the benefits of page frame detection in practical applications, we chose OCR and layout-based document image retrieval as sample applications. Experiments using a commercial OCR system show that removing characters outside the computed page frame reduces the OCR error rate from 4.3% to 1.7% on the UW-III dataset. Using page frame detection in a layout-based document image retrieval application decreases the retrieval error rates by 30%.

66 citations
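Once a page frame has been computed, the cleanup step that the abstract evaluates with OCR amounts to discarding characters that fall outside it. The sketch below takes the frame as given (it does not implement the paper's geometric matching algorithm) and assumes a character is kept when the center of its bounding box lies inside the frame. That criterion and the names `Box`, `center_inside`, and `clean_characters` are illustrative assumptions.

```python
from typing import List, NamedTuple, Tuple

class Box(NamedTuple):
    """Axis-aligned rectangle in page coordinates (x0, y0) to (x1, y1)."""
    x0: float
    y0: float
    x1: float
    y1: float

def center_inside(box: Box, frame: Box) -> bool:
    """Assumed keep criterion: the box's center lies within the frame."""
    cx, cy = (box.x0 + box.x1) / 2, (box.y0 + box.y1) / 2
    return frame.x0 <= cx <= frame.x1 and frame.y0 <= cy <= frame.y1

def clean_characters(chars: List[Tuple[str, Box]], frame: Box) -> List[Tuple[str, Box]]:
    """Drop characters (e.g. marginal noise from a neighboring page)
    whose boxes fall outside the precomputed page frame."""
    return [(ch, box) for ch, box in chars if center_inside(box, frame)]
```

A character that straddles the frame edge is kept when its center is inside, which avoids cutting off glyphs at the boundary of the text block; a stricter variant could require the whole box to be inside.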

Patent

Device and method for layout of a structured document using multi-column areas

Naoki Hayashi¹, Kazuo Saito¹, Minoru Ikeda¹

Fuji Xerox¹

08 Nov 1994

TL;DR: A document layout processing device for laying out structured documents is disclosed. The device stores a logical structure of a document that has a preselected specific page format with a plurality of columns, document contents corresponding to each component of the logical structure, and layout directive information indicating whether each component should be laid out in a single column or in a multi-column area.

Abstract: A document layout processing device for the layout of a structured document is disclosed. The device stores a logical structure of a document, the logical structure having a preselected specific page format with a plurality of columns; document contents corresponding to each of the components of the logical structure; and layout directive information indicating whether the components of the logical structure should be laid out in a single column or in a multi-column area. A content layout method lays out the document in one of the columns or in the multi-column area according to the logical structure while referring to the layout directive information. A method of using the device to generate a multi-column area that extends over a number of columns, including a specific column, is also disclosed.

66 citations
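The layout directive can be illustrated by computing the horizontal extent of a component on a page with a fixed number of columns. In the hypothetical sketch below, a single-column directive corresponds to a span of one column, and a multi-column area is a span of several consecutive columns starting at a given column. `PageFormat`, `horizontal_extent`, equal column widths, and the gutter parameter are assumptions, not details from the patent.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class PageFormat:
    left: float    # x-coordinate where the text area starts
    width: float   # total width of the text area
    columns: int   # number of equal-width columns (assumed)
    gutter: float  # space between adjacent columns

    @property
    def column_width(self) -> float:
        return (self.width - self.gutter * (self.columns - 1)) / self.columns

def horizontal_extent(fmt: PageFormat, first_column: int, span: int = 1) -> Tuple[float, float]:
    """x-range of a component laid out in `span` consecutive columns starting
    at `first_column` (0-based). span == 1 models a single-column directive;
    span > 1 models a multi-column area that includes `first_column`."""
    if first_column < 0 or span < 1 or first_column + span > fmt.columns:
        raise ValueError("area does not fit in the page format")
    x0 = fmt.left + first_column * (fmt.column_width + fmt.gutter)
    # A spanning area absorbs the gutters between the columns it covers.
    x1 = x0 + span * fmt.column_width + (span - 1) * fmt.gutter
    return (x0, x1)
```

With a 500-unit text area starting at x = 50, three columns, and a 10-unit gutter, each column is 160 units wide: column 1 alone spans 220 to 380, and a multi-column area covering all three columns spans the full text area, 50 to 550.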

Proceedings Article

Table structure recognition and its evaluation

Jianying Hu¹, Ramanujan S. Kashi¹, Daniel P. Lopresti², Gordon Wilfong²

Avaya¹, Alcatel-Lucent²

21 Dec 2000

TL;DR: The paper describes a new paradigm, 'random graph probing,' for comparing the results returned by the recognition system with the representation created during ground-truthing; the authors suggest that probing could also be applied to other document recognition tasks and perhaps even to other computer vision problems.

Abstract: Tables are an important means of communicating information in written media, and understanding such tables is a challenging problem in document layout analysis. In this paper we describe a general solution to the problem of recognizing the structure of a detected table region. First, hierarchical clustering is used to identify columns, and then spatial and lexical criteria are used to classify headers. We also address the problem of evaluating table structure recognition. Our model is based on a directed acyclic attribute graph, or table DAG. We describe a new paradigm, 'random graph probing,' for comparing the results returned by the recognition system with the representation created during ground-truthing. Probing is in fact a general concept that could be applied to other document recognition tasks and perhaps even to other computer vision problems as well. © (2000) COPYRIGHT SPIE--The International Society for Optical Engineering.

65 citations
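The abstract states that hierarchical clustering is used to identify table columns but gives no details. One common way to realize this, assumed here rather than taken from the paper, is single-linkage agglomerative clustering of the horizontal centers of the words in the table region, stopping once the closest clusters are farther apart than a gap threshold. In one dimension this reduces to sorting the positions and splitting at every gap larger than the threshold, which is what the sketch below does; `cluster_columns` and `max_gap` are hypothetical names.

```python
from typing import List

def cluster_columns(x_centers: List[float], max_gap: float) -> List[List[float]]:
    """Single-linkage clustering of 1-D word positions, cut at max_gap.

    In 1-D, single-linkage clusters under a distance cutoff are exactly the
    runs of sorted points whose consecutive gaps are <= max_gap.
    """
    if not x_centers:
        return []
    xs = sorted(x_centers)
    clusters = [[xs[0]]]
    for prev, cur in zip(xs, xs[1:]):
        if cur - prev > max_gap:
            clusters.append([cur])   # gap too wide: start a new column
        else:
            clusters[-1].append(cur)  # close enough: same column
    return clusters
```

For word centers at 11, 12, 15, 108, 112, 205, 207, and 210 with `max_gap=30`, the function returns three clusters, one per column. In practice the threshold would need tuning to the font size and table spacing, and header classification would be a separate step.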

Metrics

1,488 papers

35,779 citations

Number of papers on the topic in previous years
Year	Papers
2023	5
2022	19
2021	34
2020	19
2019	14
2018	9
