Topic

Document layout analysis

About: Document layout analysis is a research topic. Over its lifetime, 1,462 publications have been published on this topic, receiving 34,021 citations.

Papers

Journal Article

Historical document layout analysis using anisotropic diffusion and geometric features

Galal M. BinMakhashen¹, Sabri A. Mahmoud¹

King Fahd University of Petroleum and Minerals¹

01 Sep 2020, International Journal on Digital Libraries

TL;DR: A learning-free, hybrid document layout analysis method for handwritten historical manuscripts, with promising main-content segmentation quality that reaches a success rate of up to 98.5%.

Abstract: Several digital libraries worldwide maintain valuable historical manuscripts. Digital copies of these manuscripts are usually offered to researchers and readers in raster-image format. These images carry several kinds of degradation that may hinder automatic information retrieval solutions such as manuscript indexing, categorization and retrieval by content. In this paper, we propose a learning-free, hybrid document layout analysis method for handwritten historical manuscripts. It has two main phases: page characterization and segmentation. First, the method locates the main content using top-down whitespace analysis, employing anisotropic diffusion filtering to find whitespace. Then, it extracts template features representing the writing behavior of the manuscripts' authors. After that, moving windows scan the manuscript page to define the main-content boundaries more precisely. We evaluated the proposed method on two datasets: one is publicly available and contains 38 historical manuscript pages; the other consists of 51 historical manuscript pages collected from the online Harvard Library. Experiments on both datasets show promising main-content segmentation quality, with a success rate of up to 98.5%.

7 citations
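
The abstract names anisotropic diffusion filtering but not its exact formulation. As a minimal sketch, assuming the classic Perona-Malik scheme (an assumption; the paper may use a different variant), the filter below smooths background degradation while keeping strong ink edges, and a hypothetical `blank_rows` helper then flags rows whose mean brightness exceeds a threshold as candidate whitespace for a top-down analysis. The function names, parameter values and threshold are illustrative, not taken from the paper.

```python
import numpy as np


def _conductance(d, kappa):
    # Exponential Perona-Malik conductance: close to 1 in flat areas,
    # close to 0 across strong edges such as ink strokes.
    return np.exp(-(d / kappa) ** 2)


def perona_malik(img, n_iter=10, kappa=0.1, gamma=0.2):
    """Assumed Perona-Malik anisotropic diffusion on a grayscale page.

    img   : 2D array in [0, 1] (dark ink on bright paper)
    kappa : edge threshold (illustrative value)
    gamma : step size; <= 0.25 keeps this explicit 4-neighbour scheme stable
    """
    u = np.asarray(img, dtype=float).copy()
    for _ in range(n_iter):
        # Edge padding makes border differences zero (no flux out of the page).
        p = np.pad(u, 1, mode="edge")
        diffs = (
            p[:-2, 1:-1] - u,  # north neighbour
            p[2:, 1:-1] - u,   # south neighbour
            p[1:-1, 2:] - u,   # east neighbour
            p[1:-1, :-2] - u,  # west neighbour
        )
        u = u + gamma * sum(_conductance(d, kappa) * d for d in diffs)
    return u


def blank_rows(u, thresh=0.9):
    # Hypothetical whitespace rule: a row counts as whitespace when its
    # mean brightness exceeds `thresh` (not the paper's criterion).
    return u.mean(axis=1) > thresh
```

For example, `blank_rows(perona_malik(page))` on a page scaled to [0, 1] returns a boolean mask over rows; runs of `True` are candidate whitespace separators between content blocks.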

Proceedings Article

A comprehensive evaluation methodology for noisy historical document recognition techniques

Nikolaos Stamatopoulos, Georgios Louloudis¹, Basilis Gatos

National and Kapodistrian University of Athens¹

23 Jul 2009

TL;DR: Experimental results show that the proposed technique saves more than 90% of the time needed to create text line, word and character segmentation ground truth.

Abstract: In this paper, we propose a new comprehensive methodology to evaluate the performance of noisy historical document recognition techniques. We aim to evaluate not only the final recognition result but also the main intermediate stages of text line, word and character segmentation. For this purpose, we efficiently create the text line, word and character segmentation ground truth, guided by the transcription of the historical documents. The proposed methodology consists of (i) a semi-automatic procedure that detects the text line, word and character segmentation ground truth regions using the correct document transcription, and (ii) the calculation of suitable evaluation metrics to measure the performance of both the final OCR result and the intermediate segmentation stages. The semi-automatic procedure for detecting the ground truth regions has been evaluated and proved efficient and time-saving. Experimental results show that the proposed technique saves more than 90% of the time needed to create the text line, word and character segmentation ground truth. An analytic experiment using a commercial OCR engine applied to a historical book is also presented.

7 citations
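
The abstract does not spell out the evaluation metrics. A scheme commonly used for text line and word segmentation evaluation counts one-to-one matches between ground truth and detected regions whose MatchScore (intersection over union of their pixel sets) reaches a threshold, then reports a detection rate (DR), a recognition accuracy (RA) and their harmonic mean (FM). The sketch below assumes that scheme; the function names and the threshold `t = 0.9` are hypothetical, and the paper's exact metrics may differ.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]  # (row, col)


def match_score(g: Set[Pixel], r: Set[Pixel]) -> float:
    # Intersection over union of two regions given as pixel sets.
    union = g | r
    return len(g & r) / len(union) if union else 0.0


def segmentation_metrics(gt: List[Set[Pixel]], res: List[Set[Pixel]], t: float = 0.9):
    """Return (DR, RA, FM) from one-to-one matches with MatchScore >= t.

    Each result region may be matched at most once (greedy, in list order).
    """
    used = set()
    o2o = 0  # number of one-to-one matches
    for g in gt:
        for j, r in enumerate(res):
            if j not in used and match_score(g, r) >= t:
                used.add(j)
                o2o += 1
                break
    dr = o2o / len(gt) if gt else 0.0    # detection rate
    ra = o2o / len(res) if res else 0.0  # recognition accuracy
    fm = 2 * dr * ra / (dr + ra) if dr + ra else 0.0
    return dr, ra, fm
```

With one of two text lines detected exactly and the other only half covered, DR, RA and FM are all 0.5; the same counting applies unchanged at the word and character levels.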

Proceedings Article

A geometric approach for accurate and efficient performance evaluation of layout analysis methods

D. Bridson¹, Apostolos Antonacopoulos¹

University of Salford¹

01 Dec 2008

TL;DR: This paper presents an improved approach that uses polygons to accurately describe both segmentation and ground truth regions, which are then compared efficiently using a rectangular interval-based decomposition.

Abstract: A major component of performance evaluation of layout analysis methods is the comparison of ground truth regions with regions resulting from segmentation methods. The description of document regions must be both accurate in describing complex layouts and efficient in view of the large number of region comparisons that must be performed. Previous approaches favour either accuracy or efficiency, resulting in an impractical compromise. This paper presents an improved approach that uses polygons to accurately describe both segmentation and ground truth regions. Polygonal descriptions are efficiently compared using a rectangular interval based decomposition. This approach has been validated using data from the ICDAR page segmentation competitions.

7 citations
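
As a rough illustration of how a rectangular decomposition makes polygon comparison cheap, the sketch below splits an isothetic (axis-aligned) polygon into horizontal bands at its vertex y-coordinates and pairs the vertical edges crossing each band with an even-odd rule. Because the rectangles of one region are disjoint, the overlap area of two regions is the sum of pairwise rectangle intersections. This is one plausible reading of "rectangular interval based decomposition", not the authors' implementation; it assumes simple rectilinear polygons, and all names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def decompose(poly: List[Point]) -> List[Rect]:
    # Collect vertical edges as (x, y_low, y_high).
    n = len(poly)
    vertical = []
    for i in range(n):
        (x0, y0), (x1, y1) = poly[i], poly[(i + 1) % n]
        if x0 == x1 and y0 != y1:
            vertical.append((x0, min(y0, y1), max(y0, y1)))
    ys = sorted({y for _, y in poly})
    rects = []
    for ya, yb in zip(ys, ys[1:]):
        # Edges spanning this band, sorted by x; consecutive pairs bound
        # the interior intervals (even-odd rule).
        xs = sorted(x for x, lo, hi in vertical if lo <= ya and hi >= yb)
        for xa, xb in zip(xs[0::2], xs[1::2]):
            rects.append((xa, ya, xb, yb))
    return rects


def area(rects: List[Rect]) -> float:
    return sum((x1 - x0) * (y1 - y0) for x0, y0, x1, y1 in rects)


def overlap_area(a: List[Rect], b: List[Rect]) -> float:
    # Rectangles within each decomposition are disjoint, so summing the
    # pairwise intersections counts every point of the overlap once.
    total = 0.0
    for ax0, ay0, ax1, ay1 in a:
        for bx0, by0, bx1, by1 in b:
            w = min(ax1, bx1) - max(ax0, bx0)
            h = min(ay1, by1) - max(ay0, by0)
            if w > 0 and h > 0:
                total += w * h
    return total
```

For an L-shaped region `[(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]` the decomposition yields two rectangles of total area 12, and its overlap with the square from (0, 0) to (3, 3) is 8. Overlap areas like this are the basic quantity from which region-matching measures are built.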

Combining Color and Layout Features for the Identification of Low-resolution Documents

Ardhendu Behera, Denis Lalanne, Rolf Ingold

15 Mar 2005

TL;DR: The combined color and layout features are arranged in a symbolic file, unique to each document, called the document's visual signature. The identification method first uses the color information in the signatures to narrow the search space to documents with a similar color distribution, and then selects the document with the most similar layout structure in the remaining search space.

Abstract: This paper proposes a method, combining color and layout features, for identifying documents captured with low-resolution handheld devices. On the one hand, the color density surface of the document image is estimated and represented with an equivalent ellipse; on the other, the document's shallow layout structure is computed and represented hierarchically. The combined color and layout features are arranged in a symbolic file, which is unique for each document and is called the document’s visual signature. Our identification method first uses the color information in the signatures to focus the search space on documents having a similar color distribution, and then selects the document having the most similar layout structure in the remaining search space. Finally, our experiment considers slide documents, which are often captured using handheld devices.

7 citations
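
The abstract does not say how the equivalent ellipse is computed. A common construction, assumed here, uses the second-order moments of a 2D distribution: the ellipse is centred on the mean, its axes follow the eigenvectors of the covariance matrix, and its semi-axes are `k` times the square roots of the eigenvalues (with `k = 2`, a uniformly filled ellipse is recovered exactly). The input is assumed to be pixels mapped to two colour coordinates (for example two chromaticity channels); the colour space, the function name and `k` are hypothetical.

```python
import numpy as np


def equivalent_ellipse(points, k=2.0):
    """Moment-based equivalent ellipse of 2D samples (assumed construction).

    points : (N, 2) array, e.g. pixels mapped to two colour coordinates
    Returns (center, semi_axes, angle): semi_axes as (major, minor) and the
    major-axis angle in radians, folded modulo pi.
    """
    pts = np.asarray(points, dtype=float)
    center = pts.mean(axis=0)
    # Population covariance (second-order central moments).
    cov = np.cov(pts, rowvar=False, bias=True)
    evals, evecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    semi_axes = k * np.sqrt(np.clip(evals[::-1], 0.0, None))
    major = evecs[:, 1]  # eigenvector of the largest eigenvalue
    # Eigenvector sign is arbitrary, so fold the angle modulo pi.
    angle = np.arctan2(major[1], major[0]) % np.pi
    return center, semi_axes, angle
```

Two signatures could then be compared on these five numbers (centre, semi-axes, angle) to prune the candidate set before layout matching; the distance used for that step is not specified in the abstract.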

Patent

Method and system for document interaction

Melyssa Barrett¹, Eric Christopher Lundquist, Nicholas Washburn

Parker Hannifin¹

20 Dec 2006

TL;DR: A method is described for creating a document object model that models the document objects on a document; after the model is reviewed, a code version is selected from a library of code versions.

Abstract: A method is disclosed. The method includes creating a document object model, where the document object model models document objects on a document. The document object model is reviewed, and after reviewing the document object model, a code version is selected from a library of code versions.

7 citations

Metrics

1,488 papers
35,779 citations

No. of papers in the topic in previous years
Year	Papers
2023	5
2022	19
2021	34
2020	19
2019	14
2018	9