Topic

Document layout analysis

About: Document layout analysis is a research topic. Over its lifetime, 1,462 publications have appeared within this topic, receiving 34,021 citations.

Papers

Proceedings Article

Detection and segmentation of tables and math-zones from document images

Sekhar Mandal¹, S.P. Chowdhury¹, Asit Kumar Das¹, Bhabatosh Chanda²

Indian Institute of Engineering Science and Technology, Shibpur¹, Indian Statistical Institute²

23 Apr 2006

TL;DR: A low-computation-cost algorithm is proposed for separating tables and math-zones from document images. It relies on the observation that tables have distinct columns, so the gaps between fields are substantially larger than the gaps between words in text lines.

Abstract: We propose an algorithm to separate out tables and math-zones from document images. The algorithm relies on the spatial characteristics of tables and math-zones in a document. It has been observed that tables have distinct columns, which implies that gaps between the fields are substantially larger than the gaps between the words in text lines, and that in math-zones the characters and symbols are less dense in comparison to normal text lines. These deceptively simple observations have led us to design a simple but powerful table and math-zone detection system with low computation cost.
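
The column-gap observation in this abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the box representation `(x_left, x_right)`, the function names, the `ratio` threshold of 3.0, and the rule that every gap in a row must exceed the threshold are all assumptions made for illustration.

```python
from statistics import median

def horizontal_gaps(boxes):
    """Return the gaps between consecutive boxes; each box is (x_left, x_right)."""
    ordered = sorted(boxes)
    return [max(0, nxt[0] - cur[1]) for cur, nxt in zip(ordered, ordered[1:])]

def looks_like_table_row(boxes, reference_gap, ratio=3.0):
    """Flag a line whose gaps are all much larger than a typical word gap.

    reference_gap is a typical inter-word gap measured on ordinary text;
    ratio is a hypothetical threshold, not a value from the paper.
    """
    gaps = horizontal_gaps(boxes)
    if not gaps:
        return False
    return min(gaps) >= ratio * reference_gap

# Hypothetical word boxes for an ordinary text line and for a table row.
text_line = [(0, 40), (48, 90), (97, 150), (158, 200)]
table_row = [(0, 40), (120, 160), (240, 290)]

# Use the median word gap of the text line as the reference gap.
reference_gap = median(horizontal_gaps(text_line))
```

With these sample boxes, `looks_like_table_row(table_row, reference_gap)` evaluates to true and the same call on `text_line` evaluates to false. A density measure for math-zones would need a separate rule and is not covered here.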

6 citations

Patent

Restoration of modified document to original state

Ajay Jain¹

Adobe Systems¹

31 Oct 2014

TL;DR: A modified document is scanned into digital form using an optical scanning device, and its content, including one or more annotations, is grouped into components (text, images, form fields and text boxes, and marked shapes) based on corresponding component specifications.

Abstract: Techniques are disclosed for restoring a modified document to an original state. The modified document is scanned into a digital form using an optical scanning device. The content of the modified digital document including one or more annotations is then grouped into several components, including text, images, form fields and text boxes, and marked shapes, based on corresponding component specifications. Each component is then categorized as being structured or unstructured. Structured components that correspond with representative entries in a component repository, such as text in a standard font size, weight and style, are identified as core document content. Unstructured components are identified as annotated document content or highlighted document content, depending on certain characteristics of the components. The categorized and identified components can then be presented separately or in various combinations.

6 citations

Proceedings Article

Optical font recognition using conditional random field

Aziza Satkhozhina¹, Ildus Ahmadullin², Jan P. Allebach¹

Purdue University¹, Hewlett-Packard²

10 Sep 2013

TL;DR: The Conditional Random Field (CRF) model is used to perform Optical Font Recognition (OFR), and the effectiveness of this approach is demonstrated on a set of 616 fonts.

Abstract: Automated publishing systems require large databases containing document page layout templates. Most of these layout templates are created manually. A lower cost alternative is to extract document page layouts from existing documents. In order to extract the layout from a scanned document image, it is necessary to perform Optical Font Recognition (OFR) since the font is an important element in layout design. In this paper, we use the Conditional Random Field (CRF) model to perform OFR. First, we extract typographical features of the text. Then, we train the probabilistic model using a log-linear parameterization of CRF. The advantage of using CRF is that it does not assume that the typographical features are independent of each other. We demonstrate the effectiveness of this approach on a set of 616 fonts.

6 citations

Proceedings Article

A Morphology Based Approach for Binarization of Handwritten Documents

Vassilis Papavassiliou, Fotini Simistira, Vassilis Katsouros, George Carayannis¹

National and Kapodistrian University of Athens¹

18 Sep 2012

TL;DR: The method was evaluated on the benchmarking dataset of the International Document Image Binarization Contest (DIBCO 2011) and shows promising results.

Abstract: Document image binarization is an initial though critical stage towards the recognition of the text components of a document. This paper describes an efficient method based on mathematical morphology for extracting text regions from degraded handwritten document images. The basic stages of our approach are: a) top-hat-by-reconstruction to produce a filtered image with a reasonably even background, b) region growing starting from a set of seed points and attaching to each seed neighboring pixels of similar intensity, and c) conditional extension of the initially detected text regions based on the values of the second derivative of the filtered image. The method was evaluated on the benchmarking dataset of the International Document Image Binarization Contest (DIBCO 2011) and shows promising results.
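
Stage (b) of this abstract, region growing from seed points, can be sketched as follows. This is an illustrative version only and not the paper's implementation: the 4-connectivity, the fixed `tolerance` on intensity difference, the comparison against the originating seed's intensity, and the name `grow_region` are all assumptions. Stages (a) and (c) are not shown.

```python
from collections import deque

def grow_region(image, seeds, tolerance):
    """Grow regions from seed pixels over a grayscale image (list of rows).

    A 4-connected neighbor joins the region when its intensity differs from
    the originating seed's intensity by at most `tolerance` (an assumed rule).
    Returns the set of (row, col) pixels in the grown region.
    """
    rows, cols = len(image), len(image[0])
    region = set(seeds)
    # Each queue entry carries the intensity of the seed it was grown from.
    queue = deque((r, c, image[r][c]) for r, c in seeds)
    while queue:
        r, c, seed_value = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region:
                if abs(image[nr][nc] - seed_value) <= tolerance:
                    region.add((nr, nc))
                    queue.append((nr, nc, seed_value))
    return region

# Hypothetical 4x4 image: dark stroke pixels (30 to 40) on a bright background.
image = [
    [200, 200, 200, 200],
    [200, 30, 40, 200],
    [200, 35, 200, 200],
    [200, 200, 200, 200],
]
stroke = grow_region(image, seeds=[(1, 1)], tolerance=15)
```

On this sample image the region grown from the seed at `(1, 1)` covers the three dark pixels and none of the background.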

6 citations

Proceedings Article

Integrated system for automated financial document processing

Khaled Hassanein, Slawo Wesolkowski, Ray Higgins, Ralph Crabtree, Antai Peng, +1 more

26 Feb 1997

TL;DR: A system was developed that integrates intelligent document analysis with multiple character/numeral recognition engines in order to achieve high accuracy automated financial document processing and performs well on a test set of machine printed business checks.

Abstract: A system was developed that integrates intelligent document analysis with multiple character/numeral recognition engines in order to achieve high accuracy automated financial document processing. In this system, images are accepted in both their grayscale and binary formats. A document analysis module starts by extracting essential features from the document to help identify its type (e.g. personal check, business check, etc.). These features are also utilized to conduct a full analysis of the image to determine the location of interesting zones such as the courtesy amount and the legal amount. These fields are then made available to several recognition knowledge sources such as courtesy amount recognition engines and legal amount recognition engines through a blackboard architecture. This architecture allows all the available knowledge sources to contribute incrementally and opportunistically to the solution of the given recognition query. Performance results on a test set of machine printed business checks using the integrated system are also reported. © (1997) COPYRIGHT SPIE--The International Society for Optical Engineering. Downloading of the abstract is permitted for personal use only.

6 citations

Performance Metrics

Papers: 1,488

Citations: 35,779

No. of papers in the topic in previous years
Year	Papers
2023	5
2022	19
2021	34
2020	19
2019	14
2018	9