Proceedings Article

Hybrid page segmentation using multilevel homogeneity structure

TLDR
This paper presents a hybrid page segmentation method that combines connected component analysis with classification on multilevel homogeneous regions, and reports higher accuracy than other methods.
Abstract
This paper presents a hybrid method of page segmentation that combines connected component analysis with classification on multilevel homogeneous regions. The method is iterative: connected component analysis classifies the non-text elements at each level of homogeneous region, and the multilevel homogeneity structure ensures that this classification can identify all non-text elements. The iteration yields two documents, a text document and a non-text document. On the text document, adaptive mathematical morphology applied within each homogeneous text region gives the corresponding text region. On the non-text document, the non-text components are classified in more detail into separators, tables, images, etc. For evaluation, we test our method on the dataset from the ICDAR2009 page segmentation competition. According to the results, the proposed method achieves higher accuracy than the other methods compared, which demonstrates its effectiveness.
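The iterative text/non-text split described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it works on a single region rather than a multilevel homogeneity structure, and the outlier rule (a component is non-text if its bounding-box height exceeds `ratio` times the median height) and the names `connected_components`, `bbox_height`, `split_text_nontext`, and `ratio` are assumptions made only for illustration.

```python
from collections import deque
from statistics import median


def connected_components(grid):
    """Return the 8-connected foreground (value 1) components as pixel lists."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def bbox_height(pixels):
    """Height of a component's bounding box, in pixels."""
    ys = [y for y, _ in pixels]
    return max(ys) - min(ys) + 1


def split_text_nontext(grid, ratio=3.0):
    """Repeatedly move height outliers from the text set to the non-text set.

    `ratio` is a hypothetical threshold chosen for this sketch, not a value
    taken from the paper.
    """
    text = connected_components(grid)
    non_text = []
    while text:
        limit = ratio * median(bbox_height(p) for p in text)
        outliers = [p for p in text if bbox_height(p) > limit]
        if not outliers:
            break  # the remaining components are homogeneous in height
        non_text.extend(outliers)
        text = [p for p in text if bbox_height(p) <= limit]
    return text, non_text


# Toy page: three short "characters" and one tall vertical separator.
page = [[1, 0, 1, 0, 1, 0, 0, 0, 1] for _ in range(2)] \
     + [[0] * 8 + [1] for _ in range(6)]
text, non_text = split_text_nontext(page)
print(len(text), len(non_text))  # prints "3 1"
```

Recomputing the median after each pass is what makes the loop iterative: once large components are removed, the statistics of the remaining ones tighten, so smaller non-text elements can surface in later passes.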


Citations
Journal Article

Document Layout Analysis: A Comprehensive Survey

TL;DR: This survey paper presents a critical study of different document layout analysis techniques and discusses comprehensively the different phases of the DLA algorithms based on a general framework that is formed as an outcome of reviewing the research in the field.
Journal Article

Page segmentation using minimum homogeneity algorithm and adaptive mathematical morphology

TL;DR: A novel hybrid method for document layout analysis (page segmentation) with three main stages, combining connected component analysis with a multilevel homogeneity structure; it achieves higher accuracy than other methods.
Proceedings Article

A hybrid method for table detection from document image

TL;DR: A hybrid method consisting of the alternative bottom-up and top-down approaches is implemented to find the table region candidates by analyzing text lines and spare lines for detecting tables in document images.
Journal Article

Document image layout analysis via explicit edge embedding network

TL;DR: A novel document layout analysis framework with the Explicit Edge Embedding Network (E3 Net), which contains the edge embedding block and dynamic skip connection block to produce detailed features, as well as a lightweight fully convolutional subnet as the backbone for the effectiveness of the framework.
Journal Article

Beyond document object detection: instance-level segmentation of complex layouts

TL;DR: In this article, the task of instance segmentation on the document image domain is defined, which is especially important in complex layouts whose contents should interact for the proper rendering of the page, i.e., the proper text wrapping around an image.
References
Journal Article

Robust Real-Time Face Detection

TL;DR: In this paper, a face detection framework that is capable of processing images extremely rapidly while achieving high detection rates is described. The detector is reported to run at about 15 frames per second.
Proceedings Article

Robust real-time face detection

TL;DR: A new image representation called the “Integral Image” is introduced which allows the features used by the detector to be computed very quickly and a method for combining classifiers in a “cascade” which allows background regions of the image to be quickly discarded while spending more computation on promising face-like regions.
Journal Article

Adaptive document image binarization

TL;DR: A new method is presented for adaptive document image binarization, where the page is considered as a collection of subcomponents such as text, background and picture, which adapts and performs well in each case qualitatively and quantitatively.
Journal Article

The document spectrum for page layout analysis

TL;DR: The document spectrum (or docstrum) as discussed by the authors is a method for structural page layout analysis based on bottom-up, nearest-neighbor clustering of page components, which yields an accurate measure of skew, within-line, and between-line spacings and locates text lines and text blocks.
Book

The document spectrum for page layout analysis

TL;DR: The document spectrum (or docstrum), which is a method for structural page layout analysis based on bottom-up, nearest-neighbor clustering of page components, yields an accurate measure of skew, within-line, and between-line spacings and locates text lines and text blocks.