Proceedings Article
Hybrid page segmentation using multilevel homogeneity structure
Tuan Anh Tran, In Seop Na, Soo-Hyung Kim
pp. 78
TL;DR: This paper presents a hybrid page segmentation method that combines connected component analysis with classification on multilevel homogeneous regions, and reports higher accuracy than competing methods.
Abstract:
This paper presents a hybrid method of page segmentation based on the combination of connected component analysis and classification on multilevel homogeneous regions. The method is iterative: at each level of homogeneous region, connected component analysis classifies the non-text elements, and the multilevel homogeneity structure ensures that this classification identifies all non-text elements. The iterative stage produces two documents, a text document and a non-text document. On the text document, adaptive mathematical morphology applied within each homogeneous text region yields the corresponding text region. On the non-text document, the non-text components are classified in more detail into separators, tables, images, etc. For evaluation, we test our method on the datasets of the ICDAR2009 page segmentation competition. According to the results, the proposed method achieves higher accuracy than the other methods, which demonstrates its effectiveness and superiority.
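The abstract describes the first stage only in outline: label connected components, then iteratively separate out the components that do not behave like text. The sketch below illustrates that general idea in plain Python and is not the authors' classifier. The 8-connectivity labeling is standard, but the size rule (a component is treated as non-text when its height or width exceeds `size_factor` times the median over the remaining text candidates), the value `size_factor=3.0`, and the helper names `connected_components` and `split_text_nontext` are hypothetical assumptions; the multilevel homogeneity structure is not modeled.

```python
from collections import deque
from statistics import median


def connected_components(img):
    """Label 8-connected foreground (1) pixels; return bounding boxes (top, left, bottom, right)."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue = deque([(y, x)])
                top, left, bottom, right = y, x, y, x
                while queue:  # breadth-first flood fill of one component
                    cy, cx = queue.popleft()
                    top, bottom = min(top, cy), max(bottom, cy)
                    left, right = min(left, cx), max(right, cx)
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes


def split_text_nontext(boxes, size_factor=3.0):
    """Hypothetical heuristic: repeatedly move size outliers from the text set to the non-text set."""
    text, nontext = list(boxes), []
    while text:
        # Medians are recomputed over the remaining text candidates on every pass.
        med_h = median(b[2] - b[0] + 1 for b in text)
        med_w = median(b[3] - b[1] + 1 for b in text)
        outliers = [b for b in text
                    if b[2] - b[0] + 1 > size_factor * med_h
                    or b[3] - b[1] + 1 > size_factor * med_w]
        if not outliers:
            break
        nontext.extend(outliers)
        text = [b for b in text if b not in outliers]
    return text, nontext


# Toy page: five small "characters" and one horizontal rule (row 3).
page = [
    [1, 0, 1, 0, 1, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0, 0, 0, 0],
]
boxes = connected_components(page)
text_boxes, nontext_boxes = split_text_nontext(boxes)
print(len(text_boxes), nontext_boxes)  # 5 [(3, 0, 3, 9)]
```

Recomputing the medians after each removal is what makes the loop iterative: once large figures or rules are removed, the statistics of the remaining candidates tighten, which can expose further outliers on the next pass.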
Citations
Journal Article
Document Layout Analysis: A Comprehensive Survey
TL;DR: This survey paper presents a critical study of different document layout analysis (DLA) techniques and comprehensively discusses the phases of DLA algorithms within a general framework derived from reviewing research in the field.
Journal Article
Page segmentation using minimum homogeneity algorithm and adaptive mathematical morphology
TL;DR: A novel hybrid method for document layout analysis (page segmentation) with three main stages, which combines connected component analysis and a multilevel homogeneity structure and achieves higher accuracy than other methods.
Proceedings Article
A hybrid method for table detection from document image
TL;DR: A hybrid method that alternates bottom-up and top-down approaches finds table region candidates by analyzing text lines and spare lines in order to detect tables in document images.
Journal Article
Document image layout analysis via explicit edge embedding network
TL;DR: A novel document layout analysis framework built on the Explicit Edge Embedding Network (E3 Net), which contains an edge embedding block and a dynamic skip connection block to produce detailed features, with a lightweight fully convolutional subnet as the backbone of the framework.
Journal Article
Beyond document object detection: instance-level segmentation of complex layouts
TL;DR: This article defines the task of instance segmentation in the document image domain, which is especially important for complex layouts whose contents interact in the proper rendering of the page, e.g., text wrapping around an image.
References
Journal Article
Robust Real-Time Face Detection
Paul A. Viola, Michael Jones
TL;DR: A face detection framework is described that processes images extremely rapidly while achieving high detection rates; implemented on a conventional desktop, it performs detection at 15 frames per second.
Proceedings Article
Robust real-time face detection
Paul A. Viola, Michael Jones
TL;DR: A new image representation called the “Integral Image” is introduced which allows the features used by the detector to be computed very quickly and a method for combining classifiers in a “cascade” which allows background regions of the image to be quickly discarded while spending more computation on promising face-like regions.
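The integral image mentioned in this TL;DR stores, at each position, the sum of all pixels above and to the left, so the sum over any axis-aligned rectangle needs only four lookups; this is what makes rectangle (Haar-like) features cheap to evaluate. A minimal pure-Python sketch follows. The function names and the left-minus-right two-rectangle feature are illustrative assumptions, not code from the paper.

```python
def integral_image(img):
    """Return ii with one extra zero row/column: ii[y][x] = sum of img[:y][:x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]               # running sum of row y up to column x
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, bottom, right):
    """Sum of img[top..bottom][left..right] (inclusive) using four lookups."""
    return (ii[bottom + 1][right + 1] - ii[top][right + 1]
            - ii[bottom + 1][left] + ii[top][left])


def two_rect_feature(ii, top, left, height, width):
    """Hypothetical two-rectangle feature: left half minus right half (width assumed even)."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, top + height - 1, left + half - 1)
    right_sum = rect_sum(ii, top, left + half, top + height - 1, left + width - 1)
    return left_sum - right_sum


grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
ii = integral_image(grid)
print(rect_sum(ii, 1, 1, 2, 2))           # 28 (5 + 6 + 8 + 9)
print(two_rect_feature(ii, 0, 0, 3, 2))   # -3 (column 0 sum 12 minus column 1 sum 15)
```

Building the table costs one pass over the image, after which every feature evaluation is constant time regardless of rectangle size.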
Journal Article
Adaptive document image binarization
Jaakko Sauvola, Matti Pietikäinen
TL;DR: A new method is presented for adaptive document image binarization, in which the page is considered as a collection of subcomponents such as text, background and pictures; the method adapts to each case and performs well both qualitatively and quantitatively.
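Sauvola's method computes a per-pixel threshold from the local mean m and local standard deviation s in a window around the pixel: T(x, y) = m(x, y) * [1 + k * (s(x, y) / R - 1)], where R is the dynamic range of the standard deviation. The sketch below applies that formula directly. The defaults (window 15, k = 0.5, R = 128 for 8-bit grayscale) and the convention that dark pixels at or below T are foreground are assumptions of this sketch; the brute-force window loop is for clarity only, and a practical version would obtain m and s from integral images.

```python
import math


def sauvola_binarize(gray, window=15, k=0.5, R=128.0):
    """Return a 0/1 map where 1 marks dark foreground pixels (assumed ink)."""
    h, w = len(gray), len(gray[0])
    r = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Local window, clipped at the image border.
            vals = [gray[j][i]
                    for j in range(max(0, y - r), min(h, y + r + 1))
                    for i in range(max(0, x - r), min(w, x + r + 1))]
            m = sum(vals) / len(vals)
            s = math.sqrt(sum((v - m) ** 2 for v in vals) / len(vals))
            t = m * (1 + k * (s / R - 1))  # Sauvola threshold
            out[y][x] = 1 if gray[y][x] <= t else 0
    return out


patch = [[200, 200, 200],
         [200, 20, 200],
         [200, 200, 200]]
print(sauvola_binarize(patch, window=3))  # [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
```

On flat background s is close to 0, so T drops to about m * (1 - k) and bright pixels stay background; in high-contrast neighborhoods s grows and T rises toward m, which is how the threshold adapts to text, background and picture regions.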
Journal Article
The document spectrum for page layout analysis
TL;DR: The document spectrum (or docstrum) is a method for structural page layout analysis based on bottom-up, nearest-neighbor clustering of page components; it yields an accurate measure of skew and of within-line and between-line spacings, and locates text lines and text blocks.
Book
The document spectrum for page layout analysis
TL;DR: The document spectrum (or docstrum), which is a method for structural page layout analysis based on bottom-up, nearest-neighbor clustering of page components, yields an accurate measure of skew, within-line, and between-line spacings and locates text lines and text blocks.
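In the docstrum, each component centroid is paired with its k nearest neighbors, and the angles and distances of those pairs are analyzed: the dominant angle estimates the skew, and the typical distance among pairs aligned with that angle estimates the within-line spacing. The sketch below is a simplified illustration, not O'Gorman's implementation. It assumes centroids are already computed, takes the most frequent rounded angle and the median distance instead of analyzing smoothed histograms, and uses k = 5 and a 5-degree tolerance as assumed defaults.

```python
import math
from collections import Counter


def nearest_neighbor_pairs(centroids, k=5):
    """Return (distance, angle_degrees) for each centroid's k nearest neighbors."""
    pairs = []
    for i, (xi, yi) in enumerate(centroids):
        dists = sorted(
            (math.hypot(xj - xi, yj - yi), j)
            for j, (xj, yj) in enumerate(centroids) if j != i
        )
        for d, j in dists[:k]:
            xj, yj = centroids[j]
            angle = math.degrees(math.atan2(yj - yi, xj - xi))
            # Ignore pair direction: fold angles into [-90, 90).
            if angle >= 90:
                angle -= 180
            elif angle < -90:
                angle += 180
            pairs.append((d, angle))
    return pairs


def docstrum_estimates(centroids, k=5, angle_tol=5.0):
    """Simplified skew (whole degrees) and within-line spacing estimates."""
    pairs = nearest_neighbor_pairs(centroids, k)
    skew = Counter(round(a) for _, a in pairs).most_common(1)[0][0]
    # Simplification: no wrap-around handling, adequate for near-horizontal text.
    within = sorted(d for d, a in pairs if abs(a - skew) <= angle_tol)
    spacing = within[len(within) // 2]  # median distance of line-aligned pairs
    return skew, spacing


# Two horizontal "text lines" of component centroids, 10 px apart within a line.
points = [(0, 0), (10, 0), (20, 0), (30, 0),
          (0, 30), (10, 30), (20, 30), (30, 30)]
print(docstrum_estimates(points, k=2))  # (0, 10.0)
```

Pairs that cross between lines would appear near plus or minus 90 degrees, so in a fuller version their distances would give the between-line spacing in the same way.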