Table Extraction from Document Images using Fixed Point Model
Anukriti Bansal, Gaurav Harit, Sumantra Dutta Roy
pp. 67
TLDR
A novel learning-based framework is presented that identifies tables in scanned document images by treating the task as a structured labeling problem: it learns the layout of the document and labels its entities as table header, table trailer, table cell, or non-table region.
Abstract:
The paper presents a novel learning-based framework to identify tables in scanned document images. The approach is formulated as a structured labeling problem: it learns the layout of the document and labels its various entities as table header, table trailer, table cell, or non-table region. We develop features that encode the characteristics of the foreground blocks and their contextual information. These features are provided to a fixed point model, which learns the inter-relationships between the blocks. The fixed point model's update is a contraction mapping, so it converges to a fixed point and assigns a unique label to each block. We compare the results with Conditional Random Fields (CRFs). Unlike CRFs, the fixed point model captures contextual information in terms of the neighbourhood layout more efficiently. Experiments on images drawn from the UW-III (University of Washington) dataset, the UNLV dataset, and our own dataset of document images with multicolumn page layouts show the applicability of our algorithm to layout analysis and table detection.
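The abstract describes inference as repeatedly applying a learned mapping until the block labels reach a fixed point. The following is a minimal sketch of that iteration under illustrative assumptions: each block carries a confidence vector over the four labels, one sweep recomputes it from the block's own features plus the averaged confidences of its neighbouring blocks, and the loop stops once the largest change falls below a tolerance. The feature names (`bold_ratio`, `digit_ratio`, `column_alignment`, `running_text`), the neighbourhood graph, the function names, and the weights are all hypothetical and hand-set; the paper learns its mapping from data, and its actual features and classifier are not reproduced here.

```python
import math

# The four block labels named in the abstract.
LABELS = ["table_header", "table_trailer", "table_cell", "non_table"]

# Hypothetical, hand-set weights (not learned, not taken from the paper).
# Feature order: [bold_ratio, digit_ratio, column_alignment, running_text]
W_FEAT = [
    [2.0, -1.0, 1.0, -1.0],   # table_header
    [0.0, 0.0, 0.5, 0.0],     # table_trailer
    [-1.0, 2.0, 1.5, -1.0],   # table_cell
    [0.0, -1.0, -1.5, 2.0],   # non_table
]
# Context weights over the neighbours' averaged label confidences,
# ordered like LABELS. Kept small so the update stays contractive.
W_CTX = [
    [0.0, 0.0, 0.5, 0.0],     # header favoured next to cells
    [0.0, 0.0, 0.5, 0.0],     # trailer favoured next to cells
    [0.5, 0.0, 0.5, 0.0],     # cell favoured next to headers and cells
    [0.0, 0.0, 0.0, 0.5],     # non-table favoured next to non-table
]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fixed_point_labeling(features, neighbors, w_feat, w_ctx, max_iter=100, tol=1e-6):
    """Repeat q_i <- f(x_i, mean of neighbours' q_j) until the largest change < tol.

    Returns (labels, confidences, number_of_sweeps).
    """
    n, k = len(features), len(LABELS)
    q = [[1.0 / k] * k for _ in range(n)]  # start from uniform confidences
    sweeps = 0
    for sweeps in range(1, max_iter + 1):
        new_q = []
        for i in range(n):
            nbrs = neighbors[i]
            # Contextual input: average label confidences of the neighbouring blocks.
            ctx = ([sum(q[j][c] for j in nbrs) / len(nbrs) for c in range(k)]
                   if nbrs else [0.0] * k)
            scores = [
                sum(w * x for w, x in zip(w_feat[c], features[i]))
                + sum(w * p for w, p in zip(w_ctx[c], ctx))
                for c in range(k)
            ]
            new_q.append(softmax(scores))
        # Synchronous update: every block used the previous sweep's confidences.
        delta = max(abs(a - b) for old, new in zip(q, new_q) for a, b in zip(old, new))
        q = new_q
        if delta < tol:
            break
    labels = [LABELS[max(range(k), key=lambda c: qi[c])] for qi in q]
    return labels, q, sweeps


# Usage: three vertically stacked blocks with hypothetical feature values.
features = [
    [1.0, 0.1, 1.0, 0.0],  # bold, column-aligned block
    [0.0, 0.9, 1.0, 0.0],  # mostly numeric, column-aligned block
    [0.0, 0.0, 0.0, 1.0],  # full-width running text
]
neighbors = [[1], [0, 2], [1]]  # each block's vertical neighbours
labels, q, sweeps = fixed_point_labeling(features, neighbors, W_FEAT, W_CTX)
print(labels)  # ['table_header', 'table_cell', 'non_table']
```

With these particular weights, the absolute context weights in each row sum to at most 1, and the softmax changes by at most half of any change in its inputs (in the max norm), so every sweep at least halves the largest difference between successive confidence vectors. That contraction is what guarantees a unique fixed point in this sketch; with learned weights it is not automatic, which is why the loop also caps the number of sweeps.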
Citations
Journal Article
Table Detection from Document Image using Vertical Arrangement of Text Blocks
TL;DR: Experiments on the ICDAR 2013 dataset show that the results obtained are very encouraging and prove the effectiveness and superiority of the proposed method.
Proceedings Article
Table Recognition in Heterogeneous Documents Using Machine Learning
TL;DR: This work proposes a novel learning-based methodology for recognizing table contents in heterogeneous document images and achieves more than 97% accuracy on a test set in detecting table and non-table elements.
Proceedings Article
Parameter-Free Table Detection Method
TL;DR: Two parameter-free table detection methods are proposed, one for closed tables and the other for open tables; they require no training dataset and achieve more than 90% in table recognition.
Proceedings Article
Interpreting Data from Scanned Tables
Waleed Farrukh, Antonio Foncubierta-Rodríguez, Anca-Nicoleta Ciubotaru, Guillaume Jaume, Costas Bekas, Orcun Goksel, Maria Gabrani
TL;DR: A fully automatic methodology is proposed that uses bottom-up reasoning independent of the presence of representational features such as lines; it is effective at detecting table cells and their content and at classifying header and data cells.
Proceedings Article
Extraction of Tabular Data from Document Images
TL;DR: An open-source, cross-platform tool capable of recognizing the semantic structure of documents containing tabular data has been implemented, widening the range of document types that can be successfully converted into alternative accessible formats suitable for users with visual impairments.