Open Access, Proceedings Article

Table Extraction from Document Images using Fixed Point Model

TLDR
A novel learning-based framework is presented that identifies tables in scanned document images by treating detection as a structured labeling problem: it learns the layout of the document and labels each entity as table header, table trailer, table cell, or non-table region.
Abstract
The paper presents a novel learning-based framework to identify tables in scanned document images. The approach is formulated as a structured labeling problem: it learns the layout of the document and labels each entity as table header, table trailer, table cell, or non-table region. We develop features that encode the characteristics of each foreground block and its contextual information. These features are provided to a fixed point model, which learns the inter-relationships between the blocks. The fixed point model attains a contraction mapping and assigns a unique label to each block. We compare the results with Conditional Random Fields (CRFs). Unlike CRFs, the fixed point model captures context information, in terms of the neighbourhood layout, more efficiently. Experiments on images drawn from the UW-III (University of Washington) dataset, the UNLV dataset, and our own dataset of document images with multi-column page layouts show the applicability of our algorithm to layout analysis and table detection.
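
The fixed point idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the features, the form of the contextual function, or the training procedure, so the one-hot features, the hand-picked weights `w_feat` and `w_ctx`, the chain-shaped neighbourhood, and the function name `fixed_point_labels` are all assumptions made for illustration. Each block's label distribution is recomputed from its own features plus the mean distribution of its neighbours, and the update is repeated until nothing changes. The context weights are kept small so that the update behaves like a contraction and the iteration settles on a single labeling.

```python
import math

LABELS = ["header", "trailer", "cell", "non-table"]


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fixed_point_labels(features, neighbours, w_feat, w_ctx, tol=1e-8, max_iter=200):
    """Iterate q <- f(features, q of neighbours) until q stops changing.

    features[i]   : feature vector of block i (hypothetical features)
    neighbours[i] : indices of the blocks adjacent to block i
    w_feat[c][j]  : weight of feature j for label c (hand-picked here, not learned)
    w_ctx[c][l]   : weight of neighbour label l for label c (hand-picked here)
    """
    n, k = len(features), len(w_feat)
    q = [[1.0 / k] * k for _ in range(n)]  # start from uniform label beliefs
    for _ in range(max_iter):
        new_q = []
        for i in range(n):
            nb = neighbours[i]
            # Context = mean label distribution over the neighbouring blocks.
            ctx = [sum(q[j][l] for j in nb) / len(nb) if nb else 0.0 for l in range(k)]
            scores = [
                sum(w * x for w, x in zip(w_feat[c], features[i]))
                + sum(w * p for w, p in zip(w_ctx[c], ctx))
                for c in range(k)
            ]
            new_q.append(softmax(scores))
        # Largest change in any probability between two sweeps.
        delta = max(abs(a - b) for r_new, r_old in zip(new_q, q) for a, b in zip(r_new, r_old))
        q = new_q
        if delta < tol:
            break
    labels = [LABELS[max(range(k), key=row.__getitem__)] for row in q]
    return labels, q


# Toy page: four blocks in a vertical chain. Block 2 has no evidence of its own
# and can only be labelled through the context supplied by its neighbours.
features = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0], [0, 0, 1, 0]]
neighbours = [[1], [0, 2], [1, 3], [2]]
w_feat = [[3.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
w_ctx = [[0.5 if i == j else 0.0 for j in range(4)] for i in range(4)]

labels, probs = fixed_point_labels(features, neighbours, w_feat, w_ctx)
print(labels)
```

In this toy case the featureless block inherits the label of its two cell-like neighbours. A trained model would replace the hand-picked weights with learned ones and the chain with the actual neighbourhood layout of the page.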


Citations
Journal Article

Table Detection from Document Image using Vertical Arrangement of Text Blocks

TL;DR: Experiments on the ICDAR 2013 dataset show very encouraging results and demonstrate the effectiveness and superiority of the proposed method.
Proceedings Article

Table Recognition in Heterogeneous Documents Using Machine Learning

TL;DR: This work proposes a novel learning-based methodology for recognizing table contents in heterogeneous document images and reports more than 97% accuracy on a test set in detecting table and non-table elements.
Proceedings Article

Parameter-Free Table Detection Method

TL;DR: Two parameter-free table detection methods are proposed, one for closed tables and the other for open tables; they require no training dataset and achieve more than 90% in table recognition.
Proceedings Article

Interpreting Data from Scanned Tables

TL;DR: A fully automatic methodology is proposed that uses bottom-up reasoning independent of the existence of representation features, such as lines; it is effective at detecting table cells and their content and at classifying header and data cells.
Proceedings Article

Extraction of Tabular Data from Document Images

TL;DR: An open source cross-platform tool capable of recognizing the semantic structure of documents containing tabular data has been implemented, thus widening the range of document types that can be successfully converted into alternative accessible formats suitable for users with visual impairments.
References
Journal Article

Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images

TL;DR: The analogy between images and statistical mechanics systems is made and the analogous operation under the posterior distribution yields the maximum a posteriori (MAP) estimate of the image given the degraded observations, creating a highly parallel "relaxation" algorithm for MAP estimation.
Proceedings Article

Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

TL;DR: This work presents iterative parameter estimation algorithms for conditional random fields and compares the performance of the resulting models to HMMs and MEMMs on synthetic and natural-language data.
Journal Article

LIBLINEAR: A Library for Large Linear Classification

TL;DR: LIBLINEAR is an open source library for large-scale linear classification that supports logistic regression and linear support vector machines and provides easy-to-use command-line tools and library calls for users and developers.
Journal Article

Twenty years of document image analysis in PAMI

TL;DR: The contributions to document image analysis of 99 papers published in the IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) are clustered, summarized, interpolated, interpreted, and evaluated.