Proceedings ArticleDOI

Learning to Extract Semantic Structure from Documents Using Multimodal Fully Convolutional Neural Networks

TLDR
Li et al. present an end-to-end, multimodal, fully convolutional network for extracting semantic structures from document images. The approach treats document semantic structure extraction as a pixel-wise segmentation task and proposes a unified model that classifies pixels based not only on their visual appearance but also on the content of the underlying text.
Abstract
We present an end-to-end, multimodal, fully convolutional network for extracting semantic structures from document images. We consider document semantic structure extraction as a pixel-wise segmentation task, and propose a unified model that classifies pixels based not only on their visual appearance, as in the traditional page segmentation task, but also on the content of underlying text. Moreover, we propose an efficient synthetic document generation process that we use to generate pretraining data for our network. Once the network is trained on a large set of synthetic documents, we fine-tune the network on unlabeled real documents using a semi-supervised approach. We systematically study the optimum network architecture and show that both our multimodal approach and the synthetic data pretraining significantly boost the performance.
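Below is a minimal sketch (not the authors' code) of the multimodal idea described above: a small fully convolutional encoder-decoder whose visual features are fused, by channel-wise concatenation, with a per-pixel "text embedding map" built by rasterizing word embeddings over the regions each word occupies, before predicting a pixel-wise class map. All names, channel sizes, the fusion point, and the class count are assumptions made for illustration.

```python
# Hypothetical sketch of a multimodal FCN for pixel-wise document segmentation.
# Assumed: the text modality arrives as a (text_dim, H, W) embedding map with
# zeros where no text is present; fusion is simple concatenation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultimodalFCN(nn.Module):
    def __init__(self, text_dim=64, num_classes=5):
        super().__init__()
        # Visual encoder: downsamples the page image by a factor of 4.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Decoder: upsamples the fused features back to input resolution.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64 + text_dim, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, num_classes, 4, stride=2, padding=1),
        )

    def forward(self, image, text_map):
        # image:    (B, 3, H, W) page image
        # text_map: (B, text_dim, H, W) rasterized word embeddings
        feats = self.encoder(image)
        # Bring the text map to the encoder's spatial resolution and concatenate.
        text_small = F.interpolate(text_map, size=feats.shape[2:], mode="nearest")
        fused = torch.cat([feats, text_small], dim=1)
        return self.decoder(fused)  # (B, num_classes, H, W) per-pixel logits

# Usage example with random tensors standing in for a page and its text map.
model = MultimodalFCN()
logits = model(torch.randn(1, 3, 256, 256), torch.randn(1, 64, 256, 256))
print(logits.shape)  # torch.Size([1, 5, 256, 256])
```

In this sketch the text map is treated as an extra input "image", so the same convolutional machinery handles both modalities; the paper's actual architecture, training losses, and synthetic-data pipeline are not reproduced here.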



Citations
Proceedings ArticleDOI

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

TL;DR: LayoutLM is proposed to jointly model interactions between text and layout information across scanned document images, benefiting a wide range of real-world document image understanding tasks such as information extraction from scanned documents.
Proceedings ArticleDOI

FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents

TL;DR: This work presents a new dataset for form understanding in noisy scanned documents (FUNSD) that aims at extracting and structuring the textual content of forms, and is the first publicly available dataset with comprehensive annotations for the form understanding (FoUn) task.
Proceedings ArticleDOI

Chargrid: Towards Understanding 2D Documents.

TL;DR: In this paper, a generic document understanding pipeline for structured documents is presented, which makes use of a fully convolutional encoder-decoder network that predicts a segmentation mask and bounding boxes.
Proceedings ArticleDOI

Multi-Scale Multi-Task FCN for Semantic Page Segmentation and Table Detection

TL;DR: This work presents a page segmentation algorithm that incorporates state-of-the-art deep learning methods for segmenting three types of document elements: text blocks, tables, and figures. It also proposes a conditional random field (CRF) that uses features output by the semantic segmentation and contour networks to improve upon the semantic segmentation network's output.
Posted Content

DocBank: A Benchmark Dataset for Document Layout Analysis

TL;DR: DocBank is a large-scale benchmark dataset for document layout analysis, containing 500K document pages with fine-grained token-level annotations.
References
Proceedings ArticleDOI

Deep Residual Learning for Image Recognition

TL;DR: The authors propose a residual learning framework to ease the training of networks substantially deeper than those used previously; the resulting model won first place in the ILSVRC 2015 classification task.
Proceedings ArticleDOI

Going deeper with convolutions

TL;DR: Inception is a deep convolutional neural network architecture that achieved a new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Journal ArticleDOI

Support-Vector Networks

TL;DR: The high generalization ability of support-vector networks using polynomial input transformations is demonstrated, and the performance of the support-vector network is compared to various classical learning algorithms that took part in a benchmark study of Optical Character Recognition.
Book ChapterDOI

Microsoft COCO: Common Objects in Context

TL;DR: A new dataset aimed at advancing the state of the art in object recognition by placing it in the broader context of scene understanding, built by gathering images of complex everyday scenes containing common objects in their natural context.
Proceedings ArticleDOI

Fully convolutional networks for semantic segmentation

TL;DR: The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.