FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents
Guillaume Jaume, Hazim Kemal Ekenel, Jean-Philippe Thiran
- Vol. 2, pp 1-6
TLDR
This work presents FUNSD, a new dataset for form understanding in noisy scanned documents that aims at extracting and structuring the textual content of forms; it is the first publicly available dataset with comprehensive annotations for the FoUn task.
Abstract:
We present a new dataset for form understanding in noisy scanned documents (FUNSD) that aims at extracting and structuring the textual content of forms. The dataset comprises 199 real, fully annotated, scanned forms. The documents are noisy and vary widely in appearance, making form understanding (FoUn) a challenging task. The proposed dataset can be used for various tasks, including text detection, optical character recognition, spatial layout analysis, and entity labeling/linking. To the best of our knowledge, this is the first publicly available dataset with comprehensive annotations to address the FoUn task. We also present a set of baselines and introduce metrics to evaluate performance on the FUNSD dataset, which can be downloaded at https://guillaumejaume.github.io/FUNSD.
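To make the entity labeling/linking task concrete, here is a minimal sketch of how a FUNSD-style annotation could be consumed. It assumes the publicly documented schema (a `form` list of entities, each with an `id`, a semantic `label` such as question/answer, a bounding `box`, word-level boxes, and `linking` pairs of entity ids); the sample annotation below is synthetic, not taken from the dataset.

```python
# Sketch of resolving question-answer links in a FUNSD-style annotation.
# Assumed schema: {"form": [entity, ...]}, entity = {"id", "label", "text",
# "box", "words", "linking"}; the annotation dict here is made up for
# illustration.
from typing import Dict, List, Tuple

annotation = {
    "form": [
        {"id": 0, "label": "question", "text": "Date:",
         "box": [84, 109, 136, 119],
         "words": [{"text": "Date:", "box": [84, 109, 136, 119]}],
         "linking": [[0, 1]]},
        {"id": 1, "label": "answer", "text": "03/04/92",
         "box": [145, 107, 205, 119],
         "words": [{"text": "03/04/92", "box": [145, 107, 205, 119]}],
         "linking": [[0, 1]]},
    ]
}

def entity_pairs(ann: Dict) -> List[Tuple[str, str]]:
    """Resolve 'linking' id pairs into (source_text, target_text) tuples."""
    by_id = {e["id"]: e for e in ann["form"]}
    pairs = set()  # linking pairs are repeated on both entities; dedupe
    for ent in ann["form"]:
        for src, dst in ent.get("linking", []):
            pairs.add((by_id[src]["text"], by_id[dst]["text"]))
    return sorted(pairs)

print(entity_pairs(annotation))  # -> [('Date:', '03/04/92')]
```

Evaluation of entity linking on FUNSD compares exactly such recovered pairs against the annotated ones, so a parser along these lines is typically the first step of any baseline.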
Citations
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
TL;DR: LayoutLMv3 pre-trains multimodal Transformers for Document AI with unified text and image masking, plus a word-patch alignment objective that learns cross-modal alignment by predicting whether the image patch corresponding to a text word is masked.
SelfDoc: Self-Supervised Document Representation Learning
Peizhao Li, Jiuxiang Gu, Jason Kuen, Vlad I. Morariu, Handong Zhao, Rajiv Jain, Varun Manjunatha, Hongfu Liu
TL;DR: SelfDoc is a task-agnostic pre-training framework for document image understanding that exploits the positional, textual, and visual information of every semantically meaningful component in a document and models the contextualization between blocks of content.
Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer
Rafal Powalski, Lukasz Borchmann, Dawid Jurkiewicz, Tomasz Dwojak, Michał Pietruszka, Gabriela Pałka
TL;DR: This paper proposes TILT, a neural network architecture that simultaneously learns layout information, visual features, and textual semantics, achieving state-of-the-art results in extracting information from documents and answering questions that demand layout understanding.
LayoutLMv2: Multi-modal Pre-training for Visually-rich Document Understanding
Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou
TL;DR: LayoutLMv2 introduces a two-stream multi-modal Transformer encoder that models the interaction among text, layout, and image in a single multimodal framework.
LayoutLM: Pre-training of Text and Layout for Document Image Understanding
TL;DR: LayoutLM is a pre-training model for document image understanding in which text and layout information are jointly learned in a single framework for document-level pre-training, achieving new state-of-the-art results on several downstream tasks.
References
Deep Residual Learning for Image Recognition
TL;DR: This paper proposes a residual learning framework that eases the training of networks substantially deeper than those used previously, and which won 1st place in the ILSVRC 2015 classification task.
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
TL;DR: This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, enabling nearly cost-free region proposals, and further merges RPN and Fast R-CNN into a single network by sharing their convolutional features.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TL;DR: BERT pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, and can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
An Overview of the Tesseract OCR Engine
TL;DR: A comprehensive overview of the Tesseract OCR engine, the HP research prototype featured in the UNLV Fourth Annual Test of OCR Accuracy.
EAST: An Efficient and Accurate Scene Text Detector
TL;DR: This work proposes a simple yet powerful pipeline that yields fast and accurate text detection in natural scenes and significantly outperforms state-of-the-art methods in both accuracy and efficiency.