Open Access Proceedings Article

FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents

TL;DR
This work presents a new dataset for form understanding in noisy scanned documents (FUNSD) that aims at extracting and structuring the textual content of forms; it is the first publicly available dataset with comprehensive annotations for the form understanding (FoUn) task.
Abstract
We present a new dataset for form understanding in noisy scanned documents (FUNSD) that aims at extracting and structuring the textual content of forms. The dataset comprises 199 real, fully annotated, scanned forms. The documents are noisy and vary widely in appearance, making form understanding (FoUn) a challenging task. The proposed dataset can be used for various tasks, including text detection, optical character recognition, spatial layout analysis, and entity labeling/linking. To the best of our knowledge, this is the first publicly available dataset with comprehensive annotations to address the FoUn task. We also present a set of baselines and introduce metrics to evaluate performance on the FUNSD dataset, which can be downloaded at https://guillaumejaume.github.io/FUNSD.
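For readers who want to work with the annotations directly, here is a minimal loading sketch in Python. It assumes the JSON annotation layout distributed with the dataset (each file holds a top-level `form` list whose entries carry `id`, `text`, `box`, `label`, `words`, and `linking` fields); the directory path below is illustrative.

```python
import json
from pathlib import Path

def load_funsd_annotations(ann_dir):
    """Read FUNSD-style JSON annotations from a directory."""
    documents = []
    for path in sorted(Path(ann_dir).glob("*.json")):
        with open(path, encoding="utf-8") as f:
            form = json.load(f)["form"]
        entities = [
            {
                "id": e["id"],
                "text": e["text"],
                "label": e["label"],           # e.g. header / question / answer / other
                "box": e["box"],               # [x_min, y_min, x_max, y_max]
                "links": e.get("linking", []), # pairs of linked entity ids
            }
            for e in form
        ]
        documents.append({"file": path.stem, "entities": entities})
    return documents

# Illustrative path; adjust to wherever the downloaded archive was unpacked.
docs = load_funsd_annotations("dataset/training_data/annotations")
print(len(docs), "documents,", sum(len(d["entities"]) for d in docs), "entities")
```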


Citations
Proceedings Article

LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking

TL;DR: LayoutLMv3 pre-trains multimodal Transformers for Document AI with unified text and image masking, and adds a word-patch alignment objective that learns cross-modal alignment by predicting whether the image patch corresponding to a text word is masked.
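As a rough illustration of the word-patch alignment idea, here is a hypothetical PyTorch head (a sketch only: the class name, shapes, and binary classification setup are our own assumptions, not LayoutLMv3's released code). Given fused token features from the multimodal encoder, it predicts per text token whether the image patch covering that token was masked.

```python
import torch
import torch.nn as nn

class WordPatchAlignmentHead(nn.Module):
    """Hypothetical WPA head: per-token binary prediction of whether
    the image patch under that word was masked."""
    def __init__(self, hidden_size=768):
        super().__init__()
        self.classifier = nn.Linear(hidden_size, 2)  # masked vs. not masked

    def forward(self, token_features, patch_masked_labels):
        # token_features: (batch, seq_len, hidden) from the multimodal encoder
        # patch_masked_labels: (batch, seq_len) long tensor of 0/1 targets
        logits = self.classifier(token_features)
        loss = nn.functional.cross_entropy(logits.view(-1, 2),
                                           patch_masked_labels.view(-1))
        return loss, logits
```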
Proceedings Article

SelfDoc: Self-Supervised Document Representation Learning

TL;DR: SelfDoc is a task-agnostic pre-training framework for document image understanding that exploits the positional, textual, and visual information of every semantically meaningful component in a document and models the contextualization between blocks of content.
Book Chapter

Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer

TL;DR: This article proposes TILT, a neural network architecture that simultaneously learns layout information, visual features, and textual semantics, achieving state-of-the-art results in extracting information from documents and answering questions that demand layout understanding.
Proceedings Article

LayoutLMv2: Multi-modal Pre-training for Visually-rich Document Understanding

TL;DR: This article proposes a two-stream multi-modal Transformer encoder that models the interaction among text, layout, and image in a single multimodal framework.
Proceedings Article

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

TL;DR: This paper proposes LayoutLM, in which text and layout information are jointly learned in a single framework for document-level pre-training, achieving new state-of-the-art results in several downstream document image understanding tasks.
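The core input-level idea can be sketched as follows: token embeddings are summed with 2-D position embeddings looked up from bounding-box coordinates (here normalized to a 0-1000 grid; the dimension sizes and names are illustrative, not the released implementation).

```python
import torch
import torch.nn as nn

class TextLayoutEmbedding(nn.Module):
    """Sketch of LayoutLM-style input embeddings: word embedding plus
    2-D position embeddings of the word's bounding-box coordinates."""
    def __init__(self, vocab_size=30522, hidden=768, coord_buckets=1001):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, hidden)
        self.x_emb = nn.Embedding(coord_buckets, hidden)  # shared for x0 and x1
        self.y_emb = nn.Embedding(coord_buckets, hidden)  # shared for y0 and y1

    def forward(self, input_ids, boxes):
        # boxes: (batch, seq_len, 4) long tensor of [x0, y0, x1, y1] on a 0-1000 grid
        x0, y0, x1, y1 = boxes.unbind(-1)
        return (self.tok(input_ids)
                + self.x_emb(x0) + self.x_emb(x1)
                + self.y_emb(y0) + self.y_emb(y1))
```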
References
Proceedings Article

Deep Residual Learning for Image Recognition

TL;DR: The authors propose a residual learning framework to ease the training of networks substantially deeper than those used previously; it won 1st place in the ILSVRC 2015 classification task.
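The key mechanism is the identity shortcut y = F(x) + x, which lets each block learn a residual on top of its input; a minimal PyTorch residual block (layer sizes are illustrative) looks like this:

```python
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """Minimal residual block: two conv layers learn F(x), and the
    skip connection adds the input back, keeping deep stacks trainable."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)  # identity shortcut: y = F(x) + x
```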
Journal Article

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

TL;DR: This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, enabling nearly cost-free region proposals, and further merges the RPN and Fast R-CNN into a single network by sharing their convolutional features.
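A pre-assembled Faster R-CNN (RPN plus detection head over a shared ResNet-50 FPN backbone) ships with torchvision; the inference sketch below uses a dummy image and assumes torchvision >= 0.13 for the `weights` argument.

```python
import torch
import torchvision

# Load a COCO-pretrained Faster R-CNN with a ResNet-50 FPN backbone.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 600, 800)  # dummy RGB image with values in [0, 1]
with torch.no_grad():
    (prediction,) = model([image])  # list of images in, list of dicts out
print(prediction["boxes"].shape, prediction["labels"].shape, prediction["scores"].shape)
```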
Proceedings Article

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

TL;DR: BERT pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; the pre-trained model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
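In practice that one-extra-layer fine-tuning looks like the following sketch with the Hugging Face transformers library (the checkpoint name and two-label head are illustrative):

```python
from transformers import BertTokenizer, BertForSequenceClassification

# Pretrained bidirectional encoder plus a freshly initialized 2-way
# classification head; the head is meant to be fine-tuned on labeled data.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                      num_labels=2)

inputs = tokenizer("Is this form field an answer?", return_tensors="pt")
logits = model(**inputs).logits
print(logits)  # untrained head: outputs are meaningless until fine-tuned
```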
Proceedings Article

An Overview of the Tesseract OCR Engine

TL;DR: This paper gives a comprehensive overview of the Tesseract OCR engine, which competed as the HP Research Prototype in the UNLV Fourth Annual Test of OCR Accuracy.
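Tesseract is easy to drive from Python through the pytesseract wrapper; the sketch below assumes a local Tesseract installation, and the image path is illustrative.

```python
import pytesseract
from PIL import Image

# OCR a scanned form page; pytesseract shells out to the installed
# Tesseract binary and returns the recognized text.
image = Image.open("dataset/training_data/images/form_page.png")
text = pytesseract.image_to_string(image)
print(text)
```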
Proceedings Article

EAST: An Efficient and Accurate Scene Text Detector

TL;DR: This work proposes a simple yet powerful pipeline that yields fast and accurate text detection in natural scenes and significantly outperforms state-of-the-art methods in both accuracy and efficiency.
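OpenCV's dnn module can run the commonly distributed TensorFlow export of EAST; the sketch below stops at the raw score and geometry maps (the frozen model file must be obtained separately, and decoding the geometry into boxes plus non-maximum suppression is omitted).

```python
import cv2

# Layer names follow the widely shared frozen_east_text_detection.pb export.
net = cv2.dnn.readNet("frozen_east_text_detection.pb")
image = cv2.imread("form.png")
blob = cv2.dnn.blobFromImage(image, 1.0, (320, 320),
                             (123.68, 116.78, 103.94), swapRB=True, crop=False)
net.setInput(blob)
scores, geometry = net.forward(["feature_fusion/Conv_7/Sigmoid",
                                "feature_fusion/concat_3"])
print(scores.shape, geometry.shape)  # per-location text confidence and box geometry
```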