End-to-End Extraction of Structured Information from Business Documents with Pointer-Generator Networks
Clément Sage, Alex Aussem, Véronique Eglin, Haytham Elghazel, Jérémy Espinas, et al.
- pp. 43-52
TL;DR: This paper discusses a new method for training extraction models directly from the textual value of information and shows that it performs competitively with a standard word classifier without requiring costly word-level supervision.
Abstract:
The predominant approaches for extracting key information from documents resort to classifiers predicting the information type of each word. However, the word level ground truth used for learning is expensive to obtain since it is not naturally produced by the extraction task. In this paper, we discuss a new method for training extraction models directly from the textual value of information. The extracted information of a document is represented as a sequence of tokens in the XML language. We learn to output this representation with a pointer-generator network that alternately copies the document words carrying information and generates the XML tags delimiting the types of information. The ability of our end-to-end method to retrieve structured information is assessed on a large set of business documents. We show that it performs competitively with a standard word classifier without requiring costly word level supervision.
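As a rough sketch of the copy/generate mechanism the abstract describes: at each decoding step, a pointer-generator mixes the generator's vocabulary distribution (which can emit XML tags) with the attention distribution over document words (which can copy values verbatim). The probabilities below are toy values for illustration, not outputs of the paper's trained model.

```python
def final_distribution(p_gen, p_vocab, attention, source_tokens):
    """Mix the generator's vocabulary distribution with the copy
    (attention) distribution, pointer-generator style:
    P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum of attention on w."""
    mixed = {w: p_gen * p for w, p in p_vocab.items()}
    for a_i, tok in zip(attention, source_tokens):
        mixed[tok] = mixed.get(tok, 0.0) + (1.0 - p_gen) * a_i
    return mixed

# Toy decoding step: the model can emit an XML tag from its vocabulary
# or copy an out-of-vocabulary word ("ACME") straight from the document.
p_vocab = {"<total>": 0.6, "</total>": 0.3, "<date>": 0.1}  # generator output
attention = [0.7, 0.2, 0.1]                                  # copy weights
source = ["ACME", "invoice", "2020"]
dist = final_distribution(p_gen=0.4, p_vocab=p_vocab,
                          attention=attention, source_tokens=source)
```

With a low generation probability, the copy term dominates and the out-of-vocabulary source word "ACME" becomes the most likely output, which is how such models emit document-specific values.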
Citations
Book Chapter
ViBERTgrid: A Jointly Trained Multi-modal 2D Document Representation for Key Information Extraction from Documents
TL;DR: The authors propose a new multi-modal backbone network that concatenates a BERTgrid to an intermediate layer of a CNN, where the CNN input is a document image and the BERTgrid is a grid of word embeddings, to generate a more powerful grid-based document representation.
Proceedings Article
Query-driven Generative Network for Document Information Extraction in the Wild
TL;DR: A novel architecture, termed Query-driven Generative Network (QGN), is equipped with two consecutive modules, a Layout Context-aware Module (LCM) and a Structured Generation Module (SGM), to build a more practical DIE paradigm for real-world scenarios where input document images may contain unknown layouts and keys, and where OCR results may be problematic.
Book Chapter
Data-Efficient Information Extraction from Documents with Pre-trained Language Models
Clément Sage, Thibault Douzon, Alex Aussem, Véronique Eglin, Haytham Elghazel, Stefan Duffner, Christophe Garcia, Jérémy Espinas, et al.
TL;DR: LayoutLM, a pre-trained model for encoding 2D documents, reveals high sample-efficiency when fine-tuned on public and real-world Information Extraction (IE) datasets.
Journal Article
Fusion of visual representations for multimodal information extraction from unstructured transactional documents
Book Chapter
DocReader: Bounding-Box Free Training of a Document Information Extraction Model
Shachar Klaiman, Marius Lehne, et al.
TL;DR: DocReader is an end-to-end neural-network-based information extraction solution that can be trained using solely the images and the target values that need to be read, eliminating the need for any additional annotations beyond what is naturally available in existing human-operated service centres.
References
Proceedings Article
Neural Machine Translation by Jointly Learning to Align and Translate
TL;DR: It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of the basic encoder-decoder architecture; the authors propose to extend it by allowing the model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
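The soft-search in this reference is additive attention: each source state is scored against the decoder state with a small feed-forward network, and a softmax over the scores gives the alignment weights. A minimal sketch with toy 2-d states and illustrative (untrained) weight matrices:

```python
import math

def additive_scores(query, keys, W_q, W_k, v):
    """Additive (Bahdanau-style) attention:
    score(s, h_i) = v . tanh(W_q s + W_k h_i),
    softmax-normalised so the decoder can softly search the source."""
    def matvec(M, x):
        return [sum(m * xi for m, xi in zip(row, x)) for row in M]
    Wq_s = matvec(W_q, query)
    scores = []
    for h in keys:
        Wk_h = matvec(W_k, h)
        scores.append(sum(vi * math.tanh(a + b)
                          for vi, a, b in zip(v, Wq_s, Wk_h)))
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Identity projections and a 2-d query, purely for illustration.
W_q = [[1.0, 0.0], [0.0, 1.0]]
W_k = [[1.0, 0.0], [0.0, 1.0]]
v = [1.0, 1.0]
weights = additive_scores([1.0, 0.0],
                          [[1.0, 0.0], [-1.0, 0.0], [0.0, 0.0]],
                          W_q, W_k, v)
```

The source state most similar to the query receives the largest weight, so no hard segmentation of the source is ever needed.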
Journal Article
The Hungarian method for the assignment problem
TL;DR: This paper has always been one of my favorite children, combining as it does elements of the duality of linear programming and combinatorial tools from graph theory, and it may be of some interest to tell the story of its origin in this article.
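The problem this reference solves is the assignment problem: match n workers to n tasks so the total cost is minimal. The Hungarian method does this in polynomial time; for a small toy matrix the same optimum can be found by brute force, which keeps this sketch self-contained (the cost values are made up for illustration):

```python
from itertools import permutations

def assignment_min_cost(cost):
    """Exhaustively solve the assignment problem (which the Hungarian
    method solves in O(n^3)): choose one entry per row and column so
    the total cost is minimal."""
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_cost:
            best_cost, best_perm = total, perm
    return best_perm, best_cost

cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
perm, total = assignment_min_cost(cost)
```

Here row i is assigned to column perm[i]; the brute force is only viable for tiny n, which is exactly why the Hungarian method's polynomial algorithm matters.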
Proceedings Article
Effective Approaches to Attention-based Neural Machine Translation
TL;DR: A global approach which always attends to all source words and a local one that only looks at a subset of source words at a time are examined, demonstrating the effectiveness of both approaches on the WMT translation tasks between English and German in both directions.
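The "global" variant in this reference scores the decoder state against every source state (here with a simple dot product, one of the scoring functions the paper examines) and forms a context vector as the attention-weighted sum. A minimal sketch with toy 2-d states:

```python
import math

def global_attention(query, source_states):
    """Global attention: dot-product scores over *all* source states,
    softmax-normalised, then a weighted-sum context vector."""
    scores = [sum(q * h for q, h in zip(query, state))
              for state in source_states]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    context = [sum(w * state[d] for w, state in zip(weights, source_states))
               for d in range(len(query))]
    return weights, context

# Toy hidden states; values are illustrative, not from a trained model.
weights, context = global_attention([1.0, 0.0],
                                    [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
```

A local variant would restrict the softmax to a window of source positions around a predicted alignment point instead of attending to all of them.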
Posted Content
On the difficulty of training Recurrent Neural Networks
TL;DR: This paper proposes a gradient norm clipping strategy to deal with exploding gradients and a soft constraint for the vanishing-gradients problem, and empirically validates the hypothesis and the proposed solutions.
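The exploding-gradient remedy in this reference is easy to state concretely: when the gradient's L2 norm exceeds a threshold, rescale the whole gradient so its norm equals the threshold, leaving its direction unchanged. A minimal sketch on a plain list of gradient components:

```python
import math

def clip_grad_norm(grad, max_norm):
    """Gradient norm clipping: if ||g|| > max_norm, return
    g * (max_norm / ||g||); otherwise return g unchanged."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > max_norm:
        scale = max_norm / norm
        return [g * scale for g in grad]
    return list(grad)

clipped = clip_grad_norm([3.0, 4.0], max_norm=1.0)  # norm 5 -> rescaled to 1
```

Gradients already inside the threshold pass through untouched, so the update direction is preserved in all cases.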