End-to-End Extraction of Structured Information from Business Documents with Pointer-Generator Networks
Clément Sage, Alex Aussem, Véronique Eglin, Haytham Elghazel, Jérémy Espinas, et al.
- pp. 43-52
TL;DR: This paper discusses a new method for training extraction models directly from the textual value of information and shows that it performs competitively with a standard word classifier without requiring costly word-level supervision.
Abstract:
The predominant approaches for extracting key information from documents resort to classifiers predicting the information type of each word. However, the word level ground truth used for learning is expensive to obtain since it is not naturally produced by the extraction task. In this paper, we discuss a new method for training extraction models directly from the textual value of information. The extracted information of a document is represented as a sequence of tokens in the XML language. We learn to output this representation with a pointer-generator network that alternately copies the document words carrying information and generates the XML tags delimiting the types of information. The ability of our end-to-end method to retrieve structured information is assessed on a large set of business documents. We show that it performs competitively with a standard word classifier without requiring costly word level supervision.
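As a rough sketch of the copy/generate mechanism the abstract describes: at each decoding step, a pointer-generator mixes the generator's vocabulary distribution (which can emit XML tags) with the attention distribution over document words (which can copy values verbatim). The probabilities below are toy values for illustration, not outputs of the paper's trained model.

```python
def final_distribution(p_gen, p_vocab, attention, source_tokens):
    """Mix the generator's vocabulary distribution with the copy
    (attention) distribution, pointer-generator style:
    P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum of attention on w."""
    mixed = {w: p_gen * p for w, p in p_vocab.items()}
    for a_i, tok in zip(attention, source_tokens):
        mixed[tok] = mixed.get(tok, 0.0) + (1.0 - p_gen) * a_i
    return mixed

# Toy decoding step: the model can emit an XML tag from its vocabulary
# or copy an out-of-vocabulary word ("ACME") straight from the document.
p_vocab = {"<total>": 0.6, "</total>": 0.3, "<date>": 0.1}  # generator output
attention = [0.7, 0.2, 0.1]                                  # copy weights
source = ["ACME", "invoice", "2020"]
dist = final_distribution(p_gen=0.4, p_vocab=p_vocab,
                          attention=attention, source_tokens=source)
```

With a low generation probability, the copy term dominates and the out-of-vocabulary source word "ACME" becomes the most likely output, which is how such models emit document-specific values.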
Citations
Book Chapter
ViBERTgrid: A Jointly Trained Multi-modal 2D Document Representation for Key Information Extraction from Documents
TL;DR: The authors propose a new multi-modal backbone network that concatenates a BERTgrid to an intermediate layer of a CNN, where the CNN input is a document image and the BERTgrid is a grid of word embeddings, to generate a more powerful grid-based document representation.
Proceedings Article
Query-driven Generative Network for Document Information Extraction in the Wild
TL;DR: A novel architecture, termed Query-driven Generative Network (QGN), is equipped with two consecutive modules, a Layout Context-aware Module (LCM) and a Structured Generation Module (SGM), to build a more practical DIE paradigm for real-world scenarios where input document images may contain unknown layouts and keys, and where OCR results may be problematic.
Book Chapter
Data-Efficient Information Extraction from Documents with Pre-trained Language Models
Clément Sage, Thibault Douzon, Alex Aussem, Véronique Eglin, Haytham Elghazel, Stefan Duffner, Christophe Garcia, Jérémy Espinas, et al.
TL;DR: LayoutLM, a pre-trained model for encoding 2D documents, reveals high sample-efficiency when fine-tuned on public and real-world Information Extraction (IE) datasets.
Journal Article
Fusion of visual representations for multimodal information extraction from unstructured transactional documents
Book Chapter
DocReader: Bounding-Box Free Training of a Document Information Extraction Model
Shachar Klaiman, Marius Lehne, et al.
TL;DR: DocReader is an end-to-end neural-network-based information extraction solution that can be trained using solely the images and the target values that need to be read, eliminating the need for any additional annotations beyond what is naturally available in existing human-operated service centres.
References
Proceedings Article
Neural Machine Translation by Jointly Learning to Align and Translate
TL;DR: It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of the basic encoder-decoder architecture; the authors propose to extend it by allowing the model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
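The soft-search in this reference is additive attention: each source state is scored against the decoder state with a small feed-forward network, and a softmax over the scores gives the alignment weights. A minimal sketch with toy 2-d states and illustrative (untrained) weight matrices:

```python
import math

def additive_scores(query, keys, W_q, W_k, v):
    """Additive (Bahdanau-style) attention:
    score(s, h_i) = v . tanh(W_q s + W_k h_i),
    softmax-normalised so the decoder can softly search the source."""
    def matvec(M, x):
        return [sum(m * xi for m, xi in zip(row, x)) for row in M]
    Wq_s = matvec(W_q, query)
    scores = []
    for h in keys:
        Wk_h = matvec(W_k, h)
        scores.append(sum(vi * math.tanh(a + b)
                          for vi, a, b in zip(v, Wq_s, Wk_h)))
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Identity projections and a 2-d query, purely for illustration.
W_q = [[1.0, 0.0], [0.0, 1.0]]
W_k = [[1.0, 0.0], [0.0, 1.0]]
v = [1.0, 1.0]
weights = additive_scores([1.0, 0.0],
                          [[1.0, 0.0], [-1.0, 0.0], [0.0, 0.0]],
                          W_q, W_k, v)
```

The source state most similar to the query receives the largest weight, so no hard segmentation of the source is ever needed.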
Journal Article
The Hungarian method for the assignment problem
TL;DR: This paper has always been one of my favorite children, combining as it does elements of the duality of linear programming and combinatorial tools from graph theory, and it may be of some interest to tell the story of its origin in this article.
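The problem this reference solves is the assignment problem: match n workers to n tasks so the total cost is minimal. The Hungarian method does this in polynomial time; for a small toy matrix the same optimum can be found by brute force, which keeps this sketch self-contained (the cost values are made up for illustration):

```python
from itertools import permutations

def assignment_min_cost(cost):
    """Exhaustively solve the assignment problem (which the Hungarian
    method solves in O(n^3)): choose one entry per row and column so
    the total cost is minimal."""
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_cost:
            best_cost, best_perm = total, perm
    return best_perm, best_cost

cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
perm, total = assignment_min_cost(cost)
```

Here row i is assigned to column perm[i]; the brute force is only viable for tiny n, which is exactly why the Hungarian method's polynomial algorithm matters.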
Proceedings Article
Effective Approaches to Attention-based Neural Machine Translation
TL;DR: A global approach which always attends to all source words and a local one that only looks at a subset of source words at a time are examined, demonstrating the effectiveness of both approaches on the WMT translation tasks between English and German in both directions.
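The "global" variant in this reference scores the decoder state against every source state (here with a simple dot product, one of the scoring functions the paper examines) and forms a context vector as the attention-weighted sum. A minimal sketch with toy 2-d states:

```python
import math

def global_attention(query, source_states):
    """Global attention: dot-product scores over *all* source states,
    softmax-normalised, then a weighted-sum context vector."""
    scores = [sum(q * h for q, h in zip(query, state))
              for state in source_states]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    context = [sum(w * state[d] for w, state in zip(weights, source_states))
               for d in range(len(query))]
    return weights, context

# Toy hidden states; values are illustrative, not from a trained model.
weights, context = global_attention([1.0, 0.0],
                                    [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
```

A local variant would restrict the softmax to a window of source positions around a predicted alignment point instead of attending to all of them.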
Posted Content
On the difficulty of training Recurrent Neural Networks
TL;DR: This paper proposes a gradient norm clipping strategy to deal with exploding gradients and a soft constraint for the vanishing-gradients problem, and empirically validates the hypothesis and the proposed solutions.
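The exploding-gradient remedy in this reference is easy to state concretely: when the gradient's L2 norm exceeds a threshold, rescale the whole gradient so its norm equals the threshold, leaving its direction unchanged. A minimal sketch on a plain list of gradient components:

```python
import math

def clip_grad_norm(grad, max_norm):
    """Gradient norm clipping: if ||g|| > max_norm, return
    g * (max_norm / ||g||); otherwise return g unchanged."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > max_norm:
        scale = max_norm / norm
        return [g * scale for g in grad]
    return list(grad)

clipped = clip_grad_norm([3.0, 4.0], max_norm=1.0)  # norm 5 -> rescaled to 1
```

Gradients already inside the threshold pass through untouched, so the update direction is preserved in all cases.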