Open Access · Posted Content
End-to-End Training of Multi-Document Reader and Retriever for Open-Domain Question Answering
TL;DR
The authors propose an end-to-end differentiable training method for retrieval-augmented open-domain question answering systems that combine information from multiple retrieved documents when generating answers, using an expectation-maximization algorithm.

Abstract:
We present an end-to-end differentiable training method for retrieval-augmented open-domain question answering systems that combine information from multiple retrieved documents when generating answers. We model retrieval decisions as latent variables over sets of relevant documents. Since marginalizing over sets of retrieved documents is computationally hard, we approximate this using an expectation-maximization algorithm. We iteratively estimate the value of our latent variable (the set of relevant documents for a given question) and then use this estimate to update the retriever and reader parameters. We hypothesize that such end-to-end training allows training signals to flow to the reader and then to the retriever more effectively than stage-wise training does. This results in a retriever that is able to select more relevant documents for a question and a reader that is trained on more accurate documents to generate an answer. Experiments on three benchmark datasets demonstrate that our proposed method outperforms all existing approaches of comparable size by 2-3 absolute exact match points, achieving new state-of-the-art results. Our results also demonstrate the feasibility of learning to retrieve to improve answer generation without explicit supervision of retrieval decisions.
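The EM-style alternation described in the abstract can be sketched with a toy numeric example. Every name and value below is a hypothetical stand-in, not the paper's implementation: the E-step forms a posterior over which retrieved documents are relevant by combining the retriever's prior with the reader's likelihood, and the M-step moves the retriever toward that posterior.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

# Toy setup: 3 candidate documents. The retriever and reader each hold one
# scalar score per document (stand-ins for real neural network parameters).
retriever_scores = [0.0, 0.0, 0.0]   # retriever starts with no preference
reader_scores = [0.5, 2.0, 0.1]      # reader finds document 1 most useful

lr = 0.5
for step in range(50):
    # E-step: posterior over which document is relevant, combining the
    # retriever's prior with the reader's answer likelihood.
    prior = softmax(retriever_scores)
    lik = softmax(reader_scores)
    post = [p * l for p, l in zip(prior, lik)]
    z = sum(post)
    post = [p / z for p in post]

    # M-step: move retriever scores toward the estimated posterior
    # (gradient of the cross-entropy between posterior and prior).
    # A full system would also update the reader on the
    # posterior-weighted documents; the reader is kept fixed here.
    for i in range(3):
        retriever_scores[i] += lr * (post[i] - prior[i])

best = max(range(3), key=lambda i: retriever_scores[i])
print(best)  # prints 1: the retriever learns to prefer the document the reader found useful
```

Note how the retriever receives a training signal only through the reader's likelihood, with no explicit supervision of which document is relevant, mirroring the claim in the abstract.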
Citations
Posted Content
Pre-training Methods in Information Retrieval.
Yixing Fan, Xiaohui Xie, Yinqiong Cai, Jia Chen, Xinyu Ma, Xiangsheng Li, Ruqing Zhang, Jiafeng Guo, Yiqun Liu
TL;DR: This article surveys pre-training methods in information retrieval, presenting an overview of the different components of an IR system, including the retrieval component, the re-ranking component, and other components.
Journal ArticleDOI
Pre-training Methods in Information Retrieval
TL;DR: The authors present an overview of pre-training methods in information retrieval, covering the retrieval component, the re-ranking component, and other components, and summarize available datasets as well as benchmark leaderboards.
Posted Content
KG-FiD: Infusing Knowledge Graph in Fusion-in-Decoder for Open-Domain Question Answering
Donghan Yu, Chenguang Zhu, Yuwei Fang, Wenhao Yu, Shuohang Wang, Yichong Xu, Xiang Ren, Yiming Yang, Michael Zeng
TL;DR: KG-FiD uses a graph neural network (GNN) to filter noisy passages by leveraging the structural relationships among the retrieved passages in a knowledge graph, improving the passage retrieval results in the retrieving module.
Proceedings ArticleDOI
KG-FiD: Infusing Knowledge Graph in Fusion-in-Decoder for Open-Domain Question Answering
TL;DR: Proceedings version of KG-FiD by Yu et al., presented at the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022.
Posted Content
Adversarial Retriever-Ranker for dense text retrieval
TL;DR: Zhang et al. propose Adversarial Retriever-Ranker (AR2), which consists of a dual-encoder retriever plus a cross-encoder ranker.
References
Proceedings Article
Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba
TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
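The update rule this TL;DR refers to can be shown in a minimal scalar sketch. This is illustrative only, not the paper's code: the bias-corrected first and second moment estimates follow the Adam paper's notation, while the toy objective f(x) = x² and its settings are assumptions for demonstration.

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter theta at step t (1-indexed)."""
    m = b1 * m + (1 - b1) * grad          # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad * grad   # second-moment (uncentered variance) estimate
    m_hat = m / (1 - b1 ** t)             # bias correction for zero-initialized moments
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(x) = x^2 (gradient 2x) starting from x = 5.0.
x, m, v = 5.0, 0.0, 0.0
for t in range(1, 5001):
    x, m, v = adam_step(x, 2 * x, m, v, t)
print(x)  # close to the minimum at 0
```

Because Adam rescales the gradient by the square root of its second-moment estimate, the effective step size stays near the learning rate regardless of the raw gradient magnitude, which is what "adaptive estimates of lower-order moments" refers to.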
Proceedings Article
Attention is All you Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
TL;DR: This paper proposes a simple network architecture based solely on attention mechanisms, dispensing with recurrence and convolutions entirely, and achieves state-of-the-art performance on English-to-French translation.
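The core operation behind the architecture summarized above is scaled dot-product attention, softmax(QKᵀ/√d_k)V. A from-scratch sketch on plain Python lists (illustrative only, not the authors' code; the toy Q, K, V values are assumptions):

```python
import math

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, on lists of rows."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Dot-product similarity of the query with each key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        # Softmax over the scores gives the attention weights.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Output is the weight-averaged mix of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out

# One query attending over two keys; it matches the first key more strongly,
# so the output leans toward the first value row.
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
out = scaled_dot_product_attention(Q, K, V)
print(out)
```

Since the attention weights sum to 1, each output row is a convex combination of the value rows, weighted by query-key similarity.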
Journal ArticleDOI
Maximum likelihood from incomplete data via the EM algorithm
Posted Content
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TL;DR: A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Proceedings Article
Language Models are Few-Shot Learners
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Thomas Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Samuel McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
TL;DR: GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.