Open Access · Posted Content

End-to-End Training of Multi-Document Reader and Retriever for Open-Domain Question Answering

TLDR
The authors propose an end-to-end differentiable training method for retrieval-augmented open-domain question answering systems that combine information from multiple retrieved documents when generating answers, approximating the intractable marginalization over sets of retrieved documents with an expectation-maximization algorithm.
Abstract
We present an end-to-end differentiable training method for retrieval-augmented open-domain question answering systems that combine information from multiple retrieved documents when generating answers. We model retrieval decisions as latent variables over sets of relevant documents. Since marginalizing over sets of retrieved documents is computationally hard, we approximate this using an expectation-maximization algorithm. We iteratively estimate the value of our latent variable (the set of relevant documents for a given question) and then use this estimate to update the retriever and reader parameters. We hypothesize that such end-to-end training allows training signals to flow to the reader and then to the retriever better than stage-wise training. This results in a retriever that is able to select more relevant documents for a question and a reader that is trained on more accurate documents to generate an answer. Experiments on three benchmark datasets demonstrate that our proposed method outperforms all existing approaches of comparable size by 2-3% absolute exact match points, achieving new state-of-the-art results. Our results also demonstrate the feasibility of learning to retrieve to improve answer generation without explicit supervision of retrieval decisions.
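
The abstract describes the training procedure only at a high level. The following is a minimal, self-contained sketch of an EM-style loop of that shape, not the authors' implementation: the toy `TinyRetriever` and `TinyReader` modules, the top-k E-step, and all hyperparameters are hypothetical stand-ins chosen only to make the iteration concrete. The E-step estimates the latent set of relevant documents under the current models, and the M-step updates both the retriever and the reader on that estimate.

```python
# Hypothetical sketch of an EM-style training loop for a retrieval-augmented
# QA system (not the paper's actual code). The E-step estimates the latent
# relevant-document set; the M-step updates retriever and reader on it.
import torch
import torch.nn as nn
import torch.nn.functional as F

EMB = 16          # toy embedding size
N_DOCS = 8        # documents scored per question
TOP_K = 3         # size of the latent "relevant set" estimated in the E-step

class TinyRetriever(nn.Module):
    """Dual-encoder-style scorer: dot product of question and document embeddings."""
    def __init__(self):
        super().__init__()
        self.q_enc = nn.Linear(EMB, EMB)
        self.d_enc = nn.Linear(EMB, EMB)

    def forward(self, q, docs):                  # q: (EMB,), docs: (N, EMB)
        return self.d_enc(docs) @ self.q_enc(q)  # (N,) relevance scores

class TinyReader(nn.Module):
    """Scores an answer conditioned on the question and one document."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(3 * EMB, 1)

    def forward(self, q, docs, ans):             # ans: (EMB,)
        q_rep = q.expand(docs.size(0), -1)
        a_rep = ans.expand(docs.size(0), -1)
        return self.proj(torch.cat([q_rep, docs, a_rep], dim=-1)).squeeze(-1)

retriever, reader = TinyRetriever(), TinyReader()
opt = torch.optim.Adam(list(retriever.parameters()) + list(reader.parameters()), lr=1e-3)

# Toy "dataset": random question, document, and gold-answer embeddings.
data = [(torch.randn(EMB), torch.randn(N_DOCS, EMB), torch.randn(EMB)) for _ in range(32)]

for epoch in range(3):
    for q, docs, ans in data:
        # E-step: estimate the latent relevant-document set under the current
        # retriever and reader (here: top-k by combined score, no gradients).
        with torch.no_grad():
            posterior = retriever(q, docs) + reader(q, docs, ans)
            relevant = posterior.topk(TOP_K).indices

        # M-step: push the retriever toward the estimated relevant set, and
        # train the reader on those (hopefully more accurate) documents.
        ret_scores = retriever(q, docs)
        ret_loss = F.cross_entropy(ret_scores.unsqueeze(0).expand(TOP_K, -1), relevant)
        read_loss = -reader(q, docs[relevant], ans).mean()
        loss = ret_loss + read_loss
        opt.zero_grad()
        loss.backward()
        opt.step()
```

In this sketch the training signal flows from the answer through the reader's document scores into the E-step estimate, which is what lets the retriever improve without explicit retrieval supervision.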


Citations
Posted Content

Pre-training Methods in Information Retrieval.

TL;DR: A survey of pre-training methods for information retrieval, in which the authors present an overview of the different components of an IR system, including the retrieval component, the re-ranking component, and other components.
Journal ArticleDOI

Pre-training Methods in Information Retrieval

TL;DR: In this paper, the authors present an overview of pre-training methods for information retrieval across the components of an IR system, including the retrieval component, the re-ranking component, and other components, and summarize available datasets as well as benchmark leaderboards.
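
The component breakdown these two survey entries mention (first-stage retrieval followed by re-ranking) can be made concrete with a small sketch. The corpus, the term-overlap retriever, and the `rerank` heuristic below are hypothetical toy stand-ins for illustration only, not anything taken from the survey.

```python
# Hypothetical two-stage IR pipeline: a cheap first-stage retrieval over the
# whole corpus, then a more expensive re-ranking of the retrieved candidates.
corpus = {
    "d1": "adam is an optimizer for stochastic objective functions",
    "d2": "bert pretrains bidirectional transformers on unlabeled text",
    "d3": "open domain question answering retrieves and reads documents",
}

def retrieve(query: str, k: int = 2) -> list[str]:
    """First stage: rank all documents by simple term overlap with the query."""
    q_terms = set(query.lower().split())
    scored = sorted(corpus, key=lambda d: -len(q_terms & set(corpus[d].split())))
    return scored[:k]

def rerank(query: str, doc_ids: list[str]) -> list[str]:
    """Second stage: re-score only the candidates (here a toy heuristic that
    prefers shorter documents on ties, standing in for a neural re-ranker)."""
    q_terms = set(query.lower().split())
    def score(d: str) -> tuple[int, int]:
        return (len(q_terms & set(corpus[d].split())), -len(corpus[d]))
    return sorted(doc_ids, key=score, reverse=True)

query = "open domain question answering"
print(rerank(query, retrieve(query)))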
Posted Content

KG-FiD: Infusing Knowledge Graph in Fusion-in-Decoder for Open-Domain Question Answering

TL;DR: KG-FiD uses a graph neural network (GNN) to filter noisy passages by leveraging the structural relationships among the retrieved passages in a knowledge graph, which improves the passage retrieval results of the retrieving module.
Proceedings ArticleDOI

KG-FiD: Infusing Knowledge Graph in Fusion-in-Decoder for Open-Domain Question Answering

TL;DR: Yu et al. presented KG-FiD at the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022, Volume 1: Long Papers); see the preceding entry for a summary of the method.
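
A rough sketch of the kind of graph-based filtering the KG-FiD entries above describe: build a graph over retrieved passages from knowledge-graph links, propagate retrieval scores over that graph, and keep only the top passages. The adjacency matrix, the single propagation step, and the mixing weight below are hypothetical simplifications, not the paper's actual GNN.

```python
# Hypothetical passage-graph filtering in the spirit of the KG-FiD TL;DR above.
# Passages are nodes; an edge means their knowledge-graph entities are linked.
# One round of score propagation keeps passages connected to other high-scoring
# passages and drops isolated noisy ones.
import numpy as np

retrieval_scores = np.array([0.9, 0.2, 0.7, 0.1, 0.6])   # from a first-stage retriever
adjacency = np.array([                                     # 1 = KG link between passages
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
    [1, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
], dtype=float)

# Row-normalize the neighbours, then mix each passage's own score with theirs.
deg = np.maximum(adjacency.sum(axis=1, keepdims=True), 1.0)
propagated = 0.5 * retrieval_scores + 0.5 * (adjacency / deg) @ retrieval_scores

top_k = 3
kept = np.argsort(-propagated)[:top_k]
print("passages kept for the reader:", kept.tolist())
```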
Posted Content

Adversarial Retriever-Ranker for dense text retrieval

TL;DR: Zhang et al. propose adversarial retriever-ranker (AR2), a dense text retrieval model consisting of a dual-encoder retriever plus a cross-encoder ranker trained together adversarially.
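
The retriever/ranker split in the AR2 entry corresponds to two standard scoring patterns, sketched below with toy linear "encoders". These modules and shapes are hypothetical illustrations of dual-encoder versus cross-encoder scoring, not AR2's architecture or training objective.

```python
# Hypothetical illustration of the two scorers named in the AR2 TL;DR:
# a dual-encoder retriever (query and passage encoded independently, scored by
# a dot product, cheap enough to search a large index) and a cross-encoder
# ranker (query and passage scored jointly, more accurate but applied only to
# the retrieved shortlist). The tiny linear "encoders" are toy stand-ins.
import torch
import torch.nn as nn

DIM = 16

class DualEncoderRetriever(nn.Module):
    def __init__(self):
        super().__init__()
        self.q_enc, self.p_enc = nn.Linear(DIM, DIM), nn.Linear(DIM, DIM)

    def forward(self, query, passages):                    # query: (DIM,), passages: (N, DIM)
        return self.p_enc(passages) @ self.q_enc(query)    # (N,) scores

class CrossEncoderRanker(nn.Module):
    def __init__(self):
        super().__init__()
        self.scorer = nn.Linear(2 * DIM, 1)

    def forward(self, query, passages):
        pairs = torch.cat([query.expand(passages.size(0), -1), passages], dim=-1)
        return self.scorer(pairs).squeeze(-1)              # (N,) scores

query, passages = torch.randn(DIM), torch.randn(10, DIM)
retriever, ranker = DualEncoderRetriever(), CrossEncoderRanker()

candidates = retriever(query, passages).topk(4).indices    # retrieve a shortlist
order = ranker(query, passages[candidates]).argsort(descending=True)
print("final ranking of candidate passages:", candidates[order].tolist())
```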
References
Proceedings Article

Adam: A Method for Stochastic Optimization

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
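
For reference, the adaptive estimates of lower-order moments mentioned in the TL;DR take the following standard form, in the usual notation: $g_t$ is the gradient at step $t$, $\beta_1, \beta_2$ the decay rates, $\alpha$ the step size, and $\epsilon$ a small constant.

```latex
% Standard Adam update: biased first and second moment estimates,
% bias correction, then the parameter step.
\begin{align*}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2 \\
\hat{m}_t &= m_t / (1-\beta_1^t), \qquad \hat{v}_t = v_t / (1-\beta_2^t) \\
\theta_t &= \theta_{t-1} - \alpha\, \hat{m}_t / \bigl(\sqrt{\hat{v}_t} + \epsilon\bigr)
\end{align*}
```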
Proceedings Article

Attention is All you Need

TL;DR: This paper proposes the Transformer, a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely, and achieves state-of-the-art performance on English-to-French translation.
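
The attention mechanism at the core of the Transformer is scaled dot-product attention; a minimal single-head sketch with toy shapes (no masking, no multi-head projections) follows.

```python
# Minimal sketch of scaled dot-product attention (single head, no masking).
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (sequence_length, d_model)
    d_k = q.size(-1)
    weights = F.softmax(q @ k.transpose(-2, -1) / d_k**0.5, dim=-1)
    return weights @ v          # each output is a weighted sum of the values

q = k = v = torch.randn(5, 16)  # self-attention over a 5-token toy sequence
print(scaled_dot_product_attention(q, k, v).shape)   # torch.Size([5, 16])
```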
Posted Content

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

TL;DR: A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
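
The "one additional output layer" in the BERT TL;DR amounts to putting a task-specific linear head on top of the pretrained encoder's first-position representation. The random `PretrainedEncoderStub` below is a hypothetical stand-in for a real pretrained model, used only to keep the sketch self-contained.

```python
# Sketch of the fine-tuning recipe in the BERT TL;DR: take a pretrained
# bidirectional encoder and add a single task-specific output layer on top
# of the [CLS]-position representation.
import torch
import torch.nn as nn

HIDDEN, NUM_CLASSES = 32, 2

class PretrainedEncoderStub(nn.Module):
    """Stands in for a pretrained encoder: maps token ids to hidden states."""
    def __init__(self, vocab_size: int = 1000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, HIDDEN)

    def forward(self, token_ids):                # (batch, seq_len)
        return self.embed(token_ids)             # (batch, seq_len, HIDDEN)

class ClassifierWithOneExtraLayer(nn.Module):
    def __init__(self, encoder: nn.Module):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Linear(HIDDEN, NUM_CLASSES)   # the "one additional output layer"

    def forward(self, token_ids):
        hidden = self.encoder(token_ids)
        return self.head(hidden[:, 0])           # classify from the first ([CLS]) position

model = ClassifierWithOneExtraLayer(PretrainedEncoderStub())
logits = model(torch.randint(0, 1000, (4, 12)))  # batch of 4 toy sequences
print(logits.shape)                              # torch.Size([4, 2])
```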