Open Access · Posted Content
End-to-End Training of Multi-Document Reader and Retriever for Open-Domain Question Answering
TL;DR
The authors propose an end-to-end differentiable training method for retrieval-augmented open-domain question answering systems that combine information from multiple retrieved documents when generating answers, using an expectation-maximization algorithm.

Abstract:
We present an end-to-end differentiable training method for retrieval-augmented open-domain question answering systems that combine information from multiple retrieved documents when generating answers. We model retrieval decisions as latent variables over sets of relevant documents. Since marginalizing over sets of retrieved documents is computationally hard, we approximate this using an expectation-maximization algorithm. We iteratively estimate the value of our latent variable (the set of relevant documents for a given question) and then use this estimate to update the retriever and reader parameters. We hypothesize that such end-to-end training allows training signals to flow to the reader and then to the retriever more effectively than stage-wise training does. This results in a retriever that is able to select more relevant documents for a question and a reader that is trained on more accurate documents to generate an answer. Experiments on three benchmark datasets demonstrate that our proposed method outperforms all existing approaches of comparable size by 2-3 absolute exact match points, achieving new state-of-the-art results. Our results also demonstrate the feasibility of learning to retrieve to improve answer generation without explicit supervision of retrieval decisions.
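The EM-style alternation described in the abstract can be sketched with a toy numeric example. Every name and value below is a hypothetical stand-in, not the paper's implementation: the E-step forms a posterior over which retrieved documents are relevant by combining the retriever's prior with the reader's likelihood, and the M-step moves the retriever toward that posterior.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

# Toy setup: 3 candidate documents. The retriever and reader each hold one
# scalar score per document (stand-ins for real neural network parameters).
retriever_scores = [0.0, 0.0, 0.0]   # retriever starts with no preference
reader_scores = [0.5, 2.0, 0.1]      # reader finds document 1 most useful

lr = 0.5
for step in range(50):
    # E-step: posterior over which document is relevant, combining the
    # retriever's prior with the reader's answer likelihood.
    prior = softmax(retriever_scores)
    lik = softmax(reader_scores)
    post = [p * l for p, l in zip(prior, lik)]
    z = sum(post)
    post = [p / z for p in post]

    # M-step: move retriever scores toward the estimated posterior
    # (gradient of the cross-entropy between posterior and prior).
    # A full system would also update the reader on the
    # posterior-weighted documents; the reader is kept fixed here.
    for i in range(3):
        retriever_scores[i] += lr * (post[i] - prior[i])

best = max(range(3), key=lambda i: retriever_scores[i])
print(best)  # prints 1: the retriever learns to prefer the document the reader found useful
```

Note how the retriever receives a training signal only through the reader's likelihood, with no explicit supervision of which document is relevant, mirroring the claim in the abstract.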
Citations
Posted Content
Pre-training Methods in Information Retrieval.
Yixing Fan, Xiaohui Xie, Yinqiong Cai, Jia Chen, Xinyu Ma, Xiangsheng Li, Ruqing Zhang, Jiafeng Guo, Yiqun Liu
TL;DR: This article surveys pre-training methods in information retrieval, presenting an overview of the different components of an IR system, including the retrieval component, the re-ranking component, and other components.
Journal ArticleDOI
Pre-training Methods in Information Retrieval
TL;DR: The authors present an overview of pre-training methods in information retrieval, covering the retrieval component, the re-ranking component, and other components, and summarize available datasets as well as benchmark leaderboards.
Posted Content
KG-FiD: Infusing Knowledge Graph in Fusion-in-Decoder for Open-Domain Question Answering
Donghan Yu, Chenguang Zhu, Yuwei Fang, Wenhao Yu, Shuohang Wang, Yichong Xu, Xiang Ren, Yiming Yang, Michael Zeng
TL;DR: KG-FiD uses a graph neural network (GNN) to filter noisy passages by leveraging the structural relationships among the retrieved passages in a knowledge graph, improving the passage retrieval results in the retrieving module.
Proceedings ArticleDOI
KG-FiD: Infusing Knowledge Graph in Fusion-in-Decoder for Open-Domain Question Answering
TL;DR: Proceedings version of KG-FiD by Yu et al., presented at the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022.
Posted Content
Adversarial Retriever-Ranker for dense text retrieval
TL;DR: Zhang et al. propose Adversarial Retriever-Ranker (AR2), which consists of a dual-encoder retriever plus a cross-encoder ranker.
References
Proceedings Article
Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba
TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
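The update rule this TL;DR refers to can be shown in a minimal scalar sketch. This is illustrative only, not the paper's code: the bias-corrected first and second moment estimates follow the Adam paper's notation, while the toy objective f(x) = x² and its settings are assumptions for demonstration.

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter theta at step t (1-indexed)."""
    m = b1 * m + (1 - b1) * grad          # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad * grad   # second-moment (uncentered variance) estimate
    m_hat = m / (1 - b1 ** t)             # bias correction for zero-initialized moments
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(x) = x^2 (gradient 2x) starting from x = 5.0.
x, m, v = 5.0, 0.0, 0.0
for t in range(1, 5001):
    x, m, v = adam_step(x, 2 * x, m, v, t)
print(x)  # close to the minimum at 0
```

Because Adam rescales the gradient by the square root of its second-moment estimate, the effective step size stays near the learning rate regardless of the raw gradient magnitude, which is what "adaptive estimates of lower-order moments" refers to.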
Proceedings Article
Attention is All you Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
TL;DR: This paper proposes a simple network architecture based solely on attention mechanisms, dispensing with recurrence and convolutions entirely, and achieves state-of-the-art performance on English-to-French translation.
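The core operation behind the architecture summarized above is scaled dot-product attention, softmax(QKᵀ/√d_k)V. A from-scratch sketch on plain Python lists (illustrative only, not the authors' code; the toy Q, K, V values are assumptions):

```python
import math

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, on lists of rows."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Dot-product similarity of the query with each key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        # Softmax over the scores gives the attention weights.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Output is the weight-averaged mix of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out

# One query attending over two keys; it matches the first key more strongly,
# so the output leans toward the first value row.
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
out = scaled_dot_product_attention(Q, K, V)
print(out)
```

Since the attention weights sum to 1, each output row is a convex combination of the value rows, weighted by query-key similarity.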
Journal ArticleDOI
Maximum likelihood from incomplete data via the EM algorithm
Posted Content
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TL;DR: A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Proceedings Article
Language Models are Few-Shot Learners
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Thomas Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Samuel McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
TL;DR: GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.