Open Access · Posted Content

Retrieval Augmentation Reduces Hallucination in Conversation

TLDR
This paper explores the use of neural retrieval-in-the-loop architectures for knowledge-grounded dialogue, a task that is arguably more challenging than open-domain QA as it requires querying based on complex multi-turn dialogue context and generating conversationally coherent responses.
Abstract: 
Despite showing increasingly human-like conversational abilities, state-of-the-art dialogue models often suffer from factual incorrectness and hallucination of knowledge (Roller et al., 2020). In this work we explore the use of neural-retrieval-in-the-loop architectures - recently shown to be effective in open-domain QA (Lewis et al., 2020b; Izacard and Grave, 2020) - for knowledge-grounded dialogue, a task that is arguably more challenging as it requires querying based on complex multi-turn dialogue context and generating conversationally coherent responses. We study various types of architectures with multiple components - retrievers, rankers, and encoder-decoders - with the goal of maximizing knowledgeability while retaining conversational ability. We demonstrate that our best models obtain state-of-the-art performance on two knowledge-grounded conversational tasks. The models exhibit open-domain conversational capabilities, generalize effectively to scenarios not within the training data, and, as verified by human evaluations, substantially reduce the well-known problem of knowledge hallucination in state-of-the-art chatbots.
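The pipeline the abstract describes — query a knowledge source using the multi-turn dialogue context, then condition response generation on the retrieved passages — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the real systems use dense retrievers and neural encoder-decoders (e.g. BART/T5-style models), whereas here a toy bag-of-words retriever and a stub generator stand in, and all function names are illustrative.

```python
# Sketch of a retrieval-in-the-loop dialogue pipeline. The bag-of-words
# scorer below is a stand-in for a learned dense retriever, and the
# "generator" is a stub; component names are illustrative only.

from collections import Counter
import math

def score(query_tokens, doc_tokens):
    """Cosine-style overlap between token bags (retriever stand-in)."""
    q, d = Counter(query_tokens), Counter(doc_tokens)
    overlap = sum((q & d).values())
    norm = math.sqrt(sum(q.values()) * sum(d.values())) or 1.0
    return overlap / norm

def retrieve(dialogue_context, knowledge, k=2):
    """Query on the full multi-turn context; return the top-k passages."""
    query = " ".join(dialogue_context).lower().split()
    ranked = sorted(knowledge,
                    key=lambda doc: score(query, doc.lower().split()),
                    reverse=True)
    return ranked[:k]

def respond(dialogue_context, knowledge):
    """Condition the response on retrieved knowledge (generator is a stub)."""
    passages = retrieve(dialogue_context, knowledge)
    # A real system feeds context + passages into an encoder-decoder here.
    return f"Grounded on: {passages[0]}"

context = ["Tell me about the Eiffel Tower.", "How tall is it?"]
docs = ["The Eiffel Tower is 330 metres tall.",
        "The Louvre is a museum in Paris."]
print(respond(context, docs))
```

The key point the sketch preserves is that the retrieval query is built from the whole dialogue context, not just the last turn, which is what makes conversational retrieval harder than single-question QA.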


Citations
Posted Content

Internet-Augmented Dialogue Generation.

TL;DR: This article proposed an approach that learns to generate an internet search query based on the context, and then conditions on the search results to finally generate a response, a method that can employ up-to-the-minute relevant information.
Journal Article

FaithDial: A Faithful Benchmark for Information-Seeking Dialogue

TL;DR: This work creates FaithDial, a new benchmark for hallucination-free dialogues, by editing hallucinated responses in the Wizard of Wikipedia (WoW) benchmark; it benchmarks a series of state-of-the-art models and proposes an auxiliary contrastive objective that achieves the highest level of faithfulness and abstractiveness.
Posted Content

Beyond Goldfish Memory: Long-Term Open-Domain Conversation.

TL;DR: In this article, the authors collected and released a human-human dataset consisting of multiple chat sessions in which the speaking partners learn about each other's interests and discuss what they have learnt from past sessions; they show that models trained on existing datasets perform poorly in this long-term conversation setting in both automatic and human evaluations.
Posted Content

TruthfulQA: Measuring How Models Mimic Human Falsehoods

TL;DR: This paper proposed a benchmark to measure whether a language model is truthful in generating answers to questions, which consists of 817 questions that span 38 categories, including health, law, finance and politics.
References
Proceedings Article

Attention is All you Need

TL;DR: This paper proposes the Transformer, a simple network architecture based solely on attention mechanisms, dispensing with recurrence and convolutions entirely, and achieves state-of-the-art performance on English-to-German and English-to-French translation.
Proceedings Article

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

TL;DR: BERT pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; the pre-trained model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TL;DR: This article introduces a unified framework that converts all text-based language problems into a text-to-text format and compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Journal Article

Natural Questions: A Benchmark for Question Answering Research

TL;DR: The Natural Questions corpus, a question answering data set, is presented, introducing robust metrics for the purposes of evaluating question answering systems; demonstrating high human upper bounds on these metrics; and establishing baseline results using competitive methods drawn from related literature.
Proceedings Article

TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension

TL;DR: It is shown that, in comparison to other recently introduced large-scale datasets, TriviaQA has relatively complex, compositional questions, has considerable syntactic and lexical variability between questions and corresponding answer-evidence sentences, and requires more cross sentence reasoning to find answers.