Open Access Proceedings Article (DOI)

Deep Reinforcement Learning for Dialogue Generation

TLDR
This work simulates dialogues between two virtual agents, using policy gradient methods to reward sequences that display three useful conversational properties: informativity (non-repetitive turns), coherence, and ease of answering.
Abstract
Recent neural models of dialogue generation offer great promise for generating responses for conversational agents, but tend to be shortsighted, predicting utterances one at a time while ignoring their influence on future outcomes. Modeling the future direction of a dialogue is crucial to generating coherent, interesting dialogues, a need which led traditional NLP models of dialogue to draw on reinforcement learning. In this paper, we show how to integrate these goals, applying deep reinforcement learning to model future reward in chatbot dialogue. The model simulates dialogues between two virtual agents, using policy gradient methods to reward sequences that display three useful conversational properties: informativity, coherence, and ease of answering (related to forward-looking function). We evaluate our model on diversity and length, as well as with human judges, showing that the proposed algorithm generates more interactive responses and manages to foster a more sustained conversation in dialogue simulation. This work marks a first step towards learning a neural conversational model based on the long-term success of dialogues.
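For illustration, the training signal sketched in the abstract, policy gradients over simulated two-agent dialogues with a composite reward, can be written as a few lines of REINFORCE. This is a minimal toy sketch, not the authors' implementation: the softmax policy over five canned responses, the random placeholder rewards, and the weighting values are all illustrative stand-ins.

    import numpy as np

    rng = np.random.default_rng(0)
    theta = np.zeros(5)  # toy policy: softmax over 5 candidate responses

    def composite_reward(action):
        # Placeholder scalars for ease of answering, informativity, and
        # coherence; in the paper these are computed from learned seq2seq
        # models, not drawn at random. Weights are illustrative.
        r_ease, r_info, r_coh = rng.random(3)
        return 0.25 * r_ease + 0.25 * r_info + 0.5 * r_coh

    def reinforce_step(lr=0.1):
        global theta
        p = np.exp(theta) / np.exp(theta).sum()
        action = rng.choice(5, p=p)
        r = composite_reward(action)
        grad_logp = -p
        grad_logp[action] += 1.0      # d/d theta of log softmax at `action`
        theta += lr * r * grad_logp   # reward-weighted policy-gradient step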



Citations
Journal Article (DOI)

Recent Trends in Deep Learning Based Natural Language Processing [Review Article]

TL;DR: This paper reviews significant deep learning-related models and methods that have been employed for numerous NLP tasks and provides a walk-through of their evolution.
Posted Content

The Curious Case of Neural Text Degeneration

TL;DR: This paper showed that decoding strategies alone can dramatically affect the quality of machine-generated text, even when generated from exactly the same neural language model, and proposed Nucleus Sampling, a simple but effective method to draw the best out of neural generation.
Proceedings Article (DOI)

Personalizing Dialogue Agents: I have a dog, do you have pets too?

TL;DR: In this paper, the task of making chit-chat more engaging by conditioning on profile information is addressed, and the resulting dialogue can be used to predict profile information about the interlocutors.
Proceedings Article (DOI)

QuAC: Question Answering in Context

TL;DR: QuAC introduces challenges not found in existing machine comprehension datasets: its questions are often more open-ended, unanswerable, or only meaningful within the dialog context, as shown in a detailed qualitative evaluation.
Proceedings Article

The Curious Case of Neural Text Degeneration

TL;DR: Sampling from the dynamic nucleus of the probability distribution allows for diversity while effectively truncating the less reliable tail of the distribution; the resulting text better matches the quality of human text, yielding enhanced diversity without sacrificing fluency and coherence.
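For illustration, nucleus (top-p) sampling is compact enough to sketch directly: keep the smallest set of tokens whose cumulative probability reaches the threshold p, renormalize, and sample from that set. A minimal numpy sketch, with p as a free parameter:

    import numpy as np

    def nucleus_sample(probs, p=0.9, rng=None):
        # Sample a token id from the smallest set of tokens whose
        # cumulative probability reaches p (top-p / nucleus sampling).
        rng = rng or np.random.default_rng()
        order = np.argsort(probs)[::-1]            # most probable first
        cum = np.cumsum(probs[order])
        cutoff = int(np.searchsorted(cum, p)) + 1  # smallest prefix with mass >= p
        nucleus = order[:cutoff]
        renorm = probs[nucleus] / probs[nucleus].sum()
        return int(rng.choice(nucleus, p=renorm))

    # e.g. nucleus_sample(np.array([0.5, 0.3, 0.15, 0.05]), p=0.9)
    # can return token 0, 1, or 2, but never the tail token 3.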
References
Proceedings Article (DOI)

Bleu: a Method for Automatic Evaluation of Machine Translation

TL;DR: This paper proposes a method for automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.
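For illustration, the core of the metric fits in a short function: clipped n-gram precisions up to 4-grams, combined by a geometric mean and scaled by a brevity penalty. This is a sentence-level sketch with a single reference and no smoothing; production implementations aggregate counts at the corpus level:

    import math
    from collections import Counter

    def bleu(candidate, reference, max_n=4):
        # Geometric mean of clipped n-gram precisions times a brevity
        # penalty (single reference, no smoothing).
        log_prec = 0.0
        for n in range(1, max_n + 1):
            cand = Counter(tuple(candidate[i:i + n])
                           for i in range(len(candidate) - n + 1))
            ref = Counter(tuple(reference[i:i + n])
                          for i in range(len(reference) - n + 1))
            clipped = sum(min(c, ref[g]) for g, c in cand.items())
            if clipped == 0:
                return 0.0  # one empty precision zeroes the whole score
            log_prec += math.log(clipped / sum(cand.values())) / max_n
        bp = min(1.0, math.exp(1 - len(reference) / max(len(candidate), 1)))
        return bp * math.exp(log_prec)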
Proceedings Article

Neural Machine Translation by Jointly Learning to Align and Translate

TL;DR: It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend it by allowing the model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
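For illustration, the soft-search can be sketched as attention over encoder states: score each source position against the current decoder state, softmax the scores into alignment weights, and take the weighted sum as the context vector. The additive scoring form below is a sketch; Wd, We, and v stand in for learned parameters:

    import numpy as np

    def soft_align(dec_state, enc_states, Wd, We, v):
        # Additive attention: score = v . tanh(Wd s + We h_j) per source
        # position j, softmaxed into a soft alignment over the source.
        scores = np.array([v @ np.tanh(Wd @ dec_state + We @ h)
                           for h in enc_states])
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        context = (weights[:, None] * np.asarray(enc_states)).sum(axis=0)
        return context, weights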
Journal Article (DOI)

Mastering the game of Go with deep neural networks and tree search

TL;DR: Using this search algorithm, the program AlphaGo achieved a 99.8% winning rate against other Go programs and defeated the human European Go champion by 5 games to 0, the first time that a computer program has defeated a human professional player in the full-sized game of Go.
Proceedings Article

Sequence to Sequence Learning with Neural Networks

TL;DR: The authors used a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector.
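For illustration, the encode-then-decode shape of the model can be sketched with a toy recurrent cell. This is a minimal sketch: a plain tanh cell stands in for the paper's deep LSTMs, and the shared encoder/decoder weights and greedy decoding are simplifications.

    import numpy as np

    def rnn_step(x, h, Wx, Wh, b):
        # One recurrent step; a tanh cell standing in for an LSTM.
        return np.tanh(Wx @ x + Wh @ h + b)

    def translate(src_ids, max_len, embed, Wx, Wh, b, Wout, bos=0):
        # Encoder: compress the whole source into one fixed-size vector h.
        h = np.zeros(Wh.shape[0])
        for t in src_ids:
            h = rnn_step(embed[t], h, Wx, Wh, b)
        # Decoder: generate target tokens conditioned only on h.
        out, y = [], embed[bos]
        for _ in range(max_len):
            h = rnn_step(y, h, Wx, Wh, b)
            t = int(np.argmax(Wout @ h))  # greedy choice for brevity
            out.append(t)
            y = embed[t]
        return out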
Posted Content

Playing Atari with Deep Reinforcement Learning

TL;DR: This work presents the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning, which outperforms all previous approaches on six of the games and surpasses a human expert on three of them.
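For illustration, the learning rule at the center of this approach is the Q-learning (Bellman) target, which the deep network is regressed onto. A minimal sketch in which q_net is any callable mapping a state to a vector of action values:

    import numpy as np

    def q_target(r, next_state, done, q_net, gamma=0.99):
        # Bellman target r + gamma * max_a' Q(s', a') for one transition.
        return r if done else r + gamma * float(np.max(q_net(next_state)))

    def td_loss(q_net, state, action, target):
        # Squared temporal-difference error minimized by gradient descent.
        return (q_net(state)[action] - target) ** 2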