Sample-efficient batch reinforcement learning for dialogue management optimization

doi:10.1145/1966407.1966412

Journal Article

Olivier Pietquin, +3 more

- 06 Jun 2011 -

ACM Transactions on Speech and Language ...

- Vol. 7, Iss: 3, pp 7

TLDR

Experimental results show that a set of approximate dynamic programming algorithms, combined with a method for learning a sparse representation of the value function, can learn good dialogue policies directly from data, avoiding user modeling errors.

Abstract:

Spoken Dialogue Systems (SDS) are systems that interact with human beings using natural language. The dialogue policy plays a crucial role in determining how the dialogue management module behaves. Handcrafting the dialogue policy is not always an option, given the complexity of the dialogue task and the stochastic behavior of users. In recent years, approaches based on Reinforcement Learning (RL) have proved efficient for dialogue policy optimization. Yet most conventional RL algorithms are data intensive and require techniques such as user simulation, which is likely to introduce additional modeling errors. This paper explores the possibility of using a set of approximate dynamic programming algorithms for policy optimization in SDS. Moreover, these algorithms are combined with a method for learning a sparse representation of the value function. Experimental results show that these algorithms, when applied to dialogue management optimization, are particularly sample efficient, since they learn from a few hundred dialogue examples. These algorithms learn in an off-policy manner, meaning that they can learn optimal policies from dialogue examples generated with a quite simple strategy. Thus they can learn good dialogue policies directly from data, avoiding user modeling errors.
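
To make the batch, off-policy idea concrete, here is a minimal sketch of fitted-Q iteration, one generic approximate dynamic programming scheme; it is not claimed to be the exact algorithm evaluated in the paper. Everything in it is an assumption made for illustration: the two-slot dialogue task, the reward values, the 0.8 recognition rate, the one-hot features (no sparse representation is learned here), and all function names. The toy `step` function only stands in for a fixed corpus of logged dialogues; the learner itself never queries it and sees only the recorded transitions, which were generated by a uniformly random strategy.

```python
import numpy as np

ACTIONS = ["ask_slot1", "ask_slot2", "close"]  # hypothetical system acts
STATES = [(False, False), (True, False), (False, True), (True, True)]
GAMMA = 0.95  # assumed discount factor


def step(state, action, rng):
    """Toy two-slot dialogue (illustrative only): (next_state, reward, done)."""
    s1, s2 = state
    if action == 2:  # closing succeeds only if both slots are filled
        return state, (20.0 if s1 and s2 else -10.0), True
    understood = bool(rng.random() < 0.8)  # assumed recognition success rate
    if action == 0:
        s1 = s1 or understood
    else:
        s2 = s2 or understood
    return (s1, s2), -1.0, False  # each extra turn costs 1


def features(state, action):
    """One-hot encoding of the (state, action) pair."""
    phi = np.zeros(len(STATES) * len(ACTIONS))
    phi[(2 * int(state[0]) + int(state[1])) * len(ACTIONS) + action] = 1.0
    return phi


def collect_batch(n_dialogues, rng, max_turns=10):
    """Log transitions produced by a uniformly random behaviour policy."""
    batch = []
    for _ in range(n_dialogues):
        state = (False, False)
        for _ in range(max_turns):
            action = int(rng.integers(len(ACTIONS)))
            next_state, reward, done = step(state, action, rng)
            batch.append((state, action, reward, next_state, done))
            if done:
                break
            state = next_state
    return batch


def fitted_q(batch, n_iterations=100):
    """Repeatedly regress Q(s, a) onto r + gamma * max_a' Q(s', a')."""
    X = np.array([features(s, a) for s, a, _, _, _ in batch])
    rewards = np.array([r for _, _, r, _, _ in batch])
    not_done = np.array([0.0 if d else 1.0 for *_, d in batch])
    # Features of every action in every logged next state: shape (N, A, F).
    next_X = np.array([[features(s2, a2) for a2 in range(len(ACTIONS))]
                       for _, _, _, s2, _ in batch])
    theta = np.zeros(X.shape[1])
    for _ in range(n_iterations):
        next_q = (next_X @ theta).max(axis=1)
        targets = rewards + GAMMA * not_done * next_q
        theta, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return theta


def greedy_action(theta, state):
    return int(np.argmax([features(state, a) @ theta
                          for a in range(len(ACTIONS))]))


rng = np.random.default_rng(0)
batch = collect_batch(300, rng)  # "a few hundred" logged dialogues
theta = fitted_q(batch)
policy = {s: ACTIONS[greedy_action(theta, s)] for s in STATES}
print(policy)
```

Because the regression targets take the maximum over next actions rather than the action the random strategy actually chose, the greedy policy extracted from `theta` can differ from, and improve on, the policy that generated the data. That is the sense of "off-policy" used in the abstract.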

Citations

Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems

POMDP-Based Statistical Spoken Dialog Systems: A Review

D4RL: Datasets for Deep Data-Driven Reinforcement Learning

Neural Approaches to Conversational AI

Related Papers (5)

Reinforcement Learning: An Introduction

The Hidden Information State model: A practical framework for POMDP-based spoken dialogue management

Partially observable Markov decision processes for spoken dialog systems

Agenda-Based User Simulation for Bootstrapping a POMDP Dialogue System

Least-squares policy iteration