Journal Article

Sample-efficient batch reinforcement learning for dialogue management optimization

TLDR
Experimental results show that a set of approximate dynamic programming algorithms, combined with a method for learning a sparse representation of the value function, can learn good dialogue policies directly from data, avoiding user modeling errors.
Abstract
Spoken Dialogue Systems (SDS) are systems that interact with human beings using natural language as the medium of interaction. The dialogue policy plays a crucial role in determining how the dialogue management module behaves. Handcrafting the dialogue policy is not always an option, given the complexity of the dialogue task and the stochastic behavior of users. In recent years, approaches based on Reinforcement Learning (RL) have proved efficient for dialogue policy optimization. Yet most conventional RL algorithms are data intensive and demand techniques such as user simulation, which is likely to introduce additional modeling errors. This paper explores the possibility of using a set of approximate dynamic programming algorithms for policy optimization in SDS. Moreover, these algorithms are combined with a method for learning a sparse representation of the value function. Experimental results show that these algorithms, when applied to dialogue management optimization, are particularly sample efficient, since they learn from a few hundred dialogue examples. These algorithms learn in an off-policy manner, meaning that they can learn optimal policies from dialogue examples generated with a quite simple strategy. Thus they can learn good dialogue policies directly from data, avoiding user modeling errors.
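
As a rough illustration of the batch, off-policy setting the abstract describes, the sketch below runs fitted Q iteration, one well-known approximate dynamic programming method, on a hypothetical two-slot dialogue task. The abstract does not name the specific algorithms used, so this should not be read as the authors' implementation. The toy user model, the reward values, and all function names are assumptions made for illustration, and the sparse value-function representation is left out (a plain table stands in for it). The toy user is used only to generate the logged dialogues; learning itself touches nothing but the fixed batch.

```python
import random

ACTIONS = ("ask", "close")
N_SLOTS = 2      # assumed toy task: fill two slots, then close the dialogue
GAMMA = 0.95     # assumed discount factor


def step(state, action, rng):
    """Hypothetical toy user. Returns (reward, next_state); next_state None ends the dialogue."""
    if action == "close":
        return (10.0 if state == N_SLOTS else -10.0), None
    # "ask": the user supplies the next missing slot with probability 0.8
    if state < N_SLOTS and rng.random() < 0.8:
        return -1.0, state + 1
    return -1.0, state


def collect_batch(n_dialogues, rng, max_turns=20):
    """Log transitions produced by a simple, uniformly random behaviour strategy."""
    batch = []
    for _ in range(n_dialogues):
        state = 0
        for _ in range(max_turns):
            action = rng.choice(ACTIONS)
            reward, nxt = step(state, action, rng)
            batch.append((state, action, reward, nxt))
            if nxt is None:
                break
            state = nxt
    return batch


def fitted_q_iteration(batch, n_iters=200):
    """Repeatedly regress Q onto bootstrapped targets computed from the fixed batch."""
    q = {(s, a): 0.0 for s in range(N_SLOTS + 1) for a in ACTIONS}
    for _ in range(n_iters):
        sums = {k: 0.0 for k in q}
        counts = {k: 0 for k in q}
        for s, a, r, nxt in batch:
            # Off-policy target: greedy value of the next state, not the logged next action
            bootstrap = 0.0 if nxt is None else max(q[(nxt, b)] for b in ACTIONS)
            sums[(s, a)] += r + GAMMA * bootstrap
            counts[(s, a)] += 1
        # With one-hot features, the least-squares fit is the per-pair mean of the targets
        q = {k: (sums[k] / counts[k] if counts[k] else q[k]) for k in q}
    return q


def greedy_policy(q):
    """Pick the highest-valued action in each state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_SLOTS + 1)}


rng = random.Random(0)
batch = collect_batch(300, rng)       # a few hundred logged dialogues
q = fitted_q_iteration(batch)
policy = greedy_policy(q)
print(policy)
```

Although the logged dialogues come from a random strategy, the greedy policy extracted from the learned values asks until both slots are filled and only then closes, which is the off-policy property the abstract refers to. Replacing the table with a parametric or kernel-based regressor would be the natural next step toward the function-approximation setting of the paper.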


Citations
Posted Content

Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems

TL;DR: This tutorial article aims to provide the reader with the conceptual tools needed to get started on research on offline reinforcement learning algorithms: reinforcement learning algorithms that utilize previously collected data, without additional online data collection.
Journal Article

POMDP-Based Statistical Spoken Dialog Systems: A Review

TL;DR: This review article provides an overview of the current state of the art in the development of POMDP-based spoken dialog systems.
Journal Article

D4RL: Datasets for Deep Data-Driven Reinforcement Learning

TL;DR: This work introduces benchmarks specifically designed for the offline setting, guided by key properties of datasets relevant to real-world applications of offline RL, and releases benchmark tasks and datasets with a comprehensive evaluation of existing algorithms and an evaluation protocol together with an open-source codebase.
Posted Content

Neural Approaches to Conversational AI

TL;DR: In this article, the authors present a survey of state-of-the-art neural approaches to conversational AI, and discuss the progress that has been made and challenges still being faced, using specific systems and models as case studies.
Proceedings Article

Neural Approaches to Conversational AI

TL;DR: This tutorial surveys neural approaches to conversational AI that were developed in the last few years, and presents a review of state-of-the-art neural approaches, drawing the connection between neural approaches and traditional symbolic approaches.
References
Book

Markov Decision Processes: Discrete Stochastic Dynamic Programming

TL;DR: Puterman provides an up-to-date, unified, and rigorous treatment of the theoretical, computational, and applied research on Markov decision process models, focusing primarily on infinite horizon discrete time models and models with discrete state spaces while also examining models with arbitrary state spaces, finite horizon models, and continuous time discrete state models.
Book

Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond

TL;DR: Learning with Kernels provides an introduction to SVMs and related kernel methods that provide all of the concepts necessary to enable a reader equipped with some basic mathematical knowledge to enter the world of machine learning using theoretically well-founded yet easy-to-use kernel algorithms.
Journal Article

Universal approximation using radial-basis-function networks

TL;DR: It is proved that RBF networks having one hidden layer are capable of universal approximation, and a certain class of RBF networks with the same smoothing factor in each kernel node is broad enough for universal approximation.
Journal Article

Learning from delayed rewards

TL;DR: This work introduced Q-learning, a method for learning optimal behavior from delayed rewards.