Lucian Buşoniu, Robert Babuška, Bart De Schutter, and Damien Ernst

Reinforcement Learning and Dynamic Programming Using Function Approximators
Preface
Control systems are making a tremendous impact on our society. Though invisible
to most users, they are essential for the operation of nearly all devices – from basic
home appliances to aircraft and nuclear power plants. Apart from technical systems,
the principles of control are routinely applied and exploited in a variety of disciplines
such as economics, medicine, social sciences, and artificial intelligence.
A common denominator in the diverse applications of control is the need to influence or modify the behavior of dynamic systems to attain prespecified goals. One approach to achieve this is to assign a numerical performance index to each state trajectory of the system. The control problem is then solved by searching for a control policy that drives the system along trajectories corresponding to the best value of the performance index. This approach essentially reduces the problem of finding good control policies to the search for solutions of a mathematical optimization problem.
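For concreteness, one widely used performance index (the notation here is illustrative, not drawn from this preface) is the discounted cumulative reward collected along a trajectory, which the sought policy maximizes:

```latex
J^{\pi}(x_0) = \sum_{t=0}^{\infty} \gamma^{t}\, \rho\bigl(x_t, \pi(x_t)\bigr),
\qquad x_{t+1} = f\bigl(x_t, \pi(x_t)\bigr),
\qquad \pi^{*} \in \arg\max_{\pi} J^{\pi}(x_0)
```

with discount factor $\gamma \in [0,1)$, reward function $\rho$, and system dynamics $f$.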
Early work in the field of optimal control dates back to the 1940s with the pioneering research of Pontryagin and Bellman. Dynamic programming (DP), introduced by Bellman, is still among the state-of-the-art tools commonly used to solve optimal control problems when a system model is available. The alternative idea of finding a solution in the absence of a model was explored as early as the 1960s. In the 1980s, a revival of interest in this model-free paradigm led to the development of the field of reinforcement learning (RL). The central theme in RL research is the design of algorithms that learn control policies solely from the knowledge of transition samples or trajectories, which are collected beforehand or by online interaction with the system. Most approaches developed to tackle the RL problem are closely related to DP algorithms.
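To illustrate the idea of learning a policy purely from transition samples, here is a minimal sketch of tabular Q-learning on a toy two-state problem. The MDP, function names, and parameter values are invented here for illustration; they are not taken from the book.

```python
import random

# A toy 2-state, 2-action MDP, invented purely for illustration:
# taking action 1 in state 0 moves to state 1 and yields reward 1;
# every other transition leads back to state 0 with reward 0.
def step(state, action):
    if state == 0 and action == 1:
        return 1, 1.0  # (next state, reward)
    return 0, 0.0

# Tabular Q-learning: estimates Q(s, a) from observed transitions alone,
# without ever using a model of the dynamics.
def q_learning(steps=2000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0], [0.0, 0.0]]
    state = 0
    for _ in range(steps):
        # epsilon-greedy exploration
        if rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = max(range(2), key=lambda a: Q[state][a])
        next_state, reward = step(state, action)
        # temporal-difference update toward the one-step bootstrapped target
        Q[state][action] += alpha * (
            reward + gamma * max(Q[next_state]) - Q[state][action]
        )
        state = next_state
    return Q

Q = q_learning()
# The learned greedy policy in state 0 prefers action 1, the rewarding transition.
assert Q[0][1] > Q[0][0]
```

The update rule above is the sample-based counterpart of the DP value-iteration backup, which is one way the close relation between RL and DP mentioned here shows up in practice.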
A core obstacle in DP and RL is that solutions cannot be represented exactly for problems with large discrete state-action spaces or continuous spaces. Instead, compact representations relying on function approximators must be used. This challenge was already recognized while the first DP techniques were being developed. However, it has only been in recent years – largely in parallel with advances in RL – that approximation-based methods have grown in diversity, maturity, and efficiency, enabling RL and DP to scale up to realistic problems.
This book provides an accessible in-depth treatment of reinforcement learning
and dynamic programming methods using function approximators. We start with a
concise introduction to classical DP and RL, in order to build the foundation for
the remainder of the book. Next, we present an extensive review of state-of-the-art
approaches to DP and RL with approximation. Theoretical guarantees are provided
on the solutions obtained, and numerical examples and comparisons are used to illustrate the properties of the individual methods. The remaining three chapters are
dedicated to a detailed presentation of representative algorithms from the three major classes of techniques: value iteration, policy iteration, and policy search. The
properties and the performance of these algorithms are highlighted in simulation and
experimental studies on a range of control applications.
We believe that this balanced combination of practical algorithms, theoretical analysis, and comprehensive examples makes our book suitable not only for researchers, teachers, and graduate students in the fields of optimal and adaptive control, machine learning and artificial intelligence, but also for practitioners seeking novel strategies for solving challenging real-life control problems.
This book can be read in several ways. Readers unfamiliar with the field are advised to start with Chapter 1 for a gentle introduction, and continue with Chapter 2 (which discusses classical DP and RL) and Chapter 3 (which considers approximation-based methods). Those who are familiar with the basic concepts of RL and DP may consult the list of notations given at the end of the book, and then start directly with Chapter 3. This first part of the book is sufficient to get an overview of the field. Thereafter, readers can pick any combination of Chapters 4 to 6, depending on their interests: approximate value iteration (Chapter 4), approximate policy iteration and online learning (Chapter 5), or approximate policy search (Chapter 6).
Supplementary information relevant to this book, including a complete archive
of the computer code used in the experimental studies, is available at the Web site:
http://www.dcsc.tudelft.nl/rlbook/
Comments, suggestions, or questions concerning the book or the Web site are wel-
come. Interested readers are encouraged to get in touch with the authors using the
contact information on the Web site.
The authors have been inspired over the years by many scientists who undoubtedly left their mark on this book; in particular by Louis Wehenkel, Pierre Geurts, Guy-Bart Stan, Rémi Munos, Martin Riedmiller, and Michail Lagoudakis. Pierre Geurts also provided the computer program for building ensembles of regression trees, used in several examples in the book. This work would not have been possible without our colleagues, students, and the excellent professional environments at the Delft Center for Systems and Control of the Delft University of Technology, the Netherlands, the Montefiore Institute of the University of Liège, Belgium, and at Supélec Rennes, France. Among our colleagues in Delft, Justin Rice deserves special mention for carefully proofreading the manuscript. To all these people we extend our sincere thanks.
We thank Sam Ge for giving us the opportunity to publish our book with Taylor
& Francis CRC Press, and the editorial and production team at Taylor & Francis for
their valuable help. We gratefully acknowledge the financial support of the BSIK-ICIS project “Interactive Collaborative Information Systems” (grant no. BSIK03024)
and the Dutch funding organizations NWO and STW. Damien Ernst is a Research
Associate of the FRS-FNRS, the financial support of which he acknowledges. We
appreciate the kind permission offered by the IEEE to reproduce material from our
previous works over which they hold copyright.
Finally, we thank our families for their continual understanding, patience, and
support.
Lucian Buşoniu
Robert Babuška
Bart De Schutter
Damien Ernst
November 2009