
Reinforcement Learning and Dynamic Programming Using Function Approximators

TLDR
Reinforcement Learning and Dynamic Programming Using Function Approximators provides a comprehensive and unparalleled exploration of the field of RL and DP, with a focus on continuous-variable problems.
Abstract
From household appliances to applications in robotics, engineered systems involving complex dynamics can only be as effective as the algorithms that control them. While Dynamic Programming (DP) has provided researchers with a way to optimally solve decision and control problems involving complex dynamic systems, its practical value was limited by algorithms that lacked the capacity to scale up to realistic problems. However, in recent years, dramatic developments in Reinforcement Learning (RL), the model-free counterpart of DP, changed our understanding of what is possible. Those developments led to the creation of reliable methods that can be applied even when a mathematical model of the system is unavailable, allowing researchers to solve challenging control problems in engineering, as well as in a variety of other disciplines, including economics, medicine, and artificial intelligence. Reinforcement Learning and Dynamic Programming Using Function Approximators provides a comprehensive and unparalleled exploration of the field of RL and DP. With a focus on continuous-variable problems, this seminal text details essential developments that have substantially altered the field over the past decade. In its pages, pioneering experts provide a concise introduction to classical RL and DP, followed by an extensive presentation of the state-of-the-art and novel methods in RL and DP with approximation. Combining algorithm development with theoretical guarantees, they elaborate on their work with illustrative examples and insightful comparisons. Three individual chapters are dedicated to representative algorithms from each of the major classes of techniques: value iteration, policy iteration, and policy search. The features and performance of these algorithms are highlighted in extensive experimental studies on a range of control applications. The recent development of applications involving complex systems has led to a surge of interest in RL and DP methods and the subsequent need for a quality resource on the subject. For graduate students and others new to the field, this book offers a thorough introduction to both the basics and emerging methods. And for those researchers and practitioners working in the fields of optimal and adaptive control, machine learning, artificial intelligence, and operations research, this resource offers a combination of practical algorithms, theoretical analysis, and comprehensive examples that they will be able to adapt and apply to their own work. Access the authors' website at www.dcsc.tudelft.nl/rlbook/ for additional material, including computer code used in the studies and information concerning new developments.



Lucian Buşoniu, Robert Babuška, Bart De Schutter, and Damien Ernst
Reinforcement Learning and Dynamic Programming Using Function Approximators


Preface
Control systems are making a tremendous impact on our society. Though invisible
to most users, they are essential for the operation of nearly all devices from basic
home appliances to aircraft and nuclear power plants. Apart from technical systems,
the principles of control are routinely applied and exploited in a variety of disciplines
such as economics, medicine, social sciences, and artificial intelligence.
A common denominator in the diverse applications of control is the need to influence or modify the behavior of dynamic systems to attain prespecified goals. One approach to achieve this is to assign a numerical performance index to each state trajectory of the system. The control problem is then solved by searching for a control policy that drives the system along trajectories corresponding to the best value of the performance index. This approach essentially reduces the problem of finding good control policies to the search for solutions of a mathematical optimization problem.
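As a minimal illustration of this reduction (our own sketch, not notation taken from the book), consider a discrete-time system x_{k+1} = f(x_k, u_k) controlled by a policy u_k = pi(x_k), with reward function rho and discount factor gamma in (0, 1). The performance index and the resulting optimization problem can then be written, in LaTeX notation, as:

    J^{\pi}(x_0) = \sum_{k=0}^{\infty} \gamma^{k} \, \rho\bigl(x_k, \pi(x_k)\bigr),
    \qquad x_{k+1} = f\bigl(x_k, \pi(x_k)\bigr),
    \qquad \pi^{\ast} \in \arg\max_{\pi} J^{\pi}(x_0).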
Early work in the field of optimal control dates back to the 1940s with the pioneering research of Pontryagin and Bellman. Dynamic programming (DP), introduced by Bellman, is still among the state-of-the-art tools commonly used to solve optimal control problems when a system model is available. The alternative idea of finding a solution in the absence of a model was explored as early as the 1960s. In the 1980s, a revival of interest in this model-free paradigm led to the development of the field of reinforcement learning (RL). The central theme in RL research is the design of algorithms that learn control policies solely from the knowledge of transition samples or trajectories, which are collected beforehand or by online interaction with the system. Most approaches developed to tackle the RL problem are closely related to DP algorithms.
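To make the sample-based idea concrete, here is a minimal tabular Q-learning sketch (a standard model-free update, not an algorithm reproduced from the book; the integer state/action encoding is an assumption for illustration):

    def q_learning(samples, num_states, num_actions, alpha=0.1, gamma=0.95):
        """Estimate the action-value function Q from transition samples
        (s, a, r, s_next) alone; no model of the system dynamics is used."""
        Q = [[0.0] * num_actions for _ in range(num_states)]
        for s, a, r, s_next in samples:
            # Temporal-difference update toward the one-step Bellman target.
            target = r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
        return Q

    def greedy_action(Q, s):
        """Extract the greedy control policy from the learned values."""
        return max(range(len(Q[s])), key=lambda a: Q[s][a])

The samples may come from a batch collected beforehand or from online interaction, matching the two collection modes mentioned above.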
A core obstacle in DP and RL is that solutions cannot be represented exactly for problems with large discrete state-action spaces or continuous spaces. Instead, compact representations relying on function approximators must be used. This challenge was already recognized while the first DP techniques were being developed. However, it has only been in recent years, and largely in correlation with the advance of RL, that approximation-based methods have grown in diversity, maturity, and efficiency, enabling RL and DP to scale up to realistic problems.
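As one hypothetical instance of such a compact representation (a linear parameterization with radial basis functions; the book surveys many alternatives), an action-value function over a continuous state space can be stored as a single weight vector theta:

    import numpy as np

    def rbf_features(x, u, centers, width, num_actions):
        """Features phi(x, u): Gaussian RBF activations over the continuous
        state x, replicated in a separate block for each discrete action u."""
        activations = np.exp(-np.sum((centers - x) ** 2, axis=1) / (2 * width ** 2))
        phi = np.zeros(len(centers) * num_actions)
        phi[u * len(centers):(u + 1) * len(centers)] = activations
        return phi

    def q_approx(x, u, theta, centers, width, num_actions):
        """Compact approximation Q_hat(x, u) = phi(x, u)^T theta."""
        return rbf_features(x, u, centers, width, num_actions) @ theta

Only theta needs to be learned and stored, so memory no longer grows with the (possibly infinite) number of states.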
This book provides an accessible in-depth treatment of reinforcement learning and dynamic programming methods using function approximators. We start with a concise introduction to classical DP and RL, in order to build the foundation for the remainder of the book. Next, we present an extensive review of state-of-the-art approaches to DP and RL with approximation. Theoretical guarantees are provided on the solutions obtained, and numerical examples and comparisons are used to illustrate the properties of the individual methods. The remaining three chapters are dedicated to a detailed presentation of representative algorithms from the three major classes of techniques: value iteration, policy iteration, and policy search. The properties and the performance of these algorithms are highlighted in simulation and experimental studies on a range of control applications.
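To give a flavor of the third of these classes, below is a deliberately crude, hypothetical direct policy search sketch (Chapter 6 of the book treats far more capable methods): it never builds a value function, but scores a parameterized policy by simulated rollouts and keeps any parameter perturbation that improves the return.

    import numpy as np

    def rollout_return(policy, theta, step, x0, gamma=0.95, horizon=200):
        """Discounted return of one simulated trajectory under policy(theta, .);
        'step' is a user-supplied simulator returning (x_next, reward)."""
        x, total = x0, 0.0
        for k in range(horizon):
            u = policy(theta, x)
            x, r = step(x, u)
            total += gamma ** k * r
        return total

    def random_search(policy, step, x0, dim, iters=500, sigma=0.3, seed=0):
        """Keep randomly perturbed parameter vectors whenever they improve
        the estimated return; return the best parameters found."""
        rng = np.random.default_rng(seed)
        theta = np.zeros(dim)
        best = rollout_return(policy, theta, step, x0)
        for _ in range(iters):
            candidate = theta + sigma * rng.standard_normal(dim)
            score = rollout_return(policy, candidate, step, x0)
            if score > best:
                theta, best = candidate, score
        return theta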
We believe that this balanced combination of practical algorithms, theoretical analysis, and comprehensive examples makes our book suitable not only for researchers, teachers, and graduate students in the fields of optimal and adaptive control, machine learning, and artificial intelligence, but also for practitioners seeking novel strategies for solving challenging real-life control problems.
This book can be read in several ways. Readers unfamiliar with the field are advised to start with Chapter 1 for a gentle introduction, and continue with Chapter 2 (which discusses classical DP and RL) and Chapter 3 (which considers approximation-based methods). Those who are familiar with the basic concepts of RL and DP may consult the list of notations given at the end of the book, and then start directly with Chapter 3. This first part of the book is sufficient to get an overview of the field. Thereafter, readers can pick any combination of Chapters 4 to 6, depending on their interests: approximate value iteration (Chapter 4), approximate policy iteration and online learning (Chapter 5), or approximate policy search (Chapter 6).
Supplementary information relevant to this book, including a complete archive of the computer code used in the experimental studies, is available at the Web site:
http://www.dcsc.tudelft.nl/rlbook/
Comments, suggestions, or questions concerning the book or the Web site are welcome. Interested readers are encouraged to get in touch with the authors using the contact information on the Web site.
The authors have been inspired over the years by many scientists who undoubtedly left their mark on this book; in particular by Louis Wehenkel, Pierre Geurts, Guy-Bart Stan, Rémi Munos, Martin Riedmiller, and Michail Lagoudakis. Pierre Geurts also provided the computer program for building ensembles of regression trees, used in several examples in the book. This work would not have been possible without our colleagues, students, and the excellent professional environments at the Delft Center for Systems and Control of the Delft University of Technology, the Netherlands, the Montefiore Institute of the University of Liège, Belgium, and at Supélec Rennes, France. Among our colleagues in Delft, Justin Rice deserves special mention for carefully proofreading the manuscript. To all these people we extend our sincere thanks.
We thank Sam Ge for giving us the opportunity to publish our book with Taylor & Francis CRC Press, and the editorial and production team at Taylor & Francis for their valuable help. We gratefully acknowledge the financial support of the BSIK-ICIS project "Interactive Collaborative Information Systems" (grant no. BSIK03024) and the Dutch funding organizations NWO and STW. Damien Ernst is a Research Associate of the FRS-FNRS, the financial support of which he acknowledges. We appreciate the kind permission offered by the IEEE to reproduce material from our previous works over which they hold copyright.

Finally, we thank our families for their continual understanding, patience, and support.
Lucian Buşoniu
Robert Babuška
Bart De Schutter
Damien Ernst
November 2009

Citations
Journal Article

Reinforcement Learning in Robotics: A Survey

Jens Kober, +1 more

TL;DR: This article attempts to strengthen the links between the two research communities by providing a survey of work in reinforcement learning for behavior generation in robots, highlighting both key challenges in robot reinforcement learning and notable successes, and discussing the role of algorithms, representations, and prior knowledge in achieving these successes.
Book

Algorithms for Reinforcement Learning

TL;DR: This book focuses on those algorithms of reinforcement learning that build on the powerful theory of dynamic programming, gives a fairly comprehensive catalog of learning problems, and describes the core ideas, followed by a discussion of their theoretical properties and limitations.
Journal Article

Reinforcement Learning and Feedback Control: Using Natural Decision Methods to Design Optimal Adaptive Controllers

TL;DR: In this article, the authors describe the use of reinforcement learning to design feedback controllers for discrete- and continuous-time dynamical systems that combine features of adaptive control and optimal control; standard adaptive controllers are not usually designed to be optimal in the sense of minimizing user-prescribed performance functions.
Journal Article

A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients

TL;DR: The workings of the natural gradient, which has made its way into many actor-critic algorithms over the past few years, are described, and a review of several standard and natural actor-critic algorithms is given.
Frequently Asked Questions (1)
Q1. What are the contributions in this paper?

In this paper, a policy search algorithm is proposed to find a policy that performs well over the entire state space, with a computational complexity that is not inherently related to the number of state variables.