Open Access · Journal Article · DOI

Risk Sensitive Reinforcement Learning

Ralph Neuneier, +1 more
- Vol. 49, Iss. 2, pp. 1031–1037
TLDR
This risk-sensitive reinforcement learning algorithm is based on a very different philosophy: rather than transforming the return of the process, it transforms the temporal differences during learning. It thereby reflects important properties of the classical exponential utility framework while avoiding that framework's serious drawbacks for learning.
Abstract
Most reinforcement learning algorithms optimize the expected return of a Markov Decision Problem. Practice has taught us that this criterion is not always the most suitable, because many applications require robust control strategies that also take into account the variance of the return. The classical control literature provides several techniques for dealing with risk-sensitive optimization goals, such as the so-called worst-case optimality criterion, which focuses exclusively on risk-avoiding policies, or classical risk-sensitive control, which transforms the returns by exponential utility functions. While the first approach is typically too restrictive, the latter suffers from the absence of an obvious way to design a corresponding model-free reinforcement learning algorithm. Our risk-sensitive reinforcement learning algorithm is based on a very different philosophy: instead of transforming the return of the process, we transform the temporal differences during learning. While our approach reflects important properties of the classical exponential utility framework, it avoids that framework's serious drawbacks for learning. Based on an extended set of optimality equations, we are able to formulate risk-sensitive versions of various well-known reinforcement learning algorithms, which converge with probability one under the usual conditions.
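For readers who want the mechanics, the following is a minimal sketch of the idea in tabular Q-learning form: the temporal difference, rather than the return, is weighted asymmetrically by a risk parameter kappa, with kappa = 0 recovering the standard update. The environment interface (reset/step) and all names are illustrative assumptions for the sketch, not taken from the paper.

import numpy as np

def transform_td(delta, kappa):
    # Asymmetrically weight the temporal difference.
    # kappa in (-1, 1): kappa > 0 penalizes negative surprises more strongly
    # (risk-averse), kappa < 0 favors them (risk-seeking), kappa = 0 gives
    # ordinary Q-learning.
    return (1.0 - kappa) * delta if delta > 0 else (1.0 + kappa) * delta

def risk_sensitive_q_learning(env, n_states, n_actions, kappa=0.5,
                              alpha=0.1, gamma=0.95, episodes=500):
    # Tabular Q-learning where the TD error, not the return, is transformed.
    # env is assumed to expose reset() -> state and
    # step(action) -> (next_state, reward, done).
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # epsilon-greedy exploration
            if np.random.rand() < 0.1:
                action = np.random.randint(n_actions)
            else:
                action = int(np.argmax(Q[state]))
            next_state, reward, done = env.step(action)
            target = reward + (0.0 if done else gamma * np.max(Q[next_state]))
            delta = target - Q[state, action]
            # risk-sensitive step: update with the transformed TD error
            Q[state, action] += alpha * transform_td(delta, kappa)
            state = next_state
    return Q

With positive kappa, negative surprises are weighted more heavily than positive ones, biasing the learned values, and hence the greedy policy, toward risk-averse behavior; negative kappa does the opposite.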

Citations
Journal Article

A comprehensive survey on safe reinforcement learning

TL;DR: This work categorizes and analyzes two approaches to Safe Reinforcement Learning: modifying the optimality criterion (the classic discounted finite/infinite-horizon return) with a safety factor, and incorporating external knowledge or the guidance of a risk metric.
Journal Article · DOI

Human Insula Activation Reflects Risk Prediction Errors As Well As Risk

TL;DR: Using functional imaging during a simple gambling task, it is shown that an early-onset activation in the human insula correlates significantly with risk prediction error and that its time course is consistent with a role in rapid updating.
Journal Article · DOI

Reinforcement learning: The Good, The Bad and The Ugly

TL;DR: The latest dispatches from the forefront of reinforcement learning are reviewed, some of the territories where lie monsters are mapped out, and the future of reinforcement learning is charted.
Journal Article · DOI

Pupil Dilation Signals Surprise: Evidence for Noradrenaline's Role in Decision Making.

TL;DR: This work demonstrates that the pupil does not signal expected reward or uncertainty per se, but instead signals surprise, that is, errors in judging uncertainty, and analyses this effect with respect to a specific mathematical model of uncertainty and surprise, namely risk and risk prediction error.
Journal Article · DOI

Learning to trade via direct reinforcement

TL;DR: It is demonstrated how direct reinforcement can be used to optimize risk-adjusted investment returns (including the differential Sharpe ratio), while accounting for the effects of transaction costs.
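As a side note on the differential Sharpe ratio mentioned in this citation: it is commonly maintained online from exponential moving estimates of the first and second moments of the returns. The sketch below follows that standard moving-average formulation; it illustrates the cited direct-reinforcement work, not the present paper, and the class and variable names are illustrative assumptions.

class DifferentialSharpe:
    # Online differential Sharpe ratio from exponential moving moments.
    # A tracks the mean return, B the mean squared return; eta is the
    # moving-average adaptation rate.

    def __init__(self, eta=0.01):
        self.eta = eta
        self.A = 0.0  # moving estimate of E[R]
        self.B = 0.0  # moving estimate of E[R^2]

    def update(self, r):
        dA = r - self.A
        dB = r * r - self.B
        denom = (self.B - self.A ** 2) ** 1.5
        # marginal effect of the latest return on the moving Sharpe ratio
        d_sharpe = 0.0 if denom <= 0 else (self.B * dA - 0.5 * self.A * dB) / denom
        self.A += self.eta * dA
        self.B += self.eta * dB
        return d_sharpe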
References
Book

Theory of Games and Economic Behavior

TL;DR: Theory of Games and Economic Behavior is the classic work upon which modern-day game theory is based; it has been widely used to analyze a host of real-world phenomena, from arms races to the optimal policy choices of presidential candidates, and from vaccination policy to major league baseball salary negotiations.
Book

Markov Decision Processes: Discrete Stochastic Dynamic Programming

TL;DR: Puterman provides a uniquely up-to-date, unified, and rigorous treatment of the theoretical, computational, and applied research on Markov decision process models, focusing primarily on infinite-horizon discrete-time models and models with discrete state spaces, while also examining models with arbitrary state spaces, finite-horizon models, and continuous-time discrete-state models.
Book

Dynamic Programming and Optimal Control

TL;DR: The leading and most up-to-date textbook on the far-ranging algorithmic methodology of Dynamic Programming, which can be used for optimal control, Markovian decision problems, planning and sequential decision making under uncertainty, and discrete/combinatorial optimization.
Journal Article · DOI

Risk Aversion in the Small and in the Large

John W. Pratt
- 01 Jan 1964
TL;DR: In this article, a measure of risk aversion in the small, the risk premium or insurance premium for an arbitrary risk, and a natural concept of decreasing risk aversion are discussed and related to one another.
Monograph · DOI

Markov Decision Processes

TL;DR: Markov Decision Processes covers recent research advances in such areas as countable state space models with average reward criterion, constrained models, and models with risk sensitive optimality criteria, and explores several topics that have received little or no attention in other books.