Open Access, Posted Content

Risk-Sensitive and Robust Decision-Making: a CVaR Optimization Approach

TLDR
This paper shows that a CVaR objective, besides capturing risk sensitivity, has an alternative interpretation as expected cost under worst-case modeling errors for a given error budget. It also presents an approximate value-iteration algorithm for CVaR MDPs and analyzes its convergence rate.
Abstract
In this paper we address the problem of decision making within a Markov decision process (MDP) framework where risk and modeling errors are taken into account. Our approach is to minimize a risk-sensitive conditional-value-at-risk (CVaR) objective, as opposed to a standard risk-neutral expectation. We refer to such a problem as a CVaR MDP. Our first contribution is to show that a CVaR objective, besides capturing risk sensitivity, has an alternative interpretation as expected cost under worst-case modeling errors, for a given error budget. This result, which is of independent interest, motivates CVaR MDPs as a unifying framework for risk-sensitive and robust decision making. Our second contribution is to present an approximate value-iteration algorithm for CVaR MDPs and analyze its convergence rate. To our knowledge, this is the first solution algorithm for CVaR MDPs that enjoys error guarantees. Finally, we present results from numerical experiments that corroborate our theoretical findings and show the practicality of our approach.
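The two readings of CVaR described in the abstract can be checked numerically on a small discrete cost distribution. The sketch below is illustrative only and is not the paper's algorithm: the function names, the three-outcome distribution, and the level alpha = 0.2 are assumptions made for this example. It computes CVaR once through the standard Rockafellar-Uryasev minimization form and once as the expected cost under the worst reweighting of the nominal probabilities, where each probability may grow by at most a factor of 1/alpha (one common way to express an "error budget").

```python
import math


def cvar_primal(costs, probs, alpha):
    """Rockafellar-Uryasev form: min over w of w + E[(Z - w)^+] / alpha."""
    def objective(w):
        excess = sum(p * max(c - w, 0.0) for c, p in zip(costs, probs))
        return w + excess / alpha
    # For a discrete distribution the minimum is attained at one of the atoms.
    return min(objective(w) for w in costs)


def cvar_worst_case(costs, probs, alpha):
    """Expected cost under the worst reweighting q with q_i <= p_i / alpha."""
    remaining = 1.0  # probability mass still to be assigned
    q = [0.0] * len(costs)
    # Push as much mass as the budget allows onto the most costly outcomes.
    by_cost_desc = sorted(range(len(costs)), key=lambda i: costs[i], reverse=True)
    for i in by_cost_desc:
        q[i] = min(probs[i] / alpha, remaining)
        remaining -= q[i]
    return sum(qi * c for qi, c in zip(q, costs)), q


# Hypothetical example data (not taken from the paper).
costs = [0.0, 1.0, 10.0]
probs = [0.5, 0.4, 0.1]
alpha = 0.2

risk_view = cvar_primal(costs, probs, alpha)
robust_view, worst_q = cvar_worst_case(costs, probs, alpha)
assert math.isclose(risk_view, robust_view)
print(risk_view, robust_view, worst_q)
```

With alpha = 1 the budget allows no reweighting, so both functions reduce to the ordinary risk-neutral expectation; smaller values of alpha allow larger perturbations and give a more conservative objective.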


Citations
Posted Content

Reward Constrained Policy Optimization

TL;DR: This paper presents Reward Constrained Policy Optimization (RCPO), which uses an alternative penalty signal to guide the policy towards a constraint-satisfying one, proves the convergence of the approach, and provides empirical evidence of its ability to train constraint-satisfying policies.
Journal Article

A Review of Safe Reinforcement Learning: Methods, Theory and Applications

TL;DR: This paper reviews the progress of safe RL from the perspectives of methods, theory, and applications, and examines the problems, coined “2H3W”, that are crucial for deploying safe RL in real-world applications.
Proceedings Article

Portfolio Optimization for Influence Spread

TL;DR: This work adopts conditional value at risk (CVaR) as a risk measure, proposes an algorithm that computes a portfolio over seed sets with a provable guarantee on its CVaR, and demonstrates that the portfolio computed by the algorithm has a significantly better CVaR than seed sets computed by other baseline methods.
Journal Article

Robust artificial intelligence and robust human organizations

TL;DR: This short note reviews the properties of high-reliability organizations and draws implications for the development of AI technology and the safe application of that technology in high-risk applications.
Posted Content

Deep Robust Kalman Filter

TL;DR: Two algorithms, RTD-DQN and Deep-RoK, are proposed for solving large-scale RMDPs using nonlinear approximation schemes such as deep neural networks. Both incorporate the robust Bellman temporal difference error into a robust loss function, yielding robust policies for the agent.