Open Access
Posted Content
Risk-Sensitive and Robust Decision-Making: a CVaR Optimization Approach
TL;DR: This paper shows that a CVaR objective, besides capturing risk sensitivity, has an alternative interpretation as expected cost under worst-case modeling errors, for a given error budget, and presents an approximate value-iteration algorithm for CVaR MDPs and analyzes its convergence rate.
Abstract:
In this paper we address the problem of decision making within a Markov decision process (MDP) framework where risk and modeling errors are taken into account. Our approach is to minimize a risk-sensitive conditional-value-at-risk (CVaR) objective, as opposed to a standard risk-neutral expectation. We refer to such a problem as a CVaR MDP. Our first contribution is to show that a CVaR objective, besides capturing risk sensitivity, has an alternative interpretation as expected cost under worst-case modeling errors, for a given error budget. This result, which is of independent interest, motivates CVaR MDPs as a unifying framework for risk-sensitive and robust decision making. Our second contribution is to present an approximate value-iteration algorithm for CVaR MDPs and analyze its convergence rate. To our knowledge, this is the first solution algorithm for CVaR MDPs that enjoys error guarantees. Finally, we present results from numerical experiments that corroborate our theoretical findings and show the practicality of our approach.
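The two readings of CVaR described in the abstract can be checked numerically on a small example. The sketch below assumes a finite cost distribution and the convention in which alpha in (0, 1] is the tail probability, so that CVaR_alpha(Z) = min_w { w + E[(Z - w)^+] / alpha }. The robust reading is the dual form: the largest expected cost over perturbed distributions q = xi * p whose density xi stays in [0, 1/alpha], which plays the role of the error budget. The function names and numbers are illustrative assumptions, and this is not the paper's value-iteration algorithm.

```python
def cvar_primal(costs, probs, alpha):
    """Risk-sensitive form: CVaR_alpha(Z) = min_w { w + E[(Z - w)^+] / alpha }.

    For a finite distribution the objective is convex and piecewise linear in w
    with breakpoints at the support points, so the minimum is attained at one of them.
    """
    def objective(w):
        return w + sum(p * max(c - w, 0.0) for c, p in zip(costs, probs)) / alpha

    return min(objective(w) for w in costs)


def cvar_worst_case(costs, probs, alpha):
    """Robust form: max E_q[Z] over q = xi * p with 0 <= xi <= 1/alpha and sum(q) = 1.

    Greedily moves as much probability as the budget allows onto the highest costs.
    Returns the worst-case expected cost and the perturbed distribution q.
    """
    order = sorted(range(len(costs)), key=lambda i: costs[i], reverse=True)
    q = [0.0] * len(costs)
    remaining = alpha  # original probability mass still to be up-weighted
    for i in order:
        take = min(probs[i], remaining)
        q[i] = take / alpha  # density xi = q / p is at most 1 / alpha
        remaining -= take
        if remaining <= 0.0:
            break
    return sum(qi * c for qi, c in zip(q, costs)), q


costs = [0.0, 1.0, 10.0]
probs = [0.5, 0.4, 0.1]
print(cvar_primal(costs, probs, 0.2))         # risk-sensitive view
print(cvar_worst_case(costs, probs, 0.2)[0])  # worst-case (robust) view
```

For this example the worst 20% of outcomes are probability 0.1 at cost 10 and 0.1 at cost 1, so both functions give 5.5; with alpha = 1 the primal form reduces to the plain expectation, 1.4.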
Citations
Posted Content
Reward Constrained Policy Optimization
TL;DR: Reward Constrained Policy Optimization (RCPO) uses an alternative penalty signal to guide the policy towards a constraint-satisfying one; the authors prove the convergence of the approach and provide empirical evidence of its ability to train constraint-satisfying policies.
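The penalty-signal idea summarized above is a Lagrangian relaxation: the policy ascends a penalized objective r - lambda * c, while the multiplier lambda rises on a slower timescale whenever the constraint c <= alpha is violated. The toy sketch below replaces RCPO's sampled returns and learned critic with closed-form functions of a scalar parameter theta; the problem, step sizes, and function names are hypothetical and only illustrate the two-timescale primal-dual update, not the RCPO algorithm itself.

```python
def primal_dual(reward_grad, cost, cost_grad, alpha,
                theta=0.0, lr_theta=0.05, lr_lambda=0.01, steps=20_000):
    """Toy two-timescale Lagrangian ascent for: max r(theta) s.t. c(theta) <= alpha."""
    lam = 0.0
    for _ in range(steps):
        # Fast timescale: gradient ascent on the penalized objective r - lam * c.
        theta += lr_theta * (reward_grad(theta) - lam * cost_grad(theta))
        # Slow timescale: raise lam while the constraint is violated, keep lam >= 0.
        lam = max(0.0, lam + lr_lambda * (cost(theta) - alpha))
    return theta, lam


# Hypothetical problem: r(theta) = -(theta - 3)^2 subject to c(theta) = theta <= 1.
theta, lam = primal_dual(
    reward_grad=lambda t: -2.0 * (t - 3.0),
    cost=lambda t: t,
    cost_grad=lambda t: 1.0,
    alpha=1.0,
)
print(theta, lam)
```

For this toy problem the KKT conditions give theta = 1 and lambda = 4 (from -2(1 - 3) - lambda = 0), and the iterates approach that point because the multiplier step is smaller than the policy step.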
Journal Article
A Review of Safe Reinforcement Learning: Methods, Theory and Applications
Shangding Gu, Longyu Yang, Yali Du, Guang Chen, Florian Walter, Jun Wang, Yaodong Yang, Alois Knoll
TL;DR: Reviews the progress of safe RL from the perspectives of methods, theory, and applications, as well as the problems that are crucial for deploying safe RL in real-world applications, coined "2H3W".
Proceedings Article
Portfolio Optimization for Influence Spread
Naoto Ohsaka, Yuichi Yoshida
TL;DR: This work adopts conditional value at risk (CVaR) as a risk measure and proposes an algorithm that computes a portfolio over seed sets with a provable guarantee on its CVaR; experiments demonstrate that this portfolio has a significantly better CVaR than seed sets computed by other baseline methods.
Journal Article
Robust artificial intelligence and robust human organizations
TL;DR: This short note reviews the properties of high-reliability organizations and draws implications for the development of AI technology and its safe application in high-risk settings.
Posted Content
Deep Robust Kalman Filter
TL;DR: Two algorithms, RTD-DQN and Deep-RoK, are proposed for solving large-scale robust MDPs (RMDPs) using nonlinear approximation schemes such as deep neural networks; they incorporate the robust Bellman temporal-difference error into a robust loss function, yielding robust policies for the agent.
References
Journal Article
Coherent Measures of Risk
TL;DR: The authors present and justify a set of four desirable properties for measures of risk, call the measures that satisfy these properties "coherent", and demonstrate the universality of scenario-based methods for providing coherent measures.
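The scenario-based representation can be illustrated on a finite sample space. Assuming a loss convention (positive values are losses) and a zero interest rate, a risk measure of the form rho(L) = max over a set of probability measures P of E_P[L] should satisfy all four coherence properties: monotonicity, translation invariance, positive homogeneity, and subadditivity. The sketch below spot-checks them on random inputs; the scenario set, ranges, and function names are arbitrary assumptions, and a passing check is an illustration rather than a proof.

```python
import random


def scenario_risk(loss, scenarios):
    """rho(L) = max over scenarios P of E_P[L] (loss convention, zero interest rate)."""
    return max(sum(p * x for p, x in zip(P, loss)) for P in scenarios)


def random_measure(n, rng):
    """A random probability vector on n states."""
    w = [rng.random() + 1e-6 for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]


def check_coherence(trials=1000, n=5, num_scenarios=3, seed=0, tol=1e-9):
    rng = random.Random(seed)
    scenarios = [random_measure(n, rng) for _ in range(num_scenarios)]

    def rho(L):
        return scenario_risk(L, scenarios)

    for _ in range(trials):
        X = [rng.uniform(-10, 10) for _ in range(n)]
        Y = [rng.uniform(-10, 10) for _ in range(n)]
        c = rng.uniform(-5, 5)
        lam = rng.uniform(0, 5)
        # Monotonicity: a pointwise larger loss has a risk at least as large.
        bigger = [x + abs(y) for x, y in zip(X, Y)]
        assert rho(X) <= rho(bigger) + tol
        # Translation invariance: adding a sure loss c shifts the risk by c.
        assert abs(rho([x + c for x in X]) - (rho(X) + c)) < tol
        # Positive homogeneity: scaling the position by lam >= 0 scales the risk.
        assert abs(rho([lam * x for x in X]) - lam * rho(X)) < tol
        # Subadditivity: merging positions never increases the risk.
        assert rho([x + y for x, y in zip(X, Y)]) <= rho(X) + rho(Y) + tol
    return True


print(check_coherence())
```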
Journal Article
Envelope Theorems for Arbitrary Choice Sets
Paul Milgrom, Ilya Segal
TL;DR: The standard envelope theorems apply to choice sets with convex and topological structure, providing sufficient conditions for the value function to be differentiable in a parameter and characterizing its derivative.
Book
Dynamic Programming and Optimal Control, Vol. II
TL;DR: A major revision of the second volume of a textbook on the far-ranging algorithmic methodology of dynamic programming, which can be used for optimal control, Markovian decision problems, planning and sequential decision making under uncertainty, and discrete/combinatorial optimization.
Journal Article
Robust Dynamic Programming
TL;DR: It is proved that when the set of admissible transition measures has a certain "rectangularity" property, all of the main results for finite- and infinite-horizon DP extend to natural robust counterparts.
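A small cost-minimizing robust value iteration illustrates the rectangularity assumption: each state-action pair carries its own uncertainty set, so the adversary's worst-case choice decomposes per (s, a) and the usual Bellman recursion goes through. The sketch below assumes finite uncertainty sets of candidate transition distributions and a discount factor of 0.9; the MDP data and function names are hypothetical.

```python
def robust_value_iteration(costs, uncertainty, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Cost-minimizing robust value iteration with (s, a)-rectangular uncertainty.

    costs[s][a]: immediate cost of action a in state s.
    uncertainty[s][a]: finite list of candidate next-state distributions for (s, a).
    """
    n = len(costs)

    def q_value(V, s, a):
        # Nature picks the worst distribution independently for each (s, a).
        worst = max(sum(p * v for p, v in zip(P, V)) for P in uncertainty[s][a])
        return costs[s][a] + gamma * worst

    V = [0.0] * n
    for _ in range(max_iter):
        V_new = [min(q_value(V, s, a) for a in range(len(costs[s]))) for s in range(n)]
        done = max(abs(x - y) for x, y in zip(V_new, V)) < tol
        V = V_new
        if done:
            break
    # Greedy policy with respect to the converged robust values.
    policy = [min(range(len(costs[s])), key=lambda a: q_value(V, s, a)) for s in range(n)]
    return V, policy


costs = [[0.0, 1.0], [2.0, 3.0]]
# One candidate set per (state, action): the (s, a)-rectangular structure.
uncertainty = [
    [[[0.9, 0.1], [0.7, 0.3]], [[1.0, 0.0], [0.95, 0.05]]],
    [[[0.2, 0.8], [0.1, 0.9]], [[0.8, 0.2], [0.6, 0.4]]],
]
# Nominal model: keep only the first candidate of every set.
nominal = [[[cands[0]] for cands in row] for row in uncertainty]

V_robust, pi_robust = robust_value_iteration(costs, uncertainty)
V_nominal, pi_nominal = robust_value_iteration(costs, nominal)
print(V_robust, pi_robust)
print(V_nominal, pi_nominal)
```

Because every nominal model is contained in the corresponding uncertainty set, the robust values are never smaller than the nominal ones.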
Book
Measuring Market Risk
TL;DR: This book presents a mean-variance framework for measuring financial risk and applies it to value at risk and to coherent risk measures in financial markets.
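Under the common assumption that portfolio losses are normally distributed, value at risk and expected shortfall follow in closed form from the mean and standard deviation alone; expected shortfall coincides with CVaR for continuous distributions and is coherent. The sketch below works under that normality assumption, with the confidence level beta written as 1 - alpha relative to the tail-probability convention used for CVaR above. The function names and two-asset numbers are hypothetical, and the book's own treatment may differ.

```python
import math
from statistics import NormalDist


def portfolio_loss_moments(weights, mean_losses, cov):
    """Mean and standard deviation of a linear portfolio loss w^T L."""
    n = len(weights)
    mu = sum(w * m for w, m in zip(weights, mean_losses))
    variance = sum(weights[i] * cov[i][j] * weights[j] for i in range(n) for j in range(n))
    return mu, math.sqrt(variance)


def normal_var_es(mu, sigma, beta=0.95):
    """VaR and expected shortfall at confidence level beta for a loss L ~ N(mu, sigma^2)."""
    std = NormalDist()  # standard normal N(0, 1)
    z = std.inv_cdf(beta)  # beta-quantile of N(0, 1)
    var = mu + sigma * z
    es = mu + sigma * std.pdf(z) / (1.0 - beta)  # mean loss beyond the VaR
    return var, es


# Hypothetical two-asset example (losses in currency units).
mu, sigma = portfolio_loss_moments([0.6, 0.4], [0.0, 0.0], [[1.0, 0.3], [0.3, 2.0]])
print(normal_var_es(mu, sigma, beta=0.99))
```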