Showing papers by "Aviv Tamar" published in 2012

Open Access

Proceedings Article

Policy Gradients with Variance Related Risk Criteria

Dotan Di Castro¹, Aviv Tamar¹, Shie Mannor¹

Technion – Israel Institute of Technology¹

26 Jun 2012

TL;DR: A framework for local policy-gradient-style reinforcement learning algorithms that optimize variance related risk criteria, that is, criteria involving both the expected cost and the variance of the cost.

Abstract: Managing risk in dynamic decision problems is of cardinal importance in many fields such as finance and process control. The most common approach to defining risk is through various variance related criteria such as the Sharpe Ratio or the standard deviation adjusted reward. It is known that optimizing many of the variance related risk criteria is NP-hard. In this paper we devise a framework for local policy gradient style algorithms for reinforcement learning for variance related criteria. Our starting point is a new formula for the variance of the cost-to-go in episodic tasks. Using this formula we develop policy gradient algorithms for criteria that involve both the expected cost and the variance of the cost. We prove the convergence of these algorithms to local minima and demonstrate their applicability in a portfolio planning problem.

125 citations

Posted Content

Policy Gradients with Variance Related Risk Criteria

Dotan Di Castro¹, Aviv Tamar¹, Shie Mannor¹

Technion – Israel Institute of Technology¹

27 Jun 2012 - arXiv: Learning

TL;DR: The authors devise a framework for local policy gradient style algorithms for reinforcement learning with variance related risk criteria, prove that these algorithms converge to local minima, and demonstrate their applicability in a portfolio planning problem.

Abstract: Managing risk in dynamic decision problems is of cardinal importance in many fields such as finance and process control. The most common approach to defining risk is through various variance related criteria such as the Sharpe Ratio or the standard deviation adjusted reward. It is known that optimizing many of the variance related risk criteria is NP-hard. In this paper we devise a framework for local policy gradient style algorithms for reinforcement learning for variance related criteria. Our starting point is a new formula for the variance of the cost-to-go in episodic tasks. Using this formula we develop policy gradient algorithms for criteria that involve both the expected cost and the variance of the cost. We prove the convergence of these algorithms to local minima and demonstrate their applicability in a portfolio planning problem.

102 citations

Journal Article

Integrating a partial model into model free reinforcement learning

Aviv Tamar¹, Dotan Di Castro¹, Ron Meir¹

Technion – Israel Institute of Technology¹

01 Jan 2012 - Journal of Machine Learning Research

TL;DR: This work proposes a novel procedure which augments a model free algorithm with a partial model, and proves that this approach leads to improved policy evaluation whenever environmental knowledge is available, without compromising performance when such knowledge is absent.

Abstract: In reinforcement learning an agent uses online feedback from the environment in order to adaptively select an effective policy. Model free approaches address this task by directly mapping environmental states to actions, while model based methods attempt to construct a model of the environment, followed by a selection of optimal actions based on that model. Given the complementary advantages of both approaches, we suggest a novel procedure which augments a model free algorithm with a partial model. The resulting hybrid algorithm switches between a model based and a model free mode, depending on the current state and the agent's knowledge. Our method relies on a novel definition for a partially known model, and an estimator that incorporates such knowledge in order to reduce uncertainty in stochastic approximation iterations. We prove that such an approach leads to improved policy evaluation whenever environmental knowledge is available, without compromising performance when such knowledge is absent. Numerical simulations demonstrate the effectiveness of the approach on policy gradient and Q-learning algorithms, and its usefulness in solving a call admission control problem.

9 citations