Lior Shani

Researcher at Technion – Israel Institute of Technology

Publications: 16
Citations: 264

Lior Shani is an academic researcher at Technion – Israel Institute of Technology whose work centers on computer science and reinforcement learning. He has an h-index of 6 and has co-authored 11 publications that have received 143 citations.

Papers
Journal Article

Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs

TL;DR: This work shows that the adaptive scaling mechanism used in TRPO is in fact the natural “RL version” of traditional trust-region methods from convex analysis, and proves fast convergence rates of Õ(1/N) for regularized MDPs, much like results in convex optimization.
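
To make the mirror-descent view concrete, here is a minimal tabular sketch of the kind of update the TL;DR describes: exact policy evaluation followed by an exponentiated-gradient (softmax) policy step. The function names and the step-size schedule eta are illustrative assumptions, not the paper's exact adaptive scaling.

    import numpy as np

    def evaluate_q(P, R, pi, gamma):
        # Exact policy evaluation on a tabular MDP.
        # P: (S, A, S) transitions, R: (S, A) rewards, pi: (S, A) policy.
        S, _ = R.shape
        r_pi = (pi * R).sum(axis=1)                # expected reward under pi
        P_pi = np.einsum('sap,sa->sp', P, pi)      # state-to-state kernel under pi
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
        return R + gamma * P @ V                   # Q(s, a), shape (S, A)

    def mirror_descent_po(P, R, gamma, iters=100):
        S, A = R.shape
        pi = np.full((S, A), 1.0 / A)              # start from the uniform policy
        for k in range(1, iters + 1):
            Q = evaluate_q(P, R, pi, gamma)
            eta = (1.0 - gamma) / k                # illustrative step-size schedule
            logits = np.log(pi) + eta * Q          # exponentiated-gradient step
            pi = np.exp(logits - logits.max(axis=1, keepdims=True))
            pi /= pi.sum(axis=1, keepdims=True)
        return pi
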
Proceedings Article

Optimistic Policy Optimization with Bandit Feedback

TL;DR: This paper considers model-based RL in the tabular finite-horizon MDP setting with unknown transitions and bandit feedback, and proposes an optimistic trust region policy optimization (TRPO) algorithm that achieves sublinear regret bounds for both stochastic and adversarial rewards.
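
As a rough illustration of the optimism mechanism (a sketch under assumed count-based UCB bonuses; the function name, constants, and step size are hypothetical, not the paper's algorithm): estimated values are inflated by an exploration bonus before the trust-region policy step.

    import numpy as np

    def optimistic_policy_step(pi, Q_hat, counts, H, eta=0.1, delta=0.1):
        # pi, Q_hat, counts: (S, A) arrays for one stage of a finite-horizon MDP;
        # H is the horizon and delta a confidence parameter.
        bonus = H * np.sqrt(np.log(1.0 / delta) / np.maximum(counts, 1))
        Q_opt = Q_hat + bonus                      # optimistic value estimate
        logits = np.log(pi) + eta * Q_opt          # mirror-descent policy update
        pi_new = np.exp(logits - logits.max(axis=1, keepdims=True))
        return pi_new / pi_new.sum(axis=1, keepdims=True)
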
Posted Content

Mirror Descent Policy Optimization

TL;DR: This work proposes deep reinforcement learning algorithms inspired by mirror descent, a well-known first-order trust-region method for constrained convex optimization, and shows that the theoretical MDPO framework can be scaled to deep RL while achieving good performance on popular benchmarks.
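
A minimal sketch of an on-policy MDPO-style surrogate loss, assuming PyTorch tensors of per-sample log-probabilities and advantages collected under the current policy pi_k; the function name and the fixed kl_coef are illustrative, and in practice the KL coefficient would follow an annealing schedule.

    import torch

    def mdpo_surrogate_loss(logp_new, logp_old, adv, kl_coef):
        # logp_new: log pi_theta(a|s); logp_old: log pi_k(a|s), detached;
        # adv: advantage estimates under pi_k; samples are drawn from pi_k.
        ratio = torch.exp(logp_new - logp_old)
        policy_term = ratio * adv                 # importance-weighted advantage
        kl_term = ratio * (logp_new - logp_old)   # sample estimate of KL(pi_theta || pi_k)
        return -(policy_term - kl_coef * kl_term).mean()  # negate to minimize
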
Proceedings Article

Exploration Conscious Reinforcement Learning Revisited

TL;DR: This work studies exploration-conscious criteria, which yield policies that are optimal with respect to the exploration mechanism, and shows that such criteria can be solved through a surrogate Markov Decision Process.
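
To make the surrogate-MDP idea concrete, here is a minimal sketch assuming epsilon-greedy exploration (the function name and iteration count are illustrative): the backup mixes the greedy value with the uniform-action value, so the fixed point is optimal for an agent that keeps exploring.

    import numpy as np

    def eps_greedy_surrogate_vi(P, R, gamma, eps, iters=500):
        # Value iteration on an epsilon-greedy surrogate MDP.
        # P: (S, A, S) transitions, R: (S, A) rewards.
        S, A = R.shape
        Q = np.zeros((S, A))
        for _ in range(iters):
            # Next-state value under an epsilon-greedy policy w.r.t. Q.
            V = (1.0 - eps) * Q.max(axis=1) + eps * Q.mean(axis=1)
            Q = R + gamma * P @ V
        return Q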