Shalabh Bhatnagar

Researcher at Indian Institute of Science

Publications - 308
Citations - 5153

Shalabh Bhatnagar is an academic researcher at the Indian Institute of Science. His research centers on stochastic approximation and Markov decision processes. He has an h-index of 30 and has co-authored 294 publications receiving 4300 citations. His previous affiliations include the University of Marne-la-Vallée and the Indian Institutes of Technology.

Papers
Proceedings ArticleDOI

Fast gradient-descent methods for temporal-difference learning with linear function approximation

TL;DR: In this paper, the authors introduce two new, related gradient-descent algorithms with better convergence rates, GTD2 and linear TD with gradient correction (TDC), both of which can be used for off-policy TD learning.
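For reference, a minimal sketch of a TDC-style update with linear features, assuming NumPy; the function name, step sizes, and defaults below are illustrative and not taken from the paper:

```python
import numpy as np

def tdc_update(theta, w, phi, phi_next, reward, gamma=0.99, alpha=0.01, beta=0.05):
    """One step of linear TD with gradient correction (TDC) -- a sketch.

    theta    : value-function weights, V(s) ~= theta . phi(s)
    w        : auxiliary weights tracking the expected TD error per feature
    phi      : feature vector of the current state, shape (d,)
    phi_next : feature vector of the next state, shape (d,)
    """
    delta = reward + gamma * (phi_next @ theta) - phi @ theta   # TD error
    # Main weights: the usual TD step plus a gradient-correction term
    theta = theta + alpha * (delta * phi - gamma * (phi @ w) * phi_next)
    # Auxiliary weights: least-mean-squares fit of the TD error onto the features
    w = w + beta * (delta - phi @ w) * phi
    return theta, w
```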
Journal ArticleDOI

Natural actor-critic algorithms

TL;DR: Four new reinforcement learning algorithms based on actor-critic, natural-gradient, and function-approximation ideas are presented together with their convergence proofs; these are the first convergence proofs and the first fully incremental algorithms of this kind.
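As a rough illustration of the natural-gradient actor-critic idea in the average-reward setting: the critic performs a TD update, compatible-feature weights estimate the natural gradient, and the actor moves along that estimate. The sketch below is a hedged approximation under those assumptions; the names, step sizes, and update order are illustrative and do not reproduce any of the four algorithms verbatim.

```python
import numpy as np

def nac_step(theta, v, w, avg_reward, s_feat, s_next_feat, psi, reward,
             alpha=0.001, beta=0.01, xi=0.01):
    """One incremental natural actor-critic step (illustrative sketch).

    theta      : policy (actor) parameters
    v          : critic weights for the state-value function
    w          : compatible-feature weights; track the natural-gradient direction
    avg_reward : running estimate of the average reward
    s_feat     : state features of the current state
    psi        : compatible features, grad_theta log pi(a | s)
    """
    avg_reward = avg_reward + xi * (reward - avg_reward)
    # Average-reward TD error drives the critic update
    delta = reward - avg_reward + s_next_feat @ v - s_feat @ v
    v = v + beta * delta * s_feat
    # Fit the advantage with compatible features; w estimates the natural gradient
    w = w + beta * (delta - psi @ w) * psi
    # Actor ascends along the natural-gradient estimate
    theta = theta + alpha * w
    return theta, v, w, avg_reward
```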
Journal ArticleDOI

Reinforcement Learning With Function Approximation for Traffic Signal Control

TL;DR: A reinforcement learning (RL) algorithm with function approximation for traffic signal control is proposed; it incorporates state-action features, is easily implementable in high-dimensional settings, and outperforms the other compared algorithms on all of the road-network settings considered.
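Since the controller rests on value-function approximation over state-action features, a generic sketch of Q-learning with linear function approximation is given below; the feature design, step size, and discount factor are placeholders and not the paper's exact algorithm.

```python
import numpy as np

def q_learning_fa_step(theta, phi_sa, phi_next_all, reward, gamma=0.9, alpha=0.01):
    """One step of Q-learning with linear function approximation (generic sketch).

    theta        : weights for Q(s, a) ~= theta . phi(s, a)
    phi_sa       : features of the (state, action) just taken, shape (d,)
    phi_next_all : features of every action in the next state, shape (num_actions, d)
    """
    q_next = phi_next_all @ theta                 # approximate Q-values in the next state
    target = reward + gamma * np.max(q_next)      # greedy bootstrap target
    theta = theta + alpha * (target - phi_sa @ theta) * phi_sa
    return theta
```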
Proceedings Article

Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation

TL;DR: This work presents a Bellman-error objective function and two gradient-descent TD algorithms that optimize it, and proves that both algorithms converge asymptotically, almost surely, to a locally optimal solution for any finite Markov decision process and any smooth value-function approximator.
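For context, the projected-Bellman-error objective that gradient-TD methods of this kind minimize is usually written as follows; the notation here is assumed rather than quoted from the paper:

$$\mathrm{MSPBE}(\theta) = \left\| V_\theta - \Pi T^{\pi} V_\theta \right\|_D^{2}$$

where $T^{\pi}$ is the Bellman operator for the evaluated policy, $D$ weights states by their visitation distribution, and $\Pi$ projects onto the space spanned by the approximator (for a nonlinear, smooth $V_\theta$, onto its tangent space).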
Proceedings Article

Toward Off-Policy Learning Control with Function Approximation

TL;DR: The Greedy-GQ algorithm is an extension of recent work on gradient temporal-difference learning to a control setting in which the target policy is greedy with respect to a linear approximation to the optimal action-value function.
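A hedged sketch of a Greedy-GQ-style update with a linear action-value approximation: the bootstrap target uses the action that is greedy under the current weights, and an auxiliary weight vector supplies the gradient-correction term. Names, shapes, and step sizes are illustrative, not the paper's notation.

```python
import numpy as np

def greedy_gq_step(theta, w, phi, phi_next_all, reward, gamma=0.99, alpha=0.01, beta=0.05):
    """One Greedy-GQ-style update (illustrative sketch).

    theta        : action-value weights, Q(s, a) ~= theta . phi(s, a)
    w            : auxiliary weights for the gradient-correction term
    phi          : features of the (state, action) actually taken, shape (d,)
    phi_next_all : features of every action in the next state, shape (num_actions, d)
    """
    q_next = phi_next_all @ theta
    a_star = int(np.argmax(q_next))         # target policy is greedy w.r.t. theta
    phi_bar = phi_next_all[a_star]          # features of the greedy next action
    delta = reward + gamma * q_next[a_star] - phi @ theta
    theta = theta + alpha * (delta * phi - gamma * (phi @ w) * phi_bar)
    w = w + beta * (delta - phi @ w) * phi
    return theta, w
```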