Shalabh Bhatnagar
Researcher at Indian Institute of Science
Publications - 308
Citations - 5153
Shalabh Bhatnagar is an academic researcher at the Indian Institute of Science. He has contributed to research on topics including stochastic approximation and Markov decision processes, has an h-index of 30, and has co-authored 294 publications receiving 4300 citations. His previous affiliations include the University of Marne-la-Vallée and the Indian Institutes of Technology.
Papers
Proceedings ArticleDOI
Fast gradient-descent methods for temporal-difference learning with linear function approximation
Richard S. Sutton, Hamid Reza Maei, Doina Precup, Shalabh Bhatnagar, David Silver, Csaba Szepesvári, Eric Wiewiora +6 more
TL;DR: The authors introduce two new related gradient-descent algorithms with better convergence rates than the earlier GTD method: GTD2 and linear TD with gradient correction (TDC), both of which can be used for off-policy temporal-difference learning with linear function approximation.
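For context, here is a minimal sketch of the TDC update on a single observed transition, assuming linear features and user-chosen step sizes alpha and beta; this is an illustrative sketch of the technique, not the authors' reference implementation.

```python
import numpy as np

def tdc_update(theta, w, phi, reward, phi_next, gamma=0.99, alpha=0.01, beta=0.1):
    """One TDC step on a single transition (phi -> phi_next, reward).

    theta: weights of the linear value estimate V(s) = theta @ phi(s)
    w:     auxiliary weights tracking the projected TD error
           (the "gradient correction" term)
    """
    delta = reward + gamma * theta @ phi_next - theta @ phi  # TD error
    # Main update: the usual TD step plus a correction along the next-state
    # features, so the expected update follows the gradient of the
    # mean-squared projected Bellman error.
    theta = theta + alpha * (delta * phi - gamma * (w @ phi) * phi_next)
    # Auxiliary update: regress the TD error onto the current features.
    w = w + beta * (delta - w @ phi) * phi
    return theta, w
```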
Journal ArticleDOI
Natural actor-critic algorithms
TL;DR: Four new reinforcement learning algorithms based on actor-critic, natural-gradient, and function-approximation ideas are presented; these are the first fully incremental natural actor-critic algorithms, and the first for which convergence proofs are provided.
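A hedged sketch of the central idea these algorithms share: with compatible features psi = grad log pi, the natural policy gradient coincides with the weights of a linear advantage estimator, so the actor can step directly along those weights. The function below is an illustrative composite under that assumption, not any one of the paper's four algorithms.

```python
import numpy as np

def natural_ac_step(theta, w, v, psi, phi, reward, phi_next,
                    gamma=0.99, alpha=0.001, beta=0.01):
    """One incremental natural-gradient actor-critic step (sketch).

    theta: policy parameters of pi_theta
    psi:   compatible features grad_theta log pi_theta(a | s) for the taken action
    w:     advantage weights; with compatible features, w estimates the
           natural policy gradient itself
    v:     critic weights for the baseline V(s) = v @ phi(s)
    """
    delta = reward + gamma * v @ phi_next - v @ phi  # TD error as advantage estimate
    v = v + beta * delta * phi                       # critic update
    w = w + beta * (delta - w @ psi) * psi           # regress advantage on compatible features
    theta = theta + alpha * w                        # actor steps along the natural gradient
    return theta, w, v
```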
Journal ArticleDOI
Reinforcement Learning With Function Approximation for Traffic Signal Control
L A Prashanth, Shalabh Bhatnagar +1 more
TL;DR: A reinforcement learning (RL) algorithm with function approximation is proposed for traffic signal control; it incorporates state-action features, is easily implementable in high-dimensional settings, and outperforms all other algorithms compared against on all the road network settings considered.
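A minimal sketch of the underlying technique, Q-learning with a linear approximation over state-action features; the feature map and the traffic-specific encoding mentioned in the docstring are assumptions for illustration, not the paper's exact design.

```python
import numpy as np

def q_learning_step(theta, feats, s, a, reward, s_next, actions,
                    gamma=0.9, alpha=0.05):
    """One Q-learning step with a linear estimate Q(s, a) = theta @ feats(s, a).

    `feats` is a user-supplied state-action feature map; in a traffic-signal
    setting it might encode coarse queue lengths and elapsed times per lane
    (a placeholder choice here).
    """
    q_next = max(theta @ feats(s_next, b) for b in actions)  # greedy backup value
    delta = reward + gamma * q_next - theta @ feats(s, a)    # TD error
    return theta + alpha * delta * feats(s, a)
```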
Proceedings Article
Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation
Shalabh Bhatnagar, Doina Precup, David Silver, Richard S. Sutton, Hamid Reza Maei, Csaba Szepesvári +5 more
TL;DR: This work presents a Bellman error objective function and two gradient-descent TD algorithms that optimize it, and proves the asymptotic almost-sure convergence of both algorithms, for any finite Markov decision process and any smooth value function approximator, to a locally optimal solution.
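For reference, the projected-Bellman-error objective behind such nonlinear gradient-TD methods is commonly written as below, where T is the Bellman operator, D weights states by their visitation distribution, and Pi_theta projects onto the tangent plane of the value-function manifold at theta; this is a standard formulation restated from memory, not quoted from the paper.

```latex
J(\theta) = \bigl\lVert \Pi_\theta \,\bigl(T V_\theta - V_\theta\bigr) \bigr\rVert_D^2,
\qquad
\Pi_\theta = \Phi_\theta \bigl(\Phi_\theta^\top D \,\Phi_\theta\bigr)^{-1} \Phi_\theta^\top D,
\qquad
\Phi_\theta = \nabla_\theta V_\theta .
```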
Proceedings Article
Toward Off-Policy Learning Control with Function Approximation
TL;DR: The Greedy-GQ algorithm is an extension of recent work on gradient temporal-difference learning to a control setting in which the target policy is greedy with respect to a linear approximation to the optimal action-value function.
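A minimal sketch of a Greedy-GQ step on one transition, assuming a linear action-value estimate Q(s, a) = theta @ feats(s, a) and a target policy that is greedy under the current estimate; variable names are illustrative, and the sub-gradient subtleties of the greedy max are glossed over.

```python
import numpy as np

def greedy_gq_update(theta, w, feats, s, a, reward, s_next, actions,
                     gamma=0.99, alpha=0.01, beta=0.1):
    """One Greedy-GQ step (sketch): gradient-TD control with a greedy target policy."""
    phi = feats(s, a)
    # Greedy action of the target policy under the current estimate.
    a_star = max(actions, key=lambda b: theta @ feats(s_next, b))
    phi_bar = feats(s_next, a_star)
    delta = reward + gamma * theta @ phi_bar - theta @ phi  # TD error toward greedy target
    # Main weights: TD step plus a gradient correction, as in TDC.
    theta = theta + alpha * (delta * phi - gamma * (w @ phi) * phi_bar)
    # Auxiliary weights regress the TD error onto the current features.
    w = w + beta * (delta - w @ phi) * phi
    return theta, w
```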