Shalabh Bhatnagar

Researcher at Indian Institute of Science

Publications - 308
Citations - 5153

Shalabh Bhatnagar is an academic researcher at the Indian Institute of Science. His research centers on stochastic approximation and Markov decision processes. He has an h-index of 30 and has co-authored 294 publications receiving 4300 citations. His previous affiliations include the University of Marne-la-Vallée and the Indian Institutes of Technology.

Papers
Proceedings ArticleDOI

Fast gradient-descent methods for temporal-difference learning with linear function approximation

TL;DR: In this paper, the authors introduce two new, related gradient-descent algorithms with better convergence rates, GTD2 and linear TD with gradient correction (TDC), both of which can be used for off-policy TD learning.
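For reference, a minimal sketch of a TDC-style update with linear features, assuming NumPy; the function name, step sizes, and defaults below are illustrative and not taken from the paper:

```python
import numpy as np

def tdc_update(theta, w, phi, phi_next, reward, gamma=0.99, alpha=0.01, beta=0.05):
    """One step of linear TD with gradient correction (TDC) -- a sketch.

    theta    : value-function weights, V(s) ~= theta . phi(s)
    w        : auxiliary weights tracking the expected TD error per feature
    phi      : feature vector of the current state, shape (d,)
    phi_next : feature vector of the next state, shape (d,)
    """
    delta = reward + gamma * (phi_next @ theta) - phi @ theta   # TD error
    # Main weights: the usual TD step plus a gradient-correction term
    theta = theta + alpha * (delta * phi - gamma * (phi @ w) * phi_next)
    # Auxiliary weights: least-mean-squares fit of the TD error onto the features
    w = w + beta * (delta - phi @ w) * phi
    return theta, w
```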
Journal ArticleDOI

Natural actor-critic algorithms

TL;DR: Four new reinforcement learning algorithms based on actor-critic, natural-gradient, and function-approximation ideas are presented together with their convergence proofs; these are the first convergence proofs and the first fully incremental algorithms of this kind.
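As a rough illustration of the natural-gradient actor-critic idea in the average-reward setting: the critic performs a TD update, compatible-feature weights estimate the natural gradient, and the actor moves along that estimate. The sketch below is a hedged approximation under those assumptions; the names, step sizes, and update order are illustrative and do not reproduce any of the four algorithms verbatim.

```python
import numpy as np

def nac_step(theta, v, w, avg_reward, s_feat, s_next_feat, psi, reward,
             alpha=0.001, beta=0.01, xi=0.01):
    """One incremental natural actor-critic step (illustrative sketch).

    theta      : policy (actor) parameters
    v          : critic weights for the state-value function
    w          : compatible-feature weights; track the natural-gradient direction
    avg_reward : running estimate of the average reward
    s_feat     : state features of the current state
    psi        : compatible features, grad_theta log pi(a | s)
    """
    avg_reward = avg_reward + xi * (reward - avg_reward)
    # Average-reward TD error drives the critic update
    delta = reward - avg_reward + s_next_feat @ v - s_feat @ v
    v = v + beta * delta * s_feat
    # Fit the advantage with compatible features; w estimates the natural gradient
    w = w + beta * (delta - psi @ w) * psi
    # Actor ascends along the natural-gradient estimate
    theta = theta + alpha * w
    return theta, v, w, avg_reward
```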
Journal ArticleDOI

Reinforcement Learning With Function Approximation for Traffic Signal Control

TL;DR: A reinforcement learning (RL) algorithm with function approximation for traffic signal control is proposed; it incorporates state-action features, is easily implementable in high-dimensional settings, and outperforms the other compared algorithms on all of the road-network settings considered.
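Since the controller rests on value-function approximation over state-action features, a generic sketch of Q-learning with linear function approximation is given below; the feature design, step size, and discount factor are placeholders and not the paper's exact algorithm.

```python
import numpy as np

def q_learning_fa_step(theta, phi_sa, phi_next_all, reward, gamma=0.9, alpha=0.01):
    """One step of Q-learning with linear function approximation (generic sketch).

    theta        : weights for Q(s, a) ~= theta . phi(s, a)
    phi_sa       : features of the (state, action) just taken, shape (d,)
    phi_next_all : features of every action in the next state, shape (num_actions, d)
    """
    q_next = phi_next_all @ theta                 # approximate Q-values in the next state
    target = reward + gamma * np.max(q_next)      # greedy bootstrap target
    theta = theta + alpha * (target - phi_sa @ theta) * phi_sa
    return theta
```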
Proceedings Article

Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation

TL;DR: This work presents a Bellman-error objective function and two gradient-descent TD algorithms that optimize it, and proves that both algorithms converge asymptotically, almost surely, to a locally optimal solution for any finite Markov decision process and any smooth value-function approximator.
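For context, the projected-Bellman-error objective that gradient-TD methods of this kind minimize is usually written as follows; the notation here is assumed rather than quoted from the paper:

$$\mathrm{MSPBE}(\theta) = \left\| V_\theta - \Pi T^{\pi} V_\theta \right\|_D^{2}$$

where $T^{\pi}$ is the Bellman operator for the evaluated policy, $D$ weights states by their visitation distribution, and $\Pi$ projects onto the space spanned by the approximator (for a nonlinear, smooth $V_\theta$, onto its tangent space).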
Proceedings Article

Toward Off-Policy Learning Control with Function Approximation

TL;DR: The Greedy-GQ algorithm is an extension of recent work on gradient temporal-difference learning to a control setting in which the target policy is greedy with respect to a linear approximation to the optimal action-value function.
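A hedged sketch of a Greedy-GQ-style update with a linear action-value approximation: the bootstrap target uses the action that is greedy under the current weights, and an auxiliary weight vector supplies the gradient-correction term. Names, shapes, and step sizes are illustrative, not the paper's notation.

```python
import numpy as np

def greedy_gq_step(theta, w, phi, phi_next_all, reward, gamma=0.99, alpha=0.01, beta=0.05):
    """One Greedy-GQ-style update (illustrative sketch).

    theta        : action-value weights, Q(s, a) ~= theta . phi(s, a)
    w            : auxiliary weights for the gradient-correction term
    phi          : features of the (state, action) actually taken, shape (d,)
    phi_next_all : features of every action in the next state, shape (num_actions, d)
    """
    q_next = phi_next_all @ theta
    a_star = int(np.argmax(q_next))         # target policy is greedy w.r.t. theta
    phi_bar = phi_next_all[a_star]          # features of the greedy next action
    delta = reward + gamma * q_next[a_star] - phi @ theta
    theta = theta + alpha * (delta * phi - gamma * (phi @ w) * phi_bar)
    w = w + beta * (delta - phi @ w) * phi
    return theta, w
```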