Shalabh Bhatnagar
Researcher at Indian Institute of Science
Publications - 308
Citations - 5153
Shalabh Bhatnagar is an academic researcher at the Indian Institute of Science. He has contributed to research on topics including stochastic approximation and Markov decision processes, has an h-index of 30, and has co-authored 294 publications receiving 4,300 citations. His previous affiliations include the University of Marne-la-Vallée and the Indian Institutes of Technology.
Papers
Journal ArticleDOI
Q-Learning Based Energy Management Policies for a Single Sensor Node with Finite Buffer
TL;DR: The problem of finding optimal energy management policies in the presence of energy-harvesting sources, so as to maximize network performance, is considered in the discounted-cost Markov decision process framework, and two reinforcement learning algorithms are applied.
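As a hedged illustration of the discounted-cost MDP framing (not the paper's actual model), the sketch below runs tabular Q-learning on a toy single-sensor node: states are buffer occupancies, actions are idle/transmit, and dropped packets are penalized. The buffer size, arrival probability, and costs are all invented for illustration.

```python
import random

random.seed(0)

B = 3                      # buffer capacity (illustrative)
ACTIONS = (0, 1)           # 0 = idle, 1 = transmit
GAMMA, ALPHA, EPS = 0.95, 0.1, 0.1

# Q[buffer occupancy][action]
Q = [[0.0, 0.0] for _ in range(B + 1)]

def step(buf, action):
    """One slot of the toy node: optional transmit, then a random arrival."""
    reward = 0.0
    if action == 1 and buf > 0:
        buf -= 1
        reward += 1.0 - 0.3    # packet delivered, minus an energy cost
    if random.random() < 0.5:  # packet arrival
        if buf < B:
            buf += 1
        else:
            reward -= 1.0      # overflow: arriving packet is dropped
    return buf, reward

buf = 0
for _ in range(20000):
    # Epsilon-greedy action selection.
    if random.random() < EPS:
        a = random.choice(ACTIONS)
    else:
        a = max(ACTIONS, key=lambda x: Q[buf][x])
    nxt, r = step(buf, a)
    # Standard Q-learning update toward the one-step bootstrapped target.
    Q[buf][a] += ALPHA * (r + GAMMA * max(Q[nxt]) - Q[buf][a])
    buf = nxt

# With a full buffer, transmitting should look better than idling.
print(Q[B][1] > Q[B][0])
```

The learned policy prefers transmitting when the buffer is full, since idling there risks the overflow penalty.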
Journal ArticleDOI
A two timescale stochastic approximation scheme for simulation-based parametric optimization
TL;DR: In this paper, a two-timescale stochastic approximation scheme that uses coupled iterations is applied to simulation-based parametric optimization. As an alternative to traditional "infinitesimal perturbation analysis" schemes, it avoids the aggregation of data present in many other schemes.
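As a generic sketch of the coupled-iterations idea (not the paper's specific scheme): the fast iterate y tracks a quantity that depends on the slow iterate x, while x takes much smaller steps, using the current y as if the fast loop had already converged. The toy objective and constant step sizes below are invented for illustration; the theory uses diminishing step-size sequences with the slow-to-fast ratio going to zero.

```python
# Coupled two-timescale iterations: y moves on the fast timescale,
# x on the slow one (step sizes b << a), so x effectively "sees"
# the converged value y*(x) = x at each step.
a, b = 0.1, 0.01          # fast / slow step sizes (illustrative constants)
x, y = 0.0, 0.0
for _ in range(5000):
    y += a * (x - y)      # fast: track y*(x) = x
    x -= b * (y - 2.0)    # slow: gradient step on (y*(x) - 2)^2 / 2
print(round(x, 3), round(y, 3))  # → 2.0 2.0
```

Both iterates settle at 2.0: the slow recursion behaves as if y were already at its equilibrium y*(x) = x, so x follows the averaged dynamics toward the minimizer.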
Journal ArticleDOI
Threshold Tuning Using Stochastic Optimization for Graded Signal Control
L. A. Prashanth, Shalabh Bhatnagar, et al.
TL;DR: This paper presents an algorithm based on stochastic optimization to tune the thresholds associated with a traffic light control (TLC) algorithm for optimal performance, and proposes three novel TLC algorithms: a full-state Q-learning algorithm with state aggregation, a Q-learning algorithm with function approximation that involves an enhanced feature selection scheme, and a priority-based TLC scheme.
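Threshold tuning of this kind is often done with simultaneous perturbation stochastic approximation (SPSA), which estimates a gradient from just two noisy cost evaluations per iteration regardless of the number of thresholds. The sketch below tunes two hypothetical thresholds against an invented noisy objective; it illustrates the SPSA form under those assumptions, not the paper's exact algorithm.

```python
import random

random.seed(1)

def noisy_cost(theta):
    """Hypothetical noisy objective: true optimum at thresholds (5, 10)."""
    t1, t2 = theta
    return (t1 - 5.0) ** 2 + (t2 - 10.0) ** 2 + random.gauss(0.0, 0.1)

theta = [0.0, 0.0]
for k in range(2000):
    a_k = 0.5 / (k + 10) ** 0.602   # gain sequences of the standard SPSA form
    c_k = 0.1 / (k + 1) ** 0.101
    # Rademacher perturbation: every coordinate moved simultaneously.
    delta = [random.choice((-1.0, 1.0)) for _ in theta]
    plus  = [t + c_k * d for t, d in zip(theta, delta)]
    minus = [t - c_k * d for t, d in zip(theta, delta)]
    diff = noisy_cost(plus) - noisy_cost(minus)
    # The same two evaluations feed every coordinate of the gradient estimate.
    theta = [t - a_k * diff / (2.0 * c_k * d) for t, d in zip(theta, delta)]

print([round(t, 1) for t in theta])  # near the optimum (5, 10)
```

The appeal for threshold tuning is that each iteration needs only two simulations of the controlled system, however many thresholds there are.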
Posted Content
Memory-based Deep Reinforcement Learning for Obstacle Avoidance in UAV with Limited Environment Knowledge
TL;DR: The crucial idea in the method is partial observability: the UAV retains relevant information about the environment structure in order to make better future navigation decisions. The method has a high inference rate and reduces power wastage by minimizing oscillatory motion of the UAV.
Journal ArticleDOI
Two Time-Scale Stochastic Approximation with Controlled Markov Noise and Off-Policy Temporal-Difference Learning
TL;DR: An asymptotic convergence analysis of two time-scale stochastic approximation driven by "controlled" Markov noise is presented, along with a solution to the off-policy convergence problem for temporal-difference learning with linear function approximation.
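The off-policy fix referred to here belongs to the family of gradient-TD methods, which pair the value weights with a second, faster-moving auxiliary weight vector, giving exactly a two time-scale recursion. As a hedged sketch, here is a generic TDC-style update on an invented two-state chain, run on-policy with one-hot features so the true values are easy to check; this is an illustration of the update form, not the paper's exact algorithm.

```python
# TDC-style (gradient-TD) sketch on a tiny deterministic chain:
# state 0 -> state 1 (reward 0) -> terminal (reward 1), gamma = 0.9.
# With one-hot features the TD fixed point is the true value function:
# V(1) = 1.0 and V(0) = 0.9.
GAMMA = 0.9
ALPHA, BETA = 0.05, 0.1      # slow (theta) and fast (w) step sizes

phi = {0: [1.0, 0.0], 1: [0.0, 1.0], "T": [0.0, 0.0]}  # terminal features zero
theta = [0.0, 0.0]           # value weights (slow time-scale)
w = [0.0, 0.0]               # auxiliary correction weights (fast time-scale)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

for _ in range(5000):
    for s, r, s2 in [(0, 0.0, 1), (1, 1.0, "T")]:
        f, f2 = phi[s], phi[s2]
        delta = r + GAMMA * dot(theta, f2) - dot(theta, f)
        wf = dot(w, f)
        for i in range(2):
            # Usual TD step plus a correction term that removes the bias
            # introduced by bootstrapping; w tracks the expected TD error.
            theta[i] += ALPHA * (delta * f[i] - GAMMA * wf * f2[i])
            w[i] += BETA * (delta - wf) * f[i]

print([round(t, 2) for t in theta])  # → [0.9, 1.0]
```

On this on-policy chain the correction term vanishes at the fixed point; its role is to keep the iteration stable in genuinely off-policy settings, where plain TD(0) with linear function approximation can diverge.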