
Shalabh Bhatnagar

Researcher at Indian Institute of Science

Publications: 308
Citations: 5153

Shalabh Bhatnagar is an academic researcher from the Indian Institute of Science. The author has contributed to research on topics including stochastic approximation and Markov decision processes. The author has an h-index of 30 and has co-authored 294 publications receiving 4300 citations. Previous affiliations of Shalabh Bhatnagar include the University of Marne-la-Vallée and the Indian Institutes of Technology.

Papers
Journal Article

Q-Learning Based Energy Management Policies for a Single Sensor Node with Finite Buffer

TL;DR: The problem of finding optimal energy management policies in the presence of energy harvesting sources, so as to maximize network performance, is considered in the discounted-cost Markov decision process framework, and two reinforcement learning algorithms are applied to solve it.
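
As a rough illustration of the setting, here is a minimal tabular Q-learning sketch for a discounted-cost MDP with a finite packet buffer and a finite energy store. The state discretization, arrival and harvesting model, and cost function below are assumptions made for illustration, not the paper's exact formulation.

```python
import numpy as np

# Hypothetical discretization of a sensor node with a finite buffer and an
# energy-harvesting battery; sizes, dynamics, and the cost are illustrative.
BUFFER_LEVELS = 10      # queued packets: 0..9
ENERGY_LEVELS = 10      # stored energy units: 0..9
ACTIONS = 4             # packets attempted per slot: 0..3
GAMMA = 0.9             # discount factor of the discounted-cost MDP
ALPHA = 0.1             # learning rate (step size)

rng = np.random.default_rng(0)
Q = np.zeros((BUFFER_LEVELS, ENERGY_LEVELS, ACTIONS))

def step(buf, eng, a):
    """Toy environment: transmit up to `a` packets, limited by buffer and
    energy; random arrivals and harvested energy; cost = remaining backlog."""
    sent = min(a, buf, eng)
    arrivals = rng.integers(0, 3)
    harvest = rng.integers(0, 3)
    buf2 = min(BUFFER_LEVELS - 1, buf - sent + arrivals)
    eng2 = min(ENERGY_LEVELS - 1, eng - sent + harvest)
    return buf2, eng2, float(buf2)

buf, eng = 0, ENERGY_LEVELS - 1
for t in range(50_000):
    # epsilon-greedy exploration over the cost-minimizing action
    a = rng.integers(ACTIONS) if rng.random() < 0.1 else int(np.argmin(Q[buf, eng]))
    buf2, eng2, cost = step(buf, eng, a)
    # Q-learning update for discounted cost (min in place of the usual max)
    target = cost + GAMMA * Q[buf2, eng2].min()
    Q[buf, eng, a] += ALPHA * (target - Q[buf, eng, a])
    buf, eng = buf2, eng2

print("Greedy action at (buffer=5, energy=5):", int(np.argmin(Q[5, 5])))
```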
Journal Article

A two timescale stochastic approximation scheme for simulation-based parametric optimization

TL;DR: In this paper, a two-timescale stochastic approximation scheme with coupled iterations is used for simulation-based parametric optimization. As an alternative to traditional "infinitesimal perturbation analysis" schemes, it avoids the aggregation of data present in many other schemes.
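
As an informal sketch of coupled two-timescale iterates, the code below averages a noisy simultaneous-perturbation (SPSA-style) gradient estimate on the faster timescale while the parameter is updated on the slower one. The toy objective, step-size sequences, and perturbation scheme are assumptions for illustration, not the exact scheme analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def noisy_cost(theta):
    """Simulation oracle: a noisy sample of J(theta) = ||theta - 2||^2."""
    return float(np.sum((theta - 2.0) ** 2) + rng.normal(scale=0.5))

theta = np.zeros(2)   # slow iterate: the parameter being optimized
g = np.zeros(2)       # fast iterate: running average of the gradient estimate
delta = 0.1           # width of the two-sided random perturbation

for n in range(1, 20_001):
    a_n = 1.0 / (n + 100)        # slow step size
    b_n = 1.0 / (n ** 0.6 + 1)   # fast step size; decays more slowly than a_n
    d = rng.choice([-1.0, 1.0], size=theta.shape)  # random perturbation direction

    # Fast timescale: average the two-sided gradient estimate from two simulations.
    sample = (noisy_cost(theta + delta * d) - noisy_cost(theta - delta * d)) / (2.0 * delta) * d
    g += b_n * (sample - g)

    # Slow timescale: gradient descent along the averaged estimate.
    theta -= a_n * g

print("theta after two-timescale optimization:", np.round(theta, 2))
```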
Journal Article

Threshold Tuning Using Stochastic Optimization for Graded Signal Control

TL;DR: This paper presents an algorithm based on stochastic optimization to tune the thresholds associated with a traffic light control (TLC) algorithm for optimal performance, and proposes three novel TLC algorithms: a full-state Q-learning algorithm with state aggregation, a Q-learning algorithm with function approximation that uses an enhanced feature-selection scheme, and a priority-based TLC scheme.
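
To illustrate the general idea of grading lane queues against thresholds and handing the grades to Q-learning with linear function approximation, a small sketch follows. The lane count, threshold values, cost, and toy traffic dynamics are illustrative assumptions rather than the paper's model, and the thresholds here are fixed rather than tuned.

```python
import numpy as np

rng = np.random.default_rng(2)

N_LANES = 4
THRESHOLDS = (5, 15)          # graded congestion levels: low / medium / high
ACTIONS = 2                   # which pair of opposing lanes gets green
GAMMA, ALPHA, EPS = 0.9, 0.05, 0.1

def features(queues, action):
    """One-hot congestion grade (low/med/high) per lane, per action."""
    grades = np.digitize(queues, THRESHOLDS)        # 0, 1, or 2 for each lane
    phi = np.zeros((N_LANES, 3))
    phi[np.arange(N_LANES), grades] = 1.0
    out = np.zeros(ACTIONS * N_LANES * 3)
    out[action * N_LANES * 3:(action + 1) * N_LANES * 3] = phi.ravel()
    return out

def step(queues, action):
    """Toy dynamics: green lanes discharge, all lanes get random arrivals."""
    green = [0, 1] if action == 0 else [2, 3]
    q = queues.copy()
    q[green] = np.maximum(0, q[green] - 4)          # departures on green
    q += rng.poisson(1.0, size=N_LANES)             # arrivals everywhere
    return q, float(q.sum())                        # cost: total queue length

w = np.zeros(ACTIONS * N_LANES * 3)                 # linear Q-function weights
queues = np.zeros(N_LANES, dtype=int)
for t in range(20_000):
    qvals = [w @ features(queues, a) for a in range(ACTIONS)]
    a = rng.integers(ACTIONS) if rng.random() < EPS else int(np.argmin(qvals))
    queues2, cost = step(queues, a)
    target = cost + GAMMA * min(w @ features(queues2, b) for b in range(ACTIONS))
    phi = features(queues, a)
    w += ALPHA * (target - w @ phi) * phi           # semi-gradient Q-learning update
    queues = queues2
```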
Posted Content

Memory-based Deep Reinforcement Learning for Obstacle Avoidance in UAV with Limited Environment Knowledge

TL;DR: The crucial idea in the method is the concept of partial observability and how a UAV can retain relevant information about the environment's structure to make better future navigation decisions. The method has a high inference rate and reduces power wastage by minimizing oscillatory motion of the UAV.
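
The role of memory under partial observability can be sketched with a small recurrent Q-network: an LSTM carries a summary of recent observations, so the chosen action can account for obstacles that have left the field of view. The observation size, action set, and architecture below (in PyTorch) are assumptions for illustration, not the paper's network.

```python
import torch
import torch.nn as nn

class RecurrentQNet(nn.Module):
    """Q-network with an LSTM memory over a history of observations."""
    def __init__(self, obs_dim=64, hidden_dim=128, n_actions=3):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden_dim), nn.ReLU())
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.q_head = nn.Linear(hidden_dim, n_actions)   # one Q-value per action

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, obs_dim); `hidden` carries memory across calls
        x = self.encoder(obs_seq)
        x, hidden = self.lstm(x, hidden)
        return self.q_head(x), hidden

net = RecurrentQNet()
obs = torch.randn(1, 10, 64)             # one 10-step observation history
q_values, memory = net(obs)
action = q_values[:, -1].argmax(dim=-1)  # act greedily on the latest time step
print(action.item())
```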
Journal Article

Two Time-Scale Stochastic Approximation with Controlled Markov Noise and Off-Policy Temporal-Difference Learning

TL;DR: An asymptotic convergence analysis of two time-scale stochastic approximation driven by "controlled" Markov noise is presented, along with a solution to the off-policy convergence problem for temporal-difference learning with linear function approximation.
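
The off-policy setting addressed here is the one handled by gradient-TD methods such as TDC (Sutton et al., 2009), which use exactly this kind of two-timescale coupling: an auxiliary weight vector adapted on the faster timescale corrects the slower value-weight update. The sketch below runs TDC-style updates on a toy chain; the features, behaviour and target policies, and step sizes are chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

N_STATES, N_FEATURES, GAMMA = 5, 3, 0.9
PHI = rng.random((N_STATES, N_FEATURES))   # fixed random features per state

theta = np.zeros(N_FEATURES)   # slow iterate: value-function weights
w = np.zeros(N_FEATURES)       # fast iterate: auxiliary correction weights
alpha, beta = 0.01, 0.05       # beta > alpha: w moves on the faster timescale

s = 0
for t in range(100_000):
    # Behaviour policy: uniform over {left, right}; target policy prefers right.
    a = rng.integers(2)
    rho = (0.9 if a == 1 else 0.1) / 0.5   # importance ratio pi(a|s) / mu(a|s)
    s_next = min(N_STATES - 1, s + 1) if a == 1 else max(0, s - 1)
    r = 1.0 if s_next == N_STATES - 1 else 0.0

    phi, phi_next = PHI[s], PHI[s_next]
    delta = r + GAMMA * theta @ phi_next - theta @ phi   # TD error
    # TDC-style coupled updates, weighted by the importance ratio:
    theta += alpha * rho * (delta * phi - GAMMA * (w @ phi) * phi_next)
    w += beta * rho * (delta - w @ phi) * phi
    s = s_next

print("learned value estimates per state:", np.round(PHI @ theta, 2))
```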