Shalabh Bhatnagar
Researcher at Indian Institute of Science
Publications - 308
Citations - 5153
Shalabh Bhatnagar is an academic researcher at the Indian Institute of Science. He has contributed to research on topics including stochastic approximation and Markov decision processes, has an h-index of 30, and has co-authored 294 publications receiving 4,300 citations. His previous affiliations include the University of Marne-la-Vallée and the Indian Institutes of Technology.
Papers
Journal ArticleDOI
Q-Learning Based Energy Management Policies for a Single Sensor Node with Finite Buffer
TL;DR: The problem of finding optimal energy management policies in the presence of energy-harvesting sources, so as to maximize network performance, is considered in the discounted-cost Markov decision process framework, and two reinforcement learning algorithms are applied.
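As a hedged illustration of the discounted-cost MDP framing (not the paper's actual model), the sketch below runs tabular Q-learning on a toy single-sensor node: states are buffer occupancies, actions are idle/transmit, and dropped packets are penalized. The buffer size, arrival probability, and costs are all invented for illustration.

```python
import random

random.seed(0)

B = 3                      # buffer capacity (illustrative)
ACTIONS = (0, 1)           # 0 = idle, 1 = transmit
GAMMA, ALPHA, EPS = 0.95, 0.1, 0.1

# Q[buffer occupancy][action]
Q = [[0.0, 0.0] for _ in range(B + 1)]

def step(buf, action):
    """One slot of the toy node: optional transmit, then a random arrival."""
    reward = 0.0
    if action == 1 and buf > 0:
        buf -= 1
        reward += 1.0 - 0.3    # packet delivered, minus an energy cost
    if random.random() < 0.5:  # packet arrival
        if buf < B:
            buf += 1
        else:
            reward -= 1.0      # overflow: arriving packet is dropped
    return buf, reward

buf = 0
for _ in range(20000):
    # Epsilon-greedy action selection.
    if random.random() < EPS:
        a = random.choice(ACTIONS)
    else:
        a = max(ACTIONS, key=lambda x: Q[buf][x])
    nxt, r = step(buf, a)
    # Standard Q-learning update toward the one-step bootstrapped target.
    Q[buf][a] += ALPHA * (r + GAMMA * max(Q[nxt]) - Q[buf][a])
    buf = nxt

# With a full buffer, transmitting should look better than idling.
print(Q[B][1] > Q[B][0])
```

The learned policy prefers transmitting when the buffer is full, since idling there risks the overflow penalty.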
Journal ArticleDOI
A two timescale stochastic approximation scheme for simulation-based parametric optimization
TL;DR: In this paper, a two-timescale stochastic approximation scheme that uses coupled iterations is applied to simulation-based parametric optimization. As an alternative to traditional "infinitesimal perturbation analysis" schemes, it avoids the aggregation of data present in many other schemes.
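As a generic sketch of the coupled-iterations idea (not the paper's specific scheme): the fast iterate y tracks a quantity that depends on the slow iterate x, while x takes much smaller steps, using the current y as if the fast loop had already converged. The toy objective and constant step sizes below are invented for illustration; the theory uses diminishing step-size sequences with the slow-to-fast ratio going to zero.

```python
# Coupled two-timescale iterations: y moves on the fast timescale,
# x on the slow one (step sizes b << a), so x effectively "sees"
# the converged value y*(x) = x at each step.
a, b = 0.1, 0.01          # fast / slow step sizes (illustrative constants)
x, y = 0.0, 0.0
for _ in range(5000):
    y += a * (x - y)      # fast: track y*(x) = x
    x -= b * (y - 2.0)    # slow: gradient step on (y*(x) - 2)^2 / 2
print(round(x, 3), round(y, 3))  # → 2.0 2.0
```

Both iterates settle at 2.0: the slow recursion behaves as if y were already at its equilibrium y*(x) = x, so x follows the averaged dynamics toward the minimizer.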
Journal ArticleDOI
Threshold Tuning Using Stochastic Optimization for Graded Signal Control
L. A. Prashanth, Shalabh Bhatnagar, et al.
TL;DR: This paper presents an algorithm based on stochastic optimization to tune the thresholds associated with a traffic light control (TLC) algorithm for optimal performance, and proposes three novel TLC algorithms: a full-state Q-learning algorithm with state aggregation, a Q-learning algorithm with function approximation that involves an enhanced feature selection scheme, and a priority-based TLC scheme.
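Threshold tuning of this kind is often done with simultaneous perturbation stochastic approximation (SPSA), which estimates a gradient from just two noisy cost evaluations per iteration regardless of the number of thresholds. The sketch below tunes two hypothetical thresholds against an invented noisy objective; it illustrates the SPSA form under those assumptions, not the paper's exact algorithm.

```python
import random

random.seed(1)

def noisy_cost(theta):
    """Hypothetical noisy objective: true optimum at thresholds (5, 10)."""
    t1, t2 = theta
    return (t1 - 5.0) ** 2 + (t2 - 10.0) ** 2 + random.gauss(0.0, 0.1)

theta = [0.0, 0.0]
for k in range(2000):
    a_k = 0.5 / (k + 10) ** 0.602   # gain sequences of the standard SPSA form
    c_k = 0.1 / (k + 1) ** 0.101
    # Rademacher perturbation: every coordinate moved simultaneously.
    delta = [random.choice((-1.0, 1.0)) for _ in theta]
    plus  = [t + c_k * d for t, d in zip(theta, delta)]
    minus = [t - c_k * d for t, d in zip(theta, delta)]
    diff = noisy_cost(plus) - noisy_cost(minus)
    # The same two evaluations feed every coordinate of the gradient estimate.
    theta = [t - a_k * diff / (2.0 * c_k * d) for t, d in zip(theta, delta)]

print([round(t, 1) for t in theta])  # near the optimum (5, 10)
```

The appeal for threshold tuning is that each iteration needs only two simulations of the controlled system, however many thresholds there are.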
Posted Content
Memory-based Deep Reinforcement Learning for Obstacle Avoidance in UAV with Limited Environment Knowledge
TL;DR: The crucial idea in the method is partial observability: the UAV retains relevant information about the environment structure in order to make better future navigation decisions. The method has a high inference rate and reduces power wastage by minimizing oscillatory motion of the UAV.
Journal ArticleDOI
Two Time-Scale Stochastic Approximation with Controlled Markov Noise and Off-Policy Temporal-Difference Learning
TL;DR: An asymptotic convergence analysis of two time-scale stochastic approximation driven by "controlled" Markov noise is presented, along with a solution to the off-policy convergence problem for temporal-difference learning with linear function approximation.
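The off-policy fix referred to here belongs to the family of gradient-TD methods, which pair the value weights with a second, faster-moving auxiliary weight vector, giving exactly a two time-scale recursion. As a hedged sketch, here is a generic TDC-style update on an invented two-state chain, run on-policy with one-hot features so the true values are easy to check; this is an illustration of the update form, not the paper's exact algorithm.

```python
# TDC-style (gradient-TD) sketch on a tiny deterministic chain:
# state 0 -> state 1 (reward 0) -> terminal (reward 1), gamma = 0.9.
# With one-hot features the TD fixed point is the true value function:
# V(1) = 1.0 and V(0) = 0.9.
GAMMA = 0.9
ALPHA, BETA = 0.05, 0.1      # slow (theta) and fast (w) step sizes

phi = {0: [1.0, 0.0], 1: [0.0, 1.0], "T": [0.0, 0.0]}  # terminal features zero
theta = [0.0, 0.0]           # value weights (slow time-scale)
w = [0.0, 0.0]               # auxiliary correction weights (fast time-scale)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

for _ in range(5000):
    for s, r, s2 in [(0, 0.0, 1), (1, 1.0, "T")]:
        f, f2 = phi[s], phi[s2]
        delta = r + GAMMA * dot(theta, f2) - dot(theta, f)
        wf = dot(w, f)
        for i in range(2):
            # Usual TD step plus a correction term that removes the bias
            # introduced by bootstrapping; w tracks the expected TD error.
            theta[i] += ALPHA * (delta * f[i] - GAMMA * wf * f2[i])
            w[i] += BETA * (delta - wf) * f[i]

print([round(t, 2) for t in theta])  # → [0.9, 1.0]
```

On this on-policy chain the correction term vanishes at the fixed point; its role is to keep the iteration stable in genuinely off-policy settings, where plain TD(0) with linear function approximation can diverge.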