Journal ArticleDOI

Optimal adaptive policies for Markov decision processes

Apostolos Burnetas, +1 more
- 01 Feb 1997
- Vol. 22, Iss. 1, pp. 222-255
TLDR
This paper gives the explicit form for a class of adaptive policies that possess optimal increase rate properties for the total expected finite horizon reward, under sufficient assumptions of finite state-action spaces and irreducibility of the transition law.
Abstract
In this paper we consider the problem of adaptive control for Markov Decision Processes. We give the explicit form for a class of adaptive policies that possess optimal increase rate properties for the total expected finite horizon reward, under sufficient assumptions of finite state-action spaces and irreducibility of the transition law. A main feature of the proposed policies is that the choice of actions, at each state and time period, is based on indices that are inflations of the right-hand side of the estimated average reward optimality equations.
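As a rough illustration of the index idea described in the abstract (act greedily with respect to inflated estimates of the right-hand side of the average-reward optimality equations), here is a schematic sketch. The toy MDP, the sqrt-log bonus, and the relative-value-iteration subroutine are illustrative assumptions, not the paper's exact inflation terms.

```python
import numpy as np

# Schematic sketch of an index-based adaptive policy for a finite MDP:
# estimate the model from observed transitions, approximate the bias vector h
# of the estimated average-reward optimality equations, then act greedily on
# inflated indices r_hat + p_hat @ h + bonus.  The bonus form is an
# illustrative stand-in for the paper's inflation term.

rng = np.random.default_rng(0)
S, A = 2, 2                                    # toy finite state-action space
P = np.array([[[0.9, 0.1], [0.2, 0.8]],        # true transitions P[s, a, s']
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0], [0.0, 2.0]])         # true mean rewards R[s, a]

counts = np.zeros((S, A, S))                   # observed transition counts
reward_sum = np.zeros((S, A))                  # accumulated rewards per (s, a)

def relative_value_iteration(p, r, iters=200):
    """Approximate the bias vector h of the average-reward optimality eqs."""
    h = np.zeros(S)
    for _ in range(iters):
        q = r + p @ h                          # q[s, a] = r + sum_s' p h(s')
        h_new = q.max(axis=1)
        h = h_new - h_new[0]                   # normalize: fix h(0) = 0
    return h

def choose_action(s, t):
    n_sa = counts.sum(axis=2, keepdims=True)   # visits to each (state, action)
    p_hat = np.where(n_sa > 0, counts / np.maximum(n_sa, 1), 1.0 / S)
    r_hat = reward_sum / np.maximum(counts.sum(axis=2), 1)
    h = relative_value_iteration(p_hat, r_hat)
    visits = counts[s].sum(axis=1)
    bonus = np.sqrt(2.0 * np.log(t + 2) / np.maximum(visits, 1))  # inflation
    return int(np.argmax(r_hat[s] + p_hat[s] @ h + bonus))

s = 0
for t in range(500):
    a = choose_action(s, t)
    s_next = rng.choice(S, p=P[s, a])
    counts[s, a, s_next] += 1
    reward_sum[s, a] += R[s, a] + 0.1 * rng.standard_normal()
    s = s_next

print(counts.sum())  # → 500.0 (observed transitions)
```

The key structural feature matches the abstract: the action index at each state and time is the estimated right-hand side of the optimality equation, inflated by a term that shrinks as the state-action pair is visited more often.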


Citations
Book

Regret Analysis of Stochastic and Nonstochastic Multi-Armed Bandit Problems

TL;DR: In this book, the authors survey regret analysis for multi-armed bandit problems, where regret measures the cumulative loss from not always playing the best arm; the central tension is between exploiting the option that gave the highest payoff in the past and exploring options that might pay off better in the future.
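The exploit/explore balance described above can be made concrete with a minimal UCB1 sketch, one standard algorithm from this literature: each round plays the arm maximizing empirical mean plus a confidence bonus, and regret is measured against always playing the best arm. The arm means and horizon here are illustrative.

```python
import numpy as np

# Minimal UCB1 sketch: the confidence bonus shrinks for well-sampled arms
# (exploitation) and stays large for rarely-pulled ones (exploration).

rng = np.random.default_rng(1)
means = np.array([0.3, 0.5, 0.7])              # true Bernoulli arm means
K, T = len(means), 5000
pulls = np.zeros(K)                            # times each arm was played
wins = np.zeros(K)                             # total reward per arm

reward_total = 0.0
for t in range(T):
    if t < K:
        arm = t                                # initialize: play each arm once
    else:
        ucb = wins / pulls + np.sqrt(2 * np.log(t) / pulls)
        arm = int(np.argmax(ucb))              # optimistic index
    r = float(rng.random() < means[arm])
    pulls[arm] += 1
    wins[arm] += r
    reward_total += r

regret = T * means.max() - reward_total        # loss vs. always-best-arm
print(pulls, round(regret, 1))
```

Over time the bonus concentrates play on the best arm, so regret grows only logarithmically in T rather than linearly.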
Journal Article

Near-optimal Regret Bounds for Reinforcement Learning

TL;DR: For undiscounted reinforcement learning in Markov decision processes (MDPs), the authors present a reinforcement learning algorithm with total regret Õ(DS√AT) after T steps, for any unknown MDP with S states, A actions per state, and diameter D.
Proceedings Article

Deep exploration via bootstrapped DQN

TL;DR: Bootstrapped DQN, as discussed by the authors, combines deep exploration with deep neural networks through randomized value functions, enabling exponentially faster learning than any dithering strategy and offering a promising approach to efficient exploration with generalization.
Proceedings Article

Minimax regret bounds for reinforcement learning

TL;DR: The problem of provably optimal exploration in reinforcement learning for finite horizon MDPs is considered, and an optimistic modification to value iteration achieves a regret bound of $\tilde{O}(\sqrt{HSAT} + H^2S^2A + H\sqrt{T})$, where $H$ is the time horizon, $S$ the number of states, $A$ the number of actions, and $T$ the number of time-steps.
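The "optimistic modification to value iteration" idea in the TL;DR above can be sketched as backward induction on an estimated model with an exploration bonus added at each step, so the computed values upper-bound the true optimal values with high probability. The Hoeffding-style bonus, known rewards, and uniform transition estimates here are all illustrative assumptions.

```python
import numpy as np

# Sketch of optimistic finite-horizon value iteration: add a per-step bonus
# to the estimated Bellman backup, then clip at the maximum achievable value.

H, S, A = 3, 2, 2                              # horizon, states, actions
p_hat = np.full((S, A, S), 1.0 / S)            # estimated transitions (toy)
r = np.array([[1.0, 0.0], [0.0, 1.0]])         # known mean rewards r[s, a]
n = np.ones((S, A))                            # visit counts (assumed >= 1)
delta = 0.1                                    # confidence parameter

Q = np.zeros((H + 1, S, A))
V = np.zeros((H + 1, S))
for h in range(H - 1, -1, -1):
    bonus = H * np.sqrt(np.log(S * A * H / delta) / (2 * n))  # optimism
    Q[h] = np.minimum(r + p_hat @ V[h + 1] + bonus, H)        # clip at H
    V[h] = Q[h].max(axis=1)

print(V[0])  # → [3. 3.]  (all values clipped at H under this large bonus)
```

With so few (assumed) visits the bonus dominates and every value is clipped at H; as counts grow, the bonus shrinks and the backup approaches ordinary value iteration on the estimated model.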
References
Book

Large Deviations Techniques and Applications

Amir Dembo, +1 more
TL;DR: The book presents the LDP for abstract empirical measures and its applications, including the finite-dimensional case and applications of empirical-measure LDPs.
Journal ArticleDOI

Some aspects of the sequential design of experiments

TL;DR: The author proposes a theory of the sequential design of experiments in which the size and composition of the samples are not fixed in advance but are functions of the observations themselves.
BookDOI

Entropy, large deviations, and statistical mechanics

TL;DR: In this book, the author develops the theory of large deviations, starting from random variables with a finite state space before treating more general settings, and applies it to statistical mechanics.
Book

Applied Linear Algebra

TL;DR: In this book, the authors introduce geometric vectors and vector spaces, linear transformations, and matrix algebra, with applications to solving linear systems and computing matrix inverses.