Journal ArticleDOI
Optimal adaptive policies for Markov decision processes
TLDR
This paper gives the explicit form for a class of adaptive policies that possess optimal increase-rate properties for the total expected finite-horizon reward, under sufficient assumptions of finite state-action spaces and irreducibility of the transition law.
Abstract
In this paper we consider the problem of adaptive control for Markov Decision Processes. We give the explicit form for a class of adaptive policies that possess optimal increase-rate properties for the total expected finite-horizon reward, under sufficient assumptions of finite state-action spaces and irreducibility of the transition law. A main feature of the proposed policies is that the choice of actions, at each state and time period, is based on indices that are inflations of the right-hand side of the estimated average-reward optimality equations.
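The index-based action choice described in the abstract can be illustrated with a minimal sketch. All names here are hypothetical: the paper's indices inflate the right-hand side of the estimated average-reward optimality equations, whereas this sketch uses a generic visit-count inflation term for illustration only.

```python
import math

def inflated_index(q_est, visits, t, c=2.0):
    """Hypothetical index: estimated value plus an inflation term that
    shrinks as the state-action pair is visited more often."""
    if visits == 0:
        return float("inf")  # force at least one trial of each action
    return q_est + c * math.sqrt(math.log(t) / visits)

def choose_action(state, q_est, counts, t):
    """Pick the action with the largest inflated index at `state`.

    q_est[state][a] is the estimated value of action a;
    counts[state][a] is how often (state, a) has been tried by time t.
    """
    return max(q_est[state],
               key=lambda a: inflated_index(q_est[state][a], counts[state][a], t))
```

Under-sampled actions receive large indices and are tried first; as counts grow, the inflation vanishes and the estimated-value term dominates, which is the mechanism behind the optimal growth-rate guarantees.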
Citations
Book
Regret Analysis of Stochastic and Nonstochastic Multi-Armed Bandit Problems
TL;DR: In this article, the authors focus on regret analysis in the context of multi-armed bandit problems, where the central tension is between staying with the option that gave the highest payoff in the past and exploring new options that might give higher payoffs in the future.
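The exploration-exploitation balance this TL;DR describes is commonly resolved by index policies such as UCB1; the following is a minimal sketch (notation is illustrative, not taken from the book):

```python
import math

def ucb1(rewards_sum, pulls, t):
    """UCB1 index for one arm: empirical mean plus an exploration bonus
    that shrinks as the arm accumulates pulls."""
    if pulls == 0:
        return float("inf")  # untried arms are pulled first
    return rewards_sum / pulls + math.sqrt(2 * math.log(t) / pulls)

def select_arm(stats, t):
    """stats: list of (rewards_sum, pulls) per arm; returns the arm index
    with the largest UCB1 value at round t."""
    return max(range(len(stats)), key=lambda i: ucb1(stats[i][0], stats[i][1], t))
```

The bonus term guarantees every arm is sampled logarithmically often, which is what yields the logarithmic regret bounds analyzed in this literature.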
Journal Article
Near-optimal Regret Bounds for Reinforcement Learning
TL;DR: For undiscounted reinforcement learning in Markov decision processes (MDPs), this paper presented a reinforcement learning algorithm with total regret O(DS√AT) after T steps for any unknown MDP with S states, A actions per state, and diameter D.
Proceedings Article
Deep exploration via bootstrapped DQN
TL;DR: Bootstrapped DQN, as discussed by the authors, combines deep exploration with deep neural networks, yielding exponentially faster learning than any dithering strategy and offering a promising approach to efficient exploration with generalization.
Proceedings Article
Minimax regret bounds for reinforcement learning
TL;DR: The problem of provably optimal exploration in reinforcement learning for finite-horizon MDPs is considered, and an optimistic modification to value iteration achieves a regret bound of $\tilde{O}(\sqrt{HSAT} + H^2S^2A + H\sqrt{T})$, where $H$ is the time horizon, $S$ the number of states, $A$ the number of actions, and $T$ the number of time steps.
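An optimistic modification to value iteration of the kind this TL;DR describes adds an exploration bonus at every backup and clips values at the maximum achievable return. Below is a hedged sketch assuming tabular estimates; `P_hat`, `R_hat`, `counts`, and the bonus shape are all illustrative, not the paper's exact construction.

```python
import math

def optimistic_backup(H, P_hat, R_hat, counts, t):
    """One optimistic value-iteration pass over a finite-horizon MDP.

    P_hat[s][a]: estimated next-state distribution (dict state -> prob),
    R_hat[s][a]: estimated mean reward, counts[s][a]: visit count.
    Returns Q[h][s][a] with an exploration bonus added at each backup.
    """
    states = list(P_hat.keys())
    V = {s: 0.0 for s in states}   # value beyond the horizon is zero
    Q = []
    for h in range(H - 1, -1, -1):  # backward induction over steps
        Qh = {}
        for s in states:
            Qh[s] = {}
            for a in P_hat[s]:
                n = max(counts[s][a], 1)
                bonus = H * math.sqrt(math.log(t) / n)  # hypothetical bonus shape
                ev = sum(p * V[s2] for s2, p in P_hat[s][a].items())
                # optimism plus clipping at the largest possible return
                Qh[s][a] = min(R_hat[s][a] + ev + bonus, H - h)
        V = {s: max(Qh[s].values()) for s in states}
        Q.append(Qh)
    Q.reverse()  # Q[0] corresponds to the first step
    return Q
```

Acting greedily with respect to these optimistic Q-values drives the agent toward under-visited state-action pairs, which is the mechanism behind the stated regret bound.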
Posted Content
Deep Exploration via Bootstrapped DQN
TL;DR: Bootstrapped DQN, as mentioned in this paper, is a simple algorithm that explores in a computationally and statistically efficient manner through the use of randomized value functions, which can lead to exponentially faster learning.
References
Book
Large Deviations Techniques and Applications
Amir Dembo, Ofer Zeitouni +1 more
TL;DR: The LDP for abstract empirical measures and its applications, the finite-dimensional case, and applications of the empirical-measures LDP are presented.
Journal ArticleDOI
Asymptotically efficient adaptive allocation rules
Tze Leung Lai, Herbert Robbins +1 more
Journal ArticleDOI
Some aspects of the sequential design of experiments
TL;DR: The authors propose a theory of sequential design of experiments, in which the size and composition of the samples are not fixed in advance but are functions of the observations themselves, a major advance over fixed-sample designs.
BookDOI
Entropy, large deviations, and statistical mechanics
TL;DR: In this paper, the authors introduce the concept of large deviations for random variables with a finite state space, generalizing the notion of large deviations for random vectors.
Book
Applied Linear Algebra
TL;DR: In this article, the authors introduce geometric vectors and vector spaces, as well as linear transformations and matrix algebra, and apply them to solving equations and finding inverses.