Open Access Proceedings Article

Learning with Good Feature Representations in Bandits and in RL with a Generative Model

About
This article was published at the International Conference on Machine Learning on 2020-07-12 and is currently open access. It has received 108 citations to date. The article focuses on the topics: Feature (computer vision) & Generative model.


Citations
Posted Content

Provably Efficient Reinforcement Learning with Linear Function Approximation

TL;DR: This paper proves that an optimistic modification of Least-Squares Value Iteration (LSVI) achieves a regret bound depending only on d (the ambient dimension of the feature space), H (the length of each episode), and T (the total number of steps), independent of the number of states and actions.
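The optimistic least-squares update behind this result can be illustrated as a single ridge regression plus an elliptical exploration bonus. This is a rough sketch of the idea, not the paper's implementation; all names here (`optimistic_q`, `beta`, `lam`) are our own.

```python
import numpy as np

def optimistic_q(phi_hist, targets, phi_query, beta, lam=1.0):
    """One optimistic least-squares value-iteration step (illustrative).

    phi_hist:  (n, d) features of past state-action pairs
    targets:   (n,)   regression targets, e.g. r + max_a Q_next
    phi_query: (m, d) features at which to evaluate the optimistic Q
    beta:      bonus multiplier; lam: ridge regularizer
    """
    d = phi_hist.shape[1]
    Lam = lam * np.eye(d) + phi_hist.T @ phi_hist    # regularized Gram matrix
    w = np.linalg.solve(Lam, phi_hist.T @ targets)   # ridge estimate of Q-weights
    Lam_inv = np.linalg.inv(Lam)
    # elliptical bonus beta * ||phi||_{Lam^{-1}} encourages exploration
    bonus = beta * np.sqrt(np.einsum('ij,jk,ik->i', phi_query, Lam_inv, phi_query))
    return phi_query @ w + bonus                     # optimistic Q-values
```

With clean data from a linear model and `beta=0` this reduces to plain ridge regression; a positive `beta` shifts every estimate upward, which is what makes the value iterates optimistic.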
Proceedings Article

Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning

TL;DR: This article shows that even if the agent has a highly accurate linear representation, it may still need to sample an exponential number of trajectories in order to find a near-optimal policy.
Posted Content

Provably Efficient Exploration in Policy Optimization

TL;DR: This paper proves that, for episodic Markov decision processes with linear function approximation, unknown transitions, and adversarial rewards with full-information feedback, the proposed OPPO algorithm achieves a sublinear regret bound.
Proceedings Article

Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles

TL;DR: This work characterizes the minimax rates for contextual bandits with general, potentially nonparametric function classes, and provides the first universal and optimal reduction from contextual bandits to online regression, requiring no distributional assumptions beyond realizability.
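The core step of such a regression-oracle reduction can be illustrated by inverse-gap weighting, which turns the oracle's reward predictions into an action distribution. A minimal sketch, with hypothetical names:

```python
import numpy as np

def inverse_gap_weighting(predictions, gamma):
    """Convert reward predictions from a regression oracle into a
    sampling distribution over actions (illustrative sketch).

    predictions: array of predicted rewards, one per action
    gamma: exploration parameter; larger gamma exploits the leader more
    """
    predictions = np.asarray(predictions, dtype=float)
    k = len(predictions)
    best = int(np.argmax(predictions))
    gaps = predictions[best] - predictions      # non-negative prediction gaps
    probs = 1.0 / (k + gamma * gaps)            # worse arms get smaller mass
    probs[best] = 0.0
    probs[best] = 1.0 - probs.sum()             # remaining mass on the leader
    return probs
```

Every action keeps probability at least 1/(k + gamma * gap), so the learner never stops exploring, while most mass concentrates on the predicted best arm as gamma grows.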
Posted Content

Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov Decision Processes

TL;DR: This work proposes a new Bernstein-type concentration inequality for self-normalized martingales in linear bandit problems with bounded noise, along with UCRL-VTR, a new computationally efficient algorithm with linear function approximation for linear mixture MDPs in the episodic undiscounted setting.
References
Book

Algorithms for Reinforcement Learning

TL;DR: This book focuses on reinforcement learning algorithms that build on the powerful theory of dynamic programming, gives a fairly comprehensive catalog of learning problems, and describes the core ideas, followed by a discussion of their theoretical properties and limitations.
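The dynamic-programming backbone these algorithms build on can be illustrated with classic value iteration on a finite MDP (a generic sketch, not code from the book):

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Dynamic-programming value iteration on a finite MDP.

    P: (A, S, S) transition tensor, P[a, s, t] = prob of s -> t under action a
    R: (A, S) expected immediate reward for taking action a in state s
    Returns the optimal values and a greedy policy.
    """
    A, S, _ = P.shape
    V = np.zeros(S)
    while True:
        Q = R + gamma * np.einsum('ast,t->as', P, V)  # Bellman backup
        V_new = Q.max(axis=0)
        if np.abs(V_new - V).max() < tol:
            return V_new, Q.argmax(axis=0)            # values, greedy policy
        V = V_new
```

Because the Bellman operator is a gamma-contraction, the loop converges geometrically regardless of the starting values.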
Journal Article

Logarithmic regret algorithms for online convex optimization

TL;DR: Several algorithms achieving logarithmic regret are proposed; besides being more general, they are also much more efficient to implement, and they give rise to an efficient algorithm based on the Newton method for optimization, a new tool in the field.
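For strongly convex losses, even plain online gradient descent with step sizes decaying as 1/(alpha*t) attains logarithmic regret. A minimal sketch, assuming alpha-strong convexity of every round's loss (names and interface are our own):

```python
import numpy as np

def ogd_strongly_convex(grad, project, x0, alpha, T):
    """Online gradient descent with step sizes 1/(alpha * t), which
    achieves O(log T) regret when each loss is alpha-strongly convex.

    grad(t, x): gradient of the round-t loss at point x
    project(x): Euclidean projection back onto the feasible set
    """
    x = np.asarray(x0, dtype=float)
    iterates = [x.copy()]
    for t in range(1, T + 1):
        x = project(x - grad(t, x) / (alpha * t))  # decaying step size
        iterates.append(x.copy())
    return iterates
```

On a fixed quadratic loss f(x) = ||x - c||^2 / 2 (alpha = 1) the iterates converge to the minimizer c, consistent with vanishing average regret.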
Journal Article

The Equivalence of Two Extremum Problems

TL;DR: In this article, the authors consider the problem of defining probability measures with finite support, i.e., measures that assign probability one to a set consisting of a finite number of points.
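The equivalence this theorem establishes, between D-optimal and G-optimal experimental design over a finite set of points, can be observed numerically with a simple multiplicative-weights iteration (an illustrative sketch, not from the article):

```python
import numpy as np

def d_optimal_design(X, iters=2000):
    """Multiplicative iteration for the D-optimal design problem.
    By the Kiefer-Wolfowitz equivalence theorem the resulting design is
    also G-optimal: max_i x_i^T M(lam)^{-1} x_i approaches d.

    X: (n, d) array whose rows are the candidate design points.
    Returns design weights lam, a probability vector over the rows.
    """
    n, d = X.shape
    lam = np.full(n, 1.0 / n)                # start from the uniform design
    for _ in range(iters):
        M = X.T @ (lam[:, None] * X)         # information matrix M(lam)
        # leverage scores x_i^T M(lam)^{-1} x_i
        g = np.einsum('ij,jk,ik->i', X, np.linalg.inv(M), X)
        lam *= g / d                         # multiplicative update
        lam /= lam.sum()                     # guard against numerical drift
    return lam
```

At convergence the maximum leverage score equals d, the dimension, which is exactly the equivalence of the two extremum problems in the title.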
Journal Article

Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems

TL;DR: A framework based on learning confidence intervals around the value function or the Q-function and eliminating actions that are not optimal (with high probability) is described, and model-based and model-free variants of the elimination method are provided.
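A minimal sketch of such an elimination scheme for the multi-armed bandit case, using Hoeffding confidence intervals (function names and constants are our own illustration, not the paper's):

```python
import numpy as np

def action_elimination(pull, n_arms, delta=0.05, max_rounds=2000):
    """Successive elimination: sample every surviving arm each round and
    drop any arm whose upper confidence bound falls below the best
    arm's lower confidence bound.

    pull(a): returns one stochastic reward in [0, 1] for arm a.
    Returns the list of arms still considered possibly optimal.
    """
    active = list(range(n_arms))
    sums = np.zeros(n_arms)
    for t in range(1, max_rounds + 1):
        for a in active:
            sums[a] += pull(a)
        means = sums[active] / t             # every active arm has t samples
        # Hoeffding radius with a union bound over arms and rounds
        rad = np.sqrt(np.log(4 * n_arms * t**2 / delta) / (2 * t))
        best_lcb = means.max() - rad
        active = [a for a, m in zip(active, means) if m + rad >= best_lcb]
        if len(active) == 1:
            break                            # stopping condition reached
    return active
```

The stopping condition here is the simplest one: halt once a single arm survives, which happens (with probability at least 1 - delta) only when that arm is near-optimal.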