Open Access Proceedings Article

Learning with Good Feature Representations in Bandits and in RL with a Generative Model

About
This article was published at the International Conference on Machine Learning on 2020-07-12 and is currently open access. It has received 108 citations to date. The article focuses on the topics: Feature (computer vision) & Generative model.


Citations
Posted Content

Provably Efficient Reinforcement Learning with Linear Function Approximation

TL;DR: This paper proves that an optimistic modification of Least-Squares Value Iteration (LSVI) achieves a regret bound depending only on d (the ambient dimension of the feature space), H (the length of each episode), and T (the total number of steps), independent of the number of states and actions.
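The optimistic least-squares update behind this result can be illustrated as a single ridge regression plus an elliptical exploration bonus. This is a rough sketch of the idea, not the paper's implementation; all names here (`optimistic_q`, `beta`, `lam`) are our own.

```python
import numpy as np

def optimistic_q(phi_hist, targets, phi_query, beta, lam=1.0):
    """One optimistic least-squares value-iteration step (illustrative).

    phi_hist:  (n, d) features of past state-action pairs
    targets:   (n,)   regression targets, e.g. r + max_a Q_next
    phi_query: (m, d) features at which to evaluate the optimistic Q
    beta:      bonus multiplier; lam: ridge regularizer
    """
    d = phi_hist.shape[1]
    Lam = lam * np.eye(d) + phi_hist.T @ phi_hist    # regularized Gram matrix
    w = np.linalg.solve(Lam, phi_hist.T @ targets)   # ridge estimate of Q-weights
    Lam_inv = np.linalg.inv(Lam)
    # elliptical bonus beta * ||phi||_{Lam^{-1}} encourages exploration
    bonus = beta * np.sqrt(np.einsum('ij,jk,ik->i', phi_query, Lam_inv, phi_query))
    return phi_query @ w + bonus                     # optimistic Q-values
```

With clean data from a linear model and `beta=0` this reduces to plain ridge regression; a positive `beta` shifts every estimate upward, which is what makes the value iterates optimistic.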
Proceedings Article

Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning

TL;DR: This article shows that even if the agent has a highly accurate linear representation, it may still need to sample an exponential number of trajectories in order to find a near-optimal policy.
Posted Content

Provably Efficient Exploration in Policy Optimization

TL;DR: This paper proves that, for episodic Markov decision processes with linear function approximation, unknown transitions, and adversarial rewards with full-information feedback, the proposed OPPO algorithm achieves a sublinear regret bound.
Proceedings Article

Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles

TL;DR: This work characterizes the minimax rates for contextual bandits with general, potentially nonparametric function classes, and provides the first universal and optimal reduction from contextual bandits to online regression, requiring no distributional assumptions beyond realizability.
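The core step of such a regression-oracle reduction can be illustrated by inverse-gap weighting, which turns the oracle's reward predictions into an action distribution. A minimal sketch, with hypothetical names:

```python
import numpy as np

def inverse_gap_weighting(predictions, gamma):
    """Convert reward predictions from a regression oracle into a
    sampling distribution over actions (illustrative sketch).

    predictions: array of predicted rewards, one per action
    gamma: exploration parameter; larger gamma exploits the leader more
    """
    predictions = np.asarray(predictions, dtype=float)
    k = len(predictions)
    best = int(np.argmax(predictions))
    gaps = predictions[best] - predictions      # non-negative prediction gaps
    probs = 1.0 / (k + gamma * gaps)            # worse arms get smaller mass
    probs[best] = 0.0
    probs[best] = 1.0 - probs.sum()             # remaining mass on the leader
    return probs
```

Every action keeps probability at least 1/(k + gamma * gap), so the learner never stops exploring, while most mass concentrates on the predicted best arm as gamma grows.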
Posted Content

Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov Decision Processes

TL;DR: This work proposes a new Bernstein-type concentration inequality for self-normalized martingales in linear bandit problems with bounded noise, along with UCRL-VTR, a new computationally efficient algorithm with linear function approximation for linear mixture MDPs in the episodic undiscounted setting.
References
Book

Algorithms for Reinforcement Learning

TL;DR: This book focuses on reinforcement learning algorithms that build on the powerful theory of dynamic programming, gives a fairly comprehensive catalog of learning problems, and describes the core ideas, followed by a discussion of their theoretical properties and limitations.
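The dynamic-programming backbone these algorithms build on can be illustrated with classic value iteration on a finite MDP (a generic sketch, not code from the book):

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Dynamic-programming value iteration on a finite MDP.

    P: (A, S, S) transition tensor, P[a, s, t] = prob of s -> t under action a
    R: (A, S) expected immediate reward for taking action a in state s
    Returns the optimal values and a greedy policy.
    """
    A, S, _ = P.shape
    V = np.zeros(S)
    while True:
        Q = R + gamma * np.einsum('ast,t->as', P, V)  # Bellman backup
        V_new = Q.max(axis=0)
        if np.abs(V_new - V).max() < tol:
            return V_new, Q.argmax(axis=0)            # values, greedy policy
        V = V_new
```

Because the Bellman operator is a gamma-contraction, the loop converges geometrically regardless of the starting values.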
Journal Article

Logarithmic regret algorithms for online convex optimization

TL;DR: Several algorithms achieving logarithmic regret are proposed; besides being more general, they are also much more efficient to implement, and they give rise to an efficient algorithm based on the Newton method for optimization, a new tool in the field.
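For strongly convex losses, even plain online gradient descent with step sizes decaying as 1/(alpha*t) attains logarithmic regret. A minimal sketch, assuming alpha-strong convexity of every round's loss (names and interface are our own):

```python
import numpy as np

def ogd_strongly_convex(grad, project, x0, alpha, T):
    """Online gradient descent with step sizes 1/(alpha * t), which
    achieves O(log T) regret when each loss is alpha-strongly convex.

    grad(t, x): gradient of the round-t loss at point x
    project(x): Euclidean projection back onto the feasible set
    """
    x = np.asarray(x0, dtype=float)
    iterates = [x.copy()]
    for t in range(1, T + 1):
        x = project(x - grad(t, x) / (alpha * t))  # decaying step size
        iterates.append(x.copy())
    return iterates
```

On a fixed quadratic loss f(x) = ||x - c||^2 / 2 (alpha = 1) the iterates converge to the minimizer c, consistent with vanishing average regret.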
Journal Article

The Equivalence of Two Extremum Problems

TL;DR: In this article, the authors consider the problem of defining probability measures with finite support, i.e., measures that assign probability one to a set consisting of a finite number of points.
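The equivalence this theorem establishes, between D-optimal and G-optimal experimental design over a finite set of points, can be observed numerically with a simple multiplicative-weights iteration (an illustrative sketch, not from the article):

```python
import numpy as np

def d_optimal_design(X, iters=2000):
    """Multiplicative iteration for the D-optimal design problem.
    By the Kiefer-Wolfowitz equivalence theorem the resulting design is
    also G-optimal: max_i x_i^T M(lam)^{-1} x_i approaches d.

    X: (n, d) array whose rows are the candidate design points.
    Returns design weights lam, a probability vector over the rows.
    """
    n, d = X.shape
    lam = np.full(n, 1.0 / n)                # start from the uniform design
    for _ in range(iters):
        M = X.T @ (lam[:, None] * X)         # information matrix M(lam)
        # leverage scores x_i^T M(lam)^{-1} x_i
        g = np.einsum('ij,jk,ik->i', X, np.linalg.inv(M), X)
        lam *= g / d                         # multiplicative update
        lam /= lam.sum()                     # guard against numerical drift
    return lam
```

At convergence the maximum leverage score equals d, the dimension, which is exactly the equivalence of the two extremum problems in the title.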
Journal Article

Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems

TL;DR: A framework based on learning confidence intervals around the value function or the Q-function and eliminating actions that are not optimal (with high probability) is described, and model-based and model-free variants of the elimination method are provided.
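A minimal sketch of such an elimination scheme for the multi-armed bandit case, using Hoeffding confidence intervals (function names and constants are our own illustration, not the paper's):

```python
import numpy as np

def action_elimination(pull, n_arms, delta=0.05, max_rounds=2000):
    """Successive elimination: sample every surviving arm each round and
    drop any arm whose upper confidence bound falls below the best
    arm's lower confidence bound.

    pull(a): returns one stochastic reward in [0, 1] for arm a.
    Returns the list of arms still considered possibly optimal.
    """
    active = list(range(n_arms))
    sums = np.zeros(n_arms)
    for t in range(1, max_rounds + 1):
        for a in active:
            sums[a] += pull(a)
        means = sums[active] / t             # every active arm has t samples
        # Hoeffding radius with a union bound over arms and rounds
        rad = np.sqrt(np.log(4 * n_arms * t**2 / delta) / (2 * t))
        best_lcb = means.max() - rad
        active = [a for a, m in zip(active, means) if m + rad >= best_lcb]
        if len(active) == 1:
            break                            # stopping condition reached
    return active
```

The stopping condition here is the simplest one: halt once a single arm survives, which happens (with probability at least 1 - delta) only when that arm is near-optimal.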