Open Access · Proceedings Article

lil' UCB: An Optimal Exploration Algorithm for Multi-Armed Bandits

TLDR
A novel UCB procedure is proposed for identifying the arm with the largest mean in a multi-armed bandit game in the fixed confidence setting using a small number of total samples; the procedure is proved optimal up to constants, and simulations show that it provides superior performance with respect to the state of the art.
Abstract
The paper proposes a novel upper confidence bound (UCB) procedure for identifying the arm with the largest mean in a multi-armed bandit game in the fixed confidence setting using a small number of total samples. The procedure cannot be improved in the sense that the number of samples required to identify the best arm is within a constant factor of a lower bound based on the law of the iterated logarithm (LIL). Inspired by the LIL, we construct our confidence bounds to explicitly account for the infinite time horizon of the algorithm. In addition, by using a novel stopping time for the algorithm we avoid a union bound over the arms that has been observed in other UCB-type algorithms. We prove that the algorithm is optimal up to constants and also show through simulations that it provides superior performance with respect to the state-of-the-art.
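
The paper's pseudocode is not reproduced on this page, but the two ingredients the abstract names, a log-log ("lil") confidence width and a stopping rule that compares one arm's sample count against all the others to avoid a union bound, can be sketched. Below is a minimal Python sketch assuming sub-Gaussian rewards; the function names, the "+ 2" guard inside the inner logarithm, and the default constants (delta, lam, beta, epsilon, sigma) are illustrative choices, not the paper's tuned values.

```python
import math
import random


def lil_confidence(t, delta, epsilon=0.01, sigma=0.5):
    """LIL-style deviation term after t pulls of one arm.

    Mirrors the log-log shape of the paper's bound; the "+ 2" inside the
    inner logarithm is a guard so the term is defined at t = 1 (an
    implementation convenience, not from the paper).
    """
    inner = math.log((1 + epsilon) * t + 2) / delta
    return (1 + math.sqrt(epsilon)) * math.sqrt(
        2 * sigma ** 2 * (1 + epsilon) * math.log(inner) / t
    )


def lil_ucb(arms, delta=0.1, lam=9.0, beta=1.0, max_pulls=200_000):
    """Fixed-confidence best-arm identification, lil' UCB style.

    `arms` is a list of zero-argument callables returning stochastic
    rewards.  Sampling follows the largest upper confidence bound, and the
    run stops once one arm has at least 1 + lam * (pulls of all other arms),
    the union-bound-free stopping rule the abstract describes.
    """
    n = len(arms)
    counts = [0] * n
    sums = [0.0] * n

    def pull(i):
        sums[i] += arms[i]()
        counts[i] += 1

    for i in range(n):          # initialize: sample every arm once
        pull(i)

    for _ in range(max_pulls):
        total = sum(counts)
        for i in range(n):
            if counts[i] >= 1 + lam * (total - counts[i]):
                return i        # stopping rule fired: declare arm i best
        ucb = [
            sums[i] / counts[i] + (1 + beta) * lil_confidence(counts[i], delta)
            for i in range(n)
        ]
        pull(max(range(n), key=lambda i: ucb[i]))

    return max(range(n), key=lambda i: sums[i] / counts[i])  # budget fallback


# Tiny demo with Bernoulli arms whose means are 0.5, 0.6, 0.7.
if __name__ == "__main__":
    means = [0.5, 0.6, 0.7]
    arms = [lambda m=m: float(random.random() < m) for m in means]
    print("identified arm:", lil_ucb(arms))
```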


Citations
Journal Article

On the complexity of best-arm identification in multi-armed bandit models

TL;DR: This work introduces generic notions of complexity for the two dominant frameworks considered in the literature, the fixed-budget and fixed-confidence settings, and provides the first known distribution-dependent lower bound on the complexity that involves information-theoretic quantities and holds when m ≥ 1 under general assumptions.
Journal Article

Hyperband: a novel bandit-based approach to hyperparameter optimization

TL;DR: A novel algorithm is introduced, Hyperband, for hyperparameter optimization as a pure-exploration non-stochastic infinite-armed bandit problem where a predefined resource like iterations, data samples, or features is allocated to randomly sampled configurations.
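
The allocation idea behind this TL;DR is easiest to see in the successive-halving bracket that Hyperband runs repeatedly under different trade-offs between the number of configurations and the per-configuration budget. A minimal Python sketch under stated assumptions: sample_config() and evaluate(config, r) are caller-supplied placeholders, and the constants n, max_resource, and eta are illustrative, not the paper's interface.

```python
import math


def successive_halving(sample_config, evaluate, n=27, max_resource=81, eta=3):
    """One successive-halving bracket, the subroutine Hyperband repeats.

    sample_config() draws a random hyperparameter configuration and
    evaluate(config, r) returns a validation loss after training with
    resource budget r (epochs, data-subset size, ...).
    """
    configs = [sample_config() for _ in range(n)]
    r = max_resource * eta ** -round(math.log(n, eta))  # initial per-config budget
    while len(configs) > 1:
        scored = [(evaluate(c, int(r)), c) for c in configs]
        scored.sort(key=lambda pair: pair[0])           # ascending loss
        configs = [c for _, c in scored[: max(1, len(configs) // eta)]]
        r *= eta                                        # survivors get eta times more
    return configs[0]
```
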
Posted Content

Non-stochastic Best Arm Identification and Hyperparameter Optimization

TL;DR: This work casts hyperparameter optimization as an instance of non-stochastic best-arm identification, identifies a known algorithm that is well-suited for this setting, and empirically evaluates its behavior.
Posted Content

Efficient Hyperparameter Optimization and Infinitely Many Armed Bandits

TL;DR: This work introduces Hyperband for hyperparameter optimization as a pure-exploration non-stochastic infinitely many armed bandit problem where allocation of additional resources to an arm corresponds to training a configuration on larger subsets of the data.
Proceedings Article

Combinatorial Pure Exploration of Multi-Armed Bandits

TL;DR: This paper presents general learning algorithms which work for all decision classes that admit offline maximization oracles in both fixed confidence and fixed budget settings and establishes a general problem-dependent lower bound for the CPE problem.
References
Journal Article

Finite-time Analysis of the Multiarmed Bandit Problem

TL;DR: This work shows that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.
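
For contrast with the fixed-confidence procedures above, the index policy analyzed here has a one-line rule: play the arm maximizing the empirical mean plus a sqrt(2 ln t / T_i) exploration bonus. A textbook-style Python sketch, assuming rewards in [0, 1]; the function name and horizon parameter are mine, not the paper's.

```python
import math


def ucb1(arms, horizon=10_000):
    """Index policy for cumulative regret with a sqrt(2 ln t / T_i) bonus.

    `arms` is a list of zero-argument callables returning rewards in [0, 1].
    """
    n = len(arms)
    counts, sums = [0] * n, [0.0] * n
    for i in range(n):                      # play each arm once
        sums[i] += arms[i]()
        counts[i] += 1
    for t in range(n + 1, horizon + 1):
        i = max(
            range(n),
            key=lambda j: sums[j] / counts[j]
            + math.sqrt(2 * math.log(t) / counts[j]),
        )
        sums[i] += arms[i]()
        counts[i] += 1
    return max(range(n), key=lambda j: sums[j] / counts[j])
```
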
Proceedings Article

Improved Algorithms for Linear Stochastic Bandits

TL;DR: A simple modification of Auer's UCB algorithm achieves constant regret with high probability, and the regret bound is improved by a logarithmic factor in theory, while experiments show a vast improvement.
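
In the linear setting each action is a feature vector and rewards share one unknown parameter vector, so optimism scores an action by its estimated reward plus the width of a confidence ellipsoid. A numpy sketch in that spirit, with a constant beta standing in for the self-normalized confidence radius the paper actually derives; the function name, signature, and defaults are mine.

```python
import numpy as np


def linucb(actions, pull, horizon=2_000, reg=1.0, beta=2.0):
    """Optimism for linear bandits: estimated reward + ellipsoid width.

    `actions` is an (m, d) array of feature vectors and pull(x) returns a
    noisy reward with mean <theta_star, x>.
    """
    d = actions.shape[1]
    V = reg * np.eye(d)              # regularized design matrix
    b = np.zeros(d)                  # running sum of reward-weighted features
    theta_hat = np.zeros(d)
    for _ in range(horizon):
        V_inv = np.linalg.inv(V)
        theta_hat = V_inv @ b
        # x^T V^{-1} x for every action row: squared ellipsoid width
        widths = np.sqrt(np.einsum("ij,jk,ik->i", actions, V_inv, actions))
        x = actions[np.argmax(actions @ theta_hat + beta * widths)]
        r = pull(x)
        V += np.outer(x, x)
        b += r * x
    return theta_hat
```
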
Proceedings Article

Best Arm Identification in Multi-Armed Bandits

TL;DR: In this paper, the regret of a forecaster is defined as the gap between the mean reward of the optimal arm and that of the ultimately chosen arm, and this regret is shown to decrease exponentially at a rate which is, up to a logarithmic factor, the best possible.
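
The fixed-budget side of this paper is often summarized by its successive-rejects scheme: split the budget into K - 1 phases, sample every surviving arm equally within a phase, and drop the empirically worst arm at each phase's end. A Python sketch under stated assumptions; the phase-length schedule follows the usual 1/log-bar(K) form, but the constants and names here are illustrative rather than the paper's exact procedure.

```python
import math


def successive_rejects(arms, budget=10_000):
    """Fixed-budget best-arm identification by eliminating one arm per phase.

    `arms` is a list of zero-argument callables returning stochastic rewards;
    `budget` is the total number of pulls allowed.
    """
    k = len(arms)
    log_bar = 0.5 + sum(1.0 / i for i in range(2, k + 1))
    counts, sums = [0] * k, [0.0] * k
    alive = set(range(k))
    n_prev = 0
    for phase in range(1, k):
        # cumulative per-arm pull target for this phase
        n_phase = max(1, math.ceil((budget - k) / (log_bar * (k + 1 - phase))))
        for i in alive:
            for _ in range(n_phase - n_prev):
                sums[i] += arms[i]()
                counts[i] += 1
        n_prev = n_phase
        alive.remove(min(alive, key=lambda i: sums[i] / counts[i]))
    return alive.pop()
```
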
Book Chapter

Pure exploration in multi-armed bandits problems

TL;DR: The main result is that the exploration-exploitation trade-offs required to minimize simple regret and cumulative regret are qualitatively different, as shown by a general lower bound on the simple regret in terms of the cumulative regret.
Journal Article

The Sample Complexity of Exploration in the Multi-Armed Bandit Problem

TL;DR: This work considers the multi-armed bandit problem under the PAC (“probably approximately correct”) model and generalizes the lower bound to a Bayesian setting, and to the case where the statistics of the arms are known but the identities of the arms are not.