Open Access · Proceedings Article

lil' UCB: An Optimal Exploration Algorithm for Multi-Armed Bandits

TLDR
A novel UCB procedure is proposed for identifying the arm with the largest mean in a multi-armed bandit game in the fixed confidence setting using a small number of total samples; the procedure is proved optimal up to constants, and simulations show that it provides superior performance with respect to the state of the art.
Abstract
The paper proposes a novel upper confidence bound (UCB) procedure for identifying the arm with the largest mean in a multi-armed bandit game in the fixed confidence setting using a small number of total samples. The procedure cannot be improved in the sense that the number of samples required to identify the best arm is within a constant factor of a lower bound based on the law of the iterated logarithm (LIL). Inspired by the LIL, we construct our confidence bounds to explicitly account for the infinite time horizon of the algorithm. In addition, by using a novel stopping time for the algorithm we avoid a union bound over the arms that has been observed in other UCB-type algorithms. We prove that the algorithm is optimal up to constants and also show through simulations that it provides superior performance with respect to the state-of-the-art.
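
The paper's pseudocode is not reproduced on this page, but the two ingredients the abstract names, a log-log ("lil") confidence width and a stopping rule that compares one arm's sample count against all the others to avoid a union bound, can be sketched. Below is a minimal Python sketch assuming sub-Gaussian rewards; the function names, the "+ 2" guard inside the inner logarithm, and the default constants (delta, lam, beta, epsilon, sigma) are illustrative choices, not the paper's tuned values.

```python
import math
import random


def lil_confidence(t, delta, epsilon=0.01, sigma=0.5):
    """LIL-style deviation term after t pulls of one arm.

    Mirrors the log-log shape of the paper's bound; the "+ 2" inside the
    inner logarithm is a guard so the term is defined at t = 1 (an
    implementation convenience, not from the paper).
    """
    inner = math.log((1 + epsilon) * t + 2) / delta
    return (1 + math.sqrt(epsilon)) * math.sqrt(
        2 * sigma ** 2 * (1 + epsilon) * math.log(inner) / t
    )


def lil_ucb(arms, delta=0.1, lam=9.0, beta=1.0, max_pulls=200_000):
    """Fixed-confidence best-arm identification, lil' UCB style.

    `arms` is a list of zero-argument callables returning stochastic
    rewards.  Sampling follows the largest upper confidence bound, and the
    run stops once one arm has at least 1 + lam * (pulls of all other arms),
    the union-bound-free stopping rule the abstract describes.
    """
    n = len(arms)
    counts = [0] * n
    sums = [0.0] * n

    def pull(i):
        sums[i] += arms[i]()
        counts[i] += 1

    for i in range(n):          # initialize: sample every arm once
        pull(i)

    for _ in range(max_pulls):
        total = sum(counts)
        for i in range(n):
            if counts[i] >= 1 + lam * (total - counts[i]):
                return i        # stopping rule fired: declare arm i best
        ucb = [
            sums[i] / counts[i] + (1 + beta) * lil_confidence(counts[i], delta)
            for i in range(n)
        ]
        pull(max(range(n), key=lambda i: ucb[i]))

    return max(range(n), key=lambda i: sums[i] / counts[i])  # budget fallback


# Tiny demo with Bernoulli arms whose means are 0.5, 0.6, 0.7.
if __name__ == "__main__":
    means = [0.5, 0.6, 0.7]
    arms = [lambda m=m: float(random.random() < m) for m in means]
    print("identified arm:", lil_ucb(arms))
```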


Citations
Journal Article

On the complexity of best-arm identification in multi-armed bandit models

TL;DR: This work introduces generic notions of complexity for the two dominant frameworks considered in the literature, the fixed-budget and fixed-confidence settings, and provides the first known distribution-dependent lower bound on the complexity that involves information-theoretic quantities and holds when m ≥ 1 under general assumptions.
Journal Article

Hyperband: a novel bandit-based approach to hyperparameter optimization

TL;DR: A novel algorithm is introduced, Hyperband, for hyperparameter optimization as a pure-exploration non-stochastic infinite-armed bandit problem where a predefined resource like iterations, data samples, or features is allocated to randomly sampled configurations.
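
The allocation idea behind this TL;DR is easiest to see in the successive-halving bracket that Hyperband runs repeatedly under different trade-offs between the number of configurations and the per-configuration budget. A minimal Python sketch under stated assumptions: sample_config() and evaluate(config, r) are caller-supplied placeholders, and the constants n, max_resource, and eta are illustrative, not the paper's interface.

```python
import math


def successive_halving(sample_config, evaluate, n=27, max_resource=81, eta=3):
    """One successive-halving bracket, the subroutine Hyperband repeats.

    sample_config() draws a random hyperparameter configuration and
    evaluate(config, r) returns a validation loss after training with
    resource budget r (epochs, data-subset size, ...).
    """
    configs = [sample_config() for _ in range(n)]
    r = max_resource * eta ** -round(math.log(n, eta))  # initial per-config budget
    while len(configs) > 1:
        scored = [(evaluate(c, int(r)), c) for c in configs]
        scored.sort(key=lambda pair: pair[0])           # ascending loss
        configs = [c for _, c in scored[: max(1, len(configs) // eta)]]
        r *= eta                                        # survivors get eta times more
    return configs[0]
```
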
Posted Content

Non-stochastic Best Arm Identification and Hyperparameter Optimization

TL;DR: This work casts hyperparameter optimization as an instance of non-stochastic best-arm identification, identifies a known algorithm that is well-suited for this setting, and empirically evaluates its behavior.
Posted Content

Efficient Hyperparameter Optimization and Infinitely Many Armed Bandits

TL;DR: This work introduces Hyperband for hyperparameter optimization as a pure-exploration non-stochastic infinitely many armed bandit problem where allocation of additional resources to an arm corresponds to training a configuration on larger subsets of the data.
Proceedings Article

Combinatorial Pure Exploration of Multi-Armed Bandits

TL;DR: This paper presents general learning algorithms which work for all decision classes that admit offline maximization oracles in both fixed confidence and fixed budget settings and establishes a general problem-dependent lower bound for the CPE problem.
References
Journal Article

Finite-time Analysis of the Multiarmed Bandit Problem

TL;DR: This work shows that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.
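
For contrast with the fixed-confidence procedures above, the index policy analyzed here has a one-line rule: play the arm maximizing the empirical mean plus a sqrt(2 ln t / T_i) exploration bonus. A textbook-style Python sketch, assuming rewards in [0, 1]; the function name and horizon parameter are mine, not the paper's.

```python
import math


def ucb1(arms, horizon=10_000):
    """Index policy for cumulative regret with a sqrt(2 ln t / T_i) bonus.

    `arms` is a list of zero-argument callables returning rewards in [0, 1].
    """
    n = len(arms)
    counts, sums = [0] * n, [0.0] * n
    for i in range(n):                      # play each arm once
        sums[i] += arms[i]()
        counts[i] += 1
    for t in range(n + 1, horizon + 1):
        i = max(
            range(n),
            key=lambda j: sums[j] / counts[j]
            + math.sqrt(2 * math.log(t) / counts[j]),
        )
        sums[i] += arms[i]()
        counts[i] += 1
    return max(range(n), key=lambda j: sums[j] / counts[j])
```
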
Proceedings Article

Improved Algorithms for Linear Stochastic Bandits

TL;DR: A simple modification of Auer's UCB algorithm achieves constant regret with high probability, and the regret bound is improved by a logarithmic factor in theory, while experiments show a vast improvement.
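
In the linear setting each action is a feature vector and rewards share one unknown parameter vector, so optimism scores an action by its estimated reward plus the width of a confidence ellipsoid. A numpy sketch in that spirit, with a constant beta standing in for the self-normalized confidence radius the paper actually derives; the function name, signature, and defaults are mine.

```python
import numpy as np


def linucb(actions, pull, horizon=2_000, reg=1.0, beta=2.0):
    """Optimism for linear bandits: estimated reward + ellipsoid width.

    `actions` is an (m, d) array of feature vectors and pull(x) returns a
    noisy reward with mean <theta_star, x>.
    """
    d = actions.shape[1]
    V = reg * np.eye(d)              # regularized design matrix
    b = np.zeros(d)                  # running sum of reward-weighted features
    theta_hat = np.zeros(d)
    for _ in range(horizon):
        V_inv = np.linalg.inv(V)
        theta_hat = V_inv @ b
        # x^T V^{-1} x for every action row: squared ellipsoid width
        widths = np.sqrt(np.einsum("ij,jk,ik->i", actions, V_inv, actions))
        x = actions[np.argmax(actions @ theta_hat + beta * widths)]
        r = pull(x)
        V += np.outer(x, x)
        b += r * x
    return theta_hat
```
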
Proceedings Article

Best Arm Identification in Multi-Armed Bandits

TL;DR: In this paper, the regret of a forecaster is defined as the gap between the mean reward of the optimal arm and that of the ultimately chosen arm, and this regret is shown to decrease exponentially at a rate which is, up to a logarithmic factor, the best possible.
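
The fixed-budget side of this paper is often summarized by its successive-rejects scheme: split the budget into K - 1 phases, sample every surviving arm equally within a phase, and drop the empirically worst arm at each phase's end. A Python sketch under stated assumptions; the phase-length schedule follows the usual 1/log-bar(K) form, but the constants and names here are illustrative rather than the paper's exact procedure.

```python
import math


def successive_rejects(arms, budget=10_000):
    """Fixed-budget best-arm identification by eliminating one arm per phase.

    `arms` is a list of zero-argument callables returning stochastic rewards;
    `budget` is the total number of pulls allowed.
    """
    k = len(arms)
    log_bar = 0.5 + sum(1.0 / i for i in range(2, k + 1))
    counts, sums = [0] * k, [0.0] * k
    alive = set(range(k))
    n_prev = 0
    for phase in range(1, k):
        # cumulative per-arm pull target for this phase
        n_phase = max(1, math.ceil((budget - k) / (log_bar * (k + 1 - phase))))
        for i in alive:
            for _ in range(n_phase - n_prev):
                sums[i] += arms[i]()
                counts[i] += 1
        n_prev = n_phase
        alive.remove(min(alive, key=lambda i: sums[i] / counts[i]))
    return alive.pop()
```
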
Book Chapter

Pure exploration in multi-armed bandits problems

TL;DR: The main result is that the exploration-exploitation trade-offs required to minimize simple regret and cumulative regret are qualitatively different, as shown by a general lower bound on the simple regret in terms of the cumulative regret.
Journal Article

The Sample Complexity of Exploration in the Multi-Armed Bandit Problem

TL;DR: This work considers the multi-armed bandit problem under the PAC (“probably approximately correct”) model and generalizes the lower bound to a Bayesian setting, and to the case where the statistics of the arms are known but the identities of the arms are not.