Open Access · Proceedings Article
lil' UCB: An Optimal Exploration Algorithm for Multi-Armed Bandits
Kevin Jamieson, Matthew L. Malloy, Robert Nowak, Sébastien Bubeck
pp. 423–439
TL;DR
The paper proves that its UCB procedure for identifying the arm with the largest mean in a multi-armed bandit game, in the fixed-confidence setting and using a small number of total samples, is optimal up to constants, and shows through simulations that it outperforms the state of the art.
Abstract
The paper proposes a novel upper confidence bound (UCB) procedure for identifying the arm with the largest mean in a multi-armed bandit game in the fixed-confidence setting using a small number of total samples. The procedure cannot be improved in the sense that the number of samples required to identify the best arm is within a constant factor of a lower bound based on the law of the iterated logarithm (LIL). Inspired by the LIL, we construct our confidence bounds to explicitly account for the infinite time horizon of the algorithm. In addition, by using a novel stopping time for the algorithm we avoid a union bound over the arms that has been observed in other UCB-type algorithms. We prove that the algorithm is optimal up to constants and also show through simulations that it provides superior performance with respect to the state-of-the-art.
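The abstract describes the two ingredients of lil' UCB: a LIL-inspired anytime confidence bound, and a stopping rule that fires when one arm has received almost all the samples (avoiding a union bound over arms). A minimal sketch of that loop is below; the parameter defaults and the `+ 2` inside the iterated logarithm are illustrative implementation choices, not the tuned constants analyzed in the paper.

```python
import math

def lil_ucb(pull, n_arms, delta=0.05, eps=0.01, beta=1.0, lam=9.0,
            sigma2=0.25, max_pulls=100_000):
    """Sketch of the lil' UCB loop for fixed-confidence best-arm
    identification.  `pull(i)` returns a reward for arm i."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for i in range(n_arms):          # sample every arm once
        sums[i] += pull(i)
        counts[i] = 1

    def width(t):
        # LIL-inspired anytime confidence width after t pulls of an arm.
        # The "+ 2" keeps the iterated logarithm's argument positive for
        # small t (an implementation convenience, not from the paper).
        inner = math.log((1 + eps) * t + 2) / delta
        return (1 + math.sqrt(eps)) * math.sqrt(
            2 * sigma2 * (1 + eps) * math.log(inner) / t)

    total = n_arms
    while total < max_pulls:
        # Stopping rule: stop once one arm has soaked up almost all pulls.
        for i in range(n_arms):
            if counts[i] >= 1 + lam * (total - counts[i]):
                return i
        # Otherwise pull the arm with the largest upper confidence bound.
        i = max(range(n_arms),
                key=lambda j: sums[j] / counts[j]
                + (1 + beta) * width(counts[j]))
        sums[i] += pull(i)
        counts[i] += 1
        total += 1
    # Budget exhausted: fall back to the most-pulled arm.
    return max(range(n_arms), key=lambda j: counts[j])
```

With a large gap the pulls concentrate quickly; for instance `lil_ucb(lambda i: [0.9, 0.1][i], 2)` (deterministic rewards) returns arm 0.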
Citations
Journal Article
On the complexity of best-arm identification in multi-armed bandit models
TL;DR: This work introduces generic notions of complexity for the two dominant frameworks considered in the literature: fixed-budget and fixed-confidence settings, and provides the first known distribution-dependent lower bound on the complexity that involves information-theoretic quantities and holds when m ≥ 1 under general assumptions.
Journal Article
Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization
TL;DR: A novel algorithm is introduced, Hyperband, for hyperparameter optimization as a pure-exploration non-stochastic infinite-armed bandit problem where a predefined resource like iterations, data samples, or features is allocated to randomly sampled configurations.
Posted Content
Non-stochastic Best Arm Identification and Hyperparameter Optimization
Kevin Jamieson, Ameet Talwalkar
TL;DR: This work casts hyperparameter optimization as an instance of non-stochastic best-arm identification, identifies a known algorithm that is well-suited for this setting, and empirically evaluates its behavior.
Posted Content
Efficient Hyperparameter Optimization and Infinitely Many Armed Bandits
TL;DR: This work introduces Hyperband for hyperparameter optimization as a pure-exploration non-stochastic infinitely many armed bandit problem where allocation of additional resources to an arm corresponds to training a configuration on larger subsets of the data.
Proceedings Article
Combinatorial Pure Exploration of Multi-Armed Bandits
TL;DR: This paper presents general learning algorithms which work for all decision classes that admit offline maximization oracles in both fixed confidence and fixed budget settings and establishes a general problem-dependent lower bound for the CPE problem.
References
Journal Article
Finite-time Analysis of the Multiarmed Bandit Problem
TL;DR: This work shows that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.
Proceedings Article
Improved Algorithms for Linear Stochastic Bandits
TL;DR: A simple modification of Auer's UCB algorithm achieves constant regret with high probability and improves the regret bound by a logarithmic factor; experiments show a vast improvement in practice.
Proceedings Article
Best Arm Identification in Multi-Armed Bandits
TL;DR: In this paper, the regret of a forecaster is defined by the gap between the mean reward of the optimal arm and that of the ultimately chosen arm, and the regret decreases exponentially at a rate which is, up to a logarithmic factor, the best possible.
Book Chapter
Pure exploration in multi-armed bandits problems
TL;DR: The main result is that the required exploration-exploitation trade-offs are qualitatively different, in view of a general lower bound on the simple regret in terms of the cumulative regret.
Journal Article
The Sample Complexity of Exploration in the Multi-Armed Bandit Problem
Shie Mannor, John N. Tsitsiklis
TL;DR: This work considers the multi-armed bandit problem under the PAC ("probably approximately correct") model and generalizes the lower bound to a Bayesian setting, and to the case where the statistics of the arms are known but the identities of the arms are not.