Open Access Proceedings Article

Optimal Best Arm Identification with Fixed Confidence

TLDR
A new, tight lower bound is proved on the sample complexity of best-arm identification in one-parameter bandit problems, and the 'Track-and-Stop' strategy is proposed and shown to be asymptotically optimal.
Abstract
We provide a complete characterization of the complexity of best-arm identification in one-parameter bandit problems. We prove a new, tight lower bound on the sample complexity. We propose the 'Track-and-Stop' strategy, which is proved to be asymptotically optimal. It consists of a new sampling rule (which tracks the optimal proportions of arm draws highlighted by the lower bound) and a stopping rule named after Chernoff, for which we give a new analysis.
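To give a rough feel for the stopping side of this scheme, the following is a minimal sketch for unit-variance Gaussian arms. It is not the paper's algorithm: the tracking sampling rule is replaced by plain round-robin sampling, and the stopping threshold is a simple heuristic, so the sketch only illustrates the Chernoff-type generalized-likelihood-ratio stopping rule, not the full Track-and-Stop strategy.

```python
import math
import random

def glr_stat(n_a, mu_a, n_b, mu_b):
    """Generalized likelihood ratio statistic (unit-variance Gaussian arms):
    evidence that arm a's true mean exceeds arm b's."""
    if mu_a <= mu_b:
        return 0.0
    pooled = (n_a * mu_a + n_b * mu_b) / (n_a + n_b)
    # KL(N(x,1) || N(y,1)) = (x - y)^2 / 2, summed over both arms' samples
    return 0.5 * (n_a * (mu_a - pooled) ** 2 + n_b * (mu_b - pooled) ** 2)

def best_arm(means, delta, rng, max_samples=200_000):
    """Identify the best of k Gaussian arms with a GLR (Chernoff-type) stopping
    rule.  Sampling is round-robin here; the paper instead tracks the optimal
    proportions dictated by its lower bound."""
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    t = 0
    while t < max_samples:
        a = t % k                       # simplistic round-robin sampling
        x = means[a] + rng.gauss(0.0, 1.0)
        counts[a] += 1
        sums[a] += x
        t += 1
        if t < 2 * k:
            continue                    # need a few samples per arm first
        mu = [sums[i] / counts[i] for i in range(k)]
        leader = max(range(k), key=lambda i: mu[i])
        # stop when the empirical leader decisively beats every rival
        threshold = math.log((1 + math.log(t)) * k / delta)  # heuristic
        z = min(glr_stat(counts[leader], mu[leader], counts[b], mu[b])
                for b in range(k) if b != leader)
        if z > threshold:
            return leader, t
    return max(range(k), key=lambda i: sums[i] / counts[i]), t

arm, samples = best_arm([0.5, 0.0, -0.5], delta=0.05, rng=random.Random(0))
```

With well-separated means as above, the rule stops after a modest number of samples and returns the top arm with probability at least roughly 1 - delta.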



Citations
Proceedings Article

Simple Bayesian Algorithms for Best Arm Identification

TL;DR: In this paper, the optimal adaptive allocation of measurement effort for identifying the best among a finite set of options or designs is studied, with a focus on the problem of selecting the best design after a small number of measurements.
Posted Content

Mixture Martingales Revisited with Applications to Sequential Tests and Confidence Intervals

TL;DR: New deviation inequalities that are valid uniformly in time under adaptive sampling in a multi-armed bandit model are presented, allowing us to analyze stopping rules based on generalized likelihood ratios for a large class of sequential identification problems, and to construct tight confidence intervals for some functions of the means of the arms.
Posted Content

Improving the Expected Improvement Algorithm

TL;DR: In this article, a simple modification of the expected improvement algorithm is proposed, which is asymptotically optimal for Gaussian best-arm identification problems, and provably outperforms standard EI by an order of magnitude.
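The TL;DR above concerns a modification of expected improvement (EI); for context, here is a minimal sketch of the standard EI acquisition value for a Gaussian posterior (maximization convention), using only the closed-form normal pdf/cdf. This is the baseline being modified, not the modified algorithm itself.

```python
import math

def expected_improvement(mu, sigma, best):
    """Standard EI of a Gaussian posterior N(mu, sigma^2) over the current
    best observed value (maximization convention)."""
    if sigma == 0.0:
        return max(mu - best, 0.0)      # degenerate posterior: no uncertainty
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # phi(z)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # Phi(z)
    return (mu - best) * cdf + sigma * pdf
```

At mu = best, EI reduces to sigma * phi(0) ≈ 0.3989 * sigma, and it increases monotonically in both mu and sigma, which is what makes it a natural exploration-exploitation trade-off.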
Proceedings Article

On Explore-Then-Commit strategies

TL;DR: Existing deviation inequalities are refined, which allow us to design fully sequential strategies with finite-time regret guarantees that are asymptotically optimal as the horizon grows and order-optimal in the minimax sense.
Posted Content

On the Optimal Sample Complexity for Best Arm Identification

Lijie Chen, +1 more
- 12 Nov 2015 - 
TL;DR: The first lower bound for BEST-1-ARM is obtained that goes beyond the classic Mannor-Tsitsiklis lower bound, by an interesting reduction from SIGN to BEST-1-ARM.
References
Journal ArticleDOI

Modeling by shortest data description

Jorma Rissanen
- 01 Sep 1978 - 
TL;DR: The number of digits it takes to write down an observed sequence x1,...,xN of a time series depends on the model with its parameters that one assumes to have generated the observed data.
Book

Regret Analysis of Stochastic and Nonstochastic Multi-Armed Bandit Problems

TL;DR: In this article, the authors focus on regret analysis in the context of multi-armed bandit problems, where the learner must balance staying with the option that gave the highest payoffs in the past against exploring new options that might give higher payoffs in the future, and regret measures the cost of this trade-off.
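The exploration-exploitation balance described above is classically handled by index policies such as UCB1 (due to Auer et al., and among the algorithms analyzed in this monograph). A minimal sketch with hypothetical Bernoulli arms:

```python
import math
import random

def ucb1(arms, horizon, rng):
    """UCB1: at each round, play the arm maximizing
    empirical mean + sqrt(2 ln t / n_i).  `arms` holds
    Bernoulli success probabilities (illustrative only)."""
    k = len(arms)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(horizon):
        if t < k:
            a = t                       # initialization: play each arm once
        else:
            a = max(range(k),
                    key=lambda i: sums[i] / counts[i]
                    + math.sqrt(2.0 * math.log(t) / counts[i]))
        r = 1.0 if rng.random() < arms[a] else 0.0   # Bernoulli reward
        counts[a] += 1
        sums[a] += r
    return counts

counts = ucb1([0.7, 0.4, 0.1], horizon=2000, rng=random.Random(1))
```

Because the exploration bonus shrinks as an arm is sampled, suboptimal arms are pulled only O(log T) times each, so the best arm dominates the pull counts.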
Proceedings Article

Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design

TL;DR: This work analyzes GP-UCB, an intuitive upper-confidence based algorithm, and bound its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental design and obtaining explicit sublinear regret bounds for many commonly used covariance functions.
Proceedings Article

Improved Algorithms for Linear Stochastic Bandits

TL;DR: A simple modification of Auer's UCB algorithm achieves constant regret with high probability; for the linear stochastic bandit problem, the regret bound is improved by a logarithmic factor, though experiments show a vast improvement in practice.