Open Access Proceedings Article

Optimal Best Arm Identification with Fixed Confidence

TLDR
A new, tight lower bound is proved on the sample complexity of best-arm identification in one-parameter bandit problems, and the 'Track-and-Stop' strategy is proposed and shown to be asymptotically optimal.
Abstract
We provide a complete characterization of the complexity of best-arm identification in one-parameter bandit problems. We prove a new, tight lower bound on the sample complexity. We propose the 'Track-and-Stop' strategy, which is proved to be asymptotically optimal. It consists of a new sampling rule (which tracks the optimal proportions of arm draws highlighted by the lower bound) and a stopping rule named after Chernoff, for which we give a new analysis.
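To give a rough feel for the stopping side of this scheme, the following is a minimal sketch for unit-variance Gaussian arms. It is not the paper's algorithm: the tracking sampling rule is replaced by plain round-robin sampling, and the stopping threshold is a simple heuristic, so the sketch only illustrates the Chernoff-type generalized-likelihood-ratio stopping rule, not the full Track-and-Stop strategy.

```python
import math
import random

def glr_stat(n_a, mu_a, n_b, mu_b):
    """Generalized likelihood ratio statistic (unit-variance Gaussian arms):
    evidence that arm a's true mean exceeds arm b's."""
    if mu_a <= mu_b:
        return 0.0
    pooled = (n_a * mu_a + n_b * mu_b) / (n_a + n_b)
    # KL(N(x,1) || N(y,1)) = (x - y)^2 / 2, summed over both arms' samples
    return 0.5 * (n_a * (mu_a - pooled) ** 2 + n_b * (mu_b - pooled) ** 2)

def best_arm(means, delta, rng, max_samples=200_000):
    """Identify the best of k Gaussian arms with a GLR (Chernoff-type) stopping
    rule.  Sampling is round-robin here; the paper instead tracks the optimal
    proportions dictated by its lower bound."""
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    t = 0
    while t < max_samples:
        a = t % k                       # simplistic round-robin sampling
        x = means[a] + rng.gauss(0.0, 1.0)
        counts[a] += 1
        sums[a] += x
        t += 1
        if t < 2 * k:
            continue                    # need a few samples per arm first
        mu = [sums[i] / counts[i] for i in range(k)]
        leader = max(range(k), key=lambda i: mu[i])
        # stop when the empirical leader decisively beats every rival
        threshold = math.log((1 + math.log(t)) * k / delta)  # heuristic
        z = min(glr_stat(counts[leader], mu[leader], counts[b], mu[b])
                for b in range(k) if b != leader)
        if z > threshold:
            return leader, t
    return max(range(k), key=lambda i: sums[i] / counts[i]), t

arm, samples = best_arm([0.5, 0.0, -0.5], delta=0.05, rng=random.Random(0))
```

With well-separated means as above, the rule stops after a modest number of samples and returns the top arm with probability at least roughly 1 - delta.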



Citations
Proceedings Article

Simple Bayesian Algorithms for Best Arm Identification

TL;DR: In this paper, the optimal adaptive allocation of measurement effort for identifying the best among a finite set of options or designs is studied, with a focus on the problem of selecting the best design after a small number of measurements.
Posted Content

Mixture Martingales Revisited with Applications to Sequential Tests and Confidence Intervals

TL;DR: New deviation inequalities that are valid uniformly in time under adaptive sampling in a multi-armed bandit model are presented, allowing us to analyze stopping rules based on generalized likelihood ratios for a large class of sequential identification problems, and to construct tight confidence intervals for some functions of the means of the arms.
Posted Content

Improving the Expected Improvement Algorithm

TL;DR: In this article, a simple modification of the expected improvement algorithm is proposed, which is asymptotically optimal for Gaussian best-arm identification problems, and provably outperforms standard EI by an order of magnitude.
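The TL;DR above concerns a modification of expected improvement (EI); for context, here is a minimal sketch of the standard EI acquisition value for a Gaussian posterior (maximization convention), using only the closed-form normal pdf/cdf. This is the baseline being modified, not the modified algorithm itself.

```python
import math

def expected_improvement(mu, sigma, best):
    """Standard EI of a Gaussian posterior N(mu, sigma^2) over the current
    best observed value (maximization convention)."""
    if sigma == 0.0:
        return max(mu - best, 0.0)      # degenerate posterior: no uncertainty
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # phi(z)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # Phi(z)
    return (mu - best) * cdf + sigma * pdf
```

At mu = best, EI reduces to sigma * phi(0) ≈ 0.3989 * sigma, and it increases monotonically in both mu and sigma, which is what makes it a natural exploration-exploitation trade-off.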
Proceedings Article

On Explore-Then-Commit strategies

TL;DR: Existing deviation inequalities are refined, which allow us to design fully sequential strategies with finite-time regret guarantees that are asymptotically optimal as the horizon grows and order-optimal in the minimax sense.
Posted Content

On the Optimal Sample Complexity for Best Arm Identification

Lijie Chen, +1 more
- 12 Nov 2015 - 
TL;DR: The first lower bound for BEST-1-ARM is obtained that goes beyond the classic Mannor-Tsitsiklis lower bound, by an interesting reduction from SIGN to BEST-1-ARM.
References
Journal ArticleDOI

Modeling by shortest data description

Jorma Rissanen
- 01 Sep 1978 - 
TL;DR: The number of digits it takes to write down an observed sequence x1,...,xN of a time series depends on the model with its parameters that one assumes to have generated the observed data.
Book

Regret Analysis of Stochastic and Nonstochastic Multi-Armed Bandit Problems

TL;DR: In this article, the authors focus on regret analysis in the context of multi-armed bandit problems, where the learner must balance staying with the option that gave the highest payoffs in the past against exploring new options that might give higher payoffs in the future, and regret measures the cost of this trade-off.
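The exploration-exploitation balance described above is classically handled by index policies such as UCB1 (due to Auer et al., and among the algorithms analyzed in this monograph). A minimal sketch with hypothetical Bernoulli arms:

```python
import math
import random

def ucb1(arms, horizon, rng):
    """UCB1: at each round, play the arm maximizing
    empirical mean + sqrt(2 ln t / n_i).  `arms` holds
    Bernoulli success probabilities (illustrative only)."""
    k = len(arms)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(horizon):
        if t < k:
            a = t                       # initialization: play each arm once
        else:
            a = max(range(k),
                    key=lambda i: sums[i] / counts[i]
                    + math.sqrt(2.0 * math.log(t) / counts[i]))
        r = 1.0 if rng.random() < arms[a] else 0.0   # Bernoulli reward
        counts[a] += 1
        sums[a] += r
    return counts

counts = ucb1([0.7, 0.4, 0.1], horizon=2000, rng=random.Random(1))
```

Because the exploration bonus shrinks as an arm is sampled, suboptimal arms are pulled only O(log T) times each, so the best arm dominates the pull counts.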
Proceedings Article

Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design

TL;DR: This work analyzes GP-UCB, an intuitive upper-confidence based algorithm, and bound its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental design and obtaining explicit sublinear regret bounds for many commonly used covariance functions.
Proceedings Article

Improved Algorithms for Linear Stochastic Bandits

TL;DR: A simple modification of Auer's UCB algorithm achieves constant regret with high probability; for the linear stochastic bandit problem, the regret bound is improved by a logarithmic factor, though experiments show a vast improvement in practice.