Open Access Proceedings Article

PAC Subset Selection in Stochastic Multi-armed Bandits

TLDR
The expected sample complexity bound for LUCB is novel even for single-arm selection, and a lower bound on the worst-case sample complexity of PAC algorithms for Explore-m is given.
Abstract
We consider the problem of selecting, from among the arms of a stochastic n-armed bandit, a subset of size m of those arms with the highest expected rewards, based on efficiently sampling the arms. This "subset selection" problem finds application in a variety of areas. In our previous work (Kalyanakrishnan & Stone, 2010), this problem is framed under a PAC setting (denoted "Explore-m"), and corresponding sampling algorithms are analyzed. Whereas the formal analysis therein is restricted to the worst-case sample complexity of algorithms, in this paper, we design and analyze an algorithm ("LUCB") with improved expected sample complexity. Interestingly, LUCB bears a close resemblance to the well-known UCB algorithm for regret minimization. The expected sample complexity bound we show for LUCB is novel even for single-arm selection (Explore-1). We also give a lower bound on the worst-case sample complexity of PAC algorithms for Explore-m.
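The abstract describes LUCB only at a high level, so the following is a minimal, hedged Python sketch of a LUCB-style sampling loop: compare the weakest arm of the empirical top m (by lower confidence bound) against the strongest remaining arm (by upper confidence bound), sample both, and stop once they separate by less than eps. The helper names (sample, pull, beta) and the confidence-radius constant k1 are illustrative assumptions, not the paper's exact specification.

```python
import math

def lucb(sample, n_arms, m, eps, delta, k1=1.25):
    """Hedged sketch of a LUCB-style PAC subset-selection loop.

    sample(a) draws one reward in [0, 1] from arm a; the function returns
    the indices of m arms believed eps-optimal with prob. >= 1 - delta.
    """
    counts = [0] * n_arms
    sums = [0.0] * n_arms

    def pull(a):
        counts[a] += 1
        sums[a] += sample(a)

    def beta(u, t):
        # Confidence radius; the k1 * n * t^4 / delta argument mirrors the
        # usual union-bound analysis, but the exact constant is an assumption.
        return math.sqrt(math.log(k1 * n_arms * t ** 4 / delta) / (2 * u))

    for a in range(n_arms):      # initialise with one sample per arm
        pull(a)
    t = 1

    while True:
        means = [sums[a] / counts[a] for a in range(n_arms)]
        order = sorted(range(n_arms), key=lambda a: means[a], reverse=True)
        high, low = order[:m], order[m:]

        # Weakest empirical-top arm by lower confidence bound ...
        h_star = min(high, key=lambda a: means[a] - beta(counts[a], t))
        # ... versus strongest remaining arm by upper confidence bound.
        l_star = max(low, key=lambda a: means[a] + beta(counts[a], t))

        ucb_l = means[l_star] + beta(counts[l_star], t)
        lcb_h = means[h_star] - beta(counts[h_star], t)
        if ucb_l - lcb_h < eps:
            return high          # the two groups have separated: stop

        pull(h_star)             # sample both contentious arms
        pull(l_star)
        t += 1
```

With Bernoulli arms, for example, one could pass sample = lambda a: float(random.random() < p[a]) for some probability vector p; the loop then keeps sampling only the two arms that currently straddle the top-m boundary.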



Citations
Journal Article

On the complexity of best-arm identification in multi-armed bandit models

TL;DR: This work introduces generic notions of complexity for the two dominant frameworks considered in the literature, the fixed-budget and fixed-confidence settings, and provides the first known distribution-dependent lower bound on the complexity that involves information-theoretic quantities and holds when m ≥ 1 under general assumptions.
Proceedings Article

lil' UCB : An Optimal Exploration Algorithm for Multi-Armed Bandits

TL;DR: It is proved that a UCB procedure for identifying the arm with the largest mean in a multi-armed bandit game, in the fixed-confidence setting and using a small number of total samples, is optimal up to constants; simulations also show that it provides superior performance relative to the state of the art.
Proceedings Article

Almost Optimal Exploration in Multi-Armed Bandits

TL;DR: Two novel, parameter-free algorithms for identifying the best arm are presented for two different settings, a given target confidence and a given target budget of arm pulls, together with upper bounds whose gap from the lower bound is only doubly logarithmic in the problem parameters.
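One of the two settings mentioned above, the fixed-budget one, is commonly handled by a sequential-halving style routine; the sketch below illustrates that general idea under the assumption of rewards in [0, 1], and is not a transcription of the paper's algorithm.

```python
import math

def sequential_halving(sample, n_arms, budget):
    """Sketch of a sequential-halving style fixed-budget routine: spread the
    budget over ~log2(n) rounds, pull every surviving arm equally often, and
    discard the empirically worse half after each round."""
    survivors = list(range(n_arms))
    rounds = max(1, math.ceil(math.log2(n_arms)))
    for _ in range(rounds):
        if len(survivors) == 1:
            break
        pulls = max(1, budget // (rounds * len(survivors)))
        means = {a: sum(sample(a) for _ in range(pulls)) / pulls
                 for a in survivors}
        survivors.sort(key=lambda a: means[a], reverse=True)
        survivors = survivors[:math.ceil(len(survivors) / 2)]
    return survivors[0]
```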
Posted Content

Non-stochastic Best Arm Identification and Hyperparameter Optimization

TL;DR: This work casts hyperparameter optimization as an instance of non-stochastic best-arm identification, identifies a known algorithm that is well-suited for this setting, and empirically evaluates its behavior.
Proceedings Article

Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence

TL;DR: A performance bound is proved for the two versions of the UGapE algorithm, showing that the two problems are characterized by the same notion of complexity.
References
Book Chapter

Probability Inequalities for Sums of Bounded Random Variables

TL;DR: In this article, upper bounds are derived for the probability that the sum S of n independent random variables exceeds its mean ES by a positive number nt; related bounds are also given for certain sums of dependent random variables such as U-statistics.
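For reference, the inequality this entry summarizes is usually stated as follows for independent $X_i$ with $X_i \in [a_i, b_i]$ and $S = X_1 + \cdots + X_n$; this is the standard textbook form, quoted here for context rather than from the entry above:

$$\Pr\big(S - \mathbb{E}S \ge nt\big) \;\le\; \exp\!\left(-\frac{2\,n^2 t^2}{\sum_{i=1}^{n}(b_i - a_i)^2}\right),$$

which reduces to $\exp(-2nt^2)$ when each $X_i \in [0, 1]$.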
Journal Article

Finite-time Analysis of the Multiarmed Bandit Problem

TL;DR: This work shows that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.
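The "simple and efficient policies" referred to include index rules of the UCB1 type, which add an exploration bonus of order sqrt(ln t / n_a) to each empirical mean; a minimal sketch of such an index, assuming rewards in [0, 1] and every arm pulled at least once, is:

```python
import math

def ucb1_choice(counts, sums, t):
    """Return the arm maximising empirical mean + sqrt(2 ln t / n_a).

    counts[a] is the number of pulls of arm a so far (assumed >= 1),
    sums[a] is the total reward from arm a, and t is the total pull count.
    """
    def index(a):
        return sums[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])
    return max(range(len(counts)), key=index)
```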
Journal Article

The Nonstochastic Multiarmed Bandit Problem

TL;DR: A solution is given to the bandit problem in which an adversary, rather than a well-behaved stochastic process, has complete control over the payoffs.
Journal Article

Some aspects of the sequential design of experiments

TL;DR: A theory of sequential design of experiments is proposed in which the size and composition of the samples are not fixed in advance but are functions of the observations themselves, a significant departure from classical fixed-sample designs.
Journal Article

Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems

TL;DR: A framework is described that is based on learning confidence intervals around the value function or the Q-function and eliminating actions that are not optimal with high probability, and model-based and model-free variants of the elimination method are provided.
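The elimination idea in this entry can be illustrated with a successive-elimination style loop over bandit arms (the full framework also covers Q-functions in reinforcement learning); the confidence radius and stopping rule below are illustrative assumptions, not the paper's exact constants.

```python
import math

def successive_elimination(sample, n_arms, eps, delta):
    """Sketch of confidence-interval-based action elimination: sample the
    surviving arms in rounds, drop arms whose upper confidence bound falls
    below the best lower confidence bound, and stop once the radius is small
    enough to certify an eps-optimal arm."""
    survivors = list(range(n_arms))
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    t = 0
    while True:
        t += 1
        for a in survivors:
            counts[a] += 1
            sums[a] += sample(a)
        # Illustrative union-bound radius; the exact constant is an assumption.
        radius = math.sqrt(math.log(4.0 * n_arms * t * t / delta) / (2 * t))
        means = {a: sums[a] / counts[a] for a in survivors}
        best = max(survivors, key=means.get)
        if len(survivors) == 1 or 2 * radius <= eps:
            return best
        # Eliminate arms that are suboptimal with high probability.
        survivors = [a for a in survivors
                     if means[a] + radius >= means[best] - radius]
```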