Open Access · Posted Content

Taming Non-stationary Bandits: A Bayesian Approach

TL;DR
This work proposes a variant of Thompson Sampling that can be used in both rested and restless bandit scenarios and derives the exact expression for the probability of picking sub-optimal arms from the parameters of the prior distribution.
Abstract
We consider the multi-armed bandit problem in non-stationary environments. Based on Bayesian methods, we propose a variant of Thompson Sampling which can be used in both rested and restless bandit scenarios. By applying discounting to the parameters of the prior distribution, we describe a way to systematically reduce the effect of past observations. Further, we derive the exact expression for the probability of picking sub-optimal arms. By increasing the exploitative value of Bayes' samples, we also provide an optimistic version of the algorithm. Extensive empirical analysis is conducted under various scenarios to validate the utility of the proposed algorithms. A comparison study with various state-of-the-art algorithms is also included.
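The discounting idea above lends itself to a compact implementation. Below is a minimal sketch of a discounted Beta-Bernoulli Thompson Sampling loop, assuming Bernoulli rewards; the parameter names, the exact discounting rule, and the multi-sample "optimism" knob are illustrative assumptions, not the paper's precise algorithm.

```python
import numpy as np

def discounted_thompson_sampling(pull, n_arms, horizon, gamma=0.95,
                                 alpha0=1.0, beta0=1.0, n_samples=1,
                                 rng=None):
    """Thompson Sampling with discounted Beta posteriors (sketch).

    gamma < 1 geometrically down-weights old observations so the
    posterior can track drifting reward probabilities. Setting
    n_samples > 1 mimics an 'optimistic' variant by taking the max
    of several posterior draws per arm, which raises the
    exploitative value of the samples.
    """
    rng = rng or np.random.default_rng()
    alpha = np.full(n_arms, alpha0)
    beta = np.full(n_arms, beta0)
    choices = []
    for t in range(horizon):
        # One (or several) posterior draws per arm; play the argmax.
        draws = rng.beta(alpha[:, None], beta[:, None],
                         size=(n_arms, n_samples)).max(axis=1)
        arm = int(np.argmax(draws))
        r = pull(arm, t)  # Bernoulli reward in {0, 1}
        # Shrink all parameters toward the prior, then credit the
        # played arm: one way to systematically forget the past.
        alpha = gamma * alpha + (1 - gamma) * alpha0
        beta = gamma * beta + (1 - gamma) * beta0
        alpha[arm] += r
        beta[arm] += 1 - r
        choices.append(arm)
    return choices
```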


Citations
Posted Content

Weighted Linear Bandits for Non-Stationary Environments

TL;DR: In this paper, the authors consider a stochastic linear bandit model in which the available actions correspond to arbitrary context vectors whose associated rewards follow a non-stationary linear regression model. They propose an optimistic algorithm based on discounted linear regression, in which exponential weights smoothly forget the past.
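The estimator behind such weighted approaches can be sketched in a few lines. The class below implements exponentially weighted (discounted) ridge regression; the interface and names are ours, and for brevity the regularizer is discounted along with the data, which differs slightly from published variants such as D-LinUCB.

```python
import numpy as np

class DiscountedRidge:
    """Exponentially weighted ridge regression (sketch): a sample
    observed s steps ago carries weight gamma**s, so the estimate
    smoothly forgets the past."""

    def __init__(self, dim, gamma=0.99, lam=1.0):
        self.gamma = gamma
        self.V = lam * np.eye(dim)  # weighted design matrix
        self.b = np.zeros(dim)      # weighted reward-feature sum

    def update(self, x, reward):
        # Discount every accumulated term, then add the new sample.
        self.V = self.gamma * self.V + np.outer(x, x)
        self.b = self.gamma * self.b + reward * x
        return np.linalg.solve(self.V, self.b)  # current estimate
```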
Journal Article

Adaptive Targeted Infectious Disease Testing

TL;DR: In this paper, the authors show how to use costly testing resources in an epidemic when testing outcomes can be used to make quarantine decisions. They describe a simple policy that is nearly optimal from a dynamic perspective and that accounts for imperfect testing technology, the appropriate choice of prior, and non-stationarity of the prevalence rate.
Posted Content

Hedging the Drift: Learning to Optimize under Non-Stationarity

TL;DR: This work introduces data-driven decision-making algorithms that achieve state-of-the-art dynamic regret bounds for a collection of non-stationary stochastic bandit settings. The algorithms leverage the "forgetting principle" in their learning processes, which is vital in changing environments.
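One concrete instance of the forgetting principle is a sliding-window estimator, which discards observations older than a fixed window rather than discounting them; a minimal sketch with illustrative names:

```python
from collections import deque

class SlidingWindowMean:
    """Mean of only the last `window` rewards; anything older is
    forgotten outright, so the estimate tracks recent behavior."""

    def __init__(self, window):
        self.buf = deque(maxlen=window)  # old items fall off the left

    def update(self, reward):
        self.buf.append(reward)

    def value(self):
        return sum(self.buf) / len(self.buf) if self.buf else 0.0
```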
Proceedings Article

AutoSeM: Automatic Task Selection and Mixing in Multi-Task Learning

TL;DR: The authors propose AutoSeM, a two-stage multi-task learning pipeline: the first stage automatically selects the most useful auxiliary tasks via a Beta-Bernoulli multi-armed bandit with Thompson sampling, and the second stage learns the training mixing ratio of these selected auxiliary tasks via a Gaussian-Process-based Bayesian optimization framework.
Posted Content

The Generalized Likelihood Ratio Test meets klUCB: an Improved Algorithm for Piece-Wise Non-Stationary Bandits

TL;DR: The proposed GLR-klUCB combines an efficient bandit algorithm, klUCB, with an efficient, parameter-free change-point detector, the Bernoulli Generalized Likelihood Ratio Test, for which new theoretical guarantees of independent interest are provided.
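The change-point component admits a short sketch. The function below is a simplified reading of the Bernoulli GLR statistic: it scans every split point and compares the best two-segment fit against a single-segment fit. The threshold, whose calibration carries the theoretical guarantees, is left abstract, and the names are ours.

```python
import numpy as np

def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p = np.clip(p, eps, 1 - eps)
    q = np.clip(q, eps, 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def glr_change_detected(x, threshold):
    """Flag a change in a Bernoulli reward stream x (one arm) when
    some split s makes the two-segment likelihood sufficiently
    better than the single-segment one."""
    x = np.asarray(x, dtype=float)
    n, mu_all, prefix = len(x), x.mean(), np.cumsum(x)
    for s in range(1, n):
        mu1 = prefix[s - 1] / s                      # mean before split
        mu2 = (prefix[-1] - prefix[s - 1]) / (n - s)  # mean after split
        stat = (s * bernoulli_kl(mu1, mu_all)
                + (n - s) * bernoulli_kl(mu2, mu_all))
        if stat >= threshold:
            return True
    return False
```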
References
Journal Article

Finite-time Analysis of the Multiarmed Bandit Problem

TL;DR: This work shows that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.
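The index policy analyzed in that paper is UCB1; here is a minimal sketch (the bandit interface is an assumption of ours):

```python
import math

def ucb1(pull, n_arms, horizon):
    """UCB1 (sketch): play each arm once, then the arm maximizing
    mean + sqrt(2 ln t / n_i), which yields logarithmic regret for
    rewards with bounded support."""
    counts = [0] * n_arms
    means = [0.0] * n_arms
    choices = []
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: try every arm once
        else:
            arm = max(range(n_arms), key=lambda i:
                      means[i] + math.sqrt(2 * math.log(t) / counts[i]))
        r = pull(arm, t)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # running mean
        choices.append(arm)
    return choices
```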
Book

Regret Analysis of Stochastic and Nonstochastic Multi-Armed Bandit Problems

TL;DR: In this article, the authors focus on regret analysis in the context of multi-armed bandit problems, where regret captures the trade-off between staying with the option that gave the highest payoff in the past and exploring new options that might give higher payoffs in the future.
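For concreteness, the quantity such analyses bound is usually the expected (pseudo-)regret over a horizon T, written here for the stationary case:

```latex
R_T \;=\; T\,\mu^{*} \;-\; \mathbb{E}\!\left[\sum_{t=1}^{T} \mu_{a_t}\right],
\qquad \mu^{*} = \max_{1 \le i \le K} \mu_i ,
```

where mu_i is the mean reward of arm i and a_t is the arm played at round t; non-stationary analyses replace the fixed benchmark mu* with a time-varying one (dynamic regret).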
Journal Article

Some aspects of the sequential design of experiments

TL;DR: The authors propose a theory of the sequential design of experiments, in which the size and composition of the samples are not fixed in advance but are functions of the observations themselves, a major departure from classical fixed-sample designs.
Proceedings Article

An Empirical Evaluation of Thompson Sampling

TL;DR: Empirical results using Thompson sampling on simulated and real data are presented, showing that it is highly competitive and should be part of the standard baselines to compare against.