Open Access · Posted Content

Taming Non-stationary Bandits: A Bayesian Approach

TL;DR
This work proposes a variant of Thompson Sampling that can be used in both rested and restless bandit scenarios and derives the exact expression for the probability of picking sub-optimal arms from the parameters of the prior distribution.
Abstract
We consider the multi-armed bandit problem in non-stationary environments. Based on Bayesian methods, we propose a variant of Thompson Sampling which can be used in both rested and restless bandit scenarios. By applying discounting to the parameters of the prior distribution, we describe a way to systematically reduce the effect of past observations. Further, we derive the exact expression for the probability of picking sub-optimal arms. By increasing the exploitative value of Bayes' samples, we also provide an optimistic version of the algorithm. Extensive empirical analysis is conducted under various scenarios to validate the utility of the proposed algorithms. A comparison study with various state-of-the-art algorithms is also included.
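The discounting idea above lends itself to a compact implementation. Below is a minimal sketch of a discounted Beta-Bernoulli Thompson Sampling loop, assuming Bernoulli rewards; the parameter names, the exact discounting rule, and the multi-sample "optimism" knob are illustrative assumptions, not the paper's precise algorithm.

```python
import numpy as np

def discounted_thompson_sampling(pull, n_arms, horizon, gamma=0.95,
                                 alpha0=1.0, beta0=1.0, n_samples=1,
                                 rng=None):
    """Thompson Sampling with discounted Beta posteriors (sketch).

    gamma < 1 geometrically down-weights old observations so the
    posterior can track drifting reward probabilities. Setting
    n_samples > 1 mimics an 'optimistic' variant by taking the max
    of several posterior draws per arm, which raises the
    exploitative value of the samples.
    """
    rng = rng or np.random.default_rng()
    alpha = np.full(n_arms, alpha0)
    beta = np.full(n_arms, beta0)
    choices = []
    for t in range(horizon):
        # One (or several) posterior draws per arm; play the argmax.
        draws = rng.beta(alpha[:, None], beta[:, None],
                         size=(n_arms, n_samples)).max(axis=1)
        arm = int(np.argmax(draws))
        r = pull(arm, t)  # Bernoulli reward in {0, 1}
        # Shrink all parameters toward the prior, then credit the
        # played arm: one way to systematically forget the past.
        alpha = gamma * alpha + (1 - gamma) * alpha0
        beta = gamma * beta + (1 - gamma) * beta0
        alpha[arm] += r
        beta[arm] += 1 - r
        choices.append(arm)
    return choices
```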


Citations
Posted Content

Weighted Linear Bandits for Non-Stationary Environments

TL;DR: In this paper, the authors consider a stochastic linear bandit model in which the available actions correspond to arbitrary context vectors whose associated rewards follow a non-stationary linear regression model. They propose an optimistic algorithm based on discounted linear regression, in which exponential weights smoothly forget the past.
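The estimator behind such weighted approaches can be sketched in a few lines. The class below implements exponentially weighted (discounted) ridge regression; the interface and names are ours, and for brevity the regularizer is discounted along with the data, which differs slightly from published variants such as D-LinUCB.

```python
import numpy as np

class DiscountedRidge:
    """Exponentially weighted ridge regression (sketch): a sample
    observed s steps ago carries weight gamma**s, so the estimate
    smoothly forgets the past."""

    def __init__(self, dim, gamma=0.99, lam=1.0):
        self.gamma = gamma
        self.V = lam * np.eye(dim)  # weighted design matrix
        self.b = np.zeros(dim)      # weighted reward-feature sum

    def update(self, x, reward):
        # Discount every accumulated term, then add the new sample.
        self.V = self.gamma * self.V + np.outer(x, x)
        self.b = self.gamma * self.b + reward * x
        return np.linalg.solve(self.V, self.b)  # current estimate
```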
Journal Article

Adaptive Targeted Infectious Disease Testing

TL;DR: In this paper, the authors show how to use costly testing resources in an epidemic when testing outcomes can be used to make quarantine decisions. They describe a simple policy that is nearly optimal from a dynamic perspective and that accounts for imperfect testing technology, the appropriate choice of prior, and non-stationarity of the prevalence rate.
Posted Content

Hedging the Drift: Learning to Optimize under Non-Stationarity

TL;DR: This work introduces data-driven decision-making algorithms that achieve state-of-the-art dynamic regret bounds for a collection of non-stationary stochastic bandit settings. The algorithms leverage the "forgetting principle" in their learning processes, which is vital in changing environments.
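One concrete instance of the forgetting principle is a sliding-window estimator, which discards observations older than a fixed window rather than discounting them; a minimal sketch with illustrative names:

```python
from collections import deque

class SlidingWindowMean:
    """Mean of only the last `window` rewards; anything older is
    forgotten outright, so the estimate tracks recent behavior."""

    def __init__(self, window):
        self.buf = deque(maxlen=window)  # old items fall off the left

    def update(self, reward):
        self.buf.append(reward)

    def value(self):
        return sum(self.buf) / len(self.buf) if self.buf else 0.0
```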
Proceedings Article

AutoSeM: Automatic Task Selection and Mixing in Multi-Task Learning

TL;DR: The authors propose AutoSeM, a two-stage multi-task learning pipeline: the first stage automatically selects the most useful auxiliary tasks via a Beta-Bernoulli multi-armed bandit with Thompson sampling, and the second stage learns the training mixing ratio of these selected auxiliary tasks via a Gaussian-Process-based Bayesian optimization framework.
Posted Content

The Generalized Likelihood Ratio Test meets klUCB: an Improved Algorithm for Piece-Wise Non-Stationary Bandits

TL;DR: The proposed GLR-klUCB combines an efficient bandit algorithm, klUCB, with an efficient, parameter-free change-point detector, the Bernoulli Generalized Likelihood Ratio Test, for which new theoretical guarantees of independent interest are provided.
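The change-point component admits a short sketch. The function below is a simplified reading of the Bernoulli GLR statistic: it scans every split point and compares the best two-segment fit against a single-segment fit. The threshold, whose calibration carries the theoretical guarantees, is left abstract, and the names are ours.

```python
import numpy as np

def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p = np.clip(p, eps, 1 - eps)
    q = np.clip(q, eps, 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def glr_change_detected(x, threshold):
    """Flag a change in a Bernoulli reward stream x (one arm) when
    some split s makes the two-segment likelihood sufficiently
    better than the single-segment one."""
    x = np.asarray(x, dtype=float)
    n, mu_all, prefix = len(x), x.mean(), np.cumsum(x)
    for s in range(1, n):
        mu1 = prefix[s - 1] / s                      # mean before split
        mu2 = (prefix[-1] - prefix[s - 1]) / (n - s)  # mean after split
        stat = (s * bernoulli_kl(mu1, mu_all)
                + (n - s) * bernoulli_kl(mu2, mu_all))
        if stat >= threshold:
            return True
    return False
```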
References
Journal Article

Finite-time Analysis of the Multiarmed Bandit Problem

TL;DR: This work shows that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.
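The index policy analyzed in that paper is UCB1; here is a minimal sketch (the bandit interface is an assumption of ours):

```python
import math

def ucb1(pull, n_arms, horizon):
    """UCB1 (sketch): play each arm once, then the arm maximizing
    mean + sqrt(2 ln t / n_i), which yields logarithmic regret for
    rewards with bounded support."""
    counts = [0] * n_arms
    means = [0.0] * n_arms
    choices = []
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: try every arm once
        else:
            arm = max(range(n_arms), key=lambda i:
                      means[i] + math.sqrt(2 * math.log(t) / counts[i]))
        r = pull(arm, t)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # running mean
        choices.append(arm)
    return choices
```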
Book

Regret Analysis of Stochastic and Nonstochastic Multi-Armed Bandit Problems

TL;DR: In this article, the authors focus on regret analysis in the context of multi-armed bandit problems, where regret captures the trade-off between staying with the option that gave the highest payoff in the past and exploring new options that might give higher payoffs in the future.
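For concreteness, the quantity such analyses bound is usually the expected (pseudo-)regret over a horizon T, written here for the stationary case:

```latex
R_T \;=\; T\,\mu^{*} \;-\; \mathbb{E}\!\left[\sum_{t=1}^{T} \mu_{a_t}\right],
\qquad \mu^{*} = \max_{1 \le i \le K} \mu_i ,
```

where mu_i is the mean reward of arm i and a_t is the arm played at round t; non-stationary analyses replace the fixed benchmark mu* with a time-varying one (dynamic regret).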
Journal Article

Some aspects of the sequential design of experiments

TL;DR: The authors propose a theory of the sequential design of experiments, in which the size and composition of the samples are not fixed in advance but are functions of the observations themselves, a major departure from classical fixed-sample designs.
Proceedings Article

An Empirical Evaluation of Thompson Sampling

TL;DR: Empirical results using Thompson sampling on simulated and real data are presented, showing that it is highly competitive and should be part of the standard baselines to compare against.