Arm-Acquiring Bandits
TLDR
The problem of allocating effort between projects at different stages of development, when new projects are also continually appearing, is considered, and an expression for the expected reward yielded by the Gittins index policy is derived.

Abstract
We consider the problem of allocating effort between projects at different stages of development when new projects are also continually appearing. An expression (14) is derived for the expected reward yielded by the Gittins index policy. This expression is shown to satisfy the dynamic programming equation for the problem, confirming the optimality of the policy.
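The Gittins index of a project state can be computed numerically via Whittle's retirement formulation: the index is the smallest retirement reward at which stopping immediately is optimal. A minimal sketch for a finite-state Markov reward chain, with illustrative function names and parameters (the discount factor `beta` and tolerances are assumptions, not taken from the paper):

```python
import numpy as np

def gittins_index(P, r, state, beta=0.9, tol=1e-6):
    """Approximate the Gittins index of `state` by bisection on the
    retirement reward M: the index corresponds to the smallest M at
    which retiring immediately is optimal in `state`."""
    n = len(r)

    def value(M):
        # Value iteration for the optimal-stopping problem with
        # lump-sum retirement reward M available in every state.
        V = np.full(n, M)
        while True:
            V_new = np.maximum(M, r + beta * P @ V)
            if np.max(np.abs(V_new - V)) < tol:
                return V_new
            V = V_new

    lo, hi = r.min() / (1 - beta), r.max() / (1 - beta)
    while hi - lo > tol:
        M = 0.5 * (lo + hi)
        if value(M)[state] <= M + tol:  # retiring is optimal, so index <= M
            hi = M
        else:
            lo = M
    return (1 - beta) * hi  # report the index in per-step reward units
```

For example, in a two-state chain where state 0 pays reward 1 and then moves to an absorbing state 1 paying 0, the index of state 0 is 1 and of state 1 is 0, which the bisection recovers.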
Citations
Journal Article
Restless bandits: activity allocation in a changing world
TL;DR: In this article, the requirement that exactly m of the n projects be operated at each instant is relaxed to hold only on average; the Lagrange multiplier associated with this constraint defines an index which reduces to the Gittins index when projects not being operated are static. Arguments are advanced to support the conjecture that, for m and n large in constant ratio, the policy of operating the m projects of largest current index is nearly optimal.
Journal Article
A modern Bayesian look at the multi-armed bandit
TL;DR: A heuristic for managing multi-armed bandits called randomized probability matching is described, which randomly allocates observations to arms according to the Bayesian posterior probability that each arm is optimal.
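For Bernoulli arms with Beta priors, randomized probability matching (often called Thompson sampling) reduces to sampling each arm's success rate from its posterior and playing the largest sample. A minimal sketch under those assumptions, with illustrative function names and a uniform Beta(1, 1) prior:

```python
import numpy as np

def thompson_step(successes, failures, rng):
    """One allocation step of randomized probability matching:
    draw each arm's success rate from its Beta posterior and
    play the arm whose draw is largest."""
    samples = rng.beta(successes + 1, failures + 1)
    return int(np.argmax(samples))

def run(true_rates, steps=2000, seed=0):
    """Simulate the policy against fixed Bernoulli reward rates."""
    rng = np.random.default_rng(seed)
    k = len(true_rates)
    succ, fail = np.zeros(k), np.zeros(k)
    for _ in range(steps):
        arm = thompson_step(succ, fail, rng)
        if rng.random() < true_rates[arm]:
            succ[arm] += 1
        else:
            fail[arm] += 1
    return succ, fail
```

Over a long run the posterior concentrates and the policy allocates most observations to the best arm, which is the sense in which it "matches" the posterior probability of optimality.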
Journal Article
Optimal control of two interacting service stations
TL;DR: In this paper, the optimal control of a Markov network with two service stations and linear cost is studied, and optimal policies, described by switching curves in the two-dimensional state space, are shown to exist.
Journal Article
Multi-armed Bandit Models for the Optimal Design of Clinical Trials: Benefits and Challenges.
TL;DR: In this paper, the authors propose a bandit-based patient allocation rule that overcomes the issue of low power, thus removing a potential barrier to the use of bandit models in practice, and evaluate its performance compared to other allocation rules, including fixed randomization.
Proceedings Article
Stochastic Multi-Armed-Bandit Problem with Non-stationary Rewards
TL;DR: This paper fully characterizes the (regret) complexity of this class of MAB problems by establishing a direct link between the extent of allowable reward "variation" and the minimal achievable regret, and by establishing a connection between the adversarial and the stochastic MAB frameworks.