Open Access Journal Article

Arm-Acquiring Bandits

Peter Whittle
01 Apr 1981 · Vol. 9, Iss. 2, pp. 284-292
TL;DR
In this article, the problem of allocating effort between projects at different stages of development when new projects are also continually appearing is considered, and an expression for the expected reward yielded by the Gittins index policy is derived.
Abstract
We consider the problem of allocating effort between projects at different stages of development when new projects are also continually appearing. An expression (equation (14)) is derived for the expected reward yielded by the Gittins index policy. This is shown to satisfy the dynamic programming equation for the problem, so confirming the optimality of the policy.
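To make the policy concrete, here is a minimal Python sketch of one decision epoch of the index policy, assuming a hypothetical gittins_index() callable and a hypothetical advance() method on projects; the substantive part of the theory, computing the index itself, is not shown.

def index_policy_step(projects, new_arrivals, gittins_index):
    # Arm acquisition: newly appearing projects join the pool.
    projects.extend(new_arrivals)
    # Work on the project of largest current Gittins index.
    chosen = max(projects, key=gittins_index)
    # advance() is a hypothetical method: operate the project for one
    # step and return the reward; the project's state (and hence its
    # index) may change as a result.
    reward = chosen.advance()
    return chosen, reward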



Citations
Journal Article

Restless bandits: activity allocation in a changing world

TL;DR: The paper considers a population of projects of which m out of n must be operated at any one time; the Lagrange multiplier associated with this activity constraint defines an index which reduces to the Gittins index when projects not being operated are static, and arguments are advanced to support the conjecture that, for m and n large in constant ratio, the policy of operating the m projects of largest current index is nearly optimal.
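A minimal sketch of the conjectured policy, assuming a hypothetical whittle_index() callable that returns a project's current index; computing that index is again the hard part and is not shown.

import heapq

def restless_policy(projects, m, whittle_index):
    # Operate the m projects of largest current index; the remaining
    # projects stay passive, though in the restless setting they may
    # still change state while passive.
    return heapq.nlargest(m, projects, key=whittle_index)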
Journal Article

A modern Bayesian look at the multi-armed bandit

TL;DR: A heuristic for managing multi-armed bandits called randomized probability matching is described, which randomly allocates observations to arms according to the Bayesian posterior probability that each arm is optimal.
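Randomized probability matching is what is now commonly called Thompson sampling. Below is a self-contained sketch for Bernoulli arms with uniform Beta(1, 1) priors; the three arms and the fixed reward in the usage lines are placeholders, not part of the cited paper.

import random

def thompson_arm(successes, failures):
    # Sample a mean from each arm's Beta posterior and play the arm
    # whose sample is largest: each arm is then chosen with exactly
    # its posterior probability of being the optimal arm.
    samples = [random.betavariate(s + 1, f + 1)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)

# Usage with three hypothetical arms and a placeholder reward.
successes, failures = [0, 0, 0], [0, 0, 0]
arm = thompson_arm(successes, failures)
reward = 1  # in practice, the observed 0/1 payoff of the chosen arm
successes[arm] += reward
failures[arm] += 1 - reward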
Journal Article

Optimal control of two interacting service stations

TL;DR: In this paper, the optimal control of a Markov network with two service stations and linear cost is studied, and optimal policies, described by switching curves in the two-dimensional state space, are shown to exist.
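The structural result can be pictured with a toy threshold rule; here curve is a hypothetical stand-in for the switching curve whose existence the paper establishes.

def serve(x1, x2, curve):
    # With queue lengths (x1, x2), allocate effort to station 1
    # exactly when the state lies above the switching curve.
    return 1 if x1 > curve(x2) else 2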
Journal Article

Multi-armed Bandit Models for the Optimal Design of Clinical Trials: Benefits and Challenges.

TL;DR: In this paper, the authors propose a bandit-based patient allocation rule that overcomes the issue of low power, thus removing a potential barrier to its use in practice, and evaluate its performance against other allocation rules, including fixed randomization.
Proceedings Article

Stochastic Multi-Armed-Bandit Problem with Non-stationary Rewards

TL;DR: This paper fully characterizes the (regret) complexity of this class of MAB problems by establishing a direct link between the extent of allowable reward "variation" and the minimal achievable regret, and by establishing a connection between the adversarial and the stochastic MAB frameworks.