Arm-Acquiring Bandits
TLDR
The problem of allocating effort between projects at different stages of development, when new projects are also continually appearing, is considered, and an expression for the expected reward yielded by the Gittins index policy is derived.

Abstract
We consider the problem of allocating effort between projects at different stages of development when new projects are also continually appearing. An expression (14) is derived for the expected reward yielded by the Gittins index policy. This expression is shown to satisfy the dynamic programming equation for the problem, confirming the optimality of the policy.
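The Gittins index of a project state can be computed numerically via Whittle's retirement formulation: the index is the smallest retirement reward at which stopping immediately is optimal. A minimal sketch for a finite-state Markov reward chain, with illustrative function names and parameters (the discount factor `beta` and tolerances are assumptions, not taken from the paper):

```python
import numpy as np

def gittins_index(P, r, state, beta=0.9, tol=1e-6):
    """Approximate the Gittins index of `state` by bisection on the
    retirement reward M: the index corresponds to the smallest M at
    which retiring immediately is optimal in `state`."""
    n = len(r)

    def value(M):
        # Value iteration for the optimal-stopping problem with
        # lump-sum retirement reward M available in every state.
        V = np.full(n, M)
        while True:
            V_new = np.maximum(M, r + beta * P @ V)
            if np.max(np.abs(V_new - V)) < tol:
                return V_new
            V = V_new

    lo, hi = r.min() / (1 - beta), r.max() / (1 - beta)
    while hi - lo > tol:
        M = 0.5 * (lo + hi)
        if value(M)[state] <= M + tol:  # retiring is optimal, so index <= M
            hi = M
        else:
            lo = M
    return (1 - beta) * hi  # report the index in per-step reward units
```

For example, in a two-state chain where state 0 pays reward 1 and then moves to an absorbing state 1 paying 0, the index of state 0 is 1 and of state 1 is 0, which the bisection recovers.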
Citations
Journal Article
Restless bandits: activity allocation in a changing world
TL;DR: In this article, the requirement that exactly m of the n projects be operated at each instant is relaxed to hold only on average; the Lagrange multiplier associated with this constraint defines an index which reduces to the Gittins index when projects not being operated are static. Arguments are advanced to support the conjecture that, for m and n large in constant ratio, the policy of operating the m projects of largest current index is nearly optimal.
Journal Article
A modern Bayesian look at the multi-armed bandit
TL;DR: A heuristic for managing multi-armed bandits called randomized probability matching is described, which randomly allocates observations to arms according to the Bayesian posterior probability that each arm is optimal.
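For Bernoulli arms with Beta priors, randomized probability matching (often called Thompson sampling) reduces to sampling each arm's success rate from its posterior and playing the largest sample. A minimal sketch under those assumptions, with illustrative function names and a uniform Beta(1, 1) prior:

```python
import numpy as np

def thompson_step(successes, failures, rng):
    """One allocation step of randomized probability matching:
    draw each arm's success rate from its Beta posterior and
    play the arm whose draw is largest."""
    samples = rng.beta(successes + 1, failures + 1)
    return int(np.argmax(samples))

def run(true_rates, steps=2000, seed=0):
    """Simulate the policy against fixed Bernoulli reward rates."""
    rng = np.random.default_rng(seed)
    k = len(true_rates)
    succ, fail = np.zeros(k), np.zeros(k)
    for _ in range(steps):
        arm = thompson_step(succ, fail, rng)
        if rng.random() < true_rates[arm]:
            succ[arm] += 1
        else:
            fail[arm] += 1
    return succ, fail
```

Over a long run the posterior concentrates and the policy allocates most observations to the best arm, which is the sense in which it "matches" the posterior probability of optimality.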
Journal Article
Optimal control of two interacting service stations
TL;DR: In this paper, the optimal control of a Markov network with two service stations and linear cost is studied, and optimal policies, described by switching curves in the two-dimensional state space, are shown to exist.
Journal Article
Multi-armed Bandit Models for the Optimal Design of Clinical Trials: Benefits and Challenges.
TL;DR: In this paper, the authors propose a bandit-based patient allocation rule that overcomes the issue of low power, thus removing a potential barrier to the use of bandit models in practice, and evaluate its performance compared to other allocation rules, including fixed randomization.
Proceedings Article
Stochastic Multi-Armed-Bandit Problem with Non-stationary Rewards
TL;DR: This paper fully characterizes the (regret) complexity of this class of MAB problems by establishing a direct link between the extent of allowable reward "variation" and the minimal achievable regret, and by establishing a connection between the adversarial and the stochastic MAB frameworks.