Open Access · Journal Article · DOI

A Bernoulli Two-armed Bandit

Donald A. Berry
01 Jun 1972 · Vol. 43, Iss. 3, pp. 871-897
TLDR
In this article, one of two independent Bernoulli processes (arms) with unknown expectations is selected and observed at each of $n$ stages, and the objective is to maximize the expected number of successes from the $n$ selections.
Abstract
One of two independent Bernoulli processes (arms) with unknown expectations $\rho$ and $\lambda$ is selected and observed at each of $n$ stages. The selection problem is sequential in that the process which is selected at a particular stage is a function of the results of previous selections as well as of prior information about $\rho$ and $\lambda$. The variables $\rho$ and $\lambda$ are assumed to be independent under the (prior) probability distribution. The objective is to maximize the expected number of successes from the $n$ selections. Sufficient conditions for the optimality of selecting one or the other of the arms are given and illustrated for example distributions. The stay-on-a-winner rule is proved.
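As a concrete illustration of the sequential problem the abstract describes, the optimal expected number of successes can be computed by backward induction. The sketch below is not from the paper; it assumes independent Beta priors on the two arms (conjugate for Bernoulli observations), and the function name is my own:

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def value(n, a1, b1, a2, b2):
    """Maximum expected number of successes over n remaining stages,
    when arm i has an independent Beta(ai, bi) posterior."""
    if n == 0:
        return Fraction(0)
    p1 = Fraction(a1, a1 + b1)  # posterior mean of arm 1
    p2 = Fraction(a2, a2 + b2)  # posterior mean of arm 2
    # Pulling an arm earns its success probability now; the observed
    # outcome then updates that arm's Beta parameters.
    pull1 = p1 * (1 + value(n - 1, a1 + 1, b1, a2, b2)) \
        + (1 - p1) * value(n - 1, a1, b1 + 1, a2, b2)
    pull2 = p2 * (1 + value(n - 1, a1, b1, a2 + 1, b2)) \
        + (1 - p2) * value(n - 1, a1, b1, a2, b2 + 1)
    return max(pull1, pull2)

# With uniform Beta(1, 1) priors on both arms and n = 2 stages, the
# optimal expected number of successes is 13/12.
print(value(2, 1, 1, 1, 1))  # -> 13/12
```

Exact rational arithmetic (`Fraction`) keeps the recursion free of floating-point noise; the state space grows only polynomially in $n$, so memoization makes moderate horizons tractable.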


Citations
Proceedings Article · DOI

Optimal index rules for single resource allocation to stochastic dynamic competitors

TL;DR: This paper shows that solving the generic Markov decision process model of optimal single-resource allocation to a collection of stochastic dynamic competitors is equivalent to solving a time sequence of its Lagrangian relaxations, and gives insight into sufficient conditions for the optimality of index rules in restless problems.
Journal Article · DOI

Small-sample performance of Bernoulli two-armed bandit Bayesian strategies

TL;DR: In this paper, the authors examine the small-sample performance of a number of strategies for Bernoulli two-armed bandit problems with independent arms, and show that the myopic strategy and the strategy based on the one-armed threshold value dominate the Bayesian optimal strategy over a region in the parameter space that can have large probability under the assumed prior.
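The myopic strategy compared in that study pulls whichever arm has the larger current posterior mean, ignoring the value of information gained from exploring. Under independent Beta priors it can be sketched as follows (the function name is my own):

```python
def myopic_choice(a1, b1, a2, b2):
    """Myopic rule: pull the arm with the higher posterior mean
    a / (a + b); ties go to arm 1."""
    # Cross-multiplied comparison avoids floating-point division:
    # a1/(a1+b1) >= a2/(a2+b2)  <=>  a1*(a2+b2) >= a2*(a1+b1).
    return 1 if a1 * (a2 + b2) >= a2 * (a1 + b1) else 2

# After 3 successes / 1 failure on arm 1 (Beta(4, 2), mean 2/3) versus
# a fresh Beta(1, 1) arm 2 (mean 1/2), the myopic rule stays on arm 1.
print(myopic_choice(4, 2, 1, 1))  # -> 1
```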
Journal Article

The Finite-Horizon Two-Armed Bandit Problem with Binary Responses: A Multidisciplinary Survey of the History, State of the Art, and Myths

TL;DR: This paper considers the two-armed bandit problem, and presents a unified model cast in the Markov decision process framework, with subject responses modelled using the Bernoulli distribution, and the corresponding Beta distribution for Bayesian updating.
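The Beta-Bernoulli conjugate updating mentioned in this survey is a one-line rule: a success increments the Beta distribution's first parameter, a failure its second. A minimal sketch (function name is my own):

```python
def update(a, b, success):
    """Beta(a, b) prior + one Bernoulli observation -> Beta posterior.

    A success increments a; a failure increments b, so the posterior
    mean a / (a + b) tracks the observed success fraction."""
    return (a + 1, b) if success else (a, b + 1)

# Uniform Beta(1, 1) prior, then two successes and one failure:
a, b = 1, 1
for outcome in (True, True, False):
    a, b = update(a, b, outcome)
print(a, b)  # -> 3 2, i.e. a Beta(3, 2) posterior with mean 3/5
```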
Posted Content

Structural Properties of Bayesian Bandits with Exponential Family Distributions

Yaming Yu
TL;DR: Two structural results hold in broad generality: for a fixed prior weight, an arm becomes more desirable as its prior mean increases; and the less one knows about an arm, the more desirable it becomes because there remains more information to be gained when selecting that arm.