Open Access · Journal Article · DOI

A Bernoulli Two-armed Bandit

Donald A. Berry
01 Jun 1972 · Vol. 43, Iss. 3, pp. 871-897
TLDR
In this article, one of two independent Bernoulli processes (arms) with unknown expectations is selected and observed at each of $n$ stages, and the objective is to maximize the expected number of successes from the $n$ selections.
Abstract
One of two independent Bernoulli processes (arms) with unknown expectations $\rho$ and $\lambda$ is selected and observed at each of $n$ stages. The selection problem is sequential in that the process which is selected at a particular stage is a function of the results of previous selections as well as of prior information about $\rho$ and $\lambda$. The variables $\rho$ and $\lambda$ are assumed to be independent under the (prior) probability distribution. The objective is to maximize the expected number of successes from the $n$ selections. Sufficient conditions for the optimality of selecting one or the other of the arms are given and illustrated for example distributions. The stay-on-a-winner rule is proved.
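As a concrete illustration of the sequential problem the abstract describes, the optimal expected number of successes can be computed by backward induction. The sketch below is not from the paper; it assumes independent Beta priors on the two arms (conjugate for Bernoulli observations), and the function name is my own:

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def value(n, a1, b1, a2, b2):
    """Maximum expected number of successes over n remaining stages,
    when arm i has an independent Beta(ai, bi) posterior."""
    if n == 0:
        return Fraction(0)
    p1 = Fraction(a1, a1 + b1)  # posterior mean of arm 1
    p2 = Fraction(a2, a2 + b2)  # posterior mean of arm 2
    # Pulling an arm earns its success probability now; the observed
    # outcome then updates that arm's Beta parameters.
    pull1 = p1 * (1 + value(n - 1, a1 + 1, b1, a2, b2)) \
        + (1 - p1) * value(n - 1, a1, b1 + 1, a2, b2)
    pull2 = p2 * (1 + value(n - 1, a1, b1, a2 + 1, b2)) \
        + (1 - p2) * value(n - 1, a1, b1, a2, b2 + 1)
    return max(pull1, pull2)

# With uniform Beta(1, 1) priors on both arms and n = 2 stages, the
# optimal expected number of successes is 13/12.
print(value(2, 1, 1, 1, 1))  # -> 13/12
```

Exact rational arithmetic (`Fraction`) keeps the recursion free of floating-point noise; the state space grows only polynomially in $n$, so memoization makes moderate horizons tractable.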


Citations
Proceedings Article · DOI

Optimal index rules for single resource allocation to stochastic dynamic competitors

TL;DR: This paper shows that solving the generic Markov decision process model of optimal single-resource allocation to a collection of stochastic dynamic competitors is equivalent to solving a time sequence of its Lagrangian relaxations, and gives insight into sufficient conditions for the optimality of index rules in restless problems.
Journal Article · DOI

Small-sample performance of Bernoulli two-armed bandit Bayesian strategies

TL;DR: In this paper, the authors examine the small-sample performance of a number of strategies for Bernoulli two-armed bandit problems with independent arms, and show that the myopic strategy and the strategy based on the one-armed threshold value dominate the Bayesian optimal strategy over a region in the parameter space that can have large probability under the assumed prior.
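The myopic strategy compared in that study pulls whichever arm has the larger current posterior mean, ignoring the value of information gained from exploring. Under independent Beta priors it can be sketched as follows (the function name is my own):

```python
def myopic_choice(a1, b1, a2, b2):
    """Myopic rule: pull the arm with the higher posterior mean
    a / (a + b); ties go to arm 1."""
    # Cross-multiplied comparison avoids floating-point division:
    # a1/(a1+b1) >= a2/(a2+b2)  <=>  a1*(a2+b2) >= a2*(a1+b1).
    return 1 if a1 * (a2 + b2) >= a2 * (a1 + b1) else 2

# After 3 successes / 1 failure on arm 1 (Beta(4, 2), mean 2/3) versus
# a fresh Beta(1, 1) arm 2 (mean 1/2), the myopic rule stays on arm 1.
print(myopic_choice(4, 2, 1, 1))  # -> 1
```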
Journal Article

The Finite-Horizon Two-Armed Bandit Problem with Binary Responses: A Multidisciplinary Survey of the History, State of the Art, and Myths

TL;DR: This paper considers the two-armed bandit problem, and presents a unified model cast in the Markov decision process framework, with subject responses modelled using the Bernoulli distribution, and the corresponding Beta distribution for Bayesian updating.
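The Beta-Bernoulli conjugate updating mentioned in this survey is a one-line rule: a success increments the Beta distribution's first parameter, a failure its second. A minimal sketch (function name is my own):

```python
def update(a, b, success):
    """Beta(a, b) prior + one Bernoulli observation -> Beta posterior.

    A success increments a; a failure increments b, so the posterior
    mean a / (a + b) tracks the observed success fraction."""
    return (a + 1, b) if success else (a, b + 1)

# Uniform Beta(1, 1) prior, then two successes and one failure:
a, b = 1, 1
for outcome in (True, True, False):
    a, b = update(a, b, outcome)
print(a, b)  # -> 3 2, i.e. a Beta(3, 2) posterior with mean 3/5
```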
Posted Content

Structural Properties of Bayesian Bandits with Exponential Family Distributions

Yaming Yu
TL;DR: Two structural results hold in broad generality: for a fixed prior weight, an arm becomes more desirable as its prior mean increases; and the less one knows about an arm, the more desirable it becomes because there remains more information to be gained when selecting that arm.