Open Access - Journal Article

A Bernoulli Two-armed Bandit

Donald A. Berry
- 01 Jun 1972
- The Annals of Mathematical Statistics, Vol. 43, Iss. 3, pp. 871-897
TLDR
In this article, one of two independent Bernoulli processes with unknown expectations is selected and observed at each of $n$ stages, and the objective is to maximize the expected number of successes from the $n$ selections.
Abstract
One of two independent Bernoulli processes (arms) with unknown expectations $\rho$ and $\lambda$ is selected and observed at each of $n$ stages. The selection problem is sequential in that the process which is selected at a particular stage is a function of the results of previous selections as well as of prior information about $\rho$ and $\lambda$. The variables $\rho$ and $\lambda$ are assumed to be independent under the (prior) probability distribution. The objective is to maximize the expected number of successes from the $n$ selections. Sufficient conditions for the optimality of selecting one or the other of the arms are given and illustrated for example distributions. The stay-on-a-winner rule is proved.
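
Because the optimal selection at each stage is characterized through the posterior distribution of $\rho$ and $\lambda$, a short dynamic-programming sketch may help fix ideas. The sketch below is not taken from the paper: it assumes the special case of independent Beta priors on $\rho$ and $\lambda$ (conjugate to the Bernoulli likelihood), and the names value and best_arm are illustrative only.

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def value(a1, b1, a2, b2, n):
        """Optimal expected number of successes in n remaining selections when
        arm i's unknown success probability has an independent Beta(ai, bi) posterior."""
        if n == 0:
            return 0.0
        p1 = a1 / (a1 + b1)  # posterior mean of arm 1
        p2 = a2 / (a2 + b2)  # posterior mean of arm 2
        # Pulling arm 1: with probability p1 it succeeds (reward 1, posterior -> Beta(a1+1, b1)),
        # otherwise it fails (posterior -> Beta(a1, b1+1)); then continue optimally.
        q1 = p1 * (1.0 + value(a1 + 1, b1, a2, b2, n - 1)) + (1.0 - p1) * value(a1, b1 + 1, a2, b2, n - 1)
        q2 = p2 * (1.0 + value(a1, b1, a2 + 1, b2, n - 1)) + (1.0 - p2) * value(a1, b1, a2, b2 + 1, n - 1)
        return max(q1, q2)

    def best_arm(a1, b1, a2, b2, n):
        """Arm (1 or 2) attaining the maximum in the recursion above; assumes n >= 1."""
        p1 = a1 / (a1 + b1)
        p2 = a2 / (a2 + b2)
        q1 = p1 * (1.0 + value(a1 + 1, b1, a2, b2, n - 1)) + (1.0 - p1) * value(a1, b1 + 1, a2, b2, n - 1)
        q2 = p2 * (1.0 + value(a1, b1, a2 + 1, b2, n - 1)) + (1.0 - p2) * value(a1, b1, a2, b2 + 1, n - 1)
        return 1 if q1 >= q2 else 2

    # Example: uniform (Beta(1, 1)) priors on both arms and n = 10 stages.
    print(value(1, 1, 1, 1, 10))    # optimal expected number of successes
    print(best_arm(2, 1, 1, 1, 9))  # after a success on arm 1, arm 1 is still preferred

For instances such as the one above, the recursion agrees with the stay-on-a-winner rule proved in the paper: an arm that was optimal to select and then produced a success remains optimal at the next stage.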


Citations
Journal Article

A note on structural properties of the Bernoulli two-armed bandit problem

TL;DR: In this article, a certain monotonicity property is proved for the optimal expected cumulative discounted reward in a finite-horizon dynamic programming model describing the Bernoulli two-armed bandit problem.
Journal Article

A Uniform Two-armed Bandit Problem--The Parameter of one Distribution is Known

TL;DR: The problem is to find the selection procedure that maximizes the expected value of the sum of the $n$ observations, that is, to decide which experiment to perform at each stage.
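
In the Bernoulli setting of the main article, the analogous "one parameter known" structure is easy to compute, because the known arm yields no information: the optimal policy turns out to be a stopping rule, switching to the known arm at most once and never back. The sketch below only illustrates that structure; it is not the cited paper's model (which concerns uniform observations), and LAM, the Beta-prior assumption, and the name value_known are illustrative.

    from functools import lru_cache

    LAM = 0.6  # known success probability of the second arm (illustrative value)

    @lru_cache(maxsize=None)
    def value_known(a, b, n):
        """Optimal expected successes with n selections left when the unknown arm has a
        Beta(a, b) posterior and the other arm succeeds with known probability LAM."""
        if n == 0:
            return 0.0
        p = a / (a + b)  # posterior mean of the unknown arm
        pull_unknown = p * (1.0 + value_known(a + 1, b, n - 1)) + (1.0 - p) * value_known(a, b + 1, n - 1)
        # The known arm gives reward LAM in expectation and leaves the posterior unchanged.
        pull_known = LAM + value_known(a, b, n - 1)
        return max(pull_unknown, pull_known)

    # Example: uniform prior on the unknown arm, 10 stages.
    print(value_known(1, 1, 10))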
Journal Article

A Note on Discounted Future Two-Armed Bandits

TL;DR: In this article, the problem of finding Bayes sequential designs for successively choosing between two given Bernoulli variables so as to maximize the total discounted expected sum is addressed; simple hypotheses concerning the success probabilities are assumed, and dynamic programming methods are used to characterize optimal designs.
Journal Article

Bernoulli Two-Armed Bandits with Geometric Termination

TL;DR: In this paper, the standard Bernoulli two-armed bandit model is modified by terminating the choice problem after the first unsuccessful trial, and both terminal reward situations and instances in which payoffs accrue with each success are considered.