Open Access · Posted Content

A Decision Theoretic Approach to A/B Testing

TLDR
The results suggest that the 0.05 p-value threshold may be too conservative in some settings, but that its widespread use may reflect an ad-hoc means of controlling multiplicity in the common case of repeatedly testing variants of an experiment when the threshold is not reached.
Abstract
A/B testing is ubiquitous within the machine learning and data science operations of internet companies. Generically, the idea is to perform a statistical test of the hypothesis that a new feature is better than the existing platform---for example, it results in higher revenue. If the p value for the test is below some pre-defined threshold---often, 0.05---the new feature is implemented. The difficulty of choosing an appropriate threshold has been noted before, particularly because dependent tests are often done sequentially, leading some to propose control of the false discovery rate (FDR) rather than use of a single, universal threshold. However, it is still necessary to make an arbitrary choice of the level at which to control FDR. Here we suggest a decision-theoretic approach to determining whether to adopt a new feature, which enables automated selection of an appropriate threshold. Our method has the basic ingredients of any decision-theory problem: a loss function, action space, and a notion of optimality, for which we choose Bayes risk. However, the loss function and the action space differ from the typical choices made in the literature, which has focused on the theory of point estimation. We give some basic results for Bayes-optimal thresholding rules for the feature adoption decision, and give some examples using eBay data. The results suggest that the 0.05 p-value threshold may be too conservative in some settings, but that its widespread use may reflect an ad-hoc means of controlling multiplicity in the common case of repeatedly testing variants of an experiment when the threshold is not reached.
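
The abstract names the ingredients of the decision problem (a loss function, an action space, and Bayes risk) but not their exact forms. The sketch below is a minimal illustration under assumed ingredients only: a flat prior giving a normal posterior on the treatment lift, a linear loss with an optional switch cost, and Monte Carlo evaluation of posterior expected loss. The function name, loss form, and cost terms are assumptions, not the paper's specification.

    import numpy as np

    rng = np.random.default_rng(0)

    def adopt_feature(delta_hat, se, switch_cost=0.0, n_draws=100_000):
        """Compare posterior expected losses of adopting vs. keeping the status quo.

        delta_hat   : estimated lift (e.g., difference in mean revenue per user)
        se          : standard error of the estimate
        switch_cost : assumed fixed cost of rolling out the change
        Assumes a flat prior, so the posterior on the true lift is Normal(delta_hat, se^2).
        """
        delta = rng.normal(delta_hat, se, size=n_draws)  # posterior draws of the lift

        # Adopting: pay the switch cost, and lose |delta| when the feature is worse.
        loss_adopt = switch_cost + np.maximum(-delta, 0.0)
        # Keeping the status quo: forgo the lift when the feature is better.
        loss_keep = np.maximum(delta, 0.0)

        return loss_adopt.mean() < loss_keep.mean()

    # Example: a 0.3% lift measured with a 0.2% standard error.
    print(adopt_feature(delta_hat=0.003, se=0.002, switch_cost=0.0005))

Under such a loss, the implied cutoff on the test statistic depends on the prior and the cost terms rather than on a universal p-value threshold such as 0.05.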


Citations
Journal Article

A/B Testing with Fat Tails

TL;DR: The theoretical results, along with an empirical analysis of Microsoft Bing’s EXP platform, suggest that simple changes to business practices could increase innovation productivity.
Journal Article

Empirical Bayes Estimation of Treatment Effects with Many A/B Tests: An Overview

TL;DR: This is a practical guide on how to use treatment effect estimates from a large number of experiments to improve estimates of the effects of each experiment.
Posted Content

Empirical Bayes for Large-scale Randomized Experiments: a Spectral Approach

TL;DR: A spectral maximum likelihood estimate based on a Fourier series representation is developed; it can be computed efficiently via convex optimization and is used to select hyperparameters and compare models.
Proceedings Article

On Post-Selection Inference in A/B Tests

TL;DR: This paper explores two seemingly unrelated paths, one based on supervised machine learning and the other on empirical Bayes, and proposes post-selection inferential approaches that combine the strengths of both.
Posted Content

Optimal Testing in the Experiment-rich Regime

TL;DR: In this article, the authors propose a new experimental design framework for the setting where potential experiments are abundant (i.e., many hypotheses are available to test), and observations are costly; they refer to this as the experiment-rich regime.
References
Journal Article

Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper)

Andrew Gelman, 01 Sep 2006
TL;DR: In this paper, a folded noncentral t family of conditionally conjugate priors for hierarchical standard deviation parameters is proposed, and weakly informative priors in this family are considered.
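
For a rough sense of the prior family summarized there, the snippet below draws from a half-Cauchy prior (the folded noncentral t with one degree of freedom and zero noncentrality) for a group-level standard deviation, using SciPy; the scale of 5 is an arbitrary choice for illustration.

    import numpy as np
    from scipy import stats

    # Half-Cauchy(scale=5) prior for a hierarchical standard deviation:
    # a special case of the folded noncentral-t family (1 df, zero noncentrality).
    prior = stats.halfcauchy(scale=5.0)
    draws = prior.rvs(size=100_000, random_state=1)

    print(np.median(draws))          # most mass at moderate values ...
    print(np.quantile(draws, 0.99))  # ... with a heavy right tail
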
Journal Article

Controlled experiments on the web: survey and practical guide

TL;DR: This work provides a practical guide to conducting online experiments, and shares key lessons that will help practitioners in running trustworthy controlled experiments, including statistical power, sample size, and techniques for variance reduction.
Journal Article

A modern Bayesian look at the multi-armed bandit

TL;DR: A heuristic for managing multi-armed bandits called randomized probability matching is described, which randomly allocates observations to arms according to the Bayesian posterior probability that each arm is optimal.
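
Below is a minimal sketch of randomized probability matching (often called Thompson sampling) for two Bernoulli arms with conjugate Beta posteriors; the uniform Beta(1, 1) priors and the simulated conversion rates are assumptions for illustration, not taken from the paper.

    import numpy as np

    rng = np.random.default_rng(42)
    true_rates = [0.04, 0.05]   # assumed conversion rates of the two arms
    successes = np.ones(2)      # Beta(1, 1) priors on each arm's rate
    failures = np.ones(2)

    for _ in range(10_000):
        # Sample a rate from each arm's posterior and play the arm that wins,
        # i.e., allocate in proportion to the posterior probability of being optimal.
        sampled = rng.beta(successes, failures)
        arm = int(np.argmax(sampled))
        reward = rng.random() < true_rates[arm]
        successes[arm] += reward
        failures[arm] += 1 - reward

    print(successes + failures - 2)  # pulls per arm; allocation shifts toward the better arm
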
Proceedings Article

Online controlled experiments at large scale

TL;DR: This work discusses why negative experiments, which degrade the user experience short term, should be run, given the learning value and long-term benefits, and designs a highly scalable system able to handle data at massive scale: hundreds of concurrent experiments, each containing millions of users.
Proceedings Article

Practical guide to controlled experiments on the web: listen to your customers not to the hippo

TL;DR: This work provides a practical guide to conducting online experiments, and shares key lessons that will help practitioners in running trustworthy controlled experiments, including statistical power, sample size, and techniques for variance reduction.