Counterfactual reasoning and learning systems: the example of computational advertising

Open Access, Journal Article

Léon Bottou, +8 more

- 01 Jan 2013 -

Journal of Machine Learning Research

- Vol. 14, Iss: 1, pp 3207-3260

Abstract:

This work shows how to leverage causal inference to understand the behavior of complex learning systems interacting with their environment and predict the consequences of changes to the system. Such predictions allow both humans and algorithms to select the changes that would have improved the system performance. This work is illustrated by experiments on the ad placement system associated with the Bing search engine.
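
As a rough illustration of what predicting the consequences of a change from logged data can look like, the following is a minimal sketch of an importance-sampling (inverse propensity weighting) estimate. It is not the authors' implementation and does not use Bing data: the function `ips_estimate`, the two-action setup, the click rates, and both policies are hypothetical and synthetic, chosen only so the estimate can be compared with a known true value.

```python
import random


def ips_estimate(logs, new_policy_prob):
    """Importance-sampling estimate of the mean reward under a new policy.

    logs: iterable of (context, action, reward, logging_prob) tuples, where
          logging_prob is the probability the logging policy gave the action.
    new_policy_prob: function (context, action) -> probability that the
          hypothetical new policy would choose that action.
    """
    total, count = 0.0, 0
    for context, action, reward, logging_prob in logs:
        # Reweight each logged reward by how much more (or less) likely
        # the new policy is to take the logged action.
        weight = new_policy_prob(context, action) / logging_prob
        total += weight * reward
        count += 1
    return total / count


# Synthetic setup (assumption): two actions with fixed click rates.
rng = random.Random(0)
CLICK_RATE = {0: 0.2, 1: 0.6}

# Logging policy (assumption): choose each action uniformly at random.
logs = []
for _ in range(20000):
    action = rng.choice([0, 1])
    reward = 1.0 if rng.random() < CLICK_RATE[action] else 0.0
    logs.append((None, action, reward, 0.5))


def new_policy_prob(context, action):
    # Hypothetical new policy: prefer action 1 with probability 0.9.
    return 0.9 if action == 1 else 0.1


estimate = ips_estimate(logs, new_policy_prob)
true_value = 0.1 * CLICK_RATE[0] + 0.9 * CLICK_RATE[1]
print(f"estimate={estimate:.3f} true={true_value:.3f}")
```

The estimate relies on the logging policy giving every action a nonzero, known probability. When the new policy differs sharply from the logging policy the weights become large and the variance grows, which is the weakness the doubly robust estimators listed under Citations aim to reduce.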

Citations

Proceedings Article

Doubly robust off-policy value evaluation for reinforcement learning

Nan Jiang, +1 more

TL;DR: This work extends the doubly robust estimator for bandits to sequential decision-making problems, which gets the best of both worlds: it is guaranteed to be unbiased and can have a much lower variance than the popular importance sampling estimators.

Journal Article

Doubly robust policy evaluation and optimization

Miroslav Dudík, +3 more

- 01 Nov 2014 -

Statistical Science

TL;DR: It is proved that the doubly robust estimation method uniformly improves over existing techniques, achieving both lower variance in value estimation and better policies, and is expected to become common practice in policy evaluation and optimization.

Posted Content

An Optimistic Perspective on Offline Reinforcement Learning

Rishabh Agarwal, +2 more

- 10 Jul 2019 -

arXiv: Learning

TL;DR: It is demonstrated that recent off-policy deep RL algorithms, even when trained solely on a replay dataset, outperform the fully trained DQN agent. The work also presents Random Ensemble Mixture (REM), a robust Q-learning algorithm that enforces optimal Bellman consistency on random convex combinations of multiple Q-value estimates.

Posted Content

Troubling Trends in Machine Learning Scholarship

Zachary C. Lipton, +1 more

- 09 Jul 2018 -

arXiv: Machine Learning

TL;DR: The authors focus on four patterns that appear to be trending in ML scholarship: failure to distinguish between explanation and speculation; failure to identify the sources of empirical gains; mathiness, i.e., the use of mathematics that obfuscates or impresses rather than clarifies; and misuse of language, e.g., by choosing terms of art with colloquial connotations or by overloading established technical terms.

Posted Content

A Tour of Reinforcement Learning: The View from Continuous Control

Benjamin Recht

- 25 Jun 2018 -

arXiv: Optimization and Control

TL;DR: This article surveys reinforcement learning from the perspective of optimization and control, with a focus on continuous control applications.

References

Book

Reinforcement Learning: An Introduction

Richard S. Sutton, +1 more

TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.

Monograph

Causality: models, reasoning, and inference

Judea Pearl

- 14 Sep 2009 -

Tijdschrift Voor Filosofie

TL;DR: The art and science of cause and effect have long been studied in the social sciences; topics include the theory of inferred causation, causal diagrams, and the identification of causal effects.

Journal Article

Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning

Ronald J. Williams

- 01 May 1992 -

Machine Learning

TL;DR: This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units. These algorithms are shown to make weight adjustments in a direction that lies along the gradient of expected reinforcement, in both immediate-reinforcement tasks and certain limited forms of delayed-reinforcement tasks, and they do this without explicitly computing gradient estimates.

Book

Course of Theoretical Physics

Lev Davidovich Landau, +1 more

Book

Introduction to Reinforcement Learning

Richard S. Sutton, +1 more

TL;DR: In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning.

Related Papers (5)

Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms

The self-normalized estimator for counterfactual learning

The central role of the propensity score in observational studies for causal effects

A contextual-bandit approach to personalized news article recommendation

Eligibility Traces for Off-Policy Policy Evaluation