Counterfactual reasoning and learning systems: the example of computational advertising

Journal Article•

Léon Bottou¹, Jonas Peters², Joaquin Quiñonero-Candela¹, Denis X. Charles¹, D. Max Chickering¹, Elon Portugaly¹, Dipankar Ray¹, Patrice Y. Simard¹, Ed Snelson¹

Microsoft¹, Max Planck Society²

01 Jan 2013-Journal of Machine Learning Research (JMLR.org)-Vol. 14, Iss: 1, pp 3207-3260

TL;DR: This work shows how to leverage causal inference to understand the behavior of complex learning systems interacting with their environment and to predict the consequences of changes to the system, allowing both humans and algorithms to select the changes that would have improved system performance.

Abstract: This work shows how to leverage causal inference to understand the behavior of complex learning systems interacting with their environment and predict the consequences of changes to the system. Such predictions allow both humans and algorithms to select the changes that would have improved the system performance. This work is illustrated by experiments on the ad placement system associated with the Bing search engine.

Citations

Posted Content•

Concrete Problems in AI Safety

Dario Amodei, Chris Olah, Jacob Steinhardt, Paul F. Christiano, John Schulman, Dan Mané

21 Jun 2016-arXiv: Artificial Intelligence

TL;DR: A list of five practical research problems related to accident risk is presented, categorized according to whether the problem originates from having the wrong objective function, an objective function that is too expensive to evaluate frequently, or undesirable behavior during the learning process.

Abstract: Rapid progress in machine learning and artificial intelligence (AI) has brought increasing attention to the potential impacts of AI technologies on society. In this paper we discuss one such potential impact: the problem of accidents in machine learning systems, defined as unintended and harmful behavior that may emerge from poor design of real-world AI systems. We present a list of five practical research problems related to accident risk, categorized according to whether the problem originates from having the wrong objective function ("avoiding side effects" and "avoiding reward hacking"), an objective function that is too expensive to evaluate frequently ("scalable supervision"), or undesirable behavior during the learning process ("safe exploration" and "distributional shift"). We review previous work in these areas as well as suggesting research directions with a focus on relevance to cutting-edge AI systems. Finally, we consider the high-level question of how to think most productively about the safety of forward-looking applications of AI.

1,569 citations

Posted Content•

Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems

Sergey Levine, Aviral Kumar, George Tucker, Justin Fu

04 May 2020-arXiv: Learning

TL;DR: This tutorial article aims to provide the reader with the conceptual tools needed to get started on research on offline reinforcement learning algorithms: reinforcement learning algorithms that utilize previously collected data, without additional online data collection.

Abstract: In this tutorial article, we aim to provide the reader with the conceptual tools needed to get started on research on offline reinforcement learning algorithms: reinforcement learning algorithms that utilize previously collected data, without additional online data collection. Offline reinforcement learning algorithms hold tremendous promise for making it possible to turn large datasets into powerful decision making engines. Effective offline reinforcement learning methods would be able to extract policies with the maximum possible utility out of the available data, thereby allowing automation of a wide range of decision-making domains, from healthcare and education to robotics. However, the limitations of current algorithms make this difficult. We will aim to provide the reader with an understanding of these challenges, particularly in the context of modern deep reinforcement learning methods, and describe some potential solutions that have been explored in recent work to mitigate these challenges, along with recent applications, and a discussion of perspectives on open problems in the field.

950 citations

Cites background from "Counterfactual reasoning and learni..."

...offline data has included studies on optimizing newspaper article click-through-rates (Strehl et al., 2010; Garcin et al., 2014), advertisement ranking on search pages (Bottou et al., 2013), and personalized ad recommendation for digital marketing (Theocharous et al., 2015; Thomas et al., 2017)...
[...]

Proceedings Article•

Hidden technical debt in Machine learning systems

D. Sculley¹, Gary Holt¹, Daniel Golovin¹, Eugene Davydov¹, Todd Phillips¹, Dietmar Ebner¹, Vinay Chaudhary¹, Michael Young¹, Jean-Francois Crespo¹, Dan Dennison¹

Google¹

07 Dec 2015

TL;DR: It is found that it is common to incur massive ongoing maintenance costs in real-world ML systems, and several ML-specific risk factors to account for in system design are explored.

Abstract: Machine learning offers a fantastically powerful toolkit for building useful complex prediction systems quickly. This paper argues it is dangerous to think of these quick wins as coming for free. Using the software engineering framework of technical debt, we find it is common to incur massive ongoing maintenance costs in real-world ML systems. We explore several ML-specific risk factors to account for in system design. These include boundary erosion, entanglement, hidden feedback loops, undeclared consumers, data dependencies, configuration issues, changes in the external world, and a variety of system-level anti-patterns.

740 citations

Journal Article•DOI•

Toward Causal Representation Learning

Bernhard Schölkopf¹, Francesco Locatello¹, Stefan Bauer¹, Nan Rosemary Ke, Nal Kalchbrenner², Anirudh Goyal, Yoshua Bengio

Max Planck Society¹, Google²

26 Feb 2021

TL;DR: The authors reviewed fundamental concepts of causal inference and related them to crucial open problems of machine learning, including transfer and generalization, thereby assaying how causality can contribute to modern machine learning research.

Abstract: The two fields of machine learning and graphical causality arose and are developed separately. However, there is, now, cross-pollination and increasing interest in both fields to benefit from the advances of the other. In this article, we review fundamental concepts of causal inference and relate them to crucial open problems of machine learning, including transfer and generalization, thereby assaying how causality can contribute to modern machine learning research. This also applies in the opposite direction: we note that most work in causality starts from the premise that the causal variables are given. A central problem for AI and causality is, thus, causal representation learning, that is, the discovery of high-level causal variables from low-level observations. Finally, we delineate some implications of causality for machine learning and propose key research areas at the intersection of both communities.

601 citations

Posted Content•

WILDS: A Benchmark of in-the-Wild Distribution Shifts

Pang Wei Koh¹, Shiori Sagawa¹, Henrik Marklund¹, Sang Michael Xie², Marvin Zhang¹, Akshay Balsubramani¹, Weihua Hu¹, Michihiro Yasunaga³, Richard Lanas Phillips¹, Irena Gao¹, Tony Lee¹, Etienne David⁴, Ian Stavness⁵, Wei Guo⁵, Berton A. Earnshaw, Imran S. Haque⁶, Sara Beery¹, Jure Leskovec¹, Anshul Kundaje⁷, Emma Pierson², Sergey Levine¹, Chelsea Finn¹, Percy Liang¹

Stanford University¹, University of California, Berkeley², Cornell University³, University of Saskatchewan⁴, University of Tokyo⁵, California Institute of Technology⁶, Microsoft⁷

14 Dec 2020-arXiv: Learning

TL;DR: WILDS, a benchmark of in-the-wild distribution shifts spanning diverse data modalities and applications, is presented in the hope of encouraging the development of general-purpose methods that are anchored to real-world distribution shifts and that work well across different applications and problem settings.

Abstract: Distribution shifts -- where the training distribution differs from the test distribution -- can substantially degrade the accuracy of machine learning (ML) systems deployed in the wild. Despite their ubiquity, these real-world distribution shifts are under-represented in the datasets widely used in the ML community today. To address this gap, we present WILDS, a curated collection of 8 benchmark datasets that reflect a diverse range of distribution shifts which naturally arise in real-world applications, such as shifts across hospitals for tumor identification; across camera traps for wildlife monitoring; and across time and location in satellite imaging and poverty mapping. On each dataset, we show that standard training results in substantially lower out-of-distribution than in-distribution performance, and that this gap remains even with models trained by existing methods for handling distribution shifts. This underscores the need for new training methods that produce models which are more robust to the types of distribution shifts that arise in practice. To facilitate method development, we provide an open-source package that automates dataset loading, contains default model architectures and hyperparameters, and standardizes evaluations. Code and leaderboards are available at this https URL.

579 citations

Cites background from "Counterfactual reasoning and learni..."

...Examples include recommendation systems and other consumer products (Bottou et al., 2013; Hashimoto et al., 2018); dialogue agents (Li et al., 2017b); molecular compound optimization (Cuccarese et al., 2020; Reker, 2020); decision systems (Liu et al., 2018; D’Amour et al., 2020b); and adversarial…...
[...]

References

Book•

Reinforcement Learning: An Introduction

Richard S. Sutton¹, Andrew G. Barto

Massachusetts Institute of Technology¹

01 Jan 1988

TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.

Abstract: Reinforcement learning, one of the most active research areas in artificial intelligence, is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives when interacting with a complex, uncertain environment. In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning. Their discussion ranges from the history of the field's intellectual foundations to the most recent developments and applications. The only necessary mathematical background is familiarity with elementary concepts of probability. The book is divided into three parts. Part I defines the reinforcement learning problem in terms of Markov decision processes. Part II provides basic solution methods: dynamic programming, Monte Carlo methods, and temporal-difference learning. Part III presents a unified view of the solution methods and incorporates artificial neural networks, eligibility traces, and planning; the two final chapters present case studies and consider the future of reinforcement learning.

37,989 citations

"Counterfactual reasoning and learni..." refers background or methods in this paper

...Modern reinforcement learning algorithms (see Sutton and Barto, 1998) leverage the assumption that the policy function, the reward function, the transition function, and the distributions of the corresponding noise variables, are independent from time....
[...]
...• Both multi-armed bandit and contextual bandit are special case of reinforcement learning (Sutton and Barto, 1998)....
[...]
...In particular, the work presented in this section is closely related to the Monte-Carlo approach of reinforcement learning (Sutton and Barto, 1998, Chapter 5) and to the offline evaluation of contextual bandit policies (Li et al., 2010, 2011)....
[...]
...Keywords: causation, counterfactual reasoning, computational advertising...
[...]
...Under simplified assumptions, multiarmed bandits theory (Robbins, 1952; Auer et al., 2002; Langford and Zhang, 2008) and reinforcement learning (Sutton and Barto, 1998) describe the exploration/exploitation dilemma associated with the training feedback loop....
[...]

Monograph•DOI•

Causality: models, reasoning, and inference

Judea Pearl¹

University of California, Los Angeles¹

14 Sep 2009-Tijdschrift Voor Filosofie

TL;DR: This monograph covers probabilities, graphs, and causal models; a theory of inferred causation; causal diagrams and the identification of causal effects; structural and counterfactual models; and the interpretation and identification of the probability of causation.

Abstract (table of contents):
1. Introduction to probabilities, graphs, and causal models
2. A theory of inferred causation
3. Causal diagrams and the identification of causal effects
4. Actions, plans, and direct effects
5. Causality and structural models in the social sciences
6. Simpson's paradox, confounding, and collapsibility
7. Structural and counterfactual models
8. Imperfect experiments: bounds and counterfactuals
9. Probability of causation: interpretation and identification
Epilogue: the art and science of cause and effect

12,606 citations

"Counterfactual reasoning and learni..." refers background in this paper

...The distribution readily factorizes as the product of the joint probability of the named exogenous variables, and, for each equation in the structural equation model, the conditional probability of the effect given its direct causes (Spirtes et al., 1993; Pearl, 2000)....
[...]

Journal Article•DOI•

Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning

Ronald J. Williams¹

Northeastern University¹

01 May 1992-Machine Learning

TL;DR: This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units that are shown to make weight adjustments in a direction that lies along the gradient of expected reinforcement in both immediate-reinforcement tasks and certain limited forms of delayed-reinforcement tasks, and they do this without explicitly computing gradient estimates.

Abstract: This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units. These algorithms, called REINFORCE algorithms, are shown to make weight adjustments in a direction that lies along the gradient of expected reinforcement in both immediate-reinforcement tasks and certain limited forms of delayed-reinforcement tasks, and they do this without explicitly computing gradient estimates or even storing information from which such estimates could be computed. Specific examples of such algorithms are presented, some of which bear a close relationship to certain existing algorithms while others are novel but potentially interesting in their own right. Also given are results that show how such algorithms can be naturally integrated with backpropagation. We close with a brief discussion of a number of additional issues surrounding the use of such algorithms, including what is known about their limiting behaviors as well as further considerations that might be used to help develop similar but potentially more powerful reinforcement learning algorithms.

7,930 citations

Book•

Course of Theoretical Physics

Lev Davidovich Landau, E.M. Lifshitz

24 Dec 2013

7,127 citations

Book•

Introduction to Reinforcement Learning

Richard S. Sutton, Andrew G. Barto

01 Mar 1998

TL;DR: In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning.

Abstract: From the Publisher: In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning. Their discussion ranges from the history of the field's intellectual foundations to the most recent developments and applications. The only necessary mathematical background is familiarity with elementary concepts of probability.

7,016 citations

"Counterfactual reasoning and learni..." refers background or methods in this paper

...Under simplified assumptions, multiarmed bandits theory (Robbins, 1952; Auer et al., 2002; Langford and Zhang, 2008) and reinforcement learning (Sutton and Barto, 1998) describe the exploration/exploitation dilemma associated with the training feedback loop....
[...]
...• Both multi-armed bandit and contextual bandit are special case of reinforcement learning (Sutton and Barto, 1998)....
[...]
...In particular, the work presented in this section is closely related to the Monte-Carlo approach of reinforcement learning (Sutton and Barto, 1998, Chapter 5) and to the offline evaluation of contextual bandit policies (Li et al., 2010, 2011)....
[...]
...Modern reinforcement learning algorithms (see Sutton and Barto, 1998) leverage the assumption that the policy function, the reward function, the transition function, and the distributions of the corresponding noise variables, are independent from time....
[...]