Counterfactual reasoning and learning systems: the example of computational advertising
Citations
1,569 citations
950 citations
Cites background from "Counterfactual reasoning and learni..."
...offline data has included studies on optimizing newspaper article click-through-rates (Strehl et al., 2010; Garcin et al., 2014), advertisement ranking on search pages (Bottou et al., 2013), and personalized ad recommendation for digital marketing (Theocharous et al., 2015; Thomas et al., 2017)....
740 citations
601 citations
579 citations
Cites background from "Counterfactual reasoning and learni..."
...Examples include recommendation systems and other consumer products (Bottou et al., 2013; Hashimoto et al., 2018); dialogue agents (Li et al., 2017b); molecular compound optimization (Cuccarese et al., 2020; Reker, 2020); decision systems (Liu et al., 2018; D’Amour et al., 2020b); and adversarial...
[...]
References
37,989 citations
"Counterfactual reasoning and learni..." refers to background or methods in this paper
...Modern reinforcement learning algorithms (see Sutton and Barto, 1998) leverage the assumption that the policy function, the reward function, the transition function, and the distributions of the corresponding noise variables, are independent from time....
[...]
...• Both multi-armed bandit and contextual bandit are special cases of reinforcement learning (Sutton and Barto, 1998)....
[...]
...In particular, the work presented in this section is closely related to the Monte-Carlo approach of reinforcement learning (Sutton and Barto, 1998, Chapter 5) and to the offline evaluation of contextual bandit policies (Li et al., 2010, 2011)....
[...]
...Keywords: causation, counterfactual reasoning, computational advertising...
[...]
...Under simplified assumptions, multiarmed bandits theory (Robbins, 1952; Auer et al., 2002; Langford and Zhang, 2008) and reinforcement learning (Sutton and Barto, 1998) describe the exploration/exploitation dilemma associated with the training feedback loop....
[...]
12,606 citations
"Counterfactual reasoning and learni..." refers to background in this paper
...The distribution readily factorizes as the product of the joint probability of the named exogenous variables, and, for each equation in the structural equation model, the conditional probability of the effect given its direct causes (Spirtes et al., 1993; Pearl, 2000)....
[...]
7,930 citations
7,016 citations