Showing papers by "Susan Athey published in 2019"


Journal ArticleDOI
TL;DR: The authors propose a flexible, computationally efficient algorithm for growing generalized random forests, which use an adaptive weighting function derived from a forest designed to express heterogeneity in the specified quantity of interest, and provide an estimator for their asymptotic variance that enables valid confidence intervals.
Abstract: We propose generalized random forests, a method for nonparametric statistical estimation based on random forests (Breiman [Mach. Learn. 45 (2001) 5–32]) that can be used to fit any quantity of interest identified as the solution to a set of local moment equations. Following the literature on local maximum likelihood estimation, our method considers a weighted set of nearby training examples; however, instead of using classical kernel weighting functions that are prone to a strong curse of dimensionality, we use an adaptive weighting function derived from a forest designed to express heterogeneity in the specified quantity of interest. We propose a flexible, computationally efficient algorithm for growing generalized random forests, develop a large sample theory for our method showing that our estimates are consistent and asymptotically Gaussian and provide an estimator for their asymptotic variance that enables valid confidence intervals. We use our approach to develop new methods for three statistical tasks: nonparametric quantile regression, conditional average partial effect estimation and heterogeneous treatment effect estimation via instrumental variables. A software implementation, grf for R and C++, is available from CRAN.
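
Below is a minimal Python sketch of the adaptive-weighting idea using scikit-learn rather than the grf package mentioned in the abstract: a fitted regression forest induces neighborhood weights alpha_i(x) from leaf co-membership, and a conditional quantile at a test point is then computed from the weighted training sample. The simulated data, the forest settings, and the use of RandomForestRegressor are illustrative assumptions; grf additionally grows trees with splits targeted at the quantity of interest, which this sketch does not attempt.

```python
# Sketch of forest-induced adaptive weights used to solve a local moment
# equation (here: a conditional quantile). Not the grf implementation.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
Y = X[:, 0] + rng.normal(scale=1 + (X[:, 1] > 0), size=2000)  # heteroskedastic noise

forest = RandomForestRegressor(n_estimators=200, min_samples_leaf=20,
                               random_state=0).fit(X, Y)

def forest_weights(forest, X_train, x):
    """alpha_i(x): share of trees in which X_i falls in the same leaf as x,
    normalized by leaf size (the adaptive kernel induced by the forest)."""
    leaves_train = forest.apply(X_train)           # (n, n_trees) leaf ids
    leaves_x = forest.apply(x.reshape(1, -1))[0]   # (n_trees,)
    w = np.zeros(X_train.shape[0])
    for b in range(leaves_train.shape[1]):
        same = leaves_train[:, b] == leaves_x[b]
        w[same] += 1.0 / same.sum()
    return w / leaves_train.shape[1]

def weighted_quantile(y, w, q):
    """Quantile of y under weights w, the solution of the local moment equation."""
    order = np.argsort(y)
    cum = np.cumsum(w[order]) / w.sum()
    return y[order][min(np.searchsorted(cum, q), len(y) - 1)]

x0 = np.zeros(5)
alpha = forest_weights(forest, X, x0)
print(weighted_quantile(Y, alpha, 0.9))   # forest-weighted 90% conditional quantile at x0
```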

840 citations


Journal ArticleDOI
TL;DR: The authors discuss the relevance of the recent machine learning (ML) literature for economics and econometrics, including the differences in goals, methods, and settings between the ML literature and the traditional econometrics and statistics literatures.
Abstract: We discuss the relevance of the recent machine learning (ML) literature for economics and econometrics. First we discuss the differences in goals, methods, and settings between the ML literature an...

273 citations


Posted Content
TL;DR: The authors apply causal forests to a dataset derived from the National Study of Learning Mindsets, and consider resulting practical and conceptual challenges, and discuss how causal forests use estimated propensity scores to be more robust to confounding and how they handle data with clustered errors.
Abstract: We apply causal forests to a dataset derived from the National Study of Learning Mindsets, and consider resulting practical and conceptual challenges. In particular, we discuss how causal forests use estimated propensity scores to be more robust to confounding, and how they handle data with clustered errors.
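
As a rough illustration of how estimated propensity scores enter, the sketch below shows a local-centering (orthogonalization) step on simulated confounded data: outcomes and treatments are residualized against cross-fitted nuisance estimates before any forest would be grown on the residuals. The data-generating process, the scikit-learn models, and the residual-on-residual check are assumptions of this sketch rather than the grf implementation, and clustered errors are not handled here.

```python
# Local centering with cross-fitted nuisance estimates on simulated confounded data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(1)
n = 5000
X = rng.normal(size=(n, 4))
e = 1 / (1 + np.exp(-X[:, 0]))                 # assignment depends on X -> confounding
W = rng.binomial(1, e)
tau = 1 + 0.5 * X[:, 1]                        # heterogeneous treatment effect
Y = X[:, 0] + tau * W + rng.normal(size=n)

# Cross-fitted (out-of-fold) estimates of E[Y|X] and the propensity e(X) = E[W|X].
Y_hat = cross_val_predict(RandomForestRegressor(min_samples_leaf=20, random_state=0),
                          X, Y, cv=5)
e_hat = cross_val_predict(RandomForestClassifier(min_samples_leaf=20, random_state=0),
                          X, W, cv=5, method="predict_proba")[:, 1]

Y_resid, W_resid = Y - Y_hat, W - e_hat
# A causal forest would now be grown on (X, Y_resid, W_resid). As a crude check,
# the residual-on-residual slope is close to the average effect (= 1 here),
# whereas the raw difference in means is badly confounded.
print(Y[W == 1].mean() - Y[W == 0].mean())                 # biased
print(np.sum(W_resid * Y_resid) / np.sum(W_resid ** 2))    # ~ 1
```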

165 citations


Posted Content
TL;DR: Newly developed methods at the intersection of ML and econometrics, methods that typically perform better than either off-the-shelf ML or more traditional econometric methods when applied to particular classes of problems, are highlighted.
Abstract: We discuss the relevance of the recent Machine Learning (ML) literature for economics and econometrics. First we discuss the differences in goals, methods and settings between the ML literature and the traditional econometrics and statistics literatures. Then we discuss some specific methods from the machine learning literature that we view as important for empirical researchers in economics. These include supervised learning methods for regression and classification, unsupervised learning methods, as well as matrix completion methods. Finally, we highlight newly developed methods at the intersection of ML and econometrics, methods that typically perform better than either off-the-shelf ML or more traditional econometric methods when applied to particular classes of problems, problems that include causal inference for average treatment effects, optimal policy estimation, and estimation of the counterfactual effect of price changes in consumer choice models.

164 citations


Journal ArticleDOI
01 Jan 2019
TL;DR: The authors apply causal forests to a dataset derived from the National Study of Learning Mindsets, and consider resulting practical and conceptual challenges, and discuss how causal forests use estimated propensity scores to be more robust to confounding and how they handle data with clustered errors.
Abstract: We apply causal forests to a dataset derived from the National Study of Learning Mindsets, and consider resulting practical and conceptual challenges. In particular, we discuss how causal forests use estimated propensity scores to be more robust to confounding, and how they handle data with clustered errors.

88 citations


ReportDOI
TL;DR: In this article, the long-term impacts of programs on labor market outcomes can be predicted accurately by combining their short-term treatment effects into a surrogate index, which is the predicted value of the long-term outcome given the short-term outcomes.
Abstract: A common challenge in estimating the long-term impacts of treatments (e.g., job training programs) is that the outcomes of interest (e.g., lifetime earnings) are observed with a long delay. We address this problem by combining several short-term outcomes (e.g., short-run earnings) into a “surrogate index,” the predicted value of the long-term outcome given the short-term outcomes. We show that the average treatment effect on the surrogate index equals the treatment effect on the long-term outcome under the assumption that the long-term outcome is independent of the treatment conditional on the surrogate index. We then characterize the bias that arises from violations of this assumption, deriving feasible bounds on the degree of bias and providing simple methods to validate the key assumption using additional outcomes. Finally, we develop efficient estimators for the surrogate index and show that even in settings where the long-term outcome is observed, using a surrogate index can increase precision. We apply our method to analyze the long-term impacts of a multi-site job training experiment in California. Using short-term employment rates as surrogates, one could have estimated the program's impacts on mean employment rates over a 9 year horizon within 1.5 years, with a 35% reduction in standard errors. Our empirical results suggest that the long-term impacts of programs on labor market outcomes can be predicted accurately by combining their short-term treatment effects into a surrogate index.
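
A minimal sketch of the surrogate-index construction on simulated data: the index is fit in an auxiliary sample where both short-term and long-term outcomes are observed, and the treatment effect is then estimated from the predicted index in the experimental sample, relying on the surrogacy assumption discussed in the abstract. All data-generating choices below are illustrative.

```python
# Surrogate index: predict the long-term outcome from short-term outcomes,
# then take the treatment contrast in the predicted index.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)

def short_term(W, n):
    """Three short-run outcomes whose means shift with treatment."""
    return 0.3 * W[:, None] + rng.normal(size=(n, 3))

# Auxiliary ("historical") sample: short-term outcomes S and long-term outcome Y.
n_hist = 10000
S_hist = rng.normal(size=(n_hist, 3))
Y_hist = S_hist @ np.array([0.5, 0.3, 0.2]) + rng.normal(size=n_hist)
index_model = LinearRegression().fit(S_hist, Y_hist)

# Experimental sample: treatment W and short-term outcomes; long-term Y not yet observed.
n_exp = 5000
W = rng.binomial(1, 0.5, size=n_exp)
S_exp = short_term(W, n_exp)

surrogate_index = index_model.predict(S_exp)
ate_hat = surrogate_index[W == 1].mean() - surrogate_index[W == 0].mean()
print(ate_hat)   # ~ 0.3 * (0.5 + 0.3 + 0.2) = 0.3 under the surrogacy assumption
```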

75 citations


Posted Content
Vitor Hadad, David A. Hirshberg, Ruohan Zhan, Stefan Wager, Susan Athey
TL;DR: The approach is to adaptively reweight the terms of an augmented inverse propensity-weighting estimator to control the contribution of each term to the estimator’s variance, which reduces overall variance and yields an asymptotically normal test statistic.
Abstract: Adaptive experiment designs can dramatically improve statistical efficiency in randomized trials, but they also complicate statistical inference. For example, it is now well known that the sample mean is biased in adaptive trials. Inferential challenges are exacerbated when our parameter of interest differs from the parameter the trial was designed to target, such as when we are interested in estimating the value of a sub-optimal treatment after running a trial to determine the optimal treatment using a stochastic bandit design. In this context, typical estimators that use inverse propensity weighting to eliminate sampling bias can be problematic: their distributions become skewed and heavy-tailed as the propensity scores decay to zero. In this paper, we present a class of estimators that overcome these issues. Our approach is to adaptively reweight the terms of an augmented inverse propensity weighting estimator to control the contribution of each term to the estimator's variance. This adaptive weighting scheme prevents estimates from becoming heavy-tailed, ensuring asymptotically correct coverage. It also reduces variance, allowing us to test hypotheses with greater power -- especially hypotheses that were not targeted by the experimental design. We validate the accuracy of the resulting estimates and their confidence intervals in numerical experiments, and show our methods compare favorably to existing alternatives in terms of RMSE and coverage.
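
A hedged sketch of the adaptively weighted AIPW idea on a toy two-arm trial with a drifting assignment probability. The running-mean plug-in estimate and the variance-stabilizing weight choice h_t = sqrt(e_t) are assumptions of this sketch; the paper develops a broader class of weighting schemes together with the accompanying inference theory.

```python
# Weighted AIPW for adaptively collected bandit data.
import numpy as np

def weighted_aipw_value(arm, W, Y, e, h=None):
    """Estimate the value of `arm` from logged (W_t, Y_t), where e[t] is the
    probability that `arm` was assigned at time t."""
    T = len(Y)
    mu_hat = np.zeros(T)          # running mean of past rewards for `arm`
    seen, total = 0, 0.0
    for t in range(T):
        mu_hat[t] = total / seen if seen > 0 else 0.0
        if W[t] == arm:
            seen += 1
            total += Y[t]
    gamma = mu_hat + (W == arm) * (Y - mu_hat) / e      # AIPW scores
    if h is None:
        h = np.sqrt(e)            # variance-stabilizing weights (assumed form)
    return np.sum(h * gamma) / np.sum(h)

# Toy adaptive trial: the probability of assigning arm 1 decays over time.
rng = np.random.default_rng(3)
T = 20000
e1 = np.clip(np.linspace(0.5, 0.05, T), 0.05, None)
W = rng.binomial(1, e1)
Y = np.where(W == 1, rng.normal(1.0, 1, T), rng.normal(0.0, 1, T))

print(weighted_aipw_value(1, W, Y, e1))   # ~ 1.0, the value of arm 1
```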

66 citations


Journal ArticleDOI
TL;DR: In this paper, the authors explore the skills that PhD economists apply in tech companies, the companies that hire them, the types of problems that economists are currently working on, and the areas of academic research that have emerged in relation to these problems.
Abstract: As technology platforms have created new markets and new ways of acquiring information, economists have come to play an increasingly central role in tech companies, tackling problems such as platform design, strategy, pricing, and policy. Over the past five years, hundreds of PhD economists have accepted positions in the technology sector. In this paper, we explore the skills that PhD economists apply in tech companies, the companies that hire them, the types of problems that economists are currently working on, and the areas of academic research that have emerged in relation to these problems.

57 citations


Posted Content
TL;DR: In this paper, the authors apply Wasserstein GANs to compare a number of different estimators for average treatment effects under unconfoundedness in three distinct settings (corresponding to three real data sets) and present a methodology for assessing the robustness of the results.
Abstract: When researchers develop new econometric methods it is common practice to compare the performance of the new methods to those of existing methods in Monte Carlo studies. The credibility of such Monte Carlo studies is often limited because of the freedom the researcher has in choosing the design. In recent years a new class of generative models has emerged in the machine learning literature, termed Generative Adversarial Networks (GANs), that can be used to systematically generate artificial data that closely mimics real economic datasets, while limiting the degrees of freedom for the researcher and optionally satisfying privacy guarantees with respect to their training data. In addition, if an applied researcher is concerned with the performance of a particular statistical method on a specific data set (beyond its theoretical properties in large samples), she may wish to assess the performance, e.g., the coverage rate of confidence intervals or the bias of the estimator, using simulated data which resembles her setting. To illustrate these methods we apply Wasserstein GANs (WGANs) to compare a number of different estimators for average treatment effects under unconfoundedness in three distinct settings (corresponding to three real data sets) and present a methodology for assessing the robustness of the results. In this example, we find that (i) there is not one estimator that outperforms the others in all three settings, so researchers should tailor their analytic approach to a given setting, and (ii) systematic simulation studies can be helpful for selecting among competing methods in this situation.
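
A sketch of the downstream Monte Carlo comparison this workflow enables, with the fitted WGAN abstracted behind a hypothetical sample_synthetic function (training the generator is not shown). Each replication draws a synthetic dataset and records the bias and RMSE of two simple ATE estimators; both the stand-in generator and the estimators are illustrative, not those studied in the paper.

```python
# Simulation-study loop over synthetic datasets drawn from a fitted generator.
import numpy as np

def sample_synthetic(n, rng):
    """Stand-in for draws from a WGAN fitted to a real dataset (hypothetical)."""
    X = rng.normal(size=(n, 3))
    e = 1 / (1 + np.exp(-X[:, 0]))
    W = rng.binomial(1, e)
    Y = X[:, 0] + 0.5 * W + rng.normal(size=n)
    return X, W, Y, e

def diff_in_means(X, W, Y, e):
    return Y[W == 1].mean() - Y[W == 0].mean()

def ipw(X, W, Y, e):
    return np.mean(W * Y / e - (1 - W) * Y / (1 - e))

rng = np.random.default_rng(4)
true_ate = 0.5
errors = {"diff_in_means": [], "ipw": []}
for _ in range(200):
    data = sample_synthetic(2000, rng)
    errors["diff_in_means"].append(diff_in_means(*data) - true_ate)
    errors["ipw"].append(ipw(*data) - true_ate)

for name, err in errors.items():
    print(name, "bias:", np.mean(err), "rmse:", np.sqrt(np.mean(np.square(err))))
```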

33 citations


Journal ArticleDOI
TL;DR: In this article, the authors apply ensemble methods to synthetic control type problems in panel data, compare the predictive accuracy of three individual methods with an ensemble method, and find that the latter dominates.
Abstract: In many prediction problems researchers have found that combinations of prediction methods ("ensembles") perform better than individual methods. In this paper we apply these ideas to synthetic control type problems in panel data. Here a number of conceptually quite different methods have been developed. We compare the predictive accuracy of three methods with an ensemble method and find that the latter dominates. These results show that ensemble methods are a practical and effective method for the type of data configurations typically encountered in empirical work in economics, and that these methods deserve more attention.

29 citations


Posted Content
TL;DR: In this article, the Synthetic Difference In Differences (SDID) estimator is proposed, which generalizes the Synthetic Control method to two-way (unit and time) fixed effects with both unit and time weights and can be interpreted as a unit and time weighted version of the standard Difference In Differences estimator.
Abstract: We present a new perspective on the Synthetic Control (SC) method as a weighted least squares regression estimator with time fixed effects and unit weights. This perspective suggests a generalization with two way (both unit and time) fixed effects, and both unit and time weights, which can be interpreted as a unit and time weighted version of the standard Difference In Differences (DID) estimator. We find that this new Synthetic Difference In Differences (SDID) estimator has attractive properties compared to the SC and DID estimators. Formally we show that our approach has double robustness properties: the SDID estimator is consistent under a wide variety of weighting schemes given a well-specified fixed effects model, and SDID is consistent with appropriately penalized SC weights when the basic fixed effects model is misspecified and instead the true data generating process involves a more general low-rank structure (e.g., a latent factor model). We also present results that justify standard inference based on weighted DID regression. Further generalizations include unit and time weighted factor models.
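
A minimal sketch of the weighted difference-in-differences form of the estimator for a block design, taking the unit weights omega and time weights lambda as given: with uniform weights the contrast reduces to standard DID, while the paper chooses the weights via penalized synthetic-control-style fits, which are not shown here.

```python
# Unit- and time-weighted DID contrast for a block adoption design.
import numpy as np

def sdid_contrast(Y, treated, post, omega, lam):
    """Y: (N, T) panel. treated: bool (N,), post: bool (T,).
    omega: weights over control units, lam: weights over pre-periods."""
    ctrl, pre = ~treated, ~post
    treated_gap = Y[treated][:, post].mean() - lam @ Y[treated][:, pre].mean(axis=0)
    control_gap = omega @ (Y[ctrl][:, post].mean(axis=1) - Y[ctrl][:, pre] @ lam)
    return treated_gap - control_gap

rng = np.random.default_rng(5)
N, T, tau = 30, 12, 2.0
unit_fe, time_fe = rng.normal(size=(N, 1)), rng.normal(size=(1, T))
treated = np.arange(N) < 5
post = np.arange(T) >= 8
Y = unit_fe + time_fe + tau * np.outer(treated, post) + rng.normal(scale=0.1, size=(N, T))

omega = np.full((~treated).sum(), 1 / (~treated).sum())   # placeholder uniform weights
lam = np.full((~post).sum(), 1 / (~post).sum())
print(sdid_contrast(Y, treated, post, omega, lam))        # ~ tau = 2.0
```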

Journal ArticleDOI
17 Jul 2019
TL;DR: The authors integrate balancing methods from the causal inference literature into the estimation step of contextual bandit algorithms, making them less prone to estimation bias, and provide the first regret bound analyses for linear contextual bandits with balancing.
Abstract: Contextual bandit algorithms are sensitive to the estimation method of the outcome model as well as the exploration method used, particularly in the presence of rich heterogeneity or complex outcome models, which can lead to difficult estimation problems along the path of learning. We develop algorithms for contextual bandits with linear payoffs that integrate balancing methods from the causal inference literature in their estimation to make it less prone to problems of estimation bias. We provide the first regret bound analyses for linear contextual bandits with balancing and show that our algorithms match the state of the art theoretical guarantees. We demonstrate the strong practical advantage of balanced contextual bandits on a large number of supervised learning datasets and on a synthetic example that simulates model misspecification and prejudice in the initial training data.
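
A hedged sketch of the balancing idea in a linear contextual bandit: each arm's outcome model is fit by propensity-weighted ridge regression, so contexts that an arm rarely receives are up-weighted. The epsilon-greedy assignment rule and the weight clipping below are simplifications, not the paper's exact algorithms or the designs covered by its regret guarantees.

```python
# Balanced (propensity-weighted) ridge regression inside an epsilon-greedy linear bandit.
import numpy as np

rng = np.random.default_rng(6)
d, K, T, eps = 5, 3, 5000, 0.1
theta = rng.normal(size=(K, d))                    # true arm parameters

A = np.stack([np.eye(d) for _ in range(K)])        # per-arm ridge Gram matrices
b = np.zeros((K, d))

for t in range(T):
    x = rng.normal(size=d)
    beta = np.stack([np.linalg.solve(A[k], b[k]) for k in range(K)])
    greedy = int(np.argmax(beta @ x))
    probs = np.full(K, eps / K)
    probs[greedy] += 1 - eps                       # epsilon-greedy propensities
    w = rng.choice(K, p=probs)
    y = theta[w] @ x + rng.normal()

    weight = min(1.0 / probs[w], 100.0)            # clipped balancing weight
    A[w] += weight * np.outer(x, x)                # propensity-weighted ridge update
    b[w] += weight * y * x

beta = np.stack([np.linalg.solve(A[k], b[k]) for k in range(K)])
print(np.round(beta - theta, 2))                   # per-arm estimation error
```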

Posted Content
TL;DR: There is not one estimator that outperforms the others in all three settings, so researchers should tailor their analytic approach to a given setting, and systematic simulation studies can be helpful for selecting among competing methods in this situation.
Abstract: When researchers develop new econometric methods it is common practice to compare the performance of the new methods to those of existing methods in Monte Carlo studies. The credibility of such Monte Carlo studies is often limited because of the freedom the researcher has in choosing the design. In recent years a new class of generative models has emerged in the machine learning literature, termed Generative Adversarial Networks (GANs), that can be used to systematically generate artificial data that closely mimics real economic datasets, while limiting the degrees of freedom for the researcher and optionally satisfying privacy guarantees with respect to their training data. In addition, if an applied researcher is concerned with the performance of a particular statistical method on a specific data set (beyond its theoretical properties in large samples), she may wish to assess the performance, e.g., the coverage rate of confidence intervals or the bias of the estimator, using simulated data which resembles her setting. To illustrate these methods we apply Wasserstein GANs (WGANs) to compare a number of different estimators for average treatment effects under unconfoundedness in three distinct settings (corresponding to three real data sets) and present a methodology for assessing the robustness of the results. In this example, we find that (i) there is not one estimator that outperforms the others in all three settings, so researchers should tailor their analytic approach to a given setting, and (ii) systematic simulation studies can be helpful for selecting among competing methods in this situation.

Posted Content
TL;DR: A method is proposed for estimating consumer preferences among discrete choices, where the consumer chooses at most one product in a category but selects from multiple categories in parallel; it improves over traditional modeling approaches that consider each category in isolation.
Abstract: This paper proposes a method for estimating consumer preferences among discrete choices, where the consumer chooses at most one product in a category, but selects from multiple categories in parallel. The consumer's utility is additive in the different categories. Her preferences about product attributes as well as her price sensitivity vary across products and are in general correlated across products. We build on techniques from the machine learning literature on probabilistic models of matrix factorization, extending the methods to account for time-varying product attributes and products going out of stock. We evaluate the performance of the model using held-out data from weeks with price changes or out of stock products. We show that our model improves over traditional modeling approaches that consider each category in isolation. One source of the improvement is the ability of the model to accurately estimate heterogeneity in preferences (by pooling information across categories); another source of improvement is its ability to estimate the preferences of consumers who have rarely or never made a purchase in a given category in the training data. Using held-out data, we show that our model can accurately distinguish which consumers are most price sensitive to a given product. We consider counterfactuals such as personally targeted price discounts, showing that using a richer model such as the one we propose substantially increases the benefits of personalization in discounts.
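
A much-simplified sketch of the matrix-factorization building block: consumer and product latent factors are learned from a consumer-by-product purchase-count matrix, so that preference estimates pool information across categories. The NMF model and simulated counts are stand-ins; the paper's model is a utility-based probabilistic model that also handles prices, time-varying attributes, and stock-outs.

```python
# Latent-factor decomposition of a consumer x product purchase-count matrix.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(10)
n_users, n_products, k = 300, 50, 4
user_f = rng.gamma(2.0, 1.0, size=(n_users, k))
prod_f = rng.gamma(2.0, 1.0, size=(k, n_products))
counts = rng.poisson(user_f @ prod_f / k)              # observed purchase counts

model = NMF(n_components=k, init="nndsvda", max_iter=500, random_state=0)
U = model.fit_transform(counts)                        # consumer latent factors
V = model.components_                                  # product latent factors
scores = U @ V                                         # predicted purchase intensity

# Rank products for one consumer, including categories they have rarely bought from.
print(np.argsort(-scores[0])[:5])
```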

Posted Content
TL;DR: In this article, the authors proposed an analytically feasible solution to the optimal treatment design problem where the variance of the treatment effect estimator is at most 1+O(1/N^2) times the variance using the treatment design, where N is the number of units.
Abstract: Experimentation has become an increasingly prevalent tool for guiding decision-making and policy choices. A common hurdle in designing experiments is the lack of statistical power. In this paper, we study the optimal multi-period experimental design under the constraint that the treatment cannot be easily removed once implemented; for example, a government might implement a public health intervention in different geographies at different times, where the treatment cannot be easily removed due to practical constraints. The treatment design problem is to select which geographies (referred to as units) to treat at which time, intending to test hypotheses about the effect of the treatment. When the potential outcome is a linear function of unit and time effects, and discrete observed/latent covariates, we provide an analytically feasible solution to the optimal treatment design problem where the variance of the treatment effect estimator is at most 1+O(1/N^2) times the variance using the optimal treatment design, where N is the number of units. This solution assigns units in a staggered treatment adoption pattern - if the treatment only affects one period, the optimal fraction of treated units in each period increases linearly in time; if the treatment affects multiple periods, the optimal fraction increases non-linearly in time, smaller at the beginning and larger at the end. In the general setting where outcomes depend on latent covariates, we show that historical data can be utilized in designing experiments. We propose a data-driven local search algorithm to assign units to treatment times. We demonstrate that our approach improves upon benchmark experimental designs via synthetic interventions on the influenza occurrence rate and synthetic experiments on interventions for in-home medical services and grocery expenditure.
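
A hedged sketch of a local-search heuristic for the treatment-design problem: each unit is assigned an adoption period (treatment stays on once it starts), candidate designs are scored by the variance of the OLS treatment coefficient in a two-way fixed-effects regression, and random single-unit moves are kept when they reduce that variance. Both the objective and the move set are generic simplifications of the paper's data-driven algorithm.

```python
# Greedy local search over staggered adoption designs with a two-way FE variance objective.
import numpy as np

N, T = 12, 8
rng = np.random.default_rng(7)

def design_variance(start_times):
    """Variance (up to sigma^2) of the treatment coefficient in Y ~ unit FE + time FE + D."""
    D = (np.arange(T)[None, :] >= start_times[:, None]).astype(float).ravel()
    unit = np.kron(np.eye(N), np.ones((T, 1)))        # unit dummies
    time = np.kron(np.ones((N, 1)), np.eye(T))        # time dummies
    Z = np.column_stack([unit, time[:, 1:]])          # drop one time dummy
    M = np.eye(N * T) - Z @ np.linalg.pinv(Z)         # residual-maker for the fixed effects
    denom = D @ M @ D
    return np.inf if denom < 1e-8 else 1.0 / denom

# Start from a random staggered design (start time T means "never treated").
starts = rng.integers(1, T + 1, size=N)
best = design_variance(starts)
for _ in range(2000):                                 # greedy local search
    cand = starts.copy()
    cand[rng.integers(N)] = rng.integers(1, T + 1)
    v = design_variance(cand)
    if v < best:
        starts, best = cand, v

print(np.sort(starts), best)   # chosen adoption times and attained (relative) variance
```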

Posted Content
TL;DR: This paper combines predictions from the separate models using ensemble methods, focusing on a weighted average of the three individual methods, with non-negative weights determined through out-of-sample cross-validation.
Abstract: This paper studies a panel data setting where the goal is to estimate causal effects of an intervention by predicting the counterfactual values of outcomes for treated units, had they not received the treatment. Several approaches have been proposed for this problem, including regression methods, synthetic control methods and matrix completion methods. This paper considers an ensemble approach, and shows that it performs better than any of the individual methods in several economic datasets. Matrix completion methods are often given the most weight by the ensemble, but this clearly depends on the setting. We argue that ensemble methods present a fruitful direction for further research in the causal panel data setting.
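
A minimal sketch of the ensemble step: held-out predictions from several counterfactual methods are combined with non-negative weights chosen by non-negative least squares on validation data. The three base predictors below are placeholders with different error profiles, not the regression, synthetic control, and matrix completion estimators used in the paper, and normalizing the weights to sum to one afterwards is a simplification.

```python
# Non-negative ensemble weights chosen on held-out predictions.
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(8)
n_valid = 200
truth = rng.normal(size=n_valid)                       # held-out outcomes (control cells)

# Placeholder base predictions with different error profiles.
preds = np.column_stack([
    truth + rng.normal(scale=0.5, size=n_valid),        # e.g. regression-style
    truth + rng.normal(scale=0.8, size=n_valid),        # e.g. synthetic-control-style
    truth + 0.3 + rng.normal(scale=0.4, size=n_valid),  # e.g. matrix-completion-style, biased
])

weights, _ = nnls(preds, truth)                        # non-negative ensemble weights
weights /= weights.sum()                               # normalize to a weighted average
print(np.round(weights, 2))

ensemble_pred = preds @ weights                        # combined counterfactual prediction
print("ensemble RMSE:", np.sqrt(np.mean((ensemble_pred - truth) ** 2)))
```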

Journal ArticleDOI
TL;DR: This paper proposes a data-driven local search algorithm to assign units to treatment times and demonstrates that this approach improves upon benchmark experimental designs via synthetic interventions on the influenza occurrence rate and synthetic experiments on interventions for in-home medical services and grocery expenditure.
Abstract: Experimentation has become an increasingly prevalent tool for guiding decision-making and policy choices. A common hurdle in designing experiments is the lack of statistical power. In this paper, we study the optimal multi-period experimental design under the constraint that the treatment cannot be easily removed once implemented; for example, a government might implement a public health intervention in different geographies at different times, where the treatment cannot be easily removed due to practical constraints. The treatment design problem is to select which geographies (referred to as units) to treat at which time, intending to test hypotheses about the effect of the treatment. When the potential outcome is a linear function of unit and time effects, and discrete observed/latent covariates, we provide an analytically feasible solution to the optimal treatment design problem where the variance of the treatment effect estimator is at most 1+O(1/N^2) times the variance using the optimal treatment design, where N is the number of units. This solution assigns units in a staggered treatment adoption pattern -- if the treatment only affects one period, the optimal fraction of treated units in each period increases linearly in time; if the treatment affects multiple periods, the optimal fraction increases non-linearly in time, smaller at the beginning and larger at the end. In the general setting where outcomes depend on latent covariates, we show that historical data can be utilized in designing experiments. We propose a data-driven local search algorithm to assign units to treatment times. We demonstrate that our approach improves upon benchmark experimental designs via synthetic interventions on the influenza occurrence rate and synthetic experiments on interventions for in-home medical services and grocery expenditure.

Posted Content
TL;DR: This paper investigates simple alternative solutions for universally consistent estimators that rely on lower-dimensional real-valued representations of categorical variables that are "sufficient" in the sense that no predictive information is lost.
Abstract: Many learning algorithms require categorical data to be transformed into real vectors before they can be used as input. Often, categorical variables are encoded as one-hot (or dummy) vectors. However, this mode of representation can be wasteful since it adds many low-signal regressors, especially when the number of unique categories is large. In this paper, we investigate simple alternative solutions for universally consistent estimators that rely on lower-dimensional real-valued representations of categorical variables that are "sufficient" in the sense that no predictive information is lost. We then compare preexisting and proposed methods on simulated and observational datasets.
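
A hedged sketch of one lower-dimensional alternative to one-hot encoding: each category is replaced by the out-of-fold mean of the outcome within that category, so a single real-valued column stands in for hundreds of dummies. This "means encoding" is a common construction in this spirit, not necessarily the exact sufficient representations proposed in the paper.

```python
# Out-of-fold means encoding of a high-cardinality categorical variable.
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

rng = np.random.default_rng(9)
n, n_categories = 5000, 200
df = pd.DataFrame({
    "zip": rng.integers(0, n_categories, size=n),     # high-cardinality categorical
    "x": rng.normal(size=n),
})
category_effect = rng.normal(size=n_categories)
df["y"] = category_effect[df["zip"]] + df["x"] + rng.normal(size=n)

# One real-valued encoded column replaces 200 dummy regressors.
df["zip_enc"] = np.nan
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(df):
    fold_means = df.iloc[train_idx].groupby("zip")["y"].mean()
    df.loc[df.index[test_idx], "zip_enc"] = df.iloc[test_idx]["zip"].map(fold_means).values
df["zip_enc"] = df["zip_enc"].fillna(df["y"].mean())  # unseen categories: grand mean

print(df[["zip_enc", "x", "y"]].head())
```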

Posted Content
TL;DR: In this paper, the relevance of the recent Machine Learning (ML) literature for economics and econometrics is discussed, along with specific methods from the machine learning literature that are important for empirical researchers in economics.
Abstract: We discuss the relevance of the recent Machine Learning (ML) literature for economics and econometrics. First we discuss the differences in goals, methods and settings between the ML literature and the traditional econometrics and statistics literatures. Then we discuss some specific methods from the machine learning literature that we view as important for empirical researchers in economics. These include supervised learning methods for regression and classification, unsupervised learning methods, as well as matrix completion methods. Finally, we highlight newly developed methods at the intersection of ML and econometrics, methods that typically perform better than either off-the-shelf ML or more traditional econometric methods when applied to particular classes of problems, problems that include causal inference for average treatment effects, optimal policy estimation, and estimation of the counterfactual effect of price changes in consumer choice models.

Journal ArticleDOI
TL;DR: The authors congratulate Wang and Blei (2018) on a thought-provoking article on causal inference in settings with unobserved confounders and expect that its ideas will lead to further developments.
Abstract: We congratulate the authors of Wang and Blei (2018) on a thought-provoking article on causal inference in settings with unobserved confounders. We expect that their ideas will lead to further devel...
