
Showing papers by "Victor Chernozhukov published in 2016"


ReportDOI
TL;DR: In this article, a general construction of locally robust/orthogonal moment functions for GMM, in which moment conditions have zero derivative with respect to first steps, is given, along with debiased machine learning estimators of functionals of high-dimensional conditional quantiles and of dynamic discrete choice parameters with high-dimensional state variables.
Abstract: Many economic and causal parameters depend on nonparametric or high dimensional first steps. We give a general construction of locally robust/orthogonal moment functions for GMM, where moment conditions have zero derivative with respect to first steps. We show that orthogonal moment functions can be constructed by adding to identifying moments the nonparametric influence function for the effect of the first step on identifying moments. Orthogonal moments reduce model selection and regularization bias, which is very important in many applications, especially those with machine learning first steps. We give debiased machine learning estimators of functionals of high dimensional conditional quantiles and of dynamic discrete choice parameters with high dimensional state variables. We show that adding to identifying moments the nonparametric influence function provides a general construction of orthogonal moments, including regularity conditions, and show that the nonparametric influence function is robust to additional unknown functions on which it depends. We give a general approach to estimating the unknown functions in the nonparametric influence function and use it to automatically debias estimators of functionals of high dimensional conditional location learners. We give a variety of new doubly robust moment equations and characterize double robustness. We give general and simple regularity conditions and apply these for asymptotic inference on functionals of high dimensional regression quantiles and dynamic discrete choice parameters with high dimensional state variables.
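To make the construction concrete, here is a minimal sketch in generic notation (the symbols $g$, $\phi$, $\gamma$, $\lambda$ are our labels for the identifying moment, the influence-function adjustment, and the first-step nuisance functions; they are not taken from the abstract):

$$\psi(w;\theta,\gamma,\lambda)=g(w;\theta,\gamma)+\phi(w;\theta,\gamma,\lambda),\qquad \frac{\partial}{\partial r}\,\mathrm{E}\big[\psi(W;\theta_{0},\gamma_{0}+r(\gamma-\gamma_{0}),\lambda_{0})\big]\Big|_{r=0}=0,$$

so that the orthogonal moment $\psi$ has a zero (Gateaux) derivative with respect to the first step $\gamma$, which is the local robustness property the abstract describes.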

201 citations


ReportDOI
TL;DR: The resulting method could be called a "double ML" method because it relies on estimating primary and auxiliary predictive models; it achieves the fastest rates of convergence and exhibits robust behavior with respect to a broader class of probability distributions than naive "single" ML estimators.
Abstract: Most modern supervised statistical/machine learning (ML) methods are explicitly designed to solve prediction problems very well. Achieving this goal does not imply that these methods automatically deliver good estimators of causal parameters. Examples of such parameters include individual regression coefficients, average treatment effects, average lifts, and demand or supply elasticities. In fact, estimates of such causal parameters obtained via naively plugging ML estimators into estimating equations for such parameters can behave very poorly due to the regularization bias. Fortunately, this regularization bias can be removed by solving auxiliary prediction problems via ML tools. Specifically, we can form an orthogonal score for the target low-dimensional parameter by combining auxiliary and main ML predictions. The score is then used to build a de-biased estimator of the target parameter which typically will converge at the fastest possible $1/\sqrt{n}$ rate and be approximately unbiased and normal, and from which valid confidence intervals for these parameters of interest may be constructed. The resulting method thus could be called a "double ML" method because it relies on estimating primary and auxiliary predictive models. In order to avoid overfitting, our construction also makes use of K-fold sample splitting, which we call cross-fitting. This allows us to use a very broad set of ML predictive methods in solving the auxiliary and main prediction problems, such as random forests, lasso, ridge, deep neural nets, boosted trees, as well as various hybrids and aggregators of these methods.
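A minimal sketch of the cross-fitted "double ML" recipe for a partially linear model $Y = D\theta_0 + g(X) + \varepsilon$ (the model, the use of random forests, and all function and variable names are illustrative assumptions, not the paper's code):

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

def double_ml_plr(y, d, X, n_folds=5, seed=0):
    """Cross-fitted residual-on-residual estimate of theta with a simple SE."""
    y_res = np.zeros(len(y))  # y - E[y|X], predicted out of fold
    d_res = np.zeros(len(d))  # d - E[d|X], predicted out of fold
    for train, test in KFold(n_folds, shuffle=True, random_state=seed).split(X):
        # auxiliary and main prediction problems, fit on the complementary folds
        m_y = RandomForestRegressor(random_state=seed).fit(X[train], y[train])
        m_d = RandomForestRegressor(random_state=seed).fit(X[train], d[train])
        y_res[test] = y[test] - m_y.predict(X[test])
        d_res[test] = d[test] - m_d.predict(X[test])
    # orthogonal score: regress outcome residuals on treatment residuals
    theta = np.sum(d_res * y_res) / np.sum(d_res ** 2)
    eps = y_res - theta * d_res
    se = np.sqrt(np.mean(d_res ** 2 * eps ** 2) / len(y)) / np.mean(d_res ** 2)
    return theta, se

Because the nuisance predictions are always made out of fold, the broad menu of ML learners mentioned in the abstract (lasso, boosting, neural nets, hybrids) could be substituted for the random forests without changing the rest of the sketch.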

157 citations


Journal ArticleDOI
TL;DR: In this paper, the authors proposed new methods for estimating and constructing confidence regions for a regression parameter of primary interest, a parameter in front of the regressor of interest, such as the treatment variable or a policy variable.
Abstract: This article considers generalized linear models in the presence of many controls. We lay out a general methodology to estimate an effect of interest based on the construction of an instrument that immunizes against model selection mistakes and apply it to the case of a logistic binary choice model. More specifically, we propose new methods for estimating and constructing confidence regions for a regression parameter of primary interest α0, a parameter in front of the regressor of interest, such as the treatment variable or a policy variable. These methods allow estimation of α0 at the root-n rate when the total number p of other regressors, called controls, potentially exceeds the sample size n, using sparsity assumptions. The sparsity assumption means that there is a subset of s < n controls which suffices to accurately approximate the nuisance part of the regression function. Importantly, the estimators and the resulting confidence regions are valid uniformly over s-sparse models satisfying $s^2\log^2 p = o(n)$...
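A minimal sketch of the setting described above, in notation we add for illustration (the logistic link $\Lambda$ and the symbols $d_i$, $x_i$ are ours):

$$\mathrm{P}(y_i=1\mid d_i,x_i)=\Lambda\big(d_i\alpha_0 + x_i'\beta_0\big),\qquad \Lambda(t)=\frac{e^{t}}{1+e^{t}},$$

where $d_i$ is the regressor of interest (e.g., a treatment or policy variable), $x_i$ collects the $p$ controls with $p$ possibly exceeding $n$, and the control coefficients are approximately sparse with $s$ important entries; the instrument-based construction then yields root-n inference on $\alpha_0$ uniformly over such sparse models.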

114 citations


Posted Content
TL;DR: This work can form an orthogonal score for the target low-dimensional parameter by combining auxiliary and main ML predictions, and build a de-biased estimator of the target parameter which typically will converge at the fastest possible 1/root(n) rate and be approximately unbiased and normal, and from which valid confidence intervals for these parameters of interest may be constructed.
Abstract: Most modern supervised statistical/machine learning (ML) methods are explicitly designed to solve prediction problems very well. Achieving this goal does not imply that these methods automatically deliver good estimators of causal parameters. Examples of such parameters include individual regression coefficients, average treatment effects, average lifts, and demand or supply elasticities. In fact, estimates of such causal parameters obtained via naively plugging ML estimators into estimating equations for such parameters can behave very poorly due to the regularization bias. Fortunately, this regularization bias can be removed by solving auxiliary prediction problems via ML tools. Specifically, we can form an orthogonal score for the target low-dimensional parameter by combining auxiliary and main ML predictions. The score is then used to build a de-biased estimator of the target parameter which typically will converge at the fastest possible $1/\sqrt{n}$ rate and be approximately unbiased and normal, and from which valid confidence intervals for these parameters of interest may be constructed. The resulting method thus could be called a "double ML" method because it relies on estimating primary and auxiliary predictive models. In order to avoid overfitting, our construction also makes use of K-fold sample splitting, which we call cross-fitting. This allows us to use a very broad set of ML predictive methods in solving the auxiliary and main prediction problems, such as random forests, lasso, ridge, deep neural nets, boosted trees, as well as various hybrids and aggregators of these methods.

104 citations


Journal ArticleDOI
TL;DR: In particular, this article showed that vanishingly small individual uncertainty about the signal distributions can lead to substantial (nonvanishing) differences in asymptotic beliefs, and characterized the conditions under which a small amount of uncertainty leads to significant divergence.
Abstract: Under the assumption that individuals know the conditional distributions of signals given the payoff-relevant parameters, existing results conclude that as individuals observe infinitely many signals, their beliefs about the parameters will eventually merge. We first show that these results are fragile when individuals are uncertain about the signal distributions: given any such model, vanishingly small individual uncertainty about the signal distributions can lead to substantial (nonvanishing) differences in asymptotic beliefs. Under a uniform convergence assumption, we then characterize the conditions under which a small amount of uncertainty leads to significant asymptotic disagreement.

71 citations


ReportDOI
TL;DR: In this article, the authors derived strong approximations to the supremum of the non-centered empirical process indexed by a possibly unbounded VC-type class of functions by the suprema of the Gaussian and bootstrap processes.

56 citations


Posted Content
TL;DR: This paper derives non-asymptotic error bounds for the Lasso estimator when the penalty parameter for the estimator is chosen using $K$-fold cross-validation; the bounds serve as a justification for the widespread practice of using cross-validation to choose the penalty parameter.
Abstract: In this paper, we derive non-asymptotic error bounds for the Lasso estimator when the penalty parameter for the estimator is chosen using $K$-fold cross-validation. Our bounds imply that the cross-validated Lasso estimator has nearly optimal rates of convergence in the prediction, $L^2$, and $L^1$ norms. For example, we show that in the model with Gaussian noise and under fairly general assumptions on the candidate set of values of the penalty parameter, the estimation error of the cross-validated Lasso estimator converges to zero in the prediction norm with the $\sqrt{s\log p / n}\times \sqrt{\log(p n)}$ rate, where $n$ is the sample size of available data, $p$ is the number of covariates, and $s$ is the number of non-zero coefficients in the model. Thus, the cross-validated Lasso estimator achieves the fastest possible rate of convergence in the prediction norm up to a small logarithmic factor $\sqrt{\log(p n)}$, and similar conclusions apply for the convergence rates both in $L^2$ and in $L^1$ norms. Importantly, our results cover the case when $p$ is (potentially much) larger than $n$ and also allow for the case of non-Gaussian noise. Our paper therefore serves as a justification for the widespread practice of using cross-validation as a method to choose the penalty parameter for the Lasso estimator.
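A minimal sketch of the practice the paper analyzes, i.e. choosing the Lasso penalty over a candidate grid by $K$-fold cross-validation (scikit-learn is used purely for illustration; the grid, $K=5$, and the simulated design are our assumptions, not the paper's):

import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n, p, s = 200, 500, 5                      # p >> n, s-sparse coefficient vector
X = rng.standard_normal((n, p))
beta = np.zeros(p); beta[:s] = 1.0
y = X @ beta + rng.standard_normal(n)      # Gaussian noise

alphas = np.geomspace(1e-3, 1.0, 50)       # candidate penalty values
cv_lasso = LassoCV(alphas=alphas, cv=KFold(5, shuffle=True, random_state=0)).fit(X, y)
print(cv_lasso.alpha_, np.count_nonzero(cv_lasso.coef_))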

46 citations


ReportDOI
TL;DR: The vector quantile regression (VQR) as discussed by the authors is a linear model for the conditional vector quantile function (CVQF) of a random vector $Y$ given covariates $Z=z$; the CVQF admits a strong representation $Y=Q_{Y|Z}(U,Z)$ almost surely for some version of $U$.
Abstract: We propose a notion of conditional vector quantile function and a vector quantile regression. A conditional vector quantile function (CVQF) of a random vector $Y$, taking values in $\mathbb{R}^{d}$ given covariates $Z=z$, taking values in $\mathbb{R}^{k}$, is a map $u\longmapsto Q_{Y|Z}(u,z)$, which is monotone, in the sense of being a gradient of a convex function, and such that given that vector $U$ follows a reference non-atomic distribution $F_{U}$, for instance uniform distribution on a unit cube in $\mathbb{R}^{d}$, the random vector $Q_{Y|Z}(U,z)$ has the distribution of $Y$ conditional on $Z=z$. Moreover, we have a strong representation, $Y=Q_{Y|Z}(U,Z)$ almost surely, for some version of $U$. The vector quantile regression (VQR) is a linear model for CVQF of $Y$ given $Z$. Under correct specification, the notion produces strong representation, $Y=\beta (U)^{\top}f(Z)$, for $f(Z)$ denoting a known set of transformations of $Z$, where $u\longmapsto\beta(u)^{\top}f(Z)$ is a monotone map, the gradient of a convex function and the quantile regression coefficients $u\longmapsto\beta(u)$ have the interpretations analogous to that of the standard scalar quantile regression. As $f(Z)$ becomes a richer class of transformations of $Z$, the model becomes nonparametric, as in series modelling. A key property of VQR is the embedding of the classical Monge–Kantorovich’s optimal transportation problem at its core as a special case. In the classical case, where $Y$ is scalar, VQR reduces to a version of the classical QR, and CVQF reduces to the scalar conditional quantile function. An application to multiple Engel curve estimation is considered.

39 citations


Journal ArticleDOI
TL;DR: In this paper, double-lasso regression is used to identify which covariates have sufficient empirical support for inclusion in analyses of correlations, moderation, mediation and experimental interventions, as well as to test for the effectiveness of randomization.
Abstract: The decision of whether to control for covariates, and how to select which covariates to include, is ubiquitous in psychological research. Failing to control for valid covariates can yield biased parameter estimates in correlational analyses or in imperfectly randomized experiments and contributes to underpowered analyses even in effectively randomized experiments. We introduce double-lasso regression as a principled method for variable selection. The double-lasso method is calibrated to not over-select potentially spurious covariates, and simulations demonstrate that using this method reduces error and increases statistical power. This method can be used to identify which covariates have sufficient empirical support for inclusion in analyses of correlations, moderation, mediation and experimental interventions, as well as to test for the effectiveness of randomization. We illustrate both the method’s usefulness and how to implement it in practice by applying it to four analyses from the prior literature, using both correlational and experimental data.
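A minimal sketch of the double-lasso (double-selection) recipe described above (scikit-learn and statsmodels are used purely for illustration, the function and variable names are ours, and cross-validated penalties stand in for the calibrated penalty choice the article recommends):

import numpy as np
from sklearn.linear_model import LassoCV
import statsmodels.api as sm

def double_lasso(y, d, X):
    """Keep controls selected in either lasso, then refit the effect of d by OLS."""
    sel_y = np.flatnonzero(LassoCV(cv=5).fit(X, y).coef_)   # controls that predict the outcome
    sel_d = np.flatnonzero(LassoCV(cv=5).fit(X, d).coef_)   # controls that predict the treatment
    keep = np.union1d(sel_y, sel_d)                          # union of the two selected sets
    Z = sm.add_constant(np.column_stack([d, X[:, keep]]))
    fit = sm.OLS(y, Z).fit(cov_type="HC1")                   # heteroskedasticity-robust SEs
    return fit.params[1], fit.bse[1]                         # coefficient and SE on d

Selecting the union of the two lasso-selected sets is what protects the estimate against omitted-variable bias from model selection mistakes in either equation.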

36 citations


Posted Content
TL;DR: The High-dimensional Metrics (hdm) package as discussed by the authors is a collection of statistical methods for estimation and quantification of uncertainty in high-dimensional approximately sparse models, focusing on providing confidence intervals and significance testing for (possibly many) low-dimensional subcomponents of the high-dimensional parameter vector.
Abstract: In this article the package High-dimensional Metrics (\texttt{hdm}) is introduced. It is a collection of statistical methods for estimation and quantification of uncertainty in high-dimensional approximately sparse models. It focuses on providing confidence intervals and significance testing for (possibly many) low-dimensional subcomponents of the high-dimensional parameter vector. Efficient estimators and uniformly valid confidence intervals for regression coefficients on target variables (e.g., treatment or policy variables) in a high-dimensional approximately sparse regression model, for the average treatment effect (ATE) and the average treatment effect for the treated (ATET), as well as for extensions of these parameters to the endogenous setting, are provided. Theory-grounded, data-driven methods for selecting the penalization parameter in Lasso regressions under heteroscedastic and non-Gaussian errors are implemented. Moreover, joint/simultaneous confidence intervals for regression coefficients of a high-dimensional sparse regression are implemented. Data sets which have been used in the literature and might be useful for classroom demonstration and for testing new estimators are included.

33 citations


ReportDOI
TL;DR: The High-dimensional Metrics (hdm) package as discussed by the authors is a collection of statistical methods for estimation and quantification of uncertainty in high-dimensional approximately sparse models, focusing on providing confidence intervals and significance testing for (possibly many) low-dimensional subcomponents of the high-dimensional parameter vector.
Abstract: In this article the package High-dimensional Metrics (\texttt{hdm}) is introduced. It is a collection of statistical methods for estimation and quantification of uncertainty in high-dimensional approximately sparse models. It focuses on providing confidence intervals and significance testing for (possibly many) low-dimensional subcomponents of the high-dimensional parameter vector. Efficient estimators and uniformly valid confidence intervals for regression coefficients on target variables (e.g., treatment or policy variables) in a high-dimensional approximately sparse regression model, for the average treatment effect (ATE) and the average treatment effect for the treated (ATET), as well as for extensions of these parameters to the endogenous setting, are provided. Theory-grounded, data-driven methods for selecting the penalization parameter in Lasso regressions under heteroscedastic and non-Gaussian errors are implemented. Moreover, joint/simultaneous confidence intervals for regression coefficients of a high-dimensional sparse regression are implemented. Data sets which have been used in the literature and might be useful for classroom demonstration and for testing new estimators are included.

ReportDOI
TL;DR: In this article, the authors propose inference methods for quantile and quantile effect (QE) functions, which are important tools for descriptive and causal analysis due to their natural and intuitive interpretation.
Abstract: Quantile and quantile effect (QE) functions are important tools for descriptive and causal analysis due to their natural and intuitive interpretation. Existing inference methods for these functions...

Posted Content
TL;DR: In this article, a command called cqiv conducts censored quantile instrumental variable (CQIV) estimation; the command can implement both censored and uncensored quantile IV estimation under either exogeneity or endogeneity.
Abstract: cqiv conducts censored quantile instrumental variable (CQIV) estimation. This command can implement both censored and uncensored quantile IV estimation under either exogeneity or endogeneity. The estimator proposed by Chernozhukov, Fernandez-Val and Kowalski (2010) is used if CQIV estimation is implemented. A parametric version of the estimator proposed by Lee (2007) is used if quantile IV estimation without censoring is implemented. The estimator proposed by Chernozhukov and Hong (2002) is used if censored quantile regression (CQR) is estimated without endogeneity. Note that all the variables in the parentheses of the syntax are those involved in the first-stage estimation of CQIV and QIV.

Posted ContentDOI
06 Jun 2016
TL;DR: The framework is intended to quantify dependence in the non-Gaussian settings that are ubiquitous in many econometric applications; it is applied to study financial contagion and the impact of downside movements in the market on the dependence structure of assets’ returns.
Abstract: We propose Quantile Graphical Models (QGMs) to characterize predictive and conditional independence relationships within a set of random variables of interest. This framework is intended to quantify dependence in the non-Gaussian settings that are ubiquitous in many econometric applications. We consider two distinct QGMs. First, Conditional Independence QGMs characterize conditional independence at each quantile index, revealing the distributional dependence structure. Second, Predictive QGMs characterize the best linear predictor under asymmetric loss functions. Under Gaussianity these notions essentially coincide, but non-Gaussian settings lead us to different models, as prediction and conditional independence are fundamentally different properties. Combined, the models complement the methods based on normal and nonparanormal distributions that study mean predictability and use covariance and precision matrices for conditional independence. We also propose estimators for each QGM. The estimators are based on high-dimensional techniques, including (a continuum of) $\ell_1$-penalized quantile regressions and low-bias equations, which allow us to handle a potentially large number of variables. We build upon recent results to obtain valid choices of the penalty parameters and rates of convergence. These results are derived without any assumptions on the separation from zero and are uniformly valid across a wide range of models. With the additional assumption that the coefficients are well-separated from zero, we can consistently estimate the graph associated with the dependence structure by hard thresholding the proposed estimators. Further, we show how QGMs can be used to represent the tail interdependence of the variables, which plays an important role in applications concerned with extreme events, as opposed to average behavior. We show that the associated tail risk network can be used for measuring systemic risk contributions. We also apply the framework to study financial contagion and the impact of downside movements in the market on the dependence structure of assets’ returns. Finally, we illustrate the properties of the proposed framework through simulated examples.
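A minimal sketch of the node-wise estimation idea behind Conditional Independence QGMs, fitting an $\ell_1$-penalized quantile regression of each variable on the rest over a grid of quantile indices (scikit-learn's QuantileRegressor is used purely as an illustration; the penalty level, the quantile grid, and the simple thresholding rule are our assumptions, not the paper's data-driven choices):

import numpy as np
from sklearn.linear_model import QuantileRegressor

def quantile_graph(X, taus=(0.1, 0.5, 0.9), alpha=0.1, tol=1e-6):
    """Edge (j, k) is kept if variable k receives a non-negligible coefficient in the
    penalized quantile regression of variable j on the others at any quantile index."""
    p = X.shape[1]
    adj = np.zeros((p, p), dtype=bool)
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for tau in taus:
            qr = QuantileRegressor(quantile=tau, alpha=alpha, solver="highs")
            qr.fit(X[:, others], X[:, j])
            adj[j, others] |= np.abs(qr.coef_) > tol
    return adj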

Posted Content
TL;DR: Theory-grounded, data-driven methods for selecting the penalization parameter in Lasso regressions under heteroscedastic and non-Gaussian errors are implemented, as are joint/simultaneous confidence intervals for regression coefficients of a high-dimensional sparse regression.
Abstract: The package High-dimensional Metrics (\Rpackage{hdm}) is an evolving collection of statistical methods for estimation and quantification of uncertainty in high-dimensional approximately sparse models. It focuses on providing confidence intervals and significance testing for (possibly many) low-dimensional subcomponents of the high-dimensional parameter vector. Efficient estimators and uniformly valid confidence intervals for regression coefficients on target variables (e.g., treatment or policy variables) in a high-dimensional approximately sparse regression model, for the average treatment effect (ATE) and the average treatment effect for the treated (ATET), as well as for extensions of these parameters to the endogenous setting, are provided. Theory-grounded, data-driven methods for selecting the penalization parameter in Lasso regressions under heteroscedastic and non-Gaussian errors are implemented. Moreover, joint/simultaneous confidence intervals for regression coefficients of a high-dimensional sparse regression are implemented, including a joint significance test for Lasso regression. Data sets which have been used in the literature and might be useful for classroom demonstration and for testing new estimators are included. \R and the package \Rpackage{hdm} are open-source software projects and can be freely downloaded from CRAN: \texttt{this http URL}.

Posted Content
TL;DR: Quantile and quantile effect functions are important tools for descriptive and causal analysis due to their natural and intuitive interpretation.
Abstract: Quantile and quantile effect functions are important tools for descriptive and causal analyses due to their natural and intuitive interpretation. Existing inference methods for these functions do not apply to discrete random variables. This paper offers a simple, practical construction of simultaneous confidence bands for quantile and quantile effect functions of possibly discrete random variables. It is based on a natural transformation of simultaneous confidence bands for distribution functions, which are readily available for many problems. The construction is generic and does not depend on the nature of the underlying problem. It works in conjunction with parametric, semiparametric, and nonparametric modeling methods for observed and counterfactual distributions, and does not depend on the sampling scheme. We apply our method to characterize the distributional impact of insurance coverage on health care utilization and to obtain the distributional decomposition of the racial test score gap. We find that universal insurance coverage increases the number of doctor visits across the entire distribution, and that the racial test score gap is small at early ages but grows with age due to socioeconomic factors affecting child development, especially at the top of the distribution. These are new, interesting empirical findings that complement previous analyses that focused on mean effects only. In both applications, the outcomes of interest are discrete, rendering existing inference methods invalid for obtaining uniform confidence bands for observed and counterfactual quantile functions and for their difference -- the quantile effect functions.
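A minimal sketch of the generic inversion step behind this construction, in notation we add for illustration: if $[L(y),U(y)]$ is a simultaneous confidence band for the distribution function $F$, so that $L(y)\le F(y)\le U(y)$ for all $y$ with probability at least $1-\alpha$, then inverting the band gives a simultaneous band for the quantile function $Q(\tau)=\inf\{y: F(y)\ge\tau\}$:

$$\inf\{y: U(y)\ge\tau\}\;\le\;Q(\tau)\;\le\;\inf\{y: L(y)\ge\tau\}\qquad\text{for all }\tau\in(0,1),$$

and a band for a quantile effect function can then be obtained, for example, by combining the bands for the two quantile functions being contrasted. Because the inversion uses no smoothness of $F$, the construction remains valid for discrete outcomes.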

Posted Content
TL;DR: In this article, the problem of vector quantile regression (VQR) is formulated as an optimal transport problem subject to an additional mean-independence condition; VQR models the dependence of a random vector of interest on a vector of explanatory variables so as to capture the whole conditional distribution, not only the conditional mean.
Abstract: This paper studies vector quantile regression (VQR), which is a way to model the dependence of a random vector of interest with respect to a vector of explanatory variables so as to capture the whole conditional distribution, and not only the conditional mean. The problem of vector quantile regression is formulated as an optimal transport problem subject to an additional mean-independence condition. This paper provides a new set of results on VQR beyond the case with correct specification, which had been the focus of previous work. First, we show that even under misspecification, the VQR problem still has a solution which provides a general representation of the conditional dependence between random vectors. Second, we provide a detailed comparison with the classical approach of Koenker and Bassett in the case when the dependent variable is univariate, and we show that in that case, VQR is equivalent to classical quantile regression with an additional monotonicity constraint.
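A minimal sketch of the optimal transport formulation referred to above, in notation we adopt for illustration (with $U$ the latent vector drawn from a reference distribution $F_U$ and $(Y,Z)$ the data; the constraint in the paper may be stated in terms of a vector of transformations $f(Z)$ rather than $Z$ itself):

$$\max\;\mathrm{E}\big[\,U^{\top}Y\,\big]\quad\text{over couplings of }U\text{ and }(Y,Z)\quad\text{subject to}\quad U\sim F_{U},\;\;(Y,Z)\sim F_{YZ},\;\;\mathrm{E}[\,Z\mid U\,]=\mathrm{E}[\,Z\,],$$

i.e. a Monge–Kantorovich problem with an added mean-independence constraint; under correct specification the optimal coupling delivers the representation $Y=\beta(U)^{\top}f(Z)$ described in the VQR entries above.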

ReportDOI
TL;DR: The R package quantreg.nonpar as mentioned in this paper implements nonparametric quantile regression methods to estimate and make inference on partially linear quantile models, and provides pointwise and uniform confidence intervals over a region of covariate values and/or quantile indices for the same functions using analytical and resampling methods.
Abstract: The R package quantreg.nonpar implements nonparametric quantile regression methods to estimate and make inference on partially linear quantile models. quantreg.nonpar obtains point estimates of the conditional quantile function and its derivatives based on series approximations to the nonparametric part of the model. It also provides pointwise and uniform confidence intervals over a region of covariate values and/or quantile indices for the same functions using analytical and resampling methods. This paper serves as an introduction to the package and displays basic functionality of the functions contained within.


Posted Content
TL;DR: The vector quantile regression (VQR) as discussed by the authors is a linear model for the conditional vector quantile function (CVQF) of a random vector $Y$ given covariates $Z$: given that the vector $U$ follows a reference non-atomic distribution $F_U$, for instance the uniform distribution on a unit cube in $\mathbb{R}^d$, the random vector $Q_{Y|Z}(U,z)$ has the distribution of $Y$ conditional on $Z=z$.
Abstract: We propose a notion of conditional vector quantile function and a vector quantile regression. A conditional vector quantile function (CVQF) of a random vector $Y$, taking values in $\mathbb{R}^{d}$ given covariates $Z=z$, taking values in $\mathbb{R}^{k}$, is a map $u\longmapsto Q_{Y|Z}(u,z)$, which is monotone, in the sense of being a gradient of a convex function, and such that given that vector $U$ follows a reference non-atomic distribution $F_{U}$, for instance uniform distribution on a unit cube in $\mathbb{R}^{d}$, the random vector $Q_{Y|Z}(U,z)$ has the distribution of $Y$ conditional on $Z=z$. Moreover, we have a strong representation, $Y=Q_{Y|Z}(U,Z)$ almost surely, for some version of $U$. The vector quantile regression (VQR) is a linear model for the CVQF of $Y$ given $Z$. Under correct specification, the notion produces a strong representation, $Y=\beta(U)^{\top}f(Z)$, for $f(Z)$ denoting a known set of transformations of $Z$, where $u\longmapsto\beta(u)^{\top}f(Z)$ is a monotone map, the gradient of a convex function, and the quantile regression coefficients $u\longmapsto\beta(u)$ have interpretations analogous to those of the standard scalar quantile regression. As $f(Z)$ becomes a richer class of transformations of $Z$, the model becomes nonparametric, as in series modelling. A key property of VQR is the embedding of the classical Monge–Kantorovich optimal transportation problem at its core as a special case. In the classical case, where $Y$ is scalar, VQR reduces to a version of the classical QR, and the CVQF reduces to the scalar conditional quantile function. An application to multiple Engel curve estimation is considered.

Posted Content
TL;DR: The R package quantreg.nonpar as mentioned in this paper implements nonparametric quantile regression methods to estimate and make inference on partially linear quantile models and provides pointwise and uniform confidence intervals over a region of covariate values and/or quantile indices.
Abstract: The R package quantreg.nonpar implements nonparametric quantile regression methods to estimate and make inference on partially linear quantile models. quantreg.nonpar obtains point estimates of the conditional quantile function and its derivatives based on series approximations to the nonparametric part of the model. It also provides pointwise and uniform confidence intervals over a region of covariate values and/or quantile indices for the same functions using analytical and resampling methods. This paper serves as an introduction to the package and displays basic functionality of the functions contained within.

Posted Content
TL;DR: The Counterfactual package as discussed by the authors implements the estimation and inference methods of Chernozhukov, Fernandez-Val and Melly (2013) for counterfactual analysis for quantile treatment effects and wage decompositions.
Abstract: The Counterfactual package implements the estimation and inference methods of Chernozhukov, Fernandez-Val and Melly (2013) for counterfactual analysis. The counterfactual distributions considered are the result of changing either the marginal distribution of covariates related to the outcome variable of interest, or the conditional distribution of the outcome given the covariates. They can be applied to estimate quantile treatment effects and wage decompositions. This paper serves as an introduction to the package and displays basic functionality of the commands contained within.

Posted Content
TL;DR: In this paper, the authors provide a method to construct simultaneous confidence bands for quantile and quantile effect functions for possibly discrete or mixed discrete-continuous random variables and apply their method to analyze the distributional impact of insurance coverage on health care utilization.
Abstract: This paper provides a method to construct simultaneous confidence bands for quantile and quantile effect functions for possibly discrete or mixed discrete-continuous random variables. The construction is generic and does not depend on the nature of the underlying problem. It works in conjunction with parametric, semiparametric, and nonparametric modeling strategies and does not depend on the sampling schemes. It is based upon projection of simultaneous confidence bands for distribution functions. We apply our method to analyze the distributional impact of insurance coverage on health care utilization and to provide a distributional decomposition of the racial test score gap. Our analysis generates new interesting findings, and complements previous analyses that focused on mean effects only. In both applications, the outcomes of interest are discrete, rendering standard inference methods invalid for obtaining uniform confidence bands for quantile and quantile effect functions.

Posted Content
TL;DR: In this paper, the authors propose a double ML estimator, which combines auxiliary and main ML predictions to achieve the fastest rates of convergence and exhibit robust behavior with respect to a broader class of probability distributions than naive "single" ML estimators.
Abstract: Most modern supervised statistical/machine learning (ML) methods are explicitly designed to solve prediction problems very well. Achieving this goal does not imply that these methods automatically deliver good estimators of causal parameters. Examples of such parameters include individual regression coefficients, average treatment effects, average lifts, and demand or supply elasticities. In fact, estimators of such causal parameters obtained via naively plugging ML estimators into estimating equations for such parameters can behave very poorly. For example, the resulting estimators may formally have inferior rates of convergence with respect to the sample size n caused by regularization bias. Fortunately, this regularization bias can be removed by solving auxiliary prediction problems via ML tools. Specifically, we can form an efficient score for the target low-dimensional parameter by combining auxiliary and main ML predictions. The efficient score may then be used to build an efficient estimator of the target parameter which typically will converge at the fastest possible $1/\sqrt{n}$ rate and be approximately unbiased and normal, allowing simple construction of valid confidence intervals for parameters of interest. The resulting method thus could be called a "double ML" method because it relies on estimating primary and auxiliary predictive models. Such double ML estimators achieve the fastest rates of convergence and exhibit robust good behavior with respect to a broader class of probability distributions than naive "single" ML estimators. In order to avoid overfitting, following [3], our construction also makes use of the K-fold sample splitting, which we call cross-fitting. The use of sample splitting allows us to use a very broad set of ML predictive methods in solving the auxiliary and main prediction problems, such as random forests, lasso, ridge, deep neural nets, boosted trees, as well as various hybrids and aggregates of these methods (e.g. a hybrid of a random forest and lasso). We illustrate the application of the general theory through application to the leading cases of estimation and inference on the main parameter in a partially linear regression model and estimation and inference on average treatment effects and average treatment effects on the treated under conditional random assignment of the treatment. These applications cover randomized control trials as a special case. We then use the methods in an empirical application which estimates the effect of 401(k) eligibility on accumulated financial assets.