
Papers by Victor Chernozhukov published in 2014


ReportDOI
TL;DR: Proves an abstract approximation theorem applicable to a wide variety of problems, primarily in statistics; the bound in the main approximation theorem is non-asymptotic, and the theorem does not require uniform boundedness of the class of functions.
Abstract: This paper develops a new direct approach to approximating suprema of general empirical processes by a sequence of suprema of Gaussian processes, without taking the route of approximating whole empirical processes in the sup-norm. We prove an abstract approximation theorem applicable to a wide variety of statistical problems, such as construction of uniform confidence bands for functions. Notably, the bound in the main approximation theorem is nonasymptotic and the theorem allows for functions that index the empirical process to be unbounded and have entropy divergent with the sample size. The proof of the approximation theorem builds on a new coupling inequality for maxima of sums of random vectors, the proof of which depends on an effective use of Stein’s method for normal approximation, and some new empirical process techniques. We study applications of this approximation theorem to local and series empirical processes arising in nonparametric estimation via kernel and series methods, where the classes of functions change with the sample size and are non-Donsker. Importantly, our new technique is able to prove the Gaussian approximation for the supremum type statistics under weak regularity conditions, especially concerning the bandwidth and the number of series functions, in those examples.

257 citations
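
A quick way to see the phenomenon the paper studies is a toy simulation: for the classical class of half-line indicators, the supremum of the empirical process is the Kolmogorov-Smirnov statistic and its Gaussian counterpart is the supremum of a Brownian bridge. The sketch below (plain numpy; grid size, sample size, and replication count are my illustrative choices, and the Donsker indicator class is used for intuition only, whereas the paper targets non-Donsker classes) compares the two supremum distributions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_rep = 500, 1000
grid = np.linspace(0.05, 0.95, 50)   # evaluation points t; data are U(0,1), so F(t) = t

# Suprema of the empirical process sqrt(n) * (F_n(t) - t) over the grid
x = rng.uniform(size=(n_rep, n))
emp = np.sqrt(n) * ((x[:, :, None] <= grid).mean(axis=1) - grid)
sup_emp = emp.max(axis=1)

# Gaussian counterpart: a Brownian bridge, covariance min(s, t) - s * t
cov = np.minimum.outer(grid, grid) - np.outer(grid, grid)
gauss = rng.multivariate_normal(np.zeros(grid.size), cov, size=n_rep)
sup_gauss = gauss.max(axis=1)

# The two supremum distributions should be close quantile by quantile
qs = np.linspace(0.05, 0.95, 19)
print(np.abs(np.quantile(sup_emp, qs) - np.quantile(sup_gauss, qs)).max())
```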


Posted Content
TL;DR: In this paper, central limit and bootstrap theorems are derived for the probabilities that sums of centered high-dimensional random vectors hit hyperrectangles and sparsely convex sets.
Abstract: This paper derives central limit and bootstrap theorems for probabilities that sums of centered high-dimensional random vectors hit hyperrectangles and sparsely convex sets. Specifically, we derive Gaussian and bootstrap approximations for probabilities $\Pr(n^{-1/2}\sum_{i=1}^n X_i\in A)$ where $X_1,\dots,X_n$ are independent random vectors in $\mathbb{R}^p$ and $A$ is a hyperrectangle, or, more generally, a sparsely convex set, and show that the approximation error converges to zero even if $p=p_n\to \infty$ as $n \to \infty$ and $p \gg n$; in particular, $p$ can be as large as $O(e^{Cn^c})$ for some constants $c,C>0$. The result holds uniformly over all hyperrectangles, or more generally, sparsely convex sets, and does not require any restriction on the correlation structure among coordinates of $X_i$. Sparsely convex sets are sets that can be represented as intersections of many convex sets whose indicator functions depend only on a small subset of their arguments, with hyperrectangles being a special case.

254 citations
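
The hyperrectangle case can be probed numerically: the event $\{\max_j n^{-1/2}\sum_i X_{ij} \le t\}$ is a hyperrectangle event, so comparing the max statistic with the maximum of a Gaussian vector with matching covariance illustrates the theorem's content. A minimal sketch (plain numpy; the equicorrelated exponential design and all constants are illustrative assumptions, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, n_rep = 100, 500, 1000        # p > n

# Non-Gaussian coordinates with a common factor: X_ij = (E_ij + E_i0) / sqrt(2),
# E centered standard exponential, so Var(X_ij) = 1 and Cov(X_ij, X_ik) = 1/2.
max_emp = np.empty(n_rep)
for r in range(n_rep):
    e = rng.exponential(size=(n, p)) - 1.0
    f = rng.exponential(size=(n, 1)) - 1.0
    x = (e + f) / np.sqrt(2.0)
    max_emp[r] = x.sum(axis=0).max() / np.sqrt(n)

# Gaussian analogue with the same (equicorrelated, rho = 1/2) covariance
g = rng.standard_normal((n_rep, p))
g0 = rng.standard_normal((n_rep, 1))
max_gauss = ((g + g0) / np.sqrt(2.0)).max(axis=1)

# P(max <= t) is a hyperrectangle probability; compare the two distributions
qs = np.linspace(0.05, 0.95, 19)
print(np.abs(np.quantile(max_emp, qs) - np.quantile(max_gauss, qs)).max())
```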


ReportDOI
TL;DR: A self-tuning $\sqrt{\mathrm{Lasso}}$ method that simultaneously resolves three important practical problems in high-dimensional regression analysis: it handles the unknown scale, heteroscedasticity, and (drastic) non-Gaussianity of the noise, and it generates sharp bounds even in extreme cases, such as the infinite-variance case and the noiseless case.
Abstract: We propose a self-tuning $\sqrt{\mathrm{Lasso}}$ method that simultaneously resolves three important practical problems in high-dimensional regression analysis; namely, it handles the unknown scale, heteroscedasticity and (drastic) non-Gaussianity of the noise. In addition, our analysis allows for badly behaved designs, for example, perfectly collinear regressors, and generates sharp bounds even in extreme cases, such as the infinite variance case and the noiseless case, in contrast to Lasso. We establish various nonasymptotic bounds for $\sqrt{\mathrm{Lasso}}$, including prediction norm rate and sparsity. Our analysis is based on new impact factors that are tailored for bounding the prediction norm. In order to cover heteroscedastic non-Gaussian noise, we rely on moderate deviation theory for self-normalized sums to achieve Gaussian-like results under weak conditions. Moreover, we derive bounds on the performance of ordinary least squares (OLS) applied to the model selected by $\sqrt{\mathrm{Lasso}}$, accounting for possible misspecification of the selected model. Under mild conditions, the rate of convergence of OLS post $\sqrt{\mathrm{Lasso}}$ is as good as $\sqrt{\mathrm{Lasso}}$'s rate. As an application, we consider the use of $\sqrt{\mathrm{Lasso}}$ and OLS post $\sqrt{\mathrm{Lasso}}$ as estimators of nuisance parameters in a generic semiparametric problem (nonlinear moment condition or $Z$-problem), resulting in a construction of $\sqrt{n}$-consistent and asymptotically normal estimators of the main parameters.

111 citations
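
The defining feature of $\sqrt{\mathrm{Lasso}}$ is that the penalty level does not depend on the unknown noise scale. A minimal sketch of the estimator as a convex program (assumes the cvxpy package; the penalty rule below is a common choice in this literature and the constants are illustrative, so treat this as a sketch rather than the authors' implementation):

```python
import numpy as np
import cvxpy as cp
from scipy.stats import norm

rng = np.random.default_rng(2)
n, p, s = 100, 200, 5
X = rng.standard_normal((n, p))
beta0 = np.zeros(p)
beta0[:s] = 1.0
y = X @ beta0 + 0.5 * rng.standard_normal(n)   # noise scale 0.5, unknown to the method

# Pivotal penalty level: no estimate of sigma is needed (c ~ 1.1 is conventional)
alpha, c = 0.05, 1.1
lam = c * norm.ppf(1 - alpha / (2 * p)) / np.sqrt(n)

# sqrt-Lasso objective: ||y - X b||_2 / sqrt(n) + lam * ||b||_1
b = cp.Variable(p)
obj = cp.norm(y - X @ b, 2) / np.sqrt(n) + lam * cp.norm1(b)
cp.Problem(cp.Minimize(obj)).solve()
print("selected variables:", np.flatnonzero(np.abs(b.value) > 1e-6))
```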


01 May 2014
TL;DR: In this paper, the authors give explicit comparisons of expectations of smooth functions and distribution functions of maxima of Gaussian random vectors without any restriction on the covariance matrices, and derive a useful upper bound on the Lévy concentration function for the maximum of (not necessarily independent) Gaussian random variables.
Abstract: Slepian and Sudakov-Fernique type inequalities, which compare expectations of maxima of Gaussian random vectors under certain restrictions on the covariance matrices, play an important role in probability theory, especially in empirical process and extreme value theories. Here we give explicit comparisons of expectations of smooth functions and distribution functions of maxima of Gaussian random vectors without any restriction on the covariance matrices. We also establish an anti-concentration inequality for maxima of Gaussian random vectors, which yields a useful upper bound on the Lévy concentration function for the maximum of (not necessarily independent) Gaussian random variables. The bound is universal and applies to vectors with arbitrary covariance matrices. This anti-concentration inequality plays a crucial role in establishing bounds on the Kolmogorov distance between maxima of Gaussian random vectors. These results have immediate applications in mathematical statistics. As an example of application, we establish a conditional multiplier central limit theorem for maxima of sums of independent random vectors where the dimension of the vectors is possibly much larger than the sample size.

110 citations
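
A small simulation makes the anti-concentration claim concrete: the Lévy concentration function $\sup_x \Pr(|M-x|\le \epsilon)$ of $M=\max_j Z_j$ stays small uniformly in $x$. The sketch below (plain numpy; the equicorrelated design, the value of $\epsilon$, and the rough $\epsilon\sqrt{2\log p}$ reference scale are illustrative assumptions, not the paper's exact bound):

```python
import numpy as np

rng = np.random.default_rng(3)
p, rho, n_rep, eps = 200, 0.3, 50_000, 0.05

# Equicorrelated Gaussian vectors: Z_j = sqrt(1 - rho) * g_j + sqrt(rho) * g_0
g = rng.standard_normal((n_rep, p))
g0 = rng.standard_normal((n_rep, 1))
M = (np.sqrt(1 - rho) * g + np.sqrt(rho) * g0).max(axis=1)

# Estimate the Levy concentration function sup_x P(|M - x| <= eps)
xs = np.linspace(M.min(), M.max(), 200)
conc = max(np.mean(np.abs(M - x) <= eps) for x in xs)
print(f"estimated concentration at eps={eps}: {conc:.4f}")
print(f"reference scale eps * sqrt(2 log p): {eps * np.sqrt(2 * np.log(p)):.4f}")
```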


ReportDOI
TL;DR: A considerably weaker sufficient condition for constructing uniform confidence bands is derived from an anti-concentration property of the supremum of the approximating Gaussian process, together with an inequality leading to such a property for separable Gaussian processes; the new condition is called a generalized SBR condition.
Abstract: Modern construction of uniform confidence bands for nonparametric densities (and other functions) often relies on the classical Smirnov-Bickel-Rosenblatt (SBR) condition; see, for example, Giné and Nickl (2010). This condition requires the existence of a limit distribution of an extreme value type for the supremum of a studentized empirical process (equivalently, for the supremum of a Gaussian process with the same covariance function as that of the studentized empirical process). The principal contribution of this paper is to remove the need for this classical condition. We show that a considerably weaker sufficient condition is derived from an anti-concentration property of the supremum of the approximating Gaussian process, and we derive an inequality leading to such a property for separable Gaussian processes. We refer to the new condition as a generalized SBR condition. Our new result shows that the supremum does not concentrate too fast around any value. We then apply this result to derive a Gaussian multiplier bootstrap procedure for constructing honest confidence bands for nonparametric density estimators (this result can be applied in other nonparametric problems as well). An essential advantage of our approach is that it applies generically even in those cases where the limit distribution of the supremum of the studentized empirical process does not exist (or is unknown). This is of particular importance in problems where resolution levels or other tuning parameters have been chosen in a data-driven fashion, which is needed for adaptive constructions of the confidence bands. Furthermore, our approach is asymptotically honest at a polynomial rate: the error in coverage level converges to zero at a fast, polynomial speed (with respect to the sample size). In sharp contrast, the approach based on extreme value theory is asymptotically honest only at a logarithmic rate: the error converges to zero at a slow, logarithmic speed. Finally, of independent interest is our introduction of a new, practical version of Lepski's method, which computes the optimal, non-conservative resolution levels via a Gaussian multiplier bootstrap method.

107 citations
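
The Gaussian multiplier bootstrap at the heart of the procedure is easy to sketch for a kernel density estimator at a fixed bandwidth (plain numpy; this omits the data-driven resolution levels and the practical Lepski step, so it is an illustration of the bootstrap band, not the paper's full construction):

```python
import numpy as np

rng = np.random.default_rng(4)
n, h, B = 1000, 0.25, 1000
x = rng.standard_normal(n)                  # data from N(0, 1)
grid = np.linspace(-2.5, 2.5, 101)

K = lambda u: np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)    # Gaussian kernel
W = K((grid[None, :] - x[:, None]) / h) / h             # kernel terms, n x len(grid)
fhat = W.mean(axis=0)                                   # the KDE on the grid
sig = W.std(axis=0)                                     # pointwise std of kernel terms

# Multiplier bootstrap of sup_t |n^{-1/2} sum_i xi_i (W_it - fhat_t)| / sig_t
sup_stats = np.empty(B)
for b in range(B):
    xi = rng.standard_normal(n)
    sup_stats[b] = np.abs(xi @ (W - fhat) / (np.sqrt(n) * sig)).max()

crit = np.quantile(sup_stats, 0.95)
band = crit * sig / np.sqrt(n)              # 95% band: fhat +/- band
print("max half-width of the band:", band.max())
```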


Journal ArticleDOI
TL;DR: Within this framework, procedures that give uniformly valid inference over a fixed subset of parameters in the canonical linear fixed effects model and over coefficients on a fixed vector of endogenous variables in panel data instrumental variable models with fixed effects and many instruments are provided.
Abstract: We consider estimation and inference in panel data models with additive unobserved individual specific heterogeneity in a high-dimensional setting. The setting allows the number of time-varying regressors to be larger than the sample size. To make informative estimation and inference feasible, we require that the overall contribution of the time-varying variables after eliminating the individual specific heterogeneity can be captured by a relatively small number of the available variables whose identities are unknown. This restriction allows the problem of estimation to proceed as a variable selection problem. Importantly, we treat the individual specific heterogeneity as fixed effects which allows this heterogeneity to be related to the observed time-varying variables in an unspecified way and allows that this heterogeneity may differ for all individuals. Within this framework, we provide procedures that give uniformly valid inference over a fixed subset of parameters in the canonical linear fixed effects model and over coefficients on a fixed vector of endogenous variables in panel data instrumental variable models with fixed effects and many instruments.

94 citations


Posted Content
TL;DR: In this article, the authors consider estimation and inference in high-dimensional panel data models with additive unobserved individual specific heterogeneity, requiring that the overall contribution of the time varying variables, after eliminating the individual specific heterogeneity, can be captured by a relatively small number of available variables whose identities are unknown.
Abstract: We consider estimation and inference in panel data models with additive unobserved individual specific heterogeneity in a high dimensional setting. The setting allows the number of time varying regressors to be larger than the sample size. To make informative estimation and inference feasible, we require that the overall contribution of the time varying variables after eliminating the individual specific heterogeneity can be captured by a relatively small number of the available variables whose identities are unknown. This restriction allows the problem of estimation to proceed as a variable selection problem. Importantly, we treat the individual specific heterogeneity as fixed effects which allows this heterogeneity to be related to the observed time varying variables in an unspecified way and allows that this heterogeneity may be non-zero for all individuals. Within this framework, we provide procedures that give uniformly valid inference over a fixed subset of parameters in the canonical linear fixed effects model and over coefficients on a fixed vector of endogenous variables in panel data instrumental variables models with fixed effects and many instruments. An input to developing the properties of our proposed procedures is the use of a variant of the Lasso estimator that allows for a grouped data structure where data across groups are independent and dependence within groups is unrestricted. We provide formal conditions within this structure under which the proposed Lasso variant selects a sparse model with good approximation properties. We present simulation results in support of the theoretical developments and illustrate the use of the methods in an application aimed at estimating the effect of gun prevalence on crime rates.

73 citations
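
The overall strategy (eliminate the fixed effects by the within transformation, then select among many time-varying regressors) can be sketched in a few lines. The snippet below uses sklearn's cross-validated Lasso as a simple stand-in for the paper's cluster-Lasso with data-driven penalty loadings, and applies double selection before the final OLS step; all simulation settings are my illustrative choices:

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(5)
N, T, p = 100, 5, 80                         # N individuals, T periods, p controls

# Fixed effects correlated with the regressors (the case fixed effects handle)
a = rng.standard_normal(N)
x = rng.standard_normal((N, T, p)) + 0.5 * a[:, None, None]
d = x[:, :, 0] + 0.5 * a[:, None] + rng.standard_normal((N, T))   # target regressor
y = 1.0 * d + x[:, :, 1] + 2.0 * a[:, None] + rng.standard_normal((N, T))

def demean(v):                               # within transformation over time
    return v - v.mean(axis=1, keepdims=True)

yt, dt = demean(y).ravel(), demean(d).ravel()
xt = demean(x).reshape(-1, p)

# Double selection: controls predicting y, plus controls predicting d
S = np.union1d(np.flatnonzero(LassoCV(cv=5).fit(xt, yt).coef_),
               np.flatnonzero(LassoCV(cv=5).fit(xt, dt).coef_))

# Final OLS of y on d and the union of selected controls
Z = np.column_stack([dt, xt[:, S]])
coef = np.linalg.lstsq(Z, yt, rcond=None)[0]
print("estimated effect of d:", coef[0], "(truth: 1.0)")
```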


Posted Content
TL;DR: In this paper, the authors propose new concepts of statistical depth, multivariate quantiles, ranks and signs, based on canonical transportation maps between a distribution of interest on $R^d$ and a reference distribution on the $d$-dimensional unit ball.
Abstract: We propose new concepts of statistical depth, multivariate quantiles, ranks and signs, based on canonical transportation maps between a distribution of interest on $R^d$ and a reference distribution on the $d$-dimensional unit ball. The new depth concept, called Monge-Kantorovich depth, specializes to halfspace depth in the case of spherical distributions, but, for more general distributions, differs from the latter in the ability of its contours to account for nonconvex features of the distribution of interest. We propose empirical counterparts to the population versions of those Monge-Kantorovich depth contours, quantiles, ranks and signs, and show their consistency by establishing a uniform convergence property for empirical transport maps, which is of independent interest.

55 citations
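
Empirical Monge-Kantorovich ranks can be computed from a finite-sample optimal matching. The sketch below (scipy's linear_sum_assignment with squared Euclidean cost; the sample size and the bivariate normal target are illustrative choices) matches the data to a uniform reference sample on the unit disk, so observations whose assigned reference points lie near the origin are the "deepest":

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(6)
n = 300
y = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 2.0]], size=n)  # data

# Reference sample: uniform on the unit disk (radius sqrt(U), uniform angle)
r = np.sqrt(rng.uniform(size=n))
th = rng.uniform(0.0, 2.0 * np.pi, size=n)
u = np.column_stack([r * np.cos(th), r * np.sin(th)])

# Empirical transport map: optimal assignment under squared Euclidean cost
cost = ((y[:, None, :] - u[None, :, :]) ** 2).sum(axis=2)
_, col = linear_sum_assignment(cost)
ranks = u[col]                    # MK rank assigned to each observation y[i]

# Central ("deep") observations receive ranks near the origin
order = np.argsort(np.linalg.norm(ranks, axis=1))
print("indices of the five deepest observations:", order[:5])
```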


Posted Content
TL;DR: In this paper, the conditional vector quantile function (CVQF) and vector quantile regression (VQR) are introduced; the vector quantile regression coefficients have interpretations analogous to those of standard scalar quantile regression.
Abstract: We propose a notion of conditional vector quantile function and a vector quantile regression. A \emph{conditional vector quantile function} (CVQF) of a random vector $Y$, taking values in $\mathbb{R}^d$ given covariates $Z=z$, taking values in $\mathbb{R}^k$, is a map $u \longmapsto Q_{Y\mid Z}(u,z)$, which is monotone, in the sense of being a gradient of a convex function, and such that, given that the vector $U$ follows a reference non-atomic distribution $F_U$, for instance the uniform distribution on a unit cube in $\mathbb{R}^d$, the random vector $Q_{Y\mid Z}(U,z)$ has the distribution of $Y$ conditional on $Z=z$. Moreover, we have a strong representation, $Y = Q_{Y\mid Z}(U,Z)$ almost surely, for some version of $U$. The \emph{vector quantile regression} (VQR) is a linear model for the CVQF of $Y$ given $Z$. Under correct specification, the notion produces a strong representation, $Y=\beta(U)^\top f(Z)$, for $f(Z)$ denoting a known set of transformations of $Z$, where $u \longmapsto \beta(u)^\top f(Z)$ is a monotone map, the gradient of a convex function, and the quantile regression coefficients $u \longmapsto \beta(u)$ have interpretations analogous to those of standard scalar quantile regression. As $f(Z)$ becomes a richer class of transformations of $Z$, the model becomes nonparametric, as in series modelling. A key property of VQR is the embedding of the classical Monge-Kantorovich optimal transportation problem at its core as a special case. In the classical case, where $Y$ is scalar, VQR reduces to a version of the classical QR, and the CVQF reduces to the scalar conditional quantile function. An application to multiple Engel curve estimation is considered.

38 citations
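
In the scalar case the construction collapses to classical quantile regression, which gives a simple sanity check. The sketch below (statsmodels' QuantReg on simulated data with $Q_{Y\mid Z}(u,z)=z\,\Phi^{-1}(u)$; all settings are illustrative) verifies that fitted QR coefficients recover the conditional quantile function:

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(7)
n = 5000
z = rng.uniform(1.0, 2.0, size=n)
y = z * norm.ppf(rng.uniform(size=n))   # Y = Q_{Y|Z}(U, Z) with U ~ Uniform(0, 1)

X = sm.add_constant(z)
for u in (0.25, 0.50, 0.75):
    fit = sm.QuantReg(y, X).fit(q=u)
    # truth: intercept 0, slope Phi^{-1}(u)
    print(u, np.round(fit.params, 3), "truth slope:", round(norm.ppf(u), 3))
```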


ReportDOI
TL;DR: New inference methods are proposed for the estimation of a regression coefficient of interest in quantile regression models where the number of regressors potentially exceeds the sample size but a subset of them suffices to construct a reasonable approximation of the unknown quantile regression function in the model.
Abstract: This work proposes new inference methods for a regression coefficient of interest in a (heterogeneous) quantile regression model. We consider a high-dimensional model where the number of regressors potentially exceeds the sample size but a subset of them suffices to construct a reasonable approximation of the unknown quantile regression function in the model.

31 citations


ReportDOI
TL;DR: The vector quantile regression (VQR) is a linear model for the conditional vector quantile function (CVQF) of a random vector Y given covariates Z; under correct specification, it produces the strong representation Y=β(U)⊤f(Z), for f(Z) denoting a known set of transformations of Z and some version of U.
Abstract: We propose a notion of conditional vector quantile function and a vector quantile regression. A conditional vector quantile function (CVQF) of a random vector Y, taking values in ℝd given covariates Z=z, taking values in ℝk, is a map u↦QY∣Z(u,z), which is monotone, in the sense of being a gradient of a convex function, and such that, given that the vector U follows a reference non-atomic distribution FU, for instance the uniform distribution on a unit cube in ℝd, the random vector QY∣Z(U,z) has the distribution of Y conditional on Z=z. Moreover, we have a strong representation, Y=QY∣Z(U,Z) almost surely, for some version of U. The vector quantile regression (VQR) is a linear model for the CVQF of Y given Z. Under correct specification, the notion produces a strong representation, Y=β(U)⊤f(Z), for f(Z) denoting a known set of transformations of Z, where u↦β(u)⊤f(Z) is a monotone map, the gradient of a convex function, and the quantile regression coefficients u↦β(u) have interpretations analogous to those of standard scalar quantile regression. As f(Z) becomes a richer class of transformations of Z, the model becomes nonparametric, as in series modelling. A key property of VQR is the embedding of the classical Monge-Kantorovich optimal transportation problem at its core as a special case. In the classical case, where Y is scalar, VQR reduces to a version of the classical QR, and the CVQF reduces to the scalar conditional quantile function. Several applications to diverse problems, such as multiple Engel curve estimation and the measurement of financial risk, are considered.


Posted Content
TL;DR: This paper derives Gaussian and bootstrap approximations for the probabilities that a root-n rescaled sample average of $X_i$ is in $A$, and shows that the approximation error converges to zero even if $p=p_n\to\infty$ and $p\gg n$; in particular, $p$ can be as large as $O(e^{Cn^c})$ for some constants $c,C>0$.
Abstract: In this paper, we derive central limit and bootstrap theorems for probabilities that centered high-dimensional vector sums hit rectangles and sparsely convex sets. Specifically, we derive Gaussian and bootstrap approximations for the probabilities $\Pr(n^{-1/2}\sum_{i=1}^n X_i\in A)$, where $X_1,\dots,X_n$ are independent random vectors in $\mathbb{R}^p$ and $A$ is a rectangle, or, more generally, a sparsely convex set, and show that the approximation error converges to zero even if $p=p_n\to\infty$ and $p\gg n$; in particular, $p$ can be as large as $O(e^{Cn^c})$ for some constants $c,C>0$. The result holds uniformly over all rectangles, or more generally, sparsely convex sets, and does not require any restrictions on the correlation among components of $X_i$. Sparsely convex sets are sets that can be represented as intersections of many convex sets whose indicator functions depend nontrivially only on a small subset of their arguments, with rectangles being a special case.

Posted Content
TL;DR: In this article, the authors propose new inference methods for the estimation of a regression coefficient of interest in quantile regression models, where the number of regressors potentially exceeds the sample size but a subset of them suffices to construct a reasonable approximation of the unknown quantile regression function in the model.
Abstract: This work proposes new inference methods for the estimation of a regression coefficient of interest in quantile regression models. We consider high-dimensional models where the number of regressors potentially exceeds the sample size but a subset of them suffices to construct a reasonable approximation of the unknown quantile regression function in the model. The proposed methods are protected against moderate model selection mistakes, which are often inevitable in the approximately sparse model considered here. The methods construct (implicitly or explicitly) an optimal instrument as a residual from a density-weighted projection of the regressor of interest on the other regressors. Under regularity conditions, the proposed estimators of the quantile regression coefficient are asymptotically root-n normal, with variance equal to the semiparametric efficiency bound of the partially linear quantile regression model. In addition, the performance of the technique is illustrated through Monte Carlo experiments and an empirical example dealing with risk factors in childhood malnutrition. The numerical results confirm the theoretical findings that the proposed methods should outperform naive post-model-selection methods in nonparametric settings. Moreover, the empirical results demonstrate the soundness of the proposed methods.
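
A schematic rendering of the orthogonal-score construction described above may help fix ideas (simplified notation; an illustration of the idea rather than the paper's exact display). In the partially linear quantile model $Q_{Y\mid D,X}(\tau)=D\alpha_0+g(X)$, with $f(\cdot\mid D,X)$ denoting the conditional density of the error at the quantile, the score and the density-weighted projection take the form

$$\psi(W;\alpha,g,m)=\big(\tau-1\{Y\le D\alpha+g(X)\}\big)\big(D-m(X)\big),\qquad m(X)=\frac{E[f(0\mid D,X)\,D\mid X]}{E[f(0\mid D,X)\mid X]},$$

so that $E[\psi]$ is first-order insensitive to perturbations of both $g$ and $m$ at the true values, which is what protects the procedure against moderate model selection mistakes.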

Posted Content
TL;DR: In this article, the identi?cation and estimation of ceteris paribus effects of continuous regressors in nonseparable panel models with time homogeneity are considered, where the effects of interest are derivatives of the average and quantile structural functions of the model.
Abstract: This paper considers identi?cation and estimation of ceteris paribus effects of continuous regressors in nonseparable panel models with time homogeneity. The effects of interest are derivatives of the average and quantile structural functions of the model. We ?nd that these derivatives are identi?ed with two time periods for “stayers”, i.e. for individuals with the same regressor values in two time periods. We show that the identi?cation results carry over to models that allow location and scale time e?ects. We propose nonparametric series methods and a weighted bootstrap scheme to estimate and make inference on the identi?ed e?ects. The bootstrap proposed allows inference for function-valued parameters such as quantile e?ects uniformly over a region of quantile indices and/or regressor values. An empirical application to Engel curve estimation with panel data illustrates the results.
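
The weighted bootstrap mentioned in the abstract reweights observations with i.i.d. positive weights and refits. A minimal sketch for a series regression with a sup-t uniform band (plain numpy; the exponential weights and polynomial basis are my illustrative choices, not necessarily the paper's exact scheme):

```python
import numpy as np

rng = np.random.default_rng(8)
n, B = 500, 999
x = rng.uniform(size=n)
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(n)

P = np.vander(x, 6, increasing=True)          # series terms 1, x, ..., x^5
grid = np.linspace(0.0, 1.0, 50)
Pg = np.vander(grid, 6, increasing=True)

def wls(w):                                   # weighted least squares coefficients
    return np.linalg.solve(P.T @ (w[:, None] * P), P.T @ (w * y))

fhat = Pg @ wls(np.ones(n))
draws = np.stack([Pg @ wls(rng.exponential(size=n)) for _ in range(B)])

# Sup-t uniform 95% band over the grid from the bootstrap draws
se = draws.std(axis=0)
crit = np.quantile((np.abs(draws - fhat) / se).max(axis=1), 0.95)
print("uniform band half-width, min/max:", (crit * se).min(), (crit * se).max())
```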


Posted Content
TL;DR: In this paper, the authors derive conditions under which preferences and technology are nonparametrically identified in hedonic equilibrium models, where products are differentiated along more than one dimension and agents are characterized by several dimensions of unobserved heterogeneity.
Abstract: This paper derives conditions under which preferences and technology are nonparametrically identified in hedonic equilibrium models, where products are differentiated along more than one dimension and agents are characterized by several dimensions of unobserved heterogeneity. With products differentiated along a quality index and agents characterized by scalar unobserved heterogeneity, single crossing conditions on preferences and technology provide identifying restrictions. We develop similar shape restrictions in the multi-attribute case and we provide identification results from the observation of a single market. We thereby extend identification results in Matzkin (2003) and Heckman, Matzkin, and Nesheim (2010) to accommodate multiple dimensions of unobserved heterogeneity.

Journal ArticleDOI
TL;DR: In this paper, a general analysis of valid post-selection or post-regularization inference about a low-dimensional target parameter, α, in the presence of a very high-dimensional nuisance parameter, η, which is estimated using modern selection or regularization methods is provided.
Abstract: Here we present an expository, general analysis of valid post-selection or post-regularization inference about a low-dimensional target parameter, α, in the presence of a very high-dimensional nuisance parameter, η, which is estimated using modern selection or regularization methods. Our analysis relies on high-level, easy-to-interpret conditions that allow one to clearly see the structures needed for achieving valid post-regularization inference. Simple, readily verifiable sufficient conditions are provided for a class of affine-quadratic models. We rely on asymptotic statements, which dramatically simplifies the theory and helps highlight the structure of the problem. We focus our discussion on estimation and inference procedures based on using the empirical analog of theoretical equations M(α, η) = 0 which identify α. Within this structure, we show that setting up such equations in a manner such that the orthogonality/immunization condition ∂_η M(α, η) = 0 at the true parameter values is satisfied, coupled with plausible conditions on the smoothness of M and the quality of the estimator η̂, guarantees that inference for the main parameter α based on testing or point estimation methods discussed below will be regular despite selection or regularization biases occurring in the estimation of η. In particular, the estimator of α will often be uniformly consistent at the root-n rate and uniformly asymptotically normal even though the estimators η̂ will generally not be asymptotically linear and regular. The uniformity holds over large classes of models that do not impose highly implausible "beta-min" conditions. We also show that inference can be carried out by inverting tests formed from Neyman's C(α) (orthogonal score) statistics. As an application and an illustration of these ideas, we provide an analysis of post-selection inference in linear models with many regressors and many instruments. We conclude with a review of other developments in post-selection inference and argue that many of the developments can be viewed as special cases of the general framework of orthogonalized estimating equations.
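
For the linear model with many controls, the orthogonalized estimating equation reduces to partialling out: residualize both the outcome and the target regressor on the high-dimensional controls, then solve the moment condition in the residuals. A minimal sketch (sklearn's LassoCV as a simple stand-in for the paper's penalty choices; all simulation settings are illustrative):

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(9)
n, p = 200, 300
x = rng.standard_normal((n, p))
d = x[:, :3].sum(axis=1) + rng.standard_normal(n)           # target regressor
y = 0.5 * d + x[:, :3] @ np.array([1.0, -1.0, 1.0]) + rng.standard_normal(n)

# Residualize y and d on the high-dimensional controls
ry = y - LassoCV(cv=5).fit(x, y).predict(x)
rd = d - LassoCV(cv=5).fit(x, d).predict(x)

# Orthogonal moment E[(ry - alpha * rd) * rd] = 0 and its plug-in standard error
alpha_hat = (ry @ rd) / (rd @ rd)
num = np.mean(((ry - alpha_hat * rd) * rd) ** 2)
se = np.sqrt(num / np.mean(rd ** 2) ** 2 / n)
print(f"alpha_hat = {alpha_hat:.3f} +/- {1.96 * se:.3f} (truth: 0.5)")
```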


Posted Content
TL;DR: The clr2bound command provides two-sided bound estimates using Bonferroni's inequality, and clrtest performs an intersection bound test of the hypothesis that a collection of lower intersection bounds is no greater than zero.
Abstract: This package includes various commands. clr2bound provides two-sided bound estimates using Bonferroni's inequality. clrbound provides a one-sided bound estimate. clrtest performs an intersection bound test of the hypothesis that a collection of lower intersection bounds is no greater than zero. clr3bound provides two-sided bound estimates by inverting clrtest.