Oracle Inequalities and Optimal Inference under Group Sparsity
TLDR
In this article, the authors consider the problem of estimating a sparse linear regression vector β* under a Gaussian noise model, for the purpose of both prediction and model selection, and establish oracle inequalities for the prediction and l2 estimation errors of the Group Lasso estimator.
Abstract:
We consider the problem of estimating a sparse linear regression vector β* under a Gaussian noise model, for the purpose of both prediction and model selection. We assume that prior knowledge is available on the sparsity pattern, namely the set of variables is partitioned into prescribed groups, only a few of which are relevant in the estimation process. This group sparsity assumption suggests the Group Lasso method as a means to estimate β*. We establish oracle inequalities for the prediction and l2 estimation errors of this estimator. These bounds hold under a restricted eigenvalue condition on the design matrix. Under a stronger condition, we derive bounds for the estimation error in mixed (2, p)-norms with 1 ≤ p ≤ ∞. When p = ∞, this result implies that a thresholded version of the Group Lasso estimator selects the sparsity pattern of β* with high probability. Next, we prove that the rate of convergence of our upper bounds is optimal in a minimax sense, up to a logarithmic factor, over all estimators and over a class of group sparse vectors. Furthermore, we establish lower bounds for the prediction and l2 estimation errors of the usual Lasso estimator. Using this result, we demonstrate that the Group Lasso can achieve an improvement in the prediction and estimation errors as compared to the Lasso. An important application of our results is the problem of estimating multiple regression equations simultaneously, or multi-task learning. In this case, we obtain refinements of the results in [In Proc. of the 22nd Annual Conference on Learning Theory (COLT) (2009)], which allow us to establish a quantitative advantage of the Group Lasso over the usual Lasso in the multi-task setting. Finally, within the same setting, we show how our results can be extended to more general noise distributions, of which we only require the fourth moment to be finite.
To obtain this extension, we establish a new maximal moment inequality, which may be of independent interest.
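The two main ingredients of the abstract — the Group Lasso estimator and its thresholded version for selecting the sparsity pattern — can be sketched numerically. The following is a minimal illustration, not the authors' construction: a proximal-gradient (ISTA) solver whose proximal step is block soft-thresholding over the prescribed groups, followed by thresholding of the group norms to recover the support. All function names, the step-size rule, and the tuning constants are illustrative assumptions.

```python
import numpy as np

def group_soft_threshold(v, t):
    """Block soft-thresholding: shrink the group vector v toward 0 by t in l2 norm."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v

def group_lasso(X, y, groups, lam, n_iter=500):
    """Proximal-gradient (ISTA) sketch of the Group Lasso:
        min_beta  (1/(2n)) ||y - X beta||_2^2  +  lam * sum_g ||beta_g||_2,
    where `groups` is a list of index arrays partitioning the columns of X.
    """
    n, p = X.shape
    beta = np.zeros(p)
    # Step size 1/L, with L the Lipschitz constant ||X||^2 / n of the smooth part.
    step = n / (np.linalg.norm(X, 2) ** 2)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n          # gradient of the quadratic term
        z = beta - step * grad                    # gradient step
        for g in groups:                          # proximal step, group by group
            beta[g] = group_soft_threshold(z[g], step * lam)
    return beta

def select_groups(beta, groups, tau):
    """Thresholded selection: keep the groups whose l2 norm exceeds tau."""
    return [k for k, g in enumerate(groups) if np.linalg.norm(beta[g]) > tau]
```

On simulated data with a strong group-sparse signal, `select_groups` recovers the active groups, mirroring the abstract's claim that a thresholded Group Lasso selects the sparsity pattern with high probability; in practice lam and tau must be calibrated to the noise level.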
Citations
Book
High-Dimensional Statistics: A Non-Asymptotic Viewpoint
TL;DR: This book provides a self-contained introduction to the area of high-dimensional statistics, aimed at the first-year graduate level, and includes chapters focused on core methodology and theory, including tail bounds, concentration inequalities, uniform laws and empirical processes, and random matrices.
Journal ArticleDOI
Least squares after model selection in high-dimensional sparse models
TL;DR: In this paper, post-l1-penalized estimators in high-dimensional sparse linear regression models are studied, showing that ordinary least squares applied to the model selected by the Lasso performs at least as well as the Lasso in terms of the rate of convergence.
Journal ArticleDOI
Structured sparsity through convex optimization
TL;DR: In this article, the authors consider situations where they are not only interested in sparsity, but where some structural prior knowledge is available as well, and show that the $\ell_1$-norm can then be extended to structured norms built on either disjoint or overlapping groups of variables.
Journal ArticleDOI
Sparse PCA: Optimal rates and adaptive estimation
TL;DR: In this paper, the authors considered both minimax and adaptive estimation of the principal subspace in the high-dimensional setting and established the optimal rates of convergence for estimating the subspace, which are sharp with respect to all the parameters, thus providing a complete characterization of the difficulty of the estimation problem in terms of the convergence rate.
Journal ArticleDOI
Robust inference on average treatment effects with possibly more covariates than observations
TL;DR: In this article, robust inference on average treatment effects following model selection is studied. Confidence intervals are constructed using a doubly-robust estimator that are robust to model selection errors, and their uniform validity is proved over a large class of models that allows for multivalued treatments with heterogeneous effects and selection among (possibly) more covariates than observations.
References
Book
Econometric Analysis of Cross Section and Panel Data
TL;DR: This is the essential companion to Jeffrey Wooldridge's widely-used graduate text Econometric Analysis of Cross Section and Panel Data (MIT Press, 2001).
Journal ArticleDOI
An Efficient Method of Estimating Seemingly Unrelated Regressions and Tests for Aggregation Bias
TL;DR: In this paper, a method of estimating the parameters of a set of regression equations is reported which involves application of Aitken's generalized least-squares to the whole system of equations.
Journal ArticleDOI
Model selection and estimation in regression with grouped variables
Ming Yuan, Yi Lin
TL;DR: In this paper, instead of selecting factors by stepwise backward elimination, the authors focus on the accuracy of estimation and consider extensions of the lasso, the LARS algorithm and the non-negative garrotte for factor selection.
Book
Analysis of Panel Data
TL;DR: In this book, the author develops homogeneity tests for linear regression models (analysis of covariance) and shows that, in the presence of individual heterogeneity, variable-intercept regression models yield consistent estimates where pooled regressions with a common intercept do not.
Book
Weak Convergence and Empirical Processes: With Applications to Statistics
TL;DR: In this book, the authors develop the theory of weak convergence of empirical processes, introducing the ball sigma-field and the measurability of suprema, and establishing convergence results both almost surely and in probability, with applications to statistics.