Oracle Inequalities and Optimal Inference under Group Sparsity
TLDR
In this article, the authors consider the problem of estimating a sparse linear regression vector β* under a Gaussian noise model, for the purpose of both prediction and model selection, and establish oracle inequalities for the prediction and l2 estimation errors of the Group Lasso estimator.
Abstract:
We consider the problem of estimating a sparse linear regression vector β* under a Gaussian noise model, for the purpose of both prediction and model selection. We assume that prior knowledge is available on the sparsity pattern, namely the set of variables is partitioned into prescribed groups, only a few of which are relevant in the estimation process. This group sparsity assumption suggests the Group Lasso method as a means to estimate β*. We establish oracle inequalities for the prediction and l2 estimation errors of this estimator. These bounds hold under a restricted eigenvalue condition on the design matrix. Under a stronger condition, we derive bounds for the estimation error in mixed (2, p)-norms with 1 ≤ p ≤ ∞. When p = ∞, this result implies that a thresholded version of the Group Lasso estimator selects the sparsity pattern of β* with high probability. Next, we prove that the rate of convergence of our upper bounds is optimal in a minimax sense, up to a logarithmic factor, over all estimators and over a class of group sparse vectors. Furthermore, we establish lower bounds for the prediction and l2 estimation errors of the usual Lasso estimator. Using this result, we demonstrate that the Group Lasso can achieve an improvement in the prediction and estimation errors as compared to the Lasso. An important application of our results is the problem of estimating multiple regression equations simultaneously, or multi-task learning. In this case, we obtain refinements of the results in [In Proc. of the 22nd Annual Conference on Learning Theory (COLT) (2009)], which allow us to establish a quantitative advantage of the Group Lasso over the usual Lasso in the multi-task setting. Finally, within the same setting, we show how our results can be extended to more general noise distributions, of which we only require the fourth moment to be finite.
To obtain this extension, we establish a new maximal moment inequality, which may be of independent interest.
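The two main ingredients of the abstract — the Group Lasso estimator and its thresholded version for selecting the sparsity pattern — can be sketched numerically. The following is a minimal illustration, not the authors' construction: a proximal-gradient (ISTA) solver whose proximal step is block soft-thresholding over the prescribed groups, followed by thresholding of the group norms to recover the support. All function names, the step-size rule, and the tuning constants are illustrative assumptions.

```python
import numpy as np

def group_soft_threshold(v, t):
    """Block soft-thresholding: shrink the group vector v toward 0 by t in l2 norm."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v

def group_lasso(X, y, groups, lam, n_iter=500):
    """Proximal-gradient (ISTA) sketch of the Group Lasso:
        min_beta  (1/(2n)) ||y - X beta||_2^2  +  lam * sum_g ||beta_g||_2,
    where `groups` is a list of index arrays partitioning the columns of X.
    """
    n, p = X.shape
    beta = np.zeros(p)
    # Step size 1/L, with L the Lipschitz constant ||X||^2 / n of the smooth part.
    step = n / (np.linalg.norm(X, 2) ** 2)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n          # gradient of the quadratic term
        z = beta - step * grad                    # gradient step
        for g in groups:                          # proximal step, group by group
            beta[g] = group_soft_threshold(z[g], step * lam)
    return beta

def select_groups(beta, groups, tau):
    """Thresholded selection: keep the groups whose l2 norm exceeds tau."""
    return [k for k, g in enumerate(groups) if np.linalg.norm(beta[g]) > tau]
```

On simulated data with a strong group-sparse signal, `select_groups` recovers the active groups, mirroring the abstract's claim that a thresholded Group Lasso selects the sparsity pattern with high probability; in practice lam and tau must be calibrated to the noise level.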
Citations
Book
High-Dimensional Statistics: A Non-Asymptotic Viewpoint
TL;DR: This book provides a self-contained introduction to the area of high-dimensional statistics, aimed at the first-year graduate level, and includes chapters focused on core methodology and theory, including tail bounds, concentration inequalities, uniform laws and empirical processes, and random matrices.
Journal ArticleDOI
Least squares after model selection in high-dimensional sparse models
TL;DR: In this paper, post-l1-penalized estimators in high-dimensional sparse linear regression models are studied, showing that ordinary least squares applied to the model selected by the Lasso performs at least as well as the Lasso in terms of the rate of convergence.
Journal ArticleDOI
Structured sparsity through convex optimization
TL;DR: In this article, the authors consider situations where they are not only interested in sparsity, but where some structural prior knowledge is available as well, and show that the $\ell_1$-norm can then be extended to structured norms built on either disjoint or overlapping groups of variables.
Journal ArticleDOI
Sparse PCA: Optimal rates and adaptive estimation
TL;DR: In this paper, the authors considered both minimax and adaptive estimation of the principal subspace in the high-dimensional setting and established the optimal rates of convergence for estimating the subspace, which are sharp with respect to all the parameters, thus providing a complete characterization of the difficulty of the estimation problem in terms of the convergence rate.
Journal ArticleDOI
Robust inference on average treatment effects with possibly more covariates than observations
TL;DR: In this article, robust inference on average treatment effects following model selection is studied. Confidence intervals are constructed using a doubly-robust estimator that are robust to model selection errors, and their uniform validity is proved over a large class of models that allows for multivalued treatments with heterogeneous effects and selection among (possibly) more covariates than observations.
References
Book
Econometric Analysis of Cross Section and Panel Data
TL;DR: This is the essential companion to Jeffrey Wooldridge's widely-used graduate text Econometric Analysis of Cross Section and Panel Data (MIT Press, 2001).
Journal ArticleDOI
An Efficient Method of Estimating Seemingly Unrelated Regressions and Tests for Aggregation Bias
TL;DR: In this paper, a method of estimating the parameters of a set of regression equations is reported which involves application of Aitken's generalized least-squares to the whole system of equations.
Journal ArticleDOI
Model selection and estimation in regression with grouped variables
Ming Yuan, Yi Lin
TL;DR: In this paper, instead of selecting factors by stepwise backward elimination, the authors focus on the accuracy of estimation and consider extensions of the lasso, the LARS algorithm and the non-negative garrotte for factor selection.
Book
Analysis of Panel Data
TL;DR: In this book, the author develops homogeneity tests for linear regression models (analysis of covariance) and shows that, in the presence of individual heterogeneity, variable-intercept regression models yield consistent estimates where pooled regressions with a common intercept do not.
Book
Weak Convergence and Empirical Processes: With Applications to Statistics
TL;DR: In this book, the authors develop the theory of weak convergence of empirical processes, introducing the ball sigma-field and the measurability of suprema, and establishing convergence results both almost surely and in probability, with applications to statistics.