Open Access · Journal Article

Oracle Inequalities and Optimal Inference under Group Sparsity

Karim Lounici, +3 more
- 01 Aug 2011
- Vol. 39, Iss. 4, pp. 2164-2204
TLDR
In this article, the authors consider the problem of estimating a sparse linear regression vector β* under a Gaussian noise model, for the purpose of both prediction and model selection, and establish oracle inequalities for the prediction and l2 estimation errors of the Group Lasso estimator.
Abstract
We consider the problem of estimating a sparse linear regression vector β* under a Gaussian noise model, for the purpose of both prediction and model selection. We assume that prior knowledge is available on the sparsity pattern, namely the set of variables is partitioned into prescribed groups, only a few of which are relevant in the estimation process. This group sparsity assumption leads us to consider the Group Lasso method as a means to estimate β*. We establish oracle inequalities for the prediction and l2 estimation errors of this estimator. These bounds hold under a restricted eigenvalue condition on the design matrix. Under a stronger condition, we derive bounds for the estimation error for mixed (2, p)-norms with 1 ≤ p ≤ ∞. When p = ∞, this result implies that a thresholded version of the Group Lasso estimator selects the sparsity pattern of β* with high probability. Next, we prove that the rate of convergence of our upper bounds is optimal in a minimax sense, up to a logarithmic factor, for all estimators over a class of group sparse vectors. Furthermore, we establish lower bounds for the prediction and l2 estimation errors of the usual Lasso estimator. Using this result, we demonstrate that the Group Lasso can achieve an improvement in the prediction and estimation errors as compared to the Lasso. An important application of our results is provided by the problem of estimating multiple regression equations simultaneously or multi-task learning. In this case, we obtain refinements of the results in [In Proc. of the 22nd Annual Conference on Learning Theory (COLT) (2009)], which allow us to establish a quantitative advantage of the Group Lasso over the usual Lasso in the multi-task setting. Finally, within the same setting, we show how our results can be extended to more general noise distributions, of which we only require the fourth moment to be finite. To obtain this extension, we establish a new maximal moment inequality, which may be of independent interest.



Citations
Book

High-Dimensional Statistics: A Non-Asymptotic Viewpoint

TL;DR: This book provides a self-contained introduction to the area of high-dimensional statistics, aimed at the first-year graduate level, and includes chapters focused on core methodology and theory, including tail bounds, concentration inequalities, uniform laws and empirical processes, and random matrices.
Journal Article

Least squares after model selection in high-dimensional sparse models

TL;DR: In this paper, post-model-selection estimators that apply ordinary least squares to the model selected by a first-step l1-penalized (Lasso) estimator in high-dimensional sparse regression are studied and shown to perform at least as well as the Lasso in terms of the rate of convergence.
Journal Article

Structured sparsity through convex optimization

TL;DR: In this article, the authors consider situations where one is interested not only in sparsity but where some structural prior knowledge is available as well, and show that the $\ell_1$-norm can then be extended to structured norms built on either disjoint or overlapping groups of variables.
Journal Article

Sparse PCA: Optimal rates and adaptive estimation

TL;DR: In this paper, the authors consider both minimax and adaptive estimation of the principal subspace in the high-dimensional setting and establish optimal rates of convergence for estimating the subspace that are sharp with respect to all the parameters, thus providing a complete characterization of the difficulty of the estimation problem in terms of the convergence rate.
Journal Article

Robust inference on average treatment effects with possibly more covariates than observations

TL;DR: In this article, robust inference on average treatment effects following model selection is studied. The authors construct confidence intervals using a doubly robust estimator that are robust to model-selection errors and prove their uniform validity over a large class of models that allows for multivalued treatments with heterogeneous effects and selection among (possibly) more covariates than observations.
References
Book

Econometric Analysis of Cross Section and Panel Data

TL;DR: Jeffrey Wooldridge's widely used graduate text provides a systematic treatment of the econometric analysis of cross-section and panel data (MIT Press, 2001).
Journal Article

An Efficient Method of Estimating Seemingly Unrelated Regressions and Tests for Aggregation Bias

TL;DR: In this paper, a method of estimating the parameters of a set of regression equations is reported that involves applying Aitken's generalized least squares to the whole system of equations.
Journal Article

Model selection and estimation in regression with grouped variables

TL;DR: In this paper, instead of selecting factors by stepwise backward elimination, the authors focus on the accuracy of estimation and consider extensions of the lasso, the LARS algorithm and the non-negative garrotte for factor selection.
Book

Analysis of Panel Data

TL;DR: In this book, the author presents homogeneity tests for linear regression models (analysis of covariance) and develops linear regression models with variable intercepts for panel data, contrasting them with pooled regressions that impose a common intercept.
Book

Weak Convergence and Empirical Processes: With Applications to Statistics

TL;DR: In this book, the authors develop the theory of weak convergence and empirical processes, defining the ball sigma-field, treating measurability of suprema, and covering convergence almost surely and in probability, with applications to statistics.