Journal ArticleDOI

Penalized Composite Quasi-Likelihood for Ultrahigh-Dimensional Variable Selection

TL;DR: A data-driven weighted linear combination of convex loss functions with a weighted L1-penalty is proposed, and a strong oracle property is established: the proposed method achieves both model selection consistency and estimation efficiency for the true non-zero coefficients.
Abstract: In high-dimensional model selection problems, penalized least-square approaches have been extensively used. This paper addresses the question of both robustness and efficiency of penalized model selection methods, and proposes a data-driven weighted linear combination of convex loss functions, together with weighted L1-penalty. It is completely data-adaptive and does not require prior knowledge of the error distribution. The weighted L1-penalty is used both to ensure the convexity of the penalty term and to ameliorate the bias caused by the L1-penalty. In the setting with dimensionality much larger than the sample size, we establish a strong oracle property of the proposed method that possesses both the model selection consistency and estimation efficiency for the true non-zero coefficients. As specific examples, we introduce a robust method of composite L1-L2, and optimal composite quantile method and evaluate their performance in both simulated and real data examples.
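As a rough illustration of the composite idea (this is not the authors' code), the objective combining L1 and L2 losses with a weighted L1-penalty can be sketched as below; the loss weights w1, w2 and the penalty weights, which the paper selects in a data-driven way, are simply fixed here, and a generic solver stands in for the specialized convex algorithms the paper uses:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.standard_normal((n, p))
beta_true = np.array([2.0, 0.0, -1.5, 0.0, 0.0])
y = X @ beta_true + rng.standard_normal(n)

# Loss and penalty weights: chosen data-adaptively in the paper, fixed here.
w1, w2 = 0.5, 0.5
lam = 0.1
pen_w = np.ones(p)  # weighted-L1 weights (all ones for illustration)

def objective(beta):
    """Composite L1-L2 loss plus a weighted L1 penalty."""
    r = y - X @ beta
    loss = w1 * np.abs(r).mean() + w2 * (r ** 2).mean()
    return loss + lam * np.sum(pen_w * np.abs(beta))

# A generic derivative-free solver; the paper instead exploits convexity
# via LARS or pathwise coordinate optimization.
beta_hat = minimize(objective, x0=np.zeros(p), method="Powell").x
```

The recovered coefficients shrink toward zero on the noise coordinates while staying close to the true non-zero values.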


Citations
Journal ArticleDOI
TL;DR: The single-index models (SIMs) provide an efficient way of coping with high-dimensional nonparametric estimation problems and avoid the “curse of dimensionality.”
Abstract: The single-index models (SIMs) provide an efficient way of coping with high-dimensional nonparametric estimation problems and avoid the “curse of dimensionality.” Many existing estimation procedure...

3 citations

Journal ArticleDOI
TL;DR: A new regression method is proposed, called composite kernel quantile regression (CKQR), which uses the sum of multiple check functions as a loss in reproducing kernel Hilbert spaces for the robust estimation of a nonlinear regression function.
Abstract: The composite quantile regression (CQR) has been developed for the robust and efficient estimation of regression coefficients in a linear regression model. By employing the idea of the CQR, we propose a new regression method, called composite kernel quantile regression (CKQR), which uses the sum of multiple check functions as a loss in reproducing kernel Hilbert spaces for the robust estimation of a nonlinear regression function. The numerical results demonstrate the usefulness of the proposed CKQR in estimating both conditional nonlinear mean and quantile functions.
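The building block here, the check (quantile) loss and its composite sum over several quantile levels, can be sketched as follows (an illustrative definition, not code from the cited paper):

```python
import numpy as np

def check_loss(u, tau):
    """Check (pinball) loss rho_tau(u) = u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))

def composite_check_loss(u, taus):
    """CQR-style loss: the sum of check losses over several quantile levels."""
    return sum(check_loss(u, t) for t in taus)

# For residuals -1, 0, 2 and levels 0.25/0.5/0.75 the composite losses
# are 1.5, 0.0 and 3.0 respectively.
losses = composite_check_loss([-1.0, 0.0, 2.0], taus=[0.25, 0.5, 0.75])
```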

3 citations


Cites methods from "Penalized Composite Quasi-Likelihoo..."

  • ...Bradic et al. (2011) suggested a data-driven method for estimating the optimal weights, and showed that the efficiency of the weighted linear CQR can be significantly improved by employing proper weights. Further, Sun et al. (2013) proposed the weighted local linear CQR under general conditions on the random error....


Journal ArticleDOI
TL;DR: In this article, a robust post-selection inference method based on the Huber loss is proposed for the regression coefficients of a high-dimensional linear model with an intercept term, when the error distribution is heavy-tailed and asymmetric.

3 citations

01 Jan 2013
TL;DR: The methods proposed in this thesis collectively attempt to solve many of the issues arising in high dimensional statistics, from screening to variable selection, and propose a new algorithm that recovers the solution paths for a continuum of regularization parameter values.
Abstract: The aim of this thesis is to develop methods for variable selection and statistical prediction for high dimensional statistical problems. Along with proposing new and innovative procedures, this thesis also focuses on the theoretical properties of the proposed methods and establishes bounds on the statistical error of resulting estimators. The main body of the thesis is divided into three parts. In Chapter 1, a variable screening method for generalized linear models is discussed. The emphasis of the chapter is to provide a procedure to reduce the number of variables in a reliable and fast manner. Then, Chapter 2 considers the linear regression problem in high dimensions when the noise has heavy tails. To perform robust variable selection, a new method, called adaptive robust Lasso, is introduced. Finally, in Chapter 3, the subject is high dimensional classification problems. In this chapter, a robust approach for this problem is proposed and theoretical properties for this approach are established. Overall, the methods proposed in this thesis collectively attempt to solve many of the issues arising in high dimensional statistics, from screening to variable selection. In Chapter 1, we study the variable screening problem for generalized linear models. In many applications, researchers often have some prior knowledge that a certain set of variables is related to the response. In such a situation, a natural assessment on the relative importance of the other predictors is the conditional contributions of the individual predictors in presence of the known set of variables. This results in conditional sure independence screening (CSIS). We propose and study CSIS in the context of generalized linear models. For ultrahigh-dimensional statistical problems, we give conditions under which sure screening is possible and derive an upper bound on the number of selected variables. We also spell out the situation under which CSIS yields model selection consistency. 
In Chapter 2, we consider the heavy-tailed high dimensional linear regression problem. In the ultra-high dimensional setting, where the dimensionality can grow exponentially with the sample size, we investigate the model selection oracle property and establish the asymptotic normality of a quantile regression based method called WR-Lasso. We show that only mild conditions on the model error distribution are needed. Our theoretical results also reveal that adaptive choice of the weight vector is essential for the WR-Lasso to enjoy these nice asymptotic properties. To make the WR-Lasso practically feasible, we propose a two-step procedure, called adaptive robust Lasso (AR-Lasso), in which the weight vector in the second step is constructed based on the L1-penalized quantile regression estimate from the first step. In Chapter 3, we analyze the issue of measurement errors in high dimensional linear classification problems. For such settings, we propose a new estimator, called the robust sparse linear discriminant, that recovers the sparsity signal and adapts to the unknown noise level simultaneously. In contrast to the existing methods, we show that this new method retains low risk even in the presence of measurement errors. Moreover, we propose a new algorithm that recovers the solution paths for a continuum of regularization parameter values.

3 citations


Cites methods from "Penalized Composite Quasi-Likelihoo..."

  • ...In this paper, we introduce the penalized quantile regression with the weighted L1-penalty (WR-Lasso) for robust regularization, as in Bradic et al. (2011). The weights are introduced to reduce the bias problem induced by the L1-penalty, and the flexibility in their choice provides flexibility in shrinkage estimation of the regression coefficients. WR-Lasso shares a similar spirit to the folded-concave penalized quantile regression (Zou and Li, 2008; Wang et al., 2012), but avoids the nonconvex optimization problem. We establish conditions on the error distribution under which the WR-Lasso successfully recovers the true underlying sparse model with asymptotic probability one. It turns out that the required condition is much weaker than the sub-Gaussian assumption in Bradic et al. (2011). The only condition we impose is that the density function of the error has a Lipschitz property in a neighborhood of 0....


Journal ArticleDOI
TL;DR: In this article, a varying coefficient quantile regression model is proposed to model the joint distribution of functional variables over their domains and across clinical covariates, and an estimation procedure based on the alternating direction method of multipliers and propagation separation algorithms is proposed.
Abstract: Despite interest in the joint modeling of multiple functional responses such as diffusion properties in neuroimaging, robust statistical methods appropriate for this task are lacking. To address this need, we propose a varying coefficient quantile regression model able to handle bivariate functional responses. Our work supports innovative insights into biomedical data by modeling the joint distribution of functional variables over their domains and across clinical covariates. We propose an estimation procedure based on the alternating direction method of multipliers and propagation separation algorithms to estimate varying coefficients using a B-spline basis and an $L_2$ smoothness penalty that encourages interpretability. A simulation study and an application to a real-world neurodevelopmental data set demonstrate the performance of our model and the insights provided by modeling functional fractional anisotropy and mean diffusivity jointly and their association with gestational age and sex.

3 citations

References
Journal ArticleDOI
TL;DR: A new method for estimation in linear models called the lasso, which minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant, is proposed.
Abstract: SUMMARY We propose a new method for estimation in linear models. The 'lasso' minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant. Because of the nature of this constraint it tends to produce some coefficients that are exactly 0 and hence gives interpretable models. Our simulation studies suggest that the lasso enjoys some of the favourable properties of both subset selection and ridge regression. It produces interpretable models like subset selection and exhibits the stability of ridge regression. There is also an interesting relationship with recent work in adaptive function estimation by Donoho and Johnstone. The lasso idea is quite general and can be applied in a variety of statistical models: extensions to generalized regression models and tree-based models are briefly described.
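The lasso objective described above is commonly solved by cyclic coordinate descent with soft-thresholding; a minimal textbook-style sketch (not from the cited paper) is:

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding, the closed-form scalar lasso solution."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Lasso via cyclic coordinate descent.
    Minimizes (1/2n) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()                      # residual y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * beta[j]    # drop coordinate j from the fit
            z = X[:, j] @ r / n       # univariate least-squares direction
            beta[j] = soft_threshold(z, lam) / col_sq[j]
            r -= X[:, j] * beta[j]    # restore the updated fit
    return beta

rng = np.random.default_rng(1)
n, p = 200, 8
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:2] = [3.0, -2.0]
y = X @ beta_true + 0.5 * rng.standard_normal(n)
beta_hat = lasso_cd(X, y, lam=0.1)
```

As the abstract notes, the constraint tends to set some coefficients exactly to zero, which is what the soft-thresholding step produces here.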

40,785 citations

Journal ArticleDOI
TL;DR: In this article, penalized likelihood approaches are proposed to handle variable selection problems, and it is shown that the newly proposed estimators perform as well as the oracle procedure in variable selection; namely, they work as well if the correct submodel were known.
Abstract: Variable selection is fundamental to high-dimensional statistical modeling, including nonparametric regression. Many approaches in use are stepwise selection procedures, which can be computationally expensive and ignore stochastic errors in the variable selection process. In this article, penalized likelihood approaches are proposed to handle these kinds of problems. The proposed methods select variables and estimate coefficients simultaneously. Hence they enable us to construct confidence intervals for estimated parameters. The proposed approaches are distinguished from others in that the penalty functions are symmetric, nonconcave on (0, ∞), and have singularities at the origin to produce sparse solutions. Furthermore, the penalty functions should be bounded by a constant to reduce bias and satisfy certain conditions to yield continuous solutions. A new algorithm is proposed for optimizing penalized likelihood functions. The proposed ideas are widely applicable. They are readily applied to a variety of ...
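The canonical penalty of this kind is the SCAD function of Fan and Li (2001); a small sketch of its value function, using the conventional choice a = 3.7:

```python
import numpy as np

def scad_penalty(t, lam, a=3.7):
    """SCAD penalty (Fan and Li, 2001): linear near zero (sparsity),
    quadratic in between, constant for large |t| (reduced bias)."""
    t = np.abs(t)
    return np.where(
        t <= lam,
        lam * t,
        np.where(t <= a * lam,
                 (2 * a * lam * t - t ** 2 - lam ** 2) / (2 * (a - 1)),
                 lam ** 2 * (a + 1) / 2),
    )
```

The constant tail is what bounds the penalty and reduces the bias on large coefficients, in contrast to the L1 penalty, which keeps growing.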

8,314 citations

Journal ArticleDOI
TL;DR: A publicly available algorithm that requires only the same order of magnitude of computational effort as ordinary least squares applied to the full set of covariates is described.
Abstract: The purpose of model selection algorithms such as All Subsets, Forward Selection and Backward Elimination is to choose a linear model on the basis of the same set of data to which the model will be applied. Typically we have available a large collection of possible covariates from which we hope to select a parsimonious set for the efficient prediction of a response variable. Least Angle Regression (LARS), a new model selection algorithm, is a useful and less greedy version of traditional forward selection methods. Three main properties are derived: (1) A simple modification of the LARS algorithm implements the Lasso, an attractive version of ordinary least squares that constrains the sum of the absolute regression coefficients; the LARS modification calculates all possible Lasso estimates for a given problem, using an order of magnitude less computer time than previous methods. (2) A different LARS modification efficiently implements Forward Stagewise linear regression, another promising new model selection method; this connection explains the similar numerical results previously observed for the Lasso and Stagewise, and helps us understand the properties of both methods, which are seen as constrained versions of the simpler LARS algorithm. (3) A simple approximation for the degrees of freedom of a LARS estimate is available, from which we derive a Cp estimate of prediction error; this allows a principled choice among the range of possible LARS estimates. LARS and its variants are computationally efficient: the paper describes a publicly available algorithm that requires only the same order of magnitude of computational effort as ordinary least squares applied to the full set of covariates.

7,828 citations


"Penalized Composite Quasi-Likelihoo..." refers background or methods in this paper

  • ...(16) can be recast as the penalized weighted least-squares regression
$$\arg\min_{\beta}\ \sum_{i=1}^{n}\left[\, w_1\left|Y_i - X_i^{T}\hat\beta^{(0)}\right| + w_2\left(Y_i - X_i^{T}\beta\right)^{2}\right] + n\sum_{j=1}^{p}\gamma_\lambda\!\left(|\beta_j^{(0)}|\right)|\beta_j|$$
which can be efficiently solved by pathwise coordinate optimization (Friedman et al., 2008) or least angle regression (Efron et al., 2004)....


  • ...) are all nonnegative. This class of problems can be solved with fast and efficient computational algorithms such as pathwise coordinate optimization (Friedman et al., 2008) and least angle regression (Efron et al., 2004). One particular example is the combination of $L_1$ and $L_2$ regressions, in which $K = 2$, $\rho_1(t) = |t - b_0|$ and $\rho_2(t) = t^2$. Here $b_0$ denotes the median of the error distribution $\varepsilon$. If the error distribution is sym...


  • ...$\sum_{i=1}^{n}\left[\, w_1\left|Y_i - X_i^{T}\hat\beta^{(0)}\right| + w_2\left(Y_i - X_i^{T}\beta\right)^{2}\right] + n\sum_{j=1}^{p}\gamma_\lambda(|\beta_j^{(0)}|)|\beta_j|$, which can be efficiently solved by pathwise coordinate optimization (Friedman et al., 2008) or least angle regression (Efron et al., 2004). If $b_0 \neq 0$, the penalized least-squares problem (16) is somewhat different from (5) since we have an additional parameter $b_0$. Using the same arguments, and treating $b_0$ as an additional parameter ...


Journal ArticleDOI
Hui Zou1
TL;DR: A new version of the lasso is proposed, called the adaptive lasso, where adaptive weights are used for penalizing different coefficients in the ℓ1 penalty, and the nonnegative garotte is shown to be consistent for variable selection.
Abstract: The lasso is a popular technique for simultaneous estimation and variable selection. Lasso variable selection has been shown to be consistent under certain conditions. In this work we derive a necessary condition for the lasso variable selection to be consistent. Consequently, there exist certain scenarios where the lasso is inconsistent for variable selection. We then propose a new version of the lasso, called the adaptive lasso, where adaptive weights are used for penalizing different coefficients in the l1 penalty. We show that the adaptive lasso enjoys the oracle properties; namely, it performs as well as if the true underlying model were given in advance. Similar to the lasso, the adaptive lasso is shown to be near-minimax optimal. Furthermore, the adaptive lasso can be solved by the same efficient algorithm for solving the lasso. We also discuss the extension of the adaptive lasso in generalized linear models and show that the oracle properties still hold under mild regularity conditions. As a bypro...
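The adaptive weighting step can be sketched in a few lines (illustrative only; the pilot estimator and the exponent gamma are choices, not values prescribed by the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 300, 4
X = rng.standard_normal((n, p))
beta_true = np.array([4.0, 0.0, -2.0, 0.0])
y = X @ beta_true + rng.standard_normal(n)

# Pilot fit (OLS here; any consistent initial estimator serves this role).
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Adaptive weights w_j = 1 / |beta_pilot_j|**gamma: coefficients the pilot
# fit judges to be large are penalized lightly, near-zero ones heavily.
gamma = 1.0
weights = 1.0 / np.abs(beta_ols) ** gamma
```

The weighted L1 penalty sum(weights * |beta|) then replaces the plain L1 penalty in the lasso objective, which is what yields the oracle behavior described above.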

6,765 citations

Journal ArticleDOI
TL;DR: In this article, a new approach toward a theory of robust estimation is presented, which treats in detail the asymptotic theory of estimating a location parameter for contaminated normal distributions, and exhibits estimators that are asymptotically most robust (in a sense to be specified) among all translation invariant estimators.
Abstract: This paper contains a new approach toward a theory of robust estimation; it treats in detail the asymptotic theory of estimating a location parameter for contaminated normal distributions, and exhibits estimators—intermediaries between sample mean and sample median—that are asymptotically most robust (in a sense to be specified) among all translation invariant estimators. For the general background, see Tukey (1960) (p. 448 ff.)
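The loss underlying these intermediary estimators is quadratic near zero and linear in the tails; a minimal sketch (delta = 1.345 is a common default in the robustness literature, not a value from this paper):

```python
import numpy as np

def huber(u, delta=1.345):
    """Huber loss: quadratic for |u| <= delta, linear beyond, capping the
    influence of outliers on a location or regression estimate."""
    a = np.abs(u)
    return np.where(a <= delta, 0.5 * u ** 2, delta * (a - 0.5 * delta))
```

Minimizing this loss over a location parameter interpolates between the sample mean (delta large) and the sample median (delta small), matching the "intermediaries" described in the abstract.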

5,628 citations