Journal ArticleDOI

Penalized Composite Quasi-Likelihood for Ultrahigh-Dimensional Variable Selection

TL;DR: A data-driven weighted linear combination of convex loss functions, together with a weighted L1-penalty, is proposed, and a strong oracle property of the proposed method is established: it enjoys both model selection consistency and estimation efficiency for the true non-zero coefficients.
Abstract: In high-dimensional model selection problems, penalized least-square approaches have been extensively used. This paper addresses the question of both robustness and efficiency of penalized model selection methods, and proposes a data-driven weighted linear combination of convex loss functions, together with weighted L1-penalty. It is completely data-adaptive and does not require prior knowledge of the error distribution. The weighted L1-penalty is used both to ensure the convexity of the penalty term and to ameliorate the bias caused by the L1-penalty. In the setting with dimensionality much larger than the sample size, we establish a strong oracle property of the proposed method that possesses both the model selection consistency and estimation efficiency for the true non-zero coefficients. As specific examples, we introduce a robust method of composite L1-L2, and optimal composite quantile method and evaluate their performance in both simulated and real data examples.
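
The paper's composite L1-L2 example (a weighted sum of an absolute-deviation loss and a squared-error loss, plus a weighted L1-penalty) can be written down directly as a convex program. The sketch below, assuming CVXPY is available, is an illustration rather than the authors' implementation: the loss weights w1 and w2, the penalty level lam, and the per-coefficient penalty weights gamma are fixed placeholders here, whereas the paper chooses them in a data-driven way.

```python
# Minimal sketch of a composite L1-L2 loss with a weighted L1-penalty (illustrative only).
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + rng.standard_t(df=3, size=n)      # heavy-tailed errors

w1, w2 = 0.5, 0.5        # loss-combination weights (placeholders, not data-driven here)
lam = 0.5                # overall penalty level (placeholder)
gamma = np.ones(p)       # per-coefficient penalty weights for the weighted L1-penalty

beta = cp.Variable(p)
b0 = cp.Variable()       # shift parameter: the median of the error distribution
resid = y - X @ beta
loss = w1 * cp.sum(cp.abs(resid - b0)) + w2 * cp.sum_squares(resid)
penalty = lam * n * cp.sum(cp.multiply(gamma, cp.abs(beta)))
cp.Problem(cp.Minimize(loss + penalty)).solve()
print(np.round(beta.value, 2))   # coefficients outside the true support are shrunk toward zero
```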


Citations
Journal ArticleDOI
TL;DR: In this paper, a regularized estimator under the Lq-loss combined with a weighted penalty function is proposed and its properties, such as the model-selection oracle property and asymptotic normality, are studied.

1 citation

Posted Content
TL;DR: A robust variable selection procedure using a divergence based M-estimator combined with a penalty function that produces robust estimates of the regression parameters and simultaneously selects the important explanatory variables is proposed.
Abstract: We propose a robust variable selection procedure using a divergence based M-estimator combined with a penalty function. It produces robust estimates of the regression parameters and simultaneously selects the important explanatory variables. An efficient algorithm based on the quadratic approximation of the estimating equation is constructed. The asymptotic distribution and the influence function of the regression coefficients are derived. The widely used model selection procedures based on the Mallows's $C_p$ statistic and Akaike information criterion (AIC) often show very poor performance in the presence of heavy-tailed error or outliers. For this purpose, we introduce robust versions of these information criteria based on our proposed method. The simulation studies show that the robust variable selection technique outperforms the classical likelihood-based techniques in the presence of outliers. The performance of the proposed method is also explored through the real data analysis.
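
The abstract does not name the specific divergence, so the short sketch below uses the Huber loss purely as a stand-in robust loss to illustrate the general shape of a penalized robust M-estimator; the tuning constant and penalty level are illustrative, not the authors' choices.

```python
# Minimal sketch of penalized robust M-estimation (Huber loss as a stand-in, L1 penalty).
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
n, p = 80, 15
X = rng.standard_normal((n, p))
beta_true = np.concatenate([[1.5, -2.0, 1.0], np.zeros(p - 3)])
y = X @ beta_true + rng.standard_normal(n)
y[:5] += 15.0                                   # a few gross outliers in the response

lam = 1.0                                       # penalty level (illustrative)
beta = cp.Variable(p)
objective = cp.sum(cp.huber(y - X @ beta, M=1.345)) + lam * cp.norm1(beta)
cp.Problem(cp.Minimize(objective)).solve()
print(np.round(beta.value, 2))                  # the outliers have limited influence on the fit
```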

1 citation


Cites methods from "Penalized Composite Quasi-Likelihoo..."

  • ...Fan et al. (2014), Bradic et al. (2011) introduced the penalized quantile regression with the weighted L1-penalty for robust regularization....

DissertationDOI
01 Jan 2016
TL;DR: Gillen et al., as discussed by the authors, proposed a data-driven approach to the problem of variable selection in econometric models of discrete choice estimated using aggregate data, and applied penalized estimation algorithms imported from the machine learning literature along with confidence intervals that are robust to variable selection.
Abstract: This dissertation comprises three essays in Econometrics and Political Economy offering both methodological and substantive contributions to the study of electoral coalitions (Chapter 2), the effectiveness of campaign expenditures (Chapter 3), and the general practice of experimentation (Chapter 4). Chapter 2 presents an empirical investigation of coalition formation in elections. Despite its prevalence in most democracies, there is little evidence documenting the impact of electoral coalition formation on election outcomes. To address this imbalance, I develop and estimate a structural model of electoral competition that enables me to conduct counterfactual analyses of election outcomes under alternative coalitional scenarios. The results uncover substantial equilibrium savings in campaign expenditures from coalition formation, as well as significant electoral gains benefitting electorally weaker partners. Chapter 3, co-authored with Benjamin J. Gillen, Hyungsik Roger Moon, and Matthew Shum, proposes a novel data-driven approach to the problem of variable selection in econometric models of discrete choice estimated using aggregate data. Our approach applies penalized estimation algorithms imported from the machine learning literature along with confidence intervals that are robust to variable selection. We illustrate our approach with an application that explores the effect of campaign expenditures on candidate vote shares in data from Mexican elections. Chapter 4, co-authored with Abhijit Banerjee, Sylvain Chassang, and Erik Snowberg, provides a decision-theoretic framework in which to study the question of optimal experiment design. We model experimenters as ambiguity-averse decision makers who trade off their own subjective expected payoff against that of an adversarial audience. We establish that ambiguity aversion is required for randomized controlled trials to be optimal. We also use this framework to shed light on the important practical questions of rerandomization and resampling.

1 citation


Additional excerpts

  • ...Fan and Li (2001), Zou and Li (2008), Bradic et al. (2011), and Fan and Lv (2011) propose methods for analyzing models defined by quasi-likelihood....

Posted Content
TL;DR: In this article, the authors define mean absolute correlation (MAC) to measure the overall dependence level and investigate a family of estimators for their performances in the full range of MAC.
Abstract: Estimating the proportion of signals hidden in a large number of noise variables is of interest in many scientific inquiries. In this paper, we consider realistic but theoretically challenging settings with arbitrary covariance dependence between variables. We define mean absolute correlation (MAC) to measure the overall dependence level and investigate a family of estimators for their performances in the full range of MAC. We make explicit the joint effect of MAC dependence and signal sparsity on the performances of the family of estimators and discover that no single estimator in the family is most powerful across all MAC dependence levels. Informed by the theoretical insight, we propose a new estimator to better adapt to arbitrary covariance dependence. The proposed method compares favorably to several existing methods in extensive finite-sample settings with strong to weak covariance dependence and real dependence structures from genetic association studies.
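
One natural reading of such an overall dependence measure is the average absolute off-diagonal sample correlation; the exact definition used in the paper may differ, so the sketch below is only an assumption-labelled illustration.

```python
# Minimal sketch of a mean-absolute-correlation style summary (assumed definition:
# the average |correlation| over distinct pairs of variables).
import numpy as np

def mean_abs_correlation(X):
    """Average absolute off-diagonal sample correlation of the columns of X (n x p)."""
    R = np.corrcoef(X, rowvar=False)
    p = R.shape[0]
    return np.mean(np.abs(R[np.triu_indices(p, k=1)]))

rng = np.random.default_rng(2)
p, n, rho = 50, 200, 0.5
cov = (1 - rho) * np.eye(p) + rho * np.ones((p, p))     # equicorrelated design
X = rng.multivariate_normal(np.zeros(p), cov, size=n)
print(round(mean_abs_correlation(X), 2))                # close to 0.5 for this design
```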

1 citation

Posted Content
TL;DR: A new variable selection procedure is developed to control over-selection of the noise variables ranking after the last relevant variable, and, at the same time, retain a high proportion of relevant variables ranking before the first noise variable.
Abstract: Among the most popular variable selection procedures in high-dimensional regression, Lasso provides a solution path to rank the variables and determines a cut-off position on the path to select variables and estimate coefficients. In this paper, we consider variable selection from a new perspective motivated by the frequently occurring phenomenon that relevant variables are not completely distinguishable from noise variables on the solution path. We propose to characterize the positions of the first noise variable and the last relevant variable on the path. We then develop a new variable selection procedure to control over-selection of the noise variables ranking after the last relevant variable and, at the same time, retain a high proportion of relevant variables ranking before the first noise variable. Our procedure utilizes the recently developed covariance test statistic and Q statistic in post-selection inference. In numerical examples, our method compares favorably with other existing methods in selection accuracy and the ability to interpret its results.
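
The ranking idea behind the procedure, where variables are ordered by where they first become active on the Lasso path and one asks where the first noise variable and the last relevant variable sit, can be illustrated with a short sketch. It uses scikit-learn's lasso_path on a grid of penalty levels and does not implement the paper's covariance-test or Q-statistic selection rule.

```python
# Minimal sketch: locate the first noise variable and the last relevant variable on a Lasso path.
import numpy as np
from sklearn.linear_model import lasso_path

rng = np.random.default_rng(3)
n, p, s = 100, 40, 5                                   # the first s variables are relevant
X = rng.standard_normal((n, p))
beta = np.concatenate([rng.uniform(1, 2, s), np.zeros(p - s)])
y = X @ beta + rng.standard_normal(n)

alphas, coefs, _ = lasso_path(X, y, n_alphas=200)      # coefs has shape (p, n_alphas)
# Grid index at which each coefficient first becomes nonzero (len(alphas) if it never does).
entry = np.array([np.argmax(np.abs(coefs[j]) > 0) if np.any(coefs[j] != 0) else len(alphas)
                  for j in range(p)])
print("first noise variable becomes active at grid point", entry[s:].min())
print("last relevant variable becomes active at grid point", entry[:s].max())
```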

1 citation


Cites background or methods from "Penalized Composite Quasi-Likelihoo..."

  • ...Table 4 reports the locations on the solution path of the variables identified in Bradic et al. (2011)....

  • ...Bradic et al. (2011) studied the same samples for eQTL mapping but only focused on cis-eQTLs. Therefore, the numbers of SNPs included in their analysis are much smaller with p = 1955, 1978, 2146 for the three populations, respectively. More SNP variables are identified in Bradic et al. (2011) for each population due to larger ratio of sample size to dimension....

  • ...Knots are denoted by λ1 ≥ λ2 ≥ · · · ≥ λm ≥ 0, where m = min(n − 1, p) is the length of the solution path (Efron et al., 2004). Recent developments in high-dimensional regression focus on hypothesis testing for variable selection. Impressive progress has been made in Zhang & Zhang (2014), Van De Geer et al. (2014), Lockhart et al. (2014), Barber & Candès (2015), Bogdan et al....

References
Journal ArticleDOI
TL;DR: A new method for estimation in linear models called the lasso, which minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant, is proposed.
Abstract: We propose a new method for estimation in linear models. The 'lasso' minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant. Because of the nature of this constraint it tends to produce some coefficients that are exactly 0 and hence gives interpretable models. Our simulation studies suggest that the lasso enjoys some of the favourable properties of both subset selection and ridge regression. It produces interpretable models like subset selection and exhibits the stability of ridge regression. There is also an interesting relationship with recent work in adaptive function estimation by Donoho and Johnstone. The lasso idea is quite general and can be applied in a variety of statistical models: extensions to generalized regression models and tree-based models are briefly described.
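
In the penalized (Lagrangian) form that most software implements, the same estimator minimizes the residual sum of squares plus a multiple of the sum of absolute coefficients; the short sketch below, using scikit-learn's Lasso, shows the characteristic exact zeros. The penalty level is illustrative.

```python
# Minimal sketch of the lasso in penalized form: 1/(2n) * ||y - X b||^2 + alpha * ||b||_1.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(4)
n, p = 100, 10
X = rng.standard_normal((n, p))
beta_true = np.array([3.0, -2.0, 0, 0, 1.5, 0, 0, 0, 0, 0])
y = X @ beta_true + rng.standard_normal(n)

fit = Lasso(alpha=0.2).fit(X, y)
print(np.round(fit.coef_, 2))        # many coefficients are exactly zero
```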

40,785 citations

Journal ArticleDOI
TL;DR: In this article, penalized likelihood approaches are proposed to handle variable selection problems, and it is shown that the newly proposed estimators perform as well as the oracle procedure in variable selection; namely, they work as well as if the correct submodel were known.
Abstract: Variable selection is fundamental to high-dimensional statistical modeling, including nonparametric regression. Many approaches in use are stepwise selection procedures, which can be computationally expensive and ignore stochastic errors in the variable selection process. In this article, penalized likelihood approaches are proposed to handle these kinds of problems. The proposed methods select variables and estimate coefficients simultaneously. Hence they enable us to construct confidence intervals for estimated parameters. The proposed approaches are distinguished from others in that the penalty functions are symmetric, nonconcave on (0, ∞), and have singularities at the origin to produce sparse solutions. Furthermore, the penalty functions should be bounded by a constant to reduce bias and satisfy certain conditions to yield continuous solutions. A new algorithm is proposed for optimizing penalized likelihood functions. The proposed ideas are widely applicable. They are readily applied to a variety of ...
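
The SCAD penalty proposed in this paper is the standard concrete example of such a penalty: it grows like the L1 penalty near zero (producing sparsity) but levels off to a constant, which reduces the bias on large coefficients. A minimal sketch of the commonly used form, with the suggested a = 3.7, is below.

```python
# Minimal sketch of the SCAD penalty p_lambda(|t|) of Fan and Li (2001).
import numpy as np

def scad_penalty(t, lam, a=3.7):
    """SCAD penalty, applied elementwise to |t|."""
    t = np.abs(np.asarray(t, dtype=float))
    small = lam * t                                          # linear near zero, like L1
    middle = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))
    large = lam**2 * (a + 1) / 2                             # constant for large |t|
    return np.where(t <= lam, small, np.where(t <= a * lam, middle, large))

lam = 1.0
print(np.round(scad_penalty([0.5, 1.0, 2.0, 5.0], lam), 3))
# [0.5, 1.0, 1.815, 2.35]: the penalty is flat once |t| exceeds a * lam
```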

8,314 citations

Journal ArticleDOI
TL;DR: A publicly available algorithm that requires only the same order of magnitude of computational effort as ordinary least squares applied to the full set of covariates is described.
Abstract: The purpose of model selection algorithms such as All Subsets, Forward Selection and Backward Elimination is to choose a linear model on the basis of the same set of data to which the model will be applied. Typically we have available a large collection of possible covariates from which we hope to select a parsimonious set for the efficient prediction of a response variable. Least Angle Regression (LARS), a new model selection algorithm, is a useful and less greedy version of traditional forward selection methods. Three main properties are derived: (1) A simple modification of the LARS algorithm implements the Lasso, an attractive version of ordinary least squares that constrains the sum of the absolute regression coefficients; the LARS modification calculates all possible Lasso estimates for a given problem, using an order of magnitude less computer time than previous methods. (2) A different LARS modification efficiently implements Forward Stagewise linear regression, another promising new model selection method; this connection explains the similar numerical results previously observed for the Lasso and Stagewise, and helps us understand the properties of both methods, which are seen as constrained versions of the simpler LARS algorithm. (3) A simple approximation for the degrees of freedom of a LARS estimate is available, from which we derive a Cp estimate of prediction error; this allows a principled choice among the range of possible LARS estimates. LARS and its variants are computationally efficient: the paper describes a publicly available algorithm that requires only the same order of magnitude of computational effort as ordinary least squares applied to the full set of covariates.
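
A minimal usage sketch, assuming scikit-learn's lars_path implementation: the lasso modification of LARS returns the breakpoints (knots) of the regularization path and the coefficients at each breakpoint in a single pass.

```python
# Minimal sketch: compute the full Lasso path with LARS and inspect its breakpoints.
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(5)
n, p = 60, 12
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.standard_normal(n)

alphas, active, coefs = lars_path(X, y, method="lasso")
print("number of breakpoints returned:", len(alphas))
print("active set at the end of the path:", list(active))
print("coefficients at the smallest breakpoint:", np.round(coefs[:, -1], 2))
```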

7,828 citations


"Penalized Composite Quasi-Likelihoo..." refers background or methods in this paper

  • ...(16) can be recast as a penalized weighted least-squares regression $\operatorname*{argmin}_{\beta}\ \sum_{i=1}^{n}\bigl[\,w_1\,\bigl|Y_i - X_i^{T}\hat\beta^{(0)}\bigr| + w_2\,\bigl(Y_i - X_i^{T}\beta\bigr)^{2}\bigr] + n\sum_{j=1}^{p}\gamma_\lambda\bigl(|\beta_j^{(0)}|\bigr)\,|\beta_j|$, which can be efficiently solved by pathwise coordinate optimization (Friedman et al., 2008) or least angle regression (Efron et al., 2004)....

  • ...) are all nonnegative. This class of problems can be solved with fast and efficient computational algorithms such as pathwise coordinate optimization (Friedman et al., 2008) and least angle regression (Efron et al., 2004). One particular example is the combination of L1 and L2 regressions, in which K = 2, ρ1(t) = |t − b0| and ρ2(t) = t². Here b0 denotes the median of the error distribution ε. If the error distribution is sym...

  • ...$\sum_{i=1}^{n}\bigl[\,w_1\,\bigl|Y_i - X_i^{T}\hat\beta^{(0)}\bigr| + w_2\,\bigl(Y_i - X_i^{T}\beta\bigr)^{2}\bigr] + n\sum_{j=1}^{p}\gamma_\lambda\bigl(|\beta_j^{(0)}|\bigr)\,|\beta_j|$, which can be efficiently solved by pathwise coordinate optimization (Friedman et al., 2008) or least angle regression (Efron et al., 2004). If b0 ≠ 0, the penalized least-squares problem (16) is somewhat different from (5) since we have an additional parameter b0. Using the same arguments, and treating b0 as an additional parameter ...

Journal ArticleDOI
Hui Zou
TL;DR: A new version of the lasso is proposed, called the adaptive lasso, where adaptive weights are used for penalizing different coefficients in the ℓ1 penalty, and the nonnegative garotte is shown to be consistent for variable selection.
Abstract: The lasso is a popular technique for simultaneous estimation and variable selection. Lasso variable selection has been shown to be consistent under certain conditions. In this work we derive a necessary condition for the lasso variable selection to be consistent. Consequently, there exist certain scenarios where the lasso is inconsistent for variable selection. We then propose a new version of the lasso, called the adaptive lasso, where adaptive weights are used for penalizing different coefficients in the l1 penalty. We show that the adaptive lasso enjoys the oracle properties; namely, it performs as well as if the true underlying model were given in advance. Similar to the lasso, the adaptive lasso is shown to be near-minimax optimal. Furthermore, the adaptive lasso can be solved by the same efficient algorithm for solving the lasso. We also discuss the extension of the adaptive lasso in generalized linear models and show that the oracle properties still hold under mild regularity conditions. As a bypro...
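
The adaptive lasso can be fitted with ordinary lasso software through a simple column rescaling: compute weights from a pilot estimate, divide each column of X by its weight, run the lasso, and rescale the coefficients back. The sketch below uses an OLS pilot fit and gamma = 1 as illustrative choices.

```python
# Minimal sketch of the adaptive lasso via column rescaling.
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(6)
n, p = 200, 10
X = rng.standard_normal((n, p))
beta_true = np.array([3.0, -2.0, 1.5, 0, 0, 0, 0, 0, 0, 0])
y = X @ beta_true + rng.standard_normal(n)

beta_init = LinearRegression().fit(X, y).coef_   # pilot estimate (OLS is fine since n > p)
gamma = 1.0
w = 1.0 / (np.abs(beta_init) ** gamma + 1e-8)    # adaptive weights: large for small pilot coefficients
fit = Lasso(alpha=0.1).fit(X / w, y)             # lasso on rescaled design = weighted L1 penalty
beta_adaptive = fit.coef_ / w                    # map back to the original scale
print(np.round(beta_adaptive, 2))
```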

6,765 citations

Journal ArticleDOI
TL;DR: In this article, a new approach toward a theory of robust estimation is presented, which treats in detail the asymptotic theory of estimating a location parameter for contaminated normal distributions, and exhibits estimators that are asymptotically most robust (in a sense to be specified) among all translation invariant estimators.
Abstract: This paper contains a new approach toward a theory of robust estimation; it treats in detail the asymptotic theory of estimating a location parameter for contaminated normal distributions, and exhibits estimators—intermediaries between sample mean and sample median—that are asymptotically most robust (in a sense to be specified) among all translation invariant estimators. For the general background, see Tukey (1960) (p. 448 ff.)
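
A minimal sketch of the kind of location estimator the paper studies, "between" the sample mean and the sample median: a Huber M-estimate computed by iteratively reweighted averaging. The tuning constant and the fixed MAD-based scale are illustrative simplifications.

```python
# Minimal sketch of a Huber M-estimate of location via iteratively reweighted averaging.
import numpy as np

def huber_location(x, c=1.345, n_iter=50):
    mu = np.median(x)                               # start at the median
    scale = np.median(np.abs(x - mu)) / 0.6745      # robust scale (MAD), kept fixed here
    for _ in range(n_iter):
        r = (x - mu) / scale
        w = np.minimum(1.0, c / np.maximum(np.abs(r), 1e-12))   # Huber weights
        mu = np.sum(w * x) / np.sum(w)
    return mu

rng = np.random.default_rng(7)
x = np.concatenate([rng.normal(0.0, 1.0, 95), rng.normal(10.0, 1.0, 5)])  # 5% contamination
print(round(np.mean(x), 2), round(np.median(x), 2), round(huber_location(x), 2))
```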

5,628 citations