Journal ArticleDOI

Penalized Composite Quasi-Likelihood for Ultrahigh-Dimensional Variable Selection

TL;DR: A data-driven weighted linear combination of convex loss functions, together with a weighted L1-penalty, is proposed, and a strong oracle property of the proposed method is established: it possesses both model selection consistency and estimation efficiency for the true non-zero coefficients.
Abstract: In high-dimensional model selection problems, penalized least-squares approaches have been extensively used. This paper addresses the question of both robustness and efficiency of penalized model selection methods, and proposes a data-driven weighted linear combination of convex loss functions, together with a weighted L1-penalty. It is completely data-adaptive and does not require prior knowledge of the error distribution. The weighted L1-penalty is used both to ensure the convexity of the penalty term and to ameliorate the bias caused by the L1-penalty. In the setting with dimensionality much larger than the sample size, we establish a strong oracle property of the proposed method that possesses both the model selection consistency and estimation efficiency for the true non-zero coefficients. As specific examples, we introduce a robust composite L1-L2 method and an optimal composite quantile method, and evaluate their performance in both simulated and real data examples.
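To make the construction concrete, the sketch below illustrates the composite quantile regression special case of the composite quasi-likelihood: a weighted combination of check losses over several quantile levels with shared slopes, plus a weighted L1 penalty, solved as a convex program. This is only a minimal illustration in cvxpy, not the authors' implementation; the quantile grid, equal loss weights, penalty level, and simulated data are arbitrary choices for demonstration.

```python
# Minimal sketch of penalized (equally weighted) composite quantile regression:
# a weighted sum of check losses over several quantile levels, common slopes
# beta, level-specific intercepts b_k, and a weighted L1 penalty.
# Illustrative only; quantile grid, loss weights, and lambda are arbitrary here.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + rng.standard_t(df=3, size=n)   # heavy-tailed errors

taus = np.arange(0.1, 1.0, 0.1)           # quantile levels 0.1, ..., 0.9
w_loss = np.ones(len(taus)) / len(taus)   # equal loss weights (EWCQR)
lam = 0.5
w_pen = np.ones(p)                        # weighted L1; could hold adaptive weights

beta = cp.Variable(p)
b = cp.Variable(len(taus))                # one intercept per quantile level

def check_loss(r, tau):
    # rho_tau(r) = 0.5*|r| + (tau - 0.5)*r, the usual quantile check loss
    return cp.sum(0.5 * cp.abs(r) + (tau - 0.5) * r)

loss = sum(w_loss[k] * check_loss(y - X @ beta - b[k], taus[k])
           for k in range(len(taus)))
penalty = lam * cp.sum(cp.multiply(w_pen, cp.abs(beta)))
cp.Problem(cp.Minimize(loss / n + penalty)).solve()

print("first estimated coefficients:", np.round(beta.value, 2)[:5])
```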


Citations
Journal Article
TL;DR: In this paper, a brief account of the recent developments of theory, methods, and implementations for high-dimensional variable selection is presented, with emphasis on independence screening and two-scale methods.
Abstract: High dimensional statistical problems arise from diverse fields of scientific research and technological development. Variable selection plays a pivotal role in contemporary statistical learning and scientific discoveries. The traditional idea of best subset selection methods, which can be regarded as a specific form of penalized likelihood, is computationally too expensive for many modern statistical applications. Other forms of penalized likelihood methods have been successfully developed over the last decade to cope with high dimensionality. They have been widely applied for simultaneously selecting important variables and estimating their effects in high dimensional statistical inference. In this article, we present a brief account of the recent developments of theory, methods, and implementations for high dimensional variable selection. What limits of the dimensionality such methods can handle, what the role of penalty functions is, and what the statistical properties are rapidly drive the advances of the field. The properties of non-concave penalized likelihood and its roles in high dimensional statistical modeling are emphasized. We also review some recent advances in ultra-high dimensional variable selection, with emphasis on independence screening and two-scale methods.
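As a concrete illustration of the independence screening step mentioned above, the sketch below ranks predictors by the magnitude of their marginal correlation with the response and keeps only the top d before any joint penalized fit. It is a schematic of the screening idea only; the threshold d = n/log n is a common rule of thumb and the data are simulated, so it is not a faithful reproduction of any particular method in the cited review.

```python
# Schematic of marginal (sure) independence screening: rank each predictor by
# |corr(X_j, y)| and keep the top d before running a penalized regression on
# the reduced set. The choice d = n / log(n) is an assumed rule of thumb.
import numpy as np

def screen_by_correlation(X, y, d):
    """Return indices of the d predictors with largest |marginal correlation|."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    corr = np.abs(Xc.T @ yc) / denom
    return np.argsort(corr)[::-1][:d]

rng = np.random.default_rng(1)
n, p = 200, 5000                      # ultrahigh-dimensional: p >> n
X = rng.standard_normal((n, p))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.standard_normal(n)

keep = screen_by_correlation(X, y, d=int(n / np.log(n)))
print("screened-in set contains the true signals 0 and 1:", {0, 1} <= set(keep))
```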

892 citations

Journal ArticleDOI
TL;DR: This work proposes adaptive penalization methods for variable selection in the semiparametric varying-coefficient partially linear model and proves that the methods possess the oracle property.
Abstract: The complexity of semiparametric models poses new challenges to statistical inference and model selection that frequently arise from real applications. In this work, we propose new estimation and variable selection procedures for the semiparametric varying-coefficient partially linear model. We first study quantile regression estimates for the nonparametric varying-coefficient functions and the parametric regression coefficients. To achieve nice efficiency properties, we further develop a semiparametric composite quantile regression procedure. We establish the asymptotic normality of proposed estimators for both the parametric and nonparametric parts and show that the estimators achieve the best convergence rate. Moreover, we show that the proposed method is much more efficient than the least-squares-based method for many non-normal errors and that it only loses a small amount of efficiency for normal errors. In addition, it is shown that the loss in efficiency is at most 11.1% for estimating varying coefficient functions and is no greater than 13.6% for estimating parametric components. To achieve sparsity with high-dimensional covariates, we propose adaptive penalization methods for variable selection in the semiparametric varying-coefficient partially linear model and prove that the methods possess the oracle property. Extensive Monte Carlo simulation studies are conducted to examine the finite-sample performance of the proposed procedures. Finally, we apply the new methods to analyze the plasma beta-carotene level data.

265 citations


Cites methods from "Penalized Composite Quasi-Likelihoo..."

  • ...Moreover, we show that the proposed method is much more efficient than the least-squares-based method for many non-normal errors and that it only loses a small amount of efficiency for normal errors....

    [...]

Journal ArticleDOI
TL;DR: This paper reviews the literature on sparse high-dimensional models and discusses applications in economics and finance, highlighting variable selection methods that have proved effective in high-dimensional sparse modeling.
Abstract: This article reviews the literature on sparse high-dimensional models and discusses some applications in economics and finance. Recent developments in theory, methods, and implementations in penalized least-squares and penalized likelihood methods are highlighted. These variable selection methods are effective in sparse high-dimensional modeling. The limits of dimensionality that regularization methods can handle, the role of penalty functions, and their statistical properties are detailed. Some recent advances in sparse ultra-high-dimensional modeling are also briefly discussed.

228 citations

Journal ArticleDOI
TL;DR: In this article, a principal factor approximation (PFA) based method was proposed to solve the problem of false discovery control in large-scale multiple hypothesis testing, where a common threshold is used and a consistent estimate of realized FDP is provided.
Abstract: Multiple hypothesis testing is a fundamental problem in high-dimensional inference, with wide applications in many scientific fields. In genome-wide association studies, tens of thousands of tests are performed simultaneously to find if any single-nucleotide polymorphisms (SNPs) are associated with some traits, and those tests are correlated. When test statistics are correlated, false discovery control becomes very challenging under arbitrary dependence. In this article, we propose a novel method, based on principal factor approximation, that successfully subtracts the common dependence and weakens significantly the correlation structure, to deal with an arbitrary dependence structure. We derive an approximate expression for false discovery proportion (FDP) in large-scale multiple testing when a common threshold is used and provide a consistent estimate of realized FDP. This result has important applications in controlling the false discovery rate and FDP. Our estimate of realized FDP compares favorably with Efron (2007)'s approach, as demonstrated in the simulated examples.

199 citations

Posted Content
TL;DR: An approximate expression for false discovery proportion (FDP) in large-scale multiple testing when a common threshold is used and a consistent estimate of realized FDP is provided, which has important applications in controlling false discovery rate and FDP.
Abstract: Multiple hypothesis testing is a fundamental problem in high dimensional inference, with wide applications in many scientific fields. In genome-wide association studies, tens of thousands of tests are performed simultaneously to find if any SNPs are associated with some traits and those tests are correlated. When test statistics are correlated, false discovery control becomes very challenging under arbitrary dependence. In the current paper, we propose a novel method based on principal factor approximation, which successfully subtracts the common dependence and weakens significantly the correlation structure, to deal with an arbitrary dependence structure. We derive an approximate expression for false discovery proportion (FDP) in large scale multiple testing when a common threshold is used and provide a consistent estimate of realized FDP. This result has important applications in controlling FDR and FDP. Our estimate of realized FDP compares favorably with Efron (2007)'s approach, as demonstrated in the simulated examples. Our approach is further illustrated by some real data applications. We also propose a dependence-adjusted procedure, which is more powerful than the fixed threshold procedure.
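The following toy simulation is meant only to convey the intuition behind the principal factor approximation: test statistics that share a common factor are strongly dependent, and subtracting the common factor component stabilizes the realized false discovery proportion at a fixed threshold. For simplicity the factor loadings and the realized factor are treated as known, which is not the case in the actual method (there they are estimated from the covariance of the test statistics); all numbers below are simulated assumptions.

```python
# Toy illustration of the idea behind principal factor approximation (PFA):
# correlated z-statistics are generated from a one-factor model, the common
# factor component is subtracted, and the realized false discovery proportion
# (FDP) at a fixed threshold is compared before and after the adjustment.
# Loadings and the realized factor are taken as known here for simplicity.
import numpy as np

rng = np.random.default_rng(2)
m, m1 = 2000, 100                      # number of tests, number of true signals
mu = np.zeros(m)
mu[:m1] = 3.0                          # nonzero means for the true signals
loadings = np.full(m, 0.8)             # one common factor, equal loadings
W = rng.standard_normal()              # realized common factor
z = mu + loadings * W + np.sqrt(1 - loadings**2) * rng.standard_normal(m)

def realized_fdp(stats, threshold, null_mask):
    rejected = np.abs(stats) > threshold
    return rejected[null_mask].sum() / max(rejected.sum(), 1)

t = 2.5
null = mu == 0
print("FDP with common dependence   :", round(realized_fdp(z, t, null), 3))
print("FDP after factor subtraction :", round(realized_fdp(z - loadings * W, t, null), 3))
```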

152 citations

References
Book
14 Mar 1996
TL;DR: This book develops the theory of weak convergence and empirical processes, covering outer integrals and measurable majorants, maximal inequalities and covering numbers, Glivenko-Cantelli and Donsker theorems, and applications to M- and Z-estimation, rates of convergence, and the bootstrap.
Abstract: 1.1. Introduction.- 1.2. Outer Integrals and Measurable Majorants.- 1.3. Weak Convergence.- 1.4. Product Spaces.- 1.5. Spaces of Bounded Functions.- 1.6. Spaces of Locally Bounded Functions.- 1.7. The Ball Sigma-Field and Measurability of Suprema.- 1.8. Hilbert Spaces.- 1.9. Convergence: Almost Surely and in Probability.- 1.10. Convergence: Weak, Almost Uniform, and in Probability.- 1.11. Refinements.- 1.12. Uniformity and Metrization.- 2.1. Introduction.- 2.2. Maximal Inequalities and Covering Numbers.- 2.3. Symmetrization and Measurability.- 2.4. Glivenko-Cantelli Theorems.- 2.5. Donsker Theorems.- 2.6. Uniform Entropy Numbers.- 2.7. Bracketing Numbers.- 2.8. Uniformity in the Underlying Distribution.- 2.9. Multiplier Central Limit Theorems.- 2.10. Permanence of the Donsker Property.- 2.11. The Central Limit Theorem for Processes.- 2.12. Partial-Sum Processes.- 2.13. Other Donsker Classes.- 2.14. Tail Bounds.- 3.1. Introduction.- 3.2. M-Estimators.- 3.3. Z-Estimators.- 3.4. Rates of Convergence.- 3.5. Random Sample Size, Poissonization and Kac Processes.- 3.6. The Bootstrap.- 3.7. The Two-Sample Problem.- 3.8. Independence Empirical Processes.- 3.9. The Delta-Method.- 3.10. Contiguity.- 3.11. Convolution and Minimax Theorems.- A. Appendix.- A.1. Inequalities.- A.2. Gaussian Processes.- A.2.1. Inequalities and Gaussian Comparison.- A.2.2. Exponential Bounds.- A.2.3. Majorizing Measures.- A.2.4. Further Results.- A.3. Rademacher Processes.- A.4. Isoperimetric Inequalities for Product Measures.- A.5. Some Limit Theorems.- A.6. More Inequalities.- A.6.1. Binomial Random Variables.- A.6.2. Multinomial Random Vectors.- A.6.3. Rademacher Sums.- Notes.- References.- Author Index.- List of Symbols.

5,231 citations

Book
01 Jan 1950
TL;DR: This book presents the theory of point estimation, covering unbiasedness, equivariance, average-risk optimality, minimaxity and admissibility, and asymptotic optimality.
Abstract: Preface to the Second Edition.- Preface to the First Edition.- List of Tables.- List of Figures.- List of Examples.- Table of Notation.- Preparations.- Unbiasedness.- Equivariance.- Average Risk Optimality.- Minimaxity and Admissibility.- Asymptotic Optimality.- References.- Author Index.- Subject Index.

4,382 citations

01 Jan 1980

3,652 citations


"Penalized Composite Quasi-Likelihoo..." refers background or methods or result in this paper

  • ...Weighted composite quantile regression (WCQR) was first studied by Koenker (1984) in a classical statistical inference setting. Zou and Yuan (2008) used equally weighted composite quantile regression (EWCQR) for penalized model selection with p large but fixed....

    [...]

  • ...Weighted composite quantile regression (WCQR) was first studied by Koenker (1984) in a classical statistical inference setting....

    [...]

  • ...This term is identical to the situation that was dealt with by Portnoy (1985). Using his result, the second conclusion of theorem 2 follows....

    [...]

  • ...Particular focus will be given to the oracle property of Fan and Li (2001), but we shall strengthen it and prove that estimator (5) is an oracle estimator with overwhelming probability. Fan and Lv (2010) were among the first to discuss the oracle properties with nonpolynomial dimensionality by using the full likelihood function in generalized linear models with a class of folded concave penalties....

    [...]

  • ...This method is particularly computationally efficient in ultrahigh dimensional problems and here we retained the top 100 SNPs with the largest F-statistics. In the second step, we applied to the screened data the penalized L2- and L1-regression, L1-L2+, L1-L2, EWCQR, WCQR+ and WCQR with local linear approximation of the SCAD penalty. All four composite quantile regressions used quantiles at (10%, ..., 90%). The lasso was used as the initial estimator and the tuning parameter in both the lasso and the SCAD penalty was chosen by fivefold cross-validation. In all three populations, the L1-L2- and L1-L2+-regressions reduced to L2-regression. This is not unexpected owing to the gene expression normalization procedure. In addition, WCQR reduced to WCQR+. The selected SNPs, their coefficients and distances from the TSS are summarized in Tables 5-7. In the Asian population (Table 5), the five methods are reasonably consistent in not only variable selection but also coefficient estimation (in terms of signs and order of magnitude). WCQR uses the weights (0.19, 0.11, 0.02, 0, 0.12, 0.09, 0.18, 0.19, 0.10). There are four SNPs which were chosen by all five methods. Two of them, rs2832159 and rs2245431, up-regulate gene expression, whereas rs9981984 and rs16981663 down-regulate gene expression. EWCQR selects the largest set of SNPs, whereas L1-regression selects the smallest set. In the CEPH population (Table 6), all five methods consistently selected the same seven SNPs with only EWCQR choosing two additional SNPs. WCQR uses the weights (0.19, 0.21, 0, 0.04, 0.03, 0.07, 0.1, 0.21, 0.15). The coefficient estimations were also highly consistent. Deutsch et al. (2005) performed a similar cis-EQTL mapping for gene CCT8 using the same CEPH data as here.... (A simplified sketch of this two-step screening-and-penalized-regression analysis appears after this list.)

    [...]
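A simplified version of the two-step analysis quoted above (screen SNPs by per-SNP F-statistics and keep the top 100, then fit a penalized regression on the survivors with fivefold cross-validation) might look like the sketch below. It uses simulated stand-in data, replaces the composite quantile and SCAD fits with a plain lasso, and is not the authors' code.

```python
# Schematic of the two-step analysis: (1) screen SNPs by a per-SNP one-way
# ANOVA F-statistic and keep the top 100, (2) fit an L1-penalized regression
# on the screened set with five-fold cross-validation. The actual study also
# runs composite quantile regressions and SCAD via local linear approximation,
# which this simplified sketch omits; the data below are simulated stand-ins.
import numpy as np
from scipy.stats import f_oneway
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(3)
n, p = 90, 5000
genotypes = rng.integers(0, 3, size=(n, p))        # SNPs coded 0/1/2
expression = 0.8 * genotypes[:, 0] - 0.6 * genotypes[:, 1] + rng.standard_normal(n)

def snp_f_statistic(g, y):
    groups = [y[g == k] for k in np.unique(g)]
    return f_oneway(*groups).statistic if len(groups) > 1 else 0.0

f_stats = np.array([snp_f_statistic(genotypes[:, j], expression) for j in range(p)])
top = np.argsort(f_stats)[::-1][:100]               # step 1: keep top 100 SNPs

model = LassoCV(cv=5).fit(genotypes[:, top], expression)   # step 2: penalized fit
selected = top[np.abs(model.coef_) > 1e-8]
print("SNPs selected after screening + lasso:", selected[:10])
```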

Journal ArticleDOI
TL;DR: In many important statistical applications, the number of variables or parameters p is much larger than the number of observations n; this article asks whether β can be estimated reliably from the noisy data y.
Abstract: In many important statistical applications, the number of variables or parameters p is much larger than the number of observations n. Suppose then that we have observations y=Xβ+z, where β∈Rp is a parameter vector of interest, X is a data matrix with possibly far fewer rows than columns, n≪p, and the zi’s are i.i.d. N(0, σ^2). Is it possible to estimate β reliably based on the noisy data y?

3,539 citations
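The question posed in this abstract, whether β can be recovered when n ≪ p, is addressed in this line of work by l1-based convex estimators. Below is a minimal sketch of a Dantzig-selector-style program (minimize the l1 norm of β subject to a bound on the correlation of the residual with the columns of X); the problem sizes, noise level, and constraint level are illustrative assumptions, not values from the article.

```python
# Sketch of an l1-based estimator for y = X beta + z with n << p:
#   minimize ||beta||_1  subject to  ||X^T (y - X beta)||_inf <= lambda
# (a Dantzig-selector-style program). Sizes and lambda are illustrative.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(4)
n, p, sigma = 80, 400, 0.5
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:5] = [3, -2, 1.5, -1, 2]
y = X @ beta_true + sigma * rng.standard_normal(n)

lam = 2 * sigma * np.sqrt(2 * np.log(p) * n)   # rough, slightly conservative choice
beta = cp.Variable(p)
constraints = [cp.norm(X.T @ (y - X @ beta), "inf") <= lam]
cp.Problem(cp.Minimize(cp.norm1(beta)), constraints).solve()

print("largest recovered coefficients:", np.round(np.sort(np.abs(beta.value))[::-1][:5], 2))
```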