Showing papers by "Runze Li" published in 2019


Journal ArticleDOI
TL;DR: This paper proposes a constrained partial regularization method and introduces an algorithm for solving regularization problems with folded-concave penalty functions and linear constraints, and shows that the limiting null distributions of the three proposed test statistics are χ2 distributions with the same degrees of freedom, while under local alternatives they asymptotically follow noncentral χ2 distributions.
Abstract: This paper is concerned with testing linear hypotheses in high dimensional generalized linear models. To deal with linear hypotheses, we first propose the constrained partial regularization method and study its statistical properties. We further introduce an algorithm for solving regularization problems with folded-concave penalty functions and linear constraints. To test linear hypotheses, we propose a partial penalized likelihood ratio test, a partial penalized score test and a partial penalized Wald test. We show that the limiting null distributions of these three test statistics are $\chi^{2}$ distributions with the same degrees of freedom, and under local alternatives, they asymptotically follow noncentral $\chi^{2}$ distributions with the same degrees of freedom and noncentrality parameter, provided the number of parameters involved in the test hypothesis grows to $\infty$ at a certain rate. Simulation studies are conducted to examine the finite sample performance of the proposed tests. Empirical analysis of a real data example is used to illustrate the proposed testing procedures.

26 citations
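
A schematic of the limiting behavior described in the abstract may help (the paper's actual statistics involve partial penalization, so the notation here is illustrative rather than the exact construction): a likelihood ratio statistic for a linear hypothesis compares the constrained and unconstrained maximized log-likelihoods, $T_{LR} = 2\{\ell_{n}(\hat{\beta}) - \ell_{n}(\hat{\beta}_{0})\}$, and classical Wilks-type theory gives $T_{LR} \rightarrow \chi^{2}_{r}$ in distribution under the null, where $r$ is the number of linearly independent constraints; under local alternatives the limit becomes a noncentral $\chi^{2}_{r}$. The paper's contribution is that analogues of this behavior survive in the high dimensional, partially penalized setting, with $r$ allowed to grow.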


Journal ArticleDOI
TL;DR: This paper deals with the detection and identification of changepoints among covariances of high-dimensional longitudinal data, where the number of features is greater than both the sample size and the number of repeated measurements.
Abstract: This paper deals with the detection and identification of changepoints among covariances of high-dimensional longitudinal data, where the number of features is greater than both the sample size and the number of repeated measurements. The proposed methods are applicable under general temporal-spatial dependence. A new test statistic is introduced for changepoint detection, and its asymptotic distribution is established. If a changepoint is detected, an estimate of the location is provided. The rate of convergence of the estimator is shown to depend on the data dimension, sample size, and signal-to-noise ratio. Binary segmentation is used to estimate the locations of possibly multiple changepoints, and the corresponding estimator is shown to be consistent under mild conditions. Simulation studies provide the empirical size and power of the proposed test and the accuracy of the changepoint estimator. An application to a time-course microarray dataset identifies gene sets with significant gene interaction changes over time.

21 citations
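
To make the binary segmentation step concrete, here is a minimal sketch in Python. It uses a generic weighted Frobenius-norm contrast between the sample covariances before and after each candidate point as a stand-in for the paper's test statistic; the function names, threshold, and toy data are illustrative assumptions, not the authors' procedure.

```python
import numpy as np

def cov_cusum(X, t):
    # Weighted, scale-normalized Frobenius distance between the sample
    # covariances before and after candidate changepoint t. A generic
    # stand-in for the paper's covariance changepoint statistic.
    n = len(X)
    S1 = np.cov(X[:t], rowvar=False)
    S2 = np.cov(X[t:], rowvar=False)
    Sp = np.cov(X, rowvar=False)
    w = t * (n - t) / n
    return w * np.linalg.norm(S1 - S2, "fro") / np.linalg.norm(Sp, "fro")

def binary_segmentation(X, threshold, min_seg=30, offset=0, found=None):
    # Split at the maximizing candidate whenever the statistic exceeds
    # the threshold, then recurse on the two resulting segments.
    if found is None:
        found = []
    n = len(X)
    if n < 2 * min_seg:
        return found
    candidates = range(min_seg, n - min_seg)
    stats = [cov_cusum(X, t) for t in candidates]
    best = int(np.argmax(stats))
    if stats[best] > threshold:
        t_hat = candidates[best]
        found.append(offset + t_hat)
        binary_segmentation(X[:t_hat], threshold, min_seg, offset, found)
        binary_segmentation(X[t_hat:], threshold, min_seg, offset + t_hat, found)
    return sorted(found)

# Toy data: the covariance changes at observation 150.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(150, 5)),
               rng.normal(size=(150, 5)) * 2.0])
print(binary_segmentation(X, threshold=40.0))  # expect a point near 150
```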


Journal ArticleDOI
TL;DR: The numerical comparison implies that the proposed tests outperform existing ones in terms of controlling the Type I error rate and power, and the simulations indicate that the test based on quadratic loss seems to have better power than the test based on entropy loss.
Abstract: This paper is concerned with tests of significance on high dimensional covariance structures, and aims to develop a unified framework for testing commonly used linear covariance structures. We first construct a consistent estimator for the parameters involved in the linear covariance structure, and then develop two tests for the linear covariance structures based on the entropy loss and quadratic loss used for covariance matrix estimation. To study the asymptotic properties of the proposed tests, we study related high dimensional random matrix theory and establish several highly useful asymptotic results. With the aid of these asymptotic results, we derive the limiting distributions of the two tests under the null and alternative hypotheses. We further show that the quadratic loss based test is asymptotically unbiased. We conduct a Monte Carlo simulation study to examine the finite sample performance of the two tests. Our simulation results show that the limiting null distributions approximate the finite-sample null distributions quite well, and the corresponding asymptotic critical values control the Type I error rate very well. Our numerical comparison implies that the proposed tests outperform existing ones in terms of controlling the Type I error rate and power. Our simulations indicate that the test based on quadratic loss seems to have better power than the test based on entropy loss.

20 citations
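
For reference, the two losses named above have standard forms in the covariance estimation literature (the paper's exact normalizations may differ): the entropy loss is $L_{1}(\Sigma, \hat{\Sigma}) = \mathrm{tr}(\hat{\Sigma}\Sigma^{-1}) - \log\det(\hat{\Sigma}\Sigma^{-1}) - p$, and the quadratic loss is $L_{2}(\Sigma, \hat{\Sigma}) = \mathrm{tr}\{(\hat{\Sigma}\Sigma^{-1} - I_{p})^{2}\}$. Each vanishes exactly when $\hat{\Sigma} = \Sigma$, which is what makes them natural building blocks for tests of a hypothesized covariance structure.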


Journal ArticleDOI
TL;DR: This paper is concerned with identifying the influential users in a social network under the framework of NAM, and considers an autoregression model that allows for heterogeneous and sparse network effect coefficients.

19 citations
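
One common way to write such a model (the notation here is illustrative, not necessarily the paper's) is $Y_{i} = \sum_{j \neq i} w_{ij}\beta_{j}Y_{j} + X_{i}^{\top}\gamma + \varepsilon_{i}$, where the $w_{ij}$ are row-normalized adjacency weights. Sparsity means most network effect coefficients $\beta_{j}$ are zero, and the influential users are exactly those $j$ with $\beta_{j} \neq 0$.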


Journal ArticleDOI
TL;DR: A model-free and data-adaptive feature screening method for ultrahigh-dimensional data, based on the projection correlation, which measures the dependence between two random vectors; the method enjoys both sure screening and rank consistency properties under weak assumptions.
Abstract: This paper proposes a model-free and data-adaptive feature screening method for ultrahigh-dimensional data. The proposed method is based on the projection correlation, which measures the dependence between two random vectors. This projection correlation based method does not require specifying a regression model and applies to data in the presence of heavy-tailed errors and multivariate response. It enjoys both sure screening and rank consistency properties under weak assumptions. Further, a two-step approach is proposed to control the false discovery rate (FDR) in feature screening with the help of knockoff features. It can be shown that the proposed two-step approach enjoys both sure screening and FDR control if the pre-specified FDR level $\alpha$ is greater than or equal to $1/s$, where $s$ is the number of active features. The superior empirical performance of the proposed methods is justified by various numerical experiments and real data applications.

19 citations
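
The screening step itself is simple to sketch. The code below ranks features by a marginal, model-free dependence measure and keeps the top $d$; for brevity it substitutes distance correlation for the projection correlation (an assumption for illustration; both measure general dependence), and the toy data, function names, and cutoff $d$ are likewise illustrative.

```python
import numpy as np

def dcor(x, y):
    # Distance correlation between two samples (V-statistic version),
    # used here as a stand-in for the projection correlation.
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    return np.sqrt((A * B).mean() / np.sqrt((A * A).mean() * (B * B).mean()))

def screen(X, y, d):
    # Rank features by marginal dependence with y and keep the top d.
    stats = [dcor(X[:, j], y) for j in range(X.shape[1])]
    return np.argsort(stats)[::-1][:d]

# Toy example: only features 0 and 1 are active, via nonlinear links,
# so a Pearson-correlation screen would miss the quadratic effect.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 500))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=200)
print(screen(X, y, d=10))  # features 0 and 1 should be among the survivors
```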


Journal ArticleDOI
TL;DR: It is shown that, if an FCP-regularized SAA formulation is solved locally, then the required number of samples can be significantly reduced in approximating the global solution of a convex SP: the sample size is only required to be poly-logarithmic in the number of dimensions.
Abstract: The theory on the traditional sample average approximation (SAA) scheme for stochastic programming (SP) dictates that the number of samples should be polynomial in the number of problem dimensions in order to ensure proper optimization accuracy. In this paper, we study a modification to the SAA in the scenario where the global minimizer is either sparse or can be approximated by a sparse solution. By making use of a regularization penalty referred to as the folded concave penalty (FCP), we show that, if an FCP-regularized SAA formulation is solved locally, then the required number of samples can be significantly reduced in approximating the global solution of a convex SP: the sample size is only required to be poly-logarithmic in the number of dimensions. The efficacy of the FCP regularizer for nonconvex SPs is also discussed. As an immediate implication of our result, a flexible class of folded concave penalized sparse M-estimators in high-dimensional statistical learning may yield a sound performance even when the problem dimension cannot be upper-bounded by any polynomial function of the sample size.

15 citations
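
For concreteness, the canonical member of the folded concave family is the SCAD penalty of Fan and Li, defined through its derivative $p_{\lambda}'(t) = \lambda\{I(t \le \lambda) + \frac{(a\lambda - t)_{+}}{(a - 1)\lambda}I(t > \lambda)\}$ for $t \ge 0$ with $a > 2$: the penalty grows like the lasso near zero and then flattens out, which is what lets a local solution retain sparsity while avoiding the bias of convex $\ell_{1}$ regularization.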


Posted Content
TL;DR: In this paper, the central limit theorem was established for the linear spectral statistics of the Kendall's rank correlation matrices under the Marchenko-Pastur asymptotic regime, in which the dimension diverges to infinity proportionally with the sample size.
Abstract: This paper is concerned with the limiting spectral behaviors of large dimensional Kendall's rank correlation matrices generated by samples with independent and continuous components. We do not require the components to be identically distributed, and do not need any moment conditions, which is very different from the assumptions imposed in the literature of random matrix theory. The statistical setting in this paper covers a wide range of highly skewed and heavy-tailed distributions. We establish the central limit theorem (CLT) for the linear spectral statistics of the Kendall's rank correlation matrices under the Marchenko-Pastur asymptotic regime, in which the dimension diverges to infinity proportionally with the sample size. We further propose three nonparametric procedures for high dimensional independence testing, and their limiting null distributions are derived by applying this CLT. Our numerical comparisons demonstrate the robustness and superiority of our proposed test statistics under various mixed and heavy-tailed cases.

11 citations
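
A linear spectral statistic of a Kendall's rank correlation matrix is straightforward to compute, which is part of the appeal. The sketch below (function names and the choice $f(\lambda) = \lambda^{2}$ are illustrative) builds the matrix pairwise and evaluates $\sum_{i} f(\lambda_{i})$ over its eigenvalues, on heavy-tailed Cauchy data where moment conditions would fail.

```python
import numpy as np
from scipy.stats import kendalltau

def kendall_matrix(X):
    # Matrix of pairwise Kendall rank correlations; rank-based, so no
    # moment conditions on the entries of X are needed.
    p = X.shape[1]
    K = np.eye(p)
    for i in range(p):
        for j in range(i + 1, p):
            tau, _ = kendalltau(X[:, i], X[:, j])
            K[i, j] = K[j, i] = tau
    return K

def linear_spectral_statistic(K, f):
    # sum_i f(lambda_i) over the eigenvalues of K.
    return float(np.sum(f(np.linalg.eigvalsh(K))))

# Heavy-tailed sample: standard Cauchy entries (no finite moments),
# with dimension a fixed fraction of the sample size, mimicking the
# Marchenko-Pastur regime the paper works in.
rng = np.random.default_rng(2)
X = rng.standard_cauchy(size=(200, 100))
K = kendall_matrix(X)
print(linear_spectral_statistic(K, lambda lam: lam ** 2))
```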


Journal ArticleDOI
TL;DR: This paper treats the variance function using B-spline approximation and proposes a semiparametric estimator, based on efficient score functions, that accommodates the heteroscedasticity of the measurement error; the resulting estimator is consistent and enjoys good inference properties.

8 citations
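
In outline (the notation here is illustrative), the idea is to write the measurement error variance as a smooth function of a covariate, $\mathrm{Var}(\epsilon \mid U = u) = \sigma^{2}(u) \approx \sum_{k=1}^{K}\gamma_{k}B_{k}(u)$, where $B_{1}, \dots, B_{K}$ are B-spline basis functions; the coefficients $\gamma_{k}$ can then be estimated from squared residuals, turning an infinite-dimensional variance function into a finite-dimensional regression problem.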


Journal ArticleDOI
TL;DR: Practical guidelines, based on experience with scientific applications, are provided to help practitioners interpret their results, and these ideas are illustrated using data from a smoking cessation study.
Abstract: Researchers are sometimes interested in predicting a distal or external outcome (such as smoking cessation at follow-up) from the trajectory of an intensively recorded longitudinal variable (such as urge to smoke). This can be done in a semiparametric way via scalar-on-function regression. However, the resulting fitted coefficient function requires special care for correct interpretation, as it represents the joint relationship of time points to the outcome, rather than a marginal or cross-sectional relationship. We provide practical guidelines, based on experience with scientific applications, for helping practitioners interpret their results and illustrate these ideas using data from a smoking cessation study.

8 citations
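
The model in question is the functional linear model $E(Y) = \alpha + \int_{0}^{T}\beta(t)X(t)\,dt$, where $X(t)$ is the intensively recorded trajectory (e.g., urge to smoke at time $t$) and $Y$ the distal outcome. The caution above amounts to noting that $\beta(t_{0})$ is the effect of $X(t_{0})$ adjusting for the rest of the trajectory, not the coefficient one would obtain by regressing $Y$ on $X(t_{0})$ alone.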


Journal ArticleDOI
TL;DR: The findings of the real data example imply that daily assessments are particularly beneficial for characterizing more variable behaviors like alcohol use, whereas weekly assessments may be sufficient for low-variation behaviors such as marijuana use.

6 citations


Posted Content
TL;DR: In this article, the authors propose a distributed screening framework for big data setup, which expresses a correlation measure as a function of several component parameters, each of which can be distributively estimated using a natural U-statistic from data segments.
Abstract: Feature screening is a powerful tool in the analysis of high dimensional data. When the sample size $N$ and the number of features $p$ are both large, the implementation of classic screening methods can be numerically challenging. In this paper, we propose a distributed screening framework for the big data setting. In the spirit of "divide-and-conquer", the proposed framework expresses a correlation measure as a function of several component parameters, each of which can be distributively estimated using a natural U-statistic from data segments. With the component estimates aggregated, we obtain a final correlation estimate that can be readily used for screening features. This framework enables distributed storage and parallel computing and thus is computationally attractive. Due to the unbiased distributive estimation of the component parameters, the final aggregated estimate achieves a high accuracy that is insensitive to the number of data segments $m$, whether specified by the problem itself or chosen by users. Under mild conditions, we show that the aggregated correlation estimator is as efficient as the classic centralized estimator in terms of the probability convergence bound; the corresponding screening procedure enjoys the sure screening property for a wide range of correlation measures. The promising performance of the new method is supported by extensive numerical examples.
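
The divide-and-conquer idea is easiest to see with the Pearson correlation as the measure (the paper's framework covers more general measures through U-statistics; the segment sizes and names below are illustrative). Each segment reports unbiased estimates of the five moments that determine the correlation, and only those get aggregated; with equal-sized segments, the aggregated estimate here coincides exactly with the centralized one.

```python
import numpy as np

def segment_components(x, y):
    # Component estimates from one data segment: the five moments
    # that together determine the Pearson correlation.
    return np.array([x.mean(), y.mean(), (x * y).mean(),
                     (x ** 2).mean(), (y ** 2).mean()])

def aggregate_correlation(comps):
    # Average the per-segment components, then assemble the final
    # correlation estimate from the aggregated moments.
    mx, my, mxy, mx2, my2 = comps.mean(axis=0)
    return (mxy - mx * my) / np.sqrt((mx2 - mx ** 2) * (my2 - my ** 2))

# Toy run: N = 10^6 observations processed in m = 100 segments.
rng = np.random.default_rng(3)
x = rng.normal(size=1_000_000)
y = 0.5 * x + rng.normal(size=1_000_000)
comps = np.array([segment_components(xs, ys)
                  for xs, ys in zip(np.split(x, 100), np.split(y, 100))])
print(aggregate_correlation(comps))  # ~ 0.5 / sqrt(1.25) = 0.447
```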


Journal ArticleDOI
TL;DR: This paper proposes an effective algorithm and establishes its ascent property, and proves that the proposed procedure possesses the sure screening property; that is, with probability tending to 1, the selected variable set includes all of the active predictors.
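
In symbols, the sure screening property asserts $P(\mathcal{M}_{*} \subseteq \widehat{\mathcal{M}}) \rightarrow 1$ as $n \rightarrow \infty$, where $\mathcal{M}_{*}$ is the set of active predictors and $\widehat{\mathcal{M}}$ is the selected variable set.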

Journal ArticleDOI
TL;DR: Healthcare expenditures increased significantly following the PA autism mandate, with nonexempt, large employer groups having the largest increase in spending.