
Showing papers by "Runze Li" published in 2021


Journal ArticleDOI
TL;DR: A new estimation and valid inference method is developed for single or low-dimensional regression coefficients in high-dimensional generalized linear models, and the proposed confidence interval (CI) is proved to be asymptotically narrower than the CIs constructed from the desparsified Lasso estimator and the decorrelated score statistic.
Abstract: In this article, we develop a new estimation and valid inference method for single or low-dimensional regression coefficients in high-dimensional generalized linear models. The number of the predic...

23 citations



Journal ArticleDOI
TL;DR: For a high-dimensional linear model with a finite number of covariates measured with error, statistical inference on the parameters associated with the error-prone covariates is studied, and a new corrected decorrelated score test and the corresponding one-step estimator are proposed.

10 citations


Journal ArticleDOI
TL;DR: This paper addresses ultrahigh-dimensional supervised problems with sparse signals, which arise from contemporary high-throughput experimental and surveying techniques and involve only a limited number of observations (n).
Abstract: Contemporary high-throughput experimental and surveying techniques give rise to ultrahigh-dimensional supervised problems with sparse signals; that is, a limited number of observations (n), each wi...

10 citations


Journal ArticleDOI
TL;DR: The results illustrate the importance of examining not only negative affect but also positive affect in order to fully understand the association between emotion dynamics and cigarette dependence and call for future research to compare cigarettes and e-cigarettes in terms of their effects on emotion regulation.

9 citations


Journal ArticleDOI
TL;DR: In this paper, the Smoothly Clipped Absolute Deviation penalty (SCAD) was adopted to select important indices for dynamic patterns of consumption or craving and estimate their associations with e-cigarette dependence scales.
Abstract: Introduction: Existing e-cigarette dependence scales are mainly validated based on retrospective overall consumption or perception. Further, given that the majority of adult e-cigarette users also use combustible cigarettes, it is important to determine whether e-cigarette dependence scales capture the product-specific dependence. This study fills in the current knowledge gaps by validating e-cigarette dependence scales using novel indices of dynamic patterns of e-cigarette use behaviors and examining the association between dynamic patterns of smoking and e-cigarette dependence among dual users. Methods: Secondary analysis was conducted on the 2-week ecological momentary assessment data from 116 dual users. The Smoothly Clipped Absolute Deviation penalty (SCAD) was adopted to select important indices for dynamic patterns of consumption or craving and estimate their associations with e-cigarette dependence scales. Results: The fitted linear regression models support the hypothesis that higher e-cigarette dependence is associated with higher levels of e-cigarette consumption and craving as well as lower instability of e-cigarette consumption. Controlling for dynamic patterns of vaping, dual users with lower e-cigarette dependence tend to report higher day-to-day dramatic changes in combustible cigarette consumption but not higher average levels of smoking. Conclusions: We found that more stable use patterns are related to higher levels of dependence, which has been demonstrated in combustible cigarettes and we have now illustrated in e-cigarettes. Furthermore, the e-cigarette dependence scales may capture the product-specific average consumption but not product-specific instability of consumption. Implications: This study provides empirical support for three e-cigarette dependence measures: PS-ECDI, e-FTCD, and e-WISDM, based on dynamic patterns of e-cigarette consumption and craving revealed by EMA data that have great ecological validity. This is the first study that introduces novel indices of dynamic patterns and demonstrates their potential applications in vaping research.

6 citations
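
For readers unfamiliar with the SCAD penalty named in the entry above, the following is a minimal sketch of the standard penalty function, not the study's own analysis code. The function name `scad_penalty` and the default `a=3.7` (a commonly used value) are assumptions for illustration; they do not come from this listing.

```python
def scad_penalty(beta, lam, a=3.7):
    """Standard SCAD penalty evaluated at one coefficient (illustrative sketch).

    beta: coefficient value; lam: tuning parameter (> 0); a: shape parameter (> 2).
    The name and the default a=3.7 are assumptions, not taken from the listing.
    """
    t = abs(beta)
    if t <= lam:
        # Small coefficients: same linear (Lasso-like) penalty.
        return lam * t
    if t <= a * lam:
        # Intermediate coefficients: quadratic transition that flattens out.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Large coefficients: constant penalty, so they are not shrunk further.
    return lam * lam * (a + 1) / 2


small = scad_penalty(0.5, lam=1.0)
large = scad_penalty(10.0, lam=1.0)
```

The penalty grows linearly near zero and becomes constant beyond `a * lam`, which is why large coefficients are penalized no more than moderately large ones.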


Journal ArticleDOI
TL;DR: In this article, the central limit theorem for the linear spectral statistics of the Kendall rank correlation matrices under the Marchenko-Pastur asymptotic regime was established, in which the dimension diverges to infinity proportionally with the sample size.
Abstract: This paper is concerned with the limiting spectral behaviors of large dimensional Kendall’s rank correlation matrices generated by samples with independent and continuous components. The statistical setting in this paper covers a wide range of highly skewed and heavy-tailed distributions since we do not require the components to be identically distributed, and do not need any moment conditions. We establish the central limit theorem (CLT) for the linear spectral statistics (LSS) of the Kendall’s rank correlation matrices under the Marchenko–Pastur asymptotic regime, in which the dimension diverges to infinity proportionally with the sample size. We further propose three nonparametric procedures for high dimensional independent test and their limiting null distributions are derived by implementing this CLT. Our numerical comparisons demonstrate the robustness and superiority of our proposed test statistics under various mixed and heavy-tailed cases.

5 citations
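
As a rough illustration of the objects discussed in the entry above, the sketch below builds a Kendall rank correlation matrix and evaluates one linear spectral statistic (the sum of a test function over the eigenvalues). It assumes NumPy is available; the helper names, the Cauchy sample, and the choice f(x) = x^2 are illustrative assumptions, and the sketch does not implement the paper's CLT or test procedures.

```python
import numpy as np


def kendall_tau_matrix(X):
    """Pairwise Kendall's tau between the columns of X (n samples, p variables).

    Assumes continuous data, so ties occur with probability zero.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # signs[i, j, k] = sign(X[i, k] - X[j, k]) for every ordered pair (i, j).
    signs = np.sign(X[:, None, :] - X[None, :, :])
    # Summing over all ordered pairs counts each unordered pair twice,
    # hence the divisor n * (n - 1) rather than n * (n - 1) / 2.
    return np.einsum("ijk,ijl->kl", signs, signs) / (n * (n - 1))


def linear_spectral_statistic(K, f):
    """Sum of f over the eigenvalues of the symmetric matrix K."""
    eigvals = np.linalg.eigvalsh(K)
    return float(np.sum(f(eigvals)))


rng = np.random.default_rng(0)
# Heavy-tailed sample with independent components (no finite moments needed).
X = rng.standard_cauchy(size=(50, 20))
K = kendall_tau_matrix(X)
lss = linear_spectral_statistic(K, lambda x: x**2)
```

Because Kendall's tau depends only on ranks, the matrix is unchanged by strictly increasing transformations of each column, which is consistent with the abstract's claim that no moment conditions are required.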


Journal ArticleDOI
TL;DR: This article proposes an adjustment and develops the Z-estimation and unconstrained/constrained ordinary least squares estimation methods and demonstrates that the resulting estimators are consistent and asymptotically normal.

4 citations


Journal ArticleDOI
TL;DR: The Hotelling $T^2$ test statistic for the two-sample mean problem is no longer well defined when the sample size is less than the dimension of the data, because the sample covariance matrix is singular.

3 citations
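
The singularity mentioned in the entry above can be checked numerically. The sketch below, which assumes NumPy and uses hypothetical helper names and simulated Gaussian data, computes the classical two-sample Hotelling $T^2$ statistic in a low-dimensional case and then shows that the pooled sample covariance matrix is rank deficient once the dimension exceeds n1 + n2 - 2. It illustrates the problem only, not the remedy proposed in the paper.

```python
import numpy as np


def pooled_covariance(X, Y):
    """Pooled sample covariance of two samples with rows as observations."""
    n1, n2 = X.shape[0], Y.shape[0]
    Sx = np.cov(X, rowvar=False)
    Sy = np.cov(Y, rowvar=False)
    return ((n1 - 1) * Sx + (n2 - 1) * Sy) / (n1 + n2 - 2)


def hotelling_t2(X, Y):
    """Classical two-sample Hotelling T^2; requires an invertible pooled covariance."""
    n1, n2 = X.shape[0], Y.shape[0]
    diff = X.mean(axis=0) - Y.mean(axis=0)
    S = pooled_covariance(X, Y)
    return (n1 * n2 / (n1 + n2)) * diff @ np.linalg.solve(S, diff)


rng = np.random.default_rng(0)

# Low-dimensional case: p = 3 with 45 observations in total, so T^2 is well defined.
X_low, Y_low = rng.normal(size=(20, 3)), rng.normal(size=(25, 3))
t2_low = hotelling_t2(X_low, Y_low)

# High-dimensional case: p = 10 but only 11 observations in total, so the
# pooled covariance has rank at most n1 + n2 - 2 = 9 and cannot be inverted.
p_high = 10
X_high, Y_high = rng.normal(size=(5, p_high)), rng.normal(size=(6, p_high))
rank_high = np.linalg.matrix_rank(pooled_covariance(X_high, Y_high))
```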


Posted Content
TL;DR: In this article, the authors proposed new statistical inference procedures for high dimensional mediation models, in which both the outcome model and the mediator model are linear with high-dimensional mediators.
Abstract: Mediation analysis draws increasing attention in many scientific areas such as genomics, epidemiology and finance. In this paper, we propose new statistical inference procedures for high dimensional mediation models, in which both the outcome model and the mediator model are linear with high dimensional mediators. Traditional procedures for mediation analysis cannot be used to make statistical inference for high dimensional linear mediation models due to high-dimensionality of the mediators. We propose an estimation procedure for the indirect effects of the models via a partial penalized least squares method, and further establish its theoretical properties. We further develop a partial penalized Wald test on the indirect effects, and prove that the proposed test has a $\chi^2$ limiting null distribution. We also propose an $F$-type test for direct effects and show that the proposed test asymptotically follows a $\chi^2$-distribution under null hypothesis and a noncentral $\chi^2$-distribution under local alternatives. Monte Carlo simulations are conducted to examine the finite sample performance of the proposed tests and compare their performance with existing ones. We further apply the newly proposed statistical inference procedures to study stock reaction to COVID-19 pandemic via an empirical analysis of studying the mediation effects of financial metrics that bridge company's sector and stock return.

3 citations


Journal ArticleDOI
TL;DR: In this article, a stable correlation is proposed to measure the dependence between two random vectors, which is well defined without the moment condition and is zero if and only if the two vectors are independent.
Abstract: In this paper, we propose a new correlation, called stable correlation, to measure the dependence between two random vectors. The new correlation is well defined without the moment condition and is zero if and only if the two random vectors are independent. We also study its other theoretical properties. Based on the new correlation, we further propose a robust model-free feature screening procedure for ultrahigh dimensional data and establish its sure screening property and rank consistency property without imposing the subexponential or sub-Gaussian tail condition, which is commonly required in the literature of feature screening. We also examine the finite sample performance of the proposed robust feature screening procedure via Monte Carlo simulation studies and illustrate the proposed procedure by a real data example.

Journal ArticleDOI
TL;DR: In this article, a varying coefficient mediation model was proposed for causal mediation analysis with varying indirect and direct effects, which can also be viewed as an extension of the modera... approach.
Abstract: This paper is concerned with causal mediation analysis with varying indirect and direct effects. We propose a varying coefficient mediation model, which can also be viewed as an extension of modera...

Posted ContentDOI
TL;DR: A two-step approach is proposed to estimate the time-varying mediation effect, together with a simulation-based point-wise confidence band; simulation studies show that the procedures perform well, and the model and inference procedure are applied to real-world data collected from a smoking cessation study.
Abstract: Traditional mediation analysis typically examines the relations among an intervention, a time-invariant mediator, and a time-invariant outcome variable. Although there may be a direct effect of the intervention on the outcome, there is a need to understand the process by which the intervention affects the outcome (i.e. the indirect effect through the mediator). This indirect effect is frequently assumed to be time-invariant. With improvements in data collection technology, it is possible to obtain repeated assessments over time resulting in intensive longitudinal data. This calls for an extension of traditional mediation analysis to incorporate time-varying variables as well as time-varying effects. In this paper, we focus on estimation and inference for the time-varying mediation model, which allows mediation effects to vary as a function of time. We propose a two-step approach to estimate the time-varying mediation effect. Moreover, we use a simulation based approach to derive the corresponding point-wise confidence band for the time-varying mediation effect. Simulation studies show that the proposed procedures perform well when comparing the confidence band and the true underlying model. We further apply the proposed model and the statistical inference procedure to real-world data collected from a smoking cessation study.

Posted Content
TL;DR: In this paper, a multiple-splitting projection test (MPT) was proposed for one-sample mean vectors in high-dimensional settings, which is based on regularized quadratic optimization.
Abstract: We propose a multiple-splitting projection test (MPT) for one-sample mean vectors in high-dimensional settings. The idea of projection test is to project high-dimensional samples to a 1-dimensional space using an optimal projection direction such that traditional tests can be carried out with projected samples. However, estimation of the optimal projection direction has not been systematically studied in literature. In this work, we bridge the gap by proposing a consistent estimation via regularized quadratic optimization. To retain type I error rate, we adopt a data-splitting strategy when constructing test statistics. To mitigate the power loss due to data-splitting, we further propose a test via multiple splits to enhance the testing power. We show that the $p$-values resulted from multiple splits are exchangeable. Unlike existing methods which tend to conservatively combine dependent $p$-values, we develop an exact level $\alpha$ test that explicitly utilizes the exchangeability structure to achieve better power. Numerical studies show that the proposed test well retains the type I error rate and is more powerful than state-of-the-art tests.
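
The data-splitting idea in the entry above can be sketched for a single split as follows. This is a simplified illustration that assumes NumPy: the direction estimate uses a ridge-regularized inverse covariance, which is a swapped-in stand-in for the paper's regularized quadratic optimization, and the function name, the parameter `lam`, and the simulated data are all hypothetical. The multiple-split combination of exchangeable $p$-values is not implemented here.

```python
import numpy as np


def split_projection_t_stat(X, lam=1.0, seed=0):
    """One-split projection t-statistic for H0: mean vector = 0 (illustrative sketch).

    The first half of the sample estimates a projection direction; the second
    half is projected to one dimension and a one-sample t-statistic is computed.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    first, second = idx[: n // 2], idx[n // 2:]
    X1, X2 = X[first], X[second]

    # Assumed direction estimate: (S1 + lam * I)^{-1} * mean(X1). The ridge term
    # keeps the system solvable when p exceeds the number of observations.
    S1 = np.cov(X1, rowvar=False)
    w = np.linalg.solve(S1 + lam * np.eye(p), X1.mean(axis=0))

    # Project the held-out half; w is independent of X2, so under H0 with
    # Gaussian data the statistic follows a t distribution with len(z) - 1 df.
    z = X2 @ w
    return np.sqrt(len(z)) * z.mean() / z.std(ddof=1)


rng = np.random.default_rng(1)
# Simulated sample with p = 50 > n = 40 and a nonzero mean in every coordinate.
X = rng.normal(loc=1.0, size=(40, 50))
t_stat = split_projection_t_stat(X)
```

Estimating the direction on one half and testing on the other is what keeps the type I error under control, at the cost of using only half the data for the test; this is the power loss that the multiple-split version in the abstract is meant to mitigate.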

Posted Content
TL;DR: In this paper, the authors propose propensity score regression (PSR) for non-parametric estimation of heterogeneous treatment effects and apply it to investigate the effects of seasonal influenza vaccination and having paid sick leave across different age groups.
Abstract: Understanding how treatment effects vary on individual characteristics is critical in the contexts of personalized medicine, personalized advertising and policy design. When the characteristics of practical interest are only a subset of the full covariates, non-parametric estimation is often desirable; but few methods are available due to the computational difficulty. Existing non-parametric methods such as the inverse probability weighting methods have limitations that hinder their use in many practical settings where the values of propensity scores are close to 0 or 1. We propose the propensity score regression (PSR) that allows the non-parametric estimation of the heterogeneous treatment effects in a wide context. PSR includes two non-parametric regressions in turn, where it first regresses on the propensity scores together with the characteristics of interest, to obtain an intermediate estimate; and then regresses the intermediate estimates on the characteristics of interest only. By including propensity scores as regressors in the non-parametric manner, PSR is capable of substantially easing the computational difficulty while remaining (locally) insensitive to any value of propensity scores. We present several appealing properties of PSR, including the consistency and asymptotic normality, and in particular the existence of an explicit variance estimator, from which the analytical behaviour of PSR and its precision can be assessed. Simulation studies indicate that PSR outperforms existing methods in varying settings with extreme values of propensity scores. We apply our method to the national 2009 flu survey (NHFS) data to investigate the effects of seasonal influenza vaccination and having paid sick leave across different age groups.

Journal ArticleDOI
TL;DR: The strong screening consistency property of the NW-SIS is rigorously established and the finite sample performance of the proposed method is assessed by simulation study and illustrated by an empirical analysis of a dataset from the Chinese stock market.
Abstract: Network analysis has drawn great attention in recent years. It is applied to a wide range of disciplines. These include but are not limited to social science, finance and genetics. It is typical that one collects abundant covariates along with the response variable in practice. Since the network structure makes the responses at different nodes no longer independent, existing screening methods may not perform well for network data. We propose a network-based sure independence screening (NW-SIS) method. This approach explicitly takes the network structure into consideration. The strong screening consistency property of the NW-SIS is rigorously established. We further investigate the estimation of the network effect and establish the $\sqrt{n}$-consistency of the estimator. The finite sample performance of the proposed method is assessed by simulation study and illustrated by an empirical analysis of a dataset from the Chinese stock market.

Posted Content
TL;DR: In this paper, an equivalence between the conditional independence and the mutual independence is established and an index is proposed to measure the conditional dependence by quantifying the mutual dependence among the transformed variables.
Abstract: This paper is concerned with test of the conditional independence. We first establish an equivalence between the conditional independence and the mutual independence. Based on the equivalence, we propose an index to measure the conditional dependence by quantifying the mutual dependence among the transformed variables. The proposed index has several appealing properties. (a) It is distribution free since the limiting null distribution of the proposed index does not depend on the population distributions of the data. Hence the critical values can be tabulated by simulations. (b) The proposed index ranges from zero to one, and equals zero if and only if the conditional independence holds. Thus, it has nontrivial power under the alternative hypothesis. (c) It is robust to outliers and heavy-tailed data since it is invariant to conditional strictly monotone transformations. (d) It has low computational cost since it incorporates a simple closed-form expression and can be implemented in quadratic time. (e) It is insensitive to tuning parameters involved in the calculation of the proposed index. (f) The new index is applicable for multivariate random vectors as well as for discrete data. All these properties enable us to use the new index as statistical inference tools for various data. The effectiveness of the method is illustrated through extensive simulations and a real application on causal discovery.

Posted Content
TL;DR: In this article, the authors propose a general approach to handle data contaminations that might disrupt the performance of feature selection and estimation procedures for high-dimensional linear models, considering the co-occurrence of mean-shift and variance-inflation outliers, which can be modeled as additional fixed and random components, respectively, and evaluated independently.
Abstract: We propose a general approach to handle data contaminations that might disrupt the performance of feature selection and estimation procedures for high-dimensional linear models. Specifically, we consider the co-occurrence of mean-shift and variance-inflation outliers, which can be modeled as additional fixed and random components, respectively, and evaluated independently. Our proposal performs feature selection while detecting and down-weighting variance-inflation outliers, detecting and excluding mean-shift outliers, and retaining non-outlying cases with full weights. Feature selection and mean-shift outlier detection are performed through a robust class of nonconcave penalization methods. Variance-inflation outlier detection is based on the penalization of the restricted posterior mode. The resulting approach satisfies a robust oracle property for feature selection in the presence of data contamination -- which allows the number of features to exponentially increase with the sample size -- and detects truly outlying cases of each type with asymptotic probability one. This provides an optimal trade-off between a high breakdown point and efficiency. Computationally efficient heuristic procedures are also presented. We illustrate the finite-sample performance of our proposal through an extensive simulation study and a real-world application.

Journal ArticleDOI
TL;DR: In this article, a new Bayesian variable selection approach for partially linear models (PLM) with ultra-high dimensional covariates is proposed, which employs the difference-based method to reduce the impact from the estimation of the nonparametric component, and incorporates Bayesian subset modeling with diffusing prior to shrink the corresponding estimator in the linear component.


Journal ArticleDOI
TL;DR: In this article, a folded concave penalized machine learning scheme with spatial coupling fused penalty (fused FCP) was proposed to build biomarkers for Parkinson's disease directly from whole-brain voxel-wise MRI data.