Showing papers by "Runze Li" published in 2020


Journal ArticleDOI
TL;DR: In some cases the comparison of two models using ICs can be viewed as equivalent to a likelihood ratio test, with the different criteria representing different alpha levels and BIC being a more conservative test than AIC.
Abstract: Choosing a model with too few parameters can involve making unrealistically simple assumptions and lead to high bias, poor prediction, and missed opportunities for insight. Such models are not flexible enough to describe the sample or the population well. A model with too many parameters can fit the observed data very well, but be too closely tailored to it. Such models may generalize poorly. Penalized-likelihood information criteria, such as Akaike’s Information Criterion (AIC), the Bayesian Information Criterion (BIC), the Consistent AIC, and the Adjusted BIC, are widely used for model selection. However, different criteria sometimes support different models, leading to uncertainty about which criterion is the most trustworthy. In some simple cases the comparison of two models using information criteria can be viewed as equivalent to a likelihood ratio test, with the different criteria representing different alpha levels (i.e., different emphases on sensitivity or specificity; Lin & Dayton 1997). This perspective may lead to insights about how to interpret the criteria in less simple situations. For example, AIC or BIC could be preferable, depending on sample size and on the relative importance one assigns to sensitivity versus specificity. Understanding the differences among the criteria may make it easier to compare their results and to use them to make informed decisions.
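For concreteness, the sketch below (with hypothetical log-likelihoods and sample size, using the standard definitions AIC = 2k − 2 log L and BIC = k log n − 2 log L) shows how choosing between two nested models by AIC or BIC amounts to a likelihood ratio test whose implied alpha level differs between the two criteria.

```python
# Comparing two nested models with AIC/BIC is equivalent to a likelihood-ratio
# test whose critical value depends on the criterion. The numbers below are
# hypothetical; the definitions are the standard AIC = 2k - 2*logL and
# BIC = k*log(n) - 2*logL.
from math import log
from scipy.stats import chi2

n = 500                      # sample size (hypothetical)
loglik_small = -1210.0       # log-likelihood of the model with k1 parameters
loglik_large = -1204.5       # log-likelihood of the model with k2 parameters
k1, k2 = 3, 5
df = k2 - k1

lr_stat = 2 * (loglik_large - loglik_small)      # likelihood-ratio statistic

aic_small, aic_large = 2 * k1 - 2 * loglik_small, 2 * k2 - 2 * loglik_large
bic_small, bic_large = k1 * log(n) - 2 * loglik_small, k2 * log(n) - 2 * loglik_large

# AIC prefers the larger model iff lr_stat > 2*df; BIC iff lr_stat > df*log(n).
# Each cutoff corresponds to an implied alpha level of a chi-square test.
alpha_aic = 1 - chi2.cdf(2 * df, df)
alpha_bic = 1 - chi2.cdf(df * log(n), df)
print(f"LR stat = {lr_stat:.2f}")
print(f"AIC picks larger model: {aic_large < aic_small} (implied alpha ~ {alpha_aic:.3f})")
print(f"BIC picks larger model: {bic_large < bic_small} (implied alpha ~ {alpha_bic:.4f})")
```

With these numbers AIC favors the larger model while BIC does not, illustrating the point above that BIC behaves like a more conservative test (a much smaller implied alpha) than AIC, and increasingly so as n grows.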

444 citations


BookDOI
20 Sep 2020
TL;DR: Statistical Foundations of Data Science as discussed by the authors provides a thorough introduction to commonly used statistical models, contemporary statistical machine learning techniques and algorithms, along with their mathematical insights and statistical theories.
Abstract: Statistical Foundations of Data Science gives a thorough introduction to commonly used statistical models, contemporary statistical machine learning techniques and algorithms, along with their mathematical insights and statistical theories. It aims to serve as a graduate-level textbook and a research monograph on high-dimensional statistics, sparsity and covariance learning, machine learning, and statistical inference. It includes ample exercises that involve both theoretical studies and empirical applications. The book begins with an introduction to the stylized features of big data and their impacts on statistical analysis. It then introduces multiple linear regression and expands the techniques of model building via nonparametric regression and kernel tricks. It provides a comprehensive account of sparsity exploration and model selection for multiple regression, generalized linear models, quantile regression, robust regression, hazards regression, among others. High-dimensional inference is also thoroughly addressed, as is feature screening. The book also provides a comprehensive account of high-dimensional covariance estimation, learning latent factors and hidden structures, as well as their applications to statistical estimation, inference, prediction and machine learning problems. It also thoroughly introduces statistical machine learning theory and methods for classification, clustering, and prediction. These include CART, random forests, boosting, support vector machines, clustering algorithms, sparse PCA, and deep learning.

89 citations


Journal Article
TL;DR: This paper adapts stationary sliced inverse regression to cope with rapidly changing environments and proposes two online algorithms, one motivated by the perturbation method and the other by gradient descent optimization, to perform online singular value decomposition.
Abstract: Sliced inverse regression is an effective paradigm that achieves the goal of dimension reduction through replacing high dimensional covariates with a small number of linear combinations. It does not impose parametric assumptions on the dependence structure. More importantly, such a reduction of dimension is sufficient in that it does not cause loss of information. In this paper, we adapt the stationary sliced inverse regression to cope with rapidly changing environments. We propose to implement sliced inverse regression in an online fashion. This online learner consists of two steps. In the first step we construct an online estimate for the kernel matrix; in the second step we propose two online algorithms, one motivated by the perturbation method and the other by gradient descent optimization, to perform online singular value decomposition. The theoretical properties of this online learner are established. We demonstrate the numerical performance of this online learner through simulations and real world applications. All numerical studies confirm that this online learner performs as well as the batch learner.
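As a rough illustration of the two-step online learner described above, the sketch below maintains a streaming estimate of the SIR kernel matrix from per-slice running means and tracks its leading eigenvector with a gradient-style update followed by re-orthogonalization. The fixed slice boundaries, step size, and QR step are simplifying assumptions of this sketch, not the paper's exact perturbation or gradient-descent algorithms.

```python
# Simplified online SIR sketch: running per-slice means give a streaming
# estimate of the kernel matrix; a gradient step plus QR tracks its top
# eigenvectors. Covariates are assumed pre-standardized.
import numpy as np

class OnlineSIR:
    def __init__(self, p, slice_edges, d=1, step=0.1):
        self.edges = np.asarray(slice_edges)
        self.H = len(slice_edges) + 1               # number of slices of y
        self.counts = np.zeros(self.H)
        self.slice_means = np.zeros((self.H, p))    # running mean of x within each slice
        self.xbar = np.zeros(p)                     # overall running mean of x
        self.n = 0
        self.U = np.linalg.qr(np.random.randn(p, d))[0]   # current direction estimate
        self.step = step

    def partial_fit(self, x, y):
        self.n += 1
        self.xbar += (x - self.xbar) / self.n
        h = int(np.searchsorted(self.edges, y))     # slice that y falls into
        self.counts[h] += 1
        self.slice_means[h] += (x - self.slice_means[h]) / self.counts[h]
        # Step 1: online estimate of the kernel M = sum_h p_h (m_h - xbar)(m_h - xbar)^T
        w = self.counts / self.n
        C = self.slice_means - self.xbar
        M = (C * w[:, None]).T @ C
        # Step 2: gradient-style update of the leading eigenvectors of M,
        # with QR re-orthogonalization (one simple flavour of online SVD)
        self.U, _ = np.linalg.qr(self.U + self.step * (M @ self.U))
        return self.U

# toy usage: y is sliced at fixed cut points chosen in advance (an assumption)
rng = np.random.default_rng(0)
learner = OnlineSIR(p=5, slice_edges=[-0.5, 0.0, 0.5], d=1)
beta = np.array([1.0, -1.0, 0.0, 0.0, 0.0]) / np.sqrt(2.0)
for _ in range(2000):
    x = rng.standard_normal(5)
    y = np.tanh(x @ beta) + 0.1 * rng.standard_normal()
    U = learner.partial_fit(x, y)
print("estimated direction:", np.round(U.ravel(), 2))
```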

35 citations


Journal ArticleDOI
TL;DR: A novel approach for high-dimensional regression with theoretical guarantees that overcomes the challenge of tuning parameter selection of Lasso and possesses several appealing properties, and is robust with substantial efficiency gain for heavy-tailed random errors while maintaining high efficiency for normal random errors.
Abstract: We introduce a novel approach for high-dimensional regression with theoretical guarantees. The new procedure overcomes the challenge of tuning parameter selection of Lasso and possesses several appealing properties: it is robust, with substantial efficiency gain for heavy-tailed random errors, while maintaining high efficiency for normal random errors.

31 citations


Journal ArticleDOI
TL;DR: A new metric named cumulative divergence is introduced, and a CD-based forward screening procedure is developed, which is model-free and resistant to the presence of outliers in the response.
Abstract: Feature screening plays an important role in the analysis of ultrahigh dimensional data. Due to complicated model structure and high noise level, existing screening methods often suffer from model misspecification.

27 citations


Journal ArticleDOI
TL;DR: This work proposes a data-driven selection criterion that is applicable to most kinds of popular change-point detection methods, including binary segmentation and optimal partitioning algorithms, and develops a cross-validation estimation scheme based on an order-preserved sample-splitting strategy.
Abstract: In multiple change-point analysis, one of the major challenges is to estimate the number of change-points. Most existing approaches attempt to minimize a Schwarz information criterion, which balances a term quantifying model fit with a penalization term that accounts for model complexity, increases with the number of change-points, and limits overfitting. However, different penalization terms are required to adapt to different contexts of multiple change-point problems, and the optimal penalization magnitude usually varies with the model and error distribution. We propose a data-driven selection criterion that is applicable to most kinds of popular change-point detection methods, including binary segmentation and optimal partitioning algorithms. The key idea is to select the number of change-points that minimizes the squared prediction error, which measures the fit of a specified model for a new sample. We develop a cross-validation estimation scheme based on an order-preserved sample-splitting strategy, and establish its asymptotic selection consistency under some mild conditions. Effectiveness of the proposed selection criterion is demonstrated on a variety of numerical experiments and real-data examples.
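The order-preserved sample-splitting idea can be sketched as follows for piecewise-constant mean changes: fit candidate segmentations on the odd-indexed observations, predict the even-indexed observations with the fitted segment means, and choose the number of change-points minimizing the squared prediction error. The exact dynamic-programming segmentation used here is a stand-in for the binary segmentation or optimal partitioning algorithms that the criterion is designed to wrap, and the toy data are invented for illustration.

```python
# Minimal sketch of cross-validation for the number of change-points via an
# order-preserved split (odd positions train, even positions test).
import numpy as np

def best_segmentation(y, n_cp):
    """Exact least-squares segmentation of y with n_cp change-points (DP)."""
    n = len(y)
    csum, csum2 = np.cumsum(np.r_[0.0, y]), np.cumsum(np.r_[0.0, y ** 2])
    def cost(i, j):                          # SSE of y[i:j] around its mean
        s, s2, m = csum[j] - csum[i], csum2[j] - csum2[i], j - i
        return s2 - s * s / m
    K = n_cp + 1
    dp = np.full((K + 1, n + 1), np.inf); dp[0, 0] = 0.0
    arg = np.zeros((K + 1, n + 1), dtype=int)
    for k in range(1, K + 1):
        for j in range(k, n + 1):
            cands = [dp[k - 1, i] + cost(i, j) for i in range(k - 1, j)]
            i_best = int(np.argmin(cands)) + (k - 1)
            dp[k, j], arg[k, j] = cands[i_best - (k - 1)], i_best
    cps, j = [], n
    for k in range(K, 0, -1):                # backtrack the segment boundaries
        j = arg[k, j]
        if k > 1:
            cps.append(j)
    return sorted(cps)

def cv_error(y, n_cp):
    train, test = y[0::2], y[1::2]           # order-preserved split
    cps = [0] + best_segmentation(train, n_cp) + [len(train)]
    err = 0.0
    for a, b in zip(cps[:-1], cps[1:]):
        seg = test[a:b]                      # predict the held-out half with train means
        err += np.sum((seg - train[a:b].mean()) ** 2)
    return err

rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(m, 1.0, 60) for m in (0.0, 2.0, -1.0)])
errors = {k: cv_error(y, k) for k in range(0, 5)}
print("CV squared prediction errors:", {k: round(v, 1) for k, v in errors.items()})
print("selected number of change-points:", min(errors, key=errors.get))
```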

26 citations


Journal ArticleDOI
TL;DR: In this article, a model-free and data-adaptive feature screening method for ultra-high-dimensional data is proposed based on the projection correlation, which measures the dependence between two random vectors.
Abstract: This article proposes a model-free and data-adaptive feature screening method for ultrahigh-dimensional data. The proposed method is based on the projection correlation, which measures the dependence between two random vectors.

18 citations


Journal ArticleDOI
TL;DR: A new permutation-assisted tuning procedure in lasso (plasso) is proposed to identify phenotype-associated SNPs in a joint multiple-SNP regression model in GWAS, and its application provides new insights into the genetic control of complex traits.
Abstract: Motivation: Large scale genome-wide association studies (GWAS) have resulted in the identification of a wide range of genetic variants related to a host of complex traits and disorders. Despite their success, the individual single-nucleotide polymorphism (SNP) analysis approach adopted in most current GWAS can be limited: it is usually too biologically simplistic to elucidate a comprehensive genetic architecture of phenotypes, and it is statistically underpowered due to the heavy multiple-testing correction burden. On the other hand, multiple-SNP analyses (e.g. gene-based or region-based SNP-set analysis) are usually more powerful for examining the joint effects of a set of SNPs on the phenotype of interest. However, current multiple-SNP approaches can only draw an overall conclusion at the SNP-set level and do not directly inform which SNPs in the SNP-set are driving the overall genotype-phenotype association. Results: In this article, we propose a new permutation-assisted tuning procedure in lasso (plasso) to identify phenotype-associated SNPs in a joint multiple-SNP regression model in GWAS. The tuning parameter of lasso determines the amount of shrinkage and is essential to the performance of variable selection. In the proposed plasso procedure, we first generate permutations as pseudo-SNPs that are not associated with the phenotype. Then, the lasso tuning parameter is delicately chosen to separate true signal SNPs and non-informative pseudo-SNPs. We illustrate plasso using simulations to demonstrate its superior performance over existing methods, and an application of plasso to a real GWAS dataset provides new insights into the genetic control of complex traits. Availability and implementation: R code implementing the proposed methodology is available at https://github.com/xyz5074/plasso. Supplementary information: Supplementary data are available at Bioinformatics online.
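A rough sketch of the permutation-assisted tuning idea described above: permuted copies of the SNP columns serve as known-null "pseudo-SNPs", and the lasso penalty is chosen with reference to when those pseudo-SNPs enter the solution path. The specific rule below (the smallest penalty at which no pseudo-SNP is selected), the toy genotype data, and the effect sizes are assumptions of this sketch rather than plasso's exact rule; the authors' R implementation is at https://github.com/xyz5074/plasso.

```python
# Permutation-assisted lasso tuning, sketched: augment the design with
# row-permuted pseudo-SNPs carrying no signal, then pick the penalty at which
# the pseudo-SNPs are still excluded from the fitted model.
import numpy as np
from sklearn.linear_model import lasso_path

rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)    # toy SNP genotypes coded 0/1/2
beta = np.zeros(p); beta[:3] = (0.8, -0.6, 0.5)         # three causal SNPs (hypothetical)
y = X @ beta + rng.standard_normal(n)

X_pseudo = X[rng.permutation(n), :]                     # row-permuted copy: null pseudo-SNPs
X_aug = np.hstack([X, X_pseudo])
X_aug = X_aug - X_aug.mean(axis=0)                      # center (lasso_path fits no intercept)
yc = y - y.mean()

alphas, coefs, _ = lasso_path(X_aug, yc)                # alphas returned in decreasing order
pseudo_in = (coefs[p:, :] != 0).any(axis=0)             # any pseudo-SNP active at this alpha?
idx = np.where(~pseudo_in)[0][-1]                       # smallest alpha with no pseudo-SNP in
selected = np.where(coefs[:p, idx] != 0)[0]
print("chosen lambda:", round(alphas[idx], 4), "| selected SNPs:", selected)
```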

17 citations


Journal ArticleDOI
TL;DR: The findings imply that although daily monitoring of drinking may motivate people to reduce the quantity consumed once they start to drink, it may also arouse their desire to start drinking; both effects tend to last only one week, as participants accommodate to the monitoring by the second week.

16 citations


Journal Article
TL;DR: A distributed screening framework for big data setup that expresses a correlation measure as a function of several component parameters, each of which can be distributively estimated using a natural U-statistic from data segments, and shows that the aggregated correlation estimator is as efficient as the classic centralized estimator in terms of the probability convergence bound.
Abstract: Feature screening is a powerful tool in the analysis of high dimensional data. When the sample size $N$ and the number of features $p$ are both large, the implementation of classic screening methods can be numerically challenging. In this paper, we propose a distributed screening framework for the big data setup. In the spirit of "divide-and-conquer", the proposed framework expresses a correlation measure as a function of several component parameters, each of which can be distributively estimated using a natural U-statistic from data segments. With the component estimates aggregated, we obtain a final correlation estimate that can be readily used for screening features. This framework enables distributed storage and parallel computing and thus is computationally attractive. Due to the unbiased distributive estimation of the component parameters, the final aggregated estimate achieves a high accuracy that is insensitive to the number of data segments $m$, whether specified by the problem itself or chosen by users. Under mild conditions, we show that the aggregated correlation estimator is as efficient as the classic centralized estimator in terms of the probability convergence bound, and the corresponding screening procedure enjoys the sure screening property for a wide range of correlation measures. The promising performance of the new method is supported by extensive numerical examples.
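A small sketch of the "divide-and-conquer" idea described above, using the Pearson correlation as the working example: the correlation is a smooth function of five moments, each of which is estimated on every data segment and then averaged across segments before the correlation is assembled and used to rank features. The choice of Pearson correlation, the number of segments, and the toy data are illustrative assumptions; the paper's framework covers a wider range of correlation measures built from U-statistics.

```python
# Distributed screening sketch: per-segment moment estimates are averaged,
# the correlation is assembled from the aggregated moments, and features are
# ranked by its absolute value.
import numpy as np

def segment_moments(X, y):
    """Per-segment sample moments (E[Xy], E[X], E[y], E[X^2], E[y^2]), one row each."""
    p = X.shape[1]
    return np.vstack([(X * y[:, None]).mean(axis=0),
                      X.mean(axis=0),
                      np.full(p, y.mean()),
                      (X ** 2).mean(axis=0),
                      np.full(p, (y ** 2).mean())])

def distributed_corr(segments):
    exy, ex, ey, ex2, ey2 = np.mean([segment_moments(X, y) for X, y in segments], axis=0)
    return (exy - ex * ey) / np.sqrt((ex2 - ex ** 2) * (ey2 - ey ** 2))

rng = np.random.default_rng(0)
N, p, m = 10_000, 500, 10
X = rng.standard_normal((N, p))
y = X[:, 0] - 0.5 * X[:, 1] + rng.standard_normal(N)
segments = [(X[i::m], y[i::m]) for i in range(m)]        # m data segments
corr = distributed_corr(segments)
top = np.argsort(-np.abs(corr))[:10]                     # keep the top-ranked features
print("top screened features:", top)
```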

14 citations


Journal ArticleDOI
TL;DR: A new quadratic decorrelated inference function approach is proposed, which simultaneously removes the impact of nuisance parameters and incorporates the correlation to enhance the efficiency of the estimation procedure.
Abstract: This paper concerns statistical inference for longitudinal data with ultrahigh dimensional covariates. We first study the problem of constructing confidence intervals and hypothesis tests for a low-dimensional parameter of interest. The major challenge is how to construct a powerful test statistic in the presence of high-dimensional nuisance parameters and sophisticated within-subject correlation of longitudinal data. To deal with the challenge, we propose a new quadratic decorrelated inference function approach which simultaneously removes the impact of nuisance parameters and incorporates the correlation to enhance the efficiency of the estimation procedure. When the parameter of interest is of fixed dimension, we prove that the proposed estimator is asymptotically normal and attains the semiparametric information bound, based on which we can construct an optimal Wald test statistic. We further extend this result and establish the limiting distribution of the estimator under the setting with the dimension of the parameter of interest growing with the sample size at a polynomial rate. Finally, we study how to control the false discovery rate (FDR) when a vector of high-dimensional regression parameters is of interest. We prove that applying the Storey (J. R. Stat. Soc. Ser. B. Stat. Methodol. 64 (2002) 479–498) procedure to the proposed test statistics for each regression parameter controls FDR asymptotically in longitudinal data. We conduct simulation studies to assess the finite sample performance of the proposed procedures. Our simulation results imply that the newly proposed procedure can control both Type I error for testing a low dimensional parameter of interest and the FDR in the multiple testing problem. We also apply the proposed procedure to a real data example.
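To fix ideas, a generic decorrelated score construction, of which the quadratic decorrelated inference function above is a refinement that additionally exploits within-subject correlation, can be written schematically as

\[
\hat S(\theta) \;=\; \frac{1}{n}\sum_{i=1}^{n}\Big\{\nabla_{\theta}\,\ell_i(\theta,\hat\gamma)\;-\;\hat w^{\top}\nabla_{\gamma}\,\ell_i(\theta,\hat\gamma)\Big\},
\qquad
w \;=\; I_{\gamma\gamma}^{-1}\,I_{\gamma\theta},
\]

where $\ell_i$ is the $i$-th (quasi-)likelihood contribution, $\theta$ is the low-dimensional parameter of interest, $\gamma$ is the high-dimensional nuisance parameter, $I$ denotes the information matrix, and $\hat w$ is a regularized (for example $\ell_1$-penalized) estimate of $w$. Subtracting the projection $\hat w^{\top}\nabla_{\gamma}\ell_i$ makes the score insensitive to first-order estimation error in $\hat\gamma$. This display is only a schematic of the decorrelation idea, not the paper's exact estimating function.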

Posted Content
TL;DR: This work considers the stochastic contextual bandit problem under the high dimensional linear model and proposes doubly growing epochs with parameter estimation by the best subset selection method, which is easy to implement in practice and achieves $\tilde{\mathcal{O}}(s\sqrt{T})$ regret with high probability.
Abstract: We consider the stochastic contextual bandit problem under the high dimensional linear model. We focus on the case where the action space is finite and random, with each action associated with a randomly generated contextual covariate. This setting finds essential applications such as personalized recommendation, online advertisement, and personalized medicine. However, it is very challenging as we need to balance exploration and exploitation. We propose doubly growing epochs and estimating the parameter using the best subset selection method, which is easy to implement in practice. This approach achieves $\tilde{\mathcal{O}}(s\sqrt{T})$ regret with high probability, which is nearly independent of the "ambient" regression model dimension $d$. We further attain a sharper $\tilde{\mathcal{O}}(\sqrt{sT})$ regret by using the SupLinUCB framework and match the minimax lower bound of low-dimensional linear stochastic bandit problems. Finally, we conduct extensive numerical experiments to demonstrate the applicability and robustness of our algorithms empirically.
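A toy sketch of the epoch-based strategy described above for a sparse linear contextual bandit: epochs double in length, at the end of each epoch the regression parameter is re-estimated by (exhaustive) best subset selection on all data observed so far, and within an epoch the arm with the highest estimated reward is pulled. The initial forced-exploration phase, epoch constants, BIC-style subset choice, and toy problem sizes are assumptions of this sketch, not the paper's exact algorithm.

```python
# Epoch-doubling bandit sketch with exhaustive best subset refits.
import numpy as np
from itertools import combinations

def best_subset(X, y, max_s):
    """Exhaustive best subset selection (by BIC); feasible only for small d."""
    n, d = X.shape
    best = (np.inf, np.zeros(d))
    for s in range(1, max_s + 1):
        for S in combinations(range(d), s):
            XS = X[:, S]
            b, *_ = np.linalg.lstsq(XS, y, rcond=None)
            rss = np.sum((y - XS @ b) ** 2)
            bic = n * np.log(rss / n) + s * np.log(n)
            if bic < best[0]:
                beta = np.zeros(d); beta[list(S)] = b
                best = (bic, beta)
    return best[1]

rng = np.random.default_rng(0)
d, K, s_max, T = 10, 20, 2, 2000
beta_true = np.zeros(d); beta_true[[1, 4]] = (1.0, -1.0)

beta_hat = np.zeros(d)
epoch_end, epoch_len = 0, 50
Xs, ys, regret = [], [], 0.0
for t in range(T):
    A = rng.standard_normal((K, d))                    # random finite action set
    a = int(np.argmax(A @ beta_hat)) if t >= 50 else rng.integers(K)
    r = A[a] @ beta_true + 0.1 * rng.standard_normal()
    regret += np.max(A @ beta_true) - A[a] @ beta_true
    Xs.append(A[a]); ys.append(r)
    if t + 1 == epoch_end + epoch_len:                 # end of epoch: refit, double length
        beta_hat = best_subset(np.array(Xs), np.array(ys), s_max)
        epoch_end, epoch_len = t + 1, 2 * epoch_len
print("cumulative regret:", round(regret, 1), "| support found:", np.nonzero(beta_hat)[0])
```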

Journal ArticleDOI
TL;DR: A refitted cross validation (RCV) method for sparse precision matrix estimation based on its Cholesky decomposition, which does not require the Gaussian assumption, can be easily implemented with existing software for ultrahigh dimensional linear regression.

Journal ArticleDOI
TL;DR: A two-step gene-detection procedure for generalized varying coefficient mixed-effects models with ultrahigh dimensional covariates finds significant single nucleotide polymorphisms impacting the mean BMI trend, some of which have already been biologically proven to be "fat genes."
Abstract: Motivated by an empirical analysis of data from a genome-wide association study on obesity, measured by the body mass index (BMI), we propose a two-step gene-detection procedure for generalized varying coefficient mixed-effects models with ultrahigh dimensional covariates. The proposed procedure selects significant single nucleotide polymorphisms (SNPs) impacting the mean BMI trend, some of which have already been biologically proven to be “fat genes.” The method also discovers SNPs that significantly influence the age-dependent variability of BMI. The proposed procedure takes into account individual variations of genetic effects and can also be directly applied to longitudinal data with continuous, binary or count responses. We employ Monte Carlo simulation studies to assess the performance of the proposed method and further carry out causal inference for the selected SNPs.

Journal ArticleDOI
TL;DR: Monitoring large-scale datastreams with limited resources has become increasingly important for real-time detection of abnormal activities in many applications.
Abstract: Monitoring large-scale datastreams with limited resources has become increasingly important for real-time detection of abnormal activities in many applications. Despite the availability of large da...

Book ChapterDOI
01 Jan 2020
TL;DR: This chapter provides a selective review of feature screening methods for ultra-high dimensional data, whose main idea is to reduce the ultra-high dimensionality of the feature space to a moderate size in a fast and efficient way while retaining all the important features in the reduced feature space.
Abstract: This chapter provides a selective review of feature screening methods for ultra-high dimensional data. The main idea of feature screening is to reduce the ultra-high dimensionality of the feature space to a moderate size in a fast and efficient way while retaining all the important features in the reduced feature space. This is referred to as the sure screening property. After feature screening, more sophisticated methods can be applied to the reduced feature space for further analysis, such as parameter estimation and statistical inference. This chapter focuses only on the feature screening stage. From the perspective of different types of data, we review feature screening methods for independent and identically distributed data, longitudinal data, and survival data. From the perspective of modeling, we review various models, including the linear model, generalized linear model, additive model, varying-coefficient model, Cox model, etc. We also cover some model-free feature screening procedures.
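A minimal illustration of the screening idea summarized above, in the style of sure independence screening for the linear model: rank features by their absolute marginal correlation with the response and retain the top d = floor(n / log n) of them before any refined analysis. The Pearson-correlation utility and this particular cutoff are one common choice among the many procedures reviewed in the chapter, and the data below are simulated for illustration.

```python
# Marginal correlation screening sketch: keep the top n/log(n) features.
import numpy as np

def sis_screen(X, y):
    n = len(y)
    Xc = (X - X.mean(0)) / X.std(0)
    yc = (y - y.mean()) / y.std()
    utility = np.abs(Xc.T @ yc) / n              # absolute marginal correlations
    d = int(n / np.log(n))                       # retained model size
    return np.argsort(-utility)[:d]

rng = np.random.default_rng(0)
n, p = 200, 5000
X = rng.standard_normal((n, p))
y = 2 * X[:, 0] - 1.5 * X[:, 7] + rng.standard_normal(n)
kept = sis_screen(X, y)
print("retained", len(kept), "features; contains true ones:", {0, 7} <= set(kept))
```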

Posted Content
TL;DR: In this paper, the authors proposed a data-driven testing procedure for controlling the false discovery rate (FDR) in large-scale multiple testing problems, which achieves exact FDR control in finite sample settings when the populations are symmetric.
Abstract: This paper is concerned with false discovery rate (FDR) control in large-scale multiple testing problems. We first propose a new data-driven testing procedure for controlling the FDR in large-scale t-tests for the one-sample mean problem. The proposed procedure achieves exact FDR control in finite sample settings when the populations are symmetric, regardless of the number of tests or the sample sizes. Compared with the existing bootstrap method for FDR control, the proposed procedure is computationally efficient. We show that the proposed method can control the FDR asymptotically for asymmetric populations even when the test statistics are not independent. We further show that the proposed procedure with a simple correction is as accurate as the bootstrap method up to the second order, and can be much more effective than the existing normal calibration. We extend the proposed procedure to the two-sample mean problem. Empirical results show that the proposed procedures have better FDR control than existing ones when the proportion of true alternative hypotheses is not too low, while maintaining reasonably good detection ability.
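An illustration of the symmetry principle underlying procedures of this kind: under the null the t-statistics are (approximately) symmetric about zero, so the count of statistics below −t estimates the number of false discoveries among those above t. The specific threshold rule, the one-sided rejection region, and the simulated data below are generic assumptions given for intuition only, not the paper's proposed calibration or its second-order correction.

```python
# Generic symmetry-based FDR thresholding sketch for large-scale one-sample t-tests.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
m, n = 2000, 30
mu = np.zeros(m); mu[:100] = 0.8                        # 100 true signals (hypothetical)
Xdat = rng.standard_normal((m, n)) + mu[:, None]
T = stats.ttest_1samp(Xdat, 0.0, axis=1).statistic      # one t-statistic per hypothesis

alpha = 0.1
grid = np.sort(np.abs(T))
# estimated FDP at threshold t: left-tail exceedances over right-tail rejections
fdp_hat = np.array([(np.sum(T <= -t) + 1) / max(np.sum(T >= t), 1) for t in grid])
ok = grid[fdp_hat <= alpha]
t_star = ok.min() if ok.size else np.inf                # smallest threshold meeting alpha
rejected = np.where(T >= t_star)[0]
print("threshold:", round(float(t_star), 3), "| rejections:", rejected.size)
```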

Journal ArticleDOI
TL;DR: Ridge regression was originally introduced by Hoerl and Kennard (1970) to deal with the collinearity issue in linear regression in the presence of highly correlated covariates; it solves an l2-penalized least squares problem.
Abstract: Ridge regression was originally introduced by Hoerl and Kennard (1970) to deal with the collinearity issue in linear regression in the presence of highly correlated covariates. It solves an l2-penalized least squares problem.
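For reference, the estimator discussed above solves the l2-penalized least squares problem

\[
\hat{\beta}(\lambda) \;=\; \arg\min_{\beta}\ \|y - X\beta\|_2^2 + \lambda\|\beta\|_2^2 \;=\; \bigl(X^{\top}X + \lambda I_p\bigr)^{-1}X^{\top}y, \qquad \lambda > 0,
\]

a closed form that remains well defined even when $X^{\top}X$ is singular or ill conditioned, which is precisely what makes ridge regression attractive under collinearity.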

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a new strategy of adding two artificial data points to the observed data to deal with these two challenges, and established the asymptotic normality of the proposed empirical likelihood ratio test.
Abstract: This paper is concerned with empirical likelihood inference on the population mean when the dimension $p$ and the sample size $n$ satisfy $p/n\rightarrow c\in [1,\infty)$. As shown in Tsao (2004), the empirical likelihood method fails with high probability when $p/n>1/2$ because the convex hull of the $n$ observations in $\mathbb{R}^p$ becomes too small to cover the true mean value. Moreover, when $p> n$, the sample covariance matrix becomes singular, and this results in the breakdown of the first sandwich approximation for the log empirical likelihood ratio. To deal with these two challenges, we propose a new strategy of adding two artificial data points to the observed data. We establish the asymptotic normality of the proposed empirical likelihood ratio test. The proposed test statistic does not involve the inverse of the sample covariance matrix. Furthermore, its form is explicit, so the test can easily be carried out with low computational cost. Our numerical comparison shows that the proposed test outperforms some existing tests for high-dimensional mean vectors in terms of power. We also illustrate the proposed procedure with an empirical analysis of stock data.
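For context, the ordinary empirical likelihood ratio for a candidate mean value $\mu$, whose convex-hull and singularity problems are described above, is

\[
R(\mu) \;=\; \max\Big\{\prod_{i=1}^{n} n w_i \;:\; w_i \ge 0,\ \sum_{i=1}^{n} w_i = 1,\ \sum_{i=1}^{n} w_i X_i = \mu\Big\},
\]

and the maximization is infeasible whenever $\mu$ lies outside the convex hull of $X_1,\dots,X_n$. The two artificial data points proposed in the paper are added precisely to remove this failure mode and to avoid inverting the sample covariance matrix.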

Journal ArticleDOI
TL;DR: The proposed screening procedure is based on joint quasi-likelihood of all predictors, and therefore is distinguished from marginal screening procedures proposed in the literature, and can effectively identify active predictors that are jointly dependent but marginally independent of the response.
Abstract: Generalized varying coefficient models are particularly useful for examining dynamic effects of covariates on a continuous, binary or count response. This paper is concerned with feature screening for generalized varying coefficient models with ultrahigh dimensional covariates. The proposed screening procedure is based on joint quasi-likelihood of all predictors, and therefore is distinguished from marginal screening procedures proposed in the literature. In particular, the proposed procedure can effectively identify active predictors that are jointly dependent but marginally independent of the response. In order to carry out the proposed procedure, we propose an effective algorithm and establish the ascent property of the proposed algorithm. We further prove that the proposed procedure possesses the sure screening property. That is, with probability tending to one, the selected variable set includes the actual active predictors. We examine the finite sample performance of the proposed procedure and compare it with existing ones via Monte Carlo simulations, and illustrate the proposed procedure by a real data example.


Journal ArticleDOI
TL;DR: The findings indicate a sequential risk gradient in the influence of maladaptive peer behavior on externalizing behavior depending on the number of G alleles during childhood through adulthood.
Abstract: Engagement in externalizing behavior is problematic. Deviant peer affiliation increases risk for externalizing behavior. Yet, peer effects vary across individuals and may differ across genes. This study determines gene × environment × development interactions as they apply to externalizing behavior from childhood to adulthood. A sample (n = 687; 68% male, 90% White) of youth from the Michigan Longitudinal Study was assessed from ages 10 to 25. Interactions between γ-amino butyric acid type A receptor γ1 subunit (GABRG1; rs7683876, rs13120165) and maladaptive peer behavior on externalizing behavior were examined using time-varying effect modeling. The findings indicate a sequential risk gradient in the influence of maladaptive peer behavior on externalizing behavior depending on the number of G alleles during childhood through adulthood. Individuals with the GG genotype are most vulnerable to maladaptive peer influences, which results in greater externalizing behavior during late childhood through early adulthood.

Posted Content
TL;DR: A robust, consistent, and data-driven model selection criterion based upon the empirical likelihood function is proposed; it avoids potential computational convergence issues and allows versatile applications, such as generalized linear models, generalized estimating equations, penalized regressions, and so on.
Abstract: Conventional likelihood-based information criteria for model selection rely on the distribution assumption of the data. However, for complex data that are increasingly available in many scientific fields, the specification of their underlying distribution turns out to be challenging, and the existing criteria may be limited and not general enough to handle a variety of model selection problems. Here, we propose a robust and consistent model selection criterion based upon the empirical likelihood function, which is data-driven. In particular, this framework adopts plug-in estimators that can be obtained by solving external estimating equations, not limited to the empirical likelihood, which avoids potential computational convergence issues and allows versatile applications, such as generalized linear models, generalized estimating equations, penalized regressions, and so on. The formulation of our proposed criterion is initially derived from the asymptotic expansion of the marginal likelihood under a variable selection framework, but, more importantly, the consistent model selection property is established under a general context. Extensive simulation studies confirm the outperformance of the proposal compared to traditional model selection criteria. Finally, an application to the Atherosclerosis Risk in Communities Study illustrates the practical value of the proposed framework.

Posted Content
TL;DR: In this paper, a new corrected decorrelated score test and a corresponding one-step estimator were proposed for a high-dimensional linear model with a finite number of covariates measured with error.
Abstract: For a high-dimensional linear model with a finite number of covariates measured with error, we study statistical inference on the parameters associated with the error-prone covariates, and propose a new corrected decorrelated score test and the corresponding one-step estimator. We further establish asymptotic properties of the newly proposed test statistic and the one-step estimator. Under local alternatives, we show that the limiting distribution of our corrected decorrelated score test statistic is non-central normal. The finite-sample performance of the proposed inference procedure is examined through simulation studies. We further illustrate the proposed procedure via an empirical analysis of a real data example.

Book ChapterDOI
01 Jul 2020
TL;DR: A projection test is proposed based on a new estimate of the optimal projection direction $\varSigma^{-1}\mu$, which is estimated by regularized quadratic programming with a nonconvex penalty and a linear constraint.
Abstract: Testing whether the mean vector from some population is zero or not is a fundamental problem in statistics. In the high-dimensional regime, where the dimension of the data $p$ is greater than the sample size $n$, traditional methods such as Hotelling's $T^2$ test cannot be directly applied. One can project the high-dimensional vector onto a space of low dimension and then traditional methods can be applied. In this paper, we propose a projection test based on a new estimate of the optimal projection direction $\varSigma^{-1}\mu$. Under the assumption that the optimal projection $\varSigma^{-1}\mu$ is sparse, we use regularized quadratic programming with a nonconvex penalty and a linear constraint to estimate it. Simulation studies and real data analysis are conducted to examine the finite sample performance of different tests in terms of type I error and power.
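A small sketch of the projection-test mechanics: estimate a direction approximating $\varSigma^{-1}\mu$ from one part of the sample, project the held-out part onto it, and apply an ordinary one-sample t-test to the resulting scalars. The ridge-regularized direction estimate, the 50/50 split, and the simulated sparse mean shift below are placeholders for the paper's nonconvex-penalized quadratic program; they illustrate the idea, not the proposed estimator.

```python
# Projection test sketch: data splitting + estimated direction + 1-D t-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p = 80, 200
mu = np.zeros(p); mu[:5] = 0.4                             # sparse mean shift (hypothetical)
X = rng.standard_normal((n, p)) + mu

n1 = n // 2
X1, X2 = X[:n1], X[n1:]                                    # split: estimate direction / test
S = np.cov(X1, rowvar=False)
xbar1 = X1.mean(axis=0)
direction = np.linalg.solve(S + 0.5 * np.eye(p), xbar1)    # ridge-regularized Sigma^{-1} mu

z = X2 @ direction                                         # project held-out observations
t_res = stats.ttest_1samp(z, 0.0)
print(f"projection t-test: t = {t_res.statistic:.2f}, p-value = {t_res.pvalue:.4f}")
```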

Journal ArticleDOI
TL;DR: In this rejoinder, the authors thank the editors, Professors Regina Liu and Hongyu Zhao, for featuring this article and organizing stimulating discussions, and express gratitude for the feedback on their work from the three reviewers.
Abstract: We heartily thank the editors, Professors Regina Liu and Hongyu Zhao, for featuring this article and organizing stimulating discussions. We are grateful for the feedback on our work from the three reviewers.