Showing papers by "Runze Li" published in 2020


Journal ArticleDOI
TL;DR: In some cases the comparison of two models using ICs can be viewed as equivalent to a likelihood ratio test, with the different criteria representing different alpha levels and BIC being a more conservative test than AIC.
Abstract: Choosing a model with too few parameters can involve making unrealistically simple assumptions and lead to high bias, poor prediction, and missed opportunities for insight. Such models are not flexible enough to describe the sample or the population well. A model with too many parameters can fit the observed data very well, but be too closely tailored to it. Such models may generalize poorly. Penalized-likelihood information criteria, such as Akaike’s Information Criterion (AIC), the Bayesian Information Criterion (BIC), the Consistent AIC, and the Adjusted BIC, are widely used for model selection. However, different criteria sometimes support different models, leading to uncertainty about which criterion is the most trustworthy. In some simple cases the comparison of two models using information criteria can be viewed as equivalent to a likelihood ratio test, with the different criteria representing different alpha levels (i.e., different emphases on sensitivity or specificity; Lin & Dayton 1997). This perspective may lead to insights about how to interpret the criteria in less simple situations. For example, AIC or BIC could be preferable, depending on sample size and on the relative importance one assigns to sensitivity versus specificity. Understanding the differences among the criteria may make it easier to compare their results and to use them to make informed decisions.
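For concreteness, the sketch below (with hypothetical log-likelihoods and sample size, using the standard definitions AIC = 2k − 2 log L and BIC = k log n − 2 log L) shows how choosing between two nested models by AIC or BIC amounts to a likelihood ratio test whose implied alpha level differs between the two criteria.

```python
# Comparing two nested models with AIC/BIC is equivalent to a likelihood-ratio
# test whose critical value depends on the criterion. The numbers below are
# hypothetical; the definitions are the standard AIC = 2k - 2*logL and
# BIC = k*log(n) - 2*logL.
from math import log
from scipy.stats import chi2

n = 500                      # sample size (hypothetical)
loglik_small = -1210.0       # log-likelihood of the model with k1 parameters
loglik_large = -1204.5       # log-likelihood of the model with k2 parameters
k1, k2 = 3, 5
df = k2 - k1

lr_stat = 2 * (loglik_large - loglik_small)      # likelihood-ratio statistic

aic_small, aic_large = 2 * k1 - 2 * loglik_small, 2 * k2 - 2 * loglik_large
bic_small, bic_large = k1 * log(n) - 2 * loglik_small, k2 * log(n) - 2 * loglik_large

# AIC prefers the larger model iff lr_stat > 2*df; BIC iff lr_stat > df*log(n).
# Each cutoff corresponds to an implied alpha level of a chi-square test.
alpha_aic = 1 - chi2.cdf(2 * df, df)
alpha_bic = 1 - chi2.cdf(df * log(n), df)
print(f"LR stat = {lr_stat:.2f}")
print(f"AIC picks larger model: {aic_large < aic_small} (implied alpha ~ {alpha_aic:.3f})")
print(f"BIC picks larger model: {bic_large < bic_small} (implied alpha ~ {alpha_bic:.4f})")
```

With these numbers AIC favors the larger model while BIC does not, illustrating the point above that BIC behaves like a more conservative test (a much smaller implied alpha) than AIC, and increasingly so as n grows.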

444 citations


BookDOI
20 Sep 2020
TL;DR: Statistical Foundations of Data Science as discussed by the authors provides a thorough introduction to commonly used statistical models, contemporary statistical machine learning techniques and algorithms, along with their mathematical insights and statistical theories.
Abstract: Statistical Foundations of Data Science gives a thorough introduction to commonly used statistical models, contemporary statistical machine learning techniques and algorithms, along with their mathematical insights and statistical theories. It aims to serve as a graduate-level textbook and a research monograph on high-dimensional statistics, sparsity and covariance learning, machine learning, and statistical inference. It includes ample exercises that involve both theoretical studies and empirical applications. The book begins with an introduction to the stylized features of big data and their impacts on statistical analysis. It then introduces multiple linear regression and expands the techniques of model building via nonparametric regression and kernel tricks. It provides a comprehensive account of sparsity exploration and model selection for multiple regression, generalized linear models, quantile regression, robust regression, hazards regression, among others. High-dimensional inference is also thoroughly addressed, as is feature screening. The book also provides a comprehensive account of high-dimensional covariance estimation, learning latent factors and hidden structures, as well as their applications to statistical estimation, inference, prediction and machine learning problems. It also thoroughly introduces statistical machine learning theory and methods for classification, clustering, and prediction. These include CART, random forests, boosting, support vector machines, clustering algorithms, sparse PCA, and deep learning.

89 citations


Journal Article
TL;DR: This paper adapts stationary sliced inverse regression to cope with rapidly changing environments and proposes two online algorithms, one motivated by the perturbation method and the other by gradient descent optimization, to perform online singular value decomposition.
Abstract: Sliced inverse regression is an effective paradigm that achieves the goal of dimension reduction through replacing high dimensional covariates with a small number of linear combinations. It does not impose parametric assumptions on the dependence structure. More importantly, such a reduction of dimension is sufficient in that it does not cause loss of information. In this paper, we adapt the stationary sliced inverse regression to cope with rapidly changing environments. We propose to implement sliced inverse regression in an online fashion. This online learner consists of two steps. In the first step we construct an online estimate for the kernel matrix; in the second step we propose two online algorithms, one motivated by the perturbation method and the other by gradient descent optimization, to perform online singular value decomposition. The theoretical properties of this online learner are established. We demonstrate the numerical performance of this online learner through simulations and real world applications. All numerical studies confirm that this online learner performs as well as the batch learner.
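As a rough illustration of the two-step online learner described above, the sketch below maintains a streaming estimate of the SIR kernel matrix from per-slice running means and tracks its leading eigenvector with a gradient-style update followed by re-orthogonalization. The fixed slice boundaries, step size, and QR step are simplifying assumptions of this sketch, not the paper's exact perturbation or gradient-descent algorithms.

```python
# Simplified online SIR sketch: running per-slice means give a streaming
# estimate of the kernel matrix; a gradient step plus QR tracks its top
# eigenvectors. Covariates are assumed pre-standardized.
import numpy as np

class OnlineSIR:
    def __init__(self, p, slice_edges, d=1, step=0.1):
        self.edges = np.asarray(slice_edges)
        self.H = len(slice_edges) + 1               # number of slices of y
        self.counts = np.zeros(self.H)
        self.slice_means = np.zeros((self.H, p))    # running mean of x within each slice
        self.xbar = np.zeros(p)                     # overall running mean of x
        self.n = 0
        self.U = np.linalg.qr(np.random.randn(p, d))[0]   # current direction estimate
        self.step = step

    def partial_fit(self, x, y):
        self.n += 1
        self.xbar += (x - self.xbar) / self.n
        h = int(np.searchsorted(self.edges, y))     # slice that y falls into
        self.counts[h] += 1
        self.slice_means[h] += (x - self.slice_means[h]) / self.counts[h]
        # Step 1: online estimate of the kernel M = sum_h p_h (m_h - xbar)(m_h - xbar)^T
        w = self.counts / self.n
        C = self.slice_means - self.xbar
        M = (C * w[:, None]).T @ C
        # Step 2: gradient-style update of the leading eigenvectors of M,
        # with QR re-orthogonalization (one simple flavour of online SVD)
        self.U, _ = np.linalg.qr(self.U + self.step * (M @ self.U))
        return self.U

# toy usage: y is sliced at fixed cut points chosen in advance (an assumption)
rng = np.random.default_rng(0)
learner = OnlineSIR(p=5, slice_edges=[-0.5, 0.0, 0.5], d=1)
beta = np.array([1.0, -1.0, 0.0, 0.0, 0.0]) / np.sqrt(2.0)
for _ in range(2000):
    x = rng.standard_normal(5)
    y = np.tanh(x @ beta) + 0.1 * rng.standard_normal()
    U = learner.partial_fit(x, y)
print("estimated direction:", np.round(U.ravel(), 2))
```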

35 citations


Journal ArticleDOI
TL;DR: A novel approach for high-dimensional regression with theoretical guarantees that overcomes the challenge of tuning parameter selection of Lasso and possesses several appealing properties, and is robust with substantial efficiency gain for heavy-tailed random errors while maintaining high efficiency for normal random errors.
Abstract: We introduce a novel approach for high-dimensional regression with theoretical guarantees. The new procedure overcomes the challenge of tuning parameter selection of Lasso and possesses several appealing properties: it is robust, with substantial efficiency gain for heavy-tailed random errors, while maintaining high efficiency for normal random errors.

31 citations


Journal ArticleDOI
TL;DR: A new metric named cumulative divergence is introduced, and a CD-based forward screening procedure is developed, which is model-free and resistant to the presence of outliers in the response.
Abstract: Feature screening plays an important role in the analysis of ultrahigh dimensional data. Due to complicated model structure and high noise level, existing screening methods often suffer from model misspecification.

27 citations


Journal ArticleDOI
TL;DR: This work proposes a data-driven selection criterion that is applicable to most kinds of popular change-point detection methods, including binary segmentation and optimal partitioning algorithms, and develops a cross-validation estimation scheme based on an order-preserved sample-splitting strategy.
Abstract: In multiple change-point analysis, one of the major challenges is to estimate the number of change-points. Most existing approaches attempt to minimize a Schwarz information criterion, which balances a term quantifying model fit with a penalization term that accounts for model complexity, increases with the number of change-points, and limits overfitting. However, different penalization terms are required to adapt to different contexts of multiple change-point problems, and the optimal penalization magnitude usually varies with the model and error distribution. We propose a data-driven selection criterion that is applicable to most kinds of popular change-point detection methods, including binary segmentation and optimal partitioning algorithms. The key idea is to select the number of change-points that minimizes the squared prediction error, which measures the fit of a specified model for a new sample. We develop a cross-validation estimation scheme based on an order-preserved sample-splitting strategy, and establish its asymptotic selection consistency under some mild conditions. Effectiveness of the proposed selection criterion is demonstrated on a variety of numerical experiments and real-data examples.
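The order-preserved sample-splitting idea can be sketched as follows for piecewise-constant mean changes: fit candidate segmentations on the odd-indexed observations, predict the even-indexed observations with the fitted segment means, and choose the number of change-points minimizing the squared prediction error. The exact dynamic-programming segmentation used here is a stand-in for the binary segmentation or optimal partitioning algorithms that the criterion is designed to wrap, and the toy data are invented for illustration.

```python
# Minimal sketch of cross-validation for the number of change-points via an
# order-preserved split (odd positions train, even positions test).
import numpy as np

def best_segmentation(y, n_cp):
    """Exact least-squares segmentation of y with n_cp change-points (DP)."""
    n = len(y)
    csum, csum2 = np.cumsum(np.r_[0.0, y]), np.cumsum(np.r_[0.0, y ** 2])
    def cost(i, j):                          # SSE of y[i:j] around its mean
        s, s2, m = csum[j] - csum[i], csum2[j] - csum2[i], j - i
        return s2 - s * s / m
    K = n_cp + 1
    dp = np.full((K + 1, n + 1), np.inf); dp[0, 0] = 0.0
    arg = np.zeros((K + 1, n + 1), dtype=int)
    for k in range(1, K + 1):
        for j in range(k, n + 1):
            cands = [dp[k - 1, i] + cost(i, j) for i in range(k - 1, j)]
            i_best = int(np.argmin(cands)) + (k - 1)
            dp[k, j], arg[k, j] = cands[i_best - (k - 1)], i_best
    cps, j = [], n
    for k in range(K, 0, -1):                # backtrack the segment boundaries
        j = arg[k, j]
        if k > 1:
            cps.append(j)
    return sorted(cps)

def cv_error(y, n_cp):
    train, test = y[0::2], y[1::2]           # order-preserved split
    cps = [0] + best_segmentation(train, n_cp) + [len(train)]
    err = 0.0
    for a, b in zip(cps[:-1], cps[1:]):
        seg = test[a:b]                      # predict the held-out half with train means
        err += np.sum((seg - train[a:b].mean()) ** 2)
    return err

rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(m, 1.0, 60) for m in (0.0, 2.0, -1.0)])
errors = {k: cv_error(y, k) for k in range(0, 5)}
print("CV squared prediction errors:", {k: round(v, 1) for k, v in errors.items()})
print("selected number of change-points:", min(errors, key=errors.get))
```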

26 citations


Journal ArticleDOI
TL;DR: In this article, a model-free and data-adaptive feature screening method for ultra-high-dimensional data is proposed based on the projection correlation, which measures the dependence between two random vectors.
Abstract: This article proposes a model-free and data-adaptive feature screening method for ultrahigh-dimensional data. The proposed method is based on the projection correlation, which measures the dependence between two random vectors.

18 citations


Journal ArticleDOI
TL;DR: A new permutation-assisted tuning procedure in lasso (plasso) is proposed to identify phenotype-associated SNPs in a joint multiple-SNP regression model in GWAS, and its application provides new insights into the genetic control of complex traits.
Abstract: Motivation: Large scale genome-wide association studies (GWAS) have resulted in the identification of a wide range of genetic variants related to a host of complex traits and disorders. Despite their success, the individual single-nucleotide polymorphism (SNP) analysis approach adopted in most current GWAS can be limited: it is usually too biologically simplistic to elucidate a comprehensive genetic architecture of phenotypes, and it is statistically underpowered due to the heavy multiple-testing correction burden. On the other hand, multiple-SNP analyses (e.g. gene-based or region-based SNP-set analysis) are usually more powerful for examining the joint effects of a set of SNPs on the phenotype of interest. However, current multiple-SNP approaches can only draw an overall conclusion at the SNP-set level and do not directly inform which SNPs in the SNP-set are driving the overall genotype-phenotype association. Results: In this article, we propose a new permutation-assisted tuning procedure in lasso (plasso) to identify phenotype-associated SNPs in a joint multiple-SNP regression model in GWAS. The tuning parameter of lasso determines the amount of shrinkage and is essential to the performance of variable selection. In the proposed plasso procedure, we first generate permutations as pseudo-SNPs that are not associated with the phenotype. Then, the lasso tuning parameter is delicately chosen to separate true signal SNPs and non-informative pseudo-SNPs. We illustrate plasso using simulations to demonstrate its superior performance over existing methods, and an application of plasso to a real GWAS dataset provides new insights into the genetic control of complex traits. Availability and implementation: R code implementing the proposed methodology is available at https://github.com/xyz5074/plasso. Supplementary information: Supplementary data are available at Bioinformatics online.
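A rough sketch of the permutation-assisted tuning idea described above: permuted copies of the SNP columns serve as known-null "pseudo-SNPs", and the lasso penalty is chosen with reference to when those pseudo-SNPs enter the solution path. The specific rule below (the smallest penalty at which no pseudo-SNP is selected), the toy genotype data, and the effect sizes are assumptions of this sketch rather than plasso's exact rule; the authors' R implementation is at https://github.com/xyz5074/plasso.

```python
# Permutation-assisted lasso tuning, sketched: augment the design with
# row-permuted pseudo-SNPs carrying no signal, then pick the penalty at which
# the pseudo-SNPs are still excluded from the fitted model.
import numpy as np
from sklearn.linear_model import lasso_path

rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)    # toy SNP genotypes coded 0/1/2
beta = np.zeros(p); beta[:3] = (0.8, -0.6, 0.5)         # three causal SNPs (hypothetical)
y = X @ beta + rng.standard_normal(n)

X_pseudo = X[rng.permutation(n), :]                     # row-permuted copy: null pseudo-SNPs
X_aug = np.hstack([X, X_pseudo])
X_aug = X_aug - X_aug.mean(axis=0)                      # center (lasso_path fits no intercept)
yc = y - y.mean()

alphas, coefs, _ = lasso_path(X_aug, yc)                # alphas returned in decreasing order
pseudo_in = (coefs[p:, :] != 0).any(axis=0)             # any pseudo-SNP active at this alpha?
idx = np.where(~pseudo_in)[0][-1]                       # smallest alpha with no pseudo-SNP in
selected = np.where(coefs[:p, idx] != 0)[0]
print("chosen lambda:", round(alphas[idx], 4), "| selected SNPs:", selected)
```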

17 citations


Journal ArticleDOI
TL;DR: The findings imply that although daily monitoring of drinking may motivate people to reduce the quantity consumed once they start to drink, it may also arouse their desire to start drinking; both effects tend to last only one week, as participants accommodate to the monitoring by the second week.

16 citations


Journal Article
TL;DR: A distributed screening framework for big data setup that expresses a correlation measure as a function of several component parameters, each of which can be distributively estimated using a natural U-statistic from data segments, and shows that the aggregated correlation estimator is as efficient as the classic centralized estimator in terms of the probability convergence bound.
Abstract: Feature screening is a powerful tool in the analysis of high dimensional data. When the sample size $N$ and the number of features $p$ are both large, the implementation of classic screening methods can be numerically challenging. In this paper, we propose a distributed screening framework for the big data setup. In the spirit of "divide-and-conquer", the proposed framework expresses a correlation measure as a function of several component parameters, each of which can be distributively estimated using a natural U-statistic from data segments. With the component estimates aggregated, we obtain a final correlation estimate that can be readily used for screening features. This framework enables distributed storage and parallel computing and thus is computationally attractive. Due to the unbiased distributive estimation of the component parameters, the final aggregated estimate achieves a high accuracy that is insensitive to the number of data segments $m$, whether specified by the problem itself or chosen by users. Under mild conditions, we show that the aggregated correlation estimator is as efficient as the classic centralized estimator in terms of the probability convergence bound, and the corresponding screening procedure enjoys the sure screening property for a wide range of correlation measures. The promising performance of the new method is supported by extensive numerical examples.
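A small sketch of the "divide-and-conquer" idea described above, using the Pearson correlation as the working example: the correlation is a smooth function of five moments, each of which is estimated on every data segment and then averaged across segments before the correlation is assembled and used to rank features. The choice of Pearson correlation, the number of segments, and the toy data are illustrative assumptions; the paper's framework covers a wider range of correlation measures built from U-statistics.

```python
# Distributed screening sketch: per-segment moment estimates are averaged,
# the correlation is assembled from the aggregated moments, and features are
# ranked by its absolute value.
import numpy as np

def segment_moments(X, y):
    """Per-segment sample moments (E[Xy], E[X], E[y], E[X^2], E[y^2]), one row each."""
    p = X.shape[1]
    return np.vstack([(X * y[:, None]).mean(axis=0),
                      X.mean(axis=0),
                      np.full(p, y.mean()),
                      (X ** 2).mean(axis=0),
                      np.full(p, (y ** 2).mean())])

def distributed_corr(segments):
    exy, ex, ey, ex2, ey2 = np.mean([segment_moments(X, y) for X, y in segments], axis=0)
    return (exy - ex * ey) / np.sqrt((ex2 - ex ** 2) * (ey2 - ey ** 2))

rng = np.random.default_rng(0)
N, p, m = 10_000, 500, 10
X = rng.standard_normal((N, p))
y = X[:, 0] - 0.5 * X[:, 1] + rng.standard_normal(N)
segments = [(X[i::m], y[i::m]) for i in range(m)]        # m data segments
corr = distributed_corr(segments)
top = np.argsort(-np.abs(corr))[:10]                     # keep the top-ranked features
print("top screened features:", top)
```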

14 citations


Journal ArticleDOI
TL;DR: A new quadratic decorrelated inference function approach is proposed, which simultaneously removes the impact of nuisance parameters and incorporates the correlation to enhance the efficiency of the estimation procedure.
Abstract: This paper concerns statistical inference for longitudinal data with ultrahigh dimensional covariates. We first study the problem of constructing confidence intervals and hypothesis tests for a low-dimensional parameter of interest. The major challenge is how to construct a powerful test statistic in the presence of high-dimensional nuisance parameters and sophisticated within-subject correlation of longitudinal data. To deal with the challenge, we propose a new quadratic decorrelated inference function approach which simultaneously removes the impact of nuisance parameters and incorporates the correlation to enhance the efficiency of the estimation procedure. When the parameter of interest is of fixed dimension, we prove that the proposed estimator is asymptotically normal and attains the semiparametric information bound, based on which we can construct an optimal Wald test statistic. We further extend this result and establish the limiting distribution of the estimator under the setting with the dimension of the parameter of interest growing with the sample size at a polynomial rate. Finally, we study how to control the false discovery rate (FDR) when a vector of high-dimensional regression parameters is of interest. We prove that applying the Storey (J. R. Stat. Soc. Ser. B. Stat. Methodol. 64 (2002) 479–498) procedure to the proposed test statistics for each regression parameter controls FDR asymptotically in longitudinal data. We conduct simulation studies to assess the finite sample performance of the proposed procedures. Our simulation results imply that the newly proposed procedure can control both Type I error for testing a low dimensional parameter of interest and the FDR in the multiple testing problem. We also apply the proposed procedure to a real data example.
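To fix ideas, a generic decorrelated score construction, of which the quadratic decorrelated inference function above is a refinement that additionally exploits within-subject correlation, can be written schematically as

\[
\hat S(\theta) \;=\; \frac{1}{n}\sum_{i=1}^{n}\Big\{\nabla_{\theta}\,\ell_i(\theta,\hat\gamma)\;-\;\hat w^{\top}\nabla_{\gamma}\,\ell_i(\theta,\hat\gamma)\Big\},
\qquad
w \;=\; I_{\gamma\gamma}^{-1}\,I_{\gamma\theta},
\]

where $\ell_i$ is the $i$-th (quasi-)likelihood contribution, $\theta$ is the low-dimensional parameter of interest, $\gamma$ is the high-dimensional nuisance parameter, $I$ denotes the information matrix, and $\hat w$ is a regularized (for example $\ell_1$-penalized) estimate of $w$. Subtracting the projection $\hat w^{\top}\nabla_{\gamma}\ell_i$ makes the score insensitive to first-order estimation error in $\hat\gamma$. This display is only a schematic of the decorrelation idea, not the paper's exact estimating function.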

Posted Content
TL;DR: This work considers the stochastic contextual bandit problem under the high dimensional linear model and proposes doubly growing epochs with parameter estimation by the best subset selection method, which is easy to implement in practice and achieves $\tilde{\mathcal{O}}(s\sqrt{T})$ regret with high probability.
Abstract: We consider the stochastic contextual bandit problem under the high dimensional linear model. We focus on the case where the action space is finite and random, with each action associated with a randomly generated contextual covariate. This setting finds essential applications such as personalized recommendation, online advertisement, and personalized medicine. However, it is very challenging as we need to balance exploration and exploitation. We propose doubly growing epochs and estimating the parameter using the best subset selection method, which is easy to implement in practice. This approach achieves $\tilde{\mathcal{O}}(s\sqrt{T})$ regret with high probability, which is nearly independent of the "ambient" regression model dimension $d$. We further attain a sharper $\tilde{\mathcal{O}}(\sqrt{sT})$ regret by using the SupLinUCB framework and match the minimax lower bound of low-dimensional linear stochastic bandit problems. Finally, we conduct extensive numerical experiments to demonstrate the applicability and robustness of our algorithms empirically.
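A toy sketch of the epoch-based strategy described above for a sparse linear contextual bandit: epochs double in length, at the end of each epoch the regression parameter is re-estimated by (exhaustive) best subset selection on all data observed so far, and within an epoch the arm with the highest estimated reward is pulled. The initial forced-exploration phase, epoch constants, BIC-style subset choice, and toy problem sizes are assumptions of this sketch, not the paper's exact algorithm.

```python
# Epoch-doubling bandit sketch with exhaustive best subset refits.
import numpy as np
from itertools import combinations

def best_subset(X, y, max_s):
    """Exhaustive best subset selection (by BIC); feasible only for small d."""
    n, d = X.shape
    best = (np.inf, np.zeros(d))
    for s in range(1, max_s + 1):
        for S in combinations(range(d), s):
            XS = X[:, S]
            b, *_ = np.linalg.lstsq(XS, y, rcond=None)
            rss = np.sum((y - XS @ b) ** 2)
            bic = n * np.log(rss / n) + s * np.log(n)
            if bic < best[0]:
                beta = np.zeros(d); beta[list(S)] = b
                best = (bic, beta)
    return best[1]

rng = np.random.default_rng(0)
d, K, s_max, T = 10, 20, 2, 2000
beta_true = np.zeros(d); beta_true[[1, 4]] = (1.0, -1.0)

beta_hat = np.zeros(d)
epoch_end, epoch_len = 0, 50
Xs, ys, regret = [], [], 0.0
for t in range(T):
    A = rng.standard_normal((K, d))                    # random finite action set
    a = int(np.argmax(A @ beta_hat)) if t >= 50 else rng.integers(K)
    r = A[a] @ beta_true + 0.1 * rng.standard_normal()
    regret += np.max(A @ beta_true) - A[a] @ beta_true
    Xs.append(A[a]); ys.append(r)
    if t + 1 == epoch_end + epoch_len:                 # end of epoch: refit, double length
        beta_hat = best_subset(np.array(Xs), np.array(ys), s_max)
        epoch_end, epoch_len = t + 1, 2 * epoch_len
print("cumulative regret:", round(regret, 1), "| support found:", np.nonzero(beta_hat)[0])
```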

Journal ArticleDOI
TL;DR: A refitted cross validation (RCV) method for sparse precision matrix estimation based on its Cholesky decomposition, which does not require the Gaussian assumption, can be easily implemented with existing software for ultrahigh dimensional linear regression.

Journal ArticleDOI
TL;DR: A two-step gene-detection procedure for generalized varying coefficient mixed-effects models with ultrahigh dimensional covariates finds significant single nucleotide polymorphisms impacting the mean BMI trend, some of which have already been biologically proven to be "fat genes."
Abstract: Motivated by an empirical analysis of data from a genome-wide association study on obesity, measured by the body mass index (BMI), we propose a two-step gene-detection procedure for generalized varying coefficient mixed-effects models with ultrahigh dimensional covariates. The proposed procedure selects significant single nucleotide polymorphisms (SNPs) impacting the mean BMI trend, some of which have already been biologically proven to be “fat genes.” The method also discovers SNPs that significantly influence the age-dependent variability of BMI. The proposed procedure takes into account individual variations of genetic effects and can also be directly applied to longitudinal data with continuous, binary or count responses. We employ Monte Carlo simulation studies to assess the performance of the proposed method and further carry out causal inference for the selected SNPs.

Journal ArticleDOI
TL;DR: Monitoring large-scale datastreams with limited resources has become increasingly important for real-time detection of abnormal activities in many applications.
Abstract: Monitoring large-scale datastreams with limited resources has become increasingly important for real-time detection of abnormal activities in many applications. Despite the availability of large da...

Book ChapterDOI
01 Jan 2020
TL;DR: This chapter provides a selective review of feature screening methods for ultra-high dimensional data, whose main idea is to reduce the ultra-high dimensionality of the feature space to a moderate size in a fast and efficient way while retaining all the important features in the reduced feature space.
Abstract: This chapter provides a selective review of feature screening methods for ultra-high dimensional data. The main idea of feature screening is to reduce the ultra-high dimensionality of the feature space to a moderate size in a fast and efficient way while retaining all the important features in the reduced feature space. This is referred to as the sure screening property. After feature screening, more sophisticated methods can be applied to the reduced feature space for further analysis, such as parameter estimation and statistical inference. This chapter focuses only on the feature screening stage. From the perspective of different types of data, we review feature screening methods for independent and identically distributed data, longitudinal data, and survival data. From the perspective of modeling, we review various models, including the linear model, generalized linear model, additive model, varying-coefficient model, Cox model, etc. We also cover some model-free feature screening procedures.
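A minimal illustration of the screening idea summarized above, in the style of sure independence screening for the linear model: rank features by their absolute marginal correlation with the response and retain the top d = floor(n / log n) of them before any refined analysis. The Pearson-correlation utility and this particular cutoff are one common choice among the many procedures reviewed in the chapter, and the data below are simulated for illustration.

```python
# Marginal correlation screening sketch: keep the top n/log(n) features.
import numpy as np

def sis_screen(X, y):
    n = len(y)
    Xc = (X - X.mean(0)) / X.std(0)
    yc = (y - y.mean()) / y.std()
    utility = np.abs(Xc.T @ yc) / n              # absolute marginal correlations
    d = int(n / np.log(n))                       # retained model size
    return np.argsort(-utility)[:d]

rng = np.random.default_rng(0)
n, p = 200, 5000
X = rng.standard_normal((n, p))
y = 2 * X[:, 0] - 1.5 * X[:, 7] + rng.standard_normal(n)
kept = sis_screen(X, y)
print("retained", len(kept), "features; contains true ones:", {0, 7} <= set(kept))
```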

Posted Content
TL;DR: In this paper, the authors proposed a data-driven testing procedure for controlling the false discovery rate (FDR) in large-scale multiple testing problems, which achieves exact FDR control in finite sample settings when the populations are symmetric.
Abstract: This paper is concerned with false discovery rate (FDR) control in large-scale multiple testing problems. We first propose a new data-driven testing procedure for controlling the FDR in large-scale t-tests for the one-sample mean problem. The proposed procedure achieves exact FDR control in finite sample settings when the populations are symmetric, regardless of the number of tests or the sample sizes. Compared with the existing bootstrap method for FDR control, the proposed procedure is computationally efficient. We show that the proposed method can control the FDR asymptotically for asymmetric populations even when the test statistics are not independent. We further show that the proposed procedure with a simple correction is as accurate as the bootstrap method up to the second order, and can be much more effective than the existing normal calibration. We extend the proposed procedure to the two-sample mean problem. Empirical results show that the proposed procedures have better FDR control than existing ones when the proportion of true alternative hypotheses is not too low, while maintaining reasonably good detection ability.
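An illustration of the symmetry principle underlying procedures of this kind: under the null the t-statistics are (approximately) symmetric about zero, so the count of statistics below −t estimates the number of false discoveries among those above t. The specific threshold rule, the one-sided rejection region, and the simulated data below are generic assumptions given for intuition only, not the paper's proposed calibration or its second-order correction.

```python
# Generic symmetry-based FDR thresholding sketch for large-scale one-sample t-tests.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
m, n = 2000, 30
mu = np.zeros(m); mu[:100] = 0.8                        # 100 true signals (hypothetical)
Xdat = rng.standard_normal((m, n)) + mu[:, None]
T = stats.ttest_1samp(Xdat, 0.0, axis=1).statistic      # one t-statistic per hypothesis

alpha = 0.1
grid = np.sort(np.abs(T))
# estimated FDP at threshold t: left-tail exceedances over right-tail rejections
fdp_hat = np.array([(np.sum(T <= -t) + 1) / max(np.sum(T >= t), 1) for t in grid])
ok = grid[fdp_hat <= alpha]
t_star = ok.min() if ok.size else np.inf                # smallest threshold meeting alpha
rejected = np.where(T >= t_star)[0]
print("threshold:", round(float(t_star), 3), "| rejections:", rejected.size)
```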

Journal ArticleDOI
TL;DR: Ridge regression was originally introduced by Hoerl and Kennard (1970) to deal with the collinearity issue in linear regression in the presence of highly correlated covariates; it solves an l2-penalized least squares problem.
Abstract: Ridge regression was originally introduced by Hoerl and Kennard (1970) to deal with the collinearity issue in linear regression in the presence of highly correlated covariates. It solves an l2-penalized least squares problem.
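For reference, the estimator discussed above solves the l2-penalized least squares problem

\[
\hat{\beta}(\lambda) \;=\; \arg\min_{\beta}\ \|y - X\beta\|_2^2 + \lambda\|\beta\|_2^2 \;=\; \bigl(X^{\top}X + \lambda I_p\bigr)^{-1}X^{\top}y, \qquad \lambda > 0,
\]

a closed form that remains well defined even when $X^{\top}X$ is singular or ill conditioned, which is precisely what makes ridge regression attractive under collinearity.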

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a new strategy of adding two artificial data points to the observed data to deal with these two challenges, and established the asymptotic normality of the proposed empirical likelihood ratio test.
Abstract: This paper is concerned with empirical likelihood inference on the population mean when the dimension $p$ and the sample size $n$ satisfy $p/n\rightarrow c\in [1,\infty)$. As shown in Tsao (2004), the empirical likelihood method fails with high probability when $p/n>1/2$ because the convex hull of the $n$ observations in $\mathbb{R}^p$ becomes too small to cover the true mean value. Moreover, when $p> n$, the sample covariance matrix becomes singular, and this results in the breakdown of the first sandwich approximation for the log empirical likelihood ratio. To deal with these two challenges, we propose a new strategy of adding two artificial data points to the observed data. We establish the asymptotic normality of the proposed empirical likelihood ratio test. The proposed test statistic does not involve the inverse of the sample covariance matrix. Furthermore, its form is explicit, so the test can easily be carried out with low computational cost. Our numerical comparison shows that the proposed test outperforms some existing tests for high-dimensional mean vectors in terms of power. We also illustrate the proposed procedure with an empirical analysis of stock data.
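For context, the ordinary empirical likelihood ratio for a candidate mean value $\mu$, whose convex-hull and singularity problems are described above, is

\[
R(\mu) \;=\; \max\Big\{\prod_{i=1}^{n} n w_i \;:\; w_i \ge 0,\ \sum_{i=1}^{n} w_i = 1,\ \sum_{i=1}^{n} w_i X_i = \mu\Big\},
\]

and the maximization is infeasible whenever $\mu$ lies outside the convex hull of $X_1,\dots,X_n$. The two artificial data points proposed in the paper are added precisely to remove this failure mode and to avoid inverting the sample covariance matrix.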

Journal ArticleDOI
TL;DR: The proposed screening procedure is based on joint quasi-likelihood of all predictors, and therefore is distinguished from marginal screening procedures proposed in the literature, and can effectively identify active predictors that are jointly dependent but marginally independent of the response.
Abstract: Generalized varying coefficient models are particularly useful for examining dynamic effects of covariates on a continuous, binary or count response. This paper is concerned with feature screening for generalized varying coefficient models with ultrahigh dimensional covariates. The proposed screening procedure is based on joint quasi-likelihood of all predictors, and therefore is distinguished from marginal screening procedures proposed in the literature. In particular, the proposed procedure can effectively identify active predictors that are jointly dependent but marginally independent of the response. In order to carry out the proposed procedure, we propose an effective algorithm and establish the ascent property of the proposed algorithm. We further prove that the proposed procedure possesses the sure screening property. That is, with probability tending to one, the selected variable set includes the actual active predictors. We examine the finite sample performance of the proposed procedure and compare it with existing ones via Monte Carlo simulations, and illustrate the proposed procedure by a real data example.


Journal ArticleDOI
TL;DR: The findings indicate a sequential risk gradient in the influence of maladaptive peer behavior on externalizing behavior depending on the number of G alleles during childhood through adulthood.
Abstract: Engagement in externalizing behavior is problematic. Deviant peer affiliation increases risk for externalizing behavior. Yet, peer effects vary across individuals and may differ across genes. This study determines gene × environment × development interactions as they apply to externalizing behavior from childhood to adulthood. A sample (n = 687; 68% male, 90% White) of youth from the Michigan Longitudinal Study was assessed from ages 10 to 25. Interactions between γ-amino butyric acid type A receptor γ1 subunit (GABRG1; rs7683876, rs13120165) and maladaptive peer behavior on externalizing behavior were examined using time-varying effect modeling. The findings indicate a sequential risk gradient in the influence of maladaptive peer behavior on externalizing behavior depending on the number of G alleles during childhood through adulthood. Individuals with the GG genotype are most vulnerable to maladaptive peer influences, which results in greater externalizing behavior during late childhood through early adulthood.

Posted Content
TL;DR: A robust, consistent, and data-driven model selection criterion based upon the empirical likelihood function is proposed; it avoids potential computational convergence issues and allows versatile applications, such as generalized linear models, generalized estimating equations, penalized regressions, and so on.
Abstract: Conventional likelihood-based information criteria for model selection rely on the distribution assumption of the data. However, for complex data that are increasingly available in many scientific fields, the specification of their underlying distribution turns out to be challenging, and the existing criteria may be limited and not general enough to handle a variety of model selection problems. Here, we propose a robust and consistent model selection criterion based upon the empirical likelihood function, which is data-driven. In particular, this framework adopts plug-in estimators that can be obtained by solving external estimating equations, not limited to the empirical likelihood, which avoids potential computational convergence issues and allows versatile applications, such as generalized linear models, generalized estimating equations, penalized regressions, and so on. The formulation of our proposed criterion is initially derived from the asymptotic expansion of the marginal likelihood under a variable selection framework, but, more importantly, the consistent model selection property is established under a general context. Extensive simulation studies confirm the outperformance of the proposal compared to traditional model selection criteria. Finally, an application to the Atherosclerosis Risk in Communities Study illustrates the practical value of the proposed framework.

Posted Content
TL;DR: In this paper, a new corrected decorrelated score test and a corresponding one-step estimator were proposed for a high-dimensional linear model with a finite number of covariates measured with error.
Abstract: For a high-dimensional linear model with a finite number of covariates measured with error, we study statistical inference on the parameters associated with the error-prone covariates, and propose a new corrected decorrelated score test and the corresponding one-step estimator. We further establish asymptotic properties of the newly proposed test statistic and the one-step estimator. Under local alternatives, we show that the limiting distribution of our corrected decorrelated score test statistic is non-central normal. The finite-sample performance of the proposed inference procedure is examined through simulation studies. We further illustrate the proposed procedure via an empirical analysis of a real data example.

Book ChapterDOI
01 Jul 2020
TL;DR: A projection test is proposed based on a new estimate of the optimal projection direction $\varSigma^{-1}\mu$, which is estimated by regularized quadratic programming with a nonconvex penalty and a linear constraint.
Abstract: Testing whether the mean vector from some population is zero or not is a fundamental problem in statistics. In the high-dimensional regime, where the dimension of the data $p$ is greater than the sample size $n$, traditional methods such as Hotelling's $T^2$ test cannot be directly applied. One can project the high-dimensional vector onto a space of low dimension and then traditional methods can be applied. In this paper, we propose a projection test based on a new estimate of the optimal projection direction $\varSigma^{-1}\mu$. Under the assumption that the optimal projection $\varSigma^{-1}\mu$ is sparse, we use regularized quadratic programming with a nonconvex penalty and a linear constraint to estimate it. Simulation studies and real data analysis are conducted to examine the finite sample performance of different tests in terms of type I error and power.
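A small sketch of the projection-test mechanics: estimate a direction approximating $\varSigma^{-1}\mu$ from one part of the sample, project the held-out part onto it, and apply an ordinary one-sample t-test to the resulting scalars. The ridge-regularized direction estimate, the 50/50 split, and the simulated sparse mean shift below are placeholders for the paper's nonconvex-penalized quadratic program; they illustrate the idea, not the proposed estimator.

```python
# Projection test sketch: data splitting + estimated direction + 1-D t-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p = 80, 200
mu = np.zeros(p); mu[:5] = 0.4                             # sparse mean shift (hypothetical)
X = rng.standard_normal((n, p)) + mu

n1 = n // 2
X1, X2 = X[:n1], X[n1:]                                    # split: estimate direction / test
S = np.cov(X1, rowvar=False)
xbar1 = X1.mean(axis=0)
direction = np.linalg.solve(S + 0.5 * np.eye(p), xbar1)    # ridge-regularized Sigma^{-1} mu

z = X2 @ direction                                         # project held-out observations
t_res = stats.ttest_1samp(z, 0.0)
print(f"projection t-test: t = {t_res.statistic:.2f}, p-value = {t_res.pvalue:.4f}")
```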

Journal ArticleDOI
TL;DR: In this rejoinder, the authors thank the editors, Professors Regina Liu and Hongyu Zhao, for featuring this article and organizing stimulating discussions, and express gratitude for the feedback on their work from the three reviewers.
Abstract: We heartily thank the editors, Professors Regina Liu and Hongyu Zhao, for featuring this article and organizing stimulating discussions. We are grateful for the feedback on our work from the three reviewers.