Showing papers by "Fang Han" published in 2018


Journal ArticleDOI
TL;DR: In this paper, a robust alternative to principal component analysis (PCA), called elliptical component analysis, is proposed for analyzing high-dimensional, elliptically distributed data, where a multivariate rank statistic is exploited to cope with heavy-tailed elliptical distributions.
Abstract: We present a robust alternative to principal component analysis (PCA), called elliptical component analysis (ECA), for analyzing high-dimensional, elliptically distributed data. ECA estimates the eigenspace of the covariance matrix of the elliptical data. To cope with heavy-tailed elliptical distributions, a multivariate rank statistic is exploited. At the model level, we consider two settings: the leading eigenvectors of the covariance matrix are either nonsparse or sparse. Methodologically, we propose ECA procedures for both nonsparse and sparse settings. Theoretically, we provide both nonasymptotic and asymptotic analyses quantifying the theoretical performance of ECA. In the nonsparse setting, we show that ECA’s performance is highly related to the effective rank of the covariance matrix. In the sparse setting, the results are twofold: (i) we show that the sparse ECA estimator based on a combinatoric program attains the optimal rate of convergence; (ii) based on some recent d...

44 citations


Journal ArticleDOI
TL;DR: In this article, a novel exponential inequality for U-statistics under the time series setting is proved. Explicit mixing conditions are given for guaranteeing fast convergence, the bound is shown to be analogous to the one under independence, and the extension to non-stationary time series is straightforward.
Abstract: The family of U-statistics plays a fundamental role in statistics. This paper proves a novel exponential inequality for U-statistics under the time series setting. Explicit mixing conditions are given for guaranteeing fast convergence, the bound proves to be analogous to the one under independence, and extension to non-stationary time series is straightforward. The proof relies on a novel decomposition of U-statistics via exploiting the temporal correlatedness structure. Such results are of interest in many fields where high-dimensional time series data are present. In particular, applications to high-dimensional time series inference are discussed.

18 citations
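
For readers unfamiliar with the object studied in the entry above, the following is a minimal sketch of an order-2 U-statistic: the average of a symmetric kernel over all unordered pairs of observations. The function names `u_statistic` and `variance_kernel` and the toy sample are illustrative assumptions, not taken from the paper, and the sketch shows only how the statistic is computed. It does not illustrate the exponential inequality or the mixing conditions.

```python
from itertools import combinations
import statistics


def u_statistic(data, kernel):
    """Order-2 U-statistic: mean of a symmetric kernel over all unordered pairs."""
    pairs = list(combinations(data, 2))
    return sum(kernel(a, b) for a, b in pairs) / len(pairs)


def variance_kernel(a, b):
    """Kernel h(a, b) = (a - b)^2 / 2, whose U-statistic is the unbiased sample variance."""
    return 0.5 * (a - b) ** 2


# Toy sample (hypothetical values chosen only for illustration).
sample = [2.0, 4.0, 4.0, 5.0, 7.0]
u_var = u_statistic(sample, variance_kernel)
unbiased_var = statistics.variance(sample)
```

With this kernel, `u_var` agrees with `statistics.variance(sample)`, which is the standard example of a familiar estimator written as a U-statistic.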


Posted Content
14 Dec 2018
TL;DR: The proposed tests, built on maxima of pairwise rank correlations, are distribution-free, implementable without the need for permutation, and shown to be rate-optimal against sparse alternatives under the Gaussian copula model. The study also reveals an identity between three rank correlation statistics: Hoeffding's $D$, Blum-Kiefer-Rosenblatt's $R$, and Bergsma-Dassios-Yanagimoto's $\tau^*$.
Abstract: Testing mutual independence for high dimensional observations is a fundamental statistical challenge. Popular tests based on linear and simple rank correlations are known to be incapable of detecting non-linear, non-monotone relationships, calling for methods that can account for such dependences. To address this challenge, we propose a family of tests that are constructed using maxima of pairwise rank correlations that permit consistent assessment of pairwise independence. Built upon a newly developed Cramér-type moderate deviation theorem for degenerate U-statistics, our results cover a variety of rank correlations including Hoeffding's $D$, Blum-Kiefer-Rosenblatt's $R$, and Bergsma-Dassios-Yanagimoto's $\tau^*$. The proposed tests are distribution-free, implementable without the need for permutation, and are shown to be rate-optimal against sparse alternatives under the Gaussian copula model. As a by-product of the study, we reveal an identity between the aforementioned three rank correlation statistics, and hence make a step towards proving a conjecture of Bergsma and Dassios.

10 citations
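
The shape of a maximum-type statistic over pairwise rank correlations, as described in the entry above, can be sketched as follows. This is only an illustration under stated assumptions: Kendall's tau is swapped in as a simple stand-in for Hoeffding's $D$, Blum-Kiefer-Rosenblatt's $R$, and Bergsma-Dassios-Yanagimoto's $\tau^*$, so unlike those statistics it cannot detect non-monotone dependence. No critical value or rescaling from the paper is reproduced, and the names `kendall_tau` and `max_pairwise_rank_correlation` are hypothetical.

```python
import numpy as np


def kendall_tau(x, y):
    """Kendall's tau-a: average concordance sign over all pairs of observations."""
    n = len(x)
    sign_x = np.sign(x[:, None] - x[None, :])
    sign_y = np.sign(y[:, None] - y[None, :])
    # Diagonal entries are zero, so the full sum counts each unordered pair
    # twice; n * (n - 1) is the matching number of ordered pairs.
    return float((sign_x * sign_y).sum() / (n * (n - 1)))


def max_pairwise_rank_correlation(data):
    """Largest absolute pairwise rank correlation over all pairs of columns."""
    n_cols = data.shape[1]
    return max(
        abs(kendall_tau(data[:, j], data[:, k]))
        for j in range(n_cols)
        for k in range(j + 1, n_cols)
    )


# Simulated independent columns (illustrative data, not from the paper).
rng = np.random.default_rng(0)
observations = rng.standard_normal((40, 5))
statistic = max_pairwise_rank_correlation(observations)

# A strictly increasing transform of the data leaves all ranks unchanged.
transformed_statistic = max_pairwise_rank_correlation(np.exp(observations))
```

Because the statistic depends on the data only through ranks, `statistic` and `transformed_statistic` coincide, which is the mechanism behind the distribution-free property for continuous margins.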


Journal ArticleDOI
TL;DR: In this paper, the authors show that two long-standing problems in random matrix theory can be solved: (i) simple bootstrap inference on sample eigenvalues when true eigenvalues are tied; and (ii) conducting a two-sample Roy's covariance test in high dimensions.
Abstract: Recently, Chernozhukov, Chetverikov, and Kato (Ann. Statist. 42 (2014) 1564–1597) developed a new Gaussian comparison inequality for approximating the suprema of empirical processes. This paper exploits this technique to devise sharp inference on spectra of large random matrices. In particular, we show that two long-standing problems in random matrix theory can be solved: (i) simple bootstrap inference on sample eigenvalues when true eigenvalues are tied; (ii) conducting two-sample Roy’s covariance test in high dimensions. To establish the asymptotic results, a generalized $\varepsilon$-net argument regarding the matrix rescaled spectral norm and several new empirical process bounds are developed and of independent interest.

9 citations


Posted Content
TL;DR: The goal of this paper is to obtain expectation bounds for the deviation of large sample autocovariance matrices from their means under weak data dependence and establish deviation bounds that depend only on the parameters controlling the "intrinsic dimension" of the data up to some logarithmic terms.
Abstract: The goal of this paper is to obtain expectation bounds for the deviation of large sample autocovariance matrices from their means under weak data dependence. While the accuracy of covariance matrix estimation corresponding to independent data has been well understood, much less is known in the case of dependent data. We make a step towards filling this gap, and establish deviation bounds that depend only on the parameters controlling the "intrinsic dimension" of the data up to some logarithmic terms. Our results have immediate impacts on high dimensional time series analysis, and we apply them to high dimensional linear VAR($d$) model, vector-valued ARCH model, and a model used in Banna et al. (2016).

5 citations


Posted Content
TL;DR: In this article, the authors propose a family of tests of mutual independence for high-dimensional observations. The tests are constructed using maxima of pairwise rank correlations that permit consistent assessment of pairwise independence, and are built upon a newly developed Cramér-type moderate deviation theorem for degenerate U-statistics.
Abstract: Testing mutual independence for high-dimensional observations is a fundamental statistical challenge. Popular tests based on linear and simple rank correlations are known to be incapable of detecting non-linear, non-monotone relationships, calling for methods that can account for such dependences. To address this challenge, we propose a family of tests that are constructed using maxima of pairwise rank correlations that permit consistent assessment of pairwise independence. Built upon a newly developed Cramér-type moderate deviation theorem for degenerate U-statistics, our results cover a variety of rank correlations including Hoeffding's $D$, Blum-Kiefer-Rosenblatt's $R$, and Bergsma-Dassios-Yanagimoto's $\tau^*$. The proposed tests are distribution-free in the class of multivariate distributions with continuous margins, implementable without the need for permutation, and are shown to be rate-optimal against sparse alternatives under the Gaussian copula model. As a by-product of the study, we reveal an identity between the aforementioned three rank correlation statistics, and hence make a step towards proving a conjecture of Bergsma and Dassios.

4 citations


Posted Content
TL;DR: In this article, the validity of uncertainty assessment for weighted U-statistics is investigated, motivated by a correlation measure used to evaluate online ranking algorithms. The theory allows both kernels and weights to be asymmetric and data points to be non-identically distributed.
Abstract: Motivated by challenges in studying a new correlation measure that is becoming popular for evaluating the performance of online ranking algorithms, this manuscript explores the validity of uncertainty assessment for weighted U-statistics. Without relying on any commonly adopted assumption, we verify the inferential validity of Efron's bootstrap and of a new resampling procedure. Specifically, in its full generality, our theory allows both kernels and weights to be asymmetric and data points to be non-identically distributed, issues that have not previously been addressed. To achieve this generality, for example, we have to carefully control the order of the "degenerate" term in U-statistics, which is no longer degenerate under the empirical measure for non-i.i.d. data. Our result applies to the motivating task, giving the region in which solid statistical inference can be made.

3 citations