Journal ArticleDOI

A consistent multivariate test of association based on ranks of distances

01 Jun 2013-Biometrika (Oxford University Press)-Vol. 100, Iss: 2, pp 503-510
TL;DR: The problem of detecting associations between random vectors of any dimension is considered, and a powerful test is proposed that is applicable in all dimensions and consistent against all dependent alternatives. The test has a simple form, is easy to implement, and has good power.
Abstract: SUMMARY We consider the problem of detecting associations between random vectors of any dimension. Few tests of independence exist that are consistent against all dependent alternatives. We propose a powerful test that is applicable in all dimensions and consistent against all alternatives. The test has a simple form, is easy to implement, and has good power.
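The test the abstract describes is rank/distance based: for every ordered pair of points (i, j), the remaining points are cross-classified by whether they fall inside the x-ball and the y-ball of radius d(x_i, x_j) and d(y_i, y_j) around point i, and the resulting 2×2 chi-squared statistics are summed. A minimal NumPy sketch of that idea (the function name `hhg_statistic`, the Euclidean distance choice, and the naive double loop are ours; the authors provide an optimized implementation in their R package):

```python
import numpy as np
from scipy.spatial.distance import cdist

def hhg_statistic(x, y):
    """Sum-of-2x2-chi-squared dependence statistic (illustrative sketch).

    x: (n, p) array, y: (n, q) array. For each ordered pair (i, j), the
    remaining n - 2 points are cross-classified by whether they lie within
    distance d(x_i, x_j) of x_i and within d(y_i, y_j) of y_i; each 2x2
    table contributes a Pearson chi-squared term.
    """
    n = x.shape[0]
    dx = cdist(x, x)
    dy = cdist(y, y)
    stat = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            mask = np.ones(n, dtype=bool)
            mask[[i, j]] = False
            a = dx[i, mask] <= dx[i, j]   # inside the x-ball around point i
            b = dy[i, mask] <= dy[i, j]   # inside the y-ball around point i
            a11 = np.sum(a & b)
            a12 = np.sum(a & ~b)
            a21 = np.sum(~a & b)
            a22 = np.sum(~a & ~b)
            denom = (a11 + a12) * (a21 + a22) * (a11 + a21) * (a12 + a22)
            if denom > 0:  # skip degenerate tables
                stat += (n - 2) * (a12 * a21 - a11 * a22) ** 2 / denom
    return stat
```

The null distribution is obtained by permuting the pairing of the x- and y-samples, exactly as with other distance-based tests; larger values of the statistic indicate dependence.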


Citations
Journal ArticleDOI
TL;DR: It is argued that equitability is properly formalized by a self-consistency condition closely related to Data Processing Inequality, and shown that estimating mutual information provides a natural and practical method for equitably quantifying associations in large datasets.
Abstract: How should one quantify the strength of association between two random variables without bias for relationships of a specific form? Despite its conceptual simplicity, this notion of statistical "equitability" has yet to receive a definitive mathematical formalization. Here we argue that equitability is properly formalized by a self-consistency condition closely related to Data Processing Inequality. Mutual information, a fundamental quantity in information theory, is shown to satisfy this equitability criterion. These findings are at odds with the recent work of Reshef et al. [Reshef DN, et al. (2011) Science 334(6062):1518-1524], which proposed an alternative definition of equitability and introduced a new statistic, the "maximal information coefficient" (MIC), said to satisfy equitability in contradistinction to mutual information. These conclusions, however, were supported only with limited simulation evidence, not with mathematical arguments. Upon revisiting these claims, we prove that the mathematical definition of equitability proposed by Reshef et al. cannot be satisfied by any (nontrivial) dependence measure. We also identify artifacts in the reported simulation evidence. When these artifacts are removed, estimates of mutual information are found to be more equitable than estimates of MIC. Mutual information is also observed to have consistently higher statistical power than MIC. We conclude that estimating mutual information provides a natural (and often practical) way to equitably quantify statistical associations in large datasets.
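Mutual information, the measure this abstract advocates, can be estimated in many ways; the simplest is the plug-in (histogram) estimate over a binned joint distribution. A minimal sketch of that estimator, where the function name `plugin_mutual_info` and the bin count are our own illustrative choices (plug-in estimates are biased upward for small samples, one of the practical issues the paper discusses):

```python
import numpy as np

def plugin_mutual_info(x, y, bins=16):
    """Plug-in (histogram) estimate of mutual information in nats (sketch).

    Bins both variables, forms the empirical joint distribution, and
    evaluates I(X;Y) = sum_{x,y} p(x,y) log[ p(x,y) / (p(x) p(y)) ].
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of X, shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of Y, shape (1, bins)
    nz = p_xy > 0                           # 0 * log 0 contributes nothing
    return float(np.sum(p_xy[nz] * np.log(p_xy[nz] / (p_x @ p_y)[nz])))
```

For independent samples the estimate sits near zero (up to a small positive bias of roughly (bins-1)²/(2n) nats), while any dependence, of whatever functional form, pushes it up; this form-agnostic behavior is what the equitability debate is about.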

524 citations

Proceedings Article
11 May 2015
TL;DR: This work introduces a new estimator that is robust to local non-uniformity, works well with limited data, and is able to capture relationship strengths over many orders of magnitude.
Abstract: We demonstrate that a popular class of nonparametric mutual information (MI) estimators based on k-nearest-neighbor graphs requires number of samples that scales exponentially with the true MI. Consequently, accurate estimation of MI between two strongly dependent variables is possible only for prohibitively large sample size. This important yet overlooked shortcoming of the existing estimators is due to their implicit reliance on local uniformity of the underlying joint distribution. We introduce a new estimator that is robust to local non-uniformity, works well with limited data, and is able to capture relationship strengths over many orders of magnitude. We demonstrate the superior performance of the proposed estimator on both synthetic and real-world data.
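The "popular class of nonparametric MI estimators based on k-nearest-neighbor graphs" that this abstract critiques is the Kraskov–Stögbauer–Grassberger (KSG) family. A compact sketch of the type-1 KSG estimator, assuming SciPy's `cKDTree` (the function name `ksg_mi` and the tolerance trick for strict distance counting are ours):

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def ksg_mi(x, y, k=3):
    """KSG (type 1) k-nearest-neighbor MI estimate in nats (sketch)."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n = len(x)
    joint = np.hstack([x, y])
    # eps_i: max-norm distance from point i to its k-th neighbor in the
    # joint space (the first query hit is the point itself).
    d, _ = cKDTree(joint).query(joint, k=k + 1, p=np.inf)
    eps = d[:, -1]
    tree_x, tree_y = cKDTree(x), cKDTree(y)
    # n_x(i), n_y(i): neighbors strictly within eps_i in each marginal.
    nx = np.array([len(tree_x.query_ball_point(x[i], eps[i] - 1e-12, p=np.inf)) - 1
                   for i in range(n)])
    ny = np.array([len(tree_y.query_ball_point(y[i], eps[i] - 1e-12, p=np.inf)) - 1
                   for i in range(n)])
    return digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))
```

On weakly to moderately dependent data the estimator is accurate with modest samples; the paper's point is that as the true MI grows (strong dependence), the required sample size grows exponentially, because the estimator implicitly assumes local uniformity of the joint density.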

124 citations


Cites background from "A consistent multivariate test of a..."

  • ...While several problems (Simon and Tibshirani, 2014; Gorfine et al.) and alternatives (Heller et al., 2013; Székely et al., 2009) were pointed out, Kinney and Atwal (KA) were the first to point out that MIC’s apparent superiority to MI was actually due to flaws in estimation (Kinney and Atwal, ...)...


  • ...) and alternatives (Heller et al., 2013; Székely et al., 2009) were pointed out, Kinney and Atwal (KA) showed that MIC’s apparent superiority to MI was actually due to flaws in estimation (Kinney and Atwal, 2014)....


Journal ArticleDOI
TL;DR: This work seeks to summarize the main methods used to identify dependency between random variables, especially gene expression data, and also to evaluate the strengths and limitations of each method.
Abstract: One major task in molecular biology is to understand the dependency among genes to model gene regulatory networks. Pearson's correlation is the most common method used to measure dependence between gene expression signals, but it works well only when data are linearly associated. For other types of association, such as non-linear or non-functional relationships, methods based on the concepts of rank correlation and information theory-based measures are more adequate than the Pearson's correlation, but are less used in applications, most probably because of a lack of clear guidelines for their use. This work seeks to summarize the main methods (Pearson's, Spearman's and Kendall's correlations; distance correlation; Hoeffding's D: measure; Heller-Heller-Gorfine measure; mutual information and maximal information coefficient) used to identify dependency between random variables, especially gene expression data, and also to evaluate the strengths and limitations of each method. Systematic Monte Carlo simulation analyses ranging from sample size, local dependence and linear/non-linear and also non-functional relationships are shown. Moreover, comparisons in actual gene expression data are carried out. Finally, we provide a suggestive list of methods that can be used for each type of data set.
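The survey's central point, that Pearson's correlation captures only linear association, rank correlations capture monotone association, and neither detects non-monotone dependence, is easy to demonstrate. A small sketch using SciPy (the random seed and the two test relationships are our illustrative choices):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(0)
x = rng.normal(size=2000)

y_mono = np.exp(x)   # nonlinear but strictly monotone in x
y_para = x ** 2      # non-monotone (non-functional in the survey's sense)

# Spearman is exactly 1 for a strictly monotone map, while Pearson
# understates the association because it measures only linearity.
print(pearsonr(x, y_mono)[0], spearmanr(x, y_mono)[0])

# Both linear and rank correlation sit near zero for the parabola,
# even though y is a deterministic function of x.
print(pearsonr(x, y_para)[0], spearmanr(x, y_para)[0])
```

Detecting the second kind of relationship is what motivates the distance-correlation, HHG, and information-theoretic measures the survey compares.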

112 citations


Cites background or methods from "A consistent multivariate test of a..."

  • ...Heller, Heller and Gorfine [11] propose a test of independence based on the distances among values of X and Y, i.e. d(xi, xj) and d(yi, yj) for i, j ∈ {1, ..., n}, respectively....


  • ...To estimate the P-value under H0, a permutation test [9,11] can be used to test if dCor = 0 (which occurs if and only if dCov = 0)....


  • ...Heller, Heller and Gorfine measure Heller, Heller and Gorfine [11] propose a test of independence based on the distances among values of X and Y, i....


  • ...measure [11], mutual information (MI) [12] and...


  • ...Methods that are applicable in multivariate scenarios are distance correlation and HHG [9,11]....

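The permutation test mentioned in the excerpts above works the same way for any dependence statistic, not just dCor: under the null, the pairing of x- and y-samples is exchangeable, so the null distribution is approximated by recomputing the statistic on randomly re-paired data. A generic sketch (the helper name `perm_pvalue` is ours):

```python
import numpy as np

def perm_pvalue(stat, x, y, n_perm=999, seed=0):
    """Permutation p-value for an arbitrary dependence statistic (sketch).

    stat: callable taking (x, y) and returning a scalar where larger
    values indicate stronger dependence.
    """
    rng = np.random.default_rng(seed)
    observed = stat(x, y)
    null = [stat(x, y[rng.permutation(len(y))]) for _ in range(n_perm)]
    # The add-one correction keeps the p-value valid (never exactly zero).
    return (1 + sum(s >= observed for s in null)) / (n_perm + 1)
```

Plugging in the absolute Pearson correlation, distance correlation, or the HHG statistic gives the corresponding exact (conditional on the data) test.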

Posted Content
TL;DR: In this article, it is shown that nonparametric mutual information (MI) estimators based on k-nearest-neighbor graphs require a sample size that scales exponentially with the true MI, and a new estimator is proposed that is robust to local non-uniformity, works well with limited data, and captures relationship strengths over many orders of magnitude.
Abstract: We demonstrate that a popular class of nonparametric mutual information (MI) estimators based on k-nearest-neighbor graphs requires number of samples that scales exponentially with the true MI. Consequently, accurate estimation of MI between two strongly dependent variables is possible only for prohibitively large sample size. This important yet overlooked shortcoming of the existing estimators is due to their implicit reliance on local uniformity of the underlying joint distribution. We introduce a new estimator that is robust to local non-uniformity, works well with limited data, and is able to capture relationship strengths over many orders of magnitude. We demonstrate the superior performance of the proposed estimator on both synthetic and real-world data.

107 citations

References
Journal Article
TL;DR: Copyright (©) 1999–2012 R Foundation for Statistical Computing; permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and permission notice are preserved on all copies.
Abstract: Copyright (©) 1999–2012 R Foundation for Statistical Computing. Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies. Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the R Core Team.

272,030 citations

Journal ArticleDOI
TL;DR: The basic theory of the analysis of variance is examined by considering several different mathematical models, including fixed-effects models with independent observations of equal variance as well as other models.
Abstract: Originally published in 1959, this classic volume has had a major impact on generations of statisticians. Newly issued in the Wiley Classics Series, the book examines the basic theory of analysis of variance by considering several different mathematical models. Part I looks at the theory of fixed-effects models with independent observations of equal variance, while Part II begins to explore the analysis of variance in the case of other models.

5,728 citations

Journal ArticleDOI
TL;DR: Distance correlation is a new measure of dependence between random vectors that is based on certain Euclidean distances between sample elements rather than sample moments, yet has a compact representation analogous to the classical covariance and correlation.
Abstract: Distance correlation is a new measure of dependence between random vectors. Distance covariance and distance correlation are analogous to product-moment covariance and correlation, but unlike the classical definition of correlation, distance correlation is zero only if the random vectors are independent. The empirical distance dependence measures are based on certain Euclidean distances between sample elements rather than sample moments, yet have a compact representation analogous to the classical covariance and correlation. Asymptotic properties and applications in testing independence are discussed. Implementation of the test and Monte Carlo results are also presented.
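The "compact representation analogous to the classical covariance and correlation" that the abstract mentions makes the sample version short to write down: double-centre each pairwise distance matrix and correlate them. A minimal NumPy sketch (the function name `distance_correlation` is ours, and this is the plain V-statistic form rather than the bias-corrected variant):

```python
import numpy as np
from scipy.spatial.distance import cdist

def distance_correlation(x, y):
    """Sample distance correlation (sketch of the V-statistic form).

    Zero in the population if and only if the random vectors are
    independent, unlike classical correlation.
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)

    def centred(d):
        # a_ij - row mean_i - column mean_j + grand mean
        return d - d.mean(axis=0) - d.mean(axis=1, keepdims=True) + d.mean()

    A = centred(cdist(x, x))
    B = centred(cdist(y, y))
    dcov2 = (A * B).mean()            # squared sample distance covariance
    dvar_x = (A * A).mean()
    dvar_y = (B * B).mean()
    return float(np.sqrt(dcov2 / np.sqrt(dvar_x * dvar_y)))
```

Because the centred distance matrices encode the full geometry of each sample, non-monotone dependence such as y = x² produces a clearly positive value even when Pearson correlation is near zero; significance is then assessed by a permutation test, as noted in the excerpts above.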

2,042 citations


"A consistent multivariate test of a..." refers background or methods or result in this paper

  • ...Moreover, our aim is to investigate the performance of our test for nonmonotone relationships, and these classical tests, or related tests for higher dimensions found in Taskinen et al. (2005), are ineffective for testing non-monotone types of dependence (Szekely et al., 2007)....


  • ...In the following two examples from Szekely et al. (2007), none of the likelihood ratio type of tests considered performed well....


  • ...Szekely et al. (2007) considered multivariate examples and compared them to likelihood ratio type of tests....


  • ...We revisit some of the examples of Szekely et al. (2007), and add new examples....


  • ...A very elegant test with a simple formula is provided in Szekely et al. (2007), and has been further investigated in Szekely and Rizzo (2009) and in the discussions that followed it....


Journal ArticleDOI
TL;DR: 1. Density estimation for exploring data 2. Density estimation for inference 3. Nonparametric regression for exploring data 4. Inference with nonparametric regression 5. Checking parametric regression models 6. Comparing regression curves and surfaces
Abstract: 1. Density estimation for exploring data 2. Density estimation for inference 3. Nonparametric regression for exploring data 4. Inference with nonparametric regression 5. Checking parametric regression models 6. Comparing regression curves and surfaces 7. Time series data 8. An introduction to semiparametric and additive models References

1,424 citations


"A consistent multivariate test of a..." refers background in this paper

  • ...ft designs during the twentieth century. They consider two variables, wing span (m) and speed (km/h) for the 230 designs of the third (of three) periods. This example and the data (aircraft) are from Bowman and Azzalini (1997). They showed that the dCov test of independence of log(Speed) and log(Span) in period 3 is significant (p-value ≤ 0.00001), while the Pearson correlation test is not significant (p-value = 0.8001). Our...


Journal ArticleDOI

1,275 citations


"A consistent multivariate test of a..." refers methods in this paper

  • ...This can be done using multiple comparisons procedures, similar to post-hoc testing in the analysis of variance (Scheffe, 1959)....
