Measuring statistical dependence with Hilbert-Schmidt norms
Frequently Asked Questions (16)
Q2. What is the principle underlying these algorithms?
The principle underlying these algorithms is that the authors may define covariance and cross-covariance operators in RKHSs, and derive statistics from these operators suited to measuring the dependence between functions in these spaces.
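For illustration only (this is not the authors' reference code), a minimal NumPy sketch of a biased empirical HSIC estimate of the form trace(KHLH)/m², assuming Gaussian kernels with an arbitrary width σ = 1:

```python
import numpy as np

def gaussian_gram(x, sigma=1.0):
    """Gaussian kernel Gram matrix for samples given as the rows of x."""
    sq = np.sum(x**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    return np.exp(-d2 / (2.0 * sigma**2))

def hsic_biased(x, y, sigma=1.0):
    """Biased empirical HSIC: trace(K H L H) / m**2, with H the centring matrix."""
    m = x.shape[0]
    K = gaussian_gram(x, sigma)
    L = gaussian_gram(y, sigma)
    H = np.eye(m) - np.ones((m, m)) / m
    return np.trace(K @ H @ L @ H) / m**2

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
y_indep = rng.normal(size=(200, 1))                 # independent of x -> HSIC near 0
y_dep = x**2 + 0.1 * rng.normal(size=(200, 1))      # nonlinear dependence -> larger HSIC
print(hsic_biased(x, y_indep), hsic_biased(x, y_dep))
```

On the dependent pair the statistic comes out markedly larger than on the independent pair, which is the behaviour the criterion is designed to exhibit.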
Q3. How do specialised ICA algorithms handle the test of independence?
Most specialised ICA algorithms exploit the linear mixing structure of the problem to avoid having to conduct a general test of independence, which makes the task of recovering the mixing matrix A easier.
Q4. What is the covariance of the linear algebraic case?
In the linear algebraic case, the covariance is Cxx := Ex[xx⊤] − Ex[x]Ex[x⊤], while the cross-covariance is Cxy := Ex,y[xy⊤] − Ex[x]Ey[y⊤].
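A minimal sketch of the empirical counterparts of these two matrices, assuming the samples are stored as rows (the variable names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(500, 3))                                  # samples of x as rows
y = x @ rng.normal(size=(3, 2)) + rng.normal(size=(500, 2))    # y linearly related to x

# Empirical versions of Cxx = E[xx^T] - E[x]E[x]^T and Cxy = E[xy^T] - E[x]E[y]^T
Cxx = x.T @ x / len(x) - np.outer(x.mean(axis=0), x.mean(axis=0))
Cxy = x.T @ y / len(x) - np.outer(x.mean(axis=0), y.mean(axis=0))
print(Cxx.shape, Cxy.shape)    # (3, 3) and (3, 2)
```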
Q5. What is the definition of a reproducing kernel Hilbert space?
Then F is a reproducing kernel Hilbert space if for each x ∈ X , the Dirac evaluation operator δx : F → ℝ, which maps f ∈ F to f(x) ∈ ℝ, is a bounded linear functional.
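To make the boundedness of δx concrete, the sketch below checks numerically that for an f in the span of Gaussian kernel sections, |f(x)| = |〈f, k(x, ·)〉F| ≤ ‖f‖F √k(x, x); the kernel width, expansion points, and coefficients are arbitrary choices made for the example:

```python
import numpy as np

def k(a, b, sigma=1.0):
    """Gaussian kernel on the real line: k(a, b) = exp(-(a - b)^2 / (2 sigma^2))."""
    return np.exp(-np.subtract.outer(a, b)**2 / (2.0 * sigma**2))

rng = np.random.default_rng(2)
z = rng.normal(size=5)        # expansion points
alpha = rng.normal(size=5)    # coefficients: f = sum_i alpha_i k(z_i, .) lies in F

x = np.array([0.3])
f_at_x = alpha @ k(z, x)[:, 0]                 # evaluation delta_x(f) = f(x)
f_norm = np.sqrt(alpha @ k(z, z) @ alpha)      # ||f||_F via the Gram matrix of the z_i
bound = f_norm * np.sqrt(k(x, x)[0, 0])        # Cauchy-Schwarz: |f(x)| <= ||f||_F sqrt(k(x, x))
print(abs(f_at_x) <= bound + 1e-12)            # True: delta_x is a bounded linear functional
```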
Q6. What is the definition of a one-sample U-statistic?
A one-sample U-statistic is defined as the random variable u := (1/(m)r) ∑_{i^m_r} g(xi1, . . . , xir), where g is called the kernel of the U-statistic, (m)r := m(m − 1) · · · (m − r + 1), and the sum ranges over all r-tuples of distinct indices drawn from {1, . . . , m}.
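As a worked example (not taken from the paper), the unbiased sample variance is the one-sample U-statistic with r = 2 and kernel g(x1, x2) = (x1 − x2)²/2; the brute-force sketch below averages g over all (m)r ordered pairs of distinct indices:

```python
import numpy as np
from itertools import permutations

def u_statistic(x, g, r):
    """One-sample U-statistic: average of g over all (m)_r ordered r-tuples of distinct indices."""
    vals = [g(*x[list(idx)]) for idx in permutations(range(len(x)), r)]
    return np.mean(vals)

rng = np.random.default_rng(3)
x = rng.normal(size=30)

# With kernel g(x1, x2) = (x1 - x2)^2 / 2, the U-statistic equals the unbiased sample variance
var_u = u_statistic(x, lambda a, b: (a - b)**2 / 2.0, r=2)
print(np.isclose(var_u, np.var(x, ddof=1)))    # True
```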
Q7. What are the parameters used for the KCC and KGV?
In the case of the KCC and KGV, the authors use the parameters recommended by Bach and Jordan [2002]: namely, κ = 2 × 10−2 and σ = 1 for m ≤ 1000, and κ = 2 × 10−3 and σ = 0.5 for m > 1000 (σ being the kernel size, and κ the coefficient used to scale the regularising terms).
Q8. Why does the Laplace kernel improve on the Gaussian kernel?
This is because the slow decay of the eigenspectrum of the Laplace kernel improves the detection of dependence encoded at higher frequencies in the probability density function, which need not be related to the kurtosis — see [Gretton et al., 2005, Section 4.2].
Q9. When is the largest singular value (spectral norm) of the cross-covariance operator zero?
According to Gretton et al. [2005], the largest singular value (i.e., the spectral norm) ‖Cxy‖S is zero if and only if x and y are independent, under the conditions specified in the theorem.
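A rough finite-dimensional illustration of this characterisation, using an explicit feature map (v, v²) rather than the universal kernel required by the theorem (so it can only detect the dependencies visible to those features), is sketched below; the function names are invented for the example:

```python
import numpy as np

def features(v):
    """Explicit feature map (v, v^2); only illustrative, since the 'zero iff independent'
    characterisation needs a sufficiently rich (universal) kernel."""
    return np.column_stack([v, v**2])

def largest_singular_value_cxy(x, y):
    """Spectral norm of the empirical cross-covariance between the feature maps of x and y."""
    fx = features(x) - features(x).mean(axis=0)
    fy = features(y) - features(y).mean(axis=0)
    Cxy = fx.T @ fy / len(x)
    return np.linalg.svd(Cxy, compute_uv=False)[0]

rng = np.random.default_rng(4)
x = rng.normal(size=2000)
print(largest_singular_value_cxy(x, rng.normal(size=2000)))               # close to 0: independent
print(largest_singular_value_cxy(x, x**2 + 0.1 * rng.normal(size=2000)))  # clearly positive: dependent
```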
Q10. What is the definition of the cross-covariance operator?
Cross-Covariance Following Baker [1973] and Fukumizu et al. [2004], the cross-covariance operator associated with the joint measure px,y on (X × Y, Γ × Λ) is a linear operator Cxy : G → F defined as Cxy := Ex,y[(φ(x) − µx) ⊗ (ψ(y) − µy)] = Ex,y[φ(x) ⊗ ψ(y)] − µx ⊗ µy =: C̃xy − Mxy. (6) Here (6) follows from the linearity of the expectation.
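The second equality in (6) can be checked numerically with finite-dimensional stand-ins for φ(x) and ψ(y); the sketch below is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)
# Finite-dimensional stand-ins for the feature maps: row i holds phi(x_i), resp. psi(y_i)
phi = rng.normal(size=(1000, 3))
psi = phi @ rng.normal(size=(3, 2)) + rng.normal(size=(1000, 2))

mu_x, mu_y = phi.mean(axis=0), psi.mean(axis=0)

# Left-hand side of (6): E[(phi(x) - mu_x) ⊗ (psi(y) - mu_y)], estimated by averaging outer products
lhs = (phi - mu_x).T @ (psi - mu_y) / len(phi)
# Right-hand side of (6): E[phi(x) ⊗ psi(y)] - mu_x ⊗ mu_y, i.e. C̃xy - Mxy
rhs = phi.T @ psi / len(phi) - np.outer(mu_x, mu_y)
print(np.allclose(lhs, rhs))   # True: the two expressions agree, by linearity of the expectation
```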
Q11. Why is linear ICA a suitable way to test dependence measures?
That said, ICA is in general a good benchmark for dependence measures, in that it applies to a problem with a known “ground truth”, and tests that the dependence measures approach zero gracefully as dependent random variables are made to approach independence (through optimisation of the unmixing matrix).
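A toy version of this benchmark, assuming a biased HSIC estimate as the dependence measure and a 2 × 2 rotation as the mixing matrix, might look as follows (all choices, including the kernel width and the rotation angle, are illustrative):

```python
import numpy as np

def gram(v, sigma=1.0):
    """Gaussian Gram matrix of a one-dimensional sample."""
    return np.exp(-np.subtract.outer(v, v)**2 / (2.0 * sigma**2))

def hsic(a, b):
    """Biased empirical HSIC, used here only as an example dependence measure."""
    m = len(a)
    H = np.eye(m) - np.ones((m, m)) / m
    return np.trace(gram(a) @ H @ gram(b) @ H) / m**2

def rotation(theta):
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

rng = np.random.default_rng(6)
# Two independent, non-Gaussian sources ("ground truth"), mixed by a known rotation of 0.7 rad
s = np.column_stack([rng.uniform(-1, 1, 300), rng.laplace(size=300)])
x = s @ rotation(0.7).T

# Scan candidate unmixing rotations: the dependence measure should dip towards zero
# near the true unmixing angle of -0.7 rad
angles = np.linspace(-np.pi / 2, 0.0, 25)
scores = [hsic(*(x @ rotation(t).T).T) for t in angles]
print(angles[int(np.argmin(scores))])   # approximately -0.7, up to grid resolution and sampling noise
```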
Q12. What is the simplest way to define the kernels of the U-statistics?
Using the shorthand z := (x, y), the authors define the kernels of the U-statistics for the three terms of the HSIC expression as g(zi, zj) = KijLij, g(zi, zj, zr) = KijLjr, and g(zi, zj, zq, zr) = KijLqr.
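The sketch below verifies the all-index (V-statistic) analogue of this decomposition: averaging the three kernels over all index tuples and combining the results recovers the familiar biased estimate trace(KHLH)/m². Averaging over repeated indices is a simplification of the U-statistics in the answer, not the authors' exact estimator:

```python
import numpy as np

def gram(v, sigma=1.0):
    return np.exp(-np.subtract.outer(v, v)**2 / (2.0 * sigma**2))

rng = np.random.default_rng(7)
m = 50
x, y = rng.normal(size=m), rng.normal(size=m)
K, L = gram(x), gram(y)

# Averages built from the kernels g(zi,zj) = Kij*Lij, g(zi,zj,zr) = Kij*Ljr and
# g(zi,zj,zq,zr) = Kij*Lqr, taken here over ALL index tuples (V-statistic analogue)
t1 = np.sum(K * L) / m**2
t2 = np.sum(K @ L) / m**3
t3 = np.sum(K) * np.sum(L) / m**4

# Combined as t1 - 2*t2 + t3, they recover the biased HSIC estimate trace(K H L H) / m^2
H = np.eye(m) - np.ones((m, m)) / m
print(np.isclose(t1 - 2 * t2 + t3, np.trace(K @ H @ L @ H) / m**2))   # True
```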
Q13. What is the HS norm of f g?
Then the tensor product operator f ⊗ g : G → F is defined as (f ⊗ g)h := f〈g, h〉G for all h ∈ G. (2) Moreover, by the definition of the HS norm, the authors can compute the HS norm of f ⊗ g via ‖f ⊗ g‖²HS = 〈f ⊗ g, f ⊗ g〉HS = 〈f, (f ⊗ g)g〉F = 〈f, f〉F 〈g, g〉G = ‖f‖²F ‖g‖²G. (3)
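In finite dimensions, f ⊗ g is simply the outer product matrix, and (3) reduces to the fact that its Frobenius norm factorises; a quick numerical check (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(8)
f, g = rng.normal(size=4), rng.normal(size=3)

# In finite dimensions the operator f ⊗ g is the outer product matrix, since (f ⊗ g) h = f <g, h>
T = np.outer(f, g)
h = rng.normal(size=3)
print(np.allclose(T @ h, f * (g @ h)))     # True: matches the defining equation (2)

# Its Hilbert-Schmidt (Frobenius) norm factorises as ||f|| * ||g||, matching equation (3)
print(np.isclose(np.linalg.norm(T), np.linalg.norm(f) * np.linalg.norm(g)))   # True
```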
Q14. What are the two important criteria for detecting dependence?
More importantly, however, the authors believe their proof assures that HSIC is indeed a dependence criterion under all circumstances (i.e., HSIC is zero if and only if the random variables are independent), which is not necessarily the case for related kernel dependence criteria, namely the Kernel Generalised Variance (KGV) and the Kernel Mutual Information (KMI).
Q15. What is the first experiment to use?
Their first experiment consisted in de-mixing data drawn independently from several distributions chosen at random with replacement from Table 1, and mixed with a random mixing matrix.
Q16. What is the advantage of HSIC, COCO, and the KMI?
A major advantage of HSIC, COCO, and the KMI is that these do not require any additional tuning beyond the selection of a kernel.