Nonlinear component analysis as a kernel eigenvalue problem
Summary
Introduction
- The authors describe a new method for performing a nonlinear form of Principal Component Analysis.
- In this paper, the authors give some examples of nonlinear methods constructed by this approach.
- Together, these sections form the basis for the section presenting the proposed kernel-based algorithm for nonlinear PCA; following that, the paper discusses some differences between kernel-based PCA and other generalizations of PCA.
- To this end, they substitute a priori chosen kernel functions for all occurrences of dot products.
- In experiments on classification based on the extracted principal components, the authors found that in the nonlinear case it was sufficient to use a linear Support Vector machine to construct the decision boundary; linear Support Vector machines are much faster in classification speed than nonlinear ones (see the sketch after this list).
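To make the pipeline concrete, here is a minimal NumPy sketch of kernel PCA as the summary describes it: build a Gram matrix with a chosen kernel, center it in feature space, diagonalize, and project. The function name, signature, and the degree-2 polynomial kernel in the example are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

def kernel_pca(X, kernel, n_components=2):
    """Sketch of kernel PCA: dot products are replaced by kernel
    evaluations, so PCA is done implicitly in a nonlinear feature space.
    X: (n, d) data; kernel: maps two (n, d) arrays to an (n, n) Gram matrix."""
    n = X.shape[0]
    K = kernel(X, X)
    # Center the kernel matrix in feature space:
    # Kc = K - 1n.K - K.1n + 1n.K.1n, where (1n)_ij = 1/n
    one_n = np.full((n, n), 1.0 / n)
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    eigvals, eigvecs = np.linalg.eigh(Kc)           # ascending eigenvalues
    idx = np.argsort(eigvals)[::-1][:n_components]  # top components
    lam, alpha = eigvals[idx], eigvecs[:, idx]      # assumes lam > 0
    alpha = alpha / np.sqrt(lam)    # normalize so lam_k * ||alpha_k||^2 = 1
    return Kc @ alpha               # nonlinear principal components of X

# Example with a degree-2 polynomial kernel k(x, y) = (x . y)^2
features = kernel_pca(np.random.randn(50, 3),
                      kernel=lambda A, B: (A @ B.T) ** 2)
```

The extracted features could then be fed to a linear SVM, matching the observation above that a linear decision boundary suffices after nonlinear component extraction.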
B Kernels Corresponding to Dot Products in Another Space
- In practice, the authors are free to also try symmetric kernels of indefinite operators.
- In that case, the matrix K can still be diagonalized and nonlinear feature values can still be extracted, with the one modification that the normalization condition must be adjusted to deal with possible negative eigenvalues; K then induces a mapping into a Riemannian space with indefinite metric (a sketch of one such adjustment follows this list).
- In fact, many symmetric forms may induce spaces with indefinite signature.
- In the following sections, the authors give some examples of kernels that can be used for kernel PCA.
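Below is a speculative sketch of the adjusted normalization for an indefinite kernel matrix: eigenvectors are normalized with |λ| instead of λ, and the sign of each eigenvalue is kept to record the indefinite signature. This is one plausible reading of the modification the summary mentions; the paper's exact condition may differ, and the function name is hypothetical.

```python
import numpy as np

def indefinite_kernel_features(K, n_components=2):
    """Feature extraction from a symmetric, possibly indefinite kernel
    matrix K (centering omitted for brevity). A hypothetical sketch:
    rank components by |eigenvalue| and normalize by |lambda|."""
    eigvals, eigvecs = np.linalg.eigh(K)   # real spectrum, K symmetric
    idx = np.argsort(np.abs(eigvals))[::-1][:n_components]
    lam, alpha = eigvals[idx], eigvecs[:, idx]
    # Normalize with |lambda| so negative eigenvalues remain usable;
    # the signs describe the indefinite metric of the induced space.
    alpha = alpha / np.sqrt(np.abs(lam))
    return K @ alpha, np.sign(lam)
```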
B Kernels Chosen A Priori
- The fact that the authors can use indefinite operators distinguishes this approach from the usage of kernels in the Support Vector machine: in the latter, definiteness is necessary for the optimization procedure.
- The choice of c should depend on the range of the input variables. Neural Network type kernels have the form k(x, y) = tanh(κ(x·y) + b). Interestingly, these different types of kernels allow the construction of polynomial classifiers, radial basis function classifiers, and neural networks with the Support Vector algorithm, all exhibiting very similar accuracy (the three kernel families are sketched in code below).
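For concreteness, here are the three a priori kernel families this section refers to, written as Gram-matrix functions that could plug into a kernel PCA routine like the sketch above. Parameter names (d, c, kappa, b) follow common convention, and the default values are arbitrary assumptions.

```python
import numpy as np

def poly_kernel(A, B, d=3):
    # Polynomial kernel: k(x, y) = (x . y)^d
    return (A @ B.T) ** d

def rbf_kernel(A, B, c=1.0):
    # Radial basis function kernel: k(x, y) = exp(-||x - y||^2 / c);
    # c should be chosen relative to the range of the input variables.
    sq = (A**2).sum(1)[:, None] - 2 * A @ B.T + (B**2).sum(1)[None, :]
    return np.exp(-sq / c)

def sigmoid_kernel(A, B, kappa=1.0, b=1.0):
    # Neural-network-type kernel: k(x, y) = tanh(kappa * (x . y) + b)
    return np.tanh(kappa * (A @ B.T) + b)
```

Note that the tanh kernel is in general not positive definite, which is why the remark above about indefinite operators matters: kernel PCA can still use it, whereas the Support Vector optimization requires definiteness.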
B Local Kernels
- Locality, in this context, means that the principal component extraction should take only neighbourhoods into account.
- Depending on whether the authors consider neighbourhoods in input space or in another space, say an image space where the input vectors correspond to δ-functions, locality can assume different meanings (one possible input-space construction is sketched after this list).
- This additional degree of freedom can greatly improve statistical estimates which are computed from a limited amount of data (Bottou & Vapnik).
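As one speculative illustration of input-space locality, the sketch below truncates an RBF kernel outside a neighbourhood of fixed radius, so the component extraction for a point is based only on nearby points. This cutoff construction is an assumption made for illustration, not the paper's definition, and truncating this way does not in general preserve positive definiteness, which again connects to the indefinite-kernel discussion above.

```python
import numpy as np

def local_rbf_kernel(A, B, c=1.0, radius=1.0):
    """Hypothetical input-space local kernel: an RBF kernel zeroed out
    for pairs of points farther apart than `radius`."""
    sq = (A**2).sum(1)[:, None] - 2 * A @ B.T + (B**2).sum(1)[None, :]
    K = np.exp(-sq / c)
    K[sq > radius**2] = 0.0  # ignore pairs outside the neighbourhood
    return K
```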
B Constructing Kernels from other Kernels
- In other words, the admissible kernels form a cone in the space of all integral operators. Clearly, k1 + k2 corresponds to mapping into the direct sum of the respective spaces into which k1 and k2 map (see the sketch below).
- Of course, the authors could also explicitly perform the principal component extraction twice, once for each kernel, and decide for themselves on the respective numbers of components to extract.
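The cone property means any non-negative combination of admissible kernels is again admissible. A small sketch, with hypothetical helper names and arbitrary example kernels:

```python
import numpy as np

def sum_kernel(k1, k2, w1=1.0, w2=1.0):
    """Non-negative weighted sum of two kernels. Since admissible kernels
    form a cone, w1*k1 + w2*k2 (w1, w2 >= 0) is again admissible and maps
    into the direct sum of the two feature spaces."""
    assert w1 >= 0 and w2 >= 0, "cone combinations need non-negative weights"
    return lambda A, B: w1 * k1(A, B) + w2 * k2(A, B)

# Example: combine a linear kernel with a degree-2 polynomial kernel
linear = lambda A, B: A @ B.T
poly2 = lambda A, B: (A @ B.T) ** 2
K = sum_kernel(linear, poly2)(np.random.randn(10, 3), np.random.randn(10, 3))
```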
Citations
Cites background from "Nonlinear component analysis as a kernel eigenvalue problem"
...Recent work has generalized the basic ideas (Smola, Schölkopf and Müller, 1998a; Smola and Schölkopf, 1998), shown connections to regularization theory (Smola, Schölkopf and Müller, 1998b; Girosi, 1998; Wahba, 1998), and shown how SVM ideas can be incorporated in a wide range of other algorithms (Schölkopf, Smola and Müller, 1998b; Schölkopf et al., 1998c)....
...This fact has been used to derive a nonlinear version of principal component analysis by (Schölkopf, Smola and Müller, 1998b); it seems likely that this trick will continue to find uses elsewhere....
...…(with each image suffering the same permutation), an act of vandalism that would leave the best performing neural networks severely handicapped) and much work has been done on incorporating prior knowledge into SVMs (Schölkopf, Burges and Vapnik, 1996; Schölkopf et al., 1998a; Burges, 1998)....
Cites methods from "Nonlinear component analysis as a kernel eigenvalue problem"
...this geometric perspective adopt a non-parametric approach, based on a training set nearest neighbor graph (Schölkopf et al., 1998; Roweis and Saul, 2000; Tenenbaum et al., 2000; Brand, 2003; Belkin and Niyogi, 2003; Donoho and Grimes, 2003; Weinberger and Saul, 2004; Hinton and Roweis, 2003; van der Maaten and Hinton, 2008)....
References
"Nonlinear component analysis as a k..." refers background or methods in this paper
...Clearly, the last point has yet to be evaluated in practice; however, for the Support Vector machine, the utility of different kernels has already been established (Schölkopf, Burges, & Vapnik, 1995)....
...The general question which function k corresponds to a dot product in some space F has been discussed by Boser, Guyon, & Vapnik (1992) and Vapnik (1995): Mercer's theorem of functional analysis states that if k is a continuous kernel of a positive integral operator, we can construct a mapping into a…...
...In addition, they all construct their decision functions from an almost identical subset of a small number of training patterns, the Support Vectors (Schölkopf, Burges, & Vapnik, 1995)....
...The number of components extracted then determines the size of the first hidden layer. Combining (24) with the Support Vector decision function (Vapnik, 1995), we thus get machines of the type f(x) = sgn( Σ_{i=1}^{ℓ} α_i K2(g̃(x_i), g̃(x)) + b )....
...…convolutional 5-layer neural networks (5.0% were reported by LeCun et al., 1989) and nonlinear Support Vector classifiers (4.0%, Schölkopf, Burges, & Vapnik, 1995); it is far superior to linear classifiers operating directly on the image data (a linear Support Vector machine achieves 8.9%; Sch…...
"Nonlinear component analysis as a k..." refers background or methods in this paper
...PCA has been successfully used for face recognition (Turk & Pentland, 1991) and face representation (Vetter & Poggio, 1995)....
...This is due to the fact that for k(x, y) = (x·y), the Support Vector decision function (Boser, Guyon, & Vapnik, 1992) f(x) = sgn( Σ_{i=1}^{ℓ} α_i k(x, x_i) + b ) (28) can be expressed with a single weight vector w = Σ_{i=1}^{ℓ} α_i x_i as f(x) = sgn((x·w) + b). (29) Thus the final stage of classification can be done extremely fast; the speed of the principal component extraction phase, on the other hand, and thus the accuracy-speed trade-off of the whole classifier, can be controlled by the number of components which we extract, or by the above reduced set parameter m....
Frequently Asked Questions (3)
Q2. What is the meaning of k1 + k2?
In other words, the admissible kernels form a cone in the space of all integral operators; clearly, k1 + k2 corresponds to mapping into the direct sum of the respective spaces into which k1 and k2 map.
Q3. What is the definition of input space locality?
In input space, locality consists of basing the component extraction for a point x on other points in an appropriately chosen neighbourhood of x.