# A sequential approach for multi-class discriminant analysis with kernels

## Summary (2 min read)

### 1. INTRODUCTION

- Fisher linear discriminant analysis (LDA) is a classical multivariate technique for both dimension reduction and classification.
- It yields low-dimensional representations by retaining the first variates, those associated with the largest eigenvalues, so that most of the information in the data is preserved.
- Kernel-based methods can be divided into nonlinear transformation techniques for representation and techniques for classification.
- Recently, a powerful nonlinear extension of the LDA method has been proposed, referred to as Generalized Discriminant Analysis (GDA) [1].
- Experiments demonstrating the validity of the approach are presented in Section 5.

### 2. SCATTER MATRICES FOR SEPARABILITY CRITERIA

- For convenience and intelligibility, the authors present hereafter all scatter matrices in the feature space F.
- This reflects the notion that performing a nonlinear data transformation into some specific high dimensional feature spaces increases the probability of having linearly separable classes within the transformed space.
- The mixture scatter matrix in F is the covariance matrix of all samples regardless of their class assignments.
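The scatter matrices above can be illustrated with a minimal numpy sketch. For readability it works in the input space rather than the feature space F, and the toy data, class sizes, and variable names are assumptions for illustration only:

```python
import numpy as np

# Toy data: two Gaussian classes in 2-D (illustrative stand-in for
# the paper's feature-space quantities).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

mean_total = X.mean(axis=0)
S_b = np.zeros((2, 2))  # between-class scatter
S_w = np.zeros((2, 2))  # within-class scatter
for c in np.unique(y):
    Xc = X[y == c]
    mc = Xc.mean(axis=0)
    S_b += len(Xc) * np.outer(mc - mean_total, mc - mean_total)
    S_w += (Xc - mc).T @ (Xc - mc)

# Mixture (total) scatter: scatter of all samples, ignoring labels.
S_m = (X - mean_total).T @ (X - mean_total)
assert np.allclose(S_m, S_b + S_w)  # the classical decomposition S_m = S_b + S_w
```

The final assertion checks the standard identity that the mixture scatter decomposes exactly into the between-class and within-class parts.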

### 3. GDA METHOD IN FEATURE SPACE

- The GDA method consists in finding the transformation matrix W that in some sense maximizes the ratio of the between-class scatter to the within-class scatter.
- The columns of an optimal W are the generalized eigenvectors corresponding to the largest eigenvalues of $B w_i = \lambda_i S w_i$ (7). From (7), the authors conclude that deriving the GDA solutions directly may be computationally intractable, since they would have to work in F, which may be a very high- or even infinite-dimensional space.
- Note that the largest eigenvalue of (7) yields the maximum quotient of inertia [1]: $\lambda = \frac{w^t B w}{w^t S w}$.
- Here, κ can be any kernel that satisfies the Mercer condition.
- Memory and complexity problems can arise for the GDA method when dealing with a large number of patterns, since an eigenvector decomposition of the Gram matrix K must be performed.
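As a rough illustration of that bottleneck, the sketch below builds an (n, n) Gram matrix with a Gaussian RBF kernel (a standard Mercer kernel), centers it in feature space, and performs the full O(n³) eigendecomposition. The kernel choice, width `sigma`, and toy data are assumptions for illustration, not the paper's exact setup:

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    """Gaussian RBF kernel, which satisfies the Mercer condition."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 3))   # 60 patterns, 3 features (toy data)
K = rbf_kernel(X, X)           # (n, n) Gram matrix

# Center K in feature space: K_c = K - 1K - K1 + 1K1, with 1 = (1/n) * ones.
n = len(X)
one = np.full((n, n), 1.0 / n)
K_c = K - one @ K - K @ one + one @ K @ one

# The memory/complexity bottleneck noted above: storing K is O(n^2)
# and a full eigendecomposition costs O(n^3).
eigvals, eigvecs = np.linalg.eigh(K_c)
```

For large n, both the O(n²) storage of K and the O(n³) decomposition become prohibitive, which motivates the sequential approach of Section 4.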

### 4. SEQUENTIAL GDA METHOD

- Thus the eigenvector decomposition may raise the same storage and computational-complexity problems as the standard GDA.
- Note that calculating (19) may be a computationally intractable problem.
- Figure 1(b) shows the projection of all examples on the first two axes obtained with the sequential approach.
- Then, the same algorithm described at the beginning of this section can be applied with Knew.
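The summary does not reproduce the authors' exact sequential update, but the general idea, extracting the leading axes one at a time without storing or inverting the full (n, n) matrix, can be sketched with power iteration plus deflation. Only matrix-vector products are needed; the function name, iteration counts, and stand-in matrix below are hypothetical:

```python
import numpy as np

def leading_axes(matvec, n, n_axes=2, n_iter=500, seed=0):
    """Extract the leading eigenvectors one at a time by power iteration
    with deflation. Only matrix-vector products are required, so the
    (n, n) matrix never has to be stored or inverted explicitly."""
    rng = np.random.default_rng(seed)
    axes, vals = [], []
    for _ in range(n_axes):
        v = rng.normal(size=n)
        for _ in range(n_iter):
            w = matvec(v)
            for u in axes:                 # deflate previously found axes
                w -= (u @ w) * u
            v = w / np.linalg.norm(w)
        vals.append(v @ matvec(v))         # Rayleigh quotient = eigenvalue
        axes.append(v)
    return np.array(vals), np.array(axes)

# Example: a symmetric PSD matrix standing in for the centered Gram matrix.
rng = np.random.default_rng(2)
A = rng.normal(size=(40, 40))
A = A @ A.T
vals, axes = leading_axes(lambda v: A @ v, n=40, n_axes=2)
```

This also mirrors the stated weakness of the sequential approach: each additional axis costs another round of iterations, so complexity grows with the number of axes sought.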

### 5. EXPERIMENTS

- The Iris data consist of 150 four-dimensional examples from three classes [1] (each class consists of 50 examples).
- One class is linearly separable from two other non-linearly separable classes.
- Figure 1(a) shows the projection of the three classes on the first axis, obtained with the sequential GDA.
- The first axis seems to be sufficient to separate the data.
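In kernel discriminant methods each axis w is expanded over the training samples, $w = \sum_i \alpha_i \phi(x_i)$, so projecting a sample x onto an axis reduces to kernel evaluations, $y(x) = \sum_i \alpha_i \kappa(x_i, x)$. A minimal sketch of that projection step, assuming an RBF kernel and with hypothetical coefficients `alpha` (in practice they come from the eigenproblem of Section 3):

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(3)
X_train = rng.normal(size=(30, 4))   # toy stand-in for the 150 Iris samples
alpha = rng.normal(size=30)          # hypothetical expansion coefficients
X_new = rng.normal(size=(5, 4))      # samples to project

# Projection of each new sample on one axis: y(x) = sum_i alpha_i * k(x_i, x)
proj = rbf_kernel(X_new, X_train) @ alpha
```

Note that the projection never forms w explicitly; it only needs the kernel values against the training set.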

### 6. CONCLUSION

- The authors have presented a sequential approach to calculate nonlinear features based on the GDA method proposed by [1].
- The key advantage of the proposed sequential GDA algorithm is that it needs neither the inversion nor even the storage of the Gram matrix of size (n, n).
- The weakness of their approach is that the complexity increases with the number of axes to be found.

