Candid covariance-free incremental principal component analysis
Summary (2 min read)
1 INTRODUCTION
- A class of image analysis techniques, called the appearance-based approach, has become very popular.
- Further, when the dimension of the image is high, both the computational and the storage complexity grow dramatically.
- Several IPCA techniques have been proposed to compute principal components without the covariance matrix [9], [10], [11].
- An amnesic average technique is also used to dynamically determine the retaining rate of the old and new data, instead of a fixed learning rate.
2.1 The First Eigenvector
- Suppose that sample vectors are acquired sequentially, u(1), u(2), ..., possibly infinitely many.
- Now, the question is how to estimate x(i) in (2).
- Procedure (6) is at the mercy of the magnitude of the observation u(n): the first term has a unit norm, but the second can take any magnitude.
- In (4), statistical efficiency is realized by keeping the scale of the estimate at the same order as the new observations (the first and second terms on the right side of (4) are weighted so that they form a sample mean), which allows full use of every observation; a minimal code sketch of this update follows this list.
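As a concrete illustration of the update in (4), here is a minimal NumPy sketch of the first-eigenvector estimate; the function name and the synthetic data are illustrative, not taken from the paper, and the inputs are assumed to be zero mean.

```python
import numpy as np

def update_first_eigenvector(v, u, n):
    """One step of the incremental estimate in (4):
    v(n) = ((n-1)/n) v(n-1) + (1/n) u(n) u^T(n) v(n-1)/||v(n-1)||.
    The direction of v converges to the first eigenvector, and ||v|| to the
    first eigenvalue, of the covariance of the zero-mean inputs u."""
    d = v / np.linalg.norm(v)                      # unit vector along v(n-1)
    return ((n - 1) / n) * v + (1 / n) * u * np.dot(u, d)

# Illustrative usage on synthetic zero-mean 2D Gaussian data.
rng = np.random.default_rng(0)
cov = np.array([[3.0, 1.0], [1.0, 1.0]])
v = None
for n, u in enumerate(rng.multivariate_normal(np.zeros(2), cov, size=2000), start=1):
    v = u.copy() if n == 1 else update_first_eigenvector(v, u, n)

print(v / np.linalg.norm(v))   # estimated direction of the first eigenvector
print(np.linalg.norm(v))       # estimated first eigenvalue
```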
2.2 Intuitive Explanation
- An intuitive explanation of procedure (4) is as follows: consider a set of two-dimensional data with a Gaussian probability distribution function (for any other physically arising distribution, the authors consider only its first two orders of statistics, since PCA does so).
- Noticing that u^T(n) v_1(n-1)/||v_1(n-1)|| is a scalar, the authors know that (1/n) u(n) u^T(n) v_1(n-1)/||v_1(n-1)|| is essentially a scaled vector of u(n).
- For the points u in the upper half plane, the net pulling force will move v_1(n-1) toward the direction of v_1, since there are more data points on the right side of v_1(n-1) than on its left side.
- v_1(n-1) will not stop moving until it is aligned with v_1, at which point the pulling forces from the two sides are balanced.
- In other words, v_1(n) in (4) will converge to the first eigenvector.
2.3 Higher-Order Eigenvectors
- Procedure (4) only estimates the first dominant eigenvector.
- To compute the second-order eigenvector, the authors first subtract from the data its projection on the estimated first-order eigenvector v_1(n), as shown in (9): u_2(n) = u_1(n) - (u_1^T(n) v_1(n)/||v_1(n)||) (v_1(n)/||v_1(n)||), (9) where u_1(n) = u(n); a minimal sketch of this step follows this list.
- Orthogonality is always enforced once convergence is reached, although it holds only approximately at early stages.
- In either case, the statistical efficiency was not considered.
- One may notice that the expensive steps in both SGA and CCIPCA are the dot products in the high-dimensional data space.
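As referenced above, a minimal sketch of the residual (deflation) step in (9), with an illustrative helper name:

```python
import numpy as np

def deflate(u_i, v_i):
    """Equation (9): remove from u_i its projection onto the current estimate
    v_i, so that the next eigenvector is estimated in the complementary space."""
    d = v_i / np.linalg.norm(v_i)        # unit vector along v_i
    return u_i - np.dot(u_i, d) * d      # u_{i+1}(n)
```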
2.4 Equal Eigenvalues
- Let us consider the case where there are equal eigenvalues.
- Therefore, the estimates of the eigenvectors e_i, where i < l, will not be affected in any case.
- Since their eigenvalues are equal, the shape of the distribution in Fig. 1 is a hypersphere within the subspace.
- Thus, the estimates of the multiple eigenvectors will converge to some orthogonal basis of that subspace.
- Which basis they converge to depends mainly on the early samples, because of the averaging effect in (2), in which the contribution of new data becomes vanishingly small as n increases without bound.
2.5 Algorithm Summary
- Combining the mechanisms discussed above, the authors have the candid covariance-free IPCA (CCIPCA) algorithm as follows. Procedure 1: for n = 1, 2, ..., set u_1(n) = u(n) and, for i = 1, 2, ..., min(k, n): (a) if i = n, initialize the ith eigenvector as v_i(n) = u_i(n); (b) otherwise,
  v_i(n) = ((n - 1 - l)/n) v_i(n-1) + ((1 + l)/n) u_i(n) u_i^T(n) v_i(n-1)/||v_i(n-1)||, (10)
  u_{i+1}(n) = u_i(n) - (u_i^T(n) v_i(n)/||v_i(n)||) (v_i(n)/||v_i(n)||), (11)
  where l is the amnesic parameter; a minimal code sketch of Procedure 1 follows below.
- A mathematical proof of the convergence of CCIPCA can be found in [12].
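Putting (10) and (11) together, the following is a minimal NumPy sketch of Procedure 1, assuming the inputs have already had the (incrementally estimated) mean removed; the class name and parameter names are illustrative, not from the paper.

```python
import numpy as np

class CCIPCA:
    """Sketch of Procedure 1: estimate the first k eigenvectors incrementally,
    without ever forming the covariance matrix."""

    def __init__(self, k, amnesic=2.0):
        self.k = k          # number of eigenvectors to estimate
        self.l = amnesic    # amnesic parameter l in (10); the experiments use l = 2
        self.n = 0          # number of samples seen so far
        self.v = []         # v_i(n); direction estimates eigenvector i, norm its eigenvalue

    def update(self, u):
        """Process one (zero-mean) sample u(n)."""
        self.n += 1
        n, l = self.n, self.l
        u = np.array(u, dtype=float)                 # u_1(n) = u(n)
        for i in range(min(self.k, n)):
            if i == n - 1:
                self.v.append(u.copy())              # (a) initialize v_i(n) = u_i(n)
            else:
                d = self.v[i] / np.linalg.norm(self.v[i])
                self.v[i] = ((n - 1 - l) / n) * self.v[i] \
                            + ((1 + l) / n) * u * np.dot(u, d)   # (10)
                d = self.v[i] / np.linalg.norm(self.v[i])
                u = u - np.dot(u, d) * d             # (11): residual for eigenvector i + 1

# Illustrative usage on synthetic zero-mean data.
rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 20)) @ rng.standard_normal((20, 20))
X -= X.mean(axis=0)
pca = CCIPCA(k=5, amnesic=2.0)
for x in X:
    pca.update(x)
eigenvectors = [v / np.linalg.norm(v) for v in pca.v]   # unit eigenvector estimates
eigenvalues = [np.linalg.norm(v) for v in pca.v]        # eigenvalue estimates
```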
3 EMPIRICAL RESULTS ON CONVERGENCE
- The authors performed experiments to study the statistical efficiency of the new algorithm as well as the existing IPCA algorithms, especially for high-dimensional data such as images.
- In contrast, the proposed CCIPCA converges fast.
- Shown in Fig. 7 are the first 10 eigenfaces estimated by batch PCA and CCIPCA (with the amnesic parameter l = 2) after one epoch and 20 epochs, respectively.
- For the general readership, an experiment was done on a lower-dimensional data set.
- The authors extracted 10 x 10 pixel subimages around the right-eye area in each image of the FERET data set, estimated their sample covariance matrix Σ, and used MATLAB to generate 1,000 samples with the Gaussian distribution N(0, Σ) in the 100-dimensional space.
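A rough sketch of this kind of convergence check, assuming the CCIPCA class sketched after Procedure 1 above and using a randomly generated covariance in place of the FERET-derived one:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 100))
Sigma = A @ A.T / 100                                  # stand-in 100 x 100 covariance
samples = rng.multivariate_normal(np.zeros(100), Sigma, size=1000)

vals, vecs = np.linalg.eigh(Sigma)
vecs = vecs[:, ::-1]                                   # columns sorted by decreasing eigenvalue

pca = CCIPCA(k=5, amnesic=2.0)                         # class sketched in Section 2.5
for x in samples:
    pca.update(x)

for i, v in enumerate(pca.v):
    corr = abs(np.dot(v / np.linalg.norm(v), vecs[:, i]))
    print(f"eigenvector {i + 1}: correlation with the true eigenvector = {corr:.3f}")
```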
4 CONCLUSIONS AND DISCUSSIONS
- This short paper concentrates on the challenging issue of computing the dominating eigenvectors and eigenvalues of an incrementally arriving, high-dimensional data stream, without computing the corresponding covariance matrix and without knowing the data in advance.
- An amnesic average technique is implemented to further improve the convergence rate.
- The importance of the result presented here potentially extends beyond the apparent technical scope of interest to the computer vision community.
- As discussed in [7], what a human brain does is not just computing—processing data—but, more importantly and more fundamentally, developing the computing engine itself, from real-world, online sensory data streams.
- The link between incremental PCA and the developmental mechanisms of the human brain is probably more intimate than one can fully appreciate now.
Frequently Asked Questions (6)
Q2. What is the common approach to PCA?
A well-known computational approach to PCA involves solving an eigensystem problem, i.e., computing the eigenvectors and eigenvalues of the sample covariance matrix, using a numerical method such as the power method and the QR method [6].
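For contrast with the incremental procedure, a minimal batch-PCA sketch of that conventional route, with NumPy's symmetric eigensolver standing in for the power or QR method; the function name is illustrative:

```python
import numpy as np

def batch_pca(X, k):
    """Conventional (batch) PCA: form the d x d sample covariance explicitly
    and solve its eigensystem, which is the step the incremental algorithm avoids."""
    Xc = X - X.mean(axis=0)                   # center the data
    cov = Xc.T @ Xc / (len(X) - 1)            # sample covariance matrix
    vals, vecs = np.linalg.eigh(cov)          # symmetric eigensolver
    order = np.argsort(vals)[::-1][:k]        # indices of the k largest eigenvalues
    return vals[order], vecs[:, order]
```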
Q3. What is the way to compute the eigenvector?
Start with a set of orthonormalized vectors, update them using the suggested iteration step, and recover the orthogonality using Gram-Schmidt orthonormalization (GSO).
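As a rough illustration of that recover-orthogonality step, a minimal Gram-Schmidt orthonormalization sketch (not code from the paper):

```python
import numpy as np

def gram_schmidt(vectors):
    """Re-orthonormalize a list of vectors (GSO), as used after each iteration
    step in the approach described above."""
    basis = []
    for v in vectors:
        w = v - sum(np.dot(v, b) * b for b in basis)   # remove components along earlier vectors
        basis.append(w / np.linalg.norm(w))            # normalize
    return basis
```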
Q4. Why is the coefficient (n-1)/n in (4) important?
This is true because all the "observations," i.e., the last terms in (4) and (6), contribute to the estimate in (4) with the same weight, which yields statistical efficiency, but they contribute unequally in (6) due to the normalization of v(n-1) in the first term, which damages the efficiency.
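To make the equal-weight point explicit, the recursion in (4) can be unrolled (writing x(n) for the last term, the "observation"); this is a standard derivation, not quoted from the paper:

```latex
v(n) = \frac{n-1}{n}\, v(n-1) + \frac{1}{n}\, x(n), \qquad v(1) = x(1)
\;\Longrightarrow\;
v(n) = \frac{1}{n}\sum_{i=1}^{n} x(i).
% Every observation x(i) ends up with the same weight 1/n, i.e., v(n) is the
% sample mean of the x(i), which is what gives (4) its statistical efficiency.
% In (6), the normalization of v(n-1) in the first term breaks this equal weighting.
```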
Q5. What is the way to compute an eigenvector?
Kreyszig proposed an algorithm which finds the first eigenvector using a method equivalent to SGA and subtracts the first component from the samples before computing the next component [17].
Q6. What is the learning rate of a correction term?
Simply speaking, the learning rate should be appropriate so that the second term (the correction term) on the right side of (6) is comparable to the first term, neither too large nor too small.