
Showing papers on "Principal component analysis published in 1989"


BookDOI
01 Jan 1989
TL;DR: In this paper, the concept of principal components is introduced and a number of techniques related to principal component analysis are presented, such as using principal components to select a subset of variables for regression analysis, detecting outliers, and detecting influential observations.
Abstract: Contents: Introduction; Basic Concepts of Principal Components; Geometrical Properties of Principal Components; Decomposition Properties of Principal Components; Principal Components of Patterned Correlation Matrices; Rotation of Principal Components; Using Principal Components to Select a Subset of Variables; Principal Components Versus Factor Analysis; Uses of Principal Components in Regression Analysis; Using Principal Components to Detect Outlying and Influential Observations; Use of Principal Components in Cluster Analysis; Use of Principal Components Analysis in Conjunction with Other Multivariate Analysis Procedures; Other Techniques Related to Principal Components; Summary and Conclusions.
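For readers new to the topic, the basic computation underlying these chapters can be sketched in a few lines. The following Python is our own minimal illustration (not code from the book): principal components via eigendecomposition of the sample covariance matrix.

```python
import numpy as np

def pca(X, n_components):
    """Top principal components of X via the sample covariance matrix."""
    Xc = X - X.mean(axis=0)                 # center each variable
    cov = np.cov(Xc, rowvar=False)          # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]       # re-sort by decreasing variance
    loadings = eigvecs[:, order[:n_components]]
    scores = Xc @ loadings                  # project the data onto the components
    return scores, loadings, eigvals[order]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
scores, loadings, variances = pca(X, n_components=2)
```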

2,622 citations


Book
01 Sep 1989
TL;DR: Partial Least Squares (PLS) is presented as an estimation method and algorithm for latent variable path models; the properties of its special cases are examined step by step, and the convergence of the PLS algorithm is proved beyond two-block models.
Abstract: Partial Least Squares (PLS) is an estimation method and an algorithm for latent variable path (LVP) models. PLS is a component technique and estimates the latent variables as weighted aggregates. The implications of this choice are considered and compared to covariance structure techniques like LISREL, COSAN and EQS. The properties of special cases of PLS (regression, factor scores, structural equations, principal components, canonical correlation, hierarchical components, correspondence analysis, three-mode path and component analysis) are examined step by step and contribute to the understanding of the general PLS technique. The proof of the convergence of the PLS algorithm is extended beyond two-block models. Some 10 computer programs and 100 applications of PLS are referenced. The book gives the statistical underpinning for the computer programs PLS 1.8, which is in use in some 100 university computer centers, and for PLS/PC. It is intended to be the background reference for the users of PLS 1.8, not as textbook or program manual.
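For orientation, here is a hedged sketch of the kind of component technique the book analyses: a NIPALS-style PLS1 regression that estimates each latent variable as a weighted aggregate of the predictors. This is a generic textbook variant assuming centered X and y, not the PLS 1.8 or PLS/PC program.

```python
import numpy as np

def pls1(X, y, n_components):
    """NIPALS-style PLS regression with a single response; X, y centered."""
    X, y = X.copy(), y.copy()
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)          # weight vector for this component
        t = X @ w                       # latent variable: weighted aggregate of X
        p = X.T @ t / (t @ t)           # X loadings
        q = (y @ t) / (t @ t)           # y loading
        X -= np.outer(t, p)             # deflate both blocks
        y -= q * t
        W.append(w); P.append(p); Q.append(q)
    W, P = np.array(W).T, np.array(P).T
    return W @ np.linalg.inv(P.T @ W) @ np.array(Q)  # coefficients for original X

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8)); X -= X.mean(axis=0)
y = X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=50); y -= y.mean()
beta = pls1(X, y, n_components=3)
```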

1,695 citations


Journal ArticleDOI
TL;DR: The main result is a complete description of the landscape attached to the error function E in terms of principal component analysis, showing that E has a unique minimum corresponding to the projection onto the subspace generated by the first principal vectors of a covariance matrix associated with the training patterns.

1,456 citations


Journal ArticleDOI
TL;DR: A single neuron with Hebbian-type learning for the connection weights, and with nonlinear internal feedback, has been shown to extract the statistical principal components of its stationary input pattern sequence; a generalization to a layer of neuron units, the Subspace Network, yields a multi-dimensional principal component subspace.
Abstract: A single neuron with Hebbian-type learning for the connection weights, and with nonlinear internal feedback, has been shown to extract the statistical principal components of its stationary input pattern sequence. A generalization of this model to a layer of neuron units is given, called the Subspace Network, which yields a multi-dimensional, principal component subspace. This can be used as an associative memory for the input vectors or as a module in nonsupervised learning of data clusters in the input space. It is also able to realize a powerful pattern classifier based on projections on class subspaces. Some classification results for natural textures are given.
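A minimal sketch of the rule this abstract describes, written in the form usually given for the subspace network: a Hebbian term plus an internal feedback decay that keeps the weights bounded. The constants, the input distribution, and the convergence check are our own illustration, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
p, m, eta = 10, 3, 0.005
stds = np.linspace(3.0, 0.5, p)   # decreasing variances, so the principal
                                  # subspace is spanned by the first m axes
W = rng.normal(size=(m, p)) * 0.1         # feedforward weights of the m units

for _ in range(50000):
    x = rng.normal(size=p) * stds         # stationary input pattern sequence
    y = W @ x                             # unit outputs
    W += eta * (np.outer(y, x) - np.outer(y, y) @ W)  # Hebb + internal feedback

# The rows of W now approximately span the m-dimensional principal subspace:
# W.T @ W should act like the orthogonal projector onto the first m axes.
```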

858 citations



Proceedings ArticleDOI
01 Jan 1989
TL;DR: A network of highly interconnected linear neuron-like processing units and a simple, local, unsupervised rule for the modification of connection strengths between these units are proposed, making the implementation of the network easier, faster, and biologically more plausible than rules depending on error propagation.
Abstract: A network of highly interconnected linear neuron-like processing units and a simple, local, unsupervised rule for the modification of connection strengths between these units are proposed. After training the network on a high (m) dimensional distribution of input vectors, the lower (n) dimensional output will be a projection onto the subspace of the n largest principal components (the subspace spanned by the n eigenvectors with the largest eigenvalues of the input covariance matrix) and will maximize the mutual information between the input and the output in the same way as principal component analysis does. The purely local nature of the synaptic modification rule (simple Hebbian and anti-Hebbian) makes the implementation of the network easier, faster, and biologically more plausible than rules depending on error propagation.

282 citations


Journal ArticleDOI
TL;DR: A quantitative method for measuring the information capacity of an animal's ‘signature system’, i.e. the set of cues by which individuals are identified, is developed and may prove valuable for comparative analyses where evolutionary hypotheses predict one species to have a better developed signature system than another.

259 citations


Journal ArticleDOI
01 Dec 1989-EPL
TL;DR: A two-layered network of linear neurons that organizes itself in response to a set of presented patterns is described; a local anti-Hebbian rule is proposed for the lateral, hierarchically organized weights within the output layer.
Abstract: We present a two-layered network of linear neurons that organizes itself in response to a set of presented patterns. After completion of the learning process, the net transforms the complete information contained in a pattern into mutually independent features. The synaptic weights between layers obey a Hebbian learning rule. We propose a local anti-Hebbian rule for lateral, hierarchically organized weights within the output layer. For a proper choice of the learning parameters, the rule forces the activities of the output units to become uncorrelated and the lateral weights to vanish. The weights between the two layers converge to the eigenvectors of the covariance matrix of input patterns, i.e. the network performs a principal-component analysis of the input information.
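A hedged sketch of the described scheme (our simplified reconstruction, with explicit weight normalization standing in for the paper's conditions on the learning parameters): Hebbian weights between the layers, anti-Hebbian hierarchically organized lateral weights within the output layer.

```python
import numpy as np

rng = np.random.default_rng(1)
p, m, eta, mu = 8, 3, 0.01, 0.01
stds = np.linspace(2.0, 0.4, p)           # toy input distribution

W = rng.normal(size=(m, p)) * 0.1         # feedforward (Hebbian) weights
U = np.zeros((m, m))                      # lateral weights, strictly lower-triangular

for _ in range(30000):
    x = rng.normal(size=p) * stds
    y = np.zeros(m)
    for i in range(m):                    # hierarchical lateral connections
        y[i] = W[i] @ x + U[i, :i] @ y[:i]
    W += eta * np.outer(y, x)             # Hebbian rule between layers
    W /= np.linalg.norm(W, axis=1, keepdims=True)
    for i in range(m):
        U[i, :i] -= mu * y[i] * y[:i]     # local anti-Hebbian lateral rule

# At convergence the output activities decorrelate, U tends to zero, and the
# rows of W approach the leading eigenvectors of the input covariance matrix.
```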

250 citations


Journal ArticleDOI
TL;DR: A regression equation is presented for predicting the parallel analysis values used to decide the number of principal components to retain; the equation is appropriate for predicting criterion mean eigenvalues and was derived from random data sets containing between 5 and 50 variables and between 50 and 500 subjects.
Abstract: Monte Carlo research increasingly seems to favor the use of parallel analysis as a method for determining the "correct" number of factors in factor analysis or components in principal components analysis. We present a regression equation for predicting parallel analysis values used to decide the number of principal components to retain. This equation is appropriate for predicting criterion mean eigenvalues and was derived from random data sets containing between 5 and 50 variables and between 50 and 500 subjects. This relatively simple equation is more accurate for predicting mean eigenvalues from random data matrices with unities in the diagonals than a previously published equation. Moreover, given that the parallel analysis decision rule may be too dependent on chance, our equation is also used to predict the 95th percentile point in distributions of eigenvalues generated from random data matrices. Multiple correlations for all analyses were at least .95. Regression weights for predicting the first 33 mean and 95th percentile eigenvalues are given in easy-to-use tables.
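The criterion values the regression equation predicts can also be obtained by direct Monte Carlo. The sketch below (our illustration; names and sizes are arbitrary) computes the mean and 95th percentile random-data eigenvalues that parallel analysis compares against.

```python
import numpy as np

def parallel_analysis(n_subjects, n_vars, n_sims=100, percentile=95, seed=0):
    """Mean and percentile eigenvalues of correlation matrices of random data."""
    rng = np.random.default_rng(seed)
    eigs = np.empty((n_sims, n_vars))
    for s in range(n_sims):
        R = np.corrcoef(rng.normal(size=(n_subjects, n_vars)), rowvar=False)
        eigs[s] = np.sort(np.linalg.eigvalsh(R))[::-1]
    return eigs.mean(axis=0), np.percentile(eigs, percentile, axis=0)

# Retain the components whose observed eigenvalues exceed the random criterion:
mean_eig, p95_eig = parallel_analysis(n_subjects=200, n_vars=20)
```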

234 citations


Book
07 Dec 1989
TL;DR: A textbook of statistics in medicine, covering the design of medical investigations (clinical trials and other methods), measurement in medicine, statistical inference, regression analysis, repeated measures, analysis of variance and covariance, and crossover designs, among other topics.
Abstract: Contents: Statistics in medicine; the design of medical investigations - clinical trials and other methods; measurement in medicine; statistical inference; regression analysis; repeated measures; analysis of variance and the analysis of covariance; crossover designs; the analysis of survival data; multivariate data and principal components analysis; statistical methods for classification - cluster analysis and assignment techniques; time series analysis; the analysis of observational techniques. Appendix: computers and statistics.

188 citations


Journal ArticleDOI
TL;DR: The method aspires to make maximal use of the possibilities of standard image analysis equipment and to combine the results obtained with more traditional image analysis techniques.

Journal ArticleDOI
TL;DR: A multivariate procedure for the spatial grouping of sampling sites is described; the method is illustrated with soil survey data from two small areas in Britain and from a transect, and for the transect the results of spatially weighted classification are compared with those of strict segmentation.
Abstract: Earth scientists and land managers often wish to group sampling sites that are both similar with respect to their properties and near to one another on the ground. This paper outlines the geostatistical rationale for such spatial grouping and describes a multivariate procedure to implement it. Sample variograms are calculated from the original data or their leading principal components and then the parameters of the underlying functions are estimated. A dissimilarity matrix is computed for all sampling sites, preferably using Gower's general similarity coefficient. Dissimilarities are then modified using the variogram to incorporate the form and extent of spatial variation. A nonhierarchical classification of sampling sites is performed on the leading latent vectors of the modified dissimilarity matrix by dynamic clustering to an optimum. The technique is illustrated with results of its application to soil survey data from two small areas in Britain and from a transect. In the case of the latter results of spatially weighted classifications are compared with those of strict segmentation. An appendix lists a Genstat program for a spatially constrained classification using a spherical variogram as an example.
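A hedged Python sketch of the pipeline this abstract outlines. The paper supplies a Genstat program; the reconstruction below is our reading of the steps, with Euclidean dissimilarity standing in for Gower's general similarity coefficient and a simple variogram scaling of the dissimilarities whose exact form in the paper may differ.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.vq import kmeans2

def spherical_variogram(h, nugget, sill, a):
    g = np.where(h < a,
                 nugget + (sill - nugget) * (1.5 * h / a - 0.5 * (h / a) ** 3),
                 sill)
    return np.where(h == 0, 0.0, g)       # gamma(0) = 0

def spatially_constrained_classes(properties, coords, k, nugget, sill, a):
    d = squareform(pdist(properties))     # property dissimilarities between sites
    h = squareform(pdist(coords))         # geographic separations
    d_mod = d * spherical_variogram(h, nugget, sill, a) / sill
    # leading latent vectors of the modified dissimilarity matrix
    n = len(d)
    J = np.eye(n) - 1.0 / n
    B = -0.5 * J @ (d_mod ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    Y = vecs[:, -3:] * np.sqrt(np.clip(vals[-3:], 0, None))
    _, labels = kmeans2(Y, k, minit="++", seed=0)   # nonhierarchical clustering
    return labels
```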

Journal ArticleDOI
TL;DR: In this paper, the mathematics behind principal component analysis and partial least squares regression is presented in detail, starting from the appropriate extrema conditions, and the meaning of the resultant vectors and many of their mathematical interrelationships are also presented.

Book
01 Jun 1989
TL;DR: A collection of tutorials and papers surveying applications of singular value decomposition in identification and signal processing, including a novel method for reducing the computational load of SVD-based high discrimination algorithms.
Abstract: Contents:
I. Tutorials: 1. Singular value decomposition: an introduction (P. Dewilde, E.F. Deprettere). 2. A variety of applications of singular value decomposition in identification and signal processing (J. Vandewalle, B. De Moor). 3. Eigen and singular value decomposition techniques for the solution of harmonic retrieval problems (M. Bouvet, H. Clergeot). 4. Advances in principal component signal processing (R.J. Vaccaro et al.).
II. Model Reduction and Identification: 5. An overview of Hankel norm model reduction (A.C.M. Ran). 6. Identification of linear state space models with singular value decomposition using canonical correlation concepts (B. De Moor et al.). 7. Detection of multiple sinusoids in white noise: a signal enhancement approach (J.A. Cadzow et al.).
III. Total Least Squares and GSVD: 8. The total least squares technique: computation, properties and applications (S. van Huffel, J. Vandewalle). 9. Oriented energy and oriented signal-to-signal ratio concepts in the analysis of vector sequences and time series (B. De Moor et al.). 10. ESPRIT - Estimation of signal parameters via rotational invariance techniques (R. Roy, T. Kailath).
IV. Real-Time, Adaptive and Acceleration Algorithms: 11. On-line algorithm for signal separation based on SVD (D. Callaerts et al.). 12. A family of rank-one subspace updating methods (R.D. DeGroat, R.A. Roberts). 13. An array processing technique using the first principal component (P. Comon). 14. A novel method for reducing the computational load of SVD-based high discrimination algorithms (J.L. Mather). 15. Singular value decomposition of Frobenius Matrices for approximate and multi-objective signal processing tasks (E.A. Trachtenberg).
V. Algorithms and Architectures: 16. On block Kogbetliantz methods for computation of the SVD (K.V. Fernando, S.J. Hammarling). 17. Reducing the number of sweeps in Hestenes' Method (P.C. Hansen). 18. Computational arrays for cyclic-by-rows Jacobi-algorithms (L. Thiele). 19. The symmetric tridiagonal eigenproblem on a custom linear array and hypercubes (E. de Doncker et al.). 20. Computing the singular value decomposition on the connection machine (L.M. Ewerbring, F.T. Luk). 21. Singular value decomposition on warp (M. Annaratone). 22. Execution of linear algebra operations on the SPRINT (A.J. De Groot et al.).
VI. Resolution Limits, Enhancements and Questions: 23. An SVD analysis of resolution limits for harmonic retrieval problems (J.R. Casar, G. Cybenko). 24. A new application of SVD to harmonic retrieval (S. Mayrargue, J.P. Jouveau). 25. Retrieval of significant parameters from magnetic resonance signals via singular value decomposition (R. de Beer et al.).
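The PCA/SVD connection that runs through several of these chapters can be stated in a few lines (illustration ours, not from the book): the right singular vectors of a centered data matrix are the eigenvectors of its sample covariance matrix, and the squared singular values give the component variances.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
Xc = X - X.mean(axis=0)                       # center the data

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
var_svd = s ** 2 / (len(X) - 1)               # component variances from the SVD
var_cov = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]
assert np.allclose(var_svd, var_cov)          # same spectrum either way
scores = U * s                                # principal component scores
```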

Journal ArticleDOI
TL;DR: In this article, an explicit expression for the Cramer-Rao bounds and a calculation of the statistical perturbation of the covariance matrix due to additive noise are presented, applied to a statistical efficiency analysis of the main frequency estimation methods based on eigenvalue decomposition.
Abstract: An explicit expression for the Cramer-Rao bounds (CRBs) and a calculation of the statistical perturbation of the covariance matrix due to additive noise are presented. The results are applied to a statistical efficiency analysis of the main frequency estimation methods based on eigenvalue decomposition. For the covariance matrix, in order to characterize the perturbation of the signal subspace, only the component of the perturbation of the eigenvectors orthogonal to the subspace is considered. This gives a simpler and more significant form of the error covariance. The treatment includes the cases of forward-backward and moving averages. The CRB and estimation variances are calculated in the presence of additive random noise, but for a given set of amplitudes characterized by their sample covariance matrix. This approach is more realistic for the evaluation of efficiency in the small-sample case.

Journal ArticleDOI
TL;DR: In this paper, a stochastic differential equation approach to principal component analysis is proposed, and the equations governing the spectrum of the square B^T B of an n × p matrix B of independent Brownian motions are given.

Journal ArticleDOI
TL;DR: In this paper, a set of a hundred aromatic substituents was multivariately characterized by nine descriptor variables taken from the literature, using principal components analysis (PCA) techniques.
Abstract: A set of a hundred aromatic substituents was multivariately characterized by nine descriptor variables taken from the literature. From the 9*100 data set, four principal properties for the aromatic substituents were calculated as the first four dimensions in a principal components analysis, PCA. The first three principal properties were used to develop a strategy for selecting substituents from eight subgroups according to a factorial design.

Journal ArticleDOI
TL;DR: This paper proposes a new algorithm, referred to as ROPRC, to obtain an eigenvalue decomposition for the sample covariance matrix of a multivariate dataset; it is based on the rotation technique employed by Ammann and Van Ness (1988a,b) to obtain a robust solution to an errors-in-variables problem.
Abstract: This paper proposes a new algorithm to obtain an eigenvalue decomposition for the sample covariance matrix of a multivariate dataset. The algorithm is based on the rotation technique employed by Ammann and Van Ness (1988a,b) to obtain a robust solution to an errors-in-variables problem. When this rotation technique is combined with an iterative reweighting of the data, a robust eigenvalue decomposition is obtained. This robust eigenvalue decomposition has important applications to principal component analysis. Monte Carlo simulations are performed to compare ordinary principal component analysis using the standard eigenvalue decomposition with this algorithm, referred to as ROPRC. It is seen that ROPRC is reasonably efficient compared to an eigenvalue decomposition when Gaussian data is available, and that ROPRC is much better than the eigenvalue decomposition if outliers are present or if the data has a heavy-tailed distribution. The algorithm returns useful numerical diagnostic information in the form o...
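What follows is not the ROPRC rotation algorithm itself, but a minimal sketch of the iterative-reweighting half of the idea: downweight observations with large Mahalanobis distances, re-estimate the covariance, and decompose it. The Huber-type weighting is our own choice for illustration.

```python
import numpy as np

def robust_eigendecomposition(X, n_iter=20, c=3.0):
    n, p = X.shape
    w = np.ones(n)
    for _ in range(n_iter):
        mu = np.average(X, axis=0, weights=w)
        Xc = X - mu
        C = (Xc * w[:, None]).T @ Xc / w.sum()        # weighted covariance
        d2 = np.einsum('ij,jk,ik->i', Xc, np.linalg.inv(C), Xc)
        w = np.minimum(1.0, c ** 2 * p / d2)          # Huber-type weights
    vals, vecs = np.linalg.eigh(C)
    return vals[::-1], vecs[:, ::-1], w   # low final weights flag outliers
```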


Journal ArticleDOI
TL;DR: Methods of data analysis of multichannel recordings where components of evoked brain activity are identified quantitatively are illustrated, and the results of spatial PCA relate to experimental conditions in a meaningful way.
Abstract: Electroencephalographic data recorded for topographical analysis constitute multidimensional observations, and the present paper illustrates methods of data analysis of multichannel recordings where components of evoked brain activity are identified quantitatively. The computation of potential field strength (Global Field Power, GFP) is used for component latency determination. Multivariate statistical methods like Principal Component Analysis (PCA) may be applied to the topographical distribution of potential values. The analysis of statistically defined components of visually elicited brain activity is illustrated with data sets stemming from different experiments. With spatial PCA the dimensionality of multichannel data is reduced to only three components that account for more than 90% of the variance. The results of spatial PCA relate to experimental conditions in a meaningful way, and this method may also be used for time segmentation of topographic potential map series.
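A sketch of the two computations described above, using the standard definitions (our implementation, not the authors' software; the recording here is hypothetical random data): Global Field Power as the spatial standard deviation at each time point, and spatial PCA with channels as variables.

```python
import numpy as np

def global_field_power(V):
    """V: (n_times, n_channels) evoked potential data."""
    return V.std(axis=1)                  # spatial SD across electrodes

def spatial_pca(V, n_components=3):
    Vc = V - V.mean(axis=0)
    C = np.cov(Vc, rowvar=False)          # channels x channels covariance
    vals, vecs = np.linalg.eigh(C)
    vals, vecs = vals[::-1], vecs[:, ::-1]
    explained = vals[:n_components].sum() / vals.sum()
    return Vc @ vecs[:, :n_components], explained

rng = np.random.default_rng(0)
V = rng.normal(size=(512, 32))                # hypothetical 32-channel recording
latency = np.argmax(global_field_power(V))    # component latency at the GFP peak
scores, frac = spatial_pca(V)                 # fraction of variance explained
```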

Journal ArticleDOI
TL;DR: Two alternating least squares algorithms, suitable for small and large data sets respectively, are offered for Millsap and Meredith's (1988) generalization of principal components analysis for the simultaneous analysis of a number of variables observed in several populations or on several occasions.
Abstract: Millsap and Meredith (1988) have developed a generalization of principal components analysis for the simultaneous analysis of a number of variables observed in several populations or on several occasions. The algorithm they provide has some disadvantages. The present paper offers two alternating least squares algorithms for their method, suitable for small and large data sets, respectively. Lower and upper bounds are given for the loss function to be minimized in the Millsap and Meredith method. These can serve to indicate whether or not a global optimum for the simultaneous components analysis problem has been attained.

Journal ArticleDOI
TL;DR: Two Macintosh programs written for multivariate data analysis and multivariate data graphical display are presented; GraphMu is designed for drawing collections of elementary graphics, thus allowing comparisons between variables, individuals, and principal axes planes of multivariate methods.
Abstract: Two Macintosh programs written for multivariate data analysis and multivariate data graphical display are presented. MacMul includes principal component analysis (PCA), correspondence analysis (CA) and multiple correspondence analysis (MCA), with a complete, original and unified set of numerical aids to interpretation. GraphMu is designed for drawing collections of elementary graphics (curves, maps, graphical models) thus allowing comparisons between variables, individuals, and principal axes planes of multivariate methods. Both programs are self-documented applications and make full use of the user-oriented graphical interface of the Macintosh to simplify the process of analysing data sets. An example is described to show the results obtained on a small ecological data set.


Journal ArticleDOI
TL;DR: In this article, the authors evaluated Cabernet Sauvignon wines from four regions and Chardonnay wines from three vintages by descriptive analysis using principal component analysis (PCA) and by canonical variate analysis (CVA-Wine).
Abstract: Cabernet Sauvignon wines from four regions and Chardonnay wines from three vintages were evaluated by descriptive analysis. The sensory ratings were evaluated by principal component analysis (PCA) and by canonical variate analysis (CVA) using wines (CVA-Wine) and using regions or vintages (CVA-Group) as classification variables. PCA and CVA-Wine analyses provided similar results for both data sets. Whereas the CVA-Group analyses demonstrated significant differences among regions or vintages, the variable configuration differed from that of the other two methods, reflecting the differences among groups rather than among the wines overall. For understanding the structure of the data, CVA-Wine and PCA were superior to CVA-Group.

Journal ArticleDOI
TL;DR: Software for the analysis of multivariate chemical data by principal components and partial least squares methods is included on disk and contains options for the graphical display of scores and loadings for interpretation of the results of analyses.

Journal ArticleDOI
TL;DR: Classification of animals according to the relationship of the principal components (motivations) that determine their behavior can be carried out on the basis of factor analysis.
Abstract: 1. The main portion (60%) of the variance of the behavior of Wistar rats in the open field test can be attributed to the effect of only three principal factors (components). 2. An analysis of the behavioral structure of these components showed that they can be designated as “exploration” (first component), “fear” (second), and “shifted activity” (third). 3. The overall motor activity in the open field test has a two-factor basis, and an investigation of the temporal dynamics of the given index is required for its factorial separation. 4. Classification of animals according to the relationship of the principal components (motivations) that determine their behavior can be carried out on the basis of factor analysis.

Proceedings ArticleDOI
01 Jan 1989
TL;DR: It is shown that the learning trajectory will converge to the global minimum of the landscape under certain conditions on the starting weights and learning rate of the descent procedure.
Abstract: The behavior of a linear computing unit is analyzed during learning by gradient descent of a cost function equal to the sum of a variance maximization and a weight normalization term. The landscape of this cost function is shown to be composed of one local maximum, a set of saddle points, and one global minimum aligned with the principal components of the input patterns. It is possible to describe the cost landscape in terms of the hyperspheres, hypercrests, and hypervalleys associated with each of these principal components. Using this description, it is possible to show that the learning trajectory will converge to the global minimum of the landscape under certain conditions on the starting weights and learning rate of the descent procedure. Furthermore, it is possible to provide a precise description of the learning trajectory in this cost landscape. Extensions and implications of the algorithm are discussed by using networks of such cells.
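A hedged sketch of gradient descent on one standard form of such a cost (the paper's exact cost may differ): E(w) = -w'Cw + lam*(|w|^2 - 1)^2, whose global minimum aligns w with the first principal component of the input covariance C.

```python
import numpy as np

rng = np.random.default_rng(0)
C = np.diag([4.0, 2.0, 1.0, 0.5])    # toy input covariance; top PC is axis 0
w = rng.normal(size=4) * 0.1         # small random starting weights
lam, lr = 1.0, 0.01

for _ in range(5000):
    grad = -2 * C @ w + 4 * lam * (w @ w - 1.0) * w
    w -= lr * grad                   # descend the cost landscape

# Stationary points satisfy C w = 2*lam*(|w|^2 - 1) w, i.e. w is an eigenvector;
# from generic starts the trajectory escapes the saddles and w/|w| approaches
# the top eigenvector (here, [1, 0, 0, 0] up to sign).
```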

Journal ArticleDOI
TL;DR: This paper applies the method of principal component analysis to the modeling and optimization of multivariate responses.
Abstract: Application of the principal component analysis method to the modeling and optimization of multivariate responses.

Journal ArticleDOI
TL;DR: Parameter values for 59 common substituents and 74 descriptors used in QSAR studies were compiled and linear regression confirmed that lipophilicity can be factorized into two terms, one related to molecular bulk and the other to polarity.
Abstract: Parameter values for 59 common substituents and 74 descriptors used in QSAR studies were compiled. This data matrix was analysed by a variety of multivariate techniques. Linear regression confirmed that lipophilicity can be factorized into two terms, one related to molecular bulk and the other to polarity. Principal component analysis (PCA) of the parameters revealed 5 significant principal components and a grouping of lipophilic, steric and electronic parameters. The different loadings of the parameters on the 5 principal components were also explored. The classification of substituents by cluster analysis (CA) proved rather disappointing. In contrast, the SIMCA method classified substituents of increasing bulk into 5 groups of increasing polarity.

Journal ArticleDOI
TL;DR: Grahn and Szeverenyi used principal components as a novel way to condense information in magnetic resonance images; the approach is capable of extracting the most significant information from stacks of congruent images and of condensing it into a few principal component images.
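A sketch of the image-condensation idea (our illustration, with a hypothetical random stack): treat each congruent image as a variable, each pixel as an observation, and keep the first few principal component images.

```python
import numpy as np

def principal_component_images(stack, n_keep=3):
    """stack: (n_images, height, width) array of congruent images."""
    n, h, w = stack.shape
    X = stack.reshape(n, -1).T                # pixels x images data matrix
    Xc = X - X.mean(axis=0)
    C = np.cov(Xc, rowvar=False)              # images x images covariance
    vals, vecs = np.linalg.eigh(C)
    vecs = vecs[:, ::-1][:, :n_keep]          # leading eigenvectors
    return (Xc @ vecs).T.reshape(n_keep, h, w)   # principal component images

stack = np.random.default_rng(0).normal(size=(8, 64, 64))   # hypothetical stack
pc_images = principal_component_images(stack)
```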