
Showing papers on "Principal component analysis published in 1994"


Journal ArticleDOI
TL;DR: An efficient algorithm is proposed which allows the computation of the independent component analysis (ICA) of a data matrix in polynomial time; ICA may actually be seen as an extension of principal component analysis (PCA).

8,522 citations


Journal ArticleDOI
TL;DR: In this paper, a new variant of Factor Analysis (PMF) is described, where the problem is solved in the weighted least squares sense: G and F are determined so that the Frobenius norm of E divided (element-by-element) by σ is minimized.
Abstract: A new variant ‘PMF’ of factor analysis is described. It is assumed that X is a matrix of observed data and σ is the known matrix of standard deviations of elements of X. Both X and σ are of dimensions n × m. The method solves the bilinear matrix problem X = GF + E where G is the unknown left-hand factor matrix (scores) of dimensions n × p, F is the unknown right-hand factor matrix (loadings) of dimensions p × m, and E is the matrix of residuals. The problem is solved in the weighted least squares sense: G and F are determined so that the Frobenius norm of E divided (element-by-element) by σ is minimized. Furthermore, the solution is constrained so that all the elements of G and F are required to be non-negative. It is shown that the solutions by PMF are usually different from any solutions produced by the customary factor analysis (FA, i.e. principal component analysis (PCA) followed by rotations). Usually PMF produces a better fit to the data than FA. Also, the result of PMF is guaranteed to be non-negative, while the result of FA often cannot be rotated so that all negative entries would be eliminated. Different possible application areas of the new method are briefly discussed. In environmental data, the error estimates of data can be widely varying and non-negativity is often an essential feature of the underlying models. Thus it is concluded that PMF is better suited than FA or PCA in many environmental applications. Examples of successful applications of PMF are shown in companion papers.

4,797 citations
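The weighted non-negative bilinear problem X = GF + E above can be sketched with multiplicative updates (an illustrative scheme, not Paatero's original PMF algorithm; the function name and all parameter choices are our own):

```python
import numpy as np

def weighted_nmf(X, sigma, p, n_iter=1000, seed=0):
    """Minimize || (X - G @ F) / sigma ||_F^2 subject to G, F >= 0.

    Multiplicative-update sketch of the PMF objective (not Paatero's
    original algorithm): each factor is scaled by a ratio of
    non-negative terms, so non-negativity is preserved automatically.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = 1.0 / sigma**2                  # element-wise weights
    G = rng.random((n, p))
    F = rng.random((p, m))
    eps = 1e-12                          # guards against division by zero
    for _ in range(n_iter):
        GF = G @ F
        G *= ((W * X) @ F.T) / (((W * GF) @ F.T) + eps)
        GF = G @ F
        F *= (G.T @ (W * X)) / ((G.T @ (W * GF)) + eps)
    return G, F

# Synthetic check: a noiseless rank-2 non-negative matrix is recovered.
rng = np.random.default_rng(1)
G0, F0 = rng.random((20, 2)), rng.random((2, 10))
X = G0 @ F0
G, F = weighted_nmf(X, sigma=np.ones_like(X), p=2)
err = np.linalg.norm(X - G @ F) / np.linalg.norm(X)
```

With non-uniform σ the same updates down-weight noisy elements, which is the point of the weighted formulation.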


Journal ArticleDOI
TL;DR: The approach is contrasted with other approaches which use theoretical or knowledge-based models, and its potential is illustrated using a detailed simulation study of a semibatch reactor for the production of styrene-butadiene latex.
Abstract: Multivariate statistical procedures for monitoring the progress of batch processes are developed. The only information needed to exploit the procedures is a historical database of past successful batches. Multiway principal component analysis is used to extract the information in the multivariate trajectory data by projecting them onto low-dimensional spaces defined by the latent variables or principal components. This leads to simple monitoring charts, consistent with the philosophy of statistical process control, which are capable of tracking the progress of new batch runs and detecting the occurrence of observable upsets. The approach is contrasted with other approaches which use theoretical or knowledge-based models, and its potential is illustrated using a detailed simulation study of a semibatch reactor for the production of styrene-butadiene latex.

1,435 citations
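The core of multiway PCA is unfolding the three-way batch array (batches × variables × time) into a two-way matrix and applying ordinary PCA; a minimal sketch on synthetic batch trajectories (the dimensions, the nominal profile, and the squared prediction error (SPE) statistic below are illustrative assumptions, not the paper's simulation):

```python
import numpy as np

# Batch data: I batches x J variables x K time points (synthetic).
rng = np.random.default_rng(0)
I, J, K = 30, 4, 50
t = np.linspace(0, 1, K)
profile = np.sin(np.pi * t)                       # nominal trajectory
X = profile[None, None, :] + 0.05 * rng.standard_normal((I, J, K))

# Multiway PCA: unfold to (batches x variable-time) and do ordinary PCA.
Xu = X.reshape(I, J * K)
mu = Xu.mean(axis=0)
Xc = Xu - mu
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
R = 2                                             # retained components
T = U[:, :R] * s[:R]                              # batch scores
P = Vt[:R]                                        # loadings

# Residual (SPE) of a new batch measures deviation from past behaviour.
def spe_of(batch):
    x = batch.reshape(-1) - mu
    return np.sum((x - P.T @ (P @ x)) ** 2)

good = profile[None, :] + 0.05 * rng.standard_normal((J, K))
bad = (profile + 0.5)[None, :] + 0.05 * rng.standard_normal((J, K))
spe, spe_bad = spe_of(good), spe_of(bad)
```

A normal batch has a small SPE; the upset batch (offset trajectory) falls far outside the model plane, which is what the monitoring charts detect.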


Journal ArticleDOI
TL;DR: An unconventional procedure (fuzzy coding) to structure biological and environmental information, which uses positive scores to describe the affinity of a species for different modalities (i.e. categories) of a given variable is presented.
Abstract: SUMMARY 1. We present an unconventional procedure (fuzzy coding) to structure biological and environmental information, which uses positive scores to describe the affinity of a species for different modalities (i.e. categories) of a given variable. Fuzzy coding is essential for the synthesis of long-term ecological data because it enables analysis of diverse kinds of biological information derived from a variety of sources (e.g. samples, literature). 2. A fuzzy coded table can be processed by correspondence analysis. An example using aquatic beetles illustrates the properties of such a fuzzy correspondence analysis. Fuzzy coded tables were used in all articles of this issue to examine relationships between spatial-temporal habitat variability and species traits, which were obtained from a long-term study of the Upper Rhone River, France. 3. Fuzzy correspondence analysis can be programmed with the equations given in this paper or can be performed using ADE (Environmental Data Analysis) software that has been adapted to analyse such long-term ecological data. On Apple Macintosh™ computers, ADE performs simple linear ordination, more recently developed methods (e.g. principal component analysis with respect to instrumental variables, canonical correspondence analysis, co-inertia analysis, local and spatial analyses), and provides a graphical display of results of these and other types of analysis (e.g. biplot, mapping, modelling curves). 4. ADE consists of a program library that exploits the potential of the HyperCard™ interface. ADE is an open system, which offers the user a variety of facilities to create a specific sequence of programs. The mathematical background of ADE is supported by the algebraic model known as ‘duality diagram’.

784 citations


Journal ArticleDOI
TL;DR: Co-inertia analysis as mentioned in this paper is an extension of the analysis of cross tables previously attempted by others, which is particularly suitable for the simultaneous detection of faunistic and environmental features in studies of ecosystem structure.
Abstract: SUMMARY 1. Methods used for the study of species–environment relationships can be grouped into: (i) simple indirect and direct gradient analysis and multivariate direct gradient analysis (e.g. canonical correspondence analysis), all of which search for non-symmetric patterns between environmental data sets and species data sets; and (ii) analysis of juxtaposed tables, canonical correlation analysis, and intertable ordination, which examine species–environment relationships by considering each data set equally. Different analytical techniques are appropriate for fulfilling different objectives. 2. We propose a method, co-inertia analysis, that can synthesize various approaches encountered in the ecological literature. Co-inertia analysis is based on the mathematically coherent Euclidean model and can be universally reproduced (i.e. independently of software) because of its numerical stability. The method performs simultaneous analysis of two tables. The optimizing criterion in co-inertia analysis is that the resulting sample scores (environmental scores and faunistic scores) are the most covariant. Such analysis is particularly suitable for the simultaneous detection of faunistic and environmental features in studies of ecosystem structure. 3. The method was demonstrated using faunistic and environmental data from Friday (Freshwater Biology 18, 87-104, 1987). In this example, non-symmetric analysis is inappropriate because of the large number of variables (species and environmental variables) compared with the small number of samples. 4. Co-inertia analysis is an extension of the analysis of cross tables previously attempted by others. It serves as a general method to relate any kind of data set, using any kind of standard analysis (e.g. principal components analysis, correspondence analysis, multiple correspondence analysis) or between-class and within-class analyses.

759 citations
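For the simplest case, two quantitative tables each analysed by centred PCA, the co-inertia criterion of maximally covariant sample scores reduces to an SVD of the cross-covariance matrix between the tables (a sketch under that assumption; the published method covers a much wider family of analyses and weightings):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40                                   # samples (e.g. sites)
gradient = rng.standard_normal(n)        # shared latent gradient
# Two tables driven by the same gradient: "species" and "environment".
Y = np.outer(gradient, rng.standard_normal(5)) + 0.1 * rng.standard_normal((n, 5))
Z = np.outer(gradient, rng.standard_normal(3)) + 0.1 * rng.standard_normal((n, 3))
Yc, Zc = Y - Y.mean(0), Z - Z.mean(0)

# Co-inertia axes: SVD of the cross-covariance matrix of the two tables.
C = Yc.T @ Zc / n
U, s, Vt = np.linalg.svd(C, full_matrices=False)
u1, v1 = U[:, 0], Vt[0]
a, b = Yc @ u1, Zc @ v1                  # maximally covariant sample scores

cov_ab = (a * b).mean()                  # equals the first singular value
```

The first pair of axes maximizes cov(a, b); by construction that covariance equals the leading singular value of C.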


Journal ArticleDOI
TL;DR: The method of Parallel Factor Analysis, which simultaneously fits multiple two-way arrays or ‘slices’ of a three-way array in terms of a common set of factors with differing relative weights in each ‘slice’ is reviewed.

502 citations
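Parallel Factor Analysis fits each slice X_k ≈ A diag(c_k) Bᵀ with common factor matrices A and B and slice-specific weights c_k. A minimal alternating least squares sketch (not the reviewed paper's algorithm; the dimensions, rank, and iteration count are illustrative):

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product of two factor matrices."""
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, A.shape[1])

def parafac(X, R, n_iter=200, seed=0):
    """ALS sketch of PARAFAC for a 3-way array X of shape (I, J, K).

    Returns A (IxR), B (JxR), C (KxR) with
    X[i, j, k] ~ sum_r A[i, r] * B[j, r] * C[k, r].
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.standard_normal((I, R))
    B = rng.standard_normal((J, R))
    C = rng.standard_normal((K, R))
    X0 = X.reshape(I, J * K)                      # mode-1 unfolding
    X1 = np.moveaxis(X, 1, 0).reshape(J, I * K)   # mode-2 unfolding
    X2 = np.moveaxis(X, 2, 0).reshape(K, I * J)   # mode-3 unfolding
    for _ in range(n_iter):
        # Each step is a linear least squares problem in one factor.
        A = X0 @ np.linalg.pinv(khatri_rao(B, C).T)
        B = X1 @ np.linalg.pinv(khatri_rao(A, C).T)
        C = X2 @ np.linalg.pinv(khatri_rao(A, B).T)
    return A, B, C

# Synthetic check: recover an exact rank-2 three-way array.
rng = np.random.default_rng(1)
A0 = rng.standard_normal((6, 2))
B0 = rng.standard_normal((5, 2))
C0 = rng.standard_normal((4, 2))
X = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
A, B, C = parafac(X, 2)
Xhat = np.einsum('ir,jr,kr->ijk', A, B, C)
err = np.linalg.norm(X - Xhat) / np.linalg.norm(X)
```

Unlike two-way PCA, the trilinear model has no rotational freedom, which is why PARAFAC solutions are often uniquely interpretable.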


Book
07 Jun 1994
TL;DR: The Ordinary Principal Components Model; Factor Analysis; Factor Analysis of Correlated Observations; Ordinal and Nominal Random Data; Other Models for Discrete Data; Factor Analysis and Least Squares Regression; Exercises; References.
Abstract: Preliminaries; Matrices, Vector Spaces; The Ordinary Principal Components Model; Statistical Testing of the Ordinary Principal Components Model; Extensions of the Ordinary Principal Components Model; Factor Analysis; Factor Analysis of Correlated Observations; Ordinal and Nominal Random Data; Other Models for Discrete Data; Factor Analysis and Least Squares Regression; Exercises; References; Index.

448 citations


Journal ArticleDOI
TL;DR: A class of nonlinear PCA (principal component analysis) type learning algorithms is derived by minimizing a general statistical signal representation error; several known algorithms emerge as special cases of these optimization approaches, which provide useful information on the properties of the algorithms.

396 citations


Book
01 May 1994
TL;DR: This text explains to new readers the various methods of multivariate analysis used in archaeological practice, including: principal component analysis; correspondence analysis; cluster analysis; and discriminant analysis.
Abstract: This text explains to new readers the various methods of multivariate analysis used in archaeological practice. It focuses on the techniques available, including: principal component analysis; correspondence analysis; cluster analysis; and discriminant analysis. Critically reviewing their use in practice, the book describes other areas in which they could be usefully applied. Methods where software packages are available are emphasized.

365 citations


Journal ArticleDOI
TL;DR: Principal components analysis and projections to latent structures are generalized to dynamically updated models for modelling processes with memory and drift, and predictive control schemes based on these models are discussed.

251 citations


Journal ArticleDOI
TL;DR: A neural network model (APEX) for multiple principal component extraction that is applicable to the constrained PCA problem where the signal variance is maximized under external orthogonality constraints and the exponential convergence of the network is formally proved.
Abstract: The authors describe a neural network model (APEX) for multiple principal component extraction. All the synaptic weights of the model are trained with the normalized Hebbian learning rule. The network structure features a hierarchical set of lateral connections among the output units which serve the purpose of weight orthogonalization. This structure also allows the size of the model to grow or shrink without need for retraining the old units. The exponential convergence of the network is formally proved, and significant performance improvement over previous methods is demonstrated. By establishing an important connection with the recursive least squares algorithm, they have been able to provide the optimal value of the learning step-size parameter, which leads to a significant improvement in convergence speed. This is in contrast with previous neural PCA models, which lack such numerical advantages. The APEX algorithm is also parallelizable, allowing the concurrent extraction of multiple principal components. Furthermore, APEX is shown to be applicable to the constrained PCA problem where the signal variance is maximized under external orthogonality constraints. They then study various principal component analysis (PCA) applications that might benefit from the adaptive solution offered by APEX. In particular, they discuss applications in spectral estimation, signal detection, and image compression and filtering, while other application domains are also briefly outlined.
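The combination of Hebbian feedforward weights and anti-Hebbian lateral connections can be sketched for two output units (a toy illustration with hand-picked step sizes and synthetic Gaussian data, not the paper's APEX implementation with its RLS-derived optimal step size):

```python
import numpy as np

rng = np.random.default_rng(0)
cov = np.diag([9.0, 4.0, 1.0])            # true PCs = coordinate axes
L = np.linalg.cholesky(cov)
eta = 0.002                               # illustrative step size
w1 = rng.standard_normal(3); w1 /= np.linalg.norm(w1)
w2 = rng.standard_normal(3); w2 /= np.linalg.norm(w2)
c = 0.0                                   # lateral weight, unit 1 -> unit 2

for _ in range(60000):
    x = L @ rng.standard_normal(3)        # sample from N(0, cov)
    y1 = w1 @ x
    w1 += eta * (y1 * x - y1**2 * w1)     # normalized Hebbian (Oja) rule
    y2 = w2 @ x - c * y1                  # lateral inhibition deflates PC1
    w2 += eta * (y2 * x - y2**2 * w2)
    c += eta * (y1 * y2 - y2**2 * c)      # anti-Hebbian lateral update

# Cosines between learned weights and the true first/second eigenvectors.
align1 = abs(w1[0]) / np.linalg.norm(w1)
align2 = abs(w2[1]) / np.linalg.norm(w2)
```

The lateral weight c learns to cancel the first component's contribution at the second unit, so w2 settles on the second principal direction without explicit orthogonalization.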

Journal ArticleDOI
TL;DR: There is an application section demonstrating several rules of interpretation of loading plots with examples taken from environmental chemistry, analysis of complex round robin tests and contamination analysis in tungsten wire production.

Journal ArticleDOI
TL;DR: The results suggest that the first principal component shows qualitatively different behavior from higher principal components and is associated with apparent barrier crossing events on an anharmonic conformational energy surface.
Abstract: A comparison is made between a 200-ps molecular dynamics simulation in vacuum and a normal mode analysis on the protein bovine pancreatic trypsin inhibitor (BPTI) in order to elucidate the dual aspects of harmonicity and anharmonicity in the dynamics of proteins. The molecular dynamics trajectory is analyzed using principal component analysis, an effective harmonic analysis suited for comparison with the results from the normal mode analysis. The results suggest that the first principal component shows qualitatively different behavior from higher principal components and is associated with apparent barrier crossing events on an anharmonic conformational energy surface. The higher principal components appear to have probability distributions that are well approximated by Gaussians, indicating harmonicity. Eliminating the contribution from the first principal component reveals a great deal of correspondence between the 2 methods. This correspondence, however, involves a factor of 2, as the variances of the distribution of the higher principal components are, on average, roughly twice those found from the normal mode analysis. A model is proposed to reconcile these results with those from previous analyses.

Journal ArticleDOI
TL;DR: In this paper, a nonlinear generalization of principal components analysis (PCA) is developed for curve and surface reconstruction and to data summarization, and a principal surface of the data is constructed adaptively, using some ideas from the MARS procedure of Friedman.
Abstract: We develop a nonlinear generalization of principal components analysis. A principal surface of the data is constructed adaptively, using some ideas from the MARS procedure of Friedman. We explore applications to curve and surface reconstruction and to data summarization.

Proceedings ArticleDOI
29 Jun 1994
TL;DR: The authors present an NLPCA method which integrates the principal curve algorithm and neural networks and the results show that the method is excellent for solving nonlinear principal component problems.
Abstract: Many applications of principal component analysis (PCA) can be found in the literature. But principal component analysis is a linear method, and most engineering problems are nonlinear. Sometimes using the linear PCA method in nonlinear problems can bring distorted and misleading results. So there is a need for a nonlinear principal component analysis (NLPCA) method. The principal curve algorithm was a breakthrough in solving the NLPCA problem, but the algorithm does not yield an NLPCA model which can be used for predictions. In this paper the authors present an NLPCA method which integrates the principal curve algorithm and neural networks. The results on both simulated and real problems show that the method is excellent for solving nonlinear principal component problems. Potential applications of NLPCA are also discussed in this paper.

Journal ArticleDOI
TL;DR: In this study a data base of heterogeneous organic compounds from the guinea pig maximization test has been subjected to multivariate QSAR analysis and the structural alerts may be better employed in an expert system, to identify potential hazard, where they will not suffer the limitations of a statistical model.
Abstract: There is a regulatory requirement for the potential of a new chemical to cause skin sensitization to be assessed. This requirement is presently fulfilled by the use of animal tests. In this study a data base of heterogeneous organic compounds from the guinea pig maximization test has been subjected to multivariate QSAR analysis. The compounds were described both by whole molecule parameters and structural features associated with likely sites of reactivity. Principal component analysis was applied to the data set and although it functions reasonably well to reduce the dimensionality of a large data matrix, it is only moderately useful as a predictive tool when descriptors were chosen rationally. Stepwise discriminant analysis produces a fourteen-parameter model, of which twelve were structural features associated with reactivity. This, however, predicts only 82.6% of compounds correctly after cross validation. There is a trend for the linear discriminant analysis model to predict compounds as non-sensitizers, suggesting that the parameters incorporated were not wholly suitable for discriminating between the two classes. Another criticism of linear discriminant analysis is that it may be unable to cope with the likely embedded data structure. With this in mind, the structural alerts may be better employed in an expert system, to identify potential hazard, where they will not suffer the limitations of a statistical model.

Journal ArticleDOI
TL;DR: The effects of the alignment procedure on the PCA are demonstrated for a set of chromatographic profiles intended for peptide mapping, and the problem is analysed in terms of parameter variations for exponentially modified Gaussian peaks.

Proceedings Article
01 Jan 1994
TL;DR: An EM-based algorithm in which the M-step is computationally straightforward principal components analysis (PCA), and incorporating tangent-plane information about expected local deformations only requires adding tangent vectors into the sample covariance matrices for the PCA, and it demonstrably improves performance.
Abstract: We construct a mixture of locally linear generative models of a collection of pixel-based images of digits, and use them for recognition. Different models of a given digit are used to capture different styles of writing, and new images are classified by evaluating their log-likelihoods under each model. We use an EM-based algorithm in which the M-step is computationally straightforward principal components analysis (PCA). Incorporating tangent-plane information [12] about expected local deformations only requires adding tangent vectors into the sample covariance matrices for the PCA, and it demonstrably improves performance.
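The classify-by-generative-model idea, stripped of the EM mixture and the tangent vectors, can be sketched by fitting one PCA per class and assigning new points to the class whose subspace reconstructs them best (synthetic 3-D data; all names and dimensions are our own):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_class(center, direction, n=200):
    """Points spread along one dominant direction around a class center."""
    t = rng.standard_normal(n)
    return center + np.outer(t, direction) + 0.05 * rng.standard_normal((n, 3))

classes = {
    0: make_class(np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0])),
    1: make_class(np.array([3.0, 3.0, 0.0]), np.array([0.0, 1.0, 0.0])),
}

# Per-class PCA model: class mean plus leading principal direction(s).
models = {}
for k, Xk in classes.items():
    mu = Xk.mean(0)
    _, _, Vt = np.linalg.svd(Xk - mu, full_matrices=False)
    models[k] = (mu, Vt[:1])            # keep one component per class

def classify(x):
    """Assign x to the class whose PCA model reconstructs it best."""
    errs = {}
    for k, (mu, P) in models.items():
        d = x - mu
        errs[k] = np.sum((d - P.T @ (P @ d)) ** 2)
    return min(errs, key=errs.get)

pred0 = classify(np.array([0.5, 0.0, 0.0]))
pred1 = classify(np.array([3.0, 3.5, 0.0]))
```

In the paper, reconstruction error is replaced by a proper log-likelihood and several local models per digit are fitted with EM; the decision rule above is the same in spirit.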

Journal ArticleDOI
TL;DR: This article uses data published recently in this journal to show how PCA can assist in their evaluation, and attempts to bridge the gap between the theory and the applications of PCA.

Journal ArticleDOI
TL;DR: To record three-dimensional coordinates of the joints from normal human subjects during locomotion, a digital motion analysis system (ELITE) was used, and it was possible to accurately resolve small distributed changes in gait patterns within subjects.
Abstract: To record three-dimensional coordinates of the joints from normal human subjects during locomotion, we used a digital motion analysis system (ELITE). Recordings were obtained under several different conditions, which included normal walking and stepping over obstacles. Principal component analysis was used to analyze coordinate data after conversion of the data to segmental angles. This technique gave a stable summary of the redundancy in gait kinematic data in the form of reduced variables (principal components). By modeling the shapes of the phase plots of reduced variables (distortion analysis) and using a limited number of model parameters, good resolution was obtained between subtly different conditions. Hence, it was possible to accurately resolve small distributed changes in gait patterns within subjects. These methods seem particularly suited to longitudinal studies in which relevant movement features are not known a priori. Assumptions and neurophysiological applications are discussed.

Journal ArticleDOI
TL;DR: In this paper, RQ-mode principal components analysis (PCA) is used for calculating variable and object loadings on the same axes, so that elements can be displayed along with data points on a single diagram.
Abstract: RQ-mode principal components analysis (PCA) is a means for calculating variable and object loadings on the same axes, so that elements can be displayed along with data points on a single diagram. The biplots resulting from RQ-mode PCA preserve both Euclidean relations among the objects and variance-covariance structure. When used with data on the chemical composition of archaeological pottery, such biplots facilitate recognizing compositional subgroups and determining the chemical basis of group separation. RQ-mode PCA is illustrated in this paper with neutron activation data on Mesoamerican Plumbate pottery.
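One common scaling that puts objects and variables on the same axes assigns the singular values to the object scores, so inter-object Euclidean distances are preserved and the biplot inner product reconstructs the centred data (a sketch of that scaling; it may not match the paper's exact RQ-mode convention):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((12, 4))        # e.g. 12 sherds x 4 elements
Xc = X - X.mean(0)                      # column-centred composition data

# SVD-based biplot: object scores carry the singular values, variable
# loadings are the right singular vectors.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
objects = U * s                          # row (object) coordinates
variables = Vt.T                         # column (variable) coordinates

# The biplot inner product recovers the centred data exactly.
recon = objects @ variables.T
```

Plotting the first two columns of `objects` and `variables` on one diagram gives the biplot: objects cluster by composition, and each variable's arrow shows which elements drive the separation.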

Journal ArticleDOI
TL;DR: The proposed methodology can enhance clinically interesting information in a dynamic PET imaging sequence in the first few principal component images and thus should be able to aid in the identification of structures for further analysis.
Abstract: Multivariate image analysis can be used to analyse multivariate medical images. The purpose could be to visualize or classify structures in the image. One common multivariate image analysis technique which can be used for visualization purposes is principal component analysis (PCA). The present work concerns visualization of organs and structures with different kinetics in a dynamic sequence utilizing PCA. When applying PCA on positron emission tomography (PET) images, the result is initially not satisfactory. It is illustrated that one major explanation for the behaviour of PCA when applied to PET images is that it is a data-driven technique which cannot separate signals from high noise levels. With a better understanding of the PCA, gained with a strategy of examining the image data set, the transformations, and the results using visualization tools, a surprisingly easily understood methodology can be derived. The proposed methodology can enhance clinically interesting information in a dynamic PET imaging sequence in the first few principal component images and thus should be able to aid in the identification of structures for further analysis.

Journal ArticleDOI
TL;DR: Cases with unresolved chromatographic peaks, where diagnosis and subsequent resolution using common procedures from evolutionary factor analysis fail, are investigated and discussed in some detail, and a new procedure, called sequential rank analysis, is developed to solve the problem of embedded peaks.

Journal ArticleDOI
TL;DR: The studies presented here support the idea that the information useful for solving seemingly complex tasks such as face categorization or identification can be described using simple linear models (linear autoassociator or principal component analysis) in conjunction with a pixel-based coding of the faces.
Abstract: Recent statistical/neural network models of face processing suggest that faces can be efficiently represented in terms of the eigendecomposition of a matrix storing pixel-based descriptions of a set of face images. The studies presented here support the idea that the information useful for solving seemingly complex tasks such as face categorization or identification can be described using simple linear models (linear autoassociator or principal component analysis) in conjunction with a pixel-based coding of the faces.

Journal ArticleDOI
TL;DR: The objectives of this study were to make a broader evaluation of PCA and multiple regression analysis (MRA) and to establish guidelines under which one approach is preferable to the other.

Journal ArticleDOI
TL;DR: The matrix for the noise-adjusted principal components (NAPC) transform is the solution of a generalized symmetric eigenvalue problem; applied to remote sensing imagery, this entails the simultaneous diagonalization of data and noise covariance matrices.
Abstract: The matrix for the noise-adjusted principal components (NAPC) transform is the solution of a generalized symmetric eigenvalue problem. Applied to remote sensing imagery, this entails the simultaneous diagonalization of data and noise covariance matrices. One of the two PC transforms of the original NAPC transform is replaced by several short, fast procedures. The total operation count for the computation of the NAPC transform matrix is halved.
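The generalized symmetric eigenvalue problem behind NAPC can be solved directly with a library routine, and the resulting transform simultaneously diagonalizes the data and noise covariances (a sketch on synthetic covariance matrices; the paper's contribution is a faster way to compute this, which is not reproduced here):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
# Synthetic covariances: data = low-rank signal + non-white noise.
A = rng.standard_normal((5, 5))
noise_cov = A @ A.T + 5 * np.eye(5)
signal = rng.standard_normal((5, 2))
data_cov = signal @ signal.T + noise_cov

# NAPC transform matrix: solve data_cov @ w = lam * noise_cov @ w and
# order the components by decreasing generalized eigenvalue (SNR order).
vals, W = eigh(data_cov, noise_cov)      # ascending from scipy
vals, W = vals[::-1], W[:, ::-1]

# W simultaneously diagonalizes both matrices: noise becomes identity,
# data becomes the diagonal of generalized eigenvalues.
D_noise = W.T @ noise_cov @ W
D_data = W.T @ data_cov @ W
```

Components with eigenvalues near 1 carry no signal beyond the noise floor, which gives a principled ordering for band selection in noisy imagery.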

Journal ArticleDOI
TL;DR: The asymptotic convergence of the network to the principal (normalized) singular vectors of the cross-correlation matrix of two stochastic signals is proved, and simulation results suggest that the convergence is exponential.
Abstract: In this paper we provide theoretical foundations for a new neural model for singular value decomposition based on an extension of the Hebbian learning rule called the cross-coupled Hebbian rule. The model extracts the SVD of the cross-correlation matrix of two stochastic signals and is an extension of previous work on neural-network-related principal component analysis (PCA). We prove the asymptotic convergence of the network to the principal (normalized) singular vectors of the cross-correlation, and we provide simulation results which suggest that the convergence is exponential. The new model may have useful applications in the problems of filtering for signal processing and signal detection.

Proceedings ArticleDOI
29 Jun 1994
TL;DR: In this paper, the authors examined and compared various stopping rules and cross-validation procedures for the selection of the number of principal components to retain for the development of models for the on-line disturbance detection and isolation in statistical process control.
Abstract: This paper examines and compares various stopping rules and cross-validation procedures for the selection of the number of principal components to retain for the development of models for the on-line disturbance detection and isolation in statistical process control. The specific methods investigated are: the percent of variance, the scree test, parallel analysis, and the PRESS statistic. Although comparisons of this type have been published previously for other types of problems, where the data used were static, it is necessary to determine what will work best for the problems of disturbance detection and isolation utilizing data that might also have dynamic information.
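Two of the stopping rules named above, percent of variance and parallel analysis, can be sketched on synthetic data with two real components (the 90% cutoff, the permutation-based null, and all dimensions are illustrative choices, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 8
# Two genuine components plus unit-variance noise.
loadings = np.zeros((2, p))
loadings[0, :4] = 1.0
loadings[1, 4:] = 1.0
scores = rng.standard_normal((n, 2)) * np.array([5.0, 4.0])
X = scores @ loadings + rng.standard_normal((n, p))
Xc = X - X.mean(0)
eig = np.linalg.svd(Xc, compute_uv=False) ** 2 / (n - 1)

# Rule 1: percent of variance -- keep enough PCs to explain 90%.
cum = np.cumsum(eig) / eig.sum()
k_pct = int(np.searchsorted(cum, 0.90) + 1)

# Rule 2: parallel analysis -- keep PCs whose eigenvalues exceed the
# mean eigenvalues obtained after independently permuting each column,
# which destroys the correlation structure but keeps the variances.
null = np.zeros(p)
n_perm = 20
for _ in range(n_perm):
    Xp = np.column_stack([rng.permutation(col) for col in Xc.T])
    null += np.linalg.svd(Xp - Xp.mean(0), compute_uv=False) ** 2 / (n - 1)
null /= n_perm
k_pa = int(np.sum(eig > null))
```

Both rules recover the true dimensionality here; on real process data with autocorrelation they can disagree, which is the comparison the paper undertakes.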

Journal ArticleDOI
TL;DR: In this paper, different formulations of the generalized rank annihilation method (GRAM) are compared and a slightly different eigenvalue problem is derived, which facilitates a comparison with other PCA-based methods for curve resolution and calibration.
Abstract: SUMMARY Rank annihilation factor analysis (RAFA) is a method for multicomponent calibration using two data matrices simultaneously, one for the unknown and one for the calibration sample. In its most general form, the generalized rank annihilation method (GRAM), an eigenvalue problem has to be solved. In this first paper, different formulations of GRAM are compared and a slightly different eigenvalue problem will be derived. The eigenvectors of this specific eigenvalue problem constitute the transformation matrix that rotates the abstract factors from principal component analysis (PCA) into their physical counterparts. This reformulation of GRAM facilitates a comparison with other PCA-based methods for curve resolution and calibration. Furthermore, we will discuss two characteristics common to all formulations of GRAM, i.e. the distinct possibility of a complex and degenerate solution. It will be shown that a complex solution, contrary to degeneracy, should not arise for components present in both samples for model data.

Journal ArticleDOI
TL;DR: In this article, the potential of principal component regression (PCR) for mixture resolution by UV-visible spectrophotometry was assessed, and the number of significant principal components was determined on the basis of four different criteria.
Abstract: The potential of principal component regression (PCR) for mixture resolution by UV-visible spectrophotometry was assessed. For this purpose, a set of binary mixtures with Gaussian bands was simulated, and the influence of spectral overlap on the precision of quantification was studied. Likewise, the results obtained in the resolution of a mixture of components with extensively overlapped spectra were investigated in terms of spectral noise and the criterion used to select the optimal number of principal components. The model was validated by cross-validation, and the number of significant principal components was determined on the basis of four different criteria. Three types of noise were considered: intrinsic instrumental noise, which was modeled from experimental data provided by an HP 8452A diode array spectrophotometer; constant baseline shifts; and baseline drift. Introducing artificial baseline alterations in some samples of the calibration matrix was found to increase the reliability of the proposed method in routine analysis. The method was applied to the analysis of mixtures of Ti, Al, and Fe by resolving the spectra of their 8-hydroxyquinoline complexes previously extracted into chloroform.