
Showing papers on "Principal component analysis published in 1991"


Book
13 Mar 1991
TL;DR: In this book, the authors provide a complete guide to PCA, covering linear models (regression PCA of predictor variables and analysis-of-variance PCA of response variables), a directory of symbols and definitions for PCA, and some classic examples of PCA applications.
Abstract: Preface. Introduction. 1. Getting Started. 2. PCA with More Than Two Variables. 3. Scaling of Data. 4. Inferential Procedures. 5. Putting It All Together - Hearing Loss I. 6. Operations with Group Data. 7. Vector Interpretation I: Simplifications and Inferential Techniques. 8. Vector Interpretation II: Rotation. 9. A Case History - Hearing Loss II. 10. Singular Value Decomposition: Multidimensional Scaling I. 11. Distance Models: Multidimensional Scaling II. 12. Linear Models I: Regression PCA of Predictor Variables. 13. Linear Models II: Analysis of Variance PCA of Response Variables. 14. Other Applications of PCA. 15. Flatland: Special Procedures for Two Dimensions. 16. Odds and Ends. 17. What is Factor Analysis Anyhow? 18. Other Competitors. Conclusion. Appendix A. Matrix Properties. Appendix B. Matrix Algebra Associated with Principal Component Analysis. Appendix C. Computational Methods. Appendix D. A Directory of Symbols and Definitions for PCA. Appendix E. Some Classic Examples. Appendix F. Data Sets Used in This Book. Appendix G. Tables. Bibliography. Author Index. Subject Index.

3,534 citations


Journal ArticleDOI
TL;DR: The NLPCA method is demonstrated using time-dependent, simulated batch reaction data and shows that it successfully reduces dimensionality and produces a feature space map resembling the actual distribution of the underlying system parameters.
Abstract: Nonlinear principal component analysis is a novel technique for multivariate data analysis, similar to the well-known method of principal component analysis. NLPCA, like PCA, is used to identify and remove correlations among problem variables as an aid to dimensionality reduction, visualization, and exploratory data analysis. While PCA identifies only linear correlations between variables, NLPCA uncovers both linear and nonlinear correlations, without restriction on the character of the nonlinearities present in the data. NLPCA operates by training a feedforward neural network to perform the identity mapping, where the network inputs are reproduced at the output layer. The network contains an internal “bottleneck” layer (containing fewer nodes than input or output layers), which forces the network to develop a compact representation of the input data, and two additional hidden layers. The NLPCA method is demonstrated using time-dependent, simulated batch reaction data. Results show that NLPCA successfully reduces dimensionality and produces a feature space map resembling the actual distribution of the underlying system parameters.
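To make the bottleneck idea concrete, here is a minimal NumPy sketch of an autoencoder of the shape the abstract describes (mapping layer, linear bottleneck, demapping layer, trained to reproduce its input). The layer widths, tanh activations, learning rate, and toy data are illustrative assumptions, not the authors' original network or training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a 1-D nonlinear curve embedded in 3-D, plus noise.
t = rng.uniform(-1.0, 1.0, size=(500, 1))
X = np.hstack([t, t**2, np.sin(3 * t)]) + 0.02 * rng.normal(size=(500, 3))
X = (X - X.mean(0)) / X.std(0)           # standardize before NLPCA, as with PCA

n, d = X.shape
m, k = 8, 1                              # hidden-layer width, bottleneck size
W1 = rng.normal(0, 0.3, (d, m)); b1 = np.zeros(m)
W2 = rng.normal(0, 0.3, (m, k)); b2 = np.zeros(k)
W3 = rng.normal(0, 0.3, (k, m)); b3 = np.zeros(m)
W4 = rng.normal(0, 0.3, (m, d)); b4 = np.zeros(d)

lr = 0.05
for epoch in range(3000):
    # Forward pass: the identity mapping through the bottleneck.
    H1 = np.tanh(X @ W1 + b1)            # mapping layer
    Z = H1 @ W2 + b2                     # linear bottleneck (the nonlinear scores)
    H2 = np.tanh(Z @ W3 + b3)            # demapping layer
    Xhat = H2 @ W4 + b4                  # reconstruction of the inputs

    # Backward pass for mean squared reconstruction error.
    E = (Xhat - X) / n
    gW4 = H2.T @ E;  gb4 = E.sum(0)
    dH2 = (E @ W4.T) * (1 - H2**2)
    gW3 = Z.T @ dH2; gb3 = dH2.sum(0)
    dZ = dH2 @ W3.T
    gW2 = H1.T @ dZ; gb2 = dZ.sum(0)
    dH1 = (dZ @ W2.T) * (1 - H1**2)
    gW1 = X.T @ dH1; gb1 = dH1.sum(0)

    for p, g in [(W1, gW1), (b1, gb1), (W2, gW2), (b2, gb2),
                 (W3, gW3), (b3, gb3), (W4, gW4), (b4, gb4)]:
        p -= lr * g

print("reconstruction MSE:", np.mean((Xhat - X) ** 2))
# Z now holds one nonlinear "principal component" score per observation.
```

Because the bottleneck has fewer nodes than the input, the network is forced into a compact representation; with a linear network this reduces to ordinary PCA.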

2,643 citations


Journal ArticleDOI
TL;DR: In this paper, the theory of polynomial splines is applied to multivariate data analysis, where spline smoothing relies on a partition of a function space into two orthogonal subspaces, one containing the obvious or structural components of variation among a set of observed functions, and the other of which contains residual components.
Abstract: Multivariate data analysis permits the study of observations which are finite sets of numbers, but modern data collection situations can involve data, or the processes giving rise to them, which are functions. Functional data analysis involves infinite dimensional processes and/or data. The paper shows how the theory of L-splines can support generalizations of linear modelling and principal components analysis to samples drawn from random functions. Spline smoothing rests on a partition of a function space into two orthogonal subspaces, one of which contains the obvious or structural components of variation among a set of observed functions, and the other of which contains residual components. This partitioning is achieved through the use of a linear differential operator, and we show how the theory of polynomial splines can be applied more generally with an arbitrary operator and associated boundary constraints. These data analysis tools are illustrated by a study of variation in temperature-precipitation patterns among some Canadian weather-stations.
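As a sketch of the penalized criterion behind this kind of smoothing (written in generic notation; the paper works with a general operator L and associated boundary constraints), one minimizes a residual sum of squares plus a roughness penalty:

```latex
\min_{f}\ \sum_{i=1}^{n} \bigl(y_i - f(t_i)\bigr)^2
\;+\; \lambda \int \bigl[(Lf)(t)\bigr]^2 \, dt,
\qquad L = D^{m} + \sum_{j=0}^{m-1} w_j(t)\, D^{j},
```

where D denotes differentiation. Functions in the null space of L carry the "obvious or structural" variation and pass through unpenalized, while the orthogonal residual component is shrunk as λ grows; the familiar cubic smoothing spline is the special case L = D².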

833 citations


Journal Article
TL;DR: A simple principal component color composite image can then be created in which anomalous concentrations of hydroxyl, hydroxyl plus iron-oxide, and iron-oxide are displayed in red-green-blue (RGB) color space.
Abstract: Reducing the number of image bands input for principal component analysis (PCA) ensures that certain materials will not be mapped and increases the likelihood that others will be unequivocally mapped into only one of the principal component images. In arid terrain, PCA of four TM bands will avoid iron-oxide and thus more reliably detect hydroxyl-bearing minerals if only one input band is from the visible spectrum. PCA for iron-oxide mapping will avoid hydroxyls if only one of the SWIR bands is used. A simple principal component color composite image can then be created in which anomalous concentrations of hydroxyl, hydroxyl plus iron-oxide, and iron-oxide are displayed brightly in red-green-blue (RGB) color space. This composite allows qualitative inferences on alteration type and intensity to be made which can be widely applied.
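A minimal NumPy sketch of this workflow, with synthetic arrays standing in for TM bands. The particular band subsets below (TM 1, 4, 5, 7 for the hydroxyl run and TM 1, 3, 4, 5 for the iron-oxide run) are our reading of the "one visible band" / "one SWIR band" rule and should be checked against the paper, as should the choice of which PC carries the alteration signal.

```python
import numpy as np

def principal_component_images(bands):
    """PCA of a list of 2-D band images; returns PC images with the same spatial shape."""
    shape = bands[0].shape
    X = np.stack([b.ravel() for b in bands], axis=1).astype(float)
    X -= X.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(X, rowvar=False))
    order = np.argsort(vals)[::-1]                # sort PCs by decreasing variance
    scores = X @ vecs[:, order]
    return [scores[:, i].reshape(shape) for i in range(scores.shape[1])]

def stretch(img):
    """Linear 2% stretch to [0, 1] for display."""
    lo, hi = np.percentile(img, [2, 98])
    return np.clip((img - lo) / (hi - lo + 1e-12), 0, 1)

# Synthetic placeholders for Landsat TM bands (row x col reflectance images).
rng = np.random.default_rng(1)
tm = {b: rng.random((100, 100)) for b in [1, 2, 3, 4, 5, 7]}

hyd_pcs = principal_component_images([tm[1], tm[4], tm[5], tm[7]])   # hydroxyl run
iron_pcs = principal_component_images([tm[1], tm[3], tm[4], tm[5]])  # iron-oxide run

# The alteration-related PC is typically a low-variance one whose loadings contrast
# the diagnostic bands; here we simply take the last PC from each run as a stand-in.
H, F = hyd_pcs[-1], iron_pcs[-1]
rgb = np.dstack([stretch(H), stretch(0.5 * (H + F)), stretch(F)])    # R=hydroxyl, G=both, B=iron
print(rgb.shape)  # (100, 100, 3)
```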

359 citations



Journal ArticleDOI
TL;DR: Results show that the theoretically derived normalization of the data set substantially improves the classification of vapours and beverages and it is suggested that the separation may be improved further by employing other sensor types or processing techniques.
Abstract: Mathematical expressions describing the response of individual sensors and arrays of tin oxide gas sensors are derived from a barrier-limited electron mobility model. From these expressions, the fractional change in conductance is identified as the optimal response parameter with which to characterize sensor array performance instead of the more usual relative conductance. In an experimental study, twelve tin oxide gas sensors are exposed to five alcohols and six beverages, and the responses are studied using pattern-recognition methods. Results of regression and supervised learning analysis show a high degree of colinearity in the data with a subset of only five sensors needed for classification. Principal component analysis and clustering methods are applied to the response of the tin oxide sensors to all the vapours. The results show that the theoretically derived normalization of the data set substantially improves the classification of vapours and beverages. The individual alcohols are separated out into five distinct clusters, whereas the beverages cluster into only three distinct classes, namely, beers, lagers and spirits. It is suggested that the separation may be improved further by employing other sensor types or processing techniques.
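The normalization the abstract identifies as optimal is simply the fractional conductance change (G - G0)/G0 rather than the more usual relative conductance G/G0. A small NumPy sketch of that normalization followed by the PCA step (the sensor count matches the paper; the data and PCA details are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n_samples, n_sensors = 60, 12
G0 = rng.uniform(1.0, 5.0, n_sensors)                          # baseline conductances in air
G = G0 * (1 + rng.uniform(0.1, 2.0, (n_samples, n_sensors)))   # conductances in vapour

frac = (G - G0) / G0   # fractional change: the theoretically derived response parameter
# rel = G / G0         # the "more usual" relative conductance, for comparison

# PCA of the normalized responses, as in the paper's pattern-recognition step.
Xc = frac - frac.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * s                       # sample coordinates on the principal components
explained = s**2 / np.sum(s**2)
print("variance explained by PC1, PC2:", explained[:2])
```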

317 citations



01 Jan 1991
TL;DR: The bootstrap procedure provides a way to test the significance of the regression coefficients and the stability of the estimates in response functions generated by regression on principal components. A subroutine, RESBO, which calculates a bootstrapped response function, has been added to Fritts' program PRECON.
Abstract: The bootstrap procedure provides a way to test the significance of the regression coefficients and the stability of the estimates in response functions generated by regression on principal components. A subroutine, RESBO, which calculates a bootstrapped response function, has been added to Fritts' program PRECON. The principle of the response function is described in Fritts (1976) and discussed in Hughes et al. (1982). To avoid problems with the great number of predictors and their intercorrelation, Fritts et al. (1971) introduced regression on principal components. As with all regression methods, the main problems with this procedure are testing the significance of the coefficients and the stability of the estimates. The response function obtained on a sample is considered satisfactory only if it explains the growth over independent years. The most straightforward way to assess the stability is to divide climatic and tree-ring data into a dependent calibration set and an independent verification set (Fritts 1976). If the set of tree-ring indices estimated from the verification-set climate data, using the regression coefficients that were derived from the calibration data set, is close to the observed values, the response function is judged as reliable. Gordon et al. (1982) clearly set out the problem of verifying the predictive ability of a model calibrated on one data set when applied to another data set. Because regression coefficients are validated only on the dependent data, they result in overconfidence in the predictive power of the model. We can be convinced of that by simulating tree-ring indices with random numbers and by calculating response functions with real climatic data (Guiot 1981; Cropper 1985). These authors showed that simulated tree-ring series also can produce regression coefficients judged significant by standard Student's tests. This result is due mainly to an inadequate number of degrees of freedom. To test regression coefficients, Student's test involves n - k - 1 degrees of freedom, where n is the number of observations and k the number of regressors. If k is set to the number of principal components actually introduced into the regression on the basis of their correlation with the predictand (stepwise regression), the significance of the coefficients is overestimated; therefore, the number k must be chosen by a priori considerations independent of the predictand. A good practice is to select a relatively large number of principal components taking into account, say, 90 or 95% of the variance of the climatic data, or using the PVP criterion of Guiot (1981, 1985). The number k is then the number of principal components retained.
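A hedged NumPy sketch of the bootstrap scheme described here (not the RESBO code itself): regress tree-ring indices on the leading principal components of the climate data, resample years with replacement, and judge a coefficient stable when its bootstrap mean is large relative to its bootstrap standard deviation. The function name, 90% variance cutoff, 1.96 threshold, and synthetic data are all illustrative assumptions.

```python
import numpy as np

def bootstrap_response_function(climate, rings, var_frac=0.90, n_boot=1000, seed=0):
    """climate: (years x monthly variables); rings: (years,) tree-ring indices."""
    rng = np.random.default_rng(seed)
    n = climate.shape[0]
    Z = (climate - climate.mean(0)) / climate.std(0)
    y = (rings - rings.mean()) / rings.std()

    # Keep enough principal components to cover ~90% of the climate variance,
    # chosen a priori (independent of the predictand), as the abstract recommends.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    k = int(np.searchsorted(np.cumsum(s**2) / np.sum(s**2), var_frac)) + 1
    scores = U[:, :k] * s[:k]

    coefs = np.empty((n_boot, Z.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, n)                    # resample years with replacement
        beta, *_ = np.linalg.lstsq(scores[idx], y[idx], rcond=None)
        coefs[b] = Vt[:k].T @ beta                     # back to the original variables

    mean, sd = coefs.mean(0), coefs.std(0)
    return mean, sd, np.abs(mean) > 1.96 * sd          # crude stability flag

climate = np.random.default_rng(3).normal(size=(50, 24))   # 50 years, 24 monthly variables
rings = climate[:, 5] - 0.5 * climate[:, 17] + np.random.default_rng(4).normal(size=50)
mean, sd, stable = bootstrap_response_function(climate, rings)
print("stable coefficients:", np.flatnonzero(stable))
```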

251 citations


Journal ArticleDOI
TL;DR: In this paper, a method for structural analysis of multivariate data is proposed that combines features of regression analysis and principal component analysis, which is based on the generalized singular value decomposition of a matrix with certain metric matrices.
Abstract: A method for structural analysis of multivariate data is proposed that combines features of regression analysis and principal component analysis. In this method, the original data are first decomposed into several components according to external information. The components are then subjected to principal component analysis to explore structures within the components. It is shown that this requires the generalized singular value decomposition of a matrix with certain metric matrices. The numerical method based on the QR decomposition is described, which simplifies the computation considerably. The proposed method includes a number of interesting special cases, whose relations to existing methods are discussed. Examples are given to demonstrate practical uses of the method.
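The abstract's first step can be illustrated with a deliberately simplified NumPy sketch: split the data into the part explained by an external information matrix and a residual, then run PCA within each part. This drops the paper's metric matrices and QR-based computation, so it shows the flavor of the method rather than the method itself.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 100, 6
G = rng.normal(size=(n, 2))                 # external information (e.g., design codes)
X = G @ rng.normal(size=(2, p)) + rng.normal(scale=0.5, size=(n, p))
X -= X.mean(0)

# Decompose X into the component explained by G and the residual component.
P = G @ np.linalg.pinv(G)                   # projector onto the column space of G
X_ext, X_res = P @ X, X - P @ X

# PCA within each component to explore its internal structure.
for name, M in [("external", X_ext), ("residual", X_res)]:
    s = np.linalg.svd(M, compute_uv=False)
    print(name, "PC variance shares:", np.round(s**2 / s.dot(s), 3)[:3])
```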

188 citations


Journal ArticleDOI
TL;DR: In this article, the formula for coefficient alpha for a principal component, first derived in 1957 by Kaiser, is developed, and its use in the Kaiser-Guttman Rule for the number of components is discussed, both in theory and in practice, with Hotelling's (1933) original correlation matrix.
Abstract: The formula for coefficient alpha for a principal component, first derived in 1957 by Kaiser, is developed. Its use in the Kaiser-Guttman Rule for the number of components is discussed, both in theory and in practice, with Hotelling's (1933) original correlation matrix.
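For reference, Kaiser's formula (as it is usually stated) for coefficient alpha of a principal component of p standardized variables, where λ is the component's eigenvalue from the correlation matrix, is:

```latex
\alpha \;=\; \frac{p}{p-1}\left(1 - \frac{1}{\lambda}\right),
```

so that α > 0 exactly when λ > 1, which is the connection to the Kaiser-Guttman rule of retaining only components with eigenvalues greater than one.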

152 citations


Journal ArticleDOI
TL;DR: A number of methods for the analysis of three-way data are described and shown to be variants of principal components analysis (PCA) of the two-way supermatrix in which each twoway slice is "strung out" into a column vector.
Abstract: A number of methods for the analysis of three-way data are described and shown to be variants of principal components analysis (PCA) of the two-way supermatrix in which each two-way slice is “strung out” into a column vector. The methods are shown to form a hierarchy such that each method is a constrained variant of its predecessor. A strategy is suggested to determine which of the methods yields the most useful description of a given three-way data set.
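A small NumPy sketch of the "stringing out" construction: each two-way slice of an I × J × K array becomes one column of a JK × I supermatrix, which is then subjected to ordinary PCA (the array sizes and centering choice are illustrative).

```python
import numpy as np

rng = np.random.default_rng(6)
I, J, K = 20, 5, 7                      # e.g., subjects x variables x occasions
data = rng.normal(size=(I, J, K))

# String each J x K slice out into a column vector -> two-way supermatrix.
X = data.reshape(I, J * K).T            # shape (J*K, I); column i is slice i, vectorized

# Ordinary PCA of the supermatrix via SVD of the column-centered matrix.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
print("leading eigenvalues:", np.round(s[:3] ** 2, 2))
```

The constrained variants in the hierarchy (e.g., Tucker- and PARAFAC-type models) restrict the loadings of this unconstrained supermatrix PCA in progressively stronger ways.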


Journal ArticleDOI
TL;DR: In this article, a simple structure rotation of a PCAMIX solution based on the rotation of component scores is proposed, which can be viewed as generalizations of simple structure methods for PCA.
Abstract: Several methods have been developed for the analysis of a mixture of qualitative and quantitative variables, and one, called PCAMIX, includes ordinary principal component analysis (PCA) and multiple correspondence analysis (MCA) as special cases. The present paper proposes several techniques for simple structure rotation of a PCAMIX solution based on the rotation of component scores and indicates how these can be viewed as generalizations of the simple structure methods for PCA. In addition, a recently developed technique for the analysis of mixtures of qualitative and quantitative variables, called INDOMIX, is shown to construct component scores (without rotational freedom) maximizing the quartimax criterion over all possible sets of component scores. A numerical example is used to illustrate the implication that when used for qualitative variables, INDOMIX provides axes that discriminate between the observation units better than do those generated from MCA.

Journal ArticleDOI
01 Feb 1991-Ecology
TL;DR: It is concluded that the eigenvalue tests can be used to detect patterns in PCA's of assemblage structure data, if the number of samples is at least three times the number of species and either a covariance or correlation matrix solution is used.
Abstract: We examined the ability of eigenvalue tests to distinguish field-collected from random, assemblage structure data sets. Eight published time series of species abundances were used in the analysis, including data sets for: fishes, birds, mammals, stream benthos, and crabs. To test the efficacy of eigenvalue tests, we constructed 1000 randomly generated data sets for each real data set, whose means and variances were identical to the means and variances of the original data matrices. The data sets were then subjected to a principal components analysis (PCA) and eigenvalue tests used to identify significant eigenvalues for both correlation and covariance matrix solutions. We also examined the effects of: (1) number of species (= number of variables), (2) number of samples (= replication), and (3) variance structure, on the performance of the test. Using PCA's based on the correlation matrix and with sample sizes typically encountered in the field, the eigenvalue tests generally performed at the .05 level when α = .01. Slightly poorer results were obtained with the covariance matrix. Increasing the number of samples to at least three times the number of species generally gave α-level coverage for an α-level test (i.e., α = .05, .01). Increasing variance in the data set only affected test outcomes at levels of replication less than twice the number of species. We conclude that the eigenvalue tests can be used to detect patterns in PCA's of assemblage structure data, if the number of samples is at least three times the number of species and either a covariance or correlation matrix solution is used. It is assumed that these patterns represent ecologically meaningful patterns of variation.
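In outline, the randomization test works as in this hedged NumPy sketch: generate many random data sets whose columns match the observed means and variances, then flag an observed eigenvalue as significant when it exceeds the corresponding quantile of the null eigenvalues. Normal random numbers, 1000 replicates, and the per-rank comparison are our assumptions about the details.

```python
import numpy as np

def eigenvalue_test(Y, n_rand=1000, alpha=0.05, seed=0):
    """Y: (samples x species) abundance matrix. Returns one bool per eigenvalue."""
    rng = np.random.default_rng(seed)
    n, p = Y.shape
    obs = np.sort(np.linalg.eigvalsh(np.corrcoef(Y, rowvar=False)))[::-1]

    null = np.empty((n_rand, p))
    for r in range(n_rand):
        R = rng.normal(Y.mean(0), Y.std(0), size=(n, p))   # match means and variances
        null[r] = np.sort(np.linalg.eigvalsh(np.corrcoef(R, rowvar=False)))[::-1]

    crit = np.quantile(null, 1 - alpha, axis=0)            # per-rank critical values
    return obs > crit

# 90 samples x 10 species: replication at three times the species count, per the paper.
Y = np.abs(np.random.default_rng(7).normal(2, 1, size=(90, 10)))
print(eigenvalue_test(Y))
```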

Book ChapterDOI
01 Jan 1991
TL;DR: A method based on Principal Component Analysis for extracting a small number of parameters from the whole of an image, which can then be used for characterisation, recognition and reconstruction, and which is both theoretically more attractive and more effective in practice.
Abstract: We describe a method based on Principal Component Analysis for extracting a small number of parameters from the whole of an image. These parameters can then be used for characterisation, recognition and reconstruction. The method itself is by no means new, and has a number of obvious flaws. In this paper we suggest improvements, based on purely theoretical considerations, in which the image is preprocessed using prior knowledge of the content. The subsequent Principal Component Analysis (PCA) is both theoretically more attractive, and more effective in practice. We present the work in the context of face recognition, but the method has much wider applicability.

Journal ArticleDOI
TL;DR: In this article, a singular value decomposition (SVD) of the triaxial data matrix produces an eigenanalysis of the covariance matrix and a rotation of the data onto the directions given by the eigen analysis, all in one step.
Abstract: Polarization analysis can be achieved efficiently by treating a time window of a single‐station triaxial recording as a matrix and doing a singular value decomposition (SVD) of this seismic data matrix. SVD of the triaxial data matrix produces an eigenanalysis of the data covariance (cross‐energy) matrix and a rotation of the data onto the directions given by the eigenanalysis (Karhunen‐Loeve transform), all in one step. SVD provides a complete principal components analysis of the data in the analysis time window. Selection of this time window is crucial to the success of the analysis and is governed by three considerations: the window should contain only one arrival; the window should be such that the signal‐to‐noise ratio is maximized; and the window should be long enough to be able to discriminate random noise from signal. The SVD analysis provides estimates of signal, signal polarization directions, and noise. An F‐test is proposed which gives the confidence level for the hypothesis of rectilinear polarization.
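A minimal sketch of the single-step analysis: SVD of an N × 3 window of triaxial data yields the principal polarization direction, eigenvalue-based measures, and the rotated (Karhunen-Loeve) traces in one shot. The rectilinearity formula below is one common eigenvalue-ratio variant, not necessarily the paper's, and the synthetic window is illustrative.

```python
import numpy as np

def polarization(window):
    """window: (N, 3) triaxial samples for a single analysis window."""
    W = window - window.mean(axis=0)
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    lam = s**2                              # eigenvalues of the cross-energy matrix
    direction = Vt[0]                       # dominant polarization direction
    rectilinearity = 1 - (lam[1] + lam[2]) / (2 * lam[0])   # one common measure
    rotated = W @ Vt.T                      # data rotated onto principal axes
    return direction, rectilinearity, rotated

rng = np.random.default_rng(8)
t = np.linspace(0, 1, 200)
p = np.array([0.8, 0.5, 0.3]); p /= np.linalg.norm(p)
window = np.outer(np.sin(2 * np.pi * 5 * t), p) + 0.05 * rng.normal(size=(200, 3))
d, r, _ = polarization(window)
print("direction:", np.round(d, 2), "rectilinearity:", round(r, 3))
```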

Journal ArticleDOI
TL;DR: In this article, a form of empirical orthogonal function (principal component analysis) based upon the singular value decomposition is used to identify the minimum data set necessary to describe the general circulation and identify the dominant physical characteristics of the North Atlantic Ocean.

Journal ArticleDOI
TL;DR: In this article, principal component analysis has been used as a digital filter to improve the overall quality of gas chromatography/mass spectrometry (GC/MS) data sets.
Abstract: Principal component analysis has been evaluated as a digital filter to improve the overall quality of gas chromatography/mass spectrometry (GC/MS) data sets. Data are initially read into a matrix, scaled, and then processed by using the NIPALS algorithm, which is used to separate signal from noise in the data matrix. By use of a six-component solvent mixture with samples of 0.5 to 150 pg of each component, significant improvements in mass spectral quality and spectral matches were observed. Signal to noise was improved by a factor of 2 to 100.
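Setting the NIPALS details aside, the filtering step amounts to reconstructing the data matrix from its first few principal components and treating the remainder as noise. An SVD-based NumPy sketch (the component count, scaling, and synthetic data are illustrative):

```python
import numpy as np

def pca_filter(D, k):
    """Reconstruct a (scans x m/z) data matrix from its first k principal components."""
    mu = D.mean(axis=0)
    U, s, Vt = np.linalg.svd(D - mu, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k] + mu   # signal estimate; D minus this is "noise"

rng = np.random.default_rng(9)
scans, channels, k = 300, 80, 3
signal = rng.random((scans, k)) @ rng.random((k, channels))  # low-rank "chemical" signal
D = signal + 0.1 * rng.normal(size=(scans, channels))        # add measurement noise
D_f = pca_filter(D, k)
print("noise RMS before/after:", np.std(D - signal), np.std(D_f - signal))
```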

Journal ArticleDOI
TL;DR: The Karhunen-Loève (K-L) transform was used to quantify the temporal distribution of spikes in the responses of lateral geniculate (LGN) neurons to compare the amount of stimulus-related information carried by LGN neurons when two codes were assumed: first, a univariate code based on response strength alone; and second, a multivariate temporal codebased on the coefficients of the first three principal components.
Abstract: 1. We used the Karhunen-Loeve (K-L) transform to quantify the temporal distribution of spikes in the responses of lateral geniculate (LGN) neurons. The basis functions of the K-L transform are a set of waveforms called principal components, which are extracted from the data set. The coefficients of the principal components are uncorrelated with each other and can be used to quantify individual responses. The shapes of each of the first three principal components were very similar across neurons. 2. The coefficient of the first principal component was highly correlated with the spike count, but the other coefficients were not. Thus the coefficient of the first principal component reflects the strength of the response, whereas the coefficients of the other principal components reflect aspects of the temporal distribution of spikes in the response that are uncorrelated with the strength of the response. Statistical analysis revealed that the coefficients of up to 10 principal components were driven by the stimuli. Therefore stimuli govern the temporal distribution as well as the number of spikes in the response. 3. Through the application of information theory, we were able to compare the amount of stimulus-related information carried by LGN neurons when two codes were assumed: first, a univariate code based on response strength alone; and second, a multivariate temporal code based on the coefficients of the first three principal components. We found that LGN neurons were able to transmit an average of 1.5 times as much information using the three-component temporal code as they could using the strength code. 4. The stimulus set we used allowed us to calculate the amount of information each neuron could transmit about stimulus luminance, pattern, and contrast. All neurons transmitted the greatest amount of information about stimulus luminance, but they also transmitted significant amounts of information about stimulus pattern. This pattern information was not a reflection of the luminance or contrast of the pixel centered on the receptive field. 5. In addition to measuring the average amount of information each neuron transmitted about all stimuli, we also measured the amount of information each neuron transmitted about the individual stimuli with both the univariate spike count code and the multivariate temporal code. We then compared the amount of information transmitted per stimulus with the magnitudes of the responses to the individual stimuli. We found that the magnitudes of both the univariate and the multivariate responses to individual stimuli were poorly correlated with the information transmitted about the individual stimuli.(ABSTRACT TRUNCATED AT 400 WORDS)

Journal ArticleDOI
TL;DR: The results support the conclusion that the TC model is a promising alternative to principal components for data reduction and analysis of MEP's.
Abstract: We describe a substantive application of the trilinear topographic components/parallel factors model (TC/PARAFAC, due to Mocks/Harshman) to the decomposition of multichannel evoked potentials (MEP's). We provide practical guidelines and procedures for applying PARAFAC methodology to MEP decomposition. Specifically, we apply techniques of data preprocessing, orthogonality constraints, and validation of solutions in a complete TC analysis, for the first time using actual MEP data. The TC model is shown to be superior to the traditional bilinear principal components model in terms of data reduction, confirming the advantage of the TC model's added assumptions. The model is then shown to provide a unique spatiotemporal decomposition that is reproducible in different subject groups. The components are shown to be consistent with spatial/temporal features evident in the data, except for an artificial component resulting from latency jitter. Subject scores on this component are shown to reflect peak latencies in the data, suggesting a new aspect to statistical analyses based on subject scores. In general, the results support the conclusion that the TC model is a promising alternative to principal components for data reduction and analysis of MEP's.

Journal ArticleDOI
TL;DR: An improved method of automatic image segmentation, the principal component transformation split-and-merge clustering (PCTSMC) algorithm, is presented and applied to cloud screening of both nighttime and daytime AVHRR data.

Journal ArticleDOI
TL;DR: In this paper, a fast and accurate method for determining the sucrose content of sugar cane juice is developed, using principal component regression (PCR) to build a prediction equation for sucrose content from mid-infrared spectra.
Abstract: A fast and accurate method for determining the sucrose content of sugar cane juice has been developed. The application of principal component regression (PCR) has been proposed for the development of a prediction equation of sucrose content by mid-infrared spectroscopy. An attenuated total reflectance (ATR) cell is used in place of the more familiar transmission cell. PCR involves two steps: (1) the creation of new synthetic variables by principal component analysis (PCA) of spectral data, and (2) multiple linear regression (MLR) with these new variables. Results obtained by this procedure have been compared with those obtained by the conventional application of polarization.
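The two-step PCR procedure described here (PCA of the spectra, then multiple linear regression on the scores) in a compact NumPy sketch; the spectra, component count, and "sucrose" data are illustrative stand-ins for the ATR measurements.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Principal component regression: PCA of X, then MLR of y on the first k scores."""
    mu, sd = X.mean(0), X.std(0)
    Z = (X - mu) / sd
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :k] * s[:k]                       # step 1: new synthetic variables
    beta, *_ = np.linalg.lstsq(scores, y - y.mean(), rcond=None)  # step 2: MLR
    coef = Vt[:k].T @ beta                          # map back to the original wavelengths
    return mu, sd, coef, y.mean()

def pcr_predict(X, model):
    mu, sd, coef, y0 = model
    return ((X - mu) / sd) @ coef + y0

rng = np.random.default_rng(10)
X = rng.normal(size=(40, 200))                  # 40 spectra, 200 wavenumbers
true = np.zeros(200); true[50:60] = 0.5         # a sucrose-related absorption region
y = X @ true + 0.05 * rng.normal(size=40)       # sucrose contents
model = pcr_fit(X[:30], y[:30], k=5)
print("test RMSE:", np.sqrt(np.mean((pcr_predict(X[30:], model) - y[30:]) ** 2)))
```

Regressing on a few orthogonal scores rather than 200 collinear wavelengths is what makes the second-step MLR stable.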

Book ChapterDOI
01 Jan 1991
TL;DR: It is shown that the PCI method requires much less data to produce a given, needed level of SNR with higher probability than the sample matrix inverse (SMI) method based on the inverse of the sample covariance matrix.
Abstract: We present a simpler and more general analysis of our previously proposed Principal Component Inverse (PCI) method of rapidly adaptive nulling of interference. We assume that the interference consists of a strong gaussian component with a rank deficient covariance matrix plus a weak component of white gaussian background noise. We derive an approximate Beta probability density function of the output signal-to-noise ratio (SNR) for the PCI method. Using our theoretically derived formulas, we show that the PCI method requires much less data to produce a given, needed level of SNR with high probability than the Sample Matrix Inverse (SMI) method based on the inverse of the sample covariance matrix. The approximations and the final probability density function are tested through computer simulation. They accurately explain the experimental results.
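One way to read the principal-component idea, as a heavily simplified NumPy sketch: estimate the interference subspace from the dominant eigenvectors of the sample covariance and null it by projection, rather than inverting the full (poorly estimated) sample covariance as SMI does. The projection-style weight below is our simplification, not the paper's exact estimator, and the steering vector, loading, and snapshot counts are illustrative.

```python
import numpy as np

rng = np.random.default_rng(11)
m, n_snap, r = 8, 12, 2                    # sensors, snapshots (few!), interference rank

# Strong rank-r gaussian interference plus weak white background noise.
A = rng.normal(size=(m, r)) + 1j * rng.normal(size=(m, r))
snaps = A @ (rng.normal(size=(r, n_snap)) + 1j * rng.normal(size=(r, n_snap))) * 3
snaps += 0.1 * (rng.normal(size=(m, n_snap)) + 1j * rng.normal(size=(m, n_snap)))

R = snaps @ snaps.conj().T / n_snap        # sample covariance from few snapshots
vals, vecs = np.linalg.eigh(R)
E = vecs[:, -r:]                           # dominant eigenvectors ~ interference subspace

s = np.ones(m, dtype=complex) / np.sqrt(m)          # assumed signal steering vector
w_pci = s - E @ (E.conj().T @ s)                    # project the interference out
w_smi = np.linalg.solve(R + 1e-3 * np.eye(m), s)    # SMI weight (diagonally loaded)

print("interference leakage, PCI vs SMI:",
      np.linalg.norm(w_pci.conj() @ A), np.linalg.norm(w_smi.conj() @ A))
```

With so few snapshots, the full sample covariance is badly conditioned, which is why the low-rank route needs much less data for a given output SNR.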

Journal ArticleDOI
TL;DR: In this article, the spectral and texture features were computed for individual trees selected from the video data within and around two of the test plots and the most significant features were selected to form a numerical maple decline index model, which was then applied to evaluate decline of trees in all test plots.

Journal ArticleDOI
TL;DR: Simulations suggest that a generalized version of Oja's PCA neuron can detect a dth-order principal component, and high-order correlation analysis may prove increasingly useful as data from large networks of cells engaged in information processing become available.

Journal ArticleDOI
TL;DR: Pattern recognition techniques based on multivariate analysis have been most useful in processing data from chromatography and spectrometry, mainly due to the intrinsic multidimensionality of flavor.
Abstract: Chemometrics is playing an increasingly important role in flavor research. Pattern recognition techniques based on multivariate analysis have been most useful in processing data from chromatography and spectrometry, mainly due to the intrinsic multidimensionality of flavor. Multiple regression analysis and its derivatives, including partial least squares regression (PLS), have been frequently used for correlating instrumental data to sensory properties. Factor analysis and principal component analysis are widely used for searching for latent factors and extracting information as unsupervised pattern recognition. Cluster analysis and discriminant analysis have been successful for classification of samples; however, modeling of samples using SIMCA and nonparametric classification such as KNN have also gained popularity for improving accuracy. Simplex optimization has been well established as a technique in chemometrics; however, it is relatively unknown in flavor research. Computer‐aided optimization has...

Journal ArticleDOI
TL;DR: In this paper, a decomposition of influence functions for the parameters in covariance structure analysis is proposed, where the parameters are estimated by minimizing a discrepancy function between the assumed covariance matrix and the sample covariance matrix.
Abstract: Influence functions are derived for the parameters in covariance structure analysis, where the parameters are estimated by minimizing a discrepancy function between the assumed covariance matrix and the sample covariance matrix. The case of confirmatory factor analysis is studied precisely with a numerical example. Compared with a general procedure called one-step estimation, the proposed procedure has two advantages: (1) computing cost is cheaper; (2) the property that arbitrary influence can be decomposed into a finite number of components, discussed by Tanaka and Castano-Tostado (1990), can be used for efficient computing and for characterizing a covariance structure model from the sensitivity perspective. A numerical comparison is made among the confirmatory factor analysis and some procedures of exploratory factor analysis by using the decomposition mentioned above.

Journal Article
29 Sep 1991-The Auk
TL;DR: Song playback initiates nestbuilding during clutch overlap in mockingbirds, Mimus polyglottos, and the role of chatburst versus song in the defense of fall territory in mockingbirds (Mimus polyglottos) is studied.
Abstract: --. 1988. Variation in repertoire presentation in Northern Mockingbirds. Condor 90: 592-606. HEGNER, R. E., & J. C. WINGFIELD. 1986. Gonadal development during autumn and winter in House Sparrows. Condor 88: 269-278. LOGAN, C. A. 1987. Fluctuations in fall and winter territory size in the Northern Mockingbird (Mimus polyglottos). J. Field Ornithol. 58: 297-305. --, P. D. BUDMAN, & K. R. FULK. 1983. Role of chatburst versus song in the defense of fall territory in mockingbirds (Mimus polyglottos). J. Comp. Psychol. 97: 292-301. --, L. E. HYATT, & L. GREGORCYK. 1990. Song playback initiates nestbuilding during clutch overlap in mockingbirds, Mimus polyglottos. Anim. Behav. 39: 943-953.

Journal ArticleDOI
TL;DR: Correspondence analysis is a technique that has been little used by sensory scientists for sensory evaluation data; its advocates have argued that it is more appropriate to use correspondence analysis with sensory data because of the data's often categorical nature, as discussed by the authors.

Journal ArticleDOI
TL;DR: In this paper, an alternative to multiple indicator kriging is proposed which approximates the full coindicator kriging system by kriging the principal components of the original indicator variables.
Abstract: An alternative to multiple indicator kriging is proposed which approximates the full coindicator kriging system by kriging the principal components of the original indicator variables. This transformation is studied in detail for the biGaussian model. It is shown that the cross-correlations between principal components are either insignificant or exactly zero. This result allows derivation of the conditional cumulative distribution function (cdf) by kriging principal components and then applying a linear back transform. A performance comparison based on a real data set (Walker Lake) is presented which suggests that the proposed method achieves approximation of the conditional cdf equivalent to indicator cokriging but with substantially less variogram modeling effort and at smaller computational cost.