
Showing papers on "Principal component analysis published in 1993"



Journal ArticleDOI
TL;DR: It is demonstrated that the difference between common factor and principal component pattern loadings is inversely related to the number of indicators per factor, not to the total number of observed variables in the analysis, countering claims by both Snook and Gorsuch and Velicer and Jackson.
Abstract: The aim of the present article was to reconsider several conclusions by Velicer and Jackson (1990a) in their review of issues that arise when comparing common factor analysis and principal component analysis. Specifically, the three conclusions by Velicer and Jackson that are considered in the present article are: (a) that common factor and principal component solutions are similar, (b) that differences between common factor and principal component solutions appear only when too many dimensions are extracted, and (c) that common factor and principal component parameters are equally generalizable. In contrast, Snook and Gorsuch (1989) argued recently that principal component analysis and common factor analysis led to different, dissimilar estimates of pattern loadings, terming the principal component loadings biased and the common factor loadings unbiased. In the present article, after replicating the Snook and Gorsuch results, an extension demonstrated that the difference between common factor and principal component pattern loadings is inversely related to the number of indicators per factor, not to the total number of observed variables in the analysis, countering claims by both Snook and Gorsuch and Velicer and Jackson. Considering the more general case of oblique factors, one concomitant of overrepresentation of pattern loadings is an underrepresentation of intercorrelations among dimensions represented by principal component analysis, whereas comparable values obtained using factor analysis are accurate. Differences in parameters deriving from principal component analysis and common factor analysis were explored in relation to several additional aspects of population data, such as variation in the level of communality of variables on a given factor and the moving of a variable from one battery of measures to another. The results suggest that principal component analysis should not be used if a researcher wishes to obtain parameters reflecting latent constructs or factors.
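
The central empirical point, that the gap between component and factor loadings shrinks as the number of indicators per factor grows, is easy to probe numerically. The sketch below is not the authors' simulation: it generates a simple one-factor population with an assumed loading of 0.6 and compares the loadings recovered by PCA and by maximum-likelihood factor analysis as the number of indicators varies.

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.default_rng(0)

def loading_gap(n_indicators, true_loading=0.6, n_obs=5000):
    """Mean (PCA loading - FA loading) for a one-factor model in which
    every indicator loads true_loading on the single factor."""
    factor = rng.standard_normal((n_obs, 1))
    unique_sd = np.sqrt(1.0 - true_loading ** 2)
    X = factor @ np.full((1, n_indicators), true_loading) \
        + unique_sd * rng.standard_normal((n_obs, n_indicators))
    pca = PCA(n_components=1).fit(X)
    pca_load = pca.components_[0] * np.sqrt(pca.explained_variance_[0])
    fa_load = FactorAnalysis(n_components=1).fit(X).components_[0]
    return np.abs(pca_load).mean() - np.abs(fa_load).mean()

for p in (3, 6, 12, 24):
    print(f"{p:2d} indicators: PCA loading exceeds FA loading by {loading_gap(p):.3f}")
```

With few indicators the component loadings overshoot the factor loadings noticeably; with a couple of dozen indicators the two are nearly indistinguishable, consistent with the inverse relationship described above.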

512 citations


Journal ArticleDOI
TL;DR: The PCA program performs many more functions, especially in testing and graphics, than the PCA routines in conventional statistical packages, and includes a theoretical description of principal components, the basic rules for their interpretation, and statistical testing.

481 citations


Journal ArticleDOI
TL;DR: It is shown that each mode of principal component analysis or ‘factor analysis’ is equivalent to solving a certain least squares problem where certain error estimators σ_ij are assumed for the measured data matrix X_ij, and the best possible scaling and a near-optimal scaling are introduced.

405 citations


Journal ArticleDOI
TL;DR: In this article, the authors present a set of data on children's gait collected by the Motion Analysis Laboratory at Children's Hospital, San Diego, California; see Olshen et al. (1989) for full details.
Abstract: It is not immediately straightforward to extend canonical correlation analysis to the context of functional data analysis, where the data are themselves curves or functions. The obvious approach breaks down, and it is necessary to use a method involving smoothing in some way. Such a method is introduced and discussed with reference to a data set on human gait. The breakdown of the unsmoothed method is illustrated in a practical context and is demonstrated theoretically. A consistency theorem for the smoothed method is proved. In an increasing number of problems in a wide range of fields, the data observed are not the univariate or multivariate observations of classical statistics, but are functions observed continuously. Ramsay and Dalzell (1991) give the name functional data analysis to the analysis of data of this kind. In most cases, the observations will be functions of time, or a closely related variable, but there are clearly applications where the functions are surfaces observed over two- or three-dimensional space. The motivating example for the present paper is a set of data on children's gait collected by the Motion Analysis Laboratory at Children's Hospital, San Diego, California; see Olshen et al. (1989) for full details. For each of a number of children, several angles made by the child's joints (knee, hip, etc.) are observed during the child's gait cycle. One aim of the study is to gain understanding of the gait cycle of a 'normal' child to make comparisons with children suffering from walking difficulties. These data motivated Rice and Silverman (1991) to discuss the extension of principal component analysis to the functional setting, and to explain how smoothing can be incorporated into the analysis in a natural way. In a substantial complementary paper, Ramsay and Dalzell (1991) discuss different approaches to regression and principal component analysis in functional data analysis, illustrating their work by a meteorological example. The insights of functional data analysis may be helpful in chemometrics; for instance a standard chemometric problem is the analysis of spectra observed in chromatography, and these are to all intents and purposes functional observations. Another obvious area of potential relevance is the analysis of growth curve data. The focus in the present paper will be on canonical correlation analysis (CCA). In the gait example, one might ask how variability in the knee angle cycle is related to

316 citations


Journal ArticleDOI
TL;DR: In this paper, principal components have been calculated using covariance and correlation matrices for four data sets: monthly NOAA-NDVI maximum-value composites, NOAA-LAC data, Landsat-TM data, and SPOT multi-spectral data.
Abstract: In this study principal components have been calculated using covariance and correlation matrices for four data sets: monthly NOAA-NDVI maximum-value composites, NOAA-LAC data, Landsat-TM data, and SPOT multi-spectral data. An analysis of the results shows consistent improvements in the signal-to-noise ratio (SNR) using the correlation matrix in comparison to the covariance matrix in the principal components analysis for all the data sets.
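
The covariance-versus-correlation choice amounts to whether the bands are scaled to unit variance before the eigendecomposition. A minimal numpy sketch of that mechanic, using a synthetic stand-in for multiband imagery (the NOAA, Landsat and SPOT data are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic stand-in for a 4-band image unrolled to pixels: one shared
# "scene" signal plus band-dependent noise, with very unequal band variances.
scene = rng.standard_normal((10000, 1))
bands = scene @ np.array([[5.0, 50.0, 0.5, 10.0]]) \
        + rng.standard_normal((10000, 4)) * np.array([1.0, 20.0, 0.2, 3.0])

def principal_components(X, use_correlation):
    """PCA via the covariance matrix, or the correlation matrix if the
    bands are first scaled to unit variance."""
    Z = X - X.mean(axis=0)
    if use_correlation:
        Z = Z / Z.std(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    return Z @ eigvecs[:, order], eigvals[order]

for use_corr in (False, True):
    _, eigvals = principal_components(bands, use_corr)
    label = "correlation" if use_corr else "covariance "
    print(f"{label} matrix: PC1 carries {eigvals[0] / eigvals.sum():.1%} of variance")
```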

141 citations


Journal ArticleDOI
TL;DR: Overall, CA proved most robust as it demonstrated high consistency irrespective of the data standardizations and the strong influence of data standardization on the other ordination methods emphasizes the importance of this frequently neglected stage of data analysis.
Abstract: Benthic invertebrate data from thirty-nine lakes in south-central Ontario were analyzed to determine the effect of choosing particular data standardizations, resemblance measures, and ordination methods on the resultant multivariate summaries. Logarithmic-transformed, 0–1 scaled, and ranked data were used as standardized variables with resemblance measures of Bray-Curtis, Euclidean distance, cosine distance, correlation, covariance and chi-squared distance. Combinations of these measures and standardizations were used in principal components analysis, principal coordinates analysis, non-metric multidimensional scaling, correspondence analysis, and detrended correspondence analysis. Correspondence analysis and principal components analysis using a correlation coefficient provided the most consistent results irrespective of the choice in data standardization. Other approaches using detrended correspondence analysis, principal components analysis, principal coordinates analysis, and non-metric multidimensional scaling provided less consistent results. These latter three methods produced similar results when the abundance data were replaced with ranks or standardized to a 0–1 range. The log-transformed data produced the least consistent results, whereas ranked data were most consistent. Resemblance measures such as the Bray-Curtis and correlation coefficient provided more consistent solutions than measures such as Euclidean distance or the covariance matrix when different data standardizations were used. The cosine distance based on standardized data provided results comparable to the CA and DCA solutions. Overall, CA proved most robust as it demonstrated high consistency irrespective of the data standardizations. The strong influence of data standardization on the other ordination methods emphasizes the importance of this frequently neglected stage of data analysis.
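
To see the effect being measured, one can run the same ordination on differently standardized copies of an abundance table and compare the resulting first axes, as sketched below. The sketch uses synthetic lognormal "abundances" in place of the Ontario lake data and PCA only; the study compares a much wider set of ordinations and resemblance measures.

```python
import numpy as np
from scipy.stats import rankdata
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
# Synthetic stand-in: 39 "lakes" x 20 "taxa" of skewed abundance counts.
abundance = rng.lognormal(mean=1.0, sigma=1.5, size=(39, 20)).round()

standardized = {
    "log":   np.log1p(abundance),
    "0-1":   (abundance - abundance.min(axis=0)) / np.ptp(abundance, axis=0),
    "ranks": rankdata(abundance, axis=0),
}
axis1 = {name: PCA(n_components=1).fit_transform(data).ravel()
         for name, data in standardized.items()}

names = list(axis1)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        r = abs(np.corrcoef(axis1[a], axis1[b])[0, 1])
        print(f"|r| between PCA axis 1 under '{a}' and '{b}': {r:.2f}")
```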

126 citations


Journal ArticleDOI
TL;DR: This article contains tables of 95th percentile eigenvalues from random data that can be used when the sample size is between 50 and 500 and when the number of variables is between 5 and 50.
Abstract: Selecting the "correct" number of components to retain in principal components analysis is crucial. Parallel analysis, which requires a comparison of eigenvalues from observed and random data, is a highly promising strategy for making this decision. This paper focuses on linear interpolation, which has been shown to be an accurate method of implementing parallel analysis. Specifically, this article contains tables of 95th percentile eigenvalues from random data that can be used when the sample size is between 50 and 500 and when the number of variables is between 5 and 50. An empirical example is provided illustrating linear interpolation, direct computation, and regression methods for obtaining 95th percentile eigenvalues from random data. The tables of eigenvalues given in this report will hopefully enable more researchers to use parallel analysis because interpolation is an accurate and simple method of obviating the Monte Carlo requirements of parallel analysis.
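
The published tables spare users the Monte Carlo step, but the underlying procedure is short to code. A sketch of parallel analysis by direct simulation (rather than the interpolated tables), on toy data with two built-in components:

```python
import numpy as np

def parallel_analysis(X, n_sims=200, percentile=95, seed=0):
    """Retain components whose observed correlation-matrix eigenvalues exceed
    the chosen percentile of eigenvalues from random normal data of the
    same size (n observations x p variables)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
    random_eigs = np.empty((n_sims, p))
    for i in range(n_sims):
        R = np.corrcoef(rng.standard_normal((n, p)), rowvar=False)
        random_eigs[i] = np.sort(np.linalg.eigvalsh(R))[::-1]
    threshold = np.percentile(random_eigs, percentile, axis=0)
    return int(np.sum(observed > threshold)), observed, threshold

# Toy data: 200 cases, 10 variables built from 2 underlying components.
rng = np.random.default_rng(1)
scores = rng.standard_normal((200, 2))
loadings = np.zeros((10, 2))
loadings[:5, 0], loadings[5:, 1] = 0.8, 0.8
X = scores @ loadings.T + 0.6 * rng.standard_normal((200, 10))
n_keep, observed, threshold = parallel_analysis(X)
print("components retained by parallel analysis:", n_keep)
```

For this toy matrix only the first two observed eigenvalues should clear the 95th-percentile random thresholds, which is the decision the published tables are designed to reproduce without simulation.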

109 citations


Journal ArticleDOI
TL;DR: In this paper, a detailed analysis of the variability of Australian district rainfall on seasonal time-scales over the period 1950-1987 is described, where the major analysis tool is rotated principal component analysis (RPCA), used in both the S and T modes.
Abstract: A detailed analysis of the variability of Australian district rainfall on seasonal time-scales over the period 1950–1987 is described. This paper, Part I, describes the dominant spatial modes or patterns of variability. The major analysis tool is rotated principal component analysis (RPCA), used in both the S and T modes. Various criteria are examined to determine the number of components to rotate, including conducting trial rotations and comparison of the resulting patterns with the corresponding one-point correlation maps. The stability of the chosen solutions is examined by repeating the analysis on various subsets of the data. The S-mode analysis, which groups districts with similar temporal variation, provides a regionalization of the continent into eight coherent and approximately equally sized regions. The results of this analysis closely resemble those obtained from cluster analysis. The T-mode analysis clusters seasons with similar large-scale spatial variations (anomaly patterns). The similarity measure used in the T-mode analysis is the congruence coefficient, rather than the correlation or covariance. The patterns produced by this analysis consist of continental-scale anomalies, similar in some respects to the unrotated S-mode patterns, but more amenable to meteorological interpretation. In particular, the first pattern, which accounts for approximately 25 per cent of the total variance, consists of anomalies of the same sign over the entire continent centred on south-east Australia. The relationship between the two modes of representation is also explored. Regression equations are developed to express the spatially complex T-mode patterns in terms of the localized S-mode patterns, and alternatively, to partition the variance of each of the S-mode patterns between the T-mode components. In Part II the temporal variability and the relationship of these patterns to the Southern Oscillation and other large-scale circulation anomalies are examined.
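
The S-mode step, a PCA of the district-by-season matrix followed by an orthogonal rotation, can be sketched compactly. The varimax routine below is the standard textbook algorithm rather than the authors' software, and the "rainfall" matrix is a synthetic stand-in with two regional signals:

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Standard varimax rotation of a p x k loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    total = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        u, s, vt = np.linalg.svd(
            loadings.T @ (rotated ** 3
                          - (gamma / p) * rotated @ np.diag((rotated ** 2).sum(axis=0))))
        rotation = u @ vt
        new_total = s.sum()
        if new_total < total * (1 + tol):
            break
        total = new_total
    return loadings @ rotation

rng = np.random.default_rng(3)
# Synthetic "districts x seasons" anomalies driven by two regional signals.
signals = rng.standard_normal((2, 152))
weights = np.vstack([np.repeat([[1.0, 0.1]], 20, axis=0),
                     np.repeat([[0.1, 1.0]], 20, axis=0)])
rain = weights @ signals + 0.3 * rng.standard_normal((40, 152))

corr = np.corrcoef(rain)                          # S-mode: district x district
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1][:2]
loadings = eigvecs[:, order] * np.sqrt(eigvals[order])
rotated = varimax(loadings)
print("unrotated |loadings|, district 0:", np.round(np.abs(loadings[0]), 2))
print("rotated   |loadings|, district 0:", np.round(np.abs(rotated[0]), 2))
```

After rotation each district should load mainly on the component for its own region, which is what makes the rotated patterns easier to interpret as regionalizations.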

102 citations


Proceedings Article
29 Nov 1993
TL;DR: The method, "Principal Components Pruning (PCP)", is based on principal component analysis of the node activations of successive layers of the network and requires no network retraining and does not involve calculating the full Hessian of the cost function.
Abstract: We present a new algorithm for eliminating excess parameters and improving network generalization after supervised training. The method, "Principal Components Pruning (PCP)", is based on principal component analysis of the node activations of successive layers of the network. It is simple, cheap to implement, and effective. It requires no network retraining, and does not involve calculating the full Hessian of the cost function. Only the weight and the node activity correlation matrices for each layer of nodes are required. We demonstrate the efficacy of the method on a regression problem using polynomial basis functions, and on an economic time series prediction problem using a two-layer, feedforward network.
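
A rough numpy illustration of the core projection step, under my reading that pruning amounts to projecting a layer's activations onto the dominant eigenvectors of their activity correlation matrix and folding that projection into the next layer's weights (the paper's criterion for how many directions to keep is not reproduced):

```python
import numpy as np

rng = np.random.default_rng(5)
# Stand-in for hidden-layer activations after training: 20 nodes whose
# activity is driven by only 3 underlying signals, so the layer is
# effectively low-rank.
hidden = np.tanh(rng.standard_normal((500, 3)) @ rng.standard_normal((3, 20)))
next_weights = rng.standard_normal((20, 1))      # weights of the following layer

activity_corr = hidden.T @ hidden / len(hidden)  # node-activity correlation matrix
eigvals, eigvecs = np.linalg.eigh(activity_corr)
keep = np.argsort(eigvals)[::-1][:3]             # retain the dominant directions
projector = eigvecs[:, keep] @ eigvecs[:, keep].T

pruned_weights = projector @ next_weights        # fold the projection into the weights
change = np.abs(hidden @ next_weights - hidden @ pruned_weights).mean()
print("mean change in the layer's output after pruning:", round(float(change), 4))
```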

88 citations



Journal ArticleDOI
TL;DR: In this paper, the spatial orthogonality of the principal components is investigated in three situations: the intrinsic correlation, two basic structures with independent nugget components, and three basic structures with independent nugget components and uncorrelated subsets of variables.
Abstract: Within the frame of the linear model of coregionalization, this paper sets up equations relating the variogram matrix of the principal components extracted from the variance-covariance matrix to the diagonal variogram matrices of the regionalized factors. The spatial orthogonality of the principal components is investigated in three situations: the intrinsic correlation, two basic structures with independent nugget components, and three basic structures with independent nugget components and uncorrelated subsets of variables. Two examples point out that the correlation between the principal components may be nonnegligible at short distances, especially if the correlation structure changes according to the spatial scale considered. For one of the two case studies, an orthogonal varimax rotation of the first principal components is found to greatly reduce the spatial correlation between some of them.

Journal ArticleDOI
TL;DR: In this article, principal component analysis (PCA) is applied to filter adaptively the dominant modes of subannual (SA) variability of a 12-year long multivariate time series of Northern Hemisphere atmospheric angular momentum (AAM); AAM is computed in 23 latitude bands of equal area from operational analyses of the U.S National Meteorological Center.
Abstract: Principal component analysis (PCA) in the space and time domains is applied to filter adaptively the dominant modes of subannual (SA) variability of a 12-year long multivariate time series of Northern Hemisphere atmospheric angular momentum (AAM); AAM is computed in 23 latitude bands of equal area from operational analyses of the U.S. National Meteorological Center. PCA isolates the leading empirical orthogonal functions (EOFs) of spatial dependence, while multivariate singular spectrum analysis (M-SSA) yields filtered time series that capture the dominant low-frequency modes of SA variability. The time series prefiltered by M-SSA lend themselves to prediction by the maximum entropy method (MEM). Whole-field predictions are made by combining the forecasts so obtained with the leading spatial EOFs obtained by PCA. The combination of M-SSA and MEM has predictive ability up to about a month. These methods are essentially linear but data-adaptive. They seem to perform well for short, noisy, multivariate time series, to which purely nonlinear, deterministically based methods are difficult to apply.

Journal ArticleDOI
TL;DR: In this paper, a new method for texture segmentation based on the use of texture feature detectors derived from a decorrelation procedure of a modified version of a Pseudo-Wigner distribution was proposed.
Abstract: In this paper we propose a new method for texture segmentation based on the use of texture feature detectors derived from a decorrelation procedure of a modified version of a Pseudo-Wigner distribution (PWD). The decorrelation procedure is accomplished by a cascade recursive least squared (CRLS) principal component (PC) neural network. The goal is to obtain a more efficient analysis of images by combining the advantages of using a high-resolution joint representation given by the PWD with an effective adaptive principal component analysis (PCA) through the use of feedforward neural networks.

Journal ArticleDOI
TL;DR: In this article, a Principal Component Analysis (PCA) of intercorrelated influencing parameters (e.g., dry-bulb temperature, solar radiation and humidity) is used to predict electricity consumption in conjunction with a change-point model.
Abstract: A new method for predicting daily whole-building electricity usage in a commercial building has been developed. This method utilizes a Principal Component Analysis (PCA) of intercorrelated influencing parameters (e.g., dry-bulb temperature, solar radiation and humidity) to predict electricity consumption in conjunction with a change-point model. This paper describes the PCA procedure and presents the results of its application in conjunction with a change-point regression, to predict whole-building electricity consumption for a commercial grocery store. Comparison of the results with a traditional Multiple Linear Regression (MLR) analysis indicates that a change-point, Principal Component Analysis (CP/PCA) appears to produce a more reliable and physically plausible model than an MLR analysis and offers more insight into the environmental and operational driving forces that influence energy consumption in a commercial building. It is thought that the method will be useful for determining conservation retrofit savings from pre-retrofit and post-retrofit consumption data for commercial buildings. A companion paper presents the development of the four-parameter change-point model and a comparison to the Princeton Scorekeeping Method (PRISM).
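
The PCA half of the method is straightforward to sketch: decorrelate the weather drivers, then regress consumption on the leading component scores. The change-point part of the model is omitted here, and the weather and consumption series are synthetic placeholders rather than the grocery-store data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(6)
n_days = 365
# Synthetic, intercorrelated drivers: dry-bulb temperature, solar, humidity.
season = np.sin(2 * np.pi * np.arange(n_days) / 365)
temp = 20 + 10 * season + 2 * rng.standard_normal(n_days)
solar = 500 + 300 * season + 80 * rng.standard_normal(n_days)
humidity = 60 - 20 * season + 8 * rng.standard_normal(n_days)
drivers = np.column_stack([temp, solar, humidity])

# Placeholder daily electricity use that responds mainly to temperature.
electricity = 1000 + 35 * temp + 0.2 * solar + 50 * rng.standard_normal(n_days)

# Regress consumption on the leading principal component scores of the
# standardized drivers (the change-point term is not modelled here).
scores = PCA(n_components=2).fit_transform(
    (drivers - drivers.mean(axis=0)) / drivers.std(axis=0))
model = LinearRegression().fit(scores, electricity)
print("R^2 of the principal-component regression:",
      round(model.score(scores, electricity), 3))
```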


Journal ArticleDOI
TL;DR: In this paper, a near infrared spectroscopy was used to discriminate between three sources of orange juice, and three pretreatments and five data transformations to improve discrimination were compared.
Abstract: Near infrared spectroscopy was used to discriminate between three sources of orange juice. Three pretreatments and five data transformations to improve discrimination were compared. Principal components analysis of 92 calibration samples was followed by canonical variates analysis using up to 25 principal components. Success rates were compared across pretreatments and transformations for the calibration and test data (50 samples). 100% prediction success was obtained with 25 principal components following no pretreatment and no transformation. Principal component loadings were interpretable.
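
Canonical variates analysis on principal component scores is, in effect, linear discriminant analysis in the reduced space, so the pipeline can be sketched with scikit-learn. The "spectra" below are synthetic stand-ins for the orange-juice data, and 25 components are retained as in the paper:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n_per_class, n_wavelengths = 30, 100
spectra, source = [], []
for cls in range(3):                      # three hypothetical juice sources
    template = np.sin(np.linspace(0, 3 + cls, n_wavelengths))
    spectra.append(template + 0.15 * rng.standard_normal((n_per_class, n_wavelengths)))
    source += [cls] * n_per_class
X, y = np.vstack(spectra), np.array(source)

X_cal, X_test, y_cal, y_test = train_test_split(X, y, test_size=0.35, random_state=0)
model = make_pipeline(PCA(n_components=25), LinearDiscriminantAnalysis())
model.fit(X_cal, y_cal)
print("test-set classification success:", f"{model.score(X_test, y_test):.0%}")
```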

Journal ArticleDOI
TL;DR: In this article, principal component analysis (PCA) in conjunction with iterative target transformation factor analysis (ITTFA) is able to provide an independent estimate of the number, position and shape of spectral components required to describe the Mo 3d envelope as monitored by XPS in reduced Mo/TiO2 catalysts.
Abstract: This study shows that principal component analysis (PCA) in conjunction with iterative target transformation factor analysis (ITTFA) is able to provide an independent estimate of the number, position and, to a certain extent, the shape of spectral components required to describe the Mo 3d envelope as monitored by XPS in reduced Mo/TiO2 catalysts. Three components were required to reproduce the original data. Abstract components from PCA were transformed through ITTFA into components which have spectroscopic meaning.

Book
01 Jan 1993
TL;DR: This book covers general concepts, the classical methods of multivariate analysis, principal components analysis, optimal quantification, indicator matrices, and the properties and risks of optimal quantification.
Abstract: Contents: General Concepts; Classical Methods of Multivariate Analysis; Principal Components Analysis; Optimal Quantification; Indicator Matrices; Properties and Risks of Optimal Quantification; Conclusions.

Journal ArticleDOI
TL;DR: Using the principal components (PC) scheme along with the polarized images used in the present study led to substantial improvements in the classification rates when compared with previous studies.
Abstract: The development of a neural-network-based classifier for classifying three distinct scenes (urban, park, and water) from several polarized SAR images of the San Francisco Bay area is discussed. The principal components (PC) scheme or Karhunen-Loeve transform is used to extract the salient features of the input data, and to reduce the dimensionality of the feature space prior to the application to the neural networks. Using the PC scheme along with the polarized images used in the present study led to substantial improvements in the classification rates when compared with previous studies. When a combined polarization architecture was used, the classification rate for water, urban, and park areas improved to 100%, 98.7%, and 96.1%, respectively.
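
The pipeline (Karhunen-Loeve/PCA feature reduction followed by a small feedforward classifier) can be sketched with scikit-learn; the synthetic three-class features below merely stand in for the polarimetric SAR channels, and the network layout is an arbitrary choice:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

# Stand-in for per-pixel polarimetric features with 3 classes (urban/park/water).
X, y = make_classification(n_samples=3000, n_features=24, n_informative=6,
                           n_classes=3, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = make_pipeline(PCA(n_components=6),            # KL transform / dimension cut
                      MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                                    random_state=0))
model.fit(X_train, y_train)
print("overall test classification rate:", f"{model.score(X_test, y_test):.1%}")
```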

Journal ArticleDOI
TL;DR: In this paper, a method for the characterization of a seismically active zone from a distribution of hypocenters is presented, based on principal components analysis, a powerful multivariant statistical technique that is used to find the rupture local ellipsoid (RLE).
Abstract: A method for the characterization of a seismically active zone from a distribution of hypocenters is presented. This method is based on principal components analysis, a powerful multivariate statistical technique that is used to find the rupture local ellipsoid (RLE). The ellipsoid is a planar structure with which two variations of the method are developed: the spatial principal components analysis and the spatial-temporal principal components analysis; using these methods, it is possible to find the dominating tendencies in the fracturing of the seismically active volume as well as the temporal evolution of the process. The methodology developed has been applied to a series of earthquakes that occurred near Antequera, Spain, with the result that the main fracture series trends N70°–80°E. Moreover, the temporal evolution of the system from the most relevant RLE has been characterized.
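
The geometric core of the method is a PCA of hypocenter coordinates: the eigenvectors of their covariance give the axes of the local ellipsoid, and the eigenvector with the smallest eigenvalue approximates the normal of the dominant rupture plane. A sketch on a synthetic, roughly planar cloud of hypocenters (not the Antequera catalogue):

```python
import numpy as np

rng = np.random.default_rng(8)
# Synthetic hypocenters scattered around a vertical plane striking ~N70E.
strike = np.deg2rad(70.0)
along = rng.uniform(-10, 10, 300)            # km along strike
down = rng.uniform(0, 8, 300)                # km depth
off = 0.3 * rng.standard_normal(300)         # km of scatter off the plane
east = along * np.sin(strike) + off * np.cos(strike)
north = along * np.cos(strike) - off * np.sin(strike)
xyz = np.column_stack([east, north, down])

cov = np.cov(xyz, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)        # ascending eigenvalues
normal = eigvecs[:, 0]                        # smallest axis ~ plane normal
axis_scales = np.sqrt(eigvals[::-1])
est_strike = (np.degrees(np.arctan2(normal[0], normal[1])) + 90) % 180
print("ellipsoid semi-axis scales (km):", np.round(axis_scales, 2))
print("estimated strike (deg E of N):", round(est_strike, 1))
```

The recovered strike should come out close to the 70 degrees built into the synthetic cloud, illustrating how the RLE orientation is read off the eigenvectors.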

Journal ArticleDOI
Yu-Long Xie, Ji-Hong Wang, Yi-Zeng Liang, Lixian Sun, Xin-Hua Song, Ru-Qin Yu
TL;DR: In this paper, projection pursuit (PP) is used to carry out principal component analysis with a criterion which is more robust than the variance, and generalized simulated annealing (GSA) is introduced as an optimization procedure in the process of PP calculation to guarantee the global optimum.
Abstract: Principal component analysis (PCA) is a widely used technique in chemometrics. The classical PCA method is, unfortunately, non-robust, since the variance is adopted as the objective function. In this paper, projection pursuit (PP) is used to carry out PCA with a criterion which is more robust than the variance. In addition, the generalized simulated annealing (GSA) algorithm is introduced as an optimization procedure in the process of PP calculation to guarantee the global optimum. The results for simulated data sets show that PCA via PP is resistant to the deviation of the error distribution from the normal one. The method is especially recommended for use in cases with possible outlier(s) existing in the data.
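
A minimal sketch of the projection-pursuit idea: search for the direction that maximizes a robust spread measure (here the median absolute deviation) instead of the variance. Crude random search stands in for the paper's generalized simulated annealing, and the outlier-contaminated data are synthetic:

```python
import numpy as np

def robust_first_axis(X, n_trials=5000, seed=0):
    """Direction maximizing the MAD of the projected, median-centred data."""
    rng = np.random.default_rng(seed)
    centred = X - np.median(X, axis=0)
    best_w, best_spread = None, -np.inf
    for _ in range(n_trials):
        w = rng.standard_normal(X.shape[1])
        w /= np.linalg.norm(w)
        proj = centred @ w
        spread = np.median(np.abs(proj - np.median(proj)))
        if spread > best_spread:
            best_spread, best_w = spread, w
    return best_w

rng = np.random.default_rng(9)
X = rng.standard_normal((200, 3)) @ np.diag([3.0, 1.0, 0.3])  # main spread along x
X[:5] += np.array([0.0, 0.0, 40.0])                           # a few gross outliers

classical = np.linalg.eigh(np.cov(X, rowvar=False))[1][:, -1]  # classical PC1
robust = robust_first_axis(X)
print("classical PC1 :", np.round(classical, 2))
print("robust PP axis:", np.round(robust, 2))
```

The classical first component is pulled toward the outliers, while the MAD-based projection-pursuit direction stays close to the true dominant axis.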


Journal ArticleDOI
TL;DR: In this paper, the authors have introduced randomized variables into five data sets and found that a good result must recognize these randomized variables as noise and place them near the centroid of the principal components scattergram of variable loadings.
Abstract: There has been debate about whether standard principal components analysis is appropriate for the multivariate analysis of compositional data (e.g. oxide composition of glass). Loglinear transformation has been recommended by Aitchison as a prerequisite. This paper argues that previous comparisons of methodological merits have tended to circularity of argument by making assumptions about the form of a good multivariate result. To break the circularity of argument, the authors have introduced randomized variables into five data sets. A good result must recognize these randomized variables as noise and place them near the centroid of the principal components scattergram of variable loadings. Standard principal components analysis is found to perform better than loglinear transformation in its ability to recognize the randomized variables. It is concluded that loglinear transformation tends to introduce spurious structure into a table of compositional data. This paper is followed by a comment by M. J. Baxter.
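
The randomized-variable check is simple to reproduce in outline: append a column of pure noise to a compositional table, run PCA on the raw and on log-ratio-transformed data, and see how far the noise column sits from the centroid of the loading plot. The data below are synthetic, and the centred log-ratio is used as one convenient stand-in for the loglinear transformation discussed in the paper:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(10)
# Synthetic "oxide" compositions (rows sum to 1) plus one randomized column.
composition = rng.dirichlet(alpha=[8, 4, 2, 1, 1], size=120)
noise = rng.uniform(0.5, 1.5, size=(120, 1))
data = np.hstack([composition, noise])

def noise_distance_from_centroid(table):
    """Distance of the last (randomized) variable from the centroid of the
    PC1-PC2 loading scattergram, after standardizing the variables."""
    z = (table - table.mean(axis=0)) / table.std(axis=0)
    pca = PCA(n_components=2).fit(z)
    loadings = pca.components_.T * np.sqrt(pca.explained_variance_)
    return np.linalg.norm(loadings[-1] - loadings.mean(axis=0))

clr = np.log(data) - np.log(data).mean(axis=1, keepdims=True)  # centred log-ratio
print("noise distance from centroid, standard PCA:",
      round(noise_distance_from_centroid(data), 3))
print("noise distance from centroid, clr PCA     :",
      round(noise_distance_from_centroid(clr), 3))
```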

Journal ArticleDOI
01 Apr 1993
TL;DR: A neural model approach to perform adaptive calculation of the principal components (eigenvectors) of the covariance matrix of an input sequence is proposed and is shown to converge to the next dominant component that is linearly independent of all previously determined eigenvectors.
Abstract: A neural model approach to perform adaptive calculation of the principal components (eigenvectors) of the covariance matrix of an input sequence is proposed. The algorithm is based on the successive application of the modified Hebbian learning rule proposed by Oja on every new covariance matrix that results after calculating the previous eigenvectors. The approach is shown to converge to the next dominant component that is linearly independent of all previously determined eigenvectors. The optimal learning rate is calculated by minimising an error function of the learning rate along the gradient descent direction. The approach is applied to encode grey-level images adaptively, by calculating a limited number of the KLT coefficients that meet a specified performance criterion. The effect of changing the size of the input sequence (number of image subimages), the maximum number of coding coefficients on the bit-rate values, the compression ratio, the signal-to-noise ratio, and the generalisation capability of the model to encode new images are investigated.
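
The flavour of the algorithm can be conveyed with a short numpy sketch: Oja's rule estimates one eigenvector at a time, and deflation removes that component before the next pass. This is a bare-bones illustration with a fixed learning rate and data deflation in place of explicit covariance-matrix updates, not the authors' adaptive-rate image coder:

```python
import numpy as np

def oja_principal_components(X, n_components=2, lr=0.005, n_epochs=30, seed=0):
    """Estimate leading eigenvectors of the data covariance with Oja's rule,
    deflating the data after each component is found."""
    rng = np.random.default_rng(seed)
    data = X - X.mean(axis=0)
    components = []
    for _ in range(n_components):
        w = rng.standard_normal(data.shape[1])
        w /= np.linalg.norm(w)
        for _ in range(n_epochs):
            for x in data[rng.permutation(len(data))]:
                y = w @ x
                w += lr * y * (x - y * w)           # Oja's modified Hebbian rule
            w /= np.linalg.norm(w)
        components.append(w.copy())
        data = data - np.outer(data @ w, w)         # deflation: remove found component
    return np.array(components)

rng = np.random.default_rng(11)
X = rng.standard_normal((1000, 5)) @ np.diag([4.0, 2.0, 1.0, 0.5, 0.2])
estimated = oja_principal_components(X)
exact = np.linalg.eigh(np.cov(X, rowvar=False))[1][:, ::-1][:, :2].T
print("|cosine| with exact eigenvectors:",
      np.round(np.abs(np.sum(estimated * exact, axis=1)), 3))
```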

Journal ArticleDOI
TL;DR: The usefulness of principal component analysis (PCA) and factor analysis (FA) for source input elucidation in environmental studies using molecular markers for sample description was evaluated in this paper, where the determination of aliphatic and chlorinated hydrocarbons, fatty acids, alcohols, chlorophylls and some detergent indicators in water particulates from a deltaic system was selected as a representative testing dataset.

Journal ArticleDOI
TL;DR: It is shown that, at least in the average case, high degrees can be expected, which makes the procedure reasonable for many practical applications and shows that the success of body diagonality depends on the so‐called polarity of the core array.
Abstract: In contrast with conventional PCA, a direct superposition and joint interpretation of loading plots is not possible in three-way PCA, since there may be data variance which is described by unequal components of different modes. The contributions to variance of all possible combinations of components are described in the core matrix. Body diagonalization, which is achieved by appropriate rotation of component matrices, is an essential tool for simplifying the core matrix structure. The maximum degree of body diagonality which may be obtained from such transformations is analysed from both the mathematical and simulation viewpoints. It is shown that, at least in the average case, high degrees can be expected, which makes the procedure reasonable for many practical applications. Furthermore, simulation as well as theoretical derivation show that the success of body diagonality depends on the so-called polarity of the core array. The methodology is illustrated by a three-way data example from environmental chemistry.

Journal ArticleDOI
TL;DR: A modified version of the Grassberger-Procaccia algorithm is proposed to estimate the correlation dimension of an attractor and gives a clearer scaling region in the D_2(M, r) versus ln(r) diagram and thus a better estimate of D_2, especially when the data set is noisy and relatively small.

Journal Article
TL;DR: It is shown that, under certain conditions, it is possible to estimate the reliability using the results of a Principal Component Analysis only; the relation between Cronbach's alpha and the intraclass correlation coefficient, which are both used to estimate the reliability of continuous measures, is also reported.
Abstract: The objective is to establish a simple relationship between two frequently used validation techniques which have been developed in the literature along the same lines: Principal Component Analysis and Cronbach's alpha. We have shown that under certain conditions, it is possible to estimate the reliability by using the results of a Principal Component Analysis only. Moreover, we report the relation between Cronbach's alpha and intraclass correlation coefficient, which are both used to estimate the reliability of continuous measures.
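
One well-known version of such a relation, exact when the standardized items are equally intercorrelated (an assumption of this sketch, not necessarily the paper's precise conditions), expresses standardized alpha through the first eigenvalue lambda_1 of the item correlation matrix: alpha = p/(p-1) * (1 - 1/lambda_1). A numpy check on hypothetical one-trait items:

```python
import numpy as np

rng = np.random.default_rng(12)
# Hypothetical 6-item scale: one latent trait plus item-specific noise.
trait = rng.standard_normal((500, 1))
items = 0.7 * trait + 0.7 * rng.standard_normal((500, 6))
Z = (items - items.mean(axis=0)) / items.std(axis=0)
p = Z.shape[1]

# Cronbach's alpha from item variances and the total-score variance.
alpha = p / (p - 1) * (1 - Z.var(axis=0, ddof=1).sum() / Z.sum(axis=1).var(ddof=1))

# Estimate using only the first eigenvalue of the item correlation matrix.
lam1 = np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False)).max()
alpha_from_pca = p / (p - 1) * (1 - 1 / lam1)

print("Cronbach's alpha:", round(alpha, 3),
      " estimate from PCA:", round(alpha_from_pca, 3))
```

For items that load equally on a single trait the two numbers agree closely, which is the kind of shortcut the paper formalizes.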

Journal ArticleDOI
TL;DR: In this article, the combinations of classical bilinear models and neural nets, extended to neural net models on residuals from partial least squares (PLS), are discussed, and the performances of principal component regression (PCR), PLS, neural networks (NN), principal component analysis (PCA)-NN and PLS residuals-NN are compared on simulated data, near-infrared data and quantitative structure-activity relationship data.
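
A compact scikit-learn comparison in the same spirit, with synthetic collinear predictors standing in for the spectral and QSAR data; the residual-based hybrid models from the paper are not reproduced:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
# Spectra-like predictors: 100 samples, 50 highly collinear channels.
latent = rng.standard_normal((100, 3))
X = latent @ rng.standard_normal((3, 50)) + 0.1 * rng.standard_normal((100, 50))
y = latent[:, 0] - 0.5 * latent[:, 1] + 0.1 * rng.standard_normal(100)

models = {
    "PCR": make_pipeline(PCA(n_components=3), LinearRegression()),
    "PLS": PLSRegression(n_components=3),
    "NN":  MLPRegressor(hidden_layer_sizes=(10,), max_iter=5000, random_state=0),
}
for name, model in models.items():
    score = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean cross-validated R^2 = {score:.2f}")
```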