
Showing papers on "Principal component analysis published in 1992"


Journal ArticleDOI
TL;DR: In this paper, the authors introduce a conceptual framework for comparing methods that isolate important coupled modes of variability between time series of two fields, including principal component analysis with the fields combined (CPCA), canonical correlation analysis (CCA), and singular value decomposition of the covariance matrix between the two fields (SVD).
Abstract: This paper introduces a conceptual framework for comparing methods that isolate important coupled modes of variability between time series of two fields. Four specific methods are compared: principal component analysis with the fields combined (CPCA), canonical correlation analysis (CCA) and a variant of CCA proposed by Barnett and Preisendorfer (BP), principal component analysis of one single field followed by correlation of its component amplitudes with the second field (SFPCA), and singular value decomposition of the covariance matrix between the two fields (SVD). SVD and CPCA are easier to implement than BP, and do not involve user-chosen parameters. All methods are applied to a simple but geophysically relevant model system and their ability to detect a coupled signal is compared as parameters such as the number of points in each field, the number of samples in the time domain, and the signal-to-noise ratio are varied. In datasets involving geophysical fields, the number of sampling times ma...
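A minimal sketch of the SVD method described above, on synthetic data: the coupled modes are the singular vectors of the temporal cross-covariance matrix between the two fields. The field sizes, variable names, and the planted coupled signal are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
nt, nx, ny = 200, 30, 40          # sampling times, points in each field

# Synthetic anomalies sharing one coupled mode plus noise
t = rng.standard_normal(nt)
X = np.outer(t, rng.standard_normal(nx)) + rng.standard_normal((nt, nx))
Y = np.outer(t, rng.standard_normal(ny)) + rng.standard_normal((nt, ny))
X -= X.mean(axis=0)               # remove time means -> anomaly fields
Y -= Y.mean(axis=0)

C = X.T @ Y / (nt - 1)            # cross-covariance between the two fields
U, s, Vt = np.linalg.svd(C, full_matrices=False)

p1, q1 = U[:, 0], Vt[0]           # leading coupled spatial patterns
a, b = X @ p1, Y @ q1             # their expansion coefficients in time
print("squared covariance fraction of mode 1:", s[0]**2 / np.sum(s**2))
print("correlation of expansion coefficients:", np.corrcoef(a, b)[0, 1])
```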

1,482 citations


Journal ArticleDOI
TL;DR: The Stochastic Gradient Ascent (SGA) neural network is proposed and shown to be closely related to the Generalized Hebbian Algorithm (GHA); the SGA behaves better for extracting the less dominant eigenvectors.
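Since only the TL;DR survives here, a brief sketch of the Generalized Hebbian Algorithm that the SGA is related to may help; this is Sanger's rule, not the SGA itself, and the learning rate, dimensions, and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, eta = 5, 2, 0.01
A = rng.standard_normal((d, d))
cov = A @ A.T                                  # population covariance

W = 0.1 * rng.standard_normal((k, d))          # rows estimate eigenvectors
for _ in range(20000):
    x = rng.multivariate_normal(np.zeros(d), cov)
    y = W @ x
    # Sanger's rule: Hebbian term minus deflation by earlier outputs
    W += eta * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)

evals, evecs = np.linalg.eigh(cov)             # ascending eigenvalues
for i in range(k):
    w = W[i] / np.linalg.norm(W[i])
    print(f"|cos| with eigenvector {i + 1}:", abs(w @ evecs[:, -1 - i]))
```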

857 citations


05 Jun 1992
TL;DR: The Independent Component Analysis (ICA) of a random vector consists of searching for the linear transformation that minimizes the statistical dependence between its components.
Abstract: The Independent Component Analysis (ICA) of a random vector consists of searching for the linear transformation that minimizes the statistical dependence between its components. In order to design a practical optimization criterion, the mutual information is expressed as a function of cumulants. The concept of ICA may be seen as an extension of Principal Component Analysis, which only imposes independence up to second order and consequently defines directions that are orthogonal. Applications of ICA include data compression, detection and localization of sources, or blind identification and deconvolution.
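A rough illustration of the idea, not Comon's algorithm: whitening enforces independence up to second order, and a rotation is then chosen to extremize a fourth-order cumulant (kurtosis), a crude stand-in for the cumulant-based mutual-information criterion. The sources, mixing matrix, and angle search are all invented.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
S = np.vstack([rng.uniform(-1, 1, n),                 # two non-Gaussian sources
               np.sign(rng.standard_normal(n))])
X = np.array([[2.0, 1.0], [1.0, 3.0]]) @ S            # linear mixtures

# PCA whitening: independence up to second order only
X = X - X.mean(axis=1, keepdims=True)
evals, E = np.linalg.eigh(np.cov(X))
Z = np.diag(evals ** -0.5) @ E.T @ X

def kurt(u):                                          # 4th-order cumulant of a
    return np.mean(u ** 4) - 3.0                      # zero-mean, unit-variance u

# Rotate the whitened data to extremize the kurtosis contrast (2-D search)
best = max(np.linspace(0, np.pi, 1000),
           key=lambda th: abs(kurt(np.cos(th) * Z[0] + np.sin(th) * Z[1])))
R = np.array([[np.cos(best), np.sin(best)], [-np.sin(best), np.cos(best)]])
Y = R @ Z                                             # recovered components

print(np.round(np.abs(np.corrcoef(np.vstack([Y, S]))[:2, 2:]), 2))
# ~permutation matrix when separation succeeds (order/sign/scale ambiguous)
```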

652 citations


Journal ArticleDOI
TL;DR: In this paper, single field principal component analysis (PCA), direct singular value decomposition (SVD), canonical correlation analysis (CCA), and combined PCA of two fields are applied to a 39-winter dataset consisting of normalized seasonal mean sea surface temperature anomalies over the North Pacific and concurrent 500-mb height anomaly over the same region.
Abstract: Single field principal component analysis (PCA), direct singular value decomposition (SVD), canonical correlation analysis (CCA), and combined principal component analysis (CPCA) of two fields are applied to a 39-winter dataset consisting of normalized seasonal mean sea surface temperature anomalies over the North Pacific and concurrent 500-mb height anomalies over the same region. The CCA solutions are obtained by linear transformations of the SVD solutions. Spatial patterns and various measures of the variances and covariances explained by the modes derived from the different types of expansions are compared, with emphasis on the relative merits of SVD versus CCA. Results for two different analysis domains (i.e., the Pacific sector versus a full hemispheric domain for the 500-mb height field) are also compared in order to assess the domain dependence of the two techniques. The SVD solution is also compared with the results of 28 Monte Carlo simulations in which the temporal order of the SST gri...
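A minimal sketch of the linear-transformation link mentioned above: CCA solutions can be obtained from an SVD of the cross-covariance of the two fields after each is whitened, here in a truncated PC basis. The sizes and the synthetic coupled signal are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
nt, nx, ny, k = 150, 25, 35, 10
t = rng.standard_normal(nt)
X = np.outer(t, rng.standard_normal(nx)) + rng.standard_normal((nt, nx))
Y = np.outer(t, rng.standard_normal(ny)) + rng.standard_normal((nt, ny))
X -= X.mean(axis=0)
Y -= Y.mean(axis=0)

def whiten(Z, k):
    """Unit-variance scores of the leading k principal components."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U[:, :k] * np.sqrt(Z.shape[0] - 1)

Wx, Wy = whiten(X, k), whiten(Y, k)

# SVD of the whitened cross-covariance: the singular values are the
# canonical correlations, and the singular vectors give the CCA modes
C = Wx.T @ Wy / (nt - 1)
U, r, Vt = np.linalg.svd(C)
print("leading canonical correlations:", np.round(r[:3], 2))
```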

619 citations



Journal ArticleDOI
TL;DR: This paper compares and contrasts the objectives of principal component analysis and exploratory factor analysis, and points out that, as an alternative to factor analysis, it may sometimes be useful to rotate certain principal components.
Abstract: In this paper we compare and contrast the objectives of principal component analysis and exploratory factor analysis. This is done through consideration of nine examples. Basic theory is presented in appendices. As well as covering the standard material, we also describe a number of recent developments. As an alternative to factor analysis, it is pointed out that in some cases it may be useful to rotate certain principal components.
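A small sketch of the rotation idea mentioned above, assuming a standard varimax criterion (one common choice; the paper discusses rotation more generally). The data and the two-block correlation structure are invented for illustration.

```python
import numpy as np

def varimax(L, tol=1e-8, max_iter=500):
    """Orthogonal varimax rotation of a loading matrix L (variables x factors)."""
    p, k = L.shape
    R, last = np.eye(k), 0.0
    for _ in range(max_iter):
        LR = L @ R
        G = L.T @ (LR**3 - LR @ np.diag(np.sum(LR**2, axis=0)) / p)
        U, s, Vt = np.linalg.svd(G)
        R = U @ Vt
        if abs(np.sum(s) - last) < tol:
            break
        last = np.sum(s)
    return L @ R

rng = np.random.default_rng(4)
X = rng.standard_normal((100, 6))
X[:, :3] += rng.standard_normal((100, 1))       # one block of correlated variables
X[:, 3:] += rng.standard_normal((100, 1))       # a second, independent block
X = (X - X.mean(axis=0)) / X.std(axis=0)

evals, evecs = np.linalg.eigh(np.cov(X.T))      # ascending order
loadings = evecs[:, -2:] * np.sqrt(evals[-2:])  # loadings of the top two PCs
print(np.round(varimax(loadings), 2))           # rotated toward simple structure
```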

475 citations


Journal ArticleDOI
TL;DR: In this article, a multivariate screening procedure is presented for the evaluation of these potential source solutions, which are groundwater and soil water from different horizons, in the case of stream water.
Abstract: Traditional multivariate data analysis techniques, such as principal components analysis (PCA), have often been used in an attempt to identify source solutions from potential mixtures, such as stream water. Artificial data, generated from conservative mixing of known source solutions in random proportions, are employed to demonstrate that PCA should be used only to determine the rank of the mixture and not to determine the composition of the source solutions. The rank of the mixture is related to the number of source solutions. Unambiguous identification of the source solution compositions from the mixture alone is impossible; thus it is necessary that potential source solutions be derived from independent measurements. In the case of stream water, possible source solutions are groundwater and soil water from different horizons. A multivariate screening procedure is presented for the evaluation of these potential source solutions.
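A minimal sketch of the paper's central point, using artificial data as in the abstract: conservative mixing of three source solutions in random proportions yields a data cloud whose PCA rank reveals the number of sources, while the eigenvectors are not the source compositions.

```python
import numpy as np

rng = np.random.default_rng(5)
n_samples, n_solutes = 200, 8
sources = rng.uniform(0, 10, size=(3, n_solutes))     # 3 source compositions

f = rng.dirichlet(np.ones(3), size=n_samples)         # random mixing fractions
mix = f @ sources + 0.01 * rng.standard_normal((n_samples, n_solutes))

Xc = mix - mix.mean(axis=0)
evals = np.linalg.eigvalsh(np.cov(Xc.T))[::-1]        # descending eigenvalues
print(np.round(evals / evals.sum(), 4))
# Only 3 - 1 = 2 eigenvalues rise clearly above noise: mixing fractions sum
# to one, so 3 sources span a 2-D simplex. PCA recovers this rank, but its
# eigenvectors are not the source compositions themselves.
```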

448 citations


Journal ArticleDOI
TL;DR: The empirical approach to exploratory data analysis should be viewed as complementary to the more robust treatments that statistical methodologies afford; the approach is illustrated with a single example of chemical composition data obtained on environmental dust particles.

262 citations


Journal ArticleDOI
TL;DR: In this paper, an alternative definition of a principal curve, a smooth curve passing through the "middle" of a distribution or data cloud, is presented; the definition is based on a mixture model, with estimation carried out through an EM algorithm.
Abstract: A principal curve (Hastie and Stuetzle, 1989) is a smooth curve passing through the ‘middle’ of a distribution or data cloud, and is a generalization of linear principal components. We give an alternative definition of a principal curve, based on a mixture model. Estimation is carried out through an EM algorithm. Some comparisons are made to the Hastie-Stuetzle definition.

247 citations


Journal ArticleDOI
TL;DR: In this paper, a closed-form solution to principal component analysis in the limit of small window widths is derived, which explains the relationship between delays, derivatives, and principal components, and shows how the singular spectrum scales with dimension and delay time.
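Only the TL;DR survives here, so the following is a generic sketch of PCA on a sliding-window (time-delay) embedding, the setting the paper analyzes; the fact that the leading PCs resemble local averages and derivatives for small windows is only illustrated, not derived. Window width, signal, and noise level are invented.

```python
import numpy as np

rng = np.random.default_rng(6)
t = np.linspace(0, 40, 2000)
x = np.sin(t) + 0.01 * rng.standard_normal(t.size)

w = 9                                                  # small window width
emb = np.lib.stride_tricks.sliding_window_view(x, w)   # delay vectors
emb = emb - emb.mean(axis=0)

evals, evecs = np.linalg.eigh(emb.T @ emb / (emb.shape[0] - 1))
print("singular spectrum:", np.round(np.sqrt(evals[::-1]), 3))
print("PC1 weights:", np.round(evecs[:, -1], 2))   # ~constant: a local average
print("PC2 weights:", np.round(evecs[:, -2], 2))   # ~linear ramp: ~derivative
```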

213 citations


Journal ArticleDOI
TL;DR: In this paper, the authors define canonical discriminant functions as linear combinations that separate groups of observations, and canonical variates as linear combinations associated with canonical correlations between two sets of variables; the standardized coefficients of either type of function can be converted to correlations between the variables and the canonical function.
Abstract: Canonical discriminant functions are defined here as linear combinations that separate groups of observations, and canonical variates are defined as linear combinations associated with canonical correlations between two sets of variables. In standardized form, the coefficients in either type of canonical function provide information about the joint contribution of the variables to the canonical function. The standardized coefficients can be converted to correlations between the variables and the canonical function. These correlations generally alter the interpretation of the canonical functions. For canonical discriminant functions, the standardized coefficients are compared with the correlations, with partial t and F tests, and with rotated coefficients. For canonical variates, the discussion includes standardized coefficients, correlations between variables and the function, rotation, and redundancy analysis. Various approaches to interpretation of principal components are compared: the choice ...

Journal ArticleDOI
TL;DR: In this article, a simple principal component analysis is used to identify important modes of variation among the curves, and principal component scores are used to identify particular curves which clearly demonstrate the form and extent of that variation.
Abstract: Naively displaying a large collection of curves by superimposing them one on another all on the same graph is largely uninformative and aesthetically unappealing. We propose that a simple principal component analysis be used to identify important modes of variation among the curves and that principal component scores be used to identify particular curves which clearly demonstrate the form and extent of that variation. As a result, we obtain a small number of figures on which are plotted a very few “representative” curves from the original collection; these successfully convey the major information present in sets of “similar” curves in a clear and attractive manner. Useful adjunct displays, including the plotting of principal component scores against covariates, are also described. Two examples—one concerning a data-based bandwidth selection procedure for kernel density estimation, the other involving ozone level curve data—illustrate the ideas.
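A minimal sketch of the proposed display strategy: PCA of the curve collection, then the curves with extreme scores on each leading component serve as representatives. The synthetic curves and their two modes of variation (level and tilt) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
grid = np.linspace(0, 1, 50)
n = 60
# Synthetic collection: random levels and tilts around a common mean shape
curves = (np.sin(2 * np.pi * grid)
          + rng.standard_normal((n, 1))               # mode 1: level shift
          + rng.standard_normal((n, 1)) * grid)       # mode 2: tilt

U, s, Vt = np.linalg.svd(curves - curves.mean(axis=0), full_matrices=False)
scores = U * s                                        # PC scores of each curve

for j in range(2):
    lo, hi = np.argmin(scores[:, j]), np.argmax(scores[:, j])
    share = 100 * s[j]**2 / np.sum(s**2)
    print(f"PC{j + 1} ({share:.0f}% of variation): "
          f"plot curves {lo} and {hi} as representatives")
```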

Journal ArticleDOI
TL;DR: In this paper, a geostatistical technique, factorial kriging analysis (FKA), is described; its basic feature is the fitting of a linear model of coregionalization, i.e., all experimental simple and cross-variograms are modelled with a linear combination of basic variogram functions.
Abstract: Most studies of relations between soil properties fail to take account of their regionalized nature because of the lack of appropriate methods. This paper describes a geostatistical technique, factorial kriging analysis, that bridges the gap between classical multivariate analysis and a univariate geostatistical approach. The basic feature of the method is the fitting of a linear model of coregionalization, i.e. all experimental simple and cross-variograms are modelled with a linear combination of basic variogram functions. A particular variance covariance matrix, the coregionalization matrix, can then be associated with each spatial scale defined by the range of the basic variogram function. Each coregionalization matrix describes relationships between variables at given spatial scale. A principal component analysis of these matrices produces a set of components, the regionalized factors, that reflect the main features of the multivariate information for each spatial scale and whose scores are estimated by cokriging. The technique is described and illustrated with three case studies based on a simulated data set and soil survey data. The results are compared with those of the principal component analysis of the variance-covariance matrix and the variogram matrices.

Journal ArticleDOI
TL;DR: A huge amount of data is collected by computer monitoring systems in the chemical process industry, and such tools as principal component analysis and partial least squares have been shown to be very effective in compressing this large volume of noisy correlated data into a subspace of much lower dimension than the original data set.
Abstract: A huge amount of data is collected by computer monitoring systems in the chemical process industry. Such tools as principal component analysis and partial least squares have been shown to be very effective in compressing this large volume of noisy correlated data into a subspace of much lower dimension than the original data set. Because most of what is eliminated is the collinearity of the original variables and the noise, the bulk of the information contained in the original data set is retained. The resulting low dimensional representation of the data set has been shown to be of great utility for process analysis and monitoring, as well as in selecting variables for control. These types of models can also be used directly in control system design. One way of approaching this is to use the loading matrices as compensators on the plant. Some advantages of using this approach as part of the overall control system design include automatic decoupling and efficient loop pairing, as well as natural handling of nonsquare systems and poorly conditioned systems.
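A minimal sketch of PCA-based monitoring in the spirit described above, with the two usual monitoring statistics (Hotelling's T² within the model, and the squared prediction error, or Q, in the residual space). Control limits are omitted, and all data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(8)
n, m, k = 500, 12, 3
X = rng.standard_normal((n, k)) @ rng.standard_normal((k, m)) \
    + 0.1 * rng.standard_normal((n, m))        # noisy, highly correlated data

mu, sd = X.mean(axis=0), X.std(axis=0)
Xs = (X - mu) / sd
U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
P = Vt[:k].T                                   # loading matrix
lam = s[:k] ** 2 / (n - 1)                     # variances of retained scores

def monitor(x):
    xs = (x - mu) / sd
    t = xs @ P                                 # scores in the PCA subspace
    T2 = np.sum(t**2 / lam)                    # Hotelling's T^2
    spe = np.sum((xs - t @ P.T) ** 2)          # squared prediction error (Q)
    return T2, spe

x_fault = X[0].copy()
x_fault[-1] += 5.0                             # fault in one sensor
print("normal: T2=%.1f SPE=%.2f" % monitor(X[0]))
print("fault:  T2=%.1f SPE=%.2f" % monitor(x_fault))
```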

Journal ArticleDOI
TL;DR: A new method for estimating sources in the frequency domain which fits dipoles to the whole crosspectrum is applied to explain the characteristics of the localized sources.
Abstract: The structure of the normal resting EEG crosspectrum Svv(ω) is analyzed using complex multivariate statistics. Exploratory data analysis with Principal Component Analysis (PCA) is followed by hypothesis testing and computer simulations related to possible neural generators. The Svv(ω) of 211 normal individuals (ages 5 to 97) may be decomposed into two types of processes: the ξ process with spatial isotropicity reflecting diffuse, correlated cortical generators with radial symmetry, and processes that seem to be generated by more spatially concentrated, correlated sources. The latter are reflected as spectral peaks such as the process. The eigenvectors of the ξ process are the Spherical Harmonic Functions which explains the recurring pattern of maps characteristic of the spatial PCA of qEEG data. A new method for estimating sources in the frequency domain which fits dipoles to the whole crosspectrum is applied to explain the characteristics of the localized sources.

Journal ArticleDOI
TL;DR: In this paper, an alternative procedure is discussed that treats PC selection as an optimization problem: without any regard to the ordering, the optimal subset of PCs for an acceptable predictive model is sought. Five data sets are analyzed using the conventional and alternative approaches.
Abstract: Principal components (PCs) for principal component regression (PCR) have historically been selected from the top down for a reliable predictive model. That is, the PCs are arranged in a list starting with the most informative (PC associated with the largest singular value) and proceeding to the least informative (PC associated with the smallest singular value). PCs are then chosen starting at the top of this list. This paper discusses an alternative procedure of treating PC selection as an optimization problem. Specifically, without any regard to the ordering, the optimal subset of PCs for an acceptable predictive model is desired. Five data sets are analyzed using the conventional and alternative approaches. Two data sets are spectroscopic in nature, two data sets deal with quantitative structure-activity relationships (QSARs) and one data set is concerned with modeling. All five data sets confirm that selection of a subset without consideration to order secures the best results with PCR. One data set is also compared using partial least squares 1.
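A small sketch contrasting the two selection strategies, assuming cross-validated mean squared error as the acceptability measure (the paper's own criterion may differ). The response is deliberately tied to a lower-ranked PC so the subset search wins; the exhaustive search over small subsets is illustrative only.

```python
import numpy as np
from itertools import combinations
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(9)
n, m = 80, 10
X = rng.standard_normal((n, m))
U, s, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
T = U * s                                        # all PC scores, ordered by size
y = 2 * T[:, 7] + 0.1 * rng.standard_normal(n)   # response tied to a *minor* PC

def cv_mse(cols):
    return -cross_val_score(LinearRegression(), T[:, list(cols)], y,
                            scoring="neg_mean_squared_error", cv=5).mean()

k_best = min(range(1, m + 1), key=lambda k: cv_mse(range(k)))      # top-down
subset = min((c for r in (1, 2, 3) for c in combinations(range(m), r)),
             key=cv_mse)                                           # optimization
print("top-down, PCs 1..%d: MSE=%.3f" % (k_best, cv_mse(range(k_best))))
print("optimal subset %s:   MSE=%.3f" % (subset, cv_mse(subset)))
```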

Journal ArticleDOI
TL;DR: In this article, a locally weighted regression (LWR) method is used for diffuse near-infrared transmittance spectroscopy (NIRTS) data from beef and pork samples.
Abstract: This paper presents an application of locally weighted regression (LWR) in diffuse near-infrared transmittance spectroscopy. The data are from beef and pork samples. The LWR method is based on the idea that a nonlinearity can be approximated by local linear equations. Different weight functions (for the samples) as well as different distance measures for “closeness” are tested. The LWR is compared to principal component regression and partial least-squares regression. The LWR with weighted principal components is shown to give the best results. The improvements with respect to linear regression are up to 15% of the prediction errors.
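A minimal sketch of LWR with principal components, assuming a Gaussian weight function and Euclidean distance in PC space as the "closeness" measure (two of the choices the paper compares). The bandwidth, data, and nonlinearity are invented.

```python
import numpy as np

rng = np.random.default_rng(10)
n, m, k = 200, 20, 3
latent = rng.standard_normal((n, 3))
X = latent @ rng.standard_normal((3, m)) + 0.05 * rng.standard_normal((n, m))
y = np.sin(latent[:, 0]) + 0.05 * rng.standard_normal(n)   # nonlinear response

mu = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
T = (X - mu) @ Vt[:k].T                          # calibration PC scores

def lwr_predict(x_new, bandwidth=2.0):
    t_new = (x_new - mu) @ Vt[:k].T
    d = np.linalg.norm(T - t_new, axis=1)        # "closeness" in PC space
    sw = np.exp(-0.5 * (d / bandwidth) ** 2)     # Gaussian sample weights
    A = np.hstack([np.ones((n, 1)), T])
    # local linear model by least squares on weight-scaled rows
    beta = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)[0]
    return np.concatenate([[1.0], t_new]) @ beta

print("predicted %.3f, actual %.3f" % (lwr_predict(X[0]), y[0]))
```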

Journal ArticleDOI
TL;DR: In this paper, principal component maps of the gene arrangement frequencies of 108 natural populations in Europe, North Africa and the Middle East were prepared to investigate the evolutionary forces shaping the geographic variation of inversion frequencies.
Abstract: Principal component maps of the gene arrangement frequencies of 108 natural populations in Europe, North Africa and the Middle East were prepared to investigate the evolutionary forces shaping the geographic variation of inversion frequencies. Principal component maps were also prepared from ten climate variables at 347 localities of the same region. The first inversion principal component (18% of total variation) showed a N-S cline strikingly similar to the pattern exhibited by the first principal component of climatic variables. This resemblance is interpreted as showing the outcome of a selective process, which favors the increase in frequency of the Standard gene arrangements when moving to the north. This interpretation is corroborated by the fact that such clines were formed in South and North America, following recent colonization by this species. Patterns shown by the second (12% of total variance) and third (8% of total variance) principal components are interpreted as related to historical events, the migrational advance to the north after the end of the last glaciation and the locations of the species refugia at that time.


Journal ArticleDOI
TL;DR: A generalized principal components transform (PCT) that maximizes the signal-to-noise ratio (SNR) and that is tailored to the multiplicative speckle noise characteristics of polarimetric SAR images is developed; it makes automated image segmentation and better human interpretation possible.
Abstract: A generalized principal components transform (PCT) that maximizes the signal-to-noise ratio (SNR) and that is tailored to the multiplicative speckle noise characteristics of polarimetric SAR images is developed. An implementation procedure that accurately estimates the signal and the noise covariance matrices is established. The properties of the eigenvalues and eigenvectors are investigated, revealing that the eigenvectors are not orthogonal, but the principal component images are statistically uncorrelated. Both amplitude (or intensity) and phase difference images are included for the PCT computation. The NASA/JPL polarimetric SAR imagery of P, L, and C bands and quad polarizations is used for illustration. The capabilities of this principal components transformation in information compression and speckle reduction make automated image segmentation and better human interpretation possible.
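A minimal sketch of an SNR-maximizing construction of this kind: the transform directions solve the generalized eigenproblem Cs v = λ Cn v, so the eigenvectors need not be orthogonal while the component images remain uncorrelated, as the abstract notes. The signal and noise covariances below are synthetic stand-ins for the estimated ones.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(11)
m = 4
As, An = rng.standard_normal((m, m)), rng.standard_normal((m, m))
Cs, Cn = As @ As.T, An @ An.T + np.eye(m)   # signal and noise covariances

lam, V = eigh(Cs, Cn)                       # solves Cs v = lambda Cn v
lam, V = lam[::-1], V[:, ::-1]              # order components by decreasing SNR

total = V.T @ (Cs + Cn) @ V                 # covariance of the components
print("component SNRs:", np.round(lam, 2))
print("eigenvectors orthogonal?", np.allclose(V.T @ V, np.eye(m)))  # False
print("components uncorrelated?",
      np.allclose(total, np.diag(np.diag(total))))                  # True
```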


Journal ArticleDOI
TL;DR: Directional transfer functions obtained in this laboratory, using substantially different measurement techniques, yielded principal component basis vectors that are remarkably similar to those reported by Kistler and Wightman.
Abstract: A recent principal components analysis (Kistler and Wightman, 1992) has shown that the transfer functions of the human external ear, for a wide range of source locations, can be expressed as weighted sums of a small number of basis vectors. Directional transfer functions obtained in this laboratory, using substantially different measurement techniques, yielded principal component basis vectors that are remarkably similar to those reported by Kistler and Wightman. When this subject population was divided in half according to the overall physical sizes of subjects, basis vectors computed for the subpopulation of smaller subjects were shifted systematically to higher frequencies relative to those computed for the subpopulation of larger subjects.

Journal ArticleDOI
TL;DR: In this article, a criterion of stability for PCA scatterplots is defined based on a classical distance between projectors, which is constructed as a risk function and can be estimated by bootstrap or jackknife methods.
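A minimal sketch of the bootstrap estimate: the risk of a k-dimensional PCA scatterplot is approximated by the average distance between the projector fitted on the data and projectors fitted on bootstrap resamples. The Frobenius norm is used here as one classical distance between projectors; the data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(12)
n, m, k = 100, 6, 2
X = rng.standard_normal((n, m)) * np.array([3.0, 1.5, 1.4, 1.0, 1.0, 1.0])
# the 2nd and 3rd population variances are close, so the 2-D plane is unstable

def projector(Z, k):
    _, _, Vt = np.linalg.svd(Z - Z.mean(axis=0), full_matrices=False)
    return Vt[:k].T @ Vt[:k]

P = projector(X, k)
dists = [np.linalg.norm(projector(X[rng.integers(0, n, n)], k) - P)
         for _ in range(200)]                       # bootstrap resamples
print("estimated instability of the 2-D scatterplot: %.3f" % np.mean(dists))
```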

Book ChapterDOI
TL;DR: This chapter discusses the important role played by correlation when assessing similarity, and introduces the properties of principal component modelling of relevance to a classification problem.
Abstract: Publisher Summary The classification method Soft Independent Modelling of Class Analogies (SIMCA) is a method in which each class of samples is described by its own principal component model. Thus, in principle, any degree of data collinearity can be accommodated by the models. The chapter opens with a discussion of the important role played by correlation when assessing similarity, and introduces the properties of principal component modelling of relevance to a classification problem. All basic concepts and steps in the SIMCA approach to supervised modelling are thoroughly explored using chemical data obtained in an environmental study. Definition of distance is central in all classification procedures. Euclidean distance in variable space is the most commonly used for measuring similarity between samples. This measure is presented in two-dimensional space. Principal component modelling plays two different roles in the classification of multivariate data: (1) it is a tool for data reduction to obtain low-dimensional orthogonal representations of the multivariate variable- and object-space in which object and variable relationships can be explored, and (2) it is used in the SIMCA method to separate the data into a model and a residual matrix from which a scale can be obtained for later classification of samples. Sometimes SIMCA classification is preceded by an unsupervised principal component modelling of the whole data set. The process of detecting and deleting outliers represents one side of the process termed "polishing of classes."
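A minimal sketch of the SIMCA idea described above: one PCA model per class, with new samples assigned by their residual distance to each class model. Scaling, significance thresholds, and the "polishing" steps are omitted; the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(14)

def fit_class_model(X, k):
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]                      # class-specific PC model

def residual(x, model):
    mu, P = model
    r = (x - mu) - (x - mu) @ P.T @ P      # part not explained by the model
    return np.linalg.norm(r)

# Two synthetic classes living on different low-dimensional subspaces
A = rng.standard_normal((40, 2)) @ rng.standard_normal((2, 6))
B = 3.0 + rng.standard_normal((40, 2)) @ rng.standard_normal((2, 6))
models = [fit_class_model(A, 2), fit_class_model(B, 2)]

x = B[0] + 0.01 * rng.standard_normal(6)
print("assigned class:", int(np.argmin([residual(x, m) for m in models])))
```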

Journal ArticleDOI
TL;DR: Multivariate analyses of meteorological data were used to partition the state into homogeneous climatic zones and principal component scores were predicted across the state along a grid composed of township line intersections.
Abstract: As part of a project to develop a productivity-oriented site classification system for spruce and fir in Maine, multivariate analyses of meteorological data were used to partition the state into homogeneous climatic zones. Data were obtained for 63 weather stations reporting both temperature and precipitation in Maine during the period 1954–1983. Monthly means were computed for each variable over the period of record and summarized by four 3-month seasons. Eighty-two percent of the variation in the 37 variables was accounted for by the first three principal components. Cluster analysis identified eight homogeneous groups of weather stations. Results from the principal components analysis were spatially extrapolated across the state using stepwise regression to define the relationship between the first two principal components and the location variables latitude, longitude, and elevation. Principal component scores were predicted across the state along a grid composed of township line intersections. The Tr...
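A minimal sketch of the workflow described above: PCA of station climate variables, clustering of stations into homogeneous groups, and regression of the leading PC scores on location variables for spatial extrapolation. All sizes and data are synthetic stand-ins, and the stepwise aspect of the regression is omitted.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(16)
n_stations = 63
loc = rng.uniform(size=(n_stations, 3))       # latitude, longitude, elevation
climate = loc @ rng.standard_normal((3, 37)) \
          + 0.3 * rng.standard_normal((n_stations, 37))

scores = PCA(n_components=3).fit_transform(climate)   # "first three PCs"
zones = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(scores)

# Regress the first two PCs on location to predict scores at grid points
reg = LinearRegression().fit(loc, scores[:, :2])
grid = rng.uniform(size=(5, 3))               # stand-in grid locations
print(np.round(reg.predict(grid), 2))
```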

Journal ArticleDOI
TL;DR: In this paper, the effect of process parameters (i.e., pressure, RF power, and gas mixture) on the optical emission and mass spectra of a CHF3/O2 plasma was investigated.
Abstract: We report on a simple technique that characterizes the effect of process parameters (i.e., pressure, RF power, and gas mixture) on the optical emission and mass spectra of a CHF3/O2 plasma. This technique is sensitive to changes in chamber contamination levels (e.g., formation of Teflon-like thin film), and appears to be a promising tool for real-time monitoring and control of reactive ion etching. Through principal component analysis, we observe that 99% of the variance in the more than 1100 optical and mass spectra channels is accounted for by the first four principal components of each sensor. Projection of the mass spectrum on its principal components suggests a strong linear relationship with respect to chamber pressure. This representation also shows that the effect of changes in thin-film levels, gas mixture, and RF power on the mass spectrum is complicated, but predictable. To model the nonlinear relationship between these process parameters and the principal component projections, a feedforward, multi-layered neural network is trained and is shown to be able to predict all process parameters from either the mass or the optical spectrum. The projections of the optical emission spectrum on its principal components suggest that optical emission spectroscopy is much more sensitive to changes in RF power than the mass spectrum, as measured by the residual gas analyzer. Model performance can be significantly improved if both the optical and mass spectrum projections are used (so-called sensor fusion). Our analysis indicates that accurate estimates of process parameters and chamber conditions can be made with relatively simple neural network models which fuse the principal components of the measured optical emission and mass spectra.

In the reactive ion etching (RIE) process, plasma characteristics depend on many parameters; some of these parameter values are set by the tool operator, e.g., chamber pressure, RF power, and gas flow, while others are internal to the condition of the chamber, e.g., thin-film thickness on the chamber walls, or the amount of material etched. Plasma characteristics can be observed using in situ measurements, e.g., via optical emission spectroscopy (OES) or residual gas analysis (RGA). How these measurements can be used to estimate the process parameters is the question.
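A minimal sketch of the modeling chain described above: project high-dimensional spectra onto a few principal components, then train a small feedforward network to map the projections to process parameters. Only one synthetic "sensor" is used here (the paper fuses projections from both the optical and mass spectra), and the spectra, network size, and train/test split are invented.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(13)
n, n_channels = 300, 1100
params = rng.uniform(0, 1, size=(n, 3))          # pressure, RF power, gas mix
basis = rng.standard_normal((3, n_channels))
spectra = np.tanh(params @ basis) \
          + 0.01 * rng.standard_normal((n, n_channels))   # nonlinear response

model = make_pipeline(StandardScaler(),
                      PCA(n_components=4),        # "first four PCs" per sensor
                      MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000,
                                   random_state=0))
model.fit(spectra[:200], params[:200])
print("held-out R^2:", round(model.score(spectra[200:], params[200:]), 2))
```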

Journal ArticleDOI
TL;DR: In this paper, the principal component analysis of the SAGE II extinction kernels was used to estimate both total aerosol mass and aerosol backscatter at a variety of wavelengths.
Abstract: SAGE II multiwavelength aerosol extinction measurements are used to estimate mass- and extinction-to-backscatter conversion parameters. The basis of the analysis is the principal component analysis of the SAGE II extinction kernels to estimate both total aerosol mass and aerosol backscatter at a variety of wavelengths. Comparisons of coincident SAGE II extinction profiles with 0.694-micron aerosol backscatter profiles demonstrate the validity of the method.

Journal ArticleDOI
TL;DR: A random sample of 490 Tabellaria specimens was analyzed using the harmonic amplitudes of the Fourier transformations of their valve outlines as shape descriptors to reveal many new morphological characteristics related to valve shape.
Abstract: A random sample of 490 Tabellaria specimens was analyzed using the harmonic amplitudes of the Fourier transformations of their valve outlines as shape descriptors. Principal component analysis (PCA) was applied to the sample to reduce dimensionality. The problem of non-normal distribution of these descriptors due to cell division was solved by sub-sectioning the entire data set based on its distribution on the first three components (PC1, PC2, and PC3) of the overall PCA. Each of the subsets was then analyzed by PCA. Shape groups from subset clusters were compared with one another and then similar groups were congregated into one growth series. Eight distinct shape groups were found. The results agree with some previous classical observations on the genus and at the same time reveal many new morphological characteristics related to valve shape. These new characteristics are impossible to obtain without appropriate specimen sampling, quantitative shape description, and data analysis techniques.
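A minimal sketch of the descriptor pipeline: harmonic amplitudes of the Fourier transform of each closed valve outline serve as shape descriptors, and PCA reduces their dimensionality. The synthetic outlines and the number of harmonics are illustrative.

```python
import numpy as np

rng = np.random.default_rng(15)
theta = np.linspace(0, 2 * np.pi, 128, endpoint=False)

def outline_descriptors(n_harmonics=10):
    # Radius-vs-angle outline with random low-order shape variation
    r = (1 + 0.3 * rng.standard_normal() * np.cos(2 * theta)
           + 0.1 * rng.standard_normal() * np.cos(3 * theta))
    z = r * np.exp(1j * theta)                   # closed outline in the plane
    coeffs = np.fft.fft(z) / z.size
    return np.abs(coeffs[1:n_harmonics + 1])     # harmonic amplitudes

X = np.array([outline_descriptors() for _ in range(490)])
U, s, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
scores = U * s                                   # PC1-PC3 used for subsetting
print("variance explained by PC1-PC3:",
      round(float(np.sum(s[:3] ** 2) / np.sum(s ** 2)), 2))
```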

Journal ArticleDOI
TL;DR: Thirty-eight artifactual factors were identified which, alone, could not discriminate age but were relatively successful in discriminating gender and dementia; the need to parsimoniously develop real neurophysiologic measures and to objectively exclude artifact is discussed.
Abstract: Principal components analysis (PCA) was performed on the 1536 spectral and 2944 evoked potential (EP) variables generated by neurophysiologic paradigms including flash VER, click AER, and eyes open and closed spectral EEG from 202 healthy subjects aged 30 to 80. In each case data dimensionality of 1500 to 3000 was substantially reduced using PCA by magnitudes of 20 to over 200. Just 20 PCA factors accounted for 70% to 85% of the variance. Visual inspection of the topographic distribution of factor loading scores revealed complex loadings across multiple data dimensions (time-space and frequency-space). Forty-two non-artifactual factors were successful in classifying age, gender, and a separate group of 60 demented patients by linear discriminant analysis. Discrimination of age and gender primarily involved EP derived factors, whereas dementia primarily involved EEG derived factors. Thirty-eight artifactual factors were identified which, alone, could not discriminate age but were relatively successful in discriminating gender and dementia. The need to parsimoniously develop real neurophysiologic measures and to objectively exclude artifact are discussed. Unrestricted PCA is suggested as a step in this direction.

Journal ArticleDOI
TL;DR: A general-purpose method is presented for approximating an arbitrary continuous function on a compact set from a given set of observations; it is based on a feedforward multilayer network and embeds both classical data analysis techniques and connectionist or neural network techniques.