
Showing papers on "Principal component analysis" published in 1990


Book
01 Aug 1990
TL;DR: The basic structure of a data matrix and principal components analysis are presented, along with multidimensional preference scaling, correspondence analysis of contingency tables and of non-frequency data, ordination, seriation and Guttman scaling, and multiple correspondence analysis.
Abstract: Contents: Introduction; The Basic Structure of a Data Matrix; Principal Components Analysis; Multidimensional Preference Scaling; Correspondence Analysis of Contingency Tables; Correspondence Analysis of Non-Frequency Data; Ordination, Seriation, and Guttman Scaling; Multiple Correspondence Analysis.

238 citations


Proceedings ArticleDOI
03 Apr 1990
TL;DR: An algorithm called APEX which can recursively compute the principal components of a vector stochastic process using a linear neural network is proposed, and its computational advantages over previously proposed methods are demonstrated.
Abstract: The problem of the recursive computation of the principal components of a vector stochastic process is discussed. The applications of this problem arise in modeling of control systems, high-resolution spectrum analysis, image data compression, motion estimation, etc. An algorithm called APEX which can recursively compute the principal components using a linear neural network is proposed. The algorithm is recursive and adaptive: given the first m-1 principal components, it can produce the mth component iteratively. The numerical theoretical basis of the fast convergence of the APEX algorithm is given, and its computational advantages over previously proposed methods are demonstrated. Extension to extracting constrained principal components using APEX is also discussed.

179 citations
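
The recursive, one-more-component-at-a-time extraction described above can be illustrated with a small numerical sketch: an Oja-type Hebbian update learns the m-th principal component after the contribution of the first m-1 components has been deflated from each input. This is not the exact APEX lateral-weight formulation, and the synthetic data, learning rate, and epoch count are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic zero-mean data with a few dominant directions (assumed example data).
X = rng.normal(size=(2000, 5)) @ np.diag([3.0, 2.0, 1.0, 0.5, 0.2])
X -= X.mean(axis=0)

def next_component(X, previous, lr=1e-3, epochs=20):
    """Learn one more principal component with an Oja-type Hebbian rule,
    given the already-extracted components `previous` (shape: k x d)."""
    d = X.shape[1]
    w = np.random.default_rng(1).normal(size=d)
    w /= np.linalg.norm(w)
    for _ in range(epochs):
        for x in X:
            # Deflate: remove the part of x explained by earlier components.
            x_res = x - previous.T @ (previous @ x) if len(previous) else x
            y = w @ x_res
            w += lr * y * (x_res - y * w)   # Oja's rule keeps ||w|| near 1
        w /= np.linalg.norm(w)
    return w

components = np.empty((0, X.shape[1]))
for m in range(3):                          # extract the first three PCs recursively
    w = next_component(X, components)
    components = np.vstack([components, w])

# Compare with eigenvectors of the sample covariance matrix.
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
print(np.abs(components @ eigvecs[:, ::-1][:, :3]).round(2))  # ~identity up to sign
```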



Journal ArticleDOI
TL;DR: A two-layered network of linear neurons organizes itself so as to extract the complete information contained in a set of presented patterns; the output units become detectors of orthogonal features, similar to those found in the brains of mammals.
Abstract: We present a two-layered network of linear neurons that organizes itself so as to extract the complete information contained in a set of presented patterns. The weights between layers obey a Hebbian rule. We propose a local anti-Hebbian rule for lateral, hierarchically organized weights within the output layer. This rule forces the activities of the output units to become uncorrelated and the lateral weights to vanish. The weights between layers converge to the eigenvectors of the covariance matrix of input patterns, i.e., the network performs a principal component analysis, yielding all principal components. As a consequence of the proposed learning scheme, the output units become detectors of orthogonal features, similar to those found in the brains of mammals.

160 citations
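
The fixed point this network is claimed to reach — feedforward weights equal to the covariance eigenvectors, vanishing lateral weights, and mutually uncorrelated output activities — can be verified directly without simulating the learning rule. A brief check on assumed synthetic inputs:

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated synthetic inputs (assumed example data).
X = rng.normal(size=(5000, 4)) @ np.array([[2.0, 0.5, 0.0, 0.0],
                                           [0.0, 1.5, 0.3, 0.0],
                                           [0.0, 0.0, 1.0, 0.2],
                                           [0.0, 0.0, 0.0, 0.5]])
X -= X.mean(axis=0)

# Weights the network is claimed to converge to: eigenvectors of the covariance.
cov = np.cov(X, rowvar=False)
eigvals, W = np.linalg.eigh(cov)

# Output activities y = W^T x are uncorrelated: their covariance is diagonal.
Y = X @ W
print(np.round(np.cov(Y, rowvar=False), 3))   # off-diagonal entries ~ 0
print(np.round(eigvals, 3))                    # diagonal = eigenvalues (PC variances)
```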


Journal ArticleDOI
TL;DR: The sample first principal component reflects the covariance among measured dimensions induced by general growth, and its coefficients are interpretable as exponents of postnatal growth allometry.
Abstract: Analyses of craniodental measurement data from 15 wild-collected population samples of the Neotropical muroid rodent genus Zygodontomys reveal consistent patterns of relative variability and correlation that suggest a common latent structure. Eigenanalysis of each sample covariance matrix of logarithms yields a first principal component that accounts for a large fraction of the total variance. Variances of subsequent sample principal components are much smaller, and the results of bootstrap resampling together with asymptotic statistics suggest that characteristic roots of the covariance matrix after the first are seldom distinct. The coefficients of normalized first principal components are strikingly similar from sample to sample: inner products of these vectors reveal an average between-sample correlation of 0.989, and the mean angle of divergence is only about eight degrees. Since first principal component coefficients identify the same contrasts among variables as comparisons of relative variability and correlation, we conclude that a single factor accounts for most of the common latent determination of these sample dispersions. Analyses of variance based on toothwear (a coarse index of age) and sex in the wild-collected samples, and on known age and sex in a captive-bred population, reveal that specimen scores on sample first principal components are age- and sex-dependent; residual sample dispersion, however, is essentially unaffected by age, sex, or age × sex interaction. The sample first principal component therefore reflects the covariance among measured dimensions induced by general growth, and its coefficients are interpretable as exponents of postnatal growth allometry. Path-analytic models that incorporate prior knowledge of the equivalent allometric effects of general growth within these samples can be used to decompose the between-sample variance by factors corresponding to other ontogenetic mechanisms of form change. The genetic or environmental determinants of differences in sample mean phenotypes induced by such mechanisms, however, can be demonstrated only by experiment.

73 citations
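
The central computation — the first principal component of each sample covariance matrix of log-measurements, and the inner product and angle between first components from different samples — is easy to reproduce. A hedged sketch with made-up measurement matrices standing in for two population samples:

```python
import numpy as np

def first_pc(log_measurements):
    """Normalized first eigenvector of the covariance matrix of log-measurements."""
    cov = np.cov(log_measurements, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    pc1 = eigvecs[:, -1]                      # eigenvector of the largest root
    return pc1 * np.sign(pc1.sum())           # fix the sign convention

rng = np.random.default_rng(0)
# Two hypothetical samples of log craniodental measurements, dominated by a shared size factor.
size_a = rng.normal(0.0, 0.2, size=(60, 1))
A = np.log(50 + rng.normal(0, 1, size=(60, 8))) + size_a * np.linspace(0.8, 1.2, 8)
size_b = rng.normal(0.0, 0.2, size=(45, 1))
B = np.log(50 + rng.normal(0, 1, size=(45, 8))) + size_b * np.linspace(0.8, 1.2, 8)

u, v = first_pc(A), first_pc(B)
cos = float(u @ v)                             # inner product of unit vectors
angle = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
print(f"between-sample correlation of PC1 coefficients: {cos:.3f}")
print(f"angle of divergence: {angle:.1f} degrees")
```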


Journal ArticleDOI
TL;DR: This study presents an application of principal component analysis as a means for representing graphic waveforms in a parsimonious manner and shows that the corresponding basis vectors span parts of the gait cycle where the most variability between individual subjects exists.

59 citations


Journal ArticleDOI
TL;DR: In this article, the authors examined the cross-sectional pricing equation of the APT using the elements of eigenvectors and the maximum likelihood factor loadings of the covariance matrix of returns as measures of risk and concluded that in some circumstances principal components analysis may be preferred to factor analysis.
Abstract: We examine the cross-sectional pricing equation of the APT using the elements of eigenvectors and the maximum likelihood factor loadings of the covariance matrix of returns as measures of risk. The results indicate that, for data assumed stationary over twenty years, the first vector is a surprisingly good measure of risk when compared with either a one- or a five-factor model or a five-vector model. We conclude that in some circumstances principal components analysis may be preferred to factor analysis. THE ARBITRAGE PRICING THEORY (APT) of Ross (1976) hypothesizes that the cross-sectional distribution of expected returns of financial assets can be approximately measured by their sensitivities, usually factor loadings, to k unknown economic factors. In an important development, Chamberlain and Rothschild (1983) extend Ross's APT by proving that, if k eigenvalues of the population covariance matrix increase without bound as the number of securities in the population increases, then the elements of the corresponding k eigenvectors (the weights in principal components) of the covariance matrix can be used as the factor sensitivities. Connor and Korajczyk (1988) show that this conclusion holds for the sample covariance matrix as well. However, the proof that the eigenvectors can be used instead of statistical factor loadings in the returns-generating model for a large (with infinitely many assets) economy does not necessarily imply that the eigenvectors can be used in a cross-sectional model of security pricing in finite economies. Principal components analysis is equivalent to factor analysis only when the idiosyncratic risks of all firms are equal. If idiosyncratic risks vary, factor analysis is less constrained than the components analysis because factor analysis estimates idiosyncratic risks simultaneously with factor loadings while idiosyncratic risks are ignored when components are estimated. In practice this means that any nonstationarity and measurement errors in the data will affect the estimation of components more than the factors because k factor loadings are always estimated with more degrees of freedom than k components. Thus, there appears to be a reasonable a priori belief by some researchers that k eigenvectors simply will not perform as well as k factor loadings in measuring systematic risk.

59 citations
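
A bare-bones version of the eigenvector approach can be sketched as follows: take the first eigenvector of the sample covariance matrix of returns as the risk measure and regress mean returns on it cross-sectionally. The simulated one-factor return panel below is an assumption for illustration only; it is not the twenty-year data set used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_assets, n_months = 50, 240                  # hypothetical panel of monthly returns

# Simulated one-factor returns: r_it = beta_i * f_t + e_it (assumed data).
beta = rng.uniform(0.5, 1.5, n_assets)
f = rng.normal(0.005, 0.04, n_months)
returns = np.outer(f, beta) + rng.normal(0, 0.05, (n_months, n_assets))

# "Risk measure" from the first eigenvector of the sample covariance of returns.
cov = np.cov(returns, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
loadings = eigvecs[:, -1]
loadings *= np.sign(loadings.sum())           # sign convention

# Cross-sectional pricing regression: mean returns on the eigenvector loadings.
mean_ret = returns.mean(axis=0)
design = np.column_stack([np.ones(n_assets), loadings])
coef, *_ = np.linalg.lstsq(design, mean_ret, rcond=None)
print("intercept and 'risk premium' on the first eigenvector:", coef.round(4))
print("correlation of eigenvector loadings with true betas:",
      np.corrcoef(loadings, beta)[0, 1].round(3))
```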


Journal ArticleDOI
TL;DR: In this paper, a relative measure corresponding to the percentage "variance accounted for" is proposed as an alternative to the variance measures usually reported with GPA analyses; the measure is easy to interpret, and graphical display of the measures helps in interpreting the 'analysis of variance' tables.

58 citations


Journal ArticleDOI
TL;DR: Principal Components Analysis can be employed to demonstrate that variables derived from mapping studies are highly intercorrelated and that data dimensionality is substantially less than the total number of variables initially created, which reduces the likelihood of capitalization on chance.
Abstract: Topographic mapping of brain electrical activity has become a commonly used method in the clinical as well as research laboratory. To enhance analytic power and accuracy, mapping applications often involve statistical paradigms for the detection of abnormality or difference. Because mapping studies involve many measurements and variables, the appearance of a large data dimensionality may be created. If abnormality is sought by statistical mapping procedures and if the many variables are uncorrelated, certain positive findings could be attributable to chance. To protect against this undesirable possibility we advocate the replication of initial findings on independent data sets. Statistical difference attributable to chance will not replicate, whereas real difference will reproduce. Clinical studies must, therefore, provide for repeat measurements and research studies must involve analysis of second populations. Furthermore, Principal Components Analysis can be employed to demonstrate that variables derived from mapping studies are highly intercorrelated and that data dimensionality is substantially less than the total number of variables initially created. This reduces the likelihood of capitalization on chance. Constraining alpha levels is not necessary when dimensionality is low and/or a second data set is available. When only one data set is available in research applications, techniques such as the Bonferroni correction, the "leave-one-out" method, and Descriptive Data Analysis (DDA) are available. These techniques are discussed, clinical and research examples are given, and differences between Exploratory (EDA) and Confirmatory Data Analysis (CDA) are reviewed.

52 citations
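
The two protective devices discussed here — showing via PCA that intercorrelated mapping variables have a much lower effective dimensionality, and (when testing variables separately) adjusting the alpha level with a Bonferroni correction — can be sketched with simulated data; the number of subjects, latent sources, and variables below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_latent, n_vars = 40, 3, 20

# Hypothetical topographic-mapping variables driven by a few latent sources,
# so the 20 measured variables are highly intercorrelated (assumed data).
latent = rng.normal(size=(n_subjects, n_latent))
mixing = rng.normal(size=(n_latent, n_vars))
X = latent @ mixing + 0.3 * rng.normal(size=(n_subjects, n_vars))

eigvals = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]
explained = eigvals.cumsum() / eigvals.sum()
k = int(np.searchsorted(explained, 0.90) + 1)
print(f"{k} of {n_vars} components explain 90% of the variance")

# If each variable is still tested separately, a Bonferroni-adjusted threshold
# controls the family-wise error rate at alpha = 0.05.
alpha, n_tests = 0.05, n_vars
print("per-test alpha under Bonferroni:", alpha / n_tests)
```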


Journal ArticleDOI
TL;DR: In this article, a set-theoretic approach to parameter estimation based on the bounded-error concept is proposed, which is an appropriate choice when incomplete knowledge of observation error statistics and unavoidable structural model error invalidate the presuppositions of stochastic methods.

51 citations


Journal ArticleDOI
TL;DR: The sequential fitting (SEFIT) approach of principal component analysis is extended to include several nonstandard data analysis and classification tasks and new methods are developed for both traditional and fuzzy clustering.
Abstract: A particular factor analysis model with parameter constraints is generalized to include classification problems definable within a framework of fitting linear models. The sequential fitting (SEFIT) approach of principal component analysis is extended to include several nonstandard data analysis and classification tasks. SEFIT methods attempt to explain the variability in the initial data (commonly defined by a sum of squares) through an additive decomposition attributable to the various terms in the model. New methods are developed for both traditional and fuzzy clustering that have useful theoretic and computational properties (principal cluster analysis, additive clustering, and so on). Connections to several known classification strategies are also stated.

Journal ArticleDOI
TL;DR: In this paper, the authors developed methods that will lead to the generation of a reliable mean annual precipitation data base for Ethiopia, where multiple regression models have been formulated that explain the mean annual rainfall as a function of elevation and geographical location.
Abstract: This study aims at developing methods that will lead to the generation of a reliable mean annual precipitation data base for Ethiopia. Multiple regression models have been formulated that explain the mean annual rainfall as a function of elevation and geographical location. The estimations, based on yearly values from a data set of 63 Ethiopian rainfall stations with records between 1969 and 1985, were developed for the whole country as well as for the already existing Food and Agricultural Organization (FAO) rainfall pattern regions and a new zonation derived by principal component and common factor analyses (PCA/CFA). In the PCA/CFA study, monthly rainfall values between 1968 and 1985 for 43 stations were used. The optimal zonation was derived by testing 36 different combinations resulting in different rainfall pattern regions. The alternatives tested were: correlation and covariance dispersion matrices, PCA and CFA eigentechniques, unrotated and rotated (Varimax and Direct Oblimin) components/factors, and number of possible significant components/factors (3, 7, and 11). Principal component analysis of the covariance matrix, Varimax rotation, and seven extracted components gave by far the best relationship between mean annual rainfall, elevation, and geographical location. Models explaining at least 72 per cent of the variation in rainfall were constructed for regions covering about 98 per cent of the country, which is better than models based on the FAO rainfall pattern regions (69 per cent explained variation for 81 per cent of the country) and a model for the whole country (66.5 per cent explained variation).

Book ChapterDOI
01 Jan 1990
TL;DR: A robust estimate of a covariance matrix is introduced and some of its properties are investigated, along with two examples of suitable inner products chosen for measuring the distances between the units.
Abstract: Principal Component Analysis can produce several interesting projections of a point cloud if suitable inner products are chosen for measuring the distances between the units. We discuss two examples of such choices. The first one allows us to display outliers, while the second is expected to display clusters. Doing so we introduce a robust estimate of a covariance matrix and we investigate some of its properties.
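
The chapter's own robust covariance estimator is not reproduced here, but the general recipe — replace the classical covariance by a robust estimate and take its eigendecomposition so that outliers show up as extreme scores rather than distorting the axes — can be sketched with the minimum covariance determinant estimator from scikit-learn and assumed data:

```python
import numpy as np
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0, 0], [[3, 1, 0], [1, 2, 0], [0, 0, 1]], size=200)
X[:8] += np.array([10.0, -8.0, 6.0])          # a few gross outliers (assumed data)

# Robust covariance estimate (MCD); the chapter's own estimator may differ.
mcd = MinCovDet(random_state=0).fit(X)
eigvals, eigvecs = np.linalg.eigh(mcd.covariance_)

# Project onto the robust principal axes; outliers get extreme scores because
# they did not inflate the axes they are extreme on.
scores = (X - mcd.location_) @ eigvecs[:, ::-1]
print("rows with the largest |scores|:", np.argsort(np.abs(scores).max(axis=1))[-8:])
```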

Journal ArticleDOI
TL;DR: In this article, a simple modification of the original NIPALS procedure that avoids converging to smaller eigenvalues is presented, and it is shown that even if the starting vector suggested by Wold is used, the first principal component cannot be obtained in all cases.
Abstract: The Non-linear Iterative Partial Least Squares (NIPALS) algorithm is used in principal component analysis to decompose a data matrix into score vectors and eigenvectors (loading vectors) plus a residual matrix. NIPALS starts with some guessed starting vector. The principal components obtained by NIPALS depend on the starting vector; the first principal component could not always be computed. Wold has suggested a starting vector for NIPALS, but we have found that even if this starting vector is used, the first principal component cannot be obtained in all cases. The reason why such a situation occurs is explained by the power method. A simple modification of the original NIPALS procedure to avoid converging to smaller eigenvalues is presented.
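
A minimal NIPALS iteration for a single component makes the starting-vector dependence concrete: the iteration is a power method, so a start with no component along the dominant eigenvector can converge to a smaller eigenvalue. The data and the deliberately bad starting vector below are contrived assumptions for illustration:

```python
import numpy as np

def nipals_component(X, start, tol=1e-10, max_iter=500):
    """One NIPALS component: returns (scores t, loading p) for centered X.
    The result can depend on the starting score vector `start`."""
    t = start.astype(float).copy()
    for _ in range(max_iter):
        p = X.T @ t / (t @ t)          # loading
        p /= np.linalg.norm(p)
        t_new = X @ p                  # scores
        if np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new):
            return t_new, p
        t = t_new
    return t, p

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6)) @ np.diag([4.0, 2.0, 1.0, 0.5, 0.3, 0.1])
X -= X.mean(axis=0)

# A generic start (a data column) usually finds the dominant component ...
t1, p1 = nipals_component(X, X[:, 0])
# ... but a start chosen orthogonal to the first component converges elsewhere,
# exactly as the power method predicts (contrived worst case, for illustration).
u, s, vt = np.linalg.svd(X, full_matrices=False)
bad_start = X @ vt[1]                  # lies in the second component's direction
t2, p2 = nipals_component(X, bad_start)
print("variance captured with good start:", round(t1.var(), 3))
print("variance captured with bad start :", round(t2.var(), 3))
```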

Journal ArticleDOI
TL;DR: Rexstad et al. (1988) questioned the use of certain multivariate techniques in studies of wildlife habitat based on analyses of a single set of meaningless data; this comment argues that their conclusions, including that PCA of the correlation matrix of their data reduced dimensionality while explaining a high proportion of the total variance, are not supported by their own analyses.
Abstract: Rexstad et al. (1988) questioned use of certain multivariate techniques in studies of wildlife habitat, based on analyses of a single set of meaningless data. In addition to the lack of replication, their study was flawed by use of ill-defined or irrelevant hypotheses, by analyses conducted in violation of a statistical assumption, and by interpretation unsupported by their own analyses. Rexstad et al. (1988) criticized 3 multivariate techniques commonly used in wildlife habitat and community studies. They performed principal component analyses (PCA), canonical correlation analysis (CC), and stepwise discriminant function analysis (DFA) on a data set composed of 15 functionally unrelated variables (e.g., liquor prices and telephone numbers). They concluded that "Each [analysis] produced seemingly significant results and suggested strong relationships between the variables measured" when applied to their meaningless data set (Rexstad et al. 1988:794) and that "Our data, containing no inherent relationships, had relationships manufactured by the statistical methods" (Rexstad et al. 1988:797). My objective is to point out that these conclusions are not supported by their results. The statistical methods did not manufacture relationships. Indeed, their summaries indicate that PCA did not detect significant results or strong relationships between their variables, that any relationship suggested by CC was at best weak, and that the data set was for the most part inappropriate for analysis by DFA. Their conclusions were based on rejection of irrelevant or ill-defined hypotheses, on failure to compare their results with those expected due to chance, and on analysis of data that violated a basic statistical assumption. Rexstad et al. (1988:795) did not propose a testable hypothesis about PCA, but merely stated that "[they] evaluated the extent to which PCA would explain the variance of the data set and reduce its dimensionality." They mentioned no criterion for the seemingly significant result specified in their conclusion. They did find that PCA of the covariance matrix required only 2 components to explain 99.5% of the variance. However, each component was dominated by a single variable with a large variance relative to the other 13 variables. Principal components dominated by single variables indicate no relationships between variables, contrary to Rexstad et al.'s conclusion. This PCA did not reduce dimensionality or explain a high proportion of the variance. It merely mirrored the essentially 2-dimensional nature of the original data, dominated by 2 uncorrelated variables contributing most of the variance. Rexstad et al. (1988) concluded that PCA of the correlation matrix of their data reduced dimensionality (from 15 original variables to 7 PC's with eigenvalues >1) while explaining a high proportion (69%) of the total variance. This reduction in dimensionality is trivial and well within the range of behavior expected from uncorrelated random variables.
If the eigenvalues are themselves no more than random variables symmetrically distributed around a mean of 1 (the average contribution of a single original variable to the total variance), then about half, or around 7, should be >1. Also, any 10 of the original variables would, on average, explain about 67% (10/15) of the total variance even if they were totally unrelated. Such explanatory ability should be expected often in only 7 random variables, simply due to chance. Rexstad et al. (1988:796) actually confirmed this point with a sphericity test which "failed to reject the hypothesis that all eigenvalues were of equal magnitude, i.e., that all PC's explained an equal amount of variation in the data, characteristic of random data." Another interpretation of the sphericity test is that no relationships (correlations) between variables other than those expected from chance were found; a nonsignificant sphericity test should lead to a decision not to perform PCA (Cooley and Lohnes 1971:103). The correct inference from both the covariance PCA and the sphericity test is that there was no structure in the data. The inferences of reduced dimensionality, seemingly significant results, and strong relationships between variables are in-
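
The chance argument is easy to check by simulation: for unrelated variables, roughly half of the eigenvalues of the sample correlation matrix exceed 1, and the top seven of fifteen components account for about two-thirds of the variance. The sample size and number of replicates below are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_vars, n_sims = 60, 15, 500            # sample size is an assumption
count_gt1, top7_share = [], []

for _ in range(n_sims):
    X = rng.normal(size=(n_obs, n_vars))       # 15 mutually unrelated variables
    eigvals = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]
    count_gt1.append((eigvals > 1).sum())
    top7_share.append(eigvals[:7].sum() / eigvals.sum())

print("mean number of eigenvalues > 1:", np.mean(count_gt1).round(2))
print("mean variance explained by 7 PCs:", np.round(np.mean(top7_share), 3))
```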

Patent
Jacques Sirat1, Didier Zwierski1
30 Mar 1990
TL;DR: A method for determining approximations of eigenvectors of a covariance matrix associated with signal data on the basis of the Principal Component Transform is proposed; the method is also applicable to the Singular Value Decomposition of data matrices.
Abstract: Method of processing signal data on the basis of Principal Component Transform, apparatus for performing the method. A method is proposed for determining approximations of eigenvectors of a covariance matrix associated with signal data on the basis of the Principal Component Transform. Successive iterations on estimates enhance the development into the direction of the principal component considered. The method is applicable to the Singular Value Decomposition of data matrices. The operations to be performed are similar to those required in a neural net for adapting the synaptic coefficients. Apparatus are proposed for performing the method.

Journal ArticleDOI
TL;DR: Use of the SVD model leads to clear concepts concerning sampling, data reduction, normalization, and calculation of statistical significance, some of which are less evident when analysis is restricted to a single domain of interest.
Abstract: The application of Singular Value Decomposition (SVD) to analysis of EEG and evoked potential data has led to a hypothesis concerning the underlying structure of the EEG recorded from multiple channels. Based on the SVD algorithm the EEG is considered to be the linear combination of a sufficient number of features, each of which is defined in terms of its spatial distribution, temporal distribution, and amplitude. Use of this model leads to clear concepts concerning sampling, data reduction, normalization, and calculation of statistical significance, some of which are less evident when analysis is restricted to a single domain of interest.
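
The decomposition described here — each feature defined by a spatial distribution, a temporal distribution, and an amplitude — is exactly what the SVD of a channels-by-time matrix returns. A sketch on synthetic, EEG-like data (channel count, sampling rate, and sources are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n_channels, n_samples, fs = 16, 1000, 200       # hypothetical montage and sampling rate
t = np.arange(n_samples) / fs

# Two "features": a 10 Hz source and a 4 Hz source with different scalp topographies.
topo1, topo2 = rng.normal(size=n_channels), rng.normal(size=n_channels)
eeg = (np.outer(topo1, np.sin(2 * np.pi * 10 * t)) +
       0.5 * np.outer(topo2, np.sin(2 * np.pi * 4 * t)) +
       0.2 * rng.normal(size=(n_channels, n_samples)))

# SVD: columns of U are spatial distributions, rows of Vt temporal distributions,
# and the singular values are the feature amplitudes.
U, s, Vt = np.linalg.svd(eeg, full_matrices=False)
print("amplitudes of the leading features:", s[:4].round(1))

# Data reduction: reconstruct from the two dominant features only.
rank2 = (U[:, :2] * s[:2]) @ Vt[:2]
residual = np.linalg.norm(eeg - rank2) / np.linalg.norm(eeg)
print("relative residual of the 2-feature model:", round(float(residual), 3))
```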

Journal ArticleDOI
TL;DR: In this paper, a different approach is proposed, based on measures of closeness between the subspaces spanned by the initial eigenvectors and their counterparts derived from an infinitesimal perturbation of the data distribution.
Abstract: In the context of sensitivity analysis in principal component analysis, Tanaka (1988) tackles the problem of the stability of the subspace spanned by dominant principal components. He derives the influence functions related to the projection operator on this subspace and to the spectral decomposition of the covariance or correlation matrix as sensitivity indicators. We suggest here a different approach based on some measures of closeness between the subspaces spanned by the initial eigenvectors and their corresponding version derived from an infinitesimal perturbation of the data distribution.
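
One concrete measure of closeness between the subspace spanned by the dominant principal components and its perturbed counterpart is the set of principal angles between the two subspaces. The sketch below uses scipy's subspace_angles on assumed data with an ad hoc perturbation; it is not the influence-function machinery of Tanaka (1988) or the authors' specific indices.

```python
import numpy as np
from scipy.linalg import subspace_angles

def dominant_subspace(X, k):
    """Orthonormal basis (d x k) of the span of the k leading principal components."""
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvecs[:, -k:]

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6)) @ np.diag([3.0, 2.5, 1.0, 0.4, 0.3, 0.2])

# Perturb the empirical distribution slightly: inflate a handful of observations.
X_pert = X.copy()
X_pert[:5] *= 3.0

k = 2
angles = subspace_angles(dominant_subspace(X, k), dominant_subspace(X_pert, k))
print("principal angles (degrees):", np.degrees(angles).round(2))
```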


Journal ArticleDOI
TL;DR: Aitchison (1986) proposed a method for principal component analysis of compositional data involving transformation of the raw data; this paper shows how that approach can be approximated by a correspondence analysis of appropriately transformed data.
Abstract: Principal component and correspondence analysis can both be used as exploratory methods for representing multivariate data in two dimensions. Circumstances under which the, possibly inappropriate, application of principal components to untransformed compositional data approximates to a correspondence analysis of the raw data are noted. Aitchison (1986) has proposed a method for the principal component analysis of compositional data involving transformation of the raw data. It is shown how this can be approximated by a correspondence analysis of appropriately transformed data. The latter approach may be preferable when there are zeroes in the data.
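
Aitchison's approach is commonly carried out as an ordinary PCA of log-ratio transformed compositions, for example via the centered log-ratio (clr) transform; as the abstract notes, zeros in the data are the practical obstacle, since the transform takes logarithms. A minimal sketch on assumed compositional data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical compositional data: 50 samples of 5 parts, each row summing to 1.
raw = rng.gamma(shape=2.0, size=(50, 5))
comp = raw / raw.sum(axis=1, keepdims=True)

# Aitchison-style analysis: centered log-ratio (clr) transform, then ordinary PCA.
# (Zeros would have to be replaced or otherwise handled before taking logs.)
log_comp = np.log(comp)
clr = log_comp - log_comp.mean(axis=1, keepdims=True)

clr_centered = clr - clr.mean(axis=0)
U, s, Vt = np.linalg.svd(clr_centered, full_matrices=False)
explained = (s**2) / (s**2).sum()
print("share of variance on the first two axes:", explained[:2].round(3))
scores = clr_centered @ Vt[:2].T          # two-dimensional display coordinates
```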

Journal ArticleDOI
TL;DR: Principal components analysis reveals that the GCS sum score is a particularly inefficient summarizer of information in this cohort of acute brain injuries in a pediatric population, in which severity of brain injury had been assessed with the Glasgow Coma Scale.
Abstract: Principal components analysis is widely used as a practical tool for the analysis of multivariate data. The aim of this analysis is to reduce the dimensionality of a multivariate data set to the smallest number of meaningful and independent dimensions. The analysis can also provide interpretable linear functions of the original measured variables that may serve as valuable indices of variation. A brief introduction to principal components analysis is given herein, followed by an examination of a particular set of multivariate data accruing from a study of acute brain injuries in a pediatric population, in which severity of brain injury had been assessed with the Glasgow Coma Scale (GCS). Principal components analysis reveals that the GCS sum score is a particularly inefficient summarizer of information in this cohort. The determination of an objective weighting of measured variables, as provided through principal components analysis, is essential in the construction of meaningful neurological scoring instruments.

Journal ArticleDOI
TL;DR: A model for the analysis of time-budgets that uses the property that rows of the data matrix add up to one is discussed and compared with logcontrast principal component analysis.
Abstract: Time-budgets summarize how the time of objects is distributed over a number of categories. Usually they are collected in object by category matrices with the property that rows of this data matrix add up to one. In this paper we discuss a model for the analysis of time-budgets that uses this property. The model approximates the observed time-budgets by weighted sums of a number of latent time-budgets. These latent time-budgets determine the behavior of all objects. Special attention is given to the identification of the model. The model is compared with logcontrast principal component analysis.
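
As a rough illustration only (not the estimation method of the paper), a latent budget approximation A ≈ WB with row-stochastic W and B can be obtained from a nonnegative matrix factorization whose factors are rescaled so that the latent budgets sum to one; because the observed rows sum to one, the mixing weights then do so approximately as well. The data and the choice of two latent budgets are assumptions:

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
# Hypothetical time-budgets: 30 objects by 6 activity categories, rows sum to 1.
raw = rng.gamma(2.0, size=(30, 6))
A = raw / raw.sum(axis=1, keepdims=True)

# Rough approximation of a 2-budget latent budget model via NMF (not the
# estimation method of the paper): A ~ W B with W, B rescaled to row-stochastic.
model = NMF(n_components=2, init="nndsvda", max_iter=2000, random_state=0)
W = model.fit_transform(A)
H = model.components_

row_sums = H.sum(axis=1)                 # rescale so the latent budgets sum to one
B = H / row_sums[:, None]
W = W * row_sums[None, :]                # mixing weights; rows sum to ~1 because A does
print("latent budgets (rows sum to 1):\n", B.round(3))
print("max deviation of mixing-weight row sums from 1:",
      round(float(np.abs(W.sum(axis=1) - 1).max()), 3))
```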

Journal ArticleDOI
TL;DR: In this article, nine trace elements (Al, Cr, Mn, Fe, Ni, Cu, Zn, Cd, Pb) were determined in cheese by atomic absorption spectrophotometry with electrothermal atomization in a graphite tube, using the ashing procedure.
Abstract: Nine trace elements (Al, Cr, Mn, Fe, Ni, Cu, Zn, Cd, Pb) were determined in cheese by atomic absorption spectrophotometry with electrothermal atomization in a graphite tube, using the ashing procedure. Associations among mineral constituents were studied by means of principal component analysis, which allows determination of interdependences among trace elements in foods. A test for normality was used to investigate monovariate distributions, in order to estimate the symmetry of the data vectors. The correlation matrix was used as a starting matrix for principal component analysis; nine variables were reduced to four principal components. The clusters of elements appear to be determined by their origin.

Journal ArticleDOI
TL;DR: In this article, the authors proposed two approaches to estimate the number of factors in a data compression model: the first uses the estimated standard error of the model and the second is an approximation to a leave-one-out method.
Abstract: Overcoming the collinearity problem in regression by data compression techniques [i.e., principal component regression (PCR) and partial least-squares (PLS)] requires estimation of the number of factors (principal components) to use for the model. The most common approach is to use cross-validation for this purpose. Unfortunately, cross-validation is time consuming to carry out. Accordingly, we have searched for time-saving methods to estimate the number of factors. Two approaches were considered. The first uses the estimated standard error of the model and the second is an approximation to a cross-validation leave-one-out method. Both alternatives have been tested on spectroscopic data. It has been found that, when the number of wavelengths is limited, both methods give results similar to those obtained by full cross-validation both for PCR and PLS. However, when the number of wavelengths is large, the tested methods are reliable only for PCR and not for PLS.
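
The baseline the authors are trying to speed up — full cross-validation over the number of PCR factors — takes only a few lines with scikit-learn; the quicker standard-error and approximate leave-one-out criteria of the paper are not reproduced here. The spectra-like data below are simulated assumptions:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_samples, n_wavelengths, n_true = 60, 120, 3    # hypothetical spectroscopic setup

# Collinear "spectra" generated from a few underlying constituents plus noise.
concentrations = rng.uniform(size=(n_samples, n_true))
pure_spectra = rng.normal(size=(n_true, n_wavelengths))
X = concentrations @ pure_spectra + 0.05 * rng.normal(size=(n_samples, n_wavelengths))
y = concentrations[:, 0] + 0.01 * rng.normal(size=n_samples)

# Full cross-validation over the number of PCR factors (the slow reference method).
scores = []
for k in range(1, 11):
    pcr = make_pipeline(PCA(n_components=k), LinearRegression())
    mse = -cross_val_score(pcr, X, y, cv=10,
                           scoring="neg_mean_squared_error").mean()
    scores.append(mse)

best_k = int(np.argmin(scores)) + 1
print("cross-validated number of factors:", best_k)
```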

Proceedings ArticleDOI
17 Jun 1990
TL;DR: Experimentation showed that the 8-by-8-pixel segments are optimal with regard to TSSE on unseen data, and it would seem preferable to use PCA analysis rather than neural network methods to produce the reduced dimensional input to a diagnostic network.
Abstract: A set of images was compressed using artificial neural networks (ANNs) utilizing error back-propagation and by principal component analysis (PCA). The total sum square error (TSSE) was compared for each method using the training data and a second set of test images. The difference between seen and unseen images with respect to TSSE is more pronounced in 32-by-16-pixel segments than in 8-by-8-pixel segments in PCA compression. Further experimentation showed that the 8-by-8-pixel segments are optimal with regard to TSSE on unseen data. ANNs also seem to generalize less accurately on larger segments. Since the time taken for neural network compression learning is about an order of magnitude higher than for PCA, and PCA is more repeatable in terms of the error magnitude and produces lower error for unseen segments, it would seem preferable to use PCA rather than neural network methods to produce the reduced dimensional input to a diagnostic network.
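
The PCA side of the comparison — compressing 8-by-8-pixel segments and measuring total sum of squared errors on seen versus unseen blocks — can be sketched as follows; the synthetic images, block size, and number of retained components are assumptions standing in for the paper's image set:

```python
import numpy as np
from sklearn.decomposition import PCA

def to_blocks(img, b=8):
    """Cut a 2-D image into non-overlapping b-by-b blocks, flattened to rows."""
    h, w = (img.shape[0] // b) * b, (img.shape[1] // b) * b
    img = img[:h, :w]
    return (img.reshape(h // b, b, w // b, b)
               .swapaxes(1, 2)
               .reshape(-1, b * b))

rng = np.random.default_rng(0)
# Smooth synthetic "images" (assumed stand-ins for the training and test sets).
train_img = np.cumsum(np.cumsum(rng.normal(size=(128, 128)), axis=0), axis=1)
test_img = np.cumsum(np.cumsum(rng.normal(size=(128, 128)), axis=0), axis=1)

train_blocks, test_blocks = to_blocks(train_img), to_blocks(test_img)

pca = PCA(n_components=8).fit(train_blocks)     # keep 8 of 64 dimensions

def tsse(blocks):
    recon = pca.inverse_transform(pca.transform(blocks))
    return float(((blocks - recon) ** 2).sum())

print("TSSE on seen blocks  :", round(tsse(train_blocks), 1))
print("TSSE on unseen blocks:", round(tsse(test_blocks), 1))
```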

Journal ArticleDOI
TL;DR: In this article, principal component analysis (PCA) and factor analysis (FA) were evaluated for the interpretation of the information contained in large datasets resulting from the study of environmental samples by gas chrom...
Abstract: Principal component (PCA) and factor analysis (FA) are evaluated for the interpretation of the information contained in large datasets resulting from the study of environmental samples by gas chrom...


Journal ArticleDOI
TL;DR: In this article, the stability of robust principal components analysis (PCA) is measured by means of an angular measure between sample principal components and population principal components, the latter being obtained by bootstrapping.

Proceedings ArticleDOI
01 May 1990
TL;DR: The underlying numerical analysis for the theoretical proof of the convergence of OLN is discussed, and the same numerical analysis provides a useful estimate of optimal learning rates leading to very fast convergence speed.
Abstract: The regular principal components (PC) analysis of stochastic processes is extended to the so-called constrained principal components (CPC) problem. The CPC analysis involves extracting representative components which contain the most information about the original processes. The CPC solution has to be extracted from a given constraint subspace. Therefore, the CPC solution may be adopted to best recover the original signal and simultaneously avoid the undesirably noisy or redundant components. A technique for finding optimal CPC solutions via an orthogonal learning network (OLN) is proposed. The underlying numerical analysis for the theoretical proof of the convergence of OLN is discussed. The same numerical analysis provides a useful estimate of optimal learning rates leading to very fast convergence speed. Simulation and application examples are provided.
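
A direct, non-adaptive way to obtain constrained principal components, which the orthogonal learning network learns iteratively instead, is to restrict the covariance to the constraint subspace and diagonalize it there. A sketch under assumed data and an assumed constraint subspace:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6)) @ np.diag([3.0, 2.0, 1.5, 1.0, 0.5, 0.2])
X -= X.mean(axis=0)

# Constraint subspace (assumed for illustration): a 4-dimensional subspace of R^6
# given by an orthonormal basis; in practice it would exclude noisy or redundant directions.
Q = np.linalg.qr(rng.normal(size=(6, 4)))[0]      # 6 x 4, orthonormal columns

# Constrained PCs: eigenvectors of the covariance restricted to the subspace,
# mapped back to the original coordinates.
cov_sub = Q.T @ np.cov(X, rowvar=False) @ Q
eigvals, A = np.linalg.eigh(cov_sub)
cpcs = Q @ A[:, ::-1]                             # variance-ordered constrained components

print("variances captured by the leading CPCs:", eigvals[::-1][:3].round(2))
# Every CPC lies in the constraint subspace (projection leaves it unchanged).
print("max deviation from the subspace:",
      float(np.abs(cpcs - Q @ (Q.T @ cpcs)).max()))
```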

Journal Article
TL;DR: In this paper, an extended class of geometrically orthogonal designs of linear regression models is defined and studied, which includes all well-behaved ANOVA and regression designs.
Abstract: Tjur (1984) showed that an orthogonal (=balanced) analysis of variance (ANOVA) design may be described and analysed in terms of an associated factor structure diagram. In this paper an extended class of orthogonal designs is defined and studied, the class of geometrically orthogonal designs of linear regression models, which includes all well-behaved ANOVA and regression designs. It is shown that such designs may be characterized and analysed most naturally in terms of the lattice structure of L, the family of regression subspaces in the design. Any such design may be extended in a natural way to a family of canonical variance component models, called a geometrically orthogonal variance component design.