
Showing papers on "Principal component analysis published in 1976"


Journal ArticleDOI
TL;DR: In this article, a matrix of partial correlations is used to determine how many components to retain in principal component analysis and image component analysis: no components are extracted after the average squared partial correlation reaches a minimum. The approach can be applied to any type of component analysis and is most appropriate when component analysis is employed as an alternative to, or a first-stage solution for, factor analysis.
Abstract: A common problem for both principal component analysis and image component analysis is determining how many components to retain. A number of solutions have been proposed, none of which is totally satisfactory. An alternative solution which employs a matrix of partial correlations is considered. No components are extracted after the average squared partial correlation reaches a minimum. This approach gives an exact stopping point, has a direct operational interpretation, and can be applied to any type of component analysis. The method is most appropriate when component analysis is employed as an alternative to, or a first-stage solution for, factor analysis.

1,984 citations
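The stopping rule lends itself to a compact implementation. Below is a minimal sketch (function name ours), assuming components are extracted from a correlation matrix R and partial correlations are computed from the residual covariances:

```python
import numpy as np

def min_average_partial(R):
    """Sketch of the stopping rule: extract components from a correlation
    matrix R and retain the number at which the average squared partial
    correlation among the residuals is smallest."""
    p = R.shape[0]
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    avg_sq = []
    for m in range(1, p):                      # partial out the first m components
        A = loadings[:, :m]
        C = R - A @ A.T                        # residual covariances
        d = np.sqrt(np.clip(np.diag(C), 1e-12, None))
        partials = C / np.outer(d, d)          # matrix of partial correlations
        off_diag = partials[~np.eye(p, dtype=bool)]
        avg_sq.append(np.mean(off_diag ** 2))
    return int(np.argmin(avg_sq)) + 1          # number of components to retain
```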


Journal ArticleDOI
Svante Wold
TL;DR: Pattern recognition based on modelling each separate class by a separate principal components (PC) model is discussed and these PC models are shown to be able to approximate any continuous variation within a single class.

1,060 citations


Journal ArticleDOI
TL;DR: In this article, it is shown that most classical methods of linear multivariate statistical analysis can be interpreted as the search for optimal linear transformations or, equivalently, the search for optimal metrics to apply on two data matrices on the same sample; the optimality is defined in terms of the similarity of the corresponding configurations of points, which, in turn, calls for the maximization of the associated RV-coefficient.
Abstract: Consider two data matrices on the same sample of n individuals, X(p x n), Y(q x n). From these matrices, geometrical representations of the sample are obtained as two configurations of n points, in R^p and R^q. It is shown that the RV-coefficient (Escoufier, 1970, 1973) can be used as a measure of similarity of the two configurations, taking into account the possibly distinct metrics to be used on them to measure the distances between points. The purpose of this paper is to show that most classical methods of linear multivariate statistical analysis can be interpreted as the search for optimal linear transformations or, equivalently, the search for optimal metrics to apply on two data matrices on the same sample; the optimality is defined in terms of the similarity of the corresponding configurations of points, which, in turn, calls for the maximization of the associated RV-coefficient. The methods studied are principal components, principal components of instrumental variables, multivariate regression, canonical variables, discriminant analysis; they are differentiated by the possible relationships existing between the two data matrices involved and by additional constraints under which the maximum of RV is to be obtained. It is also shown that the RV-coefficient can be used as a measure of goodness of a solution to the problem of discarding variables.

897 citations
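A minimal sketch of the RV-coefficient (function name ours), with individuals as rows, i.e. the transpose of the paper's X(p x n) convention:

```python
import numpy as np

def rv_coefficient(X, Y):
    """RV coefficient between two data matrices with the same n rows
    (individuals). Compares the n x n configurations after centering."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Wx = Xc @ Xc.T                 # configuration (cross-product) matrices
    Wy = Yc @ Yc.T
    # RV = tr(Wx Wy) / sqrt(tr(Wx Wx) tr(Wy Wy)); ranges from 0 to 1
    return np.trace(Wx @ Wy) / np.sqrt(np.trace(Wx @ Wx) * np.trace(Wy @ Wy))
```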


Journal ArticleDOI
TL;DR: In this paper, Atchley, W. R., C. T. Gaskins, and D. Anderson presented an empirical analysis of the statistical consequences of compounding ratios of continuous variables.
Abstract: Atchley, W. R., C. T. Gaskins, and D. Anderson (Departments of Biological Sciences, Animal Sciences, and Biomedical Engineering and Computer Medicine, Texas Tech University, Lubbock, Texas 79409) 1976. Statistical properties of ratios. I. Empirical results. Syst. Zool. 25:137-148. Results are presented of an empirical analysis of the statistical consequences of compounding ratios of continuous variables. Three commonly employed relationships among the ratio variables Y and Z were examined, including: 1) Y = X1/X2, Z = X3/X2; 2) Y = X1/X2, Z = X1; and 3) Y = X1/X2, Z = X2. Simulation studies with minimum sample sizes of 25,000 indicated large and systematic changes in both the structure and the underlying distributions of data when ratios and proportions were compounded between continuous variables. Ratio variables are skewed to the right and leptokurtic, and the nonnormality increases as the magnitude of the denominator's coefficient of variation increases. Further, there is a pronounced increase in spurious correlation between variables when ratios are compounded, and the magnitude of this spurious correlation is a function of the size of the denominator's coefficient of variation. This spurious correlation may increase from r = 0.0 between the original raw variables to r = 0.99 between the derived ratio variables. Multivariate statistical procedures such as principal components analysis are greatly affected when the data upon which the analyses are based include ratios or proportions. In this case, there is often an inflation of the first eigenvalue together with large changes in the magnitude and direction of the coefficients on the various principal components. Several common applications of ratios in biological research are discussed. Contrary to a widely held belief, it is shown that in the scaling of data, ratios do not remove the effect of the scaling variables but rather increase the correlation between the ratio variable and the original scaling variable. [Ratios; statistics; statistical distributions; principal components; taxonomy; size and shape.]

575 citations
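The spurious-correlation effect is easy to reproduce by simulation. In this illustrative sketch (distributions and parameter values are ours, not the paper's), three mutually independent variables yield strongly correlated ratios once they share a denominator with a sizable coefficient of variation:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 25_000                                   # the paper's minimum sample size
X1 = rng.uniform(80, 120, n)                 # three mutually independent variables
X2 = rng.uniform(20, 80, n)                  # shared denominator with a large CV
X3 = rng.uniform(80, 120, n)

print(np.corrcoef(X1, X3)[0, 1])             # ~0: the raw variables are uncorrelated
print(np.corrcoef(X1 / X2, X3 / X2)[0, 1])   # large: spurious correlation via X2
```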



Journal ArticleDOI
TL;DR: A survey of robust alternatives to the mean, standard deviation, product moment correlation, t-test, and analysis of variance is offered in this paper, with a focus on the effects of outliers.
Abstract: It is noted that the usual estimators that are optimal under a Gaussian assumption are very vulnerable to the effects of outliers. A survey of robust alternatives to the mean, standard deviation, product moment correlation, t-test, and analysis of variance is offered. Robust methods of factor analysis, principal components analysis and multivariate analysis of variance are also surveyed, as are schemes for outlier detection.

115 citations


Journal ArticleDOI
TL;DR: Theoretical problems with the factor analysis model, such as the factor indeterminacy issue, have resulted in increased interest in component analysis as an alternative to factor analysis as discussed by the authors.
Abstract: Theoretical problems with the factor analysis model, such as the factor indeterminacy issue, have resulted in increased interest in component analysis as an alternative. Indeed, component analysis ...

97 citations


Book
01 Jan 1976
TL;DR: Principal components analysis (PCA) as mentioned in this paper is a data analysis tool that is usually used to reduce the dimensionality (number of variables) of a large number of interrelated variables, while retaining as much of the information (variation) as possible.
Abstract: Principal Components Analysis, or PCA, is a data analysis tool that is usually used to reduce the dimensionality (number of variables) of a large number of interrelated variables, while retaining as much of the information (variation) as possible. PCA calculates an uncorrelated set of variables (components or pc’s). These components are ordered so that the first few retain most of the variation present in all of the original variables. Unlike its cousin Factor Analysis, PCA always yields the same solution from the same data (apart from arbitrary differences in the sign).

92 citations
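As a concrete rendering of the description above, a bare-bones PCA can be sketched as follows (not the book's own code):

```python
import numpy as np

def pca_scores(X, k):
    """Minimal PCA: project centered data onto the top-k eigenvectors of the
    covariance matrix. The components are uncorrelated and variance-ordered,
    and the solution is unique up to the sign of each component."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    top = np.argsort(eigvals)[::-1][:k]
    return Xc @ eigvecs[:, top], eigvals[top]   # scores, explained variances
```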


Journal Article
TL;DR: A mathematical technique is described that separates potency from spectral information in biological activity spectra and is a preliminary step in the classification of chemical compounds and in the structuring of pharmacological assays.
Abstract: A mathematical technique is described that separates potency from spectral information in biological activity spectra. The technique is a preliminary step in the classification of chemical compounds and in the structuring of pharmacological assays. The method is illustrated using previously reported results on 40 neuroleptics in 12 assays on rats. Principal component analysis and cluster analysis are shown to provide complementing information.

78 citations


Journal ArticleDOI
TL;DR: A general statistical model for the multivariate analysis of mean and covariance structures is described, which has common-factor loadings that are invariant with respect to variable scaling and unique variances that must be positive.
Abstract: A general statistical model for the multivariate analysis of mean and covariance structures is described. Various models, such as those of Bock and Bargmann; Joreskog; and Wiley, Schmidt and Bramble, are special cases. One specialization of the general model produces a class of factor analytic models. The simplest case represents an alternative conceptualization of the multiple-factor model developed by Thurstone. In contrast to the traditional model, the new model has common-factor loadings that are invariant with respect to variable scaling and unique variances that must be positive. A special feature of the model is that it does not allow the confounding of principal components analysis and factor analysis. Matrix calculus is used to develop statistical aspects of the model. Parameters are estimated by the maximum likelihood method with Newton-Raphson iterations.

73 citations


Journal ArticleDOI
TL;DR: In this article, simulated coenoclines were used to test performance of several techniques for ordinating samples by species composition: Wisconsin polar or Bray-Curtis ordination with Euclidean distance (ED) and the complements of percentage similarity (PD) and coefficient of community (CD) as distance measures, Principal components analysis, and polar and non-polar or indirect use of Discriminant function analysis.
Abstract: Simulated coenoclines were used to test performance of several techniques for ordinating samples by species composition: Wisconsin polar or Bray-Curtis ordination with Euclidean distance (ED) and the complements of percentage similarity (PD) and coefficient of community (CD) as distance measures, Principal components analysis, and polar and non-polar or indirect use of Discriminant function analysis. In general the Bray-Curtis technique gave the best ordinations, and PD was the best distance measure. Euclidean distance gave greater distortion than PD in all tests; CD may be better than PD only for some sample sets of high alpha and beta diversity and high levels of noise or sample error. Principal components ordinations are increasingly distorted as beta diversity increases, and are highly vulnerable to effects of both noise and sample clustering. Discriminant function analysis was found generally unsuitable for ordination of samples by species composition, but likely to be useful for sample classification.
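For reference, one common formulation of the two main distance measures is sketched below (a Bray-Curtis-style percentage dissimilarity; the authors' exact formulas may differ in detail):

```python
import numpy as np

def percent_dissimilarity(x, y):
    """PD: complement of percentage similarity between two samples'
    species-abundance vectors (0 = identical composition, 1 = disjoint)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return 1.0 - 2.0 * np.minimum(x, y).sum() / (x.sum() + y.sum())

def euclidean_distance(x, y):
    """ED: straight-line distance between samples in species space."""
    return np.linalg.norm(np.asarray(x, float) - np.asarray(y, float))
```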

Journal ArticleDOI
TL;DR: In this article, the authors compared several methods of handling missing observations in discrimination and found that a simple regression technique and a modified technique based on the first principal component gave relatively high probabilities of correct classification.
Abstract: This paper compares by simulations several methods of handling missing observations in discrimination. In an earlier paper, several methods were compared in discriminating by the usual linear discriminant function between two multivariate normal populations in which all pairs of variables are equally correlated. In the present study, a variety of population matrices was used and two additional methods were introduced: the first, a simpler regression technique, and the second, a modified technique based on the first principal component. The new regression technique was found to give relatively high probabilities of correct classification.

Journal ArticleDOI
TL;DR: It is shown that this principal component analysis technique can be used to create new keys from a set of old keys, which are very useful in narrowing down the search domain.
Abstract: In this paper, we shall introduce a concept widely used by statisticians, the principal component analysis technique. We shall show that this principal component analysis technique can be used to create new keys from a set of old keys. These new keys are very useful in narrowing down the search domain. We shall also show that the projections on the first principal component direction can be viewed as hashing addresses for the best-match searching problem.
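A minimal sketch of the idea (all names ours; the paper's exact scheme may differ): the projection of each record on the first principal component serves as a scalar key, and because a projection difference can never exceed the full Euclidean distance, records whose keys are far from the query's key can be skipped in a best-match search.

```python
import numpy as np

def first_pc_index(X):
    """Build the index: the projection of each centered record on the first
    principal component is its scalar search key ('hashing address')."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    v1 = Vt[0]                                  # first principal component direction
    return mu, v1, (X - mu) @ v1                # mean, direction, keys

def best_match(X, mu, v1, keys, q, radius):
    """Search only records whose key is within `radius` of the query's key:
    |(x - q) @ v1| <= ||x - q||, so any record within distance `radius` of q
    must also have a key within `radius` of q's key."""
    key_q = (q - mu) @ v1
    candidates = np.flatnonzero(np.abs(keys - key_q) <= radius)
    if candidates.size == 0:
        return None                             # no record within `radius`
    dists = np.linalg.norm(X[candidates] - q, axis=1)
    return candidates[np.argmin(dists)]
```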

Journal ArticleDOI
TL;DR: A high correlation of the first principal component with crab size shows this component to be a better estimator of overall shell size than any of the original morphometric variates.

Journal ArticleDOI
TL;DR: The principal factors are not unique, but models which have a small number of parameters are more justifiable in light of the results of this study.

Journal ArticleDOI
TL;DR: In this paper, the application of R-mode principal components analysis to a set of closed chemical data is described using previously published chemical analyses of rocks from Gough Island, where different measures of similarity have been used and the results compared by calculating the correlation coefficients between each of the elements of the extracted eigenvectors and each of original variables.
Abstract: The application of R-mode principal components analysis to a set of closed chemical data is described using previously published chemical analyses of rocks from Gough Island. Different measures of similarity have been used and the results compared by calculating the correlation coefficients between each of the elements of the extracted eigenvectors and each of the original variables. These correlations provide a convenient measure of the contribution of each variable to each of the principal components. The choice of similarity measure (variance-covariance or correlation coefficient) should reflect the nature of the data and the view of the investigator as to which is the proper weighting of the variables: according to their sample variance or equally. If the data are appropriate for principal components analysis, then the Chayes and Kruskal concept of the hypothetical open and closed arrays and the expected closure correlations would seem to be useful in defining the structure to be expected in the absence of significant departures from randomness. If the data are not multivariate normally distributed, then it is possible that the principal components will not be independent. This may result in significant nonzero covariances between various pairs of principal components.
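The choice of similarity measure amounts to choosing which matrix is eigendecomposed. A minimal sketch (names ours) that also computes the component-variable correlations the paper uses as a measure of each variable's contribution:

```python
import numpy as np

def r_mode_pca(X, use_correlation=False):
    """R-mode PCA using either the variance-covariance matrix (variables
    weighted by their sample variances) or the correlation matrix (equal
    weighting), the choice discussed in the paper."""
    Xc = X - X.mean(axis=0)
    if use_correlation:
        Xc = Xc / Xc.std(axis=0, ddof=1)
    S = (Xc.T @ Xc) / (len(X) - 1)
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # correlation between each original variable and each principal component,
    # a convenient measure of each variable's contribution to each component
    load_corr = (eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
                 / np.sqrt(np.diag(S))[:, None])
    return eigvals, eigvecs, load_corr
```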

Journal ArticleDOI
TL;DR: In this paper, principal component analyses of the percentages of 19 species of planktic foraminifera in 22 Gulf of Mexico deep-sea cores of late Quaternary age show that the first component for each core generates an approximate paleotemperature curve.

Journal ArticleDOI
TL;DR: In this paper, the effect of interobserver error on a principal components analysis of a small sample of human crania is examined, and a comparison of individual specimen scores for components is made to find rotated principal components which identify interobserver error.
Abstract: The effect of interobserver error on a principal components analysis of a small sample of human crania is examined. A comparison of individual specimen scores for components is made to find rotated principal components which identify interobserver error. The individual variables which load highly on such components are then tested for interobserver error univariately. Multivariate components which must identify interobserver error contain no high loadings for variables which demonstrate interobserver error in the univariate case. Principal component analysis, in defining new component variables, extracts such error in an easily identified way which makes comparison of samples measured by more than one anthropometrist more reliable.

Journal ArticleDOI
TL;DR: In this article, the sampling properties of a test statistic which has important applications in the area of linear stochastic control systems with multi-inputs and multi-outputs were studied.
Abstract: In this paper we study the sampling properties of a test statistic which has important applications in the area of linear stochastic control systems with multi-inputs and multi-outputs. The statistic is the ratio of a partial sum of the eigenvalues of a sample covariance matrix and its trace. It turns out that using a method due to Sugiura we may derive a useful approximation for its distribution up to and including terms of order 1/n, where n denotes the appropriate sample size. Numerical illustrations using real data are given.
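The statistic itself is straightforward to compute; the paper's contribution is the order-1/n approximation to its distribution. A sketch of the quantity:

```python
import numpy as np

def eigenvalue_ratio(S, k):
    """Ratio of the sum of the k largest eigenvalues of a sample covariance
    matrix S to its trace, i.e. the fraction of total variance carried by
    the first k principal components."""
    eigvals = np.sort(np.linalg.eigvalsh(S))[::-1]
    return eigvals[:k].sum() / np.trace(S)
```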

Journal ArticleDOI
TL;DR: A compact program for performing a variety of regression and principal component computations is described in this article, where a singular value decomposition of the data matrix is used which permits calculations involving rank deficient data to be handled satisfactorily.
Abstract: A compact program for performing a variety of regression and principal component computations is described. A singular value decomposition of the data matrix is used which permits calculations involving rank deficient data to be handled satisfactorily. The importance of avoiding the calculation of a sum of squares and cross‐products matrix is demonstrated by an example.
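The numerical point, that the data matrix should be factored directly rather than forming the ill-conditioned sum of squares and cross-products matrix, can be sketched generically (this is not the published program):

```python
import numpy as np

def svd_lstsq(A, b, rcond=1e-12):
    """Least squares via the SVD of A itself. Forming A'A squares the
    condition number; zeroing negligible singular values lets rank-deficient
    data be handled satisfactorily. b is a single response vector."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_inv = np.where(s > rcond * s[0], 1.0 / s, 0.0)   # damp deficient directions
    return Vt.T @ (s_inv * (U.T @ b))
```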

Journal ArticleDOI
TL;DR: If the data are such that the eigenvectors are orthogonal functions of time and they have some recognizable non-random structure permitting predictability in time, then the observed response at time t can be used with the extrapolated forcing function to predict some physical quantity.
Abstract: The theorem of singular value decomposition is used to represent a data matrix X as the product of a system with a response R to a forcing function F. Algebraically, R is the matrix of principal components and F the transpose of the matrix of eigenvectors of X′X. If the data are such that the eigenvectors are orthogonal functions of time and they have some recognizable non-random structure permitting predictability in time, then the observed response at time t can be used with the extrapolated forcing function to predict some physical quantity (e.g., temperature, pressure). This method is called the time extrapolated eigenvector prediction (TEEP). An example is given to illustrate the method with a known forcing function, the annual solar heating cycle. We have access to efficient computer routines which will facilitate an extension to much larger data sets.
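A rough sketch of TEEP under stated assumptions (a space-by-time data matrix, one retained component by default, and a forcing function modelled as a sinusoid of known period such as the annual cycle; all names and the fitting choice are ours):

```python
import numpy as np

def teep_forecast(X, period, steps, k=1):
    """X is (space x time). The SVD X = R F' gives a response R (the matrix of
    principal components) and a forcing F (time eigenvectors of X'X). If the
    leading time functions follow a recognizable cycle, fit and extrapolate
    them, then reconstruct predicted values with the fixed response."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    R = U[:, :k] * s[:k]                          # spatial response pattern(s)
    t_obs = np.arange(X.shape[1])
    t_new = np.arange(X.shape[1], X.shape[1] + steps)
    def basis(t):                                 # assumed sinusoidal forcing model
        w = 2 * np.pi * t / period
        return np.column_stack([np.sin(w), np.cos(w), np.ones_like(t, dtype=float)])
    F_new = np.empty((k, steps))
    for j in range(k):
        coef, *_ = np.linalg.lstsq(basis(t_obs), Vt[j], rcond=None)
        F_new[j] = basis(t_new) @ coef            # extrapolated forcing function
    return R @ F_new                              # predicted physical quantity
```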

Journal ArticleDOI
TL;DR: A method is presented which can be used as a guideline in determining whether factor structures obtained from two data sets are congruent, and an empirical sampling distribution of the statistic average trace was developed through a Monte Carlo approach.
Abstract: Attention has been drawn to the lack of standards for evaluating the degree of goodness of fit of patterns resulting from a principal components analysis of two data sets. An empirical sampling distribution of the statistic average trace (E'E), as E is obtained in the orthogonal Procrustes problem for various orders of A matrices, was developed through a Monte Carlo approach. A method is presented which can be used as a guideline in determining whether factor structures obtained from two data sets are congruent.
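The fit statistic comes from the orthogonal Procrustes problem. A minimal sketch (function name ours) of the rotation and the residual trace; comparing the observed trace(E'E) with a Monte Carlo reference distribution, as in the paper, then indicates whether two factor structures are congruent.

```python
import numpy as np

def procrustes_residual(A, B):
    """Rotate loading matrix A orthogonally to best match B and return the
    rotation together with trace(E'E), where E = A T - B is the residual."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    T = U @ Vt                      # orthogonal rotation minimizing ||A T - B||_F
    E = A @ T - B
    return T, np.trace(E.T @ E)
```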

Journal ArticleDOI
TL;DR: In this article, it is argued that data reduction (when there is no a priori reason to keep the original units of measurement) should proceed by doing a component analysis on the covariance matrix rescaled so as to yield the maximally reliable composites found in a canonical reliability analysis.
Abstract: If variables are scaled so that they have equal errors of measurement, then the maximally reliable composites found in a canonical reliability analysis have highly desirable properties. Under this scaling transformation, canonical factor analysis on all nonerror variance, principal components analysis and canonical reliability analysis all yield equivalent results. Thus, the resulting components are successively maximally reliable and simultaneously provide a least squares fit to both the covariance matrix of rescaled observed scores and of rescaled true scores. In addition, the proportion of true to observed score variance retained in an s-dimensional least squares fit is equal to the ratio of the s-dimensional canonical reliability to the canonical reliability of the entire battery. Based on the above, it is argued that data reduction (when there is no a priori reason to keep the original units of measurement) should proceed by doing a components analysis on the covariance matrix rescaled so as to yield the maximally reliable composites found in a canonical reliability analysis.

Journal ArticleDOI
TL;DR: In this article, a systematic study was undertaken to correlate qualitatively a multiplicity of responses of nonwoven fabrics to a selected set of controlled variables, which were the chemical constitution of the base fiber, concentration of binder fibers, and three thermal bonding variables: bonding temperature, extent of annealing, and quench temperature.
Abstract: A systematic study was undertaken to correlate qualitatively a multiplicity of responses of nonwoven fabrics to a selected set of controlled variables. The nonwovens were bonded under heat and pressure using a bilateral bonding fiber mixed with a base fiber. The variables were the chemical constitution of the base fiber, concentration of binder fibers, and three thermal bonding variables: bonding temperature, extent of annealing, and quench temperature. Of 40 responses originally recorded, 23 mechanical properties are analyzed by regression on the first 5 new responses generated by a principal component analysis. The results of this method, which eliminates redundancies in the responses, are compared to the results which would have been obtained using the 23 separate responses. Improvement in ability to quantify effects of the variables was achieved.
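The strategy of regressing on a few leading components rather than on many redundant responses is essentially principal component regression; here is a generic sketch (k = 5 mirrors the text, but the implementation is ours, not the authors'):

```python
import numpy as np

def pc_regression(X, y, k=5):
    """Principal-component regression: replace redundant variables with the
    first k component scores, then fit ordinary least squares on the scores,
    eliminating redundancies before quantifying effects."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:k].T                          # first k PC scores
    coef, *_ = np.linalg.lstsq(scores, y - y.mean(), rcond=None)
    return coef, Vt[:k]                             # score coefficients, directions
```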

Book ChapterDOI
01 Jan 1976
TL;DR: This chapter examines two other methods of analysis, principal factor analysis and principal component analysis; their approach and aim are similar, but there is a fundamental difference between the two methods.
Abstract: In Chapter 2 we examined the method of factor analysis using the centroid technique. It was stated then that the aim of the analysis was to explain the correlations between the original observed variables in terms of their correlations with a smaller set of factors. In this chapter we will examine two other methods of analysis, principal factor analysis and principal component analysis. The approach of the two methods is similar and their aim, to aid interpretation of the underlying structure of the interrelationships between variables, is the same. But there is in fact, as we shall see later, a fundamental difference between the two methods.

01 Apr 1976
TL;DR: The Goodenough-Harris Drawing Test is examined for racial and ethnic bias in its test items among elementary school students.
Abstract: Descriptors: American Indians; Analysis of Variance; Anglo Americans; Elementary School Students; Ethnic Groups; Factor Analysis; Factor Structure; Item Analysis; Mexican Americans; Multiple Regression Analysis; Negroes; Orthogonal Rotation; Projective Tests; Racial Discrimination; Statistical Analysis; Test Bias. Identifiers: Goodenough-Harris Drawing Test; Principal Components Analysis; Test Items.

Journal ArticleDOI
TL;DR: In this article, an attempt is made to predict phosphate load by means of discriminant analysis: eight groups of data are defined by a cluster analysis, and principal component analysis together with an F-ratio of the predictor variables is used to find the most favourable group of predictor variables by which an optimal separation between the eight different groups is possible.
Abstract: An attempt is made to predict phosphate load by means of discriminant analysis. Eight groups of data are defined by a cluster analysis. Principal component analysis and an F-ratio of the predictor variables are used to find a most favourable group of predictor variables by which an optimal separation between the eight different groups of data is possible. The discriminant functions, linear combinations of the predictors, together with the additional help of a classification procedure such as Euclidean distances, may be used to assign an individual phosphate measurement to the group to which it best corresponds. The discriminant analysis shows that a linear combination of two out of a total of 33 predictors, namely the combination of (1) runoff and (2) settlement area, has the best discriminant power. Multivariate tests of significance are performed. Tables are constructed that demonstrate the predicted versus actual group membership. Phosphate load along with the predictor variables were measured from 1973 to 19...

Journal ArticleDOI
TL;DR: In this article, a nonlinear mapping technique, operating in the R-mode and using the correlation matrix instead of the Euclidean distance matrix, is defined and its application to geoscience data is illustrated by an example of the ‘Fox’ stratigraphic data.