scispace - formally typeset

Showing papers on "Principal component analysis" published in 1972


Journal ArticleDOI
TL;DR: It is shown that several of the rejection methods, of differing types, each discard precisely those variables known to be redundant, for all but a few sets of data.
Abstract: Often, results obtained from the use of principal component analysis are little changed if some of the variables involved are discarded beforehand. This paper examines some of the possible methods for deciding which variables to reject and these rejection methods are tested on artificial data containing variables known to be “redundant”. It is shown that several of the rejection methods, of differing types, each discard precisely those variables known to be redundant, for all but a few sets of data.

909 citations
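
The abstract above does not say which rejection rules were compared, so the following is only a minimal sketch of one rule of this general kind, under stated assumptions: find the principal component with the smallest eigenvalue of the correlation matrix and reject the variable with the largest absolute loading on it. The data, the variable names, and the use of shifted power iteration are all hypothetical choices for illustration and are not taken from the paper.

```python
import math

# Hypothetical data: x3 is nearly a copy of x1, so one of that pair is redundant.
data = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],  # x1
    [5.0, 1.0, 4.0, 2.0, 8.0, 3.0, 7.0, 6.0],  # x2
    [1.1, 1.9, 3.2, 3.8, 5.1, 6.1, 6.9, 8.0],  # x3, roughly equal to x1
]

def standardize(values):
    """Center a variable and scale it to unit length."""
    mean = sum(values) / len(values)
    scale = math.sqrt(sum((v - mean) ** 2 for v in values))
    return [(v - mean) / scale for v in values]

z = [standardize(v) for v in data]
p = len(z)

# Dot products of unit-length centered variables are correlations.
corr = [[sum(a * b for a, b in zip(z[i], z[j])) for j in range(p)] for i in range(p)]

# The eigenvector of corr with the smallest eigenvalue is the dominant
# eigenvector of (p * I - corr), because trace(corr) = p bounds every eigenvalue.
shifted = [[(p if i == j else 0.0) - corr[i][j] for j in range(p)] for i in range(p)]

vec = [1.0, 0.5, 0.25]  # arbitrary starting vector (assumption)
for _ in range(500):
    nxt = [sum(shifted[i][j] * vec[j] for j in range(p)) for i in range(p)]
    norm = math.sqrt(sum(u * u for u in nxt))
    vec = [u / norm for u in nxt]

# Rayleigh quotient gives the eigenvalue of corr for this component.
smallest_eigenvalue = sum(
    vec[i] * sum(corr[i][j] * vec[j] for j in range(p)) for i in range(p)
)

# Reject the variable that loads most heavily on the near-null component.
rejected = max(range(p), key=lambda i: abs(vec[i]))
print("smallest eigenvalue:", smallest_eigenvalue)
print("index of rejected variable:", rejected)
```

With data like this, the near-null component is close to the difference between x1 and x3, so the rule rejects one member of the redundant pair and leaves x2 alone.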


Journal ArticleDOI
TL;DR: A brief review is given of the main methods and models for the analysis of multivariate binary data, and the relation with standard second-order techniques is discussed.
Abstract: A brief review is given of the main methods and models for the analysis of multivariate binary data. The relation with standard second‐order techniques is discussed.

432 citations


Journal ArticleDOI
TL;DR: It is suggested here that the eigenvectors of the cross-spectrum matrix be used for interpreting atmospheric wave disturbances, analogous to the use of empirical orthogonal functions applied to band-pass filtered time series.
Abstract: Difficulties in using conventional cross-spectrum analysis to explore atmospheric wave disturbances have indicated the need for some extension of the usual technique. It is suggested here that the eigenvectors of the cross-spectrum matrix be used for interpreting such data. The method is analogous to the use of empirical orthogonal functions applied to band-pass filtered time series. However, the eigenvectors of the cross-spectrum matrix contain additional information concerning phase which is not available from the eigenvectors of the covariance matrix. It is possible to generate a new set of time series which are mutually uncorrelated within a pre-selected frequency interval and which have the same combined variance in the frequency interval as the original set of time series. These new series are obtained by applying the eigenvectors of the cross-spectrum matrix to a set of complex time series involving the original time series and their time derivatives. The application and physical interpret...

201 citations


Journal ArticleDOI
TL;DR: It was found that nonmetric scaling gave better results (as measured by the correlation between the distances in the k-dimensional configuration and the original distances) and principal coordinates analysis gave a better fit than principal components analysis when missing values were present.
Abstract: Rohlf, F. James (Ecology and Evolution, State University of New York, Stony Brook, New York 11790) 1972. An empirical comparison of three ordination techniques in numerical taxonomy. Syst. Zool., 21:271-280. This study reports on comparisons of Kruskal's nonmetric multidimensional scaling analysis, principal components analysis, and Gower's principal coordinates analysis. Nine different sets of data were used. It was found that nonmetric scaling gave better results (as measured by the correlation between the distances in the k-dimensional configuration and the original distances). It was also found that principal coordinates analysis gave a better fit than principal components analysis when missing values were present. [Nonmetric scaling; principal components; principal coordinates.]

121 citations


Journal ArticleDOI
TL;DR: The customary estimate of repeatability based on an intra-class correlation coefficient is shown to be a gross underestimate, and two other methods based on a two-way analysis of variance and principal component analysis were found on verification to be very efficient.
Abstract: A statistical experiment designed to test the efficiency of different methods of estimating repeatability is described. The customary estimate of repeatability based on an intra-class correlation coefficient is shown to be a gross underestimate. Two other methods, one based on a two-way analysis of variance and the other based on principal component analysis, were found on verification to be very efficient. The latter method is more stable and more efficient in that it helps to highlight, measure, and eliminate certain latent aspects of variation which the analysis of variance cannot disentangle, and it is thereby superior to the analysis of variance approach in certain special circumstances. The method of principal component analysis also offers a more efficient criterion of selection, in that the elements of the latent vector help to obtain a weighted average which is superior to the arithmetic mean, since the latter can be biased by abnormal conditions which may occur from time to time during the period of the observations.

70 citations


Journal ArticleDOI
TL;DR: In this article, the relationship between patterns of generalized intrapopulational variation determined from principal components analysis and patterns of sexual dimorphism determined from Student's t and discriminant function analysis was investigated.
Abstract: The present research was undertaken to determine the relationship between patterns of generalized intrapopulational variation determined from principal components analysis and patterns of sexual dimorphism determined from Student's t and discriminant function analysis. The analysis was based upon 17 measurements of 97 femurs from a Middle Mississippian Amerindian population. The results of the principal components analysis indicated that the 17 original measurements could be represented as four principal component variates. Inspection of component loadings lent support to the contention that the first principal component reflected variation in general size while components two to four reflected variation in femoral shape. Analysis of the relationship between principal component loading and male-female differences reflected in Student's t demonstrated a high (0.97) positive correlation between absolute magnitude of loading in the first principal component and magnitude of Student's t. As a result, discriminant analyses of the femur, utilizing univariate criteria for the inclusion of variables, have been biased in the direction of size variation. Subsequent stepwise discriminant function analyses demonstrated that an adequate discriminant model must reflect all dimensions of morphological variation at the intrapopulational level.

57 citations



Journal ArticleDOI
01 Aug 1972-Heredity
TL;DR: The principal component analysis of genotype-environmental interactions and physical measures of the environment.
Abstract: The principal component analysis of genotype-environmental interactions and physical measures of the environment

49 citations



Journal ArticleDOI
TL;DR: Application of differential weighting to quantify direction and rate of compositional change, expressed as a vector, provided evidence that the forest composition of the Ashland Wildlife Area is becoming more mesic.
Abstract: Several gradient analytical techniques were used to provide rigorous ecological description of 75 forest plots in the Ashland Wildlife Area (AWA), Missouri. Tetrachoric correlation was used to construct a comparison matrix of the tree species. Gradients were then extracted from the comparison matrix using a principal components analysis. The model produced from the principal components analysis displayed vegetational patterns evident in the area. The species principal component values were weighted, independently, by density, basal area and importance value, to position the plots along the dimensions of the model. All three weighting variables yielded highly similar plot positions along a theoretical moisture gradient. Application of differential weighting to quantify direction and rate of compositional change, expressed as a vector, provided evidence that the forest composition of the Ashland Wildlife Area is becoming more mesic. A number of environmental variables were used as the third dimension on graphs of the plot positions. Several environmental variables bear a definite relationship to species compositional patterns. A principal components model of the sapling and shrub species shows a few species shifting from xeric to more mesic conditions as compared to the tree model. This is verified by comparing sapling densities of the dominant species. Acer saccharum is the leading sapling dominant in the Ashland Wildlife Area; and it appears likely that this species will eventually come to dominate the overstory of the present oak forests.

45 citations


Journal ArticleDOI
Ronald D. Snee
TL;DR: In this article, a new model, which combines the univariate analysis of variance and principal component analysis, is developed, which overcomes some of the inadequacies of the available procedures and gives results which are easy to understand and interpret.
Abstract: Experimenters often make several observations on a given experimental unit. If these observations can be associated with some continuous variable, such as time or temperature, they collectively form a curve. Wishart (1938) first recommended that a general regression model be fitted to each curve and that the effects of the experimental treatments be evaluated by analyzing the coefficients in the model. Univariate (Box, 1950) and multivariate analysis of variance (Cole and Grizzle, 1966) and principal component analysis (Church, 1966) are other techniques for the analysis of response curves. A discussion of these methods is presented and a new model, which combines the univariate analysis of variance and principal component analysis, is developed. This model overcomes some of the inadequacies of the available procedures and gives results which are easy to understand and interpret. Examples are presented which compare the various methods of analysis and illustrate the usefulness of this new model.

Journal ArticleDOI
TL;DR: Most effective discriminator analysis showed that between 5 and 7 species were sufficient to discriminate between the 7 clusters almost as well as the full species list, and no discontinuities were apparent in the environment from the information available.

Journal ArticleDOI
TL;DR: In this paper, the authors compare multiple regression and regression on the principal components of data matrices when applied to the problem of predicting water yields in Kentucky.
Abstract: The advent of the digital computer has made the analysis of large quantities of hydrologic data possible. It has also made possible certain types of analyses that were not feasible for hand calculation. Among these techniques are multiple regression and regression on the principal components of data matrices. This report compares these two techniques when they are applied to the problem of predicting water yields in Kentucky.
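
To make the two techniques named in this abstract concrete, here is a minimal sketch, assuming two correlated predictors and a made-up response (none of these numbers come from the report). It fits an ordinary multiple regression and a regression on the first principal component only, then compares residual sums of squares. All variable names are hypothetical.

```python
import math

# Hypothetical data: two strongly correlated predictors and one response.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
y = [2.0, 4.1, 6.2, 7.9, 10.1, 12.0]

def center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

c1, c2, cy = center(x1), center(x2), center(y)

# Multiple regression: solve the 2x2 normal equations directly.
s11, s12, s22 = dot(c1, c1), dot(c1, c2), dot(c2, c2)
s1y, s2y = dot(c1, cy), dot(c2, cy)
det = s11 * s22 - s12 ** 2
ols_b1 = (s22 * s1y - s12 * s2y) / det
ols_b2 = (s11 * s2y - s12 * s1y) / det

# Regression on principal components: largest eigenvalue and unit eigenvector
# of the 2x2 cross-product matrix (closed form, valid because s12 != 0 here).
lam = (s11 + s22) / 2 + math.sqrt(((s11 - s22) / 2) ** 2 + s12 ** 2)
v = (s12, lam - s11)
length = math.hypot(v[0], v[1])
v = (v[0] / length, v[1] / length)

# Scores on the first component, then a one-predictor regression on them.
scores = [v[0] * a + v[1] * b for a, b in zip(c1, c2)]
gamma = dot(scores, cy) / dot(scores, scores)

# Express the component regression as coefficients on the original predictors.
pcr_b1, pcr_b2 = gamma * v[0], gamma * v[1]

def rss(b1, b2):
    """Residual sum of squares of the centered response."""
    return sum((t - b1 * a - b2 * b) ** 2 for a, b, t in zip(c1, c2, cy))

print("multiple regression RSS:", rss(ols_b1, ols_b2))
print("first-component regression RSS:", rss(pcr_b1, pcr_b2))
```

On the fitting data, multiple regression can never have the larger residual sum of squares, since the component regression is a constrained version of it. The usual argument for regressing on components is coefficient stability when predictors are nearly collinear, not a better in-sample fit.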

Journal ArticleDOI
TL;DR: It was concluded that the chronological shift in the spatial distribution of the lycosid species was caused by changes in the vegetation, and the proposition that each principal component can be interpreted as only one real environmental factor does not hold for second and higher degree polynomial response curves.
Abstract:
1. The study was part of an extensive ecological program carried out in the dune area "Meijendel" of the Water Supply Company of The Hague.
2. The aim of this study was to analyse the spatial distribution of 12 wolf spider species occurring within one area, since it was expected that a competitive situation might exist.
3. The lycosid populations were sampled by 100 pitfalls grouped into four regions. The contents of these pitfalls were sampled weekly during a period of 7 years (1-3-1953 up to and including 24-2-1960).
4. For a description of the relations between the distributions of the species the technique of Principal Component Analysis was used (R-technique).
5. Principal Component Analysis was also used on the same data to obtain an ordination of the pitfalls (biotopes) (Q-technique).
6. From the R- and Q-technique, the following hypotheses suggested themselves:
   a. The degree of cover or a related factor is most important for the distribution of the species.
   b. A difference appears to exist between similar biotope types nearer to the sea and those situated farther inland.
7. The optimum vegetation type appeared to be different for nearly each species. The optimum vegetation type for each species in my study area is in good agreement with the type of habitat stated in the literature for the species concerned.
8. A number of species showed a gradual shift in spatial distribution in the course of the seven-year period of sampling.
9. Since it was shown that the spatial distribution of wolf spider species was linked to vegetation structure and since it was established that the vegetation itself had changed in the course of time, it was concluded that the chronological shift in the spatial distribution of the lycosid species was caused by changes in the vegetation.
10. The technique of Principal Component Analysis was amended on the following points:
   a. The linearity (additivity) of the principal components (factors) was changed into a more realistic multiplicative (fractional) model by taking the logarithms of the data. The additive model, however, still appeared to be a good approximation of the multiplicative one in this case.
   b. The proposition that each principal component, if necessary after rotation, can be interpreted as only one real environmental factor, does not hold for second and higher degree polynomial response curves.
   c. In the Q-technique, the mean and variances should be corrected when the samples cannot be regarded as taken from one well-defined mathematical population.

Journal ArticleDOI
TL;DR: The relation between principal components and analysis of variance is examined in this paper, where it is shown that the model underlying the extended analysis of variance developed by GOLLOB and MANDEL is useful also as a model for principal component analysis.
Abstract: The relation between principal components and analysis of variance is examined. It is shown that the model underlying the extended analysis of variance developed by GOLLOB and MANDEL is useful also as a model for principal component analysis. The elucidation of structure of two-factor data using the new analysis of variance model is illustrated by an example taken from thermodynamics. It has been my good fortune to have spent a full year in close association with Professor HAMAKER at the Technological University of Eindhoven. That year was among the most pleasant and most rewarding of my career. I feel honored to be able to join with Professor HAMAKER'S many friends and colleagues in dedicating this issue of Statistica Neerlandica to him. The method of principal components goes back to ideas proposed by PEARSON as early as 1901 [12] and developed systematically by HOTELLING in 1933 [4]. Since then the method has been applied to numerous sets of data, more particularly in the field of psychology, but also in numerous other areas of research, including the physical sciences [e.g. 1, 2, 5, 7, 8, 10, 13, 14, 15, 16]. In 1968 and 1969 respectively, GOLLOB [3] and MANDEL [8] proposed, independently of each other, an extension of the analysis of variance approach, which GOLLOB called "Fanova", because it combined features of analysis of variance and of factor analysis. This method, too, had been anticipated by some earlier authors [13, 17]. It is immediately apparent that this extension of the analysis of variance involves the same matrix calculations as the method of principal components. The question then arises whether a deeper conceptual relationship exists between the two methods. In this paper this question is examined. The result is not only a positive answer to this question but also a clarification of the method of principal components. The author believes, as a result of this work, that interpretations of principal component analyses found in the literature are sometimes incorrect. We will attempt to show that such misinterpretations are due, in no small measure, to a particular terminology that has acquired common usage in inferences drawn from principal component analysis.



Journal ArticleDOI
TL;DR: A method which combines block size variance analysis with a multivariate (principal components) analysis is described and applied to a woodland community in south central Queensland.
Abstract: Multivariate analyses of vegetation data are normally restricted to a single scale of sampling, but since the pattern of species populations may vary over a range of scales, restriction to a single scale can result in a loss of potentially useful information. It is possible to examine spatial variation for a single species or pairs of species by block size variance (or covariance) analysis, but this is a somewhat cumbersome procedure when more than a few species are involved. A method which combines block size variance analysis with a multivariate (principal components) analysis is described and applied to a woodland community in south central Queensland. Contiguous site data, recorded as density scores for all tree and shrub species along a transect 512 m by 20 m, were grouped into successively larger blocks. Variance covariance matrices at each block size were calculated and added to form a combined covariance matrix. This was subjected to a principal components analysis to obtain species and sites coordinates. Each characteristic root was subsequently partitioned into contributions from the various block sizes, and the partitioned roots plotted against block size as in conventional pattern analysis. The first two components represented macro-variation in the vegetation of the transect (at scales of 120-250 m) and separated three macrocommunities which were associated with soil types. Two subsequent components expressed compositional differences at smaller scales (30-60 m) within these macrocommunities.

Journal ArticleDOI
TL;DR: Both the analysis of variance and principal component analysis divide sums of squares into orthogonal components; the major difference is that the first uses external criteria, the second internal.
Abstract: Both the analysis of variance and principal component analysis divide sums of squares into orthogonal components. The major difference is that the first uses external criteria, the second internal. The calculations involved in these techniques are demonstrated with an example. The way in which data may be standardized before principal component analysis is discussed. In some cases, the information discarded on standardization may be examined by an analysis of variance, and related to the results of the principal component analysis.
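
The claim that both techniques divide the same sum of squares can be checked numerically. The sketch below is not the paper's example: it uses hypothetical data with two variables measured on six units in two groups. The analysis of variance splits the total sum of squares using the group labels (an external criterion), while principal component analysis splits it into eigenvalues of the sums-of-squares-and-cross-products matrix (an internal criterion).

```python
import math

# Hypothetical data: two variables on six units, three units per group.
groups = ["a", "a", "a", "b", "b", "b"]
rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 3.0), (6.0, 5.0), (7.0, 8.0), (8.0, 8.0)]

n = len(rows)
p = 2
grand = [sum(r[j] for r in rows) / n for j in range(p)]

# Total sum of squares about the grand mean, pooled over both variables.
total_ss = sum((r[j] - grand[j]) ** 2 for r in rows for j in range(p))

# Analysis of variance split, driven by the external group labels.
between_ss = 0.0
within_ss = 0.0
for g in sorted(set(groups)):
    members = [r for r, label in zip(rows, groups) if label == g]
    mean_g = [sum(r[j] for r in members) / len(members) for j in range(p)]
    between_ss += len(members) * sum((mean_g[j] - grand[j]) ** 2 for j in range(p))
    within_ss += sum((r[j] - mean_g[j]) ** 2 for r in members for j in range(p))

# Principal component split: eigenvalues of the 2x2 cross-product matrix
# [[a, b], [b, c]], using the closed form for a symmetric 2x2 matrix.
a = sum((r[0] - grand[0]) ** 2 for r in rows)
c = sum((r[1] - grand[1]) ** 2 for r in rows)
b = sum((r[0] - grand[0]) * (r[1] - grand[1]) for r in rows)
half_gap = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
eig1 = (a + c) / 2 + half_gap
eig2 = (a + c) / 2 - half_gap

print("total:", total_ss)
print("between + within:", between_ss + within_ss)
print("eig1 + eig2:", eig1 + eig2)
```

Both decompositions add back up to the same total; they differ only in how the orthogonal pieces are chosen.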

Journal ArticleDOI
TL;DR: In general it is found that matrices of zero-order correlations are of greater interpretive value than are matrices of partial regression or canonical correlation vectors.


Journal ArticleDOI
TL;DR: In this article, a combination of principal component analysis and regression analysis is used to analyze directional reflectance curves, which are specified by two kinds of independent variables, namely wavelength and spatial direction.
Abstract: This paper deals with an analysis of directional reflectance curves specified by two kinds of independent variables, i.e. wavelength and spatial direction. Generally, these reflectance curves have been analyzed for each spatial direction successively, and it was suspected that this usual method makes ineffective use of the information contained in these observations. The principle of the present method is based upon the effective combination of principal component analysis and regression analysis, both of which are frequently used in multivariate statistics. The following advantages were confirmed by comparing true, observed and estimated spectral reflectance curves in a model experiment: (1) It is possible to make a useful estimate of the physical meaning which might be hidden behind the observed raw data. (2) The spectral reflectance curves estimated by the present method give a better fit to the true curves than the raw data.

Journal ArticleDOI
TL;DR: It is shown that McDonald's generalization of principal component analysis to groups of variables maximally channels the total variance of the original variables through the groups of variables acting as groups, and a useful equation is obtained for determining the vectors of correlations of the L2 components with the original variables.
Abstract: It is shown that McDonald's generalization of classical Principal Components Analysis to groups of variables maximally channels the total variance of the original variables through the groups of variables acting as groups. A useful equation is obtained for determining the vectors of correlations of the L2 components with the original variables. A calculation example is given.

Book ChapterDOI
01 Jan 1972
TL;DR: In this paper, the authors present the methods for identifying outliers, the Hotelling T² statistic for testing hypotheses about one or two mean vectors, and the problem of classifying an observation vector into one of two multivariate populations.
Abstract: This chapter presents the methods for identifying outliers, the Hotelling T² statistic for testing hypotheses about one or two mean vectors, and the problem of classifying an observation vector into one of two multivariate populations. It also presents the classification problem to k populations, k ≥ 2 and describes the stepwise classification procedure. It presents principal component analysis and factor analysis. In many applications, it is not sufficient to classify the individual into one of the populations and to report the probability of misclassification. Once the factor loadings have been obtained, the major burden of the analyst is to make the best interpretation of the common factors. This involves the technique of factor rotation, which, because of its subjectivity, is the most controversial part of factor analysis.

Journal ArticleDOI
TL;DR: It is presumed that there is an underlying factor called “heat tolerance” and that this factor controls various physiological changes induced by heat exposure simultaneously, and that these physiological changes may be used to assess a so-called “heat tolerance” of each person.
Abstract: posure2)8)12) and that these physiological changes may be used to assess a so-called “heat tolerance” of each person10). It will be natural to inquire what combination of these physiological changes is the best index for the evaluation of “heat tolerance”. It may be presumed that there is an underlying factor which is called “heat tolerance” and that this factor controls various physiological changes induced by heat exposure simultaneously. If this hypothesis is true, the factor which may be expressed as a transformed variate of various physiological changes must be the best index for the evaluation of “heat tolerance”. The above hypothesis has been tested by applying a statistical method called “principal components analysis”4)9) to the data on several physiological changes when young healthy male subjects were compelled to exercise in a warm bath.