SciSpace (formerly Typeset)

Showing papers on "Principal component analysis" published in 1983


Journal Article
TL;DR: In this article, a log-linear contrast form of principal component analysis (LCCA) is proposed for compositional data analysis, adapting recently introduced transformation techniques for compositional data.
Abstract: SUMMARY Compositional data, consisting of vectors of proportions, have proved difficult to handle statistically because of the awkward constraint that the components of each vector must sum to unity. Moreover such data sets frequently display marked curvature so that linear techniques such as standard principal component analysis are likely to prove inadequate. From a critical reexamination of previous approaches we evolve, through adaptation of recently introduced transformation techniques for compositional data analysis, a log linear contrast form of principal component analysis and illustrate its advantages in applications.

461 citations
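A minimal sketch of the log-contrast idea. It assumes the idea can be illustrated by applying the centred log-ratio (clr) transform and then ordinary PCA. The helper names and the five compositions are hypothetical, and the paper's own estimation details are not reproduced. Each clr row sums to zero, so every direction with non-zero variance is a log contrast: its coefficients sum to zero.

```python
import numpy as np

def clr(compositions):
    """Centred log-ratio transform: log of each part minus the mean log of its row."""
    logs = np.log(compositions)
    return logs - logs.mean(axis=1, keepdims=True)

def logcontrast_pca(compositions):
    """PCA of clr-transformed compositions; assumes every part is strictly positive."""
    z = clr(np.asarray(compositions, dtype=float))
    z_centered = z - z.mean(axis=0)
    # Rows of vt are principal directions; those with non-zero variance are log contrasts
    _, s, vt = np.linalg.svd(z_centered, full_matrices=False)
    variances = s**2 / (len(z) - 1)
    scores = z_centered @ vt.T
    return vt, variances, scores

# Hypothetical 3-part compositions (each row sums to 1)
comps = np.array([
    [0.60, 0.30, 0.10],
    [0.50, 0.35, 0.15],
    [0.40, 0.40, 0.20],
    [0.30, 0.45, 0.25],
    [0.20, 0.50, 0.30],
])
directions, variances, scores = logcontrast_pca(comps)
print(directions[:2].sum(axis=1))   # coefficient sums of the two log contrasts, both ~0
```

With D parts, at most D - 1 directions carry variance. Here that means two, and the third singular value is numerically zero.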


Journal Article
TL;DR: In this paper, it is shown that the principal components with the largest eigenvalues do not necessarily contain the most information (distance) for separating clusters, and the effect of scaling the variables on how this information is distributed across components is investigated.
Abstract: SUMMARY In applying principal components for reducing the dimension of the data before clustering, it has ordinarily been the practice to use components with the largest eigenvalues. We prove, by means of a mixture of two multivariate normal distributions, that this practice is not justified in general. A relationship between the distance of the two subpopulations and any subset of principal components is derived, showing that the components with the larger eigenvalues do not necessarily contain more information (distance). This result is further demonstrated through hypothetical as well as real situations which use actual data. The effect of scaling the variables on the distribution of the information to different components is investigated. An application to a mixture of two normal distributions is illustrated by utilizing a set of generated data in which the information is concentrated in the components with the largest and the smallest eigenvalues.

327 citations
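A small simulation that illustrates the claim. It is not the paper's derivation. The two groups share a large spread along one axis and differ only along a low-variance axis. The mixture parameters are hypothetical, and the standardized mean separation per component is a simple stand-in for the paper's distance measure.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
# Hypothetical mixture: large common spread along x, group separation only along y
group_a = np.column_stack([rng.normal(0.0, 10.0, n), rng.normal(-1.0, 0.3, n)])
group_b = np.column_stack([rng.normal(0.0, 10.0, n), rng.normal(1.0, 0.3, n)])
data = np.vstack([group_a, group_b])

mean = data.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(data, rowvar=False))   # ascending order
order = np.argsort(eigvals)[::-1]                               # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores_a = (group_a - mean) @ eigvecs
scores_b = (group_b - mean) @ eigvecs
# Difference of group means along each component, in units of the pooled within-group SD
pooled_sd = np.sqrt((scores_a.var(axis=0, ddof=1) + scores_b.var(axis=0, ddof=1)) / 2)
separation = np.abs(scores_a.mean(axis=0) - scores_b.mean(axis=0)) / pooled_sd
print(eigvals.round(2), separation.round(2))
```

Keeping only the first component would discard almost all of the information that distinguishes the two groups.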



Journal Article
TL;DR: In this paper, the use of partial least squares in latent variables (PLS) for multivariate calibration is described; the application is the simultaneous determination of ligninsulfonate, humic acid and an optical whitener from their severely overlapping fluorescence spectra.

239 citations
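The entry gives no algorithmic detail, so what follows is a generic PLS1 sketch using the NIPALS algorithm. It handles a single response, so one model would be built per analyte. This is an assumption, not necessarily the authors' variant. The "spectra" and concentrations are random and hypothetical. With as many latent variables as predictors and noise-free data, PLS1 reproduces ordinary least squares, and the example uses that as a check.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS1 via NIPALS; returns regression coefficients and the centring means."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    p = X.shape[1]
    W = np.zeros((p, n_components))
    P = np.zeros((p, n_components))
    q = np.zeros(n_components)
    for k in range(n_components):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)          # weight vector
        t = Xk @ w                         # latent-variable scores
        tt = t @ t
        P[:, k] = Xk.T @ t / tt            # X loadings
        q[k] = yk @ t / tt                 # y loading
        W[:, k] = w
        Xk = Xk - np.outer(t, P[:, k])     # deflate X
        yk = yk - q[k] * t                 # deflate y
    B = W @ np.linalg.solve(P.T @ W, q)
    return B, x_mean, y_mean

def pls1_predict(X_new, B, x_mean, y_mean):
    return (X_new - x_mean) @ B + y_mean

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))                # hypothetical "spectra": 30 samples x 5 channels
beta = np.array([0.5, -1.0, 2.0, 0.0, 1.5])
y = X @ beta + 2.0                          # hypothetical concentrations, exactly linear
B, x_mean, y_mean = pls1_fit(X, y, n_components=5)
```

In a real calibration, n_components would be chosen well below the number of channels, for example by cross-validation. That is where PLS differs from least squares on overlapping spectra.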


Journal Article
01 Jun 1983-Ecology
TL;DR: The authors conclude that the distance biplot is among the most powerful analytical tools for species-composition data and that it derives some of its power from properties not possessed by, for example, reciprocal averaging.
Abstract: Attention is drawn to some useful but not generally known properties of principal components analysis (PCA). Noncentered PCA of proportion data gives site ordinations that display approximate alpha diversities of sites and beta diversities of groups of sites, as measured by the Simpson index and mean squared Euclidean distance, respectively. Species centering allows a better approximation to beta diversities. Alpha diversities can still be visualized after centering if the true origin is projected into the plane of the ordination. The approximate species composition of each site can also be visualized if the site ordination is combined with a species ordination. The resulting plot of site scores and species loadings is called a PCA biplot. Finally, in a PCA biplot that displays both species composition and diversity, diversity values can be explained in terms of the main species contributing to diversity. In such a biplot the sum of squares of the species loadings must be scaled to unity, while the site scores must be scaled to a sum of squares equal to the corresponding eigenvalue. This type of biplot is termed a "distance biplot." For a simple illustration, noncentered and species-centered distance biplots were produced for some diatom samples taken from Dutch moorland pools in the 1920s and 1978. The distance biplot is concluded to be among the most powerful analytical tools for species-composition data and derives some of its power from properties not possessed by, for example, reciprocal averaging. One problem is that it attaches little weight to rare species, but this problem can be solved by various possible data transformations based on the theory of diversity indices.

172 citations
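A minimal sketch of the distance-biplot scaling described in the abstract, using one SVD. It assumes that "eigenvalue" means an eigenvalue of the cross-product matrix X'X, with no division by n - 1. The site proportions are hypothetical. Passing center_species=False gives the noncentered variant.

```python
import numpy as np

def distance_biplot(X, center_species=True):
    """Species loadings with unit sum of squares; site scores with sum of squares = eigenvalue."""
    X = np.asarray(X, dtype=float)
    if center_species:
        X = X - X.mean(axis=0)             # species centring
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    site_scores = U * s                    # column sums of squares equal s**2
    species_loadings = Vt.T                # column sums of squares equal 1
    return X, site_scores, species_loadings, s**2

def pairwise_distances(A):
    return np.linalg.norm(A[:, None, :] - A[None, :, :], axis=-1)

# Hypothetical proportions of three species at four sites (rows sum to 1)
props = np.array([
    [0.70, 0.20, 0.10],
    [0.50, 0.30, 0.20],
    [0.20, 0.50, 0.30],
    [0.10, 0.30, 0.60],
])
Xc, sites, species, eigvals = distance_biplot(props)
```

Because the loadings are orthonormal, Euclidean distances between site points equal distances between the (centred) site profiles. That property gives the distance biplot its name.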


Journal Article
TL;DR: In this article, a cross-Gramian matrix which contains information about both controllability and observability is defined for single-input, single-output, linear systems.
Abstract: A new matrix W_co, which can be considered a cross-Gramian matrix containing information about both controllability and observability, is defined for single-input, single-output, linear systems. Using this matrix, the structural properties of linear systems are studied in the context of principal component analysis. The matrix W_co can be used to obtain balanced and other principal representations without computing the controllability and observability Gramians. The importance of this matrix in model-order reduction is highlighted.

146 citations
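A small numerical sketch, assuming the usual SISO definitions: A W_c + W_c A' + b b' = 0, A' W_o + W_o A + c'c = 0, and A W_co + W_co A + b c = 0. For such systems W_co^2 = W_c W_o is expected to hold. The absolute eigenvalues of W_co then give the Hankel singular values without forming either Gramian. The Kronecker-product solver is a hypothetical helper that is practical only for small systems, and the example system is made up.

```python
import numpy as np

def solve_sylvester_kron(A, B, Q):
    """Solve A X + X B = Q using vec(A X + X B) = (I kron A + B' kron I) vec(X)."""
    n, m = Q.shape
    K = np.kron(np.eye(m), A) + np.kron(B.T, np.eye(n))
    x = np.linalg.solve(K, Q.flatten(order="F"))   # column-major vec
    return x.reshape((n, m), order="F")

def gramians(A, b, c):
    Wc = solve_sylvester_kron(A, A.T, -b @ b.T)    # controllability Gramian
    Wo = solve_sylvester_kron(A.T, A, -c.T @ c)    # observability Gramian
    Wco = solve_sylvester_kron(A, A, -b @ c)       # cross-Gramian
    return Wc, Wo, Wco

# Hypothetical stable, minimal SISO system
A = np.diag([-1.0, -2.0, -3.0])
b = np.array([[1.0], [1.0], [1.0]])
c = np.array([[1.0, 2.0, 3.0]])
Wc, Wo, Wco = gramians(A, b, c)
hankel_sv = np.sort(np.abs(np.linalg.eigvals(Wco)))[::-1]
```

For production use, a dedicated Sylvester or Lyapunov solver would replace the O(n^6) Kronecker solve.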


Journal Article
01 Aug 1983-Ecology
TL;DR: A multivariate comparison of diet and morphology in nine insectivorous bat species led to a model of community organization for closely related species in which a relatively large number of specialists with invariant attributes are clustered near the community centroid and a smaller number of distinctive, variable species occupy niches more distant from the centroid.
Abstract: Diet and external morphology of nine species of insectivorous bats from Zambia, East Africa, were compared using multivariate methods. Morphological and dietary resemblance between species were positively correlated; that is, taxa which resembled each other most strongly morphologically were also most similar in dietary intake. The degree of morphological and dietary distinctiveness of a species was positively correlated with its morphological and dietary variability. For example, species which are quite distinct from others in morphology or diet tend also to be quite variable in those two attributes. Morphology of the bats was strongly predictive of their diets; most dietary variance was accounted for by morphological variance, and the first morphological principal component predicted the presence in the diet of Lepidoptera, beetles, and Orthoptera with a high level of significance. These results led to a model of community organization for closely related species in which a relatively large number of specialists with invariant attributes are clustered near the community centroid and a smaller number of distinctive, variable species occupy niches more distant from the centroid.

139 citations



Journal Article
TL;DR: A method based on principal component analysis and a constrained nonlinear optimization technique is described for estimating pure-component spectra from mixture spectra; it is applicable to qualitative analysis of mixtures of more than three components, and the noise problem is also discussed.
Abstract: A method is described for estimating the spectra of pure components from the spectra of unknown mixtures with various relative concentrations. This method is based on principal component analysis and a constrained nonlinear optimization technique and is applicable to qualitative analysis of mixtures of more than three components. The method gives two curves as the estimate of a component spectrum: one consists of the set of the maxima and the other consists of the set of the minima for all sampling points subject to a priori information. Experimental results of the estimation of the infrared absorption spectra of xylene–isomer mixtures are shown; the noise problem with this method is also discussed.

83 citations


Journal Article
TL;DR: In this paper, it is shown that the factors from RQ-mode factor analysis are expressed in measures determined by the form of the scalings applied to the original data matrix.
Abstract: It is mathematically possible to extract both R-mode and Q-mode factors simultaneously (RQ-mode factor analysis) by invoking the Eckart-Young theorem. The resulting factors will be expressed in measures determined by the form of the scalings that have been applied to the original data matrix. Unless the measures for both solutions are meaningful for the problem at hand, the factor results may be misleading or uninterpretable. Correspondence analysis uses a symmetrical scaling of both rows and columns to achieve measures of proportional similarity between objects and variables. In the literature, the resulting similarity is a χ² distance appropriate for analysis of enumerated data, the original application of correspondence analysis. Justification for the use of this measure with interval or ratio data is unconvincing, but a minor modification of the scaling procedure yields the profile similarity, which is an appropriate measure. Symmetrical scaling of rows and columns is unnecessary for RQ-mode factor analysis. If the data are scaled so that the minor product W'W is the correlation matrix, the major product WW' is expressed in the Euclidean distances between objects. Therefore, RQ-mode factor analysis can be performed so that the R-mode is a principal components solution and the Q-mode is a principal coordinates solution. For applications where the magnitudes of differences are important, this approach will yield more interpretable results than will correspondence analysis.

67 citations
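A minimal sketch of the scaling described in the abstract. The columns are centred and divided by s_j·sqrt(n - 1) so that W'W is the correlation matrix. One SVD of W then yields both the R-mode principal-components loadings and the Q-mode principal coordinates. The data and function names are hypothetical.

```python
import numpy as np

def rq_mode(X):
    """Scale X so that W'W is the correlation matrix, then read both modes off one SVD."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    W = (X - X.mean(axis=0)) / (X.std(axis=0, ddof=1) * np.sqrt(n - 1))
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    r_loadings = Vt.T * s      # R-mode: eigenvectors of W'W scaled by sqrt(eigenvalue)
    q_coords = U * s           # Q-mode: principal coordinates of the objects (from WW')
    return W, r_loadings, q_coords

def pairwise_distances(A):
    return np.linalg.norm(A[:, None, :] - A[None, :, :], axis=-1)

# Hypothetical data: 6 objects x 3 variables
X = np.array([
    [2.0, 4.1, 1.0],
    [3.5, 5.0, 0.8],
    [1.2, 3.3, 1.9],
    [4.8, 6.2, 0.5],
    [2.9, 4.0, 1.4],
    [3.1, 5.5, 1.1],
])
W, r_loadings, q_coords = rq_mode(X)
print(np.allclose(pairwise_distances(q_coords), pairwise_distances(W)))   # True
```

W'W and WW' share their non-zero eigenvalues. This is why a single decomposition serves both modes.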


Journal Article
TL;DR: In this paper, a simple derivation of the spectral decomposition of the covariance matrix for a general multi-way variance components model is presented, where balanced data are assumed to be available.
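To make "spectral decomposition of the covariance matrix" concrete, here is the simplest balanced case, the one-way random model y_ij = μ + a_i + e_ij. It is chosen as an illustration and is not the paper's general multi-way derivation. V = σ_a²(I ⊗ J) + σ_e² I splits into orthogonal projectors onto group means and within-group deviations. The eigenvalues are σ_e² + nσ_a², with multiplicity equal to the number of groups, and σ_e² for the rest. The variance values below are hypothetical.

```python
import numpy as np

def one_way_cov(n_groups, n_per_group, var_a, var_e):
    """Covariance of a balanced one-way random-effects layout."""
    J = np.ones((n_per_group, n_per_group))
    return var_a * np.kron(np.eye(n_groups), J) + var_e * np.eye(n_groups * n_per_group)

def one_way_spectral(n_groups, n_per_group, var_a, var_e):
    """V = (var_e + n * var_a) * P1 + var_e * P2 with orthogonal projectors P1, P2."""
    Jbar = np.ones((n_per_group, n_per_group)) / n_per_group
    P1 = np.kron(np.eye(n_groups), Jbar)                        # onto group means
    P2 = np.kron(np.eye(n_groups), np.eye(n_per_group) - Jbar)  # within-group deviations
    return (var_e + n_per_group * var_a) * P1 + var_e * P2, P1, P2

V = one_way_cov(3, 4, var_a=2.0, var_e=0.5)
V_spec, P1, P2 = one_way_spectral(3, 4, var_a=2.0, var_e=0.5)
```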

Journal Article
TL;DR: In this paper, four relatively homogeneous field data sets were analyzed, representing boreal, heath-like forest-floor and rock vegetation in Finland, corresponding to Finnish Calluna and Cladina site types.
Abstract: Four relatively homogeneous field data sets were analyzed, representing boreal, heath-like forest-floor and rock vegetation in Finland, corresponding to Finnish Calluna and Cladina site types. The methods used were principal component analysis (PCA) of covariance matrices, orthogonal correspondence analysis or reciprocal averaging (RA), detrended correspondence analysis (DCA), and linear and nonmetric multidimensional scaling (MDS). RA and DCA gave ordinations in which every species had nearly equal weight. MDS and PCA gave results determined mostly by a few dominant species. MDS and PCA ordinations were very similar to RA and DCA ones when the original data were standardized so that for each species the mean of positive occurrences was the same while quantitative differences within species were retained. RA and PCA were generally very good and reliable, providing that the impact of rare species and outlier relevés was removed in RA. DCA was slightly less reliable than RA. MDS was sensitive to uneven sampling patterns and was the least reliable method compared.

Journal Article
TL;DR: The authors use principal component analysis to produce cokriging results in a computationally efficient manner that extends straightforwardly to more than two variables, illustrated by estimating the coal-quality parameters ash, heating value and sulfur.
Abstract: A method to estimate several spatially related variables is presented. The method uses principal component analysis to produce cokriging results in a computationally efficient manner and enables a straightforward extension to more than two variables. An example is given that describes the estimation of the coal-quality parameters ash, heating value, and sulfur.

Journal Article
TL;DR: In this article, the basic underlying theory of factor analysis is presented and the results of an empirical study are reproduced and interpreted to illustrate application of the technique in the field of finance.
Abstract: The basic underlying theory of factor analysis is presented and the results of an empirical study are reproduced and interpreted to illustrate application of the technique.


Journal Article
R. C. Tabony
TL;DR: In this article, a simple technique devised for estimating climatological data in the U.K. was found to give results similar to those obtained from an eigenvector scheme used for quality control purposes, and it was suggested that satisfactory averages could be estimated for stations with only 10 years of data, and possibly less.
Abstract: Various methods of estimating monthly means and extremes of climatological data are examined. Any generalized method is likely to be based on a correlation matrix, but the incompleteness of the data introduces problems with this approach. These are illustrated by program BMDPAM of the BMDP suite, which produces estimates inferior to those using traditional methods based on single-station comparisons. Principal component analysis is considered likely to be the best statistical technique for estimating missing values among highly correlated data. The high-quality correlation matrix required as input can be obtained by using a simple estimating procedure to produce a preliminary set of complete data. A simple technique devised for estimating climatological data in the U.K. was found to give results similar to those obtained from an eigenvector scheme used for quality control purposes. The accuracy of the technique is such that it is suggested that satisfactory averages could be estimated for stations with only 10 years of data, and possibly less.
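The abstract describes a preliminary fill followed by a PCA-based estimate. The sketch below uses one common way to do that, an iterative low-rank PCA reconstruction. It is a swapped-in illustration, not Tabony's technique or the eigenvector scheme mentioned. The station data are hypothetical and exactly rank one, so the missing value should be recovered.

```python
import numpy as np

def pca_impute(data, n_components=1, n_iter=200, tol=1e-10):
    """Fill NaNs: preliminary fill with column means, then refine with a rank-k PCA reconstruction."""
    X = np.array(data, dtype=float)               # copy, so the input is left untouched
    missing = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X[missing] = col_means[np.where(missing)[1]]  # preliminary complete data set
    for _ in range(n_iter):
        mu = X.mean(axis=0)
        U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
        recon = mu + (U[:, :n_components] * s[:n_components]) @ Vt[:n_components]
        change = np.max(np.abs(recon[missing] - X[missing]))
        X[missing] = recon[missing]               # only the missing cells are re-estimated
        if change < tol:
            break
    return X

months = np.arange(12) - 5.5                          # hypothetical common anomaly signal
loadings = np.array([1.0, 0.8, 1.2, 0.9, 1.1])        # hypothetical station sensitivities
station_means = np.array([10.0, 12.0, 9.0, 11.0, 10.0])
complete = station_means + np.outer(months, loadings)
gappy = complete.copy()
gappy[3, 2] = np.nan                                  # one missing monthly value
filled = pca_impute(gappy, n_components=1)
```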

Journal Article
Franco Molteni, Paolo Bonelli, P. Bacci
TL;DR: In this article, principal component analysis is used to study the rainfall distribution over northern Italy in the cold season (October-April); a spatial analysis is applied to the square roots of daily data recorded in 14 months and of their 3- and 5-day means, using cross-product matrices obtained from both standardized and nonstandardized values.
Abstract: Principal component analysis is used to study the rainfall distribution over northern Italy in the cold season (October-April). A spatial analysis is applied to the square roots of daily data, recorded in 14 months, and of their 3- and 5-day means, working on cross-product matrices obtained from both standardized and nonstandardized values. Four principal components (PCs) can be selected: the first is an index of the mean rainfall; the second represents the longitudinal differences; the third and fourth are representative of orographic anomalies. For daily data, these 4 PCs account for more than 80% of the total variance (since cross-product matrices are used in the analysis, the term variance must be interpreted as mean square value); this percentage is slightly higher working on nonstandardized values, but with standardized values the explained variance is distributed more uniformly among the 35 rainfall stations. Passing to 3- and 5-day means, the cumulative variance of the first 4 PCs increas...

Journal Article
TL;DR: Logarithmic bivariate regression slopes and logarithmic principal component coefficient ratios are two methods for estimating allometry coefficients corresponding to a in the classic power formula Y = BX^a.
Abstract: Logarithmic bivariate regression slopes and logarithmic principal component coefficient ratios are two methods for estimating allometry coefficients corresponding to a in the classic power formula Y = BX^a. Both techniques depend on high correlation between variables. Interpretation is logically limited to the variables included in analysis. Principal components analysis depends also on relatively uniform intercorrelations; given this, it serves satisfactorily as a method for summarizing many bivariate combinations. Unmodified major principal component coefficients cannot represent scaling to body weight; rather, they represent scaling to a composite size vector which usually is highly correlated with body size or weight but has an unspecified allometry. Thus, the concepts of proportionality and of isometry must be kept distinct.
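A bivariate sketch of the two estimators named in the abstract. Taking logs turns Y = BX^a into a straight line with slope a. The OLS slope of log Y on log X estimates a. So does the ratio of the log Y and log X coefficients of the major principal component of the log covariance matrix. The data are hypothetical, generated with a = 0.75 and small multiplicative noise. With high correlation, as the abstract requires, the two estimates agree closely.

```python
import numpy as np

def allometry_estimates(x, y):
    """Estimate a in Y = B * X**a two ways from log-transformed data."""
    logs = np.column_stack([np.log(x), np.log(y)])
    cov = np.cov(logs, rowvar=False)
    ols_slope = cov[0, 1] / cov[0, 0]              # regression of log Y on log X
    eigvals, eigvecs = np.linalg.eigh(cov)
    major = eigvecs[:, np.argmax(eigvals)]         # first (major) principal component
    pc_ratio = major[1] / major[0]                 # an overall sign flip cancels in the ratio
    return ols_slope, pc_ratio

rng = np.random.default_rng(2)
x = np.linspace(1.0, 100.0, 200)                              # hypothetical body measurements
y = 3.0 * x**0.75 * np.exp(rng.normal(0.0, 0.01, x.size))     # hypothetical allometry, a = 0.75
ols_slope, pc_ratio = allometry_estimates(x, y)
```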

Journal Article
TL;DR: In this paper, the principal relations and the principal factors models for a set of unobserved (latent) variables are introduced and shown to be complementary. For particular choices of the error covariance matrix, the general model reduces to several well-known multivariate techniques, including OLS, principal components, orthogonal regression and canonical correlations.

Journal Article
TL;DR: In this paper, the asymptotic distribution of the principal component roots under local alternatives to multiple population roots is derived under assumptions satisfied by affine-invariant M-estimates of scatter for an elliptical distribution, and the local alternative framework is used to derive a local power function for the test for subsphericity.
Abstract: The asymptotic distribution for the principal component roots under local alternatives to multiple population roots is derived. The asymptotic theory assumes the estimate of the population covariance or scatter matrix to be asymptotically normal and to possess certain invariance properties. These assumptions are satisfied for the affine-invariant $M$-estimates of scatter for an elliptical distribution. The local alternative framework is used in deriving a local power function for the test for subsphericity.

Journal Article
TL;DR: The retention factors of 54 drugs in eight eluent mixtures are reported; principal component analysis (PCA) of these data provided a significant two-component model that allowed objective identification of unknown samples, provided they were included in the considered set.
Abstract: The retention factors of 54 drugs in eight eluent mixtures are reported. Principal component analysis (PCA) of these data provided a significant two-component model. These two parameters, characteristic of each drug, allowed an objective identification of unknown samples, provided they were included in the considered set. The analysis showed that the eluent mixtures cluster into three groups. The PCA model, using only three eluents (one for each group), was also able to restrict the range of inquiry to a few "candidates" and, in some cases, to allow unambiguous identification of the drug. These results, based on a simple and quick analytical determination (thin layer chromatography) and a reliable statistical procedure (PCA), appear to be of significant practical importance in the field of analytical toxicology.
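A minimal sketch of identification in a two-component score space. A PCA model is fitted to a drugs × eluents retention matrix. An unknown is projected onto the two components, and the reference drugs are ranked by distance as "candidates". The nearest-neighbour rule, the function names and the 6 × 8 data set are hypothetical; the abstract does not specify the matching rule.

```python
import numpy as np

def fit_pca_model(retention, n_components=2):
    """Two-component PCA model of a drugs x eluents retention matrix."""
    mean = retention.mean(axis=0)
    _, _, Vt = np.linalg.svd(retention - mean, full_matrices=False)
    loadings = Vt[:n_components].T                 # eluents x components
    scores = (retention - mean) @ loadings         # drugs x components
    return mean, loadings, scores

def rank_candidates(sample, mean, loadings, scores, k=3):
    """Reference drugs ordered by distance to the sample in component-score space."""
    s = (sample - mean) @ loadings
    distances = np.linalg.norm(scores - s, axis=1)
    return np.argsort(distances)[:k]

rng = np.random.default_rng(3)
latent = np.array([[-2, 1], [-1, -1], [0, 2], [1, -2], [2, 0], [3, 0.5]], dtype=float)
profiles = rng.normal(size=(2, 8))
retention = 5.0 + latent @ profiles                  # hypothetical 6 drugs x 8 eluents
model = fit_pca_model(retention)
unknown = retention[3] + 0.001 * rng.normal(size=8)  # hypothetical unknown: noisy copy of drug 3
candidates = rank_candidates(unknown, *model)
```

As the abstract notes, this only works for drugs already in the reference set. An unknown outside it would still be matched to some nearest candidate.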

Journal Article
TL;DR: A method for quantitative evaluation of diadochokinesia test (alternate pronation and supination of forearms) has been developed and a summarized measure for regularity agreed very well with visual judgment.

Journal Article
TL;DR: In this article, the sampling behavior of the statistic W in the cross-validatory scheme for selecting the number of components in a principal component analysis, as proposed by Eastment and Krzanowski [1982, Technometrics, 24, 73–77], is investigated.
Abstract: The sampling behaviour of the statistic W in the cross-validatory scheme for selection of the number of components in a principal component analysis, as proposed by Eastment and Krzanowski [1982, Technometrics, 24, 73–77], is investigated. Two possible levels of variability are isolated, conditional and unconditional. The former is shown to be appropriate when assessing results from analysis of real data, and a previously given example is assessed in this way. As a by-product of this analysis, reasons for earlier erratic behaviour of W are suggested and the previous tentative proposals about the cut-off value of W are confirmed. The two possible sources of variability in W lead to a hierarchical design of the main Monte Carlo experiment. Results on sampling behaviour of W are given, and compared with traditional eigenvalue-based methods of choosing components. A possible interpretation of W is put forward.

Journal Article
TL;DR: In this article, a class of invariant asymptotic tests based on the sample covariance matrix is derived for the hypothesis that a set of vectors lies in the subspace spanned by a prescribed subset of the principal component vectors; the tests are consistent, their local power functions are given, and the results are generalized to procedures based on any affine-invariant $M$-estimate of scatter when the population is elliptical.
Abstract: In this paper, the hypothesis that a set of vectors lie in the subspace spanned by a prescribed subset of the principal component vectors for a normal population is considered. A class of invariant asymptotic tests based on the sample covariance matrix is derived. Tests in this class are shown to be consistent and their local power functions are given. The arguments used in deriving the class of tests are not heavily dependent on the assumption of normality nor on the use of the sample covariance matrix. The results are shown to generalize when the procedures are based on any affine-invariant $M$-estimate of scatter and when the population is elliptical.

Journal Article
TL;DR: Basic physicochemical data and principal component analyses of both morphometric and chemical variables for 37 Ontario lakes are presented and multivariately derived composite variables were substi...
Abstract: We present basic physicochemical data and principal component analyses (PCA) of both morphometric and chemical variables for 37 Ontario lakes. Multivariately derived composite variables were substituted for more conventional independent variables in several regression models. These composite variables always explained a greater percentage of variance than the standard morphometric variables. A typology derived in part from the PCA was useful in identifying groups of lakes for which the Ragotzkie model predicting maximum depth of the summer thermocline was appropriate and other groups of lakes that appear to be outliers on the basis of insufficient volume. In addition, variables important in defining the typology, particularly ratio and lake volume, were shown to be among the most important in a stepwise, multiple linear regression explaining 91% of the variance in nannoplankton to filter-feeding zooplankton ratios in the lakes. This represents a 47% improvement over previously reported results. Total phos...

Book Chapter
TL;DR: In recent years, the application of principal component analysis (PCA) to brain event-related potential (ERP) data has come into vogue, and there seem to be several benefits to be obtained from it.
Abstract: Publisher Summary: In recent years, the application of principal component analysis (PCA) to brain event-related potential (ERP) data has come into vogue. There seem to be several benefits to be obtained by the application of PCA and related techniques to ERP data. The first is to reduce, objectively, the usually vast quantity of data to a more reasonable size by reducing the dimensionality in such a way that the salient features of the data are retained. A typical ERP study can generate several thousand waveforms, each composed of 64 to 1024 points. PCA attempts to describe the underlying structure of this sort of large data set in terms of relatively few “basic waveforms”. These “basic waveforms” are also variously called the “principal components”, the “factor loadings”, or the “factors”. The “basic waveforms” are computationally determined from the cross-products, covariance, or correlation matrix of the data points in such a way that the first “basic waveform” accounts for the most variance and each subsequent waveform accounts for the largest amount of the remaining variance in the data. Therefore the “basic waveforms” are orthogonal and represent independent dimensions of the data.
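A minimal sketch of extracting "basic waveforms" from the covariance matrix of the time points. The eigenvectors are sorted so that the first accounts for the most variance. The example ERPs are hypothetical: two Gaussian bumps with random amplitudes plus noise. The checks confirm that the basic waveforms are orthonormal and that their scores are mutually uncorrelated.

```python
import numpy as np

def basic_waveforms(erps, n_components=3):
    """PCA of ERP waveforms (rows) on the covariance matrix of the time points."""
    centered = erps - erps.mean(axis=0)
    cov = centered.T @ centered / (len(erps) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]                  # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    waveforms = eigvecs[:, :n_components].T            # each row is one basic waveform
    scores = centered @ eigvecs[:, :n_components]      # weight of each waveform in each ERP
    return waveforms, scores, eigvals

rng = np.random.default_rng(4)
t = np.linspace(0.0, 1.0, 64)                          # 64 sample points per waveform
bump1 = np.exp(-((t - 0.3) / 0.05) ** 2)               # hypothetical early component
bump2 = np.exp(-((t - 0.6) / 0.10) ** 2)               # hypothetical late component
amps = rng.normal(1.0, 0.3, size=(200, 2))
erps = amps[:, :1] * bump1 + amps[:, 1:] * bump2 + 0.05 * rng.normal(size=(200, 64))
waveforms, scores, eigvals = basic_waveforms(erps)
explained = eigvals[:3].sum() / eigvals.sum()
```

Using the correlation or raw cross-product matrix instead, as the abstract allows, changes only the line that builds cov.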

Journal Article
TL;DR: In this paper, principal component and factor analyses are applied to the data set of fifteen elements in air-particulate matter collected in six particle-size fractions on a Bolivian mountain.

Proceedings Article
26 Oct 1983
TL;DR: In this paper, a new method for the inspection of textile webs is proposed: the normal web structure is characterized by the micro-texture of 3x3 neighbourhoods, which is extracted by principal components, and local 'rectification' of the principal component images yields feature planes that can be fed to a classifier.
Abstract: A new method for the inspection of textile webs is proposed. The normal web structure is characterized by the micro-texture of 3x3 neighbourhoods, which is extracted by principal components. Local 'rectification' of principal component images yields feature planes which can be fed to a classifier. Preliminary results are presented. © (1983) COPYRIGHT SPIE--The International Society for Optical Engineering.
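A minimal sketch of the pipeline, under stated assumptions:

1. Principal components of all 3x3 neighbourhoods of a defect-free reference image act as 3x3 "eigen-filters".
2. A test image is projected onto the first few components.
3. "Rectification" is assumed to mean taking absolute values and then averaging over a local window. The 5x5 window is hypothetical.

The checkerboard "web", the defect and the function names are also hypothetical. The classifier stage is omitted. The code requires NumPy 1.20 or later for sliding_window_view.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def neighbourhood_pca(reference):
    """Principal components of all 3x3 neighbourhoods in a defect-free reference image."""
    patches = sliding_window_view(reference, (3, 3)).reshape(-1, 9)
    mean = patches.mean(axis=0)
    _, _, Vt = np.linalg.svd(patches - mean, full_matrices=False)
    return mean, Vt                                      # rows of Vt are 3x3 eigen-filters

def feature_planes(image, mean, Vt, n_components=3, window=5):
    """Project neighbourhoods onto the components, rectify (abs), then average locally."""
    patches = sliding_window_view(image, (3, 3))
    h, w = patches.shape[:2]
    pc_images = (patches.reshape(-1, 9) - mean) @ Vt[:n_components].T
    pc_images = np.abs(pc_images).reshape(h, w, n_components)   # assumed 'rectification'
    local = sliding_window_view(pc_images, (window, window), axis=(0, 1))
    return local.mean(axis=(-2, -1))                     # shape (h - window + 1, w - window + 1, k)

reference = np.tile(np.array([[0.0, 1.0], [1.0, 0.0]]), (16, 16))   # hypothetical 32x32 web
image = reference.copy()
image[10:14, 10:14] = 0.5                                           # hypothetical defect
mean, Vt = neighbourhood_pca(reference)
features = feature_planes(image, mean, Vt)
```

Near the defect, the first feature plane drops below its value on the normal texture. A downstream classifier could use that contrast.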

Journal Article
TL;DR: In this paper, a test family consisting of n-alkanes (methane to pentacontane), biphenyl, and all polychlorinated biphenyl congeners were developed with new variables obtained from a principal component analysis of the molecular connectivity indices.

Journal Article
TL;DR: A method is described for obtaining a linear discriminant function to identify monogenic segregation in multivariate pedigree data; it finds the linear function of the variables that maximizes the likelihood of a set of pedigree data under the hypothesis of single-gene segregation.
Abstract: We describe a method for obtaining a linear discriminant function to identify monogenic segregation in multivariate pedigree data. It differs from Fisher's linear discriminant function in that it does not assume that the genotype of each individual in the pedigree is already known. The method consists of finding that linear function of the variables that maximizes the likelihood of a set of pedigree data, under the hypothesis of single gene segregation, subject to the constraint that the total sample variance of the function remains constant. To simplify the computation the variables are first transformed to their standardized principal components. Reanalysis of a set of pedigree data suggests that age and powers of age should be considered as extra variables from which the principal components are obtained, and virtually all of the variance should be accounted for by the principal components used to obtain the discriminant function.
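A sketch of why the standardized-principal-components step simplifies the constraint. After the transformation, the variables are uncorrelated with unit variance. Any linear function Z·a then has sample variance a·a, so "constant total sample variance" becomes a unit-norm constraint on a. The segregation likelihood itself is not sketched. The traits, mixing matrix and coefficient vector are hypothetical.

```python
import numpy as np

def standardized_pcs(X):
    """Transform variables to standardized principal components (uncorrelated, unit variance)."""
    centered = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    return centered @ eigvecs / np.sqrt(eigvals)     # divide each PC by its standard deviation

rng = np.random.default_rng(5)
mixing = np.array([[1.0, 0.5, 0.0], [0.0, 1.0, 0.3], [0.2, 0.0, 1.0]])
X = rng.normal(size=(100, 3)) @ mixing               # hypothetical correlated traits
Z = standardized_pcs(X)
a = np.array([0.6, -0.8, 0.0])                       # any unit-norm coefficient vector
f = Z @ a                                            # candidate discriminant scores
```

Maximizing a likelihood over a on the unit sphere is then an unconstrained problem in angular coordinates, or a normalized search. That is one plausible reason for the transformation.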