Showing papers on "Principal component analysis" published in 1978


Journal Article
Svante Wold
TL;DR: This article considers the estimation of the rank A of the matrix Y, i.e., the estimation of how much of the data y_ik is signal and how much is noise, and uses cross validation to approach the problem.
Abstract: By means of factor analysis (FA) or principal components analysis (PCA), a matrix Y with the elements y_ik is approximated by model (I). Here the parameters α, β and θ express the systematic part of the data y_ik, “signal,” and the residuals ε_ik express the “random” part, “noise.” When applying FA or PCA to a matrix of real data obtained, for example, by characterizing N chemical mixtures by M measured variables, one major problem is the estimation of the rank A of the matrix Y, i.e. the estimation of how much of the data y_ik is “signal” and how much is “noise.” Cross validation can be used to approach this problem. The matrix Y is partitioned and the rank A is determined so as to maximize the predictive properties of model (I) when the parameters are estimated on one part of the matrix Y and the prediction tested on another part of the matrix Y.

2,468 citations
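
The cross-validation idea in the abstract above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the published procedure: the function names `pca_impute` and `press_by_rank`, the diagonal deletion pattern, the iterative SVD imputation used to fit the model with entries held out, and the synthetic data are all choices made for the example.

```python
import numpy as np

def pca_impute(Y, observed, rank, n_iter=100):
    """Fit a rank-`rank` PCA model using only entries where `observed` is True.

    Held-out entries are filled in iteratively from the current model
    (an assumed fitting scheme, chosen for brevity).
    """
    col_means = np.where(observed, Y, 0.0).sum(axis=0) / observed.sum(axis=0)
    filled = np.where(observed, Y, col_means)
    for _ in range(n_iter):
        mu = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mu, full_matrices=False)
        recon = mu + (U[:, :rank] * s[:rank]) @ Vt[:rank]
        filled = np.where(observed, Y, recon)
    return recon

def press_by_rank(Y, max_rank, n_groups=7):
    """Prediction error sum of squares (PRESS) for ranks 0..max_rank.

    n_groups must not divide the number of columns, otherwise a whole
    column would be deleted at once.
    """
    n, m = Y.shape
    group = np.arange(n * m).reshape(n, m) % n_groups  # diagonal deletion pattern
    press = np.zeros(max_rank + 1)
    for rank in range(max_rank + 1):
        for g in range(n_groups):
            held_out = group == g
            recon = pca_impute(Y, ~held_out, rank)
            press[rank] += np.sum((Y - recon)[held_out] ** 2)
    return press

# Hypothetical data: a rank-2 "signal" plus a small amount of "noise".
rng = np.random.default_rng(0)
signal = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 8))
Y = signal + 0.1 * rng.normal(size=(30, 8))
press = press_by_rank(Y, max_rank=4)
```

In this sketch the rank with the smallest PRESS, `int(np.argmin(press))`, would be taken as the estimate of A.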


Journal Article
TL;DR: In this paper, a method for principal components analysis at a variety of scale levels (nominal, ordinal, or interval) is presented, where the variables may be either continuous or discrete.
Abstract: A method is discussed which extends principal components analysis to the situation where the variables may be measured at a variety of scale levels (nominal, ordinal or interval), and where they may be either continuous or discrete. There are no restrictions on the mix of measurement characteristics and there may be any pattern of missing observations. The method scales the observations on each variable within the restrictions imposed by the variable's measurement characteristics, so that the deviation from the principal components model for a specified number of components is minimized in the least squares sense. An alternating least squares algorithm is discussed. An illustrative example is given.

188 citations


Journal Article
TL;DR: A new method, called step-across, is described which removes the horseshoe effect of principal components analysis of incidence data and can be used on both the rows and columns of an incidence matrix.
Abstract: SUMMARY (1) Principal components analysis of incidence (presence and absence) data produces a horseshoe effect. A new method, called step-across, is described which removes this effect. (2) From a matrix of joint occurrences, distances are found directly for all positive values. Distances for zero joint occurrences are found from the shortest network distance. The complete distance matrix is then analysed by Gower's principal coordinates technique. (3) The method can be used on both the rows and columns of an incidence matrix. It is illustrated with artificial data and with data on euphausiids from the Indian Ocean.

89 citations
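
The three steps of the step-across summary above (direct distances for positive joint occurrences, shortest network distances for zero joint occurrences, then principal coordinates) can be sketched as follows. The abstract does not say how a positive joint occurrence is converted to a distance, so the Ochiai-type dissimilarity used here is an assumption, as are the function names and the artificial gradient data.

```python
import numpy as np

def step_across(X):
    """Distance matrix between the rows of a 0/1 incidence matrix X.

    Assumption: pairs with positive joint occurrence get an Ochiai-type
    dissimilarity; the source does not say which measure was used.
    """
    X = np.asarray(X, dtype=float)
    joint = X @ X.T                       # joint occurrences between rows
    totals = np.diag(joint)
    with np.errstate(divide="ignore", invalid="ignore"):
        direct = 1.0 - joint / np.sqrt(np.outer(totals, totals))
    D = np.where(joint > 0, direct, np.inf)
    np.fill_diagonal(D, 0.0)
    # Floyd-Warshall: shortest network distance between every pair of rows.
    sp = D.copy()
    for k in range(len(sp)):
        sp = np.minimum(sp, sp[:, [k]] + sp[[k], :])
    # Keep direct distances where they exist; step across only the zeros.
    return np.where(joint > 0, D, sp)

def principal_coordinates(D, n_axes=2):
    """Gower's principal coordinates (classical scaling) of a distance matrix."""
    n = len(D)
    centring = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * centring @ (D ** 2) @ centring
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_axes]
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))

# Artificial gradient: species j is present at sites j, j + 1 and j + 2.
sites, species = 10, 8
X = np.array([[1 if j <= i <= j + 2 else 0 for j in range(species)]
              for i in range(sites)])
D = step_across(X)
coords = principal_coordinates(D)
```

Sites at opposite ends of this gradient share no species, so their distance comes entirely from the network step; the first coordinate axis then places them at opposite extremes.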


Journal Article
TL;DR: In this paper, principal components analysis (PCA) was generalized to the treatment of vector fields of data and applied to a 12-month record of mean hourly wind velocities from 10 measurement locations in a mesoscale region.
Abstract: The method of principal components analysis (also known as empirical eigenvector analysis) was generalized to the treatment of vector fields of data and applied to a 12-month record of mean hourly wind velocities from 10 measurement locations in a mesoscale region. The primary spatial distributions of regional wind velocities were derived for each month. Time-series analysis in terms of the primary spatial velocity patterns was used to determine the fundamental temporal patterns or principal components. Necessary mathematical procedures are given and geometric representations of eigenvectors that define the primary spatial velocity patterns are presented. Applications of the generalized vector formulation of the method to current and future problems of atmospheric science are discussed.

74 citations


Journal Article
TL;DR: A general comparison of multiple linear regression, discriminant analysis, canonical correlation, cluster analysis, and principal components analysis is included with a discussion of when these tests may be appropriate for vocalization studies.

61 citations


Journal Article
01 Dec 1978
TL;DR: Three precipitation regions in California are identified, mapped and otherwise described. Four independent sources of precipitation variation, found by a P-mode principal components analysis of a covariance matrix, were evaluated together with a fifth dimension of station means by Ward's [14] algorithm to arrive at the regionalization.
Abstract: Three precipitation regions in California are identified, mapped and otherwise described. Four independent sources of precipitation variation were discovered by a P-mode principal components analysis of a covariance matrix and they, plus a fifth dimension of station means, were evaluated by Ward's [14] algorithm in order to arrive at the regionalization. A discussion of the present application of this methodology is also included as it is thought to be a more appropriate procedure than those described by Willmott [16] and others. Data for the study were 120 monthly precipitation totals observed from 1961–1970 at each of 90 randomly chosen stations within California.

58 citations


Journal Article
TL;DR: In this article, the incorporation of measures of soil variability at the reconnaissance stage of soil survey is considered as a possible alternative to, or additional feature of, present procedures, and a straightforward hierarchical sampling design based upon the major parent materials of the upland section of the study area and associated analysis of variance model are described.
Abstract: Summary The incorporation of measures of soil variability at the reconnaissance stage of soil survey is considered as a possible alternative to, or additional feature of, present procedures. A straightforward hierarchical sampling design based upon the major parent materials of the upland section of the study area and associated analysis of variance model are described. This allowed the estimation of scale components for five levels of the sampling design at separation distances from over two kilometers down to five meters. Principal components analysis is used to economise on the number of variables for further analysis. Analysis of the pattern of principal component scores shows the first six components to have distinctive variability patterns both within and between strata. It is concluded that present soil survey procedures take too little account of soil variability patterns, and that inclusions of such information at the reconnaissance stage would greatly improve the detailed survey.

53 citations


Journal Article
TL;DR: In this paper, it was shown that the trace of the covariance matrix for estimators obtained by deleting principal components associated with the smallest eigenvalues is at least as small as that for any other least-squares estimator with an equal or smaller number of linear restrictions.
Abstract: A new optimal property for principal components regression is presented. In particular, it is shown that the trace of the covariance matrix for estimators obtained by deleting principal components associated with the smallest eigenvalues is at least as small as that for any other least-squares estimator with an equal or smaller number of linear restrictions. This property is useful in suggesting data transformations and determining the maximum variance reduction obtainable from the introduction of linear restrictions on the parameter space.

28 citations
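
The trace property in the abstract above can be illustrated numerically. Under the usual linear model with error variance σ², the covariance of an estimator restricted to a set of components has trace σ² Σ 1/λ_j over the retained eigenvalues λ_j of X'X, so deleting the smallest eigenvalues removes the largest terms. The sketch below assumes centred predictors; the function names and the nearly collinear synthetic data are hypothetical.

```python
import numpy as np

def pcr_coefficients(X, y, n_keep):
    """Principal components regression keeping the n_keep components of
    X'X with the largest eigenvalues (X is assumed centred)."""
    eigvals, eigvecs = np.linalg.eigh(X.T @ X)
    order = np.argsort(eigvals)[::-1][:n_keep]
    V, lam = eigvecs[:, order], eigvals[order]
    return V @ ((V.T @ (X.T @ y)) / lam)

def covariance_trace(X, keep, sigma2=1.0):
    """Trace of the estimator covariance, sigma2 * sum(1 / eigenvalue), over
    the retained components in `keep` (indices into the eigenvalues of X'X
    sorted in descending order)."""
    eigvals = np.sort(np.linalg.eigvalsh(X.T @ X))[::-1]
    return sigma2 * np.sum(1.0 / eigvals[list(keep)])

# Hypothetical data: the third predictor is nearly the sum of the first two.
rng = np.random.default_rng(2)
x1, x2 = rng.normal(size=200), rng.normal(size=200)
X = np.column_stack([x1, x2, x1 + x2 + 0.05 * rng.normal(size=200)])
X = X - X.mean(axis=0)
y = X @ np.array([1.0, 2.0, 0.5]) + rng.normal(size=200)

beta_full = pcr_coefficients(X, y, n_keep=3)    # no component deleted
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
trace_full = covariance_trace(X, keep=[0, 1, 2])
trace_drop_smallest = covariance_trace(X, keep=[0, 1])
trace_drop_largest = covariance_trace(X, keep=[1, 2])
```

With all components retained the estimator coincides with ordinary least squares; dropping the smallest-eigenvalue component gives a smaller trace than dropping any other single component.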


Journal Article
TL;DR: The principal components transformation provides an elegant escape from closure correlation if a petrographic problem can be restated entirely in terms of component scores, but not if a physical interpretation of the component vectors is required.
Abstract: The principal components transformation generates, from any data array, a new set of variables, the scores of the components, characterized by a total variance exactly equal to that of the initial set. It is in this sense that the transformed variables are said to “contain,” “preserve,” or “account for,” the variance of the original set. The scores, however, are uncorrelated. In the course of the transformation, what becomes of the strong interdependence of variance and covariance so characteristic of closed arrays? The question seems to have attracted little attention; we are aware of no study of it in the earth sciences. Experimental work reported here shows quite clearly that the overall equivalence of variance and covariance imposed by closure, though absent from the component scores, may emerge in relations between the coefficients of each of the lower-order components; if the raw data are “complete” rock analyses, the sum of all the covariances of the coefficients of such a component is negative, and is very nearly equal to the sum of all the variances in absolute value. (In all cases so far examined, the absolute value of the first sum is a little less than that of the second.) The principal components transformation provides an elegant escape from closure correlation if a petrographic problem can be restated entirely in terms of component scores, but not if a physical interpretation of the component vectors is required.

28 citations
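
The general properties stated at the start of the abstract above (total variance preserved, scores uncorrelated) and the closure constraint itself can be checked on a small synthetic closed array. This sketch does not reproduce the paper's result on the component coefficients; the Dirichlet-generated "analyses" and the variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical closed array: 200 analyses of 5 constituents, each row summing to 100.
X = 100.0 * rng.dirichlet(alpha=[8, 4, 2, 1, 1], size=200)

cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # descending order

# Component scores: centred data projected onto the component vectors.
scores = (X - X.mean(axis=0)) @ eigvecs
score_cov = np.cov(scores, rowvar=False)

total_variance_raw = np.trace(cov)
total_variance_scores = np.trace(score_cov)
off_diagonal = score_cov - np.diag(np.diag(score_cov))

# Closure: each variance is balanced by that variable's covariances with the
# others, so every row of the covariance matrix sums to zero.
closure_row_sums = cov.sum(axis=1)
```

Here `total_variance_raw` and `total_variance_scores` agree, `off_diagonal` vanishes to rounding error, and `closure_row_sums` is zero for the raw data, which is the interdependence of variance and covariance that closure imposes.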


Journal Article
TL;DR: In this paper, a multivariate generalization of the bivariate allometric equation is given by the principal components or factors extracted from a covariance matrix of the logarithms of the original measurements.
Abstract: For many years, the bivariate allometric equation has yielded insights about size and shape. Multivariate generalizations of the bivariate allometric equation are given by the principal components or factors extracted from a covariance matrix of the logarithms of the original measurements. The coefficients in the principal components or the factor loadings show the nature of changes in the variables, which may be isometric or allometric. A single suite of multivariate data may contain one or more sources of allometry. The relationships between the variables are shown by the principal components or factor loadings whereas the corresponding ordinations of specimens are revealed by the principal component or factor scores. In the examination of allometry, most workers have used measurements such as lengths, widths, areas, volumes, weights, etc. However, these parameters only provide indirect insights about size and shape. Instead, for many applications we prefer to analyze the coordinates of points, which are homologous and/or topographically constant relative to orthogonal X, Y and Z axes. An X coordinate is actually a distance or linear dimension parallel to the X axis. The other coordinates can be treated in the same way. Such data deal directly with size and shape. The information that can be derived from multivariate allometry of point coordinates is similar to that obtained from transformation grids. Two case studies are discussed, namely the growth of Eurypterus remipes remipes DeKay, a Silurian eurypterid, and the evolution of the Dicoelosia varica lineage of Silurian-Devonian brachiopods. The technique has been successfully applied to various groups of organisms including foraminifera, brachiopods, trilobites, eurypterids, and vertebrates, as well as various non-geological objects including hang gliders and vintage aircraft.

27 citations
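
A minimal sketch of the multivariate generalization described in the abstract above: the first principal component of the covariance matrix of log measurements is taken as the allometry vector. Rescaling the coefficients so that they average 1 (equal coefficients meaning isometry) is one convention assumed here; the function name and the simulated growth data are hypothetical.

```python
import numpy as np

def allometric_coefficients(measurements):
    """First principal component of the covariance matrix of log measurements,
    rescaled so that the coefficients average 1 (all equal to 1 = isometry)."""
    logs = np.log(np.asarray(measurements, dtype=float))
    cov = np.cov(logs, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    first = eigvecs[:, -1]                 # eigh sorts eigenvalues in ascending order
    first = first * np.sign(first.sum())   # fix the arbitrary sign of the eigenvector
    return first / first.mean()

# Simulated growth series: each measurement scales as size ** exponent.
rng = np.random.default_rng(4)
exponents = np.array([1.0, 1.0, 1.3, 0.7])
log_size = rng.normal(0.0, 0.5, size=500)
logs = np.outer(log_size, exponents) + 0.02 * rng.normal(size=(500, 4))
coefficients = allometric_coefficients(np.exp(logs))
```

In this simulation the recovered coefficients track the exponents used to generate the data: values above 1 indicate positive allometry and values below 1 negative allometry relative to overall size.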


Journal Article
TL;DR: In this article, principal component analysis of the infrared spectra of a series of related mixtures is used to determine the number of compounds present in the mixtures, using empirical error estimates.

Journal Article
TL;DR: In this paper, a multivariate statistics technique of principal component analysis is proposed for analyzing the variability in the shape and amplitude of the reflected waveform, and the results of this analysis are used in conjunction with an improved cluster analysis program to produce maps of sediment distributions for a section of Narragansett Bay, Rhode Island.
Abstract: A new technique is presented for the study of seismic reflection profiles. Previous investigations of the reflection of sound from marine sediments have been restricted to the determination of a specular reflection coefficient. This study is based upon the hypothesis that the entire reflected waveform, rather than just the amplitude, contains valuable information on the classification and characteristics of these sediments. The multivariate statistics technique of principal component analysis is proposed for analyzing the variability in the shape and amplitude of the reflected waveform. The results of this analysis are used in conjunction with an improved cluster analysis program to produce maps of sediment distributions for a section of Narragansett Bay, Rhode Island. The groupings produced are in basic agreement with previous studies of the area. Preliminary regression equations are presented for converting the principal components to percentages of sand, silt, and clay.

Journal Article
TL;DR: In this article, a polynomial ordination method was proposed to solve the problems of essentially linear ordination techniques on non-linear ecological data, and its effectiveness was compared to that of two standard techniques, Bray-Curtis ordination and principal components analysis, with both simulated and field data.
Abstract: Recent studies suggest problems resulting from using essentially linear ordination techniques on non-linear ecological data. A new non-linear method, termed polynomial ordination, was developed in response to these problems. Its effectiveness was compared to that of two standard techniques, Bray-Curtis ordination and principal components analysis, by testing with both simulated and field data. The original species axes are resolved into their principal components. If significant curvilinear relationships between principal components are present, new axes are defined along these curves. The coordinates of positions of the sample points along the axes are then determined. Using simulated data, the coordinates of the sample points on the first axis were compared to their coordinates on the original simulated gradient. Two statistics were used to evaluate how well the gradient was recovered. Of the methods tested, polynomial ordination best placed the samples in the correct order, although principal components analysis better recovered their absolute positions. Ordinations of vegetation samples along a Sonoran Desert bajada by all three methods suggested that soil particle size is a major environmental gradient affecting the species composition of the vegetation.


Journal Article
TL;DR: In this paper, a general strategy for the quantification of assemblage zones is outlined, where the data represent presences and absences of different species for numerous samples in n stratigraphic sections, and can be applied in raw form or converted to range-through form.

Journal Article
TL;DR: Principal component analysis was used to develop equations for defining stem taper of loblolly and slash pine in Louisiana, U.S.A. No differences were found in stem form among diameter at breast height (dbh) or height classes, but trees with crown ratios greater than 0.51 did appear to have more taper.
Abstract: Principal component analysis, a multivariate statistical procedure, was used to develop equations for defining stem taper of loblolly (Pinus taeda L.) and slash pine (P. elliottii Engelm.) in Louisiana, U.S.A. Four sets of data were used (three of loblolly and one of slash pine). Data included individual tree diameter measurements at the ground line, 0.02, 0.04, 0.06, 0.08, and at each 0.1 of total height. Thus each data-set i included n_i trees, each with 14 diameter measurements. Principal component analysis was applied to each data set and in every case a single eigenvalue absorbed more than 99% of the total variance. A graph of the elements of the eigenvector associated with the principal eigenvalue, plotted over the appropriate proportional height, resembled the mean stem taper of the trees in the data set. Within a data set, no differences were found in stem form among diameter at breast height (dbh) or height classes but trees with crown ratios greater than 0.51 did appear to have more taper in the u...

Journal Article
TL;DR: In this article, a linear regression on yield for large wheat production regions in the United States, Canada, and the Soviet Union was used to estimate national level of production and to demonstrate a link between crop forecasting and extended atmospheric outlooks.

Book Chapter
TL;DR: In this paper, a four-year study (1972 to 1976) was carried out to determine the long-term changes of organochlorine compound concentrations and associations of epibenthic fishes and invertebrates in a river-dominated north Florida estuary.
Abstract: A four-year study (1972 to 1976) was carried out to determine the long-term changes of organochlorine compound concentrations and of associations of epibenthic fishes and invertebrates in a river-dominated north Florida estuary. We assessed the relative effectiveness of a number of statistical techniques for describing the effects of key physicochemical variables on the estuarine biota. Techniques used included transformations, correlation, regression with dummy variables, two and three-way analysis of variance, multivariate analysis of variance, principal components analysis, factor analysis, canonical correlation, and cluster analysis. Several problems were encountered peculiar to studies of this type: missing observations, the sheer size of the data base in numbers of variables and observations, the domination of other effects by river flow, and extreme and noncyclical variation of some measures over the four-year study period.

Journal Article
TL;DR: The principal components analysis with Varimax rotation was applied to centro-occipital EEG power spectral density and asymmetry variables, and it is suggested that the factors are satisfactorily orthogonal and account for a large part of the original data variance.
Abstract: Principal components analysis with Varimax rotation was applied to centro-occipital EEG power spectral density (PSD) and asymmetry variables in order to detect the subsets of intercorrelated variables. Seven orthogonal variable subsets (factors) were found: left and right PSDs from 0 to 8, 6 to 12, 12 to 20, and 12 to 30 Hz, and asymmetries from 0 to 6, 6 to 14, and 14 to 30 Hz. Attempts to validate these results suggest that the factors are satisfactorily orthogonal and account for a large part of the original data variance. The same set of factors appears to be present in normal subjects, stabilized alcoholics, and some chronic schizophrenics. More effective use of multivariate data in later statistical tests may be made possible by replacing the original variables with the factor scores computed from a weighted sum of the variables in the factors. Also, factor extraction allows comparison of the variable organization in different subject groups. This can have both physiological and statistical significance.

Journal Article
TL;DR: In this paper, a method for specifying climatic variability using a principal component analysis of real and derived meteorological variables is described, and two vectors derived from the analysis delimited areas in Western Australia between which climatic differences were large relative to those within such areas.

DOI
01 Apr 1978
TL;DR: The interpretive benefits of employing multivariate analysis methods on experimental data with more than one dependent variable are described heuristically and illustrated on a set of data from a simply designed experiment in physiological psychology.
Abstract: The interpretive benefits of employing multivariate analysis methods on experimental data with more than one dependent variable are described heuristically and illustrated on a set of data from a simply designed experiment in physiological psychology. Multivariate analysis of variance (MANOVA) is performed on the 9 dependent variables contained in the sample data and on the four composites derived from a principal components analysis (PCA) of the variability of the nine. A linear discriminant analysis (LDA) is conducted following both MANOVA results, and 5 methods of determining the "important" dependent variables in the experimental-control group difference are presented and discussed in terms of the data at hand.

Journal Article
TL;DR: In this paper, the authors examine all aspects of factor analysis: the data structure it is intended to describe, the methods of analysis aimed at estimating that structure; and the interpretations placed on the results of the analysis.
Abstract: In this survey paper the author critically examines all aspects of factor analysis: the data structure it is intended to describe; the methods of analysis aimed at estimating that structure; and the interpretations placed on the results of the analysis. The first part of the paper deals with the theory underlying the technique. It is shown that, in general, the estimation problem of factor analysis is insoluble. In particular, if there are p variables, and m factors are to be identified, there are indeterminacies in the model itself unless p is at least 2m+1; even when the condition is satisfied, the solution is determinate only up to a group of rotations. After a brief discussion of principal component analysis, principal factor analysis is described. It is shown by examples that "principal factor analysis may give factor estimates that bear no relationship whatever to the true values". Even with maximum likelihood methods, which are briefly described, there are indeterminacies. It is shown with synthetic data that a restriction usually imposed to achieve uniqueness does not necessarily retrieve the originally applied parameters. The situation is further complicated by the rotation of factors to achieve either a simple structure or a psychologically meaningful solution. The second part of the paper describes the testing of factor analysis programs using artificial data. The general conclusion from this practical study is that, while the numerical accuracy of the programs is satisfactory, the results obtained do not correspond to the original structure of the data, but produce "factors" that were not initially present. The examples also show that, when matrices of estimated loadings are subjected to an orthogonal transformation, "all resemblance to the true loadings disappeared". The author makes various pungent comments on the findings from this study. The paper concludes with a short section on the clustering of variables into uncorrelated sets.

Journal Article
TL;DR: Commonly used methods for calculating component scores are reviewed, and the means, variances, and covariance structures of the resulting sets of scores are examined both by calculations based on a large set of electron microprobe analyses of melilite and by a survey of recent geological applications of principal component analysis.
Abstract: Commonly used methods for calculating component scores are reviewed. Means, variances, and the covariance structures of the resulting sets of scores are examined both by calculations based on a large set of electron microprobe analyses of melilite (supplied by D. Velde) and by a survey of recent geological applications of principal component analysis. Most of the procedures used to project raw data into the new vector space yield uncorrelated scores. In exceptions so far encountered, correlations between scores seem to have been occasioned by the use of unstandardized variables with components calculated from a correlation matrix. In a number of cases substantive interpretations of such correlations have been proposed. A different set of correlations results for the same data if scores are computed from standardized variables and components based on the covariance matrix. If unscaled components are rotated by the varimax procedure, the result is a return to the original space. In the work reported here, nevertheless, scores calculated from varimax-rotated scaled vectors are uncorrelated.

Journal Article
TL;DR: In this paper, the authors proposed a principal component analysis of biostratigraphic data, which provides a composite section useful in regional correlation and studies of organic evolution, where all sections are considered simultaneously in the calculations rather than sequentially.
Abstract: Principal components analysis of biostratigraphic data provides a composite section useful in regional correlation and studies of organic evolution. Sections are treated as variables and taxon range limits as observations in the principal components model. The method offers the advantages that: all sections are considered simultaneously in the calculations rather than sequentially; and the procedure computes the result from a correlation matrix dimensioned according to the number of sections instead of a similarity matrix dimensioned according to the number of range limits or samples. After initial calculation of the first principal component, missing data entries are estimated by projection of known coordinates on this component. The effects of missing data were evaluated through a Monte Carlo study: composite sections were calculated from data sets generated by randomly eliminating entries from an initial data array. Correlation between the composites thus created and that obtained from the unaltered da...

Book Chapter
01 Jan 1978
TL;DR: Multivariate generalizations of the bivariate allometric equation, provided by the principal components or factors derived from a covariance matrix of the logarithms of the original measurements, are discussed, and the growth of the Silurian echinoderm Stephanocrinus angulatus is outlined as a case study.
Abstract: For many years, the bivariate allometric equation has yielded insights about size and shape. Multivariate generalizations of the bivariate allometric equation are provided by the principal components or factors derived from a covariance matrix of the logarithms of the original measurements. A single set of multivariate data may contain one or more sources of allometry. If the sample is from one population, maximum-likelihood or generalized least-squares factor analysis is proposed as a first step because significance tests aid in determination of the number of allometric factors in the data. If numerous factors are required, there are only slight differences between the factors and principal components. For samples obtained from mixed populations, only principal components are recommended. The growth of the Silurian echinoderm Stephanocrinus angulatus is outlined as a case study.

Proceedings Article
01 Apr 1978
TL;DR: Mean-square-error minimizing signal compression techniques, such as Autoregressive Analysis or Linear Predictive Coding and Principal Component or Karhunen-Loeve Analysis, can be systematically characterized in terms of canonical coordinate or generalized eigenvector procedures.
Abstract: Mean-square-error minimizing signal compression techniques, such as Autoregressive Analysis or Linear Predictive Coding and Principal Component or Karhunen-Loeve Analysis, can be systematically characterized in terms of canonical coordinate or generalized eigenvector procedures. This approach provides considerable insight into the interrelationships between a variety of seemingly different signal compression methods. The approach also provides a convenient mechanism for introducing the types of non-Euclidean error measures that are needed to adjust the signal performance optimization criteria to take into account different types of a priori statistical and dynamical information relating to both the desired signal and to various interference processes.
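
As a concrete instance of the mean-square-error view in the abstract above, the sketch below covers only the ordinary Euclidean case of Principal Component (Karhunen-Loeve) compression, not the generalized eigenvector or non-Euclidean formulations the paper discusses. The function name `klt_compress` and the random-walk test signals are assumptions made for illustration.

```python
import numpy as np

def klt_compress(X, k):
    """Project centred signals onto the k leading eigenvectors of their
    covariance matrix and reconstruct them from those k coefficients."""
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / len(X)
    eigvals, eigvecs = np.linalg.eigh(cov)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # descending order
    basis = eigvecs[:, :k]
    recon = mean + (Xc @ basis) @ basis.T
    return recon, eigvals

rng = np.random.default_rng(3)
# Hypothetical correlated signals: 400 records of 16 samples each.
X = np.cumsum(rng.normal(size=(400, 16)), axis=1)
recon, eigvals = klt_compress(X, k=4)

# Mean squared reconstruction error per record, and the discarded eigenvalues.
mse = np.mean(np.sum((X - recon) ** 2, axis=1))
discarded = eigvals[4:].sum()
```

The mean squared reconstruction error equals the sum of the discarded eigenvalues, which is why keeping the leading eigenvectors minimizes the error for a given number of retained coefficients.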

Journal Article
TL;DR: An artificial data set is used to illustrate the morphologic properties of some common multivariate techniques in three common situations: a sample showing no obvious groupings, a homogeneous sample where size and shape have important implications, and a sample of unknown groupings in which shape variation is the only interest.
Abstract: An artificial data set is used to illustrate the morphologic properties of some common multivariate techniques in consideration of three common situations. The first concerns a sample showing no obvious groupings. In this situation principal components (or coordinates) and factor analyses give a logical ordination of form variation; cluster analysis produces size-dominated groups. The second situation considers an homogeneous sample where size and shape have important implications. Principal components are tested for association with size and shape, both of which can be isolated if isometry exists; if allometry is present, isolation of shape is possible only by size elimination, e.g., conversion to ratios. The third situation examines a sample of unknown groupings in which shape variation is the only interest. Aside from ratios, two other methods which produce shape-dominant clusters are assessed. Some of the options available in cluster analysis are also examined.

Journal Article
TL;DR: A principal component analysis of the cross-products for the palmar and dorsal data treated separately resulted in two components for the palmar data (55% and 25% of variance) and one component for the dorsal data (80% of variance), and supported Edelberg's findings.


Journal Article
TL;DR: The Bashaw and Anderson (1967) correction for replicated errors in correlation coefficients was applied to the principal components analysis, and notable changes were found in the components in terms of the loadings and subsequently in the percentage of variance accounted for.
Abstract: The Bashaw and Anderson (1967) correction for replicated errors in correlation coefficients was applied to the principal components analysis. Notable changes were found in the components in terms of the loadings and subsequently in the percentage of variance accounted for.