
Showing papers on "Principal component analysis published in 1977"


Journal ArticleDOI
TL;DR: In this article, a component method is presented that maximizes Stewart and Love's redundancy index, a rotational procedure for obtaining bi-orthogonal variates is given, and an elaborate example comparing canonical correlation analysis and redundancy analysis on artificial data is presented.
Abstract: A component method is presented maximizing Stewart and Love's redundancy index. Relationships with multiple correlation and principal component analysis are pointed out and a rotational procedure for obtaining bi-orthogonal variates is given. An elaborate example comparing canonical correlation analysis and redundancy analysis on artificial data is presented.

734 citations
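Stewart and Love's redundancy index of one variable set given another equals, in one common formulation, the average squared multiple correlation of each criterion variable regressed on the full predictor set. The sketch below assumes that formulation; the data are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented data: two sets of variables with a known shared structure.
n = 200
X = rng.standard_normal((n, 3))
Y = 0.6 * X[:, :2] + 0.4 * rng.standard_normal((n, 2))

# Standardize both sets (the index is defined on correlations).
X = (X - X.mean(0)) / X.std(0)
Y = (Y - Y.mean(0)) / Y.std(0)

# Redundancy of Y given X: mean squared multiple correlation of each
# Y variable on all X variables.
def redundancy(Y, X):
    r2 = []
    for j in range(Y.shape[1]):
        beta, *_ = np.linalg.lstsq(X, Y[:, j], rcond=None)
        yhat = X @ beta
        r2.append(np.var(yhat) / np.var(Y[:, j]))
    return float(np.mean(r2))

rd = redundancy(Y, X)
print(round(rd, 3))
```

Unlike a squared canonical correlation, this quantity is asymmetric: redundancy of Y given X generally differs from that of X given Y.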


Journal ArticleDOI
TL;DR: Comparison of ordination performance of reciprocal averaging with non-standardized and standardized principal components analysis (PCA) and polar or Bray-Curtis ordination (PO) found that RA is much superior to PCA at high beta diversities and on the whole preferable to PCA at low beta diversities.
Abstract: SUMMARY Reciprocal averaging is a technique of indirect ordination, related both to weighted averages and to principal components analysis and other eigenvector techniques. A series of tests with simulated community gradients (coenoclines), simulated community patterns (coenoplanes), and sets of vegetation samples was used to compare ordination performance of reciprocal averaging (RA) with non-standardized and standardized principal components analysis (PCA) and polar or Bray-Curtis ordination (PO). Of these, non-standardized PCA is most vulnerable to effects of beta diversity, giving distorted ordinations of sample sets with three or more half-changes. PO and RA give good ordinations to five or more half-changes, and standardized PCA is intermediate. Sample errors affect all these techniques more at low than at high beta diversity, but PCA is most vulnerable to effects of sample errors. All three techniques could ordinate well a small (1.5 × 1.5 half-changes) simulated community pattern; and PO and RA could ordinate larger patterns (4.5 × 4.5 half-changes) well. PCA distorts larger community patterns into complex surfaces. Given a rectangular pattern (1.5 × 4.5 half-changes), RA distorts the major axis of sample variation into an arch in the second axis of ordination. Clusters of samples tend to distort PCA ordinations in rather unpredictable ways, but they have smaller effects on RA, and none on PO. Outlier samples do not affect PO (unless used as endpoints), but can cause marked deterioration in RA and PCA ordinations. RA and PO are little subject to the involution of axis extremes that affects non-standardized PCA. Despite the arch effect, RA is much superior to PCA at high beta diversities and on the whole preferable to PCA at low beta diversities. Second and higher axes of PCA and RA may express ecologically meaningless, curvilinear functions of lower axes.
When curvilinear displacements are combined with sample error, axis interpretation is difficult. None of the techniques solves all the problems for ordination that result from the curvilinear relationships characteristic of community data. For applied ordination research consideration of sample set properties, careful use of supporting information to evaluate axes, and comparison of results of RA or PCA with PO and direct ordination are suggested.

348 citations
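The reciprocal averaging technique compared above can be sketched as an alternating weighted-averaging iteration: site scores are weighted averages of species scores, and vice versa, with rescaling each round. A minimal illustration on a toy abundance matrix (the data and iteration details are invented for the example, not taken from the paper):

```python
import numpy as np

# Toy sites-by-species abundance matrix along a simulated gradient.
A = np.array([
    [5, 3, 1, 0, 0],
    [3, 5, 3, 1, 0],
    [1, 3, 5, 3, 1],
    [0, 1, 3, 5, 3],
    [0, 0, 1, 3, 5],
], dtype=float)

row_tot = A.sum(axis=1)
col_tot = A.sum(axis=0)

# Reciprocal averaging: alternate weighted averaging of site and species
# scores, centering and rescaling after each round, until stable.
x = np.arange(A.shape[0], dtype=float)   # arbitrary initial site scores
for _ in range(100):
    u = (A.T @ x) / col_tot              # species scores
    x_new = (A @ u) / row_tot            # new site scores
    x_new -= np.average(x_new, weights=row_tot)            # center
    x_new /= np.sqrt(np.average(x_new**2, weights=row_tot))  # rescale
    if np.allclose(x_new, x):
        x = x_new
        break
    x = x_new

print(np.round(x, 3))
```

For this banded matrix the converged scores order the sites monotonically along the underlying gradient, which is the first (non-trivial) ordination axis.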


Journal ArticleDOI
TL;DR: In this paper, a scale invariant index of factorial simplicity is proposed as a summary statistic for principal components and factor analysis, which ranges from zero to one, and attains its maximum when all variables are simple rather than factorially complex.
Abstract: A scale-invariant index of factorial simplicity is proposed as a summary statistic for principal components and factor analysis. The index ranges from zero to one, and attains its maximum when all variables are simple rather than factorially complex. A factor scale-free oblique transformation method is developed to maximize the index. In addition, a new orthogonal rotation procedure is developed. These factor transformation methods are implemented using rapidly convergent computer programs. Observed results indicate that the procedures produce meaningfully simple factor pattern solutions.

116 citations


Journal ArticleDOI
TL;DR: In this paper, the authors examined relationships between psychological climate and components of a valence-instrumentality-expectancy model, finding significant relationships among the climate and valence-instrumentality-expectancy components that supported several hypotheses proposed in the literature.
Abstract: The present study examined relationships between psychological climate and components of a valence-instrumentality-expectancy model. Data were obtained from 504 managerial employees of a large health care company. A principal component analysis of responses to 35 composites representing perceptions of the job, leadership, workgroup, and organization yielded six psychological climate components. Similar analyses were conducted separately for 20 valence items and 20 instrumentality items. Considerable similarity was found among the instrumentality and valence components, with one component in each area representing intrinsic outcomes, one component representing organizationally-mediated extrinsic outcomes, one representing negative or neutral outcomes, and one representing leader and workgroup-mediated extrinsic outcomes. Relationships among psychological climate and valence-instrumentality-expectancy components were significant and supported several hypotheses proposed in the literature.

101 citations


Journal ArticleDOI
TL;DR: To assess the degree of similarity between the results produced by each of the three methods, the patterns produced by maximum likelihood factor analysis, rescaled image analysis, and principal component analysis are compared for nine data sets.
Abstract: Factor analysis, image component analysis, and principal component analysis are three methods that have been employed for the same purpose. Factor analysis was traditionally viewed as the preferred method, with other methods serving as computationally easier approximations. Recently attention has been focused on theoretical problems with the factor analysis model such as the factor indeterminacy issue. In order to assess the degree of similarity between the results produced by each of the three methods, the patterns produced by maximum likelihood factor analysis, rescaled image analysis, and principal component analysis are compared for nine data sets. Two different comparisons are considered: a direct loading-by-loading comparison of the patterns, and a summary statistic defined on the matrix of differences between patterns. Comparisons are made for the patterns in both orthogonal and oblique rotational positions, and a position of maximum similarity is achieved by an orthogonal procrustes rotation. The patterns produced by each of the three methods were remarkably similar. Image component analysis and maximum likelihood factor analysis generally produced the most similar results and principal component analysis and maximum likelihood factor analysis generally produced the most dissimilar results. Differences generally occurred in the last factor extracted, possibly because too many factors had been extracted.

78 citations
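The "position of maximum similarity achieved by an orthogonal procrustes rotation" mentioned above has a standard SVD-based solution, sketched below on two synthetic loading matrices (the data are invented; this is the textbook computation, not necessarily the authors' exact program):

```python
import numpy as np

rng = np.random.default_rng(1)

# Two hypothetical loading patterns for 6 variables and 2 factors that
# differ by an unknown rotation plus small perturbations.
A = rng.standard_normal((6, 2))
angle = 0.7
R = np.array([[np.cos(angle), -np.sin(angle)],
              [np.sin(angle),  np.cos(angle)]])
B = A @ R + 0.01 * rng.standard_normal((6, 2))

# Orthogonal Procrustes: the rotation T minimizing ||A @ T - B||_F is
# T = U V', where U S V' is the SVD of A' B.
U, _, Vt = np.linalg.svd(A.T @ B)
T = U @ Vt

# Summary statistic on the matrix of differences after rotation.
rmsd = float(np.sqrt(np.mean((A @ T - B) ** 2)))
print(round(rmsd, 4))
```

The residual `rmsd` plays the role of the paper's summary statistic on the difference matrix: near zero when two methods produce essentially the same pattern up to rotation.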


Journal ArticleDOI
TL;DR: In this paper, an analytic technique for deleting predictor variables from a linear regression model when principal components of X'X are removed to adjust for multicollinearities in the data is presented.
Abstract: SUMMARY This paper presents an analytic technique for deleting predictor variables from a linear regression model when principal components of X'X are removed to adjust for multicollinearities in the data. The technique can be adapted to commonly used variable selection procedures such as backward elimination to eliminate redundant predictor variables without appreciably increasing the residual sum of squares. An analysis of the pitprop data of Jeffers (1967) is performed to illustrate the methods proposed in the paper. The use of principal component procedures in either multivariate analysis or multiple linear regression generally results in a reduction of the rank of the variable space. A shortcoming often mentioned, however, is that there is no corresponding reduction in the number of original variables which must be measured. Draper (1964) became well aware of this problem when he attempted to eliminate redundant quality tests on reels of paper. Although Draper concludes that the principal components do not aid in deciding which properties of the paper to test, Jeffers (1965) argues they do, in fact, provide such information. Jolliffe (1972) discusses eight methods of reducing the number of variables in multivariate problems, four of which utilize principal components. In a second paper, Jolliffe (1973) applies five of the eight techniques to real data, including a multiple linear regression analysis. This latter example concerns the pitprop data of Jeffers (1967), in which he used principal components to provide insight into which physical properties should be used to investigate the compressive strength of the props. The purpose of this paper is to present another method of reducing the number of independent variables in a principal component regression analysis. The procedure first deletes principal components associated with small latent roots of X'X and then incorporates an analog of the backward elimination procedure (e.g. Draper and Smith, 1966, Chapter 6) to eliminate the independent variables. This furnishes an analytic procedure for deleting variables, when using a principal component analysis, which is based upon minimal increases in residual sums of squares.

75 citations
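The first stage of the procedure — deleting principal components of X'X with small latent roots, then regressing on the retained component scores — can be sketched as below. The data and the cutoff rule are invented for illustration; the paper's second stage (backward elimination of original variables) is not shown:

```python
import numpy as np

rng = np.random.default_rng(2)

# Invented collinear predictors: x3 is nearly a copy of x1.
n = 100
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)
x3 = x1 + 0.01 * rng.standard_normal(n)
X = np.column_stack([x1, x2, x3])
X -= X.mean(0)
y = 2 * x1 - x2 + rng.standard_normal(n)
y -= y.mean()

# Principal component regression: eigendecompose X'X, drop components
# with small latent roots, regress y on the retained component scores.
lam, V = np.linalg.eigh(X.T @ X)        # eigenvalues in ascending order
keep = lam > 1e-2 * lam.max()           # delete near-zero latent roots
Z = X @ V[:, keep]                      # retained component scores
gamma = np.linalg.solve(Z.T @ Z, Z.T @ y)
beta_pc = V[:, keep] @ gamma            # map back to original variables

rss = float(np.sum((y - X @ beta_pc) ** 2))
print(np.round(beta_pc, 2), round(rss, 1))
```

The near-collinear direction (x1 minus x3) carries an eigenvalue close to zero and is dropped, yet the residual sum of squares stays close to the noise level — the property the paper's variable-deletion procedure is built on.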


Journal ArticleDOI
TL;DR: In this paper, the authors consider the asymptotic distributions of latent roots and vectors in principal components analysis when the parent population is non-normal, and show that enough of T. W. Anderson's theory for the multivariate normal case carries over for some results to be obtained.
Abstract: Summary This paper considers the asymptotic distributions of latent roots and vectors in principal components analysis when the parent population is non-normal. It is shown that enough of T. W. Anderson's asymptotic theory in the multivariate normal case carries over for some results to be obtained.

64 citations



Journal ArticleDOI
TL;DR: In this paper, a method of deriving an empirical eigenvector or principal components representation of wind velocity measurements is given, which can be employed in the analysis of regional wind velocity patterns.
Abstract: A method of deriving an empirical eigenvector or principal components representation of wind velocity measurements is given. The mathematical basis for generalizing the empirical eigenvector method to the treatment of vector data fields is stated briefly. The method presented can be employed in the analysis of regional wind velocity patterns. Applications of the analysis technique to other geophysical vector fields are also possible.

25 citations
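The generalization of the empirical eigenvector (EOF) method to vector fields amounts to stacking the velocity components into one state vector per observation time and then proceeding as in ordinary PCA. A hedged sketch on synthetic station data (the station layout and numbers are invented):

```python
import numpy as np

rng = np.random.default_rng(3)

# Invented wind observations: t times, s stations, (u, v) at each station,
# all driven by one shared mode plus noise.
t, s = 120, 4
base = rng.standard_normal(t)
u = np.outer(base, rng.uniform(0.5, 1.0, s)) + 0.2 * rng.standard_normal((t, s))
v = np.outer(base, rng.uniform(-1.0, -0.5, s)) + 0.2 * rng.standard_normal((t, s))

# Vector-field EOF: concatenate the u and v components into one state
# vector per time, then eigendecompose the covariance matrix.
D = np.hstack([u, v])
D -= D.mean(axis=0)
C = (D.T @ D) / (t - 1)
lam, E = np.linalg.eigh(C)
lam, E = lam[::-1], E[:, ::-1]          # descending eigenvalue order

explained = float(lam[0] / lam.sum())
print(round(explained, 2))
```

Each eigenvector here is itself a velocity pattern (a u and a v value per station), which is what makes the method usable for regional wind-field analysis.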


Journal ArticleDOI
TL;DR: In this paper, the authors analyse monthly observations on unemployment rates of Dutch provinces by means of several statistical techniques and reveal the underlying intra- and inter-regional patterns of Dutch unemployment.

23 citations


Journal ArticleDOI
TL;DR: The main concepts of several multivariate statistical methods used for analyzing and classifying stored-grain infestation data observed during a decade's ecological studies are presented briefly in non-mathematical language with simple diagrams to encourage their use by stored-product entomologists.
Abstract: The main concepts of several multivariate statistical methods used for analyzing and classifying stored-grain infestation data observed during a decade's ecological studies are presented briefly in non-mathematical language with simple diagrams to encourage their use by stored-product entomologists. The mathematical assumptions and limitations of cluster analysis, multiple regression analysis, principal component analysis, factor analysis, canonical correlation analysis and discriminant analysis are given. Original examples of application and interpretation of principal component analyses to insect- and mite-infested wheat and rapeseed bulks on western Canadian farms are given, as this method was found to be the most useful hypothesis-formulating tool.

Book ChapterDOI
01 Jan 1977
TL;DR: In this paper, a Gaussian model for estimating the number of individuals of a species on a site is proposed, where the expected numbers are assumed to be a function of one or two environmental variables.
Abstract: Models are described for the numbers of individuals of species j on site i. It is assumed that the numbers can be conceived as independent trials from Poisson distributions. The expected numbers are thought to be a function of one or two environmental variables. This function is chosen to be Gaussian. Statistical tests are presented for goodness of fit and for contrasting several hypotheses concerning these models. Listings of computer programs for estimating the parameters involved, are available on request. A comparison of the models with the principal component analysis is included.
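The Gaussian response model described above is equivalent to a Poisson model whose log mean is quadratic in the environmental variable; maximizing the Poisson likelihood then recovers the optimum and tolerance of the response curve. A sketch under that assumption, with simulated counts (scipy is used for the optimization; all parameter values are illustrative, not from the chapter):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)

# Simulated counts of one species along an environmental gradient,
# drawn from a Gaussian response curve with invented parameters.
x = np.linspace(-3, 3, 40)
mu_true = 20 * np.exp(-(x - 0.5) ** 2 / (2 * 1.2 ** 2))
y = rng.poisson(mu_true)

# Negative Poisson log-likelihood for log mu = a + b*x + c*x^2; with
# c < 0 this mean function is exactly a Gaussian response curve.
def nll(theta):
    a, b, c = theta
    log_mu = a + b * x + c * x ** 2
    return np.sum(np.exp(log_mu) - y * log_mu)

fit = minimize(nll, x0=np.array([1.0, 0.0, -0.1]))
a, b, c = fit.x
optimum = -b / (2 * c)                  # gradient value at peak response
tol = np.sqrt(-1 / (2 * c))             # Gaussian "tolerance" (s.d.)
print(round(optimum, 2), round(tol, 2))
```

The fitted optimum and tolerance should land near the simulation's true values (0.5 and 1.2), which is the kind of parameter estimate the chapter's programs compute.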

Journal ArticleDOI
TL;DR: In this article, four problems relating to the use of principal components analysis, based on correlation coefficients, are discussed with reference to geographical structures, and the general conclusion is that the method is not suited to many geographical data sets.
Abstract: Four problems relating to the use of principal components analysis, based on correlation coefficients, are discussed with reference to geographical structures. The general conclusion is that the method is not suited to many geographical data sets.


Book ChapterDOI
01 Jan 1977
TL;DR: In this paper, low dimensional hyperplanes, which, in some sense, best represent the populations, are discussed, which can be interpreted as orthogonal transformations of the standardized principal components.
Abstract: Publisher Summary This chapter discusses low dimensional hyperplanes which, in some sense, best represent the populations. However, the Fisher method is not unique in providing representations. When discriminating among k p-variate normal distributions with common covariance matrix, it is well known that the optimal discrimination procedure is based on the score functions. However, for many practical reasons, it proves useful to have one-, two-, or three-dimensional representations of the data. Plotting the transformed observations in these lower dimensional spaces can lead to a better understanding of the relationships between populations and the detection of outliers. Such a representation is especially helpful when p is large compared to k or when the means almost lie in a low dimensional hyperplane. The p-dimensional representations, including Fisher's between-within method, can be interpreted as based on orthogonal transformations of the standardized principal components. Each representation provides a decomposition of the scores as a sum of squares.

Journal ArticleDOI
TL;DR: In this paper, two traditional methods used to form principal components (PC) regression estimates are reviewed, and small sample properties of the estimates are compared with those of OLS estimates in a Monte Carlo experiment designed to facilitate comparisons.
Abstract: Two traditional methods used to form principal components (PC) regression estimates are reviewed, and small sample properties of the estimates are compared with OLS estimates. A Monte Carlo experiment is used to facilitate comparisons. Theoretical considerations and empirical observation indicate that the PC techniques tend to produce estimates lower in mean square error (MSE) than OLS estimates under conditions of high multicollinearity, low R2, and small sample size. Although under these conditions the PC techniques may be preferred to OLS in the relative MSE sense, MSE in the absolute sense may still render the PC estimates useless in applications.

Journal ArticleDOI
TL;DR: In this article, a procedure for transforming an arbitrary set of component reference curves to a new set which are mutually orthogonal and, subject to orthogonality, are as smooth as possible in a well defined (least squares) sense is presented.
Abstract: Tucker has outlined an application of principal components analysis to a set of learning curves, for the purpose of identifying meaningful dimensions of individual differences in learning tasks. Since the principal components are defined in terms of a statistical criterion (maximum variance accounted for) rather than a substantive one, it is typically desirable to rotate the components to a more interpretable orientation. “Simple structure” is not a particularly appealing consideration for such a rotation; it is more reasonable to believe that any meaningful factor should form a (locally) smooth curve when the component loadings are plotted against trial number. Accordingly, this paper develops a procedure for transforming an arbitrary set of component reference curves to a new set which are mutually orthogonal and, subject to orthogonality, are as smooth as possible in a well defined (least squares) sense. Potential applications to learning data, electrophysiological responses, and growth data are indicated.

Book ChapterDOI
01 Jan 1977
TL;DR: In this article, the use of moments of economic variables is suggested for aggregation purposes, and the method of principal components gives a solution, which explains a certain percentage of the variance of the aggregated variables.
Abstract: Optimal aggregation is defined with the help of a quadratic loss function. The computation of the best linear aggregates involves in general the Penrose inverse. The use of moments of economic variables is suggested for aggregation purposes. The method of principal components gives a solution, which explains a certain percentage of the variance of the aggregated variables. Problems of aggregation in input-output analysis and with the Cobb-Douglas function are discussed.

Journal ArticleDOI
TL;DR: In this paper, to test the consistency of component coefficients, 10 samples of approximately 25, 50, 100, and 200 items each were randomly drawn, with replacement, from a source sample consisting of 2086 subalkaline basalt analyses.
Abstract: The convenience of reducing the dimension of a data matrix by principal component analysis invites substantive interpretation of the coefficients of the components. To test the consistency of component coefficients, 10 samples of approximately 25, 50, 100, and 200 items each were randomly drawn, with replacement, from a source sample consisting of 2086 subalkaline basalt analyses. From each sample principal components were calculated using 9 major oxides as variables. Although the eigenvalues are remarkably consistent, both across and within sample size groups, the coefficients of the eigenvectors are subject to considerable sample variance. It is sometimes assumed that the coefficients of the components calculated from small samples are well enough known to be used in detailed petrological interpretation. Our results indicate that the validity of this assumption should be tested in each specific study, even when rather large samples are used. The testing procedure used here is suitable if a sufficiently large reservoir of sample items is available; in the absence of such a reservoir, complete simulation could be used.
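The resampling design described above can be imitated in a few lines: draw small samples with replacement from a large source sample, compute the leading eigenvalue and eigenvector of each, and compare their spread across draws. The synthetic "source sample" below merely stands in for the basalt analyses:

```python
import numpy as np

rng = np.random.default_rng(4)

# Invented source sample: 2000 observations on 9 variables with a
# moderate correlation structure (a stand-in for the oxide analyses).
n, p = 2000, 9
L = rng.standard_normal((p, p)) * 0.3 + np.eye(p)
source = rng.standard_normal((n, p)) @ L.T

# Leading eigenvalue and sign-fixed leading eigenvector of a sample.
def leading_pc(data):
    c = np.cov(data, rowvar=False)
    lam, vec = np.linalg.eigh(c)
    v = vec[:, -1]
    return lam[-1], v * np.sign(v[np.argmax(np.abs(v))])

# Repeated small samples (50 items) drawn with replacement.
eigvals, eigvecs = [], []
for _ in range(10):
    idx = rng.integers(0, n, size=50)
    lam1, v1 = leading_pc(source[idx])
    eigvals.append(lam1)
    eigvecs.append(v1)

ev_spread = float(np.std(eigvals) / np.mean(eigvals))   # eigenvalue variation
vec_spread = float(np.mean(np.std(eigvecs, axis=0)))    # coefficient variation
print(round(ev_spread, 2), round(vec_spread, 2))
```

Comparing `ev_spread` with `vec_spread` across sample sizes is one way to reproduce the paper's contrast between stable eigenvalues and variable eigenvector coefficients.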

Journal ArticleDOI
TL;DR: The aim of this paper is to describe an exploratory data analysis strategy for integrating the relative advantages of canonical correlation and multiple set factor analysis.
Abstract: This paper has two related aims. First, some conceptual and mathematical relationships are discussed among alternative procedures for analyzing multiple data sets, including: inter-battery factor analysis (Tucker, 1958; Kristof, 1967), multiple regression, canonical correlation, generalized canonical correlation (Horst, 1965; Kettenring, 1971), longitudinal factor analysis (Corballis and Traub, 1970), and multiple set factor analysis (Golding and Seidman, 1974; Jackson, 1975). To motivate the comparison, each technique is related to a principal components model. The second aim is to describe an exploratory data analysis strategy for integrating the relative advantages of canonical correlation and multiple set factor analysis. When considering two data sets, the testing of statistical significance of appropriate linear combinations is emphasized, together with a further transformation to enhance substantive interpretation of the data.

Journal ArticleDOI
01 May 1977
TL;DR: A solution to two problems is dealt with, namely the characterization in the time domain of multivariate data recorded from biological systems under stress, and the screening of study data for redundancy.
Abstract: The analysis of the dynamic behavior of complex systems requires the simultaneous and continuous monitoring of many variables, often over extended periods of time. In order to prepare the large amounts of data so obtained for analysis, efficient methods for data reduction and the characterization of "condensed" time series using appropriate screening and statistical procedures are essential. In this paper a solution to two of these problems is dealt with, namely the characterization of multivariate data, recorded from biological systems under stress, in the time domain and the screening of study data for redundancy. The processing of such data is subject to two major constraints: the variables are frequently nonnormally distributed and the order of the behavior of the system to be characterized is frequently not known. The approach described comprises an initial data reduction for intermediate storage by a factor of one thousand to seventy through an averaging procedure. The resulting second time series is then characterized in terms of a) steady-state values during both the control and the response state; b) the variability of these steady-state values and c) the shape and magnitude of the transient part of the response. The number of parameters resulting from this procedure is further reduced by screening the data for redundancy without investigator bias using a functional proposed by Andrews. Further processing of the data includes cluster and principal component analysis in an attempt to identify biologically meaningful basic system dimensions.

Journal ArticleDOI
TL;DR: In this paper, a linear-predictive (LP) all-pole model was used to analyze the spectral properties of continuous speech and to regenerate the spectra from a small set of factors.
Abstract: The principal‐components method is sometimes employed to describe speech spectra in terms of a small number of factors, that is eigenvectors of the covariance matrix. In this study, this method has been used to analyze the spectral properties of continuous speech and to regenerate the spectra from a small set of factors. Spectral data from 0 to 5000 Hz was obtained using a linear‐predictive (LP) all‐pole model. This data was used to calculate the average energy in twenty bands, each about an auditory critical bandwidth wide. The analysis was applied to (1) the log‐coded band energies and (2) the band energies to the one‐third power. Continuous speech was synthesized using spectra computed from a small number of principal‐component factors and the LP residual signal. The quality of this speech is compared with speech synthesized using a low‐order LP vocoder. In addition, it is argued that the principal components might be more easily identifiable with linguistic categories than are the poles of the low‐ord...

Journal ArticleDOI
TL;DR: In this paper, the authors compare four different multivariate methods: principal components, factor analysis, discriminant analysis involving another use of principal components, and canonical correlation, making careful note of the assumptions involved in each model and applying each form of analysis to variables drawn from the same set of data.

Journal ArticleDOI
TL;DR: In this article, the authors compared two preschool programs-Bank Street and Montessori-using an observational scale originally developed for a study of Head Start classrooms, and found that all the differences lay in the first two sets of variables.
Abstract: This study compared two preschool programs-Bank Street and Montessori-using an observational scale originally developed for a study of Head Start classrooms. The two questions underlying the study were: ( 1 ) could the observation scale discriminate between the two programs and ( 2 ) how many independent dimensions underlay the variables derived from the scale. Three classrooms were observed in each program. In each classroom observations were made on four boys and four girls. Each subject was observed for 40 min. on each of three separate days. Three sets of variables were formed from the basic categories. The variables in the first set measured activities and experiences emphasized by Montessori but not by Bank Street literature, in the second set the experiences emphasized by Bank Street and not by Montessori, and in the third set experiences emphasized by both programs. All the differences were found in the first two sets. Perhaps they only reflect a single basic dimension of preschool programs. In order to address this issue multivariate analysis of variance and factor analysis were applied to the nine variables, four from the Montessori set and five from the Bank Street set, which did not show effects of teachers (thus allowing this term to be dropped from the model). The analysis of variance was used to assess how many variables in each set differentiated independently of the others between the two programs, as determined by their step-down Fs. For each set of variables the most general variable was tested first followed by the remaining variables in order of their increasing univariate Fs. Only one of the Montessori variables reached significance according to this criterion, but three of the Bank Street did. Discriminant analysis of these data gave similar results. A second analysis examined the dimensions within the two programs by means of principal components factor analysis with varimax rotation. 
Of importance here was not the number of factors since this was limited by the small number of variables but how the composition of the factors compared with the results of the analysis of variance: the three Bank Street variables which discriminated independently between the two programs were all on the same factor when looking within the Bank Street program. In determining the dimensions needed to describe preschool programs, it is important to distinguish questions concerning differences between programs from interrelations among the variables within a program.

Journal ArticleDOI
TL;DR: In this paper, a scale-invariant simple structure function of previously studied function components is defined, and first and second partial derivatives are obtained, and Newton-Raphson iterations are utilized.
Abstract: The parameter matrices of factor analysis and principal component analysis are arbitrary with respect to the scale of the factors or components; typically, the scale is fixed so that the factors have unit variance. Oblique transformations to optimize an objective statement of a principle such as simple structure or factor simplicity yield arbitrary solutions, unless the criterion function is invariant with respect to the scale of the factors, or the parameter matrix is scale free with respect to the factors. Criterion functions that are factor scale-free have a number of invariance characteristics, such as being equally applicable to primary pattern or reference structure matrices. A scale-invariant simple structure function of previously studied function components is defined. First and second partial derivatives are obtained, and Newton-Raphson iterations are utilized. The resulting solutions are locally optimal and subjectively pleasing.

Journal ArticleDOI
TL;DR: EXPLORE, described in this paper, is a computer program for analyzing multiple data sets, whose output includes the complete multiple correlation analysis of each variable in one set regressed on all variates of the other set, a canonical correlation solution, and a final transformation to enhance substantive interpretation of the data.
Abstract: EXPLORE is a flexible computer program for analyzing multiple data sets. For the case of two data sets, the output includes: (1) the complete multiple correlation analysis of each variable (component) in one set regressed on all variates of the other set, (2) a canonical correlation solution, and (3) a final transformation to enhance substantive interpretation of the data. The investigator has the option of focusing upon the original variables, or of selecting a reduced rank solution where these variables are summarized by a small number of principal components. A range of transformations (rotations) is available that allows EXPLORE to be employed in either an exploratory or a hypothesis-testing mode.

Book ChapterDOI
01 Jan 1977
TL;DR: In this paper, the authors studied the problem of fitting a line to multivariate data so as to minimize the sum of squared deviations of the points from the line, deviation being measured orthogonally to the line.
Abstract: Principal component analysis is a classical multivariate technique dating back to publications by Pearson (1901) and Hotelling (1933). Pearson focused on the aspect of approximation: Given a p-variate random vector (or a “system of points in space,” in Pearson’s terminology), find an optimal approximation in a linear subspace of lower dimension. More specifically, Pearson studied the problem of fitting a line to multivariate data so as to minimize the sum of squared deviations of the points from the line, deviation being measured orthogonally to the line. We will discuss Pearson’s approach in Section 8.3; however, it will be treated in a somewhat more abstract way by studying approximations of multivariate random vectors using the criterion of mean-squared error.
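Pearson's line-fitting criterion is easy to verify numerically: the leading eigenvector of the covariance matrix gives the line through the centroid minimizing the sum of squared orthogonal deviations, and that residual sum equals the smallest eigenvalue times n − 1. A small sketch with invented data:

```python
import numpy as np

rng = np.random.default_rng(5)

# Invented 2-D point cloud scattered about a line through the origin.
n = 200
t = rng.standard_normal(n)
pts = np.column_stack([t, 0.5 * t]) + 0.1 * rng.standard_normal((n, 2))

# Pearson's best-fitting line passes through the centroid in the
# direction of the leading eigenvector of the covariance matrix.
centroid = pts.mean(axis=0)
cov = np.cov(pts - centroid, rowvar=False)
lam, vecs = np.linalg.eigh(cov)         # ascending eigenvalues
direction = vecs[:, -1]                 # leading eigenvector

# Orthogonal residuals: components of each centered point that are
# perpendicular to the fitted direction.
centered = pts - centroid
along = centered @ direction
resid = centered - np.outer(along, direction)
ssq_orth = float(np.sum(resid ** 2))

# The smallest eigenvalue times (n - 1) equals that residual sum.
print(round(ssq_orth, 3), round(lam[0] * (n - 1), 3))
```

This is exactly the mean-squared-error view of the approximation problem the chapter attributes to Pearson (1901).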

Book ChapterDOI
01 Jan 1977
TL;DR: In this article, the authors discuss three interrelated concepts concerning multivariate covariance models: principal components, factor models, and canonical correlations, which aim at reducing the dimension of the observable vector variables.
Abstract: This chapter discusses three interrelated concepts concerning multivariate covariance models: principal components, factor models, and canonical correlations. All these concepts deal with the covariance structure of the multivariate normal distribution and aim at reducing the dimension of the observable vector variables. Principal components of a random vector X = (X1, …, Xp)′ are normalized linear combinations of the components of X that have special properties in terms of variances. The first principal component is the normalized linear combination with maximum variance; the second principal component has maximum variance among all normalized linear combinations uncorrelated with the first. Factor analysis is a multivariate technique that attempts to account for the correlation pattern present in the distribution of an observable random vector X in terms of a minimal number of unobservable random variables, called factors. The canonical model selects linear combinations of variables from each of the two sets, so that the correlations among the new variables in different sets are maximized subject to the restriction that the new variables in each set are uncorrelated with mean 0 and variance 1.
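The defining properties of principal components — variances equal to the eigenvalues, mutual uncorrelatedness — can be checked directly from an eigendecomposition of a sample covariance matrix. A minimal sketch with invented data:

```python
import numpy as np

rng = np.random.default_rng(6)

# Invented sample from a 3-variate distribution with correlated components.
M = np.array([[2.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.2, 0.3, 0.5]])
Xs = rng.standard_normal((500, 3)) @ M

# Principal components: eigenvectors of the sample covariance matrix.
S = np.cov(Xs, rowvar=False)
lam, V = np.linalg.eigh(S)              # ascending eigenvalues
lam, V = lam[::-1], V[:, ::-1]          # reorder to descending

# Component scores: project centered data onto the eigenvectors.
scores = (Xs - Xs.mean(0)) @ V

# The score covariance matrix is diagonal, with the eigenvalues on the
# diagonal: components are uncorrelated, ordered by variance.
score_cov = np.cov(scores, rowvar=False)
print(np.round(np.diag(score_cov), 2))
```

The same check confirms the ordering property: the first diagonal entry (the first component's variance) is the largest, the second is the largest achievable subject to uncorrelatedness with the first, and so on.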

Journal ArticleDOI
01 Jan 1977-Geoforum
TL;DR: In this paper, the problems of using discrete data in correlation and component analysis are discussed, and some guidelines are suggested.