
Showing papers on "Principal component analysis published in 1986"


Book
01 May 1986
TL;DR: In this book, the author covers the theory and practice of principal component analysis (PCA), including graphical representation of data, PCA for time series and other non-independent data, and generalizations and adaptations of the technique.
Abstract: Introduction * Properties of Population Principal Components * Properties of Sample Principal Components * Interpreting Principal Components: Examples * Graphical Representation of Data Using Principal Components * Choosing a Subset of Principal Components or Variables * Principal Component Analysis and Factor Analysis * Principal Components in Regression Analysis * Principal Components Used with Other Multivariate Techniques * Outlier Detection, Influential Observations and Robust Estimation * Rotation and Interpretation of Principal Components * Principal Component Analysis for Time Series and Other Non-Independent Data * Principal Component Analysis for Special Types of Data * Generalizations and Adaptations of Principal Component Analysis
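
For readers coming to the book from a computational angle, the construction underlying all of these chapters is the eigendecomposition of a covariance (or correlation) matrix. A minimal NumPy sketch (the function name is ours, not the book's):

```python
import numpy as np

def principal_components(X):
    """Sample PCA via eigendecomposition of the covariance matrix.

    X: (n, p) data matrix, one row per observation.
    Returns the component variances (descending) and the loading
    matrix, one column per component.
    """
    Xc = X - X.mean(axis=0)                # center each variable
    S = np.cov(Xc, rowvar=False)           # p x p sample covariance
    eigvals, eigvecs = np.linalg.eigh(S)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

# Component scores (the data in PC coordinates): scores = Xc @ loadings
```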

17,446 citations


Book
04 Dec 1986
TL;DR: In this book, the problem of displaying many variables in two dimensions is discussed, together with significance tests, distance measures, and ordination methods for multivariate data.
Abstract: THE MATERIAL OF MULTIVARIATE ANALYSIS Examples of Multivariate Data Preview of Multivariate Methods The Multivariate Normal Distribution Computer Programs Graphical Methods Chapter Summary References MATRIX ALGEBRA The Need for Matrix Algebra Matrices and Vectors Operations on Matrices Matrix Inversion Quadratic Forms Eigenvalues and Eigenvectors Vectors of Means and Covariance Matrices Further Reading Chapter Summary References DISPLAYING MULTIVARIATE DATA The Problem of Displaying Many Variables in Two Dimensions Plotting Index Variables The Draftsman's Plot The Representation of Individual Data Points Profiles of Variables Discussion and Further Reading Chapter Summary References TESTS OF SIGNIFICANCE WITH MULTIVARIATE DATA Simultaneous Tests on Several Variables Comparison of Mean Values for Two Samples: The Single Variable Case Comparison of Mean Values for Two Samples: The Multivariate Case Multivariate Versus Univariate Tests Comparison of Variation for Two Samples: The Single Variable Case Comparison of Variation for Two Samples: The Multivariate Case Comparison of Means for Several Samples Comparison of Variation for Several Samples Discussion Chapter Summary Exercises References MEASURING AND TESTING MULTIVARIATE DISTANCES Multivariate Distances Distances Between Individual Observations Distances Between Populations and Samples Distances Based on Proportions Presence-Absence Data The Mantel Randomization Test Computer Programs Discussion and Further Reading Chapter Summary Exercise References PRINCIPAL COMPONENTS ANALYSIS Definition of Principal Components Procedure for a Principal Components Analysis Computer Programs Further Reading Chapter Summary Exercises References FACTOR ANALYSIS The Factor Analysis Model Procedure for a Factor Analysis Principal Components Factor Analysis Using a Factor Analysis Program to do Principal Components Analysis Options in Analyses The Value of Factor Analysis Computer Programs Discussion and Further Reading Chapter Summary Exercise References DISCRIMINANT FUNCTION ANALYSIS The Problem of Separating Groups Discrimination Using Mahalanobis Distances Canonical Discriminant Functions Tests of Significance Assumptions Allowing for Prior Probabilities of Group Membership Stepwise Discriminant Function Analysis Jackknife Classification of Individuals Assigning of Ungrouped Individuals to Groups Logistic Regression Computer Programs Discussion and Further Reading Chapter Summary Exercises References CLUSTER ANALYSIS Uses of Cluster Analysis Types of Cluster Analysis Hierarchic Methods Problems of Cluster Analysis Measures of Distance Principal Components Analysis with Cluster Analysis Computer Programs Discussion and Further Reading Chapter Summary Exercises References CANONICAL CORRELATION ANALYSIS Generalizing a Multiple Regression Analysis Procedure for a Canonical Correlation Analysis Tests of Significance Interpreting Canonical Variates Computer Programs Further Reading Chapter Summary Exercise References MULTIDIMENSIONAL SCALING Constructing a Map from a Distance Matrix Procedure for Multidimensional Scaling Computer Programs Further Reading Chapter Summary Exercise References ORDINATION The Ordination Problem Principal Components Analysis Principal Coordinates Analysis Multidimensional Scaling Correspondence Analysis Comparison of Ordination Methods Computer Programs Further Reading Chapter Summary Exercise References EPILOGUE The Next Step Some General Reminders Missing Values References APPENDIX Computer Packages for Multivariate Analyses References

2,827 citations


Journal ArticleDOI
TL;DR: In this paper, simple structure rotation and Procrustes target rotation are examined in the context of meteorological/climatological applications, and six unique ways to decompose a rotated data set in order to maximize the physical interpretability of the rotated results are discussed.
Abstract: Recent research has pointed to a number of inherent disadvantages of unrotated principal components and empirical orthogonal functions when these techniques are used to depict individual modes of variation of data matrices in exploratory analyses. The various pitfalls are outlined and illustrated with an alternative method introduced to minimize these problems via available linear transformations known as simple structure rotations. The rationale and theory behind simple structure rotation and Procrustes target rotation is examined in the context of meteorological/climatological applications. This includes a discussion of the six unique ways to decompose a rotated data set in order to maximize the physical interpretability of the rotated results. The various analytic simple structure rotations available are compared by a Monte Carlo simulation, which is a modification of a similar technique developed by Tucker (1983), revealing that the DAPPFR and Promax k = 2 rotations are the most accurate in recovering the input structure of the modes of variation over a wide range of conditions. Additionally, these results allow the investigator the opportunity to check the accuracy of the unrotated or rotated solution for specific types of data. This is important because, in the past, the decision of whether or not to apply a specific rotation has been a ‘blind decision’. In response to this, a methodology is presented herein by which the researcher can assess the degree of simple structure embedded within any meteorological data set and then apply known information about the data to the Monte Carlo results to optimize the likelihood of achieving physically meaningful results from a principal component analysis.
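
The DAPPFR and Promax rotations compared in the paper are not reproduced here, but the flavor of an analytic simple structure rotation can be seen from varimax, the best-known member of the family. A hedged NumPy sketch of the standard SVD-based varimax iteration (our illustration, not code from the paper):

```python
import numpy as np

def varimax(L, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonal rotation of a loading matrix L (p x k) toward
    simple structure; gamma = 1 gives the varimax criterion."""
    p, k = L.shape
    R = np.eye(k)
    crit = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        # Gradient of the varimax-family criterion
        G = L.T @ (Lr**3 - (gamma / p) * Lr @ np.diag((Lr**2).sum(axis=0)))
        U, s, Vt = np.linalg.svd(G)
        R = U @ Vt
        if s.sum() < crit * (1 + tol):   # criterion stopped improving
            break
        crit = s.sum()
    return L @ R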

1,647 citations


Book ChapterDOI
01 Jan 1986
TL;DR: PCA and factor analysis, as usually defined, are quite distinct techniques, although the two are often confused; the definition of principal components adopted in this chapter is fairly standard.
Abstract: Principal component analysis has often been dealt with in textbooks as a special case of factor analysis, and this tendency has been continued by many computer packages which treat PCA as one option in a program for factor analysis—see Appendix A2. This view is misguided since PCA and factor analysis, as usually defined, are really quite distinct techniques. The confusion may have arisen, in part, because of Hotelling’s (1933) original paper, in which principal components were introduced in the context of providing a small number of ‘more fundamental’ variables which determine the values of the p original variables. This is very much in the spirit of the factor model introduced in Section 7.1, although Girschick (1936) indicates that there were soon criticisms of Hotelling’s method of PCs, as being inappropriate for factor analysis. Further confusion results from the fact that practitioners of ‘factor analysis’ do not always have the same definition of the technique (see Jackson, 1981). The definition adopted in this chapter is, however, fairly standard.

1,015 citations


Journal ArticleDOI
TL;DR: In this paper, a generalized rank annihilation method for quantification in bilinear data arrays such as LC/UV, GC/MS or emission-excitation fluorescence was proposed.
Abstract: The method of rank annihilation is shown to be a particular case of a more general method for quantitation in bilinear data arrays such as LC/UV, GC/MS or emission-excitation fluorescence. Generalized rank annihilation is introduced as a calibration method that allows for simultaneous quantitative determination of all the analytes of interest in a mixture of unknowns. Only one calibration mixture is required. The bilinear spectra of both unknown and calibration sample must be obtained. Bilinear target factor analysis is introduced as a projection of a target bilinear matrix onto another principal component bilinear matrix space. Keywords: Multivariate analysis; Principal component regression (PCR); Two-dimensional data; Singular value decomposition; Pseudoinverse
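
As a rough illustration of the idea, one common later formulation of generalized rank annihilation reduces to a small eigenproblem after projecting the unknown's matrix onto a truncated SVD basis of the calibration matrix. The sketch below is an assumption-laden reconstruction, not the paper's algorithm, and it presumes every component of the unknown also occurs in the calibration mixture:

```python
import numpy as np

def gram(M, N, rank):
    """Sketch of generalized rank annihilation (hypothetical variant).

    M: bilinear array of the unknown mixture (e.g., LC/UV, GC/MS).
    N: bilinear array of the single calibration mixture.
    rank: assumed number of chemical components.
    Returns eigenvalues ~ concentration ratios (unknown / calibration);
    assumes every component is also present in the calibration sample.
    """
    U, s, Vt = np.linalg.svd(N)
    U, s, V = U[:, :rank], s[:rank], Vt[:rank].T   # truncated SVD basis of N
    T = np.diag(1.0 / s) @ U.T @ M @ V             # M projected into that basis
    return np.linalg.eigvals(T)
```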

447 citations


Book ChapterDOI
01 Jan 1986
TL;DR: Research continues into a wide variety of methods of using PCA to analyse various types of data, and in recent years no area has been more active than regression analysis, where PCs are used in some form or another.
Abstract: As illustrated in the other chapters of this book, research continues into a wide variety of methods of using PCA in analysing various types of data. However, in no area has this research been more active in recent years than in investigating approaches to regression analysis which use PCs in some form or another.
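
Principal components regression, the simplest of the approaches this chapter surveys, replaces the predictors by their leading component scores before regressing. A minimal sketch (function name and the plain least-squares fit on the scores are our choices):

```python
import numpy as np

def pc_regression(X, y, m):
    """Principal components regression: regress y on the first m PCs of X.

    Returns coefficients in the original x-space plus the intercept.
    """
    Xc = X - X.mean(axis=0)
    eigvals, V = np.linalg.eigh(np.cov(Xc, rowvar=False))
    V = V[:, np.argsort(eigvals)[::-1]][:, :m]       # top-m loading vectors
    Z = Xc @ V                                       # component scores
    gamma, *_ = np.linalg.lstsq(Z, y - y.mean(), rcond=None)
    beta = V @ gamma                                 # back-transform to x-space
    return beta, y.mean() - X.mean(axis=0) @ beta
```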

370 citations


Book
31 Dec 1986
TL;DR: In this book, the authors present strategies for analysing multivariate data, covering principal components analysis, cluster analysis and discriminant analysis, with a case study of IUE low dispersion spectra.
Abstract: Foreword. 1. Data Coding and Initial Treatment. 2. Principal Components Analysis. 3. Cluster Analysis. 4. Discriminant Analysis. 5. Other Methods. 6. Case Study: IUE Low Dispersion Spectra. 7. Conclusion: Strategies for Analysing Data. Index.

299 citations


Journal ArticleDOI
TL;DR: In this paper, a technique for principal components analysis of longitudinal data is described in which the reproducing kernel of a Hilbert space of functions plays a central role and defines the best interpolating functions, which are generalized spline functions.
Abstract: This paper describes a technique for principal components analysis of data consisting of n functions each observed at p argument values. This problem arises particularly in the analysis of longitudinal data in which some behavior of a number of subjects is measured at a number of points in time. In such cases information about the behavior of one or more derivatives of the function being sampled can often be very useful, as for example in the analysis of growth or learning curves. It is shown that the use of derivative information is equivalent to a change of metric for the row space in classical principal components analysis. The reproducing kernel for the Hilbert space of functions plays a central role, and defines the best interpolating functions, which are generalized spline functions. An example is offered of how sensitivity to derivative information can reveal interesting aspects of the data.
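
The "change of metric" point can be made concrete: weight the usual inner product with a first-derivative penalty and do PCA in that metric. The sketch below assumes equispaced sampling and a crude finite-difference derivative; it illustrates the general idea, not the paper's reproducing-kernel machinery:

```python
import numpy as np

def metric_pca(X, dt, lam):
    """PCA of sampled curves under a derivative-weighted (Sobolev-like) metric.

    X: (n, p) matrix of curves on an equispaced grid with spacing dt;
    lam: weight on the first-derivative term. lam = 0 recovers ordinary
    PCA up to the dt scaling.
    """
    n, p = X.shape
    D = (np.eye(p, k=1)[:p - 1] - np.eye(p)[:p - 1]) / dt   # first differences
    M = dt * (np.eye(p) + lam * D.T @ D)                    # discretized metric
    w, Q = np.linalg.eigh(M)
    Mh = Q @ np.diag(np.sqrt(w)) @ Q.T                      # M^(1/2)
    Mih = Q @ np.diag(1.0 / np.sqrt(w)) @ Q.T               # M^(-1/2)
    S = np.cov(X - X.mean(axis=0), rowvar=False)
    evals, U = np.linalg.eigh(Mh @ S @ Mh)                  # PCA in the metric
    order = np.argsort(evals)[::-1]
    return evals[order], Mih @ U[:, order]                  # axes in original coords
```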

268 citations


Journal ArticleDOI
TL;DR: The size-constrained technique shows that turtles are dimorphic in isometric size but not in shape, and analysis of the crayfish data confirms that PCA with the correlation matrix separates size from shape more effectively than analysis with the covariance matrix.
Abstract: A method is presented that constrains principal components analysis (PCA) to extract a first component that, by definition, summarizes isometric size alone. The remaining information is partitioned according to variation in shape. Size-constrained and conventional procedures are compared with analyses of data on sexual dimorphism in the painted turtle (Chrysemys picta marginata) and a crayfish (Cambarus bartoni). Contrary to results from standard analyses using covariance or correlation matrices, the size-constrained technique shows that turtles are dimorphic in isometric size but not in shape. Conventional methods do not completely isolate variation in isometric size from variation in shape. Analysis of the crayfish data confirms that PCA with the correlation matrix separates size from shape more effectively than analysis with the covariance matrix. Secondary shape components (i.e., the third and subsequent components) differ markedly, suggesting that incomplete partitions of isometric size and shape by the traditional methods dramatically affect the results. (Size; shape; allometry; principal components analysis.)
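
A plausible reading of the constrained procedure, sketched in NumPy under the assumptions that the data are log-transformed measurements and that "isometric size" means equal loadings on all variables (the paper's exact computations may differ):

```python
import numpy as np

def size_constrained_pca(X):
    """Size-constrained PCA in the spirit described above (a sketch).

    X: (n, p) log-transformed measurements. The first axis is fixed to
    the isometric size vector (equal loadings on every variable); shape
    components are the PCs of what remains once that axis is removed.
    """
    Xc = X - X.mean(axis=0)
    p = X.shape[1]
    iso = np.ones(p) / np.sqrt(p)              # isometric size axis
    size_scores = Xc @ iso
    resid = Xc - np.outer(size_scores, iso)    # project out isometric size
    evals, vecs = np.linalg.eigh(np.cov(resid, rowvar=False))
    order = np.argsort(evals)[::-1]
    return size_scores, evals[order], vecs[:, order]
```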

214 citations


Journal ArticleDOI
TL;DR: In this article, the authors show that principal modes of variation consist of eigenfunctions of the process covariance function C(s, t) for continuous sample curves, and compare their results with principal components analysis of the same data.
Abstract: Analysis of a process with continuous sample curves can be carried out in a manner similar to principal components analysis of vector processes. By appropriate definition of a best linear model in the continuous case, we show that principal modes of variation consist of eigenfunctions of the process covariance function C(s, t). Procedures for estimation of these eigenfunctions from a finite sample of observed curves are given, and results are compared with principal components analysis of the same data.
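
Discretizing the covariance kernel on the observation grid gives the simplest estimation procedure of the kind the paper discusses: a quadrature rule turns the eigenfunction problem into a matrix eigenproblem. A sketch assuming an equispaced grid:

```python
import numpy as np

def eigenfunctions(curves, dt, k):
    """Estimate the leading eigenfunctions of C(s, t) from sampled curves.

    curves: (n, p) matrix of curves observed on an equispaced grid, step dt.
    The quadrature rule reduces the eigenequation to one for C * dt;
    dividing by sqrt(dt) normalizes so the integral of phi^2 equals 1.
    """
    C = np.cov(curves - curves.mean(axis=0), rowvar=False)
    evals, vecs = np.linalg.eigh(C * dt)
    order = np.argsort(evals)[::-1][:k]
    return evals[order], vecs[:, order] / np.sqrt(dt)
```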

Book ChapterDOI
01 Jan 1986
TL;DR: In this paper, a projection method for multivariate classification problems is presented, which is based on partial least squares in latent variables (PLS), which models the dependency between two multivariate blocks X and Y.
Abstract: A projection method for multivariate classification problems is presented, which is based on partial least squares in latent variables (PLS). PLS models the dependency between two multivariate blocks X and Y. For a pattern recognition problem we use a dummy matrix Y which describes the class-membership of the training-set objects. The multivariate characterization of the objects forms the X block. The PLS model describes the X-block with a principal component (PC) like model, where class separation is enhanced. The significance of the class separation is tested by cross-validation. Plotting the score vectors of the PLS model gives a PLS discriminant plot.
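
A compact sketch of the PLS-DA recipe described above: dummy-code the classes into Y, extract PLS components, and plot the X-scores. This uses one standard PLS2 variant (weights from the SVD of the cross-covariance, deflation of both blocks); the authors' own algorithm may differ in detail:

```python
import numpy as np

def pls_da_scores(X, labels, n_components):
    """PLS discriminant analysis sketch: PLS2 against a dummy Y.

    labels: integer class labels; Y is their one-hot (dummy) coding.
    Returns the X-score vectors, whose pairwise scatter plots give a
    PLS discriminant plot.
    """
    classes = np.unique(labels)
    Y = (labels[:, None] == classes[None, :]).astype(float)   # dummy matrix
    E = X - X.mean(axis=0)            # X block, centered
    F = Y - Y.mean(axis=0)            # Y block, centered
    scores = []
    for _ in range(n_components):
        # Weight vector: dominant left singular vector of the cross-covariance
        w = np.linalg.svd(E.T @ F, full_matrices=False)[0][:, 0]
        t = E @ w                     # X scores for this component
        p_load = E.T @ t / (t @ t)
        q_load = F.T @ t / (t @ t)
        E = E - np.outer(t, p_load)   # deflate both blocks
        F = F - np.outer(t, q_load)
        scores.append(t)
    return np.column_stack(scores)
```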

Book ChapterDOI
01 Jan 1986
TL;DR: In this article, it is assumed that a data set consists of n independent observations on one or more random variables, x, and this assumption is often implicit when a PCA is done, with perhaps the stronger assumption of multivariate normality if we require to make some formal inference for the PCs.
Abstract: In much of statistical inference, it is assumed that a data set consists of n independent observations on one or more random variables, x, and this assumption is often implicit when a PCA is done. Another assumption which also may be made implicitly is that x consists of continuous variables, with perhaps the stronger assumption of multivariate normality if we require to make some formal inference for the PCs.

Journal ArticleDOI
TL;DR: It is demonstrated that this effect can be produced by Varimax rotation, without PCA as an intervening step, and it is stressed that infinitely many sets of prototypes may render the same final solution, a fact which cannot be overcome by any method.

Journal ArticleDOI
TL;DR: It is theoretically shown that, to a first approximation, latency variation introduces an additional component, the time-derivative of the underlying component; this is demonstrated in a simulation study.

Abstract: Principal component analysis of event-related potentials is known to be suspect when underlying components have varying latency. In this note it is theoretically shown that, approximately, an additional component may be assumed, namely the time-derivative of the underlying component. The validity of this approximation is demonstrated in a simulation study.
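
The note's argument is a first-order Taylor expansion, f(t - tau) ~ f(t) - tau f'(t), so latency-jittered waveforms approximately span the component and its derivative. A small simulation consistent with that claim (all waveform parameters here are arbitrary):

```python
import numpy as np

t = np.linspace(0.0, 1.0, 200)
f = np.exp(-((t - 0.5) ** 2) / (2 * 0.05**2))    # idealized ERP component
rng = np.random.default_rng(0)
taus = rng.normal(0.0, 0.02, size=50)             # trial-to-trial latency jitter
trials = np.array([np.interp(t, t - tau, f) for tau in taus])

# Two singular values should dominate: the first right singular vector
# resembles f, the second its time-derivative (the "additional component").
U, s, Vt = np.linalg.svd(trials)
```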

Journal ArticleDOI
TL;DR: Empirical orthogonal functions (EOF), as discussed in this paper, can be considered a mean-square estimation technique for unknown values within a random process or field; optimization of the error variance leads to a Fredholm integral equation.
Abstract: Some current uses of empirical orthogonal functions (EOF) are briefly summarized, together with some relations with spectral and principal component analyses. Considered as a mean square estimation technique of unknown values within a random process or field, the optimization of error variance leads to a Fredholm integral equation. Its kernel is the autocorrelation function, which in many practical cases is only known as discrete values of interstation correlation coefficients computed from a sample of independent realizations. The numerical solution in one or two dimensions of this integral equation is approximated in a new and more general framework that requires, in practice, the a priori choice of a set of generating functions. Developments are provided for piecewise constant, facetlike linear, and thin plate type spline functions. The first part of the paper ends with a review of the mapping, archiving and stochastic simulating possibilities of the EOF method. A second part includes a case study ...
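
In the usual notation, the Fredholm eigenproblem the abstract refers to is

```latex
\lambda\,\varphi(s) \;=\; \int_{T} C(s,t)\,\varphi(t)\,dt ,
```

where C is the (auto)correlation kernel; expanding phi over the chosen generating functions (piecewise constant, facet-like linear, or thin-plate splines) reduces the integral equation to a finite matrix eigenproblem.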

Journal ArticleDOI
TL;DR: In this article, regression equations for the logarithms of the latent roots of random data correlation matrices with unities on the diagonal are presented to make parallel analysis more accessible to researchers employing principal component techniques.
Abstract: In order to make parallel analysis more accessible to researchers employing principal component techniques, regression equations are presented for the logarithms of the latent roots of random data correlation matrices with unities on the diagonal. These regression equations have as independent variables logarithms of: the single variable degrees of freedom; Bartlett-Lawley degrees of freedom; the next lowest ordered eigenvalue. The multiple correlation coefficients are at least 0.96 in all cases.
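
The regression equations themselves are in the paper; what they approximate is the Monte Carlo baseline of parallel analysis, which is easy to state directly. A sketch (the simulation count and seed are arbitrary choices):

```python
import numpy as np

def parallel_analysis_baseline(n, p, n_sims=200, seed=0):
    """Monte Carlo baseline of the kind the regression equations predict.

    Generates random normal (n, p) data sets, computes the eigenvalues
    of their correlation matrices, and averages them; observed latent
    roots exceeding this baseline suggest retainable components.
    """
    rng = np.random.default_rng(seed)
    eigs = np.empty((n_sims, p))
    for i in range(n_sims):
        R = np.corrcoef(rng.standard_normal((n, p)), rowvar=False)
        eigs[i] = np.sort(np.linalg.eigvalsh(R))[::-1]
    return eigs.mean(axis=0)
```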

Journal ArticleDOI
TL;DR: Principal components analysis and common factor analysis can provide similar results; however, to assume the results will be similar can lead to serious error.
Abstract: Principal components analysis and common factor analysis can provide similar results; however, to assume the results will be similar can lead to serious error. A simple example is provided to show how results can be substantially different.

Patent
28 Mar 1986
TL;DR: In this patent, a two-dimensional array of at least about 16 sensors for recording EEG or MEG traces from the subject is used to produce enhanced EEG or MEG information related to selected brain activity.
Abstract: Method and apparatus for producing enhanced EEG or MEG information related to a selected brain activity in a subject. The apparatus includes a two-dimensional array of at least about 16 sensors for recording EEG or MEG traces from the subject. Control and test traces recorded before and during an interval in which the brain activity is occurring, respectively, are each decomposed into a series of basis functions which may be analytic components such as temporal frequency components, generated by spectral decomposition of an ensemble average of the recorded traces, or principal components determined by principal component analysis. The control and test traces are then represented as a sum of the products of the individual basis functions times a spatial domain matrix which represents the spatial pattern of amplitudes of that basis function, as measured by the individual sensors in the array. The signals can be extracted by filtering spatially and/or temporally to remove basis function components which are not related to the selected brain activity (clutter), and to remove spatial frequencies inherent in the spatial domain matrices to optimize the contrast between control and corresponding test matrices, for each selected basis function.
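
Stripped of the EEG/MEG specifics, the principal-component variant of the decomposition can be caricatured as SVD filtering: keep the component terms attributed to the selected activity, discard the clutter. A hypothetical sketch, not the patented method:

```python
import numpy as np

def pc_filter(traces, keep):
    """Hypothetical sketch of the principal-component variant above.

    traces: (sensors, samples) array from the sensor grid.
    keep: indices of components attributed to the selected brain
    activity; everything else is treated as clutter and removed.
    """
    U, s, Vt = np.linalg.svd(traces, full_matrices=False)
    mask = np.zeros_like(s)
    mask[list(keep)] = 1.0
    # Each retained term is one basis function (row of Vt) times its
    # spatial amplitude pattern (column of U, scaled by s).
    return (U * (s * mask)) @ Vt
```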

Journal ArticleDOI
TL;DR: Using an interaction experiment with apomorphine and scopolamine effects on exploratory behavior as an illustrative example, four multivariate statistical methods are described and compared with univariate statistical methods with respect to their utility in pharmacological research.

Journal ArticleDOI
TL;DR: In this article, a fuzzy theory approach is proposed for interpreting component spectra that takes into account the imprecise character of both reference and sample spectra, enabling more decisions to be made than is possible in least-squares comparison of spectra.

Journal ArticleDOI
TL;DR: In this article, spectral data between 1100 and 2500 nm were obtained with a holographic grating monochromator spectrometer, and principal components analysis of the 30 selected wavelengths enabled the classification of forage species based solely on their spectra.

Book ChapterDOI
01 Jan 1986
TL;DR: In this chapter, the mathematical and statistical properties of PCs are described, based on a known population covariance (or correlation) matrix Σ; further properties are treated in Chapter 3 in the context of sample, rather than population, PCs.
Abstract: In this chapter many of the mathematical and statistical properties of PCs are described, based on a known population covariance (or correlation) matrix Σ. Further properties are included in Chapter 3 but in the context of sample, rather than population, PCs. As well as being derived from a statistical viewpoint, PCs can be found using purely mathematical arguments; they are given by an orthogonal linear transformation of a set of variables which optimizes a certain algebraic criterion. In fact, the PCs optimize several different algebraic criteria and these optimization properties, together with their statistical implications, are described in the first section of the chapter.
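
The best known of those algebraic criteria is successive variance maximization, which leads directly to the eigenequation for Σ:

```latex
\alpha_k \;=\; \arg\max_{\alpha}\ \alpha^{\mathsf T}\Sigma\,\alpha
\quad\text{subject to}\quad
\alpha^{\mathsf T}\alpha = 1,\;\; \alpha^{\mathsf T}\alpha_j = 0 \ (j < k),
\qquad\text{whence}\qquad
\Sigma\,\alpha_k = \lambda_k\,\alpha_k .
```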

Journal ArticleDOI
TL;DR: It would appear that Slow Wave factor scores emerging from a PCA can be fairly well approximated by a time-band measurement algorithm, and that this approximation is not improved by low-pass filtering.
Abstract: The present investigation sought to determine whether the relationship between event-related potential (ERP) principal components analysis (PCA) factor scores and analogous waveform amplitude measures could be improved by high- and low-pass filtering the waveforms at a suitable cutoff value. Visual oddball ERPs were submitted to a varimax-rotated PCA performed on the variance/covariance matrix. Principal components corresponding to P300 and Slow Wave were obtained. In keeping with the fact that the variance/covariance PCA analyzes sources of variance around the grand mean waveform, the grand mean waveform was subtracted from each of the original waveforms, and baseline-referenced amplitude measurements were then made of P300 and Slow Wave. P300 was measured both as the maximum positive peak between 275 and 425 ms, and as the average amplitude during that interval. Slow Wave was measured as the average amplitude during the interval 400–700 ms. The P300 measurements were then repeated after high-pass filtering the difference waveforms at 2 Hz. Slow Wave measurements were repeated after low-pass filtering at 2 Hz. The value of 2 Hz was chosen as giving a reasonable cutoff based upon estimates of the wavelengths of the two components derived from inspection of their respective factor loading vectors. The correlation between factor scores and amplitude measurements was .94 for unfiltered Slow Wave and actually declined slightly but significantly to .91 when the waveforms were low-pass filtered. It would appear that Slow Wave factor scores emerging from a PCA can be fairly well approximated by a time-band measurement algorithm, and that this approximation is not improved by low-pass filtering. For both filtered and unfiltered measurements of P300, the amplitude/factor score correlation was significantly higher for the time-band method than for the peak method. Further, high-pass filtering at 2 Hz improved the time-band/factor score correlation significantly from .62 to .75. This improvement is probably because the unfiltered measurements were tapping sources of variance due both to the higher frequency P300 component as well as a simultaneously active, lower frequency Slow Wave component. Theoretical implications of these findings are discussed.

Journal ArticleDOI
TL;DR: It is shown that a suitable choice of the region and/or the temporal interval of interest enables the differential evaluation of intrahepatic compartments that could not be observed without enhancement.

Abstract: Factor analysis of dynamic radionuclide studies provides their decomposition into the images and time-activity curves corresponding to the underlying dynamic structures. The method is based on the analysis of study variance and on the subsequent differential imaging of its principal components in a simplified factor space. By changing the amount and the composition of the variance processed in the analysis it is possible to enhance the factors that are important for diagnosis while the less important factors can be suppressed. In our report, a short theoretical review of the problem is given and illustrated by the analysis of dynamic cholescintigraphy. It is shown that a suitable choice of the region and/or the temporal interval of interest enables the differential evaluation of intrahepatic compartments that could not be observed without enhancement.

Journal ArticleDOI
TL;DR: In this article, two images were obtained from thematic mappers on Landsats 4 and 5 over the Washington, DC area during November 1982 and March 1984, and mean digital radiance values for each bandpass in each image were examined, and variances, standard deviations and covariances between bandpasses were calculated.

Book ChapterDOI
01 Jan 1986
TL;DR: The original purpose of PCA was to reduce a large number (p) of variables to a much smaller number (m) of PCs whilst retaining as much as possible of the variation in the p original variables, as discussed by the authors.

Abstract: The original purpose of PCA was to reduce a large number (p) of variables to a much smaller number (m) of PCs whilst retaining as much as possible of the variation in the p original variables. The technique is especially useful if m ≪ p, and if the m PCs can be readily interpreted.
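
One informal way to pick m in the m ≪ p regime is a cumulative-variance threshold; the rule below is a common convention, not one the chapter prescribes:

```python
import numpy as np

def n_components(eigvals, threshold=0.9):
    """Smallest m whose PCs explain `threshold` of the total variance.

    A common informal rule for the m << p reduction described above;
    eigvals are the PC variances, in any order.
    """
    frac = np.cumsum(np.sort(eigvals)[::-1]) / np.sum(eigvals)
    return int(np.searchsorted(frac, threshold) + 1)
```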

Journal ArticleDOI
TL;DR: In this article, principal components analysis was used in conjunction with linear regression analysis to examine the structure-activity relationships of a diverse group of 20 para-substituted pyridines.

Journal ArticleDOI
TL;DR: The CLAS program for batch data processing is designed to combine classification methods with evaluation of their performance, and contains flexible facilities for data manipulation, variable transformation and missing-data handling.