Journal ArticleDOI

Principal component analysis

01 Jul 2010-Wiley Interdisciplinary Reviews: Computational Statistics (John Wiley & Sons, Ltd)-Vol. 2, Iss: 4, pp 433-459
TL;DR: Principal component analysis (PCA), as discussed by the authors, is a multivariate technique that analyzes a data table in which observations are described by several inter-correlated quantitative dependent variables. Its goal is to extract the important information from the table, to represent it as a set of new orthogonal variables called principal components, and to display the pattern of similarity of the observations and of the variables as points in maps.
Abstract: Principal component analysis (PCA) is a multivariate technique that analyzes a data table in which observations are described by several inter-correlated quantitative dependent variables. Its goal is to extract the important information from the table, to represent it as a set of new orthogonal variables called principal components, and to display the pattern of similarity of the observations and of the variables as points in maps. The quality of the PCA model can be evaluated using cross-validation techniques such as the bootstrap and the jackknife. PCA can be generalized as correspondence analysis (CA) in order to handle qualitative variables and as multiple factor analysis (MFA) in order to handle heterogeneous sets of variables. Mathematically, PCA depends upon the eigen-decomposition of positive semi-definite matrices and upon the singular value decomposition (SVD) of rectangular matrices. Copyright © 2010 John Wiley & Sons, Inc.
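The SVD route to PCA mentioned in the abstract can be sketched in a few lines. The following is a minimal illustrative example (made-up numbers, not data from the article): center the data table, take its SVD, and read off factor scores and explained variance.

```python
import numpy as np

# A small data table: 6 observations described by 3 correlated variables
# (illustrative numbers, not from the article).
X = np.array([
    [2.5, 2.4, 0.5],
    [0.5, 0.7, 1.9],
    [2.2, 2.9, 0.8],
    [1.9, 2.2, 0.9],
    [3.1, 3.0, 0.4],
    [2.3, 2.7, 0.7],
])

# Center each column: PCA analyzes deviations from the column means.
Xc = X - X.mean(axis=0)

# SVD of the centered table: Xc = U @ diag(s) @ Vt.
# Rows of Vt are the principal axes; singular values carry the variance.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Factor scores: coordinates of the observations on the components
# (these are the "points in maps" the abstract refers to).
scores = Xc @ Vt.T

# Variance explained by each principal component.
explained = s**2 / (X.shape[0] - 1)
```

The components come out orthogonal and ordered by decreasing variance, which is exactly the "new orthogonal variables" property described above.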


Citations
Journal ArticleDOI
TL;DR: This work offers a comprehensive review of both the structural and dynamical organization of graphs made of diverse relationships (layers) between their constituents, and covers several relevant issues, from a full redefinition of the basic structural measures to understanding how the multilayer nature of the network affects processes and dynamics.

2,669 citations

Journal ArticleDOI
TL;DR: For both PLS methods, statistical inferences are implemented using cross-validation techniques to identify significant patterns of voxel activation; the methods are presented with small numerical examples and typical applications in neuroimaging.

1,037 citations


Cites background or methods from "Principal component analysis"

  • ...For convenience, Appendix A also lists our main notations and acronyms (see also Abdi and Williams, 2010c, for more details on matrices)....

    [...]

  • ...Specifically, mean-centered task PLSC is equivalent to “barycentric discriminant analysis,” because these two techniques compute the SVD of the mean-centered matrix R (see, e.g., Abdi and Williams, 2010a; Williams et al., 2010, for details)....

    [...]

  • ...Eq. (62) shows that Ŷ can also be expressed as a regression model as: Ŷ = TBCᵀ = XB_PLS (63), with B_PLS = Pᵀ⁺BCᵀ (64) (where Pᵀ⁺ is the Moore-Penrose pseudo-inverse of Pᵀ; see, e.g., Abdi and Williams, 2010c, for definitions)....

    [...]

  • ...Random effect model: the performance of PLSR (with respect to inference to the population) is assessed through cross-validation techniques such as the “leave-one-out” procedure (Wold et al., 2001; also called the jackknife, see also Abdi and Williams, 2010b)....

    [...]

  • ...The smaller the value of RESS, the better the quality of prediction (Abdi, 2010; Abdi and Williams, 2010d)....

    [...]

Journal ArticleDOI
Andrea Cossarizza, Hyun-Dong Chang, Andreas Radbruch, Andreas Acs, and 459 more authors (160 institutions)
TL;DR: These guidelines are a consensus work of a considerable number of members of the immunology and flow cytometry community providing the theory and key practical aspects of flow cytometry, enabling immunologists to avoid the common errors that often undermine immunological data.
Abstract: These guidelines are a consensus work of a considerable number of members of the immunology and flow cytometry community. They provide the theory and key practical aspects of flow cytometry enabling immunologists to avoid the common errors that often undermine immunological data. Notably, there are comprehensive sections of all major immune cell types with helpful Tables detailing phenotypes in murine and human cells. The latest flow cytometry techniques and applications are also described, featuring examples of the data that can be generated and, importantly, how the data can be analysed. Furthermore, there are sections detailing tips, tricks and pitfalls to avoid, all written and peer-reviewed by leading experts in the field, making this an essential research companion.

698 citations

Journal ArticleDOI
TL;DR: This article assesses the different machine learning methods that deal with the challenges in IoT data by considering smart cities as the main use case and presents a taxonomy of machine learning algorithms explaining how different techniques are applied to the data in order to extract higher level information.

690 citations


Cites background from "Principal component analysis"

  • ...cipal subspace, which has the maximal projected variance [83, 85]....

    [...]

Journal ArticleDOI
TL;DR: This work proposes the combination of a non-parametric, permutation-based statistical framework with linear classifiers; the resulting method is more spatially precise, revealing both informative anatomical structures and the direction by which voxels contribute to the classification.
Abstract: Although ultra-high-field fMRI at field strengths of 7T or above provides substantial gains in BOLD contrast-to-noise ratio, when very high-resolution fMRI is required such gains are inevitably reduced. The improvement in sensitivity provided by multivariate analysis techniques, as compared with univariate methods, then becomes especially welcome. Information mapping approaches are commonly used, such as the searchlight technique, which take into account the spatially distributed patterns of activation in order to predict stimulus conditions. However, the popular searchlight decoding technique, in particular, has been found to be prone to spatial inaccuracies. For instance, the spatial extent of informative areas is generally exaggerated, and their spatial configuration is distorted. We propose the combination of a non-parametric and permutation-based statistical framework with linear classifiers. We term this new combined method Feature Weight Mapping (FWM). The main goal of the proposed method is to map the specific contribution of each voxel to the classification decision while including a correction for the multiple comparisons problem. Next, we compare this new method to the searchlight approach using a simulation and ultra-high-field 7T experimental data. We found that the searchlight method led to spatial inaccuracies that are especially noticeable in high-resolution fMRI data. In contrast, FWM was more spatially precise, revealing both informative anatomical structures as well as the direction by which voxels contribute to the classification. By maximizing the spatial accuracy of ultra-high-field fMRI results, global multivariate methods provide a substantial improvement for characterizing structure-function relationships.
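The permutation logic underlying the framework described above can be sketched generically. The following is a hypothetical toy example, not the authors' FWM code: fit a trivial classifier, then build a null distribution by refitting under many random label permutations and compare the observed statistic against it.

```python
import random
import statistics

random.seed(0)

# Toy two-class data with one informative feature
# (illustrative numbers, not fMRI data).
features = [0.1, 0.3, 0.2, 0.4, 1.1, 1.3, 1.2, 1.4]
labels   = [0,   0,   0,   0,   1,   1,   1,   1]

def accuracy(feats, labs):
    """Nearest-class-mean classifier, scored on the same data (toy setup)."""
    means = {c: statistics.mean(f for f, l in zip(feats, labs) if l == c)
             for c in set(labs)}
    preds = [min(means, key=lambda c: abs(f - means[c])) for f in feats]
    return sum(p == l for p, l in zip(preds, labs)) / len(labs)

observed = accuracy(features, labels)

# Null distribution: the same statistic after randomly permuting the labels,
# which breaks any true feature-label association.
null = []
for _ in range(1000):
    shuffled = labels[:]
    random.shuffle(shuffled)
    null.append(accuracy(features, shuffled))

# Permutation p-value: fraction of null accuracies at least as large
# as the observed accuracy.
p_value = sum(a >= observed for a in null) / len(null)
```

A real analysis would permute within a proper cross-validation scheme and correct over many voxels, as the abstract notes; the sketch only shows the core shuffle-refit-compare loop.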

654 citations

References
Book
01 Jan 1993
TL;DR: This article presents bootstrap methods for estimation using simple arguments, together with Minitab macros for implementing them.
Abstract: This article presents bootstrap methods for estimation, using simple arguments. Minitab macros for implementing these methods are given.
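The bootstrap idea summarized above can be sketched in a few lines. This is a generic illustration with made-up data, not the article's Minitab macros: resample with replacement, recompute the statistic each time, and take percentiles of the resulting distribution.

```python
import random
import statistics

random.seed(42)

# A small made-up sample whose mean we want a confidence interval for.
sample = [4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 5.2, 4.4, 5.8, 4.7]

# Draw B resamples with replacement, each the size of the original
# sample, and record the statistic of interest (here, the mean).
B = 2000
boot_means = [
    statistics.mean(random.choices(sample, k=len(sample)))
    for _ in range(B)
]

# Percentile bootstrap 95% confidence interval for the mean.
boot_means.sort()
ci_low, ci_high = boot_means[int(0.025 * B)], boot_means[int(0.975 * B)]
```

The same resampling loop also yields the jackknife-style model checks mentioned in the PCA abstract by swapping the statistic being recomputed.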

37,183 citations

Journal ArticleDOI
01 Aug 1996
TL;DR: Tests on real and simulated data sets using classification and regression trees and subset selection in linear regression show that bagging can give substantial gains in accuracy.
Abstract: Bagging predictors is a method for generating multiple versions of a predictor and using these to get an aggregated predictor. The aggregation averages over the versions when predicting a numerical outcome and does a plurality vote when predicting a class. The multiple versions are formed by making bootstrap replicates of the learning set and using these as new learning sets. Tests on real and simulated data sets using classification and regression trees and subset selection in linear regression show that bagging can give substantial gains in accuracy. The vital element is the instability of the prediction method. If perturbing the learning set can cause significant changes in the predictor constructed, then bagging can improve accuracy.
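The bagging procedure described in the abstract (bootstrap replicates of the learning set, then a plurality vote for classification) can be sketched with a deliberately unstable base learner. This is a hypothetical toy example, not Breiman's original code, using a one-nearest-neighbour classifier on 1-D data:

```python
import random
from collections import Counter

random.seed(1)

# Toy 1-D classification data (illustrative, not from the paper).
points = [0.0, 0.2, 0.4, 0.6, 1.4, 1.6, 1.8, 2.0]
labels = [0,   0,   0,   0,   1,   1,   1,   1]
learning_set = list(zip(points, labels))

def one_nn(train, x):
    """Base learner: predict the label of the nearest training point."""
    return min(train, key=lambda pl: abs(pl[0] - x))[1]

# Bagging: fit the base learner on B bootstrap replicates of the
# learning set (sampling with replacement, same size as the original).
B = 25
replicates = [
    [random.choice(learning_set) for _ in range(len(learning_set))]
    for _ in range(B)
]

def bagged_predict(x):
    """Aggregate the B base predictions by plurality vote."""
    votes = Counter(one_nn(rep, x) for rep in replicates)
    return votes.most_common(1)[0][0]

predictions = [bagged_predict(x) for x in [0.1, 0.5, 1.5, 1.9]]
```

For a numerical outcome the aggregation step would average the base predictions instead of voting, matching the abstract's description.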

16,118 citations

Reference EntryDOI
15 Oct 2005
TL;DR: Principal component analysis (PCA) as discussed by the authors replaces the p original variables by a smaller number, q, of derived variables, the principal components, which are linear combinations of the original variables.
Abstract: When large multivariate datasets are analyzed, it is often desirable to reduce their dimensionality. Principal component analysis is one technique for doing this. It replaces the p original variables by a smaller number, q, of derived variables, the principal components, which are linear combinations of the original variables. Often, it is possible to retain most of the variability in the original variables with q very much smaller than p. Despite its apparent simplicity, principal component analysis has a number of subtleties, and it has many uses and extensions. A number of choices associated with the technique are briefly discussed, namely, covariance or correlation, how many components, and different normalization constraints, as well as confusion with factor analysis. Various uses and extensions are outlined. Keywords: dimension reduction; factor analysis; multivariate analysis; variance maximization

14,773 citations

Journal ArticleDOI
TL;DR: The scree test for determining the number of factors, as discussed by the authors, was first proposed in 1966 and has since been used extensively in behavioral research.
Abstract: (1966). The Scree Test For The Number Of Factors. Multivariate Behavioral Research: Vol. 1, No. 2, pp. 245-276.

12,228 citations

Journal ArticleDOI
TL;DR: This paper is concerned with the construction of lines and planes of closest fit to systems of points in space.
Abstract: (1901). LIII. On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science: Vol. 2, No. 11, pp. 559-572.

10,656 citations


Additional excerpts

  • [Flattened table excerpt: global and per-expert (Experts 1–3) factor scores F1 and F2 for the wines.]

    [...]

  • [Flattened table excerpt: loadings on the first two components from the subtable PCAs for Experts 1–3, with eigenvalues (λ) and explained variance (τ, %).]

    [...]

Trending Questions
What is the mathematical equation for principal component analysis?

The mathematical equation for principal component analysis (PCA) involves the eigen-decomposition of positive semi-definite matrices and the singular value decomposition (SVD) of rectangular matrices.
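In the notation used by Abdi and Williams, the answer above can be made concrete. With $\mathbf{X}$ the centered $n \times p$ data table, the two decompositions are:

```latex
% SVD of the centered data table (n observations, p variables):
% P and Q have orthonormal columns, \Delta is diagonal (singular values).
\mathbf{X} = \mathbf{P}\,\boldsymbol{\Delta}\,\mathbf{Q}^{\top}

% Equivalent eigen-decomposition of the positive semi-definite
% cross-product matrix:
\mathbf{X}^{\top}\mathbf{X} = \mathbf{Q}\,\boldsymbol{\Delta}^{2}\,\mathbf{Q}^{\top}

% Factor scores (coordinates of the observations on the components):
\mathbf{F} = \mathbf{P}\,\boldsymbol{\Delta} = \mathbf{X}\,\mathbf{Q}
```

The columns of $\mathbf{Q}$ are the loadings (the principal axes), and the squared singular values on the diagonal of $\boldsymbol{\Delta}^{2}$ give the variance extracted by each component.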