Topic

Principal component analysis

About: Principal component analysis (PCA), also known as principal components analysis, is a research topic. Over its lifetime, 22,148 publications have been published within this topic, receiving 691,657 citations.
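As a concrete illustration of the technique itself (not taken from any paper listed below), here is a minimal PCA sketch using NumPy's SVD; the synthetic data and all variable names are invented for the example:

```python
import numpy as np

# Toy data: 100 samples, 3 features, with most variance along the first axis.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) * np.array([5.0, 1.0, 0.1])

# PCA: center the data, then take the SVD. The right singular vectors (rows
# of Vt) are the principal axes; squared singular values over (n - 1) are
# the variances explained by each component.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained_variance = s**2 / (len(X) - 1)
scores = Xc @ Vt.T          # projections of the samples onto the axes

print(explained_variance)   # decreasing: the first component dominates
```

Because NumPy returns singular values in descending order, the components come out already sorted by explained variance.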


Papers
Journal Article
Svante Wold
TL;DR: The paper considers the estimation of the rank A of the matrix Y, i.e., how much of the data y_ik is "signal" and how much is "noise," using cross-validation.
Abstract: By means of factor analysis (FA) or principal components analysis (PCA), a matrix Y with elements y_ik is approximated by the bilinear model (I): y_ik = α + Σ(a=1..A) β_ia θ_ak + ε_ik. Here the parameters α, β and θ express the systematic part of the data y_ik, the "signal," and the residuals ε_ik express the "random" part, the "noise." When applying FA or PCA to a matrix of real data obtained, for example, by characterizing N chemical mixtures by M measured variables, one major problem is the estimation of the rank A of the matrix Y, i.e. the estimation of how much of the data y_ik is "signal" and how much is "noise." Cross-validation can be used to approach this problem: the matrix Y is partitioned, and the rank A is determined so as to maximize the predictive power of model (I) when its parameters are estimated on one part of Y and the prediction is tested on another part.

2,468 citations
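The cross-validation idea above can be sketched in a few lines. This is a simplified illustration, not Wold's exact partitioning scheme: held-out entries are imputed iteratively from a rank-A SVD approximation, and the rank with the lowest held-out prediction error is preferred. The synthetic data and all names are invented:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic 30 x 8 data matrix of true rank 2, plus a little noise.
true_rank = 2
Y = rng.normal(size=(30, true_rank)) @ rng.normal(size=(true_rank, 8))
Y += 0.05 * rng.normal(size=Y.shape)

def press(Y, rank, mask, n_iter=50):
    """Prediction error on held-out entries (mask=True) for a rank-`rank`
    model, using simple SVD-based imputation of the held-out cells."""
    Z = Y.copy()
    Z[mask] = Y[~mask].mean()          # initialize held-out cells crudely
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        Z[mask] = approx[mask]         # re-impute only the held-out cells
    return np.sum((Y[mask] - approx[mask]) ** 2)

mask = rng.random(Y.shape) < 0.2       # hold out roughly 20% of the entries
errors = {A: press(Y, A, mask) for A in range(1, 5)}
best_rank = min(errors, key=errors.get)
print(errors)
```

A rank-1 model misses the second systematic component entirely, so its held-out error is far larger than for ranks at or above the true rank; that gap is what the cross-validatory rank estimate exploits.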

Journal Article
TL;DR: A simple linear neuron model with constrained Hebbian-type synaptic modification is analyzed and a new class of unconstrained learning rules is derived.
Abstract: A simple linear neuron model with constrained Hebbian-type synaptic modification is analyzed and a new class of unconstrained learning rules is derived. It is shown that the model neuron tends to extract the principal component from a stationary input vector sequence.

2,405 citations
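The constrained Hebbian update analyzed in this paper is commonly known as Oja's rule. A minimal sketch, with synthetic inputs and all names invented for the example:

```python
import numpy as np

rng = np.random.default_rng(2)

# Zero-mean 2-D inputs whose covariance has its leading eigenvector along
# the first axis (std 3.0 vs 0.5).
X = rng.normal(size=(5000, 2)) * np.array([3.0, 0.5])

# Oja's rule:  y = w . x;  w <- w + eta * y * (x - y * w).
# The -y^2 * w decay term implicitly keeps the weight vector near unit
# norm, and w converges to the first principal component of the inputs.
w = rng.normal(size=2) * 0.1
for t, x in enumerate(X):
    eta = 1.0 / (500 + t)              # decreasing learning rate for stability
    y = w @ x
    w += eta * y * (x - y * w)

print(w)  # close to a unit vector along the first axis
```

Without the subtractive term (plain Hebbian learning, w += eta * y * x) the weight vector would grow without bound; the constraint is what makes the neuron extract a normalized principal component.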

Book Chapter
TL;DR: In this article, the authors present a theory of gradient analysis, in which the heuristic techniques are integrated with regression, calibration, ordination and constrained ordination as distinct, well-defined statistical problems.
Abstract: This chapter concerns data analysis techniques that assist the interpretation of community composition in terms of species' responses to environmental gradients in the broadest sense. All species occur in a characteristic, limited range of habitats, and within their range they tend to be most abundant around their particular environmental optimum. The composition of biotic communities thus changes along environmental gradients. Direct gradient analysis is a regression problem: fitting curves or surfaces to the relation between each species' abundance or probability of occurrence and one or more environmental variables. Ecologists have independently developed a variety of alternative techniques. Many of these techniques are essentially heuristic and have a less secure theoretical basis. This chapter presents a theory of gradient analysis, in which the heuristic techniques are integrated with regression, calibration, ordination and constrained ordination as distinct, well-defined statistical problems. The various techniques used for each type of problem are classified in families according to their implicit response model and the method used to estimate parameters of the model. Three such families are considered. The treatment shown here unites such apparently disparate data analysis techniques as linear regression, principal components analysis, redundancy analysis, Gaussian ordination, weighted averaging, reciprocal averaging, detrended correspondence analysis, and canonical correspondence analysis in a single theoretical framework.

2,289 citations

Posted Content
TL;DR: A new online optimization algorithm is proposed, based on stochastic approximations, which scales up gracefully to large data sets with millions of training samples, and extends naturally to various matrix factorization formulations, making it suitable for a wide range of learning problems.
Abstract: Sparse coding--that is, modelling data vectors as sparse linear combinations of basis elements--is widely used in machine learning, neuroscience, signal processing, and statistics. This paper focuses on the large-scale matrix factorization problem that consists of learning the basis set, adapting it to specific data. Variations of this problem include dictionary learning in signal processing, non-negative matrix factorization and sparse principal component analysis. In this paper, we propose to address these tasks with a new online optimization algorithm, based on stochastic approximations, which scales up gracefully to large datasets with millions of training samples, and extends naturally to various matrix factorization formulations, making it suitable for a wide range of learning problems. A proof of convergence is presented, along with experiments with natural images and genomic data demonstrating that it leads to state-of-the-art performance in terms of speed and optimization for both small and large datasets.

2,256 citations
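A heavily simplified sketch of the online matrix-factorization idea, not the paper's algorithm: the paper uses l1-penalized sparse coding and block-coordinate dictionary updates with accumulated sufficient statistics, whereas this toy version uses least-squares codes and a plain SGD step. The synthetic data and all names are invented:

```python
import numpy as np

rng = np.random.default_rng(3)

# Ground-truth dictionary: 5 unit-norm atoms in R^10; each sample mixes
# 2 randomly chosen atoms with Gaussian coefficients.
n_features, n_atoms = 10, 5
D_true = rng.normal(size=(n_features, n_atoms))
D_true /= np.linalg.norm(D_true, axis=0)

def sample():
    idx = rng.choice(n_atoms, size=2, replace=False)
    return D_true[:, idx] @ rng.normal(size=2)

def recon_error(D, n=200):
    # Mean reconstruction error of fresh samples under dictionary D.
    errs = []
    for _ in range(n):
        x = sample()
        code, *_ = np.linalg.lstsq(D, x, rcond=None)
        errs.append(np.linalg.norm(x - D @ code))
    return float(np.mean(errs))

# Online learning loop: one sample at a time, code it, then take one
# gradient step on the reconstruction error and renormalize the atoms.
D = rng.normal(size=(n_features, n_atoms))
D /= np.linalg.norm(D, axis=0)
err_before = recon_error(D)
eta = 0.05
for _ in range(2000):
    x = sample()
    code, *_ = np.linalg.lstsq(D, x, rcond=None)
    D += eta * np.outer(x - D @ code, code)   # grad step on ||x - D code||^2
    D /= np.linalg.norm(D, axis=0)
err_after = recon_error(D)
print(err_before, err_after)
```

Because each sample is processed once and then discarded, memory cost is independent of the number of samples, which is the property that lets the full algorithm scale to millions of training examples.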

Book Chapter
08 Oct 1997
TL;DR: A new method for performing a nonlinear form of Principal Component Analysis by the use of integral operator kernel functions is proposed and experimental results on polynomial feature extraction for pattern recognition are presented.
Abstract: A new method for performing a nonlinear form of Principal Component Analysis is proposed. By the use of integral operator kernel functions, one can efficiently compute principal components in high-dimensional feature spaces, related to input space by some nonlinear map; for instance, the space of all possible d-pixel products in images. We give the derivation of the method and present experimental results on polynomial feature extraction for pattern recognition.

2,223 citations
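The kernel trick described here can be sketched in a few lines: build a kernel matrix, double-center it to mimic mean removal in feature space, and eigendecompose. The RBF kernel, the two-rings data, and all names below are choices made for this illustration (the paper's experiments use polynomial kernels):

```python
import numpy as np

rng = np.random.default_rng(4)

# Two concentric noisy rings: linear PCA cannot separate them, but kernel
# PCA with an RBF kernel can.
n = 100
angles = rng.uniform(0, 2 * np.pi, size=2 * n)
radii = np.concatenate([np.full(n, 1.0), np.full(n, 4.0)])
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
X += 0.05 * rng.normal(size=X.shape)

# Kernel PCA: RBF kernel matrix, double-centering (mean removal in feature
# space), then eigendecomposition; eigenvectors scaled by sqrt(eigenvalue)
# are the samples' scores on the nonlinear principal components.
gamma = 0.5
sq = (X**2).sum(axis=1)
K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))
ones = np.full_like(K, 1.0 / (2 * n))
Kc = K - ones @ K - K @ ones + ones @ K @ ones
eigvals, eigvecs = np.linalg.eigh(Kc)          # ascending eigenvalue order
pc1 = eigvecs[:, -1] * np.sqrt(eigvals[-1])    # leading component scores

inner, outer = pc1[:n], pc1[n:]
print(inner.mean(), outer.mean())
```

The key point is that only the n x n kernel matrix is ever formed; the possibly enormous feature space (all d-pixel products, in the paper's example) is never constructed explicitly.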


Network Information
Related Topics (5)

Artificial neural network: 207K papers, 4.5M citations (84% related)
Cluster analysis: 146.5K papers, 2.9M citations (84% related)
Image processing: 229.9K papers, 3.5M citations (82% related)
Feature extraction: 111.8K papers, 2.1M citations (82% related)
Image segmentation: 79.6K papers, 1.8M citations (80% related)
Performance Metrics

No. of papers in the topic in previous years:

Year    Papers
2023    2,193
2022    4,793
2021    1,064
2020    1,090
2019    1,199
2018    1,169