Journal ArticleDOI

Mixtures of probabilistic principal component analyzers

01 Feb 1999-Neural Computation (MIT Press)-Vol. 11, Iss: 2, pp 443-482
TL;DR: PCA is formulated within a maximum likelihood framework, based on a specific form of gaussian latent variable model, which leads to a well-defined mixture model for probabilistic principal component analyzers, whose parameters can be determined using an expectation-maximization algorithm.
Abstract: Principal component analysis (PCA) is one of the most popular techniques for processing, compressing, and visualizing data, although its effectiveness is limited by its global linearity. While nonlinear variants of PCA have been proposed, an alternative paradigm is to capture data complexity by a combination of local linear PCA projections. However, conventional PCA does not correspond to a probability density, and so there is no unique way to combine PCA models. Therefore, previous attempts to formulate mixture models for PCA have been ad hoc to some extent. In this article, PCA is formulated within a maximum likelihood framework, based on a specific form of gaussian latent variable model. This leads to a well-defined mixture model for probabilistic principal component analyzers, whose parameters can be determined using an expectationmaximization algorithm. We discuss the advantages of this model in the context of clustering, density modeling, and local dimensionality reduction, and we demonstrate its application to image compression and handwritten digit recognition.
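
To make the model concrete, below is a minimal NumPy sketch of the EM updates for a single probabilistic PCA model, the building block of the mixture described in the abstract; the full mixture algorithm additionally weights each data point's sufficient statistics by posterior responsibilities. The function and variable names (ppca_em, W, sigma2, q) are illustrative, not taken from the authors' code.

```python
import numpy as np

def ppca_em(T, q, n_iter=200, seed=0):
    """EM for a single probabilistic PCA model: t = W x + mu + noise,
    with x ~ N(0, I_q) and noise ~ N(0, sigma2 * I_d).
    T is an (N, d) data matrix; returns (mu, W, sigma2)."""
    rng = np.random.default_rng(seed)
    N, d = T.shape
    mu = T.mean(axis=0)
    Tc = T - mu                        # centred data, N x d
    W = rng.standard_normal((d, q))    # random initialization of the loadings
    sigma2 = 1.0

    for _ in range(n_iter):
        # E-step: posterior moments of the latent variables x_n
        M = W.T @ W + sigma2 * np.eye(q)        # q x q
        Minv = np.linalg.inv(M)
        X = Tc @ W @ Minv                        # rows are <x_n>, N x q
        Sxx = N * sigma2 * Minv + X.T @ X        # sum_n <x_n x_n^T>

        # M-step: closed-form updates for W and sigma2
        W_new = (Tc.T @ X) @ np.linalg.inv(Sxx)  # d x q
        resid = (np.sum(Tc ** 2)
                 - 2.0 * np.sum((X @ W_new.T) * Tc)
                 + np.trace(Sxx @ W_new.T @ W_new))
        sigma2 = resid / (N * d)
        W = W_new

    return mu, W, sigma2

# toy usage: 3-D data lying near a 1-D subspace plus isotropic noise
rng = np.random.default_rng(1)
T = rng.standard_normal((500, 1)) @ np.array([[2.0, 1.0, 0.5]])
T += 0.1 * rng.standard_normal(T.shape)
mu, W, sigma2 = ppca_em(T, q=1)
print(sigma2)   # should approach the true noise variance (roughly 0.01)
```

At convergence the columns of W span the same subspace as the leading q eigenvectors of the sample covariance, and sigma2 tends to the average of the discarded eigenvalues, which matches the closed-form maximum likelihood solution discussed in the article.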


Citations
Christopher M. Bishop
01 Jan 2006
TL;DR: A textbook covering probability distributions, linear models for regression and classification, neural networks, kernel methods, graphical models, mixture models and EM, approximate inference, sampling methods, continuous latent variables, sequential data, and combining models.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

Book
01 Jan 2001
TL;DR: This text introduces the basic mathematical and computational methods of theoretical neuroscience and presents applications in a variety of areas including vision, sensory-motor integration, development, learning, and memory.
Abstract: Theoretical neuroscience provides a quantitative basis for describing what nervous systems do, determining how they function, and uncovering the general principles by which they operate. This text introduces the basic mathematical and computational methods of theoretical neuroscience and presents applications in a variety of areas including vision, sensory-motor integration, development, learning, and memory. The book is divided into three parts. Part I discusses the relationship between sensory stimuli and neural responses, focusing on the representation of information by the spiking activity of neurons. Part II discusses the modeling of neurons and neural circuits on the basis of cellular and synaptic biophysics. Part III analyzes the role of plasticity in development and learning. An appendix covers the mathematical methods used, and exercises are available on the book's Web site.

3,441 citations

Journal ArticleDOI
TL;DR: A probabilistic independent component analysis approach, optimized for the analysis of fMRI data, is reviewed and it is demonstrated that this is an effective and robust tool for the identification of low-frequency resting-state patterns from data acquired at various different spatial and temporal resolutions.
Abstract: Inferring resting-state connectivity patterns from functional magnetic resonance imaging (fMRI) data is a challenging task for any analytical technique. In this paper, we review a probabilistic independent component analysis (PICA) approach, optimized for the analysis of fMRI data, and discuss the role which this exploratory technique can take in scientific investigations into the structure of these effects. We apply PICA to fMRI data acquired at rest, in order to characterize the spatio-temporal structure of such data, and demonstrate that this is an effective and robust tool for the identification of low-frequency resting-state patterns from data acquired at various different spatial and temporal resolutions. We show that these networks exhibit high spatial consistency across subjects and closely resemble discrete cortical functional networks such as visual cortical areas or sensory-motor cortex.

3,252 citations


Cites background or methods from "Mixtures of probabilistic principal..."

  • ...In order to reduce computational load, therefore, we assumed a block-diagonal form of the data covariance matrix for the initial PCA dimensionality reduction, which is part of the spatial PICA decomposition....

  • ...Keywords: functional magnetic resonance imaging; brain connectivity; resting-state fluctuations; independent component analysis...

  • ...If we assume that the source distributions p(s) are Gaussian, the model then reduces to probabilistic principal component analysis (PCA) (Tipping & Bishop 1999) and we can use Bayesian model selection criteria....

  • ...Probabilistic PCA is used to infer upon the unknown number of sources and results in an estimate of the noise and a set of spatially whitened observations....

  • ...The spatial maps obtained from a PCA decomposition (figure 2c) have ≈0 spatial correlation, and fail to identify the ‘true’ spatial maps....

Journal ArticleDOI
TL;DR: An integrated approach to probabilistic independent component analysis for functional MRI (FMRI) data that allows for nonsquare mixing in the presence of Gaussian noise is presented and compared to the spatio-temporal accuracy of results obtained from classical ICA and GLM analyses.
Abstract: We present an integrated approach to probabilistic independent component analysis (ICA) for functional MRI (FMRI) data that allows for nonsquare mixing in the presence of Gaussian noise. In order to avoid overfitting, we employ objective estimation of the amount of Gaussian noise through Bayesian analysis of the true dimensionality of the data, i.e., the number of activation and non-Gaussian noise sources. This enables us to carry out probabilistic modeling and achieves an asymptotically unique decomposition of the data. It reduces problems of interpretation, as each final independent component is now much more likely to be due to only one physical or physiological process. We also describe other improvements to standard ICA, such as temporal prewhitening and variance normalization of timeseries, the latter being particularly useful in the context of dimensionality reduction when weak activation is present. We discuss the use of prior information about the spatiotemporal nature of the source processes, and an alternative-hypothesis testing approach for inference, using Gaussian mixture models. The performance of our approach is illustrated and evaluated on real and artificial FMRI data, and compared to the spatio-temporal accuracy of results obtained from classical ICA and GLM analyses.
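
As a rough illustration of the preprocessing this abstract describes, the sketch below variance-normalizes each voxel's timeseries and then uses a PPCA-style eigendecomposition to estimate an isotropic noise level and produce whitened observations. The array shapes, the fixed choice of q, and the helper names are assumptions for illustration; the paper itself selects the dimensionality with a Bayesian criterion rather than fixing it by hand.

```python
import numpy as np

def variance_normalise(data):
    """Scale each voxel's timeseries to unit variance (illustrative)."""
    std = data.std(axis=0, keepdims=True)
    std[std == 0] = 1.0
    return data / std

def ppca_whiten(data, q):
    """Reduce to q components; estimate the isotropic noise level sigma2 as
    the mean of the discarded eigenvalues (the PPCA maximum likelihood
    estimate) and return whitened observations."""
    n_t, n_v = data.shape                        # timepoints x voxels
    dm = data - data.mean(axis=0)
    cov = dm @ dm.T / n_v                        # small temporal covariance
    evals, evecs = np.linalg.eigh(cov)
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    sigma2 = evals[q:].mean()                    # noise estimate
    # project onto the leading q eigenvectors and rescale to unit variance
    whitened = np.diag(1.0 / np.sqrt(evals[:q] - sigma2)) @ evecs[:, :q].T @ dm
    return whitened, sigma2

# toy usage with illustrative sizes (100 timepoints, 5000 voxels)
rng = np.random.default_rng(0)
data = rng.standard_normal((100, 5000))
whitened, sigma2 = ppca_whiten(variance_normalise(data), q=20)
```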

2,597 citations


Cites methods from "Mixtures of probabilistic principal..."

  • ...If we assume that the source distributions are Gaussian, the probabilistic ICA model (2) reduces to the probabilistic PCA model [20]....

  • ...At the first stage we employ probabilistic PCA (PPCA, [20]) in order to find an appropriate linear subspace which contains the sources....

Journal ArticleDOI
TL;DR: In this article, a sparse subspace clustering algorithm is proposed to cluster high-dimensional data points that lie in a union of low-dimensional subspaces, where a sparse representation corresponds to selecting a few points from the same subspace.
Abstract: Many real-world problems deal with collections of high-dimensional data, such as images, videos, text, and web documents, DNA microarray data, and more. Often, such high-dimensional data lie close to low-dimensional structures corresponding to several classes or categories to which the data belong. In this paper, we propose and study an algorithm, called sparse subspace clustering, to cluster data points that lie in a union of low-dimensional subspaces. The key idea is that, among the infinitely many possible representations of a data point in terms of other points, a sparse representation corresponds to selecting a few points from the same subspace. This motivates solving a sparse optimization program whose solution is used in a spectral clustering framework to infer the clustering of the data into subspaces. Since solving the sparse optimization program is in general NP-hard, we consider a convex relaxation and show that, under appropriate conditions on the arrangement of the subspaces and the distribution of the data, the proposed minimization program succeeds in recovering the desired sparse representations. The proposed algorithm is efficient and can handle data points near the intersections of subspaces. Another key advantage of the proposed algorithm with respect to the state of the art is that it can deal directly with data nuisances, such as noise, sparse outlying entries, and missing entries, by incorporating the model of the data into the sparse optimization program. We demonstrate the effectiveness of the proposed algorithm through experiments on synthetic data as well as the two real-world problems of motion segmentation and face clustering.
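
A compact sketch of the core idea, using one common convex relaxation: each point is expressed as a sparse combination of the other points via a Lasso solve, the coefficients are symmetrized into an affinity matrix, and spectral clustering is run on that affinity. The regularization strength, toy data, and helper name are placeholders, and the sketch omits the paper's explicit handling of noise, sparse outlying entries, and missing data.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.cluster import SpectralClustering

def sparse_subspace_clustering(X, n_clusters, alpha=0.01):
    """X: (n_samples, n_features).  Returns cluster labels."""
    n = X.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        # express point i as a sparse combination of the other points
        idx = [j for j in range(n) if j != i]
        lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
        lasso.fit(X[idx].T, X[i])         # columns of the design are the other points
        C[i, idx] = lasso.coef_
    # symmetric affinity built from the sparse coefficients
    A = np.abs(C) + np.abs(C).T
    labels = SpectralClustering(n_clusters=n_clusters,
                                affinity='precomputed',
                                random_state=0).fit_predict(A)
    return labels

# toy usage: two 1-D subspaces in R^3
rng = np.random.default_rng(0)
X1 = rng.standard_normal((40, 1)) @ np.array([[1.0, 0.0, 0.0]])
X2 = rng.standard_normal((40, 1)) @ np.array([[0.0, 1.0, 1.0]])
X = np.vstack([X1, X2]) + 0.01 * rng.standard_normal((80, 3))
print(sparse_subspace_clustering(X, n_clusters=2))
```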

2,298 citations

References
Book
01 Jan 1995
TL;DR: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition, and is designed as a text, with over 100 exercises, to benefit anyone involved in the fields of neural computation and pattern recognition.
Abstract: From the Publisher: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition. After introducing the basic concepts, the book examines techniques for modelling probability density functions and the properties and merits of the multi-layer perceptron and radial basis function network models. Also covered are various forms of error functions, principal algorithms for error function minimization, learning and generalization in neural networks, and Bayesian techniques and their applications. Designed as a text, with over 100 exercises, this fully up-to-date work will benefit anyone involved in the fields of neural computation and pattern recognition.

19,056 citations

Book
01 May 1986
TL;DR: A comprehensive book-length treatment of principal component analysis, covering the properties and interpretation of population and sample principal components, graphical representation of data, selection of a subset of components, links to factor analysis and regression, PCA for time series and other non-independent data, and generalizations and adaptations of the technique.
Abstract: Introduction * Properties of Population Principal Components * Properties of Sample Principal Components * Interpreting Principal Components: Examples * Graphical Representation of Data Using Principal Components * Choosing a Subset of Principal Components or Variables * Principal Component Analysis and Factor Analysis * Principal Components in Regression Analysis * Principal Components Used with Other Multivariate Techniques * Outlier Detection, Influential Observations and Robust Estimation * Rotation and Interpretation of Principal Components * Principal Component Analysis for Time Series and Other Non-Independent Data * Principal Component Analysis for Special Types of Data * Generalizations and Adaptations of Principal Component Analysis

17,446 citations

Book ChapterDOI
TL;DR: The chapter discusses two important directions of research to improve learning algorithms: the dynamic node generation, which is used by the cascade correlation algorithm; and designing learning algorithms where the choice of parameters is not an issue.
Abstract: This chapter provides an account of different neural network architectures for pattern recognition. A neural network consists of several simple processing elements called neurons. Each neuron is connected to some other neurons and possibly to the input nodes. Neural networks provide a simple computing paradigm to perform complex recognition tasks in real time. The chapter categorizes neural networks into three types: single-layer networks, multilayer feedforward networks, and feedback networks. It discusses the gradient descent and the relaxation method as the two underlying mathematical themes for deriving learning algorithms. A lot of research activity is centered on learning algorithms because of their fundamental importance in neural networks. The chapter discusses two important directions of research to improve learning algorithms: the dynamic node generation, which is used by the cascade correlation algorithm; and designing learning algorithms where the choice of parameters is not an issue. It closes with the discussion of performance and implementation issues.

13,033 citations


"Mixtures of probabilistic principal..." refers background in this paper

  • ...Thus the updates for π̃i and µ̃i correspond exactly to those of a standard gaussian mixture formulation (e.g., see Bishop, 1995)....

  • ...This can be achieved with the use of a Lagrange multiplier λ (see Bishop, 1995) and maximizing ⟨L_C⟩ + λ(∑_{i=1}^{M} π_i − 1).... (The resulting update for π̃i is worked through after this list.)

  • ...Examples include principal curves (Hastie & Stuetzle, 1989; Tibshirani, 1992), multilayer autoassociative neural networks (Kramer, 1991), the kernel-function approach of Webb (1996), and the generative topographic mapping (GTM) of Bishop, Svensén, and Williams (1998). An alternative paradigm to such global nonlinear approaches is to model nonlinear structure with a collection, or mixture, of local linear submodels....

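For context, the constrained maximization quoted in the excerpt above works out as follows; this is the standard Lagrange-multiplier derivation, written here in LaTeX, with R_ni denoting the posterior responsibility of component i for data point t_n:

```latex
% Stationarity of <L_C> + \lambda (\sum_i \pi_i - 1) with respect to \pi_i:
\frac{\partial}{\partial \pi_i}\!\left[\langle L_C\rangle
    + \lambda\Big(\sum_{i=1}^{M}\pi_i - 1\Big)\right]
  = \sum_{n=1}^{N}\frac{R_{ni}}{\pi_i} + \lambda = 0 .
% Multiplying by \pi_i, summing over i, and using \sum_i R_{ni} = 1
% gives \lambda = -N, so the update is
\tilde{\pi}_i = \frac{1}{N}\sum_{n=1}^{N} R_{ni} .
```

This is exactly the mixing-proportion update of a standard gaussian mixture, which is the point the first excerpt above makes.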

Journal ArticleDOI
TL;DR: This paper is concerned with finding the lines and planes of closest fit to systems of points in space, chosen to minimize the sum of squared perpendicular distances from the points to the fitted line or plane.
Abstract: (1901). LIII. On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science: Vol. 2, No. 11, pp. 559-572.

10,656 citations


"Mixtures of probabilistic principal..." refers background in this paper

  • ...A complementary property of PCA, and that most closely related to the original discussions of Pearson (1901), is that the projection onto the principal subspace minimizes the squared reconstruction error ∑_n ‖t_n − t̂_n‖²....

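A small numerical check of the property quoted above, assuming only NumPy: the squared reconstruction error of the projection onto the leading principal subspace is compared against that of a random subspace of the same dimension, and the PCA projection attains the smaller value.

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 5))   # correlated data
Tc = T - T.mean(axis=0)

def recon_error(Tc, U):
    """Squared reconstruction error sum_n ||t_n - t_hat_n||^2 for an
    orthonormal basis U (columns) of the projection subspace."""
    That = Tc @ U @ U.T
    return np.sum((Tc - That) ** 2)

# leading q principal directions from the sample covariance
q = 2
evals, evecs = np.linalg.eigh(Tc.T @ Tc)
U_pca = evecs[:, np.argsort(evals)[::-1][:q]]

# a random orthonormal q-dimensional subspace for comparison
U_rand, _ = np.linalg.qr(rng.standard_normal((5, q)))

print(recon_error(Tc, U_pca), "<=", recon_error(Tc, U_rand))   # PCA error is smallest
```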
