Journal ArticleDOI

Mixtures of probabilistic principal component analyzers

01 Feb 1999-Neural Computation (MIT Press)-Vol. 11, Iss: 2, pp 443-482
TL;DR: PCA is formulated within a maximum likelihood framework, based on a specific form of gaussian latent variable model, which leads to a well-defined mixture model for probabilistic principal component analyzers, whose parameters can be determined using an expectation-maximization algorithm.
Abstract: Principal component analysis (PCA) is one of the most popular techniques for processing, compressing, and visualizing data, although its effectiveness is limited by its global linearity. While nonlinear variants of PCA have been proposed, an alternative paradigm is to capture data complexity by a combination of local linear PCA projections. However, conventional PCA does not correspond to a probability density, and so there is no unique way to combine PCA models. Therefore, previous attempts to formulate mixture models for PCA have been ad hoc to some extent. In this article, PCA is formulated within a maximum likelihood framework, based on a specific form of gaussian latent variable model. This leads to a well-defined mixture model for probabilistic principal component analyzers, whose parameters can be determined using an expectationmaximization algorithm. We discuss the advantages of this model in the context of clustering, density modeling, and local dimensionality reduction, and we demonstrate its application to image compression and handwritten digit recognition.
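
To make the model concrete, below is a minimal NumPy sketch of the EM updates for a single probabilistic PCA model, the building block of the mixture described in the abstract; the full mixture algorithm additionally weights each data point's sufficient statistics by posterior responsibilities. The function and variable names (ppca_em, W, sigma2, q) are illustrative, not taken from the authors' code.

```python
import numpy as np

def ppca_em(T, q, n_iter=200, seed=0):
    """EM for a single probabilistic PCA model: t = W x + mu + noise,
    with x ~ N(0, I_q) and noise ~ N(0, sigma2 * I_d).
    T is an (N, d) data matrix; returns (mu, W, sigma2)."""
    rng = np.random.default_rng(seed)
    N, d = T.shape
    mu = T.mean(axis=0)
    Tc = T - mu                        # centred data, N x d
    W = rng.standard_normal((d, q))    # random initialization of the loadings
    sigma2 = 1.0

    for _ in range(n_iter):
        # E-step: posterior moments of the latent variables x_n
        M = W.T @ W + sigma2 * np.eye(q)        # q x q
        Minv = np.linalg.inv(M)
        X = Tc @ W @ Minv                        # rows are <x_n>, N x q
        Sxx = N * sigma2 * Minv + X.T @ X        # sum_n <x_n x_n^T>

        # M-step: closed-form updates for W and sigma2
        W_new = (Tc.T @ X) @ np.linalg.inv(Sxx)  # d x q
        resid = (np.sum(Tc ** 2)
                 - 2.0 * np.sum((X @ W_new.T) * Tc)
                 + np.trace(Sxx @ W_new.T @ W_new))
        sigma2 = resid / (N * d)
        W = W_new

    return mu, W, sigma2

# toy usage: 3-D data lying near a 1-D subspace plus isotropic noise
rng = np.random.default_rng(1)
T = rng.standard_normal((500, 1)) @ np.array([[2.0, 1.0, 0.5]])
T += 0.1 * rng.standard_normal(T.shape)
mu, W, sigma2 = ppca_em(T, q=1)
print(sigma2)   # should approach the true noise variance (roughly 0.01)
```

At convergence the columns of W span the same subspace as the leading q eigenvectors of the sample covariance, and sigma2 tends to the average of the discarded eigenvalues, which matches the closed-form maximum likelihood solution discussed in the article.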


Citations
Christopher M. Bishop
01 Jan 2006
TL;DR: A textbook covering probability distributions, linear models for regression and classification, neural networks, kernel methods, graphical models, mixture models and EM, approximate inference, sampling methods, continuous latent variables, sequential data, and combining models.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

Book
01 Jan 2001
TL;DR: This text introduces the basic mathematical and computational methods of theoretical neuroscience and presents applications in a variety of areas including vision, sensory-motor integration, development, learning, and memory.
Abstract: Theoretical neuroscience provides a quantitative basis for describing what nervous systems do, determining how they function, and uncovering the general principles by which they operate. This text introduces the basic mathematical and computational methods of theoretical neuroscience and presents applications in a variety of areas including vision, sensory-motor integration, development, learning, and memory. The book is divided into three parts. Part I discusses the relationship between sensory stimuli and neural responses, focusing on the representation of information by the spiking activity of neurons. Part II discusses the modeling of neurons and neural circuits on the basis of cellular and synaptic biophysics. Part III analyzes the role of plasticity in development and learning. An appendix covers the mathematical methods used, and exercises are available on the book's Web site.

3,441 citations

Journal ArticleDOI
TL;DR: A probabilistic independent component analysis approach, optimized for the analysis of fMRI data, is reviewed and it is demonstrated that this is an effective and robust tool for the identification of low-frequency resting-state patterns from data acquired at various different spatial and temporal resolutions.
Abstract: Inferring resting-state connectivity patterns from functional magnetic resonance imaging (fMRI) data is a challenging task for any analytical technique. In this paper, we review a probabilistic independent component analysis (PICA) approach, optimized for the analysis of fMRI data, and discuss the role which this exploratory technique can take in scientific investigations into the structure of these effects. We apply PICA to fMRI data acquired at rest, in order to characterize the spatio-temporal structure of such data, and demonstrate that this is an effective and robust tool for the identification of low-frequency resting-state patterns from data acquired at various different spatial and temporal resolutions. We show that these networks exhibit high spatial consistency across subjects and closely resemble discrete cortical functional networks such as visual cortical areas or sensory-motor cortex.

3,252 citations


Cites background or methods from "Mixtures of probabilistic principal..."

  • ...In order to reduce computational load, therefore, we assumed a block-diagonal form of the data covariance matrix for the initial PCA dimensionality reduction, which is part of the spatial PICA decomposition....

  • ...Keywords: functional magnetic resonance imaging; brain connectivity; resting-state fluctuations; independent component analysis...

  • ...If we assume that the source distributions p(s) are Gaussian, the model then reduces to probabilistic principal component analysis (PCA) (Tipping & Bishop 1999) and we can use Bayesian model selection criteria....

  • ...Probabilistic PCA is used to infer upon the unknown number of sources and results in an estimate of the noise and a set of spatially whitened observations....

  • ...The spatial maps obtained from a PCA decomposition (figure 2c) have ≈0 spatial correlation, and fail to identify the ‘true’ spatial maps....

Journal ArticleDOI
TL;DR: An integrated approach to probabilistic independent component analysis for functional MRI (FMRI) data that allows for nonsquare mixing in the presence of Gaussian noise is presented and compared to the spatio-temporal accuracy of results obtained from classical ICA and GLM analyses.
Abstract: We present an integrated approach to probabilistic independent component analysis (ICA) for functional MRI (FMRI) data that allows for nonsquare mixing in the presence of Gaussian noise. In order to avoid overfitting, we employ objective estimation of the amount of Gaussian noise through Bayesian analysis of the true dimensionality of the data, i.e., the number of activation and non-Gaussian noise sources. This enables us to carry out probabilistic modeling and achieves an asymptotically unique decomposition of the data. It reduces problems of interpretation, as each final independent component is now much more likely to be due to only one physical or physiological process. We also describe other improvements to standard ICA, such as temporal prewhitening and variance normalization of timeseries, the latter being particularly useful in the context of dimensionality reduction when weak activation is present. We discuss the use of prior information about the spatiotemporal nature of the source processes, and an alternative-hypothesis testing approach for inference, using Gaussian mixture models. The performance of our approach is illustrated and evaluated on real and artificial FMRI data, and compared to the spatio-temporal accuracy of results obtained from classical ICA and GLM analyses.
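
As a rough illustration of the preprocessing this abstract describes, the sketch below variance-normalizes each voxel's timeseries and then uses a PPCA-style eigendecomposition to estimate an isotropic noise level and produce whitened observations. The array shapes, the fixed choice of q, and the helper names are assumptions for illustration; the paper itself selects the dimensionality with a Bayesian criterion rather than fixing it by hand.

```python
import numpy as np

def variance_normalise(data):
    """Scale each voxel's timeseries to unit variance (illustrative)."""
    std = data.std(axis=0, keepdims=True)
    std[std == 0] = 1.0
    return data / std

def ppca_whiten(data, q):
    """Reduce to q components; estimate the isotropic noise level sigma2 as
    the mean of the discarded eigenvalues (the PPCA maximum likelihood
    estimate) and return whitened observations."""
    n_t, n_v = data.shape                        # timepoints x voxels
    dm = data - data.mean(axis=0)
    cov = dm @ dm.T / n_v                        # small temporal covariance
    evals, evecs = np.linalg.eigh(cov)
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    sigma2 = evals[q:].mean()                    # noise estimate
    # project onto the leading q eigenvectors and rescale to unit variance
    whitened = np.diag(1.0 / np.sqrt(evals[:q] - sigma2)) @ evecs[:, :q].T @ dm
    return whitened, sigma2

# toy usage with illustrative sizes (100 timepoints, 5000 voxels)
rng = np.random.default_rng(0)
data = rng.standard_normal((100, 5000))
whitened, sigma2 = ppca_whiten(variance_normalise(data), q=20)
```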

2,597 citations


Cites methods from "Mixtures of probabilistic principal..."

  • ...If we assume that the source distributions are Gaussian, the probabilistic ICA model (2) reduces to the probabilistic PCA model [20]....

  • ...At the first stage we employ probabilistic PCA (PPCA, [20]) in order to find an appropriate linear subspace which contains the sources....

Journal ArticleDOI
TL;DR: In this article, a sparse subspace clustering algorithm is proposed to cluster high-dimensional data points that lie in a union of low-dimensional subspaces, where a sparse representation corresponds to selecting a few points from the same subspace.
Abstract: Many real-world problems deal with collections of high-dimensional data, such as images, videos, text, and web documents, DNA microarray data, and more. Often, such high-dimensional data lie close to low-dimensional structures corresponding to several classes or categories to which the data belong. In this paper, we propose and study an algorithm, called sparse subspace clustering, to cluster data points that lie in a union of low-dimensional subspaces. The key idea is that, among the infinitely many possible representations of a data point in terms of other points, a sparse representation corresponds to selecting a few points from the same subspace. This motivates solving a sparse optimization program whose solution is used in a spectral clustering framework to infer the clustering of the data into subspaces. Since solving the sparse optimization program is in general NP-hard, we consider a convex relaxation and show that, under appropriate conditions on the arrangement of the subspaces and the distribution of the data, the proposed minimization program succeeds in recovering the desired sparse representations. The proposed algorithm is efficient and can handle data points near the intersections of subspaces. Another key advantage of the proposed algorithm with respect to the state of the art is that it can deal directly with data nuisances, such as noise, sparse outlying entries, and missing entries, by incorporating the model of the data into the sparse optimization program. We demonstrate the effectiveness of the proposed algorithm through experiments on synthetic data as well as the two real-world problems of motion segmentation and face clustering.
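
A compact sketch of the core idea, using one common convex relaxation: each point is expressed as a sparse combination of the other points via a Lasso solve, the coefficients are symmetrized into an affinity matrix, and spectral clustering is run on that affinity. The regularization strength, toy data, and helper name are placeholders, and the sketch omits the paper's explicit handling of noise, sparse outlying entries, and missing data.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.cluster import SpectralClustering

def sparse_subspace_clustering(X, n_clusters, alpha=0.01):
    """X: (n_samples, n_features).  Returns cluster labels."""
    n = X.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        # express point i as a sparse combination of the other points
        idx = [j for j in range(n) if j != i]
        lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
        lasso.fit(X[idx].T, X[i])         # columns of the design are the other points
        C[i, idx] = lasso.coef_
    # symmetric affinity built from the sparse coefficients
    A = np.abs(C) + np.abs(C).T
    labels = SpectralClustering(n_clusters=n_clusters,
                                affinity='precomputed',
                                random_state=0).fit_predict(A)
    return labels

# toy usage: two 1-D subspaces in R^3
rng = np.random.default_rng(0)
X1 = rng.standard_normal((40, 1)) @ np.array([[1.0, 0.0, 0.0]])
X2 = rng.standard_normal((40, 1)) @ np.array([[0.0, 1.0, 1.0]])
X = np.vstack([X1, X2]) + 0.01 * rng.standard_normal((80, 3))
print(sparse_subspace_clustering(X, n_clusters=2))
```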

2,298 citations

References
Book
01 Jan 1995
TL;DR: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition, and is designed as a text, with over 100 exercises, to benefit anyone involved in the fields of neural computation and pattern recognition.
Abstract: From the Publisher: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition. After introducing the basic concepts, the book examines techniques for modelling probability density functions and the properties and merits of the multi-layer perceptron and radial basis function network models. Also covered are various forms of error functions, principal algorithms for error function minimization, learning and generalization in neural networks, and Bayesian techniques and their applications. Designed as a text, with over 100 exercises, this fully up-to-date work will benefit anyone involved in the fields of neural computation and pattern recognition.

19,056 citations

Book
01 May 1986
TL;DR: A comprehensive book-length treatment of principal component analysis, covering the properties and interpretation of population and sample principal components, graphical representation of data, selection of a subset of components, links to factor analysis and regression, PCA for time series and other non-independent data, and generalizations and adaptations of the technique.
Abstract: Introduction * Properties of Population Principal Components * Properties of Sample Principal Components * Interpreting Principal Components: Examples * Graphical Representation of Data Using Principal Components * Choosing a Subset of Principal Components or Variables * Principal Component Analysis and Factor Analysis * Principal Components in Regression Analysis * Principal Components Used with Other Multivariate Techniques * Outlier Detection, Influential Observations and Robust Estimation * Rotation and Interpretation of Principal Components * Principal Component Analysis for Time Series and Other Non-Independent Data * Principal Component Analysis for Special Types of Data * Generalizations and Adaptations of Principal Component Analysis

17,446 citations

Book ChapterDOI
TL;DR: The chapter discusses two important directions of research to improve learning algorithms: the dynamic node generation, which is used by the cascade correlation algorithm; and designing learning algorithms where the choice of parameters is not an issue.
Abstract: This chapter provides an account of different neural network architectures for pattern recognition. A neural network consists of several simple processing elements called neurons. Each neuron is connected to some other neurons and possibly to the input nodes. Neural networks provide a simple computing paradigm to perform complex recognition tasks in real time. The chapter categorizes neural networks into three types: single-layer networks, multilayer feedforward networks, and feedback networks. It discusses the gradient descent and the relaxation method as the two underlying mathematical themes for deriving learning algorithms. A lot of research activity is centered on learning algorithms because of their fundamental importance in neural networks. The chapter discusses two important directions of research to improve learning algorithms: the dynamic node generation, which is used by the cascade correlation algorithm; and designing learning algorithms where the choice of parameters is not an issue. It closes with the discussion of performance and implementation issues.

13,033 citations


"Mixtures of probabilistic principal..." refers background in this paper

  • ...Thus the updates for π̃i and µ̃i correspond exactly to those of a standard gaussian mixture formulation (e.g., see Bishop, 1995)....

  • ...This can be achieved with the use of a Lagrange multiplier λ (see Bishop, 1995) and maximizing ⟨L_C⟩ + λ(∑_{i=1}^{M} π_i − 1).... (The resulting update for π̃i is worked through after this list.)

  • ...Examples include principal curves (Hastie & Stuetzle, 1989; Tibshirani, 1992), multilayer autoassociative neural networks (Kramer, 1991), the kernel-function approach of Webb (1996), and the generative topographic mapping (GTM) of Bishop, Svensén, and Williams (1998). An alternative paradigm to such global nonlinear approaches is to model nonlinear structure with a collection, or mixture, of local linear submodels....

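For context, the constrained maximization quoted in the excerpt above works out as follows; this is the standard Lagrange-multiplier derivation, written here in LaTeX, with R_ni denoting the posterior responsibility of component i for data point t_n:

```latex
% Stationarity of <L_C> + \lambda (\sum_i \pi_i - 1) with respect to \pi_i:
\frac{\partial}{\partial \pi_i}\!\left[\langle L_C\rangle
    + \lambda\Big(\sum_{i=1}^{M}\pi_i - 1\Big)\right]
  = \sum_{n=1}^{N}\frac{R_{ni}}{\pi_i} + \lambda = 0 .
% Multiplying by \pi_i, summing over i, and using \sum_i R_{ni} = 1
% gives \lambda = -N, so the update is
\tilde{\pi}_i = \frac{1}{N}\sum_{n=1}^{N} R_{ni} .
```

This is exactly the mixing-proportion update of a standard gaussian mixture, which is the point the first excerpt above makes.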

Journal ArticleDOI
TL;DR: This paper is concerned with finding the lines and planes of closest fit to systems of points in space, chosen to minimize the sum of squared perpendicular distances from the points to the fitted line or plane.
Abstract: (1901). LIII. On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science: Vol. 2, No. 11, pp. 559-572.

10,656 citations


"Mixtures of probabilistic principal..." refers background in this paper

  • ...A complementary property of PCA, and that most closely related to the original discussions of Pearson (1901), is that the projection onto the principal subspace minimizes the squared reconstruction error ∑_n ‖t_n − t̂_n‖²....

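A small numerical check of the property quoted above, assuming only NumPy: the squared reconstruction error of the projection onto the leading principal subspace is compared against that of a random subspace of the same dimension, and the PCA projection attains the smaller value.

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 5))   # correlated data
Tc = T - T.mean(axis=0)

def recon_error(Tc, U):
    """Squared reconstruction error sum_n ||t_n - t_hat_n||^2 for an
    orthonormal basis U (columns) of the projection subspace."""
    That = Tc @ U @ U.T
    return np.sum((Tc - That) ** 2)

# leading q principal directions from the sample covariance
q = 2
evals, evecs = np.linalg.eigh(Tc.T @ Tc)
U_pca = evecs[:, np.argsort(evals)[::-1][:q]]

# a random orthonormal q-dimensional subspace for comparison
U_rand, _ = np.linalg.qr(rng.standard_normal((5, q)))

print(recon_error(Tc, U_pca), "<=", recon_error(Tc, U_rand))   # PCA error is smallest
```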
