
Laplacian Eigenmaps for dimensionality reduction and data representation

01 Jun 2003, Neural Computation (MIT Press), Vol. 15, Iss. 6, pp. 1373–1396
TL;DR: In this article, the authors proposed a geometrically motivated algorithm for representing high-dimensional data, based on the correspondence between the graph Laplacian, the Laplace Beltrami operator on the manifold and the connections to the heat equation.
Abstract: One of the central problems in machine learning and pattern recognition is to develop appropriate representations for complex data. We consider the problem of constructing a representation for data lying on a low-dimensional manifold embedded in a high-dimensional space. Drawing on the correspondence between the graph Laplacian, the Laplace Beltrami operator on the manifold, and the connections to the heat equation, we propose a geometrically motivated algorithm for representing the high-dimensional data. The algorithm provides a computationally efficient approach to nonlinear dimensionality reduction that has locality-preserving properties and a natural connection to clustering. Some potential applications and illustrative examples are discussed.

Summary (2 min read)

1 Introduction

  • In many areas of artificial intelligence, information retrieval, and data mining, one is often confronted with intrinsically low-dimensional data lying in a very high-dimensional space.
  • The general problem of dimensionality reduction has a long history.
  • Classical approaches include principal components analysis (PCA) and multidimensional scaling.
  • Most of these methods do not explicitly consider the structure of the manifold on which the data may possibly reside.
  • Thus, the embedding maps for the data approximate the eigenmaps of the Laplace Beltrami operator, which are maps intrinsically defined on the entire manifold.

2 The Algorithm

  • The embedding map is now provided by computing the eigenvectors of the graph Laplacian.
  • Step 1 (constructing the adjacency graph).
  • In the n-nearest-neighbors variation, the parameter is easier to choose and the construction does not tend to produce disconnected graphs, though it is less geometrically intuitive.
  • Step 2 (choosing the weights). Here as well, the authors have two variations for weighting the edges: (a) the heat kernel (parameter t ∈ R) and (b) simple 0–1 weights, a simplification that avoids the need to choose t.
  • Step 3 (eigenmaps). The embedding coordinates are given by the eigenvectors of the generalized eigenvalue problem Lf = λDf, where L = D − W is the graph Laplacian and D is the diagonal weight matrix; the trivial constant eigenvector is discarded. A minimal end-to-end sketch follows.
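
As a quick, off-the-shelf illustration of these three steps, the sketch below uses scikit-learn's SpectralEmbedding, which computes a closely related spectral embedding (a nearest-neighbor adjacency graph followed by eigenvectors of a normalized graph Laplacian). It is a stand-in rather than the authors' implementation, and the data and parameter values are placeholders.

    import numpy as np
    from sklearn.manifold import SpectralEmbedding

    # Placeholder data: 1000 points in R^50 assumed to lie near a low-dimensional manifold.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 50))

    # Step 1: n-nearest-neighbor adjacency graph; Steps 2-3: graph Laplacian eigenvectors.
    emb = SpectralEmbedding(n_components=2, affinity="nearest_neighbors", n_neighbors=10)
    Y = emb.fit_transform(X)  # Y[i] is the 2-dimensional representation of X[i]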

3 Justification

  • Let us first show that the embedding provided by the Laplacian eigenmap algorithm preserves local information optimally in a certain sense.
  • The following section is based on standard spectral graph theory.
  • It follows from equation 3.1 that L is a positive semidefinite matrix, and the vector y that minimizes the objective function is given by the minimum eigenvalue solution to the generalized eigenvalue problem: Ly = λDy.
  • For the one-dimensional embedding problem, the constraint prevents collapse onto a point; the objective and constraint are written out after this list.
  • This observation leads to several possible approximation schemes for the manifold Laplacian.
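
Spelled out in the notation of the paper, the embedding referred to in these bullets comes from the constrained problem

    minimize  y^T L y = (1/2) ∑_{i,j} (y_i − y_j)^2 W_ij    subject to  y^T D y = 1,

where the quadratic form is the locality-preserving objective of equation 3.1. Once the trivial constant solution is excluded, the minimizer is the eigenvector of Ly = λDy with the smallest nonzero eigenvalue, and the constraint y^T D y = 1 is what prevents the collapse onto a single point.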

4 Connections to Spectral Clustering

  • The approach to dimensionality reduction considered in this letter uses maps provided by the eigenvectors of the graph Laplacian and the eigenfunctions of the Laplace Beltrami operator on the manifold.
  • The approach considered there uses a graph that is globally connected with exponentially decaying weights.
  • The weight Wij associated with the edge eij is the similarity between vi and vj.
  • The authors assume that the matrix of pairwise similarities is symmetric and the corresponding undirected graph is connected.
  • The central observation to be made here is that the process of dimensionality reduction that preserves locality yields the same solution as clustering; this connection is made concrete right after this list.
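
One standard way to make this observation concrete is through the normalized-cut formulation of Shi and Malik (1997) cited in this letter: the relaxed two-way clustering problem is

    minimize  (y^T L y) / (y^T D y)    subject to  y^T D 1 = 0,

whose solution is the eigenvector of Ly = λDy with the smallest nonzero eigenvalue. That is exactly the one-dimensional Laplacian eigenmap, so thresholding the embedding coordinate turns the locality-preserving map into a two-way partition of the graph.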

5 Analysis of Locally Linear Embedding Algorithm

  • The authors provide a brief analysis of the LLE algorithm recently proposed by Roweis and Saul (2000) and show its connection to the Laplacian.
  • Step 1 (discovering the adjacency information).
  • Let Wij be such that ∑j Wij xij equals the orthogonal projection of xi onto the affine linear span of the xij’s (this computation is sketched after the list).
  • The authors develop this argument over several steps: Step 1: Let us fix a data point xi.
  • Since the difference of two points can be regarded as a vector with the origin at the second point, the authors see that the vj’s are vectors in the tangent plane originating at o = xi.
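
The reconstruction weights Wij described above can be obtained from a small constrained least-squares problem for each point. The sketch below is a generic illustration of that computation, not code from this paper or from Roweis and Saul; the regularization term is a common practical addition for the case where the local Gram matrix is singular.

    import numpy as np

    def lle_weights(x_i, neighbors, reg=1e-3):
        """Weights w with sum(w) = 1 such that sum_j w[j] * neighbors[j] is the point
        of the neighbors' affine span closest to x_i (its orthogonal projection)."""
        Z = neighbors - x_i                         # shift so x_i sits at the origin
        C = Z @ Z.T                                 # local Gram matrix of the neighbors
        C = C + reg * np.trace(C) * np.eye(len(C))  # regularize in case C is singular
        w = np.linalg.solve(C, np.ones(len(C)))
        return w / w.sum()                          # enforce the sum-to-one constraint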

6 Examples

  • The authors now briefly consider several possible applications of the algorithmic framework developed in this letter.
  • The authors choose 1000 images (500 containing vertical bars and 500 containing horizontal bars) at random.
  • Each word is represented as a vector in a 600-dimensional space using information about the frequency of its left and right neighbors (computed from the corpus); a sketch of this construction follows the list.
  • Points mapped to the same region in the representation space share similar phonetic features, though points with the same label may originate from different occurrences of the same phoneme.
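
The 600-dimensional word representation mentioned above can be built by counting immediate left and right neighbors. The sketch below is a generic reconstruction of that idea under the assumption that a fixed list of 300 context words is tracked for each direction (giving 600 dimensions); the function and variable names are hypothetical and the corpus handling is simplified.

    from collections import defaultdict
    import numpy as np

    def word_vectors(tokens, context_vocab):
        """Map each word to a 2*len(context_vocab) vector: counts of each context word
        occurring immediately to its left (first half) or right (second half)."""
        index = {w: i for i, w in enumerate(context_vocab)}  # e.g., 300 assumed context words
        d = len(context_vocab)
        counts = defaultdict(lambda: np.zeros(2 * d))
        for prev, cur in zip(tokens, tokens[1:]):
            if prev in index:                    # prev is a left neighbor of cur
                counts[cur][index[prev]] += 1
            if cur in index:                     # cur is a right neighbor of prev
                counts[prev][d + index[cur]] += 1
        return counts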

7 Conclusions

  • The authors introduced a coherent framework for dimensionality reduction for the case where data reside on a low-dimensional manifold embedded in a higher-dimensional space.
  • The resulting maps, however, do not in general provide an isometric embedding.
  • It is unclear how to estimate reliably even such a simple invariant as the intrinsic dimensionality of the manifold.
  • There are further issues pertaining to their framework that need to be sorted out.
  • First, the authors have implicitly assumed a uniform probability distribution on the manifold according to which the data points have been sampled.


LETTER Communicated by Joshua B. Tenenbaum
Laplacian Eigenmaps for Dimensionality Reduction and Data
Representation
Mikhail Belkin
misha@math.uchicago.edu
Department of Mathematics, University of Chicago, Chicago, IL 60637, U.S.A.
Partha Niyogi
niyogi@cs.uchicago.edu
Department of Computer Science and Statistics, University of Chicago,
Chicago, IL 60637 U.S.A.
One of the central problems in machine learning and pattern recognition
is to develop appropriate representations for complex data. We consider
the problem of constructing a representation for data lying on a low-
dimensional manifold embedded in a high-dimensional space. Drawing
on the correspondence between the graph Laplacian, the Laplace Beltrami
operator on the manifold, and the connections to the heat equation, we
propose a geometrically motivated algorithm for representing the high-
dimensional data. The algorithm provides a computationally efficient ap-
proach to nonlinear dimensionality reduction that has locality-preserving
properties and a natural connection to clustering. Some potential appli-
cations and illustrative examples are discussed.
1 Introduction
In many areas of artificial intelligence, information retrieval, and data min-
ing, one is often confronted with intrinsically low-dimensional data lying in
a very high-dimensional space. Consider, for example, gray-scale images of
an object taken under fixed lighting conditions with a moving camera. Each
such image would typically be represented by a brightness value at each
pixel. If there were n^2 pixels in all (corresponding to an n × n image), then each image yields a data point in R^{n^2}. However, the intrinsic dimensionality of the space of all images of the same object is the number of degrees of freedom of the camera. In this case, the space under consideration has the natural structure of a low-dimensional manifold embedded in R^{n^2}.
Recently, there has been some renewed interest (Tenenbaum, de Silva,
& Langford, 2000; Roweis & Saul, 2000) in the problem of developing low-
dimensional representations when data arise from sampling a probabil-
ity distribution on a manifold. In this letter, we present a geometrically
motivated algorithm and an accompanying framework of analysis for this
problem.
The general problem of dimensionality reduction has a long history. Clas-
sical approaches include principal components analysis (PCA) and multi-
dimensional scaling. Various methods that generate nonlinear maps have
also been considered. Most of them, such as self-organizing maps and other
neural network–based approaches (e.g., Haykin, 1999), set up a nonlin-
ear optimization problem whose solution is typically obtained by gradient
descent that is guaranteed only to produce a local optimum; global op-
tima are difficult to attain by efficient means. Note, however, that the re-
cent approach of generalizing the PCA through kernel-based techniques
(Schölkopf, Smola, & Müller, 1998) does not have this shortcoming. Most of
these methods do not explicitly consider the structure of the manifold on
which the data may possibly reside.
In this letter, we explore an approach that builds a graph incorporating
neighborhood information of the data set. Using the notion of the Laplacian
of the graph, we then compute a low-dimensional representation of the data
set that optimally preserves local neighborhood information in a certain
sense. The representation map generated by the algorithm may be viewed
as a discrete approximation to a continuous map that naturally arises from
the geometry of the manifold.
It is worthwhile to highlight several aspects of the algorithm and the
framework of analysis presented here:
The core algorithm is very simple. It has a few local computations and
one sparse eigenvalue problem. The solution reflects the intrinsic geo-
metric structure of the manifold. It does, however, require a search for
neighboring points in a high-dimensional space. We note that there are
several efficient approximate techniques for finding nearest neighbors
(e.g., Indyk, 2000).
The justification for the algorithm comes from the role of the Laplace
Beltrami operator in providing an optimal embedding for the mani-
fold. The manifold is approximated by the adjacency graph computed
from the data points. The Laplace Beltrami operator is approximated
by the weighted Laplacian of the adjacency graph with weights cho-
sen appropriately. The key role of the Laplace Beltrami operator in the
heat equation enables us to use the heat kernel to choose the weight
decay function in a principled manner. Thus, the embedding maps for
the data approximate the eigenmaps of the Laplace Beltrami operator,
which are maps intrinsically defined on the entire manifold.
The framework of analysis presented here makes explicit use of these
connections to interpret dimensionality-reduction algorithms in a ge-
ometric fashion. In addition to the algorithms presented in this letter,
we are also able to reinterpret the recently proposed locally linear embedding (LLE) algorithm of Roweis and Saul (2000) within this framework.
The graph Laplacian has been widely used for different clustering and
partition problems (Shi & Malik, 1997; Simon, 1991; Ng, Jordan, &
Weiss, 2002). Although the connections between the Laplace Beltrami
operator and the graph Laplacian are well known to geometers and
specialists in spectral graph theory (Chung, 1997; Chung, Grigor’yan,
& Yau, 2000), so far we are not aware of any application to dimen-
sionality reduction or data representation. We note, however, recent
work on using diffusion kernels on graphs and other discrete struc-
tures (Kondor & Lafferty, 2002).
The locality-preserving character of the Laplacian eigenmap algorithm
makes it relatively insensitive to outliers and noise. It is also not prone
to short circuiting, as only the local distances are used. We show that by
trying to preserve local information in the embedding, the algorithm
implicitly emphasizes the natural clusters in the data. Close connec-
tions to spectral clustering algorithms developed in learning and com-
puter vision (in particular, the approach of Shi & Malik, 1997) then
become very clear. In this sense, dimensionality reduction and cluster-
ing are two sides of the same coin, and we explore this connection in
some detail. In contrast, global methods like that in Tenenbaum et al. (2000) do not show any tendency to cluster, as an attempt is made to
preserve all pairwise geodesic distances between points.
However, not all data sets necessarily have meaningful clusters. Other
methods such as PCA or Isomap might be more appropriate in that
case. We will demonstrate, however, that at least in one example of such a data set (the “swiss roll”), our method produces reasonable results.
Since much of the discussion of Seung and Lee (2000), Roweis and
Saul (2000), and Tenenbaum et al. (2000) is motivated by the role that
nonlinear dimensionality reduction may play in human perception
and learning, it is worthwhile to consider the implication of the pre-
vious remark in this context. The biological perceptual apparatus is
confronted with high-dimensional stimuli from which it must recover
low-dimensional structure. If the approach to recovering such low-
dimensional structure is inherently local (as in the algorithm proposed
here), then a natural clustering will emerge and may serve as the basis
for the emergence of categories in biological perception.
Since our approach is based on the intrinsic geometric structure of the
manifold, it exhibits stability with respect to the embedding. As long
as the embedding is isometric, the representation will not change. In
the example with the moving camera, different resolutions of the cam-
era (i.e., different choices of n in the n × n image grid) should lead to
embeddings of the same underlying manifold into spaces of very different dimension. Our algorithm will produce similar representations
independent of the resolution.
The generic problem of dimensionality reduction is the following. Given a set x_1, . . . , x_k of k points in R^l, find a set of points y_1, . . . , y_k in R^m (m ≪ l) such that y_i “represents” x_i. In this letter, we consider the special case where x_1, . . . , x_k ∈ M and M is a manifold embedded in R^l.
We now consider an algorithm to construct representative y_i’s for this special case. The sense in which such a representation is optimal will become clear later in this letter.
2 The Algorithm
Given k points x_1, . . . , x_k in R^l, we construct a weighted graph with k nodes, one for each point, and a set of edges connecting neighboring points. The embedding map is now provided by computing the eigenvectors of the graph Laplacian. The algorithmic procedure is formally stated below.

1. Step 1 (constructing the adjacency graph). We put an edge between nodes i and j if x_i and x_j are “close.” There are two variations:

   (a) ε-neighborhoods (parameter ε ∈ R). Nodes i and j are connected by an edge if ‖x_i − x_j‖^2 < ε, where the norm is the usual Euclidean norm in R^l. Advantages: geometrically motivated; the relationship is naturally symmetric. Disadvantages: often leads to graphs with several connected components; difficult to choose ε.

   (b) n nearest neighbors (parameter n ∈ N). Nodes i and j are connected by an edge if i is among the n nearest neighbors of j or j is among the n nearest neighbors of i. Note that this relation is symmetric. Advantages: easier to choose; does not tend to lead to disconnected graphs. Disadvantages: less geometrically intuitive.

2. Step 2 (choosing the weights). (In a computer implementation of the algorithm, steps 1 and 2 are executed simultaneously.) Here as well, we have two variations for weighting the edges:

   (a) Heat kernel (parameter t ∈ R). If nodes i and j are connected, put

       W_ij = e^(−‖x_i − x_j‖^2 / t);

   otherwise, put W_ij = 0. The justification for this choice of weights will be provided later.

   (b) Simple-minded (no parameters, t = ∞). W_ij = 1 if vertices i and j are connected by an edge and W_ij = 0 if vertices i and j are not connected by an edge. This simplification avoids the need to choose t.

3. Step 3 (eigenmaps). Assume the graph G, constructed above, is connected. Otherwise, proceed with step 3 for each connected component. Compute eigenvalues and eigenvectors for the generalized eigenvector problem,

       Lf = λDf,    (2.1)

   where D is the diagonal weight matrix whose entries are column (or row, since W is symmetric) sums of W, D_ii = ∑_j W_ji. L = D − W is the Laplacian matrix. The Laplacian is a symmetric, positive semidefinite matrix that can be thought of as an operator on functions defined on the vertices of G.

   Let f_0, . . . , f_{k−1} be the solutions of equation 2.1, ordered according to their eigenvalues:

       Lf_0 = λ_0 Df_0
       Lf_1 = λ_1 Df_1
       · · ·
       Lf_{k−1} = λ_{k−1} Df_{k−1},
       0 = λ_0 ≤ λ_1 ≤ · · · ≤ λ_{k−1}.

   We leave out the eigenvector f_0 corresponding to eigenvalue 0 and use the next m eigenvectors for embedding in m-dimensional Euclidean space:

       x_i → (f_1(i), . . . , f_m(i)).
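
The following sketch reconstructs steps 1 through 3 directly from the description above, using only NumPy and SciPy: an n-nearest-neighbor adjacency graph, heat-kernel weights, and the generalized eigenproblem Lf = λDf. It is an illustrative reconstruction rather than the authors' code, and the parameter values are arbitrary.

    import numpy as np
    from scipy.linalg import eigh
    from scipy.spatial.distance import cdist

    def laplacian_eigenmap(X, n_neighbors=10, t=1.0, m=2):
        """Embed the rows of X into R^m; assumes the resulting graph is connected."""
        k = X.shape[0]
        D2 = cdist(X, X, "sqeuclidean")           # pairwise squared Euclidean distances

        # Step 1: symmetric n-nearest-neighbor adjacency graph.
        idx = np.argsort(D2, axis=1)[:, 1:n_neighbors + 1]
        A = np.zeros((k, k), dtype=bool)
        A[np.repeat(np.arange(k), n_neighbors), idx.ravel()] = True
        A = A | A.T                               # i ~ j if either is a neighbor of the other

        # Step 2: heat-kernel weights W_ij = exp(-||x_i - x_j||^2 / t) on edges, 0 elsewhere.
        W = np.where(A, np.exp(-D2 / t), 0.0)

        # Step 3: generalized eigenproblem L f = lambda D f, with D = diag(row sums), L = D - W.
        D = np.diag(W.sum(axis=1))
        L = D - W
        eigvals, eigvecs = eigh(L, D)             # eigenvalues in ascending order
        return eigvecs[:, 1:m + 1]                # drop f_0 (constant); keep the next m columns

For large k, a sparse adjacency matrix together with scipy.sparse.linalg.eigsh would be the more practical choice; the dense version above is only meant to mirror the three steps as literally as possible.
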
3 Justification
3.1 Optimal Embeddings. Let us first show that the embedding pro-
vided by the Laplacian eigenmap algorithm preserves local information
optimally in a certain sense.
The following section is based on standard spectral graph theory. (See
Chung, 1997, for a comprehensive reference.)
Recall that given a data set, we construct a weighted graph G = (V, E) with edges connecting nearby points to each other. For the purposes of this discussion, assume the graph is connected. Consider the problem of mapping the weighted graph G to a line so that connected points stay as close together as possible. Let y = (y_1, y_2, . . . , y_n)^T be such a map. A reasonable criterion for choosing a “good” map is to minimize, under appropriate constraints, the objective function ∑_{i,j} (y_i − y_j)^2 W_ij, which incurs a heavy penalty if neighboring points (those with large W_ij) are mapped far apart.
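
The standard expansion behind equation 3.1 (referenced in the summary above) is

    ∑_{i,j} (y_i − y_j)^2 W_ij = ∑_{i,j} (y_i^2 + y_j^2 − 2 y_i y_j) W_ij = 2 ∑_i y_i^2 D_ii − 2 ∑_{i,j} y_i y_j W_ij = 2 y^T L y,

using D_ii = ∑_j W_ij and the symmetry of W. This shows both that L = D − W is positive semidefinite and that minimizing the objective, subject to a constraint such as y^T D y = 1, leads to the generalized eigenvalue problem Ly = λDy quoted in the summary.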


References

Haykin, S. (1999). Neural Networks: A Comprehensive Foundation (2nd ed.). Prentice Hall.
TL;DR: A comprehensive engineering treatment of neural networks, covering the learning process, back-propagation learning, radial-basis function networks, self-organizing systems, modular networks, temporal processing and neurodynamics, and VLSI implementation.

Roweis, S. T., & Saul, L. K. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323–2326.
TL;DR: Introduces locally linear embedding (LLE), an unsupervised learning algorithm that computes low-dimensional, neighborhood-preserving embeddings of high-dimensional inputs and learns the global structure of nonlinear manifolds.

Tenenbaum, J. B., de Silva, V., & Langford, J. C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science, 290, 2319–2323.
TL;DR: Describes Isomap, an approach that uses easily measured local metric information to learn the underlying global geometry of a data set, efficiently computes a globally optimal solution, and is guaranteed to converge asymptotically to the true structure for an important class of data manifolds.

Shi, J., & Malik, J. (1997). Normalized cuts and image segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
TL;DR: Treats image segmentation as a graph partitioning problem and proposes the normalized cut, a global criterion measuring both the total dissimilarity between groups and the total similarity within groups, optimized via a generalized eigenvalue problem.

Ng, A. Y., Jordan, M. I., & Weiss, Y. (2002). On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems 14. MIT Press.
TL;DR: Presents a simple spectral clustering algorithm that can be implemented in a few lines of Matlab and, using tools from matrix perturbation theory, gives conditions under which it can be expected to do well.
