
Laplacian Eigenmaps for dimensionality reduction and data representation

01 Jun 2003, Neural Computation (MIT Press), Vol. 15, Iss. 6, pp. 1373–1396
TL;DR: In this article, the authors proposed a geometrically motivated algorithm for representing high-dimensional data, based on the correspondence between the graph Laplacian, the Laplace Beltrami operator on the manifold and the connections to the heat equation.
Abstract: One of the central problems in machine learning and pattern recognition is to develop appropriate representations for complex data. We consider the problem of constructing a representation for data lying on a low-dimensional manifold embedded in a high-dimensional space. Drawing on the correspondence between the graph Laplacian, the Laplace Beltrami operator on the manifold, and the connections to the heat equation, we propose a geometrically motivated algorithm for representing the high-dimensional data. The algorithm provides a computationally efficient approach to nonlinear dimensionality reduction that has locality-preserving properties and a natural connection to clustering. Some potential applications and illustrative examples are discussed.

Summary (2 min read)

1 Introduction

  • In many areas of artificial intelligence, information retrieval, and data mining, one is often confronted with intrinsically low-dimensional data lying in a very high-dimensional space.
  • The general problem of dimensionality reduction has a long history.
  • Classical approaches include principal components analysis (PCA) and multidimensional scaling.
  • Most of these methods do not explicitly consider the structure of the manifold on which the data may possibly reside.
  • Thus, the embedding maps for the data approximate the eigenmaps of the Laplace Beltrami operator, which are maps intrinsically defined on the entire manifold.

2 The Algorithm

  • The embedding map is now provided by computing the eigenvectors of the graph Laplacian.
  • Step 1 (constructing the adjacency graph).
  • In the n-nearest-neighbors variation, the parameter is easier to choose and the construction does not tend to produce disconnected graphs, though it is less geometrically intuitive.
  • Step 2 (choosing the weights). Here as well, the authors have two variations for weighting the edges: (a) the heat kernel (parameter t ∈ R) and (b) simple 0–1 weights, a simplification that avoids the need to choose t.
  • Step 3 (eigenmaps). The embedding coordinates are given by the eigenvectors of the generalized eigenvalue problem Lf = λDf, where L = D − W is the graph Laplacian and D is the diagonal weight matrix; the trivial constant eigenvector is discarded. A minimal end-to-end sketch follows.
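
As a quick, off-the-shelf illustration of these three steps, the sketch below uses scikit-learn's SpectralEmbedding, which computes a closely related spectral embedding (a nearest-neighbor adjacency graph followed by eigenvectors of a normalized graph Laplacian). It is a stand-in rather than the authors' implementation, and the data and parameter values are placeholders.

    import numpy as np
    from sklearn.manifold import SpectralEmbedding

    # Placeholder data: 1000 points in R^50 assumed to lie near a low-dimensional manifold.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 50))

    # Step 1: n-nearest-neighbor adjacency graph; Steps 2-3: graph Laplacian eigenvectors.
    emb = SpectralEmbedding(n_components=2, affinity="nearest_neighbors", n_neighbors=10)
    Y = emb.fit_transform(X)  # Y[i] is the 2-dimensional representation of X[i]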

3 Justification

  • Let us first show that the embedding provided by the Laplacian eigenmap algorithm preserves local information optimally in a certain sense.
  • The following section is based on standard spectral graph theory.
  • It follows from equation 3.1 that L is a positive semidefinite matrix, and the vector y that minimizes the objective function is given by the minimum eigenvalue solution to the generalized eigenvalue problem: Ly = λDy.
  • For the one-dimensional embedding problem, the constraint prevents collapse onto a point; the objective and constraint are written out after this list.
  • This observation leads to several possible approximation schemes for the manifold Laplacian.
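
Spelled out in the notation of the paper, the embedding referred to in these bullets comes from the constrained problem

    minimize  y^T L y = (1/2) ∑_{i,j} (y_i − y_j)^2 W_ij    subject to  y^T D y = 1,

where the quadratic form is the locality-preserving objective of equation 3.1. Once the trivial constant solution is excluded, the minimizer is the eigenvector of Ly = λDy with the smallest nonzero eigenvalue, and the constraint y^T D y = 1 is what prevents the collapse onto a single point.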

4 Connections to Spectral Clustering

  • The approach to dimensionality reduction considered in this letter uses maps provided by the eigenvectors of the graph Laplacian and the eigenfunctions of the Laplace Beltrami operator on the manifold.
  • The approach considered there uses a graph that is globally connected with exponentially decaying weights.
  • The weight Wij associated with the edge eij is the similarity between vi and vj.
  • The authors assume that the matrix of pairwise similarities is symmetric and the corresponding undirected graph is connected.
  • The central observation to be made here is that the process of dimensionality reduction that preserves locality yields the same solution as clustering; this connection is made concrete right after this list.
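
One standard way to make this observation concrete is through the normalized-cut formulation of Shi and Malik (1997) cited in this letter: the relaxed two-way clustering problem is

    minimize  (y^T L y) / (y^T D y)    subject to  y^T D 1 = 0,

whose solution is the eigenvector of Ly = λDy with the smallest nonzero eigenvalue. That is exactly the one-dimensional Laplacian eigenmap, so thresholding the embedding coordinate turns the locality-preserving map into a two-way partition of the graph.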

5 Analysis of Locally Linear Embedding Algorithm

  • The authors provide a brief analysis of the LLE algorithm recently proposed by Roweis and Saul (2000) and show its connection to the Laplacian.
  • Step 1 (discovering the adjacency information).
  • Let Wij be such that ∑j Wij xij equals the orthogonal projection of xi onto the affine linear span of the xij’s (this computation is sketched after the list).
  • The authors develop this argument over several steps: Step 1: Let us fix a data point xi.
  • Since the difference of two points can be regarded as a vector with the origin at the second point, the authors see that the vj’s are vectors in the tangent plane originating at o = xi.
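
The reconstruction weights Wij described above can be obtained from a small constrained least-squares problem for each point. The sketch below is a generic illustration of that computation, not code from this paper or from Roweis and Saul; the regularization term is a common practical addition for the case where the local Gram matrix is singular.

    import numpy as np

    def lle_weights(x_i, neighbors, reg=1e-3):
        """Weights w with sum(w) = 1 such that sum_j w[j] * neighbors[j] is the point
        of the neighbors' affine span closest to x_i (its orthogonal projection)."""
        Z = neighbors - x_i                         # shift so x_i sits at the origin
        C = Z @ Z.T                                 # local Gram matrix of the neighbors
        C = C + reg * np.trace(C) * np.eye(len(C))  # regularize in case C is singular
        w = np.linalg.solve(C, np.ones(len(C)))
        return w / w.sum()                          # enforce the sum-to-one constraint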

6 Examples

  • The authors now briefly consider several possible applications of the algorithmic framework developed in this letter.
  • The authors choose 1000 images (500 containing vertical bars and 500 containing horizontal bars) at random.
  • Each word is represented as a vector in a 600-dimensional space using information about the frequency of its left and right neighbors (computed from the corpus); a sketch of this construction follows the list.
  • Points mapped to the same region in the representation space share similar phonetic features, though points with the same label may originate from different occurrences of the same phoneme.
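
The 600-dimensional word representation mentioned above can be built by counting immediate left and right neighbors. The sketch below is a generic reconstruction of that idea under the assumption that a fixed list of 300 context words is tracked for each direction (giving 600 dimensions); the function and variable names are hypothetical and the corpus handling is simplified.

    from collections import defaultdict
    import numpy as np

    def word_vectors(tokens, context_vocab):
        """Map each word to a 2*len(context_vocab) vector: counts of each context word
        occurring immediately to its left (first half) or right (second half)."""
        index = {w: i for i, w in enumerate(context_vocab)}  # e.g., 300 assumed context words
        d = len(context_vocab)
        counts = defaultdict(lambda: np.zeros(2 * d))
        for prev, cur in zip(tokens, tokens[1:]):
            if prev in index:                    # prev is a left neighbor of cur
                counts[cur][index[prev]] += 1
            if cur in index:                     # cur is a right neighbor of prev
                counts[prev][d + index[cur]] += 1
        return counts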

7 Conclusions

  • The authors introduced a coherent framework for dimensionality reduction for the case where data reside on a low-dimensional manifold embedded in a higher-dimensional space.
  • The resulting maps, however, do not in general provide an isometric embedding.
  • It is unclear how to estimate reliably even such a simple invariant as the intrinsic dimensionality of the manifold.
  • There are further issues pertaining to their framework that need to be sorted out.
  • First, the authors have implicitly assumed a uniform probability distribution on the manifold according to which the data points have been sampled.


LETTER Communicated by Joshua B. Tenenbaum
Laplacian Eigenmaps for Dimensionality Reduction and Data
Representation
Mikhail Belkin
misha@math.uchicago.edu
Department of Mathematics, University of Chicago, Chicago, IL 60637, U.S.A.
Partha Niyogi
niyogi@cs.uchicago.edu
Department of Computer Science and Statistics, University of Chicago,
Chicago, IL 60637 U.S.A.
One of the central problems in machine learning and pattern recognition
is to develop appropriate representations for complex data. We consider
the problem of constructing a representation for data lying on a low-
dimensional manifold embedded in a high-dimensional space. Drawing
on the correspondence between the graph Laplacian, the Laplace Beltrami
operator on the manifold, and the connections to the heat equation, we
propose a geometrically motivated algorithm for representing the high-
dimensional data. The algorithm provides a computationally efficient ap-
proach to nonlinear dimensionality reduction that has locality-preserving
properties and a natural connection to clustering. Some potential appli-
cations and illustrative examples are discussed.
1 Introduction
In many areas of artificial intelligence, information retrieval, and data min-
ing, one is often confronted with intrinsically low-dimensional data lying in
a very high-dimensional space. Consider, for example, gray-scale images of
an object taken under fixed lighting conditions with a moving camera. Each
such image would typically be represented by a brightness value at each
pixel. If there were n^2 pixels in all (corresponding to an n × n image), then each image yields a data point in R^{n^2}. However, the intrinsic dimensionality of the space of all images of the same object is the number of degrees of freedom of the camera. In this case, the space under consideration has the natural structure of a low-dimensional manifold embedded in R^{n^2}.
Recently, there has been some renewed interest (Tenenbaum, de Silva,
& Langford, 2000; Roweis & Saul, 2000) in the problem of developing low-
dimensional representations when data arise from sampling a probabil-
ity distribution on a manifold. In this letter, we present a geometrically
motivated algorithm and an accompanying framework of analysis for this
problem.
The general problem of dimensionality reduction has a long history. Clas-
sical approaches include principal components analysis (PCA) and multi-
dimensional scaling. Various methods that generate nonlinear maps have
also been considered. Most of them, such as self-organizing maps and other
neural network–based approaches (e.g., Haykin, 1999), set up a nonlin-
ear optimization problem whose solution is typically obtained by gradient
descent that is guaranteed only to produce a local optimum; global op-
tima are difficult to attain by efficient means. Note, however, that the re-
cent approach of generalizing the PCA through kernel-based techniques
(Schölkopf, Smola, & Müller, 1998) does not have this shortcoming. Most of
these methods do not explicitly consider the structure of the manifold on
which the data may possibly reside.
In this letter, we explore an approach that builds a graph incorporating
neighborhood information of the data set. Using the notion of the Laplacian
of the graph, we then compute a low-dimensional representation of the data
set that optimally preserves local neighborhood information in a certain
sense. The representation map generated by the algorithm may be viewed
as a discrete approximation to a continuous map that naturally arises from
the geometry of the manifold.
It is worthwhile to highlight several aspects of the algorithm and the
framework of analysis presented here:
The core algorithm is very simple. It has a few local computations and
one sparse eigenvalue problem. The solution reflects the intrinsic geo-
metric structure of the manifold. It does, however, require a search for
neighboring points in a high-dimensional space. We note that there are
several efficient approximate techniques for finding nearest neighbors
(e.g., Indyk, 2000).
The justification for the algorithm comes from the role of the Laplace
Beltrami operator in providing an optimal embedding for the mani-
fold. The manifold is approximated by the adjacency graph computed
from the data points. The Laplace Beltrami operator is approximated
by the weighted Laplacian of the adjacency graph with weights cho-
sen appropriately. The key role of the Laplace Beltrami operator in the
heat equation enables us to use the heat kernel to choose the weight
decay function in a principled manner. Thus, the embedding maps for
the data approximate the eigenmaps of the Laplace Beltrami operator,
which are maps intrinsically defined on the entire manifold.
The framework of analysis presented here makes explicit use of these
connections to interpret dimensionality-reduction algorithms in a ge-
ometric fashion. In addition to the algorithms presented in this letter,
we are also able to reinterpret the recently proposed locally linear embedding (LLE) algorithm of Roweis and Saul (2000) within this framework.
The graph Laplacian has been widely used for different clustering and
partition problems (Shi & Malik, 1997; Simon, 1991; Ng, Jordan, &
Weiss, 2002). Although the connections between the Laplace Beltrami
operator and the graph Laplacian are well known to geometers and
specialists in spectral graph theory (Chung, 1997; Chung, Grigor’yan,
& Yau, 2000), so far we are not aware of any application to dimen-
sionality reduction or data representation. We note, however, recent
work on using diffusion kernels on graphs and other discrete struc-
tures (Kondor & Lafferty, 2002).
The locality-preserving character of the Laplacian eigenmap algorithm
makes it relatively insensitive to outliers and noise. It is also not prone
to short circuiting, as only the local distances are used. We show that by
trying to preserve local information in the embedding, the algorithm
implicitly emphasizes the natural clusters in the data. Close connec-
tions to spectral clustering algorithms developed in learning and com-
puter vision (in particular, the approach of Shi & Malik, 1997) then
become very clear. In this sense, dimensionality reduction and cluster-
ing are two sides of the same coin, and we explore this connection in
some detail. In contrast, global methods like that in Tenenbaum et al. (2000) do not show any tendency to cluster, as an attempt is made to
preserve all pairwise geodesic distances between points.
However, not all data sets necessarily have meaningful clusters. Other
methods such as PCA or Isomap might be more appropriate in that
case. We will demonstrate, however, that at least in one example of such a data set (the “swiss roll”), our method produces reasonable results.
Since much of the discussion of Seung and Lee (2000), Roweis and
Saul (2000), and Tenenbaum et al. (2000) is motivated by the role that
nonlinear dimensionality reduction may play in human perception
and learning, it is worthwhile to consider the implication of the pre-
vious remark in this context. The biological perceptual apparatus is
confronted with high-dimensional stimuli from which it must recover
low-dimensional structure. If the approach to recovering such low-
dimensional structure is inherently local (as in the algorithm proposed
here), then a natural clustering will emerge and may serve as the basis
for the emergence of categories in biological perception.
Since our approach is based on the intrinsic geometric structure of the
manifold, it exhibits stability with respect to the embedding. As long
as the embedding is isometric, the representation will not change. In
the example with the moving camera, different resolutions of the cam-
era (i.e., different choices of n in the n × n image grid) should lead to
embeddings of the same underlying manifold into spaces of very different dimension. Our algorithm will produce similar representations
independent of the resolution.
The generic problem of dimensionality reduction is the following. Given a set x_1, . . . , x_k of k points in R^l, find a set of points y_1, . . . , y_k in R^m (m ≪ l) such that y_i “represents” x_i. In this letter, we consider the special case where x_1, . . . , x_k ∈ M and M is a manifold embedded in R^l.
We now consider an algorithm to construct representative y_i’s for this special case. The sense in which such a representation is optimal will become clear later in this letter.
2 The Algorithm
Given k points x_1, . . . , x_k in R^l, we construct a weighted graph with k nodes, one for each point, and a set of edges connecting neighboring points. The embedding map is now provided by computing the eigenvectors of the graph Laplacian. The algorithmic procedure is formally stated below.

1. Step 1 (constructing the adjacency graph). We put an edge between nodes i and j if x_i and x_j are “close.” There are two variations:

   (a) ε-neighborhoods (parameter ε ∈ R). Nodes i and j are connected by an edge if ‖x_i − x_j‖^2 < ε, where the norm is the usual Euclidean norm in R^l. Advantages: geometrically motivated; the relationship is naturally symmetric. Disadvantages: often leads to graphs with several connected components; difficult to choose ε.

   (b) n nearest neighbors (parameter n ∈ N). Nodes i and j are connected by an edge if i is among the n nearest neighbors of j or j is among the n nearest neighbors of i. Note that this relation is symmetric. Advantages: easier to choose; does not tend to lead to disconnected graphs. Disadvantages: less geometrically intuitive.

2. Step 2 (choosing the weights). (In a computer implementation of the algorithm, steps 1 and 2 are executed simultaneously.) Here as well, we have two variations for weighting the edges:

   (a) Heat kernel (parameter t ∈ R). If nodes i and j are connected, put

       W_ij = e^(−‖x_i − x_j‖^2 / t);

   otherwise, put W_ij = 0. The justification for this choice of weights will be provided later.

   (b) Simple-minded (no parameters, t = ∞). W_ij = 1 if vertices i and j are connected by an edge and W_ij = 0 if vertices i and j are not connected by an edge. This simplification avoids the need to choose t.

3. Step 3 (eigenmaps). Assume the graph G, constructed above, is connected. Otherwise, proceed with step 3 for each connected component. Compute eigenvalues and eigenvectors for the generalized eigenvector problem,

       Lf = λDf,    (2.1)

   where D is the diagonal weight matrix whose entries are column (or row, since W is symmetric) sums of W, D_ii = ∑_j W_ji. L = D − W is the Laplacian matrix. The Laplacian is a symmetric, positive semidefinite matrix that can be thought of as an operator on functions defined on the vertices of G.

   Let f_0, . . . , f_{k−1} be the solutions of equation 2.1, ordered according to their eigenvalues:

       Lf_0 = λ_0 Df_0
       Lf_1 = λ_1 Df_1
       · · ·
       Lf_{k−1} = λ_{k−1} Df_{k−1},
       0 = λ_0 ≤ λ_1 ≤ · · · ≤ λ_{k−1}.

   We leave out the eigenvector f_0 corresponding to eigenvalue 0 and use the next m eigenvectors for embedding in m-dimensional Euclidean space:

       x_i → (f_1(i), . . . , f_m(i)).
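
The following sketch reconstructs steps 1 through 3 directly from the description above, using only NumPy and SciPy: an n-nearest-neighbor adjacency graph, heat-kernel weights, and the generalized eigenproblem Lf = λDf. It is an illustrative reconstruction rather than the authors' code, and the parameter values are arbitrary.

    import numpy as np
    from scipy.linalg import eigh
    from scipy.spatial.distance import cdist

    def laplacian_eigenmap(X, n_neighbors=10, t=1.0, m=2):
        """Embed the rows of X into R^m; assumes the resulting graph is connected."""
        k = X.shape[0]
        D2 = cdist(X, X, "sqeuclidean")           # pairwise squared Euclidean distances

        # Step 1: symmetric n-nearest-neighbor adjacency graph.
        idx = np.argsort(D2, axis=1)[:, 1:n_neighbors + 1]
        A = np.zeros((k, k), dtype=bool)
        A[np.repeat(np.arange(k), n_neighbors), idx.ravel()] = True
        A = A | A.T                               # i ~ j if either is a neighbor of the other

        # Step 2: heat-kernel weights W_ij = exp(-||x_i - x_j||^2 / t) on edges, 0 elsewhere.
        W = np.where(A, np.exp(-D2 / t), 0.0)

        # Step 3: generalized eigenproblem L f = lambda D f, with D = diag(row sums), L = D - W.
        D = np.diag(W.sum(axis=1))
        L = D - W
        eigvals, eigvecs = eigh(L, D)             # eigenvalues in ascending order
        return eigvecs[:, 1:m + 1]                # drop f_0 (constant); keep the next m columns

For large k, a sparse adjacency matrix together with scipy.sparse.linalg.eigsh would be the more practical choice; the dense version above is only meant to mirror the three steps as literally as possible.
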
3 Justification
3.1 Optimal Embeddings. Let us first show that the embedding pro-
vided by the Laplacian eigenmap algorithm preserves local information
optimally in a certain sense.
The following section is based on standard spectral graph theory. (See
Chung, 1997, for a comprehensive reference.)
Recall that given a data set, we construct a weighted graph G = (V, E) with edges connecting nearby points to each other. For the purposes of this discussion, assume the graph is connected. Consider the problem of mapping the weighted graph G to a line so that connected points stay as close together as possible. Let y = (y_1, y_2, . . . , y_n)^T be such a map. A reasonable criterion for choosing a “good” map is to minimize, under appropriate constraints, the objective function ∑_{i,j} (y_i − y_j)^2 W_ij, which incurs a heavy penalty if neighboring points (those with large W_ij) are mapped far apart.
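
The standard expansion behind equation 3.1 (referenced in the summary above) is

    ∑_{i,j} (y_i − y_j)^2 W_ij = ∑_{i,j} (y_i^2 + y_j^2 − 2 y_i y_j) W_ij = 2 ∑_i y_i^2 D_ii − 2 ∑_{i,j} y_i y_j W_ij = 2 y^T L y,

using D_ii = ∑_j W_ij and the symmetry of W. This shows both that L = D − W is positive semidefinite and that minimizing the objective, subject to a constraint such as y^T D y = 1, leads to the generalized eigenvalue problem Ly = λDy quoted in the summary.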


References

Haykin, S. (1999). Neural Networks: A Comprehensive Foundation (2nd ed.). Prentice Hall.
TL;DR: A comprehensive engineering treatment of neural networks, covering the learning process, back-propagation learning, radial-basis function networks, self-organizing systems, modular networks, temporal processing and neurodynamics, and VLSI implementation.

Roweis, S. T., & Saul, L. K. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323–2326.
TL;DR: Introduces locally linear embedding (LLE), an unsupervised learning algorithm that computes low-dimensional, neighborhood-preserving embeddings of high-dimensional inputs and learns the global structure of nonlinear manifolds.

Tenenbaum, J. B., de Silva, V., & Langford, J. C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science, 290, 2319–2323.
TL;DR: Describes Isomap, an approach that uses easily measured local metric information to learn the underlying global geometry of a data set, efficiently computes a globally optimal solution, and is guaranteed to converge asymptotically to the true structure for an important class of data manifolds.

Shi, J., & Malik, J. (1997). Normalized cuts and image segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
TL;DR: Treats image segmentation as a graph partitioning problem and proposes the normalized cut, a global criterion measuring both the total dissimilarity between groups and the total similarity within groups, optimized via a generalized eigenvalue problem.

Ng, A. Y., Jordan, M. I., & Weiss, Y. (2002). On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems 14. MIT Press.
TL;DR: Presents a simple spectral clustering algorithm that can be implemented in a few lines of Matlab and, using tools from matrix perturbation theory, gives conditions under which it can be expected to do well.
