Sparse Subspace Clustering
Ehsan Elhamifar    René Vidal
Center for Imaging Science, Johns Hopkins University, Baltimore MD 21218, USA
Abstract
We propose a method based on sparse representation
(SR) to cluster data drawn from multiple low-dimensional
linear or affine subspaces embedded in a high-dimensional
space. Our method is based on the fact that each point in
a union of subspaces has a SR with respect to a dictionary
formed by all other data points. In general, finding such a
SR is NP hard. Our key contribution is to show that, under
mild assumptions, the SR can be obtained 'exactly' by using $\ell_1$ optimization. The segmentation of the data is obtained by
applying spectral clustering to a similarity matrix built from
this SR. Our method can handle noise, outliers as well as
missing data. We apply our subspace clustering algorithm
to the problem of segmenting multiple motions in video. Ex-
periments on 167 video sequences show that our approach
significantly outperforms state-of-the-art methods.
1. Introduction
Subspace clustering is an important problem with nu-
merous applications in image processing, e.g. image rep-
resentation and compression [15, 29], and computer vision,
e.g. image/motion/video segmentation [6, 16, 30, 28, 26].
Given a set of points drawn from a union of subspaces, the
task is to find the number of subspaces, their dimensions, a
basis for each subspace, and the segmentation of the data.
Prior work on subspace clustering. Existing works on
subspace clustering can be divided into six main categories:
iterative, statistical, factorization-based, spectral clustering,
algebraic and information-theoretic approaches. Iterative
approaches, such as K-subspaces [14], alternate between as-
signing points to subspaces, and fitting a subspace to each
cluster. Statistical approaches, such as Mixtures of Proba-
bilistic PCA (MPPCA) [24], Multi-Stage Learning (MSL)
[22], or [13], assume that the distribution of the data in-
side each subspace is Gaussian and alternate between data
clustering and subspace estimation by applying Expecta-
tion Maximization (EM) to a mixture of probabilistic PCAs.
The main drawbacks of both approaches are that they gen-
erally require the number and dimensions of the subspaces
to be known, and that they are sensitive to correct initializa-
tion. Robust methods, such as Random Sample Consensus
(RANSAC) [11], fit a subspace of dimension d to randomly
chosen subsets of d points until the number of inliers is large
enough. The inliers are then removed, and the process is
repeated to find a second subspace, and so on. RANSAC
can deal with noise and outliers, and does not need to know the
number of subspaces. However, the dimensions of the sub-
spaces must be known and equal, and the number of trials
needed to find d points in the same subspace grows expo-
nentially with the number and dimension of the subspaces.
Factorization-based methods [6, 12, 16] find an initial
segmentation by thresholding the entries of a similarity
matrix built from the factorization of the matrix of data
points. Such methods are provably correct when the sub-
spaces are independent, but fail when this assumption is vi-
olated. Also, these methods are sensitive to noise. Spectral-
clustering methods [30, 10, 28] deal with these issues by
using local information around each point to build a simi-
larity between pairs of points. The segmentation of the data
is then obtained by applying spectral clustering to this sim-
ilarity matrix. These methods have difficulties dealing with
points near the intersection of two subspaces, because the
neighborhood of a point can contain points from different
subspaces. This issue can be resolved by looking at multi-
way similarities that capture the curvature of a collection of
points within an affine subspace [5]. However, the complex-
ity of building a multi-way similarity grows exponentially
with the number of subspaces and their dimensions.
Algebraic methods, such as Generalized Principal Com-
ponent Analysis (GPCA) [25, 18], fit the data with a polyno-
mial whose gradient at a point gives a vector normal to the
subspace containing that point. Subspace clustering is then
equivalent to fitting and differentiating polynomials. GPCA
can deal with subspaces of different dimensions, and does
not impose any restriction on the relative orientation of the
subspaces. However, GPCA is sensitive to noise and out-
liers, and its complexity increases exponentially with the
number of subspaces and their dimensions. Information-
theoretic approaches, such as Agglomerative Lossy Com-
pression (ALC) [17], model each subspace with a degen-
erate Gaussian, and look for the segmentation of the data
that minimizes the coding length needed to fit these points
with a mixture of Gaussians. As this minimization problem

is NP hard, a suboptimal solution is found by first assuming
that each point forms its own group, and then iteratively merging pairs of groups to reduce the coding length. ALC can
handle noise and outliers in the data, and can estimate the
number of subspaces and their dimensions. However, there
is no theoretical proof for the optimality of the algorithm.
Paper contributions. In this paper, we propose a com-
pletely different approach to subspace clustering based on
sparse representation. Sparse representation of signals has
attracted a lot of attention during the last decade, especially
in the signal and image processing communities (see §2 for
a brief review). However, its application to computer vi-
sion problems is fairly recent. [21] uses $\ell_1$ optimization to deal with missing or corrupted data in motion segmentation. [20] uses sparse representation for restoration of color images. [27] uses $\ell_1$ minimization for recognizing human
faces from frontal views with varying expression and illu-
mination as well as occlusion. [19] uses a sparse represen-
tation to learn a dictionary for object recognition.
Our work is the first one to directly use the sparse repre-
sentation of vectors lying on a union of subspaces to cluster
the data into separate subspaces. We exploit the fact that
each data point in a union of subspaces can always be writ-
ten as a linear or affine combination of all other points. By
searching for the sparsest combination, we automatically
obtain other points lying in the same subspace. This allows
us to build a similarity matrix, from which the segmentation
of the data can be easily obtained using spectral clustering.
Our work has numerous advantages over the state of the art.
Our sparse representation approach resolves the exponen-
tial complexity issue of methods such as RANSAC, spec-
tral clustering, and GPCA. While in principle finding the
sparsest representation is also an NP hard problem, we show
that under mild assumptions on the distribution of data on
the subspaces, the sparsest representation can be found effi-
ciently by solving a (convex) $\ell_1$ optimization problem.
Our work extends sparse representation work from one to
multiple subspaces. As we will see in §2, most of the sparse
representation literature assumes that the data lies in a single
linear subspace [1, 4, 7]. The work of [9] is the first one to
address the case of multiple subspaces, under the assump-
tion that a sparsifying basis for each subspace is known.
Our case is more challenging, because we do not have any
basis for any of the subspaces nor do we know which data
belong to which subspace. We only have the sparsifying
basis for the union of subspaces given by the data matrix.
Our work requires no initialization, can deal with both
linear and affine subspaces, can handle data points near the
intersections, noise, outliers, and missing data.
Last, but not least, our method significantly outperforms
existing motion segmentation algorithms on 167 sequences.
2. Sparse representation and compressed sensing
Compressed sensing (CS) is based on the idea that many
signals or vectors can have a concise representation when
expressed in a proper basis. So, the information rate of
a sparse signal is usually much smaller than the rate sug-
gested by its maximum frequency. In this section, we re-
view recently developed techniques from CS for sparsely
representing signals lying in one or more subspaces.
2.1. Sparse representation in a single subspace
Consider a vector $x \in \mathbb{R}^D$, which can be represented in a basis of $D$ vectors $\{\psi_i \in \mathbb{R}^D\}_{i=1}^{D}$. If we form the basis matrix $\Psi = [\psi_1, \psi_2, \cdots, \psi_D]$, we can write $x$ as:
$$x = \sum_{i=1}^{D} s_i \psi_i = \Psi s \quad (1)$$
where $s = [s_1, s_2, \ldots, s_D]^\top$. Both $x$ and $s$ represent the same signal, one in the space domain and the other in the $\Psi$ domain. However, in many cases $x$ can have a sparse representation in a properly chosen basis $\Psi$. We say that $x$ is $K$-sparse if it is a linear combination of at most $K$ basis vectors in $\Psi$, i.e. if at most $K$ of the coefficients are nonzero. In practice, the signal is $K$-sparse when it has at most $K$ large nonzero coefficients and the remaining coefficients are very small. We are in general interested in the case where $K \ll D$.
Assume now that we do not measure $x$ directly. Instead, we measure $m$ linear combinations of entries of $x$ of the form $y_i = \phi_i^\top x$ for $i \in \{1, 2, \cdots, m\}$. We thus have
$$y = [y_1, y_2, \cdots, y_m]^\top = \Phi x = \Phi \Psi s = A s, \quad (2)$$
where $\Phi = [\phi_1, \phi_2, \cdots, \phi_m]^\top \in \mathbb{R}^{m \times D}$ is called the measurement matrix. The works of [1, 4, 7] show that, given $m$ measurements, one can recover $K$-sparse signals/vectors if $K \lesssim m/\log(D/m)$. In principle, such a sparse representation can be obtained by solving the optimization problem:
$$\min \|s\|_0 \quad \text{subject to} \quad y = A s, \quad (3)$$
where $\|s\|_0$ is the $\ell_0$ norm of $s$, i.e. the number of nonzero elements. However, such an optimization problem is in general non-convex and NP-hard. This has motivated the development of several methods for efficiently extracting a sparse representation of signals/vectors. One of the well-known methods is the Basis Pursuit (BP) algorithm, which replaces the non-convex optimization in (3) by the following convex $\ell_1$ optimization problem [7]:
$$\min \|s\|_1 \quad \text{subject to} \quad y = A s. \quad (4)$$
The works of [3, 2] show that we can recover perfectly a K-
sparse signal/vector by using the Basis Pursuit algorithm in
(4) under certain conditions on the so-called isometry con-
stant of the A matrix.
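To make (4) concrete, here is a small illustrative sketch, not taken from the paper, that solves the Basis Pursuit program with the generic convex solver cvxpy; the dimensions, the random matrix A, and the sparsity level are made-up toy values.

```python
# Illustrative sketch of Basis Pursuit (eq. 4): min ||s||_1  subject to  y = A s.
# Not the authors' code; uses cvxpy as a generic convex solver on toy data.
import numpy as np
import cvxpy as cp

np.random.seed(0)
D, m, K = 100, 40, 5                  # ambient dimension, measurements, sparsity
A = np.random.randn(m, D)             # random measurement/dictionary matrix
s_true = np.zeros(D)
s_true[np.random.choice(D, K, replace=False)] = np.random.randn(K)
y = A @ s_true                        # observed measurements

s = cp.Variable(D)
prob = cp.Problem(cp.Minimize(cp.norm1(s)), [A @ s == y])
prob.solve()

print("recovered support:", np.nonzero(np.abs(s.value) > 1e-6)[0])
```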

2.2. Sparse representation in a union of subspaces
Most of the work on CS deals with sparse representation
of signals/vectors lying in a single low-dimensional linear
subspace. The more general case where the signals/vectors
lie in a union of low-dimensional linear subspaces was only
recently considered. The work of Eldar [9] shows that when the subspaces are disjoint (intersect only at the origin), a basis for each subspace is known, and a certain condition on a modified isometry constant holds, one can recover the block-sparse vector $s$ exactly by solving an $\ell_1/\ell_2$ optimization problem.
More precisely, let $\{A_i \in \mathbb{R}^{D \times d_i}\}_{i=1}^{n}$ be a set of bases for $n$ disjoint linear subspaces embedded in $\mathbb{R}^D$ with dimensions $\{d_i\}_{i=1}^{n}$. If $y$ belongs to the $i$-th subspace, we can represent it as the sparse solution of
$$y = A s = [A_1, A_2, \cdots, A_n]\,[s_1^\top, s_2^\top, \cdots, s_n^\top]^\top, \quad (5)$$
where $s_i \in \mathbb{R}^{d_i}$ is a nonzero vector and all other vectors $\{s_j \in \mathbb{R}^{d_j}\}_{j \neq i}$ are zero. Therefore, $s$ is the solution to the following non-convex optimization problem:
$$\min \sum_{i=1}^{n} 1(\|s_i\|_2 > 0) \quad \text{subject to} \quad y = A s, \quad (6)$$
where $1(\|s_i\|_2 > 0)$ is an indicator function that takes the value 1 when $\|s_i\|_2 > 0$ and zero otherwise. [9] shows that if a modified isometry constant satisfies a certain condition, then the solution to the (convex) $\ell_2/\ell_1$ program
$$\min \sum_{i=1}^{n} \|s_i\|_2 \quad \text{subject to} \quad y = A s \quad (7)$$
coincides with that of (6).
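For comparison, the following sketch illustrates the $\ell_2/\ell_1$ program (7) in the setting of [9], where the block sizes $d_i$ are assumed known; it is not the authors' implementation, and the orthonormal bases and dimensions are arbitrary toy choices.

```python
# Sketch of the l2/l1 (group-sparse) program (7): min sum_i ||s_i||_2  subject to  y = A s.
# Assumes the block structure {d_i} is known, as in the setting of [9]. Toy data only.
import numpy as np
import cvxpy as cp

np.random.seed(1)
D, dims = 50, [4, 4, 4]                                 # ambient dimension, block sizes d_i
A_blocks = [np.linalg.qr(np.random.randn(D, d))[0] for d in dims]   # orthonormal bases
A = np.hstack(A_blocks)
y = A_blocks[1] @ np.random.randn(dims[1])              # y lies in the 2nd subspace

s = cp.Variable(sum(dims))
starts = np.cumsum([0] + dims)
group_norms = [cp.norm(s[starts[i]:starts[i + 1]], 2) for i in range(len(dims))]
cp.Problem(cp.Minimize(sum(group_norms)), [A @ s == y]).solve()

for i in range(len(dims)):
    print(f"||s_{i}||_2 =", np.linalg.norm(s.value[starts[i]:starts[i + 1]]))
```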
In this paper, we address the problem of clustering data
lying in multiple linear or affine subspaces. This subspace
clustering problem is more challenging, because the subspace bases $\{A_i\}_{i=1}^{n}$ and the subspace dimensions $\{d_i\}_{i=1}^{n}$ are unknown, and hence we do not know a priori which data points belong to which subspace. To the best of our knowledge, our work is the first one to use sparse representation techniques to address the subspace clustering problem.
3. Subspace clustering via sparse representation
In this section, we consider the problem of clustering a
collection of data points drawn from a union of subspaces
using sparse representation. First we consider the case
where all subspaces are linear and then we extend our re-
sult to the more general case of affine subspaces.
3.1. Clustering linear subspaces
Let $\{y_j \in \mathbb{R}^D\}_{j=1}^{N}$ be a collection of data points drawn from a union of $n$ independent$^1$ linear subspaces $\{S_i\}_{i=1}^{n}$. Let $\{d_i \ll D\}_{i=1}^{n}$ and $\{A_i \in \mathbb{R}^{D \times d_i}\}_{i=1}^{n}$ be, respectively, the unknown dimensions and bases for the $n$ subspaces. Let $Y_i \in \mathbb{R}^{D \times N_i}$ be the collection of $N_i$ data points drawn from subspace $i$. Since we do not know which points belong to which subspace, our data matrix is of the form
$$Y = [y_1, y_2, \cdots, y_N] = [Y_1, Y_2, \cdots, Y_n]\,\Gamma \in \mathbb{R}^{D \times N},$$
where $N = \sum_{i=1}^{n} N_i$ and $\Gamma \in \mathbb{R}^{N \times N}$ is an unknown permutation matrix that specifies the segmentation of the data.
Although we do not know the subspace bases, we know that such bases can be chosen from the columns of the data matrix $Y$. In fact, if we assume that there are enough data points from each linear subspace, $N_i \geq d_i$, and that these data points are in general position, meaning that no $d_i$ points from subspace $i$ live in a $(d_i - 1)$-dimensional subspace, then the collection of data points is self-expressive. This means that if $y$ is a new data point in $S_i$, then it can be represented as a linear combination of $d_i$ points in the same subspace. Thus if we let $s = \Gamma^{-1}[s_1^\top, s_2^\top, \cdots, s_n^\top]^\top \in \mathbb{R}^N$, where $s_i \in \mathbb{R}^{N_i}$, then $y$ has a $d_i$-sparse representation, which can be recovered as a sparse solution of $y = Y s$, with $s_i \neq 0$ and $s_j = 0$ for all $j \neq i$. That is, $s$ is a solution of the following non-convex optimization problem
$$\min \|s\|_0 \quad \text{subject to} \quad y = Y s \quad (8)$$
which is an NP-hard problem to solve.$^2$
The following theorem shows that when the subspaces are independent$^1$, the $\ell_1$ optimization problem
$$\min \|s\|_1 \quad \text{subject to} \quad y = Y s \quad (9)$$
gives block sparse solutions with the nonzero block corresponding to points in the same subspace as $y$.

Theorem 1. Let $Y \in \mathbb{R}^{D \times N}$ be a matrix whose columns are drawn from a union of $n$ independent linear subspaces. Assume that the points within each subspace are in general position. Let $y$ be a new point in subspace $i$. The solution to the $\ell_1$ problem in (9), $s = \Gamma^{-1}[s_1^\top, s_2^\top, \cdots, s_n^\top]^\top \in \mathbb{R}^N$, is block sparse, i.e. $s_i \neq 0$ and $s_j = 0$ for all $j \neq i$.
Proof. Let $s$ be any sparse representation of the data point $y \in S_i$, i.e. $y = Y s$ with $s_i \neq 0$ and $s_j = 0$ for all $j \neq i$. Since the points in each subspace are in general position, such a sparse representation exists. Now, if $s^\ast$ is a solution of the $\ell_1$ program in (9), then $s^\ast$ is a vector of minimum $\ell_1$ norm satisfying $y = Y s^\ast$. Let $h = s^\ast - s$ denote the error between the optimal solution and our sparse solution. Then, we can write $h$ as the sum of two vectors $h_i$ and $h_{i^c}$ supported on disjoint subsets of indices: $h_i$ represents the error for the corresponding points in subspace $i$ and $h_{i^c}$ the error for the corresponding points in other subspaces. We now show that $h_{i^c} = 0$. For the sake of contradiction, assume that $h_{i^c} \neq 0$. Since $s^\ast = s + h_i + h_{i^c}$, we have that $y = Y s^\ast = Y(s + h_i) + Y h_{i^c}$. Also, since $y \in S_i$, $Y(s + h_i) \in S_i$, and from the independence assumption $Y h_{i^c} \notin S_i$, we have that $Y h_{i^c} = 0$. This implies that
$$y = Y s^\ast = Y(s + h_i).$$
Now, from the fact that $h_i$ and $h_{i^c}$ are supported on disjoint subsets of indices, we have $\|s + h_i\|_1 < \|s + h_i + h_{i^c}\|_1 = \|s^\ast\|_1$. In other words, $s + h_i$ is a feasible solution for the $\ell_1$ program in (9) whose $\ell_1$ norm is smaller than that of the optimal solution. This contradicts the optimality of the solution $s^\ast$. Thus we must always have $s^\ast_{i^c} = s_{i^c} = 0$, meaning that only the block corresponding to the points in the true subspace can have nonzero entries.

$^1$ A collection of $n$ linear subspaces $\{S_i \subset \mathbb{R}^D\}_{i=1}^{n}$ are independent if $\dim(\oplus_{i=1}^{n} S_i) = \sum_{i=1}^{n} \dim(S_i)$, where $\oplus$ is the direct sum.
$^2$ Notice that our optimization problem in (8) is different from the one in (6), because we do not know the subspace basis or the permutation matrix $\Gamma$, and hence we cannot enforce that $s_j = 0$ for $j \neq i$ whenever $s_i \neq 0$.
Theorem 1 gives sufficient conditions on subspaces and
the data matrix in order to be able to recover a block sparse
representation of a new data point as a linear combination of
the points in the data matrix that are in the same subspace.
We now show how to use such a sparse representation for
clustering the data according to the multiple subspaces.
Let $Y_{\hat{i}} \in \mathbb{R}^{D \times (N-1)}$ be the matrix obtained from $Y$ by removing its $i$-th column, $y_i$. The circumflex notation $\hat{i}$ thus means "not $i$". According to Theorem 1, if $y_i$ belongs to the $j$-th subspace, then it has a sparse representation with respect to the basis matrix $Y_{\hat{i}}$. Moreover, such a representation can be recovered by solving the following $\ell_1$ program
$$\min \|c_i\|_1 \quad \text{subject to} \quad y_i = Y_{\hat{i}}\, c_i. \quad (10)$$
The optimal solution $c_i \in \mathbb{R}^{N-1}$ is a vector whose nonzero entries correspond to points (columns) in $Y_{\hat{i}}$ that lie in the same subspace as $y_i$. Thus, by inserting a zero entry at the $i$-th row of $c_i$, we make it an $N$-dimensional vector, $\hat{c}_i \in \mathbb{R}^N$, whose nonzero entries correspond to points in $Y$ that lie in the same subspace as $y_i$.
After solving (10) at each point $i = 1, \ldots, N$, we obtain a matrix of coefficients $C = [\hat{c}_1, \hat{c}_2, \cdots, \hat{c}_N] \in \mathbb{R}^{N \times N}$. We use this matrix to define a directed graph $G = (V, E)$. The vertices of the graph $V$ are the $N$ data points, and there is an edge $(v_i, v_j) \in E$ when the data point $y_j$ is one of the vectors in the sparse representation of $y_i$, i.e. when $C_{ji} \neq 0$. One can easily see that the adjacency matrix of $G$ is $C$. In general $G$ is an unbalanced digraph. To make it balanced, we build a new graph $\tilde{G}$ with the adjacency matrix $\tilde{C}$, where $\tilde{C}_{ij} = |C_{ij}| + |C_{ji}|$. $\tilde{C}$ is still a valid representation of the similarity, because if $y_i$ can write itself as a linear combination of some points including $y_j$ (all in the same subspace), then $y_j$ can also write itself as a linear combination of some points in the same subspace including $y_i$.
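A minimal sketch of this construction, assuming the cvxpy solver and toy data from two random subspaces (not the authors' code): program (10) is solved once per point, a zero is inserted at the $i$-th position, and the resulting columns are stacked into $C$ before symmetrization.

```python
# Sketch: solve the l1 program (10) for every point and build C and C_tilde.
# Illustrative only; toy data from two random 2-D subspaces in R^10.
import numpy as np
import cvxpy as cp

np.random.seed(2)
D, n_per = 10, 15
bases = [np.linalg.qr(np.random.randn(D, 2))[0] for _ in range(2)]
Y = np.hstack([B @ np.random.randn(2, n_per) for B in bases])   # D x N data matrix
N = Y.shape[1]

C = np.zeros((N, N))
for i in range(N):
    Y_hat_i = np.delete(Y, i, axis=1)          # Y with its i-th column removed
    c = cp.Variable(N - 1)
    cp.Problem(cp.Minimize(cp.norm1(c)), [Y_hat_i @ c == Y[:, i]]).solve()
    C[:, i] = np.insert(c.value, i, 0.0)       # re-insert a zero at the i-th row

C_tilde = np.abs(C) + np.abs(C).T              # symmetrized adjacency: |C_ij| + |C_ji|
```

Note that this solves $N$ small convex programs, one per data point, which is the main computational cost of the approach.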
Having formed the similarity graph $\tilde{G}$, it follows from Theorem 1 that all vertices representing the data points in the same subspace form a connected component in the graph, while the vertices representing points in different subspaces have no edges between them. Therefore, in the case of $n$ subspaces, $\tilde{C}$ has the following block diagonal form
$$\tilde{C} = \begin{bmatrix} \tilde{C}_1 & 0 & \cdots & 0 \\ 0 & \tilde{C}_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \tilde{C}_n \end{bmatrix} \Gamma \quad (11)$$
where $\Gamma$ is a permutation matrix. The Laplacian matrix of $\tilde{G}$ is then formed by $L = D - \tilde{C}$, where $D \in \mathbb{R}^{N \times N}$ is a diagonal matrix with $D_{ii} = \sum_j \tilde{C}_{ij}$.
We use the following result from spectral graph theory
to infer the segmentation of the data by applying K-means
to a subset of eigenvectors of the Laplacian.
Proposition 1. The multiplicity of the zero eigenvalue of the Laplacian matrix $L$ corresponding to the graph $\tilde{G}$ is equal to the number of connected components of the graph. Also, the components of the graph can be determined from the eigenspace of the zero eigenvalue. More precisely, if the graph has $n$ connected components, then $u_i = [0, 0, \ldots, \mathbf{1}_{N_i}^\top, 0, \ldots, 0]\,\Gamma$ for $i \in \{1, 2, \ldots, n\}$ is the $i$-th eigenvector of $L$ corresponding to the zero eigenvalue, which means that the $N_i$ nonzero elements of $u_i$ belong to the same group.
For data points drawn in general position from $n$ independent linear subspaces, the similarity graph $\tilde{G}$ will have $n$ connected components. Therefore, when the number of subspaces is unknown, we can estimate it as the number of zero eigenvalues of $L$. In the case of real data with noise, we have to consider a robust measure to determine the number of eigenvalues of $L$ close to zero.
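Continuing the sketch above, one possible way to implement this step, assuming scipy and scikit-learn and a simple eigenvalue threshold as a crude stand-in for the robust measure mentioned here, is:

```python
# Sketch: estimate the number of subspaces from (near-)zero eigenvalues of L
# and segment the data with K-means on the corresponding eigenvectors.
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_segmentation(C_tilde, eig_tol=1e-6):
    Dg = np.diag(C_tilde.sum(axis=1))
    L = Dg - C_tilde                              # graph Laplacian L = D - C_tilde
    eigvals, eigvecs = eigh(L)                    # eigenvalues in ascending order
    n = max(int(np.sum(eigvals < eig_tol)), 1)    # estimated number of subspaces
    U = eigvecs[:, :n]                            # eigenvectors of the (near-)null space
    labels = KMeans(n_clusters=n, n_init=10).fit_predict(U)
    return n, labels
```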
3.2. Clustering affine subspaces
In many cases we need to cluster data lying in multiple
affine rather than linear subspaces. For instance, the motion
segmentation problem we will discuss in the next section in-
volves clustering data lying in multiple 3-dimensional affine
subspaces. However, most existing motion segmentation al-
gorithms deal with this problem by clustering the data as if
they belonged to multiple 4-dimensional linear subspaces.
In this section, we show that our method can easily handle the case of affine subspaces by a simple modification to the BP algorithm. The modified $\ell_1$ minimization is still a convex optimization, which can be efficiently implemented. More specifically, notice that in the case of affine subspaces,

a point can no longer write itself as a linear combination of
points in the same subspace. However, we can still write a
point y as an affine combination of other points, i.e.
$$y = c_1 y_1 + c_2 y_2 + \cdots + c_N y_N, \quad \sum_{i=1}^{N} c_i = 1. \quad (12)$$
Theorem 2 shows that one can recover the sparse representation of data points on an affine subspace by using the following modified Basis Pursuit algorithm
$$\min \|c\|_1 \quad \text{subject to} \quad y = Y c \ \text{ and } \ c^\top \mathbf{1} = 1. \quad (13)$$
Theorem 2. Let $Y \in \mathbb{R}^{D \times N}$ be a matrix whose columns are drawn from a union of $n$ independent$^3$ affine subspaces. Assume that the points within each subspace are in general position. Let $y$ be a new point in subspace $i$. The solution to the $\ell_1$ problem in (13), $s = \Gamma^{-1}[s_1^\top, s_2^\top, \cdots, s_n^\top]^\top \in \mathbb{R}^N$, is block sparse, i.e. $s_i \neq 0$ and $s_j = 0$ for all $j \neq i$.
Proof. Analogous to that of Theorem 1.
Similar to what we did for linear subspaces, we can use this result for clustering a collection of data points drawn from $n$ affine subspaces. Essentially, we solve the following $\ell_1$ minimization problem for each data point $y_i$
$$\min \|c_i\|_1 \quad \text{subject to} \quad y_i = Y_{\hat{i}}\, c_i \ \text{ and } \ c_i^\top \mathbf{1} = 1, \quad (14)$$
and form the graph $\tilde{G}$ from the sparse coefficients. We then apply spectral clustering to the corresponding Laplacian matrix in order to get the segmentation of the data.
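In code, the only change with respect to the linear program (10) is the additional affine constraint; an illustrative sketch (again assuming cvxpy, with a hypothetical helper name) is:

```python
# Sketch of the affine version (14): program (10) plus the constraint sum(c) = 1.
# Illustrative only, not the authors' code.
import numpy as np
import cvxpy as cp

def sparse_affine_coefficients(Y, i):
    """l1-minimal affine representation of the i-th column of Y by the other columns."""
    N = Y.shape[1]
    Y_hat_i = np.delete(Y, i, axis=1)
    c = cp.Variable(N - 1)
    constraints = [Y_hat_i @ c == Y[:, i], cp.sum(c) == 1]
    cp.Problem(cp.Minimize(cp.norm1(c)), constraints).solve()
    return np.insert(c.value, i, 0.0)
```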
3.3. Subspace clustering with noisy data
Consider now the case where the data points drawn from
a collection of linear or affine subspaces are contaminated
with noise. More specifically, let $\bar{y}_i = y_i + \zeta_i$ be the $i$-th data point corrupted with noise $\zeta_i$ bounded by $\|\zeta_i\|_2 \leq \epsilon$.
In order to recover the sparse representation of $\bar{y}_i$, we can look for the sparsest solution of $\bar{y}_i = Y_{\hat{i}}\, c_i$ with an error of at most $\epsilon$, i.e. $\|Y_{\hat{i}}\, c_i - \bar{y}_i\|_2 \leq \epsilon$. We can find such a sparse representation by solving the following problem
$$\min \|c_i\|_1 \quad \text{subject to} \quad \|Y_{\hat{i}}\, c_i - \bar{y}_i\|_2 \leq \epsilon. \quad (15)$$
# beforehand. In such cases we can use the Lasso optimiza-
tion algorithm [23] to recover the sparse solution from
min #c
i
#
1
+ γ #Y
ˆ
i
c
i
¯
y
i
#
2
(16)
where γ > 0 is a constant. In the case data drawn from mul-
tiple affine subspaces and corrupted with noise, the sparse
representation can be obtained by solving the problem
min#c
i
#
1
subject to #Y
ˆ
i
c
i
¯
y
i
#
2
# and c
!
i
1 =1 (17)
3
A collection of affine subspaces is said to be independent if they are
independent as linear subspaces in homogeneous coordinates.
or the modified Lasso counterpart
min #c
i
#
1
+ γ #Y
ˆ
i
c
i
¯
y
i
#
2
subject to c
!
i
1 =1. (18)
Segmentation of the data into different subspaces then fol-
lows by applying spectral clustering to the Laplacian of
˜
G.
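An illustrative sketch of the Lasso-style program (18), with $\gamma$ as a made-up value that would have to be tuned in practice (the helper name is hypothetical):

```python
# Sketch of the noisy affine case (18): trade off sparsity against the residual
# ||Y_hat_i c - y_bar_i||_2 with weight gamma, keeping the affine constraint.
# Illustrative only; gamma is an arbitrary placeholder value.
import numpy as np
import cvxpy as cp

def noisy_sparse_coefficients(Y, i, gamma=20.0):
    N = Y.shape[1]
    Y_hat_i = np.delete(Y, i, axis=1)
    c = cp.Variable(N - 1)
    objective = cp.norm1(c) + gamma * cp.norm(Y_hat_i @ c - Y[:, i], 2)
    cp.Problem(cp.Minimize(objective), [cp.sum(c) == 1]).solve()
    return np.insert(c.value, i, 0.0)
```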
3.4. Clustering with missing or corrupted data
In practice, some of the entries of the data points may be
missing (incomplete data), or corrupted (outliers). In mo-
tion segmentation, for example, due to occlusions or lim-
itations of the tracker, we may lose some feature points in some of the frames (missing entries), or the tracker may lose track of some features, leading to gross errors. As
suggested in [21], we can fill in missing entries or correct
gross errors using sparse techniques. In this section, we
show that one can also cluster data points with missing or
corrupted entries using a sparse representation.
Let $I_i \subset \{1, \ldots, D\}$ denote the indices of missing entries in $y_i \in \mathbb{R}^D$. Let $Y_{\hat{i}} \in \mathbb{R}^{D \times (N-1)}$ be obtained by eliminating the vector $y_i$ from the $i$-th column of the data matrix $Y$. We then form $\tilde{y}_i \in \mathbb{R}^{D - |I_i|}$ and $\tilde{Y}_{\hat{i}} \in \mathbb{R}^{(D - |I_i|) \times (N-1)}$ by eliminating the rows of $y_i$ and $Y_{\hat{i}}$ indexed by $I_i$, respectively. Assuming that $\tilde{Y}_{\hat{i}}$ is complete, we can find a sparse representation, $c_i$, for $\tilde{y}_i$ by solving the following problem
$$\min \|c_i\|_1 + \gamma \|\tilde{Y}_{\hat{i}}\, c_i - \tilde{y}_i\|_2 \quad \text{subject to} \quad c_i^\top \mathbf{1} = 1. \quad (19)$$
The missing entries of $y_i$ are then given by $y_i = Y_{\hat{i}}\, c_i$. Notice that this method for completion of missing data is essentially the same as our method for computing the sparse representation from complete data with noise in (18). Hence we can use the sparse coefficient vectors $\{c_i\}_{i=1}^{N}$ to build the similarity graph and find the segmentation of the data.
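A possible sketch of (19), assuming the indices of the missing entries are known for each point; the helper name and the value of $\gamma$ are hypothetical:

```python
# Sketch of the missing-data program (19): drop the rows indexed by I_i from both
# y_i and Y_hat_i, solve the noisy affine program on the remaining rows, then
# fill in the point via y_i = Y_hat_i c_i. Illustrative only.
import numpy as np
import cvxpy as cp

def complete_point(Y, i, missing_idx, gamma=20.0):
    N = Y.shape[1]
    Y_hat_i = np.delete(Y, i, axis=1)
    keep = np.setdiff1d(np.arange(Y.shape[0]), missing_idx)   # observed rows
    c = cp.Variable(N - 1)
    objective = cp.norm1(c) + gamma * cp.norm(Y_hat_i[keep] @ c - Y[keep, i], 2)
    cp.Problem(cp.Minimize(objective), [cp.sum(c) == 1]).solve()
    y_filled = Y_hat_i @ c.value                               # completed data point
    return y_filled, np.insert(c.value, i, 0.0)
```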
Assume now that a few entries of each data point are cor-
rupted. We can also use the sparse representation to correct
such entries. More precisely, let $\tilde{y}_i \in \mathbb{R}^D$ be a corrupted vector obtained from $\bar{y}_i = y_i + \zeta_i$ by adding a sparse error vector $e_i \in \mathbb{R}^D$ as $\tilde{y}_i = y_i + \zeta_i + e_i$. We can then write
$$\tilde{y}_i = Y_{\hat{i}}\, c_i + e_i = [Y_{\hat{i}} \ \ I_D] \begin{bmatrix} c_i \\ e_i \end{bmatrix} + \zeta_i, \quad (20)$$
where the coefficient vector $[c_i^\top, e_i^\top]^\top$ is still sparse, and hence can be recovered from
$$\min \left\| \begin{bmatrix} c_i \\ e_i \end{bmatrix} \right\|_1 + \gamma \left\| \tilde{y}_i - [Y_{\hat{i}} \ \ I_D] \begin{bmatrix} c_i \\ e_i \end{bmatrix} \right\|_2 \quad \text{subject to} \quad c_i^\top \mathbf{1} = 1.$$
We can then recover the original vector without outliers as $y_i = Y_{\hat{i}}\, c_i$. As before, we can obtain the segmentation from the sparse coefficients $\{c_i\}_{i=1}^{N}$ using spectral clustering.
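A sketch of the corrupted-data formulation (20), where the dictionary is augmented with the identity $I_D$ so that gross errors are absorbed by a sparse vector $e_i$; the function name and $\gamma$ are illustrative choices, not part of the paper:

```python
# Sketch of the corrupted-data case (20): augment the dictionary with I_D so that
# sparse gross errors are captured by e_i. Illustrative only.
import numpy as np
import cvxpy as cp

def robust_sparse_coefficients(Y, i, gamma=20.0):
    D, N = Y.shape
    Y_hat_i = np.delete(Y, i, axis=1)
    c = cp.Variable(N - 1)
    e = cp.Variable(D)                       # sparse error (outlier) vector
    residual = Y[:, i] - (Y_hat_i @ c + e)   # y_i - [Y_hat_i  I_D][c; e]
    objective = cp.norm1(cp.hstack([c, e])) + gamma * cp.norm(residual, 2)
    cp.Problem(cp.Minimize(objective), [cp.sum(c) == 1]).solve()
    y_clean = Y_hat_i @ c.value              # corrected point without outliers
    return y_clean, np.insert(c.value, i, 0.0)
```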
In summary, we have the following Sparse Subspace
Clustering (SSC) algorithm for clustering data drawn from
a collection of linear/affine subspaces, and corrupted by
noise, missing entries, and outliers.

References
Regression Shrinkage and Selection via the Lasso
Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography
Robust Face Recognition via Sparse Representation
Decoding by Linear Programming
Stable Signal Recovery from Incomplete and Inaccurate Measurements