Sparse Subspace Clustering
Ehsan Elhamifar    René Vidal
Center for Imaging Science, Johns Hopkins University, Baltimore MD 21218, USA
Abstract
We propose a method based on sparse representation
(SR) to cluster data drawn from multiple low-dimensional
linear or affine subspaces embedded in a high-dimensional
space. Our method is based on the fact that each point in
a union of subspaces has a SR with respect to a dictionary
formed by all other data points. In general, finding such a
SR is NP hard. Our key contribution is to show that, under
mild assumptions, the SR can be obtained 'exactly' by using $\ell_1$ optimization. The segmentation of the data is obtained by
applying spectral clustering to a similarity matrix built from
this SR. Our method can handle noise, outliers as well as
missing data. We apply our subspace clustering algorithm
to the problem of segmenting multiple motions in video. Ex-
periments on 167 video sequences show that our approach
significantly outperforms state-of-the-art methods.
1. Introduction
Subspace clustering is an important problem with nu-
merous applications in image processing, e.g. image rep-
resentation and compression [15, 29], and computer vision,
e.g. image/motion/video segmentation [6, 16, 30, 28, 26].
Given a set of points drawn from a union of subspaces, the
task is to find the number of subspaces, their dimensions, a
basis for each subspace, and the segmentation of the data.
Prior work on subspace clustering. Existing works on
subspace clustering can be divided into six main categories:
iterative, statistical, factorization-based, spectral clustering,
algebraic and information-theoretic approaches. Iterative
approaches, such as K-subspaces [14], alternate between as-
signing points to subspaces, and fitting a subspace to each
cluster. Statistical approaches, such as Mixtures of Proba-
bilistic PCA (MPPCA) [24], Multi-Stage Learning (MSL)
[22], or [13], assume that the distribution of the data in-
side each subspace is Gaussian and alternate between data
clustering and subspace estimation by applying Expecta-
tion Maximization (EM) to a mixture of probabilistic PCAs.
The main drawbacks of both approaches are that they gen-
erally require the number and dimensions of the subspaces
to be known, and that they are sensitive to correct initializa-
tion. Robust methods, such as Random Sample Consensus
(RANSAC) [11], fit a subspace of dimension d to randomly
chosen subsets of d points until the number of inliers is large
enough. The inliers are then removed, and the process is
repeated to find a second subspace, and so on. RANSAC
can deal with noise and outliers, and does not need to know the
number of subspaces. However, the dimensions of the sub-
spaces must be known and equal, and the number of trials
needed to find d points in the same subspace grows expo-
nentially with the number and dimension of the subspaces.
Factorization-based methods [6, 12, 16] find an initial
segmentation by thresholding the entries of a similarity
matrix built from the factorization of the matrix of data
points. Such methods are provably correct when the sub-
spaces are independent, but fail when this assumption is vi-
olated. Also, these methods are sensitive to noise. Spectral-
clustering methods [30, 10, 28] deal with these issues by
using local information around each point to build a simi-
larity between pairs of points. The segmentation of the data
is then obtained by applying spectral clustering to this sim-
ilarity matrix. These methods have difficulties dealing with
points near the intersection of two subspaces, because the
neighborhood of a point can contain points from different
subspaces. This issue can be resolved by looking at multi-
way similarities that capture the curvature of a collection of
points within an affine subspace [5]. However, the complex-
ity of building a multi-way similarity grows exponentially
with the number of subspaces and their dimensions.
Algebraic methods, such as Generalized Principal Com-
ponent Analysis (GPCA) [25, 18], fit the data with a polyno-
mial whose gradient at a point gives a vector normal to the
subspace containing that point. Subspace clustering is then
equivalent to fitting and differentiating polynomials. GPCA
can deal with subspaces of different dimensions, and does
not impose any restriction on the relative orientation of the
subspaces. However, GPCA is sensitive to noise and out-
liers, and its complexity increases exponentially with the
number of subspaces and their dimensions. Information-
theoretic approaches, such as Agglomerative Lossy Com-
pression (ALC) [17], model each subspace with a degen-
erate Gaussian, and look for the segmentation of the data
that minimizes the coding length needed to fit these points
with a mixture of Gaussians. As this minimization problem

is NP hard, a suboptimal solution is found by first assuming
that each point forms its own group, and then iteratively merging pairs of groups to reduce the coding length. ALC can
handle noise and outliers in the data, and can estimate the
number of subspaces and their dimensions. However, there
is no theoretical proof for the optimality of the algorithm.
Paper contributions. In this paper, we propose a com-
pletely different approach to subspace clustering based on
sparse representation. Sparse representation of signals has
attracted a lot of attention during the last decade, especially
in the signal and image processing communities (see §2 for
a brief review). However, its application to computer vi-
sion problems is fairly recent. [21] uses $\ell_1$ optimization to deal with missing or corrupted data in motion segmentation. [20] uses sparse representation for restoration of color images. [27] uses $\ell_1$ minimization for recognizing human
faces from frontal views with varying expression and illu-
mination as well as occlusion. [19] uses a sparse represen-
tation to learn a dictionary for object recognition.
Our work is the first one to directly use the sparse repre-
sentation of vectors lying on a union of subspaces to cluster
the data into separate subspaces. We exploit the fact that
each data point in a union of subspaces can always be writ-
ten as a linear or affine combination of all other points. By
searching for the sparsest combination, we automatically
obtain other points lying in the same subspace. This allows
us to build a similarity matrix, from which the segmentation
of the data can be easily obtained using spectral clustering.
Our work has numerous advantages over the state of the art.
Our sparse representation approach resolves the exponen-
tial complexity issue of methods such as RANSAC, spec-
tral clustering, and GPCA. While in principle finding the
sparsest representation is also an NP hard problem, we show
that under mild assumptions on the distribution of data on
the subspaces, the sparsest representation can be found effi-
ciently by solving a (convex) $\ell_1$ optimization problem.
Our work extends sparse representation work from one to
multiple subspaces. As we will see in §2, most of the sparse
representation literature assumes that the data lies in a single
linear subspace [1, 4, 7]. The work of [9] is the first one to
address the case of multiple subspaces, under the assump-
tion that a sparsifying basis for each subspace is known.
Our case is more challenging, because we do not have any
basis for any of the subspaces nor do we know which data
belong to which subspace. We only have the sparsifying
basis for the union of subspaces given by the data matrix.
Our work requires no initialization, can deal with both
linear and affine subspaces, can handle data points near the
intersections, noise, outliers, and missing data.
Last, but not least, our method significantly outperforms
existing motion segmentation algorithms on 167 sequences.
2. Sparse representation and compressed sensing
Compressed sensing (CS) is based on the idea that many
signals or vectors can have a concise representation when
expressed in a proper basis. So, the information rate of
a sparse signal is usually much smaller than the rate sug-
gested by its maximum frequency. In this section, we re-
view recently developed techniques from CS for sparsely
representing signals lying in one or more subspaces.
2.1. Sparse representation in a single subspace
Consider a vector $x \in \mathbb{R}^D$, which can be represented in a basis of $D$ vectors $\{\psi_i \in \mathbb{R}^D\}_{i=1}^{D}$. If we form the basis matrix $\Psi = [\psi_1, \psi_2, \cdots, \psi_D]$, we can write $x$ as:
$$x = \sum_{i=1}^{D} s_i \psi_i = \Psi s \quad (1)$$
where $s = [s_1, s_2, \ldots, s_D]^\top$. Both $x$ and $s$ represent the same signal, one in the space domain and the other in the $\Psi$ domain. However, in many cases $x$ can have a sparse representation in a properly chosen basis $\Psi$. We say that $x$ is $K$-sparse if it is a linear combination of at most $K$ basis vectors in $\Psi$, i.e. if at most $K$ of the coefficients are nonzero. In practice, the signal is $K$-sparse when it has at most $K$ large nonzero coefficients and the remaining coefficients are very small. We are in general interested in the case where $K \ll D$.
Assume now that we do not measure $x$ directly. Instead, we measure $m$ linear combinations of entries of $x$ of the form $y_i = \phi_i^\top x$ for $i \in \{1, 2, \cdots, m\}$. We thus have
$$y = [y_1, y_2, \cdots, y_m]^\top = \Phi x = \Phi \Psi s = A s, \quad (2)$$
where $\Phi = [\phi_1, \phi_2, \cdots, \phi_m]^\top \in \mathbb{R}^{m \times D}$ is called the measurement matrix. The works of [1, 4, 7] show that, given $m$ measurements, one can recover $K$-sparse signals/vectors if $K \lesssim m/\log(D/m)$. In principle, such a sparse representation can be obtained by solving the optimization problem:
$$\min \|s\|_0 \quad \text{subject to} \quad y = A s, \quad (3)$$
where $\|s\|_0$ is the $\ell_0$ norm of $s$, i.e. the number of nonzero elements. However, such an optimization problem is in general non-convex and NP-hard. This has motivated the development of several methods for efficiently extracting a sparse representation of signals/vectors. One of the well-known methods is the Basis Pursuit (BP) algorithm, which replaces the non-convex optimization in (3) by the following convex $\ell_1$ optimization problem [7]:
$$\min \|s\|_1 \quad \text{subject to} \quad y = A s. \quad (4)$$
The works of [3, 2] show that we can recover perfectly a K-
sparse signal/vector by using the Basis Pursuit algorithm in
(4) under certain conditions on the so-called isometry con-
stant of the A matrix.
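To make (4) concrete, here is a small illustrative sketch, not taken from the paper, that solves the Basis Pursuit program with the generic convex solver cvxpy; the dimensions, the random matrix A, and the sparsity level are made-up toy values.

```python
# Illustrative sketch of Basis Pursuit (eq. 4): min ||s||_1  subject to  y = A s.
# Not the authors' code; uses cvxpy as a generic convex solver on toy data.
import numpy as np
import cvxpy as cp

np.random.seed(0)
D, m, K = 100, 40, 5                  # ambient dimension, measurements, sparsity
A = np.random.randn(m, D)             # random measurement/dictionary matrix
s_true = np.zeros(D)
s_true[np.random.choice(D, K, replace=False)] = np.random.randn(K)
y = A @ s_true                        # observed measurements

s = cp.Variable(D)
prob = cp.Problem(cp.Minimize(cp.norm1(s)), [A @ s == y])
prob.solve()

print("recovered support:", np.nonzero(np.abs(s.value) > 1e-6)[0])
```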

2.2. Sparse representation in a union of subspaces
Most of the work on CS deals with sparse representation
of signals/vectors lying in a single low-dimensional linear
subspace. The more general case where the signals/vectors
lie in a union of low-dimensional linear subspaces was only
recently considered. The work of Eldar [9] shows that when the subspaces are disjoint (intersect only at the origin), a basis for each subspace is known, and a certain condition on a modified isometry constant holds, one can recover the block-sparse vector $s$ exactly by solving an $\ell_1/\ell_2$ optimization problem.
More precisely, let $\{A_i \in \mathbb{R}^{D \times d_i}\}_{i=1}^{n}$ be a set of bases for $n$ disjoint linear subspaces embedded in $\mathbb{R}^D$ with dimensions $\{d_i\}_{i=1}^{n}$. If $y$ belongs to the $i$-th subspace, we can represent it as the sparse solution of
$$y = A s = [A_1, A_2, \cdots, A_n]\,[s_1^\top, s_2^\top, \cdots, s_n^\top]^\top, \quad (5)$$
where $s_i \in \mathbb{R}^{d_i}$ is a nonzero vector and all other vectors $\{s_j \in \mathbb{R}^{d_j}\}_{j \neq i}$ are zero. Therefore, $s$ is the solution to the following non-convex optimization problem:
$$\min \sum_{i=1}^{n} 1(\|s_i\|_2 > 0) \quad \text{subject to} \quad y = A s, \quad (6)$$
where $1(\|s_i\|_2 > 0)$ is an indicator function that takes the value 1 when $\|s_i\|_2 > 0$ and zero otherwise. [9] shows that if a modified isometry constant satisfies a certain condition, then the solution to the (convex) $\ell_2/\ell_1$ program
$$\min \sum_{i=1}^{n} \|s_i\|_2 \quad \text{subject to} \quad y = A s \quad (7)$$
coincides with that of (6).
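For comparison, the following sketch illustrates the $\ell_2/\ell_1$ program (7) in the setting of [9], where the block sizes $d_i$ are assumed known; it is not the authors' implementation, and the orthonormal bases and dimensions are arbitrary toy choices.

```python
# Sketch of the l2/l1 (group-sparse) program (7): min sum_i ||s_i||_2  subject to  y = A s.
# Assumes the block structure {d_i} is known, as in the setting of [9]. Toy data only.
import numpy as np
import cvxpy as cp

np.random.seed(1)
D, dims = 50, [4, 4, 4]                                 # ambient dimension, block sizes d_i
A_blocks = [np.linalg.qr(np.random.randn(D, d))[0] for d in dims]   # orthonormal bases
A = np.hstack(A_blocks)
y = A_blocks[1] @ np.random.randn(dims[1])              # y lies in the 2nd subspace

s = cp.Variable(sum(dims))
starts = np.cumsum([0] + dims)
group_norms = [cp.norm(s[starts[i]:starts[i + 1]], 2) for i in range(len(dims))]
cp.Problem(cp.Minimize(sum(group_norms)), [A @ s == y]).solve()

for i in range(len(dims)):
    print(f"||s_{i}||_2 =", np.linalg.norm(s.value[starts[i]:starts[i + 1]]))
```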
In this paper, we address the problem of clustering data
lying in multiple linear or affine subspaces. This subspace
clustering problem is more challenging, because the subspace bases $\{A_i\}_{i=1}^{n}$ and the subspace dimensions $\{d_i\}_{i=1}^{n}$ are unknown, and hence we do not know a priori which data points belong to which subspace. To the best of our knowledge, our work is the first one to use sparse representation techniques to address the subspace clustering problem.
3. Subspace clustering via sparse representation
In this section, we consider the problem of clustering a
collection of data points drawn from a union of subspaces
using sparse representation. First we consider the case
where all subspaces are linear and then we extend our re-
sult to the more general case of affine subspaces.
3.1. Clustering linear subspaces
Let $\{y_j \in \mathbb{R}^D\}_{j=1}^{N}$ be a collection of data points drawn from a union of $n$ independent$^1$ linear subspaces $\{S_i\}_{i=1}^{n}$. Let $\{d_i \ll D\}_{i=1}^{n}$ and $\{A_i \in \mathbb{R}^{D \times d_i}\}_{i=1}^{n}$ be, respectively, the unknown dimensions and bases for the $n$ subspaces. Let $Y_i \in \mathbb{R}^{D \times N_i}$ be the collection of $N_i$ data points drawn from subspace $i$. Since we do not know which points belong to which subspace, our data matrix is of the form
$$Y = [y_1, y_2, \cdots, y_N] = [Y_1, Y_2, \cdots, Y_n]\,\Gamma \in \mathbb{R}^{D \times N},$$
where $N = \sum_{i=1}^{n} N_i$ and $\Gamma \in \mathbb{R}^{N \times N}$ is an unknown permutation matrix that specifies the segmentation of the data.
Although we do not know the subspace bases, we know that such bases can be chosen from the columns of the data matrix $Y$. In fact, if we assume that there are enough data points from each linear subspace, $N_i \geq d_i$, and that these data points are in general position, meaning that no $d_i$ points from subspace $i$ live in a $(d_i - 1)$-dimensional subspace, then the collection of data points is self-expressive. This means that if $y$ is a new data point in $S_i$, then it can be represented as a linear combination of $d_i$ points in the same subspace. Thus if we let $s = \Gamma^{-1}[s_1^\top, s_2^\top, \cdots, s_n^\top]^\top \in \mathbb{R}^N$, where $s_i \in \mathbb{R}^{N_i}$, then $y$ has a $d_i$-sparse representation, which can be recovered as a sparse solution of $y = Y s$, with $s_i \neq 0$ and $s_j = 0$ for all $j \neq i$. That is, $s$ is a solution of the following non-convex optimization problem
$$\min \|s\|_0 \quad \text{subject to} \quad y = Y s \quad (8)$$
which is an NP-hard problem to solve.$^2$
The following theorem shows that when the subspaces are independent$^1$, the $\ell_1$ optimization problem
$$\min \|s\|_1 \quad \text{subject to} \quad y = Y s \quad (9)$$
gives block sparse solutions with the nonzero block corresponding to points in the same subspace as $y$.

Theorem 1. Let $Y \in \mathbb{R}^{D \times N}$ be a matrix whose columns are drawn from a union of $n$ independent linear subspaces. Assume that the points within each subspace are in general position. Let $y$ be a new point in subspace $i$. The solution to the $\ell_1$ problem in (9), $s = \Gamma^{-1}[s_1^\top, s_2^\top, \cdots, s_n^\top]^\top \in \mathbb{R}^N$, is block sparse, i.e. $s_i \neq 0$ and $s_j = 0$ for all $j \neq i$.
Proof. Let $s$ be any sparse representation of the data point $y \in S_i$, i.e. $y = Y s$ with $s_i \neq 0$ and $s_j = 0$ for all $j \neq i$. Since the points in each subspace are in general position, such a sparse representation exists. Now, if $s^\ast$ is a solution of the $\ell_1$ program in (9), then $s^\ast$ is a vector of minimum $\ell_1$ norm satisfying $y = Y s^\ast$. Let $h = s^\ast - s$ denote the error between the optimal solution and our sparse solution. Then, we can write $h$ as the sum of two vectors $h_i$ and $h_{i^c}$ supported on disjoint subsets of indices: $h_i$ represents the error for the corresponding points in subspace $i$ and $h_{i^c}$ the error for the corresponding points in other subspaces. We now show that $h_{i^c} = 0$. For the sake of contradiction, assume that $h_{i^c} \neq 0$. Since $s^\ast = s + h_i + h_{i^c}$, we have that $y = Y s^\ast = Y(s + h_i) + Y h_{i^c}$. Also, since $y \in S_i$, $Y(s + h_i) \in S_i$, and from the independence assumption $Y h_{i^c} \notin S_i$, we have that $Y h_{i^c} = 0$. This implies that
$$y = Y s^\ast = Y(s + h_i).$$
Now, from the fact that $h_i$ and $h_{i^c}$ are supported on disjoint subsets of indices, we have $\|s + h_i\|_1 < \|s + h_i + h_{i^c}\|_1 = \|s^\ast\|_1$. In other words, $s + h_i$ is a feasible solution for the $\ell_1$ program in (9) whose $\ell_1$ norm is smaller than that of the optimal solution. This contradicts the optimality of the solution $s^\ast$. Thus we must always have $s^\ast_{i^c} = s_{i^c} = 0$, meaning that only the block corresponding to the points in the true subspace can have nonzero entries.

$^1$ A collection of $n$ linear subspaces $\{S_i \subset \mathbb{R}^D\}_{i=1}^{n}$ are independent if $\dim(\oplus_{i=1}^{n} S_i) = \sum_{i=1}^{n} \dim(S_i)$, where $\oplus$ is the direct sum.
$^2$ Notice that our optimization problem in (8) is different from the one in (6), because we do not know the subspace basis or the permutation matrix $\Gamma$, and hence we cannot enforce that $s_j = 0$ for $j \neq i$ whenever $s_i \neq 0$.
Theorem 1 gives sufficient conditions on subspaces and
the data matrix in order to be able to recover a block sparse
representation of a new data point as a linear combination of
the points in the data matrix that are in the same subspace.
We now show how to use such a sparse representation for
clustering the data according to the multiple subspaces.
Let $Y_{\hat{i}} \in \mathbb{R}^{D \times (N-1)}$ be the matrix obtained from $Y$ by removing its $i$-th column, $y_i$. The circumflex notation $\hat{i}$ thus means "not $i$". According to Theorem 1, if $y_i$ belongs to the $j$-th subspace, then it has a sparse representation with respect to the basis matrix $Y_{\hat{i}}$. Moreover, such a representation can be recovered by solving the following $\ell_1$ program
$$\min \|c_i\|_1 \quad \text{subject to} \quad y_i = Y_{\hat{i}}\, c_i. \quad (10)$$
The optimal solution $c_i \in \mathbb{R}^{N-1}$ is a vector whose nonzero entries correspond to points (columns) in $Y_{\hat{i}}$ that lie in the same subspace as $y_i$. Thus, by inserting a zero entry at the $i$-th row of $c_i$, we make it an $N$-dimensional vector, $\hat{c}_i \in \mathbb{R}^N$, whose nonzero entries correspond to points in $Y$ that lie in the same subspace as $y_i$.
After solving (10) at each point $i = 1, \ldots, N$, we obtain a matrix of coefficients $C = [\hat{c}_1, \hat{c}_2, \cdots, \hat{c}_N] \in \mathbb{R}^{N \times N}$. We use this matrix to define a directed graph $G = (V, E)$. The vertices of the graph $V$ are the $N$ data points, and there is an edge $(v_i, v_j) \in E$ when the data point $y_j$ is one of the vectors in the sparse representation of $y_i$, i.e. when $C_{ji} \neq 0$. One can easily see that the adjacency matrix of $G$ is $C$. In general $G$ is an unbalanced digraph. To make it balanced, we build a new graph $\tilde{G}$ with the adjacency matrix $\tilde{C}$, where $\tilde{C}_{ij} = |C_{ij}| + |C_{ji}|$. $\tilde{C}$ is still a valid representation of the similarity, because if $y_i$ can write itself as a linear combination of some points including $y_j$ (all in the same subspace), then $y_j$ can also write itself as a linear combination of some points in the same subspace including $y_i$.
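A minimal sketch of this construction, assuming the cvxpy solver and toy data from two random subspaces (not the authors' code): program (10) is solved once per point, a zero is inserted at the $i$-th position, and the resulting columns are stacked into $C$ before symmetrization.

```python
# Sketch: solve the l1 program (10) for every point and build C and C_tilde.
# Illustrative only; toy data from two random 2-D subspaces in R^10.
import numpy as np
import cvxpy as cp

np.random.seed(2)
D, n_per = 10, 15
bases = [np.linalg.qr(np.random.randn(D, 2))[0] for _ in range(2)]
Y = np.hstack([B @ np.random.randn(2, n_per) for B in bases])   # D x N data matrix
N = Y.shape[1]

C = np.zeros((N, N))
for i in range(N):
    Y_hat_i = np.delete(Y, i, axis=1)          # Y with its i-th column removed
    c = cp.Variable(N - 1)
    cp.Problem(cp.Minimize(cp.norm1(c)), [Y_hat_i @ c == Y[:, i]]).solve()
    C[:, i] = np.insert(c.value, i, 0.0)       # re-insert a zero at the i-th row

C_tilde = np.abs(C) + np.abs(C).T              # symmetrized adjacency: |C_ij| + |C_ji|
```

Note that this solves $N$ small convex programs, one per data point, which is the main computational cost of the approach.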
Having formed the similarity graph $\tilde{G}$, it follows from Theorem 1 that all vertices representing the data points in the same subspace form a connected component in the graph, while the vertices representing points in different subspaces have no edges between them. Therefore, in the case of $n$ subspaces, $\tilde{C}$ has the following block diagonal form
$$\tilde{C} = \begin{bmatrix} \tilde{C}_1 & 0 & \cdots & 0 \\ 0 & \tilde{C}_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \tilde{C}_n \end{bmatrix} \Gamma \quad (11)$$
where $\Gamma$ is a permutation matrix. The Laplacian matrix of $\tilde{G}$ is then formed by $L = D - \tilde{C}$, where $D \in \mathbb{R}^{N \times N}$ is a diagonal matrix with $D_{ii} = \sum_j \tilde{C}_{ij}$.
We use the following result from spectral graph theory
to infer the segmentation of the data by applying K-means
to a subset of eigenvectors of the Laplacian.
Proposition 1. The multiplicity of the zero eigenvalue of the Laplacian matrix $L$ corresponding to the graph $\tilde{G}$ is equal to the number of connected components of the graph. Also, the components of the graph can be determined from the eigenspace of the zero eigenvalue. More precisely, if the graph has $n$ connected components, then $u_i = [0, 0, \ldots, \mathbf{1}_{N_i}^\top, 0, \ldots, 0]\,\Gamma$ for $i \in \{1, 2, \ldots, n\}$ is the $i$-th eigenvector of $L$ corresponding to the zero eigenvalue, which means that the $N_i$ nonzero elements of $u_i$ belong to the same group.
For data points drawn in general position from $n$ independent linear subspaces, the similarity graph $\tilde{G}$ will have $n$ connected components. Therefore, when the number of subspaces is unknown, we can estimate it as the number of zero eigenvalues of $L$. In the case of real data with noise, we have to consider a robust measure to determine the number of eigenvalues of $L$ close to zero.
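Continuing the sketch above, one possible way to implement this step, assuming scipy and scikit-learn and a simple eigenvalue threshold as a crude stand-in for the robust measure mentioned here, is:

```python
# Sketch: estimate the number of subspaces from (near-)zero eigenvalues of L
# and segment the data with K-means on the corresponding eigenvectors.
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_segmentation(C_tilde, eig_tol=1e-6):
    Dg = np.diag(C_tilde.sum(axis=1))
    L = Dg - C_tilde                              # graph Laplacian L = D - C_tilde
    eigvals, eigvecs = eigh(L)                    # eigenvalues in ascending order
    n = max(int(np.sum(eigvals < eig_tol)), 1)    # estimated number of subspaces
    U = eigvecs[:, :n]                            # eigenvectors of the (near-)null space
    labels = KMeans(n_clusters=n, n_init=10).fit_predict(U)
    return n, labels
```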
3.2. Clustering affine subspaces
In many cases we need to cluster data lying in multiple
affine rather than linear subspaces. For instance, the motion
segmentation problem we will discuss in the next section in-
volves clustering data lying in multiple 3-dimensional affine
subspaces. However, most existing motion segmentation al-
gorithms deal with this problem by clustering the data as if
they belonged to multiple 4-dimensional linear subspaces.
In this section, we show that our method can easily handle the case of affine subspaces by a simple modification to the BP algorithm. The modified $\ell_1$ minimization is still a convex optimization, which can be efficiently implemented. More specifically, notice that in the case of affine subspaces,

a point can no longer write itself as a linear combination of
points in the same subspace. However, we can still write a
point y as an affine combination of other points, i.e.
$$y = c_1 y_1 + c_2 y_2 + \cdots + c_N y_N, \quad \sum_{i=1}^{N} c_i = 1. \quad (12)$$
Theorem 2 shows that one can recover the sparse representation of data points on an affine subspace by using the following modified Basis Pursuit algorithm
$$\min \|c\|_1 \quad \text{subject to} \quad y = Y c \ \text{ and } \ c^\top \mathbf{1} = 1. \quad (13)$$
Theorem 2. Let $Y \in \mathbb{R}^{D \times N}$ be a matrix whose columns are drawn from a union of $n$ independent$^3$ affine subspaces. Assume that the points within each subspace are in general position. Let $y$ be a new point in subspace $i$. The solution to the $\ell_1$ problem in (13), $s = \Gamma^{-1}[s_1^\top, s_2^\top, \cdots, s_n^\top]^\top \in \mathbb{R}^N$, is block sparse, i.e. $s_i \neq 0$ and $s_j = 0$ for all $j \neq i$.
Proof. Analogous to that of Theorem 1.
Similar to what we did for linear subspaces, we can use this result for clustering a collection of data points drawn from $n$ affine subspaces. Essentially, we solve the following $\ell_1$ minimization problem for each data point $y_i$
$$\min \|c_i\|_1 \quad \text{subject to} \quad y_i = Y_{\hat{i}}\, c_i \ \text{ and } \ c_i^\top \mathbf{1} = 1, \quad (14)$$
and form the graph $\tilde{G}$ from the sparse coefficients. We then apply spectral clustering to the corresponding Laplacian matrix in order to get the segmentation of the data.
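In code, the only change with respect to the linear program (10) is the additional affine constraint; an illustrative sketch (again assuming cvxpy, with a hypothetical helper name) is:

```python
# Sketch of the affine version (14): program (10) plus the constraint sum(c) = 1.
# Illustrative only, not the authors' code.
import numpy as np
import cvxpy as cp

def sparse_affine_coefficients(Y, i):
    """l1-minimal affine representation of the i-th column of Y by the other columns."""
    N = Y.shape[1]
    Y_hat_i = np.delete(Y, i, axis=1)
    c = cp.Variable(N - 1)
    constraints = [Y_hat_i @ c == Y[:, i], cp.sum(c) == 1]
    cp.Problem(cp.Minimize(cp.norm1(c)), constraints).solve()
    return np.insert(c.value, i, 0.0)
```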
3.3. Subspace clustering with noisy data
Consider now the case where the data points drawn from
a collection of linear or affine subspaces are contaminated
with noise. More specifically, let $\bar{y}_i = y_i + \zeta_i$ be the $i$-th data point corrupted with noise $\zeta_i$ bounded by $\|\zeta_i\|_2 \leq \epsilon$.
In order to recover the sparse representation of $\bar{y}_i$, we can look for the sparsest solution of $\bar{y}_i = Y_{\hat{i}}\, c_i$ with an error of at most $\epsilon$, i.e. $\|Y_{\hat{i}}\, c_i - \bar{y}_i\|_2 \leq \epsilon$. We can find such a sparse representation by solving the following problem
$$\min \|c_i\|_1 \quad \text{subject to} \quad \|Y_{\hat{i}}\, c_i - \bar{y}_i\|_2 \leq \epsilon. \quad (15)$$
# beforehand. In such cases we can use the Lasso optimiza-
tion algorithm [23] to recover the sparse solution from
min #c
i
#
1
+ γ #Y
ˆ
i
c
i
¯
y
i
#
2
(16)
where γ > 0 is a constant. In the case data drawn from mul-
tiple affine subspaces and corrupted with noise, the sparse
representation can be obtained by solving the problem
min#c
i
#
1
subject to #Y
ˆ
i
c
i
¯
y
i
#
2
# and c
!
i
1 =1 (17)
3
A collection of affine subspaces is said to be independent if they are
independent as linear subspaces in homogeneous coordinates.
or the modified Lasso counterpart
min #c
i
#
1
+ γ #Y
ˆ
i
c
i
¯
y
i
#
2
subject to c
!
i
1 =1. (18)
Segmentation of the data into different subspaces then fol-
lows by applying spectral clustering to the Laplacian of
˜
G.
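An illustrative sketch of the Lasso-style program (18), with $\gamma$ as a made-up value that would have to be tuned in practice (the helper name is hypothetical):

```python
# Sketch of the noisy affine case (18): trade off sparsity against the residual
# ||Y_hat_i c - y_bar_i||_2 with weight gamma, keeping the affine constraint.
# Illustrative only; gamma is an arbitrary placeholder value.
import numpy as np
import cvxpy as cp

def noisy_sparse_coefficients(Y, i, gamma=20.0):
    N = Y.shape[1]
    Y_hat_i = np.delete(Y, i, axis=1)
    c = cp.Variable(N - 1)
    objective = cp.norm1(c) + gamma * cp.norm(Y_hat_i @ c - Y[:, i], 2)
    cp.Problem(cp.Minimize(objective), [cp.sum(c) == 1]).solve()
    return np.insert(c.value, i, 0.0)
```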
3.4. Clustering with missing or corrupted data
In practice, some of the entries of the data points may be
missing (incomplete data), or corrupted (outliers). In mo-
tion segmentation, for example, due to occlusions or lim-
itations of the tracker, we may lose some feature points in some of the frames (missing entries), or the tracker may lose track of some features, leading to gross errors. As
suggested in [21], we can fill in missing entries or correct
gross errors using sparse techniques. In this section, we
show that one can also cluster data points with missing or
corrupted entries using a sparse representation.
Let $I_i \subset \{1, \ldots, D\}$ denote the indices of missing entries in $y_i \in \mathbb{R}^D$. Let $Y_{\hat{i}} \in \mathbb{R}^{D \times (N-1)}$ be obtained by eliminating the vector $y_i$ from the $i$-th column of the data matrix $Y$. We then form $\tilde{y}_i \in \mathbb{R}^{D - |I_i|}$ and $\tilde{Y}_{\hat{i}} \in \mathbb{R}^{(D - |I_i|) \times (N-1)}$ by eliminating the rows of $y_i$ and $Y_{\hat{i}}$ indexed by $I_i$, respectively. Assuming that $\tilde{Y}_{\hat{i}}$ is complete, we can find a sparse representation, $c_i$, for $\tilde{y}_i$ by solving the following problem
$$\min \|c_i\|_1 + \gamma \|\tilde{Y}_{\hat{i}}\, c_i - \tilde{y}_i\|_2 \quad \text{subject to} \quad c_i^\top \mathbf{1} = 1. \quad (19)$$
The missing entries of $y_i$ are then given by $y_i = Y_{\hat{i}}\, c_i$. Notice that this method for completion of missing data is essentially the same as our method for computing the sparse representation from complete data with noise in (18). Hence we can use the sparse coefficient vectors $\{c_i\}_{i=1}^{N}$ to build the similarity graph and find the segmentation of the data.
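A possible sketch of (19), assuming the indices of the missing entries are known for each point; the helper name and the value of $\gamma$ are hypothetical:

```python
# Sketch of the missing-data program (19): drop the rows indexed by I_i from both
# y_i and Y_hat_i, solve the noisy affine program on the remaining rows, then
# fill in the point via y_i = Y_hat_i c_i. Illustrative only.
import numpy as np
import cvxpy as cp

def complete_point(Y, i, missing_idx, gamma=20.0):
    N = Y.shape[1]
    Y_hat_i = np.delete(Y, i, axis=1)
    keep = np.setdiff1d(np.arange(Y.shape[0]), missing_idx)   # observed rows
    c = cp.Variable(N - 1)
    objective = cp.norm1(c) + gamma * cp.norm(Y_hat_i[keep] @ c - Y[keep, i], 2)
    cp.Problem(cp.Minimize(objective), [cp.sum(c) == 1]).solve()
    y_filled = Y_hat_i @ c.value                               # completed data point
    return y_filled, np.insert(c.value, i, 0.0)
```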
Assume now that a few entries of each data point are cor-
rupted. We can also use the sparse representation to correct
such entries. More precisely, let $\tilde{y}_i \in \mathbb{R}^D$ be a corrupted vector obtained from $\bar{y}_i = y_i + \zeta_i$ by adding a sparse error vector $e_i \in \mathbb{R}^D$ as $\tilde{y}_i = y_i + \zeta_i + e_i$. We can then write
$$\tilde{y}_i = Y_{\hat{i}}\, c_i + e_i = [Y_{\hat{i}} \ \ I_D] \begin{bmatrix} c_i \\ e_i \end{bmatrix} + \zeta_i, \quad (20)$$
where the coefficient vector $[c_i^\top, e_i^\top]^\top$ is still sparse, and hence can be recovered from
$$\min \left\| \begin{bmatrix} c_i \\ e_i \end{bmatrix} \right\|_1 + \gamma \left\| \tilde{y}_i - [Y_{\hat{i}} \ \ I_D] \begin{bmatrix} c_i \\ e_i \end{bmatrix} \right\|_2 \quad \text{subject to} \quad c_i^\top \mathbf{1} = 1.$$
We can then recover the original vector without outliers as $y_i = Y_{\hat{i}}\, c_i$. As before, we can obtain the segmentation from the sparse coefficients $\{c_i\}_{i=1}^{N}$ using spectral clustering.
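A sketch of the corrupted-data formulation (20), where the dictionary is augmented with the identity $I_D$ so that gross errors are absorbed by a sparse vector $e_i$; the function name and $\gamma$ are illustrative choices, not part of the paper:

```python
# Sketch of the corrupted-data case (20): augment the dictionary with I_D so that
# sparse gross errors are captured by e_i. Illustrative only.
import numpy as np
import cvxpy as cp

def robust_sparse_coefficients(Y, i, gamma=20.0):
    D, N = Y.shape
    Y_hat_i = np.delete(Y, i, axis=1)
    c = cp.Variable(N - 1)
    e = cp.Variable(D)                       # sparse error (outlier) vector
    residual = Y[:, i] - (Y_hat_i @ c + e)   # y_i - [Y_hat_i  I_D][c; e]
    objective = cp.norm1(cp.hstack([c, e])) + gamma * cp.norm(residual, 2)
    cp.Problem(cp.Minimize(objective), [cp.sum(c) == 1]).solve()
    y_clean = Y_hat_i @ c.value              # corrected point without outliers
    return y_clean, np.insert(c.value, i, 0.0)
```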
In summary, we have the following Sparse Subspace
Clustering (SSC) algorithm for clustering data drawn from
a collection of linear/affine subspaces, and corrupted by
noise, missing entries, and outliers.

References
Regression Shrinkage and Selection via the Lasso
Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography
Robust Face Recognition via Sparse Representation
Decoding by Linear Programming
Stable Signal Recovery from Incomplete and Inaccurate Measurements