Journal ArticleDOI

Robust Recovery of Subspace Structures by Low-Rank Representation

TL;DR: It is shown that the convex program associated with LRR solves the subspace clustering problem in the following sense: When the data is clean, LRR exactly recovers the true subspace structures; when the data are contaminated by outliers, it is proved that under certain conditions LRR can exactly recover the row space of the original data.
Abstract: In this paper, we address the subspace clustering problem. Given a set of data samples (vectors) approximately drawn from a union of multiple subspaces, our goal is to cluster the samples into their respective subspaces and remove possible outliers as well. To this end, we propose a novel objective function named Low-Rank Representation (LRR), which seeks the lowest rank representation among all the candidates that can represent the data samples as linear combinations of the bases in a given dictionary. It is shown that the convex program associated with LRR solves the subspace clustering problem in the following sense: When the data is clean, we prove that LRR exactly recovers the true subspace structures; when the data are contaminated by outliers, we prove that under certain conditions LRR can exactly recover the row space of the original data and detect the outlier as well; for data corrupted by arbitrary sparse errors, LRR can also approximately recover the row space with theoretical guarantees. Since the subspace membership is provably determined by the row space, these further imply that LRR can perform robust subspace clustering and error correction in an efficient and effective way.
Citations
Journal ArticleDOI
TL;DR: In this article, a sparse subspace clustering algorithm is proposed to cluster high-dimensional data points that lie in a union of low-dimensional subspaces, where a sparse representation corresponds to selecting a few points from the same subspace.
Abstract: Many real-world problems deal with collections of high-dimensional data, such as images, videos, text, and web documents, DNA microarray data, and more. Often, such high-dimensional data lie close to low-dimensional structures corresponding to several classes or categories to which the data belong. In this paper, we propose and study an algorithm, called sparse subspace clustering, to cluster data points that lie in a union of low-dimensional subspaces. The key idea is that, among the infinitely many possible representations of a data point in terms of other points, a sparse representation corresponds to selecting a few points from the same subspace. This motivates solving a sparse optimization program whose solution is used in a spectral clustering framework to infer the clustering of the data into subspaces. Since solving the sparse optimization program is in general NP-hard, we consider a convex relaxation and show that, under appropriate conditions on the arrangement of the subspaces and the distribution of the data, the proposed minimization program succeeds in recovering the desired sparse representations. The proposed algorithm is efficient and can handle data points near the intersections of subspaces. Another key advantage of the proposed algorithm with respect to the state of the art is that it can deal directly with data nuisances, such as noise, sparse outlying entries, and missing entries, by incorporating the model of the data into the sparse optimization program. We demonstrate the effectiveness of the proposed algorithm through experiments on synthetic data as well as the two real-world problems of motion segmentation and face clustering.

2,298 citations

Posted Content
TL;DR: This paper proposes and studies an algorithm, called sparse subspace clustering, to cluster data points that lie in a union of low-dimensional subspaces, and demonstrates the effectiveness of the proposed algorithm through experiments on synthetic data as well as the two real-world problems of motion segmentation and face clustering.
Abstract: In many real-world problems, we are dealing with collections of high-dimensional data, such as images, videos, text and web documents, DNA microarray data, and more. Often, high-dimensional data lie close to low-dimensional structures corresponding to several classes or categories the data belongs to. In this paper, we propose and study an algorithm, called Sparse Subspace Clustering (SSC), to cluster data points that lie in a union of low-dimensional subspaces. The key idea is that, among infinitely many possible representations of a data point in terms of other points, a sparse representation corresponds to selecting a few points from the same subspace. This motivates solving a sparse optimization program whose solution is used in a spectral clustering framework to infer the clustering of data into subspaces. Since solving the sparse optimization program is in general NP-hard, we consider a convex relaxation and show that, under appropriate conditions on the arrangement of subspaces and the distribution of data, the proposed minimization program succeeds in recovering the desired sparse representations. The proposed algorithm can be solved efficiently and can handle data points near the intersections of subspaces. Another key advantage of the proposed algorithm with respect to the state of the art is that it can deal with data nuisances, such as noise, sparse outlying entries, and missing entries, directly by incorporating the model of the data into the sparse optimization program. We demonstrate the effectiveness of the proposed algorithm through experiments on synthetic data as well as the two real-world problems of motion segmentation and face clustering.

1,521 citations


Cites methods from "Robust Recovery of Subspace Structu..."

  • ...We have used the correct code for computing the misclassification rate and, as a result, the reported performance for LRR-H is different from the published results in [38] and [40]....

    [...]

  • ...Using advances in sparse [29], [30], [31] and low-rank [32], [33], [34] recovery algorithms, the Sparse Subspace Clustering (SSC) [35], [36], [37], Low-Rank Recovery (LRR) [38], [39], [40], and Low-Rank Subspace Clustering (LRSC) [41] algorithms pose the clustering problem as one of finding a sparse or low-rank representation of the data in the dictionary of the data itself....

    [...]

  • ...However, the code of the algorithm applies a heuristic postprocessing step, similar to [65], to the low-rank solution prior to building the similarity graph [40]....

    [...]

Journal ArticleDOI
TL;DR: In GNMF, an affinity graph is constructed to encode the geometrical information and a matrix factorization is sought, which respects the graph structure, and the empirical study shows encouraging results of the proposed algorithm in comparison to the state-of-the-art algorithms on real-world problems.
Abstract: Recently, multiple graph regularizer based methods have shown promising performances in data representation. However, the parameter choice of the regularizer is crucial to the performance of clustering and its optimal value changes for different real datasets. To deal with this problem, we propose a novel method called Parameter-less Auto-weighted Multiple Graph regularized Nonnegative Matrix Factorization (PAMGNMF) in this paper. PAMGNMF employs the linear combination of multiple simple graphs to approximate the manifold structure of data as previous methods do. Moreover, the proposed method can automatically learn an optimal weight for each graph without introducing an additive parameter. Therefore, the proposed PAMGNMF method is easily applied to practical problems. Extensive experimental results on different real-world datasets have demonstrated that the proposed method achieves better performance than the state-of-the-art approaches.

1,082 citations

Journal ArticleDOI
TL;DR: A survey of domain adaptation methods for visual recognition discusses the merits and drawbacks of existing domain adaptation approaches and identifies promising avenues for research in this rapidly evolving field.
Abstract: In pattern recognition and computer vision, one is often faced with scenarios where the training data used to learn a model have different distribution from the data on which the model is applied. Regardless of the cause, any distributional change that occurs after learning a classifier can degrade its performance at test time. Domain adaptation tries to mitigate this degradation. In this article, we provide a survey of domain adaptation methods for visual recognition. We discuss the merits and drawbacks of existing domain adaptation approaches and identify promising avenues for research in this rapidly evolving field.

871 citations

Proceedings ArticleDOI
06 Nov 2011
TL;DR: This paper proposes to construct the dictionary by using both observed and unobserved, hidden data, and shows that the effects of the hidden data can be approximately recovered by solving a nuclear norm minimization problem, which is convex and can be solved efficiently.
Abstract: Low-Rank Representation (LRR) [16, 17] is an effective method for exploring the multiple subspace structures of data. Usually, the observed data matrix itself is chosen as the dictionary, which is a key aspect of LRR. However, such a strategy may depress the performance, especially when the observations are insufficient and/or grossly corrupted. In this paper we therefore propose to construct the dictionary by using both observed and unobserved, hidden data. We show that the effects of the hidden data can be approximately recovered by solving a nuclear norm minimization problem, which is convex and can be solved efficiently. The formulation of the proposed method, called Latent Low-Rank Representation (LatLRR), seamlessly integrates subspace segmentation and feature extraction into a unified framework, and thus provides us with a solution for both subspace segmentation and feature extraction. As a subspace segmentation algorithm, LatLRR is an enhanced version of LRR and outperforms the state-of-the-art algorithms. Being an unsupervised feature extraction algorithm, LatLRR is able to robustly extract salient features from corrupted data, and thus can work much better than the benchmark that utilizes the original data vectors as features for classification. Compared to dimension reduction based methods, LatLRR is more robust to noise.

656 citations


Cites background or methods from "Robust Recovery of Subspace Structu..."

  • ...Low-Rank Representation (LRR) [16, 17] is an effective method for exploring the multiple subspace structures of data....

    [...]

  • ...For subspace segmentation, the observed data matrix itself is usually used as the dictionary [16, 17, 24], resulting in the following convex optimization problem:...

    [...]

  • ...3 of [16], problem (2) has a unique minimizer...

    [...]

  • ...[16, 17] show that the optimal solution, denoted as Z O, to the above problem is the widely used Shape Interaction Matrix (SIM) [2],...

    [...]

  • ..., [2, 3, 16, 17]) are able to produce exactly correct segmentation results....

    [...]
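The excerpts above state that, with the observed data matrix used as the dictionary, the convex problem has a unique minimizer, the Shape Interaction Matrix (SIM). The following is a minimal NumPy sketch of that clean-data case. It assumes the problem is min ||Z||_* subject to X = XZ and that the minimizer is V V^T, where X = U S V^T is the skinny SVD; the function name `lrr_clean`, the rank tolerance, and the synthetic data are illustrative choices, not taken from the paper.

```python
import numpy as np

def lrr_clean(X, tol=1e-10):
    """Return V V^T from the skinny SVD of X (columns of X are samples).

    Assumption: noise-free data, so the numerical rank can be read off
    the singular values with a relative tolerance.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))  # numerical rank of X
    V = Vt[:r].T                     # orthonormal basis of the row space
    return V @ V.T

# Hypothetical data: two independent 2-dimensional subspaces in R^6,
# with 5 samples drawn from each.
rng = np.random.default_rng(0)
B1 = rng.standard_normal((6, 2))
B2 = rng.standard_normal((6, 2))
X = np.hstack([B1 @ rng.standard_normal((2, 5)),
               B2 @ rng.standard_normal((2, 5))])

Z = lrr_clean(X)
cross_block = np.abs(Z[:5, 5:]).max()  # affinity between the two groups
```

For independent subspaces, the entries of Z that link samples from different subspaces vanish up to rounding, so |Z| can serve as an affinity matrix for a later spectral clustering step.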

References
Book
01 Jan 1983

34,729 citations

Journal ArticleDOI
TL;DR: New results are derived on the minimum number of landmarks needed to obtain a solution, and algorithms are presented for computing these minimum-landmark solutions in closed form that provide the basis for an automatic system that can solve the Location Determination Problem under difficult viewing.
Abstract: A new paradigm, Random Sample Consensus (RANSAC), for fitting a model to experimental data is introduced. RANSAC is capable of interpreting/smoothing data containing a significant percentage of gross errors, and is thus ideally suited for applications in automated image analysis where interpretation is based on the data provided by error-prone feature detectors. A major portion of this paper describes the application of RANSAC to the Location Determination Problem (LDP): Given an image depicting a set of landmarks with known locations, determine that point in space from which the image was obtained. In response to a RANSAC requirement, new results are derived on the minimum number of landmarks needed to obtain a solution, and algorithms are presented for computing these minimum-landmark solutions in closed form. These results provide the basis for an automatic system that can solve the LDP under difficult viewing

23,396 citations
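
The RANSAC idea summarized in the abstract above (fit a model to a minimal random sample, then keep the model with the largest consensus set) can be sketched as follows. This is a toy line-fitting example under stated assumptions, not the Location Determination Problem from the paper; the function name `ransac_line`, the threshold, the iteration count, and the data are all hypothetical. The usual final refit on the consensus set is omitted for brevity.

```python
import random

def ransac_line(points, n_iters=200, threshold=0.1, seed=0):
    """Fit y = a*x + b by RANSAC; returns (a, b, inliers)."""
    rng = random.Random(seed)
    best = (0.0, 0.0, [])
    for _ in range(n_iters):
        # Minimal sample: two points determine a line.
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical line, not representable as y = a*x + b
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        # Consensus set: points whose vertical residual is within the threshold.
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) <= threshold]
        if len(inliers) > len(best[2]):
            best = (a, b, inliers)
    return best

# Hypothetical data: 20 points on y = 2x + 1 plus 5 gross outliers.
points = [(float(x), 2.0 * x + 1.0) for x in range(20)]
points += [(3.0, 40.0), (7.0, -25.0), (11.0, 90.0), (15.0, -60.0), (18.0, 5.0)]

a, b, inliers = ransac_line(points)
```

Because each candidate model is built from only two points, gross errors affect the fit only when an outlier happens to be sampled, and such candidates gather small consensus sets.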

Journal ArticleDOI
TL;DR: This work treats image segmentation as a graph partitioning problem and proposes a novel global criterion, the normalized cut, for segmenting the graph, which measures both the total dissimilarity between the different groups as well as the total similarity within the groups.
Abstract: We propose a novel approach for solving the perceptual grouping problem in vision. Rather than focusing on local features and their consistencies in the image data, our approach aims at extracting the global impression of an image. We treat image segmentation as a graph partitioning problem and propose a novel global criterion, the normalized cut, for segmenting the graph. The normalized cut criterion measures both the total dissimilarity between the different groups as well as the total similarity within the groups. We show that an efficient computational technique based on a generalized eigenvalue problem can be used to optimize this criterion. We applied this approach to segmenting static images, as well as motion sequences, and found the results to be very encouraging.

13,789 citations


"Robust Recovery of Subspace Structu..." refers methods in this paper

  • ...Finally, we could use the spectral clustering algorithms such as Normalized Cuts (NCut) [26] to segment the data samples into a given number k of clusters....

    [...]

  • ...As a data clustering problem, subspace segmentation can be done by firstly learning an affinity matrix from the given data, and then obtaining the final segmentation results by spectral clustering algorithms such as Normalized Cuts (NCut) [26]....

    [...]
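The normalized cut criterion used in the excerpts above scores a two-way partition (A, B) of a weighted graph as cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V), where cut sums the affinities across the partition and assoc sums the affinities from a group to all nodes. The sketch below only evaluates this criterion for a given partition; it does not perform the eigenvector-based optimization. The function name `ncut_value` and the 4-node affinity matrix are hypothetical.

```python
def ncut_value(W, A):
    """Normalized cut of the partition (A, complement of A).

    W is a symmetric affinity matrix given as a list of lists;
    A is a collection of node indices.
    """
    n = len(W)
    A = set(A)
    B = set(range(n)) - A
    cut = sum(W[i][j] for i in A for j in B)
    assoc_A = sum(W[i][j] for i in A for j in range(n))
    assoc_B = sum(W[i][j] for i in B for j in range(n))
    return cut / assoc_A + cut / assoc_B

# Hypothetical affinity: nodes {0, 1} and {2, 3} are strongly linked
# within each pair and weakly linked across pairs.
W = [
    [0.0, 1.0, 0.1, 0.0],
    [1.0, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 1.0],
    [0.0, 0.1, 1.0, 0.0],
]

good = ncut_value(W, {0, 1})  # separates the two strongly linked pairs
bad = ncut_value(W, {0, 2})   # splits both strongly linked pairs
```

A lower value indicates a better partition, so the split that keeps the strongly linked pairs together scores lower than the one that breaks them apart.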

Journal ArticleDOI
TL;DR: This work considers the problem of automatically recognizing human faces from frontal views with varying expression and illumination, as well as occlusion and disguise, and proposes a general classification algorithm for (image-based) object recognition based on a sparse representation computed by ℓ1-minimization.
Abstract: We consider the problem of automatically recognizing human faces from frontal views with varying expression and illumination, as well as occlusion and disguise. We cast the recognition problem as one of classifying among multiple linear regression models and argue that new theory from sparse signal representation offers the key to addressing this problem. Based on a sparse representation computed by ℓ1-minimization, we propose a general classification algorithm for (image-based) object recognition. This new framework provides new insights into two crucial issues in face recognition: feature extraction and robustness to occlusion. For feature extraction, we show that if sparsity in the recognition problem is properly harnessed, the choice of features is no longer critical. What is critical, however, is whether the number of features is sufficiently large and whether the sparse representation is correctly computed. Unconventional features such as downsampled images and random projections perform just as well as conventional features such as eigenfaces and Laplacianfaces, as long as the dimension of the feature space surpasses certain threshold, predicted by the theory of sparse representation. This framework can handle errors due to occlusion and corruption uniformly by exploiting the fact that these errors are often sparse with respect to the standard (pixel) basis. The theory of sparse representation helps predict how much occlusion the recognition algorithm can handle and how to choose the training images to maximize robustness to occlusion. We conduct extensive experiments on publicly available databases to verify the efficacy of the proposed algorithm and corroborate the above claims.

9,658 citations


"Robust Recovery of Subspace Structu..." refers background in this paper

  • ...near) subspaces are possibly the most common choice, mainly because they are easy to compute and often effective in real applications. Several types of visual data, such as motion [1], [2], [3], face [4] and texture [5], have been known to be well characterized by subspaces. Moreover, by applying the concept of reproducing kernel Hilbert space [6], one can easily extend the linear models to handle no...

    [...]

  • ...) data into clusters with each cluster corresponding to a subspace. Subspace segmentation is an important data clustering problem and arises in numerous research areas, including computer vision [3], [4], [10], [11], image processing [5], [12], [13], machine learning [14], [15] and system identification [16]. When the data is clean, i.e., the samples are strictly drawn from the subspaces, several exis...

    [...]

Book
01 Nov 1996

8,608 citations


"Robust Recovery of Subspace Structu..." refers methods in this paper

  • ...mplexity to O(n^2) because it is unnecessary to compute the singular values/vectors that will be shrunk to zeros. Step 2 can also be made efficient by using the preconditioned conjugate gradient method [38]. We leave these as future work. D. Discussions. Algorithm 2: Solving Problem (13) by Inexact ALM. Input: data matrix X, parameter λ. Initialize: Z = J = 0, E = 0, Y1 = 0, Y2 = 0, Y3 = 0, µ = 10^-6, max u = 1...

    [...]
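
The excerpt above refers to singular values that are "shrunk to zeros". In nuclear-norm solvers this shrinkage is commonly done by singular value thresholding, the proximal operator of the nuclear norm: subtract a threshold from each singular value and clip at zero. The sketch below shows that operator only. That the algorithm in the excerpt uses it in exactly this form is an assumption, and the function name `svt`, the threshold value, and the test matrix are hypothetical.

```python
import numpy as np

def svt(A, tau):
    """Singular value thresholding: U * max(S - tau, 0) * V^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    keep = s_shrunk > 0  # components shrunk to zero contribute nothing
    return (U[:, keep] * s_shrunk[keep]) @ Vt[keep]

# Hypothetical input with singular values 3, 1 and 0.2.
A = np.diag([3.0, 1.0, 0.2])
out = svt(A, 0.5)
rank_after = np.linalg.matrix_rank(out)
```

Only the singular triplets that survive the threshold are needed to form the result, which is why a partial SVD can replace the full decomposition used here.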