Posted Content

Fast L1-Minimization Algorithms For Robust Face Recognition

TL;DR: This study focuses on the numerical implementation of a sparsity-based classification framework in robust face recognition, where sparse representation is sought to recover human identities from very high-dimensional facial images that may be corrupted by illumination, facial disguise, and pose variation.
Abstract: L1-minimization refers to finding the minimum L1-norm solution to an underdetermined linear system b=Ax. Under certain conditions as described in compressive sensing theory, the minimum L1-norm solution is also the sparsest solution. In this paper, our study addresses the speed and scalability of its algorithms. In particular, we focus on the numerical implementation of a sparsity-based classification framework in robust face recognition, where sparse representation is sought to recover human identities from very high-dimensional facial images that may be corrupted by illumination, facial disguise, and pose variation. Although the underlying numerical problem is a linear program, traditional algorithms are known to suffer poor scalability for large-scale applications. We investigate a new solution based on a classical convex optimization framework, known as Augmented Lagrangian Methods (ALM). The new convex solvers provide a viable solution to real-world, time-critical applications such as face recognition. We conduct extensive experiments to validate and compare the performance of the ALM algorithms against several popular L1-minimization solvers, including interior-point method, Homotopy, FISTA, SESOP-PCD, approximate message passing (AMP) and TFOCS. To aid peer evaluation, the code for all the algorithms has been made publicly available.
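To make the ALM approach above concrete, here is a minimal NumPy sketch in the spirit of an inexact primal augmented Lagrangian loop for min ||x||_1 subject to Ax = b, with the inner subproblem handled by proximal-gradient (soft-thresholding) steps. This is not the authors' released implementation; the parameter names and settings (mu, rho, iteration counts, tolerance) are illustrative assumptions.

import numpy as np

def soft_threshold(v, t):
    # Entrywise soft-thresholding, i.e. the proximal operator of t*||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def alm_l1(A, b, mu=1.0, rho=1.5, outer_iters=50, inner_iters=200, tol=1e-6):
    # Inexact ALM sketch for: minimize ||x||_1  subject to  Ax = b.
    # Inner subproblem: minimize ||x||_1 + (mu/2)*||Ax - b + lam/mu||^2, solved by ISTA steps.
    m, n = A.shape
    x = np.zeros(n)
    lam = np.zeros(m)                       # Lagrange multiplier estimate
    L = np.linalg.norm(A, 2) ** 2           # spectral norm squared bounds the gradient's Lipschitz constant
    for _ in range(outer_iters):
        for _ in range(inner_iters):
            grad = mu * (A.T @ (A @ x - b + lam / mu))
            x_new = soft_threshold(x - grad / (mu * L), 1.0 / (mu * L))
            if np.linalg.norm(x_new - x) <= tol * max(1.0, np.linalg.norm(x)):
                x = x_new
                break
            x = x_new
        lam = lam + mu * (A @ x - b)        # multiplier (dual) update
        mu *= rho                           # monotone penalty schedule
        if np.linalg.norm(A @ x - b) <= tol * np.linalg.norm(b):
            break
    return x

# Illustrative usage on a synthetic sparse recovery instance.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100)
x_true[rng.choice(100, 5, replace=False)] = rng.standard_normal(5)
x_hat = alm_l1(A, A @ x_true)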
Citations
Journal ArticleDOI
TL;DR: A comprehensive overview of sparse representation is provided and an experimentally comparative study of these sparse representation algorithms was presented, which could sufficiently reveal the potential nature of the sparse representation theory.
Abstract: Sparse representation has attracted much attention from researchers in fields of signal processing, image processing, computer vision, and pattern recognition. Sparse representation also has a good reputation in both theoretical research and practical applications. Many different algorithms have been proposed for sparse representation. The main purpose of this paper is to provide a comprehensive study and an updated review on sparse representation and to supply guidance for researchers. The taxonomy of sparse representation methods can be studied from various viewpoints. For example, in terms of different norm minimizations used in sparsity constraints, the methods can be roughly categorized into five groups: 1) sparse representation with $l_{0}$-norm minimization; 2) sparse representation with $l_{p}$-norm ($0 < p < 1$) minimization; 3) sparse representation with $l_{1}$-norm minimization; 4) sparse representation with $l_{2,1}$-norm minimization; and 5) sparse representation with $l_{2}$-norm minimization. In this paper, a comprehensive overview of sparse representation is provided. The available sparse representation algorithms can also be empirically categorized into four groups: 1) greedy strategy approximation; 2) constrained optimization; 3) proximity algorithm-based optimization; and 4) homotopy algorithm-based sparse representation. The rationales of different algorithms in each category are analyzed, and a wide range of sparse representation applications are summarized, which reveals the potential of the sparse representation theory. In particular, an experimentally comparative study of these sparse representation algorithms is presented.

925 citations

Journal ArticleDOI
TL;DR: A survey of domain adaptation methods for visual recognition discusses the merits and drawbacks of existing domain adaptation approaches and identifies promising avenues for research in this rapidly evolving field.
Abstract: In pattern recognition and computer vision, one is often faced with scenarios where the training data used to learn a model have different distribution from the data on which the model is applied. Regardless of the cause, any distributional change that occurs after learning a classifier can degrade its performance at test time. Domain adaptation tries to mitigate this degradation. In this article, we provide a survey of domain adaptation methods for visual recognition. We discuss the merits and drawbacks of existing domain adaptation approaches and identify promising avenues for research in this rapidly evolving field.

871 citations


Cites methods from "Fast L1-Minimization Algorithms For..."

  • ...One can also adapt fast ℓ1 solvers for sparse coding [96], [97] rather than using greedy orthogonal matching pursuit algorithms....


Posted Content
TL;DR: Denoising-based approximate message passing (D-AMP) as mentioned in this paper integrates a wide class of denoisers within its iterations to improve the performance of compressed sensing (CS) reconstruction.
Abstract: A denoising algorithm seeks to remove noise, errors, or perturbations from a signal. Extensive research has been devoted to this arena over the last several decades, and as a result, today's denoisers can effectively remove large amounts of additive white Gaussian noise. A compressed sensing (CS) reconstruction algorithm seeks to recover a structured signal acquired using a small number of randomized measurements. Typical CS reconstruction algorithms can be cast as iteratively estimating a signal from a perturbed observation. This paper answers a natural question: How can one effectively employ a generic denoiser in a CS reconstruction algorithm? In response, we develop an extension of the approximate message passing (AMP) framework, called Denoising-based AMP (D-AMP), that can integrate a wide class of denoisers within its iterations. We demonstrate that, when used with a high performance denoiser for natural images, D-AMP offers state-of-the-art CS recovery performance while operating tens of times faster than competing methods. We explain the exceptional performance of D-AMP by analyzing some of its theoretical features. A key element in D-AMP is the use of an appropriate Onsager correction term in its iterations, which coerces the signal perturbation at each iteration to be very close to the white Gaussian noise that denoisers are typically designed to remove.
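As a rough illustration of the D-AMP recursion described above (pseudo-data formed from the current estimate, a pluggable denoiser, and an Onsager correction built from the denoiser's divergence), the sketch below plugs in a simple soft-thresholding denoiser and a one-probe Monte Carlo divergence estimate. The function names, the noise-level estimate, and the iteration count are illustrative assumptions rather than the paper's exact algorithm.

import numpy as np

def soft_denoiser(v, sigma):
    # Stand-in denoiser; D-AMP is designed to accept much stronger image denoisers here.
    return np.sign(v) * np.maximum(np.abs(v) - sigma, 0.0)

def mc_divergence(denoiser, v, sigma, eps=1e-3, rng=None):
    # One-probe Monte Carlo estimate of the divergence of the denoiser at v.
    rng = rng or np.random.default_rng()
    eta = rng.standard_normal(v.shape)
    return eta @ (denoiser(v + eps * eta, sigma) - denoiser(v, sigma)) / eps

def damp(y, A, iters=30, rng=None):
    # Sketch of a D-AMP-style iteration: denoise the pseudo-data, then form an
    # Onsager-corrected residual so the next pseudo-data behaves like signal plus white noise.
    m, n = A.shape
    x = np.zeros(n)
    z = y.copy()
    for _ in range(iters):
        pseudo_data = x + A.T @ z
        sigma = np.linalg.norm(z) / np.sqrt(m)          # crude effective-noise estimate
        x_new = soft_denoiser(pseudo_data, sigma)
        onsager = (z / m) * mc_divergence(soft_denoiser, pseudo_data, sigma, rng=rng)
        z = y - A @ x_new + onsager
        x = x_new
    return x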

337 citations

Proceedings ArticleDOI
01 Dec 2013
TL;DR: By extending the popular soft-thresholding operator, a generalized iterated shrinkage algorithm (GISA) for lp-norm non-convex sparse coding is proposed, which is theoretically more solid and can achieve more accurate solutions.
Abstract: In many sparse coding based image restoration and image classification problems, using non-convex lp-norm minimization (0 ≤ p < 1) can often obtain better results than the convex l1-norm minimization. A number of algorithms, e.g., iteratively reweighted least squares (IRLS), iteratively thresholding method (ITM-lp), and look-up table (LUT), have been proposed for non-convex lp-norm sparse coding, while some analytic solutions have been suggested for some specific values of p. In this paper, by extending the popular soft-thresholding operator, we propose a generalized iterated shrinkage algorithm (GISA) for lp-norm non-convex sparse coding. Unlike the analytic solutions, the proposed GISA algorithm is easy to implement, and can be adopted for solving non-convex sparse coding problems with arbitrary p values. Compared with LUT, GISA is more general and does not need to compute and store the look-up tables. Compared with IRLS and ITM-lp, GISA is theoretically more solid and can achieve more accurate solutions. Experiments on image restoration and sparse coding based face recognition are conducted to validate the performance of GISA.
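For intuition, the scalar problem at the heart of GISA is min over x of 0.5*(x - y)^2 + lam*|x|^p with 0 < p < 1, handled by a generalized shrinkage/thresholding (GST) operator. The sketch below follows the usual threshold-then-fixed-point recipe for that scalar problem; the threshold formula and iteration count are written from the standard GST description and should be read as an illustrative reconstruction, not the authors' code.

import numpy as np

def gst(y, lam, p, iters=10):
    # Generalized soft-thresholding sketch for min_x 0.5*(x - y)^2 + lam*|x|^p, 0 < p < 1,
    # applied entrywise to the array y.
    y = np.asarray(y, dtype=float)
    # Below this magnitude the minimizer is exactly zero.
    tau = (2.0 * lam * (1.0 - p)) ** (1.0 / (2.0 - p)) \
          + lam * p * (2.0 * lam * (1.0 - p)) ** ((p - 1.0) / (2.0 - p))
    x = np.zeros_like(y)
    mask = np.abs(y) > tau
    t = np.abs(y[mask])
    x_k = t.copy()
    # Fixed-point iteration x <- |y| - lam*p*x^(p-1), started at |y|.
    for _ in range(iters):
        x_k = t - lam * p * x_k ** (p - 1.0)
    x[mask] = np.sign(y[mask]) * x_k
    return x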

313 citations


Cites methods from "Fast L1-Minimization Algorithms For..."

  • ...In the experiment, we used the efficient augmented Lagrangian method (ALM) [40] to solve the original SRC model....


  • ...We denote by SRC-p, q the SRC method with 0 < q = p ≤ 1, and embed the proposed GISA into ALM to implement SRC-p, q for robust face recognition....


  • ...Then, by simply replacing the soft-thresholding operator in ALM by the proposed GST operator, we can embed the proposed GISA algorithm into the ALM method for solving the SRC model with arbitrary values of p and q....


Journal ArticleDOI
TL;DR: Wang et al. as discussed by the authors proposed a new face coding model, namely regularized robust coding (RRC), which could robustly regress a given signal with regularized regression coefficients by assuming that the coding residual and the coding coefficient are respectively independent and identically distributed.
Abstract: Recently the sparse representation based classification (SRC) has been proposed for robust face recognition (FR). In SRC, the testing image is coded as a sparse linear combination of the training samples, and the representation fidelity is measured by the l2-norm or l1-norm of the coding residual. Such a sparse coding model assumes that the coding residual follows a Gaussian or Laplacian distribution, which may not be effective enough to describe the coding residual in practical FR systems. Meanwhile, the sparsity constraint on the coding coefficients makes the computational cost of SRC very high. In this paper, we propose a new face coding model, namely regularized robust coding (RRC), which could robustly regress a given signal with regularized regression coefficients. By assuming that the coding residual and the coding coefficient are respectively independent and identically distributed, the RRC seeks a maximum a posteriori solution of the coding problem. An iteratively reweighted regularized robust coding (IR3C) algorithm is proposed to solve the RRC model efficiently. Extensive experiments on representative face databases demonstrate that the RRC is much more effective and efficient than state-of-the-art sparse representation based methods in dealing with face occlusion, corruption, lighting, and expression changes, etc.
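The iteratively reweighted idea can be sketched as alternating between a weighted, regularized regression and a pixel-weight update driven by the current coding residual. The loop below is only a schematic reading of that idea: the logistic weight function, its parameters, and the ridge regularizer are assumptions for illustration and do not reproduce the exact RRC/IR3C model.

import numpy as np

def ir3c_sketch(y, D, lam=0.1, mu=8.0, iters=10):
    # Schematic iteratively reweighted robust coding loop:
    #   (a) solve a weighted ridge regression for the coding vector,
    #   (b) downweight pixels with large coding residuals.
    m, n = D.shape
    w = np.ones(m)
    alpha = np.zeros(n)
    for _ in range(iters):
        Dw = D * w[:, None]                                  # rows of D scaled by the weights
        alpha = np.linalg.solve(D.T @ Dw + lam * np.eye(n), Dw.T @ y)
        e2 = (y - D @ alpha) ** 2                            # squared coding residuals
        delta = np.median(e2) + 1e-12                        # data-driven scale (illustrative)
        w = 1.0 / (1.0 + np.exp(mu * (e2 - delta) / delta))  # logistic weights: big residual -> small weight
    return alpha, w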

251 citations

References
Book
01 Mar 2004
TL;DR: This book focuses on recognizing convex optimization problems and then finding the most appropriate technique for solving them, and gives a comprehensive introduction to the subject.
Abstract: Convex optimization problems arise frequently in many different fields. A comprehensive introduction to the subject, this book shows in detail how such problems can be solved numerically with great efficiency. The focus is on recognizing convex optimization problems and then finding the most appropriate technique for solving them. The text contains many worked examples and homework exercises and will appeal to students, researchers and practitioners in fields such as engineering, computer science, mathematics, statistics, finance, and economics.

33,341 citations

Journal ArticleDOI
TL;DR: In comparative timings, the new algorithms are considerably faster than competing methods, can handle large problems, and can also deal efficiently with sparse features.
Abstract: We develop fast algorithms for estimation of generalized linear models with convex penalties. The models include linear regression, two-class logistic regression, and multinomial regression problems, while the penalties include l1 (the lasso), l2 (ridge regression), and mixtures of the two (the elastic net). The algorithms use cyclical coordinate descent, computed along a regularization path. The methods can handle large problems and can also deal efficiently with sparse features. In comparative timings we find that the new algorithms are considerably faster than competing methods.
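A bare-bones version of the cyclical coordinate descent idea for the lasso case (without the warm starts, regularization path, standardization, or sparse-matrix handling that make glmnet fast) might look like the sketch below; the objective scaling and stopping rule are illustrative choices.

import numpy as np

def soft(z, g):
    # Scalar soft-thresholding.
    return np.sign(z) * max(abs(z) - g, 0.0)

def cd_lasso(X, y, lam, sweeps=200, tol=1e-8):
    # Cyclical coordinate descent for: minimize (1/(2n))*||y - Xb||^2 + lam*||b||_1.
    n, p = X.shape
    b = np.zeros(p)
    r = y - X @ b                         # running residual
    curv = (X ** 2).sum(axis=0) / n       # per-coordinate quadratic coefficient
    for _ in range(sweeps):
        b_prev = b.copy()
        for j in range(p):
            r = r + X[:, j] * b[j]        # remove coordinate j from the fit
            rho = X[:, j] @ r / n
            b[j] = soft(rho, lam) / curv[j]
            r = r - X[:, j] * b[j]        # put it back with the updated coefficient
        if np.max(np.abs(b - b_prev)) < tol:
            break
    return b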

13,656 citations


"Fast L1-Minimization Algorithms For..." refers background in this paper

  • ...s impossible to discuss and compare all the existing methods in a single paper. Methods that are not discussed in this paper include GPSR [24], SpaRSA [25], SPGL1 [26], NESTA [15], SALSA [27], GLMNET [28], and Bregman iterative algorithm [29], just to name a few. Nevertheless, vast majority of the existing algorithms are variants of those benchmarked in this paper, and share many common properties wit...


Book
01 Jan 1995

12,671 citations


"Fast L1-Minimization Algorithms For..." refers background in this paper

  • ...agrangian of (30) given by $L_\mu(x, \lambda) = g(x) + \frac{\mu}{2}\|h(x)\|_2^2 + \lambda^T h(x)$, (31) where $\lambda \in \mathbb{R}^m$ is a vector of Lagrange multipliers. $L_\mu(\cdot, \cdot)$ is called the augmented Lagrangian function of (1). It has been shown in [64] that there exist $\lambda^* \in \mathbb{R}^m$ (not necessarily unique) and $\mu^* \in \mathbb{R}$ such that $x^* = \arg\min_x L_\mu(x, \lambda^*)$ for all $\mu > \mu^*$. (32) Thus, it is possible to find the optimal solution to (P1) by minimizing the augmented Lagrangi...


  • ...and $\lambda^*$, respectively, provided that $\{\lambda_k\}$ is a bounded sequence and $\{\mu_k\}$ is sufficiently large after a certain index. Furthermore, the convergence rate is linear as long as $\mu > \mu^*$, and superlinear if $\mu_k \to \infty$ [64]. Here, we point out that the choice of $\{\mu_k\}$ is problem-dependent. As shown in [64], increasing $\mu_k$ increases the ill-conditionedness or difficulty of minimizing $L_{\mu_k}(x, \lambda_k)$, and the degree of difficulty...


Journal ArticleDOI
TL;DR: A face recognition algorithm which is insensitive to large variation in lighting direction and facial expression is developed, based on Fisher's linear discriminant and produces well separated classes in a low-dimensional subspace, even under severe variations in lighting and facial expressions.
Abstract: We develop a face recognition algorithm which is insensitive to large variation in lighting direction and facial expression. Taking a pattern classification approach, we consider each pixel in an image as a coordinate in a high-dimensional space. We take advantage of the observation that the images of a particular face, under varying illumination but fixed pose, lie in a 3D linear subspace of the high-dimensional image space, if the face is a Lambertian surface without shadowing. However, since faces are not truly Lambertian surfaces and do indeed produce self-shadowing, images will deviate from this linear subspace. Rather than explicitly modeling this deviation, we linearly project the image into a subspace in a manner which discounts those regions of the face with large deviation. Our projection method is based on Fisher's linear discriminant and produces well separated classes in a low-dimensional subspace, even under severe variation in lighting and facial expressions. The eigenface technique, another method based on linearly projecting the image space to a low-dimensional subspace, has similar computational requirements. Yet, extensive experimental results demonstrate that the proposed "Fisherface" method has error rates that are lower than those of the eigenface technique for tests on the Harvard and Yale face databases.
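In outline, the Fisherface recipe is a PCA projection (to keep the within-class scatter invertible) followed by Fisher's linear discriminant in the reduced space. The sketch below shows that two-stage projection; the dimension choices and the pseudo-inverse are common defaults assumed here, not details taken from the paper.

import numpy as np

def fisherfaces(X, labels, n_components):
    # X: (n_samples, n_pixels) image matrix; labels: (n_samples,) class ids.
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    n, c = X.shape[0], len(classes)
    # Stage 1: PCA to at most n - c dimensions.
    mean = X.mean(axis=0)
    Xc = X - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W_pca = Vt[: max(n - c, 1)].T
    Y = Xc @ W_pca
    # Stage 2: Fisher's linear discriminant in the PCA subspace.
    d = Y.shape[1]
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    overall = Y.mean(axis=0)
    for cls in classes:
        Yc = Y[labels == cls]
        muc = Yc.mean(axis=0)
        Sw += (Yc - muc).T @ (Yc - muc)                       # within-class scatter
        Sb += len(Yc) * np.outer(muc - overall, muc - overall)  # between-class scatter
    evals, evecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(-evals.real)[: min(n_components, c - 1)]
    W_fld = evecs[:, order].real
    return mean, W_pca @ W_fld   # project a face with (x - mean) @ W to get its Fisherface features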

11,674 citations

Journal ArticleDOI
TL;DR: A new fast iterative shrinkage-thresholding algorithm (FISTA) which preserves the computational simplicity of ISTA but with a global rate of convergence which is proven to be significantly better, both theoretically and practically.
Abstract: We consider the class of iterative shrinkage-thresholding algorithms (ISTA) for solving linear inverse problems arising in signal/image processing. This class of methods, which can be viewed as an extension of the classical gradient algorithm, is attractive due to its simplicity and thus is adequate for solving large-scale problems even with dense matrix data. However, such methods are also known to converge quite slowly. In this paper we present a new fast iterative shrinkage-thresholding algorithm (FISTA) which preserves the computational simplicity of ISTA but with a global rate of convergence which is proven to be significantly better, both theoretically and practically. Initial promising numerical results for wavelet-based image deblurring demonstrate the capabilities of FISTA which is shown to be faster than ISTA by several orders of magnitude.
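For reference, the FISTA recursion for the l1-regularized least-squares form min 0.5*||Ax - b||^2 + lam*||x||_1 can be sketched in a few lines: a gradient step at an extrapolated point, soft-thresholding, and the momentum update t_{k+1} = (1 + sqrt(1 + 4*t_k^2))/2. The constant step 1/L_f and the iteration count below are illustrative choices.

import numpy as np

def fista(A, b, lam, iters=500):
    # FISTA sketch for: minimize 0.5*||Ax - b||^2 + lam*||x||_1.
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the smooth part's gradient
    x = np.zeros(A.shape[1])
    z = x.copy()                         # extrapolated point
    t = 1.0
    for _ in range(iters):
        g = A.T @ (A @ z - b)            # gradient at the extrapolated point
        v = z - g / L
        x_new = np.sign(v) * np.maximum(np.abs(v) - lam / L, 0.0)   # soft-thresholding (ISTA step)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = x_new + ((t - 1.0) / t_new) * (x_new - x)               # momentum/extrapolation
        x, t = x_new, t_new
    return x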

11,413 citations


"Fast L1-Minimization Algorithms For..." refers background or methods in this paper

  • ...it at the expense of increasing the number of iterations as compared to the interior-point methods. Here we consider four most visible algorithms in recent years, namely, proximal-point methods [20], [21], [15], parallel coordinate descent (PCD) [22], approximate message passing (AMP) [17], and templates for convex cone solvers (TFOCS) [23]. Before proceeding, we first introduce the proximal operator o...


  • ...celeration techniques for ℓ1-min problems, which include two classical solutions using interior-point method and Homotopy method, and several first-order methods including proximal-point methods [20], [21], [15], parallel coordinate descent (PCD) [22], approximate message passing (AMP) [17], and templates for convex cone solvers (TFOCS) [23]. To set up the stage for a fair comparison and help the reade...


  • ... [58]. While the above methods enjoy a much lower computation complexity per iteration, in practice people have observed that they converge quite slowly in terms of the number of iterations. Recently, [21] proposes a fast iterative soft-thresholding algorithm (FISTA), which has a significantly better convergence rate. The key idea behind FISTA is that, instead of forming a quadratic approximation of F( ...


  • ...nce behavior of the above scheme depends on the choice of $\alpha_k$. For example, the popular iterative soft-thresholding algorithm (ISTA) [54], [55], [56], [25] employs a fixed choice of $\alpha_k$ related to $L_f$. In [21], assuming $\alpha_k = L_f$, one can show that ISTA has a sublinear convergence rate that is no worse than $O(1/k)$: $F(x_k) - F(x^*) \le \frac{L_f \|x_0 - x^*\|_2^2}{2k}, \ \forall k$. (17) Meanwhile, an alternative way of determining $\alpha_k$ at each ...


  • ...2) in [15], yielding the so-called Nesterov's algorithm (NESTA). Both algorithms enjoy the same non-asymptotic convergence rate of $O(1/k^2)$ in the ℓ1-min setting. The interested reader may refer to [21] for a proof of the above result, which extends the original algorithm of Nesterov [59] devised only for smooth functions that are everywhere Lipschitz continuous. B. Parallel Coordinate Descent Algor...
