Journal ArticleDOI

Face Verification via Class Sparsity Based Supervised Encoding

TL;DR: This paper presents a novel formulation for a class sparsity based supervised encoder, termed CSSE, which is derived for a single hidden layer and extended to multiple hidden layers using a greedy layer-by-layer learning approach.
Abstract: Autoencoders are deep learning architectures that learn feature representations by minimizing the reconstruction error. Using an autoencoder as the baseline, this paper presents a novel formulation for a class sparsity based supervised encoder, termed CSSE. We postulate that features from the same class will have a common sparsity pattern/support in the latent space. Therefore, a supervision penalty is introduced into the autoencoder formulation as a joint-sparsity promoting $l_{2,1}$-norm. The formulation of CSSE is derived for a single hidden layer and extended to multiple hidden layers using a greedy layer-by-layer learning approach. The proposed CSSE approach is applied to learning face representations, and verification experiments are performed on the LFW and PaSC face databases. The experiments show that the proposed approach yields improved results compared to autoencoders and results comparable to state-of-the-art face recognition algorithms.
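The joint-sparsity penalty described in the abstract can be illustrated with a short sketch: the $l_{2,1}$-norm sums the $l_2$ norms of the rows of a latent matrix, so minimizing it drives whole rows to zero and pushes same-class codes toward a shared support. The function below is an illustration of the norm only, not the authors' implementation.

```python
import numpy as np

def l21_norm(H):
    """||H||_{2,1}: the sum of the l2 norms of the rows of H.
    Penalizing this drives entire rows to zero, so all columns
    (samples) come to share a common sparsity support."""
    return np.sqrt((H ** 2).sum(axis=1)).sum()

# Toy check: stack same-class latent codes as columns of H.
# With equal total energy, a matrix whose columns share one active
# row (common support) has a smaller l2,1 norm than one whose
# energy is spread across different rows.
H_shared = np.zeros((4, 3))
H_shared[0, :] = 1.0            # every sample uses latent unit 0
H_spread = np.eye(4)[:, :3]     # each sample uses a different unit
assert l21_norm(H_shared) < l21_norm(H_spread)
```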
Citations
Journal ArticleDOI
TL;DR: In this paper, an autoencoder-based framework was proposed to simultaneously reconstruct and classify biomedical signals, which does not require any assumption regarding the signal as long as there is sufficiently large training data.
Abstract: Objective: An autoencoder-based framework that simultaneously reconstructs and classifies biomedical signals is proposed. Previous work has treated reconstruction and classification as separate problems. This is the first study that proposes a combined framework to address the issue in a holistic fashion. Methods: For telemonitoring purposes, reconstruction techniques of biomedical signals are largely based on compressed sensing (CS); these are “designed” techniques where the reconstruction formulation is based on some “assumption” regarding the signal. In this study, we propose a new paradigm for reconstruction: the reconstruction is “learned,” using an autoencoder; it does not require any assumption regarding the signal as long as there is sufficiently large training data. But since the final goal is to analyze/classify the signal, the system can also learn a linear classification map that is added inside the autoencoder. The ensuing optimization problem is solved using the Split Bregman technique. Results: Experiments were carried out on reconstructing and classifying electrocardiogram (ECG) (arrhythmia classification) and EEG (seizure classification) signals. Conclusion: Our proposed tool is capable of operating in a semi-supervised fashion. We show that our proposed method is better in reconstruction and more than an order of magnitude faster than CS-based methods; it is capable of real-time operation. Our method also yields better results than recently proposed classification methods. Significance: This is the first study offering an alternative to CS-based reconstruction. It also shows that the representation learning approach can yield better results than traditional methods that use hand-crafted features for signal analysis.
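The combined reconstruction-plus-classification objective described above can be sketched as follows. All names, shapes, and the tanh encoder are illustrative assumptions; the paper's actual formulation is optimized with the Split Bregman technique rather than evaluated directly like this.

```python
import numpy as np

def joint_objective(X, Y, We, Wd, Wc, mu=1.0):
    """Hypothetical sketch of the combined objective: an autoencoder
    reconstruction term plus a linear classification term on the
    latent code. Samples are columns; Y holds one-hot labels."""
    H = np.tanh(We @ X)                          # latent codes
    recon = np.linalg.norm(X - Wd @ H) ** 2      # reconstruction error
    cls = np.linalg.norm(Y - Wc @ H) ** 2        # linear classification map
    return recon + mu * cls

# toy shapes: 8-dim signals, 5 samples, 3 latent units, 2 classes
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))
Y = np.eye(2)[:, [0, 1, 0, 1, 0]]                # one-hot labels as columns
We = rng.normal(size=(3, 8))
Wd = rng.normal(size=(8, 3))
Wc = rng.normal(size=(2, 3))
assert joint_objective(X, Y, We, Wd, Wc) >= 0.0
```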

105 citations

Posted Content
TL;DR: This paper attempts to unravel three aspects related to the robustness of DNNs for face recognition in terms of vulnerabilities to attacks inspired by commonly observed distortions in the real world, and presents several effective countermeasures to mitigate the impact of adversarial attacks and improve the overall robustnessof DNN-based face recognition.
Abstract: Deep neural network (DNN) architecture based models have high expressive power and learning capacity. However, they are essentially a black box method since it is not easy to mathematically formulate the functions that are learned within its many layers of representation. Realizing this, many researchers have started to design methods to exploit the drawbacks of deep learning based algorithms questioning their robustness and exposing their singularities. In this paper, we attempt to unravel three aspects related to the robustness of DNNs for face recognition: (i) assessing the impact of deep architectures for face recognition in terms of vulnerabilities to attacks inspired by commonly observed distortions in the real world that are well handled by shallow learning methods along with learning based adversaries; (ii) detecting the singularities by characterizing abnormal filter response behavior in the hidden layers of deep networks; and (iii) making corrections to the processing pipeline to alleviate the problem. Our experimental evaluation using multiple open-source DNN-based face recognition networks, including OpenFace and VGG-Face, and two publicly available databases (MEDS and PaSC) demonstrates that the performance of deep learning based face recognition algorithms can suffer greatly in the presence of such distortions. The proposed method is also compared with existing detection algorithms and the results show that it is able to detect the attacks with very high accuracy by suitably designing a classifier using the response of the hidden layers in the network. Finally, we present several effective countermeasures to mitigate the impact of adversarial attacks and improve the overall robustness of DNN-based face recognition.

103 citations


Cites background from "Face Verification via Class Sparsit..."

  • ...L-CSSE is a supervised autoencoder formulation that utilizes a class sparsity based supervision penalty in the loss function to improve the classification capabilities of autoencoder based deep networks....

  • ...It is to be noted that for xMSB and grid attack, L-CSSE is able to achieve relatively better performance because L-CSSE is a supervised version of autoencoder which can handle noise better....

  • ...Using an open source implementation of Facenet, i.e., OpenFace, and the recently proposed VGG-Face, LightCNN, and L-CSSE networks, we conduct a series of experiments on the publicly available PaSC and MEDS databases....

  • ...In cases of LightCNN and L-CSSE, they both have shown higher performance with original images; however, as shown in Table 3, similar level of drops are observed....

  • ...Table 3 summarizes the effect of image processing based adversarial distortions on OpenFace, VGG-Face, LightCNN, L-CSSE, and COTS....

Journal ArticleDOI
TL;DR: This paper attempts to unravel three aspects related to the robustness of DNNs for face recognition in terms of vulnerabilities to attacks, detecting the singularities by characterizing abnormal filter response behavior in the hidden layers of deep networks; and making corrections to the processing pipeline to alleviate the problem.
Abstract: Deep neural network (DNN) architecture based models have high expressive power and learning capacity. However, they are essentially a black box method since it is not easy to mathematically formulate the functions that are learned within its many layers of representation. Realizing this, many researchers have started to design methods to exploit the drawbacks of deep learning based algorithms questioning their robustness and exposing their singularities. In this paper, we attempt to unravel three aspects related to the robustness of DNNs for face recognition: (i) assessing the impact of deep architectures for face recognition in terms of vulnerabilities to attacks; (ii) detecting the singularities by characterizing abnormal filter response behavior in the hidden layers of deep networks; and (iii) making corrections to the processing pipeline to alleviate the problem. Our experimental evaluation using multiple open-source DNN-based face recognition networks, and three publicly available face databases demonstrates that the performance of deep learning based face recognition algorithms can suffer greatly in the presence of such distortions. We also evaluate the proposed approaches on four existing quasi-imperceptible distortions: DeepFool, Universal adversarial perturbations, $l_2$, and Elastic-Net (EAD). The proposed method is able to detect both types of attacks with very high accuracy by suitably designing a classifier using the response of the hidden layers in the network. Finally, we present effective countermeasures to mitigate the impact of adversarial attacks and improve the overall robustness of DNN-based face recognition.

98 citations

Journal ArticleDOI
03 Apr 2020
TL;DR: This paper summarizes the different ways in which the robustness of a face recognition algorithm is challenged, which can severely affect its intended working.
Abstract: Face recognition algorithms have demonstrated very high recognition performance, suggesting suitability for real world applications. Despite the enhanced accuracies, robustness of these algorithms against attacks and bias has been challenged. This paper summarizes different ways in which the robustness of a face recognition algorithm is challenged, which can severely affect its intended working. Different types of attacks such as physical presentation attacks, disguise/makeup, digital adversarial attacks, and morphing/tampering using GANs have been discussed. We also present a discussion on the effect of bias on face recognition models and showcase that factors such as age and gender variations affect the performance of modern algorithms. The paper also presents the potential reasons for these challenges and some of the future research directions for increasing the robustness of face recognition models.

53 citations


Cites background from "Face Verification via Class Sparsit..."

  • ...Despite the high classification performance obtained by deep learning techniques (Majumdar, Singh, and Vatsa 2016; He et al. 2015; Silver et al. 2016), they are highly susceptible to changes in the input space (Figure 3(c))....

Journal ArticleDOI
TL;DR: This work introduces the graph regularized autoencoder and proposes three variants, the first of which is the unsupervised version, tailored for clustering, and a supervised label consistent autoencoder suitable for single-label and multi-label classification problems.

43 citations

References
Journal ArticleDOI
01 Jan 1988-Nature
TL;DR: Back-propagation repeatedly adjusts the weights of the connections in the network so as to minimize a measure of the difference between the actual output vector of the net and the desired output vector; as a result, internal 'hidden' units come to represent important features of the task domain.
Abstract: We describe a new learning procedure, back-propagation, for networks of neurone-like units. The procedure repeatedly adjusts the weights of the connections in the network so as to minimize a measure of the difference between the actual output vector of the net and the desired output vector. As a result of the weight adjustments, internal ‘hidden’ units which are not part of the input or output come to represent important features of the task domain, and the regularities in the task are captured by the interactions of these units. The ability to create useful new features distinguishes back-propagation from earlier, simpler methods such as the perceptron-convergence procedure1.
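The weight-adjustment procedure described above can be sketched for a one-hidden-layer network on XOR, a task whose solution requires the hidden units to discover a feature that no single input provides. The architecture, learning rate, and iteration count are illustrative.

```python
import numpy as np

# Back-propagation sketch: repeatedly adjust weights to reduce the
# squared difference between actual and desired outputs on XOR.
rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Y = np.array([[0.], [1.], [1.], [0.]])
W1, W2 = rng.normal(size=(2, 8)), rng.normal(size=(8, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss():
    return float(((sigmoid(sigmoid(X @ W1) @ W2) - Y) ** 2).sum())

before = loss()
for _ in range(2000):
    H = sigmoid(X @ W1)                # forward pass
    out = sigmoid(H @ W2)
    d2 = (out - Y) * out * (1 - out)   # error signal at the output
    d1 = (d2 @ W2.T) * H * (1 - H)     # propagated back to the hidden layer
    W2 -= 0.5 * H.T @ d2               # weight adjustments
    W1 -= 0.5 * X.T @ d1
assert loss() < before                 # the error measure decreased
```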

23,814 citations


"Face Verification via Class Sparsit..." refers background in this paper

  • ...An autoencoder consists of two parts: the encoder that maps the input to a latent space and the decoder that maps the latent representation to the reconstruction of the data [29], [30]....

Journal ArticleDOI
TL;DR: In this paper, a face detection framework that is capable of processing images extremely rapidly while achieving high detection rates is described; implemented on a conventional desktop, detection proceeds at 15 frames per second.
Abstract: This paper describes a face detection framework that is capable of processing images extremely rapidly while achieving high detection rates. There are three key contributions. The first is the introduction of a new image representation called the “Integral Image” which allows the features used by our detector to be computed very quickly. The second is a simple and efficient classifier which is built using the AdaBoost learning algorithm (Freund and Schapire, 1995) to select a small number of critical visual features from a very large set of potential features. The third contribution is a method for combining classifiers in a “cascade” which allows background regions of the image to be quickly discarded while spending more computation on promising face-like regions. A set of experiments in the domain of face detection is presented. The system yields face detection performance comparable to the best previous systems (Sung and Poggio, 1998; Rowley et al., 1998; Schneiderman and Kanade, 2000; Roth et al., 2000). Implemented on a conventional desktop, face detection proceeds at 15 frames per second.
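The "Integral Image" representation can be sketched directly: after one pass over the image, the sum over any rectangle costs four table lookups, which is what lets the detector's features be computed very quickly. This is a minimal illustration, not the paper's implementation.

```python
import numpy as np

def integral_image(img):
    """Summed-area table, padded with a zero row/column so that
    ii[y, x] = img[:y, :x].sum() with no boundary checks."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] from four table lookups, in
    constant time regardless of the rectangle's size."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

img = np.arange(16).reshape(4, 4)
ii = integral_image(img)
assert rect_sum(ii, 1, 1, 3, 3) == img[1:3, 1:3].sum()   # 5+6+9+10
```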

13,037 citations

Journal ArticleDOI
TL;DR: Basis Pursuit (BP) is a principle for decomposing a signal into an "optimal" superposition of dictionary elements, where optimal means having the smallest l1 norm of coefficients among all such decompositions.
Abstract: The time-frequency and time-scale communities have recently developed a large number of overcomplete waveform dictionaries --- stationary wavelets, wavelet packets, cosine packets, chirplets, and warplets, to name a few. Decomposition into overcomplete systems is not unique, and several methods for decomposition have been proposed, including the method of frames (MOF), Matching pursuit (MP), and, for special dictionaries, the best orthogonal basis (BOB). Basis Pursuit (BP) is a principle for decomposing a signal into an "optimal" superposition of dictionary elements, where optimal means having the smallest l1 norm of coefficients among all such decompositions. We give examples exhibiting several advantages over MOF, MP, and BOB, including better sparsity and superresolution. BP has interesting relations to ideas in areas as diverse as ill-posed problems, in abstract harmonic analysis, total variation denoising, and multiscale edge denoising. BP in highly overcomplete dictionaries leads to large-scale optimization problems. With signals of length 8192 and a wavelet packet dictionary, one gets an equivalent linear program of size 8192 by 212,992. Such problems can be attacked successfully only because of recent advances in linear programming by interior-point methods. We obtain reasonable success with a primal-dual logarithmic barrier method and conjugate-gradient solver.
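The Basis Pursuit principle can be sketched as the equivalent linear program the abstract refers to, using the standard split x = u - v with u, v >= 0. This assumes SciPy's `linprog` is available; it is a toy instance, not the paper's interior-point solver.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, b):
    """min ||x||_1  s.t.  Ax = b, posed as the linear program
    min sum(u + v)  s.t.  A(u - v) = b, u >= 0, v >= 0, x = u - v."""
    n = A.shape[1]
    res = linprog(c=np.ones(2 * n),
                  A_eq=np.hstack([A, -A]), b_eq=b,
                  bounds=[(0, None)] * (2 * n))
    return res.x[:n] - res.x[n:]

# overcomplete dictionary: 3 measurements, 6 atoms, a 1-sparse signal
rng = np.random.default_rng(1)
A = rng.normal(size=(3, 6))
x_true = np.zeros(6)
x_true[2] = 2.0
x_hat = basis_pursuit(A, A @ x_true)
assert np.allclose(A @ x_hat, A @ x_true, atol=1e-6)       # feasible
assert np.abs(x_hat).sum() <= np.abs(x_true).sum() + 1e-6  # l1-optimal
```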

9,950 citations


"Face Verification via Class Sparsit..." refers methods in this paper


  • ...This has widely been used in signal processing (Basis Pursuit) [43] and machine learning (LASSO regression) [44]....

Proceedings ArticleDOI
07 Jun 2015
TL;DR: A system that directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity, and achieves state-of-the-art face recognition performance using only 128 bytes per face.
Abstract: Despite significant recent advances in the field of face recognition [10, 14, 15, 17], implementing face verification and recognition efficiently at scale presents serious challenges to current approaches. In this paper we present a system, called FaceNet, that directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity. Once this space has been produced, tasks such as face recognition, verification and clustering can be easily implemented using standard techniques with FaceNet embeddings as feature vectors.
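In an embedding space of this kind, verification reduces to thresholding a distance between feature vectors. The sketch below assumes L2-normalized embeddings and an illustrative threshold; it is not FaceNet's trained network or its tuned operating point.

```python
import numpy as np

def same_identity(emb_a, emb_b, threshold=1.1):
    """Verification by embedding distance: two faces match iff the
    squared Euclidean distance between their L2-normalized
    embeddings falls below a threshold (value illustrative)."""
    a = emb_a / np.linalg.norm(emb_a)
    b = emb_b / np.linalg.norm(emb_b)
    return float(((a - b) ** 2).sum()) < threshold

# toy embeddings: nearby directions verify, opposite directions do not
assert same_identity(np.array([1.0, 0.0]), np.array([0.9, 0.1]))
assert not same_identity(np.array([1.0, 0.0]), np.array([-1.0, 0.0]))
```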

8,289 citations


"Face Verification via Class Sparsit..." refers methods in this paper

  • ...Two specific experiments are performed: (i) compare with OpenFace [58], which is an open source academic implementation of FaceNet [59] and (ii) replace CSSE and L-CSSE with discriminative RBM (D-RBM) [60] of same size in the proposed face verification framework (Fig....

Book
01 Jan 2009
TL;DR: The motivations and principles regarding learning algorithms for deep architectures, in particular those exploiting as building blocks unsupervised learning of single-layer models such as Restricted Boltzmann Machines, used to construct deeper models such as Deep Belief Networks, are discussed.
Abstract: Can machine learning deliver AI? Theoretical results, inspiration from the brain and cognition, as well as machine learning experiments suggest that in order to learn the kind of complicated functions that can represent high-level abstractions (e.g. in vision, language, and other AI-level tasks), one would need deep architectures. Deep architectures are composed of multiple levels of non-linear operations, such as in neural nets with many hidden layers, graphical models with many levels of latent variables, or in complicated propositional formulae re-using many sub-formulae. Each level of the architecture represents features at a different level of abstraction, defined as a composition of lower-level features. Searching the parameter space of deep architectures is a difficult task, but new algorithms have been discovered and a new sub-area has emerged in the machine learning community since 2006, following these discoveries. Learning algorithms such as those for Deep Belief Networks and other related unsupervised learning algorithms have recently been proposed to train deep architectures, yielding exciting results and beating the state-of-the-art in certain areas. Learning Deep Architectures for AI discusses the motivations for and principles of learning algorithms for deep architectures. By analyzing and comparing recent results with different learning algorithms for deep architectures, explanations for their success are proposed and discussed, highlighting challenges and suggesting avenues for future explorations in this area.
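The greedy layer-by-layer scheme that the CSSE paper borrows from this line of work can be sketched with plain autoencoder layers: each layer is trained on the codes produced by the previous one. Layer sizes, learning rate, and the tanh/linear choice are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_layer(X, n_hidden, steps=300, lr=0.01):
    """One autoencoder layer (tanh encoder, linear decoder) trained
    by gradient descent on reconstruction error; a stand-in for the
    single-layer building blocks discussed above."""
    We = rng.normal(scale=0.1, size=(X.shape[1], n_hidden))
    Wd = rng.normal(scale=0.1, size=(n_hidden, X.shape[1]))
    for _ in range(steps):
        H = np.tanh(X @ We)                            # encode
        E = H @ Wd - X                                 # reconstruction error
        gWd = H.T @ E                                  # decoder gradient
        gWe = X.T @ ((E @ Wd.T) * (1 - H ** 2))        # encoder gradient
        Wd -= lr * gWd
        We -= lr * gWe
    return We, np.tanh(X @ We)

# greedy layer-by-layer: train the first layer on the data, then
# train the second layer on the first layer's codes
X = rng.normal(size=(32, 10))
W1, H1 = train_layer(X, 8)
W2, H2 = train_layer(H1, 4)
assert H1.shape == (32, 8) and H2.shape == (32, 4)
```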

7,767 citations


"Face Verification via Class Sparsit..." refers background in this paper

  • ...Several extensions have been proposed to the basic autoencoder architecture, for instance, stacked autoencoders [32] have multiple hidden layers and the corresponding cost function is represented as,...
