Proceedings ArticleDOI

Regularizing deep learning architecture for face recognition with weight variations

TL;DR: This paper presents a novel approach that incorporates body-weight variations into the feature learning process of a deep learning architecture through a regularization function, which helps learn latent variables representative of different weight categories.
Abstract: Several mathematical models have been proposed for recognizing face images with age variations. However, change in body weight is another interesting covariate that has not been explored much. This paper presents a novel approach to incorporate weight variations into the feature learning process. In a deep learning architecture, we incorporate body weight through a regularization function that helps learn latent variables representative of different weight categories. The formulation is proposed for both the Autoencoder and the Deep Boltzmann Machine. On the extended WIT database of 200 subjects, comparison with a commercial system and an existing algorithm shows that the proposed algorithm outperforms them by more than 9% at rank-10 identification accuracy.
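
A minimal sketch of the general idea, assuming a plain autoencoder in numpy with hypothetical weight-category labels and penalty strength; this is illustrative, not the paper's exact regularization term:

```python
# Illustrative sketch (not the paper's exact formulation): an autoencoder loss
# augmented with a penalty that pulls hidden representations of images from the
# same body-weight category toward a shared centroid.
import numpy as np

rng = np.random.default_rng(0)
n, d, h = 6, 32, 8                           # samples, input dim, hidden dim
X = rng.normal(size=(n, d))                  # face features (placeholder data)
weight_class = np.array([0, 0, 1, 1, 2, 2])  # hypothetical weight-category labels

W_enc = rng.normal(scale=0.1, size=(d, h))
W_dec = rng.normal(scale=0.1, size=(h, d))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

H = sigmoid(X @ W_enc)                       # latent representation
X_hat = H @ W_dec                            # reconstruction

recon_loss = np.mean((X - X_hat) ** 2)

# Weight-category regularizer: scatter of latent codes around their class centroid.
reg = 0.0
for c in np.unique(weight_class):
    Hc = H[weight_class == c]
    reg += np.mean((Hc - Hc.mean(axis=0)) ** 2)

lam = 0.1                                    # regularization strength (assumed)
total_loss = recon_loss + lam * reg
print(f"reconstruction={recon_loss:.4f}  regularizer={reg:.4f}  total={total_loss:.4f}")
```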
Citations
Journal ArticleDOI
TL;DR: This study presents a detailed review of conventional and recent pooling strategies, apprising readers of the upsides and downsides of each strategy.
Abstract: Convolutional neural networks (CNNs) are a contemporary technique for computer vision applications, in which pooling is an integral part of the deep CNN. Pooling provides the ability to learn invariant features and also acts as a regularizer that further reduces overfitting. Additionally, pooling techniques significantly reduce the computational cost and training time of networks, which are equally important to consider. Here, the performance of pooling strategies on different datasets is analyzed and discussed qualitatively. This study presents a detailed review of conventional and recent strategies, apprising readers of the upsides and downsides of each. We also identify four fundamental factors, namely network architecture, activation function, overlapping, and regularization approaches, which strongly affect the performance of pooling operations. This work is intended to extend the understanding of CNNs and pooling regimes for solving computer vision problems.
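
For reference, a minimal numpy sketch of 2x2 max and average pooling, showing how a pooling window condenses spatial resolution (toy values, single feature map assumed):

```python
# Minimal sketch of 2x2 max and average pooling on a single feature map.
import numpy as np

fmap = np.arange(16, dtype=float).reshape(4, 4)   # toy 4x4 feature map

def pool2x2(x, op):
    h, w = x.shape
    blocks = x.reshape(h // 2, 2, w // 2, 2)      # group into 2x2 windows
    return op(blocks, axis=(1, 3))

print(pool2x2(fmap, np.max))    # max pooling     -> 2x2 map of window maxima
print(pool2x2(fmap, np.mean))   # average pooling -> 2x2 map of window means
```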

32 citations

Journal ArticleDOI
TL;DR: A regularizer-based approach is proposed to learn weight-invariant facial representations using two deep learning architectures, sparse-stacked denoising autoencoders and deep Boltzmann machines, by incorporating a body-weight-aware regularization parameter in their loss functions to help learn weight-aware features.
Abstract: Body weight variations are an integral part of a person’s aging process. However, the lack of association between the age and the weight of an individual makes it challenging to model these variations for automatic face recognition. In this paper, we propose a regularizer-based approach to learn weight invariant facial representations using two different deep learning architectures, namely, sparse-stacked denoising autoencoders and deep Boltzmann machines. We incorporate a body-weight aware regularization parameter in the loss function of these architectures to help learn weight-aware features. The experiments performed on the extended WIT database show that the introduction of weight aware regularization improves the identification accuracy of the architectures both with and without dropout.

30 citations


Cites methods or result from "Regularizing deep learning architec..."

  • ...The experimental results and comparison on the eWIT [1] (extended WIT) database demonstrates that the proposed framework significantly improves the face recognition performance as compared to the existing algorithms....

  • ...A preliminary version of this research was published in IEEE International Conference on Biometrics: Theory, Applications and Systems, 2015 [1]....

Journal ArticleDOI
TL;DR: A critical understanding of traditional and modern pooling techniques is provided, and their strengths and weaknesses are highlighted for readers.
Abstract: One of the most promising techniques used in various sciences is deep neural networks (DNNs). A special type of DNN called a convolutional neural network (CNN) consists of several convolutional layers, each followed by an activation function and a pooling layer. The feature map of the previous layer is sampled by the pooling layer (an important layer) to create a new feature map with condensed resolution. This layer significantly reduces the spatial dimension of the input. It accomplishes two main goals. First, it reduces the number of parameters or weights to minimize computational costs. Second, it helps prevent overfitting of the network. In addition, pooling techniques can significantly reduce model training time and computational costs. This paper provides a critical understanding of traditional and modern pooling techniques and highlights their strengths and weaknesses for readers. Moreover, the performance of pooling techniques on different datasets is qualitatively evaluated and reviewed. This study is expected to contribute to a comprehensive understanding of the importance of CNNs and pooling techniques in computer vision challenges.

21 citations

References
Journal ArticleDOI
TL;DR: This paper describes a face detection framework capable of processing images extremely rapidly while achieving high detection rates; implemented on a conventional desktop, detection proceeds at 15 frames per second.
Abstract: This paper describes a face detection framework that is capable of processing images extremely rapidly while achieving high detection rates. There are three key contributions. The first is the introduction of a new image representation called the “Integral Image” which allows the features used by our detector to be computed very quickly. The second is a simple and efficient classifier which is built using the AdaBoost learning algorithm (Freund and Schapire, 1995) to select a small number of critical visual features from a very large set of potential features. The third contribution is a method for combining classifiers in a “cascade” which allows background regions of the image to be quickly discarded while spending more computation on promising face-like regions. A set of experiments in the domain of face detection is presented. The system yields face detection performance comparable to the best previous systems (Sung and Poggio, 1998; Rowley et al., 1998; Schneiderman and Kanade, 2000; Roth et al., 2000). Implemented on a conventional desktop, face detection proceeds at 15 frames per second.
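
A brief sketch of the integral-image trick described above, assuming a single-channel image in numpy; after one cumulative-sum pass, the sum of any rectangular region needs only four lookups, which is what makes the Haar-like features cheap to evaluate:

```python
# Sketch of the "Integral Image" idea: precompute cumulative sums once, then
# recover any rectangular sum from four array lookups.
import numpy as np

img = np.random.default_rng(1).integers(0, 256, size=(6, 8)).astype(np.int64)

# Integral image padded with a leading row/column of zeros to simplify indexing.
ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] using four lookups in the integral image."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

assert rect_sum(ii, 1, 2, 4, 6) == img[1:4, 2:6].sum()
```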

13,037 citations

Proceedings ArticleDOI
23 Jun 2014
TL;DR: This work revisits both the alignment step and the representation step by employing explicit 3D face modeling in order to apply a piecewise affine transformation, and derive a face representation from a nine-layer deep neural network.
Abstract: In modern face recognition, the conventional pipeline consists of four stages: detect => align => represent => classify. We revisit both the alignment step and the representation step by employing explicit 3D face modeling in order to apply a piecewise affine transformation, and derive a face representation from a nine-layer deep neural network. This deep network involves more than 120 million parameters using several locally connected layers without weight sharing, rather than the standard convolutional layers. Thus we trained it on the largest facial dataset to date, an identity labeled dataset of four million facial images belonging to more than 4,000 identities. The learned representations coupling the accurate model-based alignment with the large facial database generalize remarkably well to faces in unconstrained environments, even with a simple classifier. Our method reaches an accuracy of 97.35% on the Labeled Faces in the Wild (LFW) dataset, reducing the error of the current state of the art by more than 27%, closely approaching human-level performance.
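
A hedged sketch contrasting a locally connected layer (per-location filters, no weight sharing, as used in the network above) with an ordinary convolution; the single-channel setup and the dimensions are simplifying assumptions:

```python
# Locally connected layer vs. convolution: same sliding-window geometry, but the
# locally connected variant learns a distinct filter for every output location.
import numpy as np

rng = np.random.default_rng(2)
H, W, K = 8, 8, 3                        # input size and filter size
x = rng.normal(size=(H, W))              # single-channel input patch
out_h, out_w = H - K + 1, W - K + 1

shared_filter = rng.normal(size=(K, K))                  # convolution: one shared filter
local_filters = rng.normal(size=(out_h, out_w, K, K))    # locally connected: per-location filters

conv_out = np.empty((out_h, out_w))
local_out = np.empty((out_h, out_w))
for i in range(out_h):
    for j in range(out_w):
        window = x[i:i + K, j:j + K]
        conv_out[i, j] = np.sum(window * shared_filter)         # weights reused everywhere
        local_out[i, j] = np.sum(window * local_filters[i, j])  # per-location weights

print(conv_out.shape, local_out.shape)   # same output geometry, far more parameters locally
```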

6,132 citations


"Regularizing deep learning architec..." refers methods in this paper

  • ...Deep learning algorithms have been utilized in encoding facial information and recognizing individuals with variations in pose, expression, and illumination as well as in video sequences [3], [8], [9], [10]....

01 Jan 2010
TL;DR: This work clearly establishes the value of using a denoising criterion as a tractable unsupervised objective to guide the learning of useful higher level representations.
Abstract: We explore an original strategy for building deep networks, based on stacking layers of denoising autoencoders which are trained locally to denoise corrupted versions of their inputs. The resulting algorithm is a straightforward variation on the stacking of ordinary autoencoders. It is however shown on a benchmark of classification problems to yield significantly lower classification error, thus bridging the performance gap with deep belief networks (DBN), and in several cases surpassing it. Higher level representations learnt in this purely unsupervised fashion also help boost the performance of subsequent SVM classifiers. Qualitative experiments show that, contrary to ordinary autoencoders, denoising autoencoders are able to learn Gabor-like edge detectors from natural image patches and larger stroke detectors from digit images. This work clearly establishes the value of using a denoising criterion as a tractable unsupervised objective to guide the learning of useful higher level representations.
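
A minimal sketch of the denoising criterion, assuming masking noise and tied weights in numpy; parameter values and dimensions are illustrative:

```python
# Denoising criterion: corrupt the input, then reconstruct the *clean* version.
import numpy as np

rng = np.random.default_rng(3)
d, h = 20, 5
x_clean = rng.random(d)

# Masking noise: randomly zero a fraction of the input components.
corruption_level = 0.3
mask = rng.random(d) > corruption_level
x_noisy = x_clean * mask

W = rng.normal(scale=0.1, size=(d, h))
b, c = np.zeros(h), np.zeros(d)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden = sigmoid(x_noisy @ W + b)        # encode the corrupted input
x_recon = sigmoid(hidden @ W.T + c)      # decode with tied weights

# The loss compares the reconstruction with the clean input, not the noisy one.
loss = np.mean((x_recon - x_clean) ** 2)
print(f"denoising reconstruction loss: {loss:.4f}")
```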

5,303 citations


"Regularizing deep learning architec..." refers methods in this paper

  • ...body-weight based regularization approach to modify the loss function of deep-learning architecture such as Deep Boltzmann Machine [6] and Sparse-Stacked Denoising Autoencoder (SDAE) [11]....

  • ...The proposed regularization approach is applied in two deep learning architectures, Deep Boltzmann Machine (DBM) [6] and Sparse-Stacked Denoising Autoencoder (SDAE) [11]....

  • ...Sparse denoising autoencoders are stacked to form a deep learning architecture and greedy layer-by-layer training is used to train the architecture [11]....


Proceedings Article
04 Dec 2006
TL;DR: These experiments confirm the hypothesis that the greedy layer-wise unsupervised training strategy mostly helps the optimization, by initializing weights in a region near a good local minimum, giving rise to internal distributed representations that are high-level abstractions of the input, bringing better generalization.
Abstract: Complexity theory of circuits strongly suggests that deep architectures can be much more efficient (sometimes exponentially) than shallow architectures, in terms of computational elements required to represent some functions. Deep multi-layer neural networks have many levels of non-linearities allowing them to compactly represent highly non-linear and highly-varying functions. However, until recently it was not clear how to train such deep networks, since gradient-based optimization starting from random initialization appears to often get stuck in poor solutions. Hinton et al. recently introduced a greedy layer-wise unsupervised learning algorithm for Deep Belief Networks (DBN), a generative model with many layers of hidden causal variables. In the context of the above optimization problem, we study this algorithm empirically and explore variants to better understand its success and extend it to cases where the inputs are continuous or where the structure of the input distribution is not revealing enough about the variable to be predicted in a supervised task. Our experiments also confirm the hypothesis that the greedy layer-wise unsupervised training strategy mostly helps the optimization, by initializing weights in a region near a good local minimum, giving rise to internal distributed representations that are high-level abstractions of the input, bringing better generalization.
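
A schematic sketch of the greedy layer-wise loop, assuming tied-weight autoencoder layers and a deliberately crude gradient step; it illustrates the control flow of layer-by-layer pretraining, not the exact DBN training procedure:

```python
# Greedy layer-wise pretraining: each layer is trained as a shallow autoencoder
# on the output of the previous (frozen) layer, then the encoders are stacked.
import numpy as np

rng = np.random.default_rng(4)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pretrain_layer(data, hidden_dim, lr=0.1, steps=50):
    d = data.shape[1]
    W = rng.normal(scale=0.1, size=(d, hidden_dim))
    for _ in range(steps):
        h = sigmoid(data @ W)            # encode
        recon = h @ W.T                  # decode with tied weights
        err = recon - data
        # Crude gradient through the decoder path only (h treated as constant),
        # enough to illustrate one layer being fit before moving on.
        grad = err.T @ h / len(data)
        W -= lr * grad
    return W

X = rng.random((100, 30))                # unlabeled training data (placeholder)
layer_dims = [20, 10, 5]

weights, inputs = [], X
for dim in layer_dims:                   # greedy: train one layer at a time
    W = pretrain_layer(inputs, dim)
    weights.append(W)
    inputs = sigmoid(inputs @ W)         # frozen layer feeds the next one

print([w.shape for w in weights])        # stacked encoder ready for supervised fine-tuning
```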

4,385 citations