Proceedings ArticleDOI

On video based face recognition through adaptive sparse dictionary

04 May 2015, Vol. 1, pp 1-6
TL;DR: This paper proposes a video-based face recognition method that improves upon the sparse representation framework with an intelligent, adaptive sparse dictionary: confidently recognized probe frames are added to the training matrix, with the probe video continuously monitored through a novel confidence criterion and a Bayesian inference scheme.
Abstract: Sparse representation-based face recognition has gained considerable attention recently due to its robustness against illumination and occlusion. Recognizing faces from videos has become an important topic, since videos alleviate the limited information content of still images. However, the sparse recognition framework is not directly applicable to video-based face recognition due to its sensitivity to pose and alignment changes. In this paper, we propose a video-based face recognition method which improves upon the sparse representation framework. Our key contribution is an intelligent and adaptive sparse dictionary: the current probe image is added to the training matrix based on continuous monitoring of the probe video through a novel confidence criterion and a Bayesian inference scheme. Due to this novel approach, our method is robust to pose and alignment changes and hence can successfully recognize faces from unconstrained videos. Moreover, in a moving scene, camera angle, illumination, and other imaging conditions may change quickly, leading to a loss in accuracy. In such situations, it is impractical to re-enroll the individual and re-train the classifiers on a continuous basis. Our novel approach addresses these practical issues. Experimental results on the well-known YouTube Faces database demonstrate the effectiveness of our method.
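
The adaptive-dictionary idea in the abstract can be sketched as a simple loop over probe frames. Everything below is illustrative rather than the paper's implementation: the `classify` callback and its scalar confidence margin are hypothetical stand-ins for the paper's confidence criterion and Bayesian inference scheme.

```python
import numpy as np

def process_probe_video(frames, A, labels, classify, thresh=0.3):
    """Illustrative adaptive-dictionary loop: probe frames recognized with
    high confidence are appended to the training matrix A, so that later
    frames are coded against the current pose/illumination conditions."""
    predictions = []
    for y in frames:                      # y: vectorized probe frame, shape (d,)
        pred, confidence = classify(A, labels, y)
        predictions.append(pred)
        # Stand-in for the paper's confidence criterion / Bayesian scheme:
        # only well-separated decisions are allowed to update the dictionary.
        if confidence > thresh:
            A = np.hstack([A, y[:, None]])
            labels = np.append(labels, pred)
    return predictions, A, labels
```

The point of this structure is that the dictionary tracks the probe conditions, so neither re-enrollment nor continuous classifier re-training is needed.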
Citations
Journal ArticleDOI
TL;DR: Experimental analysis suggests that the proposed feature-richness-based frame selection offers noticeable and consistent performance improvement over frontal-only frames, random frames, or frame selection using perceptual no-reference image quality measures, and that joint feature learning in the SDAE and sparse and low-rank regularization in the DBM help improve face verification performance.
Abstract: Abundance and availability of video capture devices, such as mobile phones and surveillance cameras, have instigated research in video face recognition, which is highly pertinent in law enforcement applications. While the current approaches have reported high accuracies at equal error rates, performance at lower false accept rates requires significant improvement. In this paper, we propose a novel face verification algorithm, which starts with selecting feature-rich frames from a video sequence using discrete wavelet transform and entropy computation. Frame selection is followed by representation learning-based feature extraction, where three contributions are presented: 1) a deep learning architecture, which is a combination of stacked denoising sparse autoencoder (SDAE) and deep Boltzmann machine (DBM); 2) a formulation for joint representation in an autoencoder; and 3) updating the loss function of the DBM by including sparse and low-rank regularization. Finally, a multilayer neural network is used as the classifier to obtain the verification decision. The results are demonstrated on two publicly available databases, YouTube Faces and Point and Shoot Challenge. Experimental analysis suggests that: 1) the proposed feature-richness-based frame selection offers noticeable and consistent performance improvement compared with frontal-only frames, random frames, or frame selection using perceptual no-reference image quality measures and 2) joint feature learning in the SDAE and sparse and low-rank regularization in the DBM help in improving face verification performance. On the benchmark Point and Shoot Challenge database, the algorithm yields a verification accuracy of over 97% at a 1% false accept rate, whereas on the YouTube Faces database, over 95% verification accuracy is observed at the equal error rate.
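
The frame-selection step lends itself to a short sketch. The following is one plausible reading, assuming PyWavelets for the DWT; the exact wavelet, subbands, and entropy formulation used in the paper may differ.

```python
import numpy as np
import pywt

def frame_richness(gray):
    """Score a grayscale frame by the entropy of its DWT detail subbands."""
    _, (lh, hl, hh) = pywt.dwt2(gray.astype(float), 'haar')
    details = np.abs(np.concatenate([lh.ravel(), hl.ravel(), hh.ravel()]))
    hist, _ = np.histogram(details, bins=64)
    p = hist / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))        # Shannon entropy in bits

def select_frames(frames, k=20):
    """Keep the k most feature-rich frames of a video, in temporal order."""
    scores = [frame_richness(f) for f in frames]
    top = np.argsort(scores)[-k:]
    return [frames[i] for i in sorted(top)]
```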

57 citations


Cites background from "On video based face recognition thr..."


  • …Nine-layer deep network, 91.4% (unrestricted); Wang et al., 2015 [12]: Discriminant Analysis on Riemannian manifold of Gaussian distributions, 73.01 AUC; Khan et al., 2015 [13]: Adaptive Sparse Dictionary, 82.9%; Li et al., 2015 [14]: Eigen-PEP for video face recognition, 84.8%; Li et al., 2015 [15]…

Dissertation
01 Nov 2018
TL;DR: This dissertation proposes novel feature extraction and fusion paradigms along with improvements to existing methodologies in order to address the challenge of unconstrained face recognition and presents a novel methodology to improve the robustness of such algorithms in a generalizable manner.
Abstract: Automatic face recognition in unconstrained environments is a popular and challenging research problem. With the improvements in recognition algorithms, focus has shifted from addressing various covariates individually to performing face recognition in truly unconstrained scenarios. Face databases such as YouTube Faces and the Point and Shoot Challenge capture a wide array of challenges such as pose, expression, illumination, resolution, and occlusion simultaneously. In general, every face recognition algorithm relies on some form of feature extraction mechanism to succinctly represent the most important characteristics of face images so that machine learning techniques can successfully distinguish face images of one individual from those of others. This dissertation proposes novel feature extraction and fusion paradigms, along with improvements to existing methodologies, to address the challenge of unconstrained face recognition. In addition, it presents a novel methodology to improve the robustness of such algorithms in a generalizable manner.

We begin by addressing the challenge of utilizing face data captured from consumer-level RGB-D devices to improve face recognition performance without increasing the operational cost. The images captured using such devices are of poor quality compared to those from specialized 3D sensors. To solve this, we propose a novel feature descriptor based on the entropy of RGB-D faces along with the saliency feature obtained from a 2D face. Geometric facial attributes are also extracted from the depth image, and face recognition is performed by fusing both the descriptor and attribute match scores. While score-level fusion does increase the robustness of the overall framework, it cannot take into account and utilize the additional information present at the feature level. To address this challenge, we need a better feature-level fusion algorithm that can combine multiple features while preserving as much of this information as possible before the score computation stage. To accomplish this, we propose the Group Sparse Representation based Classifier (GSRC), which removes the requirement for a separate feature-level fusion mechanism and integrates multiple features seamlessly into classification. We also propose a kernelization-based extension to the GSRC that further improves its ability to separate classes that have high inter-class similarity.

We next address the problem of efficiently using large amounts of video data to perform face recognition. A single video contains hundreds of images; however, not all frames of a video contain useful features for face recognition, and some frames might even deteriorate performance. Keeping this in mind, we propose a novel face verification algorithm which starts with selecting feature-rich frames from a video sequence using discrete wavelet transform and entropy computation. Frame selection is followed by learning a joint representation from the proposed deep learning architecture, which is a combination of a stacked denoising sparse autoencoder and a deep Boltzmann machine. A multilayer neural network is used as the classifier to obtain the verification decision.

Currently, most of the highly accurate face recognition algorithms are based on deep-learning-based feature extraction. These networks have been shown in the literature to be vulnerable to engineered adversarial attacks. We find that non-learning-based image-level distortions can also adversely affect the performance of such algorithms. We capitalize on how some of these errors propagate through the network to devise detection and mitigation methodologies that can help improve the real-world robustness of deep-network-based face recognition. The proposed algorithm does not require any re-training of the existing networks and is not specific to a particular type of network. We also evaluate the generalizability and efficacy of the approach by testing it with multiple networks and distortions. We observe favorable results that are consistently better than existing methodologies in all the test cases.
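
The GSRC mentioned above rests on group-sparse coding. As a rough sketch (not the dissertation's implementation), a proximal-gradient solver for the group-lasso objective, with one group of dictionary columns per feature modality or class, could look like this; all names and parameters are illustrative:

```python
import numpy as np

def group_lasso_code(A, y, groups, lam=0.1, n_iter=300):
    """Proximal gradient for min_x 0.5*||y - A x||^2 + lam * sum_g ||x_g||_2.
    `groups` is a list of column-index arrays, one per group."""
    x = np.zeros(A.shape[1])
    lr = 1.0 / np.linalg.norm(A, 2) ** 2   # step size from the Lipschitz constant
    for _ in range(n_iter):
        z = x - lr * (A.T @ (A @ x - y))   # gradient step on the data term
        for g in groups:                   # block soft-thresholding per group
            norm = np.linalg.norm(z[g])
            z[g] *= max(0.0, 1.0 - lr * lam / norm) if norm > 0 else 0.0
        x = z
    return x
```

Classification can then proceed as in SRC, by comparing class-wise reconstruction residuals, but the group penalty lets several feature types be fused inside the coding step itself rather than at the score level.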

3 citations


Additional excerpts

  • …, 2015 [99]: Adaptive Sparse Dictionary, 82.9%…

References
Journal ArticleDOI
TL;DR: In this paper, a face detection framework that is capable of processing images extremely rapidly while achieving high detection rates is described; implemented on a conventional desktop, it runs at 15 frames per second.
Abstract: This paper describes a face detection framework that is capable of processing images extremely rapidly while achieving high detection rates. There are three key contributions. The first is the introduction of a new image representation called the “Integral Image” which allows the features used by our detector to be computed very quickly. The second is a simple and efficient classifier which is built using the AdaBoost learning algorithm (Freund and Schapire, 1995) to select a small number of critical visual features from a very large set of potential features. The third contribution is a method for combining classifiers in a “cascade” which allows background regions of the image to be quickly discarded while spending more computation on promising face-like regions. A set of experiments in the domain of face detection is presented. The system yields face detection performance comparable to the best previous systems (Sung and Poggio, 1998; Rowley et al., 1998; Schneiderman and Kanade, 2000; Roth et al., 2000). Implemented on a conventional desktop, face detection proceeds at 15 frames per second.
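
The "Integral Image" is simple enough to state exactly: a summed-area table lets any rectangular (Haar-like) feature be evaluated with four array lookups. A minimal NumPy rendering:

```python
import numpy as np

def integral_image(img):
    """ii[y, x] = sum of img[:y+1, :x+1] (summed-area table)."""
    return img.cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, top, left, bottom, right):
    """Sum of img[top:bottom+1, left:right+1] in O(1) via four lookups."""
    s = ii[bottom, right]
    if top > 0:
        s -= ii[top - 1, right]
    if left > 0:
        s -= ii[bottom, left - 1]
    if top > 0 and left > 0:
        s += ii[top - 1, left - 1]
    return s
```

A two-rectangle Haar feature is then just the difference of two `box_sum` calls, which is what makes exhaustive feature evaluation inside the AdaBoost cascade affordable.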

13,037 citations

Journal ArticleDOI
TL;DR: This work considers the problem of automatically recognizing human faces from frontal views with varying expression and illumination, as well as occlusion and disguise, and proposes a general classification algorithm for (image-based) object recognition based on a sparse representation computed by ℓ1-minimization.
Abstract: We consider the problem of automatically recognizing human faces from frontal views with varying expression and illumination, as well as occlusion and disguise. We cast the recognition problem as one of classifying among multiple linear regression models and argue that new theory from sparse signal representation offers the key to addressing this problem. Based on a sparse representation computed by ℓ1-minimization, we propose a general classification algorithm for (image-based) object recognition. This new framework provides new insights into two crucial issues in face recognition: feature extraction and robustness to occlusion. For feature extraction, we show that if sparsity in the recognition problem is properly harnessed, the choice of features is no longer critical. What is critical, however, is whether the number of features is sufficiently large and whether the sparse representation is correctly computed. Unconventional features such as downsampled images and random projections perform just as well as conventional features such as eigenfaces and Laplacianfaces, as long as the dimension of the feature space surpasses a certain threshold predicted by the theory of sparse representation. This framework can handle errors due to occlusion and corruption uniformly by exploiting the fact that these errors are often sparse with respect to the standard (pixel) basis. The theory of sparse representation helps predict how much occlusion the recognition algorithm can handle and how to choose the training images to maximize robustness to occlusion. We conduct extensive experiments on publicly available databases to verify the efficacy of the proposed algorithm and corroborate the above claims.
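
The SRC decision rule itself is compact: code the test sample over all training samples with an ℓ1 penalty, then pick the class whose coefficients reconstruct it best. A sketch using scikit-learn's Lasso as the ℓ1 solver (a stand-in for the paper's ℓ1-minimization routine):

```python
import numpy as np
from sklearn.linear_model import Lasso

def src_classify(A, labels, y, lam=0.01):
    """A: (d, n) matrix of vectorized training faces (one per column),
    labels: (n,) class labels, y: (d,) vectorized test face."""
    A = A / np.linalg.norm(A, axis=0)                  # unit-norm columns
    x = Lasso(alpha=lam, fit_intercept=False,
              max_iter=5000).fit(A, y).coef_           # sparse code of y
    classes = np.unique(labels)
    # Class-wise residuals: keep only one class's coefficients at a time.
    residuals = [np.linalg.norm(y - A @ np.where(labels == c, x, 0.0))
                 for c in classes]
    return classes[int(np.argmin(residuals))]
```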

9,658 citations


"On video based face recognition thr..." refers background or methods in this paper

  • ...Our method falls under the dictionary-based approach, which stemmed from the SRC algorithm [3]....


  • ...Due to the sparse nature of the solution, the occlusion/illumination/noise variations are also sparse in nature [3]....


  • ...In the original SRC method [3] and its variants [5], [13], the training matrix A is built only once and never changed throughout the process....


  • ...To tackle the issues related to change in illumination and occlusion in face recognition from still images, the sparse representation-based face recognition method was proposed in [3]....


  • ...original SRC method, which is linear in terms of the number of training images [3]....


Proceedings ArticleDOI
20 Jun 2011
TL;DR: A comprehensive database of labeled videos of faces in challenging, uncontrolled conditions, the 'YouTube Faces' database, is presented along with benchmark pair-matching tests, and a novel set-to-set similarity measure, the Matched Background Similarity (MBGS), is described.
Abstract: Recognizing faces in unconstrained videos is a task of mounting importance. While obviously related to face recognition in still images, it has its own unique characteristics and algorithmic requirements. Over the years several methods have been suggested for this problem, and a few benchmark data sets have been assembled to facilitate its study. However, there is a sizable gap between the actual application needs and the current state of the art. In this paper we make the following contributions. (a) We present a comprehensive database of labeled videos of faces in challenging, uncontrolled conditions (i.e., 'in the wild'), the 'YouTube Faces' database, along with benchmark pair-matching tests. (b) We employ our benchmark to survey and compare the performance of a large variety of existing video face recognition techniques. Finally, (c) we describe a novel set-to-set similarity measure, the Matched Background Similarity (MBGS). This similarity is shown to considerably improve performance on the benchmark tests.
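
The MBGS idea can be sketched in a few lines: for each of the two face sets, pick the background samples that most resemble it, train a discriminative model separating the set from that matched background, and score the other set with the model. The following is a simplified rendering (cosine matching plus a linear SVM; the original paper makes its own matching and classifier choices):

```python
import numpy as np
from sklearn.svm import LinearSVC

def _unit(v):
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def mbgs(set1, set2, background, k=100):
    """Matched Background Similarity, simplified. Inputs are (n_i, d) arrays
    of per-frame face descriptors; `background` is a pool of descriptors of
    other people. Higher output = more likely the same person."""
    def one_side(a, b):
        sims = _unit(background) @ _unit(a).T          # cosine similarities
        matched = background[np.argsort(sims.max(axis=1))[-k:]]
        X = np.vstack([a, matched])
        t = np.hstack([np.ones(len(a)), np.zeros(len(matched))])
        clf = LinearSVC(C=1.0).fit(X, t)               # set vs. matched bg
        return clf.decision_function(b).mean()         # score the other set
    return 0.5 * (one_side(set1, set2) + one_side(set2, set1))
```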

1,423 citations


"On video based face recognition thr..." refers background or methods or result in this paper

  • ...In order to compare the results with existing methods, we follow an experimental setup that closely follows that of [2]....


  • ...For testing, we adhered to the same criteria of same/not-same output as described in [2]....


  • ...Moreover, video-based face recognition poses additional challenges in the form of rapid changes in pose, illumination, occlusion, etc. [2]....


  • ...The other methods that directly compare the results with [2] are also based on some form of similarity score measurement....


  • ...In this section, we provide recognition results of our proposed method on the well-known YouTube Face Database....


Journal ArticleDOI
TL;DR: This work proposes a conceptually simple face recognition system that achieves a high degree of robustness and stability to illumination variation, image misalignment, and partial occlusion, and demonstrates how to capture a set of training images with enough illumination variation that they span test images taken under uncontrolled illumination.
Abstract: Many classic and contemporary face recognition algorithms work well on public data sets, but degrade sharply when they are used in a real recognition system. This is mostly due to the difficulty of simultaneously handling variations in illumination, image misalignment, and occlusion in the test image. We consider a scenario where the training images are well controlled and test images are only loosely controlled. We propose a conceptually simple face recognition system that achieves a high degree of robustness and stability to illumination variation, image misalignment, and partial occlusion. The system uses tools from sparse representation to align a test face image to a set of frontal training images. The region of attraction of our alignment algorithm is computed empirically for public face data sets such as Multi-PIE. We demonstrate how to capture a set of training images with enough illumination variation that they span test images taken under uncontrolled illumination. In order to evaluate how our algorithms work under practical testing conditions, we have implemented a complete face recognition system, including a projector-based training acquisition system. Our system can efficiently and effectively recognize faces under a variety of realistic conditions, using only frontal images under the proposed illuminations as training.
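
To convey the alignment objective in code, a deliberately toy version: search small translations of the test image and keep the one whose ℓ1-regularized coding over the training matrix leaves the smallest sparse error. The actual system solves for the deformation iteratively and handles richer transformations; the names and parameters here are illustrative only.

```python
import numpy as np
from scipy.ndimage import shift as translate
from sklearn.linear_model import Lasso

def align_by_sparse_error(test_img, A, max_shift=4, lam=0.01):
    """Toy alignment: pick the integer translation of test_img whose sparse
    coding over A (columns = vectorized training faces) fits best in l1."""
    best, best_err = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            y = translate(test_img, (dy, dx), order=1).ravel()
            x = Lasso(alpha=lam, fit_intercept=False,
                      max_iter=2000).fit(A, y).coef_
            err = np.abs(y - A @ x).sum()       # l1 residual = "sparse error"
            if err < best_err:
                best, best_err = (dy, dx), err
    return best, best_err
```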

669 citations


"On video based face recognition thr..." refers methods in this paper

  • ...Despite achieving good robustness against occlusion and change in illumination, the SRC method is very sensitive towards pose and alignment change [4], [5]....


Proceedings ArticleDOI
23 Jun 2008
TL;DR: This work addresses the problem of tracking and recognizing faces in real-world, noisy videos using a tracker that adaptively builds a target model reflecting changes in appearance typical of a video setting, and introduces visual constraints using a combination of generative and discriminative models in a particle filtering framework.
Abstract: We address the problem of tracking and recognizing faces in real-world, noisy videos. We track faces using a tracker that adaptively builds a target model reflecting changes in appearance, typical of a video setting. However, adaptive appearance trackers often suffer from drift, a gradual adaptation of the tracker to non-targets. To alleviate this problem, our tracker introduces visual constraints using a combination of generative and discriminative models in a particle filtering framework. The generative term conforms the particles to the space of generic face poses while the discriminative one ensures rejection of poorly aligned targets. This leads to a tracker that significantly improves robustness against abrupt appearance changes and occlusions, critical for the subsequent recognition phase. Identity of the tracked subject is established by fusing pose-discriminant and person-discriminant features over the duration of a video sequence. This leads to a robust video-based face recognizer with state-of-the-art recognition performance. We test the quality of tracking and face recognition on real-world noisy videos from YouTube as well as the standard Honda/UCSD database. Our approach produces successful face tracking results on over 80% of all videos without video or person-specific parameter tuning. The good tracking performance induces similarly high recognition rates: 100% on Honda/UCSD and over 70% on the YouTube set containing 35 celebrities in 1500 sequences.
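
One step of the particle filtering framework the abstract refers to can be written generically. The `likelihood` callback below bundles the generative (face-like) and discriminative (well-aligned) terms; this is a bare-bones sketch under those assumptions, not the authors' tracker:

```python
import numpy as np

def particle_filter_step(particles, weights, likelihood, motion_std=2.0):
    """particles: (N, d) state hypotheses (e.g. face position/scale),
    weights: (N,), likelihood: state -> non-negative score."""
    # Propagate with a random-walk motion model.
    particles = particles + np.random.normal(0.0, motion_std, particles.shape)
    # Re-weight by the combined generative/discriminative likelihood.
    weights = weights * np.array([likelihood(p) for p in particles])
    weights = weights / weights.sum()
    # Resample when the effective sample size collapses (this also limits
    # the drift toward non-targets mentioned in the abstract).
    n = len(particles)
    if 1.0 / np.sum(weights ** 2) < n / 2:
        idx = np.random.choice(n, n, p=weights)
        particles, weights = particles[idx], np.full(n, 1.0 / n)
    return particles, weights
```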

493 citations


"On video based face recognition thr..." refers methods in this paper

  • ...Most of the sequence-based approaches use Hidden Markov Models to utilize the temporal information [11], [12]....
