
Showing papers on "Facial recognition system published in 2013"


Proceedings ArticleDOI
23 Jun 2013
TL;DR: The proposed approach outperforms state-of-the-art methods in both detection accuracy and reliability, and avoids local minima caused by ambiguity and data corruption in difficult image samples due to occlusions, large pose variations, and extreme lighting.
Abstract: We propose a new approach for estimating the positions of facial key points with a three-level cascade of carefully designed convolutional networks. At each level, the outputs of multiple networks are fused for robust and accurate estimation. Thanks to the deep structures of convolutional networks, global high-level features are extracted over the whole face region at the initialization stage, which helps to locate key points with high accuracy. This has two advantages. First, the texture context information over the entire face is utilized to locate each key point. Second, since the networks are trained to predict all the key points simultaneously, the geometric constraints among key points are implicitly encoded. The method can therefore avoid local minima caused by ambiguity and data corruption in difficult image samples due to occlusions, large pose variations, and extreme lighting. The networks at the following two levels are trained to locally refine the initial predictions, and their inputs are limited to small regions around the initial predictions. Several network structures critical for accurate and robust facial point detection are investigated. Extensive experiments show that our approach outperforms state-of-the-art methods in both detection accuracy and reliability.

1,462 citations
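
The cascade described above is simple enough to sketch. Below is a schematic Python outline of the coarse-to-fine idea — fuse several global predictions, then refine each point inside a small crop — with trivial stand-in predictors in place of the trained networks (an illustration of the structure, not the authors' implementation):

```python
import numpy as np

def fuse(predictions):
    """Average the outputs of multiple networks (the paper's fusion step)."""
    return np.mean(predictions, axis=0)

def cascade(face, level1_nets, refine_levels, crop=15):
    # Level 1: jointly estimate all key points from the whole face region.
    points = fuse([net(face) for net in level1_nets])        # shape (K, 2)
    # Levels 2 and 3: refine each point from a small patch around it.
    for nets in refine_levels:                               # one list per level
        for k, net in enumerate(nets):
            x, y = points[k].astype(int)
            patch = face[max(y - crop, 0):y + crop, max(x - crop, 0):x + crop]
            points[k] += net(patch)                          # predicted offset
    return points

# Trivial stand-ins so the sketch runs end to end.
K = 5
stub_global = lambda img: np.tile(np.array(img.shape[::-1]) / 2.0, (K, 1))
stub_offset = lambda patch: np.zeros(2)
face = np.random.rand(100, 100)
print(cascade(face, [stub_global, stub_global],
              [[stub_offset] * K, [stub_offset] * K]).shape)   # (5, 2)
```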


Proceedings ArticleDOI
02 Dec 2013
TL;DR: The main goal of this challenge is to compare the performance of different methods on a newly collected dataset using the same evaluation protocol and the same mark-up, and hence to develop the first standardized benchmark for facial landmark localization.
Abstract: Automatic facial point detection plays arguably the most important role in face analysis. Several methods have been proposed which report their results on databases of both constrained and unconstrained conditions. Most of these databases provide annotations with different mark-ups, and in some cases there are problems related to the accuracy of the fiducial points. These issues, as well as the lack of an evaluation protocol, make it difficult to compare performance between different systems. In this paper, we present the 300 Faces in-the-Wild Challenge: the first facial landmark localization challenge, held in conjunction with the International Conference on Computer Vision 2013, Sydney, Australia. The main goal of this challenge is to compare the performance of different methods on a newly collected dataset using the same evaluation protocol and the same mark-up, and hence to develop the first standardized benchmark for facial landmark localization.

1,048 citations
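
Benchmarks of this kind typically score a detector by its mean point-to-point error normalized by the inter-ocular distance. A minimal sketch of that metric, assuming the common 68-point mark-up with outer eye corners at indices 36 and 45:

```python
import numpy as np

def normalized_mean_error(pred, gt, left_eye=36, right_eye=45):
    """Mean point-to-point error normalized by inter-ocular distance.
    Eye indices assume the 68-point mark-up (outer eye corners);
    adjust them for other annotation schemes."""
    interocular = np.linalg.norm(gt[left_eye] - gt[right_eye])
    return np.linalg.norm(pred - gt, axis=1).mean() / interocular

gt = np.random.rand(68, 2) * 200          # ground-truth landmarks
pred = gt + np.random.randn(68, 2)        # simulated detector output
print(normalized_mean_error(pred, gt))
```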


Journal ArticleDOI
TL;DR: This paper explores the nature of open set recognition and formalizes its definition as a constrained minimization problem, and introduces a novel “1-vs-set machine,” which sculpts a decision space from the marginal distances of a 1-class or binary SVM with a linear kernel.
Abstract: To date, almost all experimental evaluations of machine learning-based recognition algorithms in computer vision have taken the form of “closed set” recognition, whereby all testing classes are known at training time. A more realistic scenario for vision applications is “open set” recognition, where incomplete knowledge of the world is present at training time, and unknown classes can be submitted to an algorithm during testing. This paper explores the nature of open set recognition and formalizes its definition as a constrained minimization problem. The open set recognition problem is not well addressed by existing algorithms because it requires strong generalization. As a step toward a solution, we introduce a novel “1-vs-set machine,” which sculpts a decision space from the marginal distances of a 1-class or binary SVM with a linear kernel. This methodology applies to several different applications in computer vision where open set recognition is a challenging problem, including object recognition and face verification. We consider both in this work, with large-scale cross-dataset experiments performed over the Caltech 256 and ImageNet sets, as well as face matching experiments performed over the Labeled Faces in the Wild set. The experiments highlight the effectiveness of machines adapted for open set evaluation compared to existing 1-class and binary SVMs for the same tasks.

1,029 citations
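
The 1-vs-set machine itself is specified in the paper; the scikit-learn snippet below only illustrates the underlying intuition — bounding the open positive half-space of a linear SVM so that samples far from the training support are rejected as unknown. The slab thresholds here are hypothetical, not the paper's learned planes:

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
pos = rng.normal(0, 1, (100, 2))           # the known class of interest
neg = rng.normal(4, 1, (100, 2))           # known negatives
unknown = rng.normal((0, 8), 1, (50, 2))   # a class never seen in training

clf = LinearSVC(C=1.0).fit(np.vstack([pos, neg]),
                           np.array([1] * 100 + [0] * 100))

def one_vs_set_predict(clf, X, near=0.0, far=3.0):
    """Accept the positive label only inside a slab between two parallel
    planes (near < score < far); everything else is rejected as unknown
    (-1). A loose mimic of how the 1-vs-set machine bounds the open
    positive half-space of a linear SVM with a second plane."""
    s = clf.decision_function(X)
    return np.where((s > near) & (s < far), 1, -1)

print(one_vs_set_predict(clf, unknown)[:10])   # mostly -1: rejected
```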


Journal ArticleDOI
TL;DR: A novel distance metric, Finger-Earth Mover's Distance (FEMD), is proposed; because it matches only the finger parts rather than the whole hand, it can better distinguish hand gestures with slight differences.
Abstract: The recently developed depth sensors, e.g., the Kinect sensor, have provided new opportunities for human-computer interaction (HCI). Although great progress has been made by leveraging the Kinect sensor, e.g., in human body tracking, face recognition and human action recognition, robust hand gesture recognition remains an open problem. Compared to the entire human body, the hand is a smaller object with more complex articulations and is more easily affected by segmentation errors. It is thus a very challenging problem to recognize hand gestures. This paper focuses on building a robust part-based hand gesture recognition system using the Kinect sensor. To handle the noisy hand shapes obtained from the Kinect sensor, we propose a novel distance metric, Finger-Earth Mover's Distance (FEMD), to measure the dissimilarity between hand shapes. Because it matches only the finger parts rather than the whole hand, it can better distinguish hand gestures with slight differences. Extensive experiments demonstrate that our hand gesture recognition system is accurate (a 93.2% mean accuracy on a challenging 10-gesture dataset), efficient (average 0.0750 s per frame), robust to hand articulations, distortions, and orientation or scale changes, and can work in uncontrolled environments (cluttered backgrounds and lighting conditions). The superiority of our system is further demonstrated in two real-life HCI applications.

693 citations
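
As a rough illustration of the finger-part matching idea (not the authors' FEMD, which also penalizes unmatched "empty holes"), assume each segmented hand has been reduced to a 1-D signature of finger positions weighted by finger length; an Earth Mover's Distance between two such signatures can then be computed with SciPy:

```python
import numpy as np
from scipy.stats import wasserstein_distance

def finger_signature(angles, lengths):
    """Hypothetical reduction of a segmented hand: finger positions
    (angles around the palm, in degrees) weighted by finger length."""
    w = np.asarray(lengths, dtype=float)
    return np.asarray(angles, dtype=float), w / w.sum()

# Two gestures that differ only in one extended finger.
a_pos, a_w = finger_signature([10, 35, 60, 85, 110], [1, 1, 1, 1, 1])
b_pos, b_w = finger_signature([10, 35, 60, 85], [1, 1, 1, 1])

# EMD between the two finger distributions: only finger parts are matched,
# never the whole hand silhouette.
print(wasserstein_distance(a_pos, b_pos, a_w, b_w))
```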


Journal ArticleDOI
TL;DR: To meet the need for publicly available corpora of well-labeled video, the Denver intensity of spontaneous facial action database is collected, ground-truthed, and prepared for distribution.
Abstract: Access to well-labeled recordings of facial expression is critical to progress in automated facial expression recognition. With few exceptions, publicly available databases are limited to posed facial behavior that can differ markedly in conformation, intensity, and timing from what occurs spontaneously. To meet the need for publicly available corpora of well-labeled video, we collected, ground-truthed, and prepared for distribution the Denver intensity of spontaneous facial action database. Twenty-seven young adults were video recorded by a stereo camera while they viewed video clips intended to elicit spontaneous emotion expression. Each video frame was manually coded for presence, absence, and intensity of facial action units according to the facial action unit coding system. Action units are the smallest visibly discriminable changes in facial action; they may occur individually and in combinations to comprise more molar facial expressions. To provide a baseline for use in future research, protocols and benchmarks for automated action unit intensity measurement are reported. Details are given for accessing the database for research in computer vision, machine learning, and affective and behavioral science.

650 citations


Journal ArticleDOI
TL;DR: The aim of this survey is to review the literature on affective body expression perception and recognition and to raise open questions on data collecting, labeling, modeling, and setting benchmarks for comparing automatic recognition systems.
Abstract: Thanks to the decreasing cost of whole-body sensing technology and its increasing reliability, there is an increasing interest in, and understanding of, the role played by body expressions as a powerful affective communication channel. The aim of this survey is to review the literature on affective body expression perception and recognition. One issue is whether there are universal aspects to affect expression perception and recognition models or if they are affected by human factors such as culture. Next, we discuss the difference between form and movement information as studies have shown that they are governed by separate pathways in the brain. We also review psychological studies that have investigated bodily configurations to evaluate if specific features can be identified that contribute to the recognition of specific affective states. The survey then turns to automatic affect recognition systems using body expressions as at least one input modality. The survey ends by raising open questions on data collecting, labeling, modeling, and setting benchmarks for comparing automatic recognition systems.

550 citations


Proceedings ArticleDOI
23 Jun 2013
TL;DR: The decision function for verification is viewed as a joint model of a distance metric and a locally adaptive thresholding rule; inference on the decision function is formulated as a second-order large-margin regularization problem, and an efficient algorithm is provided in its dual form.
Abstract: This paper considers the person verification problem in modern surveillance and video retrieval systems. The problem is to identify whether a pair of face or human body images depicts the same person, even if the person has not been seen before. Traditional methods usually look for a distance (or similarity) measure between images (e.g., by metric learning algorithms), and make decisions based on a fixed threshold. We show that this is nevertheless insufficient and sub-optimal for the verification problem. This paper proposes to learn a decision function for verification that can be viewed as a joint model of a distance metric and a locally adaptive thresholding rule. We further formulate the inference on our decision function as a second-order large-margin regularization problem, and provide an efficient algorithm in its dual form. We evaluate our algorithm on both human body verification and face verification problems. Our method outperforms not only classical metric learning algorithms, including LMNN and ITML, but also the state of the art in the computer vision community.

533 citations
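
The form of such a joint decision function is easy to write down. The sketch below shows a second-order function combining a Mahalanobis-type metric with a locally adaptive linear threshold; the parameters M, w, b are placeholders for what the paper learns via its large-margin dual formulation:

```python
import numpy as np

def verify(x1, x2, M, w, b):
    """Decide 'same person' iff f(x1, x2) > 0, where
    f = -(x1 - x2)^T M (x1 - x2) + w^T (x1 + x2) + b:
    a Mahalanobis distance paired with a threshold that varies linearly
    with where the pair lies in feature space. M, w, b are placeholders
    for jointly learned parameters."""
    d = x1 - x2
    return (-d @ M @ d + w @ (x1 + x2) + b) > 0

rng = np.random.default_rng(1)
dim = 8
M, w, b = np.eye(dim), np.zeros(dim), 2.0      # placeholder parameters
x = rng.normal(size=dim)
print(verify(x, x + 0.1 * rng.normal(size=dim), M, w, b))   # likely True
print(verify(x, rng.normal(size=dim), M, w, b))             # likely False
```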


Journal ArticleDOI
TL;DR: This paper proposes a label distribution approach for facial age estimation: the distribution covers a certain number of class labels, representing the degree to which each label describes the instance, and two algorithms, IIS-LLD and CPNN, are proposed to learn from such label distributions.
Abstract: One of the main difficulties in facial age estimation is that the learning algorithms cannot expect sufficient and complete training data. Fortunately, faces at close ages look quite similar, since aging is a slow and smooth process. Inspired by this observation, instead of considering each face image as an instance with one label (age), this paper regards each face image as an instance associated with a label distribution. The label distribution covers a certain number of class labels, representing the degree to which each label describes the instance. In this way, one face image can contribute not only to the learning of its chronological age, but also to the learning of its adjacent ages. Two algorithms, named IIS-LLD and CPNN, are proposed to learn from such label distributions. Experimental results on two aging face databases show remarkable advantages of the proposed label distribution learning algorithms over the compared single-label learning algorithms, whether specially designed for age estimation or for general purpose.

519 citations
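
A minimal sketch of the label-distribution idea: replace each hard age label with a discrete Gaussian over neighbouring ages and train against a divergence between distributions. The Gaussian form and sigma are illustrative design choices; the paper's IIS-LLD and CPNN learners are not reproduced here:

```python
import numpy as np

def age_label_distribution(true_age, ages=np.arange(0, 81), sigma=2.0):
    """Replace the single hard label with a discrete Gaussian centred on
    the chronological age, so neighbouring ages also receive credit."""
    d = np.exp(-0.5 * ((ages - true_age) / sigma) ** 2)
    return d / d.sum()

def kl_divergence(target, pred, eps=1e-12):
    """KL(target || pred): the kind of objective a label-distribution
    learner minimizes instead of a 0/1 classification loss."""
    return float(np.sum(target * (np.log(target + eps) - np.log(pred + eps))))

target = age_label_distribution(30)
pred = age_label_distribution(32)      # a model guessing two years off
print(kl_divergence(target, pred))     # small but nonzero penalty
```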


Journal ArticleDOI
TL;DR: A novel local feature descriptor, the local directional number pattern (LDN), is proposed for face analysis, i.e., face and expression recognition; it encodes the directional information of the face's textures in a compact way, producing a more discriminative code than current methods.
Abstract: This paper proposes a novel local feature descriptor, local directional number pattern (LDN), for face analysis, i.e., face and expression recognition. LDN encodes the directional information of the face's textures (i.e., the texture's structure) in a compact way, producing a more discriminative code than current methods. We compute the structure of each micro-pattern with the aid of a compass mask that extracts directional information, and we encode such information using the prominent direction indices (directional numbers) and sign, which allows us to distinguish among similar structural patterns that have different intensity transitions. We divide the face into several regions, and extract the distribution of the LDN features from them. Then, we concatenate these features into a feature vector, and we use it as a face descriptor. We perform several experiments in which our descriptor performs consistently under illumination, noise, expression, and time lapse variations. Moreover, we test our descriptor with different masks to analyze its performance in different face analysis tasks.

469 citations
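
A compact reading of the LDN encoding, using the eight Kirsch compass masks as the directional filter bank; the exact bit layout below is our interpretation of the directional-number scheme, not a reference implementation:

```python
import numpy as np
from scipy.ndimage import convolve

def kirsch_masks():
    """The eight 45-degree rotations of the Kirsch compass mask."""
    ring = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    masks = []
    for r in range(8):
        m = np.zeros((3, 3))
        for (i, j), v in zip(ring, np.roll([5, 5, 5, -3, -3, -3, -3, -3], r)):
            m[i, j] = v
        masks.append(m)
    return masks

def ldn_code(gray):
    """Per-pixel LDN code: 3 bits for the direction of the strongest
    positive compass response, 3 bits for the strongest negative one."""
    resp = np.stack([convolve(gray.astype(float), m) for m in kirsch_masks()])
    return (resp.argmax(axis=0) * 8 + resp.argmin(axis=0)).astype(np.uint8)

img = np.random.rand(64, 64)
codes = ldn_code(img)                       # values in [0, 63]
# Region-wise histograms of these codes, concatenated, form the descriptor.
hist, _ = np.histogram(codes, bins=64, range=(0, 64))
print(hist.sum())                           # 64 * 64 pixels
```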


Proceedings ArticleDOI
22 Apr 2013
TL;DR: A novel Spontaneous Micro-expression Database SMIC is presented, which includes 164 micro-expression video clips elicited from 16 participants and provides sufficient source material for comprehensive testing of automatic systems for analyzing micro-expressions, which has not been possible with any previously published database.
Abstract: Micro-expressions are short, involuntary facial expressions which reveal hidden emotions. Micro-expressions are important for understanding humans' deceitful behavior. Psychologists have been studying them since the 1960s. Currently, attention is elevated both in academic fields and in the media. However, while general facial expression recognition (FER) has been intensively studied for years in computer vision, little research has been done on automatically analyzing micro-expressions. The biggest obstacle to date has been the lack of a suitable database. In this paper we present a novel Spontaneous Micro-expression Database SMIC, which includes 164 micro-expression video clips elicited from 16 participants. Micro-expression detection and recognition performance are provided as baselines. SMIC provides sufficient source material for comprehensive testing of automatic systems for analyzing micro-expressions, which has not been possible with any previously published database.

438 citations


Proceedings ArticleDOI
23 Jun 2013
TL;DR: This is the first attempt to create a tool suitable for annotating massive facial databases; the tool is employed to create annotations for the MultiPIE, XM2VTS, AR, and FRGC Ver. 2 databases.
Abstract: Developing powerful deformable face models requires massive, annotated face databases on which techniques can be trained, validated and tested. Manual annotation of each facial image in terms of landmarks requires a trained expert and the workload is usually enormous. Fatigue is one of the reasons that in some cases annotations are inaccurate. This is why the majority of existing facial databases provide annotations for a relatively small subset of the training images. Furthermore, there is hardly any correspondence between the annotated landmarks across different databases. These problems make cross-database experiments almost infeasible. To overcome these difficulties, we propose a semi-automatic annotation methodology for annotating massive face datasets. This is the first attempt to create a tool suitable for annotating massive facial databases. We employed our tool to create annotations for the MultiPIE, XM2VTS, AR, and FRGC Ver. 2 databases. The annotations will be made publicly available from http://ibug.doc.ic.ac.uk/resources/facial-point-annotations/. Finally, we present experiments which verify the accuracy of the produced annotations.

Patent
28 Feb 2013
TL;DR: In this paper, the authors introduce Z-webs, including Z-factors and Z-nodes, for the understanding of relationships between objects, subjects, abstract ideas, concepts, or the like, including face, car, images, people, emotions, mood, text, natural language, voice, music, video, locations, formulas, facts, historical data, landmarks, personalities, ownership, family, friends, love, happiness, social behavior, voting behavior, and the like.
Abstract: Here, we introduce Z-webs, including Z-factors and Z-nodes, for the understanding of relationships between objects, subjects, abstract ideas, concepts, or the like, including face, car, images, people, emotions, mood, text, natural language, voice, music, video, locations, formulas, facts, historical data, landmarks, personalities, ownership, family, friends, love, happiness, social behavior, voting behavior, and the like, to be used for many applications in our life, including on the search engine, analytics, Big Data processing, natural language processing, economy forecasting, face recognition, dealing with reliability and certainty, medical diagnosis, pattern recognition, object recognition, biometrics, security analysis, risk analysis, fraud detection, satellite image analysis, machine generated data analysis, machine learning, training samples, extracting data or patterns (from the video, images, and the like), editing video or images, and the like. Z-factors include a reliability factor, confidence factor, expertise factor, bias factor, and the like, which are associated with each Z-node in the Z-web.

Proceedings ArticleDOI
02 Dec 2013
TL;DR: The Constrained Local Neural Field model for facial landmark detection is presented, which introduces a probabilistic patch expert (landmark detector) that can learn non-linear and spatial relationships between the input pixels and the probability of a landmark being aligned.
Abstract: Facial feature detection algorithms have seen great progress over the recent years. However, they still struggle in poor lighting conditions and in the presence of extreme pose or occlusions. We present the Constrained Local Neural Field model for facial landmark detection. Our model includes two main novelties. First, we introduce a probabilistic patch expert (landmark detector) that can learn non-linear and spatial relationships between the input pixels and the probability of a landmark being aligned. Secondly, our model is optimised using a novel Non-uniform Regularised Landmark Mean-Shift optimisation technique, which takes into account the reliabilities of each patch expert. We demonstrate the benefit of our approach on a number of publicly available datasets over other state-of-the-art approaches when performing landmark detection in unseen lighting conditions and in the wild.

Proceedings ArticleDOI
02 Dec 2013
TL;DR: A four-level convolutional network cascade that is trained to locally refine a subset of facial landmarks generated by previous network levels, which enables the system to locate extensive facial landmarks accurately in the 300-W facial landmark localization challenge.
Abstract: We present a new approach to localize extensive facial landmarks with a coarse-to-fine convolutional network cascade. Deep convolutional neural networks (DCNN) have been successfully utilized in facial landmark localization for two-fold advantages: 1) geometric constraints among facial points are implicitly utilized, 2) huge amounts of training data can be leveraged. However, in the task of extensive facial landmark localization, a large number of facial landmarks (more than 50 points) must be located in a unified system, which poses great difficulty in the structure design and training process of traditional convolutional networks. In this paper, we design a four-level convolutional network cascade, which tackles the problem in a coarse-to-fine manner. In our system, each network level is trained to locally refine a subset of facial landmarks generated by previous network levels. In addition, each level predicts explicit geometric constraints (the position and rotation angles of a specific facial component) to rectify the inputs of the current network level. The combination of coarse-to-fine cascade and geometric refinement enables our system to locate extensive facial landmarks (68 points) accurately in the 300-W facial landmark localization challenge.

Journal ArticleDOI
TL;DR: A generic HFR framework is proposed in which both probe and gallery images are represented in terms of nonlinear similarities to a collection of prototype face images, and random sampling is introduced into the HFR framework to better handle challenges arising from the small sample size problem.
Abstract: Heterogeneous face recognition (HFR) involves matching two face images from alternate imaging modalities, such as an infrared image to a photograph or a sketch to a photograph. Accurate HFR systems are of great value in various applications (e.g., forensics and surveillance), where the gallery databases are populated with photographs (e.g., mug shot or passport photographs) but the probe images are often limited to some alternate modality. A generic HFR framework is proposed in which both probe and gallery images are represented in terms of nonlinear similarities to a collection of prototype face images. The prototype subjects (i.e., the training set) have an image in each modality (probe and gallery), and the similarity of an image is measured against the prototype images from the corresponding modality. The accuracy of this nonlinear prototype representation is improved by projecting the features into a linear discriminant subspace. Random sampling is introduced into the HFR framework to better handle challenges arising from the small sample size problem. The merits of the proposed approach, called prototype random subspace (P-RS), are demonstrated on four different heterogeneous scenarios: 1) near infrared (NIR) to photograph, 2) thermal to photograph, 3) viewed sketch to photograph, and 4) forensic sketch to photograph.
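
The core representation step is simple to sketch: each image is described by its nonlinear similarities to prototype faces from its own modality, which places probe and gallery images in a common space (the LDA projection and random subspaces of the full P-RS method are omitted here):

```python
import numpy as np

def prototype_representation(x, prototypes, gamma=0.5):
    """Describe a face by its RBF similarities to prototype faces from the
    *same* modality, so NIR probes and VIS gallery images end up in a
    shared similarity space."""
    d2 = ((prototypes - x) ** 2).sum(axis=1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
protos_nir = rng.normal(size=(40, 64))                      # NIR prototypes
protos_vis = protos_nir + 0.1 * rng.normal(size=(40, 64))   # same subjects, VIS

probe = prototype_representation(rng.normal(size=64), protos_nir)
gallery = prototype_representation(rng.normal(size=64), protos_vis)
# Matching is a distance in this shared space; the full P-RS method further
# applies an LDA projection and averages over random subspaces.
print(np.linalg.norm(probe - gallery))
```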

Journal ArticleDOI
TL;DR: This paper proposes a novel discriminative multimanifold analysis (DMMA) method that learns discriminative features from image patches, partitioning each enrolled face image into several nonoverlapping patches to form an image set for each sample per person.
Abstract: Conventional appearance-based face recognition methods usually assume that there are multiple samples per person (MSPP) available for discriminative feature extraction during the training phase. In many practical face recognition applications such as law enforcement, e-passport, and ID card identification, this assumption, however, may not hold as there is only a single sample per person (SSPP) enrolled or recorded in these systems. Many popular face recognition methods fail to work well in this scenario because there are not enough samples for discriminant learning. To address this problem, we propose in this paper a novel discriminative multimanifold analysis (DMMA) method by learning discriminative features from image patches. First, we partition each enrolled face image into several nonoverlapping patches to form an image set for each sample per person. Then, we formulate the SSPP face recognition as a manifold-manifold matching problem and learn multiple DMMA feature spaces to maximize the manifold margins of different persons. Finally, we present a reconstruction-based manifold-manifold distance to identify the unlabeled subjects. Experimental results on three widely used face databases are presented to demonstrate the efficacy of the proposed approach.

Proceedings ArticleDOI
01 Dec 2013
TL;DR: This paper proposes a new learning-based face representation, the face identity-preserving (FIP) features, learned by a deep network that combines feature extraction layers with a reconstruction layer; it significantly outperforms state-of-the-art face recognition methods.
Abstract: Face recognition with large pose and illumination variations is a challenging problem in computer vision. This paper addresses this challenge by proposing a new learning based face representation: the face identity-preserving (FIP) features. Unlike conventional face descriptors, the FIP features can significantly reduce intra-identity variances, while maintaining discriminativeness between identities. Moreover, the FIP features extracted from an image under any pose and illumination can be used to reconstruct its face image in the canonical view. This property makes it possible to improve the performance of traditional descriptors, such as LBP [2] and Gabor [31], which can be extracted from our reconstructed images in the canonical view to eliminate variations. In order to learn the FIP features, we carefully design a deep network that combines the feature extraction layers and the reconstruction layer. The former encodes a face image into the FIP features, while the latter transforms them to an image in the canonical view. Extensive experiments on the large MultiPIE face database [7] demonstrate that it significantly outperforms the state-of-the-art face recognition methods.

Proceedings ArticleDOI
23 Jun 2013
TL;DR: A transductive learning method, referred to as the Selective Transfer Machine (STM), is introduced to personalize a generic classifier by attenuating person-specific biases; it achieves this effect by simultaneously learning a classifier and re-weighting the training samples that are most relevant to the test subject.
Abstract: Automatic facial action unit (AFA) detection from video is a long-standing problem in facial expression analysis. Most approaches emphasize choices of features and classifiers. They neglect individual differences in target persons. People vary markedly in facial morphology (e.g., heavy versus delicate brows, smooth versus deeply etched wrinkles) and behavior. Individual differences can dramatically influence how well generic classifiers generalize to previously unseen persons. While a possible solution would be to train person-specific classifiers, that often is neither feasible nor theoretically compelling. The alternative that we propose is to personalize a generic classifier in an unsupervised manner (no additional labels for the test subjects are required). We introduce a transductive learning method, which we refer to as the Selective Transfer Machine (STM), to personalize a generic classifier by attenuating person-specific biases. STM achieves this effect by simultaneously learning a classifier and re-weighting the training samples that are most relevant to the test subject. To evaluate the effectiveness of STM, we compared STM to generic classifiers and to cross-domain learning methods on three major databases: CK+, GEMEP-FERA and RU-FACS. STM outperformed the generic classifiers in all three.

Journal ArticleDOI
TL;DR: In this paper, the authors focus on the numerical implementation of a sparsity-based classification framework in robust face recognition, where sparse representation is sought to recover human identities from high-dimensional facial images that may be corrupted by illumination, facial disguise, and pose variation.
Abstract: l1-minimization refers to finding the minimum l1-norm solution to an underdetermined linear system b = Ax. Under certain conditions, as described in compressive sensing theory, the minimum l1-norm solution is also the sparsest solution. In this paper, we study the speed and scalability of its algorithms. In particular, we focus on the numerical implementation of a sparsity-based classification framework in robust face recognition, where sparse representation is sought to recover human identities from high-dimensional facial images that may be corrupted by illumination, facial disguise, and pose variation. Although the underlying numerical problem is a linear program, traditional algorithms are known to suffer poor scalability for large-scale applications. We investigate a new solution based on a classical convex optimization framework, known as augmented Lagrangian methods. We conduct extensive experiments to validate and compare its performance against several popular l1-minimization solvers, including the interior-point method, Homotopy, FISTA, SESOP-PCD, approximate message passing, and TFOCS. To aid peer evaluation, the code for all the algorithms has been made publicly available.
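
For context, here is a compact ADMM solver for the basis pursuit problem min ||x||_1 s.t. Ax = b — a textbook member of the augmented-Lagrangian family, shown as a generic sketch rather than the paper's benchmarked ALM code:

```python
import numpy as np

def basis_pursuit_admm(A, b, rho=1.0, iters=300):
    """min ||x||_1 s.t. Ax = b via ADMM: the x-step projects onto the
    affine set {x : Ax = b}; the z-step is soft-thresholding (the l1
    proximal operator)."""
    m, n = A.shape
    pinv = A.T @ np.linalg.inv(A @ A.T)       # used for the projection step
    x = z = u = np.zeros(n)
    for _ in range(iters):
        v = z - u
        x = v - pinv @ (A @ v - b)                                  # project
        z = np.sign(x + u) * np.maximum(np.abs(x + u) - 1.0 / rho, 0.0)
        u = u + x - z
    return z

rng = np.random.default_rng(0)
m, n, k = 30, 100, 5
A = rng.normal(size=(m, n))
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.normal(size=k)
x_hat = basis_pursuit_admm(A, A @ x_true)
print(np.linalg.norm(x_hat - x_true))    # small: the sparse x is recovered
```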

Proceedings ArticleDOI
23 Jun 2013
TL;DR: The composition of the database, its evaluation protocols, and the baseline performance of PCA on the database are described, and two interesting tricks, facial symmetry and heterogeneous component analysis (HCA), are introduced to improve the performance.
Abstract: In recent years, heterogeneous face biometrics has attracted increasing attention in the face recognition community. Since its publication in 2009, the HFB database has been used by tens of research groups and widely applied to near infrared vs. visible light (NIR-VIS) face recognition. Despite its success, the HFB database has two disadvantages: a limited number of subjects and a lack of specific evaluation protocols. To address these issues we collected the NIR-VIS 2.0 database. It contains 725 subjects, imaged by VIS and NIR cameras in four recording sessions. Because the 3D modality in the HFB database was less used in the literature, we do not consider it in the current version. In this paper, we describe the composition of the database and its evaluation protocols, and present the baseline performance of PCA on the database. Moreover, two interesting tricks, facial symmetry and heterogeneous component analysis (HCA), are also introduced to improve the performance.
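
A PCA matching baseline in the spirit of the paper's protocol can be sketched in a few lines with scikit-learn; synthetic data stands in for the NIR/VIS crops, and the facial-symmetry and HCA tricks are not included:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
train = rng.normal(size=(500, 1024))       # flattened training face crops
gallery = rng.normal(size=(100, 1024))     # one VIS image per subject
probe = gallery + 0.5 * rng.normal(size=gallery.shape)   # noisy NIR stand-in

pca = PCA(n_components=64).fit(train)
g = pca.transform(gallery)
p = pca.transform(probe)
g /= np.linalg.norm(g, axis=1, keepdims=True)
p /= np.linalg.norm(p, axis=1, keepdims=True)
rank1 = ((p @ g.T).argmax(axis=1) == np.arange(len(probe))).mean()
print("rank-1 accuracy:", rank1)
```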

Proceedings ArticleDOI
22 Apr 2013
TL;DR: A newly developed 3D video database of spontaneous facial expressions in a diverse group of young adults is presented to promote the exploration of 3D spatiotemporal features in subtle facial expression, better understanding of the relation between pose and motion dynamics in facial action units, and deeper understanding of naturally occurring facial action.
Abstract: Facial expression is central to human experience. Its efficient and valid measurement is a challenge that automated facial image analysis seeks to address. Most publicly available databases are limited to 2D static images or video of posed facial behavior. Because posed and un-posed (aka “spontaneous”) facial expressions differ along several dimensions including complexity and timing, well-annotated video of un-posed facial behavior is needed. Moreover, because the face is a three-dimensional deformable object, 2D video may be insufficient, and therefore 3D video archives are needed. We present a newly developed 3D video database of spontaneous facial expressions in a diverse group of young adults. Well-validated emotion inductions were used to elicit expressions of emotion and paralinguistic communication. Frame-level ground-truth for facial actions was obtained using the Facial Action Coding System. Facial features were tracked in both 2D and 3D domains using both person-specific and generic approaches. The work promotes the exploration of 3D spatiotemporal features in subtle facial expression, better understanding of the relation between pose and motion dynamics in facial action units, and deeper understanding of naturally occurring facial action.

Journal ArticleDOI
TL;DR: A novel geometric framework for analyzing 3D faces, with the specific goals of comparing, matching, and averaging their shapes, which allows for formal statistical inferences, such as the estimation of missing facial parts using PCA on tangent spaces and computing average shapes.
Abstract: We propose a novel geometric framework for analyzing 3D faces, with the specific goals of comparing, matching, and averaging their shapes. Here we represent facial surfaces by radial curves emanating from the nose tips and use elastic shape analysis of these curves to develop a Riemannian framework for analyzing shapes of full facial surfaces. This representation, along with the elastic Riemannian metric, seems natural for measuring facial deformations and is robust to challenges such as large facial expressions (especially those with open mouths), large pose variations, missing parts, and partial occlusions due to glasses, hair, and so on. This framework is shown to be promising from both empirical and theoretical perspectives. In terms of the empirical evaluation, our results match or improve upon the state-of-the-art methods on three prominent databases: FRGCv2, GavabDB, and Bosphorus, each posing a different type of challenge. From a theoretical perspective, this framework allows for formal statistical inferences, such as the estimation of missing facial parts using PCA on tangent spaces and computing average shapes.

Journal ArticleDOI
TL;DR: By adopting a variable-size description that represents each face with a set of keypoint descriptors, it is argued that a probe face image, holistic or partial, can be sparsely represented by a large dictionary of gallery descriptors.
Abstract: Numerous methods have been developed for holistic face recognition with impressive performance. However, few studies have tackled how to recognize an arbitrary patch of a face image. Partial faces frequently appear in unconstrained scenarios, with images captured by surveillance cameras or handheld devices (e.g., mobile phones) in particular. In this paper, we propose a general partial face recognition approach that does not require face alignment by eye coordinates or any other fiducial points. We develop an alignment-free face representation method based on Multi-Keypoint Descriptors (MKD), where the descriptor size of a face is determined by the actual content of the image. In this way, any probe face image, holistic or partial, can be sparsely represented by a large dictionary of gallery descriptors. A new keypoint descriptor called Gabor Ternary Pattern (GTP) is also developed for robust and discriminative face recognition. Experimental results are reported on four public domain face databases (FRGCv2.0, AR, LFW, and PubFig) under both the open-set identification and verification scenarios. Comparisons with two leading commercial face recognition SDKs (PittPatt and FaceVACS) and two baseline algorithms (PCA+LDA and LBP) show that the proposed method, overall, is superior in recognizing both holistic and partial faces without requiring alignment.
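
The alignment-free matching idea — describe each face by a variable-size set of keypoint descriptors and score a partial probe by how many descriptors it shares with a gallery face — can be illustrated with OpenCV, using ORB purely as a stand-in for the paper's Gabor Ternary Pattern descriptor:

```python
import cv2
import numpy as np

orb = cv2.ORB_create(nfeatures=500)        # stand-in for the paper's GTP

def describe(gray):
    """Variable-size description: one descriptor per detected keypoint."""
    _, desc = orb.detectAndCompute(gray, None)
    return desc                            # (num_keypoints, 32) or None

def match_score(desc_probe, desc_gallery, max_dist=40):
    """Count good descriptor matches; a partial probe still scores well
    against the right gallery face if its visible region shares keypoints."""
    if desc_probe is None or desc_gallery is None:
        return 0
    bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    return sum(m.distance < max_dist for m in bf.match(desc_probe, desc_gallery))

full = (np.random.rand(128, 128) * 255).astype(np.uint8)   # stand-in image
partial = full[32:, :]                                     # occluded probe
print(match_score(describe(partial), describe(full)))
```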

Proceedings ArticleDOI
01 Sep 2013
TL;DR: The 3D Mask Attack Database (3DMAD), the first publicly available 3D spoofing database, recorded with a low-cost depth camera, is introduced, and it is shown that easily attainable facial masks can pose a serious threat to 2D face recognition systems and that LBP is a powerful weapon to eliminate it.
Abstract: The problem of detecting face spoofing attacks (presentation attacks) has recently gained a well-deserved popularity. Mainly focusing on 2D attacks forged by displaying printed photos or replaying recorded videos on mobile devices, a significant portion of these studies ground their arguments on the flatness of the spoofing material in front of the sensor. In this paper, we inspect the spoofing potential of subject-specific 3D facial masks for 2D face recognition. Additionally, we analyze Local Binary Patterns-based countermeasures using both color and depth data, obtained by Kinect. For this purpose, we introduce the 3D Mask Attack Database (3DMAD), the first publicly available 3D spoofing database, recorded with a low-cost depth camera. Extensive experiments on 3DMAD show that easily attainable facial masks can pose a serious threat to 2D face recognition systems and LBP is a powerful weapon to eliminate it.
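
An LBP-based countermeasure of the kind analyzed in the paper reduces each face crop to a texture histogram and trains a classifier on real-vs-attack labels; a minimal sketch with scikit-image and scikit-learn on synthetic stand-in textures:

```python
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import LinearSVC

def lbp_hist(gray, P=8, R=1):
    """Uniform LBP histogram of a face crop: computed here on grayscale,
    and in the paper on both color and depth channels."""
    codes = local_binary_pattern(gray, P, R, method="uniform")
    hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
    return hist

rng = np.random.default_rng(0)
# Synthetic stand-ins: sharp "real" textures vs. blockier "mask" textures.
real = [(rng.random((64, 64)) * 255).astype(np.uint8) for _ in range(50)]
fake = [np.repeat(np.repeat((rng.random((32, 32)) * 255).astype(np.uint8), 2, 0), 2, 1)
        for _ in range(50)]
X = np.array([lbp_hist(im) for im in real + fake])
y = np.array([1] * 50 + [0] * 50)
clf = LinearSVC().fit(X, y)
print("training accuracy:", clf.score(X, y))
```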

Proceedings ArticleDOI
Wen-Jing Yan, Qi Wu, Yong-Jin Liu, Su-Jing Wang, Xiaolan Fu
22 Apr 2013
TL;DR: This study attempted to create a database of spontaneous micro-expressions which were elicited from neutralized faces and analyzed the video data with care to offer valid and reliable codings.
Abstract: Micro-expressions are facial expressions which are fleeting and reveal genuine emotions that people try to conceal. These are important clues for detecting lies and dangerous behaviors and therefore have potential applications in various fields such as the clinical field and national security. However, recognition through the naked eye is very difficult. Therefore, researchers in the field of computer vision have tried to develop micro-expression detection and recognition algorithms but lack spontaneous micro-expression databases. In this study, we attempted to create a database of spontaneous micro-expressions which were elicited from neutralized faces. Based on previous psychological studies, we designed an effective procedure in lab situations to elicit spontaneous micro-expressions and analyzed the video data with care to offer valid and reliable codings. From 1500 elicited facial movements filmed at 60 fps, 195 micro-expressions were selected. These samples were coded so that the first, peak and last frames were tagged. Action units (AUs) were marked to give an objective and accurate description of the facial movements. Emotions were labeled based on psychological studies and participants' self-report to enhance the validity.

Proceedings ArticleDOI
01 Dec 2013
TL;DR: This work proposes a hybrid convolutional network-Restricted Boltzmann Machine model for face verification in wild conditions to directly learn relational visual features, which indicate identity similarities, from raw pixels of face pairs with a hybrid deep network.
Abstract: This paper proposes a hybrid convolutional network (ConvNet)-Restricted Boltzmann Machine (RBM) model for face verification in wild conditions. A key contribution of this work is to directly learn relational visual features, which indicate identity similarities, from raw pixels of face pairs with a hybrid deep network. The deep ConvNets in our model mimic the primary visual cortex to jointly extract local relational visual features from two face images compared with the learned filter pairs. These relational features are further processed through multiple layers to extract high-level and global features. Multiple groups of ConvNets are constructed in order to achieve robustness and characterize face similarities from different aspects. The top-layer RBM performs inference from complementary high-level features extracted from different ConvNet groups with a two-level average pooling hierarchy. The entire hybrid deep network is jointly fine-tuned to optimize for the task of face verification. Our model achieves competitive face verification performance on the LFW dataset.

Proceedings ArticleDOI
01 Dec 2013
TL;DR: A two-stage cascaded deformable shape model to effectively and efficiently localize facial landmarks with large head pose variations and a group sparse learning method to automatically select the most salient facial landmarks is proposed.
Abstract: This paper addresses the problem of facial landmark localization and tracking from a single camera. We present a two-stage cascaded deformable shape model to effectively and efficiently localize facial landmarks with large head pose variations. For face detection, we propose a group sparse learning method to automatically select the most salient facial landmarks. By introducing a 3D face shape model, we use Procrustes analysis to achieve pose-free facial landmark initialization. For deformation, the first step uses mean-shift local search with a constrained local model to rapidly approach the global optimum. The second step uses component-wise active contours to discriminatively refine the subtle shape variation. Our framework can simultaneously handle face detection, pose-free landmark localization and tracking in real time. Extensive experiments are conducted on both laboratory-environment face databases and face-in-the-wild databases. All results demonstrate that our approach has certain advantages over state-of-the-art methods in handling pose variations.
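
The Procrustes initialization step mentioned above is standard and worth making concrete: a least-squares similarity transform aligning a detected landmark set to the mean shape (a generic sketch, not the authors' code):

```python
import numpy as np

def procrustes_align(src, dst):
    """Least-squares similarity transform (rotation + uniform scale +
    translation) mapping landmark set `src` onto `dst`: the standard
    Procrustes step used for pose-free shape initialization."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    s, d = src - mu_s, dst - mu_d
    U, S, Vt = np.linalg.svd(s.T @ d)
    sign = np.sign(np.linalg.det(U @ Vt))   # guard against reflections
    D = np.diag([1.0, sign])
    R = U @ D @ Vt                          # optimal rotation
    scale = (S * np.diag(D)).sum() / (s ** 2).sum()
    return lambda pts: scale * (pts - mu_s) @ R + mu_d

mean_shape = np.random.rand(68, 2)
t = 0.3
rot = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
detected = 1.7 * mean_shape @ rot + np.array([5.0, -2.0])
align = procrustes_align(detected, mean_shape)
print(np.abs(align(detected) - mean_shape).max())   # ~0
```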

Journal ArticleDOI
TL;DR: This paper proposes a new face coding model, regularized robust coding (RRC), which can robustly regress a given signal with regularized regression coefficients by assuming that the coding residual and the coding coefficient are each independent and identically distributed.
Abstract: Recently, sparse representation based classification (SRC) has been proposed for robust face recognition (FR). In SRC, the testing image is coded as a sparse linear combination of the training samples, and the representation fidelity is measured by the l2-norm or l1-norm of the coding residual. Such a sparse coding model assumes that the coding residual follows a Gaussian or Laplacian distribution, which may not be effective enough to describe the coding residual in practical FR systems. Meanwhile, the sparsity constraint on the coding coefficients makes the computational cost of SRC very high. In this paper, we propose a new face coding model, namely regularized robust coding (RRC), which could robustly regress a given signal with regularized regression coefficients. By assuming that the coding residual and the coding coefficient are respectively independent and identically distributed, the RRC seeks a maximum a posteriori solution of the coding problem. An iteratively reweighted regularized robust coding (IR3C) algorithm is proposed to solve the RRC model efficiently. Extensive experiments on representative face databases demonstrate that the RRC is much more effective and efficient than state-of-the-art sparse representation based methods in dealing with face occlusion, corruption, lighting, and expression changes, etc.
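
The engine of IR3C is iteratively reweighted regularized least squares: down-weight pixels with large coding residuals, re-solve a ridge-regularized coding problem, and repeat. A simplified numpy skeleton, with a Cauchy-type weight function chosen here for illustration (the paper derives its weights from the residual distribution):

```python
import numpy as np

def robust_coding(D, y, lam=0.01, iters=10, c=0.1):
    """IR3C-style skeleton: solve min_a ||W^(1/2)(y - D a)||^2 + lam ||a||^2,
    re-estimating the diagonal weights W from the residual each round so
    that heavily corrupted pixels are suppressed."""
    n, k = D.shape
    w = np.ones(n)
    a = np.zeros(k)
    for _ in range(iters):
        Dw = D * w[:, None]                                   # W D
        a = np.linalg.solve(D.T @ Dw + lam * np.eye(k), Dw.T @ y)
        r = y - D @ a
        w = 1.0 / (1.0 + (r / c) ** 2)     # large residual -> tiny weight
    return a, w

rng = np.random.default_rng(0)
D = rng.normal(size=(200, 30))             # dictionary of training faces
a_true = np.zeros(30)
a_true[3] = 1.0
y = D @ a_true
y[:40] += 5.0                              # simulated occlusion / corruption
a, w = robust_coding(D, y)
print(np.abs(a - a_true).max())            # coefficients recovered
print(w[:40].mean(), w[40:].mean())        # corrupted pixels down-weighted
```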

Proceedings ArticleDOI
04 Jun 2013
TL;DR: A component-based face coding approach for liveness detection is proposed that makes good use of micro differences between genuine and fake faces, achieving the best liveness detection performance on three databases.
Abstract: Spoofing attacks mainly include printing artifacts, electronic screens and ultra-realistic face masks or models. In this paper, we propose a component-based face coding approach for liveness detection. The proposed method consists of four steps: (1) locating the components of the face; (2) coding the low-level features respectively for all the components; (3) deriving the high-level face representation by pooling the codes with weights derived from the Fisher criterion; (4) feeding the concatenated histograms from all components into a classifier for identification. The proposed framework makes good use of micro differences between genuine faces and fake faces. Meanwhile, the inherent appearance differences among different components are retained. Extensive experiments on three published standard databases demonstrate that the method achieves the best liveness detection performance on all three databases.

Proceedings ArticleDOI
04 Jun 2013
TL;DR: This work proposes a hierarchical approach for automatic age estimation and provides an analysis of how aging influences individual facial components; experimental results show that the eyes and nose are more informative than the other facial components in automatic age estimation.
Abstract: There has been a growing interest in automatic age estimation from facial images due to a variety of potential applications in law enforcement, security control, and human-computer interaction. However, despite advances in automatic age estimation, it remains a challenging problem. This is because the face aging process is determined not only by intrinsic factors, e.g. genetic factors, but also by extrinsic factors, e.g. lifestyle, expression, and environment. As a result, different people with the same age can have quite different appearances due to different rates of facial aging. We propose a hierarchical approach for automatic age estimation, and provide an analysis of how aging influences individual facial components. Experimental results on the FG-NET, MORPH Album2, and PCSO databases show that eyes and nose are more informative than the other facial components in automatic age estimation. We also study the ability of humans to estimate age using data collected via crowdsourcing, and show that the cumulative score (CS) within 5-year mean absolute error (MAE) of our method is better than the age estimates provided by humans.
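
The two evaluation measures used above are easy to state precisely: mean absolute error (MAE) in years and the cumulative score (CS), the fraction of estimates within a tolerance of the true age:

```python
import numpy as np

def mae(pred, true):
    """Mean absolute error in years."""
    return np.abs(np.asarray(pred) - np.asarray(true)).mean()

def cumulative_score(pred, true, tol=5):
    """Fraction of estimates within `tol` years of the true age (the
    CS metric, reported above at a 5-year tolerance)."""
    return (np.abs(np.asarray(pred) - np.asarray(true)) <= tol).mean()

true = np.array([23, 35, 47, 52, 61])
pred = np.array([25, 31, 49, 60, 58])
print("MAE:", mae(pred, true), "CS@5:", cumulative_score(pred, true))
```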