
Showing papers on "Three-dimensional face recognition" published in 2017


Proceedings ArticleDOI
01 Jul 2017
TL;DR: Quantitative and qualitative evaluations on both controlled and in-the-wild databases demonstrate the superiority of DR-GAN over the state of the art.
Abstract: The large pose discrepancy between two face images is one of the key challenges in face recognition. Conventional approaches for pose-invariant face recognition either perform face frontalization on, or learn a pose-invariant representation from, a non-frontal face image. We argue that it is more desirable to perform both tasks jointly to allow them to leverage each other. To this end, this paper proposes Disentangled Representation learning-Generative Adversarial Network (DR-GAN) with three distinct novelties. First, the encoder-decoder structure of the generator allows DR-GAN to learn a generative and discriminative representation, in addition to image synthesis. Second, this representation is explicitly disentangled from other face variations such as pose, through the pose code provided to the decoder and pose estimation in the discriminator. Third, DR-GAN can take one or multiple images as input, and generate one unified representation along with an arbitrary number of synthetic images. Quantitative and qualitative evaluations on both controlled and in-the-wild databases demonstrate the superiority of DR-GAN over the state of the art.
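To make the encoder-decoder idea concrete, here is a minimal PyTorch sketch of a DR-GAN-style generator: an encoder maps the face to an identity representation, a one-hot pose code and a noise vector are concatenated to it, and a decoder synthesizes a face at the target pose. All layer sizes, the 13-pose code, and the 320-D feature are illustrative assumptions, not the paper's exact architecture, and the adversarial training loop is omitted.

```python
# Minimal sketch of a DR-GAN-style generator (illustrative sizes, not the paper's).
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, feat_dim=320, n_poses=13, noise_dim=50):
        super().__init__()
        # Encoder: face image -> identity representation f(x).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 96 -> 48
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 48 -> 24
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(), # 24 -> 12
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, feat_dim),
        )
        # Decoder: [identity feature | pose code | noise] -> synthetic face.
        self.decoder = nn.Sequential(
            nn.Linear(feat_dim + n_poses + noise_dim, 128 * 12 * 12), nn.ReLU(),
            nn.Unflatten(1, (128, 12, 12)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, x, pose_code, noise):
        identity = self.encoder(x)                    # pose-disentangled representation
        z = torch.cat([identity, pose_code, noise], dim=1)
        return identity, self.decoder(z)

g = Generator()
x = torch.randn(4, 3, 96, 96)                         # batch of face crops
pose = torch.eye(13)[torch.randint(0, 13, (4,))]      # one-hot target pose
ident, synth = g(x, pose, torch.randn(4, 50))
print(ident.shape, synth.shape)                       # (4, 320) and (4, 3, 96, 96)
```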

1,016 citations


Journal ArticleDOI
TL;DR: A simple solution for facial expression recognition that uses a combination of a Convolutional Neural Network and specific image pre-processing steps to extract only expression-specific features from a face image, and explores the presentation order of the samples during training.

639 citations


Proceedings ArticleDOI
21 Jul 2017
TL;DR: In this article, the authors explore three aspects of the problem in the context of finding small faces: the role of scale invariance, image resolution, and contextual reasoning. They train separate detectors for different scales.
Abstract: Though tremendous strides have been made in object recognition, one of the remaining open challenges is detecting small objects. We explore three aspects of the problem in the context of finding small faces: the role of scale invariance, image resolution, and contextual reasoning. While most recognition approaches aim to be scale-invariant, the cues for recognizing a 3px tall face are fundamentally different from those for recognizing a 300px tall face. We take a different approach and train separate detectors for different scales. To maintain efficiency, detectors are trained in a multi-task fashion: they make use of features extracted from multiple layers of a single (deep) feature hierarchy. While training detectors for large objects is straightforward, the crucial challenge remains training detectors for small objects. We show that context is crucial, and define templates that make use of massively-large receptive fields (where 99% of the template extends beyond the object of interest). Finally, we explore the role of scale in pre-trained deep networks, providing ways to extrapolate networks tuned for limited scales to rather extreme ranges. We demonstrate state-of-the-art results on massively-benchmarked face datasets (FDDB and WIDER FACE). In particular, when compared to prior art on WIDER FACE, our results reduce error by a factor of 2 (our models produce an AP of 82% while prior art ranges from 29-64%).
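A rough sketch of the shared-hierarchy idea: one backbone exposes features at two depths, the deeper (contextual) features are upsampled and concatenated with the shallower (high-resolution) ones, and a separate small head scores each face scale. The layer sizes and three-scale split are assumptions for illustration; the paper's templates and training procedure are far more elaborate.

```python
# Sketch of scale-specific face templates sharing one feature hierarchy
# (illustrative; not the paper's architecture or training code).
import torch
import torch.nn as nn

class SharedBackbone(nn.Module):
    """Returns features mixing two depths so each scale-specific head sees both."""
    def __init__(self):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                                    nn.MaxPool2d(2))           # stride 2
        self.stage2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                                    nn.MaxPool2d(2))           # stride 4

    def forward(self, x):
        f1 = self.stage1(x)
        f2 = self.stage2(f1)
        # Upsample deeper features and concatenate: shallow layers keep
        # resolution for tiny faces, deep layers add context.
        f2_up = nn.functional.interpolate(f2, size=f1.shape[-2:],
                                          mode="bilinear", align_corners=False)
        return torch.cat([f1, f2_up], dim=1)                   # (B, 96, H/2, W/2)

# One small detection head per face scale; each outputs a per-location score map.
backbone = SharedBackbone()
heads = nn.ModuleDict({
    "tiny":   nn.Conv2d(96, 1, 1),   # e.g. faces a few pixels tall
    "medium": nn.Conv2d(96, 1, 1),
    "large":  nn.Conv2d(96, 1, 1),
})

img = torch.randn(1, 3, 128, 128)
feats = backbone(img)
scores = {name: head(feats) for name, head in heads.items()}
print({k: tuple(v.shape) for k, v in scores.items()})
```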

579 citations


Proceedings ArticleDOI
21 Jul 2017
TL;DR: This paper uses a CNN to regress 3DMM shape and texture parameters directly from an input photo and achieves state-of-the-art results on the LFW, YTF, and IJB-A benchmarks.
Abstract: The 3D shapes of faces are well known to be discriminative. Yet despite this, they are rarely used for face recognition and always under controlled viewing conditions. We claim that this is a symptom of a serious but often overlooked problem with existing methods for single view 3D face reconstruction: when applied in the wild, their 3D estimates are either unstable and change for different photos of the same subject or they are over-regularized and generic. In response, we describe a robust method for regressing discriminative 3D morphable face models (3DMM). We use a convolutional neural network (CNN) to regress 3DMM shape and texture parameters directly from an input photo. We overcome the shortage of training data required for this purpose by offering a method for generating huge numbers of labeled examples. The 3D estimates produced by our CNN surpass state of the art accuracy on the MICC data set. Coupled with a 3D-3D face matching pipeline, we show the first competitive face recognition results on the LFW, YTF and IJB-A benchmarks using 3D face shapes as representations, rather than the opaque deep feature vectors used by other modern systems.
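The regression setup can be sketched in a few lines: a CNN trunk pools to a global feature and a linear head emits shape and texture coefficient vectors. The trunk and the 99+99 coefficient split are assumed for illustration; the paper's network and its generated training labels are not reproduced here.

```python
# Sketch: a CNN that regresses 3DMM shape and texture coefficients from a photo
# (layer sizes and coefficient counts are illustrative assumptions).
import torch
import torch.nn as nn

N_SHAPE, N_TEXTURE = 99, 99   # assumed number of 3DMM coefficients

class MorphableModelRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(128, N_SHAPE + N_TEXTURE)

    def forward(self, x):
        p = self.head(self.features(x))
        return p[:, :N_SHAPE], p[:, N_SHAPE:]   # shape coeffs, texture coeffs

model = MorphableModelRegressor()
shape, texture = model(torch.randn(2, 3, 224, 224))
# Training would minimize e.g. an L2 loss against coefficients fitted to the
# generated labeled examples; the 3D mesh is then mean + basis @ shape.
print(shape.shape, texture.shape)
```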

451 citations


Journal ArticleDOI
16 Mar 2017-Sensors
TL;DR: The experimental results show that the proposed person recognition method using the information extracted from body images is effective at enhancing recognition accuracy compared to systems that use only visible light or thermal images of the human body.
Abstract: The human body contains identity information that can be used for the person recognition (verification/recognition) problem. In this paper, we propose a person recognition method using the information extracted from body images. Our research is novel in the following three ways compared to previous studies. First, we use images of the human body for recognizing individuals. To overcome the limitations of previous studies on body-based person recognition that use only visible light images for recognition, we use human body images captured by two different kinds of camera: a visible light camera and a thermal camera. The use of two different kinds of body image helps us to reduce the effects of noise, background, and variation in the appearance of the human body. Second, we apply a state-of-the-art method, the convolutional neural network (CNN), for image feature extraction in order to overcome the limitations of traditional hand-designed image feature extraction methods. Finally, with the image features extracted from body images, the recognition task is performed by measuring the distance between the input and enrolled samples. The experimental results show that the proposed method is effective at enhancing recognition accuracy compared to systems that use only visible light or thermal images of the human body.
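A compact sketch of the two-camera pipeline described above, with stand-in CNNs rather than the paper's trained networks: each modality gets its own feature extractor, the features are concatenated into one body descriptor, and a probe is matched to the enrolled gallery by smallest Euclidean distance.

```python
# Sketch of the two-stream idea: separate CNN features for visible-light and
# thermal body images, then nearest-enrolled-sample matching by distance.
import torch
import torch.nn as nn

def make_stream():
    return nn.Sequential(
        nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),          # 32-D per stream
    )

visible_cnn, thermal_cnn = make_stream(), make_stream()

def embed(visible, thermal):
    # Concatenate per-modality features into one body descriptor.
    return torch.cat([visible_cnn(visible), thermal_cnn(thermal)], dim=1)

# Enrolled gallery: one descriptor per known person.
gallery = embed(torch.randn(5, 3, 128, 64), torch.randn(5, 3, 128, 64))
probe = embed(torch.randn(1, 3, 128, 64), torch.randn(1, 3, 128, 64))

dists = torch.cdist(probe, gallery)        # Euclidean distance to each enrollee
print("matched identity:", dists.argmin(dim=1).item())
```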

335 citations


Book ChapterDOI
TL;DR: A face detection approach named Contextual Multi-Scale Region-based Convolutional Neural Network (CMS-RCNN) is presented to robustly solve the problems mentioned above; it allows explicit body contextual reasoning in the network, inspired by the intuition of the human vision system.
Abstract: Robust face detection in the wild is one of the ultimate components supporting various facial-related problems, i.e., unconstrained face recognition, facial periocular recognition, facial landmarking and pose estimation, facial expression recognition, 3D facial model construction, etc. Although the face detection problem has been intensely studied for decades with various commercial applications, it still meets problems in some real-world scenarios due to numerous challenges, e.g., heavy facial occlusions, extremely low resolutions, strong illumination, exceptional pose variations, image or video compression artifacts, etc. In this paper, we present a face detection approach named Contextual Multi-Scale Region-based Convolutional Neural Network (CMS-RCNN) to robustly solve the problems mentioned above. Similar to region-based CNNs, our proposed network consists of a region proposal component and a region-of-interest (RoI) detection component. However, unlike those networks, our proposed network makes two main contributions that play a significant role in achieving state-of-the-art performance in face detection. First, multi-scale information is grouped in both region proposal and RoI detection to deal with tiny face regions. Second, our proposed network allows explicit body contextual reasoning, inspired by the intuition of the human vision system. The proposed approach is benchmarked on two recent challenging face detection databases, i.e., the WIDER FACE Dataset, which contains a high degree of variability, as well as the Face Detection Dataset and Benchmark (FDDB). The experimental results show that our proposed approach trained on the WIDER FACE Dataset outperforms strong baselines on the WIDER FACE Dataset by a large margin, and consistently achieves competitive results on FDDB against the recent state-of-the-art face detection methods.

256 citations


Proceedings ArticleDOI
01 Jul 2017
TL;DR: In this article, a 3D Convolutional Neural Network (CNN) is proposed for facial expression recognition in videos, consisting of 3D Inception-ResNet layers followed by an LSTM unit that together extract the spatial relations within facial images as well as the temporal relations between different frames in the video.
Abstract: Deep Neural Networks (DNNs) have been shown to outperform traditional methods in various visual recognition tasks, including Facial Expression Recognition (FER). In spite of efforts made to improve the accuracy of FER systems using DNNs, existing methods are still not generalizable enough in practical applications. This paper proposes a 3D Convolutional Neural Network method for FER in videos. The new network architecture consists of 3D Inception-ResNet layers followed by an LSTM unit that together extract the spatial relations within facial images as well as the temporal relations between different frames in the video. Facial landmark points are also used as inputs to our network, which emphasizes the importance of facial components over facial regions that may not contribute significantly to generating facial expressions. Our proposed method is evaluated using four publicly available databases in subject-independent and cross-database tasks and outperforms state-of-the-art methods.
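A toy version of the 3D-conv + LSTM pattern (standing in for the paper's 3D Inception-ResNet blocks; sizes are illustrative): 3D convolutions mix space and time within the clip, the spatial dimensions are pooled away while the time axis is kept, and an LSTM models the longer-range frame-to-frame relations before classification.

```python
# Sketch of the 3D-conv + LSTM pattern for video expression recognition.
import torch
import torch.nn as nn

class Video3DNet(nn.Module):
    def __init__(self, n_classes=7, hidden=128):
        super().__init__()
        # 3D convolutions mix space and time within short clips.
        self.conv3d = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),   # pool space, keep the time axis
        )
        # The LSTM then models longer-range temporal relations across frames.
        self.lstm = nn.LSTM(input_size=32, hidden_size=hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, n_classes)

    def forward(self, clip):                      # clip: (B, C, T, H, W)
        f = self.conv3d(clip)                     # (B, 32, T, 1, 1)
        f = f.squeeze(-1).squeeze(-1).transpose(1, 2)   # (B, T, 32)
        out, _ = self.lstm(f)
        return self.classifier(out[:, -1])        # predict from the last step

logits = Video3DNet()(torch.randn(2, 3, 16, 64, 64))
print(logits.shape)                               # torch.Size([2, 7])
```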

220 citations


Journal ArticleDOI
TL;DR: A novel method called the Facial Dynamics Map is proposed to characterize the movements of a microexpression at different granularities, and a classifier is developed to identify the presence of microexpressions and to categorize the different types.
Abstract: Unlike conventional facial expressions, microexpressions are instantaneous and involuntary reflections of human emotion. Because microexpressions are fleeting, lasting only a few frames within a video sequence, they are difficult to perceive and interpret correctly, and they are highly challenging to identify and categorize automatically. Existing recognition methods are often ineffective at handling subtle face displacements, which can be prevalent in typical microexpression applications due to the constant movements of the individuals being observed. To address this problem, a novel method called the Facial Dynamics Map is proposed to characterize the movements of a microexpression at different granularities. Specifically, an algorithm based on optical flow estimation is used to perform pixel-level alignment for microexpression sequences. Each expression sequence is then divided into spatiotemporal cuboids at the chosen granularity. We also present an iterative optimal strategy to calculate the principal optical flow direction of each cuboid for better representation of the local facial dynamics. With these principal directions, the resulting Facial Dynamics Map can characterize a microexpression sequence. Finally, a classifier is developed to identify the presence of microexpressions and to categorize the different types. Experimental results on four benchmark datasets demonstrate higher recognition performance and improved interpretability.
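A simplified sketch of the Facial Dynamics Map computation, assuming OpenCV's Farneback optical flow and using a plain per-cuboid mean as a stand-in for the paper's iterative principal-direction strategy:

```python
# Sketch: estimate optical flow between frames, split the sequence into
# spatiotemporal cuboids, and keep one dominant flow direction per cuboid.
import numpy as np
import cv2

def facial_dynamics_map(frames, grid=4):
    """frames: list of equally sized grayscale uint8 face images."""
    h, w = frames[0].shape
    ch, cw = h // grid, w // grid
    flows = [cv2.calcOpticalFlowFarneback(a, b, None, 0.5, 3, 15, 3, 5, 1.2, 0)
             for a, b in zip(frames[:-1], frames[1:])]      # (h, w, 2) each
    stacked = np.stack(flows)                               # (T-1, h, w, 2)
    fdm = np.zeros((grid, grid, 2))
    for i in range(grid):
        for j in range(grid):
            cuboid = stacked[:, i*ch:(i+1)*ch, j*cw:(j+1)*cw, :]
            fdm[i, j] = cuboid.reshape(-1, 2).mean(axis=0)  # direction proxy
    return fdm

frames = [np.random.randint(0, 255, (64, 64), np.uint8) for _ in range(5)]
print(facial_dynamics_map(frames).shape)                    # (4, 4, 2)
```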

217 citations


Journal ArticleDOI
TL;DR: It is suggested that the performance of pain assessment can be enhanced by feeding the raw frames to deep learning models, outperforming the latest state-of-the-art results while also directly facing the problem of imbalanced data.
Abstract: Pain is an unpleasant feeling that has been shown to be an important factor in the recovery of patients. Since assessing pain is costly in human resources and difficult to do objectively, there is a need for automatic systems to measure it. In this paper, contrary to current state-of-the-art techniques in pain assessment, which are based on facial features only, we suggest that performance can be enhanced by feeding the raw frames to deep learning models, outperforming the latest state-of-the-art results while also directly facing the problem of imbalanced data. As a baseline, our approach first uses convolutional neural networks (CNNs) to learn facial features from VGG_Faces, which are then linked to a long short-term memory to exploit the temporal relation between video frames. We further compare the performance of the popular schema based on the canonically normalized appearance versus taking the whole image into account. As a result, we outperform the current state-of-the-art area under the curve performance on the UNBC-McMaster Shoulder Pain Expression Archive Database. In addition, to evaluate the generalization properties of our proposed methodology on facial motion recognition, we also report competitive results on the Cohn-Kanade+ facial expression database.

216 citations


Proceedings ArticleDOI
01 Jul 2017
TL;DR: The inherent correlation between face detection and facial expression recognition is exploited, and the results of facial expression recognition based on MTCNN are reported.
Abstract: The Multi-task Cascaded Convolutional Networks (MTCNN) framework has recently demonstrated impressive results on joint face detection and alignment. By using hard sample mining and training a model on the FER2013 dataset, we exploit the inherent correlation between face detection and facial expression recognition, and report the results of facial expression recognition based on MTCNN.
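A sketch of the detect-then-classify pipeline, assuming the third-party facenet-pytorch package for MTCNN; the expression classifier is an untrained stand-in shaped for FER2013-style 48x48 grayscale crops:

```python
# Sketch: MTCNN crops and aligns the face, a small CNN predicts the expression.
import torch
import torch.nn as nn
import torch.nn.functional as F
from PIL import Image
from facenet_pytorch import MTCNN   # assumed third-party dependency

EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

detector = MTCNN(image_size=48, margin=0)       # detection + aligned 48x48 crop

classifier = nn.Sequential(                      # FER2013-style grayscale input
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(64 * 12 * 12, len(EMOTIONS)),
)

def predict_expression(path):
    face = detector(Image.open(path).convert("RGB"))   # (3, 48, 48) or None
    if face is None:
        return None
    gray = face.mean(dim=0, keepdim=True).unsqueeze(0) # (1, 1, 48, 48)
    probs = F.softmax(classifier(gray), dim=1)
    return EMOTIONS[probs.argmax().item()]

# print(predict_expression("face.jpg"))
```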

170 citations


Journal ArticleDOI
06 Jun 2017-Sensors
TL;DR: A finger-vein recognition method that is robust to various database types and environmental changes, based on the convolutional neural network (CNN), is proposed and shows better performance than conventional methods.
Abstract: Conventional finger-vein recognition systems perform recognition based on finger-vein lines extracted from the input images, or on image enhancement and texture feature extraction from the finger-vein images. In these cases, however, inaccurate detection of finger-vein lines lowers the recognition accuracy, and in the case of texture feature extraction, the developer must experimentally decide on the form of the optimal filter for extraction considering the characteristics of the image database. To address this problem, this research proposes a finger-vein recognition method that is robust to various database types and environmental changes, based on the convolutional neural network (CNN). In experiments using the two finger-vein databases constructed in this research and the SDUMLA-HMT finger-vein database, which is an open database, the proposed method showed better performance than conventional methods.

Journal ArticleDOI
TL;DR: This survey presents the state of the art in 3D face recognition using local features, with the main focus on the extraction of these features.

Journal ArticleDOI
TL;DR: This paper proposes a novel scene text recognition technique that performs word level recognition without character segmentation and adapts the recurrent neural network with Long Short Term Memory, the technique that has been widely used for handwriting recognition in recent years.

Proceedings ArticleDOI
04 Apr 2017
TL;DR: The vulnerability of biometric systems to morphed face attacks is investigated by evaluating the techniques proposed to detect morphed face images, and two new databases are created to study the vulnerability of state-of-the-art face recognition systems with a comprehensive evaluation.
Abstract: Morphed face images are artificially generated images, which blend the facial images of two or more different data subjects into one. The resulting morphed image resembles the constituent faces, both in visual and feature representation. If a morphed image is enrolled as a probe in a biometric system, the data subjects contributing to the morphed image will be verified against the enrolled probe. As a result of this infiltration, which is referred to as a morphed face attack, the unambiguous assignment of data subjects is not warranted, i.e., the unique link between subject and probe is annulled. In this work, we investigate the vulnerability of biometric systems to such morphed face attacks by evaluating the techniques proposed to detect morphed face images. We create two new databases by printing and scanning digitally morphed images using two different types of scanners, a flatbed scanner and a line scanner. Further, the newly created databases are employed to study the vulnerability of state-of-the-art face recognition systems with a comprehensive evaluation.
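The simplest way to see why a morph resembles both contributors is a pixel-wise alpha blend of two aligned faces; real morphing tools additionally warp facial landmarks before blending, and the paper then prints and scans the result. A minimal sketch:

```python
# Sketch of the simplest morph: an alpha blend of two aligned face images.
import numpy as np
from PIL import Image

def alpha_morph(path_a, path_b, alpha=0.5, size=(256, 256)):
    a = np.asarray(Image.open(path_a).convert("RGB").resize(size), dtype=np.float32)
    b = np.asarray(Image.open(path_b).convert("RGB").resize(size), dtype=np.float32)
    morph = alpha * a + (1.0 - alpha) * b   # blends appearance of both subjects
    return Image.fromarray(morph.astype(np.uint8))

# alpha_morph("subject1.png", "subject2.png").save("morph.png")
# The paper's print-and-scan step would then re-digitize morph.png with a
# flatbed or line scanner before attacking the recognition system.
```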

Journal ArticleDOI
TL;DR: To better exploit the nonlinearity of face samples from different image sets, a deep SFDL (D-SFDL) method is proposed by jointly learning hierarchical non-linear transformations and class-specific dictionaries to further improve the recognition performance.
Abstract: In this paper, we propose a simultaneous feature and dictionary learning (SFDL) method for image set-based face recognition, where each training and testing example contains a set of face images captured from different variations of pose, illumination, expression, resolution, and motion. While a variety of feature learning and dictionary learning methods have been proposed in recent years, and some of them have been successfully applied to image set-based face recognition, most of them learn features and dictionaries for facial image sets individually, which may not be powerful enough because some discriminative information for dictionary learning may be compromised in the feature learning stage if they are applied sequentially, and vice versa. To address this, we propose an SFDL method to learn discriminative features and dictionaries simultaneously from raw face pixels so that discriminative information from facial image sets can be jointly exploited by a one-stage learning procedure. To better exploit the nonlinearity of face samples from different image sets, we propose a deep SFDL (D-SFDL) method that jointly learns hierarchical non-linear transformations and class-specific dictionaries to further improve the recognition performance. Extensive experimental results on five widely used face data sets clearly show that our SFDL and D-SFDL achieve very competitive or even better performance compared with the state of the art.

Journal ArticleDOI
TL;DR: A highly efficient pose-invariant face recognition (PIFR) algorithm that effectively handles the main challenges caused by pose variation is proposed, together with an effective approach for occlusion detection that enables face recognition with only the visible patches.

Proceedings ArticleDOI
01 Jul 2017
TL;DR: This paper proposes an approach to extend the deep learning breakthrough for VIS face recognition to the NIR spectrum, without retraining the underlying deep models that see only VIS faces, and obtains state-of-the-art accuracy on the CASIA NIR-VIS v2.0 benchmark.
Abstract: Surveillance cameras today often capture NIR (near infrared) images in low-light environments. However, most face datasets accessible for training and verification are only collected in the VIS (visible light) spectrum. It remains a challenging problem to match NIR to VIS face images due to the different light spectra. Recently, breakthroughs have been made for VIS face recognition by applying deep learning to a huge amount of labeled VIS face samples. The same deep learning approach cannot be simply applied to NIR face recognition for two main reasons: first, far fewer NIR face images are available for training compared to the VIS spectrum; second, face galleries to be matched are mostly available only in the VIS spectrum. In this paper, we propose an approach to extend the deep learning breakthrough for VIS face recognition to the NIR spectrum, without retraining the underlying deep models that see only VIS faces. Our approach consists of two core components, cross-spectral hallucination and low-rank embedding, to optimize respectively the input and output of a VIS deep model for cross-spectral face recognition. Cross-spectral hallucination produces VIS faces from NIR images through a deep learning approach. Low-rank embedding restores a low-rank structure for the deep features of faces across both the NIR and VIS spectra. We observe that it is often equally effective to perform hallucination on input NIR images or low-rank embedding on output deep features for a VIS deep model for cross-spectral recognition. When hallucination and low-rank embedding are deployed together, we observe a significant further improvement and obtain state-of-the-art accuracy on the CASIA NIR-VIS v2.0 benchmark, without any need to re-train the recognition system.
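One simple reading of the low-rank embedding component, sketched with synthetic features: project deep features of both spectra onto the top singular directions fitted on paired training features, then match across spectra by cosine similarity. This is an illustration of the idea, not the paper's solver.

```python
# Sketch of a low-rank embedding for cross-spectral features (illustrative).
import numpy as np

rng = np.random.default_rng(0)
d, n, rank = 256, 100, 32

# Paired deep features for the same subjects (stand-ins for a VIS CNN's output).
vis_feats = rng.normal(size=(n, d))
nir_feats = vis_feats + 0.3 * rng.normal(size=(n, d))   # spectrum-shifted copies

# Fit the projection on the stacked pairs and keep the top-`rank` directions.
_, _, vt = np.linalg.svd(np.vstack([vis_feats, nir_feats]), full_matrices=False)
project = lambda f: f @ vt[:rank].T

# After projection, matching is plain cosine similarity across spectra.
v, m = project(vis_feats), project(nir_feats)
cos = (v @ m.T) / (np.linalg.norm(v, axis=1, keepdims=True)
                   * np.linalg.norm(m, axis=1))
print("rank-1 cross-spectral accuracy:",
      (cos.argmax(axis=1) == np.arange(n)).mean())
```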

Journal ArticleDOI
TL;DR: Five data augmentation methods dedicated to face images are proposed, including landmark perturbation and four synthesis methods (hairstyles, glasses, poses, illuminations); these effectively enlarge the training dataset, alleviating the impacts of misalignment, pose variance, illumination changes, and partial occlusions.
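Landmark perturbation, the first of the five methods, can be sketched as jittering the alignment landmarks so each epoch sees slightly differently aligned crops. The two-eye similarity-transform aligner and the jitter magnitude below are assumptions for illustration:

```python
# Sketch of landmark-perturbation augmentation: jitter the landmarks used for
# alignment, producing a slightly different crop each time.
import numpy as np
import cv2

def perturbed_align(img, left_eye, right_eye, out_size=112, sigma=2.0):
    rng = np.random.default_rng()
    src = (np.float32([left_eye, right_eye])
           + rng.normal(0, sigma, (2, 2)).astype(np.float32))  # jittered landmarks
    dst = np.float32([[0.35 * out_size, 0.4 * out_size],
                      [0.65 * out_size, 0.4 * out_size]])      # canonical eye spots
    m, _ = cv2.estimateAffinePartial2D(src, dst)               # similarity transform
    return cv2.warpAffine(img, m, (out_size, out_size))

face = np.random.randint(0, 255, (200, 200, 3), np.uint8)
crops = [perturbed_align(face, (70, 90), (130, 90)) for _ in range(4)]
print(crops[0].shape)   # (112, 112, 3), four slightly different alignments
```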

Journal ArticleDOI
TL;DR: This paper presents a new face descriptor, local directional ternary pattern (LDTP), for facial expression recognition that uses a two-level grid to construct the face descriptor while sampling expression-related information at different scales, and shows that the approaches improve the overall accuracy of facial expression recognition on six data sets.
Abstract: This paper presents a new face descriptor, local directional ternary pattern (LDTP), for facial expression recognition. LDTP efficiently encodes information of emotion-related features (i.e., eyes, eyebrows, upper nose, and mouth) by using directional information and a ternary pattern in order to take advantage of the robustness of edge patterns in edge regions while overcoming the weaknesses of edge-based methods in smooth regions. Our proposal, unlike existing histogram-based face description methods that divide the face into several regions and sample the codes uniformly, uses a two-level grid to construct the face descriptor while sampling expression-related information at different scales. We use a coarse grid for stable codes (highly related to non-expression) and a finer one for active codes (highly related to expression). This multi-level approach enables a finer-grained description of facial motions while still characterizing the coarse features of the expression. Moreover, we learn the active LDTP codes from the emotion-related facial regions. We tested our method by using person-dependent and independent cross-validation schemes to evaluate the performance. We show that our approaches improve the overall accuracy of facial expression recognition on six data sets.
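A rough sketch of a directional ternary code in the LDTP spirit, using Kirsch compass masks as stand-ins for the paper's directional operators: per pixel, keep the strongest of eight directional responses and ternary-quantize it around a threshold. The paper's exact encoding and two-level grid sampling are more involved.

```python
# Sketch of a directional ternary code (Kirsch masks assumed for illustration).
import numpy as np
from scipy.ndimage import convolve

def kirsch_masks():
    base = np.array([[-3, -3, 5], [-3, 0, 5], [-3, -3, 5]], float)
    ring = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    vals = [base[r, c] for r, c in ring]
    masks = []
    for k in range(8):                       # rotate the border 45 degrees at a time
        m = np.zeros((3, 3))
        for (r, c), v in zip(ring, np.roll(vals, k)):
            m[r, c] = v
        masks.append(m)
    return masks

def ldtp_like(img, tau=15.0):
    responses = np.stack([convolve(img, m) for m in kirsch_masks()])  # (8, H, W)
    direction = np.abs(responses).argmax(axis=0)                      # 0..7
    peak = np.take_along_axis(responses, direction[None], axis=0)[0]
    ternary = np.where(peak > tau, 1, np.where(peak < -tau, -1, 0))
    return direction, ternary   # histogram these over grid cells for the descriptor

img = np.random.rand(64, 64) * 255
d, t = ldtp_like(img)
print(d.shape, np.unique(t))
```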

Proceedings ArticleDOI
01 Sep 2017
TL;DR: A new deep learning based face recognition attendance system that is composed of several essential steps developed using today's most advanced techniques: CNN cascade for face detection and CNN for generating face embeddings.
Abstract: In the interest of recent accomplishments in the development of deep convolutional neural networks (CNNs) for face detection and recognition tasks, a new deep learning based face recognition attendance system is proposed in this paper. The entire process of developing a face recognition model is described in detail. The model is composed of several essential steps developed using today's most advanced techniques: a CNN cascade for face detection and a CNN for generating face embeddings. The primary goal of this research was the practical employment of these state-of-the-art deep learning approaches for face recognition tasks. Because CNNs achieve the best results for larger datasets, which is not the case in a production environment, the main challenge was applying these methods to smaller datasets. A new approach for image augmentation for face recognition tasks is proposed. The overall accuracy was 95.02% on a small dataset of original face images of employees in a real-time environment. The proposed face recognition model could be integrated into another system, with or without some minor alterations, as a supporting or main component for monitoring purposes.

Proceedings ArticleDOI
Ke Shan, Junqi Guo, Wenwan You, Di Lu, Rongfang Bie
07 Jun 2017
TL;DR: A deep convolutional neural network is employed to devise a facial expression recognition system capable of discovering deeper feature representations of facial expressions to achieve automatic recognition.
Abstract: Facial expression recognition, to which many researchers have devoted much effort, is an important part of affective computing and artificial intelligence. However, human facial expressions change so subtly that the recognition accuracy of most traditional approaches largely depends on feature extraction. Meanwhile, deep learning is a hot research topic in the field of machine learning, aiming to simulate the organizational structure of the human brain's nervous system and combine low-level features into more abstract ones. In this paper, we employ a deep convolutional neural network (CNN) to devise a facial expression recognition system capable of discovering deeper feature representations of facial expressions and thereby achieving automatic recognition. The proposed system is composed of the Input Module, the Pre-processing Module, the Recognition Module, and the Output Module. We introduce both the Japanese Female Facial Expression Database (JAFFE) and the Extended Cohn-Kanade Dataset (CK+) to simulate and evaluate the recognition performance under the influence of different factors (network structure, learning rate, and pre-processing). We also introduce a K-nearest neighbor (KNN) algorithm as a comparison with the CNN to make the results more convincing. The accuracy of the proposed system reaches 76.7442% and 80.303% on JAFFE and CK+, respectively, which demonstrates the feasibility and effectiveness of our system.

Journal ArticleDOI
TL;DR: Experimental validation on the standard Adience, Images of Groups, and MORPH II benchmarks show that including attention mechanisms enhances the performance of CNNs in terms of robustness and accuracy.

Proceedings ArticleDOI
01 May 2017
TL;DR: This work outlines the evaluation protocol, the data used, and the results of a baseline method for both sub-challenges of FERA 2017, the third challenge in automatic recognition of facial expressions, held in conjunction with the 12th IEEE Conference on Face and Gesture Recognition, May 2017.
Abstract: The field of Automatic Facial Expression Analysis has grown rapidly in recent years. However, despite progress in new approaches as well as benchmarking efforts, most evaluations still focus on either posed expressions, near-frontal recordings, or both. This makes it hard to tell how existing expression recognition approaches perform under conditions where faces appear in a wide range of poses (or camera views), displaying ecologically valid expressions. The main obstacle for assessing this is the availability of suitable data, and the challenge proposed here addresses this limitation. The FG 2017 Facial Expression Recognition and Analysis challenge (FERA 2017) extends FERA 2015 to the estimation of Action Units occurrence and intensity under different camera views. In this paper we present the third challenge in automatic recognition of facial expressions, to be held in conjunction with the 12th IEEE conference on Face and Gesture Recognition, May 2017, in Washington, United States. Two sub-challenges are defined: the detection of AU occurrence, and the estimation of AU intensity. In this work we outline the evaluation protocol, the data used, and the results of a baseline method for both sub-challenges.

Journal ArticleDOI
TL;DR: The results show that the optical flow information between the emotional face and the neutral face is a useful complement to spatial features and can effectively improve the performance of facial expression recognition from static images.

Journal ArticleDOI
TL;DR: This paper proposes a new feature descriptor called common encoding model for heterogeneous face recognition, which is able to capture common discriminant information, such that the large modality gap can be significantly reduced at the feature extraction stage.
Abstract: Heterogeneous face recognition is an important, yet challenging problem in face recognition community. It refers to matching a probe face image to a gallery of face images taken from alternate imaging modality. The major challenge of heterogeneous face recognition lies in the great discrepancies between different image modalities. Conventional face feature descriptors, e.g., local binary patterns, histogram of oriented gradients, and scale-invariant feature transform, are mostly designed in a handcrafted way and thus generally fail to extract the common discriminant information from the heterogeneous face images. In this paper, we propose a new feature descriptor called common encoding model for heterogeneous face recognition, which is able to capture common discriminant information, such that the large modality gap can be significantly reduced at the feature extraction stage. Specifically, we turn a face image into an encoded one with the encoding model learned from the training data, where the difference of the encoded heterogeneous face images of the same person can be minimized. Based on the encoded face images, we further develop a discriminant matching method to infer the hidden identity information of the cross-modality face images for enhanced recognition performance. The effectiveness of the proposed approach is demonstrated (on several public-domain face datasets) in two typical heterogeneous face recognition scenarios: matching NIR faces to VIS faces and matching sketches to photographs.

Journal ArticleDOI
TL;DR: This article proposes a new micro- expression recognition approach based on the Eulerian motion magnification technique, which could reveal the hidden information and accentuate the subtle changes in micro-expression motion.
Abstract: Facial expression recognition has been intensively studied for decades, notably by the psychology community and more recently the pattern recognition community. What is more challenging, and the subject of more recent research, is the problem of recognizing subtle emotions exhibited by so-called micro-expressions. Recognizing a micro-expression is substantially more challenging than conventional expression recognition because these micro-expressions are only temporally exhibited in a fraction of a second and involve minute spatial changes. Until now, work in this field has been at a nascent stage, with only a few existing micro-expression databases and methods. In this article, we propose a new micro-expression recognition approach based on the Eulerian motion magnification technique, which can reveal hidden information and accentuate the subtle changes in micro-expression motion. Validation of our proposal was done on the recently proposed CASME II dataset in comparison with baseline and state-of-the-art methods. We achieve a good recognition accuracy of up to 75.30% using the leave-one-out cross-validation protocol. Extensive experiments on the various factors at play further demonstrate the effectiveness of our proposed approach.
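The temporal core of Eulerian magnification can be sketched in a few lines: band-pass filter each pixel's intensity over time and add the amplified band back. The full method operates on a spatial pyramid; the band limits and amplification factor below are illustrative:

```python
# Sketch of Eulerian magnification on a pixel time series.
import numpy as np
from scipy.signal import butter, filtfilt

def magnify(frames, fps=30.0, lo=0.4, hi=4.0, amplification=10.0):
    """frames: (T, H, W) float array of a cropped face sequence."""
    b, a = butter(2, [lo / (fps / 2), hi / (fps / 2)], btype="band")
    filtered = filtfilt(b, a, frames, axis=0)     # per-pixel temporal band-pass
    return np.clip(frames + amplification * filtered, 0, 255)

video = np.random.rand(60, 48, 48) * 255          # stand-in for a CASME II clip
print(magnify(video).shape)                        # (60, 48, 48)
```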

Journal ArticleDOI
TL;DR: The system uses a Microsoft Kinect sensor as a wearable device, performs face detection, and uses temporal coherence along with a simple biometric procedure to generate a sound associated with the identified person, virtualized at his/her estimated 3-D location.
Abstract: In this paper, we introduce a real-time face recognition (and announcement) system targeted at aiding blind and low-vision people. The system uses a Microsoft Kinect sensor as a wearable device, performs face detection, and uses temporal coherence along with a simple biometric procedure to generate a sound associated with the identified person, virtualized at his/her estimated 3-D location. Our approach uses a variation of the K-nearest neighbors algorithm over histogram of oriented gradient descriptors dimensionally reduced by principal component analysis. The results show that our approach, on average, outperforms traditional face recognition methods while requiring far fewer computational resources (memory, processing power, and battery life) than existing techniques in the literature, making it suitable for the wearable hardware constraints. We also show the performance of the system in the dark, using depth-only information acquired with Kinect's infrared camera. The validation uses a new dataset available for download, with 600 videos of 30 people, containing variation of illumination, background, and movement patterns. Experiments with existing datasets in the literature are also considered. Finally, we conducted user experience evaluations on both blindfolded and visually impaired users, showing encouraging results.
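The recognition core described above (HOG descriptors, PCA reduction, KNN classification) maps directly onto standard scikit-image and scikit-learn calls; the sketch below uses toy data and illustrative sizes:

```python
# Sketch of HOG + PCA + KNN recognition on toy data.
import numpy as np
from skimage.feature import hog
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

def describe(face_gray):
    return hog(face_gray, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))

rng = np.random.default_rng(0)
faces = rng.random((30, 64, 64))                 # stand-in face crops
labels = np.repeat(np.arange(10), 3)             # 10 people, 3 crops each

model = make_pipeline(PCA(n_components=20), KNeighborsClassifier(n_neighbors=3))
model.fit(np.stack([describe(f) for f in faces]), labels)
print(model.predict(describe(faces[0])[None]))   # predicted identity, first crop
```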

Journal ArticleDOI
TL;DR: Experimental results on the JAFFE and Cohn-Kanade data sets show that the proposed TFP method outperforms some state-of-the-art LBP-based feature extraction methods for facial expression feature extraction and is suitable for real-time applications.
Abstract: The aim of an automatic video-based facial expression recognition system is to detect and classify human facial expressions from an image sequence. An integrated automatic system often involves two components: 1) peak expression frame detection and 2) expression feature extraction. In comparison with image-based expression recognition systems, a video-based recognition system often performs online detection, which prefers low-dimensional feature representation for cost-effectiveness. Moreover, effective feature extraction is needed for classification. Many recent recognition systems incorporate rich additional subjective information and thus become less efficient for real-time applications. In our facial expression recognition system, we first propose the double local binary pattern (DLBP) to detect the peak expression frame from the video. The proposed DLBP method has a much lower-dimensional size and can successfully reduce detection time. Besides, to handle the illumination variations in LBP, the Logarithm-Laplace (LL) domain is further proposed to obtain a more robust facial feature for detection. Finally, the Taylor expansion theorem is employed in our system for the first time to extract facial expression features. We propose the Taylor feature pattern (TFP), based on LBP and Taylor expansion, to obtain an effective facial feature from the Taylor feature map. Experimental results on the JAFFE and Cohn-Kanade data sets show that the proposed TFP method outperforms some state-of-the-art LBP-based feature extraction methods for facial expression feature extraction and is suitable for real-time applications.
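A bare-bones version of LBP-based peak-frame detection: take the frame whose LBP histogram differs most from the first (assumed neutral) frame as the apex. The paper's DLBP and Taylor feature pattern refine this basic idea considerably:

```python
# Sketch of LBP-based peak (apex) frame detection.
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_hist(img, p=8, r=1):
    codes = local_binary_pattern(img, p, r, method="uniform")
    hist, _ = np.histogram(codes, bins=p + 2, range=(0, p + 2), density=True)
    return hist

def peak_frame(frames):
    neutral = lbp_hist(frames[0])                 # first frame assumed neutral
    dists = [np.abs(lbp_hist(f) - neutral).sum() for f in frames]
    return int(np.argmax(dists))

clip = [np.random.rand(64, 64) for _ in range(10)]
print("apex frame index:", peak_frame(clip))
```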

Proceedings ArticleDOI
01 Jan 2017
TL;DR: Various face detection algorithms are discussed and analyzed, including Viola-Jones, SMQT features & SNOW classifier, neural network-based face detection, and support vector machine-based face detection; all these methods are compared based on precision and recall values calculated using the DetEval software, which deals with precise values of the bounding boxes around the faces to give accurate results.
Abstract: With the tremendous increase in video and image databases, there is a great need for automatic understanding and examination of data by intelligent systems, as manual analysis is becoming out of reach. Narrowing it down to one specific domain, one of the most specific objects that can be traced in images is people, i.e., faces. Face detection is becoming a challenge due to its increasing use in a number of applications. It is the first step for face recognition, face analysis, and the detection of other facial features. In this paper, various face detection algorithms are discussed and analyzed, including Viola-Jones, SMQT features & SNOW classifier, neural network-based face detection, and support vector machine-based face detection. All these face detection methods are compared based on precision and recall values calculated using the DetEval software, which deals with precise values of the bounding boxes around the faces to give accurate results.
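The comparison rests on bounding-box precision and recall; here is a minimal sketch of that scoring, matching detections to ground truth greedily by IoU (DetEval's actual protocol handles split and merged boxes more carefully):

```python
# Sketch: greedy IoU matching, then precision and recall.
def iou(a, b):
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter) if inter else 0.0

def precision_recall(detections, truths, thresh=0.5):
    unmatched, tp = list(truths), 0
    for det in detections:
        best = max(unmatched, key=lambda t: iou(det, t), default=None)
        if best is not None and iou(det, best) >= thresh:
            unmatched.remove(best)    # each ground-truth box matches at most once
            tp += 1
    return tp / len(detections), tp / len(truths)   # precision, recall

dets = [(10, 10, 50, 50), (100, 100, 140, 140)]      # (x1, y1, x2, y2)
gts = [(12, 12, 52, 52), (200, 200, 240, 240)]
print(precision_recall(dets, gts))                   # (0.5, 0.5)
```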

Proceedings ArticleDOI
01 Oct 2017
TL;DR: The results show a significant improvement in the use of pre-trained models over randomly initialized Convolutional Neural Networks on the facial expression recognition problem, for example achieving 88.58%, 67.03%, 85.97%, and 72.55% average accuracy testing on CK+, MMI, RaFD, and KDEF, respectively.
Abstract: Facial expression recognition is a very important research field for understanding human emotions. Many facial expression recognition systems have been proposed in the literature over the years. Some of these methods use neural network approaches with deep architectures to address the problem. Although it may seem that the facial expression recognition problem has been solved, there is a large difference between the results achieved using the same database to train and test the network and those of the cross-database protocol. In this paper, we extensively investigate the performance influence of fine-tuning with a cross-database approach. To perform the study, the VGG-Face Deep Convolutional Network model (pre-trained for face recognition) was fine-tuned to recognize facial expressions considering different well-established databases in the literature: CK+, JAFFE, MMI, RaFD, KDEF, BU3DFE, and AR Face. The cross-database experiments were organized so that one of the databases was separated as the test set and the others as training, and each experiment was run multiple times to ensure the consistency of the results. Our results show a significant improvement in the use of pre-trained models over randomly initialized Convolutional Neural Networks on the facial expression recognition problem, for example achieving 88.58%, 67.03%, 85.97%, and 72.55% average accuracy testing on CK+, MMI, RaFD, and KDEF, respectively. Additionally, in absolute terms, the results show an improvement in the literature for cross-database facial expression recognition with the use of pre-trained models.
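The fine-tuning setup can be sketched as: load a pretrained VGG, swap the last classifier layer for a 7-way expression head, and train with a small learning rate. torchvision's ImageNet VGG-16 stands in for VGG-Face here, and freezing the convolutional features is an illustrative choice, not the paper's protocol:

```python
# Sketch of fine-tuning a pretrained VGG for 7-class expression recognition.
import torch
import torch.nn as nn
from torchvision import models

model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

# Optionally freeze the convolutional features and fine-tune the classifier.
for p in model.features.parameters():
    p.requires_grad = False

model.classifier[6] = nn.Linear(4096, 7)     # 7 basic expressions

optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch:
x, y = torch.randn(4, 3, 224, 224), torch.randint(0, 7, (4,))
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
print(float(loss))
```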