Journal ArticleDOI

Facial expression recognition from near-infrared videos

01 Aug 2011-Image and Vision Computing (Butterworth-Heinemann)-Vol. 29, Iss: 9, pp 607-619
TL;DR: A novel approach to dynamic facial expression recognition using near-infrared (NIR) video sequences and LBP-TOP feature descriptors is presented; component-based facial features combine geometric and appearance information, providing an effective way to represent facial expressions.
About: This article was published in Image and Vision Computing on 2011-08-01 and has received 586 citations to date. It focuses on the topics: Three-dimensional face recognition & Face hallucination.
Citations
Journal ArticleDOI
TL;DR: This article provides a comprehensive survey of deep facial expression recognition (FER), covering datasets and algorithms that give insight into the intrinsic problems of deep FER: overfitting caused by a lack of sufficient training data, and expression-unrelated variations such as illumination, head pose, and identity bias.
Abstract: With the transition of facial expression recognition (FER) from laboratory-controlled to challenging in-the-wild conditions and the recent success of deep learning techniques in various fields, deep neural networks have increasingly been leveraged to learn discriminative representations for automatic FER. Recent deep FER systems generally focus on two important issues: overfitting caused by a lack of sufficient training data and expression-unrelated variations, such as illumination, head pose and identity bias. In this paper, we provide a comprehensive survey on deep FER, including datasets and algorithms that provide insights into these intrinsic problems. First, we describe the standard pipeline of a deep FER system with the related background knowledge and suggestions of applicable implementations for each stage. We then introduce the available datasets that are widely used in the literature and provide accepted data selection and evaluation principles for these datasets. For the state of the art in deep FER, we review existing novel deep neural networks and related training strategies that are designed for FER based on both static images and dynamic image sequences, and discuss their advantages and limitations. Competitive performances on widely used benchmarks are also summarized in this section. We then extend our survey to additional related issues and application scenarios. Finally, we review the remaining challenges and corresponding opportunities in this field as well as future directions for the design of robust deep FER systems.

712 citations

Proceedings ArticleDOI
Heechul Jung, Sihaeng Lee, Junho Yim, Sunjeong Park, Junmo Kim
07 Dec 2015
TL;DR: A deep learning technique, regarded as a tool for automatically extracting useful features from raw data, is adopted; two deep networks are combined using a new integration method to boost facial expression recognition performance.
Abstract: Temporal information has useful features for recognizing facial expressions. However, to manually design useful features requires a lot of effort. In this paper, to reduce this effort, a deep learning technique, which is regarded as a tool to automatically extract useful features from raw data, is adopted. Our deep network is based on two different models. The first deep network extracts temporal appearance features from image sequences, while the other deep network extracts temporal geometry features from temporal facial landmark points. These two models are combined using a new integration method in order to boost the performance of the facial expression recognition. Through several experiments, we show that the two models cooperate with each other. As a result, we achieve superior performance to other state-of-the-art methods in the CK+ and Oulu-CASIA databases. Furthermore, we show that our new integration method gives more accurate results than traditional methods, such as a weighted summation and a feature concatenation method.

668 citations


Cites background or methods from "Facial expression recognition from ..."

  • ...Similar to the Oulu-CASIA database, there are six kinds of emotion labels....

  • ...We achieved best recognition rates using the integrated deep network on the CK+ and Oulu-CASIA databases....

  • ...However, facial expression databases, such as CK+, Oulu-CASIA, and MMI, provide only hundreds of sequences....

  • ...For further experiments, we used Oulu-CASIA, which includes 480 image sequences taken under normal illumination conditions....

  • ...Finally, the outputs of these networks are integrated using a proposed joint fine-tuning method, which is represented in the purple box....

  • ...expression recognition databases, such as CK+ [13], MMI [18], and Oulu-CASIA [23]....

Journal ArticleDOI
TL;DR: Visualization results demonstrate that, compared with the CNN without Gate Unit, ACNNs are capable of shifting the attention from the occluded patches to other related but unobstructed ones, and ACNNs outperform other state-of-the-art methods on several widely used in-the-lab facial expression datasets under the cross-dataset evaluation protocol.
Abstract: Facial expression recognition in the wild is challenging due to various unconstrained conditions. Although existing facial expression classifiers have been almost perfect on analyzing constrained frontal faces, they fail to perform well on partially occluded faces that are common in the wild. In this paper, we propose a convolution neutral network (CNN) with attention mechanism (ACNN) that can perceive the occlusion regions of the face and focus on the most discriminative un-occluded regions. ACNN is an end-to-end learning framework. It combines the multiple representations from facial regions of interest (ROIs). Each representation is weighed via a proposed gate unit that computes an adaptive weight from the region itself according to the unobstructedness and importance. Considering different RoIs, we introduce two versions of ACNN: patch-based ACNN (pACNN) and global–local-based ACNN (gACNN). pACNN only pays attention to local facial patches. gACNN integrates local representations at patch-level with global representation at image-level. The proposed ACNNs are evaluated on both real and synthetic occlusions, including a self-collected facial expression dataset with real-world occlusions, the two largest in-the-wild facial expression datasets (RAF-DB and AffectNet) and their modifications with synthesized facial occlusions. Experimental results show that ACNNs improve the recognition accuracy on both the non-occluded faces and occluded faces. Visualization results demonstrate that, compared with the CNN without Gate Unit, ACNNs are capable of shifting the attention from the occluded patches to other related but unobstructed ones. ACNNs also outperform other state-of-the-art methods on several widely used in-the-lab facial expression datasets under the cross-dataset evaluation protocol.

536 citations


Cites background or methods from "Facial expression recognition from ..."

  • ...Oulu-CASIA dataset contains six prototypic expressions from 80 people between 23 to 58 years old....

  • ...Although many facial expression recognition systems have been proposed and implemented, majority of them are built on images captured in controlled environment, such as CK+ [1], MMI [2], Oulu-CASIA [3], and other lab-collected datasets....

  • ...In our experiments, ACNNs were trained on RAF-DB or AffectNet dataset and evaluated on CK+, MMI, Oulu-CASIA, SFEW dataset with or without synthetic occlusions....

  • ...1) Datasets: We evaluated the methods on both in-the-wild datasets (RAF-DB [4], AffectNet [5], SFEW [35]) and in-the-lab datasets (CK+ [1], MMI [2], and Oulu-CASIA [3])....

  • ...Similar performance improvements can be found in Table IV, where gACNN outperforms pACNN on CK+, MMI, Oulu-CASIA, SFEW datasets....

Proceedings ArticleDOI
22 Apr 2013
TL;DR: A novel Spontaneous Micro-expression Database SMIC is presented, which includes 164 micro-expression video clips elicited from 16 participants and provides sufficient source material for comprehensive testing of automatic systems for analyzing micro-expressions, which has not been possible with any previously published database.
Abstract: Micro-expressions are short, involuntary facial expressions which reveal hidden emotions. Micro-expressions are important for understanding humans' deceitful behavior. Psychologists have been studying them since the 1960's. Currently the attention is elevated in both academic fields and in media. However, while general facial expression recognition (FER) has been intensively studied for years in computer vision, little research has been done in automatically analyzing micro-expressions. The biggest obstacle to date has been the lack of a suitable database. In this paper we present a novel Spontaneous Micro-expression Database SMIC, which includes 164 micro-expression video clips elicited from 16 participants. Micro-expression detection and recognition performance are provided as baselines. SMIC provides sufficient source material for comprehensive testing of automatic systems for analyzing micro-expressions, which has not been possible with any previously published database.

438 citations


Cites background from "Facial expression recognition from ..."

  • ...Oulu-CASIA [17]: 80 adults, 2880 videos, 6 classes, posed....

  • ...A NIR camera shows its advantage over a VIS camera under darker illumination conditions [17]....

Journal ArticleDOI
30 Jan 2018-Sensors
TL;DR: A brief review of research in the field of FER conducted over the past decades, focusing on an up-to-date hybrid deep-learning approach combining a convolutional neural network for the spatial features of an individual frame and long short-term memory for the temporal features of consecutive frames.
Abstract: Facial emotion recognition (FER) is an important topic in the fields of computer vision and artificial intelligence owing to its significant academic and commercial potential. Although FER can be conducted using multiple sensors, this review focuses on studies that exclusively use facial images, because visual expressions are one of the main information channels in interpersonal communication. This paper provides a brief review of researches in the field of FER conducted over the past decades. First, conventional FER approaches are described along with a summary of the representative categories of FER systems and their main algorithms. Deep-learning-based FER approaches using deep networks enabling “end-to-end” learning are then presented. This review also focuses on an up-to-date hybrid deep-learning approach combining a convolutional neural network (CNN) for the spatial features of an individual frame and long short-term memory (LSTM) for temporal features of consecutive frames. In the later part of this paper, a brief review of publicly available evaluation metrics is given, and a comparison with benchmark results, which are a standard for a quantitative comparison of FER researches, is described. This review can serve as a brief guidebook to newcomers in the field of FER, providing basic knowledge and a general understanding of the latest state-of-the-art studies, as well as to experienced researchers looking for productive directions for future work.

437 citations


Cites methods from "Facial expression recognition from ..."

  • ...As the database captured from an NIR camera, the Oulu-CASIA NIR&VIS facial expression database [31] consists of six expressions from 80 people between 23 and 58 years old....

  • ...Stepwise approach [31]: six prototypical emotions; stepwise linear discriminant analysis (SWLDA) used to select the localized features from the expression; hidden conditional random fields (HCRFs); evaluated on CK+ [10], JAFFE [41], B+ [42], MMI [43]....

  • ...[31] used near-infrared (NIR) video sequences and LBP-TOP (local binary patterns from three orthogonal planes) feature descriptors....

References
Book
01 Jan 1973

20,541 citations


"Facial expression recognition from ..." refers methods in this paper

  • ...Here, the Fisher separation criterion is used to learn suitable weights from the training data [3]....

Journal ArticleDOI
TL;DR: A generalized gray-scale and rotation invariant operator presentation that allows for detecting the "uniform" patterns for any quantization of the angular space and for any spatial resolution and presents a method for combining multiple operators for multiresolution analysis.
Abstract: Presents a theoretically very simple, yet efficient, multiresolution approach to gray-scale and rotation invariant texture classification based on local binary patterns and nonparametric discrimination of sample and prototype distributions. The method is based on recognizing that certain local binary patterns, termed "uniform," are fundamental properties of local image texture and their occurrence histogram is proven to be a very powerful texture feature. We derive a generalized gray-scale and rotation invariant operator presentation that allows for detecting the "uniform" patterns for any quantization of the angular space and for any spatial resolution and presents a method for combining multiple operators for multiresolution analysis. The proposed approach is very robust in terms of gray-scale variations since the operator is, by definition, invariant against any monotonic transformation of the gray scale. Another advantage is computational simplicity as the operator can be realized with a few operations in a small neighborhood and a lookup table. Experimental results demonstrate that good discrimination can be achieved with the occurrence statistics of simple rotation invariant local binary patterns.

14,245 citations


"Facial expression recognition from ..." refers background or methods in this paper

  • ...The LBP operator [23] describes a local texture pattern with a binary code, which is obtained by thresholding a neighborhood of pixels with the gray value of its center pixel....

  • ...“Uniform patterns” [23] are usually used to shorten the length of the feature vector of LBP....
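The excerpts above quote the basic LBP construction: threshold the 8 neighbors of a pixel against its center value to form an 8-bit code, and keep only "uniform" codes (at most two circular 0/1 transitions) to shorten the histogram. A minimal pure-Python sketch of that idea, where the function names and the 3x3 example image are illustrative and not taken from the cited paper:

```python
def lbp_code(img, r, c):
    """8-bit LBP code for pixel (r, c) of a 2-D grayscale image (list of lists)."""
    center = img[r][c]
    # Clockwise neighbor offsets starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        # Each neighbor >= center contributes one bit to the code.
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code

def is_uniform(code):
    """A pattern is 'uniform' if its circular 8-bit string has at most
    two 0/1 transitions; uniform patterns shorten the LBP histogram."""
    bits = [(code >> i) & 1 for i in range(8)]
    transitions = sum(bits[i] != bits[(i + 1) % 8] for i in range(8))
    return transitions <= 2

img = [[10, 20, 30],
       [40, 50, 60],
       [70, 80, 90]]
print(lbp_code(img, 1, 1))     # -> 120 (only the four neighbors >= 50 set bits)
print(is_uniform(0b00001111))  # -> True (two circular transitions)
```

The LBP-TOP descriptor used in the article extends this idea to video by computing such histograms on three orthogonal planes (XY, XT, YT) of the sequence.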

Journal ArticleDOI
TL;DR: This work considers the problem of automatically recognizing human faces from frontal views with varying expression and illumination, as well as occlusion and disguise, and proposes a general classification algorithm for (image-based) object recognition based on a sparse representation computed by C1-minimization.
Abstract: We consider the problem of automatically recognizing human faces from frontal views with varying expression and illumination, as well as occlusion and disguise. We cast the recognition problem as one of classifying among multiple linear regression models and argue that new theory from sparse signal representation offers the key to addressing this problem. Based on a sparse representation computed by C1-minimization, we propose a general classification algorithm for (image-based) object recognition. This new framework provides new insights into two crucial issues in face recognition: feature extraction and robustness to occlusion. For feature extraction, we show that if sparsity in the recognition problem is properly harnessed, the choice of features is no longer critical. What is critical, however, is whether the number of features is sufficiently large and whether the sparse representation is correctly computed. Unconventional features such as downsampled images and random projections perform just as well as conventional features such as eigenfaces and Laplacianfaces, as long as the dimension of the feature space surpasses certain threshold, predicted by the theory of sparse representation. This framework can handle errors due to occlusion and corruption uniformly by exploiting the fact that these errors are often sparse with respect to the standard (pixel) basis. The theory of sparse representation helps predict how much occlusion the recognition algorithm can handle and how to choose the training images to maximize robustness to occlusion. We conduct extensive experiments on publicly available databases to verify the efficacy of the proposed algorithm and corroborate the above claims.

9,658 citations

Journal ArticleDOI
TL;DR: A common theoretical framework for combining classifiers which use distinct pattern representations is developed and it is shown that many existing schemes can be considered as special cases of compound classification where all the pattern representations are used jointly to make a decision.
Abstract: We develop a common theoretical framework for combining classifiers which use distinct pattern representations and show that many existing schemes can be considered as special cases of compound classification where all the pattern representations are used jointly to make a decision. An experimental comparison of various classifier combination schemes demonstrates that the combination rule developed under the most restrictive assumptions-the sum rule-outperforms other classifier combinations schemes. A sensitivity analysis of the various schemes to estimation errors is carried out to show that this finding can be justified theoretically.

5,670 citations
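The abstract above reports that the sum rule, combining classifiers under the most restrictive assumptions, outperformed the other combination schemes studied. A minimal sketch of sum-rule fusion, where each classifier contributes a posterior estimate per class and the class with the largest summed (equivalently, averaged) score wins; the names and numbers are illustrative:

```python
def sum_rule(posteriors):
    """posteriors: list of per-classifier dicts mapping class -> P(class | x).
    Returns the class with the largest summed posterior across classifiers."""
    combined = {}
    for p in posteriors:
        for cls, prob in p.items():
            combined[cls] = combined.get(cls, 0.0) + prob
    return max(combined, key=combined.get)

# Three hypothetical classifiers built on distinct representations
# of the same pattern (e.g. geometric vs. appearance features):
clf_outputs = [
    {"happy": 0.6, "sad": 0.3, "neutral": 0.1},
    {"happy": 0.2, "sad": 0.5, "neutral": 0.3},
    {"happy": 0.5, "sad": 0.1, "neutral": 0.4},
]
print(sum_rule(clf_outputs))  # -> 'happy' (summed score 1.3 vs 0.9 and 0.8)
```

Averaging posteriors tends to dampen the estimation error of any single classifier, which is one intuition behind the sum rule's robustness reported in the paper's sensitivity analysis.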

Journal ArticleDOI
TL;DR: A generative appearance-based method for recognizing human faces under variation in lighting and viewpoint that exploits the fact that the set of images of an object in fixed pose but under all possible illumination conditions, is a convex cone in the space of images.
Abstract: We present a generative appearance-based method for recognizing human faces under variation in lighting and viewpoint. Our method exploits the fact that the set of images of an object in fixed pose, but under all possible illumination conditions, is a convex cone in the space of images. Using a small number of training images of each face taken with different lighting directions, the shape and albedo of the face can be reconstructed. In turn, this reconstruction serves as a generative model that can be used to render (or synthesize) images of the face under novel poses and illumination conditions. The pose space is then sampled and, for each pose, the corresponding illumination cone is approximated by a low-dimensional linear subspace whose basis vectors are estimated using the generative model. Our recognition algorithm assigns to a test image the identity of the closest approximated illumination cone. Test results show that the method performs almost without error, except on the most extreme lighting directions.

5,027 citations


"Facial expression recognition from ..." refers background in this paper

  • ...The use of illumination normalization methods is shown to improve recognition performance when there are illumination variations in the faces, but has not led to an illumination-invariant face representation due to significant difficulties, especially uncontrolled illumination directions [1,8,9,31]....

  • ...Much work has been done to model and correct illumination changes on faces in VIS images [8,9,11,31]....