
Showing papers on "Facial expression published in 2019"


Journal ArticleDOI
TL;DR: There is an urgent need for research that examines how people actually move their faces to express emotions and other social information in the variety of contexts that make up everyday life, as well as careful study of the mechanisms by which people perceive instances of emotion in one another.
Abstract: It is commonly assumed that a person’s emotional state can be readily inferred from his or her facial movements, typically called emotional expressions or facial expressions. This assumption influences legal judgments, policy decisions, national security protocols, and educational practices; guides the diagnosis and treatment of psychiatric illness, as well as the development of commercial applications; and pervades everyday social interactions as well as research in other scientific fields such as artificial intelligence, neuroscience, and computer vision. In this article, we survey examples of this widespread assumption, which we refer to as the common view, and we then examine the scientific evidence that tests this view, focusing on the six most popular emotion categories used by consumers of emotion research: anger, disgust, fear, happiness, sadness, and surprise. The available scientific evidence suggests that people do sometimes smile when happy, frown when sad, scowl when angry, and so on, as proposed by the common view, more than what would be expected by chance. Yet how people communicate anger, disgust, fear, happiness, sadness, and surprise varies substantially across cultures, situations, and even across people within a single situation. Furthermore, similar configurations of facial movements variably express instances of more than one emotion category. In fact, a given configuration of facial movements, such as a scowl, often communicates something other than an emotional state. Scientists agree that facial movements convey a range of information and are important for social communication, emotional or otherwise. But our review suggests an urgent need for research that examines how people actually move their faces to express emotions and other social information in the variety of contexts that make up everyday life, as well as careful study of the mechanisms by which people perceive instances of emotion in one another. We make specific research recommendations that will yield a more valid picture of how people move their faces to express emotions and how they infer emotional meaning from facial movements in situations of everyday life. This research is crucial to provide consumers of emotion research with the translational information they require.

772 citations


Journal ArticleDOI
TL;DR: Visualization results demonstrate that, compared with the CNN without the Gate Unit, ACNNs are capable of shifting the attention from the occluded patches to other related but unobstructed ones, and they outperform other state-of-the-art methods on several widely used in-the-lab facial expression datasets under the cross-dataset evaluation protocol.
Abstract: Facial expression recognition in the wild is challenging due to various unconstrained conditions. Although existing facial expression classifiers have been almost perfect on analyzing constrained frontal faces, they fail to perform well on partially occluded faces that are common in the wild. In this paper, we propose a convolutional neural network (CNN) with attention mechanism (ACNN) that can perceive the occlusion regions of the face and focus on the most discriminative un-occluded regions. ACNN is an end-to-end learning framework. It combines the multiple representations from facial regions of interest (ROIs). Each representation is weighted via a proposed gate unit that computes an adaptive weight from the region itself according to its unobstructedness and importance. Considering different ROIs, we introduce two versions of ACNN: patch-based ACNN (pACNN) and global–local-based ACNN (gACNN). pACNN only pays attention to local facial patches. gACNN integrates local representations at patch level with a global representation at image level. The proposed ACNNs are evaluated on both real and synthetic occlusions, including a self-collected facial expression dataset with real-world occlusions, the two largest in-the-wild facial expression datasets (RAF-DB and AffectNet), and their modifications with synthesized facial occlusions. Experimental results show that ACNNs improve the recognition accuracy on both non-occluded and occluded faces. Visualization results demonstrate that, compared with the CNN without the Gate Unit, ACNNs are capable of shifting the attention from the occluded patches to other related but unobstructed ones. ACNNs also outperform other state-of-the-art methods on several widely used in-the-lab facial expression datasets under the cross-dataset evaluation protocol.
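
To make the gate-unit idea concrete, here is a minimal PyTorch sketch of patch-level attention in the spirit of pACNN: each patch feature is scored by its own small gate network, and the weighted patch features are summed before classification. The layer sizes, patch count, and class count are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn


class GateUnit(nn.Module):
    """Scores one facial-patch feature with a scalar weight in (0, 1)."""

    def __init__(self, feat_dim=512):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Linear(feat_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 1),
            nn.Sigmoid(),  # a low weight is expected for occluded patches
        )

    def forward(self, patch_feat):      # (B, feat_dim)
        return self.attn(patch_feat)    # (B, 1)


class PatchAttentionHead(nn.Module):
    """Fuses several patch features into one weighted representation."""

    def __init__(self, num_patches=24, feat_dim=512, num_classes=7):
        super().__init__()
        self.gates = nn.ModuleList(GateUnit(feat_dim) for _ in range(num_patches))
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, patch_feats):     # (B, num_patches, feat_dim)
        weighted = [gate(patch_feats[:, i]) * patch_feats[:, i]
                    for i, gate in enumerate(self.gates)]
        fused = torch.stack(weighted, dim=1).sum(dim=1)   # (B, feat_dim)
        return self.classifier(fused)


if __name__ == "__main__":
    feats = torch.randn(4, 24, 512)            # patch features from a CNN backbone
    print(PatchAttentionHead()(feats).shape)   # torch.Size([4, 7])
```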

536 citations


Journal ArticleDOI
TL;DR: In this paper, the authors collected, annotated, and prepared for public distribution a new database of facial emotions in the wild (called AffectNet), which contains more than 1,000,000 facial images collected from the Internet by querying three major search engines with 1,250 emotion-related keywords in six different languages.
Abstract: Automated affective computing in the wild setting is a challenging problem in computer vision. Existing annotated databases of facial expressions in the wild are small and mostly cover discrete emotions (aka the categorical model). There are very limited annotated facial databases for affective computing in the continuous dimensional model (e.g., valence and arousal). To meet this need, we collected, annotated, and prepared for public distribution a new database of facial emotions in the wild (called AffectNet). AffectNet contains more than 1,000,000 facial images collected from the Internet by querying three major search engines with 1,250 emotion-related keywords in six different languages. About half of the retrieved images were manually annotated for the presence of seven discrete facial expressions and the intensity of valence and arousal. AffectNet is by far the largest database of facial expression, valence, and arousal in the wild, enabling research in automated facial expression recognition in two different emotion models. Two baseline deep neural networks are used to classify images in the categorical model and predict the intensity of valence and arousal. Various evaluation metrics show that our deep neural network baselines can perform better than conventional machine learning methods and off-the-shelf facial expression recognition systems.
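
As a rough illustration of the two baseline prediction targets, the sketch below outputs both logits over discrete expressions and a valence–arousal pair from one toy backbone. The paper trains separate AlexNet-style networks for the two tasks; sharing a single small backbone here is purely a simplification, and all layer sizes are assumptions.

```python
import torch
import torch.nn as nn


class AffectBaseline(nn.Module):
    """Toy network covering the two AffectNet baseline outputs: a softmax
    over discrete expressions and a 2-d valence/arousal regression."""

    def __init__(self, num_classes=7):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.expr_head = nn.Linear(32, num_classes)   # categorical model
        self.va_head = nn.Linear(32, 2)               # valence, arousal in [-1, 1]

    def forward(self, images):                        # (B, 3, H, W)
        h = self.backbone(images)
        return self.expr_head(h), torch.tanh(self.va_head(h))


if __name__ == "__main__":
    logits, va = AffectBaseline()(torch.randn(2, 3, 96, 96))
    print(logits.shape, va.shape)                     # (2, 7) (2, 2)
```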

432 citations


Journal ArticleDOI
TL;DR: A new deep locality-preserving convolutional neural network (DLP-CNN) method that aims to enhance the discriminative power of deep features by preserving the locality closeness while maximizing the inter-class scatter is proposed.
Abstract: Facial expression is central to human experience, but most previous databases and studies are limited to posed facial behavior under controlled conditions. In this paper, we present a novel facial expression database, Real-world Affective Face Database (RAF-DB), which contains approximately 30,000 facial images with uncontrolled poses and illumination from thousands of individuals of diverse ages and races. During the crowdsourcing annotation, each image is independently labeled by approximately 40 annotators. An expectation–maximization algorithm is developed to reliably estimate the emotion labels, which reveals that real-world faces often express compound or even mixture emotions. A cross-database study between RAF-DB and the CK+ database further indicates that the action units of real-world emotions are much more diverse than, or even deviate from, those of laboratory-controlled emotions. To address the recognition of multi-modal expressions in the wild, we propose a new deep locality-preserving convolutional neural network (DLP-CNN) method that aims to enhance the discriminative power of deep features by preserving the locality closeness while maximizing the inter-class scatter. Benchmark experiments on 7-class basic expressions and 11-class compound expressions, as well as additional experiments on the CK+, MMI, and SFEW 2.0 databases, show that the proposed DLP-CNN outperforms the state-of-the-art handcrafted features and deep learning-based methods for expression recognition in the wild. To promote further study, we have made the RAF database, benchmarks, and descriptor encodings publicly available to the research community.
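
The sketch below illustrates one way such a locality-preserving term could be combined with cross-entropy: each deep feature is pulled toward the centre of its k nearest same-class neighbours while the softmax loss drives inter-class separation. This is an interpretation of the abstract, not the paper's exact loss; the neighbour count k and the weighting lam are assumptions.

```python
import torch
import torch.nn.functional as F


def locality_preserving_loss(features, labels, k=3):
    """Pulls each deep feature toward the centre of its k nearest
    same-class neighbours (a locality-preserving term)."""
    loss, count = features.new_tensor(0.0), 0
    for i in range(features.size(0)):
        same = (labels == labels[i]).nonzero(as_tuple=True)[0]
        same = same[same != i]
        if same.numel() == 0:
            continue
        dists = torch.cdist(features[i:i + 1], features[same]).squeeze(0)
        knn = same[dists.topk(min(k, same.numel()), largest=False).indices]
        loss = loss + (features[i] - features[knn].mean(dim=0)).pow(2).sum()
        count += 1
    return loss / max(count, 1)


def dlp_objective(logits, features, labels, lam=0.5):
    """Cross-entropy drives inter-class scatter; the locality term keeps
    same-class features close. The weight lam is an assumption."""
    return F.cross_entropy(logits, labels) + lam * locality_preserving_loss(features, labels)


if __name__ == "__main__":
    feats = torch.randn(16, 8)
    logits = torch.randn(16, 7)
    labels = torch.randint(0, 7, (16,))
    print(dlp_objective(logits, feats, labels))
```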

429 citations


Journal ArticleDOI
TL;DR: This paper systematically reviews all components of such systems (pre-processing, feature extraction, and machine coding of facial actions) and summarises the existing FACS-coded facial expression databases.
Abstract: As one of the most comprehensive and objective ways to describe facial expressions, the Facial Action Coding System (FACS) has recently received significant attention. Over the past 30 years, extensive research has been conducted by psychologists and neuroscientists on various aspects of facial expression analysis using FACS. Automating FACS coding would make this research faster and more widely applicable, opening up new avenues to understanding how we communicate through facial expressions. Such an automated process can also potentially increase the reliability, precision and temporal resolution of coding. This paper provides a comprehensive survey of research into machine analysis of facial actions. We systematically review all components of such systems: pre-processing, feature extraction and machine coding of facial actions. In addition, the existing FACS-coded facial expression databases are summarised. Finally, challenges that have to be addressed to make automatic facial action analysis applicable in real-life situations are extensively discussed. There are two underlying motivations for us to write this survey paper: the first is to provide an up-to-date review of the existing literature, and the second is to offer some insights into the future of machine recognition of facial actions: what are the challenges and opportunities that researchers in the field face.

257 citations


Journal ArticleDOI
17 Jul 2019
TL;DR: This article proposes the Recurrent Attended Variation Embedding Network (RAVEN), which models the fine-grained structure of nonverbal subword sequences and dynamically shifts word representations based on nonverbal cues.
Abstract: Humans convey their intentions through the usage of both verbal and nonverbal behaviors during face-to-face communication. Speaker intentions often vary dynamically depending on different nonverbal contexts, such as vocal patterns and facial expressions. As a result, when modeling human language, it is essential to not only consider the literal meaning of the words but also the nonverbal contexts in which these words appear. To better model human language, we first model expressive nonverbal representations by analyzing the fine-grained visual and acoustic patterns that occur during word segments. In addition, we seek to capture the dynamic nature of nonverbal intents by shifting word representations based on the accompanying nonverbal behaviors. To this end, we propose the Recurrent Attended Variation Embedding Network (RAVEN) that models the fine-grained structure of nonverbal subword sequences and dynamically shifts word representations based on nonverbal cues. Our proposed model achieves competitive performance on two publicly available datasets for multimodal sentiment analysis and emotion recognition. We also visualize the shifted word representations in different nonverbal contexts and summarize common patterns regarding multimodal variations of word representations.
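
A simplified sketch of the shifting idea: subword-aligned visual and acoustic frames are summarised by small LSTMs, mapped to a shift vector, and added to the word embedding through a learned gate. Feature dimensions and the gating scheme are illustrative assumptions rather than RAVEN's exact design.

```python
import torch
import torch.nn as nn


class NonverbalShift(nn.Module):
    """Shifts a word embedding by a vector computed from the visual and
    acoustic behaviours that accompany the spoken word."""

    def __init__(self, word_dim=300, visual_dim=47, acoustic_dim=74, hidden=64):
        super().__init__()
        self.visual_rnn = nn.LSTM(visual_dim, hidden, batch_first=True)
        self.acoustic_rnn = nn.LSTM(acoustic_dim, hidden, batch_first=True)
        self.to_shift = nn.Linear(2 * hidden, word_dim)
        self.gate = nn.Linear(word_dim + 2 * hidden, 1)

    def forward(self, word_emb, visual_seq, acoustic_seq):
        # word_emb: (B, word_dim); *_seq: (B, T, dim) frames aligned to the word
        _, (hv, _) = self.visual_rnn(visual_seq)
        _, (ha, _) = self.acoustic_rnn(acoustic_seq)
        nonverbal = torch.cat([hv[-1], ha[-1]], dim=-1)           # (B, 2*hidden)
        shift = self.to_shift(nonverbal)                          # (B, word_dim)
        alpha = torch.sigmoid(self.gate(torch.cat([word_emb, nonverbal], dim=-1)))
        return word_emb + alpha * shift                           # shifted embedding


if __name__ == "__main__":
    w = torch.randn(2, 300)            # word embeddings
    v = torch.randn(2, 12, 47)         # facial-expression features per video frame
    a = torch.randn(2, 20, 74)         # acoustic frames under the spoken word
    print(NonverbalShift()(w, v, a).shape)   # torch.Size([2, 300])
```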

252 citations


Posted Content
TL;DR: A novel Region Attention Network (RAN) is proposed to adaptively capture the importance of facial regions for occlusion- and pose-variant FER, together with a region-biased loss that encourages high attention weights for the most important regions.
Abstract: Occlusion and pose variations, which can change facial appearance significantly, are two major obstacles for automatic Facial Expression Recognition (FER). Though automatic FER has made substantial progress in the past few decades, the occlusion-robust and pose-invariant issues of FER have received relatively little attention, especially in real-world scenarios. This paper addresses the real-world pose- and occlusion-robust FER problem with three-fold contributions. First, to stimulate research on FER under real-world occlusions and variant poses, we build several in-the-wild facial expression datasets with manual annotations for the community. Second, we propose a novel Region Attention Network (RAN) to adaptively capture the importance of facial regions for occlusion- and pose-variant FER. The RAN aggregates and embeds a varied number of region features produced by a backbone convolutional neural network into a compact fixed-length representation. Last, inspired by the fact that facial expressions are mainly defined by facial action units, we propose a region-biased loss to encourage high attention weights for the most important regions. We validate our RAN and region-biased loss on both our built test datasets and four popular datasets: FERPlus, AffectNet, RAF-DB, and SFEW. Extensive experiments show that our RAN and region-biased loss largely improve the performance of FER with occlusion and variant pose. Our method also achieves state-of-the-art results on FERPlus, AffectNet, RAF-DB, and SFEW. Code and the collected test data will be publicly available.
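
The following sketch shows one plausible reading of the aggregation and the region-biased loss: a variable number of crop features plus the whole-face feature are scored, their weighted average is classified, and a margin penalty pushes the best crop's weight above the whole-face weight. The margin value and layer sizes are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RegionAttentionHead(nn.Module):
    """Aggregates region features with learned attention weights and adds
    a region-biased penalty favouring the most informative crop."""

    def __init__(self, feat_dim=512, num_classes=7, margin=0.02):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)
        self.classifier = nn.Linear(feat_dim, num_classes)
        self.margin = margin

    def forward(self, face_feat, crop_feats):
        # face_feat: (B, D) whole face; crop_feats: (B, R, D) region crops
        all_feats = torch.cat([face_feat.unsqueeze(1), crop_feats], dim=1)  # (B, R+1, D)
        weights = torch.sigmoid(self.score(all_feats)).squeeze(-1)          # (B, R+1)
        fused = (weights.unsqueeze(-1) * all_feats).sum(1) / weights.sum(1, keepdim=True)
        logits = self.classifier(fused)
        # region-biased loss: the best crop should out-weigh the whole face by a margin
        rb_loss = F.relu(self.margin + weights[:, 0] - weights[:, 1:].max(dim=1).values).mean()
        return logits, rb_loss


if __name__ == "__main__":
    head = RegionAttentionHead()
    logits, rb = head(torch.randn(2, 512), torch.randn(2, 5, 512))
    print(logits.shape, float(rb))
```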

241 citations


Journal ArticleDOI
TL;DR: The aim of this work is to classify each image into one of six facial emotion classes using a single deep convolutional neural network (DNN) that contains convolution layers and deep residual blocks.

231 citations


Journal ArticleDOI
TL;DR: This paper performs an extensive review of the facial landmark detection algorithms and identifies future research directions, including combining methods in different categories to leverage their respective strengths to solve landmark detection “in-the-wild”.
Abstract: The locations of the fiducial facial landmark points around facial components and facial contour capture the rigid and non-rigid facial deformations due to head movements and facial expressions. They are hence important for various facial analysis tasks. Many facial landmark detection algorithms have been developed to automatically detect those key points over the years, and in this paper, we perform an extensive review of them. We classify the facial landmark detection algorithms into three major categories: holistic methods, Constrained Local Model (CLM) methods, and the regression-based methods. They differ in the ways to utilize the facial appearance and shape information. The holistic methods explicitly build models to represent the global facial appearance and shape information. The CLMs explicitly leverage the global shape model but build the local appearance models. The regression based methods implicitly capture facial shape and appearance information. For algorithms within each category, we discuss their underlying theories as well as their differences. We also compare their performances on both controlled and in the wild benchmark datasets, under varying facial expressions, head poses, and occlusion. Based on the evaluations, we point out their respective strengths and weaknesses. There is also a separate section to review the latest deep learning based algorithms. The survey also includes a listing of the benchmark databases and existing software. Finally, we identify future research directions, including combining methods in different categories to leverage their respective strengths to solve landmark detection "in-the-wild".

212 citations


Journal ArticleDOI
TL;DR: A new scheme for an FER system based on hierarchical deep learning, which combines the softmax outputs of two feature streams by considering the error associated with the second-highest (Top-2) emotion prediction, together with an autoencoder-based technique for generating facial images with neutral emotion.
Abstract: With the continued development of artificial intelligence (AI) technology, research on interaction technology has become more popular. Facial expression recognition (FER) is an important type of visual information that can be used to understand a human's emotional situation. In particular, the importance of AI systems has recently increased due to advancements in research on AI systems applied to AI robots. In this paper, we propose a new scheme for an FER system based on hierarchical deep learning. The feature extracted from the appearance feature-based network is fused with the geometric feature in a hierarchical structure. The appearance feature-based network extracts holistic features of the face using the preprocessed LBP image, whereas the geometric feature-based network learns the coordinate changes of action-unit (AU) landmarks, which correspond to the facial muscles that move mainly when making facial expressions. The proposed method combines the softmax outputs of the two features by considering the error associated with the second-highest emotion (Top-2) prediction result. In addition, we propose a technique to generate facial images with neutral emotion using an autoencoder. With this technique, we can extract the dynamic facial features between the neutral and emotional images without sequence data. We compare the proposed algorithm with other recent algorithms on the CK+ and JAFFE datasets, which are widely used benchmark datasets in facial expression recognition. The ten-fold cross-validation results show 96.46% accuracy on the CK+ dataset and 91.27% accuracy on the JAFFE dataset. Compared with other methods, the proposed hierarchical deep network structure improves accuracy by up to about 3% on the CK+ dataset, with an average improvement of 1.3%. On the JAFFE dataset, accuracy is improved by up to about 7%, with an average improvement of about 1.5%.
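
As a hedged illustration of the Top-2 idea, the sketch below blends the softmax outputs of the appearance and geometric branches, trusting the geometric branch more when the appearance branch's top-1 and top-2 probabilities are close. The exact fusion rule in the paper may differ; the blending weight alpha is an assumption.

```python
import numpy as np


def top2_aware_fusion(p_appearance, p_geometric, alpha=0.6):
    """Blends two class-probability vectors; the geometric branch gets
    more weight when the appearance branch's top-1/top-2 gap is small."""
    p_a = np.asarray(p_appearance, dtype=float)
    p_g = np.asarray(p_geometric, dtype=float)
    srt = np.sort(p_a)
    ambiguity = 1.0 - (srt[-1] - srt[-2])   # ~1 when the top-2 call is close
    w = alpha * ambiguity                   # weight given to the geometric branch
    fused = (1.0 - w) * p_a + w * p_g
    return fused / fused.sum()


# An ambiguous appearance prediction is corrected by the geometric branch.
p_app = [0.35, 0.34, 0.11, 0.08, 0.07, 0.05]
p_geo = [0.10, 0.70, 0.05, 0.05, 0.05, 0.05]
print(top2_aware_fusion(p_app, p_geo).round(3))
```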

186 citations


Journal ArticleDOI
TL;DR: A new spatio-temporal feature representation learning for FER that is robust to expression intensity variations is proposed that achieved higher recognition rates in both datasets compared to the state-of-the-art methods.
Abstract: Facial expression recognition (FER) is increasingly gaining importance in various emerging affective computing applications. In practice, achieving accurate FER is challenging due to the large amount of inter-personal variation, such as expression intensity variations. In this paper, we propose a new spatio-temporal feature representation learning method for FER that is robust to expression intensity variations. The proposed method utilizes representative expression states (e.g., onset, apex, and offset of expressions) which can be specified in facial sequences regardless of the expression intensity. The characteristics of facial expressions are encoded in two parts in this paper. In the first part, spatial image characteristics of the representative expression-state frames are learned via a convolutional neural network. Five objective terms are proposed to improve the expression class separability of the spatial feature representation. In the second part, temporal characteristics of the spatial feature representation from the first part are learned with a long short-term memory (LSTM) network. Comprehensive experiments have been conducted on a deliberate expression dataset (MMI) and a spontaneous micro-expression dataset (CASME II). Experimental results showed that the proposed method achieved higher recognition rates on both datasets compared to the state-of-the-art methods.
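
A minimal sketch of the two-part encoding: a shared (toy) CNN extracts spatial features from the representative expression-state frames, and an LSTM models their temporal order. The backbone, the five objective terms, and the exact training setup of the paper are omitted; all sizes here are assumptions.

```python
import torch
import torch.nn as nn


class ExpressionStateLSTM(nn.Module):
    """Encodes onset/apex/offset frames with a shared CNN, then models
    their temporal order with an LSTM."""

    def __init__(self, num_classes=7, feat_dim=128):
        super().__init__()
        self.cnn = nn.Sequential(                       # tiny stand-in backbone
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, 64, batch_first=True)
        self.head = nn.Linear(64, num_classes)

    def forward(self, frames):                          # (B, T, 1, H, W), T = 3 states
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        _, (h, _) = self.lstm(feats)
        return self.head(h[-1])


if __name__ == "__main__":
    x = torch.randn(2, 3, 1, 64, 64)                    # onset / apex / offset frames
    print(ExpressionStateLSTM()(x).shape)               # torch.Size([2, 7])
```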

Posted ContentDOI
TL;DR: This work proposes a progressive training approach for multi-class classification of fashion attributes, where weights learnt from an attribute are fine tuned for another attribute of the same fashion article (say, dresses).
Abstract: Extracting fashion attributes from images of people wearing clothing/fashion accessories is a very hard multi-class classification problem. Most often, even fashion catalogues do not have all the fine-grained attributes tagged due to the prohibitive cost of annotation. Using images of fashion articles, running multi-class attribute extraction with a single model for all kinds of attributes (neck design detailing, sleeve detailing, etc.) requires classifiers that are robust to missing and ambiguously labelled data. In this work, we propose a progressive training approach for such multi-class classification, where weights learnt for one attribute are fine-tuned for another attribute of the same fashion article (say, dresses). We branch a network for each attribute from a base network progressively during training. While an image may have many labels, it need not carry all possible labels for the fashion articles present in it. We also compare our approach to multi-label classification and demonstrate improvements in overall classification accuracy using our approach.
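
The sketch below illustrates the progressive-branching idea under stated assumptions: attribute branches are added to a shared trunk one at a time, and a new branch can start from the weights of a previously trained branch when the shapes match. Names, sizes, and the weight-copy rule are illustrative, not the paper's exact procedure.

```python
import copy
import torch
import torch.nn as nn


class FashionAttributeNet(nn.Module):
    """A base trunk with one classification branch per fashion attribute
    (e.g. neck design, sleeve type), added progressively."""

    def __init__(self, feat_dim=128):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim), nn.ReLU(),
        )
        self.branches = nn.ModuleDict()

    def add_branch(self, name, num_classes, init_from=None):
        head = nn.Linear(self.trunk[-2].out_features, num_classes)
        if init_from is not None and init_from in self.branches:
            src = self.branches[init_from]
            if src.weight.shape == head.weight.shape:   # reuse weights when shapes match
                head = copy.deepcopy(src)
        self.branches[name] = head

    def forward(self, images, attribute):
        return self.branches[attribute](self.trunk(images))


net = FashionAttributeNet()
net.add_branch("neck_design", num_classes=5)
net.add_branch("sleeve_type", num_classes=5, init_from="neck_design")  # progressive init
print(net(torch.randn(2, 3, 96, 96), "sleeve_type").shape)             # torch.Size([2, 5])
```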

Posted ContentDOI
TL;DR: Compared to the commercial susceptometer that was previously used as the receiver, the new detector provides an increased sampling rate of 100 samples/s and flexibility in the dimensions of the propagation channel, allowing both single-ended and differential signaling to be implemented in SPION-based MC testbeds.
Abstract: Superparamagnetic iron oxide nanoparticles (SPIONs) have recently been introduced as information carriers in a testbed for molecular communication (MC) in duct flow. Here, a new receiver for this testbed is presented, based on the concept of a bridge circuit. The capability of the testbed for reliable transmission and of the proposed receiver for detection was evaluated by sending a text message and an 80-bit random sequence at a bit rate of 1 bit/s, which resulted in a bit error rate of 0%. Furthermore, the sensitivity of the device was assessed by a dilution series, which gave a detection limit for peaks between 0.1 and 0.5 mg/mL. Compared to the commercial susceptometer that was previously used as the receiver, the new detector provides an increased sampling rate of 100 samples/s and flexibility in the dimensions of the propagation channel. Furthermore, it allows both single-ended and differential signaling to be implemented in SPION-based MC testbeds.

DOI
05 Aug 2019
TL;DR: In this article, the authors investigate the effects of selecting features, learning, and making predictions from data that has been compressed using lossy transformations, and propose a specialised feature selection approach that considers predictive performance alongside compressibility, measured by compressing features individually or in a single concatenated stream.
Abstract: In data mining it is important for any transforms made to training data to be replicated on evaluation or deployment data. If they are not, the model may perform poorly or be unable to accept the input. Lossy data compression has other considerations, however: for example, it may not be known whether or not lossy compression will be applied to deployment data, or whether a variable compression ratio is to be used. Furthermore, lossy data compression typically reduces noise, which may not affect or may even improve model performance, and performing feature selection on lossy data may find better features than selecting from the original data. In this paper, we investigate the effects of selecting features, learning, and making predictions from data that has been compressed using lossy transforms. Using vehicle telemetry data, we determine where in the data mining methodology lossy compression is detrimental or beneficial, and how the data should be compressed. We also propose a specialised feature selection approach that considers predictive performance alongside compressibility, measured by compressing features either individually or in a single concatenated stream.
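
One way to read "predictive performance alongside compressibility" is sketched below: each feature is scored by its absolute correlation with the target minus a penalty proportional to its zlib-compressed size, and the top-scoring features are kept. The scoring rule, compressor, and weighting are assumptions, not the paper's actual criterion.

```python
import zlib
import numpy as np


def compressed_ratio(column):
    """Size of the zlib-compressed column relative to its raw size."""
    raw = np.asarray(column, dtype=np.float32).tobytes()
    return len(zlib.compress(raw)) / len(raw)


def select_features(X, y, lam=0.5, k=3):
    """Ranks features by |correlation with the target| minus a penalty for
    poor compressibility, then keeps the top k."""
    scores = []
    for j in range(X.shape[1]):
        predictive = abs(np.corrcoef(X[:, j], y)[0, 1])
        scores.append(predictive - lam * compressed_ratio(X[:, j]))
    return np.argsort(scores)[::-1][:k]


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
X[:, 0] = np.round(X[:, 0], 1)                 # a quantised (more compressible) feature
y = X[:, 0] * 2.0 + rng.normal(scale=0.1, size=200)
print(select_features(X, y))                   # indices of the selected features
```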

Posted ContentDOI
TL;DR: An innovative analytics tool which bridges the gap between feature models as more abstract representations of variability and its concrete implementation with the means of CPP, and simplifies tracing and understanding the effect of enabling or disabling feature flags.
Abstract: The C preprocessor (CPP) is a standard tool for introducing variability into source programs and is often applied either implicitly or explicitly for implementing a Software Product Line (SPL). Despite its practical relevance, CPP has many drawbacks. As a result, it is very difficult to understand the variability implemented using CPP. To facilitate this task, we provide an innovative analytics tool which bridges the gap between feature models as more abstract representations of variability and its concrete implementation by means of CPP. It allows interactive exploration of the entities of a source program with respect to the variability realized by conditional compilation. Thus, it simplifies tracing and understanding the effect of enabling or disabling feature flags.

Proceedings ArticleDOI
Xuesong Niu, Hu Han, Songfan Yang, Yan Huang, Shiguang Shan
15 Jun 2019
TL;DR: This work proposes a novel AU detection method by utilizing local information and the relationship of individual local face regions, which outperforms the state-of-the-art methods on two widely used AU detection datasets in the public domain.
Abstract: Encoding individual facial expressions via action units (AUs) coded by the Facial Action Coding System (FACS) has been found to be an effective approach to resolving the ambiguity issue among different expressions. While a number of methods have been proposed for AU detection, robust AU detection in the wild remains a challenging problem because of the diverse baseline AU intensities across individual subjects and the weak appearance signal of AUs. To resolve these issues, in this work, we propose a novel AU detection method by utilizing local information and the relationship of individual local face regions. Through such local relationship learning, we expect to utilize rich local information to improve the AU detection robustness against the potential perceptual inconsistency of individual local regions. In addition, considering the diversity in the baseline AU intensities of individual subjects, we further regularize local relationship learning via person-specific face shape information, i.e., reducing the influence of person-specific shape information and obtaining more AU-discriminative features. The proposed approach outperforms the state-of-the-art methods on two widely used AU detection datasets in the public domain (BP4D and DISFA).

Journal ArticleDOI
TL;DR: The available evidence supports the facial feedback hypothesis' central claim that facial feedback influences emotional experience, although these effects tend to be small and heterogeneous.
Abstract: The facial feedback hypothesis suggests that an individual's experience of emotion is influenced by feedback from their facial movements. To evaluate the cumulative evidence for this hypothesis, we conducted a meta-analysis on 286 effect sizes derived from 138 studies that manipulated facial feedback and collected emotion self-reports. Using random effects meta-regression with robust variance estimates, we found that the overall effect of facial feedback was significant but small. Results also indicated that feedback effects are stronger in some circumstances than others. We examined 12 potential moderators, and 3 were associated with differences in effect sizes: (a) Type of emotional outcome: Facial feedback influenced emotional experience (e.g., reported amusement) and, to a greater degree, affective judgments of a stimulus (e.g., the objective funniness of a cartoon). Three publication bias detection methods did not reveal evidence of publication bias in studies examining the effects of facial feedback on emotional experience, but all 3 methods revealed evidence of publication bias in studies examining affective judgments. (b) Presence of emotional stimuli: Facial feedback effects on emotional experience were larger in the absence of emotionally evocative stimuli (e.g., cartoons). (c) Type of stimuli: When participants were presented with emotionally evocative stimuli, facial feedback effects were larger in the presence of some types of stimuli (e.g., emotional sentences) than others (e.g., pictures). The available evidence supports the facial feedback hypothesis' central claim that facial feedback influences emotional experience, although these effects tend to be small and heterogeneous. (PsycINFO Database Record (c) 2019 APA, all rights reserved).
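
For readers unfamiliar with random-effects pooling, the sketch below computes a pooled effect with the classic DerSimonian–Laird estimator. The meta-analysis in the paper uses random-effects meta-regression with robust variance estimates and moderator analyses, which this simplified example does not reproduce; the effect sizes in the demo are made up.

```python
import numpy as np


def dersimonian_laird(effects, variances):
    """Random-effects pooled effect via the DerSimonian–Laird estimator."""
    y = np.asarray(effects, dtype=float)
    v = np.asarray(variances, dtype=float)
    w = 1.0 / v                                   # fixed-effect weights
    y_fixed = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - y_fixed) ** 2)            # heterogeneity statistic Q
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)       # between-study variance
    w_star = 1.0 / (v + tau2)                     # random-effects weights
    pooled = np.sum(w_star * y) / np.sum(w_star)
    se = np.sqrt(1.0 / np.sum(w_star))
    return pooled, se, tau2


# Demo with small, heterogeneous (made-up) facial-feedback effect sizes.
d = [0.25, 0.05, 0.40, -0.10, 0.20]
v = [0.02, 0.03, 0.05, 0.04, 0.02]
print(dersimonian_laird(d, v))
```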

Journal ArticleDOI
TL;DR: A system called DriCare is proposed, which detects the drivers’ fatigue status, such as yawning, blinking, and duration of eye closure, using video images, without equipping their bodies with devices, and can alert the driver using a fatigue warning.
Abstract: The face, an important part of the body, conveys a lot of information. When a driver is in a state of fatigue, the facial expressions, e.g., the frequency of blinking and yawning, are different from those in the normal state. In this paper, we propose a system called DriCare, which detects the drivers' fatigue status, such as yawning, blinking, and duration of eye closure, using video images, without equipping their bodies with devices. Owing to the shortcomings of previous algorithms, we introduce a new face-tracking algorithm to improve the tracking accuracy. Further, we design a new detection method for facial regions based on 68 key points, and use these facial regions to evaluate the drivers' state. By combining the features of the eyes and mouth, DriCare can alert the driver with a fatigue warning. The experimental results showed that DriCare achieved around 92% accuracy.
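
The abstract says eye closure is evaluated from regions defined by 68 facial key points. One common landmark-based measure for this, shown below, is the eye aspect ratio (EAR); whether DriCare uses exactly this formula is an assumption made for illustration, as is the blink threshold.

```python
import numpy as np


def eye_aspect_ratio(eye):
    """Eye aspect ratio (EAR) from the six landmarks of one eye, ordered
    p1..p6 around the eye contour as in the common 68-point scheme."""
    eye = np.asarray(eye, dtype=float)            # shape (6, 2)
    vert = np.linalg.norm(eye[1] - eye[5]) + np.linalg.norm(eye[2] - eye[4])
    horiz = np.linalg.norm(eye[0] - eye[3])
    return vert / (2.0 * horiz)


def is_blinking(ear, threshold=0.2):
    """Below-threshold EAR values are treated as a closed eye."""
    return ear < threshold


open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
closed_eye = [(0, 0), (1, 0.2), (2, 0.2), (3, 0), (2, -0.2), (1, -0.2)]
print(eye_aspect_ratio(open_eye), is_blinking(eye_aspect_ratio(closed_eye)))
```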

Journal ArticleDOI
01 Jul 2019
TL;DR: An analysis of audiovisual information for recognizing human emotions is presented, and the performance of the emotion recognition algorithm is compared against validation by human decision makers.
Abstract: People express emotions through different modalities. Utilizing both verbal and nonverbal communication channels allows the creation of a system in which the emotional state is expressed more clearly and is therefore easier to understand. Expanding the focus to several expression forms can facilitate research on emotion recognition as well as human–machine interaction. This article presents an analysis of audiovisual information to recognize human emotions. A cross-corpus evaluation is done using three different databases as the training set (SAVEE, eNTERFACE'05 and RML) and AFEW (a database simulating real-world conditions) as the testing set. Emotional speech is represented by commonly known audio and spectral features as well as MFCC coefficients. The SVM algorithm is used for classification. For facial expressions, faces in key frames are found using the Viola–Jones face detection algorithm, and facial image emotion classification is done by a CNN (AlexNet). Multimodal emotion recognition is based on decision-level fusion. The performance of the emotion recognition algorithm is compared against validation by human decision makers.
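
A minimal sketch of decision-level fusion, assuming both classifiers output per-class scores over the same emotion set: the scores are averaged with a fusion weight and the highest-scoring class is returned. The class list and fusion weight are illustrative assumptions.

```python
import numpy as np

EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]


def decision_level_fusion(p_audio, p_video, w_audio=0.5):
    """Fuses per-class scores from the speech classifier (SVM) and the
    face classifier (CNN) at decision level by a weighted average."""
    p_a = np.asarray(p_audio, dtype=float)
    p_v = np.asarray(p_video, dtype=float)
    fused = w_audio * p_a + (1.0 - w_audio) * p_v
    return EMOTIONS[int(np.argmax(fused))], fused / fused.sum()


p_speech = [0.1, 0.05, 0.15, 0.5, 0.1, 0.1]    # e.g. SVM scores mapped to probabilities
p_face = [0.2, 0.05, 0.05, 0.6, 0.05, 0.05]    # e.g. CNN softmax over key frames
print(decision_level_fusion(p_speech, p_face))
```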

Proceedings ArticleDOI
01 Jan 2019
TL;DR: In this article, the authors present an algorithm to automatically infer facial expressions by analyzing only a partially occluded face while the user is engaged in a virtual reality experience, which achieves a mean accuracy of 74% (F1 of 0.73) among five 'emotive' expressions and a mean accuracy of 70% (F1 of 0.68) among 10 distinct facial action units, outperforming human raters.
Abstract: One of the main challenges of social interaction in virtual reality settings is that head-mounted displays occlude a large portion of the face, blocking facial expressions and thereby restricting social engagement cues among users. We present an algorithm to automatically infer expressions by analyzing only a partially occluded face while the user is engaged in a virtual reality experience. Specifically, we show that images of the user's eyes captured from an IR gaze-tracking camera within a VR headset are sufficient to infer a subset of facial expressions without the use of any fixed external camera. Using these inferences, we can generate dynamic avatars in real-time which function as an expressive surrogate for the user. We propose a novel data collection pipeline as well as a novel approach for increasing CNN accuracy via personalization. Our results show a mean accuracy of 74% (F1 of 0.73) among 5 'emotive' expressions and a mean accuracy of 70% (F1 of 0.68) among 10 distinct facial action units, outperforming human raters.

Journal ArticleDOI
TL;DR: The results show that verbal instructions can readily overwrite the intrinsic meaning of facial emotions, with clear benefits for social communication as learning and anticipation of threat and safety readjusted to accurately track environmental changes.
Abstract: Facial expressions inform about other peoples' emotion and motivation and thus are central for social communication. However, the meaning of facial expressions may change depending on what we have learned about the related consequences. For instance, a smile might easily become threatening when displayed by a person who is known to be dangerous. The present study examined the malleability of emotional facial valence by means of social learning. To this end, facial expressions served as cues for verbally instructed threat-of-shock or safety (e.g., "happy faces cue shocks"). Moreover, reversal instructions tested the flexibility of threat/safety associations (e.g., "now happy faces cue safety"). Throughout the experiment, happy, neutral, and angry facial expressions were presented and auditory startle probes elicited defensive reflex activity. Results show that self-reported ratings and physiological reactions to threat/safety cues dissociate. Regarding threat and valence ratings, happy facial expressions tended to be more resistant to becoming a threat cue, and angry faces remained threatening even when instructed as a safety cue. For physiological response systems, however, we observed a threat-potentiated startle reflex and enhanced skin conductance responses for threat compared to safety cues, regardless of whether threat was cued by happy or angry faces. Thus, the incongruity of visual and verbal threat/safety information modulates conscious perception, but not the activation of physiological response systems. These results show that verbal instructions can readily overwrite the intrinsic meaning of facial emotions, with clear benefits for social communication, as learning and anticipation of threat and safety readjusted to accurately track environmental changes.

Journal ArticleDOI
17 Oct 2019-PLOS ONE
TL;DR: This study validates automated emotion and action unit (AU) coding applying FaceReader 7 to a dataset of standardized facial expressions of six basic emotions (Standardized and Motivated Facial Expressions of Emotion).
Abstract: This study validates automated emotion and action unit (AU) coding applying FaceReader 7 to a dataset of standardized facial expressions of six basic emotions (Standardized and Motivated Facial Expressions of Emotion). Percentages of correctly and falsely classified expressions are reported. The validity of coding AUs is provided by correlations between the automated analysis and manual Facial Action Coding System (FACS) scoring for 20 AUs. On average 80% of the emotional facial expressions are correctly classified. The overall validity of coding AUs is moderate with the highest validity indicators for AUs 1, 5, 9, 17 and 27. These results are compared to the performance of FaceReader 6 in previous research, with our results yielding comparable validity coefficients. Practical implications and limitations of the automated method are discussed.

Journal ArticleDOI
TL;DR: Extensive experiments on three public video-based facial expression datasets, i.e., BAUM-1s, RML, and MMI, show the effectiveness of the proposed method, which outperforms the state-of-the-art methods.
Abstract: One key challenging issue of facial expression recognition (FER) in video sequences is to extract discriminative spatiotemporal video features from facial expression images in video sequences. In this paper, we propose a new method of FER in video sequences via a hybrid deep learning model. The proposed method first employs two individual deep convolutional neural networks (CNNs), namely a spatial CNN processing static facial images and a temporal CNN processing optical flow images, to separately learn high-level spatial and temporal features on the divided video segments. These two CNNs are fine-tuned on target video facial expression datasets from a pre-trained CNN model. Then, the obtained segment-level spatial and temporal features are integrated into a deep fusion network built with a deep belief network (DBN) model. This deep fusion network is used to jointly learn discriminative spatiotemporal features. Finally, average pooling is performed on the learned DBN segment-level features in a video sequence to produce a fixed-length global video feature representation. Based on the global video feature representations, a linear support vector machine (SVM) is employed for facial expression classification. Extensive experiments on three public video-based facial expression datasets, i.e., BAUM-1s, RML, and MMI, show the effectiveness of our proposed method, which outperforms the state-of-the-art methods.
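
The sketch below walks through the final stages under simplifying assumptions: per-segment spatial and temporal features are fused (here by plain concatenation standing in for the paper's DBN fusion), average-pooled into one video vector, and classified with a linear SVM. Feature sizes and the toy data are made up for illustration.

```python
import numpy as np
from sklearn.svm import LinearSVC


def video_representation(spatial_feats, temporal_feats):
    """Concatenates per-segment spatial and temporal CNN features (a
    stand-in for the DBN fusion), then average-pools over segments."""
    segments = np.concatenate([spatial_feats, temporal_feats], axis=1)  # (S, Ds+Dt)
    return segments.mean(axis=0)                                        # (Ds+Dt,)


# Toy data: 20 videos, 5 segments each, 64-d spatial + 64-d temporal features.
rng = np.random.default_rng(0)
X = np.stack([video_representation(rng.normal(size=(5, 64)), rng.normal(size=(5, 64)))
              for _ in range(20)])
y = rng.integers(0, 6, size=20)                   # six expression classes

clf = LinearSVC(C=1.0, max_iter=5000).fit(X, y)   # linear SVM on global video features
print(clf.predict(X[:3]))
```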

Journal ArticleDOI
01 Mar 2019-Pain
TL;DR: Health care professionals should use a more individualized approach to determining which pain-related facial responses an individual combines and aggregates to express pain, instead of erroneously searching for a uniform expression of pain.
Abstract: Facial expressions of pain are not undefined grimaces, but they convey specific information about the internal state of the individual in pain. With this systematic review, we aim to answer the question of which facial movements are displayed most consistently during pain. We searched for studies that used the Facial Action Coding System to analyze facial activity during pain in adults, and that report on distinct facial responses (action units [AUs]). Twenty-seven studies using experimental pain and 10 clinical pain studies were included. We synthesized the data by taking into consideration (1) the criteria used to define whether an AU is pain-related; (2) types of pain; and (3) the cognitive status of the individuals. When AUs were selected as being pain-related based on a "pain > baseline" increase, a consistent subset of pain-related AUs emerged across studies: lowering the brows (AU4), cheek raise/lid tightening (AUs6_7), nose wrinkling/raising the upper lip (AUs9_10), and opening of the mouth (AUs25_26_27). This subset was found independently of the cognitive status of the individuals and was stable across clinical and experimental pain with only one variation, namely that eye closure (AU43) occurred more frequently during clinical pain. This subset of pain-related facial responses seems to encode the essential information about pain available in the face. However, given that these pain-related AUs are most often not displayed all at once, but are differently combined, health care professionals should use a more individualized approach, determining which pain-related facial responses an individual combines and aggregates to express pain, instead of erroneously searching for a uniform expression of pain.

Journal ArticleDOI
TL;DR: Why tests of a basic-six model of emotion are not tests of the diagnostic value of facial expression more generally are discussed, and an alternative conceptual and methodological approach is offered that reveals a richer taxonomy of emotion.
Abstract: What would a comprehensive atlas of human emotions include? For 50 years, scientists have sought to map emotion-related experience, expression, physiology, and recognition in terms of the "basic six": anger, disgust, fear, happiness, sadness, and surprise. Claims about the relationships between these six emotions and prototypical facial configurations have provided the basis for a long-standing debate over the diagnostic value of expression (for a review and the latest installment in this debate, see Barrett et al., p. 1). Building on recent empirical findings and methodologies, we offer an alternative conceptual and methodological approach that reveals a richer taxonomy of emotion. Dozens of distinct varieties of emotion are reliably distinguished by language, evoked in distinct circumstances, and perceived in distinct expressions of the face, body, and voice. Traditional models, both the basic six and the affective-circumplex model (valence and arousal), capture a fraction of the systematic variability in emotional response. In contrast, emotion-related responses (e.g., the smile of embarrassment, triumphant postures, sympathetic vocalizations, blends of distinct expressions) can be explained by richer models of emotion. Given these developments, we discuss why tests of a basic-six model of emotion are not tests of the diagnostic value of facial expression more generally. Determining the full extent of what facial expressions can tell us, marginally and in conjunction with other behavioral and contextual cues, will require mapping the high-dimensional, continuous space of facial, bodily, and vocal signals onto richly multifaceted experiences using large-scale statistical modeling and machine-learning methods.

Journal ArticleDOI
TL;DR: There is a need for future research that systematically analyses the impact of age and modality on the emergence of these valence effects, and it is found that children exhibit a clear positivity advantage for both word and face processing, indicating similar processing biases in both modalities.
Abstract: Emotional valence is predominately conveyed in social interactions by words and facial expressions. The existence of broad biases which favor more efficient processing of positive or negative emotions is still a controversial matter. While so far this question has been investigated separately for each modality, in this narrative review of the literature we focus on valence effects in processing both words and facial expressions. In order to identify the factors underlying positivity and negativity effects, and to uncover whether these effects depend on modality and age, we present and analyze three representative overviews of the literature concerning valence effects in word processing, face processing, and combinations of word and face processing. Our analysis of word processing studies points to a positivity bias or a balanced processing of positive and negative words, whereas the analysis of face processing studies showed the existence of separate positivity and negativity biases depending on the experimental paradigm. The mixed results seem to be a product of the different methods and types of stimuli being used. Interestingly, we found that children exhibit a clear positivity advantage for both word and face processing, indicating similar processing biases in both modalities. Over the course of development, the initial positivity advantage gradually disappears, and in some face processing studies even reverses into a negativity bias. We therefore conclude that there is a need for future research that systematically analyses the impact of age and modality on the emergence of these valence effects. Finally, we discuss possible explanations for the presence of the early positivity advantage and its subsequent decrease.

Journal ArticleDOI
TL;DR: According to the findings of this research, multi-modal emotion recognition systems that fuse information such as facial expressions, body gestures, and users' messages are more effective than single-modal ones.

Journal ArticleDOI
Guanbin Li, Xin Zhu, Yirui Zeng, Qing Wang, Liang Lin
17 Jul 2019
TL;DR: This paper investigates how to integrate the semantic relationship propagation between AUs in a deep neural network framework to enhance the feature representation of facial regions, and proposes an AU semantic relationship embedded representation learning (SRERL) framework.
Abstract: Facial action unit (AU) recognition is a crucial task for facial expression analysis and has attracted extensive attention in the field of artificial intelligence and computer vision. Existing works have either focused on designing or learning complex regional feature representations, or delved into various types of AU relationship modeling. Albeit with varying degrees of progress, it is still arduous for existing methods to handle complex situations. In this paper, we investigate how to integrate the semantic relationship propagation between AUs in a deep neural network framework to enhance the feature representation of facial regions, and propose an AU semantic relationship embedded representation learning (SRERL) framework. Specifically, by analyzing the symbiosis and mutual exclusion of AUs in various facial expressions, we organize the facial AUs in the form of a structured knowledge graph and integrate a Gated Graph Neural Network (GGNN) into a multi-scale CNN framework to propagate node information through the graph for generating enhanced AU representations. As the learned feature involves both the appearance characteristics and the AU relationship reasoning, the proposed model is more robust and can cope with more challenging cases, e.g., illumination change and partial occlusion. Extensive experiments on two public benchmarks demonstrate that our method outperforms previous work and achieves state-of-the-art performance.
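
To make the graph-propagation step concrete, here is a minimal GGNN-style sketch: each AU node aggregates messages from its neighbours according to a relation matrix and updates its state with a GRU cell. The relation matrix, feature sizes, and single-step propagation are assumptions; the paper's structured knowledge graph and multi-scale CNN are not reproduced here.

```python
import torch
import torch.nn as nn


class AURelationPropagation(nn.Module):
    """One gated message-passing step over an AU relation graph."""

    def __init__(self, feat_dim=64):
        super().__init__()
        self.msg = nn.Linear(feat_dim, feat_dim)
        self.gru = nn.GRUCell(feat_dim, feat_dim)

    def forward(self, node_feats, adjacency):
        # node_feats: (N_au, D); adjacency: (N_au, N_au), rows roughly sum to 1
        messages = adjacency @ self.msg(node_feats)   # aggregate neighbour information
        return self.gru(messages, node_feats)         # gated node update


if __name__ == "__main__":
    n_au, d = 12, 64
    feats = torch.randn(n_au, d)                         # per-AU features from a CNN
    adj = torch.softmax(torch.randn(n_au, n_au), dim=1)  # stand-in relation graph
    print(AURelationPropagation(d)(feats, adj).shape)    # torch.Size([12, 64])
```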

Journal ArticleDOI
TL;DR: A new deep manifold learning network is proposed, called Deep Bi-Manifold CNN, to learn the discriminative feature for multi-label expressions by jointly preserving the local affinity of deep features and the manifold structures of emotion labels.
Abstract: Comprehending different categories of facial expressions plays a great role in the design of computational models analyzing human perceived and affective states. Authoritative studies have revealed that facial expressions in human daily life occur in multiple or co-occurring mental states. However, due to the lack of valid datasets, most previous studies are still restricted to basic emotions with a single label. In this paper, we present a novel multi-label facial expression database, RAF-ML, along with a new deep learning algorithm, to address this problem. Specifically, a crowdsourcing annotation of 1.2 million labels from 315 participants was implemented to identify the multi-label expressions collected from social networks, and an EM algorithm was designed to filter out unreliable labels. To the best of our knowledge, RAF-ML is the first in-the-wild database that provides crowdsourced annotations for multi-label expressions. Focusing on the ambiguity and continuity of blended expressions, we propose a new deep manifold learning network, called Deep Bi-Manifold CNN, to learn the discriminative feature for multi-label expressions by jointly preserving the local affinity of deep features and the manifold structures of emotion labels. Furthermore, a deep domain adaptation method is leveraged to extend the deep manifold features learned from RAF-ML to other expression databases under various imaging conditions and cultures. Extensive experiments on RAF-ML and other diverse databases (JAFFE, CK+, SFEW and MMI) show that the deep manifold feature is not only superior for multi-label expression recognition in the wild, but also captures the elemental and generic components that are effective for a wide range of expression recognition tasks.

Proceedings ArticleDOI
01 Dec 2019
TL;DR: This work proposes a novel FER framework, named Facial Motion Prior Networks (FMPN), which introduces an additional branch to generate a facial mask so as to focus on facial muscle moving regions.
Abstract: Deep learning based facial expression recognition (FER) has received a lot of attention in the past few years. Most of the existing deep learning based FER methods do not consider domain knowledge well, and thereby fail to extract representative features. In this work, we propose a novel FER framework, named Facial Motion Prior Networks (FMPN). In particular, we introduce an additional branch to generate a facial mask so as to focus on the moving regions of facial muscles. To guide the facial mask learning, we propose to incorporate prior domain knowledge by using the average differences between neutral faces and the corresponding expressive faces as the training guidance. Extensive experiments on three facial expression benchmark datasets demonstrate the effectiveness of the proposed method compared with state-of-the-art approaches.
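
A small sketch of how the training guidance for the mask branch could be built from the prior described in the abstract: average the absolute differences between aligned expressive faces and their neutral counterparts and normalise the result to [0, 1]. Any smoothing or per-class grouping the paper applies is omitted, and the toy data is random.

```python
import numpy as np


def average_motion_mask(expressive_imgs, neutral_imgs):
    """Builds a pseudo ground-truth facial-motion mask as the average
    absolute difference between expressive and neutral faces."""
    exp = np.asarray(expressive_imgs, dtype=float)   # (N, H, W) grayscale, aligned
    neu = np.asarray(neutral_imgs, dtype=float)      # (N, H, W)
    diff = np.abs(exp - neu).mean(axis=0)            # average moving-region map
    return (diff - diff.min()) / (diff.ptp() + 1e-8)


rng = np.random.default_rng(0)
neutral = rng.uniform(0, 255, size=(8, 64, 64))
expressive = neutral + rng.normal(0, 5, size=(8, 64, 64))
mask = average_motion_mask(expressive, neutral)
print(mask.shape, float(mask.max()))                 # (64, 64) ~1.0
```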