Open Access · Journal Article

Deep Facial Expression Recognition: A Survey

- 01 Jul 2022
- Vol. 13, Iss. 3, pp. 1195–1215
TLDR
This article provides a comprehensive review of deep facial expression recognition (FER), including the datasets and algorithms that give insight into its intrinsic problems; the authors introduce the datasets widely used in the literature and provide accepted data selection and evaluation principles for them.
Abstract
With the transition of facial expression recognition (FER) from laboratory-controlled to challenging in-the-wild conditions and the recent success of deep learning techniques in various fields, deep neural networks have increasingly been leveraged to learn discriminative representations for automatic FER. Recent deep FER systems generally focus on two important issues: overfitting caused by a lack of sufficient training data and expression-unrelated variations, such as illumination, head pose, and identity bias. In this survey, we provide a comprehensive review of deep FER, including datasets and algorithms that provide insights into these intrinsic problems. First, we introduce the available datasets that are widely used in the literature and provide accepted data selection and evaluation principles for these datasets. We then describe the standard pipeline of a deep FER system with the related background knowledge and suggestions for applicable implementations for each stage. For the state-of-the-art in deep FER, we introduce existing novel deep neural networks and related training strategies that are designed for FER based on both static images and dynamic image sequences and discuss their advantages and limitations. Competitive performances and experimental comparisons on widely used benchmarks are also summarized. We then extend our survey to additional related issues and application scenarios. Finally, we review the remaining challenges and corresponding opportunities in this field as well as future directions for the design of robust deep FER systems.



Citations
Journal ArticleDOI

Emotion recognition from facial images with simultaneous occlusion, pose and illumination variations using meta-learning

TL;DR: The proposed method, ERMOPI (Emotion Recognition using Meta-learning across Occlusion, Pose and Illumination), recognizes emotion from facial expressions in still images using a meta-learning approach; its robustness to partial occlusions, varying head poses, and varying illumination levels is the novelty of this work.
Journal ArticleDOI

Facial Expression Recognition With Visual Transformers and Attentional Selective Fusion

TL;DR: Wang et al. propose Visual Transformers with Feature Fusion (VTFF) to tackle FER in the wild through two main steps: attentional selective fusion (ASF) and global self-attention.
Journal ArticleDOI

A Spontaneous Driver Emotion Facial Expression (DEFE) Dataset for Intelligent Vehicles: Emotions Triggered by Video-audio Clips in Driving Scenarios

TL;DR: Significant differences in Action Unit (AU) presence were found between facial expressions in driving and non-driving scenarios, indicating that human emotional expression while driving differs from that in other life scenarios; publishing a human emotion dataset specific to drivers is therefore necessary for improving traffic safety.
Journal ArticleDOI

The Multimodal Sentiment Analysis in Car Reviews (MuSe-CaR) Dataset: Collection, Insights and Improvements

TL;DR: MuSe-CaR is a large-scale multimodal dataset for sentiment and emotion research that includes audio-visual and language modalities; it has served as the testbed for the 1st Multimodal Sentiment Analysis Challenge.
Journal ArticleDOI

A Deep Multiscale Spatiotemporal Network for Assessing Depression from Facial Dynamics

TL;DR: This paper proposes a multiscale spatiotemporal network (MSN) to represent facial information related to depressive behaviors; the network is composed of parallel convolutional layers with different temporal depths and receptive-field sizes.
References
Proceedings ArticleDOI

Deep Residual Learning for Image Recognition

TL;DR: The authors propose a residual learning framework that eases the training of networks substantially deeper than those used previously; it won 1st place on the ILSVRC 2015 classification task.
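The core idea of residual learning summarized above can be sketched in a few lines: instead of learning a mapping H(x) directly, a block learns a residual F(x) and outputs F(x) + x via an identity shortcut. The following NumPy sketch (fully connected layers as a stand-in for the paper's convolutional layers; all names are illustrative, not from the paper) shows why this eases optimization: when the weights are near zero, the block is close to the identity.

```python
import numpy as np

def residual_block(x, weight1, weight2):
    """One residual unit: output = ReLU(F(x) + x), where F(x) is two
    linear layers with a ReLU in between (a minimal stand-in for the
    convolutional residual mapping in the paper)."""
    relu = lambda v: np.maximum(v, 0.0)
    f = relu(x @ weight1) @ weight2   # the residual mapping F(x)
    return relu(f + x)                # identity shortcut, added before the final ReLU

# With zero weights F(x) = 0, so the block reduces to the identity for
# non-negative inputs -- deep stacks of such blocks start out easy to train.
x = np.array([1.0, 2.0, 3.0])
w = np.zeros((3, 3))
print(residual_block(x, w, w))  # → [1. 2. 3.]
```

This is only a sketch of the shortcut-connection principle; the actual architecture uses stacked convolutional layers, batch normalization, and projection shortcuts where dimensions change.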
Proceedings ArticleDOI

Going deeper with convolutions

TL;DR: Inception is a deep convolutional neural network architecture that achieved a new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Proceedings ArticleDOI

Rapid object detection using a boosted cascade of simple features

TL;DR: A machine learning approach for visual object detection that processes images extremely rapidly while achieving high detection rates; a key contribution is a new image representation called the "integral image", which allows the features used by the detector to be computed very quickly.
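The "integral image" mentioned above is a summed-area table: each cell stores the sum of all pixels above and to the left, so the sum of any axis-aligned rectangle (and hence any Haar-like feature) can be computed with at most four lookups. A minimal NumPy sketch, with function names chosen here for illustration:

```python
import numpy as np

def integral_image(img):
    """Summed-area table: ii[y, x] = sum of img[0:y+1, 0:x+1]."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, bottom, right):
    """Sum of img[top:bottom+1, left:right+1] in O(1) via four lookups."""
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]       # strip above the rectangle
    if left > 0:
        total -= ii[bottom, left - 1]     # strip left of the rectangle
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]    # corner subtracted twice, add back
    return total

img = np.arange(16).reshape(4, 4)
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 2, 2))  # sum of [[5, 6], [9, 10]] → 30
```

Because every rectangle sum costs constant time regardless of its size, the detector can evaluate its rectangular features densely across the whole image at video rates, which is what the TL;DR's "computed very quickly" refers to.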
Proceedings ArticleDOI

Object recognition from local scale-invariant features

TL;DR: Experimental results show that robust object recognition can be achieved in cluttered, partially occluded images with a computation time of under 2 seconds.
Book ChapterDOI

Visualizing and Understanding Convolutional Networks

TL;DR: A novel visualization technique gives insight into the function of intermediate feature layers and the operation of the classifier in large convolutional network models; used in a diagnostic role, it helps find model architectures that outperform Krizhevsky et al. on the ImageNet classification benchmark.