
Showing papers on "Facial Action Coding System published in 2019"


DatasetDOI
14 Jan 2019

3,663 citations


Journal ArticleDOI
TL;DR: This paper systematically reviews all components of such systems: pre-processing, feature extraction, and machine coding of facial actions; the existing FACS-coded facial expression databases are also summarised.
Abstract: As one of the most comprehensive and objective ways to describe facial expressions, the Facial Action Coding System (FACS) has recently received significant attention. Over the past 30 years, extensive research has been conducted by psychologists and neuroscientists on various aspects of facial expression analysis using FACS. Automating FACS coding would make this research faster and more widely applicable, opening up new avenues to understanding how we communicate through facial expressions. Such an automated process can also potentially increase the reliability, precision and temporal resolution of coding. This paper provides a comprehensive survey of research into machine analysis of facial actions. We systematically review all components of such systems: pre-processing, feature extraction and machine coding of facial actions. In addition, the existing FACS-coded facial expression databases are summarised. Finally, challenges that have to be addressed to make automatic facial action analysis applicable in real-life situations are extensively discussed. There are two underlying motivations for us to write this survey paper: the first is to provide an up-to-date review of the existing literature, and the second is to offer some insights into the future of machine recognition of facial actions: what are the challenges and opportunities that researchers in the field face.

257 citations


Posted Content
TL;DR: Diversity in Faces (DiF) provides a data set of one million annotated human face images for advancing the study of facial diversity; the authors believe that making the extracted coding schemes available on a large set of faces can accelerate research and development towards creating fairer and more accurate facial recognition systems.
Abstract: Face recognition is a long-standing challenge in the field of Artificial Intelligence (AI). The goal is to create systems that accurately detect, recognize, verify, and understand human faces. There are significant technical hurdles in making these systems accurate, particularly in unconstrained settings due to confounding factors related to pose, resolution, illumination, occlusion, and viewpoint. However, with recent advances in neural networks, face recognition has achieved unprecedented accuracy, largely built on data-driven deep learning methods. While this is encouraging, a critical aspect that is limiting facial recognition accuracy and fairness is inherent facial diversity. Every face is different. Every face reflects something unique about us. Aspects of our heritage - including race, ethnicity, culture, geography - and our individual identity - age, gender, and other visible manifestations of self-expression, are reflected in our faces. We expect face recognition to work equally accurately for every face. Face recognition needs to be fair. As we rely on data-driven methods to create face recognition technology, we need to ensure necessary balance and coverage in training data. However, there are still scientific questions about how to represent and extract pertinent facial features and quantitatively measure facial diversity. Towards this goal, Diversity in Faces (DiF) provides a data set of one million annotated human face images for advancing the study of facial diversity. The annotations are generated using ten well-established facial coding schemes from the scientific literature. The facial coding schemes provide human-interpretable quantitative measures of facial features. We believe that by making the extracted coding schemes available on a large set of faces, we can accelerate research and development towards creating more fair and accurate facial recognition systems.

141 citations


Proceedings ArticleDOI
Xuesong Niu, Hu Han, Songfan Yang, Yan Huang, Shiguang Shan
15 Jun 2019
TL;DR: This work proposes a novel AU detection method by utilizing local information and the relationship of individual local face regions, which outperforms the state-of-the-art methods on two widely used AU detection datasets in the public domain.
Abstract: Encoding individual facial expressions via action units (AUs) coded by the Facial Action Coding System (FACS) has been found to be an effective approach in resolving the ambiguity issue among different expressions. While a number of methods have been proposed for AU detection, robust AU detection in the wild remains a challenging problem because of the diverse baseline AU intensities across individual subjects, and the weakness of appearance signal of AUs. To resolve these issues, in this work, we propose a novel AU detection method by utilizing local information and the relationship of individual local face regions. Through such a local relationship learning, we expect to utilize rich local information to improve the AU detection robustness against the potential perceptual inconsistency of individual local regions. In addition, considering the diversity in the baseline AU intensities of individual subjects, we further regularize local relationship learning via person-specific face shape information, i.e., reducing the influence of person-specific shape information, and obtaining more AU discriminative features. The proposed approach outperforms the state-of-the-art methods on two widely used AU detection datasets in the public domain (BP4D and DISFA).

103 citations


Journal ArticleDOI
17 Oct 2019-PLOS ONE
TL;DR: This study validates automated emotion and action unit (AU) coding applying FaceReader 7 to a dataset of standardized facial expressions of six basic emotions (Standardized and Motivated Facial Expressions of Emotion).
Abstract: This study validates automated emotion and action unit (AU) coding applying FaceReader 7 to a dataset of standardized facial expressions of six basic emotions (Standardized and Motivated Facial Expressions of Emotion). Percentages of correctly and falsely classified expressions are reported. The validity of coding AUs is provided by correlations between the automated analysis and manual Facial Action Coding System (FACS) scoring for 20 AUs. On average 80% of the emotional facial expressions are correctly classified. The overall validity of coding AUs is moderate with the highest validity indicators for AUs 1, 5, 9, 17 and 27. These results are compared to the performance of FaceReader 6 in previous research, with our results yielding comparable validity coefficients. Practical implications and limitations of the automated method are discussed.
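
The validation logic described above reduces to two simple computations: per-AU agreement between automated and manual coding, and overall emotion classification accuracy. The sketch below, with placeholder arrays standing in for FaceReader output and manual FACS scores, is only an illustration of that logic, not the authors' analysis code.

```python
# Illustrative validation sketch: per-AU correlation between automated and
# manual coding, plus overall emotion classification accuracy.
# Array names, shapes, and data are placeholder assumptions.
import numpy as np
from scipy.stats import pearsonr

def per_au_validity(auto_scores: np.ndarray, manual_scores: np.ndarray):
    """auto_scores, manual_scores: (n_images, n_aus) AU intensity estimates."""
    return [pearsonr(auto_scores[:, j], manual_scores[:, j])[0]
            for j in range(auto_scores.shape[1])]

def classification_accuracy(predicted_emotions, true_emotions):
    predicted = np.asarray(predicted_emotions)
    true = np.asarray(true_emotions)
    return float((predicted == true).mean())

# Example with random placeholder data for 200 images and 20 AUs.
rng = np.random.default_rng(0)
auto = rng.random((200, 20))
manual = np.clip(auto + rng.normal(0, 0.2, (200, 20)), 0, 1)
print(per_au_validity(auto, manual)[:5])
```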

89 citations


Journal ArticleDOI
01 Mar 2019-Pain
TL;DR: Health care professionals should use a more individualized approach to determining which pain-related facial responses an individual combines and aggregates to express pain, instead of erroneously searching for a uniform expression of pain.
Abstract: Facial expressions of pain are not undefined grimaces, but they convey specific information about the internal state of the individual in pain. With this systematic review, we aim to answer the question of which facial movements are displayed most consistently during pain. We searched for studies that used the Facial Action Coding System to analyze facial activity during pain in adults, and that report on distinct facial responses (action units [AUs]). Twenty-seven studies using experimental pain and 10 clinical pain studies were included. We synthesized the data by taking into consideration (1) the criteria used to define whether an AU is pain-related; (2) types of pain; and (3) the cognitive status of the individuals. When AUs were selected as being pain-related based on a "pain > baseline" increase, a consistent subset of pain-related AUs emerged across studies: lowering the brows (AU4), cheek raise/lid tightening (AUs6_7), nose wrinkling/raising the upper lip (AUs9_10), and opening of the mouth (AUs25_26_27). This subset was found independently of the cognitive status of the individuals and was stable across clinical and experimental pain with only one variation, namely that eye closure (AU43) occurred more frequently during clinical pain. This subset of pain-related facial responses seems to encode the essential information about pain available in the face. However, given that these pain-related AUs are most often not displayed all at once, but are differently combined, health care professionals should use a more individualized approach, determining which pain-related facial responses an individual combines and aggregates to express pain, instead of erroneously searching for a uniform expression of pain.
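
As an illustration of the "pain > baseline" selection criterion described above, the following sketch keeps an AU as pain-related only if its occurrence rate during pain exceeds its baseline rate; the AU codes and rates are placeholders, not values from the review.

```python
# Placeholder occurrence rates during pain and at baseline.
pain_rate = {"AU4": 0.62, "AU6": 0.55, "AU7": 0.58, "AU9": 0.34,
             "AU10": 0.31, "AU12": 0.20, "AU25": 0.41, "AU43": 0.18}
baseline_rate = {"AU4": 0.08, "AU6": 0.12, "AU7": 0.10, "AU9": 0.03,
                 "AU10": 0.04, "AU12": 0.22, "AU25": 0.15, "AU43": 0.05}

# Keep an AU only if it is displayed more often during pain than at baseline.
pain_related = [au for au in pain_rate if pain_rate[au] > baseline_rate[au]]
print(pain_related)  # AU12 drops out because it is not elevated above baseline
```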

83 citations


Journal ArticleDOI
25 Aug 2019-Sensors
TL;DR: The first study in the literature aiming to determine Depression Anxiety Stress Scale (DASS) levels by analyzing facial expressions with the Facial Action Coding System (FACS), using a unique noninvasive architecture designed to offer high accuracy and fast convergence.
Abstract: We present the first study in the literature that has aimed to determine Depression Anxiety Stress Scale (DASS) levels by analyzing facial expressions using Facial Action Coding System (FACS) by means of a unique noninvasive architecture on three layers designed to offer high accuracy and fast convergence: in the first layer, Active Appearance Models (AAM) and a set of multiclass Support Vector Machines (SVM) are used for Action Unit (AU) classification; in the second layer, a matrix is built containing the AUs’ intensity levels; and in the third layer, an optimal feedforward neural network (FFNN) analyzes the matrix from the second layer in a pattern recognition task, predicting the DASS levels. We obtained 87.2% accuracy for depression, 77.9% for anxiety, and 90.2% for stress. The average prediction time was 64 s, and the architecture could be used in real time, allowing health practitioners to evaluate the evolution of DASS levels over time. The architecture could discriminate with 93% accuracy between healthy subjects and those affected by Major Depressive Disorder (MDD) or Post-traumatic Stress Disorder (PTSD), and 85% for Generalized Anxiety Disorder (GAD). For the first time in the literature, we determined a set of correlations between DASS, induced emotions, and FACS, which led to an increase in accuracy of 5%. When tested on AVEC 2014 and ANUStressDB, the method offered 5% higher accuracy, sensitivity, and specificity compared to other state-of-the-art methods.
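
A rough, non-authoritative sketch of the three-layer idea described above is shown below, with scikit-learn stand-ins (one multiclass SVM per AU, an MLP in place of the paper's optimal FFNN) and placeholder arrays in place of the Active Appearance Model features.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n_frames, n_features, n_aus = 500, 68, 17

# Layer 1: one multiclass SVM per AU predicts its intensity level (0-5).
aam_features = rng.random((n_frames, n_features))        # placeholder AAM features
au_labels = rng.integers(0, 6, (n_frames, n_aus))         # placeholder FACS labels
au_classifiers = [SVC().fit(aam_features, au_labels[:, j]) for j in range(n_aus)]

# Layer 2: matrix of predicted AU intensity levels, one row per frame.
au_matrix = np.column_stack([clf.predict(aam_features) for clf in au_classifiers])

# Layer 3: a feedforward network maps AU-intensity rows to DASS severity levels.
dass_levels = rng.integers(0, 5, n_frames)                 # placeholder targets
ffnn = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300).fit(au_matrix, dass_levels)
print(ffnn.predict(au_matrix[:5]))
```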

66 citations


Journal ArticleDOI
01 Apr 2019
TL;DR: The proposed method is proven to be effective for emotion recognition; it employs a regularization method called "dropout" in the CNN fully-connected layers, which proved very effective in reducing overfitting.
Abstract: Automatic facial expression recognition is an actively emerging research area in Emotion Recognition. This paper extends the deep Convolutional Neural Network (CNN) approach to the facial expression recognition task. The task is performed by detecting the occurrence of facial Action Units (AUs), a subpart of the Facial Action Coding System (FACS), which represent human emotion. In the CNN fully-connected layers we employ a regularization method called "dropout" that proved to be very effective in reducing overfitting. This research uses the extended Cohn-Kanade (CK+) dataset, which was collected for facial expression recognition experiments. The system achieves an average accuracy rate of 92.81% and successfully classifies eight basic emotion classes. Thus, the proposed method is proven to be effective for emotion recognition.
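
A minimal PyTorch sketch of the kind of CNN described above follows: convolutional feature extraction, fully-connected layers with dropout, and an 8-way emotion classifier. Layer sizes and the 48x48 input resolution are assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class ExpressionCNN(nn.Module):
    def __init__(self, n_classes: int = 8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 12 * 12, 256), nn.ReLU(),
            nn.Dropout(0.5),                    # dropout in the FC layers to curb overfitting
            nn.Linear(256, n_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = ExpressionCNN()
logits = model(torch.randn(4, 1, 48, 48))       # batch of 4 grayscale 48x48 faces
print(logits.shape)                              # torch.Size([4, 8])
```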

54 citations


Journal ArticleDOI
TL;DR: The limits and strengths of traditional and deep-learning FER techniques are analyzed, intending to provide the research community with an overview of the results obtained and a look toward the near future.
Abstract: In recent years, facial expression analysis and recognition (FER) have emerged as an active research topic with applications in several different areas, including the human-computer interaction domain. Solutions based on 2D models are not entirely satisfactory for real-world applications, as they present some problems of pose variations and illumination related to the nature of the data. Thanks to technological development, 3D facial data, both still images and video sequences, have become increasingly used to improve the accuracy of FER systems. Despite the advance in 3D algorithms, these solutions still have some drawbacks that make pure three-dimensional techniques convenient only for a set of specific applications; a viable solution to overcome such limitations is adopting a multimodal 2D+3D analysis. In this paper, we analyze the limits and strengths of traditional and deep-learning FER techniques, intending to provide the research community with an overview of the results obtained and a look toward the near future. Furthermore, we describe in detail the most used databases to address the problem of facial expressions and emotions, highlighting the results obtained by the various authors. The different techniques used are compared, and some conclusions are drawn concerning the best recognition rates achieved.

53 citations


Journal ArticleDOI
29 Mar 2019
TL;DR: W!NCE re-purposes a commercially available Electrooculography-based eyeglass for continuous and unobtrusive sensing of upper facial action units with high fidelity; its applicability is validated through extensive evaluation on data from 17 users under stationary and ambulatory settings.
Abstract: The ability to unobtrusively and continuously monitor one's facial expressions has implications for a variety of application domains ranging from affective computing to health-care and the entertainment industry. The standard Facial Action Coding System (FACS) along with camera-based methods have been shown to provide objective indicators of facial expressions; however, these approaches can also be fairly limited for mobile applications due to privacy concerns and awkward positioning of the camera. To bridge this gap, W!NCE re-purposes a commercially available Electrooculography-based eyeglass (J!NS MEME) for continuous and unobtrusive sensing of upper facial action units with high fidelity. W!NCE detects facial gestures using a two-stage processing pipeline involving motion artifact removal and facial action detection. We validate our system's applicability through extensive evaluation on data from 17 users under stationary and ambulatory settings, a pilot study for continuous pain monitoring, and several performance benchmarks. Our results are very encouraging, showing that we can detect five distinct facial action units with a mean F1 score of 0.88 in stationary and 0.82 in ambulatory settings, and that we can accurately detect facial gestures that occur due to pain.

27 citations


Journal ArticleDOI
05 Feb 2019-PLOS ONE
TL;DR: The results show that CVML can both determine the importance of different facial actions that human coders use to derive positive and negative affective ratings when combined with interpretable machine learning methods, and efficiently automate positive and negative affect intensity coding on large facial expression databases.
Abstract: Facial expressions are fundamental to interpersonal communication, including social interaction, and allow people of different ages, cultures, and languages to quickly and reliably convey emotional information. Historically, facial expression research has followed from discrete emotion theories, which posit a limited number of distinct affective states that are represented with specific patterns of facial action. Much less work has focused on dimensional features of emotion, particularly positive and negative affect intensity. This is likely, in part, because achieving inter-rater reliability for facial action and affect intensity ratings is painstaking and labor-intensive. We use computer-vision and machine learning (CVML) to identify patterns of facial actions in 4,648 video recordings of 125 human participants, which show strong correspondences to positive and negative affect intensity ratings obtained from highly trained coders. Our results show that CVML can both (1) determine the importance of different facial actions that human coders use to derive positive and negative affective ratings when combined with interpretable machine learning methods, and (2) efficiently automate positive and negative affect intensity coding on large facial expression databases. Further, we show that CVML can be applied to individual human judges to infer which facial actions they use to generate perceptual emotion ratings from facial expressions.
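
A hedged sketch of the interpretable-pipeline idea described above: automated AU estimates regressed onto affect-intensity ratings, with feature importances indicating which facial actions drive the ratings. The data are random placeholders and the random forest is a stand-in for whatever CVML model the authors used.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
au_names = [f"AU{k}" for k in (1, 2, 4, 5, 6, 9, 10, 12, 15, 17, 20, 25, 26)]
X = rng.random((1000, len(au_names)))                # per-video mean AU evidence (placeholder)
y = 2.0 * X[:, au_names.index("AU12")] - 1.5 * X[:, au_names.index("AU4")] \
    + rng.normal(0, 0.1, 1000)                       # synthetic "positive affect" rating

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
ranked = sorted(zip(au_names, model.feature_importances_), key=lambda t: -t[1])
print(ranked[:3])   # AU12 (smile) and AU4 (brow lowerer) should dominate this toy example
```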

Journal ArticleDOI
TL;DR: AM-FED+ is presented, an extended dataset of naturalistic facial response videos collected in everyday settings that act as a challenging benchmark for automated facial coding systems.
Abstract: Public datasets have played a significant role in advancing the state-of-the-art in automated facial coding. Many of these datasets contain posed expressions and/or videos recorded in controlled lab conditions with little variation in lighting or head pose. As such, the data do not reflect the conditions observed in many real-world applications. We present AM-FED+, an extended dataset of naturalistic facial response videos collected in everyday settings. The dataset contains 1,044 videos of which 545 videos (263,705 frames or 21,859 seconds) have been comprehensively manually coded for facial action units. These videos act as a challenging benchmark for automated facial coding systems. All the videos contain gender labels and a large subset (77 percent) contain age and country information. Subject self-reported liking and familiarity with the stimuli are also included. We provide automated facial landmark detection locations for the videos. Finally, baseline action unit classification results are presented for the coded videos. The dataset is available to download online: https://www.affectiva.com/facial-expression-dataset/

Journal ArticleDOI
TL;DR: It is found that interpretative/confrontative interventions are associated with displays of contempt from both therapists and patients, and it is proposed that these seemingly contradictory results may be a consequence of the complexity of affects and the interplay of primary and secondary emotions with intervention type.
Abstract: Introduction: The significance of psychotherapeutic micro-processes, such as nonverbal facial expressions and relationship quality, is widely known, yet hitherto has not been investigated satisfactorily. In this exploratory study, we aim to examine the occurrence of micro-processes during psychotherapeutic treatment sessions, specifically facial micro-expressions, in order to shed light on their impact on psychotherapeutic interactions and patient-clinician relationships. Methods: In analyzing 22 video recordings of psychiatric interviews in a routine/acute psychiatric care unit of Vienna General Hospital, we were able to investigate clinicians' and patients' facial micro-expressions in conjunction with verbal interactions and intervention types. To this end, we employed the Emotion Facial Action Coding System (EmFACS), assessing the action units and microexpressions, and the Psychodynamic Intervention List (PIL). Also, the Working Alliance Inventory (WAI), assessed after each session by both patients and clinicians, provided information on the subjective quality of the clinician-patient relationship. Results: We found that interpretative/confrontative interventions are associated with displays of contempt from both therapists and patients. Interestingly, displays of contempt also correlated with higher WAI scores. We propose that these seemingly contradictory results may be a consequence of the complexity of affects and the interplay of primary and secondary emotions with intervention type. Conclusion: Interpretation, confrontation, and working through contemptuous microexpressions are major elements in the adequate control of major pathoplastic elements. Affect-cognitive interplay is an important mediator in the working alliance.

Proceedings ArticleDOI
28 May 2019
TL;DR: A framework for interactive mobile applications harnesses consumer hardware camera technology for facial feature extraction to enable emotion detection following the Facial Action Coding System; a study shows that emotional responses can be detected in three out of four cases and that they relate to usability problems.
Abstract: Tracking down usability problems poses a challenge for developers since users rarely report explicit feedback without being asked for it. Hence, implicit feedback represents a valuable information source, in particular for rapid development processes with frequent software releases. Users' emotions expressed by their facial expressions during interactions with the application can act as the source of such information. Recent development in consumer hardware offers mechanisms to efficiently detect facial expressions. We developed a framework for interactive mobile applications to harness consumer hardware camera technology for facial feature extraction to enable emotion detection following the facial action coding system. In a study with 12 participants, we evaluated its performance within a sample application that was seeded with usability problems. A qualitative analysis of the study results indicates that the framework is applicable for detecting user emotions from facial expressions. A quantitative analysis shows that emotional responses can be detected in three out of four cases and that they relate to usability problems. We conclude that, in combination with interaction events, the framework can support developers in the exploration of usability problems in interactive applications.

Journal ArticleDOI
TL;DR: To identify initial and later responses to surprising stimuli, two repetition-change studies were conducted; the general valence of facial expressions was coded using computerised facial coding and specific facial action using the Facial Action Coding System (FACS).
Abstract: Responses to surprising events are dynamic. We argue that initial responses are primarily driven by the unexpectedness of the surprising event and reflect an interrupted and surprised state in which the outcome does not make sense yet. Later responses, after sense-making, are more likely to incorporate the valence of the outcome itself. To identify initial and later responses to surprising stimuli, we conducted two repetition-change studies and coded the general valence of facial expressions using computerised facial coding and specific facial action using the Facial Action Coding System (FACS). Results partly supported our unfolding logic. The computerised coding showed that initial expressions to positive surprises were less positive than later expressions. Moreover, expressions to positive and negative surprises were initially similar, but after some time differentiated depending on the valence of the event. Importantly, these patterns were particularly pronounced in a subset of facially expressive participants, who also showed facial action in the FACS coding. The FACS data showed that the initial phase was characterised by limited facial action, whereas the later increase in positivity seems to be explained by smiling. Conceptual as well as methodological implications are discussed.

Proceedings ArticleDOI
01 Jan 2019
TL;DR: Six basic facial expressions are detected using the Six Facial Expressions Hexagon (SFEH) Model, which customizes only the varied region of the face using morphological operations at reasonable computational cost.
Abstract: Within the diverse field of image processing, facial expression detection is a tremendously interesting part of face recognition. In order to track and locate facial expressions, unusual movements of parts of the face such as the eyes, nose, mouth and cheeks must be observed. In this paper, six basic facial expressions are detected using the Six Facial Expressions Hexagon (SFEH) Model. The SFEH model provides a general representation of the six facial expressions on the six edges of a surface hexagon (S-Hex). The S-Hex is the outer boundary of the face and can be divided into three parts: an upper triangle, a middle rectangle and a lower triangle. Partitioning the facial features into three parts narrows down the processing area of the face where an expression originates and is very helpful for locating the varied region. Moreover, the Facial Action Coding System (FACS) and the Facial Animation Parameters System (FAPS) are used as an intermediate frame to analyze the proposed SFEH model. The scope of the proposed SFEH model is to customize only the varied region of the face using morphological operations at reasonable computational cost.

Journal ArticleDOI
TL;DR: A method for the real-time detection of AUs intensity in terms of the Facial Action Coding System scale is proposed, grounded on a novel and robust anatomically based facial representation strategy, for which features are registered from a different region of interest depending on the AU considered.
Abstract: Most research on facial expressions recognition has focused on binary Action Units (AUs) detection, while graded changes in their intensity have rarely been considered. This paper proposes a method for the real-time detection of AUs intensity in terms of the Facial Action Coding System scale. It is grounded on a novel and robust anatomically based facial representation strategy, for which features are registered from a different region of interest depending on the AU considered. Real-time processing is achieved by combining Histogram of Gradients descriptors with linear kernel Support Vector Machines. Following this method, AU intensity detection models are built and validated through the DISFA database, outperforming previous approaches without real-time capabilities. An in-depth evaluation through three different databases (DISFA, BP4D and UNBC Shoulder-Pain) further demonstrates that the proposed method generalizes well across datasets. This study also brings insights about existing public corpora and their impact on AU intensity prediction.
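
The feature/classifier pairing described above can be sketched with off-the-shelf components: Histogram of Gradients descriptors fed to a linear support vector regressor that predicts AU intensity on the 0-5 FACS scale. The AU-specific region-of-interest cropping is assumed to happen beforehand; images and labels below are placeholders.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVR

rng = np.random.default_rng(0)
rois = rng.random((200, 64, 64))                 # placeholder AU-specific face crops
intensities = rng.uniform(0, 5, 200)             # placeholder FACS intensity labels

# HOG descriptor per region of interest, then a linear SVR for intensity.
features = np.array([hog(img, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
                     for img in rois])
model = LinearSVR(max_iter=10000).fit(features, intensities)
print(model.predict(features[:3]))
```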

Journal ArticleDOI
TL;DR: The unusual or awkward patterns of facial emotional responses in ASD may hamper the recognition of affect in other people as well as the interaction partner's sense of interpersonal resonance, and thereby lead to social disadvantage in individuals with ASD.
Abstract: Background Reduced facial expressivity (flat affect) and deficits in nonverbal communicative behaviors are characteristic symptoms of autism spectrum disorder (ASD). Based on the important interpersonal functions of facial emotional responsiveness, the present study aimed at a comprehensive and differentiated analysis of perceptible facial behavior in response to another person's naturalistic, dynamic facial expressions of emotion. Methods In a group of 21 adolescent and adult individuals with High-Functioning autism spectrum disorder (HF-ASD) and in 21 matched healthy controls, we examined perceptible facial responses using the whole range of action units of the Facial Action Coding System (FACS) while participants were watching films displaying continuous, dynamic real-life facial expressions of four universal emotions (cheerfulness, anger, sadness, anxiety). The duration of the 80 s films was in the typical range of casual face-to-face interactions. Results Overall, the number of congruent facial muscle movements while watching the emotion-laden stimulus films did not differ between the two groups. However, the comprehensive FACS analysis indicated that participants with HF-ASD displayed less differentiated facial responses to the watched emotional expressions. Conclusions The unusual or awkward patterns of facial emotional responses in ASD may hamper the recognition of affect in other people as well as the interaction partner's sense of interpersonal resonance, and thereby lead to social disadvantage in individuals with ASD.

Proceedings ArticleDOI
01 Nov 2019
TL;DR: An end-to-end deep learning-based Automated Facial Expression Recognition (AFER) system is developed that jointly detects the complete set of AUs associated with pain.
Abstract: A new method to objectively measure pain using computer vision and machine learning technologies is presented. Our method seeks to capture facial expressions of pain to detect pain, especially when a patient cannot communicate pain verbally. This approach relies on using facial muscle-based Action Units (AUs), defined by the Facial Action Coding System (FACS), that are associated with pain. It is impractical to use human FACS coding experts in clinical settings to perform this task as it is too labor-intensive, and recent research has sought computer-based solutions to the problem. An effective automated system for performing the task is proposed here, in which we develop an end-to-end deep learning-based Automated Facial Expression Recognition (AFER) system that jointly detects the complete set of pain-related AUs. The facial video clip is processed frame by frame to estimate a vector of AU likelihood values for each frame using a deep convolutional neural network. The AU vectors are concatenated to form a table of AU values for a given video clip. Our results show significantly improved performance compared with those obtained with other known methods.
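
The frame-to-table step described above can be sketched as follows: a (here untrained, placeholder) network maps each frame to a vector of AU likelihoods, and the per-frame vectors are stacked into a clip-level table.

```python
import torch
import torch.nn as nn

n_aus = 9                                     # e.g., a set of pain-related AUs (assumption)
au_net = nn.Sequential(                       # stand-in for the paper's deep CNN
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(16, n_aus), nn.Sigmoid(),
)

video_clip = torch.randn(120, 3, 112, 112)    # 120 placeholder RGB frames
with torch.no_grad():
    au_table = torch.stack([au_net(frame.unsqueeze(0)).squeeze(0) for frame in video_clip])
print(au_table.shape)                         # torch.Size([120, 9]) -- frames x AU likelihoods
```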

Proceedings ArticleDOI
08 Jul 2019
TL;DR: In this article, the authors present a large dataset named FEAFA, which contains 99,356 frames recorded under real-world conditions from one hundred and twenty-two participants, with each action unit annotated with a floating-point number between 0 and 1.
Abstract: Facial expression analysis based on machine learning requires a large amount of well-annotated data to reflect different changes in facial motion, but all existing datasets, to the best of our knowledge, are limited to rough annotations for action units, including only their absence, presence, or a five-level intensity. To meet the need for videos labeled in great detail, we present a well-annotated dataset named FEAFA. One hundred and twenty-two participants were recorded in real-world conditions, and 99,356 frames were manually labeled using the Expression Quantitative Tool developed by us to quantify the re-defined action units according to the Facial Action Coding System. Each action unit is annotated with a floating-point number between 0 and 1. To provide a baseline for future research, a benchmark for the regression of action unit values based on Convolutional Neural Networks is presented. We also demonstrate the potential of FEAFA for 3D facial animation. Almost all state-of-the-art algorithms for facial animation are based on 3D face reconstruction. We hence propose a novel method that drives virtual characters based only on action unit value regression from the 2D video frames of source actors.
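
A hedged sketch of the baseline regression task described above: a CNN with an element-wise sigmoid output predicts continuous AU values in [0, 1] and is trained with a mean-squared-error loss. The architecture and sizes are assumptions, not the FEAFA baseline itself.

```python
import torch
import torch.nn as nn

class AUValueRegressor(nn.Module):
    def __init__(self, n_units: int = 24):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.head = nn.Sequential(nn.Linear(64, n_units), nn.Sigmoid())

    def forward(self, x):
        return self.head(self.backbone(x))    # each output lies in [0, 1]

model = AUValueRegressor()
frames = torch.randn(8, 3, 128, 128)          # placeholder face crops
targets = torch.rand(8, 24)                   # floating-point AU annotations (placeholder)
loss = nn.MSELoss()(model(frames), targets)
loss.backward()
print(float(loss))
```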

Journal ArticleDOI
TL;DR: A new approach is proposed that mimics the strategy of human coders by decoupling pain detection into two consecutive tasks, one performed at the individual video-frame level and the other at the video-sequence level, together with two novel data structures that encode AU combinations from single AU scores.
Abstract: Patient pain can be detected highly reliably from facial expressions using a set of facial muscle-based action units (AUs) defined by the Facial Action Coding System (FACS). A key characteristic of facial expression of pain is the simultaneous occurrence of pain-related AU combinations, whose automated detection would be highly beneficial for efficient and practical pain monitoring. Existing general Automated Facial Expression Recognition (AFER) systems prove inadequate when applied specifically for detecting pain: they either focus on detecting individual pain-related AUs but not on combinations, or they seek to bypass AU detection by training a binary pain classifier directly on pain intensity data but are limited by a lack of labeled data for satisfactory training. In this paper, we propose a new approach that mimics the strategy of human coders of decoupling pain detection into two consecutive tasks: one performed at frame level and the other at sequence level. Using state-of-the-art AFER tools to detect single AUs at the frame level, we propose two novel data structures to encode AU combinations. Two weakly supervised learning frameworks are employed to learn pain from video sequences. Experimental results show an 87% pain recognition accuracy with 0.94 AUC on the UNBC-McMaster dataset.
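
The paper's own data structures for encoding AU combinations are not reproduced here; the sketch below is only an illustrative stand-in that summarizes a sequence by how often pairs of pain-related AUs co-occur across frames, which is the general flavor of the frame-to-sequence decoupling described above.

```python
import numpy as np
from itertools import combinations

pain_aus = ["AU4", "AU6", "AU7", "AU9", "AU10", "AU43"]
rng = np.random.default_rng(0)
frame_scores = rng.random((150, len(pain_aus)))       # frame-level AU scores from an AFER tool (placeholder)
present = frame_scores > 0.5                          # per-frame AU presence (threshold is an assumption)

# Sequence-level descriptor: co-occurrence rate of each pain-related AU pair.
pair_counts = {}
for a, b in combinations(range(len(pain_aus)), 2):
    co_occurrence = np.logical_and(present[:, a], present[:, b]).mean()
    pair_counts[(pain_aus[a], pain_aus[b])] = round(float(co_occurrence), 3)

# Such a descriptor would then feed a weakly supervised sequence classifier.
print(sorted(pair_counts.items(), key=lambda kv: -kv[1])[:3])
```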

Proceedings ArticleDOI
01 Sep 2019
TL;DR: The new Actor Study Database is introduced to address the resulting need for reliable benchmark datasets and to provide real multi-view data that is not synthesized through perspective distortion.
Abstract: Over the last few decades, there has been an increasing call in the field of computer vision to use machine-learning techniques for the detection, categorization, and indexing of facial behaviors, as well as for the recognition of emotion phenomena. Automated Facial Expression Analysis has become a highly attractive field of competition for academic laboratories, startups and large technology corporations. This paper introduces the new Actor Study Database to address the resulting need for reliable benchmark datasets. The focus of the database is to provide real multi-view data that is not synthesized through perspective distortion. The database contains 68 minutes of high-quality videos of facial expressions performed by 21 actors. The videos are synchronously recorded from five different angles. The actors' tasks ranged from displaying specific Action Units and their combinations at different intensities to enactment of a variety of emotion scenarios. Over 1.5 million frames have been annotated and validated with the Facial Action Coding System by certified FACS coders. These attributes make the Actor Study Database particularly applicable in machine recognition studies as well as in psychological research into affective phenomena, whether prototypical basic emotions or subtle emotional responses. Two state-of-the-art systems were used to produce benchmark results for all five different views that this new database encompasses. The database is publicly available for non-commercial research.

Journal ArticleDOI
TL;DR: Geometric positions and optical flow are the key methods deployed in the implemented methodology, which estimates facial muscle movement by computing 24 landmark points, 16 mutual distances between them, and wrinkles caused by changing expressions.
Abstract: Recent times have witnessed an exponential increase in multimedia, specifically visual content. Emotions are considered an essential part of extracting facial features and evaluating expressions, and as a result, predicting the emotions of any person is a trending topic of the time. Based on still images and consecutive video frames, a methodology has been proposed to anticipate emotions. Facial Action Coding System (FACS) standards are utilised worldwide in the development of automated vision-based emotion detection systems. Employing FACS, the authors estimate facial muscle movement by computing 24 landmark points, 16 mutual distances between them, and wrinkles caused by changing expressions. Canny edge detection has been deployed to calculate the intensity of wrinkles. Geometric positions and optical flow are the key methods deployed in the implemented methodology. The methodology was evaluated on a self-generated dataset, the JAFFE dataset and EmotioNet.
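
The two feature families described above, mutual distances between landmark points and a wrinkle cue from Canny edges, can be sketched as follows; the landmark coordinates, the chosen point pairs and the forehead region are placeholders, not the authors' exact configuration.

```python
import numpy as np
from skimage.feature import canny

rng = np.random.default_rng(0)
landmarks = rng.uniform(0, 256, (24, 2))              # 24 placeholder (x, y) landmark points
pairs = [(0, 1), (2, 3), (4, 5), (6, 7)]              # a few of the 16 mutual distances (assumed pairs)
distances = [float(np.linalg.norm(landmarks[i] - landmarks[j])) for i, j in pairs]

face = rng.random((256, 256))                         # placeholder grayscale face image
forehead = face[20:80, 60:200]                        # assumed wrinkle region of interest
wrinkle_intensity = canny(forehead, sigma=2.0).mean() # fraction of edge pixels as a wrinkle cue

print(distances, round(float(wrinkle_intensity), 4))
```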

Proceedings ArticleDOI
01 Jan 2019
TL;DR: This work proposes a metric-based intensity estimation mechanism for primary emotions and a deep hybrid convolutional neural network-based approach to recognise the defined intensities of the primary emotions from spontaneous and posed sequences; the intensity estimation approach is further extended to detect the basic emotions.
Abstract: Detecting the emotional states of humans from videos is essential in order to automate the process of profiling human behaviour, which has applications in a variety of domains, such as social, medical and behavioural science. Considerable research has been carried out on the binary classification of emotions using facial expressions. However, a challenge exists in automating the feature extraction process to recognise the various intensities or levels of emotions. The intensity information of emotions is essential for tasks such as sentiment analysis. In this work, we propose a metric-based intensity estimation mechanism for primary emotions, and a deep hybrid convolutional neural network-based approach to recognise the defined intensities of the primary emotions from spontaneous and posed sequences. Further, we extend the intensity estimation approach to detect the basic emotions. The frame-level Facial Action Coding System annotations and the intensities of the action units associated with each primary emotion are considered for deriving the various intensity levels of emotions. The evaluation on benchmark datasets demonstrates that our proposed approach is capable of correctly classifying the various intensity levels of emotions as well as detecting them.
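
An illustrative sketch (not the authors' exact metric) of deriving an emotion intensity level from the intensities of its associated AUs: average the relevant AU intensities and bin the result into discrete levels. The emotion-to-AU mapping and bin edges are assumptions.

```python
import numpy as np

emotion_aus = {"happiness": ["AU6", "AU12"], "surprise": ["AU1", "AU2", "AU5", "AU26"]}
frame_au_intensity = {"AU1": 2.0, "AU2": 1.0, "AU5": 3.0, "AU6": 4.0,
                      "AU12": 5.0, "AU26": 2.0}       # placeholder FACS intensities on a 0-5 scale

def intensity_level(emotion: str) -> str:
    # Average the intensities of the AUs associated with this emotion, then bin.
    mean_intensity = np.mean([frame_au_intensity[au] for au in emotion_aus[emotion]])
    bins, labels = [1.5, 3.0, 4.5], ["low", "medium", "high", "peak"]
    return labels[int(np.digitize(mean_intensity, bins))]

print(intensity_level("happiness"), intensity_level("surprise"))
```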

Proceedings ArticleDOI
01 Sep 2019
TL;DR: This work explores the relationship between facial muscle movements and speech signals, evaluating the efficacy of different sequence-to-sequence neural network architectures for the task of predicting Facial Action Coding System Action Units (AUs) from one of two acoustic feature representations extracted from speech signals.
Abstract: Multimodal data sources offer the possibility to capture and model interactions between modalities, leading to an improved understanding of underlying relationships. In this regard, the work presented in this paper explores the relationship between facial muscle movements and speech signals. Specifically, we explore the efficacy of different sequence-to-sequence neural network architectures for the task of predicting Facial Action Coding System Action Units (AUs) from one of two acoustic feature representations extracted from speech signals, namely the extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPs) or the Interspeech Computational Paralinguistics Challenge features set (ComParE). Furthermore, these architectures were enhanced by two different attention mechanisms (intra- and inter-attention) and various state-of-the-art network settings to improve prediction performance. Results indicate that a sequence-to-sequence model with inter-attention can achieve on average an Unweighted Average Recall (UAR) of 65.9 % for AU onset, 67.8 % for AU apex (both eGeMAPs), 79.7 % for AU offset and 65.3 % for AU occurrence (both ComParE) detection over all AUs.
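
A simplified PyTorch sketch of the speech-to-AU mapping described above: a recurrent encoder reads a sequence of acoustic feature vectors and a linear head predicts AU activity per time step. The paper's attention mechanisms and exact architectures are omitted; the feature dimension and AU count are assumptions.

```python
import torch
import torch.nn as nn

class SpeechToAU(nn.Module):
    def __init__(self, n_acoustic: int = 88, n_aus: int = 10, hidden: int = 128):
        super().__init__()
        self.encoder = nn.GRU(n_acoustic, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_aus)

    def forward(self, x):                        # x: (batch, time, n_acoustic)
        states, _ = self.encoder(x)
        return torch.sigmoid(self.head(states))  # per-frame AU occurrence probabilities

model = SpeechToAU()
acoustic_seq = torch.randn(2, 300, 88)            # 2 utterances, 300 placeholder acoustic frames
au_probs = model(acoustic_seq)
print(au_probs.shape)                             # torch.Size([2, 300, 10])
```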

Journal ArticleDOI
TL;DR: This article found that Chinese are more likely than Dutch to see multiple concurrent emotions in facial expressions, and that Chinese participants produced smaller differences in ratings between intended and non-intended emotions than did Dutch participants.

Proceedings ArticleDOI
01 Apr 2019
TL;DR: A model which utilises AUs to explain a Convolutional Neural Network (CNN) model's classification results, showing that with only the features and emotion classes obtained from the CNN model, the Explanation model generates AUs very well.
Abstract: Facial expression is the most powerful and natural non-verbal emotional communication method. Facial Expression Recognition (FER) has significance in machine learning tasks. Deep Learning models perform well in FER tasks, but they do not provide any justification for their decisions. Based on the hypothesis that a facial expression is a combination of facial muscle movements, we find that Facial Action Coding Units (AUs) and emotion labels have a relationship in the CK+ dataset. In this paper, we propose a model which utilises AUs to explain a Convolutional Neural Network (CNN) model's classification results. The CNN model is trained with the CK+ dataset and classifies emotion based on extracted features. The Explanation model classifies the multiple AUs with the extracted features and emotion classes from the CNN model. Our experiment shows that with only the features and emotion classes obtained from the CNN model, the Explanation model generates AUs very well.
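
A hedged sketch of the two-model idea described above: a CNN produces features and an emotion prediction, and a separate explanation head predicts multi-label AUs from those features concatenated with the predicted emotion distribution. Shapes and layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

n_emotions, n_aus = 7, 17

cnn_features = nn.Sequential(                     # stand-in CNN feature extractor
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1), nn.Flatten()
)
emotion_head = nn.Linear(16, n_emotions)
explanation_model = nn.Sequential(nn.Linear(16 + n_emotions, 64), nn.ReLU(),
                                  nn.Linear(64, n_aus), nn.Sigmoid())

faces = torch.randn(4, 1, 48, 48)                 # placeholder face images
feats = cnn_features(faces)
emotion_probs = torch.softmax(emotion_head(feats), dim=1)
au_probs = explanation_model(torch.cat([feats, emotion_probs], dim=1))
print(emotion_probs.shape, au_probs.shape)        # (4, 7) and (4, 17)
```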

Proceedings ArticleDOI
01 Sep 2019
TL;DR: A novel framework that measures the engagement level of students either in a class environment or in an e-learning environment and can be utilized in numerous applications, including but not limited to monitoring the progress of students with various degrees of learning disabilities and the analysis of nerve palsy.
Abstract: In this paper, we propose a novel framework that measures the engagement level of students either in a class environment or in an e-learning environment. The proposed framework captures the user's video and tracks their face through the video's frames. Different features are extracted from the user's face, e.g., facial fiducial points, head pose, eye gaze, learned features, etc. These features are then used to detect Facial Action Coding System (FACS) action units, which decompose facial expressions in terms of the fundamental actions of individual muscles or groups of muscles. The decoded action units (AUs) are then used to measure the student's willingness to participate in the learning process (i.e., behavioral engagement) and his/her emotional attitude towards learning (i.e., emotional engagement). This framework will allow the lecturer to receive real-time feedback from facial features, gaze, and other body kinesics. The framework is robust and can be utilized in numerous applications, including but not limited to monitoring the progress of students with various degrees of learning disabilities and the analysis of nerve palsy and its effects on facial expression and social interactions.

Journal ArticleDOI
TL;DR: Older adults with and without dementia were video-recorded using cameras capturing different observational angles; the findings add specificity to the communications models of pain and have implications for the development of computer vision algorithms and vision technologies designed to monitor and interpret facial expressions in a pain context.
Abstract: Facial expressions of pain are important in assessing individuals with dementia and severe communicative limitations. Though frontal views of the face are assumed to allow for the most valid and reliable observational assessments, the impact of viewing angle is unknown. We video-recorded older adults with and without dementia using cameras capturing different observational angles (e.g., front vs. profile view) both during a physiotherapy examination designed to identify painful areas and during a baseline period. Facial responses were coded using the fine-grained Facial Action Coding System, as well as a systematic clinical observation method. Coding was conducted separately for panoramic (incorporating left, right, and front views), and a profile view of the face. Untrained observers also judged the videos in a laboratory setting. Trained coder reliability was satisfactory for both the profile and panoramic view. Untrained observer judgments from a profile view were substantially more accurate compared to the front view and accounted for more variance in differentiating non-painful from painful situations. The findings add specificity to the communications models of pain (clarifying factors influencing observers' ability to decode pain messages). Perhaps more importantly, the findings have implications for the development of computer vision algorithms and vision technologies designed to monitor and interpret facial expressions in a pain context. That is, the performance of such automated systems is heavily influenced by how reliably these human annotations could be provided and, hence, evaluation of human observers' reliability, from multiple angles of observation, has implications for machine learning development efforts.

Posted Content
TL;DR: A novel method is proposed that drives virtual characters based only on action unit value regression from the 2D video frames of source actors, and the potential of FEAFA for 3D facial animation is demonstrated.
Abstract: Facial expression analysis based on machine learning requires a large amount of well-annotated data to reflect different changes in facial motion. Publicly available datasets truly help to accelerate research in this area by providing a benchmark resource, but all of these datasets, to the best of our knowledge, are limited to rough annotations for action units, including only their absence, presence, or a five-level intensity according to the Facial Action Coding System. To meet the need for videos labeled in great detail, we present a well-annotated dataset named FEAFA for Facial Expression Analysis and 3D Facial Animation. One hundred and twenty-two participants, including children, young adults and elderly people, were recorded in real-world conditions. In addition, 99,356 frames were manually labeled using the Expression Quantitative Tool developed by us to quantify 9 symmetrical FACS action units, 10 asymmetrical (unilateral) FACS action units, 2 symmetrical FACS action descriptors and 2 asymmetrical FACS action descriptors, and each action unit or action descriptor is annotated with a floating-point number between 0 and 1. To provide a baseline for use in future research, a benchmark for the regression of action unit values based on Convolutional Neural Networks is presented. We also demonstrate the potential of our FEAFA dataset for 3D facial animation. Almost all state-of-the-art algorithms for facial animation are based on 3D face reconstruction. We hence propose a novel method that drives virtual characters based only on action unit value regression from the 2D video frames of source actors.