Open Access · Proceedings Article (DOI)

Facial expression recognition with temporal modeling of shapes

TL;DR
This work proposes a framework for automatic facial expression recognition from continuous video sequences by modeling temporal variations within shapes using Latent-Dynamic Conditional Random Fields, and shows that the proposed approach outperforms CRFs for recognizing facial expressions.
Abstract
Conditional Random Fields (CRFs) can be used as a discriminative approach for simultaneous sequence segmentation and frame labeling. Latent-Dynamic Conditional Random Fields (LDCRFs) incorporate hidden state variables within CRFs, which model sub-structure motion patterns and dynamics between labels. Motivated by the success of LDCRFs in gesture recognition, we propose a framework for automatic facial expression recognition from continuous video sequences by modeling temporal variations within shapes using LDCRFs. We show that the proposed approach outperforms CRFs for recognizing facial expressions. Using Principal Component Analysis (PCA), we study the separability of various expression classes in lower-dimensional projected spaces. By comparing the performance of CRFs and LDCRFs against that of Support Vector Machines (SVMs), we demonstrate that temporal variations within shapes are crucial in classifying expressions, especially those with a small range of facial motion such as anger and sadness. We also show empirically that using only changes in facial appearance over time, without shape variations, is not sufficient to obtain high performance for facial expression recognition.
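The PCA separability study mentioned in the abstract can be sketched as follows. The landmark count, shape vectors, and class distributions below are illustrative assumptions, not the thesis's actual data:

```python
import numpy as np

# Hypothetical data: each row is a flattened face-shape vector
# (x, y coordinates of 68 landmarks), one row per video frame,
# drawn from two synthetic "expression" classes.
rng = np.random.default_rng(0)
shapes_happy = rng.normal(loc=0.0, scale=1.0, size=(50, 136))
shapes_angry = rng.normal(loc=0.5, scale=1.0, size=(50, 136))
X = np.vstack([shapes_happy, shapes_angry])

# PCA via SVD: center the data, then project onto the top-2
# principal components to inspect class separability.
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
X_2d = X_centered @ Vt[:2].T   # one 2-D point per frame

print(X_2d.shape)  # (100, 2)
```

Plotting `X_2d` colored by class is one way to judge, as the abstract describes, how separable expression classes remain in a low-dimensional projected space.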



Copyright
by
Suyog Dutt Jain
2011

The Thesis Committee for Suyog Dutt Jain
certifies that this is the approved version of the following thesis:
Facial Expression Recognition with Temporal Modeling
of Shapes
APPROVED BY
SUPERVISING COMMITTEE:
J. K. Aggarwal, Supervisor
Kristen Grauman

Facial Expression Recognition with Temporal Modeling
of Shapes
by
Suyog Dutt Jain, B.E.
THESIS
Presented to the Faculty of the Graduate School of
The University of Texas at Austin
in Partial Fulfillment
of the Requirements
for the Degree of
Master of Science in Computer Science
THE UNIVERSITY OF TEXAS AT AUSTIN
August 2011

Dedicated to my family

Acknowledgments
This work has been possible because of the following special people who
have always been a source of inspiration for me to do research and pursue my
dreams.
I offer my greatest regards to Professor J. K. Aggarwal for providing
me with the opportunity to work with him. His ideas formed the foundations
of this work, and his continuous motivation, guidance and vision enabled me
to explore the problem from the right perspective and find solutions.
I also deeply thank Professor Kristen Grauman for introducing me to
the world of computer vision research. Her teaching and discussions helped
me build a strong background that is invaluable for my research.
I would like to thank Dr. Changbo Hu, Birgi Tamersoy, Jong Taek Lee
and all other Computer & Vision Research Center members who made my
stay here a memorable one. I will always cherish the numerous philosophical
discussions on evolution and existence which were a part of most lunches we
had together.
I owe my wonderful time in Austin as a graduate student to my friends
Yashesh, Nikhil, Akash, Deepak, Sameer, Pooja, Akanksha, Gouri, Lakshmi
and Anushree, who were always there with their support, love and encouragement.

Citations
Proceedings Article (DOI)

Joint Fine-Tuning in Deep Neural Networks for Facial Expression Recognition

TL;DR: A deep learning technique, regarded as a tool to automatically extract useful features from raw data, is adopted and combined using a new integration method in order to boost the performance of facial expression recognition.
Journal Article (DOI)

Facial expression recognition with Convolutional Neural Networks

TL;DR: A simple solution for facial expression recognition that uses a combination of Convolutional Neural Network and specific image pre-processing steps to extract only expression specific features from a face image and explore the presentation order of the samples during training.
Proceedings Article (DOI)

Facial Expression Recognition via a Boosted Deep Belief Network

TL;DR: A novel Boosted Deep Belief Network for performing the three training stages iteratively in a unified loopy framework and showed that the BDBN framework yielded dramatic improvements in facial expression analysis.
Proceedings Article (DOI)

Learning Expressionlets on Spatio-temporal Manifold for Dynamic Facial Expression Recognition

TL;DR: This paper attempts to solve temporal alignment and semantics-aware dynamic representation problems via manifold modeling of videos based on a novel mid-level representation, i.e. expressionlet, and reports results better than the known state-of-the-art.
Journal Article (DOI)

Spatial–Temporal Recurrent Neural Network for Emotion Recognition

TL;DR: The proposed two-layer RNN model provides an effective way to exploit both spatial and temporal dependencies of the input signals for emotion recognition, and experimental results demonstrate that the proposed STRNN method compares favorably with state-of-the-art methods.
References
Journal Article (DOI)

A tutorial on hidden Markov models and selected applications in speech recognition

TL;DR: In this paper, the authors provide an overview of the basic theory of hidden Markov models (HMMs) as originated by L.E. Baum and T. Petrie (1966) and give practical details on methods of implementation of the theory along with a description of selected applications of HMMs to distinct problems in speech recognition.
Proceedings Article

Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

TL;DR: This work presents iterative parameter estimation algorithms for conditional random fields and compares the performance of the resulting models to HMMs and MEMMs on synthetic and natural-language data.
Journal Article (DOI)

Active appearance models

Abstract: We describe a new method of matching statistical models of appearance to images. A set of model parameters control modes of shape and gray-level variation learned from a training set. We construct an efficient iterative matching algorithm by learning the relationship between perturbations in the model parameters and the induced image errors.
Proceedings Article (DOI)

The Extended Cohn-Kanade Dataset (CK+): A complete dataset for action unit and emotion-specified expression

TL;DR: The Extended Cohn-Kanade (CK+) database is presented, with baseline results using Active Appearance Models (AAMs) and a linear support vector machine (SVM) classifier, evaluated with leave-one-subject-out cross-validation for both AU and emotion detection on the posed data.
Frequently Asked Questions (7)
Q1. What have the authors contributed in "Facial expression recognition with temporal modeling of shapes" ?

In this paper, the authors proposed a new approach for facial expression recognition by modeling temporal dynamics of face shapes. 

There are several open possibilities for enhancing their current work. The authors wish to go beyond that and see whether they can extend the current work to handle real-world issues such as pose and illumination variations, and recognizing expressions from continuous video streams such as web-cams. The authors also want to analyze 3D face shapes and see whether temporal modeling of 3D data can give better results for recognizing facial expressions.

The optimization procedure also involves a regularization term, which is chosen using cross-validation with values ranging from 10^-3 to 10^3 during training.
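A minimal sketch of that log-spaced grid search, using ridge regression as a stand-in validation objective (the actual LDCRF training objective is not reproduced here; the data and dimensions are synthetic assumptions):

```python
import numpy as np

# Regularization values swept on a log scale from 1e-3 to 1e3,
# matching the range described in the text.
reg_grid = [10.0 ** k for k in range(-3, 4)]

# Toy train/validation split on synthetic regression data.
rng = np.random.default_rng(1)
X_tr, X_val = rng.normal(size=(80, 10)), rng.normal(size=(20, 10))
w_true = rng.normal(size=10)
y_tr = X_tr @ w_true + rng.normal(scale=0.1, size=80)
y_val = X_val @ w_true + rng.normal(scale=0.1, size=20)

def val_error(lam):
    # Closed-form ridge solution: w = (X'X + lam*I)^-1 X'y
    w = np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(10), X_tr.T @ y_tr)
    return float(np.mean((X_val @ w - y_val) ** 2))

# Pick the regularization value with the lowest validation error.
best_lam = min(reg_grid, key=val_error)
print(best_lam)
```

The same loop structure applies when the inner objective is an LDCRF/CRF training run scored on a held-out fold.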

The problem of supervised sequence labeling requires us to learn a classifier from training data consisting of a set of labeled sequences.
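The data layout for this setup can be sketched with a toy dataset and a frame-wise baseline; the feature values and label names below are illustrative assumptions (real features would be shape parameters per frame):

```python
from collections import defaultdict

# Each training example is a sequence of per-frame feature vectors
# paired with a label for every frame.
train_data = [
    # (frames, per-frame labels)
    ([[0.1, 0.2], [0.4, 0.5], [0.9, 0.8]], ["neutral", "onset", "apex"]),
    ([[0.0, 0.1], [0.5, 0.4], [1.0, 0.9]], ["neutral", "onset", "apex"]),
]

# A frame-wise baseline (no temporal modeling): label each frame by
# its nearest class centroid. Sequence models such as CRFs/LDCRFs
# improve on this by also scoring transitions between frame labels.
sums = defaultdict(lambda: [0.0, 0.0, 0])
for frames, labels in train_data:
    for f, lab in zip(frames, labels):
        sums[lab][0] += f[0]; sums[lab][1] += f[1]; sums[lab][2] += 1
centroids = {lab: (s[0] / s[2], s[1] / s[2]) for lab, s in sums.items()}

def label_frame(f):
    return min(centroids, key=lambda lab: (f[0] - centroids[lab][0]) ** 2
                                        + (f[1] - centroids[lab][1]) ** 2)

print([label_frame(f) for f in [[0.05, 0.15], [0.95, 0.85]]])
# ['neutral', 'apex']
```

The contrast this sketch sets up is the point of the thesis: a per-frame classifier ignores the temporal dynamics that CRFs and LDCRFs model explicitly.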

Their proposed approach using LDCRFs is also more robust in modeling facial expressions than CRFs, which shows that capturing subtle facial motion is essential in differentiating between facial expressions.

It has been shown [40] that using a single histogram for the entire image is not a good technique for facial expression recognition; hence the cropped face image is subdivided into 42 regions using a 6 × 7 grid (see Figure 3.5).
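The grid-based descriptor can be sketched as follows; the image size, bin count, and plain intensity histograms (in place of the appearance histograms used in the thesis) are assumptions for illustration:

```python
import numpy as np

# Hypothetical cropped face image (grayscale), sized to divide
# evenly into the 6 x 7 grid described in the text.
face = np.random.default_rng(2).integers(0, 256, size=(42, 42), dtype=np.uint8)

rows, cols = 6, 7                     # 6 x 7 grid -> 42 regions
h, w = face.shape[0] // rows, face.shape[1] // cols

# One histogram per region, concatenated into a single descriptor,
# so spatial layout is preserved instead of pooling the whole image.
hists = []
for r in range(rows):
    for c in range(cols):
        region = face[r * h:(r + 1) * h, c * w:(c + 1) * w]
        hist, _ = np.histogram(region, bins=16, range=(0, 256))
        hists.append(hist)

descriptor = np.concatenate(hists)
print(len(hists), descriptor.shape)   # 42 (672,)
```

Concatenating per-region histograms keeps local appearance changes (e.g. around the mouth or brows) distinguishable, which a single whole-image histogram would wash out.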

Kanaujia et al. [24] use active shape models with localized Non-negative Matrix Factorization to localize landmark points.