
OpenFace: an open source facial behavior analysis toolkit
Tadas Baltrušaitis
Tadas.Baltrusaitis@cl.cam.ac.uk
Peter Robinson
Peter.Robinson@cl.cam.ac.uk
Louis-Philippe Morency
morency@cs.cmu.edu
Abstract
Over the past few years, there has been an increased
interest in automatic facial behavior analysis and under-
standing. We present OpenFace, an open source tool
intended for computer vision and machine learning re-
searchers, the affective computing community, and people in-
terested in building interactive applications based on facial
behavior analysis. OpenFace is the first open source tool
capable of facial landmark detection, head pose estima-
tion, facial action unit recognition, and eye-gaze estimation.
The computer vision algorithms which represent the core of
OpenFace demonstrate state-of-the-art results in all of the
above mentioned tasks. Furthermore, our tool is capable of
real-time performance and is able to run from a simple web-
cam without any specialist hardware. Finally, OpenFace
allows for easy integration with other applications and de-
vices through a lightweight messaging system.
1. Introduction
Over the past few years, there has been an increased in-
terest in machine understanding and recognition of affective
and cognitive mental states, and the interpretation of social sig-
nals, especially based on facial expression and, more broadly,
facial behavior [18, 51, 39]. As the face is a very important
channel of nonverbal communication [20, 18], facial behav-
ior analysis has been used in different applications to facil-
itate human-computer interaction [10, 43, 48, 66]. More
recently, there have been a number of developments demon-
strating the feasibility of automated facial behavior analysis
systems for better understanding of medical conditions such
as depression [25] and post-traumatic stress disorder [53].
Other uses of automatic facial behavior analysis include au-
tomotive industries [14], education [42, 26], and entertain-
ment [47].
In our work we define facial behavior as consisting of:
facial landmark motion, head pose (orientation and mo-
tion), facial expressions, and eye gaze. Each of these modal-
ities plays an important role in human behavior, both individually and together. For example, automatic detection and analysis of facial Action Units (AUs) [19] is an important building block in nonverbal behavior and emotion recognition systems [18, 51]. This includes detecting both the presence and the intensity of AUs, allowing us to analyse their occurrence, co-occurrence and dynamics. In addition to AUs, head pose and gesture also play an important role in emotion and social signal perception and expression [56, 1, 29]. Finally, gaze direction is important when evaluating things like attentiveness, social skills and mental health, as well as the intensity of emotions [35].

Figure 1: OpenFace is an open source framework that implements state-of-the-art facial behavior analysis algorithms including: facial landmark detection, head pose tracking, eye gaze and facial Action Unit estimation.
Over the past years there has been a huge amount of
progress in facial behavior understanding [18, 51, 39].
However, there is still no open source system available to
the research community that can perform all of the above-mentioned tasks (see Table 1). There is a large gap between state-of-the-art algorithms and freely available toolkits. This is especially true if real-time performance is required - a necessity for interactive systems.
Tool Approach Landmark Head pose AU Gaze Train Fit Binary Real-time
COFW[13] RCPR[13] X X X X
FaceTracker CLM[50] X X X X X
dlib [34] [32] X X X X
DRMF[4] DRMF[4] X X X X
Chehra [5] X X X X
GNDPM GNDPM[58] X X
PO-CR[57] PO-CR [57] X X
Menpo [3] AAM, CLM, SDM (1) X X X (2)
CFAN [67] [67] X X X
[65] Reg. For [65] X X X X X X
TCDCN CNN [70] X X X X
EyeTab [63] X N/A X X X
Intraface SDM [64] X X ? (3) X
OKAO ? X X X X X
FACET ? X X X X X
Affdex ? X X X X X
Tree DPM [71] [71] X X X
LEAR LEAR [40] X X X
TAUD TAUD [31] X X
OpenFace [7, 6] X X X X X X X X
Table 1: Comparison of facial behavior analysis tools. We do not consider fitting code to be available if the only code provided is a wrapper around a compiled executable. Note that most tools only provide binary versions (executables) rather than the model training and fitting source code. (1) The implementation differs from the originally proposed one in the features used; (2) the algorithms implemented are capable of real-time performance, but the tool does not provide it; (3) the executable is no longer available on the author's website.
Furthermore, even though there exist a number of ap-
proaches for tackling each individual problem, very few of
them are available in source code form and would require a
significant amount of effort to re-implement. In some cases
exact re-implementation is virtually impossible due to lack
of details in papers. Examples of often omitted details in-
clude: values of hyper-parameters, data normalization and
cleaning procedures, exact training protocol, model initial-
ization and re-initialization procedures, and optimization
techniques to make systems real-time. These details are of-
ten as important as the algorithms themselves when build-
ing systems that work on real-world data. Source code is
a great way of providing such details. Finally, even the ap-
proaches that claim to provide code often only provide
a thin wrapper around a compiled binary, making it impos-
sible to know what is actually being computed internally.
OpenFace is not only the first open source tool for facial behavior analysis, it also demonstrates state-of-the-art performance in facial landmark detection, head pose tracking, AU recognition and eye gaze estimation, and it is able to perform all of these tasks together in real-time. The main contributions of OpenFace are: 1) it implements and extends state-of-the-art algorithms; 2) it is an open source tool that includes model training code; 3) it comes with ready-to-use trained models; 4) it is capable of real-time performance without the need for a GPU; 5) it includes a messaging system that makes real-time interactive applications easy to implement; 6) it is available as a graphical user interface (for Windows) and as a command line tool (for Ubuntu, Mac OS X and Windows).
Our work is intended to bridge that gap between existing state-of-the-art research and easy-to-use out-of-the-box solutions for facial behavior analysis. We believe our tool will stimulate the community by lowering the bar of entry into the field and enabling new and interesting applications (project page: https://www.cl.cam.ac.uk/research/rainbow/projects/openface/).
First, we present a brief outline of the recent advances in
face analysis tools (section 2). Then we move on to describe
our facial behavior analysis pipeline (section 3). We follow
with a description of a large number of experiments that assess
our framework (section 4). Finally, we provide a brief de-
scription of the interface provided by OpenFace (section 5).
2. Previous work
A full review of work in facial landmark detection, head
pose, eye gaze, and action unit estimation is outside the
scope of this paper; we refer the reader to recent reviews of the field [17, 18, 30, 46, 51, 61]. We instead provide an overview of available tools for accomplishing the individual facial behavior analysis tasks. For a summary of available tools see Table 1.

Figure 2: OpenFace facial behavior analysis pipeline, including: facial landmark detection, head pose and eye gaze estimation, and facial action unit recognition. The outputs from all of these systems (indicated in red) can be saved to disk or sent over a network.
Facial landmark detection - there exists a broad selec-
tion of freely available tools to perform facial landmark de-
tection in images or videos. However, very few of the ap-
proaches provide the source code and instead only provide
executable binaries. This makes the reproduction of experi-
ments on different training sets or using different landmark
annotation schemes difficult. Furthermore, binaries only al-
low for certain predefined functionality and are often not
cross-platform, making real-time integration into systems
that rely on landmark detection almost impossible.
Although several exceptions exist that provide both
training and testing code [3, 71], those approaches do not
allow for real-time landmark tracking in videos - an impor-
tant requirement for interactive systems.
Head pose estimation has not received the same amount
of interest as facial landmark detection. An earlier exam-
ple of a dedicated head pose estimation tool is the Watson sys-
tem, which is an implementation of the Generalized Adap-
tive View-based Appearance Model [45]. There also exist
several frameworks that allow for head pose estimation us-
ing depth data [21]; however, they cannot work with webcams.
While some facial landmark detectors include head pose es-
timation capabilities [4, 5], most ignore this problem.
AU recognition - there are very few freely available
tools for action unit recognition. However, there are a number of commercial systems that, amongst other functionality, perform Action Unit recognition: FACET (http://www.emotient.com/products/), Affdex (http://www.affectiva.com/solutions/affdex/), and OKAO (https://www.omron.com/ecb/products/mobile/). However, the drawback of such systems is the sometimes prohibitive cost, unknown algorithms, and often unknown training data. Furthermore, some tools are inconvenient to use by being restricted to a single machine (due to MAC address locking or requiring USB dongles). Finally, and most importantly, a commercial product may be discontinued, leading to impossible to reproduce results due to lack of product transparency (this is illustrated by the recent unavailability of FACET).
Gaze estimation - there are a number of tools and commercial systems for eye-gaze estimation; however, the majority of them require specialist hardware such as infrared cameras or head mounted cameras [30, 37, 54]. Although there exist a couple of systems available for webcam-based gaze estimation [72, 24, 63], they struggle in real-world scenarios and some require cumbersome manual calibration steps.
In contrast to other available tools, OpenFace provides
both training and testing code allowing for easy repro-
ducibility of experiments. Furthermore, our system shows
state-of-the-art results on in-the-wild data and does not re-
quire any specialist hardware or person specific calibration.
Finally, our system runs in real-time with all of the facial
behavior analysis modules working together.
3. OpenFace pipeline
In this section we outline the core technologies used by
OpenFace for facial behavior analysis (see Figure 2 for a
summary). First, we provide an explanation of how we de-
tect and track facial landmarks, together with a hierarchical
model extension to an existing algorithm. We then provide
an outline of how these features are used for head pose es-
timation and eye gaze tracking. Finally, we describe our
Facial Action Unit intensity and presence detection system,
which includes a novel person calibration extension to an
existing model.
3.1. Facial landmark detection and tracking
OpenFace uses the recently proposed Conditional Lo-
cal Neural Fields (CLNF) [8] for facial landmark detection
and tracking. CLNF is an instance of a Constrained Local
Model (CLM) [16] that uses more advanced patch experts and an optimization function. The two main components of CLNF are: a Point Distribution Model (PDM), which captures landmark shape variations, and patch experts, which capture local appearance variations of each landmark. For more details about the algorithm refer to Baltrušaitis et al. [8].

Figure 3: Sample registrations on the 300-W and MPIIGaze datasets.
3.1.1 Model novelties
The originally proposed CLNF model performs the detec-
tion of all 68 facial landmarks together. We extend this
model by training separate sets of point distribution and
patch expert models for eyes, lips and eyebrows. We later
fit the landmarks detected with the individual models to a
joint PDM.
Tracking a face over a long period of time may lead to
drift or the person may leave the scene. In order to deal
with this, we employ a face validation step. We use a simple
three-layer convolutional neural network (CNN) that, given
a face aligned using a piecewise affine warp, is trained to
predict the expected landmark detection error. We train the
CNN on the LFPW [11] and Helen [36] training sets with
correct and randomly offset landmark locations. If the val-
idation step fails when tracking a face in a video, we know
that our model needs to be reset.
For landmark detection in difficult in-the-wild im-
ages, we use multiple initialization hypotheses at different
orientations and pick the model with the best converged
likelihood. This slows down the approach, but makes it
more accurate.
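The strategy can be summarised with the short sketch below; `fit_clnf` is a hypothetical fitting routine standing in for the CLNF optimisation, and the listed orientations are illustrative values rather than the exact hypotheses used by OpenFace.

```python
def detect_landmarks_multi_hypothesis(image, face_box, fit_clnf):
    """Fit the model from several initial orientations and keep the fit
    with the best converged likelihood. `fit_clnf` is a hypothetical
    routine standing in for the CLNF optimisation."""
    # Illustrative yaw/pitch initialisations in radians; the exact set of
    # hypotheses used by OpenFace may differ.
    hypotheses = [(0.0, 0.0, 0.0),
                  (0.0, 0.6, 0.0), (0.0, -0.6, 0.0),    # look left / right
                  (0.6, 0.0, 0.0), (-0.6, 0.0, 0.0)]    # look up / down

    best_landmarks, best_likelihood = None, float("-inf")
    for rotation in hypotheses:
        landmarks, likelihood = fit_clnf(image, face_box, init_rotation=rotation)
        if likelihood > best_likelihood:
            best_landmarks, best_likelihood = landmarks, likelihood
    return best_landmarks, best_likelihood
```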
3.1.2 Implementation details
The PDM used in OpenFace was trained on two datasets -
LFPW [11] and Helen [36] training sets. This resulted in a
model with 34 non-rigid and 6 rigid shape parameters.
For training the CLNF patch experts we used: Multi-PIE
[27], LFPW [11] and Helen [36] training sets. We trained a
separate set of patch experts for seven views and four scales
(leading to 28 sets in total). Having multi-scale patch experts allows us to be accurate on both lower and higher resolution face images. We found that optimal results are achieved when the face is at least 100px across. Training on different views allows us to track faces with out-of-plane motion and to model self-occlusion caused by head rotation.

Figure 4: Sample gaze estimations on video sequences; green lines represent the estimated eye gaze vectors.
To initialize our CLNF model we use the face detector
found in the dlib library [33, 34]. We learned a simple
linear mapping from the bounding box provided by dlib
detector to the one surrounding the 68 facial landmarks.
When tracking landmarks in videos we initialize the CLNF
model based on landmark detections in the previous frame. If
our CNN validation module reports that tracking failed, we
reinitialize the model using the dlib face detector.
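The bounding-box correction can be learned with ordinary least squares; the sketch below is one assumed formulation (per-coordinate offsets proportional to the detector box size), not necessarily the exact mapping shipped with OpenFace.

```python
import numpy as np

def learn_box_mapping(det_boxes, lmk_boxes):
    """det_boxes, lmk_boxes: (N, 4) arrays of [x, y, width, height].
    Fits corrections of the form x' = x + a*w + b, etc., by least squares."""
    det_boxes = np.asarray(det_boxes, dtype=float)
    lmk_boxes = np.asarray(lmk_boxes, dtype=float)
    params = []
    for i in range(4):
        # Scale the correction by the detector box size (w for x/w, h for y/h).
        size = det_boxes[:, 2] if i in (0, 2) else det_boxes[:, 3]
        A = np.column_stack([size, np.ones(len(det_boxes))])
        target = lmk_boxes[:, i] - det_boxes[:, i]
        coeff, *_ = np.linalg.lstsq(A, target, rcond=None)
        params.append(coeff)  # (a, b) for this coordinate
    return params

def apply_box_mapping(det_box, params):
    x, y, w, h = det_box
    sizes = [w, h, w, h]
    return [v + a * s + b for v, s, (a, b) in zip(det_box, sizes, params)]
```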
OpenFace also allows for detection of multiple faces in
an image and tracking of multiple faces in videos. For
videos this is achieved by keeping track of active face
tracks and a simple logic module that checks for people
leaving and entering the frame.
3.2. Head pose estimation
Our model is able to extract head pose (translation and
orientation) information in addition to facial landmark de-
tection. We are able to do this, as CLNF internally uses a 3D
representation of facial landmarks and projects them to the
image using an orthographic camera projection. This allows us
to accurately estimate the head pose, once the landmarks are
detected, by solving the PnP problem.
For accurate head pose estimation OpenFace needs to
be provided with the camera calibration parameters (focal
length and principal point). In their absence OpenFace uses
a rough estimate based on image size.
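As a rough illustration of this step, the sketch below solves the PnP problem with OpenCV; taking the focal length from the image width when no calibration is given is a common heuristic and stands in for the estimate OpenFace actually uses.

```python
import cv2
import numpy as np

def estimate_head_pose(landmarks_2d, landmarks_3d, image_size,
                       focal_length=None, principal_point=None):
    """landmarks_2d: (N, 2) detected image points; landmarks_3d: (N, 3)
    corresponding 3D points of the shape model (any metric unit)."""
    h, w = image_size
    if focal_length is None:
        focal_length = w  # rough guess when the camera is uncalibrated
    if principal_point is None:
        principal_point = (w / 2.0, h / 2.0)

    camera_matrix = np.array([[focal_length, 0, principal_point[0]],
                              [0, focal_length, principal_point[1]],
                              [0, 0, 1]], dtype=np.float64)
    dist_coeffs = np.zeros(4)  # assume no lens distortion

    ok, rvec, tvec = cv2.solvePnP(
        np.asarray(landmarks_3d, dtype=np.float64),
        np.asarray(landmarks_2d, dtype=np.float64),
        camera_matrix, dist_coeffs)
    return ok, rvec, tvec  # rotation (Rodrigues vector) and translation
```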
3.3. Eye gaze estimation
The CLNF framework is a general deformable shape regis-
tration approach, so we use it to detect eye-region landmarks
as well. This includes the eyelids, iris and pupil. We used
the SynthesEyes training dataset [62] to train the PDM and
CLNF patch experts. This model achieves state-of-the-art
results on the eye-region registration task [62]. Some sample
registrations can be seen in Figure 3.
Once the location of the eye and the pupil are detected
using our CLNF model, we use that information to compute
the eye gaze vector individually for each eye. We fire a ray
from the camera origin through the center of the pupil in the
image plane and compute its intersection with the eye-ball
sphere. This gives us the pupil location in 3D camera coor-
dinates. The vector from the 3D eyeball center to the pupil
location is our estimated gaze vector. This is a fast and ac-
curate method for person independent eye-gaze estimation
in webcam images. See Figure 4 for sample gaze estimates.
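A minimal sketch of the ray-sphere computation is shown below; the eyeball radius, the pinhole back-projection of the pupil and the camera-at-origin convention are assumptions made for the example.

```python
import numpy as np

def estimate_gaze_vector(pupil_px, eyeball_center_3d, focal_length,
                         principal_point, eyeball_radius_mm=12.0):
    """pupil_px: 2D pupil centre in the image; eyeball_center_3d: eyeball
    centre in camera coordinates (e.g. from the fitted eye model)."""
    # Direction of the ray from the camera origin through the pupil pixel.
    ray = np.array([(pupil_px[0] - principal_point[0]) / focal_length,
                    (pupil_px[1] - principal_point[1]) / focal_length,
                    1.0])
    ray /= np.linalg.norm(ray)

    # Intersect the ray p = t * ray with the eyeball sphere ||p - c||^2 = r^2,
    # i.e. solve the quadratic t^2 + b*t + const = 0 (||ray|| = 1).
    c = np.asarray(eyeball_center_3d, dtype=float)
    b = -2.0 * ray.dot(c)
    const = c.dot(c) - eyeball_radius_mm ** 2
    disc = b * b - 4.0 * const
    if disc < 0:
        return None  # ray misses the sphere; fall back or re-detect
    t = (-b - np.sqrt(disc)) / 2.0       # nearer intersection = visible pupil
    pupil_3d = t * ray

    gaze = pupil_3d - c                  # from eyeball centre to 3D pupil
    return gaze / np.linalg.norm(gaze)
```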
3.4. Action Unit detection
The OpenFace AU intensity and presence detection module
is based on a recent state-of-the-art AU recognition frame-
work [7, 59]. It is a direct implementation with a couple
of changes that adapt it to work better on natural video se-
quences from unseen datasets. A more detailed explanation
of the system can be found in Baltrušaitis et al. [7]. In
the following section we describe our extensions to the ap-
proach and the implementation details.
3.4.1 Model novelties
In natural interactions people are not expressive very often
[2]. This observation allows us to safely assume that most
of the time the lowest intensity (and in turn prediction) of
each action unit over a long video recording of a person
should be zero. However, the existing AU predictors tend
to sometimes under- or over-estimate AU values for a par-
ticular person; see Figure 5 for an illustration of this.

Figure 5: Prediction of AU12 on the DISFA dataset [7]. Notice how the prediction is always offset by a constant value.
To correct for such prediction errors, we take the lowest
n-th percentile (learned on validation data) of the predictions
on a specific person and subtract it from all of the predic-
tions. We call this approach person calibration. Such a
correction can be easily implemented in an online system as
well by keeping a histogram of previous predictions. This
extension only applies to AU intensity prediction.
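The offline form of this calibration amounts to a per-AU percentile subtraction, as in the sketch below; the percentile value shown is a placeholder for the one learned on validation data, and an online variant would instead maintain a running histogram of predictions.

```python
import numpy as np

def person_calibrate(au_intensities, percentile=5.0):
    """au_intensities: (num_frames, num_aus) raw AU intensity predictions
    for one person. Subtracts each AU's low percentile so that the
    person's least expressive frames map to (close to) zero."""
    preds = np.asarray(au_intensities, dtype=float)
    baseline = np.percentile(preds, percentile, axis=0)   # per-AU offset
    calibrated = np.clip(preds - baseline, 0.0, None)     # keep non-negative
    return calibrated, baseline
```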
AU Full name Prediction
AU1 Inner brow raiser I
AU2 Outer brow raiser I
AU4 Brow lowerer I
AU5 Upper lid raiser I
AU6 Cheek raiser I
AU7 Lid tightener P
AU9 Nose wrinkler I
AU10 Upper lip raiser I
AU12 Lip corner puller I
AU14 Dimpler I
AU15 Lip corner depressor I
AU17 Chin raiser I
AU20 Lip stretcher I
AU23 Lip tightener P
AU25 Lips part I
AU26 Jaw drop I
AU28 Lip suck P
AU45 Blink P
Table 2: List of AUs in OpenFace. I - intensity, P - presence.
Another extension we propose is to combine AU pres-
ence and intensity training datasets. Some datasets only
contain labels for action unit presence (SEMAINE [44] and
BP4D) and others contain labels for their intensities (DISFA
[41] and BP4D [69]). This makes the training on combined
datasets not straightforward. We use the distance to the hy-
perplane of the trained SVM model as a feature for an SVR
regressor. This allows us to train a single predictor using
both AU presence and intensity datasets.
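The sketch below illustrates one way such a combined predictor could be assembled with linear models; the particular estimators and the single-feature formulation are assumptions for the example rather than the exact OpenFace training code.

```python
from sklearn.svm import LinearSVC, LinearSVR

def train_combined_au_predictor(X_presence, y_presence, X_intensity, y_intensity):
    """Train a presence classifier, then use its signed distance to the
    hyperplane as the feature of an intensity regressor."""
    svm = LinearSVC(C=1.0).fit(X_presence, y_presence)   # presence labels {0, 1}

    # Signed distance to the hyperplane for the intensity-labelled data.
    dist = svm.decision_function(X_intensity).reshape(-1, 1)
    svr = LinearSVR(C=1.0).fit(dist, y_intensity)         # intensities, e.g. 0-5
    return svm, svr

def predict_au_intensity(svm, svr, X):
    return svr.predict(svm.decision_function(X).reshape(-1, 1))
```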
3.4.2 Implementation details
In order to extract facial appearance features we used a sim-
ilarity transform from the currently detected landmarks to a
representation of frontal landmarks from a neutral expres-
sion. This results in a 112 × 112 pixel image of the face
with a 45 pixel interpupillary distance (similar to Baltrušaitis
et al. [7]).
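A rough sketch of such an alignment is given below; deriving the similarity transform from just the two eye centres, and the chosen eye position in the output image, are simplifications assumed for the example (OpenFace computes the transform from the detected landmarks and a neutral reference shape).

```python
import cv2
import numpy as np

def align_face(image, left_eye, right_eye, out_size=112, ipd_px=45):
    """Warp the face so the eyes are level, the interpupillary distance is
    ipd_px pixels, and the eye midpoint sits at a fixed output position."""
    left_eye = np.asarray(left_eye, dtype=float)
    right_eye = np.asarray(right_eye, dtype=float)

    delta = right_eye - left_eye
    angle = np.degrees(np.arctan2(delta[1], delta[0]))   # in-plane rotation
    scale = ipd_px / np.linalg.norm(delta)               # scale to 45 px IPD
    eyes_mid = (left_eye + right_eye) / 2.0

    # Rotate/scale about the eye midpoint, then translate it to a fixed spot
    # (the 0.35 vertical placement is an arbitrary illustrative choice).
    M = cv2.getRotationMatrix2D(tuple(eyes_mid), angle, scale)
    M[0, 2] += out_size / 2.0 - eyes_mid[0]
    M[1, 2] += out_size * 0.35 - eyes_mid[1]
    return cv2.warpAffine(image, M, (out_size, out_size))
```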
We extract Histogram of Oriented Gradients (HOG)
features, as proposed by Felzenszwalb et al. [23], from the
aligned face. We use blocks of 2 × 2 cells, of 8 × 8 pix-
els, leading to 12 × 12 blocks of 31-dimensional histograms
(a 4464-dimensional vector describing the face). In order
to reduce the feature dimensionality we use a PCA model
trained on a number of facial expression datasets: CK+
[38], DISFA [41], AVEC 2011 [52], FERA 2011 [60], and
FERA 2015 [59]. Applying PCA to images (sub-sampling
from peak and neutral expressions) and keeping 95% of ex-
plained variability leads to a reduced basis of 1391 dimen-
sions. This allows for a generic basis, more suitable to un-
seen datasets.
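The dimensionality bookkeeping and the PCA reduction can be reproduced as in the sketch below, assuming the 4464-dimensional HOG vectors have already been extracted from the aligned faces of the datasets listed above.

```python
import numpy as np
from sklearn.decomposition import PCA

# 12 x 12 blocks of 31-dimensional histograms -> 4464-dimensional descriptor.
hog_dim = 12 * 12 * 31
assert hog_dim == 4464

def fit_appearance_basis(hog_vectors, variance_kept=0.95):
    """hog_vectors: (num_faces, 4464) HOG descriptors sub-sampled from
    peak and neutral expressions of several datasets. Keeping 95% of the
    variance yields roughly a 1391-dimensional basis in the paper."""
    pca = PCA(n_components=variance_kept, svd_solver="full")
    reduced = pca.fit_transform(np.asarray(hog_vectors, dtype=float))
    return pca, reduced
```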
