Journal ArticleDOI

DEAP: A Database for Emotion Analysis Using Physiological Signals

TL;DR: A multimodal data set for the analysis of human affective states is presented, and a novel method for stimuli selection is proposed using retrieval by affective tags from the last.fm website, video highlight detection, and an online assessment tool.
Abstract: We present a multimodal data set for the analysis of human affective states. The electroencephalogram (EEG) and peripheral physiological signals of 32 participants were recorded as each watched 40 one-minute long excerpts of music videos. Participants rated each video in terms of the levels of arousal, valence, like/dislike, dominance, and familiarity. For 22 of the 32 participants, frontal face video was also recorded. A novel method for stimuli selection is proposed using retrieval by affective tags from the last.fm website, video highlight detection, and an online assessment tool. An extensive analysis of the participants' ratings during the experiment is presented. Correlates between the EEG signal frequencies and the participants' ratings are investigated. Methods and results are presented for single-trial classification of arousal, valence, and like/dislike ratings using the modalities of EEG, peripheral physiological signals, and multimedia content analysis. Finally, decision fusion of the classification results from different modalities is performed. The data set is made publicly available and we encourage other researchers to use it for testing their own affective state estimation methods.



Reference
KOELSTRA, Sander, et al. DEAP: A Database for Emotion Analysis Using Physiological Signals. IEEE Transactions on Affective Computing, 2012, vol. 3, no. 1, pp. 18-31. DOI: 10.1109/T-AFFC.2011.15.
Available at: http://archive-ouverte.unige.ch/unige:47405

DEAP: A Database for Emotion Analysis using
Physiological Signals
Sander Koelstra, Student Member, IEEE, Christian Mühl, Mohammad Soleymani, Student Member, IEEE,
Jong-Seok Lee, Member, IEEE, Ashkan Yazdani, Touradj Ebrahimi, Member, IEEE,
Thierry Pun, Member, IEEE, Anton Nijholt, Member, IEEE, Ioannis Patras, Member, IEEE
Abstract—We present a multimodal dataset for the analysis of human affective states. The electroencephalogram (EEG) and
peripheral physiological signals of 32 participants were recorded as each watched 40 one-minute long excerpts of music videos.
Participants rated each video in terms of the levels of arousal, valence, like/dislike, dominance and familiarity. For 22 of the 32
participants, frontal face video was also recorded. A novel method for stimuli selection is proposed using retrieval by affective tags
from the last.fm website, video highlight detection and an online assessment tool. An extensive analysis of the participants’ ratings
during the experiment is presented. Correlates between the EEG signal frequencies and the participants’ ratings are investigated.
Methods and results are presented for single-trial classification of arousal, valence and like/dislike ratings using the modalities of EEG,
peripheral physiological signals and multimedia content analysis. Finally, decision fusion of the classification results from the different
modalities is performed. The dataset is made publicly available and we encourage other researchers to use it for testing their own
affective state estimation methods.
Index Terms—Emotion classification, EEG, Physiological signals, Signal processing, Pattern classification, Affective computing.
1 INTRODUCTION
Emotion is a psycho-physiological process triggered
by conscious and/or unconscious perception of an
object or situation and is often associated with mood,
temperament, personality and disposition, and motiva-
tion. Emotions play an important role in human commu-
nication and can be expressed either verbally through
emotional vocabulary, or non-verbally through cues
such as intonation of voice, facial expressions and ges-
tures. Most of the contemporary human-computer inter-
action (HCI) systems are deficient in interpreting this
information and suffer from the lack of emotional intelli-
gence. In other words, they are unable to identify human
emotional states and use this information in deciding
upon proper actions to execute. The goal of affective
computing is to fill this gap by detecting emotional
cues occurring during human-computer interaction and
synthesizing emotional responses.
Author affiliations:
The first three authors contributed equally to this work and are listed in
alphabetical order.
Sander Koelstra and Ioannis Patras are with the School of Computer
Science and Electronic Engineering, Queen Mary University of London
(QMUL). E-mail: sander.koelstra@eecs.qmul.ac.uk
Christian Mühl and Anton Nijholt are with the Human Media Interaction
Group, University of Twente (UT).
Mohammad Soleymani and Thierry Pun are with the Computer Vision
and Multimedia Laboratory, University of Geneva (UniGé).
Ashkan Yazdani, Jong-Seok Lee and Touradj Ebrahimi are with the Multi-
media Signal Processing Group, Ecole Polytechnique Fédérale de Lausanne
(EPFL).
Characterizing multimedia content with relevant, re-
liable and discriminating tags is vital for multimedia
information retrieval. Affective characteristics of multi-
media are important features for describing multime-
dia content and can be presented by such emotional
tags. Implicit affective tagging refers to the effortless
generation of subjective and/or emotional tags. Implicit
tagging of videos using affective information can help
recommendation and retrieval systems to improve their
performance [1]–[3]. The current dataset is recorded with
the goal of creating an adaptive music video recommen-
dation system. In our proposed music video recommen-
dation system, a user’s bodily responses will be trans-
lated to emotions. The emotions of a user while watching
music video clips will help the recommender system to
first understand the user’s taste and then to recommend a
music clip that matches the user’s current emotion.
The presented database explores the possibility of
classifying emotion dimensions induced by showing music
videos to different users. To the best of our knowledge,
the responses to this type of stimulus (music video clips) have
never been explored before, and the research in this
field has mainly focused on images, music or non-music
video segments [4], [5]. In an adaptive music video
recommender, an emotion recognizer trained on phys-
iological responses to content of a similar nature, namely
music videos, is better able to fulfill its goal.
Various discrete categorizations of emotions have been
proposed, such as the six basic emotions proposed by
Ekman and Friesen [6] and the tree structure of emotions
proposed by Parrott [7]. Dimensional scales of emotion
have also been proposed, such as Plutchik’s emotion
wheel [8] and the valence-arousal scale by Russell [9].
In this work, we use Russell’s valence-arousal scale,
widely used in research on affect, to quantitatively
describe emotions. In this scale, each emotional state
can be placed on a two-dimensional plane with arousal
and valence as the horizontal and vertical axes. While
arousal and valence explain most of the variation in
emotional states, a third dimension of dominance can
also be included in the model [9]. Arousal can range from
inactive (e.g. uninterested, bored) to active (e.g. alert,
excited), whereas valence ranges from unpleasant (e.g.
sad, stressed) to pleasant (e.g. happy, elated). Dominance
ranges from a helpless and weak feeling (without con-
trol) to an empowered feeling (in control of everything).
For self-assessment along these scales, we use the well-
known self-assessment manikins (SAM) [10].
Emotion assessment is often carried out through anal-
ysis of users’ emotional expressions and/or physiolog-
ical signals. Emotional expressions refer to any observ-
able verbal and non-verbal behavior that communicates
emotion. So far, most of the studies on emotion as-
sessment have focused on the analysis of facial expres-
sions and speech to determine a person’s emotional
state. Physiological signals are also known to include
emotional information that can be used for emotion
assessment but they have received less attention. They
comprise the signals originating from the central nervous
system (CNS) and the peripheral nervous system (PNS).
Recent advances in emotion recognition have mo-
tivated the creation of novel databases containing
emotional expressions in different modalities. These
databases mostly cover speech, visual, or audiovisual
data (e.g. [11]–[15]). The visual modality includes facial
expressions and/or body gestures. The audio modality
covers posed or genuine emotional speech in different
languages. Many of the existing visual databases include
only posed or deliberately expressed emotions.
Healey [16], [17] recorded one of the first affective
physiological datasets. She recorded 24 participants driv-
ing around the Boston area and annotated the dataset
by the drivers’ stress level. 17 of the 24 participants’
responses are publicly available
(http://www.physionet.org/pn3/drivedb/). Her recordings include
electrocardiogram (ECG), galvanic skin response (GSR)
recorded from hands and feet, electromyogram (EMG)
from the right trapezius muscle and respiration patterns.
To the best of our knowledge, the only publicly avail-
able multi-modal emotional databases which include
both physiological responses and facial expressions are
the eNTERFACE 2005 emotional database and MAHNOB
HCI [4], [5]. The first one was recorded by Savran
et al. [5]. This database includes two sets. The first
set has electroencephalogram (EEG), peripheral physi-
ological signals, functional near infra-red spectroscopy
(fNIRS) and facial videos from 5 male participants. The
second dataset only has fNIRS and facial videos from 16
participants of both genders. Both databases recorded
spontaneous responses to emotional images from the
international affective picture system (IAPS) [18]. An
extensive review of affective audiovisual databases can
be found in [13], [19]. The MAHNOB HCI database [4]
consists of two experiments. The responses, including
EEG, physiological signals, eye gaze, audio and facial
expressions, of 30 people were recorded. The first exper-
iment was watching 20 emotional videos extracted from
movies and online repositories. The second experiment
was a tag agreement experiment in which images and
short videos with human actions were shown to the partic-
ipants first without a tag and then with a displayed tag.
The tags were either correct or incorrect and participants’
agreement with the displayed tag was assessed.
There have been a large number of published works
in the domain of emotion recognition from physiologi-
cal signals [16], [20]–[24]. Of these studies, only a few
achieved notable results using video stimuli. Lisetti and
Nasoz used physiological responses to recognize emo-
tions in response to movie scenes [23]. The movie scenes
were selected to elicit six emotions, namely sadness,
amusement, fear, anger, frustration and surprise. They
achieved a high recognition rate of 84% for the recog-
nition of these six emotions. However, the classification
was based on the analysis of the signals in response to
pre-selected segments of the shown video known to be
related to highly emotional events.
Some efforts have been made towards implicit affec-
tive tagging of multimedia content. Kierkels et al. [25]
proposed a method for personalized affective tagging
of multimedia using peripheral physiological signals.
Valence and arousal levels of participants’ emotions
when watching videos were computed from physiolog-
ical responses using linear regression [26]. Quantized
arousal and valence levels for a clip were then mapped
to emotion labels. This mapping enabled the retrieval of
video clips based on keyword queries. So far, however,
this novel method has achieved only low precision.
Yazdani et al. [27] proposed using a brain computer
interface (BCI) based on P300 evoked potentials to emo-
tionally tag videos with one of the six Ekman basic
emotions [28]. Their system was trained with 8 partici-
pants and then tested on 4 others. They achieved a high
accuracy on selecting tags. However, in their proposed
system, a BCI only replaces the interface for explicit
expression of emotional tags, i.e. the method does not
implicitly tag a multimedia item using the participant’s
behavioral and psycho-physiological responses.
In addition to implicit tagging using behavioral
cues, multiple studies used multimedia content analy-
sis (MCA) for automated affective tagging of videos.
Hanjalic et al. [29] introduced “personalized content
delivery” as a valuable tool in affective indexing and
retrieval systems. In order to represent affect in video,
they first selected video- and audio-content features
based on their relation to the valence-arousal space.
Then, arising emotions were estimated in this space by
combining these features. While valence-arousal could
be used separately for indexing, they combined these
values by following their temporal pattern. This allowed
for determining an affect curve, shown to be useful for
extracting video highlights in a movie or sports video.
Wang and Cheong [30] used audio and video features
to classify basic emotions elicited by movie scenes. Au-
dio was classified into music, speech and environment
signals and these were treated separately to shape an
aural affective feature vector. The aural affective vector
of each scene was fused with video-based features such
as key lighting and visual excitement to form a scene
feature vector. Finally, using the scene feature vectors,
movie scenes were classified and labeled with emotions.
Soleymani et al. proposed a scene affective character-
ization using a Bayesian framework [31]. Arousal and
valence of each shot were first determined using linear
regression. Then, arousal and valence values in addition
to content features of each scene were used to classify
every scene into three classes, namely calm, excited pos-
itive and excited negative. The Bayesian framework was
able to incorporate the movie genre and the predicted
emotion from the last scene or temporal information to
improve the classification accuracy.
There are also various studies on music affective char-
acterization from acoustic features [32]–[34]. Rhythm,
tempo, Mel-frequency cepstral coefficients (MFCC),
pitch and zero crossing rate are amongst common features
which have been used to characterize affect in music.
A pilot study for the current work was presented in
[35]. In that study, 6 participants’ EEG and physiological
signals were recorded as each watched 20 music videos.
The participants rated arousal and valence levels and
the EEG and physiological signals for each video were
classified into low/high arousal/valence classes.
In the current work, music video clips are used as the
visual stimuli to elicit different emotions. To this end,
a relatively large set of music video clips was gathered
using a novel stimuli selection method. A subjective test
was then performed to select the most appropriate test
material. For each video, a one-minute highlight was
selected automatically. 32 participants took part in the
experiment and their EEG and peripheral physiological
signals were recorded as they watched the 40 selected
music videos. Participants rated each video in terms of
arousal, valence, like/dislike, dominance and familiarity.
For 22 participants, frontal face video was also recorded.
This paper aims to introduce this publicly available
database (http://www.eecs.qmul.ac.uk/mmv/datasets/deap/).
The database contains all recorded signal data,
frontal face video for a subset of the participants and
subjective ratings from the participants. Also included
are the subjective ratings from the initial online subjective
annotation and the list of 120 videos used. Due to
licensing issues, we are not able to include the actual
videos, but YouTube links are included. Table 1 gives an
overview of the database contents.
TABLE 1
Database content summary

Online subjective annotation
  Number of videos:         120
  Video duration:           1-minute affective highlight (see Section 2.2)
  Selection method:         60 via last.fm affective tags, 60 manually selected
  No. of ratings per video: 14-16
  Rating scales:            arousal, valence, dominance
  Rating values:            discrete scale of 1-9

Physiological experiment
  Number of participants:   32
  Number of videos:         40
  Selection method:         subset of online annotated videos with clearest responses (see Section 2.3)
  Rating scales:            arousal, valence, dominance, liking (how much do you like the video?),
                            familiarity (how well do you know the video?)
  Rating values:            familiarity: discrete scale of 1-5; others: continuous scale of 1-9
  Recorded signals:         32-channel 512 Hz EEG, peripheral physiological signals,
                            face video (for 22 participants)
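For readers who download the dataset, the following minimal Python sketch shows how one participant's recordings might be loaded from the preprocessed Python release. The file name (s01.dat), the pickle encoding and the array shapes are assumptions taken from the dataset's public download page rather than statements made in this paper, and may differ between releases.

# Assumed layout of the preprocessed Python release: one pickled dict per
# participant with a 'data' array (trials x channels x samples) and a
# 'labels' array (trials x [valence, arousal, dominance, liking]).
import pickle

with open("s01.dat", "rb") as f:                 # one file per participant (assumed name)
    subject = pickle.load(f, encoding="latin1")

signals = subject["data"]     # expected shape: (40 trials, 40 channels, 8064 samples)
ratings = subject["labels"]   # expected shape: (40 trials, 4)
print(signals.shape, ratings.shape)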
To the best of our knowledge, this database has the
highest number of participants among publicly available
databases for the analysis of spontaneous emotions from
physiological signals. In addition, it is the only database
that uses music videos as emotional stimuli.
We present an extensive statistical analysis of the
participants’ ratings and of the correlates between the
EEG signals and the ratings. Preliminary single-trial
classification results for EEG, peripheral physiological
signals and MCA are presented and compared. Finally,
a fusion algorithm is utilized to combine the results of
each modality and arrive at a more robust decision.
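As an illustration of what such single-trial classification and decision fusion can look like, the sketch below trains one Gaussian naive Bayes classifier per modality, averages the predicted class probabilities as a simple stand-in for the fusion step, and compares per-participant F1 scores to the 0.5 baseline with a one-sample t-test (the significance test mentioned later in the paper). The features, the fusion rule and all numbers are placeholders; the paper's actual pipeline is described in Section 6.

# Illustrative sketch only: the paper's exact features, classifier settings and
# fusion rule are not given in this excerpt.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import f1_score
from scipy.stats import ttest_1samp

rng = np.random.default_rng(0)

def fused_f1(features_per_modality, labels, train, test):
    """Train one classifier per modality, average class probabilities, return F1."""
    probas = []
    for X in features_per_modality:
        clf = GaussianNB().fit(X[train], labels[train])
        probas.append(clf.predict_proba(X[test])[:, 1])
    fused = (np.mean(probas, axis=0) > 0.5).astype(int)
    return f1_score(labels[test], fused)

# Toy stand-ins for per-trial EEG, peripheral and MCA feature vectors of one
# participant (40 trials each) with binary low/high arousal labels.
n_trials = 40
modalities = [rng.normal(size=(n_trials, d)) for d in (32, 8, 10)]
y = rng.integers(0, 2, size=n_trials)

train, test = np.arange(0, 30), np.arange(30, 40)
print("fused F1:", fused_f1(modalities, y, train, test))

# Across participants, the per-participant F1 scores would be compared to the
# 0.5 baseline with a one-sample t-test, as mentioned in the paper.
f1_per_participant = rng.uniform(0.4, 0.7, size=32)  # placeholder values
print(ttest_1samp(f1_per_participant, 0.5))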
The layout of the paper is as follows. In Section 2
the stimuli selection procedure is described in detail.
The experiment setup is covered in Section 3. Section
4 provides a statistical analysis of the ratings given by
participants during the experiment and a validation of
our stimuli selection method. In Section 5, correlates be-
tween the EEG frequencies and the participants’ ratings
are presented. The method and results of single-trial
classification are given in Section 6. The conclusion of
this work follows in Section 7.
2 STIMULI SELECTION
The stimuli used in the experiment were selected in
several steps. First, we selected 120 initial stimuli, half
of which were chosen semi-automatically and the rest
manually. Then, a one-minute highlight part was deter-
mined for each stimulus. Finally, through a web-based
subjective assessment experiment, 40 final stimuli were
selected. Each of these steps is explained below.

2.1 Initial stimuli selection
Eliciting emotional reactions from test participants is a
difficult task and selecting the most effective stimulus
materials is crucial. We propose here a semi-automated
method for stimulus selection, with the goal of minimiz-
ing the bias arising from manual stimuli selection.
60 of the 120 initially selected stimuli were selected
using the Last.fm music enthusiast website
(http://www.last.fm). Last.fm
allows users to track their music listening habits and
receive recommendations for new music and events.
Additionally, it allows the users to assign tags to individ-
ual songs, thus creating a folksonomy of tags. Many of
the tags carry emotional meanings, such as ’depressing’
or ’aggressive’. Last.fm offers an API, allowing one to
retrieve tags and tagged songs.
A list of emotional keywords was taken from [7] and
expanded to include inflections and synonyms, yielding
304 keywords. Next, for each keyword, corresponding
tags were found in the Last.fm database. For each found
affective tag, the ten songs most often labeled with this
tag were selected. This resulted in a total of 1084 songs.
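This tag-based retrieval step could be reproduced against the current Last.fm web API roughly as follows. The method name (tag.gettoptracks), the response fields and the API key are assumptions based on the public API documentation; the paper only states that the ten songs most often labeled with each affective tag were retrieved.

# Rough sketch, not the authors' code: query the Last.fm web API for the top
# tracks of an affective tag and return (artist, title) pairs.
import requests

API_KEY = "YOUR_LASTFM_API_KEY"   # hypothetical placeholder
API_ROOT = "http://ws.audioscrobbler.com/2.0/"

def top_tracks_for_tag(tag, limit=10):
    """Return (artist, title) pairs for the tracks most often labeled with `tag`."""
    params = {
        "method": "tag.gettoptracks",
        "tag": tag,
        "limit": limit,
        "api_key": API_KEY,
        "format": "json",
    }
    response = requests.get(API_ROOT, params=params, timeout=10)
    response.raise_for_status()
    tracks = response.json()["tracks"]["track"]
    return [(t["artist"]["name"], t["name"]) for t in tracks]

# e.g. build a candidate pool from a list of affective keywords
candidates = {kw: top_tracks_for_tag(kw) for kw in ["depressing", "aggressive", "joyful"]}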
The valence-arousal space can be subdivided into 4
quadrants, namely low arousal/low valence (LALV), low
arousal/high valence (LAHV), high arousal/low valence
(HALV) and high arousal/high valence (HAHV). In
order to ensure diversity of induced emotions, from the
1084 songs, 15 were selected manually for each quadrant
according to the following criteria:
Does the tag accurately reflect the emotional content?
Examples of songs subjectively rejected according to this
criterion include songs that are tagged merely because
the song title or artist name corresponds to the tag.
Also, in some cases the lyrics may correspond to the tag,
but the actual emotional content of the song is entirely
different (e.g. happy songs about sad topics).
Is a music video available for the song?
Music videos for the songs were automatically retrieved
from YouTube, corrected manually where necessary.
However, many songs do not have a music video.
Is the song appropriate for use in the experiment?
Since our test participants were mostly European stu-
dents, we selected those songs most likely to elicit
emotions for this target demographic. Therefore, mainly
European or North American artists were selected.
In addition to the songs selected using the method
described above, 60 stimulus videos were selected man-
ually, with 15 videos selected for each of the quadrants
in the arousal/valence space. The goal here was to select
those videos expected to induce the clearest emotional
reactions for each of the quadrants. The combination
of manual selection and selection using affective tags
produced a list of 120 candidate stimulus videos.
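A minimal sketch of the quadrant assignment defined above: a (valence, arousal) rating pair on the 9-point scale is mapped to one of the labels LALV, LAHV, HALV or HAHV. The midpoint of 5 as the high/low threshold is an illustrative assumption, not a rule stated in the paper.

def va_quadrant(valence, arousal, midpoint=5.0):
    """Map a (valence, arousal) rating pair on the 1-9 scale to a quadrant label."""
    v = "HV" if valence >= midpoint else "LV"
    a = "HA" if arousal >= midpoint else "LA"
    return a + v  # e.g. "LALV", "HAHV"

print(va_quadrant(7.2, 3.1))  # -> "LAHV"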
2.2 Detection of one-minute highlights
For each of the 120 initially selected music videos, a one-
minute segment for use in the experiment was extracted.
In order to extract a segment with maximum emotional
content, an affective highlighting algorithm is proposed.
Soleymani et al. [31] used a linear regression method
to calculate arousal for each shot in movies. In their
method, the arousal and valence of shots was computed
using a linear regression on the content-based features.
Informative features for arousal estimation include loud-
ness and energy of the audio signals, motion component,
visual excitement and shot duration. The same approach
was used to compute valence. There are other content
features such as color variance and key lighting that
have been shown to be correlated with valence [30]. The
detailed description of the content features used in this
work is given in Section 6.2.
In order to find the best weights for arousal and
valence estimation using regression, the regressors were
trained on all shots in 21 annotated movies in the dataset
presented in [31]. The linear weights were computed by
means of a relevance vector machine (RVM) from the
RVM toolbox provided by Tipping [36]. The RVM is able
to reject uninformative features during its training; hence
no further feature selection was used for arousal and
valence determination.
The music videos were then segmented into one-
minute segments with 55 seconds overlap between seg-
ments. Content features were extracted and provided the
input for the regressors. The emotional highlight score
of the i-th segment, e_i, was computed using the following
equation:

    e_i = √(a_i² + v_i²)        (1)

The arousal, a_i, and valence, v_i, were centered. There-
fore, a smaller emotional highlight score (e_i) is closer
to the neutral state. For each video, the one-minute-
long segment with the highest emotional highlight score
was chosen to be extracted for the experiment. For a
few clips, the automatic affective highlight detection was
manually overridden. This was done only for songs with
segments that are particularly characteristic of the song,
well-known to the public, and most likely to elicit emo-
tional reactions. In these cases, the one-minute highlight
was selected so that these segments were included.
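The sketch below illustrates this highlight-selection step under stated assumptions: scikit-learn's ARDRegression stands in for the RVM toolbox used by the authors, and the content features and annotations are random placeholders. Only the window length, the 55-second overlap and the scoring of Equation (1) follow the text.

import numpy as np
from sklearn.linear_model import ARDRegression

rng = np.random.default_rng(1)

# Placeholder training data: content features of annotated movie shots with
# arousal and valence targets (stand-in for the 21 annotated movies of [31]).
X_train = rng.normal(size=(200, 12))
arousal_model = ARDRegression().fit(X_train, rng.normal(size=200))
valence_model = ARDRegression().fit(X_train, rng.normal(size=200))

# One-minute windows with 55 s overlap, i.e. one candidate window every 5 s.
# Each row stands for the content features of one candidate window of a video.
window_features = rng.normal(size=(40, 12))
a = arousal_model.predict(window_features)
v = valence_model.predict(window_features)
a, v = a - a.mean(), v - v.mean()        # center, as in the paper
highlight_score = np.sqrt(a**2 + v**2)   # Equation (1)

best = int(np.argmax(highlight_score))
print(f"selected window index: {best} (starting at {5 * best} s)")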
Given the 120 one-minute music video segments, the
final selection of 40 videos used in the experiment was
made on the basis of subjective ratings by volunteers, as
described in the next section.
2.3 Online subjective annotation
From the initial collection of 120 stimulus videos, the
final 40 test video clips were chosen by using a web-
based subjective emotion assessment interface. Partici-
pants watched music videos and rated them on a discrete
9-point scale for valence, arousal and dominance. A
screenshot of the interface is shown in Fig. 1. Each
participant watched as many videos as he/she wanted
and was able to end the rating at any time. The order of

Citations
Journal ArticleDOI
TL;DR: Results show the potential uses of the recorded modalities and the significance of the emotion elicitation protocol; single-modality and modality-fusion results for both emotion recognition and implicit tagging experiments are reported.
Abstract: MAHNOB-HCI is a multimodal database recorded in response to affective stimuli with the goal of emotion recognition and implicit tagging research. A multimodal setup was arranged for synchronized recording of face videos, audio signals, eye gaze data, and peripheral/central nervous system physiological signals. Twenty-seven participants from both genders and different cultural backgrounds participated in two experiments. In the first experiment, they watched 20 emotional videos and self-reported their felt emotions using arousal, valence, dominance, and predictability as well as emotional keywords. In the second experiment, short videos and images were shown once without any tag and then with correct or incorrect tags. Agreement or disagreement with the displayed tags was assessed by the participants. The recorded videos and bodily responses were segmented and stored in a database. The database is made available to the academic community via a web-based system. The collected data were analyzed and single modality and modality fusion results for both emotion recognition and implicit tagging experiments are reported. These results show the potential uses of the recorded modalities and the significance of the emotion elicitation protocol.

1,162 citations


Cites background from "DEAP: A Database for Emotion Analys..."

  • ...An area of commerce that could obviously benefit from an automatic understanding of human emotional experience is the multimedia sector....

    [...]

Journal ArticleDOI
TL;DR: The experiment results show that neural signatures associated with different emotions do exist and that they share commonality across sessions and individuals; the performance of deep models is compared with that of shallow models.
Abstract: To investigate critical frequency bands and channels, this paper introduces deep belief networks (DBNs) to constructing EEG-based emotion recognition models for three emotions: positive, neutral and negative. We develop an EEG dataset acquired from 15 subjects. Each subject performs the experiments twice at the interval of a few days. DBNs are trained with differential entropy features extracted from multichannel EEG data. We examine the weights of the trained DBNs and investigate the critical frequency bands and channels. Four different profiles of 4, 6, 9, and 12 channels are selected. The recognition accuracies of these four profiles are relatively stable with the best accuracy of 86.65%, which is even better than that of the original 62 channels. The critical frequency bands and channels determined by using the weights of trained DBNs are consistent with the existing observations. In addition, our experiment results show that neural signatures associated with different emotions do exist and they share commonality across sessions and individuals. We compare the performance of deep models with shallow models. The average accuracies of DBN, SVM, LR, and KNN are 86.08%, 83.99%, 82.70%, and 72.60%, respectively.
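As a side note on the differential entropy features mentioned in this abstract: for a band-pass-filtered EEG segment that is approximately Gaussian, the differential entropy reduces to 0.5·ln(2πe·variance). The sketch below computes such features per channel and band; the band limits and sampling rate are illustrative and do not come from either paper.

# Sketch of band-wise differential entropy (DE) features for multichannel EEG.
import numpy as np
from scipy.signal import butter, filtfilt

FS = 512  # Hz (sampling rate used here for illustration)
BANDS = {"theta": (4, 8), "alpha": (8, 12), "beta": (12, 30), "gamma": (30, 45)}

def differential_entropy(eeg, fs=FS):
    """Return per-channel DE values for each frequency band.

    eeg: array of shape (channels, samples).
    """
    features = {}
    for name, (lo, hi) in BANDS.items():
        b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        filtered = filtfilt(b, a, eeg, axis=-1)
        variance = filtered.var(axis=-1)
        features[name] = 0.5 * np.log(2 * np.pi * np.e * variance)
    return features

segment = np.random.default_rng(2).normal(size=(32, FS * 4))  # 4 s of toy data
de = differential_entropy(segment)
print({band: vals.shape for band, vals in de.items()})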

1,131 citations


Additional excerpts

  • ...The DEAP dataset includes the EEG and peripheral physiological signals of 32 participants when watching 40 one-minute music videos....

    [...]

  • ...To the best of our knowledge, the popular publicly available emotional EEG datasets are MAHNOB HCI [3] and DEAP [45]....

    [...]

Journal ArticleDOI
TL;DR: This first-of-its-kind, comprehensive literature review of the diverse field of affective computing focuses mainly on the use of audio, visual and text information for multimodal affect analysis, and outlines existing methods for fusing information from different modalities.

969 citations

Journal ArticleDOI
TL;DR: AffectNet is by far the largest database of facial expression, valence, and arousal in the wild, enabling research on automated facial expression recognition in two different emotion models; various evaluation metrics show that the deep neural network baselines can perform better than conventional machine learning methods and off-the-shelf facial expression recognition systems.
Abstract: Automated affective computing in the wild setting is a challenging problem in computer vision. Existing annotated databases of facial expressions in the wild are small and mostly cover discrete emotions (aka the categorical model). There are very limited annotated facial databases for affective computing in the continuous dimensional model (e.g., valence and arousal). To meet this need, we collected, annotated, and prepared for public distribution a new database of facial emotions in the wild (called AffectNet). AffectNet contains more than 1,000,000 facial images from the Internet by querying three major search engines using 1250 emotion related keywords in six different languages. About half of the retrieved images were manually annotated for the presence of seven discrete facial expressions and the intensity of valence and arousal. AffectNet is by far the largest database of facial expression, valence, and arousal in the wild enabling research in automated facial expression recognition in two different emotion models. Two baseline deep neural networks are used to classify images in the categorical model and predict the intensity of valence and arousal. Various evaluation metrics show that our deep neural network baselines can perform better than conventional machine learning methods and off-the-shelf facial expression recognition systems.

937 citations


Cites background or methods from "DEAP: A Database for Emotion Analys..."

  • ...The Database for Emotion Analysis using Physiological Signals (DEAP) [26] consists of spontaneous reactions of 32 participants in response to one-minute long music video clip....

    [...]

  • ...[26] DEAP - Gaussian naive Bayes classifier - EEG, physiological signals, and multimedia features - Binary classification of low/high arousal, valence, and liking - 0....

    [...]

  • ...DEAP [26] - 40 one-minute long videos shown to subjects - EEG signals recorded - 32 - Controlled - Spontaneous - Valence and arousal (continuous) - Self assessment...

    [...]

  • ...The Database for Emotion Analysis using Physiological Signals (DEAP) [26] consists of spontaneous reactions of 32 participants in response to one-minute long music video clip....

    [...]

  • ...DEAP is a great database to study the relation of biological signals and dimensional affect, however, it has only a few subjects and the videos are captured in lab controlled settings....

    [...]

Journal ArticleDOI
TL;DR: Practical suggestions on the selection of many hyperparameters are provided in the hope that they will promote or guide the deployment of deep learning to EEG datasets in future research.
Abstract: Objective Electroencephalography (EEG) analysis has been an important tool in neuroscience with applications in neuroscience, neural engineering (e.g. Brain-computer interfaces, BCI's), and even commercial applications. Many of the analytical tools used in EEG studies have used machine learning to uncover relevant information for neural classification and neuroimaging. Recently, the availability of large EEG data sets and advances in machine learning have both led to the deployment of deep learning architectures, especially in the analysis of EEG signals and in understanding the information it may contain for brain functionality. The robust automatic classification of these signals is an important step towards making the use of EEG more practical in many applications and less reliant on trained professionals. Towards this goal, a systematic review of the literature on deep learning applications to EEG classification was performed to address the following critical questions: (1) Which EEG classification tasks have been explored with deep learning? (2) What input formulations have been used for training the deep networks? (3) Are there specific deep learning network structures suitable for specific types of tasks? Approach A systematic literature review of EEG classification using deep learning was performed on Web of Science and PubMed databases, resulting in 90 identified studies. Those studies were analyzed based on type of task, EEG preprocessing methods, input type, and deep learning architecture. Main results For EEG classification tasks, convolutional neural networks, recurrent neural networks, deep belief networks outperform stacked auto-encoders and multi-layer perceptron neural networks in classification accuracy. The tasks that used deep learning fell into five general groups: emotion recognition, motor imagery, mental workload, seizure detection, event related potential detection, and sleep scoring. For each type of task, we describe the specific input formulation, major characteristics, and end classifier recommendations found through this review. Significance This review summarizes the current practices and performance outcomes in the use of deep learning for EEG classification. Practical suggestions on the selection of many hyperparameters are provided in the hope that they will promote or guide the deployment of deep learning to EEG datasets in future research.

777 citations


Cites background or methods from "DEAP: A Database for Emotion Analys..."

  • ...The datasets that are used in more than one study within this review include an emotion recognition dataset ([56], various subsets used in ten studies), a mental workload dataset ([57], used in four studies), two motor imagery datasets from the same BCI competition ([58], one dataset used in six studies and the second dataset used in two studies), a seizure detection dataset ([33], used in three studies), and two sleep stage scoring datasets ([59, 60], which were used in groups of two studies and three studies, respectively)....

    [...]

  • ...The common emotion recognition dataset (DEAP) [56], the dataset analyzed by the highest number of studies, is a collection of EEG and peripheral signals from 32 subjects participating in a human affective state task....

    [...]

  • ...A comparison of architecture and input choices across studies using the publicly available DEAP [56] dataset....

    [...]

References
Journal ArticleDOI

12,519 citations



Journal ArticleDOI
TL;DR: Reports of affective experience obtained using SAM are compared to the Semantic Differential scale devised by Mehrabian and Russell (An approach to environmental psychology, 1974), which requires 18 different ratings.

7,472 citations


"DEAP: A Database for Emotion Analys..." refers methods in this paper

  • ...For self-assessment along these scales, we use the well-known self-assessment manikins (SAM) [10]....

    [...]


  • ...From the top: Valence SAM, arousal SAM, dominance SAM, and liking....

    [...]

Book
01 Jan 1997
TL;DR: Key issues in affective computing, "computing that relates to, arises from, or influences emotions", are presented, and new applications are described for computer-assisted learning, perceptual information retrieval, arts and entertainment, and human health and interaction.
Abstract: Computers are beginning to acquire the ability to express and recognize affect, and may soon be given the ability to "have emotions." The essential role of emotion in both human cognition and perception, as demonstrated by recent neurological studies, indicates that affective computers should not only provide better performance in assisting humans, but also might enhance computers' abilities to make decisions. This paper presents and discusses key issues in "affective computing," computing that relates to, arises from, or influences emotions. Models are suggested for computer recognition of human emotion, and new applications are presented for computer-assisted learning, perceptual information retrieval, arts and entertainment, and human health and interaction. Affective computing, coupled with new wearable computers, will also provide the ability to gather new data necessary for advances in emotion and cognition theory. Nothing in life is to be feared. It is only to be understood. – Marie Curie Emotions have a stigma in science; they are believed to be inherently non-scientific. Scientific principles are derived from rational thought, logical arguments, testable hypotheses, and repeatable experiments. There is room alongside science for "non-interfering" emotions such as those involved in curiosity, frustration, and the pleasure of discovery. In fact, much scientific research has been prompted by fear. Nonetheless, the role of emotions is marginalized at best. Why bring "emotion" or "affect" into any of the deliberate tools of science? Moreover, shouldn't it be completely avoided when considering properties to design into computers? After all, computers control significant parts of our lives – the phone system, the stock market, nuclear power plants, jet landings, and more. Who wants a computer to be able to "feel angry" at them? To feel contempt for any living thing? In this essay I will submit for discussion a set of ideas on what I call "affective computing," computing that relates to, arises from, or influences emotions. This will need some further clarification which I shall attempt below. I should say up front that I am not proposing the pursuit of computerized cingulotomies (the making of small wounds in the ridge of the limbic system known as the cingulate gyrus, a surgical procedure to aid severely depressed patients) or even into the business of building "emotional computers". Nor will I propose answers to the difficult and intriguing questions, "…

5,700 citations


Additional excerpts

  • ...are related to valence [58]....

    [...]

Journal Article

5,359 citations



Frequently Asked Questions (11)
Q1. What are some common features used to characterize affect in music?

Rhythm, tempo, Mel-frequency cepstral coefficients (MFCC), pitch and zero crossing rate are amongst common features which have been used to characterize affect in music.

The authors present a multimodal data set for the analysis of human affective states. An extensive analysis of the participants’ ratings during the experiment is presented.

To test for significance, an independent one-sample t-test was performed, comparing the F1-distribution over participants to the 0.5 baseline. 

There are other content features such as color variance and key lighting that have been shown to be correlated with valence [30]. 

The valence-arousal space can be subdivided into 4 quadrants, namely low arousal/low valence (LALV), low arousal/high valence (LAHV), high arousal/low valence (HALV) and high arousal/high valence (HAHV). 

Of the 40 selected videos, 17 were selected via Last.fm affective tags, indicating that useful stimuli can be selected via this method. 

Physiological signals are also known to include emotional information that can be used for emotion assessment but they have received less attention. 

To the best of their knowledge, the only publicly available multi-modal emotional databases which include both physiological responses and facial expressions are the eNTERFACE 2005 emotional database and MAHNOB HCI [4], [5].

The emotional highlight score of the i-th segment, e_i, was computed using the equation e_i = √(a_i² + v_i²); the arousal, a_i, and valence, v_i, were centered.

After the experiment, participants were asked to rate their familiarity with each of the songs on a scale of 1 (“Never heard it before the experiment”) to 5 (“Knew the song very well”).

The participants rated arousal and valence levels and the EEG and physiological signals for each video were classified into low/high arousal/valence classes. 

Trending Questions (1)
What are the different emotion datasets?

The paper mentions several emotion datasets, including one recorded by Healey with physiological signals and others with speech, visual, or audiovisual data.