
Showing papers in "IEEE Transactions on Affective Computing in 2012"


Journal ArticleDOI
TL;DR: A multimodal data set for the analysis of human affective states was presented and a novel method for stimuli selection is proposed using retrieval by affective tags from the last.fm website, video highlight detection, and an online assessment tool.
Abstract: We present a multimodal data set for the analysis of human affective states. The electroencephalogram (EEG) and peripheral physiological signals of 32 participants were recorded as each watched 40 one-minute long excerpts of music videos. Participants rated each video in terms of the levels of arousal, valence, like/dislike, dominance, and familiarity. For 22 of the 32 participants, frontal face video was also recorded. A novel method for stimuli selection is proposed using retrieval by affective tags from the last.fm website, video highlight detection, and an online assessment tool. An extensive analysis of the participants' ratings during the experiment is presented. Correlates between the EEG signal frequencies and the participants' ratings are investigated. Methods and results are presented for single-trial classification of arousal, valence, and like/dislike ratings using the modalities of EEG, peripheral physiological signals, and multimedia content analysis. Finally, decision fusion of the classification results from different modalities is performed. The data set is made publicly available and we encourage other researchers to use it for testing their own affective state estimation methods.

3,013 citations
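
As an illustration of the decision-fusion step mentioned in the abstract, the sketch below averages class posteriors from independently trained per-modality classifiers (e.g., EEG, peripheral signals, multimedia content). It is a minimal sketch assuming scikit-learn and NumPy; the Gaussian naive Bayes choice and the placeholder feature arrays are assumptions, not the authors' exact pipeline.

```python
# Minimal sketch (not the authors' code): decision-level fusion of
# per-modality classifiers by averaging their class posteriors.
import numpy as np
from sklearn.naive_bayes import GaussianNB  # classifier choice is an assumption

def fuse_decisions(train_features_per_modality, train_labels, test_features_per_modality):
    """Train one classifier per modality, then average posteriors over modalities."""
    posteriors = []
    for X_train, X_test in zip(train_features_per_modality, test_features_per_modality):
        clf = GaussianNB().fit(X_train, train_labels)
        posteriors.append(clf.predict_proba(X_test))
    fused = np.mean(posteriors, axis=0)   # equal-weight decision fusion
    return fused.argmax(axis=1)           # fused class predictions
```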


Journal ArticleDOI
TL;DR: Results show the potential uses of the recorded modalities and the significance of the emotion elicitation protocol; single-modality and modality-fusion results are reported for both emotion recognition and implicit tagging experiments.
Abstract: MAHNOB-HCI is a multimodal database recorded in response to affective stimuli with the goal of emotion recognition and implicit tagging research. A multimodal setup was arranged for synchronized recording of face videos, audio signals, eye gaze data, and peripheral/central nervous system physiological signals. Twenty-seven participants from both genders and different cultural backgrounds participated in two experiments. In the first experiment, they watched 20 emotional videos and self-reported their felt emotions using arousal, valence, dominance, and predictability as well as emotional keywords. In the second experiment, short videos and images were shown once without any tag and then with correct or incorrect tags. Agreement or disagreement with the displayed tags was assessed by the participants. The recorded videos and bodily responses were segmented and stored in a database. The database is made available to the academic community via a web-based system. The collected data were analyzed and single modality and modality fusion results for both emotion recognition and implicit tagging experiments are reported. These results show the potential uses of the recorded modalities and the significance of the emotion elicitation protocol.

1,162 citations


Journal ArticleDOI
TL;DR: A large audiovisual database is created as a part of an iterative approach to building Sensitive Artificial Listener agents that can engage a person in a sustained, emotionally colored conversation.
Abstract: SEMAINE has created a large audiovisual database as a part of an iterative approach to building Sensitive Artificial Listener (SAL) agents that can engage a person in a sustained, emotionally colored conversation. Data used to build the agents came from interactions between users and an "operator" simulating a SAL agent, in different configurations: Solid SAL (designed so that operators displayed an appropriate nonverbal behavior) and Semi-automatic SAL (designed so that users' experience approximated interacting with a machine). We then recorded user interactions with the developed system, Automatic SAL, comparing the most communicatively competent version to versions with reduced nonverbal skills. High quality recording was provided by five high-resolution, high-framerate cameras, and four microphones, recorded synchronously. Recordings total 150 participants, for a total of 959 conversations with individual SAL characters, lasting approximately 5 minutes each. Solid SAL recordings are transcribed and extensively annotated: 6-8 raters per clip traced five affective dimensions and 27 associated categories. Other scenarios are labeled on the same pattern, but less fully. Additional information includes FACS annotation on selected extracts, identification of laughs, nods, and shakes, and measures of user engagement with the automatic system. The material is available through a web-accessible database.

627 citations


Journal ArticleDOI
TL;DR: The results over a population of 24 participants demonstrate that user-independent emotion recognition can outperform individual self-reports for arousal assessments and does not underperform for valence assessments.
Abstract: This paper presents a user-independent emotion recognition method with the goal of recovering affective tags for videos using electroencephalogram (EEG), pupillary response and gaze distance. We first selected 20 video clips with extrinsic emotional content from movies and online resources. Then, EEG responses and eye gaze data were recorded from 24 participants while watching emotional video clips. Ground truth was defined based on the median arousal and valence scores given to clips in a preliminary study using an online questionnaire. Based on the participants' responses, three classes for each dimension were defined. The arousal classes were calm, medium aroused, and activated and the valence classes were unpleasant, neutral, and pleasant. One of the three affective labels of either valence or arousal was determined by classification of bodily responses. A one-participant-out cross validation was employed to investigate the classification performance in a user-independent approach. The best classification accuracies of 68.5 percent for three labels of valence and 76.4 percent for three labels of arousal were obtained using a modality fusion strategy and a support vector machine. The results over a population of 24 participants demonstrate that user-independent emotion recognition can outperform individual self-reports for arousal assessments and does not underperform for valence assessments.

582 citations
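
A minimal sketch of the one-participant-out evaluation protocol described above, using scikit-learn's LeaveOneGroupOut; the toy arrays X, y, and participant_ids are hypothetical stand-ins for the real features, labels, and trial ownership.

```python
# Minimal sketch (hypothetical data): leave-one-participant-out evaluation
# of an SVM, in the spirit of the protocol described in the abstract.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(24 * 20, 32))              # 24 participants x 20 clips, toy features
y = rng.integers(0, 3, size=24 * 20)            # three arousal (or valence) classes
participant_ids = np.repeat(np.arange(24), 20)  # which participant owns each trial

model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scores = cross_val_score(model, X, y, groups=participant_ids, cv=LeaveOneGroupOut())
print("mean leave-one-participant-out accuracy:", scores.mean())
```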


Journal ArticleDOI
TL;DR: This is the first survey of the domain that jointly considers its three major aspects, namely, modeling, analysis, and synthesis of social behavior: modeling investigates laws and principles underlying social interaction, and analysis explores approaches for automatic understanding of social exchanges recorded with different sensors.
Abstract: Social Signal Processing is the research domain aimed at bridging the social intelligence gap between humans and machines. This paper is the first survey of the domain that jointly considers its three major aspects, namely, modeling, analysis, and synthesis of social behavior. Modeling investigates laws and principles underlying social interaction, analysis explores approaches for automatic understanding of social exchanges recorded with different sensors, and synthesis studies techniques for the generation of social behavior via various forms of embodiment. For each of the above aspects, the paper includes an extensive survey of the literature, points to the most important publicly available resources, and outlines the most fundamental challenges ahead.

398 citations


Journal ArticleDOI
TL;DR: This work brings to the table the ECG signal and presents a thorough analysis of its psychological properties, differentiates for the first time between active and passive arousal, and advocates that there are higher chances of ECG reactivity to emotion when the induction method is active for the subject.
Abstract: Emotion modeling and recognition has drawn extensive attention from disciplines such as psychology, cognitive science, and, lately, engineering. Although a significant amount of research has been done on behavioral modalities, less explored characteristics include the physiological signals. This work brings to the table the ECG signal and presents a thorough analysis of its psychological properties. The fact that this signal has been established as a biometric characteristic calls for subject-dependent emotion recognizers that capture the instantaneous variability of the signal from its homeostatic baseline. A solution based on the empirical mode decomposition is proposed for the detection of dynamically evolving emotion patterns on ECG. Classification features are based on the instantaneous frequency (Hilbert-Huang transform) and the local oscillation within every mode. Two experimental setups are presented for the elicitation of active arousal and passive arousal/valence. The results support the expectations for subject specificity, as well as demonstrating the feasibility of determining valence out of the ECG morphology (up to 89 percent for 44 subjects). In addition, this work differentiates for the first time between active and passive arousal, and advocates that there are higher chances of ECG reactivity to emotion when the induction method is active for the subject.

357 citations
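
The abstract's features are built from empirical mode decomposition and instantaneous frequency (Hilbert-Huang transform). The sketch below shows one plausible way to compute such features for a 1-D ECG segment, assuming the PyEMD (EMD-signal) and SciPy packages; the per-IMF summary (mean and spread of instantaneous frequency) is an assumption, not the paper's exact feature definition.

```python
# Minimal sketch (assumed packages: PyEMD / EMD-signal, SciPy, NumPy):
# instantaneous-frequency features per intrinsic mode function of an ECG segment.
import numpy as np
from PyEMD import EMD
from scipy.signal import hilbert

def imf_frequency_features(ecg, fs):
    """Decompose an ECG segment into IMFs and summarize each IMF's
    instantaneous frequency (mean and standard deviation)."""
    imfs = EMD().emd(ecg)
    feats = []
    for imf in imfs:
        phase = np.unwrap(np.angle(hilbert(imf)))
        inst_freq = np.diff(phase) * fs / (2.0 * np.pi)   # instantaneous frequency in Hz
        feats.extend([inst_freq.mean(), inst_freq.std()])
    return np.array(feats)
```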


Journal ArticleDOI
TL;DR: The current questions asked by synchrony evaluation and the state-of-the-art related methods are emphasized and the noncomputational and computational approaches of annotating, evaluating, and modeling interactional synchrony are reviewed.
Abstract: Synchrony refers to individuals' temporal coordination during social interactions. The analysis of this phenomenon is complex, requiring the perception and integration of multimodal communicative signals. The evaluation of synchrony has received multidisciplinary attention because of its role in early development, language learning, and social connection. Originally studied by developmental psychologists, synchrony has now captured the interest of researchers in such fields as social signal processing, robotics, and machine learning. This paper emphasizes the current questions asked by synchrony evaluation and the state-of-the-art related methods. First, we present definitions and functions of synchrony in youth and adulthood. Next, we review the noncomputational and computational approaches of annotating, evaluating, and modeling interactional synchrony. Finally, the current limitations and future research directions in the fields of developmental robotics, social robotics, and clinical studies are discussed.

350 citations
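
Among the computational approaches the survey reviews, a common baseline is windowed cross-correlation between two behavioral time series (e.g., movement energy of two interaction partners). A minimal NumPy sketch, with window and step sizes chosen arbitrarily:

```python
# Minimal sketch: windowed Pearson correlation as a simple synchrony measure
# between two equal-length behavioral signals; window/step sizes are arbitrary.
import numpy as np

def windowed_synchrony(sig_a, sig_b, win=100, step=50):
    """Return per-window correlations between two 1-D signals."""
    corrs = []
    for start in range(0, len(sig_a) - win + 1, step):
        a, b = sig_a[start:start + win], sig_b[start:start + win]
        if a.std() > 0 and b.std() > 0:            # skip flat windows
            corrs.append(np.corrcoef(a, b)[0, 1])
    return np.array(corrs)
```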


Journal ArticleDOI
TL;DR: An automatic multiclass arousal/valence classifier is implemented comparing performance when extracted features from nonlinear methods are used as an alternative to standard features and results show that, when nonlinearly extracted features are used, the percentages of successful recognition dramatically increase.
Abstract: This paper reports on a new methodology for the automatic assessment of emotional responses. More specifically, emotions are elicited in agreement with a bidimensional spatial localization of affective states, that is, arousal and valence dimensions. A dedicated experimental protocol was designed and realized where specific affective states are suitably induced while three peripheral physiological signals, i.e., ElectroCardioGram (ECG), ElectroDermal Response (EDR), and ReSPiration activity (RSP), are simultaneously acquired. A group of 35 volunteers was presented with sets of images gathered from the International Affective Picture System (IAPS) having five levels of arousal and five levels of valence, including a neutral reference level in both. Standard methods as well as nonlinear dynamic techniques were used to extract sets of features from the collected signals. The goal of this paper is to implement an automatic multiclass arousal/valence classifier comparing performance when extracted features from nonlinear methods are used as an alternative to standard features. Results show that, when nonlinearly extracted features are used, the percentages of successful recognition dramatically increase. A good recognition accuracy (>90 percent) after 40-fold cross-validation steps for both arousal and valence classes was achieved by using the Quadratic Discriminant Classifier (QDC).

216 citations
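
A minimal sketch of the classification step named in the abstract (Quadratic Discriminant Classifier with 40-fold cross-validation), using scikit-learn; the random arrays are stand-ins for the real ECG/EDR/RSP feature vectors and the five arousal or valence levels.

```python
# Minimal sketch (toy data): Quadratic Discriminant Classifier with
# 40-fold cross-validation, as named in the abstract.
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(35 * 40, 12))      # stand-in for ECG/EDR/RSP feature vectors
y = rng.integers(0, 5, size=35 * 40)    # five arousal (or valence) levels

scores = cross_val_score(QuadraticDiscriminantAnalysis(), X, y, cv=40)
print("mean accuracy over 40 folds:", scores.mean())
```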


Journal ArticleDOI
TL;DR: A fully autonomous integrated real-time system which combines incremental analysis of user behavior, dialogue management, and synthesis of speaker and listener behavior of a SAL character displayed as a virtual agent is described.
Abstract: This paper describes a substantial effort to build a real-time interactive multimodal dialogue system with a focus on emotional and nonverbal interaction capabilities. The work is motivated by the aim to provide technology with competences in perceiving and producing the emotional and nonverbal behaviors required to sustain a conversational dialogue. We present the Sensitive Artificial Listener (SAL) scenario as a setting which seems particularly suited for the study of emotional and nonverbal behavior since it requires only very limited verbal understanding on the part of the machine. This scenario allows us to concentrate on nonverbal capabilities without having to address at the same time the challenges of spoken language understanding, task modeling, etc. We first report on three prototype versions of the SAL scenario in which the behavior of the Sensitive Artificial Listener characters was determined by a human operator. These prototypes served the purpose of verifying the effectiveness of the SAL scenario and allowed us to collect data required for building system components for analyzing and synthesizing the respective behaviors. We then describe the fully autonomous integrated real-time system we created, which combines incremental analysis of user behavior, dialogue management, and synthesis of speaker and listener behavior of a SAL character displayed as a virtual agent. We discuss principles that should underlie the evaluation of SAL-type systems. Since the system is designed for modularity and reuse and since it is publicly available, the SAL system has potential as a joint research tool in the affective computing research community.

196 citations


Journal ArticleDOI
TL;DR: This paper proposes an approach for the automatic prediction of the traits the listeners attribute to a speaker they have never heard before and shows that it is possible to predict with high accuracy whether a person is perceived to be in the upper or lower part of the scales corresponding to each of the Big Five, the personality dimensions known to capture most of the individual differences.
Abstract: Whenever we listen to a voice for the first time, we attribute personality traits to the speaker. The process takes place in a few seconds and it is spontaneous and unaware. While the process is not necessarily accurate (attributed traits do not necessarily correspond to the actual traits of the speaker), still it significantly influences our behavior toward others, especially when it comes to social interaction. This paper proposes an approach for the automatic prediction of the traits the listeners attribute to a speaker they have never heard before. The experiments are performed over a corpus of 640 speech clips (322 identities in total) annotated in terms of personality traits by 11 assessors. The results show that it is possible to predict with high accuracy (more than 70 percent depending on the particular trait) whether a person is perceived to be in the upper or lower part of the scales corresponding to each of the Big Five, the personality dimensions known to capture most of the individual differences.

170 citations


Journal ArticleDOI
TL;DR: An automated system is developed to distinguish between naturally occurring spontaneous smiles under frustrating and delightful stimuli by exploring their temporal patterns given video of both and extracting local and global features related to human smile dynamics.
Abstract: We create two experimental situations to elicit two affective states: frustration and delight. In the first experiment, participants were asked to recall situations while expressing either delight or frustration, while the second experiment tried to elicit these states naturally through a frustrating experience and through a delightful video. There were two significant differences in the nature of the acted versus natural occurrences of expressions. First, the acted instances were much easier for the computer to classify. Second, in 90 percent of the acted cases, participants did not smile when frustrated, whereas in 90 percent of the natural cases, participants smiled during the frustrating interaction, despite self-reporting significant frustration with the experience. As a follow-up study, we develop an automated system to distinguish between naturally occurring spontaneous smiles under frustrating and delightful stimuli by exploring their temporal patterns given video of both. We extracted local and global features related to human smile dynamics. Next, we evaluated and compared two variants of Support Vector Machine (SVM), Hidden Markov Models (HMM), and Hidden-state Conditional Random Fields (HCRF) for binary classification. While human classification of the smile videos under frustrating stimuli was below chance, an accuracy of 92 percent distinguishing smiles under frustrating and delightful stimuli was obtained using a dynamic SVM classifier.
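
As a rough illustration of the "global features related to human smile dynamics" mentioned above, the sketch below summarizes a smile-intensity track and feeds such descriptors to a binary SVM; the specific descriptors and the 30 fps assumption are illustrative, not the authors' feature set.

```python
# Minimal sketch: global descriptors of a smile-intensity track plus a binary SVM.
# Descriptor choice and the 30 fps sampling rate are illustrative assumptions.
import numpy as np
from sklearn.svm import SVC

def smile_dynamics_features(intensity, fs=30.0):
    """Peak, mean, duration above half-peak, and extreme rise/decay speeds."""
    peak = intensity.max()
    duration_above_half = (intensity > 0.5 * peak).sum() / fs
    velocity = np.gradient(intensity) * fs
    return np.array([peak, intensity.mean(), duration_above_half,
                     velocity.max(), velocity.min()])

# Hypothetical usage: tracks is a list of 1-D intensity arrays,
# labels has 0 = smile under frustration, 1 = smile under delight.
# X = np.vstack([smile_dynamics_features(t) for t in tracks])
# clf = SVC(kernel="rbf").fit(X, labels)
```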

Journal ArticleDOI
TL;DR: A unique database containing recordings of mild to moderate emotionally colored responses to a series of laboratory-based emotion induction tasks is described, which gives researchers the opportunity to compare expressions from people from more than one culture.
Abstract: For many years psychological research on facial expression of emotion has relied heavily on a recognition paradigm based on posed static photographs. There is growing evidence that there may be fundamental differences between the expressions depicted in such stimuli and the emotional expressions present in everyday life. Affective computing, with its pragmatic emphasis on realism, needs examples of natural emotion. This paper describes a unique database containing recordings of mild to moderate emotionally colored responses to a series of laboratory-based emotion induction tasks. The recordings are accompanied by information on self-report of emotion and intensity, continuous trace-style ratings of valence and intensity, the sex of the participant, the sex of the experimenter, and the active or passive nature of the induction task; the database also gives researchers the opportunity to compare expressions from people from more than one culture.

Journal ArticleDOI
TL;DR: The experimental results indicate that incorporating long-term temporal context is beneficial for emotion recognition systems that encounter a variety of emotional manifestations and context-sensitive approaches outperform those without context for classification tasks such as discrimination between valence levels or between clusters in the valence-activation space.
Abstract: Human emotional expression tends to evolve in a structured manner in the sense that certain emotional evolution patterns, i.e., anger to anger, are more probable than others, e.g., anger to happiness. Furthermore, the perception of an emotional display can be affected by recent emotional displays. Therefore, the emotional content of past and future observations could offer relevant temporal context when classifying the emotional content of an observation. In this work, we focus on audio-visual recognition of the emotional content of improvised emotional interactions at the utterance level. We examine context-sensitive schemes for emotion recognition within a multimodal, hierarchical approach: bidirectional Long Short-Term Memory (BLSTM) neural networks, hierarchical Hidden Markov Model classifiers (HMMs), and hybrid HMM/BLSTM classifiers are considered for modeling emotion evolution within an utterance and between utterances over the course of a dialog. Overall, our experimental results indicate that incorporating long-term temporal context is beneficial for emotion recognition systems that encounter a variety of emotional manifestations. Context-sensitive approaches outperform those without context for classification tasks such as discrimination between valence levels or between clusters in the valence-activation space. The analysis of emotional transitions in our database sheds light on the flow of affective expressions, revealing potentially useful patterns.
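
A minimal sketch of an utterance-level BLSTM classifier of the kind used in the context-sensitive scheme above, written with the Keras API; the 40-dimensional audio-visual feature frames and the four output classes are hypothetical.

```python
# Minimal sketch (Keras): an utterance-level bidirectional LSTM classifier.
# Feature dimensionality (40) and the four output classes are hypothetical.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(None, 40)),                          # frames x audio-visual features
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),   # forward + backward temporal context
    tf.keras.layers.Dense(4, activation="softmax"),            # e.g., valence/activation clusters
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```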

Journal ArticleDOI
TL;DR: The framework, data collected, and analysis demonstrate an ecologically valid method for unobtrusive evaluation of facial responses to media content that is robust to challenging real-world conditions and requires no explicit recruitment or compensation of participants.
Abstract: We present results validating a novel framework for collecting and analyzing facial responses to media content over the Internet. This system allowed 3,268 trackable face videos to be collected and analyzed in under two months. We characterize the data and present analysis of the smile responses of viewers to three commercials. We compare statistics from this corpus to those from the Cohn-Kanade+ (CK+) and MMI databases and show that distributions of position, scale, pose, movement, and luminance of the facial region are significantly different from those represented in these traditionally used datasets. Next, we analyze the intensity and dynamics of smile responses, and show that there are significantly different facial responses from subgroups who report liking the commercials compared to those that report not liking the commercials. Similarly, we unveil significant differences between groups who were previously familiar with a commercial and those that were not and propose a link to virality. Finally, we present relationships between head movement and facial behavior that were observed within the data. The framework, data collected, and analysis demonstrate an ecologically valid method for unobtrusive evaluation of facial responses to media content that is robust to challenging real-world conditions and requires no explicit recruitment or compensation of participants.

Journal ArticleDOI
TL;DR: It is argued here that principles from machine ethics have a role to play in the design of these machines and that the attainment of erotic wisdom is an ethically sound goal and that it provides more to loving relationships than only satisfying physical desire.
Abstract: This paper will explore the ethical impacts of the use of affective computing by engineers and roboticists who program their machines to mimic and manipulate human emotions in order to evoke loving or amorous reactions from their human users. We will see that it does seem plausible that some people might buy a love machine if it were created, but it is argued here that principles from machine ethics have a role to play in the design of these machines. This is best achieved by applying what is known about the philosophy of love, the ethics of loving relationships, and the philosophical value of the erotic in the early design stage of building robust artificial companions. The paper concludes by proposing certain ethical limits on the manipulation of human psychology when it comes to building sex robots and in the simulation of love in such machines. In addition, the paper argues that the attainment of erotic wisdom is an ethically sound goal and that it provides more to loving relationships than only satisfying physical desire. This fact may limit the possibility of creating a machine that can fulfill all that one should want out of erotic love unless a machine can be built that would help its user attain this kind of love.

Journal ArticleDOI
TL;DR: This study develops and evaluates user-independent and user-dependent physiology-based detectors of nonbasic affective states that were trained and validated on naturalistic data collected during interactions between 27 students and AutoTutor, an intelligent tutoring system with conversational dialogues.
Abstract: Signals from peripheral physiology (e.g., ECG, EMG, and GSR) in conjunction with machine learning techniques can be used for the automatic detection of affective states. The affect detector can be user-independent, where it is expected to generalize to novel users, or user-dependent, where it is tailored to a specific user. Previous studies have reported some success in detecting affect from physiological signals, but much of the work has focused on induced affect or acted expressions instead of contextually constrained spontaneous expressions of affect. This study addresses these issues by developing and evaluating user-independent and user-dependent physiology-based detectors of nonbasic affective states (e.g., boredom, confusion, curiosity) that were trained and validated on naturalistic data collected during interactions between 27 students and AutoTutor, an intelligent tutoring system with conversational dialogues. There is also no consensus on which techniques (i.e., feature selection or classification methods) work best for this type of data. Therefore, this study also evaluates the efficacy of affect detection using a host of feature selection and classification techniques on three physiological signals (ECG, EMG, and GSR) and their combinations. Two feature selection methods and nine classifiers were applied to the problem of recognizing eight affective states (boredom, confusion, curiosity, delight, flow/engagement, surprise, and neutral). The results indicated that the user-independent modeling approach was not feasible; however, a mean kappa score of 0.25 was obtained for user-dependent models that discriminated among the most frequent emotions. The results also indicated that k-nearest neighbor and Linear Bayes Normal Classifier (LBNC) classifiers yielded the best affect detection rates. Single channel ECG, EMG, and GSR and three-channel multimodal models were generally more diagnostic than two-channel models.
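
A minimal sketch of scoring a user-dependent detector with Cohen's kappa, the agreement measure reported above, using a k-nearest-neighbor classifier from the family named in the abstract; the per-student arrays and the train/test split are placeholders.

```python
# Minimal sketch (placeholder data): a user-dependent detector trained per student
# and scored with Cohen's kappa; the 3-nearest-neighbor setting and split are arbitrary.
from sklearn.metrics import cohen_kappa_score
from sklearn.neighbors import KNeighborsClassifier

def user_dependent_kappa(X_user, y_user, n_train):
    """Train on the first n_train observations of one student, test on the rest."""
    clf = KNeighborsClassifier(n_neighbors=3).fit(X_user[:n_train], y_user[:n_train])
    return cohen_kappa_score(y_user[n_train:], clf.predict(X_user[n_train:]))
```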

Journal ArticleDOI
TL;DR: A novel framework for quantifying physiological stress at a distance via thermal imaging that associates high stress levels with novice surgeons, while low stress levels are associated with experienced surgeons, raising the possibility for an affective measure (stress) to assist in efficacy determination.
Abstract: In this paper, we present a novel framework for quantifying physiological stress at a distance via thermal imaging. The method captures stress-induced neurophysiological responses on the perinasal area that manifest as transient perspiration. We have developed two algorithms to extract the perspiratory signals from the thermophysiological imagery. One is based on morphology and is computationally efficient, while the other is based on spatial isotropic wavelets and is flexible; both require the support of a reliable facial tracker. We validated the two algorithms against the clinical standard in a controlled lab experiment where orienting responses were invoked on n=18 subjects via auditory stimuli. Then, we used the validated algorithms to quantify stress of surgeons (n=24) as they were performing suturing drills during inanimate laparoscopic training. This is a field application where the new methodology shines. It allows nonobtrusive monitoring of individuals who are naturally challenged with a task that is localized in space and requires directional attention. Both algorithms associate high stress levels with novice surgeons, while low stress levels are associated with experienced surgeons, raising the possibility for an affective measure (stress) to assist in efficacy determination. It is a clear indication of the methodology's promise and potential.
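
A minimal sketch of the morphology-based idea, assuming SciPy: a black top-hat filter that emphasizes small, locally cool (dark) structures, which is how transient perspiration tends to appear in thermal imagery of the perinasal area. The structuring-element size and the polarity assumption are illustrative, not the paper's algorithm.

```python
# Minimal sketch (SciPy): black top-hat filtering of a perinasal thermal patch
# to emphasize small, locally cool structures; kernel size is an assumption.
from scipy.ndimage import black_tophat

def perspiration_map(thermal_patch, size=5):
    """Return small-scale dark (cool) structures in a 2-D thermal image patch."""
    return black_tophat(thermal_patch, size=size)
```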

Journal ArticleDOI
TL;DR: EmotiNet, although limited by the domain and small quantity of knowledge it presently contains, represents a semantic resource appropriate for capturing and storing the structure and the semantics of real events and predicting the emotional responses triggered by chains of actions.
Abstract: The task of automatically detecting emotion in text is challenging. This is because, most of the time, textual expressions of affect are not direct (i.e., they do not use emotion words) but result from the interpretation and assessment of the meaning of the concepts and interaction of concepts described in the text. This paper presents the core of EmotiNet, a new knowledge base (KB) for representing and storing affective reaction to real-life contexts, and the methodology employed in designing, populating, and evaluating it. The basis of the design process is given by a set of self-reported affective situations in the International Survey on Emotion Antecedents and Reactions (ISEAR) corpus. We cluster the examples and extract triples using Semantic Roles. We subsequently extend our model using other resources, such as VerbOcean, ConceptNet, and SentiWordNet, with the aim of generalizing the knowledge contained. Finally, we evaluate the approach using the representations of other examples in the ISEAR corpus. We conclude that EmotiNet, although limited by the domain and small quantity of knowledge it presently contains, represents a semantic resource appropriate for capturing and storing the structure and the semantics of real events and predicting the emotional responses triggered by chains of actions.

Journal ArticleDOI
TL;DR: This paper proposes an approach referred to as MoodCast that learns to infer an individual's emotional state, investigates how this state influences (or is influenced by) her friends in the social network, and verifies the effectiveness of the proposed approach.
Abstract: Marketing strategies without emotion will not work. Emotion stimulates the mind 3,000 times quicker than rational thought. Such emotion invokes either a positive or a negative response and physical expressions. Understanding the underlying dynamics of users' emotions can efficiently help companies formulate marketing strategies and support after-sale services. While prior work has focused mainly on qualitative aspects, in this paper we present our research on quantitative analysis of how an individual's emotional state can be inferred from her historic emotion log and how this person's emotional state influences (or is influenced by) her friends in the social network. We statistically study the dynamics of individuals' emotions and discover several interesting as well as important patterns. Based on this discovery, we propose an approach referred to as MoodCast to learn to infer individuals' emotional states. In both a mobile-based social network and an online virtual network, we verify the effectiveness of our proposed approach.

Journal ArticleDOI
TL;DR: A series of exhaustive experiments are described which demonstrate the feasibility of recognizing human emotional states via integrating low level descriptors via integrating three different methodologies for integrating subsequent feature values.
Abstract: During recent years, the field of emotional content analysis of speech signals has been gaining a lot of attention and several frameworks have been constructed by different researchers for recognition of human emotions in spoken utterances. This paper describes a series of exhaustive experiments which demonstrate the feasibility of recognizing human emotional states via integrating low level descriptors. Our aim is to investigate three different methodologies for integrating subsequent feature values. More specifically, we used the following methods: 1) short-term statistics, 2) spectral moments, and 3) autoregressive models. Additionally, we employed a newly introduced group of parameters which is based on the wavelet decomposition. These are compared with a baseline set comprised of descriptors which are usually used for the specific task. Subsequently, we experimented on fusing these sets on the feature and log-likelihood levels. The classification step is based on hidden Markov models, while several algorithms which can handle redundant information were used during fusion. We report results on the well-known and freely available database BERLIN using data of six emotional states. Our experiments show the importance of including information which is captured by the set based on multiresolution analysis and the efficacy of merging subsequent feature values.
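
A minimal sketch of the first integration scheme named above (short-term statistics over a low-level descriptor contour), in NumPy; window and step lengths are arbitrary assumptions.

```python
# Minimal sketch: integrating a low-level descriptor contour (e.g., frame energy)
# into short-term statistics; window and step lengths are arbitrary assumptions.
import numpy as np

def short_term_statistics(lld, win=20, step=10):
    """Mean, std, min, and max per window, for subsequent HMM modeling."""
    feats = []
    for start in range(0, len(lld) - win + 1, step):
        seg = lld[start:start + win]
        feats.append([seg.mean(), seg.std(), seg.min(), seg.max()])
    return np.array(feats)
```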

Journal ArticleDOI
TL;DR: Results indicate that an agent performing parallel empathy, displaying emotional expressions relevant to the emotional state of the student, may cause this emotion to persist, and that the agent performing parallel and then reactive empathy appeared to be effective in altering an emotional state of fear to a neutral one.
Abstract: Empathetic behavior has been suggested to be one effective way for Embodied Conversational Agents (ECAs) to provide feedback to learners' emotions. An issue that has been raised is the effective integration of parallel and reactive empathy. The aim of this study is to examine the impact of ECAs' emotional facial and tone of voice expressions combined with empathetic verbal behavior when displayed as feedback to students' fear, sad, and happy emotions in the context of a self-assessment test. Three identical female agents were used for this experiment: 1) an ECA performing parallel empathy combined with neutral emotional expressions, 2) an ECA performing parallel empathy displaying emotional expressions that were relevant to the emotional state of the student, and 3) an ECA performing parallel empathy by displaying relevant emotional expressions followed by emotional expressions of reactive empathy with the goal of altering the student's emotional state. Results indicate that an agent performing parallel empathy displaying emotional expressions relevant to the emotional state of the student may cause this emotion to persist. Moreover, the agent performing parallel and then reactive empathy appeared to be effective in altering an emotional state of fear to a neutral one.

Journal ArticleDOI
TL;DR: Experimental results confirm that medium-grained meeting behaviors, namely, speaking time and social attention, are effective for the automatic detection of Extraversion.
Abstract: This work investigates the suitability of medium-grained meeting behaviors, namely, speaking time and social attention, for automatic classification of the Extraversion personality trait. Experimental results confirm that these behaviors are indeed effective for the automatic detection of Extraversion. The main findings of our study are that: 1) Speaking time and (some forms of) social gaze are effective indicators of Extraversion, 2) classification accuracy is affected by the amount of time for which meeting behavior is observed, 3) independently considering only the attention received by the target from peers is insufficient, and 4) distribution of social attention of peers plays a crucial role.

Journal ArticleDOI
TL;DR: A real-time affect detector dedicated to video viewing and entertainment applications that combines the acquisition of traditional physiological signals, namely, galvanic skin response, heart rate, and electromyogram, and the use of supervised classification techniques by means of Gaussian processes is proposed.
Abstract: In this paper, we propose a methodology to build a real-time affect detector dedicated to video viewing and entertainment applications. This detector combines the acquisition of traditional physiological signals, namely, galvanic skin response, heart rate, and electromyogram, and the use of supervised classification techniques by means of Gaussian processes. It aims at detecting the emotional impact of a video clip in a new way by first identifying emotional events in the affective stream (fast increase of the subject excitation) and then by giving the associated binary valence (positive or negative) of each detected event. The study was conducted to be as close as possible to realistic conditions by especially minimizing the use of active calibrations and considering on-the-fly detection. Furthermore, the influence of each physiological modality is evaluated through three different key-scenarios (mono-user, multi-user and extended multi-user) that may be relevant for consumer applications. A complete description of the experimental protocol and processing steps is given. The performances of the detector are evaluated on manually labeled sequences, and its robustness is discussed considering the different single and multi-user contexts.
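
A minimal sketch of the valence step described above, i.e., a binary Gaussian process classifier over features of already-detected emotional events, using scikit-learn; the feature content, kernel, and toy data are assumptions rather than the paper's exact setup.

```python
# Minimal sketch (toy data): binary valence classification of detected emotional
# events with a Gaussian process classifier; features and kernel are assumptions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 6))       # stand-in for GSR/HR/EMG descriptors of events
y = rng.integers(0, 2, size=200)    # 0 = negative valence, 1 = positive valence

gpc = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0)).fit(X, y)
print(gpc.predict_proba(X[:3]))     # per-event class probabilities
```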

Journal ArticleDOI
TL;DR: Findings imply that - although Scooter is well liked by students and improves student learning outcomes relative to the original tutor - Scooter does not have a large effect on students' affective states or their dynamics.
Abstract: We study the affective states exhibited by students using an intelligent tutoring system for Scatterplots with and without an interactive software agent, Scooter the Tutor. Scooter the Tutor had been previously shown to lead to improved learning outcomes as compared to the same tutoring system without Scooter. We found that affective states and transitions between affective states were very similar among students in both conditions. With the exception of the "neutral state," no affective state occurred significantly more in one condition over the other. Boredom, confusion, and engaged concentration persisted in both conditions, representing both "virtuous cycles" and "vicious cycles" that did not appear to differ by condition. These findings imply that - although Scooter is well liked by students and improves student learning outcomes relative to the original tutor - Scooter does not have a large effect on students' affective states or their dynamics.

Journal ArticleDOI
TL;DR: The results presented in this paper correspond to the implementation of the decision-making system on an agent whose main goal is to learn from scratch how to behave in order to maximize its well-being by satisfying its drives or needs.
Abstract: In this paper, a new approach to the generation and the role of artificial emotions in the decision-making process of autonomous agents (physical and virtual) is presented. The proposed decision-making system is biologically inspired and it is based on drives, motivations, and emotions. The agent has certain needs or drives that must be within a certain range, and motivations are understood as what moves the agent to satisfy a drive. Considering that the well-being of the agent is a function of its drives, the goal of the agent is to optimize it. Currently, the implemented artificial emotions are happiness, sadness, and fear. The novelties of our approach are, on one hand, that the generation method and the role of each of the artificial emotions are not defined as a whole, as most authors do. Each artificial emotion is treated separately. On the other hand, in the proposed system it is not mandatory to predefine either the situations that must release any artificial emotion or the actions that must be executed in each case. Both the emotional releaser and the actions can be learned by the agent, as happens on some occasions in nature, based on its own experience. In order to test the decision-making process, it has been implemented on virtual agents (software entities) living in a simple virtual environment. The results presented in this paper correspond to the implementation of the decision-making system on an agent whose main goal is to learn from scratch how to behave in order to maximize its well-being by satisfying its drives or needs. The learning process, as shown by the experiments, produces very natural results. The usefulness of the artificial emotions in the decision-making system is proven by making the same experiments with and without artificial emotions, and then comparing the performance of the agent.
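
A toy sketch of the drive/well-being loop described above: drives have an ideal value, well-being measures how close they are to it, and the agent picks the action whose effects most raise well-being. Drive names, ranges, and the linear well-being formula are illustrative assumptions; the paper's agent additionally learns emotional releasers and action values from experience.

```python
# Toy sketch: drives with an ideal value of 1.0, well-being as closeness to that
# ideal, and greedy action selection; names and formulas are illustrative assumptions.
class DriveAgent:
    def __init__(self):
        self.drives = {"energy": 0.8, "social": 0.5}

    def wellbeing(self, drives=None):
        drives = self.drives if drives is None else drives
        return 1.0 - sum(1.0 - v for v in drives.values()) / len(drives)

    def choose(self, action_effects):
        """Pick the action whose (assumed known) effects most raise well-being."""
        def after(effects):
            return {d: min(1.0, v + effects.get(d, 0.0)) for d, v in self.drives.items()}
        return max(action_effects, key=lambda a: self.wellbeing(after(action_effects[a])))

agent = DriveAgent()
print(agent.choose({"eat": {"energy": 0.3}, "chat": {"social": 0.4}}))  # -> "chat"
```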

Journal ArticleDOI
TL;DR: This work introduces the automatic determination of leadership emergence by acoustic and linguistic features in online speeches and discusses cluster-preserving scaling of 10 original dimensions for discrete and continuous task modeling, ground truth establishment, and appropriate feature extraction for this novel speaker trait analysis paradigm.
Abstract: We introduce the automatic determination of leadership emergence by acoustic and linguistic features in online speeches. Full realism is provided by the varying and challenging acoustic conditions of the presented YouTube corpus of online available speeches labeled by 10 raters and by processing that includes Long Short-Term Memory-based robust voice activity detection (VAD) and automatic speech recognition (ASR) prior to feature extraction. We discuss cluster-preserving scaling of 10 original dimensions for discrete and continuous task modeling, ground truth establishment, and appropriate feature extraction for this novel speaker trait analysis paradigm. In extensive classification and regression runs, different temporal chunkings and optimal late fusion strategies (LFSs) of feature streams are presented. In the results, achievers, charismatic speakers, and teamplayers can be recognized significantly above chance level, reaching up to 72.5 percent accuracy on unseen test data.

Journal ArticleDOI
TL;DR: This paper uses Eysenck's theoretical basis to explain aspects of the characterization of virtual agents, and describes an architecture where personality affects the agent's global behavior quality as well as their back-channel productions.
Abstract: Convincing conversational agents require a coherent set of behavioral responses that can be interpreted by a human observer as indicative of a personality. This paper discusses the continued development and subsequent evaluation of virtual agents based on sound psychological principles. We use Eysenck's theoretical basis to explain aspects of the characterization of our agents, and we describe an architecture where personality affects the agent's global behavior quality as well as their back-channel productions. Drawing on psychological research, we evaluate perception of our agents' personalities and credibility by human viewers (N = 187). Our results suggest that we succeeded in validating theoretically grounded indicators of personality in our virtual agents, and that it is feasible to place our characters on Eysenck's scales. A key finding is that the presence of behavioral characteristics reinforces the prescribed personality profiles that are already emerging from the still images. Our long-term goal is to enhance agents' ability to sustain realistic interaction with human users, and we discuss how this preliminary work may be further developed to include more systematic variation of Eysenck's personality scales.

Journal ArticleDOI
TL;DR: In this article, the authors propose three criteria to judge if an entity is deceptive in emotional communication (good intention, emotional authenticity, and ontological authenticity) which can be regarded as ideal emotional communication conditions that saliently operate as presuppositions in our communications with other entities.
Abstract: A common objection to the use and development of “emotional” robots is that they are deceptive. This intuitive response assumes 1) that these robots intend to deceive, 2) that their emotions are not real, and 3) that they pretend to be a kind of entity they are not. We use these criteria to judge if an entity is deceptive in emotional communication (good intention, emotional authenticity, and ontological authenticity). They can also be regarded as “ideal emotional communication” conditions that saliently operate as presuppositions in our communications with other entities. While the good intention presupposition might be a bias or illusion we really need for sustaining social life, in the future we may want to dispense with the other conditions in order to facilitate cross-entity communication. What we need instead are not “authentic” but appropriate emotional responses: responses appropriate to relevant social contexts. Criteria for this cannot be given a priori but must be learned, by humans and by robots. In the future, we may learn to live with “emotional” robots, especially if our values change. However, contemporary robot designers who want their robots to receive trust from humans had better take into account current concerns about deception and create robots that do not evoke the three-fold deception response.

Journal ArticleDOI
TL;DR: A co-adaptive human-machine interface is developed to control a virtual forearm prosthesis over a long period of operation and achieves better physical performance measures in comparison with the traditional (nonadaptive) interface.
Abstract: The real-time adaptation between human and assistive devices can improve the quality of life for amputees, which, however, may be difficult to achieve since physical and mental states vary over time. This paper presents a co-adaptive human-machine interface (HMI) that is developed to control a virtual forearm prosthesis over a long period of operation. Direct physical performance measures for the requested tasks are calculated. Bioelectric signals are recorded using one pair of electrodes placed on the frontal face region of a user to extract the mental (affective) measures (the entropy of the alpha band of the forehead electroencephalography signals) while performing the tasks. By developing an effective algorithm, the proposed HMI can adapt itself to the mental states of a user, thus improving its usability. The quantitative results from 16 users (including an amputee) show that the proposed HMI achieved better physical performance measures in comparison with the traditional (nonadaptive) interface (p-value < 0.001). Furthermore, there is a high correlation (correlation coefficient < 0.9, p-value < 0.01) between the physical performance measures and self-report feedback based on the NASA TLX questionnaire. As a result, the proposed adaptive HMI outperformed a traditional HMI.
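
One plausible reading of "the entropy of the alpha band of the forehead electroencephalography signals" is the spectral entropy of the 8-13 Hz band; a minimal SciPy sketch under that assumption (band limits and Welch window length are also assumptions):

```python
# Minimal sketch (SciPy/NumPy): spectral entropy of the 8-13 Hz alpha band
# estimated with Welch's method; band limits and window length are assumptions.
import numpy as np
from scipy.signal import welch

def alpha_band_entropy(eeg, fs):
    freqs, psd = welch(eeg, fs=fs, nperseg=int(2 * fs))
    band = (freqs >= 8.0) & (freqs <= 13.0)
    p = psd[band] / psd[band].sum()            # normalized alpha-band spectrum
    return -np.sum(p * np.log2(p + 1e-12))     # Shannon entropy in bits
```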

Journal ArticleDOI
TL;DR: A semantic web usage mining approach for discovering periodic web access patterns from annotated web usage logs which incorporates information on consumer emotions and behaviors through self-reporting and behavioral tracking is proposed.
Abstract: The relationships between consumer emotions and their buying behaviors have been well documented. Technology-savvy consumers often use the web to find information on products and services before they commit to buying. We propose a semantic web usage mining approach for discovering periodic web access patterns from annotated web usage logs which incorporates information on consumer emotions and behaviors through self-reporting and behavioral tracking. We use fuzzy logic to represent real-life temporal concepts (e.g., morning) and requested resource attributes (ontological domain concepts for the requested URLs) of periodic pattern-based web access activities. These fuzzy temporal and resource representations, which contain both behavioral and emotional cues, are incorporated into a Personal Web Usage Lattice that models the user's web access activities. From this, we generate a Personal Web Usage Ontology written in OWL, which enables semantic web applications such as personalized web resources recommendation. Finally, we demonstrate the effectiveness of our approach by presenting experimental results in the context of personalized web resources recommendation with varying degrees of emotional influence. Emotional influence has been found to contribute positively to adaptation in personalized recommendation.
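
A minimal sketch of the fuzzy representation of a real-life temporal concept such as "morning" mentioned above; the triangular membership function and its breakpoints (06:00, 09:00, 12:00) are illustrative assumptions.

```python
# Minimal sketch: a triangular fuzzy membership function for the temporal
# concept "morning"; the breakpoints 06:00 / 09:00 / 12:00 are assumptions.
def morning_membership(hour):
    """Degree (0..1) to which an access time in hours belongs to 'morning'."""
    if 6.0 <= hour <= 9.0:
        return (hour - 6.0) / 3.0
    if 9.0 < hour <= 12.0:
        return (12.0 - hour) / 3.0
    return 0.0

print(morning_membership(8.5))   # ~0.83
```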