Journal ArticleDOI

Emotion recognition in human-computer interaction

TL;DR: Basic issues in signal processing and analysis techniques for consolidating psychological and linguistic analyses of emotion are examined, motivated by the PHYSTA project, which aims to develop a hybrid system capable of using information from faces and voices to recognize people's emotions.
Abstract: Two channels have been distinguished in human interaction: one transmits explicit messages, which may be about anything or nothing; the other transmits implicit messages about the speakers themselves. Both linguistics and technology have invested enormous efforts in understanding the first, explicit channel, but the second is not as well understood. Understanding the other party's emotions is one of the key tasks associated with the second, implicit channel. To tackle that task, signal processing and analysis techniques have to be developed, while, at the same time, consolidating psychological and linguistic analyses of emotion. This article examines basic issues in those areas. It is motivated by the PHYSTA project, in which we aim to develop a hybrid system capable of using information from faces and voices to recognize people's emotions.
Citations
Journal ArticleDOI
TL;DR: In this paper, the authors discuss human emotion perception from a psychological perspective, examine available approaches to solving the problem of machine understanding of human affective behavior, and discuss important issues like the collection and availability of training and test data.
Abstract: Automated analysis of human affective behavior has attracted increasing attention from researchers in psychology, computer science, linguistics, neuroscience, and related disciplines. However, the existing methods typically handle only deliberately displayed and exaggerated expressions of prototypical emotions, despite the fact that deliberate behavior differs in visual appearance, audio profile, and timing from spontaneously occurring behavior. To address this problem, efforts to develop algorithms that can process naturally occurring human affective behavior have recently emerged. Moreover, an increasing number of efforts are reported toward multimodal fusion for human affect analysis, including audiovisual fusion, linguistic and paralinguistic fusion, and multi-cue visual fusion based on facial expressions, head movements, and body gestures. This paper introduces and surveys these recent advances. We first discuss human emotion perception from a psychological perspective. Next we examine available approaches to solving the problem of machine understanding of human affective behavior, and discuss important issues like the collection and availability of training and test data. We finally outline some of the scientific and engineering challenges to advancing human affect sensing technology.

2,503 citations

Journal ArticleDOI
05 Nov 2008
TL;DR: A new corpus, the “interactive emotional dyadic motion capture database” (IEMOCAP), collected by the Speech Analysis and Interpretation Laboratory at the University of Southern California (USC), provides detailed information about actors' facial expressions and hand movements during scripted and spontaneous spoken communication scenarios.
Abstract: Since emotions are expressed through a combination of verbal and non-verbal channels, a joint analysis of speech and gestures is required to understand expressive human communication. To facilitate such investigations, this paper describes a new corpus named the “interactive emotional dyadic motion capture database” (IEMOCAP), collected by the Speech Analysis and Interpretation Laboratory (SAIL) at the University of Southern California (USC). This database was recorded from ten actors in dyadic sessions with markers on the face, head, and hands, which provide detailed information about their facial expressions and hand movements during scripted and spontaneous spoken communication scenarios. The actors performed selected emotional scripts and also improvised hypothetical scenarios designed to elicit specific types of emotions (happiness, anger, sadness, frustration and neutral state). The corpus contains approximately 12 h of data. The detailed motion capture information, the interactive setting to elicit authentic emotions, and the size of the database make this corpus a valuable addition to the existing databases in the community for the study and modeling of multimodal and expressive human communication.

2,359 citations


Cites methods from "Emotion recognition in human-computer interaction"

  • Different methodologies and annotation schemes have been proposed to capture the emotional content of databases (e.g., the Feeltrace tool (Cowie et al. 2001) and the Multi-level Emotion and Context Annotation Scheme (MECAS) (Devillers et al. 2005)).


Journal ArticleDOI
TL;DR: A survey of speech emotion classification addressing three important aspects of the design of a speech emotion recognition system: the choice of suitable features for speech representation, the design of an appropriate classification scheme, and the proper preparation of an emotional speech database for evaluating system performance.

1,735 citations

Journal ArticleDOI
TL;DR: This survey explicitly explores the multidisciplinary foundation that underlies all AC applications by describing how AC researchers have incorporated psychological theories of emotion and how these theories affect research questions, methods, results, and their interpretations.
Abstract: This survey describes recent progress in the field of Affective Computing (AC), with a focus on affect detection. Although many AC researchers have traditionally attempted to remain agnostic to the different emotion theories proposed by psychologists, the affective technologies being developed are rife with theoretical assumptions that impact their effectiveness. Hence, an informed and integrated examination of emotion theories from multiple areas will need to become part of computing practice if truly effective real-world systems are to be achieved. This survey discusses theoretical perspectives that view emotions as expressions, embodiments, outcomes of cognitive appraisal, social constructs, products of neural circuitry, and psychological interpretations of basic feelings. It provides meta-analyses on existing reviews of affect detection systems that focus on traditional affect detection modalities like physiology, face, and voice, and also reviews emerging research on more novel channels such as text, body language, and complex multimodal systems. This survey explicitly explores the multidisciplinary foundation that underlies all AC applications by describing how AC researchers have incorporated psychological theories of emotion and how these theories affect research questions, methods, results, and their interpretations. In this way, models and methods can be compared, and emerging insights from various disciplines can be more expertly integrated.

1,503 citations


Cites background from "Emotion recognition in human-computer interaction"

  • ...During most of the last century, research on emotions was conducted by philosophers and psychologists, whose work was based on a small set of “emotion theories” that continue to underpin research in this area....


Journal ArticleDOI
TL;DR: A review of 104 studies of vocal expression and 41 studies of music performance reveals similarities between the two channels concerning (a) the accuracy with which discrete emotions were communicated to listeners and (b) the emotion-specific patterns of acoustic cues used to communicate each emotion as mentioned in this paper.
Abstract: Many authors have speculated about a close relationship between vocal expression of emotions and musical expression of emotions, but evidence bearing on this relationship has unfortunately been lacking. This review of 104 studies of vocal expression and 41 studies of music performance reveals similarities between the 2 channels concerning (a) the accuracy with which discrete emotions were communicated to listeners and (b) the emotion-specific patterns of acoustic cues used to communicate each emotion. The patterns are generally consistent with K. R. Scherer's (1986) theoretical predictions. The results can explain why music is perceived as expressive of emotion, and they are consistent with an evolutionary perspective on vocal expression of emotions. Discussion focuses on theoretical accounts and directions for future research.

1,474 citations

References
Proceedings Article
24 Aug 1981
TL;DR: In this paper, the spatial intensity gradient of the images is used to find a good match using a type of Newton-Raphson iteration, which can be generalized to handle rotation, scaling and shearing.
Abstract: Image registration finds a variety of applications in computer vision. Unfortunately, traditional image registration techniques tend to be costly. We present a new image registration technique that makes use of the spatial intensity gradient of the images to find a good match using a type of Newton-Raphson iteration. Our technique is faster because it examines far fewer potential matches between the images than existing techniques. Furthermore, this registration technique can be generalized to handle rotation, scaling and shearing. We show how our technique can be adapted for use in a stereo vision system.
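
At its core the method is a single linearized least-squares problem on the spatial intensity gradients, iterated to convergence. Below is a minimal sketch of one such Newton-Raphson step for the pure-translation case, using NumPy; the function name and the toy blob images are illustrative, not from the paper.

```python
import numpy as np

def lk_translation_step(img0, img1):
    """One Newton-Raphson step estimating the motion (dy, dx) of the
    pattern from img0 to img1, via linearized least squares."""
    gy, gx = np.gradient(img0.astype(float))      # spatial intensity gradients
    gt = img1.astype(float) - img0.astype(float)  # frame difference
    # Each pixel contributes one row [gy, gx] to the overdetermined
    # system A @ d ~= -gt; solve it in the least-squares sense.
    A = np.stack([gy.ravel(), gx.ravel()], axis=1)
    d, *_ = np.linalg.lstsq(A, -gt.ravel(), rcond=None)
    return d  # (dy, dx)

# Toy usage: a Gaussian blob shifted one pixel along x.
y, x = np.mgrid[0:64, 0:64]
img0 = np.exp(-((x - 32.0)**2 + (y - 32.0)**2) / 50.0)
print(lk_translation_step(img0, np.roll(img0, 1, axis=1)))  # roughly [0, 1]
```

A full registration loop would warp img0 by the current estimate and repeat until the step becomes negligible, which is the iteration the abstract refers to.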

12,944 citations

Journal ArticleDOI
TL;DR: In this paper, a method for finding the optical flow pattern is presented which assumes that the apparent velocity of the brightness pattern varies smoothly almost everywhere in the image, and an iterative implementation is shown which successfully computes the optical flow for a number of synthetic image sequences.

10,727 citations

Proceedings ArticleDOI
12 Nov 1981
TL;DR: In this article, a method for finding the optical flow pattern is presented which assumes that the apparent velocity of the brightness pattern varies smoothly almost everywhere in the image, and an iterative implementation is shown which successfully computes the optical flow for a number of synthetic image sequences.
Abstract: Optical flow cannot be computed locally, since only one independent measurement is available from the image sequence at a point, while the flow velocity has two components. A second constraint is needed. A method for finding the optical flow pattern is presented which assumes that the apparent velocity of the brightness pattern varies smoothly almost everywhere in the image. An iterative implementation is shown which successfully computes the optical flow for a number of synthetic image sequences. The algorithm is robust in that it can handle image sequences that are quantized rather coarsely in space and time. It is also insensitive to quantization of brightness levels and additive noise. Examples are included where the assumption of smoothness is violated at singular points or along lines in the image.
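
The scheme couples the brightness-constancy residual at each pixel with a local average of the neighboring flow, so the smoothness assumption propagates velocity estimates into regions where the local image data alone are ambiguous. The following is a compact sketch of the standard Horn-Schunck iteration in NumPy; the 4-neighbor average and the values of alpha and n_iter are illustrative choices, not prescriptions from the paper.

```python
import numpy as np

def hs_flow(img0, img1, alpha=1.0, n_iter=200):
    """Horn-Schunck optical flow: brightness constancy plus a global
    smoothness constraint, solved by the classic fixed-point iteration."""
    I0, I1 = img0.astype(float), img1.astype(float)
    Iy, Ix = np.gradient(I0)           # spatial derivatives
    It = I1 - I0                       # temporal derivative
    u = np.zeros_like(I0)              # x-component of the flow
    v = np.zeros_like(I0)              # y-component of the flow

    def local_avg(f):
        # 4-neighbor average as a simple stand-in for the smoothing kernel.
        p = np.pad(f, 1, mode='edge')
        return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]) / 4.0

    for _ in range(n_iter):
        u_bar, v_bar = local_avg(u), local_avg(v)
        # Shared term from the linearized brightness-constancy residual.
        t = (Ix * u_bar + Iy * v_bar + It) / (alpha**2 + Ix**2 + Iy**2)
        u = u_bar - Ix * t
        v = v_bar - Iy * t
    return u, v

# Toy usage: for a Gaussian blob shifted one pixel along x, the recovered
# flow near the blob should be roughly u = 1, v = 0.
yy, xx = np.mgrid[0:64, 0:64]
img0 = np.exp(-((xx - 32.0)**2 + (yy - 32.0)**2) / 50.0)
u, v = hs_flow(img0, np.roll(img0, 1, axis=1))
```

Larger alpha weights the smoothness term more heavily, which is what lets the method fill in flow where the local gradient supplies only one constraint for the two unknown velocity components.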

8,078 citations

Journal ArticleDOI
TL;DR: Findings on decoding replicate earlier findings on the ability of judges to infer vocally expressed emotions with much-better-than-chance accuracy, including consistently found differences in the recognizability of different emotions.
Abstract: Professional actors' portrayals of 14 emotions varying in intensity and valence were presented to judges. The results on decoding replicate earlier findings on the ability of judges to infer vocally expressed emotions with much-better-than-chance accuracy, including consistently found differences in the recognizability of different emotions. A total of 224 portrayals were subjected to digital acoustic analysis to obtain profiles of vocal parameters for different emotions. The data suggest that vocal parameters not only index the degree of intensity typical for different emotions but also differentiate valence or quality aspects. The data are also used to test theoretical predictions on vocal patterning based on the component process model of emotion (K.R. Scherer, 1986). Although most hypotheses are supported, some need to be revised on the basis of the empirical evidence. Discriminant analysis and jackknifing show remarkably high hit rates and patterns of confusion that closely mirror those found for listener-judges.
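
The hit rates obtained via discriminant analysis and jackknifing correspond to leave-one-out cross-validated classification over the acoustic parameter profiles. Below is a hedged sketch of that evaluation scheme using scikit-learn on synthetic stand-in features; the feature count, class means, and emotion set are assumptions for illustration, not the paper's 224 portrayals.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
n_per_class, n_features = 16, 8      # stand-ins for acoustic parameters
emotions = ["anger", "fear", "joy", "sadness"]

# Give each emotion its own mean profile over the synthetic parameters.
X = np.vstack([rng.normal(loc=i, scale=1.0, size=(n_per_class, n_features))
               for i in range(len(emotions))])
y = np.repeat(emotions, n_per_class)

# Jackknifing: hold out one portrayal at a time, train on the rest,
# and score the held-out case; the mean accuracy is the hit rate.
hit_rate = cross_val_score(LinearDiscriminantAnalysis(), X, y,
                           cv=LeaveOneOut()).mean()
print(f"jackknifed hit rate: {hit_rate:.2f}")
```

Tallying the held-out predictions into a confusion matrix would yield the patterns of confusion that the abstract compares against those of listener-judges.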

1,862 citations

Journal ArticleDOI
TL;DR: In this paper, a theory is proposed that emotions are cognitively based states which co-ordinate quasi-autonomous processes in the nervous system, and that complex emotions are derived from a small number of basic emotions and arise at junctures of social plans.
Abstract: A theory is proposed that emotions are cognitively based states which co-ordinate quasi-autonomous processes in the nervous system. Emotions provide a biological solution to certain problems of transition between plans, in systems with multiple goals. Their function is to accomplish and maintain these transitions, and to communicate them to ourselves and others. Transitions occur at significant junctures of plans when the evaluation of success in a plan changes. Complex emotions are derived from a small number of basic emotions and arise at junctures of social plans.

1,728 citations