Author

Zhihong Zeng

Bio: Zhihong Zeng is an academic researcher from the University of Illinois at Urbana–Champaign. The author has contributed to research in topics including affective computing and facial expression. The author has an h-index of 17 and has co-authored 24 publications receiving 3,635 citations. Previous affiliations of Zhihong Zeng include the Chinese Academy of Sciences and the University of Houston.

Papers
Journal ArticleDOI
TL;DR: In this paper, the authors discuss human emotion perception from a psychological perspective, examine available approaches to solving the problem of machine understanding of human affective behavior, and discuss important issues like the collection and availability of training and test data.
Abstract: Automated analysis of human affective behavior has attracted increasing attention from researchers in psychology, computer science, linguistics, neuroscience, and related disciplines. However, the existing methods typically handle only deliberately displayed and exaggerated expressions of prototypical emotions, despite the fact that deliberate behavior differs in visual appearance, audio profile, and timing from spontaneously occurring behavior. To address this problem, efforts to develop algorithms that can process naturally occurring human affective behavior have recently emerged. Moreover, an increasing number of efforts are reported toward multimodal fusion for human affect analysis, including audiovisual fusion, linguistic and paralinguistic fusion, and multi-cue visual fusion based on facial expressions, head movements, and body gestures. This paper introduces and surveys these recent advances. We first discuss human emotion perception from a psychological perspective. Next, we examine available approaches to solving the problem of machine understanding of human affective behavior, and discuss important issues like the collection and availability of training and test data. We finally outline some of the scientific and engineering challenges to advancing human affect sensing technology.

2,503 citations

Proceedings ArticleDOI
12 Nov 2007
TL;DR: A survey of the available approaches to solving the problem of machine understanding of human affective behavior occurring in real-world settings can be found in this paper, where the authors discuss human emotion perception from a psychological perspective.
Abstract: Automated analysis of human affective behavior has attracted increasing attention from researchers in psychology, computer science, linguistics, neuroscience, and related disciplines. Promising approaches have been reported, including automatic methods for facial and vocal affect recognition. However, the existing methods typically handle only deliberately displayed and exaggerated expressions of prototypical emotions, despite the fact that deliberate behavior differs in visual and audio expressions from spontaneously occurring behavior. Recently, efforts to develop algorithms that can process naturally occurring human affective behavior have emerged. This paper surveys these efforts. We first discuss human emotion perception from a psychological perspective. Next, we examine the available approaches to solving the problem of machine understanding of human affective behavior occurring in real-world settings. We finally outline some scientific and engineering challenges for advancing human affect sensing technology.

215 citations

Journal ArticleDOI
TL;DR: This paper focuses on the development of a computing algorithm that uses both audio and visual sensors to detect and track a user's affective state to aid computer decision making.
Abstract: Advances in computer processing power and emerging algorithms are allowing new ways of envisioning human-computer interaction. Although the benefit of audio-visual fusion for affect recognition is expected from both psychological and engineering perspectives, most existing approaches to automatic human affect analysis are unimodal: the information processed by the computer system is limited to either face images or speech signals. This paper focuses on the development of a computing algorithm that uses both audio and visual sensors to detect and track a user's affective state to aid computer decision making. Using our multistream fused hidden Markov model (MFHMM), we analyzed coupled audio and visual streams to detect four cognitive states (interest, boredom, frustration, and puzzlement) and seven prototypical emotions (neutral, happiness, sadness, anger, disgust, fear, and surprise). The MFHMM allows the building of an optimal connection among multiple streams according to the maximum entropy principle and the maximum mutual information criterion. Person-independent experimental results from 20 subjects in 660 sequences show that the MFHMM approach outperforms face-only HMM, pitch-only HMM, energy-only HMM, and independent HMM fusion under clean and varying audio-channel noise conditions.

142 citations
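
The multistream fusion idea above can be approximated in a few lines: train one HMM per emotion and per stream, then combine the per-stream log-likelihoods with stream weights at decision time. This is a simplified stand-in for the paper's MFHMM, not its actual coupling scheme; it assumes the third-party hmmlearn package, and the data layout, state count, and weights are illustrative.

```python
# Simplified stream-weighted HMM fusion (NOT the paper's MFHMM): one Gaussian
# HMM per (emotion, stream), combined by weighted log-likelihoods at test time.
import numpy as np
from hmmlearn.hmm import GaussianHMM  # third-party package, assumed installed

def train_stream_models(train_data, n_states=3):
    """train_data: {emotion: {stream: list of (T_i, D) feature arrays}}."""
    models = {}
    for emotion, streams in train_data.items():
        models[emotion] = {}
        for stream, seqs in streams.items():
            X = np.vstack(seqs)                    # stack all sequences
            lengths = [len(s) for s in seqs]       # per-sequence lengths
            hmm = GaussianHMM(n_components=n_states, covariance_type="diag")
            hmm.fit(X, lengths)
            models[emotion][stream] = hmm
    return models

def classify(models, sample, weights):
    """sample: {stream: (T, D) array}; weights: {stream: float}."""
    scores = {
        emotion: sum(weights[s] * stream_models[s].score(sample[s]) for s in sample)
        for emotion, stream_models in models.items()
    }
    return max(scores, key=scores.get)  # emotion with highest fused score
```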

Journal ArticleDOI
TL;DR: A smoothing method is proposed to reduce the detrimental influence of speech on facial expression recognition, and the feature selection analysis shows that subjects tend to use brow movement in the face, and pitch and energy in prosody, to express their affect while speaking.
Abstract: The ability of a computer to detect and appropriately respond to changes in a user's affective state has significant implications for human-computer interaction (HCI). In this paper, we present our efforts toward audio-visual affect recognition on 11 affective states customized for HCI applications (four cognitive/motivational and seven basic affective states) of 20 non-actor subjects. A smoothing method is proposed to reduce the detrimental influence of speech on facial expression recognition. The feature selection analysis shows that subjects tend to use brow movement in the face, and pitch and energy in prosody, to express their affect while speaking. For person-dependent recognition, we apply a voting method to combine the frame-based classification results from both the audio and visual channels. The result shows a 7.5% improvement over the best unimodal performance. For the person-independent test, we apply a multistream HMM to combine the information from multiple component streams. This test shows a 6.1% improvement over the best component performance.

131 citations
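
For the person-dependent voting fusion mentioned above, the core step is simply to pool the per-frame labels predicted independently from the audio and visual channels and take a majority vote per sequence. The sketch below shows only that step, with placeholder labels; the frame-level classifiers themselves are outside its scope and the label names are illustrative.

```python
# Majority-vote fusion of per-frame labels from two channels (a minimal sketch,
# not the paper's exact pipeline).
from collections import Counter

def fuse_by_voting(audio_frame_labels, visual_frame_labels):
    """Each argument is a list of per-frame predicted labels for one sequence."""
    votes = Counter(audio_frame_labels) + Counter(visual_frame_labels)
    return votes.most_common(1)[0][0]

# Example: most frames in both channels point to "frustration".
print(fuse_by_voting(
    ["frustration", "interest", "frustration"],
    ["frustration", "frustration", "interest"],
))  # -> "frustration"
```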

Proceedings ArticleDOI
01 Sep 2008
TL;DR: The authors' extensive person-independent experiments suggest that the SIFT descriptor outperforms HoG and LBP, and LPP outperforms PCA and LDA in this application, but classifier fusion does not show a significant advantage over the SIFT-only classifier.
Abstract: The ability to handle multi-view facial expressions is important for computers to understand affective behavior in less constrained environments. However, most existing methods for facial expression recognition are based on near-frontal-view face data and are likely to fail in non-frontal facial expression analysis. In this paper, we conduct an investigation into analyzing multi-view facial expressions. Three local patch descriptors (HoG, LBP, and SIFT) are used to extract facial features, which are the inputs to a nearest-neighbor indexing method that identifies facial expressions. We also investigate the influence of feature dimension reduction (PCA, LDA, and LPP) and classifier fusion on the recognition performance. We test our approaches on multi-view data generated from the BU-3DFE 3D facial expression database, which includes 100 subjects with 6 emotions and 4 intensity levels. Our extensive person-independent experiments suggest that the SIFT descriptor outperforms HoG and LBP, and LPP outperforms PCA and LDA in this application, but classifier fusion does not show a significant advantage over the SIFT-only classifier.

111 citations
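
One branch of the pipeline described above (local descriptor, dimensionality reduction, nearest-neighbor classification) can be sketched with off-the-shelf components. The sketch uses HoG + PCA + 1-NN only; the paper's LBP/SIFT descriptors and LDA/LPP projections are omitted, and the image arrays, sizes, and labels below are random placeholders rather than BU-3DFE data.

```python
# HoG features -> PCA -> 1-nearest-neighbor classifier, as an illustrative
# stand-in for one configuration of the descriptor/reduction/indexing pipeline.
import numpy as np
from skimage.feature import hog
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

def hog_features(images):
    """images: iterable of 2-D grayscale face crops of equal size."""
    return np.array([hog(img, pixels_per_cell=(16, 16)) for img in images])

# Placeholder data: 200 random 96x96 "faces" with 6 expression labels.
rng = np.random.default_rng(0)
train_imgs = rng.random((200, 96, 96))
train_labels = rng.integers(0, 6, size=200)

clf = make_pipeline(PCA(n_components=50), KNeighborsClassifier(n_neighbors=1))
clf.fit(hog_features(train_imgs), train_labels)
print(clf.predict(hog_features(rng.random((1, 96, 96)))))
```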


Cited by
Proceedings ArticleDOI
13 Jun 2010
TL;DR: The Cohn-Kanade (CK+) database is presented, with baseline results using Active Appearance Models (AAMs) and a linear support vector machine (SVM) classifier using a leave-one-out subject cross-validation for both AU and emotion detection for the posed data.
Abstract: In 2000, the Cohn-Kanade (CK) database was released for the purpose of promoting research into automatically detecting individual facial expressions. Since then, the CK database has become one of the most widely used test-beds for algorithm development and evaluation. During this period, three limitations have become apparent: 1) while AU codes are well validated, emotion labels are not, as they refer to what was requested rather than what was actually performed; 2) there is no common performance metric against which to evaluate new algorithms; and 3) standard protocols for common databases have not emerged. As a consequence, the CK database has been used for both AU and emotion detection (even though labels for the latter have not been validated), comparison with benchmark algorithms is missing, and use of random subsets of the original database makes meta-analyses difficult. To address these and other concerns, we present the Extended Cohn-Kanade (CK+) database. The number of sequences is increased by 22% and the number of subjects by 27%. The target expression for each sequence is fully FACS coded, and emotion labels have been revised and validated. In addition, non-posed sequences for several types of smiles and their associated metadata have been added. We present baseline results using Active Appearance Models (AAMs) and a linear support vector machine (SVM) classifier, using a leave-one-out subject cross-validation for both AU and emotion detection on the posed data. The emotion and AU labels, along with the extended image data and tracked landmarks, will be made available in July 2010.

3,439 citations
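
The evaluation protocol named in the abstract, a linear SVM scored with leave-one-subject-out cross-validation, can be reproduced in outline with scikit-learn. The feature matrix, labels, and subject IDs below are random placeholders standing in for AAM-derived features; this is a sketch of the protocol, not the paper's implementation.

```python
# Leave-one-subject-out cross-validation of a linear SVM: no subject's
# sequences appear in both the training and test folds.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.random((120, 40))                  # one feature vector per sequence
y = rng.integers(0, 7, size=120)           # emotion label per sequence
subjects = rng.integers(0, 20, size=120)   # subject ID per sequence

scores = cross_val_score(LinearSVC(), X, y, groups=subjects, cv=LeaveOneGroupOut())
print(scores.mean())                       # mean accuracy across held-out subjects
```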

Journal ArticleDOI
TL;DR: A multimodal data set for the analysis of human affective states is presented, and a novel method for stimuli selection is proposed using retrieval by affective tags from the last.fm website, video highlight detection, and an online assessment tool.
Abstract: We present a multimodal data set for the analysis of human affective states. The electroencephalogram (EEG) and peripheral physiological signals of 32 participants were recorded as each watched 40 one-minute long excerpts of music videos. Participants rated each video in terms of the levels of arousal, valence, like/dislike, dominance, and familiarity. For 22 of the 32 participants, frontal face video was also recorded. A novel method for stimuli selection is proposed using retrieval by affective tags from the last.fm website, video highlight detection, and an online assessment tool. An extensive analysis of the participants' ratings during the experiment is presented. Correlates between the EEG signal frequencies and the participants' ratings are investigated. Methods and results are presented for single-trial classification of arousal, valence, and like/dislike ratings using the modalities of EEG, peripheral physiological signals, and multimedia content analysis. Finally, decision fusion of the classification results from different modalities is performed. The data set is made publicly available and we encourage other researchers to use it for testing their own affective state estimation methods.

3,013 citations
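
The decision-fusion step mentioned at the end of the abstract can be illustrated as follows: train one classifier per modality and average their predicted class probabilities for each trial. The feature dimensions and the choice of logistic regression are assumptions made for the sake of a runnable example, not the paper's setup.

```python
# Decision-level fusion of per-modality classifiers by averaging predicted
# class probabilities (illustrative sketch with random stand-in features).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)       # e.g., low vs. high arousal per trial
X_eeg = rng.random((200, 160))         # per-trial EEG features (placeholder)
X_per = rng.random((200, 30))          # per-trial peripheral features (placeholder)

clf_eeg = LogisticRegression(max_iter=1000).fit(X_eeg, y)
clf_per = LogisticRegression(max_iter=1000).fit(X_per, y)

def fused_prediction(x_eeg, x_per):
    """Average the two modalities' probability estimates, then pick the class."""
    p = (clf_eeg.predict_proba(x_eeg) + clf_per.predict_proba(x_per)) / 2
    return p.argmax(axis=1)

print(fused_prediction(X_eeg[:1], X_per[:1]))
```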

Journal ArticleDOI
TL;DR: This paper surveys the recent advances in multimodal machine learning itself and presents them in a common taxonomy to enable researchers to better understand the state of the field and identify directions for future research.
Abstract: Our experience of the world is multimodal: we see objects, hear sounds, feel texture, smell odors, and taste flavors. Modality refers to the way in which something happens or is experienced, and a research problem is characterized as multimodal when it includes multiple such modalities. In order for Artificial Intelligence to make progress in understanding the world around us, it needs to be able to interpret such multimodal signals together. Multimodal machine learning aims to build models that can process and relate information from multiple modalities. It is a vibrant multi-disciplinary field of increasing importance and with extraordinary potential. Instead of focusing on specific multimodal applications, this paper surveys the recent advances in multimodal machine learning itself and presents them in a common taxonomy. We go beyond the typical early and late fusion categorization and identify broader challenges that are faced by multimodal machine learning, namely: representation, translation, alignment, fusion, and co-learning. This new taxonomy will enable researchers to better understand the state of the field and identify directions for future research.

1,945 citations

Journal ArticleDOI
TL;DR: This survey explicitly explores the multidisciplinary foundation that underlies all AC applications by describing how AC researchers have incorporated psychological theories of emotion and how these theories affect research questions, methods, results, and their interpretations.
Abstract: This survey describes recent progress in the field of Affective Computing (AC), with a focus on affect detection. Although many AC researchers have traditionally attempted to remain agnostic to the different emotion theories proposed by psychologists, the affective technologies being developed are rife with theoretical assumptions that impact their effectiveness. Hence, an informed and integrated examination of emotion theories from multiple areas will need to become part of computing practice if truly effective real-world systems are to be achieved. This survey discusses theoretical perspectives that view emotions as expressions, embodiments, outcomes of cognitive appraisal, social constructs, products of neural circuitry, and psychological interpretations of basic feelings. It provides meta-analyses on existing reviews of affect detection systems that focus on traditional affect detection modalities like physiology, face, and voice, and also reviews emerging research on more novel channels such as text, body language, and complex multimodal systems. This survey explicitly explores the multidisciplinary foundation that underlies all AC applications by describing how AC researchers have incorporated psychological theories of emotion and how these theories affect research questions, methods, results, and their interpretations. In this way, models and methods can be compared, and emerging insights from various disciplines can be more expertly integrated.

1,503 citations