Author

Zhihong Zeng

Bio: Zhihong Zeng is an academic researcher from the University of Illinois at Urbana–Champaign. The author has contributed to research in topics including affective computing and facial expression. The author has an h-index of 17 and has co-authored 24 publications receiving 3,635 citations. Previous affiliations of Zhihong Zeng include the Chinese Academy of Sciences and the University of Houston.

Papers
Journal ArticleDOI
TL;DR: In this paper, the authors discuss human emotion perception from a psychological perspective, examine available approaches to solving the problem of machine understanding of human affective behavior, and discuss important issues like the collection and availability of training and test data.
Abstract: Automated analysis of human affective behavior has attracted increasing attention from researchers in psychology, computer science, linguistics, neuroscience, and related disciplines. However, the existing methods typically handle only deliberately displayed and exaggerated expressions of prototypical emotions, despite the fact that deliberate behavior differs in visual appearance, audio profile, and timing from spontaneously occurring behavior. To address this problem, efforts to develop algorithms that can process naturally occurring human affective behavior have recently emerged. Moreover, an increasing number of efforts are reported toward multimodal fusion for human affect analysis, including audiovisual fusion, linguistic and paralinguistic fusion, and multi-cue visual fusion based on facial expressions, head movements, and body gestures. This paper introduces and surveys these recent advances. We first discuss human emotion perception from a psychological perspective. Next, we examine available approaches to solving the problem of machine understanding of human affective behavior, and discuss important issues like the collection and availability of training and test data. We finally outline some of the scientific and engineering challenges to advancing human affect sensing technology.

2,503 citations

Proceedings ArticleDOI
12 Nov 2007
TL;DR: A survey of the available approaches to solving the problem of machine understanding of human affective behavior occurring in real-world settings can be found in this paper, where the authors discuss human emotion perception from a psychological perspective.
Abstract: Automated analysis of human affective behavior has attracted increasing attention from researchers in psychology, computer science, linguistics, neuroscience, and related disciplines. Promising approaches have been reported, including automatic methods for facial and vocal affect recognition. However, the existing methods typically handle only deliberately displayed and exaggerated expressions of prototypical emotions, despite the fact that deliberate behavior differs in visual and audio expressions from spontaneously occurring behavior. Recently, efforts to develop algorithms that can process naturally occurring human affective behavior have emerged. This paper surveys these efforts. We first discuss human emotion perception from a psychological perspective. Next, we examine the available approaches to solving the problem of machine understanding of human affective behavior occurring in real-world settings. We finally outline some scientific and engineering challenges for advancing human affect sensing technology.

215 citations

Journal ArticleDOI
TL;DR: This paper focuses on the development of a computing algorithm that uses both audio and visual sensors to detect and track a user's affective state to aid computer decision making.
Abstract: Advances in computer processing power and emerging algorithms are allowing new ways of envisioning human-computer interaction. Although the benefit of audio-visual fusion for affect recognition is expected from both psychological and engineering perspectives, most existing approaches to automatic human affect analysis are unimodal: the information processed by the computer system is limited to either face images or speech signals. This paper focuses on the development of a computing algorithm that uses both audio and visual sensors to detect and track a user's affective state to aid computer decision making. Using our multistream fused hidden Markov model (MFHMM), we analyzed coupled audio and visual streams to detect four cognitive states (interest, boredom, frustration, and puzzlement) and seven prototypical emotions (neutral, happiness, sadness, anger, disgust, fear, and surprise). The MFHMM allows the building of an optimal connection among multiple streams according to the maximum entropy principle and the maximum mutual information criterion. Person-independent experimental results from 20 subjects in 660 sequences show that the MFHMM approach outperforms face-only HMM, pitch-only HMM, energy-only HMM, and independent HMM fusion under clean and varying audio-channel noise conditions.

142 citations
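
The multistream fusion idea above can be approximated in a few lines: train one HMM per emotion and per stream, then combine the per-stream log-likelihoods with stream weights at decision time. This is a simplified stand-in for the paper's MFHMM, not its actual coupling scheme; it assumes the third-party hmmlearn package, and the data layout, state count, and weights are illustrative.

```python
# Simplified stream-weighted HMM fusion (NOT the paper's MFHMM): one Gaussian
# HMM per (emotion, stream), combined by weighted log-likelihoods at test time.
import numpy as np
from hmmlearn.hmm import GaussianHMM  # third-party package, assumed installed

def train_stream_models(train_data, n_states=3):
    """train_data: {emotion: {stream: list of (T_i, D) feature arrays}}."""
    models = {}
    for emotion, streams in train_data.items():
        models[emotion] = {}
        for stream, seqs in streams.items():
            X = np.vstack(seqs)                    # stack all sequences
            lengths = [len(s) for s in seqs]       # per-sequence lengths
            hmm = GaussianHMM(n_components=n_states, covariance_type="diag")
            hmm.fit(X, lengths)
            models[emotion][stream] = hmm
    return models

def classify(models, sample, weights):
    """sample: {stream: (T, D) array}; weights: {stream: float}."""
    scores = {
        emotion: sum(weights[s] * stream_models[s].score(sample[s]) for s in sample)
        for emotion, stream_models in models.items()
    }
    return max(scores, key=scores.get)  # emotion with highest fused score
```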

Journal ArticleDOI
TL;DR: A smoothing method is proposed to reduce the detrimental influence of speech on facial expression recognition, and the feature selection analysis shows that subjects tend to use brow movement in the face, and pitch and energy in prosody, to express their affect while speaking.
Abstract: The ability of a computer to detect and appropriately respond to changes in a user's affective state has significant implications for human-computer interaction (HCI). In this paper, we present our efforts toward audio-visual affect recognition on 11 affective states customized for HCI applications (four cognitive/motivational and seven basic affective states) of 20 non-actor subjects. A smoothing method is proposed to reduce the detrimental influence of speech on facial expression recognition. The feature selection analysis shows that subjects tend to use brow movement in the face, and pitch and energy in prosody, to express their affect while speaking. For person-dependent recognition, we apply a voting method to combine the frame-based classification results from both the audio and visual channels. The result shows a 7.5% improvement over the best unimodal performance. For the person-independent test, we apply a multistream HMM to combine the information from multiple component streams. This test shows a 6.1% improvement over the best component performance.

131 citations
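
For the person-dependent voting fusion mentioned above, the core step is simply to pool the per-frame labels predicted independently from the audio and visual channels and take a majority vote per sequence. The sketch below shows only that step, with placeholder labels; the frame-level classifiers themselves are outside its scope and the label names are illustrative.

```python
# Majority-vote fusion of per-frame labels from two channels (a minimal sketch,
# not the paper's exact pipeline).
from collections import Counter

def fuse_by_voting(audio_frame_labels, visual_frame_labels):
    """Each argument is a list of per-frame predicted labels for one sequence."""
    votes = Counter(audio_frame_labels) + Counter(visual_frame_labels)
    return votes.most_common(1)[0][0]

# Example: most frames in both channels point to "frustration".
print(fuse_by_voting(
    ["frustration", "interest", "frustration"],
    ["frustration", "frustration", "interest"],
))  # -> "frustration"
```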

Proceedings ArticleDOI
01 Sep 2008
TL;DR: The authors' extensive person-independent experiments suggest that the SIFT descriptor outperforms HoG and LBP, and LPP outperforms PCA and LDA in this application, but classifier fusion does not show a significant advantage over the SIFT-only classifier.
Abstract: The ability to handle multi-view facial expressions is important for computers to understand affective behavior in less constrained environments. However, most existing methods for facial expression recognition are based on near-frontal-view face data and are likely to fail in non-frontal facial expression analysis. In this paper, we conduct an investigation into analyzing multi-view facial expressions. Three local patch descriptors (HoG, LBP, and SIFT) are used to extract facial features, which are the inputs to a nearest-neighbor indexing method that identifies facial expressions. We also investigate the influence of feature dimension reduction (PCA, LDA, and LPP) and classifier fusion on the recognition performance. We test our approaches on multi-view data generated from the BU-3DFE 3D facial expression database, which includes 100 subjects with 6 emotions and 4 intensity levels. Our extensive person-independent experiments suggest that the SIFT descriptor outperforms HoG and LBP, and LPP outperforms PCA and LDA in this application, but classifier fusion does not show a significant advantage over the SIFT-only classifier.

111 citations
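
One branch of the pipeline described above (local descriptor, dimensionality reduction, nearest-neighbor classification) can be sketched with off-the-shelf components. The sketch uses HoG + PCA + 1-NN only; the paper's LBP/SIFT descriptors and LDA/LPP projections are omitted, and the image arrays, sizes, and labels below are random placeholders rather than BU-3DFE data.

```python
# HoG features -> PCA -> 1-nearest-neighbor classifier, as an illustrative
# stand-in for one configuration of the descriptor/reduction/indexing pipeline.
import numpy as np
from skimage.feature import hog
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

def hog_features(images):
    """images: iterable of 2-D grayscale face crops of equal size."""
    return np.array([hog(img, pixels_per_cell=(16, 16)) for img in images])

# Placeholder data: 200 random 96x96 "faces" with 6 expression labels.
rng = np.random.default_rng(0)
train_imgs = rng.random((200, 96, 96))
train_labels = rng.integers(0, 6, size=200)

clf = make_pipeline(PCA(n_components=50), KNeighborsClassifier(n_neighbors=1))
clf.fit(hog_features(train_imgs), train_labels)
print(clf.predict(hog_features(rng.random((1, 96, 96)))))
```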


Cited by
Proceedings ArticleDOI
13 Jun 2010
TL;DR: The Cohn-Kanade (CK+) database is presented, with baseline results using Active Appearance Models (AAMs) and a linear support vector machine (SVM) classifier using a leave-one-out subject cross-validation for both AU and emotion detection for the posed data.
Abstract: In 2000, the Cohn-Kanade (CK) database was released for the purpose of promoting research into automatically detecting individual facial expressions. Since then, the CK database has become one of the most widely used test-beds for algorithm development and evaluation. During this period, three limitations have become apparent: 1) while AU codes are well validated, emotion labels are not, as they refer to what was requested rather than what was actually performed; 2) there is no common performance metric against which to evaluate new algorithms; and 3) standard protocols for common databases have not emerged. As a consequence, the CK database has been used for both AU and emotion detection (even though labels for the latter have not been validated), comparison with benchmark algorithms is missing, and use of random subsets of the original database makes meta-analyses difficult. To address these and other concerns, we present the Extended Cohn-Kanade (CK+) database. The number of sequences is increased by 22% and the number of subjects by 27%. The target expression for each sequence is fully FACS coded, and emotion labels have been revised and validated. In addition, non-posed sequences for several types of smiles and their associated metadata have been added. We present baseline results using Active Appearance Models (AAMs) and a linear support vector machine (SVM) classifier, using a leave-one-out subject cross-validation for both AU and emotion detection on the posed data. The emotion and AU labels, along with the extended image data and tracked landmarks, will be made available in July 2010.

3,439 citations
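
The evaluation protocol named in the abstract, a linear SVM scored with leave-one-subject-out cross-validation, can be reproduced in outline with scikit-learn. The feature matrix, labels, and subject IDs below are random placeholders standing in for AAM-derived features; this is a sketch of the protocol, not the paper's implementation.

```python
# Leave-one-subject-out cross-validation of a linear SVM: no subject's
# sequences appear in both the training and test folds.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.random((120, 40))                  # one feature vector per sequence
y = rng.integers(0, 7, size=120)           # emotion label per sequence
subjects = rng.integers(0, 20, size=120)   # subject ID per sequence

scores = cross_val_score(LinearSVC(), X, y, groups=subjects, cv=LeaveOneGroupOut())
print(scores.mean())                       # mean accuracy across held-out subjects
```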

Journal ArticleDOI
TL;DR: A multimodal data set for the analysis of human affective states is presented, and a novel method for stimuli selection is proposed using retrieval by affective tags from the last.fm website, video highlight detection, and an online assessment tool.
Abstract: We present a multimodal data set for the analysis of human affective states. The electroencephalogram (EEG) and peripheral physiological signals of 32 participants were recorded as each watched 40 one-minute long excerpts of music videos. Participants rated each video in terms of the levels of arousal, valence, like/dislike, dominance, and familiarity. For 22 of the 32 participants, frontal face video was also recorded. A novel method for stimuli selection is proposed using retrieval by affective tags from the last.fm website, video highlight detection, and an online assessment tool. An extensive analysis of the participants' ratings during the experiment is presented. Correlates between the EEG signal frequencies and the participants' ratings are investigated. Methods and results are presented for single-trial classification of arousal, valence, and like/dislike ratings using the modalities of EEG, peripheral physiological signals, and multimedia content analysis. Finally, decision fusion of the classification results from different modalities is performed. The data set is made publicly available and we encourage other researchers to use it for testing their own affective state estimation methods.

3,013 citations
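
The decision-fusion step mentioned at the end of the abstract can be illustrated as follows: train one classifier per modality and average their predicted class probabilities for each trial. The feature dimensions and the choice of logistic regression are assumptions made for the sake of a runnable example, not the paper's setup.

```python
# Decision-level fusion of per-modality classifiers by averaging predicted
# class probabilities (illustrative sketch with random stand-in features).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)       # e.g., low vs. high arousal per trial
X_eeg = rng.random((200, 160))         # per-trial EEG features (placeholder)
X_per = rng.random((200, 30))          # per-trial peripheral features (placeholder)

clf_eeg = LogisticRegression(max_iter=1000).fit(X_eeg, y)
clf_per = LogisticRegression(max_iter=1000).fit(X_per, y)

def fused_prediction(x_eeg, x_per):
    """Average the two modalities' probability estimates, then pick the class."""
    p = (clf_eeg.predict_proba(x_eeg) + clf_per.predict_proba(x_per)) / 2
    return p.argmax(axis=1)

print(fused_prediction(X_eeg[:1], X_per[:1]))
```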

Journal ArticleDOI
TL;DR: This paper surveys the recent advances in multimodal machine learning itself and presents them in a common taxonomy to enable researchers to better understand the state of the field and identify directions for future research.
Abstract: Our experience of the world is multimodal: we see objects, hear sounds, feel texture, smell odors, and taste flavors. Modality refers to the way in which something happens or is experienced, and a research problem is characterized as multimodal when it includes multiple such modalities. In order for Artificial Intelligence to make progress in understanding the world around us, it needs to be able to interpret such multimodal signals together. Multimodal machine learning aims to build models that can process and relate information from multiple modalities. It is a vibrant multi-disciplinary field of increasing importance and with extraordinary potential. Instead of focusing on specific multimodal applications, this paper surveys the recent advances in multimodal machine learning itself and presents them in a common taxonomy. We go beyond the typical early and late fusion categorization and identify broader challenges that are faced by multimodal machine learning, namely: representation, translation, alignment, fusion, and co-learning. This new taxonomy will enable researchers to better understand the state of the field and identify directions for future research.

1,945 citations

Journal ArticleDOI
TL;DR: This survey explicitly explores the multidisciplinary foundation that underlies all AC applications by describing how AC researchers have incorporated psychological theories of emotion and how these theories affect research questions, methods, results, and their interpretations.
Abstract: This survey describes recent progress in the field of Affective Computing (AC), with a focus on affect detection. Although many AC researchers have traditionally attempted to remain agnostic to the different emotion theories proposed by psychologists, the affective technologies being developed are rife with theoretical assumptions that impact their effectiveness. Hence, an informed and integrated examination of emotion theories from multiple areas will need to become part of computing practice if truly effective real-world systems are to be achieved. This survey discusses theoretical perspectives that view emotions as expressions, embodiments, outcomes of cognitive appraisal, social constructs, products of neural circuitry, and psychological interpretations of basic feelings. It provides meta-analyses on existing reviews of affect detection systems that focus on traditional affect detection modalities like physiology, face, and voice, and also reviews emerging research on more novel channels such as text, body language, and complex multimodal systems. This survey explicitly explores the multidisciplinary foundation that underlies all AC applications by describing how AC researchers have incorporated psychological theories of emotion and how these theories affect research questions, methods, results, and their interpretations. In this way, models and methods can be compared, and emerging insights from various disciplines can be more expertly integrated.

1,503 citations