A review of affective computing

Questions

Q1. What contributions have the authors mentioned in the paper "A review of affective computing: from unimodal analysis to multimodal fusion"?

Q2. What future work have the authors mentioned in the paper "A review of affective computing: from unimodal analysis to multimodal fusion"?

Q3. What is the primary advantage of analyzing videos over textual analysis?

Q4. What was the acoustic feature used to generate the feature representation of the entire dataset?

Q5. What are the common unsupervised methods for sentiment analysis?

Q6. What is the main channel for forming an impression of the subject’s present state of mind?

Q7. What was the effect of the feature adaptation scheme on the emotion recognition system?

Q8. What is the percentage of studies that report visual modality as superior to audio?

Q9. How accurate was the synchronization of the audio and video signals?

Accepted Answer (Q1)

The authors present what they describe as a first-of-its-kind, comprehensive literature review of the diverse field of affective computing. Existing literature surveys lack a detailed discussion of the state of the art in multimodal affect analysis frameworks, a gap this review aims to address. The paper focuses mainly on the use of audio, visual and text information for multimodal affect analysis, since around 90% of the relevant literature appears to cover these three modalities. As part of the review, the authors carry out an extensive study of different categories of state-of-the-art fusion techniques, followed by a critical analysis of the potential performance improvements of multimodal analysis over unimodal analysis. A comprehensive overview of these two complementary fields aims to form the building blocks for readers to better understand this challenging and exciting research field.

Accepted Answer (Q2)

One important area of future research is to investigate novel approaches for advancing the understanding of the temporal dependency between utterances, i.e., the effect of the utterance at time t on the utterance at time t+1. Progress in text classification research can play a major role in the future of multimodal affect analysis research. Future research should focus on answering this question. The use of deep learning for multimodal fusion can also be important future work.

Accepted Answer (Q3)

The primary advantage of analyzing videos over textual analysis, for detecting emotions and sentiments from opinions, is the surplus of behavioral cues.

Accepted Answer (Q4)

Low-level acoustic features were extracted at the frame level from each utterance, using the OpenSMILE toolkit, and used to generate the feature representation of the entire dataset.
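The idea of frame-level extraction followed by an utterance-level representation can be sketched as below. This is a minimal illustration only and does not call OpenSMILE: the two descriptors (RMS energy and zero-crossing rate), the mean/standard-deviation pooling, the frame sizes, and all function names are assumptions chosen for the example, not details given in the answer.

```python
import math
from statistics import mean, pstdev


def frame_signal(samples, frame_len, hop):
    """Split a 1-D signal into overlapping frames (trailing partial frame is dropped)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def low_level_descriptors(frame):
    """Two illustrative frame-level descriptors: RMS energy and zero-crossing rate."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    zcr = crossings / (len(frame) - 1)
    return [rms, zcr]


def utterance_features(samples, frame_len=400, hop=160):
    """Pool frame-level descriptors into one fixed-length vector per utterance.

    Pooling with mean and standard deviation is an assumption made for this sketch.
    """
    frames = frame_signal(samples, frame_len, hop)
    per_frame = [low_level_descriptors(f) for f in frames]
    columns = list(zip(*per_frame))  # one column per descriptor
    return [stat(col) for col in columns for stat in (mean, pstdev)]


# Toy "utterance": one second of a 440 Hz tone sampled at 16 kHz (hypothetical data).
sample_rate = 16000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
features = utterance_features(tone)
print(len(features))  # 4 values: mean and std of RMS, mean and std of ZCR
```

Applying `utterance_features` to every utterance would give one row per utterance, which is one plausible way to build a dataset-level feature matrix from frame-level descriptors.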

Accepted Answer (Q5)

Although supervised machine learning methods predominate in the sentiment analysis literature, a number of unsupervised methods, such as those based on linguistic patterns, can also be found.

Accepted Answer (Q6)

Across the ages of people involved, and the nature of conversations, facial expressions are the primary channel for forming an impression of the subject’s present state of mind.

Accepted Answer (Q7)

The results on uncontrolled recordings (i.e., speech downloaded from a video-sharing website) revealed that the feature adaptation scheme significantly improved the unweighted and weighted accuracies of the emotion recognition system.
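The answer does not define its two metrics. In speech emotion recognition, weighted accuracy is commonly taken to be the overall fraction of correct predictions, and unweighted accuracy the mean of the per-class recalls. The sketch below assumes those conventional definitions; the function names and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: every sample counts equally, so frequent classes dominate."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls: every class counts equally, whatever its size."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Imbalanced toy labels: 8 "neutral" utterances and 2 "angry" ones.
y_true = ["neutral"] * 8 + ["angry"] * 2
y_pred = ["neutral"] * 8 + ["neutral", "angry"]

print(weighted_accuracy(y_true, y_pred))    # 0.9
print(unweighted_accuracy(y_true, y_pred))  # 0.75
```

With imbalanced classes the two numbers diverge, which is why emotion recognition results are often reported with both.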

Accepted Answer (Q8)

In their literature survey, the authors found that more than 90% of studies reported the visual modality as superior to audio and other modalities.

Accepted Answer (Q9)

To accommodate research in audio-visual fusion, the audio and video signals were synchronized with an accuracy of 25 microseconds.
