Q2. What future works have the authors mentioned in the paper "A review of affective computing: from unimodal analysis to multimodal fusion"?
One important area of future research is to investigate novel approaches for modeling the temporal dependency between utterances, i.e., the effect of the utterance at time t on the utterance at time t+1. Progress in text classification research can also play a major role in the future of multimodal affect analysis. Finally, the use of deep learning for multimodal fusion is another important direction for future work.
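As a minimal sketch of what modeling this inter-utterance dependency can look like in practice, the snippet below runs a recurrent layer over utterance-level feature vectors, so the representation of the utterance at time t+1 is conditioned on the utterances before it. This is an illustration, not the authors' method; the module name, feature dimension, and class count are all assumed for the example.

```python
# Minimal sketch: modeling inter-utterance temporal dependency with an LSTM.
# Utterance-level feature vectors (from any modality) are treated as a
# sequence, so utterance t influences the representation of utterance t+1.
# All names and dimensions below are illustrative, not from the paper.
import torch
import torch.nn as nn

class ContextualUtteranceClassifier(nn.Module):
    def __init__(self, feat_dim=100, hidden_dim=64, num_classes=2):
        super().__init__()
        # The LSTM carries context forward from utterance t to utterance t+1.
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, utterance_feats):
        # utterance_feats: (batch, num_utterances, feat_dim)
        context, _ = self.lstm(utterance_feats)  # contextual state per utterance
        return self.classifier(context)          # (batch, num_utterances, num_classes)

# One video with 5 utterances, each represented by a 100-d feature vector.
video = torch.randn(1, 5, 100)
logits = ContextualUtteranceClassifier()(video)
print(logits.shape)  # torch.Size([1, 5, 2])
```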
Q3. What is the primary advantage of analyzing videos over textual analysis?
The primary advantage of analyzing videos over textual analysis, for detecting emotions and sentiments from opinions, is the surplus of behavioral cues: in addition to the words themselves, videos carry vocal modulations and facial expressions.
Q4. What was the acoustic feature used to generate the feature representation of the entire dataset?
For acoustic features, low-level descriptors were extracted at the frame level on each utterance using the OpenSMILE toolkit and used to generate a feature representation of the entire dataset.
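As an illustration of this kind of pipeline, the sketch below extracts frame-level low-level descriptors with the `opensmile` Python wrapper around the OpenSMILE toolkit. The eGeMAPSv02 feature set and the file name are assumptions made for the example; the exact configuration used in the paper is not specified in this answer.

```python
# Minimal sketch: frame-level low-level descriptors (LLDs) with the
# `opensmile` Python wrapper around the OpenSMILE toolkit.
# The feature set and file name are illustrative, not those from the paper.
import opensmile

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.LowLevelDescriptors,
)

# Returns a pandas DataFrame with one row per frame and one column per LLD.
llds = smile.process_file("utterance.wav")
print(llds.shape)

# Summarizing the frame-level LLDs (here with a simple mean) gives a
# fixed-length vector per utterance, which can be stacked across the dataset.
utterance_vector = llds.mean()
```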
Q5. What are the common unsupervised methods for sentiment analysis?
While supervised machine learning methods predominate in the sentiment analysis literature, a number of unsupervised methods, such as those based on linguistic patterns, can also be found.
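To make the contrast concrete, here is a minimal sketch of an unsupervised, lexicon-and-pattern-based sentiment scorer. The toy lexicon and the single negation pattern are invented for illustration; a real system would draw on a resource such as SentiWordNet and far richer linguistic patterns.

```python
# Minimal sketch of an unsupervised, pattern-based sentiment scorer.
# The toy lexicon and negation rule are illustrative only; a real system
# would use a resource such as SentiWordNet and richer linguistic patterns.
LEXICON = {"good": 1.0, "great": 1.5, "bad": -1.0, "awful": -1.5}
NEGATIONS = {"not", "never", "no"}

def sentiment_score(sentence: str) -> float:
    """Sum lexicon polarities, flipping the sign of a polar word
    that follows a negation word (no training data required)."""
    score, negate = 0.0, False
    for token in sentence.lower().split():
        if token in NEGATIONS:
            negate = True
        elif token in LEXICON:
            polarity = LEXICON[token]
            score += -polarity if negate else polarity
            negate = False  # negation scopes up to the next polar word
    return score

print(sentiment_score("not a good movie"))  # -1.0, negative overall
```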
Q6. What is the main channel for forming an impression of the subject’s present state of mind?
Regardless of the ages of the people involved and the nature of the conversation, facial expressions are the primary channel for forming an impression of the subject’s present state of mind.
Q7. What was the effect of the feature adaptation scheme on the emotion recognition system?
The results on uncontrolled recordings (i.e., speech downloaded from a video-sharing website) revealed that the feature adaptation scheme significantly improved the unweighted and weighted accuracies of the emotion recognition system.
Q8. What is the percentage of studies that report visual modality as superior to audio?
In their literature survey, the authors found that more than 90% of the studies reported the visual modality as superior to audio and other modalities.
Q9. How accurate was the synchronization of the audio and video signals?
To accommodate research in audio-visual fusion, the audio and video signals were synchronized with an accuracy of 25 microseconds.
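To put that figure in perspective, the arithmetic below compares 25 microseconds to typical audio and video sampling periods. The 44.1 kHz audio rate and 25 fps video frame rate are common defaults assumed for illustration; the answer does not state the corpus's actual recording rates.

```python
# Minimal sketch putting a 25-microsecond sync accuracy in context.
# The 44.1 kHz audio rate and 25 fps video rate are assumed defaults;
# the answer does not state the actual recording rates.
sync_accuracy_s = 25e-6

audio_sample_period_s = 1 / 44_100  # ~22.7 microseconds per audio sample
video_frame_period_s = 1 / 25       # 40 milliseconds per video frame

print(sync_accuracy_s / audio_sample_period_s)  # ~1.1 audio samples
print(sync_accuracy_s / video_frame_period_s)   # ~0.000625 video frames
```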