Continuous Estimation of Emotions in Speech by Dynamic Cooperative Speaker Models
Citations
AVEC 2017: Real-life Depression, and Affect Recognition Workshop and Challenge
Parallelized Convolutional Recurrent Neural Network With Spectral Features for Speech Emotion Recognition
Prominence features: Effective emotional features for speech emotion recognition
End-to-end learning for dimensional emotion recognition from physiological signals
Speech emotion recognition research: an analysis of research focus
References
LIBSVM: A library for support vector machines
Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy
A concordance correlation coefficient to evaluate reproducibility
A review of feature selection techniques in bioinformatics
Frequently Asked Questions (14)
Q2. What future works have the authors mentioned in the paper "Continuous estimation of emotions in speech by dynamic cooperative speaker models" ?
Web-based applications could allow anyone to upload their speech sequence to the cloud along with the corresponding annotation. Finally, the introduction of the QBTD paradigm suggests future developments based on a modular architecture in which each SSRM is trained and optimised on a single quadrant and then merged using a cooperative rule, based on different machine learning scenarios and other databases of emotional speech.
Q3. What is the main disadvantage of the automatic recognition of spontaneous emotions?
Whereas the use of all annotation data can help preserve diversity in emotion perception, e.g., by using multi-task learning over each annotator [25], [26], its main disadvantage is that it increases the overall complexity of the model in proportion to the number of available raters.
Q4. What is the common approach in the literature to use all the emotion variability found in the data as?
The common approach in the literature is to use all the emotion variability found in the data as training material and to tune the machine learning system to disregard the less relevant instances for emotion prediction (e.g., by optimising the number of support vectors and the soft margin in Support Vector Regression (SVR)) [19], [30], [31], [32].
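A minimal sketch of this tuning idea, not the authors' code: with scikit-learn's SVR (an assumed dependency, on synthetic stand-in features and labels), the soft margin C and the epsilon tube determine which training instances end up as support vectors, i.e., which instances actually shape the regressor.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))                         # stand-in acoustic features
y = X[:, 0] * 0.8 + rng.normal(scale=0.2, size=200)   # stand-in arousal labels

# Searching over C (soft margin) and epsilon (tube width) changes how many
# training instances become support vectors, effectively down-weighting the
# less relevant ones.
grid = GridSearchCV(
    SVR(kernel="rbf"),
    {"C": [0.1, 1.0, 10.0], "epsilon": [0.01, 0.1, 0.5]},
    cv=3,
)
grid.fit(X, y)
n_support_vectors = len(grid.best_estimator_.support_)
print(grid.best_params_, n_support_vectors)
```

Fewer support vectors mean the regressor effectively ignores more of the training material; the grid search picks the trade-off that generalises best under cross-validation.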
Q5. What is the new strategy for combining speech sequences?
The addition of new speakers' speech sequences to the cooperation is now possible through single speaker model construction, and additional affective content from the same speaker can be included through single speaker model relearning.
Q6. What is the frequency of inclusion of a speaker in the cooperation strategy?
Since their system dynamically adapts the ensemble of SSRMs used in the cooperation strategy to perform emotion prediction, the authors analysed the frequency of inclusion of each speaker in the model (i.e., the number of times a speaker's SSRM is included in the cooperation, divided by the number of observation windows).
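The frequency-of-inclusion measure described above can be sketched directly; this is an illustrative helper on made-up window data, not the authors' implementation:

```python
def inclusion_frequency(included_per_window, speaker):
    """Fraction of observation windows in which `speaker`'s SSRM is part
    of the cooperation.

    included_per_window: one set of speaker ids per observation window.
    """
    n_windows = len(included_per_window)
    count = sum(1 for chosen in included_per_window if speaker in chosen)
    return count / n_windows

# Hypothetical cooperation history over four observation windows.
windows = [{"s1", "s2"}, {"s1"}, {"s2", "s3"}, {"s1", "s3"}]
print(inclusion_frequency(windows, "s1"))  # included in 3 of 4 windows -> 0.75
```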
Q7. What is the significance of the synchronization procedure?
The statistical significance of the improvements obtained with the inclusion of the synchronization procedure is verified by a paired t-test for both arousal and valence; the authors obtained p < 0.001 in both experiments, demonstrating the importance of the synchronization procedure for constructing the SSRMs that cooperate in the CRM.
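A paired t-test of this kind can be run with SciPy (an assumed dependency); the per-sequence scores below are synthetic stand-ins, not the paper's results:

```python
from scipy.stats import ttest_rel

# Hypothetical per-sequence performance scores with and without the
# synchronization procedure (same sequences in the same order, so the
# test is paired).
with_sync    = [0.62, 0.58, 0.71, 0.66, 0.69, 0.64, 0.60, 0.67]
without_sync = [0.55, 0.51, 0.63, 0.60, 0.61, 0.57, 0.54, 0.59]

t_stat, p_value = ttest_rel(with_sync, without_sync)
print(t_stat > 0, p_value < 0.001)
```

A positive t statistic with a small p-value indicates that the improvement is consistent across sequences rather than driven by a few outliers, which is the point of pairing.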
Q8. What is the advantage of the proposed architecture for mobile applications?
For this reason, the proposed architecture is well suited to mobile applications, thanks to the ease and flexibility of developing single models separately trained on distinct speech sequences with different emotional contents.
Q9. What is the significance of the improvements obtained with the QBTD procedure?
The statistical significance of the improvements obtained with the QBTD procedure over the global optimisation (ALL) is verified with a paired t-test for both arousal and valence; the authors obtained p < 0.001 and p = 0.027 for arousal and valence, respectively.
Q10. How can a new speaker be added to the cooperative system?
A new speaker can be added to the cooperative system simply by training a new SSRM using the speech sequence along with the corresponding annotations for the new speaker.
Q11. What is the significance of the QBTD procedure for the construction of the SSRM?
The authors obtained p < 0.001 and p = 0.027 for arousal and valence, respectively, demonstrating the importance of the QBTD procedure for constructing the SSRM.
Q12. What other strategies are used to quantify the performance of the proposed method?
To further quantify the performance of the proposed method (i.e., SSRM combined with CRM) with respect to standard regression approaches, the authors also implemented two other emotion recognition strategies.
Q13. What are the challenges of the automatic recognition of naturalistic emotion from time-continuous labels?
Automatic recognition of naturalistic emotion from time-continuous labels, however, presents several challenges that are not yet solved [9], such as the definition of a reliable gold standard from a pool of raters and the issue of data scarcity in training models.
Q14. What is the main question to be solved?
Regarding the issue of data scarcity, the main question to be solved is how to deal with the huge diversity found in a collection of spontaneous displays of emotion.