The INTERSPEECH 2013 computational paralinguistics challenge: social signals, conflict, emotion, autism
Citations
Recent developments in openSMILE, the munich open-source multimedia feature extractor
The Geneva Minimalistic Acoustic Parameter Set (GeMAPS) for Voice Research and Affective Computing
Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network
End-to-End Multimodal Emotion Recognition Using Deep Neural Networks
AVEC 2016: Depression, Mood, and Emotion Recognition Workshop and Challenge
References
Data Mining: Practical Machine Learning Tools and Techniques
The WEKA data mining software: an update
Opensmile: the munich versatile and fast open-source audio feature extractor
The HTK book
Social Signal Processing
Frequently Asked Questions
Q2. Which frame-wise MFCC features are computed?
In particular, frame-wise MFCCs 1–12 and logarithmic energy are computed along with their first- and second-order delta (∆) regression coefficients, as typically used in speech recognition.
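As a rough illustration of the delta regression coefficients mentioned above, the following sketch applies the regression formula commonly used in speech recognition front ends to a list of per-frame feature vectors. The function names `delta` and `add_deltas`, the regression window of 2 frames, and the edge handling (replicating the first and last frame) are assumptions made for this example, not settings taken from the Challenge. With 13 static coefficients per frame (MFCCs 1–12 plus logarithmic energy), appending ∆ and ∆∆ gives 39 values per frame.

```python
def delta(frames, window=2):
    """Delta regression coefficients for a list of per-frame feature vectors.

    d[t] = sum_{n=1..W} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..W} n^2)
    Edge frames are replicated (an assumption of this sketch).
    """
    num_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, window + 1))
    out = []
    for t in range(num_frames):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, num_frames - 1)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def add_deltas(frames, window=2):
    """Append first- and second-order deltas to each static feature vector."""
    d1 = delta(frames, window)
    d2 = delta(d1, window)  # second order: delta of the deltas
    return [c + a + b for c, a, b in zip(frames, d1, d2)]
```

On a linear ramp such as `[[0.0], [1.0], [2.0], [3.0], [4.0]]`, the delta of the middle frame is the slope of the ramp.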
Q3. How are space and memory requirements kept low in the feature set?
Taking into account space and memory requirements, only a small set of descriptors is calculated per frame, following a sliding window scheme to combine frame-wise LLDs and functionals.
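A sliding window scheme of this kind can be sketched as follows: for each frame, functionals are computed over the frame and its neighbours. The choice of functionals (arithmetic mean and standard deviation), the context of 4 frames on each side, the truncation of the window at the signal edges, and the name `sliding_functionals` are all assumptions for illustration, not the Challenge configuration.

```python
from statistics import fmean, pstdev


def sliding_functionals(lld, context=4):
    """Per-frame (mean, std) of one LLD contour over a centred window.

    lld     -- list of frame-wise values of a single low-level descriptor
    context -- frames taken on each side of the current frame (assumed)
    The window is truncated at the start and end of the contour.
    """
    n = len(lld)
    out = []
    for t in range(n):
        lo = max(0, t - context)
        hi = min(n, t + context + 1)
        window = lld[lo:hi]
        out.append((fmean(window), pstdev(window)))
    return out
```

Each frame thus yields two values per LLD, so the per-frame feature vector stays small while still carrying some local context.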
Q4. Which speaker segmentation may participants use?
Participants are encouraged to use the manual speaker segmentation for the development of feature extraction, but for the test set an automatic speaker diarisation system has to be used.
Q5. What is the primary evaluation measure of the Challenge?
As primary evaluation measure, the authors retain the choice of unweighted average recall as used since the first Challenge held in 2009 [1].
Q6. How many emotional categories are evaluated in the multi-class classification task?
Since six of the 18 emotional categories are extremely sparse (≤ 30 instances in the entire GEMEP database), the authors restrict the evaluation to the 12 most frequent ones in the multi-class classification task.
Q8. What is the official competition measure of the Social Signals Sub-Challenge?
Given the nature of the Social Signals Sub-Challenge’s detection task, the authors also consider the Area Under the Curve measure [28] for the laughter and filler classes on frame level (100 frames per second); the unweighted average (UAAUC) is the official competition measure of this Sub-Challenge.
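The frame-level measure described above can be sketched as follows. AUC is computed here through its rank interpretation (the probability that a randomly chosen positive frame receives a higher score than a randomly chosen negative frame, with ties counted as one half), once per target class in a one-vs-rest fashion, and the two values are averaged without weighting. The function names, the data layout, and the pairwise computation (quadratic in the number of frames, so only suitable for small examples) are assumptions of this sketch.

```python
def auc(scores, is_positive):
    """Area under the ROC curve via pairwise comparison of frame scores."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    # Count positive/negative pairs ranked correctly; ties count one half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def uaauc(frame_labels, class_scores, classes=("laughter", "filler")):
    """Unweighted average of the per-class, one-vs-rest frame-level AUCs.

    frame_labels -- reference label of every frame
    class_scores -- dict: class name -> per-frame scores for that class
    """
    aucs = [
        auc(class_scores[c], [label == c for label in frame_labels])
        for c in classes
    ]
    return sum(aucs) / len(aucs)
```

For example, with reference labels `["laughter", "other", "filler", "other"]` (where `"other"` is a placeholder for all remaining frames), each class is scored against all frames that do not carry its label.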
Q9. How many clips are in the SSPNet Conflict Corpus?
It contains 1 430 clips of 30 seconds extracted from the Canal9 Corpus – a collection of 45 Swiss political debates (in French) – including 138 subjects in total: 23 females (1 moderator and 22 participants) and 133 males (3 moderators and 120 participants).
Q10. What is the motivation to consider unweighted rather than weighted average recall?
The motivation to consider unweighted rather than weighted average recall (‘conventional’ accuracy) is that it is also meaningful for highly unbalanced distributions of instances among classes, as is given in the Autism Sub-Challenge.
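The difference between the two measures can be made concrete with a small sketch. The helper names `uar` and `war` and the toy class labels are hypothetical; the example data are constructed for illustration and are not taken from any Challenge corpus.

```python
from collections import Counter


def class_recalls(y_true, y_pred):
    """Recall of each class that occurs in the reference labels."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}


def uar(y_true, y_pred):
    """Unweighted average recall: every class counts equally."""
    recalls = class_recalls(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


def war(y_true, y_pred):
    """Weighted average recall, i.e. conventional accuracy."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data: 90 majority-class and 10 minority-class instances,
# with a classifier that always predicts the majority class.
y_true = ["majority"] * 90 + ["minority"] * 10
y_pred = ["majority"] * 100
```

On this toy data `war(y_true, y_pred)` is 0.9, while `uar(y_true, y_pred)` is 0.5, which is the chance level for two classes: the trivial classifier looks strong under accuracy but not under UAR.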
Q11. How often do the participants occur in the corpus?
Since no participant other than the moderators occurs more than a few times (most of them only once), the following strategy was followed to reduce speaker dependence to a minimum.
Q12. What is the purpose of the SVC?
The SVC will serve to evaluate features and algorithms for the determination and localisation of speakers’ social signals in speech.
Q13. What is the official competition measure for the Diagnosis task?
For all four challenges, the official competition measure is UAAUC and UAR, respectively; these are given in boldface in Table 5.
Q14. How many instances of emotional speech from ten professional actors does the corpus contain?
It contains 1.2 k instances of emotional speech from ten professional actors (five […]
[Footnote 1: http://lium3.univ-lemans.fr/diarization/doku.php/welcome]
Q15. What task did the two participants have to solve?
The two participants had to identify objects (out of a predefined list) that increase the chances of survival in a polar environment.