The INTERSPEECH 2012 Speaker Trait Challenge
Frequently Asked Questions (12)
Q2. What is the parameter selection for the UA recall?
To allow for robust parameter selection, the parameters N and P yielding the best average UA recall across random seeds 1–30 on the development set are selected.
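This selection procedure can be sketched as a simple grid search averaged over seeds. The `evaluate` callback, the candidate grids, and the tuple return are illustrative assumptions standing in for the actual training of a forest of N trees with feature subspace size P and its development-set scoring:

```python
from statistics import mean

def select_parameters(evaluate, n_grid, p_grid, seeds=range(1, 31)):
    """Pick the (N, P) pair with the highest UA recall on the
    development set, averaged over the given random seeds.

    `evaluate(n, p, seed)` is a hypothetical stand-in that trains a
    classifier with parameters (n, p) under `seed` and returns its
    development-set UA recall."""
    return max(
        ((n, p) for n in n_grid for p in p_grid),
        key=lambda np_: mean(evaluate(np_[0], np_[1], s) for s in seeds),
    )
```

Averaging across 30 seeds smooths out the randomness of individual training runs before the best parameter pair is chosen.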
Q3. What is the way to perform the UA recall on the test set?
Of the tasks investigated, the recognition of conscientiousness (80.1 % UA recall on test using RF), extraversion (75.3 %) and intelligibility (68.6 %) can be performed most robustly.
Q4. How did the participants rate the stimuli?
To mitigate effects of fatigue or boredom, each of the 32 participants rated only three out of the six blocks in randomised order with a short break between each block.
Q5. What is the EWE for the training and development sets?
While the Challenge task is classification, the EWE is provided for the training and development sets, and participants are encouraged to present regression results in their contributions.
Q6. What was the purpose of the study?
The participants were instructed to rate the stimuli according to their likability, without taking into account sentence content or transmission quality.
Q7. What is the motivation to consider unweighted average recall rather than weighted average (WA) recall?
The motivation for considering unweighted average (UA) recall rather than weighted average (WA) recall (‘conventional’ accuracy, additionally given for reference) is that UA recall remains meaningful for highly unbalanced distributions of instances among classes, as occurred in former Challenges, and for more than two classes.
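The distinction can be illustrated with a minimal sketch (function names are ours, not from the Challenge baseline): UA recall averages the per-class recalls so each class counts equally, while WA recall reduces to conventional accuracy and is dominated by the majority class.

```python
def ua_recall(y_true, y_pred):
    # Unweighted average (UA) recall: mean of per-class recalls,
    # so every class contributes equally regardless of its frequency.
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        total = sum(1 for t in y_true if t == c)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)

def wa_recall(y_true, y_pred):
    # Weighted average (WA) recall equals conventional accuracy.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

On a skewed test set where a classifier labels almost everything with the majority class, WA recall can look high while UA recall exposes the poor minority-class performance.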
Q8. What is the difference between the DET and the Receiver Operating Characteristic?
In related disciplines of spoken language technology, evaluation often makes use of the Detection Error Trade-off (DET, False Negative Rate vs. False Positive Rate) curve, which is an alternative to the Receiver Operating Characteristic (ROC, True Positive Rate vs. False Positive Rate).
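Both curves are derived from the same error counts at each decision threshold; the DET curve simply plots the miss rate (FNR) where the ROC plots the hit rate (TPR). A minimal sketch, in which the score lists and the threshold sweep are illustrative assumptions:

```python
def roc_and_det_points(scores_pos, scores_neg, thresholds):
    """For each threshold, return the ROC point (FPR, TPR) and the
    corresponding DET point (FPR, FNR) for a score-based detector
    that accepts whenever the score reaches the threshold."""
    points = []
    for th in thresholds:
        tpr = sum(s >= th for s in scores_pos) / len(scores_pos)
        fpr = sum(s >= th for s in scores_neg) / len(scores_neg)
        points.append({"threshold": th,
                       "roc": (fpr, tpr),
                       "det": (fpr, 1.0 - tpr)})
    return points
```

Since FNR = 1 − TPR, any ROC point maps directly onto a DET point; DET plots (often drawn on normal-deviate axes) spread out the low-error region that the ROC compresses into a corner.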
Q9. How many participants were rated on the likability of the data?
Likability ratings of the data were established by presenting the stimuli to 32 participants (17 male, 15 female, aged 20–42; mean = 28.6 years, standard deviation = 5.4).
Q10. What was the likability of the speakers?
Recordings and evaluations in the corpus were made before and after CCRT: before CCRT (T0; 54 speakers), 10 weeks after CCRT (T1; 48 speakers), and 12 months after CCRT (T3; 39 speakers).
Q11. How was the EWE calculated and discretised?
In accordance with the Likability Sub-Challenge, the EWE was calculated and discretised into binary class labels (intelligible, non-intelligible), dividing at the median of the distribution.
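A minimal sketch of the median split (the label names follow the text; the assignment of scores exactly at the median is our assumption, and the evaluator-weighted estimator (EWE) scores themselves are taken as given):

```python
from statistics import median

def discretise_at_median(ewe_scores):
    """Binarise continuous EWE scores into two roughly balanced
    classes by splitting at the median of the distribution."""
    m = median(ewe_scores)
    return ["intelligible" if s > m else "non-intelligible"
            for s in ewe_scores]
```

Splitting at the median yields approximately balanced classes, which in turn keeps UA and WA recall close together for this sub-task.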
Q12. What is the way to evaluate the UA recall of the sub-Challenges?
The authors provide a baseline using a rather ‘brute force’ feature extraction and classification approach for the sake of consistency across the Sub-Challenges; in particular, for the Pathology Sub-Challenge, no information on the phonetic content is used or assessed in the baseline.