AVEC 2016: Depression, Mood, and Emotion Recognition Workshop and Challenge
References
Coefficient alpha and the internal structure of tests.
Scikit-learn: Machine learning in Python.
Intraclass correlations: Uses in assessing rater reliability.
The WEKA data mining software: An update.
Development of a rating scale for primary depressive illness.
Frequently Asked Questions (12)
Q2. How were the best model settings determined?
The best values of the complexity, window size, time delay, and standardisation method were obtained by maximising the performance, measured as CCC, on the development partition with a model learned on the training partition.
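For reference, a minimal sketch of the CCC metric used for this selection, assuming the standard concordance correlation formula (the function name and NumPy implementation are my own, not the authors' code):

```python
import numpy as np

def ccc(x, y):
    """Concordance Correlation Coefficient between predictions x and gold standard y."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    cov = ((x - x.mean()) * (y - y.mean())).mean()   # population covariance
    return 2.0 * cov / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)
```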
Q3. How is the model fitted?
In particular, the authors fit a linear support vector machine with stochastic gradient descent, i.e., the loss is computed one sample at a time and the model is sequentially updated.
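A minimal sketch of such a sequential fit, assuming scikit-learn's SGDRegressor with the epsilon-insensitive loss as the linear-SVR analogue (the estimator choice and the data are assumptions, not the authors' code):

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 88))     # hypothetical frame-level features
y = rng.normal(size=1000)           # hypothetical gold-standard ratings

# Epsilon-insensitive loss gives the linear-SVR analogue of an SVM;
# alpha is the regularisation strength.
model = SGDRegressor(loss="epsilon_insensitive", penalty="l2", alpha=1e-3)

# One sample at a time: compute the loss and update the model sequentially.
for xi, yi in zip(X, y):
    model.partial_fit(xi.reshape(1, -1), np.array([yi]))
```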
Q4. How is the fusion model built?
In order to keep the complexity low, and estimate the contribution of each modality in the fusion process, the authors build the fusion model by a simple linear regression of the predictions obtained on the development partition, using Weka 3.7 with default parameters [16].
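A sketch of the same fusion idea with scikit-learn instead of Weka (the tooling swap is an assumption; the modality names and data are placeholders):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
gold_dev = rng.normal(size=500)     # hypothetical dev gold standard

# Hypothetical per-modality predictions on the development partition,
# stacked column-wise (e.g. audio, video appearance, video geometric).
preds_dev = np.column_stack([gold_dev + rng.normal(scale=s, size=500)
                             for s in (0.5, 0.7, 0.9)])

# Fit the fusion weights on the dev predictions; the coefficients
# estimate each modality's contribution to the fused output.
fusion = LinearRegression().fit(preds_dev, gold_dev)
print(dict(zip(["audio", "appearance", "geometric"], fusion.coef_)))
```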
Q5. How were geometric features extracted from the facial landmarks?
In order to extract geometric features, the authors tracked 49 facial landmarks with the Supervised Descent Method (SDM) [42] and aligned them with a mean shape from stable points (located on the eye corners and on the nose region).
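A sketch of the alignment step, assuming a closed-form similarity (Procrustes-style) transform estimated from the stable points only and then applied to all 49 landmarks; the estimator details and the stable-point indexing are assumptions:

```python
import numpy as np

def similarity_align(shape, mean_shape, stable_idx):
    """Align a 49x2 landmark `shape` to `mean_shape` with a similarity
    transform estimated from the stable points only."""
    src, dst = shape[stable_idx], mean_shape[stable_idx]
    src_c, dst_c = src - src.mean(0), dst - dst.mean(0)
    U, S, Vt = np.linalg.svd(dst_c.T @ src_c)    # Kabsch/Umeyama estimate
    d = np.ones(2)
    if np.linalg.det(U @ Vt) < 0:                # guard against reflections
        d[-1] = -1.0
    R = U @ np.diag(d) @ Vt
    s = (S * d).sum() / (src_c ** 2).sum()       # isotropic scale
    t = dst.mean(0) - s * R @ src.mean(0)        # translation
    return s * shape @ R.T + t                   # apply to all 49 landmarks
```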
Q6. What are the functionals applied to pitch and loudness?
To pitch and loudness, the following functionals are additionally applied: percentiles 20, 50, and 80; the range of percentiles 20–80; and the mean and standard deviation of the slope of rising/falling signal parts.
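A sketch of these extra functionals over a pitch or loudness contour, assuming rising/falling parts are segmented by the sign of the first difference (the exact openSMILE implementation may differ):

```python
import numpy as np

def extra_functionals(x):
    """Percentile and slope functionals for a pitch or loudness contour x."""
    p20, p50, p80 = np.percentile(x, [20, 50, 80])
    d = np.diff(x)                         # frame-to-frame slope
    rising, falling = d[d > 0], d[d < 0]
    return {
        "pctl20": p20, "pctl50": p50, "pctl80": p80,
        "pctl_range_20_80": p80 - p20,
        "rising_slope_mean": rising.mean() if rising.size else 0.0,
        "rising_slope_std": rising.std() if rising.size else 0.0,
        "falling_slope_mean": falling.mean() if falling.size else 0.0,
        "falling_slope_std": falling.std() if falling.size else 0.0,
    }
```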
Q7. What are the best parameters for W and D?
Table 5 lists the best parameters for W and D, for each modality and emotional dimension, and shows that valence generally requires a longer window size (to extract features) and time delay (to compensate for reaction time) than arousal: W̄A = 5.3, W̄V = 9.3, D̄A = 1.2, D̄V = 1.8.
Q8. How are the reliability correlations for arousal and valence compared?
This technique has significantly improved the inter-rater reliability for both arousal and valence (p < 0.001 for CC); the Fisher Z-transform is used to perform statistical comparisons between CCs in this study.
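A sketch of that comparison, assuming the standard two-sided Fisher Z test for two independent correlation coefficients (the independence assumption and the sample values are mine):

```python
import numpy as np
from scipy.stats import norm

def compare_correlations(r1, n1, r2, n2):
    """Two-sided Fisher Z test for the difference between two correlations."""
    z1, z2 = np.arctanh(r1), np.arctanh(r2)      # Fisher Z-transform
    se = np.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    return z, 2.0 * norm.sf(abs(z))              # z statistic, p-value

z, p = compare_correlations(0.60, 200, 0.45, 200)  # hypothetical CCs
```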
Q9. What is the definition of a minimalistic acoustic standard parameter set?
Recommendations for the definition of a minimalistic acoustic standard parameter set have recently been investigated, leading to the Geneva Minimalistic Acoustic Parameter Set (GeMAPS) and to an extended version (eGeMAPS) [10], which is used here as the baseline feature set.
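For illustration, eGeMAPS features are typically extracted with the openSMILE toolkit; the audEERING Python wrapper shown below postdates AVEC 2016 and is offered only as a convenient sketch (the file path is a placeholder):

```python
import opensmile

# eGeMAPS functionals: one fixed-length feature vector per file/segment.
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)
features = smile.process_file("recording.wav")  # placeholder path
print(features.shape)                           # (1, 88) functionals
```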
Q10. How are the hyper-parameters determined for the two modalities?
For both modalities, the authors conducted a grid search over the following parameters: loss function ∈ {logarithmic, hinge}, regularisation ∈ {L1, L2}, and α ∈ {1e1, 1e0, ..., 1e-5}.
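A sketch of that search under the paper's fixed train/development split, assuming sklearn's SGDRegressor (whose regression losses differ from the hinge/logarithmic classification losses named above) and CCC as the selection criterion; the data are placeholders:

```python
import itertools
import numpy as np
from sklearn.linear_model import SGDRegressor

def ccc(x, y):
    """Concordance Correlation Coefficient."""
    cov = np.cov(x, y, bias=True)[0, 1]
    return 2 * cov / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)

rng = np.random.default_rng(0)                       # hypothetical data
X_tr, y_tr = rng.normal(size=(800, 88)), rng.normal(size=800)
X_dev, y_dev = rng.normal(size=(200, 88)), rng.normal(size=200)

best_score, best_params = -np.inf, None
for penalty, alpha in itertools.product(["l1", "l2"],
                                        10.0 ** np.arange(1, -6, -1)):
    model = SGDRegressor(loss="epsilon_insensitive", penalty=penalty,
                         alpha=alpha, random_state=0).fit(X_tr, y_tr)
    score = ccc(model.predict(X_dev), y_dev)         # maximise CCC on dev
    if score > best_score:
        best_score, best_params = score, (penalty, alpha)

print(best_params, best_score)
```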
Q11. What is the requirement to participate in the challenge?
To be eligible to participate in the challenge, every entry has to be accompanied by a paper presenting the results and the methods that created them, which will undergo peer review.
Q12. What is the purpose of the 2016 AVEC?
The 2016 Audio-Visual Emotion Challenge and Workshop (AVEC 2016) will be the sixth competition event aimed at comparison of multimedia processing and machine learning methods for automatic audio, video, and physiological analysis of emotion and depression, with all participants competing under strictly the same conditions.