Open Access Journal Article DOI

Being bored? Recognising natural interest by extensive audiovisual integration for real-life application

TLDR
A fully automatic processing combination of Active-Appearance-Model-based facial expression, vision-based eye-activity estimation, acoustic features, linguistic analysis, non-linguistic vocalisations, and temporal context information in an early feature fusion process is introduced.
About
This article was published in Image and Vision Computing on 2009-11-01 and is currently open access. It has received 193 citations to date. The article focuses on the topic of affective computing.


Citations
Journal ArticleDOI

The Geneva Minimalistic Acoustic Parameter Set (GeMAPS) for Voice Research and Affective Computing

TL;DR: A basic standard acoustic parameter set for various areas of automatic voice analysis, such as paralinguistic or clinical speech analysis, is proposed and intended to provide a common baseline for evaluation of future research and eliminate differences caused by varying parameter sets or even different implementations of the same parameters.
Journal ArticleDOI

Recognising realistic emotions and affect in speech: State of the art and lessons learnt from the first challenge

TL;DR: The basic phenomenon reflecting the last fifteen years is addressed, commenting on databases, modelling and annotation, the unit of analysis and prototypicality and automatic processing including discussions on features, classification, robustness, evaluation, and implementation and system integration.
Proceedings ArticleDOI

The INTERSPEECH 2010 Paralinguistic Challenge

TL;DR: The INTERSPEECH 2010 Paralinguistic Challenge shall help overcome the usually low compatibility of results by addressing three selected sub-challenges.
Proceedings ArticleDOI

OpenEAR — Introducing the Munich open-source emotion and affect recognition toolkit

TL;DR: A novel open-source affect and emotion recognition engine, which integrates all necessary components in one highly efficient software package, and which can be used for batch processing of databases.
Journal ArticleDOI

Speech emotion recognition: Emotional models, databases, features, preprocessing methods, supporting modalities, and classifiers

TL;DR: This work defines speech emotion recognition systems as a collection of methodologies that process and classify speech signals to detect the embedded emotions and identified and discussed distinct areas of SER.
References
Journal ArticleDOI

A tutorial on hidden Markov models and selected applications in speech recognition

TL;DR: In this paper, the authors provide an overview of the basic theory of hidden Markov models (HMMs) as originated by L.E. Baum and T. Petrie (1966) and give practical details on methods of implementation of the theory along with a description of selected applications of HMMs to distinct problems in speech recognition.
Book

Data Mining: Practical Machine Learning Tools and Techniques

TL;DR: This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining.
Proceedings ArticleDOI

Rapid object detection using a boosted cascade of simple features

TL;DR: A machine learning approach for visual object detection which is capable of processing images extremely rapidly and achieving high detection rates and the introduction of a new image representation called the "integral image" which allows the features used by the detector to be computed very quickly.
Book

Usability Engineering

Jakob Nielsen
TL;DR: This guide to the methods of usability engineering provides cost-effective methods that will help developers improve their user interfaces immediately and shows you how to avoid the four most frequently listed reasons for delay in software projects.
Book

Data Mining

Ian Witten
TL;DR: In this paper, generalized estimating equations (GEE) with computing using PROC GENMOD in SAS and multilevel analysis of clustered binary data using generalized linear mixed-effects models with PROC LOGISTIC are discussed.
Frequently Asked Questions (14)
Q1. What contributions have the authors mentioned in the paper "Being bored? Recognising natural interest by extensive audiovisual integration for real-life application"?

Herein the authors introduce a fully automatic processing combination of Active-Appearance-Model-based facial expression, vision-based eye-activity estimation, acoustic features, linguistic analysis, non-linguistic vocalisations, and temporal context information in an early feature fusion process. The authors provide detailed subject-independent results for classification and regression of the Level of Interest using Support Vector Machines on an audiovisual interest corpus (AVIC) consisting of spontaneous, conversational speech, demonstrating "theoretical" effectiveness of the approach. Further, to evaluate the approach with regard to real-life usability, a user study is conducted as proof of "practical" effectiveness.

Future work will have to deal with improved discrimination of the subtle difference between the border classes of strong interest and boredom. Automatically noticing such events, and performance with automatic modality selection, will be a future research issue. Also, in this respect, more instances of strongly expressed boredom should be recorded in future efforts to broaden the scope of use cases: in the face-to-face communication captured herein, these did not occur sufficiently often, potentially due to the subjects' minimum politeness. For many applications, detection of boredom or high-interest moments may be sufficient.

The first step of building an Active Appearance Model is the independent application of a Principal Component Analysis to the aligned and normalised shapes in S and the shape-free textures in T, thus generating a shape and a texture model. 
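As a rough illustration of this step, the sketch below fits a PCA model to row-stacked sample vectors, as would be done separately for the aligned shapes in S and the shape-free textures in T (a minimal sketch using NumPy; the `pca_model` helper and the toy data are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def pca_model(X, var_kept=0.95):
    """Fit a PCA model to row-stacked samples X (one shape or texture per row).

    Returns the mean vector and the principal modes retaining `var_kept`
    of the total variance, as used for AAM shape/texture models.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centred data gives the principal components directly
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2
    k = np.searchsorted(np.cumsum(var) / var.sum(), var_kept) + 1
    return mean, Vt[:k]

# Toy example: 10 "shapes" of 6 landmark coordinates each
S = np.random.rand(10, 6)
shape_mean, shape_modes = pca_model(S)
# A shape is then approximated as shape_mean + b @ shape_modes
b = (S[0] - shape_mean) @ shape_modes.T
reconstruction = shape_mean + b @ shape_modes
```

Applying the same routine to the texture rows in T would yield the texture model; the two are later combined in the usual AAM fashion.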

For spotting non-linguistic vocalisations in the first decoding pass, as described in the previous section, a recall rate of 55% and a precision rate of 46% are achieved with the best parameters.
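These figures follow the standard definitions of precision and recall for a spotting task (the helper and the counts below are hypothetical, chosen only to land near the quoted 46%/55% rates):

```python
def precision_recall(n_correct, n_detected, n_reference):
    """Precision = correctly spotted events / all detections;
    recall = correctly spotted events / all reference events."""
    precision = n_correct / n_detected
    recall = n_correct / n_reference
    return precision, recall

# Hypothetical counts: 55 correct spots, 120 detections, 100 reference events
p, r = precision_recall(n_correct=55, n_detected=120, n_reference=100)
# p ≈ 0.458 (≈46%), r = 0.55 (55%)
```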

Their face analysis system is capable of such pattern recognition tasks due to multiple evaluations of the influence of algorithmic parameters and their optimisation. 

Considering non-linguistic vocalisations is important for correct recognition of spontaneous speech, since they are an essential part of natural speech and also carry meaningful information [56, 57, 58].

Contextual interest information is integrated into the feature space by using the last estimate of the Level of Interest as a feature.
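This early-fusion use of temporal context can be sketched as follows (a minimal illustration; the `predict` callable, the toy predictor, and the LOI range clipping are assumptions for the sketch, not the authors' SVM):

```python
def with_last_loi(feature_vectors, predict, initial_loi=1.0):
    """Augment each feature vector with the previous LOI estimate,
    then predict the current LOI with the given `predict` callable."""
    estimates = []
    last = initial_loi
    for x in feature_vectors:
        x_ctx = x + [last]  # early fusion: the context joins the feature space
        last = predict(x_ctx)
        estimates.append(last)
    return estimates

# Toy predictor: mean of the augmented vector, clipped to the LOI range [0, 2]
toy_predict = lambda v: min(max(sum(v) / len(v), 0.0), 2.0)
lois = with_last_loi([[0.2, 0.4], [1.8, 1.6]], toy_predict)
```

Each prediction thus depends on the previous one, which is how the temporal context information enters the otherwise frame-wise feature space.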

The higher resolution of the regression approach (providing "in-between" LOI values such as 1.5) has the downside of yielding a slightly lower accuracy: if the authors discretise the regression output into the discrete classes {LOI0, LOI1, LOI2} and compare it with the discrete master LOI, an F1 measure of 69.1% is obtained for the optimal case of fusion of all information, instead of 76.0% for the directly discrete classification.
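Discretising a continuous regression output into the classes {LOI0, LOI1, LOI2} amounts to mapping each prediction to the nearest class label (an illustrative sketch; the exact mapping used by the authors is not specified here):

```python
def discretise_loi(y):
    """Map a continuous LOI prediction to the nearest of the
    discrete classes LOI0, LOI1, LOI2 (i.e. 0, 1, 2)."""
    return min(range(3), key=lambda c: abs(y - c))

# Toy regression outputs and their discretised classes
preds = [0.3, 1.6, 1.9, 0.7]
classes = [discretise_loi(y) for y in preds]  # → [0, 2, 2, 1]
```

The resulting class sequence can then be compared against the discrete master LOI labels with the usual F1 measure.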

This diffusion by word errors also leads to fewer observations of the same terms: already at a minimum term frequency of two within the database, the annotation-based level overtakes.

The subject was explicitly asked not to worry about being polite to the experimenter, e.g. by always showing a certain level of “polite” attention. 

These vocalisations are breathing, consent, coughing, hesitation, laughter, long pause, short pause, and other human noise (referred to as garbage in the following).

Note that the non-linguistic vocalisation coughing could not be detected automatically (cf. sec. 2.2.5) despite its high relevance, for two reasons: its occurrences are mostly shorter than 100 ms, which violates their HMM topology, and too few instances are contained for reliable training; IGR does not take overall occurrence into account but measures the predictive ability of, e.g., coughing when it appears.

The results presented show that early fusion of all information sources yields the maximum accuracy: a remarkable subject-independent F1-measure of 72.2% is achieved for unbalanced training.

Nine topics are used in a virtual product and company tour (Toyota Museum, Safety, Intelligent Transport System, Toyota Production System, Environment, Motor sports, Toyota History, Toyota Partner Robot, and Toyota Prius).