Open Access Journal Article DOI

Being bored? Recognising natural interest by extensive audiovisual integration for real-life application

TLDR
A fully automatic processing combination of Active-Appearance-Model-based facial expression, vision-based eye-activity estimation, acoustic features, linguistic analysis, non-linguistic vocalisations, and temporal context information in an early feature fusion process is introduced.
About
This article was published in Image and Vision Computing on 2009-11-01 and is currently open access. It has received 193 citations to date. The article focuses on the topic of affective computing.


Citations
Journal ArticleDOI

The Geneva Minimalistic Acoustic Parameter Set (GeMAPS) for Voice Research and Affective Computing

TL;DR: A basic standard acoustic parameter set for various areas of automatic voice analysis, such as paralinguistic or clinical speech analysis, is proposed and intended to provide a common baseline for evaluation of future research and eliminate differences caused by varying parameter sets or even different implementations of the same parameters.
Journal ArticleDOI

Recognising realistic emotions and affect in speech: State of the art and lessons learnt from the first challenge

TL;DR: The basic phenomenon reflecting the last fifteen years is addressed, commenting on databases, modelling and annotation, the unit of analysis and prototypicality and automatic processing including discussions on features, classification, robustness, evaluation, and implementation and system integration.
Proceedings ArticleDOI

The INTERSPEECH 2010 Paralinguistic Challenge

TL;DR: The INTERSPEECH 2010 Paralinguistic Challenge shall help overcome the usually low compatibility of results by addressing three selected sub-challenges.
Proceedings ArticleDOI

OpenEAR — Introducing the Munich open-source emotion and affect recognition toolkit

TL;DR: A novel open-source affect and emotion recognition engine, which integrates all necessary components in one highly efficient software package, and which can be used for batch processing of databases.
Journal ArticleDOI

Speech emotion recognition: Emotional models, databases, features, preprocessing methods, supporting modalities, and classifiers

TL;DR: This work defines speech emotion recognition systems as a collection of methodologies that process and classify speech signals to detect the embedded emotions and identified and discussed distinct areas of SER.
References
Journal ArticleDOI

A tutorial on hidden Markov models and selected applications in speech recognition

TL;DR: In this paper, the authors provide an overview of the basic theory of hidden Markov models (HMMs) as originated by L.E. Baum and T. Petrie (1966) and give practical details on methods of implementation of the theory along with a description of selected applications of HMMs to distinct problems in speech recognition.
Book

Data Mining: Practical Machine Learning Tools and Techniques

TL;DR: This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining.
Proceedings ArticleDOI

Rapid object detection using a boosted cascade of simple features

TL;DR: A machine learning approach for visual object detection which is capable of processing images extremely rapidly and achieving high detection rates and the introduction of a new image representation called the "integral image" which allows the features used by the detector to be computed very quickly.
Book

Usability Engineering

Jakob Nielsen
TL;DR: This guide to the methods of usability engineering provides cost-effective methods that will help developers improve their user interfaces immediately and shows you how to avoid the four most frequently listed reasons for delay in software projects.
Book

Data Mining

Ian Witten
TL;DR: In this paper, generalized estimating equations (GEE) with computing using PROC GENMOD in SAS and multilevel analysis of clustered binary data using generalized linear mixed-effects models with PROC LOGISTIC are discussed.
Frequently Asked Questions (14)
Q1. What contributions have the authors mentioned in the paper "Being bored? Recognising natural interest by extensive audiovisual integration for real-life application"?

Herein the authors introduce a fully automatic processing combination of Active-Appearance-Model-based facial expression, vision-based eye-activity estimation, acoustic features, linguistic analysis, non-linguistic vocalisations, and temporal context information in an early feature fusion process. The authors provide detailed subject-independent results for classification and regression of the Level of Interest using Support Vector Machines on an audiovisual interest corpus (AVIC) consisting of spontaneous, conversational speech, demonstrating "theoretical" effectiveness of the approach. Further, to evaluate the approach with regard to real-life usability, a user study is conducted as proof of "practical" effectiveness.

Future work will have to deal with improved discrimination of the subtle difference between the border classes of strong interest and boredom. Automatically noticing such events, and performance with automatic modality selection, will be a future research issue. Also, in this respect, more instances of strongly expressed boredom should be recorded in future efforts to broaden the scope of use cases: in the face-to-face communication captured herein, these did not occur sufficiently often, potentially due to the subjects' minimum politeness. For many applications, detection of boredom or high-interest moments may be sufficient.

The first step of building an Active Appearance Model is the independent application of a Principal Component Analysis to the aligned and normalised shapes in S and the shape-free textures in T, thus generating a shape and a texture model. 
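As a rough illustration of this step, the sketch below fits a PCA model to row-stacked sample vectors, as would be done separately for the aligned shapes in S and the shape-free textures in T (a minimal sketch using NumPy; the `pca_model` helper and the toy data are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def pca_model(X, var_kept=0.95):
    """Fit a PCA model to row-stacked samples X (one shape or texture per row).

    Returns the mean vector and the principal modes retaining `var_kept`
    of the total variance, as used for AAM shape/texture models.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centred data gives the principal components directly
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2
    k = np.searchsorted(np.cumsum(var) / var.sum(), var_kept) + 1
    return mean, Vt[:k]

# Toy example: 10 "shapes" of 6 landmark coordinates each
S = np.random.rand(10, 6)
shape_mean, shape_modes = pca_model(S)
# A shape is then approximated as shape_mean + b @ shape_modes
b = (S[0] - shape_mean) @ shape_modes.T
reconstruction = shape_mean + b @ shape_modes
```

Applying the same routine to the texture rows in T would yield the texture model; the two are later combined in the usual AAM fashion.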

For spotting non-linguistic vocalisations in the first decoding pass, as described in the previous section, a recall rate of 55% and a precision rate of 46% are achieved with the best parameters.
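These figures follow the standard definitions of precision and recall for a spotting task (the helper and the counts below are hypothetical, chosen only to land near the quoted 46%/55% rates):

```python
def precision_recall(n_correct, n_detected, n_reference):
    """Precision = correctly spotted events / all detections;
    recall = correctly spotted events / all reference events."""
    precision = n_correct / n_detected
    recall = n_correct / n_reference
    return precision, recall

# Hypothetical counts: 55 correct spots, 120 detections, 100 reference events
p, r = precision_recall(n_correct=55, n_detected=120, n_reference=100)
# p ≈ 0.458 (≈46%), r = 0.55 (55%)
```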

Their face analysis system is capable of such pattern recognition tasks due to multiple evaluations of the influence of algorithmic parameters and their optimisation. 

Considering non-linguistic vocalisations is important for correct recognition of spontaneous speech, since they are an essential part of natural speech and also carry meaningful information [56, 57, 58].

Contextual interest information is integrated into the feature space by using the last estimate of the Level of Interest as a feature.
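This early-fusion use of temporal context can be sketched as follows (a minimal illustration; the `predict` callable, the toy predictor, and the LOI range clipping are assumptions for the sketch, not the authors' SVM):

```python
def with_last_loi(feature_vectors, predict, initial_loi=1.0):
    """Augment each feature vector with the previous LOI estimate,
    then predict the current LOI with the given `predict` callable."""
    estimates = []
    last = initial_loi
    for x in feature_vectors:
        x_ctx = x + [last]  # early fusion: the context joins the feature space
        last = predict(x_ctx)
        estimates.append(last)
    return estimates

# Toy predictor: mean of the augmented vector, clipped to the LOI range [0, 2]
toy_predict = lambda v: min(max(sum(v) / len(v), 0.0), 2.0)
lois = with_last_loi([[0.2, 0.4], [1.8, 1.6]], toy_predict)
```

Each prediction thus depends on the previous one, which is how the temporal context information enters the otherwise frame-wise feature space.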

The higher resolution of the regression approach (providing "in-between" LOI values such as 1.5) has the downside of yielding a slightly lower accuracy: if the authors discretise the regression output into the discrete classes {LOI0, LOI1, LOI2} and compare it with the discrete master LOI, an F1 measure of 69.1% is obtained for the optimal case of fusion of all information, instead of 76.0% for the directly discrete classification.
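Discretising a continuous regression output into the classes {LOI0, LOI1, LOI2} amounts to mapping each prediction to the nearest class label (an illustrative sketch; the exact mapping used by the authors is not specified here):

```python
def discretise_loi(y):
    """Map a continuous LOI prediction to the nearest of the
    discrete classes LOI0, LOI1, LOI2 (i.e. 0, 1, 2)."""
    return min(range(3), key=lambda c: abs(y - c))

# Toy regression outputs and their discretised classes
preds = [0.3, 1.6, 1.9, 0.7]
classes = [discretise_loi(y) for y in preds]  # → [0, 2, 2, 1]
```

The resulting class sequence can then be compared against the discrete master LOI labels with the usual F1 measure.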

This diffusion by word errors also leads to fewer observations of the same terms: already at a minimum term frequency of two within the database, the annotation-based level overtakes.

The subject was explicitly asked not to worry about being polite to the experimenter, e.g. by always showing a certain level of “polite” attention. 

These vocalisations are breathing, consent, coughing, hesitation, laughter, long pause, short pause, and other human noise (referred to as garbage in the following).

Note that the non-linguistic vocalisation coughing could not be detected automatically (cf. sec. 2.2.5) despite its high relevance, for two reasons: its occurrences are mostly shorter than 100 ms, which violates their HMM topology, and too few instances are contained for reliable training; IGR does not take overall occurrence into account but measures the predictive ability of, e.g., coughing when it appears.

The results presented show that early fusion of all information sources yields the maximum accuracy: a remarkable subject-independent F1-measure of 72.2% is achieved for unbalanced training.

Nine topics are used in a virtual product and company tour (Toyota Museum, Safety, Intelligent Transport System, Toyota Production System, Environment, Motor sports, Toyota History, Toyota Partner Robot, and Toyota Prius).