A computational framework for affective video content representation and modeling is proposed, based on the dimensional approach to affect known from the field of psychophysiology, in which affect is characterized by the dimensions of arousal (intensity of affect) and valence (type of affect).
Abstract:
This paper looks into a new direction in video content analysis - the representation and modeling of affective video content. The affective content of a given video clip can be defined as the intensity and type of feeling or emotion (both are referred to as affect) that are expected to arise in the user while watching that clip. The availability of methodologies for automatically extracting this type of video content will extend the current scope of possibilities for video indexing and retrieval. For instance, we will be able to search for the funniest or the most thrilling parts of a movie, or the most exciting events of a sport program. Furthermore, as the user may want to select a movie not only based on its genre, cast, director and story content, but also on its prevailing mood, affective content analysis is also likely to improve the quality of personalized video delivery. We propose in this paper a computational framework for affective video content representation and modeling. This framework is based on the dimensional approach to affect that is known from the field of psychophysiology. According to this approach, the affective video content can be represented as a set of points in the two-dimensional (2-D) emotion space that is characterized by the dimensions of arousal (intensity of affect) and valence (type of affect). We map the affective video content onto the 2-D emotion space by using models that link the arousal and valence dimensions to low-level features extracted from video data. This results in the arousal and valence time curves that, either considered separately or combined into the so-called affect curve, are introduced as reliable representations of expected transitions from one feeling to another along a video, as perceived by a viewer.
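As a minimal sketch of the representation described above: once per-frame arousal and valence values are available, the affect curve is simply their joint trajectory in the 2-D emotion space. The arrays below are illustrative placeholders, not values derived from real low-level video features:

```python
import numpy as np

# Illustrative per-frame arousal and valence time curves (placeholder values;
# in the paper these would be derived from low-level audiovisual features).
arousal = np.array([0.1, 0.3, 0.7, 0.9, 0.6])
valence = np.array([0.0, 0.2, 0.5, 0.4, -0.1])

# The affect curve is the trajectory (valence(k), arousal(k)) traced out
# in the 2-D valence-arousal emotion space as the video progresses.
affect_curve = np.stack([valence, arousal], axis=1)

print(affect_curve.shape)  # (5, 2)
```

Each row is one point in the emotion space; plotting the rows in order visualizes the expected transitions from one feeling to another along the video.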
TL;DR: A multimodal data set for the analysis of human affective states is presented, along with a novel method for stimuli selection using retrieval by affective tags from the last.fm website, video highlight detection, and an online assessment tool.
TL;DR: This survey reviews 100+ recent articles on content-based multimedia information retrieval and discusses their role in current research directions which include browsing and search paradigms, user studies, affective computing, learning, semantic queries, new features and media types, high performance indexing, and evaluation techniques.
TL;DR: This paper reviews the major approaches to multimodal human-computer interaction, giving an overview of the field from a computer vision perspective, and focuses on body, gesture, gaze, and affective interaction.
TL;DR: This work investigates and develops methods to extract and combine low-level features that represent the emotional content of an image, and uses these for image emotion classification.
TL;DR: In this article, the authors deal with the nature and theory of meaning and present a new, objective method for its measurement which they call the semantic differential, which can be adapted to a wide variety of problems in such areas as clinical psychology, social psychology, linguistics, mass communications, esthetics, and political science.
TL;DR: Reports of affective experience obtained using SAM are compared to the Semantic Differential scale devised by Mehrabian and Russell (An approach to environmental psychology, 1974), which requires 18 different ratings.
TL;DR: Key issues in affective computing, "computing that relates to, arises from, or influences emotions," are discussed, and new applications are presented for computer-assisted learning, perceptual information retrieval, arts and entertainment, and human health and interaction.
TL;DR: It is found that seeding a Fisher Projection with the results of sequential floating forward search improves the performance of the Fisher Projection and provides the highest recognition rates reported to date for classification of affect from physiology: 81 percent recognition accuracy on eight classes of emotion, including neutral.
Q1. What have the authors contributed in "Affective video content representation and modeling" ?
This paper looks into a new direction in video content analysis – the representation and modeling of affective video content. The authors propose in this paper a computational framework for affective video content representation and modeling. This framework is based on the dimensional approach to affect that is known from the field of psychophysiology. The authors map the affective video content onto the 2-D emotion space by using models that link the arousal and valence dimensions to low-level features extracted from video data. This results in the arousal and valence time curves that, either considered separately or combined into the so-called affect curve, are introduced as reliable representations of expected transitions from one feeling to another along a video, as perceived by a viewer. Furthermore, as the user may want to select a movie not only based on its genre, cast, director and story content, but also on its prevailing mood, affective content analysis is also likely to improve the quality of personalized video delivery.
Q2. Why does the user’s excitement rise after the second shot change?
Finally, when the course of the game becomes more dynamic around frame 850, the excitement of the user will start to rise, though with a certain delay – again due to the inertia of human affective states.
Q3. How can the authors measure affective responses to stimuli?
Subjects’ affective responses to these stimuli can be quantified either by evaluating their own reports, e.g., by using the Self-Assessment Manikin (SAM) [18], or by measuring physiological functions that are considered related to particular affect dimensions.
Q4. What is the function that is used to represent the arousal curve?
Scaling the convolution result back to the original value range yields the function that the authors adopt as the rhythm component of their arousal model (1); this function, given by (5), is illustrated in Fig. 8(b).
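The smooth-and-rescale step described above can be sketched as follows. This is not the authors' exact formulation (the paper specifies particular windows and model components); it only illustrates convolving a feature curve with a smoothing window and mapping the result back to the input's value range. The Kaiser window parameters and the step-wise input are assumptions for the example:

```python
import numpy as np

def smooth_and_rescale(signal, window):
    """Convolve a feature curve with a smoothing window, then map the
    smoothed result back to the value range of the original signal."""
    smoothed = np.convolve(signal, window / window.sum(), mode="same")
    lo, hi = signal.min(), signal.max()
    s_lo, s_hi = smoothed.min(), smoothed.max()
    if s_hi == s_lo:                      # flat input: nothing to rescale
        return np.full_like(smoothed, lo)
    return lo + (smoothed - s_lo) * (hi - lo) / (s_hi - s_lo)

# Example: a step-wise curve smoothed with a Kaiser window (window choice
# is an assumption for this sketch, not necessarily the paper's).
rate = np.concatenate([np.full(50, 0.2), np.full(50, 0.8)])
rhythm = smooth_and_rescale(rate, np.kaiser(21, beta=5.0))
```

The rescaling keeps the smoothed curve comparable to the original feature values while removing the abrupt transitions.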
Q5. What is the minimum expected capabilities of video storage systems?
The minimum expected capabilities of such systems will definitely evolve beyond the pure automation of retrieval processes: an average user will require more and more from his electronic infrastructure at home.
Q6. What is the common way to filter video content?
The systems currently available for personalized video delivery usually filter the programs on the basis of information like, in the case of a movie, the genre, cast, director and story (script) content.
Q7. What is the effect of a change in the rate of shot changes?
Whenever there is a goal, or an important break (e.g., due to foul play, free kick, etc.), the director immediately increases the rate of shot changes, trying to show everything that is happening on the field and among the spectators at that moment.
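A simple way to turn this directing behavior into a measurable signal is to compute the local density of shot changes over a sliding window. The function below is a simplified, hypothetical stand-in for the cut-based arousal feature discussed in the paper; the cut positions and window length are illustrative:

```python
import numpy as np

def shot_change_rate(cut_frames, n_frames, window=100):
    """Local shot-change rate: the number of cuts inside a sliding window
    centered on each frame, normalized by the window length."""
    cuts = np.zeros(n_frames)
    cuts[np.asarray(cut_frames)] = 1.0
    return np.convolve(cuts, np.ones(window), mode="same") / window

# Illustrative cut positions: sparse cuts early on, then a burst of rapid
# cuts (as around a goal or an important break).
rate = shot_change_rate([100, 120, 140, 800, 820, 830, 840, 850], 1000)
```

Segments with rapid cutting produce a clearly higher local rate than quiet stretches of the game, which is the behavior an arousal feature should capture.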
Q8. What is the effect of the director switching between cameras?
The director switches from one to another camera (e.g., by zooming onto a particular event, the bench or the spectators) only occasionally, which results in rather long shots.
Q9. What is the effect of the affective content on the user’s perception of the movie?
The perception of the affective content interferes with the perception of the cognitive content and influences a user’s reactions to the cognitive content, such as liking or not-liking, enjoyment and memory.
Q10. Why is the pitch average not suitable for valence?
In view of the smoothness criterion, the pitch-average function (11) is not directly suitable to serve as a valence component time curve due to its step-wise nature.
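To make the step-wise nature of such a per-segment average concrete: the sketch below builds a step-wise curve from hypothetical per-segment pitch averages and smooths it with a simple moving average, so that the abrupt jumps between segments become gradual transitions. The segment boundaries, pitch values, and window length are all illustrative assumptions, not the paper's:

```python
import numpy as np

# Hypothetical per-segment pitch averages (Hz): one constant value per
# speech segment, giving a step-wise function of the frame index.
segment_bounds = [0, 40, 70, 120]        # frame indices delimiting segments
segment_pitch = [180.0, 240.0, 200.0]    # average pitch within each segment

stepwise = np.concatenate([
    np.full(end - start, pitch)
    for (start, end), pitch in zip(
        zip(segment_bounds, segment_bounds[1:]), segment_pitch
    )
])

# A moving-average window replaces the abrupt jumps between segments with
# gradual transitions, better matching the smoothness criterion.
win = 15
smoothed = np.convolve(stepwise, np.ones(win) / win, mode="same")
```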