Journal ArticleDOI

Video Affective Content Analysis: A Survey of State-of-the-Art Methods

01 Oct 2015-IEEE Transactions on Affective Computing (IEEE)-Vol. 6, Iss: 4, pp 410-430
TL;DR: A general framework for video affective content analysis is proposed, which includes video content, emotional descriptors, and users' spontaneous nonverbal responses, as well as the relationships between the three.
Abstract: Video affective content analysis has been an active research area in recent decades, since emotion is an important component in the classification and retrieval of videos. Video affective content analysis can be divided into two approaches: direct and implicit. Direct approaches infer the affective content of videos directly from related audiovisual features. Implicit approaches, on the other hand, detect affective content from videos based on an automatic analysis of a user’s spontaneous response while consuming the videos. This paper first proposes a general framework for video affective content analysis, which includes video content, emotional descriptors, and users’ spontaneous nonverbal responses, as well as the relationships between the three. Then, we survey current research in both direct and implicit video affective content analysis, with a focus on direct video affective content analysis. Lastly, we identify several challenges in this field and put forward recommendations for future research.
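
At its simplest, the "direct" approach the abstract describes reduces to supervised regression from audiovisual features to an emotion dimension. The sketch below is a minimal illustration of that idea, not a method from the survey; the feature set, annotations, and model choice (ridge regression over synthetic per-clip features) are all assumptions made for the example.

    # Minimal sketch of the "direct" approach: map per-clip audiovisual
    # features to a continuous arousal score with a supervised regressor.
    # All data here are synthetic stand-ins for real extracted features.
    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n_clips = 200
    # Hypothetical per-clip features: audio energy, tempo, motion intensity,
    # shot-change rate, brightness (real systems use far richer descriptors).
    X = rng.normal(size=(n_clips, 5))
    # Hypothetical continuous arousal annotations in [-1, 1].
    w_true = np.array([0.8, 0.5, 0.9, 0.6, 0.2])
    arousal = np.tanh(X @ w_true + 0.1 * rng.normal(size=n_clips))

    X_tr, X_te, y_tr, y_te = train_test_split(X, arousal, random_state=0)
    model = Ridge(alpha=1.0).fit(X_tr, y_tr)
    print("held-out R^2:", round(model.score(X_te, y_te), 3))

Implicit approaches replace the feature matrix above with measurements of the viewer (facial activity, physiological signals) rather than of the video itself; the learning machinery stays the same.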
Citations
Journal ArticleDOI
TL;DR: Experimental results cumulatively confirm that personality differences are better revealed while comparing user responses to emotionally homogeneous videos, and above-chance recognition is achieved for both affective and personality dimensions.
Abstract: We present ASCERTAIN—a multimodal databASe for impliCit pERsonaliTy and Affect recognitIoN using commercial physiological sensors. To our knowledge, ASCERTAIN is the first database to connect personality traits and emotional states via physiological responses. ASCERTAIN contains big-five personality scales and emotional self-ratings of 58 users along with their Electroencephalogram (EEG), Electrocardiogram (ECG), Galvanic Skin Response (GSR) and facial activity data, recorded using off-the-shelf sensors while viewing affective movie clips. We first examine relationships between users’ affective ratings and personality scales in the context of prior observations, and then study linear and non-linear physiological correlates of emotion and personality. Our analysis suggests that the emotion-personality relationship is better captured by non-linear rather than linear statistics. We finally attempt binary emotion and personality trait recognition using physiological features. Experimental results cumulatively confirm that personality differences are better revealed while comparing user responses to emotionally homogeneous videos, and above-chance recognition is achieved for both affective and personality dimensions.
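
The abstract's claim that non-linear statistics capture the emotion-physiology relationship better can be illustrated with a toy comparison of a linear (Pearson) and a rank-based (Spearman) correlate. This is a hedged sketch, not ASCERTAIN's actual analysis: the GSR-like feature, its exponential link to arousal, and the sample size are invented for illustration.

    # Toy contrast of linear vs. non-linear emotion-physiology statistics.
    import numpy as np
    from scipy.stats import pearsonr, spearmanr

    rng = np.random.default_rng(1)
    arousal = rng.uniform(1, 7, size=58)                   # hypothetical self-ratings
    gsr = np.exp(arousal) + rng.normal(scale=50, size=58)  # non-linear link

    r_lin, _ = pearsonr(gsr, arousal)    # linear association
    rho, _ = spearmanr(gsr, arousal)     # monotone (rank) association
    print(f"Pearson r = {r_lin:.2f}, Spearman rho = {rho:.2f}")

With a monotone but non-linear link, the rank statistic recovers the dependence more fully than the linear one, which is the pattern the authors report at a much larger scale.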

329 citations


Cites background from "Video Affective Content Analysis: A..."

  • ...Finally, the importance of using less-intrusive sensors in affective studies has been widely acknowledged [23], [25]....


  • ...user behavior via the use of physiological sensors (see [23] for a review)....


  • ...While most available affective datasets have been compiled using lab equipment [23], ASCERTAIN represents one of the first initiatives to exclusively employ wearable sensors for data collection, which not only enhances its ecological validity, but also repeatability and suitability for large-scale user profiling....


  • ...Also, Wang and Ji [23] advocate the need for less-intrusive sensors to elicit natural emotional behavior...


01 Jan 2004
LTS3 reference LTS-ARTICLE-2004-019 (record created 2006-06-14, modified 2016-08-08).

202 citations

Journal ArticleDOI
TL;DR: This paper proposes to predict the continuous probability distribution of image emotions which are represented in dimensional valence-arousal space and carries out large-scale statistical analysis on the constructed Image-Emotion-Social-Net dataset, observing that the emotion distribution can be well-modeled by a Gaussian mixture model.
Abstract: Previous works on image emotion analysis mainly focused on predicting the dominant emotion category or the average dimension values of an image for affective image classification and regression. However, this is often insufficient in various real-world applications, as the emotions that an image evokes in viewers are highly subjective and differ from viewer to viewer. In this paper, we propose to predict the continuous probability distribution of image emotions, represented in the dimensional valence-arousal space. We carried out a large-scale statistical analysis on the constructed Image-Emotion-Social-Net dataset, on which we observed that the emotion distribution can be well modeled by a Gaussian mixture model. This model is estimated by an expectation-maximization algorithm with specified initializations. Then, we extract commonly used emotion features at different levels for each image. Finally, we formalize the emotion distribution prediction task as a shared sparse regression (SSR) problem and extend it to multitask settings, named multitask shared sparse regression (MTSSR), to explore the latent information between different prediction tasks. SSR and MTSSR are optimized by iteratively reweighted least squares. Experiments are conducted on the Image-Emotion-Social-Net dataset with comparisons to three alternative baselines. The quantitative results demonstrate the superiority of the proposed method.
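
The modeling step in the abstract, a Gaussian mixture over per-image valence-arousal ratings fitted by EM, can be sketched directly with scikit-learn, whose GaussianMixture runs EM internally. The two rater groups below are synthetic placeholders, not data from Image-Emotion-Social-Net.

    # Sketch: fit a 2-component GMM to (valence, arousal) ratings of one image.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(2)
    ratings = np.vstack([
        rng.normal([0.6, -0.2], 0.15, size=(80, 2)),   # calm-positive majority
        rng.normal([-0.4, 0.7], 0.20, size=(30, 2)),   # aroused-negative minority
    ])

    gmm = GaussianMixture(n_components=2, random_state=0).fit(ratings)  # EM fit
    print("weights:", gmm.weights_.round(2))
    print("means:\n", gmm.means_.round(2))

The fitted weights and means summarize the distribution that the paper's regression stage (SSR/MTSSR) then learns to predict from image features.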

186 citations


Cites background from "Video Affective Content Analysis: A..."

  • ...Note that affective content analysis has also been widely studied based on other types of input data, such as text [38], speech [39], [40], music [41]–[44] and videos [45]–[49]....


Journal ArticleDOI
TL;DR: Rolling multi-task hypergraph learning (RMTHG) is presented to consistently combine these factors, together with a learning algorithm designed for automatic optimization, in order to predict the personalized emotion perceptions of images for each individual viewer.
Abstract: Images can convey rich semantics and induce various emotions in viewers. Most existing works on affective image analysis focused on predicting the dominant emotions for the majority of viewers. However, such a dominant emotion is often insufficient in real-world applications, as the emotions induced by an image are highly subjective and differ across viewers. In this paper, we propose to predict the personalized emotion perceptions of images for each individual viewer. Different types of factors that may affect personalized image emotion perceptions, including visual content, social context, temporal evolution, and location influence, are jointly investigated. Rolling multi-task hypergraph learning (RMTHG) is presented to consistently combine these factors, and a learning algorithm is designed for automatic optimization. For evaluation, we set up a large-scale image emotion dataset from Flickr, named Image-Emotion-Social-Net, on both dimensional and categorical emotion representations with over 1 million images and about 8,000 users. Experiments conducted on this dataset demonstrate that the proposed method can achieve significant performance gains on personalized emotion classification, as compared to several state-of-the-art approaches.
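
The hypergraph at the core of RMTHG can be pictured as an incidence matrix in which each hyperedge ties together all images sharing one factor (a visual cluster, a social group, a location, and so on). The sketch below builds a standard normalized hypergraph Laplacian from such an incidence matrix; this is a generic construction in the style of Zhou et al.'s hypergraph learning, with the paper's rolling multi-task scheme omitted, and the tiny matrix is invented for the example.

    # Sketch: normalized hypergraph Laplacian from an incidence matrix H.
    import numpy as np

    # Hypothetical incidence matrix: 6 images x 3 hyperedges (factors).
    H = np.array([[1, 0, 1],
                  [1, 1, 0],
                  [0, 1, 0],
                  [1, 0, 1],
                  [0, 1, 1],
                  [0, 0, 1]], dtype=float)
    w = np.ones(3)                            # hyperedge weights
    dv = H @ w                                # vertex degrees
    De_inv = np.diag(1.0 / H.sum(axis=0))     # inverse hyperedge degrees
    Dv_is = np.diag(1.0 / np.sqrt(dv))        # Dv^(-1/2)
    Theta = Dv_is @ H @ np.diag(w) @ De_inv @ H.T @ Dv_is
    L = np.eye(H.shape[0]) - Theta            # normalized hypergraph Laplacian
    print(L.round(2))

Learning on the hypergraph then amounts to minimizing a label-smoothness term defined by L, so images joined by many shared factors receive similar emotion predictions.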

142 citations


Cites background from "Video Affective Content Analysis: A..."

  • ...Note that affective content analysis has also been widely studied based on other types of input data, such as text [5], speech [36], [37], music [38], [39], [40], [41] and videos [42], [43], [44], [45], [46]....


Journal ArticleDOI
TL;DR: This work defines four dimensions, namely Pattern & Knowledge Discovery, Information Fusion & Integration, Scalability, and Visualization, which are used to derive a set of new metrics (termed degrees) for evaluating the various software tools and frameworks for social network analysis (SNA).

134 citations


Cites background from "Video Affective Content Analysis: A..."

  • ...One example is the work published in [372] where authors apply sentiment-analysis to the video once it had been transcribed....


References
Journal ArticleDOI

12,519 citations


"Video Affective Content Analysis: A..." refers background in this paper

  • ...Based on the expected mood, i.e., the emotions a film-maker intends to communicate to a particular audience with a common cultural background, Hanjalic and Xu [2] successfully related audiovisual features with the emotional dimension of the audience....


  • ...Figure 1 summarizes the major components of the two approaches for video affective content analysis....


Book
01 Jan 2009
TL;DR: The motivations and principles regarding learning algorithms for deep architectures are discussed, in particular those exploiting as building blocks the unsupervised learning of single-layer models such as Restricted Boltzmann Machines, which are used to construct deeper models such as Deep Belief Networks.
Abstract: Can machine learning deliver AI? Theoretical results, inspiration from the brain and cognition, as well as machine learning experiments suggest that in order to learn the kind of complicated functions that can represent high-level abstractions (e.g. in vision, language, and other AI-level tasks), one would need deep architectures. Deep architectures are composed of multiple levels of non-linear operations, such as in neural nets with many hidden layers, graphical models with many levels of latent variables, or in complicated propositional formulae re-using many sub-formulae. Each level of the architecture represents features at a different level of abstraction, defined as a composition of lower-level features. Searching the parameter space of deep architectures is a difficult task, but new algorithms have been discovered and a new sub-area has emerged in the machine learning community since 2006, following these discoveries. Learning algorithms such as those for Deep Belief Networks and other related unsupervised learning algorithms have recently been proposed to train deep architectures, yielding exciting results and beating the state-of-the-art in certain areas. Learning Deep Architectures for AI discusses the motivations for and principles of learning algorithms for deep architectures. By analyzing and comparing recent results with different learning algorithms for deep architectures, explanations for their success are proposed and discussed, highlighting challenges and suggesting avenues for future explorations in this area.
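
Since the text singles out Restricted Boltzmann Machines as the building block of Deep Belief Networks, a compact numpy sketch of one RBM trained with single-step contrastive divergence (CD-1) may help. It is a toy on random binary data, with sizes, learning rate, and epoch count chosen arbitrarily; a real DBN would stack several such layers and train them greedily.

    # Toy RBM trained with CD-1 on random binary data.
    import numpy as np

    rng = np.random.default_rng(3)
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

    n_vis, n_hid, lr = 6, 4, 0.1
    W = 0.01 * rng.normal(size=(n_vis, n_hid))
    b_v, b_h = np.zeros(n_vis), np.zeros(n_hid)
    data = (rng.random((100, n_vis)) > 0.5).astype(float)

    for epoch in range(50):
        for v0 in data:
            p_h0 = sigmoid(v0 @ W + b_h)                   # positive phase
            h0 = (rng.random(n_hid) < p_h0).astype(float)  # sample hiddens
            p_v1 = sigmoid(h0 @ W.T + b_v)                 # one Gibbs step back
            p_h1 = sigmoid(p_v1 @ W + b_h)
            W += lr * (np.outer(v0, p_h0) - np.outer(p_v1, p_h1))  # CD-1 update
            b_v += lr * (v0 - p_v1)
            b_h += lr * (p_h0 - p_h1)

    print("learned weight range:", W.min().round(2), W.max().round(2))

Greedy layer-wise training then treats the hidden activations of one RBM as the visible data of the next, which is the DBN construction the abstract refers to.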

7,767 citations

Book
01 Jan 1997
TL;DR: Key issues in affective computing, " computing that relates to, arises from, or influences emotions", are presented and new applications are presented for computer-assisted learning, perceptual information retrieval, arts and entertainment, and human health and interaction.
Abstract: Computers are beginning to acquire the ability to express and recognize affect, and may soon be given the ability to "have emotions." The essential role of emotion in both human cognition and perception, as demonstrated by recent neurological studies, indicates that affective computers should not only provide better performance in assisting humans, but also might enhance computers' abilities to make decisions. This paper presents and discusses key issues in "affective computing," computing that relates to, arises from, or influences emotions. Models are suggested for computer recognition of human emotion, and new applications are presented for computer-assisted learning, perceptual information retrieval, arts and entertainment, and human health and interaction. Affective computing, coupled with new wearable computers, will also provide the ability to gather new data necessary for advances in emotion and cognition theory. Nothing in life is to be feared. It is only to be understood. – Marie Curie. Emotions have a stigma in science; they are believed to be inherently non-scientific. Scientific principles are derived from rational thought, logical arguments, testable hypotheses, and repeatable experiments. There is room alongside science for "non-interfering" emotions such as those involved in curiosity, frustration, and the pleasure of discovery. In fact, much scientific research has been prompted by fear. Nonetheless, the role of emotions is marginalized at best. Why bring "emotion" or "affect" into any of the deliberate tools of science? Moreover, shouldn't it be completely avoided when considering properties to design into computers? After all, computers control significant parts of our lives – the phone system, the stock market, nuclear power plants, jet landings, and more. Who wants a computer to be able to "feel angry" at them? To feel contempt for any living thing? In this essay I will submit for discussion a set of ideas on what I call "affective computing," computing that relates to, arises from, or influences emotions. This will need some further clarification which I shall attempt below. I should say up front that I am not proposing the pursuit of computerized cingulotomies (the making of small wounds in the ridge of the limbic system known as the cingulate gyrus, a surgical procedure to aid severely depressed patients), or even the business of building "emotional computers". Nor will I propose answers to the difficult and intriguing questions, "…

5,700 citations

Journal ArticleDOI
TL;DR: In this article, the authors define emotion as a phenomenon to be studied, without consensual conceptualization and operationalization of exactly what phenomenon is to be investigated. But progress in theory and research is difficult to a...
Abstract: Defining “emotion” is a notorious problem. Without consensual conceptualization and operationalization of exactly what phenomenon is to be studied, progress in theory and research is difficult to a...

3,247 citations

Journal ArticleDOI
TL;DR: A multimodal data set for the analysis of human affective states is presented, and a novel method for stimuli selection is proposed using retrieval by affective tags from the last.fm website, video highlight detection, and an online assessment tool.
Abstract: We present a multimodal data set for the analysis of human affective states. The electroencephalogram (EEG) and peripheral physiological signals of 32 participants were recorded as each watched 40 one-minute long excerpts of music videos. Participants rated each video in terms of the levels of arousal, valence, like/dislike, dominance, and familiarity. For 22 of the 32 participants, frontal face video was also recorded. A novel method for stimuli selection is proposed using retrieval by affective tags from the last.fm website, video highlight detection, and an online assessment tool. An extensive analysis of the participants' ratings during the experiment is presented. Correlates between the EEG signal frequencies and the participants' ratings are investigated. Methods and results are presented for single-trial classification of arousal, valence, and like/dislike ratings using the modalities of EEG, peripheral physiological signals, and multimedia content analysis. Finally, decision fusion of the classification results from different modalities is performed. The data set is made publicly available and we encourage other researchers to use it for testing their own affective state estimation methods.
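
A stripped-down version of the single-trial pipeline the abstract outlines (band-power EEG features, a per-modality classifier, then decision fusion) can be sketched as below. Everything here is a placeholder: the signals are random noise, the second modality's scores are simulated, the equal fusion weights are arbitrary, and scoring on the training trials is for illustration only; the DEAP experiments use proper single-trial evaluation protocols.

    # Sketch: EEG band-power features + SVM + simple decision fusion.
    import numpy as np
    from scipy.signal import welch
    from sklearn.svm import SVC

    rng = np.random.default_rng(4)
    fs, n_trials = 128, 40
    eeg = rng.normal(size=(n_trials, fs * 60))     # 40 one-minute "trials"
    labels = rng.integers(0, 2, size=n_trials)     # high/low arousal labels

    def band_power(x, lo, hi):
        f, pxx = welch(x, fs=fs, nperseg=fs * 2)   # Welch power spectrum
        return pxx[(f >= lo) & (f < hi)].mean()

    bands = [(4, 8), (8, 13), (13, 30)]            # theta, alpha, beta
    X = np.array([[band_power(t, lo, hi) for lo, hi in bands] for t in eeg])

    p_eeg = SVC(probability=True).fit(X, labels).predict_proba(X)[:, 1]
    p_peripheral = rng.uniform(size=n_trials)      # stand-in second modality
    fused = (0.5 * p_eeg + 0.5 * p_peripheral) > 0.5   # late (decision) fusion
    print("fused high-arousal predictions:", int(fused.sum()))

Averaging per-modality posteriors like this is one simple realization of the decision-fusion step the abstract mentions.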

3,013 citations


"Video Affective Content Analysis: A..." refers background in this paper

  • ...In 2001, Nack et al. [1] defined the concept of Computational Media Aesthetics (CMA) as the algorithmic study to analyze and interpret how the visual and aural elements in media evoke audiences’ emotional responses based on the film grammar....
