IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 7, NO. 1, FEBRUARY 2005
Affective Video Content
Representation and Modeling
Alan Hanjalic, Member, IEEE, and Li-Qun Xu, Member, IEEE
Abstract—This paper looks into a new direction in video content
analysis: the representation and modeling of affective video con-
tent. The affective content of a given video clip can be defined as
the intensity and type of feeling or emotion (both are referred to
as affect) that are expected to arise in the user while watching that
clip. The availability of methodologies for automatically extracting
this type of video content will extend the current scope of possibil-
ities for video indexing and retrieval. For instance, we will be able
to search for the funniest or the most thrilling parts of a movie,
or the most exciting events of a sport program. Furthermore, as
the user may want to select a movie not only based on its genre,
cast, director and story content, but also on its prevailing mood, the
affective content analysis is also likely to contribute to enhancing
the quality of personalizing the video delivery to the user. We pro-
pose in this paper a computational framework for affective video
content representation and modeling. This framework is based on
the dimensional approach to affect that is known from the field of
psychophysiology. According to this approach, the affective video
content can be represented as a set of points in the two-dimen-
sional (2-D) emotion space that is characterized by the dimensions
of arousal (intensity of affect) and valence (type of affect). We map
the affective video content onto the 2-D emotion space by using the
models that link the arousal and valence dimensions to low-level
features extracted from video data. This results in the arousal and
valence time curves that, either considered separately or combined
into the so-called affect curve, are introduced as reliable represen-
tations of expected transitions from one feeling to another along a
video, as perceived by a viewer.
Index Terms—Affective video content analysis, video abstrac-
tion, video content modeling, video content representation, video
highlights extraction.
I. INTRODUCTION
DIGITAL VIDEO collections are growing rapidly in both
the professional and consumer environment, and are char-
acterized by a steadily increasing capacity and content variety.
Since searching manually through these collections is tedious
and time-consuming, transferring the search and retrieval tasks
to automated systems becomes crucial for being able to effi-
ciently handle stored video volumes. The development of such
systems is based on the
algorithms for video content analysis.
These algorithms are built around the models bridging the gap
between the syntax of the digital video data stream (captured
in the so-called low-level features) and the semantic meaning
of that stream. Using the information that is extracted from a
video by these algorithms, digital video data can be indexed,
classified, filtered or organized automatically based on semantic
criteria.

Manuscript received August 31, 2001; revised June 18, 2003. The associate
editor coordinating the review of this manuscript and approving it for publica-
tion was Dr. Sankar Basu.
A. Hanjalic is with the Department of Mediamatics, Delft University of Tech-
nology, 2628 CD Delft, The Netherlands (e-mail: A.Hanjalic@ewi.tudelft.nl).
L.-Q. Xu is with the Broadband Applications Research Centre, BT
Research Venturing, Martlesham Heath, Ipswich IP5 3RE, U.K. (e-mail:
li-qun.xu@bt.com).
Digital Object Identifier 10.1109/TMM.2004.840618

Fig. 1. Overview of two different levels of video content perception, analysis,
and retrieval.
The semantic meaning of a given video clip is not unique,
as the content of this clip can be perceived in many different
ways. Clearly, each way of perceiving video content requires a
particular type of information in order to index, classify, filter
or organize the video collection correspondingly. As depicted
in Fig. 1, we differentiate between two basic levels of video
content perception, hence two different levels of analyzing and
retrieving video content.
Cognitive level.
Affective level.
An algorithm analyzing a video at cognitive level aims at ex-
tracting information that describes the “facts,” e.g., the struc-
ture of the story, the composition of a scene, and the objects and
people captured by the camera. For example, these facts can in-
clude labels such as “a panorama of San Francisco,” an “out-
door” or “indoor” scene, a broadcast news report on “Topic X,”
a “dialog between person A and person B,” or the “fast breaks,”
“steals,” and “scores” of a basketball match. Most of the world-
wide research efforts in the field of video content analysis have
been invested so far in raising the efficiency and reliability of
analyzing the video content at cognitive level. Good overviews
of, and references to, the results of these efforts can be found in
[7], [13], and [21].
Little research effort has been invested so far in extracting the
information that describes the affective content of a video. This
content can be defined as the amount and type of affect (feeling
or emotion) that are contained in a video and expected to arise in
users while watching that video. This expected feeling or emo-
tion can be seen as the one that is either intended to be commu-
nicated toward the audience (by video program directors), or
that which is likely to be elicited from the majority of the audience
watching the particular video clip. To illustrate the former, we
quote I. Maitland [25], the Emmy-Award-winning director and
editor: “It is the filmmaker’s job to create moods in such a
realistic manner that the audience will experience those same
emotions enacted on the screen, and thus feel part of the expe-
rience.” The expected affective response of a broad audience
can best be illustrated by the example of a sport broadcast:
a score (goal) in a soccer match can generally be considered a
highly exciting event, just like the finish of a swimming compe-
tition or the sprint over the last 50 m in a running contest.
At this stage it is worth emphasizing that the affective
content of a video does not necessarily correspond to the affec-
tive response of a particular user to this content. In other words,
the expected feeling or emotion as described above should not
be mixed up with the actual feeling or emotion that is evoked in
a user while watching video. The expected affective response
can be considered objective, as it results from the actions of
the movie director, or reflects the more-or-less unanimous re-
sponse of a general audience to a given stimulus. Opposed to
this, the perceived feeling or emotion is highly subjective and
context-dependent. Therefore, it may be very different from the
expected one and may also vary from one individual to another.
For instance, the same soccer television broadcast may make
the winning team’s fans happy, the losing team’s fans sad, and elicit no
emotions at all from an audience that is not interested in soccer.
The relation between the expected and the subjective affective
responses (e.g., marking a horror movie with the label “funny”
for those people who always laugh while watching such movies)
and the information about the context (e.g., winning or losing
soccer fan) can be taken into account, for instance, by gener-
ating the profile of a particular user. This profile can then be
used to map the expected affective response to a given stimulus
onto the user-specific affective response to that stimulus.
We propose in this paper a computational framework for af-
fective video content representation and modeling. The repre-
sentation part of the framework consists of a set of curves that
reliably depict the expected transitions from one feeling to an-
other along a video, as elicited from a general user. The mod-
eling part addresses the problem of computing the values of the
content representation curves on the basis of low-level features
extracted from video.
This paper is organized as follows. In Section II, we discuss
the importance of extending the research in the field of video
content analysis from the cognitive to the affective level, which
allows for a number of new or enhanced video indexing and
retrieval applications. In Section III we elaborate on the dimen-
sional approach to affect that is known from psychophysiology
and that provides the fundamentals of the proposed framework.
The detailed framework is then presented in Section IV (repre-
sentation part) and Section V (modeling part), together with the
validation using real video program data. Conclusions and rec-
ommendations for future research are given in Section VI.
II. WHY AFFECTIVE VIDEO CONTENT ANALYSIS?
A. Personalized Video Delivery
In view of the rapidly growing technological awareness of
the average user, the availability of automated systems that can
optimally prepare data for easy access by the user becomes
crucial for the commercial success of consumer-oriented mul-
timedia databases. The minimum expected capabilities of such
systems will definitely evolve beyond the pure automation of
retrieval processes: an average user will require more and more
from his electronic infrastructure at home. In the particular case
of video storage systems, this “more” can directly be interpreted
as personalized video delivery. Since the video storage system at
home will soon become a necessary buffer for the hundreds of
television channels reaching one’s home, the system module
handling the stored video data will increasingly be expected to
take into account the preferences of the user and to filter and
organize the stored content accordingly. The systems currently
available for personalized video delivery usually filter the pro-
grams on the basis of information like, in the case of a movie,
the genre, cast, director and story (script) content. As the user
preferences in this case are also largely determined by the pre-
vailing mood of a movie, any information regarding this
mood (obtainable by analyzing the types and intensities of feel-
ings or emotions along a video) is likely to improve the quality
of personalized video delivery.
B. Video Indexing Using Affective Labels
The availability of methods for automatically extracting the
affective video content will extend the current scope of possi-
bilities for video indexing and retrieval. The evidence reported
by Picard [25] is that finding photographs having a particular
mood was the most frequent request of advertising customers in
a study of image retrieval made with Kodak Picture Exchange
[27]. One can easily extend this result to video collections as
well: an average user will often search for the “funniest,” “most
sentimental,” or “most thrilling” fragments of a movie, as well
as for the “most exciting” segments of a sport event.
C. Video Highlighting
Although the highlights generally stand for the most inter-
esting parts of a video, the definition of what is “interesting”
may vary widely across video genres and for different applica-
tions. For instance, while a highlight of a news program may
be determined by the novelty and impact of the news (e.g.,
“breaking news,” “headline news”), the criteria for highlight ex-
traction from a home video are rather content-dependent, like
“where my baby walked for the first time.” The ability to an-
alyze video at affective level will broaden the possibilities for
highlights extraction in a number of application contexts, such
as automated movie trailer generation and sport broadcast sum-
marization.
A movie trailer is a concatenation of movie excerpts that last
only for several tens of seconds but are capable of commanding
the attention of a large number of potential cinema goers and
video on-demand users. Analyzing a movie at affective level
can provide valuable clues about which parts of the movie are
most suitable for being an element of the trailer. This is because
emotion plays a primary role when processing mediated stimuli
[9]. The emotion (affective content) influences the attention of a
user and his or her evaluation and memory for the facts (cogni-
tive content). Consequently, the perception of the affective con-
tent interferes with the perception of the cognitive content and
influences a user’s reactions to the cognitive content, such as
liking or not-liking, enjoyment and memory. Since memory is
the most important factor when creating a trailer, it is worth
noting that memory for highly emotional and, in particular,
highly arousing video fragments has been proven to last longer
than the memory for less-emotional video clips [16], [17]. If the
information on the affective video content is available, the creation
of movie trailers can be performed fully automatically. Also,
the trailers can be generated remotely at a user’s home, for each
movie downloaded by the home digital video storage system.
Previous approaches to automated highlights extraction
in sport video were usually based on the development of
domain-specific models for predefined events (e.g., goals in
soccer, home runs in baseball, fast breaks and steals in basket-
ball, etc.) that are supposed to be interpreted by the users as
highlights [3], [15], [22]. The need for event modeling not only
makes highlight extraction technically and semantically a
complex task in many broadcasts, but it also requires the devel-
opment of a separate highlights-detecting algorithm for each
particular sport program genre. Since it is realistic to assume
that each highlight event (e.g., a goal, touchdown, home run,
the finals of a swimming competition, or the last 50 meters
in a running contest) induces a steady increase in a user’s ex-
citement, an alternative to the domain-specific approach could
be to search for highlights in those video segments that excite
the users most. In this way, generic methods for highlights
extraction could be developed that are independent of the type
of events appearing in a particular sports program genre and
of the differences in event realization and coverage.
III. DIMENSIONAL APPROACH TO AFFECT
As studied by Bradley [5], Lang et al. [19], Osgood et al. [24],
and Russell and Mehrabian [28], affect has three basic underlying
dimensions.
Valence (V).
Arousal (A).
Control (Dominance) (C).
Valence is typically characterized as a continuous range of affec-
tive responses or states extending from “pleasant” or “positive” to
“unpleasant” or “negative” [8], while arousal is characterized by
affective states ranging on a continuous scale from “energized,”
“excited,” and “alert” to “calm,” “drowsy,” or “peaceful.” We can also say
that arousal stands for the intensity of emotion, while valence
can be related to the type of emotion. The third dimension,
control (dominance), is particularly useful in distinguishing
among emotional states having similar arousal and valence (e.g.,
differentiating between grief and rage), and typically ranges
from “no control” to “full control.” Consequently, the entire
scope of human emotions can be represented as a set of points
in the three-dimensional (3-D) VAC coordinate space.
Fig. 2. Illustration of the 3-D emotion space (from Dietz and Lang [9]).
While one might assume that the points corresponding
to different affective states are equally likely to be found any-
where in the three-dimensional VAC coordinate space, psycho-
physiological experiments show that only certain areas of this
space are actually relevant. These experiments typically include
measurements of affective responses of a large group of subjects
to calibrated audio-visual stimuli collected in the International
Affective Picture System (IAPS, Lang et al. [20]) and the Inter-
national Affective Digitized Sounds system (IADS, Bradley and
Lang [6]). Subjects’ affective responses to these stimuli can be
quantified either by evaluating their own reports, e.g., by using
the Self-Assessment Manikin ([18]) or by measuring physio-
logical functions that are considered related to particular affect
dimensions. For example, heart rate reliably indexes valence,
while skin conductance is associated with arousal. It was found
that the heart rate accelerates as a reaction to pleasant stimuli,
while unpleasant stimuli cause the heart rate to slow down [8],
[10], [12]. Also, an increase in arousal causes the sweat glands
to become active and the skin conductance responses larger and
more frequent [8], [14]. Since IAPS and IADS are specially
created to evoke a wide range of emotions with their audio-vi-
sual content, the three-dimensional surface enclosing the
affective responses after their mapping onto the corresponding
points in the 3-D VAC coordinate system is roughly parabolic.
An idea about the shape of the surface can be obtained from
the illustration in Fig. 2. The parabolic shape becomes logical
if we realize that there are relatively few or even no stimuli that
would cause an emotional state characterized by, for instance,
high arousal and neutral valence, or high valence accompanied
by low arousal [9].
The dimensional approach to representing emotion as de-
scribed above can play an important role in the development
of affective agents that serve as mediators between the com-
puter and user, and involve the user in an interaction with the
computer in the same way as he/she interacts with other hu-
mans. Since human-to-human interaction is strongly determined
by emotions, an affective agent should be able to sense, synthesize, and
express emotions. For example, Dietz and Lang [9] use the par-
abolic surface from Fig. 2 as the basis for assigning a tempera-
ment, mood and emotion to an affective agent, thus defining the
personality of that agent. The temperament is a fixed point in
the space that defines the “at rest” state of the agent (its rudimen-
tary personality). While the temperament is static, the points
corresponding to the mood and emotion of the agent can move
freely within the space. The position of the emotion point gives
rise to the expressions of the agent and determines its current
affective state. Further, the emotion point gravitates toward the
position of the mood; the mood, in turn, moves through the space
relatively slowly, is mainly pulled by emotional events, and
gravitates toward the position of the temperament. The dynamics
of the system are therefore influenced by both the agent’s current
affective state and its temperament.

Fig. 3. Illustration of the 2-D emotion space (from Dietz and Lang [9]).
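Read as a dynamical system, this description lends itself to a short simulation. The sketch below is our own toy reading of such gravitation dynamics, not Dietz and Lang's published model; the update rule, the gains k_emotion and k_mood, and all numeric values are illustrative assumptions.

```python
import numpy as np

def update_agent_state(emotion, mood, temperament,
                       k_emotion=0.1, k_mood=0.01, stimulus=None):
    """One toy time step in the 3-D VAC space: the emotion point is
    pushed by emotional events and gravitates toward the mood, while
    the mood slowly gravitates toward the fixed temperament.
    All states are arrays of (valence, arousal, control)."""
    if stimulus is not None:
        emotion = emotion + stimulus                   # emotional event
    emotion = emotion + k_emotion * (mood - emotion)   # fast pull toward mood
    mood = mood + k_mood * (temperament - mood)        # slow pull toward temperament
    return emotion, mood

# Example: a calm temperament reacting to a negative, arousing event
temperament = np.array([0.2, -0.1, 0.0])               # "at rest" state
emotion, mood = temperament.copy(), temperament.copy()
emotion, mood = update_agent_state(emotion, mood, temperament,
                                   stimulus=np.array([-0.8, 0.6, 0.0]))
```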
IV. AFFECTIVE VIDEO CONTENT REPRESENTATION
A. Two-Dimensional (2-D) Emotion Space
As can be seen from Fig. 2, the effect of the control dimen-
sion becomes visible only at points with distinctly high abso-
lute valence values. This effect is also quite small, mainly due
to a rather narrow range of values belonging to this dimension.
Consequently, it can be said that the control dimension plays
only a limited role in characterizing various emotional states.
As a matter of fact, Greenwald et al. [12] have shown that va-
lence and arousal account for most of the independent variance
in emotional responses. This is especially true for the problem
to be addressed in this paper: the extraction of the affective
content from a video. Numerous studies of human emotional
responses to media have shown that emotion elicited by pic-
tures, television, radio, computers, and sounds can be mapped
onto an emotion space created by the arousal and valence axes
[9]. For this reason, we neglect the control dimension and con-
sider the arousal and valence dimensions only. Instead of the
three-dimensional surface introduced in the previous section,
the relevant emotion space for the purpose of affective video
content analysis is reduced to the projection of this surface onto
the arousal-valence plane. Fig. 3 shows an illustration of the re-
sulting 2-D emotion space. The parabolic contour is generated to
enclose the scatter plot of affective responses with respect to
arousal and valence only, which were collected using the IAPS
and IADS stimuli. It is expected that the affective states ex-
tracted from a video can be represented as the points within this
contour.
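As a small illustration of how such a contour can be used computationally, the test below models the realistic region as a band around a parabola in the valence-arousal plane. The parabola coefficient and band width are hypothetical placeholders, not values fitted to the IAPS/IADS data.

```python
def inside_emotion_space(valence, arousal, a=1.0, width=0.25):
    """Approximate membership test for the 2-D emotion space, with
    valence in [-1, 1] and arousal in [0, 1]. Points are accepted only
    within a band around the parabola arousal = a * valence**2, so
    high arousal with neutral valence, or strong valence with low
    arousal, is rejected (cf. the discussion of Fig. 2)."""
    return abs(arousal - a * valence ** 2) <= width

print(inside_emotion_space(0.0, 0.1))    # True: calm and neutral
print(inside_emotion_space(0.0, 0.9))    # False: high arousal, neutral valence
print(inside_emotion_space(0.9, 0.85))   # True: strong valence with high arousal
```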
B. Arousal, Valence, and Affect Curve
By computing the arousal and valence values along a video,
the arousal and valence time curves can be obtained. We intro-
duce these curves, either considered separately or combined into
the so-called affect curve, as suitable representations of the af-
fective content of a video in view of the applications described
in Section II.
Fig. 4. Illustration of the arousal, valence, and affect curve.
The arousal time curve indicates how the intensity of the
emotional load changes along a video, and depicts the expected
changes in a user’s excitement while watching that video. In this
sense, the arousal curve is particularly suitable for locating the
“exciting” video segments. On the basis of the arousal time
curve we can generate a video abstract containing the highlights
in a desired length. Namely, given the maximum abstract length
N in frames, a horizontal line can be drawn cutting off the peaks
of the curve in such a way that the number of frames covered by
the peaks is not larger than N. This is illustrated in Fig. 4(a).
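A minimal sketch of this cut-off, assuming the arousal curve is available as one value per frame; the function name and the frame-level granularity are our simplifications.

```python
import numpy as np

def highlight_mask(arousal, n_max):
    """Select at most n_max frames by lowering a horizontal line over
    the arousal curve: frames above the line are exactly the frames
    covered by the cut-off peaks."""
    arousal = np.asarray(arousal, dtype=float)
    if n_max >= arousal.size:
        return np.ones(arousal.size, dtype=bool)
    threshold = np.sort(arousal)[-n_max - 1]  # line just below the n_max largest values
    return arousal > threshold                # ties at the line are excluded, keeping <= n_max

# Example: a 10-frame toy curve with an abstract budget of 3 frames
mask = highlight_mask([.1, .4, .9, .8, .2, .1, .5, .95, .7, .3], n_max=3)
```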
The valence time curve depicts the changes in the type
of feelings or emotions contained in a video over time. As
such, this curve mimics the expected changes in the mood of the
user while watching a video. Using the valence time curve we
can also determine the “positive” and “negative” video segments
with respect to the expected type of feeling that is evoked in the
user during these segments. This information can serve to match
the video to the personal preferences of the user, but also to auto-
matically perform censorship tasks, that is, to remove all seg-
ments from a video that are too negative for certain groups of
the audience. As illustrated in Fig. 4(b), such segments may be
sought among those for which the valence curve reaches local
minima. The arousal and valence time curves can be combined
into the affect curve. This curve is composed of the value pairs
of the arousal and valence time curves that are taken per time
stamp of the video and mapped onto the corresponding points
of the 2-D emotion space [Fig. 4(c)].
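Assuming both curves are sampled per frame on the same time axis, forming the affect curve is a per-frame pairing, as in this sketch:

```python
import numpy as np

def affect_curve(arousal, valence):
    """Combine the arousal and valence time curves into the affect
    curve: one (valence, arousal) point per frame, tracing a path
    through the 2-D emotion space."""
    arousal = np.asarray(arousal, dtype=float)
    valence = np.asarray(valence, dtype=float)
    assert arousal.shape == valence.shape, "curves must share a time axis"
    return np.column_stack([valence, arousal])
```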
The affect curve can be seen as the most complete repre-
sentation of the affective content of a video, which can be ob-
tained automatically. This curve can be interpreted in various
ways and used for numerous applications related to video con-
tent representation and retrieval at the affective level. For in-
stance, assuming that the affect curve has already been com-
puted for a given video, an arbitrary temporal segment of that
video can automatically be indexed with respect to the affective
states through which the corresponding part of the affect curve
passes. Indexes can be provided in the form of labels that are
assigned a priori to different regions of the 2-D emotion space,
as illustrated in Fig. 5. Also, the area of the 2-D emotion space
through which the curve passes most of the time corresponds to
the dominant affective state (the prevailing mood) of a video. This
can be highly useful for automatically classifying a video into
different affective genres. Further, the affect curve may directly
serve as a criterion for filtering the incoming videos according
to a user’s preferences. Namely, an affect curve representing
a user’s preferences can be obtained by simply combining the
affect curves of all programs that the user has selected in the
past (in the learning phase of the system). Filtering an incoming
video according to this user’s preferences is then nothing more
than matching the affect curve of the incoming video with the
affect curve describing the user’s preferences.

Fig. 5. Illustration of the possibility for video content indexing and retrieval
at affective level.
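The sketch below illustrates two of these uses: indexing via a-priori region labels and finding the prevailing mood as the most-visited region. The quadrant boundaries and label names are hypothetical; the paper does not prescribe a particular labeling.

```python
from collections import Counter

def region_label(valence, arousal, arousal_mid=0.5):
    """A-priori label for a point of the 2-D emotion space; these four
    quadrant labels are illustrative placeholders."""
    if arousal >= arousal_mid:
        return "excited/joyful" if valence >= 0 else "fearful/angry"
    return "relaxed/content" if valence >= 0 else "sad/bored"

def prevailing_mood(curve):
    """Dominant affective state: the region in which the affect curve
    (rows of (valence, arousal)) spends most of its time."""
    labels = [region_label(v, a) for v, a in curve]
    return Counter(labels).most_common(1)[0][0]
```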
V. AFFECTIVE VIDEO CONTENT MODELING
In order to obtain the affective content representation as de-
scribed in the previous section, models need to be developed
for the arousal and valence time curves. These models fulfill the
task of deriving arousal and valence values from the values
of low-level features computed in a video. In Section V-A, we
introduce the basic criteria that need to be taken into account
during the model development. Then, in Section V-B, we elab-
orate on the possibilities for establishing relations between the
affect dimensions and low-level features. Finally, we propose
models for the arousal and valence time curves and experiment
with these models on a number of video excerpts from movies
and soccer television broadcasts.
A. Criteria for Developing Affect Models
As arousal and valence are psychological categories, their
models need to be psychologically justifiable. To achieve this,
we introduce the following three criteria that a model for the
arousal, valence or affect curve should satisfy.
Comparability.
Compatibility.
Smoothness.
The first criterion (Comparability) ensures that the values of
the arousal, valence and the resulting affect curve obtained in
different videos for similar types of events are comparable.
This criterion obviously imposes normalization and scaling
requirements when computing the time curves. The second
criterion (Compatibility) ensures that the affect curve covers
an area in the valence-arousal coordinate system, the shape of
which roughly corresponds to the parabolic-like contour of the
2-D emotion space. The third criterion (Smoothness) accounts
for the degree of memory retention of preceding frames and
shots [1]. It ensures that the perception of the content, and
consequently the mediated affective state, does not change
abruptly from one video frame to another but is a function of a
number of consecutive frames (shots).
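These criteria translate naturally into normalization and smoothing steps. The sketch below shows one way to realize comparability and smoothness for a per-frame feature signal; the window shape and length are illustrative choices, not the paper's exact model.

```python
import numpy as np

def normalize(x):
    """Comparability: scale a signal to [0, 1] so curves from
    different videos can be compared and combined."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)

def smooth(x, window_len=301):
    """Smoothness: convolve with a window spanning many frames so the
    curve reflects memory retention rather than frame-to-frame jumps."""
    window = np.hanning(window_len)
    return np.convolve(x, window / window.sum(), mode="same")

def feature_to_curve(feature_signal):
    # normalize -> smooth -> rescale, so the final curve again spans
    # the full comparable value range
    return normalize(smooth(normalize(feature_signal)))
```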
B. Feature Selection
Little is known regarding the relations between the low-level
features and affect. While the problem of bridging the semantic
gap remains very hard in the case of cognitive video content
analysis, the magnitude of this problem in the affective case
is even bigger. The reason for this is that in the cognitive case
the low-level features describe aspects of a real entity, e.g., the
choice of the color red as one of the features to characterize
a red car. In the affective case, however, we need to relate the
low-level features to something rather abstract, such as feeling
or emotion. In the context of this paper, we are particularly inter-
ested in the relations between low-level features and the affect
dimensions of arousal and valence.
One of the most extensively investigated visual features in the
context of affective video content analysis is motion. Research
results show that motion in a television picture has a signifi-
cant impact on individual affective responses. This has also been
recognized by film theorists, who contend that motion is highly
expressive and is able to evoke strong emotional responses in
viewers ([2], [11]). In particular, Detenber et al. [8] and Sim-
mons et al. [29] investigated the influence of camera and object
motion on the emotional responses of humans and concluded that
an increase of motion intensity on the screen causes an increase
in arousal. The type of emotion (represented by the sign of va-
lence) was found to be independent of motion: if the mood of a test
person was “positive” or “negative” while watching a still pic-
ture, the sign of the mood will not change if motion is intro-
duced within that picture.
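As an illustration of turning this finding into an arousal-related feature, the sketch below derives a per-frame motion-intensity signal from grayscale frames; the mean absolute frame difference is our cheap stand-in for the motion-vector-based activity measures such studies rely on.

```python
import numpy as np

def motion_intensity(frames):
    """Per-frame motion intensity for a sequence of grayscale frames
    (2-D arrays). Larger inter-frame differences stand in for more
    camera/object motion, the cue linked to increased arousal."""
    values = [0.0]  # no motion estimate for the first frame
    for prev, cur in zip(frames[:-1], frames[1:]):
        values.append(np.abs(cur.astype(float) - prev.astype(float)).mean())
    return np.asarray(values)
```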
Based on the results obtained by Murray and Arnott [23], as
well as those reported by Picard in [25] and [26], various vocal
effects present in the sound track of a video may bear broad
relations to the affective content of that video. In terms of affect
dimensions, the loudness (signal energy) and speech rate (e.g.,
faster for fear or joy and slower for disgust or romance) are
often related to arousal, while the inflection, rhythm,
duration of the last syllable of a sentence, voice quality (e.g.,
breathy or resonant), as well as the pitch-related features (pitch
average, pitch range and pitch changes), are commonly related
to valence [23], [25].
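Of the vocal cues above, loudness is the simplest to compute; the sketch below measures short-time signal energy as an arousal-related feature (pitch- and rhythm-based valence features would need a pitch tracker and are omitted). The frame and hop sizes are arbitrary illustrative values.

```python
import numpy as np

def short_time_energy(samples, frame_len=1024, hop=512):
    """Loudness proxy: mean squared amplitude over sliding windows of
    the audio track; sustained high energy is one of the cues the
    text relates to arousal."""
    samples = np.asarray(samples, dtype=float)
    n_frames = 1 + max(0, len(samples) - frame_len) // hop
    return np.array([
        np.mean(samples[i * hop : i * hop + frame_len] ** 2)
        for i in range(n_frames)
    ])
```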
References
[18] M. M. Bradley and P. J. Lang, “Measuring emotion: The Self-Assessment Manikin and the semantic differential,” J. Behav. Ther. Exp. Psychiatry, vol. 25, no. 1, pp. 49–59, 1994.
[24] C. E. Osgood, G. J. Suci, and P. H. Tannenbaum, The Measurement of Meaning. Urbana, IL: Univ. of Illinois Press, 1957.
[25] R. W. Picard, Affective Computing. Cambridge, MA: MIT Press, 1997.
[26] R. W. Picard, E. Vyzas, and J. Healey, “Toward machine emotional intelligence: Analysis of affective physiological state,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 10, pp. 1175–1191, Oct. 2001.