Journal ArticleDOI
Real-Time Video Emotion Recognition based on Reinforcement Learning and Domain Knowledge
TL;DR: A novel multimodal emotion recognition model for conversational videos based on reinforcement learning and domain knowledge (ERLDK) is proposed in this paper and achieves state-of-the-art results on the weighted average and most of the specific emotion categories.
Abstract:
Multimodal emotion recognition in conversational videos (ERC) has developed rapidly in recent years. To fully extract the relevant context from video clips, most studies build their models on entire dialogues, which deprives them of real-time ERC capability. In contrast to related research, a novel multimodal emotion recognition model for conversational videos based on reinforcement learning and domain knowledge (ERLDK) is proposed in this paper. In ERLDK, a reinforcement learning algorithm is introduced to conduct real-time ERC as conversations unfold. The collected history utterances are composed into an emotion-pair that represents the multimodal context of the next utterance to be recognized. A dueling deep Q-network (DDQN) built on gated recurrent unit (GRU) layers is designed to learn the correct action from among the candidate emotion categories. Domain knowledge is extracted from a public dataset based on the preceding information of emotion-pairs; it is used to revise the results from the RL module and is transferred to another dataset to examine its rationality. The experimental results show that ERLDK achieves state-of-the-art results on the weighted average and most of the specific emotion categories.
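The dueling head at the core of the RL module can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the emotion label set, vector sizes, and weight names are assumptions, and the GRU encoder of the emotion-pair context is stubbed by a fixed hidden vector.

```python
import numpy as np

EMOTIONS = ["happy", "sad", "neutral", "angry", "excited", "frustrated"]  # hypothetical label set

def dueling_q(context, W_v, W_a):
    """Combine a state-value head and an advantage head into Q-values.

    `context` stands in for the GRU-encoded emotion-pair context.
    The dueling aggregation Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
    keeps the value and advantage streams identifiable.
    """
    v = context @ W_v          # state value, shape (1,)
    a = context @ W_a          # per-emotion advantages, shape (len(EMOTIONS),)
    return v + a - a.mean()    # broadcast to one Q-value per emotion

rng = np.random.default_rng(0)
h = rng.normal(size=8)                      # stand-in for the GRU hidden state
W_v = rng.normal(size=(8, 1))               # value-head weights (illustrative)
W_a = rng.normal(size=(8, len(EMOTIONS)))   # advantage-head weights (illustrative)
q = dueling_q(h, W_v, W_a)
action = EMOTIONS[int(np.argmax(q))]        # greedy emotion prediction for the utterance
```

Subtracting the mean advantage makes the decomposition unique; the mean of the resulting Q-values equals the state value, so the advantage stream only encodes preferences among emotions.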
Citations
Journal ArticleDOI
Deep learning based multimodal emotion recognition using model-level fusion of audio-visual modalities
TL;DR: In this paper, separate feature extractor networks for audio and video data are proposed, and an optimal multimodal emotion recognition model is created by fusing audio and visual features at the model level.
Journal ArticleDOI
Gated attention fusion network for multimodal sentiment classification
TL;DR: Wang et al. propose a novel multimodal sentiment classification model based on a gated attention mechanism, in which the image feature emphasizes relevant text segments, allowing the model to focus on the text that determines the sentiment polarity.
Journal ArticleDOI
Gated Recurrent Unit with Multilingual Universal Sentence Encoder for Arabic Aspect-Based Sentiment Analysis
TL;DR: A deep learning model based on Gated Recurrent Units (GRU) and features extracted using the Multilingual Universal Sentence Encoder (MUSE) is designed and shown to outperform the baseline model and related methods evaluated on the same dataset.
Journal ArticleDOI
Video sentiment analysis with bimodal information-augmented multi-head attention
Ting Wu, Junjie Peng, Wenqiang Zhang, Huiran Zhang, Shuhua Tan, Fen Yi, Chuanshuai Ma, Yansong Huang +7 more
TL;DR: In this article, a multi-head attention-based fusion network is proposed to fuse features from different modalities for sentiment analysis, motivated by the observation that the pairwise interactions between modalities differ and do not contribute equally to the final sentiment prediction.
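The attention-based bimodal fusion summarized above can be sketched as a single scaled dot-product head; the shapes, variable names, and concatenation step below are illustrative assumptions, not details from the cited paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(text_feats, audio_feats):
    """One scaled dot-product attention head: text queries attend over
    audio keys/values, so each text step pulls in the audio frames
    most relevant to it."""
    d = text_feats.shape[-1]
    scores = text_feats @ audio_feats.T / np.sqrt(d)   # (T_text, T_audio)
    weights = softmax(scores, axis=-1)                 # each row sums to 1
    return weights @ audio_feats                       # (T_text, d)

rng = np.random.default_rng(1)
text = rng.normal(size=(4, 16))    # 4 text tokens, hypothetical feature dim 16
audio = rng.normal(size=(9, 16))   # 9 audio frames, same dim for simplicity
fused = np.concatenate([text, attend(text, audio)], axis=-1)  # bimodal feature
```

A multi-head variant would run several such heads with separate learned projections and concatenate their outputs; this sketch omits the projections to keep the mechanism visible.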
Journal ArticleDOI
Continuous Emotion Recognition for Long-Term Behavior Modeling through Recurrent Neural Networks
TL;DR: This work introduces a novel approach that gradually maps and learns the personality of a human by conceiving and tracking the individual's emotional variations throughout the interaction, offering a practical tool for HRI scenarios where adaptation of the robot's activity is needed for improved interaction performance and safety.
References
Journal ArticleDOI
IEMOCAP: interactive emotional dyadic motion capture database
Carlos Busso, Murtaza Bulut, Chi-Chun Lee, Abe Kazemzadeh, Emily Mower, Samuel Kim, Jeannette N. Chang, Sungbok Lee, Shrikanth S. Narayanan +8 more
TL;DR: A new corpus named the “interactive emotional dyadic motion capture database” (IEMOCAP) is presented, collected by the Speech Analysis and Interpretation Laboratory at the University of Southern California (USC), which provides detailed information about the actors' facial expressions and hand movements during scripted and spontaneous spoken communication scenarios.
Proceedings ArticleDOI
Context-Dependent Sentiment Analysis in User-Generated Videos.
Soujanya Poria, Erik Cambria, Devamanyu Hazarika, Navonil Majumder, Amir Zadeh, Louis-Philippe Morency +5 more
TL;DR: An LSTM-based model is proposed that enables utterances to capture contextual information from their surroundings in the same video, aiding the classification process and showing a 5-10% performance improvement over the state of the art with strong generalizability.
Proceedings Article
Memory Fusion Network for Multi-view Sequential Learning
Amir Zadeh, Paul Pu Liang, Navonil Mazumder, Soujanya Poria, Erik Cambria, Louis-Philippe Morency +5 more
TL;DR: The Memory Fusion Network (MFN) explicitly accounts for both view-specific and cross-view interactions in a neural architecture and continuously models them through time.
Proceedings ArticleDOI
Conversational Memory Network for Emotion Recognition in Dyadic Dialogue Videos.
Devamanyu Hazarika, Soujanya Poria, Amir Zadeh, Erik Cambria, Louis-Philippe Morency, Roger Zimmermann +5 more
TL;DR: A deep neural framework is proposed, termed conversational memory network, which leverages contextual information from the conversation history to recognize utterance-level emotions in dyadic conversational videos.
Journal ArticleDOI
Video Summarization With Attention-Based Encoder–Decoder Networks
TL;DR: This paper proposes a novel video summarization framework named attentive encoder–decoder networks for video summarization (AVS), in which the encoder uses a bidirectional long short-term memory (BiLSTM) to encode the contextual information among the input video frames.