Journal ArticleDOI

YouTube Movie Reviews: Sentiment Analysis in an Audio-Visual Context

TL;DR: Experimental results indicate that training on written movie reviews is a promising alternative to exclusively using (spoken) in-domain data for building a system that analyzes spoken movie review videos, and that language-independent audio-visual analysis can compete with linguistic analysis.
Abstract: This work focuses on automatically analyzing a speaker's sentiment in online videos containing movie reviews. In addition to textual information, this approach considers adding audio features as typically used in speech-based emotion recognition as well as video features encoding valuable valence information conveyed by the speaker. Experimental results indicate that training on written movie reviews is a promising alternative to exclusively using (spoken) in-domain data for building a system that analyzes spoken movie review videos, and that language-independent audio-visual analysis can compete with linguistic analysis.
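The fusion of textual, audio, and video features described above can be illustrated with a minimal early-fusion sketch. All feature dimensions, the synthetic data, and the simple ridge-regularized linear classifier below are invented for illustration; the paper's actual feature sets and classifier differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-clip features for the three modalities being combined
n = 200
text_feats  = rng.normal(size=(n, 10))   # e.g., linguistic/bag-of-words scores
audio_feats = rng.normal(size=(n, 6))    # e.g., prosodic descriptors
video_feats = rng.normal(size=(n, 4))    # e.g., facial-valence descriptors

# Early (feature-level) fusion: one joint vector per clip
X = np.hstack([text_feats, audio_feats, video_feats])

# Toy sentiment labels driven mostly by one text and one audio dimension
y = np.where(text_feats[:, 0] + 0.5 * audio_feats[:, 0] > 0, 1.0, -1.0)

# Ridge-regularized least-squares linear classifier on the fused features
w = np.linalg.solve(X.T @ X + 1e-2 * np.eye(X.shape[1]), X.T @ y)
acc = np.mean(np.sign(X @ w) == y)
```

Because the labels here are a linear function of two fused dimensions, a linear model on the concatenated vector recovers them well; the point is only the mechanics of feature-level fusion.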
Citations
Journal ArticleDOI
TL;DR: This first-of-its-kind, comprehensive literature review of the diverse field of affective computing focuses mainly on the use of audio, visual, and text information for multimodal affect analysis, and outlines existing methods for fusing information from different modalities.

969 citations

Journal ArticleDOI
TL;DR: This paper provides an insight into ELMs in three aspects, viz. random neurons, random features, and kernels, and shows that in theory ELMs (with the same kernels) tend to outperform support vector machines and their variants in both regression and classification applications, with much easier implementation.
Abstract: Extreme learning machines (ELMs) basically give answers to two fundamental learning problems: (1) Can fundamentals of learning (i.e., feature learning, clustering, regression, and classification) be made without tuning hidden neurons (including biological neurons), even when the output shapes and function modeling of these neurons are unknown? (2) Does there exist a unified framework for feedforward neural networks and feature space methods? ELMs, which have built some tangible links between machine learning techniques and biological learning mechanisms, have recently attracted increasing attention from researchers in widespread research areas. This paper provides an insight into ELMs in three aspects, viz. random neurons, random features, and kernels. It also shows that in theory ELMs (with the same kernels) tend to outperform support vector machines and their variants in both regression and classification applications, with much easier implementation.

871 citations
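The core ELM recipe described in the abstract (untuned random hidden neurons, output weights obtained in closed form) can be sketched as follows; the toy regression task, hidden-layer size, and function names here are invented for illustration.

```python
import numpy as np

def train_elm(X, y, n_hidden=50, seed=0):
    """Basic ELM: random, untuned hidden layer; least-squares output weights."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights (never tuned)
    b = rng.normal(size=n_hidden)                # random biases
    H = np.tanh(X @ W + b)                       # hidden-layer activations
    beta = np.linalg.pinv(H) @ y                 # closed-form output weights
    return W, b, beta

def predict_elm(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy regression: learn y = sin(x) without any iterative tuning
X = np.linspace(-3, 3, 200).reshape(-1, 1)
y = np.sin(X).ravel()
W, b, beta = train_elm(X, y)
mse = np.mean((predict_elm(X, W, b, beta) - y) ** 2)
```

The only "training" is a single pseudoinverse, which is what makes the implementation so much simpler than iteratively tuned networks or SVM solvers.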

Proceedings ArticleDOI
01 Jul 2017
TL;DR: An LSTM-based model is proposed that enables utterances to capture contextual information from their surroundings in the same video, thus aiding the classification process; the method shows a 5-10% performance improvement over the state of the art and strong generalizability.
Abstract: Multimodal sentiment analysis is a developing area of research, which involves the identification of sentiments in videos. Current research considers utterances as independent entities, i.e., it ignores the interdependencies and relations among the utterances of a video. In this paper, we propose an LSTM-based model that enables utterances to capture contextual information from their surroundings in the same video, thus aiding the classification process. Our method shows a 5-10% performance improvement over the state of the art and high robustness in generalization.

570 citations


Cites background from "YouTube Movie Reviews: Sentiment An..."

  • ...Recently, a number of approaches to multimodal sentiment analysis, producing interesting results, have been proposed (Pérez-Rosas et al., 2013; Wollmer et al., 2013; Poria et al., 2015)....

    [...]


  • ...(Wollmer et al., 2013) and (Rozgic et al., 2012) fused information from audio, visual, and textual modalities to extract emotion and sentiment....

    [...]

Proceedings ArticleDOI
01 Jul 2018
TL;DR: This paper introduces CMU Multimodal Opinion Sentiment and Emotion Intensity (CMU-MOSEI), the largest dataset of sentiment analysis and emotion recognition to date, and uses a novel multimodal fusion technique called the Dynamic Fusion Graph (DFG), which is highly interpretable and achieves competitive performance compared to the previous state of the art.
Abstract: Analyzing human multimodal language is an emerging area of research in NLP. Intrinsically, this language is multimodal (heterogeneous), sequential, and asynchronous; it consists of the language (words), visual (expressions), and acoustic (paralinguistic) modalities, all in the form of asynchronous coordinated sequences. From a resource perspective, there is a genuine need for large-scale datasets that allow for in-depth studies of this form of language. In this paper we introduce CMU Multimodal Opinion Sentiment and Emotion Intensity (CMU-MOSEI), the largest dataset of sentiment analysis and emotion recognition to date. Using data from CMU-MOSEI and a novel multimodal fusion technique called the Dynamic Fusion Graph (DFG), we conduct experimentation to exploit how modalities interact with each other in human multimodal language. Unlike previously proposed fusion techniques, DFG is highly interpretable and achieves competitive performance when compared to the previous state of the art.

545 citations


Additional excerpts

  • ...The ICT-MMMO (Wöllmer et al., 2013) consists of online social review videos annotated at the video level for sentiment....

    [...]


Proceedings ArticleDOI
01 Sep 2017
TL;DR: In this article, a Tensor Fusion Network is proposed to model intra-modality and inter-modality dynamics for multimodal sentiment analysis.
Abstract: Multimodal sentiment analysis is an increasingly popular research area, which extends the conventional language-based definition of sentiment analysis to a multimodal setup where other relevant modalities accompany language. In this paper, we pose the problem of multimodal sentiment analysis as modeling intra-modality and inter-modality dynamics. We introduce a novel model, termed Tensor Fusion Network, which learns both such dynamics end-to-end. The proposed approach is tailored for the volatile nature of spoken language in online videos as well as the accompanying gestures and voice. In our experiments, the model outperforms state-of-the-art approaches for both multimodal and unimodal sentiment analysis.

532 citations
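The tensor-fusion idea in the abstract above is commonly realized as an outer product of 1-augmented modality embeddings, so that unimodal, bimodal, and trimodal interaction terms all appear in one tensor. The sketch below illustrates that mechanism only; the embedding sizes and function name are invented, and the real model feeds the flattened tensor into further network layers.

```python
import numpy as np

def tensor_fusion(z_text, z_audio, z_video):
    """Outer product of 1-augmented modality embeddings.

    Appending a constant 1 to each vector means the resulting tensor
    contains the unimodal terms, all pairwise (bimodal) products, and
    the full trimodal products in a single structure."""
    zt = np.concatenate(([1.0], z_text))
    za = np.concatenate(([1.0], z_audio))
    zv = np.concatenate(([1.0], z_video))
    return np.einsum('i,j,k->ijk', zt, za, zv)  # shape (|t|+1, |a|+1, |v|+1)

# Toy embeddings just to show the resulting tensor shape
fused = tensor_fusion(np.ones(3), np.ones(4), np.ones(2))
# fused would then be flattened and passed to a small classifier network
```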

References
Posted Content
TL;DR: A simple unsupervised learning algorithm that classifies a review as recommended (thumbs up) if the average semantic orientation of its phrases is positive, and not recommended (thumbs down) otherwise.
Abstract: This paper presents a simple unsupervised learning algorithm for classifying reviews as recommended (thumbs up) or not recommended (thumbs down). The classification of a review is predicted by the average semantic orientation of the phrases in the review that contain adjectives or adverbs. A phrase has a positive semantic orientation when it has good associations (e.g., "subtle nuances") and a negative semantic orientation when it has bad associations (e.g., "very cavalier"). In this paper, the semantic orientation of a phrase is calculated as the mutual information between the given phrase and the word "excellent" minus the mutual information between the given phrase and the word "poor". A review is classified as recommended if the average semantic orientation of its phrases is positive. The algorithm achieves an average accuracy of 74% when evaluated on 410 reviews from Epinions, sampled from four different domains (reviews of automobiles, banks, movies, and travel destinations). The accuracy ranges from 84% for automobile reviews to 66% for movie reviews.

4,526 citations
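The semantic-orientation score described above, SO(phrase) = PMI(phrase, "excellent") − PMI(phrase, "poor"), reduces to a single log-ratio of co-occurrence counts. The sketch below uses invented hit counts in place of the paper's search-engine queries; the smoothing constant is likewise illustrative.

```python
import math

def semantic_orientation(hits_near_excellent, hits_near_poor,
                         hits_excellent, hits_poor, smoothing=0.01):
    """SO(phrase) = PMI(phrase, 'excellent') - PMI(phrase, 'poor'),
    computed as a log-ratio of (smoothed) co-occurrence counts."""
    return math.log2(
        ((hits_near_excellent + smoothing) * hits_poor) /
        ((hits_near_poor + smoothing) * hits_excellent)
    )

# Hypothetical counts standing in for web hit counts
so_good = semantic_orientation(120, 10, 5000, 5000)  # a "subtle nuances"-like phrase
so_bad  = semantic_orientation(8, 150, 5000, 5000)   # a "very cavalier"-like phrase

# A review is labeled by the sign of the average SO over its phrases
review_so = (so_good + so_bad) / 2
label = "recommended" if review_so > 0 else "not recommended"
```

Phrases that co-occur mostly with "excellent" get a positive score, phrases that co-occur mostly with "poor" get a negative one, and the review-level decision is just the sign of the average.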

Proceedings Article
01 Jan 2002
TL;DR: This article proposed an unsupervised learning algorithm for classifying reviews as recommended (thumbs up) or not recommended (thumbs down) based on the average semantic orientation of phrases in the review that contain adjectives or adverbs.
Abstract: This paper presents a simple unsupervised learning algorithm for classifying reviews as recommended (thumbs up) or not recommended (thumbs down). The classification of a review is predicted by the average semantic orientation of the phrases in the review that contain adjectives or adverbs. A phrase has a positive semantic orientation when it has good associations (e.g., “subtle nuances”) and a negative semantic orientation when it has bad associations (e.g., “very cavalier”). In this paper, the semantic orientation of a phrase is calculated as the mutual information between the given phrase and the word “excellent” minus the mutual information between the given phrase and the word “poor”. A review is classified as recommended if the average semantic orientation of its phrases is positive. The algorithm achieves an average accuracy of 74% when evaluated on 410 reviews from Epinions, sampled from four different domains (reviews of automobiles, banks, movies, and travel destinations). The accuracy ranges from 84% for automobile reviews to 66% for movie reviews.

3,814 citations

01 Jan 1998
TL;DR: This thesis addresses the problem of feature selection for machine learning through a correlation based approach with CFS (Correlation based Feature Selection), an algorithm that couples this evaluation formula with an appropriate correlation measure and a heuristic search strategy.
Abstract: A central problem in machine learning is identifying a representative set of features from which to construct a classification model for a particular task. This thesis addresses the problem of feature selection for machine learning through a correlation based approach. The central hypothesis is that good feature sets contain features that are highly correlated with the class, yet uncorrelated with each other. A feature evaluation formula, based on ideas from test theory, provides an operational definition of this hypothesis. CFS (Correlation based Feature Selection) is an algorithm that couples this evaluation formula with an appropriate correlation measure and a heuristic search strategy. CFS was evaluated by experiments on artificial and natural datasets. Three machine learning algorithms were used: C4.5 (a decision tree learner), IB1 (an instance based learner), and naive Bayes. Experiments on artificial datasets showed that CFS quickly identifies and screens irrelevant, redundant, and noisy features, and identifies relevant features as long as their relevance does not strongly depend on other features. On natural domains, CFS typically eliminated well over half the features. In most cases, classification accuracy using the reduced feature set equaled or bettered accuracy using the complete feature set. Feature selection degraded machine learning performance in cases where some features were eliminated which were highly predictive of very small areas of the instance space. Further experiments compared CFS with a wrapper—a well known approach to feature selection that employs the target learning algorithm to evaluate feature sets. In many cases CFS gave comparable results to the wrapper, and in general, outperformed the wrapper on small datasets. CFS executes many times faster than the wrapper, which allows it to scale to larger datasets. Two methods of extending CFS to handle feature interaction are presented and experimentally evaluated. 
The first considers pairs of features and the second incorporates feature weights calculated by the RELIEF algorithm. Experiments on artificial domains showed that both methods were able to identify interacting features. On natural domains, the pairwise method gave more reliable results than using weights provided by RELIEF.

3,533 citations
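The CFS hypothesis above (good subsets correlate highly with the class but little with each other) is captured by a merit score of the form Merit = k·r_cf / sqrt(k + k(k−1)·r_ff), where k is the subset size, r_cf the mean feature-class correlation, and r_ff the mean feature-feature correlation. The sketch below evaluates that formula on invented correlation values; the function name and numbers are illustrative, and a real CFS implementation pairs this with a correlation measure and a heuristic search.

```python
import math

def cfs_merit(feat_class_corrs, feat_feat_corrs):
    """CFS merit: rewards feature-class correlation (r_cf),
    penalizes feature-feature inter-correlation (r_ff).

    Merit = k * r_cf / sqrt(k + k*(k-1) * r_ff)"""
    k = len(feat_class_corrs)
    r_cf = sum(feat_class_corrs) / k
    r_ff = sum(feat_feat_corrs) / len(feat_feat_corrs) if feat_feat_corrs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

# Two equally relevant features, redundant vs. nearly independent
redundant   = cfs_merit([0.8, 0.8], [0.9])  # highly inter-correlated pair
independent = cfs_merit([0.8, 0.8], [0.1])  # nearly uncorrelated pair
```

As the hypothesis predicts, the independent pair scores higher merit than the redundant one, so a search over subsets would prefer it.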

Proceedings ArticleDOI
Bo Pang1, Lillian Lee1
21 Jul 2004
TL;DR: This paper proposed a machine-learning method that applies text-categorization techniques to just the subjective portions of the document; extracting these portions can be implemented using efficient techniques for finding minimum cuts in graphs, which greatly facilitates the incorporation of cross-sentence contextual constraints.
Abstract: Sentiment analysis seeks to identify the viewpoint(s) underlying a text span; an example application is classifying a movie review as "thumbs up" or "thumbs down". To determine this sentiment polarity, we propose a novel machine-learning method that applies text-categorization techniques to just the subjective portions of the document. Extracting these portions can be implemented using efficient techniques for finding minimum cuts in graphs; this greatly facilitates incorporation of cross-sentence contextual constraints.

3,459 citations

Journal ArticleDOI
TL;DR: In this paper, the authors discuss human emotion perception from a psychological perspective, examine available approaches to solving the problem of machine understanding of human affective behavior, and discuss important issues like the collection and availability of training and test data.
Abstract: Automated analysis of human affective behavior has attracted increasing attention from researchers in psychology, computer science, linguistics, neuroscience, and related disciplines. However, the existing methods typically handle only deliberately displayed and exaggerated expressions of prototypical emotions despite the fact that deliberate behaviour differs in visual appearance, audio profile, and timing from spontaneously occurring behaviour. To address this problem, efforts to develop algorithms that can process naturally occurring human affective behaviour have recently emerged. Moreover, an increasing number of efforts are reported toward multimodal fusion for human affect analysis including audiovisual fusion, linguistic and paralinguistic fusion, and multi-cue visual fusion based on facial expressions, head movements, and body gestures. This paper introduces and surveys these recent advances. We first discuss human emotion perception from a psychological perspective. Next we examine available approaches to solving the problem of machine understanding of human affective behavior, and discuss important issues like the collection and availability of training and test data. We finally outline some of the scientific and engineering challenges to advancing human affect sensing technology.

2,503 citations