A survey of multimodal sentiment analysis

doi:10.1016/J.IMAVIS.2017.08.003

Home
/
Papers
/
A survey of multimodal sentiment analysis

Journal Article•DOI•

A survey of multimodal sentiment analysis

Mohammad Soleymani¹, David Garcia², Brendan Jou³, Björn Schuller⁴, Björn Schuller¹, Björn Schuller⁵, Shih-Fu Chang³, Maja Pantic⁶, Maja Pantic⁴ - Show less +5 more•Institutions (6)

University of Geneva¹, ETH Zurich², Columbia University³, Imperial College London⁴, University of Passau⁵, University of Twente⁶

01 Sep 2017-Image and Vision Computing (Elsevier)-Vol. 65, pp 3-14

TL;DR: The thesis is that multimodal sentiment analysis holds a significant untapped potential with the arrival of complementary data streams for improving and going beyond text-based sentiment analysis.

read less

About: This article is published in Image and Vision Computing.The article was published on 2017-09-01. It has received 357 citations till now. The article focuses on the topics: Sentiment analysis & Population.

...read moreread less

Citations

PDF

Open Access

More filters

Journal Article•DOI•

Sentiment analysis using deep learning architectures: a review

[...]

Ashima Yadav¹, Dinesh Kumar Vishwakarma¹•Institutions (1)

Delhi Technological University¹

01 Aug 2020-Artificial Intelligence Review

TL;DR: This paper provides a detailed survey of popular deep learning models that are increasingly applied in sentiment analysis and presents a taxonomy of sentiment analysis, which highlights the power of deep learning architectures for solving sentiment analysis problems.

...read moreread less

Abstract: Social media is a powerful source of communication among people to share their sentiments in the form of opinions and views about any topic or article, which results in an enormous amount of unstructured information. Business organizations need to process and study these sentiments to investigate data and to gain business insights. Hence, to analyze these sentiments, various machine learning, and natural language processing-based approaches have been used in the past. However, deep learning-based methods are becoming very popular due to their high performance in recent times. This paper provides a detailed survey of popular deep learning models that are increasingly applied in sentiment analysis. We present a taxonomy of sentiment analysis and discuss the implications of popular deep learning architectures. The key contributions of various researchers are highlighted with the prime focus on deep learning approaches. The crucial sentiment analysis tasks are presented, and multiple languages are identified on which sentiment analysis is done. The survey also summarizes the popular datasets, key features of the datasets, deep learning model applied on them, accuracy obtained from them, and the comparison of various deep learning models. The primary purpose of this survey is to highlight the power of deep learning architectures for solving sentiment analysis problems.

...read moreread less

385 citations

Cites background or methods from "A survey of multimodal sentiment an..."

...Yanagimoto et al. (2013) developed a neural network architecture, which is composed of four layers of RBM sharing hidden layer units and visible units for estimating the similarity between the articles....
[...]
...Formore information onmultimodal sentiment analysis, the following popularworks can be referred (Soleymani et al. 2017; Chen et al. 2018; Poria et al. 2016b; Shah et al. 2016; Zhang et al. 2018a; Agarwal et al. 2019)....
[...]
...The core sentiment analysis tasks include document-level sentiment classification, sentence-level sentiment classification, and aspect-level sentiment classification, and sub-tasks include multi-domain sentiment classification and multimodal sentiment classification (Soleymani et al. 2017)....
[...]

Journal Article•DOI•

Deep Facial Expression Recognition: A Survey

[...]

01 Jul 2022-IEEE Transactions on Affective Computing

TL;DR: A comprehensive review of deep facial expression recognition (FER) including datasets and algorithms that provide insights into these intrinsic problems can be found in this article , where the authors introduce the available datasets that are widely used in the literature and provide accepted data selection and evaluation principles for these datasets.

...read moreread less

Abstract: With the transition of facial expression recognition (FER) from laboratory-controlled to challenging in-the-wild conditions and the recent success of deep learning techniques in various fields, deep neural networks have increasingly been leveraged to learn discriminative representations for automatic FER. Recent deep FER systems generally focus on two important issues: overfitting caused by a lack of sufficient training data and expression-unrelated variations, such as illumination, head pose, and identity bias. In this survey, we provide a comprehensive review of deep FER, including datasets and algorithms that provide insights into these intrinsic problems. First, we introduce the available datasets that are widely used in the literature and provide accepted data selection and evaluation principles for these datasets. We then describe the standard pipeline of a deep FER system with the related background knowledge and suggestions for applicable implementations for each stage. For the state-of-the-art in deep FER, we introduce existing novel deep neural networks and related training strategies that are designed for FER based on both static images and dynamic image sequences and discuss their advantages and limitations. Competitive performances and experimental comparisons on widely used benchmarks are also summarized. We then extend our survey to additional related issues and application scenarios. Finally, we review the remaining challenges and corresponding opportunities in this field as well as future directions for the design of robust deep FER systems.

...read moreread less

209 citations

Journal Article•DOI•

A survey on sentiment analysis methods, applications, and challenges

[...]

Mayur Wankhade, Annavarapu Chandra Sekhara Rao, Chaitanya Kulkarni

01 Oct 2022-Artificial Intelligence Review

TL;DR: Sentiment analysis is the process of gathering and analyzing people's opinions, thoughts, and impressions regarding various topics, products, subjects, and services as mentioned in this paper , which can be beneficial to corporations, governments and individuals for collecting information and making decisions based on opinion.

...read moreread less

Abstract: The rapid growth of Internet-based applications, such as social media platforms and blogs, has resulted in comments and reviews concerning day-to-day activities. Sentiment analysis is the process of gathering and analyzing people’s opinions, thoughts, and impressions regarding various topics, products, subjects, and services. People’s opinions can be beneficial to corporations, governments, and individuals for collecting information and making decisions based on opinion. However, the sentiment analysis and evaluation procedure face numerous challenges. These challenges create impediments to accurately interpreting sentiments and determining the appropriate sentiment polarity. Sentiment analysis identifies and extracts subjective information from the text using natural language processing and text mining. This article discusses a complete overview of the method for completing this task as well as the applications of sentiment analysis. Then, it evaluates, compares, and investigates the approaches used to gain a comprehensive understanding of their advantages and disadvantages. Finally, the challenges of sentiment analysis are examined in order to define future directions.

...read moreread less

138 citations

Journal Article•DOI•

Digital Emotion Contagion.

[...]

Amit Goldenberg¹, James J. Gross²•Institutions (2)

Harvard University¹, Stanford University²

01 Apr 2020-Trends in Cognitive Sciences

TL;DR: It is suggested that one unique feature of digital emotion contagion is that it is mediated by digital media platforms that are motivated to upregulate user emotions.

...read moreread less

105 citations

Cites background from "A survey of multimodal sentiment an..."

...Digital media activity allows researchers to detect user emotions from different types of signals [108]....
[...]

Journal Article•DOI•

Hybrid context enriched deep learning model for fine-grained sentiment analysis in textual and visual semiotic modality social data

[...]

Akshi Kumar¹, Kathiravan Srinivasan², Wen-Huang Cheng³, Albert Y. Zomaya⁴•Institutions (4)

Delhi Technological University¹, VIT University², National Chiao Tung University³, University of Sydney⁴

01 Jan 2020-Information Processing and Management

TL;DR: A hybrid deep learning model for fine-grained sentiment prediction in real-time multimodal data that reinforces the strengths of deep learning nets in combination to machine learning to deal with two specific semiotic systems, namely the textual and visual systems.

...read moreread less

Abstract: Detecting sentiments in natural language is tricky even for humans, making its automated detection more complicated. This research proffers a hybrid deep learning model for fine-grained sentiment prediction in real-time multimodal data. It reinforces the strengths of deep learning nets in combination to machine learning to deal with two specific semiotic systems, namely the textual (written text) and visual (still images) and their combination within the online content using decision level multimodal fusion. The proposed contextual ConvNet-SVMBoVW model, has four modules, namely, the discretization, text analytics, image analytics, and decision module. The input to the model is multimodal text, m e {text, image, info-graphic}. The discretization module uses Google Lens to separate the text from the image, which is then processed as discrete entities and sent to the respective text analytics and image analytics modules. Text analytics module determines the sentiment using a hybrid of a convolution neural network (ConvNet) enriched with the contextual semantics of SentiCircle. An aggregation scheme is introduced to compute the hybrid polarity. A support vector machine (SVM) classifier trained using bag-of-visual-words (BoVW) for predicting the visual content sentiment. A Boolean decision module with a logical OR operation is augmented to the architecture which validates and categorizes the output on the basis of five fine-grained sentiment categories (truth values), namely ‘highly positive,’ ‘positive,’ ‘neutral,’ ‘negative’ and ‘highly negative.’ The accuracy achieved by the proposed model is nearly 91% which is an improvement over the accuracy obtained by the text and image modules individually.

...read moreread less

96 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72

Collapse

References

PDF

Open Access

More filters

Proceedings Article•

ImageNet Classification with Deep Convolutional Neural Networks

[...]

Alex Krizhevsky¹, Ilya Sutskever¹, Geoffrey E. Hinton¹•Institutions (1)

University of Toronto¹

03 Dec 2012

TL;DR: The state-of-the-art performance of CNNs was achieved by Deep Convolutional Neural Networks (DCNNs) as discussed by the authors, which consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.

...read moreread less

Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overriding in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.

...read moreread less

73,978 citations

Journal Article•DOI•

Distinctive Image Features from Scale-Invariant Keypoints

[...]

David G. Lowe¹•Institutions (1)

University of British Columbia¹

01 Nov 2004-International Journal of Computer Vision

TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.

...read moreread less

Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.

...read moreread less

46,906 citations

Journal Article•DOI•

Gradient-based learning applied to document recognition

[...]

Yann LeCun¹, Léon Bottou², Léon Bottou³, Yoshua Bengio³, Yoshua Bengio⁴, Yoshua Bengio⁵, Patrick Haffner³ - Show less +3 more•Institutions (5)

Bell Labs¹, École Normale Supérieure², AT&T³, Alcatel-Lucent⁴, École Polytechnique de Montréal⁵

01 Jan 1998

TL;DR: In this article, a graph transformer network (GTN) is proposed for handwritten character recognition, which can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters.

...read moreread less

Abstract: Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of 2D shapes, are shown to outperform all other techniques. Real-life document recognition systems are composed of multiple modules including field extraction, segmentation recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTN), allows such multimodule systems to be trained globally using gradient-based methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training, and the flexibility of graph transformer networks. A graph transformer network for reading a bank cheque is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal cheques. It is deployed commercially and reads several million cheques per day.

...read moreread less

42,067 citations

Journal Article•DOI•

ImageNet Large Scale Visual Recognition Challenge

[...]

Olga Russakovsky¹, Jia Deng², Hao Su¹, Jonathan Krause¹, Sanjeev Satheesh¹, Sean Ma¹, Zhiheng Huang¹, Andrej Karpathy¹, Aditya Khosla³, Michael S. Bernstein¹, Alexander C. Berg⁴, Li Fei-Fei¹ - Show less +8 more•Institutions (4)

Stanford University¹, University of Michigan², Massachusetts Institute of Technology³, University of North Carolina at Chapel Hill⁴

01 Dec 2015-International Journal of Computer Vision

TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) as mentioned in this paper is a benchmark in object category classification and detection on hundreds of object categories and millions of images, which has been run annually from 2010 to present, attracting participation from more than fifty institutions.

...read moreread less

Abstract: The ImageNet Large Scale Visual Recognition Challenge is a benchmark in object category classification and detection on hundreds of object categories and millions of images. The challenge has been run annually from 2010 to present, attracting participation from more than fifty institutions. This paper describes the creation of this benchmark dataset and the advances in object recognition that have been possible as a result. We discuss the challenges of collecting large-scale ground truth annotation, highlight key breakthroughs in categorical object recognition, provide a detailed analysis of the current state of the field of large-scale image classification and object detection, and compare the state-of-the-art computer vision accuracy with human accuracy. We conclude with lessons learned in the 5 years of the challenge, and propose future directions and improvements.

...read moreread less

30,811 citations

Proceedings Article•

Distributed Representations of Words and Phrases and their Compositionality

[...]

Tomas Mikolov¹, Ilya Sutskever¹, Kai Chen¹, Greg S. Corrado¹, Jeffrey Dean¹ - Show less +1 more•Institutions (1)

Google¹

05 Dec 2013

TL;DR: This paper presents a simple method for finding phrases in text, and shows that learning good vector representations for millions of phrases is possible and describes a simple alternative to the hierarchical softmax called negative sampling.

...read moreread less

Abstract: The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.

...read moreread less

24,012 citations

"A survey of multimodal sentiment an..." refers background or methods in this paper

...To extract text features, they used text2vec [19], which in addition to part-of-speech features were used to train a deep CNN....
[...]
...Product review Lexicon-based dictionaries; bag-of-words; word embeddings [19] in combination with classifiers such as...
[...]