Open Access · Journal Article · DOI

Deep Multimodal Representation Learning: A Survey

TLDR
This survey highlights the key issues of newly developed technologies, such as the encoder-decoder model, generative adversarial networks, and the attention mechanism, from a multimodal representation learning perspective; to the best of the authors' knowledge, these have never been reviewed previously.
Abstract
Multimodal representation learning, which aims to narrow the heterogeneity gap among different modalities, plays an indispensable role in the utilization of ubiquitous multimodal data. Owing to the powerful representation ability afforded by multiple levels of abstraction, deep learning-based multimodal representation learning has attracted much attention in recent years. In this paper, we provide a comprehensive survey of deep multimodal representation learning, a topic that has never before been surveyed in its entirety. To facilitate the discussion of how the heterogeneity gap is narrowed, we categorize deep multimodal representation learning methods into three frameworks, according to the underlying structures in which different modalities are integrated: joint representation, coordinated representation, and encoder-decoder. Additionally, we review some typical models in this area, ranging from conventional models to newly developed technologies. This paper highlights the key issues of newly developed technologies, such as the encoder-decoder model, generative adversarial networks, and the attention mechanism, from a multimodal representation learning perspective; to the best of our knowledge, these have never been reviewed previously, even though they have become the focus of much contemporary research. For each framework or model, we discuss its basic structure, learning objective, application scenarios, key issues, advantages, and disadvantages, so that both new and experienced researchers can benefit from this survey. Finally, we suggest some important directions for future work.
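As a minimal illustration of the first two frameworks, the sketch below (our own toy example in PyTorch, not code from the survey; the feature dimensions are hypothetical) contrasts a joint model, which fuses image and text features into a single shared vector, with a coordinated model, which keeps per-modality representations and aligns them through a ranking loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointRepresentation(nn.Module):
    """Joint framework: modalities are fused into one shared vector."""
    def __init__(self, img_dim=2048, txt_dim=300, hidden=512):
        super().__init__()
        self.img_enc = nn.Linear(img_dim, hidden)
        self.txt_enc = nn.Linear(txt_dim, hidden)
        self.fusion = nn.Linear(2 * hidden, hidden)  # fusion by concatenation

    def forward(self, img, txt):
        h = torch.cat([F.relu(self.img_enc(img)), F.relu(self.txt_enc(txt))], dim=-1)
        return self.fusion(h)  # a single joint representation

class CoordinatedRepresentation(nn.Module):
    """Coordinated framework: separate spaces linked by a similarity constraint."""
    def __init__(self, img_dim=2048, txt_dim=300, hidden=512):
        super().__init__()
        self.img_enc = nn.Linear(img_dim, hidden)
        self.txt_enc = nn.Linear(txt_dim, hidden)

    def forward(self, img, txt):
        # each modality keeps its own embedding; the loss below coordinates them
        return (F.normalize(self.img_enc(img), dim=-1),
                F.normalize(self.txt_enc(txt), dim=-1))

def ranking_loss(z_img, z_txt, margin=0.2):
    """Hinge-based ranking loss pulling matched pairs above mismatched ones."""
    scores = z_img @ z_txt.t()                   # pairwise cosine similarities
    pos = scores.diag().unsqueeze(1)             # scores of matched pairs
    cost = (margin + scores - pos).clamp(min=0)  # hinge on mismatched pairs
    eye = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    return cost.masked_fill(eye, 0.0).mean()     # ignore the matched pairs
```

The third framework, encoder-decoder, would instead map one modality to an intermediate representation and decode it into another, as in image captioning.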



Citations
Proceedings Article · DOI

MISA: Modality-Invariant and -Specific Representations for Multimodal Sentiment Analysis

TL;DR: In this article, the authors propose MISA, a multimodal framework that projects each modality into two distinct subspaces: in the first, representations across modalities learn their commonalities and reduce the modality gap, while the second captures modality-specific features.
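A rough sketch of that idea (illustrative layer sizes and feature dimensions, not the authors' implementation): a single encoder shared across modalities yields the invariant view, while per-modality private encoders yield the specific one.

```python
import torch
import torch.nn as nn

class TwoSubspaceEncoder(nn.Module):
    def __init__(self, dims, hidden=128):
        super().__init__()
        # project each modality to a common width first
        self.project = nn.ModuleDict({m: nn.Linear(d, hidden) for m, d in dims.items()})
        # one invariant encoder shared by all modalities ...
        self.shared = nn.Sequential(nn.Linear(hidden, hidden), nn.Tanh())
        # ... and one private (modality-specific) encoder per modality
        self.private = nn.ModuleDict(
            {m: nn.Sequential(nn.Linear(hidden, hidden), nn.Tanh()) for m in dims})

    def forward(self, inputs):
        out = {}
        for m, x in inputs.items():
            h = torch.relu(self.project[m](x))
            out[m] = (self.shared(h), self.private[m](h))  # (invariant, specific)
        return out

enc = TwoSubspaceEncoder({"text": 768, "audio": 74, "video": 47})  # assumed dims
reps = enc({m: torch.randn(8, d) for m, d in
            [("text", 768), ("audio", 74), ("video", 47)]})
# fusion for the task prediction would concatenate all six resulting vectors
```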
Journal Article · DOI

A bird's-eye view of deep learning in bioimage analysis.

TL;DR: This review takes a bird's-eye view of the past, present, and future developments of deep learning, moving from science at large, to biomedical imaging, and to bioimage analysis in particular.
Proceedings Article · DOI

CH-SIMS: A Chinese Multimodal Sentiment Analysis Dataset with Fine-grained Annotation of Modality

TL;DR: This paper introduces CH-SIMS, a Chinese single- and multi-modal sentiment analysis dataset containing 2,281 refined video segments in the wild with both multimodal and independent unimodal annotations, and proposes a multi-task learning framework based on late fusion as the baseline.
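As an illustration of what such a late-fusion multi-task baseline could look like (a hypothetical sketch, not the authors' code; the feature dimensions are made up), each modality gets its own head for the unimodal annotation, while the concatenated features drive the multimodal prediction:

```python
import torch
import torch.nn as nn

class LateFusionMultiTask(nn.Module):
    def __init__(self, dims, hidden=128):
        super().__init__()
        self.enc = nn.ModuleDict({m: nn.Linear(d, hidden) for m, d in dims.items()})
        # one head per unimodal annotation, plus one for the multimodal label
        self.uni_heads = nn.ModuleDict({m: nn.Linear(hidden, 1) for m in dims})
        self.multi_head = nn.Linear(hidden * len(dims), 1)

    def forward(self, feats):
        hs = {m: torch.relu(self.enc[m](x)) for m, x in feats.items()}
        uni = {m: self.uni_heads[m](h) for m, h in hs.items()}  # unimodal tasks
        fused = torch.cat([hs[m] for m in self.enc], dim=-1)    # late fusion
        return uni, self.multi_head(fused)                      # multimodal task

model = LateFusionMultiTask({"text": 768, "audio": 33, "video": 709})  # assumed dims
```

Training would sum a loss over the unimodal outputs and the multimodal output, which is the essence of multi-task learning with late fusion.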
References
Proceedings Article · DOI

Deep Residual Learning for Image Recognition

TL;DR: In this article, the authors propose a residual learning framework that eases the training of networks substantially deeper than those used previously; their residual nets won 1st place in the ILSVRC 2015 classification task.
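As a quick illustration, here is a minimal PyTorch sketch of one residual block with an identity shortcut; the 3x3 convolutions and batch normalization follow the paper, while the channel count is left generic.

```python
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    """y = F(x) + x: the block learns the residual F, not the full mapping."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = self.bn2(self.conv2(F.relu(self.bn1(self.conv1(x)))))
        return F.relu(out + x)  # identity shortcut: add the input back
```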
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: A large, deep convolutional neural network, consisting of five convolutional layers (some followed by max-pooling layers) and three fully-connected layers with a final 1000-way softmax, achieved state-of-the-art classification performance on ImageNet.
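The architecture described can be sketched directly in PyTorch; the layer hyperparameters below follow the paper (for 227x227 RGB inputs), though details such as local response normalization and the original two-GPU split are omitted.

```python
import torch.nn as nn

# AlexNet-style stack: 5 conv layers (some followed by max-pooling) + 3 FC layers
alexnet = nn.Sequential(
    nn.Conv2d(3, 96, 11, stride=4), nn.ReLU(), nn.MaxPool2d(3, 2),
    nn.Conv2d(96, 256, 5, padding=2), nn.ReLU(), nn.MaxPool2d(3, 2),
    nn.Conv2d(256, 384, 3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 384, 3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, 3, padding=1), nn.ReLU(), nn.MaxPool2d(3, 2),
    nn.Flatten(),
    nn.Linear(256 * 6 * 6, 4096), nn.ReLU(), nn.Dropout(),
    nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(),
    nn.Linear(4096, 1000),  # 1000-way logits; softmax is applied in the loss
)
```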
Journal Article · DOI

Long short-term memory

TL;DR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete time steps by enforcing constant error flow through constant error carousels within special units.
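A short usage sketch with PyTorch's built-in LSTM; the additively updated cell state plays the role of the constant error carousel mentioned above.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)
x = torch.randn(4, 1000, 16)   # 4 sequences, 1000 time steps each
out, (h_n, c_n) = lstm(x)      # c_n: final cell state, carried across all steps
print(out.shape, c_n.shape)    # torch.Size([4, 1000, 32]) torch.Size([1, 4, 32])
```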
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
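The design can be summarized as stacks of small 3x3 convolutions; the sketch below builds the 13-convolution trunk of the 16-layer configuration (a simplified reading of the paper's configuration table, omitting the three fully-connected layers).

```python
import torch.nn as nn

def vgg_block(in_ch, out_ch, n_convs):
    """n_convs stacked 3x3 convolutions followed by 2x2 max-pooling."""
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                   nn.ReLU()]
    layers.append(nn.MaxPool2d(2, 2))
    return nn.Sequential(*layers)

# 2 + 2 + 3 + 3 + 3 = 13 conv layers; with 3 FC layers this gives VGG-16
trunk = nn.Sequential(
    vgg_block(3, 64, 2), vgg_block(64, 128, 2),
    vgg_block(128, 256, 3), vgg_block(256, 512, 3), vgg_block(512, 512, 3),
)
```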
Journal Article · DOI

Gradient-based learning applied to document recognition

TL;DR: In this article, gradient-based learning is applied to document recognition: convolutional neural networks are shown to synthesize a complex decision surface that can classify high-dimensional patterns such as handwritten characters, and graph transformer networks (GTNs) are proposed for globally training multi-module recognition systems.
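The convolutional network this paper is best known for is LeNet-5; below is a close (not exact) PyTorch approximation for 32x32 grayscale character images, substituting average pooling for the original trainable subsampling layers.

```python
import torch.nn as nn

lenet5 = nn.Sequential(
    nn.Conv2d(1, 6, 5), nn.Tanh(), nn.AvgPool2d(2),   # 32x32 -> 28x28 -> 14x14
    nn.Conv2d(6, 16, 5), nn.Tanh(), nn.AvgPool2d(2),  # 14x14 -> 10x10 -> 5x5
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.Tanh(),
    nn.Linear(120, 84), nn.Tanh(),
    nn.Linear(84, 10),  # 10 digit classes
)
```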