Open Access Proceedings Article
Multimodal Deep Learning
Jiquan Ngiam, Aditya Khosla, Mingyu Kim, Juhan Nam, Honglak Lee, Andrew Y. Ng
ICML 2011, pp. 689-696
TL;DR: This work presents a series of tasks for multimodal learning and shows how to train deep networks that learn features to address these tasks, and demonstrates cross-modality feature learning, where better features for one modality can be learned if multiple modalities are present at feature learning time.
Abstract: Deep networks have been successfully applied to unsupervised feature learning for single modalities (e.g., text, images or audio). In this work, we propose a novel application of deep networks to learn features over multiple modalities. We present a series of tasks for multimodal learning and show how to train deep networks that learn features to address these tasks. In particular, we demonstrate cross modality feature learning, where better features for one modality (e.g., video) can be learned if multiple modalities (e.g., audio and video) are present at feature learning time. Furthermore, we show how to learn a shared representation between modalities and evaluate it on a unique task, where the classifier is trained with audio-only data but tested with video-only data and vice-versa. Our models are validated on the CUAVE and AVLetters datasets on audio-visual speech classification, demonstrating best published visual speech classification on AVLetters and effective shared representation learning.
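The shared-representation idea in the abstract can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's actual architecture (which uses deep, RBM-pretrained bimodal autoencoders): audio and video features feed one shared hidden code, both modalities are reconstructed from it, and at test time a missing modality is simply zeroed out. All dimensions and the single-hidden-layer design are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

D_AUDIO, D_VIDEO, D_SHARED = 100, 300, 64  # illustrative sizes

# Randomly initialised weights stand in for learned parameters.
W_a = rng.normal(0, 0.01, (D_AUDIO, D_SHARED))   # audio -> shared
W_v = rng.normal(0, 0.01, (D_VIDEO, D_SHARED))   # video -> shared
U_a = rng.normal(0, 0.01, (D_SHARED, D_AUDIO))   # shared -> audio
U_v = rng.normal(0, 0.01, (D_SHARED, D_VIDEO))   # shared -> video

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encode(audio, video):
    """One shared code from whichever modalities are present."""
    return sigmoid(audio @ W_a + video @ W_v)

def reconstruct(shared):
    """Both modalities are reconstructed from the shared code."""
    return shared @ U_a, shared @ U_v

# Cross-modality setting: only video is available at test time,
# yet the model still produces an audio reconstruction.
video = rng.normal(size=(5, D_VIDEO))
audio_absent = np.zeros((5, D_AUDIO))   # missing modality zeroed out

shared = encode(audio_absent, video)
audio_hat, video_hat = reconstruct(shared)
```

Training would minimise the summed reconstruction error of both modalities so that the shared code captures correlations between them; that objective is what lets a classifier trained on audio-only codes be tested on video-only codes.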
Citations
Proceedings Article
Enabling Edge Devices that Learn from Each Other: Cross Modal Training for Activity Recognition
TL;DR: RecycleML uses cross-modal transfer to accelerate the learning of edge devices across different sensing modalities, reducing the amount of required labeled data by at least 90% and speeding up the training process by up to 50 times compared to training the edge device from scratch.
Journal Article
Multifactorial deep learning reveals pan-cancer genomic tumor clusters with distinct immunogenomic landscape and response to immunotherapy
Feng Xie, Jianjun Zhang, Jiayin Wang, Alexandre Reuben, Xu Wei, Xin Yi, Frederick S. Varn, Yongsheng Ye, Junwen Cheng, Miao Yu, Yue Wang, Yufeng Liu, Mingchao Xie, Peng Du, Ke Ma, Xin Ma, Penghui Zhou, Shengli Yang, Yaobing Chen, Guoping Wang, Xuefeng Xia, Zhongxing Liao, John V. Heymach, Ignacio I. Wistuba, P. Andrew Futreal, Kai Ye, Chao Cheng, Tian Xia +28 more
TL;DR: This study provides a proof of principle that deep learning models may discover intrinsic statistical cross-modality correlations in multifactorial input data to dissect the molecular mechanisms underlying primary resistance to immunotherapy, which likely involves multiple factors from both the tumor and host at different molecular levels.
Journal Article
A Joint Deep Boltzmann Machine (jDBM) Model for Person Identification Using Mobile Phone Data
TL;DR: The experimental results show that the joint representations obtained from the proposed jDBM model are robust to noise and missing information and a higher accuracy can be achieved compared to the greedy layer-wise initialization.
Proceedings Article
Multi-modal Sensor Registration for Vehicle Perception via Deep Neural Networks
TL;DR: A deep learning method is developed that takes multiple channels of heterogeneous data to detect misalignment of LiDAR-video inputs, and is evaluated on the Ford LiDAR-video driving test data set.
Proceedings Article
A computationally efficient multi-modal classification approach of disaster-related Twitter images
TL;DR: A novel multi-modal two-stage framework is proposed that relies on computationally inexpensive visual and semantic features to analyze Twitter data, motivating an updated folk statement: "an ANNOTATED image is worth a thousand words".
References
Proceedings Article
Histograms of oriented gradients for human detection
Navneet Dalal, Bill Triggs
TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.
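The core HOG step, accumulating gradient magnitudes into orientation bins within a cell, can be sketched as follows. This is a simplified illustration under assumed defaults (unsigned gradients, 9 orientation bins, a single cell); the full descriptor adds a dense grid of cells, block grouping, and contrast normalisation.

```python
import numpy as np

def cell_histogram(patch, n_bins=9):
    """Orientation histogram for one cell, weighted by gradient magnitude."""
    gy, gx = np.gradient(patch.astype(float))        # axis-0 then axis-1 derivatives
    mag = np.hypot(gx, gy)                           # gradient magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0     # unsigned orientation in [0, 180)
    bins = (ang / (180.0 / n_bins)).astype(int) % n_bins
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), mag.ravel())       # unbuffered accumulation
    return hist

# A horizontal intensity ramp: all gradient energy points along x (0 degrees).
patch = np.tile(np.arange(8.0), (8, 1))
h = cell_histogram(patch)
```

For the ramp, every pixel's gradient has orientation 0 degrees, so all magnitude lands in the first bin; a real implementation would also interpolate votes between neighbouring bins to reduce aliasing.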
Journal Article
Reducing the Dimensionality of Data with Neural Networks
TL;DR: In this article, an effective way of initializing the weights that allows deep autoencoder networks to learn low-dimensional codes that work much better than principal components analysis as a tool to reduce the dimensionality of data is described.
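The reconstruction objective behind autoencoder-based dimensionality reduction can be shown with a deliberately tiny example. The paper's networks are deep and nonlinear, with RBM-based weight initialisation; this linear single-layer version, trained by plain gradient descent on synthetic data, only illustrates the compress-then-reconstruct objective (all sizes and the learning rate are assumptions).

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data that genuinely lies near a 2-D subspace of R^10.
latent = rng.normal(size=(200, 2))
mix = rng.normal(size=(2, 10))
X = latent @ mix + 0.01 * rng.normal(size=(200, 10))

W = rng.normal(0, 0.1, (10, 2))   # encoder: 10-D input -> 2-D code
V = rng.normal(0, 0.1, (2, 10))   # decoder: 2-D code -> 10-D reconstruction
lr = 0.01

def loss(X, W, V):
    """Mean squared reconstruction error."""
    R = X @ W @ V - X
    return (R ** 2).mean()

initial = loss(X, W, V)
for _ in range(500):
    R = X @ W @ V - X                       # reconstruction residual
    grad_V = (X @ W).T @ R * (2 / X.size)   # dL/dV
    grad_W = X.T @ R @ V.T * (2 / X.size)   # dL/dW
    W -= lr * grad_W
    V -= lr * grad_V
final = loss(X, W, V)
```

A linear autoencoder like this can at best recover the principal subspace; the point of the paper is that deep nonlinear autoencoders, given good initialisation, find lower-error codes than PCA.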
Journal Article
A fast learning algorithm for deep belief nets
TL;DR: A fast, greedy algorithm is derived that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory.
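The greedy layer-at-a-time scheme has a simple shape in code: fit a layer on the previous layer's output, freeze it, and feed its output to the next layer. As a hedged stand-in for RBM learning (which the actual algorithm uses), each layer here is just a truncated-SVD projection followed by a nonlinearity; the loop structure, not the per-layer fit, is the point.

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_layer(H, width):
    """Stand-in 'training': project onto the top `width` right singular
    vectors of the layer input (a real DBN would fit an RBM here)."""
    _, _, Vt = np.linalg.svd(H - H.mean(0), full_matrices=False)
    return Vt[:width].T            # (d_in, width) weight matrix

def forward(H, W):
    return np.tanh(H @ W)

X = rng.normal(size=(100, 50))
widths = [32, 16, 8]               # illustrative layer sizes

weights, H = [], X
for w in widths:
    W = fit_layer(H, w)            # greedy step: uses only H, no labels
    weights.append(W)
    H = forward(H, W)              # frozen layer's output feeds the next
```

Each greedy step only ever sees the representation produced so far, which is what makes the procedure fast and lets it initialise a deep network one layer at a time before any global fine-tuning.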
Proceedings Article
Extracting and composing robust features with denoising autoencoders
TL;DR: This work introduces and motivates a new training principle for unsupervised learning of a representation, based on the idea of making the learned representations robust to partial corruption of the input pattern.
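The denoising principle is a one-line change to the autoencoder wiring: corrupt the input, but score the reconstruction against the clean original. The sketch below shows only that wiring, with masking corruption and random (untrained) weights; sizes and the corruption probability are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

def corrupt(x, p=0.3):
    """Masking corruption: each entry is zeroed independently with prob p."""
    return x * (rng.random(x.shape) >= p)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

D, H = 20, 8
W = rng.normal(0, 0.1, (D, H))     # encoder weights
b = np.zeros(H)
V = rng.normal(0, 0.1, (H, D))     # decoder weights
c = np.zeros(D)

x = rng.random(D)                  # clean input
x_tilde = corrupt(x)               # corrupted input fed to the encoder
h = sigmoid(x_tilde @ W + b)       # hidden representation
x_hat = sigmoid(h @ V + c)         # reconstruction
loss = np.mean((x_hat - x) ** 2)   # scored against the CLEAN input
```

Because the target is the uncorrupted input, the encoder cannot simply copy its input; it must capture dependencies between dimensions, which is what makes the learned features robust.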
Journal Article
Hearing lips and seeing voices
Harry McGurk, John Macdonald
TL;DR: The study reported here demonstrates a previously unrecognised influence of vision upon speech perception: on being shown a film of a young woman's talking head in which repeated utterances of the syllable [ba] had been dubbed onto lip movements for [ga], subjects reported hearing a third syllable, [da].