Multimodal Deep Learning
Citations
3,351 citations
Cites background from "Multimodal Deep Learning"
...Hence, deep neural networks have been explored for domain adaptation (Glorot et al., 2011; Chen et al., 2012), multimodal and multi-source learning problems (Ngiam et al., 2011; Ge et al., 2013), where significant performance gains have been witnessed....
[...]
..., 2012), multimodal and multi-source learning problems (Ngiam et al., 2011; Ge et al., 2013), where significant performance gains have been obtained....
[...]
3,097 citations
2,900 citations
Cites methods from "Multimodal Deep Learning"
...Other methods tested include a heterogeneous spectral mapping approach proposed by Shi [105], a method proposed by Vinokourov [120], and a multimodal deep learning approach proposed by Ngiam [79]....
[...]
...Ngiam J, Khosla A, Kim M, Nam J, Lee H, Ng AY....
[...]
...The results of the experiment from best to worst performance are Zhou [144], Ngiam [79], Vinokourov [120], and Shi [105]....
[...]
2,817 citations
Cites background from "Multimodal Deep Learning"
...[268, 269] propose and evaluate an application of deep networks to learn features over audio/speech and image/video modalities....
[...]
...The results presented in [269, 268] show that learning video features with both video and audio outperforms that with only video data....
[...]
...While the deep generative architecture for multimodal learning described in [268, 269] is based on non-probabilistic autoencoder neural...
[...]
2,404 citations
Cites background from "Multimodal Deep Learning"
...Compared with bimodal AEs and bimodal DBNs [123,145], the Corr-AEs focus more on the correlation across data than the complementarity learned from different modalities....
[...]
References
31,952 citations
"Multimodal Deep Learning" refers to methods in this paper
...We used an off-the-shelf object detector [12] with median filtering over time to extract the mouth regions....
[...]
16,717 citations
15,055 citations
"Multimodal Deep Learning" refers to background or methods in this paper
...We first describe the restricted Boltzmann machine (RBM) [5, 6] followed by the sparsity regularization method [8]....
[...]
...While self-taught learning was first motivated with sparse coding, recent work on deep learning [5, 6, 7] has examined how deep sigmoidal networks can be trained to produce useful representations for handwritten digits and text....
[...]
6,816 citations
"Multimodal Deep Learning" refers to methods in this paper
...Instead, we propose a training method inspired by denoising autoencoders [11]....
[...]
5,506 citations
"Multimodal Deep Learning" refers to background in this paper
...The McGurk effect [1] refers to an audio-visual perception phenomenon where a visual /ga/ with an audio /ba/ is perceived as /da/ by most subjects....
[...]
...This was first exemplified in the McGurk effect [1] where a visual /ga/ with a voiced /ba/ is perceived as /da/ by most subjects....
[...]