Representation Learning: A Review and New Perspectives
Citations
[...]
[...]
Cites background from "Representation Learning: A Review and New Perspectives"
...y the variational bound, a noisy data reconstruction term, exposing a novel connection between auto-encoders and stochastic variational inference. In contrast to a typical objective for auto-encoders [BCV13], all parameter updates, including those of the noise distribution, correspond to optimization of the variational lower bound on the marginal likelihood. From the learned generative model it is strai...
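The variational lower bound this excerpt refers to can be sketched in standard notation (generic symbols, not taken from the citing paper): a noisy reconstruction term plus a KL regularizer toward the prior.

```latex
\log p_\theta(x) \;\ge\;
\underbrace{\mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]}_{\text{noisy reconstruction term}}
\;-\;
\underbrace{D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\big\|\,p(z)\big)}_{\text{regularizer}}
```

Maximizing the right-hand side with respect to both the decoder parameters $\theta$ and the noise/encoder parameters $\phi$ is what makes every update correspond to optimizing a bound on the marginal likelihood.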
[...]
...der the autoencoding model [VLL+10], i.e. the negative reconstruction error. However, it is well known that this reconstruction criterion is in itself not sufficient for learning useful representations [BCV13]. Regularization techniques have been proposed to make autoencoders learn useful representations, such as denoising, contractive and sparse autoencoder variants [BCV13]. Related are also encoder-decod...
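The denoising variant mentioned above changes the criterion in one small way: the model must reconstruct the clean input from a corrupted copy. A minimal sketch in NumPy (tied weights; all sizes, names, and the corruption rate are illustrative, not from the cited works):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Hypothetical tiny autoencoder: tied weights W, hidden bias b, visible bias c.
d, h = 8, 4
W = rng.normal(scale=0.1, size=(d, h))
b = np.zeros(h)
c = np.zeros(d)

def dae_loss(x, corruption=0.3):
    """Denoising criterion: reconstruct the CLEAN x from a masked copy."""
    mask = rng.random(x.shape) > corruption      # masking noise: zero a fraction
    x_tilde = x * mask                           # corrupted input
    z = sigmoid(x_tilde @ W + b)                 # encoder
    x_hat = sigmoid(z @ W.T + c)                 # decoder (tied weights)
    return np.mean((x - x_hat) ** 2)             # error against the clean input

x = rng.random(d)
loss = dae_loss(x)
```

The key design point is that the reconstruction target is the uncorrupted `x`, so the plain reconstruction criterion is no longer trivially satisfied by the identity map.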
[...]
Cites background from "Representation Learning: A Review and New Perspectives"
...vised Automated Mathematician / EURISKO (Lenat, 1983; Lenat and Brown, 1984) continually learns concepts by combining previously learnt concepts. Such hierarchical representation learning (Ring, 1994; Bengio et al., 2013; Deng and Yu, 2014) is also a recurring theme of DL NNs for SL (Sec. 5), UL-aided SL (Sec. 5.8, 5.10, 5.15), and hierarchical RL (Sec. 6.5). Often, abstract hierarchical representations are natural b...
[...]
...ce encoder for RL (Gisslen et al., 2011). 6.5 Deep Hierarchical RL (HRL) and Subgoal Learning with FNNs and RNNs Multiple learnable levels of abstraction (Fu, 1977; Lenat and Brown, 1984; Ring, 1994; Bengio et al., 2013; Deng and Yu, 2014) seem as important for RL as for SL. Work on NN-based Hierarchical RL (HRL) has been published since the early 1990s. In particular, gradient-based subgoal discovery with FNNs or R...
[...]
...rough a deep NN will create a factorial code (a code with statistically independent components) of the ensemble (Barlow et al., 1989; Barlow, 1989), to disentangle the unknown factors of variation (e.g., Bengio et al., 2013). Such codes may be sparse and can be advantageous for (1) data compression, (2) speeding up subsequent BP (Becker, 1991), (3) trivialising the task of subsequent naive yet optimal Bayes classifiers (S...
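The defining property of a factorial code can be checked numerically: when code components are statistically independent, the joint probability of any code pattern equals the product of its marginals. A toy sketch (NumPy; the per-unit probabilities are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# A factorial code by construction: each binary code unit is drawn
# independently, so the joint distribution factorizes over units.
n, k = 100_000, 3
p = np.array([0.2, 0.5, 0.8])                  # illustrative "on" probabilities
codes = (rng.random((n, k)) < p).astype(int)

# Empirical joint probability of the all-ones pattern ...
joint = np.mean(np.all(codes == 1, axis=1))
# ... versus the product of the empirical per-unit marginals.
product = np.prod(codes.mean(axis=0))
```

For a non-factorial (statistically dependent) code the two quantities would differ; their agreement here is exactly the redundancy-reduction property Barlow's work aims for.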
[...]
References
"Representation Learning: A Review and New Perspectives" refers background or methods in this paper
...…a non-parametric approach, based on a training set nearest neighbor graph (Schölkopf et al., 1998; Roweis and Saul, 2000; Tenenbaum et al., 2000; Brand, 2003; Belkin and Niyogi, 2003; Donoho and Grimes, 2003; Weinberger and Saul, 2004; Hinton and Roweis, 2003; van der Maaten and Hinton, 2008)....
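The non-parametric methods listed in this excerpt all start from a nearest-neighbor graph over the training set. A minimal brute-force construction (plain NumPy; function name and sizes are illustrative, not any library's API):

```python
import numpy as np

def knn_graph(X, k):
    """Symmetric k-nearest-neighbor adjacency matrix (brute-force sketch)."""
    # Pairwise Euclidean distances between all training points.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)                  # exclude self-matches
    nbrs = np.argsort(D, axis=1)[:, :k]          # k closest points per row
    A = np.zeros(D.shape, dtype=bool)
    A[np.arange(len(X))[:, None], nbrs] = True
    return A | A.T                               # symmetrize the graph

X = np.random.default_rng(0).random((20, 2))
A = knn_graph(X, k=3)
```

Methods like Isomap, LLE, and Laplacian eigenmaps then differ mainly in what they compute on top of this graph (shortest paths, local reconstruction weights, or a graph Laplacian).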
[...]
...Corruptions considered in Vincent et al. (2010) include additive isotropic Gaussian noise, salt and pepper noise for gray-scale images, and masking noise (salt or pepper only)....
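The three corruption processes named in this excerpt are simple to state directly. A NumPy sketch (noise levels are illustrative; using the data's own min/max as the salt and pepper values is an assumption, since for gray-scale images these are typically 0 and 1):

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_noise(x, sigma=0.1):
    """Additive isotropic Gaussian corruption."""
    return x + rng.normal(scale=sigma, size=x.shape)

def masking_noise(x, frac=0.3):
    """Force a random fraction of input components to zero."""
    out = x.copy()
    out[rng.random(x.shape) < frac] = 0.0
    return out

def salt_and_pepper(x, frac=0.3):
    """Set a random fraction of pixels to the min (pepper) or max (salt)."""
    out = x.copy()
    hit = rng.random(x.shape) < frac
    out[hit] = rng.choice([x.min(), x.max()], size=hit.sum())
    return out

img = rng.random((4, 4))   # toy "gray-scale image"
```

Each corruption destroys information in a different way, which is what the denoising criterion exploits: the model must infer the missing or perturbed values from the surviving structure.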
[...]
...low-dimensional embedding coordinate “parameters” for each training point, these coordinates are obtained through an explicitly parametrized function, as with the parametric variant (van der Maaten, 2009) of t-SNE (van der Maaten and Hinton, 2008)....
[...]
...…controlled number of free parameters in such parametric methods, compared to their pure non-parametric counterparts, forces models to generalize the manifold shape non-locally (Bengio et al., 2006b), which can translate into better features and final performance (van der Maaten and Hinton, 2008)....
[...]
"Representation Learning: A Review and New Perspectives" refers methods in this paper
...Ngiam et al. (2011) have used Hybrid Monte Carlo (Neal, 1993), but other options include contrastive divergence (Hinton, 1999; Hinton et al....
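Contrastive divergence, mentioned here as the alternative to Hybrid Monte Carlo, replaces the intractable model expectation in the RBM gradient with a single Gibbs step from the data. A toy CD-1 weight update (NumPy; sizes, learning rate, and names are illustrative, not from the cited works):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Tiny binary RBM: visible-hidden weights W, biases bv (visible), bh (hidden).
nv, nh = 6, 4
W = rng.normal(scale=0.1, size=(nv, nh))
bv, bh = np.zeros(nv), np.zeros(nh)

def cd1_update(v0, lr=0.1):
    """One CD-1 weight update: positive phase minus a one-Gibbs-step
    approximation of the negative (model) phase."""
    ph0 = sigmoid(v0 @ W + bh)                    # P(h = 1 | v0)
    h0 = (rng.random(nh) < ph0).astype(float)     # sample hidden units
    pv1 = sigmoid(h0 @ W.T + bv)                  # reconstruct visibles
    v1 = (rng.random(nv) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + bh)
    return lr * (np.outer(v0, ph0) - np.outer(v1, ph1))

v = (rng.random(nv) < 0.5).astype(float)          # a toy binary training vector
dW = cd1_update(v)
```

Running the Gibbs chain for more steps (CD-k) or to equilibrium (as Hybrid Monte Carlo aims to) gives a less biased estimate of the negative phase, at higher cost.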
[...]