Extracting and composing robust features with denoising autoencoders
Frequently Asked Questions (11)
Q2. What future work is mentioned in the paper "Extracting and composing robust features with denoising autoencoders"?
Future work inspired by this observation should investigate other types of corruption processes, applied not only to the input but to the hidden representation itself.
Q3. What is the training procedure for denoising?
Their training procedure for the denoising autoencoder involves learning to recover a clean input from a corrupted version of it, a task known as denoising: a fixed fraction ν of the input components is chosen at random and forced to 0, and the network is trained to reconstruct the original, uncorrupted input.
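As a concrete illustration, here is a minimal NumPy sketch of that corruption process (a fixed fraction of components forced to 0, as the paper describes); the function name and implementation details are illustrative, not the authors' code:

```python
import numpy as np

def corrupt(x, nu, rng=None):
    """Force a fraction `nu` of the components of `x` to 0 (masking corruption)."""
    rng = rng or np.random.default_rng()
    x_tilde = x.copy()
    # choose round(nu * d) component indices uniformly at random
    idx = rng.choice(x.shape[0], size=int(nu * x.shape[0]), replace=False)
    x_tilde[idx] = 0.0
    return x_tilde
```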
Q4. What is the key ingredient to this success?
One key ingredient to this success appears to be the use of an unsupervised training criterion to perform a layer-by-layer initialization: each layer is at first trained to produce a higher level (hidden) representation of the observed patterns, based on the representation it receives as input from the layer below, by optimizing a local unsupervised criterion.
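A sketch of this greedy layer-by-layer scheme, assuming a hypothetical `DenoisingAutoencoder` class with `fit` and `encode` methods (the class and its interface are stand-ins, not the authors' implementation):

```python
def pretrain_stack(X, layer_sizes, nu=0.25):
    """Greedy layer-wise initialization: each layer is trained as a denoising
    autoencoder on the representation produced by the layer below it."""
    layers, H = [], X
    for n_hidden in layer_sizes:
        # hypothetical class standing in for one layer's unsupervised training
        dae = DenoisingAutoencoder(n_visible=H.shape[1], n_hidden=n_hidden)
        dae.fit(H, corruption=nu)   # optimize the local unsupervised criterion
        H = dae.encode(H)           # pass the hidden representation upward
        layers.append(dae)
    return layers
```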
Q5. What is the main reason why deep architectures are needed?
Recent theoretical studies indicate that deep architectures (Bengio & Le Cun, 2007; Bengio, 2007) may be needed to efficiently model complex distributions and achieve better generalization performance on challenging recognition tasks.
Q6. How is a model of the joint distribution trained?
Let us augment the set of modeled random variables to include the corrupted example X̃ in addition to the corresponding uncorrupted example X, and let us perform maximum likelihood training on a model of their joint distribution.
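Following the notation used in Q11 (with $q_{\mathcal{D}}$ denoting the corruption process, as in the paper), the joint over clean and corrupted examples factorizes as below, and the conditional is parameterized through the encoder/decoder pair; this is a restatement of the setup, not an additional result:

$$q(X, \widetilde{X}) = q(X)\, q_{\mathcal{D}}(\widetilde{X} \mid X), \qquad p(X \mid \widetilde{X}) = B_{g_{\theta'}(f_\theta(\widetilde{X}))}(X)$$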
Q7. What is the deterministic mapping of the input vector x?
An autoencoder takes an input vector $x \in [0,1]^d$ and first maps it to a hidden representation $y \in [0,1]^{d'}$ through a deterministic mapping $y = f_\theta(x) = s(Wx + b)$, parameterized by $\theta = \{W, b\}$.
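The paper pairs this encoder with a decoder $z = g_{\theta'}(y) = s(W'y + b')$ that maps the hidden representation back to a reconstruction. A minimal NumPy sketch of both mappings (variable names are illustrative):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def encode(x, W, b):
    """Deterministic mapping y = f_theta(x) = s(Wx + b)."""
    return sigmoid(W @ x + b)

def decode(y, W_prime, b_prime):
    """Reconstruction z = g_theta'(y) = s(W'y + b')."""
    return sigmoid(W_prime @ y + b_prime)
```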
Q8. What is the key to learning in deep architectures?
While unsupervised learning of a mapping that produces “good” intermediate representations of the input pattern seems to be key, little is understood regarding what constitutes “good” representations for initializing deep architectures, or what explicit criteria may guide learning such representations.
Q9. How does the corruption+denoising training affect classification performance?
As can be seen in the paper's results table, the corruption+denoising training works remarkably well as an initialization step, and in most cases yields significantly better classification performance than basic autoencoder stacking with no noise.
Q10. What is the alternative loss suggested by the interpretation of x and z as bit vectors?
An alternative loss, suggested by the interpretation of x and z as either bit vectors or vectors of bit probabilities (Bernoullis), is the reconstruction cross-entropy:

$$L_{\mathbb{H}}(x, z) = \mathbb{H}(B_x \,\|\, B_z) = -\sum_{k=1}^{d} \left[ x_k \log z_k + (1 - x_k) \log(1 - z_k) \right]$$
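A direct NumPy transcription of this loss (the clipping is a numerical-stability detail added here, not part of the formula):

```python
import numpy as np

def cross_entropy(x, z, eps=1e-12):
    """Reconstruction cross-entropy L_H(x, z) = H(B_x || B_z)."""
    z = np.clip(z, eps, 1.0 - eps)  # keep log() finite; implementation detail
    return -np.sum(x * np.log(z) + (1.0 - x) * np.log(1.0 - z))
```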
Q11. What is the way to optimize for the lower bound?
Optimizing for the lower bound leads to:

$$\max_{\theta, \theta'} \mathbb{E}_{q(X, Y)}\!\left[\log B_{g_{\theta'}(Y)}(X)\right] = \max_{\theta, \theta'} \mathbb{E}_{q(X, \widetilde{X})}\!\left[\log B_{g_{\theta'}(f_\theta(\widetilde{X}))}(X)\right] = \min_{\theta, \theta'} \mathbb{E}_{q(X, \widetilde{X})}\!\left[L_{\mathbb{H}}\!\left(X, g_{\theta'}(f_\theta(\widetilde{X}))\right)\right]$$

where in the second equality the authors use the fact that $Y = f_\theta(\widetilde{X})$ deterministically.
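Putting the pieces together, here is a sketch of one stochastic gradient step on this objective, reusing the `corrupt`, `encode`, `decode`, and `cross_entropy` sketches above. It assumes untied weights and uses the standard closed-form gradient for a sigmoid output under cross-entropy (the output-layer error simplifies to z − x); it illustrates the criterion, and is not the authors' training code:

```python
import numpy as np

def sgd_step(x, W, b, W_prime, b_prime, nu=0.25, lr=0.1, rng=None):
    """One SGD step on L_H(X, g_theta'(f_theta(X_tilde))): corrupt the input,
    encode/decode, and backpropagate the loss toward the *clean* input x."""
    x_tilde = corrupt(x, nu, rng)        # sample X~ from q_D(X~ | X)
    y = encode(x_tilde, W, b)            # y = f_theta(X~)
    z = decode(y, W_prime, b_prime)      # z = g_theta'(y)

    # Sigmoid output + cross-entropy: the gradient w.r.t. the output
    # pre-activation reduces to (z - x).
    delta_out = z - x
    grad_W_prime = np.outer(delta_out, y)
    delta_hidden = (W_prime.T @ delta_out) * y * (1.0 - y)
    grad_W = np.outer(delta_hidden, x_tilde)

    W -= lr * grad_W
    b -= lr * delta_hidden
    W_prime -= lr * grad_W_prime
    b_prime -= lr * delta_out
    return cross_entropy(x, z)  # loss measured against the uncorrupted input
```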