Modeling Natural Images Using Gated MRFs
References
J. Deng et al., "ImageNet: A large-scale hierarchical image database," CVPR 2009.
D. G. Lowe, "Distinctive image features from scale-invariant keypoints," IJCV 2004.
Y. LeCun et al., "Gradient-based learning applied to document recognition," Proc. IEEE 1998.
N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," CVPR 2005.
S. Geman and D. Geman, "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images," IEEE TPAMI 1984.
Frequently Asked Questions (17)
Q2. What future work is proposed in "Modeling Natural Images Using Gated MRFs"?
Finally, a very promising research avenue is to extend the model to video sequences, in which the temporal regularities created by smoothly changing viewing transformations should make it far easier to learn to model depth, three-dimensional transformations, and occlusion [71]. Feedforward inference in the hierarchical generative model can be viewed as a type of variational approximation that is exactly correct only for the top layer, but the inference for the lower layers is a very good approximation because of the way they are learned [20].
Q3. What is the common task to quantitatively validate a generative model of natural images?
The most commonly used task to quantitatively validate a generative model of natural images is image denoising, assuming homogeneous additive Gaussian noise of known variance [10], [11], [37], [58], [12].
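As a concrete illustration of this evaluation protocol, the sketch below corrupts a patch with homogeneous additive Gaussian noise of known variance and scores a restoration by peak signal-to-noise ratio (PSNR), the usual denoising metric. The patch and noise level are placeholders, not the paper's actual benchmark data.

```python
import numpy as np

def add_gaussian_noise(img, sigma, rng):
    # Homogeneous additive Gaussian noise of known variance sigma ** 2.
    return img + rng.normal(0.0, sigma, size=img.shape)

def psnr(clean, restored, max_val=1.0):
    # Peak signal-to-noise ratio in dB; higher is better.
    mse = np.mean((clean - restored) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

rng = np.random.default_rng(0)
clean = rng.random((32, 32))                  # stand-in for an image patch in [0, 1]
noisy = add_gaussian_noise(clean, sigma=0.1, rng=rng)
baseline = psnr(clean, noisy)                 # any denoiser must beat this score
```

For sigma = 0.1 on a [0, 1] image the noisy baseline sits near 20 dB; a model's denoised output is scored the same way and compared against competing methods at the same known noise level.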
Q4. How many weight updates are made at the topmost layer?
All layers are trained by using FPCD but, as training proceeds, the number of Markov chain steps between weight updates is increased from 1 to 100 at the topmost layer in order to obtain a better approximation to the maximum likelihood gradient.
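A schedule of that shape can be sketched as below. Note the paper states only the endpoints (1 step early in training, 100 steps at the topmost layer late in training); the linear interpolation here is an assumption for illustration.

```python
def gibbs_steps(update, total_updates, start=1, end=100):
    # Number of Markov chain steps between weight updates, increased from
    # `start` to `end` as training proceeds.  The linear ramp is an
    # assumption; the source gives only the two endpoints.
    frac = update / max(total_updates - 1, 1)
    return int(round(start + frac * (end - start)))
```

More chain steps per update give model samples closer to equilibrium, so the stochastic gradient becomes a better approximation to the true maximum-likelihood gradient.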
Q5. What is the correct sampling procedure for the deep model?
The correct sampling procedure [20] consists of generating a sample from the topmost RBM, followed by back-projection to image space through the chain of conditional distributions for each layer given the layer above.
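The two stages can be sketched generically: block Gibbs sampling in the top RBM, then an ancestral pass down through each layer's conditional. The sigmoid conditionals and toy dimensions below are simplifications; in the paper the bottom layer is the gated MRF itself, with real-valued pixels and mean/precision latent variables.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_top_rbm(W, bv, bh, n_gibbs=100):
    # Block Gibbs sampling in the topmost RBM to obtain an (approximately)
    # equilibrium sample of its visible units.
    v = (rng.random(bv.shape) < 0.5).astype(float)
    for _ in range(n_gibbs):
        h = (rng.random(bh.shape) < sigmoid(v @ W + bh)).astype(float)
        v = (rng.random(bv.shape) < sigmoid(h @ W.T + bv)).astype(float)
    return v

def project_down(sample, layers):
    # Back-project through the chain of conditionals p(h_{l-1} | h_l),
    # taking each conditional's mean.  Sigmoid conditionals at every
    # layer are a simplification of the paper's model.
    for W, b in layers:                 # ordered from top to bottom
        sample = sigmoid(sample @ W + b)
    return sample

# Toy dimensions: top RBM with 16 visibles, then two layers down to 64 "pixels".
W_top = rng.normal(0, 0.1, (16, 8))
top_sample = sample_top_rbm(W_top, np.zeros(16), np.zeros(8))
layers = [(rng.normal(0, 0.1, (16, 32)), np.zeros(32)),
          (rng.normal(0, 0.1, (32, 64)), np.zeros(64))]
image = project_down(top_sample, layers)
```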
Q6. What was the method used to train a linear multi-class logistic regression classifier?
The discriminative training consisted of training a linear multi-class logistic regression classifier on the top level representation without using back-propagation to jointly optimize the parameters across all layers.
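The key point is that only the classifier's weights are learned while the feature extractor stays frozen. A minimal numpy sketch of multinomial logistic regression on fixed features (the toy features and labels are hypothetical):

```python
import numpy as np

def train_softmax(features, labels, n_classes, lr=0.1, epochs=200):
    # Multi-class logistic regression on *fixed* top-level features:
    # only W and b are learned; there is no back-propagation through
    # the lower layers, whose parameters stay frozen.
    n, d = features.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        logits = features @ W + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        grad = (p - onehot) / n                       # gradient of mean cross-entropy
        W -= lr * features.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

# Hypothetical "top-level representations" for two classes.
feats = np.array([[0.0, 1.0], [1.0, 0.0], [0.0, 2.0], [2.0, 0.0]])
labels = np.array([0, 1, 0, 1])
W, b = train_softmax(feats, labels, n_classes=2)
preds = np.argmax(feats @ W + b, axis=1)
```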
Q7. How many times overcomplete will the representation be?
If k different local filters are replicated over all possible integer positions in the image, the representation will be about k times overcomplete.
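The arithmetic behind this claim is simply one copy of each filter per pixel position:

```python
def overcompleteness(n_pixels, k_filters):
    # k local filters replicated at every integer position give roughly
    # k latent variables per pixel, i.e. a representation about k times
    # overcomplete (ignoring boundary effects).
    n_latents = k_filters * n_pixels
    return n_latents / n_pixels
```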
Q8. Why did the authors train a deep model without weight-sharing?
Since the input images have fairly low resolution and the statistics across the images are strongly non-stationary (because the faces have been aligned), the authors trained a deep model without weight-sharing.
Q9. What is the probability of the system returning to the initial state?
If the sum of the kinetic and potential energy rises by ∆ due to inaccurate simulation of the dynamics, the system is returned to the initial state with probability 1 − exp(−∆).
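This is the standard Metropolis correction at the end of a Hamiltonian Monte Carlo trajectory, sketched below; the delta value used in the demo is arbitrary.

```python
import numpy as np

def hmc_accept(delta, rng):
    # If the total (kinetic + potential) energy rose by `delta` because the
    # leapfrog dynamics are simulated inexactly, the move is rejected -- the
    # system returns to its initial state -- with probability 1 - exp(-delta).
    # Energy-decreasing moves (delta <= 0) are always accepted.
    return rng.random() < np.exp(-max(delta, 0.0))

rng = np.random.default_rng(0)
n = 100_000
rejection_rate = 1.0 - np.mean([hmc_accept(0.5, rng) for _ in range(n)])
```

Over many trials with delta = 0.5 the empirical rejection rate converges to 1 - exp(-0.5), about 0.39.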
Q10. How do the authors draw an unbiased sample from the deep model?
In order to draw an unbiased sample from the deep model, the authors then map the second layer sample produced in this way through the conditional distributions p(hm|h2) and p(hp|h2) to sample the mean and precision latent variables.
Q11. What is the update rule for gradient ascent in the likelihood?
The update rule for gradient ascent in the likelihood is θ ← θ + η (⟨∂F/∂θ⟩_model − ⟨∂F/∂θ⟩_data) (Eq. 13), where ⟨·⟩ denotes an expectation over samples from the model or the training data.
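One step of this update can be written directly from the rule, with the two expectations estimated by averaging per-sample free-energy gradients (the placeholder arrays below stand in for gradients computed from data and from model samples, e.g. the persistent chains used by FPCD):

```python
import numpy as np

def likelihood_ascent_step(theta, dF_data, dF_model, lr=0.01):
    # One gradient-ascent step on the log-likelihood:
    #   theta <- theta + lr * (<dF/dtheta>_model - <dF/dtheta>_data)
    # Each array holds per-sample free-energy gradients; the mean over
    # axis 0 estimates the corresponding expectation.
    return theta + lr * (dF_model.mean(axis=0) - dF_data.mean(axis=0))

theta = np.zeros(3)
dF_data = np.ones((8, 3))           # placeholder gradients from a data minibatch
dF_model = np.full((8, 3), 3.0)     # placeholder gradients from model samples
theta = likelihood_ascent_step(theta, dF_data, dF_model)
```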
Q12. What is the method for filling in missing pixels?
The latent representation in the higher layers is able to capture longer range structure and it does a better job at filling in the missing pixels.
Q13. What is the generative model used to fill in the missing pixels?
To fill in missing pixels, the authors initialize them at zero and propagate the occluded image through the four layers using the sequence of posterior expectations.
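The fill-in loop can be sketched as alternating up and down passes with the observed pixels clamped. Sigmoid layers with tied weights and no visible biases are a simplification here; the paper uses its four-layer gated-MRF model rather than these toy conditionals.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fill_in(image, mask, layers, n_sweeps=5):
    # Missing pixels (mask == 0) start at zero; the occluded image is
    # propagated up through the layers via posterior expectations, the
    # reconstruction is propagated back down, and only the missing
    # pixels are overwritten.  Observed pixels stay clamped.
    x = image * mask
    for _ in range(n_sweeps):
        h = x
        for W, b in layers:                    # upward pass
            h = sigmoid(h @ W + b)
        for W, b in reversed(layers):          # downward pass (tied weights)
            h = sigmoid(h @ W.T)
        x = mask * image + (1.0 - mask) * h    # clamp the observed pixels
    return x

rng = np.random.default_rng(0)
layers = [(rng.normal(0, 0.1, (64, 32)), np.zeros(32)),
          (rng.normal(0, 0.1, (32, 16)), np.zeros(16))]
image = rng.random(64)
mask = (rng.random(64) < 0.7).astype(float)    # 1 = observed, 0 = missing
restored = fill_in(image, mask, layers)
```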
Q14. How can a cRBM model a smooth image?
A cRBM could model this data by using two Gaussians (second row and first column): one that is spherical and tight at the origin for smooth images and another one that has a covariance elongated along the anti-diagonal for structured images.
Q15. What is the variational bound on the likelihood that is improved as each layer is added?
However, the variational bound on the likelihood that is improved as each layer is added assumes this form of incorrect inference, so the learning ensures that it works well.
Q16. What is the way to model overlapping images?
This will be particularly relevant when the authors introduce a convolutional extension of the model to represent spatially stationary high-resolution images (as opposed to small image patches), since it will not be possible to independently normalize overlapping image patches.
Q17. What is the difference between the number of latent variables and the number of parameters subject to learning?
Since the number of latent variables scales as the number of input variables, the number of parameters subject to learning scales quadratically with the size of the input making learning infeasibly slow.
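The scaling is easy to verify numerically: with no weight sharing, every latent variable connects to every input, so when the number of latents is proportional to the number of inputs the parameter count grows with the square of the input size.

```python
def n_params(n_inputs, latents_per_input=1.0):
    # Fully connected layer without weight sharing: one weight per
    # (input, latent) pair, and latents scale with inputs, so the
    # parameter count is quadratic in the input size.
    n_latents = int(latents_per_input * n_inputs)
    return n_inputs * n_latents
```

For example, doubling a square image's side length quadruples the pixel count and multiplies the number of parameters by 16, which is why learning without weight sharing becomes infeasibly slow at high resolution.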