Hybrid Deep Learning for Face Verification
Citations
A Discriminative Feature Learning Approach for Deep Face Recognition
Deep Learning Face Representation from Predicting 10,000 Classes
Deep Learning Face Representation by Joint Identification-Verification
Convolutional Neural Network Architectures for Matching Natural Language Sentences
Deeply learned face representations are sparse, selective, and robust
References
ImageNet Classification with Deep Convolutional Neural Networks
Distinctive Image Features from Scale-Invariant Keypoints
Gradient-based learning applied to document recognition
A fast learning algorithm for deep belief nets
Multiresolution gray-scale and rotation invariant texture classification with local binary patterns
Frequently Asked Questions (14)
Q2. What is the effect of sharing weights in higher layers?
Since faces are structured objects, locally sharing weights in higher layers allows the network to learn different high-level features at different locations.
Q3. What is the effect of local sharing weights in the same map?
The weights of neurons (including convolution kernels and biases) in the same map in higher convolutional layers are locally shared.
Q4. What is the way to extract face similarities?
Considering the regular structures of faces, the deep ConvNets in their model locally share weights in higher convolutional layers, so that different mid- or high-level features are extracted from different face regions. This is contrary to conventional ConvNet structures [18] and can greatly improve the model's fitting and generalization capabilities.
Q5. How do the authors train the Classification RBM?
The authors discriminatively train the Classification RBM by minimizing the negative log probability of the target class $t$ given input $x$; that is, minimizing $-\log p(y_t \mid x)$.
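The paper does not spell out the closed form of $p(y_t \mid x)$ here, but for a Classification RBM it can be computed exactly by marginalizing out the hidden units. The sketch below, a minimal NumPy illustration and not the authors' implementation, assumes the standard Classification RBM parameterization (hidden-input weights `W`, hidden-class weights `U`, hidden biases `b_h`, class biases `b_y`); all shapes and values are illustrative.

```python
import numpy as np

def class_probs(x, W, U, b_h, b_y):
    """p(y | x) for a Classification RBM, hidden units marginalized out:
    p(y | x) ∝ exp(b_y[y] + sum_j softplus(b_h[j] + U[j, y] + (W @ x)[j]))."""
    # np.logaddexp(0, z) is a numerically stable softplus log(1 + e^z)
    scores = b_y + np.logaddexp(0.0, b_h[:, None] + U + (W @ x)[:, None]).sum(axis=0)
    scores -= scores.max()          # shift for numerical stability
    p = np.exp(scores)
    return p / p.sum()

# Hypothetical sizes: H hidden units, D input dims, C = 2 classes (same/different)
rng = np.random.default_rng(0)
H, D, C = 16, 8, 2
W, U = rng.normal(size=(H, D)), rng.normal(size=(H, C))
b_h, b_y = np.zeros(H), np.zeros(C)
x = rng.normal(size=D)

p = class_probs(x, W, U, b_h, b_y)
loss = -np.log(p[1])                # -log p(y_t | x) for target class t = 1
```

Discriminative training then descends the gradient of this loss with respect to `W`, `U`, `b_h`, and `b_y`, which is tractable precisely because `p(y | x)` has this closed form.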
Q6. What is the main reason why face recognition models are shallow?
Many face recognition models are shallow structures, and need high-dimensional over-completed feature representations to learn the complex mappings from pairs of noisy features to face similarities [12, 7, 25]; otherwise, the models may suffer from inferior performance.
Q7. What is the probability distribution of the two outputs of the ConvNet?
Since the two outputs of the ConvNet represent a probability distribution (summed to 1), when one output is known, the other output contains no additional information.
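This redundancy is easy to see numerically: a 2-way softmax output satisfies $p_1 = 1 - p_0$, so keeping only one of the two outputs loses no information. A tiny sketch (the logit values are illustrative assumptions):

```python
import numpy as np

z = np.array([1.3, -0.4])        # hypothetical two-class logits from one ConvNet
p = np.exp(z) / np.exp(z).sum()  # softmax: the two outputs sum to 1

# Knowing p[0] fully determines p[1]
assert np.isclose(p[1], 1.0 - p[0])
```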
Q8. What are the convolutional layers and their pooling regions?
The 3D convolution kernel sizes of the convolutional layers and the pooling region sizes of the max-pooling layers are shown as the small cuboids and squares inside the large cuboids of maps respectively.
Q9. What is the gradient of the loss w.r.t. $\alpha_m^n$?
The gradient of the loss w.r.t. $\alpha_m^n$ is
$$\frac{\partial L}{\partial \alpha_m^n} = \frac{\partial L}{\partial x_n}\,\frac{\partial x_n}{\partial \alpha_m^n} = \frac{1}{MK}\,\frac{\partial L}{\partial x_n}\sum_{k=1}^{K}\frac{\partial C_m^n(I_k^n)}{\partial \alpha_m^n}. \quad (4)$$
$\frac{\partial L}{\partial x_n}$ can be calculated from the closed-form expression of $p(y_t \mid x)$ (Eq. (2)), and $\frac{\partial C_m^n(I_k^n)}{\partial \alpha_m^n}$ can be calculated using the back-propagation algorithm in the ConvNet.
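Eq. (4) is just the chain rule through the two-level average, so it can be sanity-checked numerically. The sketch below stands in a hypothetical scalar "ConvNet" $C(\alpha, I) = \tanh(\alpha I)$ and a stand-in quadratic loss for the real network and RBM loss; it verifies the analytic gradient against finite differences.

```python
import numpy as np

# Toy setup: one group of M ConvNets, each with a scalar parameter alpha_m,
# each seeing K input modes. C(a, i) = tanh(a * i) stands in for C^n_m(I^n_k).
M, K = 3, 4
rng = np.random.default_rng(0)
I = rng.normal(size=(M, K))             # inputs I^n_k for each ConvNet
alpha = rng.normal(size=M)              # one parameter per ConvNet

C = lambda a, i: np.tanh(a * i)
x_of = lambda al: np.mean(C(al[:, None], I))   # Eq. (3): 1/(MK) * sum over m, k
L_of = lambda x: (x - 0.5) ** 2                # stand-in loss L(x_n)

# Analytic gradient via Eq. (4): dL/dalpha_m = dL/dx_n * (1/(MK)) * sum_k dC/dalpha_m
x = x_of(alpha)
dL_dx = 2.0 * (x - 0.5)
dC_dalpha = (1.0 - np.tanh(alpha[:, None] * I) ** 2) * I   # d tanh(a*i)/da
grad = dL_dx * dC_dalpha.sum(axis=1) / (M * K)

# Central finite-difference check, perturbing each alpha_m in turn
eps = 1e-6
fd = np.array([
    (L_of(x_of(alpha + eps * np.eye(M)[m])) - L_of(x_of(alpha - eps * np.eye(M)[m]))) / (2 * eps)
    for m in range(M)
])
assert np.allclose(grad, fd, atol=1e-7)
```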
Q10. What is the probability distribution of the nth ConvNet?
Then the n-th ConvNet group prediction can be expressed as
$$x_n = \frac{1}{M}\sum_{m=1}^{M}\frac{1}{K}\sum_{k=1}^{K} C_m^n(I_k^n), \quad (3)$$
where the inner and outer sums are over different input modes (level 1 pooling) and different ConvNets (level 2 pooling), respectively.
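Because both pooling levels are plain averages, the nested sum in Eq. (3) collapses to a single mean over all $M \times K$ predictions. A minimal sketch, where `C[m, k]` holds hypothetical ConvNet outputs standing in for $C_m^n(I_k^n)$:

```python
import numpy as np

M, K = 5, 8                          # ConvNets per group, input modes per ConvNet
rng = np.random.default_rng(1)
C = rng.random((M, K))               # hypothetical similarity outputs in [0, 1]

level1 = C.mean(axis=1)              # level 1: average each ConvNet over its K input modes
x_n = level1.mean()                  # level 2: average the M ConvNets in group n

# Equivalent to a single 1/(MK) * sum over all m, k
assert np.isclose(x_n, C.mean())
```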
Q11. What is the way to improve performance of a convolutional neural network?
Averaging the results of multiple ConvNets has been shown to be an effective way of improving performance [9, 15], but the authors show that their hybrid structure is significantly better than this simple averaging scheme.
Q12. What are the main factors to consider when evaluating the accuracy of the three methods?
Although Tom-vs-Pete [3], high-dim LBP [7], and Fisher vector faces [25] have better accuracy than their method, there are two important factors to be considered.
Q13. How many people are used to train the deep ConvNets?
For both settings, the authors randomly choose 80% people from the training data to train the deep ConvNets, and use the remaining 20% people to train the top-layer RBM and fine-tune the entire model.
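The key detail above is that the split is over *people* (identities), not images, so the ConvNets and the top-layer RBM never see the same identity during training. A hedged sketch of such an identity-level split (the function name and 100-identity example are illustrative, not from the paper):

```python
import random

def split_identities(person_ids, frac=0.8, seed=0):
    """Split a set of person IDs into two disjoint identity sets.

    Returns (convnet_ids, rbm_ids): frac of the identities for training the
    deep ConvNets, the rest for training the top-layer RBM and fine-tuning.
    """
    ids = sorted(set(person_ids))
    random.Random(seed).shuffle(ids)     # deterministic shuffle for the sketch
    cut = int(len(ids) * frac)
    return set(ids[:cut]), set(ids[cut:])

convnet_ids, rbm_ids = split_identities(range(100))
assert convnet_ids.isdisjoint(rbm_ids)   # no identity appears in both stages
```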
Q14. What do $N$, $M$, and $C_m^n(\cdot)$ denote?
Let $N$ and $M$ be the number of groups and the number of ConvNets in each group, respectively, and $C_m^n(\cdot)$ be the input-output mapping of the m-th ConvNet in the n-th group.