Weakly Supervised Cascaded Convolutional Networks
References
ImageNet Classification with Deep Convolutional Neural Networks
Very Deep Convolutional Networks for Large-Scale Image Recognition
ImageNet: A large-scale hierarchical image database
SSD: Single Shot MultiBox Detector
Fast R-CNN
Frequently Asked Questions (13)
Q2. What is the loss function for the class activation map?
Since multiple categories can exist in a single image [22], the authors use an independent loss function for each class in this branch of the CNN architecture; the overall loss is therefore the sum of C binary logistic regression losses, one per class.
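A sum of C independent binary logistic losses can be sketched as follows. This is an illustrative numpy helper, not the authors' code; the function name and the numerical clipping are assumptions.

```python
import numpy as np

def multilabel_logistic_loss(scores, labels):
    """Sum of C independent binary logistic losses (one per class).

    scores: (C,) raw class scores; labels: (C,) in {0, 1}.
    Hypothetical helper for illustration, not the paper's code.
    """
    probs = 1.0 / (1.0 + np.exp(-scores))  # per-class sigmoid
    eps = 1e-12                            # avoid log(0)
    per_class = -(labels * np.log(probs + eps)
                  + (1 - labels) * np.log(1 - probs + eps))
    return per_class.sum()                 # sum over the C classes
```

Because each class gets its own sigmoid and loss, several labels can be positive at once, unlike a softmax over classes.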
Q3. How are the methods used to train object detection systems?
To improve detection performance, object proposal generation, feature extraction, and MIL are trained together in a cascaded, end-to-end manner.
Q4. What is the common way of weakly supervised learning methods?
Most weakly supervised learning methods work by selecting candidate positive object instances from the positive bags and then learning an appearance model of the object from those instances.
Q5. What is the way to train the multiple instance learning loss?
Using the candidate bounding boxes selected in the previous stage, it trains with a multiple instance learning loss to select the best sample for each object present in an image.
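The instance-selection step of MIL can be sketched as picking the top-scoring candidate box per class from the bag of boxes. This is a simplified illustration of hard instance selection, not the authors' implementation.

```python
import numpy as np

def select_best_instances(box_scores):
    """Pick the top-scoring candidate box per class (hard MIL selection).

    box_scores: (num_boxes, C) classification scores for the candidate
    boxes kept from the previous stage. Returns, for each class, the
    index of the box treated as that class's positive instance.
    Illustrative sketch only.
    """
    return box_scores.argmax(axis=0)  # index of the best box per class
```

During training, the selected instances serve as pseudo-positive examples that the appearance model is refined on.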
Q6. What are the main components of the proposed CNN?
CNN architectures: 1. Loc Net: inspired by [36], the authors removed the fully connected layers from AlexNet (or VGG-16) and replaced them with two convolutional layers and one global pooling layer.
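The effect of replacing fully connected layers with convolutions plus global pooling can be sketched at the shape level: per-class activation maps are produced by a 1x1-style convolution over the backbone features and then pooled to class scores. The shapes, weights, and function name below are illustrative assumptions, not the paper's architecture details.

```python
import numpy as np

def loc_net_head(features, w):
    """Sketch of a conv + global-pooling head in place of FC layers.

    features: (C_in, H, W) conv feature maps from a truncated
    AlexNet/VGG-16 backbone; w: (C, C_in) weights acting as a 1x1
    convolution that maps features to C class activation maps.
    Illustrative shapes only.
    """
    # 1x1 convolution: one activation map per class
    maps = np.tensordot(w, features, axes=([1], [0]))    # (C, H, W)
    # global average pooling: one score per class
    scores = maps.reshape(maps.shape[0], -1).mean(axis=1)
    return maps, scores
```

Keeping the network fully convolutional preserves spatial information in the class maps, which is what makes weakly supervised localization possible from image-level labels.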
Q7. What are the main datasets used for the proposed methods?
The experiments for their proposed methods are conducted extensively on the PASCAL VOC 2007, 2010, and 2012 datasets, as well as the large-scale ILSVRC 2013 and 2014 object datasets.
Q8. What is the main reason why Wang et al. use probabilistic latent semantic analysis?
Wang et al. [35] employ probabilistic latent semantic analysis on the windows of positive samples to select the most discriminative clusters that represent the object category.
Q9. What is the importance of getting better object proposals?
Given the importance of obtaining better object proposals, the authors added a middle stage to the previous architecture, yielding their three-stage network.
Q10. What is the total loss function of the cascaded network?
The total loss function of the cascaded network (Eq. 2) is L_Total = L_GAP(y, I) + λ · L_MIL(y, x, I), where λ is the hyper-parameter balancing the two loss functions.
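The combined objective is a weighted sum of the two stage losses. A minimal sketch, assuming the two loss values have already been computed; the default λ here is illustrative, not the paper's value.

```python
def total_loss(l_gap, l_mil, lam=0.1):
    """L_Total = L_GAP + lambda * L_MIL.

    l_gap: global-average-pooling (classification) loss value.
    l_mil: multiple-instance-learning loss value.
    lam: hyper-parameter balancing the two terms (0.1 is an
    illustrative default, not the value used in the paper).
    """
    return l_gap + lam * l_mil
```

Tuning λ trades off the image-level classification signal against the instance-level MIL signal during joint training.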
Q11. What is the first stage of the proposed architecture?
The first stage extracts class-specific object proposals using a fully convolutional network followed by a global average (max) pooling layer.
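One common way to turn a class activation map into a class-specific proposal is to threshold the map and take the bounding box of the surviving region. The threshold and function below are a common heuristic used for illustration, not necessarily the paper's exact procedure.

```python
import numpy as np

def proposal_from_map(cam, thresh=0.5):
    """Extract a class-specific box from a class activation map.

    cam: (H, W) activation map for one class. Pixels above
    thresh * max(cam) are kept, and the tight bounding box around
    them is returned as (x0, y0, x1, y1). Illustrative heuristic.
    """
    mask = cam >= thresh * cam.max()
    ys, xs = np.where(mask)
    return xs.min(), ys.min(), xs.max(), ys.max()
```

Because the map is produced per class, each category in the image yields its own proposal region.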
Q12. How does the performance of the proposed cascaded network differ from other approaches?
For instance, when the segmentation stage is used with the AlexNet architecture, the cascaded network improves detection by almost 2.5% and classification by 2% on PASCAL VOC 2007.
Q13. How did the authors improve the performance of the CNN?
In [22], the same authors improved the performance further by incorporating both localization and classification in a new CNN architecture.