Joint Deep Learning for Pedestrian Detection
Citations
ImageNet Large Scale Visual Recognition Challenge
Image Super-Resolution Using Deep Convolutional Networks
DeepReID: Deep Filter Pairing Neural Network for Person Re-identification
Deep Learning for Generic Object Detection: A Survey
References
ImageNet Classification with Deep Convolutional Neural Networks
Distinctive Image Features from Scale-Invariant Keypoints
Gradient-based learning applied to document recognition
Histograms of oriented gradients for human detection
Reducing the Dimensionality of Data with Neural Networks
Frequently Asked Questions (13)
Q2. What future work is mentioned in the paper "Joint deep learning for pedestrian detection"?
The authors expect an even larger improvement from training their UDN on much larger-scale training sets in future work. The framework also has the potential for general object detection.
Q3. What are the commonly used classification approaches?
The widely used classification approaches include various boosting classifiers [9, 11, 53], linear SVM [5], histogram intersection kernel SVM [31], latent SVM [17], multiple kernel SVM [48], structural SVM [58], and probabilistic models [2, 32].
Q4. How is the algorithm used to prune candidate detection windows?
In order to save computation, a detector using HOG+CSS and Linear SVM is utilized for pruning candidate detection windows at both training and testing stages.
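The pruning step described above can be sketched as a two-stage cascade: a cheap scorer filters windows, and only the survivors reach the expensive model. This is a minimal illustration, not the authors' implementation. The function names, the `Window` type, and the `prune_threshold` parameter are hypothetical, and the toy scorers stand in for the HOG+CSS linear SVM and the deep model.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical window representation: (x, y, width, height).
Window = Tuple[int, int, int, int]


def prune_then_classify(
    windows: Sequence[Window],
    cheap_score: Callable[[Window], float],
    expensive_score: Callable[[Window], float],
    prune_threshold: float,
) -> List[Tuple[Window, float]]:
    """Score windows with a cheap detector, keep those at or above
    prune_threshold, and run the expensive model only on the survivors."""
    survivors = [w for w in windows if cheap_score(w) >= prune_threshold]
    return [(w, expensive_score(w)) for w in survivors]


# Toy usage: the cheap scorer stands in for a HOG+CSS linear SVM,
# the expensive scorer for the deep model (both are placeholders).
candidates = [(0, 0, 32, 64), (10, 5, 48, 96), (40, 20, 64, 128)]
results = prune_then_classify(
    candidates,
    cheap_score=lambda w: w[2] / 64.0,       # placeholder score from width
    expensive_score=lambda w: w[3] / 128.0,  # placeholder score from height
    prune_threshold=0.75,
)
```

With these placeholder scorers, only the two wider windows pass the first stage, so the expensive scorer runs twice instead of three times.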
Q5. How do the authors enrich the operation in deep models?
The authors enrich the operation in deep models by incorporating the deformation layer into the convolutional neural networks (CNN) [26].
Q6. How are the features learned in deep models?
By fixing HOG features and deformable models, occlusion handling models are learned in [34, 36], using the part-detection scores as input.
Q7. How many negative samples are used in the Caltech-Train dataset?
At the training stage, there are approximately 60,000 negative samples and 4,000 positive samples from the Caltech-Train dataset.
Q8. What are the best-performing approaches on the Caltech-Test?
The current best performing approaches on the Caltech-Test are the MultiResC [37] and the contextual boost [8], both of which have an average miss rate of 48%.
Q9. How does the proposed deep model perform on public datasets?
Through interaction among these interdependent components, joint learning achieves the best performance on publicly available datasets, outperforming the existing best-performing approaches by 9% on the largest Caltech dataset.
Q10. How is the log-average miss rate calculated?
As in [12], the log-average miss rate is used to summarize the detector performance, and is computed by averaging the miss rate at nine FPPI rates that are evenly spaced in the log-space in the range from 10^-2 to 10^0.
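As a rough illustration of this metric, the sketch below samples a miss-rate curve at nine FPPI values evenly spaced in log-space between 10^-2 and 10^0 and averages them. Several details are assumptions rather than facts from the paper: the function name, the rule of taking the last curve point whose FPPI does not exceed each reference value, the fallback miss rate of 1.0 when the curve starts above a reference, and the optional `geometric` flag (some evaluation code averages in the log domain instead).

```python
import math
from typing import Sequence


def log_average_miss_rate(
    fppi: Sequence[float],
    miss_rate: Sequence[float],
    geometric: bool = False,
) -> float:
    """Summarize a miss-rate vs. FPPI curve.

    fppi must be sorted in ascending order; miss_rate holds the
    corresponding miss rates.
    """
    # Nine reference FPPI values evenly spaced in log-space over [1e-2, 1e0].
    refs = [10 ** (-2 + 0.25 * i) for i in range(9)]
    sampled = []
    for r in refs:
        # Assumption: use the last curve point with FPPI <= reference;
        # if no such point exists, treat the miss rate as 1.0.
        below = [m for f, m in zip(fppi, miss_rate) if f <= r]
        sampled.append(below[-1] if below else 1.0)
    if geometric:
        # Assumed variant: average in the log domain (geometric mean).
        logs = [math.log(max(m, 1e-10)) for m in sampled]
        return math.exp(sum(logs) / len(logs))
    return sum(sampled) / len(sampled)


# Toy curve: miss rate 0.8 until FPPI reaches 1.0, then 0.4.
lamr = log_average_miss_rate([0.001, 1.0], [0.8, 0.4])
```

For the toy curve, eight of the nine reference points fall below FPPI 1.0 and sample the 0.8 value, so the summary is dominated by the low-FPPI end of the curve.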
Q11. What is the funding source for this work?
Acknowledgment: This work is supported by the General Research Fund sponsored by the Research Grants Council of Hong Kong (Project No. CUHK 417110, CUHK 417011, CUHK 429412) and National Natural Science Foundation of China (Project No. 61005057).
Q12. What is the main purpose of this paper?
This paper proposes a unified deep model that jointly learns four components – feature extraction, deformation handling, occlusion handling and classification – for pedestrian detection.
Q13. How is the two-layer CNN evaluated on Caltech-Test constructed?
A two-layer CNN (CNN-2layer in Fig. 7(a)) is constructed by convolving the extracted feature maps with another convolutional layer and another pooling layer.