Traffic sign recognition with multi-scale Convolutional Networks
Citations
Deep learning in neural networks
Multi-column deep neural networks for image classification
Deep Learning Face Representation from Predicting 10,000 Classes
Building high-level features using large scale unsupervised learning
Robust Physical-World Attacks on Deep Learning Visual Classification
References
Gradient-based learning applied to document recognition
What is the best multi-stage architecture for object recognition?
The German Traffic Sign Recognition Benchmark: A multi-class classification competition
Why is Real-World Visual Object Recognition Hard?
Road traffic sign detection and classification
Frequently Asked Questions (9)
Q2. What are the future works in "Traffic sign recognition with multi-scale convolutional networks" ?
Future work should investigate the impact of unsupervised pre-training of the feature-extraction stages, particularly with a larger number of features at each stage, which can be learned more easily than in a purely supervised fashion. Finally, ensemble processing with multiple networks might further enhance accuracy: taking votes from color and grayscale networks could cover both situations, whether color is informative or not. From visualizing the remaining errors, the authors also suspect that normalized color channels may be more informative than raw color.
Q3. What are some other features that can be added to the architecture?
Other realistic perturbations, such as additional affine transformations, brightness, contrast, and blur changes, would probably also increase robustness. [15] showed that architecture choice is crucial in a number of state-of-the-art methods, including ConvNets.
Q4. Why is the classifier used in this article?
The motivation for combining representations from multiple stages in the classifier is to provide the classifier with receptive fields at different scales.
Q5. How many samples are generated in the GTSRB dataset?
Validation: traffic sign examples in the GTSRB dataset were extracted from 1-second video sequences, i.e., each real-world sign instance yields 30 samples, usually with increasing resolution as the camera approaches the sign.
Q6. How many samples are added to the dataset?
The authors build a jittered dataset by adding 5 transformed versions of the original training set, yielding 126,750 samples in total.
Q7. What is the way to learn to deformations in a dataset?
When a dataset does not naturally contain those deformations, adding them synthetically yields learning that is more robust to potential deformations in the test set.
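A minimal sketch of this kind of jitter augmentation, using only position shifts in pure Python (the paper also perturbs scale and rotation; the function names, the shift range, and the zero-fill policy here are illustrative assumptions, not the authors' implementation):

```python
import random

def jitter(img, max_shift=2):
    """Return a translated copy of img (a list of pixel rows), shifted by up
    to max_shift pixels in each direction; vacated pixels are zero-filled."""
    h, w = len(img), len(img[0])
    dy = random.randint(-max_shift, max_shift)
    dx = random.randint(-max_shift, max_shift)
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = img[sy][sx]
    return out

def build_jittered_dataset(images, copies=5):
    """Original set plus `copies` jittered versions of every image, mirroring
    the paper's 5 added transformed copies (so the total is 6x the original)."""
    out = list(images)
    for _ in range(copies):
        out.extend(jitter(im) for im in images)
    return out
```

With `copies=5`, the output is six times the original set, consistent with the quoted 126,750-sample total (6 × 21,125 originals).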
Q8. How does the architecture differ from traditional ConvNets?
The architecture used in the present work departs from traditional ConvNets by the type of non-linearities used, by the use of connections that skip layers, and by the use of pooling layers with different subsampling ratios for the connections that skip layers and for those that do not.
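The shape arithmetic below sketches that last point: stage-1 feature maps on the skip path get extra pooling before being concatenated with stage-2 maps at the classifier input. The 32×32 input, 5×5 filters, and 108 feature maps per stage are illustrative assumptions, not numbers fixed by this summary:

```python
def conv_out(size, k):
    """Output side length of a 'valid' convolution with a k x k filter."""
    return size - k + 1

def pool_out(size, p):
    """Output side length after p x p non-overlapping pooling."""
    return size // p

def multiscale_classifier_input(input_size=32, k1=5, k2=5,
                                p1=2, p2=2, skip_pool=2,
                                n1=108, n2=108):
    """Feature count fed to the classifier when stage-1 maps (skip path)
    are pooled once more than usual and concatenated with stage-2 maps."""
    s1 = pool_out(conv_out(input_size, k1), p1)  # stage 1: 32 -> 28 -> 14
    s2 = pool_out(conv_out(s1, k2), p2)          # stage 2: 14 -> 10 -> 5
    s1_skip = pool_out(s1, skip_pool)            # extra pooling on skip path: 14 -> 7
    return n1 * s1_skip ** 2 + n2 * s2 ** 2      # concatenated feature vector length
```

The extra pooling on the skip path keeps the stage-1 contribution from dominating the concatenated vector while preserving its coarser, larger-receptive-field view of the input.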
Q9. What is the ConvNet with random features?
The authors also evaluate the best ConvNet of Section III-B with random features (108-200 random features, training only the 2-layer classifier with 100 hidden units) and obtain 97.33% accuracy on the test set (see the convolutional filters in Fig. 4).
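The idea of random features can be sketched with a frozen random projection followed by a rectifying non-linearity: the "filters" are never trained, and only a classifier on top of them would be. This is a toy stand-in for random convolutional filters; the function name, the ReLU-style rectification (the paper uses rectified sigmoid-type non-linearities), and the seeding scheme are all assumptions for illustration:

```python
import random

def random_feature_map(x, n_features=16, seed=0):
    """Project input vector x through a frozen random weight matrix and
    rectify. The same seed always reproduces the same 'filters', so the
    feature extractor stays fixed while a classifier is trained on top."""
    rng = random.Random(seed)  # frozen: identical weights on every call
    feats = []
    for _ in range(n_features):
        w = [rng.uniform(-1.0, 1.0) for _ in x]
        s = sum(wi * xi for wi, xi in zip(w, x))
        feats.append(max(0.0, s))  # rectification keeps features non-negative
    return feats
```

That such untrained features still reach 97.33% accuracy underlines the finding of [15] that the architecture itself, not only learned filter values, drives much of ConvNet performance.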