Proceedings Article
BoxCars: 3D Boxes as CNN Input for Improved Fine-Grained Vehicle Recognition
Jakub Sochor, Adam Herout, Jiří Havel
pp. 3006-3015
TL;DR: This work shows that extracting additional data from the video stream and feeding it into a deep convolutional neural network boosts recognition performance considerably and can improve the performance of traffic surveillance systems.
Abstract:
We address the problem of fine-grained vehicle make and model recognition and verification. Our contribution is showing that extracting additional data from the video stream (besides the vehicle image itself) and feeding it into a deep convolutional neural network boosts recognition performance considerably. This additional information includes the 3D vehicle bounding box used for "unpacking" the vehicle image, its rasterized low-resolution shape, and information about the 3D vehicle orientation. Experiments show that, compared to a baseline pure CNN without any input modifications, adding this information decreases the classification error by 26% (accuracy improves from 0.772 to 0.832) and roughly doubles verification average precision (from 0.378 to 0.785, i.e. 208% of the baseline). In addition, the pure baseline CNN alone outperforms the recent state-of-the-art solution by 0.081. We provide "BoxCars", an annotated set of surveillance vehicle images augmented with various automatically extracted auxiliary information. Our approach and the dataset can considerably improve the performance of traffic surveillance systems.
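The percentages in the abstract follow from the raw scores, but they use two different kinds of comparison. The 26% is a relative reduction of the classification error (1 minus accuracy). The 208% figure is the ratio of the new verification AP to the baseline AP, which corresponds to a relative gain of about 108%. The short Python sketch below only redoes this arithmetic from the numbers quoted above; the variable names are illustrative and not taken from the paper.

```python
def relative_change(before: float, after: float) -> float:
    """Return the relative change (after - before) / before as a fraction."""
    return (after - before) / before

# Classification: accuracy 0.772 -> 0.832, i.e. error 0.228 -> 0.168.
baseline_acc, boxcars_acc = 0.772, 0.832
baseline_err, boxcars_err = 1 - baseline_acc, 1 - boxcars_acc
error_reduction = -relative_change(baseline_err, boxcars_err)  # positive = error went down

# Verification: average precision 0.378 -> 0.785.
baseline_ap, boxcars_ap = 0.378, 0.785
ap_ratio = boxcars_ap / baseline_ap               # new AP as a multiple of the baseline
ap_gain = relative_change(baseline_ap, boxcars_ap)  # relative increase over the baseline

print(f"classification error reduction: {error_reduction:.1%}")
print(f"verification AP: {ap_ratio:.0%} of baseline (gain of {ap_gain:.0%})")
```

Running it prints an error reduction of 26.3% and a verification AP of 208% of the baseline, a gain of 108%.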
Citations
Proceedings Article
Learning a Discriminative Filter Bank Within a CNN for Fine-Grained Recognition
TL;DR: The authors propose a bank of convolutional filters that captures class-specific discriminative patches without extra part or bounding-box annotations, achieving state-of-the-art performance on three publicly available fine-grained recognition datasets (CUB-200-2011, Stanford Cars, and FGVC-Aircraft).
Proceedings Article
Orientation Invariant Feature Embedding and Spatial Temporal Regularization for Vehicle Re-identification
Zhongdao Wang, Luming Tang, Xihui Liu, Zhuliang Yao, Shuai Yi, Jing Shao, Junjie Yan, Shengjin Wang, Hongsheng Li, Xiaogang Wang
TL;DR: Both the orientation-invariant feature embedding and the spatio-temporal regularization yield considerable improvements on the vehicle re-identification problem.
Proceedings Article
Learning Deep Neural Networks for Vehicle Re-ID with Visual-spatio-Temporal Path Proposals
TL;DR: In this article, a Siamese-CNN+Path-LSTM model was proposed to incorporate complex spatio-temporal information for regularizing the re-ID results.
Book Chapter
Hierarchical Bilinear Pooling for Fine-Grained Visual Recognition
TL;DR: A cross-layer bilinear pooling approach is proposed to capture inter-layer part feature relations, which results in superior performance compared with other bilinear-pooling-based approaches.
Journal Article
Deep Attention-Based Spatially Recursive Networks for Fine-Grained Visual Recognition
TL;DR: A deep attention-based spatially recursive model is proposed that learns to attend to critical object parts and encode them into spatially expressive representations; it is end-to-end trainable and serves as both the part detector and the feature extractor.