Proceedings Article

BoxCars: 3D Boxes as CNN Input for Improved Fine-Grained Vehicle Recognition

TLDR
This work shows that extracting additional data from the video stream and feeding it into a deep convolutional neural network considerably boosts recognition performance, which can in turn improve traffic surveillance systems.
Abstract
We address the problem of fine-grained vehicle make and model recognition and verification. Our contribution is to show that extracting additional data from the video stream, besides the vehicle image itself, and feeding it into a deep convolutional neural network boosts recognition performance considerably. This additional information includes the 3D vehicle bounding box used for "unpacking" the vehicle image, its rasterized low-resolution shape, and information about the 3D vehicle orientation. Experiments show that adding this information decreases classification error by 26% (accuracy improves from 0.772 to 0.832) and raises verification average precision to 208% of the baseline value (from 0.378 to 0.785), compared to a baseline pure CNN without any input modifications. The pure baseline CNN also outperforms the recent state-of-the-art solution by 0.081. We provide "BoxCars", an annotated set of surveillance vehicle images augmented with various automatically extracted auxiliary information. Our approach and the dataset can considerably improve the performance of traffic surveillance systems.
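As a quick check of the arithmetic behind these figures, the following minimal Python sketch recomputes the relative changes from the raw numbers quoted in the abstract. The function names are illustrative assumptions and are not taken from the paper or its code.

```python
def relative_error_reduction(acc_before, acc_after):
    """Fraction by which the classification error (1 - accuracy) shrinks."""
    err_before = 1.0 - acc_before
    err_after = 1.0 - acc_after
    return (err_before - err_after) / err_before


def ratio_and_relative_gain(before, after):
    """Return the pair (after / before, (after - before) / before)."""
    return after / before, (after - before) / before


# Figures quoted in the abstract: accuracy 0.772 -> 0.832,
# verification average precision 0.378 -> 0.785.
error_cut = relative_error_reduction(0.772, 0.832)
ap_ratio, ap_gain = ratio_and_relative_gain(0.378, 0.785)

print(f"classification error reduced by {error_cut:.1%}")
print(f"verification AP ratio: {ap_ratio:.1%}, relative gain: {ap_gain:.1%}")
```

Under these definitions the error reduction comes out at about 26%, and the average-precision ratio at about 208% of the baseline, which corresponds to a relative gain of about 108%.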


Citations
Proceedings Article

Learning a Discriminative Filter Bank Within a CNN for Fine-Grained Recognition

TL;DR: The authors propose a bank of convolutional filters that captures class-specific discriminative patches without extra part or bounding-box annotations and achieves state-of-the-art performance on three publicly available fine-grained recognition datasets (CUB-200-2011, Stanford Cars and FGVC-Aircraft).
Proceedings Article

Orientation Invariant Feature Embedding and Spatial Temporal Regularization for Vehicle Re-identification

TL;DR: Both the orientation-invariant feature embedding and the spatio-temporal regularization achieve considerable improvements on the vehicle re-identification problem.
Proceedings Article

Learning Deep Neural Networks for Vehicle Re-ID with Visual-spatio-Temporal Path Proposals

TL;DR: In this article, a Siamese-CNN+Path-LSTM model was proposed to incorporate complex spatio-temporal information for regularizing the re-ID results.
Book Chapter

Hierarchical Bilinear Pooling for Fine-Grained Visual Recognition

TL;DR: A cross-layer bilinear pooling approach is proposed to capture inter-layer part feature relations, which results in superior performance compared with other bilinear-pooling-based approaches.
Journal Article

Deep Attention-Based Spatially Recursive Networks for Fine-Grained Visual Recognition

TL;DR: A deep attention-based spatially recursive model learns to attend to critical object parts and encode them into spatially expressive representations; it is end-to-end trainable and serves as both the part detector and the feature extractor.