Accurate Single Stage Detector Using Recurrent Rolling Convolution

doi:10.1109/CVPR.2017.87

Open Access Proceedings Article

- pp 752-760

TLDR

The authors propose a Recurrent Rolling Convolution (RRC) architecture over multi-scale feature maps to construct object classifiers and bounding box regressors that are deep in context.

Abstract:

Most of the recent successful methods in accurate object detection and localization use some variant of the R-CNN style two-stage Convolutional Neural Network (CNN), in which plausible regions are proposed in the first stage and a second stage refines the decision. Despite their simplicity of training and efficiency in deployment, single-stage detection methods have not been as competitive on benchmarks that consider mAP at high IoU thresholds. In this paper, we propose a novel single-stage, end-to-end trainable object detection network to overcome this limitation. We achieve this by introducing a Recurrent Rolling Convolution (RRC) architecture over multi-scale feature maps to construct object classifiers and bounding box regressors that are deep in context. We evaluated our method on the challenging KITTI dataset, which measures methods under an IoU threshold of 0.7. We show that with RRC, a single reduced VGG-16 based model already significantly outperforms all previously published results. At the time this paper was written, our models ranked first in KITTI car detection (the hard level), first in cyclist detection, and second in pedestrian detection. Previous single-stage methods had not reached these results. The code is publicly available.
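To make the IoU criterion mentioned in the abstract concrete, the following is a minimal sketch of how the overlap between a predicted and a ground-truth box can be computed and compared against a 0.7 threshold. The function names (`iou`, `is_match`) and the corner-coordinate box layout `(x1, y1, x2, y2)` are assumptions made for illustration; this is not the KITTI evaluation code or the authors' implementation.

```python
from typing import Tuple

# Assumed box layout for this sketch: (x1, y1, x2, y2) with x1 <= x2, y1 <= y2.
Box = Tuple[float, float, float, float]


def area(box: Box) -> float:
    """Area of an axis-aligned box; degenerate boxes have area 0."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Corners of the intersection rectangle.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Width and height are clamped to 0 when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred: Box, gt: Box, threshold: float = 0.7) -> bool:
    """A prediction matches a ground-truth box if their IoU reaches the threshold."""
    return iou(pred, gt) >= threshold


if __name__ == "__main__":
    gt = (0.0, 0.0, 10.0, 10.0)
    print(is_match((1.0, 0.0, 11.0, 10.0), gt))  # small shift: IoU = 90/110
    print(is_match((5.0, 0.0, 15.0, 10.0), gt))  # large shift: IoU = 50/150
```

In this example, a box shifted by one tenth of its width still matches at the 0.7 threshold, while a box shifted by half its width (IoU of 1/3) does not. This illustrates why benchmarks with high IoU thresholds reward precise localization.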

Citations

Book Chapter

Tracking Objects as Points

Xingyi Zhou, +2 more

TL;DR: CenterTrack applies a detection model to a pair of images and the detections from the prior frame; given this minimal input, it localizes objects and predicts their associations with the previous frame.

Proceedings Article

Multi-Task Multi-Sensor Fusion for 3D Object Detection

Ming Liang, +4 more

TL;DR: An end-to-end learnable architecture that reasons about 2D and 3D object detection as well as ground estimation and depth completion is presented; it leads the KITTI benchmark on 2D, 3D and bird's eye view object detection while running in real time.

Journal Article

Deep learning in video multi-object tracking: A survey

Gioele Ciaparrone, +6 more

- 14 Mar 2020 -

Neurocomputing

TL;DR: A comprehensive survey on works that employ Deep Learning models to solve the task of MOT on single-camera videos, identifying a number of similarities among the top-performing methods and presenting some possible future research directions.

Posted Content

Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving

Yan Wang, +5 more

- 18 Dec 2018 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: This paper proposes to convert image-based depth maps to pseudo-LiDAR representations, essentially mimicking the LiDAR signal, and achieves impressive improvements over the existing state-of-the-art in image-based performance.

Journal Article

Recent Advances in Deep Learning for Object Detection

Xiongwei Wu, +3 more

- 05 Jul 2020 -

Neurocomputing

TL;DR: A comprehensive survey of recent advances in visual object detection with deep learning can be found in this article, where the authors systematically analyze the existing object detection frameworks and organize the survey into three major parts: detection components, learning strategies, and applications and benchmarks.

References

Journal Article

Long short-term memory

Sepp Hochreiter, +1 more

- 01 Nov 1997 -

Neural Computation

TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.

Proceedings Article

You Only Look Once: Unified, Real-Time Object Detection

Joseph Redmon, +3 more

TL;DR: Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background, and outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.

Journal Article

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Shaoqing Ren, +3 more

- 01 Jun 2017 -

IEEE Transactions on Pattern Analysis an...

TL;DR: This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals, and further merges RPN and Fast R-CNN into a single network by sharing their convolutional features.

Posted Content

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Shaoqing Ren, +3 more

- 04 Jun 2015 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: Faster R-CNN introduces a Region Proposal Network (RPN) to generate high-quality region proposals, which are used by Fast R-CNN for detection.

Proceedings Article

Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation

Ross Girshick, +3 more

TL;DR: R-CNN combines CNNs with bottom-up region proposals to localize and segment objects; when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost.

Related Papers (5)

Faster R-CNN: towards real-time object detection with region proposal networks

SSD: Single Shot MultiBox Detector

Deep Residual Learning for Image Recognition

Fast R-CNN

You Only Look Once: Unified, Real-Time Object Detection