Detect to Track and Track to Detect

doi:10.1109/ICCV.2017.330

Open AccessProceedings ArticleDOI

Detect to Track and Track to Detect

Christoph Feichtenhofer, +2 more

- pp 3057-3065

Chats0

TLDR

This paper sets up a ConvNet architecture for simultaneous detection and tracking, using a multi-task objective for frame-based object detection and across-frame track regression, and introduces correlation features that represent object co-occurrences across time to aid the ConvNet during tracking.

Abstract:

Recent approaches for high accuracy detection and tracking of object categories in video consist of complex multistage solutions that become more cumbersome each year. In this paper we propose a ConvNet architecture that jointly performs detection and tracking, solving the task in a simple and effective way. Our contributions are threefold: (i) we set up a ConvNet architecture for simultaneous detection and tracking, using a multi-task objective for frame-based object detection and across-frame track regression; (ii) we introduce correlation features that represent object co-occurrences across time to aid the ConvNet during tracking; and (iii) we link the frame level detections based on our across-frame tracklets to produce high accuracy detections at the video level. Our ConvNet architecture for spatiotemporal object detection is evaluated on the large-scale ImageNet VID dataset where it achieves state-of-the-art results. Our approach provides better single model performance than the winning method of the last ImageNet challenge while being conceptually much simpler. Finally, we show that by increasing the temporal stride we can dramatically increase the tracker speed.

Citations

PDF

Open Access

More filters

Journal ArticleDOI

Deep Learning for Generic Object Detection: A Survey

Li Liu, +7 more

- 01 Feb 2020 -

International Journal of Computer Vision

TL;DR: A comprehensive survey of the recent achievements in this field brought about by deep learning techniques, covering many aspects of generic object detection: detection frameworks, object feature representation, object proposal generation, context modeling, training strategies, and evaluation metrics.

...read moreread less

Proceedings ArticleDOI

Fast Online Object Tracking and Segmentation: A Unifying Approach

Qiang Wang, +4 more

TL;DR: This method improves the offline training procedure of popular fully-convolutional Siamese approaches for object tracking by augmenting their loss with a binary segmentation task, and operates online, producing class-agnostic object segmentation masks and rotated bounding boxes at 55 frames per second.

...read moreread less

Journal ArticleDOI

A Survey of Deep Learning-Based Object Detection

Licheng Jiao, +6 more

- 05 Sep 2019 -

IEEE Access

TL;DR: This survey provides a comprehensive overview of a variety of object detection methods in a systematic manner, covering the one-stage and two-stage detectors, and lists the traditional and new applications.

...read moreread less

Book ChapterDOI

Tracking Objects as Points

Xingyi Zhou, +2 more

TL;DR: CenterTrack as mentioned in this paper applies a detection model to a pair of images and detections from the prior frame, given this minimal input, localizes objects and predicts their associations with the previous frame.

...read moreread less

Proceedings ArticleDOI

Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net

Wenjie Luo, +2 more

TL;DR: A novel deep neural network that is able to jointly reason about 3D detection, tracking and motion forecasting given data captured by a 3D sensor is proposed, which is very efficient in terms of both memory and computation.

...read moreread less

Collapse

References

PDF

Open Access

More filters

Proceedings ArticleDOI

Deep Residual Learning for Image Recognition

Kaiming He, +3 more

TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.

...read moreread less

Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky, +2 more

TL;DR: The state-of-the-art performance of CNNs was achieved by Deep Convolutional Neural Networks (DCNNs) as discussed by the authors, which consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.

...read moreread less

Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan, +1 more

TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.

...read moreread less

Proceedings ArticleDOI

Going deeper with convolutions

Christian Szegedy, +8 more

TL;DR: Inception as mentioned in this paper is a deep convolutional neural network architecture that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).

...read moreread less

Journal ArticleDOI

ImageNet Large Scale Visual Recognition Challenge

Olga Russakovsky, +11 more

- 01 Dec 2015 -

International Journal of Computer Vision

TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) as mentioned in this paper is a benchmark in object category classification and detection on hundreds of object categories and millions of images, which has been run annually from 2010 to present, attracting participation from more than fifty institutions.

...read moreread less

Collapse

Detect to Track and Track to Detect

Citations

Deep Learning for Generic Object Detection: A Survey

Fast Online Object Tracking and Segmentation: A Unifying Approach

A Survey of Deep Learning-Based Object Detection

Tracking Objects as Points

Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net

References

Deep Residual Learning for Image Recognition

ImageNet Classification with Deep Convolutional Neural Networks

Very Deep Convolutional Networks for Large-Scale Image Recognition

Going deeper with convolutions

ImageNet Large Scale Visual Recognition Challenge

Related Papers (5)

SSD: Single Shot MultiBox Detector

Deep Residual Learning for Image Recognition

Faster R-CNN: towards real-time object detection with region proposal networks

You Only Look Once: Unified, Real-Time Object Detection

Fast R-CNN