Tracking the Untrackable: Learning to Track Multiple Cues with Long-Term Dependencies

doi:10.1109/ICCV.2017.41

Proceedings Article

Amir Sadeghian¹, Alexandre Alahi¹, Silvio Savarese¹

Stanford University¹

01 Oct 2017, pp. 300-311

TL;DR: A structure of Recurrent Neural Networks (RNNs) is proposed that jointly reasons on multiple cues over a temporal window, making it possible to correct data association errors and recover observations from occluded states.

Abstract: The majority of existing solutions to the Multi-Target Tracking (MTT) problem do not combine cues over a long period of time in a coherent fashion. In this paper, we present an online method that encodes long-term temporal dependencies across multiple cues. One key challenge of tracking methods is to accurately track occluded targets or those that share similar appearance properties with surrounding objects. To address this challenge, we present a structure of Recurrent Neural Networks (RNN) that jointly reasons on multiple cues over a temporal window. Our method can correct data association errors and recover observations from occluded states. We demonstrate the robustness of our data-driven approach by tracking multiple targets using their appearance, motion, and even interactions. Our method outperforms previous works on multiple publicly available datasets, including the challenging MOT benchmark.
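
To make "jointly reasoning on multiple cues over a temporal window" concrete, here is a minimal sketch under stated assumptions, not the authors' architecture: each cue (appearance, motion, interaction) is encoded by its own vanilla tanh RNN over a window of T frames, the final hidden states are concatenated, and a logistic layer maps them to a track-to-detection association score in (0, 1). The feature dimensions, the Elman cells, the random (untrained) weights, and the names `rnn_encode` and `multi_cue_score` are all hypothetical.

```python
import numpy as np

def rnn_encode(seq, W_x, W_h, b):
    """Run a vanilla (Elman) RNN over seq of shape (T, d_in); return the last hidden state."""
    h = np.zeros(W_h.shape[0])
    for x_t in seq:
        h = np.tanh(W_x @ x_t + W_h @ h + b)
    return h

def multi_cue_score(cues, params, w_out, b_out):
    """Encode each cue's window with its own RNN, concatenate, and squash to a score in (0, 1)."""
    hidden = [rnn_encode(cues[name], *params[name]) for name in sorted(cues)]
    fused = np.concatenate(hidden)
    return 1.0 / (1.0 + np.exp(-(w_out @ fused + b_out)))

# Hypothetical setup: 6-frame window, 8 hidden units per cue, random untrained weights.
rng = np.random.default_rng(0)
T, hid = 6, 8
dims = {"appearance": 16, "motion": 2, "interaction": 4}
cues = {k: rng.normal(size=(T, d)) for k, d in dims.items()}
params = {
    k: (rng.normal(scale=0.1, size=(hid, d)),    # input-to-hidden weights
        rng.normal(scale=0.1, size=(hid, hid)),  # hidden-to-hidden weights
        np.zeros(hid))                           # bias
    for k, d in dims.items()
}
w_out = rng.normal(scale=0.1, size=hid * len(dims))
score = multi_cue_score(cues, params, w_out, 0.0)
```

In a tracker, one such score per (track, detection) pair would fill an affinity matrix that is then handed to an assignment solver; a trained model would learn all weights from labeled tracks.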

Citations

Proceedings Article

SoPhie: An Attentive GAN for Predicting Paths Compliant to Social and Physical Constraints

Amir Sadeghian¹, Vineet Kosaraju¹, Ali Sadeghian², Noriaki Hirose¹, Hamid Rezatofighi¹, Silvio Savarese¹

Stanford University¹, University of Florida²

01 Jun 2019

TL;DR: An interpretable framework based on a Generative Adversarial Network (GAN) is proposed for predicting the paths of multiple interacting agents in a scene. It leverages two sources of information: the path history of all agents in the scene and scene context information extracted from images of the scene.

Abstract: This paper addresses the problem of path prediction for multiple interacting agents in a scene, which is a crucial step for many autonomous platforms such as self-driving cars and social robots. We present SoPhie, an interpretable framework based on a Generative Adversarial Network (GAN), which leverages two sources of information: the path history of all the agents in a scene, and the scene context information, using images of the scene. To predict a future path for an agent, both physical and social information must be leveraged. Previous work has not been successful in jointly modeling physical and social interactions. Our approach blends a social attention mechanism with a physical attention mechanism. The physical attention helps the model learn where to look in a large scene and extract the most salient parts of the image relevant to the path, whereas the social attention component aggregates information across the different agent interactions and extracts the most important trajectory information from the surrounding neighbors. SoPhie also takes advantage of the GAN to generate more realistic samples and to capture the uncertain nature of future paths by modeling their distribution. All these mechanisms enable our approach to predict socially and physically plausible paths for the agents and to achieve state-of-the-art performance on several different trajectory forecasting benchmarks.
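
The abstract describes SoPhie's attention modules only at a high level. As an assumption-labeled sketch of the general idea behind a social attention component, the snippet below uses generic scaled dot-product soft attention: a query vector for the target agent scores each neighbor's encoding, a softmax turns the scores into weights, and the weighted sum becomes the social context. The function name `soft_attention` and the random encodings are hypothetical, and SoPhie's actual scoring function may differ.

```python
import numpy as np

def soft_attention(query, keys, values):
    """Scaled dot-product soft attention.

    query: (d,), keys: (n, d), values: (n, d_v).
    Returns (context of shape (d_v,), weights of shape (n,) summing to 1).
    """
    scores = keys @ query / np.sqrt(query.shape[0])
    weights = np.exp(scores - scores.max())  # subtract the max for numerical stability
    weights = weights / weights.sum()        # softmax over neighbors
    return weights @ values, weights

rng = np.random.default_rng(1)
agent = rng.normal(size=4)           # hypothetical encoding of the target agent's path
neighbors = rng.normal(size=(3, 4))  # hypothetical encodings of 3 surrounding agents
context, w = soft_attention(agent, neighbors, neighbors)
```

When all neighbors score equally, the weights are uniform and the context is simply the mean of the neighbor values; learned scores let the model emphasize the most relevant neighbors instead.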

752 citations

Journal Article

Deep learning for cellular image analysis

Erick Moen¹, Dylan Bannon¹, Takamasa Kudo², William Graf¹, Markus W. Covert², David Van Valen¹

California Institute of Technology¹, Stanford University²

27 May 2019, Nature Methods

TL;DR: The intersection between deep learning and cellular image analysis is reviewed, and an overview is provided of both the mathematical mechanics and the programming frameworks of deep learning that are pertinent to life scientists.

Abstract: Recent advances in computer vision and machine learning underpin a collection of algorithms with an impressive ability to decipher the content of images. These deep learning algorithms are being applied to biological images and are transforming the analysis and interpretation of imaging data. These advances are positioned to render difficult analyses routine and to enable researchers to carry out new, previously impossible experiments. Here we review the intersection between deep learning and cellular image analysis and provide an overview of both the mathematical mechanics and the programming frameworks of deep learning that are pertinent to life scientists. We survey the field's progress in four key applications: image classification, image segmentation, object tracking, and augmented microscopy. Last, we relay our labs' experience with three key aspects of implementing deep learning in the laboratory: annotating training data, selecting and training a range of neural network architectures, and deploying solutions. We also highlight existing datasets and implementations for each surveyed application.

714 citations

Book Chapter

Tracking Objects as Points

Xingyi Zhou¹, Vladlen Koltun², Philipp Krähenbühl¹

University of Texas at Austin¹, Intel²

23 Aug 2020

TL;DR: CenterTrack applies a detection model to a pair of images and the detections from the prior frame; given this minimal input, it localizes objects and predicts their associations with the previous frame.

Abstract: Tracking has traditionally been the art of following interest points through space and time. This changed with the rise of powerful deep networks. Nowadays, tracking is dominated by pipelines that perform object detection followed by temporal association, also known as tracking-by-detection. We present a simultaneous detection and tracking algorithm that is simpler, faster, and more accurate than the state of the art. Our tracker, CenterTrack, applies a detection model to a pair of images and detections from the prior frame. Given this minimal input, CenterTrack localizes objects and predicts their associations with the previous frame. That’s it. CenterTrack is simple, online (no peeking into the future), and real-time. It achieves 67.8% MOTA on the MOT17 challenge at 22 FPS and 89.4% MOTA on the KITTI tracking benchmark at 15 FPS, setting a new state of the art on both datasets. CenterTrack is easily extended to monocular 3D tracking by regressing additional 3D attributes. Using monocular video input, it achieves 28.3% AMOTA@0.2 on the newly released nuScenes 3D tracking benchmark, substantially outperforming the monocular baseline on this benchmark while running at 28 FPS.
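
One simple way to turn per-object offset predictions into links with the previous frame is greedy nearest-center matching, sketched below under stated assumptions: each current detection carries a predicted displacement (dx, dy) since the previous frame, detections are processed in descending confidence, and a match must fall strictly inside a fixed `radius`. The tuple layout, the fixed radius, and the name `greedy_associate` are hypothetical choices for illustration, not a statement of CenterTrack's exact rule.

```python
import math

def greedy_associate(curr, prev, radius):
    """Greedily link current detections to previous-frame centers.

    curr: list of (x, y, dx, dy, score); (dx, dy) is the predicted displacement
          since the previous frame (hypothetical layout).
    prev: list of (x, y) centers from the previous frame.
    Returns matches[i] = index into prev, or None (start a new track).
    """
    order = sorted(range(len(curr)), key=lambda i: -curr[i][4])  # high confidence first
    used = set()
    matches = [None] * len(curr)
    for i in order:
        x, y, dx, dy, _ = curr[i]
        px, py = x - dx, y - dy  # estimated center in the previous frame
        best, best_d = None, radius
        for j, (qx, qy) in enumerate(prev):
            if j in used:
                continue
            d = math.hypot(px - qx, py - qy)
            if d < best_d:  # closest unmatched center strictly inside the radius
                best, best_d = j, d
        if best is not None:
            used.add(best)
            matches[i] = best
    return matches

prev = [(10.0, 10.0), (50.0, 50.0)]
curr = [(52.0, 51.0, 2.0, 1.0, 0.9), (12.0, 10.0, 2.0, 0.0, 0.8), (200.0, 200.0, 0.0, 0.0, 0.7)]
matches = greedy_associate(curr, prev, radius=5.0)  # [1, 0, None]
```

Greedy matching is not globally optimal like the Hungarian method, but when the predicted offsets are accurate most pairs are unambiguous and the greedy pass is cheap.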

657 citations

Journal Article

FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking

Yifu Zhang¹, Chunyu Wang², Xinggang Wang¹, Wenjun Zeng², Wenyu Liu¹

Huazhong University of Science and Technology¹, Microsoft²

04 Apr 2020, arXiv: Computer Vision and Pattern Recognition

TL;DR: FairMOT, a simple approach consisting of two homogeneous branches that predict pixel-wise objectness scores and re-ID features, obtains high levels of detection and tracking accuracy and outperforms previous state-of-the-art methods by a large margin on several public datasets.

Abstract: There has been remarkable progress in recent years on object detection and re-identification (re-ID), which are the key components of multi-object tracking. However, little attention has been paid to accomplishing the two tasks jointly in a single network. Our study shows that previous attempts ended up with degraded accuracy mainly because the re-ID task is not learned fairly, which causes many identity switches. The unfairness is twofold: (1) they treat re-ID as a secondary task whose accuracy heavily depends on the primary detection task, so training is largely biased toward detection and neglects re-ID; (2) they use ROI-Align, borrowed directly from object detection, to extract re-ID features. However, this introduces a lot of ambiguity in characterizing objects, because many sampling points may belong to interfering instances or the background. To solve these problems, we present a simple approach, FairMOT, which consists of two homogeneous branches that predict pixel-wise objectness scores and re-ID features. The achieved fairness between the tasks allows FairMOT to obtain high levels of detection and tracking accuracy and to outperform previous state-of-the-art methods by a large margin on several public datasets. The source code and pre-trained models are released at this https URL.
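
FairMOT's re-ID branch produces an embedding per object. A common way to use such embeddings for association, assumed here rather than spelled out in the abstract, is a cosine-distance cost matrix between existing tracks and new detections. The sketch below computes that matrix; the name `embedding_cost` and the toy vectors are hypothetical, and a full tracker would typically combine this cost with motion cues and an assignment solver.

```python
import numpy as np

def embedding_cost(track_emb, det_emb, eps=1e-12):
    """Cosine distance (1 - cosine similarity) between every track and every detection.

    track_emb: (n_tracks, d), det_emb: (n_dets, d). Returns (n_tracks, n_dets).
    """
    t = track_emb / (np.linalg.norm(track_emb, axis=1, keepdims=True) + eps)  # L2-normalize rows
    d = det_emb / (np.linalg.norm(det_emb, axis=1, keepdims=True) + eps)
    return 1.0 - t @ d.T

tracks = np.array([[1.0, 0.0], [0.0, 2.0]])  # toy embeddings of two existing tracks
dets = np.array([[0.0, 1.0], [3.0, 0.0]])    # toy embeddings of two new detections
cost = embedding_cost(tracks, dets)
```

Because the rows are normalized, only the direction of each embedding matters: here track 0 matches detection 1 and track 1 matches detection 0 at zero cost, regardless of vector length.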

507 citations

Proceedings Article

Tracking Without Bells and Whistles

Philipp Bergmann¹, Tim Meinhardt¹, Laura Leal-Taixé¹

Technische Universität München¹

01 Oct 2019

TL;DR: Tracktor exploits the bounding box regression of an object detector to predict the position of an object in the next frame, thereby converting a detector into a Tracktor. Extended with a straightforward re-identification and camera motion compensation, it provides a new state of the art on three multi-object tracking benchmarks.

Abstract: The problem of tracking multiple objects in a video sequence poses several challenging tasks. For tracking-by-detection, these include object re-identification, motion prediction and dealing with occlusions. We present a tracker (without bells and whistles) that accomplishes tracking without specifically targeting any of these tasks; in particular, we perform no training or optimization on tracking data. To this end, we exploit the bounding box regression of an object detector to predict the position of an object in the next frame, thereby converting a detector into a Tracktor. We demonstrate the potential of Tracktor and provide a new state of the art on three multi-object tracking benchmarks by extending it with a straightforward re-identification and camera motion compensation. We then perform an analysis on the performance and failure cases of several state-of-the-art tracking methods in comparison to our Tracktor. Surprisingly, none of the dedicated tracking methods are considerably better in dealing with complex tracking scenarios, namely, small and occluded objects or missing detections. However, our approach tackles most of the easy tracking scenarios. Therefore, we motivate our approach as a new tracking paradigm and point out promising future research directions. Overall, Tracktor yields better tracking performance than any current tracking method, and our analysis exposes remaining and unsolved tracking challenges to inspire future research directions.
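
The core Tracktor step regresses the previous frame's box onto the object's new position using the detector's box-regression head. Assuming the common R-CNN delta parameterization (center shifts scaled by box size, log-space width and height changes), which the abstract does not specify, applying predicted deltas to the previous box looks like the sketch below. The deltas are hypothetical inputs standing in for the regression head's output on the new frame, and `apply_box_deltas` is an illustrative name.

```python
import math

def apply_box_deltas(box, deltas):
    """Apply (dx, dy, dw, dh) regression deltas to box = (x1, y1, x2, y2).

    Assumes the usual R-CNN parameterization: the center moves by (dx * w, dy * h)
    and width/height are scaled by exp(dw), exp(dh).
    """
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    dx, dy, dw, dh = deltas
    cx, cy = cx + dx * w, cy + dy * h           # shift the center, scaled by box size
    w, h = w * math.exp(dw), h * math.exp(dh)   # rescale width and height in log space
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)

# Hypothetical example: previous-frame box and deltas from the detector's regression head.
prev_box = (100.0, 50.0, 140.0, 150.0)
next_box = apply_box_deltas(prev_box, (0.05, -0.02, 0.0, 0.1))
```

Zero deltas leave the box unchanged, so a static object keeps its box and identity without any explicit motion model.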

503 citations

References

Proceedings Article

Deep Residual Learning for Image Recognition

Kaiming He¹, Xiangyu Zhang¹, Shaoqing Ren¹, Jian Sun¹

Microsoft¹

27 Jun 2016

TL;DR: The authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously; an ensemble of these residual networks won 1st place on the ILSVRC 2015 classification task.

Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers, 8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won 1st place on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
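
The residual reformulation in the abstract can be written as y = ReLU(F(x) + x), where F is the learned residual function and x is carried forward by an identity shortcut. The sketch below uses two fully-connected layers as a stand-in for the paper's convolutional layers and omits batch normalization; those simplifications, the layer sizes, and the function names are assumptions. The identity shortcut requires F(x) and x to have the same shape.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    """Compute y = ReLU(F(x) + x) with residual branch F(x) = W2 @ ReLU(W1 @ x)."""
    residual = W2 @ relu(W1 @ x)  # the learned residual function F(x)
    return relu(residual + x)     # the identity shortcut adds the input back

rng = np.random.default_rng(0)
d, hidden = 3, 5
x = np.array([1.0, -2.0, 3.0])
y = residual_block(x, rng.normal(size=(hidden, d)), rng.normal(size=(d, hidden)))
```

With all-zero weights the block outputs ReLU(x), i.e. it passes non-negative inputs through unchanged, so an identity mapping costs nothing to represent and the residual branch only has to learn a correction.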

123,388 citations

Posted Content

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

Kaiming He¹, Xiangyu Zhang², Shaoqing Ren¹, Jian Sun¹

Microsoft¹, Xi'an Jiaotong University²

06 Feb 2015, arXiv: Computer Vision and Pattern Recognition

TL;DR: This work proposes a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit and derives a robust initialization method that particularly considers the rectifier nonlinearities.

Abstract: Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra computational cost and little overfitting risk. Second, we derive a robust initialization method that particularly considers the rectifier nonlinearities. This method enables us to train extremely deep rectified models directly from scratch and to investigate deeper or wider network architectures. Based on our PReLU networks (PReLU-nets), we achieve 4.94% top-5 test error on the ImageNet 2012 classification dataset. This is a 26% relative improvement over the ILSVRC 2014 winner (GoogLeNet, 6.66%). To our knowledge, our result is the first to surpass human-level performance (5.1%, Russakovsky et al.) on this visual recognition challenge.
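
Both contributions reduce to short formulas. PReLU computes f(y) = max(0, y) + a·min(0, y), where the slope a is learned (a = 0 recovers ReLU). The rectifier-aware initialization draws zero-mean Gaussian weights whose variance satisfies ½(1 + a²)·n·Var[w] = 1, where n is the fan-in (k²c for a k×k convolution over c input channels); with a = 0 this gives std = sqrt(2/n). In the sketch below a is a fixed argument rather than a learned, per-channel parameter, and the layer sizes are arbitrary examples.

```python
import math
import numpy as np

def prelu(y, a):
    """PReLU: f(y) = max(0, y) + a * min(0, y). a = 0 gives ReLU."""
    y = np.asarray(y, dtype=float)
    return np.maximum(0.0, y) + a * np.minimum(0.0, y)

def he_std(fan_in, a=0.0):
    """Std of zero-mean Gaussian weights satisfying 0.5 * (1 + a**2) * fan_in * Var[w] = 1."""
    return math.sqrt(2.0 / ((1.0 + a * a) * fan_in))

# Example: a 3x3 conv layer over 64 input channels has fan_in = 3 * 3 * 64 = 576.
fan_in = 3 * 3 * 64
rng = np.random.default_rng(0)
W = rng.normal(0.0, he_std(fan_in), size=(128, fan_in))  # 128 filters, flattened
```

The (1 + a²) factor shrinks the weights slightly when the negative slope is non-zero, because a PReLU passes more signal variance than a ReLU.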

11,866 citations

Proceedings Article

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

Kaiming He¹, Xiangyu Zhang², Shaoqing Ren¹, Jian Sun¹

Microsoft¹, Xi'an Jiaotong University²

07 Dec 2015

TL;DR: A Parametric Rectified Linear Unit (PReLU) is proposed that improves model fitting with nearly zero extra computational cost and little overfitting risk; together with a rectifier-aware initialization, it achieves 4.94% top-5 test error on the ImageNet 2012 classification dataset.

Abstract: Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra computational cost and little overfitting risk. Second, we derive a robust initialization method that particularly considers the rectifier nonlinearities. This method enables us to train extremely deep rectified models directly from scratch and to investigate deeper or wider network architectures. Based on the learnable activation and advanced initialization, we achieve 4.94% top-5 test error on the ImageNet 2012 classification dataset. This is a 26% relative improvement over the ILSVRC 2014 winner (GoogLeNet, 6.66% [33]). To our knowledge, our result is the first to surpass the reported human-level performance (5.1%, [26]) on this dataset.

11,732 citations

Journal Article

Social Force Model for Pedestrian Dynamics

Dirk Helbing¹, Péter Molnár¹

University of Stuttgart¹

01 May 1995, Physical Review E

TL;DR: Computer simulations of crowds of interacting pedestrians show that the social force model is capable of describing the self-organization of several observed collective effects of pedestrian behavior very realistically.

Abstract: It is suggested that the motion of pedestrians can be described as if they were subject to "social forces." These "forces" are not directly exerted by the pedestrians' personal environment, but they are a measure for the internal motivations of the individuals to perform certain actions (movements). The corresponding force concept is discussed in more detail and can also be applied to the description of other behaviors. In the presented model of pedestrian behavior several force terms are essential: first, a term describing the acceleration towards the desired velocity of motion; second, terms reflecting that a pedestrian keeps a certain distance from other pedestrians and borders; and third, a term modeling attractive effects. The resulting equations of motion are nonlinearly coupled Langevin equations. Computer simulations of crowds of interacting pedestrians show that the social force model is capable of describing the self-organization of several observed collective effects of pedestrian behavior very realistically.
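
A minimal semi-implicit Euler integration of the model's first two ingredients, relaxation toward a desired velocity and pairwise repulsion between pedestrians, is sketched below. The 1995 paper derives repulsion from an elliptical potential and also includes border and attraction terms and fluctuations; this sketch instead assumes a simpler circular exponential repulsion A·exp(-d/B) and omits borders, attraction, and noise. All parameter values (v0, tau, A, B, dt) are illustrative assumptions.

```python
import numpy as np

def social_force_step(pos, vel, goals, v0=1.3, tau=0.5, A=2.0, B=0.3, dt=0.1):
    """One semi-implicit Euler step for N pedestrians; pos, vel, goals are float arrays (N, 2)."""
    n = len(pos)
    acc = np.zeros_like(vel)
    for i in range(n):
        to_goal = goals[i] - pos[i]
        e = to_goal / (np.linalg.norm(to_goal) + 1e-12)  # unit vector toward the goal
        acc[i] = (v0 * e - vel[i]) / tau                 # relax toward desired velocity v0 * e
        for j in range(n):
            if j == i:
                continue
            diff = pos[i] - pos[j]
            d = np.linalg.norm(diff) + 1e-12
            acc[i] += A * np.exp(-d / B) * diff / d      # assumed repulsion, pointing away from j
    new_vel = vel + acc * dt
    new_pos = pos + new_vel * dt                         # position uses the updated velocity
    return new_pos, new_vel

# Two pedestrians walking toward each other along the x-axis.
pos = np.array([[0.0, 0.0], [0.5, 0.0]])
vel = np.zeros((2, 2))
goals = np.array([[10.0, 0.0], [-10.0, 0.0]])
pos, vel = social_force_step(pos, vel, goals)
```

Alone, a pedestrian at rest would reach a speed of v0·dt/tau = 0.26 m/s after one step; the repulsion from the oncoming neighbor reduces that acceleration for both.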

5,716 citations

Journal Article

Algorithms for the Assignment and Transportation Problems

James Munkres

01 Mar 1957, Journal of the Society for Industrial and Applied Mathematics

TL;DR: Algorithms for the solution of the general assignment and transportation problems are presented, and the assignment algorithm is then generalized to one for the transportation problem.

Abstract: In this paper we present algorithms for the solution of the general assignment and transportation problems. In Section 1, a statement of the algorithm for the assignment problem appears, along with a proof for the correctness of the algorithm. The remarks which constitute the proof are incorporated parenthetically into the statement of the algorithm. Following this appears a discussion of certain theoretical aspects of the problem. In Section 2, the algorithm is generalized to one for the transportation problem. The algorithm of that section is stated as concisely as possible, with theoretical remarks omitted.
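
For readers who want a runnable assignment solver, the sketch below is a compact O(n²m) variant of the Hungarian (Kuhn-Munkres) method that maintains row and column potentials and grows shortest augmenting paths. It solves the same minimum-cost assignment problem but is organized differently from Munkres' step-by-step statement. It assumes a cost matrix with no more rows than columns; the function name `hungarian` is illustrative. In multi-target tracking, rows would typically be tracks and columns detections.

```python
def hungarian(cost):
    """Minimum-cost assignment for an n x m cost matrix (list of lists) with n <= m.

    Returns assignment[i] = column assigned to row i.
    """
    n, m = len(cost), len(cost[0])
    INF = float("inf")
    u = [0.0] * (n + 1)  # row potentials (1-indexed)
    v = [0.0] * (m + 1)  # column potentials (1-indexed)
    p = [0] * (m + 1)    # p[j] = row matched to column j (0 = free)
    way = [0] * (m + 1)  # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (m + 1)  # smallest reduced cost reaching each column
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):  # update potentials to keep reduced costs >= 0
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free column: augmenting path found
                break
        while True:  # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment

cost = [[4, 1, 3], [2, 0, 5], [3, 2, 2]]
assignment = hungarian(cost)  # [1, 0, 2]
total = sum(cost[i][j] for i, j in enumerate(assignment))  # 5
```

Greedily taking the cheapest entry (row 1, column 1, cost 0) would force a worse overall total here; the potentials let the algorithm revise earlier choices so the summed cost is globally minimal.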

3,918 citations