Showing papers by "Sebastian Thrun" published in 2016


Book ChapterDOI
08 Oct 2016
TL;DR: This work proposes a method for offline training of neural networks that can track novel objects at test-time at 100 fps. This is significantly faster than previous neural-network trackers, which are typically too slow to be practical for real-time applications.
Abstract: Machine learning techniques are often used in computer vision due to their ability to leverage large amounts of training data to improve performance. Unfortunately, most generic object trackers are still trained from scratch online and do not benefit from the large number of videos that are readily available for offline training. We propose a method for offline training of neural networks that can track novel objects at test-time at 100 fps. Our tracker is significantly faster than previous methods that use neural networks for tracking, which are typically very slow to run and not practical for real-time applications. Our tracker uses a simple feed-forward network with no online training required. The tracker learns a generic relationship between object motion and appearance and can be used to track novel objects that do not appear in the training set. We test our network on a standard tracking benchmark to demonstrate our tracker’s state-of-the-art performance. Further, our performance improves as we add more videos to our offline training set. To the best of our knowledge, our tracker is the first neural-network tracker that learns to track generic objects at 100 fps. (Our tracker is available at http://davheld.github.io/GOTURN/GOTURN.html.)

941 citations


Posted Content
TL;DR: In this paper, the authors propose a method for offline training of neural networks that can track novel objects at test-time at 100 fps. This is significantly faster than previous neural-network trackers, which are typically too slow to be practical for real-time applications.
Abstract: Machine learning techniques are often used in computer vision due to their ability to leverage large amounts of training data to improve performance. Unfortunately, most generic object trackers are still trained from scratch online and do not benefit from the large number of videos that are readily available for offline training. We propose a method for offline training of neural networks that can track novel objects at test-time at 100 fps. Our tracker is significantly faster than previous methods that use neural networks for tracking, which are typically very slow to run and not practical for real-time applications. Our tracker uses a simple feed-forward network with no online training required. The tracker learns a generic relationship between object motion and appearance and can be used to track novel objects that do not appear in the training set. We test our network on a standard tracking benchmark to demonstrate our tracker's state-of-the-art performance. Further, our performance improves as we add more videos to our offline training set. To the best of our knowledge, our tracker is the first neural-network tracker that learns to track generic objects at 100 fps.

782 citations


Proceedings ArticleDOI
18 Jun 2016
TL;DR: A probabilistic 3D segmentation method that combines spatial, temporal, and semantic information to make better-informed decisions about how to segment a scene. It significantly reduces both undersegmentations and oversegmentations on the KITTI dataset while still running in real-time.
Abstract: In order to track dynamic objects in a robot’s environment, one must first segment the scene into a collection of separate objects. Most real-time robotic vision systems today rely on simple spatial relations to segment the scene into separate objects. However, such methods fail under a variety of real-world situations such as occlusions or crowds of closely-packed objects. We propose a probabilistic 3D segmentation method that combines spatial, temporal, and semantic information to make better-informed decisions about how to segment a scene. We begin with a coarse initial segmentation. We then compute the probability that a given segment should be split into multiple segments or that multiple segments should be merged into a single segment, using spatial, semantic, and temporal cues. Our probabilistic segmentation framework enables us to significantly reduce both undersegmentations and oversegmentations on the KITTI dataset [3, 4, 5] while still running in real-time. By combining spatial, temporal, and semantic information, we are able to create a more robust 3D segmentation system that leads to better overall perception in crowded dynamic environments.

54 citations


Proceedings ArticleDOI
16 May 2016
TL;DR: This work has developed a novel procedure for training a neural network to recognize a set of objects from just a single training image per object, and demonstrates that it significantly outperforms previous state-of-the-art approaches.
Abstract: Some robots must repeatedly interact with a fixed set of objects in their environment. To operate correctly, it is helpful for the robot to be able to recognize the object instances that it repeatedly encounters. However, current methods for recognizing object instances require that, during training, many pictures are taken of each object from a large number of viewing angles. This procedure is slow and requires much manual effort before the robot can begin to operate in a new environment. We have developed a novel procedure for training a neural network to recognize a set of objects from just a single training image per object. To obtain robustness to changes in viewpoint, we take advantage of a supplementary dataset in which we observe a separate (non-overlapping) set of objects from multiple viewpoints. After pre-training the network in a novel multi-stage fashion, the network can robustly recognize new object instances given just a single training image of each object. If more images of each object are available, the performance improves. We perform a thorough analysis comparing our novel training procedure to traditional neural network pre-training techniques as well as previous state-of-the-art approaches including keypoint-matching, template-matching, and sparse coding, and we demonstrate that our method significantly outperforms these previous approaches. Our method can thus be used to easily teach a robot to recognize a novel set of object instances from unknown viewpoints.

41 citations


Journal ArticleDOI
TL;DR: A tracker that combines 3D shape, color, and motion cues in a probabilistic framework that is able to robustly handle changes in viewpoint, occlusions, and lighting variations for moving objects of a variety of shapes, sizes, and distances is presented.
Abstract: Real-time tracking algorithms often suffer from low accuracy and poor robustness when confronted with difficult, real-world data. We present a tracker that combines 3D shape, color when available, and motion cues to accurately track moving objects in real-time. Our tracker allocates computational effort based on the shape of the posterior distribution. Starting with a coarse approximation to the posterior, the tracker successively refines this distribution, increasing in tracking accuracy over time. The tracker can thus be run for any amount of time, after which the current approximation to the posterior is returned. Even at a minimum runtime of 0.37 ms per object, our method outperforms all of the baseline methods of similar speed by at least 25% in root-mean-square (RMS) tracking error. If our tracker is allowed to run for longer, the accuracy continues to improve, and it continues to outperform all baseline methods. Our tracker is thus anytime, allowing the speed or accuracy to be optimized based on the needs of the application. By combining 3D shape, color when available, and motion cues in a probabilistic framework, our tracker is able to robustly handle changes in viewpoint, occlusions, and lighting variations for moving objects of a variety of shapes, sizes, and distances.
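
Illustration (not from the paper): the "anytime" idea in the abstract above, where a coarse approximation to the posterior is refined for as long as time allows, can be sketched on a toy 1D problem. This is a minimal sketch under stated assumptions, not the authors' tracker: the function name `anytime_estimate`, the uniform prior, the halving of cells, and the `keep_mass` threshold are all hypothetical choices made for illustration, and the real method operates on 3D shape, color, and motion cues rather than a scalar.

```python
import math

def anytime_estimate(log_likelihood, lo, hi, levels, cells=8, keep_mass=0.95):
    """Coarse-to-fine grid approximation of a 1D posterior (uniform prior).

    Returns the posterior mean after `levels` refinement rounds. Every round
    leaves a usable estimate behind, so the loop can be cut off at any point.
    All parameter names here are illustrative assumptions.
    """
    # Each cell is (left_edge, width); start with one coarse grid over [lo, hi].
    width = (hi - lo) / cells
    grid = [(lo + i * width, width) for i in range(cells)]
    estimate = (lo + hi) / 2.0  # fallback when no round has run yet
    for _ in range(levels):
        centers = [left + w / 2.0 for left, w in grid]
        # Unnormalized mass per cell: likelihood at the center times cell width.
        logs = [log_likelihood(c) for c in centers]
        peak = max(logs)  # subtract the peak for numerical stability
        mass = [math.exp(l - peak) * w for l, (_, w) in zip(logs, grid)]
        total = sum(mass)
        probs = [m / total for m in mass]
        estimate = sum(p * c for p, c in zip(probs, centers))
        # Spend the next round only on the cells that hold most of the mass.
        ranked = sorted(zip(probs, grid), key=lambda t: t[0], reverse=True)
        kept, covered = [], 0.0
        for p, cell in ranked:
            kept.append(cell)
            covered += p
            if covered >= keep_mass:
                break
        # Split every kept cell in half for a finer approximation.
        grid = [(left + k * w / 2.0, w / 2.0) for left, w in kept for k in range(2)]
    return estimate
```

Calling it with a larger `levels` value corresponds to letting the tracker run longer. In this toy version the estimate generally tightens with more rounds, although individual rounds are not guaranteed to reduce the error.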

40 citations


Posted Content
TL;DR: A novel data synthesis technique is introduced that merges images of individual skin lesions with full-body images and heavily augments them to generate significant amounts of data. The resulting system is intended for potential clinical use to augment the capabilities of healthcare providers.
Abstract: Dense object detection and temporal tracking are needed across application domains ranging from people-tracking to analysis of satellite imagery over time. The detection and tracking of malignant skin cancers and benign moles pose a particularly challenging problem due to the general uniformity of large skin patches, the fact that skin lesions vary little in their appearance, and the relatively small amount of data available. Here we introduce a novel data synthesis technique that merges images of individual skin lesions with full-body images and heavily augments them to generate significant amounts of data. We build a convolutional neural network (CNN) based system, trained on this synthetic data, and demonstrate superior performance to traditional detection and tracking techniques. Additionally, we compare our system to humans trained with simple criteria. Our system is intended for potential clinical use to augment the capabilities of healthcare providers. While domain-specific, we believe the methods invoked in this work will be useful in applying CNNs across domains that suffer from limited data availability.
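
Illustration (not from the paper): the data synthesis step described above, merging small lesion images into a full-body image and augmenting them, can be sketched with plain 2D lists standing in for images. This is a rough sketch under stated assumptions: the function names `augment` and `synthesize`, the choice of 90-degree rotations and mirroring as augmentations, and the hard paste without blending are all hypothetical simplifications, and the abstract does not specify which augmentations the authors use.

```python
import random

def augment(patch, rng):
    """Randomly rotate a 2D patch by a multiple of 90 degrees and maybe mirror it."""
    for _ in range(rng.randrange(4)):
        # Rotate 90 degrees clockwise: reverse the rows, then transpose.
        patch = [list(row) for row in zip(*patch[::-1])]
    if rng.random() < 0.5:
        patch = [row[::-1] for row in patch]  # horizontal mirror
    return patch

def synthesize(background, patches, count, seed=0):
    """Paste `count` augmented patches onto a copy of `background`.

    Returns (image, boxes), where each box is (top, left, height, width).
    The boxes come for free and can serve as detection labels. Later
    patches may overwrite earlier ones; this sketch does not prevent overlap.
    """
    rng = random.Random(seed)
    image = [list(row) for row in background]  # leave the input untouched
    rows, cols = len(image), len(image[0])
    boxes = []
    for _ in range(count):
        patch = augment(rng.choice(patches), rng)
        h, w = len(patch), len(patch[0])
        top = rng.randrange(rows - h + 1)
        left = rng.randrange(cols - w + 1)
        for r in range(h):
            image[top + r][left:left + w] = patch[r]
        boxes.append((top, left, h, w))
    return image, boxes
```

Varying `seed` yields a different synthetic image and label set from the same handful of patches, which is the sense in which a small lesion collection can be stretched into a much larger training set.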

10 citations


Proceedings Article
01 Dec 2016
TL;DR: In this article, a novel data synthesis technique was introduced that merges images of individual skin lesions with full-body images and heavily augments them to generate significant amounts of data.
Abstract: Dense object detection and temporal tracking are needed across application domains ranging from people-tracking to analysis of satellite imagery over time. The detection and tracking of malignant skin cancers and benign moles pose a particularly challenging problem due to the general uniformity of large skin patches, the fact that skin lesions vary little in their appearance, and the relatively small amount of data available. Here we introduce a novel data synthesis technique that merges images of individual skin lesions with full-body images and heavily augments them to generate significant amounts of data. We build a convolutional neural network (CNN) based system, trained on this synthetic data, and demonstrate superior performance to traditional detection and tracking techniques. Additionally, we compare our system to humans trained with simple criteria. Our system is intended for potential clinical use to augment the capabilities of healthcare providers. While domain-specific, we believe the methods invoked in this work will be useful in applying CNNs across domains that suffer from limited data availability.

3 citations