Top 7 papers published by Sebastian Thrun from Stanford University in 2016

Book Chapter•DOI•

Learning to Track at 100 FPS with Deep Regression Networks

David Held¹, Sebastian Thrun¹, Silvio Savarese¹•Institutions (1)

08 Oct 2016

TL;DR: This work proposes a method for offline training of neural networks that can track novel objects at test time at 100 fps, significantly faster than previous neural-network trackers, which are typically too slow for real-time applications.

Abstract: Machine learning techniques are often used in computer vision due to their ability to leverage large amounts of training data to improve performance. Unfortunately, most generic object trackers are still trained from scratch online and do not benefit from the large number of videos that are readily available for offline training. We propose a method for offline training of neural networks that can track novel objects at test-time at 100 fps. Our tracker is significantly faster than previous methods that use neural networks for tracking, which are typically very slow to run and not practical for real-time applications. Our tracker uses a simple feed-forward network with no online training required. The tracker learns a generic relationship between object motion and appearance and can be used to track novel objects that do not appear in the training set. We test our network on a standard tracking benchmark to demonstrate our tracker’s state-of-the-art performance. Further, our performance improves as we add more videos to our offline training set. To the best of our knowledge, our tracker is the first neural-network tracker that learns to track generic objects at 100 fps. Our tracker is available at http://davheld.github.io/GOTURN/GOTURN.html.
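
The per-frame loop implied by this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the `Regressor` signature, the `track` helper, and the `shift_right_stub` stand-in for a trained network are all hypothetical. It shows that a frozen feed-forward model needs only one call per frame with no online weight updates, and that 100 fps leaves a budget of 10 ms per frame.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2), hypothetical convention
Frame = Sequence[Sequence[float]]         # hypothetical grayscale image as rows of pixels
Regressor = Callable[[Frame, Frame, Box], Box]


def track(frames: List[Frame], init_box: Box, regress: Regressor) -> List[Box]:
    """Call a frozen regressor once per frame; nothing is trained online."""
    boxes = [init_box]
    for prev, cur in zip(frames, frames[1:]):
        # One forward pass: previous frame, current frame, previous box -> new box.
        boxes.append(regress(prev, cur, boxes[-1]))
    return boxes


def shift_right_stub(prev: Frame, cur: Frame, box: Box) -> Box:
    """Toy stand-in for a trained network: always moves the box 1 px right."""
    x1, y1, x2, y2 = box
    return (x1 + 1.0, y1, x2 + 1.0, y2)


def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps
```

Swapping `shift_right_stub` for a real model would leave `track` unchanged, which is the practical appeal of a tracker that does no online training.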

941 citations

Posted Content•

Learning to Track at 100 FPS with Deep Regression Networks

David Held¹, Sebastian Thrun¹, Silvio Savarese¹•Institutions (1)

Stanford University¹

06 Apr 2016-arXiv: Computer Vision and Pattern Recognition

TL;DR: The authors propose a method for offline training of neural networks that can track novel objects at test time at 100 fps, significantly faster than previous neural-network trackers, which are typically too slow for real-time applications.

Abstract: Machine learning techniques are often used in computer vision due to their ability to leverage large amounts of training data to improve performance. Unfortunately, most generic object trackers are still trained from scratch online and do not benefit from the large number of videos that are readily available for offline training. We propose a method for offline training of neural networks that can track novel objects at test-time at 100 fps. Our tracker is significantly faster than previous methods that use neural networks for tracking, which are typically very slow to run and not practical for real-time applications. Our tracker uses a simple feed-forward network with no online training required. The tracker learns a generic relationship between object motion and appearance and can be used to track novel objects that do not appear in the training set. We test our network on a standard tracking benchmark to demonstrate our tracker's state-of-the-art performance. Further, our performance improves as we add more videos to our offline training set. To the best of our knowledge, our tracker is the first neural-network tracker that learns to track generic objects at 100 fps.

782 citations

Proceedings Article•DOI•

A Probabilistic Framework for Real-time 3D Segmentation using Spatial, Temporal, and Semantic Cues

David Held¹, Devin Guillory², Brice Rebsamen³, Sebastian Thrun¹, Silvio Savarese¹•Institutions (3)

Stanford University¹, University of California, Berkeley², National University of Singapore³

18 Jun 2016

TL;DR: This work proposes a probabilistic 3D segmentation method that combines spatial, temporal, and semantic information to make better-informed segmentation decisions, significantly reducing both undersegmentations and oversegmentations on the KITTI dataset while still running in real time.

Abstract: In order to track dynamic objects in a robot’s environment, one must first segment the scene into a collection of separate objects. Most real-time robotic vision systems today rely on simple spatial relations to segment the scene into separate objects. However, such methods fail under a variety of real-world situations such as occlusions or crowds of closely-packed objects. We propose a probabilistic 3D segmentation method that combines spatial, temporal, and semantic information to make better-informed decisions about how to segment a scene. We begin with a coarse initial segmentation. We then compute the probability that a given segment should be split into multiple segments or that multiple segments should be merged into a single segment, using spatial, semantic, and temporal cues. Our probabilistic segmentation framework enables us to significantly reduce both undersegmentations and oversegmentations on the KITTI dataset [3, 4, 5] while still running in real-time. By combining spatial, temporal, and semantic information, we are able to create a more robust 3D segmentation system that leads to better overall perception in crowded dynamic environments.
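
One way to fuse several cues into a single merge probability is naive-Bayes fusion in log-odds space, sketched below. The abstract does not say how the cues are combined, so the conditional-independence assumption, the `fuse_cues` name, the 0.5 prior, and the clamping constant are all assumptions made for illustration.

```python
import math
from typing import Iterable

EPS = 1e-9  # hypothetical clamp so that logit() never sees exactly 0 or 1


def logit(p: float) -> float:
    """Log-odds of a probability, clamped away from 0 and 1."""
    p = min(max(p, EPS), 1.0 - EPS)
    return math.log(p / (1.0 - p))


def fuse_cues(cue_probs: Iterable[float], prior: float = 0.5) -> float:
    """Fuse per-cue estimates P(merge | cue) into one probability.

    Assumes the cues are conditionally independent given the hypothesis, so
    each cue contributes its log-odds relative to the shared prior.
    """
    prior_logit = logit(prior)
    total = prior_logit + sum(logit(p) - prior_logit for p in cue_probs)
    return 1.0 / (1.0 + math.exp(-total))
```

With a 0.5 prior, two cues that each report 0.8 reinforce each other to 16/17 (about 0.94), while cues of 0.8 and 0.2 cancel to 0.5. The same function could score a split hypothesis by feeding it P(split | cue) values instead.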

54 citations

Proceedings Article•DOI•

Robust single-view instance recognition

David Held¹, Sebastian Thrun¹, Silvio Savarese¹•Institutions (1)

Stanford University¹

16 May 2016

TL;DR: This work has developed a novel procedure for training a neural network to recognize a set of objects from just a single training image per object, and demonstrates that it significantly outperforms previous state-of-the-art approaches.

Abstract: Some robots must repeatedly interact with a fixed set of objects in their environment. To operate correctly, it is helpful for the robot to be able to recognize the object instances that it repeatedly encounters. However, current methods for recognizing object instances require that, during training, many pictures are taken of each object from a large number of viewing angles. This procedure is slow and requires much manual effort before the robot can begin to operate in a new environment. We have developed a novel procedure for training a neural network to recognize a set of objects from just a single training image per object. To obtain robustness to changes in viewpoint, we take advantage of a supplementary dataset in which we observe a separate (non-overlapping) set of objects from multiple viewpoints. After pre-training the network in a novel multi-stage fashion, the network can robustly recognize new object instances given just a single training image of each object. If more images of each object are available, the performance improves. We perform a thorough analysis comparing our novel training procedure to traditional neural network pre-training techniques as well as previous state-of-the-art approaches including keypoint-matching, template-matching, and sparse coding, and we demonstrate that our method significantly outperforms these previous approaches. Our method can thus be used to easily teach a robot to recognize a novel set of object instances from unknown viewpoints.

41 citations

Journal Article•DOI•

Robust real-time tracking combining 3D shape, color, and motion

David Held¹, Jesse Levinson¹, Sebastian Thrun¹, Silvio Savarese¹•Institutions (1)

Stanford University¹

01 Jan 2016-The International Journal of Robotics Research

TL;DR: This work presents a tracker that combines 3D shape, color, and motion cues in a probabilistic framework to robustly handle changes in viewpoint, occlusions, and lighting variations for moving objects of a variety of shapes, sizes, and distances.

Abstract: Real-time tracking algorithms often suffer from low accuracy and poor robustness when confronted with difficult, real-world data. We present a tracker that combines 3D shape, color when available, and motion cues to accurately track moving objects in real-time. Our tracker allocates computational effort based on the shape of the posterior distribution. Starting with a coarse approximation to the posterior, the tracker successively refines this distribution, increasing in tracking accuracy over time. The tracker can thus be run for any amount of time, after which the current approximation to the posterior is returned. Even at a minimum runtime of 0.37 ms per object, our method outperforms all of the baseline methods of similar speed by at least 25% in root-mean-square (RMS) tracking error. If our tracker is allowed to run for longer, the accuracy continues to improve, and it continues to outperform all baseline methods. Our tracker is thus anytime, allowing the speed or accuracy to be optimized based on the needs of the application. By combining 3D shape, color when available, and motion cues in a probabilistic framework, our tracker is able to robustly handle changes in viewpoint, occlusions, and lighting variations for moving objects of a variety of shapes, sizes, and distances.
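
The anytime behaviour described in this abstract can be illustrated with a one-dimensional coarse-to-fine search. This is a simplified sketch under stated assumptions: `anytime_refine` and its parameters are hypothetical, the budget is counted in refinement rounds rather than wall-clock time, and only the region around the current best cell is refined, whereas the paper's tracker refines an approximation to the full posterior.

```python
from typing import Callable, Tuple


def anytime_refine(
    score: Callable[[float], float],
    lo: float,
    hi: float,
    max_rounds: int,
    cells: int = 4,
) -> Tuple[float, float]:
    """Coarse-to-fine search for the best-scoring value in [lo, hi].

    Returns (estimate, half_width), where half_width is half the size of the
    interval still under consideration. Stopping after any number of rounds
    still yields a usable estimate; more rounds shrink the interval.
    """
    best = (lo + hi) / 2.0
    for _ in range(max_rounds):
        width = (hi - lo) / cells
        # Evaluate the score at the centre of each cell of the current grid.
        centers = [lo + (i + 0.5) * width for i in range(cells)]
        best = max(centers, key=score)
        # Zoom in on the winning cell for the next, finer round.
        lo, hi = best - width / 2.0, best + width / 2.0
    return best, (hi - lo) / 2.0
```

For a score with a single symmetric peak, each round divides the remaining interval by `cells`, so the caller can trade accuracy for speed simply by choosing `max_rounds`.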

40 citations

Posted Content•

Skin Cancer Detection and Tracking using Data Synthesis and Deep Learning

Yunzhu Li¹, Andre Esteva², Brett Kuprel², Roberto A. Novoa³, Justin M. Ko², Sebastian Thrun²•Institutions (3)

Peking University¹, Stanford University², University of Pennsylvania³

04 Dec 2016-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work introduces a novel data synthesis technique that merges images of individual skin lesions with full-body images and heavily augments them to generate significant amounts of training data for a CNN-based detection and tracking system intended for potential clinical use.

Abstract: Dense object detection and temporal tracking are needed across application domains ranging from people-tracking to analysis of satellite imagery over time. The detection and tracking of malignant skin cancers and benign moles pose a particularly challenging problem due to the general uniformity of large skin patches, the fact that skin lesions vary little in their appearance, and the relatively small amount of data available. Here we introduce a novel data synthesis technique that merges images of individual skin lesions with full-body images and heavily augments them to generate significant amounts of data. We build a convolutional neural network (CNN) based system, trained on this synthetic data, and demonstrate superior performance to traditional detection and tracking techniques. Additionally, we compare our system to humans trained with simple criteria. Our system is intended for potential clinical use to augment the capabilities of healthcare providers. While domain-specific, we believe the methods invoked in this work will be useful in applying CNNs across domains that suffer from limited data availability.
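
The idea of merging lesion images into full-body images to manufacture labelled data can be sketched as below. This is a toy illustration, not the authors' pipeline: images are plain lists of pixel rows, the helpers `hflip`, `paste`, and `synthesize` are hypothetical, the only augmentation is a random horizontal flip, and the patch overwrites the background instead of being blended into it.

```python
import random
from typing import List, Tuple

Image = List[List[float]]          # hypothetical grayscale image as rows of pixels
BBox = Tuple[int, int, int, int]   # (top, left, bottom, right), bottom/right exclusive


def hflip(patch: Image) -> Image:
    """Mirror a patch left to right."""
    return [row[::-1] for row in patch]


def paste(body: Image, patch: Image, top: int, left: int) -> Image:
    """Return a copy of body with patch written at (top, left)."""
    out = [row[:] for row in body]
    for r, row in enumerate(patch):
        out[top + r][left:left + len(row)] = row
    return out


def synthesize(body: Image, patch: Image, rng: random.Random) -> Tuple[Image, BBox]:
    """Create one synthetic training image and its bounding-box label."""
    if rng.random() < 0.5:
        patch = hflip(patch)
    ph, pw = len(patch), len(patch[0])
    # Pick a location where the whole patch fits inside the body image.
    top = rng.randint(0, len(body) - ph)
    left = rng.randint(0, len(body[0]) - pw)
    return paste(body, patch, top, left), (top, left, top + ph, left + pw)
```

Because the paste location is chosen by the generator, every synthetic image comes with an exact bounding box for free, which is what makes this kind of synthesis useful when annotated data is scarce.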

10 citations

Proceedings Article•

Skin Cancer Detection and Tracking using Data Synthesis and Deep Learning

Yunzhu Li¹, Andre Esteva², Brett Kuprel², Roberto A. Novoa³, Justin M. Ko², Sebastian Thrun²•Institutions (3)

Peking University¹, Stanford University², University of Pennsylvania³

01 Dec 2016

TL;DR: This work introduces a novel data synthesis technique that merges images of individual skin lesions with full-body images and heavily augments them to generate significant amounts of data.

Abstract: Dense object detection and temporal tracking are needed across application domains ranging from people-tracking to analysis of satellite imagery over time. The detection and tracking of malignant skin cancers and benign moles pose a particularly challenging problem due to the general uniformity of large skin patches, the fact that skin lesions vary little in their appearance, and the relatively small amount of data available. Here we introduce a novel data synthesis technique that merges images of individual skin lesions with full-body images and heavily augments them to generate significant amounts of data. We build a convolutional neural network (CNN) based system, trained on this synthetic data, and demonstrate superior performance to traditional detection and tracking techniques. Additionally, we compare our system to humans trained with simple criteria. Our system is intended for potential clinical use to augment the capabilities of healthcare providers. While domain-specific, we believe the methods invoked in this work will be useful in applying CNNs across domains that suffer from limited data availability.

3 citations
