Stefan Hinterstoisser

Other affiliations: Technische Universität München, Ludwig Maximilian University of Munich

Bio: Stefan Hinterstoisser is an academic researcher at Google whose work focuses on object detection and pose estimation. The author has an h-index of 23 and has co-authored 33 publications, which have received 3,249 citations. Previous affiliations include Technische Universität München and Ludwig Maximilian University of Munich.

Papers


Book Chapter

Model based training, detection and pose estimation of texture-less 3d objects in heavily cluttered scenes


Stefan Hinterstoisser¹, Vincent Lepetit², Slobodan Ilic¹, Stefan Johannes Josef Holzer¹, Gary Bradski, Kurt Konolige, Nassir Navab¹

Technische Universität München¹, École Polytechnique Fédérale de Lausanne²

05 Nov 2012

TL;DR: A framework for automatic modeling, detection, and tracking of 3D objects with a Kinect that builds its templates automatically from 3D models and estimates the 6 degrees-of-freedom pose accurately and in real time.


Abstract: We propose a framework for automatic modeling, detection, and tracking of 3D objects with a Kinect. The detection part is mainly based on the recent template-based LINEMOD approach [1] for object detection. We show how to build the templates automatically from 3D models, and how to estimate the 6 degrees-of-freedom pose accurately and in real time. The pose estimation and the color information allow us to check the detection hypotheses and improve the correct detection rate by 13% with respect to the original LINEMOD. These improvements make our framework suitable for object manipulation in robotics applications. Moreover, we propose a new dataset of 15 registered video sequences of 15 different objects, each with more than 1,100 frames, for the evaluation of future competing methods.


1,114 citations

Proceedings Article

Multimodal templates for real-time detection of texture-less objects in heavily cluttered scenes


Stefan Hinterstoisser¹, Stefan Johannes Josef Holzer¹, Cedric Cagniart¹, Slobodan Ilic¹, Kurt Konolige, Nassir Navab¹, Vincent Lepetit²

Technische Universität München¹, École Polytechnique Fédérale de Lausanne²

06 Nov 2011

TL;DR: This work presents a method for detecting 3D objects from multiple modalities, based on an efficient template representation that captures the different modalities, and shows in many experiments on commodity hardware that it significantly outperforms state-of-the-art methods that rely on single modalities.


Abstract: We present a method for detecting 3D objects using multi-modalities. While it is generic, we demonstrate it on the combination of an image and a dense depth map, which give complementary object information. It works in real time, under heavy clutter, does not require a time-consuming training stage, and can handle untextured objects. It is based on an efficient representation of templates that capture the different modalities, and we show in many experiments on commodity hardware that our approach significantly outperforms state-of-the-art methods on single modalities.


611 citations

Journal Article

Gradient Response Maps for Real-Time Detection of Textureless Objects


Stefan Hinterstoisser, Cedric Cagniart, Slobodan Ilic, Peter Sturm¹, Nassir Navab, Pascal Fua², Vincent Lepetit²

French Institute for Research in Computer Science and Automation¹, École Normale Supérieure²

01 May 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This paper presents a method for real-time 3D object instance detection that does not require a time-consuming training stage, can handle untextured objects, and is much faster and more robust to background clutter than current state-of-the-art methods.


Abstract: We present a method for real-time 3D object instance detection that does not require a time-consuming training stage, and can handle untextured objects. At its core, our approach is a novel image representation for template matching designed to be robust to small image transformations. This robustness is based on spread image gradient orientations and allows us to test only a small subset of all possible pixel locations when parsing the image, and to represent a 3D object with a limited set of templates. In addition, we demonstrate that if a dense depth sensor is available, we can extend our approach to also take 3D surface normal orientations into account for even better performance. We show how to take advantage of the architecture of modern computers to build an efficient but very discriminant representation of the input images that can be used to consider thousands of templates in real time. We demonstrate in many experiments on real data that our method is much faster and more robust with respect to background clutter than current state-of-the-art methods.
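The idea of spreading quantized gradient orientations described in this abstract can be illustrated with a minimal pure-Python sketch. It assumes gradient angles have already been computed, quantizes them into 8 bins while ignoring gradient polarity, and ORs each pixel's orientation bit into a square neighbourhood. The names `quantize`, `spread`, and `match_score`, the bin count, and the simple count-based score are illustrative assumptions, not the paper's exact response-map formulation.

```python
import math

NUM_BINS = 8  # assumption: 8 orientation bins over [0, pi), gradient polarity ignored


def quantize(angle):
    """Map a gradient angle in radians to a one-hot orientation bitmask."""
    a = angle % math.pi  # opposite gradient directions fall into the same bin
    b = int(a / math.pi * NUM_BINS) % NUM_BINS
    return 1 << b


def spread(q, radius):
    """OR every pixel's orientation bits into a (2*radius+1)^2 neighbourhood.

    q is a 2D list of bitmasks, with 0 where the gradient is too weak.
    """
    h, w = len(q), len(q[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    acc |= q[yy][xx]
            out[y][x] = acc
    return out


def match_score(spread_map, template, cx, cy):
    """Count template features (dx, dy, bit) whose orientation bit is present
    in the spread map at (cx + dx, cy + dy); a binary simplification."""
    h, w = len(spread_map), len(spread_map[0])
    score = 0
    for dx, dy, bit in template:
        x, y = cx + dx, cy + dy
        if 0 <= y < h and 0 <= x < w and spread_map[y][x] & bit:
            score += 1
    return score
```

Because each spread pixel already holds every orientation seen in its neighbourhood, a template tolerates small misalignments, which is why it can be evaluated at only a subset of image locations.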


590 citations

Proceedings Article

Dominant orientation templates for real-time detection of texture-less objects


Stefan Hinterstoisser, Vincent Lepetit¹, Slobodan Ilic, Pascal Fua¹, Nassir Navab

École Polytechnique Fédérale de Lausanne¹

13 Jun 2010

TL;DR: This work presents a method for real-time 3D object detection that does not require a time-consuming training stage and can handle untextured objects; at its core is a novel template representation designed to be robust to small image transformations.


Abstract: We present a method for real-time 3D object detection that does not require a time-consuming training stage and can handle untextured objects. At its core is a novel template representation that is designed to be robust to small image transformations. This robustness, based on dominant gradient orientations, lets us test only a small subset of all possible pixel locations when parsing the image and represent a 3D object with a limited set of templates. We show that, together with a binary representation that makes evaluation very fast and a branch-and-bound approach to efficiently scan the image, it can detect untextured objects in complex situations and provide their 3D pose in real time.
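The binary representation mentioned in this abstract can be sketched under simplifying assumptions: each template or image region is encoded as a bitmask of its k most frequent quantized gradient orientations, and a template is scored by counting the regions whose masks share at least one bit. The function names, the choice k = 2, and the treatment of regions without strong gradients (mask 0, which never matches) are hypothetical; the branch-and-bound image scan is not shown.

```python
from collections import Counter


def encode_region(bins, k=2):
    """Encode a region as a bitmask of its k most frequent orientation bins.

    bins holds the quantized orientations (0..7) of the strong gradients in
    the region; an empty list yields mask 0 (assumed here: never matches).
    """
    mask = 0
    for b, _count in Counter(bins).most_common(k):
        mask |= 1 << b
    return mask


def dot_score(template_masks, image_masks):
    """Number of regions where template and image share a dominant orientation."""
    return sum(1 for t, i in zip(template_masks, image_masks) if t & i)
```

Each region comparison reduces to a single bitwise AND, which hints at why a binary encoding makes template evaluation cheap.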


251 citations

Book Chapter

On Pre-trained Image Features and Synthetic Images for Deep Learning


Stefan Hinterstoisser, Vincent Lepetit¹, Paul Wohlhart, Kurt Konolige

University of Bordeaux¹

08 Sep 2018

TL;DR: The paper shows that a simple trick is sufficient to train modern object detectors very effectively with synthetic images only: freeze the layers responsible for feature extraction to generic layers pre-trained on real images, and train only the remaining layers with plain OpenGL renderings.


Abstract: Deep Learning methods usually require huge amounts of training data to perform at their full potential, and often require expensive manual labeling. Using synthetic images is therefore very attractive to train object detectors, as the labeling comes for free, and several approaches have been proposed to combine synthetic and real images for training. In this paper, we evaluate whether ‘freezing’ the layers responsible for feature extraction to generic layers pre-trained on real images, and training only the remaining layers with plain OpenGL rendering, may allow training with synthetic images only. Our experiments with very recent deep architectures for object recognition (Faster-RCNN, R-FCN, Mask-RCNN) and image feature extractors (InceptionResnet and Resnet) show that this simple approach performs surprisingly well.


189 citations


Cited by


Proceedings Article

Going deeper with convolutions


Christian Szegedy¹, Wei Liu², Yangqing Jia¹, Pierre Sermanet¹, Scott Reed³, Dragomir Anguelov¹, Dumitru Erhan¹, Vincent Vanhoucke¹, Andrew Rabinovich

Google¹, University of North Carolina at Chapel Hill², University of Michigan³

07 Jun 2015

TL;DR: Inception is a deep convolutional neural network architecture that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).


Abstract: We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. By a carefully crafted design, we increased the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC14 is called GoogLeNet, a 22-layer-deep network, the quality of which is assessed in the context of classification and detection.


40,257 citations

Book

Computer Vision: Algorithms and Applications


Richard Szeliski

30 Sep 2010

TL;DR: Computer Vision: Algorithms and Applications explores the variety of techniques commonly used to analyze and interpret images and takes a scientific approach to basic vision problems, formulating physical models of the imaging process before inverting them to produce descriptions of a scene.


Abstract: Humans perceive the three-dimensional structure of the world with apparent ease. However, despite all of the recent advances in computer vision research, the dream of having a computer interpret an image at the same level as a two-year-old remains elusive. Why is computer vision such a challenging problem, and what is the current state of the art? Computer Vision: Algorithms and Applications explores the variety of techniques commonly used to analyze and interpret images. It also describes challenging real-world applications where vision is being successfully used, both for specialized applications such as medical imaging, and for fun, consumer-level tasks such as image editing and stitching, which students can apply to their own personal photos and videos. More than just a source of recipes, this exceptionally authoritative and comprehensive textbook/reference also takes a scientific approach to basic vision problems, formulating physical models of the imaging process before inverting them to produce descriptions of a scene. These problems are also analyzed using statistical models and solved using rigorous engineering techniques. Topics and features: structured to support active curricula and project-oriented courses, with tips in the Introduction for using the book in a variety of customized courses; presents exercises at the end of each chapter with a heavy emphasis on testing algorithms and containing numerous suggestions for small mid-term projects; provides additional material and more detailed mathematical topics in the Appendices, which cover linear algebra, numerical techniques, and Bayesian estimation theory; suggests additional reading at the end of each chapter, including the latest research in each sub-field, in addition to a full Bibliography at the end of the book; supplies supplementary course material for students at the associated website, http://szeliski.org/Book/.
Suitable for an upper-level undergraduate or graduate-level course in computer science or engineering, this textbook focuses on basic techniques that work under real-world conditions and encourages students to push their creative boundaries. Its design and exposition also make it eminently suitable as a unique reference to the fundamental techniques and current research literature in computer vision.


4,146 citations

Proceedings Article

Online Object Tracking: A Benchmark


Yi Wu¹, Jongwoo Lim², Ming-Hsuan Yang¹

University of California, Merced¹, Hanyang University²

23 Jun 2013

TL;DR: Large-scale experiments are carried out with various evaluation criteria to identify effective approaches for robust tracking and to provide potential future research directions in this field.


Abstract: Object tracking is one of the most important components in numerous applications of computer vision. While much progress has been made in recent years with efforts on sharing code and datasets, it is of great importance to develop a library and benchmark to gauge the state of the art. After briefly reviewing recent advances of online object tracking, we carry out large-scale experiments with various evaluation criteria to understand how these algorithms perform. The test image sequences are annotated with different attributes for performance evaluation and analysis. By analyzing quantitative results, we identify effective approaches for robust tracking and provide potential future research directions in this field.


3,828 citations

Journal Article

Tracking-Learning-Detection


Zdenek Kalal¹, Krystian Mikolajczyk¹, Jiri Matas

University of Surrey¹

01 Jul 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A novel tracking framework (TLD) that explicitly decomposes the long-term tracking task into tracking, learning, and detection, together with a novel learning method (P-N learning) that estimates the detector's errors with a pair of “experts”: the P-expert estimates missed detections, and the N-expert estimates false alarms.


Abstract: This paper investigates long-term tracking of unknown objects in a video stream. The object is defined by its location and extent in a single frame. In every frame that follows, the task is to determine the object's location and extent or indicate that the object is not present. We propose a novel tracking framework (TLD) that explicitly decomposes the long-term tracking task into tracking, learning, and detection. The tracker follows the object from frame to frame. The detector localizes all appearances that have been observed so far and corrects the tracker if necessary. The learning estimates the detector's errors and updates it to avoid these errors in the future. We study how to identify the detector's errors and learn from them. We develop a novel learning method (P-N learning) which estimates the errors by a pair of “experts”: (1) P-expert estimates missed detections, and (2) N-expert estimates false alarms. The learning process is modeled as a discrete dynamical system and the conditions under which the learning guarantees improvement are found. We describe our real-time implementation of the TLD framework and the P-N learning. We carry out an extensive quantitative evaluation which shows a significant improvement over state-of-the-art approaches.


3,137 citations

Journal Article

Object Tracking Benchmark


Yi Wu¹, Jongwoo Lim², Ming-Hsuan Yang³

Nanjing University of Information Science and Technology¹, Hanyang University², University of California, Merced³

01 Sep 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: An extensive evaluation of the state-of-the-art online object-tracking algorithms with various evaluation criteria is carried out to identify effective approaches for robust tracking and provide potential future research directions in this field.


Abstract: Object tracking has been one of the most important and active research areas in the field of computer vision. A large number of tracking algorithms have been proposed in recent years with demonstrated success. However, the set of sequences used for evaluation is often not sufficient or is sometimes biased for certain types of algorithms. Many datasets do not have common ground-truth object positions or extents, and this makes comparisons among the reported quantitative results difficult. In addition, the initial conditions or parameters of the evaluated tracking algorithms are not the same, and thus, the quantitative results reported in literature are incomparable or sometimes contradictory. To address these issues, we carry out an extensive evaluation of the state-of-the-art online object-tracking algorithms with various evaluation criteria to understand how these methods perform within the same framework. In this work, we first construct a large dataset with ground-truth object positions and extents for tracking and introduce the sequence attributes for the performance analysis. Second, we integrate most of the publicly available trackers into one code library with uniform input and output formats to facilitate large-scale performance evaluation. Third, we extensively evaluate the performance of 31 algorithms on 100 sequences with different initialization settings. By analyzing the quantitative results, we identify effective approaches for robust tracking and provide potential future research directions in this field.
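One widely used evaluation criterion in tracking benchmarks is the bounding-box overlap (intersection over union) between the tracker output and the ground truth. The following minimal sketch, with hypothetical function names and boxes assumed to be given as (x, y, width, height), computes the fraction of frames whose overlap exceeds a threshold and sweeps that threshold to form a success curve. It illustrates the idea only and is not the benchmark's own code library.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))  # overlap width
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))  # overlap height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_rate(predictions, ground_truth, threshold=0.5):
    """Fraction of frames whose overlap with the ground truth exceeds threshold."""
    if not predictions:
        return 0.0
    hits = sum(1 for p, g in zip(predictions, ground_truth) if iou(p, g) > threshold)
    return hits / len(predictions)


def success_curve(predictions, ground_truth, steps=21):
    """Success rate at thresholds 0, 1/(steps-1), ..., 1; averaging the curve
    gives an area-under-curve style summary."""
    return [success_rate(predictions, ground_truth, t / (steps - 1)) for t in range(steps)]
```

Sweeping the threshold rather than fixing a single value avoids ranking trackers on one arbitrary cut-off.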


2,974 citations
