Author

Stefan Popov

Other affiliations: Max Planck Society, Saarland University, French Institute for Research in Computer Science and Automation

Bio: Stefan Popov is an academic researcher at Google. The author has contributed to research on the topics Rendering (computer graphics) and Computer science, has an h-index of 15, and has co-authored 24 publications that have received 1,880 citations. Previous affiliations of Stefan Popov include the Max Planck Society and Saarland University.
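
The h-index quoted in the bio is the largest number h such that h of the author's papers have at least h citations each. As a minimal sketch of that definition, the snippet below applies it only to the five citation counts listed under Papers on this page (482, 333, 269, 188, 130). The function name `h_index` is illustrative, and because the remaining publications are not listed here, the result is not expected to reproduce the reported value of 15.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Assumption: only the five citation counts shown on this page, not the full record.
listed_counts = [482, 333, 269, 188, 130]
partial_h = h_index(listed_counts)
print(partial_h)
```

With only five papers in the input the result cannot exceed 5, which is why the full publication list would be needed to check the reported figure.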

Papers

Journal Article

The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale

Alina Kuznetsova¹, Hassan Rom¹, Neil Alldrin¹, Jasper Uijlings¹, Ivan Krasin¹, Jordi Pont-Tuset¹, Shahab Kamali¹, Stefan Popov¹, Matteo Malloci¹, Alexander Kolesnikov¹, Tom Duerig¹, Vittorio Ferrari¹

Google¹

02 Nov 2018-arXiv: Computer Vision and Pattern Recognition

TL;DR: Open Images V4 is a dataset of 9.2M images, collected from Flickr without a predefined list of class names or tags, with unified annotations for image classification, object detection, and visual relationship detection.

Abstract: We present Open Images V4, a dataset of 9.2M images with unified annotations for image classification, object detection and visual relationship detection. The images have a Creative Commons Attribution license that allows to share and adapt the material, and they have been collected from Flickr without a predefined list of class names or tags, leading to natural class statistics and avoiding an initial design bias. Open Images V4 offers large scale across several dimensions: 30.1M image-level labels for 19.8k concepts, 15.4M bounding boxes for 600 object classes, and 375k visual relationship annotations involving 57 classes. For object detection in particular, we provide 15x more bounding boxes than the next largest datasets (15.4M boxes on 1.9M images). The images often show complex scenes with several objects (8 annotated objects per image on average). We annotated visual relationships between them, which support visual relationship detection, an emerging task that requires structured reasoning. We provide in-depth comprehensive statistics about the dataset, we validate the quality of the annotations, we study how the performance of several modern models evolves with increasing amounts of training data, and we demonstrate two applications made possible by having unified annotations of multiple types coexisting in the same images. We hope that the scale, quality, and variety of Open Images V4 will foster further research and innovation even beyond the areas of image classification, object detection, and visual relationship detection.
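
As a quick sanity check on the figures in this abstract, dividing the 15.4M bounding boxes by the 1.9M images that carry them should give roughly the stated average of 8 annotated objects per image. The sketch below does only that arithmetic; the variable names are illustrative.

```python
# Figures quoted in the abstract.
total_boxes = 15.4e6   # bounding boxes
boxed_images = 1.9e6   # images that carry those boxes

boxes_per_image = total_boxes / boxed_images
print(f"{boxes_per_image:.1f} boxes per image")
```

The quotient rounds to 8, which agrees with the average given in the abstract.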

482 citations

Journal Article

The Open Images Dataset V4: Unified Image Classification, Object Detection, and Visual Relationship Detection at Scale

Google¹

01 Jul 2020-International Journal of Computer Vision

TL;DR: Open Images V4 is a dataset of 9.2M images, collected from Flickr without a predefined list of class names or tags, with unified annotations for image classification, object detection, and visual relationship detection.

Abstract: We present Open Images V4, a dataset of 9.2M images with unified annotations for image classification, object detection and visual relationship detection. The images have a Creative Commons Attribution license that allows to share and adapt the material, and they have been collected from Flickr without a predefined list of class names or tags, leading to natural class statistics and avoiding an initial design bias. Open Images V4 offers large scale across several dimensions: 30.1M image-level labels for 19.8k concepts, 15.4M bounding boxes for 600 object classes, and 375k visual relationship annotations involving 57 classes. For object detection in particular, we provide 15x more bounding boxes than the next largest datasets (15.4M boxes on 1.9M images). The images often show complex scenes with several objects (8 annotated objects per image on average). We annotated visual relationships between them, which support visual relationship detection, an emerging task that requires structured reasoning. We provide in-depth comprehensive statistics about the dataset, we validate the quality of the annotations, we study how the performance of several modern models evolves with increasing amounts of training data, and we demonstrate two applications made possible by having unified annotations of multiple types coexisting in the same images. We hope that the scale, quality, and variety of Open Images V4 will foster further research and innovation even beyond the areas of image classification, object detection, and visual relationship detection.

333 citations

Journal Article

Stackless KD-Tree Traversal for High Performance GPU Ray Tracing

Stefan Popov¹, Johannes Günther¹, Hans-Peter Seidel¹, Philipp Slusallek²

Max Planck Society¹, Saarland University²

01 Sep 2007-Computer Graphics Forum

TL;DR: This paper presents a packet ray traversal implementation that eliminates the need for a stack during kd-tree traversal and reduces the number of traversal steps per ray, reaching a peak performance of over 16 million rays per second on the GPU for reasonably complex scenes.

Abstract: Significant advances have been achieved for realtime ray tracing recently, but realtime performance for complex scenes still requires large computational resources not yet available from the CPUs in standard PCs. Incidentally, most of these PCs also contain modern GPUs that do offer much larger raw compute power. However, limitations in the programming and memory model have so far kept the performance of GPU ray tracers well below that of their CPU counterparts. In this paper we present a novel packet ray traversal implementation that completely eliminates the need for maintaining a stack during kd-tree traversal and that reduces the number of traversal steps per ray. While CPUs benefit moderately from the stackless approach, it improves GPU performance significantly. We achieve a peak performance of over 16 million rays per second for reasonably complex scenes, including complex shading and secondary rays. Several examples show that with this new technique GPUs can actually outperform equivalent CPU based ray tracers.

269 citations

Proceedings Article

Realtime Ray Tracing on GPU with BVH-based Packet Traversal

Johannes Günther, Stefan Popov¹, Hans-Peter Seidel, Philipp Slusallek¹

Saarland University¹

10 Sep 2007

TL;DR: This paper presents a BVH-based GPU ray tracer with a parallel packet traversal algorithm using a shared stack, together with a fast, CPU-based BVH construction algorithm that closely approximates the surface area heuristic using streamed binning while being one order of magnitude faster than previously published results.

Abstract: Recent GPU ray tracers can already achieve performance competitive to that of their CPU counterparts. Nevertheless, these systems can not yet fully exploit the capabilities of modern GPUs and can only handle medium-sized, static scenes. In this paper we present a BVH-based GPU ray tracer with a parallel packet traversal algorithm using a shared stack. We also present a fast, CPU-based BVH construction algorithm which very accurately approximates the surface area heuristic using streamed binning while still being one order of magnitude faster than previously published results. Furthermore, using a BVH allows us to push the size limit of supported scenes on the GPU: We can now ray trace the 12.7 million triangle Power Plant at 1024 × 1024 image resolution with 3 fps, including shading and shadows.

188 citations

Proceedings Article

Large-Scale Interactive Object Segmentation With Human Annotators

Rodrigo Benenson¹, Stefan Popov¹, Vittorio Ferrari¹

Google¹

01 Jun 2019

TL;DR: This paper systematically explores in simulation the design space of deep interactive segmentation models and reports new insights and caveats, and presents a technique for automatically estimating the quality of the produced masks which exploits indirect signals from the annotation process.

Abstract: Manually annotating object segmentation masks is very time consuming. Interactive object segmentation methods offer a more efficient alternative where a human annotator and a machine segmentation model collaborate. In this paper we make several contributions to interactive segmentation: (1) we systematically explore in simulation the design space of deep interactive segmentation models and report new insights and caveats; (2) we execute a large-scale annotation campaign with real human annotators, producing masks for 2.5M instances on the OpenImages dataset. We released this data publicly, forming the largest existing dataset for instance segmentation. Moreover, by re-annotating part of the COCO dataset, we show that we can produce instance masks 3x faster than traditional polygon drawing tools while also providing better quality. (3) We present a technique for automatically estimating the quality of the produced masks which exploits indirect signals from the annotation process.

130 citations

Cited by

Journal Article

Learning report on “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”

杉山拓海

12 Sep 2017-Computers & Graphics

3,940 citations

Journal Article

Deep Learning for Generic Object Detection: A Survey

Li Liu¹,², Wanli Ouyang³, Xiaogang Wang⁴, Paul Fieguth⁵, Jie Chen², Xinwang Liu¹, Matti Pietikäinen²

National University of Defense Technology¹, University of Oulu², University of Sydney³, The Chinese University of Hong Kong⁴, University of Waterloo⁵

01 Feb 2020-International Journal of Computer Vision

TL;DR: A comprehensive survey of the recent achievements in this field brought about by deep learning techniques, covering many aspects of generic object detection: detection frameworks, object feature representation, object proposal generation, context modeling, training strategies, and evaluation metrics.

Abstract: Object detection, one of the most fundamental and challenging problems in computer vision, seeks to locate object instances from a large number of predefined categories in natural images. Deep learning techniques have emerged as a powerful strategy for learning feature representations directly from data and have led to remarkable breakthroughs in the field of generic object detection. Given this period of rapid evolution, the goal of this paper is to provide a comprehensive survey of the recent achievements in this field brought about by deep learning techniques. More than 300 research contributions are included in this survey, covering many aspects of generic object detection: detection frameworks, object feature representation, object proposal generation, context modeling, training strategies, and evaluation metrics. We finish the survey by identifying promising directions for future research.

1,897 citations

Posted Content

Optuna: A Next-generation Hyperparameter Optimization Framework

Takuya Akiba, Shotaro Sano, Toshihiko Yanase, Takeru Ohta, Masanori Koyama

25 Jul 2019-arXiv: Learning

TL;DR: New design-criteria for next-generation hyperparameter optimization software are introduced, including define-by-run API that allows users to construct the parameter search space dynamically, and easy-to-setup, versatile architecture that can be deployed for various purposes.

Abstract: The purpose of this study is to introduce new design-criteria for next-generation hyperparameter optimization software. The criteria we propose include (1) define-by-run API that allows users to construct the parameter search space dynamically, (2) efficient implementation of both searching and pruning strategies, and (3) easy-to-setup, versatile architecture that can be deployed for various purposes, ranging from scalable distributed computing to light-weight experiment conducted via interactive interface. In order to prove our point, we will introduce Optuna, an optimization software which is a culmination of our effort in the development of a next generation optimization software. As an optimization software designed with define-by-run principle, Optuna is particularly the first of its kind. We will present the design-techniques that became necessary in the development of the software that meets the above criteria, and demonstrate the power of our new design through experimental results and real world applications. Our software is available under the MIT license.
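
To illustrate what "define-by-run" means in criterion (1), here is a toy sketch in which the search space is not declared up front but emerges as the objective function executes. The class `ToyTrial` and the function `random_search` are hypothetical and written only for this illustration. The method names echo Optuna's trial interface, but this is not Optuna's implementation, and plain random sampling stands in for its searching and pruning strategies.

```python
import random


class ToyTrial:
    """Hypothetical stand-in for a trial object; records parameters as they are requested."""

    def __init__(self, rng):
        self.rng = rng
        self.params = {}

    def suggest_float(self, name, low, high):
        value = self.rng.uniform(low, high)
        self.params[name] = value
        return value

    def suggest_categorical(self, name, choices):
        value = self.rng.choice(choices)
        self.params[name] = value
        return value


def objective(trial):
    # The search space is built while this code runs:
    # the "shift" parameter exists only when the "quadratic" branch is taken.
    kind = trial.suggest_categorical("kind", ["linear", "quadratic"])
    x = trial.suggest_float("x", -5.0, 5.0)
    if kind == "quadratic":
        shift = trial.suggest_float("shift", 0.0, 2.0)
        return (x - shift) ** 2
    return abs(x)


def random_search(objective, n_trials, seed=0):
    """Minimize the objective by independent random trials (an assumption, not Optuna's sampler)."""
    rng = random.Random(seed)
    best_value, best_params = float("inf"), None
    for _ in range(n_trials):
        trial = ToyTrial(rng)
        value = objective(trial)
        if value < best_value:
            best_value, best_params = value, trial.params
    return best_value, best_params


best_value, best_params = random_search(objective, n_trials=200)
print(best_value, best_params)
```

The point of the design is that ordinary control flow (here an `if`) decides which parameters exist in a given trial, so conditional search spaces need no separate declaration step.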

1,448 citations

Proceedings Article

Optuna: A Next-generation Hyperparameter Optimization Framework

Takuya Akiba, Shotaro Sano, Toshihiko Yanase, Takeru Ohta, Masanori Koyama

25 Jul 2019

TL;DR: Optuna is next-generation hyperparameter optimization software with a define-by-run API that allows users to construct the parameter search space dynamically.

1,248 citations

Posted Content

Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks

Xiujun Li¹, Xi Yin¹, Chunyuan Li¹, Pengchuan Zhang¹, Xiaowei Hu¹, Lei Zhang¹, Lijuan Wang¹, Houdong Hu¹, Li Dong¹, Furu Wei¹, Yejin Choi², Jianfeng Gao¹

Microsoft¹, University of Washington²

13 Apr 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: This paper proposes a new learning method Oscar (Object-Semantics Aligned Pre-training), which uses object tags detected in images as anchor points to significantly ease the learning of alignments.

Abstract: Large-scale pre-training methods of learning cross-modal representations on image-text pairs are becoming popular for vision-language tasks. While existing methods simply concatenate image region features and text features as input to the model to be pre-trained and use self-attention to learn image-text semantic alignments in a brute force manner, in this paper, we propose a new learning method Oscar (Object-Semantics Aligned Pre-training), which uses object tags detected in images as anchor points to significantly ease the learning of alignments. Our method is motivated by the observation that the salient objects in an image can be accurately detected, and are often mentioned in the paired text. We pre-train an Oscar model on the public corpus of 6.5 million text-image pairs, and fine-tune it on downstream tasks, creating new state-of-the-arts on six well-established vision-language understanding and generation tasks.

887 citations
