Home
/
Authors
/
Mohamed Omran

Author

Mohamed Omran

Bio: Mohamed Omran is an academic researcher from Max Planck Society. The author has contributed to research in topics: Pose & Pedestrian detection. The author has an hindex of 16, co-authored 17 publications receiving 9627 citations.

Papers

PDF

Open Access

More filters

Proceedings Article•DOI•

The Cityscapes Dataset for Semantic Urban Scene Understanding

[...]

Marius Cordts¹, Mohamed Omran², Sebastian Ramos³, Timo Rehfeld¹, Markus Enzweiler³, Rodrigo Benenson², Uwe Franke³, Stefan Roth¹, Bernt Schiele² - Show less +5 more•Institutions (3)

Technische Universität Darmstadt¹, Max Planck Society², Daimler AG³

01 Jun 2016

TL;DR: This work introduces Cityscapes, a benchmark suite and large-scale dataset to train and test approaches for pixel-level and instance-level semantic labeling, and exceeds previous attempts in terms of dataset size, annotation richness, scene variability, and complexity.

...read moreread less

Abstract: Visual understanding of complex urban street scenes is an enabling factor for a wide range of applications. Object detection has benefited enormously from large-scale datasets, especially in the context of deep learning. For semantic urban scene understanding, however, no current dataset adequately captures the complexity of real-world urban scenes. To address this, we introduce Cityscapes, a benchmark suite and large-scale dataset to train and test approaches for pixel-level and instance-level semantic labeling. Cityscapes is comprised of a large, diverse set of stereo video sequences recorded in streets from 50 different cities. 5000 of these images have high quality pixel-level annotations, 20 000 additional images have coarse annotations to enable methods that leverage large volumes of weakly-labeled data. Crucially, our effort exceeds previous attempts in terms of dataset size, annotation richness, scene variability, and complexity. Our accompanying empirical study provides an in-depth analysis of the dataset characteristics, as well as a performance evaluation of several state-of-the-art approaches based on our benchmark.

...read moreread less

7,547 citations

Posted Content•

The Cityscapes Dataset for Semantic Urban Scene Understanding

[...]

Marius Cordts¹, Mohamed Omran², Sebastian Ramos³, Timo Rehfeld¹, Markus Enzweiler³, Rodrigo Benenson², Uwe Franke³, Stefan Roth¹, Bernt Schiele² - Show less +5 more•Institutions (3)

Technische Universität Darmstadt¹, Max Planck Society², Daimler AG³

06 Apr 2016-arXiv: Computer Vision and Pattern Recognition

TL;DR: Cityscapes as discussed by the authors is a large-scale dataset for semantic urban scene understanding, consisting of 5000 images with high quality pixel-level annotations and 200,000 additional images with coarse annotations.

...read moreread less

Abstract: Visual understanding of complex urban street scenes is an enabling factor for a wide range of applications. Object detection has benefited enormously from large-scale datasets, especially in the context of deep learning. For semantic urban scene understanding, however, no current dataset adequately captures the complexity of real-world urban scenes. To address this, we introduce Cityscapes, a benchmark suite and large-scale dataset to train and test approaches for pixel-level and instance-level semantic labeling. Cityscapes is comprised of a large, diverse set of stereo video sequences recorded in streets from 50 different cities. 5000 of these images have high quality pixel-level annotations; 20000 additional images have coarse annotations to enable methods that leverage large volumes of weakly-labeled data. Crucially, our effort exceeds previous attempts in terms of dataset size, annotation richness, scene variability, and complexity. Our accompanying empirical study provides an in-depth analysis of the dataset characteristics, as well as a performance evaluation of several state-of-the-art approaches based on our benchmark.

...read moreread less

3,503 citations

Book Chapter•DOI•

Ten Years of Pedestrian Detection, What Have We Learned?

[...]

Rodrigo Benenson¹, Mohamed Omran¹, Jan Hosang¹, Bernt Schiele¹•Institutions (1)

Max Planck Society¹

06 Sep 2014

TL;DR: This work analyzes the remarkable progress of the last decade by dis- cussing the main ideas explored in the 40+ detectors currently present in the Caltech pedestrian detection benchmark to find a new decision forest detector.

...read moreread less

Abstract: Paper-by-paper results make it easy to miss the forest for the trees.We analyse the remarkable progress of the last decade by dis- cussing the main ideas explored in the 40+ detectors currently present in the Caltech pedestrian detection benchmark. We observe that there exist three families of approaches, all currently reaching similar detec- tion quality. Based on our analysis, we study the complementarity of the most promising ideas by combining multiple published strategies. This new decision forest detector achieves the current best known performance on the challenging Caltech-USA dataset.

...read moreread less

662 citations

Proceedings Article•DOI•

Neural Body Fitting: Unifying Deep Learning and Model Based Human Pose and Shape Estimation

[...]

Mohamed Omran¹, Christoph Lassner¹, Gerard Pons-Moll¹, Peter V. Gehler¹, Bernt Schiele¹ - Show less +1 more•Institutions (1)

Max Planck Society¹

17 Aug 2018

TL;DR: Neural Body Fitting (NBF) as discussed by the authors integrates a statistical body model as a layer within a CNN leveraging both reliable bottom-up body part segmentation and robust top-down body model constraints.

...read moreread less

Abstract: Direct prediction of 3D body pose and shape parameters remains a challenge even for highly parameterized, deep learning models. The representation of the prediction space is difficult to map to from the plain 2D image space, perspective ambiguities make the loss function noisy and training data is scarce. In this paper, we propose a novel approach (Neural Body Fitting (NBF)) that integrates a statistical body model as a layer within a CNN leveraging both reliable bottom-up body part segmentation and robust top-down body model constraints. NBF is fully differentiable and can be trained end-to-end from both 2D and 3D annotations. In detailed experiments we analyze how the components of our model improve model performance and present a robust, easy to use, end-to-end trainable framework for 3D human pose estimation from single 2D images.

...read moreread less

517 citations

Proceedings Article•DOI•

How Far are We from Solving Pedestrian Detection

[...]

Shanshan Zhang¹, Rodrigo Benenson¹, Mohamed Omran¹, Jan Hosang¹, Bernt Schiele¹ - Show less +1 more•Institutions (1)

Max Planck Society¹

03 Feb 2016

TL;DR: The gap between current state-of-the-art methods and the "perfect single frame detector" is investigated, the impact of training annotation noise on the detector performance is studied, and it is shown that one can improve even with a small portion of sanitised training data.

...read moreread less

Abstract: Encouraged by the recent progress in pedestrian detection, we investigate the gap between current state-of-the-art methods and the "perfect single frame detector". We enable our analysis by creating a human baseline for pedestrian detection (over the Caltech dataset), and by manually clustering the recurrent errors of a top detector. Our results characterise both localisation and background-versusforeground errors. To address localisation errors we study the impact of training annotation noise on the detector performance, and show that we can improve even with a small portion of sanitised training data. To address background/foreground discrimination, we study convnets for pedestrian detection, and discuss which factors affect their performance. Other than our in-depth analysis, we report top performance on the Caltech dataset, and provide a new sanitised set of training and test annotations.

...read moreread less

478 citations

1
2
3
4
…
5

Cited by

PDF

Open Access

More filters

Proceedings Article•DOI•

Mask R-CNN

[...]

Kaiming He¹, Georgia Gkioxari¹, Piotr Dollár², Ross Girshick²•Institutions (2)

Facebook¹, École Centrale Paris²

20 Mar 2017

TL;DR: This work presents a conceptually simple, flexible, and general framework for object instance segmentation, which extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition.

...read moreread less

Abstract: We present a conceptually simple, flexible, and general framework for object instance segmentation. Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance. The method, called Mask R-CNN, extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition. Mask R-CNN is simple to train and adds only a small overhead to Faster R-CNN, running at 5 fps. Moreover, Mask R-CNN is easy to generalize to other tasks, e.g., allowing us to estimate human poses in the same framework. We show top results in all three tracks of the COCO suite of challenges, including instance segmentation, bounding-box object detection, and person keypoint detection. Without tricks, Mask R-CNN outperforms all existing, single-model entries on every task, including the COCO 2016 challenge winners. We hope our simple and effective approach will serve as a solid baseline and help ease future research in instance-level recognition. Code will be made available.

...read moreread less

14,299 citations

Journal Article•DOI•

SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation

[...]

Vijay Badrinarayanan¹, Alex Kendall¹, Roberto Cipolla¹•Institutions (1)

University of Cambridge¹

01 Dec 2017-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: Quantitative assessments show that SegNet provides good performance with competitive inference time and most efficient inference memory-wise as compared to other architectures, including FCN and DeconvNet.

...read moreread less

Abstract: We present a novel and practical deep fully convolutional neural network architecture for semantic pixel-wise segmentation termed SegNet. This core trainable segmentation engine consists of an encoder network, a corresponding decoder network followed by a pixel-wise classification layer. The architecture of the encoder network is topologically identical to the 13 convolutional layers in the VGG16 network [1] . The role of the decoder network is to map the low resolution encoder feature maps to full input resolution feature maps for pixel-wise classification. The novelty of SegNet lies is in the manner in which the decoder upsamples its lower resolution input feature map(s). Specifically, the decoder uses pooling indices computed in the max-pooling step of the corresponding encoder to perform non-linear upsampling. This eliminates the need for learning to upsample. The upsampled maps are sparse and are then convolved with trainable filters to produce dense feature maps. We compare our proposed architecture with the widely adopted FCN [2] and also with the well known DeepLab-LargeFOV [3] , DeconvNet [4] architectures. This comparison reveals the memory versus accuracy trade-off involved in achieving good segmentation performance. SegNet was primarily motivated by scene understanding applications. Hence, it is designed to be efficient both in terms of memory and computational time during inference. It is also significantly smaller in the number of trainable parameters than other competing architectures and can be trained end-to-end using stochastic gradient descent. We also performed a controlled benchmark of SegNet and other architectures on both road scenes and SUN RGB-D indoor scene segmentation tasks. These quantitative assessments show that SegNet provides good performance with competitive inference time and most efficient inference memory-wise as compared to other architectures. We also provide a Caffe implementation of SegNet and a web demo at http://mi.eng.cam.ac.uk/projects/segnet/ .

...read moreread less

13,468 citations

Proceedings Article•DOI•

Image-to-Image Translation with Conditional Adversarial Networks

[...]

Phillip Isola¹, Jun-Yan Zhu¹, Tinghui Zhou¹, Alexei A. Efros¹•Institutions (1)

University of California, Berkeley¹

21 Jul 2017

TL;DR: Conditional adversarial networks are investigated as a general-purpose solution to image-to-image translation problems and it is demonstrated that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks.

...read moreread less

Abstract: We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks. Moreover, since the release of the pix2pix software associated with this paper, hundreds of twitter users have posted their own artistic experiments using our system. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without handengineering our loss functions either.

...read moreread less

11,958 citations

Journal Article•DOI•

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

[...]

Liang-Chieh Chen¹, George Papandreou¹, Iasonas Kokkinos², Kevin Murphy¹, Alan L. Yuille³ - Show less +1 more•Institutions (3)

Google¹, University College London², Johns Hopkins University³

01 Apr 2018-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This work addresses the task of semantic image segmentation with Deep Learning and proposes atrous spatial pyramid pooling (ASPP), which is proposed to robustly segment objects at multiple scales, and improves the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models.

...read moreread less

Abstract: In this work we address the task of semantic image segmentation with Deep Learning and make three main contributions that are experimentally shown to have substantial practical merit. First , we highlight convolution with upsampled filters, or ‘atrous convolution’, as a powerful tool in dense prediction tasks. Atrous convolution allows us to explicitly control the resolution at which feature responses are computed within Deep Convolutional Neural Networks. It also allows us to effectively enlarge the field of view of filters to incorporate larger context without increasing the number of parameters or the amount of computation. Second , we propose atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales. ASPP probes an incoming convolutional feature layer with filters at multiple sampling rates and effective fields-of-views, thus capturing objects as well as image context at multiple scales. Third , we improve the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models. The commonly deployed combination of max-pooling and downsampling in DCNNs achieves invariance but has a toll on localization accuracy. We overcome this by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF), which is shown both qualitatively and quantitatively to improve localization performance. Our proposed “DeepLab” system sets the new state-of-art at the PASCAL VOC-2012 semantic image segmentation task, reaching 79.7 percent mIOU in the test set, and advances the results on three other datasets: PASCAL-Context, PASCAL-Person-Part, and Cityscapes. All of our code is made publicly available online.

...read moreread less

11,856 citations

Proceedings Article•DOI•

Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks

[...]

Jun-Yan Zhu¹, Taesung Park¹, Phillip Isola¹, Alexei A. Efros¹•Institutions (1)

University of California, Berkeley¹

01 Oct 2017

TL;DR: CycleGAN as discussed by the authors learns a mapping G : X → Y such that the distribution of images from G(X) is indistinguishable from the distribution Y using an adversarial loss.

...read moreread less

Abstract: Image-to-image translation is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image using a training set of aligned image pairs. However, for many tasks, paired training data will not be available. We present an approach for learning to translate an image from a source domain X to a target domain Y in the absence of paired examples. Our goal is to learn a mapping G : X → Y such that the distribution of images from G(X) is indistinguishable from the distribution Y using an adversarial loss. Because this mapping is highly under-constrained, we couple it with an inverse mapping F : Y → X and introduce a cycle consistency loss to push F(G(X)) ≈ X (and vice versa). Qualitative results are presented on several tasks where paired training data does not exist, including collection style transfer, object transfiguration, season transfer, photo enhancement, etc. Quantitative comparisons against several prior methods demonstrate the superiority of our approach.

...read moreread less

11,682 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse