Author

Mohammad Saeed Rad

Bio: Mohammad Saeed Rad is an academic researcher from École Polytechnique Fédérale de Lausanne. The author has contributed to research in topics: Pixel & Real image. The author has an h-index of 7 and has co-authored 16 publications receiving 192 citations.

Papers
Proceedings ArticleDOI
01 Oct 2019
TL;DR: In this paper, the authors optimize a deep network-based decoder with a targeted objective function that penalizes images at different semantic levels using the corresponding terms, which results in more realistic textures and sharper edges.
Abstract: By benefiting from perceptual losses, recent studies have significantly improved the performance of the super-resolution task, where a high-resolution image is resolved from its low-resolution counterpart. Although such objective functions generate near-photorealistic results, their capability is limited, since they estimate the reconstruction error for an entire image in the same way, without considering any semantic information. In this paper, we propose a novel method to benefit from perceptual loss in a more objective way. We optimize a deep network-based decoder with a targeted objective function that penalizes images at different semantic levels using the corresponding terms. In particular, the proposed method leverages our proposed OBB (Object, Background and Boundary) labels, generated from segmentation labels, to estimate a suitable perceptual loss for boundaries, while considering texture similarity for backgrounds. We show that our proposed approach results in more realistic textures and sharper edges, and outperforms other state-of-the-art algorithms in terms of both qualitative results on standard benchmarks and the results of extensive user studies.

113 citations
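
The targeted objective above amounts to an ordinary perceptual loss whose per-pixel weighting depends on each pixel's semantic role. Below is a minimal PyTorch sketch, assuming a precomputed OBB mask (0 = object, 1 = background, 2 = boundary) and a VGG16 feature extractor; the layer choice, weight values, and the obb_perceptual_loss helper are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

# Hypothetical OBB-weighted perceptual loss: each pixel's feature-space error
# is weighted by its semantic role (object / background / boundary).
vgg_features = vgg16(pretrained=True).features[:16].eval()  # up to relu3_3
for p in vgg_features.parameters():
    p.requires_grad_(False)

def obb_perceptual_loss(sr, hr, obb_mask, w_obj=1.0, w_bg=0.5, w_bnd=2.0):
    """sr, hr: (N, 3, H, W) images; obb_mask: (N, 1, H, W) with
    0 = object, 1 = background, 2 = boundary. Weights are illustrative."""
    f_sr, f_hr = vgg_features(sr), vgg_features(hr)
    err = (f_sr - f_hr).pow(2).mean(dim=1, keepdim=True)  # per-pixel error
    # Resize the OBB mask to the feature-map resolution.
    mask = F.interpolate(obb_mask.float(), size=err.shape[-2:], mode="nearest")
    weights = torch.full_like(mask, w_obj)
    weights[mask == 1] = w_bg   # softer penalty on texture-like backgrounds
    weights[mask == 2] = w_bnd  # stronger penalty on boundaries for sharp edges
    return (weights * err).mean()
```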

Proceedings ArticleDOI
27 Oct 2019
TL;DR: Extensive experiments demonstrate that the depth and ego-motion models surpass state-of-the-art unsupervised methods and compare favorably to early supervised deep models for geometric understanding.
Abstract: Despite well-established baselines, learning scene depth and ego-motion from monocular video remains an ongoing challenge, specifically when handling scale ambiguity and depth inconsistencies in image sequences. Much prior work uses either a supervised mode of learning or stereo images. The former is limited by the amount of labeled data, as it requires expensive sensors, while the latter is not as readily available as monocular sequences. In this work, we demonstrate the benefit of using geometric information from synthetic images, coupled with scene depth information, to recover the scale in depth and ego-motion estimation from monocular videos. We developed our framework using synthetic image-depth pairs and unlabeled real monocular images. We have three training objectives: first, we use deep feature alignment to reduce the domain gap between synthetic and monocular images, yielding more accurate depth estimation when only real monocular images are presented at test time. Second, we learn a scene-specific representation by exploiting self-supervision from multi-view synthetic images, without the need for depth labels. Third, our method uses single-view depth and pose networks that are trained jointly and supervise one another, yielding consistent depth and ego-motion estimates. Extensive experiments demonstrate that our depth and ego-motion models surpass state-of-the-art unsupervised methods and compare favorably to early supervised deep models for geometric understanding. We validate the effectiveness of our training objectives on standard benchmarks through an ablation study.

51 citations
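
The scale-recovery idea combines a supervised depth term on synthetic image-depth pairs, which anchors metric scale, with a self-supervised photometric term on real monocular video. The PyTorch sketch below illustrates that mixture; depth_net, pose_net, the loss weights, and the warp helper are hypothetical interfaces, and the warp is a textbook projective inverse-warp rather than the paper's implementation.

```python
import torch
import torch.nn.functional as F

def warp_to_target(src, depth, pose, K):
    """Inverse-warp src (N, 3, H, W) into the target view, given target
    depth (N, 1, H, W), relative pose T_src<-tgt (N, 4, 4) and intrinsics
    K (N, 3, 3). Textbook projective warping, not the paper's code."""
    n, _, h, w = depth.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).float()
    pix = pix.reshape(1, 3, -1).expand(n, 3, -1).to(depth.device)
    cam = torch.linalg.inv(K) @ pix * depth.reshape(n, 1, -1)  # back-project
    cam_h = torch.cat([cam, torch.ones_like(cam[:, :1])], 1)   # homogeneous
    src_cam = (pose @ cam_h)[:, :3]                            # to source frame
    src_pix = K @ src_cam
    src_pix = src_pix[:, :2] / src_pix[:, 2:].clamp(min=1e-6)  # perspective div
    gx = 2 * src_pix[:, 0] / (w - 1) - 1                       # to [-1, 1]
    gy = 2 * src_pix[:, 1] / (h - 1) - 1
    grid = torch.stack([gx, gy], -1).reshape(n, h, w, 2)
    return F.grid_sample(src, grid, align_corners=True)

def scale_anchored_loss(depth_net, pose_net, synth_img, synth_depth,
                        real_tgt, real_src, K, w_syn=1.0, w_pho=1.0):
    # 1) Supervised term on synthetic image-depth pairs anchors metric scale.
    loss_syn = F.l1_loss(depth_net(synth_img), synth_depth)
    # 2) Self-supervised photometric term on real monocular video: warp the
    #    source frame into the target view with predicted depth and ego-motion.
    depth = depth_net(real_tgt)
    pose = pose_net(real_tgt, real_src)        # hypothetical (N, 4, 4) output
    loss_pho = F.l1_loss(warp_to_target(real_src, depth, pose, K), real_tgt)
    return w_syn * loss_syn + w_pho * loss_pho
```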

Book ChapterDOI
10 Jul 2017
TL;DR: A fully automated computer vision application for littering quantification based on images taken from streets and sidewalks, using a deep-learning framework to localize and classify different types of waste.
Abstract: Littering quantification is an important step toward improving the cleanliness of cities. When human interpretation is too cumbersome, or in some cases impossible, an objective index of cleanliness could reduce littering through awareness actions. In this paper, we present a fully automated computer vision application for littering quantification based on images taken from streets and sidewalks. We employ a deep-learning-based framework to localize and classify different types of waste. Since no waste dataset was available, we built our own acquisition system, mounted on a vehicle, and collected images containing different types of waste. These images were then annotated for training and benchmarking the developed system. Our results on real-case scenarios show accurate detection of littering against varied backgrounds.

50 citations
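
The localize-and-classify pipeline can be pictured with an off-the-shelf detector whose classification head is resized for waste categories. The sketch below uses torchvision's Faster R-CNN as a stand-in; the paper does not name this detector, and the WASTE_CLASSES list and count_litter helper are hypothetical.

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Hypothetical waste taxonomy; index 0 is the detector's background class.
WASTE_CLASSES = ["background", "bottle", "can", "paper", "cigarette_butt"]

def build_litter_detector(num_classes=len(WASTE_CLASSES)):
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
    # Swap the COCO classification head for one sized to the waste classes.
    in_feats = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_feats, num_classes)
    return model

def count_litter(model, street_image, score_thresh=0.5):
    """Return per-class detection counts for one (3, H, W) image in [0, 1]."""
    model.eval()
    with torch.no_grad():
        out = model([street_image])[0]
    keep = out["scores"] > score_thresh
    counts = {}
    for label in out["labels"][keep].tolist():
        name = WASTE_CLASSES[label]
        counts[name] = counts.get(name, 0) + 1
    return counts
```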

Journal ArticleDOI
TL;DR: Compared to existing models, synthetic face images generated by the proposed attribute-guided face image generation method present good photorealistic quality on several face datasets, can be used for synthetic data augmentation, and improve the performance of the classifier used for facial expression recognition.

25 citations

Proceedings ArticleDOI
14 May 2019
TL;DR: In this paper, a new attribute guided face image synthesis model is proposed to perform a translation between multiple image domains using a single model, and the model can learn from synthetic faces by matching the feature distributions between different domains while preserving each domain's characteristics.
Abstract: Synthesizing realistic cross-domain faces to learn deep models has attracted increasing attention in facial expression analysis, as it helps improve expression recognition accuracy despite a small number of real training images. However, learning from synthetic face images can be problematic due to the distribution discrepancy between low-quality synthetic images and real face images, and it may not achieve the desired performance when the learned model is applied to real-world scenarios. To this end, we propose a new attribute-guided face image synthesis model that performs translation between multiple image domains using a single model. In addition, we adopt the proposed model to learn from synthetic faces by matching the feature distributions between different domains while preserving each domain's characteristics. We evaluate the effectiveness of the proposed approach for generating realistic face images on several face datasets. We demonstrate that expression recognition performance can be enhanced by our face synthesis model. Moreover, we also conduct experiments on a near-infrared dataset containing facial expression videos of drivers to assess performance on in-the-wild data for driver emotion recognition.

23 citations
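
One standard way to "match the feature distributions between different domains", as the abstract puts it, is a maximum mean discrepancy (MMD) penalty between synthetic and real feature batches. The sketch below shows a Gaussian-kernel MMD under that assumption; the paper's actual matching criterion may differ.

```python
import torch

def gaussian_kernel(x, y, sigma=1.0):
    """Pairwise Gaussian kernel between (N, D) and (M, D) feature batches."""
    return torch.exp(-torch.cdist(x, y).pow(2) / (2 * sigma ** 2))

def mmd_loss(feat_syn, feat_real, sigma=1.0):
    """Maximum mean discrepancy between synthetic and real features; driving
    this toward zero aligns the two feature distributions."""
    k_ss = gaussian_kernel(feat_syn, feat_syn, sigma).mean()
    k_rr = gaussian_kernel(feat_real, feat_real, sigma).mean()
    k_sr = gaussian_kernel(feat_syn, feat_real, sigma).mean()
    return k_ss + k_rr - 2 * k_sr
```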


Cited by
Book ChapterDOI
23 Aug 2020
TL;DR: A new self-supervised semantically-guided depth estimation (SGDepth) method to deal with moving dynamic-class (DC) objects, such as moving cars and pedestrians, which violate the static-world assumptions typically made during training of such models.
Abstract: Self-supervised monocular depth estimation presents a powerful method to obtain 3D scene information from single camera images, which is trainable on arbitrary image sequences without requiring depth labels, e.g., from a LiDAR sensor. In this work we present a new self-supervised semantically-guided depth estimation (SGDepth) method to deal with moving dynamic-class (DC) objects, such as moving cars and pedestrians, which violate the static-world assumptions typically made during training of such models. Specifically, we propose (i) mutually beneficial cross-domain training of (supervised) semantic segmentation and self-supervised depth estimation with task-specific network heads, (ii) a semantic masking scheme providing guidance to prevent moving DC objects from contaminating the photometric loss, and (iii) a detection method for frames with non-moving DC objects, from which the depth of DC objects can be learned. We demonstrate the performance of our method on several benchmarks, in particular on the Eigen split, where we exceed all baselines without test-time refinement.

217 citations
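
The semantic masking scheme in (ii) amounts to excluding dynamic-class pixels from the photometric loss unless the frame was detected, per (iii), as containing only non-moving DC objects. A minimal sketch of that rule follows, assuming a per-pixel error map, a binary DC mask, and a per-frame staticness flag; all three interfaces are hypothetical, and SGDepth's actual scheme is more involved.

```python
import torch

def masked_photometric_loss(photo_err, dc_mask, frame_is_static):
    """photo_err: (N, 1, H, W) per-pixel photometric error;
    dc_mask: (N, 1, H, W) bool, True on dynamic-class pixels (cars, people);
    frame_is_static: (N,) bool, True when the frame's DC objects were detected
    as non-moving, so their pixels may safely contribute to the loss."""
    keep = ~dc_mask | frame_is_static.view(-1, 1, 1, 1)
    # Average the error over kept pixels only, so moving objects cannot
    # contaminate the photometric supervision.
    return (photo_err * keep).sum() / keep.sum().clamp(min=1)
```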

Journal ArticleDOI
TL;DR: The MHS is trained and validated against manually labelled items, achieving overall classification accuracy higher than 90% under two different testing scenarios and significantly outperforming a reference CNN-based method relying on image-only inputs.
Abstract: This study proposes a multilayer hybrid deep-learning system (MHS) to automatically sort waste disposed of by individuals in urban public areas. The system deploys a high-resolution camera to capture waste images and sensors to detect other useful feature information. The MHS uses a CNN-based algorithm to extract image features and a multilayer perceptron (MLP) to consolidate the image features with the other feature information, classifying waste as recyclable or other. The MHS is trained and validated against manually labelled items, achieving overall classification accuracy higher than 90% under two different testing scenarios, significantly outperforming a reference CNN-based method relying on image-only inputs.

154 citations
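
The hybrid design concatenates CNN image features with auxiliary sensor features and classifies the fused vector with an MLP. Below is a hedged PyTorch sketch assuming a ResNet-18 backbone and an 8-dimensional sensor vector; the paper's exact backbone, sensor set, and layer sizes are not given here.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class HybridWasteClassifier(nn.Module):
    """CNN image features + sensor features -> MLP -> recyclable vs. other.
    Backbone and dimensions are assumptions for illustration."""
    def __init__(self, num_sensor_feats=8):
        super().__init__()
        backbone = resnet18(pretrained=True)
        backbone.fc = nn.Identity()            # expose 512-d image features
        self.cnn = backbone
        self.mlp = nn.Sequential(
            nn.Linear(512 + num_sensor_feats, 128),
            nn.ReLU(),
            nn.Linear(128, 2),                 # two classes: recyclable / other
        )

    def forward(self, image, sensor_feats):
        img_feat = self.cnn(image)                        # (N, 512)
        fused = torch.cat([img_feat, sensor_feats], dim=1)
        return self.mlp(fused)                            # (N, 2) logits
```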

Proceedings ArticleDOI
Cheng Ma, Yongming Rao, Yean Cheng, Ce Chen, Jiwen Lu, Jie Zhou
14 Jun 2020
TL;DR: The authors propose a structure-preserving super-resolution method that alleviates undesired structural distortions in the recovered images by exploiting gradient maps of images to guide the recovery in two aspects.
Abstract: Structures matter in single image super resolution (SISR). Recent studies benefiting from generative adversarial networks (GANs) have promoted the development of SISR by recovering photo-realistic images. However, there are always undesired structural distortions in the recovered images. In this paper, we propose a structure-preserving super resolution method to alleviate the above issue while maintaining the merits of GAN-based methods to generate perceptually pleasant details. Specifically, we exploit gradient maps of images to guide the recovery in two aspects. On the one hand, we restore high-resolution gradient maps by a gradient branch to provide additional structure priors for the SR process. On the other hand, we propose a gradient loss which imposes a second-order restriction on the super-resolved images. Along with the previous image-space loss functions, the gradient-space objectives help generative networks concentrate more on geometric structures. Moreover, our method is model-agnostic and can potentially be used with off-the-shelf SR networks. Experimental results show that we achieve the best PI and LPIPS performance, with comparable PSNR and SSIM, against state-of-the-art perceptual-driven SR methods. Visual results demonstrate our superiority in restoring structures while generating natural SR images.

143 citations
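
The gradient-space objective compares gradient maps of the super-resolved and ground-truth images, so structural errors are penalized explicitly. A minimal sketch using Sobel filters follows; the kernel choice and L1 comparison are illustrative, not necessarily the paper's exact gradient loss.

```python
import torch
import torch.nn.functional as F

def gradient_map(img):
    """Sobel gradient magnitude for (N, C, H, W) images (depthwise conv)."""
    kx = img.new_tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
    ky = kx.t()
    c = img.shape[1]
    kx = kx.view(1, 1, 3, 3).repeat(c, 1, 1, 1)
    ky = ky.view(1, 1, 3, 3).repeat(c, 1, 1, 1)
    gx = F.conv2d(img, kx, padding=1, groups=c)
    gy = F.conv2d(img, ky, padding=1, groups=c)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)

def gradient_loss(sr, hr):
    """L1 distance between gradient maps of SR output and ground truth."""
    return F.l1_loss(gradient_map(sr), gradient_map(hr))
```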
