Author

Mohamed Elgharib

Bio: Mohamed Elgharib is an academic researcher from the Max Planck Society. The author has contributed to research in topics: Computer science & Face (geometry). The author has an h-index of 16 and has co-authored 59 publications receiving 1,163 citations. Previous affiliations of Mohamed Elgharib include Qatar Foundation & Boston University.

Papers published on a yearly basis

Papers
Journal ArticleDOI
11 Jul 2016
TL;DR: This work presents a new technique for transferring the painting from one head portrait onto another and imposes novel spatial constraints by locally transferring the color distributions of the example painting, which better captures the painting texture and maintains the integrity of facial structures.
Abstract: Head portraits are popular in traditional painting. Automating portrait painting is challenging, as the human visual system is sensitive to the slightest irregularities in human faces. Applying generic painting techniques often deforms facial structures. On the other hand, portrait painting techniques are mainly designed for the graphite style and/or are based on image analogies; an example painting as well as its original unpainted version are required. This limits their domain of applicability. We present a new technique for transferring the painting from a head portrait onto another. Unlike previous work, our technique requires only the example painting and is not restricted to a specific style. We impose novel spatial constraints by locally transferring the color distributions of the example painting. This better captures the painting texture and maintains the integrity of facial structures. We generate a solution through convolutional neural networks, and we present an extension to video, where motion is exploited to reduce temporal inconsistencies and the shower-door effect. Our approach transfers the painting style while maintaining the identity of the input photograph. In addition, it significantly reduces facial deformations over the state of the art.

188 citations
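
The local color-distribution constraint can be pictured with a small sketch. The snippet below is a hedged, minimal interpretation rather than the paper's actual pipeline: it assumes per-region masks that are already aligned between the input photograph and the example painting (the `regions` list of boolean masks is a hypothetical input; the paper derives correspondence from facial structures), and it shifts each region of the input toward the per-channel color statistics of the matching painted region.

```python
import numpy as np

def transfer_region_stats(target, example, regions, eps=1e-6):
    """Shift each masked region of `target` (H x W x 3, uint8) toward the
    per-channel mean/std of the corresponding region in `example`.
    `regions` is a list of boolean masks of shape (H, W), assumed to be
    aligned between the two images (illustrative assumption)."""
    out = target.astype(np.float64).copy()
    for mask in regions:
        t = out[mask]                                  # region pixels, shape (N, 3)
        e = example[mask].astype(np.float64)
        t_mean, t_std = t.mean(axis=0), t.std(axis=0) + eps
        e_mean, e_std = e.mean(axis=0), e.std(axis=0)
        out[mask] = (t - t_mean) / t_std * e_std + e_mean
    return np.clip(out, 0, 255).astype(np.uint8)
```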

Proceedings ArticleDOI
14 Jun 2020
TL;DR: In this article, a rigging network is trained between the 3DMM's semantic parameters and StyleGAN's input to provide a face rig-like control over a pretrained and fixed StyleGAN via a 3DMM.
Abstract: StyleGAN generates photorealistic portrait images of faces with eyes, teeth, hair and context (neck, shoulders, background), but lacks a rig-like control over semantic face parameters that are interpretable in 3D, such as face pose, expressions, and scene illumination. Three-dimensional morphable face models (3DMMs), on the other hand, offer control over the semantic parameters, but lack photorealism when rendered and only model the face interior, not other parts of a portrait image (hair, mouth interior, background). We present the first method to provide a face rig-like control over a pretrained and fixed StyleGAN via a 3DMM. A new rigging network, RigNet, is trained between the 3DMM's semantic parameters and StyleGAN's input. The network is trained in a self-supervised manner, without the need for manual annotations. At test time, our method generates portrait images with the photorealism of StyleGAN and provides explicit control over the 3D semantic parameters of the face.

178 citations
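
The rig-like control can be pictured as a small network that perturbs the StyleGAN latent code as a function of the 3DMM parameters. The PyTorch sketch below only illustrates that idea; the layer sizes, the two-layer MLP, and the residual formulation are assumptions, not the published RigNet architecture.

```python
import torch
import torch.nn as nn

class RigNetSketch(nn.Module):
    """Toy rigging network: map a latent code plus 3DMM semantic parameters
    (pose, expression, illumination) to an edited latent for a fixed StyleGAN."""
    def __init__(self, latent_dim=512, param_dim=100, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim + param_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, latent_dim),
        )

    def forward(self, w, p):
        # Predict a residual on the latent, so the unedited code stays recoverable.
        return w + self.mlp(torch.cat([w, p], dim=-1))

# Usage idea (hypothetical names): w_edit = rignet(w, target_3dmm_params)
# image = pretrained_stylegan(w_edit)   # the generator itself stays fixed
```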


Journal ArticleDOI
08 Jul 2020
TL;DR: In this paper, the SelecSLS Net is proposed to estimate 2D and 3D pose features along with identity assignments for all visible joints of all individuals in multi-person 3D motion capture.
Abstract: We present a real-time approach for multi-person 3D motion capture at over 30 fps using a single RGB camera. It operates successfully in generic scenes which may contain occlusions by objects and by other people. Our method operates in three subsequent stages. The first stage is a convolutional neural network (CNN) that estimates 2D and 3D pose features along with identity assignments for all visible joints of all individuals. We contribute a new architecture for this CNN, called SelecSLS Net, that uses novel selective long and short range skip connections to improve the information flow, allowing for a drastically faster network without compromising accuracy. In the second stage, a fully connected neural network turns the possibly partial (on account of occlusion) 2D pose and 3D pose features for each subject into a complete 3D pose estimate per individual. The third stage applies space-time skeletal model fitting to the predicted 2D and 3D pose per subject to further reconcile the 2D and 3D pose, and enforce temporal coherence. Our method returns the full skeletal pose in joint angles for each subject. This is a further key distinction from previous work that does not produce joint-angle results of a coherent skeleton in real time for multi-person scenes. The proposed system runs on consumer hardware at a previously unseen speed of more than 30 fps given 512x320 images as input while achieving state-of-the-art accuracy, which we demonstrate on a range of challenging real-world scenes.

168 citations
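
The "selective long and short range skip connections" can be sketched roughly as a block that concatenates the outputs of its internal convolutions (short-range skips) with a feature map carried over from the first block of its group (long-range skip). The PyTorch snippet below is a hedged illustration of that pattern only; channel counts, depth, and the fusion layer are assumptions rather than the published SelecSLS Net.

```python
import torch
import torch.nn as nn

def conv_bn_relu(cin, cout, stride=1):
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
    )

class SelecBlockSketch(nn.Module):
    """Toy block: fuse short-range internal skips, plus a long-range skip
    carried from the first block of the group (assumed same spatial size)."""
    def __init__(self, cin, cmid, cout, is_first, stride=1):
        super().__init__()
        self.is_first = is_first
        self.conv1 = conv_bn_relu(cin, cmid, stride)
        self.conv2 = conv_bn_relu(cmid, cmid)
        self.conv3 = conv_bn_relu(cmid, cmid)
        fuse_in = 3 * cmid if is_first else 3 * cmid + cout
        self.fuse = conv_bn_relu(fuse_in, cout)

    def forward(self, x, long_skip=None):
        d1 = self.conv1(x)
        d2 = self.conv2(d1)
        d3 = self.conv3(d2)
        feats = [d1, d2, d3] if self.is_first else [d1, d2, d3, long_skip]
        out = self.fuse(torch.cat(feats, dim=1))
        # The first block of a group exposes its output as the long-range skip
        # for the following blocks; later blocks just pass it through.
        return out, (out if self.is_first else long_skip)
```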

Proceedings ArticleDOI
01 Jun 2019
TL;DR: This work proposes multi-frame video-based self-supervised training of a deep network that learns a face identity model both in shape and appearance while jointly learning to reconstruct 3D faces.
Abstract: Monocular image-based 3D reconstruction of faces is a long-standing problem in computer vision. Since image data is a 2D projection of a 3D face, the resulting depth ambiguity makes the problem ill-posed. Most existing methods rely on data-driven priors that are built from limited 3D face scans. In contrast, we propose multi-frame video-based self-supervised training of a deep network that (i) learns a face identity model both in shape and appearance while (ii) jointly learning to reconstruct 3D faces. Our face model is learned using only corpora of in-the-wild video clips collected from the Internet. This virtually endless source of training data enables learning of a highly general 3D face model. In order to achieve this, we propose a novel multi-frame consistency loss that ensures consistent shape and appearance across multiple frames of a subject's face, thus minimizing depth ambiguity. At test time we can use an arbitrary number of frames, so that we can perform both monocular and multi-frame reconstruction.

155 citations
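
One way to picture the multi-frame consistency idea is as a penalty that keeps the per-frame identity predictions for the same subject close to one another. The sketch below is a minimal, assumed formulation for illustration, not the paper's exact loss.

```python
import torch

def multiframe_consistency_loss(id_codes):
    """id_codes: tensor of shape (num_frames, code_dim), the identity code
    predicted independently from each frame of one subject's clip.
    Penalize deviation from the clip mean so all frames agree on identity."""
    mean_code = id_codes.mean(dim=0, keepdim=True)
    return ((id_codes - mean_code) ** 2).mean()
```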


Cited by

Journal ArticleDOI
TL;DR: A comprehensive and up-to-date review of the state of the art (SOTA) in AutoML, organized according to the deep-learning pipeline and covering data preparation, feature engineering, hyperparameter optimization, and neural architecture search (NAS).
Abstract: Deep learning (DL) techniques have obtained remarkable achievements on various tasks, such as image recognition, object detection, and language modeling. However, building a high-quality DL system for a specific task highly relies on human expertise, hindering its wide application. Meanwhile, automated machine learning (AutoML) is a promising solution for building a DL system without human assistance and is being extensively studied. This paper presents a comprehensive and up-to-date review of the state-of-the-art (SOTA) in AutoML. According to the DL pipeline, we introduce AutoML methods – covering data preparation, feature engineering, hyperparameter optimization, and neural architecture search (NAS) – with a particular focus on NAS, as it is currently a hot sub-topic of AutoML. We summarize the representative NAS algorithms’ performance on the CIFAR-10 and ImageNet datasets and further discuss the following subjects of NAS methods: one/two-stage NAS, one-shot NAS, joint hyperparameter and architecture optimization, and resource-aware NAS. Finally, we discuss some open problems related to the existing AutoML methods for future research.

809 citations
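
As a toy illustration of the simplest building block discussed in such AutoML surveys, the sketch below runs random search over a small hyperparameter space. The search space and the `train_and_score` placeholder are hypothetical; a real system would train and validate a model inside that function.

```python
import random

SEARCH_SPACE = {
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3],
    "num_layers": [2, 4, 8],
    "width": [64, 128, 256],
}

def train_and_score(config):
    # Placeholder: train a model with `config` and return validation accuracy.
    return random.random()

def random_search(num_trials=20):
    best_config, best_score = None, float("-inf")
    for _ in range(num_trials):
        config = {k: random.choice(v) for k, v in SEARCH_SPACE.items()}
        score = train_and_score(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score
```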

Proceedings ArticleDOI
14 Jun 2020
TL;DR: This work defines a novel temporal network architecture with a self-attention mechanism and shows that adversarial training, at the sequence level, produces kinematically plausible motion sequences without in-the-wild ground-truth 3D labels.
Abstract: Human motion is fundamental to understanding behavior. Despite progress on single-image 3D pose and shape estimation, existing video-based state-of-the-art methods fail to produce accurate and natural motion sequences due to a lack of ground-truth 3D motion data for training. To address this problem, we propose "Video Inference for Body Pose and Shape Estimation'' (VIBE), which makes use of an existing large-scale motion capture dataset (AMASS) together with unpaired, in-the-wild, 2D keypoint annotations. Our key novelty is an adversarial learning framework that leverages AMASS to discriminate between real human motions and those produced by our temporal pose and shape regression networks. We define a novel temporal network architecture with a self-attention mechanism and show that adversarial training, at the sequence level, produces kinematically plausible motion sequences without in-the-wild ground-truth 3D labels. We perform extensive experimentation to analyze the importance of motion and demonstrate the effectiveness of VIBE on challenging 3D pose estimation datasets, achieving state-of-the-art performance. Code and pretrained models are available at https://github.com/mkocabas/VIBE

687 citations
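
The sequence-level adversarial idea can be sketched as a motion discriminator that scores whole pose sequences, so the regressor is pushed toward motions that resemble real mocap data (e.g. AMASS). The snippet below is a rough illustration; the GRU size, the additive attention pooling, and the least-squares loss shown in the comments are assumptions rather than the exact published VIBE components.

```python
import torch
import torch.nn as nn

class MotionDiscriminatorSketch(nn.Module):
    """Score a pose sequence as real (mocap) or fake (regressed)."""
    def __init__(self, pose_dim=72, hidden=256):
        super().__init__()
        self.gru = nn.GRU(pose_dim, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)        # additive attention over time
        self.out = nn.Linear(hidden, 1)         # real/fake score per sequence

    def forward(self, poses):                    # poses: (batch, time, pose_dim)
        h, _ = self.gru(poses)
        w = torch.softmax(self.attn(h), dim=1)   # temporal attention weights
        pooled = (w * h).sum(dim=1)              # attention-pooled sequence feature
        return self.out(pooled)

# Least-squares-style adversarial losses (illustrative):
# d_loss = ((D(real) - 1) ** 2).mean() + (D(fake.detach()) ** 2).mean()
# g_loss = ((D(fake) - 1) ** 2).mean()
```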

Posted Content
TL;DR: This work presents a generic image-to-image translation framework, pixel2style2pixel (pSp), based on a novel encoder network that directly generates a series of style vectors which are fed into a pretrained StyleGAN generator, forming the extended W+ latent space.
Abstract: We present a generic image-to-image translation framework, Pixel2Style2Pixel (pSp). Our pSp framework is based on a novel encoder network that directly generates a series of style vectors which are fed into a pretrained StyleGAN generator, forming the extended W+ latent space. We first show that our encoder can directly embed real images into W+, with no additional optimization. We further introduce a dedicated identity loss which is shown to achieve improved performance in the reconstruction of an input image. We demonstrate pSp to be a simple architecture that, by leveraging a well-trained, fixed generator network, can be easily applied to a wide range of image-to-image translation tasks. Solving these tasks through the style representation results in a global approach that does not rely on a local pixel-to-pixel correspondence and further supports multi-modal synthesis via the resampling of styles. Notably, we demonstrate that pSp can be trained to align a face image to a frontal pose without any labeled data, generate multi-modal results for ambiguous tasks such as conditional face generation from segmentation maps, and construct high-resolution images from corresponding low-resolution images.

504 citations
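
The encoder-to-W+ idea can be sketched as an image encoder that predicts one style vector per generator layer and hands them to a fixed, pretrained StyleGAN. The snippet below is a minimal illustration; the toy CNN backbone and the 18 x 512 W+ shape (typical for a 1024-pixel generator) are assumptions, not the pSp feature-pyramid encoder.

```python
import torch
import torch.nn as nn

class Pixel2StyleSketch(nn.Module):
    """Toy encoder: image -> one style vector per StyleGAN layer (W+)."""
    def __init__(self, num_styles=18, style_dim=512):
        super().__init__()
        self.backbone = nn.Sequential(            # placeholder CNN backbone
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.to_wplus = nn.Linear(128, num_styles * style_dim)
        self.num_styles, self.style_dim = num_styles, style_dim

    def forward(self, image):                      # image: (batch, 3, H, W)
        feat = self.backbone(image)
        w_plus = self.to_wplus(feat).view(-1, self.num_styles, self.style_dim)
        return w_plus                              # fed to a fixed StyleGAN generator
```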