Home
/
Authors
/
Sunil Hadap

Author

Sunil Hadap

Other affiliations: Amazon.com, Carnegie Mellon University, Laval University ...read more

Bio: Sunil Hadap is an academic researcher from Adobe Systems. The author has contributed to research in topics: Compositing & Digital painting. The author has an hindex of 30, co-authored 94 publications receiving 3495 citations. Previous affiliations of Sunil Hadap include Amazon.com & Carnegie Mellon University.

Papers published on a yearly basis

2021
2020
2019
2018
2017
2016
2015
2014
2013
2012
2011
2010
2009
2008
2007
2006
2005
2004
2001
2000
1999

Papers

PDF

Open Access

More filters

Proceedings Article•DOI•

Depth from Combining Defocus and Correspondence Using Light-Field Cameras

[...]

Michael W. Tao¹, Sunil Hadap², Jitendra Malik¹, Ravi Ramamoorthi¹•Institutions (2)

University of California, Berkeley¹, Adobe Systems²

01 Dec 2013

TL;DR: A novel simple and principled algorithm is presented that computes dense depth estimation by combining both defocus and correspondence depth cues, and shows how to combine the two cues into a high quality depth map, suitable for computer vision applications such as matting, full control of depth-of-field, and surface reconstruction.

...read moreread less

Abstract: Light-field cameras have recently become available to the consumer market. An array of micro-lenses captures enough information that one can refocus images after acquisition, as well as shift one's viewpoint within the sub-apertures of the main lens, effectively obtaining multiple views. Thus, depth cues from both defocus and correspondence are available simultaneously in a single capture. Previously, defocus could be achieved only through multiple image exposures focused at different depths, while correspondence cues needed multiple exposures at different viewpoints or multiple cameras, moreover, both cues could not easily be obtained together. In this paper, we present a novel simple and principled algorithm that computes dense depth estimation by combining both defocus and correspondence depth cues. We analyze the x-u 2D epipolar image (EPI), where by convention we assume the spatial x coordinate is horizontal and the angular u coordinate is vertical (our final algorithm uses the full 4D EPI). We show that defocus depth cues are obtained by computing the horizontal (spatial) variance after vertical (angular) integration, and correspondence depth cues by computing the vertical (angular) variance. We then show how to combine the two cues into a high quality depth map, suitable for computer vision applications such as matting, full control of depth-of-field, and surface reconstruction.

...read moreread less

582 citations

Proceedings Article•DOI•

Neural Face Editing with Intrinsic Image Disentangling

[...]

Zhixin Shu¹, Ersin Yumer², Sunil Hadap², Kalyan Sunkavalli², Eli Shechtman², Dimitris Samaras³ - Show less +2 more•Institutions (3)

Stony Brook University¹, Adobe Systems², Université Paris-Saclay³

13 Apr 2017

TL;DR: The authors proposed an end-to-end generative adversarial network that infers a face-specific disentangled representation of intrinsic face properties, including shape (i.e. normals), albedo, and lighting, and an alpha matte.

...read moreread less

Abstract: Traditional face editing methods often require a number of sophisticated and task specific algorithms to be applied one after the other — a process that is tedious, fragile, and computationally intensive. In this paper, we propose an end-to-end generative adversarial network that infers a face-specific disentangled representation of intrinsic face properties, including shape (i.e. normals), albedo, and lighting, and an alpha matte. We show that this network can be trained on in-the-wild images by incorporating an in-network physically-based image formation module and appropriate loss functions. Our disentangling latent representation allows for semantically relevant edits, where one aspect of facial appearance can be manipulated while keeping orthogonal properties fixed, and we demonstrate its use for a number of facial editing applications.

...read moreread less

246 citations

Proceedings Article•DOI•

Deep Outdoor Illumination Estimation

[...]

Yannick Hold-Geoffroy¹, Kalyan Sunkavalli², Sunil Hadap², Emiliano Gambaretto², Jean-François Lalonde¹ - Show less +1 more•Institutions (2)

Laval University¹, Adobe Systems²

01 Jul 2017

TL;DR: It is demonstrated that the approach allows the recovery of plausible illumination conditions and enables photorealistic virtual object insertion from a single image and significantly outperforms previous solutions to this problem.

...read moreread less

Abstract: We present a CNN-based technique to estimate high-dynamic range outdoor illumination from a single low dynamic range image. To train the CNN, we leverage a large dataset of outdoor panoramas. We fit a low-dimensional physically-based outdoor illumination model to the skies in these panoramas giving us a compact set of parameters (including sun position, atmospheric conditions, and camera parameters). We extract limited field-of-view images from the panoramas, and train a CNN with this large set of input image–output lighting parameter pairs. Given a test image, this network can be used to infer illumination parameters that can, in turn, be used to reconstruct an outdoor illumination environment map. We demonstrate that our approach allows the recovery of plausible illumination conditions and enables photorealistic virtual object insertion from a single image. An extensive evaluation on both the panorama dataset and captured HDR environment maps shows that our technique significantly outperforms previous solutions to this problem.

...read moreread less

238 citations

Posted Content•

Neural Face Editing with Intrinsic Image Disentangling

[...]

Zhixin Shu¹, Ersin Yumer², Sunil Hadap², Kalyan Sunkavalli², Eli Shechtman², Dimitris Samaras³ - Show less +2 more•Institutions (3)

Stony Brook University¹, Adobe Systems², Université Paris-Saclay³

13 Apr 2017-arXiv: Computer Vision and Pattern Recognition

TL;DR: An end-to-end generative adversarial network is proposed that infers a face-specific disentangled representation of intrinsic face properties, including shape, albedo, and lighting, and an alpha matte, and it is shown that this network can be trained on in thewild images by incorporating an in-network physically-based image formation module and appropriate loss functions.

...read moreread less

Abstract: Traditional face editing methods often require a number of sophisticated and task specific algorithms to be applied one after the other --- a process that is tedious, fragile, and computationally intensive. In this paper, we propose an end-to-end generative adversarial network that infers a face-specific disentangled representation of intrinsic face properties, including shape (i.e. normals), albedo, and lighting, and an alpha matte. We show that this network can be trained on "in-the-wild" images by incorporating an in-network physically-based image formation module and appropriate loss functions. Our disentangling latent representation allows for semantically relevant edits, where one aspect of facial appearance can be manipulated while keeping orthogonal properties fixed, and we demonstrate its use for a number of facial editing applications.

...read moreread less

216 citations

Journal Article•DOI•

Automatic Scene Inference for 3D Object Compositing

[...]

Kevin Karsch¹, Kalyan Sunkavalli², Sunil Hadap², Nathan A. Carr², Hailin Jin², Rafael Fonte¹, Michael Sittig¹, David Forsyth¹ - Show less +4 more•Institutions (2)

University of Illinois at Urbana–Champaign¹, Adobe Systems²

02 Jun 2014-ACM Transactions on Graphics

TL;DR: a user-friendly image editing system that supports a drag-and-drop object insertion, postprocess illumination editing, and depth-of-field manipulation, and achieves the same level of realism as techniques that require significant user interaction.

...read moreread less

Abstract: We present a user-friendly image editing system that supports a drag-and-drop object insertion (where the user merely drags objects into the image, and the system automatically places them in 3D and relights them appropriately), postprocess illumination editing, and depth-of-field manipulation. Underlying our system is a fully automatic technique for recovering a comprehensive 3D scene model (geometry, illumination, diffuse albedo, and camera parameters) from a single, low dynamic range photograph. This is made possible by two novel contributions: an illumination inference algorithm that recovers a full lighting model of the scene (including light sources that are not directly visible in the photograph), and a depth estimation algorithm that combines data-driven depth transfer with geometric reasoning about the scene layout. A user study shows that our system produces perceptually convincing results, and achieves the same level of realism as techniques that require significant user interaction.

...read moreread less

203 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19

Collapse

Cited by

PDF

Open Access

More filters

Proceedings Article•DOI•

StarGAN: Unified Generative Adversarial Networks for Multi-domain Image-to-Image Translation

[...]

Yunjey Choi¹, Minje Choi¹, Munyoung Kim², Jung-Woo Ha³, Sunghun Kim⁴, Jaegul Choo¹ - Show less +2 more•Institutions (4)

Korea University¹, The College of New Jersey², Naver Corporation³, Hong Kong University of Science and Technology⁴

18 Jun 2018

TL;DR: StarGAN as discussed by the authors proposes a unified model architecture to perform image-to-image translation for multiple domains using only a single model, which leads to superior quality of translated images compared to existing models as well as the capability of flexibly translating an input image to any desired target domain.

...read moreread less

Abstract: Recent studies have shown remarkable success in image-to-image translation for two domains. However, existing approaches have limited scalability and robustness in handling more than two domains, since different models should be built independently for every pair of image domains. To address this limitation, we propose StarGAN, a novel and scalable approach that can perform image-to-image translations for multiple domains using only a single model. Such a unified model architecture of StarGAN allows simultaneous training of multiple datasets with different domains within a single network. This leads to StarGAN's superior quality of translated images compared to existing models as well as the novel capability of flexibly translating an input image to any desired target domain. We empirically demonstrate the effectiveness of our approach on a facial attribute transfer and a facial expression synthesis tasks.

...read moreread less

2,479 citations

Proceedings Article•DOI•

Unsupervised Monocular Depth Estimation with Left-Right Consistency

[...]

Clément Godard¹, Oisin Mac Aodha¹, Gabriel J. Brostow¹•Institutions (1)

University College London¹

21 Jul 2017

TL;DR: In this article, the authors propose a novel training objective that enables CNNs to learn to perform single image depth estimation, despite the absence of ground truth depth data, by generating disparity images by training their network with an image reconstruction loss.

...read moreread less

Abstract: Learning based methods have shown very promising results for the task of depth estimation in single images. However, most existing approaches treat depth prediction as a supervised regression problem and as a result, require vast quantities of corresponding ground truth depth data for training. Just recording quality depth data in a range of environments is a challenging problem. In this paper, we innovate beyond existing approaches, replacing the use of explicit depth data during training with easier-to-obtain binocular stereo footage. We propose a novel training objective that enables our convolutional neural network to learn to perform single image depth estimation, despite the absence of ground truth depth data. Ex-ploiting epipolar geometry constraints, we generate disparity images by training our network with an image reconstruction loss. We show that solving for image reconstruction alone results in poor quality depth images. To overcome this problem, we propose a novel training loss that enforces consistency between the disparities produced relative to both the left and right images, leading to improved performance and robustness compared to existing approaches. Our method produces state of the art results for monocular depth estimation on the KITTI driving dataset, even outperforming supervised methods that have been trained with ground truth depth.

...read moreread less

2,239 citations

Posted Content•

StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation

[...]

Yunjey Choi¹, Minje Choi¹, Munyoung Kim², Jung-Woo Ha³, Sunghun Kim⁴, Jaegul Choo¹ - Show less +2 more•Institutions (4)

Korea University¹, The College of New Jersey², Naver Corporation³, Hong Kong University of Science and Technology⁴

24 Nov 2017-arXiv: Computer Vision and Pattern Recognition

TL;DR: A unified model architecture of StarGAN allows simultaneous training of multiple datasets with different domains within a single network, which leads to StarGAN's superior quality of translated images compared to existing models as well as the novel capability of flexibly translating an input image to any desired target domain.

...read moreread less

2,033 citations

Proceedings Article•

A morphable model for the synthesis of 3D faces

[...]

Matthew Turk

01 Jan 1999

2,010 citations

Proceedings Article•

Robot vision

[...]

Y.J. Tejwani¹•Institutions (1)

Marquette University¹

01 Jan 1989

TL;DR: A scheme is developed for classifying the types of motion perceived by a humanlike robot and equations, theorems, concepts, clues, etc., relating the objects, their positions, and their motion to their images on the focal plane are presented.

...read moreread less

Abstract: A scheme is developed for classifying the types of motion perceived by a humanlike robot. It is assumed that the robot receives visual images of the scene using a perspective system model. Equations, theorems, concepts, clues, etc., relating the objects, their positions, and their motion to their images on the focal plane are presented. >

...read moreread less

2,000 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse