Showing papers by "Paul Debevec" published in 2019


Proceedings ArticleDOI
15 Jun 2019
TL;DR: This work presents a novel approach to view synthesis using multiplane images (MPIs) that incorporates occlusion reasoning, improving performance on challenging scene features such as object boundaries, lighting reflections, thin structures, and scenes with high depth complexity.
Abstract: We present a novel approach to view synthesis using multiplane images (MPIs). Building on recent advances in learned gradient descent, our algorithm generates an MPI from a set of sparse camera viewpoints. The resulting method incorporates occlusion reasoning, improving performance on challenging scene features such as object boundaries, lighting reflections, thin structures, and scenes with high depth complexity. We show that our method achieves high-quality, state-of-the-art results on two datasets: the Kalantari light field dataset, and a new camera array dataset, Spaces, which we make publicly available.
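
For readers unfamiliar with the representation, the rendering side of an MPI is simple: each fronto-parallel RGBA plane is warped into the target viewpoint and the planes are alpha-composited back to front with the standard "over" operator. The sketch below illustrates only that compositing step, with illustrative names and the per-plane homography warp assumed to have been applied already; it is not the paper's implementation, whose contribution is predicting the planes via learned gradient descent.

```python
import numpy as np

def composite_mpi(planes_rgba):
    """Back-to-front "over" compositing of a multiplane image.

    planes_rgba: (D, H, W, 4) array of RGBA planes ordered far to near,
                 already warped into the novel view (straight alpha).
    Returns an (H, W, 3) image.
    """
    _, H, W, _ = planes_rgba.shape
    out = np.zeros((H, W, 3), dtype=np.float64)
    for plane in planes_rgba:                        # back to front
        rgb, alpha = plane[..., :3], plane[..., 3:4]
        out = rgb * alpha + out * (1.0 - alpha)      # standard "over" operator
    return out
```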

335 citations


Journal ArticleDOI
TL;DR: In this paper, a neural network is trained on a small database of 18 individuals captured under different directional light sources in a controlled light stage setup consisting of a densely sampled sphere of lights.
Abstract: Lighting plays a central role in conveying the essence and depth of the subject in a portrait photograph. Professional photographers will carefully control the lighting in their studio to manipulate the appearance of their subject, while consumer photographers are usually constrained to the illumination of their environment. Though prior works have explored techniques for relighting an image, their utility is usually limited due to requirements of specialized hardware, multiple images of the subject under controlled or known illuminations, or accurate models of geometry and reflectance. To this end, we present a system for portrait relighting: a neural network that takes as input a single RGB image of a portrait taken with a standard cellphone camera in an unconstrained environment, and from that image produces a relit image of that subject as though it were illuminated according to any provided environment map. Our method is trained on a small database of 18 individuals captured under different directional light sources in a controlled light stage setup consisting of a densely sampled sphere of lights. Our proposed technique produces quantitatively superior results on our dataset's validation set compared to prior works, and produces convincing qualitative relighting results on a dataset of hundreds of real-world cellphone portraits. Because our technique can produce a 640 × 640 image in only 160 milliseconds, it may enable interactive user-facing photographic applications in the future.
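
The relit training targets implied by this setup come from classic image-based relighting: a subject photographed one light at a time (OLAT) on the light stage can be shown under any environment by weighting each OLAT image by that environment's radiance toward the corresponding light direction. A minimal sketch of that linear recombination, with illustrative array names rather than the paper's code:

```python
import numpy as np

def relight_olat(olat_images, light_weights):
    """Image-based relighting as a weighted sum of one-light-at-a-time images.

    olat_images:   (N, H, W, 3) linear-radiance images, one per light stage light.
    light_weights: (N, 3) RGB weights, e.g. the target HDR environment map
                   integrated over each light's solid angle.
    Returns the (H, W, 3) relit image in linear radiance.
    """
    # Relighting is linear in the illumination, so a single contraction suffices.
    return np.einsum('nc,nhwc->hwc', light_weights, olat_images)
```

The network described above learns to bypass this capture at inference time, mapping a single unconstrained portrait plus a target environment map directly to a comparable relit result.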

179 citations


Journal ArticleDOI
TL;DR: Multiple experiments, comparisons, and applications show that The Relightables significantly improves upon the level of realism in placing volumetrically captured human performances into arbitrary CG scenes.
Abstract: We present "The Relightables", a volumetric capture system for photorealistic and high-quality relightable full-body performance capture. While significant progress has been made on volumetric capture systems, focusing on 3D geometric reconstruction with high-resolution textures, much less work has been done to recover photometric properties needed for relighting. Results from such systems lack high-frequency details and the subject's shading is prebaked into the texture. In contrast, a large body of work has addressed relightable acquisition for image-based approaches, which photograph the subject under a set of basis lighting conditions and recombine the images to show the subject as they would appear in a target lighting environment. However, to date, these approaches have not been adapted for use in the context of a high-resolution volumetric capture system. Our method combines this ability to realistically relight humans for arbitrary environments, with the benefits of free-viewpoint volumetric capture and new levels of geometric accuracy for dynamic performances. Our subjects are recorded inside a custom geodesic sphere outfitted with 331 custom color LED lights, an array of high-resolution cameras, and a set of custom high-resolution depth sensors. Our system innovates in multiple areas: First, we designed a novel active depth sensor to capture 12.4 MP depth maps, which we describe in detail. Second, we show how to design a hybrid geometric and machine learning reconstruction pipeline to process the high-resolution input and output a volumetric video. Third, we generate temporally consistent reflectance maps for dynamic performers by leveraging the information contained in two alternating color gradient illumination images acquired at 60 Hz. Multiple experiments, comparisons, and applications show that The Relightables significantly improves upon the level of realism in placing volumetrically captured human performances into arbitrary CG scenes.
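
The "two alternating color gradient illumination images" follow the spherical gradient illumination idea from earlier light stage work: if the red, green, and blue channels of the stage encode linear gradients along x, y, and z, and the second frame encodes their complements, then the per-pixel ratio of the difference to the sum of the two frames yields a reflectance-weighted estimate of surface orientation. The following is a rough sketch of that idea under a diffuse assumption, not the paper's full pipeline (which also leverages the 12.4 MP depth maps and the learned reconstruction stage); names are illustrative.

```python
import numpy as np

def normals_from_color_gradients(img_grad, img_comp, eps=1e-6):
    """Per-pixel surface orientation from complementary color gradient images.

    img_grad, img_comp: (H, W, 3) linear images lit by the gradient pattern
                        (R ~ x, G ~ y, B ~ z) and its complement, respectively.
    Returns (H, W, 3) unit vectors. Assumes mostly diffuse reflectance; the
    specular/diffuse separation and temporal alignment used in practice are omitted.
    """
    ratio = (img_grad - img_comp) / (img_grad + img_comp + eps)   # roughly in [-1, 1]
    norm = np.linalg.norm(ratio, axis=-1, keepdims=True) + eps
    return ratio / norm
```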

121 citations


Posted Content
TL;DR: In this paper, an approach to view synthesis using multiplane images (MPIs) is presented, which incorporates occlusion reasoning, improving performance on challenging scene features such as object boundaries, lighting reflections, thin structures, and scenes with high depth complexity.
Abstract: We present a novel approach to view synthesis using multiplane images (MPIs). Building on recent advances in learned gradient descent, our algorithm generates an MPI from a set of sparse camera viewpoints. The resulting method incorporates occlusion reasoning, improving performance on challenging scene features such as object boundaries, lighting reflections, thin structures, and scenes with high depth complexity. We show that our method achieves high-quality, state-of-the-art results on two datasets: the Kalantari light field dataset, and a new camera array dataset, Spaces, which we make publicly available.

100 citations


Journal ArticleDOI
TL;DR: In this article, a neural network is trained on a small database of 18 individuals captured under different directional light sources in a controlled light stage setup consisting of a densely sampled sphere of lights.
Abstract: Lighting plays a central role in conveying the essence and depth of the subject in a portrait photograph. Professional photographers will carefully control the lighting in their studio to manipulate the appearance of their subject, while consumer photographers are usually constrained to the illumination of their environment. Though prior works have explored techniques for relighting an image, their utility is usually limited due to requirements of specialized hardware, multiple images of the subject under controlled or known illuminations, or accurate models of geometry and reflectance. To this end, we present a system for portrait relighting: a neural network that takes as input a single RGB image of a portrait taken with a standard cellphone camera in an unconstrained environment, and from that image produces a relit image of that subject as though it were illuminated according to any provided environment map. Our method is trained on a small database of 18 individuals captured under different directional light sources in a controlled light stage setup consisting of a densely sampled sphere of lights. Our proposed technique produces quantitatively superior results on our dataset's validation set compared to prior works, and produces convincing qualitative relighting results on a dataset of hundreds of real-world cellphone portraits. Because our technique can produce a 640 × 640 image in only 160 milliseconds, it may enable interactive user-facing photographic applications in the future.

93 citations


Journal ArticleDOI
TL;DR: A novel technique to relight images of human faces by learning a model of facial reflectance from a database of 4D reflectance field data of several subjects in a variety of expressions and viewpoints is presented.
Abstract: We present a novel technique to relight images of human faces by learning a model of facial reflectance from a database of 4D reflectance field data of several subjects in a variety of expressions and viewpoints. Using our learned model, a face can be relit in arbitrary illumination environments using only two original images recorded under spherical color gradient illumination. The output of our deep network indicates that the color gradient images contain the information needed to estimate the full 4D reflectance field, including specular reflections and high frequency details. While capturing spherical color gradient illumination still requires a special lighting setup, reduction to just two illumination conditions allows the technique to be applied to dynamic facial performance capture. We show side-by-side comparisons which demonstrate that the proposed system outperforms the state-of-the-art techniques in both realism and speed.
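
The two spherical color gradient conditions referenced here are the standard light stage patterns in which each LED's RGB color is a linear function of its direction on the sphere, with the second frame being the complement of the first. A small illustrative sketch of those lighting conditions, under that assumption and with hypothetical names:

```python
import numpy as np

def color_gradient_conditions(light_dirs):
    """RGB colors for the two complementary spherical color gradient conditions.

    light_dirs: (N, 3) unit direction vectors of the light stage LEDs.
    Returns (grad, comp), each (N, 3) in [0, 1], where the R, G, B channels
    encode linear gradients along x, y, z and their complements.
    """
    grad = 0.5 * (1.0 + light_dirs)   # R ~ (1+x)/2, G ~ (1+y)/2, B ~ (1+z)/2
    comp = 0.5 * (1.0 - light_dirs)   # complementary pattern
    return grad, comp
```

From two photographs under these conditions, the network above predicts the full 4D reflectance field, which can then be recombined for any target environment as in standard image-based relighting.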

63 citations


Proceedings ArticleDOI
15 Jun 2019
TL;DR: The authors' inference runs at interactive frame rates on a mobile device, enabling realistic rendering of virtual objects into real scenes for mobile mixed reality, and improves the realism of rendered objects compared to the state-of-the-art methods for both indoor and outdoor scenes.
Abstract: We present a learning-based method to infer plausible high dynamic range (HDR), omnidirectional illumination given an unconstrained, low dynamic range (LDR) image from a mobile phone camera with a limited field of view (FOV). For training data, we collect videos of various reflective spheres placed within the camera's FOV, leaving most of the background unoccluded, leveraging that materials with diverse reflectance functions reveal different lighting cues in a single exposure. We train a deep neural network to regress from the LDR background image to HDR lighting by matching the LDR ground truth sphere images to those rendered with the predicted illumination using image-based relighting, which is differentiable. Our inference runs at interactive frame rates on a mobile device, enabling realistic rendering of virtual objects into real scenes for mobile mixed reality. Training on automatically exposed and white-balanced videos, we improve the realism of rendered objects compared to the state-of-the-art methods for both indoor and outdoor scenes.
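
The supervision signal described here can be summarized as a differentiable render-and-compare loss: the predicted HDR lighting re-renders the reflective spheres via image-based relighting, and the result is compared against the ground-truth LDR sphere crop. A minimal NumPy sketch of that loss, with illustrative names, an assumed L1 penalty, and a crude tonemapping standing in for the camera pipeline; the actual system implements this differentiably within its training framework:

```python
import numpy as np

def sphere_render_loss(pred_hdr_lights, sphere_basis, gt_ldr_sphere):
    """Render-and-compare loss for HDR lighting prediction.

    pred_hdr_lights: (N, 3) predicted HDR RGB intensities for N light directions.
    sphere_basis:    (N, H, W, 3) images of a sphere lit from each direction
                     individually (the image-based relighting basis).
    gt_ldr_sphere:   (H, W, 3) ground-truth LDR photo of the sphere in [0, 1].
    Returns a scalar L1 loss.
    """
    rendered_hdr = np.einsum('nc,nhwc->hwc', pred_hdr_lights, sphere_basis)
    rendered_ldr = np.clip(rendered_hdr, 0.0, 1.0) ** (1.0 / 2.2)  # crude LDR mapping
    return float(np.abs(rendered_ldr - gt_ldr_sphere).mean())
```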

57 citations


Proceedings ArticleDOI
28 Jul 2019
TL;DR: In this paper, a learning-based method was proposed to infer plausible high dynamic range (HDR), omnidirectional illumination given an unconstrained, low dynamic range image from a mobile phone camera with a limited field of view (FOV).
Abstract: We present a learning-based method to infer plausible high dynamic range (HDR), omnidirectional illumination given an unconstrained, low dynamic range (LDR) image from a mobile phone camera with a limited field of view (FOV). For training data, we collect videos of various reflective spheres placed within the camera's FOV, leaving most of the background unoccluded, leveraging that materials with diverse reflectance functions reveal different lighting cues in a single exposure. We train a deep neural network to regress from the LDR background image to HDR lighting by matching the LDR ground truth sphere images to those rendered with the predicted illumination using image-based relighting, which is differentiable. Our inference runs at interactive frame rates on a mobile device, enabling realistic rendering of virtual objects into real scenes for mobile mixed reality. Training on auto-exposed and white-balanced videos, we improve the realism of rendered objects compared to the state-of-the-art methods for both indoor and outdoor scenes.

46 citations


Posted Content
TL;DR: The authors' inference runs at interactive frame rates on a mobile device, enabling realistic rendering of virtual objects into real scenes for mobile mixed reality, and improves the realism of rendered objects compared to the state-of-the-art methods for both indoor and outdoor scenes.
Abstract: We present a learning-based method to infer plausible high dynamic range (HDR), omnidirectional illumination given an unconstrained, low dynamic range (LDR) image from a mobile phone camera with a limited field of view (FOV). For training data, we collect videos of various reflective spheres placed within the camera's FOV, leaving most of the background unoccluded, leveraging that materials with diverse reflectance functions reveal different lighting cues in a single exposure. We train a deep neural network to regress from the LDR background image to HDR lighting by matching the LDR ground truth sphere images to those rendered with the predicted illumination using image-based relighting, which is differentiable. Our inference runs at interactive frame rates on a mobile device, enabling realistic rendering of virtual objects into real scenes for mobile mixed reality. Training on automatically exposed and white-balanced videos, we improve the realism of rendered objects compared to the state-of-the-art methods for both indoor and outdoor scenes.

18 citations


Proceedings ArticleDOI
28 Jul 2019
TL;DR: A variety of new compositing techniques using multiplane images (MPIs) derived from footage shot with an inexpensive and portable light field video camera array are presented, and a simple workflow that offers new creative capabilities is demonstrated.
Abstract: We present a variety of new compositing techniques using multiplane images (MPIs) [Zhou et al. 2018] derived from footage shot with an inexpensive and portable light field video camera array. The effects include camera stabilization, foreground object removal, synthetic depth of field, and deep compositing. Traditional compositing is based around layering RGBA images to visually integrate elements into the same scene, and often requires manual 2D and/or 3D artist intervention to achieve realism in the presence of volumetric effects such as smoke or splashing water. We leverage the newly introduced DeepView solver [Flynn et al. 2019] and a light field camera array to generate MPIs stored in the DeepEXR format for compositing with realistic spatial integration and a simple workflow that offers new creative capabilities. We demonstrate this technique by combining footage that would otherwise be very challenging and time-intensive to achieve using traditional techniques, with minimal artist intervention.

Proceedings ArticleDOI
17 Nov 2019
TL;DR: This work presents a portable multi-camera system for recording panoramic light field video content, with 47 time-synchronized cameras distributed on the surface of a hemispherical, 0.92-meter-diameter plastic dome.
Abstract: We present a portable multi-camera system for recording panoramic light field video content. The proposed system captures wide-baseline (0.8 meters), high-resolution (>15 pixels per degree), large-field-of-view (>220°) light fields at 30 frames per second. The array contains 47 time-synchronized cameras distributed on the surface of a hemispherical, 0.92-meter-diameter plastic dome. We use commercially available action sports cameras (Yi 4k) mounted inside the dome using 3D-printed brackets. The dome, mounts, triggering hardware, and cameras are inexpensive, and the array itself is easy to fabricate. Using modern view interpolation algorithms, we can render objects as close as 33 cm to the surface of the array.

Proceedings ArticleDOI
17 Nov 2019
TL;DR: An end-to-end system is presented for generating 3D assets that enable real-time rendering of an opera on high-end mobile phones, showing how to deliver an immersive mixed reality experience in every user’s living room.
Abstract: Motivated by the recent availability of augmented and virtual reality platforms, we tackle the challenging problem of immersive storytelling experiences on mobile devices. In particular, we show an end-to-end system to generate 3D assets that enable real-time rendering of an opera on high-end mobile phones. We call our system AR-ia, and in this paper we walk through the main components and technical challenges of such a system, showing how to deliver an immersive mixed reality experience in every user’s living room.