Multi-view relighting using a geometry-aware network
Frequently Asked Questions (12)
Q2. What is used to create the ground truth shadow masks?
The ground truth geometry and materials are used to render the sun and sky layers, and to create the ground truth greyscale shadow masks.
Q3. What is the basic premise of their approach?
The basic premise of their approach is to use multi-view information and approximate 3D geometry to reason about non-local lighting interactions and guide the relighting task.
Q4. How long does it take to render a converged image?
As described in Section 4.2 (photo-realistic rendering, layer decomposition and compositing-based data augmentation), path-tracing complex outdoor scenes with a physically-based sun/sky model is expensive: rendering a converged image at 1024×768 takes about 10 minutes on their 400-core cluster.
Q5. How do the authors train the network to generate shadow images?
For shadow refinement to be successful at test time, the network needs to learn the mapping between approximate proxy shadows and ground truth shadows at training time.
Q6. What is the weight for the contribution of a given image to the color of a pixel?
The weight $w_i$ for the contribution of a given image $i$ to the color of a pixel in the RGB shadow image is computed as:

$$w_i = \frac{1}{\lVert x_o - p_i(x_o)\rVert_2^2 \cdot \lvert 1 + c_i^{\top} d_{\mathrm{sun}}\rvert^2 + \epsilon}, \tag{1}$$

where $c_i$ is a unit vector giving the direction from camera $i$ to $x_o$, $p_i(x_o) \in \mathbb{R}^3$ is the first intersection of the camera ray defined by $c_i$ with the proxy (Fig. 5), and $\epsilon = 10^{-5}$.
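A minimal NumPy sketch of this weight. The function name `reprojection_weight` and the array conventions are illustrative assumptions, not identifiers from the paper; the formula itself follows Eq. (1) above.

```python
import numpy as np

def reprojection_weight(x_o, p_i, c_i, d_sun, eps=1e-5):
    """Weight of image i's contribution to a shadow-image pixel at surface
    point x_o (Eq. 1): large when the proxy intersection p_i(x_o) lies close
    to x_o and when the viewing direction c_i opposes the sun direction."""
    dist2 = np.sum((x_o - p_i) ** 2)          # ||x_o - p_i(x_o)||_2^2
    align = (1.0 + np.dot(c_i, d_sun)) ** 2   # |1 + c_i^T d_sun|^2
    return 1.0 / (dist2 * align + eps)        # eps avoids division by zero
```

Note how the $|1 + c_i^{\top} d_{\mathrm{sun}}|^2$ term vanishes when the camera looks directly against the sun direction, making $\epsilon$ the only thing bounding the weight in that case.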
Q7. Why are the target masks not aligned with the input image?
Since the authors want to change the lighting, the target masks are generally not aligned with the shadows in the input image, making the problem inherently more ambiguous.
Q8. What is the purpose of the reprocessing of the source and target shadow images?
The source shadow refinement process uses the actual boundary in the input image, giving better overall results compared to the target shadow refinement (Fig. 4 (e); see Section 3.2.2, "RGB shadow images").
Q9. How are the three sub-modules of their network trained?
The three sub-modules of their network are trained jointly in a supervised manner to minimize the sum of three losses:

$$\mathcal{L} = \mathcal{L}_{\text{relight}} + \mathcal{L}_{\text{src}} + \mathcal{L}_{\text{tgt}}. \tag{2}$$

These loss functions compare the accuracy of their network's predictions (the final relit image as well as both intermediate refined shadow masks) to synthetic ground truth, which the authors detail in Section 4.
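A sketch of how the three supervised terms combine, assuming a per-pixel L1 penalty for each term purely for illustration; the paper's exact per-term losses are detailed in its Section 4, and the function name `joint_loss` is hypothetical.

```python
import numpy as np

def joint_loss(pred_relit, gt_relit, pred_src, gt_src, pred_tgt, gt_tgt):
    """Sum of the three supervised terms (Eq. 2): the relit image plus the
    source and target refined shadow masks, each compared to ground truth.
    L1 is an assumed stand-in for the paper's per-term losses."""
    l_relight = np.abs(pred_relit - gt_relit).mean()  # final relit image
    l_src = np.abs(pred_src - gt_src).mean()          # source shadow mask
    l_tgt = np.abs(pred_tgt - gt_tgt).mean()          # target shadow mask
    return l_relight + l_src + l_tgt
```

Because the sum is unweighted, gradients from all three sub-modules flow back jointly, which is what "trained jointly in a supervised manner" amounts to in practice.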
Q10. What are the sub-networks used for shadow refinement?
These shadow images re-project colors from the shadow-casting geometry, as seen from all viewpoints, into pixels in shadow, helping the network identify erroneously reconstructed shadow casters from the reprojected color (Figs. 4 and 5).
Q11. What is the method for detecting shadows?
Other approaches include Lalonde et al. [2010], which uses Conditional Random Fields to detect shadows, and Mohan et al. [2007], a gradient-based solution for shadow removal.
Q12. How do the authors bypass the problems of acquiring real training data?
To bypass the difficulty of capturing real ground-truth relighting data, the authors use synthetic training data and render photo-realistic images using the Mitsuba [Jakob 2010] path tracer.