SESAME: Semantic Editing of Scenes by Adding, Manipulating or Erasing Objects.
Citations
You Only Need Adversarial Supervision for Semantic Image Synthesis
Image Inpainting Guided by Coherence Priors of Semantics and Textures
AIM 2020 Challenge on Image Extreme Inpainting
References
Adam: A Method for Stochastic Optimization
Image quality assessment: from error visibility to structural similarity
Generative Adversarial Nets
Image-to-Image Translation with Conditional Adversarial Networks
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
Frequently Asked Questions (13)
Q2. What are the future works mentioned in the paper "SESAME: Semantic Editing of Scenes by Adding, Manipulating or Erasing Objects"?
As a future research direction, the authors plan to extend this work; image generation conditioned on other types of information, e.g., scene graphs, could also benefit from their two-stream discriminator. The authors will open-source the code and the models under the repository name OpenSESAME.
Q3. How do the authors train the SESAME Generator?
The authors train the generator in an adversarial manner using the following losses: Perceptual Loss [17], Feature Matching Loss [41], and Hinge Loss [24,46,30] as the adversarial loss.
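As a rough illustration (not the authors' released code), a minimal PyTorch sketch of the standard hinge adversarial and feature-matching terms is shown below; the perceptual loss [17] would additionally compare VGG features of output and target, and the relative loss weights are not specified here.

```python
import torch
import torch.nn.functional as F

def generator_hinge_loss(fake_logits):
    # Hinge formulation, generator side: minimize -D(G(x)).
    return -fake_logits.mean()

def discriminator_hinge_loss(real_logits, fake_logits):
    # Hinge formulation, discriminator side: push real logits
    # above +1 and fake logits below -1.
    return (F.relu(1.0 - real_logits).mean()
            + F.relu(1.0 + fake_logits).mean())

def feature_matching_loss(real_feats, fake_feats):
    # L1 distance between discriminator features of real and
    # generated images, averaged over layers [41].
    return sum(F.l1_loss(f, r.detach())
               for r, f in zip(real_feats, fake_feats)) / len(real_feats)
```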
Q4. How do the authors train the SESAME discriminator?
For training, the authors use the Two Time-Scale Update Rule [11] to set the ratio between the learning rates of the generator and the discriminators, with lr_gen = 0.0001 and lr_disc = 0.0004.
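A minimal sketch of this TTUR setup, assuming the Adam optimizer (listed in the references); the beta values are a common GAN choice assumed here, not confirmed by the paper, and the placeholder modules stand in for the actual SESAME networks.

```python
import torch

# Placeholder modules standing in for the SESAME networks.
gen = torch.nn.Conv2d(3, 3, 3, padding=1)
disc = torch.nn.Conv2d(3, 1, 3, padding=1)

# Two Time-Scale Update Rule [11]: the discriminator uses a 4x
# larger learning rate than the generator.
# betas=(0.0, 0.999) is an assumed, commonly used GAN setting.
opt_gen = torch.optim.Adam(gen.parameters(), lr=1e-4, betas=(0.0, 0.999))
opt_disc = torch.optim.Adam(disc.parameters(), lr=4e-4, betas=(0.0, 0.999))
```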
Q5. How many images are used for the validation set?
The dataset contains 3,000 street-level images from 50 different cities in Europe for the training set and 500 images for the validation set.
Q6. What is the advantage of using semantic labels?
In particular, their method is able to edit images with pixel-level guidance from semantic labels, permitting full control over the output.
Q7. What are the generator and discriminator architectures?
The generator is an encoder-decoder architecture with dilated convolutions [53] and SPADE [35] layers, and the discriminator is a two-stream patch discriminator; both are described in Section 3.
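The sketch below illustrates the two-stream idea under stated assumptions: one stream processes the RGB image, the other the semantic layout, and their features are fused before a patch-level real/fake prediction. Channel counts, depths, and the fusion operation are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class TwoStreamPatchDiscriminator(nn.Module):
    # Illustrative two-stream patch discriminator: separate
    # convolutional streams for image and semantics, concatenated
    # before a per-patch logit map.
    def __init__(self, img_ch=3, sem_ch=35, width=64):
        super().__init__()
        def stream(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, width, 4, stride=2, padding=1),
                nn.LeakyReLU(0.2),
                nn.Conv2d(width, 2 * width, 4, stride=2, padding=1),
                nn.LeakyReLU(0.2),
            )
        self.img_stream = stream(img_ch)
        self.sem_stream = stream(sem_ch)
        # Fuse the two streams and emit patch-level logits.
        self.head = nn.Conv2d(4 * width, 1, 4, padding=1)

    def forward(self, image, semantics):
        feats = torch.cat([self.img_stream(image),
                           self.sem_stream(semantics)], dim=1)
        return self.head(feats)  # per-patch real/fake logits
```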
Q8. What are the different approaches to generating images?
There are many approaches, targeting multiple levels of abstraction and locality in the features that the authors seek to encapsulate in the output.
Q9. How do the authors measure the performance of their architecture?
To showcase the benefits of their approach, the authors ablate the performance of their architecture by varying (a) the generator architecture, (b) the discriminator architecture, and (c) the available semantics, using either the full semantic layout or only the semantics of the rectangular region to be edited, which they refer to as BBox Semantics.
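One way to enumerate such an ablation grid is sketched below; the architecture labels are hypothetical placeholders, not names used in the paper.

```python
from itertools import product

# Hypothetical labels for the three ablation axes described above.
generators = ["baseline_G", "SESAME_G"]
discriminators = ["baseline_D", "SESAME_D"]
semantics = ["Full", "BBox"]

for g, d, s in product(generators, discriminators, semantics):
    print(f"ablation run: generator={g}, discriminator={d}, semantics={s}")
```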
Q10. What is the definition of semantic image editing?
In this paper, the authors follow the formulation of Bau et al. [3] and define the task of semantic image editing as the process of adding, altering, and removing instances of certain classes or semantic concepts in a scene.
Q11. What is the main argument for the proposed method?
The authors argue that the components of their proposed method work better together: the large receptive field provided by the dilated convolutions in the generator synergizes with the highly focused gradient flow coming from the discriminator.
Q12. What is the way to tackle the problem of object synthesis?
Another line of work tackles object synthesis by utilizing semantic layout information, which provides fine-grained guidance over the manipulation of an image.
Q13. What is the purpose of the generator?
To achieve these goals, the authors adapt their generator from the network proposed by Johnson et al. [17] to fill the gaps: two down-sampling layers, a semantic core made of multiple residual layers, and two up-sampling layers.
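A minimal sketch matching that shape is given below, with stated simplifications: SPADE normalization [35] in the residual core is replaced by plain instance normalization for brevity, and the input channel count (e.g., masked RGB plus a mask channel) is an assumption.

```python
import torch.nn as nn

class DilatedResBlock(nn.Module):
    # Residual block with a dilated convolution [53] to enlarge
    # the receptive field without further down-sampling.
    def __init__(self, ch, dilation=2):
        super().__init__()
        self.body = nn.Sequential(
            nn.InstanceNorm2d(ch), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=dilation, dilation=dilation),
        )

    def forward(self, x):
        return x + self.body(x)

class GeneratorSketch(nn.Module):
    # Two down-sampling layers, a semantic core of residual
    # blocks, two up-sampling layers (shape only; SPADE omitted).
    def __init__(self, in_ch=4, width=64, n_res=6):
        super().__init__()
        self.down = nn.Sequential(
            nn.Conv2d(in_ch, width, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(width, 2 * width, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.core = nn.Sequential(
            *[DilatedResBlock(2 * width) for _ in range(n_res)]
        )
        self.up = nn.Sequential(
            nn.ConvTranspose2d(2 * width, width, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(width, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, x):
        return self.up(self.core(self.down(x)))
```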