Deep image synthesis from intuitive user input: A review and perspectives
TL;DR
In many applications of computer graphics, art, and design, it is desirable for a user to provide intuitive non-image input, such as text, sketch, stroke, graph, or layout, and have a computer system automatically generate photo-realistic images according to that input.

Abstract
In many applications of computer graphics, art, and design, it is desirable for a user to provide intuitive non-image input, such as text, sketch, stroke, graph, or layout, and have a computer system automatically generate photo-realistic images according to that input. While classically, works that allow such automatic image content generation have followed a framework of image retrieval and composition, recent advances in deep generative models such as generative adversarial networks (GANs), variational autoencoders (VAEs), and flow-based methods have enabled more powerful and versatile image generation approaches. This paper reviews recent works for image synthesis given intuitive user input, covering advances in input versatility, image generation methodology, benchmark datasets, and evaluation metrics. This motivates new perspectives on input representation and interactivity, cross-fertilization between major image generation paradigms, and evaluation and comparison of generation methods.
Citations
Proceedings ArticleDOI
Draw Your Art Dream: Diverse Digital Art Synthesis with Multimodal Guided Diffusion
TL;DR: Li et al. propose a multimodal guided artwork diffusion (MGAD) model, a diffusion-based digital artwork generation approach that uses multimodal prompts as guidance to control a classifier-free diffusion model.
Journal ArticleDOI
User‐Controllable Latent Transformer for StyleGAN Image Layout Editing
TL;DR: Proposes an interactive framework for manipulating latent codes in accordance with user inputs: a latent transformer based on a transformer encoder-decoder architecture estimates the output latent codes, which are fed to the StyleGAN generator to obtain the result image.
Journal ArticleDOI
A Review of Synthetic Image Data and Its Use in Computer Vision
Keith Man, Javaan Chahl, +1 more
TL;DR: Provides a general overview of types of synthetic image data, as categorised by synthesised output; common methods of synthesising different types of image data; existing applications and logical extensions; the performance of synthetic image data in different applications; and the associated difficulties in assessing data performance.
Journal ArticleDOI
Controlling StyleGANs using rough scribbles via one‐shot learning
Yuki Endo, Yoshihiro Kanamori, +1 more
TL;DR: Generates realistic and diverse images with layout control (e.g., over facial part layouts and body poses) from only a single training pair annotated with semantic scribbles, using the StyleGAN prior.
Journal ArticleDOI
CoGS: Controllable Generation and Search from Sketch and Style
TL;DR: Proposes style-conditioned, sketch-driven image synthesis, enabling exploration of diverse appearance possibilities for a given sketched object and decoupled control over the structure and the appearance of the output.
References
Journal ArticleDOI
Image quality assessment: from error visibility to structural similarity
TL;DR: Proposes a structural similarity (SSIM) index for image quality assessment based on the degradation of structural information, and demonstrates its promise through comparisons with subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000.
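The SSIM comparison above can be sketched in a few lines. This is a minimal illustration using whole-image statistics; the published metric averages SSIM over local (e.g., 11x11 Gaussian) windows, and the function name here is an assumption, not from the paper.

```python
import numpy as np

def ssim_global(x: np.ndarray, y: np.ndarray, data_range: float = 255.0) -> float:
    """Single-window SSIM computed from whole-image statistics.

    The published metric averages SSIM over local windows; this sketch
    computes one global value for brevity.
    """
    c1 = (0.01 * data_range) ** 2  # stabilising constants from the paper
    c2 = (0.03 * data_range) ** 2
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )
```

Identical images score exactly 1.0; any luminance, contrast, or structure degradation pulls the score below 1.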
Journal ArticleDOI
Generative Adversarial Nets
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio
TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are simultaneously trained: a generative model G that captures the data distribution and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
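The two-player objective summarised above can be sketched as loss functions over discriminator scores (a minimal NumPy illustration; the function names are assumptions, and the non-saturating generator loss is the variant the paper recommends for stronger early gradients):

```python
import numpy as np

def discriminator_loss(d_real: np.ndarray, d_fake: np.ndarray) -> float:
    """D maximises log D(x) + log(1 - D(G(z))); we minimise the negative."""
    return float(-(np.log(d_real) + np.log(1.0 - d_fake)).mean())

def generator_loss(d_fake: np.ndarray) -> float:
    """Non-saturating variant: G maximises log D(G(z)) rather than
    minimising log(1 - D(G(z)))."""
    return float(-np.log(d_fake).mean())
```

At the equilibrium described in the paper, D outputs 0.5 everywhere, giving a discriminator loss of 2·log 2.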
Book ChapterDOI
Microsoft COCO: Common Objects in Context
Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, C. Lawrence Zitnick
TL;DR: Presents a new dataset aimed at advancing the state of the art in object recognition by placing object recognition in the context of the broader question of scene understanding, gathering images of complex everyday scenes containing common objects in their natural context.
Proceedings ArticleDOI
Rethinking the Inception Architecture for Computer Vision
TL;DR: Explores ways to scale up networks that aim to utilize the added computation as efficiently as possible, through suitably factorized convolutions and aggressive regularization.
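One concrete instance of the factorization idea above: replacing a 5x5 convolution with two stacked 3x3 convolutions preserves the receptive field while cutting weights by 28%. A back-of-the-envelope sketch (the channel width is an illustrative assumption, not a value from the paper):

```python
def conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weight count of a kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out

channels = 64  # illustrative width, not from the paper
single_5x5 = conv_params(5, channels, channels)       # 25 * 64 * 64 weights
stacked_3x3 = 2 * conv_params(3, channels, channels)  # 18 * 64 * 64 weights
savings = 1 - stacked_3x3 / single_5x5                # 28% fewer weights
```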
Proceedings ArticleDOI
Fast R-CNN
TL;DR: Proposes Fast R-CNN, a fast region-based convolutional network method for object detection that employs several innovations to improve training and testing speed while also increasing detection accuracy, achieving a higher mAP on PASCAL VOC 2012.
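The RoI pooling layer central to Fast R-CNN can be sketched as fixed-size max pooling over one region. A minimal single-channel NumPy illustration (real implementations batch many RoIs per image and project box coordinates onto the feature map; the function name is an assumption):

```python
import numpy as np

def roi_max_pool(fmap: np.ndarray, roi: tuple, out_size=(2, 2)) -> np.ndarray:
    """Max-pool one RoI of a 2-D feature map to a fixed output size.

    roi = (r0, c0, r1, c1): half-open row/column bounds, assumed already
    projected into feature-map coordinates.
    """
    r0, c0, r1, c1 = roi
    region = fmap[r0:r1, c0:c1]
    h, w = region.shape
    oh, ow = out_size
    out = np.empty(out_size, dtype=fmap.dtype)
    for i in range(oh):
        for j in range(ow):
            # bin boundaries: floor for start, ceiling for end
            rs, re = (i * h) // oh, -(-(i + 1) * h // oh)
            cs, ce = (j * w) // ow, -(-(j + 1) * w // ow)
            out[i, j] = region[rs:re, cs:ce].max()
    return out
```

Fixed-size outputs are what let arbitrarily shaped proposals feed the same fully connected layers downstream.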