Open Access · Journal ArticleDOI

Deep image synthesis from intuitive user input: A review and perspectives

TL;DR
In many applications of computer graphics, art, and design, it is desirable for a user to provide intuitive non-image input, such as text, sketch, stroke, graph, or layout, and have a computer system automatically generate photo-realistic images according to that input; this paper reviews deep generative approaches to that task.
Abstract
In many applications of computer graphics, art, and design, it is desirable for a user to provide intuitive non-image input, such as text, sketch, stroke, graph, or layout, and have a computer system automatically generate photo-realistic images according to that input. While works that allow such automatic image content generation have classically followed a framework of image retrieval and composition, recent advances in deep generative models such as generative adversarial networks (GANs), variational autoencoders (VAEs), and flow-based methods have enabled more powerful and versatile image generation approaches. This paper reviews recent works for image synthesis given intuitive user input, covering advances in input versatility, image generation methodology, benchmark datasets, and evaluation metrics. This motivates new perspectives on input representation and interactivity, cross-fertilization between major image generation paradigms, and evaluation and comparison of generation methods.
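As a reference point for the paradigms named above (standard material, not specific to this survey): a VAE is trained by maximizing the evidence lower bound (ELBO) on the data log-likelihood,

    \log p_\theta(x) \ge \mathbb{E}_{q_\phi(z \mid x)}[\log p_\theta(x \mid z)] - \mathrm{KL}(q_\phi(z \mid x) \,\|\, p(z))

while GANs use the adversarial objective quoted under the Generative Adversarial Nets reference below, and flow-based models maximize exact log-likelihood through invertible transformations.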



Citations
Proceedings ArticleDOI

Draw Your Art Dream: Diverse Digital Art Synthesis with Multimodal Guided Diffusion

TL;DR: Li et al. propose a multimodal guided artwork diffusion (MGAD) model, a diffusion-based digital artwork generation approach that uses multimodal prompts as guidance to control a classifier-free diffusion model.
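For context, classifier-free guidance (which the TL;DR refers to) steers a conditional diffusion model by extrapolating between its conditional and unconditional noise predictions. One common formulation, standard in the diffusion literature rather than taken from the MGAD paper itself:

    \hat{\epsilon}_\theta(x_t, c) = \epsilon_\theta(x_t, \varnothing) + s\,(\epsilon_\theta(x_t, c) - \epsilon_\theta(x_t, \varnothing))

where c is the (here, multimodal) conditioning prompt, \varnothing the null condition, and s \ge 1 the guidance scale.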
Journal ArticleDOI

User‐Controllable Latent Transformer for StyleGAN Image Layout Editing

Yuki Endo
TL;DR: Proposes an interactive framework for manipulating latent codes in accordance with user inputs; a latent transformer based on a transformer encoder-decoder architecture estimates the output latent codes, which are fed to the StyleGAN generator to obtain the edited image.
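A minimal PyTorch sketch of the idea in the TL;DR; the dimensions, module layout, and the way user inputs are embedded are illustrative assumptions, not the paper's actual architecture:

import torch
import torch.nn as nn

class LatentTransformer(nn.Module):
    """Maps StyleGAN latent codes plus embedded user edits to new latent codes."""
    def __init__(self, latent_dim=512, num_layers=4, nhead=8):
        super().__init__()
        self.transformer = nn.Transformer(
            d_model=latent_dim, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True)

    def forward(self, latents, edit_tokens):
        # latents: (batch, num_style_layers, latent_dim), e.g. a W+ code
        # edit_tokens: (batch, num_edits, latent_dim), embedded user inputs
        return self.transformer(src=edit_tokens, tgt=latents)

model = LatentTransformer()
w_plus = torch.randn(1, 18, 512)   # hypothetical W+ code (18 style layers)
edits = torch.randn(1, 3, 512)     # hypothetical embedded user edits
edited = model(w_plus, edits)      # (1, 18, 512), fed to the StyleGAN generator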
Journal ArticleDOI

A Review of Synthetic Image Data and Its Use in Computer Vision

Keith Man, +1 more
01 Nov 2022
TL;DR: Reviews synthetic image data, providing a general overview of the types of synthetic data as categorised by synthesised output, common methods of synthesising different types of image data, existing applications and logical extensions, the performance of synthetic images in different applications, and the associated difficulties in assessing data performance.
Journal ArticleDOI

Controlling StyleGANs using rough scribbles via one‐shot learning

TL;DR: Generates realistic and diverse images with layout control (for example, over facial part layouts and body poses) from only a single training pair annotated with semantic scribbles, by exploiting the StyleGAN prior.
Journal ArticleDOI

CoGS: Controllable Generation and Search from Sketch and Style

TL;DR: Proposes style-conditioned, sketch-driven image synthesis that enables exploration of diverse appearance possibilities for a given sketched object, providing decoupled control over the structure and the appearance of the output.
References
Journal ArticleDOI

Image quality assessment: from error visibility to structural similarity

TL;DR: Proposes a structural similarity index for image quality assessment based on the degradation of structural information, validated against subjective ratings and objective methods on a database of images compressed with JPEG and JPEG2000.
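For reference, the SSIM index between aligned patches x and y, as defined in this paper:

    \mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}

where \mu, \sigma^2, and \sigma_{xy} are local means, variances, and covariance, and C_1, C_2 are small constants that stabilize the division.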
Journal ArticleDOI

Generative Adversarial Nets

TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are simultaneously trained: a generative model G that captures the data distribution and a discriminative model D that estimates the probability that a sample came from the training data rather than from G.
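The adversarial game in the TL;DR is the paper's two-player minimax objective:

    \min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]

where G maps noise z to samples and D outputs the probability that its input came from the data rather than from G.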
Book ChapterDOI

Microsoft COCO: Common Objects in Context

TL;DR: A new dataset that aims to advance the state of the art in object recognition by placing object recognition in the context of the broader question of scene understanding, gathering images of complex everyday scenes containing common objects in their natural context.
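A minimal sketch of loading the dataset with torchvision (requires pycocotools); the local paths are placeholders:

from torchvision.datasets import CocoDetection

dataset = CocoDetection(
    root="coco/train2017",  # directory of images (placeholder path)
    annFile="coco/annotations/instances_train2017.json")  # COCO-format annotations
image, targets = dataset[0]  # a PIL image and a list of annotation dicts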
Proceedings ArticleDOI

Rethinking the Inception Architecture for Computer Vision

TL;DR: Explores ways to scale up networks that aim to utilize the added computation as efficiently as possible, through suitably factorized convolutions and aggressive regularization.
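One of the factorizations the paper advocates, sketched in PyTorch: replacing a 5x5 convolution with two stacked 3x3 convolutions, which keeps the receptive field while cutting parameters (the channel counts here are arbitrary):

import torch.nn as nn

in_ch, out_ch = 64, 64
conv5x5 = nn.Conv2d(in_ch, out_ch, kernel_size=5, padding=2)
factored = nn.Sequential(  # same 5x5 receptive field, ~28% fewer parameters
    nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1))

params = lambda m: sum(p.numel() for p in m.parameters())
print(params(conv5x5), params(factored))  # 102464 vs 73856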
Proceedings ArticleDOI

Fast R-CNN

TL;DR: Fast R-CNN proposes a fast region-based convolutional network method for object detection, employing several innovations to improve training and testing speed while also increasing detection accuracy, and achieves a higher mAP on PASCAL VOC 2012.
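The operation that lets Fast R-CNN share one backbone feature map across all region proposals is RoI pooling; a minimal sketch using torchvision's implementation (the shapes and boxes are made up):

import torch
from torchvision.ops import roi_pool

features = torch.randn(1, 256, 50, 50)  # backbone feature map for one image
# Regions as (batch_index, x1, y1, x2, y2) in input-image coordinates.
rois = torch.tensor([[0, 0.0, 0.0, 320.0, 320.0],
                     [0, 80.0, 80.0, 400.0, 400.0]])
# spatial_scale maps image coordinates to feature-map coordinates (stride 16).
pooled = roi_pool(features, rois, output_size=(7, 7), spatial_scale=1 / 16)
print(pooled.shape)  # torch.Size([2, 256, 7, 7])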