Open Access · Journal ArticleDOI

Deep image synthesis from intuitive user input: A review and perspectives

TL;DR
In many applications of computer graphics, art, and design, it is desirable for a user to provide intuitive non-image input, such as text, sketch, stroke, graph, or layout, and have a computer system automatically generate photo-realistic images according to that input; this paper reviews deep generative approaches to that task.
Abstract
In many applications of computer graphics, art, and design, it is desirable for a user to provide intuitive non-image input, such as text, sketch, stroke, graph, or layout, and have a computer system automatically generate photo-realistic images according to that input. While works that allow such automatic image content generation have classically followed a framework of image retrieval and composition, recent advances in deep generative models such as generative adversarial networks (GANs), variational autoencoders (VAEs), and flow-based methods have enabled more powerful and versatile image generation approaches. This paper reviews recent works for image synthesis given intuitive user input, covering advances in input versatility, image generation methodology, benchmark datasets, and evaluation metrics. This motivates new perspectives on input representation and interactivity, cross-fertilization between major image generation paradigms, and evaluation and comparison of generation methods.
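As a reference point for the paradigms named above (standard material, not specific to this survey): a VAE is trained by maximizing the evidence lower bound (ELBO) on the data log-likelihood,

    \log p_\theta(x) \ge \mathbb{E}_{q_\phi(z \mid x)}[\log p_\theta(x \mid z)] - \mathrm{KL}(q_\phi(z \mid x) \,\|\, p(z))

while GANs use the adversarial objective quoted under the Generative Adversarial Nets reference below, and flow-based models maximize exact log-likelihood through invertible transformations.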



Citations
Proceedings ArticleDOI

Draw Your Art Dream: Diverse Digital Art Synthesis with Multimodal Guided Diffusion

TL;DR: Li et al. propose a multimodal guided artwork diffusion (MGAD) model, a diffusion-based digital artwork generation approach that uses multimodal prompts as guidance to control a classifier-free diffusion model.
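For context, classifier-free guidance (which the TL;DR refers to) steers a conditional diffusion model by extrapolating between its conditional and unconditional noise predictions. One common formulation, standard in the diffusion literature rather than taken from the MGAD paper itself:

    \hat{\epsilon}_\theta(x_t, c) = \epsilon_\theta(x_t, \varnothing) + s\,(\epsilon_\theta(x_t, c) - \epsilon_\theta(x_t, \varnothing))

where c is the (here, multimodal) conditioning prompt, \varnothing the null condition, and s \ge 1 the guidance scale.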
Journal ArticleDOI

User‐Controllable Latent Transformer for StyleGAN Image Layout Editing

Yuki Endo
TL;DR: Proposes an interactive framework for manipulating latent codes in accordance with user inputs; a latent transformer based on a transformer encoder-decoder architecture estimates the output latent codes, which are fed to the StyleGAN generator to obtain the edited image.
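A minimal PyTorch sketch of the idea in the TL;DR; the dimensions, module layout, and the way user inputs are embedded are illustrative assumptions, not the paper's actual architecture:

import torch
import torch.nn as nn

class LatentTransformer(nn.Module):
    """Maps StyleGAN latent codes plus embedded user edits to new latent codes."""
    def __init__(self, latent_dim=512, num_layers=4, nhead=8):
        super().__init__()
        self.transformer = nn.Transformer(
            d_model=latent_dim, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True)

    def forward(self, latents, edit_tokens):
        # latents: (batch, num_style_layers, latent_dim), e.g. a W+ code
        # edit_tokens: (batch, num_edits, latent_dim), embedded user inputs
        return self.transformer(src=edit_tokens, tgt=latents)

model = LatentTransformer()
w_plus = torch.randn(1, 18, 512)   # hypothetical W+ code (18 style layers)
edits = torch.randn(1, 3, 512)     # hypothetical embedded user edits
edited = model(w_plus, edits)      # (1, 18, 512), fed to the StyleGAN generator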
Journal ArticleDOI

A Review of Synthetic Image Data and Its Use in Computer Vision

Keith Man, +1 more
01 Nov 2022
TL;DR: Reviews synthetic image data, providing a general overview of the types of synthetic data as categorised by synthesised output, common methods of synthesising different types of image data, existing applications and logical extensions, the performance of synthetic images in different applications, and the associated difficulties in assessing data performance.
Journal ArticleDOI

Controlling StyleGANs using rough scribbles via one‐shot learning

TL;DR: Generates realistic and diverse images with layout control (for example, over facial part layouts and body poses) from only a single training pair annotated with semantic scribbles, by exploiting the StyleGAN prior.
Journal ArticleDOI

CoGS: Controllable Generation and Search from Sketch and Style

TL;DR: Proposes style-conditioned, sketch-driven image synthesis that enables exploration of diverse appearance possibilities for a given sketched object, providing decoupled control over the structure and the appearance of the output.
References
Journal ArticleDOI

Image quality assessment: from error visibility to structural similarity

TL;DR: Proposes a structural similarity index for image quality assessment based on the degradation of structural information, validated against subjective ratings and objective methods on a database of images compressed with JPEG and JPEG2000.
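For reference, the SSIM index between aligned patches x and y, as defined in this paper:

    \mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}

where \mu, \sigma^2, and \sigma_{xy} are local means, variances, and covariance, and C_1, C_2 are small constants that stabilize the division.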
Journal ArticleDOI

Generative Adversarial Nets

TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are simultaneously trained: a generative model G that captures the data distribution and a discriminative model D that estimates the probability that a sample came from the training data rather than from G.
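The adversarial game in the TL;DR is the paper's two-player minimax objective:

    \min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]

where G maps noise z to samples and D outputs the probability that its input came from the data rather than from G.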
Book ChapterDOI

Microsoft COCO: Common Objects in Context

TL;DR: A new dataset that aims to advance the state of the art in object recognition by placing object recognition in the context of the broader question of scene understanding, gathering images of complex everyday scenes containing common objects in their natural context.
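A minimal sketch of loading the dataset with torchvision (requires pycocotools); the local paths are placeholders:

from torchvision.datasets import CocoDetection

dataset = CocoDetection(
    root="coco/train2017",  # directory of images (placeholder path)
    annFile="coco/annotations/instances_train2017.json")  # COCO-format annotations
image, targets = dataset[0]  # a PIL image and a list of annotation dicts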
Proceedings ArticleDOI

Rethinking the Inception Architecture for Computer Vision

TL;DR: Explores ways to scale up networks that aim to utilize the added computation as efficiently as possible, through suitably factorized convolutions and aggressive regularization.
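One of the factorizations the paper advocates, sketched in PyTorch: replacing a 5x5 convolution with two stacked 3x3 convolutions, which keeps the receptive field while cutting parameters (the channel counts here are arbitrary):

import torch.nn as nn

in_ch, out_ch = 64, 64
conv5x5 = nn.Conv2d(in_ch, out_ch, kernel_size=5, padding=2)
factored = nn.Sequential(  # same 5x5 receptive field, ~28% fewer parameters
    nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1))

params = lambda m: sum(p.numel() for p in m.parameters())
print(params(conv5x5), params(factored))  # 102464 vs 73856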
Proceedings ArticleDOI

Fast R-CNN

TL;DR: Fast R-CNN proposes a fast region-based convolutional network method for object detection, employing several innovations to improve training and testing speed while also increasing detection accuracy, and achieves a higher mAP on PASCAL VOC 2012.
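The operation that lets Fast R-CNN share one backbone feature map across all region proposals is RoI pooling; a minimal sketch using torchvision's implementation (the shapes and boxes are made up):

import torch
from torchvision.ops import roi_pool

features = torch.randn(1, 256, 50, 50)  # backbone feature map for one image
# Regions as (batch_index, x1, y1, x2, y2) in input-image coordinates.
rois = torch.tensor([[0, 0.0, 0.0, 320.0, 320.0],
                     [0, 80.0, 80.0, 400.0, 400.0]])
# spatial_scale maps image coordinates to feature-map coordinates (stride 16).
pooled = roi_pool(features, rois, output_size=(7, 7), spatial_scale=1 / 16)
print(pooled.shape)  # torch.Size([2, 256, 7, 7])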