A Style-Based Generator Architecture for Generative Adversarial Networks

doi:10.1109/CVPR.2019.00453

Proceedings Article•DOI•

Tero Karras¹, Samuli Laine¹, Timo Aila¹•Institutions (1)

Nvidia¹

15 Jun 2019, pp. 4396-4405

TL;DR: The paper proposes an alternative generator architecture for GANs, borrowing from style transfer literature, that leads to an automatically learned, unsupervised separation of high-level attributes (e.g., pose and identity when trained on human faces) from stochastic variation in the generated images.

Abstract: We propose an alternative generator architecture for generative adversarial networks, borrowing from style transfer literature. The new architecture leads to an automatically learned, unsupervised separation of high-level attributes (e.g., pose and identity when trained on human faces) and stochastic variation in the generated images (e.g., freckles, hair), and it enables intuitive, scale-specific control of the synthesis. The new generator improves the state-of-the-art in terms of traditional distribution quality metrics, leads to demonstrably better interpolation properties, and also better disentangles the latent factors of variation. To quantify interpolation quality and disentanglement, we propose two new, automated methods that are applicable to any generator architecture. Finally, we introduce a new, highly varied and high-quality dataset of human faces.

Citations

Journal Article•DOI•

AutoML: A survey of the state-of-the-art

Xin He¹, Kaiyong Zhao¹, Xiaowen Chu¹•Institutions (1)

Hong Kong Baptist University¹

05 Jan 2021-Knowledge Based Systems

TL;DR: A comprehensive and up-to-date review of the state-of-the-art (SOTA) in AutoML methods according to the pipeline, covering data preparation, feature engineering, hyperparameter optimization, and neural architecture search (NAS).

Abstract: Deep learning (DL) techniques have obtained remarkable achievements on various tasks, such as image recognition, object detection, and language modeling. However, building a high-quality DL system for a specific task highly relies on human expertise, hindering its wide application. Meanwhile, automated machine learning (AutoML) is a promising solution for building a DL system without human assistance and is being extensively studied. This paper presents a comprehensive and up-to-date review of the state-of-the-art (SOTA) in AutoML. According to the DL pipeline, we introduce AutoML methods – covering data preparation, feature engineering, hyperparameter optimization, and neural architecture search (NAS) – with a particular focus on NAS, as it is currently a hot sub-topic of AutoML. We summarize the representative NAS algorithms’ performance on the CIFAR-10 and ImageNet datasets and further discuss the following subjects of NAS methods: one/two-stage NAS, one-shot NAS, joint hyperparameter and architecture optimization, and resource-aware NAS. Finally, we discuss some open problems related to the existing AutoML methods for future research.

809 citations

Posted Content•

Taming Transformers for High-Resolution Image Synthesis

Patrick Esser¹, Robin Rombach¹, Björn Ommer¹•Institutions (1)

Heidelberg University¹

17 Dec 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: It is demonstrated how combining the effectiveness of the inductive bias of CNNs with the expressivity of transformers enables them to model and thereby synthesize high-resolution images.

Abstract: Designed to learn long-range interactions on sequential data, transformers continue to show state-of-the-art results on a wide variety of tasks. In contrast to CNNs, they contain no inductive bias that prioritizes local interactions. This makes them expressive, but also computationally infeasible for long sequences, such as high-resolution images. We demonstrate how combining the effectiveness of the inductive bias of CNNs with the expressivity of transformers enables them to model and thereby synthesize high-resolution images. We show how to (i) use CNNs to learn a context-rich vocabulary of image constituents, and in turn (ii) utilize transformers to efficiently model their composition within high-resolution images. Our approach is readily applied to conditional synthesis tasks, where both non-spatial information, such as object classes, and spatial information, such as segmentations, can control the generated image. In particular, we present the first results on semantically-guided synthesis of megapixel images with transformers and obtain the state of the art among autoregressive models on class-conditional ImageNet. Code and pretrained models can be found at this https URL .

744 citations

Posted Content•

StarGAN v2: Diverse Image Synthesis for Multiple Domains

Yunjey Choi¹, Youngjung Uh¹, Jaejun Yoo¹, Jung-Woo Ha¹•Institutions (1)

Naver Corporation¹

04 Dec 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: StarGAN v2 is a single image-to-image translation framework that addresses both limited diversity and the need for multiple models across domains, and it shows significantly improved results over the baselines.

Abstract: A good image-to-image translation model should learn a mapping between different visual domains while satisfying the following properties: 1) diversity of generated images and 2) scalability over multiple domains. Existing methods address either of the issues, having limited diversity or multiple models for all domains. We propose StarGAN v2, a single framework that tackles both and shows significantly improved results over the baselines. Experiments on CelebA-HQ and a new animal faces dataset (AFHQ) validate our superiority in terms of visual quality, diversity, and scalability. To better assess image-to-image translation models, we release AFHQ, high-quality animal faces with large inter- and intra-domain differences. The code, pretrained models, and dataset can be found at this https URL.

697 citations

Proceedings Article•DOI•

SinGAN: Learning a Generative Model From a Single Natural Image

Tamar Rott Shaham¹, Tali Dekel², Tomer Michaeli¹•Institutions (2)

Technion – Israel Institute of Technology¹, Google²

02 May 2019

TL;DR: SinGAN, an unconditional generative model that can be learned from a single natural image, is introduced, trained to capture the internal distribution of patches within the image, and is then able to generate high quality, diverse samples that carry the same visual content as the image.

Abstract: We introduce SinGAN, an unconditional generative model that can be learned from a single natural image. Our model is trained to capture the internal distribution of patches within the image, and is then able to generate high quality, diverse samples that carry the same visual content as the image. SinGAN contains a pyramid of fully convolutional GANs, each responsible for learning the patch distribution at a different scale of the image. This allows generating new samples of arbitrary size and aspect ratio, that have significant variability, yet maintain both the global structure and the fine textures of the training image. In contrast to previous single image GAN schemes, our approach is not limited to texture images, and is not conditional (i.e. it generates samples from noise). User studies confirm that the generated samples are commonly confused to be real images. We illustrate the utility of SinGAN in a wide range of image manipulation tasks.

660 citations

Proceedings Article•DOI•

StarGAN v2: Diverse Image Synthesis for Multiple Domains

Yunjey Choi¹, Youngjung Uh¹, Jaejun Yoo¹, Jung-Woo Ha¹•Institutions (1)

Naver Corporation¹

14 Jun 2020

TL;DR: StarGAN v2 is a single framework that learns a mapping between different visual domains while satisfying two properties: 1) diversity of generated images and 2) scalability over multiple domains.

654 citations