Open Access Proceedings ArticleDOI

Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation

TLDR
The pixel2style2pixel (pSp) framework is based on a novel encoder network that directly generates a series of style vectors which are fed into a pretrained StyleGAN generator, forming the extended $\mathcal{W} + $ latent space.
Abstract
We present a generic image-to-image translation framework, pixel2style2pixel (pSp). Our pSp framework is based on a novel encoder network that directly generates a series of style vectors which are fed into a pretrained StyleGAN generator, forming the extended $\mathcal{W} + $ latent space. We first show that our encoder can directly embed real images into $\mathcal{W} + $, with no additional optimization. Next, we propose utilizing our encoder to directly solve image-to-image translation tasks, defining them as encoding problems from some input domain into the latent domain. By deviating from the standard "invert first, edit later" methodology used with previous StyleGAN encoders, our approach can handle a variety of tasks even when the input image is not represented in the StyleGAN domain. We show that solving translation tasks through StyleGAN significantly simplifies the training process, as no adversary is required, has better support for solving tasks without pixel-to-pixel correspondence, and inherently supports multi-modal synthesis via the resampling of styles. Finally, we demonstrate the potential of our framework on a variety of facial image-to-image translation tasks, even when compared to state-of-the-art solutions designed specifically for a single task, and further show that it can be extended beyond the human facial domain. Code is available at https://github.com/eladrich/pixel2style2pixel.
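The core pipeline the abstract describes, an encoder that maps an input image straight to a set of style vectors in $\mathcal{W} + $ which a frozen, pretrained StyleGAN generator then decodes, can be summarized with the following minimal PyTorch sketch. It is not the official pSp code; the backbone, the number of style vectors, and the placeholder generator are assumptions for illustration.

import torch
import torch.nn as nn

N_STYLES = 18   # one style vector per generator input layer for a 1024x1024 StyleGAN
W_DIM = 512

class StyleEncoder(nn.Module):
    """Toy stand-in for the pSp encoder: image -> (N_STYLES, W_DIM) codes in W+."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1),
        )
        # one small head per style vector (the paper uses richer "map2style" blocks)
        self.heads = nn.ModuleList([nn.Linear(128, W_DIM) for _ in range(N_STYLES)])

    def forward(self, x):
        feat = self.backbone(x).flatten(1)            # (B, 128) shared feature
        styles = [head(feat) for head in self.heads]  # N_STYLES tensors of shape (B, W_DIM)
        return torch.stack(styles, dim=1)             # (B, N_STYLES, W_DIM), i.e. a W+ code

class FrozenGenerator(nn.Module):
    """Placeholder for a pretrained StyleGAN synthesis network that accepts W+ codes."""
    def forward(self, w_plus):
        return torch.rand(w_plus.shape[0], 3, 1024, 1024)  # dummy output image

encoder, generator = StyleEncoder(), FrozenGenerator()
inputs = torch.rand(2, 3, 256, 256)   # input domain images (sketches, segmentation maps, faces, ...)
w_plus = encoder(inputs)              # direct embedding into W+, no per-image optimization
outputs = generator(w_plus)           # decoded by the fixed StyleGAN generator

Because the generator stays frozen, no adversary is needed during training, and resampling a subset of the predicted style vectors gives the multi-modal synthesis mentioned in the abstract.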



Citations
Posted Content

Alias-Free Generative Adversarial Networks

TL;DR: It is observed that despite their hierarchical convolutional nature, the synthesis process of typical generative adversarial networks depends on absolute pixel coordinates in an unhealthy manner, and small architectural changes are derived that guarantee that unwanted information cannot leak into the hierarchical synthesis process.
Posted Content

StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation

TL;DR: The latent style space of StyleGAN2, a state-of-the-art architecture for image generation, is explored, and StyleSpace, the space of channel-wise style parameters, is shown to be significantly more disentangled than the other intermediate latent spaces explored by previous works.
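As a rough illustration of what "channel-wise style parameters" means here: in StyleGAN2 each layer applies a learned affine map to the latent w to obtain one scalar per feature channel, and StyleSpace S collects these per-channel values. The sketch below uses hypothetical layer widths and a made-up channel index; it only shows the bookkeeping, not the paper's analysis.

import torch
import torch.nn as nn

W_DIM = 512
CHANNELS = [512, 512, 256, 128, 64]   # hypothetical per-layer channel counts

# one affine map per layer, turning w into per-channel style parameters
affines = nn.ModuleList([nn.Linear(W_DIM, c) for c in CHANNELS])

w = torch.randn(1, W_DIM)                   # a latent code in W
per_layer_styles = [A(w) for A in affines]  # channel-wise styles, one tensor per layer
s = torch.cat(per_layer_styles, dim=1)      # a single point in StyleSpace S

# a disentangled edit in S touches individual channels; the channel index here is made up
s_edited = s.clone()
s_edited[:, 42] += 3.0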
Proceedings ArticleDOI

TediGAN: Text-Guided Diverse Face Image Generation and Manipulation

TL;DR: TediGAN uses a StyleGAN inversion module, visual-linguistic similarity learning, and instance-level optimization to produce diverse and high-quality images at an unprecedented resolution of 1024².
Posted Content

One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing

TL;DR: A neural talking-head video synthesis model is proposed that learns to synthesize a talking-head video using a source image containing the target person’s appearance and a driving video that dictates the motion in the output.
Proceedings ArticleDOI

GAN Prior Embedded Network for Blind Face Restoration in the Wild

TL;DR: A new method is proposed that first learns a GAN for high-quality face image generation, embeds it into a U-shaped DNN as a prior decoder, and then fine-tunes the GAN-prior-embedded DNN with a set of synthesized low-quality face images.
References
Proceedings ArticleDOI

Image2StyleGAN++: How to Edit the Embedded Images?

TL;DR: A framework is presented that combines embedding with activation tensor manipulation to perform high-quality local edits along with global semantic edits on images; it can restore high-frequency features and thus significantly improves the quality of reconstructed images.
Proceedings ArticleDOI

CurricularFace: Adaptive Curriculum Learning Loss for Deep Face Recognition

TL;DR: This work proposes a novel Adaptive Curriculum Learning loss (CurricularFace) that embeds the idea of curriculum learning into the loss function to achieve a novel training strategy for deep face recognition, which mainly addresses easy samples in the early training stage and hard ones in the later stage.
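The curriculum idea above can be made concrete with a simplified sketch of such a loss: negative logits that are harder than the margined target class are modulated by a coefficient t, so they are down-weighted early in training and emphasized later as t grows. This is a hedged approximation written for illustration (in the paper, t is adapted automatically during training from the statistics of the positive cosine similarity), not the authors' exact formulation.

import torch
import torch.nn.functional as F

def curricular_logits(cos_theta, labels, m=0.5, s=64.0, t=0.0):
    """cos_theta: (B, C) cosine similarities between embeddings and class centers."""
    target = cos_theta.gather(1, labels.view(-1, 1))                 # cos(theta_y) for the true class
    target_m = torch.cos(torch.acos(target.clamp(-1, 1)) + m)        # additive angular margin on the target
    hard = cos_theta > target_m                                      # negatives harder than the margined target
    mod = torch.where(hard, cos_theta * (t + cos_theta), cos_theta)  # hard negatives weighted by curriculum t
    logits = mod.scatter(1, labels.view(-1, 1), target_m)            # put the margined target logit back
    return s * logits

cos_theta = F.normalize(torch.randn(4, 10), dim=1)   # toy cosine scores for 4 samples, 10 classes
labels = torch.tensor([0, 3, 7, 2])
loss = F.cross_entropy(curricular_logits(cos_theta, labels, t=0.3), labels)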
Book ChapterDOI

In-Domain GAN Inversion for Real Image Editing

TL;DR: In this article, a domain-guided encoder is proposed to project a given image to the native latent space of GANs, and then a domain-regularized optimization is performed to fine-tune the code produced by the encoder.
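A compact sketch of that two-step recipe, with toy stand-ins for the frozen generator G and the domain-guided encoder E (the real method also uses perceptual and adversarial terms that are omitted here), might look as follows.

import torch
import torch.nn.functional as F

def in_domain_invert(x, E, G, steps=100, lr=0.01, lam=2.0):
    # step 1: the domain-guided encoder provides the initial latent code
    z = E(x).detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        # step 2: domain-regularized optimization fine-tunes the code
        x_rec = G(z)
        rec_loss = F.mse_loss(x_rec, x)      # pixel reconstruction
        dom_loss = F.mse_loss(E(x_rec), z)   # keep the code in the encoder's (in-domain) region
        loss = rec_loss + lam * dom_loss
        opt.zero_grad(); loss.backward(); opt.step()
    return z.detach()

# toy stand-ins so the sketch runs end to end (flattened 64x64 "images", linear E and G)
G = torch.nn.Linear(512, 3 * 64 * 64)
E = torch.nn.Linear(3 * 64 * 64, 512)
x = torch.rand(1, 3 * 64 * 64)
code = in_domain_invert(x, E, G, steps=5)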
Proceedings ArticleDOI

Adversarial Latent Autoencoders

Abstract: Autoencoder networks are unsupervised approaches aiming at combining generative and representational properties by simultaneously learning an encoder-generator map. Although studied extensively, the issues of whether they have the same generative power as GANs, or learn disentangled representations, have not been fully addressed. We introduce an autoencoder that tackles these issues jointly, which we call Adversarial Latent Autoencoder (ALAE). It is a general architecture that can leverage recent improvements on GAN training procedures. We designed two autoencoders: one based on an MLP encoder, and another based on a StyleGAN generator, which we call StyleALAE. We verify the disentanglement properties of both architectures. We show that StyleALAE can not only generate 1024x1024 face images with quality comparable to StyleGAN, but at the same resolution can also produce face reconstructions and manipulations based on real images. This makes ALAE the first autoencoder able to compare with, and go beyond the capabilities of, a generator-only type of architecture.
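A bare-bones skeleton of the components involved, with linear layers standing in for the real networks, is shown below; the actual ALAE training (progressive growing, the specific adversarial objective, StyleGAN-style synthesis) is omitted, and the latent-space reconstruction term is one reading of how the encoder-generator map is learned jointly.

import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT = 512

F_map = nn.Sequential(nn.Linear(LATENT, LATENT), nn.LeakyReLU(0.2),
                      nn.Linear(LATENT, LATENT))   # maps noise z to an intermediate latent w
G = nn.Linear(LATENT, 3 * 64 * 64)   # generator: w -> (flattened) image; StyleALAE uses a StyleGAN generator here
E = nn.Linear(3 * 64 * 64, LATENT)   # encoder: image -> w
D = nn.Linear(LATENT, 1)             # discriminator acting on the encoder's latent output

z = torch.randn(8, LATENT)
w = F_map(z)
fake = G(w)
adv_score = D(E(fake))               # adversarial signal passes through the encoder
rec_loss = F.mse_loss(E(fake), w)    # encoder and generator are tied by reconstructing w, not pixels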
Proceedings Article

On the "steerability" of generative adversarial networks

TL;DR: It is shown that although current GANs can fit standard datasets very well, they still fall short of being comprehensive models of the visual manifold, and it is hypothesized that the degree of distributional shift is related to the breadth of the training data distribution.