Open Access Proceedings ArticleDOI

Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation

TLDR
The pixel2style2pixel (pSp) framework is based on a novel encoder network that directly generates a series of style vectors which are fed into a pretrained StyleGAN generator, forming the extended $\mathcal{W} + $ latent space.
Abstract
We present a generic image-to-image translation framework, pixel2style2pixel (pSp). Our pSp framework is based on a novel encoder network that directly generates a series of style vectors which are fed into a pretrained StyleGAN generator, forming the extended $\mathcal{W} + $ latent space. We first show that our encoder can directly embed real images into $\mathcal{W} + $, with no additional optimization. Next, we propose utilizing our encoder to directly solve image-to-image translation tasks, defining them as encoding problems from some input domain into the latent domain. By deviating from the standard "invert first, edit later" methodology used with previous StyleGAN encoders, our approach can handle a variety of tasks even when the input image is not represented in the StyleGAN domain. We show that solving translation tasks through StyleGAN significantly simplifies the training process, as no adversary is required, has better support for solving tasks without pixel-to-pixel correspondence, and inherently supports multi-modal synthesis via the resampling of styles. Finally, we demonstrate the potential of our framework on a variety of facial image-to-image translation tasks, even when compared to state-of-the-art solutions designed specifically for a single task, and further show that it can be extended beyond the human facial domain. Code is available at https://github.com/eladrich/pixel2style2pixel.
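The core pipeline the abstract describes, an encoder that maps an input image straight to a set of style vectors in $\mathcal{W} + $ which a frozen, pretrained StyleGAN generator then decodes, can be summarized with the following minimal PyTorch sketch. It is not the official pSp code; the backbone, the number of style vectors, and the placeholder generator are assumptions for illustration.

import torch
import torch.nn as nn

N_STYLES = 18   # one style vector per generator input layer for a 1024x1024 StyleGAN
W_DIM = 512

class StyleEncoder(nn.Module):
    """Toy stand-in for the pSp encoder: image -> (N_STYLES, W_DIM) codes in W+."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1),
        )
        # one small head per style vector (the paper uses richer "map2style" blocks)
        self.heads = nn.ModuleList([nn.Linear(128, W_DIM) for _ in range(N_STYLES)])

    def forward(self, x):
        feat = self.backbone(x).flatten(1)            # (B, 128) shared feature
        styles = [head(feat) for head in self.heads]  # N_STYLES tensors of shape (B, W_DIM)
        return torch.stack(styles, dim=1)             # (B, N_STYLES, W_DIM), i.e. a W+ code

class FrozenGenerator(nn.Module):
    """Placeholder for a pretrained StyleGAN synthesis network that accepts W+ codes."""
    def forward(self, w_plus):
        return torch.rand(w_plus.shape[0], 3, 1024, 1024)  # dummy output image

encoder, generator = StyleEncoder(), FrozenGenerator()
inputs = torch.rand(2, 3, 256, 256)   # input domain images (sketches, segmentation maps, faces, ...)
w_plus = encoder(inputs)              # direct embedding into W+, no per-image optimization
outputs = generator(w_plus)           # decoded by the fixed StyleGAN generator

Because the generator stays frozen, no adversary is needed during training, and resampling a subset of the predicted style vectors gives the multi-modal synthesis mentioned in the abstract.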



Citations
Posted Content

Alias-Free Generative Adversarial Networks

TL;DR: It is observed that despite their hierarchical convolutional nature, the synthesis process of typical generative adversarial networks depends on absolute pixel coordinates in an unhealthy manner, and small architectural changes are derived that guarantee that unwanted information cannot leak into the hierarchical synthesis process.
Posted Content

StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation

TL;DR: The latent style space of StyleGAN2, a state-of-the-art architecture for image generation, is explored, and StyleSpace, the space of channel-wise style parameters, is shown to be significantly more disentangled than the other intermediate latent spaces explored by previous works.
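As a rough illustration of what "channel-wise style parameters" means here: in StyleGAN2 each layer applies a learned affine map to the latent w to obtain one scalar per feature channel, and StyleSpace S collects these per-channel values. The sketch below uses hypothetical layer widths and a made-up channel index; it only shows the bookkeeping, not the paper's analysis.

import torch
import torch.nn as nn

W_DIM = 512
CHANNELS = [512, 512, 256, 128, 64]   # hypothetical per-layer channel counts

# one affine map per layer, turning w into per-channel style parameters
affines = nn.ModuleList([nn.Linear(W_DIM, c) for c in CHANNELS])

w = torch.randn(1, W_DIM)                   # a latent code in W
per_layer_styles = [A(w) for A in affines]  # channel-wise styles, one tensor per layer
s = torch.cat(per_layer_styles, dim=1)      # a single point in StyleSpace S

# a disentangled edit in S touches individual channels; the channel index here is made up
s_edited = s.clone()
s_edited[:, 42] += 3.0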
Proceedings ArticleDOI

TediGAN: Text-Guided Diverse Face Image Generation and Manipulation

TL;DR: TediGAN uses a StyleGAN inversion module, visual-linguistic similarity learning, and instance-level optimization to produce diverse and high-quality images at an unprecedented resolution of 1024².
Posted Content

One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing

TL;DR: A neural talking-head video synthesis model is proposed that learns to synthesize a talking-head video using a source image containing the target person’s appearance and a driving video that dictates the motion in the output.
Proceedings ArticleDOI

GAN Prior Embedded Network for Blind Face Restoration in the Wild

TL;DR: A new method is proposed that first learns a GAN for high-quality face image generation, embeds it into a U-shaped DNN as a prior decoder, and then fine-tunes the GAN-prior-embedded DNN with a set of synthesized low-quality face images.
References
Proceedings ArticleDOI

Image2StyleGAN++: How to Edit the Embedded Images?

TL;DR: A framework is presented that combines embedding with activation tensor manipulation to perform high-quality local edits along with global semantic edits on images; it can restore high-frequency features and thus significantly improves the quality of reconstructed images.
Proceedings ArticleDOI

CurricularFace: Adaptive Curriculum Learning Loss for Deep Face Recognition

TL;DR: This work proposes a novel Adaptive Curriculum Learning loss (CurricularFace) that embeds the idea of curriculum learning into the loss function to achieve a novel training strategy for deep face recognition, which mainly addresses easy samples in the early training stage and hard ones in the later stage.
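The curriculum idea above can be made concrete with a simplified sketch of such a loss: negative logits that are harder than the margined target class are modulated by a coefficient t, so they are down-weighted early in training and emphasized later as t grows. This is a hedged approximation written for illustration (in the paper, t is adapted automatically during training from the statistics of the positive cosine similarity), not the authors' exact formulation.

import torch
import torch.nn.functional as F

def curricular_logits(cos_theta, labels, m=0.5, s=64.0, t=0.0):
    """cos_theta: (B, C) cosine similarities between embeddings and class centers."""
    target = cos_theta.gather(1, labels.view(-1, 1))                 # cos(theta_y) for the true class
    target_m = torch.cos(torch.acos(target.clamp(-1, 1)) + m)        # additive angular margin on the target
    hard = cos_theta > target_m                                      # negatives harder than the margined target
    mod = torch.where(hard, cos_theta * (t + cos_theta), cos_theta)  # hard negatives weighted by curriculum t
    logits = mod.scatter(1, labels.view(-1, 1), target_m)            # put the margined target logit back
    return s * logits

cos_theta = F.normalize(torch.randn(4, 10), dim=1)   # toy cosine scores for 4 samples, 10 classes
labels = torch.tensor([0, 3, 7, 2])
loss = F.cross_entropy(curricular_logits(cos_theta, labels, t=0.3), labels)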
Book ChapterDOI

In-Domain GAN Inversion for Real Image Editing

TL;DR: In this article, a domain-guided encoder is proposed to project a given image to the native latent space of GANs, and then a domain-regularized optimization is performed to fine-tune the code produced by the encoder.
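A compact sketch of that two-step recipe, with toy stand-ins for the frozen generator G and the domain-guided encoder E (the real method also uses perceptual and adversarial terms that are omitted here), might look as follows.

import torch
import torch.nn.functional as F

def in_domain_invert(x, E, G, steps=100, lr=0.01, lam=2.0):
    # step 1: the domain-guided encoder provides the initial latent code
    z = E(x).detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        # step 2: domain-regularized optimization fine-tunes the code
        x_rec = G(z)
        rec_loss = F.mse_loss(x_rec, x)      # pixel reconstruction
        dom_loss = F.mse_loss(E(x_rec), z)   # keep the code in the encoder's (in-domain) region
        loss = rec_loss + lam * dom_loss
        opt.zero_grad(); loss.backward(); opt.step()
    return z.detach()

# toy stand-ins so the sketch runs end to end (flattened 64x64 "images", linear E and G)
G = torch.nn.Linear(512, 3 * 64 * 64)
E = torch.nn.Linear(3 * 64 * 64, 512)
x = torch.rand(1, 3 * 64 * 64)
code = in_domain_invert(x, E, G, steps=5)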
Proceedings ArticleDOI

Adversarial Latent Autoencoders

Abstract: Autoencoder networks are unsupervised approaches aiming at combining generative and representational properties by simultaneously learning an encoder-generator map. Although studied extensively, the issues of whether they have the same generative power as GANs, or learn disentangled representations, have not been fully addressed. We introduce an autoencoder that tackles these issues jointly, which we call Adversarial Latent Autoencoder (ALAE). It is a general architecture that can leverage recent improvements on GAN training procedures. We designed two autoencoders: one based on an MLP encoder, and another based on a StyleGAN generator, which we call StyleALAE. We verify the disentanglement properties of both architectures. We show that StyleALAE can not only generate 1024x1024 face images with quality comparable to StyleGAN, but at the same resolution can also produce face reconstructions and manipulations based on real images. This makes ALAE the first autoencoder able to compare with, and go beyond the capabilities of, a generator-only type of architecture.
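A bare-bones skeleton of the components involved, with linear layers standing in for the real networks, is shown below; the actual ALAE training (progressive growing, the specific adversarial objective, StyleGAN-style synthesis) is omitted, and the latent-space reconstruction term is one reading of how the encoder-generator map is learned jointly.

import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT = 512

F_map = nn.Sequential(nn.Linear(LATENT, LATENT), nn.LeakyReLU(0.2),
                      nn.Linear(LATENT, LATENT))   # maps noise z to an intermediate latent w
G = nn.Linear(LATENT, 3 * 64 * 64)   # generator: w -> (flattened) image; StyleALAE uses a StyleGAN generator here
E = nn.Linear(3 * 64 * 64, LATENT)   # encoder: image -> w
D = nn.Linear(LATENT, 1)             # discriminator acting on the encoder's latent output

z = torch.randn(8, LATENT)
w = F_map(z)
fake = G(w)
adv_score = D(E(fake))               # adversarial signal passes through the encoder
rec_loss = F.mse_loss(E(fake), w)    # encoder and generator are tied by reconstructing w, not pixels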
Proceedings Article

On the "steerability" of generative adversarial networks

TL;DR: It is shown that although current GANs can fit standard datasets very well, they still fall short of being comprehensive models of the visual manifold, and it is hypothesized that the degree of distributional shift is related to the breadth of the training data distribution.