Open Access · Posted Content

HyperStyle: StyleGAN Inversion with HyperNetworks for Real Image Editing

TL;DR
HyperStyle is a hypernetwork that learns to modulate StyleGAN's weights to faithfully express a given image in editable regions of the latent space, yielding reconstructions comparable to those of optimization techniques with the near real-time inference capabilities of encoders.
Abstract
The inversion of real images into StyleGAN's latent space is a well-studied problem. Nevertheless, applying existing approaches to real-world scenarios remains an open challenge, due to an inherent trade-off between reconstruction and editability: latent space regions which can accurately represent real images typically suffer from degraded semantic control. Recent work proposes to mitigate this trade-off by fine-tuning the generator to add the target image to well-behaved, editable regions of the latent space. While promising, this fine-tuning scheme is impractical for prevalent use as it requires a lengthy training phase for each new image. In this work, we introduce this approach into the realm of encoder-based inversion. We propose HyperStyle, a hypernetwork that learns to modulate StyleGAN's weights to faithfully express a given image in editable regions of the latent space. A naive modulation approach would require training a hypernetwork with over three billion parameters. Through careful network design, we reduce this to be in line with existing encoders. HyperStyle yields reconstructions comparable to those of optimization techniques with the near real-time inference capabilities of encoders. Lastly, we demonstrate HyperStyle's effectiveness on several applications beyond the inversion task, including the editing of out-of-domain images which were never seen during training.
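The weight-modulation idea in the abstract can be sketched in a few lines. Predicting a full offset tensor for every generator weight is what would blow the hypernetwork up to billions of parameters; predicting one offset per output channel is one way to shrink it. All shapes and the exact offset scheme below are illustrative assumptions, not HyperStyle's precise design:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy generator conv weight: (out_channels, in_channels, k, k)
w = rng.standard_normal((8, 4, 3, 3))

def modulate(w, per_channel_offsets):
    """Apply per-output-channel multiplicative offsets: w_hat = w * (1 + delta).

    Predicting one scalar per output channel (8 values here) instead of a
    full offset tensor (8 * 4 * 3 * 3 = 288 values) is the kind of
    parameter-sharing that keeps a weight-predicting hypernetwork small.
    """
    delta = per_channel_offsets.reshape(-1, 1, 1, 1)  # broadcast over in/k/k
    return w * (1.0 + delta)

# A hypernetwork would predict these offsets from the target image and the
# current reconstruction; fixed values are used here purely for illustration.
delta = np.array([0.1, -0.2, 0.0, 0.05, 0.0, 0.0, 0.3, -0.1])
w_hat = modulate(w, delta)

assert w_hat.shape == w.shape
# Channel 2 has delta == 0, so its weights are left untouched.
assert np.allclose(w_hat[2], w[2])
```

The modulated weights replace the originals only for the current image, so the pretrained generator itself stays fixed and no per-image fine-tuning phase is needed.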


Citations
Proceedings ArticleDOI

Stitch it in Time: GAN-Based Facial Editing of Real Videos

TL;DR: This work leverages the natural alignment of StyleGAN and the tendency of neural networks to learn low-frequency functions, showing that together they provide a strongly consistent prior for semantic editing of faces in videos, with significant improvements over the current state of the art.
Journal ArticleDOI

GAN Inversion: A Survey

TL;DR: GAN inversion aims to invert a given image back into the latent space of a pretrained GAN model, so that the image can be faithfully reconstructed from the inverted code by the generator.
Book ChapterDOI

Third Time's the Charm? Image and Video Editing with StyleGAN3

TL;DR: In this article, the authors explore the recent StyleGAN3 architecture, compare it to its predecessor, and investigate its unique advantages as well as its drawbacks; they propose an encoding scheme that is trained solely on aligned data, yet can still invert unaligned images.
Proceedings ArticleDOI

DUET: A Tuning-Free Device-Cloud Collaborative Parameters Generation Framework for Efficient Device Model Generalization

TL;DR: This work argues that the computational cost of fine-tuning can be unnecessary and presents a novel perspective on improving device model generalization (DMG) without increasing computational cost: device-specific parameter generation, which directly maps a data distribution to model parameters.
Journal ArticleDOI

Survey on leveraging pre-trained generative adversarial networks for image editing and restoration

TL;DR: In this article, the authors briefly review recent progress on leveraging pretrained large-scale GAN models from three aspects: (1) the training of large-scale generative adversarial networks, (2) exploring and understanding the pretrained GAN models, and (3) leveraging these models for subsequent tasks like image restoration and editing.
References
Posted Content

Deep Residual Learning for Image Recognition

TL;DR: This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
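The residual idea summarized above reduces to computing y = x + F(x), so each block only has to learn the residual F rather than the full mapping. A minimal sketch with a toy two-layer F (shapes and weights are hypothetical):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """y = x + F(x): the skip connection means an identity mapping is
    trivially recoverable by driving the residual branch F toward zero,
    which is what eases optimization of very deep stacks."""
    return x + w2 @ relu(w1 @ x)

x = np.ones(4)
# With zero weights the residual branch vanishes and the block is the identity.
zeros = np.zeros((4, 4))
assert np.allclose(residual_block(x, zeros, zeros), x)
```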
Journal ArticleDOI

Generative Adversarial Nets

TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are trained simultaneously: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than from G.
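The adversarial process described above can be illustrated with the standard minimax losses on scalar discriminator outputs (a toy sketch, not a training loop; the non-saturating generator variant noted in the comment is a common practical choice):

```python
import numpy as np

def d_loss(d_real, d_fake):
    """Discriminator maximizes log D(x) + log(1 - D(G(z)));
    equivalently, it minimizes the negative of that sum."""
    return -(np.log(d_real) + np.log(1.0 - d_fake))

def g_loss(d_fake):
    """Generator minimizes log(1 - D(G(z))); in practice the
    non-saturating variant -log D(G(z)) is often used instead."""
    return -np.log(d_fake)

# At the theoretical equilibrium the discriminator is perfectly confused
# and outputs 0.5 everywhere:
assert np.isclose(d_loss(0.5, 0.5), -2 * np.log(0.5))
```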
Posted Content

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

TL;DR: This work introduces two simple global hyper-parameters that efficiently trade off between latency and accuracy, and demonstrates the effectiveness of MobileNets across a wide range of applications and use cases, including object detection, fine-grained classification, face attributes, and large-scale geo-localization.
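The latency/accuracy trade-off mentioned above comes from depthwise separable convolutions combined with the two global hyper-parameters, a width multiplier (alpha) and a resolution multiplier (rho). A rough multiply-add count with illustrative layer sizes shows the savings:

```python
def standard_conv_madds(k, m, n, f):
    """Multiply-adds for a standard k x k conv: k^2 * M * N * F^2."""
    return k * k * m * n * f * f

def mobilenet_madds(k, m, n, f, alpha=1.0, rho=1.0):
    """Depthwise separable conv with width multiplier alpha and resolution
    multiplier rho: a depthwise pass costing k^2 * (aM) * (rF)^2 plus a
    1x1 pointwise pass costing (aM) * (aN) * (rF)^2."""
    am, an, rf = alpha * m, alpha * n, rho * f
    return k * k * am * rf * rf + am * an * rf * rf

# Example layer: 3x3 conv, 512 -> 512 channels, 14x14 feature map.
full = standard_conv_madds(3, 512, 512, 14)
sep = mobilenet_madds(3, 512, 512, 14)
# The separable form needs roughly 1/N + 1/k^2 of the computation,
# i.e. about 8-9x fewer multiply-adds for this layer.
assert sep / full < 0.13
# Halving the width multiplier shrinks cost roughly quadratically,
# since the dominant pointwise term scales with alpha^2.
assert mobilenet_madds(3, 512, 512, 14, alpha=0.5) < 0.3 * sep
```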
Proceedings ArticleDOI

A Style-Based Generator Architecture for Generative Adversarial Networks

TL;DR: This paper proposes an alternative generator architecture for GANs, borrowing from the style transfer literature, which leads to an automatically learned, unsupervised separation of high-level attributes (e.g., pose and identity when trained on human faces) and stochastic variation in the generated images.
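The style-injection mechanism borrowed from the style transfer literature is commonly described via adaptive instance normalization (AdaIN): each feature channel is normalized, then rescaled and shifted by parameters derived from the latent code. A minimal sketch with toy tensor shapes (not the full generator):

```python
import numpy as np

def adain(x, style_scale, style_bias, eps=1e-8):
    """Adaptive instance normalization: normalize each channel of x over its
    spatial dimensions, then rescale/shift with style-derived parameters.
    This is how the style code controls the statistics of every layer."""
    mu = x.mean(axis=(1, 2), keepdims=True)
    sigma = x.std(axis=(1, 2), keepdims=True)
    scale = style_scale.reshape(-1, 1, 1)
    bias = style_bias.reshape(-1, 1, 1)
    return scale * (x - mu) / (sigma + eps) + bias

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4, 4))  # (channels, H, W)
out = adain(x, np.array([2.0, 1.0]), np.array([0.5, 0.0]))

# After AdaIN, each channel's mean matches its bias and its std its scale.
assert np.allclose(out[0].mean(), 0.5, atol=1e-6)
assert np.allclose(out[0].std(), 2.0, atol=1e-3)
```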
Proceedings ArticleDOI

Deep Learning Face Attributes in the Wild

TL;DR: This work proposes a novel deep learning framework for attribute prediction in the wild that cascades two CNNs, LNet and ANet, which are fine-tuned jointly with attribute tags but pre-trained differently.