Open Access · Posted Content

Wasserstein GANs with Gradient Penalty Compute Congested Transport

TLDR
In this article, it is shown that Wasserstein GANs with Gradient Penalty (WGAN-GP) compute the minimum of a different optimal transport problem, the so-called congested transport.
Abstract
Wasserstein GANs with Gradient Penalty (WGAN-GP) are an extremely popular method for training generative models to produce high quality synthetic data. While WGAN-GP were initially developed to calculate the Wasserstein 1 distance between generated and real data, recent works (e.g. Stanczuk et al. (2021)) have provided empirical evidence that this does not occur, and have argued that WGAN-GP perform well not in spite of this issue, but because of it. In this paper we show for the first time that WGAN-GP compute the minimum of a different optimal transport problem, the so-called congested transport (Carlier et al. (2008)). Congested transport determines the cost of moving one distribution to another under a transport model that penalizes congestion. For WGAN-GP, we find that the congestion penalty has a spatially varying component determined by the sampling strategy used in Gulrajani et al. (2017), which acts like a local speed limit, making congestion cost less in some regions than others. This aspect of the congested transport problem is new in that the congestion penalty turns out to be unbounded and to depend on the distributions to be transported, and so we provide the necessary mathematical proofs for this setting. We use our discovery to show that the gradients of solutions to the optimization problem in WGAN-GP determine the time averaged momentum of optimal mass flow. This is in contrast to the gradients of Kantorovich potentials for the Wasserstein 1 distance, which only determine the normalized direction of flow. This may explain, in support of Stanczuk et al. (2021), the success of WGAN-GP, since the training of the generator is based on these gradients.
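As a schematic aid only (not the paper's exact statement), congested transport is often written in the following Beckmann-type form: mass flows from the generated distribution \mu to the real distribution \nu along a vector field w, and a convex cost H penalizes high flow intensity; the spatially varying component described above corresponds to the x-dependence of H.

    \min_{w}\ \int H\bigl(x,\,|w(x)|\bigr)\,dx
    \qquad \text{subject to} \qquad \nabla\cdot w = \mu - \nu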


Citations
Posted Content

Trust the Critics: Generatorless and Multipurpose WGANs with Initial Convergence Guarantees

TL;DR: Trust the Critics (TTC), as discussed by the authors, replaces the trainable generator of a Wasserstein GAN with a sequence of trained critic networks. The approach is motivated in part by the observed misalignment between the optimal transport directions provided by the gradients of the critic and the directions in which data points actually move when parametrized by a trained generator.
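A minimal sketch, assuming PyTorch, of the kind of update TTC is built on: instead of sampling from a trained generator, points are pushed along the gradients of a sequence of trained critics. The names ttc_push, critics, and step_sizes are illustrative placeholders, not the paper's code, and the sign convention (ascending a critic that is larger on real-looking data) is an assumption.

    import torch

    def ttc_push(samples, critics, step_sizes):
        """Move samples along the gradients of a sequence of trained critics (illustrative)."""
        x = samples
        for critic, step in zip(critics, step_sizes):
            x = x.detach().requires_grad_(True)
            value = critic(x).sum()                  # critic assumed larger on real-looking data
            grad = torch.autograd.grad(value, x)[0]  # gradient of the critic w.r.t. the samples
            x = x + step * grad                      # gradient step towards the real distribution
        return x.detach()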
References
Journal Article

Generative Adversarial Nets

TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are trained simultaneously: a generative model G that captures the data distribution and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
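For context, a minimal sketch (assuming PyTorch; the generator, discriminator, and tensor shapes are illustrative) of the two-player objective this describes, using the non-saturating generator loss:

    import torch
    import torch.nn.functional as F

    def gan_losses(discriminator, generator, real, noise):
        """One batch of GAN losses: D separates real from generated, G tries to fool D."""
        fake = generator(noise)

        real_logits = discriminator(real)
        fake_logits = discriminator(fake.detach())

        # Discriminator: push real logits towards 1 and generated logits towards 0
        d_loss = F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits)) + \
                 F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits))

        # Generator (non-saturating variant): make D classify generated samples as real
        gen_logits = discriminator(fake)
        g_loss = F.binary_cross_entropy_with_logits(gen_logits, torch.ones_like(gen_logits))
        return d_loss, g_loss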
Proceedings Article

Wasserstein Generative Adversarial Networks

TL;DR: This work introduces a new algorithm named WGAN, an alternative to traditional GAN training that can improve the stability of learning, get rid of problems like mode collapse, and provide meaningful learning curves useful for debugging and hyperparameter searches.
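A brief sketch, assuming PyTorch, of one critic update in the original WGAN algorithm; the clip value and network handles are illustrative. The critic estimates the Wasserstein-1 distance as the gap between its mean value on real and generated data, and weight clipping is the original paper's (crude) way of keeping the critic approximately 1-Lipschitz.

    import torch

    def wgan_critic_step(critic, real, fake, optimizer, clip=0.01):
        """One critic update of the original WGAN algorithm (illustrative sketch)."""
        optimizer.zero_grad()
        # Wasserstein-1 estimate: E[critic(real)] - E[critic(fake)]; minimize the negative
        loss = -(critic(real).mean() - critic(fake).mean())
        loss.backward()
        optimizer.step()

        # Weight clipping keeps the critic approximately Lipschitz-bounded
        with torch.no_grad():
            for p in critic.parameters():
                p.clamp_(-clip, clip)
        return -loss.item()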
Book

Convex analysis and variational problems

TL;DR: In this book, the authors consider non-convex variational problems together with a priori estimates in convex programming, and show that such problems can be approached via the minimax theorem.
Proceedings Article

Progressive Growing of GANs for Improved Quality, Stability, and Variation

TL;DR: The authors propose a new training methodology for GANs that grows both the generator and the discriminator progressively, starting from a low resolution and adding new layers that model increasingly fine details as training progresses.
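A minimal sketch, assuming PyTorch-style tensors, of the smooth fade-in blending associated with this progressive-growing scheme (the function and argument names are illustrative, not the paper's code): while a newly added higher-resolution block is introduced, its output is linearly blended with the upsampled output of the previous stage.

    import torch.nn.functional as F

    def fade_in(prev_stage_out, new_block_out, alpha):
        """Blend a newly added high-resolution block into the network.

        alpha ramps from 0 to 1 over training, so the new block is
        introduced smoothly instead of all at once.
        """
        upsampled = F.interpolate(prev_stage_out, size=new_block_out.shape[-2:], mode="nearest")
        return (1.0 - alpha) * upsampled + alpha * new_block_out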
Proceedings Article

Improved Training of Wasserstein GANs

TL;DR: The authors proposed to penalize the norm of the gradient of the critic with respect to its input to improve the training stability of Wasserstein GANs and achieve stable training of a wide variety of GAN architectures with almost no hyperparameter tuning.
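A hedged sketch, assuming PyTorch, of the gradient penalty this refers to, including the interpolation-based sampling strategy discussed in the abstract above; the names and the default penalty weight are illustrative.

    import torch

    def wgan_gp_critic_loss(critic, real, fake, lambda_gp=10.0):
        """WGAN-GP critic loss: negative Wasserstein estimate plus gradient penalty (sketch)."""
        batch_size = real.size(0)

        # Sample points uniformly on segments between real and generated samples
        eps = torch.rand(batch_size, *([1] * (real.dim() - 1)), device=real.device)
        x_hat = (eps * real + (1.0 - eps) * fake.detach()).requires_grad_(True)

        # Gradient of the critic with respect to its input, evaluated at the interpolates
        grads = torch.autograd.grad(critic(x_hat).sum(), x_hat, create_graph=True)[0]

        # Two-sided penalty: push the gradient norm towards 1
        penalty = ((grads.view(batch_size, -1).norm(2, dim=1) - 1.0) ** 2).mean()

        return -(critic(real).mean() - critic(fake).mean()) + lambda_gp * penalty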