Open access · Posted Content

Wasserstein GANs Work Because They Fail (to Approximate the Wasserstein Distance)

Abstract: Wasserstein GANs are based on the idea of minimising the Wasserstein distance between a real and a generated distribution. We provide an in-depth mathematical analysis of differences between the theoretical setup and the reality of training Wasserstein GANs. In this work, we gather both theoretical and empirical evidence that the WGAN loss is not a meaningful approximation of the Wasserstein distance. Moreover, we argue that the Wasserstein distance is not even a desirable loss function for deep generative models, and conclude that the success of Wasserstein GANs can in truth be attributed to a failure to approximate the Wasserstein distance.

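For reference, the two objects contrasted in the abstract are the Wasserstein-1 distance in its Kantorovich-Rubinstein dual form and the loss actually optimized during WGAN training; the sketch below uses the standard textbook formulation rather than anything specific to this paper:

    % Wasserstein-1 distance between the real and generated distributions,
    % with the supremum taken over all 1-Lipschitz functions f:
    W_1(\mathbb{P}_r, \mathbb{P}_g) = \sup_{\|f\|_{L} \le 1}
        \mathbb{E}_{x \sim \mathbb{P}_r}[f(x)] - \mathbb{E}_{x \sim \mathbb{P}_g}[f(x)]

    % In practice the supremum is replaced by a maximum over a parametric
    % critic f_w (weight-clipped or gradient-penalized) trained for finitely
    % many steps on minibatches, jointly with the generator g_\theta:
    \min_{\theta} \max_{w}
        \mathbb{E}_{x \sim \mathbb{P}_r}[f_w(x)]
        - \mathbb{E}_{z \sim p(z)}[f_w(g_\theta(z))]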

Citations

10 results found


Open access · Journal Article · DOI: 10.1002/GAMM.202100008
Lars Ruthotto, Eldad Haber
01 Jun 2021 · GAMM-Mitteilungen
Abstract: Deep generative models (DGM) are neural networks with many hidden layers trained to approximate complicated, high-dimensional probability distributions using a large number of samples. When trained successfully, we can use the DGMs to estimate the likelihood of each observation and to create new samples from the underlying distribution. Developing DGMs has become one of the most hotly researched fields in artificial intelligence in recent years. The literature on DGMs has become vast and is growing rapidly. Some advances have even reached the public sphere, for example, the recent successes in generating realistic-looking images, voices, or movies; so-called deep fakes. Despite these successes, several mathematical and practical issues limit the broader use of DGMs: given a specific dataset, it remains challenging to design and train a DGM and even more challenging to find out why a particular model is or is not effective. To help advance the theoretical understanding of DGMs, we introduce DGMs and provide a concise mathematical framework for modeling the three most popular approaches: normalizing flows (NF), variational autoencoders (VAE), and generative adversarial networks (GAN). We illustrate the advantages and disadvantages of these basic approaches using numerical experiments. Our goal is to enable and motivate the reader to contribute to this proliferating research area. Our presentation also emphasizes relations between generative modeling and optimal transport.

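For orientation, the three model families named in the abstract are usually trained with the following standard objectives (a generic summary, not the survey's own notation):

    % Normalizing flows: exact log-likelihood via the change-of-variables
    % formula for an invertible map f_\theta with base density p_Z:
    \log p_\theta(x) = \log p_Z\big(f_\theta^{-1}(x)\big)
        + \log \left| \det J_{f_\theta^{-1}}(x) \right|

    % Variational autoencoders: maximize the evidence lower bound (ELBO):
    \log p_\theta(x) \ge \mathbb{E}_{q_\phi(z \mid x)}[\log p_\theta(x \mid z)]
        - \mathrm{KL}\big(q_\phi(z \mid x) \,\|\, p(z)\big)

    % Generative adversarial networks: minimax game between generator G and discriminator D:
    \min_G \max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}}}[\log D(x)]
        + \mathbb{E}_{z \sim p(z)}[\log(1 - D(G(z)))]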

13 Citations



Open access · Posted Content
Alexander Korotin, Lingxiao Li, Aude Genevay, Justin Solomon, +2 more
03 Jun 2021 · arXiv: Learning
Abstract: Despite the recent popularity of neural network-based solvers for optimal transport (OT), there is no standard quantitative way to evaluate their performance. In this paper, we address this issue for quadratic-cost transport -- specifically, computation of the Wasserstein-2 distance, a commonly used formulation of optimal transport in machine learning. To overcome the challenge of computing the ground-truth transport maps between continuous measures needed to assess these solvers, we use input-convex neural networks (ICNN) to construct pairs of measures whose ground-truth OT maps can be obtained analytically. This strategy yields pairs of continuous benchmark measures in high-dimensional spaces such as spaces of images. We thoroughly evaluate existing optimal transport solvers using these benchmark measures. Even though these solvers perform well in downstream tasks, many do not faithfully recover optimal transport maps. To investigate the cause of this discrepancy, we further test the solvers in an image-generation setting. Our study reveals crucial limitations of existing solvers and shows that increased OT accuracy does not necessarily correlate with better downstream results.

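The construction rests on Brenier's theorem for quadratic cost; the following sketch states the standard fact being exploited, with the details of the paper's benchmark elided:

    % For quadratic cost, the optimal map pushing \mu onto \nu = (\nabla\psi)_{\#}\mu
    % is \nabla\psi itself whenever \psi is convex (Brenier's theorem), so
    W_2^2\big(\mu, (\nabla\psi)_{\#}\mu\big)
        = \mathbb{E}_{x \sim \mu}\,\|x - \nabla\psi(x)\|^2

    % Parametrizing \psi with an input-convex neural network therefore yields
    % pairs of measures whose ground-truth OT map and distance are known by construction.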


3 Citations


Open access · Posted Content
19 Aug 2020
Abstract: Generative Adversarial Networks (GANs) have been widely applied in different scenarios thanks to the development of deep neural networks. The original GAN formulation rests on the non-parametric assumption that the networks have infinite capacity, and it remains unknown whether GANs can generate realistic samples without any prior information. Because of this overly strong assumption, many issues must be addressed in GAN training, such as non-convergence, mode collapse, vanishing gradients, overfitting, discriminator forgetting, and sensitivity to hyperparameters. Regularization and normalization are common ways of introducing prior information and can be used to stabilize training and improve discrimination. Many regularization and normalization methods have been proposed for GANs; however, to the best of our knowledge, no existing survey has focused specifically on the systematic purposes and development of these solutions. In this work, we perform a comprehensive survey of regularization and normalization techniques from different perspectives of GAN training. First, we systematically describe the different perspectives of GAN training and thereby obtain the different purposes of regularization and normalization. In accordance with these purposes, we propose a new taxonomy and summarize a large number of existing studies. Furthermore, we fairly compare the performance of the mainstream methods on different datasets and investigate the regularization and normalization techniques that are frequently employed in state-of-the-art GANs. Finally, we highlight possible directions for future study in this area.

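As one concrete example of the kind of regularizer such surveys cover, here is a minimal PyTorch sketch of the WGAN-GP gradient penalty of Gulrajani et al. (2017); the image-shaped tensors and the critic interface are illustrative assumptions:

    import torch

    def gradient_penalty(critic, real, fake, lambda_gp=10.0):
        # Penalize deviations of the critic's gradient norm from 1 on random
        # interpolates between real and generated samples (assumed [N, C, H, W]).
        batch_size = real.size(0)
        eps = torch.rand(batch_size, 1, 1, 1, device=real.device)
        interp = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
        scores = critic(interp)
        grads = torch.autograd.grad(scores.sum(), interp, create_graph=True)[0]
        grad_norm = grads.view(batch_size, -1).norm(2, dim=1)
        return lambda_gp * ((grad_norm - 1.0) ** 2).mean()

The penalty is added to the critic loss on every update; lambda_gp = 10 is the value commonly quoted in the literature.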


2 Citations


Open access · Posted Content
Tristan Milne, Adrian I. Nachman
01 Sep 2021 · arXiv: Learning
Abstract: Wasserstein GANs with Gradient Penalty (WGAN-GP) are an extremely popular method for training generative models to produce high-quality synthetic data. While WGAN-GP was initially developed to compute the Wasserstein-1 distance between generated and real data, recent works (e.g. Stanczuk et al. (2021)) have provided empirical evidence that this does not occur, and have argued that WGAN-GP performs well not in spite of this issue, but because of it. In this paper we show for the first time that WGAN-GP computes the minimum of a different optimal transport problem, the so-called congested transport (Carlier et al. (2008)). Congested transport determines the cost of moving one distribution to another under a transport model that penalizes congestion. For WGAN-GP, we find that the congestion penalty has a spatially varying component determined by the sampling strategy used in Gulrajani et al. (2017), which acts like a local speed limit, making congestion cost less in some regions than in others. This aspect of the congested transport problem is new, in that the congestion penalty turns out to be unbounded and to depend on the distributions to be transported, and so we provide the necessary mathematical proofs for this setting. We use our discovery to show that the gradients of solutions to the optimization problem in WGAN-GP determine the time-averaged momentum of the optimal mass flow. This is in contrast to the gradients of Kantorovich potentials for the Wasserstein-1 distance, which only determine the normalized direction of flow. This may explain, in support of Stanczuk et al. (2021), the success of WGAN-GP, since the training of the generator is based on these gradients.

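For orientation, congested transport can be contrasted with the classical flow (Beckmann) formulation of the Wasserstein-1 distance; the sketch below gives the generic formulations only, not this paper's specific spatially varying penalty:

    % Beckmann formulation: W_1 is the minimal total flux needed to rearrange \mu
    % into \nu, over vector fields w whose divergence balances the two measures:
    W_1(\mu, \nu) = \min_{\nabla \cdot w \,=\, \mu - \nu} \int |w(x)| \, dx

    % Congested transport replaces the linear cost |w| by a convex, superlinear
    % integrand H, so regions of large local flux (congestion) are penalized:
    \min_{\nabla \cdot w \,=\, \mu - \nu} \int H\big(x, |w(x)|\big) \, dx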

1 Citation


References

36 results found


Open access · Journal Article
Abstract: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from http://scikit-learn.sourceforge.net.

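A minimal usage sketch of the uniform estimator interface the abstract emphasizes (the particular dataset and estimator are arbitrary choices for illustration):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression

    # Load a small benchmark dataset and hold out a test split.
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Every estimator exposes the same fit / predict / score interface.
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print(clf.score(X_test, y_test))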

33,540 Citations


Open access · Journal Article · DOI: 10.1007/S11263-015-0816-Y
Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, +8 more
Abstract: The ImageNet Large Scale Visual Recognition Challenge is a benchmark in object category classification and detection on hundreds of object categories and millions of images. The challenge has been run annually from 2010 to present, attracting participation from more than fifty institutions. This paper describes the creation of this benchmark dataset and the advances in object recognition that have been possible as a result. We discuss the challenges of collecting large-scale ground truth annotation, highlight key breakthroughs in categorical object recognition, provide a detailed analysis of the current state of the field of large-scale image classification and object detection, and compare the state-of-the-art computer vision accuracy with human accuracy. We conclude with lessons learned in the 5 years of the challenge, and propose future directions and improvements.


25,260 Citations


Open access · Dissertation
Alex Krizhevsky
01 Jan 2009
Abstract: In this work we describe how to train a multi-layer generative model of natural images. We use a dataset of millions of tiny colour images, described in the next section. This has been attempted by several groups but without success. The models on which we focus are RBMs (Restricted Boltzmann Machines) and DBNs (Deep Belief Networks). These models learn interesting-looking filters, which we show are more useful to a classifier than the raw pixels. We train the classifier on a labeled subset that we have collected and call the CIFAR-10 dataset.



14,902 Citations


Open access · Journal Article · DOI: 10.1109/TIT.1982.1056489
S. P. Lloyd
Abstract: It has long been realized that in pulse-code modulation (PCM), with a given ensemble of signals to handle, the quantum values should be spaced more closely in the voltage regions where the signal amplitude is more likely to fall. It has been shown by Panter and Dite that, in the limit as the number of quanta becomes infinite, the asymptotic fractional density of quanta per unit voltage should vary as the one-third power of the probability density per unit voltage of signal amplitudes. In this paper the corresponding result for any finite number of quanta is derived; that is, necessary conditions are found that the quanta and associated quantization intervals of an optimum finite quantization scheme must satisfy. The optimization criterion used is that the average quantization noise power be a minimum. It is shown that the result obtained here goes over into the Panter and Dite result as the number of quanta becomes large. The optimum quantization schemes for 2^b quanta, b = 1, 2, ..., 7, are given numerically for Gaussian and for Laplacian distributions of signal amplitudes.

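The necessary conditions lead to the alternating iteration now known as the Lloyd-Max algorithm: place decision thresholds at the midpoints between adjacent quanta, then move each quantum to the conditional mean of its interval. A minimal Monte Carlo sketch for an empirical signal distribution follows; the sample-based approximation is an assumption made for brevity:

    import numpy as np

    def lloyd_max(samples, b=2, iters=200):
        # Approximate the optimal 2**b-level scalar quantizer (minimum mean
        # squared quantization error) for the empirical distribution of `samples`.
        levels = np.quantile(samples, np.linspace(0.05, 0.95, 2 ** b))
        for _ in range(iters):
            # Decision thresholds: midpoints between adjacent quanta.
            edges = (levels[:-1] + levels[1:]) / 2.0
            cells = np.digitize(samples, edges)
            # Quanta: conditional mean (centroid) of each quantization interval.
            levels = np.array([
                samples[cells == k].mean() if np.any(cells == k) else levels[k]
                for k in range(len(levels))
            ])
        return levels

    rng = np.random.default_rng(0)
    print(lloyd_max(rng.standard_normal(100_000)))  # roughly +/-0.45 and +/-1.51 for a unit Gaussian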

9,657 Citations


Open access · Posted Content
19 Nov 2015 · arXiv: Learning
Abstract: In recent years, supervised learning with convolutional networks (CNNs) has seen huge adoption in computer vision applications. Comparatively, unsupervised learning with CNNs has received less attention. In this work we hope to help bridge the gap between the success of CNNs for supervised learning and unsupervised learning. We introduce a class of CNNs called deep convolutional generative adversarial networks (DCGANs), that have certain architectural constraints, and demonstrate that they are a strong candidate for unsupervised learning. Training on various image datasets, we show convincing evidence that our deep convolutional adversarial pair learns a hierarchy of representations from object parts to scenes in both the generator and discriminator. Additionally, we use the learned features for novel tasks - demonstrating their applicability as general image representations.

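A minimal PyTorch sketch of a generator obeying the architectural constraints usually associated with DCGANs (fractionally strided convolutions, batch normalization, ReLU activations, tanh output); the layer widths and 64x64 output size are illustrative assumptions, not the paper's exact configuration:

    import torch.nn as nn

    def dcgan_generator(z_dim=100, base=64, channels=3):
        # Maps noise of shape [N, z_dim, 1, 1] to images of shape
        # [N, channels, 64, 64] with values in [-1, 1].
        return nn.Sequential(
            nn.ConvTranspose2d(z_dim, base * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(base * 8), nn.ReLU(True),
            nn.ConvTranspose2d(base * 8, base * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(base * 4), nn.ReLU(True),
            nn.ConvTranspose2d(base * 4, base * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(base * 2), nn.ReLU(True),
            nn.ConvTranspose2d(base * 2, base, 4, 2, 1, bias=False),
            nn.BatchNorm2d(base), nn.ReLU(True),
            nn.ConvTranspose2d(base, channels, 4, 2, 1, bias=False),
            nn.Tanh(),
        )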


6,739 Citations


Performance Metrics

No. of citations received by the paper in previous years:

    Year    Citations
    2021    7
    2020    3
Network Information
Related Papers (5)

Generative Adversarial Nets (08 Dec 2014)
Ian Goodfellow, Jean Pouget-Abadie, +6 more

Improved Training of Wasserstein GANs (04 Dec 2017)
Ishaan Gulrajani, Faruk Ahmed, +3 more

Wasserstein of Wasserstein Loss for Learning Generative Models (29 Jan 2019)
Yonatan Dukler, Wuchen Li, +2 more

Wasserstein Divergence for GANs (08 Sep 2018)
Jiqing Wu, Zhiwu Huang, +3 more