Author

Jan Stanczuk

Bio: Jan Stanczuk is an academic researcher from the University of Cambridge. The author has contributed to research in topics: Estimator & Conditional probability. The author has an h-index of 1 and has co-authored 2 publications receiving 10 citations.

Papers
Posted Content
TL;DR: This paper shows that the Wasserstein distance is not even a desirable loss function for deep generative models, and concludes that the success of Wasserstein GANs can in truth be attributed to a failure to approximate the Wasserstein distance.
Abstract: Wasserstein GANs are based on the idea of minimising the Wasserstein distance between a real and a generated distribution. We provide an in-depth mathematical analysis of differences between the theoretical setup and the reality of training Wasserstein GANs. In this work, we gather both theoretical and empirical evidence that the WGAN loss is not a meaningful approximation of the Wasserstein distance. Moreover, we argue that the Wasserstein distance is not even a desirable loss function for deep generative models, and conclude that the success of Wasserstein GANs can in truth be attributed to a failure to approximate the Wasserstein distance.
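
For context, the Wasserstein-1 distance discussed in this abstract is usually written via its Kantorovich-Rubinstein dual, which WGANs approximate with a neural-network critic (a standard formulation given here as background, not quoted from the paper):

W_1(P_r, P_g) = \sup_{\|f\|_{Lip} \le 1} \; \mathbb{E}_{x \sim P_r}[f(x)] - \mathbb{E}_{x \sim P_g}[f(x)].

In practice the supremum over all 1-Lipschitz functions is replaced by a maximum over critics f_\theta drawn from a constrained network family; the paper's argument is that this replacement does not yield a meaningful approximation of W_1.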

10 citations

Posted Content
TL;DR: In this article, the authors conduct a systematic comparison and theoretical analysis of different approaches to learning conditional probability distributions with score-based diffusion models, and prove results which provide a theoretical justification for one of the most successful estimators of the conditional score.
Abstract: Score-based diffusion models have emerged as one of the most promising frameworks for deep generative modelling. In this work we conduct a systematic comparison and theoretical analysis of different approaches to learning conditional probability distributions with score-based diffusion models. In particular, we prove results which provide a theoretical justification for one of the most successful estimators of the conditional score. Moreover, we introduce a multi-speed diffusion framework, which leads to a new estimator for the conditional score, performing on par with previous state-of-the-art approaches. Our theoretical and experimental findings are accompanied by an open source library MSDiff which allows for application and further research of multi-speed diffusion models.
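
As standard background (not excerpted from the abstract), score-based diffusion models learn the score \nabla_x \log p_t(x); in the conditional setting the target becomes the conditional score, commonly estimated by a network trained with a denoising score-matching objective of the form

s_\theta(x_t, y, t) \approx \nabla_{x_t} \log p_t(x_t \mid y),
\min_\theta \; \mathbb{E}_{t,\, (x_0, y),\, x_t \mid x_0} \big[ \lambda(t) \, \| s_\theta(x_t, y, t) - \nabla_{x_t} \log p_t(x_t \mid x_0) \|^2 \big],

where p_t(x_t \mid x_0) is the forward diffusion transition kernel and \lambda(t) a weighting function. The specific estimators compared and justified in the paper are not reproduced here.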

Cited by
Journal ArticleDOI
TL;DR: Introduces DGMs and provides a concise mathematical framework for modeling the three most popular approaches: normalizing flows, variational autoencoders, and generative adversarial networks, and illustrates the advantages and disadvantages of these basic approaches using numerical experiments.
Abstract: Deep generative models (DGM) are neural networks with many hidden layers trained to approximate complicated, high-dimensional probability distributions using a large number of samples. When trained successfully, we can use the DGMs to estimate the likelihood of each observation and to create new samples from the underlying distribution. Developing DGMs has become one of the most hotly researched fields in artificial intelligence in recent years. The literature on DGMs has become vast and is growing rapidly. Some advances have even reached the public sphere, for example, the recent successes in generating realistic-looking images, voices, or movies; so-called deep fakes. Despite these successes, several mathematical and practical issues limit the broader use of DGMs: given a specific dataset, it remains challenging to design and train a DGM and even more challenging to find out why a particular model is or is not effective. To help advance the theoretical understanding of DGMs, we introduce DGMs and provide a concise mathematical framework for modeling the three most popular approaches: normalizing flows (NF), variational autoencoders (VAE), and generative adversarial networks (GAN). We illustrate the advantages and disadvantages of these basic approaches using numerical experiments. Our goal is to enable and motivate the reader to contribute to this proliferating research area. Our presentation also emphasizes relations between generative modeling and optimal transport.
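
To make the "concise mathematical framework" concrete, two of the three approaches admit well-known textbook objectives (standard formulas, not excerpted from this article): a normalizing flow with invertible map f maximizes the exact log-likelihood via the change-of-variables formula, while a GAN solves a minimax game between a generator G and a discriminator D:

\log p_X(x) = \log p_Z(f(x)) + \log \left| \det \frac{\partial f(x)}{\partial x} \right|,
\min_G \max_D \; \mathbb{E}_{x \sim p_{data}}[\log D(x)] + \mathbb{E}_{z \sim p_Z}[\log(1 - D(G(z)))].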

99 citations

Posted Content
TL;DR: In this paper, the authors introduce deep generative models and provide a concise mathematical framework for modeling the three most popular approaches: normalizing flows (NF), variational autoencoders (VAE), and generative adversarial networks (GAN).
Abstract: Deep generative models (DGM) are neural networks with many hidden layers trained to approximate complicated, high-dimensional probability distributions using a large number of samples. When trained successfully, we can use the DGMs to estimate the likelihood of each observation and to create new samples from the underlying distribution. Developing DGMs has become one of the most hotly researched fields in artificial intelligence in recent years. The literature on DGMs has become vast and is growing rapidly. Some advances have even reached the public sphere, for example, the recent successes in generating realistic-looking images, voices, or movies; so-called deep fakes. Despite these successes, several mathematical and practical issues limit the broader use of DGMs: given a specific dataset, it remains challenging to design and train a DGM and even more challenging to find out why a particular model is or is not effective. To help advance the theoretical understanding of DGMs, we introduce DGMs and provide a concise mathematical framework for modeling the three most popular approaches: normalizing flows (NF), variational autoencoders (VAE), and generative adversarial networks (GAN). We illustrate the advantages and disadvantages of these basic approaches using numerical experiments. Our goal is to enable and motivate the reader to contribute to this proliferating research area. Our presentation also emphasizes relations between generative modeling and optimal transport.

5 citations

Posted Content
TL;DR: In this paper, benchmark measures with analytically known ground-truth transport maps, constructed via input-convex neural networks, are used to evaluate the performance of neural network-based optimal transport (OT) solvers for quadratic-cost transport, i.e. computation of the Wasserstein-2 distance.
Abstract: Despite the recent popularity of neural network-based solvers for optimal transport (OT), there is no standard quantitative way to evaluate their performance. In this paper, we address this issue for quadratic-cost transport -- specifically, computation of the Wasserstein-2 distance, a commonly-used formulation of optimal transport in machine learning. To overcome the challenge of computing ground truth transport maps between continuous measures needed to assess these solvers, we use input-convex neural networks (ICNN) to construct pairs of measures whose ground truth OT maps can be obtained analytically. This strategy yields pairs of continuous benchmark measures in high-dimensional spaces such as spaces of images. We thoroughly evaluate existing optimal transport solvers using these benchmark measures. Even though these solvers perform well in downstream tasks, many do not faithfully recover optimal transport maps. To investigate the cause of this discrepancy, we further test the solvers in a setting of image generation. Our study reveals crucial limitations of existing solvers and shows that increased OT accuracy does not necessarily correlate to better results downstream.
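
For reference (standard optimal-transport background, not quoted from the paper), the quadratic-cost Monge problem underlying the Wasserstein-2 distance is

W_2^2(\mu, \nu) = \inf_{T :\, T_\# \mu = \nu} \; \mathbb{E}_{x \sim \mu}\big[ \| x - T(x) \|^2 \big],

and by Brenier's theorem the optimal map is the gradient of a convex potential, T = \nabla \psi. This is what makes the ICNN construction work: if \psi is an input-convex neural network, pushing \mu forward through \nabla\psi yields a pair of measures (\mu, \nabla\psi_\# \mu) whose ground-truth OT map is known by construction.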

3 citations

Posted Content
19 Aug 2020
TL;DR: This paper presents a comprehensive survey of regularization and normalization techniques for GAN training, systematically describing the different perspectives on GAN training and the corresponding purposes that regularization and normalization serve.
Abstract: Generative Adversarial Networks (GANs) have been widely applied in different scenarios thanks to the development of deep neural networks. The original GAN was proposed under the non-parametric assumption of infinite network capacity. It is still unknown whether GANs can generate realistic samples without any prior information. Due to this overconfident assumption, many issues need to be addressed in GAN training, such as non-convergence, mode collapse, gradient vanishing, overfitting, discriminator forgetting, and sensitivity to hyperparameters. As acknowledged, regularization and normalization are common methods of introducing prior information that can be used to stabilize training and improve discrimination. Many regularization and normalization methods have been proposed for GANs. However, as far as we know, no existing survey has particularly focused on the systematic purposes and developments of these solutions. In this work, we perform a comprehensive survey of regularization and normalization techniques from different perspectives of GAN training. First, we systematically describe the different perspectives of GAN training and thus identify the different purposes of regularization and normalization. In accordance with these purposes, we propose a new taxonomy and summarize a large number of existing studies. Furthermore, we fairly compare the performance of the mainstream methods on different datasets and investigate the regularization and normalization techniques that have been frequently employed in SOTA GANs. Finally, we highlight possible future studies in this area.
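
As one concrete example of the kind of normalization such a survey covers (a standard technique, not a claim about this paper's taxonomy), spectral normalization constrains the discriminator's Lipschitz constant by rescaling each weight matrix by its largest singular value:

\bar{W} = W / \sigma(W), \qquad \sigma(W) = \max_{h \ne 0} \frac{\| W h \|_2}{\| h \|_2}.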

2 citations

Posted Content
TL;DR: In this article, it was shown that Wasserstein GANs with Gradient Penalty (WGAN-GP) compute the minimum of a different optimal transport problem, the so-called congested transport, and that the gradients of its solutions determine the time-averaged momentum of optimal mass flow.
Abstract: Wasserstein GANs with Gradient Penalty (WGAN-GP) are an extremely popular method for training generative models to produce high quality synthetic data. While WGAN-GP were initially developed to calculate the Wasserstein 1 distance between generated and real data, recent works (e.g. Stanczuk et al. (2021)) have provided empirical evidence that this does not occur, and have argued that WGAN-GP perform well not in spite of this issue, but because of it. In this paper we show for the first time that WGAN-GP compute the minimum of a different optimal transport problem, the so-called congested transport (Carlier et al. (2008)). Congested transport determines the cost of moving one distribution to another under a transport model that penalizes congestion. For WGAN-GP, we find that the congestion penalty has a spatially varying component determined by the sampling strategy used in Gulrajani et al. (2017) which acts like a local speed limit, making congestion cost less in some regions than others. This aspect of the congested transport problem is new in that the congestion penalty turns out to be unbounded and depend on the distributions to be transported, and so we provide the necessary mathematical proofs for this setting. We use our discovery to show that the gradients of solutions to the optimization problem in WGAN-GP determine the time averaged momentum of optimal mass flow. This is in contrast to the gradients of Kantorovich potentials for the Wasserstein 1 distance, which only determine the normalized direction of flow. This may explain, in support of Stanczuk et al. (2021), the success of WGAN-GP, since the training of the generator is based on these gradients.
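
For reference, the WGAN-GP critic objective discussed here penalizes the gradient norm at points sampled uniformly along straight lines between real and generated samples, i.e. the sampling strategy of Gulrajani et al. (2017) mentioned above (standard formulation given as background, not quoted from the paper):

L = \mathbb{E}_{\tilde{x} \sim P_g}[D(\tilde{x})] - \mathbb{E}_{x \sim P_r}[D(x)] + \lambda \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}\big[ (\| \nabla_{\hat{x}} D(\hat{x}) \|_2 - 1)^2 \big], \qquad \hat{x} = \epsilon x + (1 - \epsilon)\tilde{x}, \; \epsilon \sim U[0,1].

The spatially varying congestion penalty identified in the paper arises from this interpolation-based sampling distribution P_{\hat{x}}.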

1 citation