Author

Jan Stanczuk

Bio: Jan Stanczuk is an academic researcher from the University of Cambridge. The author has contributed to research in topics: Estimator & Conditional probability. The author has an h-index of 1 and has co-authored 2 publications receiving 10 citations.

Papers
Posted Content
TL;DR: This paper shows that the Wasserstein distance is not even a desirable loss function for deep generative models, and concludes that the success of Wasserstein GANs can in truth be attributed to a failure to approximate the Wasserstein distance.
Abstract: Wasserstein GANs are based on the idea of minimising the Wasserstein distance between a real and a generated distribution. We provide an in-depth mathematical analysis of differences between the theoretical setup and the reality of training Wasserstein GANs. In this work, we gather both theoretical and empirical evidence that the WGAN loss is not a meaningful approximation of the Wasserstein distance. Moreover, we argue that the Wasserstein distance is not even a desirable loss function for deep generative models, and conclude that the success of Wasserstein GANs can in truth be attributed to a failure to approximate the Wasserstein distance.
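
For context, the Wasserstein-1 distance discussed in this abstract is usually written via its Kantorovich-Rubinstein dual, which WGANs approximate with a neural-network critic (a standard formulation given here as background, not quoted from the paper):

W_1(P_r, P_g) = \sup_{\|f\|_{Lip} \le 1} \; \mathbb{E}_{x \sim P_r}[f(x)] - \mathbb{E}_{x \sim P_g}[f(x)].

In practice the supremum over all 1-Lipschitz functions is replaced by a maximum over critics f_\theta drawn from a constrained network family; the paper's argument is that this replacement does not yield a meaningful approximation of W_1.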

10 citations

Posted Content
TL;DR: In this article, the authors conduct a systematic comparison and theoretical analysis of different approaches to learning conditional probability distributions with score-based diffusion models, and prove results which provide a theoretical justification for one of the most successful estimators of the conditional score.
Abstract: Score-based diffusion models have emerged as one of the most promising frameworks for deep generative modelling. In this work we conduct a systematic comparison and theoretical analysis of different approaches to learning conditional probability distributions with score-based diffusion models. In particular, we prove results which provide a theoretical justification for one of the most successful estimators of the conditional score. Moreover, we introduce a multi-speed diffusion framework, which leads to a new estimator for the conditional score, performing on par with previous state-of-the-art approaches. Our theoretical and experimental findings are accompanied by an open source library MSDiff which allows for application and further research of multi-speed diffusion models.
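
As standard background (not excerpted from the abstract), score-based diffusion models learn the score \nabla_x \log p_t(x); in the conditional setting the target becomes the conditional score, commonly estimated by a network trained with a denoising score-matching objective of the form

s_\theta(x_t, y, t) \approx \nabla_{x_t} \log p_t(x_t \mid y),
\min_\theta \; \mathbb{E}_{t,\, (x_0, y),\, x_t \mid x_0} \big[ \lambda(t) \, \| s_\theta(x_t, y, t) - \nabla_{x_t} \log p_t(x_t \mid x_0) \|^2 \big],

where p_t(x_t \mid x_0) is the forward diffusion transition kernel and \lambda(t) a weighting function. The specific estimators compared and justified in the paper are not reproduced here.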

Cited by
Journal ArticleDOI
TL;DR: Introduces DGMs and provides a concise mathematical framework for modeling the three most popular approaches: normalizing flows, variational autoencoders, and generative adversarial networks, and illustrates the advantages and disadvantages of these basic approaches using numerical experiments.
Abstract: Deep generative models (DGM) are neural networks with many hidden layers trained to approximate complicated, high-dimensional probability distributions using a large number of samples. When trained successfully, we can use the DGMs to estimate the likelihood of each observation and to create new samples from the underlying distribution. Developing DGMs has become one of the most hotly researched fields in artificial intelligence in recent years. The literature on DGMs has become vast and is growing rapidly. Some advances have even reached the public sphere, for example, the recent successes in generating realistic-looking images, voices, or movies; so-called deep fakes. Despite these successes, several mathematical and practical issues limit the broader use of DGMs: given a specific dataset, it remains challenging to design and train a DGM and even more challenging to find out why a particular model is or is not effective. To help advance the theoretical understanding of DGMs, we introduce DGMs and provide a concise mathematical framework for modeling the three most popular approaches: normalizing flows (NF), variational autoencoders (VAE), and generative adversarial networks (GAN). We illustrate the advantages and disadvantages of these basic approaches using numerical experiments. Our goal is to enable and motivate the reader to contribute to this proliferating research area. Our presentation also emphasizes relations between generative modeling and optimal transport.
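
To make the "concise mathematical framework" concrete, two of the three approaches admit well-known textbook objectives (standard formulas, not excerpted from this article): a normalizing flow with invertible map f maximizes the exact log-likelihood via the change-of-variables formula, while a GAN solves a minimax game between a generator G and a discriminator D:

\log p_X(x) = \log p_Z(f(x)) + \log \left| \det \frac{\partial f(x)}{\partial x} \right|,
\min_G \max_D \; \mathbb{E}_{x \sim p_{data}}[\log D(x)] + \mathbb{E}_{z \sim p_Z}[\log(1 - D(G(z)))].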

99 citations

Posted Content
TL;DR: In this paper, the authors introduce deep generative models and provide a concise mathematical framework for modeling the three most popular approaches: normalizing flows (NF), variational autoencoders (VAE), and generative adversarial networks (GAN).
Abstract: Deep generative models (DGM) are neural networks with many hidden layers trained to approximate complicated, high-dimensional probability distributions using a large number of samples. When trained successfully, we can use the DGMs to estimate the likelihood of each observation and to create new samples from the underlying distribution. Developing DGMs has become one of the most hotly researched fields in artificial intelligence in recent years. The literature on DGMs has become vast and is growing rapidly. Some advances have even reached the public sphere, for example, the recent successes in generating realistic-looking images, voices, or movies; so-called deep fakes. Despite these successes, several mathematical and practical issues limit the broader use of DGMs: given a specific dataset, it remains challenging to design and train a DGM and even more challenging to find out why a particular model is or is not effective. To help advance the theoretical understanding of DGMs, we introduce DGMs and provide a concise mathematical framework for modeling the three most popular approaches: normalizing flows (NF), variational autoencoders (VAE), and generative adversarial networks (GAN). We illustrate the advantages and disadvantages of these basic approaches using numerical experiments. Our goal is to enable and motivate the reader to contribute to this proliferating research area. Our presentation also emphasizes relations between generative modeling and optimal transport.

5 citations

Posted Content
TL;DR: In this paper, benchmark measures with analytically known ground-truth transport maps, constructed via input-convex neural networks, are used to evaluate the performance of neural network-based optimal transport (OT) solvers for quadratic-cost transport, i.e. computation of the Wasserstein-2 distance.
Abstract: Despite the recent popularity of neural network-based solvers for optimal transport (OT), there is no standard quantitative way to evaluate their performance. In this paper, we address this issue for quadratic-cost transport -- specifically, computation of the Wasserstein-2 distance, a commonly-used formulation of optimal transport in machine learning. To overcome the challenge of computing ground truth transport maps between continuous measures needed to assess these solvers, we use input-convex neural networks (ICNN) to construct pairs of measures whose ground truth OT maps can be obtained analytically. This strategy yields pairs of continuous benchmark measures in high-dimensional spaces such as spaces of images. We thoroughly evaluate existing optimal transport solvers using these benchmark measures. Even though these solvers perform well in downstream tasks, many do not faithfully recover optimal transport maps. To investigate the cause of this discrepancy, we further test the solvers in a setting of image generation. Our study reveals crucial limitations of existing solvers and shows that increased OT accuracy does not necessarily correlate to better results downstream.
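
For reference (standard optimal-transport background, not quoted from the paper), the quadratic-cost Monge problem underlying the Wasserstein-2 distance is

W_2^2(\mu, \nu) = \inf_{T :\, T_\# \mu = \nu} \; \mathbb{E}_{x \sim \mu}\big[ \| x - T(x) \|^2 \big],

and by Brenier's theorem the optimal map is the gradient of a convex potential, T = \nabla \psi. This is what makes the ICNN construction work: if \psi is an input-convex neural network, pushing \mu forward through \nabla\psi yields a pair of measures (\mu, \nabla\psi_\# \mu) whose ground-truth OT map is known by construction.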

3 citations

Posted Content
19 Aug 2020
TL;DR: This paper presents a comprehensive survey of regularization and normalization techniques for GAN training, systematically describing the different perspectives on GAN training and the corresponding purposes that regularization and normalization serve.
Abstract: Generative Adversarial Networks (GANs) have been widely applied in different scenarios thanks to the development of deep neural networks. The original GAN was proposed under the non-parametric assumption of infinite network capacity. It is still unknown whether GANs can generate realistic samples without any prior information. Due to this overconfident assumption, many issues need to be addressed in GAN training, such as non-convergence, mode collapse, gradient vanishing, overfitting, discriminator forgetting, and sensitivity to hyperparameters. As acknowledged, regularization and normalization are common methods of introducing prior information that can be used to stabilize training and improve discrimination. Many regularization and normalization methods have been proposed for GANs. However, as far as we know, no existing survey has particularly focused on the systematic purposes and developments of these solutions. In this work, we perform a comprehensive survey of regularization and normalization techniques from different perspectives of GAN training. First, we systematically describe the different perspectives of GAN training and thus identify the different purposes of regularization and normalization. In accordance with these purposes, we propose a new taxonomy and summarize a large number of existing studies. Furthermore, we fairly compare the performance of the mainstream methods on different datasets and investigate the regularization and normalization techniques that have been frequently employed in SOTA GANs. Finally, we highlight possible future studies in this area.
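
As one concrete example of the kind of normalization such a survey covers (a standard technique, not a claim about this paper's taxonomy), spectral normalization constrains the discriminator's Lipschitz constant by rescaling each weight matrix by its largest singular value:

\bar{W} = W / \sigma(W), \qquad \sigma(W) = \max_{h \ne 0} \frac{\| W h \|_2}{\| h \|_2}.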

2 citations

Posted Content
TL;DR: In this article, it was shown that Wasserstein GANs with Gradient Penalty (WGAN-GP) compute the minimum of a different optimal transport problem, the so-called congested transport, and that the gradients of its solutions determine the time-averaged momentum of optimal mass flow.
Abstract: Wasserstein GANs with Gradient Penalty (WGAN-GP) are an extremely popular method for training generative models to produce high quality synthetic data. While WGAN-GP were initially developed to calculate the Wasserstein 1 distance between generated and real data, recent works (e.g. Stanczuk et al. (2021)) have provided empirical evidence that this does not occur, and have argued that WGAN-GP perform well not in spite of this issue, but because of it. In this paper we show for the first time that WGAN-GP compute the minimum of a different optimal transport problem, the so-called congested transport (Carlier et al. (2008)). Congested transport determines the cost of moving one distribution to another under a transport model that penalizes congestion. For WGAN-GP, we find that the congestion penalty has a spatially varying component determined by the sampling strategy used in Gulrajani et al. (2017) which acts like a local speed limit, making congestion cost less in some regions than others. This aspect of the congested transport problem is new in that the congestion penalty turns out to be unbounded and depend on the distributions to be transported, and so we provide the necessary mathematical proofs for this setting. We use our discovery to show that the gradients of solutions to the optimization problem in WGAN-GP determine the time averaged momentum of optimal mass flow. This is in contrast to the gradients of Kantorovich potentials for the Wasserstein 1 distance, which only determine the normalized direction of flow. This may explain, in support of Stanczuk et al. (2021), the success of WGAN-GP, since the training of the generator is based on these gradients.
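
For reference, the WGAN-GP critic objective discussed here penalizes the gradient norm at points sampled uniformly along straight lines between real and generated samples, i.e. the sampling strategy of Gulrajani et al. (2017) mentioned above (standard formulation given as background, not quoted from the paper):

L = \mathbb{E}_{\tilde{x} \sim P_g}[D(\tilde{x})] - \mathbb{E}_{x \sim P_r}[D(x)] + \lambda \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}\big[ (\| \nabla_{\hat{x}} D(\hat{x}) \|_2 - 1)^2 \big], \qquad \hat{x} = \epsilon x + (1 - \epsilon)\tilde{x}, \; \epsilon \sim U[0,1].

The spatially varying congestion penalty identified in the paper arises from this interpolation-based sampling distribution P_{\hat{x}}.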

1 citation