Wasserstein Generative Adversarial Networks

Proceedings Article•

Martin Arjovsky¹, Soumith Chintala², Léon Bottou²•Institutions (2)

Courant Institute of Mathematical Sciences¹, Facebook²

17 Jul 2017, pp. 214-223

TL;DR: This work introduces a new algorithm named WGAN, an alternative to traditional GAN training that can improve the stability of learning, get rid of problems like mode collapse, and provide meaningful learning curves useful for debugging and hyperparameter searches.

Abstract: We introduce a new algorithm named WGAN, an alternative to traditional GAN training. In this new model, we show that we can improve the stability of learning, get rid of problems like mode collapse, and provide meaningful learning curves useful for debugging and hyperparameter searches. Furthermore, we show that the corresponding optimization problem is sound, and provide extensive theoretical work highlighting the deep connections to different distances between distributions.

Citations

Proceedings Article•

Spectral Normalization for Generative Adversarial Networks

Takeru Miyato¹, Toshiki Kataoka, Masanori Koyama², Yuichi Yoshida³•Institutions (3)

Kyoto University¹, Ritsumeikan University², National Institute of Informatics³

15 Feb 2018

TL;DR: In this paper, the authors proposed a novel weight normalization technique called spectral normalization to stabilize the training of the discriminator, which is computationally light and easy to incorporate into existing implementations.

Abstract: One of the challenges in the study of generative adversarial networks is the instability of their training. In this paper, we propose a novel weight normalization technique called spectral normalization to stabilize the training of the discriminator. Our new normalization technique is computationally light and easy to incorporate into existing implementations. We tested the efficacy of spectral normalization on the CIFAR10, STL-10, and ILSVRC2012 datasets, and we experimentally confirmed that spectrally normalized GANs (SN-GANs) are capable of generating images of better or equal quality relative to the previous training stabilization techniques.
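
The abstract above names spectral normalization but does not spell out the mechanics. As a rough illustration only, the sketch below rescales a weight matrix by an estimate of its largest singular value, obtained with power iteration. The function names (`spectral_norm`, `spectrally_normalize`), the fixed iteration count, and the pure-Python matrix helpers are assumptions made for this example, not code from the paper.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def transpose(W):
    """Return the transpose of W as a list of rows."""
    return [list(col) for col in zip(*W)]


def l2_normalize(x, eps=1e-12):
    """Scale x to unit Euclidean length (eps guards against division by zero)."""
    norm = math.sqrt(sum(xi * xi for xi in x))
    return [xi / (norm + eps) for xi in x]


def spectral_norm(W, n_iters=50):
    """Estimate the largest singular value of W by power iteration.

    Assumption: a fixed number of iterations on a fixed matrix,
    starting from an all-ones vector.
    """
    Wt = transpose(W)
    u = l2_normalize([1.0] * len(W))
    v = l2_normalize(matvec(Wt, u))
    for _ in range(n_iters):
        v = l2_normalize(matvec(Wt, u))
        u = l2_normalize(matvec(W, v))
    # sigma is approximated by u^T W v
    return sum(ui * wi for ui, wi in zip(u, matvec(W, v)))


def spectrally_normalize(W, n_iters=50):
    """Divide every entry of W by its estimated spectral norm."""
    sigma = spectral_norm(W, n_iters)
    return [[w / sigma for w in row] for row in W]


if __name__ == "__main__":
    W = [[3.0, 0.0], [0.0, 1.0]]
    print(spectral_norm(W))
    print(spectrally_normalize(W))
```

After normalization the matrix has spectral norm close to 1, which bounds how much the corresponding linear layer can stretch its input.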

2,640 citations

Proceedings Article•DOI•

StarGAN: Unified Generative Adversarial Networks for Multi-domain Image-to-Image Translation

Yunjey Choi¹, Minje Choi¹, Munyoung Kim², Jung-Woo Ha³, Sunghun Kim⁴, Jaegul Choo¹•Institutions (4)

Korea University¹, The College of New Jersey², Naver Corporation³, Hong Kong University of Science and Technology⁴

18 Jun 2018

TL;DR: StarGAN is a unified model architecture that performs image-to-image translation for multiple domains using only a single model, which leads to superior quality of translated images compared to existing models as well as the capability of flexibly translating an input image to any desired target domain.

Abstract: Recent studies have shown remarkable success in image-to-image translation for two domains. However, existing approaches have limited scalability and robustness in handling more than two domains, since different models should be built independently for every pair of image domains. To address this limitation, we propose StarGAN, a novel and scalable approach that can perform image-to-image translations for multiple domains using only a single model. Such a unified model architecture of StarGAN allows simultaneous training of multiple datasets with different domains within a single network. This leads to StarGAN's superior quality of translated images compared to existing models as well as the novel capability of flexibly translating an input image to any desired target domain. We empirically demonstrate the effectiveness of our approach on facial attribute transfer and facial expression synthesis tasks.

2,479 citations

Proceedings Article•DOI•

Semantic Image Synthesis With Spatially-Adaptive Normalization

Taesung Park¹, Ming-Yu Liu², Ting-Chun Wang², Jun-Yan Zhu³•Institutions (3)

University of California, Berkeley¹, Nvidia², Massachusetts Institute of Technology³

18 Mar 2019

TL;DR: Spatially-adaptive normalization is proposed, a simple but effective layer for synthesizing photorealistic images given an input semantic layout that allows users to easily control the style and content of image synthesis results as well as create multi-modal results.

Abstract: We propose spatially-adaptive normalization, a simple but effective layer for synthesizing photorealistic images given an input semantic layout. Previous methods directly feed the semantic layout as input to the network, forcing the network to memorize the information throughout all the layers. Instead, we propose using the input layout for modulating the activations in normalization layers through a spatially-adaptive, learned affine transformation. Experiments on several challenging datasets demonstrate the superiority of our method compared to existing approaches, regarding both visual fidelity and alignment with input layouts. Finally, our model allows users to easily control the style and content of image synthesis results as well as create multi-modal results. Code is available upon publication.

2,159 citations

Cites methods from "Wasserstein Generative Adversarial ..."

...s function used in pix2pixHD except that we replace the least squared loss term [28] with the hinge loss term [25,30,45]. We test several ResNet-based discriminators used in recent unconditional GANs [1,29,31] but observe similar results at the cost of a higher GPU memory requirement. Adding the SPADE to the discriminator also yields a similar performance. For the loss function, we observe that removing an...

Posted Content•

StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation

Yunjey Choi¹, Minje Choi¹, Munyoung Kim², Jung-Woo Ha³, Sunghun Kim⁴, Jaegul Choo¹•Institutions (4)

Korea University¹, The College of New Jersey², Naver Corporation³, Hong Kong University of Science and Technology⁴

24 Nov 2017-arXiv: Computer Vision and Pattern Recognition

TL;DR: A unified model architecture of StarGAN allows simultaneous training of multiple datasets with different domains within a single network, which leads to StarGAN's superior quality of translated images compared to existing models as well as the novel capability of flexibly translating an input image to any desired target domain.

2,033 citations

Cites background or methods from "Wasserstein Generative Adversarial ..."

...(1) with Wasserstein GAN objective with gradient penalty [1, 4] defined as...
...Generative adversarial networks (GANs) [3] have shown remarkable results in various computer vision tasks such as image generation [1, 6, 23, 31], image translation [7, 8, 32], super-resolution imaging [13], and face image synthesis [9, 15, 25, 30]....

Book Chapter•DOI•

Multimodal Unsupervised Image-to-Image Translation

Xun Huang¹, Ming-Yu Liu², Serge Belongie¹, Jan Kautz²•Institutions (2)

Cornell University¹, Nvidia²

08 Sep 2018

TL;DR: In this article, the authors propose a multimodal unsupervised image-to-image (MUNIT) framework, where the image representation can be decomposed into a content code that is domain-invariant and a style code that captures domain-specific properties.

Abstract: Unsupervised image-to-image translation is an important and challenging problem in computer vision. Given an image in the source domain, the goal is to learn the conditional distribution of corresponding images in the target domain, without seeing any examples of corresponding image pairs. While this conditional distribution is inherently multimodal, existing approaches make an overly simplified assumption, modeling it as a deterministic one-to-one mapping. As a result, they fail to generate diverse outputs from a given source domain image. To address this limitation, we propose a Multimodal Unsupervised Image-to-image Translation (MUNIT) framework. We assume that the image representation can be decomposed into a content code that is domain-invariant, and a style code that captures domain-specific properties. To translate an image to another domain, we recombine its content code with a random style code sampled from the style space of the target domain. We analyze the proposed framework and establish several theoretical results. Extensive experiments with comparisons to state-of-the-art approaches further demonstrate the advantage of the proposed framework. Moreover, our framework allows users to control the style of translation outputs by providing an example style image. Code and pretrained models are available at https://github.com/nvlabs/MUNIT.

1,874 citations

References

Proceedings Article•

Adam: A Method for Stochastic Optimization

Diederik P. Kingma¹, Jimmy Ba²•Institutions (2)

University of Amsterdam¹, University of Toronto²

01 Jan 2015

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

Abstract: We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has low memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, by which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.
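
To make "adaptive estimates of lower-order moments" concrete, here is a minimal sketch of a single-parameter Adam update in plain Python. It keeps exponential moving averages of the gradient and of its square, applies bias correction, and scales the step accordingly. The function names (`adam_step`, `minimize_quadratic`), the toy objective (θ − 3)², and the learning rate of 0.05 are assumptions chosen for illustration; the remaining hyper-parameter defaults are the commonly used values.

```python
import math


def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a scalar parameter.

    m and v are running estimates of the first and second moments
    of the gradient; t is the 1-based step counter.
    """
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


def minimize_quadratic(steps=2000, lr=0.05):
    """Minimize the toy objective (theta - 3)^2 (hypothetical example)."""
    theta, m, v = 5.0, 0.0, 0.0
    for t in range(1, steps + 1):
        grad = 2.0 * (theta - 3.0)
        theta, m, v = adam_step(theta, grad, m, v, t, lr=lr)
    return theta


if __name__ == "__main__":
    print(minimize_quadratic())
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly `lr` regardless of the gradient's scale. This is one way to see the invariance to gradient rescaling mentioned in the abstract.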

111,197 citations

Journal Article•DOI•

Generative Adversarial Nets

Ian Goodfellow¹, Jean Pouget-Abadie¹, Mehdi Mirza¹, Bing Xu¹, David Warde-Farley¹, Sherjil Ozair², Aaron Courville¹, Yoshua Bengio¹•Institutions (2)

Université de Montréal¹, Indian Institute of Technology Delhi²

08 Dec 2014

TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are simultaneously trained: a generative model G that captures the data distribution and a discriminative model D that estimates the probability that a sample came from the training data rather than G.

Abstract: We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to ½ everywhere. In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation. There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.

38,211 citations

"Wasserstein Generative Adversarial ..." refers background or methods in this paper

...GANs offer much more flexibility in the definition of the objective function, including Jensen-Shannon (Goodfellow et al., 2014), and all f-divergences (Nowozin et al., 2016) as well as some exotic combinations (Huszar, 2015)....
...This is due to the fact that mode collapse comes from the fact that the optimal generator for a fixed discriminator is a sum of deltas on the points the discriminator assigns the highest values, as observed by (Goodfellow et al., 2014) and highlighted in (Metz et al., 2016)....
...Variational Auto-Encoders (VAEs) (Kingma & Welling, 2013) and Generative Adversarial Networks (GANs) (Goodfellow et al., 2014) are well known examples of this approach....
...• JS(P_n, P) → 0 with JS the Jensen-Shannon divergence....
...Our baseline comparison is DCGAN (Radford et al., 2015), a GAN with a convolutional architecture trained with the standard GAN procedure using the −log D trick (Goodfellow et al., 2014)....

Proceedings Article•

Auto-Encoding Variational Bayes

Diederik P. Kingma¹, Max Welling¹•Institutions (1)

University of Amsterdam¹

01 Jan 2014

TL;DR: A stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case is introduced.

Abstract: How can we perform efficient inference and learning in directed probabilistic models, in the presence of continuous latent variables with intractable posterior distributions, and large datasets? We introduce a stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case. Our contribution is two-fold. First, we show that a reparameterization of the variational lower bound yields a lower bound estimator that can be straightforwardly optimized using standard stochastic gradient methods. Second, we show that for i.i.d. datasets with continuous latent variables per datapoint, posterior inference can be made especially efficient by fitting an approximate inference model (also called a recognition model) to the intractable posterior using the proposed lower bound estimator. Theoretical advantages are reflected in experimental results.

20,769 citations

Posted Content•

Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks

Alec Radford, Luke Metz, Soumith Chintala¹•Institutions (1)

Facebook¹

19 Nov 2015-arXiv: Learning

TL;DR: This work introduces a class of CNNs called deep convolutional generative adversarial networks (DCGANs), that have certain architectural constraints, and demonstrates that they are a strong candidate for unsupervised learning.

Abstract: In recent years, supervised learning with convolutional networks (CNNs) has seen huge adoption in computer vision applications. Comparatively, unsupervised learning with CNNs has received less attention. In this work we hope to help bridge the gap between the success of CNNs for supervised learning and unsupervised learning. We introduce a class of CNNs called deep convolutional generative adversarial networks (DCGANs), that have certain architectural constraints, and demonstrate that they are a strong candidate for unsupervised learning. Training on various image datasets, we show convincing evidence that our deep convolutional adversarial pair learns a hierarchy of representations from object parts to scenes in both the generator and discriminator. Additionally, we use the learned features for novel tasks - demonstrating their applicability as general image representations.

6,759 citations

"Wasserstein Generative Adversarial ..." refers methods in this paper

...We keep the convolutional DCGAN architecture for the WGAN critic or the GAN discriminator....
...Our baseline comparison is DCGAN (Radford et al., 2015), a GAN with a convolutional architecture trained with the standard GAN procedure using the −log D trick (Goodfellow et al., 2014)....
...In other words, the JS distance saturates, the discriminator has zero loss, and the generated samples are in some cases meaningful (DCGAN generator, top right plot) and in other cases collapse to a single nonsensical image (Goodfellow et al., 2014)....
...Besides the convolutional DCGAN architecture, we also ran experiments where we replace the generator or both the generator and the critic by 4-layer ReLU-MLP with 512 hidden units....
...We illustrate this by running experiments on three generator architectures: (1) a convolutional DCGAN generator, (2) a convolutional DCGAN generator without batch normalization and with a constant number of filters (the capacity of the generator is drastically smaller than that of the discriminator), and (3) a 4-layer ReLU-MLP with 512 hidden units....

Proceedings Article•

Asynchronous methods for deep reinforcement learning

Volodymyr Mnih¹, Adrià Puigdomènech Badia¹, Mehdi Mirza², Alex Graves¹, Tim Harley¹, Timothy P. Lillicrap¹, David Silver¹, Koray Kavukcuoglu¹•Institutions (2)

Google¹, Université de Montréal²

19 Jun 2016

TL;DR: A conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers and shows that asynchronous actor-critic succeeds on a wide variety of continuous motor control problems as well as on a new task of navigating random 3D mazes using a visual input.

Abstract: We propose a conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers. We present asynchronous variants of four standard reinforcement learning algorithms and show that parallel actor-learners have a stabilizing effect on training allowing all four methods to successfully train neural network controllers. The best performing method, an asynchronous variant of actor-critic, surpasses the current state-of-the-art on the Atari domain while training for half the time on a single multi-core CPU instead of a GPU. Furthermore, we show that asynchronous actor-critic succeeds on a wide variety of continuous motor control problems as well as on a new task of navigating random 3D mazes using a visual input.

6,736 citations