Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks

Home
/
Papers
/
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks

Posted Content•

Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks

Alec Radford, Luke Metz, Soumith Chintala¹•Institutions (1)

19 Nov 2015-arXiv: Learning-

TL;DR: This work introduces a class of CNNs called deep convolutional generative adversarial networks (DCGANs), that have certain architectural constraints, and demonstrates that they are a strong candidate for unsupervised learning.

read less

Abstract: In recent years, supervised learning with convolutional networks (CNNs) has seen huge adoption in computer vision applications. Comparatively, unsupervised learning with CNNs has received less attention. In this work we hope to help bridge the gap between the success of CNNs for supervised learning and unsupervised learning. We introduce a class of CNNs called deep convolutional generative adversarial networks (DCGANs), that have certain architectural constraints, and demonstrate that they are a strong candidate for unsupervised learning. Training on various image datasets, we show convincing evidence that our deep convolutional adversarial pair learns a hierarchy of representations from object parts to scenes in both the generator and discriminator. Additionally, we use the learned features for novel tasks - demonstrating their applicability as general image representations.

...read moreread less

Citations

PDF

Open Access

More filters

Proceedings Article•DOI•

Image-to-Image Translation with Conditional Adversarial Networks

[...]

Phillip Isola¹, Jun-Yan Zhu¹, Tinghui Zhou¹, Alexei A. Efros¹•Institutions (1)

University of California, Berkeley¹

21 Jul 2017

TL;DR: Conditional adversarial networks are investigated as a general-purpose solution to image-to-image translation problems and it is demonstrated that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks.

...read moreread less

Abstract: We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks. Moreover, since the release of the pix2pix software associated with this paper, hundreds of twitter users have posted their own artistic experiments using our system. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without handengineering our loss functions either.

...read moreread less

11,958 citations

Cites background or methods from "Unsupervised Representation Learnin..."

...We adapt our generator and discriminator architectures from those in [41]....
[...]
...Fortunately, this is exactly what is done by the recently proposed Generative Adversarial Networks (GANs) [22, 12, 41, 49, 59]....
[...]

Proceedings Article•DOI•

Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks

[...]

Jun-Yan Zhu¹, Taesung Park¹, Phillip Isola¹, Alexei A. Efros¹•Institutions (1)

University of California, Berkeley¹

01 Oct 2017

TL;DR: CycleGAN as discussed by the authors learns a mapping G : X → Y such that the distribution of images from G(X) is indistinguishable from the distribution Y using an adversarial loss.

...read moreread less

Abstract: Image-to-image translation is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image using a training set of aligned image pairs. However, for many tasks, paired training data will not be available. We present an approach for learning to translate an image from a source domain X to a target domain Y in the absence of paired examples. Our goal is to learn a mapping G : X → Y such that the distribution of images from G(X) is indistinguishable from the distribution Y using an adversarial loss. Because this mapping is highly under-constrained, we couple it with an inverse mapping F : Y → X and introduce a cycle consistency loss to push F(G(X)) ≈ X (and vice versa). Qualitative results are presented on several tasks where paired training data does not exist, including collection style transfer, object transfiguration, season transfer, photo enhancement, etc. Quantitative comparisons against several prior methods demonstrate the superiority of our approach.

...read moreread less

11,682 citations

Posted Content•

Image-to-Image Translation with Conditional Adversarial Networks

[...]

Phillip Isola¹, Jun-Yan Zhu¹, Tinghui Zhou¹, Alexei A. Efros¹•Institutions (1)

University of California, Berkeley¹

21 Nov 2016-arXiv: Computer Vision and Pattern Recognition

TL;DR: Conditional Adversarial Network (CA) as discussed by the authors is a general-purpose solution to image-to-image translation problems, which can be used to synthesize photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks.

...read moreread less

Abstract: We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks. Indeed, since the release of the pix2pix software associated with this paper, a large number of internet users (many of them artists) have posted their own experiments with our system, further demonstrating its wide applicability and ease of adoption without the need for parameter tweaking. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without hand-engineering our loss functions either.

...read moreread less

11,127 citations

Posted Content•

Decoupled Weight Decay Regularization

[...]

Ilya Loshchilov¹, Frank Hutter¹•Institutions (1)

University of Freiburg¹

14 Nov 2017-arXiv: Learning

TL;DR: This work proposes a simple modification to recover the original formulation of weight decay regularization by decoupling the weight decay from the optimization steps taken w.r.t. the loss function, and provides empirical evidence that this modification substantially improves Adam's generalization performance.

...read moreread less

Abstract: L$_2$ regularization and weight decay regularization are equivalent for standard stochastic gradient descent (when rescaled by the learning rate), but as we demonstrate this is \emph{not} the case for adaptive gradient algorithms, such as Adam. While common implementations of these algorithms employ L$_2$ regularization (often calling it "weight decay" in what may be misleading due to the inequivalence we expose), we propose a simple modification to recover the original formulation of weight decay regularization by \emph{decoupling} the weight decay from the optimization steps taken w.r.t. the loss function. We provide empirical evidence that our proposed modification (i) decouples the optimal choice of weight decay factor from the setting of the learning rate for both standard SGD and Adam and (ii) substantially improves Adam's generalization performance, allowing it to compete with SGD with momentum on image classification datasets (on which it was previously typically outperformed by the latter). Our proposed decoupled weight decay has already been adopted by many researchers, and the community has implemented it in TensorFlow and PyTorch; the complete source code for our experiments is available at this https URL

...read moreread less

6,909 citations

Cites methods from "Unsupervised Representation Learnin..."

...…gradient methods, such as AdaGrad (Duchi et al., 2011), RMSProp (Tieleman & Hinton, 2012), Adam (Kingma & Ba, 2014) and most recently AMSGrad (Reddi et al., 2018) have become a default method of choice for training feed-forward and recurrent neural networks (Xu et al., 2015; Radford et al., 2015)....
[...]
..., 2018) have become a default method of choice for training feed-forward and recurrent neural networks (Xu et al., 2015; Radford et al., 2015)....
[...]

Journal Article•DOI•

A survey on Image Data Augmentation for Deep Learning

[...]

Connor Shorten¹, Taghi M. Khoshgoftaar¹•Institutions (1)

Florida Atlantic University¹

06 Jul 2019-Journal of Big Data

TL;DR: This survey will present existing methods for Data Augmentation, promising developments, and meta-level decisions for implementing DataAugmentation, a data-space solution to the problem of limited data.

...read moreread less

Abstract: Deep convolutional neural networks have performed remarkably well on many Computer Vision tasks. However, these networks are heavily reliant on big data to avoid overfitting. Overfitting refers to the phenomenon when a network learns a function with very high variance such as to perfectly model the training data. Unfortunately, many application domains do not have access to big data, such as medical image analysis. This survey focuses on Data Augmentation, a data-space solution to the problem of limited data. Data Augmentation encompasses a suite of techniques that enhance the size and quality of training datasets such that better Deep Learning models can be built using them. The image augmentation algorithms discussed in this survey include geometric transformations, color space augmentations, kernel filters, mixing images, random erasing, feature space augmentation, adversarial training, generative adversarial networks, neural style transfer, and meta-learning. The application of augmentation methods based on GANs are heavily covered in this survey. In addition to augmentation techniques, this paper will briefly discuss other characteristics of Data Augmentation such as test-time augmentation, resolution impact, final dataset size, and curriculum learning. This survey will present existing methods for Data Augmentation, promising developments, and meta-level decisions for implementing Data Augmentation. Readers will understand how Data Augmentation can improve the performance of their models and expand limited datasets to take advantage of the capabilities of big data.

...read moreread less

5,782 citations

Cites background or methods from "Unsupervised Representation Learnin..."

...The DCGAN [91] architecture was proposed to expand on the internal complexity of the generator and discriminator networks....
[...]
...Amongst these new architectures, DCGANs, Progressively Growing GANs, CycleGANs, and Conditional GANs seem to have the most application potential in Data Augmentation....
[...]
...The DCGAN was tested to generate results on the LSUN interior bedroom image dataset, each image being 64 × 64 × 3, for a total of 12,288 pixels, (compared to 784 in MNIST)....
[...]
...architectures, the use of super-resolution networks such as SRGAN could be an effective technique for improving the quality of outputs from a DCGAN [91] model....
[...]
...After using classical augmentations to achieve 78.6% sensitivity and 88.4% specificity, they observed an increase to 85.7% sensitivity and 92.4% specificity once they added the DCGAN-generated samples....
[...]

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse

References

PDF

Open Access

More filters

Posted Content•

A note on the evaluation of generative models

[...]

Lucas Theis¹, Aaron van den Oord², Matthias Bethge¹•Institutions (2)

University of Tübingen¹, Ghent University²

05 Nov 2015-arXiv: Machine Learning

TL;DR: The authors show that Parzen window estimates are largely independent of each other when the data is high-dimensional and that extrapolation from one criterion to another is not warranted and generative models need to be evaluated directly with respect to the application(s) they were intended for.

...read moreread less

Abstract: Probabilistic generative models can be used for compression, denoising, inpainting, texture synthesis, semi-supervised learning, unsupervised feature learning, and other tasks. Given this wide range of applications, it is not surprising that a lot of heterogeneity exists in the way these models are formulated, trained, and evaluated. As a consequence, direct comparison between models is often difficult. This article reviews mostly known but often underappreciated properties relating to the evaluation and interpretation of generative models with a focus on image models. In particular, we show that three of the currently most commonly used criteria---average log-likelihood, Parzen window estimates, and visual fidelity of samples---are largely independent of each other when the data is high-dimensional. Good performance with respect to one criterion therefore need not imply good performance with respect to the other criteria. Our results show that extrapolation from one criterion to another is not warranted and generative models need to be evaluated directly with respect to the application(s) they were intended for. In addition, we provide examples demonstrating that Parzen window estimates should generally be avoided.

...read moreread less

210 citations

Posted Content•

Train faster, generalize better: Stability of stochastic gradient descent

[...]

Moritz Hardt¹, Benjamin Recht², Yoram Singer•Institutions (2)

Google¹, University of California, Berkeley²

03 Sep 2015-arXiv: Learning

TL;DR: The authors showed that SGM is algorithmically stable in the sense of Bousquet and Elisseeff, and showed that it is stability-promoting in both convex and non-convex optimization problems.

...read moreread less

Abstract: We show that parametric models trained by a stochastic gradient method (SGM) with few iterations have vanishing generalization error. We prove our results by arguing that SGM is algorithmically stable in the sense of Bousquet and Elisseeff. Our analysis only employs elementary tools from convex and continuous optimization. We derive stability bounds for both convex and non-convex optimization under standard Lipschitz and smoothness assumptions. Applying our results to the convex case, we provide new insights for why multiple epochs of stochastic gradient methods generalize well in practice. In the non-convex case, we give a new interpretation of common practices in neural networks, and formally show that popular techniques for training large deep models are indeed stability-promoting. Our findings conceptually underscore the importance of reducing training time beyond its obvious benefit.

...read moreread less

128 citations

Posted Content•

Dreaming More Data: Class-dependent Distributions over Diffeomorphisms for Learned Data Augmentation

[...]

Søren Hauberg¹, Oren Freifeld², Anders Boesen Lindbo Larsen¹, John W. Fisher², Lars Kai Hansen¹ - Show less +1 more•Institutions (2)

Technical University of Denmark¹, Massachusetts Institute of Technology²

09 Oct 2015-arXiv: Computer Vision and Pattern Recognition

TL;DR: The authors align image pairs within each class under the assumption that the spatial transformation between images belongs to a large class of diffeomorphisms, and then learn a class-specific probabilistic generative models of the transformations in a Riemannian submanifold of the Lie group.

...read moreread less

Abstract: Data augmentation is a key element in training high-dimensional models. In this approach, one synthesizes new observations by applying pre-specified transformations to the original training data; e.g.~new images are formed by rotating old ones. Current augmentation schemes, however, rely on manual specification of the applied transformations, making data augmentation an implicit form of feature engineering. With an eye towards true end-to-end learning, we suggest learning the applied transformations on a per-class basis. Particularly, we align image pairs within each class under the assumption that the spatial transformation between images belongs to a large class of diffeomorphisms. We then learn a class-specific probabilistic generative models of the transformations in a Riemannian submanifold of the Lie group of diffeomorphisms. We demonstrate significant performance improvements in training deep neural nets over manually-specified augmentation schemes. Our code and augmented datasets are available online.

...read moreread less

80 citations

Posted Content•

Discriminative Unsupervised Feature Learning with Exemplar Convolutional Neural Networks

[...]

Alexey Dosovitskiy¹, Philipp Fischer¹, Jost Tobias Springenberg¹, Martin Riedmiller¹, Thomas Brox¹ - Show less +1 more•Institutions (1)

University of Freiburg¹

26 Jun 2014-arXiv: Learning

TL;DR: While features learned with this approach cannot compete with class specific features from supervised training on a classification task, it is shown that they are advantageous on geometric matching problems, where they also outperform the SIFT descriptor.

...read moreread less

Abstract: Deep convolutional networks have proven to be very successful in learning task specific features that allow for unprecedented performance on various computer vision tasks. Training of such networks follows mostly the supervised learning paradigm, where sufficiently many input-output pairs are required for training. Acquisition of large training sets is one of the key challenges, when approaching a new task. In this paper, we aim for generic feature learning and present an approach for training a convolutional network using only unlabeled data. To this end, we train the network to discriminate between a set of surrogate classes. Each surrogate class is formed by applying a variety of transformations to a randomly sampled 'seed' image patch. In contrast to supervised network training, the resulting feature representation is not class specific. It rather provides robustness to the transformations that have been applied during training. This generic feature representation allows for classification results that outperform the state of the art for unsupervised learning on several popular datasets (STL-10, CIFAR-10, Caltech-101, Caltech-256). While such generic features cannot compete with class specific features from supervised training on a classification task, we show that they are advantageous on geometric matching problems, where they also outperform the SIFT descriptor.

...read moreread less

38 citations

"Unsupervised Representation Learnin..." refers background in this paper

...The performance of DCGANs is still less than that of Exemplar CNNs (Dosovitskiy et al., 2015), a technique which trains normal discriminative CNNs in an unsupervised fashion to differentiate between specifically chosen, aggressively augmented, exemplar samples from the source dataset....
[...]
...Additionally, we found leaving the momentum term β1 at the suggested value of 0.9 resulted in training oscillation and instability while reducing it to 0.5 helped stabilize training....
[...]

Posted Content•

Understanding Locally Competitive Networks

[...]

Rupesh Kumar Srivastava¹, Jonathan Masci¹, Faustino Gomez¹, Jürgen Schmidhuber¹•Institutions (1)

University of Lugano¹

05 Oct 2014-arXiv: Neural and Evolutionary Computing

TL;DR: This paper attempts to visualize and understand this self-modularization of neural network activation functions, and suggests a unified explanation for the beneficial properties of such networks.

...read moreread less

Abstract: Recently proposed neural network activation functions such as rectified linear, maxout, and local winner-take-all have allowed for faster and more effective training of deep neural architectures on large and complex datasets The common trait among these functions is that they implement local competition between small groups of computational units within a layer, so that only part of the network is activated for any given input pattern In this paper, we attempt to visualize and understand this self-modularization, and suggest a unified explanation for the beneficial properties of such networks We also show how our insights can be directly useful for efficiently performing retrieval over large datasets using neural networks

...read moreread less

9 citations

"Unsupervised Representation Learnin..." refers methods in this paper

...Similarly, using a gradient descent on the inputs lets us inspect the ideal image that activates certain subsets of filters (Mordvintsev et al.)....
[...]
...The resulting code layer activations are then binarized via thresholding the ReLU activation which has been shown to be an effective information preserving technique (Srivastava et al., 2014) and provides a convenient form of semantic-hashing, allowing for linear time de-duplication ....
[...]