Posted Content

Semi-Supervised Learning with Deep Generative Models

TL;DR: It is shown that deep generative models and approximate Bayesian inference exploiting recent advances in variational methods can be used to provide significant improvements, making generative approaches highly competitive for semi-supervised learning.
Abstract: The ever-increasing size of modern data sets combined with the difficulty of obtaining label information has made semi-supervised learning one of the problems of significant practical importance in modern data analysis. We revisit the approach to semi-supervised learning with generative models and develop new models that allow for effective generalisation from small labelled data sets to large unlabelled ones. Generative approaches have thus far been either inflexible, inefficient or non-scalable. We show that deep generative models and approximate Bayesian inference exploiting recent advances in variational methods can be used to provide significant improvements, making generative approaches highly competitive for semi-supervised learning.
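For concreteness, the objective developed in the paper (its "M2" model) combines a variational bound for labelled pairs, a bound for unlabelled points that marginalises over the classifier's predictions, and an explicit classification term. Reconstructed here in LaTeX from the paper's notation, as a sketch rather than a verbatim transcription:

    -\mathcal{L}(x,y) = \mathbb{E}_{q_\phi(z\mid x,y)}\left[\log p_\theta(x\mid y,z) + \log p_\theta(y) + \log p(z) - \log q_\phi(z\mid x,y)\right]

    -\mathcal{U}(x) = \sum_y q_\phi(y\mid x)\bigl(-\mathcal{L}(x,y)\bigr) + \mathcal{H}\bigl(q_\phi(y\mid x)\bigr)

    \mathcal{J}^{\alpha} = \sum_{(x,y)\in\mathcal{D}_l} \mathcal{L}(x,y) + \sum_{x\in\mathcal{D}_u} \mathcal{U}(x) + \alpha\,\mathbb{E}_{(x,y)\in\mathcal{D}_l}\bigl[-\log q_\phi(y\mid x)\bigr]

Because the classifier q_\phi(y\mid x) appears only in the unlabelled bound, the \alpha-weighted cross-entropy term is what lets labelled data train the classifier directly.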
Citations
Book
18 Nov 2016
TL;DR: Deep learning, as presented in this book, is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts; it is used in many applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames.
Abstract: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models. Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.

38,208 citations

Posted Content
TL;DR: In this article, the authors present a variety of new architectural features and training procedures that apply to the generative adversarial networks (GANs) framework and achieve state-of-the-art results in semi-supervised classification on MNIST, CIFAR-10 and SVHN.
Abstract: We present a variety of new architectural features and training procedures that we apply to the generative adversarial networks (GANs) framework. We focus on two applications of GANs: semi-supervised learning, and the generation of images that humans find visually realistic. Unlike most work on generative models, our primary goal is not to train a model that assigns high likelihood to test data, nor do we require the model to be able to learn well without using any labels. Using our new techniques, we achieve state-of-the-art results in semi-supervised classification on MNIST, CIFAR-10 and SVHN. The generated images are of high quality as confirmed by a visual Turing test: our model generates MNIST samples that humans cannot distinguish from real data, and CIFAR-10 samples that yield a human error rate of 21.3%. We also present ImageNet samples with unprecedented resolution and show that our methods enable the model to learn recognizable features of ImageNet classes.
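The semi-supervised discriminator loss described in this abstract is compact enough to show in code. Below is a minimal PyTorch sketch, assuming a discriminator that outputs K real-class logits with the implicit fake-class logit fixed at zero, so that D(x) = Z(x)/(Z(x)+1) with Z(x) = sum_k exp(l_k(x)); the function and variable names are ours, not from a released implementation:

    import torch
    import torch.nn.functional as F

    def semisup_d_loss(logits_lab, labels, logits_unl, logits_fake):
        # Supervised part: ordinary K-class cross-entropy on labelled reals.
        loss_sup = F.cross_entropy(logits_lab, labels)
        # Unsupervised part: with the fake-class logit fixed at 0,
        # log D(x) = log Z(x) - log(Z(x) + 1) = logz - softplus(logz).
        logz_unl = torch.logsumexp(logits_unl, dim=1)
        logz_fake = torch.logsumexp(logits_fake, dim=1)
        loss_real = (F.softplus(logz_unl) - logz_unl).mean()  # -log D(x)
        loss_fake = F.softplus(logz_fake).mean()              # -log(1 - D(G(z)))
        return loss_sup + loss_real + loss_fake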

5,711 citations

Proceedings ArticleDOI
27 Jun 2016
TL;DR: A Neural Algorithm of Artistic Style is introduced that can separate and recombine the image content and style of natural images, providing new insights into the deep image representations learned by Convolutional Neural Networks and demonstrating their potential for high-level image synthesis and manipulation.
Abstract: Rendering the semantic content of an image in different styles is a difficult image processing task. Arguably, a major limiting factor for previous approaches has been the lack of image representations that explicitly represent semantic information and thus allow image content to be separated from style. Here we use image representations derived from Convolutional Neural Networks optimised for object recognition, which make high-level image information explicit. We introduce A Neural Algorithm of Artistic Style that can separate and recombine the image content and style of natural images. The algorithm allows us to produce new images of high perceptual quality that combine the content of an arbitrary photograph with the appearance of numerous well-known artworks. Our results provide new insights into the deep image representations learned by Convolutional Neural Networks and demonstrate their potential for high-level image synthesis and manipulation.
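The content/style separation rests on two losses over CNN feature maps. In roughly the paper's notation (reconstructed here, so treat it as a sketch): with F^l the feature maps of the generated image at layer l, P^l those of the content photograph, G^l the Gram matrix of F^l, and A^l the Gram matrices of the style image,

    \mathcal{L}_{\mathrm{content}} = \tfrac{1}{2}\sum_{i,j}\bigl(F^l_{ij} - P^l_{ij}\bigr)^2, \qquad G^l_{ij} = \sum_k F^l_{ik} F^l_{jk}

    \mathcal{L}_{\mathrm{style}} = \sum_l \frac{w_l}{4 N_l^2 M_l^2} \sum_{i,j}\bigl(G^l_{ij} - A^l_{ij}\bigr)^2, \qquad \mathcal{L}_{\mathrm{total}} = \alpha\,\mathcal{L}_{\mathrm{content}} + \beta\,\mathcal{L}_{\mathrm{style}}

where N_l and M_l are the number and spatial size of the feature maps at layer l, and the ratio \alpha/\beta trades content fidelity against stylisation.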

4,888 citations


Cites background from "Semi-Supervised Learning with Deep ..."

  • ...Such factorised representations were previously achieved only for controlled subsets of natural images such as faces under different illumination conditions and characters in different font styles [29] or handwritten digits and house numbers [17]....

Proceedings Article
05 Dec 2016
TL;DR: In this article, a variety of new architectural features and training procedures are applied to the generative adversarial networks (GANs) framework, achieving state-of-the-art results in semi-supervised classification on MNIST, CIFAR-10 and SVHN.
Abstract: We present a variety of new architectural features and training procedures that we apply to the generative adversarial networks (GANs) framework. Using our new techniques, we achieve state-of-the-art results in semi-supervised classification on MNIST, CIFAR-10 and SVHN. The generated images are of high quality as confirmed by a visual Turing test: our model generates MNIST samples that humans cannot distinguish from real data, and CIFAR-10 samples that yield a human error rate of 21.3%. We also present ImageNet samples with unprecedented resolution and show that our methods enable the model to learn recognizable features of ImageNet classes.

3,332 citations

Journal ArticleDOI
TL;DR: A broad survey of recent advances in convolutional neural networks can be found in this article, where the authors discuss improvements to CNNs in several aspects, namely layer design, activation functions, loss functions, regularization, optimization and fast computation.

3,125 citations

References
Proceedings Article
01 Jan 2014
TL;DR: A stochastic variational inference and learning algorithm is introduced that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case.
Abstract: How can we perform efficient inference and learning in directed probabilistic models, in the presence of continuous latent variables with intractable posterior distributions, and large datasets? We introduce a stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case. Our contribution is two-fold. First, we show that a reparameterization of the variational lower bound yields a lower bound estimator that can be straightforwardly optimized using standard stochastic gradient methods. Second, we show that for i.i.d. datasets with continuous latent variables per datapoint, posterior inference can be made especially efficient by fitting an approximate inference model (also called a recognition model) to the intractable posterior using the proposed lower bound estimator. Theoretical advantages are reflected in experimental results.
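The reparameterization mentioned here is the piece most easily shown in code. A minimal PyTorch sketch of the Gaussian case (names are ours): writing z = mu + sigma * eps with eps ~ N(0, I) moves the sampling noise into an exogenous variable, so the lower bound is differentiable in mu and sigma, and the KL term against a standard-normal prior is available in closed form:

    import torch

    def reparameterize(mu, logvar):
        # z = mu + sigma * eps, eps ~ N(0, I); gradients flow to mu, logvar.
        std = torch.exp(0.5 * logvar)
        return mu + torch.randn_like(std) * std

    def gaussian_kl(mu, logvar):
        # KL( N(mu, sigma^2) || N(0, I) ), summed over latent dimensions.
        return -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1)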

20,769 citations


"Semi-Supervised Learning with Deep ..." refers background or methods in this paper

  • ...…semi-supervised learning by utilising an explicit model of the data density, building upon recent advances in deep generative models and scalable variational inference, namely auto-encoding variational Bayes and stochastic backpropagation (Kingma and Welling, 2014; Rezende et al., 2014)....

  • ...This optimisation can be done jointly, without resort to the variational EM algorithm, by using deterministic reparameterisations of the expectations in the objective function, combined with Monte Carlo approximation – referred to in previous work as stochastic gradient variational Bayes (SGVB) (Kingma and Welling, 2014) or as stochastic backpropagation (Rezende et al....

  • ...In both cases, exact inference will be intractable, but we exploit recent advances in variational inference (Kingma and Welling, 2014; Rezende et al., 2014) to efficiently obtain accurate posterior distributions for latent variables as well as to perform efficient parameter learning....

  • ...We construct the approximate posterior distribution qφ(·) as an inference or recognition model, which has become a popular approach for efficient variational inference (Dayan, 2000; Kingma and Welling, 2014; Rezende et al., 2014; Stuhlmüller et al., 2013)....

  • ...In this paper we answer this question by developing probabilistic models for inductive and transductive semi-supervised learning by utilising an explicit model of the data density, building upon recent advances in deep generative models and scalable variational inference (Kingma and Welling, 2014; Rezende et al., 2014)....

Proceedings Article
01 Jan 2010
TL;DR: Adaptive subgradient methods, as discussed by the authors, dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning, allowing one to find needles in haystacks in the form of very predictive but rarely seen features.
Abstract: We present a new family of subgradient methods that dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning. Metaphorically, the adaptation allows us to find needles in haystacks in the form of very predictive but rarely seen features. Our paradigm stems from recent advances in stochastic optimization and online learning which employ proximal functions to control the gradient steps of the algorithm. We describe and analyze an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal function that can be chosen in hindsight. We give several efficient algorithms for empirical risk minimization problems with common and important regularization functions and domain constraints. We experimentally study our theoretical analysis and show that adaptive subgradient methods outperform state-of-the-art, yet non-adaptive, subgradient algorithms.
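At its core, the method keeps a per-coordinate accumulator of squared gradients and scales each step by its inverse square root, so rarely updated but predictive coordinates retain a large effective learning rate. A minimal NumPy sketch of the diagonal variant (names are ours):

    import numpy as np

    def adagrad_step(w, grad, accum, lr=0.01, eps=1e-8):
        # Accumulate squared gradients per coordinate, then step with a
        # learning rate shrunk by the root of the accumulated magnitude.
        accum += grad ** 2
        w -= lr * grad / (np.sqrt(accum) + eps)
        return w, accum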

7,244 citations

01 Jan 2011
TL;DR: A new benchmark dataset for research use is introduced, containing over 600,000 labeled digits cropped from Street View images; the authors employ variants of two recently proposed unsupervised feature learning methods and find them convincingly superior on these benchmarks.
Abstract: Detecting and reading text from natural images is a hard computer vision task that is central to a variety of emerging applications. Related problems like document character recognition have been widely studied by computer vision and machine learning researchers and are virtually solved for practical applications like reading handwritten digits. Reliably recognizing characters in more complex scenes like photographs, however, is far more difficult: the best existing methods lag well behind human performance on the same tasks. In this paper we attack the problem of recognizing digits in a real application using unsupervised feature learning methods: reading house numbers from street level photos. To this end, we introduce a new benchmark dataset for research use containing over 600,000 labeled digits cropped from Street View images. We then demonstrate the difficulty of recognizing these digits when the problem is approached with hand-designed features. Finally, we employ variants of two recently proposed unsupervised feature learning methods and find that they are convincingly superior on our benchmarks.

5,311 citations


"Semi-Supervised Learning with Deep ..." refers background or methods in this paper

  • ...We also show a similar visualisation for the street view house numbers (SVHN) data set (Netzer et al., 2011), which consists of more than 70,000 images of house numbers, in figure 3 (top)....

  • ...Figure 1 shows these analogical fantasies for the MNIST and SVHN datasets (Netzer et al., 2011)....

01 Jan 2005

4,189 citations


"Semi-Supervised Learning with Deep ..." refers background in this paper

  • ...Existing generative approaches based on models such as Gaussian mixture or hidden Markov models (Zhu, 2006), have not been very successful due to the need for a large number of mixture components or states to perform well....

Proceedings Article
21 Aug 2003
TL;DR: An approach to semi-supervised learning is proposed that is based on a Gaussian random field model, and methods to incorporate class priors and the predictions of classifiers obtained by supervised learning are discussed.
Abstract: An approach to semi-supervised learning is proposed that is based on a Gaussian random field model. Labeled and unlabeled data are represented as vertices in a weighted graph, with edge weights encoding the similarity between instances. The learning problem is then formulated in terms of a Gaussian random field on this graph, where the mean of the field is characterized in terms of harmonic functions, and is efficiently obtained using matrix methods or belief propagation. The resulting learning algorithms have intimate connections with random walks, electric networks, and spectral graph theory. We discuss methods to incorporate class priors and the predictions of classifiers obtained by supervised learning. We also propose a method of parameter learning by entropy minimization, and show the algorithm's ability to perform feature selection. Promising experimental results are presented for synthetic data, digit classification, and text classification tasks.
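The harmonic solution referred to here reduces to a single linear solve once the graph Laplacian L = D - W is partitioned into labelled and unlabelled blocks: f_u = L_uu^{-1} W_ul f_l. A minimal NumPy sketch (variable names are ours):

    import numpy as np

    def harmonic_labels(W, f_l, labelled):
        # W: (n, n) symmetric affinity matrix; f_l: (l, K) one-hot labels
        # for labelled vertices; labelled: boolean mask of length n.
        u = ~labelled
        L = np.diag(W.sum(axis=1)) - W           # graph Laplacian L = D - W
        L_uu = L[np.ix_(u, u)]
        W_ul = W[np.ix_(u, labelled)]
        f_u = np.linalg.solve(L_uu, W_ul @ f_l)  # harmonic: f_u = L_uu^{-1} W_ul f_l
        return f_u                               # class scores for unlabelled vertices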

3,908 citations


"Semi-Supervised Learning with Deep ..." refers background or methods in this paper

  • ...It would be desirable to have a single principled loss function similar to (Blum et al., 2004) or (Zhu et al., 2003)....

  • ...Graph-based methods are amongst the most popular and aim to construct a graph connecting similar observations with label information propagating through the graph from labelled to unlabelled nodes by finding the minimum energy (MAP) configuration (Blum et al., 2004; Zhu et al., 2003)....
