A Three-Player GAN: Generating Hard Samples To Improve
Classification Networks
Simon Vandenhende, Bert De Brabandere, Davy Neven and Luc Van Gool
KU Leuven
ESAT-PSI, Belgium
firstname.lastname@esat.kuleuven.be
Abstract
We propose a Three-Player Generative Adversarial Network to improve classification networks. In addition to the game played between the discriminator and generator, a competition is introduced between the generator and the classifier. The generator's objective is to synthesize samples that are both realistic and hard to label for the classifier. Even though we make no assumptions on the type of augmentations to learn, we find that the model is able to synthesize realistic-looking examples that are hard for the classification model. Furthermore, the classifier becomes more robust when trained on these difficult samples. The method is evaluated on a public dataset for traffic sign recognition.
1 Introduction
Deep convolutional neural networks have brought significant progress to the area of computer vision. However, training the models still requires vast amounts of data. As intelligent vision systems are being deployed in increasingly dynamic environments, collecting the necessary data becomes a tedious task.
Recent work in generative modeling, based on Generative Adversarial Networks (GANs) [1-4], makes it possible to efficiently synthesize novel samples that belong to the data distribution. GANs derive the data distribution from an adversarial game played between two entities: the generator G synthesizes new samples, and the discriminator D tries to separate real samples from the ones synthesized by G. The goal of the generator is to confuse D so that it cannot discriminate between real and fake examples. The game ends when the two players reach a Nash equilibrium.
GANs have proven useful for improving the performance of classification networks. For example, [5] proposes an adversarial approach which jointly optimizes the data augmentation and a network for pose estimation. The generator learns to synthesize augmentations from the training data that are hard to label for the classification network. The augmentations are composed of rotations, scaling transformations and occlusions.
Furthermore, [6-9] have successfully employed GANs in a semi-supervised learning setting. In [6], the discriminator learns the classification task from unlabeled data: it has to classify each sample into a chosen number of categories. Since the conditional distribution p(c|x) is unknown, a goodness-of-fit measure is included to ensure correspondence between the categories and the class labels. [8] trains a classifier in a semi-supervised manner by considering images from the GAN as samples from an additional class. [9] trains the classifier and the generative model simultaneously. They find that both generator and classifier represent a conditional distribution between labels and images. This observation leads to a compatibility criterion between the generator and classifier.
Our work implements a three-player adversarial game in which the classification network participates. The generator adapts itself to both the discriminator and the classifier. This allows the generator to estimate the distribution of samples that are hard to label correctly for the classifier. In contrast to [5], our work does not restrict the type of augmentations that can be learned. Also, the proposed method relies solely on backpropagation, which makes it a very general approach. We show that the three-player game can improve classification networks when annotated data is scarce. The proposed method is evaluated on CURE-TSR [10], a publicly available dataset for traffic sign recognition.
2 Method
A regular Generative Adversarial Network [1] comprises a min-max game played between the discriminator D and the generator G. Additionally, we now introduce a competition between the generator and the classifier. The objective for G changes from synthesizing images that are realistic to generating images that are both realistic and challenging for the classification network.
As before, the discriminator is trained to predict whether a sample is real or fake. The generator, in turn, optimizes the sum of two losses. The first term is the regular GAN loss, provided by the discriminator. In order for the generator to compete with the classifier, the second loss term needs to be chosen appropriately. To this end, backpropagation should yield the maximization of the classification model's loss on samples from G. This encourages G to move towards the distribution of samples that confuse the classifier. The classifier is trained by minimizing the classification loss on samples from G. The game is played by updating all three models one after another.

Figure 1: Setup for the three-player game. Images from the generator are propagated through both the discriminator D and the classifier C. The gradient that is backpropagated through D proceeds as usual. The gradient that is backpropagated through C is rescaled and inverted as −λ ∇_{θ_C} L_C. The loss from D penalizes G for synthesizing unrealistic samples, while the inverted loss from C rewards G for synthesizing difficult samples.
Inspired by [11], the objective for the second loss term, as seen by G, is realized by implementing a gradient reversal layer between the generator and the classifier. During the forward pass, samples from G are simply passed to the classification network. When backpropagating the classification loss, the sign of the gradient is reversed, causing the update in G to maximize the classification loss. This technique is related to [12], which finds adversarial examples by applying perturbations that lie along directions where the classification loss is likely to increase. The setup of our system is shown in Figure 1.
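For illustration, such a layer can be implemented in a few lines of PyTorch. The snippet below is a minimal sketch with illustrative names (GradReverse, grad_reverse), not the exact implementation used in the experiments.

    # Minimal sketch of a gradient reversal layer (PyTorch assumed).
    # Identity in the forward pass; multiplies the gradient by -lambda in the backward pass.
    import torch

    class GradReverse(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x, lam):
            ctx.lam = lam
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_output):
            # Reverse and rescale the gradient flowing back towards the generator.
            return -ctx.lam * grad_output, None

    def grad_reverse(x, lam=1.0):
        return GradReverse.apply(x, lam)

With such a layer, the classifier sees the generator's samples unchanged, while the gradient that reaches the generator pushes it towards samples on which the classification loss increases.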
The three-player GAN shows some similarities with auxiliary classifier GANs (ACGANs) [13]. In the ACGAN model, the discriminator categorizes the images in addition to predicting their source, which allows the discriminator to be deployed as a classification model. There are two main differences with our approach. First, in the three-player game, the generator tries to maximize the classification loss rather than minimize it; the focus of this work is on the generation of hard samples. Secondly, the three-player GAN separates the network architectures of the discriminator and the classifier, which allows each architecture to be specialized for its respective task.
The complete training procedure for the three-player game is defined in Algorithm 1. A hyperparameter λ is introduced to weigh the classification loss against the discriminative loss.
Algorithm 1: The three-player GAN

for number of training iterations do
    Sample a batch (x_g, y_g) of size m from the generator, and a batch (x, y) of size m from the training data.
    Update the discriminator by ascending its stochastic gradient:

        ∇_{θ_d} [ (1/m) Σ_{(x,y)} log D(x, y) + (1/m) Σ_{(x_g,y_g)} log(1 − D(x_g, y_g)) ]

    Sample a batch (x_g, y_g) of size m from the generator.
    Update the generator by descending its stochastic gradient:

        ∇_{θ_g} [ (1/m) Σ_{(x_g,y_g)} log(1 − D(x_g, y_g)) − λ · (1/m) Σ_{(x_g,y_g)} L_C(C(x_g), y_g) ]

    Sample a batch (x, y) of size m.
    Update the classifier by descending its stochastic gradient:

        ∇_{θ_c} (1/m) Σ_{(x,y)} L_C(C(x), y)
end
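For concreteness, the following PyTorch-style sketch spells out one iteration of these three alternating updates. The model interfaces, the use of cross-entropy for L_C, and the batch handling are illustrative assumptions rather than the exact implementation; the explicit minus sign on the classification term plays the role of the gradient reversal layer.

    # One iteration of the three-player game (illustrative sketch, PyTorch assumed).
    # G(z, y): conditional generator, D(x, y): conditional discriminator in [0, 1],
    # C(x): classifier logits. Cross-entropy stands in for the classification loss L_C.
    import torch
    import torch.nn.functional as F

    def three_player_step(G, D, C, opt_d, opt_g, opt_c, real_iter,
                          lam, m, z_dim, n_classes, device, eps=1e-8):
        # --- Discriminator update: ascend log D(x, y) + log(1 - D(x_g, y_g)) ---
        x, y = (t.to(device) for t in next(real_iter))
        z = torch.randn(m, z_dim, device=device)
        y_g = torch.randint(0, n_classes, (m,), device=device)
        x_g = G(z, y_g).detach()
        loss_d = -(torch.log(D(x, y) + eps).mean()
                   + torch.log(1.0 - D(x_g, y_g) + eps).mean())
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()

        # --- Generator update: descend log(1 - D(x_g, y_g)) - lam * L_C(C(x_g), y_g) ---
        # Descending this objective both fools D and *increases* the classification loss.
        z = torch.randn(m, z_dim, device=device)
        y_g = torch.randint(0, n_classes, (m,), device=device)
        x_g = G(z, y_g)
        loss_g = (torch.log(1.0 - D(x_g, y_g) + eps).mean()
                  - lam * F.cross_entropy(C(x_g), y_g))
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()

        # --- Classifier update: descend L_C on a fresh batch ---
        # (Section 3.1 mixes real samples with samples from the initial and current generator.)
        x_c, y_c = (t.to(device) for t in next(real_iter))
        loss_c = F.cross_entropy(C(x_c), y_c)
        opt_c.zero_grad(); loss_c.backward(); opt_c.step()
        return loss_d.item(), loss_g.item(), loss_c.item()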
3 Experiments
We first consider a toy example, which demonstrates that the three-player game acts as a regularizer for the decision surface of the classifier. In the second part we evaluate our method on CURE-TSR [10]. Both experiments compare the performance of a classification network trained through the three-player game against several other training scenarios.
3.1 Training details
We initialize the discriminator and generator in the three-player game by training a conditional GAN. When updating the classifier, we sample batches containing real images, images synthesized by the initial generator, and images synthesized by the current generator. The samples from the initial generator serve to avoid catastrophic forgetting of examples that are difficult early on.

Figure 2: Decision surface of the classification model at the end of different training procedures: (a) trained on real samples; (b) trained on real and synthesized samples; (c) trained on real, synthesized and difficult samples.
The learning rates and the weighting parameter λ are updated according to the scheme from [11]:

    λ = w_c · ( 2 / (1 + exp(−10 · p)) − 1 ),    μ = μ_0 / (1 + α · p)^β

with p the training progress growing linearly from 0 to 1, α = 10, β = 0.75, w_c = 0.1 and μ_0 the initial learning rate. The value of w_c is chosen smaller than one to ensure that synthesizing realistic samples has priority over synthesizing difficult ones. The weighting parameter λ gradually grows during training, allowing the generator to come up with difficult samples even as the classification model improves.
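Written out as code, the schedules take the following form (a minimal sketch; function and variable names are illustrative, and the scaled logistic form for λ follows the formula above):

    # Sketch of the lambda and learning-rate schedules described above.
    import math

    def lambda_schedule(p, w_c=0.1):
        # Grows smoothly from 0 towards w_c as the training progress p goes from 0 to 1.
        return w_c * (2.0 / (1.0 + math.exp(-10.0 * p)) - 1.0)

    def lr_schedule(p, mu_0, alpha=10.0, beta=0.75):
        # Decays the initial learning rate mu_0 as training progresses.
        return mu_0 / (1.0 + alpha * p) ** beta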
3.2 Toy example
We demonstrate that the three-player GAN effectively acts as a regularizer by means of a toy example. Consider the case where samples from two classes need to be separated. Both classes are distributed as two-dimensional Gaussians, parameterized by μ_X = ±1, μ_Y = ±1 and σ_X = σ_Y = 0.5. The training data consists of eight examples per class, drawn as dots and crosses in Figure 2. The classifier, represented as a simple linear mapping, is trained using a hinge loss.
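This setup can be sketched in a few lines of PyTorch (assuming class means at (1, 1) and (−1, −1) and plain SGD; the snippet is illustrative rather than the exact code used for Figure 2):

    # Toy example sketch: two 2D Gaussian classes and a linear classifier with a hinge loss.
    import torch

    torch.manual_seed(0)
    n_per_class, sigma = 8, 0.5
    x_pos = torch.randn(n_per_class, 2) * sigma + torch.tensor([1.0, 1.0])    # class +1
    x_neg = torch.randn(n_per_class, 2) * sigma + torch.tensor([-1.0, -1.0])  # class -1
    x = torch.cat([x_pos, x_neg])
    y = torch.cat([torch.ones(n_per_class), -torch.ones(n_per_class)])

    w = torch.zeros(2, requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    opt = torch.optim.SGD([w, b], lr=0.1)
    for _ in range(200):
        margin = y * (x @ w + b)                          # signed margin of each sample
        loss = torch.clamp(1.0 - margin, min=0.0).mean()  # hinge loss
        opt.zero_grad(); loss.backward(); opt.step()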
A baseline classifier and a conditional GAN are trained using the available training examples. A second classification model is trained on a combination of real and synthesized samples. Thirdly, we also train a classification model based on the three-player game. For this particular example, we initialize the classifier in the game as the baseline model and freeze its parameters; the game is played for a few epochs, allowing the generator to estimate the distribution of samples that are difficult for the baseline model. The parameters of the classification models were initialized randomly, and to ensure a fair comparison all models start from the same initial weights. Figure 2 shows the decision boundaries of the classification models obtained with the different training schemes. Comparing them, we find that the three-player GAN is able to regularize the decision surface.
Consider again the two Gaussian distributions from before, but with an increased variance. When sampling from the two classes, we find that the distributions show a significant overlap near the origin. If the three-player game behaves as intended, we expect the generator to synthesize samples which lie near the origin. We train a classification model and a conditional GAN by sampling from the two Gaussian distributions. Afterwards, the generator is updated through the three-player game in order to synthesize difficult samples. Figure 3 shows the results: the generator learns to synthesize samples at locations where the classifier has a hard time.
3.3 CURE-TSR
The CURE-TSR dataset [10] is composed of both real and simulated images of 14 traffic sign classes under various weather conditions. The set of simulated traffic sign instances is considered here under the following conditions: clear weather, low, mid and high levels of snow, low, mid and high levels of rain, and low, mid and high levels of dark weather. For the training (resp. validation) set we took the first 100 (resp. last 50) images per class from each weather condition. Since the data contains sequences of images in which the camera gradually moves closer to the traffic sign, this data selection comes down to using only a few such sequences.
Again, we train a classifier and a conditional GAN using the available training data. A second classification model is trained using both real samples and samples synthesized by the conditional GAN. Thirdly, we compare with an auxiliary classifier GAN. Finally, a classifier is learned by means of the three-player game. As mentioned in Section 3.1, the discriminator and generator are initialized as the models from the conditional GAN.
Figure 3: The real distribution consists of two classes which partially overlap. Both are two-dimensional Gaussians whose means are indicated as dots; the circles correspond to multiples of the standard deviation. (a) The true distribution consists of two classes, both distributed as two-dimensional Gaussians. (b) Distribution learned by the generator during the three-player game: the generator synthesizes samples located in the area where the two class distributions overlap. We find that the three-player GAN learns to synthesize samples at locations where the two classes overlap; these are samples which are hard to label correctly for a classification model.
Figure 4: Images generated during the three-player
game.
The network architecture for the classifier is based on one column of the multi-column deep neural network used for traffic sign recognition in [14]. The discriminator and generator networks are based on earlier work [2]. More details can be found in the supplemental materials. The classification network was trained for 150 epochs with an Adam optimizer [15] (μ_0 = 0.001, β_1 = 0.5, β_2 = 0.999). The learning rate is decayed by a factor of 10 every 60 epochs. A weight decay term of 1e−4 is included in the classification loss. The conditional GAN was trained for 500 epochs using batches of size 64.
For the auxiliary classifier GAN, we reused the architecture and training scheme from the original work [13]. We used an Adam optimizer with the same learning parameters as [16] (μ_0 = 0.0002, β_1 = 0.0, β_2 = 0.9). The initial learning rates for the three-player game are the same as for the other training strategies. The results can be found in Table 1. We find that training the classification model by means of the three-player game improves the test accuracy. Figure 4 shows images that were generated during the three-player game.
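For reference, the classifier settings quoted above correspond roughly to the following PyTorch configuration (a sketch; using Adam's built-in weight decay is an approximation of the weight decay term in the loss):

    # Sketch of the classifier's optimization settings (PyTorch assumed).
    import torch

    def make_classifier_optimizer(classifier: torch.nn.Module):
        optimizer = torch.optim.Adam(classifier.parameters(), lr=0.001,
                                     betas=(0.5, 0.999), weight_decay=1e-4)
        # Decay the learning rate by a factor of 10 every 60 epochs (150 epochs in total).
        scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=60, gamma=0.1)
        return optimizer, scheduler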
Table 1: Accuracy (%) on the CURE-TSR test set for different training schemes.

Method         Without real data   With real data
Baseline       -                   83.38
cGAN           77.08               83.23
ACGAN          -                   79.23
Three-player   79.83               85.41
4 Conclusion
We have proposed an effective yet simple method to improve classification networks by having a generative model synthesize difficult samples. The method is based on a regular GAN game, but includes an adversarial loss which steers the generator towards difficult samples. In comparison to previous work, we neither restrict nor limit the kind of augmentations that the generative model can learn. We find that the generative model is able to synthesize realistic-looking images which are hard to label correctly for the classification model. Since our method relies only on backpropagation, future research can investigate whether the idea also applies to other tasks.
Acknowledgement: The work was supported by
Toyota, and was carried out at the TRACE Lab at
KU Leuven (Toyota Research on Automated Cars in
Europe - Leuven).
5 Supplemental Materials
Network Architectures - CURE
Table 2: Discriminator

Operation      Features   Output size
Input          3          48 x 48
ResNet block   48         24 x 24
ResNet block   96         12 x 12
ResNet block   192        6 x 6
ResNet block   384        3 x 3
ReLU           -          -
Sum pool       384        1 x 1
Linear         1          -
Filter size: 3 x 3. Initialization: Xavier, gain √2.
Table 3: Generator

Operation      Features   Output size
Noise input    100 x 1    -
Linear         384        3 x 3
ResNet block   384        6 x 6
ResNet block   192        12 x 12
ResNet block   96         24 x 24
ResNet block   48         48 x 48
BatchNorm      -          -
ReLU           -          -
Convolution    3          48 x 48
Filter size: 3 x 3. Initialization: Xavier, gain √2.
Table 4: Classifier

Operation     Features   Kernel   Nonlinearity
Convolution   100        7 x 7    ReLU
Convolution   150        4 x 4    ReLU
Convolution   250        4 x 4    ReLU
Linear        300        -        ReLU
Linear        43         -        -
Initialization: Xavier, gain √2. Batch normalization after each convolution. Dropout (p = 0.5) in the linear layers.
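For illustration, the classifier of Table 4 could be written in PyTorch as follows; the max-pooling layers, strides, padding and the 48 x 48 input resolution are assumptions, as the table does not specify them.

    # Sketch of the Table 4 classifier (PyTorch assumed); pooling, strides and padding are assumed.
    import torch.nn as nn

    class TrafficSignClassifier(nn.Module):
        def __init__(self, in_channels=3, n_classes=43):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(in_channels, 100, kernel_size=7), nn.BatchNorm2d(100), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(100, 150, kernel_size=4), nn.BatchNorm2d(150), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(150, 250, kernel_size=4), nn.BatchNorm2d(250), nn.ReLU(), nn.MaxPool2d(2),
            )
            self.head = nn.Sequential(
                nn.Flatten(),
                # 250 * 3 * 3 assumes 48 x 48 inputs and the pooling layout above.
                nn.Linear(250 * 3 * 3, 300), nn.ReLU(), nn.Dropout(p=0.5),
                nn.Linear(300, n_classes),
            )

        def forward(self, x):
            return self.head(self.features(x))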
References
[1] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu,
D. Warde-Farley, S. Ozair, A. Courville, and Y. Ben-
gio, “Generative adversarial nets,” in Advances in neu-
ral information processing systems, 2014, pp. 2672–
2680.
[2] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida,
“Spectral normalization for generative adversarial net-
works,” arXiv preprint arXiv:1802.05957, 2018.
[3] H. Zhang, I. Goodfellow, D. Metaxas, and A. Odena,
“Self-attention generative adversarial networks,” arXiv
preprint arXiv:1805.08318, 2018.
[4] A. Brock, J. Donahue, and K. Simonyan, “Large scale
gan training for high fidelity natural image synthesis,”
arXiv preprint arXiv:1809.11096, 2018.
[5] X. Peng, Z. Tang, F. Yang, R. S. Feris, and D. Metaxas,
“Jointly optimize data augmentation and network train-
ing: Adversarial data augmentation in human pose es-
timation,” in Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, 2018, pp.
2226–2234.
[6] J. T. Springenberg, “Unsupervised and semi-supervised
learning with categorical generative adversarial net-
works,” arXiv preprint arXiv:1511.06390, 2015.
[7] A. Odena, “Semi-supervised learning with generative
adversarial networks,” arXiv preprint arXiv:1606.01583,
2016.
[8] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung,
A. Radford, and X. Chen, “Improved techniques for
training gans,” in Advances in Neural Information Pro-
cessing Systems, 2016, pp. 2234–2242.
[9] C. Li, K. Xu, J. Zhu, and B. Zhang, “Triple generative
adversarial nets,” arXiv preprint arXiv:1703.02291, 2017.
[10] D. Temel, G. Kwon, M. Prabhushankar, and G. Al-
Regib, “Cure-tsr: Challenging unreal and real envi-
ronments for traffic sign recognition,” arXiv preprint
arXiv:1712.02463, 2017.
[11] Y. Ganin and V. Lempitsky, “Unsupervised domain
adaptation by backpropagation,” arXiv preprint arXiv:1409.7495,
2014.
[12] I. Goodfellow, J. Shlens, and C. Szegedy, “Explaining
and harnessing adversarial examples,” 2015.
[13] A. Odena, C. Olah, and J. Shlens, “Conditional image synthesis with auxiliary classifier GANs,” in Proceedings of the 34th International Conference on Machine Learning, 2017.
[14] D. Ciresan, U. Meier, J. Masci, and J. Schmidhuber, “Multi-column deep neural network for traffic sign classification,” Neural Networks, vol. 32, pp. 333–338, 2012.
[15] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
