Posted Content

Why Adopting Regularization and Normalization For Generative Adversarial Networks: A Survey

19 Aug 2020
TL;DR: This paper presents a comprehensive survey of regularization and normalization techniques from the different perspectives of GAN training; the authors systematically describe these perspectives, derive the corresponding purposes of regularization and normalization, and organize existing methods into a new taxonomy.
Abstract: Generative Adversarial Networks (GANs) have been widely applied in different scenarios thanks to the development of deep neural networks. The original GAN was proposed under the non-parametric assumption that the networks have infinite capacity. It is still unknown whether GANs can generate realistic samples without any prior information. Because of this overconfident assumption, many issues need to be addressed in GANs' training, such as non-convergence, mode collapse, gradient vanishing, overfitting, discriminator forgetting, and sensitivity to hyperparameters. As acknowledged, regularization and normalization are common methods of introducing prior information that can be used for stabilizing training and improving discrimination. Many regularization and normalization methods have been proposed for GANs. However, as far as we know, no existing survey has focused specifically on the systematic purposes and developments of these solutions. In this work, we perform a comprehensive survey of the regularization and normalization techniques from different perspectives of GANs training. First, we systematically and comprehensively describe the different perspectives of GANs training and thus obtain the different purposes of regularization and normalization in GANs training. In accordance with these purposes, we propose a new taxonomy and summarize a large number of existing studies. Furthermore, we fairly compare the performance of the mainstream methods on different datasets and investigate the regularization and normalization techniques that have been frequently employed in SOTA GANs. Finally, we highlight possible future studies in this area.
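As an illustration of the kind of regularization such a survey covers, below is a minimal sketch (not taken from the paper) of a gradient-penalty regularizer in the style popularized by WGAN-GP, added to a discriminator loss; the discriminator `D`, the image tensors, and the weight `lambda_gp` are placeholders.

```python
import torch

def gradient_penalty(D, real, fake, lambda_gp=10.0):
    """Illustrative WGAN-GP-style regularizer (assumed example, not the
    survey's own method): penalize the discriminator's gradient norm on
    points interpolated between real and generated samples."""
    batch_size = real.size(0)
    # One random interpolation coefficient per sample (images are B x C x H x W)
    eps = torch.rand(batch_size, 1, 1, 1, device=real.device)
    interp = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
    scores = D(interp)
    grads = torch.autograd.grad(scores.sum(), interp, create_graph=True)[0]
    grad_norm = grads.view(batch_size, -1).norm(2, dim=1)
    # Push the gradient norm toward 1, i.e. toward 1-Lipschitz behaviour
    return lambda_gp * ((grad_norm - 1.0) ** 2).mean()
```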
Citations
Posted Content
Ziqiang Li, Pengfei Xia, Xue Rui, Yanghui Hu, Bin Li 
TL;DR: In this article, two preprocessing methods, High-Frequency Confusion (HFC) and High-Frequency Filter (HFF), are proposed to eliminate high-frequency differences in GAN training.
Abstract: Advances in Generative Adversarial Networks (GANs) have made it possible to generate realistic images that are visually indistinguishable from real ones. However, recent studies of the image spectrum have shown that generated and real images differ significantly at high frequencies. Furthermore, high-frequency components invisible to the human eye affect the decisions of CNNs and are related to their robustness. Whether the discriminator is likewise sensitive to these high-frequency differences, thereby reducing the generator's ability to fit the low-frequency components, remains an open problem. In this paper, we demonstrate that the discriminator in GANs is sensitive to high-frequency differences that humans cannot distinguish, and that the high-frequency components of images are not conducive to GAN training. Based on these findings, we propose two preprocessing methods that eliminate high-frequency differences during GAN training: High-Frequency Confusion (HFC) and High-Frequency Filter (HFF). The proposed methods are general and can easily be applied to most existing GAN frameworks at a fraction of the cost. The strong performance of the proposed methods is verified on multiple loss functions, network architectures, and datasets.
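The abstract does not spell out how HFF is implemented; as one plausible reading, the sketch below low-pass filters a batch of images in the Fourier domain before they are fed to the discriminator. The function name and the `keep_ratio` parameter are assumptions for illustration, not the authors' implementation.

```python
import torch

def high_frequency_filter(images, keep_ratio=0.5):
    """Illustrative low-pass filter: keep only the central (low-frequency)
    part of the 2D spectrum of each image. An assumed HFF-style step,
    not the authors' exact method."""
    b, c, h, w = images.shape
    spec = torch.fft.fftshift(torch.fft.fft2(images), dim=(-2, -1))
    # Binary mask that keeps a centred keep_ratio fraction of frequencies
    mask = torch.zeros(h, w, device=images.device)
    kh, kw = int(h * keep_ratio / 2), int(w * keep_ratio / 2)
    mask[h // 2 - kh : h // 2 + kh, w // 2 - kw : w // 2 + kw] = 1.0
    filtered = torch.fft.ifft2(torch.fft.ifftshift(spec * mask, dim=(-2, -1)))
    return filtered.real
```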

2 citations

Posted Content
TL;DR: In this article, a new approach inspired by work on adversarial attacks is proposed to stabilize the training process of GANs. It is found that the images generated by the generator sometimes act like adversarial examples for the discriminator, which may be part of the reason for unstable training.
Abstract: Generative Adversarial Networks (GANs) are the most popular models for image generation, optimizing the discriminator and generator jointly and gradually. However, instability of the training process remains one of the open problems for all GAN-based algorithms. To stabilize training, several regularization and normalization techniques have been proposed to make the discriminator satisfy the Lipschitz continuity constraint. In this paper, a new approach inspired by work on adversarial attacks is proposed to stabilize the training process of GANs. It is found that, during training, the images produced by the generator sometimes act like adversarial examples for the discriminator, which may be part of the reason for unstable training. Based on this observation, we propose to introduce an adversarial training method into the training process of GANs to improve its stability. We prove that this method (DAT) can adaptively limit the Lipschitz constant of the discriminator. The strong performance of the proposed method is verified on multiple baseline and SOTA networks, such as DCGAN, WGAN, Spectral Normalization GAN, Self-supervised GAN, and Information Maximum GAN.
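The abstract does not detail the adversarial training step; the sketch below shows one common form (an FGSM-style perturbation of discriminator inputs) that matches the general idea described, with the step size `eps` chosen arbitrarily. It assumes a discriminator that outputs a single logit per image.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(D, x, target, eps=0.01):
    """Hypothetical FGSM-style step: nudge the discriminator's input in the
    direction that increases its loss, so the discriminator is trained on
    slightly adversarial versions of real (target=1.0) or fake (target=0.0)
    images. Note: this simplified sketch also accumulates gradients in D's
    parameters, which should be cleared before the optimizer step."""
    x = x.clone().detach().requires_grad_(True)
    logits = D(x)
    loss = F.binary_cross_entropy_with_logits(
        logits, torch.full_like(logits, target)
    )
    loss.backward()
    return (x + eps * x.grad.sign()).detach()
```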

1 citation

References
Proceedings ArticleDOI
27 Jun 2016
TL;DR: In this article, the authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously; this framework won first place in the ILSVRC 2015 classification task.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions1, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
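A minimal PyTorch sketch of the residual idea described above: the stacked layers learn a residual function F(x) and the block outputs F(x) + x through an identity shortcut (the channel counts and layer choices here are illustrative, not the paper's exact architecture).

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal sketch of a basic residual block: the convolutions learn a
    residual F(x) that is added back to the input x via an identity shortcut."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # identity shortcut: output = F(x) + x
```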

123,388 citations

Journal ArticleDOI
08 Dec 2014
TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are trained simultaneously: a generative model G that captures the data distribution and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
Abstract: We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to ½ everywhere. In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation. There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.
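A minimal sketch of the adversarial training procedure described above, assuming a generator G mapping noise to images and a discriminator D returning one logit per sample; the non-saturating generator loss is used here, a common practical substitute for minimizing log(1 − D(G(z))).

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, opt_g, opt_d, real, z_dim=128):
    """One illustrative GAN training step: D is pushed to separate real from
    generated samples, G is pushed to make its samples look real to D."""
    b = real.size(0)
    z = torch.randn(b, z_dim, device=real.device)
    fake = G(z)
    ones = torch.ones(b, 1, device=real.device)
    zeros = torch.zeros(b, 1, device=real.device)

    # Discriminator update: push D(real) toward 1 and D(fake) toward 0
    d_loss = (F.binary_cross_entropy_with_logits(D(real), ones)
              + F.binary_cross_entropy_with_logits(D(fake.detach()), zeros))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator update (non-saturating form): make D classify fakes as real
    g_loss = F.binary_cross_entropy_with_logits(D(fake), ones)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```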

38,211 citations

Proceedings ArticleDOI
21 Jul 2017
TL;DR: DenseNet as mentioned in this paper proposes to connect each layer to every other layer in a feed-forward fashion, which can alleviate the vanishing gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters.
Abstract: Recent work has shown that convolutional networks can be substantially deeper, more accurate, and efficient to train if they contain shorter connections between layers close to the input and those close to the output. In this paper, we embrace this observation and introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion. Whereas traditional convolutional networks with L layers have L connections—one between each layer and its subsequent layer—our network has L(L+1)/2 direct connections. For each layer, the feature-maps of all preceding layers are used as inputs, and its own feature-maps are used as inputs into all subsequent layers. DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters. We evaluate our proposed architecture on four highly competitive object recognition benchmark tasks (CIFAR-10, CIFAR-100, SVHN, and ImageNet). DenseNets obtain significant improvements over the state-of-the-art on most of them, whilst requiring less memory and computation to achieve high performance. Code and pre-trained models are available at https://github.com/liuzhuang13/DenseNet.
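A minimal sketch of the dense connectivity pattern described above: each layer takes the channel-wise concatenation of all preceding feature maps as input and contributes `growth_rate` new feature maps (the layer composition and hyperparameters here are illustrative).

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Minimal sketch of DenseNet-style connectivity: every layer receives the
    concatenation of all preceding feature maps as input."""
    def __init__(self, in_channels, growth_rate=12, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            channels = in_channels + i * growth_rate
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, growth_rate, 3, padding=1, bias=False),
            ))

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            # Each layer sees all earlier feature maps, concatenated on channels
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)
```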

27,821 citations

Posted Content
Sergey Ioffe1, Christian Szegedy1
TL;DR: Batch Normalization as mentioned in this paper normalizes layer inputs for each training mini-batch to reduce the internal covariate shift in deep neural networks, and achieves state-of-the-art performance on ImageNet.
Abstract: Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization. It also acts as a regularizer, in some cases eliminating the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.9% top-5 validation error (and 4.8% test error), exceeding the accuracy of human raters.
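A minimal sketch of the batch-normalizing transform for a fully connected layer's mini-batch input (running statistics for inference and the per-channel convolutional case are omitted); `gamma` and `beta` are the learned scale and shift.

```python
import torch

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Minimal sketch of batch normalization for a (batch, features) tensor:
    normalize each feature to zero mean / unit variance over the mini-batch,
    then apply the learned scale (gamma) and shift (beta)."""
    mean = x.mean(dim=0, keepdim=True)
    var = x.var(dim=0, unbiased=False, keepdim=True)
    x_hat = (x - mean) / torch.sqrt(var + eps)
    return gamma * x_hat + beta
```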

17,184 citations

Journal ArticleDOI
TL;DR: In this paper, an estimation procedure based on adding small positive quantities to the diagonal of X′X is proposed, along with the ridge trace, a method for showing in two dimensions the effects of nonorthogonality.
Abstract: In multiple regression it is shown that parameter estimates based on minimum residual sum of squares have a high probability of being unsatisfactory, if not incorrect, if the prediction vectors are not orthogonal. Proposed is an estimation procedure based on adding small positive quantities to the diagonal of X′X. Introduced is the ridge trace, a method for showing in two dimensions the effects of nonorthogonality. It is then shown how to augment X′X to obtain biased estimates with smaller mean square error.
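A minimal numpy sketch of the ridge estimator described above: the ordinary least-squares normal equations are augmented by adding a small positive constant k to the diagonal of X′X, trading a little bias for a large reduction in variance when the predictors are nearly collinear (the value of k here is illustrative).

```python
import numpy as np

def ridge_estimate(X, y, k=0.1):
    """Ridge estimator beta(k) = (X'X + k I)^{-1} X'y, solved via a linear
    system rather than an explicit matrix inverse."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)
```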

8,091 citations