Lantao Yu

Researcher at Stanford University

Publications -  41
Citations -  4493

Lantao Yu is an academic researcher from Stanford University. The author has contributed to research in topics including reinforcement learning and generative models. The author has an h-index of 15 and has co-authored 38 publications receiving 3063 citations. Previous affiliations of Lantao Yu include Shanghai Jiao Tong University.

Papers
Proceedings Article

SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient

TL;DR: SeqGAN models the data generator as a stochastic policy in reinforcement learning (RL); the RL reward signal comes from the discriminator judging a complete sequence and is passed back to the intermediate state-action steps using Monte Carlo search.
Posted Content

SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient

TL;DR: By modeling the data generator as a stochastic policy in reinforcement learning (RL), SeqGAN bypasses the generator differentiation problem by directly performing a policy gradient update.
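
The summaries above describe SeqGAN's core training loop: the generator is treated as an RL policy, complete sequences are scored by the discriminator, and intermediate tokens receive rewards via Monte Carlo rollouts. The sketch below illustrates that idea in PyTorch-style pseudocode; the `generator`, `discriminator`, and `rollout` objects and their methods are hypothetical placeholders, not the authors' released implementation.

```python
import torch

def policy_gradient_step(generator, discriminator, rollout, optimizer,
                         batch_size=64, seq_len=20, n_rollouts=16):
    """One REINFORCE-style update: sample sequences from the generator,
    score them with the discriminator, and use Monte Carlo rollouts to
    assign a reward to every intermediate token."""
    # Sample complete sequences from the generator's stochastic policy.
    sequences, log_probs = generator.sample(batch_size, seq_len)  # (B, T), (B, T)

    # Estimate per-step rewards: complete each prefix n_rollouts times and
    # average the discriminator's probability that the completion is real.
    with torch.no_grad():
        rewards = torch.zeros(batch_size, seq_len)
        for t in range(1, seq_len + 1):
            if t < seq_len:
                completions = rollout.complete(sequences[:, :t], n_rollouts)  # (R, B, T)
                rewards[:, t - 1] = discriminator(completions).mean(dim=0)
            else:
                # The full sequence needs no rollout; score it directly.
                rewards[:, t - 1] = discriminator(sequences.unsqueeze(0)).squeeze(0)

    # REINFORCE: maximize expected reward, i.e. minimize -E[R * log pi].
    loss = -(rewards * log_probs).sum(dim=1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the reward only reaches the generator through sampled tokens and the REINFORCE term, no gradient ever needs to flow through the discrete sampling step, which is the "generator differentiation problem" the TL;DR refers to.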
Proceedings ArticleDOI

IRGAN: A Minimax Game for Unifying Generative and Discriminative Information Retrieval Models

TL;DR: A unified framework takes advantage of both schools of thinking in information retrieval modelling: the generative model learns to fit the relevance distribution over documents via signals from the discriminative model, achieving a better estimate for document ranking.
Proceedings ArticleDOI

IRGAN: A Minimax Game for Unifying Generative and Discriminative Information Retrieval Models

TL;DR: In this paper, a game-theoretical minimax formulation is proposed to iteratively optimise both generative and discriminative models for document ranking, with the generative model trained to fit the relevance distribution over documents via signals from the discriminator.
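
The two IRGAN entries describe a minimax game in which a generative retrieval model samples documents for a query and a discriminative model scores query-document relevance. The following is a schematic sketch of one alternating update under that framing; all names (`generator`, `discriminator`, their `sample` and scoring interfaces) are illustrative assumptions rather than the paper's code.

```python
import torch
import torch.nn.functional as F

def irgan_step(query, true_docs, candidate_docs, generator, discriminator,
               g_opt, d_opt, n_samples=5):
    # --- Discriminator step: separate labeled relevant docs from generator samples.
    with torch.no_grad():
        fake_docs, _ = generator.sample(query, candidate_docs, n_samples)
    d_real = discriminator(query, true_docs)   # relevance scores for true docs
    d_fake = discriminator(query, fake_docs)   # relevance scores for sampled docs
    d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # --- Generator step: document sampling is discrete, so the generator is
    # updated by policy gradient, using the discriminator's score as the reward.
    fake_docs, log_probs = generator.sample(query, candidate_docs, n_samples)
    reward = torch.sigmoid(discriminator(query, fake_docs)).detach()
    g_loss = -(reward * log_probs).mean()
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```

In this view the generator gradually shifts probability mass toward documents the discriminator judges relevant, which is the "fitting the relevance distribution" behaviour the TL;DRs describe.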
Proceedings Article

MOPO: Model-based Offline Policy Optimization

TL;DR: Model-based offline policy optimization (MOPO) modifies existing model-based RL methods by applying them with rewards artificially penalized by the uncertainty of the dynamics, and theoretically shows that the algorithm maximizes a lower bound of the policy's return under the true MDP.
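
The MOPO summary centers on a single mechanism: rewards generated by a learned dynamics model are reduced in proportion to the model's predictive uncertainty, so the policy avoids regions the model cannot be trusted in. Below is a condensed sketch of that penalized model step, assuming a probabilistic dynamics ensemble with a hypothetical `predict` interface; the names and shapes are illustrative, not the authors' implementation.

```python
import numpy as np

def penalized_step(ensemble, state, action, penalty_coef=1.0):
    """Predict the next state and reward with a model ensemble, then subtract
    a penalty proportional to the predictive uncertainty, discouraging the
    policy from exploiting regions where the learned model is unreliable."""
    # Each ensemble member predicts a Gaussian over (next_state, reward).
    means, stds = ensemble.predict(state, action)   # shapes: (E, d+1), (E, d+1)

    # Sample one member's mean prediction for the rollout transition.
    member = np.random.randint(len(means))
    next_state, reward = means[member][:-1], means[member][-1]

    # Uncertainty estimate: largest predictive standard deviation across members.
    uncertainty = stds.max()
    penalized_reward = reward - penalty_coef * uncertainty
    return next_state, penalized_reward
```

Penalizing rewards this way is what allows the return under the learned model to lower-bound the return under the true MDP, which is the theoretical guarantee mentioned in the TL;DR.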