Towards Diverse and Natural Image Descriptions via a Conditional GAN

doi:10.1109/ICCV.2017.323

Open Access, Proceedings Article

- pp 2989-2998

TLDR

This article proposes a new framework based on conditional generative adversarial networks (CGAN) that jointly learns a generator to produce descriptions conditioned on images and an evaluator to assess how well a description fits the visual content.

Abstract:

Despite substantial progress in recent years, image captioning techniques are still far from perfect. Sentences produced by existing methods, e.g. those based on RNNs, are often overly rigid and lacking in variability. This issue is related to a learning principle widely used in practice, namely maximizing the likelihood of training samples. This principle encourages high resemblance to the “ground-truth” captions while suppressing other reasonable descriptions. Conventional evaluation metrics, e.g. BLEU and METEOR, also favor such restrictive methods. In this paper, we explore an alternative approach, with the aim of improving naturalness and diversity, two essential properties of human expression. Specifically, we propose a new framework based on Conditional Generative Adversarial Networks (CGAN), which jointly learns a generator to produce descriptions conditioned on images and an evaluator to assess how well a description fits the visual content. It is noteworthy that training a sequence generator is nontrivial. We overcome the difficulty using Policy Gradient, a strategy stemming from Reinforcement Learning, which allows the generator to receive early feedback along the way. We tested our method on two large datasets, where it performed competitively against real people in our user study and outperformed other methods on various tasks.
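The policy-gradient idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the generator here is a toy table of per-position logits that ignores the image, `toy_evaluator` is a hypothetical hand-written stand-in for the learned evaluator, and the "early feedback" is approximated with Monte Carlo rollouts, which is one common way to score partial sequences and is assumed here rather than taken from the abstract. All names and hyperparameters are illustrative.

```python
import math
import random

VOCAB = ["a", "dog", "runs", "sits", "<eos>"]  # toy vocabulary (assumption)
MAX_LEN = 3                                     # fixed caption length (assumption)
TARGET = [0, 1, 2]                              # "a dog runs", used only by the toy evaluator


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, rng):
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def toy_evaluator(tokens):
    """Hypothetical stand-in for a learned evaluator: returns a score in [0, 1]."""
    return sum(1 for t, g in zip(tokens, TARGET) if t == g) / len(TARGET)


def rollout_value(theta, prefix, rng, n_rollouts=16):
    """Estimate the expected final score of a partial caption by completing it
    several times with the current generator and averaging the evaluator score."""
    total = 0.0
    for _ in range(n_rollouts):
        seq = list(prefix)
        for t in range(len(prefix), MAX_LEN):
            seq.append(sample_token(theta[t], rng))
        total += toy_evaluator(seq)
    return total / n_rollouts


def train_step(theta, rng, lr=0.5):
    """One REINFORCE-style update with a per-step advantage from rollouts."""
    seq = [sample_token(theta[t], rng) for t in range(MAX_LEN)]
    # values[t] estimates the expected final score after the first t tokens.
    values = [rollout_value(theta, seq[:t], rng) for t in range(MAX_LEN + 1)]
    for t in range(MAX_LEN):
        advantage = values[t + 1] - values[t]  # feedback for the token chosen at step t
        probs = softmax(theta[t])
        for k in range(len(VOCAB)):
            # Gradient of log softmax w.r.t. logit k: one-hot(chosen) - probs.
            grad_log_prob = (1.0 if k == seq[t] else 0.0) - probs[k]
            theta[t][k] += lr * advantage * grad_log_prob


def train(steps=400, seed=0):
    rng = random.Random(seed)
    theta = [[0.0] * len(VOCAB) for _ in range(MAX_LEN)]
    for _ in range(steps):
        train_step(theta, rng)
    return theta


def greedy_caption(theta):
    return [VOCAB[max(range(len(VOCAB)), key=lambda k: row[k])] for row in theta]


if __name__ == "__main__":
    print(" ".join(greedy_caption(train())))
```

Because each token receives its own advantage (the change in estimated score before and after that token), the generator gets a learning signal at every step rather than a single reward at the end of the sentence. In a full CGAN setup the evaluator would itself be trained against the generator; that part is omitted here.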

Citations

Journal Article

Deep Learning for IoT Big Data and Streaming Analytics: A Survey

Mehdi Mohammadi, +3 more

- 06 Jun 2018 -

IEEE Communications Surveys and Tutorial...

TL;DR: In this article, the authors provide a thorough overview of the use of deep learning (DL), a class of advanced machine learning techniques, to facilitate analytics and learning in the IoT domain.

Journal Article

A Comprehensive Survey of Deep Learning for Image Captioning

Md. Zakir Hossain, +3 more

- 04 Feb 2019 -

ACM Computing Surveys

TL;DR: This article presents a comprehensive review of deep learning-based image captioning techniques, discussing their foundations and analyzing their performance, strengths, and limitations.

Proceedings Article

Scene Graph Generation from Objects, Phrases and Region Captions

Yikang Li, +4 more

TL;DR: This paper proposes a Multi-level Scene Description Network (MSDN) that solves three vision tasks jointly in an end-to-end manner, with object, phrase, and caption regions aligned in a dynamic graph based on their spatial and semantic connections.

Posted Content

A Review on Generative Adversarial Networks: Algorithms, Theory, and Applications

Jie Gui, +4 more

- 20 Jan 2020 -

arXiv: Learning

TL;DR: This paper reviews various GAN methods from the perspectives of algorithms, theory, and applications, and compares the commonalities and differences among these methods.

Proceedings Article

Adversarial ranking for language generation

Kevin Lin, +4 more

TL;DR: This paper proposes a novel generative adversarial network, RankGAN, for generating high-quality language descriptions. It views a set of data samples collectively and evaluates their quality through relative ranking scores, which yields a better assessment and in turn helps to learn a better generator.

References

Journal Article

Long short-term memory

Sepp Hochreiter, +1 more

- 01 Nov 1997 -

Neural Computation

TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.

Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan, +1 more

TL;DR: This work investigates the effect of convolutional network depth on accuracy in the large-scale image recognition setting, using an architecture with very small convolution filters, and shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.

Journal Article

Generative Adversarial Nets

Ian Goodfellow, +7 more

TL;DR: This paper proposes a new framework for estimating generative models via an adversarial process, in which two models are trained simultaneously: a generative model G that captures the data distribution and a discriminative model D that estimates the probability that a sample came from the training data rather than G.

Book Chapter

Microsoft COCO: Common Objects in Context

Tsung-Yi Lin, +7 more

TL;DR: This paper presents a new dataset that aims to advance the state of the art in object recognition by placing it in the context of the broader question of scene understanding. The dataset gathers images of complex everyday scenes containing common objects in their natural context.

Related Papers (5)

Show and tell: A neural image caption generator

Bleu: a Method for Automatic Evaluation of Machine Translation

Deep Residual Learning for Image Recognition

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

Deep visual-semantic alignments for generating image descriptions