“Factual” or “Emotional”: Stylized Image Captioning with Adaptive Learning and Attention
Tianlang Chen, Zhongping Zhang, Quanzeng You, Chen Fang, Zhaowen Wang, Hailin Jin, Jiebo Luo
- pp 527-543
TLDR
TL;DR: A novel stylized image captioning model is proposed that effectively takes both factual and stylized knowledge into consideration and outperforms state-of-the-art approaches without using extra ground-truth supervision.

Citations
Posted Content
Evaluation of Text Generation: A Survey
TL;DR: This paper surveys evaluation methods of natural language generation (NLG) systems that have been developed in the last few years, with a focus on the evaluation of recently proposed NLG tasks and neural NLG models.
Proceedings ArticleDOI
Reasoning Visual Dialogs With Structural and Partial Observations
TL;DR: This paper introduces an Expectation Maximization algorithm to infer both the underlying dialog structures and the missing node values (desired answers) and proposes a differentiable graph neural network (GNN) solution that approximates this process.
Proceedings ArticleDOI
MSCap: Multi-Style Image Captioning With Unpaired Stylized Text
TL;DR: An adversarial learning network is proposed for multi-style image captioning (MSCap), trained on a standard factual image-caption dataset together with a multi-stylized language corpus that has no paired images, to produce more natural and human-like captions.
Proceedings ArticleDOI
Human-like Controllable Image Captioning with Verb-specific Semantic Roles
TL;DR: In this article, the authors propose a new control signal for controllable image captioning (CIC): Verb-specific Semantic Roles (VSR). A VSR consists of a verb and a set of semantic roles, representing a targeted activity and the roles of the entities involved in it.
Journal ArticleDOI
MemCap: Memorizing Style Knowledge for Image Captioning
TL;DR: This paper proposes MemCap, a novel stylized image captioning method that explicitly encodes knowledge about linguistic styles with a memory mechanism, extracts content-relevant style knowledge from the memory module via attention, and incorporates the extracted knowledge into a language model.
References
Proceedings ArticleDOI
Deep Residual Learning for Image Recognition
TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
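The core idea can be sketched in a few lines of plain Python; the scalar toy function below is illustrative, not the paper's implementation. Each block learns a residual on top of an identity shortcut, so very deep stacks remain easy to optimize:

```python
def residual_block(x, f):
    """Residual learning: the block outputs f(x) + x, so the identity
    shortcut carries the input forward even if f contributes little."""
    return f(x) + x

# If the learned transform f collapses to zero, the block degrades
# gracefully to the identity map instead of losing the signal.
identity_out = residual_block(7.0, lambda v: 0.0)     # 7.0
scaled_out = residual_block(10.0, lambda v: v * 0.5)  # 15.0
```

In the actual architecture, `f` is a small stack of convolutional layers and `x + f(x)` is computed per feature map, but the shortcut structure is the same.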
Proceedings Article
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan, Andrew Zisserman
TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Posted Content
Neural Machine Translation by Jointly Learning to Align and Translate
TL;DR: In this paper, the authors propose to use a soft-searching model to find the parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
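That soft-search idea can be illustrated with a minimal numeric sketch (the function name and scalar setup here are assumptions for illustration): alignment scores are normalized with a softmax, and the relevant source parts are read out as a weighted sum rather than a hard segment:

```python
import math

def soft_align(scores, values):
    """Softmax over alignment scores, then a weighted sum of the
    source annotations: a soft search instead of hard segmentation."""
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    context = sum(w * v for w, v in zip(weights, values))
    return context, weights

# Equal scores spread attention evenly over the source positions;
# raising one score shifts the weighted sum toward that position.
ctx, w = soft_align([0.0, 0.0], [1.0, 3.0])  # ctx == 2.0, w == [0.5, 0.5]
```

In the full model, the scores themselves are produced by a small learned network comparing the decoder state with each encoder annotation.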
Proceedings Article
Sequence to Sequence Learning with Neural Networks
TL;DR: The authors used a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector.
Book ChapterDOI
Perceptual Losses for Real-Time Style Transfer and Super-Resolution
TL;DR: In this paper, the authors combine the benefits of both approaches and propose the use of perceptual loss functions for training feed-forward networks for image style transfer, where a feed-forward network is trained to solve, in real time, the optimization problem proposed by Gatys et al.
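The gist of a perceptual loss can be sketched in plain Python with a stand-in feature extractor (`feat` below is a placeholder assumption; the paper uses activations of a pretrained CNN): images are compared in a feature space rather than pixel space:

```python
def perceptual_loss(feat, x, y):
    """Mean squared error between feature representations of x and y,
    rather than between raw pixel values."""
    fx, fy = feat(x), feat(y)
    return sum((a - b) ** 2 for a, b in zip(fx, fy)) / len(fx)

# With an identity 'feature extractor' this reduces to pixel-wise MSE;
# swapping in deep-network features makes the loss perceptual.
loss = perceptual_loss(lambda v: v, [1.0, 2.0], [1.0, 4.0])  # 2.0
```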