Topic

Closed captioning

About: Closed captioning is a research topic. Over its lifetime, 3,011 publications have been published within this topic, receiving 64,494 citations. The topic is also known as: CC.


Papers
Proceedings ArticleDOI
01 Jun 2019
TL;DR: In this paper, an online gradient-based method is proposed to automatically select question-relevant captions from an existing caption dataset, achieving state-of-the-art VQA performance.
Abstract: Visual question answering (VQA) and image captioning require a shared body of general knowledge connecting language and vision. We present a novel approach to better VQA performance that exploits this connection by jointly generating captions that are targeted to help answer a specific visual question. The model is trained using an existing caption dataset by automatically determining question-relevant captions using an online gradient-based method. Experimental results on the VQA v2 challenge demonstrate that our approach obtains state-of-the-art VQA performance (e.g., 68.4% on the Test-standard set using a single model) by simultaneously generating question-relevant captions.

35 citations
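
The joint objective described above can be summarized as a weighted sum of a VQA answer loss and a caption-generation loss computed over shared visual features. Below is a minimal, hypothetical PyTorch sketch of that combination; the function name, the padding index, and the weight lambda_cap are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def joint_vqa_caption_loss(answer_logits, answer_targets,
                           caption_logits, caption_targets,
                           lambda_cap=0.5):
    """Hypothetical joint objective: VQA answering plus question-relevant
    caption generation. lambda_cap (an assumed value) balances the terms."""
    # Multi-label answer loss; VQA v2 scores answers with soft targets.
    vqa_loss = F.binary_cross_entropy_with_logits(answer_logits, answer_targets)
    # Token-level cross-entropy for the generated caption.
    cap_loss = F.cross_entropy(
        caption_logits.reshape(-1, caption_logits.size(-1)),
        caption_targets.reshape(-1),
        ignore_index=0,  # assumes 0 is the padding token id
    )
    return vqa_loss + lambda_cap * cap_loss
```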

Posted Content
TL;DR: A new metric for measuring the diversity of image captions is proposed, derived from latent semantic analysis and kernelized to use CIDEr similarity; experiments show that balancing the cross-entropy loss and CIDEr reward in reinforcement learning during training can effectively control the tradeoff between the diversity and accuracy of the generated captions.
Abstract: Recently, the state-of-the-art models for image captioning have overtaken human performance based on the most popular metrics, such as BLEU, METEOR, ROUGE, and CIDEr. Does this mean we have solved the task of image captioning? The above metrics only measure the similarity of the generated caption to the human annotations, which reflects its accuracy. However, an image contains many concepts and multiple levels of detail, and thus there is a variety of captions that express different concepts and details that might be interesting to different humans. Therefore, only evaluating accuracy is not sufficient for measuring the performance of captioning models: the diversity of the generated captions should also be considered. In this paper, we propose a new metric for measuring the diversity of image captions, which is derived from latent semantic analysis and kernelized to use CIDEr similarity. We conduct extensive experiments to re-evaluate recent captioning models in the context of both diversity and accuracy. We find that there is still a large gap between model and human performance in terms of both accuracy and diversity, and that models optimized for accuracy (CIDEr) have low diversity. We also show that balancing the cross-entropy loss and CIDEr reward in reinforcement learning during training can effectively control the tradeoff between the diversity and accuracy of the generated captions.

35 citations
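
The kernelized-LSA idea above can be read as a spectral measure: build a pairwise similarity kernel over a set of captions for the same image and check how concentrated its spectrum is; if one singular value dominates, the captions are near-duplicates. The sketch below is only an illustration of that reading: plain TF-IDF cosine stands in for CIDEr similarity, and the ratio-based score is an assumption, not the paper's exact formula.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def diversity_score(captions):
    """Spectral diversity of a caption set: near 0 when the captions are
    near-duplicates, approaching 1 as they become mutually dissimilar.
    TF-IDF cosine stands in for CIDEr similarity (an assumption)."""
    tfidf = TfidfVectorizer().fit_transform(captions)  # n x vocab, rows L2-normalized
    K = (tfidf @ tfidf.T).toarray()                    # n x n cosine-similarity kernel
    s = np.sqrt(np.maximum(np.linalg.svd(K, compute_uv=False), 0.0))
    return 1.0 - s[0] / s.sum()  # 0 if one singular value carries all the mass

print(diversity_score(["a dog runs on the grass",
                       "a dog is running on the grass",
                       "two children fly a kite"]))
```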

Journal ArticleDOI
TL;DR: This paper proposes a novel video captioning architecture that uses a soft attention mechanism to select frames relevant to visual concepts based on previously generated words, and implements the memorization of temporal dynamics with memory networks, which are well suited to retaining long-term information.

35 citations
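
The frame-selection step in the TL;DR above is a standard soft-attention computation: score each frame feature against the decoder's current hidden state (which summarizes the previously generated words), then form a softmax-weighted average. A minimal sketch, with all shapes and parameter names assumed rather than taken from the paper:

```python
import torch
import torch.nn.functional as F

def soft_attention(frame_feats, hidden, W_f, W_h, v):
    """Additive (Bahdanau-style) attention over video frames.
    frame_feats: (T, D) per-frame features; hidden: (H,) decoder state;
    W_f: (A, D), W_h: (A, H), v: (A,) are learned projections (assumed shapes)."""
    scores = torch.tanh(frame_feats @ W_f.T + hidden @ W_h.T) @ v  # (T,) one score per frame
    alpha = F.softmax(scores, dim=0)   # attention weights over the T frames
    context = alpha @ frame_feats      # (D,) weighted summary fed to the decoder
    return context, alpha
```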

Proceedings ArticleDOI
01 Feb 2020
TL;DR: Different approaches to image captioning, such as retrieval-based, template-based, and deep-learning-based methods, as well as different evaluation techniques, are presented.
Abstract: The primary purpose of image captioning is to generate a caption for an image. Image captioning needs to identify the objects in an image, their actions, their relationships, and salient features that may not be explicit in the image. After identification, the next step is to generate a relevant and brief description of the image that is syntactically and semantically correct. This uses computer vision concepts to identify the objects and natural language processing methods to produce the description. It is difficult for a machine to imitate the abilities of the human brain; however, research in this field has shown great progress. Deep learning techniques are capable of handling such problems using CNNs and LSTMs, and they can be used in many intelligent control systems and IoT-based devices. In this survey paper, we present different approaches to image captioning, such as retrieval-based, template-based, and deep-learning-based methods, as well as different evaluation techniques.

35 citations
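
The CNN+LSTM pipeline that the survey refers to is usually an encoder-decoder: a pretrained CNN compresses the image into a feature vector, which then conditions an LSTM language model that emits the caption one token at a time. A minimal PyTorch sketch under assumed sizes (feature, embedding, hidden, and vocabulary dimensions are all placeholders):

```python
import torch
import torch.nn as nn

class CaptionDecoder(nn.Module):
    """Minimal CNN-feature-to-LSTM caption decoder (all sizes are assumptions)."""
    def __init__(self, feat_dim=2048, embed_dim=256, hidden_dim=512, vocab=10000):
        super().__init__()
        self.init_h = nn.Linear(feat_dim, hidden_dim)  # image feature -> initial state
        self.embed = nn.Embedding(vocab, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab)

    def forward(self, image_feat, tokens):
        # image_feat: (B, feat_dim) from a pretrained CNN; tokens: (B, L) caption so far
        h0 = torch.tanh(self.init_h(image_feat)).unsqueeze(0)  # (1, B, hidden_dim)
        c0 = torch.zeros_like(h0)
        outputs, _ = self.lstm(self.embed(tokens), (h0, c0))
        return self.out(outputs)  # (B, L, vocab) next-token logits
```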

Proceedings ArticleDOI
04 Sep 2019
TL;DR: This paper focuses on developing a generative model connecting machine translation and computer vision to generate image descriptions in Bahasa Indonesia, using the pre-trained Inception-v3 image embedding model stacked with a Gated Recurrent Unit (GRU) layer.
Abstract: Recent research on image captioning has focused on generating proper descriptions for images in English. No previous research has been found on image captioning that generates descriptions in Bahasa Indonesia. In fact, according to Wikipedia, Bahasa Indonesia is spoken by 198.7 million people worldwide and ranks 10th among the most used languages. This paper focuses on developing a generative model connecting machine translation and computer vision to generate image descriptions in Bahasa Indonesia. The model uses the pre-trained Inception-v3 image embedding model stacked with a Gated Recurrent Unit (GRU) layer. The proposed model was trained and validated on the translated Flickr30K dataset and obtained BLEU-1, BLEU-2, BLEU-3, and BLEU-4 scores of 36, 17, 6, and 2, respectively.

35 citations
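
The Inception-v3-plus-GRU stack described above can be approximated in Keras: a frozen InceptionV3 supplies a pooled image embedding that initializes a GRU decoder over the caption tokens. The wiring and all sizes below are assumptions for illustration, not the authors' exact configuration.

```python
import tensorflow as tf

# Pretrained InceptionV3 as a frozen image encoder with global average pooling.
encoder = tf.keras.applications.InceptionV3(
    include_top=False, weights="imagenet", pooling="avg")
encoder.trainable = False

VOCAB, EMBED, UNITS, MAXLEN = 10000, 256, 512, 20  # assumed sizes

image_in = tf.keras.Input(shape=(299, 299, 3))
tokens_in = tf.keras.Input(shape=(MAXLEN,), dtype="int32")

# Project the 2048-d image embedding to the GRU state size.
feats = tf.keras.layers.Dense(UNITS, activation="tanh")(encoder(image_in))
emb = tf.keras.layers.Embedding(VOCAB, EMBED, mask_zero=True)(tokens_in)
seq = tf.keras.layers.GRU(UNITS, return_sequences=True)(emb, initial_state=feats)
logits = tf.keras.layers.Dense(VOCAB)(seq)  # next-token logits per position

model = tf.keras.Model([image_in, tokens_in], logits)
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
```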


Network Information
Related Topics (5)
Feature vector: 48.8K papers, 954.4K citations (83% related)
Object detection: 46.1K papers, 1.3M citations (82% related)
Convolutional neural network: 74.7K papers, 2M citations (82% related)
Deep learning: 79.8K papers, 2.1M citations (82% related)
Unsupervised learning: 22.7K papers, 1M citations (81% related)
Performance Metrics
Number of papers in the topic in previous years:

Year    Papers
2023    536
2022    1,030
2021    504
2020    530
2019    448
2018    334