Topic

Closed captioning

About: Closed captioning is a research topic. Over its lifetime, 3,011 publications have been published within this topic, receiving 64,494 citations. The topic is also known as: CC.


Papers
Book Chapter
23 Aug 2020
TL;DR: A new evaluation framework, the Story Oriented Dense video cAptioning evaluation framework (SODA), is proposed for measuring the performance of video story description systems; it is shown that SODA tends to give lower scores than the current evaluation framework when evaluating captions presented in the incorrect order.
Abstract: Dense Video Captioning (DVC) is a challenging task that localizes all events in a short video and describes them with natural language sentences. The main goal of DVC is video story description, that is, to generate a concise video story that supports human video comprehension without watching it. In recent years, DVC has attracted increasing attention in the vision and language research community and has been employed as a task of the ActivityNet Challenge workshop. In the current research community, the official scorer provided by the ActivityNet Challenge is the de facto standard evaluation framework for DVC systems. It computes averaged METEOR scores for matched pairs between generated and reference captions whose Intersection over Union (IoU) exceeds a specific threshold value. However, the current framework does not take into account the story of the video or the ordering of captions. It also tends to give high scores to systems that generate several hundred redundant captions, which humans cannot read. This paper proposes a new evaluation framework, the Story Oriented Dense video cAptioning evaluation framework (SODA), for measuring the performance of video story description systems. SODA first tries to find a temporally optimal matching between generated and reference captions to capture the story of a video. Then, it computes METEOR scores for the matching and derives F-measure scores from the METEOR scores to penalize redundant captions. To demonstrate that SODA gives low scores to captions that are inadequate in terms of video story description, we evaluate two state-of-the-art systems with it, varying the number of captions. The results show that SODA gives low scores when there are too many or too few captions and high scores when the number of captions equals that of the reference, while the current framework gives good scores in all cases. Furthermore, we show that SODA tends to give lower scores than the current evaluation framework when evaluating captions in the incorrect order.
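To make the scoring idea concrete, below is a minimal sketch of an IoU-matched, F-measure-style scorer in the spirit described above. A unigram-overlap score stands in for METEOR, the matching is greedy rather than temporally optimal, and all names and thresholds are illustrative assumptions; this is not the official SODA scorer.

```python
# Sketch of IoU-matched, F-measure-style caption scoring (illustrative only).
# A unigram-overlap F1 stands in for METEOR; matching is greedy, not optimal.

def temporal_iou(a, b):
    """Intersection over Union of two (start, end) segments in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union if union > 0 else 0.0

def overlap_score(hyp, ref):
    """Crude stand-in for METEOR: unigram F1 between two captions."""
    h, r = set(hyp.lower().split()), set(ref.lower().split())
    if not h or not r:
        return 0.0
    p, rc = len(h & r) / len(h), len(h & r) / len(r)
    return 2 * p * rc / (p + rc) if p + rc else 0.0

def f_measure_score(generated, references, iou_thresh=0.5):
    """generated/references: lists of ((start, end), caption).
    Greedy one-to-one matching on temporal IoU, then precision/recall over
    caption counts, so hundreds of redundant captions are penalized."""
    used, matched_scores = set(), []
    for seg_g, cap_g in generated:
        best, best_j = -1.0, None
        for j, (seg_r, cap_r) in enumerate(references):
            if j in used or temporal_iou(seg_g, seg_r) < iou_thresh:
                continue
            s = overlap_score(cap_g, cap_r)
            if s > best:
                best, best_j = s, j
        if best_j is not None:
            used.add(best_j)
            matched_scores.append(best)
    precision = sum(matched_scores) / max(len(generated), 1)
    recall = sum(matched_scores) / max(len(references), 1)
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

gen = [((0.0, 5.0), "a man opens a door"), ((5.0, 9.0), "he walks outside")]
ref = [((0.5, 5.5), "a man opens the door"), ((5.5, 10.0), "the man walks outside")]
print(round(f_measure_score(gen, ref), 3))
```

Because precision is computed over the number of generated captions, emitting hundreds of redundant captions lowers the score instead of inflating it, which is the behavior the paper argues for.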

19 citations

Proceedings Article
22 May 2022
TL;DR: The proposed GL-RG framework for video captioning, namely Global-Local Representation Granularity, demonstrates three advantages over prior efforts: it explicitly exploits extensive visual representations from different video ranges to improve linguistic expression, devises a global-local encoder to produce a rich semantic vocabulary, and develops an incremental training strategy that organizes model learning in an incremental fashion to achieve optimal captioning behavior.
Abstract: Video captioning is a challenging task as it needs to accurately transform visual understanding into natural language description. To date, state-of-the-art methods inadequately model global-local representation across video frames for caption generation, leaving plenty of room for improvement. In this work, we approach the video captioning task from a new perspective and propose a GL-RG framework for video captioning, namely Global-Local Representation Granularity. Our GL-RG demonstrates three advantages over prior efforts: 1) we explicitly exploit extensive visual representations from different video ranges to improve linguistic expression; 2) we devise a novel global-local encoder to produce a rich semantic vocabulary to obtain a descriptive granularity of video contents across frames; 3) we develop an incremental training strategy which organizes model learning in an incremental fashion to incur an optimal captioning behavior. Experimental results on the challenging MSR-VTT and MSVD datasets show that our GL-RG outperforms recent state-of-the-art methods by a significant margin. Code is available at https://github.com/ylqi/GL-RG.
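As a rough illustration of the global-local idea, the sketch below fuses pooled clip-level (global) context with per-frame (local) features before decoding. Module names, dimensions, and the fusion scheme are assumptions for illustration; this is not the released GL-RG code (see the repository linked above for that).

```python
import torch
import torch.nn as nn

class GlobalLocalEncoder(nn.Module):
    """Toy fusion of global (clip-pooled) and local (per-frame) visual features.
    Layer choices and dimensions are illustrative only."""
    def __init__(self, feat_dim=2048, hidden=512):
        super().__init__()
        self.local_proj = nn.Linear(feat_dim, hidden)
        self.global_proj = nn.Linear(feat_dim, hidden)
        self.fuse = nn.Linear(2 * hidden, hidden)

    def forward(self, frame_feats):                        # (B, T, feat_dim)
        local = self.local_proj(frame_feats)               # per-frame detail
        global_ctx = self.global_proj(frame_feats.mean(1)) # whole-range context
        global_ctx = global_ctx.unsqueeze(1).expand_as(local)
        return torch.tanh(self.fuse(torch.cat([local, global_ctx], dim=-1)))

enc = GlobalLocalEncoder()
out = enc(torch.randn(2, 16, 2048))   # 2 videos, 16 sampled frames each
print(out.shape)                      # torch.Size([2, 16, 512])
```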

19 citations

Proceedings Article
01 Jan 2019
TL;DR: An audio captioning system is proposed that describes non-speech audio signals in the form of natural language; rather than producing an object label or onomatopoeia, it generates a full sentence describing the sounds.
Abstract: We propose an audio captioning system that describes non-speech audio signals in the form of natural language. Unlike existing systems, this system can generate a sentence describing sounds, rather than an object label or onomatopoeia. This allows the description to include more information, such as how the sound is heard and how the tone or volume changes over time, and can accommodate unknown sounds. A major problem in realizing this capability is that the validity of the description depends not only on the sound itself but also on the situation or context. To address this problem, a conditional sequence-to-sequence model is proposed. In this model, a parameter called “specificity” is introduced as a condition to control the amount of information contained in the output text and generate an appropriate description. Experiments show that the model works effectively.
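The conditioning mechanism can be pictured as a decoder that receives a scalar specificity value at every step, for example appended to the word embedding. The model below is a toy, assumption-laden stand-in (the GRU decoder, layer sizes, and the way the condition is injected are all illustrative), not the paper's architecture.

```python
import torch
import torch.nn as nn

class SpecificityConditionedDecoder(nn.Module):
    """Toy caption decoder conditioned on a scalar 'specificity' value that is
    appended to every step's word embedding (illustrative sketch only)."""
    def __init__(self, vocab_size=1000, embed=256, hidden=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed)
        self.rnn = nn.GRU(embed + 1, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, tokens, audio_state, specificity):
        # tokens: (B, L) word ids; audio_state: (1, B, hidden) encoder summary
        # specificity: (B,) scalar controlling how detailed the caption should be
        emb = self.embed(tokens)
        cond = specificity.view(-1, 1, 1).expand(-1, emb.size(1), 1)
        h, _ = self.rnn(torch.cat([emb, cond], dim=-1), audio_state)
        return self.out(h)

dec = SpecificityConditionedDecoder()
logits = dec(torch.randint(0, 1000, (2, 7)),
             torch.zeros(1, 2, 512),
             torch.tensor([0.2, 0.9]))
print(logits.shape)   # torch.Size([2, 7, 1000])
```

Varying the specificity input at inference time is the knob that would trade a terse description against a more detailed one, which is the role the abstract assigns to this parameter.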

19 citations

Journal Article
TL;DR: This work aims at improving the performance and explainability of the state-of-the-art method Show, Attend and Tell by augmenting its attention mechanism with additional bottom-up features, and shows that interactive re-ranking of beam search candidates has the potential to outperform the state of the art in image captioning.
Abstract: Image captioning is a challenging multimodal task. Significant improvements have been obtained with deep learning. Yet, captions generated by humans are still considered better, which makes it an interesting application for interactive machine learning and explainable artificial intelligence methods. In this work, we aim at improving the performance and explainability of the state-of-the-art method Show, Attend and Tell by augmenting its attention mechanism with additional bottom-up features. We compute visual attention on the joint embedding space formed by the union of high-level features and the low-level features obtained from the object-specific salient regions of the input image. We embed the content of bounding boxes from a pre-trained Mask R-CNN model. This delivers state-of-the-art performance while providing explanatory features. Further, we discuss how interactive model improvement can be realized through re-ranking caption candidates using beam search decoders and explanatory features. We show that interactive re-ranking of beam search candidates has the potential to outperform the state of the art in image captioning.
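The re-ranking step described above can be pictured as rescoring beam-search candidates with an auxiliary signal. The snippet below is a hypothetical sketch: the candidate format, the mixing weight, and the object-label reward are assumptions, not the authors' actual scoring function.

```python
def rerank_candidates(candidates, aux_score, alpha=0.5):
    """candidates: list of (caption, log_prob) pairs from a beam search decoder.
    aux_score: callable returning an auxiliary score for a caption (e.g. derived
    from explanatory features or user feedback). alpha is an illustrative weight."""
    rescored = [(cap, (1 - alpha) * lp + alpha * aux_score(cap))
                for cap, lp in candidates]
    return sorted(rescored, key=lambda x: x[1], reverse=True)

beams = [("a dog runs on grass", -4.1), ("a brown dog plays outside", -4.6)]
# Hypothetical auxiliary score: reward captions that mention detected object labels.
detected = {"dog", "grass", "ball"}
best = rerank_candidates(beams, lambda c: sum(w in detected for w in c.split()))
print(best[0][0])   # "a dog runs on grass"
```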

19 citations

Journal Article
TL;DR: A novel summarization-driven RS image captioning (SD-RSIC) approach is proposed that summarizes the ground-truth captions of each training image into a single caption by exploiting sequence-to-sequence neural networks, eliminating the redundancy present in the training set.
Abstract: Deep neural networks (DNNs) have recently become popular for image captioning problems in remote sensing (RS). Existing DNN-based approaches rely on the availability of a training set made up of a high number of RS images with their captions. However, captions of training images may contain redundant information (they can be repetitive or semantically similar to each other), resulting in information deficiency while learning a mapping from the image domain to the language domain. To overcome this limitation, in this paper, we present a novel Summarization Driven Remote Sensing Image Captioning (SD-RSIC) approach. The proposed approach consists of three main steps. The first step obtains the standard image captions by jointly exploiting convolutional neural networks (CNNs) with long short-term memory (LSTM) networks. The second step, unlike the existing RS image captioning methods, summarizes the ground-truth captions of each training image into a single caption by exploiting sequence-to-sequence neural networks and eliminates the redundancy present in the training set. The third step automatically defines the adaptive weights associated with each RS image to combine the standard captions with the summarized captions based on the semantic content of the image. This is achieved by a novel adaptive weighting strategy defined in the context of LSTM networks. Experimental results obtained on the RSICD, UCM-Captions and Sydney-Captions datasets show the effectiveness of the proposed approach compared to the state-of-the-art RS image captioning approaches. The code of the proposed approach is publicly available at this https URL.
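The third step's adaptive combination can be sketched as an image-conditioned gate that mixes the two caption sources. The module below is an illustrative assumption (a single sigmoid gate over mock logits), not the published SD-RSIC implementation; the paper defines its weighting strategy in the context of LSTM networks.

```python
import torch
import torch.nn as nn

class AdaptiveCaptionWeighting(nn.Module):
    """Toy image-dependent gate that mixes 'standard' and 'summarized' caption
    logits (illustrative sketch, not the released SD-RSIC code)."""
    def __init__(self, img_dim=512):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(img_dim, 1), nn.Sigmoid())

    def forward(self, img_feat, standard_logits, summary_logits):
        # img_feat: (B, img_dim); *_logits: (B, L, vocab)
        w = self.gate(img_feat).unsqueeze(-1)   # (B, 1, 1), weight in [0, 1]
        return w * standard_logits + (1 - w) * summary_logits

m = AdaptiveCaptionWeighting()
mixed = m(torch.randn(2, 512), torch.randn(2, 9, 1000), torch.randn(2, 9, 1000))
print(mixed.shape)   # torch.Size([2, 9, 1000])
```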

19 citations


Network Information
Related Topics (5)
Topic                           Papers    Citations    Related
Feature vector                  48.8K     954.4K       83%
Object detection                46.1K     1.3M         82%
Convolutional neural network    74.7K     2M           82%
Deep learning                   79.8K     2.1M         82%
Unsupervised learning           22.7K     1M           81%
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    536
2022    1,030
2021    504
2020    530
2019    448
2018    334