Topic

Closed captioning

About: Closed captioning is a research topic. Over its lifetime, 3,011 publications have been published within this topic, receiving 64,494 citations. The topic is also known as: CC.


Papers
Proceedings ArticleDOI
01 Jul 2019
TL;DR: By combining text and image information, a machine learning approach can be built that accurately distinguishes between the text-image relationship types and can be applied directly in end-user applications to optimize screen real estate.
Abstract: Text in social media posts is frequently accompanied by images in order to provide content, supply context, or to express feelings. This paper studies how the meaning of the entire tweet is composed through the relationship between its textual content and its image. We build and release a data set of image tweets annotated with four classes which express whether the text or the image provides additional information to the other modality. We show that by combining the text and image information, we can build a machine learning approach that accurately distinguishes between the relationship types. Further, we derive insights into how these relationships are materialized through text and image content analysis and how they are impacted by user demographic traits. These methods can be used in several downstream applications, including pre-training image tagging models and collecting distantly supervised data for image captioning, and they can be applied directly in end-user applications to optimize screen real estate.
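The fusion described in the abstract (combining text and image features to classify the relationship type) can be pictured with a minimal late-fusion classifier. The sketch below is an illustration under assumed feature dimensions and a simple concatenation-plus-MLP design, not the authors' released model; the four output classes correspond to the annotation scheme mentioned above.

```python
# Minimal sketch (not the paper's code): a late-fusion classifier over the
# four text-image relationship classes described in the abstract.
# The encoder dimensions and the concatenation-based fusion are assumptions.
import torch
import torch.nn as nn

class TextImageRelationClassifier(nn.Module):
    def __init__(self, text_dim=768, image_dim=2048, hidden_dim=512, num_classes=4):
        super().__init__()
        self.fusion = nn.Sequential(
            nn.Linear(text_dim + image_dim, hidden_dim),  # fuse both modalities
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden_dim, num_classes),           # 4 relationship types
        )

    def forward(self, text_feat, image_feat):
        # text_feat: (batch, text_dim) pooled tweet-text features
        # image_feat: (batch, image_dim) pooled image features
        return self.fusion(torch.cat([text_feat, image_feat], dim=-1))

# Usage with random features standing in for real encoder outputs.
model = TextImageRelationClassifier()
logits = model(torch.randn(8, 768), torch.randn(8, 2048))
print(logits.shape)  # torch.Size([8, 4])
```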

42 citations

Proceedings ArticleDOI
01 Jun 2021
TL;DR: A Sketch, Ground, and Refine (SGR) model is proposed that first generates a paragraph from a global view and then grounds each event description to a video segment for detailed refinement.
Abstract: The dense video captioning task aims to detect and describe a sequence of events in a video for detailed and coherent storytelling. Previous works mainly adopt a "detect-then-describe" framework, which first detects event proposals in the video and then generates descriptions for the detected events. However, the definitions of events are diverse: an event could be as simple as a single action or as complex as a set of events, depending on the semantic context. Therefore, directly detecting events based on video information is ill-defined and hurts the coherency and accuracy of generated dense captions. In this work, we reverse the predominant "detect-then-describe" fashion, proposing a top-down way to first generate paragraphs from a global view and then ground each event description to a video segment for detailed refinement. It is formulated as a Sketch, Ground, and Refine process (SGR). The sketch stage first generates a coarse-grained multi-sentence paragraph to describe the whole video, where each sentence is treated as an event and gets localised in the grounding stage. In the refining stage, we improve captioning quality via refinement-enhanced training and dual-path cross attention on both coarse-grained event captions and aligned event segments. The updated event caption can further adjust its segment boundaries. Our SGR model outperforms state-of-the-art methods on the ActivityNet Captioning benchmark under traditional and story-oriented dense caption evaluations. Code will be released at github.com/bearcatt/SGR.
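To make the grounding stage concrete, the sketch below shows one simple way a sentence from the coarse paragraph could be localised to a video segment via sentence-to-frame attention. It is an illustrative placeholder, not the released SGR code (github.com/bearcatt/SGR); the feature dimensions and the (center, width) segment parameterisation are assumptions.

```python
# Illustrative sketch of the grounding step in a top-down
# "sketch, then ground" pipeline; module internals are simplified placeholders.
import torch
import torch.nn as nn

class Grounder(nn.Module):
    """Localise each sentence of the sketch paragraph to a video segment."""
    def __init__(self, dim=512):
        super().__init__()
        self.boundary = nn.Linear(dim, 2)  # predict (center, width) in [0, 1]

    def forward(self, sent_feats, frame_feats):
        # sent_feats: (num_sents, dim); frame_feats: (num_frames, dim)
        attn = torch.softmax(sent_feats @ frame_feats.T, dim=-1)  # sentence-to-frame attention
        context = attn @ frame_feats                              # attended video context
        return torch.sigmoid(self.boundary(context))              # segment per sentence

# Toy features standing in for sketch-stage sentence states and video frames.
sent_feats = torch.randn(3, 512)     # 3 coarse event sentences
frame_feats = torch.randn(100, 512)  # 100 video frames
segments = Grounder()(sent_feats, frame_feats)
print(segments)  # one (center, width) pair per event sentence
```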

41 citations

Proceedings ArticleDOI
14 Jun 2020
TL;DR: A caption-editing model is proposed consisting of two sub-modules: EditNet, a language module with an adaptive copy mechanism (Copy-LSTM) and a Selective Copy Memory Attention mechanism (SCMA), and DCNet, an LSTM-based denoising auto-encoder.
Abstract: Most image captioning frameworks generate captions directly from images, learning a mapping from visual features to natural language. However, editing existing captions can be easier than generating new ones from scratch. Intuitively, when editing captions, a model is not required to learn information that is already present in the caption (i.e. sentence structure), enabling it to focus on fixing details (e.g. replacing repetitive words). This paper proposes a novel approach to image captioning based on iterative adaptive refinement of an existing caption. Specifically, our caption-editing model consists of two sub-modules: (1) EditNet, a language module with an adaptive copy mechanism (Copy-LSTM) and a Selective Copy Memory Attention mechanism (SCMA), and (2) DCNet, an LSTM-based denoising auto-encoder. These components enable our model to directly copy from and modify existing captions. Experiments demonstrate that our new approach achieves state-of-the-art performance on the MS COCO dataset both with and without sequence-level training.
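The copy idea at the heart of the editing model can be illustrated with a generic pointer-style gate that mixes a generated vocabulary distribution with a distribution copied from the existing caption. This is a minimal sketch of that general mechanism, not the EditNet/Copy-LSTM or SCMA implementation; all dimensions are placeholders.

```python
# Minimal pointer/copy-style gate, sketched to illustrate "copy from an
# existing caption"; this is not the paper's EditNet/Copy-LSTM code.
import torch
import torch.nn as nn

class CopyGate(nn.Module):
    def __init__(self, hidden_dim=512, vocab_size=10000):
        super().__init__()
        self.generate = nn.Linear(hidden_dim, vocab_size)  # ordinary vocabulary head
        self.gate = nn.Linear(hidden_dim, 1)               # mix generation vs. copying

    def forward(self, hidden, old_caption_ids, copy_attn):
        # hidden: (batch, hidden_dim) decoder state
        # old_caption_ids: (batch, src_len) token ids of the caption being edited
        # copy_attn: (batch, src_len) attention over the old caption's tokens
        p_gen = torch.softmax(self.generate(hidden), dim=-1)
        p_copy = torch.zeros_like(p_gen).scatter_add(-1, old_caption_ids, copy_attn)
        g = torch.sigmoid(self.gate(hidden))               # adaptive copy gate
        return g * p_gen + (1 - g) * p_copy                # final word distribution

gate = CopyGate()
probs = gate(torch.randn(2, 512),
             torch.randint(0, 10000, (2, 12)),
             torch.softmax(torch.randn(2, 12), dim=-1))
print(probs.sum(-1))  # each row sums to ~1
```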

41 citations

Proceedings ArticleDOI
Fuhai Chen, Rongrong Ji, Jinsong Su, Yongjian Wu, Yunsheng Wu
19 Oct 2017
TL;DR: The proposed StructCap model parses a given image into key entities and their relations organized in a visual parsing tree, which is transformed and embedded under an encoder-decoder framework via visual attention.
Abstract: Image captioning has attracted ever-increasing research attention in multimedia and computer vision. To encode the visual content, existing approaches typically utilize an off-the-shelf deep Convolutional Neural Network (CNN) model to extract visual features, which are sent to Recurrent Neural Network (RNN) based textual generators to output word sequences. More recently, some methods encode visual objects and scene information with attention mechanisms. Despite the promising progress, one distinct disadvantage lies in distinguishing and modeling key semantic entities and their relations, which are widely regarded as important cues for describing image content. In this paper, we propose a novel image captioning model, termed StructCap. It parses a given image into key entities and their relations organized in a visual parsing tree, which is transformed and embedded under an encoder-decoder framework via visual attention. We give an end-to-end formulation to facilitate joint training of the visual tree parser, structured semantic attention and RNN-based captioning modules. Experimental results on two public benchmarks, Microsoft COCO and Flickr30K, show that the proposed StructCap model outperforms the state-of-the-art approaches under various standard evaluation metrics.
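As a rough illustration of encoding entities and relations organised in a tree, the sketch below composes node features bottom-up into a single vector that a caption decoder could attend over. The composition function, node features, and dimensions are assumptions for illustration; this is not the StructCap parser or its structured attention module.

```python
# Toy sketch: encode a visual parsing tree bottom-up into one vector.
# Node features and the composition function are illustrative assumptions.
import torch
import torch.nn as nn

class TreeNode:
    def __init__(self, feat, children=()):
        self.feat = feat              # (dim,) visual feature for this entity/relation
        self.children = list(children)

class TreeEncoder(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.compose = nn.Linear(2 * dim, dim)  # merge a node with its children summary

    def forward(self, node):
        if not node.children:
            return node.feat
        child_summary = torch.stack([self(c) for c in node.children]).mean(0)
        return torch.tanh(self.compose(torch.cat([node.feat, child_summary])))

dim = 256
# e.g. two entity nodes attached under one relation node
tree = TreeNode(torch.randn(dim), [TreeNode(torch.randn(dim)),
                                   TreeNode(torch.randn(dim))])
print(TreeEncoder(dim)(tree).shape)  # torch.Size([256])
```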

41 citations

Proceedings ArticleDOI
06 Nov 2017
TL;DR: A dual learning mechanism with a policy-gradient method that generates highly rewarded captions is introduced; it consistently outperforms previous methods for cross-domain image captioning.
Abstract: Recent AI research has witnessed increasing interest in automatically generating image descriptions in text, which is coined as the image captioning problem. Significant progress has been made in domains where plenty of labeled training data (i.e. image-text pairs) are readily available or collected. However, obtaining rich annotated data is a time-consuming and expensive process, creating a substantial barrier for applying image captioning methods to a new domain. In this paper, we propose a cross-domain image captioning approach that uses a novel dual learning mechanism to overcome this barrier. First, we model the alignment between the neural representations of images and those of natural language in the source domain, where one can access sufficient labeled data. Second, we adjust the pre-trained model based on examining limited data (or unpaired data) in the target domain. In particular, we introduce a dual learning mechanism with a policy gradient method that generates highly rewarded captions. The mechanism simultaneously optimizes two coupled objectives: generating image descriptions in text and generating plausible images from text descriptions, with the hope that by explicitly exploiting their coupled relation, one can safeguard the performance of image captioning in the target domain. To verify the effectiveness of our model, we use the MSCOCO dataset as the source domain and two other datasets (Oxford-102 and Flickr30k) as the target domains. The experimental results show that our model consistently outperforms previous methods for cross-domain image captioning.
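The policy-gradient part of the mechanism can be pictured with a one-step REINFORCE-style update: sample tokens from the captioning policy, score them with a reward, and weight the log-likelihood by that reward. The toy policy and random reward below are placeholders; in the paper's dual-learning setting the reward would reflect caption quality together with the coupled text-to-image objective.

```python
# REINFORCE-style update sketch for the "highly rewarded captions" idea.
# The toy policy and the random reward are placeholders, not the paper's model.
import torch
import torch.nn as nn

vocab_size, hidden = 1000, 128
policy = nn.Linear(hidden, vocab_size)          # stands in for the caption decoder
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)

state = torch.randn(4, hidden)                  # e.g. image/decoder features
dist = torch.distributions.Categorical(logits=policy(state))
actions = dist.sample()                         # sampled caption tokens (one step, for brevity)

# In the dual-learning setting the reward could combine caption quality and how
# well a text-to-image model reconstructs the input; here it is just random.
reward = torch.rand(4)

optimizer.zero_grad()
loss = -(reward * dist.log_prob(actions)).mean()  # policy-gradient objective
loss.backward()
optimizer.step()
print(float(loss))
```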

41 citations


Network Information
Related Topics (5)
Feature vector: 48.8K papers, 954.4K citations, 83% related
Object detection: 46.1K papers, 1.3M citations, 82% related
Convolutional neural network: 74.7K papers, 2M citations, 82% related
Deep learning: 79.8K papers, 2.1M citations, 82% related
Unsupervised learning: 22.7K papers, 1M citations, 81% related
Performance Metrics
No. of papers in the topic in previous years:
2023: 536
2022: 1,030
2021: 504
2020: 530
2019: 448
2018: 334