Topic

Closed captioning

About: Closed captioning is a research topic. Over its lifetime, 3,011 publications have been published within this topic, receiving 64,494 citations. The topic is also known as: CC.


Papers
Journal ArticleDOI
TL;DR: The results validate that the image captions generated by the proposed method contain more accurate visual information and better comply with language habits and grammatical rules.
Abstract: To generate an image caption, the content of the image must first be fully understood; the semantic information contained in the image is then described using a phrase or sentence that conforms to certain grammatical rules. This requires techniques from both computer vision and natural language processing to connect the two different media forms, which is highly challenging. To adaptively adjust the influence of visual information and language information on the captioning process, this paper proposes integrating part-of-speech information with image captioning models based on the encoder-decoder framework. First, a part-of-speech prediction network is proposed to analyze and model the part-of-speech sequences of the words in natural language sentences; then, different mechanisms are proposed to integrate the part-of-speech guidance information with merge-based and inject-based image captioning models, respectively; finally, based on the integrated frameworks, a multi-task learning paradigm is proposed to facilitate model training. Experiments are conducted on two widely used image captioning datasets, Flickr30k and COCO, and the results validate that the image captions generated by the proposed method contain more accurate visual information and better comply with language habits and grammatical rules.
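A minimal sketch of the multi-task idea described above, assuming a PyTorch LSTM decoder: the decoder predicts both the next word and its part-of-speech tag, and the two cross-entropy losses are combined with a weight. The class name, dimensions, and loss weight `alpha` are illustrative assumptions, not the paper's exact architecture.

```python
# Hedged sketch (not the authors' code): a decoder that jointly predicts the
# next word and its part-of-speech tag, trained with a multi-task loss.
import torch
import torch.nn as nn

class PosGuidedDecoder(nn.Module):
    def __init__(self, vocab_size, pos_size, feat_dim=2048, embed_dim=512, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.init_h = nn.Linear(feat_dim, hidden_dim)        # image feature initialises the LSTM state
        self.lstm = nn.LSTMCell(embed_dim, hidden_dim)
        self.word_head = nn.Linear(hidden_dim, vocab_size)   # next-word prediction
        self.pos_head = nn.Linear(hidden_dim, pos_size)      # part-of-speech prediction

    def forward(self, image_feat, captions):
        h = torch.tanh(self.init_h(image_feat))
        c = torch.zeros_like(h)
        word_logits, pos_logits = [], []
        for t in range(captions.size(1) - 1):
            h, c = self.lstm(self.embed(captions[:, t]), (h, c))
            word_logits.append(self.word_head(h))
            pos_logits.append(self.pos_head(h))
        return torch.stack(word_logits, 1), torch.stack(pos_logits, 1)

def multitask_loss(word_logits, pos_logits, words, pos_tags, alpha=0.3):
    # words / pos_tags: caption tokens and their PoS tags shifted by one step
    ce = nn.CrossEntropyLoss(ignore_index=0)  # 0 = padding index (assumption)
    caption_loss = ce(word_logits.reshape(-1, word_logits.size(-1)), words.reshape(-1))
    pos_loss = ce(pos_logits.reshape(-1, pos_logits.size(-1)), pos_tags.reshape(-1))
    return caption_loss + alpha * pos_loss
```

At inference time the word head alone generates the caption; the PoS head only serves as auxiliary guidance during training in this sketch.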

26 citations

Journal ArticleDOI
TL;DR: This paper proposes to apply dual attention on pyramid image feature maps to fully explore the visual-semantic correlations and improve the quality of generated sentences with the full consideration of the contextual information provided by the hidden state of the RNN controller.
Abstract: Generating natural sentences from images is a fundamental learning task for visual-semantic understanding in multimedia. In this paper, we propose to apply dual attention on pyramid image feature maps to fully explore the visual-semantic correlations and improve the quality of generated sentences. Specifically, by fully considering the contextual information provided by the hidden state of the RNN controller, the pyramid attention can better localize the visually indicative and semantically consistent regions in images. On the other hand, the contextual information can help re-calibrate the importance of feature components by learning the channel-wise dependencies, improving the discriminative power of visual features for better content description. We conducted comprehensive experiments on three well-known datasets, Flickr8K, Flickr30K, and MS COCO, achieving impressive results in generating descriptive and smooth natural sentences from images. Using either convolutional visual features or more informative bottom-up attention features, the composite model can boost the performance of image-to-sentence translation with a limited computational-resource overhead. The proposed pyramid attention and dual attention methods are highly modular and can be inserted into various image captioning modules to further improve performance.
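A hedged sketch of the dual-attention idea, assuming flattened pyramid feature maps and a PyTorch decoder hidden state: spatial attention selects regions given the RNN context, and a channel gate re-calibrates the attended feature. The dimensions and the sigmoid gating are assumptions, not the paper's exact design.

```python
# Illustrative dual attention: spatial attention over region features plus
# context-conditioned channel re-calibration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualAttention(nn.Module):
    def __init__(self, feat_dim=512, hidden_dim=512, attn_dim=256):
        super().__init__()
        self.spatial = nn.Linear(feat_dim + hidden_dim, attn_dim)
        self.spatial_out = nn.Linear(attn_dim, 1)
        self.channel = nn.Linear(feat_dim + hidden_dim, feat_dim)

    def forward(self, feats, hidden):
        # feats: (B, N, C) flattened pyramid regions; hidden: (B, H) RNN state
        h = hidden.unsqueeze(1).expand(-1, feats.size(1), -1)
        # spatial attention: which regions matter given the current context
        e = self.spatial_out(torch.tanh(self.spatial(torch.cat([feats, h], -1))))
        alpha = F.softmax(e, dim=1)                      # (B, N, 1)
        context = (alpha * feats).sum(dim=1)             # (B, C)
        # channel attention: re-calibrate feature channels given the context
        gate = torch.sigmoid(self.channel(torch.cat([context, hidden], -1)))
        return gate * context                            # attended visual vector
```

The returned vector would feed the next decoding step of the captioning RNN, which is how such attention modules are typically wired in.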

26 citations

Book ChapterDOI
23 Aug 2020
TL;DR: Wen et al. as mentioned in this paper proposed a distinctiveness metric, between-set CIDEr (CIDErBtw), to evaluate the distinctiveness of a caption with respect to those of similar images.
Abstract: A wide range of image captioning models has been developed, achieving significant improvement based on popular metrics, such as BLEU, CIDEr, and SPICE. However, although the generated captions can accurately describe the image, they are generic for similar images and lack distinctiveness, i.e., cannot properly describe the uniqueness of each image. In this paper, we aim to improve the distinctiveness of image captions through training with sets of similar images. First, we propose a distinctiveness metric—between-set CIDEr (CIDErBtw) to evaluate the distinctiveness of a caption with respect to those of similar images. Our metric shows that the human annotations of each image are not equivalent based on distinctiveness. Thus we propose several new training strategies to encourage the distinctiveness of the generated caption for each image, which are based on using CIDErBtw in a weighted loss function or as a reinforcement learning reward. Finally, extensive experiments are conducted, showing that our proposed approach significantly improves both distinctiveness (as measured by CIDErBtw and retrieval metrics) and accuracy (e.g., as measured by CIDEr) for a wide variety of image captioning baselines. These results are further confirmed through a user study. Project page: https://wenjiaxu.github.io/ciderbtw/.
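A hedged sketch of how CIDErBtw could drive a weighted loss, as described above: each ground-truth caption is scored against the references of similar images, and more distinctive captions (lower CIDErBtw) receive a larger training weight. The `cider` callable, the softmax weighting, and the temperature are placeholders and assumptions, not the paper's exact formulation.

```python
# Illustrative CIDErBtw-weighted training loss; `cider(cand, refs)` stands in
# for any off-the-shelf CIDEr scorer.
import torch

def ciderbtw(caption, similar_image_refs, cider):
    """Average CIDEr of `caption` against the reference captions of similar images."""
    scores = [cider(caption, refs) for refs in similar_image_refs]
    return sum(scores) / max(len(scores), 1)

def reference_weights(gt_captions, similar_image_refs, cider, temperature=1.0):
    """More distinctive references (lower CIDErBtw) receive larger weight."""
    btw = torch.tensor([ciderbtw(c, similar_image_refs, cider) for c in gt_captions])
    return torch.softmax(-btw / temperature, dim=0)  # weights sum to 1

def weighted_caption_loss(per_reference_nll, weights):
    """per_reference_nll: (num_refs,) negative log-likelihood of each ground truth."""
    return (weights * per_reference_nll).sum()
```

The same CIDErBtw score could instead be used as a reinforcement-learning reward term, which the abstract also mentions as one of the proposed training strategies.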

26 citations

Proceedings ArticleDOI
01 Jul 2019
TL;DR: A training mechanism of multi-scale cropping for remote sensing image captioning is proposed, which can extract more fine-grained information from remote sensing images and enhance the generalization performance of the base model.
Abstract: With the rapid development of artificial satellites, a large number of high-resolution remote sensing images can now be easily obtained. Recently, remote sensing image captioning, which aims to generate accurate and concise descriptive sentences for remote sensing images, has been advanced by template-based and encoder-decoder models, with several related datasets released. Based on an encoder-decoder model, we propose a multi-scale cropping training mechanism for remote sensing image captioning in this paper, which can extract more fine-grained information from remote sensing images and enhance the generalization performance of the base model. The experimental results on two datasets, UCM-captions and Sydney-captions, demonstrate that the proposed approach effectively improves performance in describing high-resolution remote sensing images.
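A minimal sketch of a multi-scale cropping augmentation in the spirit of the training mechanism described above, using PIL; the scale set and output size are assumptions, not the paper's settings.

```python
# Illustrative multi-scale crop: randomly crop the image at one of several
# scales, then resize back to the network input size.
import random
from PIL import Image

def multi_scale_crop(img: Image.Image, out_size=224, scales=(1.0, 0.85, 0.7, 0.55)):
    w, h = img.size
    s = random.choice(scales)                      # pick a crop scale
    cw, ch = int(w * s), int(h * s)
    left = random.randint(0, w - cw)
    top = random.randint(0, h - ch)
    crop = img.crop((left, top, left + cw, top + ch))
    return crop.resize((out_size, out_size), Image.BILINEAR)
```

Applied per training sample, this exposes the captioning encoder to both global scenes and finer-grained sub-regions of the same remote sensing image.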

26 citations

Proceedings ArticleDOI
TL;DR: Wang et al. as mentioned in this paper integrated comprehensive sources of information, including the content of consumer-generated videos, the narrative comment sentences supplied by consumers, and the product attributes, in an end-to-end modeling framework.
Abstract: In e-commerce, consumer-generated videos, which generally convey consumers' individual preferences for different aspects of certain products, are massive in volume. To recommend these videos to potential consumers more effectively, diverse and catchy video titles are critical. However, consumer-generated videos seldom come with appropriate titles. To bridge this gap, we integrate comprehensive sources of information, including the content of consumer-generated videos, the narrative comment sentences supplied by consumers, and the product attributes, in an end-to-end modeling framework. Although automatic video titling is very useful and demanding, it is much less addressed than video captioning. The latter focuses on generating sentences that describe videos as a whole, while our task requires product-aware multi-grained video analysis. To tackle this issue, the proposed method consists of two processes, i.e., granular-level interaction modeling and abstraction-level story-line summarization. Specifically, granular-level interaction modeling first utilizes temporal-spatial landmark cues, descriptive words, and abstractive attributes to build three individual graphs, and recognizes the intra-actions within each graph through Graph Neural Networks (GNN). Then a global-local aggregation module is proposed to model inter-actions across graphs and aggregate the heterogeneous graphs into a holistic graph representation. Abstraction-level story-line summarization further considers both frame-level video features and the holistic graph to exploit the interactions between products and backgrounds and generate the story-line topic of the video. We collect a large-scale dataset from real-world data in Taobao, a world-leading e-commerce platform, and will make a desensitized version publicly available to nourish further development of the research community.
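A hedged sketch of the graph encoding and global-local aggregation step, reduced to a dense-adjacency message-passing layer per modality graph followed by fusion of the pooled graph embeddings. This is an illustrative reduction in PyTorch, not the paper's GNN.

```python
# Illustrative per-graph message passing plus cross-graph aggregation into one
# holistic representation (landmarks, words, attributes as three graphs).
import torch
import torch.nn as nn

class SimpleGraphLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(dim, dim)

    def forward(self, x, adj):
        # x: (N, D) node features; adj: (N, N) normalised adjacency matrix
        return torch.relu(self.lin(adj @ x))  # aggregate neighbours, then transform

class GlobalLocalAggregator(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.layers = nn.ModuleList([SimpleGraphLayer(dim) for _ in range(3)])
        self.fuse = nn.Linear(3 * dim, dim)

    def forward(self, graphs):
        # graphs: list of three (node_features, adjacency) pairs, one per modality
        pooled = [layer(x, adj).mean(dim=0) for layer, (x, adj) in zip(self.layers, graphs)]
        return self.fuse(torch.cat(pooled, dim=-1))  # holistic graph representation
```

In the described pipeline, this holistic representation would then be combined with frame-level video features by the story-line summarization stage that generates the title.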

26 citations


Network Information
Related Topics (5)
Feature vector
48.8K papers, 954.4K citations
83% related
Object detection
46.1K papers, 1.3M citations
82% related
Convolutional neural network
74.7K papers, 2M citations
82% related
Deep learning
79.8K papers, 2.1M citations
82% related
Unsupervised learning
22.7K papers, 1M citations
81% related
Performance
Metrics
No. of papers in the topic in previous years
Year | Papers
2023 | 536
2022 | 1,030
2021 | 504
2020 | 530
2019 | 448
2018 | 334