Topic

Closed captioning

About: Closed captioning is a research topic. Over its lifetime, 3,011 publications have been published within this topic, receiving 64,494 citations. The topic is also known as: CC.


Papers
Journal ArticleDOI
TL;DR: A novel CNN-based encoder-decoder framework for video captioning that first appends inter-frame differences to each CNN-extracted frame feature to obtain a more discriminative representation, then encodes each frame into a more compact feature via a one-layer convolutional mapping, which can be viewed as a reconstruction network.
Abstract: Recent advances in video captioning mainly follow an encoder-decoder (sequence-to-sequence) framework and generate captions via a recurrent neural network (RNN). However, employing an RNN as the decoder (generator) is prone to diluting long-term information, which weakens its ability to capture long-term dependencies. Recently, some work has demonstrated that the convolutional neural network (CNN) can be used to model sequential information. Despite its strengths in representation ability and computational efficiency, the CNN has not been well exploited in video captioning, partly because of the difficulty of modeling multi-modal sequences with CNNs. In this paper, we devise a novel CNN-based encoder-decoder framework for video captioning. In particular, we first append inter-frame differences to each CNN-extracted frame feature to obtain a more discriminative representation; then, with that as the input, we encode each frame into a more compact feature by a one-layer convolutional mapping, which can be viewed as a reconstruction network. In the decoding stage, we first fuse visual and lexical features; then we stack multiple dilated convolutional layers to form a hierarchical decoder. Because long-term dependencies can be captured along a shorter path in the hierarchical structure, the decoder alleviates the loss of long-term information. Experiments on two benchmark datasets show that our method obtains state-of-the-art performance.
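Below is a minimal PyTorch sketch of the pipeline this abstract describes: inter-frame differences are appended to frame features, a one-layer convolution compresses them, and stacked dilated convolutions form a hierarchical decoder. Layer sizes are illustrative assumptions and the visual-lexical fusion step is omitted, so this is not the authors' exact configuration.

```python
# Hedged sketch: CNN encoder-decoder for video captioning with frame differences
# and dilated convolutions. Dimensions and structure are illustrative assumptions.
import torch
import torch.nn as nn

class ConvCaptioner(nn.Module):
    def __init__(self, feat_dim=2048, hid_dim=512, vocab_size=10000):
        super().__init__()
        # One-layer convolutional mapping over [frame feature ; inter-frame difference].
        self.encoder = nn.Conv1d(feat_dim * 2, hid_dim, kernel_size=1)
        # Hierarchical decoder: dilated convolutions widen the receptive field,
        # so distant time steps are reachable along a short path.
        self.decoder = nn.Sequential(
            nn.Conv1d(hid_dim, hid_dim, kernel_size=3, dilation=1, padding=1), nn.ReLU(),
            nn.Conv1d(hid_dim, hid_dim, kernel_size=3, dilation=2, padding=2), nn.ReLU(),
            nn.Conv1d(hid_dim, hid_dim, kernel_size=3, dilation=4, padding=4), nn.ReLU(),
        )
        self.word_head = nn.Linear(hid_dim, vocab_size)

    def forward(self, frame_feats):                     # frame_feats: (B, T, feat_dim)
        prev = torch.cat([frame_feats[:, :1], frame_feats[:, :-1]], dim=1)
        diffs = frame_feats - prev                      # inter-frame difference (zero for frame 0)
        x = torch.cat([frame_feats, diffs], dim=-1).transpose(1, 2)   # (B, 2*feat_dim, T)
        enc = self.encoder(x)                           # compact per-frame representation
        dec = self.decoder(enc)                         # lexical fusion omitted for brevity
        return self.word_head(dec.transpose(1, 2))      # (B, T, vocab) word logits

model = ConvCaptioner()
logits = model(torch.randn(2, 16, 2048))                # 2 clips of 16 frames
print(logits.shape)                                     # torch.Size([2, 16, 10000])
```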

25 citations

Journal ArticleDOI
TL;DR: This paper proposes a novel method, Image-Text Surgery, to synthesize pseudo image-sentence pairs, and introduces adaptive visual replacement, which adaptively filters unnecessary visual features in the pseudo data with an attention mechanism.
Abstract: Image captioning aims to generate natural language sentences that describe the salient parts of a given image. Although neural networks have recently achieved promising results, a key problem is that they can only describe concepts seen in the training image-sentence pairs. Efficient learning of novel concepts has thus become a topic of recent interest, as it alleviates the expensive manual effort of labeling data. In this paper, we propose a novel method, Image-Text Surgery, to synthesize pseudo image-sentence pairs. The pseudo pairs are generated under the guidance of a knowledge base, with syntax from a seed data set (i.e., MSCOCO) and visual information from an existing large-scale image base (i.e., ImageNet). Via the pseudo data, the captioning model learns novel concepts without any corresponding human-labeled pairs. We further introduce adaptive visual replacement, which adaptively filters unnecessary visual features in the pseudo data with an attention mechanism. We evaluate our approach on a held-out subset of the MSCOCO data set. The experimental results demonstrate that the proposed approach provides significant performance improvements over state-of-the-art methods in terms of F1 score and sentence quality. An ablation study and the qualitative results further validate the effectiveness of our approach.
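As a rough illustration of the pseudo-pair synthesis idea (not the authors' implementation), the sketch below swaps a seed concept in a caption for a novel concept drawn from an assumed knowledge-base mapping and pairs the rewritten sentence with an image of the novel concept. SEED_TO_NOVEL and novel_image_pool are hypothetical placeholders.

```python
# Hedged sketch of knowledge-base-guided pseudo image-sentence pair synthesis.
# The mapping and image pool below are invented for illustration only.
import random

SEED_TO_NOVEL = {"dog": ["wolf", "fox"], "cat": ["lynx"]}           # assumed KB mapping
novel_image_pool = {"wolf": ["wolf_001.jpg"], "fox": ["fox_003.jpg"],
                    "lynx": ["lynx_010.jpg"]}                        # e.g. ImageNet images

def synthesize_pseudo_pairs(seed_caption, seed_concept):
    """Rewrite a seed caption to each novel concept and attach a matching image."""
    pairs = []
    for novel in SEED_TO_NOVEL.get(seed_concept, []):
        sentence = seed_caption.replace(seed_concept, novel)         # text "surgery"
        image = random.choice(novel_image_pool[novel])               # visual counterpart
        pairs.append((image, sentence))
    return pairs

print(synthesize_pseudo_pairs("a dog runs across the grass", "dog"))
```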

25 citations

Proceedings ArticleDOI
09 Jul 2020
TL;DR: This paper proposes a novel memory-based network rather than a GAN, named Recurrent Relational Memory Network ($R^2M$), which encodes visual context through unsupervised training on images while enabling the memory to learn from an irrelevant textual corpus in a supervised fashion.
Abstract: Unsupervised image captioning with no annotations is an emerging challenge in computer vision, where existing approaches usually adopt GAN (Generative Adversarial Network) models. In this paper, we propose a novel memory-based network rather than a GAN, named Recurrent Relational Memory Network ($R^2M$). Unlike complicated and sensitive adversarial learning, which performs poorly for long sentence generation, $R^2M$ implements a concepts-to-sentence memory translator through two-stage memory mechanisms: fusion and recurrent memories, correlating the relational reasoning between common visual concepts and the generated words over long periods. $R^2M$ encodes visual context through unsupervised training on images, while enabling the memory to learn from an irrelevant textual corpus in a supervised fashion. Our solution has fewer learnable parameters and higher computational efficiency than GAN-based methods, which suffer heavily from parameter sensitivity. We experimentally validate the superiority of $R^2M$ over the state of the art on all benchmark datasets.
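The sketch below shows, under stated assumptions, what a two-stage memory translator could look like in PyTorch: a fusion stage attends from the generated-word history to the detected visual concepts, and a recurrent stage carries the fused state across decoding steps. It is a loose illustration of the memory idea, not the $R^2M$ architecture itself.

```python
# Hedged sketch: fusion memory (attention over concepts) + recurrent memory (GRU cell).
# All dimensions and the overall wiring are illustrative assumptions.
import torch
import torch.nn as nn

class TwoStageMemory(nn.Module):
    def __init__(self, dim=512, vocab_size=10000):
        super().__init__()
        self.fusion = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.recurrent = nn.GRUCell(dim, dim)          # memory carried across time steps
        self.word_head = nn.Linear(dim, vocab_size)

    def step(self, word_hist, concept_feats, memory):
        # Fusion memory: relate the generated-word history to the visual concepts.
        fused, _ = self.fusion(word_hist, concept_feats, concept_feats)
        # Recurrent memory: update the long-term state from the latest fused vector.
        memory = self.recurrent(fused[:, -1], memory)
        return self.word_head(memory), memory           # next-word logits, updated memory

model = TwoStageMemory()
words = torch.randn(2, 5, 512)        # embeddings of the words generated so far
concepts = torch.randn(2, 10, 512)    # embeddings of detected visual concepts
mem = torch.zeros(2, 512)
logits, mem = model.step(words, concepts, mem)
print(logits.shape)                   # torch.Size([2, 10000])
```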

25 citations

Proceedings ArticleDOI
22 Oct 2012
TL;DR: This study asked 48 deaf and hearing readers to evaluate transcripts produced by a professional captionist, ASR software, and a crowd captioning system, and found that the readers preferred crowd captions over both professional captions and ASR.
Abstract: Deaf and hard of hearing individuals need accommodations that transform aural information into visual information, such as captions generated in real time to enhance their access to spoken information in lectures and other live events. The captions produced by professional captionists work well in general events such as community or legal meetings, but are often unsatisfactory in specialized-content events such as higher education classrooms. In addition, it is hard to hire professional captionists, especially those with experience in specialized content areas, as they are scarce and expensive. The captions produced by commercial automatic speech recognition (ASR) software are far cheaper, but are often perceived as unreadable due to ASR's sensitivity to accents and background noise and its slow response time. We ran a study to evaluate the readability of captions generated by a new crowd captioning approach versus professional captionists and ASR. In this approach, captions are typed by classmates into a system that aligns and merges the multiple incomplete caption streams into a single, comprehensive real-time transcript. Our study asked 48 deaf and hearing readers to evaluate transcripts produced by a professional captionist, ASR, and crowd captioning software, and found that the readers preferred crowd captions over professional captions and ASR.
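A toy sketch of the stream-merging idea follows: each volunteer captionist contributes timestamped word tokens, and merging orders them by time while dropping near-simultaneous duplicates. The published system relies on a more sophisticated alignment; this only illustrates the concept, and all names and thresholds are invented for the example.

```python
# Hedged sketch: merge several partial caption streams into one transcript.
# The 1-second duplicate window is an arbitrary illustrative choice.
def merge_streams(streams):
    """streams: list of lists of (timestamp_seconds, word) tuples from different typists."""
    tokens = sorted(t for stream in streams for t in stream)   # order all tokens by time
    merged = []
    for ts, word in tokens:
        # Skip a word another stream already supplied at nearly the same time.
        if merged and merged[-1][1] == word and ts - merged[-1][0] < 1.0:
            continue
        merged.append((ts, word))
    return " ".join(word for _, word in merged)

stream_a = [(0.0, "today"), (0.4, "we"), (0.8, "discuss")]
stream_b = [(0.5, "we"), (0.9, "discuss"), (1.3, "recursion")]
print(merge_streams([stream_a, stream_b]))   # "today we discuss recursion"
```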

25 citations

Journal ArticleDOI
TL;DR: Captioning is commonly used to scaffold video viewing for second language learners, with the captioning affording the learners access to authentic videos that would ordinarily be out of their reach.
Abstract: Captioning is commonly used to scaffold video viewing for second language learners, with the captioning affording the learners access to authentic videos that would ordinarily be out of their reach...

25 citations


Network Information
Related Topics (5)
Feature vector: 48.8K papers, 954.4K citations, 83% related
Object detection: 46.1K papers, 1.3M citations, 82% related
Convolutional neural network: 74.7K papers, 2M citations, 82% related
Deep learning: 79.8K papers, 2.1M citations, 82% related
Unsupervised learning: 22.7K papers, 1M citations, 81% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    536
2022    1,030
2021    504
2020    530
2019    448
2018    334