Topic

Closed captioning

About: Closed captioning is a research topic. Over its lifetime, 3,011 publications have been published within this topic, receiving 64,494 citations. The topic is also known as: CC.


Papers
Posted Content
06 Dec 2017
TL;DR: This paper designs three approaches for crafting adversarial examples in image captioning: a targeted caption method, a targeted keyword method, and an untargeted method. It formulates the process of finding adversarial perturbations as optimization problems and designs novel loss functions for efficient search.
Abstract: Modern neural image captioning systems typically adopt the encoder-decoder framework consisting of two principal components: a convolutional neural network (CNN) for image feature extraction and a recurrent neural network (RNN) for caption generation. Inspired by the robustness analysis of CNN-based image classifiers to adversarial perturbations, we propose Show-and-Fool, a novel algorithm for crafting adversarial examples in neural image captioning. Unlike image classification tasks with a finite set of class labels, finding visually-similar adversarial examples in an image captioning system is much more challenging since the space of possible captions in a captioning system is almost infinite. In this paper, we design three approaches for crafting adversarial examples in image captioning: (i) targeted caption method; (ii) targeted keyword method; and (iii) untargeted method. We formulate the process of finding adversarial perturbations as optimization problems and design novel loss functions for efficient search. Experimental results on the Show-and-Tell model and MSCOCO data set show that Show-and-Fool can successfully craft visually-similar adversarial examples with randomly targeted captions, and the adversarial examples can be made highly transferable to the Show-Attend-and-Tell model. Consequently, the presence of adversarial examples leads to new robustness implications of neural image captioning. To the best of our knowledge, this is the first work on crafting effective adversarial examples for image captioning tasks.
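A minimal sketch of the targeted caption attack idea, assuming a hypothetical PyTorch captioner that returns teacher-forced per-step logits; the paper's actual formulation optimizes a C&W-style loss on the Show-and-Tell model rather than the plain cross-entropy used here:

```python
# Minimal sketch of a targeted caption attack in the spirit of Show-and-Fool.
# `captioner(img, target_ids)` returning teacher-forced logits of shape (T, V)
# is a hypothetical interface, not the paper's actual API.
import torch
import torch.nn.functional as F

def targeted_caption_attack(captioner, image, target_ids,
                            steps=1000, lr=0.01, c=1.0):
    """Search for a small perturbation that makes the captioner emit
    the target caption while keeping the image visually similar."""
    delta = torch.zeros_like(image, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        adv = (image + delta).clamp(0, 1)        # stay a valid image
        logits = captioner(adv, target_ids)      # teacher-forced logits (T, V)
        caption_loss = F.cross_entropy(logits, target_ids)
        distortion = delta.pow(2).sum()          # visual-similarity penalty
        loss = c * caption_loss + distortion     # joint objective
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (image + delta).clamp(0, 1).detach()
```

The constant c trades off making the target caption likely against keeping the perturbation small, mirroring the optimization-problem framing described in the abstract.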

48 citations

Posted Content
TL;DR: XGPT is a new method of Cross-modal Generative Pre-Training for Image Captioning that is designed to pre-train text-to-image caption generators through three novel generation tasks: Image-conditioned Masked Language Modeling (IMLM), Image-conditioned Denoising Autoencoding (IDA), and Text-conditioned Image Feature Generation (TIFG).
Abstract: While many BERT-based cross-modal pre-trained models produce excellent results on downstream understanding tasks like image-text retrieval and VQA, they cannot be applied to generation tasks directly. In this paper, we propose XGPT, a new method of Cross-modal Generative Pre-Training for Image Captioning that is designed to pre-train text-to-image caption generators through three novel generation tasks, including Image-conditioned Masked Language Modeling (IMLM), Image-conditioned Denoising Autoencoding (IDA), and Text-conditioned Image Feature Generation (TIFG). As a result, the pre-trained XGPT can be fine-tuned without any task-specific architecture modifications to create state-of-the-art models for image captioning. Experiments show that XGPT obtains new state-of-the-art results on the benchmark datasets, including COCO Captions and Flickr30k Captions. We also use XGPT to generate new image captions as data augmentation for the image retrieval task and achieve significant improvement on all recall metrics.
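A hedged sketch of what the Image-conditioned Masked Language Modeling (IMLM) objective could look like; the decoder signature (`decoder(ids, memory=...)`), the mask token id, and the 15% masking rate are illustrative assumptions, not XGPT's actual implementation:

```python
# Sketch of an image-conditioned masked language modeling loss:
# mask some caption tokens, then reconstruct them while the decoder
# cross-attends to the image features.
import torch
import torch.nn.functional as F

def imlm_loss(decoder, image_feats, caption_ids, mask_id, p_mask=0.15):
    """Mask a fraction of caption tokens and train the decoder to
    recover them conditioned on the image."""
    corrupted = caption_ids.clone()
    mask = torch.rand_like(caption_ids, dtype=torch.float) < p_mask
    corrupted[mask] = mask_id                        # corrupt the input side
    logits = decoder(corrupted, memory=image_feats)  # (B, T, vocab)
    # Compute the loss only on the masked positions, BERT-style.
    return F.cross_entropy(logits[mask], caption_ids[mask])
```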

48 citations

Journal ArticleDOI
TL;DR: The use of closed captioned television to teach reading to adults was investigated in this study, using a pre-experimental design. Results indicated that students improved significantly on a pre-post word recognition measure; however, student performance did not differ across treatments.
Abstract: The use of closed captioned television to teach reading to adults was investigated in this study, using a pre-experimental design. Of most interest were the effects of the use of closed captioned television as a medium for sight vocabulary development. Also of interest were students' reactions to using closed captioned television as a means of reading instruction. Results indicated that, overall, students improved significantly on a pre-post word recognition measure; however, student performance did not differ across treatments. Also, there were no significant differences among groups on measures administered after each lesson. Moreover, the group using closed captioned television, without instruction, did evidence a degree of success in reaching a specific criterion level on weekly sight vocabulary tests. Finally, student attitudes toward closed captioned television were extremely positive, not only toward its use as a means of learning to read, but as a means of increasing general knowledge.

48 citations

Proceedings ArticleDOI
01 Aug 2017
TL;DR: It is found that, in general, late merging outperforms injection, suggesting that RNNs are better viewed as encoders, rather than generators.
Abstract: Image captioning has evolved into a core task for Natural Language Generation and has also proved to be an important testbed for deep learning approaches to handling multimodal representations. Most contemporary approaches rely on a combination of a convolutional network to handle image features, and a recurrent network to encode linguistic information. The latter is typically viewed as the primary “generation” component. Beyond this high-level characterisation, a CNN+RNN model supports a variety of architectural designs. The dominant model in the literature is one in which visual features encoded by a CNN are “injected” as part of the linguistic encoding process, driving the RNN’s linguistic choices. By contrast, it is possible to envisage an architecture in which visual and linguistic features are encoded separately, and merged at a subsequent stage. In this paper, we address two related questions: (1) Is direct injection the best way of combining multimodal information, or is a late merging alternative better for the image captioning task? (2) To what extent should a recurrent network be viewed as actually generating, rather than simply encoding, linguistic information?
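An illustrative sketch of the two architectures the paper contrasts, under assumed dimensions and layer choices: in the inject design the image conditions the RNN itself, while in the merge design the RNN encodes language only and the two modalities are combined afterwards.

```python
# Illustrative inject-vs-merge captioners; layer sizes and the choice of
# init-inject (image initializes the RNN state) are assumptions for
# illustration, not the paper's exact configurations.
import torch
import torch.nn as nn

class InjectCaptioner(nn.Module):
    def __init__(self, vocab, embed=256, img_dim=2048, hidden=512):
        super().__init__()
        self.embed = nn.Embedding(vocab, embed)
        self.init_h = nn.Linear(img_dim, hidden)   # image sets the RNN state
        self.rnn = nn.GRU(embed, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, img_feats, tokens):
        h0 = torch.tanh(self.init_h(img_feats)).unsqueeze(0)
        h, _ = self.rnn(self.embed(tokens), h0)    # RNN sees the image
        return self.out(h)

class MergeCaptioner(nn.Module):
    def __init__(self, vocab, embed=256, img_dim=2048, hidden=512):
        super().__init__()
        self.embed = nn.Embedding(vocab, embed)
        self.rnn = nn.GRU(embed, hidden, batch_first=True)
        self.img_proj = nn.Linear(img_dim, hidden)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, img_feats, tokens):
        h, _ = self.rnn(self.embed(tokens))        # language-only encoding
        merged = h + self.img_proj(img_feats).unsqueeze(1)
        return self.out(merged)                    # modalities merged late
```

In the merge variant the RNN never sees the image, which is what lets the paper ask whether the RNN is really "generating" or merely encoding linguistic information.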

48 citations

Patent
06 Sep 2001
TL;DR: In this article, a method and an apparatus for use in home television video recording, playback, and viewing are presented, in which the audio information of an electronic signal, including digital representations thereof, is analyzed and its words and phrases are compared against words and phrases stored in electronic memory, so that undesirable words or phrases can be eliminated from audible or visible representations of the audio, with options for replacing undesirable words with acceptable words.
Abstract: A method and an apparatus for use in connection with home television video recording, playback, and viewing involve processing an electronic signal that includes audio and video information. The audio information, including digital representations thereof, is analyzed and modified: words and phrases represented in the audio information are compared with words and phrases stored in electronic memory so that undesirable words or phrases can be eliminated from audible or visible representations of the audio, with options for replacing undesirable words with acceptable words. The options include varying degrees of selectivity in specifying words as undesirable and control over the substitute words used to replace them. The options for controlling the language-filtering method and apparatus are selectable from an on-screen menu through operation of a control panel on the language filter apparatus or by use of a conventional television remote transmitter. Full capability of the method and apparatus depends only on the presence of closed caption or similar digitally-encoded language information being received with a television signal, but special instructions transmitted with a television signal may also be responded to, for example to activate particular language libraries or to customize a library for the program material.
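A minimal sketch of the caption-based word-filtering step the patent describes, with a hypothetical replacement library; a real decoder would also mute or bleep the matching span of audio:

```python
# Sketch of closed-caption language filtering: match caption words against
# a stored list of undesirable terms and substitute acceptable replacements.
# The replacement library and the replace-only policy are assumptions.
import re

REPLACEMENTS = {"darn": "gosh"}   # hypothetical library of substitute words

def filter_caption(caption: str, replacements=REPLACEMENTS) -> str:
    """Return the caption with undesirable words swapped for acceptable
    ones, preserving leading capitalization."""
    def swap(match: re.Match) -> str:
        word = match.group(0)
        sub = replacements.get(word.lower(), word)
        return sub.capitalize() if word[0].isupper() else sub
    pattern = r"\b(" + "|".join(map(re.escape, replacements)) + r")\b"
    return re.sub(pattern, swap, caption, flags=re.IGNORECASE)

print(filter_caption("Darn it!"))  # -> "Gosh it!"
```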

48 citations


Network Information
Related Topics (5)
- Feature vector: 48.8K papers, 954.4K citations (83% related)
- Object detection: 46.1K papers, 1.3M citations (82% related)
- Convolutional neural network: 74.7K papers, 2M citations (82% related)
- Deep learning: 79.8K papers, 2.1M citations (82% related)
- Unsupervised learning: 22.7K papers, 1M citations (81% related)
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    536
2022    1,030
2021    504
2020    530
2019    448
2018    334