Topic

Closed captioning

About: Closed captioning is a research topic, also known as CC. Over its lifetime, 3,011 publications have been published within this topic, receiving 64,494 citations.


Papers
Posted Content
TL;DR: Zhang et al. propose an unsupervised feature alignment method that maps scene graph features from the image modality to the sentence modality, generating promising captions without using any image-caption training pairs.
Abstract: Most current image captioning models rely heavily on paired image-caption datasets. However, collecting large-scale paired image-caption data is labor-intensive and time-consuming. In this paper, we present a scene graph-based approach for unpaired image captioning. Our framework comprises an image scene graph generator, a sentence scene graph generator, a scene graph encoder, and a sentence decoder. Specifically, we first train the scene graph encoder and the sentence decoder on the text modality. To align the scene graphs between images and sentences, we propose an unsupervised feature alignment method that maps the scene graph features from the image to the sentence modality. Experimental results show that our proposed model generates promising results without using any image-caption training pairs, outperforming existing methods by a wide margin.

88 citations
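The abstract does not spell out the alignment objective, so below is a minimal sketch of one plausible instantiation in PyTorch: adversarially mapping image scene-graph features into the sentence feature space so the text-trained decoder can consume them. FeatureMapper, Discriminator, and all dimensions are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class FeatureMapper(nn.Module):
    """Maps image scene-graph features toward the sentence feature space."""
    def __init__(self, dim=512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return self.net(x)

class Discriminator(nn.Module):
    """Scores whether a feature looks like it came from the sentence modality."""
    def __init__(self, dim=512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, x):
        return self.net(x)

# Unpaired batches: features from the image scene graph encoder and from
# the sentence scene graph encoder (random stand-ins here).
img_feats = torch.randn(32, 512)
sent_feats = torch.randn(32, 512)

mapper, disc = FeatureMapper(), Discriminator()
bce = nn.BCEWithLogitsLoss()
opt_m = torch.optim.Adam(mapper.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)

# Discriminator step: real sentence features vs. mapped image features.
mapped = mapper(img_feats)
d_loss = (bce(disc(sent_feats), torch.ones(32, 1))
          + bce(disc(mapped.detach()), torch.zeros(32, 1)))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Mapper step: fool the discriminator, pulling image features into the
# sentence feature distribution so the text-trained decoder can consume them.
m_loss = bce(disc(mapper(img_feats)), torch.ones(32, 1))
opt_m.zero_grad(); m_loss.backward(); opt_m.step()
```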

Posted Content
TL;DR: The reviewer module performs a number of review steps, each applying an attention mechanism over the encoder hidden states and outputting a fact vector; the fact vectors are then used as the input to the attention mechanism in the decoder.
Abstract: We propose a novel module, the reviewer module, to improve the encoder-decoder learning framework. The reviewer module is generic and can be plugged into an existing encoder-decoder model. The reviewer module performs a number of review steps with an attention mechanism over the encoder hidden states, outputting a fact vector after each review step; the fact vectors are used as the input of the attention mechanism in the decoder. We show that conventional encoder-decoders are a special case of our framework. Empirically, we show that our framework can improve over state-of-the-art encoder-decoder systems on the tasks of image captioning and source code captioning.

87 citations
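A minimal PyTorch sketch of a reviewer module as described above: a fixed number of review steps, each attending over the encoder hidden states and emitting one fact vector. The LSTM review cell, dot-product attention, and mean-pooled initialization are illustrative assumptions; the abstract does not fix these choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Reviewer(nn.Module):
    """Performs T review steps; each step attends over the encoder hidden
    states and emits one fact vector. The stacked fact vectors become the
    memory the decoder attends over, instead of the raw encoder states."""
    def __init__(self, dim, num_steps=8):
        super().__init__()
        self.cell = nn.LSTMCell(dim, dim)
        self.num_steps = num_steps

    def forward(self, enc_states):                                # (B, L, D)
        h = enc_states.mean(dim=1)                                # initial review state
        c = torch.zeros_like(h)
        facts = []
        for _ in range(self.num_steps):
            # Dot-product attention over encoder states, conditioned on h.
            scores = torch.bmm(enc_states, h.unsqueeze(2)).squeeze(2)  # (B, L)
            attn = F.softmax(scores, dim=1)
            ctx = torch.bmm(attn.unsqueeze(1), enc_states).squeeze(1)  # (B, D)
            h, c = self.cell(ctx, (h, c))
            facts.append(h)                                       # one fact vector per step
        return torch.stack(facts, dim=1)                          # (B, T, D)

enc = torch.randn(4, 36, 256)       # e.g., CNN region features for 4 images
facts = Reviewer(256)(enc)          # the decoder would attend over these
print(facts.shape)                  # torch.Size([4, 8, 256])
```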

Posted Content
TL;DR: This work changes the training objective of the caption generator from reproducing ground-truth captions to generating a set of captions that is indistinguishable from human-written captions, employing adversarial training in combination with an approximate Gumbel sampler to implicitly match the generated distribution to the human one.
Abstract: While strong progress has been made in image captioning in recent years, machine and human captions are still quite distinct. A closer look reveals that this is due to deficiencies in the generated word distribution and vocabulary size, and to a strong bias in generators toward frequent captions. Furthermore, humans -- rightfully so -- generate multiple, diverse captions, owing to the inherent ambiguity of the captioning task, which today's systems do not consider. To address these challenges, we change the training objective of the caption generator from reproducing ground-truth captions to generating a set of captions that is indistinguishable from human-generated captions. Instead of handcrafting such a learning target, we employ adversarial training in combination with an approximate Gumbel sampler to implicitly match the generated distribution to the human one. While our method achieves performance comparable to the state of the art in terms of caption correctness, we generate a set of diverse captions that are significantly less biased and match human word statistics better in several aspects.

87 citations
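The key trick above is the approximate Gumbel sampler, which keeps word sampling differentiable so the discriminator's signal can train the generator. Below is a minimal sketch of the standard Gumbel-softmax relaxation in PyTorch; the vocabulary size, temperature, and embedding dimension are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def gumbel_softmax_sample(logits, tau=0.5):
    """Relaxed sampling of a discrete word: add Gumbel noise to the logits
    and take a temperature softmax. The 'sampled' word stays differentiable,
    so discriminator gradients can flow back into the caption generator."""
    gumbel = -torch.log(-torch.log(torch.rand_like(logits) + 1e-20) + 1e-20)
    return F.softmax((logits + gumbel) / tau, dim=-1)

# Generator logits over a 10,000-word vocabulary for one decoding step.
vocab_logits = torch.randn(2, 10000, requires_grad=True)
soft_word = gumbel_softmax_sample(vocab_logits)          # (B, V), rows sum to 1

# Instead of a hard, non-differentiable argmax, feed the relaxed one-hot to
# the discriminator through the word embedding matrix.
embeddings = torch.randn(10000, 300)
word_vec = soft_word @ embeddings                        # (B, 300)
word_vec.sum().backward()                                # gradients reach the logits
```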

Posted Content
TL;DR: This work proposes a generic approach called Human Importance-aware Network Tuning (HINT), which effectively leverages human demonstrations to improve visual grounding and encourages deep networks to be sensitive to the same input regions as humans.
Abstract: Many vision and language models suffer from poor visual grounding, often falling back on easy-to-learn language priors rather than basing their decisions on visual concepts in the image. In this work, we propose a generic approach called Human Importance-aware Network Tuning (HINT) that effectively leverages human demonstrations to improve visual grounding. HINT encourages deep networks to be sensitive to the same input regions as humans. Our approach optimizes the alignment between human attention maps and gradient-based network importances, ensuring that models learn not just to look at, but to rely on, the visual concepts humans found relevant for a task when making predictions. We apply HINT to Visual Question Answering and Image Captioning tasks, outperforming top approaches on splits that penalize over-reliance on language priors (VQA-CP and robust captioning) while using human attention demonstrations for just 6% of the training data.

87 citations
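A minimal PyTorch sketch of the kind of alignment term HINT optimizes: gradient-based region importances are pushed to agree with human attention rankings. The pairwise hinge form, the stand-in scoring head, and the name hint_ranking_loss are assumptions for illustration, not the paper's exact objective.

```python
import torch

def hint_ranking_loss(human_scores, model_scores):
    """Pairwise ranking loss, one plausible form of HINT's alignment term:
    if humans rank region i above region j, the model's gradient-based
    importance for i should also exceed that for j (hinge on violations)."""
    diff_h = human_scores.unsqueeze(2) - human_scores.unsqueeze(1)   # (B, R, R)
    diff_m = model_scores.unsqueeze(2) - model_scores.unsqueeze(1)
    return torch.clamp(-diff_m * torch.sign(diff_h), min=0).mean()

# Gradient-based importances: sensitivity of the answer score to each region.
region_feats = torch.randn(2, 36, 512, requires_grad=True)
answer_score = (region_feats.sum(-1) * torch.rand(2, 36)).sum()  # stand-in for a VQA head
grads, = torch.autograd.grad(answer_score, region_feats, create_graph=True)
model_importance = (grads * region_feats).sum(-1)                # (B, R)

human_attention = torch.rand(2, 36)     # per-region human attention scores
loss = hint_ranking_loss(human_attention, model_importance)
loss.backward()                         # tuning signal flows into the network
```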

Patent
30 Aug 2001
TL;DR: In this patent, closed caption data is extracted from the television signal and processed by a speech synthesizer to render the words as speech in a desired language; the caption data can be translated from a first language to a second prior to or concurrently with conversion to speech.
Abstract: Television speech is provided in a desired language using closed caption data already present in a received television signal. The closed caption data, which is representative of words, is extracted from the television signal. The closed caption data is then processed in a speech synthesizer to provide said words as speech in a desired language. The closed caption data can be translated from a first language to a second language prior to or concurrently with conversion to speech. Alternatively, the closed caption data can be carried in various languages in the television signal, and the data in the desired language can be selected for extraction from the television signal and conversion to speech.

86 citations
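A minimal Python sketch of the pipeline the patent describes. All three helper functions (extract_cc_text, translate, synthesize_speech) are hypothetical stubs, since real caption extraction and synthesis depend on the broadcast standard and the TTS engine in use.

```python
def extract_cc_text(tv_signal: bytes, language: str = "en") -> str:
    """Decode the closed-caption character stream for one language from the
    television signal (hypothetical; real decoding depends on the standard,
    e.g. EIA-608 line-21 data)."""
    raise NotImplementedError

def translate(text: str, src: str, dst: str) -> str:
    """Optionally translate caption text before synthesis (hypothetical)."""
    raise NotImplementedError

def synthesize_speech(text: str, language: str) -> bytes:
    """Feed the caption words to a speech synthesizer (hypothetical)."""
    raise NotImplementedError

def captions_to_speech(tv_signal: bytes, desired: str, carried: str = "en") -> bytes:
    # First variant in the abstract: captions carried in one language are
    # translated before (or while) being converted to speech.
    text = extract_cc_text(tv_signal, language=carried)
    if carried != desired:
        text = translate(text, src=carried, dst=desired)
    return synthesize_speech(text, language=desired)

# Second variant in the abstract: the signal carries captions in several
# languages, so extract_cc_text(tv_signal, language=desired) selects the
# desired stream directly and translation is skipped.
```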


Network Information
Related Topics (5)
Feature vector: 48.8K papers, 954.4K citations (83% related)
Object detection: 46.1K papers, 1.3M citations (82% related)
Convolutional neural network: 74.7K papers, 2M citations (82% related)
Deep learning: 79.8K papers, 2.1M citations (82% related)
Unsupervised learning: 22.7K papers, 1M citations (81% related)
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    536
2022    1,030
2021    504
2020    530
2019    448
2018    334