Topic

Closed captioning

About: Closed captioning is a research topic. Over its lifetime, 3,011 publications have been published within this topic, receiving 64,494 citations. The topic is also known as: CC.


Papers
Posted Content
TL;DR: This paper proposes the Context Sequence Memory Network (CSMN) for personalized image captioning, which exploits memory as a repository for multiple types of context information, appends previously generated words to memory to capture long-term information without suffering from the vanishing gradient problem, and adopts a CNN memory structure to jointly represent nearby ordered memory slots.
Abstract: We address personalization issues of image captioning, which have not been discussed yet in previous research. For a query image, we aim to generate a descriptive sentence, accounting for prior knowledge such as the user's active vocabularies in previous documents. As applications of personalized image captioning, we tackle two post automation tasks: hashtag prediction and post generation, on our newly collected Instagram dataset, consisting of 1.1M posts from 6.3K users. We propose a novel captioning model named Context Sequence Memory Network (CSMN). Its unique updates over previous memory network models include (i) exploiting memory as a repository for multiple types of context information, (ii) appending previously generated words into memory to capture long-term information without suffering from the vanishing gradient problem, and (iii) adopting CNN memory structure to jointly represent nearby ordered memory slots for better context understanding. With quantitative evaluation and user studies via Amazon Mechanical Turk, we show the effectiveness of the three novel features of CSMN and its performance enhancement for personalized image captioning over state-of-the-art captioning models.

18 citations
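
The memory design summarized in the abstract above can be illustrated with a short sketch. Below is a minimal, hypothetical PyTorch rendering of a CSMN-style context memory; the class name ContextMemorySketch, the dimensions, and the slot budget are assumptions made for illustration, not the authors' implementation. It keeps ordered memory slots, convolves neighbouring slots as in feature (iii), attends over them with the decoder state, and appends each generated word back into memory as in feature (ii).

```python
# Hypothetical sketch of a CSMN-style context memory, not the authors' code.
# Assumes image, user-context, and word embeddings share a common dimension.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextMemorySketch(nn.Module):
    def __init__(self, dim=256, max_slots=60, kernel_size=3):
        super().__init__()
        # 1-D convolution over adjacent, ordered memory slots (feature iii).
        self.slot_conv = nn.Conv1d(dim, dim, kernel_size, padding=kernel_size // 2)
        self.max_slots = max_slots

    def forward(self, memory, query):
        # memory: (batch, slots, dim) -- image regions, user context words,
        #         and previously generated words appended as extra slots.
        # query:  (batch, dim)        -- the current decoding state.
        conv_mem = self.slot_conv(memory.transpose(1, 2)).transpose(1, 2)
        attn = F.softmax(torch.bmm(conv_mem, query.unsqueeze(2)).squeeze(2), dim=1)
        read = torch.bmm(attn.unsqueeze(1), conv_mem).squeeze(1)
        return read, attn

    def append_word(self, memory, word_embedding):
        # Feature (ii): write each newly generated word back into memory, so
        # long-range history stays available without backpropagating through time.
        new_mem = torch.cat([memory, word_embedding.unsqueeze(1)], dim=1)
        return new_mem[:, -self.max_slots:]
```

Reading old words back through attention rather than through a recurrent state is what lets the model keep long-term context available without a vanishing-gradient path.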

Journal ArticleDOI
TL;DR: This paper summarizes a research effort designed to determine effective television captioning strategies for hearing-impaired youngsters, ages 9-18 years, and presents the major findings with regard to caption language level, caption rate, synchronous captioning, and the reading ability of the viewers.
Abstract: This paper summarizes a research effort designed to determine effective television captioning strategies for hearing-impaired youngsters, ages 9-18 years. Four studies were conducted to answer the following questions: (a) Is the multilevel linguistic approach to captioning effective? (b) What is the optimal presentation rate for captions? (c) Do the caption rate and the caption density of a television program influence comprehension? (d) Does syn-capping (replacing the audio to conform to edited captions) facilitate comprehension of a captioned television program? The specific studies are summarized, and the major findings with regard to caption language level, rate, synchronous captioning, and reading ability of the viewers are presented. Implications for captioning specialists and educators are explored.

18 citations

Book ChapterDOI
13 Oct 2019
TL;DR: An automatic natural language processing (NLP)-based image captioning method is proposed to describe fetal ultrasound video content by modelling the vocabulary commonly used by sonographers and sonologists.
Abstract: We describe an automatic natural language processing (NLP)-based image captioning method to describe fetal ultrasound video content by modelling the vocabulary commonly used by sonographers and sonologists. The generated captions are similar to the words spoken by a sonographer when describing the scan experience in terms of visual content and performed scanning actions. Using full-length second-trimester fetal ultrasound videos and text derived from accompanying expert voice-over audio recordings, we train deep learning models consisting of convolutional neural networks and recurrent neural networks in merged configurations to generate captions for ultrasound video frames. We evaluate different model architectures using established general metrics (BLEU, ROUGE-L) and application-specific metrics. Results show that the proposed models can learn joint representations of image and text to generate relevant and descriptive captions for anatomies, such as the spine, the abdomen, the heart, and the head, in clinical fetal ultrasound scans.

18 citations
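
The "merged configuration" of convolutional and recurrent networks mentioned in the abstract above is a common captioning pattern, and a compact sketch may help. The following hypothetical PyTorch snippet is illustrative only; the class name MergeCaptionerSketch and the feature dimensions are assumptions, not the authors' implementation. It merges pooled CNN frame features with an LSTM encoding of the partial caption before predicting the next word.

```python
# Hypothetical "merge"-style CNN+RNN captioner, sketched for illustration only.
import torch
import torch.nn as nn

class MergeCaptionerSketch(nn.Module):
    def __init__(self, vocab_size, img_feat_dim=2048, embed_dim=256, hidden_dim=256):
        super().__init__()
        self.img_fc = nn.Linear(img_feat_dim, hidden_dim)   # project frame features
        self.embed = nn.Embedding(vocab_size, embed_dim)    # caption word embeddings
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim * 2, vocab_size)    # merged image + text paths

    def forward(self, img_feats, captions):
        # img_feats: (batch, img_feat_dim) pooled CNN features of an ultrasound frame
        # captions:  (batch, seq_len) token ids of the caption so far
        img = torch.relu(self.img_fc(img_feats))             # (batch, hidden_dim)
        txt, _ = self.rnn(self.embed(captions))              # (batch, seq_len, hidden_dim)
        img = img.unsqueeze(1).expand(-1, txt.size(1), -1)   # broadcast over time steps
        merged = torch.cat([img, txt], dim=-1)               # the "merged" configuration
        return self.out(merged)                              # per-step vocabulary logits
```

Captions produced this way are typically scored against the expert voice-over text with BLEU and ROUGE-L, as in the evaluation described above.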

Journal ArticleDOI
Xiaohan Wang, Lin Zhu, Zhedong Zheng, Min Xu, Yi Yang 
TL;DR: Align and Tell introduces a set of learnable queries that interact with both textual and video representations and projects them to a fixed number of local features to complement the global comparison.
Abstract: Text-video retrieval is one of the basic tasks for multimodal research and has been widely harnessed in many real-world systems. Most existing approaches directly compare the global representation between videos and text descriptions and utilize the global contrastive loss to train the model. These designs overlook the local alignment and the word-level supervision signal. In this paper, we propose a new framework, called Align and Tell, for text-video retrieval. Compared to the previous work, our framework contains additional modules, i.e., two transformer decoders for local alignment and one captioning head to enhance the representation learning. First, we introduce a set of learnable queries to interact with both textual representations and video representations and project them to a fixed number of local features. After that, local contrastive learning is performed to complement the global comparison. Moreover, we design a video captioning head to provide additional supervision signals during training. This word-level supervision can enhance the visual representation and alleviate the cross-modal gap. The captioning head can be removed during inference and does not introduce extra computational costs. Extensive empirical results demonstrate that our Align and Tell model can achieve state-of-the-art performance on four text-video retrieval datasets, including MSR-VTT, MSVD, LSMDC, and ActivityNet-Captions.

18 citations
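
The local-alignment module described above, a set of learnable queries passed through a transformer decoder, can be sketched briefly. The snippet below is a hypothetical, illustrative PyTorch version, not the released Align and Tell code; the class name LocalQuerySketch and the dimensions are assumptions. The same query set is run against either frame features or token features, producing a fixed number of local features for a local contrastive loss.

```python
# Hypothetical sketch of the learnable-query local alignment idea (illustration only).
import torch
import torch.nn as nn

class LocalQuerySketch(nn.Module):
    def __init__(self, dim=512, num_queries=8, num_layers=2, nhead=8):
        super().__init__()
        # A fixed number of learnable queries, shared across modalities.
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)

    def forward(self, features):
        # features: (batch, seq_len, dim) -- frame features of a video or
        #           token features of a text description.
        tgt = self.queries.unsqueeze(0).expand(features.size(0), -1, -1)
        # Queries cross-attend to the sequence, yielding num_queries local
        # features per sample; these are then compared across modalities with
        # a local contrastive loss that complements the global one.
        return self.decoder(tgt, features)   # (batch, num_queries, dim)
```

Because the captioning head is only a training-time auxiliary, the word-level supervision it provides adds no cost at inference.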

Proceedings ArticleDOI
TL;DR: A Decoupled Novel Object Captioner (DNOC) is proposed that fully decouples the language sequence model from the object descriptions by combining a Sequence Model with the Placeholder (SM-P) with a key-value object memory built on a freely available detection model.
Abstract: Image captioning is a challenging task where the machine automatically describes an image by sentences or phrases. It often requires a large number of paired image-sentence annotations for training. However, a pre-trained captioning model can hardly be applied to a new domain in which some novel object categories exist, i.e., the objects and their description words are unseen during model training. To correctly caption the novel object, it requires professional human workers to annotate the images by sentences with the novel words. It is labor expensive and thus limits its usage in real-world applications. In this paper, we introduce the zero-shot novel object captioning task where the machine generates descriptions without extra sentences about the novel object. To tackle the challenging problem, we propose a Decoupled Novel Object Captioner (DNOC) framework that can fully decouple the language sequence model from the object descriptions. DNOC has two components. 1) A Sequence Model with the Placeholder (SM-P) generates a sentence containing placeholders. The placeholder represents an unseen novel object. Thus, the sequence model can be decoupled from the novel object descriptions. 2) A key-value object memory built upon the freely available detection model, contains the visual information and the corresponding word for each object. The SM-P will generate a query to retrieve the words from the object memory. The placeholder will then be filled with the correct word, resulting in a caption with novel object descriptions. The experimental results on the held-out MSCOCO dataset demonstrate the ability of DNOC in describing novel concepts in the zero-shot novel object captioning task.

18 citations
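
The placeholder-filling step at the heart of DNOC is easy to illustrate. The following hypothetical Python/PyTorch sketch (the function fill_placeholders, the "<PL>" token string, and the toy tensors are all assumptions made for illustration, not the authors' code) shows how a generated sentence containing placeholders could be completed by querying a key-value object memory built from detector outputs.

```python
# Hypothetical sketch of DNOC-style placeholder filling (illustration only).
import torch

def fill_placeholders(tokens, query_states, keys, values, placeholder="<PL>"):
    # tokens:       generated words, e.g. ["a", "<PL>", "on", "the", "grass"]
    # query_states: (num_placeholders, dim) decoder states used as memory queries
    # keys:         (num_objects, dim) visual features from the detection model
    # values:       list of num_objects detected class words, e.g. ["zebra", "grass"]
    filled, q = [], 0
    for tok in tokens:
        if tok == placeholder:
            # Retrieve the best-matching detected object word for this slot.
            scores = keys @ query_states[q]              # (num_objects,)
            filled.append(values[int(torch.argmax(scores))])
            q += 1
        else:
            filled.append(tok)
    return filled

# Toy usage with random features standing in for real detector/decoder outputs:
keys = torch.randn(2, 16)
tokens = ["a", "<PL>", "standing", "on", "the", "grass"]
print(fill_placeholders(tokens, keys[:1], keys, ["zebra", "grass"]))
```

Keeping the word choice for novel objects inside the memory lookup, rather than inside the language model, is what allows the captioner to describe categories it never saw paired sentences for.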


Network Information
Related Topics (5)
Feature vector: 48.8K papers, 954.4K citations, 83% related
Object detection: 46.1K papers, 1.3M citations, 82% related
Convolutional neural network: 74.7K papers, 2M citations, 82% related
Deep learning: 79.8K papers, 2.1M citations, 82% related
Unsupervised learning: 22.7K papers, 1M citations, 81% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    536
2022    1,030
2021    504
2020    530
2019    448
2018    334