Topic

Closed captioning

About: Closed captioning is a research topic. Over its lifetime, 3,011 publications have been published within this topic, receiving 64,494 citations. The topic is also known as: CC.


Papers
Proceedings ArticleDOI
17 Oct 2021
TL;DR: A joint framework for video captioning and sentence validation is proposed that explicitly explores object-level interactions and frame-level information from complex spatio-temporal data to generate semantic-rich captions.
Abstract: Video captioning aims to automatically generate natural language sentences that can describe the visual contents of a given video. Existing generative models like encoder-decoder frameworks cannot explicitly explore the object-level interactions and frame-level information from complex spatio-temporal data to generate semantic-rich captions. Our main contribution is to identify three key problems in a joint framework for future video summarization tasks. 1) Enhanced Object Proposal: we propose a novel Conditional Graph that can fuse spatio-temporal information into latent object proposal. 2) Visual Knowledge: Latent Proposal Aggregation is proposed to dynamically extract visual words with higher semantic levels. 3) Sentence Validation: A novel Discriminative Language Validator is proposed to verify generated captions so that key semantic concepts can be effectively preserved. Our experiments on two public datasets (MSVD and MSR-VTT) manifest significant improvements over state-of-the-art approaches on all metrics, especially for BLEU-4 and CIDEr. Our code is available at https://github.com/baiyang4/D-LSG-Video-Caption.
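A minimal sketch of the sentence-validation idea from the paper above, written in PyTorch. This is not the authors' D-LSG implementation (their code is at the GitHub link in the abstract); the class name CaptionValidator, the GRU encoder, and all dimensions are assumptions chosen only to illustrate how a discriminator could score a generated caption against pooled visual features.

```python
import torch
import torch.nn as nn

class CaptionValidator(nn.Module):
    """Scores how consistent a generated caption is with pooled visual features."""
    def __init__(self, vocab_size=10000, embed_dim=300, hidden_dim=512, visual_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.score = nn.Sequential(
            nn.Linear(hidden_dim + visual_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, caption_tokens, visual_feats):
        # caption_tokens: (batch, seq_len) token ids of a generated caption
        # visual_feats:   (batch, visual_dim) pooled video representation
        emb = self.embed(caption_tokens)
        _, h = self.encoder(emb)                     # h: (1, batch, hidden_dim)
        joint = torch.cat([h.squeeze(0), visual_feats], dim=-1)
        return self.score(joint)                     # higher = caption judged consistent

validator = CaptionValidator()
tokens = torch.randint(0, 10000, (2, 12))            # two dummy 12-token captions
feats = torch.randn(2, 512)                          # two dummy pooled video features
print(validator(tokens, feats).shape)                # torch.Size([2, 1])
```

In an adversarial setup, such a validator would be trained to separate ground-truth from generated captions, pushing the generator to preserve key semantic concepts, which is the role the abstract attributes to the Discriminative Language Validator.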

19 citations

Proceedings Article
01 Jan 2005
TL;DR: There was a significant difference between deaf and hard of hearing viewers in their reaction to the emotive captions, and deaf viewers had a strong dislike for them, although they did see some potential for intermittent use of emotive captions or for use with children's programs.
Abstract: Closed captioning has been enabling access to television for people who are deaf and hard of hearing since the early 1970s. Since that time, technology and people's demands have been steadily improving and increasing. Closed captioning has not kept up with these changes. We present the results of a study that used graphics, colour, icons and animation as well as text (emotive captions) to capture more of the sound information contained in television content. Deaf and hard of hearing participants compared emotive and conventional captions for two short video segments. The results showed that there was a significant difference between deaf and hard of hearing viewers in their reaction to the emotive captions. Hard of hearing viewers seemed to enjoy them and find them interesting. Deaf viewers had a strong dislike for them, although they did see some potential for intermittent use of emotive captions or for use with children's programs.

19 citations

Proceedings ArticleDOI
11 Dec 2006
TL;DR: A method that automatically indexes each video segment of a television program by its principal video object, using closed-caption text information together with Quinlan's C4.5 decision-tree learning algorithm and the predicted accuracies of production-rule indicators.
Abstract: This paper proposes a method for automatically generating a multimedia encyclopedia from video clips using closed-caption text information. The goal is to automatically index each video segment of the television program by the principal video object. We focus on several features of the closed-caption text style in order to identify the principal video objects. Using Quinlan's C4.5 decision-tree learning algorithm and the predicted accuracies of production rule indicators, one object noun is extracted for each video shot. To show the effectiveness of the method, we conducted experiments on the extraction of video segments in which animals appear in twenty television programs on animals and nature. We obtained a precision rate of 74.6 percent and a recall rate of 51.4 percent on the extraction of video segments in which animals appear, and generated a multimedia encyclopedia comprising 322 video clips showing 82 kinds of animals.
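For illustration only, here is a rough sketch of the shot-indexing step in Python. scikit-learn's DecisionTreeClassifier (CART) stands in for Quinlan's C4.5, and the caption-style features and training rows are hypothetical; the paper's actual features and production-rule indicators are not reproduced here.

```python
from sklearn.tree import DecisionTreeClassifier

# Each row describes one candidate noun taken from a shot's closed captions:
# [frequency in the shot, appears in the first caption line (0/1),
#  appears with an exclamation mark (0/1), is the sentence subject (0/1)]
X = [
    [5, 1, 1, 1],
    [1, 0, 0, 0],
    [3, 1, 0, 1],
    [1, 0, 1, 0],
]
y = [1, 0, 1, 0]   # 1 = the noun is the shot's principal video object

clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X, y)

# Rank the candidate nouns of a new shot and keep the highest-scoring one
candidates = {"elephant": [4, 1, 0, 1], "grass": [2, 0, 0, 0]}
best = max(candidates, key=lambda noun: clf.predict_proba([candidates[noun]])[0][1])
print(best)   # expected to pick "elephant"
```

In the paper's setting, the selected noun per shot then becomes the index entry used to assemble the multimedia encyclopedia.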

19 citations

Proceedings ArticleDOI
02 Sep 2018
TL;DR: A user-centric evaluation of a real-time closed captioning system enhanced by a lightweight RNN-based punctuation module confirms that automatic punctuation itself significantly increases understandability, even if several other factors interplay in subjective impression.
Abstract: Punctuation of ASR-produced transcripts has received increasing attention in recent years; RNN-based sequence modelling solutions which exploit textual and/or acoustic features show encouraging performance. Switching the focus from the technical side, the benefits of such punctuation have not yet been qualified and quantified exhaustively from the end-user perspective. The ambition of the current paper is to explore to what extent automatic punctuation can improve human readability and understandability. The paper presents a user-centric evaluation of a real-time closed captioning system enhanced by a lightweight RNN-based punctuation module. Subjective tests involve both normal-hearing and deaf or hard-of-hearing (DHH) subjects. Results confirm that automatic punctuation itself significantly increases understandability, even if several other factors interplay in subjective impression. The perceived improvement is even more pronounced in the DHH group. A statistical analysis is carried out to identify objectively measurable factors which are well reflected by subjective scores.
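As a rough sketch of what a lightweight RNN punctuation module can look like (not the paper's system; the tag set, vocabulary size, and dimensions are assumptions), a bidirectional LSTM can predict the punctuation mark, if any, that follows each word of the raw ASR transcript:

```python
import torch
import torch.nn as nn

PUNCT_TAGS = ["<none>", ",", ".", "?"]   # punctuation predicted after each word

class PunctuationTagger(nn.Module):
    def __init__(self, vocab_size=20000, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                           bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, len(PUNCT_TAGS))

    def forward(self, word_ids):
        # word_ids: (batch, seq_len) ids of the unpunctuated transcript words
        h, _ = self.rnn(self.embed(word_ids))
        return self.out(h)                 # (batch, seq_len, num_tags)

model = PunctuationTagger()
words = torch.randint(0, 20000, (1, 8))    # one dummy 8-word ASR transcript
tags = model(words).argmax(dim=-1)         # most likely tag after each word
print(tags.shape)                          # torch.Size([1, 8])
```

A real-time captioning system would run such a tagger incrementally and insert the predicted marks before display; as the abstract notes, acoustic cues (e.g. pause durations) can also be exploited alongside the textual features.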

19 citations

Proceedings ArticleDOI
01 Nov 2019
TL;DR: This work achieves unpaired image captioning by bridging the vision and the language domains with high-level semantic information, and proposes the Semantic Relationship Explorer, which explores the relationships between semantic concepts for better understanding of the image.
Abstract: Recently, image captioning has aroused great interest in both the academic and industrial worlds. Most existing systems are built upon large-scale datasets consisting of image-sentence pairs, which, however, are time-consuming to construct. In addition, even for the most advanced image captioning systems, it is still difficult to realize deep image understanding. In this work, we achieve unpaired image captioning by bridging the vision and the language domains with high-level semantic information. The motivation stems from the fact that the semantic concepts with the same modality can be extracted from both images and descriptions. To further improve the quality of captions generated by the model, we propose the Semantic Relationship Explorer, which explores the relationships between semantic concepts for better understanding of the image. Extensive experiments on the MSCOCO dataset show that we can generate desirable captions without paired datasets. Furthermore, the proposed approach boosts five strong baselines under the paired setting, where the most significant improvement in CIDEr score reaches 8%, demonstrating that it is effective and generalizes well to a wide range of models.
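A rough sketch of the shared-concept bridge described above, not the paper's model: the idea is that a decoder trained only on concepts extracted from sentences can, at test time, be fed concepts detected in an image, because both sides share one concept vocabulary. The class name ConceptToCaption, the mean pooling, and all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ConceptToCaption(nn.Module):
    def __init__(self, num_concepts=1000, vocab_size=10000,
                 embed_dim=300, hidden_dim=512):
        super().__init__()
        self.concept_embed = nn.Embedding(num_concepts, embed_dim)
        self.word_embed = nn.Embedding(vocab_size, embed_dim)
        self.decoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.to_hidden = nn.Linear(embed_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, concept_ids, caption_in):
        # concept_ids: (batch, k) shared concept ids (taken from sentences at
        #              training time, from an image concept detector at test time)
        # caption_in:  (batch, seq_len) previous caption tokens (teacher forcing)
        ctx = self.concept_embed(concept_ids).mean(dim=1)     # pool the k concepts
        h0 = torch.tanh(self.to_hidden(ctx)).unsqueeze(0)     # init decoder state
        out, _ = self.decoder(self.word_embed(caption_in), h0)
        return self.out(out)                                  # next-word logits

model = ConceptToCaption()
logits = model(torch.randint(0, 1000, (2, 5)), torch.randint(0, 10000, (2, 7)))
print(logits.shape)   # torch.Size([2, 7, 10000])
```

The paper's Semantic Relationship Explorer would refine the pooled concept representation by modelling relations between concepts before decoding; the mean pooling here is only a placeholder for that step.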

19 citations


Network Information
Related Topics (5)
Feature vector: 48.8K papers, 954.4K citations (83% related)
Object detection: 46.1K papers, 1.3M citations (82% related)
Convolutional neural network: 74.7K papers, 2M citations (82% related)
Deep learning: 79.8K papers, 2.1M citations (82% related)
Unsupervised learning: 22.7K papers, 1M citations (81% related)
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    536
2022    1,030
2021    504
2020    530
2019    448
2018    334