Topic

Closed captioning

About: Closed captioning is a research topic. Over its lifetime, 3,011 publications have been published within this topic, receiving 64,494 citations. The topic is also known as: CC.


Papers
Proceedings ArticleDOI
01 Jun 2019
TL;DR: This paper proposes a new metric for measuring the diversity of image captions, derived from latent semantic analysis and kernelized to use CIDEr similarity, and shows that balancing the cross-entropy loss and the CIDEr reward in reinforcement learning during training can effectively control the tradeoff between diversity and accuracy of the generated captions.
Abstract: Recently, the state-of-the-art models for image captioning have overtaken human performance based on the most popular metrics, such as BLEU, METEOR, ROUGE and CIDEr. Does this mean we have solved the task of image captioning? The above metrics only measure the similarity of the generated caption to the human annotations, which reflects its accuracy. However, an image contains many concepts and multiple levels of detail, and thus there is a variety of captions that express different concepts and details that might be interesting for different humans. Therefore, evaluating accuracy alone is not sufficient for measuring the performance of captioning models; the diversity of the generated captions should also be considered. In this paper, we propose a new metric for measuring the diversity of image captions, which is derived from latent semantic analysis and kernelized to use CIDEr similarity. We conduct extensive experiments to re-evaluate recent captioning models in the context of both diversity and accuracy. We find that there is still a large gap between the model and human performance in terms of both accuracy and diversity, and the models that have optimized accuracy (CIDEr) have low diversity. We also show that balancing the cross-entropy loss and CIDEr reward in reinforcement learning during training can effectively control the tradeoff between diversity and accuracy of the generated captions.

61 citations
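
As a rough illustration of the kind of diversity score the image captioning abstract above describes, the sketch below computes a spectrum-based diversity value from a pairwise caption-similarity (kernel) matrix. The specific formula (the largest square-rooted eigenvalue divided by the sum of square-rooted eigenvalues, then log-normalized by the number of captions), the function name `kernel_diversity`, and the toy matrices are assumptions made for illustration. The abstract itself states only that the metric is derived from latent semantic analysis and kernelized to use CIDEr similarity. In practice the matrix entries would come from a CIDEr implementation, which is not shown here.

```python
import numpy as np


def kernel_diversity(K):
    """Spectrum-based diversity of m captions from an m x m similarity matrix.

    Assumed form (not confirmed by the abstract):
        r   = max_i sqrt(lambda_i) / sum_i sqrt(lambda_i)
        div = -log_m(r)
    where lambda_i are the eigenvalues of the symmetric, positive
    semi-definite kernel matrix K.
    """
    K = np.asarray(K, dtype=float)
    m = K.shape[0]
    if K.ndim != 2 or K.shape[1] != m or m < 2:
        raise ValueError("K must be a square matrix for at least 2 captions")

    # eigvalsh is meant for symmetric matrices; clip tiny negative
    # eigenvalues that arise from floating-point round-off.
    eigvals = np.clip(np.linalg.eigvalsh(K), 0.0, None)
    roots = np.sqrt(eigvals)

    # r is 1 when one direction dominates, 1/m when the spectrum is flat.
    r = roots.max() / roots.sum()
    return float(-np.log(r) / np.log(m))


# Toy kernels standing in for pairwise CIDEr similarities (hypothetical data).
identical_captions = np.ones((3, 3))  # every caption matches every other
unrelated_captions = np.eye(3)        # captions share nothing with each other

low = kernel_diversity(identical_captions)
high = kernel_diversity(unrelated_captions)
```

Under these assumptions, a set of identical captions scores close to 0 and a set of mutually dissimilar captions scores 1, so the value can be read as a normalized diversity between the two extremes.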

Journal ArticleDOI
TL;DR: The authors propose a unified and extensible framework to jointly leverage multiple kinds of visual features and semantic attributes, and report that it outperforms previous work on the MSVD and MSR-VTT datasets.
Abstract: Video captioning has attracted an increasing amount of interest, due in part to its potential for improved accessibility and information retrieval. While existing methods rely on different kinds of visual features and model architectures, they do not make full use of pertinent semantic cues. We present a unified and extensible framework to jointly leverage multiple sorts of visual features and semantic attributes. Our novel architecture builds on LSTMs with two multi-faceted attention layers. These first learn to automatically select the most salient visual features or semantic attributes, and then yield overall representations for the input and output of the sentence generation component via custom feature scaling operations. Experimental results on the challenging MSVD and MSR-VTT datasets show that our framework outperforms previous work and performs robustly even in the presence of added noise to the features and attributes.

61 citations

Patent
01 Dec 2006
TL;DR: A multimedia server distributes closed captioning over a network to a client device running a media player that does not support standardized closed captioning, such as that defined by CEA-608-B or CEA-708-B, Advanced Television Systems Committee ATSC A/53, or the Society of Cable Telecommunications Engineers SCTE 20 and/or SCTE 21.
Abstract: A multimedia server distributes closed captioning over a network to a client device running a media player that does not support standardized closed captioning. The multimedia server receives a media stream including closed captioning that is encoded according to a closed captioning standard such as Consumer Electronics Association CEA-608-B or CEA-708-B, Advanced Television Systems Committee ATSC A/53 or the Society of Cable Telecommunications Engineers SCTE 20 and/or SCTE 21. The multimedia server transcodes the closed captioning into a format that is usable by the media player and transmits the transcoded closed captioning to the client device over the network so that the media player can render the closed captioning synchronously with programming content included in the media stream.

61 citations

Patent
21 May 2002
TL;DR: A television signal system takes closed caption (CC) data from a standard definition signal, processes the CC data, and overlays it at the video rate of a higher definition signal selected for viewing that does not carry its own embedded closed caption data.
Abstract: A system as described herein enables a user to access auxiliary information when viewing an enhanced performance television signal or program. Particularly, a television signal system is operative, configured, and/or enabled to allow a user to access and/or utilize auxiliary information when viewing a high definition or progressive-scan television signal. Briefly, an exemplary television signal system receives the auxiliary information/data (e.g. closed caption data) on a selected interlaced standard definition input, processes the auxiliary data, and combines or overlays the auxiliary data with a television (video) signal received on a selected input that does not have its own embedded auxiliary information/data. More particularly, an exemplary television signal system such as described herein involves using closed caption (CC) data from a standard definition signal, processing the CC data, and overlaying the CC data at a video rate of a higher definition signal selected for viewing that does not carry its own embedded closed caption data.

61 citations

Proceedings ArticleDOI
22 Aug 2001
TL;DR: A novel statistical approach is presented, called the weighted voting method, for automatic news video story categorization based on the closed captioned text.
Abstract: In this paper, we present a novel statistical approach, called the weighted voting method, for automatic news video story categorization based on the closed captioned text. News video is initially segmented into stories using the demarcations in the closed captioned text, then a set of

61 citations


Network Information
Related Topics (5)
Feature vector
48.8K papers, 954.4K citations
83% related
Object detection
46.1K papers, 1.3M citations
82% related
Convolutional neural network
74.7K papers, 2M citations
82% related
Deep learning
79.8K papers, 2.1M citations
82% related
Unsupervised learning
22.7K papers, 1M citations
81% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    536
2022    1,030
2021    504
2020    530
2019    448
2018    334