Topic
Closed captioning
About: Closed captioning is a research topic. Over its lifetime, 3,011 publications on this topic have received 64,494 citations. The topic is also known as: CC.
Papers published on a yearly basis
Papers
01 Jun 2019
TL;DR: A new metric for measuring the diversity of image captions is proposed, derived from latent semantic analysis and kernelized to use CIDEr similarity; the paper also shows that balancing the cross-entropy loss and CIDEr reward in reinforcement learning during training can effectively control the tradeoff between diversity and accuracy of the generated captions.
Abstract: Recently, the state-of-the-art models for image captioning have overtaken human performance based on the most popular metrics, such as BLEU, METEOR, ROUGE and CIDEr. Does this mean we have solved the task of image captioning? The above metrics only measure the similarity of the generated caption to the human annotations, which reflects its accuracy. However, an image contains many concepts and multiple levels of detail, and thus there is a variety of captions that express different concepts and details that might be interesting for different humans. Therefore only evaluating accuracy is not sufficient for measuring the performance of captioning models --- the diversity of the generated captions should also be considered. In this paper, we propose a new metric for measuring the diversity of image captions, which is derived from latent semantic analysis and kernelized to use CIDEr similarity. We conduct extensive experiments to re-evaluate recent captioning models in the context of both diversity and accuracy. We find that there is still a large gap between the model and human performance in terms of both accuracy and diversity, and the models that have optimized accuracy (CIDEr) have low diversity. We also show that balancing the cross-entropy loss and CIDEr reward in reinforcement learning during training can effectively control the tradeoff between diversity and accuracy of the generated captions.
61 citations
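The LSA-based diversity metric described above can be illustrated with a simplified sketch: pairwise caption similarities form a kernel matrix, and the spread of its eigenvalues measures how diverse the caption set is (identical captions give a rank-one kernel, orthogonal captions give an identity kernel). The bag-of-words cosine below is only a stand-in for the paper's CIDEr kernel, and the normalization is illustrative, not the authors' exact formula.

```python
import numpy as np
from collections import Counter

def cosine(a, b):
    # Bag-of-words cosine similarity; a stand-in for CIDEr similarity.
    ca, cb = Counter(a.split()), Counter(b.split())
    num = sum(ca[w] * cb[w] for w in ca)
    den = (sum(v * v for v in ca.values()) ** 0.5) * \
          (sum(v * v for v in cb.values()) ** 0.5)
    return num / den if den else 0.0

def diversity(captions):
    # Kernel matrix of pairwise caption similarities.
    m = len(captions)
    K = np.array([[cosine(a, b) for b in captions] for a in captions])
    eigvals = np.clip(np.linalg.eigvalsh(K), 0, None)
    # Ratio of the dominant eigenvalue to the total spectrum:
    # 1.0 for identical captions, 1/m for mutually orthogonal ones.
    r = eigvals.max() / eigvals.sum()
    # Normalize so the score runs from 0 (no diversity) to 1.
    return -np.log(r) / np.log(m)
```

Three identical captions score 0, while three captions with no words in common score 1.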
TL;DR: The authors proposed a unified and extensible framework to jointly leverage multiple kinds of visual features and semantic attributes, and achieved state-of-the-art performance on the MSVD and MSR-VTT datasets.
Abstract: Video captioning has attracted an increasing amount of interest, due in part to its potential for improved accessibility and information retrieval. While existing methods rely on different kinds of visual features and model architectures, they do not make full use of pertinent semantic cues. We present a unified and extensible framework to jointly leverage multiple sorts of visual features and semantic attributes. Our novel architecture builds on LSTMs with two multi-faceted attention layers. These first learn to automatically select the most salient visual features or semantic attributes, and then yield overall representations for the input and output of the sentence generation component via custom feature scaling operations. Experimental results on the challenging MSVD and MSR-VTT datasets show that our framework outperforms previous work and performs robustly even in the presence of added noise to the features and attributes.
61 citations
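The attention layers described above are, at their core, soft attention over a set of feature vectors: score each feature against the decoder state, softmax the scores, and take the weighted sum. A minimal sketch (generic bilinear dot-product attention, not the paper's exact multi-faceted layer) might look like:

```python
import numpy as np

def attend(features, query, W):
    """Soft attention over feature vectors.

    features: (n, d) array of visual features or attribute embeddings
    query:    (q,) decoder hidden state
    W:        (d, q) learned bilinear scoring matrix
    Returns the (d,) attended representation.
    """
    scores = features @ W @ query            # (n,) relevance scores
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ features                # weighted sum of features
```

In the paper's setting, one such layer would select among visual features and another among semantic attributes before sentence generation.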
01 Dec 2006
TL;DR: In this paper, a multimedia server distributes closed captioning over a network to a client device running a media player that does not support standardized closed captioning, such as CEA-608-B or CEA-708-B, Advanced Television Systems Committee ATSC A/53, or Society of Cable Telecommunications Engineers SCTE 20 and/or SCTE 21.
Abstract: A multimedia server distributes closed captioning over a network to a client device running a media player that does not support standardized closed captioning. The multimedia server receives a media stream including closed captioning that is encoded according to a closed captioning standard such as Consumer Electronics Association CEA-608-B or CEA-708-B, Advanced Television Systems Committee ATSC A/53 or the Society of Cable Telecommunications Engineers SCTE 20 and/or SCTE 21. The multimedia server transcodes the closed captioning into a format that is usable by the media player and transmits the transcoded closed captioning to the client device over the network so that the media player can render the closed captioning synchronously with programming content included in the media stream.
61 citations
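The transcoding step described in this abstract amounts to converting timed caption cues decoded from a broadcast encoding into a format the client player can render. As an illustration only (the patent does not specify a target format; WebVTT is used here purely as an example of a player-friendly text format), decoded cues could be serialized like this:

```python
def to_webvtt(cues):
    """Serialize (start_seconds, end_seconds, text) cues as WebVTT.

    A hypothetical final stage of a caption-transcoding pipeline.
    """
    def ts(seconds):
        h = int(seconds // 3600)
        m = int(seconds % 3600 // 60)
        s = seconds % 60
        return f"{h:02d}:{m:02d}:{s:06.3f}"  # HH:MM:SS.mmm

    lines = ["WEBVTT", ""]
    for start, end, text in cues:
        lines.append(f"{ts(start)} --> {ts(end)}")
        lines.append(text)
        lines.append("")
    return "\n".join(lines)
```

The server would then stream this text alongside the media so the player can display each cue at the right time.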
21 May 2002
TL;DR: In this article, an exemplary television signal system uses closed caption (CC) data from a standard definition signal, processes the CC data, and overlays it at the video rate of a higher definition signal selected for viewing that does not carry its own embedded closed caption data.
Abstract: A system as described herein enables a user to access auxiliary information when viewing an enhanced performance television signal or program. Particularly, a television signal system is operative, configured, and/or enabled to allow a user to access and/or utilize auxiliary information when viewing a high definition or progressive-scan television signal. Briefly, an exemplary television signal system receives the auxiliary information/data (e.g. closed caption data) on a selected interlaced standard definition input, processes the auxiliary data, and combines or overlays the auxiliary data with a television (video) signal received on a selected input that does not have its own embedded auxiliary information/data. More particularly, an exemplary television signal system such as described herein involves using closed caption (CC) data from a standard definition signal, processing the CC data, and overlaying the CC data at a video rate of a higher definition signal selected for viewing that does not carry its own embedded closed caption data.
61 citations
22 Aug 2001
TL;DR: A novel statistical approach, called the weighted voting method, is presented for automatic news video story categorization based on the closed captioned text.
Abstract: In this paper, we present a novel statistical approach, called the weighted voting method, for automatic news video story categorization based on the closed captioned text. News video is initially segmented into stories using the demarcations in the closed captioned text, then a set of
61 citations
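Although the abstract is cut off, the general shape of a weighted-voting text categorizer can be sketched: each word in a story's closed-caption text casts votes for categories, weighted by how strongly the word was associated with each category in training. This is a generic illustration, not the authors' exact weighting scheme:

```python
from collections import defaultdict

def train_weights(labeled_docs):
    # weight(word, category) = fraction of the word's training
    # occurrences that fell in that category.
    counts = defaultdict(lambda: defaultdict(int))
    for text, cat in labeled_docs:
        for w in text.lower().split():
            counts[w][cat] += 1
    weights = {}
    for w, cats in counts.items():
        total = sum(cats.values())
        weights[w] = {c: n / total for c, n in cats.items()}
    return weights

def classify(text, weights):
    # Each word votes for its associated categories with its weights;
    # the category with the largest total vote wins.
    votes = defaultdict(float)
    for w in text.lower().split():
        for cat, wt in weights.get(w, {}).items():
            votes[cat] += wt
    return max(votes, key=votes.get) if votes else None
```

In the paper's pipeline, the input to such a classifier would be the closed-caption text of a story segmented at the caption demarcations.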