scispace - formally typeset
Topic: Closed captioning

About: Closed captioning is a research topic. Over the lifetime, 3011 publications have been published within this topic receiving 64494 citations. The topic is also known as: CC.


Papers
Proceedings Article
01 Jan 2016
TL;DR: The proposed solution for the MSR Video to Language Challenge ranked 4th in overall performance while scoring the best CIDEr-D, which measures the human-likeness of generated captions.
Abstract: This paper describes our solution for the MSR Video to Language Challenge. We start from the popular ConvNet + LSTM model, which we extend with two novel modules. One is early embedding, which enriches the current low-level input to LSTM by tag embeddings. The other is late reranking, for re-scoring generated sentences in terms of their relevance to a specific video. The modules are inspired by recent works on image captioning, repurposed and redesigned for video. As experiments on the MSR-VTT validation set show, the joint use of these two modules adds a clear improvement over a non-trivial ConvNet + LSTM baseline under four performance metrics. The viability of the proposed solution is further confirmed by the organizers' blind test. Our system ranked 4th in overall performance, while scoring the best CIDEr-D, which measures the human-likeness of generated captions.
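The paper does not reproduce its exact reranking function here; as an illustrative sketch only (not the authors' method), late reranking can be approximated by mixing each candidate caption's generation score with a relevance term, here the fraction of predicted video tags appearing in the caption. The names `rerank_captions` and the weight `alpha` are hypothetical.

```python
def rerank_captions(captions_with_scores, video_tags, alpha=0.5):
    """Re-score candidate captions for one video.

    captions_with_scores: list of (caption, generation_score) pairs.
    video_tags: tags predicted for the video (assumed input).
    alpha: hypothetical weight mixing the two terms.
    """
    tags = {t.lower() for t in video_tags}

    def relevance(caption):
        # Fraction of predicted tags that occur in the caption.
        words = set(caption.lower().split())
        return len(tags & words) / max(len(tags), 1)

    reranked = sorted(
        captions_with_scores,
        key=lambda cs: alpha * cs[1] + (1 - alpha) * relevance(cs[0]),
        reverse=True,
    )
    return [caption for caption, _ in reranked]
```

With a low `alpha`, a caption mentioning the predicted tags can overtake one the language model scored slightly higher.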

56 citations

Proceedings ArticleDOI
14 Jun 2020
TL;DR: An end-to-end model that generates captions for images embedded in news articles, outperforming the previous state of the art by a factor of four in CIDEr score; the paper also introduces the NYTimes800k dataset, which is 70% larger than GoodNews, has higher article quality, and includes the locations of images within articles as an additional contextual cue.
Abstract: We propose an end-to-end model which generates captions for images embedded in news articles. News images present two key challenges: they rely on real-world knowledge, especially about named entities; and they typically have linguistically rich captions that include uncommon words. We address the first challenge by associating words in the caption with faces and objects in the image, via a multi-modal, multi-head attention mechanism. We tackle the second challenge with a state-of-the-art transformer language model that uses byte-pair-encoding to generate captions as a sequence of word parts. On the GoodNews dataset, our model outperforms the previous state of the art by a factor of four in CIDEr score (13 to 54). This performance gain comes from a unique combination of language models, word representation, image embeddings, face embeddings, object embeddings, and improvements in neural network design. We also introduce the NYTimes800k dataset which is 70% larger than GoodNews, has higher article quality, and includes the locations of images within articles as an additional contextual cue.
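The byte-pair encoding the model uses to emit uncommon words as word parts can be sketched with the standard merge-learning procedure (a generic Sennrich-style sketch, not this paper's actual tokenizer; `learn_bpe` and `merge_word` are illustrative names):

```python
from collections import Counter

def merge_word(symbols, pair):
    """Replace every adjacent occurrence of `pair` with the merged token."""
    out, i = [], 0
    while i < len(symbols):
        if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def learn_bpe(word_freqs, num_merges):
    """Learn BPE merges from {tuple-of-symbols: corpus frequency}.

    Repeatedly merges the most frequent adjacent symbol pair, so rare
    words end up represented as sequences of learned word parts.
    """
    vocab = dict(word_freqs)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in vocab.items():
            for bigram in zip(word, word[1:]):
                pairs[bigram] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = Counter()
        for word, freq in vocab.items():
            new_vocab[merge_word(word, best)] += freq
        vocab = dict(new_vocab)
    return merges, vocab
```

At inference time the learned merges are replayed in order on each new word, so any word, including a named entity never seen in training, decomposes into known subword units.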

56 citations

Patent
28 Apr 1993
TL;DR: In this patent, a system for processing auxiliary video signals decodes an extended data services signal in line 21 of field 2. Extended data services provide general-purpose video system information and control capability in addition to basic closed caption operation.
Abstract: A system for processing auxiliary video signals provides for decoding an extended data services signal in line 21 of field 2. Extended data services provide a general-purpose video system information and control capability in addition to basic closed caption operation. Extended data services information is arranged in packets of data. Each packet provides information regarding current or future video programs, the source of the video program, and miscellaneous information such as time of day. The extended data services data may be decoded to control the operation of a video system including a videocassette recorder (VCR) and a television receiver.
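The packet decoding described here can be sketched in a few lines, assuming the CEA-608 XDS layout (a class byte, a type byte, data bytes, a 0x0F terminator, and a checksum chosen so the 7-bit sum of all packet bytes is zero mod 128); this is a simplified illustration, not the patented circuit, and ignores interleaving with caption data:

```python
def strip_parity(b):
    """Line-21 bytes carry an odd-parity bit; keep only the low 7 bits."""
    return b & 0x7F

def decode_xds_packet(raw_bytes):
    """Decode one complete XDS packet from field-2 line-21 data.

    Returns (packet_class, packet_type, payload). Raises ValueError
    if the checksum does not zero out the 7-bit sum of the packet.
    """
    data = [strip_parity(b) for b in raw_bytes]
    if sum(data) % 128 != 0:
        raise ValueError("XDS checksum mismatch")
    packet_class, packet_type = data[0], data[1]
    payload = bytes(data[2:-2])  # exclude the 0x0F terminator and checksum
    return packet_class, packet_type, payload
```

A decoder in a VCR or television receiver would dispatch on the class/type pair, e.g. to pick out program schedule or time-of-day packets, and act on the payload.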

56 citations

Patent
12 Sep 2007
TL;DR: In this patent, an architecture is presented for translating closed captioning text originally provided with a video program from one language to another and presenting the translated closed captioning text with the video program to a viewer.
Abstract: The present invention provides an architecture for translating closed captioning text originally provided with a video program from one language to another and presenting the translated closed captioning text with the video program to a viewer. As such, the viewers are able to receive the closed captioning text in languages other than that used for the closed captioning originally provided with the video program. The original closed captioning text may be translated from one language to another by a centralized closed captioning processor, such that the customer equipment for various subscribers can take advantage of centralized translation services. Once the original closed captioning text is translated, the translated closed captioning text may be delivered to the customer equipment in different ways.

56 citations

Proceedings ArticleDOI
Tennenhouse, Adam, Carver, Houh, Ismert, Lindblad, Stasior, Wetherall, Bacher, Chang
15 May 1994
TL;DR: This paper describes a set of computer-participative applications that demonstrate the present-day viability of applications that participate in, i.e., actively process, live media-based information.
Abstract: The ViewStation architecture embodies a software-oriented approach to the support of interactive media-based applications. Starting from the premise that the raw media data, e.g., the video pixels themselves, must eventually be made accessible to the application, we have derived a set of architectural guidelines for the design of media processing environments. The resultant ViewStation architecture, as described in this paper, consists of the VuSystem, a complete media programming environment, and the VuNet, a substrate for the acquisition, communication, and rendering of video and closed caption text. We describe a set of computer-participative applications that demonstrate the present-day viability of applications that participate in, i.e., actively process, live media-based information. Early performance results illustrate the affordability and benefits of our software-oriented approach.

55 citations


Network Information
Related Topics (5)

Topic                          Papers   Citations   Related
Feature vector                 48.8K    954.4K      83%
Object detection               46.1K    1.3M        82%
Convolutional neural network   74.7K    2M          82%
Deep learning                  79.8K    2.1M        82%
Unsupervised learning          22.7K    1M          81%
Performance Metrics

No. of papers in the topic in previous years:

Year    Papers
2023    536
2022    1,030
2021    504
2020    530
2019    448
2018    334