Closed captioning

About: Closed captioning is a research topic. Over its lifetime, 3,011 publications on this topic have been published, receiving 64,494 citations. The topic is also known as CC.


Papers
Patent
09 Jun 2006
TL;DR: In this article, an apparatus and method are presented for displaying auxiliary information associated with a multimedia program, so that users with differing abilities and/or preferences for caption display may customize how caption text transmitted with television programs is shown.
Abstract: An apparatus and method are presented for displaying auxiliary information associated with a multimedia program. Specifically, the present invention is directed to receiving program signals, acquiring (104) auxiliary information (e.g., closed captioning information) from the program signals, associating (106) time information with the auxiliary information, storing (108) the auxiliary information in a file associated with the program content, using (110) the auxiliary information file and program content to determine candidate program portions for customization of the auxiliary information, and displaying (112) customized auxiliary information with the program content in accordance with user selections (see FIG. 1). Using the present invention, users with differing abilities and/or preferences for caption display may customize how the caption information transmitted with television programs is displayed. A minimal sketch of this pipeline follows this entry.

20 citations
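
The patent above reads as a pipeline: acquire caption cues from the program signal, attach time information, store them in a sidecar file tied to the program content, then render them according to user preferences. Below is a minimal, hedged Python sketch of that flow; all names (CaptionCue, acquire_captions, the preference keys) are illustrative stand-ins, since the patent specifies no implementation.

```python
from dataclasses import dataclass
import json

@dataclass
class CaptionCue:
    start: float  # seconds into the program
    end: float
    text: str

def acquire_captions(program_signal):
    """Hypothetical: extract timed caption cues from the program signal."""
    for ts_start, ts_end, text in program_signal:  # assumed iterable of raw cues
        yield CaptionCue(ts_start, ts_end, text)

def store_captions(cues, path):
    """Store cues in a file associated with the program content."""
    with open(path, "w") as f:
        json.dump([cue.__dict__ for cue in cues], f)

def render_caption(cue, prefs):
    """Apply user display preferences (case, font size) to a single cue."""
    text = cue.text.upper() if prefs.get("uppercase") else cue.text
    return {"text": text,
            "font_size": prefs.get("font_size", 16),
            "shown_from": cue.start,
            "shown_until": cue.end}

# Usage: a viewer who prefers large, uppercase captions.
signal = [(0.0, 2.5, "Hello there."), (2.5, 5.0, "Welcome back.")]
cues = list(acquire_captions(signal))
store_captions(cues, "program_captions.json")
print(render_caption(cues[0], {"uppercase": True, "font_size": 32}))
```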

Proceedings ArticleDOI
01 Nov 2019
TL;DR: The results show that a state-of-the-art scene graph parser can boost performance almost as much as ground-truth graphs, indicating that the bottleneck currently lies more in the captioning models than in the scene graph parser.
Abstract: Scene graphs represent semantic information in images, which can help image captioning systems produce more descriptive outputs than using only the image as context. Recent captioning approaches rely on ad-hoc methods to obtain graphs for images; however, those graphs introduce noise, and the effect of parser errors on captioning accuracy is unclear. In this work, we investigate to what extent scene graphs can help image captioning. Our results show that a state-of-the-art scene graph parser can boost performance almost as much as ground-truth graphs, indicating that the bottleneck currently lies more in the captioning models than in the scene graph parser. A sketch of the comparison follows this entry.

20 citations
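
A minimal sketch of the comparison the paper above runs: caption images once with parser-produced scene graphs and once with ground-truth graphs, then compare scores. All names (toy_captioner, toy_metric, the example graphs) are hypothetical stand-ins, not the authors' code.

```python
def graph_to_context(graph):
    """Linearize (subject, relation, object) triples into tokens the
    captioner can attend to alongside the image features."""
    return [tok for s, r, o in graph for tok in (s, r, o)]

def toy_captioner(image, context_tokens):
    """Stand-in captioning model: echoes the graph context it is given."""
    return "a photo of a " + " ".join(context_tokens)

def toy_metric(caption, refs):
    """Stand-in score: fraction of reference tokens covered by the caption."""
    cap = set(caption.split())
    ref = set(" ".join(refs).split())
    return len(cap & ref) / len(ref)

def evaluate(captioner, images, graphs, references, metric):
    scores = [metric(captioner(img, graph_to_context(g)), refs)
              for img, g, refs in zip(images, graphs, references)]
    return sum(scores) / len(scores)

images = [None]  # real image features would go here
gt_graphs = [[("dog", "chasing", "ball")]]
parser_graphs = [[("dog", "near", "ball")]]  # noisier parser output
refs = [["a dog chasing a ball"]]

# The paper's finding, in these terms: the two scores come out close, so the
# captioning model, not the parser, is the current bottleneck.
print(evaluate(toy_captioner, images, gt_graphs, refs, toy_metric))
print(evaluate(toy_captioner, images, parser_graphs, refs, toy_metric))
```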

Patent
30 Oct 1997
TL;DR: In this article, a multi-language closed captioning system is provided in which a television with a speaker for transmitting audio signals is equipped with a screen that depicts images upon receipt of television signals and alphanumeric characters upon receipt of a sub-carrier channel.
Abstract: A multi-language closed captioning system is provided, including a television having a speaker for transmitting audio signals. The television further has a screen for depicting various images upon receipt of television signals and alphanumeric characters upon receipt of a sub-carrier channel. A source of closed captioning is adapted to deploy a plurality of sub-carrier channels, each transmitting a string of alphanumeric characters corresponding to the audio signals transmitted by the television. Such alphanumeric characters are representative of one of a plurality of foreign languages. A selector unit is connected to the television and the source of closed captioning. An array of language buttons is situated on the selector unit and adapted to allow the transmission of one of the sub-carrier channels to the television upon the depression thereof. A sketch of the selector logic follows this entry.

20 citations
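
A hedged sketch of the selector unit described above: each sub-carrier channel carries captions in one language, and pressing a language button routes that channel's text to the screen. The channel layout and class names are illustrative, not from the patent.

```python
# Illustrative sub-carrier channels, each carrying one language's captions.
SUBCARRIER_CHANNELS = {
    "english": ["Hello.", "How are you?"],
    "spanish": ["Hola.", "¿Cómo estás?"],
    "french":  ["Bonjour.", "Comment ça va?"],
}

class CaptionSelector:
    """Stand-in for the selector unit connected between the caption source
    and the television."""

    def __init__(self, channels):
        self.channels = channels
        self.active = None

    def press_button(self, language):
        """Select the sub-carrier channel for the pressed language button."""
        if language not in self.channels:
            raise ValueError(f"no sub-carrier channel for {language!r}")
        self.active = language

    def captions_for_screen(self):
        """Return the caption stream the television should render."""
        return self.channels.get(self.active, [])

selector = CaptionSelector(SUBCARRIER_CHANNELS)
selector.press_button("spanish")
print(selector.captions_for_screen())  # ['Hola.', '¿Cómo estás?']
```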

Thomas Steiner
09 Nov 2010
TL;DR: The final result is a deep-linkable RDF description of the video, and a "scroll-along" view of the video as an example of video visualization formats.
Abstract: SemWebVid is an online Ajax application that allows for the automatic generation of Resource Description Framework (RDF) video descriptions. These descriptions rest on two pillars: first, user-generated metadata such as title, summary, and tags; and second, closed captions, which can be user-generated or auto-generated via speech recognition. The plain-text contents of both pillars are analyzed by multiple Natural Language Processing (NLP) Web services in parallel, whose results are then merged and, where possible, matched back to concepts in the sense of Linking Open Data (LOD). The final result is a deep-linkable RDF description of the video, and a "scroll-along" view of the video as an example of video visualization formats. A sketch of the RDF output follows this entry.

20 citations
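
To make the output concrete, here is a sketch of the kind of RDF description SemWebVid produces, built with the rdflib library. The vocabulary choices, the example video URI, and the pre-extracted entities below are assumptions for illustration, not the paper's exact mapping; the NLP-service calls and LOD matching are elided.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

MA = Namespace("http://www.w3.org/ns/ma-ont#")  # W3C Media Annotations (assumed vocabulary)
DBPEDIA = Namespace("http://dbpedia.org/resource/")

g = Graph()
video = URIRef("http://example.org/videos/42")  # hypothetical video URI

# Pillar 1: user-generated metadata (title, tags).
g.add((video, RDF.type, MA.MediaResource))
g.add((video, DCTERMS.title, Literal("Keynote on the Semantic Web")))

# Pillar 2: an entity extracted from the closed captions by NLP services,
# matched back to a Linked Open Data concept (here: a DBpedia URI).
g.add((video, DCTERMS.subject, DBPEDIA["Semantic_Web"]))

# A deep link: a media fragment tying the concept to the caption timestamps
# where it occurs (seconds 95-110).
g.add((video, MA.hasFragment, URIRef(str(video) + "#t=95,110")))

print(g.serialize(format="turtle"))
```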

Proceedings ArticleDOI
12 Oct 2020
TL;DR: A novel relational graph learning framework for GVD is proposed, in which a language-refined scene graph representation explores fine-grained visual concepts and serves as relational inductive knowledge that helps captioning models select the relevant information needed to generate correct words.
Abstract: Grounded video description (GVD) encourages captioning models to dynamically attend to appropriate video regions (e.g., objects) and generate a description. Such a setting can help explain the decisions of captioning models and prevent the model from hallucinating object words in its description. However, this design mainly focuses on object word generation and thus may ignore fine-grained information and miss visual concepts. Moreover, relational words (e.g., 'jump left or right') are usually the result of spatio-temporal inference, i.e., they cannot be grounded in particular spatial regions. To tackle these limitations, we design a novel relational graph learning framework for GVD, in which a language-refined scene graph representation is designed to explore fine-grained visual concepts. Furthermore, the refined graph can be regarded as relational inductive knowledge that assists captioning models in selecting the relevant information needed to generate correct words. We validate the effectiveness of our model through automatic metrics and human evaluation; the results indicate that our approach generates more fine-grained and accurate descriptions and alleviates the problem of object hallucination to some extent. A sketch of the graph-conditioned decoding step follows this entry.

20 citations
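
A hedged sketch of the core decoding idea in the paper above: treat the refined scene graph as relational inductive knowledge, letting the decoder attend over graph-node embeddings when choosing each word. The single message-passing round, single-head attention, dimensions, and all names are illustrative simplifications written in PyTorch, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

d = 64
num_nodes = 5                           # nodes of the language-refined scene graph
node_feats = torch.randn(num_nodes, d)  # object/relation node embeddings
adj = torch.eye(num_nodes)              # graph structure (self-loops only, for brevity)

# One round of relational message passing: each node aggregates its neighbors.
W = torch.randn(d, d) * 0.05
node_feats = torch.tanh(adj @ node_feats @ W)

# Decoder step: the current hidden state attends over graph nodes, and the
# attended summary conditions the next-word distribution.
hidden = torch.randn(1, d)  # decoder state at this time step
attn = F.softmax(hidden @ node_feats.T / d ** 0.5, dim=-1)
context = attn @ node_feats  # the "relevant information" selected from the graph

vocab_size = 1000
W_out = torch.randn(2 * d, vocab_size) * 0.05
logits = torch.cat([hidden, context], dim=-1) @ W_out
next_word = logits.argmax(dim=-1)
print(attn, next_word)
```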


Network Information
Related Topics (5)
Feature vector: 48.8K papers, 954.4K citations (83% related)
Object detection: 46.1K papers, 1.3M citations (82% related)
Convolutional neural network: 74.7K papers, 2M citations (82% related)
Deep learning: 79.8K papers, 2.1M citations (82% related)
Unsupervised learning: 22.7K papers, 1M citations (81% related)
Performance
Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    536
2022    1,030
2021    504
2020    530
2019    448
2018    334