Topic

Closed captioning

About: Closed captioning is a research topic. Over the lifetime, 3011 publications have been published within this topic receiving 64494 citations. The topic is also known as: CC.


Papers
Proceedings Article
15 Jun 2019
TL;DR: Zhang et al. design an end-to-end context and attribute grounded dense captioning framework consisting of a contextual visual mining module and a multi-level attribute grounded description generation module.
Abstract: Dense captioning aims at simultaneously localizing semantic regions and describing these regions-of-interest (ROIs) with short phrases or sentences in natural language. Previous studies have shown remarkable progress, but they are often vulnerable to the aperture problem, in which a caption generated from the features inside one ROI lacks contextual coherence with its surrounding context in the input image. In this work, we investigate contextual reasoning based on multi-scale message propagation from neighboring contents to the target ROIs. To this end, we design a novel end-to-end context and attribute grounded dense captioning framework consisting of 1) a contextual visual mining module and 2) a multi-level attribute grounded description generation module. Knowing that captions often co-occur with linguistic attributes (such as who, what and where), we also incorporate auxiliary supervision from hierarchical linguistic attributes to augment the distinctiveness of the learned captions. Extensive experiments and ablation studies on the Visual Genome dataset demonstrate the superiority of the proposed model in comparison to state-of-the-art methods.
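
To make the contextual reasoning idea concrete, here is a minimal sketch of how a target ROI feature could be fused with attention-pooled messages from its neighbors before captioning. The function names, the dot-product attention form, and the concatenation fusion are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch of context-grounded ROI features, loosely inspired by the
# contextual visual mining idea above. All names and the attention form are
# assumptions for illustration, not the authors' implementation.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def contextual_roi_feature(roi_feat, neighbor_feats):
    """Fuse a target ROI feature with its surrounding context.

    roi_feat:       (d,) feature of the target region
    neighbor_feats: (n, d) features of neighboring regions
    Returns a (2d,) context-grounded feature: the ROI feature
    concatenated with an attention-weighted sum of its neighbors.
    """
    scores = neighbor_feats @ roi_feat          # similarity to each neighbor
    weights = softmax(scores)                   # normalize to attention weights
    context = weights @ neighbor_feats          # (d,) pooled context message
    return np.concatenate([roi_feat, context])  # feed this to the captioner

# Example: one ROI with three neighbors, 4-d features
rng = np.random.default_rng(0)
fused = contextual_roi_feature(rng.normal(size=4), rng.normal(size=(3, 4)))
print(fused.shape)  # (8,)
```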

29 citations

Patent
07 Nov 1996
TL;DR: This patent describes a television system in which at least program title information for programs to be transmitted in the future is transmitted in advance to form a channel guide listing.
Abstract: In a television system in which at least program title information for programs to be transmitted in the future is transmitted in advance to form a channel guide listing, apparatus is provided for acquiring one of the title information and the current date, and for generating a display signal comprising data representing a text screen containing one of the title information and the current date, for recording a user-viewable screen display on a video tape ahead of the television program signal. The title or date information acts as a leader to the following television program. In a second embodiment of the invention, in those instances where descriptive text accompanies the program listing, apparatus of the invention records the descriptive text relating to the title, the star, the director, or the content of the program.
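
As a rough illustration of the leader idea, the sketch below composes a text screen from a title (or, absent one, the current date) plus optional descriptive text, as might be recorded ahead of the program. All layout choices here (32-character width, centering) are invented for the example and are not taken from the patent.

```python
# Hypothetical composition of the "leader" text screen described above.
from datetime import date

def leader_screen(title=None, descriptive_text=None, width=32):
    # Per the claim: use the title if available, else the current date
    header = title if title else date.today().isoformat()
    lines = [header.center(width)]
    if descriptive_text:
        # Wrap the optional description to the text-screen width
        words, line = descriptive_text.split(), ""
        for w in words:
            if len(line) + len(w) + 1 > width:
                lines.append(line.center(width))
                line = w
            else:
                line = (line + " " + w).strip()
        lines.append(line.center(width))
    return "\n".join(lines)

print(leader_screen("Evening News", "Anchored by J. Doe. Live at six."))
```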

29 citations

Book Chapter
11 Sep 2006
TL;DR: The authors describe an LVCSR system for automatic online subtitling (closed captioning) of TV transmissions of Czech Parliament meetings, based on Hidden Markov Models, lexical trees and a bigram language model.
Abstract: This paper describes an LVCSR system for automatic online subtitling (closed captioning) of TV transmissions of Czech Parliament meetings. The recognition system is based on Hidden Markov Models, lexical trees and a bigram language model. The acoustic model is trained on 40 hours of parliament speech and the language model on more than 10M tokens of parliament speech transcriptions. The first part of the article focuses on text normalization and class-based language model preparation. The second part describes the recognition network and its decoding with respect to real-time operation demands, using a vocabulary of up to 100k words. The third part outlines the application framework allowing generation and display of subtitles for any audio/video source. Finally, experimental results obtained on parliament speeches, with recognition accuracy varying from 80 to 95% (depending on the topic discussed), are reported and discussed.
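
The bigram language model is the easiest component above to illustrate in isolation. The toy sketch below trains add-one-smoothed bigram probabilities from a few normalized sentences; a real decoder would combine such scores with HMM acoustic likelihoods over a lexical tree, which is omitted here. Everything in the snippet is an assumption for illustration.

```python
# Toy bigram language model with add-one (Laplace) smoothing, so unseen
# bigrams keep a nonzero probability during decoding.
from collections import Counter

def train_bigram(corpus_sentences):
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus_sentences:
        words = ["<s>"] + sent.split() + ["</s>"]
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def bigram_prob(w_prev, w, unigrams, bigrams, vocab_size):
    # P(w | w_prev) with add-one smoothing
    return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + vocab_size)

uni, bi = train_bigram(["the meeting is open", "the meeting is closed"])
v = len(uni)
print(bigram_prob("meeting", "is", uni, bi, v))    # seen bigram: high
print(bigram_prob("meeting", "open", uni, bi, v))  # unseen bigram: smoothed
```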

29 citations

Posted Content
Linjie Li, Yen-Chun Chen, Yu Cheng, Zhe Gan, Licheng Yu, Jingjing Liu
TL;DR: HERO is a novel framework for large-scale video+language omni-representation learning, where the local context of a video frame is captured by a Cross-modal Transformer via multimodal fusion, and global video context is captured by a Temporal Transformer.
Abstract: We present HERO, a novel framework for large-scale video+language omni-representation learning. HERO encodes multimodal inputs in a hierarchical structure, where local context of a video frame is captured by a Cross-modal Transformer via multimodal fusion, and global video context is captured by a Temporal Transformer. In addition to standard Masked Language Modeling (MLM) and Masked Frame Modeling (MFM) objectives, we design two new pre-training tasks: (i) Video-Subtitle Matching (VSM), where the model predicts both global and local temporal alignment; and (ii) Frame Order Modeling (FOM), where the model predicts the right order of shuffled video frames. HERO is jointly trained on HowTo100M and large-scale TV datasets to gain deep understanding of complex social dynamics with multi-character interactions. Comprehensive experiments demonstrate that HERO achieves new state of the art on multiple benchmarks spanning Text-based Video/Video-moment Retrieval, Video Question Answering (QA), Video-and-language Inference and Video Captioning tasks across different domains. We also introduce two new challenging benchmarks, How2QA and How2R, for Video QA and Retrieval, collected from diverse multimodal video content.
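
Of the two new pre-training tasks, Frame Order Modeling is simple to sketch at the data level: shuffle a fraction of a clip's frames and keep the original indices as prediction targets. The function below is a hypothetical preprocessing step; the shuffle ratio and interface are assumptions for illustration, not HERO's released code.

```python
# Sketch of building a Frame Order Modeling (FOM) training example:
# shuffle a subset of frames and record their original positions.
import random

def make_fom_example(frames, shuffle_ratio=0.15, seed=None):
    """Shuffle a subset of frames; return (shuffled_frames, target_order).

    target_order[i] gives the original position of shuffled_frames[i],
    which the model is trained to predict from context.
    """
    rng = random.Random(seed)
    n = len(frames)
    k = max(2, int(n * shuffle_ratio))          # need >= 2 frames to reorder
    picked = sorted(rng.sample(range(n), k))    # positions to shuffle
    permuted = picked[:]
    rng.shuffle(permuted)
    order = list(range(n))
    shuffled = list(frames)
    for dst, src in zip(picked, permuted):
        shuffled[dst] = frames[src]
        order[dst] = src
    return shuffled, order

frames = [f"frame_{i}" for i in range(10)]
shuffled, order = make_fom_example(frames, seed=1)
print(order)  # the model must recover these original indices
```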

29 citations

Patent
07 Aug 2008
TL;DR: This patent proposes extracting caption data from the input video stream, translating it into at least one output caption format, and packaging the translated caption data into data packets for insertion into a video stream.
Abstract: Methods of preserving captioning information in an input video stream through transcoding of the input video stream include extracting caption data from the input video stream, translating the caption data into at least one output caption format, packaging the translated caption data into data packets for insertion into a video stream, synchronizing the packaged caption data with a transcoded version of the input video stream, receiving a preliminary output video stream that is a transcoded version of the input video stream, and combining the packaged caption data with the preliminary output video stream to form an output video stream. Related systems and computer program products are also disclosed.
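
The claim enumerates a pipeline of discrete steps, which can be sketched as a skeleton where each step is a stub. Every function and type below is a placeholder standing in for real caption handling (e.g. CEA-608/708 parsing and stream muxing); none of it is the patent's implementation.

```python
# Skeleton of a caption-preserving transcode flow, one stub per claimed step.
from dataclasses import dataclass

@dataclass
class CaptionPacket:
    pts: float      # presentation timestamp, seconds
    payload: bytes  # caption bytes in the output format

def extract_captions(input_stream):
    # Placeholder: pull caption data out of the input video stream
    return [CaptionPacket(pts=0.0, payload=b"HELLO")]

def translate_captions(packets, target_format="CEA-708"):
    # Placeholder: convert payloads into the output caption format
    return packets

def synchronize(packets, pts_offset):
    # Re-stamp caption timing to match the transcoded video's timeline
    return [CaptionPacket(p.pts + pts_offset, p.payload) for p in packets]

def transcode_preserving_captions(input_stream, transcode, pts_offset=0.0):
    captions = extract_captions(input_stream)
    captions = translate_captions(captions)
    video = transcode(input_stream)        # preliminary output video stream
    captions = synchronize(captions, pts_offset)
    return video, captions                 # combined downstream into one stream

video, caps = transcode_preserving_captions(b"...", transcode=lambda s: s)
print(caps[0].pts, caps[0].payload)
```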

29 citations


Network Information
Related Topics (5)
Feature vector: 48.8K papers, 954.4K citations, 83% related
Object detection: 46.1K papers, 1.3M citations, 82% related
Convolutional neural network: 74.7K papers, 2M citations, 82% related
Deep learning: 79.8K papers, 2.1M citations, 82% related
Unsupervised learning: 22.7K papers, 1M citations, 81% related
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    536
2022    1,030
2021    504
2020    530
2019    448
2018    334