Topic

Closed captioning

About: Closed captioning is a research topic. Over its lifetime, 3,011 publications have been published within this topic, receiving 64,494 citations. The topic is also known as: CC.


Papers
Proceedings ArticleDOI
01 Oct 2017
TL;DR: This article proposes a saliency-boosted image captioning model to bridge the gap between humans and machines in understanding and describing images, and investigates the agreement between bottom-up saliency-based visual attention and object referrals in scene description constructs.
Abstract: To bridge the gap between humans and machines in image understanding and describing, we need further insight into how people describe a perceived scene. In this paper, we study the agreement between bottom-up saliency-based visual attention and object referrals in scene description constructs. We investigate the properties of human-written descriptions and machine-generated ones. We then propose a saliency-boosted image captioning model in order to investigate benefits from low-level cues in language models. We learn that (1) humans mention more salient objects earlier than less salient ones in their descriptions, (2) the better a captioning model performs, the better attention agreement it has with human descriptions, (3) the proposed saliency-boosted model, compared to its baseline form, does not improve significantly on the MS COCO database, indicating explicit bottom-up boosting does not help when the task is well learnt and tuned on a dataset, (4) a better generalization is, however, observed for the saliency-boosted model on unseen data.

75 citations
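
A minimal sketch of the saliency-boosting idea described in the abstract above: a bottom-up saliency map re-weights the CNN feature map before the regions are handed to a caption decoder. The module name, tensor shapes, and the element-wise (1 + saliency) boost are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SaliencyBoostedEncoder(nn.Module):
    """Re-weights CNN feature maps with a bottom-up saliency map (hypothetical sketch)."""
    def __init__(self, feat_dim=2048, embed_dim=512):
        super().__init__()
        self.proj = nn.Linear(feat_dim, embed_dim)

    def forward(self, feats, saliency):
        # feats:    (B, C, H, W) convolutional features from a CNN backbone
        # saliency: (B, 1, h, w) bottom-up saliency map with values in [0, 1]
        sal = F.interpolate(saliency, size=feats.shape[-2:],
                            mode="bilinear", align_corners=False)
        boosted = feats * (1.0 + sal)                          # emphasize salient regions
        b, c, h, w = boosted.shape
        regions = boosted.view(b, c, h * w).transpose(1, 2)    # (B, HW, C) region features
        return self.proj(regions)                              # embeddings for a caption decoder
```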

Journal ArticleDOI
Ning Xu, An-An Liu, Jing Liu, Weizhi Nie, Yuting Su
TL;DR: This work proposes a novel framework that embeds a scene graph into a structural representation capturing both the semantic concepts and the graph topology, and develops a scene-graph-driven method to generate the attention graph.

75 citations
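
A heavily simplified sketch of how a scene graph's topology could drive attention over node embeddings, in the spirit of the framework summarized above. The masked-attention formulation and all names here are assumptions for illustration; the paper's actual architecture may differ substantially.

```python
import torch
import torch.nn as nn

class SceneGraphAttention(nn.Module):
    """One topology-aware attention step over scene-graph node embeddings (hypothetical sketch)."""
    def __init__(self, dim=512):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.msg = nn.Linear(dim, dim)

    def forward(self, nodes, adj):
        # nodes: (B, N, D) embeddings of objects, attributes, and relations
        # adj:   (B, N, N) scene-graph adjacency, 1.0 where an edge exists
        scores = torch.matmul(self.query(nodes), nodes.transpose(1, 2))  # (B, N, N)
        scores = scores.masked_fill(adj == 0, -1e9)        # attend only along graph edges
        attn = torch.softmax(scores, dim=-1)
        return nodes + torch.matmul(attn, self.msg(nodes)) # updated node states
```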

Journal ArticleDOI
TL;DR: It was found that the addition of captions to a video resulted in major changes in eye movement patterns, with the viewing process becoming primarily a reading process.
Abstract: Eye movement of six subjects was recorded as they watched video segments with and without captions. It was found that the addition of captions to a video resulted in major changes in eye movement patterns, with the viewing process becoming primarily a reading process. Further, although people viewing a specific video segment are likely to have similar eye movement patterns, there are also distinct individual differences present in these patterns. For example, someone accustomed to speechreading may spend more time looking at an actor's lips, while someone with poor English skills may spend more time reading the captions. Finally, there is some preliminary evidence to suggest that higher captioning speed results in more time spent reading captions on a video segment.

74 citations

Proceedings ArticleDOI
Yehao Li, Ting Yao, Yingwei Pan, Hongyang Chao, Tao Mei
15 Jun 2019
TL;DR: This paper presents Long Short-Term Memory with Pointing (LSTM-P) --- a new architecture that facilitates vocabulary expansion and produces novel objects via a pointing mechanism by augmenting standard deep captioning architectures with object learners.
Abstract: Image captioning has received significant attention, with remarkable improvements from recent advances. Nevertheless, images in the wild encapsulate rich knowledge and cannot be sufficiently described with models built on image-caption pairs containing only in-domain objects. In this paper, we propose to address the problem by augmenting standard deep captioning architectures with object learners. Specifically, we present Long Short-Term Memory with Pointing (LSTM-P) --- a new architecture that facilitates vocabulary expansion and produces novel objects via a pointing mechanism. Technically, object learners are initially pre-trained on available object recognition data. Pointing in LSTM-P then balances the probability between generating a word through the LSTM and copying a word from the recognized objects at each time step in the decoder stage. Furthermore, our captioning encourages global coverage of objects in the sentence. Extensive experiments are conducted on both the held-out COCO image captioning and ImageNet datasets for describing novel objects, and superior results are reported when compared with state-of-the-art approaches. More remarkably, we obtain an average F1 score of 60.9% on the held-out COCO dataset.

74 citations
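
A minimal sketch of the pointing step the abstract above describes: at each decoding time step a learned gate mixes the LSTM's vocabulary distribution with a copy distribution over recognized object words. Dimensions, the sigmoid gate, and the scatter-add mixing are illustrative assumptions rather than the exact LSTM-P formulation.

```python
import torch
import torch.nn as nn

class PointingStep(nn.Module):
    """One decoding step that mixes word generation with copying recognized objects (hypothetical sketch)."""
    def __init__(self, hidden=512, vocab=10000):
        super().__init__()
        self.gen = nn.Linear(hidden, vocab)   # scores over the full vocabulary
        self.gate = nn.Linear(hidden, 1)      # probability of copying instead of generating

    def forward(self, h, obj_ids, obj_scores):
        # h:          (B, H) decoder LSTM hidden state at this time step
        # obj_ids:    (B, K) vocabulary ids of the recognized objects (LongTensor)
        # obj_scores: (B, K) recognizer confidences for those objects
        p_vocab = torch.softmax(self.gen(h), dim=-1)        # (B, V) generation distribution
        p_copy = torch.softmax(obj_scores, dim=-1)          # (B, K) copy distribution
        g = torch.sigmoid(self.gate(h))                     # (B, 1) copy gate
        mixed = (1.0 - g) * p_vocab
        mixed = mixed.scatter_add(1, obj_ids, g * p_copy)   # add copy mass onto object words
        return mixed                                        # final word distribution
```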

Patent
28 Feb 2006
TL;DR: In this article, a method for blocking scenes with objectionable content receives incoming content, namely a scene of a program, uses closed captioning information to determine whether the scene includes objectionable content, and, if so, blocks the scene from being displayed.
Abstract: According to one embodiment, a method for blocking scenes with objectionable content comprises receiving incoming content, namely a scene of a program. Thereafter, using closed captioning information, a determination is made as to whether the scene of the program includes objectionable content and, if so, the scene is blocked from being displayed.

74 citations
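
A toy sketch of the caption-screening idea in the patent abstract above: scan a scene's closed-caption text against a block list and suppress the scene on a match. The block list, the tokenization, and the return convention are placeholder assumptions, not the patented method.

```python
import re

# Placeholder block list; a real system would use a curated vocabulary or a classifier.
BLOCKED_TERMS = {"example_profanity", "example_slur"}

def scene_is_objectionable(caption_text: str) -> bool:
    """Return True if the scene's closed-caption text contains a blocked term."""
    words = set(re.findall(r"[a-z']+", caption_text.lower()))
    return not words.isdisjoint(BLOCKED_TERMS)

def filter_scene(scene_frames, caption_text):
    """Pass the scene through unchanged, or return None to signal it should be blocked."""
    if scene_is_objectionable(caption_text):
        return None
    return scene_frames
```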


Network Information
Related Topics (5)
Feature vector: 48.8K papers, 954.4K citations, 83% related
Object detection: 46.1K papers, 1.3M citations, 82% related
Convolutional neural network: 74.7K papers, 2M citations, 82% related
Deep learning: 79.8K papers, 2.1M citations, 82% related
Unsupervised learning: 22.7K papers, 1M citations, 81% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    536
2022    1,030
2021    504
2020    530
2019    448
2018    334