Topic

Closed captioning

About: Closed captioning is a research topic. Over its lifetime, 3,011 publications have been published within this topic, receiving 64,494 citations. The topic is also known as: CC.


Papers
Patent
22 Sep 2010
TL;DR: In this patent, a synchronization process between captioning data and/or corresponding metatags and the associated media file parses the media file, correlates the caption information and/or metatags with segments of the media file, and provides a capability for textual search and selection of particular segments.
Abstract: A synchronization process between captioning data and/or corresponding metatags and the associated media file parses the media file, correlates the caption information and/or metatags with segments of the media file, and provides a capability for textual search and selection of particular segments. A time-synchronized version of the captions is created that is synchronized to the moment that the speech is uttered in the recorded media. The caption data is leveraged to enable search engines to index not merely the title of a video, but the entirety of what was said during the video as well as any associated metatags relating to contents of the video. Further, because the entire media file is indexed, a search can request a particular scene or occurrence within the event recorded by the media file, and the exact moment within the media relevant to the search can be accessed and played for the requester.
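The indexing idea is straightforward to illustrate. Below is a minimal sketch (not the patented system itself): it parses SRT-style caption cues into time-stamped segments and builds an inverted index, so a text query resolves to the exact playback time. The SRT format and all function names here are illustrative assumptions.

```python
# Sketch only: parse SRT-style captions into time-stamped segments and build
# an inverted index so a text search can jump to the exact moment of speech.
import re
from collections import defaultdict

TIME = re.compile(r"(\d+):(\d+):(\d+)[,.](\d+)")

def to_seconds(ts: str) -> float:
    h, m, s, ms = map(int, TIME.match(ts).groups())
    return h * 3600 + m * 60 + s + ms / 1000.0

def parse_srt(text: str):
    """Yield (start_seconds, caption_text) for each cue in an SRT string."""
    for block in text.strip().split("\n\n"):
        lines = block.splitlines()
        if len(lines) < 3:
            continue
        start = lines[1].split("-->")[0].strip()
        yield to_seconds(start), " ".join(lines[2:])

def build_index(cues):
    """Map each lowercase word to the list of start times where it is spoken."""
    index = defaultdict(list)
    for start, caption in cues:
        for word in re.findall(r"[a-z']+", caption.lower()):
            index[word].append(start)
    return index

srt = """1
00:00:01,000 --> 00:00:04,000
Welcome to the lecture on closed captioning.

2
00:00:04,500 --> 00:00:07,000
Captions make video content searchable."""

index = build_index(parse_srt(srt))
print(index["searchable"])  # -> [4.5]: playback can seek to 4.5 s for this query
```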

55 citations

Posted Content
TL;DR: In this paper, the authors introduce a novel framework for image captioning that can generate diverse descriptions by allowing both grounding and controllability: given a control signal in the form of a sequence or set of image regions, the corresponding caption is generated through a recurrent architecture that predicts textual chunks explicitly grounded on regions, following the constraints of the given control.
Abstract: Current captioning approaches can describe images using black-box architectures whose behavior is hardly controllable and explainable from the exterior. As an image can be described in infinite ways depending on the goal and the context at hand, a higher degree of controllability is needed to apply captioning algorithms in complex scenarios. In this paper, we introduce a novel framework for image captioning which can generate diverse descriptions by allowing both grounding and controllability. Given a control signal in the form of a sequence or set of image regions, we generate the corresponding caption through a recurrent architecture which predicts textual chunks explicitly grounded on regions, following the constraints of the given control. Experiments are conducted on Flickr30k Entities and on COCO Entities, an extended version of COCO in which we add grounding annotations collected in a semi-automatic manner. Results demonstrate that our method achieves state of the art performances on controllable image captioning, in terms of caption quality and diversity. Code and annotations are publicly available at: this https URL.
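As a rough illustration of the decoding scheme the abstract describes, here is a heavily simplified PyTorch sketch (an assumption-laden toy, not the authors' released model): an LSTM decodes one textual chunk per controlled region, so the words generated at each step are conditioned on the region currently in control. With untrained weights it emits arbitrary tokens; it only shows the control flow.

```python
import torch
import torch.nn as nn

class ControlledDecoder(nn.Module):
    def __init__(self, vocab_size, region_dim=2048, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTMCell(embed_dim + region_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, regions, max_chunk_len=5, bos_id=1, shift_id=2):
        # regions: (num_regions, region_dim) control signal, in the desired order
        h = torch.zeros(1, self.lstm.hidden_size)
        c = torch.zeros_like(h)
        word = torch.tensor([bos_id])
        caption = []
        for region in regions:                      # one textual chunk per controlled region
            for _ in range(max_chunk_len):
                inp = torch.cat([self.embed(word), region.unsqueeze(0)], dim=-1)
                h, c = self.lstm(inp, (h, c))
                word = self.out(h).argmax(dim=-1)   # greedy decoding for brevity
                if word.item() == shift_id:         # model signals "move on to the next region"
                    break
                caption.append(word.item())
        return caption

decoder = ControlledDecoder(vocab_size=1000)
chunked = decoder(torch.randn(3, 2048))  # 3 control regions -> up to 3 grounded chunks
```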

55 citations

Journal ArticleDOI
TL;DR: Experiments on the image captioning task on the MS-COCO and Flickr30K datasets validate the usefulness of this framework by showing that different given topics lead to different captions describing specific aspects of the given image, and that the quality of the generated captions is higher than that of a control model without a topic as input.
Abstract: We present an image captioning framework that generates captions under a given topic. The topic candidates are extracted from the caption corpus. A given image’s topics are then selected from these candidates by a CNN-based multi-label classifier. The input to the caption generation model is an image-topic pair, and the output is a caption of the image. For this purpose, a cross-modal embedding method is learned for the images, topics, and captions. In the proposed framework, the topic, caption, and image are organized in a hierarchical structure, which is preserved in the embedding space by using the order-embedding method. The caption embedding is upper bounded by the corresponding image embedding and lower bounded by the topic embedding. The lower bound pushes the images and captions about the same topic closer together in the embedding space. A bidirectional caption-image retrieval task is conducted on the learned embedding space and achieves the state-of-the-art performance on the MS-COCO and Flickr30K datasets, demonstrating the effectiveness of the embedding method. To generate a caption for an image, an embedding vector is sampled from the region bounded by the embeddings of the image and the topic, then a language model decodes it to a sentence as the output. The lower bound set by the topic shrinks the output space of the language model, which may help the model to learn to match images and captions better. Experiments on the image captioning task on the MS-COCO and Flickr30K datasets validate the usefulness of this framework by showing that the different given topics can lead to different captions describing specific aspects of the given image and that the quality of generated captions is higher than the control model without a topic as input. In addition, the proposed method is competitive with many state-of-the-art methods in terms of standard evaluation metrics.
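The hierarchical constraint can be written down compactly. The snippet below is a hedged sketch of one coordinate-wise reading of "lower/upper bounded," using an order-embedding violation penalty in the spirit of Vendrov et al.'s order embeddings; the paper's exact loss, margins, and weighting may differ.

```python
# Sketch: penalize violations of topic <= caption <= image (coordinate-wise)
# in a non-negative embedding space, so the caption lies "between" its topic
# and its image.
import torch

def order_violation(lower: torch.Tensor, upper: torch.Tensor) -> torch.Tensor:
    """Penalty that is zero only when lower <= upper in every dimension."""
    return torch.clamp(lower - upper, min=0).pow(2).sum(dim=-1)

def hierarchy_loss(topic_emb, caption_emb, image_emb):
    # topic is the lower bound of the caption; image is its upper bound
    return order_violation(topic_emb, caption_emb) + order_violation(caption_emb, image_emb)

topic, caption, image = (torch.rand(4, 128) for _ in range(3))  # toy non-negative embeddings
loss = hierarchy_loss(topic, caption, image).mean()
```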

55 citations

Book ChapterDOI
30 Nov 2020
TL;DR: This work introduces a modified encoding transformer and an implicit decoding transformer, motivated by the relative spatial relationship between image regions, which achieves new state-of-the-art performance on both MSCOCO offline and online testing benchmarks.
Abstract: Automatic captioning of images is a task that combines the challenges of image analysis and text generation. One important aspect of captioning is the notion of attention: how to decide what to describe and in which order. Inspired by the successes in text analysis and translation, previous works have proposed the transformer architecture for image captioning. However, the structure of the semantic units in images (usually the regions detected by an object detection model) differs from that of sentences (individual words). Limited work has been done to adapt the transformer's internal architecture to images. In this work, we introduce the image transformer, which consists of a modified encoding transformer and an implicit decoding transformer, motivated by the relative spatial relationship between image regions. Our design widens the original transformer layer's inner architecture to adapt to the structure of images. With only region features as inputs, our model achieves new state-of-the-art performance on both the MSCOCO offline and online testing benchmarks. The code is available at https://github.com/wtliao/ImageTransformer.
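The following is an assumption-level sketch of how relative region geometry can bias region-to-region attention, in the spirit of what the abstract describes. It is not the authors' exact layer (see their repository for that); the geometry featurization and the scalar bias here are illustrative simplifications.

```python
# Sketch: turn pairwise box geometry into an additive attention bias so the
# encoder can exploit relative spatial structure between detected regions.
import torch
import torch.nn.functional as F

def relative_geometry(boxes: torch.Tensor) -> torch.Tensor:
    """boxes: (N, 4) as (x1, y1, x2, y2). Returns (N, N, 4) log-scale relative geometry."""
    cx = (boxes[:, 0] + boxes[:, 2]) / 2
    cy = (boxes[:, 1] + boxes[:, 3]) / 2
    w = (boxes[:, 2] - boxes[:, 0]).clamp(min=1e-3)
    h = (boxes[:, 3] - boxes[:, 1]).clamp(min=1e-3)
    dx = torch.log((cx[:, None] - cx[None, :]).abs().clamp(min=1e-3) / w[:, None])
    dy = torch.log((cy[:, None] - cy[None, :]).abs().clamp(min=1e-3) / h[:, None])
    dw = torch.log(w[None, :] / w[:, None])
    dh = torch.log(h[None, :] / h[:, None])
    return torch.stack([dx, dy, dw, dh], dim=-1)

def spatial_attention(q, k, v, boxes, w_g):
    """Single-head attention over N region features with a geometric bias."""
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5    # (N, N) content-based scores
    geo_bias = relative_geometry(boxes) @ w_g      # (N, N) spatial bias from box geometry
    attn = F.softmax(scores + geo_bias, dim=-1)
    return attn @ v

N, d = 6, 64
feats = torch.randn(N, d)
boxes = torch.rand(N, 4).sort(dim=-1).values       # toy boxes; sorting guarantees x1<=x2, y1<=y2
w_g = torch.randn(4)                               # maps 4-d geometry to a scalar bias
out = spatial_attention(feats, feats, feats, boxes, w_g)  # (N, d) spatially aware region features
```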

55 citations

Journal ArticleDOI
TL;DR: It is shown that both hearing and DHH participants preferred and followed collaborative captions better than those generated by automatic speech recognition (ASR) or professionals due to the more consistent flow of the resulting captions.
Abstract: Real-time captioning enables deaf and hard of hearing (DHH) people to follow classroom lectures and other aural speech by converting it into visual text with less than a five-second delay. Keeping the delay short allows end-users to follow and participate in conversations. This article focuses on the fundamental problem that makes real-time captioning difficult: sequential keyboard typing is much slower than speaking. We first surveyed the audio characteristics of 240 one-hour-long captioned lectures on YouTube, such as speed and duration of speaking bursts. We then analyzed how these characteristics impact caption generation and readability, considering specifically our human-powered collaborative captioning approach. We note that most of these characteristics are also present in more general domains. For our caption comparison evaluation, we transcribed a classroom lecture in real-time using all three captioning approaches. We recruited 48 participants (24 DHH) to watch these classroom transcripts in an eye-tracking laboratory. We presented these captions in a randomized, balanced order. We show that both hearing and DHH participants preferred and followed collaborative captions better than those generated by automatic speech recognition (ASR) or professionals due to the more consistent flow of the resulting captions. These results show the potential to reliably capture speech even during sudden bursts of speed, as well as for generating "enhanced" captions, unlike other human-powered captioning approaches.
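For intuition only: the core idea behind human-powered collaborative captioning is that no single typist can keep up with speech, but several typists each covering part of the audio can, provided their time-stamped fragments are merged back into one stream. The paper's actual merging and quality-control pipeline is far more sophisticated; this naive timestamp merge, with invented example fragments, just shows the basic shape of the problem.

```python
# Naive merge of time-stamped partial captions from three hypothetical typists.
from heapq import merge

typist_a = [(0.0, "Real-time captioning enables"), (6.0, "to follow lectures")]
typist_b = [(3.0, "deaf and hard of hearing people")]
typist_c = [(9.0, "with only a short delay.")]

# Each list is sorted by start time, so heapq.merge interleaves them correctly.
combined = " ".join(text for _, text in merge(typist_a, typist_b, typist_c))
print(combined)
# Real-time captioning enables deaf and hard of hearing people to follow lectures with only a short delay.
```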

55 citations


Network Information
Related Topics (5)

Topic                          Papers    Citations    Relatedness
Feature vector                 48.8K     954.4K       83%
Object detection               46.1K     1.3M         82%
Convolutional neural network   74.7K     2M           82%
Deep learning                  79.8K     2.1M         82%
Unsupervised learning          22.7K     1M           81%
Performance
Metrics
No. of papers in the topic in previous years

Year    Papers
2023    536
2022    1,030
2021    504
2020    530
2019    448
2018    334