Topic

Closed captioning

About: Closed captioning is a research topic. Over its lifetime, 3,011 publications have been published within this topic, receiving 64,494 citations. The topic is also known as: CC.


Papers
Patent
02 Oct 2003
TL;DR: In this article, a system and method for translating textual data in a media signal includes receiving a media signal containing textual data of a first language, selectively transmitting the media signal to a language translation module, translating the textual data to a second language, and transmitting the translated textual data to a display device to be displayed.
Abstract: A system and a method for translating textual data in a media signal includes receiving a media signal containing textual data of a first language, selectively transmitting the media signal to a language translation module, translating the textual data to a second language, and transmitting the translated textual data to a display device to be displayed.

16 citations
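
The patent abstract above describes a simple routing pipeline: extract the caption text from the media signal, pass it through a translation module when the caption language differs from the viewer's, and forward the result to the display. Below is a minimal sketch of that flow in Python; the MediaSignal container and the translate/display callables are hypothetical stand-ins, since the patent does not specify an API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class MediaSignal:
    """Hypothetical container: video payload plus embedded caption text."""
    video: bytes
    caption_text: str
    caption_lang: str

def route_captions(signal: MediaSignal,
                   target_lang: str,
                   translate: Callable[[str, str, str], str],
                   display: Callable[[str], None]) -> None:
    """Selectively send captions through a translation module, then display.

    translate(text, src, dst) is a placeholder for any translation backend;
    the abstract only requires that such a module exists.
    """
    text = signal.caption_text
    if signal.caption_lang != target_lang:
        text = translate(text, signal.caption_lang, target_lang)
    display(text)

# Example wiring with trivial stand-ins.
if __name__ == "__main__":
    sig = MediaSignal(video=b"", caption_text="Hola, mundo", caption_lang="es")
    route_captions(sig, "en",
                   translate=lambda t, s, d: f"[{s}->{d}] {t}",
                   display=print)
```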

Journal ArticleDOI
17 Jul 2019
TL;DR: A novel architecture to generate the optimal descriptions for videos is proposed, which focuses on constructing a new network structure that can generate sentences superior to the basic model with LSTM, and establishing special attention mechanisms that can provide more useful visual information for caption generation.
Abstract: Automatically generating natural language descriptions for video is an extremely complicated and challenging task. To tackle the obstacles of traditional LSTM-based models for video captioning, we propose a novel architecture to generate optimal descriptions for videos, which focuses on constructing a new network structure that can generate sentences superior to the basic LSTM model, and on establishing special attention mechanisms that can provide more useful visual information for caption generation. This scheme discards the traditional LSTM and exploits a fully convolutional network with coarse-to-fine and inherited attention designed according to the characteristics of the fully convolutional structure. Our model can not only outperform the basic LSTM-based model, but also achieve performance comparable to that of state-of-the-art methods.

16 citations
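
The abstract above only outlines the architecture. As an illustration, here is a minimal PyTorch sketch of one way a fully convolutional temporal encoder with a coarse-then-fine attention pass over frame features could be wired; the layer sizes, head count, and the exact form of the "inherited" query refinement are assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn

class CoarseToFineAttention(nn.Module):
    """Sketch: attend over coarse (strided) conv features first, then reuse the
    refined query to attend over fine-grained features. Dimensions are illustrative."""
    def __init__(self, feat_dim=512, hid_dim=512):
        super().__init__()
        # Fully convolutional temporal encoders over a sequence of frame features.
        self.fine_conv = nn.Conv1d(feat_dim, hid_dim, kernel_size=3, padding=1)
        self.coarse_conv = nn.Conv1d(feat_dim, hid_dim, kernel_size=3, stride=2, padding=1)
        self.attn_coarse = nn.MultiheadAttention(hid_dim, num_heads=8, batch_first=True)
        self.attn_fine = nn.MultiheadAttention(hid_dim, num_heads=8, batch_first=True)

    def forward(self, frame_feats, query):
        # frame_feats: (batch, time, feat_dim); query: (batch, 1, hid_dim)
        x = frame_feats.transpose(1, 2)               # (batch, feat_dim, time)
        fine = self.fine_conv(x).transpose(1, 2)      # (batch, time, hid_dim)
        coarse = self.coarse_conv(x).transpose(1, 2)  # (batch, time//2, hid_dim)
        # Coarse pass refines the query; the fine pass "inherits" that refinement.
        refined, _ = self.attn_coarse(query, coarse, coarse)
        context, _ = self.attn_fine(query + refined, fine, fine)
        return context                                # fed to the word decoder

# Shape check with random data.
m = CoarseToFineAttention()
ctx = m(torch.randn(2, 20, 512), torch.randn(2, 1, 512))
print(ctx.shape)  # torch.Size([2, 1, 512])
```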

Proceedings ArticleDOI
Ziwei Yang, Youjiang Xu, Huiyun Wang, Bo Wang, Yahong Han
23 Oct 2017
TL;DR: The proposed video captioning approach utilizes a Multirate GRU to capture the temporal structure of videos and achieves strong performance in the 2nd MSR Video to Language Challenge.
Abstract: Automatically describing videos with natural language is a crucial challenge of video understanding. Compared to images, videos have a specific spatial-temporal structure and various modality information. In this paper, we propose a Multirate Multimodal Approach for video captioning. Considering that the speed of motion in videos varies constantly, we utilize a Multirate GRU to capture the temporal structure of videos. It encodes video frames with different intervals and has a strong ability to deal with motion speed variance. As videos contain different modality cues, we design a particular multimodal fusion method. By incorporating visual, motion, and topic information together, we construct a well-designed video representation. The video representation is then fed into an RNN-based language model for generating natural language descriptions. We evaluate our approach for video captioning on "Microsoft Research - Video to Text" (MSR-VTT), a large-scale video benchmark for video understanding, and our approach achieves strong performance in the 2nd MSR Video to Language Challenge.

16 citations
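
As an illustration of the multirate idea described above, the sketch below reads the same frame-feature sequence at several sampling intervals, with one GRU per rate, and fuses the resulting states into a single video representation. The rates, dimensions, and concatenation-based fusion are assumptions for illustration, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class MultirateGRU(nn.Module):
    """Sketch of multirate encoding: the frame-feature sequence is read at
    several sampling intervals and the per-rate GRU states are fused."""
    def __init__(self, feat_dim=2048, hid_dim=512, rates=(1, 2, 4)):
        super().__init__()
        self.rates = rates
        self.grus = nn.ModuleList(
            [nn.GRU(feat_dim, hid_dim, batch_first=True) for _ in rates]
        )
        self.fuse = nn.Linear(hid_dim * len(rates), hid_dim)

    def forward(self, frame_feats):
        # frame_feats: (batch, time, feat_dim)
        states = []
        for rate, gru in zip(self.rates, self.grus):
            _, h_n = gru(frame_feats[:, ::rate, :])  # subsample every `rate` frames
            states.append(h_n[-1])                   # final hidden state per rate
        video_repr = self.fuse(torch.cat(states, dim=-1))
        return video_repr  # would be passed to an RNN language model for captioning

enc = MultirateGRU()
print(enc(torch.randn(2, 40, 2048)).shape)  # torch.Size([2, 512])
```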

Journal ArticleDOI
14 Oct 2020
TL;DR: Findings are discussed from a thematic analysis of 1,064 comments left by Amazon Mechanical Turk workers who used this task design to create captions for images taken by people who are blind.
Abstract: AI image captioning challenges encourage broad participation in designing algorithms that automatically create captions for a variety of images and users. To create large datasets necessary for these challenges, researchers typically employ a shared crowdsourcing task design for image captioning. This paper discusses findings from our thematic analysis of 1,064 comments left by Amazon Mechanical Turk workers using this task design to create captions for images taken by people who are blind. Workers discussed difficulties in understanding how to complete this task, provided suggestions of how to improve the task, gave explanations or clarifications about their work, and described why they found this particular task rewarding or interesting. Our analysis provides insights both into this particular genre of task as well as broader considerations for how to employ crowdsourcing to generate large datasets for developing AI algorithms.

16 citations

Posted Content
Bingchen Liu, Kunpeng Song, Yizhe Zhu, Gerard de Melo, Ahmed Elgammal
TL;DR: Transformers are adopted to model the cross-modal connections between the image features and word embeddings, and a hinged and annealing conditional loss that dynamically balances the adversarial learning is designed.
Abstract: Focusing on text-to-image (T2I) generation, we propose Text and Image Mutual-Translation Adversarial Networks (TIME), a lightweight but effective model that jointly learns a T2I generator $G$ and an image captioning discriminator $D$ under the Generative Adversarial Network framework. While previous methods tackle the T2I problem as a uni-directional task and use pre-trained language models to enforce the image-text consistency, TIME requires neither extra modules nor pre-training. We show that the performance of $G$ can be boosted substantially by training it jointly with $D$ as a language model. Specifically, we adopt Transformers to model the cross-modal connections between the image features and word embeddings, and design a hinged and annealing conditional loss that dynamically balances the adversarial learning. In our experiments, TIME establishes the new state-of-the-art Inception Score of 4.88 on the CUB dataset, and shows competitive performance on MS-COCO on both text-to-image and image captioning tasks.

16 citations
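
The abstract does not give the exact form of the hinged, annealed conditional loss. The sketch below shows one plausible instantiation in PyTorch: a hinge loss over matched, generated, and mismatched image-text pairs, with a linearly annealed weight on the conditional (mismatch) term. The hinge form and the annealing schedule are assumptions for illustration, not the TIME paper's formulation.

```python
import torch
import torch.nn.functional as F

def hinged_conditional_d_loss(d_real, d_fake, d_mismatch, step, anneal_steps=10000):
    """Illustrative hinge-style discriminator loss with an annealed conditional term.

    d_real, d_fake, d_mismatch: discriminator scores for (real image, matching text),
    (generated image, text), and (real image, mismatched text), respectively.
    """
    alpha = min(1.0, step / anneal_steps)  # conditional weight grows over training
    uncond = F.relu(1.0 - d_real).mean() + F.relu(1.0 + d_fake).mean()
    cond = F.relu(1.0 + d_mismatch).mean()
    return uncond + alpha * cond

def hinged_g_loss(d_fake):
    """Standard hinge generator loss: push scores on generated images up."""
    return -d_fake.mean()

# Example with random scores for a batch of 8.
scores = [torch.randn(8) for _ in range(3)]
print(hinged_conditional_d_loss(*scores, step=500).item())
```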


Network Information
Related Topics (5)
Feature vector: 48.8K papers, 954.4K citations (83% related)
Object detection: 46.1K papers, 1.3M citations (82% related)
Convolutional neural network: 74.7K papers, 2M citations (82% related)
Deep learning: 79.8K papers, 2.1M citations (82% related)
Unsupervised learning: 22.7K papers, 1M citations (81% related)
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    536
2022    1,030
2021    504
2020    530
2019    448
2018    334