Topic

Closed captioning

About: Closed captioning is a research topic. Over its lifetime, 3,011 publications have been published within this topic, receiving 64,494 citations. The topic is also known as: CC.


Papers
Patent
02 Oct 2003
TL;DR: In this article, a system and method for translating textual data in a media signal includes receiving a media signal containing textual data of a first language, selectively transmitting the media signal to a language translation module, translating the textual data to a second language, and transmitting the translated textual data to a display device to be displayed.
Abstract: A system and a method for translating textual data in a media signal includes receiving a media signal containing textual data of a first language, selectively transmitting the media signal to a language translation module, translating the textual data to a second language, and transmitting the translated textual data to a display device to be displayed.

16 citations
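
The patent abstract above describes a simple routing pipeline: extract the caption text from the media signal, pass it through a translation module when the caption language differs from the viewer's, and forward the result to the display. Below is a minimal sketch of that flow in Python; the MediaSignal container and the translate/display callables are hypothetical stand-ins, since the patent does not specify an API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class MediaSignal:
    """Hypothetical container: video payload plus embedded caption text."""
    video: bytes
    caption_text: str
    caption_lang: str

def route_captions(signal: MediaSignal,
                   target_lang: str,
                   translate: Callable[[str, str, str], str],
                   display: Callable[[str], None]) -> None:
    """Selectively send captions through a translation module, then display.

    translate(text, src, dst) is a placeholder for any translation backend;
    the abstract only requires that such a module exists.
    """
    text = signal.caption_text
    if signal.caption_lang != target_lang:
        text = translate(text, signal.caption_lang, target_lang)
    display(text)

# Example wiring with trivial stand-ins.
if __name__ == "__main__":
    sig = MediaSignal(video=b"", caption_text="Hola, mundo", caption_lang="es")
    route_captions(sig, "en",
                   translate=lambda t, s, d: f"[{s}->{d}] {t}",
                   display=print)
```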

Journal ArticleDOI
17 Jul 2019
TL;DR: A novel architecture to generate the optimal descriptions for videos is proposed, which focuses on constructing a new network structure that can generate sentences superior to the basic model with LSTM, and establishing special attention mechanisms that can provide more useful visual information for caption generation.
Abstract: Automatically generating natural language descriptions for video is an extremely complicated and challenging task. To tackle the obstacles of traditional LSTM-based models for video captioning, we propose a novel architecture to generate optimal descriptions for videos, which focuses on constructing a new network structure that can generate sentences superior to the basic LSTM model, and on establishing special attention mechanisms that can provide more useful visual information for caption generation. This scheme discards the traditional LSTM and exploits a fully convolutional network with coarse-to-fine and inherited attention designed according to the characteristics of the fully convolutional structure. Our model can not only outperform the basic LSTM-based model, but also achieve performance comparable to that of state-of-the-art methods.

16 citations
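
The abstract above only outlines the architecture. As an illustration, here is a minimal PyTorch sketch of one way a fully convolutional temporal encoder with a coarse-then-fine attention pass over frame features could be wired; the layer sizes, head count, and the exact form of the "inherited" query refinement are assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn

class CoarseToFineAttention(nn.Module):
    """Sketch: attend over coarse (strided) conv features first, then reuse the
    refined query to attend over fine-grained features. Dimensions are illustrative."""
    def __init__(self, feat_dim=512, hid_dim=512):
        super().__init__()
        # Fully convolutional temporal encoders over a sequence of frame features.
        self.fine_conv = nn.Conv1d(feat_dim, hid_dim, kernel_size=3, padding=1)
        self.coarse_conv = nn.Conv1d(feat_dim, hid_dim, kernel_size=3, stride=2, padding=1)
        self.attn_coarse = nn.MultiheadAttention(hid_dim, num_heads=8, batch_first=True)
        self.attn_fine = nn.MultiheadAttention(hid_dim, num_heads=8, batch_first=True)

    def forward(self, frame_feats, query):
        # frame_feats: (batch, time, feat_dim); query: (batch, 1, hid_dim)
        x = frame_feats.transpose(1, 2)               # (batch, feat_dim, time)
        fine = self.fine_conv(x).transpose(1, 2)      # (batch, time, hid_dim)
        coarse = self.coarse_conv(x).transpose(1, 2)  # (batch, time//2, hid_dim)
        # Coarse pass refines the query; the fine pass "inherits" that refinement.
        refined, _ = self.attn_coarse(query, coarse, coarse)
        context, _ = self.attn_fine(query + refined, fine, fine)
        return context                                # fed to the word decoder

# Shape check with random data.
m = CoarseToFineAttention()
ctx = m(torch.randn(2, 20, 512), torch.randn(2, 1, 512))
print(ctx.shape)  # torch.Size([2, 1, 512])
```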

Proceedings ArticleDOI
Ziwei Yang, Youjiang Xu, Huiyun Wang, Bo Wang, Yahong Han
23 Oct 2017
TL;DR: The proposed video captioning approach utilizes a Multirate GRU to capture the temporal structure of videos and achieves strong performance in the 2nd MSR Video to Language Challenge.
Abstract: Automatically describing videos with natural language is a crucial challenge of video understanding. Compared to images, videos have a specific spatial-temporal structure and various modality information. In this paper, we propose a Multirate Multimodal Approach for video captioning. Considering that the speed of motion in videos varies constantly, we utilize a Multirate GRU to capture the temporal structure of videos. It encodes video frames with different intervals and has a strong ability to deal with motion speed variance. As videos contain different modality cues, we design a particular multimodal fusion method. By incorporating visual, motion, and topic information together, we construct a well-designed video representation. The video representation is then fed into an RNN-based language model for generating natural language descriptions. We evaluate our approach for video captioning on "Microsoft Research - Video to Text" (MSR-VTT), a large-scale video benchmark for video understanding, and our approach achieves strong performance in the 2nd MSR Video to Language Challenge.

16 citations
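
As an illustration of the multirate idea described above, the sketch below reads the same frame-feature sequence at several sampling intervals, with one GRU per rate, and fuses the resulting states into a single video representation. The rates, dimensions, and concatenation-based fusion are assumptions for illustration, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class MultirateGRU(nn.Module):
    """Sketch of multirate encoding: the frame-feature sequence is read at
    several sampling intervals and the per-rate GRU states are fused."""
    def __init__(self, feat_dim=2048, hid_dim=512, rates=(1, 2, 4)):
        super().__init__()
        self.rates = rates
        self.grus = nn.ModuleList(
            [nn.GRU(feat_dim, hid_dim, batch_first=True) for _ in rates]
        )
        self.fuse = nn.Linear(hid_dim * len(rates), hid_dim)

    def forward(self, frame_feats):
        # frame_feats: (batch, time, feat_dim)
        states = []
        for rate, gru in zip(self.rates, self.grus):
            _, h_n = gru(frame_feats[:, ::rate, :])  # subsample every `rate` frames
            states.append(h_n[-1])                   # final hidden state per rate
        video_repr = self.fuse(torch.cat(states, dim=-1))
        return video_repr  # would be passed to an RNN language model for captioning

enc = MultirateGRU()
print(enc(torch.randn(2, 40, 2048)).shape)  # torch.Size([2, 512])
```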

Journal ArticleDOI
14 Oct 2020
TL;DR: Findings are discussed from a thematic analysis of 1,064 comments left by Amazon Mechanical Turk workers who used this task design to create captions for images taken by people who are blind.
Abstract: AI image captioning challenges encourage broad participation in designing algorithms that automatically create captions for a variety of images and users. To create large datasets necessary for these challenges, researchers typically employ a shared crowdsourcing task design for image captioning. This paper discusses findings from our thematic analysis of 1,064 comments left by Amazon Mechanical Turk workers using this task design to create captions for images taken by people who are blind. Workers discussed difficulties in understanding how to complete this task, provided suggestions of how to improve the task, gave explanations or clarifications about their work, and described why they found this particular task rewarding or interesting. Our analysis provides insights both into this particular genre of task as well as broader considerations for how to employ crowdsourcing to generate large datasets for developing AI algorithms.

16 citations

Posted Content
Bingchen Liu, Kunpeng Song, Yizhe Zhu, Gerard de Melo, Ahmed Elgammal
TL;DR: Transformers are adopted to model the cross-modal connections between the image features and word embeddings, and a hinged and annealing conditional loss that dynamically balances the adversarial learning is designed.
Abstract: Focusing on text-to-image (T2I) generation, we propose Text and Image Mutual-Translation Adversarial Networks (TIME), a lightweight but effective model that jointly learns a T2I generator $G$ and an image captioning discriminator $D$ under the Generative Adversarial Network framework. While previous methods tackle the T2I problem as a uni-directional task and use pre-trained language models to enforce the image-text consistency, TIME requires neither extra modules nor pre-training. We show that the performance of $G$ can be boosted substantially by training it jointly with $D$ as a language model. Specifically, we adopt Transformers to model the cross-modal connections between the image features and word embeddings, and design a hinged and annealing conditional loss that dynamically balances the adversarial learning. In our experiments, TIME establishes the new state-of-the-art Inception Score of 4.88 on the CUB dataset, and shows competitive performance on MS-COCO on both text-to-image and image captioning tasks.

16 citations
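
The abstract does not give the exact form of the hinged, annealed conditional loss. The sketch below shows one plausible instantiation in PyTorch: a hinge loss over matched, generated, and mismatched image-text pairs, with a linearly annealed weight on the conditional (mismatch) term. The hinge form and the annealing schedule are assumptions for illustration, not the TIME paper's formulation.

```python
import torch
import torch.nn.functional as F

def hinged_conditional_d_loss(d_real, d_fake, d_mismatch, step, anneal_steps=10000):
    """Illustrative hinge-style discriminator loss with an annealed conditional term.

    d_real, d_fake, d_mismatch: discriminator scores for (real image, matching text),
    (generated image, text), and (real image, mismatched text), respectively.
    """
    alpha = min(1.0, step / anneal_steps)  # conditional weight grows over training
    uncond = F.relu(1.0 - d_real).mean() + F.relu(1.0 + d_fake).mean()
    cond = F.relu(1.0 + d_mismatch).mean()
    return uncond + alpha * cond

def hinged_g_loss(d_fake):
    """Standard hinge generator loss: push scores on generated images up."""
    return -d_fake.mean()

# Example with random scores for a batch of 8.
scores = [torch.randn(8) for _ in range(3)]
print(hinged_conditional_d_loss(*scores, step=500).item())
```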


Network Information
Related Topics (5)
Feature vector: 48.8K papers, 954.4K citations (83% related)
Object detection: 46.1K papers, 1.3M citations (82% related)
Convolutional neural network: 74.7K papers, 2M citations (82% related)
Deep learning: 79.8K papers, 2.1M citations (82% related)
Unsupervised learning: 22.7K papers, 1M citations (81% related)
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    536
2022    1,030
2021    504
2020    530
2019    448
2018    334