Topic
Closed captioning
About: Closed captioning is a research topic. Over its lifetime, 3011 publications have been published within this topic, receiving 64494 citations. The topic is also known as: CC.
Papers
16 Apr 2012
TL;DR: The approach couples off-the-shelf ASR (Automatic Speech Recognition) software with a novel caption alignment mechanism that inserts unique audio markups into the audio stream before passing it to the ASR, then transforms the plain transcript produced by the ASR into a timecoded transcript.
Abstract: The simple act of listening or of taking notes while attending a lesson may represent an insuperable burden for millions of people with some form of disabilities (e.g., hearing impaired, dyslexic and ESL students). In this paper, we propose an architecture that aims at automatically creating captions for video lessons by exploiting advances in speech recognition technologies. Our approach couples the usage of off-the-shelf ASR (Automatic Speech Recognition) software with a novel caption alignment mechanism that smartly introduces unique audio markups into the audio stream before giving it to the ASR and transforms the plain transcript produced by the ASR into a timecoded transcript.
36 citations
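The marker-based alignment idea can be sketched in a few lines: uniquely identifiable audio markups are injected at known times, and the marker tokens that surface in the ASR output are used to cut the plain transcript into timecoded segments. The function and the `MARK_n` token format below are illustrative assumptions, not the paper's actual implementation:

```python
def timecode_transcript(transcript_tokens, marker_times):
    """Split a plain ASR transcript into timecoded segments.

    transcript_tokens: ASR output containing sentinel tokens like 'MARK_0'
                       (hypothetical format for the injected audio markups)
    marker_times: the second at which each markup was injected into the audio
    """
    segments, current, start = [], [], 0.0
    for tok in transcript_tokens:
        if tok.startswith("MARK_"):
            # A marker in the transcript maps back to a known audio time.
            idx = int(tok.split("_")[1])
            end = marker_times[idx]
            if current:
                segments.append((start, end, " ".join(current)))
            current, start = [], end
        else:
            current.append(tok)
    if current:
        # Trailing text after the last marker: end time unknown.
        segments.append((start, None, " ".join(current)))
    return segments

caps = timecode_transcript(
    "hello class MARK_0 today we study MARK_1 captions".split(),
    [2.5, 5.0],
)
```

Each resulting tuple `(start, end, text)` is ready to be serialized into a caption format such as WebVTT.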
TL;DR: A novel conditional-generative-adversarial-nets-based image captioning framework is proposed as an extension of the traditional reinforcement-learning (RL)-based encoder-decoder architecture, designed to deal with the inconsistent evaluation problem among different objective language metrics.
Abstract: In this paper, we propose a novel conditional-generative-adversarial-nets-based image captioning framework as an extension of the traditional reinforcement-learning (RL)-based encoder-decoder architecture. To deal with the inconsistent evaluation problem among different objective language metrics, we design "discriminator" networks that automatically and progressively determine whether a generated caption is human-described or machine-generated. Two kinds of discriminator architectures (CNN- and RNN-based structures) are introduced, since each has its own advantages. The proposed algorithm is generic, so it can enhance any existing RL-based image captioning framework, and we show that the conventional RL training method is just a special case of our approach. Empirically, we show consistent improvements over all language evaluation metrics for different state-of-the-art image captioning models. In addition, the well-trained discriminators can also be viewed as objective image captioning evaluators.
36 citations
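The discriminator's role as a training signal can be illustrated with a toy reward function: the probability that a caption looks human-written is mixed with an objective language metric to form the RL reward. The discriminator below is a trivial stand-in (the paper uses learned CNN/RNN classifiers), and the mixing scheme and weight are assumptions for illustration only:

```python
def discriminator_prob(caption):
    # Toy stand-in: a real discriminator is a learned CNN- or RNN-based
    # classifier. Here, longer captions simply score as "more human".
    return min(len(caption.split()) / 10.0, 1.0)

def rl_reward(caption, metric_score, lam=0.5):
    # Blend an objective language metric (e.g. a CIDEr score) with the
    # discriminator probability; `lam` is an illustrative trade-off weight.
    return lam * metric_score + (1 - lam) * discriminator_prob(caption)

r = rl_reward("a man riding a horse on a beach", 0.8)
```

In the actual framework this reward would drive policy-gradient updates of the caption generator, while the discriminator is trained in alternation to tell human captions from generated ones.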
TL;DR: In this paper, a new training method called Image-Text-Image (I2T2I) was proposed, which integrates text-to-image and image-to-text (image captioning) synthesis to improve the performance of text-to-image synthesis.
Abstract: Translating information between text and image is a fundamental problem in artificial intelligence that connects natural language processing and computer vision. In the past few years, performance in image caption generation has seen significant improvement through the adoption of recurrent neural networks (RNN). Meanwhile, text-to-image generation has begun to produce plausible images using datasets of specific categories like birds and flowers. We've even seen image generation from multi-category datasets such as the Microsoft Common Objects in Context (MSCOCO) through the use of generative adversarial networks (GANs). Synthesizing objects with a complex shape, however, is still challenging. For example, animals and humans have many degrees of freedom, which means that they can take on many complex shapes. We propose a new training method called Image-Text-Image (I2T2I) which integrates text-to-image and image-to-text (image captioning) synthesis to improve the performance of text-to-image synthesis. We demonstrate that I2T2I can generate better multi-category images using MSCOCO than the state-of-the-art. We also demonstrate that I2T2I can achieve transfer learning by using a pre-trained image captioning module to generate human images on the MPII Human Pose dataset.
36 citations
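The I2T2I cycle (text → image → caption) can be illustrated with trivial stand-ins for the generator and the pre-trained captioner: a cycle score of this kind hints at how captioning feedback can supervise text-to-image synthesis. Every component and name here is a toy assumption, not the paper's method:

```python
def generate_image(text):
    # Stand-in "generator": an image representation that happens to
    # preserve the word multiset of the input sentence.
    return sorted(text.lower().split())

def caption_image(image):
    # Stand-in for the pre-trained image captioning module.
    return " ".join(image)

def cycle_overlap(text):
    # Fraction of the original words recovered after text -> image -> text.
    # In real I2T2I training, a loss on this round trip pushes the generator
    # to produce images whose captions match the input sentence.
    back = set(caption_image(generate_image(text)).split())
    orig = set(text.lower().split())
    return len(back & orig) / len(orig)

score = cycle_overlap("a bird on a branch")
```

With these perfect stand-ins the round trip always recovers everything; the interesting case is a learned generator, where a low cycle score signals that the image lost sentence content.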
TL;DR: Neither near-verbatim captioning nor edited captioning was found to be better at facilitating comprehension; however, several issues emerged that provide specific directions for future research on edited captions.
Abstract: The study assessed the effects of near-verbatim captioning versus edited captioning on a comprehension task performed by 15 children, ages 7-11 years, who were deaf or hard of hearing. The children's animated television series Arthur was chosen as the content for the study. The researchers began the data collection procedure by asking participants to watch videotapes of the program. Researchers signed or spoke (or signed and spoke) 12 comprehension questions from a script to each participant. The sessions were videotaped, and a checklist was used to ensure consistency of the question-asking procedure across participants and sessions. Responses were coded as correct or incorrect, and the dependent variable was reported as the number of correct answers. Neither near-verbatim captioning nor edited captioning was found to be better at facilitating comprehension; however, several issues emerged that provide specific directions for future research on edited captions.
36 citations
TL;DR: Wang et al. propose a reconstruction network (RecNet) with a novel encoder-decoder-reconstructor architecture, which leverages both the forward (video to sentence) and backward (sentence to video) flows for video captioning.
Abstract: In this paper, the problem of describing the visual content of a video sequence with natural language is addressed. Unlike previous video captioning work, which mainly exploits cues from the video content to produce a language description, we propose a reconstruction network (RecNet) with a novel encoder-decoder-reconstructor architecture, which leverages both the forward (video to sentence) and backward (sentence to video) flows for video captioning. Specifically, the encoder-decoder makes use of the forward flow to produce the sentence description based on the encoded video semantic features. Two types of reconstructors are customized to employ the backward flow and reproduce the video features based on the hidden state sequence generated by the decoder. The generation loss yielded by the encoder-decoder and the reconstruction loss introduced by the reconstructor are jointly used to train the proposed RecNet in an end-to-end fashion. Experimental results on benchmark datasets demonstrate that the proposed reconstructor can boost the encoder-decoder models and lead to significant gains in video captioning accuracy.
36 citations
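The joint objective described above, generation loss plus reconstruction loss trained together end-to-end, can be sketched as a single weighted sum; the function name and the trade-off weight are illustrative assumptions, not the paper's exact formulation:

```python
def recnet_loss(generation_loss, reconstruction_loss, lam=0.2):
    """Joint RecNet-style objective: L = L_gen + lambda * L_rec.

    generation_loss:     forward-flow loss from the encoder-decoder
                         (video -> sentence)
    reconstruction_loss: backward-flow loss from the reconstructor
                         (decoder hidden states -> video features)
    lam:                 illustrative trade-off weight
    """
    return generation_loss + lam * reconstruction_loss

loss = recnet_loss(1.5, 0.5)  # 1.5 + 0.2 * 0.5 = 1.6
```

Minimizing the combined scalar lets gradients from the reconstruction term flow back through the decoder, which is how the backward flow regularizes the forward captioning path.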