Topic

Closed captioning

About: Closed captioning is a research topic. Over its lifetime, 3,011 publications have been published within this topic, receiving 64,494 citations. The topic is also known as: CC.


Papers
Posted Content
TL;DR: This article presents an agent composed of three interacting modules: one that performs captioning, another that generates questions, and a decision maker that learns when to ask questions by implicitly reasoning about the agent's uncertainty and the teacher's expertise.
Abstract: In order to bring artificial agents into our lives, we will need to go beyond supervised learning on closed datasets to having the ability to continuously expand knowledge. Inspired by a student learning in a classroom, we present an agent that can continuously learn by posing natural language questions to humans. Our agent is composed of three interacting modules, one that performs captioning, another that generates questions and a decision maker that learns when to ask questions by implicitly reasoning about the uncertainty of the agent and expertise of the teacher. As compared to current active learning methods which query images for full captions, our agent is able to ask pointed questions to improve the generated captions. The agent trains on the improved captions, expanding its knowledge. We show that our approach achieves better performance using less human supervision than the baselines on the challenging MSCOCO dataset.
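The interaction described above can be pictured as a simple loop: caption the image, decide whether to ask, query the human teacher, then retrain on the improved caption. The following Python sketch is purely illustrative; the module internals, the confidence-threshold decision rule, and all names are assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the three-module loop described in the abstract.
# Captioner, question generator, and decision policy are placeholders.
from dataclasses import dataclass
import random

@dataclass
class Caption:
    text: str
    confidence: float  # crude proxy for the agent's uncertainty

class Captioner:
    def describe(self, image) -> Caption:
        # In the paper this is a neural captioning model; here a stub.
        return Caption(text="a dog on a beach", confidence=random.random())

    def finetune(self, image, caption_text: str) -> None:
        pass  # retrain on the improved caption to expand knowledge

class QuestionGenerator:
    def ask(self, image, caption: Caption) -> str:
        # Produce a pointed question about the uncertain part of the caption.
        return f"Is this really '{caption.text}'? What is the main object?"

class DecisionMaker:
    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold  # learned in the paper, fixed here

    def should_ask(self, caption: Caption) -> bool:
        # Ask the teacher only when the captioner is uncertain.
        return caption.confidence < self.threshold

def learning_step(image, teacher, captioner, asker, decider):
    caption = captioner.describe(image)
    if decider.should_ask(caption):
        answer = teacher(asker.ask(image, caption))   # human in the loop
        caption = Caption(text=answer, confidence=1.0)
    captioner.finetune(image, caption.text)
    return caption
```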

18 citations

Proceedings ArticleDOI
01 Nov 2019
TL;DR: The official results evaluated at the WAT2019 translation task show that the multi-modal NMT system achieved a Bilingual Evaluation Understudy (BLEU) score of 20.37, a Rank-based Intuitive Bilingual Evaluation Score (RIBES) of 0.642838 and an Adequacy-Fluency Metrics (AMFM) score of 0.668260 on the challenge test data, and a BLEU score of 40.55, RIBES of 0.760080 and AMFM score of 0.770860 on the evaluation test data.
Abstract: With the widespread use of Machine Translation (MT) techniques, we attempt to minimize the communication gap among people from diverse linguistic backgrounds. We participated in the Workshop on Asian Translation 2019 (WAT2019) multi-modal translation task. There are three submission tracks, namely multi-modal translation, Hindi-only image captioning and text-only translation for English to Hindi. The main challenge is to provide precise MT output. The multi-modal concept incorporates textual and visual features in the translation task. In this work, the multi-modal translation track relies on a pre-trained convolutional neural network (CNN), the 19-layer Visual Geometry Group network (VGG19), to extract image features, and an attention-based Neural Machine Translation (NMT) system for translation. A merge-model of a recurrent neural network (RNN) and a CNN is used for the Hindi-only image captioning. The text-only translation track is based on the transformer model of the NMT system. The official results evaluated at the WAT2019 translation task show that our multi-modal NMT system achieved a Bilingual Evaluation Understudy (BLEU) score of 20.37, Rank-based Intuitive Bilingual Evaluation Score (RIBES) of 0.642838 and Adequacy-Fluency Metrics (AMFM) score of 0.668260 for the challenge test data, and a BLEU score of 40.55, RIBES of 0.760080 and AMFM score of 0.770860 for the evaluation test data in English to Hindi multi-modal translation.
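As a rough illustration of the image-feature side of such a multi-modal system, the sketch below extracts VGG19 fully-connected features with Keras; in a setup like the one described, these features would be fed to an attention-based NMT decoder. The chosen layer ("fc2"), the input size and the file name are assumptions for illustration, not the authors' exact configuration.

```python
# Hedged sketch: VGG19 image-feature extraction for a multi-modal NMT pipeline.
import numpy as np
from tensorflow.keras.applications.vgg19 import VGG19, preprocess_input
from tensorflow.keras.preprocessing import image
from tensorflow.keras.models import Model

base = VGG19(weights="imagenet")                          # 19-layer CNN
extractor = Model(inputs=base.input,
                  outputs=base.get_layer("fc2").output)   # 4096-d feature vector

def image_features(path: str) -> np.ndarray:
    img = image.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return extractor.predict(x)[0]   # would be fed to the attention-based NMT decoder

# feats = image_features("example.jpg")  # shape (4096,)
```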

18 citations

Journal ArticleDOI
TL;DR: In this article, the benefits of captioning recorded lecture content in the Australian higher education sector are discussed, with a focus on the benefits for a wide range of students, both disabled and non-disabled, and the perceived barriers to captioning.
Abstract: This article provides a case for the benefits of captioning recorded lecture content in the Australian higher education sector. While online lecture captioning has traditionally been provided on a case-by-case basis to help students who are deaf or hard of hearing, this paper argues for a mainstream approach in order to benefit a range of student groups both with and without disability. It begins with some background on the regulation and technology context for captioning in higher education and online learning in Australia. This is followed by a review of the current literature on the benefits of captioning to a wide range of students, both disabled and non-disabled, the perceived barriers to captioning, and how the increasing internationalisation of the university context affects captioning options, both culturally and commercially. The paper concludes by suggesting that it may be inevitable that all recorded lecture content will need to be captioned in the future, and highlights the potential benefits for Australian universities of moving quickly to embrace this existing technology.

18 citations

Proceedings ArticleDOI
01 Nov 2019
TL;DR: A portable and user-friendly smartphone-based platform capable of generating captions and text descriptions, including the option of a narrator, from images obtained with a smartphone camera is reported.
Abstract: Visually and hearing impaired people face difficulties due to inaccessible infrastructure and social challenges in daily life. To increase the quality of life of these people, we report a portable and user-friendly smartphone-based platform capable of generating captions and text descriptions, including the option of a narrator, from images obtained with a smartphone camera. Image captioning is the task of generating a sentence that describes the visual content of an image in natural language, and it has attracted an increasing amount of attention in the fields of computer vision and natural language processing due to its potential applications. Generating image captions with proper linguistic properties is a challenging task, as it requires combining advanced image understanding algorithms with natural language processing methods. In this study, we propose to use a Long Short-Term Memory (LSTM) model to generate a caption after image features are extracted with the VGG16 deep learning architecture. The visual attributes of images, which convey richer content, are extracted with VGG16 and then fed into the LSTM model for caption generation. This system is integrated with our custom-designed Android application, named "Eye of Horus", which transfers images from the smartphone to a remote server via a cloud system and displays the captions after the images are processed with the proposed captioning approach. The results show that the integrated platform has great potential to be used for image captioning by visually and hearing impaired people, with advantages such as portability, simple operation and rapid response.
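The VGG16 + LSTM captioner outlined above can be sketched roughly as a merge-style model, in which precomputed image features and an LSTM encoding of the partial caption are combined to predict the next word. This is a minimal illustration, not the authors' code; the vocabulary size, caption length and layer widths are assumed values.

```python
# Minimal sketch of a merge-style VGG16 + LSTM caption generator (assumed setup).
from tensorflow.keras.layers import Input, Dense, Embedding, LSTM, Dropout, add
from tensorflow.keras.models import Model

vocab_size, max_len, feat_dim = 8000, 34, 4096   # illustrative values

img_in = Input(shape=(feat_dim,))                # precomputed VGG16 fc-layer features
img_vec = Dense(256, activation="relu")(Dropout(0.5)(img_in))

txt_in = Input(shape=(max_len,))                 # partial caption as word indices
txt_vec = LSTM(256)(Dropout(0.5)(Embedding(vocab_size, 256, mask_zero=True)(txt_in)))

merged = Dense(256, activation="relu")(add([img_vec, txt_vec]))
next_word = Dense(vocab_size, activation="softmax")(merged)   # next-word distribution

model = Model(inputs=[img_in, txt_in], outputs=next_word)
model.compile(loss="categorical_crossentropy", optimizer="adam")
```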

18 citations

Posted Content
TL;DR: This work introduces an inference strategy that regards position information as a latent variable to guide further sentence generation, achieving better performance than general NA captioning models and performance comparable to autoregressive image captioning models with a significant speedup.
Abstract: Recent neural network models for image captioning usually employ an encoder-decoder architecture in which the decoder generates the sequence recursively. However, such autoregressive decoding may result in sequential error accumulation and slow generation, which limits practical applications. Non-autoregressive (NA) decoding has been proposed to address these issues but suffers from language quality problems due to the indirect modeling of the target distribution. To that end, we propose an improved NA prediction framework to accelerate image captioning. Our decoding part consists of a position alignment that orders the words describing the content detected in the given image, and a fine non-autoregressive decoder that generates elegant descriptions. Furthermore, we introduce an inference strategy that regards position information as a latent variable to guide further sentence generation. Experimental results on public datasets show that our proposed model achieves better performance than general NA captioning models, while achieving performance comparable to autoregressive image captioning models with a significant speedup.
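To make the autoregressive vs. non-autoregressive distinction concrete, the sketch below contrasts token-by-token decoding with a single parallel pass over placeholder positions. The decoder interface and the mask-token mechanism are assumptions for illustration only; the paper's position-alignment module and latent-position inference are not modeled here.

```python
# Hedged sketch contrasting autoregressive and non-autoregressive (NA) decoding.
# `decoder(tokens, memory)` is a placeholder callable returning (batch, len, vocab) logits.
import torch

def autoregressive_decode(decoder, memory, bos_id, max_len):
    # One forward pass per token: slow, but each word conditions on the previous ones.
    tokens = [bos_id]
    for _ in range(max_len):
        logits = decoder(torch.tensor([tokens]), memory)
        tokens.append(int(logits[0, -1].argmax()))
    return tokens[1:]

def non_autoregressive_decode(decoder, memory, mask_id, max_len):
    # Single forward pass: all positions start as placeholders and are predicted in parallel.
    placeholder = torch.full((1, max_len), mask_id, dtype=torch.long)
    logits = decoder(placeholder, memory)
    return logits.argmax(dim=-1)[0].tolist()
```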

18 citations


Network Information
Related Topics (5)
Feature vector: 48.8K papers, 954.4K citations, 83% related
Object detection: 46.1K papers, 1.3M citations, 82% related
Convolutional neural network: 74.7K papers, 2M citations, 82% related
Deep learning: 79.8K papers, 2.1M citations, 82% related
Unsupervised learning: 22.7K papers, 1M citations, 81% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    536
2022    1,030
2021    504
2020    530
2019    448
2018    334