Topic

Closed captioning

About: Closed captioning is a research topic. Over its lifetime, 3,011 publications have been published within this topic, receiving 64,494 citations. The topic is also known as: CC.


Papers
Posted Content
TL;DR: This article presents an agent composed of three interacting modules: one that performs captioning, another that generates questions, and a decision maker that learns when to ask questions by implicitly reasoning about the agent's uncertainty and the teacher's expertise.
Abstract: In order to bring artificial agents into our lives, we will need to go beyond supervised learning on closed datasets to having the ability to continuously expand knowledge. Inspired by a student learning in a classroom, we present an agent that can continuously learn by posing natural language questions to humans. Our agent is composed of three interacting modules, one that performs captioning, another that generates questions and a decision maker that learns when to ask questions by implicitly reasoning about the uncertainty of the agent and expertise of the teacher. As compared to current active learning methods which query images for full captions, our agent is able to ask pointed questions to improve the generated captions. The agent trains on the improved captions, expanding its knowledge. We show that our approach achieves better performance using less human supervision than the baselines on the challenging MSCOCO dataset.
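The interaction described above can be pictured as a simple loop: caption the image, decide whether to ask, query the human teacher, then retrain on the improved caption. The following Python sketch is purely illustrative; the module internals, the confidence-threshold decision rule, and all names are assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the three-module loop described in the abstract.
# Captioner, question generator, and decision policy are placeholders.
from dataclasses import dataclass
import random

@dataclass
class Caption:
    text: str
    confidence: float  # crude proxy for the agent's uncertainty

class Captioner:
    def describe(self, image) -> Caption:
        # In the paper this is a neural captioning model; here a stub.
        return Caption(text="a dog on a beach", confidence=random.random())

    def finetune(self, image, caption_text: str) -> None:
        pass  # retrain on the improved caption to expand knowledge

class QuestionGenerator:
    def ask(self, image, caption: Caption) -> str:
        # Produce a pointed question about the uncertain part of the caption.
        return f"Is this really '{caption.text}'? What is the main object?"

class DecisionMaker:
    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold  # learned in the paper, fixed here

    def should_ask(self, caption: Caption) -> bool:
        # Ask the teacher only when the captioner is uncertain.
        return caption.confidence < self.threshold

def learning_step(image, teacher, captioner, asker, decider):
    caption = captioner.describe(image)
    if decider.should_ask(caption):
        answer = teacher(asker.ask(image, caption))   # human in the loop
        caption = Caption(text=answer, confidence=1.0)
    captioner.finetune(image, caption.text)
    return caption
```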

18 citations

Proceedings ArticleDOI
01 Nov 2019
TL;DR: The official results evaluated at the WAT2019 translation task show that the multi-modal NMT system achieved a Bilingual Evaluation Understudy (BLEU) score of 20.37, a Rank-based Intuitive Bilingual Evaluation Score (RIBES) of 0.642838 and an Adequacy-Fluency Metrics (AMFM) score of 0.668260 on the challenge test data, and a BLEU score of 40.55, RIBES of 0.760080 and AMFM score of 0.770860 on the evaluation test data.
Abstract: With the widespread use of Machine Translation (MT) techniques, we attempt to minimize the communication gap among people from diverse linguistic backgrounds. We participated in the Workshop on Asian Translation 2019 (WAT2019) multi-modal translation task. There are three submission tracks, namely multi-modal translation, Hindi-only image captioning and text-only translation for English to Hindi. The main challenge is to provide precise MT output. The multi-modal concept incorporates textual and visual features in the translation task. In this work, the multi-modal translation track relies on a pre-trained convolutional neural network (CNN), the 19-layer Visual Geometry Group network (VGG19), to extract image features, and an attention-based Neural Machine Translation (NMT) system for translation. A merge-model of a recurrent neural network (RNN) and a CNN is used for the Hindi-only image captioning. The text-only translation track is based on the transformer model of the NMT system. The official results evaluated at the WAT2019 translation task show that our multi-modal NMT system achieved a Bilingual Evaluation Understudy (BLEU) score of 20.37, Rank-based Intuitive Bilingual Evaluation Score (RIBES) of 0.642838 and Adequacy-Fluency Metrics (AMFM) score of 0.668260 for the challenge test data, and a BLEU score of 40.55, RIBES of 0.760080 and AMFM score of 0.770860 for the evaluation test data in English to Hindi multi-modal translation.
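As a rough illustration of the image-feature side of such a multi-modal system, the sketch below extracts VGG19 fully-connected features with Keras; in a setup like the one described, these features would be fed to an attention-based NMT decoder. The chosen layer ("fc2"), the input size and the file name are assumptions for illustration, not the authors' exact configuration.

```python
# Hedged sketch: VGG19 image-feature extraction for a multi-modal NMT pipeline.
import numpy as np
from tensorflow.keras.applications.vgg19 import VGG19, preprocess_input
from tensorflow.keras.preprocessing import image
from tensorflow.keras.models import Model

base = VGG19(weights="imagenet")                          # 19-layer CNN
extractor = Model(inputs=base.input,
                  outputs=base.get_layer("fc2").output)   # 4096-d feature vector

def image_features(path: str) -> np.ndarray:
    img = image.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return extractor.predict(x)[0]   # would be fed to the attention-based NMT decoder

# feats = image_features("example.jpg")  # shape (4096,)
```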

18 citations

Journal ArticleDOI
TL;DR: In this article, the benefits of captioning recorded lecture content in the Australian higher education sector are discussed, with a focus on the benefits for a wide range of students, both disabled and non-disabled, and the perceived barriers to captioning.
Abstract: This article provides a case for the benefits of captioning recorded lecture content in the Australian higher education sector. While online lecture captioning has traditionally been provided on a case-by-case basis to help students who are deaf or hard of hearing, this paper argues for a mainstream approach in order to benefit a range of student groups both with and without disability. It begins with some background on the regulation and technology context for captioning in higher education and online learning in Australia. This is followed by a review of the current literature on the benefits of captioning to a wide range of students, both disabled and non-disabled, the perceived barriers to captioning, and how the increasing internationalisation of the university context affects captioning options, both culturally and commercially. The paper concludes by suggesting that it may be inevitable that all recorded lecture content will need to be captioned in the future, and highlights the potential benefits for Australian universities of moving quickly to embrace this existing technology.

18 citations

Proceedings ArticleDOI
01 Nov 2019
TL;DR: A portable and user-friendly smartphone-based platform capable of generating captions and text descriptions, including the option of a narrator, from images obtained with a smartphone camera is reported.
Abstract: Visually and hearing impaired people face difficulties due to inaccessible infrastructure and social challenges in daily life. To increase the quality of life of these people, we report a portable and user-friendly smartphone-based platform capable of generating captions and text descriptions, including the option of a narrator, from images obtained with a smartphone camera. Image captioning is the task of generating a sentence that describes the visual content of an image in natural language, and it has attracted an increasing amount of attention in the fields of computer vision and natural language processing due to its potential applications. Generating image captions with proper linguistic properties is a challenging task, as it requires combining advanced image understanding algorithms with natural language processing methods. In this study, we propose to use a Long Short-Term Memory (LSTM) model to generate a caption after image features are extracted with the VGG16 deep learning architecture. The visual attributes of images, which convey richer content, are extracted with VGG16 and then fed into the LSTM model for caption generation. This system is integrated with our custom-designed Android application, named "Eye of Horus", which transfers images from the smartphone to a remote server via a cloud system and displays the captions after the images are processed with the proposed captioning approach. The results show that the integrated platform has great potential to be used for image captioning by visually and hearing impaired people, with advantages such as portability, simple operation and rapid response.
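The VGG16 + LSTM captioner outlined above can be sketched roughly as a merge-style model, in which precomputed image features and an LSTM encoding of the partial caption are combined to predict the next word. This is a minimal illustration, not the authors' code; the vocabulary size, caption length and layer widths are assumed values.

```python
# Minimal sketch of a merge-style VGG16 + LSTM caption generator (assumed setup).
from tensorflow.keras.layers import Input, Dense, Embedding, LSTM, Dropout, add
from tensorflow.keras.models import Model

vocab_size, max_len, feat_dim = 8000, 34, 4096   # illustrative values

img_in = Input(shape=(feat_dim,))                # precomputed VGG16 fc-layer features
img_vec = Dense(256, activation="relu")(Dropout(0.5)(img_in))

txt_in = Input(shape=(max_len,))                 # partial caption as word indices
txt_vec = LSTM(256)(Dropout(0.5)(Embedding(vocab_size, 256, mask_zero=True)(txt_in)))

merged = Dense(256, activation="relu")(add([img_vec, txt_vec]))
next_word = Dense(vocab_size, activation="softmax")(merged)   # next-word distribution

model = Model(inputs=[img_in, txt_in], outputs=next_word)
model.compile(loss="categorical_crossentropy", optimizer="adam")
```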

18 citations

Posted Content
TL;DR: This work introduces an inference strategy that regards position information as a latent variable to guide further sentence generation, achieving better performance than general NA captioning models and performance comparable to autoregressive image captioning models with a significant speedup.
Abstract: Recent neural network models for image captioning usually employ an encoder-decoder architecture in which the decoder generates the sequence recursively. However, such autoregressive decoding may result in sequential error accumulation and slow generation, which limits practical applications. Non-autoregressive (NA) decoding has been proposed to address these issues but suffers from language quality problems due to the indirect modeling of the target distribution. To that end, we propose an improved NA prediction framework to accelerate image captioning. Our decoding part consists of a position alignment that orders the words describing the content detected in the given image, and a fine non-autoregressive decoder that generates elegant descriptions. Furthermore, we introduce an inference strategy that regards position information as a latent variable to guide further sentence generation. Experimental results on public datasets show that our proposed model achieves better performance than general NA captioning models, while achieving performance comparable to autoregressive image captioning models with a significant speedup.
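To make the autoregressive vs. non-autoregressive distinction concrete, the sketch below contrasts token-by-token decoding with a single parallel pass over placeholder positions. The decoder interface and the mask-token mechanism are assumptions for illustration only; the paper's position-alignment module and latent-position inference are not modeled here.

```python
# Hedged sketch contrasting autoregressive and non-autoregressive (NA) decoding.
# `decoder(tokens, memory)` is a placeholder callable returning (batch, len, vocab) logits.
import torch

def autoregressive_decode(decoder, memory, bos_id, max_len):
    # One forward pass per token: slow, but each word conditions on the previous ones.
    tokens = [bos_id]
    for _ in range(max_len):
        logits = decoder(torch.tensor([tokens]), memory)
        tokens.append(int(logits[0, -1].argmax()))
    return tokens[1:]

def non_autoregressive_decode(decoder, memory, mask_id, max_len):
    # Single forward pass: all positions start as placeholders and are predicted in parallel.
    placeholder = torch.full((1, max_len), mask_id, dtype=torch.long)
    logits = decoder(placeholder, memory)
    return logits.argmax(dim=-1)[0].tolist()
```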

18 citations


Network Information
Related Topics (5)
Feature vector: 48.8K papers, 954.4K citations, 83% related
Object detection: 46.1K papers, 1.3M citations, 82% related
Convolutional neural network: 74.7K papers, 2M citations, 82% related
Deep learning: 79.8K papers, 2.1M citations, 82% related
Unsupervised learning: 22.7K papers, 1M citations, 81% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    536
2022    1,030
2021    504
2020    530
2019    448
2018    334