Topic

Closed captioning

About: Closed captioning is a research topic. Over its lifetime, 3,011 publications have been published within this topic, receiving 64,494 citations. The topic is also known as: CC.


Papers
Proceedings ArticleDOI
07 Apr 2014
TL;DR: A caption editing system that harvests crowdsourced work for the useful task of video captioning; its interface incorporates game-like elements to make the task an engaging activity.
Abstract: Video captioning can increase the accessibility of information for people who are deaf or hard of hearing and benefit second language learners and reading-deficient students. We propose a caption editing system that harvests crowdsourced work for the useful task of video captioning. To make the task an engaging activity, its interface incorporates game-like elements. Non-expert users submit their transcriptions for short video segments against a countdown timer, in either a "type" or a "fix" mode, to score points. Transcriptions from multiple users are aligned and merged to form the final captions. Preliminary results with 42 participants and 578 short video segments show that the Word Error Rate of the merged captions, with two users per segment, improved from 20.7% for the ASR output to 16%. Finally, we discuss our work in progress to improve both the accuracy of the collected data and the level of crowd engagement.
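The Word Error Rate (WER) reported above is the standard edit-distance metric between a reference transcript and a hypothesis transcript. As a rough illustration only (a minimal sketch, not the authors' implementation; the function name and the dynamic-programming formulation are assumptions), word-level WER can be computed as follows:

```python
# Minimal sketch of word-level Word Error Rate (WER), the metric cited in the
# abstract (20.7% for ASR vs. 16% for merged captions). Illustrative only,
# not the authors' code.

def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Edit-distance table between the reference and hypothesis word sequences.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# Example: one substitution and one deletion -> 2 edits over 6 reference words, WER ~ 0.33
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```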

18 citations

Posted Content
03 Jun 2015
TL;DR: This work shows that an intermediate image-to-attributes layer can dramatically improve captioning results over the current approach which directly connects an RNN to a CNN.
Abstract: Many recent studies in image captioning rely on an architecture which learns the mapping from images to sentences in an end-to-end fashion. However, generating an accurate and complete description requires identifying all entities, their mutual interactions and the context of the image. In this work, we show that an intermediate image-to-attributes layer can dramatically improve captioning results over the current approach which directly connects an RNN to a CNN. We propose a two-stage procedure for training such an attribute-based approach: in the first stage, we mine a number of keywords from the training sentences which we use as semantic attributes for images, and learn the mapping from images to those attributes with a CNN; in the second stage, we learn the mapping from detected attribute occurrence likelihoods to sentence description using LSTM. We then demonstrate the effectiveness of our two-stage model with captioning experiments on three benchmark datasets, which are Flickr8k, Flickr30K and MS COCO.
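To make the two-stage design concrete, the sketch below shows an attribute predictor (stage one) feeding attribute occurrence likelihoods into an LSTM caption generator (stage two). It is a hedged illustration in PyTorch; the ResNet-50 backbone, the 256-attribute vocabulary, the layer sizes, and the class names are assumptions for illustration, not the paper's setup.

```python
# Hedged sketch of a two-stage attribute-based captioner: CNN -> attribute
# likelihoods -> LSTM language model. All dimensions and names are illustrative.
import torch
import torch.nn as nn
from torchvision import models

class AttributePredictor(nn.Module):
    """Stage 1: CNN that maps an image to attribute occurrence likelihoods."""
    def __init__(self, num_attributes: int = 256):
        super().__init__()
        backbone = models.resnet50(weights=None)
        backbone.fc = nn.Linear(backbone.fc.in_features, num_attributes)
        self.backbone = backbone

    def forward(self, images):                       # (B, 3, H, W)
        return torch.sigmoid(self.backbone(images))  # (B, num_attributes)

class AttributeCaptioner(nn.Module):
    """Stage 2: LSTM that generates a caption conditioned on the attribute vector."""
    def __init__(self, num_attributes=256, vocab_size=10000, embed_dim=512, hidden=512):
        super().__init__()
        self.init_proj = nn.Linear(num_attributes, hidden)  # attributes -> initial hidden state
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, attribute_probs, caption_tokens):
        h0 = torch.tanh(self.init_proj(attribute_probs)).unsqueeze(0)  # (1, B, hidden)
        c0 = torch.zeros_like(h0)
        emb = self.embed(caption_tokens)                               # (B, T, embed_dim)
        hidden_states, _ = self.lstm(emb, (h0, c0))
        return self.out(hidden_states)                                 # (B, T, vocab_size) logits
```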

18 citations

Journal ArticleDOI
TL;DR: Current practices in Spanish TV captioning are examined to analyse whether syntax and vocabulary are adapted to satisfy deaf children’s needs and expectations regarding subtitle processing, and some alternative captioning criteria are proposed based on the needs of d/Deaf and hard-of-hearing children.
Abstract: In order to understand and fully comprehend a subtitle, two parameters within the linguistic code of audiovisual texts are key in the processing of the subtitle itself, namely, vocabulary and syntax. Through a descriptive and experimental study, the present article explores the transfer of the linguistic code of audiovisual texts in subtitling for deaf and hard-of-hearing children in three Spanish TV stations. In the first part of the study, we examine current practices in Spanish TV captioning to analyse whether syntax and vocabulary are adapted to satisfy deaf children’s needs and expectations regarding subtitle processing. In the second part, we propose some alternative captioning criteria for these two variables based on the needs of d/Deaf and hard-of-hearing (DHH) children, suggesting a more appropriate way of displaying the written linguistic code for deaf children. Although no specific distinction will be made throughout this paper, it is important to refer to these terms as they have been widely used in the literature. Neves (2008) distinguishes between the “Deaf”, who belong to a linguistic minority, use sign language as their mother tongue, and usually identify with a Deaf community and culture; the “deaf”, who normally have an oral language as their mother tongue and feel part of the hearing community; and the “hard of hearing”, who have residual hearing and, therefore, share the world and the sound experience of hearers. In the experimental study, 75 Spanish DHH children aged between 8 and 13 were exposed to two options: the actual broadcast captions on TV, and the alternative captions created by the authors. The data gathered from this exposure were used to analyse the children’s comprehension of these two variables in order to draw conclusions about the suitability of the changes proposed in the alternative subtitles.

18 citations

Journal ArticleDOI
TL;DR: Hearing-impaired students who read at the second-, third-, and fourth-grade levels viewed five versions of three children's television programs; the results revealed that as the linguistic complexity of the captions increased, students' comprehension scores declined.
Abstract: Hearing-impaired students who read at the second-, third-, and fourth-grade levels viewed five versions of three children's television programs. In versions one through three, captions were presented at different levels of linguistic complexity; version four was captioned according to an intuitively based method; and version five was shown without captions. Students' comprehension of all three programs was higher with the captioned than the uncaptioned versions, and in all three assessments, second-grade readers showed significantly lower comprehension than third- and fourth-grade readers. Differentiation in comprehension as a function of captioning mode, however, was found for only one of the three programs. The pattern of results revealed that as the linguistic complexity of the captions increased, students' comprehension scores declined. Implications for future captioning efforts are discussed. Although television is a major source of information for hearing children, it is less valuable for the hearing-impaired child, who cannot benefit from the material presented in the soundtrack. Due to this problem, people concerned about the hearing impaired began experimenting with the use of captions for television. Caption writing for children involves editing the verbatim script to allow for reading time and/or to adjust language level. Traditionally, however, the reading level of captioned materials is aimed at the hypothetical "average" hearing-impaired viewer. This approach cannot meet the special needs of poorer readers, for whom the captions may prove frustratingly difficult to comprehend, or of better readers, for whom the captions may be oversimplified, thus cheating them of richer information. Furthermore, in transforming the original program script into captions, the captioner's "intuitive" understanding of the appropriate linguistic content is the only available basis for modifying language. This results in wide variability in the complexity of syntactic structures.

18 citations

Proceedings ArticleDOI
05 Jun 2019
TL;DR: In this article, a classification of eight semantic image-text classes (e.g., "illustration" or "anchorage") is presented, which can be characterized by a set of three metrics: cross-modal mutual information, semantic correlation and the status relation of image and text.
Abstract: Two modalities are often used to convey information in a complementary and beneficial manner, e.g., in online news, videos, educational resources, or scientific publications. The automatic understanding of semantic correlations between text and associated images as well as their interplay has a great potential for enhanced multimodal web search and recommender systems. However, automatic understanding of multimodal information is still an unsolved research problem. Recent approaches such as image captioning focus on precisely describing visual content and translating it to text, but typically address neither semantic interpretations nor the specific role or purpose of an image-text constellation. In this paper, we go beyond previous work and investigate, inspired by research in visual communication, useful semantic image-text relations for multimodal information retrieval. We derive a categorization of eight semantic image-text classes (e.g., "illustration" or "anchorage") and show how they can systematically be characterized by a set of three metrics: cross-modal mutual information, semantic correlation, and the status relation of image and text. Furthermore, we present a deep learning system to predict these classes by utilizing multimodal embeddings. To obtain a sufficiently large amount of training data, we have automatically collected and augmented data from a variety of datasets and web resources, which enables future research on this topic. Experimental results on a demanding test set demonstrate the feasibility of the approach.
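As a hedged sketch of the kind of predictor described above, the code below fuses precomputed image and text embeddings and classifies the pair into one of the eight semantic image-text classes. The embedding dimensions, the concatenation-based fusion, and all names are illustrative assumptions rather than the authors' architecture.

```python
# Hedged sketch: classify an image-text pair into one of eight semantic
# image-text classes (e.g., "illustration", "anchorage") from precomputed
# multimodal embeddings. Dimensions and design choices are assumptions.
import torch
import torch.nn as nn

class ImageTextRelationClassifier(nn.Module):
    def __init__(self, image_dim=2048, text_dim=768, hidden=512, num_classes=8):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(image_dim + text_dim, hidden),  # fuse both modalities
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden, num_classes),           # scores for the eight classes
        )

    def forward(self, image_emb, text_emb):
        fused = torch.cat([image_emb, text_emb], dim=-1)
        return self.classifier(fused)  # unnormalized class logits

# Usage with dummy embeddings for a batch of four image-text pairs:
logits = ImageTextRelationClassifier()(torch.randn(4, 2048), torch.randn(4, 768))
```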

18 citations


Network Information
Related Topics (5)
Feature vector: 48.8K papers, 954.4K citations, 83% related
Object detection: 46.1K papers, 1.3M citations, 82% related
Convolutional neural network: 74.7K papers, 2M citations, 82% related
Deep learning: 79.8K papers, 2.1M citations, 82% related
Unsupervised learning: 22.7K papers, 1M citations, 81% related
Performance
Metrics
No. of papers in the topic in previous years
Year: Papers
2023: 536
2022: 1,030
2021: 504
2020: 530
2019: 448
2018: 334