Topic
Closed captioning
About: Closed captioning is a research topic. Over its lifetime, 3011 publications have been published within this topic, receiving 64494 citations. The topic is also known as: CC.
Papers published on a yearly basis
Papers
TL;DR: Experimental results show that the proposed method succeeded in using a pre-trained language model for audio captioning, and that the oracle performance of the pre-trained caption generator was clearly better than that of a conventional method trained from scratch.
Abstract: The goal of audio captioning is to translate input audio into a natural-language description. One of the problems in audio captioning is the lack of training data, owing to the difficulty of collecting audio-caption pairs by crawling the web. In this study, to overcome this problem, we propose to use a pre-trained large-scale language model. Since audio cannot be fed directly into such a language model, we utilize guidance captions retrieved from a training dataset based on audio similarity. The caption for the input audio is then generated by the pre-trained language model while referring to the guidance captions. Experimental results show that (i) the proposed method succeeded in using a pre-trained language model for audio captioning, and (ii) the oracle performance of the pre-trained caption generator was clearly better than that of a conventional method trained from scratch.
20 citations
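The retrieval step described in the abstract above, picking guidance captions from the training set by audio similarity, can be sketched as follows. This is a minimal illustration, not the authors' implementation; the toy two-dimensional embeddings, the cosine similarity measure, and all function names are assumptions made for the example.

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve_guidance_captions(query_emb, training_set, k=2):
    """Return the captions of the k training clips whose audio
    embeddings are most similar to the query embedding."""
    ranked = sorted(training_set,
                    key=lambda item: cosine(query_emb, item["embedding"]),
                    reverse=True)
    return [item["caption"] for item in ranked[:k]]

# Toy training set of (embedding, caption) pairs.
training_set = [
    {"embedding": [1.0, 0.0], "caption": "a dog barks repeatedly"},
    {"embedding": [0.9, 0.1], "caption": "a dog barks in the distance"},
    {"embedding": [0.0, 1.0], "caption": "rain falls on a roof"},
]

# A query near the "dog" embeddings retrieves the two dog captions.
print(retrieve_guidance_captions([1.0, 0.05], training_set, k=2))
```

In the paper the retrieved captions are then passed to the language model as context; that generation step is omitted here.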
11 Jan 2006
TL;DR: In this article, a content detecting device for a digital broadcast signal receiver, or for a recording apparatus that records the digital broadcast signal, is presented; commercials are detected based on the presence or absence of a closed-captioning broadcast or a data broadcast.
Abstract: A content detecting device for a digital broadcast signal receiver or a recording apparatus that records the digital broadcast signal. A program-related-information acquiring unit acquires program specific information and information for creating an electronic program guide, and causes a memory to store the information. A detecting unit detects a commercial based on the presence or absence of a closed-captioning broadcast or a data broadcast, and causes the memory to store detection information. A discriminating unit reads out the detection information and outputs a signal for distinguishing the program from the commercial. When the program specific information and the electronic program guide information in the memory contradict each other regarding the presence or absence of a closed-captioning broadcast or a data broadcast, the detecting unit causes the memory to store information indicating that a commercial has been detected.
20 citations
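The contradiction check described in the abstract, flagging a commercial when the program specific information (PSI) and the electronic program guide (EPG) disagree about a closed-captioning or data broadcast, can be sketched roughly as follows; the function and parameter names are invented for illustration and the patent does not specify an implementation.

```python
def detect_commercial(psi_has_cc, epg_has_cc, psi_has_data, epg_has_data):
    """Flag a segment as a commercial when the PSI and the EPG
    contradict each other about the presence of a closed-captioning
    broadcast or a data broadcast."""
    cc_contradiction = psi_has_cc != epg_has_cc
    data_contradiction = psi_has_data != epg_has_data
    return cc_contradiction or data_contradiction

# PSI reports no captions while the EPG promises them:
# the segment is flagged as a likely commercial.
print(detect_commercial(psi_has_cc=False, epg_has_cc=True,
                        psi_has_data=False, epg_has_data=False))
```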
25 Jan 2005
TL;DR: In this article, a system for providing caption information to one or more mobile devices is presented; a transcription device delivers the data transcription in near real time, and the communication network is used to send text from the caption data to at least one of the mobile devices.
Abstract: A system for providing caption information for one or more mobile devices includes a communication network, and one or more mobile devices connected to the communication network. The one or more mobile devices can include a cellular device, a personal digital assistant, or a wireless device. The system includes a captioning device to present caption data on a display, and a transcription device to transcribe data. The transcription device provides near real time delivery of the data transcription. The system uses the communication network to send text from the caption data to at least one of the mobile devices, while the system sends the caption data to one or more captioning devices simultaneously.
20 citations
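A toy model of the fan-out the abstract describes, sending full caption data to captioning displays while relaying only the text to mobile devices, might look like this. All class and method names are hypothetical; the patent does not specify an implementation, and plain lists stand in for network endpoints.

```python
class CaptionRelay:
    """Fan caption records out to captioning displays while sending
    only the text portion to subscribed mobile devices."""

    def __init__(self):
        self.mobile_devices = []   # sinks that receive text only
        self.caption_devices = []  # sinks that receive full records

    def subscribe_mobile(self, sink):
        self.mobile_devices.append(sink)

    def subscribe_caption_display(self, sink):
        self.caption_devices.append(sink)

    def broadcast(self, caption):
        # caption is a dict such as {"text": ..., "timestamp": ...}
        for sink in self.caption_devices:
            sink.append(caption)          # full caption data
        for sink in self.mobile_devices:
            sink.append(caption["text"])  # text only

relay = CaptionRelay()
phone, display = [], []
relay.subscribe_mobile(phone)
relay.subscribe_caption_display(display)
relay.broadcast({"text": "Hello, world.", "timestamp": 12.5})
print(phone)  # the mobile device receives only the text
```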
TL;DR: It is found that Deaf signers share the most in written English, despite their desire to share in sign language, and key areas of difficulty in consuming content and producing content on social media platforms are identified.
Abstract: Social media platforms support the sharing of written text, video, and audio. All of these formats may be inaccessible to people who are deaf or hard of hearing (DHH), particularly those who primarily communicate via sign language, people who we call Deaf signers. We study how Deaf signers engage with social platforms, focusing on how they share content and the barriers they face. We employ a mixed-methods approach involving seven in-depth interviews and a survey of a larger population (n = 60). We find that Deaf signers share the most in written English, despite their desire to share in sign language. We further identify key areas of difficulty in consuming content (e.g., lack of captions for spoken content in videos) and producing content (e.g., captioning signed videos, signing into a phone camera) on social media platforms. Our results both provide novel insights into social media use by Deaf signers and reinforce prior findings on DHH communication more generally, while revealing potential ways to make social media platforms more accessible to Deaf signers.
19 citations
TL;DR: This paper investigates multimodal architectures to replace the “someone” tags with proper character names in existing video captions, and presents an improved version of the dataset, namely M-VAD Names, and its semi-automatic annotation procedure.
Abstract: Current movie captioning architectures are not capable of mentioning characters with their proper name, replacing them with a generic “someone” tag. The lack of movie description datasets with characters’ visual annotations surely plays a relevant role in this shortage. Recently, we proposed to extend the M-VAD dataset by introducing such information. In this paper, we present an improved version of the dataset, namely M-VAD Names, and its semi-automatic annotation procedure. The resulting dataset contains 63k visual tracks and 34k textual mentions, all associated with character identities. To showcase the features of the dataset and quantify the complexity of the naming task, we investigate multimodal architectures to replace the “someone” tags with proper character names in existing video captions. The evaluation is further extended by testing this application on videos outside of the M-VAD Names dataset.
19 citations
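The final substitution step of the naming task above, replacing the generic “someone” tags in a caption with the character names predicted for the corresponding visual tracks, can be illustrated with a small sketch. The function name and the example caption are invented; in the paper the names come from multimodal architectures operating on the visual tracks, not from a fixed list.

```python
def fill_character_names(caption, names):
    """Replace successive "someone" tags in a caption with the
    character names predicted for the matching visual tracks."""
    parts = caption.split("someone")
    out = [parts[0]]
    for name, rest in zip(names, parts[1:]):
        out.append(name)
        out.append(rest)
    # If there are more "someone" tags than predicted names,
    # leave the remaining tags untouched.
    out.extend("someone" + p for p in parts[len(names) + 1:])
    return "".join(out)

print(fill_character_names(
    "someone opens the door while someone watches",
    ["Marty", "Doc"]))
```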