Topic

Closed captioning

About: Closed captioning is a research topic. Over its lifetime, 3,011 publications have been published on this topic, receiving 64,494 citations. The topic is also known as: CC.


Papers
Posted Content
TL;DR: Experimental results show that the proposed method succeeds in using a pre-trained language model for audio captioning, and that the oracle performance of the pre-trained caption generator is clearly better than that of a conventional model trained from scratch.
Abstract: The goal of audio captioning is to translate input audio into a natural-language description. One of the problems in audio captioning is the lack of training data, because audio-caption pairs are difficult to collect by crawling the web. In this study, we propose to overcome this problem by using a pre-trained large-scale language model. Since audio cannot be fed directly into such a language model, we retrieve guidance captions from the training dataset based on audio similarity. The caption of the input audio is then generated by the pre-trained language model while referring to the guidance captions. Experimental results show that (i) the proposed method successfully uses a pre-trained language model for audio captioning, and (ii) the oracle performance of the pre-trained caption generator is clearly better than that of a conventional model trained from scratch.
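The retrieve-then-generate idea above is straightforward to sketch. The following is a minimal illustration, not the paper's implementation: embed_audio stands in for a pretrained audio encoder, generate_caption for the pretrained language model conditioned on the retrieved guidance captions, and all names and the toy data are invented for illustration.

```python
import numpy as np

# Hypothetical stand-in for a pretrained audio encoder; here it just
# returns a fixed-dimensional pseudo-random embedding per clip id.
def embed_audio(clip_id: str, dim: int = 128) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(clip_id)) % (2**32))
    return rng.standard_normal(dim)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve_guidance(query_clip: str, train_set: dict[str, str], k: int = 3) -> list[str]:
    """Return captions of the k training clips most similar to the query audio."""
    q = embed_audio(query_clip)
    ranked = sorted(train_set, key=lambda c: cosine(q, embed_audio(c)), reverse=True)
    return [train_set[c] for c in ranked[:k]]

def generate_caption(guidance: list[str]) -> str:
    # Placeholder for the pretrained language model: the paper generates
    # a new caption conditioned on the guidance; here we echo the top hit.
    return guidance[0]

train_set = {
    "clip_001": "a dog barks repeatedly in the distance",
    "clip_002": "rain falls steadily on a metal roof",
    "clip_003": "a crowd applauds after a speech",
}
print(generate_caption(retrieve_guidance("query_clip", train_set)))
```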

20 citations

Patent
11 Jan 2006
TL;DR: In this article, a content detecting device for a digital broadcast signal receiver, or for a recording apparatus that records the digital broadcast signal, is presented; it detects commercials based on the presence or absence of a closed captioning broadcast or a data broadcast.
Abstract: A content detecting device for a digital broadcast signal receiver or for a recording apparatus that records the digital broadcast signal. A program-related-information acquiring unit acquires program specific information and information for creating an electronic program guide, and stores them in a memory. A detecting unit detects a commercial based on the presence or absence of a closed captioning broadcast or a data broadcast, and stores the detection information in the memory. A discriminating unit reads out the detection information and outputs a signal distinguishing the program from the commercial. When the program specific information and the electronic program guide information stored in the memory contradict each other concerning the presence or absence of a closed captioning broadcast or a data broadcast, the detecting unit stores information in the memory indicating detection of a commercial.
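The contradiction rule at the heart of the patent can be sketched in a few lines. This is a hypothetical rendering, since the patent specifies no data model; the field names are invented:

```python
from dataclasses import dataclass

@dataclass
class SegmentInfo:
    has_closed_captions: bool  # presence of a closed captioning broadcast
    has_data_broadcast: bool   # presence of a data broadcast

def is_commercial(psi: SegmentInfo, epg: SegmentInfo) -> bool:
    """Flag the current segment as a commercial when the program specific
    information (PSI) and the electronic program guide (EPG) metadata
    disagree about closed captioning or data-broadcast presence."""
    return (psi.has_closed_captions != epg.has_closed_captions
            or psi.has_data_broadcast != epg.has_data_broadcast)

# Example: the EPG says the program carries captions, but the PSI for the
# current segment reports none, so the segment is treated as a commercial.
psi = SegmentInfo(has_closed_captions=False, has_data_broadcast=False)
epg = SegmentInfo(has_closed_captions=True, has_data_broadcast=False)
print(is_commercial(psi, epg))  # True
```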

20 citations

Patent
25 Jan 2005
TL;DR: In this article, a system for providing caption information to one or more mobile devices is presented; it comprises a communication network and a transcription device that delivers the transcription in near real time, using the network to send text from the caption data to at least one of the mobile devices.
Abstract: A system for providing caption information to one or more mobile devices includes a communication network and one or more mobile devices connected to it. The mobile devices can include a cellular device, a personal digital assistant, or a wireless device. The system includes a captioning device to present caption data on a display and a transcription device to transcribe data; the transcription device provides near-real-time delivery of the transcription. The system uses the communication network to send text from the caption data to at least one of the mobile devices while simultaneously sending the caption data to one or more captioning devices.
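The system's core behavior is a simple fan-out: transcribed text goes to mobile subscribers while the full caption data simultaneously feeds captioning devices. A minimal sketch under assumed interfaces (none of these names come from the patent):

```python
from typing import Callable

# Hypothetical delivery callbacks; in the patent these would be the
# communication network (cellular/wireless) and the caption display path.
MobileSink = Callable[[str], None]
CaptionSink = Callable[[dict], None]

def fan_out(caption_data: dict,
            mobile_sinks: list[MobileSink],
            caption_sinks: list[CaptionSink]) -> None:
    """Send the text portion to mobile devices and the full caption
    payload (text plus timing/formatting) to captioning devices."""
    text = caption_data["text"]
    for send_text in mobile_sinks:
        send_text(text)                # near-real-time text to phones/PDAs
    for send_caption in caption_sinks:
        send_caption(caption_data)     # full caption data to displays

fan_out(
    {"text": "Speaker: Welcome, everyone.", "start_ms": 1200, "end_ms": 2900},
    mobile_sinks=[lambda t: print("mobile:", t)],
    caption_sinks=[lambda c: print("display:", c)],
)
```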

20 citations

Journal ArticleDOI
TL;DR: It is found that Deaf signers share content most often in written English, despite their desire to share in sign language; key areas of difficulty in consuming and producing content on social media platforms are also identified.
Abstract: Social media platforms support the sharing of written text, video, and audio. All of these formats may be inaccessible to people who are deaf or hard of hearing (DHH), particularly those who primarily communicate via sign language, people who we call Deaf signers. We study how Deaf signers engage with social platforms, focusing on how they share content and the barriers they face. We employ a mixed-methods approach involving seven in-depth interviews and a survey of a larger population (n = 60). We find that Deaf signers share the most in written English, despite their desire to share in sign language. We further identify key areas of difficulty in consuming content (e.g., lack of captions for spoken content in videos) and producing content (e.g., captioning signed videos, signing into a phone camera) on social media platforms. Our results both provide novel insights into social media use by Deaf signers and reinforce prior findings on DHH communication more generally, while revealing potential ways to make social media platforms more accessible to Deaf signers.

19 citations

Journal ArticleDOI
TL;DR: This paper investigates multimodal architectures to replace the “someone” tags with proper character names in existing video captions, and presents an improved version of the dataset, namely M-VAD Names, and its semi-automatic annotation procedure.
Abstract: Current movie captioning architectures cannot mention characters by their proper names, replacing them with a generic “someone” tag. The lack of movie description datasets with visual annotations of characters surely contributes to this shortcoming. Recently, we proposed to extend the M-VAD dataset by introducing such information. In this paper, we present an improved version of the dataset, namely M-VAD Names, and its semi-automatic annotation procedure. The resulting dataset contains 63k visual tracks and 34k textual mentions, all associated with character identities. To showcase the features of the dataset and quantify the complexity of the naming task, we investigate multimodal architectures to replace the “someone” tags with proper character names in existing video captions. The evaluation is further extended by testing this application on videos outside the M-VAD Names dataset.
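The naming task itself, substituting predicted character names for “someone” tags in order of appearance, can be sketched as simple caption post-processing. This is an illustrative toy, not the paper's multimodal model; the track ids and the lookup are invented:

```python
import re
from typing import Callable

def name_someone_tags(caption: str, face_tracks: list[str],
                      identify: Callable[[str], str]) -> str:
    """Replace each "someone" tag with the character name predicted
    for the corresponding face track, in order of appearance."""
    tracks = iter(face_tracks)
    def repl(m: re.Match) -> str:
        track = next(tracks, None)
        return identify(track) if track else m.group(0)
    return re.sub(r"\bsomeone\b", repl, caption, flags=re.IGNORECASE)

# Toy identifier: a dict lookup stands in for the multimodal model
# that links each face track to a character identity.
names = {"track_17": "Marty", "track_42": "Doc"}
caption = "Someone opens the door while someone watches."
print(name_someone_tags(caption, ["track_17", "track_42"],
                        lambda t: names.get(t, "someone")))
# -> "Marty opens the door while Doc watches."
```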

19 citations


Network Information
Related Topics (5)
Feature vector: 48.8K papers, 954.4K citations, 83% related
Object detection: 46.1K papers, 1.3M citations, 82% related
Convolutional neural network: 74.7K papers, 2M citations, 82% related
Deep learning: 79.8K papers, 2.1M citations, 82% related
Unsupervised learning: 22.7K papers, 1M citations, 81% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    536
2022    1,030
2021    504
2020    530
2019    448
2018    334