Topic
Closed captioning
About: Closed captioning is a research topic. Over its lifetime, 3011 publications have been published within this topic, receiving 64494 citations. The topic is also known as: CC.
Papers published on a yearly basis
Papers
•
17 Sep 2007
TL;DR: This book discusses Digital Television Channel Coding and Modulation; Closed Captioning, Subtitling, and Teletext; and the MPEG-2 Video Compression Standard.
Abstract: Preface. 1. Introduction to Analog and Digital Television. 2. Characteristics of Video Material. 3. Predictive Encoding. 4. Transform Coding. 5. Video Coder Syntax. 6. The MPEG-2 Video Compression Standard. 7. Perceptual Audio Coding. 8. Frequency Analysis and Synthesis. 9. MPEG Audio. 10. Dolby AC-3 Audio. 11. MPEG-2 Systems. 12. DVB Service Information and ATSC Program and System Information Protocol. 13. Digital Television Channel Coding and Modulation. 14. Closed Captioning, Subtitling, and Teletext. Appendix: MPEG Tables. Index.
22 citations
•
TL;DR: The authors generate story-like video captions that convey richer contents by temporally segmenting the video with action localization, generating multiple captions from multiple frames, and connecting them with natural language processing techniques.
Abstract: Recent advances in image captioning task have led to increasing interests in video captioning task. However, most works on video captioning are focused on generating single input of aggregated features, which hardly deviates from image captioning process and does not fully take advantage of dynamic contents present in videos. We attempt to generate video captions that convey richer contents by temporally segmenting the video with action localization, generating multiple captions from multiple frames, and connecting them with natural language processing techniques, in order to generate a story-like caption. We show that our proposed method can generate captions that are richer in contents and can compete with state-of-the-art method without explicitly using video-level features as input.
22 citations
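The "connect segment captions into a story" step described above can be illustrated with a toy sketch (the function and its connectives are hypothetical, not the authors' implementation): given one caption per localized action segment, join them with temporal connectives to form a single story-like caption.

```python
def connect_captions(segment_captions):
    """Toy sketch: join per-segment captions into a story-like caption
    using temporal connectives (hypothetical, for illustration only)."""
    connectives = ["First,", "Then,", "After that,", "Finally,"]
    if len(segment_captions) == 1:
        # A single segment needs no connective; just normalize punctuation.
        return segment_captions[0].capitalize().rstrip(".") + "."
    parts = []
    for i, cap in enumerate(segment_captions):
        if i == 0:
            conn = connectives[0]
        elif i == len(segment_captions) - 1:
            conn = connectives[-1]
        else:
            # Alternate middle connectives to avoid repetition.
            conn = connectives[1 + (i - 1) % 2]
        parts.append(f"{conn} {cap.rstrip('.')}.")
    return " ".join(parts)
```

A real system would use learned language generation rather than fixed templates, but the sketch shows why segment-level captions can carry more of a video's dynamic content than a single aggregated caption.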
•
22 Jul 2007
TL;DR: This paper describes the development of a system that can provide an automatic text transcription of multiple speakers using speech recognition (SR), with the names of speakers identified in the transcription and corrections of SR errors made in real time by a human 'editor'.
Abstract: Text transcriptions of the spoken word can benefit deaf people, and also anyone who needs to review what has been said (e.g. at lectures, presentations, meetings, etc.). Real-time captioning (i.e. creating a live verbatim transcript of what is being spoken) using phonetic keyboards can provide an accurate live transcription for deaf people, but is often not available because of the cost and shortage of highly skilled and trained stenographers. This paper describes the development of a system that can provide an automatic text transcription of multiple speakers using speech recognition (SR), with the names of speakers identified in the transcription and corrections of SR errors made in real time by a human 'editor'.
22 citations
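The editor-in-the-loop idea in the paper above can be sketched minimally (the class, its `window` parameter, and the correction interface are assumptions for illustration, not the paper's system): SR output is appended line by line with speaker names, and a human editor patches misrecognitions within a recent window of lines.

```python
class LiveTranscript:
    """Minimal sketch of real-time SR captioning with a human editor.

    Hypothetical design: SR output arrives as (speaker, text) pairs;
    the editor can only correct the most recent `window` lines, since
    older captions have already been displayed for some time.
    """

    def __init__(self, window=5):
        self.lines = []
        self.window = window

    def add_sr_output(self, speaker, text):
        # Label each transcribed utterance with the identified speaker.
        self.lines.append(f"{speaker}: {text}")

    def correct(self, wrong, right):
        # Apply the editor's fix only within the recent window of lines.
        start = max(0, len(self.lines) - self.window)
        for i in range(start, len(self.lines)):
            self.lines[i] = self.lines[i].replace(wrong, right)

    def render(self):
        return "\n".join(self.lines)
```

Usage: after `add_sr_output("Alice", "it is hard to wreck a nice beach")`, the editor can call `correct("wreck a nice beach", "recognize speech")` to fix the classic SR confusion before the caption scrolls away.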
•
TL;DR: A hierarchical attention-based multimodal fusion model for video captioning is proposed by jointly considering the intrinsic properties of multimodal features; experimental results show that the proposed method achieves competitive performance compared with related video captioning methods.
22 citations
•
12 Sep 2019
TL;DR: VizSeq is presented, a visual analysis toolkit for instance-level and corpus-level system evaluation on a wide variety of text generation tasks. It covers most common n-gram based metrics accelerated with multiprocessing, and also provides the latest embedding-based metrics such as BERTScore.
Abstract: Automatic evaluation of text generation tasks (e.g. machine translation, text summarization, image captioning and video description) usually relies heavily on task-specific metrics, such as BLEU and ROUGE. They, however, are abstract numbers and are not perfectly aligned with human assessment. This suggests inspecting detailed examples as a complement to identify system error patterns. In this paper, we present VizSeq, a visual analysis toolkit for instance-level and corpus-level system evaluation on a wide variety of text generation tasks. It supports multimodal sources and multiple text references, providing visualization in a Jupyter notebook or a web app interface. It can be used locally or deployed onto public servers for centralized data hosting and benchmarking. It covers most common n-gram based metrics accelerated with multiprocessing, and also provides the latest embedding-based metrics such as BERTScore.
21 citations
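The n-gram based metrics that VizSeq accelerates (BLEU and its relatives) are built on clipped n-gram precision. A minimal illustration of that core computation follows; this is a generic sketch, not VizSeq's actual code, and omits BLEU's brevity penalty and multi-reference handling.

```python
from collections import Counter


def ngram_precision(candidate, reference, n=2):
    """Clipped n-gram precision, the building block of BLEU-style metrics.

    Each candidate n-gram's count is clipped by its count in the
    reference, so repeating a correct n-gram cannot inflate the score.
    """
    cand_tokens = candidate.split()
    ref_tokens = reference.split()
    cand_ngrams = Counter(
        tuple(cand_tokens[i:i + n]) for i in range(len(cand_tokens) - n + 1)
    )
    ref_ngrams = Counter(
        tuple(ref_tokens[i:i + n]) for i in range(len(ref_tokens) - n + 1)
    )
    if not cand_ngrams:
        return 0.0
    overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
    return overlap / sum(cand_ngrams.values())
```

For example, comparing "the cat sat on the mat" against the reference "the cat is on the mat" gives a bigram precision of 3/5, since only "the cat", "on the", and "the mat" appear in the reference.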