Topic

Closed captioning

About: Closed captioning is a research topic. Over its lifetime, 3,011 publications have been published within this topic, receiving 64,494 citations. The topic is also known as: CC.


Papers
Book
17 Sep 2007
TL;DR: This book covers Digital Television Channel Coding and Modulation; Closed Captioning, Subtitling, and Teletext; and the MPEG-2 Video Compression Standard.
Abstract: Preface. 1. Introduction to Analog and Digital Television. 2. Characteristics of Video Material. 3. Predictive Encoding. 4. Transform Coding. 5. Video Coder Syntax. 6. The MPEG-2 Video Compression Standard. 7. Perceptual Audio Coding. 8. Frequency Analysis and Synthesis. 9. MPEG Audio. 10. Dolby AC-3 Audio. 11. MPEG-2 Systems. 12. DVB Service Information and ATSC Program and System Information Protocol. 13. Digital Television Channel Coding and Modulation. 14. Closed Captioning, Subtitling, and Teletext. Appendix: MPEG Tables. Index.

22 citations

Posted Content
TL;DR: In this paper, the authors attempt to generate video captions that convey richer contents by temporally segmenting the video with action localization, generating multiple captions from multiple frames, and connecting them with natural language processing techniques, in order to generate a story-like caption.
Abstract: Recent advances in the image captioning task have led to increasing interest in the video captioning task. However, most work on video captioning focuses on generating captions from a single input of aggregated features, which hardly deviates from the image captioning process and does not fully take advantage of the dynamic content present in videos. We attempt to generate video captions that convey richer content by temporally segmenting the video with action localization, generating multiple captions from multiple frames, and connecting them with natural language processing techniques, in order to produce a story-like caption. We show that our proposed method can generate captions that are richer in content and can compete with state-of-the-art methods without explicitly using video-level features as input.
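The pipeline described above reduces to three steps: localize action segments, caption each segment, then stitch the per-segment captions into a story-like description. The sketch below illustrates that flow only; the segmenter, captioner, and connective phrases are hypothetical stand-ins, not the authors' actual models.

```python
# Minimal sketch of a story-like video captioning pipeline, assuming an
# external action localizer and per-segment captioner are supplied.
from typing import Callable, List, Tuple

Segment = Tuple[float, float]  # (start_sec, end_sec) from an action localizer


def story_caption(video_path: str,
                  localize_actions: Callable[[str], List[Segment]],
                  caption_segment: Callable[[str, Segment], str]) -> str:
    """Temporally segment a video, caption each segment, then connect
    the per-segment captions into a single story-like description."""
    segments = sorted(localize_actions(video_path))           # temporal segmentation
    captions = [caption_segment(video_path, s) for s in segments]

    # Very simple connective strategy; the paper relies on NLP techniques
    # (e.g. discourse connectives) for this final step.
    connectives = ["First,", "Then,", "After that,", "Finally,"]
    parts = []
    for i, cap in enumerate(captions):
        conn = connectives[min(i, len(connectives) - 1)]
        parts.append(f"{conn} {cap.rstrip('.')}.")
    return " ".join(parts)
```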

22 citations

Book ChapterDOI
22 Jul 2007
TL;DR: This paper describes the development of a system that can provide an automatic text transcription of multiple speakers using speech recognition (SR), with the names of speakers identified in the transcription and corrections of SR errors made in real-time by a human 'editor'.
Abstract: Text transcriptions of the spoken word can benefit deaf people and also anyone who needs to review what has been said (e.g. at lectures, presentations, meetings, etc.). Real-time captioning (i.e. creating a live verbatim transcript of what is being spoken) using phonetic keyboards can provide an accurate live transcription for deaf people, but it is often not available because of the cost and the shortage of highly skilled and trained stenographers. This paper describes the development of a system that can provide an automatic text transcription of multiple speakers using speech recognition (SR), with the names of speakers identified in the transcription and corrections of SR errors made in real time by a human 'editor'.
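The flow this abstract describes (a speech-recognition hypothesis tagged with the speaker's name, with a human editor's corrections applied before display) could look roughly like the sketch below. The queue-based structure, function names, and word-level correction format are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch: consume ASR hypotheses, apply real-time editor
# corrections, and emit captions labelled with the speaker's name.
import queue
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str      # identified, e.g., from the speaker's own microphone channel
    text: str         # raw speech-recognition hypothesis


def run_captioner(asr_output: "queue.Queue[Utterance]",
                  editor_corrections: dict) -> None:
    """Display corrected, speaker-attributed captions as utterances arrive."""
    while True:
        utt = asr_output.get()
        if utt is None:          # sentinel: end of session
            break
        # The human editor supplies corrections for misrecognized words.
        words = [editor_corrections.get(w, w) for w in utt.text.split()]
        print(f"{utt.speaker}: {' '.join(words)}")
```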

22 citations

Journal ArticleDOI
TL;DR: A hierarchical attention-based multi-modal fusion model for video captioning is proposed that jointly considers the intrinsic properties of multi-modal features; experimental results show that the proposed method achieves competitive performance compared with related video captioning methods.
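A minimal PyTorch sketch of what hierarchical attention-based multi-modal fusion can mean in practice: features within each modality are attended over first, then a second attention fuses the per-modality summaries. The dimensions and module names are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn


class HierarchicalAttentionFusion(nn.Module):
    def __init__(self, dim: int = 512):
        super().__init__()
        self.intra_score = nn.Linear(dim, 1)   # attention within a modality
        self.inter_score = nn.Linear(dim, 1)   # attention across modalities

    def forward(self, modality_feats):
        # modality_feats: list of (batch, seq_len_m, dim) tensors, e.g.
        # appearance, motion and audio features of a video.
        summaries = []
        for feats in modality_feats:
            w = torch.softmax(self.intra_score(feats), dim=1)   # (B, T, 1)
            summaries.append((w * feats).sum(dim=1))            # (B, dim)
        stacked = torch.stack(summaries, dim=1)                 # (B, M, dim)
        w = torch.softmax(self.inter_score(stacked), dim=1)     # (B, M, 1)
        return (w * stacked).sum(dim=1)                         # fused (B, dim)


# Example: fuse two modalities for a batch of 4 videos.
fusion = HierarchicalAttentionFusion(dim=512)
fused = fusion([torch.randn(4, 20, 512), torch.randn(4, 16, 512)])
```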

22 citations

Proceedings ArticleDOI
12 Sep 2019
TL;DR: VizSeq is presented, a visual analysis toolkit for instance-level and corpus-level system evaluation on a wide variety of text generation tasks, and covers most common n-gram based metrics accelerated with multiprocessing, and also provides latest embedding-based metrics such as BERTScore.
Abstract: Automatic evaluation of text generation tasks (e.g. machine translation, text summarization, image captioning and video description) usually relies heavily on task-specific metrics, such as BLEU and ROUGE. These, however, are abstract numbers and are not perfectly aligned with human assessment, which suggests inspecting detailed examples as a complement in order to identify system error patterns. In this paper, we present VizSeq, a visual analysis toolkit for instance-level and corpus-level system evaluation on a wide variety of text generation tasks. It supports multimodal sources and multiple text references, providing visualization in a Jupyter notebook or a web app interface. It can be used locally or deployed onto public servers for centralized data hosting and benchmarking. It covers the most common n-gram based metrics, accelerated with multiprocessing, and also provides the latest embedding-based metrics such as BERTScore.
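The contrast VizSeq visualizes, n-gram overlap metrics versus embedding-based metrics, can be reproduced with the standalone sacrebleu and bert-score packages. The snippet below is a hedged illustration of those underlying metrics, not of VizSeq's own interface, which the abstract does not spell out.

```python
# pip install sacrebleu bert-score
import sacrebleu
from bert_score import score as bert_score

hypotheses = ["a man is slicing a tomato"]
references = ["a man is cutting a tomato"]

# Corpus-level n-gram metric: an abstract number, blind to synonyms.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU: {bleu.score:.1f}")

# Embedding-based metric: credits "slicing" vs. "cutting" as near-synonyms.
P, R, F1 = bert_score(hypotheses, references, lang="en")
print(f"BERTScore F1: {F1.mean().item():.3f}")
```

Here the n-gram metric counts "slicing" versus "cutting" as a plain mismatch, while the embedding-based metric scores them as close, which is exactly the kind of discrepancy the toolkit is designed to surface at the instance level.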

21 citations


Network Information
Related Topics (5)
Feature vector
48.8K papers, 954.4K citations
83% related
Object detection
46.1K papers, 1.3M citations
82% related
Convolutional neural network
74.7K papers, 2M citations
82% related
Deep learning
79.8K papers, 2.1M citations
82% related
Unsupervised learning
22.7K papers, 1M citations
81% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    536
2022    1,030
2021    504
2020    530
2019    448
2018    334