Topic

Closed captioning

About: Closed captioning is a research topic. Over its lifetime, 3,011 publications have been published within this topic, receiving 64,494 citations. The topic is also known as: CC.


Papers
Journal Article
TL;DR: Results indicate that reading level emerges as a dominant factor: more proficient readers show better comprehension than poor readers and are better able to benefit from caption rate and, to some extent, text reduction modifications.
Abstract: Caption rate and text reduction are factors that appear to affect the comprehension of captions by people who are deaf or hard of hearing. These 2 factors are confounded in everyday captioning; rate (in words per minute) is slowed by text reduction. In this study, caption rate and text reduction were manipulated independently in 2 experiments to assess any differential effects and possible benefits for comprehension by deaf and hard-of-hearing adults. Volunteers for the study included adults with a range of reading levels, self-reported hearing status, and different communication and language preferences. Results indicate that caption rate (at 130, 180, 230 words per minute) and text reduction (at 84%, 92%, and 100% original text) have different effects for different adult users, depending on hearing status, age, and reading level. In particular, reading level emerges as a dominant factor: more proficient readers show better comprehension than poor readers and are better able to benefit from caption rate and, to some extent, text reduction modifications.

55 citations
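The confound noted in the abstract, that text reduction slows the presentation rate when caption timing simply follows the original audio, can be made concrete with a short back-of-the-envelope calculation. The sketch below is illustrative only; the segment length and word count are assumed values, not data from the study.

# Illustrative only: why caption rate and text reduction are confounded
# when caption display times simply follow the original audio.
words_verbatim = 180      # words spoken in a one-minute segment (assumed)
duration_min = 1.0        # segment length in minutes

for reduction in (1.00, 0.92, 0.84):   # text-reduction levels used in the study
    words_shown = words_verbatim * reduction
    rate_wpm = words_shown / duration_min
    print(f"{reduction:.0%} of original text -> {rate_wpm:.0f} words per minute")

# Reducing the text to 84% drops the rate from 180 to about 151 wpm unless the
# display duration is also shortened, which is why the experiments manipulate
# the two factors independently.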

Patent
Jeromey Russell Goetz
17 Apr 2015
TL;DR: The importance of scenes or moments in video content relative to one another is identified by performing a textual analysis of the closed captioning data, so that scenes can be ranked by importance with respect to one another.
Abstract: Disclosed are various embodiments for identifying importance of scenes or moments in video content relative to one another. Closed captioning data is extracted from a video content feature. A textual analysis of the closed captioning data is performed. The importance level of scenes can be ranked with respect to one another.

55 citations
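The patent abstract describes a pipeline — extract the closed captioning, analyze the text, rank scenes by importance — without specifying the analysis. Below is a minimal, hypothetical Python sketch of such a pipeline; the keyword-frequency scoring and the example scenes are illustrative assumptions, not the patented method.

# Hypothetical sketch: rank scenes by a textual analysis of their closed captions.
from collections import Counter

def score_scene(caption_text, salient_terms):
    """Score a scene by how often salient terms occur in its caption text."""
    counts = Counter(caption_text.lower().split())
    return sum(counts[term] for term in salient_terms)

def rank_scenes(scenes, salient_terms):
    """scenes: list of (scene_id, caption_text); returns ids, most important first."""
    scored = [(score_scene(text, salient_terms), scene_id) for scene_id, text in scenes]
    return [scene_id for _, scene_id in sorted(scored, reverse=True)]

scenes = [
    ("intro",   "welcome back to tonight's broadcast"),
    ("goal",    "he shoots he scores what a goal"),
    ("credits", "thanks for watching see you next week"),
]
print(rank_scenes(scenes, salient_terms={"goal", "scores", "shoots"}))
# -> ['goal', 'intro', 'credits']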

Book Chapter
08 Sep 2018
TL;DR: A novel stylized image captioning model is proposed that effectively takes factual and stylized knowledge into consideration and outperforms state-of-the-art approaches without using extra ground-truth supervision.
Abstract: Generating stylized captions for an image is an emerging topic in image captioning. Given an image as input, it requires the system to generate a caption that has a specific style (e.g., humorous, romantic, positive, or negative) while describing the image content in a semantically accurate way. In this paper, we propose a novel stylized image captioning model that effectively takes both requirements into consideration. To this end, we first devise a new variant of LSTM, named style-factual LSTM, as the building block of our model. It uses two groups of matrices to capture the factual and stylized knowledge, respectively, and automatically learns the word-level weights of the two groups based on previous context. In addition, when we train the model to capture stylized elements, we propose an adaptive learning approach based on a reference factual model, which provides factual knowledge to the model as it learns from stylized caption labels and can adaptively compute how much information to supply at each time step. We evaluate our model on two stylized image captioning datasets, which contain humorous/romantic captions and positive/negative captions, respectively. Experiments show that our proposed model outperforms the state-of-the-art approaches without using extra ground-truth supervision.

55 citations
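As a rough illustration of the style-factual idea — two weight groups whose contributions are blended by a word-level weight learned from previous context — here is a hedged PyTorch sketch. It approximates the two groups of matrices with two standard LSTM cells and a scalar gate, and it omits the adaptive learning scheme; it is not the authors' implementation.

# Hedged sketch (assumed simplification, not the authors' code): blend a
# "factual" and a "stylized" recurrent update with a gate predicted from the
# previous hidden state.
import torch
import torch.nn as nn

class StyleFactualLSTMCell(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.factual = nn.LSTMCell(input_size, hidden_size)    # factual weight group
        self.stylized = nn.LSTMCell(input_size, hidden_size)   # stylized weight group
        # word-level weight in [0, 1], learned from the previous context
        self.gate = nn.Sequential(nn.Linear(hidden_size, 1), nn.Sigmoid())

    def forward(self, x_t, state):
        h_prev, c_prev = state
        g = self.gate(h_prev)                                   # shape (batch, 1)
        h_f, c_f = self.factual(x_t, (h_prev, c_prev))
        h_s, c_s = self.stylized(x_t, (h_prev, c_prev))
        h_t = g * h_s + (1 - g) * h_f                           # blend the two groups
        c_t = g * c_s + (1 - g) * c_f
        return h_t, c_t

cell = StyleFactualLSTMCell(input_size=300, hidden_size=512)
x = torch.randn(8, 300)
h, c = torch.zeros(8, 512), torch.zeros(8, 512)
h, c = cell(x, (h, c))      # one decoding step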

Proceedings Article
18 Jun 2018
TL;DR: A novel performance evaluation metric named Fine-grained Captioning Evaluation (FCE) is proposed as an extension of the widely used METEOR; it measures not only linguistic performance but also whether action details and their temporal order are correctly described.
Abstract: Despite the recent emergence of video captioning methods, how to generate fine-grained video descriptions (i.e., long and detailed commentary about individual movements of multiple subjects as well as their frequent interactions) is far from being solved, even though it has great applications such as automatic sports narrative. To this end, this work makes the following contributions. First, to facilitate this novel research on fine-grained video captioning, we collected a new dataset, the Fine-grained Sports Narrative dataset (FSN), which contains 2K sports videos with ground-truth narratives from YouTube.com. Second, we develop a novel performance evaluation metric named Fine-grained Captioning Evaluation (FCE) to cope with this novel task. Considered an extension of the widely used METEOR, it measures not only linguistic performance but also whether the action details and their temporal order are correctly described. Third, we propose a new framework for the fine-grained sports narrative task. The network features three branches: 1) a spatio-temporal entity localization and role discovering sub-network; 2) a fine-grained action modeling sub-network for local skeleton motion description; and 3) a group relationship modeling sub-network to model interactions between players. We further fuse the features and decode them into long narratives with a hierarchically recurrent structure. Extensive experiments on the FSN dataset demonstrate the validity of the proposed framework for fine-grained video captioning.

55 citations
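The abstract does not give the formula for FCE, only that it extends METEOR with a check on action details and their temporal order. The sketch below is therefore only a schematic illustration of that general idea — a linguistic score blended with an order-agreement term — and every name and weight in it is an assumption, not the published metric.

# Schematic only: blend a METEOR-style linguistic score with a measure of how
# well the reference actions are recovered in the correct relative order.
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            dp[i][j] = (dp[i - 1][j - 1] + 1 if a[i - 1] == b[j - 1]
                        else max(dp[i - 1][j], dp[i][j - 1]))
    return dp[len(a)][len(b)]

def fine_grained_score(linguistic_score, pred_actions, ref_actions, alpha=0.5):
    """linguistic_score: e.g. METEOR in [0, 1]; actions: ordered lists of action tokens."""
    order_agreement = lcs_length(pred_actions, ref_actions) / max(len(ref_actions), 1)
    return alpha * linguistic_score + (1 - alpha) * order_agreement

print(fine_grained_score(0.40, ["dribble", "pass", "shoot"], ["pass", "dribble", "shoot"]))
# 0.5*0.40 + 0.5*(2/3) ≈ 0.53: the swapped action order is penalized.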

Proceedings Article
B. Shahraray, D.C. Gibbon
23 Jun 1997
TL;DR: This fully automatic system generates HyperText Markup Language (HTML) renditions of television programs, and makes them available for access over the Internet within seconds of their broadcast.
Abstract: This paper describes a working system for the automated archiving and selective retrieval of textual, pictorial and auditory information contained in video programs. Video processing performs the task of representing the visual information using a small subset of the video frames. Linguistic processing refines the closed caption text, generates a table of contents, and creates links to relevant multimedia documents. Audio and video information are compressed and indexed based on their temporal association with the selected video frames and processed text. The derived information is used to automatically generate a hypermedia rendition of the program contents. This provides a compact representation of the information contained in the video program. It also serves as a textual and pictorial index for selective retrieval of the full-motion video program. This fully automatic system generates HyperText Markup Language (HTML) renditions of television programs, and makes them available for access over the Internet within seconds of their broadcast. This digital library currently contains over 2200 hours of television programs.

54 citations
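The final step the paper describes — turning selected frames and refined closed-caption text into an HTML rendition — can be sketched in miniature. The toy snippet below only illustrates that step; the file names, data layout, and page structure are assumptions, not the system's actual output format.

# Toy illustration: emit a simple HTML "rendition" that pairs selected video
# frames with their associated closed-caption text (assumed inputs).
from html import escape

def render_program(title, segments, out_path="program.html"):
    """segments: list of (frame_image_path, caption_text) in temporal order."""
    body = []
    for frame, text in segments:
        body.append(f'<div class="segment"><img src="{escape(frame)}" width="320">'
                    f"<p>{escape(text)}</p></div>")
    page = (f"<html><head><title>{escape(title)}</title></head>"
            f"<body><h1>{escape(title)}</h1>{''.join(body)}</body></html>")
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(page)

render_program("Evening News", [
    ("frames/shot_001.jpg", "Good evening, here are tonight's headlines."),
    ("frames/shot_002.jpg", "Officials announced the new transit plan today."),
])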


Network Information
Related Topics (5)
Feature vector: 48.8K papers, 954.4K citations (83% related)
Object detection: 46.1K papers, 1.3M citations (82% related)
Convolutional neural network: 74.7K papers, 2M citations (82% related)
Deep learning: 79.8K papers, 2.1M citations (82% related)
Unsupervised learning: 22.7K papers, 1M citations (81% related)
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    536
2022    1,030
2021    504
2020    530
2019    448
2018    334