Topic

Closed captioning

About: Closed captioning is a research topic. Over the lifetime, 3011 publications have been published within this topic receiving 64494 citations. The topic is also known as: CC.


Papers
Book Chapter
20 Feb 2020
TL;DR: The VizWiz-Captions dataset as mentioned in this paper consists of over 39,000 images taken by people who are blind, each paired with five captions; it is the first publicly available image captioning dataset to represent this real use case.
Abstract: While an important problem in the vision community is to design algorithms that can automatically caption images, few publicly available datasets for algorithm development directly address the interests of real users. Observing that people who are blind have relied on (human-based) image captioning services to learn about images they take for nearly a decade, we introduce the first image captioning dataset to represent this real use case. This new dataset, which we call VizWiz-Captions, consists of over 39,000 images originating from people who are blind, each paired with five captions. We analyze this dataset to (1) characterize the typical captions, (2) characterize the diversity of content found in the images, and (3) compare its content to that found in eight popular vision datasets. We also analyze modern image captioning algorithms to identify what makes this new dataset challenging for the vision community. We publicly share the dataset with captioning challenge instructions at https://vizwiz.org.

63 citations
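
The five-captions-per-image pairing is the dataset's central structure. As a rough illustration, a COCO-style grouping of caption annotations by image might look like the Python sketch below; the JSON layout and file name are invented for illustration, not the dataset's actual schema (see https://vizwiz.org for the real format).

```python
# A minimal sketch of how VizWiz-Captions-style data could be organized:
# each image paired with five human-written captions. The COCO-style
# layout below is an assumption for illustration only.
from collections import defaultdict

# Toy stand-in for an annotation file's contents (invented examples).
data = {
    "images": [{"id": 0, "file_name": "VizWiz_train_00000000.jpg"}],
    "annotations": [
        {"image_id": 0, "caption": f"caption {i} describing the photo"}
        for i in range(5)  # five captions per image, per the paper
    ],
}

def group_captions(data: dict) -> dict:
    """Group caption annotations by the image file they describe."""
    id_to_file = {img["id"]: img["file_name"] for img in data["images"]}
    captions = defaultdict(list)
    for ann in data["annotations"]:
        captions[id_to_file[ann["image_id"]]].append(ann["caption"])
    return dict(captions)

grouped = group_captions(data)
print(grouped["VizWiz_train_00000000.jpg"])  # list of five captions
```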

Patent
28 Feb 2006
TL;DR: In this article, the authors present a system and methods for integrated media monitoring, which enables users to analyze how a product or service is being advertised or otherwise conveyed to the general public.
Abstract: Systems and methods for integrated media monitoring are disclosed. The present invention enables users to analyze how a product or service is being advertised or otherwise conveyed to the general public. Via strategically placed servers, the present invention captures multiple types and sources of media for storage and analysis. Analysis includes both closed captioning analysis and human monitoring. Media search parameters are received over a network, and a near real-time hit list of occurrences of the parameters is produced and presented to the requesting user. Options for previewing and purchasing matching media segments are presented, along with corresponding reports and coverage analyses. Reports indicate the effectiveness of advertising, the tonality of editorials, and other information useful to a user looking to understand how a product or service is being conveyed to the public via the media.

63 citations
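
To make the closed-captioning analysis step concrete, here is a minimal Python sketch that matches user-supplied search parameters against captured caption segments and emits a timestamped hit list. The data structures and matching strategy are illustrative assumptions, not the patented implementation.

```python
# Sketch of a hit list over captured closed-caption text: every occurrence
# of a search term yields (source, timestamp, term). Invented example data.
import re
from dataclasses import dataclass

@dataclass
class CaptionSegment:
    source: str      # e.g. the channel or server that captured the media
    timestamp: str   # when the segment aired
    text: str        # decoded closed-caption text

def hit_list(segments, search_terms):
    """Return (source, timestamp, matched term) for every occurrence."""
    patterns = [(t, re.compile(re.escape(t), re.IGNORECASE))
                for t in search_terms]
    hits = []
    for seg in segments:
        for term, pat in patterns:
            if pat.search(seg.text):
                hits.append((seg.source, seg.timestamp, term))
    return hits

segments = [
    CaptionSegment("WXYZ-TV", "2006-02-28T18:01:12",
                   "...the new Acme widget was unveiled today..."),
    CaptionSegment("KABC-TV", "2006-02-28T18:03:40",
                   "...weather for the weekend..."),
]
print(hit_list(segments, ["Acme widget"]))  # one hit, from WXYZ-TV
```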

Proceedings Article
15 Oct 2019
TL;DR: This work proposes to explicitly model the object interactions in semantics and geometry based on Graph Convolutional Networks (GCNs), and fully exploit the alignment between linguistic words and visual semantic units for image captioning.
Abstract: Image captioning attempts to generate a sentence composed of several linguistic words, which are used to describe objects, attributes, and interactions in an image, denoted as visual semantic units in this paper. Based on this view, we propose to explicitly model the object interactions in semantics and geometry based on Graph Convolutional Networks (GCNs), and fully exploit the alignment between linguistic words and visual semantic units for image captioning. In particular, we construct a semantic graph and a geometry graph, where each node corresponds to a visual semantic unit, i.e., an object, an attribute, or a semantic (geometrical) interaction between two objects. Accordingly, the semantic (geometrical) context-aware embeddings for each unit are obtained through the corresponding GCN learning processes. At each time step, a context gated attention module takes as inputs the embeddings of the visual semantic units and hierarchically aligns the current word with these units, by first deciding which type of visual semantic unit (object, attribute, or interaction) the current word is about, and then finding the most correlated visual semantic units under this type. Extensive experiments are conducted on the challenging MS-COCO image captioning dataset, and superior results are reported when compared to state-of-the-art approaches.

63 citations
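
For readers unfamiliar with GCNs, the sketch below shows the propagation rule of a single GCN layer of the kind used here to embed visual semantic units. The toy graph and random features are assumptions for illustration; in the paper the graphs come from detected objects, attributes, and pairwise interactions.

```python
# One GCN layer: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W). Each node's new
# embedding is a normalized average of its neighbors' features, linearly
# transformed. Nodes and features here are toy assumptions.
import numpy as np

def gcn_layer(A: np.ndarray, H: np.ndarray, W: np.ndarray) -> np.ndarray:
    A_hat = A + np.eye(A.shape[0])           # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))   # symmetric normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

# Toy semantic graph: 3 units (object "dog", attribute "brown",
# interaction "chasing") with edges dog-brown and dog-chasing.
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)
H = np.random.randn(3, 8)   # initial 8-d unit embeddings
W = np.random.randn(8, 8)   # learnable weights (random here)
print(gcn_layer(A, H, W).shape)  # (3, 8): context-aware embeddings
```

Stacking a few such layers lets each unit's embedding absorb context from its neighbors, which is what makes the embeddings "context-aware".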

Journal Article
TL;DR: The Informedia Digital Video Library system as mentioned in this paper extracts information from digitized video sources and allows full content search and retrieval over all extracted data. This extracted 'metadata' enables users to rapidly find interesting news stories and to quickly identify whether a retrieved TV news story is indeed relevant to their query.
Abstract: The Informedia Digital Video Library system extracts information from digitized video sources and allows full content search and retrieval over all extracted data. This extracted 'metadata' enables users to rapidly find interesting news stories and to quickly identify whether a retrieved TV news story is indeed relevant to their query. This article highlights two unique features: named faces and location analysis. Named faces automatically associate a name with a face, while location analysis allows the user to visually follow the action in the news story on a map and also allows queries for news stories by graphically selecting a region on the map. 1. The Informedia Digital Video Library Project: The Informedia Digital Video Library project [1], initiated in 1994, uniquely utilizes integrated speech, image and natural language understanding to process broadcast video. The project's goal is to allow search and retrieval in the video medium, similar to what is available today for text only. To enable this access to video, fast, high-accuracy automatic transcriptions of broadcast news stories are generated through Carnegie Mellon's Sphinx speech recognition system, and closed captions are incorporated where available. Image processing determines scene boundaries, recognizes faces and allows for image similarity comparisons. Text visible on the screen is recognized through video OCR and can be searched. Everything is indexed into a searchable digital video library [2], where users can ask queries and retrieve relevant news stories as results.

63 citations
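
A simple way to picture "full content search" over transcripts, closed captions, and video OCR text is an inverted index that maps terms to stories. The Python sketch below uses invented story data and plain AND semantics, far simpler than Informedia's actual retrieval engine.

```python
# Inverted index over extracted text: term -> set of story ids.
# Story transcripts are invented for illustration.
from collections import defaultdict

stories = {
    "story-001": "flooding along the mississippi river forces evacuations",
    "story-002": "election results announced in the state capital",
}

index = defaultdict(set)
for story_id, transcript in stories.items():
    for term in transcript.split():
        index[term].add(story_id)

def search(query: str) -> set:
    """Return stories containing every query term (simple AND semantics)."""
    terms = query.lower().split()
    results = [index[t] for t in terms]
    return set.intersection(*results) if results else set()

print(search("mississippi flooding"))  # {'story-001'}
```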

Patent
05 Aug 2011
TL;DR: In this paper, the authors propose a method to automatically identify media content based on visually capturing a still or video image of the media content being presented to a user via another device; the identification can be further refined by determining the location of the user, capturing an audio portion of the media content, the date and time of the capture, or profile/behavioral characteristics of the user.
Abstract: Automatic identification of media content is at least partially based upon visually capturing a still or video image of media content being presented to a user via another device. The identification can be further refined by determining the location of the user, capturing an audio portion of the media content, the date and time of the capture, or profile/behavioral characteristics of the user. Identifying the media content can require (1) distinguishing a rectangular illumination that corresponds to a video display; (2) decoding a watermark presented within the displayed image/video; (3) characterizing the presentation sufficiently for determining a particular time stamp or portion of a program; and (4) determining user setting preferences for viewing the program (e.g., closed captioning, aspect ratio, language). Once identified, the appropriately formatted media content can be received for continued presentation on a user interface of the mobile device.

62 citations
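
Step (1), distinguishing the bright rectangle that corresponds to a video display, could be prototyped with standard contour detection. The sketch below uses OpenCV with illustrative thresholds; it is an assumption about how such a detector might work, not the patented method.

```python
# Find the largest bright quadrilateral in a captured photo, as a stand-in
# for locating a lit video display. Requires opencv-python; thresholds are
# illustrative assumptions.
import cv2
import numpy as np

def find_display_quad(frame: np.ndarray):
    """Return the 4 corner points of the largest bright quadrilateral."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    _, bright = cv2.threshold(gray, 200, 255, cv2.THRESH_BINARY)  # lit screen
    contours, _ = cv2.findContours(bright, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    for c in sorted(contours, key=cv2.contourArea, reverse=True):
        approx = cv2.approxPolyDP(c, 0.02 * cv2.arcLength(c, True), True)
        if len(approx) == 4:             # four corners -> candidate display
            return approx.reshape(4, 2)
    return None  # no display-like rectangle found

frame = cv2.imread("captured_photo.jpg")  # hypothetical capture
if frame is not None:
    print(find_display_quad(frame))
```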


Network Information
Related Topics (5)
Feature vector: 48.8K papers, 954.4K citations (83% related)
Object detection: 46.1K papers, 1.3M citations (82% related)
Convolutional neural network: 74.7K papers, 2M citations (82% related)
Deep learning: 79.8K papers, 2.1M citations (82% related)
Unsupervised learning: 22.7K papers, 1M citations (81% related)
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    536
2022    1,030
2021    504
2020    530
2019    448
2018    334