Topic

Closed captioning

About: Closed captioning is a research topic. Over the lifetime, 3011 publications have been published within this topic receiving 64494 citations. The topic is also known as: CC.


Papers
Book Chapter
20 Feb 2020
TL;DR: The VizWiz-Captions dataset as mentioned in this paper consists of over 39,000 images taken by people who are blind, each paired with five captions; it is the first publicly available image captioning dataset to represent this real use case.
Abstract: While an important problem in the vision community is to design algorithms that can automatically caption images, few publicly available datasets for algorithm development directly address the interests of real users. Observing that people who are blind have relied on (human-based) image captioning services to learn about images they take for nearly a decade, we introduce the first image captioning dataset to represent this real use case. This new dataset, which we call VizWiz-Captions, consists of over 39,000 images originating from people who are blind, each paired with five captions. We analyze this dataset to (1) characterize the typical captions, (2) characterize the diversity of content found in the images, and (3) compare its content to that found in eight popular vision datasets. We also analyze modern image captioning algorithms to identify what makes this new dataset challenging for the vision community. We publicly share the dataset with captioning challenge instructions at https://vizwiz.org.

63 citations
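
The five-captions-per-image pairing is the dataset's central structure. As a rough illustration, a COCO-style grouping of caption annotations by image might look like the Python sketch below; the JSON layout and file name are invented for illustration, not the dataset's actual schema (see https://vizwiz.org for the real format).

```python
# A minimal sketch of how VizWiz-Captions-style data could be organized:
# each image paired with five human-written captions. The COCO-style
# layout below is an assumption for illustration only.
from collections import defaultdict

# Toy stand-in for an annotation file's contents (invented examples).
data = {
    "images": [{"id": 0, "file_name": "VizWiz_train_00000000.jpg"}],
    "annotations": [
        {"image_id": 0, "caption": f"caption {i} describing the photo"}
        for i in range(5)  # five captions per image, per the paper
    ],
}

def group_captions(data: dict) -> dict:
    """Group caption annotations by the image file they describe."""
    id_to_file = {img["id"]: img["file_name"] for img in data["images"]}
    captions = defaultdict(list)
    for ann in data["annotations"]:
        captions[id_to_file[ann["image_id"]]].append(ann["caption"])
    return dict(captions)

grouped = group_captions(data)
print(grouped["VizWiz_train_00000000.jpg"])  # list of five captions
```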

Patent
28 Feb 2006
TL;DR: In this article, the authors present a system and methods for integrated media monitoring, which enables users to analyze how a product or service is being advertised or otherwise conveyed to the general public.
Abstract: Systems and methods for integrated media monitoring are disclosed. The present invention enables users to analyze how a product or service is being advertised or otherwise conveyed to the general public. Via strategically placed servers, the present invention captures multiple types and sources of media for storage and analysis. Analysis includes both closed captioning analysis and human monitoring. Media search parameters are received over a network, and a near real-time hit list of occurrences of the parameters is produced and presented to the requesting user. Options for previewing and purchasing matching media segments are presented, along with corresponding reports and coverage analyses. Reports indicate the effectiveness of advertising, the tonality of editorials, and other information useful to a user looking to understand how a product or service is being conveyed to the public via the media.

63 citations
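
To make the closed-captioning analysis step concrete, here is a minimal Python sketch that matches user-supplied search parameters against captured caption segments and emits a timestamped hit list. The data structures and matching strategy are illustrative assumptions, not the patented implementation.

```python
# Sketch of a hit list over captured closed-caption text: every occurrence
# of a search term yields (source, timestamp, term). Invented example data.
import re
from dataclasses import dataclass

@dataclass
class CaptionSegment:
    source: str      # e.g. the channel or server that captured the media
    timestamp: str   # when the segment aired
    text: str        # decoded closed-caption text

def hit_list(segments, search_terms):
    """Return (source, timestamp, matched term) for every occurrence."""
    patterns = [(t, re.compile(re.escape(t), re.IGNORECASE))
                for t in search_terms]
    hits = []
    for seg in segments:
        for term, pat in patterns:
            if pat.search(seg.text):
                hits.append((seg.source, seg.timestamp, term))
    return hits

segments = [
    CaptionSegment("WXYZ-TV", "2006-02-28T18:01:12",
                   "...the new Acme widget was unveiled today..."),
    CaptionSegment("KABC-TV", "2006-02-28T18:03:40",
                   "...weather for the weekend..."),
]
print(hit_list(segments, ["Acme widget"]))  # one hit, from WXYZ-TV
```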

Proceedings Article
15 Oct 2019
TL;DR: This work proposes to explicitly model the object interactions in semantics and geometry based on Graph Convolutional Networks (GCNs), and fully exploit the alignment between linguistic words and visual semantic units for image captioning.
Abstract: Image captioning attempts to generate a sentence composed of several linguistic words, which are used to describe objects, attributes, and interactions in an image, denoted as visual semantic units in this paper. Based on this view, we propose to explicitly model the object interactions in semantics and geometry based on Graph Convolutional Networks (GCNs), and fully exploit the alignment between linguistic words and visual semantic units for image captioning. In particular, we construct a semantic graph and a geometry graph, where each node corresponds to a visual semantic unit, i.e., an object, an attribute, or a semantic (geometrical) interaction between two objects. Accordingly, the semantic (geometrical) context-aware embeddings for each unit are obtained through the corresponding GCN learning processes. At each time step, a context gated attention module takes as inputs the embeddings of the visual semantic units and hierarchically aligns the current word with these units, by first deciding which type of visual semantic unit (object, attribute, or interaction) the current word is about, and then finding the most correlated visual semantic units under this type. Extensive experiments are conducted on the challenging MS-COCO image captioning dataset, and superior results are reported when compared to state-of-the-art approaches.

63 citations
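
For readers unfamiliar with GCNs, the sketch below shows the propagation rule of a single GCN layer of the kind used here to embed visual semantic units. The toy graph and random features are assumptions for illustration; in the paper the graphs come from detected objects, attributes, and pairwise interactions.

```python
# One GCN layer: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W). Each node's new
# embedding is a normalized average of its neighbors' features, linearly
# transformed. Nodes and features here are toy assumptions.
import numpy as np

def gcn_layer(A: np.ndarray, H: np.ndarray, W: np.ndarray) -> np.ndarray:
    A_hat = A + np.eye(A.shape[0])           # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))   # symmetric normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

# Toy semantic graph: 3 units (object "dog", attribute "brown",
# interaction "chasing") with edges dog-brown and dog-chasing.
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)
H = np.random.randn(3, 8)   # initial 8-d unit embeddings
W = np.random.randn(8, 8)   # learnable weights (random here)
print(gcn_layer(A, H, W).shape)  # (3, 8): context-aware embeddings
```

Stacking a few such layers lets each unit's embedding absorb context from its neighbors, which is what makes the embeddings "context-aware".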

Journal Article
TL;DR: The Informedia Digital Video Library system as mentioned in this paper extracts information from digitized video sources and allows full content search and retrieval over all extracted data. This extracted 'metadata' enables users to rapidly find interesting news stories and to quickly identify whether a retrieved TV news story is indeed relevant to their query.
Abstract: The Informedia Digital Video Library system extracts information from digitized video sources and allows full content search and retrieval over all extracted data. This extracted 'metadata' enables users to rapidly find interesting news stories and to quickly identify whether a retrieved TV news story is indeed relevant to their query. This article highlights two unique features: named faces and location analysis. Named faces automatically associate a name with a face, while location analysis allows the user to visually follow the action in the news story on a map and also allows queries for news stories by graphically selecting a region on the map. 1. The Informedia Digital Video Library Project: The Informedia Digital Video Library project [1], initiated in 1994, uniquely utilizes integrated speech, image and natural language understanding to process broadcast video. The project's goal is to allow search and retrieval in the video medium, similar to what is available today for text only. To enable this access to video, fast, high-accuracy automatic transcriptions of broadcast news stories are generated through Carnegie Mellon's Sphinx speech recognition system, and closed captions are incorporated where available. Image processing determines scene boundaries, recognizes faces and allows for image similarity comparisons. Text visible on the screen is recognized through video OCR and can be searched. Everything is indexed into a searchable digital video library [2], where users can ask queries and retrieve relevant news stories as results.

63 citations
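
A simple way to picture "full content search" over transcripts, closed captions, and video OCR text is an inverted index that maps terms to stories. The Python sketch below uses invented story data and plain AND semantics, far simpler than Informedia's actual retrieval engine.

```python
# Inverted index over extracted text: term -> set of story ids.
# Story transcripts are invented for illustration.
from collections import defaultdict

stories = {
    "story-001": "flooding along the mississippi river forces evacuations",
    "story-002": "election results announced in the state capital",
}

index = defaultdict(set)
for story_id, transcript in stories.items():
    for term in transcript.split():
        index[term].add(story_id)

def search(query: str) -> set:
    """Return stories containing every query term (simple AND semantics)."""
    terms = query.lower().split()
    results = [index[t] for t in terms]
    return set.intersection(*results) if results else set()

print(search("mississippi flooding"))  # {'story-001'}
```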

Patent
05 Aug 2011
TL;DR: In this paper, the authors propose a method to automatically identify media content based on visually capturing a still or video image of the media content being presented to a user via another device; the identification can be further refined by determining the location of the user, capturing an audio portion of the media content, the date and time of the capture, or profile/behavioral characteristics of the user.
Abstract: Automatic identification of media content is at least partially based upon visually capturing a still or video image of media content being presented to a user via another device. The identification can be further refined by determining the location of the user, capturing an audio portion of the media content, the date and time of the capture, or profile/behavioral characteristics of the user. Identifying the media content can require (1) distinguishing a rectangular illumination that corresponds to a video display; (2) decoding a watermark presented within the displayed image/video; (3) characterizing the presentation sufficiently for determining a particular time stamp or portion of a program; and (4) determining user setting preferences for viewing the program (e.g., closed captioning, aspect ratio, language). Once identified, the appropriately formatted media content can be received for continued presentation on a user interface of the mobile device.

62 citations
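
Step (1), distinguishing the bright rectangle that corresponds to a video display, could be prototyped with standard contour detection. The sketch below uses OpenCV with illustrative thresholds; it is an assumption about how such a detector might work, not the patented method.

```python
# Find the largest bright quadrilateral in a captured photo, as a stand-in
# for locating a lit video display. Requires opencv-python; thresholds are
# illustrative assumptions.
import cv2
import numpy as np

def find_display_quad(frame: np.ndarray):
    """Return the 4 corner points of the largest bright quadrilateral."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    _, bright = cv2.threshold(gray, 200, 255, cv2.THRESH_BINARY)  # lit screen
    contours, _ = cv2.findContours(bright, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    for c in sorted(contours, key=cv2.contourArea, reverse=True):
        approx = cv2.approxPolyDP(c, 0.02 * cv2.arcLength(c, True), True)
        if len(approx) == 4:             # four corners -> candidate display
            return approx.reshape(4, 2)
    return None  # no display-like rectangle found

frame = cv2.imread("captured_photo.jpg")  # hypothetical capture
if frame is not None:
    print(find_display_quad(frame))
```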


Network Information
Related Topics (5)
Feature vector: 48.8K papers, 954.4K citations (83% related)
Object detection: 46.1K papers, 1.3M citations (82% related)
Convolutional neural network: 74.7K papers, 2M citations (82% related)
Deep learning: 79.8K papers, 2.1M citations (82% related)
Unsupervised learning: 22.7K papers, 1M citations (81% related)
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    536
2022    1,030
2021    504
2020    530
2019    448
2018    334