Topic

Closed captioning

About: Closed captioning is a research topic. Over its lifetime, 3,011 publications have been published within this topic, receiving 64,494 citations. The topic is also known as: CC.


Papers
Journal ArticleDOI
TL;DR: The proposed framework for captioning remote sensing images is based on multi-level attention and multi-label attribute graph convolution; its attention module can adaptively focus not only on specific spatial features but also on features of specific scales.

Abstract: Remote sensing image captioning, which aims to understand high-level semantic information and the interactions of different ground objects, has emerged as a new research topic in recent years. Though image captioning has developed rapidly with convolutional neural networks (CNNs) and recurrent neural networks (RNNs), the captioning task for remote sensing images still suffers from two main limitations. One limitation is that the scales of objects in remote sensing images vary dramatically, which makes it difficult to obtain an effective image representation. Another limitation is that the visual relationships in remote sensing images are still underused, despite their great potential to improve the final performance. To address these two limitations, an effective framework for captioning remote sensing images is proposed in this paper. The framework is based on multi-level attention and multi-label attribute graph convolution. Specifically, the proposed multi-level attention module can adaptively focus not only on specific spatial features but also on features of specific scales. Moreover, the designed attribute graph convolution module employs the attribute graph to learn more effective attribute features for image captioning. Extensive experiments are conducted, and the proposed method achieves superior performance on the UCM-captions, Sydney-captions, and RSICD datasets.

28 citations
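A minimal sketch of the multi-level attention idea described in the abstract above: attend first over spatial positions within each scale, then over the scales themselves, conditioned on a decoder state. Module names, dimensions, and the scoring functions are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of multi-level (spatial + scale) attention; not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLevelAttention(nn.Module):
    """Attend over spatial positions within each scale, then over scales,
    given a decoder hidden state."""

    def __init__(self, feat_dim: int, hidden_dim: int):
        super().__init__()
        self.spatial_score = nn.Linear(feat_dim + hidden_dim, 1)
        self.scale_score = nn.Linear(feat_dim + hidden_dim, 1)

    def forward(self, scale_feats, h):
        # scale_feats: list of tensors, each (batch, regions_at_scale, feat_dim)
        # h: decoder hidden state, (batch, hidden_dim)
        pooled = []
        for feats in scale_feats:
            h_exp = h.unsqueeze(1).expand(-1, feats.size(1), -1)
            scores = self.spatial_score(torch.cat([feats, h_exp], dim=-1))
            alpha = F.softmax(scores, dim=1)               # spatial attention per scale
            pooled.append((alpha * feats).sum(dim=1))      # (batch, feat_dim)
        pooled = torch.stack(pooled, dim=1)                # (batch, n_scales, feat_dim)
        h_exp = h.unsqueeze(1).expand(-1, pooled.size(1), -1)
        beta = F.softmax(self.scale_score(torch.cat([pooled, h_exp], dim=-1)), dim=1)
        return (beta * pooled).sum(dim=1)                  # scale-weighted context vector

# Toy usage: two scales with 49 and 196 regions.
att = MultiLevelAttention(feat_dim=512, hidden_dim=512)
feats = [torch.randn(2, 49, 512), torch.randn(2, 196, 512)]
context = att(feats, torch.randn(2, 512))
```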

Patent
02 Apr 1998
TL;DR: In this article, a video communications device is presented that includes a camera and a teletype device (TTY) for transmitting and receiving teletype information in a video-conferencing arrangement.
Abstract: A video communications device used as part of a communication terminal in a video-conferencing arrangement provides real-time captioning along with real-time visual communication for individuals who are hearing- or language-impaired and others whose speech is not understandable or is non-existent. The device enhances the ability of people with communication disabilities to communicate quickly and effectively with those who are similarly affected as well as with those who are not. In one example embodiment, the video communications device includes a camera and a teletype device (TTY) for transmitting and receiving teletype information. The camera captures local images and generates a set of video signals representing those images. The teletype device captures input data from a user and generates a set of data signals. The device can be configured for compatibility with conventional equipment and for alerting users to incoming calls non-audibly.

28 citations
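Purely as an illustration of the data flow the patent describes (camera frames plus TTY text carried together in one conferencing stream), here is a hypothetical sketch; the structures and field names are invented for illustration and are not taken from the patent.

```python
# Illustrative only: pairing TTY text input with captured video frames.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Frame:
    timestamp_ms: int
    pixels: bytes                  # encoded video frame from the camera
    caption: Optional[str] = None  # TTY characters entered since the last frame

@dataclass
class ConferenceStream:
    frames: List[Frame] = field(default_factory=list)

    def add_frame(self, timestamp_ms: int, pixels: bytes, tty_buffer: str) -> None:
        """Attach any pending TTY text to the outgoing video frame."""
        self.frames.append(Frame(timestamp_ms, pixels, tty_buffer or None))

stream = ConferenceStream()
stream.add_frame(0, b"<frame-0>", "HELLO GA")   # "GA" = go ahead, a TTY convention
stream.add_frame(33, b"<frame-1>", "")          # no new TTY text for this frame
```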

Proceedings ArticleDOI
21 Oct 2013
TL;DR: A fully automatic system from raw data gathering to navigation over heterogeneous news sources, able to extract and study the trend of topics in the news and detect interesting peaks in news coverage over the life of the topic is presented.
Abstract: We present a fully automatic system from raw data gathering to navigation over heterogeneous news sources, including over 18k hours of broadcast video news, 3.58M online articles, and 430M public Twitter messages. Our system addresses the challenge of extracting "who," "what," "when," and "where" from a truly multimodal perspective, leveraging audiovisual information in broadcast news and that embedded in articles, as well as textual cues in closed captions and in the raw document content of articles and social media. By performing this analysis over time, we are able to extract and study the trend of topics in the news and detect interesting peaks in news coverage over the life of a topic. We visualize these peaks in trending news topics using automatically extracted keywords and iconic images, and introduce a novel multimodal algorithm for naming speakers in the news. We also present several intuitive navigation interfaces for interacting with these complex topic structures over different news sources.

28 citations
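As a rough illustration of the coverage-peak detection mentioned in the abstract above, the following sketch flags days whose mention count rises well above a trailing window; the window size and threshold are assumptions, since the paper's actual detector is not described here.

```python
# Hypothetical peak detector over daily topic-mention counts (not the paper's method).
from statistics import mean, stdev
from typing import List

def detect_peaks(daily_counts: List[int], window: int = 7, k: float = 2.0) -> List[int]:
    """Return indices of days whose count exceeds the trailing mean by k std devs."""
    peaks = []
    for i in range(window, len(daily_counts)):
        history = daily_counts[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if daily_counts[i] > mu + k * max(sigma, 1e-6):
            peaks.append(i)
    return peaks

counts = [12, 15, 11, 14, 13, 12, 16, 90, 85, 20, 14]  # toy mention counts per day
print(detect_peaks(counts))  # -> [7, 8]
```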

Proceedings ArticleDOI
Pierre L. Dognin, Igor Melnyk, Youssef Mroueh, Jarret Ross, Tom Sercu
15 Jun 2019
TL;DR: In this article, a context-aware LSTM captioner and a co-attentive discriminator are proposed to enforce semantic alignment between images and captions, which improves the performance of captioning models.
Abstract: In this paper, we study image captioning as conditional GAN training, proposing both a context-aware LSTM captioner and a co-attentive discriminator that enforces semantic alignment between images and captions. We empirically focus on the viability of two training methods, Self-critical Sequence Training (SCST) and Gumbel Straight-Through (ST), and demonstrate that SCST shows more stable gradient behavior and improved results over Gumbel ST, even without accessing discriminator gradients directly. We also address the problem of automatic evaluation for captioning models, introducing a new semantic score and showing its correlation with human judgement. As an evaluation paradigm, we argue that an important criterion for a captioner is the ability to generalize to compositions of objects that do not usually co-occur together. To this end, we introduce a small captioned Out of Context (OOC) test set. The OOC set, combined with our semantic score, forms the proposed new diagnostic toolkit for the captioning community. When evaluated on the OOC and MS-COCO benchmarks, SCST-based training shows strong performance in both semantic score and human evaluation, promising to be a valuable new approach for efficient discrete GAN training.

28 citations
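A hedged sketch of the self-critical sequence training (SCST) update discussed above: the reward of the greedily decoded caption serves as the baseline for the sampled caption, so no learned value function is needed. The reward itself (CIDEr, the proposed semantic score, or a discriminator signal) is left as an input; the function and variable names are illustrative.

```python
# SCST-style REINFORCE update with a greedy-decoding baseline (sketch, not the paper's code).
import torch

def scst_loss(sample_logprobs: torch.Tensor,
              sample_reward: torch.Tensor,
              greedy_reward: torch.Tensor) -> torch.Tensor:
    """sample_logprobs: (batch, seq_len) log-probs of the sampled caption tokens.
    sample_reward / greedy_reward: (batch,) sequence-level scores
    (e.g. CIDEr, a semantic score, or a discriminator signal)."""
    advantage = (sample_reward - greedy_reward).unsqueeze(1)   # (batch, 1)
    # REINFORCE with the greedy rollout as baseline; the advantage is not differentiated.
    return -(advantage.detach() * sample_logprobs).sum(dim=1).mean()

# Toy numbers: captions whose sampled reward beats the greedy baseline are reinforced.
logp = torch.full((2, 5), -1.0, requires_grad=True)
loss = scst_loss(logp, torch.tensor([0.8, 0.3]), torch.tensor([0.5, 0.6]))
loss.backward()
```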

Journal ArticleDOI
TL;DR: This paper proposes a novel boosted transformer model with two attention modules for image captioning, i.e., “Concept-Guided Attention” (CGA) and “Vision-Guided Attention” (VGA); CGA is used in the encoder to obtain boosted visual features by integrating instance-level concepts into the visual features.
Abstract: Image captioning attempts to generate a description of a given image, usually taking a convolutional neural network as the encoder to extract visual features and a sequence model as the decoder to generate the description; among sequence models, the self-attention mechanism has recently achieved notable progress. However, this predominant encoder-decoder architecture still has problems to be solved. On the encoder side, without semantic concepts, the extracted visual features do not make full use of the image information. On the decoder side, sequence self-attention relies only on word representations, lacks the guidance of visual information, and is easily influenced by the language prior. In this paper, we propose a novel boosted transformer model with two attention modules to address these problems, i.e., “Concept-Guided Attention” (CGA) and “Vision-Guided Attention” (VGA). Our model utilizes CGA in the encoder to obtain boosted visual features by integrating instance-level concepts into the visual features. In the decoder, we stack VGA, which uses visual information as a bridge to model internal relationships among the sequences and serves as an auxiliary module to sequence self-attention. Quantitative and qualitative results on the Microsoft COCO dataset demonstrate that our model outperforms state-of-the-art approaches.

28 citations
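A rough sketch of the “Concept-Guided Attention” idea described above: visual region features attend over embeddings of detected concepts, and the result is fused back into the visual stream to form the “boosted” features. The cross-attention form, dimensions, and residual fusion are assumptions rather than the paper's exact design.

```python
# Hypothetical concept-guided attention module (illustrative, not the authors' code).
import torch
import torch.nn as nn

class ConceptGuidedAttention(nn.Module):
    def __init__(self, dim: int, n_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, visual: torch.Tensor, concepts: torch.Tensor) -> torch.Tensor:
        # visual:   (batch, n_regions, dim)  CNN region features
        # concepts: (batch, n_concepts, dim) embeddings of predicted instance-level concepts
        attended, _ = self.cross_attn(query=visual, key=concepts, value=concepts)
        return self.norm(visual + attended)   # residual fusion -> "boosted" visual features

# Toy usage: 36 regions attending over 10 concept embeddings.
cga = ConceptGuidedAttention(dim=512)
boosted = cga(torch.randn(2, 36, 512), torch.randn(2, 10, 512))
```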


Network Information
Related Topics (5)
Feature vector: 48.8K papers, 954.4K citations, 83% related
Object detection: 46.1K papers, 1.3M citations, 82% related
Convolutional neural network: 74.7K papers, 2M citations, 82% related
Deep learning: 79.8K papers, 2.1M citations, 82% related
Unsupervised learning: 22.7K papers, 1M citations, 81% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    536
2022    1,030
2021    504
2020    530
2019    448
2018    334