Topic

Closed captioning

About: Closed captioning is a research topic. Over its lifetime, 3,011 publications have been published within this topic, receiving 64,494 citations. The topic is also known as: CC.


Papers
Patent
21 Mar 2005
TL;DR: In this paper, a subscription-based system for delivering transcribed audio information to one or more mobile devices is presented; it includes a subscription gateway configured for live/current transfer of the transcribed data to the mobile devices.
Abstract: A subscription-based system provides transcribed audio information to one or more mobile devices. Some techniques feature a system for providing subscription services for currently-generated (e.g., not stored) information (e.g., caption information, transcribed audio) to one or more mobile devices for a live/current audio event. There can be a communication network for communicating with the one or more mobile devices and a transcriber configured for transcribing the event to generate information (e.g., caption information, transcribed audio). Caption data includes transcribed data and control code data. The system includes a subscription gateway configured for live/current transfer of the transcribed data to the one or more mobile devices and for providing those devices access to the transcribed data. User preferences for subscribers can be set and/or updated by mobile device users and/or GPS-capable mobile devices to receive feeds for the live/current audio event.
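To make the described data flow concrete, below is a minimal Python sketch of the kind of subscription gateway the patent outlines: a transcriber publishes caption packets (transcribed text plus control codes) and the gateway fans them out live to subscribed mobile devices. All class and method names here are hypothetical illustrations, not the patent's implementation.

```python
# Minimal sketch of a subscription gateway for live caption feeds.
# Class and method names are hypothetical illustrations, not the
# patent's actual implementation.
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class CaptionPacket:
    """Caption data = transcribed text plus control code data (e.g., positioning)."""
    text: str
    control_codes: List[str] = field(default_factory=list)


class SubscriptionGateway:
    """Routes live caption packets to subscribed mobile devices."""

    def __init__(self) -> None:
        # device_id -> callback that delivers a packet to that device
        self._subscribers: Dict[str, Callable[[CaptionPacket], None]] = {}

    def subscribe(self, device_id: str, deliver: Callable[[CaptionPacket], None]) -> None:
        self._subscribers[device_id] = deliver

    def unsubscribe(self, device_id: str) -> None:
        self._subscribers.pop(device_id, None)

    def publish(self, packet: CaptionPacket) -> None:
        # Live/current transfer: fan the packet out as soon as it is transcribed.
        for deliver in self._subscribers.values():
            deliver(packet)


# Usage: a transcriber pushes packets to subscribers as the live event unfolds.
gateway = SubscriptionGateway()
gateway.subscribe("device-001", lambda p: print(f"device-001 <- {p.text}"))
gateway.publish(CaptionPacket(text="Welcome to the keynote.", control_codes=["ROLL_UP"]))
```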

53 citations

Journal ArticleDOI
TL;DR: A mechanism of fine-grained, semantic-guided visual attention is created that accurately links the relevant visual information with each semantic meaning inside the text; the resulting model significantly outperforms all other methods that use VGG-based CNN encoders without fine-tuning.
Abstract: The soft-attention mechanism is regarded as one of the representative methods for image captioning. Based on the end-to-end convolutional neural network (CNN)-long short term memory (LSTM) framework, the soft-attention mechanism attempts to link the semantic representation in text (i.e., captioning) with relevant visual information in the image for the first time. Motivated by this approach, several state-of-the-art attention methods are proposed. However, due to the constraints of CNN architecture, the given image is only segmented into a fixed-resolution grid at a coarse level. The visual feature extracted from each grid cell indiscriminately fuses all of the objects and/or object portions inside it. There is no semantic link between grid cells. In addition, the large area “stuff” (e.g., the sky or a beach) cannot be represented using the current methods. To address these problems, this paper proposes a new model based on the fully convolutional network (FCN)-LSTM framework, which can generate an attention map at a fine-grained grid-wise resolution. Moreover, the visual feature of each grid cell is contributed only by the principal object. By adopting the grid-wise labels (i.e., semantic segmentation), the visual representations of different grid cells are correlated to each other. With the ability to attend to large area “stuff,” our method can further summarize an additional semantic context from semantic labels. This method can provide comprehensive context information to the language LSTM decoder. In this way, a mechanism of fine-grained and semantic-guided visual attention is created, which can accurately link the relevant visual information with each semantic meaning inside the text. As demonstrated by three experiments including both qualitative and quantitative analyses, our model can generate captions of high quality, with high levels of accuracy, completeness, and diversity. Moreover, our model significantly outperforms all other methods that use VGG-based CNN encoders without fine-tuning.
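For readers who want the gist in code, the following is a minimal PyTorch sketch of grid-wise soft attention over a fine-grained feature map feeding an LSTM captioning decoder, in the spirit of the FCN-LSTM model above. Layer sizes, the fusion scheme, and the use of a plain mean-style attention context are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of grid-wise soft attention over a fine-grained feature map
# feeding an LSTM captioning decoder. Illustrative only; sizes and fusion
# choices are assumptions, not the paper's exact FCN-LSTM model.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GridAttentionDecoder(nn.Module):
    def __init__(self, feat_dim=512, hidden_dim=512, vocab_size=10000, embed_dim=300):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.att_feat = nn.Linear(feat_dim, hidden_dim)
        self.att_hid = nn.Linear(hidden_dim, hidden_dim)
        self.att_score = nn.Linear(hidden_dim, 1)
        self.lstm = nn.LSTMCell(embed_dim + feat_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def step(self, word_ids, feats, state):
        # feats: (B, N, feat_dim) -- one vector per fine-grained grid cell
        h, c = state
        # Attention: score each grid cell against the current hidden state.
        scores = self.att_score(torch.tanh(self.att_feat(feats) + self.att_hid(h).unsqueeze(1)))
        alpha = F.softmax(scores, dim=1)              # (B, N, 1) attention map
        context = (alpha * feats).sum(dim=1)          # attended visual context
        h, c = self.lstm(torch.cat([self.embed(word_ids), context], dim=1), (h, c))
        return self.out(h), (h, c), alpha


# Usage with random stand-ins for a fine-grained feature map (e.g., a 28x28 grid = 784 cells).
B, N, D, H = 2, 784, 512, 512
decoder = GridAttentionDecoder()
feats = torch.randn(B, N, D)
state = (torch.zeros(B, H), torch.zeros(B, H))
logits, state, alpha = decoder.step(torch.tensor([1, 2]), feats, state)
```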

53 citations

Proceedings ArticleDOI
09 Jul 2020
TL;DR: In this article, the authors propose a visual reasoning approach for video captioning, named Reasoning Module Networks (RMN), to equip the existing encoder-decoder framework with the above reasoning capacity.
Abstract: Generating natural language descriptions for videos, i.e., video captioning, essentially requires step-by-step reasoning along the generation process. For example, to generate the sentence "a man is shooting a basketball", we need to first locate and describe the subject "man", next reason out that the man is "shooting", and then describe the object of the shooting, "basketball". However, existing visual reasoning methods designed for visual question answering are not appropriate for video captioning, because it requires more complex visual reasoning on videos over both space and time, as well as dynamic module composition along the generation process. In this paper, we propose a novel visual reasoning approach for video captioning, named Reasoning Module Networks (RMN), to equip the existing encoder-decoder framework with the above reasoning capacity. Specifically, our RMN employs 1) three sophisticated spatio-temporal reasoning modules, and 2) a dynamic and discrete module selector trained by a linguistic loss with a Gumbel approximation. Extensive experiments on the MSVD and MSR-VTT datasets demonstrate that the proposed RMN outperforms the state-of-the-art methods while providing an explicit and explainable generation process. Our code is available at this https URL.
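The discrete module selection described above can be illustrated with a Gumbel-Softmax approximation, as in the minimal PyTorch sketch below. The three "reasoning modules" are placeholder linear layers standing in for the paper's spatio-temporal modules; names and sizes are assumptions, not the RMN implementation.

```python
# Minimal sketch of discrete module selection with a straight-through
# Gumbel-Softmax approximation, in the spirit of module-selection approaches
# like RMN. The "reasoning modules" are placeholder linear layers.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModuleSelector(nn.Module):
    def __init__(self, hidden_dim=512, n_modules=3):
        super().__init__()
        # Placeholder modules standing in for spatio-temporal reasoning modules.
        self.modules_list = nn.ModuleList(
            [nn.Linear(hidden_dim, hidden_dim) for _ in range(n_modules)]
        )
        self.selector = nn.Linear(hidden_dim, n_modules)

    def forward(self, h, tau=1.0):
        # h: (B, hidden_dim) decoder state at the current generation step.
        logits = self.selector(h)
        # hard=True gives a discrete (one-hot) choice in the forward pass while
        # keeping gradients via the straight-through estimator.
        weights = F.gumbel_softmax(logits, tau=tau, hard=True)        # (B, n_modules)
        outputs = torch.stack([m(h) for m in self.modules_list], 1)   # (B, n_modules, hidden_dim)
        return (weights.unsqueeze(-1) * outputs).sum(dim=1), weights


selector = ModuleSelector()
h = torch.randn(2, 512)
fused, choice = selector(h)   # `choice` shows which module each sample used at this step
```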

53 citations

Proceedings ArticleDOI
01 Oct 2016
TL;DR: Wang et al. propose a novel video captioning framework, termed Bidirectional Long-Short Term Memory (BiLSTM), which deeply captures bidirectional global temporal structure in video.
Abstract: Video captioning has been attracting broad research attention in the multimedia community. However, most existing approaches either ignore temporal information among video frames or employ only local contextual temporal knowledge. In this work, we propose a novel video captioning framework, termed Bidirectional Long-Short Term Memory (BiLSTM), which deeply captures bidirectional global temporal structure in video. Specifically, we first devise a joint visual modelling approach to encode video data by combining a forward LSTM pass and a backward LSTM pass together with visual features from Convolutional Neural Networks (CNNs). Then, we inject the derived video representation into the subsequent language model for initialization. The benefits are two-fold: 1) comprehensively preserving sequential and visual information; and 2) adaptively learning dense visual features and sparse semantic representations for videos and sentences, respectively. We verify the effectiveness of our proposed video captioning framework on a commonly-used benchmark, i.e., the Microsoft Video Description (MSVD) corpus, and the experimental results demonstrate the superiority of the proposed approach compared to several state-of-the-art methods.
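As a rough illustration of the encoding step, the sketch below runs a bidirectional LSTM over per-frame CNN features and pools the result into a vector used to initialize the language decoder. Feature dimensions and the pooling/initialization scheme are illustrative assumptions rather than the exact BiLSTM model from the paper.

```python
# Minimal sketch of bidirectional temporal encoding of per-frame CNN features,
# with the pooled video representation used to initialize a language LSTM.
# Dimensions and the pooling scheme are illustrative assumptions.
import torch
import torch.nn as nn


class BiLSTMVideoEncoder(nn.Module):
    def __init__(self, feat_dim=2048, hidden_dim=512):
        super().__init__()
        # Forward and backward passes over the frame sequence.
        self.bilstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.to_init = nn.Linear(2 * hidden_dim, hidden_dim)

    def forward(self, frame_feats):
        # frame_feats: (B, T, feat_dim) -- e.g., CNN features for T sampled frames.
        outputs, _ = self.bilstm(frame_feats)        # (B, T, 2*hidden_dim)
        video_repr = outputs.mean(dim=1)             # pool over time
        return torch.tanh(self.to_init(video_repr))  # used to initialize the language LSTM


encoder = BiLSTMVideoEncoder()
frames = torch.randn(4, 30, 2048)                    # 4 videos, 30 frames each
h0 = encoder(frames)                                 # (4, 512) initial decoder state
```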

53 citations

Posted Content
TL;DR: Two different models are proposed, which employ different schemes for injecting sentiments into image captions, and the experimental results show that the proposed models outperform the state-of-the-art models in generating sentimental (i.e., sentiment-bearing) image captions.
Abstract: Automatic image captioning has recently approached human-level performance due to the latest advances in computer vision and natural language understanding. However, most of the current models can only generate plain factual descriptions about the content of a given image. For human beings, in contrast, image caption writing is quite flexible and diverse: additional language dimensions, such as emotion, humor, and language style, are often incorporated to produce diverse, emotional, or appealing captions. In particular, we are interested in generating sentiment-conveying image descriptions, which has received little attention. The main challenge is how to effectively inject sentiments into the generated captions without altering the semantic matching between the visual content and the generated descriptions. In this work, we propose two different models, which employ different schemes for injecting sentiments into image captions. Compared with the few existing approaches, the proposed models are much simpler and yet more effective. The experimental results show that our models outperform the state-of-the-art models in generating sentimental (i.e., sentiment-bearing) image captions. In addition, we can easily manipulate the model by assigning different sentiments to the testing image to generate captions with the corresponding sentiments.
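One simple way to inject a sentiment signal into a caption decoder, sketched below, is to concatenate a learned sentiment embedding to the word embedding at every decoding step. This is only an illustrative scheme; it is not claimed to be either of the two models proposed in the paper.

```python
# Minimal sketch of sentiment injection into a caption decoder by concatenating
# a learned sentiment embedding to the word embedding at each step.
# An illustrative scheme, not the paper's proposed models.
import torch
import torch.nn as nn


class SentimentCaptionDecoder(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=300, sent_dim=32, hidden_dim=512, n_sentiments=3):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, embed_dim)
        self.sent_embed = nn.Embedding(n_sentiments, sent_dim)   # e.g., negative / neutral / positive
        self.lstm = nn.LSTMCell(embed_dim + sent_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def step(self, word_ids, sentiment_ids, state):
        # The sentiment embedding conditions every decoding step on the requested sentiment.
        x = torch.cat([self.word_embed(word_ids), self.sent_embed(sentiment_ids)], dim=1)
        h, c = self.lstm(x, state)
        return self.out(h), (h, c)


decoder = SentimentCaptionDecoder()
state = (torch.zeros(2, 512), torch.zeros(2, 512))
# Ask for a positive caption for sample 0 and a negative one for sample 1.
logits, state = decoder.step(torch.tensor([1, 1]), torch.tensor([2, 0]), state)
```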

53 citations


Network Information
Related Topics (5)
Feature vector: 48.8K papers, 954.4K citations, 83% related
Object detection: 46.1K papers, 1.3M citations, 82% related
Convolutional neural network: 74.7K papers, 2M citations, 82% related
Deep learning: 79.8K papers, 2.1M citations, 82% related
Unsupervised learning: 22.7K papers, 1M citations, 81% related
Performance

Metrics

No. of papers in the topic in previous years:

Year    Papers
2023    536
2022    1,030
2021    504
2020    530
2019    448
2018    334