Open Access · Posted Content

Bidirectional Long-Short Term Memory for Video Description

TLDR
A novel video captioning framework, termed BiLSTM, which deeply captures bidirectional global temporal structure in video, comprehensively preserves sequential and visual information, and adaptively learns dense visual features and sparse semantic representations for videos and sentences, respectively.
Abstract
Video captioning has been attracting broad research attention in the multimedia community. However, most existing approaches either ignore temporal information among video frames or employ only local contextual temporal knowledge. In this work, we propose a novel video captioning framework, termed Bidirectional Long-Short Term Memory (BiLSTM), which deeply captures bidirectional global temporal structure in video. Specifically, we first devise a joint visual modelling approach that encodes video data by combining a forward LSTM pass, a backward LSTM pass, and visual features from Convolutional Neural Networks (CNNs). Then, we inject the derived video representation into the subsequent language model for initialization. The benefits are twofold: 1) sequential and visual information is comprehensively preserved; and 2) dense visual features and sparse semantic representations are adaptively learned for videos and sentences, respectively. We verify the effectiveness of the proposed framework on a commonly used benchmark, the Microsoft Video Description (MSVD) corpus, and the experimental results demonstrate the superiority of the proposed approach over several state-of-the-art methods.
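
The encoder the abstract describes can be sketched in a few lines of PyTorch. The fragment below is a minimal illustration under stated assumptions (the module name VideoBiLSTMEncoder, the feature and hidden sizes, and the mean-pooling fusion are all hypothetical), not the authors' released implementation:

import torch
import torch.nn as nn

class VideoBiLSTMEncoder(nn.Module):
    # Sketch: CNN frame features -> forward + backward LSTM passes,
    # merged with pooled visual features into a single video code.
    def __init__(self, feat_dim=4096, hidden_dim=512):
        super().__init__()
        # bidirectional=True runs the forward and backward passes jointly
        self.bilstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
        self.fuse = nn.Linear(2 * hidden_dim + feat_dim, hidden_dim)

    def forward(self, frame_feats):            # (batch, frames, feat_dim)
        states, _ = self.bilstm(frame_feats)   # (batch, frames, 2*hidden)
        temporal = states.mean(dim=1)          # global temporal summary
        visual = frame_feats.mean(dim=1)       # global visual summary
        return torch.tanh(self.fuse(torch.cat([temporal, visual], dim=1)))

The resulting video code would then seed the sentence-generating LSTM, e.g. as its initial hidden state, which is the initialization step the abstract refers to.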


Citations
Journal ArticleDOI

GLA: Global–Local Attention for Image Description

TL;DR: The proposed GLA method generates more relevant image description sentences and achieves state-of-the-art performance on the well-known Microsoft COCO caption dataset across several popular evaluation metrics.
Proceedings ArticleDOI

Adaptively Attending to Visual Attributes and Linguistic Knowledge for Captioning

TL;DR: This work designs a key control unit, termed the visual gate, to adaptively decide "when" and "what" the language generator attends to during word generation, and employs a bottom-up workflow to learn a pool of semantic attributes that serve as propositional attention resources.
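
A rough illustration of such a gate, with all names and wiring as assumptions rather than the paper's actual model:

import torch
import torch.nn as nn

class VisualGate(nn.Module):
    # A scalar gate decides, per decoding step, how much the generator
    # relies on attended visual attributes versus its linguistic state.
    def __init__(self, hidden_dim=512):
        super().__init__()
        self.gate = nn.Linear(hidden_dim, 1)

    def forward(self, h_lang, v_attr):
        # h_lang: decoder hidden state; v_attr: attended attribute vector
        beta = torch.sigmoid(self.gate(h_lang))    # "when"/"what" control
        return beta * v_attr + (1 - beta) * h_lang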
Journal ArticleDOI

Occurrence prediction of cotton pests and diseases by bidirectional long short-term memory networks with climate and atmosphere circulation

TL;DR: Experimental results showed that Bi-LSTM performs well on occurrence prediction of pests and diseases in cotton fields, yielding an Area Under the Curve (AUC) of 0.95, and verified that climate indeed has a strong impact on the occurrence of pests and diseases, while circulation parameters also have some influence.
Posted Content

A Perceptual Prediction Framework for Self Supervised Event Segmentation

TL;DR: In this paper, a self-supervised, predictive learning framework is proposed to segment long, visually complex videos into individual, stable segments that share the same semantics, and a new adaptive learning paradigm is introduced to reduce the effect of catastrophic forgetting in recurrent neural networks.
Journal ArticleDOI

Exploiting the local temporal information for video captioning

TL;DR: A reinforcement-learning-based method that sequentially predicts an adaptive sliding-window size for better event exploration, using a single Monte-Carlo sample to approximate the gradient of the reward-based loss function.
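
The single-sample gradient trick mentioned here is the standard REINFORCE estimator; a generic sketch follows, in which the policy, reward, and action space are placeholders rather than the paper's model:

import torch

def reinforce_step(policy_logits, reward):
    # Draw one Monte-Carlo sample (e.g. a window size) from the policy
    dist = torch.distributions.Categorical(logits=policy_logits)
    action = dist.sample()
    # Minimizing this loss yields -reward * grad log pi(action),
    # a one-sample estimate of the reward objective's gradient
    loss = -reward * dist.log_prob(action)
    return loss, action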
References
Journal ArticleDOI

Long short-term memory

TL;DR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete time steps by enforcing constant error flow through constant error carousels within special units.
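
The constant error carousel is the additive cell-state update; below is a minimal cell written from the standard modern formulation (not the 1997 paper's exact notation), with packed weight matrices W, U, b as illustrative conventions:

import torch

def lstm_cell(x, h, c, W, U, b):
    # W, U, b pack the input, forget, output, and candidate transforms
    gates = x @ W + h @ U + b
    i, f, o, g = gates.chunk(4, dim=-1)
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
    c_next = f * c + i * torch.tanh(g)   # additive update: errors flow
    h_next = o * torch.tanh(c_next)      # through c without vanishing
    return h_next, c_next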
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
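
The core design choice, stacking small 3x3 filters to build depth, can be shown in a toy block (channel sizes are illustrative): two stacked 3x3 layers cover a 5x5 receptive field with fewer parameters and more non-linearity than a single 5x5 layer.

import torch.nn as nn

# One VGG-style block: small 3x3 convolutions stacked before pooling
vgg_block = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(128, 128, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2),
)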
Book ChapterDOI

Microsoft COCO: Common Objects in Context

TL;DR: A new dataset with the goal of advancing the state of the art in object recognition by placing object recognition in the context of the broader question of scene understanding, gathering images of complex everyday scenes containing common objects in their natural context.
Proceedings ArticleDOI

Bleu: a Method for Automatic Evaluation of Machine Translation

TL;DR: This paper proposes a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.
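
In practice BLEU can be computed with NLTK's implementation, which is also how caption metrics such as those on MSVD are commonly scripted (the example sentences here are made up):

from nltk.translate.bleu_score import sentence_bleu

reference = [["a", "man", "is", "playing", "a", "guitar"]]
candidate = ["a", "man", "plays", "the", "guitar"]
# weights=(0.5, 0.5) scores BLEU-2 (unigram and bigram precision)
score = sentence_bleu(reference, candidate, weights=(0.5, 0.5))
print(round(score, 3))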
Proceedings ArticleDOI

Caffe: Convolutional Architecture for Fast Feature Embedding

TL;DR: Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.