Proceedings ArticleDOI

Exploiting Objects with LSTMs for Video Categorization

TLDR
This paper proposes to leverage high-level semantic features to open the "black box" of the state-of-the-art temporal model, Long Short Term Memory (LSTM), with an aim to understand what is learned.
Abstract
Temporal dynamics play an important role in video classification. In this paper, we propose to leverage high-level semantic features to open the "black box" of the state-of-the-art temporal model, Long Short Term Memory (LSTM), with an aim to understand what is learned. More specifically, we first extract object features from a state-of-the-art CNN model that is trained to recognize 20K objects. Then we leverage LSTM with the extracted features as inputs to capture the temporal dynamics in videos. In combination with spatial and motion information, we achieve improvements for supervised video categorization. Furthermore, by masking the inputs, we demonstrate what is learned by LSTM, namely (i) which objects are crucial for recognizing a class-of-interest; and (ii) how the LSTM model could assist the temporal localization of these detected objects.
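The pipeline described in the abstract — per-frame object scores fed into an LSTM, with input masking used to probe which objects drive a prediction — can be sketched roughly as follows. This is a minimal illustration rather than the authors' code: the feature dimension (20K object scores), hidden size, class count, and the mask_objects helper are assumptions made for the example.

```python
import torch
import torch.nn as nn

class ObjectLSTMClassifier(nn.Module):
    """Minimal sketch: classify a video from per-frame object-score features."""
    def __init__(self, num_objects=20000, hidden_size=512, num_classes=101):
        super().__init__()
        # The LSTM consumes one object-score vector per frame (batch_first: B x T x D).
        self.lstm = nn.LSTM(num_objects, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, object_scores):
        # object_scores: (batch, num_frames, num_objects) from a pre-trained CNN.
        _, (h_n, _) = self.lstm(object_scores)
        return self.fc(h_n[-1])  # last hidden state summarizes the video

def mask_objects(object_scores, object_ids):
    """Zero out selected object dimensions to test their influence on the prediction
    (an assumed helper mirroring the paper's input-masking analysis)."""
    masked = object_scores.clone()
    masked[:, :, object_ids] = 0.0
    return masked

# Hypothetical usage: compare predictions with and without a group of objects.
model = ObjectLSTMClassifier()
features = torch.rand(2, 30, 20000)               # 2 videos, 30 frames each
full_logits = model(features)
masked_logits = model(mask_objects(features, [5, 42]))
print((full_logits - masked_logits).abs().max())  # large change => those objects matter
```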


Citations
Journal ArticleDOI

Unified Spatio-Temporal Attention Networks for Action Recognition in Videos

TL;DR: Unified Spatio-Temporal Attention Networks (STAN) are proposed in the context of multiple modalities; unlike conventional deep networks that focus on the attention mechanism alone, their temporal attention provides principled and global guidance across different modalities and video segments.
Proceedings ArticleDOI

Collaborative Deep Metric Learning for Video Understanding

TL;DR: A deep network is proposed that embeds videos, using their audio-visual content, onto a metric space that preserves video-to-video relationships; the embeddings are used to tackle various domains including video classification and recommendation, showing significant improvements over state-of-the-art baselines (see the embedding sketch after this citation list).
Proceedings ArticleDOI

Exploring Background-bias for Anomaly Detection in Surveillance Videos

TL;DR: This paper develops a series of experiments to validate the existence of the background-bias phenomenon, which makes deep networks tend to learn background information rather than the pattern of anomalies when recognizing abnormal behavior, and proposes an end-to-end trainable, anomaly-area-guided framework.
Journal ArticleDOI

CI-GNN: Building a Category-Instance Graph for Zero-Shot Video Classification

TL;DR: An end-to-end framework is proposed to directly and collectively model the category-instance, category-category, and instance-instance relationships in the CI-graph; object semantics are adopted as a bridge to generate unified representations for both videos and categories.
Proceedings ArticleDOI

Large-Scale Content-Only Video Recommendation

TL;DR: This paper models recommendation as a video content-based similarity learning problem and learns deep video embeddings trained to predict video relationships identified by a co-watch-based system, using only visual and audio content.
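The video-embedding idea behind the two entries above (Collaborative Deep Metric Learning and Large-Scale Content-Only Video Recommendation) can be illustrated with a minimal triplet-loss sketch. This is an assumed simplification, not the cited systems: the feature dimensions, network shape, and co-watch-derived triplets are placeholders.

```python
import torch
import torch.nn as nn

class VideoEmbedder(nn.Module):
    """Sketch: map pre-extracted audio-visual features onto a metric space."""
    def __init__(self, in_dim=2048, embed_dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(), nn.Linear(512, embed_dim))

    def forward(self, feats):
        # L2-normalize so distances on the unit sphere reflect video similarity.
        return nn.functional.normalize(self.net(feats), dim=-1)

# Hypothetical triplets (anchor, related, unrelated), e.g. derived from co-watch data.
embedder = VideoEmbedder()
loss_fn = nn.TripletMarginLoss(margin=0.2)
anchor, positive, negative = (torch.rand(8, 2048) for _ in range(3))
loss = loss_fn(embedder(anchor), embedder(positive), embedder(negative))
loss.backward()  # related videos are pulled together, unrelated ones pushed apart
```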
References
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: State-of-the-art ImageNet classification performance was achieved with a deep convolutional neural network that consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax (a rough architectural sketch follows this reference list).
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
Proceedings ArticleDOI

ImageNet: A large-scale hierarchical image database

TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.
Proceedings ArticleDOI

Going deeper with convolutions

TL;DR: Inception is a deep convolutional neural network architecture that achieved a new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Journal ArticleDOI

ImageNet classification with deep convolutional neural networks

TL;DR: A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.
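As a rough sketch of the network summarized in the ImageNet classification entries above (five convolutional layers, some followed by max-pooling, then three fully-connected layers ending in a 1000-way softmax), the layer widths and kernel sizes below are illustrative assumptions rather than the exact published configuration.

```python
import torch
import torch.nn as nn

# Illustrative 5-conv / 3-FC network with a 1000-way output; filter counts and
# kernel sizes are assumptions chosen for brevity.
net = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
    nn.Conv2d(64, 192, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
    nn.Conv2d(192, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
    nn.Flatten(),
    nn.Linear(256 * 6 * 6, 4096), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(4096, 1000),  # 1000-way class scores; softmax is applied in the loss
)
logits = net(torch.rand(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 1000])
```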