Proceedings Article
Exploiting Objects with LSTMs for Video Categorization
Sun Yongqing, Zuxuan Wu, Xi Wang, Hiroyuki Arai, Tetsuya Kinebuchi, Yu-Gang Jiang
- pp 142-146
TLDR
This paper proposes to leverage high-level semantic features to open the "black box" of the state-of-the-art temporal model, Long Short Term Memory (LSTM), with an aim to understand what is learned.
Abstract:
Temporal dynamics play an important role in video classification. In this paper, we propose to leverage high-level semantic features to open the "black box" of the state-of-the-art temporal model, Long Short Term Memory (LSTM), with an aim to understand what is learned. More specifically, we first extract object features from a state-of-the-art CNN model that is trained to recognize 20K objects. Then we feed the extracted features to an LSTM to capture the temporal dynamics in videos. In combination with spatial and motion information, we achieve improvements for supervised video categorization. Furthermore, by masking the inputs, we demonstrate what is learned by the LSTM, namely (i) which objects are crucial for recognizing a class of interest and (ii) how the LSTM model could assist the temporal localization of these detected objects.
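The input-masking analysis described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: `TinyLSTM`, `object_importance`, and `frame_importance` are hypothetical names, the LSTM uses random, untrained weights, the per-frame "object features" are random numbers standing in for CNN object scores, and masking is assumed to mean setting an input dimension to zero.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """Single-layer LSTM with a linear read-out; weights are random (untrained)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.n_in, self.n_hidden = n_in, n_hidden

        def rows():
            # One row per hidden unit: [input weights | recurrent weights | bias].
            return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + n_hidden + 1)]
                    for _ in range(n_hidden)]

        self.w_i, self.w_f, self.w_o, self.w_g = rows(), rows(), rows(), rows()
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]

    def score(self, frames):
        """Return a class score in (0, 1) for a list of per-frame feature vectors."""
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        for x in frames:
            z = list(x) + h + [1.0]  # input, previous hidden state, bias term
            dot = lambda w: sum(wk * zk for wk, zk in zip(w, z))
            i = [sigmoid(dot(w)) for w in self.w_i]    # input gate
            f = [sigmoid(dot(w)) for w in self.w_f]    # forget gate
            o = [sigmoid(dot(w)) for w in self.w_o]    # output gate
            g = [math.tanh(dot(w)) for w in self.w_g]  # candidate cell state
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return sigmoid(sum(w * hk for w, hk in zip(self.w_out, h)))


def object_importance(model, frames):
    """Score drop when one object dimension is zeroed in every frame."""
    base = model.score(frames)
    drops = []
    for j in range(model.n_in):
        masked = [[0.0 if k == j else v for k, v in enumerate(x)] for x in frames]
        drops.append(base - model.score(masked))
    return drops


def frame_importance(model, frames, obj):
    """Score drop when one object is zeroed in a single frame (a temporal cue)."""
    base = model.score(frames)
    drops = []
    for t in range(len(frames)):
        masked = [list(x) for x in frames]
        masked[t][obj] = 0.0
        drops.append(base - model.score(masked))
    return drops


# Toy data: 6 frames, 4 hypothetical object scores per frame.
rng = random.Random(1)
frames = [[rng.random() for _ in range(4)] for _ in range(6)]
model = TinyLSTM(n_in=4, n_hidden=3)

per_object = object_importance(model, frames)
top = max(range(4), key=lambda j: abs(per_object[j]))
per_frame = frame_importance(model, frames, top)
print("score change per masked object:", per_object)
print("score change per masked frame for object", top, ":", per_frame)
```

With a trained model, objects whose masking causes the largest score change would be candidates for the "crucial" objects of a class, and the frames where masking that object matters most would suggest where it occurs in time. The numbers printed here carry no meaning beyond showing the procedure.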
Citations
Journal Article
Unified Spatio-Temporal Attention Networks for Action Recognition in Videos
TL;DR: A unified Spatio-Temporal Attention Network (STAN) is proposed in the context of multiple modalities; its temporal attention differs from that of conventional attention-based deep networks in that it provides principled, global guidance across different modalities and video segments.
Proceedings Article
Collaborative Deep Metric Learning for Video Understanding
TL;DR: A deep network is proposed that embeds videos, using their audio-visual content, into a metric space that preserves video-to-video relationships; the embeddings are used to tackle various domains, including video classification and recommendation, showing significant improvements over state-of-the-art baselines.
Proceedings Article
Exploring Background-bias for Anomaly Detection in Surveillance Videos
Kun Liu, Huadong Ma
TL;DR: This paper develops a series of experiments to validate the existence of the background-bias phenomenon, in which deep networks tend to learn background information rather than the pattern of anomalies when recognizing abnormal behavior, and proposes an end-to-end trainable, anomaly-area guided framework.
Journal Article
CI-GNN: Building a Category-Instance Graph for Zero-Shot Video Classification
Junyu Gao, Changsheng Xu
TL;DR: An end-to-end framework is proposed to directly and collectively model the category-instance, category-category, and instance-instance relationships in the CI-graph, and object semantics are adopted as a bridge to generate unified representations for both videos and categories.
Proceedings Article
Large-Scale Content-Only Video Recommendation
Joonseok Lee, Sami Abu-El-Haija
TL;DR: This paper models recommendation as a video content-based similarity learning problem and learns deep video embeddings trained to predict video relationships identified by a co-watch-based system, using only visual and audio content.