Open Access Proceedings Article (DOI)

Exploiting Image-trained CNN Architectures for Unconstrained Video Classification

TL;DR
This article explores different ways of performing spatial and temporal pooling, feature normalization, the choice of CNN layers, and the choice of classifiers for event detection in videos using convolutional neural networks (CNNs) trained for image classification.
Abstract
We conduct an in-depth exploration of different strategies for event detection in videos using convolutional neural networks (CNNs) trained for image classification. We study different ways of performing spatial and temporal pooling, feature normalization, the choice of CNN layers, and the choice of classifiers. Making judicious choices along these dimensions led to a very significant increase in performance over the more naive approaches that have been used until now. We evaluate our approach on the challenging TRECVID MED'14 dataset with two popular CNN architectures pretrained on ImageNet. On this MED'14 dataset, our methods, based entirely on image-trained CNN features, can outperform several state-of-the-art non-CNN models. Our proposed late fusion of CNN- and motion-based features can further increase the mean average precision (mAP) on MED'14 from 34.95% to 38.74%. The fusion approach achieves state-of-the-art classification performance on the challenging UCF-101 dataset.
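The abstract compresses a pipeline with several independent design choices: how frame-level CNN descriptors are pooled over time, how they are normalized before classification, and how the resulting scores are late-fused with motion-based features. The sketch below illustrates that general recipe; the feature dimension, number of event classes, and fusion weight `alpha` are placeholders for illustration, not values taken from the paper.

```python
import numpy as np

# Hypothetical frame-level CNN descriptors for one video: one row per sampled
# frame, e.g. activations from a CNN layer of an image-trained network.
frame_features = np.random.rand(120, 4096).astype(np.float32)

def temporal_pool(features: np.ndarray, mode: str = "average") -> np.ndarray:
    """Collapse per-frame descriptors into a single video-level descriptor."""
    return features.max(axis=0) if mode == "max" else features.mean(axis=0)

def l2_normalize(v: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """L2 feature normalization before feeding the descriptor to a classifier."""
    return v / (np.linalg.norm(v) + eps)

video_descriptor = l2_normalize(temporal_pool(frame_features, mode="average"))

# Late fusion: combine per-event scores from a classifier on CNN features with
# scores from a motion-based model. Both score vectors and the fusion weight
# are placeholders standing in for trained models.
cnn_scores = np.random.rand(20)      # one score per event class
motion_scores = np.random.rand(20)
alpha = 0.5                          # fusion weight, a tunable assumption
fused_scores = alpha * cnn_scores + (1.0 - alpha) * motion_scores
print(video_descriptor.shape, fused_scores.argmax())
```

Average versus max pooling, the normalization scheme, and the fusion weight are exactly the kinds of choices the abstract reports as making a large difference in mAP.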


Citations
Proceedings Article (DOI)

BING: Binarized Normed Gradients for Objectness Estimation at 300fps

TL;DR: Observes that generic objects with well-defined closed boundaries can be discriminated by the norm of their gradients after resizing the corresponding image windows to a small fixed size, and uses this observation to train a generic objectness measure.
Proceedings Article (DOI)

End-to-End Learning of Action Detection from Frame Glimpses in Videos

TL;DR: In this article, the authors introduce an end-to-end approach for action detection in videos that learns to directly predict the temporal bounds of actions by observing frames and deciding both where to look next and when to emit a prediction.
Proceedings Article (DOI)

Dynamic Image Networks for Action Recognition

TL;DR: Introduces an approximate rank pooling CNN layer that allows existing CNN models to be applied directly to video data with fine-tuning, generalizes dynamic images to dynamic feature maps, and demonstrates the power of the new representations by achieving state-of-the-art performance on standard action recognition benchmarks.
Proceedings Article

Action Recognition using Visual Attention

TL;DR: In this article, a soft-attention-based model is proposed for action recognition in videos, using multi-layered RNNs with Long Short-Term Memory (LSTM) units that are deep both spatially and temporally.
Proceedings Article (DOI)

Self-Supervised Video Representation Learning with Odd-One-Out Networks

TL;DR: Proposes a new self-supervised CNN pre-training technique based on a novel auxiliary task called odd-one-out learning, which learns temporal representations for videos that generalize to related tasks such as action recognition.
References
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: Achieves state-of-the-art ImageNet classification performance with a deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully connected layers with a final 1000-way softmax.
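The layer layout in this summary maps directly onto code. Below is a minimal sketch of a single-tower, AlexNet-style network in PyTorch; the channel widths follow the common torchvision variant and a 224x224 RGB input, which are assumptions rather than figures taken from this page.

```python
import torch.nn as nn

# Minimal AlexNet-style network: five convolutional layers, some followed by
# max-pooling, then three fully connected layers ending in a 1000-way
# classification output (softmax applied at inference time).
class AlexNetStyle(nn.Module):
    def __init__(self, num_classes: int = 1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),  # raw class scores before softmax
        )

    def forward(self, x):
        x = self.features(x)            # expects 224x224 RGB input
        return self.classifier(x.flatten(1))
```

The penultimate fully connected layers of such a network are the kind of image-trained CNN features the main paper pools over video frames.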
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: In this paper, the authors investigate the effect of convolutional network depth on accuracy in the large-scale image recognition setting and show that a significant improvement over prior-art configurations can be achieved by pushing the depth to 16-19 layers.
Proceedings Article (DOI)

ImageNet: A large-scale hierarchical image database

TL;DR: Introduces a new database called “ImageNet”, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than existing image datasets.
Journal Article (DOI)

Gradient-based learning applied to document recognition

TL;DR: In this article, a graph transformer network (GTN) is proposed for document recognition; multilayer networks trained with gradient-based learning are shown to synthesize a complex decision surface that can classify high-dimensional patterns such as handwritten characters.
Proceedings Article (DOI)

Going deeper with convolutions

TL;DR: Proposes Inception, a deep convolutional neural network architecture that achieves a new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).