UntrimmedNets for Weakly Supervised Action Recognition and Detection

doi:10.1109/CVPR.2017.678

Open Access, Proceedings Article

- pp 6402-6411

TL;DR

This paper presents a new weakly supervised architecture, called UntrimmedNet, which is able to directly learn action recognition models from untrimmed videos without the requirement of temporal annotations of action instances.

Abstract:

Current action recognition methods rely heavily on trimmed videos for model training. However, acquiring a large-scale trimmed video dataset is expensive and time-consuming. This paper presents a new weakly supervised architecture, called UntrimmedNet, which learns action recognition models directly from untrimmed videos without requiring temporal annotations of action instances. UntrimmedNet couples two components, the classification module and the selection module, which learn the action models and reason about the temporal duration of action instances, respectively. Both components are implemented with feed-forward networks, so UntrimmedNet is end-to-end trainable. We exploit the learned models for weakly supervised action recognition (WSR) and detection (WSD) on the untrimmed video datasets THUMOS14 and ActivityNet. Although UntrimmedNet employs only weak supervision, it achieves performance superior or comparable to that of strongly supervised approaches on these two datasets.
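
The coupling of a classification module and a selection module described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes a soft, attention-style selection in which each clip gets a weight and the video-level score is the weighted sum of clip-level class scores. The function name `untrimmed_video_scores`, the linear weights `w_cls` and `w_sel`, and all shapes are hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)


def untrimmed_video_scores(clip_features, w_cls, w_sel):
    """Combine clip-level scores into one video-level prediction.

    clip_features: (N, D) array, one feature vector per sampled clip.
    w_cls:         (D, C) hypothetical linear classification weights.
    w_sel:         (D,)   hypothetical linear selection weights.
    """
    # Classification module (assumed linear): class scores for every clip.
    clip_scores = clip_features @ w_cls                      # (N, C)

    # Selection module (assumed soft attention): one weight per clip,
    # normalised over the N clips of the video.
    selection_logits = clip_features @ w_sel                 # (N,)
    selection_weights = softmax(selection_logits, axis=0)    # (N,)

    # Video-level scores: selection-weighted sum of clip scores.
    video_scores = selection_weights @ clip_scores           # (C,)
    video_probs = softmax(video_scores, axis=0)              # (C,)
    return video_probs, selection_weights


# Toy usage with random data: 8 clips, 16-dim features, 5 action classes.
rng = np.random.default_rng(0)
clip_features = rng.normal(size=(8, 16))
w_cls = rng.normal(size=(16, 5))
w_sel = rng.normal(size=16)
video_probs, selection_weights = untrimmed_video_scores(clip_features, w_cls, w_sel)
```

Because only `video_probs` is compared with the video-level label during training, no temporal annotation is needed in a setup like this. The per-clip `selection_weights` are a by-product that indicates which clips the model considers relevant, which is the kind of signal a detection step could build on.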

Citations

Proceedings Article

ActionBytes: Learning From Trimmed Videos to Localize Actions

Mihir Jain, +2 more

TL;DR: This work shows the advantage of ActionBytes for zero-shot localization as well as for traditional weakly supervised localization methods that train on long videos, achieving state-of-the-art results.

Book Chapter

Part-Activated Deep Reinforcement Learning for Action Prediction

Lei Chen, +3 more

TL;DR: This paper designs PA-DRL to exploit the structure of the human body by extracting skeleton proposals under a deep reinforcement learning framework, and considers the salient parts for expressing actions.

Journal Article

Learning Causal Temporal Relation and Feature Discrimination for Anomaly Detection

Peng Wu, +1 more

03 Mar 2021

IEEE Transactions on Image Processing

TL;DR: This paper proposes a method consisting of four modules that leverages temporal cues and feature discrimination for anomaly detection. The causal temporal relation module captures local-range temporal dependencies among features to enhance them, and the classifier projects the enhanced features to the category space using causal convolution, further expanding the temporal modeling range.

Posted Content

Weakly-supervised Temporal Action Localization by Uncertainty Modeling

Pilhyeon Lee, +3 more

12 Jun 2020

arXiv: Computer Vision and Pattern Recog...

TL;DR: A new perspective on background frames is presented in which they are modeled as out-of-distribution samples owing to their inconsistency, and a background entropy loss is introduced to better discriminate background frames by encouraging their in-distribution (action) probabilities to be uniformly distributed over all action classes.

Proceedings Article

Action Recognition From Single Timestamp Supervision in Untrimmed Videos

Davide Moltisanti, +2 more

TL;DR: This work proposes a method supervised by single timestamps located around each action instance in untrimmed videos. It replaces expensive action bounds with sampling distributions initialised from these timestamps, and demonstrates that these distributions converge to the location and extent of discriminative action segments.

References

Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky, +2 more

TL;DR: A deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax, achieved state-of-the-art performance on ImageNet classification.

Proceedings Article

ImageNet: A large-scale hierarchical image database

Jia Deng, +5 more

TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.

Journal Article

Gradient-based learning applied to document recognition

Yann LeCun, +6 more

TL;DR: This article shows that multilayer neural networks trained with gradient-based learning can synthesize a complex decision surface that classifies high-dimensional patterns, such as handwritten characters, and proposes graph transformer networks (GTNs) for globally training multi-module document recognition systems.

Proceedings Article

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioffe, +1 more

TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.

Journal Article

ImageNet Large Scale Visual Recognition Challenge

Olga Russakovsky, +11 more

01 Dec 2015

International Journal of Computer Vision

TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is a benchmark in object category classification and detection on hundreds of object categories and millions of images. It has been run annually from 2010 to present, attracting participation from more than fifty institutions.

Related Papers (5)

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

ActivityNet: A large-scale video benchmark for human activity understanding

Learning Spatiotemporal Features with 3D Convolutional Networks

Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs

Temporal Segment Networks: Towards Good Practices for Deep Action Recognition