Open Access, Proceedings Article

UntrimmedNets for Weakly Supervised Action Recognition and Detection

Abstract
Current action recognition methods rely heavily on trimmed videos for model training, but acquiring a large-scale trimmed video dataset is expensive and time-consuming. This paper presents a new weakly supervised architecture, called UntrimmedNet, which learns action recognition models directly from untrimmed videos without requiring temporal annotations of action instances. UntrimmedNet couples two components, the classification module and the selection module, to learn the action models and reason about the temporal duration of action instances, respectively. Both components are implemented with feed-forward networks, so UntrimmedNet is end-to-end trainable. We apply the learned models to weakly supervised action recognition (WSR) and detection (WSD) on the untrimmed video datasets THUMOS14 and ActivityNet. Although UntrimmedNet uses only weak supervision, it achieves performance superior or comparable to that of strongly supervised approaches on these two datasets.
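To make the coupling of the two modules concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: it assumes that the classification module produces one vector of class logits per clip, that the selection module produces one scalar score per clip, and that the two are combined by a softmax-weighted sum over clips. The function names `softmax` and `video_level_prediction` and all numbers are hypothetical, chosen only for illustration.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def video_level_prediction(clip_class_logits, clip_selection_scores):
    """Fuse per-clip outputs into one video-level class distribution.

    clip_class_logits: list of per-clip class logit lists
        (assumed output of the classification module).
    clip_selection_scores: list with one scalar per clip
        (assumed output of the selection module).
    """
    # Soft selection (assumption): normalise selection scores over clips.
    weights = softmax(clip_selection_scores)
    num_classes = len(clip_class_logits[0])
    # Weighted sum of clip logits gives video-level logits.
    video_logits = [0.0] * num_classes
    for weight, logits in zip(weights, clip_class_logits):
        for c in range(num_classes):
            video_logits[c] += weight * logits[c]
    return softmax(video_logits), weights


# Three clips, two classes; the first clip receives the highest selection score.
probs, weights = video_level_prediction(
    [[2.0, 0.0], [0.0, 2.0], [0.0, 0.1]],
    [3.0, 0.0, 0.0],
)
predicted_class = probs.index(max(probs))
```

Because only a video-level label is needed to train on the fused distribution, a scheme like this requires no temporal annotations, and the per-clip weights give a rough indication of where in the video the action occurs.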

