Proceedings Article

Action Recognition Based on Discriminative Embedding of Actions Using Siamese Networks

TL;DR
This paper trains a Siamese deep neural network with a contrastive loss on the low-dimensional representation of a pool of attributes learned in a universal Gaussian mixture model using factor analysis to classify actions by leveraging the corresponding class labels.
Abstract
Actions can be recognized effectively when the various atomic attributes forming the action are identified and combined in the form of a representation. In this paper, a low-dimensional representation is extracted from a pool of attributes learned in a universal Gaussian mixture model using factor analysis. However, such a representation cannot adequately discriminate between actions with similar attributes. Hence, we propose to classify such actions by leveraging the corresponding class labels. We train a Siamese deep neural network with a contrastive loss on the low-dimensional representation. We show that Siamese networks allow effective discrimination even between similar actions. The efficacy of the proposed approach is demonstrated on two benchmark action datasets, HMDB51 and MPII Cooking Activities. On both datasets, the proposed method improves the state-of-the-art performance considerably.
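
The idea of training a Siamese network with a contrastive loss can be illustrated with a minimal sketch. The abstract gives neither the exact loss formulation nor the margin or network architecture, so everything below is an assumption: the loss is the commonly used margin-based contrastive form, and `embed`, `weights`, and `margin` are hypothetical names chosen for illustration. Both inputs pass through the same `embed` function with the same weights, which is what makes the setup Siamese.

```python
import math


def embed(x, weights):
    """Shared branch (hypothetical): one linear layer followed by tanh.

    The same `weights` are used for both inputs of a pair.
    """
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, same_class, margin=1.0):
    """Margin-based contrastive loss for one pair (assumed form).

    Same-class pairs are penalized by their squared distance;
    different-class pairs are penalized only if closer than `margin`.
    """
    d = euclidean_distance(emb_a, emb_b)
    if same_class:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


if __name__ == "__main__":
    # Hypothetical 2-D inputs and weights, not taken from the paper.
    weights = [[1.0, 0.0], [0.0, 1.0]]
    a = embed([0.1, 0.2], weights)
    b = embed([0.1, 0.3], weights)
    print("same-class loss:", contrastive_loss(a, b, same_class=True))
    print("different-class loss:", contrastive_loss(a, b, same_class=False))
```

Under this assumed form, a pair of similar actions with different labels yields a large loss while their embeddings are close, which pushes them apart during training; this is one way class labels can help separate actions that share attributes.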


Citations
Posted Content

SoccerNet-v2: A Dataset and Benchmarks for Holistic Understanding of Broadcast Soccer Videos

TL;DR: This work proposes SoccerNet-v2, a novel large-scale corpus of manual annotations for the SoccerNet video dataset, along with open challenges to encourage more research in soccer understanding and broadcast production, and extends current tasks in the realm of soccer to include action spotting, camera shot segmentation with boundary detection, and a novel replay grounding task.
Proceedings Article

SoccerNet-v2: A Dataset and Benchmarks for Holistic Understanding of Broadcast Soccer Videos

TL;DR: SoccerNet-v2 is a large-scale corpus of manual annotations for the SoccerNet [24] video dataset, along with open challenges to encourage more research in soccer understanding and broadcast production.
Proceedings Article

Motion2Vec: Semi-Supervised Representation Learning from Surgical Videos

TL;DR: This paper learns a motion-centric representation of surgical video demonstrations by grouping them into action segments/subgoals/options in a semi-supervised manner and demonstrates the use of this representation to imitate surgical suturing kinematic motions from publicly available videos of the JIGSAWS dataset.
Posted Content

SL-DML: Signal Level Deep Metric Learning for Multimodal One-Shot Action Recognition

TL;DR: This work proposes a metric learning approach to reduce the action recognition problem to a nearest neighbor search in embedding space, which generalizes well in experiments on the UTD-MHAD dataset for inertial, skeleton and fused data and the Simitate dataset for motion capturing data.
Posted Content

NeuralWarp: Time-Series Similarity with Warping Networks

TL;DR: Experimental results demonstrate that NeuralWarp outperforms both non-parametric and un-warped deep models on a range of diverse real-life datasets.
References
Proceedings Article

Two-Stream Convolutional Networks for Action Recognition in Videos

TL;DR: This work proposes a two-stream ConvNet architecture which incorporates spatial and temporal networks and demonstrates that a ConvNet trained on multi-frame dense optical flow is able to achieve very good performance in spite of limited training data.
Proceedings Article

HMDB: A large video database for human motion recognition

TL;DR: This paper uses the largest action video database to-date with 51 action categories, which in total contain around 7,000 manually annotated clips extracted from a variety of sources ranging from digitized movies to YouTube, to evaluate the performance of two representative computer vision systems for action recognition and explore the robustness of these methods under various conditions.
Journal Article

Front-End Factor Analysis for Speaker Verification

TL;DR: This paper extends previous work by proposing a new speaker representation for speaker verification: a new low-dimensional speaker- and channel-dependent space is defined using a simple factor analysis and is named the total variability space because it models both speaker and channel variabilities.
Proceedings Article

Action Recognition with Improved Trajectories

TL;DR: Dense trajectories, which were shown to be an efficient video representation for action recognition and achieved state-of-the-art results on a variety of datasets, are improved by taking camera motion into account to correct them.
Journal Article

One-shot learning of object categories

TL;DR: It is found that on a database of more than 100 categories, the Bayesian approach produces informative models when the number of training examples is too small for other methods to operate successfully.