Action Recognition Based on Discriminative Embedding of Actions Using Siamese Networks

doi:10.1109/ICIP.2018.8451226

Proceedings Article

Debaditya Roy, +2 more

- pp 3473-3477

TLDR

This paper trains a Siamese deep neural network with a contrastive loss on the low-dimensional representation of a pool of attributes learned in a universal Gaussian mixture model using factor analysis to classify actions by leveraging the corresponding class labels.

Abstract:

Actions can be recognized effectively when the various atomic attributes forming the action are identified and combined in the form of a representation. In this paper, a low-dimensional representation is extracted from a pool of attributes learned in a universal Gaussian mixture model using factor analysis. However, such a representation cannot adequately discriminate between actions with similar attributes. Hence, we propose to classify such actions by leveraging the corresponding class labels. We train a Siamese deep neural network with a contrastive loss on the low-dimensional representation. We show that Siamese networks allow effective discrimination even between similar actions. The efficacy of the proposed approach is demonstrated on two benchmark action datasets, HMDB51 and MPII Cooking Activities. On both datasets, the proposed method considerably improves on state-of-the-art performance.
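The contrastive loss mentioned in the abstract can be illustrated with a minimal sketch. The version below is the commonly used pairwise formulation (squared distance for same-class pairs, squared hinge on a margin for different-class pairs); the abstract does not state which variant, distance, or margin the authors use, so the function names, the Euclidean distance, and the `margin` default are assumptions made for illustration only.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, same_class, margin=1.0):
    """Contrastive loss for one pair of embeddings.

    emb_a, emb_b: outputs of the two weight-sharing branches of a
        Siamese network (here just plain lists of floats).
    same_class: True if both inputs carry the same action label.
    margin: hypothetical hyperparameter; its value is not given in
        the source.
    """
    d = euclidean_distance(emb_a, emb_b)
    if same_class:
        # Same action: penalize any distance, pulling the pair together.
        return 0.5 * d ** 2
    # Different actions: penalize only pairs closer than the margin.
    return 0.5 * max(0.0, margin - d) ** 2


if __name__ == "__main__":
    # Two toy 2-D embeddings standing in for low-dimensional action vectors.
    u, v = [0.0, 0.0], [3.0, 4.0]
    print(contrastive_loss(u, v, same_class=True))
    print(contrastive_loss(u, v, same_class=False))
```

Under this formulation, a different-class pair that is already farther apart than the margin contributes no loss, so training effort concentrates on actions whose embeddings are confusably close, which matches the motivation given in the abstract.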

Citations

Posted Content

SoccerNet-v2: A Dataset and Benchmarks for Holistic Understanding of Broadcast Soccer Videos

Adrien Deliège, +8 more

- 26 Nov 2020 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: This work proposes SoccerNet-v2, a novel large-scale corpus of manual annotations for the SoccerNet video dataset, along with open challenges to encourage more research in soccer understanding and broadcast production, and extends current tasks in the realm of soccer to include action spotting, camera shot segmentation with boundary detection, and a novel replay grounding task.

Proceedings Article

SoccerNet-v2: A Dataset and Benchmarks for Holistic Understanding of Broadcast Soccer Videos

Adrien Deliège, +8 more

TL;DR: SoccerNet-v2 is a large-scale corpus of manual annotations for the SoccerNet [24] video dataset, released along with open challenges to encourage more research in soccer understanding and broadcast production.

Proceedings Article

Motion2Vec: Semi-Supervised Representation Learning from Surgical Videos

Ajay Kumar Tanwani, +5 more

TL;DR: This paper learns a motion-centric representation of surgical video demonstrations by grouping them into action segments/subgoals/options in a semi-supervised manner and demonstrates the use of this representation to imitate surgical suturing kinematic motions from publicly available videos of the JIGSAWS dataset.

Posted Content

SL-DML: Signal Level Deep Metric Learning for Multimodal One-Shot Action Recognition

Raphael Memmesheimer, +2 more

- 23 Apr 2020 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: This work proposes a metric learning approach to reduce the action recognition problem to a nearest neighbor search in embedding space, which generalizes well in experiments on the UTD-MHAD dataset for inertial, skeleton and fused data and the Simitate dataset for motion capturing data.

Posted Content

NeuralWarp: Time-Series Similarity with Warping Networks

Josif Grabocka, +1 more

- 20 Dec 2018 -

arXiv: Learning

TL;DR: Experimental results demonstrate that NeuralWarp outperforms both non-parametric and un-warped deep models on a range of diverse real-life datasets.

References

Proceedings Article

Two-Stream Convolutional Networks for Action Recognition in Videos

Karen Simonyan, +1 more

TL;DR: This work proposes a two-stream ConvNet architecture which incorporates spatial and temporal networks and demonstrates that a ConvNet trained on multi-frame dense optical flow is able to achieve very good performance in spite of limited training data.

Proceedings Article

HMDB: A large video database for human motion recognition

Hilde Kuehne, +4 more

TL;DR: This paper uses the largest action video database to-date with 51 action categories, which in total contain around 7,000 manually annotated clips extracted from a variety of sources ranging from digitized movies to YouTube, to evaluate the performance of two representative computer vision systems for action recognition and explore the robustness of these methods under various conditions.

Journal Article

Front-End Factor Analysis for Speaker Verification

Najim Dehak, +4 more

- 01 May 2011 -

IEEE Transactions on Audio, Speech, and ...

TL;DR: This paper extends previous work by proposing a new speaker representation for speaker verification: a new low-dimensional speaker- and channel-dependent space is defined using simple factor analysis and named the total variability space because it models both speaker and channel variabilities.

Proceedings Article

Action Recognition with Improved Trajectories

Heng Wang, +1 more

TL;DR: Dense trajectories, which were shown to be an efficient video representation for action recognition and achieved state-of-the-art results on a variety of datasets, are improved by taking camera motion into account to correct them.

Journal Article

One-shot learning of object categories

Li Fei-Fei, +2 more

- 01 Apr 2006 -

IEEE Transactions on Pattern Analysis an...

TL;DR: It is found that on a database of more than 100 categories, the Bayesian approach produces informative models when the number of training examples is too small for other methods to operate successfully.
