Proceedings ArticleDOI
Action Recognition Based on Discriminative Embedding of Actions Using Siamese Networks
Debaditya Roy, C. Krishna Mohan, K. Sri Rama Murty
- pp 3473-3477
TL;DR: This paper trains a Siamese deep neural network with a contrastive loss on a low-dimensional representation, extracted using factor analysis from a pool of attributes learned in a universal Gaussian mixture model, to classify actions by leveraging their class labels.
Abstract:
Actions can be recognized effectively when the various atomic attributes forming the action are identified and combined in the form of a representation. In this paper, a low-dimensional representation is extracted from a pool of attributes learned in a universal Gaussian mixture model using factor analysis. However, such a representation cannot adequately discriminate between actions with similar attributes. Hence, we propose to classify such actions by leveraging the corresponding class labels. We train a Siamese deep neural network with a contrastive loss on the low-dimensional representation. We show that Siamese networks allow effective discrimination even between similar actions. The efficacy of the proposed approach is demonstrated on two benchmark action datasets, HMDB51 and MPII Cooking Activities. On both datasets, the proposed method improves the state-of-the-art performance considerably.
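The contrastive loss mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation used in the paper, so the code below assumes the common margin-based form: same-class pairs are penalized by their squared distance, and different-class pairs are penalized only when they fall within a margin. The function names, the 0.5 scaling, and the default margin are illustrative assumptions, not details taken from the paper.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, same_class, margin=1.0):
    """Margin-based contrastive loss for one pair of embeddings.

    Assumed form (not confirmed by the abstract):
      same class:      0.5 * d^2
      different class: 0.5 * max(0, margin - d)^2
    where d is the Euclidean distance between the two embeddings.
    """
    d = euclidean_distance(emb_a, emb_b)
    if same_class:
        # Pull embeddings of the same action closer together.
        return 0.5 * d ** 2
    # Push embeddings of different actions apart until they clear the margin.
    return 0.5 * max(0.0, margin - d) ** 2
```

In a Siamese setup, both embeddings would come from the same network applied to two inputs, and the loss would be averaged over many labeled pairs. Under this form, a pair of different actions that is already farther apart than the margin contributes zero loss.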
Citations
Posted Content
SoccerNet-v2: A Dataset and Benchmarks for Holistic Understanding of Broadcast Soccer Videos
Adrien Deliège, Anthony Cioppa, Silvio Giancola, Meisam Jamshidi Seikavandi, Jacob Velling Dueholm, Kamal Nasrollahi, Bernard Ghanem, Thomas B. Moeslund, Marc Van Droogenbroeck
TL;DR: This work proposes SoccerNet-v2, a novel large-scale corpus of manual annotations for the SoccerNet video dataset, along with open challenges to encourage more research in soccer understanding and broadcast production, and extends current tasks in the realm of soccer to include action spotting, camera shot segmentation with boundary detection, and a novel replay grounding task.
Proceedings ArticleDOI
SoccerNet-v2: A Dataset and Benchmarks for Holistic Understanding of Broadcast Soccer Videos
Adrien Deliège, Anthony Cioppa, Silvio Giancola, Meisam Jamshidi Seikavandi, Jacob Velling Dueholm, Kamal Nasrollahi, Bernard Ghanem, Thomas B. Moeslund, Marc Van Droogenbroeck
TL;DR: SoccerNet-v2 is a large-scale corpus of manual annotations for the SoccerNet video dataset, released along with open challenges to encourage more research in soccer understanding and broadcast production.
Proceedings ArticleDOI
Motion2Vec: Semi-Supervised Representation Learning from Surgical Videos
TL;DR: This paper learns a motion-centric representation of surgical video demonstrations by grouping them into action segments/subgoals/options in a semi-supervised manner and demonstrates the use of this representation to imitate surgical suturing kinematic motions from publicly available videos of the JIGSAWS dataset.
Posted Content
SL-DML: Signal Level Deep Metric Learning for Multimodal One-Shot Action Recognition
TL;DR: This work proposes a metric learning approach to reduce the action recognition problem to a nearest neighbor search in embedding space, which generalizes well in experiments on the UTD-MHAD dataset for inertial, skeleton and fused data and the Simitate dataset for motion capturing data.
Posted Content
NeuralWarp: Time-Series Similarity with Warping Networks
TL;DR: Experimental results demonstrate that NeuralWarp outperforms both non-parametric and un-warped deep models on a range of diverse real-life datasets.
References
Proceedings Article
Two-Stream Convolutional Networks for Action Recognition in Videos
Karen Simonyan, Andrew Zisserman
TL;DR: This work proposes a two-stream ConvNet architecture which incorporates spatial and temporal networks and demonstrates that a ConvNet trained on multi-frame dense optical flow is able to achieve very good performance in spite of limited training data.
Proceedings ArticleDOI
HMDB: A large video database for human motion recognition
TL;DR: This paper uses the largest action video database to date, with 51 action categories containing around 7,000 manually annotated clips extracted from a variety of sources ranging from digitized movies to YouTube, to evaluate the performance of two representative computer vision systems for action recognition and to explore the robustness of these methods under various conditions.
Journal ArticleDOI
Front-End Factor Analysis for Speaker Verification
TL;DR: Extending previous work, this paper proposes a new speaker representation for speaker verification: a low-dimensional speaker- and channel-dependent space defined using simple factor analysis and named the total variability space because it models both speaker and channel variabilities.
Proceedings ArticleDOI
Action Recognition with Improved Trajectories
Heng Wang, Cordelia Schmid
TL;DR: Dense trajectories, previously shown to be an efficient video representation for action recognition with state-of-the-art results on a variety of datasets, are improved by taking camera motion into account to correct them.
Journal ArticleDOI
One-shot learning of object categories
TL;DR: It is found that on a database of more than 100 categories, the Bayesian approach produces informative models when the number of training examples is too small for other methods to operate successfully.