Open Access · Proceedings Article · DOI

HMDB: A large video database for human motion recognition

TLDR
This paper introduces the largest action video database to date, with 51 action categories containing around 7,000 manually annotated clips extracted from a variety of sources ranging from digitized movies to YouTube, and uses it to evaluate two representative computer vision systems for action recognition and to explore their robustness under various conditions.
Abstract
With nearly one billion online videos viewed every day, an emerging new frontier in computer vision research is recognition and search in video. While much effort has been devoted to the collection and annotation of large scalable static image datasets containing thousands of image categories, human action datasets lag far behind. Current action recognition databases contain on the order of ten different action categories collected under fairly controlled conditions. State-of-the-art performance on these datasets is now near ceiling, and thus there is a need for the design and creation of new benchmarks. To address this issue we collected the largest action video database to date, with 51 action categories, which in total contain around 7,000 manually annotated clips extracted from a variety of sources ranging from digitized movies to YouTube. We use this database to evaluate the performance of two representative computer vision systems for action recognition and explore the robustness of these methods under various conditions such as camera motion, viewpoint, video quality and occlusion.



Citations
Proceedings ArticleDOI

Going deeper with convolutions

TL;DR: Inception is a deep convolutional neural network architecture that achieved a new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Proceedings Article

Two-Stream Convolutional Networks for Action Recognition in Videos

TL;DR: This work proposes a two-stream ConvNet architecture which incorporates spatial and temporal networks and demonstrates that a ConvNet trained on multi-frame dense optical flow is able to achieve very good performance in spite of limited training data.
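The two-stream idea summarized above, separate spatial (appearance) and temporal (optical-flow) networks whose class scores are combined, can be illustrated with a toy late-fusion step. This is a hedged sketch only: the logits, class count, and fusion weight below are hypothetical, and plain numpy stands in for the actual ConvNets.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(logits - logits.max())
    return e / e.sum()

def late_fusion(spatial_logits, temporal_logits, w_temporal=0.5):
    """Weighted average of the per-class softmax scores of the two streams."""
    s = softmax(spatial_logits)
    t = softmax(temporal_logits)
    return (1 - w_temporal) * s + w_temporal * t

# Toy 3-class example: the spatial stream is uncertain,
# the flow stream is confident about class 1.
spatial = np.array([1.0, 1.1, 0.9])
temporal = np.array([0.1, 3.0, 0.2])
fused = late_fusion(spatial, temporal)
predicted = int(np.argmax(fused))
```

Averaging softmax scores (rather than logits) keeps the fused vector a valid probability distribution, which is one common way two-stream predictions are combined at test time.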
Proceedings ArticleDOI

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

TL;DR: In this article, a Two-Stream Inflated 3D ConvNet (I3D) is proposed to learn seamless spatio-temporal feature extractors from video while leveraging successful ImageNet architecture designs and their parameters.
Posted Content

UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild

TL;DR: This work introduces UCF101, which is currently the largest dataset of human actions, and provides baseline action recognition results on this new dataset using a standard bag-of-words approach, with an overall performance of 44.5%.
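The bag-of-words baseline named in this summary quantizes local descriptors against a learned codebook and represents each clip as a histogram of codeword assignments, which is then fed to a classifier. A minimal sketch of the histogram step, assuming random toy descriptors and a hypothetical pre-learned codebook (numpy only):

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Assign each local descriptor to its nearest codeword and
    return the L1-normalized histogram of assignments."""
    # Pairwise squared distances between descriptors (n, d) and codewords (k, d).
    dists = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = dists.argmin(axis=1)                  # nearest codeword per descriptor
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 16))      # 8 codewords over 16-D descriptors (toy)
descriptors = rng.normal(size=(50, 16))  # 50 local features from one clip (toy)
h = bow_histogram(descriptors, codebook)
```

In a real pipeline the codebook would come from k-means over training descriptors, and the resulting histograms would be classified with an SVM.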
Posted Content

Long-term Recurrent Convolutional Networks for Visual Recognition and Description

TL;DR: A novel recurrent convolutional architecture suitable for large-scale visual learning which is end-to-end trainable, and shows such models have distinct advantages over state-of-the-art models for recognition or generation which are separately defined and/or optimized.
References
Proceedings ArticleDOI

ImageNet: A large-scale hierarchical image database

TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.
Journal ArticleDOI

Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope

TL;DR: The performance of the spatial envelope model shows that specific information about object shape or identity is not a requirement for scene categorization and that modeling a holistic representation of the scene informs about its probable semantic category.
Proceedings ArticleDOI

Learning realistic human actions from movies

TL;DR: A new method for video classification that builds upon and extends several recent ideas, including local space-time features, space-time pyramids and multi-channel non-linear SVMs, is presented and shown to improve state-of-the-art results on the standard KTH action dataset.
Journal ArticleDOI

LabelMe: A Database and Web-Based Tool for Image Annotation

TL;DR: In this article, a large collection of images with ground truth labels is built to be used for object detection and recognition research; such data is useful for supervised learning and quantitative evaluation.
Proceedings ArticleDOI

Recognizing human actions: a local SVM approach

TL;DR: This paper constructs video representations in terms of local space-time features, integrates such representations with SVM classification schemes, and presents action recognition results.