Book Chapter (DOI)

Human action recognition using convolutional neural networks with symmetric time extension of visual rhythms

TL;DR
This work proposes the use of multiple Visual Rhythm crops, symmetrically extended in time and separated by a fixed stride, which provide a 2D representation of the video volume matching the fixed input size of the 2D Convolutional Neural Network employed.
Abstract
Despite the remarkable progress of deep learning models on the image classification task, they still need enhancement for efficient human action recognition. One way to achieve such gains is to augment the existing datasets. With this goal, we propose the use of multiple Visual Rhythm crops, symmetrically extended in time and separated by a fixed stride. The symmetric extension preserves the video frame rate, which is crucial to avoid distorting actions. The crops provide a 2D representation of the video volume that matches the fixed input size of the 2D Convolutional Neural Network (CNN) employed. In addition, multiple crops with stride guarantee coverage of the entire video. To evaluate our method, we extend a multi-stream strategy combining RGB and Optical Flow information to include the Visual Rhythm. Experiments on the challenging UCF101 and HMDB51 datasets yield accuracy rates fairly close to the state of the art.
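A minimal sketch of the cropping scheme described above, assuming a precomputed (space x time) visual rhythm image, a 224-pixel CNN input and a stride of 112 (illustrative values, not necessarily those used in the paper):

```python
import numpy as np

def symmetric_time_extension(rhythm, crop_size=224, stride=112):
    """Mirror-extend a (space x time) visual rhythm along time so that crops of
    width `crop_size`, taken every `stride` columns, cover the whole video."""
    _, time = rhythm.shape
    n_crops = max(1, int(np.ceil((time - crop_size) / stride)) + 1)
    target = crop_size + (n_crops - 1) * stride        # smallest covering length
    pad = max(0, target - time)
    while pad > 0:                                     # mirroring keeps the frame rate
        step = min(pad, rhythm.shape[1])               # mirror at most the current length
        rhythm = np.pad(rhythm, ((0, 0), (0, step)), mode="symmetric")
        pad -= step
    return rhythm

def strided_crops(rhythm, crop_size=224, stride=112):
    """Cut fixed-width crops along time; together they span the extended rhythm."""
    extended = symmetric_time_extension(rhythm, crop_size, stride)
    _, time = extended.shape
    return [extended[:, s:s + crop_size] for s in range(0, time - crop_size + 1, stride)]

# Example: a rhythm whose spatial slice is 224 pixels wide, from a 300-frame video
crops = strided_crops(np.random.rand(224, 300))        # two 224 x 224 crops
```

Each crop can then be fed to the 2D CNN independently, and the per-crop predictions aggregated over the video.
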


Citations
Journal Article (DOI)

Survey on visual rhythms: A spatio-temporal representation for video sequences

TL;DR: This work provides a comprehensive review of the main methods and techniques for constructing visual rhythms, classifies the approaches into different categories, and identifies new trends.
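For context, a basic visual rhythm can be built by sampling one 1D spatial slice per frame and stacking the slices over time; the sketch below uses the central row or column, one common convention among the many sampling schemes such surveys cover.

```python
import numpy as np

def visual_rhythm(frames, direction="horizontal"):
    """Stack one 1-D spatial slice per frame over time to form a 2-D image.

    frames: array of shape (T, H, W) with grayscale frames.
    Returns an array of shape (space, T), with columns indexed by time.
    """
    _, H, W = frames.shape
    if direction == "horizontal":
        slices = frames[:, H // 2, :]      # central row of each frame -> (T, W)
    else:
        slices = frames[:, :, W // 2]      # central column of each frame -> (T, H)
    return slices.T

# Example: a synthetic 90-frame clip of 120 x 160 pixels -> rhythm of shape (160, 90)
rhythm = visual_rhythm(np.random.randint(0, 256, size=(90, 120, 160), dtype=np.uint8))
```
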
Journal Article (DOI)

Weighted voting of multi-stream convolutional neural networks for video-based action recognition using optical flow rhythms

TL;DR: A multi-stream architecture based on the weighted voting of convolutional neural networks is proposed for recognizing human actions in videos, introducing a new Optical Flow Rhythm stream alongside other streams for diversity.
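A generic weighted-voting fusion of per-stream class scores might look like the sketch below; the stream names and weights are hypothetical, not those of the cited work.

```python
import numpy as np

def weighted_vote(stream_scores, weights):
    """Fuse per-stream class scores with a weighted sum and pick the arg-max class.

    stream_scores: dict of stream name -> softmax scores of shape (n_classes,).
    weights:       dict of stream name -> scalar weight.
    """
    fused = sum(weights[name] * scores for name, scores in stream_scores.items())
    return int(np.argmax(fused))

# Hypothetical streams and weights for a 101-class problem such as UCF101
scores = {name: np.random.dirichlet(np.ones(101))
          for name in ("rgb", "optical_flow", "optical_flow_rhythm")}
predicted = weighted_vote(scores, {"rgb": 1.0, "optical_flow": 1.5, "optical_flow_rhythm": 1.0})
```
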
Proceedings Article (DOI)

Image Super-Resolution Improved by Edge Information

TL;DR: This work presents an edge-enhanced super-resolution (EESR) method using a novel residual neural network focused on image edges and a mix of loss functions based on PSNR, L1, Multi-Scale Structural Similarity (MS-SSIM), and a new loss derived from the pencil sketch technique.
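Such a mix of reconstruction losses can be expressed as a weighted sum, as in the hedged sketch below; only L1 and a PSNR-derived term are shown, the weights are hypothetical, and the MS-SSIM and pencil-sketch terms of the cited method are omitted.

```python
import torch
import torch.nn.functional as F

def mixed_sr_loss(pred, target, w_l1=1.0, w_psnr=0.1):
    """Weighted mix of reconstruction terms for super-resolution training.

    Assumes images scaled to [0, 1]; weights are illustrative only.
    """
    l1 = F.l1_loss(pred, target)
    mse = F.mse_loss(pred, target)
    psnr = 10.0 * torch.log10(1.0 / (mse + 1e-8))
    return w_l1 * l1 - w_psnr * psnr        # subtracting PSNR rewards higher PSNR

loss = mixed_sr_loss(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))
```
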
Proceedings Article (DOI)

Learnable Visual Rhythms Based on the Stacking of Convolutional Neural Networks for Action Recognition

TL;DR: This work addresses the problem of human action recognition in videos through a multi-stream network that incorporates both spatial and temporal information, and employs a deep network to extract features from the video frames in order to generate the rhythm.
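One way to realize such a feature-based rhythm is to run an image CNN over every frame and stack the resulting feature vectors column-wise; the sketch below uses a toy backbone in place of the pretrained network assumed by the cited work.

```python
import torch
import torch.nn as nn

def feature_rhythm(frames, backbone):
    """Stack per-frame CNN features column-wise into a 2-D, CNN-ready image.

    frames:   tensor of shape (T, 3, H, W).
    backbone: any module mapping (N, 3, H, W) -> (N, feat_dim).
    Returns a tensor of shape (feat_dim, T), with columns indexed by time.
    """
    with torch.no_grad():
        feats = backbone(frames)            # (T, feat_dim)
    return feats.t()

# Toy backbone standing in for a pretrained image CNN
toy = nn.Sequential(nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 64))
rhythm = feature_rhythm(torch.rand(30, 3, 112, 112), toy)   # shape (64, 30)
```
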
Proceedings Article (DOI)

Multi-stream Architecture with Symmetric Extended Visual Rhythms for Deep Learning Human Action Recognition

TL;DR: This work proposes extracting horizontal and vertical Visual Rhythms, along with their data augmentations, as video features; the augmentations are driven by crops taken from a symmetric extension of the time dimension, which preserves the video frame rate and is essential to keep motion patterns.
References
Proceedings Article (DOI)

Deep Residual Learning for Image Recognition

TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks substantially deeper than those used previously; their method won 1st place in the ILSVRC 2015 classification task.
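The core idea can be sketched as a block whose convolutions learn a residual that is added back to the input through an identity shortcut, as below; this is a minimal basic block, not the full published architecture.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: the layers learn a residual F(x) that is added
    back to the input, easing optimization of very deep networks."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)            # identity shortcut

y = ResidualBlock(64)(torch.rand(1, 64, 56, 56))   # same shape in and out
```
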
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: This work investigates the effect of convolutional network depth on accuracy in the large-scale image recognition setting, using an architecture with very small convolution filters, and shows that a significant improvement over prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
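The depth in such networks comes from stacking groups of small 3x3 convolutions followed by pooling; the sketch below shows one such group with illustrative channel counts.

```python
import torch
import torch.nn as nn

def vgg_block(in_ch, out_ch, n_convs):
    """Stack of 3x3 convolutions followed by 2x2 max pooling, the pattern that
    lets VGG-style networks reach 16-19 weight layers (illustrative only)."""
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)

features = vgg_block(3, 64, 2)(torch.rand(1, 3, 224, 224))   # -> (1, 64, 112, 112)
```
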
Proceedings Article (DOI)

ImageNet: A large-scale hierarchical image database

TL;DR: A new database called “ImageNet” is introduced: a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity, and much more accurate, than current image datasets.
Proceedings Article (DOI)

Rethinking the Inception Architecture for Computer Vision

TL;DR: In this article, the authors explore ways to scale up networks that aim to use the added computation as efficiently as possible, through suitably factorized convolutions and aggressive regularization.
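One of those factorizations replaces a k x k convolution with a 1 x k convolution followed by a k x 1 convolution; the sketch below illustrates the idea with assumed channel counts and kernel size.

```python
import torch
import torch.nn as nn

def factorized_conv(channels, k=7):
    """Replace a k x k convolution with a 1 x k followed by a k x 1 convolution,
    spending roughly 2k instead of k*k weights per channel pair (a simplified
    sketch of the factorization idea)."""
    return nn.Sequential(
        nn.Conv2d(channels, channels, (1, k), padding=(0, k // 2)),
        nn.ReLU(inplace=True),
        nn.Conv2d(channels, channels, (k, 1), padding=(k // 2, 0)),
        nn.ReLU(inplace=True),
    )

out = factorized_conv(32)(torch.rand(1, 32, 28, 28))   # same spatial size, fewer weights
```
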