Book Chapter (DOI)

Human action recognition using convolutional neural networks with symmetric time extension of visual rhythms

TL;DR
This work proposes the use of multiple Visual Rhythm crops, symmetrically extended in time and separated by a fixed stride, which provide a 2D representation of the video volume matching the fixed input size of the 2D Convolutional Neural Network employed.
Abstract
Despite the remarkable progress of deep learning models on the image classification task, they still need enhancement for efficient human action recognition. One way to achieve such gains is to augment the existing datasets. With this goal, we propose the use of multiple Visual Rhythm crops, symmetrically extended in time and separated by a fixed stride. The symmetric extension preserves the video frame rate, which is crucial to avoid distorting actions. The crops provide a 2D representation of the video volume that matches the fixed input size of the 2D Convolutional Neural Network (CNN) employed. In addition, multiple crops with stride guarantee coverage of the entire video. To evaluate our method, we extend a multi-stream strategy combining RGB and Optical Flow information to include the Visual Rhythm. Experiments on the challenging UCF101 and HMDB51 datasets yield accuracy rates fairly close to the state of the art.
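A minimal sketch of the cropping scheme described above, assuming a precomputed (space x time) visual rhythm image, a 224-pixel CNN input and a stride of 112 (illustrative values, not necessarily those used in the paper):

```python
import numpy as np

def symmetric_time_extension(rhythm, crop_size=224, stride=112):
    """Mirror-extend a (space x time) visual rhythm along time so that crops of
    width `crop_size`, taken every `stride` columns, cover the whole video."""
    _, time = rhythm.shape
    n_crops = max(1, int(np.ceil((time - crop_size) / stride)) + 1)
    target = crop_size + (n_crops - 1) * stride        # smallest covering length
    pad = max(0, target - time)
    while pad > 0:                                     # mirroring keeps the frame rate
        step = min(pad, rhythm.shape[1])               # mirror at most the current length
        rhythm = np.pad(rhythm, ((0, 0), (0, step)), mode="symmetric")
        pad -= step
    return rhythm

def strided_crops(rhythm, crop_size=224, stride=112):
    """Cut fixed-width crops along time; together they span the extended rhythm."""
    extended = symmetric_time_extension(rhythm, crop_size, stride)
    _, time = extended.shape
    return [extended[:, s:s + crop_size] for s in range(0, time - crop_size + 1, stride)]

# Example: a rhythm whose spatial slice is 224 pixels wide, from a 300-frame video
crops = strided_crops(np.random.rand(224, 300))        # two 224 x 224 crops
```

Each crop can then be fed to the 2D CNN independently, and the per-crop predictions aggregated over the video.
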


Citations
Journal Article (DOI)

Survey on visual rhythms: A spatio-temporal representation for video sequences

TL;DR: This work provides a comprehensive review of the main methods and techniques for constructing visual rhythms, classifies the approaches into different categories, and identifies new trends.
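For context, a basic visual rhythm can be built by sampling one 1D spatial slice per frame and stacking the slices over time; the sketch below uses the central row or column, one common convention among the many sampling schemes such surveys cover.

```python
import numpy as np

def visual_rhythm(frames, direction="horizontal"):
    """Stack one 1-D spatial slice per frame over time to form a 2-D image.

    frames: array of shape (T, H, W) with grayscale frames.
    Returns an array of shape (space, T), with columns indexed by time.
    """
    _, H, W = frames.shape
    if direction == "horizontal":
        slices = frames[:, H // 2, :]      # central row of each frame -> (T, W)
    else:
        slices = frames[:, :, W // 2]      # central column of each frame -> (T, H)
    return slices.T

# Example: a synthetic 90-frame clip of 120 x 160 pixels -> rhythm of shape (160, 90)
rhythm = visual_rhythm(np.random.randint(0, 256, size=(90, 120, 160), dtype=np.uint8))
```
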
Journal Article (DOI)

Weighted voting of multi-stream convolutional neural networks for video-based action recognition using optical flow rhythms

TL;DR: A multi-stream architecture based on the weighted voting of convolutional neural networks is proposed for recognizing human actions in videos, introducing a new Optical Flow Rhythm stream alongside other streams for diversity.
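A generic weighted-voting fusion of per-stream class scores might look like the sketch below; the stream names and weights are hypothetical, not those of the cited work.

```python
import numpy as np

def weighted_vote(stream_scores, weights):
    """Fuse per-stream class scores with a weighted sum and pick the arg-max class.

    stream_scores: dict of stream name -> softmax scores of shape (n_classes,).
    weights:       dict of stream name -> scalar weight.
    """
    fused = sum(weights[name] * scores for name, scores in stream_scores.items())
    return int(np.argmax(fused))

# Hypothetical streams and weights for a 101-class problem such as UCF101
scores = {name: np.random.dirichlet(np.ones(101))
          for name in ("rgb", "optical_flow", "optical_flow_rhythm")}
predicted = weighted_vote(scores, {"rgb": 1.0, "optical_flow": 1.5, "optical_flow_rhythm": 1.0})
```
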
Proceedings Article (DOI)

Image Super-Resolution Improved by Edge Information

TL;DR: This work presents an edge-enhanced super-resolution (EESR) method using a novel residual neural network focused on image edges and a mix of loss functions based on PSNR, L1, Multi-Scale Structural Similarity (MS-SSIM), and a new loss derived from the pencil sketch technique.
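Such a mix of reconstruction losses can be expressed as a weighted sum, as in the hedged sketch below; only L1 and a PSNR-derived term are shown, the weights are hypothetical, and the MS-SSIM and pencil-sketch terms of the cited method are omitted.

```python
import torch
import torch.nn.functional as F

def mixed_sr_loss(pred, target, w_l1=1.0, w_psnr=0.1):
    """Weighted mix of reconstruction terms for super-resolution training.

    Assumes images scaled to [0, 1]; weights are illustrative only.
    """
    l1 = F.l1_loss(pred, target)
    mse = F.mse_loss(pred, target)
    psnr = 10.0 * torch.log10(1.0 / (mse + 1e-8))
    return w_l1 * l1 - w_psnr * psnr        # subtracting PSNR rewards higher PSNR

loss = mixed_sr_loss(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))
```
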
Proceedings Article (DOI)

Learnable Visual Rhythms Based on the Stacking of Convolutional Neural Networks for Action Recognition

TL;DR: This work addresses the problem of human action recognition in videos through a multi-stream network that incorporates both spatial and temporal information, and employs a deep network to extract features from the video frames in order to generate the rhythm.
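One way to realize such a feature-based rhythm is to run an image CNN over every frame and stack the resulting feature vectors column-wise; the sketch below uses a toy backbone in place of the pretrained network assumed by the cited work.

```python
import torch
import torch.nn as nn

def feature_rhythm(frames, backbone):
    """Stack per-frame CNN features column-wise into a 2-D, CNN-ready image.

    frames:   tensor of shape (T, 3, H, W).
    backbone: any module mapping (N, 3, H, W) -> (N, feat_dim).
    Returns a tensor of shape (feat_dim, T), with columns indexed by time.
    """
    with torch.no_grad():
        feats = backbone(frames)            # (T, feat_dim)
    return feats.t()

# Toy backbone standing in for a pretrained image CNN
toy = nn.Sequential(nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 64))
rhythm = feature_rhythm(torch.rand(30, 3, 112, 112), toy)   # shape (64, 30)
```
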
Proceedings Article (DOI)

Multi-stream Architecture with Symmetric Extended Visual Rhythms for Deep Learning Human Action Recognition

TL;DR: This work proposes extracting horizontal and vertical Visual Rhythms, along with their data augmentations, as video features; the augmentations are driven by crops taken from a symmetric extension of the time dimension, which preserves the video frame rate and is essential to keep motion patterns.
References
Proceedings Article (DOI)

Deep Residual Learning for Image Recognition

TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks substantially deeper than those used previously; their method won 1st place in the ILSVRC 2015 classification task.
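The core idea can be sketched as a block whose convolutions learn a residual that is added back to the input through an identity shortcut, as below; this is a minimal basic block, not the full published architecture.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: the layers learn a residual F(x) that is added
    back to the input, easing optimization of very deep networks."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)            # identity shortcut

y = ResidualBlock(64)(torch.rand(1, 64, 56, 56))   # same shape in and out
```
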
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: This work investigates the effect of convolutional network depth on accuracy in the large-scale image recognition setting, using an architecture with very small convolution filters, and shows that a significant improvement over prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
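The depth in such networks comes from stacking groups of small 3x3 convolutions followed by pooling; the sketch below shows one such group with illustrative channel counts.

```python
import torch
import torch.nn as nn

def vgg_block(in_ch, out_ch, n_convs):
    """Stack of 3x3 convolutions followed by 2x2 max pooling, the pattern that
    lets VGG-style networks reach 16-19 weight layers (illustrative only)."""
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)

features = vgg_block(3, 64, 2)(torch.rand(1, 3, 224, 224))   # -> (1, 64, 112, 112)
```
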
Proceedings Article (DOI)

ImageNet: A large-scale hierarchical image database

TL;DR: A new database called “ImageNet” is introduced: a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity, and much more accurate, than current image datasets.
Proceedings Article (DOI)

Rethinking the Inception Architecture for Computer Vision

TL;DR: In this article, the authors explore ways to scale up networks that aim to use the added computation as efficiently as possible, through suitably factorized convolutions and aggressive regularization.
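One of those factorizations replaces a k x k convolution with a 1 x k convolution followed by a k x 1 convolution; the sketch below illustrates the idea with assumed channel counts and kernel size.

```python
import torch
import torch.nn as nn

def factorized_conv(channels, k=7):
    """Replace a k x k convolution with a 1 x k followed by a k x 1 convolution,
    spending roughly 2k instead of k*k weights per channel pair (a simplified
    sketch of the factorization idea)."""
    return nn.Sequential(
        nn.Conv2d(channels, channels, (1, k), padding=(0, k // 2)),
        nn.ReLU(inplace=True),
        nn.Conv2d(channels, channels, (k, 1), padding=(k // 2, 0)),
        nn.ReLU(inplace=True),
    )

out = factorized_conv(32)(torch.rand(1, 32, 28, 28))   # same spatial size, fewer weights
```
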