Book Chapter
Human action recognition using convolutional neural networks with symmetric time extension of visual rhythms
Hemerson Tacon, André de Souza Brito, Hugo de Lima Chaves, Marcelo Bernardes Vieira, Saulo Moraes Villela, Helena de Almeida Maia, Darwin Ttito Concha, Helio Pedrini
- pp 351-366
TL;DR: This work proposes the usage of multiple Visual Rhythm crops, symmetrically extended in time and separated by a fixed stride, which provide a 2D representation of the video volume matching the fixed input size of the 2D Convolutional Neural Network employed.
Abstract:
Despite the significant progress of deep learning models on image classification, they still need improvement to recognize human actions efficiently. One way to achieve such a gain is to augment the existing datasets. With this goal, we propose the usage of multiple Visual Rhythm crops, symmetrically extended in time and separated by a fixed stride. The symmetric extension preserves the video frame rate, which is crucial to avoid distorting actions. The crops provide a 2D representation of the video volume matching the fixed input size of the 2D Convolutional Neural Network (CNN) employed. In addition, multiple crops separated by a stride guarantee coverage of the entire video. To evaluate our method, a multi-stream strategy combining RGB and Optical Flow information is extended to include the Visual Rhythm. Accuracy rates fairly close to the state of the art were obtained in experiments with our method on the challenging UCF101 and HMDB51 datasets.
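The cropping scheme described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the rhythm construction used here (averaging each frame over its rows), the function names, and the parameters `crop_len`, `stride`, and `n_crops` are all assumptions chosen for illustration. The sketch only shows how mirroring the time axis lets fixed-length crops be taken at a fixed stride without resampling frames.

```python
import numpy as np

def visual_rhythm(video):
    """Collapse a (T, H, W) grayscale video into a (T, W) image.

    Assumption: each frame is reduced to one line by averaging over its
    rows. The source does not specify how the rhythm is built.
    """
    return np.asarray(video, dtype=np.float64).mean(axis=1)

def symmetric_extend(rhythm, target_len):
    """Extend the time axis by mirroring: forward, backward, forward, ...

    No frames are interpolated or dropped, so the frame rate is unchanged.
    """
    t = rhythm.shape[0]
    pos = np.arange(target_len) % (2 * t)
    # The first half of each period reads forward, the second half backward.
    src = np.where(pos < t, pos, 2 * t - 1 - pos)
    return rhythm[src]

def strided_crops(rhythm, crop_len, stride, n_crops):
    """Take n_crops windows of crop_len time steps, stride steps apart."""
    needed = (n_crops - 1) * stride + crop_len
    extended = symmetric_extend(rhythm, needed)
    return np.stack(
        [extended[i * stride : i * stride + crop_len] for i in range(n_crops)]
    )
```

For a 5-frame toy video, `strided_crops(visual_rhythm(video), crop_len=4, stride=3, n_crops=3)` returns an array of shape `(3, 4, W)`; the later crops read into the mirrored part of the sequence. Matching the spatial dimension to a network's input size would need an additional crop or resize step, which is left out here.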
Citations
Journal Article
Survey on visual rhythms: A spatio-temporal representation for video sequences
TL;DR: This work provides a comprehensive review of the main methods and techniques for constructing visual rhythms, classifies the approaches into different categories, and identifies new trends.
Journal Article
Weighted voting of multi-stream convolutional neural networks for video-based action recognition using optical flow rhythms
André de Souza Brito, Marcelo Bernardes Vieira, Saulo Moraes Villela, Hemerson Tacon, Hugo de Lima Chaves, Helena de Almeida Maia, Darwin Ttito Concha, Helio Pedrini
TL;DR: A multi-stream architecture based on the weighted voting of convolutional neural networks to deal with the problem of recognizing human actions in videos is proposed, with a new stream, Optical Flow Rhythm, besides using other streams for diversity.
Proceedings Article
Image Super-Resolution Improved by Edge Information
TL;DR: This work presents an edge enhanced super-resolution (EESR) method using a novel residual neural network with focus on image edges and a mix of loss functions that use PSNR, L1, Multiple-Scale Structural Similarity (MS-SSIM), and a new loss function based on the pencil sketch technique.
Proceedings Article
Learnable Visual Rhythms Based on the Stacking of Convolutional Neural Networks for Action Recognition
Helena de Almeida Maia, Marcos Roberto e Souza, Anderson Carlos Sousa e Santos, Helio Pedrini, Hemerson Tacon, André de Souza Brito, Hugo de Lima Chaves, Marcelo Bernardes Vieira, Saulo Moraes Villela
TL;DR: This work addresses the problem of human action recognition in videos through a multi-stream network that incorporates both spatial and temporal information, and employs a deep network to extract features from the video frames in order to generate the rhythm.
Proceedings Article
Multi-stream Architecture with Symmetric Extended Visual Rhythms for Deep Learning Human Action Recognition
Hemerson Tacon, André de Souza Brito, Hugo de Lima Chaves, Marcelo Bernardes Vieira, Saulo Moraes Villela, Helena de Almeida Maia, Darwin Ttito Concha, Helio Pedrini
TL;DR: This work proposes to extract horizontal and vertical Visual Rhythms as well as their data augmentations as video features, driven by crops extracted from the symmetric extension of the time dimension, preserving the video frame rate, which is essential to keep motion patterns.