Open Access · Proceedings Article

A Grid-based Representation for Human Action Recognition

TLDR
This paper proposes a grid-based representation for action recognition that encodes the most discriminative appearance information of an action, with explicit attention on representative pose features.
Abstract
Human action recognition (HAR) in videos is a fundamental research topic in computer vision. Its main goal is to understand the actions performed by humans from a sequence of visual observations. In recent years, HAR has witnessed significant progress, especially with the emergence of deep learning models. However, most existing approaches to action recognition rely on information that is not always relevant to the task, and they are limited in the way they fuse temporal information. In this paper, we propose a novel method for human action recognition that efficiently encodes the most discriminative appearance information of an action, with explicit attention on representative pose features, into a new compact grid representation. Our method, GRAR (Grid-based Representation for Action Recognition), is tested on several benchmark datasets, demonstrating that the model can accurately recognize human actions despite intra-class appearance variations and occlusion challenges.
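The abstract does not say how the grid is assembled, so the following is only an illustrative sketch of what a compact grid representation could look like, not the authors' implementation. It assumes that a fixed number of frames is sampled uniformly from the video, that a same-sized crop is already available for each sampled frame (for example, a region around the actor's pose), and that the crops are tiled row by row into a single image. The names `sample_indices` and `build_grid` are hypothetical.

```python
import numpy as np


def sample_indices(num_frames, k):
    """Pick k frame indices spread uniformly over a video of num_frames frames.

    Uniform sampling is an assumption made for illustration only.
    """
    return np.linspace(0, num_frames - 1, k).round().astype(int)


def build_grid(crops, rows, cols):
    """Tile rows*cols equally sized crops of shape (H, W, C) into one image.

    Crops are placed row by row, so crop i lands in grid cell
    (i // cols, i % cols). The result has shape (rows*H, cols*W, C).
    """
    crops = np.asarray(crops)
    if crops.ndim != 4 or crops.shape[0] != rows * cols:
        raise ValueError("expected rows*cols crops of identical shape (H, W, C)")
    _, h, w, c = crops.shape
    # (rows, cols, H, W, C) -> (rows, H, cols, W, C) -> (rows*H, cols*W, C)
    grid = crops.reshape(rows, cols, h, w, c).transpose(0, 2, 1, 3, 4)
    return grid.reshape(rows * h, cols * w, c)


if __name__ == "__main__":
    # Stand-in data: 6 constant-valued "crops" of size 2x3 with 1 channel.
    crops = [np.full((2, 3, 1), i, dtype=np.uint8) for i in range(6)]
    grid = build_grid(crops, rows=2, cols=3)
    print(grid.shape)
    print(sample_indices(10, 4))
```

A single grid image of this kind can be passed to an ordinary 2D image classifier, which is one way several time steps could be summarized in one input; whether GRAR does exactly this is not stated in the abstract.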
