Open Access Dissertation

Human action recognition in videos with local representation

TL;DR
It is claimed that the ability to process a video in real time will be a key factor in future action recognition applications, and a novel descriptor introducing 3D trajectories computed on RGB-D information is proposed.
Abstract
This thesis targets recognition of human actions in videos. Action recognition can be defined as the ability to determine whether a given action occurs in a video. The problem is difficult due to the high complexity of human actions: appearance variation, motion pattern variation, occlusions, etc. Recent advances in both hand-crafted and deep-learning methods have significantly improved action recognition accuracy, but many open questions keep the action recognition task far from solved. Current state-of-the-art methods achieve satisfactory results mostly based on features that focus on a local spatio-temporal neighborhood. Human actions are complex, however, so the next question to answer is how to model the relationship between local features, especially in a spatio-temporal context. In this thesis, we propose two methods that address this challenging problem. In the first method, we measure the pairwise relationship between features with Brownian covariance. In the second method, we model the spatial layout of features with respect to the person bounding box, achieving better or similar results than skeleton-based methods. Our methods are generic and can improve both hand-crafted and deep-learning based methods. Another open question is whether 3D information can improve action recognition. Currently, most state-of-the-art methods work on RGB data, which lacks 3D information. In addition, many methods use 3D information only to obtain body joints, which are still challenging to obtain. In this thesis, we show that 3D information can be used for more than joint detection: we propose a novel descriptor that introduces 3D trajectories computed on RGB-D information. Finally, we claim that the ability to process a video in real time will be a key factor in future action recognition applications. All methods proposed in this thesis are ready to work in real time.
We proved this claim empirically by building a real-time action detection system, which was successfully adopted by Toyota in their robotic systems. In the evaluation, we focus particularly on daily living actions, performed by people in their daily self-care routine, such as eating, drinking, and cooking. Recognition of such actions is particularly important for patient monitoring systems in hospitals and nursing homes, and is also a key component of assistive robots. To evaluate the methods proposed in this thesis, we created a large-scale dataset consisting of 160 hours of video footage of 20 senior people. The videos were recorded in 3 different rooms by 7 RGB-D sensors, and we annotated them with 28 action classes. The actions in the dataset are performed in an unacted and unsupervised way, so the dataset introduces real-world challenges absent from many public datasets. Finally, we also evaluated our methods on publicly available datasets: CAD60, CAD120, and MSRDailyActivity3D. Our experiments show that the methods proposed in this thesis improve on state-of-the-art results.
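The first method's pairwise feature relationship is measured with Brownian covariance, which for sampled data coincides with Székely's sample distance covariance. As a minimal illustrative sketch (not the thesis's full video descriptor, which applies the statistic over feature channels), the scalar statistic can be computed from double-centered pairwise distance matrices:

```python
import numpy as np

def distance_covariance(x, y):
    """Sample distance (Brownian) covariance between two 1-D samples.

    Nonzero values indicate dependence of any kind, not only linear,
    which is what makes the statistic useful for relating features.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pairwise absolute-distance matrices
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    # Double-center: subtract row means and column means, add grand mean
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    # Square root of the mean elementwise product
    return np.sqrt((A * B).mean())
```

Unlike classical covariance, this quantity is zero (in the population limit) if and only if the two variables are independent, so it captures nonlinear relationships between local features.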


Citations
Journal Article (DOI)

Vision-based human action recognition: An overview and real world challenges

TL;DR: This paper presents an overview of existing methods organized by the kind of issue they address, and a comparison of the datasets introduced for the human action recognition field.
Journal Article (DOI)

Video classification and retrieval through spatio-temporal Radon features

TL;DR: This paper presents a descriptor called the Spatio-Temporal Histogram of Radon Projections (STHRP) for representing the temporal pattern of the contents of a video, and demonstrates its application to video classification and retrieval.

New Results - Modeling Spatial Layout of Features for Real World Scenario RGB-D Action Recognition

TL;DR: The results show that the method is competitive with skeleton-based methods while requiring much simpler people detection instead of skeleton detection; it also introduces a descriptor which encodes static information for actions with a low amount of motion.
Book Chapter (DOI)

A New Hybrid Architecture for Human Activity Recognition from RGB-D Videos

TL;DR: This work proposes a novel two-level fusion strategy that combines features from different cues to address the large variety of actions in activity recognition from RGB-D videos.
Proceedings Article (DOI)

Online Detection of Long-Term Daily Living Activities by Weakly Supervised Recognition of Sub-Activities

TL;DR: This paper considers a long-term activity as a sequence of short-term sub-activities, and proposes an online activity detection framework based on the discovery of sub-activities, which allows an ongoing activity to be predicted before it is completely observed.
References
Proceedings Article (DOI)

3D trajectories for action recognition

TL;DR: This paper investigates state-of-the-art methods designed for RGB videos, and proposes two novel video descriptors that combine motion and 3D information, improving performance on actions with a low movement rate.
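The 3D-trajectory idea can be illustrated with a small sketch: a tracked 2D point trajectory is back-projected into camera coordinates using the depth channel, and the trajectory is summarized by normalized displacement vectors, in the spirit of dense-trajectory descriptors lifted to 3D. This is an assumed simplification, not the paper's exact descriptor; the intrinsics `fx, fy, cx, cy` below are hypothetical Kinect-style defaults.

```python
import numpy as np

def trajectory_descriptor(track_uv, track_depth,
                          fx=525.0, fy=525.0, cx=319.5, cy=239.5):
    """Lift a tracked pixel trajectory to 3D and encode its shape.

    track_uv:    (T, 2) pixel coordinates of the tracked point
    track_depth: (T,)   depth values (meters) at those pixels
    """
    u, v = track_uv[:, 0], track_uv[:, 1]
    z = track_depth
    # Back-project each pixel to 3D camera coordinates via the pinhole model
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=1)          # (T, 3) 3D trajectory
    # Describe the trajectory by its successive 3D displacements,
    # normalized by total path length so scale is factored out
    disp = np.diff(pts, axis=0)                # (T-1, 3)
    total = np.linalg.norm(disp, axis=1).sum()
    return (disp / total).ravel() if total > 0 else disp.ravel()
```

Because displacement along the depth axis enters the descriptor directly, even motion toward or away from the camera (nearly invisible in 2D) contributes, which is consistent with the reported gains on actions with a low movement rate.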
Proceedings Article (DOI)

Action recognition based on a mixture of RGB and depth based skeleton

TL;DR: Two skeleton detection methods are compared: the depth-map based method used with the Kinect camera, and an RGB-based method that uses deep convolutional neural networks.

New Results - Modeling Spatial Layout of Features for Real World Scenario RGB-D Action Recognition

TL;DR: The results show that the method is competitive with skeleton-based methods while requiring much simpler people detection instead of skeleton detection; it also introduces a descriptor which encodes static information for actions with a low amount of motion.
Proceedings Article (DOI)

Generating unsupervised models for online long-term daily living activity recognition

TL;DR: This paper presents an unsupervised approach for learning long-term human activities without requiring any user interaction, by creating models that contain both the global and the body motion of people.
Proceedings Article (DOI)

A hybrid framework for online recognition of activities of daily living in real-world settings

TL;DR: This paper presents a hybrid approach for long-term human activity recognition that recognizes activities more precisely than unsupervised approaches, and enables processing of long-term videos by automatically clipping them and performing online recognition.