Open Access · Posted Content

Ensemble of LSTMs and feature selection for human action prediction

TLDR
In this article, an ensemble of long short-term memory (LSTM) networks is used for human action prediction on the MoGaze dataset, currently the most comprehensive dataset capturing both the poses of human joints and human gaze.
Abstract
As robots become increasingly ubiquitous in human environments, robotic systems will need to better understand and predict human actions. This is not an easy task, at times not even for humans, but given a relatively structured set of possible actions, appropriate cues, and the right model, the problem can be tackled computationally. In this paper, we propose an ensemble of long short-term memory (LSTM) networks for human action prediction. To train and evaluate the models, we used the MoGaze dataset, currently the most comprehensive dataset capturing the poses of human joints together with human gaze. We thoroughly analyzed the MoGaze dataset and selected a reduced set of cues for this task. Our model can predict (i) which of the labeled objects the human is going to grasp, and (ii) which of the macro locations (such as the table or the shelf) the human is going to visit. We exhaustively evaluated the proposed method and compared it to single-cue baselines. The results suggest that our LSTM model only slightly outperforms the gaze baseline in single-object picking accuracy, but achieves better accuracy in macro-location prediction. Furthermore, we analyzed prediction accuracy when gaze is not used; in this case, the LSTM model considerably outperformed the best single-cue baseline.
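The abstract does not say how the individual LSTMs are combined. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: each ensemble member is a single-layer LSTM that reads a sequence of per-frame cue vectors (a hypothetical stand-in for the selected MoGaze cues) and outputs class probabilities over candidate objects, and the ensemble averages those probabilities (soft voting). The class `TinyLSTMClassifier`, the layer sizes, the random untrained weights, and the averaging rule are all assumptions; training is omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Dense matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class TinyLSTMClassifier:
    """Hypothetical single-layer LSTM + linear softmax head (random, untrained weights)."""

    GATES = "ifoc"  # input, forget, output, candidate cell

    def __init__(self, n_in, n_hidden, n_classes, seed):
        rng = random.Random(seed)  # a different seed gives each member different weights

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.n_hidden = n_hidden
        # Each gate sees the concatenation [x_t; h_{t-1}].
        self.W = {g: mat(n_hidden, n_in + n_hidden) for g in self.GATES}
        self.b = {g: [0.0] * n_hidden for g in self.GATES}
        self.W_out = mat(n_classes, n_hidden)

    def predict_proba(self, seq):
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        for x in seq:
            xh = list(x) + h
            pre = {g: [u + v for u, v in zip(matvec(self.W[g], xh), self.b[g])]
                   for g in self.GATES}
            i = [sigmoid(v) for v in pre["i"]]
            f = [sigmoid(v) for v in pre["f"]]
            o = [sigmoid(v) for v in pre["o"]]
            c_tilde = [math.tanh(v) for v in pre["c"]]
            # c_t = f * c_{t-1} + i * c_tilde ; h_t = o * tanh(c_t)
            c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, c_tilde)]
            h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        # Classify from the final hidden state.
        return softmax(matvec(self.W_out, h))


def ensemble_predict(models, seq):
    """Soft voting: average the members' class probabilities, return argmax and averages."""
    all_probs = [m.predict_proba(seq) for m in models]
    n_classes = len(all_probs[0])
    avg = [sum(p[k] for p in all_probs) / len(all_probs) for k in range(n_classes)]
    best = max(range(n_classes), key=lambda k: avg[k])
    return best, avg


# Usage: 5 members, 4 cue features per frame, 3 candidate objects, 10 frames.
models = [TinyLSTMClassifier(n_in=4, n_hidden=8, n_classes=3, seed=s) for s in range(5)]
seq = [[0.1, 0.2, 0.0, 0.5] for _ in range(10)]
label, probs = ensemble_predict(models, seq)
print(label, [round(p, 3) for p in probs])
```

Averaging probabilities rather than taking a majority vote keeps each member's confidence in the decision; whether the paper uses this rule, another combination scheme, or separate ensembles for object and macro-location prediction is not stated in the abstract.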
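The abstract compares the LSTM ensemble against single-cue baselines, the strongest being gaze, but does not define them. One common way to build a gaze-only predictor, shown here as an assumption rather than the paper's actual baseline, is to pick the object whose direction from the eye makes the smallest angle with the gaze ray. The function names, the coordinate frame, and the object positions below are hypothetical.

```python
import math


def angle_between(u, v):
    """Angle in radians between two 3-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))  # clamp against rounding error
    return math.acos(cos)


def gaze_baseline(eye_pos, gaze_dir, objects):
    """Hypothetical single-cue baseline: pick the object closest in angle to the gaze ray."""
    best_name, best_angle = None, math.inf
    for name, pos in objects.items():
        eye_to_obj = [p - e for p, e in zip(pos, eye_pos)]
        ang = angle_between(gaze_dir, eye_to_obj)
        if ang < best_angle:
            best_name, best_angle = name, ang
    return best_name


# Usage with made-up positions (metres): the gaze points forward and down toward the cup.
objects = {"cup": (1.0, 0.0, 1.0), "plate": (0.0, 1.0, 1.0), "bowl": (-1.0, 0.0, 1.0)}
eye = (0.0, 0.0, 1.6)
pred = gaze_baseline(eye, (1.0, 0.05, -0.6), objects)
print(pred)  # cup
```

A baseline like this uses only the current gaze direction, which is one reason a sequence model over several cues can help when gaze is unavailable.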
