Jointly Learning Energy Expenditures and Activities Using Egocentric Multimodal Signals
Katsuyuki Nakamura,Serena Yeung,Alexandre Alahi,Li Fei-Fei +3 more
- pp 6817-6826
TLDR
A model for reasoning on multimodal data to jointly predict activities and energy expenditures is proposed, and heart rate signals are used as privileged self-supervision to derive energy expenditure in a training stage.
Abstract:
Physiological signals such as heart rate can provide valuable information about an individual's state and activity. However, existing work in computer vision has not yet explored leveraging these signals to enhance egocentric video understanding. In this work, we propose a model for reasoning on multimodal data to jointly predict activities and energy expenditures. We use heart rate signals as privileged self-supervision to derive energy expenditure in a training stage. A multitask objective is used to jointly optimize the two tasks. Additionally, we introduce a dataset that contains 31 hours of egocentric video augmented with heart rate and acceleration signals. This study can lead to new applications such as a visual calorie counter.
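As a rough illustration of what a multitask objective over the two tasks could look like, the sketch below sums a classification loss for the activity label and a regression loss for energy expenditure. The abstract does not give the actual loss terms or their weighting, so the cross-entropy and squared-error terms, the `ee_weight` parameter, and all function names here are assumptions made purely for illustration.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of a softmax over `logits` against an integer class label."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def squared_error(pred, target):
    """Squared error between a predicted and a target scalar."""
    return (pred - target) ** 2


def multitask_loss(activity_logits, activity_label, ee_pred, ee_target, ee_weight=1.0):
    """Hypothetical joint loss: activity classification plus weighted
    energy-expenditure regression. The weighting scheme is an assumption."""
    activity_term = softmax_cross_entropy(activity_logits, activity_label)
    energy_term = squared_error(ee_pred, ee_target)
    return activity_term + ee_weight * energy_term


# Example: three activity classes with uniform scores, energy prediction off by 1.0.
loss = multitask_loss([0.0, 0.0, 0.0], 0, ee_pred=2.0, ee_target=1.0, ee_weight=0.5)
print(loss)
```

Setting `ee_weight` to zero reduces this sketch to plain activity classification, which is one simple way to see how much the second task contributes to the combined objective.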
Citations
Proceedings Article
Toward Driving Scene Understanding: A Dataset for Learning Driver Behavior and Causal Reasoning
TL;DR: The Honda Research Institute Driving Dataset (HDD) is a dataset of 104 hours of real human driving in the San Francisco Bay Area, collected using an instrumented vehicle equipped with different sensors.
Journal Article
A Review of Multimodal Human Activity Recognition with Special Emphasis on Classification, Applications, Challenges and Future Directions
Santosh Kumar Yadav,Kamlesh Tiwari,Hari Mohan Pandey,Shaik Ali Akbar +5 more
TL;DR: This paper presents a comprehensive review of multimodal human activity recognition methods that use different types of sensors, together with their analytical approaches and fusion methods, and classifies and discusses existing work within seven rational aspects.
Proceedings Article
Convolutional Relational Machine for Group Activity Recognition
TL;DR: An end-to-end deep convolutional neural network called the Convolutional Relational Machine (CRM) is proposed for recognizing group activities; it utilizes the information in spatial relations between individual persons in image or video.
Journal Article
Deep Learning in Human Activity Recognition with Wearable Sensors: A Review on Advances
TL;DR: This paper presents a comprehensive analysis of the current advancements, developing trends, and major challenges for wearable-based human activity recognition (HAR), along with cutting-edge frontiers and future directions for deep learning-based HAR.
Posted Content
Convolutional Relational Machine for Group Activity Recognition
TL;DR: An end-to-end deep Convolutional Neural Network called CRM for recognizing group activities that utilizes the information in spatial relations between individual persons in image or video to produce an intermediate spatial representation based on individual and group activities.
References
Proceedings Article
ImageNet: A large-scale hierarchical image database
TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.
Proceedings Article
Going deeper with convolutions
Christian Szegedy,Wei Liu,Yangqing Jia,Pierre Sermanet,Scott Reed,Dragomir Anguelov,Dumitru Erhan,Vincent Vanhoucke,Andrew Rabinovich +8 more
TL;DR: Inception is a deep convolutional neural network architecture that achieved a new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Proceedings Article
Learning Spatiotemporal Features with 3D Convolutional Networks
TL;DR: The learned features, namely C3D (Convolutional 3D), with a simple linear classifier outperform state-of-the-art methods on 4 different benchmarks and are comparable with current best methods on the other 2 benchmarks.
Proceedings Article
Large-Scale Video Classification with Convolutional Neural Networks
TL;DR: This work studies multiple approaches for extending the connectivity of a CNN in time domain to take advantage of local spatio-temporal information and suggests a multiresolution, foveated architecture as a promising way of speeding up the training.