Open Access, Proceedings Article

Jointly Learning Energy Expenditures and Activities Using Egocentric Multimodal Signals

TLDR
A model for reasoning on multimodal data is proposed to jointly predict activities and energy expenditures; heart rate signals are used as privileged self-supervision to derive energy expenditure during training.
Abstract
Physiological signals such as heart rate can provide valuable information about an individual's state and activity. However, existing work in computer vision has not yet explored leveraging these signals to enhance egocentric video understanding. In this work, we propose a model for reasoning on multimodal data to jointly predict activities and energy expenditures. We use heart rate signals as privileged self-supervision to derive energy expenditure in a training stage. A multitask objective is used to jointly optimize the two tasks. Additionally, we introduce a dataset that contains 31 hours of egocentric video augmented with heart rate and acceleration signals. This study can lead to new applications such as a visual calorie counter.
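
The abstract does not give the form of the multitask objective. As a rough illustration only, a joint objective of this kind is often written as a classification loss for the activity plus a weighted regression loss for the energy expenditure. The sketch below assumes cross-entropy and squared error; the function names, the `ee_weight` parameter, and the choice of losses are hypothetical and not taken from the paper.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy for one example, given raw class scores and the true class index."""
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def squared_error(predicted, target):
    """Squared error between a predicted and a target scalar."""
    return (predicted - target) ** 2


def multitask_loss(activity_logits, activity_label, ee_pred, ee_target, ee_weight=1.0):
    """Assumed joint objective: activity classification loss plus weighted regression loss.

    ee_target stands for an energy expenditure value that, per the abstract, would be
    derived from heart rate at training time; how it is derived is not specified here.
    """
    classification = softmax_cross_entropy(activity_logits, activity_label)
    regression = squared_error(ee_pred, ee_target)
    return classification + ee_weight * regression


if __name__ == "__main__":
    # Two activity classes with equal scores, and a perfect energy expenditure prediction.
    print(multitask_loss([0.0, 0.0], 0, ee_pred=3.0, ee_target=3.0))
```

With equal scores for two classes and a perfect regression prediction, the loss reduces to the classification term alone, ln 2. Raising `ee_weight` shifts the optimization toward the energy expenditure task.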


Citations
Proceedings Article

Toward Driving Scene Understanding: A Dataset for Learning Driver Behavior and Causal Reasoning

TL;DR: The Honda Research Institute Driving Dataset (HDD) contains 104 hours of real human driving in the San Francisco Bay Area, collected using an instrumented vehicle equipped with different sensors.
Journal Article

A Review of Multimodal Human Activity Recognition with Special Emphasis on Classification, Applications, Challenges and Future Directions

TL;DR: This paper presents a comprehensive review of multimodal human activity recognition methods that use different types of sensors, together with their analytical approaches and fusion methods, and classifies and discusses existing work along seven rational aspects.
Proceedings Article

Convolutional Relational Machine for Group Activity Recognition

TL;DR: This article proposes the Convolutional Relational Machine (CRM), an end-to-end deep convolutional neural network for recognizing group activities that uses the spatial relations between individual persons in an image or video.
Journal Article

Deep Learning in Human Activity Recognition with Wearable Sensors: A Review on Advances

14 Feb 2022
TL;DR: This paper presents a comprehensive analysis of the current advancements, developing trends, and major challenges for wearable-based human activity recognition (HAR), along with cutting-edge frontiers and future directions for deep learning-based HAR.
Posted Content

Convolutional Relational Machine for Group Activity Recognition

TL;DR: An end-to-end deep Convolutional Neural Network called CRM for recognizing group activities that utilizes the information in spatial relations between individual persons in image or video to produce an intermediate spatial representation based on individual and group activities.
References
Proceedings Article

ImageNet: A large-scale hierarchical image database

TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.
Proceedings Article

Going deeper with convolutions

TL;DR: Inception is a deep convolutional neural network architecture that set a new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Proceedings Article

Learning Spatiotemporal Features with 3D Convolutional Networks

TL;DR: The learned features, namely C3D (Convolutional 3D), with a simple linear classifier outperform state-of-the-art methods on 4 different benchmarks and are comparable with current best methods on the other 2 benchmarks.
Proceedings Article

Large-Scale Video Classification with Convolutional Neural Networks

TL;DR: This work studies multiple approaches for extending the connectivity of a CNN in time domain to take advantage of local spatio-temporal information and suggests a multiresolution, foveated architecture as a promising way of speeding up the training.