Open Access · Journal ArticleDOI

Predicting the future from first person (egocentric) vision: A survey

TLDR
The survey highlights that methods for future prediction from egocentric vision can have a significant impact in a range of applications, and argues that further research effort should be devoted to standardising tasks and proposing datasets that reflect real-world scenarios, such as industrial settings.
About
This article was published in Computer Vision and Image Understanding on 2021-10-01 and is currently open access. It has received 26 citations to date. The article focuses on the topic of augmented reality.


Citations
Posted Content

Ego4D: Around the World in 3,000 Hours of Egocentric Video

TL;DR: This preprint introduces the Ego4D dataset of egocentric video, collected at sites including the University of Bristol and the National University of Singapore, with the contributing universities responsible for de-identifying their videos.
Proceedings ArticleDOI

Ego4D: Around the World in 3,000 Hours of Egocentric Video

TL;DR: The Ego4D dataset as discussed by the authors provides 3,670 hours of daily-life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.), captured by 931 unique camera wearers from 74 worldwide locations across 9 different countries.
Posted Content

Is First Person Vision Challenging for Object Tracking? The TREK-100 Benchmark Dataset.

TL;DR: The study extensively analyses the performance of recent visual trackers and baseline FPV trackers across several aspects, introduces a new performance measure, and shows that object tracking in FPV is challenging.
Journal ArticleDOI

Visual Object Tracking in First Person Vision

TL;DR: In this paper, the authors present the first systematic investigation of single object tracking in First Person Vision (FPV) and extensively analyze the performance of 42 algorithms, including generic object trackers and baseline FPV-specific trackers.
Book ChapterDOI

Untrimmed Action Anticipation

TL;DR: In this paper, the authors propose an untrimmed action anticipation task which, like temporal action detection, operates on untrimmed videos, but still requires predictions to be made before the actions actually take place; results are compared on the EPIC-KITCHENS-100 dataset.
References
Proceedings ArticleDOI

Deep Residual Learning for Image Recognition

TL;DR: In this article, the authors propose a residual learning framework that eases the training of networks substantially deeper than those used previously; the resulting models won first place on the ILSVRC 2015 classification task.
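To make the residual idea concrete, here is a minimal sketch of a residual block in PyTorch; the channel count, two-convolution layout, and identity shortcut are illustrative assumptions rather than the paper's released code.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: output = ReLU(F(x) + x)."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                        # identity shortcut
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + identity)    # the block learns the residual F(x) = H(x) - x

# Example usage on a random feature map
block = ResidualBlock(64)
x = torch.randn(1, 64, 56, 56)
print(block(x).shape)  # torch.Size([1, 64, 56, 56])
```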
Journal ArticleDOI

Long short-term memory

TL;DR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1,000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
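As a hedged illustration of how such a recurrent unit is typically used, the sketch below runs a long sequence through PyTorch's built-in LSTM and reads a prediction off the final hidden state; the sequence length, dimensions, and linear readout are assumptions made only for this example.

```python
import torch
import torch.nn as nn

# Toy setup: classify a 1000-step sequence from the final LSTM hidden state.
seq_len, batch, input_dim, hidden_dim = 1000, 8, 16, 32

lstm = nn.LSTM(input_size=input_dim, hidden_size=hidden_dim)
head = nn.Linear(hidden_dim, 2)  # simple 2-class readout

x = torch.randn(seq_len, batch, input_dim)   # (time, batch, features)
outputs, (h_n, c_n) = lstm(x)                # c_n is the cell state that carries
                                             # information across long time lags
logits = head(h_n[-1])                       # predict from the last hidden state
print(logits.shape)                          # torch.Size([8, 2])
```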
Journal ArticleDOI

Generative Adversarial Nets

TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are trained simultaneously: a generative model G that captures the data distribution and a discriminative model D that estimates the probability that a sample came from the training data rather than from G.
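A minimal sketch of that adversarial process, assuming toy 2-D data and tiny multilayer perceptrons for G and D, might look like the following in PyTorch; it is illustrative only, not the paper's original implementation.

```python
import torch
import torch.nn as nn

# Illustrative sizes; real GANs use far larger networks and image data.
noise_dim, data_dim, batch = 8, 2, 64
G = nn.Sequential(nn.Linear(noise_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

for step in range(1000):
    real = torch.randn(batch, data_dim) * 0.5 + 3.0      # stand-in "real" samples
    fake = G(torch.randn(batch, noise_dim))

    # Discriminator step: push real samples toward 1 and generated samples toward 0.
    loss_D = bce(D(real), torch.ones(batch, 1)) + bce(D(fake.detach()), torch.zeros(batch, 1))
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()

    # Generator step: try to fool the discriminator (non-saturating loss).
    loss_G = bce(D(fake), torch.ones(batch, 1))
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()
```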
Posted Content

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

TL;DR: A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; the pre-trained model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
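That fine-tuning recipe can be sketched with the Hugging Face transformers library (an assumption here, not code from the paper itself); the checkpoint name, two-label task, and single toy sentence are illustrative.

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Assumed setup: the "bert-base-uncased" checkpoint and a 2-label classification task.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

batch = tokenizer(["the camera wearer opens the fridge"],
                  return_tensors="pt", padding=True, truncation=True)
labels = torch.tensor([1])

# One fine-tuning step: a single classification head is added on top of the
# pre-trained bidirectional encoder, and everything is trained end to end.
outputs = model(**batch, labels=labels)
outputs.loss.backward()
```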
Posted Content

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

TL;DR: Faster R-CNN, as discussed by the authors, introduces a Region Proposal Network (RPN) that generates high-quality region proposals, which are then used by Fast R-CNN for detection.
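As a hedged usage sketch, torchvision ships a reference Faster R-CNN whose forward pass runs the RPN and the Fast R-CNN detection head end to end; the weights argument and input size below assume a recent torchvision release and are illustrative only.

```python
import torch
import torchvision

# Reference implementation bundling the RPN and the per-region detection head.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 480, 640)        # dummy RGB frame with values in [0, 1]
with torch.no_grad():
    predictions = model([image])       # RPN proposals -> per-region classification

print(predictions[0]["boxes"].shape,   # detected bounding boxes
      predictions[0]["labels"].shape,  # class indices
      predictions[0]["scores"].shape)  # confidence scores
```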