Proceedings ArticleDOI

Event recognition in egocentric videos using a novel trajectory based feature

TLDR
It is shown that the dense trajectory features based on the proposed GF-STIP descriptors enhance the efficacy of the event recognition system in egocentric videos.
Abstract
This paper proposes an approach for event recognition in egocentric videos using dense trajectories over the Gradient Flow Space Time Interest Point (GF-STIP) feature. We focus on recognizing events of diverse categories (including indoor and outdoor activities, sports, social activities, and adventures) in egocentric videos. We introduce a dataset with diverse egocentric events, as all existing egocentric activity recognition datasets consist of indoor videos only. The dataset introduced in this paper contains 102 videos with 9 different events (containing indoor and outdoor videos with varying lighting conditions). We extract Space Time Interest Points (STIP) from each frame of the video. The interest points are taken as the lead pixels, and the Gradient-Weighted Optical Flow (GWOF) feature is calculated at each lead pixel by multiplying the optical flow magnitude and the gradient magnitude at that pixel, to obtain the GF-STIP feature. We construct pose descriptors with the GF-STIP feature. We use the GF-STIP descriptors for recognizing events in egocentric videos with three different approaches: following a Bag of Words (BoW) model, implementing Fisher Vectors, and obtaining dense trajectories for the videos. We show that the dense trajectory features based on the proposed GF-STIP descriptors enhance the efficacy of the event recognition system in egocentric videos.
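The two core computations in the abstract can be sketched briefly: the GWOF value at a lead pixel (optical flow magnitude multiplied by image-gradient magnitude) and the subsequent Bag of Words histogram over descriptors. This is a minimal illustrative sketch, not the authors' implementation; the function names, array layouts, and the nearest-centroid assignment are assumptions.

```python
import numpy as np

def gwof_at_points(flow, gray, points):
    """Gradient-Weighted Optical Flow (GWOF) at STIP 'lead pixels':
    product of optical-flow magnitude and image-gradient magnitude.
    flow: H x W x 2 array (dx, dy per pixel); gray: H x W grayscale frame;
    points: list of (row, col) interest-point coordinates."""
    # Per-pixel optical-flow magnitude.
    flow_mag = np.sqrt(flow[..., 0] ** 2 + flow[..., 1] ** 2)
    # Per-pixel image-gradient magnitude via finite differences.
    gy, gx = np.gradient(gray.astype(float))
    grad_mag = np.sqrt(gx ** 2 + gy ** 2)
    # GWOF value at each lead pixel: elementwise product of the two magnitudes.
    return np.array([flow_mag[r, c] * grad_mag[r, c] for (r, c) in points])

def bow_histogram(descriptors, vocabulary):
    """Bag of Words encoding: assign each descriptor to its nearest
    visual word (vocabulary centroid) and return a normalized histogram."""
    # Squared distances from every descriptor to every centroid.
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()  # L1-normalized histogram
```

In practice the dense optical flow would come from a standard estimator (e.g. Farneback flow) and the vocabulary from k-means over training descriptors; the sketch above only shows the per-pixel weighting and the histogram encoding.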

Citations
Posted Content

An Information-rich Sampling Technique over Spatio-Temporal CNN for Classification of Human Actions in Videos

TL;DR: A 3-dimensional deep CNN is proposed to extract spatio-temporal features, followed by a Long Short-Term Memory (LSTM) network to recognize human actions, and is shown to outperform state-of-the-art deep learning based techniques.
Journal ArticleDOI

Human action and event recognition using a novel descriptor based on improved dense trajectories

TL;DR: The proposed unified descriptor is a 168-dimensional vector obtained from each video sequence by statistically analyzing the motion patterns of the 3D joint locations of the human body, and shows its efficacy compared to state-of-the-art techniques.
Book ChapterDOI

Recognizing Human Activities in Videos Using Improved Dense Trajectories over LSTM

TL;DR: This work proposes a deep learning based technique to classify actions based on Long Short Term Memory networks, and extends the proposed framework with an efficient motion feature, to enable handling significant camera motion.
Proceedings ArticleDOI

Activity Recognition in Egocentric Videos Using Bag of Key Action Units

TL;DR: It is argued that, for activity recognition in egocentric videos, the proposed approach performs better than any deep learning based method.
Journal ArticleDOI

An information-rich sampling technique over spatio-temporal CNN for classification of human actions in videos

TL;DR: A novel video sampling scheme for human action recognition in videos is proposed, using a Gaussian weighting function to reduce the volume of the input data and mitigate overfitting, thereby enhancing the performance of the 3D CNN model.
References
Proceedings ArticleDOI

Histograms of oriented gradients for human detection

TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.
Journal ArticleDOI

Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography

TL;DR: New results are derived on the minimum number of landmarks needed to obtain a solution, and algorithms are presented for computing these minimum-landmark solutions in closed form, providing the basis for an automatic system that can solve the Location Determination Problem under difficult viewing conditions.
Proceedings ArticleDOI

A Combined Corner and Edge Detector

TL;DR: The problem addressed in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work.
Book ChapterDOI

SURF: speeded up robust features

TL;DR: A novel scale- and rotation-invariant interest point detector and descriptor, coined SURF (Speeded Up Robust Features), which approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faster.
Proceedings ArticleDOI

Learning Spatiotemporal Features with 3D Convolutional Networks

TL;DR: The learned features, namely C3D (Convolutional 3D), with a simple linear classifier outperform state-of-the-art methods on 4 different benchmarks and are comparable with current best methods on the other 2 benchmarks.