Book ChapterDOI

Action Recognition in Haze Using an Efficient Fusion of Spatial and Temporal Features

TLDR
A novel unified model for action recognition in hazy videos is proposed, combining a convolutional neural network (CNN) that first dehazes the video and then extracts spatial features from each frame with a deep bidirectional LSTM (DB-LSTM) network that extracts the temporal features of the action.
Abstract
Action recognition in video sequences is an active research problem in Computer Vision. However, no significant efforts have been made to recognize actions in hazy videos. This paper proposes a novel unified model for action recognition in hazy videos using an efficient combination of a Convolutional Neural Network (CNN), which first obtains the dehazed video and then extracts spatial features from each frame, and a deep bidirectional LSTM (DB-LSTM) network, which extracts the temporal features of the action. First, each frame of the hazy video is fed into the AOD-Net (All-in-One Dehazing Network) model to obtain a clear representation of the frames. Next, spatial features are extracted from every sampled dehazed frame (produced by the AOD-Net model) using a pre-trained VGG-16 architecture, which helps reduce redundancy and complexity. Finally, the temporal information across the frames is learnt using a DB-LSTM network, where multiple LSTM layers are stacked together in both the forward and backward passes of the network. The proposed unified model is the first attempt to recognize human actions in hazy videos. Experimental results on a synthetic hazy video dataset show state-of-the-art performance in recognizing actions.
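The three-stage pipeline described above (dehaze, extract per-frame spatial features, model temporal dynamics bidirectionally) can be sketched as follows. This is an illustrative sketch, not the authors' implementation: random vectors stand in for the per-frame VGG-16 fc features of the dehazed frames, and a single-layer bidirectional LSTM written in NumPy stands in for the stacked DB-LSTM (the paper stacks multiple LSTM layers; one layer per direction is shown here for brevity). All dimensions and names are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step; gates stacked as [input, forget, output, cell] along axis 0."""
    H = h.shape[0]
    z = W @ x + U @ h + b                        # pre-activations, shape (4H,)
    i = sigmoid(z[:H])                           # input gate
    f = sigmoid(z[H:2 * H])                      # forget gate
    o = sigmoid(z[2 * H:3 * H])                  # output gate
    g = np.tanh(z[3 * H:])                       # candidate cell state
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

def run_lstm(seq, W, U, b, H):
    """Run an LSTM over a (T, D) feature sequence; return the final hidden state."""
    h, c = np.zeros(H), np.zeros(H)
    for x in seq:
        h, c = lstm_step(x, h, c, W, U, b)
    return h

D, H, T, n_classes = 4096, 128, 8, 101   # assumed: VGG-16 fc dim, hidden size, frames, classes

# Stand-in for spatial features of T sampled dehazed frames
# (in the paper: AOD-Net dehazing followed by pre-trained VGG-16).
features = rng.standard_normal((T, D)) * 0.01

def init_weights(D, H):
    return (rng.standard_normal((4 * H, D)) * 0.01,
            rng.standard_normal((4 * H, H)) * 0.01,
            np.zeros(4 * H))

Wf, Uf, bf = init_weights(D, H)          # forward-direction weights
Wb, Ub, bb = init_weights(D, H)          # backward-direction weights

h_fwd = run_lstm(features, Wf, Uf, bf, H)        # pass over frames in temporal order
h_bwd = run_lstm(features[::-1], Wb, Ub, bb, H)  # pass over frames in reverse order
h_bi = np.concatenate([h_fwd, h_bwd])            # bidirectional summary, shape (2H,)

W_out = rng.standard_normal((n_classes, 2 * H)) * 0.01
logits = W_out @ h_bi                    # action-class scores
pred = int(np.argmax(logits))
print(h_bi.shape, pred)
```

The bidirectional summary concatenates the final hidden states of the two directions, so the classifier sees context from both the start and the end of the action before scoring the 101 classes (the class count here assumes a UCF101-style label set).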


References
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: This work investigates the effect of convolutional network depth on accuracy in the large-scale image recognition setting, using an architecture with very small (3×3) convolution filters, and shows that pushing the depth to 16-19 weight layers yields a significant improvement over prior-art configurations.
Posted Content

UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild

TL;DR: This work introduces UCF101, currently the largest dataset of human actions, and provides baseline action recognition results on this new dataset using a standard bag-of-words approach, with an overall accuracy of 44.5%.
Proceedings ArticleDOI

Space-time interest points

Laptev, +1 more
TL;DR: This work builds on the idea of the Harris and Förstner interest point operators, detecting local structures in space-time where the image values have significant variations in both space and time, in order to detect spatio-temporal events.
Proceedings ArticleDOI

Visibility in bad weather from a single image

TL;DR: A cost function in the framework of Markov random fields is developed that can be efficiently optimized by techniques such as graph cuts or belief propagation; the method is applicable to both color and gray images.
Journal ArticleDOI

DehazeNet: An End-to-End System for Single Image Haze Removal

TL;DR: DehazeNet adopts a convolutional neural network-based deep architecture whose layers are specially designed to embody the established assumptions and priors of image dehazing.