Book ChapterDOI

Action Recognition in Haze Using an Efficient Fusion of Spatial and Temporal Features

TLDR
A novel unified model for action recognition in hazy videos is proposed, combining a convolutional neural network (CNN) that first dehazes the video and then extracts spatial features from each frame with a deep bidirectional LSTM (DB-LSTM) network that extracts the temporal features of the action.
Abstract
Action recognition in video sequences is an active research problem in Computer Vision. However, no significant efforts have been made to recognize actions in hazy videos. This paper proposes a novel unified model for action recognition in hazy videos using an efficient combination of a Convolutional Neural Network (CNN), which first obtains the dehazed video and then extracts spatial features from each frame, and a deep bidirectional LSTM (DB-LSTM) network, which extracts the temporal features of the action. First, each frame of the hazy video is fed into the AOD-Net (All-in-One Dehazing Network) model to obtain a clear representation of the frames. Next, spatial features are extracted from every sampled dehazed frame (produced by the AOD-Net model) using a pre-trained VGG-16 architecture, which helps reduce redundancy and complexity. Finally, the temporal information across the frames is learnt using a DB-LSTM network, where multiple LSTM layers are stacked together in both the forward and backward passes of the network. The proposed unified model is the first attempt to recognize human actions in hazy videos. Experimental results on a synthetic hazy video dataset show state-of-the-art performance in recognizing actions.
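The three-stage pipeline described above (dehaze, extract per-frame spatial features, model temporal dynamics bidirectionally) can be sketched as follows. This is an illustrative sketch, not the authors' implementation: random vectors stand in for the per-frame VGG-16 fc features of the dehazed frames, and a single-layer bidirectional LSTM written in NumPy stands in for the stacked DB-LSTM (the paper stacks multiple LSTM layers; one layer per direction is shown here for brevity). All dimensions and names are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step; gates stacked as [input, forget, output, cell] along axis 0."""
    H = h.shape[0]
    z = W @ x + U @ h + b                        # pre-activations, shape (4H,)
    i = sigmoid(z[:H])                           # input gate
    f = sigmoid(z[H:2 * H])                      # forget gate
    o = sigmoid(z[2 * H:3 * H])                  # output gate
    g = np.tanh(z[3 * H:])                       # candidate cell state
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

def run_lstm(seq, W, U, b, H):
    """Run an LSTM over a (T, D) feature sequence; return the final hidden state."""
    h, c = np.zeros(H), np.zeros(H)
    for x in seq:
        h, c = lstm_step(x, h, c, W, U, b)
    return h

D, H, T, n_classes = 4096, 128, 8, 101   # assumed: VGG-16 fc dim, hidden size, frames, classes

# Stand-in for spatial features of T sampled dehazed frames
# (in the paper: AOD-Net dehazing followed by pre-trained VGG-16).
features = rng.standard_normal((T, D)) * 0.01

def init_weights(D, H):
    return (rng.standard_normal((4 * H, D)) * 0.01,
            rng.standard_normal((4 * H, H)) * 0.01,
            np.zeros(4 * H))

Wf, Uf, bf = init_weights(D, H)          # forward-direction weights
Wb, Ub, bb = init_weights(D, H)          # backward-direction weights

h_fwd = run_lstm(features, Wf, Uf, bf, H)        # pass over frames in temporal order
h_bwd = run_lstm(features[::-1], Wb, Ub, bb, H)  # pass over frames in reverse order
h_bi = np.concatenate([h_fwd, h_bwd])            # bidirectional summary, shape (2H,)

W_out = rng.standard_normal((n_classes, 2 * H)) * 0.01
logits = W_out @ h_bi                    # action-class scores
pred = int(np.argmax(logits))
print(h_bi.shape, pred)
```

The bidirectional summary concatenates the final hidden states of the two directions, so the classifier sees context from both the start and the end of the action before scoring the 101 classes (the class count here assumes a UCF101-style label set).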


References
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: This work investigates the effect of convolutional network depth on accuracy in the large-scale image recognition setting, using an architecture with very small (3×3) convolution filters, and shows that pushing the depth to 16-19 weight layers yields a significant improvement over prior-art configurations.
Posted Content

UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild

TL;DR: This work introduces UCF101, currently the largest dataset of human actions, and provides baseline action recognition results on this new dataset using a standard bag-of-words approach, with an overall accuracy of 44.5%.
Proceedings ArticleDOI

Space-time interest points

Laptev, +1 more
TL;DR: This work builds on the idea of the Harris and Förstner interest point operators, detecting local structures in space-time where the image values have significant variations in both space and time, in order to detect spatio-temporal events.
Proceedings ArticleDOI

Visibility in bad weather from a single image

TL;DR: A cost function in the framework of Markov random fields is developed that can be efficiently optimized by techniques such as graph cuts or belief propagation; the method is applicable to both color and gray images.
Journal ArticleDOI

DehazeNet: An End-to-End System for Single Image Haze Removal

TL;DR: DehazeNet adopts a convolutional neural network-based deep architecture whose layers are specially designed to embody the established assumptions and priors of image dehazing.