Book Chapter
Action Recognition in Haze Using an Efficient Fusion of Spatial and Temporal Features
Sri Girinadh Tanneru, Snehasis Mukherjee, et al.
pp. 29-38
TL;DR: This paper proposes a novel unified model for action recognition in hazy videos: a convolutional neural network (CNN) first dehazes the video and extracts spatial features from each frame, and a deep bidirectional LSTM (DB-LSTM) network then extracts the temporal features of the action.
Abstract:
Action recognition in video sequences is an active research problem in Computer Vision. However, no significant efforts have been made for recognizing actions in hazy videos. This paper proposes a novel unified model for action recognition in hazy videos using an efficient combination of a Convolutional Neural Network (CNN) for obtaining the dehazed video first, followed by extracting spatial features from each frame, and a deep bidirectional LSTM (DB-LSTM) network for extracting the temporal features during action. First, each frame of the hazy video is fed into the AOD-Net (All-in-One Dehazing Network) model to obtain a clear representation of the frames. Next, spatial features are extracted from every sampled dehazed frame (produced by the AOD-Net model) using a pre-trained VGG-16 architecture, which helps reduce redundancy and complexity. Finally, the temporal information across the frames is learnt using a DB-LSTM network, where multiple LSTM layers are stacked together in both the forward and backward passes of the network. The proposed unified model is the first attempt to recognize human action in hazy videos. Experimental results on a synthetic hazy video dataset show state-of-the-art performance in recognizing actions.
References
Proceedings Article
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan, Andrew Zisserman
TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Posted Content
UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild
TL;DR: This work introduces UCF101 which is currently the largest dataset of human actions and provides baseline action recognition results on this new dataset using standard bag of words approach with overall performance of 44.5%.
Proceedings ArticleDOI
Space-time interest points
TL;DR: This work builds on the idea of the Harris and Förstner interest point operators and detects local structures in space-time where the image values have significant local variations in both space and time, in order to detect spatio-temporal events.
Proceedings ArticleDOI
Visibility in bad weather from a single image
TL;DR: A cost function in the framework of Markov random fields is developed, which can be efficiently optimized by various techniques, such as graph-cuts or belief propagation, and is applicable for both color and gray images.
Journal ArticleDOI
DehazeNet: An End-to-End System for Single Image Haze Removal
TL;DR: DehazeNet adopts a convolutional neural network-based deep architecture, whose layers are specially designed to embody the established assumptions/priors in image dehazing.