Proceedings ArticleDOI

End-to-End Blind Quality Assessment of Compressed Videos Using Deep Neural Networks

TL;DR: A Video Multi-task End-to-end Optimized neural Network (V-MEON) that merges the two stages of blind video quality assessment into one, where the feature extractor and the regressor are jointly optimized.
Abstract
Blind video quality assessment (BVQA) algorithms are traditionally designed with a two-stage approach: a feature extraction stage that computes spatial and/or temporal features, typically hand-crafted, and a regression stage working in the feature space that predicts the perceptual quality of the video. Unlike traditional BVQA methods, we propose a Video Multi-task End-to-end Optimized neural Network (V-MEON) that merges the two stages into one, where the feature extractor and the regressor are jointly optimized. Our model uses a multi-task DNN framework that not only estimates the perceptual quality of the test video but also provides a probabilistic prediction of its codec type. This framework allows us to train the network with two complementary sets of labels, both of which can be obtained at low cost. The training process is composed of two steps. In the first step, early convolutional layers are pre-trained to extract spatiotemporal quality-related features with the codec classification subtask. In the second step, initialized with the pre-trained feature extractor, the whole network is jointly optimized with the two subtasks together. An additional critical step is the adoption of 3D convolutional layers, which create novel spatiotemporal features that lead to a significant performance boost. Experimental results show that the proposed model clearly outperforms state-of-the-art BVQA methods. The source code of V-MEON is available at https://ece.uwaterloo.ca/~zduanmu/acmmm2018bvqa.
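
As a rough illustration of the two-headed design the abstract describes, here is a minimal PyTorch sketch: a shared 3D-convolutional feature extractor feeds both a codec classifier and a quality regressor. Layer sizes, the codec count, and the loss weighting are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of the shared-extractor, two-head idea (illustrative sizes,
# not the paper's architecture).
import torch
import torch.nn as nn

class VMEONSketch(nn.Module):
    def __init__(self, num_codecs=4):  # num_codecs is an assumed value
        super().__init__()
        # Shared spatiotemporal feature extractor (3D convolutions).
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(16, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),  # -> (N, 32, 1, 1, 1)
            nn.Flatten(),             # -> (N, 32)
        )
        # Subtask 1: probabilistic codec-type prediction.
        self.codec_head = nn.Linear(32, num_codecs)
        # Subtask 2: perceptual quality regression.
        self.quality_head = nn.Linear(32, 1)

    def forward(self, clip):  # clip: (N, 3, T, H, W)
        f = self.features(clip)
        return self.codec_head(f), self.quality_head(f).squeeze(-1)

model = VMEONSketch()
logits, score = model(torch.randn(2, 3, 8, 64, 64))
# Step 1 pre-trains `features` + `codec_head` with cross-entropy alone;
# step 2 optimizes both heads jointly, e.g. L_total = L_quality + L_codec.
loss = nn.functional.cross_entropy(logits, torch.tensor([0, 1])) \
     + nn.functional.l1_loss(score, torch.tensor([3.5, 2.0]))
```
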


Citations
Proceedings ArticleDOI

Quality Assessment of In-the-Wild Videos

TL;DR: This work proposes an objective no-reference video quality assessment method that integrates both content-dependency and temporal-memory effects into a deep neural network, outperforming five state-of-the-art methods by a large margin.
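
The temporal-memory idea can be sketched as follows: pre-extracted per-frame features pass through a recurrent unit so the prediction carries memory of earlier frames, and frame-level scores are pooled into one video score. The feature dimension, the GRU, and mean pooling are assumptions for illustration, not necessarily the paper's exact design.

```python
# Hedged sketch: frame features -> recurrent memory -> pooled video score.
import torch
import torch.nn as nn

class TemporalMemorySketch(nn.Module):
    def __init__(self, feat_dim=2048, hidden=32):  # assumed dimensions
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.score = nn.Linear(hidden, 1)

    def forward(self, frame_feats):            # (N, T, feat_dim), e.g. CNN features
        h, _ = self.gru(frame_feats)           # (N, T, hidden): memory across frames
        per_frame = self.score(h).squeeze(-1)  # (N, T) frame-level quality
        return per_frame.mean(dim=1)           # simple temporal pooling to one score

q = TemporalMemorySketch()(torch.randn(2, 30, 2048))  # two 30-frame videos
```
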
Journal ArticleDOI

UGC-VQA: Benchmarking Blind Video Quality Assessment for User Generated Content

TL;DR: This work conducts a comprehensive evaluation of leading no-reference/blind VQA (BVQA) features and models on a fixed evaluation architecture, yielding new empirical insights on both subjective video quality studies and objective VQA model design, and proposes the VIDeo quality EVALuator (VIDEVAL) model to improve BVQA performance on UGC/consumer videos.
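
Benchmarks of this kind conventionally report Spearman (SROCC) and Pearson (PLCC) correlations between predicted scores and subjective mean opinion scores (MOS). A minimal sketch with made-up numbers:

```python
# Standard VQA evaluation metrics; the scores below are illustrative only.
import numpy as np
from scipy.stats import spearmanr, pearsonr

mos = np.array([3.2, 4.1, 2.5, 3.8, 1.9])   # subjective ratings (assumed)
pred = np.array([3.0, 4.3, 2.2, 3.5, 2.4])  # model predictions (assumed)
srocc, _ = spearmanr(mos, pred)             # monotonic (rank) agreement
plcc, _ = pearsonr(mos, pred)               # linear agreement
print(f"SROCC={srocc:.3f}  PLCC={plcc:.3f}")
```
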
Journal ArticleDOI

RAPIQUE: Rapid and Accurate Video Quality Prediction of User Generated Content

TL;DR: In this paper, the Rapid and Accurate Video Quality Evaluator (RAPIQUE) model is proposed for video quality prediction; it combines the advantages of quality-aware scene-statistics features and semantics-aware deep convolutional features.
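
The fusion idea can be sketched in a few lines: concatenate handcrafted quality-aware statistics with pooled deep features and fit a simple regressor. The random feature stand-ins and the SVR regressor below are assumptions for illustration, not RAPIQUE's actual pipeline.

```python
# Early fusion of handcrafted and deep features; all inputs are stand-ins.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
stats_feats = rng.normal(size=(100, 60))   # e.g. NSS-style statistics per video
deep_feats = rng.normal(size=(100, 512))   # e.g. pooled CNN activations per video
mos = rng.uniform(1, 5, size=100)          # subjective scores (synthetic)

X = np.hstack([stats_feats, deep_feats])   # fusion by concatenation
reg = SVR(kernel="rbf").fit(X, mos)
print(reg.predict(X[:3]))
```
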
Journal ArticleDOI

No-Reference Video Quality Assessment Using Natural Spatiotemporal Scene Statistics

TL;DR: This paper proposes an asymmetric generalized Gaussian distribution (AGGD) to model the statistics of MSCN coefficients of natural videos and their spatiotemporal Gabor bandpass filtered outputs and demonstrates that the AGGD model parameters serve as good representative features for distortion discrimination.
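
Both ingredients named in the summary can be sketched compactly: MSCN coefficients via Gaussian-weighted local normalization, and AGGD parameters via the standard moment-matching estimator (the BRISQUE-style fit). The window width and the lookup-based inversion are common illustrative choices, and the spatiotemporal Gabor-filtered channel is omitted here.

```python
# MSCN coefficients and moment-matching AGGD fit; constants are the usual
# illustrative choices, not necessarily this paper's exact settings.
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.special import gamma as G

def mscn(image, sigma=7/6, c=1.0):
    """Mean-subtracted contrast-normalized coefficients of a grayscale image."""
    mu = gaussian_filter(image, sigma)
    var = gaussian_filter(image * image, sigma) - mu * mu
    return (image - mu) / (np.sqrt(np.abs(var)) + c)

def fit_aggd(x):
    """Estimate (alpha, sigma_left, sigma_right) of an AGGD by moment matching."""
    sig_l = np.sqrt(np.mean(x[x < 0] ** 2))
    sig_r = np.sqrt(np.mean(x[x >= 0] ** 2))
    gamma_hat = sig_l / sig_r
    r_hat = np.mean(np.abs(x)) ** 2 / np.mean(x ** 2)
    R_hat = r_hat * (gamma_hat**3 + 1) * (gamma_hat + 1) / (gamma_hat**2 + 1) ** 2
    # Invert rho(a) = Gamma(2/a)^2 / (Gamma(1/a) Gamma(3/a)) by table lookup.
    alphas = np.arange(0.2, 10.0, 0.001)
    rho = G(2 / alphas) ** 2 / (G(1 / alphas) * G(3 / alphas))
    alpha = alphas[np.argmin((rho - R_hat) ** 2)]
    return alpha, sig_l, sig_r

coeffs = mscn(np.random.rand(64, 64) * 255.0)
print(fit_aggd(coeffs.ravel()))
```
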
References
Proceedings Article

Adam: A Method for Stochastic Optimization

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions based on adaptive estimates of lower-order moments, and provides a regret bound on its convergence that is comparable to the best known results under the online convex optimization framework.
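
The update rule is short enough to state directly: exponential moving averages of the gradient and its square, bias-corrected, scale each parameter's step. A plain NumPy sketch with the commonly cited default hyperparameters:

```python
# The Adam update in plain NumPy.
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad        # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2   # second-moment (uncentered variance)
    m_hat = m / (1 - b1 ** t)           # bias correction for zero init
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta, m, v = np.ones(3), np.zeros(3), np.zeros(3)
for t in range(1, 101):                 # minimize f(x) = sum(x^2); grad = 2x
    theta, m, v = adam_step(theta, 2 * theta, m, v, t)
```
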
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: A deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully connected layers with a final 1000-way softmax, which achieved state-of-the-art classification performance on ImageNet.
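
The architecture the summary describes, condensed into a PyTorch sketch (shapes follow the standard 224x224 layout; normalization tricks and exact hyperparameters are simplified):

```python
# Five conv layers (some with max pooling) + three FC layers + 1000-way output.
import torch
import torch.nn as nn

alexnet_like = nn.Sequential(
    nn.Conv2d(3, 96, 11, stride=4, padding=2), nn.ReLU(), nn.MaxPool2d(3, 2),
    nn.Conv2d(96, 256, 5, padding=2), nn.ReLU(), nn.MaxPool2d(3, 2),
    nn.Conv2d(256, 384, 3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 384, 3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, 3, padding=1), nn.ReLU(), nn.MaxPool2d(3, 2),
    nn.Flatten(),
    nn.Linear(256 * 6 * 6, 4096), nn.ReLU(), nn.Dropout(),
    nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(),
    nn.Linear(4096, 1000),  # logits for the final 1000-way softmax
)
logits = alexnet_like(torch.randn(1, 3, 224, 224))
```
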
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
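
The design principle, in code: stacks of very small 3x3 convolutions separated by 2x2 max pooling, built from a channel plan. The sketch below yields the 16-weight-layer configuration (thirteen convolutional plus three fully connected layers):

```python
# VGG-16-style network built from its channel plan; "M" marks a pooling stage.
import torch
import torch.nn as nn

def vgg16_like(num_classes=1000):
    cfg = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
           512, 512, 512, "M", 512, 512, 512, "M"]
    layers, in_ch = [], 3
    for v in cfg:
        if v == "M":
            layers.append(nn.MaxPool2d(2, 2))
        else:
            layers += [nn.Conv2d(in_ch, v, 3, padding=1), nn.ReLU()]
            in_ch = v
    layers += [nn.Flatten(), nn.Linear(512 * 7 * 7, 4096), nn.ReLU(),
               nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, num_classes)]
    return nn.Sequential(*layers)

out = vgg16_like()(torch.randn(1, 3, 224, 224))  # -> (1, 1000)
```
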
Journal ArticleDOI

A Survey on Transfer Learning

TL;DR: The relationships between transfer learning and related machine learning techniques, such as domain adaptation, multi-task learning, sample selection bias, and covariate shift, are discussed.